Using Keys with DNA
Cryptography has long used keys for encrypting and decrypting messages so that only the intended recipients can read the
original messages and so that the encrypted messages can
be out in the open during any exchanges. The most basic forms of
secrets involve hushed voices and secret encoders and decoders that
you trade with certain people. Cryptography makes it possible to
send and receive secret information out in the open, to yell for all
the world to hear, and to communicate securely with people you
have never met in person. Undisclosed DNA brings these innovations to the world of DNA matching.
1 Traditional Public and Private Keys
Pretend that Alice has an important message to deliver to Bob, but
Alice does not have a secure channel for delivering that message.
During delivery, other parties can see the message. If Alice and Bob
want the contents of that message to stay hidden from other parties,
they could use asymmetric cryptography.
Asymmetry is a fundamental component within cryptography.
It can result in maths that is easy to perform in one direction but
difficult to do in the other direction. We can use this property to
create key pairs. Every private key has its public key and vice versa.
Bob creates a key pair so that other people can send private messages to him that no one else can read. He keeps the private key on
his own computer, but he gives out the public key to the world.
Someone such as Alice can take this public key and use it to encrypt
a message for Bob. The owner of the corresponding private key can
decrypt that message and read it.
The message began clear and was simple to read. After encryption, it looks like gibberish to everyone. It is virtually impossible for
anyone to read that message’s contents without the corresponding
private key.
Everybody can see the public key, whereas nobody except Bob
will ever touch the private key. It is easy to confirm that a private key
is paired with a public key when you have both keys. If you only
have the public key, however, you would need billions of years to
figure out the private key.
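As a minimal sketch of the key pair described above, the following Python uses textbook RSA with tiny primes. The primes, the exponent, and the message value are illustrative assumptions; the text does not say which algorithm Bob uses, and real keys are vastly larger and add padding.

```python
# Toy RSA key pair with tiny primes, for illustration only.
# All concrete numbers below are assumptions chosen to keep the maths small.
p, q = 61, 53                # Bob's secret primes
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)      # kept secret by Bob
e = 17                       # public exponent, given out to the world
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # Bob publishes this
private_key = (d, n)         # Bob keeps this on his own computer


def encrypt(message, key):
    """Encrypt a number smaller than the modulus with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Decrypt a number with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65                                   # Alice's message as a number
ciphertext = encrypt(message, public_key)      # anyone can do this step
recovered = decrypt(ciphertext, private_key)   # only Bob can do this step
print(message, ciphertext, recovered)
```

With both keys in hand, confirming that they belong together is as cheap as encrypting and decrypting one test value. With numbers this small the private key is trivial to find from the public one; the billions of years apply only at realistic key sizes.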
2 Innovations in Key Usage
Traditionally, cryptographers saw their goal as performing encryption in a single step that was as perfect as can be. It would be easy on
your computer, unique for every user across space and time, seemingly random, impossible to reverse engineer – and all in one shot.
On the other hand, Undisclosed DNA takes the tools and principles of cryptography, but applies them with different means to different ends. We use a person’s DNA sequences – let’s say Bob’s –
to construct an encryption key.
At this point, you would think that we would use the same person’s DNA to make the matching private key. It doesn’t help anyone,
however, to make a message that only the same person can read.
The usage of DNA would just be an arduous middle step, and the
results would not be as powerful as those in the cryptography we use
on the internet today.
First, Bob does not create a public encryption key with the goal
of giving a copy to everyone. Bob writes a message and encrypts
it with that key. He could then decrypt his own message with a decryption key (a private key) made from his own DNA.
But what’s the point?
We pivot from the idea of linking private keys and public keys.
Instead, many people can create private keys from their own DNA.
Then, each person can use her own private key to try to decrypt the message encrypted with a key based on Bob’s DNA. The message itself is not as important as the ability to decrypt it successfully.
If a decryption key made from your DNA can read the secret message, then your DNA is similar to Bob’s. That is, you and Bob are blood
relatives.
3 Matching Keys to Proteins
In order to perform cryptographic functions, all DNA must become numbers. Unlike traditional cryptography, which would construct a new
private key randomly, we take the totals of the four bases in
your mitochondrial DNA and each of your two X chromosomes
(in the case of biological females) or your mitochondrial DNA and
your X and Y chromosomes (biological males).
The more closely that two people are related (as in siblings versus third cousins), the more similar their DNA will be. Each person
will possess similar amounts of each of the four base pairs of GC,
CG, AT, and TA. This results in keys that decrypt similar content.
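To make the idea of turning DNA into numbers concrete, here is a small Python sketch that totals the four bases in a sequence and compares people by those totals. The sequences, the names, and the distance measure are hypothetical, and the sketch counts bases on a single strand rather than base pairs; it does not claim to be the exact calculation used by Undisclosed DNA.

```python
from collections import Counter


def base_totals(sequence):
    """Return the totals of A, C, G, and T in a DNA string, in that order."""
    counts = Counter(sequence.upper())
    return tuple(counts[base] for base in "ACGT")


def distance(totals_a, totals_b):
    """Sum of absolute differences between two sets of base totals."""
    return sum(abs(a - b) for a, b in zip(totals_a, totals_b))


# Hypothetical toy sequences; real chromosomes run to millions of bases.
alice = "ATGCGTACGTTAGCCA"
sister = "ATGCGTACGTTAGCTA"     # differs from alice in one position
stranger = "GGGGCCCCGGGGCCCC"

print(base_totals(alice))                                   # (4, 4, 4, 4)
print(distance(base_totals(alice), base_totals(sister)))    # 2
print(distance(base_totals(alice), base_totals(stranger)))  # 16
```

The closer relative ends up with totals that sit nearer to Alice’s, which is the property the key construction relies on.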
If you can use your DNA to create a key that successfully decrypts a message encrypted by a key derived from Alice’s DNA but
not one encrypted by a key made from Bob’s DNA, then you are related to Alice but not Bob. If you can decrypt a message from Bob’s
DNA but not Alice’s, then you are related to Bob. If you can decrypt both messages, then – surprise – you, Alice, and Bob are all
related.
4 Hiding the Connections
A simple illustration of the double helix of DNA, the production of
mRNA, and key creation for Undisclosed DNA can do a disservice.
For cellular functions, the matching of complementary pairs reveals
what the original versions were. A bit of messenger RNA that is G-A-C-C came from C-T-G-G, which is not exactly a secret.
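The reversibility of complementary pairing is easy to see in a few lines of Python. This is a simplified sketch: it ignores strand direction, and the function names are our own.

```python
# DNA template base -> complementary mRNA base (RNA uses U where DNA has T).
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# The reverse table falls straight out of the forward one.
MRNA_TO_DNA = {rna: dna for dna, rna in DNA_TO_MRNA.items()}


def transcribe(template):
    """Build the mRNA that pairs with a DNA template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template)


def recover_template(mrna):
    """Work backward from mRNA to the DNA template it came from."""
    return "".join(MRNA_TO_DNA[base] for base in mrna)


print(transcribe("CTGG"))        # GACC
print(recover_template("GACC"))  # CTGG
```

Because every base has exactly one partner, the mapping can always be undone, which is why a plain complement would make a poor private key.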
The private keys of Undisclosed DNA are not simple complements, however. We use a concept in mathematical logic that computer scientists of all stripes, and especially cryptographers, have
employed: exclusive or (XOR). Cryptographers use it, for instance, inside the HMAC, a construction for checking data. XOR also allows for obscuring data. In either case, we need to differentiate it from other
uses of ‘or’.
An ‘inclusive or’ question could be ‘Will we have juice or punch
to drink at the picnic?’ with the answer of ‘Yes, we are taking a few
flavors – orange, grape, and apple.’ As long as the picnic will serve
fruit juice or fruit punch, we can reply in the affirmative. The story
is the same with a daydreamer: ‘Someday, I want to visit Jamaica or
Barbados, somewhere tropical.’ She would not be disappointed if
she won a travel package to tour the Caribbean for a month and
stayed multiple nights on both islands.
In contrast, we see ‘exclusive or’ questions with directions.
‘When we reach the trailhead, do we turn left or right?’ A smart alec could say, ‘Yes, you have to turn. Going straight is not an option. It is a fork in the path.’ Normally, though, you would interpret
this query to mean ‘Should I walk left, or should I walk right?’ We
know that we can only go left or go right.
A computer could assign a value to left and then to right. In
binary, a 1 is yes, and a 0 is no. If the correct trail is to the left, the
readout is 1,0. If we should go right, we have 0,1. The impossible
choice to go left and right at the same time is 1,1. The equally impossible choice to go neither left nor right is 0,0. If the silly scenarios
are 0,0 and 1,1, then we can say that each is false.
Let’s translate the concepts of true and false into binary as well.
We end up with 1,0=1, 0,1=1, 0,0=0, 1,1=0, and we now have
XOR. If we look at the final result of 1, all we know is that our
hiking guide told us to go left at the trailhead or she instructed us
to head right. If we look at the final result of 0, then we only know
that our hiking guide is not making any sense because she just told
us to go left and right at the same time or told us to do neither and
instead to fly upward. Similarly, the private keys that we make from
DNA cannot be reversed.
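The trailhead example can be written out in Python, where the `^` operator is XOR. The sketch below prints the truth table, shows that a result of 1 does not reveal which input produced it, and then uses XOR to obscure a short message. The message and key bytes are made-up values, not anything derived from DNA.

```python
def xor(left, right):
    """Exclusive or of two bits: 1 when exactly one input is 1."""
    return left ^ right


# Truth table for the trailhead: (left, right) -> result.
for left in (0, 1):
    for right in (0, 1):
        print(left, right, "->", xor(left, right))

# A result of 1 fits two different instructions, so the result alone
# does not say whether the guide meant left or right.
inputs_giving_1 = [(l, r) for l in (0, 1) for r in (0, 1) if xor(l, r) == 1]
print(inputs_giving_1)  # [(0, 1), (1, 0)]


def xor_bytes(data, key):
    """XOR each byte of data with the matching byte of key."""
    return bytes(d ^ k for d, k in zip(data, key))


message = b"left"
key = bytes([0x5A, 0xC3, 0x1F, 0x88])   # hypothetical key, same length as message
obscured = xor_bytes(message, key)      # unreadable without the key
restored = xor_bytes(obscured, key)     # the same key undoes the XOR
print(obscured, restored)
```

Someone who holds only the obscured bytes is in the position of the hiker who sees only the final 1: many inputs fit, and nothing singles out the right one.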
5 Genetic Drift
The ability to decrypt a message may seem to be black and white.
After all, the cryptography we see in PGP email, TLS for websites,
or the Signal Protocol in Signal or WhatsApp works that way. You
either have the exact key you need to decrypt a message or you don’t.
An American expression reminds us that ‘close’ only counts with
the game of horseshoes and with grenades in war.
Undisclosed DNA breaks from traditional applications of
cryptography to allow for multiple possibilities to decrypt a message. If you have the exact same DNA, and therefore the same key,
as someone who encrypted a message under the methods of Undisclosed DNA, then you can decrypt the message. If you only share
DNA that is similar but not identical – maybe you are siblings –
then you can still decrypt the message. Someone may
want to reach out further and tweak the message encryption process
so that second or third cousins can also be matched. Undisclosed
DNA lets you do that.
To connect relatives and indirectly measure the distance of that
relation, we look at genetic drift. All humans are at least distant relatives. Importantly for genetic diversity, the connections are very slight. A person in Beijing may share a common ancestor with
someone in Warsaw, but it may have been a thousand years ago.
Also, every new plant, animal, bacterium, and human will have
some mutations.
These very tiny mutations accelerate the differentiation between people. As these tiny changes accumulate and kinship ties
become more distant, we can say that genetic drift has also grown.
Two people who exhibit very little genetic drift will be able to decrypt messages encrypted by the other.
Reaching one’s own child seems straightforward, but we can
apply the matching methods much more widely with the “volume”
of a message. I.e., we can decide how loud that call is. Without a volume control, you would only have a simple binary of “close family
versus random stranger”, and that has relatively few use cases.
Let’s say you want to find cousins or a lost relative when you
do not have access to the DNA of that person’s parents. Maybe
you just want to see who is out there. With Undisclosed DNA,
you may call out loudly whilst maintaining the secrecy that you get
from whispering.
You can adjust the encryption of a message in such a way that
only your daughter could read it. Alternatively, you can modify
your message so that anyone who is a second cousin or closer can
decrypt it successfully.
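One way to picture the volume control is to coarsen the numbers before a key is made from them. The Python sketch below is purely hypothetical and is not the algorithm of Undisclosed DNA: the base totals are invented, `volume` is an assumed bucket width, and two people "match" when their coarsened totals hash to the same key.

```python
import hashlib


def key_for(base_totals, volume):
    """Derive a key from base totals after coarsening them by `volume`.

    A larger volume puts more distinct totals into the same bucket,
    so more distant relatives end up holding the same key.
    """
    buckets = tuple(total // volume for total in base_totals)
    return hashlib.sha256(repr(buckets).encode("ascii")).hexdigest()


def can_decrypt(sender, reader, volume):
    """True when the reader's key equals the key the sender encrypted with."""
    return key_for(sender, volume) == key_for(reader, volume)


# Hypothetical (A, C, G, T) totals, invented for illustration.
parent = (1000, 1010, 990, 1005)
child = (1003, 1012, 994, 1001)
cousin = (1040, 1060, 960, 1040)
stranger = (1100, 900, 1050, 950)

for volume in (1, 20, 100):
    readers = [name for name, totals in
               [("child", child), ("cousin", cousin), ("stranger", stranger)]
               if can_decrypt(parent, totals, volume)]
    print(volume, readers)
```

In this toy setup, a volume of 1 demands identical totals, 20 reaches the child, and 100 reaches the cousin as well, while the stranger is never matched. Simple bucketing has a known weakness: two near-identical totals can fall on opposite sides of a bucket edge, so a practical scheme would need something more robust.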
Not only does this method preserve the privacy of everyone’s
DNA and the messages themselves, it also allows for casting a wide net. Before now, the typical use cases for DNA matching were
paternity tests or matching crime scene DNA to suspects. In those
cases, you had the people in question.
Another interesting application is to see what relatives even exist and in what number. If he sets his messages to allow
matching only with close relatives, a man may find that he in fact has
one child in Barcelona. Maybe a woman he briefly dated there had
become pregnant. A woman who was taken in from abroad as a
war orphan may want to find relatives decades later. To her surprise, when she adjusts her encryption to allow decryption by second cousins, she finds that a dozen of her relatives are still alive back
in the country of her birth.
These wonderful possibilities also presented a dilemma, however. You would have to be willing to forgo your privacy and completely trust some company with your complete DNA. From a practical standpoint, this workflow made finding connections very difficult. If the cousins of our woman in the example chose to not give
up their DNA to a matching service, then they would never have
been found. At this point, what is the solution?
Many people will understandably oppose a global database of
everyone’s complete DNA to which you upload your DNA and let
the computer run a search on every other person’s DNA to look
for commonalities. With Undisclosed DNA, we can exchange messages with a great many people but without compromising anyone’s DNA.
6 Tagging
For every message that gets sent, in order both to hide the general DNA
profile of someone and to render every message “sender” unique,
we calculate a hash code on the overall sequence. No one can reverse this value into the original DNA sequence because an enormous
number of possible inputs could match a given hash value. The hash
value, however, is long enough so that we should never see an accidental match between any two of the eight billion people on earth.
Twins, triplets, quadruplets – identical siblings share all their
DNA. As a result, any mathematical functions you do for one
person would yield the same results as for the other person. This
presents some problems. We lose the uniqueness of a sender. Was
a given message encrypted by the key made from the DNA of this
woman living in Bristol or her twin who was adopted in Surrey?
For the input of the hashing function, one uses the overall
DNA sequence, but one can also add the current time. This ensures a different cryptographic hash. Because of how hashes work,
no one can know what the original time was. The outputs of cryptographic hashes look random, as if the tiniest change to the input
makes each bit in the output have a fifty-fifty chance of changing. Therefore, it is infeasible to work backward from the output to any of the input
data.
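The twin problem and its fix can be sketched with Python’s standard `hashlib`. We assume SHA-256 as the hash and a nanosecond timestamp joined to the sequence with a separator; the text does not name the hash function or the input layout, so both are illustrative choices, as is the toy sequence.

```python
import hashlib


def message_tag(dna_sequence, timestamp_ns):
    """Hash the DNA sequence together with the time the message was made."""
    material = f"{dna_sequence}|{timestamp_ns}".encode("ascii")
    return hashlib.sha256(material).hexdigest()


def differing_bits(hex_a, hex_b):
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical sequence shared by identical twins.
twin_dna = "ATGCGTACGTTAGCCA"

# The two timestamps differ by a single nanosecond.
tag_bristol = message_tag(twin_dna, 1_700_000_000_000_000_000)
tag_surrey = message_tag(twin_dna, 1_700_000_000_000_000_001)

print(tag_bristol)
print(tag_surrey)
print(differing_bits(tag_bristol, tag_surrey), "of 256 bits differ")
```

Even though the twins share every base, the two tags are unrelated in appearance, so each message keeps a unique sender tag. In practice the timestamp would come from a clock such as `time.time_ns()` rather than a fixed number.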
7 Reaching Out
So far, we have covered the ability to decrypt something –
anything – with the correct key. If you can successfully open this
metaphorical treasure chest, then you have proven your genetic relation to someone. The actual contents of that treasure chest – what
the message says – open the door to myriad options.