Coronavirus tracking app privacy - an introduction · Dave Hrycyszyn's personal website

I work on an open source project which protects user privacy, so naturally my friends have recently been asking me about coronavirus tracking apps. They seemed to get some benefit from the answers, so I thought writing our discussions up might be helpful for a more general audience. If you find it helpful, please share it!

How is it possible to both track people and to respect their privacy at the same time? Intuitively, it seems like a contradiction in terms.


I think it’s important to answer this question. If we can’t point to a simple explanation, we may resign ourselves to data-based totalitarianism as the price of good health and economic recovery.

The first thing that’s worth understanding is that good privacy-respecting apps are simple. In fact, they’re far less complex than the application most people think of when it comes to location tracking and surveillance. At the start of our discussions, my friends usually have in mind an app that works like Google Maps. Let’s start with a hypothetical app which works this way, so we can later contrast it with a privacy-respecting counter-example.

It records user identity, time, and location info, with all the data stored in a central database. The data is easily accessible by health workers, police, social services, welfare and pensions, transport, and many other government workers in addition to Google. It’s complex, so it’ll be expensive and take a long time to build.

The data looks like this:

Timestamp, location, real name
2020-04-25-00:02:32:56, outside front of 46 Mildmay Road London N16, Bob 
2020-04-25-00:02:32:56, inside 31 St. Asaph Road London SE4, Alice 
2020-04-25-00:02:33:05, outside front of 42 Mildmay Road London N16, Bob 
2020-04-25-00:02:33:05, inside 31 St. Asaph Road London SE4, Alice

I live in London, so I’ve used London street addresses and postcodes instead of latitude and longitude, to make it a bit more readable.

As we can see, Bob is out for a midnight stroll and is walking at a reasonably good pace eastbound on Mildmay Road in north London. Meanwhile, Alice is stationary (perhaps asleep) in Brockley, south-east London.

Very important note: I have no idea if this is what any actual coronavirus tracking app teams are working on. I write it this way mainly because I’ve found it to be the most common mental model of how such apps would work.

On the plus side, it would certainly do the job. You could use the data to see who might have infected who, and break the chain of infection. We shouldn’t lose sight of that: the goal is to have a beneficial effect against the virus.
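To make the privacy cost of this centralized design concrete, here is a minimal Python sketch built on the example rows above. The function names `movements` and `co_located` are invented for illustration, and nothing here describes any real app. It only shows how little work it takes to pull one person's trail, or a list of who met whom, out of such a database.

```python
from collections import defaultdict
from itertools import combinations

# Rows copied from the example above: (timestamp, location, real name).
RECORDS = [
    ("2020-04-25-00:02:32:56", "outside front of 46 Mildmay Road London N16", "Bob"),
    ("2020-04-25-00:02:32:56", "inside 31 St. Asaph Road London SE4", "Alice"),
    ("2020-04-25-00:02:33:05", "outside front of 42 Mildmay Road London N16", "Bob"),
    ("2020-04-25-00:02:33:05", "inside 31 St. Asaph Road London SE4", "Alice"),
]


def movements(records, name):
    """Return every (timestamp, location) the database holds for one person."""
    return [(ts, loc) for ts, loc, who in records if who == name]


def co_located(records):
    """Return pairs of names recorded at the same place at the same time."""
    present = defaultdict(set)
    for ts, loc, who in records:
        present[(ts, loc)].add(who)
    pairs = set()
    for people in present.values():
        pairs.update(combinations(sorted(people), 2))
    return pairs
```

Anyone with database access can call `movements(RECORDS, "Bob")` and read off Bob's walk. That is the medical benefit and the surveillance problem in the same query.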

So, is it possible to do better on privacy while having the same medical effect? Let’s try. Here’s a heavily simplified version of one proposal (see below for more).

Once again, everybody has a coronavirus tracking app on their phone. The app broadcasts a random identifier via Bluetooth.

Open up your phone right now and go to the Bluetooth settings; you will probably see a bunch of other devices. Some of them broadcast human-readable identifiers, others don’t. That’s what I mean when I say “Bluetooth broadcast”: phones and other devices beeping out those identifiers to other phones in range.

Mine looks like this right now:

[Screenshot: Bluetooth settings]

Ok, let’s get back to Alice and Bob again.

In the case of our privacy-respecting coronavirus tracking app, Alice’s phone does not broadcast her real name. It broadcasts the randomly chosen nonsense phrase fluffy jacket jet panther. Bob’s phone broadcasts salad emphasis hearing concurrency.
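As a rough illustration of how a phone might pick such a phrase, here is a small Python sketch. The word list and the function name `random_identifier` are assumptions made for this example. A real proposal would more likely broadcast random bytes than words, and would use a far larger space of possible identifiers.

```python
import secrets

# Hypothetical word list, assembled from the phrases used in this post.
# A real app would need far more possibilities to avoid collisions.
WORDS = [
    "fluffy", "jacket", "jet", "panther",
    "salad", "emphasis", "hearing", "concurrency",
    "otter", "machine", "sky", "pencil",
]


def random_identifier(n_words=4):
    """Build a nonsense phrase from cryptographically random word choices."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))


print(random_identifier())
```

The point is that the phrase comes from a random number generator and nothing else: it is not derived from a name, phone number, or location.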

As Alice, Bob, and other people go about their daily lives, their phones are listening for other people’s identifiers. Let’s say Alice and Bob cross paths. Bob’s phone will take note of the phrase fluffy jacket jet panther, which is Alice’s random identifier, and Alice’s phone will take note of salad emphasis hearing concurrency, which is Bob’s identifier. They also note the duration: was it a passing contact? Or were they nearby for quite a long time?

So far, no data has left anybody’s phone. Both Alice and Bob have been walking around, and their phones have each heard some nonsense identifiers from other phones.
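Here is one way the on-phone bookkeeping could look, as a minimal Python sketch. The class `ContactLog` and its methods are hypothetical names, and actual Bluetooth scanning is left out entirely. The sketch only assumes that something tells the phone which identifier it heard and for how long.

```python
from collections import defaultdict


class ContactLog:
    """Keeps, on the phone only, how long each identifier was nearby."""

    def __init__(self):
        self._seconds = defaultdict(float)

    def heard(self, identifier, seconds):
        """Record that an identifier was in range for this many seconds."""
        self._seconds[identifier] += seconds

    def contacts_longer_than(self, threshold_seconds):
        """Return identifiers whose total time nearby exceeds the threshold."""
        return sorted(
            ident for ident, total in self._seconds.items()
            if total > threshold_seconds
        )


# Bob's phone after a day out: one long contact, one passing contact.
bobs_log = ContactLog()
bobs_log.heard("fluffy jacket jet panther", 45)
bobs_log.heard("fluffy jacket jet panther", 30)
bobs_log.heard("otter machine sky pencil", 5)
```

Notice what is absent: no names, no locations, and no network calls. The log is just nonsense phrases and durations.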

Then, horror strikes. Alice is sick! She has come down with coronavirus!

Alice whips out her phone, and presses the big red button saying “I’ve got it”. Her phone sends only the phrase fluffy jacket jet panther to the coronavirus tracking server on the internet. Then she goes and has a rest.

Bob’s phone, meanwhile, periodically asks the coronavirus tracking server, one at a time, about anybody he spent more than 60 seconds with. And the server replies.

“Does otter machine sky pencil have the coronavirus?”: NO
“Does orange placemat squirrel bamboo have the coronavirus?”: NO

So far, Bob is in the clear.

“Does fluffy jacket jet panther have the coronavirus?”: YES

Zut alors! Bob’s feeling fine, but now he knows that he should get tested. So he does. And he doesn’t infect anybody else as a result. The grandpa-pocalypse starts to wind down, and I can meet you in the pub.
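The exchange between Alice, Bob, and the server can be sketched in a few lines of Python. The names `TrackingServer`, `report_sick`, `has_coronavirus`, and `should_get_tested` are made up for this example, and the server is an in-memory object rather than something on the internet. It is a simplification of a simplification, not the TCN design itself.

```python
class TrackingServer:
    """Holds nothing but the set of identifiers reported as sick."""

    def __init__(self):
        self._sick = set()

    def report_sick(self, identifier):
        """Alice's big red button: store her identifier and nothing else."""
        self._sick.add(identifier)

    def has_coronavirus(self, identifier):
        """Answer a single yes/no question about one identifier."""
        return identifier in self._sick


def should_get_tested(server, contact_seconds, threshold_seconds=60):
    """Ask the server, one identifier at a time, about long-enough contacts."""
    for identifier, seconds in contact_seconds.items():
        if seconds > threshold_seconds and server.has_coronavirus(identifier):
            return True
    return False


server = TrackingServer()
bobs_contacts = {
    "otter machine sky pencil": 120,
    "orange placemat squirrel bamboo": 300,
    "fluffy jacket jet panther": 600,
}
before = should_get_tested(server, bobs_contacts)   # nobody has reported yet
server.report_sick("fluffy jacket jet panther")     # Alice presses the button
after = should_get_tested(server, bobs_contacts)
print(before, after)
```

The whole server fits in a dozen lines, which is the sense in which the privacy-respecting design is simpler than the centralized one.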

Note that the app is simple (which means the geeks can build it quickly). It gives up no real world identity, no names, no location data, and yet still has the medical effect we want.

It does still give up one piece of information: the IP addresses of the people who report themselves as sick, and the IP addresses of anybody who is asking for infection info. And in fact, because of this, we still get a bit of a surveillance machine:

a) the server owner gets the IP address, linkable to real-world identity, of everyone who self-reports that they’ve got the coronavirus. It would be better if the IP was hidden from the server.

b) if the app batches requests for finding out who is sick, like this: “Do fluffy jacket jet panther, orange placemat squirrel bamboo or otter machine sky pencil have the coronavirus?”, then the server owner builds up an exact picture of everyone’s social contact anyway. That’s why I noted that such requests have to be made one at a time. And once again the IP needs to be hidden from the server, so that the server can’t tell who saw who.
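Both leaks can be seen from the server's side with a small Python sketch. The log format and the function `linkable_groups` are assumptions for illustration, and `203.0.113.7` is a documentation-only IP address. A source of `None` stands for a request whose IP was hidden, for example by a mixnet or Tor.

```python
def linkable_groups(request_log):
    """Return the sets of identifiers a curious server owner can tie together.

    request_log is a list of (source, identifiers) pairs. Identifiers are
    linkable if they arrive in the same request or from the same visible
    source. Requests with source None are linkable only within themselves.
    """
    groups = {}
    for number, (source, identifiers) in enumerate(request_log):
        key = ("ip", source) if source is not None else ("request", number)
        groups.setdefault(key, set()).update(identifiers)
    return [group for group in groups.values() if len(group) > 1]


contacts = [
    "fluffy jacket jet panther",
    "orange placemat squirrel bamboo",
    "otter machine sky pencil",
]

batched = [("203.0.113.7", tuple(contacts))]
one_at_a_time_visible_ip = [("203.0.113.7", (c,)) for c in contacts]
one_at_a_time_hidden_ip = [(None, (c,)) for c in contacts]
```

With `batched`, the server recovers Bob's whole contact list from one request. With `one_at_a_time_visible_ip` it recovers the same list by grouping on the IP address. Only `one_at_a_time_hidden_ip` leaves it with nothing to link, which is why both conditions matter.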

Our open source project, Nym, has an implementation of something called a mixnet, based partly on the European Commission’s Panoramix project, of which several Nym staffers were a part. It can hide IP addresses, in a way that won’t leak. Tor would also work.

Both Nym and Tor are free, open source software. This is important: surveillance or conspiracy fears may lessen social confidence in a coronavirus tracing app, affecting its usefulness and people’s safety. We want to be able to prove to people that it’s ok to be honest. They are far more likely to use it and report honestly if the code is freely available for everybody to look at, and they can see and understand all the moving parts.

Hopefully this explanation helps people understand a bit about tracking apps in a simple way. My purpose here has been to show that it is thinkable to build what at first seems crazy: privacy-respecting coronavirus tracking apps. Very little information ever leaves the phone, and although there is a centralized database, it contains nothing but a list of meaningless information: a million fluffy jacket jet panthers.

The scheme above is not intended to be a full design, and in fact these are not even my ideas (see below).

It’s worth mentioning that there is currently a big debate about the proper way forward for privacy-respecting tracking apps. Since I am not a computer security academic, just a coder with an interest in privacy, I’ll point you to the best group I know of: DP3T, led by Dr. Carmela Troncoso of EPFL. The group of people supporting the DP3T contact-tracing ideas is effectively the world’s privacy-respecting brain-trust on this stuff.

A few other ideas I’ve looked at:

TCN - I’ve used the TCN proposal as the basis of my explanation for this blog post. Of the schemes I’ve seen, I like it the best, personally, because it’s really small and simple. Any distortions to achieve simplicity in this blog post are my fault. Read their documents if you want the serious view on this design.
PP Contact Tracer - another proposal, very mixnet-based but perhaps a bit complex.

Lastly: if you want to build a prototype coronavirus tracking app, and your design needs a mixnet for IP privacy, please say hello on Twitter or in our developer chat channel nymtech.friends on Keybase.

We are a relatively small team, so we can’t do it all by ourselves. But we do have a potentially important piece of the puzzle, namely a functioning mixnet that can be easily accessed via WebAssembly mobile clients. We are keen to work with other teams. And we really want to eventually leave our homes and drink beer with friends again.

PS after a week of bad coughs and fever, Alice ends up being fine. Stay safe, everyone!