A Few Thoughts on Cryptographic Engineering

How safe is Apple’s Safe Browsing?

This morning brings new and exciting news from the land of Apple. It appears that, at least on iOS 13, Apple is sharing some portion of your web browsing history with the Chinese conglomerate Tencent. This is being done as part of Apple’s “Fraudulent Website Warning”, which uses the Google-developed Safe Browsing technology as the back end. This feature appears to be “on” by default in iOS Safari, meaning that millions of users could potentially be affected.

(image source)

As is the standard for this sort of news, Apple hasn’t provided much — well, any — detail on whose browsing history this will affect, or what sort of privacy mechanisms are in place to protect its users. The changes probably affect only Chinese-localized users (see Github commits, courtesy Eric Romang), although it’s difficult to know for certain. However, it’s notable that Apple’s warning appears on U.S.-registered iPhones.

Regardless of which users are affected, Apple hasn’t said much about the privacy implications of shifting Safe Browsing to use Tencent’s servers. Since we lack concrete information, the best we can do is talk a bit about the technology and its implications. That’s what I’m going to do below.

What is “Safe Browsing”, and is it actually safe?

Several years ago Google noticed that web users tended to blunder into malicious sites as they browsed the web. This included phishing pages, as well as sites that attempted to push malware at users. Google also realized that, due to its unique vantage point, it had the most comprehensive list of those sites. Surely this could be deployed to protect users.

The result was Google’s “safe browsing”. In the earliest version, this was simply an API at Google that would allow your browser to ask Google about the safety of any URL you visited. Since Google’s servers received the full URL, as well as your IP address (and possibly a tracking cookie to prevent denial of service), this first API was kind of a privacy nightmare. (This API still exists, and is supported today as the “Lookup API“.)

To address these concerns, Google quickly came up with a safer approach to, um, “safe browsing”. The new approach was called the “Update API”, and it works like this:

  1. Google first computes the SHA256 hash of each unsafe URL in its database, and truncates each hash down to a 32-bit prefix to save space.
  2. Google sends the database of truncated hashes down to your browser.
  3. Each time you visit a URL, your browser hashes it and checks if its 32-bit prefix is contained in your local database.
  4. If the prefix is found in the browser’s local copy, your browser now sends the prefix to Google’s servers, which ship back a list of all full 256-bit hashes of the matching  URLs, so your browser can check for an exact match.

At each of these requests, Google’s servers see your IP address, as well as other identifying information such as database state. It’s also possible that Google may drop a cookie into your browser during some of these requests. The Safe Browsing API doesn’t say much about this today, but Ashkan Soltani noted this was happening back in 2012.

It goes without saying that Lookup API is a privacy disaster. The “Update API” is much more private: in principle, Google should only learn the 32-bit hashes of some browsing requests. Moreover, those truncated 32-bit hashes won’t precisely reveal the identity of the URL you’re accessing, since there are likely to be many collisions in such a short identifier. This provides a form of k-anonymity.

The weakness in this approach is that it only provides some privacy. The typical user won’t just visit a single URL, they’ll browse thousands of URLs over time. This means a malicious provider will have many “bites at the apple” (no pun intended) in order to de-anonymize that user. A user who browses many related websites — say, these websites — will gradually leak details about their browsing history to the provider, assuming the provider is malicious and can link the requests. (Updated to add: There has been some academic research on such threats.)

And this is why it’s so important to know who your provider actually is.

What does this mean for Apple and Tencent?

That’s ultimately the question we should all be asking.

The problem is that Safe Browsing “update API” has never been exactly “safe”. Its purpose was never to provide total privacy to users, but rather to degrade the quality of browsing data that providers collect. Within the threat model of Google, we (as a privacy-focused community) largely concluded that protecting users from malicious sites was worth the risk. That’s because, while Google certainly has the brainpower to extract a signal from the noisy Safe Browsing results, it seemed unlikely that they would bother. (Or at least, we hoped that someone would blow the whistle if they tried.)

But Tencent isn’t Google. While they may be just as trustworthy, we deserve to be informed about this kind of change and to make choices about it. At very least, users should learn about these changes before Apple pushes the feature into production, and thus asks millions of their customers to trust them.

We shouldn’t have to read the fine print

When Apple wants to advertise a major privacy feature, they’re damned good at it. As an example:  this past summer the company announced the release of the privacy-preserving “Find My” feature at WWDC, to widespread acclaim. They’ve also been happy to claim credit for their work on encryption, including technology such as iCloud Keychain.

But lately there’s been a troubling silence out of Cupertino, mostly related to the company’s interactions with China. Two years ago, the company moved much of iCloud server infrastructure into mainland China, for default use by Chinese users. It seems that Apple had no choice in this, since the move was mandated by Chinese law. But their silence was deafening. Did the move involve transferring key servers for end-to-end encryption? Would non-Chinese users be affected? Reporters had to drag the answers out of the company, and we still don’t know many of them.

In the Safe Browsing change we have another example of Apple making significant modifications to its privacy infrastructure, largely without publicity or announcement. We have learn about this stuff from the fine print. This approach to privacy issues does users around the world a disservice.

It increasingly feels like Apple is two different companies: one that puts the freedom of its users first, and another that treats its users very differently. Maybe Apple feels it can navigate this split personality disorder and still maintain its integrity.

I very much doubt it will work.