What is Differential Privacy?

Yesterday at the WWDC keynote, Apple announced a series of new security and privacy features, including one feature that’s drawn a bit of attention — and confusion. Specifically, Apple announced that they will be using a technique called “Differential Privacy” (henceforth: DP) to improve the privacy of their data collection practices.

The reaction to this by most people has been a big “???”, since few people have even heard of Differential Privacy, let alone understand what it means. Unfortunately Apple isn’t known for being terribly open when it comes to sharing the secret sauce that drives their platform, so we’ll just have to hope that at some point they decide to publish more. What we know so far comes from Apple’s iOS 10 Preview guide:

Starting with iOS 10, Apple is using Differential Privacy technology to help discover the usage patterns of a large number of users without compromising individual privacy. To obscure an individual’s identity, Differential Privacy adds mathematical noise to a small sample of the individual’s usage pattern. As more people share the same pattern, general patterns begin to emerge, which can inform and enhance the user experience. In iOS 10, this technology will help improve QuickType and emoji suggestions, Spotlight deep link suggestions and Lookup Hints in Notes.

To make a long story short, it sounds like Apple is going to be collecting a lot more data from your phone. They’re mainly doing this to make their services better, not to collect individual users’ usage habits. To guarantee this, Apple intends to apply sophisticated statistical techniques to ensure that this aggregate data — the statistical functions it computes over all your information — doesn’t leak your individual contributions. In principle this sounds pretty good. But of course, the devil is always in the details.

While we don’t have those details, this seems like a good time to at least talk a bit about what Differential Privacy is, how it can be achieved, and what it could mean for Apple — and for your iPhone.

The motivation

In the past several years, “average people” have gotten used to the idea that they’re sending a hell of a lot of personal information to the various services they use. Surveys also tell us they’re starting to feel uncomfortable about it.

This discomfort makes sense when you think about companies using our personal data to market (to) us. But sometimes there are decent motivations for collecting usage information. For example, Microsoft recently announced a tool that can diagnose pancreatic cancer by monitoring your Bing queries. Google famously runs Google Flu Trends. And of course, we all benefit from crowdsourced data that improves the quality of the services we use — from mapping applications to restaurant reviews.

Unfortunately, even well-meaning data collection can go bad. For example, in the late 2000s, Netflix ran a competition to develop a better film recommendation algorithm. To drive the competition, they released an “anonymized” viewing dataset that had been stripped of identifying information. Unfortunately, this de-identification turned out to be insufficient. In a well-known piece of work, Narayanan and Shmatikov showed that such datasets could be used to re-identify specific users — and even predict their political affiliation! — if you simply knew a little bit of additional information about a given user.

This sort of thing should be worrying to us. Not just because companies routinely share data (though they do) but because breaches happen, and because even statistics about a dataset can sometimes leak information about the individual records used to compute it. Differential Privacy is a set of tools that was designed to address this problem.

What is Differential Privacy?

Differential Privacy is a privacy definition that was originally developed by Dwork, Nissim, McSherry and Smith, with major contributions by many others over the years. Roughly speaking, what it states can be summed up intuitively as follows:

Imagine you have two otherwise identical databases, one with your information in it, and one without it. Differential Privacy ensures that the probability that a statistical query will produce a given result is (nearly) the same whether it’s conducted on the first or second database.

One way to look at this is that DP provides a way to know if your data has a significant effect on the outcome of a query. If it doesn’t, then you might as well contribute to the database — since there’s almost no harm that can come of it. Consider a silly example:

Imagine that you choose to enable a reporting feature on your iPhone that tells Apple if you like to use the 💩 emoji routinely in your iMessage conversations. This report consists of a single bit of information: 1 indicates you like 💩, and 0 indicates you don’t. Apple might receive these reports and fill them into a huge database. At the end of the day, it wants to be able to derive a count of the users who like this particular emoji.

It goes without saying that the simple process of “tallying up the results” and releasing them does not satisfy the DP definition, since computing a sum on the database that contains your information will potentially produce a different result from computing the sum on a database without it. Thus, even though these sums may not seem to leak much information, they reveal at least a little bit about you. A key observation of Differential Privacy research is that in many cases, DP can be achieved if the tallying party is willing to add random noise to the result. For example, rather than simply reporting the sum, the tallying party can inject noise from a Laplace or Gaussian distribution, producing a result that’s not quite exact — but that masks the contents of any given row. (For other interesting functions, there are many other techniques as well.)

Even more usefully, the calculation of “how much” noise to inject can be made without knowing the contents of the database itself (or even its size). That is, the noise calculation can be performed based only on knowledge of the function to be computed, and the acceptable amount of data leakage.
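
To make this slightly more concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. This is my own illustration, not anything Apple has published. Note that the noise scale is computed from just two inputs: the query's sensitivity (1 for a count, since one person can change it by at most 1) and the privacy parameter epsilon; the data itself never enters into the calculation.

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float) -> float:
    """Release a count of matching records with epsilon-differential privacy.

    The noise scale depends only on the query's sensitivity (1 for a count)
    and on epsilon, never on the data or the size of the database.
    """
    sensitivity = 1.0   # adding or removing one person changes a count by at most 1
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(sensitivity / epsilon)

# Hypothetical per-user bits: does this user like the poop emoji?
reports = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(private_count(reports, lambda bit: bit == 1, epsilon=0.5))
```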

A tradeoff between privacy and accuracy

Now obviously calculating the total number of 💩 -loving users on a system is a pretty silly example. The neat thing about DP is that the same overall approach can be applied to much more interesting functions, including complex statistical calculations like the ones used by Machine Learning algorithms. It can even be applied when many different functions are all computed over the same database.

But there’s a big caveat here. Namely, while the amount of “information leakage” from a single query can be bounded by a small value, this value is not zero. Each time you query the database on some function, the total “leakage” increases — and can never go down. Over time, as you make more queries, this leakage can start to add up.

This is one of the more challenging aspects of DP. It manifests in two basic ways:

  1. The more information you intend to “ask” of your database, the more noise has to be injected in order to minimize the privacy leakage. This means that in DP there is generally a fundamental tradeoff between accuracy and privacy, which can be a big problem when training complex ML models.
  2. Once data has been leaked, it’s gone. Once you’ve leaked as much data as your calculations tell you is safe, you can’t keep going — at least not without risking your users’ privacy. At this point, the best solution may be just to destroy the database and start over. If such a thing is possible.

The total allowed leakage is often referred to as a “privacy budget”, and it determines how many queries will be allowed (and how accurate the results will be). The basic lesson of DP is that the devil is in the budget. Set it too high, and you leak your sensitive data. Set it too low, and the answers you get might not be particularly useful.
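
The budget itself is just bookkeeping, but getting that bookkeeping wrong is how deployments fail. Here is a toy accountant, assuming the simple "sequential composition" rule in which per-query epsilons just add up (real systems use tighter composition theorems, but the leakage still only ever accumulates):

```python
class PrivacyBudget:
    """Track cumulative privacy loss under simple sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Spend part of the budget; refuse to answer once it is exhausted."""
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted, refusing to answer")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.5)    # first query: fine
budget.charge(0.5)    # second query: fine, budget now fully spent
# budget.charge(0.1)  # a third query would raise RuntimeError
```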

Now in some applications, like many of the ones on our iPhones, the lack of accuracy isn’t a big deal. We’re used to our phones making mistakes. But sometimes when DP is applied in complex applications, such as training Machine Learning models, this really does matter.

Mortality vs. information disclosure, from Fredrikson et al. The red line is patient mortality.

To give an absolutely crazy example of how big the tradeoffs can be, consider this paper by Fredrikson et al. from 2014. The authors began with a public database linking Warfarin dosage outcomes to specific genetic markers. They then used ML techniques to develop a dosing model based on their database — but applied DP at various privacy budgets while training the model. Then they evaluated both the information leakage and the model’s success at treating simulated “patients”.

The results showed that the model’s accuracy depends a lot on the privacy budget on which it was trained. If the budget is set too high, the database leaks a great deal of sensitive patient information — but the resulting model makes dosing decisions that are about as safe as standard clinical practice. On the other hand, when the budget was reduced to a level that achieved meaningful privacy, the “noise-ridden” model had a tendency to kill its “patients”.

Now before you freak out, let me be clear: your iPhone is not going to kill you. Nobody is saying that this example even vaguely resembles what Apple is going to do on the phone. The lesson of this research is simply that there are interesting tradeoffs between effectiveness and the privacy protection given by any DP-based system — these tradeoffs depend to a great degree on specific decisions made by the system designers, the parameters chosen by the deploying parties, and so on. Hopefully Apple will soon tell us what those choices are.

How do you collect the data, anyway?

You’ll notice that in each of the examples above, I’ve assumed that queries are executed by a trusted database operator who has access to all of the “raw” underlying data. I chose this model because it’s the traditional model used in most of the literature, not because it’s a particularly great idea.

In fact, it would be worrisome if Apple was actually implementing their system this way. That would require Apple to collect all of your raw usage information into a massive centralized database, and then (“trust us!”) calculate privacy-preserving statistics on it. At a minimum this would make your data vulnerable to subpoenas, Russian hackers, nosy Apple executives and so on.

Fortunately this is not the only way to implement a Differentially Private system. On the theoretical side, statistics can be computed using fancy cryptographic techniques (such as secure multi-party computation or fully-homomorphic encryption.) Unfortunately these techniques are probably too inefficient to operate at the kind of scale Apple needs.

A much more promising approach is not to collect the raw data at all. This approach was recently pioneered by Google to collect usage statistics in their Chrome browser. The system, called RAPPOR, is based on an implementation of the 50-year old randomized response technique. Randomized response works as follows:

  1. When a user wants to report a piece of potentially embarrassing information (made up example: “Do you use Bing?”), they first flip a coin, and if the coin comes up “heads”, they return a random answer — calculated by flipping a second coin. Otherwise they answer honestly.
  2. The server then collects answers from the entire population, and (knowing the probability that the coins will come up “heads”), adjusts for the included “noise” to compute an approximate answer for the true response rate.

Intuitively, randomized response protects the privacy of individual user responses, because a “yes” result could mean that you use Bing, or it could just be the effect of the first mechanism (the random coin flip). More formally, randomized response has been shown to achieve Differential Privacy, with specific guarantees that can be adjusted by fiddling with the coin bias.
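
Here is a small simulation of the two-coin version, along with the de-biasing step the server performs. With fair coins, each "yes" arrives with probability 0.5p + 0.25, so the server can solve for p:

```python
import random

def randomized_response(truthful_answer: bool) -> bool:
    """Report a single bit using two fair coin flips (classic randomized response)."""
    if random.random() < 0.5:          # first coin: heads means answer randomly
        return random.random() < 0.5   # second coin decides the random answer
    return truthful_answer             # tails: answer honestly

def estimate_true_rate(reports) -> float:
    """Invert the noise: P(yes) = 0.5*p + 0.25, so p = 2*P(yes) - 0.5."""
    observed = sum(reports) / len(reports)
    return 2.0 * observed - 0.5

# Simulate 100,000 users, 30% of whom actually use Bing.
population = [random.random() < 0.30 for _ in range(100_000)]
reports = [randomized_response(answer) for answer in population]
print(round(estimate_true_rate(reports), 3))   # close to 0.30
```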

I’ve met Craig Federighi. He actually looks like this in person.

RAPPOR takes this relatively old technique and turns it into something much more powerful. Instead of simply responding to a single question, it can report on complex vectors of questions, and may even return complicated answers, such as strings — e.g., which default homepage you use. The latter is accomplished by first encoding the string into a Bloom filter — a bitstring constructed using hash functions in a very specific way. The resulting bits are then injected with noise, and summed, and the answers recovered using a (fairly complex) decoding process.
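
To give a flavor of the idea, here is a heavily simplified sketch of a RAPPOR-style encoding: hash a string into a small Bloom filter, then apply randomized response to each bit before the report ever leaves the device. The filter size, hash count and flip probability below are illustrative, and the real RAPPOR design adds a "permanent" randomization layer on top of this; it is not Apple's (unpublished) scheme.

```python
import hashlib
import random

FILTER_BITS = 64
NUM_HASHES = 2
FLIP_PROB = 0.25   # probability of randomizing each bit (tunes the privacy level)

def bloom_encode(value: str) -> list:
    """Encode a string into a small Bloom filter using a few hash functions."""
    bits = [0] * FILTER_BITS
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        index = int.from_bytes(digest[:4], "big") % FILTER_BITS
        bits[index] = 1
    return bits

def randomize(bits: list) -> list:
    """Apply randomized response bit-by-bit, on the device, before reporting."""
    out = []
    for b in bits:
        if random.random() < FLIP_PROB:
            out.append(random.randint(0, 1))   # replace with a random bit
        else:
            out.append(b)
    return out

report = randomize(bloom_encode("https://www.example.com"))   # what gets sent
```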

While there’s no hard evidence that Apple is using a system like RAPPOR, there are some small hints. For example, Apple’s Craig Federighi describes Differential Privacy as “using hashing, subsampling and noise injection to enable…crowdsourced learning while keeping the data of individual users completely private.” That’s pretty weak evidence for anything, admittedly, but the presence of “hashing” in that quote at least hints towards the use of RAPPOR-like filters.

The main challenge with randomized response systems is that they can leak data if a user answers the same question multiple times. RAPPOR tries to deal with this in a variety of ways, one of which is to identify static information and thus calculate “permanent answers” rather than re-randomizing each time. But it’s possible to imagine situations where such protections could go wrong. Once again, the devil is very much in the details — we’ll just have to see. I’m sure many fun papers will be written either way.

So is Apple’s use of DP a good thing or a bad thing?

As an academic researcher and a security professional, I have mixed feelings about Apple’s announcement. On the one hand, as a researcher I understand how exciting it is to see research technology actually deployed in the field. And Apple has a very big field.

On the flipside, as security professionals it’s our job to be skeptical — to at a minimum demand people release their security-critical code (as Google did with RAPPOR), or at least to be straightforward about what it is they’re deploying. If Apple is going to collect significant amounts of new data from the devices that we depend on so much, we should really make sure they’re doing it right — rather than cheering them for Using Such Cool Ideas. (I made this mistake already once, and I still feel dumb about it.)

But maybe this is all too “inside baseball”. At the end of the day, it sure looks like Apple is honestly trying to do something to improve user privacy, and given the alternatives, maybe that’s more important than anything else.

Attack of the Week: Apple iMessage

Today’s Washington Post has a story entitled “Johns Hopkins researchers poke a hole in Apple’s encryption“, which describes the results of some research my students and I have been working on over the past few months.

As you might have guessed from the headline, the work concerns Apple, and specifically Apple’s iMessage text messaging protocol. Over the past months my students Christina Garman, Ian Miers, Gabe Kaptchuk and Mike Rushanan and I have been looking closely at the encryption used by iMessage, in order to determine how the system fares against sophisticated attackers. The results of this analysis include some very neat new attacks that allow us to — under very specific circumstances — decrypt the contents of iMessage attachments, such as photos and videos.

The research team. From left: Gabe Kaptchuk, Mike Rushanan, Ian Miers, Christina Garman

Now before I go further, it’s worth noting that the security of a text messaging protocol may not seem like the most important problem in computer security. And under normal circumstances I might agree with you. But today the circumstances are anything but normal: encryption systems like iMessage are at the center of a critical national debate over the role of technology companies in assisting law enforcement.

A particularly unfortunate aspect of this controversy has been the repeated call for U.S. technology companies to add “backdoors” to end-to-end encryption systems such as iMessage. I’ve always felt that one of the most compelling arguments against this approach — an argument I’ve made along with other colleagues — is that we just don’t know how to construct such backdoors securely. But lately I’ve come to believe that this position doesn’t go far enough — in the sense that it is woefully optimistic. The fact of the matter is that forget backdoors: we barely know how to make encryption work at all. If anything, this work makes me much gloomier about the subject.

But enough with the generalities. The TL;DR of our work is this:

Apple iMessage, as implemented in versions of iOS prior to 9.3 and Mac OS X prior to 10.11.4, contains serious flaws in the encryption mechanism that could allow an attacker — who obtains iMessage ciphertexts — to decrypt the payload of certain attachment messages via a slow but remote and silent attack, provided that one sender or recipient device is online. While capturing encrypted messages is difficult in practice on recent iOS devices, thanks to certificate pinning, it could still be conducted by a nation state attacker or a hacker with access to Apple’s servers. You should probably patch now.

For those who want the gory details, I’ll proceed with the rest of this post using the “fun” question and answer format I save for this sort of post.

What is Apple iMessage and why should I care?

Those of you who read this blog will know that I have a particular obsession with Apple iMessage. This isn’t because I’m weirdly obsessed with Apple — although it is a little bit because of that. Mostly it’s because I think iMessage is an important protocol. The text messaging service, which was introduced in 2011, has the distinction of being the first widely-used end-to-end encrypted text messaging system in the world.

To understand the significance of this, it’s worth giving some background. Before iMessage, the vast majority of text messages were sent via SMS or MMS, meaning that they were handled by your cellular provider. Although these messages are technically encrypted, this encryption exists only on the link between your phone and the nearest cellular tower. Once an SMS reaches the tower, it’s decrypted, then stored and delivered without further protection. This means that your most personal messages are vulnerable to theft by telecom employees or sophisticated hackers. Worse, many U.S. carriers still use laughably weak encryption and protocols that are vulnerable to active interception.

So from a security point of view, iMessage was a pretty big deal. In a single stroke, Apple deployed encrypted messaging to millions of users, ensuring (in principle) that even Apple itself couldn’t decrypt their communications. The even greater accomplishment was that most people didn’t even notice this happened — the encryption was handled so transparently that few users are aware of it. And Apple did this at very large scale: today, iMessage handles peak throughput of more than 200,000 encrypted messages per second, with a supported base of nearly one billion devices.

So iMessage is important. But is it any good?

Answering this question has been kind of a hobby of mine for the past couple of years. In the past I’ve written about Apple’s failure to publish the iMessage protocol, and on iMessage’s dependence on a vulnerable centralized key server. Indeed, the use of a centralized key server is still one of iMessage’s biggest weaknesses, since an attacker who controls the keyserver can use it to inject keys and conduct man in the middle attacks on iMessage users.

But while key servers are a risk, attacks on a key server seem fundamentally challenging to implement — since they require the ability to actively manipulate Apple infrastructure without getting caught. Moreover, such attacks are only useful for prospective surveillance. If you fail to substitute a user’s key before they have an interesting conversation, you can’t recover their communications after the fact.

A more interesting question is whether iMessage’s encryption is secure enough to stand up against retrospective decryption attacks — that is, attempts to decrypt messages after they have been sent. Conducting such attacks is much more interesting than the naive attacks on iMessage’s key server, since any such attack would require the existence of a fundamental vulnerability in iMessage’s encryption itself. And in 2016 encryption seems like one of those things that we’ve basically figured out how to get right.

Which means, of course, that we probably haven’t.

How does iMessage encryption work?

What we know about the iMessage encryption protocol comes from a previous reverse-engineering effort by a group from Quarkslab, as well as from Apple’s iOS Security Guide. Based on these sources, we arrive at the following (simplified) picture of the basic iMessage encryption scheme:

To encrypt an iMessage, your phone first obtains the RSA public key of the person you’re sending to. It then generates a random AES key k and encrypts the message with that key using CTR mode. Then it encrypts k using the recipient’s RSA key. Finally, it signs the whole mess using the sender’s ECDSA signing key. This prevents tampering along the way.
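
In code, the simplified scheme looks roughly like the sketch below, written with the Python cryptography library. The key sizes and message framing here are illustrative rather than Apple's exact parameters, and the crucial detail to notice is the last step: the only integrity protection is a signature.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec, padding, rsa
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_imessage_style(plaintext: bytes, recipient_rsa_pub, sender_ecdsa_priv):
    """Sketch of the iMessage-style scheme: AES-CTR body, RSA key wrap, ECDSA signature.

    Note what's missing: no MAC or authenticated encryption over the AES
    ciphertext. Only the sender's signature holds the message together.
    """
    aes_key = os.urandom(16)       # random per-message AES key
    nonce = os.urandom(16)

    encryptor = Cipher(algorithms.AES(aes_key), modes.CTR(nonce)).encryptor()
    body = encryptor.update(plaintext) + encryptor.finalize()

    wrapped_key = recipient_rsa_pub.encrypt(       # encrypt k to the recipient
        aes_key,
        padding.OAEP(mgf=padding.MGF1(hashes.SHA1()), algorithm=hashes.SHA1(), label=None),
    )

    signature = sender_ecdsa_priv.sign(            # sign the whole mess
        wrapped_key + nonce + body, ec.ECDSA(hashes.SHA256())
    )
    return wrapped_key, nonce, body, signature

# Hypothetical keys standing in for the directory-service lookup.
recipient_rsa = rsa.generate_private_key(public_exponent=65537, key_size=2048)
sender_ecdsa = ec.generate_private_key(ec.SECP256R1())
msg = encrypt_imessage_style(b"the cat sat on the mat",
                             recipient_rsa.public_key(), sender_ecdsa)
```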

So what’s missing here?

Well, the most obviously missing element is that iMessage does not use a Message Authentication Code (MAC) or authenticated encryption scheme to prevent tampering with the message. To simulate this functionality, iMessage simply uses an ECDSA signature formulated by the sender. Naively, this would appear to be good enough. Critically, it’s not.

The attack works as follows. Imagine that a clever attacker intercepts the message above and is able to register her own iMessage account. First, the attacker strips off the original ECDSA signature made by the legitimate sender, and replaces it with a signature of her own. Next, she sends the newly signed message to the original recipient using her own account:

The outcome is that the user receives and decrypts a copy of the message, which has now apparently originated from the attacker rather than from the original sender. Ordinarily this would be a pretty mild attack — but there’s a useful wrinkle. In replacing the sender’s signature with one of her own, the attacker has gained a powerful capability. Now she can tamper with the AES ciphertext at will.

Specifically, since in iMessage the AES ciphertext is not protected by a MAC, it is therefore malleable. As long as the attacker signs the resulting message with her key, she can flip any bits in the AES ciphertext she wants — and this will produce a corresponding set of changes when the recipient ultimately decrypts the message. This means that, for example, if the attacker guesses that the message contains the word “cat” at some position, she can flip bits in the ciphertext to change that part of the message to read “dog” — and she can make this change even though she can’t actually read the encrypted message.
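
Here is a toy demonstration of why CTR mode without a MAC is malleable: flipping ciphertext bits flips exactly the corresponding plaintext bits, so an attacker who correctly guesses "cat" can turn it into "dog" without ever seeing the key. This is just a sketch using the same Python cryptography library as above, not Apple's code.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)
plaintext = b"the cat sat on the mat"

enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
ciphertext = bytearray(enc.update(plaintext) + enc.finalize())

# Attacker: guess that bytes 4..6 decrypt to "cat" and retarget them to "dog".
offset, guess, target = 4, b"cat", b"dog"
for i in range(len(guess)):
    ciphertext[offset + i] ^= guess[i] ^ target[i]   # flip only the differing bits

dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
print(dec.update(bytes(ciphertext)) + dec.finalize())   # b"the dog sat on the mat"
```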

Only one more big step to go.

Now further imagine that the recipient’s phone will decrypt the message correctly provided that the underlying plaintext that appears following decryption is correctly formatted. If the plaintext is improperly formatted — for a silly example, our tampering made it say “*7!” instead of “pig” — then on receiving the message, the recipient’s phone might return an error that the attacker can see.

It’s well known that such a configuration allows our attacker to learn information about the original message, provided that she can send many “mauled” variants to be decrypted. By mauling the underlying message in specific ways — e.g., attempting to turn “dog” into “pig” and observing whether decryption succeeds — the attacker can gradually learn the contents of the original message. The technique is known as a format oracle, and it’s similar to the padding oracle attack discovered by Vaudenay.

So how exactly does this format oracle work?

The format oracle in iMessage is not a padding oracle. Instead it has to do with the compression that iMessage uses on every message it sends.

You see, prior to encrypting each message payload, iMessage applies a complex formatting that happens to conclude with gzip compression. Gzip is a modestly complex compression scheme that internally identifies repeated strings, applies Huffman coding, then tacks a CRC checksum computed over the original data at the end of the compressed message. It’s this gzip-compressed payload that’s encrypted within the AES portion of an iMessage ciphertext.

It turns out that given the ability to maul a gzip-compressed, encrypted ciphertext, there exists a fairly complicated attack that allows us to gradually recover the contents of the message by mauling the original message thousands of times and sending the modified versions to be decrypted by the target device. The attack turns on our ability to maul the compressed data by flipping bits, then “fix up” the CRC checksum correspondingly so that it reflects the change we hope to see in the uncompressed data. Depending on whether that test succeeds, we can gradually recover the contents of a message — one byte at a time.
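
To see the role the checksum plays, here is a tiny sketch of the detection side: the last eight bytes of a gzip stream are the CRC-32 of the original data followed by its length, and a maul that does not also fix up that CRC is rejected when the recipient decompresses. (The actual fix-up logic in the attack is considerably more involved than this.)

```python
import gzip
import zlib

original = b"the secret attachment key is 0123456789abcdef"
compressed = bytearray(gzip.compress(original))

# The gzip trailer: CRC-32 of the uncompressed data, then its length (mod 2**32).
stored_crc = int.from_bytes(compressed[-8:-4], "little")
assert stored_crc == zlib.crc32(original)

# Maul the stream without fixing the checksum: decompression now fails.
compressed[-8] ^= 0x01
try:
    gzip.decompress(bytes(compressed))
except (OSError, zlib.error) as err:
    print("rejected:", err)   # CRC check failed
```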

While I’m making this sound sort of simple, the truth is it’s not. The message is encoded using Huffman coding, with a dynamic Huffman table we can’t see — since it’s encrypted. This means we need to make laser-specific changes to the ciphertext such that we can predict the effect of those changes on the decrypted message, and we need to do this blind. Worse, iMessage has various countermeasures that make the attack more complex.

The complete details of the attack appear in the paper, and they’re pretty eye-glazing, so I won’t repeat them here. In a nutshell, we are able to decrypt a message under the following conditions:

  1. We can obtain a copy of the encrypted message
  2. We can send approximately 2^18 (invisible) encrypted messages to the target device
  3. We can determine whether or not those messages decrypted successfully

The first condition can be satisfied by obtaining ciphertexts from a compromise of Apple’s Push Notification Service servers (which are responsible for routing encrypted iMessages) or by intercepting TLS connections using a stolen certificate — something made more difficult due to the addition of certificate pinning in iOS 9. The third element is the one that initially seems the most challenging. After all, when I send an iMessage to your device, there’s no particular reason that your device should send me any sort of response when the message decrypts. And yet this information is fundamental to conducting the attack!

It turns out that there’s a big exception to this rule: attachment messages.

How do attachment messages differ from normal iMessages?

When I include a photo in an iMessage, I don’t actually send you the photograph through the normal iMessage channel. Instead, I first encrypt that photo using a random 256-bit AES key, then I compute a SHA1 hash and upload the encrypted photo to iCloud. What I send you via iMessage is actually just an iCloud.com URL to the encrypted photo, the SHA1 hash, and the decryption key.
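
A rough sketch of what the attachment flow amounts to is below. The field names, the cipher mode and the nonce handling are placeholders of my own invention (the exact format isn't public), but the shape is the point: the encrypted photo goes to iCloud, while the pointer, the hash and the decryption key travel inside the iMessage itself.

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def upload_to_icloud(encrypted_blob: bytes) -> str:
    """Stand-in for the real upload step; returns a placeholder URL."""
    return "https://example-icloud-storage.invalid/attachment/12345"

def prepare_attachment(photo: bytes) -> dict:
    """Encrypt an attachment under a fresh key; iMessage carries only a pointer plus the key."""
    key = os.urandom(32)       # random 256-bit AES key
    nonce = os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    blob = enc.update(photo) + enc.finalize()

    return {
        "url": upload_to_icloud(blob),              # where the encrypted photo lives
        "sha1": hashlib.sha1(blob).hexdigest(),     # hash of the encrypted blob
        "key": key.hex(),                           # decryption key rides inside the iMessage
        "nonce": nonce.hex(),
    }

message_payload = prepare_attachment(b"...pretend these are JPEG bytes...")
```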

Contents of an “attachment” message.

 

When you successfully receive and decrypt an iMessage from some sender, your Messages client will automatically reach out and attempt to download that photo. It’s this download attempt, which happens only when the phone successfully decrypts an attachment message, that makes it possible for an attacker to know whether or not the decryption has succeeded.

One approach for the attacker to detect this download attempt is to gain access to and control your local network connections. But this seems impractical. A more sophisticated approach is to actually maul the URL within the ciphertext so that rather than pointing to iCloud.com, it points to a related URL such as i8loud.com. Then the attacker can simply register that domain, place a server there and allow the client to reach out to it. This requires no access to the victim’s local network.

By capturing an attachment message, repeatedly mauling it, and monitoring the download attempts made by the victim device, we can gradually recover all of the digits of the encryption key stored within the attachment. Then we simply reach out to iCloud and download the attachment ourselves. And that’s game over. The attack is currently quite slow — it takes more than 70 hours to run — but mostly because our code is slow and not optimized. We believe with more engineering it could be made to run in a fraction of a day.

Result of decrypting the AES key for an attachment. Note that the ? symbol represents a digit we could not recover for various reasons, typically due to string repetitions. We can brute-force the remaining digits.

The need for an online response is why our attack currently works against attachment messages only: those are simply the messages that make the phone do visible things. However, this does not mean the flaw in iMessage encryption is somehow limited to attachments — it could very likely be used against other iMessages, given an appropriate side-channel.

How is Apple fixing this?

Apple’s fixes are twofold. First, starting in iOS 9.0 (and before our work), Apple began deploying aggressive certificate pinning across iOS applications. This doesn’t fix the attack on iMessage crypto, but it does make it much harder for attackers to recover iMessage ciphertexts to decrypt in the first place.

Unfortunately even if this works perfectly, Apple still has access to iMessage ciphertexts. Worse, Apple’s servers will retain these messages for up to 30 days if they are not delivered to one of your devices. A vulnerability in Apple Push Network authentication, or a compromise of these servers, could expose them all. This means that pinning is only a mitigation, not a true fix.

As of iOS 9.3, Apple has implemented a short-term mitigation that my student Ian Miers proposed. This relies on the fact that while the AES ciphertext is malleable, the RSA-OAEP portion of the ciphertext is not. The fix maintains a “cache” of recently received RSA ciphertexts and rejects any repeated ciphertexts. In practice, this shuts down our attack — provided the cache is large enough. We believe it probably is.
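
As I understand it, the mitigation boils down to something like the sketch below: remember the RSA-wrapped key portion of recently received messages and refuse to process a repeat. The real cache sizing and eviction policy are Apple's, and not public, so treat this as an illustration of the idea only.

```python
from collections import OrderedDict

class ReplayCache:
    """Reject iMessages whose RSA-wrapped key has been seen recently."""

    def __init__(self, max_entries: int = 100_000):
        self.max_entries = max_entries
        self.seen = OrderedDict()

    def accept(self, rsa_ciphertext: bytes) -> bool:
        if rsa_ciphertext in self.seen:
            return False                      # repeated ciphertext: drop the message
        self.seen[rsa_ciphertext] = True
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)     # evict the oldest entry
        return True

cache = ReplayCache()
assert cache.accept(b"wrapped-key-bytes")       # first delivery is processed
assert not cache.accept(b"wrapped-key-bytes")   # a mauled replay is rejected
```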

In the long term, Apple should drop iMessage like a hot rock and move to Signal/Axolotl.

So what does it all mean?

As much as I wish I had more to say, fundamentally, security is just plain hard. Over time we get better at this, but for the foreseeable future we’ll never be ahead. The only outcome I can hope for is that people realize how hard this process is — and stop asking technologists to add unacceptable complexity to systems that already have too much of it.

Let’s talk about iMessage (again)

Yesterday’s New York Times carried a story entitled “Apple and other tech companies tangle with U.S. over data access“. It’s a vague headline that manages to obscure the real thrust of the story, which is that according to reporters at the Times, Apple has not been forced to backdoor their popular encrypted iMessage system. This flies in the face of some rumors to the contrary.

While there’s not much new information in here, people on Twitter seem to have some renewed interest in how iMessage works; whether Apple could backdoor it if they wanted to; and whether the courts could force them to. The answers to those questions are respectively: “very well“, “absolutely“, and “do I look like a national security lawyer?”

So rather than tackle the last one, which nobody seems to know the answer to, I figure it would be informative to talk about the technical issues with iMessage (again). So here we go.

How does iMessage work?

Fundamentally the mantra of iMessage is “keep it simple, stupid”. It’s not really designed to be an encryption system as much as it is a text message system that happens to include encryption. As such, it’s designed to take away most of the painful bits you expect from modern encryption software, and in the process it makes the crypto essentially invisible to the user. Unfortunately, this simplicity comes at some cost to security.

Let’s start with the good: Apple’s marketing material makes it clear that iMessage encryption is “end-to-end” and that decryption keys never leave the device. This claim is bolstered by their public security documentation as well as outside efforts to reverse-engineer the system. In iMessage, messages are encrypted with a combination of 1280-bit RSA public key encryption and 128-bit AES, and signed with ECDSA under a 256-bit NIST curve. It’s honestly kind of ridiculous, but whatever. Let’s call it good enough.

iMessage encryption in a nutshell boils down to this: I get your public key, you get my public key, I can send you messages encrypted to you, and you can be sure that they’re authentic and really came from me. Everyone’s happy.

But here’s the wrinkle: where do those public keys come from?

Where do you get the keys?

Key request to Apple’s server.

It’s this detail that exposes the real weakness of iMessage. To make key distribution ‘simple’, Apple takes responsibility for handing out your friends’ public keys. It does this using a proprietary key server that Apple owns and operates. Your iPhone requests keys from Apple using a connection that’s TLS-encrypted, and employs some fancy cryptographic tokens. But fundamentally, it relies on the assumption that Apple is good, and is really going to give you the right keys for the person you want to talk to.

But this honesty is just an assumption. Since the key lookup is completely invisible to the user, there’s nothing that forces Apple to be honest. They could, if inspired, give you a public key of their choosing, one that they hold the decryption key for. They could give you the FBI’s key. They could give you Dwayne “The Rock” Johnson’s key, though The Rock would presumably be very non-plussed by this.

Indeed it gets worse. Because iMessage is designed to support several devices attached to the same account, each query to the directory server can bring back many keys — one for each of your devices. An attacker can simply add a device (or a fake ‘ghost device’) to Apple’s key server, and senders will encrypt messages to that key along with the legitimate ones. This enables wiretapping, provided you can get Apple to help you out.

But why do you need Apple to help you out?

As described, this attack doesn’t really require direct collaboration from Apple. In principle, the FBI could just guess the target’s email password, or reset the password and add a new device all on their own. Even with a simple subpoena, Apple might be forced to hand over security questions and/or password hashes.

The real difficulty is caused by a final security feature in iMessage: when you add a new device, or modify the devices attached to your account, Apple’s key server sends a notification to each of the existing devices already attached to the account. It’s not obvious how this feature is implemented, but one thing is clear — it seems likely that, at least in theory, Apple could shut it off if they needed to.* After all, this all comes down to code in the key server.

Fixing this problem seems hard. You could lock the key server in a giant cage, then throw away the key. But as long as Apple retains the ability to update their key server software, solving this problem seems fundamentally challenging. (Though not impossible — I’ll come back to this in a moment.)

Can governments force Apple to modify their key server?

It’s not clear. While it seems pretty obvious that Apple could in theory substitute keys and thus enable eavesdropping, in practice it may require substantial changes to Apple’s code. And while there are a few well-known cases in which the government has forced companies to turn over keys, changing the operation of a working system is a whole different ball of wax.

And iMessage is not just any working system. According to Apple, it handles several billion messages every day, and is fundamental to the operation of millions of iPhones. When you have a deployed system at that scale, the last thing you want to do is mess with it — particularly if it involves crypto code that may not even be well understood by its creators. There’s no amount of money you could pay me to be ‘the guy who broke iMessage’, even for an hour.

Any way you slice it, it’s a risky operation. But for a real answer, you’ll have to talk to a lawyer.

Why isn’t key substitution a good solution to the ‘escrow’ debate?

Another perspective on iMessage — one I’ve heard from some attorney friends — is that key server tampering sounds like a pretty good compromise solution to the problem of creating a ‘secure golden key‘ (AKA giving governments access to plaintext).

This view holds that key substitution allows only proactive eavesdropping: the government has to show up with a warrant before they can eavesdrop on a customer. They can’t spy on everyone, and they can’t go back and read your emails from last month. At the same time, most customers still get true ‘end to end’ encryption.

I see two problems with this view. First, tampering with the key server fundamentally betrays user trust, and undermines most of the guarantees offered by iMessage. Apple claims that they offer true end-to-end encryption that they can’t read — and that’s reasonable in the threat model they’ve defined for themselves. The minute they start selectively substituting keys, that theory goes out the window. If you can substitute a few keys, why not all of them? In this world, Apple should expect requests from every Tom, Dick and Harry who wants access to plaintext, ranging from divorce lawyers to foreign governments.

A snapshot of my seven (!) currently enrolled iMessage
devices, courtesy Frederic Jacobs.

The second, more technical problem is that key substitution is relatively easy to detect. While Apple’s protocols are an obfuscated mess, it is at least in theory possible for users to reverse-engineer them to view the raw public keys being transmitted — and thus determine whether the key server is being honest. While most criminals are not this sophisticated, a few are. And if they aren’t sophisticated, then tools can be built to make this relatively easy. (Indeed, people have already built such tools — see my key registration profile at right.)

Thus key substitution represents at most a temporary solution to the ‘government access’ problem, and one that’s fraught with peril for law enforcement, and probably disastrous for the corporations involved. It might seem tempting to head down this rabbit hole, but it’s rabbits all the way down.

What can providers do to prevent key substitution attacks?

Signal’s “key fingerprint” screen.


From a technical point of view, there are a number of things that providers can do to harden their key servers. One is to expose ‘key fingerprints’ to users who care, which would allow them to manually compare the keys they receive with the keys actually registered by other users. This approach is used by OpenWhisperSystems’ Signal, as well as PGP. But even I acknowledge that this kind of stinks.

A more user-friendly approach is to deploy a variant of Certificate Transparency, which requires providers to publish a publicly verifiable proof that every public key they hand out is being transmitted to the whole world. This allows each client to check that the server is handing out the actual keys they registered — and by implication, that every other user is seeing the same thing.
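
The core building block is simple enough to sketch: the provider publishes the root of a Merkle tree over (user, key) pairs, and each client verifies an inclusion proof, a logarithmic-length path of sibling hashes, against that root. This is a generic sketch of the transparency-log idea, not CONIKS's exact construction, which adds privacy features on top.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(user: str, public_key: bytes) -> bytes:
    return h(b"leaf:" + user.encode() + b":" + public_key)

def verify_inclusion(user: str, public_key: bytes, proof, root: bytes) -> bool:
    """Check that (user, public_key) is committed to by the published tree root.

    `proof` is a list of (sibling_hash, sibling_is_right) pairs from leaf to root.
    """
    node = leaf_hash(user, public_key)
    for sibling, sibling_is_right in proof:
        if sibling_is_right:
            node = h(b"node:" + node + sibling)
        else:
            node = h(b"node:" + sibling + node)
    return node == root

# Two-leaf example: the root commits to both alice's and bob's keys.
alice = leaf_hash("alice@example.com", b"alice-key")
bob = leaf_hash("bob@example.com", b"bob-key")
root = h(b"node:" + alice + bob)
assert verify_inclusion("alice@example.com", b"alice-key", [(bob, True)], root)
```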

The most complete published variant of this is called CONIKS, and it was proposed by a group at Princeton, Stanford and the EFF (one of the more notable authors is Ed Felten, now Deputy U.S. Chief Technology Officer). CONIKS combined key transparency with a ‘verification protocol’ that allows clients to ensure that they aren’t being sidelined and fed false information.

CONIKS isn’t necessarily the only game in town when it comes to preventing key substitution attacks, but it represents a powerful existence proof that real defenses can be mounted. Even though Apple hasn’t chosen to implement CONIKS, the fact that it’s out there should be a strong disincentive for law enforcement to rely heavily on this approach.

So what next?

That’s the real question. If we believe the New York Times, all is well — for the moment. But not for the future. In the long term, law enforcement continues to ask for an approach that allows them to access the plaintext of encrypted messages. And Silicon Valley continues to find new ways to protect the confidentiality of their users’ data, against a range of threats beginning in Washington and proceeding well beyond.

How this will pan out is anyone’s guess. All we can say is that it will be messy.

Notes:

* How they would do this is really a question for Apple. The feature may involve the key server sending an explicit push message to each of the devices, in which case it would be easy to turn this off. Alternatively, the devices may periodically retrieve their own keys to see what Apple’s server is sending out to the world, and alert the user when they see a new one. In the latter case, Apple could selectively transmit a doctored version of the key list to the device owner.

Why can’t Apple decrypt your iPhone?

Last week I wrote about Apple’s new default encryption policy for iOS 8. Since that piece was intended for general audiences I mostly avoided technical detail. But since some folks (and apparently the Washington Post!) are still wondering about the nitty-gritty details of Apple’s design, I thought it might be helpful to sum up what we know and noodle about what we don’t.

To get started, it’s worth pointing out that disk encryption is hardly new with iOS 8. In fact, Apple’s operating system has enabled some form of encryption since before iOS 7. What’s happened in the latest update is that Apple has decided to protect much more of the interesting data on the device under the user’s passcode. This includes photos and text messages — things that were not previously passcode-protected, and which police very much want access to.*

Excerpt from Apple iOS Security Guide, 9/2014.

So to a large extent the ‘new’ feature Apple is touting in iOS 8 is simply that they’re encrypting more data. But it’s also worth pointing out that newer iOS devices — those with an “A7 or later A-series processor” — also add substantial hardware protections to thwart device cracking.

In the rest of this post I’m going to talk about how these protections may work and how Apple can realistically claim not to possess a back door.

One caveat: I should probably point out that Apple isn’t known for showing up at parties and bragging about their technology — so while a fair amount of this is based on published information provided by Apple, some of it is speculation. I’ll try to be clear where one ends and the other begins.

Password-based encryption 101

Normal password-based file encryption systems take in a password from a user, then apply a key derivation function (KDF) that converts a password (and some salt) into an encryption key. This approach doesn’t require any specialized hardware, so it can be securely implemented purely in software provided that (1) the software is honest and well-written, and (2) the chosen password is strong, i.e., hard to guess.

The problem here is that nobody ever chooses strong passwords. In fact, since most passwords are terrible, it’s usually possible for an attacker to break the encryption by working through a ‘dictionary‘ of likely passwords and testing to see if any decrypt the data. To make this really efficient, password crackers often use special-purpose hardware that takes advantage of parallelization (using FPGAs or GPUs) to massively speed up the process.

Thus a common defense against cracking is to use a ‘slow’ key derivation function like PBKDF2 or scrypt. Each of these algorithms is designed to be deliberately resource-intensive, which does slow down normal login attempts — but hits crackers much harder. Unfortunately, modern cracking rigs can defeat these KDFs by simply throwing more hardware at the problem. There are ways to deal with this (memory-hard KDFs like scrypt take exactly this approach), but it’s not the direction that Apple has gone.

How Apple’s encryption works

Apple doesn’t use scrypt. Their approach is to add a 256-bit device-unique secret key called a UID to the mix, and to store that key in hardware where it’s hard to extract from the phone. Apple claims that it does not record these keys nor can it access them. On recent devices (with A7 chips), this key and the mixing process are protected within a cryptographic co-processor called the Secure Enclave.

The Apple Key Derivation function ‘tangles’ the password with the UID key by running both through PBKDF2-AES — with an iteration count tuned to require about 80ms on the device itself.** The result is the ‘passcode key’. That key is then used as an anchor to secure much of the data on the phone.
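
A very rough software analogue of the idea is below. The real tangling happens inside the device's AES hardware using the UID as a key that software never sees, and Apple uses a PBKDF2 variant built on AES rather than HMAC-SHA256, so this is strictly an illustration of the two properties that matter: the UID enters the derivation, and the iteration count is tuned to a time target rather than a fixed number.

```python
import hashlib
import os
import time

DEVICE_UID = os.urandom(32)   # stands in for the hardware UID; not extractable on real devices

def calibrate_iterations(target_seconds: float = 0.08) -> int:
    """Pick a PBKDF2 iteration count that costs roughly 80ms on this hardware."""
    probe = 10_000
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"probe", b"salt", probe)
    elapsed = time.perf_counter() - start
    return max(probe, int(probe * target_seconds / elapsed))

def derive_passcode_key(passcode: str, iterations: int) -> bytes:
    """'Tangle' the passcode with the device UID so guesses must run on-device."""
    return hashlib.pbkdf2_hmac("sha256", passcode.encode(), DEVICE_UID, iterations)

iterations = calibrate_iterations()
passcode_key = derive_passcode_key("123456", iterations)
```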

Overview of Apple key derivation and encryption (iOS Security Guide, p.10)

Since only the device itself knows the UID — and the UID can’t be removed from the Secure Enclave — this means all password cracking attempts have to run on the device itself. That rules out the use of FPGAs or ASICs to crack passwords. Of course Apple could write custom firmware that attempts to crack the keys on the device, but even in the best case such cracking could be pretty time consuming, thanks to the 80ms PBKDF2 timing.

(Apple pegs such cracking attempts at 5 1/2 years for a random 6-character password consisting of lowercase letters and numbers. PINs will obviously take much less time, sometimes as little as half an hour. Choose a good passphrase!)
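
That 5½-year figure follows directly from the 80ms floor. A quick back-of-the-envelope check:

```python
alphabet = 26 + 10                    # lowercase letters plus digits: 36 characters
guesses = alphabet ** 6               # ~2.18 billion possible 6-character passcodes
seconds = guesses * 0.080             # each guess costs roughly 80ms on-device
print(seconds / (365.25 * 24 * 3600)) # ~5.5 years to try them all
```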

So one view of Apple’s process is that it depends on the user picking a strong password. A different view is that it also depends on the attacker’s inability to obtain the UID. Let’s explore this a bit more.

Securing the Secure Enclave

The Secure Enclave is designed to prevent exfiltration of the UID key. On earlier Apple devices this key lived in the application processor itself. Secure Enclave provides an extra level of protection that holds even if the software on the application processor is compromised — e.g., jailbroken.

One worrying thing about this approach is that, according to Apple’s documentation, Apple controls the signing keys that sign the Secure Enclave firmware. So using these keys, they might be able to write a special “UID extracting” firmware update that would undo the protections described above, and potentially allow crackers to run their attacks on specialized hardware.

Which leads to the following question: how does Apple avoid holding a backdoor signing key that allows them to extract the UID from the Secure Enclave?

It seems to me that there are a few possible ways forward here.

  1. No software can extract the UID. Apple’s documentation even claims that this is the case; that software can only see the output of encrypting something with UID, not the UID itself. The problem with this explanation is that it isn’t really clear that this guarantee covers malicious Secure Enclave firmware written and signed by Apple.

Update 10/4: Comex and others (who have forgotten more about iPhone internals than I’ve ever known) confirm that #1 is the right answer. The UID appears to be connected to the AES circuitry by a dedicated path, so software can set it as a key, but never extract it. Moreover this appears to be the same for both the Secure Enclave and older pre-A7 chips. So ignore options 2-4 below.

  2. Apple does have the ability to extract UIDs. But they don’t consider this a backdoor, even though access to the UID should dramatically decrease the time required to crack the password. In that case, your only defense is a strong password.
  3. Apple doesn’t allow firmware updates to the Secure Enclave firmware, period. This would be awkward and limiting, but it would let them keep their customer promise re: being unable to assist law enforcement in unlocking phones.
  4. Apple has built a nuclear option. In other words, the Secure Enclave allows firmware updates — but before doing so, the Secure Enclave will first destroy intermediate keys. Firmware updates are still possible, but if/when a firmware update is requested, you lose access to all data currently on the device.

 

All of these are valid answers. In general, it seems reasonable to hope that the answer is #1. But unfortunately this level of detail isn’t present in the Apple documentation, so for the moment we just have to cross our fingers.

Addendum: how did Apple’s “old” backdoor work?

One wrinkle in this story is that allegedly Apple has been helping law enforcement agencies unlock iPhones for a while. This is probably why so many folks are baffled by the new policy. If Apple could crack a phone last year, why can’t they do it today?

But the most likely explanation for this policy is probably the simplest one: Apple was never really ‘cracking’ anything. Rather, they simply had a custom boot image that allowed them to bypass the ‘passcode lock’ screen on a phone. This would be purely a UI hack and it wouldn’t grant Apple access to any of the passcode-encrypted data on the device. However, since earlier versions of iOS didn’t encrypt all of the phone’s interesting data using the passcode, the unencrypted data would be accessible upon boot.

No way to be sure this is the case, but it seems like the most likely explanation.

Notes:

* Previous versions of iOS also encrypted these records, but the encryption key was not derived from the user’s passcode. This meant that (provided one could bypass the actual passcode entry phase, something Apple probably does have the ability to do via a custom boot image), the device could decrypt this data without any need to crack a password.

** As David Schuetz notes in this excellent and detailed piece, on phones with Secure Enclave there is also a 5 second delay enforced by the co-processor. I didn’t (and still don’t) want to emphasize this, since I do think this delay is primarily enforced by Apple-controlled software and hence Apple can disable it if they want to. The PBKDF2 iteration count is much harder to override.

Can Apple read your iMessages?

About a year ago I wrote a short post urging Apple to publish the technical details of iMessage encryption. I’d love to tell you that Apple saw my influential crypto blogging and fell all over themselves to produce a spec, but, no. iMessage is the same black box it’s always been.

What’s changed is that suddenly people seem to care. Some of this interest is due to Apple’s (alleged) friendly relationship with the NSA. Some comes from their not-so-friendly relationship with the DEA. Whatever the reason, people want to know which of our data Apple has and who they’re sharing it with.

And that brings us back to iMessage encryption. Apple runs one of the most popular encrypted communications services on Earth, moving over two billion iMessages every day. Each one is loaded with personal information the NSA/DEA would just love to get their hands on. And yet Apple claims they can’t. In fact, even Apple can’t read them:

There are certain categories of information which we do not provide to law enforcement or any other group because we choose not to retain it.

For example, conversations which take place over iMessage and FaceTime are protected by end-to-end encryption so no one but the sender and receiver can see or read them. Apple cannot decrypt that data.

This seems almost too good to be true, which in my experience means it probably is. My view is inspired by something I like to call “Green’s law of applied cryptography”, which holds that applied cryptography mostly sucks. Crypto never offers the unconditional guarantees you want it to, and when it does your users suffer terribly.

And that’s the problem with iMessage: users don’t suffer enough. The service is almost magically easy to use, which means Apple has made tradeoffs — or more accurately, they’ve chosen a particular balance between usability and security. And while there’s nothing wrong with tradeoffs, the particulars of their choices make a big difference when it comes to your privacy. By withholding these details, Apple is preventing its users from taking steps to protect themselves.

The details of this tradeoff are what I’m going to talk about in this post. A post which I swear will be the last post I ever write on iMessage. From here on out it’ll be ciphers and zero knowledge proofs all the way.

Apple backs up iMessages to iCloud

That’s the super-secret NSA spying chip.

The biggest problem with Apple’s position is that it just plain isn’t true. If you use the iCloud backup service to back up your iDevice, there’s a very good chance that Apple can access the last few days of your iMessage history.

For those who aren’t in the Apple ecosystem: iCloud is an optional backup service that Apple provides for free. Backups are great, but if iMessages are backed up we need to ask how they’re protected. Taking Apple at their word — that they really can’t get your iMessages — leaves us with two possibilities:

  1. iMessage backups are encrypted under a key that ‘never leaves the device’.
  2. iMessage backups are encrypted using your password as a key.

Unfortunately neither of these choices really works — and it’s easy to prove it. All you need to do is run the following simple experiment: First, lose your iPhone. Now change your password using Apple’s iForgot service (this requires you to answer some simple security questions or provide a recovery email). Now go to an Apple store and shell out a fortune buying a new phone.

If you can recover your recent iMessages onto a new iPhone — as I was able to do in an Apple store this afternoon — then Apple isn’t protecting your iMessages with your password or with a device key. Too bad. (Update 6/27: Ashkan Soltani also has some much nicer screenshots from a similar test.)

The sad thing is there’s really no crypto to understand here. The simple and obvious point is this: if I could do this experiment, then someone at Apple could have done it too. Possibly at the request of law enforcement. All they need are your iForgot security questions, something that Apple almost certainly does keep.* 

Apple distributes iMessage encryption keys

But maybe you don’t use backups. In this case the above won’t apply to you, and Apple clearly says that their messages are end-to-end encrypted. The question you should be asking now is: encrypted to whom?

The problem here is that encryption only works if I have your encryption key. And that means before I can talk to you I need to get hold of it. Apple has a simple solution to this: they operate a directory lookup service that iMessage can use to look up the public key associated with any email address or phone number. This is great, but represents yet another tradeoff: you’re now fundamentally dependent on Apple giving you the right key.

HTTPS request/response containing a “message identity key” associated with an iPhone phone number (modified). These keys are sent over SSL.

The concern here is that Apple – or a hacker who compromises Apple’s directory server – might instead deliver their own key. Since you won’t know the difference, you’ll be encrypting to that person rather than to your friend.**

Moreover, iMessage lets you associate multiple public keys with the same account — for example, you can add a device (such as a Mac) to receive copies of messages sent to your phone. From what I can tell, the iMessage app gives the sender no indication of how many keys have been associated with a given iMessage recipient, nor does it warn them if the recipient suddenly develops new keys.

The practical upshot is that the integrity of iMessage depends on Apple honestly handing out keys. If they cease to be honest (or if somebody compromises the iMessage servers) it may be possible to run a man-in-the-middle attack and silently intercept iMessage data.

Now to some people this is obvious, and to others it’s no big deal. All of which is fine. But people should at least understand the strengths and weaknesses of the particular design that Apple has chosen. Armed with that knowledge they can make up their minds how much they want to trust Apple.

Apple can retain metadata

While Apple may encrypt the contents of your communication, their statement doesn't exactly rule out the possibility that they store a record of who you're talking to. This is the famous 'metadata' the NSA already sweeps up, and (as I've said before) it's almost impossible not to collect at least this information, especially since Apple actually delivers your messages through their servers.

This metadata can be as valuable as the data itself. And while Apple doesn’t retain the content of your messages, their statement says nothing about all that metadata.

Apple doesn’t use Certificate Pinning

As a last – and fairly minor – point, iMessage client applications (for iPhone and Mac) communicate with Apple's directory service using the HTTPS protocol. (Note that this applies to directory lookup messages: the actual iMessages are encrypted separately and travel over Apple's push network protocol.)

Using HTTPS is a good thing, and in general it provides strong protections against interception. But it doesn’t protect against all attacks. There’s still a very real possibility that a capable attacker could obtain a forged certificate (possibly by compromising a Certificate Authority) and thus intercept or modify communications with Apple.

This kind of thing isn't as crazy as it sounds. It happened to hundreds of thousands of Iranian Gmail users, and it's likely to happen again in the future. The standard solution to this problem is called 'certificate pinning', which essentially tells the application not to trust unknown certificates. Many apps, such as Twitter, do this. However, based on the testing I did while writing this post, Apple doesn't.
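
For what it's worth, basic pinning isn't hard to implement. Here's a minimal sketch in Python using only the standard library; the pinned fingerprint below is a placeholder rather than any real Apple value:

```python
# Minimal certificate-pinning sketch. The pinned fingerprint is a placeholder;
# a real client would ship the known-good value (or values) with the app.
import hashlib
import socket
import ssl

PINNED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"


def connect_with_pin(host: str, port: int = 443) -> ssl.SSLSocket:
    ctx = ssl.create_default_context()  # normal CA validation still happens
    sock = ctx.wrap_socket(socket.create_connection((host, port)),
                           server_hostname=host)
    leaf = sock.getpeercert(binary_form=True)  # DER-encoded server certificate
    if hashlib.sha256(leaf).hexdigest() != PINNED_SHA256:
        sock.close()
        # A valid-but-unexpected certificate (e.g. from a compromised CA)
        # gets rejected here instead of being silently accepted.
        raise ssl.SSLError("server certificate does not match the pin")
    return sock
```

A pinned client still performs normal certificate validation; it simply refuses certificates, however 'valid', that don't match the value it shipped with.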

Conclusion

I don’t write any of this stuff because I dislike Apple. In fact I love their products and would bathe with them if it didn’t (unfortunately) violate the warranty.

But the flipside of my admiration is simple: I rely on these devices and want to know how secure they are. I see absolutely no downside to Apple presenting at least a high-level explanation to experts, even if they keep the low-level details to themselves. This would include the type and nature of the encryption algorithms used, the details of the directory service and the key agreement protocol.

Apple may Think Different, but security rules apply to them too. Sooner or later someone will compromise or just plain reverse-engineer the iMessage system. And then it’ll all come out anyway.

Notes:

* Of course it’s possible that Apple is using your security questions to derive an encryption key. However this seems unlikely. First because it’s likely that Apple has your question/answers on file. But even if they don’t, it’s unlikely that many security answers contain enough entropy to use for encryption. There are only so many makes/models of cars and so many birthdays. Apple’s 2-step authentication may improve things if you use it — but if so Apple isn’t saying.
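
A quick back-of-the-envelope calculation (with invented numbers) shows why security answers make lousy keys:

```python
# Rough entropy estimate with invented numbers: even generous answer spaces
# fall far short of the ~128 bits we'd want from an encryption key.
import math

birth_dates = 366 * 100      # any day of the year, any of ~100 plausible birth years
car_models = 5_000           # a generous guess at distinguishable make/model answers

print(round(math.log2(birth_dates), 1))   # ~15.2 bits
print(round(math.log2(car_models), 1))    # ~12.3 bits
```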

** In practice it’s not clear if Apple devices encrypt to this key directly or if they engage in an OTR-like key exchange protocol. What is clear is that iMessage does not include a ‘key fingerprint’ or any means for users to verify key authenticity, which means fundamentally you have to trust Apple to guarantee the authenticity of your keys. Moreover iMessage allows you to send messages to offline users. It’s not clear how this would work with OTR.

Dear Apple: Please set iMessage free

Normally I avoid complaining about Apple because (a) there are plenty of other people carrying that flag, and (b) I honestly like Apple and own numerous lovely iProducts. I’m even using one to write this post.

Moreover, from a security point of view, there isn't that much to complain about. Sure, Apple has a few irritating habits — shipping old, broken versions of libraries in its software, for example. But on the continuum of security crimes this stuff is at best a misdemeanor, maybe a half-step above 'improper baby naming'. Everyone's software sucks, news at 11.

There is, however, one thing that drives me absolutely nuts about Apple’s security posture. You see, starting about a year ago Apple began operating one of the most widely deployed encrypted text message services in the history of mankind. So far so good. The problem is that they still won’t properly explain how it works.

And nobody seems to care.

I am, of course, referring to iMessage, which was deployed last year in iOS Version 5. It allows — nay, encourages — users to avoid normal carrier SMS text messages and to route their texts through Apple instead.

Now, this is not a particularly new idea. But iMessage is special for two reasons. First it’s built into the normal iPhone texting application and turned on by default. When my Mom texts another Apple user, iMessage will automatically route her message over the Internet. She doesn’t have to approve this, and honestly, probably won’t even know the difference.

Second, iMessage claims to bring 'secure end-to-end encryption' (and authentication) to text messaging. In principle this is huge! True end-to-end encryption should protect you from eavesdropping even by Apple, who carries your message. Authentication should protect you from spoofing attacks. This stands in contrast to normal SMS, which is often not encrypted at all.

So why am I looking a gift horse in the mouth? iMessage will clearly save you a ton in texting charges and it will secure your messages for free. Some encryption is better than none, right?

Well maybe.

To me, the disconcerting thing about iMessage is how rapidly it’s gone from no deployment to securing billions of text messages for millions of users. And this despite the fact that the full protocol has never been published by Apple or (to my knowledge) vetted by security experts. (Note: if I’m wrong about this, let me know and I’ll eat my words.)

What’s worse is that Apple has been hyping iMessage as a secure protocol; they even propose it as a solution to some serious SMS spoofing bugs. For example:

Apple takes security very seriously. When using iMessage instead of SMS, addresses are verified which protects against these kinds of spoofing attacks. One of the limitations of SMS is that it allows messages to be sent with spoofed addresses to any phone, so we urge customers to be extremely careful if they’re directed to an unknown website or address over SMS.

And this makes me nervous. While iMessage may very well be as secure as Apple makes it out to be, there are plenty of reasons to give the protocol a second look.

For one thing, it’s surprisingly complicated.

iMessage is not just two phones talking to each other with TLS. If this partial reverse-engineering of the protocol (based on the MacOS Mountain Lion Messages client) is for real, then there are lots of moving parts. TLS. Client certificates. Certificate signing requests. New certificates delivered via XML. Oh my.

As a general rule, lots of moving parts means lots of places for things to go wrong. Things that could seriously reduce the security of the protocol. And as far as I know, nobody's given this much of a look. It's surprising.

Moreover, there are some very real questions about what powers Apple has when it comes to iMessage. In principle 'end-to-end' encryption should mean that only the end devices can read the traffic. In practice this is almost certainly not the case with iMessage. A quick glance at the protocol linked above is enough to tell me that Apple operates as a Certificate Authority for iMessage devices. And as a Certificate Authority, it may be able to substantially undercut the security of the protocol. When would Apple do this? How would it do this? Are we allowed to know?
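
To see why being the CA matters, here's a generic sketch using the Python cryptography package. This is emphatically not Apple's protocol (which we don't have); it just shows the power any certificate authority holds: it can bind whatever public key it likes to whatever identity it likes.

```python
# Generic illustration, not Apple's protocol: a CA key can vouch for any
# public key under any name, so clients that trust the CA will accept it.
import datetime
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
attacker_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    # Claim that the attacker's public key belongs to "alice".
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "alice@example.com")]))
    .issuer_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Toy Messaging CA")]))
    .public_key(attacker_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=1))
    .sign(ca_key, hashes.SHA256())
)

# Any device that trusts "Toy Messaging CA" will now happily encrypt
# messages "to alice" under a key the attacker controls.
```

A client that trusts the CA has no way to tell this certificate from a legitimate one.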

Finally, there have been several reports of iMessages going astray and even being delivered to the wrong (or stolen) devices. This stuff may all have a reasonable explanation, but it's yet another reason why it would be nice to understand iMessage better than we do now if we're going to go around relying on it.

So what’s my point with all of this?

This is obviously not a technical post. I’m not here to present answers, which is disappointing. If I knew the protocol maybe I’d have some. Maybe I’d even be saying good things about it.

Rather, consider this post as a plea for help. iMessage is important. People use it. We ought to know how secure it is and what risks those people are taking by using it. The best solution would be for Apple to simply release a detailed specification for the protocol — even if they need to hold back a few key details. But if that’s not possible, maybe we in the community should be doing more to find out.

Remember, it’s not just our security at stake. People we know are using these products. It would be awfully nice to know what that means.

iCloud: Who holds the key?

Ars Technica brings us today's shocking privacy news: 'Apple holds the master decryption key when it comes to iCloud security, privacy'. Oh my.

The story is definitely worth a read, though it may leave you shaking your head a bit. Ars’s quoted security experts make some good points, but they do it in a strange way — and they propose some awfully questionable fixes.

But maybe I’m too picky. To be honest, I didn’t realize that there was even a question about who controlled the encryption key to iCloud storage. Of course Apple does — for obvious technical reasons that I’ll explain below. You don’t need to parse Apple’s Terms of Service to figure this out, which is the odd path that Ars’s experts have chosen:

In particular, Zdziarski cited particular clauses of iCloud Terms and Conditions that state that Apple can “pre-screen, move, refuse, modify and/or remove Content at any time” if the content is deemed “objectionable” or otherwise in violation of the terms of service. Furthermore, Apple can “access, use, preserve and/or disclose your Account information and Content to law enforcement authorities” whenever required or permitted by law.

Well, fine, but so what — Apple’s lawyers would put stuff like this into their ToS even if they couldn’t access your encrypted content. This is what lawyers do. These phrases don’t prove that Apple can access your encrypted files (although, I remind you, they absolutely can), any more than Apple’s patent application for a 3D LIDAR camera ‘proves’ that you’re going to get one in your iPhone 5.

Without quite realizing what I was doing, I managed to get myself into a long Twitter-argument about all this with the Founder & Editor-in-Chief of Ars, a gentleman named Ken Fisher. I really didn’t mean to criticize the article that much, since it basically arrives at the right conclusions — albeit with a lot of nonsense along the way.

Since there seems to be some interest in this, I suppose it’s worth a few words. This may very well be the least ‘technical’ post I’ve ever written on this blog, so apologies if I’m saying stuff that seems a little obvious. Let’s do it anyway.

The mud puddle test

You don’t have to dig through Apple’s ToS to determine how they store their encryption keys. There’s a much simpler approach that I call the ‘mud puddle test’:

  1. First, drop your device(s) in a mud puddle.
  2. Next, slip in said puddle and crack yourself on the head. When you regain consciousness you’ll be perfectly fine, but won’t for the life of you be able to recall your device passwords or keys.
  3. Now try to get your cloud data back.

Did you succeed? If so, you’re screwed. Or to be a bit less dramatic, I should say: your cloud provider has access to your ‘encrypted’ data, as does the government if they want it, as does any rogue employee who knows their way around your provider’s internal policy checks.

And it goes without saying: so does every random attacker who can guess your recovery information or compromise your provider’s servers.

Now I realize that the mud puddle test doesn't sound simple, and of course I don't recommend that anyone literally do this — head injuries are no fun at all. It's just a thought experiment, or in the extreme case, something you can 'simulate' if you're willing to tell your provider a few white lies.

But you don’t need to simulate it in Apple’s case, because it turns out that iCloud is explicitly designed to survive the mud puddle test. We know this thanks to two iCloud features. These are (1) the ability to ‘restore’ your iCloud backups to a brand new device, using only your iCloud password, and (2) the ‘iForgot’ service, which lets you recover your iCloud password by answering a few personal questions.

Since you can lose your device, the key isn’t hiding there. And since you can forget your password, it isn’t based on that. Ergo, your iCloud data is not encrypted end-to-end, not even using your password as a key (or if it is, then Apple has your password on file, and can recover it from your security questions.) (Update: see Jonathan Zdziarski’s comments at the end of this post.)

You wanna make something of it?

No! It’s perfectly reasonable for a consumer cloud storage provider to design a system that emphasizes recoverability over security. Apple’s customers are far more likely to lose their password/iPhone than they are to be the subject of a National Security Letter or data breach (hopefully, anyway).

Moreover, I doubt your median iPhone user even realizes what they have in the cloud. The iOS ‘Backup’ service doesn’t advertise what it ships to Apple (though there’s every reason to believe that backed up data includes stuff like email, passwords, personal notes, and those naked photos you took.) But if people don’t think about what they have to lose, they don’t ask to secure it. And if they don’t ask, they’re not going to receive.

My only issue is that we have to have this discussion in the first place. That is, I wish that companies like Apple could just come right out and warn their users: ‘We have access to all your data, we do bulk-encrypt it, but it’s still available to us and to law enforcement whenever necessary’. Instead we have to reverse-engineer it by inference, or by parsing through Apple’s ToS. That shouldn’t be necessary.

But can’t we fix this with Public-Key Encryption/Quantum Cryptography/ESP/Magical Unicorns?

No, you really can’t. And this is where the Ars Technica experts go a little off the rails. Their proposed solution is to use public-key encryption to make things better. Now this is actually a great solution, and I have no objections to it. It just won’t make things better.

To be fair, let’s hear it in their own words:

First, cloud services should use asymmetric public key encryption. “With asymmetric encryption, the privacy and identity of each individual user” is better protected, Gulri said, because it uses one “public” key to encrypt data before being sent to the server, and uses another, “private” key to decrypt data pulled from the server. Assuming no one but the end user has access to that private key, then no one but the user—not Apple, not Google, not the government, and not hackers—could decrypt and see the data.

Note the assumption in that last sentence ('no one but the end user has access to that private key'), because it's kind of an important one.

To make a long story short, there are two types of encryption scheme. Symmetric encryption algorithms have a single secret key that is used for both encryption and decryption. The key can be generated randomly, or it can be derived from a password. What matters is that if you’re sending data to someone else, then both you and the receiver need to share the same key.

Asymmetric, or public-key, encryption has two keys: one 'public key' for encryption and one secret key for decryption. This makes it much easier to send encrypted data to another person, since you only need their public key, and that isn't sensitive at all.

But here’s the thing: the difference between these approaches is only related to how you encrypt the data. If you plan to decrypt the data — that is, if you ever plan to use it — you still need a secret key. And that secret key is secret, even if you’re using a public-key encryption scheme.

Which brings us to the real problem with all encrypted storage schemes: someone needs to hold the secret decryption key. Apple has made the decision that consumers are not in the best position to do this. If they were willing to allow consumers to hold their decryption keys, it wouldn’t really matter whether they were using symmetric or public-key encryption.
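
If you want to see this for yourself, here's a tiny sketch using the Python cryptography package. It's illustrative only, but it makes the point: the fancy asymmetric scheme still ends with somebody holding a secret key.

```python
# Illustration: public-key encryption moves the problem, it doesn't remove it.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Anyone can encrypt with the public key...
ciphertext = public_key.encrypt(b"my cloud backup", oaep)

# ...but reading it back requires the private key. Whoever stores that key
# (you, or your provider) can decrypt the data, regardless of whether the
# scheme is labeled 'symmetric' or 'asymmetric'.
assert private_key.decrypt(ciphertext, oaep) == b"my cloud backup"
```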

So what is the alternative?

Well, for a consumer-focused system, maybe there really isn’t one. Ultimately people back up their data because they’re afraid of losing their devices, which cuts against the idea of storing encryption keys inside of devices.

You could take the PGP approach and back up your decryption keys to some other location (your PC, for example, or a USB stick). But this hasn’t proven extremely popular with the general public, because it’s awkward — and sometimes insecure.

Alternatively, you could use a password to derive the encryption/decryption keys. This approach works fine if your users pick decent passwords (although they mostly won’t), and if they promise not to forget them. But of course, the convenience of Apple’s “iForgot” service indicates that Apple isn’t banking on users remembering their passwords. So that’s probably out too.
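
For completeness, here's what the password-derived-key approach looks like with PBKDF2 from the Python standard library. The parameters are illustrative; I'm not claiming Apple uses (or should use) these exact choices:

```python
# Illustrative only: derive an encryption key from a user-chosen password.
# The iteration count and salt handling are placeholders, not Apple's.
import hashlib
import os


def derive_key(password: str, salt: bytes = None) -> tuple:
    """Return (key, salt). The provider stores the salt, never the key."""
    salt = salt if salt is not None else os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return key, salt

# If the user forgets the password, the key (and the data) is gone for good,
# which is exactly the outcome that 'iForgot'-style recovery exists to prevent.
```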

In the long run, the answer for non-technical users is probably just to hope that Apple takes good care of your data, and to hope you’re never implicated in a crime. Otherwise you’re mostly out of luck. For tech-savvy users, don’t use iCloud and do try to find a better service that’s willing to take its chances on you as the manager of your own keys.

In summary

I haven’t said anything in this post that you couldn’t find in Chapter 1 of an ‘Intro to Computer Security’ textbook, or a high-level article on Wikipedia. But these are important issues, and there seems to be confusion about them.

The problem is that the general tech-using public seems to think that cryptography is a magical elixir that can fix all problems. Companies — sometimes quite innocently — market ‘encryption’ to convince people that they’re secure, when in fact they’re really not. Sooner or later people will figure this out and things will change, or they won’t and things won’t. Either way it’ll be an interesting ride.

Update 4/4: Jonathan Zdziarski tweets to say my 'mud puddle' theory is busted: since the iForgot service requires you to provide your birthdate and answer a 'security question', he points out that this data could be used as an alternative password, which could encrypt your iCloud password/keys — protecting them even from Apple itself.

The problem with his theory is that security answers don't really make very good keys, since (for most users) they're not that unpredictable. Apple could brute-force their way through every likely "hometown" or "favorite sport" in a few seconds. Zdziarski suggests that Apple might employ a deliberately slow key derivation function to make these attacks less feasible, and I suppose I agree with him in theory. But only in theory. Neither Zdziarski nor I actually believe that Apple does any of this.
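
To put some (made-up) numbers on that last point, here's roughly what the brute-force attack looks like. The answer list and KDF parameters are invented, but the structure of the attack is not:

```python
# Made-up parameters, real point: a small answer space stays small even
# behind a deliberately slow key derivation function.
import hashlib

LIKELY_HOMETOWNS = ["springfield", "portland", "columbus", "austin"]  # etc.
SALT = b"per-account-salt"


def key_from_answer(answer: str) -> bytes:
    # 100,000 iterations is 'slow', but it only multiplies the attacker's
    # work by a constant; it doesn't add entropy the answer never had.
    return hashlib.pbkdf2_hmac("sha256", answer.encode("utf-8"), SALT, 100_000)


target = key_from_answer("portland")  # the key the attacker is trying to hit
recovered = next(a for a in LIKELY_HOMETOWNS if key_from_answer(a) == target)
assert recovered == "portland"
```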