Wednesday, January 14, 2015

Hopefully the last post I'll ever write on Dual EC DRBG

I've been working on some other blog posts, including a conclusion of (or at least an installment in) this exciting series on zero knowledge proofs. That's coming soon, but first I wanted to take a minute to, well, rant.

The subject of my rant is this fascinating letter authored by NSA cryptologist Michael Wertheimer in February's Notices of the American Mathematical Society. Dr. Wertheimer is currently the Director of Research at NSA, and formerly held the position of Assistant Deputy Director and CTO of the Office of the Director of National Intelligence for Analysis.

In other words, this is a guy who should know what he's talking about.

The subject of Dr. Wertheimer's letter is near and dear to my heart: the alleged subversion of NIST's standards for random number generation -- a subversion that was long suspected and apparently confirmed by classified documents leaked by Edward Snowden. The specific algorithm in question is called Dual EC DRBG, and it very likely contains an NSA backdoor. Those who've read this blog should know that I think it's as suspicious as a three dollar bill.

Reading Dr. Wertheimer's letter, you might wonder what I'm so upset about. On the face of it, the letter appears to express regret. To quote (with my emphasis):
With hindsight, NSA should have ceased supporting the Dual_EC_DRBG algorithm immediately after security researchers discovered the potential for a trapdoor. In truth, I can think of no better way to describe our failure to drop support for the Dual_EC_DRBG algorithm as anything other than regrettable. The costs to the Defense Department to deploy a new algorithm were not an adequate reason to sustain our support for a questionable algorithm. Indeed, we support NIST’s April 2014 decision to remove the algorithm. Furthermore, we realize that our advocacy for the Dual_EC_DRBG casts suspicion on the broader body of work NSA has done to promote secure standards. 
I agree with all that. The trouble is that on closer examination, the letter doesn't express regret for the inclusion of Dual EC DRBG in national standards. The transgression Dr. Wertheimer identifies is merely that NSA continued to support the algorithm after major questions were raised. That's bizarre.

Even worse, Dr. Wertheimer reserves a substantial section of his letter for a defense of the decision to deploy Dual EC. It's those points that I'd like to address in this post.

Let's take them one at a time.
1: The Dual_EC_DRBG was one of four random number generators in the NIST standard; it is neither required nor the default.
It's absolutely true that Dual EC was only one of four generators in the NIST standard. It was not required for implementers to use it, and in fact they'd be nuts to use it -- given that overall it's at least two orders of magnitude slower than the other proposed generators.

The bizarre thing is that people did indeed adopt Dual EC in major commercial software packages. Specifically, RSA Security included it as the default generator in their popular BSAFE software library. Much worse, there's evidence that RSA was asked to do this by NSA, and was compensated for its compliance.

This is the danger with standards. Once NIST puts its seal on an algorithm, it's considered "safe". If the NSA came to a company and asked it to use some strange, non-standard algorithm, the request would be considered deeply suspicious by company and customers alike. But how can you refuse to use a standard if your biggest client asks you to? Apparently RSA couldn't.
2: The NSA-generated elliptic curve points were necessary for accreditation of the Dual_EC_DRBG but only had to be implemented for actual use in certain DoD applications.
This is a somewhat misleading statement, one that really needs to be unpacked.

First, the original NSA proposal of Dual EC DRBG contained no option for alternate curve points. This is an important point, since it's the selection of curve points that gives Dual EC its potential for a "back door". By generating two default points (P, Q) in a specific way, the NSA may have been able to create a master key that would allow them to very efficiently decrypt SSL/TLS connections.
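
To see what such a master key buys you, here's a toy sketch of the back-door algebra. I'm substituting modular exponentiation for elliptic curve point multiplication and using made-up, toy-sized parameters (the real generator works over NIST's P-256 curve and truncates its outputs, which adds only a modest brute-force step), but the core trick is the same: anyone who knows the discrete logarithm relating P and Q can recover the generator's internal state from a single output.

```python
# Toy analogue of the alleged Dual EC back door. Modular exponentiation
# stands in for elliptic-curve point multiplication; all parameters are
# invented and far too small to be secure.
p = 2**31 - 1                 # a prime modulus (toy-sized)
g = 7                         # public base, playing the role of the point P

e = 65537                     # the designer's secret exponent...
d = pow(e, -1, p - 1)         # ...and the back-door key (needs gcd(e, p-1) == 1)
P, Q = g, pow(g, e, p)        # the two 'standard' public parameters

def dualec_step(state):
    """One round: update the state using P, produce output using Q."""
    new_state = pow(P, state, p)
    return new_state, pow(Q, new_state, p)

state = 123456789             # the victim's secret seed
state, out1 = dualec_step(state)
state, out2 = dualec_step(state)

# The attacker sees only out1 = Q^s = g^(e*s), but holds d = e^-1:
# raising the output to d yields g^s = P^s, which *is* the next state.
recovered_state = pow(out1, d, p)
assert pow(Q, recovered_state, p) == out2   # all future output is predictable
```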

If you like conspiracy theories, here's what NIST's John Kelsey was told when he asked how the NSA's points were generated:

In 2004-2005, several participants on the ANSI X9 tools committee pointed out the potential danger of this backdoor. One of them even went so far as to file a patent on using the idea to implement key escrow for SSL/TLS connections. (It doesn't get more passive aggressive than that.)

In response to the discovery of such an obvious flaw, the ANSI X9 committee immediately stopped recommending the NSA's points -- and relegated them to be simply an option, one to be used by the niche set of government users who required them.

I'm only kidding! Actually the committee did no such thing.

Instead, at the NSA's urging, the ANSI committee retained the original NSA points as the recommended parameters for the standard. It then added an optional procedure for generating alternative points. When NIST later adopted the generator in its SP800-90A standard, it mirrored the ANSI decision. But even worse, NIST didn't even bother to publish the alternative point generation algorithm. To actually implement it, you'd need to go buy the (expensive) non-public-domain ANSI standard and figure out how to implement it yourself.


This is, to paraphrase Douglas Adams, the standards committee equivalent of putting the details in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'.

To the best of our knowledge, nobody has ever used ANSI's alternative generation procedure in a single one of the many implementations of Dual EC DRBG in commercial software.  It's not even clear how you could have used that procedure in a FIPS-certified product, since the FIPS evaluation process (conducted by CMVP) still requires you to test against the NSA-generated points.
3. The trapdoor concerns were openly studied by ANSI X9F1, NIST, and by the public in 2007. 
This statement has the benefit of being literally true, while also being pretty damned misleading.

It is true that in 2007 -- after Dual EC had been standardized -- two Microsoft researchers, Dan Shumow and Niels Ferguson, openly raised the alarm about Dual EC. The problem here is that the flaws in Dual EC were not first discovered in 2007. They were discovered much earlier in the standardization process, and nobody ever heard about them.

As I noted above, the ANSI X9 committee detected the flaws in Dual EC as early as 2004, and in close consultation with NSA agreed to address them -- in a manner that was highly beneficial to the NSA. But perhaps that's understandable, given that the committee was anything but 'open'.

In fact, this is an important aspect of the controversy that even NIST has criticized. The standardization of these algorithms was conducted through ANSI. And the closed ANSI committee consisted of representatives from a few select companies, NIST and the NSA. No public notice was given of the potential vulnerabilities discovered in the RNG. Moreover, a patent application that might have shed light on the backdoor was mired in NSA pre-publication review for over two years.

This timeline issue might seem academic, but bear this in mind: we now know that RSA Security began using the Dual EC DRBG random number generator in BSAFE -- as the default, I remind you -- way back in 2004. That means for three years this generator was widely deployed, yet serious concerns were not communicated to the public.

To state that the trapdoor concerns were 'openly' studied in 2007 is absolutely true. It's just completely irrelevant.

In conclusion

I'm not a mathematician, but like anyone who works in a mathematical area, I find there are aspects of the discipline that I love. For me it's the precision of mathematical statements, and the fact that the truth or falsity of a statement can -- ideally -- be evaluated from the statement itself, without resorting to differing opinions or understandings of the context.

While Dr. Wertheimer's letter is hardly a mathematical work, it troubles me to see such confusing statements in a publication of the AMS. As a record of history, Dr. Wertheimer's letter leaves much to be desired, and could easily lead people to the wrong understanding.

Given the stakes, we deserve a more exact accounting of what happened with Dual EC DRBG. I hope someday we'll see that.

Monday, December 29, 2014

On the new Snowden documents

If you don't follow NSA news obsessively, you might have missed yesterday’s massive Snowden document dump from Der Spiegel. The documents provide a great deal of insight into how the NSA breaks our cryptographic systems. I was very lightly involved in looking at some of this material, so I'm glad to see that it's been published.

Unfortunately with so much material, it can be a bit hard to separate the signal from the noise. In this post I’m going to try to do that a little bit -- point out the bits that I think are interesting, the parts that are old news, and the things we should keep an eye on.

Background

Those who read this blog will know that I’ve been wondering for a long time how NSA works its way around our encryption. This isn't an academic matter, since it affects just about everyone who uses technology today.

What we've learned since 2013 is that NSA and its partners hoover up vast amounts of Internet traffic from fiber links around the world. Most of this data is plaintext and therefore easy to intercept. But at least some of it is encrypted -- typically protected by protocols such as SSL/TLS or IPsec.

Conventional wisdom pre-Snowden told us that the increasing use of encryption ought to have shut the agencies out of this data trove. Yet the documents we’ve seen so far indicate just the opposite. Instead, the NSA and GCHQ have somehow been harvesting massive amounts of SSL/TLS and IPSEC traffic, and appear to be making inroads into other technologies such as Tor as well.

How are they doing this? To repeat an old observation, there are basically three ways to crack an encrypted connection:
  1. Go after the mathematics. This is expensive and unlikely to work well against modern encryption algorithms (with a few exceptions). The leaked documents give very little evidence of such mathematical breaks — though a bit more on this below.
  2. Go after the implementation. The new documents confirm a previously-reported and aggressive effort to undermine commercial cryptographic implementations. They also provide context for how important this type of sabotage is to the NSA.
  3. Steal the keys. Of course, the easiest way to attack any cryptosystem is simply to steal the keys. Yesterday we received a bit more evidence that this is happening.
I can’t possibly spend time on everything that’s covered by these documents — you should go read them yourself — so below I’m just going to focus on the highlights.

Not so Good Will Hunting

First, the disappointing part. The NSA may be the largest employer of cryptologic mathematicians in the United States, but — if the new story is any indication — those guys really aren’t pulling their weight.

In fact, the only significant piece of cryptanalytic news in the entire stack is a 2008 undergraduate research project looking at AES. Sadly, this is about as unexciting as it sounds -- in fact it appears to be nothing more than a summer project by a visiting student. More interesting is the context it gives around the NSA’s efforts to break block ciphers such as AES, including the NSA's view of the difficulty of such cryptanalysis, and confirmation that NSA has some ‘in-house techniques’.


Additionally, the documents include significant evidence that NSA has difficulty decrypting certain types of traffic, including Truecrypt, PGP/GPG, Tor and ZRTP from implementations such as RedPhone. Since these protocols share many of the same underlying cryptographic algorithms — RSA, Diffie-Hellman, ECDH and AES — some are presenting this as evidence that those primitives are cryptographically strong.

As with the AES note above, this ‘good news’ should also be taken with a grain of salt. With a small number of exceptions, it seems increasingly obvious that the Snowden documents are geared towards NSA’s analysts and operations staff. In fact, many of the systems actually seem aimed at protecting knowledge of NSA's cryptanalytic capabilities from NSA's own operational staff (and other Five Eyes partners). As an analyst, it's quite possible you'll never learn why a given intercept was successfully decrypted.

To put this a bit more succinctly: the lack of cryptanalytic red meat in these documents may not truly be representative of the NSA’s capabilities. It may simply be an artifact of Edward Snowden's clearances at the time he left the NSA.

Tor

One of the most surprising aspects of the Snowden documents — to those of us in the security research community anyway — is the NSA’s relative ineptitude when it comes to de-anonymizing users of the Tor anonymous communications network.

The reason for our surprise is twofold. First, Tor was never really designed to stand up against a global passive adversary — that is, an attacker who taps a huge number of communications links. If there’s one thing we’ve learned from the Snowden leaks, it’s that the NSA (plus GCHQ) is the very definition of the term. In theory at least, Tor should be a relatively easy target for the agency.

The real surprise, though, is that despite this huge signals intelligence advantage, the NSA has barely even tested their ability to de-anonymize users. In fact, this leak provides the first concrete evidence that NSA is experimenting with traffic confirmation attacks to find the source of Tor connections. Even more surprising, their techniques are relatively naive, even when compared to what’s going on in the ‘research’ community.


This doesn’t mean you should view Tor as secure against the NSA. It seems very obvious that the agency has identified Tor as a high-profile target, and we know they have the resources to make much more headway against the network. The real surprise is that they haven’t tried harder. Maybe they're trying now.

SSL/TLS and IPSEC

A few months ago I wrote a long post speculating about how the NSA breaks SSL/TLS. I did so because it’s increasingly clear that the NSA does break these protocols, and at relatively large scale.

The new documents don’t tell us much we didn’t already know, but they do confirm the basic outlines of the attack. The first portion requires endpoints around the world that are capable of performing the raw decryption of SSL/TLS sessions provided they know the session keys. The second is a separate infrastructure located on US soil that can recover those session keys when needed.

All of the real magic happens within the key recovery infrastructure. These documents provide the first evidence that a major attack strategy for NSA/GCHQ involves key databases containing the private keys for major sites. For the RSA key exchange ciphersuites of TLS, a single private key is sufficient to recover vast amounts of session traffic — in real time or even after the fact.
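
To make the arithmetic of that last claim concrete, here's a minimal sketch (using the pyca 'cryptography' package, with a made-up premaster value) of why a single private key is all you need against the RSA key exchange ciphersuites:

```python
# In TLS's RSA key-exchange suites, the client encrypts the 48-byte
# premaster secret directly under the server's long-term public key.
# Anyone who later obtains the private key can decrypt every recorded
# handshake -- no active attack required.
import os
from cryptography.hazmat.primitives.asymmetric import rsa, padding

server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# What a passive eavesdropper records from each handshake:
premaster = b"\x03\x03" + os.urandom(46)   # toy 48-byte premaster secret
captured = server_key.public_key().encrypt(premaster, padding.PKCS1v15())

# Later -- minutes or years later -- the stolen private key recovers the
# premaster, and with it every session key derived from it.
assert server_key.decrypt(captured, padding.PKCS1v15()) == premaster
```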


The interesting question is how the NSA gets those private keys. The easiest answer may be the least technical. A different Snowden leak gives some reason to believe that the NSA may have relationships with employees at specific named U.S. entities, and may even operate personnel “under cover”. This would certainly be one way to build a key database.

But even without the James Bond aspect of this, there’s every reason to believe that NSA has other means to exfiltrate RSA keys from operators. During the period in question, we know of at least one vulnerability (Heartbleed) that could have been used to extract private keys from software TLS implementations. There are still other, unreported vulnerabilities that could be used today.

Pretty much everything I said about SSL/TLS also applies to VPN protocols, with the additional detail that many VPNs use broken protocols and relatively poorly-secured pre-shared secrets that can in some cases be brute-forced. The NSA seems positively gleeful about this.

Open Source packages: Redphone, Truecrypt, PGP and OTR

The documents provide at least circumstantial evidence that some open source encryption technologies may thwart NSA surveillance. These include Truecrypt, ZRTP implementations such as RedPhone, PGP implementations, and Off the Record messaging. These packages have a few commonalities:
  1. They’re all open source, and relatively well studied by researchers.
  2. They’re not used at terribly wide scale (as compared to, e.g., SSL or VPNs).
  3. They all work on an end-to-end basis and don’t involve service providers, software distributors, or other infrastructure that could be corrupted or attacked.
What’s at least as interesting is which packages are not included on this list. Major corporate encryption protocols such as iMessage make no appearance in these documents, despite the fact that they ostensibly provide end-to-end encryption. This may be nothing. But given all we know about NSA’s access to providers, this is definitely worrying.

A note on the ethics of the leak

Before I finish, it's worth addressing one major issue with this reporting: are we, as citizens, entitled to this information? Would we be safer keeping it all under wraps? And is this all 'activist nonsense'?

This story, more than some others, skates close to a line. I think it's worth talking about why this information is important.

To sum up a complicated issue, we live in a world where targeted surveillance is probably necessary and inevitable. The evidence so far indicates that NSA is very good at this kind of work, despite some notable failures in actually executing on the intelligence it produces.

Unfortunately, the documents released so far also show that a great deal of NSA/GCHQ surveillance is not targeted at all. Vast amounts of data are scooped up indiscriminately, in the hope that some of it will someday prove useful. Worse, the NSA has decided that bulk surveillance justifies its efforts to undermine many of the security technologies that protect our own information systems. The President's own hand-picked review council has strongly recommended this practice be stopped, but their advice has -- to all appearances -- been disregarded. These are matters that are worthy of debate, but this debate hasn't happened.

Unfortunately, if we can't enact changes to fix these problems, technology is probably about all that's left. Over the next few years encryption technologies are going to be widely deployed, not only by individuals but also by corporations desperately trying to reassure overseas customers who doubt the integrity of US technology.

In that world, it's important to know what works and doesn't work. Insofar as this story tells us that, it makes us all better off. 

Thursday, November 27, 2014

Zero Knowledge Proofs: An illustrated primer

One of the best things about modern cryptography is the beautiful terminology. You could start any number of punk bands (or Tumblrs) named after cryptography terms like 'hard-core predicate', 'trapdoor function', or 'impossible differential cryptanalysis'. And of course, I haven't even mentioned the one term that surpasses all of these. That term is 'zero knowledge'.

In fact, the term 'zero knowledge' is so appealing that it leads to problems. People misuse it, assuming that zero knowledge must be synonymous with 'really, really secure'. Hence it gets tacked onto all kinds of stuff -- like encryption systems and anonymity networks -- that really have nothing to do with true zero knowledge protocols.

This all serves to underscore a point: zero-knowledge proofs are one of the most powerful tools cryptographers have ever devised. But unfortunately they're also relatively poorly understood. In this series of posts I'm going to try to give a (mostly) non-mathematical description of what ZK proofs are, and what makes them so special. In this post and the next I'll talk about some of the ZK protocols we actually use.

Origins of Zero Knowledge

The notion of 'zero knowledge' was first proposed in the 1980s by MIT researchers Shafi Goldwasser, Silvio Micali and Charles Rackoff. These researchers were working on problems related to interactive proof systems, theoretical systems where a first party (called a 'Prover') exchanges messages with a second party ('Verifier') to convince the Verifier that some mathematical statement is true.*

Prior to Goldwasser et al., most work in this area focused on the soundness of the proof system. That is, it considered the case where a malicious Prover attempts to 'trick' a Verifier into believing a false statement. What Goldwasser, Micali and Rackoff did was to turn this problem on its head. Instead of worrying only about the Prover, they asked: what happens if you don't trust the Verifier?

The specific concern they raised was information leakage. Concretely, they asked, how much extra information is the Verifier going to learn during the course of this proof, beyond the mere fact that the statement is true?

It's important to note that this is not simply of theoretical interest. There are real, practical applications where this kind of thing matters.

Here's one: imagine that a real-world client wishes to log into a web server using a password. The standard 'real world' approach to this problem involves storing a hashed version of the password on the server. The login can thus be viewed as a sort of 'proof' that a given password hash is the output of a hash function on some password -- and more to the point, that the client actually knows the password.

Most real systems implement this 'proof' in the absolute worst possible way. The client simply transmits the original password to the server, which re-computes the password hash and compares it to the stored value. The problem here is obvious: at the conclusion of the protocol, the server has learned my cleartext password. Modern password hygiene therefore involves a good deal of praying that servers aren't compromised.
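
If you want to see just how bad this is, here's a minimal sketch of the standard approach. The usernames, salting details and password are all made up, and a real system would at least use a deliberately slow hash like bcrypt or scrypt:

```python
# The standard (non-zero-knowledge) password 'proof': the client hands the
# server its cleartext password, and the server re-computes the hash.
import hashlib, hmac, os

users = {}  # server-side store: username -> (salt, hash)

def register(username, password):
    salt = os.urandom(16)
    users[username] = (salt, hashlib.sha256(salt + password.encode()).digest())

def login(username, password):
    # At this point the server has *seen* the password -- that's the flaw.
    salt, stored = users[username]
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(candidate, stored)

register("alice", "hunter2")
assert login("alice", "hunter2")
assert not login("alice", "wrong")
```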

What Goldwasser, Micali and Rackoff proposed was a new hope for conducting such proofs. If fully realized, zero knowledge proofs would allow us to prove statements like the one above, while provably revealing no information beyond the single bit of information corresponding to 'this statement is true'.

A 'real world' example

So far this discussion has been pretty abstract. To make things a bit more concrete, let's go ahead and give a 'real' example of a (slightly insane) zero knowledge protocol.

For the purposes of this example, I'd like you to imagine that I'm a telecom magnate in the process of deploying a new cellular communications network. My network structure is represented by the graph below. Each vertex in this graph represents a cellular radio tower, and the connecting lines (edges) indicate locations where two cells overlap, meaning that their transmissions are likely to interfere with each other.

This overlap is problematic, since it means that signals from adjacent towers are likely to scramble reception. Fortunately my network design allows me to configure each tower to one of three different frequency bands to avoid such interference.

Thus the challenge in deploying my network is to assign frequency bands to the towers such that no two overlapping cells share the same frequencies. If we use colors to represent the frequency bands, we can quickly work out one solution to the problem:


Of course, many of you will notice that what I'm describing here is simply an instance of the famous theory problem called the graph three-coloring problem. You might also know that what makes this problem interesting is that, for some graphs, it can be quite hard to find a solution, or even to determine if a solution exists. In fact, graph three-coloring -- specifically, the decision problem of whether a given graph supports a solution with three colors -- is known to be in the complexity class NP-complete.
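
For intuition, here's what the brute-force approach looks like in code -- perfectly fine for the toy network above, and hopeless for large instances. The graph and edge list are illustrative:

```python
# Brute-force 3-coloring: try all 3**n assignments. The problem is
# NP-complete, so no known algorithm scales to large graphs.
from itertools import product

def three_color(n_vertices, edges):
    for colors in product(range(3), repeat=n_vertices):
        if all(colors[u] != colors[v] for u, v in edges):
            return colors
    return None

print(three_color(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))  # e.g. (0, 1, 2, 0)
```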

It goes without saying that the toy example above is easy to solve by hand. But what if it wasn't? For example, imagine that my cellular network was very large and complex, so much so that the computing power at my disposal was not sufficient to find a solution. In this instance, it would be desirable to outsource the problem to someone else who has plenty of computing power. For example, I might hire my friends at Google to solve it for me on spec.

But this leads to a problem.

Suppose that Google devotes a large percentage of their computing infrastructure to searching for a valid coloring for my graph. I'm certainly not going to pay them until I know that they really have such a coloring. At the same time, Google isn't going to give me a copy of their solution until I've paid up. We'll wind up at an impasse.

In real life there's probably a common-sense answer to this dilemma, one that involves lawyers and escrow accounts. But this is not a blog about real life, it's a blog about cryptography. And if you've ever read a crypto paper, you'll understand that the right way to solve this problem is to dream up an absolutely crazy technical solution.

A crazy technical solution (with hats!)

The engineers at Google consult with Silvio Micali at MIT, who, in consultation with his colleagues Oded Goldreich and Avi Wigderson, comes up with the following clever protocol -- one so elegant that it doesn't even require any computers. All it requires is a large warehouse, lots of crayons, and plenty of paper. Oh yes, and a whole bunch of hats.**

Here's how it works.

First I will enter the warehouse, cover the floor with paper, and draw a blank representation of my cell network graph. Then I'll exit the warehouse. Google can now enter, shuffle a collection of three crayons to pick a random assignment of the three agreed-upon crayon colors (red/blue/purple, as in the example above), and color in the graph with their solution. Note that it doesn't matter which specific crayons they use, only that the coloring is valid.

Before leaving the warehouse, Google covers up each of the vertices with a hat. When I come back in, this is what I'll see:


Obviously this approach protects Google's secret coloring perfectly. But it doesn't help me at all. For all I know, Google might have filled in the graph with a random, invalid solution. They might not even have colored the graph at all.

To address my valid concerns, Google now gives me an opportunity to 'challenge' their solution to the graph coloring. I'm allowed to pick -- at random -- a single 'edge' of this graph (that is, one line between two adjacent hats). Google will then remove the two corresponding hats, revealing a small portion of their solution:

Notice that there are two outcomes to my experiment:
  1. If the two revealed vertices are the same color (or aren't colored in at all!) then I definitely know that Google is lying to me. Clearly I'm not going to pay Google a cent.
  2. If the two revealed vertices are different colors, then Google might not be lying to me.
Hopefully the first proposition is obvious. The second one requires a bit more consideration. The problem is that even after our experiment, Google could still be lying to me -- after all, I only looked under two of the hats. If there are E different edges in the graph, then Google could fill in an invalid solution and still get away with it most of the time. Specifically, after one test they could succeed in cheating me with probability up to (E-1)/E (which for a 1,000 edge graph works out to 99.9% of the time).

Fortunately Google has an answer to this. We'll just run the protocol again!

We put down fresh paper with a new, blank copy of the graph. Google now picks a new (random) shuffle of the three crayons. Next they fill in the graph with a valid solution, but using the new random ordering of the three colors.

The hats go back on. I come back in and repeat the challenge process, picking a new random edge. Once again the logic above applies. Only this time if all goes well, I should now be slightly more confident that Google is telling me the truth. That's because in order to cheat me, Google would have had to get lucky twice in a row. That can happen -- but it happens with relatively lower probability. The chance that Google fools me twice in a row is now (E-1)/E * (E-1)/E (or about 99.8% for our 1,000 edge example above).

Fortunately we don't have to stop at two challenges. In fact, we can keep trying this over and over again until I'm confident that Google is probably telling me the truth.

But don't take my word for it. Thanks to some neat Javascript, you can go try it yourself.

Note that I'll never be perfectly certain that Google is being honest -- there's always going to be a tiny probability that they're cheating me. But after a large number of iterations (E^2, as it happens) I can eventually raise my confidence to the point where Google can only cheat me with negligible probability -- low enough that for all practical purposes it's not worth worrying about. And then I'll be able to safely hand Google my money.
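
If you'd like to check my arithmetic, here's a quick back-of-the-envelope calculation of how fast the cheating probability decays. The round counts are just the examples from above:

```python
# For a graph with E edges, a cheating prover survives one random challenge
# with probability at most (E-1)/E, so after n rounds at most ((E-1)/E)**n.
E = 1000

def cheat_probability(rounds):
    return ((E - 1) / E) ** rounds

print(cheat_probability(1))      # ~0.999   -- one round proves almost nothing
print(cheat_probability(E))      # ~0.368   -- after E rounds, about 1/e
print(cheat_probability(E * E))  # ~e^-1000 -- after E^2 rounds: negligible
                                 # (underflows to 0.0 in floating point)
```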

What you need to believe is that Google is also protected. Even if I try to learn something about their solution by keeping notes between protocol runs, it shouldn't matter. I'm foiled by Google's decision to randomize their color choices between each iteration. The limited information I obtain does me no good, and there's no way for me to link the data I learn between interactions.

What makes it 'zero knowledge'?

I've claimed to you that this protocol leaks no information about Google's solution. But don't let me get away with this! The first rule of modern cryptography is never to trust people who claim such things without proof.

Goldwasser, Micali and Rackoff proposed the following three properties that every zero-knowledge protocol must satisfy. Stated informally, they are:
  1. Completeness. If Google is telling the truth, then they will eventually convince me (at least with high probability).
  2. Soundness. Google can only convince me if they're actually telling the truth. 
  3. Zero-knowledgeness. (Yes it's really called this.) I don't learn anything else about Google's solution.
We've already discussed the argument for completeness. The protocol will eventually convince me (with a negligible error probability), provided we run it enough times. Soundness is also pretty easy to show here. If Google ever tries to cheat me, I will detect their treachery with overwhelming probability.

The hard part here is the 'zero knowledgeness' property. To do this, we need to conduct a very strange thought experiment.

A thought experiment (with time machines)

First, let's start with a crazy hypothetical. Imagine that Google's engineers aren't quite as capable as people make them out to be. They work on this problem for weeks and weeks, but they never manage to come up with a solution. With twelve hours to go until showtime, the Googlers get desperate. They decide to trick me into thinking they have a coloring for the graph, even though they don't.

Their idea is to sneak into the GoogleX workshop and borrow Google's prototype time machine. Initially the plan is to travel backwards a few years and use the extra working time to take another crack at solving the problem. Unfortunately it turns out that, like most Google prototypes, the time machine has some limitations. Most critically: it's only capable of going backwards in time four and a half minutes.

So using the time machine to manufacture more working time is out. But it turns out that even this very limited technology can still be used to trick me.

The plan is diabolically simple. Since Google doesn't actually know a valid coloring for the graph, they'll simply color the paper with a bunch of random colors, then put the hats on. If by sheer luck, I challenge them on a pair of vertices that happen to be different colors, everyone will heave a sigh of relief and we'll continue with the protocol. So far so good.

Inevitably, though, I'm going to pull off a pair of hats and discover two vertices of the same color. In the normal protocol, Google would now be totally busted. And this is where the time machine comes in. Whenever Google finds themselves in this awkward situation, they simply fix it. That is, a designated Googler pulls a switch, 'rewinds' time about four minutes, and the Google team recolors the graph with a completely new random solution. Now they let time roll forward and try again.

In effect, the time machine allows Google to 'repair' any accidents that happen during their bogus protocol execution, which makes the experience look totally legitimate to me. Since bad challenge results will occur only 1/3 of the time, the expected runtime of the protocol (from Google's perspective) is only moderately greater than the time it takes to run the honest protocol. From my perspective I don't even know that the extra time machine trips are happening.

This last point is the most important. In fact, from my perspective, being unaware that the time machine is in the picture, the resulting interaction is exactly the same as the real thing. It's statistically identical. And yet it's worth pointing out again that in the time machine version, Google has absolutely no information about how to color the graph.

What the hell is the point of this?

What we've just shown is an example of a simulation. Note that in a world where time runs only forward and nobody can trick me with a time machine, the hat-based protocol is complete and sound, meaning that after E^2 rounds I should be convinced (with all but negligible probability) that the graph really is colorable and that Google is putting valid inputs into the protocol.

What we've just shown is that if time doesn't run only forward -- specifically, if Google can 'rewind' my view of time -- then they can fake a valid protocol run even if they have no information at all about the actual graph coloring.

From my perspective, what's the difference between the two protocol transcripts? When we consider the statistical distribution of the two, there's no difference at all. Both convey exactly the same amount of useful information.

Believe it or not, this proves something very important.

Specifically, assume that I (the Verifier) have some strategy that 'extracts' useful information about Google's coloring after observing an execution of the honest protocol. Then my strategy should work equally well in the case where I'm being fooled with a time machine. The protocol runs are, from my perspective, statistically identical. I physically cannot tell the difference.

Thus if the amount of information I can extract is identical in the 'real experiment' and the 'time machine experiment', yet the amount of information Google puts into the 'time machine' experiment is exactly zero -- then this implies that even in the real world the protocol must not leak any useful information.

Thus it remains only to show that computer scientists have time machines. We do! (It's a well-kept secret.)

Getting rid of the hats (and time machines)

Of course we don't actually want to run a protocol with hats. And even Google (probably?) doesn't have a literal time machine.

To tie things together, we first need to bring our protocol into the digital world. This requires that we construct the digital equivalent of a 'hat': something that hides a digital value, while simultaneously 'binding' (or 'committing') the maker to it, so she can't change her mind after the fact.

Fortunately we have a perfect tool for this application. It's called a digital commitment scheme. A commitment scheme allows one party to 'commit' to a given message while keeping it secret, and then later 'open' the resulting commitment to reveal what's inside. They can be built out of various ingredients, including (strong) cryptographic hash functions.******

Given a commitment scheme, we now have all the ingredients we need to run the zero knowledge protocol electronically. The Prover first encodes its vertex colorings as a set of digital messages (for example, the numbers 0, 1, 2), then generates digital commitments to each one. These commitments get sent over to the Verifier. When the Verifier challenges on an edge, the Prover simply reveals the opening values for the commitments corresponding to the two vertices.
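
Here's a sketch of what one challenge round looks like in code, built on the hash-based commitment described in the final footnote (C = Hash(salt || x)). The graph, coloring and byte encoding are all illustrative choices, not anyone's actual protocol:

```python
# One round of the digital 'hat' protocol using hash-based commitments.
import hashlib, hmac, os, random

def commit(value: bytes):
    salt = os.urandom(32)                       # long random 'salt'
    return hashlib.sha256(salt + value).digest(), salt

def verify(c, salt, value):
    return hmac.compare_digest(c, hashlib.sha256(salt + value).digest())

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
coloring = {0: 0, 1: 1, 2: 2, 3: 0}             # the Prover's secret 3-coloring

# Prover: randomly shuffle the three colors, then 'put the hats on' by
# committing to every vertex; only the commitments go to the Verifier.
perm = random.sample(range(3), 3)
openings = {v: commit(bytes([perm[c]])) for v, c in coloring.items()}
commitments = {v: c for v, (c, _) in openings.items()}

# Verifier: challenge one random edge. Prover: open just those two hats.
u, v = random.choice(edges)
col_u, col_v = bytes([perm[coloring[u]]]), bytes([perm[coloring[v]]])
salt_u, salt_v = openings[u][1], openings[v][1]

# Verifier checks the openings are genuine and the two colors differ.
assert verify(commitments[u], salt_u, col_u)
assert verify(commitments[v], salt_v, col_v)
assert col_u != col_v
```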

So we've managed to eliminate the hats. But how do we prove that this protocol is zero knowledge?

Fortunately now that we're in the digital world, we no longer need a real time machine to prove things about this protocol. A key trick is to specify in our setting that the protocol is not going to be run between two people, but rather between two different computer programs (or, to be more formal, probabilistic Turing machines.)

What we can now prove is the following theorem: if you could ever come up with a computer program (for the Verifier) that extracts useful information after participating in a run of the protocol, then it would be possible to use a 'time machine' on that program in order to make it extract the same amount of useful information from a 'fake' run of the protocol where the Prover doesn't put in any information to begin with.

And since we're now talking about computer programs, it should be obvious that rewinding time isn't such an extraordinary feat at all. In fact, we rewind computer programs all the time. For example, consider using virtual machine software with a snapshot capability.

Example of rewinding through VM snapshots. An initial VM is played forward, rewound to an initial snapshot, then execution is forked to a new path.
Even if you don't have fancy virtual machine software, any computer program can be 'rewound' to an earlier state, simply by starting the program over again from the beginning and feeding it exactly the same inputs. Provided that the inputs -- including all random numbers -- are fixed, the program will always follow the same execution path. Thus you can rewind a program just by running it from the start and 'forking' its execution when it reaches some desired point.
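
Here's a trivial demonstration of that idea. The 'prover' and its random tape are made up, but the point stands: a fixed seed makes the execution perfectly reproducible, so a simulator can fork wherever it likes:

```python
# Rewinding without a time machine: fix a program's inputs and random
# tape, and re-running it from the start retraces the same execution path.
import random

def prover_trial(random_tape_seed):
    rng = random.Random(random_tape_seed)          # the fixed random tape
    return [rng.randrange(3) for _ in range(5)]    # 'random' vertex colors

first_run = prover_trial(1234)
rewound_run = prover_trial(1234)    # 'rewind': same seed, same path
assert first_run == rewound_run     # execution is perfectly reproducible
```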

Ultimately what we get is the following theorem. If there exists any Verifier computer program that successfully extracts information by interactively running this protocol with some Prover, then we can simply use the rewinding trick on that program to commit to a random solution, then 'trick' the Verifier by rewinding its execution whenever we can't answer its challenge correctly. The same logic holds as we gave above: if such a Verifier succeeds in extracting information after running the real protocol, then it should be able to extract the same amount of information from the simulated, rewinding-based protocol. But since there's no information going into the simulated protocol, there's no information to extract. Thus the information the Verifier can extract must always be zero.

Ok, so what does this all mean?

So let's recap. We know that the protocol is complete and sound, based on our analysis above. The soundness argument holds in any situation where we know that nobody is fiddling with time -- that is, the Verifier is running normally and nobody is rewinding its execution.

At the same time, the protocol is also zero knowledge. To prove this, we showed that any Verifier program that succeeds in extracting information must also be able to extract information from a protocol run where rewinding is used and no information is available in the first place. Which leads to an obvious contradiction, and tells us that the protocol can't leak information in either situation.

There's an important benefit to all this. Since it's trivial for anyone to 'fake' a protocol transcript, even after Google proves to me that they have a solution, I can't re-play a recording of the protocol transcript to prove anything to anyone else (say, a judge). That's because the judge would have no guarantee that the video was recorded honestly, and that I didn't simply edit in the same way Google might have done using the time machine. This means that protocol transcripts themselves contain no information. The protocol is only meaningful if I myself participated, and I can be sure that it happened in real time.

Proofs for all of NP!

If you've made it this far, I'm pretty sure you're ready for the big news. Which is that 3-coloring cellphone networks isn't all that interesting of a problem -- at least, not in and of itself.

The really interesting thing about the 3-coloring problem is that it's in the class NP-complete. To put this informally, the wonderful thing about such problems is that any other problem in the class NP can be translated into an instance of that problem.

In a single stroke, this result -- due to Goldreich, Micali and Wigderson -- proves that 'efficient' ZK proofs exist for a vast class of useful statements, many of which are way more interesting than assigning frequencies to cellular networks. You simply find a statement (in NP) that you wish to prove, such as our hash function example from above, then translate it into an instance of the 3-coloring problem. At that point you simply run the digital version of the hat protocol.

In summary, and next time

Of course, actually running this protocol for interesting statements would be an insanely silly thing for anyone to do, since the cost of doing so would include the total size of the original statement and witness, plus the reduction cost to convert it into a graph, plus the E^2 protocol rounds you'd have to conduct in order to convince someone that the proof is valid. Theoretically this is 'efficient', since the total cost of the proof would be polynomial in the input size, but in practice it would be anything but.

So what we've shown so far is that such proofs are possible. It remains for us to actually find proofs that are practical enough for real-world use.

In the next post I'll talk about some of those -- specifically, the efficient proofs that we use for various useful statements. I'll give some examples (from real applications) where these things have been used. Also at reader request: I'll also talk about why I dislike SRP so much.

Notes:

* Formally, the goal of an interactive proof is to convince the Verifier that a particular string belongs to some language. Typically the Prover is very powerful (unbounded), but the Verifier is limited in computation.

** This example is based on the original solution of Goldwasser, Micali and Rackoff, and the teaching example using hats is based on an explanation by Silvio Micali. I take credit only for the silly mistakes.

****** A simple example of a commitment can be built using a hash function. To commit to the value "x" simply generate some (suitably long) string of random numbers, which we'll call 'salt', and output the commitment C = Hash(salt || x). To open the commitment, you simply reveal 'x' and 'salt'. Anyone can check that the original commitment is valid by recomputing the hash. This is secure under some (moderately strong) assumptions about the function itself.

Tuesday, October 28, 2014

Attack of the Week: Unpicking PLAID

A few years ago I came across an amusing Slashdot story: 'Australian Gov't offers $560k Cryptographic Protocol for Free'. The story concerned a protocol developed by Australia's Centrelink, the equivalent of our Health and Human Services department, that was wonderfully named the Protocol for Lightweight Authentication of ID, or (I kid you not), 'PLAID'.

Now to me this headline did not inspire a lot of confidence. I'm a great believer in TANSTAAFL -- in cryptographic protocol design and in life. I figure if someone has to use 'free' to lure you in the door, there's a good chance they're waiting on the other side with a hammer and a bottle of chloroform, or whatever the cryptographic equivalent might be.

A quick look at PLAID didn't disappoint. The designers used ECB like it was going out of style, did unadvisable things with RSA encryption, and that was only the beginning.

What PLAID was not, I thought at the time, was bad to the point of being exploitable. Moreover, I got the idea that nobody would use the thing. It appears I was badly wrong on both counts.

This is apropos of a new paper authored by Degabriele, Fehr, Fischlin, Gagliardoni, Günther, Marson, Mittelbach and Paterson entitled 'Unpicking Plaid: A Cryptographic Analysis of an ISO-standards-track Authentication Protocol'. Not to be cute about it, but this paper shows that PLAID is, well, bad.

As is typical for this kind of post, I'm going to tackle the rest of the content in a 'fun' question and answer format.
What is PLAID?
In researching this blog post I was somewhat amazed to find that Australians not only have universal healthcare, but that they even occasionally require healthcare. That this surprised me is likely due to the fact that my knowledge of Australia mainly comes from the first two Crocodile Dundee movies.

It seems that not only do Australians have healthcare, but they also have access to a 'smart' ID card that allows them to authenticate themselves. These contactless smartcards run the proprietary PLAID protocol, which handles all of the ugly steps in authenticating the bearer, while also providing some very complicated protections to prevent user tracking.

This protocol has been standardized as Australian standard AS-5185-2010 and is currently "in the fast track standardization process for ISO/IEC 25185-1.2". You can obtain your own copy of the standard for a mere 118 Swiss Francs, which my currency converter tells me is entirely too much money. So I'll do my best to provide the essential details in this post -- and many others are in the research paper.
How does PLAID work?
PLAID is primarily an identification and authentication protocol, but it also attempts to offer its users a strong form of privacy. Specifically, it attempts to ensure that only authorized readers can scan your card and determine who you are.

This is a laudable goal, since contactless smartcards are 'promiscuous', meaning that they can be scanned by anyone with the right equipment. Countless research papers have been written on tracking RFID devices, so the PLAID designers had to think hard about how they were going to prevent this sort of issue.

Before we get to what steps the PLAID designers took, let's talk about what one shouldn't do in building such systems.

Let's imagine you have a smartcard talking to a reader where anyone can query the card. The primary goal of an authentication protocol is to perform some sort of mutual authentication handshake and derive a session key that the card can use to exchange sensitive data. The naive protocol might look like this:

Naive approach to an authentication protocol. The card identifies itself before the key agreement protocol is run, so the reader can look up a card-specific key.
The obvious problem with this protocol is that it reveals the card ID number to anyone who asks. Yet such identification appears necessary, since each card will have its own secret key material -- and the reader must know the card's key in order to run an authenticated key agreement protocol.

PLAID solves this problem by storing a set of RSA public keys corresponding to various authorized applications. When the reader says "Hi, I'm a hospital", a PLAID card determines which public key it uses to talk to hospitals, then encrypts data under the key and sends it over. Only a legitimate hospital should know the corresponding RSA secret key to decrypt this value.
So what's the problem here?
PLAID's approach would be peachy if there were only one public key to deal with. However PLAID cards can be provisioned to support many applications (called 'capabilities'). For example, a citizen who routinely finds himself wrestling crocodiles might possess a capability unique to the Reptile Wound Treatment unit of a local hospital.* If the card responds to this capability, it potentially leaks information about the bearer.

To solve this problem, PLAID cards do some truly funny stuff.

Specifically, when a reader queries the card, the reader initially transmits a set of capabilities that it will support (e.g., 'hospital', 'bank', 'social security center'). If the PLAID card has been provisioned with a matching public key, it goes ahead and uses it. If no matching key is found, however, the card does not send an error -- since this would reveal user-specific information. Instead, it fakes a response by encrypting junk under a special 'dummy' RSA public key (called a 'shill key') that's stored within the card. And herein lies the problem.

You see, the 'shill key' is unique to each card, which presents a completely new avenue for tracking individual cards. If an attacker can induce an error and subsequently fingerprint the resulting RSA ciphertext -- that is, figure out which shill key was used to encipher it -- they can potentially identify your card the next time they encounter you.

A portion of the PLAID protocol (source). The reader (IFD) is on the left, the card (ICC) is on the right.
Thus the problem of tracking PLAID cards devolves to one of matching RSA ciphertexts to unknown public keys. The PLAID designers assumed this would not be possible. What Degabriele et al. show is that they were wrong.
What do German Tanks have to do with RSA encryption?

To distinguish the RSA moduli of two different cards, the researchers employed an old solution to a problem called the German Tank Problem. As the name implies, this is a real statistical problem that the Allies ran up against during WWII.

The problem can be described as follows:

Imagine that a factory is producing tanks, where each tank is printed with a sequential serial number in the ordered sequence 1, 2, ..., N. Through battlefield captures you then obtain a small and (presumably) random subset of k tanks. From the recovered serial numbers, your job is to estimate N, the total number of tanks produced by the factory.

Happily, this is the rare statistical problem with a beautifully simple solution.** Let m be the maximum serial number in the set of k tanks you've observed. To obtain a rough guess Ñ for the total number of tanks produced, you can simply compute the following formula:

Ñ = m + m/k - 1
So what's this got to do with guessing an unknown RSA key?

Well, this turns out to be another instance of exactly the same problem. Imagine that I can repeatedly query your card with bogus 'capabilities' and thus cause it to enter its error state. To foil tracking, the card will send me a random RSA ciphertext encrypted with its (card-specific) "shill key". I don't know the public modulus corresponding to your key, but I do know that the ciphertext was computed using the standard RSA encryption formula m^e mod N.

My goal, as it was with the tanks, is to make a guess for N.

It's worth pointing out that RSA is a permutation, which means that, provided the plaintext messages are randomly distributed, the ciphertexts will be randomly distributed as well. So all I need to do is collect a number of ciphertexts and apply the equation above. The resulting guess Ñ should then serve as a (fuzzy) identifier for any given card.

Results for identifying a given card in a batch of (up to) 100 cards. Each card was initially 'fingerprinted' by collecting k1=1000 RSA ciphertexts. Arbitrary cards were later identified by collecting varying numbers of ciphertexts (k2).
Now obviously this isn't the whole story -- the value Ñ isn't exact, and you'll get different levels of error depending on how many ciphertexts you collect, and how many cards you have to distinguish amongst. But as the chart above shows, it's possible to identify a specific card within a real system (containing 100 cards) with reasonable probability.
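
If you'd like to play with the idea, here's a sketch of the fingerprinting attack on toy numbers. I'm modeling the 'ciphertexts' as uniform values in [0, N) -- which the permutation argument above justifies -- and the two moduli are made-up stand-ins for card-specific shill keys, not real RSA moduli:

```python
# German-tank fingerprinting of an unknown modulus from ciphertexts alone.
import random

def estimate_modulus(ciphertexts):
    m, k = max(ciphertexts), len(ciphertexts)
    return m + m // k - 1            # the German Tank estimator from above

N1 = 10**12 + 39                     # card A's (toy) shill modulus
N2 = 8 * 10**11 + 57                 # card B's (toy) shill modulus

# 'Fingerprint' card A by collecting 1000 error-state ciphertexts.
samples = [random.randrange(N1) for _ in range(1000)]
guess = estimate_modulus(samples)

# The estimate lands far closer to N1 than to N2, identifying the card.
assert abs(guess - N1) < abs(guess - N2)
```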
But that's not realistic at all. What if there are other cards in play?
It's important to note that real life is nothing like a laboratory experiment. The experiment above considered a 'universe' consisting of only 100 cards, required an initial 'training period' of 1000 scans for a given card, and subsequent recognition demanded anywhere from 50 to 1000 scans of the card.

Since the authentication time for the cards is about 300 milliseconds per scan, this means that even the minimal number (50) of scans still requires about 15 seconds, and only produces the correct result with about 12% probability. This probability goes up dramatically the more scans you get to make.

Nonetheless, even the 50-scan result is significantly better than guessing, and with more concerted scans can be greatly improved. What this attack primarily serves to prove is that homebrew solutions are your enemy. Cooking up a clever approach to foiling tracking might seem like the right way to tackle a problem, but sometimes it can make you even more vulnerable than you were before.
How should one do this right?
The basic property that the PLAID designers were trying to achieve with this protocol is something called key privacy. Roughly speaking, a key private cryptosystem ensures that an attacker cannot link a given ciphertext (or collection of ciphertexts) to the public key that created them -- even if the attacker knows the public key itself.

There are many excellent cryptosystems that provide strong key privacy. Ironically, most are more efficient than RSA; one solution to the PLAID problem is simply to switch to one of these. For example, many elliptic-curve (Diffie-Hellman or Elgamal) solutions will generally provide strong key privacy, provided that all public keys in the system are set in the same group.

A smartcard encryption scheme based on, say, Curve25519 would likely avoid most of the problems presented above, and might also be significantly faster to boot.
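For the sake of illustration, here's roughly what such an exchange looks like with the pyca 'cryptography' package. The key names and 'info' label are mine, and this is a sketch of the approach rather than a drop-in protocol:

```python
# An ephemeral X25519 exchange: everything the card transmits is a fresh
# 32-byte public value plus symmetric ciphertext, which is statistically
# unlinkable to any long-term key -- the key privacy PLAID's design lacks.
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

reader_key = X25519PrivateKey.generate()       # the reader's long-term key
reader_public = reader_key.public_key()        # provisioned onto cards

# Card side: a fresh ephemeral key for every session.
ephemeral = X25519PrivateKey.generate()
shared = ephemeral.exchange(reader_public)
session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                   salt=None, info=b"toy-card-session").derive(shared)
# The card sends ephemeral.public_key() over the air; the reader runs the
# same exchange with its private key to derive the identical session_key.
```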
What else should I know about PLAID?
There are a few other flaws in the PLAID protocol that make the paper worthy of a careful read -- if only to scare you away from designing your own protocols.

In addition to the shill key fingerprinting attacks, the authors also show how to identify the set of capabilities that a card supports, using a relatively efficient binary search approach. Beyond this, there are also many significantly more mundane flaws due to improper use of cryptography. 

By far the ugliest of these is the protocol's lack of proper RSA-OAEP padding, which may present potential (but implementation-specific) vulnerabilities to Bleichenbacher's attack. There's also some almost de rigueur misuse of CBC mode encryption, and a handful of other flaws that should serve as audit flags if you ever run into them in the wild.


At the end of the day, bugs in protocols like PLAID mainly help to illustrate the difficulty of designing secure protocols from scratch. They also keep cryptographers busy with publications, so perhaps I shouldn't complain too much.
You should probably read up a bit on Australia before you post again.
I would note that this really isn't a question. But it's absolutely true. And to this end: I would sincerely like to apologize to any Australian citizens who may have been offended by this post.

Notes:

* This capability is not explicitly described in the PLAID specification.

** The solution presented here -- and used in the paper -- is the frequentist approach to the problem. There is also a Bayesian approach, but it isn't required for this application.

Tuesday, October 14, 2014

Attack of the week: POODLE

Believe it or not, there's a new attack on SSL. Yes, I know you're thunderstruck. Let's get a few things out of the way quickly.

First, this is not another Heartbleed. It's bad, but it's not going to destroy the Internet. Also, it applies only to SSLv3, which is (in theory) an obsolete protocol that we all should have ditched a long time ago. Unfortunately, we didn't.

Anyway, enough with the good news. Let's get to the bad.

The attack is called POODLE, and it was developed by Bodo Möller, Thai Duong and Krzysztof Kotowicz of Google. To paraphrase Bruce Schneier, attacks only get better -- they never get worse. The fact that this attack is called POODLE also tells us that attack names do get worse. But I digress.

The rough summary of POODLE is this: it allows a clever attacker who can (a) control the Internet connection between your browser and the server, and (b) run some code (e.g., script) in your browser to potentially decrypt authentication cookies for sites such as Google, Yahoo and your bank. This is obviously not a good thing, and unfortunately the attack is more practical than you might think. You should probably disable SSLv3 everywhere you can. Sadly, that's not so easy for the average end user.

To explain the details, I'm going to use the usual 'fun' question and answer format I employ for attacks like these.
What is SSL?
SSL is probably the most important security protocol on the Internet. It's used to encrypt connections between two different endpoints, most commonly your web browser and a web server. We mostly refer to SSL by the dual moniker SSL/TLS, since the protocol suite known as Secure Sockets Layer was upgraded and renamed to Transport Layer Security back in 1999.

This bug has nothing to do with TLS, however. It's purely a bug in the old pre-1999 SSL, and specifically version 3 -- something we should have ditched a long time ago. Unfortunately, for legacy reasons many browsers and servers still support SSLv3 in their configurations. It turns out that when you try to turn this option off, a good portion of the Internet stops working correctly, thanks to older browsers and crappy load balancers, etc.

As a result, many modern browsers and servers continue to support SSLv3 as an option. The worst part of this is that in many cases an active attacker can actually trigger a fallback. That is, even if both the server and client support more modern protocols, as long as they're willing to support SSLv3, an active attacker can force them to use this old, terrible protocol. In many cases this fallback is transparent to the user.
What's the matter with SSL v3?
So many things it hurts to talk about. For our purposes we need to focus on just one. This has to do with the structure of the encryption padding used in the CBC-mode ciphersuites of SSLv3.

SSL data is sent in 'record' structures, where each record is first authenticated using a MAC. It's subsequently enciphered using a block cipher (like 3DES or AES) in CBC mode. This MAC-then-encrypt design has been the cause of much heartache in the past. It's also responsible for the problems now.

Here's the thing: CBC mode encryption requires that the input plaintext length be equal to a multiple of the cipher's block size (8 bytes in the case of 3DES, 16 bytes for AES). To make sure this is the case, SSL implementations add 'padding' to the plaintext before encrypting it. The padding can be up to one cipher block in length, is not covered by the MAC, and always ends with a single byte denoting the length of the padding that was added.

In SSLv3, the contents of the remaining padding bytes are unspecified. This is the problem that will vex us here.
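
To make this concrete, here's a tiny sketch (my own simplification -- real SSLv3 records also carry headers and a MAC) of what this padding scheme looks like in Python:

    # A simplified sketch of SSLv3-style CBC padding. The final byte
    # gives the padding length; the padding bytes themselves are
    # unspecified, so we fill them with random junk here.
    import os

    BLOCK = 16  # AES block size

    def sslv3_pad(plaintext):
        pad_len = BLOCK - (len(plaintext) % BLOCK)  # 1..16 bytes added
        return plaintext + os.urandom(pad_len - 1) + bytes([pad_len - 1])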
How does the attack work?
Let's imagine that I'm an active attacker who is able to obtain a CBC-encrypted record containing an interesting message like a cookie. I want to learn a single byte of this cookie -- and I'm willing to make the assumption that this byte happens to live at the end of a cipher block boundary.

(Don't worry about how I know that the byte I want to learn is in this position. Just accept this as a given for now.)

Imagine further that the final block of the record in question contains a full block of padding. If we're using AES as our cipher, this means that the last byte of the plaintext of the final block contains a '15' value, since there are 15 bytes of padding. The preceding 15 bytes of said block contain arbitrary values that the server will basically strip off and ignore upon decryption, since SSLv3 doesn't specify what they should contain. (NB: TLS does, which prevents this issue.)

The attack works like this. Since I control the Internet connection, I can identify the enciphered block that I want to learn within an encrypted record. I can then substitute (i.e., move) this block in place of the final block that should contain only padding.

When the server receives this new enciphered record, it will go ahead and attempt to decrypt the final block (which I'll call C_n) using the CBC decryption equation, which looks like this:
Decrypted final block := Decipher(C_n) XOR C_{n-1}
Note that C_{n-1} is the second-to-last block of the encrypted record.

If the decrypted final block does not contain a '15' in the final position, the server will assume either that the block is bogus (too much padding) or that there's less padding in the message than we intended. In the former case it will simply barf. In the latter case it will assume that the meaningful message is longer than it actually is, which should trigger an error in decryption since MAC verification will fail. This should also terminate the SSL connection.

Indeed, this is by far the most likely outcome of our experiment, since the deciphered last byte is essentially random -- thus failure will typically occur 255 out of every 256 times we try this experiment. In this case we have to renegotiate the handshake and try again.

Every once in a while we'll get lucky. In 1/256 of the cases, the deciphered final block will contain a 15 byte in the final position, and the server will accept this as a valid padding length. The preceding fifteen bytes have probably been changed as well, but the server will simply strip off and ignore those values -- since SSLv3 doesn't care about the contents of the padding. No other parts of the ciphertext have been altered, so decryption will work perfectly and the server should report no errors.

This case is deeply meaningful to us. If it happens, we know that the final byte of Decipher(C_n), XORed with the final byte of the preceding ciphertext block C_{n-1}, equals '15'. That immediately gives us the final byte of Decipher(C_n): it's just 15 XOR (the final byte of C_{n-1}). And because CBC decryption in the block's original position XORs that same value with the block that originally preceded it, the plaintext byte we're after is simply 15 XOR (the final byte of C_{n-1}) XOR (the final byte of the ciphertext block that originally preceded our target block).

Voila, in this case -- which occurs with probability 1/256 -- we've decrypted a single byte of the cookie.

The important thing to know is that if at first we don't succeed, we can try, try again. That's because each time we fail, we can re-run the SSL handshake (which changes the encryption key) and try the attack again. As long as the cookie byte we're attacking stays in the same position, we can continue our attempts until we get lucky. The expected number of attempts needed for success is 256.
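
If you want to see the whole thing end-to-end, here's a toy simulation I put together. It is emphatically not real SSLv3 -- there's no MAC and no real handshake; the 'server' models only the fatal padding check -- but it exercises exactly the logic described above. It assumes the Python 'cryptography' package.

    # Toy POODLE simulation: no MAC, no real handshake -- the 'server'
    # models only SSLv3's fatal padding check. Requires 'cryptography'.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    BLOCK = 16

    def encrypt_record(key, plaintext):
        # Assumes block-aligned plaintext, so a full padding block
        # (15 junk bytes plus a length byte of 15) gets appended.
        assert len(plaintext) % BLOCK == 0
        padded = plaintext + os.urandom(BLOCK - 1) + bytes([BLOCK - 1])
        iv = os.urandom(BLOCK)
        enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        return iv + enc.update(padded) + enc.finalize()

    def server_accepts(key, record):
        # SSLv3's sin: check only the padding-length byte, ignore the rest.
        iv, ct = record[:BLOCK], record[BLOCK:]
        dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
        plaintext = dec.update(ct) + dec.finalize()
        return plaintext[-1] == BLOCK - 1  # wrong value = MAC failure in real SSLv3

    def recover_byte(secret, block_idx):
        # Learn the last plaintext byte of block `block_idx` (block 0 = IV).
        # Each loop iteration models a fresh handshake with a fresh key.
        while True:
            key = os.urandom(16)
            rec = encrypt_record(key, secret)
            blocks = [rec[i:i + BLOCK] for i in range(0, len(rec), BLOCK)]
            forged = blocks[:-1] + [blocks[block_idx]]  # move target to the end
            if server_accepts(key, b"".join(forged)):
                # Decipher(target)[-1] XOR C_{n-1}[-1] == 15, so unwind the XORs:
                return 15 ^ forged[-2][-1] ^ blocks[block_idx - 1][-1]

    cookie = b"session=SECRETCOOKIEVALUE0123456"  # 32 bytes, block-aligned
    print(chr(recover_byte(cookie, 1)))           # prints 'O', byte 16 of the cookie

Each pass through the loop stands in for a fresh SSL handshake, so the expected number of 'handshakes' before the print fires is about 256 -- just as described above.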
We've got one byte, how do we get the rest?
The ability to recover a single byte doesn't seem so useful, but in fact it's all we need to decipher the entire cookie -- if we're able to control the cookie's alignment and location within the enciphered record. In this case, we can simply move one byte after another into that critical final-byte-of-the-cipher-block location and run the attack described above.

One way to do this is to trick the victim's browser into running some Javascript we control. This script will make SSL POST requests to a secure site like Google. Each time it does so, it will transmit a request path first, followed by an HTTP cookie and other headers, followed by a payload it controls.
Source: Möller et al.
Since the script controls the path and payload, by varying these values and knowing the size of the intermediate headers, the script can systematically align each specific byte of the cookie to any location it wants. It can also adjust the padding length to ensure that the final block of the record contains 16 bytes of padding.

This means that our attack can now be used to decrypt an entire cookie, with an average of 256 requests per cookie byte. That's not bad at all.
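
Concretely, the length bookkeeping is just modular arithmetic. Here's a hedged sketch of the computation such a script might perform -- the parameter names are mine, not anything from the POODLE paper:

    # How the attacker's script might align a chosen cookie byte.
    # All names and parameters here are illustrative.
    BLOCK = 16

    def plan_request(prefix_len, total_len):
        # prefix_len: plaintext bytes before the target cookie byte
        # total_len:  total plaintext length (headers + cookie + body + MAC),
        #             both measured before any attacker padding is added.
        path_pad = (-(prefix_len + 1)) % BLOCK        # target byte ends a block
        body_pad = (-(total_len + path_pad)) % BLOCK  # force a full final padding block
        return path_pad, body_pad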
So should we move to West Virginia and stock up on canned goods?
Portions of the original SSL v3 specification being reviewed at IETF 90.
Maybe. But I'm not so sure. For a few answers on what to do next, see Adam Langley and Rob Graham's blog posts on this question.

Note that this entire vulnerability stems from the fact that SSLv3 is older than Methuselah. In fact, there are voting-age children who are younger than SSLv3. And that's worrying.

The obvious and correct solution to this problem is to find and kill SSLv3 anywhere it lurks. In fact, this is something we should have done in the early 2000s, if not sooner. If we do it now, this whole problem goes away.

The problem with the obvious solution is that our aging Internet infrastructure is still loaded with crappy browsers and servers that can't function without SSLv3 support. Browser vendors don't want their customers to hit a blank wall whenever they access a server or load balancer that supports only SSLv3, so they enable fallback. Server administrators don't want to lock out the critical IE6 market, so they also support SSLv3. And we all suffer.

Hopefully this will be the straw that breaks the camel's back and finally gets us to abandon obsolete protocols like SSLv3. But nobody ever went bankrupt betting on insecurity. It's possible that ten years from now we'll still be talking about ways to work around POODLE and its virulent flesh-eating offspring. All we can do is hope that reason will prevail.

Saturday, October 4, 2014

Why can't Apple decrypt your iPhone?

Last week I wrote about Apple's new default encryption policy for iOS 8. Since that piece was intended for general audiences I mostly avoided technical detail. But since some folks (and apparently the Washington Post!) are still wondering about the nitty-gritty details of Apple's design, I thought it might be helpful to sum up what we know and noodle about what we don't.

To get started, it's worth pointing out that disk encryption is hardly new with iOS 8. In fact, Apple's operating system has enabled some form of encryption since before iOS 7. What's happened in the latest update is that Apple has decided to protect much more of the interesting data on the device under the user's passcode. This includes photos and text messages -- things that were not previously passcode-protected, and which police very much want access to.*
Excerpt from Apple iOS Security Guide, 9/2014.
So to a large extent the 'new' feature Apple is touting in iOS 8 is simply that they're encrypting more data. But it's also worth pointing out that newer iOS devices -- those with an "A7 or later A-series processor" -- also add substantial hardware protections to thwart device cracking.

In the rest of this post I'm going to talk about how these protections may work and how Apple can realistically claim not to possess a back door.

One caveat: I should probably point out that Apple isn't known for showing up at parties and bragging about their technology -- so while a fair amount of this is based on published information provided by Apple, some of it is speculation. I'll try to be clear where one ends and the other begins.

Password-based encryption 101

Normal password-based file encryption systems take in a password from a user, then apply a key derivation function (KDF) that converts a password (and some salt) into an encryption key. This approach doesn't require any specialized hardware, so it can be securely implemented purely in software provided that (1) the software is honest and well-written, and (2) the chosen password is strong, i.e., hard to guess.

The problem here is that nobody ever chooses strong passwords. In fact, since most passwords are terrible, it's usually possible for an attacker to break the encryption by working through a 'dictionary' of likely passwords and testing to see if any decrypt the data. To make this really efficient, password crackers often use special-purpose hardware that takes advantage of parallelization (using FPGAs or GPUs) to massively speed up the process.

Thus a common defense against cracking is to use a 'slow' key derivation function like PBKDF2. These algorithms are designed to be deliberately resource-intensive, which slows down normal login attempts only slightly -- but hits crackers much harder. Unfortunately, modern cracking rigs can often defeat a high iteration count by simply throwing more hardware at the problem. One answer is to use memory-hard KDFs like scrypt, which are expensive to parallelize in custom hardware -- but this is not the direction that Apple has gone.
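
For the curious, here's what slow, memory-hard derivation looks like using Python's built-in scrypt binding; the parameters below are purely illustrative, not a recommendation:

    # Deriving a key with scrypt (memory-hard). Parameters illustrative only.
    import hashlib, os

    salt = os.urandom(16)
    key = hashlib.scrypt(b"correct horse battery staple",
                         salt=salt, n=2**14, r=8, p=1, dklen=32)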

How Apple's encryption works

Apple doesn't use scrypt. Their approach is to add a 256-bit device-unique secret key called a UID to the mix, and to store that key in hardware where it's hard to extract from the phone. Apple claims that it does not record these keys nor can it access them. On recent devices (with A7 chips), this key and the mixing process are protected within a cryptographic co-processor called the Secure Enclave.


The Apple Key Derivation function 'tangles' the password with the UID key by running both through PBKDF2-AES -- with an iteration count tuned to require about 80ms on the device itself.** The result is the 'passcode key'. That key is then used as an anchor to secure much of the data on the phone.
Overview of Apple key derivation and encryption (iOS Security Guide, p.10).   
Since only the device itself knows the UID -- and the UID can't be removed from the Secure Enclave -- all password cracking attempts have to run on the device itself. That rules out the use of FPGAs or ASICs to crack passwords. Of course Apple could write custom firmware that attempts to crack the keys on the device, but even in the best case such cracking would be quite time consuming, thanks to the 80ms PBKDF2 timing.
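
To give a flavor of what this 'tangling' might look like, here's a deliberately speculative sketch. The real derivation runs inside the hardware, using PBKDF2 with AES keyed by the UID; I'm substituting an HMAC-based PBKDF2 in software, and every name below is mine, not Apple's:

    # Speculative sketch of passcode/UID 'tangling'. The real KDF runs in
    # hardware (PBKDF2 with AES keyed by the UID); this is an analogy only.
    import hashlib

    DEVICE_UID = b"\x00" * 32   # stand-in for the 256-bit hardware-fused UID

    def passcode_key(passcode, salt, iterations=50_000):
        # Mix the device-unique UID into the derivation, so the work
        # can only be done on (or with the cooperation of) this device.
        return hashlib.pbkdf2_hmac("sha256",
                                   passcode.encode() + DEVICE_UID,
                                   salt, iterations, dklen=32)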

(Apple pegs such cracking attempts at 5 1/2 years for a random 6-character password consisting of lowercase letters and numbers. PINs will obviously take much less time, sometimes as little as half an hour. Choose a good passphrase!)
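
That 5 1/2 year figure is easy to sanity-check with a little arithmetic:

    # Sanity-checking Apple's estimate: 36^6 candidate passwords
    # (lowercase letters + digits, length 6) at ~80ms per on-device guess.
    guesses = 36 ** 6                    # about 2.2 billion candidates
    seconds = guesses * 0.080            # 80ms per PBKDF2 run
    print(seconds / (365 * 24 * 3600))   # ~5.5 years for a full search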

So one view of Apple's process is that it depends on the user picking a strong password. A different view is that it also depends on the attacker's inability to obtain the UID. Let's explore this a bit more.

Securing the Secure Enclave

The Secure Enclave is designed to prevent exfiltration of the UID key. On earlier Apple devices this key lived in the application processor itself. The Secure Enclave adds an extra layer of protection that holds even if the software on the application processor is compromised -- e.g., jailbroken.

One worrying thing about this approach is that, according to Apple's documentation, Apple controls the signing keys that sign the Secure Enclave firmware. So using these keys, they might be able to write a special "UID extracting" firmware update that would undo the protections described above, and potentially allow crackers to run their attacks on specialized hardware.


Which leads to the following question: how does Apple avoid holding a backdoor signing key that would allow them to extract the UID from the Secure Enclave?

It seems to me that there are a few possible ways forward here.
  1. No software can extract the UID. Apple's documentation even claims that this is the case: software can only see the output of encrypting something with the UID, never the UID itself. The problem with this explanation is that it isn't clear the guarantee covers malicious Secure Enclave firmware written and signed by Apple.

    Update 10/4: Comex and others (who have forgotten more about iPhone internals than I've ever known) confirm that #1 is the right answer. The UID appears to be connected to the AES circuitry by a dedicated path, so software can set it as a key, but never extract it. Moreover this appears to be the same for both the Secure Enclave and older pre-A7 chips. So ignore options 2-4 below.
     
  2. Apple does have the ability to extract UIDs. But they don't consider this a backdoor, even though access to the UID should dramatically decrease the time required to crack the password. In that case, your only defense is a strong password.
     
  3. Apple doesn't allow firmware updates to the Secure Enclave firmware period. This would be awkward and limiting, but it would let them keep their customer promise re: being unable to assist law enforcement in unlocking phones.
      
  4. Apple has built a nuclear option. In other words, the Secure Enclave allows firmware updates -- but before doing so, the Secure Enclave will first destroy intermediate keys. Firmware updates are still possible, but if/when a firmware update is requested, you lose access to all data currently on the device. 
All of these are valid answers. In general, it seems reasonable to hope that the answer is #1. But unfortunately this level of detail isn't present in the Apple documentation, so for the moment we just have to cross our fingers.

Addendum: how did Apple's "old" backdoor work?

One wrinkle in this story is that allegedly Apple has been helping law enforcement agencies unlock iPhones for a while. This is probably why so many folks are baffled by the new policy. If Apple could crack a phone last year, why can't they do it today?

But the most likely explanation is also the simplest one: Apple was never really 'cracking' anything. Rather, they simply had a custom boot image that allowed them to bypass the 'passcode lock' screen on a phone. This was purely a UI hack, and it didn't grant Apple access to any of the passcode-encrypted data on the device. However, since earlier versions of iOS didn't encrypt all of the phone's interesting data under the passcode, the unencrypted data would be accessible upon boot.

There's no way to be sure, but this seems like the best available explanation.

Notes:

* Previous versions of iOS also encrypted these records, but the encryption key was not derived from the user's passcode. This meant that (provided one could bypass the actual passcode entry phase, something Apple probably does have the ability to do via a custom boot image), the device could decrypt this data without any need to crack a password.

** As David Schuetz notes in this excellent and detailed piece, on phones with Secure Enclave there is also a 5 second delay enforced by the co-processor. I didn't (and still don't) want to emphasize this, since I do think this delay is primarily enforced by Apple-controlled software and hence Apple can disable it if they want to. The PBKDF2 iteration count is much harder to override.