Monday, April 30, 2012

An update on the TLS MITM situation

A couple months back I wrote a pretty overwrought post lamenting the broken state of our PKI. At the time, I was kind of upset that 'trusted' Certificate Authorities (*ahem*, Trustwave) had apparently made a business of selling that trust for the purpose of intercepting SSL/TLS connections.

Now, I may be off base, but this seems like a bad thing to me. It also seemed like pretty stupid business. After all, it's not like certificates are hard to make (seriously -- I made three while writing the previous sentence). When your sole distinguishing feature is that people trust you, why would you ever mess with that trust?

In any case, the worst part of the whole thing was a suspicion that the practice didn't end with Trustwave; rather, that Trustwave was simply the CA that got caught. (No, this doesn't make it better. Don't even try that.)

But if that was the case, then who else was selling these man-in-the-middle 'skeleton' certificates? Who wasn't? Could I just call up and order one today, or would it be all dark parking lots and windowless vans, like the last time I bought an 0day from VUPEN? (Kidding. Sort of.)

The saddest part of the whole story is that out of all the major browser vendors, only one -- the Mozilla Foundation, makers of Firefox -- was willing to take public action on the issue. This came in the form of a letter that Mozilla sent to each of the major CAs in the Mozilla root program:
Please reply by March 2, 2012, to confirm completion of the following actions or state when these actions will be completed. 
1) Subordinate CAs chaining to CAs in Mozilla’s root program cannot be used for MITM or “traffic management” of domain names or IPs that the certificate holder does not legitimately own or control, regardless of whether it is in a closed and controlled environment or not. Please review all of the subordinate CAs that chain up to your root certificates in NSS to make sure that they cannot be used in this way. Any existing subordinate CAs that can be used for that purpose must be revoked and any corresponding HSMs destroyed as soon as possible, but no later than April 27, 2012. 
Which brings me to the point of this post. March 2 passed a long time ago. The results are mostly in. So what have we learned?

(Note: to read the linked spreadsheet, look at the column labeled "Action #1". If it contains an A, B, or E, that means the CA has claimed innocence -- it does not make SubCAs for signing MITM certificates, or else it does make SubCAs, but the holders are contractually bound not to abuse them.* If there's a C or a D there, it means that the CA is working the problem and will have things sorted out soon.)

The high-level overview is that most of the Certificate Authorities claim innocence. They simply don't make SubCAs (now, anyway). If they do, they contractually require the holders not to use them for MITM eavesdropping. One hopes these contracts are enforceable. But on the surface, at least, things look ok.

There are a few exceptions: notably Verizon Business (in my hometown!), KEYNECTIS and GlobalSign. All three claim to have reasonable explanations, and are basically just requesting extensions so they can get their ducks in a row. (KEYNECTIS is doing in-person inspections to make sure nothing funny is going on, which hopefully just means they're super-diligent, and not that they're picking up MITM boxes and throwing them in the trash.)**

But the upshot is that if we can believe this self-reported data, then Trustwave really was the outlier. They were the only people dumb enough to get into the business of selling SubCAs to their clients for the purpose of intercepting TLS connections. Or at the very least, the other players in this business knew it was a trap, and got themselves out long ago.

Anyone else believe this? I'd sure like to.


* 'Abuse' here means signing certificates for domains that you don't control. Yes, it's unclear how strong these contracts are, and whether there's any possibility for abuse down the line. But we can only be so paranoid.

** I confess there are some other aspects I don't quite follow here -- for example, why are some 'compliant' CAs marked 'CP/CPS does not allow MITM usage', while others are still 'In Progress' on making that change? I'm sure there's a good technical reason for this. Hopefully the end result is compliance all around.

Friday, April 27, 2012

Wonk post: Circular security

Apologies in advance for a super-wonky post, but I'm in kind of a wonky mood this week. It happens that I'm putting together a talk on the subject of circular security, and I figured this blog might be a good place to get my thoughts together.

If you've never heard of circular security, don't feel bad -- you're in good company. In a nutshell, circular security describes the way that an encryption scheme behaves when you ask it to encrypt its own key. In theory, that behavior can be pretty unpredictable.

The key word in the previous sentence is 'theory'. Circular security is a very interesting research area, but if your primary reason for reading this blog is to learn about practical things, this post may not be for you. (Though please come back next week, when I hope to be writing about electronic cash!)

Assuming you're still reading, the first order of business is to talk about what circular security (or insecurity) is in the first place. To do that, we need to define what it means for an encryption scheme to be 'secure'. And that means a discussion of semantic security.

Semantic Security

Semantic security is one of the earliest and most important security definitions in our field. It's generally considered to be the minimum bar for any modern 'secure' encryption scheme. In English, the informal definition of semantic security is something like this:
An attacker who sees the encryption of a message -- drawn from a known, or even chosen message space -- gains approximately the same amount of information as they would have obtained if they didn't see the encryption in the first place. (The one exception being the length of the message.)
This is a nice, intuitive definition since it captures what we really want from encryption.

You see, in an ideal world, we wouldn't need encryption at all. We would send all of our data via some sort of magical, buried cable that our adversary couldn't tap. Unfortunately, in the real world we don't have magical cables: we send our data via the public WiFi at our neighborhood Starbucks.

Semantic security tells us not to worry. As long as we encrypt with a semantically-secure scheme, the adversary who intercepts our encrypted data won't learn much more than the guy who didn't intercept it at all -- at worst, he'll learn only the amount of data we sent. Voila, security achieved.

(Now, just for completeness: semantic security is not the strongest definition we use for security, since it does not envision active attacks, where the adversary can obtain the decryption of chosen ciphertexts. But that's a point for another time.)

Unfortunately, before we can do anything with semantic security, we need to turn the English-language intuition above into something formal and mathematical. This is surprisingly complicated, since it requires us to formalize the notion of 'an attacker gains approximately the same amount of information'. In the early definitions this was done by making grand statements about predicate functions. This approach is faithful and accurate. It's also kind of hard to do anything with.

Fortunately there's a much simpler, yet equivalent way to define semantic security. This definition is called 'indistinguishability under chosen plaintext attack', or IND-CPA for short. IND-CPA is described by the following game which is 'played' between an adversary and some honest party that we'll call the challenger:
  1. The challenger generates a public/private keypair, and gives the public key to the adversary.
  2. The adversary eventually picks two equal-length messages (M0, M1) from the set of allowed plaintexts, and sends them to the challenger.
  3. The challenger flips a coin, then returns the encryption of one of the messages under the public key.
  4. The attacker tries to guess which message was encrypted. He 'wins' if he guesses correctly.
(This game can also be applied to symmetric encryption. Since there's no public key in a symmetric scheme, the challenger makes up the lost functionality by providing an encryption oracle: i.e., it encrypts messages for the adversary as she asks for them.) 
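To make the game concrete, here's a toy Python sketch of the symmetric-key version. All the names and the 'cipher' here are invented for illustration: the scheme is a deterministic XOR pad, which is exactly the kind of thing IND-CPA rules out. Because encryption is deterministic, the adversary can simply ask the oracle to encrypt M0 and compare against the challenge -- winning every single time.

```python
import secrets

def toy_encrypt(key, msg):
    # A deterministic XOR 'cipher' -- NOT semantically secure.
    # It exists only to make the game below concrete.
    return bytes(k ^ m for k, m in zip(key, msg))

def ind_cpa_game(adversary, trials=1000):
    """Play the IND-CPA game `trials` times; return the win rate."""
    wins = 0
    for _ in range(trials):
        key = secrets.token_bytes(16)
        oracle = lambda m: toy_encrypt(key, m)    # the encryption oracle
        m0, m1 = adversary.choose_messages(oracle)
        b = secrets.randbits(1)                   # challenger's coin flip
        challenge = toy_encrypt(key, (m0, m1)[b])
        if adversary.guess(oracle, challenge) == b:
            wins += 1
    return wins / trials

class OracleAdversary:
    # Since the toy cipher is deterministic, re-encrypting m0 via the
    # oracle identifies the challenge ciphertext -- a classic CPA break.
    def choose_messages(self, oracle):
        self.m0, self.m1 = b"attack at dawn!!", b"retreat at noon!"
        return self.m0, self.m1

    def guess(self, oracle, challenge):
        return 0 if oracle(self.m0) == challenge else 1

print(ind_cpa_game(OracleAdversary()))  # 1.0 -- a total break of the toy cipher
```

Against a randomized, semantically secure scheme, the same adversary's win rate would hover around 0.5 -- no better than the coin flip.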

A quick glance at the game should convince you that even a moron adversary can win this game. In fact, if the adversary simply guesses randomly he'll be right exactly half the time. Hence the real question is: how much better can he do?

And that leads us to our (basically) formal definition: 
A scheme is IND-CPA secure if no adversary can win the above game with probability (significantly) greater than 1/2.
The term 'significantly' in this case hides a lot of detail -- typically it means 'non-negligibly'. In many schemes we also require the adversary to run in limited time (i.e., be a probabilistic polynomial-time Turing machine). But details, details...

Encrypting your own key

One of the weirder aspects of the IND-CPA definition above is that it doesn't handle a very basic (and surprisingly common!) use case: namely, cases where you encrypt your own secret key.

Believe it or not, this actually happens. If you use a disk encryption system like BitLocker, you may already be encrypting your own keys without even noticing it. Moreover, newer schemes like Gentry's Fully Homomorphic Encryption depend fundamentally on this kind of "circular" encryption.

It seems surprising that IND-CPA can give us such a strong notion of security -- where the attacker learns nothing about a plaintext -- and yet can't handle this very simple case. After all, isn't your secret key just another message? What makes it special?

The technical answer to this question hangs on the fact that the IND-CPA game above only works with messages chosen by the adversary. Specifically, the adversary is asked to attack the encryption of either M0 or M1, which he chooses in step (2). Since presumably -- if the scheme is secure -- the adversary doesn't know the scheme's secret key, he won't be able to submit (M0, M1) such that either message contains the scheme's secret key. (Unless he makes a lucky guess, but this should happen with at most negligible probability.)

What this means is that an encryption scheme could do something terribly insecure when asked to encrypt its own secret key. For example, it could burp out the secret key without encrypting it at all! And yet it would still satisfy the IND-CPA definition. Yikes! And once you raise the possibility that such a scheme could exist, cryptographers will immediately want to actually build it.

(This may seem a little perverse: after all, aren't there enough broken schemes in the world without deliberately building more? But when you think about it, this kind of 'counterexample' is extremely valuable to us. If we know that such oddball, insecure schemes can exist, that motivates us to watch out for them in the real constructions that we use. And it tells us a little about the strength of our definitions.)

It turns out that there's a 'folklore' approach that turns any IND-CPA secure public-key encryption scheme into one that's still IND-CPA secure, but is also totally insecure if it ever should encrypt its own secret key.

The basic approach is to modify the original scheme by changing the encryption algorithm as follows:
  1. On input a public key and a message to be encrypted, it tests to see if the message is equal to the scheme's secret key.*
  2. If not, it encrypts the message using the original scheme's encryption algorithm (which, as we noted previously, is IND-CPA secure).
  3. If the message is equal to the secret key, it just outputs the secret key in cleartext.
It's pretty easy to see that this scheme is as secure as the original scheme for messages that aren't the secret key. It's also easy to see that it's totally insecure when you do encrypt the actual secret key. (Though I've glossed over a small technical challenge in step 1; see footnote*.)
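The wrapper can be sketched in a few lines of Python. Note that everything here is a placeholder: the underlying 'encryption' function is a stand-in for a real IND-CPA scheme, and passing sk directly to the wrapper is cheating -- the footnote's technical challenge is precisely how to perform this check using only the public key.

```python
import secrets

def make_circular_insecure(encrypt, sk):
    """Wrap an (assumed IND-CPA secure) encryption function so that it
    leaks the secret key when asked to encrypt it -- while behaving
    exactly like the original on every other message."""
    def bad_encrypt(msg):
        if msg == sk:                    # step 1: is this my own secret key?
            return b"OOPS:" + sk         # step 3: output it in cleartext!
        return encrypt(msg)              # step 2: otherwise encrypt normally
    return bad_encrypt

# Demo with a placeholder 'cipher' (not a real scheme!):
sk = secrets.token_bytes(16)
enc = make_circular_insecure(lambda m: bytes(x ^ 0xAA for x in m), sk)
assert enc(sk) == b"OOPS:" + sk          # encrypting sk leaks it entirely
assert enc(b"any other msg") != b"any other msg"
```

On every message except sk, the wrapper is indistinguishable from the original scheme -- which is why it still passes the IND-CPA game.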

Circles and cycles

Just as cryptographers were congratulating each other for answering the previous question -- showing that there are schemes that fail catastrophically when they encrypt their own secret key -- some smartass decided to up the stakes.

The question (s)he asked was: what if two people encrypt each other's secret keys?

Let's be clear what I'm saying here. Imagine that Alice decides to entrust Bob with her secret key, so she wraps it up under his public key (say, sending him a PGP encrypted email). And imagine that Bob decides to do exactly the same with his key, encrypting it under Alice's public key. We now have the following 'key cycle':
           CA = Encrypt(pkA, skB),               CB = Encrypt(pkB, skA)
To be clear, IND-CPA by itself tells us that it's perfectly secure for Alice to encrypt her own key under Bob's key (or vice versa). There's no problem there. However, the minute Bob also encrypts Alice's secret key, a cycle is formed -- and semantic security doesn't tell us anything about whether this is secure.

So this is worrisome in theory, but are there actual schemes where such cycles can cause problems?

Up until 2010, the answer was: no. It turns out to be much harder to find a counterexample for this case, since the approach described in the previous section doesn't work. You can't just hack a little bit of code into the encryption algorithm; the ciphertexts are encrypted independently. At the time they're encrypted, the ciphertexts are perfectly secure. They only become insecure when they come into close proximity with one another.

(If a weird analogy helps, think of those encrypted keys like two hunks of Plutonium, each inside of its own briefcase. As long as they're apart, everything's fine. But get them close enough to one another, they interact with one another and basically ruin your whole day.)

It gets worse

A breakthrough occurred at Eurocrypt 2010, where Acar, Belenkiy, Bellare and Cash showed that indeed, there is a scheme that's perfectly IND-CPA secure in normal usage, but fails when Alice and Bob encrypt their keys in a cycle like the one described above.

The Acar et al. scheme is based on a certain type of elliptic-curve setting known as bilinear SXDH, and what they show is that when Alice and Bob create a key cycle like the one above, an adversary can recognize it as such.

To be clear, what this means is that the ciphertexts (Encrypt(pkA, skB), Encrypt(pkB, skA)) jointly leak a little bit of information -- simply the fact that they encrypt each other's secret keys! This may not seem like much, but it's far more than they should ever reveal.

The Acar et al. result is interesting to me, because along with my colleague Susan Hohenberger, I was thinking about the same problem around the same time. We independently came up with a similar finding just a few months after Acar et al. submitted theirs -- crappy luck, but it happens. On the bright side, we discovered something slightly worse.

Specifically, we were able to construct a scheme that's totally secure in normal usage, but becomes catastrophically insecure the minute Alice and Bob encrypt each others' secret keys. This means in practice that if two parties innocently encrypt a key cycle, then both of their secret keys become public. This means every message that either party has ever encrypted (or will encrypt) becomes readable. Not good!**

The worst part is that both the Acar et al. scheme and our scheme are (almost) normal-looking constructions. They could be real schemes that someone came up with and deployed in the real world! And if they did, and if someone encrypted their secret keys in a cycle, things would be very bad for everyone.

So what do we do about this?

The solution to this problem is to develop schemes that can handle (by design) the case where someone encrypts a secret key, or a function of a secret key. This concept is known as Key-Dependent Message (KDM) security, and it's been the subject of some research.

Unfortunately, building provably KDM-secure schemes is not an easy task, at least, not without resorting to artificial constructs like random oracles. While this may change in the future, for the moment we have a ways to go before we can build truly efficient schemes that don't (potentially) have problems like the above.

And this is what makes cryptography so interesting. No matter what we say about our schemes being 'provably secure', the truth is: there's always something we haven't thought of. Just when you thought you'd finally solved all the major security problems (ha!), another one will always pop up to surprise you.


* The obvious objection is that a public key encryption algorithm takes in only a public key and a message. It doesn't take in the scheme's secret key. So: how can it check whether the given message is the secret key? There's a general solution to this, but I'll leave it as an exercise for the reader.

** David Cash, Susan Hohenberger and I have summarized these results, along with several related to symmetric encryption, and will be presenting them at PKC 2012. If you're interested, a full version of our paper should appear here shortly.

Monday, April 16, 2012

So long False Start, we hardly knew ye

Last week brought the sad news that Google is removing support for TLS False Start from the next version of Chrome. This follows on Google's decision to withdraw TLS Snap Start, and caps off the long line of failures that started with the demise of Google Wave. Industry analysts are already debating the implications for Larry Page's leadership and Google's 'Social' strategy.

(I am, of course, being facetious.)

More seriously, it's sad to see False Start go, since the engineering that led to False Start is one of the most innovative things happening in Internet security today. It's also something that only Google can really do, since they make a browser and run a very popular service. The combination means they can try out new ideas on a broad scale without waiting for IETF approval and buy-in. Nor are they shy about doing this: when the Google team spots a problem, they tackle it head-on, in what I call the 'nuke it from orbit' approach. It's kind of refreshing.

Of course, the flipside of experimentation is failure, and Google has seen its share of failed experiments. The last was Snap Start, which basically tackled the same problem as False Start, but in a different way. And now False Start is going the way of the coelacanth: it'll still exist, but you'll have a hard time finding it.

The response from the security community has been a lot of meaningful chin-tapping and moustache-stroking, but not (surprisingly) a lot of strong opinions. This is because -- as we all recently discovered -- none of us had really spent much time thinking about False Start before Google killed it.* And so now we're all doing it retroactively.

What the heck is (was) False Start anyway?
Secure web (TLS) is dirt slow. At least, that's the refrain you'll hear from web companies that don't deploy it. There are many objections but they usually boil down to two points: (1) TLS requires extra computing hardware on the server side, and (2) it adds latency to the user experience.

Why latency? Well, to answer that you only need to look at the TLS handshake. The handshake establishes a communication key between the server and client. It only needs to happen once per browsing session. Unfortunately it happens at the worst possible time: the first time you visit a new site. You never get a second chance to make a first impression.

Why is the standard handshake so slow? Short answer: key establishment requires two round trips -- four 'flights' -- before the client can send any actual data (say, an HTTP GET request). If you're communicating via undersea cable, or if you're using a mobile phone with a huge ping time (my AT&T phone just registered 250ms!) this is going to add up.

Standard TLS handshake (source). Application data exchange only occurs at step (9) in the diagram. Note that step (5) is optional, and steps (4),(7) can be conducted in the same message 'flight'.
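To put rough numbers on it, here's a back-of-the-envelope calculation (the latencies are assumed round-number figures, not measurements):

```python
def time_to_first_byte(rtt_ms, handshake_round_trips):
    # Delay before the client can send its first byte of application
    # data: each handshake round trip costs one full RTT.
    return rtt_ms * handshake_round_trips

rtt = 250                                  # a painful mobile ping, in ms
standard = time_to_first_byte(rtt, 2)      # full TLS handshake: 2 RTTs
false_start = time_to_first_byte(rtt, 1)   # False Start: data after 1 RTT
print(standard, false_start)               # 500 250 -- a whole RTT saved
```

Half a second of dead air before the first HTTP request is exactly the kind of thing that makes web companies grumble about TLS.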
This is the problem that False Start tried to address. The designers -- Langley, Modadugu and Moeller -- approached this in the simplest possible way: they simply started sending data earlier.

You see, the bulk of the key exchange work in TLS really occurs in the first three messages, "client hello", "server hello", and "client key exchange". By the time the client is ready to send this last message, it already knows the encryption key. Hence it can start transmitting encrypted data right then, cutting a full roundtrip off the handshake. The rest of the handshake is just 'mop-up': mostly dedicated to ensuring that nobody tampered with the handshake messages in-flight.

If your reaction to the previous sentence was: 'isn't it kind of important to know that nobody tampered with the handshake messages?', you've twigged to the primary security objection to False Start. But we'll come back to that in a moment.

False Start sounds awesome: why did Google yank it?

The problem with False Start is that there are many servers (most, actually) that don't support it. The good news is that many of them will tolerate FS. That is, if the Client starts sending data early, they'll hold onto the data until they've finished their side of the handshake.

But not every server is so flexible. A small percentage of the servers on the Internet -- mostly SSL terminators, according to Langley -- really couldn't handle False Start. They basically threw up all over it.

Google knew about this before deploying FS in Chrome, and had taken what I will loosely refer to as the 'nuke it from orbit squared' approach. They scanned the Internet to compile a blacklist of IPs that didn't handle False Start, and they shipped it with Chrome. In principle this should have done the job.

Unfortunately, it didn't. It seems that the non-FS-compatible servers were non-compliant in a weird non-deterministic way. A bunch looked good when Google scanned them, but really weren't. These slipped past Google's scan, and ultimately Google gave up on dealing with them. False Start was withdrawn.

Ok, it has issues. But is it secure?
From my point of view this is the much more interesting question. A bunch of smart folks (and also: me) wasted last Friday afternoon discussing this on Twitter. The general consensus is that we don't really know.

The crux of the discussion is the Server "finished" message (#8 in the diagram above). This message is not in TLS just for show: it contains a hash over all of the handshake messages received by the server. This allows the client to verify that no 'man in the middle' is tampering with handshake messages, and in normal TLS it's supposed to verify this before it sends any sensitive data.

This kind of tampering isn't theoretical. In the bad old days of SSL2, it was possible to quietly downgrade the ciphersuite negotiation so that the Browser & Server would use an 'export-weakened' 40-bit cipher even when both wanted to use 128-bit keys. Integrity checks (mostly) eliminate this kind of threat.

The FS designers were wise to this threat, so they include some mitigations to deal with it. From the spec:
Clients and servers MUST NOT use the False Start protocol modification in a handshake unless the cipher suite uses a symmetric cipher that is considered cryptographically strong ... While an attacker can change handshake messages to force a downgrade to a less secure symmetric cipher than otherwise would have been chosen, this rule ensures that in such a downgrade attack no application data will be sent under an insecure symmetric cipher.
While this seems to deal with the obvious threats ('let's not use encryption at all!'), it's not clear that FS handles every possible MITM threat that could come up. For example, there were a number of mitigations proposed to deal with the BEAST attack -- from basic things like 'don't use TLS 1.0' to 'activate the empty-fragment' protection. There's some reason to be fearful that these protections could be overwhelmed or disabled by an attacker who can tamper with handshake messages.

Note that this does not mean that False Start was actually vulnerable to any of the above. The last thing I want to do is claim a specific vulnerability in FS. Or rather, if I had a specific, concrete vulnerability, I'd be blogging about it.

Rather, the concern here is that TLS has taken a long time to get to the point it is now -- and we're still finding bugs (like BEAST). When you rip out parts of the anti-tamper machinery, you're basically creating a whole new world of opportunities for clever (or even not-so-clever) attacks on the protocol. And that's what I worry about.

Hands off my NSS
There's one last brief point I'd like to mention about False Start and Snap Start, and that has to do with the implementation. If you've ever spent time with the NSS or OpenSSL code, you know that these are some of the finest examples of security coding in the world.

Haha, I kid. They're terrifying.

Every time Google adds another extension (Snap Start, False Start, etc.), NSS gets patched. This additional patch code is small, and it's written by great coders -- but even so, this patching adds new cases and complexity to an already very messy library. It would be relatively easy for a bug to slip into one of these patches, or even the intersection of two patches. And this could affect more than just Google users.

In summary

It may seem a little odd to spend a whole blog post on a protocol that's already been withdrawn, but really: it's worth it. First, examining protocols like False Start is the only way we learn -- and making TLS faster is definitely something we want to learn about. Second, it's not like False Start is the last we'll hear from the Google TLS crew. It'll certainly help if we're up-to-speed when they release "False Start II: False Start with a Vengeance".

Finally, False Start isn't really dead -- not yet, anyway. It lives on in Chrome for conducting Next Protocol Negotiation (NPN) transactions, used to negotiate SPDY.

On a final note, let me just say that I'm mostly enthusiastic about the work that Google is doing (modulo a few reservations noted above). I hope to see it continue, though I would like to see it get a little bit more outside scrutiny. After all, even though Google's primarily acting as its own test case, many people use Chrome and Google! If they get something wrong, it won't just be Google doing the suffering.


* With some notable exceptions: see this great presentation by Nate Lawson.

Wednesday, April 11, 2012

It's the end of the world as we know it (and I feel fine)

Unitards: another consequence
of quantum computing.
Back in December I asked readers for some topics they were particularly keen to read about on this blog. One of the best (and sadly, most challenging) suggestions was to say something about post-quantum cryptography. Roughly speaking, this term describes the future of cryptography after quantum computers arrive and screw things up for everyone.

Now, just for the record, quantum computing is not a bad thing; in fact, it's one of the most exciting discoveries of our time. At minimum QCs will help us efficiently solve challenging problems that take forever on today's computers. It's entirely possible, though not guaranteed, that they'll totally revolutionize the way we compute (see this breathless article). Whichever it is, quantum computers will be a general good.

But not necessarily for cryptographers.

This is because cryptographers don't always want faster algorithms for solving hard problems. It isn't that we thrive on people's frustration; it's that many of our best cryptosystems simply depend on the hardness of problems like integer factorization. By making these problems easy, QCs throw a huge wrench into the works. And this isn't a hypothetical concern: thanks to Shor's algorithm, many of our most beloved cryptosystems are already living on borrowed time.

But all is not lost. We have two very important things working in our favor. The first is time. For all the excitement around quantum computing, nobody's managed to actually build a useful one. And (no matter what D-Wave says) it doesn't look like they will anytime soon.

The second is that quantum computers are not magic. QCs are very good at quickly solving certain problems, but their usefulness is more limited than most people realize. Moreover, we only know of a handful of useful quantum algorithms today. This could change: think how many classical algorithms were discovered after classical computers were built. But it gives us hope.

In the rest of this post I'm going to try to give a flavor of the problem, and point out some of the bright spots -- areas where we feel pretty good. I want to be clear that this isn't my research area, and many of the best ideas in this post come from the excellent book Post-Quantum Cryptography, edited by Bernstein, Buchmann and Dahmen. (A dead-tree copy of the book will run you $90, but you can check out Dan Bernstein's fantastic intro right here.)

Quantum vs. Classical

Quantum computers are only human. (Well, technically,
they're complicated machines full of super-cooled liquid
nitrogen and helium.) 
There are many ways to build a classical computer. Whether you use transistors, gears, or photons, one thing binds all of these designs together: at any given moment, a classical computer should be in exactly one state.

To bring this back to cryptography, let's say you're trying to brute-force your way through the keyspace of a cryptosystem. In a classical computer this means trying one key at a time, for as many keys as it takes to get a useful result. Parallelization can speed this up, but ultimately just moves the problem sideways -- trading resources for time.

Quantum computers use an entirely different model of computing. The fundamental component of a quantum computer is a quantum bit ('qubit'), a value that can be in two states (1 and 0) at the same time. By entangling n qubits, it's possible to construct a computer that exists in a 'superposition' of 2^n states simultaneously. And this is when things get interesting.

And complicated. Entangling qubits is one thing (and not necessarily an easy thing). The real trick is finding a way to do computation with a computer that's in more than one state, and -- even more importantly -- to pull a single 'right' answer back into our classical world once the computation is done.

These words obscure a lot of detail; an entire field's worth of detail. What matters for our purposes is that in some cases it seems like quantum algorithms can harness this power to accomplish things much faster than classical algorithms. When this happens, it makes certain problems much easier (i.e., less time-consuming) to solve. But not every problem.

Quantum computers are not magic!

The important thing to know is that quantum computers can't do everything. They can't see the future. They can't even (we believe) solve NP-complete problems in polynomial time. In fact, the general belief among researchers is that the class BQP -- roughly, things that quantum computers can do efficiently -- does not include all of NP (though in fairness, nobody knows for sure).

Proposed hierarchy of complexity classes
with a guess for where BQP may fit. (Wikipedia)
More importantly, anything that a quantum computer can do can also be done by a classical computer, though possibly with an exponential slowdown. This means that QCs aren't going to do anything against information-theoretically secure cryptosystems, like the One Time Pad. Those schemes don't care how powerful the attacker's computer is.

Given these facts, you might not be all that impressed with quantum computers right now. You'd be wrong.

The man himself

Dr. Peter Shor
The reason we're so freaked out about quantum computing is because of the harmless-looking gentleman on the right. Dr. Peter Shor claims responsibility for one of the most amazing, yet ostensibly useless, theoretical accomplishments in the history of computing: he developed a polynomial-time algorithm for factoring large integers. The only problem is that nobody has a computer that can run it.

I can only imagine how he explains this to his neighbors.

Shor's algorithm actually has two components: (1) a classical portion that converts the factoring problem into an order-finding problem, and (2) a quantum portion that solves the latter in polynomial time (with some reasonable error probability). Shor's algorithm can also be used to solve the Discrete Logarithm Problem (DLP) in fields and (with tweaks) elliptic curve groups.
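The classical half is concrete enough to sketch in a few lines. Here's a toy Python version that brute-forces the order classically -- which is exactly the step the quantum subroutine replaces -- just to show where the factors actually come from. The base and modulus are illustration values:

```python
from math import gcd

def find_order(a, N):
    # Classically brute-force the multiplicative order of a mod N.
    # This is the step Shor's algorithm replaces with a quantum subroutine.
    r = 1
    while pow(a, r, N) != 1:
        r += 1
    return r

def shor_classical_part(N, a):
    # Given a base a, try to split N using the order of a mod N.
    if gcd(a, N) != 1:
        return gcd(a, N)          # lucky: a already shares a factor with N
    r = find_order(a, N)
    if r % 2 == 1:
        return None               # odd order: try another a
    x = pow(a, r // 2, N)
    if x == N - 1:
        return None               # trivial square root: try another a
    return gcd(x - 1, N)          # a nontrivial factor of N

print(shor_classical_part(15, 7))  # 15 = 3 * 5; prints the factor 3
```

Of course, `find_order` takes exponential time on a classical machine; the entire magic of Shor's result is doing that one step in polynomial time.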

The problem? Almost every practical public-key cryptosystem in use today depends on the hardness of the above problems. That includes RSA and Diffie-Hellman (both used in TLS), Elgamal (used in PGP), DSA and ECDSA (used to sign X.509 certificates, among other things), and even the Rabin cryptosystem (used primarily by Bram Cohen).

These are important algorithms, used today not just to protect short-term data, but also to encrypt data that may still be important several decades from now. If anyone manages to build a reasonable quantum computer in that time, some people are going to be pretty unhappy.

I should quickly mention that Shor's algorithm is not the only quantum algorithm that cryptographers care about -- it just happens to be the most interesting one. There's also Grover's algorithm, which can be used to improve brute-force attacks on symmetric encryption schemes like AES. But the speedup from Grover's algorithm is 'only' quadratic (it essentially halves the key length), not exponential like Shor's algorithm. We can live with quadratic.

So what can we do about it?

With all the background out of the way, we can finally get to the point of this post. What can cryptographers do to deal with the looming potential of quantum computers? And when should we start doing it? There are a few answers to this question.

Solution #1: Business as usual

This is probably my favorite solution. Don't do anything. Quantum computing is a long way off, and it may never happen at all. Let's just keep doing what we're doing, until we're old enough that someone gives us a pension and we can dump the problem in someone else's lap. Besides, isn't there an asteroid heading towards the Earth?

No, this isn't really a solution at all, but it seems to be the default mentality right now. It probably isn't hurting us -- for the moment.

Solution #2: Don't overreact

While Shor's algorithm is a pretty serious threat to RSA, (EC)DSA and most of the public-key algorithms we use in practice, these aren't the only algorithms that we use. And the good news is that quantum computers probably won't have the same effect on those.

AES, for example, would be affected by Grover's algorithm, but only (in the absolute worst case) by reducing the effective key size by half. So a 256-bit key would still provide at least the same security margin as today's 128-bit keys. We can probably live with that -- and if not, we can always get bigger keys.

The only hole in this argument is that we have absolutely no idea what specific attacks people will develop against ciphers like AES in the future. Even now -- using only classical algorithms -- researchers have managed to shave a couple of bits off of the AES security margin. Quantum computers may substantially expand our toolbox.

Solution #3: Lattices, lattices, lattices!

If you haven't been to a crypto conference lately, you haven't been exposed to the new hotness in our field: lattice-based cryptography. Lattices are a mathematical structure that happens to turn up a lot in nature: for example, in the ordering of atoms in a crystal.

In a more abstract setting, lattice 'problems' can be used to build a wide variety of cryptosystems. The best known examples include the NTRU scheme (used for encryption and signatures) and Craig Gentry's fully-homomorphic encryption scheme. But there are plenty more.

Lattices get a lot of attention these days, in part because you can do neat things with them (fully-homomorphic encryption is a great example). But a more general argument for lattice-based crypto is its resistance to quantum attacks. In contrast to factoring and DLP, there are no quantum algorithms known to run (substantially) faster than classical algorithms against the lattice problems we use to construct modern cryptosystems.

As with all of these claims, the keyword is 'known'. Will we someday find such algorithms? Who knows. But deploying schemes that aren't quantum-broken (yet) certainly feels a lot better than deploying schemes that are.

Solution #4: Fight fire with fire

QKD rig.
The last, and possibly most desperate solution of all, is to dispense with computational problems altogether, and simply use the strength of quantum mechanics against itself.

This type of cryptography is known as Quantum Key Distribution. QKD uses quantum-mechanical effects, rather than hard computational problems, to protect messages transmitted over a long distance.

The basic idea is quite simple: anyone who measures the quantum state of a particle must by definition change it (physics students: you're welcome to correct me here!). The exception is entangled particles, which will both measure identically, even if the measurement is conducted at distant points. QKD rigs entangle particles, then transmit one set to a remote party via a long fiber-optic cable. If an attacker taps the cable and measures the particle, his tampering will be detected. If not, both parties can derive a random one-time pad, which they can then use to communicate. Quantum computers can't do anything about this.
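The flavor of this is easy to simulate classically. Below is a toy sketch of the prepare-and-measure variant (BB84) rather than the entangled version described above; 'qubits' are just bits paired with a basis choice, which is a gross simplification of the physics but reproduces the key statistic: an intercept-resend eavesdropper corrupts roughly 25% of the sifted key, which sampling detects.

```python
import random

def bb84(n_qubits, eavesdrop=False, rng=random):
    # Alice encodes random bits in random bases (0 = rectilinear, 1 = diagonal).
    alice_bits  = [rng.randrange(2) for _ in range(n_qubits)]
    alice_bases = [rng.randrange(2) for _ in range(n_qubits)]
    # Photons in flight. Eve measuring in a random basis disturbs the state:
    # in this simplified model, a wrong-basis measurement randomizes the bit.
    in_flight = list(alice_bits)
    if eavesdrop:
        for i in range(n_qubits):
            if rng.randrange(2) != alice_bases[i]:
                in_flight[i] = rng.randrange(2)
    # Bob measures in his own random bases; a wrong basis yields a random bit.
    bob_bases = [rng.randrange(2) for _ in range(n_qubits)]
    bob_bits = [in_flight[i] if bob_bases[i] == alice_bases[i]
                else rng.randrange(2) for i in range(n_qubits)]
    # Sifting: keep only positions where Alice's and Bob's bases matched.
    sifted = [i for i in range(n_qubits) if bob_bases[i] == alice_bases[i]]
    errors = sum(alice_bits[i] != bob_bits[i] for i in sifted)
    return len(sifted), errors

random.seed(1)
print(bb84(2000))                  # no Eve: zero errors on the sifted key
print(bb84(2000, eavesdrop=True))  # Eve: roughly a quarter of sifted bits disagree
```

The surviving error-free bits become the shared one-time pad; a high error rate on a sampled subset means someone was listening.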

Unfortunately the more 'perfect' something sounds, the less you should trust it, especially if it's a research system. This is what QKD researchers discovered last summer, when a team showed how to override the photon detectors in a QKD system with adversarially-generated signals of their own. The result: they were able to completely intercept the key without detection. This can all be fixed, but it goes to show that this is a harder problem than it looks.

More to the point, QKD requires a massive change in the way we distribute data. Today we send our ciphertexts through packet-switched networks, but QKD systems require dedicated fiber-optic connections between each participant. Thus, QKD isn't likely to give us a good replacement for TLS anytime soon.

In summary

This has been yet another long post. Before I wrap up I just want to preemptively apologize to all of the physicists and quantum complexity theorists whose work I've mangled along the way.

No matter how terribly I've done this, the basic story is sound: quantum computers are at least theoretically possible, they're quite likely to be practical, and they really will change a lot of things when they arrive. Mostly for the better. But not for the cryptosystems we use today.

If we don't find a way to do something about this, the next few decades are going to be a very interesting time for crypto. For this reason alone, I'm hoping that asteroid isn't coming; I'd really like to see how it all turns out.

Wednesday, April 4, 2012

iCloud: Who holds the key?

Ars Technica brings us today's shocking privacy news: 'Apple holds the master decryption key when it comes to iCloud security, privacy'. Oh my.

The story is definitely worth a read, though it may leave you shaking your head a bit. Ars's quoted security experts make some good points, but they do it in a strange way -- and they propose some awfully questionable fixes.

But maybe I'm too picky. To be honest, I didn't realize that there was even a question about who controlled the encryption key to iCloud storage. Of course Apple does -- for obvious technical reasons that I'll explain below. You don't need to parse Apple's Terms of Service to figure this out, which is the odd path that Ars's experts have chosen:
In particular, Zdziarski cited particular clauses of iCloud Terms and Conditions that state that Apple can "pre-screen, move, refuse, modify and/or remove Content at any time" if the content is deemed "objectionable" or otherwise in violation of the terms of service. Furthermore, Apple can "access, use, preserve and/or disclose your Account information and Content to law enforcement authorities" whenever required or permitted by law.
Well, fine, but so what -- Apple's lawyers would put stuff like this into their ToS even if they couldn't access your encrypted content. This is what lawyers do. These phrases don't prove that Apple can access your encrypted files (although, I remind you, they absolutely can), any more than Apple's patent application for a 3D LIDAR camera 'proves' that you're going to get one in your iPhone 5.

Without quite realizing what I was doing, I managed to get myself into a long Twitter-argument about all this with the Founder & Editor-in-Chief of Ars, a gentleman named Ken Fisher. I really didn't mean to criticize the article that much, since it basically arrives at the right conclusions -- albeit with a lot of nonsense along the way.

Since there seems to be some interest in this, I suppose it's worth a few words. This may very well be the least 'technical' post I've ever written on this blog, so apologies if I'm saying stuff that seems a little obvious. Let's do it anyway.

The mud puddle test

You don't have to dig through Apple's ToS to determine how they store their encryption keys. There's a much simpler approach that I call the 'mud puddle test':
  1. First, drop your device(s) in a mud puddle. 
  2. Next, slip in said puddle and crack yourself on the head. When you regain consciousness you'll be perfectly fine, but won't for the life of you be able to recall your device passwords or keys.
  3. Now try to get your cloud data back. 
Did you succeed? If so, you're screwed. Or to be a bit less dramatic, I should say: your cloud provider has access to your 'encrypted' data, as does the government if they want it, as does any rogue employee who knows their way around your provider's internal policy checks.

And it goes without saying: so does every random attacker who can guess your recovery information or compromise your provider's servers.

Now I realize that the mud puddle test doesn't sound simple, and of course I don't recommend that anyone literally do this -- head injuries are no fun at all. It's just a thought experiment, or in the extreme case, something you can 'simulate' if you're willing to tell your provider a few white lies.

But you don't need to simulate it in Apple's case, because it turns out that iCloud is explicitly designed to survive the mud puddle test. We know this thanks to two iCloud features. These are (1) the ability to 'restore' your iCloud backups to a brand new device, using only your iCloud password, and (2) the 'iForgot' service, which lets you recover your iCloud password by answering a few personal questions.

Since you can lose your device, the key isn't hiding there. And since you can forget your password, it isn't based on that. Ergo, your iCloud data is not encrypted end-to-end, not even using your password as a key (or if it is, then Apple has your password on file, and can recover it from your security questions.) (Update: see Jonathan Zdziarski's comments at the end of this post.)

You wanna make something of it?

No! It's perfectly reasonable for a consumer cloud storage provider to design a system that emphasizes recoverability over security. Apple's customers are far more likely to lose their password/iPhone than they are to be the subject of a National Security Letter or data breach (hopefully, anyway).

Moreover, I doubt your median iPhone user even realizes what they have in the cloud. The iOS 'Backup' service doesn't advertise what it ships to Apple (though there's every reason to believe that backed up data includes stuff like email, passwords, personal notes, and those naked photos you took.) But if people don't think about what they have to lose, they don't ask to secure it. And if they don't ask, they're not going to receive.

My only issue is that we have to have this discussion in the first place. That is, I wish that companies like Apple could just come right out and warn their users: 'We have access to all your data, we do bulk-encrypt it, but it's still available to us and to law enforcement whenever necessary'. Instead we have to reverse-engineer it by inference, or by parsing through Apple's ToS. That shouldn't be necessary.

But can't we fix this with Public-Key Encryption/Quantum Cryptography/ESP/Magical Unicorns?

No, you really can't. And this is where the Ars Technica experts go a little off the rails. Their proposed solution is to use public-key encryption to make things better. Now this is actually a great solution, and I have no objections to it. It just won't make things better.

To be fair, let's hear it in their own words:
First, cloud services should use asymmetric public key encryption. "With asymmetric encryption, the privacy and identity of each individual user" is better protected, Gulri said, because it uses one "public" key to encrypt data before being sent to the server, and uses another, "private" key to decrypt data pulled from the server. Assuming no one but the end user has access to that private key, then no one but the user—not Apple, not Google, not the government, and not hackers—could decrypt and see the data.
I've added the boldface because it's kind of an important assumption.

To make a long story short, there are two types of encryption scheme. Symmetric encryption algorithms have a single secret key that is used for both encryption and decryption. The key can be generated randomly, or it can be derived from a password. What matters is that if you're sending data to someone else, then both you and the receiver need to share the same key.

Asymmetric, or public-key encryption has two keys, one 'public key' for encryption, and one secret key for decryption. This makes it much easier to send encrypted data to another person, since you only need their public key, and that isn't sensitive at all.

But here's the thing: the difference between these approaches is only related to how you encrypt the data. If you plan to decrypt the data -- that is, if you ever plan to use it -- you still need a secret key. And that secret key is secret, even if you're using a public-key encryption scheme.

Which brings us to the real problem with all encrypted storage schemes: someone needs to hold the secret decryption key. Apple has made the decision that consumers are not in the best position to do this. If they were willing to allow consumers to hold their decryption keys, it wouldn't really matter whether they were using symmetric or public-key encryption.
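To make this concrete, here's a toy textbook-RSA sketch (tiny fixed primes, no padding, wildly insecure, and of course nothing like Apple's actual system). The point is just that the public key suffices for encryption, while decryption still requires a secret value d that someone, somewhere, has to hold:

```python
# Toy textbook RSA with tiny primes. Illustration only: real RSA needs
# large random primes and padding (e.g. OAEP) -- never use this for anything.
p, q = 61, 53
n = p * q                  # modulus, part of the public key
e = 17                     # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)        # private exponent: the secret someone must hold

def rsa_encrypt(m, public_key=(n, e)):
    n_, e_ = public_key
    return pow(m, e_, n_)  # anyone can do this with the public key alone

def rsa_decrypt(c, private_key=(n, d)):
    n_, d_ = private_key
    return pow(c, d_, n_)  # ...but reading the data back needs the secret d

c = rsa_encrypt(42)
print(rsa_decrypt(c))      # 42
```

Whoever holds d -- the user, or the provider on the user's behalf -- holds the data. That's the whole question, and no choice of encryption scheme makes it go away.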

So what is the alternative?

Well, for a consumer-focused system, maybe there really isn't one. Ultimately people back up their data because they're afraid of losing their devices, which cuts against the idea of storing encryption keys inside of devices.

You could take the PGP approach and back up your decryption keys to some other location (your PC, for example, or a USB stick). But this hasn't proven extremely popular with the general public, because it's awkward -- and sometimes insecure.

Alternatively, you could use a password to derive the encryption/decryption keys. This approach works fine if your users pick decent passwords (although they mostly won't), and if they promise not to forget them. But of course, the convenience of Apple's "iForgot" service indicates that Apple isn't banking on users remembering their passwords. So that's probably out too.

In the long run, the answer for non-technical users is probably just to hope that Apple takes good care of your data, and to hope you're never implicated in a crime. Otherwise you're mostly out of luck. For tech-savvy users, don't use iCloud and do try to find a better service that's willing to take its chances on you as the manager of your own keys.

In summary

I haven't said anything in this post that you couldn't find in Chapter 1 of an 'Intro to Computer Security' textbook, or a high-level article on Wikipedia. But these are important issues, and there seems to be confusion about them.

The problem is that the general tech-using public seems to think that cryptography is a magical elixir that can fix all problems. Companies -- sometimes quite innocently -- market 'encryption' to convince people that they're secure, when in fact they're really not. Sooner or later people will figure this out and things will change, or they won't and things won't. Either way it'll be an interesting ride.

Update 4/4: Jonathan Zdziarski ‏tweets to say my 'mud puddle' theory is busted: since the iForgot service requires you to provide your birthdate and answer a 'security question', he points out that this data could be used as an alternative password, which could encrypt your iCloud password/keys -- protecting them even from Apple itself.

The problem with his theory is that security answers don't really make very good keys, since (for most users) they're not that unpredictable. Apple could brute-force their way through every likely "hometown" or "favorite sport" in a few seconds. Zdziarski suggests that Apple might employ a deliberately slow key derivation function to make these attacks less feasible, and I suppose I agree with him in theory. But only in theory. Neither Zdziarski nor I actually believe that Apple does any of this.

Monday, April 2, 2012

Poker is hard, especially for cryptographers

I have this friend who's an obsessive poker fan, to the point where it's actually a little scary. The worst part is that as a computer scientist, he approaches the game with a numeracy that would make an Apollo 11 technician blush. He refuses to touch a hand until he's computed the precise odds, which wouldn't be so bad if he didn't tell me about it -- all while gibbering mysterious 19th-century words like 'flop' and 'river' and 'hole card'.

When this happens I try to smile and nod like I have some idea what he's talking about. But trust me, I don't. My card-playing experience runs to a few tense games of 'Go Fish!' and a drunken weekend in Vegas playing Blackjack.

Still, there's one aspect of card playing that I am fascinated by: mental poker. This is a blanket term for a protocol that allows mutually-distrustful people to deal and play cards without a dealer, or, indeed, any trusted party at all.

Like all crypto/privacy protocols, mental poker interests me because it's so counterintuitive. You get a bunch of people together, none of whom trust each other -- some of whom may be actively trying to cheat the others -- and at the end of the day everyone gets a fair deal, or at worst, enough warning to get out clean.

I also love the concept because it's so obviously disruptive. In case you hadn't heard, US law enforcement doesn't have warm, cuddly feelings towards online poker. This past spring, the FBI took down three of the biggest sites, and there's no reason to think they're done. I may not be a poker player myself, but decentralized mental poker appeals to me, mostly because it would be a lot harder to shut down -- especially if it was tied to a peer-to-peer currency such as Bitcoin. (On the other hand, you'd be playing for Bitcoin. So there's that.)

In the rest of this post I'm going to talk a little bit about where mental poker came from and how it works. I don't plan to give too many equations, but it will get a little (or a lot) wonky, and will certainly depart a bit from the pro-implementer, pro-engineering spirit of this blog. (If you want to pass, I certainly won't hold it against you.)

Secret history

Like every other problem in our field, 'Mental poker' was first proposed by Shamir, Rivest and Adleman (note to self: invent new research field, get dibs on all the interesting problems.) Their motivation was -- well, let's just let them tell it:
Once there were two "mental chess" experts who had become tired of their favorite pastime. "Let's play 'mental poker' for some variety," suggested one. "Sure," said the other. "Just let me deal!"
In that first paper, Shamir et al. managed the fascinating trick of both proving that mental poker is impossible, and then giving a protocol to do it securely. (Don't get discouraged by this -- it happens.)

While the Shamir et al. paper may have been the first to address mental poker, it definitely wasn't the last. Indeed, it's an open secret that most of the 'fundamental' advances in cryptography -- semantically-secure encryption, zero-knowledge proofs, etc. -- were actually just happy accidents: things that dropped out while MIT cryptographers were fiendishly seeking better ways to play Five Card Stud with their colleagues at Weizmann.

Whether you believe this or not, the fact is that mental poker is really challenging, and did motivate a lot of new work. To explain this, let's go over the basics of the problem.

Face up or Face Down?

For a mental poker scheme to be useful, it has to achieve three basic goals:
  1. The shuffle must be random.
  2. The deck must be correct -- no trick aces! 
  3. Some cards must be dealt face down.
Let's focus on the last point for a moment. If you've ever toyed with the idea of writing a card game, you probably know how to represent face-up cards: just let each be an integer from 0 through 51. Obviously we'll need a table to map those values into something that humans can work with, but fundamentally this is no sweat.

Face-down cards, on the other hand, are more complicated. We need a publicly-visible data structure that can do what a physical card does: it must commit the holder to a particular face value. At the same time, it must hide that value from everyone else.

The simple and elegant solution (in case you didn't see it coming) is to encrypt the face-down cards using a semantically-secure encryption scheme such as Elgamal.

(A quick note: 'semantic security' is a fundamental security definition that's considered the minimum bar for every modern encryption scheme. In English, it means that you can't learn any useful information from a ciphertext. It goes without saying that the definition was proposed in a paper about mental poker.)

While encryption solves some of our problems, it doesn't quite solve them all. In fact, it gives us a big new one. If we're going to encrypt the cards, who the heck is going to hold the decryption key?

And this is where modern public-key cryptography really shines. A basic trick is that every player can generate a 'share' of the decryption key such that the sum of all those shares is the decryption key for the entire scheme (they can also use these shares to generate the public key). This allows the players to cooperatively decrypt any ciphertext, such that no individual player ever gets the entire decryption key.

And now we have almost enough to describe a 'simple' mental poker protocol.
Generate the deck. First, one player generates an ordered 'deck' of cards -- the integers (0, 1, 2, 3, ..., 51) -- and encrypts each one using known randomness. Since this player might be a cheater, she publishes all of her work so the group can repeat and verify it. If all goes well, the group should be confident that the deck is well-formed, i.e., there are no missing cards or duplicates.
Shuffle. The players now pass the deck around the 'table', giving each individual a chance to shuffle it thoroughly. This is harder than it sounds; it's not enough to simply permute the order of the encrypted cards, since the ciphertexts themselves are recognizable (think of each card as having a different back-face). The good news is that schemes like Elgamal allow you to re-randomize a ciphertext -- change its outward appearance without changing the underlying plaintext.
Unfortunately, re-randomizing ciphertexts leads to yet another serious limitation: if the output of the shuffle can't be linked to the input, then a malicious player could simply replace the cards instead of shuffling them. The last player to shuffle would essentially be deciding the order (and even the types of cards in) the deck!
Fortunately, in their quest for better ways to play poker, cryptographers have solved this problem as well. The answer is something called a publicly-verifiable shuffle (used in Mix networks, which completely by accident were later shown to have applications in e-Voting and anonymous communication). In these systems, the shuffler can actually prove that the shuffled deck contains the same cards, without giving away the shuffle.
Dealing. If everything's gone well, and if at least one player is honest, the players should now have a beautifully shuffled deck of encrypted cards. It remains only to deal them. 
To do this, the players cooperatively decrypt the cards using their key shares. If the card is face-up, they go 'all the way' and simply reveal the plaintext to the whole group. If it's face-down, they collaborate most of the way, but let the recipient handle the last component of the decryption process.
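The steps above can be sketched with a toy Elgamal implementation. Everything here is illustration-sized: a tiny group, three hypothetical players, and none of the zero-knowledge proofs a real protocol needs. But it shows the three tricks in one place: additive key shares, card encryption, ciphertext re-randomization, and cooperative decryption.

```python
import random

# Toy Elgamal over a small prime-order group: p = 2q + 1 with p, q prime,
# and g = 4 generates the subgroup of order q. Illustration-sized only.
p, q, g = 467, 233, 4

# Each of three players keeps a secret share; the deck's public key is the
# product of their public shares, so no single player holds the whole key.
shares = [random.randrange(1, q) for _ in range(3)]
h = 1
for x in shares:
    h = h * pow(g, x, p) % p   # h = g^(x1 + x2 + x3)

# Encode card number c as the group element g^(c+1), with a reverse table.
encode = {c: pow(g, c + 1, p) for c in range(52)}
decode = {v: c for c, v in encode.items()}

def encrypt(card):
    r = random.randrange(1, q)
    return (pow(g, r, p), encode[card] * pow(h, r, p) % p)

def rerandomize(ct):
    # Same plaintext, fresh appearance: multiply in an encryption of 1.
    c1, c2 = ct
    s = random.randrange(1, q)
    return (c1 * pow(g, s, p) % p, c2 * pow(h, s, p) % p)

def cooperative_decrypt(ct):
    # Each player contributes c1^x_i; nobody ever reveals their share.
    c1, c2 = ct
    blind = 1
    for x in shares:
        blind = blind * pow(c1, x, p) % p
    return decode[c2 * pow(blind, -1, p) % p]

ct = encrypt(17)
assert rerandomize(ct) != ct                  # looks like a different card...
print(cooperative_decrypt(rerandomize(ct)))   # 17 -- but decrypts the same
```

For a face-down deal, the last multiplication in `cooperative_decrypt` would be done privately by the card's recipient rather than in public.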
Ok, maybe this wasn't so simple. And there's still a lot I'm leaving out -- including the whole Mix mechanism and a bunch of zero-knowledge proofs you'd need to prevent cheating in the decryption process. Still, figuring that stuff out isn't the problem.

The problem is that the protocol above is expensive.

A single shuffle alone can result in megabytes of data, which every party has to verify. So much for playing mental poker over the phone (or, to use a modern analogy, Twitter). Clearly a better approach is needed.

Golle to the rescue?

The problem with the 'classic' approach is that it requires the players to generate and shuffle an entire deck of cards every time they play. But this is pointless work, since many games don't actually use the entire deck. A better approach would generate random cards on the fly, then check for 'collisions': cards that have already been dealt to the players.

This is what Philippe Golle proposed back in 2005, in a paper submitted to the ITCC e-Gaming track (What is this? It sounds awesome.) I know about this paper because a student recently implemented it in Charm; hence I can attest to the fact that it's practical. Actually, it cooks.

Golle's idea was to use the additively-homomorphic property of schemes such as Elgamal,* to allow the parties to draw random cards. In a nutshell, each of the k players selects a random value in the range 0 to 51, then encrypts her value and publishes it to the other players. The group can now add the ciphertexts together, which gives them the encryption of (r_1 + r_2 + ... + r_k).

By working together the group can decrypt the summed ciphertext, which reveals a plaintext in the range 0 to (k*51). By reducing this modulo 52 they obtain a random card number.
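Here's a toy sketch of that draw using 'exponential' Elgamal -- encrypting g^r so that multiplying ciphertexts adds the exponents, as in the footnote below. For brevity it uses a single key rather than player shares, illustration-sized parameters, and omits the collision handling discussed next.

```python
import random

# Toy 'exponential Elgamal' sketch of the random card draw. One key pair
# for brevity (a real protocol would share the key among the players);
# p = 2q + 1 with p, q prime, and g = 4 generates the order-q subgroup.
p, q, g = 467, 233, 4
x = random.randrange(1, q)     # decryption key
h = pow(g, x, p)

def encrypt_exp(r):
    # Encrypt g^r: multiplying two ciphertexts adds the exponents.
    s = random.randrange(1, q)
    return (pow(g, s, p), pow(g, r, p) * pow(h, s, p) % p)

def draw_card(num_players=3):
    # Each player contributes an encrypted random value in 0..51.
    cts = [encrypt_exp(random.randrange(52)) for _ in range(num_players)]
    c1 = c2 = 1
    for a, b in cts:           # homomorphic sum of the r_i
        c1, c2 = c1 * a % p, c2 * b % p
    m = c2 * pow(pow(c1, x, p), -1, p) % p   # m = g^(r_1 + ... + r_k)
    # Recover the exponent from a small precomputed table, then reduce.
    table = {pow(g, i, p): i for i in range(52 * num_players)}
    return table[m] % 52

print(draw_card())             # a card number in 0..51
```

The table lookup is why this trick only works for small exponent ranges: recovering a discrete log by table is fine for sums up to a few hundred, and hopeless in general.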

The big obstacle to this approach is that you can get repeated cards, i.e., collisions. The clever part of Golle's protocol is the way that he checks that a given card hasn't been dealt to a player already, which allows him to 'throw back' any colliding cards.

I won't spend much time on how this works, except to point out that Golle's paper may have a small bug (though one that's easily fixed). To make a long story short, when a collision occurs in the first round of dealing -- i.e., a card is dealt that already exists in a player's hand -- Golle will actually 'throw back' both the new card, and the card that was already in the player's hand.

Why am I complaining about this? Well, imagine that a real poker dealer did this. That is, he paused, asked to look at your hand, took away the King of Hearts and put it back in the deck (with a brief apology for his mistake). It strikes me that this would not be viewed as kosher. I'll leave it to a real poker player to tell me what the implications are, but I'm trusting they're not good.

All the rest

This has been a long post, and while I hope it's given some of the flavor of the problem, obviously there's still tons I haven't said.

For example, how do you check for winning conditions? How do you handle players who 'drop out' when the cards don't go their way? And finally, how would you tie the result of the game to a Bitcoin 'pot'? (Bitcoin script seems like a great candidate for this, but unfortunately it's not well supported on the actual Bitcoin network.)

And of course, none of this addresses the real problem with online poker, which mere cryptography does not solve -- namely, how to keep your opponents from sharing their hands via a backchannel.

Still, this is a fun area that I'd love to see pursued with a bit more vigor. Yes, from time to time someone promises to do it, but those promises never quite materialize. So if you're a poker fan and you find this stuff interesting, please do drop me a line.


* For the super-wonky: Elgamal is actually multiplicatively homomorphic, but that's ok. What's being encrypted is g^r, for some generator g. The product g^r1 * g^r2 = g^{r1+r2}, which can be decrypted by testing the result against a table of pre-computed values. (This only works for small values.)