Indifferentiability

After umpteen weeks writing about broken stuff, I’m thrilled to tell you that for once, nothing awful is happening in the crypto world. It won’t last. But while it does, let’s grab the opportunity to talk about something constructive. 

Now a word of warning: what I’m going to talk about today is fairly wonky and (worse), involves hash functions. If you’re not feeling up for this, this is your cue to bail and go read something nice about buffer overflows.

For those still with me, the subject of this post is the design of hash functions, and more specifically: the indifferentiability proofs that designers write to argue for their security. I was surprised to find that most people have never heard of these proofs, and thus have no idea why they’re useful. That’s too bad, since they’re extremely important to the way we analyze hash functions today.

Merkle-Damgård

This is not Ivan Damgård. (Seriously Google?)

The best way to begin any discussion of hash function design is to take a quick peek inside of the hash functions we actually use. Since the most popular hashes today are MD5 (ugh) and SHA, the right place to start is with the ‘Merkle-Damgård’ paradigm.

To understand Merkle-Damgård, you need to understand that cryptographers love to build complicated things out of simpler components. Under the hood of most block ciphers you’ll find S-boxes. Similarly, if you take the lid off a Merkle-Damgård hash function — surprise! — you find block ciphers. Or at least, something very much like them.

This approach dates to a 1979 proposal by a young cryptographer named Ralph Merkle. What Merkle showed is a way to build hash functions with a variable-length input, using any fixed-length one-way compression function (a one-way function that spits out fewer bits than it takes in). While Merkle wasn’t specific about the function, he suggested that DES might be a good candidate.

Expressed as a colorful diagram, the Merkle construction looks something like this:

Merkle-Damgård Construction (source: Wikipedia because I’m too lazy to
draw my own diagrams). IV is a fixed value. f is a one-way compression function.

The beauty of Merkle’s proposal is that it’s relatively simple to understand. You simply chop your message into blocks, then feed each block into the function f along with the output of the previous function evaluation. Throw in a finalization stage and you’re done.
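
To make the structure concrete, here is a minimal toy sketch of the Merkle-Damgård loop in Python. The compression function f below is just a stand-in built from SHA-256 for illustration (a real design would use a dedicated compression function, or a block cipher in something like Davies-Meyer mode), and the padding is the usual length-strengthening finalization.

```python
import hashlib

BLOCK_SIZE = 64   # bytes per message block (toy parameter)
STATE_SIZE = 32   # bytes of chaining state

def f(state: bytes, block: bytes) -> bytes:
    """Toy stand-in for a one-way compression function:
    (32-byte state, 64-byte block) -> 32-byte state."""
    return hashlib.sha256(state + block).digest()

def md_pad(message: bytes) -> bytes:
    """Merkle-Damgård strengthening: append 0x80, zeros, and the message
    length in bits, so the result is a multiple of BLOCK_SIZE."""
    length_bits = (8 * len(message)).to_bytes(8, "big")
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_SIZE)
    return padded + length_bits

def md_hash(message: bytes, iv: bytes = b"\x00" * STATE_SIZE) -> bytes:
    """Chop the padded message into blocks and chain each one through f."""
    state = iv
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = f(state, padded[i:i + BLOCK_SIZE])
    return state  # the final chaining value is the digest

print(md_hash(b"hello world").hex())
```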

Of course there’s a difference between proposing a technique and showing that it actually works. It would take ten more years, but at CRYPTO 1989, Merkle and another cryptographer named Ivan Damgård independently submitted formal analyses of Merkle’s proposal. What they showed is that as long as the function f has certain ideal properties, the resulting hash function is guaranteed to be collision-resistant.  The rest, as they say, is history.

The popularity of Merkle-Damgård can be attributed in part to its security proof. But it also owes something to some major practical advantages:

  1. You can use any secure block cipher as the function f, with just a few tweaks.
  2. M-D hash functions can be pretty darn fast, again depending on f and how you use it.
  3. M-D hashes allow you to digest ‘live’ data streams, where you don’t know in advance how much data you’re going to be hashing. 
Of course, Merkle-Damgård hashes also have serious weaknesses. The most famous is the ‘length extension attack’, in which an attacker, given only H(M) for some unknown message M, can ‘tack on’ additional blocks of her own choosing. This issue spells big trouble for people who think that H(key || message) is a good Message Authentication Code.

What’s interesting about the length-extension issue is not that it leads to broken MACs. I mean, that is interesting, and it’s why you should use HMAC. But what’s really interesting is that this flaw doesn’t represent a violation of the collision-resistance guarantee. The two issues are in fact completely orthogonal. And this tells us something fundamental. Namely: collision-resistance is not enough.

Today’s implementers do all kinds of crazy things with hash functions, and many of those applications require much more than collision-resistance. To achieve the necessary properties, we first need to figure out what they are. And that requires us to think hard about the following question:

What the heck is a secure hash function?

If you crack a typical security textbook (or visit the Wikipedia page on hash functions), you’ll see a long list of things a hash function ‘must’ accomplish. The list usually starts with these:
  1. Collision resistance. It should be hard to find any pair of messages M1, M2 such that H(M1) == H(M2).
  2. Pre-image resistance. Given only h it should be hard to find a ‘pre-image’ M2 such that H(M2) == h.

Now leave aside the technical fact that none of the unkeyed hash functions we use today are ‘truly’ collision-resistant. Or that the above definition of pre-image resistance implies that I can hash my cat’s name (‘fluffy’) and nobody can invert the hash (note: not true. Go ask LinkedIn if you don’t believe me.) The real problem is that these definitions don’t cover the things that people actually do with hash functions.

For example, take the construction of PRNGs. A common PRNG design hashes together large pools of gathered entropy in the hope that the result will be sufficiently uniform for cryptographic work. This is so common that it’s probably happening somewhere on your computer right now. And yet, absolutely nothing in the definitions above implies that this technique is safe!* Similar problems exist for key derivation functions, and even for signature schemes like ECDSA which clearly require hash functions that are more than just collision-resistant.

The more you look into the way that people use hash functions, the more you realize that they really need something that produces ‘random-looking’ output. Unfortunately, this notion is surprisingly hard to formalize. Hash functions are unkeyed, so they’re not pseudo-random functions. What in the world are people asking for?

Random oracles and indifferentiability

The answer, if you dig hard enough, is that people want hash functions to be random oracles.

Random oracles are cryptographers’ conception of what an ‘ideal’ hash function should be. Put succinctly, a random oracle is a perfectly random function that you can evaluate quickly. Random functions are beautiful not just because their output is random-looking (of course), but also because they’re automatically collision-resistant and pre-image resistant. In a sense, ‘behaves like a random function’ is the only requirement you ever need.

The problem with random functions is that you just can’t evaluate them quickly: you need exponential storage space to keep one, and exponential time to evaluate it. Moreover, we know of nothing in the ‘real’ world that can approximate them. When cryptographers try to analyze their schemes with random functions, they have to go off into an imaginary fantasy world that we call the ‘random oracle model’.

But ok, this post is not to judge. For the moment, let’s imagine that we are willing to visit this fantasy world. An obvious question is: what would it take to build a random oracle? If we had a compression function that was good enough — itself a random function — could we use a technique like Merkle-Damgård to get the rest of the way?

In 2004, Maurer, Renner and Holenstein gave us a powerful tool for answering this question. What they showed is that it’s always possible to replace functionality A (e.g., a random oracle) with another functionality B (e.g., an ideal compression function) provided that the following rules are satisfied:

  1. There exists a way to ‘construct’ something ‘like’ A out of B.
  2. There exists a way to ‘simulate’ something ‘like’ B using A.
  3. An attacker who interacts with {constructed A-like thing, B} cannot tell the difference (i.e., can’t differentiate it) from {A, simulated B-like thing}

The definition of simulation gets a bit wonky, but expressed in simpler language, all this means is: if you can show that your hash function, instantiated with an ‘ideal’ compression function, looks indistinguishable from a random oracle, and you can also show that a manufactured compression function, built using a random oracle as an ingredient, looks indistinguishable from an ideal compression function, then you can always replace one with the other. That is, your hash function is ‘good enough’ to be a random oracle.

The following year, Coron, Dodis, Malinaud and Puniya applied this framework to Merkle-Damgård-hash functions. Their first result was immediate: such a proof does not work for Merkle-Damgård. Of course this shouldn’t actually surprise us. We already know that Merkle-Damgård doesn’t behave like a random oracle, since random oracles don’t exhibit length-extension attacks. Still it’s one thing to know this, and another to see a known problem actually turn up and screw up a proof. So far, no problem.

What Coron et al. showed next is much more interesting:
  • They proved formally that Merkle-Damgård can be made indifferentiable from a random oracle, as long as you apply a prefix-free encoding to the input before hashing it. Prefix-free encodings prevent length-extensions by ensuring that no message can ever be a prefix of another.
  • Next, they proved the security of HMAC applied to a Merkle-Damgård hash.
  • Finally, and best of all, they showed that if you simply drop some bits from the last output block — something called a ‘chop’ construction — you can make Merkle-Damgård hashes secure with much less work.

The best part of Coron et al.‘s findings is that the chop construction is already (inadvertently) in place on SHA384, which is constructed by dropping some output bits from its big-brother hash SHA512. The modern hash variants SHA512/224 and SHA512/256 also have this property.** So this theoretical work already has one big payoff: we know that (under certain assumptions) these hashes may be better than some of the others.
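
To see just how simple the ‘chop’ idea is, here’s a sketch in Python: compute a wide hash and throw away part of the output, so an observer never sees the full internal chaining value. This is only an illustration of the chop operation: SHA-384 and SHA-512/256 as actually standardized also use distinct initial values, so the truncation below will not literally reproduce hashlib.sha384.

```python
import hashlib

def chop_sha512(message: bytes, out_bytes: int = 48) -> bytes:
    """'Chop' construction sketch: compute SHA-512 and keep only part of
    the final chaining value. Hiding part of the state is what blocks
    length-extension-style misbehavior."""
    return hashlib.sha512(message).digest()[:out_bytes]

msg = b"indifferentiability"
print(chop_sha512(msg).hex())             # 384-bit truncation of SHA-512
print(hashlib.sha384(msg).hexdigest())    # real SHA-384 (different IV, so a different value)
```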

And these results have bigger implications. Now that we know how to do this, we can repeat the process for just about every candidate hash function anyone proposes. This lets us immediately weed out obvious bugs, and avoid standardizing another hash with problems like the length extension attack. This process has become so common that all of the SHA3 candidates now sport exactly such an indifferentiability proof.

Of course, in the real world, indifferentiability only takes you so far. It does tell us something, but it doesn’t tell us everything. Sure, if the compression function is perfect, you obtain a strong result about the hash function. But compression functions are never perfect. Real compression functions have glitches and oddities that can make these theoretical results irrelevant. This is why we’ll always need smart people to arm wrestle over which hash we get to use next.

In conclusion

If I had it in me, I’d go on to talk about the SHA3 candidates, and the techniques that each uses to achieve security in this model. But this has already been a long post, so that will have to wait for another time.

I want to say only one final thing.

This is a practical blog, and I admit that I try to avoid theory. What fascinates me about this area is that it’s a great example of a place where theory has directly come to the aid of practice. You may think of hash functions as whizzing little black boxes of ad-hoc machinery, and to some extent they are. But without theoretical analysis like this, they’d be a whole lot more ad-hoc. They might not even work.

Remember this when NIST finally gets around to picking Keccak or BLAKE.

Notes:

* For a ridiculous example, imagine that you have a secure (collision-resistant, pre-image resistant) hash function H. Now construct a new hash function H’ such that H'(M) = {“long string of 0s” || H(M)}. This function is as collision-resistant as the original, but won’t be very useful if you’re generating keys with it.
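
In code, that counterexample might look like the sketch below: any collision in H’ is a collision in H, so H’ is exactly as collision-resistant, yet half of every output is a fixed run of zero bytes, which is useless if you planned to slice keys out of it.

```python
import hashlib

def H(message: bytes) -> bytes:
    return hashlib.sha256(message).digest()

def H_prime(message: bytes) -> bytes:
    # As collision-resistant as H, but the first 32 bytes are always zero --
    # terrible if you intend to carve key material out of the output.
    return b"\x00" * 32 + H(message)

print(H_prime(b"fluffy").hex())
```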

** Thanks to Paulo Barreto for fixing numerous typos and pointing out that SHA512/256 and /224 make excellent candidates for chop hashes!

A bad couple of years for the cryptographic token industry

SafeNet eToken PRO Anywhere

There was a time just a few weeks ago when it seemed like the CRYPTO 2012 accepted papers list might not be posted in time for, well, CRYPTO 2012. Fortunately the list is public now, which means (1) we’ve avoided the longest rump session in the history of cryptography, and (2) I get to tell you about a particularly neat paper.

The paper in question is ‘Efficient Padding Oracle Attacks on Cryptographic Hardware’ by Bardou, Focardi, Kawamoto, Simionato, Steel and Tsay. This is a typically understated academic title for a surprisingly nifty result.

Here’s the postage stamp version: due to a perfect storm of (subtle, but not novel) cryptographic flaws, an attacker can extract sensitive keys from several popular cryptographic token devices. This is obviously not good, and it may have big implications for people who depend on tokens for their day-to-day security. If that describes you, I suggest you take a look at this table:

Tokens affected by the Bardou et al. attacks. A checkmark means “vulnerable”. (source).

That’s the headline news, anyway. The more specific (and important) lesson for cryptographic implementers is: if you’re using PKCS#1v1.5 padding for RSA encryption, cut it out. Really. This is the last warning you’re going to get.

So much for the short version. Keep reading for the long one.

What are cryptographic tokens?

If you’ve ever developed cryptographic software, you know that general-purpose computers aren’t the safest place to store keys. Leaving aside stuff like this, all it takes is one nasty software bug to send all your important key material (e.g., TLS server keys, certificate signing keys, authentication keys) into someone else’s hands.

Since it seems increasingly difficult to harden standard computers, a popular solution is to simply put the keys somewhere else. The most popular ‘somewhere’ is onboard a cryptographic co-processor. When the co-processor is expensive and powerful, we often refer to it as a Hardware Security Module (HSM). When it’s removable and consumer-grade we call it a ‘token‘. Most tokens are used for authentication purposes (think SecurID), but they’re useful for other things like disk encryption. Modern tokens typically look like USB sticks.

Tokens operate under some of the most challenging conditions you can imagine. The computer they’re connected to may be completely pwned. An attacker could steal a token and physically dismember it in an attempt to extract its secrets. None of this is easy to deal with, and manufacturers have invested a lot of time and money getting this right.

Unfortunately, what Bardou et al. are telling us is: maybe they didn’t invest enough. Several popular token brands are still vulnerable to cryptographic attacks that have been well-known since the 1990s. And that is bad news.

But this paper is brand new. Why are you talking about ‘a bad couple of years’?

While this new paper is a bad result for the token industry, it’s not the first bad news we’ve had about these devices.

Let me explain. You see, while tokens won’t let you extract sensitive keys, they make an exception if the sensitive keys are encrypted (aka: ‘wrapped’). This makes a lot of sense. How else are you supposed to import/export/back up keys from a token? In theory, this encryption (wrapping) should keep the keys safe even if an attacker has totally compromised your computer.

An older problem — discovered by a series of researchers in 2009 and 2010 — is that token manufacturers were careless in the way they implemented the PKCS#11 Cryptographic Token Interface Standard. The spec says that encryption keys are to be used for either (a) encryption/decryption of normal data, OR (b) for wrapping sensitive keys. But never both. If you allow users to perform both operations with a given key, you enable the following attack (sketched in code after the list):

  1. Ask the token to encrypt a sensitive key K and export the ciphertext.
  2. Ask the token to decrypt the ciphertext and hand you the result.
  3. Congratulations, you have K!
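
In pseudo-Python, against a purely hypothetical token object (real code would go through PKCS#11 calls along the lines of C_WrapKey and C_Decrypt), the three steps above amount to nothing more than:

```python
def extract_sensitive_key(token, wrapping_key, sensitive_key):
    """Dual-use-key attack sketch. 'token', 'wrap_key' and 'decrypt' are
    hypothetical stand-ins for a real PKCS#11 session and key handles."""
    # 1. Ask the token to encrypt (wrap) the sensitive key K and export it.
    wrapped = token.wrap_key(wrapping_key, sensitive_key)
    # 2. Ask the token to decrypt that ciphertext as if it were ordinary data.
    recovered = token.decrypt(wrapping_key, wrapped)
    # 3. Congratulations, you have K.
    return recovered
```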

It’s hard to believe that something this ‘stupid’ actually worked, but for some tokens it did. The whole mess reached a head in 2010 when Bortolozzo, Focardi, Centenaro and Steel automated the process of finding API bugs like this, and were able to ‘break’ a whole bunch of tokens in one go.

(Token manufacturers were not happy with this, by the way. At CCS 2010, Graham Steel described being invited to a major French token manufacturer where the CEO held him captive and berated him, er, gently tried to persuade him not to publish his results. Such are the hazards of responsible disclosure.)

The above issues should be long fixed by now. Still, they raised questions! If token manufacturers can’t implement a secure API, how well are they doing with things like, say, complicated cryptographic algorithms? And that’s the question this new paper tries to answer.

So what’s wrong with the crypto?

What the CRYPTO paper shows is that several commercial tokens (but not all of them!) are vulnerable to practical padding oracle attacks on both symmetric (AES-CBC) and asymmetric (RSA PKCS #1v1.5) encryption. These attacks are a huge problem, since they give an attacker access to keys that otherwise should be safely encrypted outside of the token.

Padding attacks take different forms, but the general intuition is this:

  1. Many encryption schemes apply padding to the plaintext before encryption.
  2. After decrypting a given ciphertext, most implementations check to see if the padding is valid. Some output a recognizable error (or timing differential) when it isn’t.
  3. By mauling a legitimate (intercepted) ciphertext in various ways, the attacker can submit many modified versions of the ciphertext to be decrypted. Some of these will have valid padding, others will produce error results. Surprisingly, just these bits of information can be sufficient to gradually reveal the underlying plaintext.

For tokens, you start with an encrypted key you want to learn, and you submit it to the token for import. Normally the token will just accept it, which isn’t very helpful. So instead, you modify the ciphertext in some way, and send the mangled ciphertext to be imported. If the token responds with something like ‘could not import key: the ciphertext had bad padding’, you may be able to run a padding oracle attack.

(Note that it doesn’t take an explicit error: if decryption is simply faster when the padding is bad, you’ve got everything you need.)

The most famous (and efficient) padding oracle attack is Vaudenay’s attack on padded CBC-mode encryption, proposed back in 2002. (See here for an idea of what the attack looks like.) The first finding of Bardou et al. is that many tokens are vulnerable to Vaudenay’s attack. Yikes.
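
If you’ve never seen the CBC version in action, here’s a self-contained toy in Python (using the pyca/cryptography package) that recovers a single plaintext byte purely from ‘padding good/bad’ answers. It’s only a sketch of the core trick; the full Vaudenay attack repeats this byte by byte across the whole message.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = os.urandom(16)

def encrypt(plaintext: bytes, iv: bytes) -> bytes:
    pad_len = 16 - (len(plaintext) % 16)
    padded = plaintext + bytes([pad_len]) * pad_len          # PKCS#7 padding
    enc = Cipher(algorithms.AES(KEY), modes.CBC(iv)).encryptor()
    return enc.update(padded) + enc.finalize()

def padding_oracle(iv: bytes, ciphertext: bytes) -> bool:
    """Decrypt and reveal only whether the PKCS#7 padding checks out."""
    dec = Cipher(algorithms.AES(KEY), modes.CBC(iv)).decryptor()
    padded = dec.update(ciphertext) + dec.finalize()
    pad_len = padded[-1]
    return 1 <= pad_len <= 16 and padded[-pad_len:] == bytes([pad_len]) * pad_len

# Attacker's view: an IV/ciphertext pair plus oracle access, nothing else.
iv = os.urandom(16)
ct = encrypt(b"YELLOW SUBMARINE", iv)   # 16-byte message -> two ciphertext blocks
first_block = ct[:16]

# Maul the IV until the first block decrypts to something ending in valid
# 0x01 padding; at that point the last plaintext byte is revealed.
for guess in range(256):
    mauled_iv = bytearray(iv)
    mauled_iv[-1] ^= guess ^ 0x01
    if padding_oracle(bytes(mauled_iv), first_block):
        print("recovered last plaintext byte:", chr(guess))   # prints 'E'
        break
```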

But what about RSA encryption? This is important, since many tokens only support RSA to import secret keys.

Fortunately (for the researchers) RSA encryption also uses padding, and even better: these tokens use a scheme that’s long been known to have padding oracle problems of its own. That scheme is known as RSA-PKCS#1v1.5, and it was famously broken by Daniel Bleichenbacher in the late 1990s. As long as these implementations produce obvious errors (or timing differences) when they encounter bad padding, there should be an ‘off the shelf’ attack that works against them.

But here the researchers ran into a problem. Bleichenbacher’s attack is beautiful, but it’s too slow to be useful against tokens. When attacking a 1024-bit RSA key, it can require millions of decryption attempts! Your typical consumer-grade token can only compute a few decryptions per second.

No problem! To make their attack work, Bardou et al. ‘simply’ optimized Bleichenbacher’s attack. The new version requires mere thousands (or tens of thousands) of decryption attempts, which means you can unwrap sensitive keys in just a few minutes on some tokens.

This table tells us how long the attacks actually took to run against real tokens:

Timing results for the optimized Bleichenbacher attacks (on 1024-bit RSA encryptions). ‘Oracle’ indicates the nature of the RSA decryption/padding check implementation. (source)

Is my HSM secure?

If you’re an HSM manufacturer I have ‘good’ news for you. You’re ok. For now.

Oh, not because your HSM is secure or anything. Hilariously, the researchers were unable to run their attacks on a commercial HSM because they couldn’t afford one. (They can cost upwards of EUR 20,000.) Don’t get complacent: they’re working on it.

How does the RSA attack work?

For the real details you should see the paper. Here I can offer only the tiniest little intuition.

The Bleichenbacher attack relies on the fact that RSA is homomorphic with respect to multiplication. That means you can multiply a hidden RSA plaintext (the “m” in “m^e mod N”) by some known value “s” of your choosing. What you do is compute “m^e * s^e mod N”. The decryption of this value is “(m*s) mod N”.
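
That malleability property takes only a few lines of Python to demonstrate, using textbook-sized toy parameters (real keys are 1024+ bits and real plaintexts are padded, but the algebra is identical):

```python
# Toy textbook-RSA parameters, for illustration only
p, q = 61, 53
N = p * q          # 3233
e, d = 17, 2753    # e*d = 1 mod phi(N)

m = 65                              # the hidden plaintext
c = pow(m, e, N)                    # ciphertext: m^e mod N

s = 7                               # attacker-chosen multiplier
c_mauled = (c * pow(s, e, N)) % N   # (m*s)^e mod N, computed without knowing m

assert pow(c_mauled, d, N) == (m * s) % N   # decrypting the mauled value yields m*s mod N
print(pow(c_mauled, d, N), (m * s) % N)
```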

Now consider the details of RSA-PKCS#1v1.5. To encrypt a legitimate message, the encryptor first pads it with the following bytes, and treats the padded result as the integer “m” to be RSA encrypted.

0x 00 02 { at least 8 non-zero random bytes } 00 { message }

So you’ve intercepted a valid, PKCS#1v1.5 padded RSA ciphertext, and you have access to a decryptor who knows the secret key. The trick now is to ‘mangle’ the legitimate ciphertext in various clever ways and submit these new results to the decryptor.

The basic approach is to multiply the ciphertext by different chosen values “s^e mod N” (for known “s”) until eventually you find one that does not produce a padding error. This means that the decrypted value “(m*s mod N)” is itself a PKCS-padded message. This seems inconsequential, but actually it tells you quite a lot. Presumably you know “s”, since you chose it. You also know that “m*s mod N” is an integer of a very specific form, one that satisfies the padding checks. By adaptively choosing new “s” values and submitting many, many ciphertexts for decryption, these tiny bits of information will allow you to ‘zero in’ on the underlying plaintext.

The authors of this paper describe several optimizations to Bleichenbacher’s attack. What I find most interesting is the way they exploited differences in the way that tokens actually implement their padding checks. It turns out that the nature of this implementation makes a huge difference to the efficiency of the attack.

Five possible ‘implementations’ are described below; a rough sketch of the strictest and loosest checks, in code, follows the list.

  1. FFF: padding is ‘ok’ only if correctly padded and plaintext is of a specific length (e.g., it’s a 128-bit AES key)
  2. FFT: padding is ‘ok’ only if correctly padded, but plaintext can be any length.
  3. FTT: same as above, but also allows 0s in the “non-zero random bytes”.
  4. TFT: same as above, but ‘ok’ even if there are no zeros after the first byte.
  5. TTT: padding is ‘ok’ as long as it starts with ‘0x 00 02’.
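
To make the strict/loose distinction concrete, here is a rough Python rendering of the two extremes as described above (the paper’s formal oracle definitions are more precise; this is just to convey the flavor):

```python
HEADER = b"\x00\x02"

def check_strict(em: bytes, key_len: int = 16) -> bool:
    """Roughly the 'FFF'-style check: correct header, at least 8 non-zero
    padding bytes, a 0x00 separator, and a payload of one fixed length
    (e.g., a 128-bit AES key)."""
    if em[:2] != HEADER:
        return False
    sep = em.find(b"\x00", 2)          # position of the separator byte
    if sep < 10:                       # no separator, or fewer than 8 padding bytes
        return False
    return len(em) - (sep + 1) == key_len

def check_loose(em: bytes) -> bool:
    """Roughly the 'TTT'-style check: anything starting with 0x00 0x02 passes."""
    return em[:2] == HEADER
```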

These all look quite similar, but have very different practical implications!

You see, intuitively, the Bleichenbacher attack only ‘works’ if sometimes the decryptor says ‘hey, this mauled ciphertext has good padding!’. As a general rule, the more often that happens, the faster the attack works.

It goes without saying that the TTT padding check is a lot more likely to produce ‘good padding’ results than FFF, and correspondingly for the intermediate ones. FFF is so restrictive that attacking it is basically impractical. The ‘good’ (uh, ‘bad’) news is that many tokens tend towards the more permissive implementations. And as you can see in the table below, little details like this have huge impacts.

Total number of decryption queries required to decrypt an RSA-PKCS#1v1.5-encrypted symmetric (AES) key.  ‘Original algorithm’ (blue box) shows the mean/median performance of the standard Bleichenbacher attack. ‘Modified algorithm’ (red box) is the mean/median performance of the new, optimized algorithm. Note that the specific implementation of the padding check (FFF/TTT etc.) makes a huge difference in the cost. (source) 

What can I do to protect my own code from padding oracle attacks?

If you’re using symmetric encryption, that’s easy. Simply use authenticated encryption, or put a MAC on all of your ciphertexts. Do this correctly and Vaudenay padding oracles will not trouble you.
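
For instance, with the pyca/cryptography package, authenticated encryption is a few lines of AES-GCM. This is a sketch rather than a drop-in recipe; the one rule you must respect is that a nonce is never reused under the same key.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

nonce = os.urandom(12)                 # 96-bit nonce; never reuse with this key
ct = aead.encrypt(nonce, b"attack at dawn", b"associated header")   # ciphertext || tag
pt = aead.decrypt(nonce, ct, b"associated header")                  # raises InvalidTag if tampered
assert pt == b"attack at dawn"
```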

RSA encryption is more difficult. The ‘best practice’ in implementing RSA is: don’t implement RSA. Other people have done it better than you can. Go find a good implementation of RSA-OAEP and use that. End of story.
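
As one example of ‘a good implementation’, here’s what RSA-OAEP looks like with the pyca/cryptography package (a sketch; key management and error handling are your problem):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

ct = public_key.encrypt(b"wrapped key material", oaep)
pt = private_key.decrypt(ct, oaep)
assert pt == b"wrapped key material"
```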

Unfortunately, even with RSA-OAEP, this can be difficult. There are attacks on OAEP encryption too (some are mentioned in this paper). And if you’re working in hardware, life is particularly difficult. Even small timing differentials can leak details about your padding check. If this describes your use case: find a professional and be very, very careful.

Are there any broad security lessons, any wisdom, to be drawn from this result?

Usually the lesson I glean from these attacks is: people make mistakes. Which is barely a lesson, and it’s certainly not wisdom. For some of the issues in this paper, namely the symmetric padding oracle attacks, that’s still about what I get. Manufacturers made unforced errors.

But the RSA-PKCS#1v1.5 attacks are, to me, a bit more meaningful. I run into bad implementations of this obsolete standard all the time. Most of the time when I bring it up, nobody really cares. The typical response is something like: it’s called the ‘million message attack’ for a reason. There’s no way anyone would run that thing against our system.

Bruce Schneier says ‘attacks only get better, they never get worse’. In my opinion, the lesson of this paper is that attacks not only get better, but if you go around tempting fate, they’re quite capable of getting better by precisely the amount necessary to break your (high-value) system. You should never use a broken primitive just because the ‘known’ attack seems impractical. 

And while this may not be ‘wisdom’ either, I think it’s advice to live by.

Flame, certificates, collisions. Oh my.

Update 6/6: Microsoft has given us more details on what’s going on here. If these collision attacks are truly novel, this tells us a lot about who worked on Flame and how important it is. 

Update 6/7: Yes, it’s officially a novel MD5 collision attack, with the implication that top cryptographers were involved in Flame’s creation. We truly are living in the future.

See detailed updates (and a timeline) at the bottom of this post.

If you pay attention to these things, you’ve heard that there’s a new piece of government-issued malware making the rounds. It’s called Flame (or sometimes Skywiper), and it’s basically the most sophisticated — or most bloated — piece of malicious software ever devised. Kaspersky gets the credit for identifying Flame, but the task of tearing it to pieces has fallen to a whole bunch of different people.

Normally I don’t get too excited about malware, cause, well, that kind of thing is somebody else’s problem. But Flame is special: if reports are correct, this is the first piece of malware ever to take advantage of an MD5 certificate collision to enable code signing. Actually, it may be the first real anything to use MD5 certificate collisions. Neat stuff.

What we know is pretty sketchy, and is based entirely on the information-sparse updates coming from Microsoft’s security team. Here’s the story so far:

Sometime yesterday (June 3), Microsoft released an urgent security advisory warning administrators to revoke two Intermediate certificates hanging off of the Microsoft root cert. These belong to the Terminal Services Licensing service. Note that while these are real certificates — with keys and everything — they actually have nothing to do with security: their sole purpose was to authorize, say, 20 seats on your Terminal Services machine.

Despite the fact that these certs don’t serve any real security purpose, and shouldn’t be confused with anything that matters, Microsoft hung them off of the same root as the real certificates it uses for, well, important stuff. Stuff like signing Windows Update packages. You see where this is going?

Actually, this shouldn’t have been a real problem, because these licensing certs (and any new certificates beneath them) should have been recognizably different from code-signing certificates. Unfortunately, someone in Product Licensing appears to have really screwed the pooch. These certificates were used to generate Licensing certificates for customers. And each of those certificates had the code signing bit set.

The result? Anyone who paid for a Terminal Services license could (apparently) sign code as if they were Microsoft. If you’re wondering what that looks like, it looks like this.

This is obviously a bad thing. For many reasons. Mikko Hypponen points out that many organizations whitelist software signing by Microsoft, so even if institutions were being paranoid, this malware basically had carte blanche.

Ok, so far this is really bad, but just a garden-variety screwup. I mean, a huge one, to be sure. But nothing really interesting.

Here’s where that changes. Just today, Microsoft released a new update. This is the new piece:

The Flame malware used a cryptographic collision attack in combination with the terminal server licensing service certificates to sign code as if it came from Microsoft. However, code-signing without performing a collision is also possible.

Now, I’m not sure why the ‘collision attack’ is necessary if code-signing is possible without a collision. But who cares! A collision attack! In the wild! On top-secret government-issued Malware! Dan Brown couldn’t hold a candle to this.

If we look at the Licensing certificates in question, we do indeed see that at least one — created in 2009 — uses the MD5 hash function, something everyone knows is a bad idea:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            3a:ab:11:de:e5:2f:1b:19:d0:56
    Signature Algorithm: md5WithRSAEncryption
        Issuer: OU=Copyright (c) 1997 Microsoft Corp., OU=Microsoft Corporation, CN=Microsoft Root Authority
        Validity
            Not Before: Dec 10 01:55:35 2009 GMT
            Not After : Oct 23 08:00:00 2016 GMT
        Subject: C=US, ST=Washington, L=Redmond, O=Microsoft Corporation, OU=Copyright (c) 1999 Microsoft Corp., CN=Microsoft Enforced Licensing Intermediate PCA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)

And so… Well, I don’t really know. That’s where we run out of information. And end up with a lot of questions.

For one thing, why did the Flame authors need a collision? What extra capabilities did it get them? (Update: the best theory I’ve heard comes from Nate Lawson, who theorizes that the collision might have allowed the attackers to hide their identity. Update 6/6: The real reason has to do with certificate extensions and compatibility with more recent Windows versions. See the end of this post.)

More importantly, what kind of resources would it take to find the necessary MD5 collision? That would tell us a lot about the capabilities of the people (er, government contractors) who write malware like this. In 2008 it took one day on a supercomputer, i.e., several hundred PS3s linked together.

And just out of curiosity, what in the world was Microsoft doing issuing MD5-based certificates in 2009? I mean, the code signing bit is disastrous, but at least that’s a relatively subtle issue — I can understand how someone might have missed it. But making MD5 certificates one year after they were definitively shown to be insecure, that’s hard to excuse. Lastly, why do licensing certificates even need to involve a Certificate Signing Request at all, which is the vector you’d use for a collision attack?

I hope that we’ll learn the answers to these questions over the next couple of days. In the meantime, it’s a fascinating time to be in security, if not, unfortunately, a very good time for anyone else.

***

Update 6/5: For those who aren’t familiar with certificate collision attacks, I should clear up the basic premise. Most Certificate Authorities sign a Certificate Signing Request (CSR) issued by a possibly untrustworthy customer. The idea of an MD5 collision-finding attack is to come up with two different CSRs that hash to the same thing. One would be legitimate and might contain your Terminal Services licensing number. The other would contain… something else. By signing the first CSR, Microsoft would be implicitly creating a certificate on the second batch of information.

Finding collisions is a tricky process, since it requires you to muck with the bits of the public key embedded in the certificate (see this paper for more details). Also, some CAs embed a random serial number into the certificate, which really messes with the attack. Microsoft did not.

Finally, some have noticed that Microsoft is still using a root certificate based on MD5, one that doesn’t expire until 2020 and hasn’t been revoked. What makes this ok? The simple answer is that it’s not ok. However, Microsoft probably does not sign arbitrary CSRs with that root certificate, meaning that collision attacks are not viable against it. It’s only the Intermediate certs, the ones that actually sign untrusted CSRs, that you need to worry about — today.

***

Update 6/6: Microsoft has finally given us some red meat. Short summary: the Terminal Services Licensing certs do work to sign code on versions of Windows prior to Vista, with no collision attack needed. All in all, that’s a heck of a bad thing. But it seems that more recent versions of Windows objected to the particular X.509 extensions that were in the TS licensing certs. The government (or their contractors) therefore used a collision attack to make a certificate that did not have these extensions. From what I’m reading on mailing lists, the collision appears to be “new, and of scientific interest”.

If this pans out, it’s a big deal. Normally the government doesn’t blow a truly sophisticated cryptanalytic attack on some minor spying effort. Getting MD5 collisions up and running is a big effort; it’s not something you can do from a cookbook. But developing novel collision techniques is a step beyond that. I take this to mean that Flame’s authors were breaking out the big guns, which tells us something about its importance to our government. It also tells us that the creation of Flame may have involved some of the top mathematicians and cryptographers in (our?) intelligence services. This is an important datapoint.

***

Update 6/7: It looks like it does pan out. Marc Stevens — one of the authors of the earlier MD5 certificate collision papers — confirms that Flame’s certificate uses a novel collision-finding technique.

“Flame uses a completely new variant of a ‘chosen prefix collision attack’ to impersonate a legitimate security update from Microsoft. The design of this new variant required world-class cryptanalysis”

We can only speculate about why this would be necessary, but a good guess is that Flame is old (or, at the very least, its crypto techniques are). If the collision technique was in the works prior to 2009 (and why wouldn’t it be?), the Stevens et al. result wouldn’t have had much relevance. For those who like details, a (very incomplete) timeline looks something like this:

  1. Mid-1990s: MD5 is effectively obsolete. Switch to a better hash function!
  2. August 2004: Wang and Yu present first (‘random’), manually-found collision on MD5.
  3. 2004-2007: Collisions extended to ‘meaningful’ documents. Lots of stuff omitted here.
  4. May 2007: (CWI) Stevens et al. publish a first impractical technique for colliding certificates.
  5. Late 2008: (CWI) Stevens et al. develop a practical certificate collision.
  6. December 2008: Vendors notified.
  7. February 2009: Paper submitted to CRYPTO.
  8. Summer 2009: Source code released.

Even assuming that the CWI team had perfect security with their result, conference program committees are hardly built for secrecy — at least, not from governments. So it’s reasonable to assume that the Stevens et al. technique was known by February 2009, possibly earlier. Which means that the Flame developers may have had their own approach under development sometime before that date.

The really interesting question is: when did secret collision research get ahead of the public, academic stuff? Post-2007? Post-2004? Pre-2004? I doubt we’ll ever know the answer to this question. I sure would love to.

Update 6/12: Alex Sotirov has a great Summercon presentation with many of the details. The word from those who know is: don’t take his $20,000 figure too literally. We know nothing about the techniques used to find this collision.

Posts so far

The problem with blogs is that, well, they’re weblogs. The ‘good’ posts dribble off the bottom, where they get mixed in with the bad, and nobody ever sees them again. The more crap I write, the worse this problem is.

To fight this — and to prevent myself from writing the same post over and over again — I thought it might be helpful to compile a list of a few posts that aren’t too embarrassing. If you’re new here, you can treat this as a table of contents to this blog.

(Hey, there are things going on in the world! The CRYPTO list of accepted papers has finally been published! There are neat attacks on cryptographic tokens! I want to write about all of it, but I just don’t have time today. So please accept these re-runs for now, and hopefully I’ll have new content soon.)

On the mess that is our public-key infrastructure:

  1. The Internet is broken, can we please fix it? On Trustwave & MITM attacks.
  2. TACK, a proposal for dynamically ‘pinning’ certificates.
High-level intro posts:
  1. It’s the end of the world as we know it, and I feel fine. Post-quantum crypto from 30,000 feet.
  2. What is the random oracle model and why should I care? An early series, a little embarrassing.
  3. Format preserving encryption. Or: how to encrypt a credit card number with AES.
  4. What’s TLS Snap Start? and So long False Start. On two (now withdrawn) TLS extensions.
  5. The future of electronic currency. On anonymous e-cash.
  6. Offline security through CAPTCHAs. A neat old idea for preventing dictionary attacks.
  7. Poker is hard, especially for cryptographers. All about mental poker.
  8. Fully-Homomorphic Encryption. Unfortunately this is still unfinished…
How to use cryptography (in)securely:
  1. How (not) to use symmetric encryption. A few of the worst pitfalls.
  2. What’s the deal with RC4? A history of attacks on an old stream cipher.
  3. How to choose an authenticated encryption mode. Very important!
  4. Random number generation, an illustrated primer. A look under the hood.
  5. Surviving a bad RNG. What to do if your (P)RNG isn’t carrying its weight.
  6. Circular security. A wonkier, more theoretical subject.
  7. On multiple encryption. Are you safer if you encrypt twice?
Crypto attack(s) of the week:
  1. On the BEAST attack. Note: written before the details were made public.
  2. XML Encryption. Why you should authenticate your ciphertexts.
  3. Side-channel attacks on DESFire. Neat.
  4. Datagram TLS. Alfardan & Paterson show that  timing attacks are (still) practical.
  5. 2011 Redux. A quick summary of a whole year.
  6. Satellite phone encryption is terrible. Attacks on two satphone ciphers.
  7. The story of EAX(prime). And why security proofs are like Knight Rider.
  8. A tale of two patches. Analyzing two recent OpenSSL bugs.
Rants:
  1. Digital Fortress: I read it so you don’t have to. Dan Brown embarrasses cryptography.
  2. Bram Cohen corrected. In which I randomly flame Bram Cohen.
  3. Bram Cohen corrects me? Bram turns out to be a good sport.
  4. Why Antisec matters. The security industry is a joke?
When I thought this blog was a book:
  1. Introduction
  2. Where things fall apart: Primitives, Protocols

TACK

Those who read this blog know that I have a particular fascination with our CA system and how screwed up it is. The good news is that I’m not the only person who feels this way, and even better, there are a few people out there doing more than just talking about it.

Two of the ‘doers’ are Moxie Marlinspike and Trevor Perrin, who recently dropped TACK, a new proposal for ‘pinning’ TLS public keys. I would have written about TACK last week when it was fresh, but I was experimenting with a new approach to international travel where one doesn’t adjust oneself to the local timezone (nb: if you saw me and I was acting funny last week… sorry.)

TACK stands for Trust Assertions for Certificate Keys, but really, does it matter? TACK is one of those acronyms that’s much more descriptive in short form than long: it’s a way to ‘pin’ TLS servers to the correct public key even when a Certificate Authority is telling you differently, e.g., because the CA is misbehaving or has had its keys stolen.

In the rest of this post I’m going to give a quick overview of the problem that TACK tries to solve, and what TACK brings to the party.

Why do I need TACK?

For a long answer to this question you can see this previous post. But here’s the short version:

Because ‘secure communication’ on the Internet requires us to trust way too many people, many of whom haven’t earned our trust, and probably couldn’t earn it if they tried.

Slightly less brief: most secure communication on the Internet is handled using the TLS protocol. TLS lets us communicate securely with anyone, even someone we’ve never met before.

While TLS is wonderful, it does have one limitation: it requires you to know the public key of the server you want to talk to. If an attacker can convince you to use the wrong public key (say, his key), you’ll end up talking securely to him instead. This is called a Man-in-the-Middle (MITM) attack and yes, it happens.

In principle we solve this problem using PKI. Any one of several hundred ‘trusted’ Certificate Authorities can digitally sign the server’s public key together with a domain name. The server sends the resulting certificate at the start of the protocol. If we trust all the CAs, life is good. But if any single one of those CAs lets us down, things get complicated.

And this is where the current CA model breaks down. Since every CA has the authority to sign any domain, a random CA in Malaysia can sign your US .com, even if you haven’t authorized it to, and even if that CA has never signed a .com before. It’s an invitation to abuse, since an attacker need only steal one CA private key anywhere in the world, and he can MITM any website.

The worst part of the ‘standard’ certificate model is how dumb it is. You’ve been connecting to your bank for a year, and each time it presented a particular public key certified by Verisign. Then one day that same site shows up with a brand new public key, signed by a CA from Guatemala. Should this raise flags? Probably. Will your browser warn you? Probably not. And this is where TACK comes in.

So what is TACK and how does it differ from other pinning schemes?

TACK is an extension to TLS that allows servers to ‘pin’ their public keys, essentially making the following statement to your TLS client:

This is a valid public key for this server. If you see anything else in this space, and I haven’t explicitly told you I’m changing keys, there’s something wrong.

Now, public-key pinning is not a new idea. A similar technique has been used by protocols like SSH for years, Google Chrome has a few static pins built in, and Google even has a draft HTTP extension that adds dynamic pinning capabilities to that protocol.

The problem with earlier pinning schemes is that they’re relatively inflexible. Big sites like Google and Facebook don’t run just one server, nor do they deploy a single public key. They run dozens or hundreds of servers, TLS terminators, etc., all configured with a mess of different keys and certificates. Moreover, configurations change frequently for business reasons. Pinning a particular public key to ‘facebook.com’ seems like a great idea, but probably isn’t as simple as you’d think it is. Moreover, a mistaken ‘pin’ is an invitation to a network disaster of epic proportions.

TACK takes an end-run around these problems, or rather, a couple of them:

  1. It adds a layer of indirection. Instead of pinning a particular TLS public key (either the server’s key, or some higher-level certificate public key), TACK pins sites to a totally separate ‘TACK key’ that’s not part of the certificates at all. You then use this key to make assertions about any of the keys on your system, even if each server uses a different one.
  2. It’s only temporary. A site pin lasts only a few days, no more than a month — unless you visit the site again to renew it. This puts pretty strict limits on the amount of damage you can do, e.g., by misconfiguring your pins, or by losing the TACK key.

So what exactly happens when I visit a TACK site?

So you’ve securely connected via TLS to https://facebook.com, which sports a valid certificate chain. Your browser also supports TACK. In addition to all the usual TLS negotiation junk, the server sends down a TACK public key in a new TLS extension field. It uses the corresponding TACK signing key to sign the server’s TLS public key. Next:

  1. Your browser makes note of the TACK key, but does nothing at all.

You see, it takes two separate connections to a website before any of the TACK machinery gets started. This is designed to insulate clients against the possibility that one connection might be an MITM.

The next time you connect, the server sends the same TACK public key/signature pair again, which activates the ‘pin’ — creating an association between ‘facebook.com’ and the TACK key, and telling your browser that this server public key is legitimate. If you last connected five days ago, then the ‘pin’ will be valid for five more days.

From here on out — until the TACK pin expires — your client will require that every TLS public key associated with facebook.com be signed by the TACK key. It doesn’t matter if Facebook has eight thousand different public keys, as long as each of those public keys gets shipped to the client along with a corresponding TACK signature. But if at any point your browser does not see a necessary signature, it will reject the connection as a possible MITM.
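
If it helps to see the bookkeeping spelled out, here is a very rough Python sketch of the client-side logic described above. Every name here is made up, and the real TACK draft has more moving parts (signature verification, key generations, the precise expiry rules); this only illustrates the ‘second sighting activates the pin’ idea.

```python
import time

PIN_CAP = 30 * 24 * 3600      # a pin never lasts more than ~30 days

class ToyPinStore:
    """Illustrative client-side pin store; not the TACK specification."""

    def __init__(self):
        self.pins = {}        # hostname -> (tack_key, first_seen, active_until)

    def record_sighting(self, hostname, tack_key, now=None):
        """Call after a TLS handshake whose TACK signature verified."""
        now = now if now is not None else time.time()
        entry = self.pins.get(hostname)
        if entry is None or entry[0] != tack_key:
            # First sighting of this key: remember it, but enforce nothing yet.
            self.pins[hostname] = (tack_key, now, now)
            return
        key, first_seen, _ = entry
        # Repeat sighting: the pin stays active for as long as we've known
        # the key (capped), e.g. first seen 5 days ago -> valid 5 more days.
        self.pins[hostname] = (key, first_seen, now + min(now - first_seen, PIN_CAP))

    def pin_required(self, hostname, now=None):
        """If True, every server key must carry a signature from the pinned TACK key."""
        now = now if now is not None else time.time()
        entry = self.pins.get(hostname)
        return entry is not None and now < entry[2]
```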

And that’s basically it.

The beauty of this approach is that the TACK key is entirely separate from the certificate keys. You can sign dozens of different server public keys with a single TACK key, and you can install new TLS certificates whenever you want. Clients’ browsers don’t care about this — the only data they need to store is that one TACK key and the association with facebook.com.

But what if I lose my TACK key or it gets stolen?

The nice thing about TACK is that it augments the existing CA infrastructure. Without a valid certificate chain (and a legitimate TLS connection), a TACK key is useless. Even if you lose the thing, it’s not a disaster, since pins expire relatively quickly. (Earlier proposals devote a lot of time to solving these problems, and even require you to back up your pinning keys.)

The best part is that the TACK signing key can remain offline. Unlike TLS private keys, the only thing the TACK key is used for is creating static signatures, which get stored on the various servers. This dramatically reduces the probability that the key will be stolen.

So is TACK the solution to everything?

TACK addresses a real problem with our existing TLS infrastructure, by allowing sites to (cleanly) set pins between server public keys and domain names. If widely deployed, this should help to protect us against the very real threat of MITM. It’s also potentially a great solution for those edge cases like MTA-to-MTA mailserver connections, where the PKI model actually doesn’t work very well, and pinning may be the most viable route to security.

But of course, TACK isn’t the answer to everything. The pins it sets are only temporary, and so a dedicated attacker with long-term access to a connection could probably manage to set incorrect pins. At the end of the day this seems pretty unlikely, but it’s certainly something to keep in mind.

If wishes were horses then beggars would ride… a Pwnie!

In case you’re wondering about the title above, I ripped it off from an old Belle Waring post on the subject of wishful thinking in politics. Her point (inspired by this Calvin & Hobbes strip) is that whenever you find yourself imagining how things should be in your preferred view of the world, there’s no reason to limit yourself. Just go for broke!

Take a non-political example. You want Facebook to offer free services, protect your privacy and shield you from targeted advertising? No problem! And while you’re wishing for those things, why not wish for a pony too!

This principle can also apply to engineering. As proof of this, I offer up a long conversation I just had with Dan Kaminsky on Twitter, related to the superiority of DNSSEC vs. application-layer solutions to securing email in transit. Short summary: Dan thinks DNSSEC will solve this problem, all we have to do is get it deployed everywhere in a reasonable amount of time. I agree with Dan. And while we’re daydreaming, ponies! (See how it works?)

The impetus for this discussion was a blog post by Tom Ritter, who points out that mail servers are horrendously insecure when it comes to shipping mail around the Internet. If you know your mailservers, there are basically two kinds of SMTP connection. MUA-to-MTA connections, where your mail client (say, Apple Mail) sends messages to an SMTP server such as GMail. And MTA-to-MTA connections, where Gmail sends the message on to a destination server (e.g., Hotmail).

Now, we’ve gotten pretty good at securing the client-to-server connections. In practice, this is usually done with TLS, using some standard server certificate that you can pick up from a reputable company like Trustwave. This works fine, since you know the location of your outgoing mail server and can request that the connection always use TLS.

Unfortunately, we truly suck at handling the next leg of the communication, where the email is shipped on to the next MTA, e.g., from Google to Hotmail. In many cases this connection isn’t encrypted at all, making it vulnerable to large-scale snooping. But even when it is secured, it’s not secured very well.

What Tom points out is that many mail servers (e.g., Gmail’s) don’t actually bother to use a valid TLS certificate on their MTA servers. Since this is the case, most email senders are configured to accept the bogus certs anyway, because everyone has basically given up on the system.

We could probably fix the certs, but you see, it doesn’t matter! That’s because finding that next MTA depends on a DNS lookup, and (standard) DNS is fundamentally insecure. A truly nasty MITM attacker can spoof the MX record returned, causing your server to believe that mail.evilempire.com is the appropriate server to receive Hotmail messages. And of course, mail.evilempire.com could have a perfectly legitimate certificate for its domain — which means you’d be hosed even if you checked it.

(It’s worth pointing out that the current X.509 certificate infrastructure does not have a universally-accepted field equivalent to “I am an authorized mailserver for Gmail”. It only covers hostnames.)

The question is, then, what to do about this?

There are at least three options. One is that we just suck it up and assume that email’s insecure anyway. If people want more security, they should end-to-end encrypt their mail with something like GPG. I’m totally sympathetic to this view, but I also recognize that almost nobody encrypts their email with GPG. Since we already support TLS on the MTA-to-MTA connection, perhaps we should be doing it securely?

The second view is that we fix this using some chimera of X.509 extensions and public-key pinning. In this proposal (inspired by an email from Adam Langley***), we’d slightly extend X.509 implementations to recognize an “I am a mailserver for Hotmail.com” field, we get CAs to sign these, and we install them on Hotmail’s servers. Of course, we’ll have to patch mailservers like Sendmail to actually check for these certs, and we’d have to be backwards compatible to servers that don’t support TLS at all — meaning we’d need to pin public keys to prevent downgrade attacks. It’s not a very elegant solution at all.

The third, ‘elegant’ view is that we handle all of our problems using DNSSEC. After all, this is what DNSSEC was built for! To send an email to Hotmail, my server just queries DNS for Hotmail.com, and (through the magic of public-key cryptography) it gets back a signed response guaranteeing that the MX record is legitimate. And a pony too!

But of course, back in the real world this is not at all what will happen, for one very simple reason:

  • Many mail services do not support DNSSEC for their mail servers.
  • Many operating systems do not support DNSSEC resolution. (e.g., Windows 8)

Ok, that’s two reasons, and I could probably add a few more. But at the end of the day DNSSEC is the pony with the long golden mane, the one your daughter wants you to buy her — right after you buy her the turtle and the fish and the little dog named Sparkles.

So maybe you think that I’m being totally unfair. If people truly want MTA-to-MTA connections to be secure, they’ll deploy DNSSEC and proper certificates. We’ll all modify Sendmail/Qmail/Etc. to validate that the DNS resolution is ‘secured’ and they’ll proceed appropriately. If DNS does not resolve securely, they’ll… they’ll…

Well, what will they do? It seems like there are only two answers to that question. Number one: when DNSSEC resolution fails (perhaps because it’s not supported by Hotmail), then the sender fails open. It accepts whatever insecure DNS response it can get, and we stick with the current broken approach. At least your email gets there.

Alternatively, we modify Sendmail to fail closed. When it fails to accomplish a secure DNS resolution, the email does not go out. This is the security-nerd approach.

Let me tell you something: Nobody likes the security-nerd approach.

As long as DNSSEC remains at pitiful levels of deployment, it’s not going to do much at all. Email providers probably won’t add DNSSEC support for MX records, since very few mailservers will be expecting it. Mailserver applications won’t employ (default, mandatory) DNSSEC checking because none of the providers will offer it! Around and around we’ll go.

But this isn’t just a case against DNSSEC, it relates to a larger security architecture question. Namely: when you’re designing a security system, beware of relying on things that are outside of your application boundary.

What I mean by this is that your application (or transport layer implementation, etc.) should not be dependent on outside infrastructure for its security. If possible, it should try to achieve its security goals all by itself, making as few assumptions about the system that it’s running on as possible, unless those assumptions are rock solid. If there’s any doubt that these assumptions will hold in practice, your system needs a credible fallback plan other than “ah, screw it” or “let’s be insecure”.

The dependence on a secure DNS infrastructure is one example of this boundary-breaking problem, but it’s not the only one.

For example, imagine that you’re writing an application that depends on having a strong source of pseudo-random numbers. Of course you could get these from /dev/random (or /dev/urandom), but what if someone runs your application on a system that either (a) doesn’t have these, or (b) has them, but they’re terrible?
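
In Python, for instance, the defensive habit is to lean on the operating system and refuse to run rather than silently fall back to something weak (a sketch, not a policy recommendation):

```python
import os

def get_seed(num_bytes: int = 32) -> bytes:
    """Ask the OS for entropy; os.urandom raises NotImplementedError
    when no OS randomness source can be found."""
    try:
        return os.urandom(num_bytes)
    except NotImplementedError:
        raise SystemExit("no OS randomness source available; refusing to continue")

seed = get_seed()
```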

Now sometimes you make the choice to absorb a dependency, and you just move on. Maybe DNSSEC is a reasonable thing to rely on. But before you make that decision, you should also remember that it’s not just a question of your assumptions breaking down. There are human elements to think of.

Remember that you may understand your design, but if you’re going to be successful at all, someday you’ll have to turn it over to someone else. Possibly many someones. And those people need to understand the fullness of your design, meaning that if you made eight different assumptions, some of which occur outside of your control, then your new colleagues won’t just have eight different ways to screw things up. They’ll have 2^8 or n^8 different ways to screw it up.

You as the developer have a duty to yourself and to those future colleagues to keep your assumptions right there where you can keep an eye on them, unless you’re absolutely positively certain that you can rely on them holding in the future.

Anyway, that’s enough on this subject. There’s probably room on this blog for a longer and more detailed post about how DNSSEC works, and I’d like to write that post at some point. Dan obviously knows a lot more of the nitty-gritty than I do, and I’m (sort of) sympathetic to his position that DNSSEC is awesome. But that’s a post for the future. Not the DNSSEC-distant future, hopefully!

Notes:

* Dan informs me that he already has a Pwnie, hence the title.
** Thomas Ptacek also has a much more convincing (old) case against DNSSEC.
*** In fairness to Adam, I think his point was something like ‘this proposal is batsh*t insane’, maybe we should just stick with option (1). I may be wrong.

How to choose an Authenticated Encryption mode

If you’ve hung around this blog for a while, you probably know how much I like to complain. (Really, quite a lot.) You might even be familiar with one of my favorite complaints: dumb crypto standards. More specifically: dumb standards promulgated by smart people.

The people in question almost always have justifications for whatever earth-shakingly stupid decision they’re about to take. Usually it’s something like ‘doing it right would be hard‘, or ‘implementers wouldn’t be happy if we did it right‘. Sometimes it’s ‘well, we give the option to do it right‘. In the worst case they’ll tell you: ‘if it bothers you so much, why don’t you join the committee and suggest that idea yourself, Mr. Smartypants‘.

Well, first of all, it’s Dr. Smartypants. And moreover, I’ve tried. It doesn’t work.

Case in point: I happen to be lurking on the mailing list of a standards committee that recently decided to allow unauthenticated CBC mode encryption as an option in their new web encryption standard. When I pointed out that the exact same decision led to the failure of a previous standard — ironically, one that this new standard will probably replace — I was told, politely, that:

  1. Mandating authenticated encryption would be hard.
  2. Real implementers don’t know how to implement it.
  3. We already offer the option to use authenticated encryption.
  4. Stop telling us things we already know.

The worst part: they really did know. The committee included some smart, smart people. People who know that this is a bad idea, and who have decided either to just go with it, or else have convinced themselves that implementers won’t (a) pick the easy, insecure option, and then (b) screw it up completely. I have news for these people: Yes, they will. This is why we write standards.

After all this build-up, it may surprise you that this is not a post about standards committees. It’s not even a post about smart people screwing things up. What I’m here to talk about today is Authenticated Encryption: what the hell it is, why you need it, and finally (assuming you’re good with all that) which of the many, many AE schemes you should consider for your application.

First, some background.

What’s Authenticated Encryption and why should I care?

For those of you who don’t know what AE is, I first need to explain one basic fact that isn’t well explained elsewhere:

Nearly all of the symmetric encryption modes you learned about in school, textbooks, and Wikipedia are (potentially) insecure.

This covers things like AES when used in standard modes of operation like CBC and CTR. It also applies to stream ciphers like RC4. Unfortunately, the list of potentially insecure primitives includes many of the common symmetric encryption schemes that we use in practice.

Now, I want to be clear. These schemes are not insecure because they leak plaintext information to someone who just intercepts a ciphertext. In fact, most modern schemes hold up amazingly well under that scenario, assuming you choose your keys properly and aren’t an idiot.

The problem occurs when you use encryption in online applications, where an adversary can intercept, tamper with, and submit ciphertexts to the receiver. If the attacker can launch such attacks, many implementations can fail catastrophically, allowing the attacker to completely decrypt messages.

Sometimes these attacks require the attacker to see only an error message from the receiver. In other cases all he needs to do is measure the time it takes for the receiver to acknowledge the submission. This type of attack is known as a chosen ciphertext attack, and by far the most common embodiment is the ‘padding oracle attack’ discovered in 2002 by Serge Vaudenay. But there are others.

The simplest way to protect yourself against these attacks is to simply MAC your ciphertexts with a secure Message Authentication Code such as HMAC-SHA. If you prefer this route, there are two essential rules:

  1.  Always compute the MACs on the ciphertext, never on the plaintext.
  2.  Use two different keys, one for encryption and one for the MAC.

Rule (1) prevents chosen-ciphertext attacks on block cipher modes such as CBC, since your decryption process can reject those attacker-tampered ciphertexts before they’re even decrypted. Rule (2) deals with the possibility that your MAC and cipher will interact in some unpleasant way. It can also help protect you against side-channel attacks.

This approach — encrypting something, then MACing it — is not only secure, it’s provably secure as long as your encryption scheme and MAC have certain properties. Properties that most common schemes do seem to possess.*
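If you’d like to see what those two rules look like in practice, here’s a minimal sketch in Python, using the ‘cryptography’ package for AES-CTR and the standard library for HMAC. It’s an illustration of the pattern, not a vetted library, and the names are mine. Note the two independent keys and the fact that the MAC covers the ciphertext (nonce included).

import os, hmac, hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

MAC_LEN = 32  # HMAC-SHA256 output length

def seal(enc_key, mac_key, plaintext):
    nonce = os.urandom(16)                                # fresh CTR nonce per message
    enc = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).encryptor()
    ct = nonce + enc.update(plaintext) + enc.finalize()
    tag = hmac.new(mac_key, ct, hashlib.sha256).digest()  # rule 1: MAC the ciphertext
    return ct + tag                                       # rule 2: mac_key is not enc_key

def open_sealed(enc_key, mac_key, sealed):
    ct, tag = sealed[:-MAC_LEN], sealed[-MAC_LEN:]
    expected = hmac.new(mac_key, ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):            # reject before decrypting anything
        raise ValueError("bad MAC")
    nonce, body = ct[:16], ct[16:]
    dec = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).decryptor()
    return dec.update(body) + dec.finalize()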

Dedicated AE(AD) modes

Unfortunately, the ‘generic composition’ approach above is not the right answer for everyone. For one thing, it can be a little bit complicated. Moreover, it requires you to implement two different primitives (say, a block cipher and a hash function for HMAC), which can be a hassle. Last, but not least, it isn’t necessarily the fastest way to get your messages encrypted.

The efficiency issue is particularly important if you’re either (a) working on a constrained device like an embedded system, or (b) you’re working on a fast device, but you just need to encrypt lots of data. This is the case for network encryptors, which have to process data at line speeds — typically many gigabytes per second!

For all of these reasons, we have specialized block cipher modes of operation called Authenticated Encryption (AE) modes, or sometimes Authenticated Encryption with Associated Data (AEAD). These modes handle both the encryption and the authentication in one go, usually with a single key.

AE(AD) modes were developed as a way to make the problem of authentication ‘easy’ for implementers. Moreover, some of these modes are lightning fast, or at least allow you to take advantage of parallelization to speed things up.

Unfortunately, adoption of AE modes has been a lot slower than one would have hoped, for a variety of reasons. One is that it’s hard to find good implementations; another is that there are tons and tons of AE(AD) schemes.

So, which AE mode should I choose?

And now we get down to brass tacks. There are a plethora of wonderful AE(AD) modes out there, but which one should you use? There are many things to consider. For example:

  • How fast is encryption and decryption?
  • How complicated is the implementation?
  • Are there free implementations out there?
  • Is it widely used?
  • Can I parallelize it?
  • Is it ‘on-line’, i.e., can I start encrypting before I know the total message length?
  • Is it patented?
  • Does it allow me to include Associated Data (like a cleartext header)?
  • What does Matt Green think about it?

To answer these questions (and particularly the most important final one), let’s take a look at a few of the common AE modes that are out there. All of these modes support Associated Data, which means that you can pre-pend an unencrypted header to your encrypted message if you want. They all take a single key and some form of Initialization Vector (nonce). Beyond that, they’re quite different inside.

GCM. Galois Counter Mode has quietly become the most popular AE(AD) mode in the field today, despite the fact that everyone hates it. The popularity is due in part to the fact that GCM is extremely fast, but mostly it’s because the mode is patent-free. GCM is ‘on-line’ and can be parallelized, and (best): recent versions of OpenSSL and Crypto++ provide good implementations, mostly because it’s now supported as a TLS ciphersuite. As a side benefit, GCM will occasionally visit your house and fix broken appliances.

Given all these great features, you might ask: why does everyone hate GCM? In truth, the only people who hate GCM are those who’ve had to implement it. You see, GCM is CTR mode encryption with the addition of a Carter-Wegman MAC set in a Galois field. If you just went ‘sfjshhuh?’, you now understand what I’m talking about. Implementing GCM is a hassle in a way that most other AEADs are not. But if you have someone else’s implementation — say OpenSSL’s — it’s a perfectly lovely mode.
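And using someone else’s implementation really is painless. Here’s roughly what it looks like with the Python ‘cryptography’ package’s AESGCM wrapper (OpenSSL’s EVP interface follows the same general shape): one key, a fresh 96-bit nonce per message, and an optional associated-data header that gets authenticated but not encrypted.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)           # 96-bit nonce; never reuse one under the same key
header = b"record 1"             # associated data: authenticated, not encrypted

ct = AESGCM(key).encrypt(nonce, b"attack at dawn", header)
pt = AESGCM(key).decrypt(nonce, ct, header)   # raises InvalidTag if anything was tampered with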

OCB. In performance terms Offset Codebook Mode blows the pants off of all the other modes I mention in this post. It’s ‘on-line’ and doesn’t require any real understanding of Galois fields to implement** — you can implement the whole thing with a block cipher, some bit manipulation and XOR. If OCB was your kid, he’d play three sports and be on his way to Harvard. You’d brag about him to all your friends.

Unfortunately OCB is not your kid. It belongs to Philip Rogaway, who also happens to hold a patent on it. This is no problem if you’re developing GPL software (it’s free for you), but if you want to use it in a commercial product — or even license under Apache — you’ll probably have to pay up. As a consequence OCB is used in approximately no industry standards, though you might find it in some commercial products.

EAX. Unlike the other modes in this section, EAX mode doesn’t even bother to stand for anything. We can guess that E is Encryption and A is Authentication, but X? I’m absolutely convinced that EAX is secure, but I cannot possibly get behind a mode of operation that doesn’t have a meaningful acronym.

EAX is a two-pass scheme, which means that encryption and authentication are done in separate operations. This makes it much slower than GCM or OCB, though (unlike CCM) it is ‘on-line’. Still, EAX has three things going for it: first, it’s patent-free. Second, it’s pretty easy to implement. Third, it uses only the Encipher direction of the block cipher, meaning that you could technically fit it into an implementation with a very constrained code size, if that sort of thing floats your boat. I’m sure there are EAX implementations out there; I just don’t know of any to recommend.

Whatever you do, be sure not to confuse EAX mode with its dull cousin EAX(prime), which ANSI developed only so it could later be embarrassingly broken.

CCM. Counter Mode with CBC MAC is the 1989 Volvo station wagon of AEAD modes. It’ll get you to your destination reliably, just not in a hurry. Like EAX, CCM is also a two-pass scheme. Unfortunately, CCM is not ‘on-line’, which means you have to know the size of your message before you start encrypting it. The redeeming feature of CCM is that it’s patent-free. In fact, it was developed and implemented in the 802.11i standard (instead of OCB) solely because of IP concerns. You can find an implementation in Crypto++.

The rest. There are a few more modes that almost nobody uses. These include XCBC, IAPM and CWC. I have no idea why the first two haven’t taken off, or if they’re even secure. CWC is basically a much slower version of GCM mode, so there’s no real reason to use it. And of course, there are probably plenty more that I haven’t listed. In general: you should use those at your own risk.

Summing up

So where are we?

In general, the decision of which cipher mode to use is not something most people make every day, but when you do make that decision, you need to make the right one. Having read back through the post, I’m pretty sure that the ‘right’ answer for most people is to use GCM mode and rely on a trusted free implementation, like the one you can get from OpenSSL.

But there are subcases. If you’re developing a commercial product, don’t care about cross-compatibility, and don’t mind paying ‘a small one-time fee‘, OCB is also a pretty good option. Remember: even cryptographers need to eat.

Finally, if you’re in the position of developing your own implementation from scratch (not recommended!) and you really don’t feel confident with the more complicated schemes, you should seriously consider EAX or CCM. Alternatively, just use HMAC on your ciphertexts. All of these things are relatively simple to deal with, though they certainly don’t set the world on fire in terms of performance.

The one thing you should not do is say ‘gosh this is complicated, I’ll just use CBC mode and hope nobody attacks it’, at least not if you’re building something that will potentially (someday) be online and subject to active attacks like the ones I described above. There’s already enough stupid on the Internet, please don’t add more.

Notes:

* Specifically, your encryption scheme must be IND-CPA secure, which would apply to CBC, CTR, CFB and OFB modes implemented with a secure block cipher. Your MAC must be existentially unforgeable under chosen message attack (EU-CMA), a property that’s (believed) to be satisfied by most reasonable instantiations of HMAC.

** An earlier version of this post claimed that OCB didn’t use Galois field arithmetic. This commenter on Reddit correctly points out that I’m an idiot. It does indeed do so. I stand by my point that the implementation is dramatically simpler than GCM.

A tale of two patches

Nick Mathewson over at the Tor project points me to the latest vulnerability in OpenSSL’s implementation of TLS and DTLS (though only in some OpenSSL versions, YMMV). For those of you keeping score at home, this makes two serious flaws in the same small chunk of DTLS code in about four months.

I should note that Tor doesn’t use DTLS. In fact, based on a quick Google search, there may be more people with radioactive-spider-based superpowers than there are projects actually using OpenSSL DTLS (these folks being one possible exception). Moreover, it’s not clear that this bug (unlike the previous one) is useful for anything more than simple DoS.

But that doesn’t make it useless. Even if these things aren’t exploitable, bugs like this one provide us with the opportunity to learn something about how even ‘simple’ decryption code can go wrong. And that’s what we’re going to do in this post.

So let’s go take a look at the unpatched code in question.

This code handles the decryption of DTLS packets, and comes from a slightly older OpenSSL version, v1.0.0e (released 9/2011). To give you a flavor of how one could miss the first (less serious) issue, I was going to reproduce the entire 161-line megafunction in which the bug occurs. But it’s just too huge and awkward. Here’s the relevant bit:

int dtls1_enc(SSL *s, int send)
{
SSL3_RECORD *rec;
EVP_CIPHER_CTX *ds;
unsigned long l;
int bs,i,ii,j,k,n=0;
const EVP_CIPHER *enc; 

        
if ((bs != 1) && !send)
  {
ii=i=rec->data[l-1]; /* padding_length */
i++;
if (s->options&SSL_OP_TLS_BLOCK_PADDING_BUG)
   {
/* First packet is even in size, so check */
if ((memcmp(s->s3->read_sequence,
                            "\0\0\0\0\0\0\0\0",8) == 0) && !(ii & 1))
s->s3->flags|=TLS1_FLAGS_TLS_PADDING_BUG;
if (s->s3->flags & TLS1_FLAGS_TLS_PADDING_BUG)
i--;
   }
/* TLS 1.0 does not bound the number of padding bytes by the block size.
* All of them must have value 'padding_length'. */
if (i > (int)rec->length)
   {
/* Incorrect padding. SSLerr() and ssl3_alert are done
* by caller: we don't want to reveal whether this is
* a decryption error or a MAC verification failure
* (see http://www.openssl.org/~bodo/tls-cbc.txt)
*/
return -1;
   }
for (j=(int)(l-i); j<(int)l; j++)
   {
if (rec->data[j] != ii)
      {
/* Incorrect padding */
return -1;
      }
   }
rec->length-=i;
            
rec->data += bs;    /* skip the implicit IV */
rec->input += bs;
rec->length -= bs;
   }
 }
return(1);
}

Now I could waste this whole post ranting about code quality, pointing out that variable names like ‘l’ and ‘bs’ were cute when Kernighan and his buddies were trying to get AWK small enough so they could write it down and type it in every day (or whatever they did for storage back then), but not so cute in 2012 inside of the most widely-used crypto library on earth.

I could also ask why this function is named “dtls1_enc” when in fact it actually contains decryption code as well as encryption code. I might even suggest that combining both operations into one massive, confusing function might have sounded like a better idea than it actually is.

But I’m not going to do all that, since that would be dull and the point of this post is to find a bug. So, do you see it? Hint: it’s in the boldfaced part.

Maybe I should give you a little more. Every DTLS packet begins with an ‘implicit’ Initialization Vector of length ‘bs’. The packet also comes with a bit of padding at the end, to bring the packet up to an even multiple of the cipher’s block size. Let ‘Mx’ represent message bytes and let ‘LL’ be the number of padding bytes expressed as a byte. Then the tail of the packet will look something like:

0x M1 M2 M3 M4 LL LL LL LL LL LL LL LL LL LL LL LL

An important note is that TLS 1.0 allows up to 255 padding bytes even if the blocksize is much less than that. So the last twenty-something lines of the boldfaced decryption code are basically checking the last byte of the decrypted message, then working backward through (possibly) multiple blocks of ‘bs’ bytes each, in order to strip off all of that ugly padding.

Do you see it now?

Ok, I’ll spoil it for you. The entirety of the bug comes in the statement “if (i > (int)rec->length)”, which checks that the message is at least as long as the padding claims to be. What this doesn’t do is check that the Initialization Vector is there as well (update: I’ve edited this post a bit, since I got confused the first time!). Because of the missing check it’s possible to submit a short message that consists of only padding, in which case rec->length will be less than ‘bs’. At which point we flow through and hit this statement:

rec->length -= bs;

Oops. See what happens there? It might be helpful if I told you that rec->length is an unsigned long. So predictable consequences.
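If you want to see just how predictable, here’s the arithmetic in a couple of lines of Python (the numbers are mine; the point is the unsigned wraparound):

BS = 16                    # cipher block size
rec_length = 3             # a malicious record shorter than one block
# What 'rec->length -= bs' produces when rec->length is a 64-bit unsigned long:
print((rec_length - BS) % 2**64)   # 18446744073709551603, a gigantic bogus record length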

The OpenSSL folks describe this as a DoS vulnerability, not an exploitable one. I’m perfectly happy to take them at their word. But, yuck. This sort of stuff happens, I realize, but should it have to?

Since you’re still reading this, you’re obviously some kind of glutton for punishment. That probably means you’re going to stick around for the second (and somewhat older) issue, which was patched back in January. In fact, I wrote about it on this blog. To get there, you have to climb one level up to the caller of the function we just looked at.

This is easier said than done since — this being OpenSSL — of course the previous function isn’t called directly, it’s accessed via a function pointer! Oh lord. Ah, here we are:

static int
dtls1_process_record(SSL *s)
{


/* decrypt in place in 'rr->input' */
rr->data=rr->input;


enc_err = s->method->ssl3_enc->enc(s,0);  <<<<****** Call to the previous function!
if (enc_err <= 0)
 {
/* decryption failed, silently discard message */
if (enc_err < 0)
  {
rr->length = 0;
s->packet_length = 0;
  }
goto err;  <<<<****** Send an error message
}

/* check the MAC for rr->input (it's in mac_size bytes at the tail) */
if (rr->length < mac_size)
  {
#if 0 /* OK only for stream ciphers */
al=SSL_AD_DECODE_ERROR;
SSLerr(SSL_F_DTLS1_PROCESS_RECORD,SSL_R_LENGTH_TOO_SHORT);
goto f_err;
#else
goto err;
#endif
  }
rr->length-=mac_size;
i=s->method->ssl3_enc->mac(s,md,0);
if (i < 0 || memcmp(md, &(rr->data[rr->length]), mac_size) != 0)
  {
goto err;  <<<<****** Send an error message
  }
}
}
See it? Oh, maybe it will help if we revisit one of the bigger comments in the first function, the one that gets called:
 

/* Incorrect padding. SSLerr() and ssl3_alert are done
* by caller: we don't want to reveal whether this is
* a decryption error or a MAC verification failure
* (see http://www.openssl.org/~bodo/tls-cbc.txt)
*/
return -1;

Does that help? The people who wrote the first function knew full well how dangerous it is to reveal whether you’re failing due to a padding verification error or a MAC verification error. They even provided a link to the reason why!

Unfortunately the people writing the code one level up didn’t understand the full implications of this, or didn’t realize that the information could leak out even if you don’t reveal exactly why you’ve barfed. People can learn why you failed even if there’s just a timing difference in when you give them the error code! This was known and mitigated in the TLS code, but not in the DTLS version.

And the end result is that there are two error conditions, occurring a significant distance from one another in the code. An attacker can monitor the decryption of chosen packets and use this information to slowly decrypt any plaintext. Ouch.
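The textbook mitigation, the thing the TLS code tries to do, is to make both failure paths look the same: do roughly the same amount of work and return one generic error whether it was the padding or the MAC that failed. Here’s a simplified Python sketch of that idea. It is emphatically not the actual OpenSSL fix, and real implementations have to be far more careful about constant-time behavior than this.

import hmac, hashlib

MAC_LEN = 32  # HMAC-SHA256, for illustration

def check_padding(data):
    # TLS-style CBC padding: the last byte gives the pad length, and that many
    # bytes before it (plus the length byte itself) must all carry the same value.
    pad = data[-1]
    if len(data) < pad + 1 + MAC_LEN:
        return False, 0
    return all(b == pad for b in data[-(pad + 1):]), pad + 1

def process_record(mac_key, decrypted):
    # 'decrypted' is the CBC-decrypted record: message || MAC || padding,
    # assumed to be at least MAC_LEN + 1 bytes long.
    pad_ok, pad_len = check_padding(decrypted)
    # Strip something and verify a MAC either way, so that bad padding takes
    # (roughly) the same path as a bad MAC.
    body = decrypted[:len(decrypted) - pad_len] if pad_ok else decrypted
    msg, tag = body[:-MAC_LEN], body[-MAC_LEN:]
    mac_ok = hmac.compare_digest(tag, hmac.new(mac_key, msg, hashlib.sha256).digest())
    if not (pad_ok and mac_ok):
        return None            # one generic failure, whichever check tripped
    return msg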

So what does it all mean?

You wanted meaning? I just wanted to look at some code.

But if I were to assign some arbitrary meaning to these mistakes, it’s that cryptographic code is frigging hard to write. And that more eyes do not necessarily mean fewer bugs, particularly if the code is messy and you have two versions doing substantially similar things (DTLS and TLS). It doesn’t help if your coding standard was basically invented on a dare.

OpenSSL has gotten to where it is today not for being a great library, but for basically being the library. And it’s certainly a good one, now that it’s been subjected to several years of unrelenting scrutiny. But the DTLS code is new and hasn’t received that same level of attention, which is why it’s fun to watch the bugs get shaken out all over again (unfortunately the first bug also appears in TLS code, which is harder to explain). I’m sure we haven’t seen the last of it.

The future of electronic currency

Crypto is fantastic for many things, but those who read this blog know that I have a particular fascination for its privacy applications. Specifically, what interests me are the ways we can use cryptography to transact (securely) online, without revealing what we’re doing, or who we’re doing it with.

This is particularly relevant, since today we’re in the middle of an unprecedented social and technological experiment: moving our entire economy out of metal and paper and into the ‘net. I’ve already had to explain to my four-year-old what newspapers are; I imagine he’ll have a similar experience when his children ask why people once carried funny pieces of paper around in their wallet.

Between credit and debit cards, EFT, online banking and NFC, it seems like the days of cash are numbered. Unfortunately, all is not sunshine and roses. The combination of easy-to-search electronic records and big data seems like a death-knell for our individual privacy. Cryptography holds the promise to get some of that privacy back, if we want it.

In this post I’m going to take a very quick look at a few privacy-preserving ‘e-cash’ technologies that might help us do just that. (And yes, I’ll also talk about Bitcoin.)

Why don’t we have electronic cash today?

The simple answer is that we already have electronic money; we just don’t call it that. Right now in its mainstream incarnation, it takes the form of little plastic cards that we carry around in our wallets. If you live in a developed nation and aren’t too particular about tipping valets, you can pretty much survive without ever touching hard currency.

The problem is that credit and debit cards are not cash. They’re very good for money transfers, but they have two specific limitations: first, they require you to access an online payment network. This means that they lose their usefulness at exactly the moment when you need them most: typically, after a disaster has wiped out or severely limited your connectivity (e.g., most hurricanes in Florida, NYC the morning after 9/11, etc).

Secondly, funds transfer systems offer none of the privacy advantages of real cash. This is probably by (government) preference: untraceable cash lends itself to unsavory activities, stuff like drug dealing, arms purchases and tax evasion. Our modern banking system doesn’t necessarily stop these activities, but it’s a godsend for law enforcement: just about every transaction can be traced down to the $0.01. (And even if you aren’t a drug dealer, there are still plenty of folks who’ll pay good money for a copy of your spending history, just so they can sell you stuff.)

The genesis of private e-cash

chaum_1
David Chaum

Credit for the invention of true, privacy-preserving electronic cash generally goes to David Chaum. Chaum proposed his ideas in a series of papers throughout the 1980s, then made a fortune providing the world with untraceable electronic cash.

Well, actually, the statement above is not quite accurate. According to legend, Chaum turned down lucrative offers from major credit card companies in favor of starting his own e-cash venture. I don’t need to tell you how the story ends — you’ve probably already noticed that your wallet isn’t full of untraceable electronic dollars (and if it is: I’m sorry.)

There’s an important lesson here, which is that getting people to adopt electronic cash requires a lot more than just technology. Fortunately, the failure of e-cash has a silver lining, at least for the field of cryptography: Chaum went on to pioneer anonymous electronic voting and a whole mess of other useful stuff.

Like many e-cash systems since, Chaum’s earliest paper on e-cash proposed using digital ‘coins’, each of some fixed denomination (say, $1). A coin was simply a unique serial number, generated by the holder and digitally signed using a private key known only to the bank. When a user ‘spends’ a coin, the merchant can verify the signature and ‘deposit’ the coin with the bank — which will reject any coin that’s already been spent.

(Of course, this doesn’t prevent the merchant from re-spending your hard earned money. To deal with this, the user can replace that serial number with a freshly-generated public key for a signature scheme. The bank will sign the public key, then the user can provide the merchant with the public key — signed by the bank — and use the corresponding signing key to sign the merchant’s name and transaction info.)

However you do it, the system as described has a crucial missing element: it’s not private. The bank knows which serial numbers it signs for you, and also knows where they’re being spent. This provides a linkage between you and, say, that anarchist bookstore where you’re blowing your cash.

To address this, Chaum replaced the signing process with a novel blind signature protocol. Blind signature is exactly what it sounds like: a way for the bank to sign a message without actually seeing it. Using this technology, the user could make up a serial number and not tell the bank; the blind signature protocol would provide the necessary signature. Even if the bank was trying to track the coins, it wouldn’t be able to link them to the user.

Chaum even provided a nice real-world analogy for his idea: place a document inside of an envelope along with a sheet of carbon paper, then let the bank sign the outside of the envelope, conveying the signature through and onto the document. This doesn’t literally describe how blind signatures work, but the real cryptographic constructions aren’t that much worse: you can readily obtain blind versions of RSA, DSA and the Schnorr/Elgamal signatures without (mostly) breaking a sweat (see this footnote for details).
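The RSA version is simple enough to sketch in a few lines of Python. This is a toy (no proper message encoding, no wallet, no bank protocol), but it shows the essential trick: the bank exponentiates a blinded value, and the user divides out the blinding factor to recover an ordinary signature on a serial number the bank never saw.

import hashlib, secrets
from cryptography.hazmat.primitives.asymmetric import rsa

# The bank's RSA key pair.
bank = rsa.generate_private_key(public_exponent=65537, key_size=2048)
n = bank.public_key().public_numbers().n
e = bank.public_key().public_numbers().e
d = bank.private_numbers().d

# User: invent a secret serial number and blind (a hash of) it.
serial = secrets.token_bytes(32)
m = int.from_bytes(hashlib.sha256(serial).digest(), "big")
r = secrets.randbelow(n - 2) + 2          # blinding factor, coprime to n w.h.p.
blinded = (m * pow(r, e, n)) % n          # this is all the bank ever sees

# Bank: signs the blinded value, learning nothing about m.
blind_sig = pow(blinded, d, n)

# User: unblind to recover a plain RSA signature on m.
sig = (blind_sig * pow(r, -1, n)) % n
assert pow(sig, e, n) == m                # verifies under the bank's public key

In a real system the bank would also debit your account at the moment it signs, and the message would be properly encoded before signing; please don’t use this sketch to print money.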

The double-spending problem and going offline

Digital signatures do one thing very well: they prevent unauthorized users from issuing their own coins. Unfortunately they don’t prevent a second serious problem: users who copy legitimate coins.

Copying is where electronic cash really differs from its physical equivalent. Real money is hard to copy — by design. If it wasn’t, we wouldn’t use it. When people get too clever at copying it, we even send men with guns to shut them down.

Electronic coins are very different. It’s almost impossible to work with data without copying it; from long-term storage to RAM, from RAM to the processor cache, from one computer to another over a network. Electronic coins must be copied, and this fundamentally changes the nature of the problem. The boogeyman here is ‘double spending‘, where a user tries to spend the same valid coin with many different merchants. Left unchecked, double-spending does more than screw over a merchant. It can totally debase the currency supply, making coins almost impossible for merchants to trust.

Chaum’s original solution dealt with double-spenders by requiring the bank to be online, so users could immediately deposit their coins — and make sure they were fresh. This works great, but it’s damn hard to handle in a system that works offline, i.e., without a live network connection. Indeed, offline spending is the big problem that most e-cash solutions have tried to tackle.

There are two basic solutions to the offline problem. Neither is perfect. They are:

  • Use trusted hardware. Force users to store their coins inside of some piece of bank-trusted (and tamper-resistant) piece of hardware such as a cryptographic smartcard. The hardware can enforce correct behavior, and prevent users from learning the actual coin values.
  • Revoke double-spenders’ anonymity. Alternatively, it’s possible to build e-cash systems that retain the users’ anonymity when they participate honestly, but immediately revoke it when they cheat (i.e., double-spend the same coin).

Although these solutions are elegant, they also kind of suck. This is because neither is really sufficient to deal with the magnitude of the double-spending problem.

To understand what I’m talking about, consider the following scam: I withdraw $10,000 from the bank, then spend each of my coins with 1,000 different offline merchants. At the end of the day, I’ve potentially walked away with $10,000,000 in merchandise (assuming it’s portable) before anyone realizes what I’ve done. That’s a lot of dough for a single scam.

In fact, it’s enough dough that it would justify some serious investment in hardware reverse-engineering, which makes it hard to find cost-effective hardware that’s sufficient to handle the threat. Finding the owner of the coin isn’t much of a deterrent either — most likely you’ll just find some guy in Illinois who had his wallet stolen.

That doesn’t mean these approaches are useless: in fact, they’re very useful in certain circumstances, particularly if used in combination with an online bank. Moreover the problem of revealing a user’s identity (on double-spend) is an interesting one. There are several schemes that do this, including one by Chaum, Fiat and Naor, and a later (very elegant) scheme by Stefan Brands. (For a bit more about these schemes, see this footnote.)

Compact wallets and beyond

There have been quite a few developments over the past few years, but none are as dramatic as the original schemes. Still, they’re pretty cool.

One scheme that deserves a few words is the ‘Compact e-Cash‘ system of Camenisch, Hohenberger and Lysyanskaya. This system is nice because users can store millions of e-coins in a relatively small format, but also because it uses lots of neat crypto — including signatures with efficient protocols and zero-knowledge proofs.

At a very high level, when a user withdraws n coins from the bank in this system, the bank provides the user with a digital signature on the following values: the user’s public key, the number of coins n withdrawn, and a secret seed value seed that’s generated cooperatively by the bank and the user.

The bank learns the number of coins and user’s public key, but only the user learns seed. To spend the ith coin in the wallet, the user generates a ‘serial number’ SN = F(seed, i), where F is some pseudo-random function. The user also provides a non-interactive zero-knowledge proof that (a) 0 < i < n, (b) SN is correctly formed, and (c) she has a signature on seed from the bank (among other things). This zero-knowledge proof is a beautiful thing, because it does not leak any information beyond these statements, and can’t even be linked back to the user’s key in the event that she loses it. The online bank records each serial number it sees, ensuring that no coin will ever be spent twice.
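Leaving out all of the interesting parts (the bank’s signature and the zero-knowledge proofs that actually make it anonymous), the serial-number bookkeeping itself is almost embarrassingly simple. A sketch, with HMAC standing in for the PRF F:

import hmac, hashlib

def coin_serial(seed, i):
    # SN = F(seed, i): the i-th coin's serial number, derived from the wallet seed.
    return hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()

seen = set()   # the online bank's record of every serial number ever deposited

def bank_accepts(sn):
    if sn in seen:
        return False      # the same serial twice means a double-spend
    seen.add(sn)
    return True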

This may seem pretty complicated, but the basic lesson is that we can do lots of neat things with these technologies. We can even build coins that can be spent k times for some arbitrary k, only revealing your identity if they’re used more times than that; this turns out to be useful for anonymous login applications, where users want to access a resource a fixed number of times, but don’t want anyone counting their accesses.

Unfortunately, we haven’t managed to build any of this stuff and deploy it in a practical setting.

Bitcoin

Which brings us to the one widely-deployed, practical electronic cash system in the world today. What about Bitcoin?

I’m a big fan of Bitcoin (from a technical perspective), but it has a few limitations that make Bitcoins a little bit less private than real e-cash should be.

Despite the name, Bitcoin doesn’t really deal with ‘coins’: it’s actually a transaction network. Users generate blocks of a certain value then transfer quantities of currency using ECDSA public keys as identifiers. The core innovation in Bitcoin is a distributed public bulletin-board (the ‘block-chain’) that records every transaction in Bitcoin’s history. This history lets you check that any given chunk of currency has a valid pedigree.

While the Bitcoin block-chain is essential to security, it’s also Bitcoin’s privacy Achilles’ heel. Since every transaction is public — and widely disseminated — there’s no hiding that it took place. To make up for this, Bitcoin offers pseudonymity: your public key isn’t tied to your identity in any way, and indeed, you can make as many of them as you want. You can even transfer your coins from one key to another.

Now, I’m not really complaining about this. But it should be noted that pseudonymity is to anonymity what sugar-free chocolates are to the real thing. While I don’t know of anyone who’s actively looking to de-anonymize Bitcoin transactions (scratch that, Zooko points out that some people are), there has been plenty of work on extracting (or ‘re-identifying’) pseudonymized data sets. If you don’t believe me, see this work by Narayanan and Shmatikov on de-anonymizing social network data, or this one that does the same thing for the Netflix prize dataset. And those are just two of many examples.

Many knowledgeable Bitcoin users know this, and some have even developed Bitcoin ‘mixers’ that stir up large pools of Bitcoin from different users, in the hopes that this will obfuscate the transaction history. This sounds promising, but has a lot of problems — starting with the fact that few seem to be actually online as I write this post.* Even if one was available, you’d basically be placing your privacy trust into the hands of one party who could totally screw you. (A large, distributed system like Tor could do the job, but none seems to be on the horizon). Finally, you’d need a lot of transaction volume to stay safe.

At the same time, it seems difficult to shoehorn the e-cash techniques from the previous sections into Bitcoin, because those systems rely on a centralized bank, and also assume that coins are used only once. Bitcoin has no center, and coins are used over and over again forever as they move from user to user. Any anonymous coin solution would have to break this linkage, which seems fundamentally at odds with the Bitcoin design. (Of course that doesn’t mean it isn’t possible! ;)**

In summary

This has hardly been an exhaustive summary of how e-cash works, but hopefully it gives you a flavor of the problem, along with a few pointers for further reading.

I should say that I don’t live the most interesting life, and about the only embarrassing thing you’ll see on my credit cards is the amount of money we waste on Diet Coke (which is totally sick). Still, this isn’t about me. As our society moves away from dirty, messy cash and into clean — and traceable — electronic transactions, I really do worry that we’re losing something important. Something fundamental.

This isn’t about avoiding marketers, or making it easier for people to have affairs. Privacy is something humans value instinctively even when we don’t have that much to hide. It’s the reason we have curtains on our windows. We may let go of our privacy today when we don’t realize what we’re losing, but at some point we will realize the costs of this convenience. The only question is when that will be, and what we’ll do about it when the day comes.

Notes:

* I was wrong about this: a commenter points out that Bitcoin Fog is up and running as a Tor hidden service. I’ve never tried this, so I don’t know how well it works. My conclusion still stands: mixing works well if you trust the BF operators and there’s enough transaction volume to truly mix your spending. We shouldn’t have to trust anyone.

** Some people have tried though: for example, OpenCoin tries to add Chaum-style cash to Bitcoin. Ditto Open Transactions. From what I can tell, these protocols still do Chaumian cash ‘old style’: that is, they require a trusted ‘bank’, or ‘issuer’ and don’t actually integrate into the distributed trust framework of Bitcoin. Still very nice work. h/t commenters and Stephen Gornick (who also fixed a few typos).

An update on the TLS MITM situation

A couple months back I wrote a pretty overwrought post lamenting the broken state of our PKI. At the time, I was kind of upset that ‘trusted’ Certificate Authorities (*ahem*, Trustwave) had apparently made a business of selling that trust for the purpose of intercepting SSL/TLS connections.

Now, I may be off base, but this seems like a bad thing to me. It also seemed like pretty stupid business. After all, it’s not like certificates are hard to make (seriously — I made three while writing the previous sentence). When your sole distinguishing feature is that people trust you, why would you ever mess with that trust?

In any case, the worst part of the whole thing was a suspicion that the practice didn’t end with Trustwave; rather, that Trustwave was simply the CA that got caught. (No, this doesn’t make it better. Don’t even try that.)

But if that was the case, then who else was selling these man-in-the-middle ‘skeleton’ certificates? Who wasn’t? Could I just call up and order one today, or would it be all dark parking lots and windowless vans, like the last time I bought an 0day from VUPEN? (Kidding. Sort of.)

The saddest part of the whole story is that out of all the major browser vendors, only one — the Mozilla foundation, makers of Firefox — was willing to take public action on the issue. This came in the form of a letter that Mozilla sent to each of the major CAs in the Mozilla root program:

Please reply by March 2, 2012, to confirm completion of the following actions or state when these actions will be completed.

1) Subordinate CAs chaining to CAs in Mozilla’s root program cannot be used for MITM or “traffic management” of domain names or IPs that the certificate holder does not legitimately own or control, regardless of whether it is in a closed and controlled environment or not. Please review all of the subordinate CAs that chain up to your root certificates in NSS to make sure that they cannot be used in this way. Any existing subordinate CAs that can be used for that purpose must be revoked and any corresponding HSMs destroyed as soon as possible, but no later than April 27, 2012.

Which brings me to the point of this post. March 2 passed a long time ago. The results are mostly in. So what have we learned?


(Note: to read the linked spreadsheet, look at the column labeled “Action #1”. If it contains an A, B, or E, that means the CA has claimed innocence — it does not make SubCAs for signing MITM certificates, or else it does make SubCAs, but the holders are contractually-bound not to abuse them.* If there’s a C or a D there, it means that the CA is working the problem and will have things sorted out soon.)

The high-level overview is that most of the Certificate Authorities claim innocence. They simply don’t make SubCAs (now, anyway). If they do, they contractually require the holders not to use them for MITM eavesdropping. One hopes these contracts are enforceable. But on the surface, at least, things look ok.

There are a couple of exceptions: notably Verizon Business (in my hometown!), KEYNECTIS and GlobalSign. All three claim to have reasonable explanations, and are basically just requesting extensions so they can get their ducks in a row (KEYNECTIS is doing in-person inspections to make sure nothing funny is going on, which hopefully just means they’re super-diligent, and not that they’re picking up MITM boxes and throwing them in the trash.) **

But the upshot is that if we can believe this self-reported data, then Trustwave really was the outlier. They were the only people dumb enough to get into the business of selling SubCAs to their clients for the purpose of intercepting TLS connections. Or at the very least, the other players in this business knew it was a trap and got themselves out long ago.

Anyone else believe this? I’d sure like to.

Notes:

* ‘Abuse’ here means signing certificates for domains that you don’t control. Yes, it’s unclear how strong these contracts are, and if there’s no possibility for abuse down the line. But we can only be so paranoid.

** I confess there are some other aspects I don’t quite follow here — for example, why are some ‘compliant’ CAs marked with ‘CP/CPS does not allow MITM usage’, while others are ‘In Progress’ in making that change. I’m sure there’s a good technical reason for this. Hopefully the end result is compliance all around.