Oh my…

Neal Koblitz is not a happy camper:

At Crypto 2011 the Program Chair, Phil Rogaway, explained that the paper [2] was chosen for the Best Paper Award “overwhelmingly” by the Program Committee, which “praised the work for its broad appeal…and its potential impact.”

And, indeed, the treatment of hashed ElGamal encryption in [2] is in some sense a remarkable achievement. Usually one has to leave the sciences entirely if one wishes to find works of scholarship — for example, Michel Foucault’s turgid 760-page three-volume philosophical treatise on sex [14] — that have been so successful in turning something that should be interesting and accessible to everyone into something lengthy, unreadable, and boring.

DESFire

If you skipped today’s crypto news, you missed some good press for David Oswald and Christof Paar’s recent side-channel attack on the Mifare DESFire MF3ICD40 chip.* I’m hardly an expert on side-channel attacks, but being uninformed has never stopped me before, and I figured it might be fun to read the paper and talk about this very neat attack.

What’s the DESFire MF3ICD40 and where is it used?

I’m guessing MF3ICD40 probably isn’t a household name where you live, but you might be familiar with one of the systems that use it.  These include the San Francisco Clipper card, the Czech “in-karta”, and a cute Australian transit card known as the myki.  It also powers a number of access control systems, like the prox cards that let you into some high security buildings.
Clipper card.

The MF3ICD40, henceforth “MF3”, is a capable little chip.  It’s a Radio Frequency Identification (RFID) device, meaning that it communicates wirelessly with a reader — e.g., a Clipper card turnstile.  It can store a certain amount of information and retrieve it on demand.  Like the DST chip I mentioned in an earlier post, every MF3 contains a cryptographic key and a cipher, which it can use to (among other things) perform a challenge/response authentication protocol.

Unlike the DST, the MF3 uses a pretty reasonable cipher for its authentication protocol. Specifically, it employs Triple-DES with a 112-bit key.  While 3DES is definitely getting a bit long in the tooth (it’s been deprecated in FIPS), it’s not thought to be vulnerable to any truly practical attacks.  Nor, to the best of my knowledge, is there anything wrong with the way that the MF3 uses 3DES.

So if the crypto is ok, what’s the problem?

Just like the DST, the MF3 chip contains no internal power source.  Instead, power for the chip is extracted from a magnetic field generated by the reader.  This is mostly a good thing — by powering the chip this way, the designers were able to dramatically reduce its size and cost.  Moreover, you don’t need to replace your MF3 when some crappy battery dies.

Unfortunately, the use of an external power source opens the device up to a special class of side-channel attack known as Correlation Power Analysis, or CPA.  Since the MF3 draws power from the reader, a malicious reader can actually tell — from moment to moment — how much power the device is using.

Ordinarily this wouldn’t be very interesting. However, if you know exactly how the device performs the computation, and you can repeatedly measure the device’s power consumption while sending it carefully crafted input values, you can actually learn quite a lot.  Specifically, when the device encrypts (or decrypts) your chosen inputs, those power fluctuations can actually leak the bits of the cryptographic key.  In the case of MF3, you can obtain the full 112-bit key in about seven hours.

Isn’t side channel analysis pretty well known?

Yes, and that’s what makes this interesting.  With caveats.

Side channel attacks themselves are hardly a new thing.  The concept of using power analysis to attack cipher implementations was first proposed in CRYPTO ’99 by Paul Kocher, Joshua Jaffe and Benjamin Jun, all of whom worked for Paul’s company, Cryptography Research.  In a truly inspired move, Paul and his researchers turned around and patented the standard set of countermeasures to the very attack they had invented.  These countermeasures can be found in just about every high-security smart card manufactured today.

This story is important because the MF3 clearly didn’t have the latest and greatest hardware countermeasures built into it.  If it had, the Oswald and Paar attacks wouldn’t have gone anywhere. (Though interestingly, it did have one countermeasure, see below.)

This seems to be a common situation for a certain class of RFID devices, of which the MF3 is one.  These devices straddle a funny line between “contactless smart card” (where side-channel countermeasures would be expected) and “plain-old RFID”, where security isn’t thought to be as important. Unfortunately, the people who select these systems may not realize this, and that’s when bad things can happen.

I should also add that — unlike a recent side-channel attack on KeeLoq — this attack does not require the adversary to physically crack open the device. It’s implemented purely via the EM field.

So how does the attack work?

This is clearly the crux of the matter, and unfortunately this is where I’m going to have to let you off with a warning.  Side channel analysis is just not my area, and there’s little I can offer here that would improve on reading the paper itself.**, ***

But to make a very short go of it, DES works by breaking the key up into smaller pieces called “subkeys”, each of which is used in a different round of the cipher.  Portions of these subkeys are combined with bits of an attacker-supplied input and sent to a series of small functions called S-boxes.  Since only a few bits go into each S-box, there is a relatively small set of possible input values, each of which results in a slightly different power consumption profile.

By observing the power consumed when calculating these S-boxes on different inputs — both for known and unknown keys, and over many, many experiments — the attacker can correlate these traces to figure out the bits that come from an unknown key.  The full key recovery attack takes ~250,000 traces, which require about 7 hours to collect.
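To make the idea concrete, here is a toy CPA sketch in Python. It is not the Oswald-Paar attack: the 4-bit S-box below is made up, and the “power measurements” are simulated as the Hamming weight of the S-box output plus noise. But it shows the core trick, which is correlating predicted leakage against measured traces for every possible subkey guess.

    # Toy CPA demo (not the real Oswald-Paar attack). We invent a 4-bit S-box,
    # simulate each "power measurement" as the Hamming weight of the S-box
    # output plus Gaussian noise, and recover the subkey by checking which
    # guess correlates best with the measured traces.
    import random

    SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # arbitrary toy S-box

    def hw(x):                                         # Hamming weight
        return bin(x).count("1")

    def pearson(xs, ys):                               # plain Pearson correlation
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    SECRET_SUBKEY = 0xB
    inputs = [random.randrange(16) for _ in range(2000)]           # known inputs
    traces = [hw(SBOX[p ^ SECRET_SUBKEY]) + random.gauss(0, 0.5)   # "measured" power
              for p in inputs]

    # For every key guess, predict the leakage and correlate it with the traces.
    best = max(range(16),
               key=lambda g: abs(pearson(traces, [hw(SBOX[p ^ g]) for p in inputs])))
    print("recovered subkey:", hex(best))              # 0xb with high probability

The real attack does essentially this, but against the actual 3DES S-boxes, with real electromagnetic measurements, and with enough statistical care to survive the chip’s noise and countermeasures.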

The most interesting aspect of the paper is that the authors didn’t actually know how the DES implementation in the MF3 worked.  Much of the paper is devoted to reverse-engineering the necessary details of the operation by examining power traces.  What’s most interesting about this discussion is that along the way, they discovered that the MF3 actually does implement a countermeasure to CPA — specifically, it randomly adds up to eight “dummy rounds” to DES. Unfortunately for the MF3, this protection was not sufficient to prevent the attack.

Figure from the Oswald-Paar paper.  These charts show the operation of the DES cipher  on several input values.  The authors used this information to reverse-engineer the operation of the device.

So what’s next for MF3?

A quick glance at the MF3 website will tell you that the manufacturer “has begun phasing out the MF3ICD40”, but that “orders … will continue to be taken until December 31st, so start planning your new year’s party now!” (Ok, I made up the last bit.) From here on out, it looks like it’s going to be AES-based cards all the way.

The note doesn’t specifically mention this attack, but I assume that’s what’s behind it.  If so, that’s a pretty good response for a manufacturer.  To pull a profitable card off of the market probably cost them quite a bit, but it protects their reputation, not to mention their customers.  Unfortunately, there’s no indication that the newer AES-based cards actually have stronger protections against side channel analysis; they’re just using a different cipher.  But I guess that’s what next year’s CHES is for.

Notes:

* This post brought to you by David Oswald and Christof Paar.  Breaking Mifare DESFire MF3ICD40: Power Analysis and Templates in the Real World.  In CHES ’11.

** This is an extended version of the conference paper that I read. Thanks to Bart Coppens for finding the link.

*** If you want to hear from people who do know their side channels, see Luke Mather’s excellent post on DPA and distinguishers from the Bristol Cryptography Blog.

What is the Random Oracle Model and why should you care? (Part 2)

This is part 2 of a series on the Random Oracle Model.  See “Part 1: An introduction” for the previous post.

In a previous post I promised to explain what the Random Oracle Model is, and — more importantly — why you should care about it.  I think I made a good start, but I realize that I haven’t actually answered either of these questions.

Instead what I did is talk a lot about hash functions. I described what an ideal hash function would look like: namely, a random function.  Then I spent most of the post explaining why random functions were totally impractical (recap: too big to store, too slow to compute).

At the end of it all, I left us with the following conclusions: random functions would certainly make great hash functions, but unfortunately we can’t use them in real life. Still, if we can’t get a security proof any other way, it might just be useful to pretend that a hash function can behave like a random function. Of course, when we actually deploy the scheme we’ll have to use a real hash function like SHA or RIPEMD — definitely not random functions — and it’s not clear what our proof would mean in that case.

Still, this isn’t totally useless. If we can accomplish such a security proof, we can at least rule out the “obvious” attacks (based on the kind of glitches that real scheme designers tend to make). Any specific attack on the scheme would essentially have to have something to do with the properties of the hash function. This is still a good heuristic.

As you can see, this has gotten somewhat technical, and it’s going to get a bit more technical through to the end of the post. If you’ve gotten enough background to get the gist of the thing, feel free to skip on to the next post in this series, in which I’ll talk about where this model is used (hint: all modern RSA signing and encryption!), and what it means for cryptography in practice.

Brass tacks

A security proof typically considers the interaction of two or more parties, one of whom we refer to as the adversary. In a typical proof, most of the players are “honest”, meaning they’ll operate according to a defined cryptographic protocol. The adversary, on the other hand, can do anything she wants. Her goal is to break the scheme.

Usually we spell out a very specific game that the parties will play.  For an encryption scheme, the rules of this game allow the adversary to do things that she might be able to do in a “real world” environment. For example: she might be able to obtain the encryption of chosen plaintexts and/or the decryption of chosen ciphertexts. In most cases she’ll intercept a legitimate ciphertext transmitted by an honest encryptor. Typically her goal is to learn (something about) the underlying plaintext.

It’s standard to assume that the parties will all know the scheme or protocol, including how to compute the hash function if one is used. This is a common-sense assumption, sometimes referred to as Kerckhoffs’s Principle.  Moreover, we generally assume that the adversary is limited in some way — she doesn’t have unlimited computational power or time, which means she can’t just brute force the decryption key.

Security proof in the random oracle model.  All parties (Encryptor, Decryptor and Adversary) can call out to the Random Oracle, which computes the hash function for them and provides consistent responses to all parties.

A Random Oracle security proof is just like this, but we make one further tweak. Even though everyone is allowed to compute the hash function, they can’t do it by themselves. Hashing is done by a new magical party called the “oracle”. If any party needs to hash something, they ship the message (over an imaginary secure channel) to the oracle, who computes the hash and sends it back to them.

Why the oracle? Well, since the oracle is “outside” of the parties, they no longer need to carry around a description of the hash function, and they don’t need to do the work of computing it.  This is good because — as I mentioned in the previous post — the full description of a random function is ridiculously large, and consequently takes an insane amount of time to compute. By sticking it into the oracle, we can handwave away that ugliness.

We’ll place two restrictions on the random oracle. First, it must implement a genuine random function (sampled at the beginning of the game), with the same input/output profile as a realistic hash function we might use (e.g., 256-bit output strings if we’re eventually going to implement with SHA256).

Secondly, the responses given by the oracle must be consistent.  In other words, just as hashing “Peter Piper” through SHA1 will always produce 1c6244929dc8216f5c1024eebefb5cb73c378431,* regardless of who does it, sending a given string through the random oracle will produce the same result, regardless of which party asks for it.

Lazy Afternoon

There’s one last (and kind of neat) trick related to how we model the Random Oracle. Internally a random oracle just implements a huge table mapping input values to random output strings.  Since this is the case, our security proofs can fill this table in “lazily”. Rather than starting with an entire table all filled in, we can instead begin our analysis with an empty table, and generate it as we go. It works like this:
  1. At the beginning of the game the oracle’s table is empty.
  2. Whenever some party asks the oracle to hash a message, the oracle first checks to see if that input value is already stored in the table.
  3. If not, it generates a random output string, stores the input message and the new output string in its table, and returns the output string to the caller.
  4. If the oracle does find the input value in the table, it returns the output value that’s already stored there.
If you think about this for a minute, you’ll realize that, to the various parties, this approach “looks” exactly the same as if the oracle had begun with a complete table. But it makes our lives a little easier.
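For illustration, here is what that lazy table might look like in code; a bare-bones sketch, with os.urandom standing in for the oracle’s coin flips:

    # A minimal "lazy" random oracle, as described above: an initially empty
    # table, filled in with a fresh random output only when a new input is
    # first queried.
    import os

    class LazyRandomOracle:
        def __init__(self, output_bytes=32):   # e.g. 32 bytes, a SHA256-sized output
            self.output_bytes = output_bytes
            self.table = {}                    # step 1: the table starts empty

        def query(self, message: bytes) -> bytes:
            if message not in self.table:      # steps 2-3: new input, pick a random output
                self.table[message] = os.urandom(self.output_bytes)
            return self.table[message]         # step 4: repeated queries stay consistent

    oracle = LazyRandomOracle()
    assert oracle.query(b"Peter Piper") == oracle.query(b"Peter Piper")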

Back to Delphia

In the last post I proposed to build a stream cipher out of a hash function H(). To get a little bit more specific, here’s a version of what that algorithm might look like:
1. Pick a random secret key (say 128 bits long) and share it with both the sender and receiver. Set IV to 1.

2. Cut a plaintext message into equal-length blocks M1, …, MN, where every block is exactly as long as the output of the hash function.

3. Using || to indicate concatenation and ⊕ for exclusive-OR, encrypt the message as follows:

      C1 = H(key || IV)       ⊕ M1
      C2 = H(key || IV+1)     ⊕ M2
      …
      CN = H(key || IV+N-1)   ⊕ MN

4. Output (IV, C1, C2, ..., CN) as the ciphertext.

5. Make sure to set IV = (IV+N) before we encrypt the next message.

If you’re familiar with the modes of operation, you might notice that this scheme looks a lot like CTR mode encryption, except that we’re using a hash function instead of a block cipher.
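Here is a rough Python sketch of that scheme, with SHA-256 standing in for H(). It follows the steps above and nothing more; it is meant to illustrate the construction, not to be something you would ever deploy.

    # Rough sketch of the toy scheme above, with SHA-256 standing in for H().
    # Illustration only: it mirrors steps 1-5 and skips everything a real
    # design would need (padding, key management, authentication).
    import hashlib, os

    BLOCK = hashlib.sha256().digest_size      # each block is one hash output long

    def H(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def encrypt(key: bytes, iv: int, message: bytes):
        blocks = [message[i:i + BLOCK] for i in range(0, len(message), BLOCK)]
        ciphertext = []
        for i, m in enumerate(blocks):
            pad = H(key + (iv + i).to_bytes(8, "big"))   # H(key || counter)
            ciphertext.append(bytes(a ^ b for a, b in zip(pad, m)))
        return (iv, ciphertext), iv + len(blocks)        # also return the next IV

    key = os.urandom(16)                                 # step 1: 128-bit shared key
    (iv, ct), next_iv = encrypt(key, 1, b"attack at dawn" * 10)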

A proof sketch!

Let’s say that I want to prove that this scheme is secure in the Random Oracle Model. Normally we’d use a specific definition of “secure” that handles many things that a real adversary could do.  But since this is a blog, we’re going to use the following simplified game:

Some honest party — the encryptor — is going to encrypt a message.  The adversary will intercept the ciphertext.  The adversary “wins” if she can learn any information about the underlying plaintext besides its length.

Recall that we’re in the magical Random Oracle Model. This is just like the real world, except whenever any party computes the hash function H(), s/he’s actually issuing a call out to the oracle, who will do all the work of hashing using a random function. Everyone can call the oracle, even the adversary.

Following the scheme description above, the encryptor first chooses a random key (step 1). He divides his plaintext up into blocks (step 2). For each block i = 1 to N, he queries the oracle to obtain H(key || i).  Finally, he XORs the received hashes with the blocks of the plaintext (step 3), and sends the ciphertext (C1, …, CN) out such that the adversary intercepts it (step 4).

We can now make the following observations.
  1. Internally, the oracle starts with an empty table.
  2. Each time the oracle receives a new query of the form “key || i” from the encryptor, it produces a new, perfectly random string for the output value. It stores both input and output in its table, and sends the output back to the encryptor.
  3. The encryptor forms the ciphertext blocks (C1, …, CN) by XORing each message block with one of these random strings.
  4. The encryptor never asks for the same input string twice, because the IV is always incremented. And finally, we make the most important observation:
  5. XORing a message with a secret, perfectly random string is a very effective way to encrypt it. In fact, it’s a One Time Pad. As long as the adversary does not know the random string, the resulting ciphertext is itself randomly-distributed, and thus reveals no information about the underlying message (other than its length).
Based on observation (5), all we have to do is show that the values returned when the oracle hashes “key || i” are (a) each perfectly random, and (b) that the adversary does not learn them. If we can do this, then we’re guaranteed that the ciphertext (C1, …, CN) will be a random string, and the adversary can learn nothing about the underlying plaintext!

The Final Analysis

Demonstrating part (a) from above is trivial. When the encryptor asks the oracle to compute H(key || i), provided that this value hasn’t been asked for previously, the oracle will generate a perfectly random string — which is all spelled out in the definition of how the oracle works. Also, the encryptor never asks for the same input value twice, since it always updates its IV value.

Part (b) is a little harder.  Is there any way the Adversary could learn the oracle’s output for H(key || i) for even one value of i from 1 to N? The oracle does not make its table public, and the adversary can’t “listen in” on the hash calls that the encryptor made.

Thinking about this a little harder, you’ll realize that there’s exactly one way that the adversary could learn one of those strings: she can query the oracle too. Specifically, all she has to do is query the oracle on “key || i” for any 1 <= i <= N and she’ll get back one of the strings that the encryptor used to encrypt the ciphertext. If the adversary learns one of these strings, she can easily recover (at least part of) the plaintext, and our argument doesn’t work anymore.

There’s one obvious problem with this, at least from the adversary’s point of view: she doesn’t know the key.

So in order to query on “key || i”, the adversary would have to guess the key.  Every time she submits a guess “guess || i” to the oracle, she’ll either be wrong, in which case she gets back a useless random string, or she’ll be right, in which case our scheme isn’t secure anymore.

But fortunately for us, the key is random, and quite big (128 bits). This tells us something about the adversary’s probability of success.

Specifically: if she makes exactly one guess, her probability of guessing the right key is quite small: one in the number of possible keys.  For our specific example, her success probability would be 1/(2^128), which is astronomically small. Of course, she could try guessing more than once; if she makes, say, q guesses, her success probability rises to q/(2^128).

The basic assumption we started with is that the adversary is somehow limited in her computational power. This means she can only perform a reasonable number of hashes. In a formal security proof this is where we might invoke polynomial vs. exponential time. But rather than do that, let’s just plug in some concrete numbers.

Let’s assume that the adversary only has time to compute 2^40 (1.1 trillion) hashes. Assuming we used a 128-bit key, then her probability of guessing the key is at most (2^40)/(2^128) = 1/(2^88).  This is still a pretty tiny number.

Think a real-world adversary might be able to compute more than 2^40 hashes? No problem.  Plug in your own numbers — the calculation stays the same. And if you think the scheme isn’t secure enough, you can always try making the key longer.

The point here is that we’ve demonstrated that in the Random Oracle Model the above scheme is ‘as secure as the key’.** In other words, the adversary’s best hope to attack the scheme is to find the key via brute-force guessing. As long as the key is large enough, we can confidently conclude that a “realistic” (i.e., time-bounded) adversary will have a very low — or at least calculable — probability of learning anything about the message.

In Summary

Depending on your point of view, this may have seemed a bit technical. Or maybe not. In any case, it definitely went a bit past the “layperson’s introduction” I promised in the previous post. Moreover, it wasn’t technical enough, in that it didn’t capture all of the ways that cryptographers can “cheat” when using the random oracle model.***

I’m going to do my best to make up for that in the next post, where we’ll talk a bit about some real schemes with proofs in the random oracle model, and — more importantly — why cryptographers are willing to put up with this kind of thing.

See the next post: Part 3, in which the true nature of the Random Oracle Model is revealed.

Notes:

* I got this result from an online SHA1 calculator of untrusted veracity.  YMMV.

** Of course, in a real proof we’d be more formal, and we’d also consider the possibility that an adversary might at least be able to obtain the encryption of some chosen plaintexts.  This would be captured in the security game we use.

*** Some will note that I didn’t exactly need to model the hash function as a random oracle to make this example work.  Had I invoked a keyed hash function, I might have used a weaker assumption (and in this case, weak is good!) that H() is pseudo-random.  But this is really a topic for another post.  I chose this example for pedagogical reasons (it’s easy to understand!) rather than its unique need for a random oracle analysis.

How standards go wrong: constructive advice edition

The other day I poked a little fun at people who use non-standard encryption schemes in major Internet standards.  I admit that my flowchart was a tad snarky.  Moreover you could have gotten essentially the same advice by traveling back to 1998 and talking to Bruce Schneier.*

So to atone for my snark, I wanted to offer some more serious thoughts on this problem.  This is important, since TLS 1.0 (which I discussed the other day) is hardly the first widely-deployed spec to use bogus cryptography.  In my mind, the prevalence of this kind of thing raises two important questions:

  1. How do these bugs creep into major protocol specs?
  2. What can specification designers do to stop it?

Below I’m going to speculate wildly about the first question, and hopefully in the process generate some answers to the second.  But first, an important caveat: I’m far too lazy to join any standards committees myself,** so feel free to take my advice with appropriate salt.

Problem #1: Too much committee specialization and delegation.

A few years ago I became involved in a legal dispute related to an IEEE wireless security standard.  My role in this case required me to read the minutes of every single meeting held by the standards committee.  (This is a funny thing about litigation — lawyers will often read these documents more carefully than any committee member ever will.)

The good news: this committee did a very nice job, especially given the magnitude of the problem they were trying to solve — which I will analogize to fixing a levee breach with four sandbags and a staple remover.

What struck me about the process, however, is that from a large committee, most of the critical security decisions were ultimately delegated to one or two domain experts, with the rest of the committee taking a largely passive role.

“How should we arrange the sandbags? Ok, Bob’s our sandbag expert. Jerry, should the staple remover be red? Ok, let’s vote on it. Excellent.”

The committee got lucky in that it had some fantastic domain experts.  But not every committee gets this lucky.  Furthermore, in this case the arrangement of the sandbags was critical to the stability of the entire system, so there was pressure to get it right.

On the other hand, many standards have little-used features that nobody much cares about.  When the details of such a feature are farmed out, the pressure is rarely as high, and this is when things can go poorly.

Problem #2: A lack of appreciation for security proofs.

Provable security is probably the neatest development in the history of computer security.  Rather than simply conjecture that an encryption scheme is secure, we can often prove it exactly.  Even when our proofs rest on unrealistic assumptions (like the existence of ideal ciphers or random oracles), they can still be effective at ruling out many “obvious” attacks on a cryptosystem.

With this power at our fingertips, we should never have to worry about something like the recent BEAST attack on SSL and TLS 1.0.  And yet stuff like this still happens all the time.

By way of a reminder: BEAST relies on a glitch in the way that SSL/TLS 1.0 implements CBC-mode encryption.  In normal CBC-mode, every ciphertext is constructed using a random Initialization Vector (IV).  In SSL/TLS 1.0, the designers “optimized” by setting the IV to be the final block of the previous ciphertext, sometimes known as a “residue”.

Presumably the designers thought this was ok.  It wasn’t in line with best practices, but it saved a few bytes.  Moreover, nobody had proposed any practical attacks on this approach (and wouldn’t, until nearly five years after the TLS 1.0 spec was finalized).

Still, what if the designers had been obliged to offer a proof of the security of this approach?  If they had, they would have quickly realized that there was a problem.  You see, it’s quite possible to prove that standard CBC mode is secure under chosen plaintext attack.***  But you can’t do the same thing with the residue optimization.  Any reasonable proof attempt will crash and burn.
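To see why the residue “optimization” is more than a stylistic sin, here is a toy chosen-plaintext demonstration in Python. The “block cipher” is a keyed SHA-256 truncation purely so the sketch is self-contained (a real implementation would use AES), and this is the underlying trick behind BEAST rather than the attack itself: because the attacker knows the next IV in advance, she can craft a plaintext block that confirms or refutes a guess about an earlier one.

    # Toy demonstration of the chained-IV problem (the idea behind BEAST),
    # not the real attack on TLS. The "block cipher" is a keyed SHA-256
    # truncation so the example is self-contained.
    import hashlib, os

    BS = 16
    KEY = os.urandom(16)

    def E(block: bytes) -> bytes:                 # stand-in for the block cipher E_k()
        return hashlib.sha256(KEY + block).digest()[:BS]

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def cbc_encrypt_block(plaintext_block, iv):   # one block of CBC: C = E(P xor IV)
        return E(xor(plaintext_block, iv))

    # The victim encrypts a secret block; the attacker sees iv_old and the ciphertext.
    secret = b"hunter2!hunter2!"
    iv_old = os.urandom(BS)
    c_secret = cbc_encrypt_block(secret, iv_old)

    # With residue chaining, the next IV is the last ciphertext block, which the
    # attacker already knows before choosing her next plaintext.
    iv_new = c_secret

    def test_guess(guess: bytes) -> bool:
        chosen = xor(xor(guess, iv_old), iv_new)  # attacker-chosen plaintext block
        return cbc_encrypt_block(chosen, iv_new) == c_secret

    print(test_guess(b"hunter2!hunter2!"))        # True: guess confirmed
    print(test_guess(b"wrong guess 1234"))        # False

With a random, unpredictable IV the attacker cannot set up this cancellation, which is exactly what the chosen-plaintext security proof for standard CBC relies on.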

Requiring a proof would have immediately knocked this optimization out of the protocol, long before anyone had thought of a specific attack on it.

Now, you might argue that TLS 1.0 was written way back in 1999, and that it inherited this nonsense from an even older protocol (SSL).  Furthermore, back then “practice oriented” provable security wasn’t well understood; people were too busy buying tech stocks.  If you were really pedantic, you might even remind me that things eventually got better in 2006 when TLS 1.1 came out.

And yet…  Even in TLS 1.1, two whole years after an attack on TLS 1.0 was pointed out, the designers were still putting goofy, unproven stuff into the spec:

      The following alternative procedure MAY be used; however, it has not
      been demonstrated to be as cryptographically strong as the above
      procedures.  The sender prepends a fixed block F to the plaintext (or,
      alternatively, a block generated with a weak PRNG).  He then encrypts
      as in (2), above, using the CBC residue from the previous block as the
      mask for the prepended block.  Note that in this case the mask for the
      first record transmitted by the application (the Finished) MUST be
      generated using a cryptographically strong PRNG.

This kind of thing may be right, it may be wrong.  But that’s my point.  If you don’t know, don’t put it in the spec.

Problem #3: What, no point releases?

A couple of months ago I installed MacOS 10.7 on my Macbook Pro.  Just as I was getting used to all the new UI “improvements”, a software update arrived.  Now I’m running MacOS 10.7.1.  Even at this moment, my Mac would really like to talk to me about another security update — which I’m too lazy to install (cobbler’s children, shoes, etc.).

And this is the Macintosh experience.  I shudder to think what it’s like on Windows.****

On the flipside, some major Internet standards seem too respectable to patch.  Take our friend TLS 1.0, which “shipped” in 1999 with a bad cipher mode inherited from SSL.  In 2002, the OpenSSL project had the foresight to apply a band-aid patch.  In 2004 some specific attacks were identified.  And it took still another two years for TLS version 1.1 to propose a formal patch to these issues.

How would you feel if your operating system went seven years between security patches?  Especially when experts “in the know” already recognized the problems, and had functional workarounds.  Where was TLS 1.0.1?  It would have been nice to get that to the folks working on NSS.

Problem #4: Too many damn options.

I mentioned this once before on this blog, but a good point can never be made too many times.  It’s very common for major protocol specifications to include gobs and gobs of options.  In one sense this makes sense — you never know precisely what your users will require.  If you don’t cater to them, they’ll hack in their own approach, or worse: design a totally new protocol.

But every option you add to a protocol spec implicitly creates a new protocol.  This protocol has to be analyzed.  And this analysis takes time and adds complexity.  Add this to all the other problems I mention here, and you can have some interesting times.

Problem #5: Don’t worry, implementers won’t follow the spec!

Anyone who’s ever written a specification knows that it’s like a Platonic form.  Just as your kitchen table is never quite as good as the “perfect” table (mine has three Splenda packets propping up one leg), the implementation of your spec will never be quite exact.

Usually this is a bad thing, and it’s frequently how protocols break in practice. However, there are some very strange cases — such as when the specification is broken — where incorrect implementation may actually be perceived as a good thing.

Case in point: a few years back I was asked to look at one portion of a new proposed standard for securing consumer electronics devices.  The spec was already public, and some devices had even been manufactured to support it.

The bug I found was not a showstopper, but it did completely negate the effectiveness of one particular aspect of the protocol.  The discussion I had with the standards body, however, was quite strange.  Good news, they told me, none of the existing devices actually implement the protocol as written!  They’re not vulnerable!  

Ok, that is good news — for now.  But really, what kind of solution is this?  If a protocol spec is bad, then fix it!  You’re much better off fixing the spec once than hoping that none of the eventual implementers will do what you tell them to do.

In summary

This has been a long post.  There’s more I could probably say, but I’m exhausted. The truth is, snark is a lot more fun than “real advice”.

I have no idea if these thoughts really sum up the totality of how standards go wrong.  Maybe I’m still being a bit unfair — many security standards are extremely resilient.  It’s not like you hear about a “0-day” attack on TLS every other day.

But committees do screw up, and there are a few simple things that should happen in order to reduce the occurrence of such problems.  Mandate security proofs where possible.  Get outside advice, if you need it.  Reduce the number of features you support.  Update your specifications aggressively.  Make sure your implementers understand what’s important.  Specification design is hard work, but fixing a bad spec is a million times worse.

Anyway, maybe one of these days I’ll be crazy enough to get involved with a standards committee, and then you can all laugh at how badly I screw it up.

Notes:

* If you do this, please call 1998-me and tell him not to waste two years trying to sell music over the Internet.  Also, tell him to buy Apple stock.

** I can’t handle the meetings.

*** A common approach is to assume that the block cipher is a pseudo-random permutation.  See this work for such an analysis.

**** Cruel Windows joke.  Honestly, I hear it’s gotten much better for you guys.

Bram Cohen Corrected

Update 11/24/11: Bram has responded to my criticisms in this post. See the comments section below for his thoughts and my response. Also, see this new post for my more detailed response.

I was innocently Googling the title of my blog (yes, I do this, it’s sad) when I came across this May 2011 post by Bram Cohen entitled “Practical Cryptography Corrected”.

Ostensibly the post is a criticism of Bruce Schneier’s book Practical Cryptography. Instead, it’s proof that inventing the world’s most useful peer-to-peer protocol does not make you a cryptographer.  More worrisome than that, it’s an example of someone with a big megaphone using it to give really bad advice.

Bram’s post has ten points, of which four are (somewhat) problematic.  Let’s take them one at a time.

Bram’s Practical Advice #1: For an RSA exponent, you should always use 2. Technically that’s Rabin-Williams, and requires slightly different implementation, but that actually works in its favor. Rabin-Williams has a reduction to factoring, RSA does not.

No, no, no.  A thousand times no.

If you’re going to encrypt with RSA, you should encrypt with RSA.  That means using a standard, well-tested RSA implementation with the traditional RSA public exponent (e = 0x10001).*

Unless you’re an expert, do not go switching your exponents around.  Rabin-Williams (public exponent 2) looks like RSA, but it’s a very different cryptosystem — both in how decryption works and in what goes wrong if you screw up your implementation.

For one thing, when you implement Rabin-Williams incorrectly, it becomes vulnerable to a chosen ciphertext attack that completely recovers the scheme’s secret key.  RSA does not have this problem.  This issue can be managed through skillful use of padding**, but your typical from-scratch implementer is not skillful.  Even bad implementations of good padding can lead to problems.  Why chance it?

Which brings me to my first law: The probability that your RSA ciphertext will be broken via some number-theoretical attack is vanishingly small.  The probability that it will be broken because you screwed up your implementation, on the other hand, is high.  All current approaches to solving the RSA problem work by attacking the factoring problem anyway, so it’s unclear what advantage we’re buying ourselves here.
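If you do need RSA encryption, the boring route looks something like this; a minimal sketch assuming the third-party pyca/cryptography package, with the traditional exponent and standard OAEP padding rather than anything hand-rolled:

    # A minimal sketch of "just use a standard implementation", assuming the
    # pyca/cryptography package: RSA with e = 65537 (0x10001) and OAEP padding.
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    ciphertext = public_key.encrypt(b"attack at dawn", oaep)
    assert private_key.decrypt(ciphertext, oaep) == b"attack at dawn"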

Bram’s Practical Advice #2: You should always do encryption as a layer outside of authentication.

Cryptographers have been arguing the order of encryption and authentication for nearly thirty years.  Should you encrypt first, then apply authentication (a MAC)? Or should you authenticate first, then encrypt?

By 2000 we had basically put this question to bed.  The recommended best practice? Always encrypt first (with a strong cipher and mode of operation), then apply a MAC to the resulting ciphertext.  This isn’t just easy to remember, it’s provably secure against adaptive chosen ciphertext attacks.  That means you’re protected against very real issues like padding oracle attacks.  Bram’s technique doesn’t give you this, and that can spell disaster.
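For reference, here is roughly what encrypt-then-MAC looks like; a sketch that assumes the pyca/cryptography package for AES-CTR and uses the standard library for HMAC, with two independent keys (see Advice #4 below):

    # Encrypt-then-MAC sketch: encrypt first with AES-CTR, then MAC the IV and
    # ciphertext with HMAC-SHA256 under a separate key. Verify the MAC before
    # touching the ciphertext. Assumes the pyca/cryptography package.
    import hmac, hashlib, os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    enc_key, mac_key = os.urandom(16), os.urandom(32)

    def seal(plaintext: bytes) -> bytes:
        iv = os.urandom(16)
        enc = Cipher(algorithms.AES(enc_key), modes.CTR(iv)).encryptor()
        ct = enc.update(plaintext) + enc.finalize()
        tag = hmac.new(mac_key, iv + ct, hashlib.sha256).digest()
        return iv + ct + tag

    def open_sealed(blob: bytes) -> bytes:
        iv, ct, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(mac_key, iv + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad MAC")          # reject before decrypting anything
        dec = Cipher(algorithms.AES(enc_key), modes.CTR(iv)).decryptor()
        return dec.update(ct) + dec.finalize()

    assert open_sealed(seal(b"hold position")) == b"hold position"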

Bram’s Practical Advice #3: For an encryption mode, you should always use CTR, and always use a nonce of zero, and never reuse keys.

Bram’s Practical Advice #4: You should never use the same key for both encryption and authentication. If you need both encryption and authentication, you should use two keys. One for encryption, and one for authentication.

This one’s a twofer, and technically none of it is bad advice.  I’m a fan of CTR mode, and zero IVs are just fine in CTR (but never in CBC!) mode.

I have only two points to make: first, why not use a random IV?  What’s going to happen to you if you hard-code a zero IV, and then the next (idiot) developer switches your mode of operation to CBC?  This shouldn’t happen, but it might.

But more generally, my advice is not to use CTR at all.  Use a provably-secure authenticated mode of operation like CCM, OCB or (best) GCM, in the way recommended by standards.  These modes handle both encryption and authentication for you, and better yet — they all use a single key (by design). You’re much better off using these standards than hand-rolling your own encrypt/MAC approach (especially if, per Bram’s Advice #2, you plan on doing it wrong).
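A minimal sketch of that recommendation, again assuming the pyca/cryptography package; AES-GCM gives you encryption and authentication with one key and one call:

    # Authenticated encryption with AES-GCM, assuming the pyca/cryptography
    # package: one key, and the mode handles encryption plus authentication.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    aead = AESGCM(key)

    nonce = os.urandom(12)                    # 96-bit nonce; never reuse with the same key
    ct = aead.encrypt(nonce, b"hold position", b"header")   # "header" is authenticated, not encrypted
    assert aead.decrypt(nonce, ct, b"header") == b"hold position"   # raises InvalidTag if tampered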

In Summary

I have a lot of respect for Bram, and if I come off a bit critical in the post above, it isn’t because I dislike the guy.  In fact, many of his remaining points are good.

But the only thing worse than bad advice is bad advice from a voice of authority.  And Bram has a lot of authority, which could lead people to trust him over some goofy, non-famous cryptographer (or even Bruce Schneier, for some crazy reason).

Fixing people’s cryptographic mistakes has given me a good living so far, but that doesn’t mean I like it.  So please, please don’t listen to Bram.  Do things right, follow standards, and most importantly: if you don’t know exactly what you’re doing, don’t do it at all.

Notes:

* I initially wrote “e = 100001” which is totally bogus.  Goes to show you shouldn’t trust a blog post.  Thanks to GSBabil.

** Use of padding, incidentally, completely breaks the reduction to factoring unless you conduct the proof in the Random Oracle Model.  And as I discuss here, there are some serious problems with that.

What is the Random Oracle Model and why should you care? (Part 1)

This is the first in a series of posts on the Random Oracle Model.  This might get a little wonky for some people’s tastes, so if you’re not interested in provable security, no problem.  I’ll be back with more on software and physical security once I get this out of my system.

It happens that I’m scheduled to teach a class today on provable security, and something called the “Random Oracle Model”.  While getting my thoughts together, it occurred to me that a) this subject might be of interest to people who aren’t my grad students, and b) there doesn’t seem to be a good “layperson’s” introduction to it.  For the most part I like to talk about how cryptography gets implemented.  But implementation doesn’t matter a lick if you’re starting with insecure primitives.

So at the risk of alienating my tiny readership, I’m going to take a break from our regularly scheduled “how cryptographic systems go bad” programming to talk about this wonkish subject — which, as we’ll see in a bit, is itself just another flavor of how crypto goes wrong.

Hash Functions for Dummies

Before we talk about Random Oracles, we first need to discuss hash functions.  These are functions that take a (potentially) long input, and ‘hash’ it down into a fixed-size value that we refer to as a Message Digest.

We use hash functions a lot in cryptography.  They get the most press for turning up in digital signature schemes, but they also make an appearance in many encryption schemes.  In fact, throughout the field, it’s pretty hard to throw a stone without hitting one.

For the most part, a cryptographic hash function has to satisfy at least some of the following properties.  First of all, we expect it to be “one way”, meaning that it’s hard to ‘invert’ the function, i.e., to find the original message given only a Message Digest.

Secondly, a hash function should be “collision resistant”.  In other words, it should be hard to find two different messages (M, M’) such that Hash(M) == Hash(M’).  This property is very important for digital signatures, since we typically don’t sign the message directly, we sign a hash of the message.  If an adversary can find two messages that hash to the same value, then a signature on (the hash of) one message is also a signature on the other.  In practice this can lead to bad consequences.

Sometimes we demand even more from our hash functions.  The tricky part is knowing how much more we can ask.  To illustrate, let me give an example.

Encryption in Delphia

Imagine that you live in Delphia, a nation that bans all encryption algorithms.  Despite this, you have secrets to keep.  You notice that your government has not banned hash functions.  This gives you an idea: perhaps you can ‘hack’ an encryption scheme out of a hash function.

This isn’t crazy.  The outputs of many hash algorithms look pretty random.  Maybe you could use a hash function to build a stream cipher.  Basically, you’ll use a hash function to hash a secret key (along with some kind of counter value); this will result in a stream of ‘random looking’ bits which you can then XOR with your plaintext.

The question is: would a “one-way” and “collision-resistant” hash function be sufficient to ensure that this scheme is secure?  I’ll give you a blunt hint: no.  Technically neither of these properties implies that the output of a hash function be ‘random looking’.  You can conceive of hash functions that meet those guarantees, and yet produce outputs that would be useless for building a stream cipher (they might, say, contain long strings of predictable values).  If you use these to build your cipher, you’ll be terribly disappointed.

Spherical horses

When cryptographers first started pondering problems like the above, their question was what they wanted from an “ideal” hash function.  To a mathematician, the answer turned out to be obvious.  They wanted it to behave like a random function.  That term has a very specific mathematical definition; for the purposes of this discussion we’ll use an equivalent one that’s easier to talk about.

Imagine that we wanted to implement an ideal hash function.  Instead of designing a small algorithm that uses confusion and diffusion (as we do with most real hash functions), we might instead create a big hard-coded table.  The table would have the input messages in one column, and the corresponding outputs on the other.

The important thing about this table is that every single output value is an independent, random string.

       Input (binary)                  Output                      
       0                               Totally random bitstring 1
       1                               Totally random bitstring 2
       00                              Totally random bitstring 3
       01                              Totally random bitstring 4
       10                              Totally random bitstring 5
       11                              Totally random bitstring 6
       and so on...

Obviously the full table for a useful hash function would be pretty big.  How big? Well, SHA1 accepts messages up to 2^64 bits in length.  So the answer is: too big to fit in this blog.  In fact, there would be more entries in the full table than there are atoms in the universe.

Even if we had room to store such a table, the process of hashing — looking up an input and finding the output — would be awfully time consuming.  If we stored our hash table on the tape of a Turing machine (the quintessential computer), just moving the tape head to the right place would take (on average) a number of time steps that’s exponential in the size of the input.

From time to time when I mention all of this in my intro course, some ambitious student tries to come up with a clever way to shrink the table down into something that we can carry around.  But here’s the problem: the output strings are perfectly random. Finding a way to represent them in a compact form would be equivalent to compressing random data.  Unfortunately, information theory tells us that this is a big no-no.

Doubling down

Let’s take stock for a moment.  So far I hope that I’ve convinced you of at least two things. First, cryptographers would really like their hash functions to behave like random functions.

Second, “real” hash functions can’t be random functions, because that would be totally unworkable and impractical.  If you look at practical hash functions like SHA1, SHA512 and RIPEMD160 — all of which have nice, short implementations consisting of a few KB of code — it should be obvious that these are not random functions, no matter how ‘random’ the outputs look.

Cambridge, Massachusetts, 10:35am

So if we can’t use random functions in practice, what about an alternative approach? Perhaps we could model our hash functions as random functions, just for the purposes of the security proof.  Then in real life we’d substitute in SHA or RIPEMD or some practical hash function.  It’s not totally clear that the security proof would still mean anything, but at least we’d be doing something.

I’ll describe a scene as Hollywood would present it: an unrealistically well-lit office at MIT. A noted cryptographer, played by Steve Buscemi, has just proposed to model hash functions as random functions.  His colleague, played by Kate Winslet, sets him straight.

“Can’t do it,” she explains, shaking her head sadly, “it’s not just inconvenient that it takes exponential time to evaluate a random function.  If we allowed this sort of hashing in our security proof, we would have to let the parties — including the Adversary — run for an exponential number of time steps.  But we can’t do that.  Our entire security proof is based on the idea that the parties can only perform limited computations, specifically those that can be conducted in polynomial time.  If we let the parties perform arbitrary exponential-time computations, then the adversary could do anything: he could even brute-force the key.”

(Hollywood would probably punch up this dialog a bit.)

The breakthrough comes from a passing janitor (Matt Damon, reprising his role from Good Will Hunting): “What if you just create an imaginary world where everyone is limited to polynomial time computations, but there’s a special exception for hashing.  Hashing, and hashing alone, doesn’t take any time at all.  When you try to hash something, the clock stops.”

Towards a paradigm

And so this is where we are.  The perfect hash function would be a random function.  But random functions are impractical — we can’t use them in real life, or even in our security proofs without breaking them.

Cryptographers are stubborn people, and we have good imaginations.  So we’ve come up with a daydream, a way to pretend that random functions are practical — just for the purposes of our security proof.

The implications of this daydream turn out to be both wonderful and terrible.  On the one hand, it allows us to prove the security of schemes that can’t be proven in other ways. Furthermore, we can do it very efficiently.  On the other hand, it leads to security proofs that are, in a technical sense, totally meaningless.

After all, we will ultimately have to implement the scheme with a real hash function like SHA, which is decidedly not random.  What does a random oracle security proof buy us then?  This is one of the big questions that has plagued cryptographers since this idea came about.

And if you think that this is all academic, you’d be wrong.  The security proofs for some of the most widely-used cryptographic primitives are set in this model: this includes most RSA signing and encryption, as well as the Elgamal signature scheme upon which DSA is based.

If you take nothing else away from this post, you should notice that the schemes listed above are important.  If we’re doing something funny in the security proofs, people should understand what it is.

In my next post I’ll actually explain what a “random oracle” is, and how it relates to the crazy assumptions we made above.  I’ll also explain how we can prove things in this model, and talk about why cryptographers love to hate it so much.

For the next post in this series, click here.

Where Things Fall Apart: Protocols (Part 2 of 2)

This is the fourth installment in a series on cryptographic engineering.  You can see the previous posts here:
 

Test Drive

Your vehicle ignition system is the result of a Darwinian arms race between the people who build cars — and those who like to drive them.

Vehicle ignition switches through history.  From left: vintage floor-mounted ignition button; 60s-era cylinder lock; 90s/2000-era high security key; 2011 dash-mounted ignition button.

The very first electronic vehicle ignition was nothing more than a switch that completed an electrical circuit. This worked fine in small towns and out on the farm, but things weren’t so simple in the big city. So manufacturers adapted, first adding a mechanical lock cylinder then hardening the wiring. This worked for a while, until inevitably the thieves got smarter. Worse, at this point the answer wasn’t so obvious: ignition lock technology stagnated. By the late 1980s and early 1990s, vehicle theft was a multi-billion dollar industry.

A few luxury manufacturers tried to improve the physical security of their locks using high-security keys and non-standard templates. For most manufacturers, however, there was already a more promising approach at hand. Cars themselves were becoming reliant on microcontrollers for engine control. Why not use a digital lock?

The result is the vehicle immobilizer. A typical first-gen immobilizer used a small chip embedded into the head of the car key. This chip had a single purpose: when the driver inserted the key into the lock cylinder, it would squirt out a code (or serial number), which could be received by an antenna in the lock cylinder. If this code matched what the vehicle expected to see, the engine would start. Expressed as a protocol, the transaction was a single one-way message: the key sends its fixed code, and the car starts the engine if the code matches.

Immobilizers effectively shut down traditional hotwiring and lock-picking. But they had a fatal flaw that criminals soon discovered. Since the code never changed, someone with the right equipment could eavesdrop on the communication (or borrow your keys), and later replay it to the car. This sounds complicated, but quickly became practical thanks to inexpensive devices called “code-grabbers”.

Once again manufacturers adapted. The next generation of immobilizers dropped the fixed key in favor of a simple challenge/response authentication protocol. In this approach, the immobilizer chip and car share a cryptographic key of some sort. When you insert your car key, the car generates a random “challenge” number and sends it to the key. The chip in your car key uses the cryptographic key to compute a response based on the challenge. This tops code-grabbers, since the key itself never goes over the air, and the challenge always changes.

Challenge response protocol between vehicle and immobilizer key.  The key and car share a deterministic cryptographic algorithm F() and a secret key. The car computes F(key, challenge) and compares it to the response value.
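Here is a toy sketch of that exchange. The function F() below is a truncated HMAC-SHA256, a stand-in for whatever cipher the car and key actually share; the 40-bit key/challenge and 24-bit response sizes anticipate the DST example that follows.

    # Toy sketch of the immobilizer challenge-response protocol. F() is a
    # truncated-HMAC stand-in for the shared cipher, not a real immobilizer
    # algorithm.
    import hmac, hashlib, os

    SHARED_KEY = os.urandom(5)          # 40-bit secret shared by car and key fob

    def F(key: bytes, challenge: bytes) -> bytes:
        return hmac.new(key, challenge, hashlib.sha256).digest()[:3]   # 24-bit response

    challenge = os.urandom(5)           # car: fresh 40-bit challenge every time
    expected = F(SHARED_KEY, challenge) # car: the answer it expects

    response = F(SHARED_KEY, challenge) # key fob: answer the challenge it received

    print("engine starts:", hmac.compare_digest(response, expected))
    # A recorded response is useless later, because the next challenge will differ.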
So we’ve laid out a cryptographic protocol, but it’s a simple one, and one that’s likely to be effective. What could go wrong?

40 bits of personal history

Back in 2005, along with some colleagues, I looked at the immobilizer system used in a few million Ford, Toyota and Nissan vehicles. This particular system was based on a Texas Instruments chip called the Digital Signature Transponder (DST), a tiny RFID chip with a cipher F() and a 40-bit secret key.

Two DST form factors.  The big one is a Mobil Speedpass, which also relies on the DST technology.

The DST uses exactly the challenge-response protocol I describe above. The reader (car) sends it a 40 bit challenge, the DST encrypts that value with its cipher, truncates the result down to a 24 bit response, ships it back. The car also has a copy of the secret key which it uses to verify the response.

The problem with the DST is not the protocol. Rather, it’s the number I mentioned above: 40. As in 40-bit key length. If an adversary — say, a malicious parking attendant — borrows your car key, she can issue a challenge to the chip. After collecting the response, she can, at her leisure, test every single one of the approximately 1.1 trillion possible Immobilizer keys until she finds one where F(key, challenge) is equal to the response she got from your DST chip.** This sounds hard, but it only takes a few hours on an FPGA.
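Here is a toy version of that key search, shrunk from 2^40 down to 2^20 keys so it finishes in a few seconds of pure Python; the real attack is the same loop run over the full keyspace on an FPGA, and F() is again a truncated-HMAC stand-in rather than the actual DST cipher.

    # Toy cloning attack: given one (challenge, response) pair, try every key
    # until F(key, challenge) matches. Keyspace shrunk to 2^20 so it runs in
    # seconds; the real attack runs the same loop over 2^40 keys on an FPGA.
    import hmac, hashlib, os, random

    def F(key_int: int, challenge: bytes) -> bytes:
        key = key_int.to_bytes(5, "big")                   # 40-bit key encoding
        return hmac.new(key, challenge, hashlib.sha256).digest()[:3]

    KEYSPACE = 2 ** 20                                     # stand-in for 2 ** 40
    secret_key = random.randrange(KEYSPACE)

    # The malicious parking attendant scans the key once...
    challenge = os.urandom(5)
    response = F(secret_key, challenge)

    # ...and later searches the whole keyspace at leisure.
    candidates = [k for k in range(KEYSPACE) if F(k, challenge) == response]
    print("candidate keys:", candidates)   # the 24-bit response may leave a few
                                           # collisions; a second pair weeds them out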

This process is called “cloning”. It’s not the scariest attack since, in general, it requires the adversary to get your car key, or at least get close enough to scan it.

Now we come to the interesting part. Texas Instruments was well aware of the DST’s keysize limitation. To foil this attack, they deployed a new chip called the DST+. Rather than simply replace the weak 40-bit algorithm with a better one, which would have solved the problem, they decided to address cloning attacks using a protocol.
DST+ Mutual Authentication protocol.  From a presentation at the Fourth Conference on the Advanced Encryption Standard (AES) (2004).

What I know about the DST+ protocol comes from a public presentation posted by a Dr. Ulrich Kaiser from Texas Instruments Germany. I freely admit that I’ve never verified this diagram against a real DST+, so caveat emptor.

The diagram is a little confusing: let’s walk through it step by step. The car (“Reader”) is on the left, and the DST+ chip (“Transponder”) is on the right. For the most part it’s exactly the same as the DST: the car generates a 40-bit challenge and sends it over to the chip.  The chip encrypts the challenge under its secret (“Immobilizer”) key, truncates the result down to 24 bits, and sends back the response.

The new stuff is “tacked on” to the original protocol.  Along with the challenge, the car transmits an extra 24 bits that it derives by: 1) encrypting its own challenge under the Immobilizer key, 2) encrypting the result again under a second “Mutual Authentication Key” (MAK), and 3) truncating that result down to 24 bits.

Since the DST+ chip shares both keys, it can verify that the car’s transmission is correct before it responds. If the value isn’t correct, the DST+ clams up. No response for you!

In principle this stumps our malicious valet.  He doesn’t know the right keys.  He can send the DST+ all the challenges he wants — but it won’t answer him.  No responses, no attack.

All’s well that ends well?

A first observation is that the DST+ protocol only protects against challenges sent by an unauthorized reader. If our valet can eavesdrop on the communication between the DST+ and the legitimate reader in the car, he can still obtain a (challenge, response) pair.  Since these values are identical to those in the original DST protocol, the same attacks apply. He can use an FPGA to brute force the 40-bit Immobilizer key.

Here’s something else. Once he’s got the car’s Immobilizer key, he can go back and find the Mutual Authentication Key (MAK).  Given the challenge sent by the car, along with the 24-bit “additional authentication” string, he can (as sketched below):

  1. compute I = F(Immobilizer key, challenge),
  2. use the FPGA to test every single possible MAK value
  3. stop when he finds a MAK value s.t. F(MAK, I) matches the “additional authentication”.
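Continuing the same toy model, that MAK search might look like this; again a shrunk 2^20 keyspace and an HMAC stand-in, with the Immobilizer key assumed to be already recovered as above.

    # Toy MAK recovery: with the Immobilizer key known, one eavesdropped
    # (challenge, additional-authentication) pair pins down the MAK by brute
    # force. Same shrunk keyspace and HMAC stand-in as before.
    import hmac, hashlib, os, random

    def F(key_int: int, data: bytes) -> bytes:             # full-width toy cipher
        return hmac.new(key_int.to_bytes(5, "big"), data, hashlib.sha256).digest()[:8]

    KEYSPACE = 2 ** 20
    imm_key, mak = random.randrange(KEYSPACE), random.randrange(KEYSPACE)

    # One eavesdropped exchange between the car and a legitimate DST+:
    challenge = os.urandom(5)
    additional_auth = F(mak, F(imm_key, challenge))[:3]     # trunc24(F_MAK(F_Imm(challenge)))

    # Attacker already knows imm_key (previous sketch); now derive the MAK:
    I = F(imm_key, challenge)                               # step 1
    mak_candidates = [m for m in range(KEYSPACE)            # steps 2 and 3
                      if F(m, I)[:3] == additional_auth]
    print(mak in mak_candidates)                            # True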
This might seem a little pointless. After all, we already have the Immobilizer key, which is all we need to simulate a DST+ and thus start the car. Why bother going back for the MAK?

Into hypothetical territory

Yet imagine… What if a car manufacturer made a tiny mistake.  What if, speaking hypothetically, the manufacturer decided to use a single MAK across many different cars — say, every 2009 Toyota Camry? A tiny, harmless optimization.

And yet.

We know that knowledge of the Immobilizer Key makes it easy to find the car’s MAK.  But this works the other way, too. If many cars share a MAK, then anyone who knows that value can use it to derive the Immobilizer key for a car.

Even better (or worse, depending on your point of view) our attacker can do this without ever seeing the car key at all.  All you need is a challenge value and “additional authentication” value, both of which the car will give to you. The owner can be fast asleep with his keys safe on his nightstand next to him. You’re outside stealing his car.

So in other words, if you use the DST+ mutual authentication protocol, and make the small mistake of re-using a MAK across multiple vehicles, you’ve transformed a mild key cloning attack into something much worse. People can now steal your car even without scanning your key.

Please keep in mind that all of this is hypothetical and speculative. But the re-use of a MAK key could happen.  There’s evidence that it may have, at least in the past. What it goes to show is that if you’re not very careful about your goals and security properties, protocols can do unexpected things.  They can make you less secure.

Rolling it up

These posts were not intended to be an in-depth tutorial on the mysteries of protocol design and analysis.  I do hope to talk about that more in the future.  So far we’ve barely scratched the surface of what can go wrong in a cryptographic protocol.  And these are certainly not the best examples of “bad” protocols.

Instead, the purpose of this discussion was to provide a couple of case studies involving real protocols whose failure has implications for millions of people.  It was also to show you how tiny changes to a protocol can have a significant impact.

In the next few installments of this overview series we’ll look a bit at hardware, physical security, and the kinds of things that can go wrong even when you build the best machines with the best intentions.

Footnotes:

** Note, since the response is only 24 bits long, there’s a possibility of collisions, i.e., many different keys will satisfy truncate(F(key, challenge)) == response.  With a 40-bit key space and a 24-bit response, you’d expect roughly 2^(40-24) = 2^16 spurious candidates to survive a single pair. To deal with this problem, all you need to do is obtain a second (challenge, response) pair and weed out the false positives; with two pairs, essentially only the true key remains.

Where Things Fall Apart: Protocols (Part 1 of 2)

This is the third installment in a series on cryptographic engineering.  You can see the previous posts here:
 

Protocols

Once upon a time there were no cryptographic protocols. Most cryptographic communication took the form of one-move operations like this one:

 

There are plenty of things that can go wrong in this scenario. But if you’re the sender (as opposed to the sap carrying the message) you could restrict your concerns to the security of the primitive (e.g., cipher), or perhaps to the method of key distribution.  Admittedly, through most of history it was almost impossible to get these things “right”.  Still, you knew where your weaknesses were.

Fast forward to present day. Modern cryptographic systems are almost never content with simple one-way communication. Consider the simple act of plugging in your high-definition TV via a digital connection. Thanks to modern rights management protocols such as HDCP and DTCP, this can lead to communication flows like this:

Not only is the communication here vastly more complicated, but it takes place in the presence of a more dangerous adversary.  For one thing, he owns the communication channel and the hardware — it’s probably sitting in his living room or lab.  Most importantly, he’s got time on his hands.  Your typical barbarian only got one chance to capture a ciphertext.  If this guy needs to, he can run the protocol 100,000 times.

Now, most people will tell you that cryptography is one of the most settled areas of computer security.  Relative to the disaster that is software security, this is true. Particularly when it comes to standardized cryptographic primitives, we’ve made enormous progress — to the point where attacks lag by years, if not decades (though, see my previous post).  If you use reasonable primitives, your attacker will not get you by breaking the algorithm.

As Gibson said, the future is here, but it’s not evenly distributed. And in the field of cryptography, protocol design is one area in which progress has definitely not been as fast.

We’ve learned a huge amount about what not to do.  We have tools that can sometimes catch us when we trip up.  Unfortunately, we still don’t have a widely-accepted methodology for building provably-secure protocols that are as practical as they are safe to use.

Maybe I’m getting ahead of myself.  What do I even mean when I refer to a “cryptographic protocol”?

For the purposes of this discussion, a protocol is an interactive communication between two or more parties. Obviously a cryptographic protocol is one that combines cryptographic primitives like encryption, digital signing, and key agreement to achieve some security goal.

What makes a cryptographic protocol special is the threat model.  When we analyze cryptographic protocols, we assume that the protocol will be conducted in the presence of an adversary, who (at minimum) eavesdrops on all communications. More commonly we assume that the adversary also has the ability to record, inject and modify communications flowing across the wire.  This feature is what makes cryptographic protocols so much more interesting — and troublesome — than the other types of protocol that crop up in computer networking.

A first example: SSL and TLS

SSL and its younger cousin TLS are probably the best-known security protocols on the Internet. The most recent version is TLS 1.2, and we think it’s a pretty solid protocol now. But it didn’t achieve this status in one fell swoop. The history of each SSL/TLS revision includes a number of patches, each improving on various flaws found in the previous version.

The original SSL v1 was a proprietary protocol created by Netscape Inc. back in 1994 to secure web connections between a browser and a web server. SSL v2 was released publicly the next year. The protocol is a highly flexible framework by which two parties who have never met before can authenticate each other, agree on transport keys, and encrypt/authenticate data. Part of the flexible nature of the protocol is that it was designed to support many different ciphersuites and modes of operation. For example, it’s possible to configure SSL to authenticate data without encrypting it, even though almost nobody actually does this.

Now let me be clear. There’s nothing wrong with flexibility, per se. The benefits of flexibility, extensibility and modularity are enormous in software design — there’s nothing worse than re-factoring 100,000 lines of production code because the original designers didn’t consider the possibility of new requirements. It’s just that when it comes to building a secure cryptographic protocol, flexibility almost always works against you.

Let’s consider a concrete example. The SSL v2 protocol included a very nice feature known as “ciphersuite negotiation”. This allowed both parties to negotiate which ciphers they wanted to use from a (potentially) distinct list of candidates that each one supported. In practice, some of those ciphers were likely to be better than others. Some were designed to be worse: for example, they were created to support export-controlled browsers that max out at a 40-bit key length.

There’s nothing fundamentally wrong with ciphersuite negotiation. It clearly gives a lot of flexibility to SSL, since the protocol can easily be upgraded to support new ciphersuites without breaking compatibility with older implementations.  From the spec, you can see that this section of the protocol was treated as something of an abstraction by the designers. The negotiation is handled via its own dedicated set of messages.  Looking at this, you can almost hear some long-ago dry erase marker scribbling “ciphersuite negotiation messages go here”.

Unfortunately, in providing all of this flexibility, the designers of SSL v2 essentially created many implicit protocols, each of which could be instantiated by changing the details of the ciphersuite negotiation process. Worse, unlike the messages in the “main” protocol, the ciphersuite negotiation messages exchanged between client and server in the SSLv2 protocol were not authenticated.

In practice, this means that our adversary can sit on the wire between the two parties, editing the negotiation messages as they pass by so that it looks as if each party only supports the lowest common ciphersuite. Thus, two parties that both support strong 128-bit encryption can easily be tricked into settling on the 40-bit variant.  And 40 bits is not good.
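The rollback itself requires no cryptanalysis at all: the attacker just edits the unauthenticated negotiation messages in transit. Here’s a toy illustration in Python; the suite names and message handling are made up for clarity and bear no resemblance to real SSLv2 formats.

    WEAK = "EXPORT-RC4-40"       # 40-bit export-grade suite (illustrative name)
    STRONG = "RC4-128"           # 128-bit suite (illustrative name)

    def negotiate(client_suites, server_suites):
        # Each side picks the best suite both claim to support.
        common = [s for s in client_suites if s in server_suites]
        return common[0] if common else None

    def mitm_rewrite(suites):
        # SSLv2's negotiation messages aren't authenticated, so a
        # man-in-the-middle can simply delete the strong options in transit.
        return [s for s in suites if s == WEAK]

    client_offer = [STRONG, WEAK]               # both parties prefer 128-bit...
    server_offer = [STRONG, WEAK]
    tampered = mitm_rewrite(client_offer)       # ...but the attacker edits the client hello
    print(negotiate(tampered, server_offer))    # -> EXPORT-RC4-40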

Now, the good news is that this vulnerability is a piece of history.  It was closed in SSL v3, and the fix has held up well through the present day.  But that doesn’t mean TLS and SSL are home free.

This post is continued in the next section: “Where Things Fall Apart: Protocols (Part 2 of 2)”.