How standards go wrong: constructive advice edition

The other day I poked a little fun at people who use non-standard encryption schemes in major Internet standards.  I admit that my flowchart was a tad snarky.  Moreover, you could have gotten essentially the same advice by traveling back to 1998 and talking to Bruce Schneier.*

So to atone for my snark, I wanted to offer some more serious thoughts on this problem.  This is important, since TLS 1.0 (which I discussed the other day) is hardly the first widely-deployed spec to use bogus cryptography.  In my mind, the prevalence of this kind of thing raises two important questions:

  1. How do these bugs creep into major protocol specs?
  2. What can specification designers do to stop it?

Below I’m going to speculate wildly about the first question, and hopefully in the process generate some answers to the second.  But first, an important caveat: I’m far too lazy to join any standards committees myself,** so feel free to take my advice with appropriate salt.

Problem #1: Too much committee specialization and delegation.

A few years ago I became involved in a legal dispute related to an IEEE wireless security standard.  My role in this case required me to read the minutes of every single meeting held by the standards committee.  (This is a funny thing about litigation — lawyers will often read these documents more carefully than any committee member ever will.)

The good news: this committee did a very nice job, especially given the magnitude of the problem they were trying to solve — which I will analogize to fixing a levee breach with four sandbags and a staple remover.

What struck me about the process, however, is that although the committee was large, most of the critical security decisions were ultimately delegated to one or two domain experts, while the rest of the committee took a passive role.

“How should we arrange the sandbags? Ok, Bob’s our sandbag expert. Jerry, should the staple remover be red? Ok, let’s vote on it. Excellent.”

The committee got lucky in that it had some fantastic domain experts.  But not every committee gets this lucky.  Furthermore, in this case the arrangement of the sandbags was critical to the stability of the entire system, so there was pressure to get it right.

On the other hand, many standards have little-used features that nobody much cares about.  When the details of these features are farmed out, the pressure to get them right is rarely as high, and this is when things can go poorly.

Problem #2: A lack of appreciation for security proofs.

Provable security is probably the neatest development in the history of computer security.  Rather than simply conjecture that an encryption scheme is secure, we can often prove it, under precisely stated assumptions.  Even when our proofs rest on unrealistic assumptions (like the existence of ideal ciphers or random oracles), they can still be effective at ruling out many “obvious” attacks on a cryptosystem.

With this power at our fingertips, we should never have to worry about something like the recent BEAST attack on SSL and TLS 1.0.  And yet stuff like this still happens all the time.

By way of a reminder: BEAST relies on a glitch in the way that SSL/TLS 1.0 implements CBC-mode encryption.  In standard CBC mode, every ciphertext is constructed using a fresh, random Initialization Vector (IV).  In SSL/TLS 1.0, the designers “optimized” by setting the IV to be the final block of the previous ciphertext, sometimes known as a “residue”.
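In equation form the only difference between the two is where the IV comes from.  The notation here is mine: E_K is the block cipher under key K, b is its block length in bits, P_1, ..., P_n are one record's plaintext blocks, and a superscript (j) indexes records.

```latex
% CBC encryption of one record with plaintext blocks P_1, ..., P_n
C_0 = \mathit{IV}, \qquad C_i = E_K(P_i \oplus C_{i-1}), \quad i = 1, \dots, n

% Standard CBC: a fresh IV for every record j, drawn uniformly at random
\mathit{IV}^{(j)} \stackrel{\$}{\leftarrow} \{0,1\}^{b}

% SSL/TLS 1.0 residue chaining (records after the first):
% the IV is the last ciphertext block of record j-1
\mathit{IV}^{(j)} = C^{(j-1)}_{n_{j-1}}
```

In the second case the IV for record j is already sitting on the wire before record j is encrypted, so anyone watching the connection knows it in advance.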

Presumably the designers thought this was ok.  It wasn’t in line with best practices, but it saved a few bytes.  Moreover, nobody had proposed any practical attacks on this approach (and wouldn’t, until nearly five years after the TLS 1.0 spec was finalized).

Still, what if the designers had been obliged to offer a proof of the security of this approach?  If they had, they would have quickly realized that there was a problem.  You see, it’s quite possible to prove that standard CBC mode is secure under chosen-plaintext attack.***  But you can’t do the same thing with the residue optimization.  Any reasonable proof attempt will crash and burn.

Requiring a proof would have immediately knocked this optimization out of the protocol, long before anyone had thought of a specific attack on it.
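To see where the proof breaks, it helps to look at the attack a failed proof attempt points toward.  The sketch below is my own toy construction, not code from any TLS implementation.  It assumes an attacker who sees every ciphertext block and can get chosen blocks encrypted (roughly BEAST’s setting, where attacker-supplied script makes the browser send requests).  Because the next IV is the last ciphertext block already on the wire, the attacker can pick a block that cancels it and test one guess per query.  The 4-round Feistel cipher with an HMAC-SHA256 round function (a stand-in for AES), the class and function names, and the 4-digit PIN secret are all assumptions for illustration.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes (AES-sized)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, r: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA256 truncated to half a block.
    return hmac.new(key, bytes([r]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def enc_block(key: bytes, block: bytes) -> bytes:
    # Toy 4-round Feistel permutation standing in for AES. Not for real use.
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for r in range(4):
        left, right = right, xor(left, _f(key, r, right))
    return left + right


class ResidueCBC:
    """Hypothetical CBC sender whose IV for each record is the previous record's last ciphertext block."""

    def __init__(self, key: bytes):
        self.key = key
        self.residue = os.urandom(BLOCK)  # only the very first IV is random

    def encrypt(self, blocks):
        out, prev = [], self.residue
        for p in blocks:
            prev = enc_block(self.key, xor(p, prev))
            out.append(prev)
        self.residue = prev  # carried into the next record; already public
        return out


def recover_block(sender, c_prev, c_target, last_seen, candidates):
    """Find the candidate P with c_target == E(P xor c_prev), using the sender as an encryption oracle."""
    for guess in candidates:
        # The next IV will be last_seen, so choose a probe that cancels it:
        # E(probe xor last_seen) == E(guess xor c_prev).
        probe = xor(xor(guess, c_prev), last_seen)
        (c_probe,) = sender.encrypt([probe])
        if c_probe == c_target:
            return guess
        last_seen = c_probe  # the new residue is public too
    return None


key = os.urandom(16)
sender = ResidueCBC(key)
secret = b"PIN=4821".ljust(BLOCK, b"\x00")
c = sender.encrypt([b"Cookie: session=", secret])  # attacker sees c, not secret
candidates = [f"PIN={n:04d}".encode().ljust(BLOCK, b"\x00") for n in range(10_000)]
recovered = recover_block(sender, c[0], c[1], c[-1], candidates)
print(recovered.rstrip(b"\x00"))  # b'PIN=4821'
```

Each wrong guess costs one chosen-plaintext query, and the right one is confirmed exactly because a block cipher is a permutation.  BEAST builds on this idea by shifting block boundaries so that only one unknown byte has to be guessed at a time.  With a fresh, unpredictable IV per record, the attacker no longer knows the IV when choosing the probe, so the cancellation step fails.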

Now, you might argue that TLS 1.0 was written way back in 1999, and that it inherited this nonsense from an even older protocol (SSL).  Furthermore, back then “practice oriented” provable security wasn’t well understood; people were too busy buying tech stocks.  If you were really pedantic, you might even remind me that things eventually got better in 2006 when TLS 1.1 came out.

And yet…  Even in TLS 1.1, two whole years after an attack on TLS 1.0 was pointed out, the designers were still putting goofy, unproven stuff into the spec:

      The following alternative procedure MAY be used; however, it has
      not been demonstrated to be as cryptographically strong as the
      above procedures.  The sender prepends a fixed block F to the
      plaintext (or, alternatively, a block generated with a weak PRNG).
      He then encrypts as in (2), above, using the CBC residue from the
      previous block as the mask for the prepended block.  Note that in
      this case the mask for the first record transmitted by the
      application (the Finished) MUST be generated using a
      cryptographically strong PRNG.
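Here is a minimal sketch of the quoted procedure as I read it.  The sender CBC-encrypts F followed by the record’s plaintext, using the previous record’s last ciphertext block (the residue) as the mask, so the first output block is E_K(residue ⊕ F).  TLS 1.1 sends the first block of each record as an explicit IV, so the receiver simply decrypts the remaining blocks with it.  The toy Feistel cipher, the all-zero F, and the function names are my assumptions, and the code shows only the mechanics, not whether the construction is secure.

```python
import hashlib
import hmac
import os

BLOCK = 16              # block size in bytes (AES-sized)
FIXED_F = bytes(BLOCK)  # hypothetical choice of the fixed block F: all zeros


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, r: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA256 truncated to half a block.
    return hmac.new(key, bytes([r]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def enc_block(key: bytes, block: bytes) -> bytes:
    # Toy 4-round Feistel permutation standing in for AES. Not for real use.
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for r in range(4):
        left, right = right, xor(left, _f(key, r, right))
    return left + right


def dec_block(key: bytes, block: bytes) -> bytes:
    # Inverse of enc_block: undo the rounds in reverse order.
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for r in reversed(range(4)):
        left, right = xor(right, _f(key, r, left)), left
    return left + right


def cbc_encrypt(key, iv, blocks):
    out, prev = [], iv
    for p in blocks:
        prev = enc_block(key, xor(p, prev))
        out.append(prev)
    return out


def cbc_decrypt(key, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(xor(dec_block(key, c), prev))
        prev = c
    return out


def send_record(key, residue, plaintext_blocks):
    """Sender: prepend F, then CBC-encrypt using the previous residue as the mask."""
    record = cbc_encrypt(key, residue, [FIXED_F] + plaintext_blocks)
    return record, record[-1]  # (record, residue for the next record)


def receive_record(key, record):
    """Receiver: treat the first block as an explicit IV and decrypt the rest."""
    return cbc_decrypt(key, record[0], record[1:])


key = os.urandom(16)
residue = os.urandom(BLOCK)  # first mask from a strong PRNG, as the quote requires
for blocks in ([b"first record".ljust(BLOCK, b"."), b"second block".ljust(BLOCK, b".")],
               [b"next record".ljust(BLOCK, b".")]):
    record, residue = send_record(key, residue, blocks)
    assert receive_record(key, record) == blocks
```

Notice that the block cipher input for the prepended block, residue ⊕ F, is known to any eavesdropper.  Whether that is good enough is exactly what the spec admits has not been demonstrated.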

This kind of thing may be right, it may be wrong.  But that’s my point.  If you don’t know, don’t put it in the spec.

Problem #3: What, no point releases?

A couple of months ago I installed Mac OS X 10.7 on my MacBook Pro.  Just as I was getting used to all the new UI “improvements”, a software update arrived.  Now I’m running Mac OS X 10.7.1.  Even at this moment, my Mac would really like to talk to me about another security update, which I’m too lazy to install (cobbler’s children, shoes, etc.).

And this is the Macintosh experience.  I shudder to think what it’s like on Windows.****

On the flip side, some major Internet standards seem too respectable to patch.  Take our friend TLS 1.0, which “shipped” in 1999 with a bad cipher mode inherited from SSL.  In 2002, the OpenSSL project had the foresight to apply a band-aid patch.  In 2004 some specific attacks were identified.  And it took still another two years for TLS version 1.1 to propose a formal patch to these issues.

How would you feel if your operating system went seven years between security patches?  Especially when experts “in the know” already recognized the problems, and had functional workarounds.  Where was TLS 1.0.1?  It would have been nice to get that to the folks working on NSS.

Problem #4: Too many damn options.

I mentioned this once before on this blog, but a good point can never be made too many times.  It’s very common for major protocol specifications to include gobs and gobs of options.  In one sense this is reasonable: you never know precisely what your users will require.  If you don’t cater to them, they’ll hack in their own approach, or worse: design a totally new protocol.

But every option you add to a protocol spec implicitly creates a new protocol.  This protocol has to be analyzed.  And this analysis takes time and adds complexity.  Add this to all the other problems I mention here, and you can have some interesting times.
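To put rough numbers on “every option creates a new protocol”: if options are independent, the variants multiply.  This is a back-of-the-envelope sketch with a made-up option menu (every name is hypothetical).  Real specs have dependencies that rule out some combinations, so treat the product as an upper bound on what needs analysis.

```python
from itertools import product
from math import prod

# A made-up option menu for an imaginary protocol (all names hypothetical).
OPTIONS = {
    "cipher": ["cipher_a", "cipher_b", "cipher_c"],
    "compression": [False, True],
    "padding": ["scheme_1", "scheme_2"],
    "legacy_iv": [False, True],
    "extension_x": [False, True],
}


def count_variants(options: dict) -> int:
    """Number of distinct configurations if every option is independent."""
    return prod(len(choices) for choices in options.values())


def variants(options: dict):
    """Yield each configuration as a dict: one 'protocol' that needs analysis."""
    names = list(options)
    for combo in product(*options.values()):
        yield dict(zip(names, combo))


print(count_variants(OPTIONS))  # 48  (3 * 2 * 2 * 2 * 2)

# Adding a single on/off feature doubles the count.
more = {**OPTIONS, "extension_y": [False, True]}
print(count_variants(more))  # 96
```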

Problem #5: Don’t worry, implementers won’t follow the spec!

Anyone who’s ever written a specification knows that it’s like a Platonic form.  Just as your kitchen table is never quite as good as the “perfect” table (mine has three Splenda packets propping up one leg), the implementation of your spec will never be quite exact.

Usually this is a bad thing, and it’s frequently how protocols break in practice. However, there are some very strange cases — such as when the specification is broken — where incorrect implementation may actually be perceived as a good thing.

Case in point: a few years back I was asked to look at one portion of a new proposed standard for securing consumer electronics devices.  The spec was already public, and some devices had even been manufactured to support it.

The bug I found was not a showstopper, but it did completely negate the effectiveness of one particular aspect of the protocol.  The discussion I had with the standards body, however, was quite strange.  Good news, they told me, none of the existing devices actually implement the protocol as written!  They’re not vulnerable!  

Ok, that is good news, for now.  But really, what kind of solution is this?  If a protocol spec is bad, then fix it!  You’re much better off fixing the spec once than hoping that none of the eventual implementers will do what you tell them to do.

In summary

This has been a long post.  There’s more I could probably say, but I’m exhausted. The truth is, snark is a lot more fun than “real advice”.

I have no idea if these thoughts really sum up the totality of how standards go wrong.  Maybe I’m still being a bit unfair — many security standards are extremely resilient.  It’s not like you hear about a “0-day” attack on TLS every other day.

But committees do screw up, and there are a few simple things that should happen in order to reduce the occurrence of such problems.  Mandate security proofs where possible.  Get outside advice, if you need it.  Reduce the number of features you support.  Update your specifications aggressively.  Make sure your implementers understand what’s important.  Specification design is hard work, but fixing a bad spec is a million times worse.

Anyway, maybe one of these days I’ll be crazy enough to get involved with a standards committee, and then you can all laugh at how badly I screw it up.


* If you do this, please call 1998-me and tell him not to waste two years trying to sell music over the Internet.  Also, tell him to buy Apple stock.

** I can’t handle the meetings.

*** A common approach is to assume that the block cipher is a pseudo-random permutation.  See this work for such an analysis.

**** Cruel Windows joke.  Honestly, I hear it’s gotten much better for you guys.