Attack of the week: POODLE

Believe it or not, there’s a new attack on SSL. Yes, I know you’re thunderstruck. Let’s get a few things out of the way quickly.

First, this is not another Heartbleed. It’s bad, but it’s not going to destroy the Internet. Also, it applies only to SSLv3, which is (in theory) an obsolete protocol that we all should have ditched a long time ago. Unfortunately, we didn’t.

Anyway, enough with the good news. Let’s get to the bad.

The attack is called POODLE, and it was developed by Bodo Möller, Thai Duong and Krzysztof Kotowicz of Google. To paraphrase Bruce Schneier, attacks only get better — they never get worse. The fact that this attack is called POODLE also tells us that attack names do get worse. But I digress.

The rough summary of POODLE is this: it allows a clever attacker who can (a) control the Internet connection between your browser and the server, and (b) run some code (e.g., script) in your browser to potentially decrypt authentication cookies for sites such as Google, Yahoo and your bank. This is obviously not a good thing, and unfortunately the attack is more practical than you might think. You should probably disable SSLv3 everywhere you can. Sadly, that’s not so easy for the average end user.

To explain the details, I’m going to use the usual ‘fun’ question and answer format I employ for attacks like these.

What is SSL?

SSL is probably the most important security protocol on the Internet. It’s used to encrypt connections between two different endpoints, most commonly your web browser and a web server. We mostly refer to SSL by the dual moniker SSL/TLS, since the protocol suite known as Secure Sockets Layer was upgraded and renamed to Transport Layer Security back in 1999.

This bug has nothing to do with TLS, however. It’s purely a bug in the old pre-1999 SSL, and specifically version 3 — something we should have ditched a long time ago. Unfortunately, for legacy reasons many browsers and servers still support SSLv3 in their configurations. It turns out that when you try to turn this option off, a good portion of the Internet stops working correctly, thanks to older browsers and crappy load balancers, etc.

As a result, many modern browsers and servers continue to support SSLv3 as an option. The worst part of this is that in many cases an active attacker can actually trigger a fallback. That is, even if both the server and client support more modern protocols, as long as they’re willing to support SSLv3, an active attacker can force them to use this old, terrible protocol. In many cases this fallback is transparent to the user.

What’s the matter with SSL v3?

So many things it hurts to talk about. For our purposes we need to focus on just one. This has to do with the structure of the encryption padding used by the CBC-mode ciphersuites of SSLv3.

SSL data is sent in ‘record’ structures, where each record is first authenticated using a MAC. It’s subsequently enciphered using a block cipher (like 3DES or AES) in CBC mode. This MAC-then-encrypt design has been the cause of much heartache in the past. It’s also responsible for the problems now.

Here’s the thing: CBC mode encryption requires that the input plaintext length be equal to a multiple of the cipher’s block size (8 bytes in the case of 3DES, 16 bytes for AES). To make sure this is the case, SSL implementations add ‘padding’ to the plaintext before encrypting it. The padding can be up to one cipher block in length, is not covered by the MAC, and always ends with a single byte denoting the length of the padding that was added.

In SSLv3, the contents of the rest of the padding are unspecified. This is the problem that will vex us here.
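To make the difference concrete, here’s a minimal sketch (my own illustration, not drawn from any real implementation) of an SSLv3-style padding check next to a TLS-style one, assuming 16-byte AES blocks:

```python
def check_padding_sslv3(plaintext):
    pad_len = plaintext[-1]
    if pad_len + 1 > 16:               # padding may be at most one block
        return None                    # bogus record
    # SSLv3: the pad_len bytes before the length byte are IGNORED --
    # they may hold any value at all. This is the gap POODLE drives through.
    return plaintext[:-(pad_len + 1)]

def check_padding_tls(plaintext):
    pad_len = plaintext[-1]
    padding = plaintext[-(pad_len + 1):-1]
    if any(b != pad_len for b in padding):
        return None                    # TLS: every padding byte is checked
    return plaintext[:-(pad_len + 1)]
```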

How does the attack work?

Let’s imagine that I’m an active attacker who is able to obtain a CBC-encrypted record containing an interesting message like a cookie. I want to learn a single byte of this cookie — and I’m willing to make the assumption that this byte happens to live at the end of a cipher block boundary.

(Don’t worry about how I know that the byte I want to learn is in this position. Just accept this as a given for now.)

Imagine further that the final block of the record in question contains a full block of padding. If we’re using AES as our cipher, this means that the last byte of the plaintext of the final block contains a ’15’ value, since there are 15 bytes of padding. The preceding 15 bytes of said block contain arbitrary values that the server will basically strip off and ignore upon decryption, since SSLv3 doesn’t specify what they should contain. (NB: TLS does, which prevents this issue.)

The attack works like this. Since I control the Internet connection, I can identify the enciphered block that I want to learn within an encrypted record. I can then substitute (i.e., move) this block in place of the final block that should contain only padding.

When the server receives this new enciphered record, it will go ahead and attempt to decrypt the final block (which I’ll call C_n) using the CBC decryption equation, which looks like this:

Decrypted final block := Decipher(C_n) XOR C_{n-1}

Note that C_{n-1} is the second-to-last block of the encrypted record.

If the decrypted final block does not contain a ’15’ in the final position, the server will assume either that the block is bogus (too much padding) or that there’s less padding in the message than we intended. In the former case it will simply barf. In the latter case it will assume that the meaningful message is longer than it actually is, which should trigger an error in decryption since MAC verification will fail. This should also terminate the SSL connection.

Indeed, this is by far the most likely outcome of our experiment, since the deciphered last byte is essentially random — thus failure will typically occur 255 out of every 256 times we try this experiment. In this case we have to renegotiate the handshake and try again.

Every once in a while we’ll get lucky. In 1/256 of the cases, the deciphered final block will contain a 15 byte at the final position, and the server will accept this as a valid padding length. The preceding fifteen bytes have also probably been changed, but the server will then strip off and ignore those values — since SSLv3 doesn’t care about the contents of the padding. No other parts of the ciphertext have been altered, so decryption will work perfectly and the server should report no errors.

This case is deeply meaningful to us. If this happens, we know that the decipherment of the final byte of C_n, XORed with the final byte of the preceding ciphertext block, is equal to ’15’. From this knowledge we can easily compute the plaintext value of the original byte we wanted to learn: it’s simply ’15’ XORed with the final byte of C_{n-1}, XORed once more with the final byte of the ciphertext block that originally preceded the targeted block in the record.

Voila, in this case — which occurs with probability 1/256 — we’ve decrypted a single byte of the cookie.
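In code, the bookkeeping for a successful guess looks something like this (an illustration of the XOR arithmetic only; the variable names are mine):

```python
def recover_byte(c_orig_prev_last, c_nminus1_last):
    # c_orig_prev_last: last byte of the block that preceded the target
    #                   block in its ORIGINAL position in the record
    # c_nminus1_last:   last byte of C_{n-1}, the block now preceding the
    #                   pasted copy in the final (padding) slot
    #
    # The server accepted the record, so:
    #   Decipher(C_n)[15] XOR c_nminus1_last == 15
    # CBC decryption of the same block in its original position gives:
    #   plaintext[15] == Decipher(C_n)[15] XOR c_orig_prev_last
    # Eliminating Decipher(C_n)[15] from the two equations:
    return 15 ^ c_nminus1_last ^ c_orig_prev_last
```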

The important thing to know is that if at first we don’t succeed, we can try, try again. That’s because each time we fail, we can re-run the SSL handshake (which changes the encryption key) and try the attack again. As long as the cookie byte we’re attacking stays in the same position, we can continue our attempts until we get lucky. The expected number of attempts needed for success is 256.

We’ve got one byte, how do we get the rest?

The ability to recover a single byte doesn’t seem so useful, but in fact it’s all we need to decipher the entire cookie — if we’re able to control the cookie’s alignment and location within the enciphered record. In this case, we can simply move one byte after another into that critical final-byte-of-the-cipher-block location and run the attack described above.

One way to do this is to trick the victim’s browser into running some Javascript we control. This script will make SSL POST requests to a secure site like Google. Each time it does so, it will transmit a request path first, followed by an HTTP cookie and other headers, followed by a payload it controls.

Source: Möller et al.

Since the script controls the path and payload, by varying these values and knowing the size of the intermediate headers, the script can systematically align each byte of the cookie to any position it wants. It can also adjust the request length to ensure that the final block of the record consists entirely of padding (a full 16-byte block, counting the length byte). The arithmetic is sketched below.
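Something like the following sketch (the request layout here is assumed purely for illustration; a real attack must account for the exact header format):

```python
BLOCK = 16  # AES block size

def path_growth_for(cookie_byte_index, bytes_before_cookie):
    # Zero-based offset of the targeted cookie byte within the record,
    # for some fixed, known header layout.
    offset = bytes_before_cookie + cookie_byte_index
    # Grow the request path until the target byte lands in the LAST
    # position of some cipher block.
    return (BLOCK - 1 - offset) % BLOCK
```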

This means that our attack can now be used to decrypt an entire cookie, with an average of 256 requests per cookie byte. That’s not bad at all.

So should we move to West Virginia and stock up on canned goods?

Portions of the original SSL v3 specification being reviewed at IETF 90.

Maybe. But I’m not so sure. For a few answers on what to do next, see Adam Langley and Rob Graham’s blog posts on this question.

Note that this entire vulnerability stems from the fact that SSLv3 is older than Methuselah. In fact, there are voting-age children who are younger than SSLv3. And that’s worrying.

The obvious and correct solution to this problem is to find and kill SSLv3 anywhere it lurks. In fact, this is something we should have done in the early 2000s, if not sooner. We can do it now, and this whole problem goes away.

The problem with the obvious solution is that our aging Internet infrastructure is still loaded with crappy browsers and servers that can’t function without SSLv3 support. Browser vendors don’t want their customers to hit a blank wall anytime they access a server or load balancer that only supports SSLv3, so they enable fallback. Server administrators don’t want to lock out the critical IE6 market, so they also support SSLv3. And we all suffer.

Hopefully this will be the straw that breaks the camel’s back and gets us to abandon obsolete protocols like SSLv3. But nobody ever went bankrupt betting on insecurity. It’s possible that ten years from now we’ll still be talking about ways to work around POODLE and its virulent flesh-eating offspring. All we can do is hope that reason will prevail.

Attack of the Week: Triple Handshakes (3Shake)

The other day Apple released a major security update that fixes a number of terrifying things that can happen to your OS X and iOS devices. You should install it. Not only does this fix a possible remote code execution vulnerability in the JPEG parser (!), it also patches a TLS/SSL protocol bug known as the “Triple Handshake” vulnerability. And this is great timing, since Triple Handshakes are something I’ve been meaning (and failing) to write about for over a month now.

But before we get there: a few points of order.

First, if Heartbleed taught us one thing, it’s that when it comes to TLS vulnerabilities, branding is key. Henceforth, and with apologies to Bhargavan, Delignat-Lavaud, Pironti,  Fournet and Strub (who actually discovered the attack*), for the rest of this post I will be referring to the vulnerability simply as “3Shake”. I’ve also taken the liberty of commissioning a logo (courtesy @Raed667). I hope you like it.

On a more serious note, 3Shake is not Heartbleed. That’s both good and bad. It’s good because Heartbleed was nasty and 3Shake really isn’t anywhere near as dangerous. It’s bad since, awful as it was, Heartbleed was only an implementation vulnerability — and one in a single TLS library to boot. 3Shake represents a novel and fundamental bug in the TLS protocol.

The final thing you should know about 3Shake is that, according to the cryptographic literature, it shouldn’t exist.

You see, in the last few years there have been at least four major crypto papers purporting to prove the TLS protocol secure. The existence of 3Shake doesn’t make those results wrong. It may, however, indicate that cryptographers need to think a bit more about what ‘secure’ and ‘TLS’ actually mean. For me, that’s the most fascinating implication of this new attack.

I’ll proceed with the usual ‘fun’ question-and-answer format I save for this sort of thing.

What is TLS and why should you care?

Since you’re reading this blog, you probably already know something about TLS. You might even realize how much of our infrastructure is protected by this crazy protocol.

In case you don’t: TLS is a secure transport protocol that’s designed to establish communications between two parties, who we’ll refer to as the Client and the Server. The protocol consists of two sub-protocols called the handshake protocol and the record protocol. The handshake is intended to authenticate the two communicating parties and establish shared encryption keys between them. The record protocol uses those keys to exchange data securely.

For the purposes of this blog post, we’re going to focus primarily on the handshake protocol, which has (at least) two major variants: the RSA handshake and the Diffie-Hellman handshake (ECDHE/DHE). These are illustrated below.

Simplified description of the TLS handshakes. Top: RSA handshake. Bottom: Diffie-Hellman handshake.

As much as I love TLS, the protocol is a hot mess. For one thing, it inherits a lot of awful cryptography from its ancient predecessors (SSLv1-3). For another, it’s only really beginning to be subjected to rigorous, formal analysis. All this means we’re just now starting to uncover some of the bugs that have been present in the protocol since it was first designed. And we’re likely to discover more! That’s partly because this analysis is at a very early stage. It’s also partly because, from an analyst’s point of view, we’re still trying to figure out exactly what the TLS handshake is supposed to do.

Well, what is the TLS handshake supposed to do?

Up until this result, we thought we had a reasonable understanding of the purpose of the TLS handshake. It was intended to authenticate one or both sides of the connection, then establish a shared cryptographic secret (called the Master Secret) that could be used to derive cryptographic keys for encrypting application data.

The first problem with this understanding is that it’s a bit too simple. There isn’t just one TLS handshake, there are several variants of it. Worse, multiple different handshake types can be used within a single connection.

The standard handshake flow is illustrated — without crypto — in the diagram below. In virtually every TLS connection, the server authenticates to the client by sending a public key embedded in a certificate. The client, for its part, can optionally authenticate itself by sending a corresponding certificate and proving that it holds the corresponding signing key. However, client authentication is by no means common. Many TLS connections are authenticated only in one direction.

Common TLS handshakes. Left: only server authenticates. Right: client and server both authenticate with certificates.

TLS also supports a “renegotiation” handshake that can switch an open connection from one mode to the other. This is typically used to change a connection that was authenticated only in one direction (Server->Client) into a connection that’s authenticated in both directions. The server usually initiates renegotiation when the client has, for example, asked for a protected resource.

Renegotiating a session. A new handshake causes the existing connection to be mutually authenticated.

Renegotiation has had problems before. Back in 2009, Ray and Dispensa showed that a man-in-the-middle attacker could actually establish a (non-authenticated) connection with some server; inject some data; and when the server asks for authentication, the attacker could then “splice” on a real connection with an authorized client by simply forwarding the new handshake messages to the legitimate client. From the server’s point of view, both communications would seem to be coming from the same (now authenticated) person:

Ray/Dispensa attack from 2009. The attacker first establishes an unauthenticated connection and injects some traffic (“drop table *”). When the server initiates a renegotiation for client authentication, the attacker forwards the new handshake messages to an honest client Alice who then sends real traffic. Since the handshakes are not bound together, Bob views this as a single connection to one party.

To fix this, a “secure renegotiation” band-aid to TLS was proposed. The rough idea of this extension was to ‘bind’ the renegotiation handshake to the previous handshake, by having the client present the “Finished” message of the previous handshake. Since the Finished value is (essentially) a hash of the Master Secret and (the hash of) the previous handshake messages, this allows the client to prove that it — not an attacker — truly negotiated the previous connection.
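Concretely, the Finished value is computed roughly like this (a simplification: HMAC-SHA256 stands in for the full TLS 1.2 PRF):

```python
import hashlib
import hmac

def finished_value(master_secret, handshake_messages,
                   label=b"client finished"):
    # Hash of every handshake message exchanged so far.
    handshake_hash = hashlib.sha256(handshake_messages).digest()
    # The result depends on BOTH the Master Secret and the transcript;
    # secure renegotiation assumes no two handshakes ever share it.
    return hmac.new(master_secret, label + handshake_hash,
                    hashlib.sha256).digest()[:12]
```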

All of this brings us back to the question of what the TLS handshake is supposed to do.

You see, the renegotiation band-aid now adds some pretty interesting new requirements to the TLS handshake. For one thing, the security of this extension depends on the idea that (1) no two distinct handshakes will happen to use the same Master Secret, and (2) that no two handshakes will have the same handshake messages, ergo (3) no two handshakes will have the same Finished message.

Intuitively, this seemed like a pretty safe thing to assume — and indeed, many other systems that do ‘channel binding’ on TLS connections also make this assumption. The 3Shake attack shows that this is not safe to assume at all.

So what’s the problem here?

It turns out that TLS does a pretty good job of establishing keys with people you’ve authenticated. Unfortunately there’s a caveat. It doesn’t truly guarantee the established key will be unique to your connection. This is a pretty big violation of the assumptions that underlie the “secure renegotiation” fix described above.

For example: imagine that Alice is (knowingly) establishing a TLS connection to a server Mallory. It turns out that Mallory can simultaneously — and unknown to Alice — establish a different connection to a second server Bob. Moreover, if Mallory is clever, she can force both connections to use the same “Master Secret” (MS).

Mallory creates two connections that use the same Master Secret.

The first observation of the 3Shake attack is that this trick can be played if Alice supports either the RSA or the DHE handshake — or both (it does not seem to work on ECDHE). Here’s the RSA version:**

RSA protocol flow from the triple handshake attack (source). The attacker is in the middle, while the client and server are on the left/right respectively. MS is computed as a function of (pms, cr, sr) which are identical in both handshakes.

So already we have a flaw in the logic undergirding secure renegotiation. The Master Secret (MS) values are not necessarily distinct between different handshakes.
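To see why this is possible, look at the classic derivation. In TLS 1.2 and earlier, the Master Secret is a function of exactly three inputs: the pre-master secret and the two handshake nonces. A sketch (again with HMAC-SHA256 standing in for the TLS PRF):

```python
import hashlib
import hmac

def master_secret(pms, client_random, server_random):
    # Nothing else enters the derivation -- no certificates, no handshake
    # hash. If Mallory arranges for (pms, cr, sr) to match on her two
    # connections, the Master Secrets match too.
    return hmac.new(pms, b"master secret" + client_random + server_random,
                    hashlib.sha256).digest()[:48]
```

In the RSA flow, Mallory can do exactly that: Alice encrypted the pms to Mallory’s own certificate, so Mallory decrypts it, re-encrypts the very same value to Bob, and relays the two nonces unchanged.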

Fortunately, the above attack does not let us resurrect the Ray/Dispensa injection attack. While the attacker has tricked the client into using a specific MS value, the handshake Finished messages — which the client will attach to the renegotiation handshake — will not be the same in both handshakes. That’s because (among other things) the certificates sent on each connection were very different, hence the handshake hashes are not identical. In theory we’re safe.

But here is where TLS gets awesome.

You see, there is yet another handshake I haven’t told you about. It’s called the “session resumption handshake”, and it allows two parties who’ve previously established a master secret (and still remember it) to resume their session with new encryption keys. The advantage of resumption is that it uses no public-key cryptography or certificates at all, which is supposed to make it faster.

It turns out that if an attacker knows the previous MS and has caused it to be the same on both sides, it can now wait until the client initiates a session resumption. Then it can replay messages between the client and server in order to update both connections with new keys:

An attacker replays the session resumption handshake to ensure the same key on both sides. Note that the handshake messages are identical in both connections. (source)

Which brings us to the neat thing about this handshake. Not only is the MS the same on both connections, but both connections now see exactly the same (resumption) handshake messages. Hence the hash of these handshakes will be identical, which means in turn that their “Finished” message will be identical.

By combining all of these tricks, a clever attacker can pull off the following — and absolutely insane — “triple handshake” injection attack:

Triple handshake attack. The attacker mediates two handshakes that give the same MS on both sides, but two different handshake hashes. The resumption handshake leaves the same MS and an identical handshake hash on both sides. This means that the Finished message from the resumption handshake will be the same for the connections on either side of the attacker. Now he can hook up the two without anyone noticing that he previously injected traffic.

In the above scenario, an attacker first runs a (standard) handshake to force both sides of the connection to use the same MS. It then causes both sides to perform session resumption, which results in both sides using the same MS and having the same handshake hash and Finished messages on both sides. When the server initiates renegotiation, the attacker can forward the third (renegotiation) handshake on to the legitimate client as in the Ray/Dispensa attack — secure in the knowledge that both client and server will expect the same Finished token.

And that’s the ballgame.

What’s the fix?

There are several, and you can read about them here.

One proposed fix is to change the derivation of the Master Secret such that it includes the handshake hash. This should wipe out most of the attacks above. Another fix is to bind the “session resumption” handshake to the original handshake that led to it.
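Sketched in the same style as before, the first fix (the proposal that later became the ‘extended master secret’ of RFC 7627) mixes the handshake hash into the derivation, so two handshakes that saw different messages can never share an MS:

```python
import hashlib
import hmac

def extended_master_secret(pms, handshake_messages):
    session_hash = hashlib.sha256(handshake_messages).digest()
    # The certificates differ on Mallory's two connections, so the session
    # hashes differ, and therefore so do the Master Secrets.
    return hmac.new(pms, b"extended master secret" + session_hash,
                    hashlib.sha256).digest()[:48]
```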

Wait, why should I care about injection attacks?

You probably don’t, unless you happen to be one of the critical applications that relies on the client authentication and renegotiation features of TLS. In that case, like most applications, you probably assumed that a TLS connection opened with a remote user was actually from that user the whole time, and not from two different users.

If you — like most applications — made that assumption, you might also forget to treat the early part of the connection (prior to client authentication) as a completely untrusted bunch of crap. And then you’d be in a world of hurt.

But don’t take my word for it. There’s video! See here for the source, background and details.

What does this have to do with the provable security of TLS?

Of all the questions 3Shake raises, this one is the most interesting. As I mentioned earlier, there have been several recent works that purport to prove things about the security of TLS. They’re all quite good, so don’t take any of this as criticism.

However, they (with one exception, the miTLS project) didn’t find this attack. Why is that?

The first reason is simple: many of these works analyze only the basic TLS handshake, or they omit at least one of the possible handshakes (e.g., resumption). This means they don’t catch the subtle interactions between the resumption handshake, the renegotiation handshake, and extensions — all of which are the exact ingredients that make most TLS attacks possible.

The second problem is that we don’t quite know what standard we’re holding TLS to. For example, the common definition of security for TLS is called “Authenticated and Confidential Channel Establishment” (ACCE). Roughly speaking this ensures that two parties can establish a channel and that nobody will be able to determine what data is being sent over said channel.

The problem with ACCE is that it’s a definition that was developed specifically so that TLS could satisfy it. As a result, it’s necessarily weak. For example, ACCE does not actually require that each handshake produces a unique Master Secret — one of the flaws that enables this attack — because such a definition was not possible to achieve with the existing TLS protocol. In general this is what happens when you design a protocol first and prove things about it later.

What’s the future for TLS? Can’t we throw the whole thing out and start over again?

Sure, go ahead and make TLS Rev 2. It can strip out all of this nonsense and start fresh.

But before you get cocky, remember — all these crazy features in TLS were put there for a reason. Someone wanted and demanded them. And sadly, this is the difference between a successful, widely-used protocol and your protocol.

Your new replacement for TLS might be simple and wonderful today, but that’s only because nobody uses it. Get it out into the wild and before long it too will be every bit as crazy as TLS.

Notes:

* An earlier version of this post incorrectly identified the researchers who discovered the attack.

** The Diffie-Hellman (DHE) version is somewhat more clever. It relies on the attacker manipulating the D-H parameters such that they will force the client to use a particular key. Since DHE parameters sent down from the server are usually ‘trusted’ by TLS implementations, this trick is relatively easy to pull off.
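As a maximally degenerate illustration (the paper’s actual construction is subtler, using a carefully crafted composite modulus), consider what happens if an implementation blindly accepts whatever group the server sends, even g = 1:

```python
p = 2**127 - 1       # a prime; toy-sized (real DHE uses 1024+ bit moduli)
g = 1                # degenerate generator a malicious server might supply
x = 0xC0FFEE         # the client's secret exponent -- now irrelevant
print(pow(g, x, p))  # always 1: the 'shared secret' is fixed in advance
```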

How does the NSA break SSL?

A few weeks ago I wrote a long post about the NSA’s ‘BULLRUN’ project to subvert modern encryption standards. I had intended to come back to this at some point, since I didn’t have time to discuss the issues in detail. But then things got in the way. A lot of things, actually. Some of which I hope to write about in the near future.

But before I get there, and at the risk of boring you all to tears, I wanted to come back to this subject at least one more time, if only to pontificate a bit about a question that’s been bugging me.

You see, the NSA BULLRUN briefing sheet mentions that NSA has been breaking quite a few encryption technologies, some of which are more interesting than others. One of those technologies is particularly surprising to me, since I just can’t figure how NSA might be doing it. In this extremely long post I’m going to try to dig a bit deeper into the most important question facing the Internet today.

Specifically: how the hell is NSA breaking SSL?

Section of the BULLRUN briefing sheet. Source: New York Times.

To keep things on target I’m going to make a few basic ground rules.

First, I’m well aware that NSA can install malware on your computer and pwn any cryptography you choose. That doesn’t interest me at all, for the simple reason that it doesn’t scale well. NSA can do this to you, but they can’t do it for an entire population. And that’s really what concerns me about the recent leaks: the possibility that NSA is breaking encryption for the purposes of mass surveillance.

For the same reason, we’re not going to worry about man-in-the-middle (MITM) attacks. While we know that NSA does run these, they’re also a very targeted attack. Not only are MITMs detectable if you do them at large scale, they don’t comport with what we know about how NSA does large-scale interception — mostly via beam splitters and taps. In other words: we’re really concerned about passive surveillance.

The rules above aren’t absolute, of course. We will consider limited targeted attacks on servers, provided they later permit passive decryption of large amounts of traffic; e.g., decryption of traffic to major websites. We will also consider arbitrary modifications to software and hardware — something we know NSA is already doing.

One last point: to keep things from going off the rails, I’ve helpfully divided this post into two sections. The first will cover attacks that use only known techniques. Everything in this section can be implemented by a TAO employee with enough gumption and access to software. The second section, which I’ve titled the ‘Tinfoil Hat Spectrum’, covers the fun and speculative stuff — ranging from new side channel attacks all the way to that huge quantum computer the NSA keeps next to BWI.

We’ll start with the ‘practical’.

Attacks that use Known Techniques

Theft of RSA keys. The most obvious way to ‘crack’ SSL doesn’t really involve cracking anything. Why waste time and money on cryptanalysis when you can just steal the keys? This issue is of particular concern in servers configured for the TLS RSA handshake, where a single 128-byte server key is all you need to decrypt every past and future connection made from the device.

In fact, this technique is so obvious that it’s hard to imagine NSA spending a lot of resources on sophisticated cryptanalytic attacks. We know that GCHQ and NSA are perfectly comfortable suborning even US providers overseas. And inside our borders, they’ve demonstrated a willingness to obtain TLS/SSL keys using subpoena powers and gag orders. If you’re using an RSA connection to a major website, it may be sensible to assume the key is already known.

Of course, even where NSA doesn’t resort to direct measures, there’s always the possibility of obtaining keys via a remote software exploit. The beauty is that these attacks don’t even require remote code execution. Given the right vulnerability, it may simply require a handful of malformed SSL requests to map the full contents of the OpenSSL/SChannel heap.

Source: New York Times

Suborning hardware encryption chips. A significant fraction of SSL traffic on the Internet is produced by hardware devices such as SSL terminators and VPN-enabled routers. Fortunately we don’t have to speculate about the security of these devices — we already know NSA/GCHQ have been collaborating with hardware manufacturers to ‘enable’ decryption on several major VPN encryption chips.

The NSA documents aren’t clear on how this capability works, or if it even involves SSL. If it does, the obvious guess is that each chip encrypts and exfiltrates bits of the session key via ‘random’ fields such as IVs and handshake nonces. Indeed, this is relatively easy to implement on an opaque hardware device. The interesting question is how one ensures these backdoors can only be exploited by NSA — and not by rival intelligence agencies. (Some thoughts on that here.)

Side channel attacks. Traditionally when we analyze cryptographic algorithms we concern ourselves with the expected inputs and outputs of the system. But real systems leak all kinds of extra information. These ‘side channels’ — which include operation time, resource consumption, cache timing, and RF emissions — can often be used to extract secret key material.

The good news is that most of these channels are only exploitable when the attacker is in physical proximity to a TLS server. The bad news is that there are conditions in which the attacker can get close. The most obvious example involves virtualized TLS servers in the cloud setting, where a clever attacker may share physical resources with the target device.

A second class of attack uses remote timing information to slowly recover an RSA key. These attacks can be disabled via countermeasures such as RSA blinding, though amusingly, some ‘secure’ hardware co-processors may actually turn these countermeasures off by default! At very least, this makes the hardware vulnerable to attacks by a local user, and could even facilitate remote recovery of RSA keys.

Weak random number generators. Even if you’re using strong Perfect Forward Secrecy ciphersuites, the security of TLS depends fundamentally on the availability of unpredictable random numbers. Not coincidentally, tampering with random number generator standards appears to have been a particular focus of NSA’s efforts.

Random numbers are critical to a number of elements in TLS, but they’re particularly important in three places:

  1. On the client side, during the RSA handshake. The RNG is used to generate the RSA pre-master secret and encryption padding. If the attacker can predict the output of this generator, she can subsequently decrypt the entire session. Ironically, a failure of the server RNG is much less devastating to the RSA handshake.*
  2. On the client or server side, during the Diffie-Hellman handshake(s). Since Diffie-Hellman requires a contribution from each side of the connection, a predictable RNG on either side renders the session completely transparent.
  3. During long-term key generation, particularly of RSA keys. If this happens, you’re screwed.
And you just don’t need to be that sophisticated to weaken a random number generator. These generators are already surprisingly fragile, and it’s awfully difficult to detect when one is broken. Debian’s maintainers made this point beautifully back in 2008 when an errant code cleanup reduced the effective entropy of OpenSSL to just 16 bits. In fact, RNGs are so vulnerable that the challenge here is not weakening the RNG — any idiot with a keyboard can do that — it’s doing so without making the implementation trivially vulnerable to everyone else.
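Here’s a toy illustration of the problem (it models the general failure mode, not the Debian bug’s exact mechanics): if the generator behind a client’s secrets is seeded from a 16-bit space, a passive attacker simply enumerates every seed:

```python
import random

def premaster_from_seed(seed):
    # stand-in for a broken RNG whose entire state is a 16-bit seed
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(48))

victim_pms = premaster_from_seed(31337)    # the client's 'random' secret

# The eavesdropper's entire attack: try all 65,536 possible seeds.
recovered_seed = next(s for s in range(2**16)
                      if premaster_from_seed(s) == victim_pms)
print(recovered_seed)                      # 31337
```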

The good news is that it’s relatively easy to tamper with an SSL implementation to make it encrypt and exfiltrate the current RNG seed. This still requires someone to physically alter the library, or install a persistent exploit, but it can be done cleverly without even adding much new code to the existing OpenSSL code. (OpenSSL’s love of function pointers makes it particularly easy to tamper with this stuff.)

If tampering isn’t your style, why not put the backdoor in plain sight? That’s the approach NSA took with the Dual_EC RNG, standardized by NIST in Special Publication 800-90. There’s compelling evidence that NSA deliberately engineered this generator with a backdoor — one that allows them to break any TLS/SSL connection made using it. Since the generator is (was) the default in RSA’s BSAFE library, you should expect every TLS connection made using that software to be potentially compromised.

And I haven’t even mentioned Intel’s plans to replace the Linux kernel RNG with its own hardware RNG.

Esoteric Weaknesses in PFS systems. Many web servers, including Google and Facebook, now use Perfect Forward Secrecy ciphersuites like ephemeral Diffie-Hellman (DHE and ECDHE). In theory these ciphersuites provide the best of all possible worlds: keys persist for one session and then disappear once the connection is over. While this doesn’t save you from RNG issues, it does make key theft a whole lot more difficult.

PFS ciphersuites are a good thing, but a variety of subtle issues can cramp their style. For one thing, the session resumption mechanism can be finicky: session keys must either be stored locally, or encrypted and given out to users in the form of session tickets. Unfortunately, the use of session tickets somewhat diminishes the ‘perfectness’ of PFS systems, since the keys used for encrypting the tickets now represent a major weakness in the system. Moreover, you can’t even keep them internal to one server, since they have to be shared among all of a site’s front-end servers! In short, they seem like kind of a nightmare.

A final area of concern is the validation of Diffie-Hellman parameters. The current SSL design assumes that DH groups are always honestly generated by the server. But a malicious implementation can violate this assumption and use bad parameters, which enable third party eavesdropping. This seems like a pretty unlikely avenue for enabling surveillance, but it goes to show how delicate these systems are.

The Tinfoil Hat Spectrum

I’m going to refer to the next batch of attacks as ‘tinfoil hat’ vulnerabilities. Where the previous issues all leverage well-known techniques, each of the following proposals requires totally new cryptanalytic techniques. All of which is a way of saying that the following section is pure speculation. It’s fun to speculate, of course. But it requires us to assume facts not in evidence. Moreover, we have to be a bit careful about where we stop.

So from here on out we are essentially conducting a thought-experiment. Let’s imagine the NSA has a passive SSL-breaking capability; and furthermore, that it doesn’t rely on the tricks of the previous section. What’s left?

The following list begins with the most ‘likely’ theories and works towards the truly insane.

Breaking RSA keys. There’s a persistent rumor in our field that NSA is cracking 1024-bit RSA keys. It’s doubtful this rumor stems from any real knowledge of NSA operations. More likely it’s driven by the fact that cracking 1024-bit keys is highly feasible for an organization with NSA’s resources.

How feasible? Several credible researchers have attempted to answer this question, and it turns out that the cost is lower than you think. Way back in 2003, Shamir and Tromer estimated $10 million for a purpose-built machine that could factor one 1024-bit key per year. In 2013, Tromer reduced those numbers to about $1 million, factoring in hardware advances. And it could be significantly lower. This is pocket change for NSA.

Along similar lines, Bernstein, Heninger and Lange examined the feasibility of cracking RSA using distributed networks of standard PCs. Their results are pretty disturbing: in principle, a cluster about the size of the real-life Conficker botnet could do serious violence to 1024-bit keys.

Given all this, you might ask why this possibility is even in the ‘tinfoil hat’ category. The simple answer is: because nobody’s actually done it. That means it’s at least conceivable that the estimates above are dramatically too high — or even too low. Moreover, RSA-1024 keys are rapidly being phased out. Cracking 2048-bit keys would require significant mathematical advances, taking us much deeper into the tinfoil hat.**

Cracking RC4. On paper, TLS supports a variety of strong encryption algorithms. In practice, about half of all TLS traffic is secured with the creaky old RC4 cipher. And this should worry you — because RC4 is starting to show its age. In fact, as used in TLS it’s already vulnerable to (borderline) practical attacks. Thus it seems like a nice candidate for a true cryptanalytic advance on NSA’s part.

Unfortunately the problem with this theory is that we simply don’t know of any attack that would allow the NSA to usefully crack RC4! The known techniques require an attacker to collect thousands or millions of ciphertexts that are either (a) encrypted with related keys (as in WEP) or (b) contain the same plaintext. The best known attack against TLS takes the latter form — it requires the victim to establish billions of sessions, and even then it only recovers fixed plaintext elements like cookies or passwords.

The counterargument is that the public research community hasn’t been thinking very hard about RC4 for the past decade — in part because we thought it was so broken that people had stopped using it (oops!). If we’d been focusing all our attention on it (or better, the NSA’s attention), who knows what we’d have today.

If you told me the NSA had one truly new cryptanalytic capability, I’d agree with Jake and point the finger at RC4. Mostly because the alternatives are far scarier.

New side-channel attacks. For the most part, remote timing attacks appear to have been killed off by the implementation of countermeasures such as RSA blinding, which confound timing by multiplying a random blinding factor into each ciphertext prior to decryption. In theory this should make timing information essentially worthless. In practice, many TLS implementations implement compromises in the blinding code that might resurrect these attacks, things like squaring a blinding factor between decryption operations, rather than generating a new one each time. It’s quite unlikely there are attacks here, but who knows.
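For reference, here’s textbook RSA blinding (bare modular arithmetic, no padding): multiply the incoming ciphertext by r^e, decrypt, then divide r back out, so the timing-sensitive exponentiation never operates directly on an attacker-chosen value:

```python
import secrets

def blinded_decrypt(c, d, e, n):
    # Fresh blinding factor per decryption; assumed coprime to n
    # (overwhelmingly likely for an RSA modulus).
    r = secrets.randbelow(n - 2) + 2
    c_blind = (c * pow(r, e, n)) % n      # attacker can't predict this value
    m_blind = pow(c_blind, d, n)          # the timing-sensitive step
    return (m_blind * pow(r, -1, n)) % n  # unblind: (m * r) * r^-1 == m
```

The compromise mentioned above (squaring the previous blinding factor instead of generating a fresh r) saves an exponentiation, but it makes successive blinding values related to one another, which is precisely the kind of structure a clever timing attacker can exploit.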

Goofy stuff. Maybe NSA does have something truly amazing up its sleeve. The problem with opening this Pandora’s box is that it’s really hard to get it closed again. Did Jerry Solinas really cook the NIST P-curves to support some amazing new attack (which NSA knew about way back in the late 1990s, but we have not yet discovered)? Does the NSA have a giant supercomputer named TRANSLTR that can brute-force any cryptosystem? Is there a giant quantum computer at the BWI Friendship annex? For answers to these questions you may as well just shake the Magic 8-Ball, cause I don’t have a clue.

Conclusion

We don’t know and can’t know the answer to these things, and honestly it’ll make you crazy if you start thinking about it. All we can really do is take NSA/GCHQ at their word when they tell us that these capabilities are ‘extremely fragile’. That should at least give us hope.

The question now is if we can guess well enough to turn that fragility from a warning into a promise.

Notes:

* A failure of the server RNG could result in some predictable values like the ServerRandom and session IDs. An attacker who can predict these values may be able to run active attacks against the protocol, but — in the RSA ciphersuite, at least — they don’t admit passive compromise.

** Even though 1024-bit RSA keys are being eliminated, many servers still use 1024-bit keys for Diffie-Hellman (mostly for efficiency reasons). The attacks on these keys are similar to the ones used against RSA — however, the major difference is that fresh Diffie-Hellman ‘ephemeral’ keys are generated for each new connection. Breaking large amounts of traffic seems quite costly.

Attack of the week: RC4 is kind of broken in TLS

Update: I’ve added a link to a page at Royal Holloway describing the new attack. 

Listen, if you’re using RC4 as your primary ciphersuite in SSL/TLS, now would be a great time to stop. Ok, thanks, are we all on the same page?

No?

I guess we need to talk about this a bit more. You see, these slides have been making the rounds since this morning. Unfortunately, they contain a long presentation aimed at cryptographers, and nobody much seems to understand the real result that’s buried around slide 306 (!). I’d like to help.

Here’s the short summary:

According to AlFardan, Bernstein, Paterson, Poettering and Schuldt (a team from Royal Holloway, Eindhoven and UIC) the RC4 ciphersuite used in SSL/TLS is broken. If you choose to use it — as do a ridiculous number of major sites, including Google — then it may be possible for a dedicated attacker to recover your authentication cookies. The current attack is just on the edge of feasibility, and could probably be improved for specific applications.

This is bad and needs to change soon.

In the rest of this post I’ll cover the details in the usual ‘fun’ question-and-answer format I save for these kinds of attacks.

What is RC4 and why should I care?

RC4 is a fast stream cipher invented in 1987 by Ron Rivest. If you like details, you can see this old post of mine for a longwinded discussion of RC4’s history and flaws. Or you can take my word: RC4 is old and crummy.

Despite its age and crumminess, RC4 has become an extremely popular ciphersuite for SSL/TLS connections — to the point where it accounts for something absurd like half of all TLS traffic. There are essentially two reasons for this:

  1. RC4 does not require padding or IVs, which means it’s immune to recent TLS attacks like BEAST and Lucky13. Many admins have recommended it as the solution to these threats.
  2. RC4 is pretty fast. Faster encryption means less computation and therefore lower hardware requirements for big service providers like Google.

So what’s wrong with RC4?

Like all stream ciphers, RC4 takes a short (e.g., 128-bit) key and stretches it into a long string of pseudo-random bytes. These bytes are XORed with the message you want to encrypt, resulting in what should be a pretty opaque (and random-looking) ciphertext.

The problem with RC4 is that the above statement is not quite true. The bytes that come out of RC4 aren’t quite random-looking — they have small biases. A few of these biases have been known for years, but weren’t considered useful. However, recent academic work has systematically uncovered many more small but significant biases running throughout the first 256 bytes of RC4 output.

At first blush this doesn’t seem like a big deal. Cryptographically, however, it’s a disaster. If I encrypt the same message (plaintext) with many different RC4 keys, then I should get a bunch of totally random looking ciphertexts. But these tiny biases mean that they won’t be random — a statistical analysis of different positions of the ciphertext will show that some values occur more often.

By getting many different encryptions of the same message — under different keys — I can tally up these small deviations from random and thus figure out what was encrypted.
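Here’s a self-contained toy version of this tallying attack, using just the one bias everybody knows (Mantin-Shamir: the second keystream byte is 0 roughly twice as often as it should be). The real attack uses a full map of biases across the first 256 positions, but the principle is the same:

```python
import os
from collections import Counter

def rc4_keystream(key, n):
    S, j = list(range(256)), 0
    for i in range(256):                       # key scheduling (KSA)
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = [], 0, 0
    for _ in range(n):                         # output generation (PRGA)
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return out

secret = b"cookie=dead;beef"                   # same plaintext every session
counts = Counter()
for _ in range(2**15):                         # many sessions, fresh keys
    ks = rc4_keystream(os.urandom(16), 2)
    counts[secret[1] ^ ks[1]] += 1             # ciphertext byte at position 1

# Keystream byte 1 is biased toward 0, so the most frequent ciphertext
# value at that position is (with high probability) the plaintext byte.
print(counts.most_common(1)[0][0] == secret[1])
```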

If you like analogies, think of it like peeking at a picture with your hands over your eyes. By peeking through your fingers you might only see a tiny sliver of the scene you’re looking at. But by repeating the observation many times, you may be able to combine those slivers and figure out what you’re looking at.

Why would someone encrypt the same message many times?

The problem here (as usual) is your browser.

You see, there are certain common elements that your browser tends to send at the beginning of every HTTP(S) connection. One of these values is a cookie — typically a fixed string that identifies you to a website. These cookies are what let you log into Gmail without typing your password every time.

If you use HTTPS (which is enforced on many sites by default), then your cookies should be safe. After all, they’ll always be sent over an encrypted connection to the website.

Unfortunately, if your connection is encrypted using RC4 (as is the case with Gmail), then each time you make a fresh connection to the Gmail site, you’re sending a new encrypted copy of the same cookie. If the session is renegotiated (i.e., uses a different key) between those connections, then the attacker can build up the list of ciphertexts he needs.

To make this happen quickly, an attacker can send you a piece of Javascript that your browser will run — possibly on a non-HTTPS tab. This Javascript can then send many HTTPS requests to Google, ensuring that an eavesdropper will quickly build up thousands (or millions) of requests to analyze.

Probability of recovering a plaintext byte (y axis) at a particular byte position in the RC4-encrypted stream (x axis), assuming 2^24 ciphertexts. Increasing the number of ciphertexts to 2^32 gives almost guaranteed plaintext recovery at all positions. (source)

Millions of ciphertexts? Pah. This is just crypto-wankery.

It’s true that the current attack requires an enormous number of TLS connections — in the tens to hundreds of millions — which means that it’s not likely to bother you. Today. On the other hand, this is the naive version of the attack, without any special optimizations.

For example, the results given by Bernstein assume no prior plaintext knowledge. But in practice we often know a lot about the structure of an interesting plaintext like a session cookie. For one thing, they’re typically Base64 encoded — reducing the number of possibilities for each byte — and they may have some recognizable structure which can be teased out using advanced techniques.

It’s not clear what impact these specific optimizations will have, but it tends to reinforce the old maxim: attacks only get better, they never get worse. And they can get a lot better while you’re arguing about them.

So what do we do about it?

Treat this as a wakeup call. Sites (like Google) need to stop using RC4 — before these attacks become practical. History tells us that nation-state attackers are already motivated to penetrate Gmail connections. The longer we stick with RC4, the more chances we’re giving them.

In the short term we have an ugly choice: stick with RC4 and hope for the best, or move back to CBC mode ciphersuites — which many sites have avoided due to the BEAST and Lucky13 attacks.

We could probably cover ourselves a bit by changing the operation of browsers, so as to detect and block Javascript that seems clearly designed to exercise these attacks. We could also put tighter limits on the duration and lifespan of session cookies. In theory we could also drop the first few hundred bytes of each RC4 connection — something that’s a bit of a hack, and difficult to do without breaking the TLS spec.

In the longer term, we need to move towards authenticated encryption modes like the new AEAD TLS ciphersuites, which should put an end to most of these problems. The problem is that we’ll need browsers to support these modes too, and that could take ages.

In summary

I once made fun of Threatpost for sounding alarmist about SSL/TLS. After all, it’s not like we really have an alternative. But the truth is that TLS (in its design, deployment and implementation) needs some active help. We’re seeing too many attacks, and they’re far too close to practical. Something needs to give.

We live in a world where NIST is happy to give us a new hash function every few years. Maybe it’s time we put this level of effort and funding into the protocols that use these primitives? They certainly seem to need it.

Attack of the week: TLS timing oracles

Ever since I started writing this blog (and specifically, the posts on SSL/TLS) I’ve had a new experience: people come up to me and share clever attacks that they haven’t made public yet.

This is pretty neat — like being invited to join an exclusive club. Unfortunately, being in this club mostly sucks. That’s because the first rule of ‘TLS vulnerability club’ is: You don’t talk about TLS vulnerability club. Which takes all the fun out of it.

(Note that this is all for boring reasons — stuff like responsible disclosure, publication and fact checking. Nobody is planning a revolution.)

Anyway, it’s a huge relief that I’m finally free to tell you about a neat new TLS attack I learned about recently. The new result comes from Nadhem AlFardan and Kenny Paterson of Royal Holloway. Dubbed ‘Lucky 13’, it takes advantage of a very subtle bug in the way records are encrypted in the TLS protocol.

If you aren’t into long crypto posts, here’s the TL;DR:

There is a subtle timing bug in the way that TLS data decryption works when using the (standard) CBC mode ciphersuite. Given the right set of circumstances, an attacker can use this to completely decrypt sensitive information, such as passwords and cookies. 

The attack is borderline practical if you’re using the Datagram version of TLS (DTLS). It’s more on the theoretical side if you’re using standard TLS. However, with some clever engineering, that could change in the future. You should probably patch!

For the details, read on. As always, we’ll do this in the ‘fun’ question/answer format I save for these kinds of posts.

What is TLS, what is CBC mode, and why should I care if it’s broken?

Some background: Transport Layer Security (née SSL) is the most important security protocol on the Internet. If you find yourself making a secure connection to another computer, there’s a very good chance you’ll be doing it with TLS. (Unless you’re using a UDP-based protocol, in which case you might use TLS’s younger cousin, Datagram TLS [DTLS].)

The problem with TLS is that it kind of stinks. Mostly this is due to bad decisions made back in the mid-1990s when SSL was first designed. Have you seen the way people dressed back then? Protocol design was worse.

While TLS has gotten better since then, it still retains many of the worst ideas from the era. One example is the CBC-mode ciphersuite, which I’ve written about several times before on this blog. CBC-mode uses a block cipher (typically AES) to encrypt data. It’s the most common ciphersuite in use today, probably because it’s the only mandatory ciphersuite given in the spec.

What’s wrong with CBC mode?

The real problem with TLS is not the encryption itself, but rather the Message Authentication Code (MAC) that’s used to protect the integrity (authenticity) of each data record.

Our modern understanding is that you should always encrypt a message first, then apply the MAC to the resulting ciphertext. But TLS gets this backwards. Upon encrypting a record, the sender first applies a MAC to the plaintext, then adds up to 255 bytes of padding to get the message up to a multiple of the cipher (e.g., AES’s) block size. Only then does it CBC-encrypt the record.

Structure of a TLS record. The whole thing is encrypted with CBC mode.
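Here’s a sketch of that record construction, simplified: real TLS also MACs a sequence number and the record header, and I’m using the pyca/cryptography package for the AES-CBC step:

```python
import hashlib
import hmac
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def tls_cbc_encrypt_record(mac_key, enc_key, fragment):
    # 1. MAC the *plaintext* fragment (the design error: MAC-then-encrypt).
    mac = hmac.new(mac_key, fragment, hashlib.sha1).digest()
    data = fragment + mac
    # 2. Pad to the AES block size. TLS padding: N bytes, each of value N-1.
    pad_len = 16 - (len(data) % 16)
    data += bytes([pad_len - 1]) * pad_len
    # 3. CBC-encrypt. The padding is covered by the encryption but NOT by
    #    the MAC -- an attacker can tamper with it undetected.
    iv = os.urandom(16)
    enc = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    return iv + enc.update(data) + enc.finalize()
```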

The critical point is that the padding is not protected by the MAC. This means an attacker can tamper with it (by flipping specific bits in the ciphertext), leading to a very nasty kind of problem known as a padding oracle attack.

In these attacks (example here), an attacker first captures an encrypted record sent by an honest party, modifies it, then re-transmits it to the server for decryption. If the attacker can learn whether her changes affected the padding — e.g., by receiving a padding error as opposed to a bad MAC error — she can use this information to adaptively decrypt the whole record. The structure of TLS’s encryption padding makes it friendly to these attacks.

Closeup of a padded TLS record. Each byte of the padding contains the padding length, followed by another (pointless, redundant) length byte.

But padding oracle attacks are well known, and (D)TLS has countermeasures!

The TLS designers learned about padding oracles way back in 2002, and immediately took steps to rectify them. Unfortunately, instead of fixing the problem, they decided to apply band-aids. This is a time-honored tradition in TLS design.

The first band-aid was simple: eliminate any error messages that could indicate to the attacker whether the padding check (vs. the MAC check) is what caused a decryption failure.

This seemed to fix things for a while, until some researchers figured out that you could simply time the server to see how long decryption takes, and thereby learn if the padding check failed. This is because implementations of the time would first check the padding, then return immediately (without checking the MAC) if the padding was bad. That resulted in a noticeable timing differential the attacker could detect.

Thus a second band-aid was needed. The TLS designers decreed that decryption should always take the same amount of time, regardless of how the padding check comes out. Let’s roll the TLS 1.2 spec:

[T]he best way to do this is to compute the MAC even if the padding is incorrect, and only then reject the packet. For instance, if the pad appears to be incorrect, the implementation might assume a zero-length pad and then compute the MAC.

Yuck. Does this even work?

Unfortunately, not quite. When the padding check fails, the decryptor doesn’t know how much padding to strip off. That means they don’t know how long the actual message is, and therefore how much data to MAC. The recommended countermeasure (above) is to assume no padding, then MAC the whole blob. As a result, the MAC computation can take a tiny bit longer when the padding is damaged.

The TLS designers realized this, but by this point they were exhausted and wanted to go think about something else. So they left us with the following note:

This leaves a small timing channel, since MAC performance depends to some extent on the size of the data fragment, but it is not believed to be large enough to be exploitable, due to the large block size of existing MACs and the small size of the timing signal.

And for the last several years — at least, as far as we know — they were basically correct.

How does this new paper change things?

The new AlFardan and Paterson result shows that it is indeed possible to distinguish the tiny timing differential caused by invalid padding, at least from a relatively close distance — e.g., over a LAN. This is partly due to advances in computing hardware: most new computers now ship with an easily accessible CPU cycle counter. But it’s also thanks to some clever statistical techniques that use many samples to smooth out and overcome the jitter and noise of a network connection.

The upshot is that the new technique can measure timing differentials of less than 1 microsecond over a LAN connection — for example, if the attacker is in the same data center as your servers. It does this by making several thousand decryption queries and processing the results. Under the right circumstances, this turns out to be enough to bring (D)TLS padding oracle attacks back to life.

How does the attack work?

For the details, you should obviously read the full paper or at least the nice FAQ that Royal Holloway has put out. Here I’ll try to give some intuition.

Before I can explain the attack, you need to know a little bit about how hash-based MACs work. TLS typically uses HMAC with either MD5, SHA1 or SHA256 as the hash function. While these are very different hash functions, the critical point is that each one processes messages in 64-byte blocks.

Consequently, hashing time is a function of the number of blocks in the message, not the number of bytes. Going from a 64-byte input to a 65-byte input means an entire extra block, and hence a (relatively) large jump in the amount of computation time (an extra iteration of the hash function’s compression function).

There are a few subtleties in here. The hash functions incorporate an 8-byte length field plus some special hash function padding, which actually means a one-block message can only contain about 55 bytes of real data (which also includes the 13-byte record header). The HMAC construction adds a (constant) amount of additional work, but we don’t need to think about that here.

So in summary: you can get 55 bytes of data into one block of the hash. Go a single byte beyond that, and the hash function will have to run a whole extra round, causing a tiny (500-1000 hardware cycle) delay.
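To see exactly where that boundary falls, here’s a quick Python sketch (not from the paper, just arithmetic): MD5, SHA1 and SHA256 all append a 0x80 terminator byte plus an 8-byte length field before padding the input out to a 64-byte multiple.

    def hash_blocks(msg_len: int, block_size: int = 64) -> int:
        # MD5/SHA1/SHA256 append a 0x80 terminator and an 8-byte length
        # field, then pad out to a multiple of the 64-byte block size.
        return (msg_len + 1 + 8 + block_size - 1) // block_size

    print(hash_blocks(55))  # 1 block
    print(hash_blocks(56))  # 2 blocks: one extra compression round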

The attack here is to take a message that — including the TLS padding — would fall above that 55 byte boundary. However, the same message with padding properly removed would fall below it. When an attacker tampers with the message (damaging the padding), the decryption process will MAC the longer version of the message — resulting in a measurably higher computation time than when the padding checks out.

By repeating this process many thousands (or even millions!) of times to eliminate noise and network jitter, it’s possible to get a clear measurement of whether the decryption succeeded or not. Once you get that, it’s just a matter of executing a standard padding oracle attack.
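For intuition, here’s a sketch of the core loop of such an attack in Python. Everything here is hypothetical scaffolding: `oracle` stands in for the timing measurement above, returning True when a tampered record decrypts with valid padding, and a real attack would need thousands of timing samples behind every single query (plus care to rule out accidental longer paddings).

    def recover_last_byte(oracle, prev_block: bytes, target_block: bytes) -> int:
        """Recover the last plaintext byte of a CBC block via a padding oracle.

        `oracle(prev, block)` is a hypothetical callable: True iff `block`,
        decrypted with `prev` as its CBC predecessor, ends in the valid
        one-byte TLS padding 0x00.
        """
        for guess in range(256):
            # XOR the guess into the byte that masks the target's last byte.
            mauled = prev_block[:-1] + bytes([prev_block[-1] ^ guess])
            if oracle(mauled, target_block):
                # The decrypted byte was plaintext_byte XOR guess == 0x00,
                # so the plaintext byte equals the guess.
                return guess
        raise RuntimeError("oracle accepted nothing; measurement too noisy?")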

But there’s no way this will work on TLS! It’ll kill the session!

Please recall that I described this as a practical attack on Datagram TLS (DTLS) — and as a more theoretical one on TLS itself.* There’s a reason for this.

The reason is that TLS (and not DTLS) includes one more countermeasure I haven’t mentioned yet: anytime a record fails to decrypt (due to a bad MAC or padding error), the TLS server kills the session. DTLS does not do this, which makes this attack borderline practical. (Though it still takes millions of packet queries to execute.)

The standard TLS ‘session kill’ feature would appear to stop padding oracle attacks, since they require the attacker to make many, many decryption attempts. Killing the session limits the attacker to one decryption — and intuitively that would seem to be the end of it.

But actually, this turns out not to be true.

You see, one of the neat things about padding oracle attacks is that they can work across different sessions (keys), provided that (a) your victim is willing to re-initiate the session after it drops, and (b) the secret plaintext appears in the same position in each stream. Fortunately the design of browsers and HTTPS lets us satisfy both of these requirements.

  1. To make a target browser initiate many connections, you can feed it some custom Javascript that causes it to repeatedly connect to an SSL server (as in the CRIME attack). Note that the Javascript doesn’t need to come from the target webserver — it can even be served from an unrelated non-HTTPS page, possibly running in a different tab. So in short: this is pretty feasible.
  2. Moreover, thanks to the design of the HTTP(S) protocol, each of these connections will include cookies at a known location in the HTTP stream. While you may not be able to decrypt the rest of the stream, these cookie values are generally all you need to break into somebody’s account.

Thus the only practical limitation on such a cookie attack is the time it takes for the server to re-initiate all of these connections. TLS handshakes aren’t fast, and this attack can take tens of thousands (or millions!) of connections per byte. So in practice the TLS attack would probably take days. In other words: don’t panic.

On the other hand, don’t get complacent either. The authors propose some clever optimizations that could take the TLS attack into the realm of the feasible in the near future.

How is it being fixed?

With more band-aids of course!

But at least this time, they’re excellent band-aids. Adam Langley has written a 500-line OpenSSL patch (!) that modifies the CBC-mode decryption procedure to wipe out the timing differentials used by this attack. I would recommend that you think about updating at least your servers in the future (though we all know you won’t). Microsoft products are allegedly not vulnerable to this attack, so they won’t need updates.**

Still, this is sort of like fixing your fruit fly problem by spraying your kitchen with DDT. Why not just throw away the rotted fruit? In practice, that means moving towards modern AEAD ciphersuites like AES-GCM, which should generally end this madness. We hope.
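To see why AEAD ends the madness: encryption and authentication happen in one integrated pass, with no separate MAC to misplace and no padding left outside it. A minimal sketch using AES-GCM from the Python `cryptography` package (the keys and data are purely illustrative):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)  # must never repeat under the same key

    # The record header can be authenticated as 'associated data' without
    # being encrypted; there is no separate MAC or padding step to attack.
    ct = AESGCM(key).encrypt(nonce, b"record payload", b"record header")
    pt = AESGCM(key).decrypt(nonce, ct, b"record header")  # raises on tampering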

Why not switch to RC4?

RC4 is not an option in DTLS. However, it will mitigate this issue for TLS, since the RC4 ciphersuite doesn’t use padding at all. In fact, this ancient ciphersuite has been (hilariously) enjoying a resurgence in recent years as the ‘solution’ to TLS attacks like BEAST. Some will see this attack as further justification for the move.

But please don’t do this. RC4 is old and creaky, and we really should be moving away from it too.

So what’s next for TLS?

I’d love to say more, but you see, the first rule of TLS vulnerability club is…

Notes:

* The attack on Datagram TLS is more subtle, and a lot more interesting. I haven’t covered it much in this post because TLS is much more widely used than DTLS. But briefly, it’s an extension of some previous techniques — by the same authors — that I covered in this blog last year. The gist is that an attacker can amplify the impact of the timing differential by ‘clogging’ the server with lots of unrelated traffic. That makes these tiny differentials much easier to detect.

** And if you believe that, I have a lovely old house in Baltimore to sell you…

Surveillance works! Let’s have more of it.

If you care about these things, you might have heard that Google recently detected and revoked a bogus Google certificate. Good work, and huge kudos to the folks at Google who lost their holidays to this nonsense.

From what I’ve read, this particular incident got started in late 2011 when Turkish Certificate Authority TURKTRUST accidentally handed out intermediate CA certificates to two of their customers. Intermediate CA certs are like normal SSL certs, with one tiny difference: they can be used to generate other SSL certificates. Oops.

One of the recipients noticed the error and reported it to the CA. The other customer took a different approach and installed it into an intercepting Checkpoint firewall to sniff SSL-secured connections. Because, you know, why not.

So this is bad but not exactly surprising — in fact, it’s all stuff we’ve seen before. Our CA system has far too many trust points, and it requires too many people to act collectively when someone proves untrustworthy. Unless we do something, we’re going to see lots more of this.

What’s interesting about this case — and what leads to the title above — is not so much what went wrong, but rather, what went right. You see, this bogus certificate was detected, and likely not because some good Samaritan reported the violation. Rather, it was (probably) detected by Google’s unwavering surveillance.

The surveillance in question is conducted by the Chrome browser, which actively looks out for attacks and reports them. Here’s the money quote from their privacy policy:

“If you attempt to connect to a Google website using a secure connection, and the browser blocks the connection due to information that indicates you are being actively attacked by someone on the network (a “man in the middle attack”), Chrome may send information about that connection to Google for the purpose of helping to determine the extent of the attack and how the attack functions.”

The specific technical mechanism in Chrome is simple: Chrome ships with a series of ‘pins’ in its source code (thanks Moxie, thanks Tom). These tell Chrome what valid Google certificates should look like, and help it detect an obviously bogus certificate. When Chrome sees a bogus cert attached to a purported Google site, it doesn’t just stop the connection, it rings an alarm at Google HQ.
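As a toy illustration of the idea (and only the idea: Chrome pins hashes of public keys compiled into the browser, and has its own reporting channel), here’s a hypothetical Python sketch that checks a server’s leaf certificate against a hard-coded fingerprint. The pin value is a placeholder:

    import hashlib, socket, ssl

    # Placeholder pin set: in practice you'd precompute SHA-256 digests of
    # the certificates (or better, the public keys) you expect to see.
    PINNED_FINGERPRINTS = {"<sha256-hex-of-an-expected-certificate>"}

    def matches_pin(host: str, port: int = 443) -> bool:
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                leaf = tls.getpeercert(binary_form=True)  # DER-encoded cert
        return hashlib.sha256(leaf).hexdigest() in PINNED_FINGERPRINTS

    # A pinning browser would block the connection on a mismatch, and,
    # like Chrome, could phone home to report the bogus certificate.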

And that alarm is a fantastic thing, because in this case it may have led to discovery and revocation before this certificate could be stolen and used for something worse.

Now imagine the same thing happening in many other browsers, and not just for Google sites. Well, you don’t have to imagine. This is exactly the approach taken by plugins like Perspectives and Convergence, which monitor users’ SSL connections in a distributed fashion to detect bogus certificates. These plugins are great, but they’re not deployed widely enough. The technique should be standard in all browsers, perhaps with some kind of opt-in. (I certainly would opt in.)

The simple fact is that our CA system is broken and this is what we’ve got. Congratulations to Google for taking a major first step in protecting its users. Now let’s take some more.

On the (provable) security of TLS: Part 2

This is the second post in a series on the provable security of TLS. If you haven’t read the first part, you really should!

The goal of this series is to try to answer an age-old question that is often asked and rarely answered. Namely: is the TLS protocol provably secure?

While I find the question interesting in its own right, I hope to convince you that it’s of more than academic interest. TLS is one of the fundamental security protocols on the Internet, and if it breaks lots of other things will too. Worse, it has been broken — repeatedly. Rather than simply patch and hope for the best, it would be fantastic if we could actually prove that the current specification is the right one.

Unfortunately this is easier said than done. In the first part of this series I gave an overview of the issues that crop up when you try to prove TLS secure. They come at you from all different directions, but most stem from TLS’s use of ancient, archaic cryptography; gems like the ongoing use of RSA-PKCS#1v1.5 encryption, fourteen years after it was shown to be insecure.

Despite these challenges, cryptographers have managed to come up with a handful of nice security results on portions of the protocol. In the previous post I discussed Jonsson and Kaliski’s proof of security for the RSA-based TLS handshake. This is an important and confidence-inspiring result, given that the RSA handshake is used in almost all TLS connections.

In this post we’re going to focus on a similarly reassuring finding related to the TLS record encryption protocol — and the ‘mandatory’ ciphersuites used by the record protocol in TLS 1.1 and 1.2 (nb: TLS 1.0 is broken beyond redemption). What this proof tells us is that TLS’s CBC mode ciphersuites are secure, assuming… well, a whole bunch of things, really.

The bad news is that the result is extremely fragile, and owes its existence more to a series of happy accidents than to any careful security design. In other words, it’s just like TLS itself.

Records and handshakes

Let’s warm up with a quick refresher.

TLS is a layered protocol, with different components that each do a different job. In the previous post I mostly focused on the handshake, which is a beefed-up authenticated key agreement protocol. Although the handshake does several things, its main purpose is to negotiate a shared encryption key between a client and a server — parties who up until this point may be complete strangers.

The handshake gets lots of attention from cryptographers because it’s exciting. Public key crypto! Certificates! But really, this portion of the protocol only lasts for a moment. Once it’s done, control heads over to the unglamorous record encryption layer which handles the real business of the protocol: securing application data.

Most kids don’t grow up dreaming about a chance to work on the TLS record encryption layer, and that’s fine — they shouldn’t have to. All the record encryption layer does is, well, encrypt stuff. In 2012 that should be about as exciting as mailing a package.

And yet TLS record encryption still manages to be a source of endless excitement! In the past year alone we’ve seen three critical (and exploitable!) vulnerabilities in this part of TLS. Clearly, before we can even talk about the security of record encryption, we have to figure out what’s wrong with it.

Welcome to 1995

[Image: development of the SSLv1 record encryption layer]

The problem (again) is TLS’s penchant for using prehistoric cryptography, usually justified on some pretty shaky ‘backwards compatibility’ grounds. This excuse is somewhat bogus, since the designers have actually changed the algorithms in ways that break compatibility with previous versions — and yet retained many of the worst features of the originals.

The most widely-used ciphersuites employ a block cipher configured in CBC mode, along with a MAC to ensure record authenticity. This mode can be used with various ciphers/MAC algorithms, but encryption always involves the following steps:

  1. If both sides support TLS compression, first compress the plaintext.
  2. Next compute a MAC over the plaintext, record type, sequence number and record length. Tack the MAC onto the end of the plaintext.
  3. Pad the result with up to 256 bytes of padding, such that the padded length is a multiple of the cipher’s block size. The last byte of the padding should contain the padding length (excluding this byte), and all padding bytes must also contain the same value. A padded example (with AES) might look like:

    0x MM MM MM MM MM MM MM MM MM 06 06 06 06 06 06 06

  4. Encrypt the padded message using CBC mode. In TLS 1.0 the last block of the previous ciphertext (called the ‘residue’) is used as the Initialization Vector. Both TLS 1.1 and 1.2 generate a fresh random IV for each record.

Upon decryption, the decryptor verifies that the padding is correctly structured, checks the MAC, and outputs an error if either check fails.
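To make the ordering concrete, here’s a minimal Python sketch of steps 2-4 (compression omitted), using HMAC-SHA1 and AES-CBC via the `cryptography` package. The header and key handling are simplified, so read it as an illustration of MAC-then-pad-then-encrypt rather than a conforming implementation:

    import hmac, hashlib, os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt_record(mac_key: bytes, enc_key: bytes, header: bytes,
                       plaintext: bytes) -> bytes:
        # Step 2: MAC over the header fields and plaintext. Note that the
        # padding added below is NOT covered by this MAC.
        tag = hmac.new(mac_key, header + plaintext, hashlib.sha1).digest()
        data = plaintext + tag

        # Step 3: TLS padding -- n bytes, each holding the value n-1, where
        # the final byte doubles as the padding-length field.
        pad_len = 16 - (len(data) % 16)
        data += bytes([pad_len - 1]) * pad_len

        # Step 4: CBC-encrypt under a fresh random IV (TLS 1.1/1.2 style).
        iv = os.urandom(16)
        enc = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
        return iv + enc.update(data) + enc.finalize()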

To get an idea of what’s wrong with the CBC ciphersuite, you can start by looking at the appropriate section of the TLS 1.2 spec — which reads more like the warning label on a bottle of nitroglycerin than a cryptographic spec. Allow me to sum up the problems.

First, there’s the compression. It’s long been known that compression can leak information about the contents of a plaintext, simply by allowing the adversary to see how well it compresses. The CRIME attack recently showed how nasty this can get, but the problem is not really news. Any analysis of TLS encryption begins with the assumption that compression is turned off.

Next there’s the issue of the Initialization Vectors. TLS 1.0 uses the last block of the previous ciphertext, which is absurd, insecure and — worse — actively exploitable by the BEAST attack. Once again, the issue has been known for years yet was mostly ignored until it was exploited.

So ok: no TLS 1.0, no compression. Is that all?

Well, we still haven’t discussed the TLS MAC, which turns out to be in the wrong place — it’s applied before the message is padded and encrypted. This placement can make the protocol vulnerable to padding oracle attacks, which (amazingly) will even work across handshakes. This last fact is significant, since TLS will abort the connection (and initiate a new handshake) whenever a decryption error occurs in the record layer. It turns out that this countermeasure is not sufficient.

To deal with this, recent versions of TLS have added the following patch: they require implementers to hide the cause of each decryption failure — i.e., make MAC errors indistinguishable from padding failures. And this isn’t just a question of changing your error codes, since clever attackers can learn this information by measuring the time it takes to receive an error. From the TLS 1.2 spec:

In general, the best way to do this is to compute the MAC even if the padding is incorrect, and only then reject the packet. For instance, if the pad appears to be incorrect, the implementation might assume a zero-length pad and then compute the MAC. This leaves a small timing channel, since MAC performance depends to some extent on the size of the data fragment, but it is not believed to be large enough to be exploitable.

To sum up: TLS is insecure if your implementation leaks the cause of a decryption error, but careful implementations can avoid leaking much, although admittedly they probably will leak some — but hopefully not enough to be exploited. Gagh!

At this point, just take a deep breath and say ‘all horses are spherical’ three times fast, ’cause that’s the only way we’re going to get through this.

Accentuating the positive

Having been through the negatives, we’re almost ready to say nice things about TLS. Before we do, let’s just take a second to catch our breath and restate some of our basic assumptions:

  1. We’re not using TLS 1.0 because it’s broken.
  2. We’re not using compression because it’s broken.
  3. Our TLS implementation is perfect — i.e., doesn’t leak any information about why a decryption failed. This is probably bogus, yet we’ve decided to look the other way.
  4. Oh yeah: we’re using a secure block cipher and MAC (in the PRP and PRF sense respectively).**

And now we can say nice things. In fact, thanks to a recent paper by Kenny Paterson, Thomas Ristenpart and Thomas Shrimpton, we can say a few surprisingly positive things about TLS record encryption.

What Paterson/Ristenpart/Shrimpton show is that TLS record encryption satisfies a notion they call ‘length-hiding authenticated encryption’, or LHAE. This new (and admittedly made up) notion not only guarantees the confidentiality and authenticity of records, but ensures that the attacker can’t tell how long they are. The last point seems a bit extraneous, but it’s important in the case of certain TLS libraries like GnuTLS, which actually add random amounts of padding to messages in order to disguise their length.
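The length-hiding trick itself is easy to sketch: rather than always using the minimum padding, pick uniformly among every valid padding length, i.e., anything that keeps the total a multiple of the block size, up to TLS’s 256-byte cap. A hypothetical helper:

    import secrets

    def length_hiding_pad(data: bytes, block_size: int = 16) -> bytes:
        # Shortest padding that completes the final block.
        min_pad = block_size - (len(data) % block_size)
        # Every valid length: min_pad, min_pad + block_size, ... up to 256.
        pad_len = secrets.choice(range(min_pad, 257, block_size))
        # TLS format: pad_len bytes, each containing the value pad_len - 1.
        return data + bytes([pad_len - 1]) * pad_len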

There’s one caveat to this proof: it only works in cases where the MAC has an output size that’s greater than or equal to the cipher’s block size. This is, needless to say, a totally bizarre and fragile condition for the security of a major protocol to hang on. And while the condition does hold for all of the real TLS ciphersuites we use — yay! — this is more a happy accident than the result of careful design on anyone’s part. It could easily have gone the other way.

So how does the proof work?

Good question. Obviously the best way to understand the proof is to read the paper itself. But I’d like to try to give an intuition.

First of all, we can save a lot of time by starting with the fact that CBC-mode encryption is already known to be IND-CPA secure if implemented with a secure block cipher (PRP). This result tells us only that CBC is secure against passive attackers who can request the encryption of chosen messages. (In fact, a properly-formed CBC mode ciphertext should be indistinguishable from a string of random bits.)

The problem with plain CBC-mode is that these security results don’t hold in cases where the attacker can ask for the decryption of chosen ciphertexts.

This limitation is due to CBC’s malleability — specifically, the fact that an attacker can tamper with a ciphertext, then gain useful information by sending the result to be decrypted. To show that TLS record encryption is secure, what we really want to prove is that tampering gives no useful results. More concretely, we want to show that asking for the decryption of a tampered ciphertext will always produce an error.

We have a few things working in our favor. First, remember that the underlying TLS record has a MAC on it. If the MAC is (PRF) secure, then any ciphertext tampering that results in a change to this record data or its MAC will be immediately detected (and rejected) by the decryptor. This is good.

Unfortunately the TLS MAC doesn’t cover the padding. To continue our argument, we need to show that no attacker can produce a legitimate ciphertext by tampering, and that includes tampering that messes with the padding section of the message. Here again things look intuitively good for TLS. During decryption, the decryptor checks the last byte of the padded message to see how much padding there is, then verifies that all padding bytes contain the same numeric value. Any tampering that affects this section of the plaintext should either:

  1. Produce inconsistencies in some padding bytes, resulting in a padding error, or
  2. Cause the wrong amount of padding to be stripped off, resulting in a MAC error.

Both of these error conditions will appear the same to the attacker, thanks to the requirement that TLS implementations hide the reason for a decryption failure. Once again, the attacker should learn nothing useful.
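Putting the two checks together, here’s the decryption-side counterpart to the earlier encryption sketch, with the ‘assume a zero-length pad’ countermeasure and a single indistinguishable error for both failure modes (same simplifying assumptions: HMAC-SHA1, AES, toy header handling):

    import hmac, hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    MAC_LEN = 20  # HMAC-SHA1 output size

    def decrypt_record(mac_key: bytes, enc_key: bytes, header: bytes,
                       record: bytes) -> bytes:
        iv, body = record[:16], record[16:]
        dec = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
        padded = dec.update(body) + dec.finalize()

        # Padding check: the last byte gives the pad length (excluding
        # itself), and every padding byte must hold that same value.
        pad_len = padded[-1] + 1
        padding_ok = (pad_len <= len(padded) - MAC_LEN and
                      padded[-pad_len:] == bytes([padded[-1]]) * pad_len)

        # Countermeasure: on bad padding, assume a zero-length pad and MAC
        # anyway, so timing doesn't reveal which check failed.
        data = padded[:-pad_len] if padding_ok else padded[:-1]
        msg, tag = data[:-MAC_LEN], data[-MAC_LEN:]
        expected = hmac.new(mac_key, header + msg, hashlib.sha1).digest()
        if not (padding_ok and hmac.compare_digest(tag, expected)):
            raise ValueError("bad_record_mac")  # one error for both failures
        return msg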

This all seems perfectly intuitive, and you can imagine the TLS developers making exactly this argument as they wrote up the spec. However there’s one small exception to the rule above, which crops up in TLS implementations that add an unnecessarily large amount of padding to the plaintext (for example, GnuTLS).

To give an example, let’s say the unpadded record + MAC is 15 bytes. If we’re using AES, then this plaintext can be padded with a single byte. Of course, if we’re inclined to add extra padding, it could also be padded with seventeen bytes — both are valid padding strings. The two possible paddings look like this:

    0x MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM 00

    0x MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM 10
       10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
If the extra-long padding is used, the attacker can maul the longer ciphertext by truncating it — throwing away the last 16-byte ciphertext block. Truncation is a viable way to maul CBC ciphertexts, since it has a predictable effect on the underlying plaintext. CBC decryption of the truncated ciphertext yields:

    0x MM MM MM MM MM MM MM MM MM MM MM MM MM MM MM 10

This isn’t very useful by itself, since the invalid padding (the trailing 0x10 claims sixteen preceding bytes of padding, but those bytes are message data) would lead to a decryption error. However, the attacker can fix this, using the fact that CBC ciphertexts can also be mauled by flipping bits in the last byte of the Initialization Vector. Flipping a single bit there converts the trailing 0x10 into an 0x00, and the result is a valid ciphertext that will not produce an error on decryption.

So what has the attacker learned by this attack? In practice, not very much. Mostly what they know is the length of the original plaintext — so much for GnuTLS’s attempt to disguise it. But more fundamentally: this ability to ‘maul’ the ciphertext without triggering an error totally screws up the nice argument we were going for in our proof.
So the question is: can this attack be used against real TLS? And this is where the funny restriction about MAC size comes into play.

You see, if the TLS MAC is always at least as large as a cipher block, then all messages will obey a strict rule: no padding will ever appear in the first plaintext block. (Even an empty record carries a 20-byte HMAC-SHA1 tag, so its padding can’t begin until after byte 20, which is past the first 16-byte AES block.)

Since the padding is now guaranteed to start in the second (or later) block of the CBC ciphertext, the attacker cannot ‘tweak’ it by modifying the IV (this attack only works against the first block of the plaintext). Instead, they would have to tamper with a ciphertext block. And in CBC mode, tampering with ciphertext blocks has consequences! Such a tweak will allow the attacker to change padding bytes, but as a side effect it will cause one entire block of the record or MAC to be randomized when decrypted. And what Paterson/Ristenpart/Shrimpton prove is that this ‘damage’ will inevitably lead to a MAC error.

This ‘lucky break’ means that an attacker can’t successfully tamper with a CBC-mode TLS ciphertext. And that allows us to push our way to a true proof of the CBC-mode TLS ciphersuites. By contrast, if the MAC were only 80 bits (as it is in some IPsec configurations), the proof would not be possible. So it goes.

Now I realize this has all been pretty wonky, and that’s kind of the point! The moral of the story is that we shouldn’t need this proof in the first place! What it illustrates is how fragile and messy the TLS design really is, and how (once again) it achieves security by luck and the skin of its teeth, rather than by secure design.

What about stream ciphers?

The good news — to some extent — is that none of the above problems apply to stream ciphers, which don’t attempt to hide the record length, and don’t use padding in the first place. So the security of these modes is much ‘easier’ to argue.

Of course, this is only ‘good news’ if you believe that the stream ciphers included with TLS are good in the first place. The problem, of course, is that the major supported stream cipher is RC4. I will leave it to the reader to decide if that’s a worthwhile tradeoff.

In conclusion

There’s probably a lot more that can be said about TLS record encryption, but really… I think this post is probably more than anyone (outside of the academic community and a few TLS obsessives) has ever wanted to read on the subject.

In the next posts I’m going to come back to the much more exciting Diffie-Hellman handshake and then try to address the $1 million and $10 million questions respectively. First: does anyone really implement TLS securely? And second: when can we replace this thing?

Notes:

* One thing I don’t mention in this post is the TLS 1.0 ’empty fragment’ defense, which actually works against BEAST and has been deployed in OpenSSL for several years. The basic idea is to encrypt an empty record of length 0 before each record goes over the wire. In practice, this results in a full record structure with a MAC, and prevents attackers from exploiting the residue bug. Although nobody I know of has ever proven it secure, the proof is relatively simple and can be arrived at using standard techniques.

** The typical security definition for a MAC is SUF-CMA (strongly unforgeable under chosen message attack). This result uses the stronger — but also reasonable — assumption that the MAC is actually a PRF.