Attack of the Week: Unpicking PLAID

A few years ago I came across an amusing Slashdot story: ‘Australian Gov’t offers $560k Cryptographic Protocol for Free‘. The story concerned a protocol developed by Australia’s Centrelink, the equivalent of our Health and Human Services department, that was wonderfully named the Protocol for Lightweight Authentication of ID, or (I kid you not), ‘PLAID‘.

Now to me this headline did not inspire a lot of confidence. I’m a great believer in TANSTAAFL — in cryptographic protocol design and in life. I figure if someone has to use ‘free’ to lure you in the door, there’s a good chance they’re waiting on the other side with a hammer and a bottle of chloroform, or whatever the cryptographic equivalent might be.

A quick look at PLAID didn’t disappoint. The designers used ECB like it was going out of style, did inadvisable things with RSA encryption, and that was only the beginning.

What PLAID was not, I thought at the time, was bad to the point of being exploitable. Moreover, I got the idea that nobody would use the thing. It appears I was badly wrong on both counts.

This is apropos of a new paper authored by Degabriele, Fehr, Fischlin, Gagliardoni, Günther, Marson, Mittelbach and Paterson entitled ‘Unpicking Plaid: A Cryptographic Analysis of an ISO-standards-track Authentication Protocol‘. Not to be cute about it, but this paper shows that PLAID is, well, bad.

As is typical for this kind of post, I’m going to tackle the rest of the content in a ‘fun’ question and answer format.

What is PLAID?

In researching this blog post I was somewhat amazed to find that Australians not only have universal healthcare, but that they even occasionally require healthcare. That this surprised me is likely due to the fact that my knowledge of Australia mainly comes from the first two Crocodile Dundee movies.

It seems that not only do Australians have healthcare, but they also have access to a ‘smart’ ID card that allows them to authenticate themselves. These contactless smartcards run the proprietary PLAID protocol, which handles all of the ugly steps in authenticating the bearer, while also providing some very complicated protections to prevent user tracking.

This protocol has been standardized as Australian standard AS-5185-2010 and is currently “in the fast track standardization process for ISO/IEC 25185-1.2”. You can obtain your own copy of the standard for a mere 118 Swiss Francs, which my currency converter tells me is entirely too much money. So I’ll do my best to provide the essential details in this post — and many more can be found in the research paper.

How does PLAID work?


PLAID is primarily an identification and authentication protocol, but it also attempts to offer its users a strong form of privacy. Specifically, it attempts to ensure that only authorized readers can scan your card and determine who you are.

This is a laudable goal, since contactless smartcards are ‘promiscuous’, meaning that they can be scanned by anyone with the right equipment. Countless research papers have been written on tracking RFID devices, so the PLAID designers had to think hard about how they were going to prevent this sort of issue.

Before we get to what steps the PLAID designers took, let’s talk about what one shouldn’t do in building such systems.

Let’s imagine you have a smartcard talking to a reader where anyone can query the card. The primary goal of an authentication protocol is to perform some sort of mutual authentication handshake and derive a session key that the card can use to exchange sensitive data. The naive protocol might look like this:

Naive approach to an authentication protocol. The card identifies itself
before the key agreement protocol is run, so the reader can look up a card-specific key.

The obvious problem with this protocol is that it reveals the card ID number to anyone who asks. Yet such identification appears necessary, since each card will have its own secret key material — and the reader must know the card’s key in order to run an authenticated key agreement protocol.

PLAID solves this problem by storing a set of RSA public keys corresponding to various authorized applications. When the reader says “Hi, I’m a hospital“, a PLAID card determines which public key it should use to talk to hospitals, then encrypts data under the key and sends it over. Only a legitimate hospital should know the corresponding RSA secret key to decrypt this value.

So what’s the problem here?

PLAID’s approach would be peachy if there were only one public key to deal with. However PLAID cards can be provisioned to support many applications (called ‘capabilities’). For example, a citizen who routinely finds himself wrestling crocodiles might possess a capability unique to the Reptile Wound Treatment unit of a local hospital.* If the card responds to this capability, it potentially leaks information about the bearer.

To solve this problem, PLAID cards do some truly funny stuff.

Specifically, when a reader queries the card, the reader initially transmits a set of capabilities that it will support (e.g., ‘hospital’, ‘bank’, ‘social security center’). If the PLAID card has been provisioned with a matching public key, it goes ahead and uses it. If no matching key is found, however, the card does not send an error — since this would reveal user-specific information. Instead, it fakes a response by encrypting junk under a special ‘dummy’ RSA public key (called a ‘shill key’) that’s stored within the card. And herein lies the problem.
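To make this concrete, here is a minimal sketch (in Python) of the card-side logic just described. The class and helper names are mine, the RSA step is textbook m^e mod N rather than PLAID’s actual message format, and the payloads are random filler:

    import os

    def rsa_encrypt(pub, msg_bytes):
        # Textbook RSA (m^e mod N) on a random-looking message; real PLAID
        # wraps more structure around this, so treat it as illustrative only.
        e, n = pub
        m = int.from_bytes(msg_bytes, "big") % n
        return pow(m, e, n)

    class PlaidCardSketch:
        def __init__(self, app_keys, shill_key):
            self.app_keys = app_keys      # capability ID -> (e, N) public key
            self.shill_key = shill_key    # card-unique dummy (e, N) 'shill' key

        def initial_authenticate(self, requested_capabilities):
            for cap in requested_capabilities:
                if cap in self.app_keys:
                    # Real response: identifying data encrypted to the application key.
                    return rsa_encrypt(self.app_keys[cap], os.urandom(32))
            # No match: never return an error. Instead encrypt junk under the
            # shill key, so an unauthorized reader can't tell which
            # capabilities the card actually holds.
            return rsa_encrypt(self.shill_key, os.urandom(32))

The point to notice is that the reply always has the same shape; the only difference is which public key produced it.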

You see, the ‘shill key’ is unique to each card, which presents a completely new avenue for tracking individual cards. If an attacker can induce an error and subsequently fingerprint the resulting RSA ciphertext — that is, figure out which shill key was used to encipher it — they can potentially identify your card the next time they encounter you.

A portion of the PLAID protocol (source). The reader (IFD) is on the left, the card (ICC) is on the right.

Thus the problem of tracking PLAID cards devolves to one of matching RSA ciphertexts to unknown public keys. The PLAID designers assumed this would not be possible. What Degabriele et al. show is that they were wrong.

What do German Tanks have to do with RSA encryption?


To distinguish the RSA moduli of two different cards, the researchers employed an old solution to a problem known as the German Tank Problem. As the name implies, this is a real statistical problem that the Allies ran up against during WWII.

The problem can be described as follows:

Imagine that a factory is producing tanks, where each tank is printed with a sequential serial number in the ordered sequence 1, 2, …, N. Through battlefield captures you then obtain a small and (presumably) random subset of k tanks. From the recovered serial numbers, your job is to estimate N, the total number of tanks produced by the factory.

Happily, this is the rare statistical problem with a beautifully simple solution.** Let m be the maximum serial number in the set of k tanks you’ve observed. To obtain a rough guess Ñ for the total number of tanks produced, you can simply compute the following formula:

Ñ = m(1 + 1/k) - 1   (equivalently, m + m/k - 1)

So what’s this got to do with guessing an unknown RSA key?

Well, this turns out to be another instance of exactly the same problem. Imagine that I can repeatedly query your card with bogus ‘capabilities’ and thus cause it to enter its error state. To foil tracking, the card will send me a random RSA ciphertext encrypted with its (card-specific) “shill key”. I don’t know the public modulus corresponding to your key, but I do know that the ciphertext was computed using the standard RSA encryption formula m^e mod N.

My goal, as it was with the tanks, is to make a guess for N.

It’s worth pointing out that RSA is a permutation, which means that, provided the plaintext messages are randomly distributed, the ciphertexts will be randomly distributed as well. So all I need to do is collect a number of ciphertexts and apply the equation above. The resulting guess Ñ should then serve as a (fuzzy) identifier for any given card.
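Here’s a rough sketch of what that looks like in practice. This is my own illustration rather than the paper’s code: it simulates a card’s error responses as textbook RSA encryptions of random values, computes the German-tank estimate Ñ = m + m/k - 1, and matches a fresh estimate against a table of stored fingerprints (the paper’s statistical test is more refined, but the idea is the same):

    import os

    def collect_shill_ciphertexts(card_pub, k):
        # Simulate k error responses: textbook RSA encryptions of random junk
        # under the card's shill key (e, N).
        e, n = card_pub
        samples = []
        for _ in range(k):
            m = int.from_bytes(os.urandom(n.bit_length() // 8), "big") % n
            samples.append(pow(m, e, n))
        return samples

    def estimate_modulus(ciphertexts):
        # German-tank (frequentist) estimate of N from values assumed to be
        # roughly uniform on [0, N).
        k = len(ciphertexts)
        m_max = max(ciphertexts)
        return m_max + m_max // k - 1

    def identify(fingerprints, fresh_ciphertexts):
        # fingerprints: dict mapping card ID -> previously stored estimate.
        # Return the enrolled card whose estimate is closest to the fresh one.
        guess = estimate_modulus(fresh_ciphertexts)
        return min(fingerprints, key=lambda cid: abs(fingerprints[cid] - guess))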

Results for identifying a given card in a batch of (up to) 100 cards. Each card was initially ‘fingerprinted’ by collecting k1=1000 RSA ciphertexts. Arbitrary cards were later identified by collecting a varying number of ciphertexts (k2).

Now obviously this isn’t the whole story — the value Ñ isn’t exact, and you’ll get different levels of error depending on how many ciphertexts you get, and how many cards you have to distinguish amongst. But as the chart above shows, it’s possible to identify a specific card within a real system (containing 100 cards) with reasonable probability.

But that’s not realistic at all. What if there are other cards in play?

It’s important to note that real life is nothing like a laboratory experiment. The experiment above considered a ‘universe’ consisting of only 100 cards, required an initial ‘training period’ of 1000 scans for a given card, and subsequent recognition demanded anywhere from 50 to 1000 scans of the card.

Since the authentication time for the cards is about 300 milliseconds per scan, this means that even the minimal number (50) of scans still requires about 15 seconds, and only produces the correct result with about 12% probability. This probability goes up dramatically the more scans you get to make.

Nonetheless, even the 50-scan result is significantly better than guessing, and with more concerted scans can be greatly improved. What this attack primarily serves to prove is that homebrew solutions are your enemy. Cooking up a clever approach to foiling tracking might seem like the right way to tackle a problem, but sometimes it can make you even more vulnerable than you were before.

How should one do this right?

The basic property that the PLAID designers were trying to achieve with this protocol is something called key privacy. Roughly speaking, a key private cryptosystem ensures that an attacker cannot link a given ciphertext (or collection of ciphertexts) to the public key that created them — even if the attacker knows the public key itself.

There are many excellent cryptosystems that provide strong key privacy. Ironically, most are more efficient than RSA; one solution to the PLAID problem is simply to switch to one of these. For example, many elliptic-curve (Diffie-Hellman or Elgamal) solutions will generally provide strong key privacy, provided that all public keys in the system are defined in the same group.

A smartcard encryption scheme based on, say, Curve25519 would likely avoid most of the problems presented above, and might also be significantly faster to boot.
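For illustration only, here is a hedged sketch of what a key-private card response could look like, using X25519 from the Python ‘cryptography’ package. The framing and helper names are invented for this post, and the payload masking is a toy (a real design would use an authenticated cipher). The point is that the card’s reply is a fresh ephemeral public value plus ciphertext, which looks the same regardless of which card or reader key is involved:

    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric.x25519 import (
        X25519PrivateKey, X25519PublicKey,
    )
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def card_response(reader_static_public: X25519PublicKey, payload: bytes) -> bytes:
        # Fresh ephemeral key per query: the public value is a uniform-looking
        # 32-byte string no matter which reader key (or card) is involved, so
        # there is nothing analogous to a per-card shill modulus to fingerprint.
        eph = X25519PrivateKey.generate()
        shared = eph.exchange(reader_static_public)
        # Derive a keystream and mask the payload (toy; use an AEAD in practice).
        mask = HKDF(algorithm=hashes.SHA256(), length=len(payload), salt=None,
                    info=b"plaid-sketch").derive(shared)
        masked = bytes(p ^ m for p, m in zip(payload, mask))
        eph_pub = eph.public_key().public_bytes(
            serialization.Encoding.Raw, serialization.PublicFormat.Raw)
        return eph_pub + masked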

What else should I know about PLAID?

There are a few other flaws in the PLAID protocol that make the paper worthy of a careful read — if only to scare you away from designing your own protocols.

In addition to the shill key fingerprinting attacks, the authors also show how to identify the set of capabilities that a card supports, using a relatively efficient binary search approach. Beyond this, there are also many significantly more mundane flaws due to improper use of cryptography.

By far the ugliest of these is the protocol’s lack of proper RSA-OAEP padding, which may present potential (but implementation-specific) vulnerabilities to Bleichenbacher’s attack. There’s also some almost de rigueur misuse of CBC mode encryption, and a handful of other flaws that should serve as audit flags if you ever run into them in the wild.

At the end of the day, bugs in protocols like PLAID mainly help to illustrate the difficulty of designing secure protocols from scratch. They also keep cryptographers busy with publications, so perhaps I shouldn’t complain too much.

You should probably read up a bit on Australia before you post again.

I would note that this really isn’t a question. But it’s absolutely true. And to this end: I would sincerely like to apologize to any Australian citizens who may have been offended by this post.

Notes:

* This capability is not explicitly described in the PLAID specification.

** The solution presented here — and used in the paper — is the frequentist approach to the problem. There is also a Bayesian approach, but it isn’t required for this application.

Attack of the week: POODLE

Believe it or not, there’s a new attack on SSL. Yes, I know you’re thunderstruck. Let’s get a few things out of the way quickly.

First, this is not another Heartbleed. It’s bad, but it’s not going to destroy the Internet. Also, it applies only to SSLv3, which is (in theory) an obsolete protocol that we all should have ditched a long time ago. Unfortunately, we didn’t.

Anyway, enough with the good news. Let’s get to the bad.

The attack is called POODLE, and it was developed by Bodo Möller, Thai Duong and Krzysztof Kotowicz of Google. To paraphrase Bruce Schneier, attacks only get better — they never get worse. The fact that this attack is called POODLE also tells us that attack names do get worse. But I digress.

The rough summary of POODLE is this: it allows a clever attacker who can (a) control the Internet connection between your browser and the server, and (b) run some code (e.g., script) in your browser to potentially decrypt authentication cookies for sites such as Google, Yahoo and your bank. This is obviously not a good thing, and unfortunately the attack is more practical than you might think. You should probably disable SSLv3 everywhere you can. Sadly, that’s not so easy for the average end user.

To explain the details, I’m going to use the usual ‘fun’ question and answer format I employ for attacks like these.

What is SSL?

SSL is probably the most important security protocol on the Internet. It’s used to encrypt connections between two different endpoints, most commonly your web browser and a web server. We mostly refer to SSL by the dual moniker SSL/TLS, since the protocol suite known as Secure Sockets Layer was upgraded and renamed to Transport Layer Security back in 1999.

This bug has nothing to do with TLS, however. It’s purely a bug in the old pre-1999 SSL, and specifically version 3 — something we should have ditched a long time ago. Unfortunately, for legacy reasons many browsers and servers still support SSLv3 in their configurations. It turns out that when you try to turn this option off, a good portion of the Internet stops working correctly, thanks to older browsers and crappy load balancers, etc.

As a result, many modern browsers and servers continue to support SSLv3 as an option. The worst part of this is that in many cases an active attacker can actually trigger a fallback. That is, even if both the server and client support more modern protocols, as long as they’re willing to support SSLv3, an active attacker can force them to use this old, terrible protocol. In many cases this fallback is transparent to the user.

What’s the matter with SSL v3?

So many things it hurts to talk about. For our purposes we need to focus on just one. This has to do with the structure of encryption padding used when encrypting with the CBC mode ciphersuites of SSLv3.

SSL data is sent in ‘record’ structures, where each record is first authenticated using a MAC. It’s subsequently enciphered using a block cipher (like 3DES or AES) in CBC mode. This MAC-then-encrypt design has been the cause of much heartache in the past. It’s also responsible for the problems now.

Here’s the thing: CBC mode encryption requires that the input plaintext length be equal to a multiple of the cipher’s block size (8 bytes in the case of 3DES, 16 bytes for AES). To make sure this is the case, SSL implementations add ‘padding’ to the plaintext before encrypting it. The padding can be up to one cipher block in length, is not covered by the MAC, and always ends with a single byte denoting the length of the padding that was added.

In SSLv3, the contents of the rest of the padding are unspecified. This is the problem that will vex us here.
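To make the padding rules concrete, here’s a small sketch (the helper names are mine, not from any SSL library) of SSLv3-style padding and the lenient check that inspects only the final length byte. The stricter TLS check, which also verifies every filler byte, is what closes the gap exploited below:

    import os

    BLOCK = 16  # AES block size in bytes

    def sslv3_pad(plaintext_plus_mac: bytes) -> bytes:
        # Pad up to a block boundary, SSLv3 style: the filler bytes are
        # arbitrary (here random) and only the final length byte matters.
        pad_len = BLOCK - (len(plaintext_plus_mac) % BLOCK) - 1
        return plaintext_plus_mac + os.urandom(pad_len) + bytes([pad_len])

    def sslv3_strip_padding(decrypted_record: bytes) -> bytes:
        # The lenient SSLv3 check: trust the last byte, ignore the filler.
        pad_len = decrypted_record[-1]
        if pad_len + 1 > min(BLOCK, len(decrypted_record)):
            raise ValueError("bad padding")       # too much padding: error out
        return decrypted_record[:-(pad_len + 1)]  # the MAC check would follow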

How does the attack work?

Let’s imagine that I’m an active attacker who is able to obtain a CBC-encrypted record containing an interesting message like a cookie. I want to learn a single byte of this cookie — and I’m willing to make the assumption that this byte happens to live at the end of a cipher block boundary.

(Don’t worry about how I know that the byte I want to learn is in this position. Just accept this as a given for now.)

Imagine further that the final block of the record in question contains a full block of padding. If we’re using AES as our cipher, this means that the last byte of the plaintext of the final block contains a ’15’ value, since there are 15 bytes of padding. The preceding 15 bytes of said block contain arbitrary values that the server will basically strip off and ignore upon decryption, since SSLv3 doesn’t specify what they should contain. (NB: TLS does, which prevents this issue.)

The attack works like this. Since I control the Internet connection, I can identify the enciphered block that I want to learn within an encrypted record. I can then substitute (i.e., move) this block in place of the final block that should contain only padding.

When the server receives this new enciphered record, it will go ahead and attempt to decrypt the final block (which I’ll call C_n) using the CBC decryption equation, which looks like this:

Decrypted final block := Decipher(C_n) XOR C_{n-1}

Note that C_{n-1} is the second-to-last block of the encrypted record.

If the decrypted final block does not contain a ’15’ in the final position, the server will assume either that the block is bogus (too much padding) or that there’s less padding in the message than we intended. In the former case it will simply barf. In the latter case it will assume that the meaningful message is longer than it actually is, which should trigger an error in decryption since MAC verification will fail. This should also terminate the SSL connection.

Indeed, this is by far the most likely outcome of our experiment, since the deciphered last byte is essentially random — thus failure will typically occur 255 out of every 256 times we try this experiment. In this case we have to renegotiate the handshake and try again.

Every once in a while we’ll get lucky. In 1/256 of the cases, the deciphered final block will contain a 15 byte at the final position, and the server will accept this as a valid padding length. The preceding fifteen bytes have also probably been changed, but the server will then strip off and ignore those values — since SSLv3 doesn’t care about the contents of the padding. No other parts of the ciphertext have been altered, so decryption will work perfectly and the server should report no errors.

This case is deeply meaningful to us. If this happens, we know that the final byte of Decipher(C_n), XORed with the final byte of the preceding ciphertext block C_{n-1}, is equal to ’15’. From this knowledge we can easily determine the actual plaintext value of the original byte we wanted to learn: simply XOR ’15’ with the final byte of C_{n-1}, then XOR the result with the final byte of the ciphertext block that originally preceded the block we targeted.

Voila, in this case — which occurs with probability 1/256 — we’ve decrypted a single byte of the cookie.
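In code, that recovery step is just two XORs. Here’s a sketch with my own variable names, assuming AES (16-byte blocks) and the setup described above:

    def recover_byte_if_accepted(c_prev_last: int, c_target_prev_last: int) -> int:
        # If the server accepted the modified record, then the last byte of
        # Decipher(C_target) XOR C_{n-1} must equal 15. Undo both XOR masks to
        # recover the plaintext byte that sat at the end of the target block
        # in its original position:
        #   last byte of Decipher(C_target) = 15 XOR (last byte of C_{n-1})
        #   original plaintext byte         = that XOR (last byte of the block
        #                                     originally preceding the target)
        return 15 ^ c_prev_last ^ c_target_prev_last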

The important thing to know is that if at first we don’t succeed, we can try, try again. That’s because each time we fail, we can re-run the SSL handshake (which changes the encryption key) and try the attack again. As long as the cookie byte we’re attacking stays in the same position, we can continue our attempts until we get lucky. The expected number of attempts needed for success is 256.

We’ve got one byte, how do we get the rest?

The ability to recover a single byte doesn’t seem so useful, but in fact it’s all we need to decipher the entire cookie — if we’re able to control the cookie’s alignment and location within the enciphered record. In this case, we can simply move one byte after another into that critical final-byte-of-the-cipher-block location and run the attack described above.

One way to do this is to trick the victim’s browser into running some Javascript we control. This script will make SSL POST requests to a secure site like Google. Each time it does so, it will transmit a request path first, followed by an HTTP cookie and other headers, followed by a payload it controls.

Source: Möller et al.

Since the script controls the path and payload, by varying these values and knowing the size of the intermediate headers, the script can systematically align each specific byte of the cookie to any location it wants. It can also adjust the padding length to ensure that the record ends with a full block of padding.

This means that our attack can now be used to decrypt an entire cookie, with an average of 256 requests per cookie byte. That’s not bad at all.
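Putting the pieces together, the outer loop looks roughly like the sketch below. The attempt_once callback is a hypothetical stand-in for the attacker’s Javascript request plus the network-level block substitution, and recover_byte is the XOR arithmetic from the earlier sketch:

    def recover_cookie(cookie_len: int, attempt_once, recover_byte) -> bytes:
        # attempt_once(i) aligns cookie byte i to the end of a block, sends one
        # doctored record, and returns (accepted, c_prev_last, c_target_prev_last).
        recovered = bytearray()
        for i in range(cookie_len):
            while True:                       # expected ~256 tries per byte
                accepted, c_prev_last, c_tprev_last = attempt_once(i)
                if accepted:
                    recovered.append(recover_byte(c_prev_last, c_tprev_last))
                    break
        return bytes(recovered)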

So should we move to West Virginia and stock up on canned goods?

Portions of the original SSL v3 specification being
reviewed at IETF 90.

Maybe. But I’m not so sure. For a few answers on what to do next, see Adam Langley and Rob Graham’s blog posts on this question.

Note that this entire vulnerability stems from the fact that SSLv3 is older than Methuselah. In fact, there are voting-age children who are younger than SSLv3. And that’s worrying.

The obvious and correct solution to this problem is to find and kill SSLv3 anywhere it lurks. In fact, this is something we should have done in the early 2000s, if not sooner. We can do it now, and this whole problem goes away.

The problem with the obvious solution is that our aging Internet infrastructure is still loaded with crappy browsers and servers that can’t function without SSLv3 support. Browser vendors don’t want their customers to hit a blank wall anytime they access a server or load balancer that only supports SSLv3, so they enable fallback. Server administrators don’t want to lock out the critical IE6 market, so they also support SSLv3. And we all suffer.

Hopefully this will be the straw that breaks the camel’s back and gets us to abandon obsolete protocols like SSLv3. But nobody ever went bankrupt betting on insecurity. It’s possible that ten years from now we’ll still be talking about ways to work around POODLE and its virulent flesh-eating offspring. All we can do is hope that reason will prevail.

Attack of the Week: Triple Handshakes (3Shake)

The other day Apple released a major security update that fixes a number of terrifying things that can happen to your OS X and iOS devices. You should install it. Not only does this fix a possible remote code execution vulnerability in the JPEG parser (!), it also patches a TLS/SSL protocol bug known as the “Triple Handshake” vulnerability. And this is great timing, since Triple Handshakes are something I’ve been meaning (and failing) to write about for over a month now.

But before we get there: a few points of order.

First, if Heartbleed taught us one thing, it’s that when it comes to TLS vulnerabilities, branding is key. Henceforth, and with apologies to Bhargavan, Delignat-Lavaud, Pironti, Fournet and Strub (who actually discovered the attack*), for the rest of this post I will be referring to the vulnerability simply as “3Shake”. I’ve also taken the liberty of commissioning a logo (courtesy @Raed667). I hope you like it.

On a more serious note, 3Shake is not Heartbleed. That’s both good and bad. It’s good because Heartbleed was nasty and 3Shake really isn’t anywhere near as dangerous. It’s bad since, awful as it was, Heartbleed was only an implementation vulnerability — and one in a single TLS library to boot. 3Shake represents a novel and fundamental bug in the TLS protocol.

The final thing you should know about 3Shake is that, according to the cryptographic literature, it shouldn’t exist.

You see, in the last few years there have been at least four major crypto papers purporting to prove the TLS protocol secure. The existence of 3Shake doesn’t make those results wrong. It may, however, indicate that cryptographers need to think a bit more about what ‘secure’ and ‘TLS’ actually mean. For me, that’s the most fascinating implication of this new attack.

I’ll proceed with the usual ‘fun’ question-and-answer format I save for this sort of thing.

What is TLS and why should you care?

Since you’re reading this blog, you probably already know something about TLS. You might even realize how much of our infrastructure is protected by this crazy protocol.

In case you don’t: TLS is a secure transport protocol that’s designed to establish communications between two parties, who we’ll refer to as the Client and the Server. The protocol consists of two sub-protocols called the handshake protocol and the record protocol. The handshake is intended to authenticate the two communicating parties and establish shared encryption keys between them. The record protocol uses those keys to exchange data securely.

For the purposes of this blog post, we’re going to focus primarily on the handshake protocol, which has (at least) two major variants: the RSA handshake and the Diffie-Hellman handshake (ECDHE/DHE). These are illustrated below.

Simplified description of the TLS handshakes. Top: RSA handshake. Bottom: Diffie-Hellman handshake.

As much as I love TLS, the protocol is a hot mess. For one thing, it inherits a lot of awful cryptography from its ancient predecessors (SSLv1-3). For another, it’s only really beginning to be subjected to rigorous, formal analysis.

All this means we’re just now starting to uncover some of the bugs that have been present in the protocol since it was first designed. And we’re likely to discover more! That’s partly because this analysis is at a very early stage. It’s also partly because, from an analyst’s point of view, we’re still trying to figure out exactly what the TLS handshake is supposed to do.

Well, what is the TLS handshake supposed to do?

Up until this result, we thought we had a reasonable understanding of the purpose of the TLS handshake. It was intended to authenticate one or both sides of the connection, then establish a shared cryptographic secret (called the Master Secret) that could be used to derive cryptographic keys for encrypting application data.

The first problem with this understanding is that it’s a bit too simple. There isn’t just one TLS handshake, there are several variants of it. Worse, multiple different handshake types can be used within a single connection.

The standard handshake flow is illustrated — without crypto — in the diagram below. In virtually every TLS connection, the server authenticates to the client by sending a public key embedded in a certificate. The client, for its part, can optionally authenticate itself by sending a corresponding certificate and proving it has the signing key. However this client authentication is by no means common. Many TLS connections are authenticated only in one direction.

Common TLS handshakes. Left: only server authenticates.
Right: client and server both authenticate with certificates.

TLS also supports a “renegotiation” handshake that can switch an open connection from one mode to the other. This is typically used to change a connection that was authenticated only in one direction (Server->Client) into a connection that’s authenticated in both directions. The server usually initiates renegotiation when the client has e.g., asked for a protected resource.

Renegotiating a session. A new handshake causes the existing connection to be mutually authenticated.

Renegotiation has had problems before. Back in 2009, Ray and Dispensa showed that a man-in-the-middle attacker could actually establish a (non-authenticated) connection with some server; inject some data; and when the server asks for authentication, the attacker could then “splice” on a real connection with an authorized client by simply forwarding the new handshake messages to the legitimate client. From the server’s point of view, both communications would seem to be coming from the same (now authenticated) person:

Ray/Dispensa attack from 2009. The attacker first establishes an unauthenticated connection and injects some traffic (“drop table *”). When the server initiates a renegotiation for client authentication, the attacker forwards the new handshake messages to an honest client Alice who then sends real traffic. Since the handshakes are not bound together, Bob views this as a single connection to one party.

To fix this, a “secure renegotiation” band-aid to TLS was proposed. The rough idea of this extension was to ‘bind’ the renegotiation handshake to the previous handshake, by having the client present the “Finished” message of the previous handshake. Since the Finished value is (essentially) a hash of the Master Secret together with (a hash of) the previous handshake messages, this allows the client to prove that it — not an attacker — truly negotiated the previous connection.

All of this brings us back to the question of what the TLS handshake is supposed to do.

You see, the renegotiation band-aid now adds some pretty interesting new requirements to the TLS handshake. For one thing, the security of this extension depends on the idea that (1) no two distinct handshakes will happen to use the same Master Secret, and (2) that no two handshakes will have the same handshake messages, ergo (3) no two handshakes will have the same Finished message.

Intuitively, this seemed like a pretty safe thing to assume — and indeed, many other systems that do ‘channel binding’ on TLS connections also make this assumption. The 3Shake attack shows that this is not safe to assume at all.

So what’s the problem here?

It turns out that TLS does a pretty good job of establishing keys with people you’ve authenticated. Unfortunately there’s a caveat. It doesn’t truly guarantee the established key will be unique to your connection. This is a pretty big violation of the assumptions that underlie the “secure renegotiation” fix described above.

For example: imagine that Alice is (knowingly) establishing a TLS connection to a server Mallory. It turns out that Mallory can simultaneously — and unknown to Alice — establish a different connection to a second server Bob. Moreover, if Mallory is clever, she can force both connections to use the same “Master Secret” (MS).

Mallory creates two connections that use the same Master Secret.

The first observation of the 3Shake attack is that this trick can be played if Alice supports either the RSA or the DHE handshake — or both (it does not seem to work with ECDHE). Here’s the RSA version:**

RSA protocol flow from the triple handshake attack (source). The attacker is in the middle, while the client and server are on the left/right respectively. MS is computed as a function of (pms, cr, sr) which are identical in both handshakes.

So already we have a flaw in the logic undergirding secure renegotiation. The Master Secret (MS) values are not necessarily distinct between different handshakes.

Fortunately, the above attack does not let us resurrect the Ray/Dispensa injection attack. While the attacker has tricked the client into using a specific MS value, the handshake Finished messages — which the client will attach to the renegotiation handshake — will not be the same in both handshakes. That’s because (among other things) the certificates sent on each connection were very different, hence the handshake hashes are not identical. In theory we’re safe.

But here is where TLS gets awesome.

You see, there is yet another handshake I haven’t told you about. It’s called the “session resumption handshake”, and it allows two parties who’ve previously established a master secret (and still remember it) to resume their session with new encryption keys. The advantage of resumption is that it uses no public-key cryptography or certificates at all, which is supposed to make it faster.

It turns out that if an attacker knows the previous MS and has caused it to be the same on both sides, it can now wait until the client initiates a session resumption. Then it can replay messages between the client and server in order to update both connections with new keys:

An attacker replays the session resumption handshake to ensure the same key on both sides. Note that the handshake messages are identical in both connections. (source)

Which brings us to the neat thing about this handshake. Not only is the MS the same on both connections, but both connections now see exactly the same (resumption) handshake messages. Hence the hash of these handshakes will be identical, which means in turn that their “Finished” message will be identical.

By combining all of these tricks, a clever attacker can pull off the following — and absolutely insane — “triple handshake” injection attack:

Triple handshake attack. The attacker mediates two handshakes that give the same MS on both sides, but two different handshake hashes. The resumption handshake leaves the same MS and an identical handshake hash on both sides. This means that the Finished message from the resumption handshake will be the same for the connections on either side of the attacker. Now he can hook up the two without anyone noticing that he previously injected traffic.

In the above scenario, an attacker first runs a (standard) handshake to force both sides of the connection to use the same MS. It then causes both sides to perform session resumption, which results in both sides using the same MS and having the same handshake hash and Finished messages on both sides. When the server initiates renegotiation, the attacker can forward the third (renegotiation) handshake on to the legitimate client as in the Ray/Dispensa attack — secure in the knowledge that both client and server will expect the same Finished token.

And that’s the ballgame.

What’s the fix?

There are several, and you can read about them here.

One proposed fix is to change the derivation of the Master Secret such that it includes the handshake hash. This should wipe out most of the attacks above. Another fix is to bind the “session resumption” handshake to the original handshake that led to it.
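To see why the first fix helps, here is a simplified sketch of the Master Secret derivation using the TLS 1.2 PRF (my own helper names). The classic derivation depends only on values that a 3Shake attacker can force to coincide on both connections; the ‘bound’ variant, in the spirit of the proposed session-hash fix, also mixes in a hash of the entire handshake transcript:

    import hashlib
    import hmac

    def tls12_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
        # TLS 1.2 P_SHA256-style PRF, simplified for illustration.
        full_seed = label + seed
        out, a = b"", full_seed
        while len(out) < length:
            a = hmac.new(secret, a, hashlib.sha256).digest()
            out += hmac.new(secret, a + full_seed, hashlib.sha256).digest()
        return out[:length]

    def master_secret_classic(pms, client_random, server_random):
        # Depends only on (pms, cr, sr) -- exactly the values an attacker in
        # the 3Shake position can force to match on both connections.
        return tls12_prf(pms, b"master secret", client_random + server_random, 48)

    def master_secret_bound(pms, handshake_transcript):
        # Proposed fix: also bind in a hash of every handshake message, so two
        # different handshakes can no longer end up with the same Master Secret.
        session_hash = hashlib.sha256(handshake_transcript).digest()
        return tls12_prf(pms, b"extended master secret", session_hash, 48)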

Wait, why should I care about injection attacks?

You probably don’t, unless you happen to be one of the critical applications that relies on the client authentication and renegotiation features of TLS. In that case, like most applications, you probably assumed that a TLS connection opened with a remote user was actually from that user the whole time, and not from two different users.

If you — like most applications — made that assumption, you might also forget to treat the early part of the connection (prior to client authentication) as a completely untrusted bunch of crap. And then you’d be in a world of hurt.

But don’t take my word for it. There’s video! See here for the source, background and details.

What does this have to do with the provable security of TLS?

Of all the questions 3Shake raises, this one is the most interesting. As I mentioned earlier, there have been several recent works that purport to prove things about the security of TLS. They’re all quite good, so don’t take any of this as criticism.

However, they (with one exception, the miTLS project) didn’t find this attack. Why is that?

The first reason is simple: many of these works analyze only the basic TLS handshake, or they omit at least one of the possible handshakes (e.g., resumption). This means they don’t catch the subtle interactions between the resumption handshake, the renegotiation handshake, and extensions — all of which are the exact ingredients that make most TLS attacks possible.

The second problem is that we don’t quite know what standard we’re holding TLS to. For example, the common definition of security for TLS is called “Authenticated and Confidential Channel Establishment” (ACCE). Roughly speaking this ensures that two parties can establish a channel and that nobody will be able to determine what data is being sent over said channel.

The problem with ACCE is that it’s a definition that was developed specifically so that TLS could satisfy it. As a result, it’s necessarily weak. For example, ACCE does not actually require that each handshake produces a unique Master Secret — one of the flaws that enables this attack — because such a definition was not possible to achieve with the existing TLS protocol. In general this is what happens when you design a protocol first and prove things about it later.

What’s the future for TLS? Can’t we throw the whole thing out and start over again?

Sure, go ahead and make TLS Rev 2. It can strip out all of this nonsense and start fresh.

But before you get cocky, remember — all these crazy features in TLS were put there for a reason. Someone wanted and demanded them. And sadly, this is the difference between a successful, widely-used protocol and your protocol.

Your new replacement for TLS might be simple and wonderful today, but that’s only because nobody uses it. Get it out into the wild and before long it too will be every bit as crazy as TLS.

Notes:

* An earlier version of this post incorrectly identified the researchers who discovered the attack.

** The Diffie-Hellman (DHE) version is somewhat more clever. It relies on the attacker manipulating the D-H parameters such that they will force the client to use a particular key. Since DHE parameters sent down from the server are usually ‘trusted’ by TLS implementations, this trick is relatively easy to pull off.

Can hackers decrypt Target’s PIN data?

Short answer: probably not.

Slightly longer answer: it depends on whether they have access to the encryption key, or to a machine that contains the encryption key.

In case you have no idea what I’m talking about: there was recently a massive credit card breach at Target. If you’re like many people you probably heard about this three times. First in the news, then again in your email when Target notified you that you were a victim, and finally a third time when you checked your credit card bill. Not a proud day for our nation’s retailers.

The news got a bit messier today when Target announced the thieves had also managed to get their hands on the PIN numbers of unfortunate debit card customers. But this time there’s a silver lining: according to Target, the PIN data was encrypted under a key the hackers don’t have.

Snyder said PIN data is encrypted at a retail location’s keypad with Triple-DES [3DES] encryption and that data remains encrypted over the wire until it reaches its payment processor. Attackers would have to have compromised the point-of-sale system and intercepted the PIN data before it is encrypted in order to have accessed it.

Several folks on Twitter have noted that 3DES is no spring chicken, but that’s not very important. Aside from a few highly impractical attacks, there isn’t much to worry about with 3DES. Moreover, PCI standards appear to mandate unique keys for every payment terminal, which means that the attackers would need to compromise the terminals themselves, or else break into the back-end payment processor. If Target is to be believed, this has not happened.

Others have pointed out that PINs are pretty short. For example, there are only 10,000 4-digit PINs — so surely the attackers can “brute-force” through this space to figure out your PIN. The good news is that encryption is decidedly not the same thing as password hashing, which means this is unlikely to be a serious concern. Provided that Target is being proactive and makes sure to change the keys now.

Of course you shouldn’t take my word for this. It helps to take a quick look at the PCI PIN encryption standards themselves. Before you encrypt a 4-digit PIN, the PIN is first processed and in some cases padded to increase the complexity of the data being encrypted. There are four possible encryption formats:

  • Format 0. XOR the PIN number together with the Primary Account Number (PAN), usually the rightmost twelve digits of the card number, not including the last digit. Then encrypt the result using 3DES in ECB mode.
  • Format 1. Concatenate the PIN number with a unique transaction number and encrypt using 3DES in ECB mode.
  • Format 2. Pad with some fixed (non-random) padding, then encrypt in 3DES/ECB with a unique, derived per-transaction key (called a DUKPT). Update: only used for EMV cards.
  • Format 3. Pad with a bunch of random bytes, then 3DES/ECB encrypt.
Notice that in each case the encryption is ECB mode, but in Formats 0, 1 and 3 the plaintext has been formatted in such a way that two users with PIN “1234” are unlikely to encrypt exactly the same value under the same key. For example, consider the Format 0 encryptions for two users with the same PIN (1234) but different PANs:

(PIN) 0x1234FFFFFFFF ⊕ (PAN) 0x937492492032 = 0x81406DB6DFCD
(PIN) 0x1234FFFFFFFF ⊕ (PAN) 0x274965382343 = 0x357D9AC7DCBC

Notice that the values being encrypted (at right) will be quite different. ECB mode has many flaws, but one nice feature is that the encryption of two different values (even under the same key) should lead to effectively unrelated ciphertexts. This means that even an attacker who learns the user’s PAN shouldn’t be able to recover the PIN from the ciphertext without knowledge of the key. Their best hope would be to gain access to the terminal, hope that it was still configured to use the same key, and build a dictionary — encrypting every possible PIN under a specific user’s PAN — before they could learn anything useful about that user’s PIN.

This does not seem practical.
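For the curious, here is a small sketch of the Format 0 construction and encryption described above, using the pycryptodome package for 3DES. It uses the standard 8-byte PIN-block layout (the shortened hex in the example above omits the length header), and the key is just a random stand-in for a terminal key:

    from Crypto.Cipher import DES3
    from Crypto.Random import get_random_bytes

    def pin_block_format0(pin: str, pan12: str) -> bytes:
        # PIN field: 0x0, PIN length, the PIN digits, padded out with 0xF nibbles.
        pin_field = bytes.fromhex(("0" + str(len(pin)) + pin).ljust(16, "F"))
        # PAN field: 0x0000 followed by the rightmost 12 account digits.
        pan_field = bytes.fromhex("0000" + pan12)
        return bytes(a ^ b for a, b in zip(pin_field, pan_field))

    key = DES3.adjust_key_parity(get_random_bytes(24))   # stand-in terminal key
    cipher = DES3.new(key, DES3.MODE_ECB)

    # Same PIN, different PANs (the digits from the example above): the blocks
    # differ, so the ECB ciphertexts under the same key are unrelated.
    for pan in ("937492492032", "274965382343"):
        block = pin_block_format0("1234", pan)
        print(pan, block.hex(), cipher.encrypt(block).hex())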

The one exception to the above rule is Format 2, which does not add any unpredictable padding to the plaintext at all. The PIN is padded out, but there are still exactly 10,000 possible plaintexts going into the encryption. PCI deals with this by mandating that the payment terminal derive a unique key per transaction, hopefully using a secure key derivation function. Update: this one probably isn’t used by Target.

All of this is a long, dull way of saying that encryption is not like password hashing. Provided that you can keep the keys secret, it’s perfectly fine to encrypt messages drawn from even small message spaces — like PINs — provided you’re not an idiot about it. The PCI standards clearly skirt the borders of idiocy, but they mostly steer clear of disaster.

So in summary, Target debit card users are probably just fine. Until tomorrow, when we learn that the thieves also have the decryption keys. Then we can panic.

Is the cryptopocalypse nigh?

I’ve been traveling a bit over the past couple of weeks, so I haven’t had much of a chance to keep up on blogging. One consequence is that I completely missed my chance to say something about, well, anything that happened at BlackHat or Def Con.

Which is too bad, since a surprising amount of crypto stuff did go down! One thing that I wish I’d had a chance to talk about was a presentation at BlackHat called ‘The Factoring Dead‘ by Tom Ritter, Javed Samuel and Alex Stamos. (Thomas Ptacek was also involved, but he insists that he did nothing and they only included him out of pity.)

Although I don’t quite agree with the premise of this presentation, talks like it are fun — in the way that zombie movies are fun — because you get to abandon reality and think about some non-serious things for a while.

Factually, the presentation addresses some new results on the discrete logarithm problem published by Antoine Joux and his co-authors Razvan Barbulescu, Pierrick Gaudry and Emmanuel Thomé — developments the presenters cite as a very serious reason for people to get worried. And we’re not talking about the usual ‘you’re going to use crypto wrong’ kind of worry, but a more earthshaking kind: namely that RSA and Diffie-Hellman and DSA are soon going to be broken altogether.

Now let me be clear that Ritter, Samuel and Stamos and even lame, non-contributing Ptacek (henceforth RSSp) are all smart guys. In fact, I would venture to say they’re probably much more familiar with the Joux et al. work than I am since they seem interested in the details. Me, I like my hardness assumptions the way I like my hamburgers: neatly ground up and without the mooing. I could live my whole life without voluntarily digging into the efficiency of discrete logarithm solvers in fields of small characteristic.

Moreover, it’s hard to really argue with the content of RSSp’s presentation, since the bulk of what they do is to simply present facts. There really have been some major recent advances in solving discrete logarithms over certain special fields. There have been major attacks in the more distant past that took us by surprise. And yes, it would be really awesome if people stopped fiddling with 1024-bit RSA keys and moved to elliptic curve crypto.

What’s concerning is the conclusions they (and other sources) have reached: namely, that factoring-based cryptosystems could be dead in just a few years. This kind of thing could incite panic! (I mean, it could if people actually cared about cryptography. Which unfortunately they mostly don’t.)

So let’s spend some time examining this.

Razvan Barbulescu, Emmanuel Thomé and Antoine Joux hungrily eye a defenseless discrete logarithm instance. (Source: Steven Galbraith)

The background

The jumping off point for RSSp’s slides is a set of recent advances made by Joux and subsequently by Barbulescu, Gaudry, Joux and Thomé. The discrete logarithm problem (which you can learn about in this highly informative video) is noteworthy for two reasons. First, it’s believed that in many settings, the discrete logarithm problem is difficult to solve. Second: that assumption is critical to the security of many of the cryptosystems we know and love — for example, Diffie-Hellman and DSA.

Now the Joux and Barbulescu et al. results are important work, and really do deserve attention from cryptographers and non-cryptographers alike. What they show is that there exist relatively efficient algorithms for solving discrete logarithms in certain very specific types of field. Even more amazingly, the new algorithms are efficient enough to actually implement and run — against parameters that were previously thought to have cryptographic security!

Indeed this has already had some (limited) impact on practitioners in the research community. For example, many of the pairing-based cryptography libraries I work with ship with parameters that are now deemed to be too risky thanks to these new attacks. However — and this is key — these are research libraries. To my knowledge, none of these fields is actually being used in deployment, let alone standardized cryptography.

In other words, this is the kind of result that should receive (and has received!) lots of attention from cryptographers. But not necessarily from people who use cryptography. And here’s why.

You see, while the Joux and Barbulescu et al. algorithms really are efficient, they only work in fields with very specific properties. Namely, the fields must have small characteristic. Indeed, this feature of the field is critical to certain key steps of the algorithm. Take this property away and you still get some advances over the previous state of the art, but the advances are decidedly more theoretical.

Which brings us to the payoff: all of the fields we use to implement most cryptography — things like (non-elliptic-curve) DSA, Diffie-Hellman, and even the fields we use to implement NIST standard elliptic curves — are prime fields and hence don’t have the necessary properties to make the Joux results meaningful. Hence these attacks don’t seem to apply. Moreover there’s really no good reason to believe that they will anytime soon.

The BlackHat presentation

Which brings us to the RSSp BlackHat presentation. The overall premise of RSSp’s presentation is that advances happen rapidly. It’s not unprecedented for theoretical attacks in the literature to rapidly morph into real things that keep security engineers up at night. They also point out that attacks on the DLP have closely tracked attacks on factoring, both in the classical and the quantum world. (Ok, they don’t say the last part but it’s also true.)

RSSp also correctly imply that we should be switching away from cryptosystems that rely on the hardness of the (field-based) discrete logarithm problem, and should instead be moving to cryptosystems based on the elliptic curve discrete logarithm problem (ECDLP).* This is because none of the efficient attacks on DLP — including Joux’s algorithms — seem to apply in the (standardized) EC setting.

Lastly, they correctly point out that cryptosystems based on factoring and (field-based) Discrete Logarithms are already being deprecated by organizations like NIST for a variety of good — though not panic-related — reasons. Mostly this is because our current pre-Joux algorithms against those settings have made it too costly to get long-term (256-bit) security; you just need enormous keys. This was the case before Joux came along, and it’s still the case now.

The last point RSSp make is also absolutely correct: we should be switching to elliptic curve cryptography (ECC) as soon as possible, in part just so people can start using high-security cryptosystems without paying performance and bandwidth through the nose for the privilege. This isn’t totally academic, since — as Thomas Ptacek reminds me — we’re getting close to the death of 1024-bit keys. If your adversary is the NSA, anyway.

(It also doesn’t hurt that getting more people on this bandwagon will reduce the number of people rolling their own RSA implementation.)

So should we stock up on canned goods and move to Vermont?

Vermont is lovely. But you shouldn’t move there because of this presentation.

In fact this is hardly the first time we’ve seen a major breakthrough against an important cryptographic problem. In the 90s it was fast number field sieving against factoring-based systems and slightly later, things like the MOV attack on the ECDLP. In both cases, there was a small degree of panic, but ultimately a pretty mild result: cryptographers carefully examined the attack and chose new parameters that made it impractical. Then everyone went back to business.

In this case it looks like we’ve already got a set of parameters that keep us safe, so it’s even more unlikely that — except for a few researchers doing what researchers do — any of us will have to think about this in three years or five or even ten.

And by the way, you should not believe this because I say so — that would be foolish. You should believe it because the people who work in this area also don’t seem to think it’s an issue. If you doubt this, go to CRYPTO this week and look for people running around with their hair on fire. The number should be barely higher than usual.

What would we do if there was a real cryptopocalypse?

Right! If we’re going to gin up a cryptopocalypse, let’s have a real one. What if in 2015, Joux and his co-authors publish a new algorithm that efficiently solves the DLP in prime fields and at realistic key sizes, and moreover has a factoring analog that breaks RSA? Well, this would certainly be very bad for everything we’ve encrypted in the past, but at least we’d have an immediate solution: a rapid transition to elliptic curve crypto. Whew!

But this is not much fun: like watching an alien invasion movie where the aliens are allergic to water.

So let’s go way out on a limb and imagine that in 2017, after everyone has piled into ECC, Joux et al. and a team from the University of Waterloo team up to publish a revolutionary new attack that reduces ECDLP to roughly the hardness of field-based DLP. What would happen then?

Well, this would be really bad.

Let me reiterate that there’s a reason we like our current batch of public-key cryptosystems — EC, field-based and factoring-based systems. They’re relatively easy to understand, they’ve all been studied quite a bit. But most importantly: they’re really efficient.

Once you leave this domain you enter a region that the maps label with ‘here be dragons‘. Not because this space is empty. It’s just that there are relatively few efficient schemes that have received anywhere near the level of study that our beloved ones have.

Probably the oldest and most familiar of the alternative encryption schemes is the McEliece cryptosystem, which was developed way back in 1978 (that’s one year after RSA, in case you’re keeping score at home). McEliece and its modern variants are based on problems in algebraic coding theory: they depend for security on the hardness of decoding general codes, as well as some assumptions about the specific code used.

McEliece is surprisingly fast and (so far as we know) quite secure. There’s only one major problem: the public keys are big. According to a 2008 analysis by Bernstein, Lange and Peters, achieving security equivalent to a 3072-bit RSA key (aka the ‘128 bit’ symmetric-equivalent security level) requires a stunning 187 kilobyte McEliece public key. Moving up to 256-bit security — notably hard even for RSA — takes this to nearly 1MB. Recent improvements may cut that down a bit, but they’re still relatively unstudied.

Another possibility is to use Lattice-based cryptosystems. While there are several in the research literature, one of the most studied is the NTRU cryptosystem. I won’t confess to caring much about NTRU, except to note that it’s relatively well-studied by the standards of such alternative schemes and even shows up in some standards. Unfortunately that doesn’t mean everyone loves it. The inventors also hold a patent on it.

Lastly, for signatures at least we can always fall back on old standbys such as hash based signatures, which should hold us as long as Joan Daemen’s team can think up new hash functions.

Conclusion

We live in frightening times and yes, it’s always tempting to peek under the bed and imagine scary monsters. In practice, the reality is probably a bit more mundane.

As much as we love to read stories of solitary mathematicians making revolutionary leaps in their unlit apartment, this is rarely how things go. Even the most significant advances are usually telegraphed via a series of public, widely-read research papers.

In other words: when RSA and DSA really are in danger you’ll know about it. Just look for a bunch of cryptographers running around with their hair on fire.

Notes:

* By ‘based on’ I don’t mean that these cryptosystems necessarily reduce to the ECDLP, but rather that their security depends upon the hardness of the ECDLP.

Attack of the week: RC4 is kind of broken in TLS

Update: I’ve added a link to a page at Royal Holloway describing the new attack. 

Listen, if you’re using RC4 as your primary ciphersuite in SSL/TLS, now would be a great time to stop. Ok, thanks, are we all on the same page?

No?

I guess we need to talk about this a bit more. You see, these slides have been making the rounds since this morning. Unfortunately, they contain a long presentation aimed at cryptographers, and nobody much seems to understand the real result that’s buried around slide 306 (!). I’d like to help.

Here’s the short summary:

According to AlFardan, Bernstein, Paterson, Poettering and Schuldt (a team from Royal Holloway, Eindhoven and UIC) the RC4 ciphersuite used in SSL/TLS is broken. If you choose to use it — as do a ridiculous number of major sites, including Google — then it may be possible for a dedicated attacker to recover your authentication cookies. The current attack is just on the edge of feasibility, and could probably be improved for specific applications.

This is bad and needs to change soon.

In the rest of this post I’ll cover the details in the usual ‘fun’ question-and-answer format I save for these kinds of attacks.

What is RC4 and why should I care?

RC4 is a fast stream cipher invented in 1987 by Ron Rivest. If you like details, you can see this old post of mine for a longwinded discussion of RC4’s history and flaws. Or you can take my word: RC4 is old and crummy.

Despite its age and crumminess, RC4 has become an extremely popular ciphersuite for SSL/TLS connections — to the point where it accounts for something absurd like half of all TLS traffic. There are essentially two reasons for this:

  1. RC4 does not require padding or IVs, which means it’s immune to recent TLS attacks like BEAST and Lucky13. Many admins have recommended it as the solution to these threats.
  2. RC4 is pretty fast. Faster encryption means less computation and therefore lower hardware requirements for big service providers like Google.

So what’s wrong with RC4?

Like all stream ciphers, RC4 takes a short (e.g., 128-bit) key and stretches it into a long string of pseudo-random bytes. These bytes are XORed with the message you want to encrypt, resulting in what should be a pretty opaque (and random-looking) ciphertext.

The problem with RC4 is that the above statement is not quite true. The bytes that come out of RC4 aren't quite random looking — they have small biases. A few of these biases have been known for years, but weren't considered useful. However, recent academic work has systematically uncovered many small but significant additional biases running throughout the first 256 bytes of RC4 output.
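You don't have to take the researchers' word for it, either. Here's a quick Python sketch of RC4 itself, plus a measurement of the best-known bias (due to Mantin and Shamir): the second keystream byte is 0 about twice as often as it should be.

import os

def rc4_keystream(key, n):
    # Key scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Keystream generation (PRGA)
    i = j = 0
    out = []
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return out

# Generate the second keystream byte under 50,000 random keys and count the zeros.
trials = 50_000
zeros = sum(1 for _ in range(trials) if rc4_keystream(os.urandom(16), 2)[1] == 0)
print(zeros / trials)   # roughly 2/256 ~ 0.0078, instead of the expected 1/256 ~ 0.0039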

At first blush this doesn’t seem like a big deal. Cryptographically, however, it’s a disaster. If I encrypt the same message (plaintext) with many different RC4 keys, then I should get a bunch of totally random looking ciphertexts. But these tiny biases mean that they won’t be random — a statistical analysis of different positions of the ciphertext will show that some values occur more often.

By getting many different encryptions of the same message — under different keys — I can tally up these small deviations from random and thus figure out what was encrypted.
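And here's a toy model of why those deviations matter. This isn't real RC4 — it's just an idealized keystream byte that equals zero a little too often, roughly the size of the bias above — but it shows how the plaintext falls out of a simple frequency count once you have enough ciphertexts.

import random
from collections import Counter

SECRET = 0x41   # the byte the attacker wants, e.g. one character of a session cookie

def biased_keystream_byte():
    # Toy model: the keystream byte is 0 with probability ~2/256, otherwise roughly uniform
    if random.random() < 1 / 256:
        return 0
    return random.randrange(256)

# The victim encrypts the same byte under a million independent 'keys'.
ciphertexts = [SECRET ^ biased_keystream_byte() for _ in range(1_000_000)]

# Because the keystream is 0 more often than anything else, the most frequent
# ciphertext value is (with high probability) the plaintext itself.
guess, count = Counter(ciphertexts).most_common(1)[0]
print(hex(guess))   # 0x41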

If you like analogies, think of it like peeking at a picture with your hands over your eyes. By peeking through your fingers you might only see a tiny sliver of the scene you're looking at. But by repeating the observation many times, you may be able to combine those slivers and figure out what you're looking at.

Why would someone encrypt the same message many times?

The problem here (as usual) is your browser.

You see, there are certain common elements that your browser tends to send at the beginning of every HTTP(S) connection. One of these values is a cookie — typically a fixed string that identifies you to a website. These cookies are what let you log into Gmail without typing your password every time.

If you use HTTPS (which is enforced on many sites by default), then your cookies should be safe. After all, they'll always be sent over an encrypted connection to the website.

Unfortunately, if your connection is encrypted using RC4 (as is the case with Gmail), then each time you make a fresh connection to the Gmail site, you're sending a new encrypted copy of the same cookie. If the session is renegotiated (i.e., uses a different key) between those connections, then the attacker can build up the list of ciphertexts he needs.

To make this happen quickly, an attacker can send you a piece of Javascript that your browser will run — possibly on a non-HTTPS tab. This Javascript can then send many HTTPS requests to Google, ensuring that an eavesdropper will quickly build up thousands (or millions) of requests to analyze.

Probability of recovering a plaintext byte (y axis) at a particular byte position in the RC4-encrypted stream (x axis), assuming 2^24 ciphertexts. Increasing the number of ciphertexts to 2^32 gives almost guaranteed plaintext recovery at all positions. (source)

Millions of ciphertexts? Pah. This is just crypto-wankery.

It’s true that the current attack requires an enormous number of TLS connections — in the tens to hundreds of millions — which means that it’s not likely to bother you. Today. On the other hand, this is the naive version of the attack, without any special optimizations.

For example, the results given by Bernstein assume no prior plaintext knowledge. But in practice we often know a lot about the structure of an interesting plaintext like a session cookie. For one thing, they’re typically Base64 encoded — reducing the number of possibilities for each byte — and they may have some recognizable structure which can be teased out using advanced techniques.

It’s not clear what impact these specific optimizations will have, but it tends to reinforce the old maxim: attacks only get better, they never get worse. And they can get a lot better while you’re arguing about them.

So what do we do about it?

Treat this as a wakeup call. Sites (like Google) need to stop using RC4 — before these attacks become practical. History tells us that nation-state attackers are already motivated to penetrate Gmail connections. The longer we stick with RC4, the more chances we’re giving them.

In the short term we have an ugly choice: stick with RC4 and hope for the best, or move back to CBC mode ciphersuites — which many sites have avoided due to the BEAST and Lucky13 attacks.

We could probably cover ourselves a bit by changing the operation of browsers, so as to detect and block Javascript that seems clearly designed to exercise these attacks. We could also put tighter limits on the duration and lifespan of session cookies. In theory we could also drop the first few hundred bytes of each RC4 connection — something that’s a bit of a hack, and difficult to do without breaking the TLS spec.

In the longer term, we need to move towards authenticated encryption modes like the new AEAD TLS ciphersuites, which should put an end to most of these problems. The problem is that we’ll need browsers to support these modes too, and that could take ages.

In summary

I once made fun of Threatpost for sounding alarmist about SSL/TLS. After all, it’s not like we really have an alternative. But the truth is that TLS (in its design, deployment and implementation) needs some active help. We’re seeing too many attacks, and they’re far too close to practical. Something needs to give.

We live in a world where NIST is happy to give us a new hash function every few years. Maybe it’s time we put this level of effort and funding into the protocols that use these primitives? They certainly seem to need it.

Attack of the week: TLS timing oracles

Ever since I started writing this blog (and specifically, the posts on SSL/TLS) I’ve had a new experience: people come up to me and share clever attacks that they haven’t made public yet.

This is pretty neat — like being invited to join an exclusive club. Unfortunately, being in this club mostly sucks. That’s because the first rule of ‘TLS vulnerability club’ is: You don’t talk about TLS vulnerability club. Which takes all the fun out of it.

(Note that this is all for boring reasons — stuff like responsible disclosure, publication and fact checking. Nobody is planning a revolution.)

Anyway, it’s a huge relief that I’m finally free to tell you about a neat new TLS attack I learned about recently. The new result comes from Nadhem AlFardan and Kenny Paterson of Royal Holloway. Dubbed ‘Lucky 13’, it takes advantage of a very subtle bug in the way records are encrypted in the TLS protocol.

If you aren’t into long crypto posts, here’s the TL;DR:

There is a subtle timing bug in the way that TLS data decryption works when using the (standard) CBC mode ciphersuite. Given the right set of circumstances, an attacker can use this to completely decrypt sensitive information, such as passwords and cookies. 

The attack is borderline practical if you’re using the Datagram version of TLS (DTLS). It’s more on the theoretical side if you’re using standard TLS. However, with some clever engineering, that could change in the future. You should probably patch!

For the details, read on. As always, we’ll do this in the ‘fun’ question/answer format I save for these kinds of posts.

What is TLS, what is CBC mode, and why should I care if it’s broken?

Some background: Transport Layer Security (née SSL) is the most important security protocol on the Internet. If you find yourself making a secure connection to another computer, there's a very good chance you'll be doing it with TLS. (Unless you're using a UDP-based protocol, in which case you might use TLS's younger cousin Datagram TLS [DTLS]).

The problem with TLS is that it kind of stinks. Mostly this is due to bad decisions made back in the mid-1990s when SSL was first designed. Have you seen the way people dressed back then? Protocol design was worse.

While TLS has gotten better since then, it still retains many of the worst ideas from the era. One example is the CBC-mode ciphersuite, which I’ve written about several times before on this blog. CBC-mode uses a block cipher (typically AES) to encrypt data. It’s the most common ciphersuite in use today, probably because it’s the only mandatory ciphersuite given in the spec.

What’s wrong with CBC mode?

The real problem with TLS is not the encryption itself, but rather the Message Authentication Code (MAC) that’s used to protect the integrity (authenticity) of each data record.

Our modern understanding is that you should always encrypt a message first, then apply the MAC to the resulting ciphertext. But TLS gets this backwards. Upon encrypting a record, the sender first applies a MAC to the plaintext, then adds up to 255 bytes of padding to get the message up to a multiple of the cipher (e.g., AES’s) block size. Only then does it CBC-encrypt the record.

Structure of a TLS record. The whole thing is encrypted with CBC mode.
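In (rough) code, the construction looks something like the sketch below. I'm using Python's standard hmac/hashlib modules plus pycryptodome for AES, and the 'header' argument stands in for the sequence number, type, version and length fields that TLS folds into the MAC. The point to notice is that the padding is added after the MAC is computed.

import hmac, hashlib, os
from Crypto.Cipher import AES   # pycryptodome

def tls_cbc_encrypt(mac_key, enc_key, header, plaintext):
    # 1. MAC the header and *plaintext* (not the ciphertext)
    mac = hmac.new(mac_key, header + plaintext, hashlib.sha256).digest()
    # 2. Pad up to the AES block size: pad_len bytes, each equal to pad_len,
    #    plus one final length byte (also pad_len)
    pad_len = (-(len(plaintext) + len(mac) + 1)) % AES.block_size
    padded = plaintext + mac + bytes([pad_len] * (pad_len + 1))
    # 3. Only now encrypt the whole thing with CBC -- the MAC never covers the padding
    iv = os.urandom(16)
    return iv + AES.new(enc_key, AES.MODE_CBC, iv).encrypt(padded)

record = tls_cbc_encrypt(os.urandom(32), os.urandom(16), b"\x17\x03\x03", b"GET / HTTP/1.1")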

The critical point is that the padding is not protected by the MAC. This means an attacker can tamper with it (by flipping specific bits in the ciphertext), leading to a very nasty kind of problem known as a padding oracle attack.

In these attacks (example here), an attacker first captures an encrypted record sent by an honest party, modifies it, then re-transmits it to the server for decryption. If the attacker can learn whether her changes affected the padding — e.g., by receiving a padding error as opposed to a bad MAC error — she can use this information to adaptively decrypt the whole record. The structure of TLS’s encryption padding makes it friendly to these attacks.

Closeup of a padded TLS record. Each padding byte contains the padding length, and the record ends with one final (pointless, redundant) length byte.

But padding oracle attacks are well known, and (D)TLS has countermeasures!

The TLS designers learned about padding oracles way back in 2002, and immediately took steps to rectify them. Unfortunately, instead of fixing the problem, they decided to apply band-aids. This is a time-honored tradition in TLS design.

The first band-aid was simple: eliminate any error messages that could indicate to the attacker whether the padding check (vs. the MAC check) is what caused a decryption failure.

This seemed to fix things for a while, until some researchers figured out that you could simply time the server to see how long decryption takes, and thereby learn if the padding check failed. This is because implementations of the time would first check the padding, then return immediately (without checking the MAC) if the padding was bad. That resulted in a noticeable timing differential the attacker could detect.
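To see why the timing leaked, here's roughly what such an implementation looked like (a simplified Python sketch, not any particular library's code). The early return on a bad pad skips the expensive MAC computation entirely, and that's the difference an attacker can time.

import hmac, hashlib

def vulnerable_decrypt_check(padded_plaintext, mac_key, header, mac_len=32):
    pad_len = padded_plaintext[-1]
    if pad_len + 1 + mac_len > len(padded_plaintext):
        return False                              # bad padding: bail out immediately
    padding = padded_plaintext[-(pad_len + 1):-1]
    if any(b != pad_len for b in padding):
        return False                              # bad padding: bail out *before* the MAC
    body = padded_plaintext[:-(pad_len + 1)]
    data, mac = body[:-mac_len], body[-mac_len:]
    expected = hmac.new(mac_key, header + data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)     # only reached when the padding looks fine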

Thus a second band-aid was needed. The TLS designers decreed that decryption should always take the same amount of time, regardless of how the padding check comes out. Let’s roll the TLS 1.2 spec:

[T]he best way to do this is to compute the MAC even if the padding is incorrect, and only then reject the packet. For instance, if the pad appears to be incorrect, the implementation might assume a zero-length pad and then compute the MAC.

Yuck. Does this even work?

Unfortunately, not quite. When the padding check fails, the decryptor doesn’t know how much padding to strip off. That means they don’t know how long the actual message is, and therefore how much data to MAC. The recommended countermeasure (above) is to assume no padding, then MAC the whole blob. As a result, the MAC computation can take a tiny bit longer when the padding is damaged.

The TLS designers realized this, but by this point they were exhausted and wanted to go think about something else. So they left us with the following note:

This leaves a small timing channel, since MAC performance depends to some extent on the size of the data fragment, but it is not believed to be large enough to be exploitable, due to the large block size of existing MACs and the small size of the timing signal.

And for the last several years — at least, as far as we know — they were basically correct.

How does this new paper change things?

The new AlFardan and Paterson result shows that it is indeed possible to distinguish the tiny timing differential caused by invalid padding, at least from a relatively close distance — e.g., over a LAN. This is partly due to advances in computing hardware: most new computers now ship with an easily accessible CPU cycle counter. But it’s also thanks to some clever statistical techniques that use many samples to smooth out and overcome the jitter and noise of a network connection.

The upshot is that the new technique can measure timing differentials of less than 1 microsecond over a LAN connection — for example, if the attacker is in the same data center as your servers. It does this by making several thousand decryption queries and processing the results. Under the right circumstances, this turns out to be enough to bring (D)TLS padding oracle attacks back to life.

How does the attack work?

For the details, you should obviously read the full paper or at least the nice FAQ that Royal Holloway has put out. Here I’ll try to give some intuition.

Before I can explain the attack, you need to know a little bit about how hash-based MACs work. TLS typically uses HMAC with either MD5, SHA1 or SHA256 as the hash function. While these are very different hash functions, the critical point is that each one processes messages in 64-byte blocks.

Consequently, hashing time is a function of the number of blocks in the message, not the number of bytes. Going from a 64-byte input to a 65-byte input means an entire extra block, and hence a (relatively) large jump in the amount of computation time (an extra iteration of the hash function’s compression function).

There are a few subtleties in here. The hash functions incorporate an 8-byte length field plus some special hash function padding, which actually means a one-block message can only contain about 55 bytes of real data (which also includes the 13-byte record header). The HMAC construction adds a (constant) amount of additional work, but we don’t need to think about that here.

So in summary: you can get 55 bytes of data into one block of the hash. Go a single byte beyond that, and the hash function will have to run a whole extra round, causing a tiny (500-1000 hardware cycle) delay.
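You can see the boundary with a couple of lines of arithmetic. The sketch below is my own back-of-the-envelope accounting, not the paper's exact model: it counts the data-dependent compression calls in HMAC's inner hash, where the 'data' is the 13-byte record header plus the plaintext being MACed.

def sha_blocks(msg_len):
    # SHA-1/SHA-256 work in 64-byte blocks; padding needs at least 1 byte (0x80)
    # plus an 8-byte length field, so one block holds at most 55 bytes of data.
    return (msg_len + 1 + 8 + 63) // 64

def hmac_inner_blocks(data_len):
    # HMAC's inner hash prepends one 64-byte key block before the data.
    return 1 + sha_blocks(data_len)

for data_len in (54, 55, 56, 57):
    print(data_len, hmac_inner_blocks(data_len))   # jumps from 2 to 3 blocks at 56 bytes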

The attack here is to take a message that — including the TLS padding — would fall above that 55 byte boundary. However, the same message with padding properly removed would fall below it. When an attacker tampers with the message (damaging the padding), the decryption process will MAC the longer version of the message — resulting in a measurably higher computation time than when the padding checks out.

By repeating this process many, many thousand (or millions!) of times to eliminate noise and network jitter, it’s possible to get a clear measurement of whether the decryption succeeded or not. Once you get that, it’s just a matter of executing a standard padding oracle attack.

But there’s no way this will work on TLS! It’ll kill the session!

Please recall that I described this as a practical attack on Datagram TLS (DTLS) — and as a more theoretical one on TLS itself.* There’s a reason for this.

The reason is that TLS (and not DTLS) includes one more countermeasure I haven’t mentioned yet: anytime a record fails to decrypt (due to a bad MAC or padding error), the TLS server kills the session. DTLS does not do this, which makes this attack borderline practical. (Though it still takes millions of packet queries to execute.)

The standard TLS ‘session kill’ feature would appear to stop padding oracle attacks, since they require the attacker to make many, many decryption attempts. Killing the session limits the attacker to one decryption — and intuitively that would seem to be the end of it.

But actually, this turns out not to be true.

You see, one of the neat things about padding oracle attacks is that they can work across different sessions (keys), provided that (a) your victim is willing to re-initiate the session after it drops, and (b) the secret plaintext appears in the same position in each stream. Fortunately the design of browsers and HTTPS lets us satisfy both of these requirements.

  1. To make a target browser initiate many connections, you can feed it some custom Javascript that causes it to repeatedly connect to an SSL server (as in the CRIME attack). Note that the Javascript doesn't need to come from the target webserver — it can even be served on an unrelated non-HTTPS page, possibly running in a different tab. So in short: this is pretty feasible.
  2. Moreover, thanks to the design of the HTTP(S) protocol, each of these connections will include cookies at a known location in the HTTP stream. While you may not be able to decrypt the rest of the stream, these cookie values are generally all you need to break into somebody's account.

Thus the only practical limitation on such a cookie attack is the time it takes for the server to re-initiate all of these connections. TLS handshakes aren’t fast, and this attack can take tens of thousands (or millions!) of connections per byte. So in practice the TLS attack would probably take days. In other words: don’t panic.

On the other hand, don’t get complacent either. The authors propose some clever optimizations that could take the TLS attack into the realm of the feasible (for TLS) in the near future.

How is it being fixed?

With more band-aids of course!

But at least this time, they're excellent band-aids. Adam Langley has written a 500-line OpenSSL patch (!) that modifies the CBC-mode decryption procedure to wipe out the timing differentials used by this attack. I would recommend that you think about updating at least your servers in the future (though we all know you won't). Microsoft products are allegedly not vulnerable to this attack, so won't need updates.**

Still, this is sort of like fixing your fruitfly problem by spraying your kitchen with DDT. Why not just throw away the rotted fruit? In practice, that means moving towards modern AEAD ciphersuites like AES-GCM, which should generally end this madness. We hope.

Why not switch to RC4?

RC4 is not an option in DTLS. However, it will mitigate this issue for TLS, since the RC4 ciphersuite doesn’t use padding at all. In fact, this ancient ciphersuite has been (hilariously) enjoying a resurgence in recent years as the ‘solution’ to TLS attacks like BEAST. Some will see this attack as further justification for the move.

But please don’t do this. RC4 is old and creaky, and we really should be moving away from it too.

So what’s next for TLS?

I’d love to say more, but you see, the first rule of TLS vulnerability club is…

Notes:

* The attack on Datagram TLS is more subtle, and a lot more interesting. I haven’t covered it much in this post because TLS is much more widely used than DTLS. But briefly, it’s an extension of some previous techniques — by the same authors — that I covered in this blog last year. The gist is that an attacker can amplify the impact of the timing differential by ‘clogging’ the server with lots of unrelated traffic. That makes these tiny differentials much easier to detect.

** And if you believe that, I have a lovely old house in Baltimore to sell you…

Attack of the week: Cross-VM side-channel attacks

It’s been a busy week for crypto flaws. So busy in fact, that I’m totally overwhelmed and paralyzed trying to write about them. It’s made blogging almost impossible.

So let’s just blast through it, then we can get to the subject I actually want to talk about.

First, at the start of the week we received news that many Android applications and a few (critical!) Java libraries simply don’t validate certificates on TLS/SSL connections. This is disastrous, embarrassing and stupid, since lack of certificate verification makes TLS trivially vulnerable to man-in-the-middle attacks. I was going to say something shrill and over-the-top about all of this, but turns out that Threatpost has already totally beaten me there with this synopsis:

The death knell for SSL is getting louder.

Yes indeed, Threatpost. Thank you for making me feel better.

In other news, we learned that major email providers — who should remain nameless, but are actually Microsoft, Google, Apple, Yahoo and everyone else — have been competing to see who can deploy the shortest RSA key for DomainKeys Identified Mail (DKIM). I’m told that Yahoo was ahead with 384 bits, but Microsoft had a research team working on a 22-bit key and Apple had abandoned keys altogether, opting for a simple note from Tim Cook asserting that any invalid email messages were probably the recipient’s fault. (Ba-dum-tschch.)

So that’s the headline news, and I don’t want to write about any of it.

What I do want to write about is a result that's gotten a lot less attention — mostly because it's subtle, falls into the category of 'things we thought we knew about, but didn't explore', and because it involves Hidden Markov models — which are to tech reporters as raw garlic and silver are to vampires.

This new result is by Zhang, Juels, Reiter and Ristenpart. It appeared at the ACM CCS conference just a few weeks ago, and it deals with something very relevant to the way we build modern systems. Specifically, if we're going to stick all of our cryptographic services in cloud-based VMs, how secure can we possibly expect them to be?

The answer is: unfortunately, not very. To get into the details I’m going to use the standard ‘fun’ question/answer format I usually save for these kinds of attacks.

Why would I put my cryptography in a VM anyway?

In case you missed it, the cloud is our future. The days of running our own cranky hardware and dealing with problems like power-supply faults are long gone. If you’re deploying a new service, there’s a good chance it’ll be at least partly cloud-based. There’s a decent chance it will be entirely cloud-based.

Take Instagram, for instance. Their entire $1bn service runs on a few hundred cleverly-managed EC2 instances. While Instagram itself isn’t exactly a security product, they do use TLS and SSH (presumably on the instances themselves), and this implies public key crypto and the use of secret keys.

Now this shouldn’t be a problem, but VM instances often share physical hardware with other instances, and since EC2 is a public service, those co-resident VMs may not be entirely friendly. The major threat here is, of course, software vulnerabilities — things that can let an attacker break out of one VM and into another. But even if you perfect the software, there’s another more insidious threat: namely, that the attacker VM instance could be able to run a side-channel attack on the co-resident VM.

This threat has long been discussed, and security people generally agree that it’s a concern. But actually implementing such an attack has proven surprisingly difficult. This is because real hypervisors put a lot of fluff between the attacking process and the bare metal of the server. Different VMs often run on different cores. Moreover, since each VM has an entire OS running inside of it, there’s tons of noise.

In fact, there’s been plenty of reason to wonder if these attacks are simply the product of security researchers’ fevered imaginations, or if something we need to worry about. What Zhang, Juels, Reiter and Ristenpart tell us is: yes, we should worry. And oh boy, do they do it in a nifty way.

So what is it and how does it work?

The new result focuses specifically on the Xen Hypervisor, which is the one actually used by services like Amazon EC2. Although the attack was not implemented in EC2 itself, it focuses on similar hardware: multi-core servers with SMT turned off. The threat model assumes that the attacker and victim VM are co-resident on the machine, and that the victim is decrypting an Elgamal ciphertext using libgcrypt v.1.5.0.

Now, Elgamal encryption is a great example for side-channel attacks, since it’s implemented by taking a portion of the ciphertext, which we’ll call x, and computing x^e mod N, where e is the secret key and N is (typically) a prime number. This exponentiation is implemented via the ‘square and multiply‘ algorithm, shown in the figure below:

Square and multiply algorithm (source: Zhang et al.)

The first thing you notice about square-and-multiply is that its operation depends fundamentally on the bits of the secret key. If the ith bit of e is 1, the steps labeled (M) and (R) are conducted. If that bit is 0, they aren't. The bits of the key thus produce a distinctive sequence of computations that can be detected if the attacking VM is able to precisely monitor the hardware state.
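If you prefer code to figures, here's a minimal Python rendering of the left-to-right square-and-multiply loop (my own sketch, not libgcrypt's actual code). The comments mark the square, multiply and reduce steps corresponding to the labels in the figure.

def square_and_multiply(x, e, N):
    y = 1
    for i in reversed(range(e.bit_length())):
        y = (y * y) % N            # (S)quare and (R)educe: happens for every key bit
        if (e >> i) & 1:
            y = (y * x) % N        # (M)ultiply and (R)educe: happens only when the bit is 1
    return y

assert square_and_multiply(7, 0b101101, 1009) == pow(7, 0b101101, 1009)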

Now, side-channel attacks on square-and-multiply are not new. They date back at least to Paul Kocher’s observations in the mid-to-late 90s using power and operating time as a channel, and they’ve been repeatedly optimized as technology advances. More recent attacks have exploited cache misses in a shared processor cache (typical in hyper-threading environments) as a means by which a single process can monitor the execution of another one.

However, while these attacks have worked from one process to another, they've never been applied to the full Xen VM setting. This is a pretty challenging problem for a variety of reasons, including:

  • The difficulty of getting the attacking process to run frequently enough to take precise measurements.
  • The problem that VCPUs can be assigned to different cores, or irrelevant VCPUs can be assigned to the same core.
  • Noisy measurements that give only probabilistic answers about which operations occurred on the  target process.
And so on and so forth. The challenge in this paper is to overcome all that nonsense and still recover useful information from the attacked VM.

So what’s the basic idea?

At a fundamental level, the attack in this paper is quite similar to previous attacks that worked only across processes. The attacking VM first 'primes' the L1 instruction cache by allocating contiguous memory pages, then executing a series of instructions designed to load the cache with cache-line-sized blocks it controls.

The attacker then gives up execution and hopes that the target VM will run next on the same core — and moreover, that the target is in the process of running the square-and-multiply operation. If it is, the target will cause a few cache-line-sized blocks of the attacker's instructions to be evicted from the cache. Which blocks are evicted is highly dependent on the operations that the target conducts.

To see what happened, the attacking VM must recover control as quickly as possible. It then ‘probes‘ to see which blocks have been evicted from the cache set. (This is done by executing the same instructions and timing the results. If a given block has been evicted from the cache, execution will result in a cache miss and a measurable delay.) By compiling a list of which blocks were missing, the attacker gains insight into which instructions may have been executed while the target VM was running.

A big challenge for the attacker is the need to regain control quickly. Wait too long and all kinds of other activity will churn through the cache — by the time the attacker looks, its state won't give any useful information.

Normally Xen doesn’t allow VCPUs to rapidly regain control, but there are a few exceptions: Xen gives high priority to Virtual CPUs (VCPUs) that receive an interrupt. The authors were able to exploit this by running a 2-VCPU VM, where the second VCPU’s only job is to issue Inter-Processor Interrupts (IPIs) in an effort to get the first VCPU back in control as quickly as possible. Using this approach they were able to get back in the saddle within about 16 microseconds — an eternity in processing time, but enough to give useful information.

But isn’t that data noisy as hell? And fragmented?

Yes. The challenge here is that the attacking VM has no control over where in the computation it will jump in. It could get just a small fragment of the square-and-multiply operation (which is hundreds or thousands of operations long), it could jump into the OS kernel, it could even get the wrong VM, thanks to the fact that they can run on any core. Plus the data could be pretty noisy.

The solution to these problems is what’s so nifty about this paper. First, they don’t just monitor one execution — they assume that the device is constantly decrypting different ciphertexts, all with the same key. This is a pretty reasonable assumption for something like an SSL web server.

Next, they use machine learning techniques to identify which of the many possible instruction sequences are associated with particular cache measurements. This requires the researchers to train the algorithm beforehand on the target hardware, having the target VCPU conduct a series of square, multiply and modular-reduce calls in order to build a training model. During the attack, the data was further processed using a Hidden Markov Model to eliminate errors and bogus measurements from non-cryptographic processes.

Even after all this work, the attacker winds up with thousands of fragments, some of which contain errors or low-confidence results. These can be compared against each other to reduce errors, then stitched together to recover the key itself. Fortunately this is a problem that’s been solved in many other domains (most famously: DNA sequencing), and the techniques used here are quite similar.

A good way to illustrate the process is to present a totally made-up example, in which six fragments are reconstructed to form a single spanning sequence:

S1: SRSRMRSRMRSRSRSMR
S2:        MRSRSRSRMR**SRMRSR

S3:      SRMRSRSR                    
S4:                      MRSRSRSR**SRMRSR
S5:                                  MR*RSRMRSRMRSR
S6:                MRSRSRMRSRSRSRMR
    ————————————————
    SRSRMRSRMRSRSRSMRSRSRMRSRSRSRMRSRMRSRSRMRSRMRSR

This is obviously a huge simplification of a very neat (and complex) process that’s very well described in the paper. And if all this technical stuff is too much for you: it’s basically like the scene in Argo where the little kids reconstructed the shredded photos of the embassy workers. Just without the kids. Or the photos.
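If you'd like to play with the stitching step, here's a toy greedy assembler that merges fragments by their largest overlap. I'm deliberately using an English sentence instead of a real S/M/R trace: genuine square-and-multiply traces are so repetitive that naive greedy assembly falls apart, which is exactly why the paper needs many fragments plus error correction.

def overlap(a, b):
    # length of the longest suffix of a that is also a prefix of b
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(frags):
    frags = list(frags)
    while len(frags) > 1:
        # merge the pair of fragments with the largest overlap
        k, i, j = max((overlap(frags[a], frags[b]), a, b)
                      for a in range(len(frags)) for b in range(len(frags)) if a != b)
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags[0]

text = "the quick brown fox jumps over the lazy dog"
shreds = [text[s:s + 15] for s in (0, 9, 20, 29)]       # overlapping fragments
print(greedy_assemble(shreds) == text)                   # True, in this friendly toy case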

So does it actually work?

It would be a hell of a bad paper if it didn’t.

With everything in place, the researchers applied their attack against a 4096-bit Elgamal public key, which (due to an optimization in libgcrypt) actually has a 457-bit private key e. After several hours of data collection, they were able to obtain about 1000 key-related fragments, of which 330 turned out to be long enough to be useful for key reconstruction. These allowed the attackers to reconstruct the full key with only a few missing bits, and those they were able to guess using brute force.

And that, as they say, is the ballgame.

Left: Fragment size vs. number of fragments recovered, Right: sequence accuracy as a function of fragments in a batch. (source: Zhang et al.)

Oh my god, we’re all going to die.

I would note that this isn’t actually a question. But before you start freaking out and pulling down your cloud VMs, a few points of order.

First: there’s a reason these researchers did this with libgcrypt and Elgamal, and not, say OpenSSL and RSA (which would be a whole lot more useful). That’s because libgcrypt’s Elgamal implementation is the cryptographic equivalent of a 1984 Stanley lawnmower engine — it uses textbook square-and-multiply with no ugly optimizations to get in the way. OpenSSL RSA decryption, on the other hand, is more like a 2012 Audi turbo-diesel: it uses windowing and CRT and blinding and two different types of multiplication, all of which make it a real pain in the butt to deal with.

Secondly, this attack requires a perfect set of conditions. As proposed it only works with two VMs, and requires specific training on the target hardware. This doesn't mean that the attack isn't viable (especially since cloud services probably do use lots of identical hardware), but it does mean that messiness — the kind you get in real cloud deployments — is going to be more of an obstacle than it seems.

One last thing worth mentioning is that before you can attack a VM, you have to get your attack VM onto the same physical hardware with your target. This seems like a pretty big challenge. Unfortunately, some slightly older research indicates that this is actually very feasible in existing cloud deployments. In fact, for only a few dollars, researchers were able to co-locate themselves with a given target VM with about 40% probability.

In the short term, you certainly shouldn’t panic about this, especially given how elaborate the attack is. But it does indicate that we should be thinking very hard about side-channel attacks, and considering how we can harden our systems and VMMs to be sure they aren’t going to happen to us.

Reposted: A cryptanalysis of HDCP v2.1

Update 8/27: This post was originally published three weeks ago under a different title. I subsequently took it down to give affected vendors time to patch the bugs. As a result of the notification, Digital Content Protection LLC (DCP) has updated the spec to v2.2. 

Contrary to my understanding when I wrote the original post, HDCP v2 actually is used by a number of devices. I would like to give credit to Alon Ziv at Discretix, who had previously discovered the Locality Check issue, and to Martin Kaiser, who experimentally verified the master secret issue on a European Samsung TV and a Galaxy S II.

Finally, I would like to thank Hanni Fakhoury and Marcia Hofmann at the Electronic Frontier Foundation for all of their helpful advice. The EFF is one of the only organizations that represents security researchers. Please consider donating so they can keep doing it!

Over the past couple of weeks I’ve mostly been blogging about inconsequential things. Blame summer for this — it’s hard to be serious when it’s 104 degrees out. But also, the world just hasn’t been supplying much in the way of interesting stuff to write about.

Don’t get me wrong, this is a good thing! But in a (very limited) way it’s also too bad. One of the best ways to learn about security systems is to take them apart and see how they fail. While individual systems can be patched, the knowledge we collect from the process is invaluable.

Fortunately for us, we’re not completely helpless. If we want to learn something about system analysis, there are plenty of opportunities right out there in the wild. The best place to start is by finding a public protocol that’s been published, but not implemented yet. Download the spec and start poking!

This will be our task today. The system we’ll be looking at is completely public, and (to the best of my knowledge) has not yet been deployed anywhere (Update: see note above). It’s good practice for protocol cryptanalysis because it includes all kinds of complicated crypto that hasn’t been seriously reviewed by anyone yet.

(Or at least, my Google searches aren’t turning anything up. I’m very willing to be corrected.)

Best of all, I’ve never looked at this system before. So whatever we find (or don’t find), we’ll be doing it together.

A note: this obviously isn’t going to be a short post. And the TL;DR is that there is no TL;DR. This post isn’t about finding bugs (although we certainly will), it’s about learning how the process works. And that’s something you do for its own sake.

HDCPv2

The protocol we’ll be looking at today is the High Bandwidth Digital Content Protection (HDCP) protocol version 2. Before you get excited, let me sort out a bit of confusion. We are not going to talk about HDCP version 1, which is the famous protocol you probably have running in your TV right now.

HDCPv1 was analyzed way back in 2001 and found to be wanting. Things got much worse in 2010 when someone leaked the HDCPv1 master key — effectively killing the whole system.

What we’ll be looking at today is the replacement: HDCP v2. This protocol is everything that its predecessor was not. For one thing, it uses standard encryption: RSA, AES and HMAC-SHA256. It employs a certificate model with a revocation list. It also adds exciting features like ‘localization’, which allows an HDCP transmitter to determine how far away a receiver is, and stop people from piping HDCP content over the Internet. (In case they actually wanted to do that.)

HDCPv2 has barely hit shelves yet (Update: though it was recently selected as the transport security for MiraCast). The Digital Content Protection licensing authority has been keeping a pretty up-to-date set of draft protocol specifications on their site. The latest version at the time of this writing is 2.1, and it gives us a nice opportunity to see how industry ‘does’ protocols.

An overview of the protocol

As cryptographic protocols go, HDCPv2 has a pretty simple set of requirements. It’s designed to protect  high-value content running over a wire (or wireless channel) between a transmitter (e.g., a DVD player) and a receiver (a TV). The protocol accomplishes the following operations:

  1. Exchanging and verifying public key certificates.
  2. Establishing shared symmetric keys between the transmitter and receiver.
  3. Caching shared keys for use in later sessions.
  4. Verifying that a receiver is local, i.e., you’re not trying to proxy the data to some remote party via the Internet.

These functions are accomplished via three (mostly) separate protocols: a public-key Authenticated Key Agreement (AKE) protocol, a pairing protocol, where the derived key is cached for later use, and a locality check protocol to ensure that the devices are physically close.

I’m going to take these protocols one at a time, since each one involves its own messages and assumptions.

Phase (1): Authenticated Key Agreement (AKE)

The core of HDCPv2 is a custom key exchange protocol, which looks quite a bit like TLS. (In fact, the resemblance is so strong that you wonder why the designers didn’t just use TLS and save a lot of effort.) It looks like this:

 

HDCPv2 key agreement protocol (source). Click the image to enlarge.

Now, there’s lots going on here. But if we only look at the crypto, the summary is this:

The transmitter starts by sending ‘AKE_Init’ along with a random 64-bit nonce R_tx. In response, the receiver sends back its certificate, which contains its RSA public key and device serial number, all signed by the HDCP licensing authority.

If the certificate checks out (and is not revoked), the transmitter generates a random 128-bit 'master secret' K_m and encrypts it under the receiver's public key. The result goes back to the receiver, which decrypts it. Now both sides share K_m and R_tx, and can combine them using a wacky custom key derivation function. The result is a shared session key K_d.

The last step is to verify that both sides got the same K_d. The receiver computes a value H’, using HMAC-SHA256 on inputs K_d, R_tx and some other stuff. If the receiver’s H’ matches a similar value computed at the transmitter, the protocol succeeds.

Simple, right?

Note that I’ve ignored one last message in the protocol, which turns out to be very important. Before we go there, let’s pause and take stock.

If you’re paying close attention, you’ve noticed a couple of worrying things:

  1. The transmitter doesn’t authenticate itself at all. This means anyone can pretend to be a transmitter.
  2. None of the handshake messages (e.g., AKE_Transmitter_Info) appear to be authenticated. An attacker can modify them as they transit the wire.
  3. The session key K_d is based solely on the inputs supplied by the transmitter. The receiver does generate a nonce R_rx, but it isn’t used in the above protocol.
None of these things by themselves are a problem, but they make me suspicious.

Phase (2): Pairing

Public-key operations are expensive. And you only really need to do them once. The designers recognized this, and added a feature called ‘pairing’ to cache the derived K_m for use in later sessions. This is quite a bit like what TLS does for session resumption.

However, there’s one catch, and it’s where things get complicated: some receivers don’t have a secure non-volatile storage area for caching keys. This didn’t phase the designers, who came up with a ‘clever’ workaround for the problem: the receiver can simply ask the transmitter to store K_m for it.

To do this, the receiver encrypts K_m under a fixed internal AES key K_h (which is derived by hashing the receiver’s RSA private key). In the last message of the AKE protocol the receiver now sends this ciphertext back to the transmitter for storage. This appears in the protocol diagram as the ciphertext E(K_h, K_m).

The obvious intuition here is that K_m is securely encrypted. What could possibly go wrong? The answer is to ask how K_m is encrypted. And that’s where things get worrying.

According to the spec, K_m is encrypted using AES in what amounts to CTR mode, where the ‘counter’ value is defined as some value m. On closer inspection, m turns out to be just the transmitter nonce R_tx padded with 0 bits. So that’s simple. Here’s what it looks like:

Encryption of the master key K_m with the receiver key K_h. The value m is equal to (R_tx || 0x0000000000000000).

Now, CTR is a perfectly lovely encryption mode provided that you obey one unbreakable rule: the counter value must never be re-used. Is that satisfied here? Recall that the counter m is actually chosen by another party — the transmitter. This is worrying. If the transmitter wants, it could certainly ask the receiver to encrypt anything it wants under the same counter.

Of course, an honest transmitter won’t do this. But what about a dishonest transmitter? Remember that the transmitter is not authenticated by HDCP. The upshot is that an attacker can pretend to be a transmitter, and submit her own K_m values to be encrypted under K_h and m.

Even this might be survivable, if it weren’t for one last fact: in CTR mode, encryption and decryption are the same operation.

All of this leads to the following attack: 

  1. Observe a legitimate communication between a transmitter and receiver. Capture the values R_tx and E(K_h, K_m) as they go over the wire.
  2. Now: pretend to be a transmitter and initiate your own session with the receiver.
  3. Replay the captured R_tx as your initial transmitter nonce. When you reach the point where you pick the master secret, don't use a random value for K_m. Instead, set K_m equal to the ciphertext E(K_h, K_m) that you captured earlier. Recall that this ciphertext has the form: AES(K_h, R_Tx || 000…) ⊕ K_m
     
    Now encrypt this value under the receiver’s public key and send it along.
  4. Sooner or later the receiver will encrypt the ‘master secret’ you chose above under its internal key K_h. The resulting ciphertext can be expanded to:  
    AES(K_h, R_Tx || 000…) ⊕ AES(K_h, R_Tx || 000…) 
    ⊕ K_m
Thanks to the beauty of XOR, the first two terms of this ciphertext simply cancel out. The result is the original K_m from the first session! Yikes!
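If you want to convince yourself that the cancellation really happens, here's a toy simulation in Python (AES via pycryptodome). It models only the CTR-style wrapping of K_m described above, glosses over the RSA step, and uses my own variable names rather than the spec's.

import os
from Crypto.Cipher import AES   # pycryptodome

def pairing_encrypt(K_h, R_tx, K_m):
    # The receiver 'encrypts' K_m as AES(K_h, R_tx || 0...0) XOR K_m,
    # with the counter block determined entirely by the *transmitter's* nonce.
    pad = AES.new(K_h, AES.MODE_ECB).encrypt(R_tx + bytes(8))
    return bytes(a ^ b for a, b in zip(pad, K_m))

# Honest session: the attacker eavesdrops R_tx and E(K_h, K_m) off the wire.
K_h  = os.urandom(16)    # receiver's internal key -- never leaves the device
R_tx = os.urandom(8)
K_m  = os.urandom(16)    # the real master secret
E_Km = pairing_encrypt(K_h, R_tx, K_m)

# Attack: pose as a transmitter, replay R_tx, and submit the captured ciphertext
# as if it were a fresh 'master secret'. The receiver dutifully wraps it again...
leaked = pairing_encrypt(K_h, R_tx, E_Km)
print(leaked == K_m)     # True: the two identical AES pads cancel, leaving K_m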

This is a huge problem for two reasons. First, K_m is used to derive the session keys used to encrypt HDCP content, which means that you may now be able to decrypt any past HDCP content traces. And even worse, thanks to the ‘pairing’ process, you may be able to use this captured K_m to initiate or respond to further sessions involving this transmitter.
 

Did I mention that protocols are hard?

Phase (3): The Locality Check

For all practical purposes, the attack above should be our stopping point. Once you have the stored K_m you can derive the session keys and basically do whatever you want. But just for fun, let’s go on and see what else we can find.

At its heart, the locality check is a pretty simple thing. Let’s assume the transmitter and receiver are both trusted, and have successfully established a session key K_d by running the AKE protocol above. The locality check is designed to ensure that the receiver is nearby — specifically, that it can provide a cryptographic response to a challenge, and can do it in < 7 milliseconds. This is a short enough time that it should prevent people from piping HDCP over a WAN connection.

(Why anyone would want to do this is a mystery to me.)

 
In principle the locality check should be simple. In practice, it turns out to be pretty complicated. Here’s the ‘standard’ protocol:
Simple version of the locality check. K_d is a shared key and R_rx is a receiver nonce.
Now this isn’t too bad: in fact, it’s about the simplest challenge-response protocol you can imagine. The transmitter generates a random nonce R_n and sends it to the receiver. The receiver has 7 milliseconds to kick back a response L’, which is computed as HMAC-SHA256 of {the session key K_d, challenge nonce R_n, and a ‘receiver nonce’ R_rx}. You may recall that the receiver nonce was chosen during the AKE.
 
So far this looks pretty hard to beat.
 

But here’s a wrinkle: some devices are slow. Consider that the 7 milliseconds must the round-trip communication time, as well as the time required to compute the HMAC. There is a very real possibility that some slower, embedded devices might be not be able to respond in time.

Will HDCP provide a second, optional protocol to deal with those devices? You bet it will.

The second protocol allows the receiver to pre-compute the HMAC response before the timer starts ticking. Here’s what it looks like:
 
‘Precomputed’ version of the protocol.

This is nearly the same protocol, with a few small differences. Notably, the transmitter gives the receiver all the time it wants to compute the HMAC. The timer doesn’t start until the receiver says it’s ready.

Of course, there has to be something keeping the RTT under 7ms. In this case the idea is to keep the receiver from speaking until it's received some authenticator from the transmitter. This consists of the least significant 128 bits of the expected HMAC result (L'), which is computed in the same way as above. The receiver won't speak until it sees those bits. Then it'll kick back its own response, which consists of the most significant 128 bits of the same value.

Ok, so here we have a protocol that's much more complicated. But considered on its own, this one looks pretty OK to me.

But here’s a funny question: what if we try running both protocols at once?

No, I’m not being ridiculous. You see, it turns out that the receiver and transmitter get to negotiate which protocol they support. By default they run the ‘simple’ protocol above. If both support the pre-computed version, they must indicate this in the AKE_Transmitter_Info and AKE_Receiver_Info messages sent during the handshake.

This leads to the following conjecture: what if, as a man-in-the-middle attacker, we convince the transmitter to run the 'pre-computed' protocol, and at the same time convince the receiver to run the 'simple' one? Recall that none of the protocol flags (transmitted during the AKE) are authenticated. We might be able to trick each side into seeing a different view of the other's capabilities.

Here’s the setup: we have a receiver running in China, and a transmitter located in New York. We’re a man-in-the-middle sitting next to the transmitter. We want to convince the transmitter that the receiver is close — close enough to be on a LAN, for example. Consider the following attack:

  1. Modify the message flags so that the transmitter thinks we’re running the pre-computed protocol. Since it thinks we’re running the pre-computed protocol, it will hand us R_n and then give us all the time in the world to do our pre-computation.
  2. Now convince the receiver to run the ‘simple’ protocol. Send R_n to it, and wait for the receiver to send back the HMAC result (L’).
  3. Take a long bath, mow the lawn. Watch Season 1 of Game of Thrones.
  4. At your leisure, send the RTT_READY message to the transmitter, which has been politely waiting for the receiver to finish the pre-computation.
  5. The transmitter will now send us some bits. Immediately send it back the most significant bits of the value L’, which we got in step (2).
  6. Send video to China.

Now this attack may not always work — it hinges on whether we can convince the two parties to run different protocols. Still, this is a great teaching example in that it illustrates a key problem in cryptographic protocol design: parties may not share the same view of what's going on.

A protocol designer’s most important job is to ensure that such disagreements can never happen. The best way to do this is to ensure that there’s only one view to be had — in other words, dispense with all the options and write a single clear protocol. But if you must have options, make sure that the protocol only succeeds if both sides agree on what those options are. This is usually accomplished by authenticating the negotiation messages, but even this can be a hard, hard problem.Compared to the importance of learning those lessons, actually breaking localization is pretty trivial. It’s a stupid feature anyway.

In Conclusion

This has been a long post. To the readers I have left at this point: thanks for sticking it out.

The only remaining thing I'd like to say is that this post is not intended to judge HDCPv2, or to make it look bad. It may or may not be a good protocol, depending on whether I've understood the specification properly and on whether the above flaws make it into real devices. Which, hopefully, they won't now.

What I’ve been trying to do is teach a basic lesson: protocols are hard. They can fail in ruinous, subtle, unexpected, exciting ways. The best cryptographers — working with BAN logic analyzers and security proofs — still make mistakes. If you don’t have those tools, steer clear.

The best 'fix' for the problem is to recognize how dangerous protocols can be, and to avoid designing your own. If you absolutely must do so, please try to make yours as simple as possible. Too many people fail to grok this lesson, and the result is, well, HDCPv2.

===

Update 8/27: As I mentioned above, DCP has released a new version of the specification. Version 2.2 includes several updates: it changes the encryption of Km to incorporate both the Transmitter and Receiver nonces. It also modifies the locality check to patch the bug described above. Both of these changes appear to mitigate the bugs above, at least in new devices.

A missing post (updated)

Update (8/27): I’ve put the original post back up.

Update (8/9): I’ve re-written this post to include a vague, non-specific explanation of the bug. I’ve now confirmed the problem with one vendor, who has asked for a week to issue a patch. So far I haven’t had a response from the DCP’s technical people. And yes, I do realize someone PasteBinned the original post while it was up.

A few people have asked what happened to the post that was in this space just a few hours ago. No, you’re not going crazy. It was here.

The post contained a long, detailed evaluation of the HDCP v2 protocol. My idea was to do real-time evaluation of an industry protocol that hasn’t been deployed yet — a kind of ‘liveblogging’ cryptanalysis. What I expected to find was some bad practices, which I would gently poke fun at. I didn’t expect to find anything serious.

I was wrong in that initial judgement, with some caveats. I’m going to give a vague and non-specific summary here, and I hope to re-post the detailed technical post in a few days when I’ve heard (something, anything!) from DCP, the organization that maintains HDCP.

In case you’ve never heard of it, HDCP is a security protocol used to ‘protect’ video traveling over wired and wireless networks. There are two versions. Version 1 is in your TV today, and was seriously compromised in 2010. Version 2 is much better, but has only been deployed in a few products — including those that implement MiraCast (formerly Wi-Fi Display).

Version 2 contains a key agreement protocol that’s designed to establish a session encryption key between a transmitter (your phone, for example) and a receiver (a TV). Once this key is established, the transmitter can encrypt all video data going over the wire.

What I discovered in my brief analysis is a flaw in the key agreement protocol that may allow a man-in-the-middle to recover the session key (actually the ‘master secret’ used to derive the session key). This could potentially allow them to decrypt content. More on that in a minute, though.

I also discovered some slightly less serious flaws elsewhere in the protocol. It turns out that the DCP already knows about those, thanks to some enterprising work by a smart guy at an unnamed vendor (who deserves credit, and will get it once I put the original post back up).

Now for a few big caveats about the session key bug.

The bug I found does not get you all the way to decrypting HDCPv2 streams in practice, thanks to a tiny additional protection I missed while writing the original post. I don’t think much of this protection, since it involves a secret that’s stored in every single HDCPv2-compliant device. That’s a pretty lousy way to keep a secret.

And of course I haven’t personally verified this in any real HDCP devices, since I don’t own any. Although if I did, I could use this nifty HDCP plugin for WireShark to do some of the work.

The issue has been confirmed by one vendor, who is working on a patch for their product. Their products are used in real things that you’ve heard of, so I’m trusting that they’d know.

The last thing I want to address is why I published this, and why I subsequently pulled it.

When I wrote the original post I thought HDCP v2 was just a ‘paper spec’ — that there were no devices actually using it. Shortly after posting, I came across one commercial product that does claim to support HDCPv2. Later I discovered a few others. To be on the safe side, I decided to pull the post until I could notify the vendors. Then through sheer ineptitude I briefly re-posted it. Now I’m doing my best to put the toothpaste back in the tube.

As soon as I get some feedback I intend to put the post back up. A post which, incidentally, was not intended to break anything, but rather to serve as a lesson in just how complicated it is to design your own protocol. I suppose it’s achieved that goal.

Anyway, I’m putting this up as a placeholder in case you’re curious about what happened or why the heck I’m not blogging. Writing a long technical post and then having to can it is a drag. But hopefully we’ll be back to our regularly-scheduled programming in no time at all.