Monday, January 9, 2012

Attack of the week: Datagram TLS

Nadhem Alfardan and Kenny Paterson have a paper set to appear in NDSS 2012 entitled 'Plaintext-Recovery Attacks Against Datagram TLS'.* This is obviously music to my ears. Plaintext recovery! Datagram TLS! Padding oracles. Oh my.

There's just one problem: the paper doesn't seem to be online yet. (Update 1/10: It is now. See my update further below.) InfoWorld has a few vague details, and it looks like the vulnerability fixes are coming fast and furious. So let's put on our thinking caps and try to puzzle this one out.
What is Datagram TLS?
If you're reading this blog, I assume you know TLS is the go-to protocol for encrypting data over the Internet. Most people associate TLS with reliable transport mechanisms such as TCP. But TCP isn't the only game in town.

Audio/video streaming, gaming, and VoIP apps often use unreliable datagram transports like UDP. These applications don't care if packets arrive out of order, or if they don't arrive at all. The biggest priority is quickly handling the packets that do arrive.

Since these applications need security too, TLS tries to handle them via a variant called Datagram TLS (DTLS). DTLS addresses the two big limitations that make TLS hard to use on an unreliable datagram transport:
  1. TLS handshake messages need to arrive whole and in the right order. This is easy when you're using TCP, but doesn't work so well with unreliable datagram transport. Moreover, these messages can be bigger than the typical 1500-byte Ethernet frame, which means fragmentation (at best), and packet loss (at worst).
     
  2. Ordinarily TLS decryption assumes that you've received all the previous data. But datagrams arrive when they want to -- that means you need the ability to decrypt a packet even if you haven't received its predecessor.
There are various ways to deal with these problems; DTLS mounts a frontal assault. The handshake is made reliable by implementing a custom ack/re-transmit framework. A protocol-level fragmentation mechanism is added to break large handshake messages up over multiple datagrams. And most importantly: the approach to encrypting records is slightly modified.
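
To make the fragmentation idea concrete, here's a rough Python sketch. The function names and the bare (offset, data) tuples are my own invention for illustration. The real DTLS handshake header carries more than this (a message sequence number and the total message length alongside each fragment's offset and length), and a real stack also has to retransmit on timeout, which I'm skipping entirely.

```python
def fragment(message: bytes, max_fragment: int):
    """Split one handshake message into (offset, data) fragments.

    Hypothetical helper: max_fragment stands in for whatever payload
    size fits in a single datagram.
    """
    return [(off, message[off:off + max_fragment])
            for off in range(0, len(message), max_fragment)]


def reassemble(fragments, total_length: int) -> bytes:
    """Rebuild a message from fragments that arrive in any order.

    Duplicates are harmless; gaps raise, which is where a real
    implementation would wait for a retransmission instead.
    """
    buf = bytearray(total_length)
    received = [False] * total_length
    for off, data in fragments:
        buf[off:off + len(data)] = data
        for i in range(off, off + len(data)):
            received[i] = True
    if not all(received):
        raise ValueError("missing fragments; wait for retransmission")
    return bytes(buf)
```

The point of carrying an explicit offset in every fragment is that the receiver never has to care what order the datagrams show up in.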
So what's the problem?
To avoid radical changes, DTLS inherits most of the features of TLS. That includes its wonky (and obsolete) MAC-then-Encrypt approach to protecting data records. Encryption involves three steps:
  1. Append a MAC to the plaintext to prevent tampering.
  2. Pad the resulting message to a multiple of the cipher block size (16 bytes for AES). This is done by appending X bytes of padding, where each byte must contain the value X.
  3. Encrypt the whole mess using CBC mode.
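
Here's a small Python sketch of the first two steps, under some loudly stated assumptions: I'm using HMAC-SHA1 as the MAC, the MAC covers only the plaintext (the real record MAC also covers a sequence number and header fields), and the helper name is made up. I'm following the TLS convention in which the final byte is a length byte that also holds X, so X+1 bytes get appended in total. Step 3 is left to a real cipher library.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes


def mac_then_pad(mac_key: bytes, plaintext: bytes) -> bytes:
    """Steps 1 and 2 of MAC-then-Encrypt (hypothetical helper).

    Returns plaintext || MAC || padding, ready to be CBC-encrypted.
    """
    # Step 1: append a MAC over the plaintext (simplified: no header fields).
    tag = hmac.new(mac_key, plaintext, hashlib.sha1).digest()
    body = plaintext + tag

    # Step 2: pad to a multiple of the block size. Every padding byte,
    # including the trailing length byte, holds the value x.
    x = BLOCK - 1 - (len(body) % BLOCK)
    return body + bytes([x]) * (x + 1)
```

Notice that the padding sits outside the MAC, so nothing authenticates it. That's exactly what makes the next part possible.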
Cryptographers have long known that this kind of encryption can admit padding oracle attacks. This happens when a decryptor does something obvious (throw an error, for example) when it encounters invalid padding, i.e., padding that doesn't meet the specification above.
(Figure: CBC mode encryption, courtesy of Wikipedia.)
This wouldn't matter very much, except that CBC mode is malleable. This means an attacker can flip bits in one block of an intercepted ciphertext, which will cause the same bits to be flipped in the following plaintext block when the ciphertext is ultimately decrypted (the tampered block itself decrypts to garbage). Padding oracle attacks work by carefully tweaking a ciphertext in specific ways before sending it to an honest decryptor. If the decryptor returns a padding error, then the adversary knows something about the underlying plaintext. Given enough time the attacker can use these errors to completely decrypt the message.
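
For the curious, here's a toy Python sketch of both ideas: the bit-flipping property of CBC, and recovering a single plaintext byte from nothing but yes/no padding answers. Everything in it is an assumption for illustration. The "block cipher" is a four-round Feistel construction built from SHA-256 so that the example needs no crypto library (it is a stand-in for AES and not secure), the padding check follows the TLS convention, and the oracle is just a function the attacker gets to call.

```python
import hashlib

BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Toy Feistel permutation standing in for AES. NOT secure.
    l, r = block[:8], block[8:]
    for rnd in range(4):
        l, r = r, xor(l, _round(key, rnd, r))
    return l + r


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    l, r = block[:8], block[8:]
    for rnd in reversed(range(4)):
        l, r = xor(r, _round(key, rnd, l)), l
    return l + r


def cbc_encrypt(key: bytes, iv: bytes, pt: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(pt), BLOCK):
        prev = toy_encrypt_block(key, xor(pt[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, ct: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(ct), BLOCK):
        block = ct[i:i + BLOCK]
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


def padding_ok(pt: bytes) -> bool:
    # TLS-style check: last byte is x, and the x bytes before it are x too.
    x = pt[-1]
    return x < len(pt) and pt[-(x + 1):] == bytes([x]) * (x + 1)


def recover_last_byte(oracle, prev_block: bytes, target_block: bytes) -> int:
    """Recover the last plaintext byte of target_block.

    oracle(iv, ciphertext) must return True when the padding is valid.
    """
    for guess in range(256):
        forged = bytearray(BLOCK)
        forged[-1] = guess
        if not oracle(bytes(forged), target_block):
            continue
        # Rule out longer pads (such as 0x01 0x01) by disturbing the
        # second-to-last byte and asking again.
        forged[-2] ^= 0xFF
        if oracle(bytes(forged), target_block):
            # The raw decrypted byte XOR guess is 0x00, so it equals guess.
            return guess ^ prev_block[-1]
    raise RuntimeError("oracle never accepted a guess")
```

The attacker never touches the key. At most 256 questions (plus a few confirmations) give up one byte, and the same trick walks backwards through the rest of the block.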

I could spend a lot of time describing padding oracle attacks, but it's mostly beside the point.** Standard TLS implementations know about this attack and deal with it in a pretty effective way. Whenever the decryptor encounters bad padding, it just pretends that it hasn't. Instead, it goes ahead with the rest of the decryption procedure (i.e., checking the MAC) even if it knows that the message is already borked. 

This is extra work, but it's extra work with a purpose. If a decryptor doesn't perform the extra steps, then messages with bad padding will get rejected considerably faster than other messages. A clever attacker can detect this condition by carefully timing the responses. Performing the unnecessary steps (mostly) neutralizes that threat.
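
A hedged Python sketch of the difference, working on an already-decrypted record. The function names and the `stats` counter are mine, HMAC-SHA1 is an assumption, and the MAC covers only the message. When the padding is bad, the careful version pretends the pad had zero length and checks the MAC anyway. I count MAC computations instead of measuring wall-clock time so the result is deterministic.

```python
import hashlib
import hmac

MAC_LEN = 20  # HMAC-SHA1 tag length (an assumption for this sketch)


def strip_padding(padded: bytes):
    """Return (body, padding_was_valid). On bad padding, assume a
    zero-length pad and strip only the trailing length byte."""
    x = padded[-1]
    ok = x < len(padded) and padded[-(x + 1):] == bytes([x]) * (x + 1)
    return (padded[:-(x + 1)] if ok else padded[:-1]), ok


def _mac_ok(mac_key: bytes, body: bytes, stats: dict) -> bool:
    stats["mac_calls"] += 1
    msg, tag = body[:-MAC_LEN], body[-MAC_LEN:]
    expected = hmac.new(mac_key, msg, hashlib.sha1).digest()
    return hmac.compare_digest(expected, tag)


def leaky_check(mac_key: bytes, padded: bytes, stats: dict) -> bool:
    body, pad_ok = strip_padding(padded)
    if not pad_ok:
        return False  # early exit: no MAC work, so a faster rejection
    return _mac_ok(mac_key, body, stats)


def careful_check(mac_key: bytes, padded: bytes, stats: dict) -> bool:
    body, pad_ok = strip_padding(padded)
    mac_ok = _mac_ok(mac_key, body, stats)  # computed no matter what
    return pad_ok and mac_ok  # one undifferentiated failure either way
```

With a bad pad, `leaky_check` does zero MAC computations and `careful_check` does one. That missing MAC computation is the time difference an attacker goes looking for. Even the careful version only roughly equalizes the work, hence the "(mostly)" above.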
Ok, so you say these attacks are already mitigated. Why are we still talking about this?
Before I go on, I offer one caveat: what I know about this attack comes from speculation, code diffs and some funny shapes I saw in the clouds this afternoon. I think what I'm saying is legit, but I won't swear to it until I read Alfardan and Paterson's paper.

But taking my best guess, there are two problems here. One is related to the DTLS spec, and the second is just an implementation problem. Either one alone probably wouldn't be an issue; the two together spell big trouble.

The first issue is in the way that the DTLS spec deals with invalid records. Since standard TLS works over a reliable connection, the application should never receive invalid or out-of-order data except when packets are being deliberately tampered with. So when standard TLS encounters a bad record MAC (or padding) it takes things very seriously -- in fact, it's required to drop the connection.

This necessitates a new handshake, a new key, and generally makes it hard for attackers to run an honest padding oracle attack, since these attacks typically require hundreds or thousands of decryption attempts on a single key.***

DTLS, on the other hand, runs over an unreliable datagram transport, which may not correct for accidental packet errors. Dropping the connection for every corrupted packet just isn't an option. Thus, the standard is relaxed. An invalid MAC (or padding) will cause a single record to be thrown away, but the connection itself goes on. 
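
To see why this matters to an attacker, here's a deliberately tiny Python model of the two policies. The `Receiver` class and its fields are hypothetical, and the "is this record valid" decision is passed in as a flag instead of being computed. All it shows is how many forged records each policy will process under a single key.

```python
class Receiver:
    """Toy model of record handling (hypothetical, not a real API)."""

    def __init__(self, datagram: bool):
        self.datagram = datagram  # True models DTLS, False models TLS
        self.open = True
        self.delivered = []
        self.dropped = 0

    def receive(self, record_is_valid: bool, payload: bytes) -> None:
        if not self.open:
            raise ConnectionError("connection closed; new handshake needed")
        if record_is_valid:
            self.delivered.append(payload)
        elif self.datagram:
            self.dropped += 1  # DTLS: discard the record, keep the keys
        else:
            self.open = False  # TLS: fatal error, session is torn down


def forged_records_processed(receiver: Receiver, attempts: int) -> int:
    """Count how many bad records the receiver processes before it quits."""
    processed = 0
    for _ in range(attempts):
        try:
            receiver.receive(False, b"")
        except ConnectionError:
            break
        processed += 1
    return processed
```

Feed a thousand forgeries to the TLS-style receiver and it processes exactly one before shutting the door. The DTLS-style receiver processes all thousand under the same key.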

This still wouldn't matter much if it weren't for the second problem, which is specific to the implementation of DTLS in libraries like OpenSSL and GnuTLS. 

You see, padding oracle vulnerabilities in standard TLS are understood and mitigated. In OpenSSL, for example, the main decryption code has been carefully vetted. It does not return specific padding errors, and to avoid timing attacks it performs the same (unnecessary) operations whether or not the padding checks out.

In a perfect world DTLS decryption would do all the same things. But DTLS encryption is subtly different from standard TLS encryption, which means it's implemented in separate code. Code that isn't used frequently, and doesn't receive the same level of scrutiny as the main TLS code. Thus -- two nearly identical implementations, subject to the same attacks, with one secure and one not. (Update 1/11: There's a decent explanation for this, see my update below.) 

And if you're the kind of person who needs this all tied up with a bow, I would point you to this small chunk of the diff just released for the latest OpenSSL fix. It comes from the DTLS-specific file d1_pkt.c:

+       /* To minimize information leaked via timing, we will always
+        * perform all computations before discarding the message.
+        */
+       decryption_failed_or_bad_record_mac = 1;

I guess that confirms the OpenSSL vulnerability. Presumably with these fixes in place, the MAC-then-Encrypt usage in DTLS will now go back to being, well, just theoretically insecure. But not actively so.

Update 1/11/2012: Kenny Paterson has kindly sent me a link to the paper, which wasn't available when I wrote the original post. And it turns out that while the vulnerability is along the lines above, the attack is much more interesting.

An important aspect that I'd missed is that DTLS does not return error messages when it encounters invalid padding -- it just silently drops the packet. This helps to explain the lack of countermeasures in the DTLS code, since the lack of responses would seem to be a padding oracle attack killer.

Alfardan and Paterson show that this isn't the case. They're able to get the same information by timing the arrival of 'heartbeat' messages (or any predictable responses sent by an upper-level protocol). Since DTLS decryption gums up the works, it can slightly delay the arrival of these packets. By measuring this 'gumming' they can determine whether padding errors have occurred. Even better, they can amplify this gumming by sending 'trains' of valid or invalid packets.
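
The amplification is just arithmetic, and a back-of-the-envelope Python model makes it plain. The microsecond costs below are numbers I made up, not measurements from the paper, and the model assumes a record with bad padding is dropped before the MAC is checked while one with good padding goes on to the MAC check.

```python
def response_delay(train_length: int, padding_valid: bool,
                   t_decrypt_us: float = 5.0, t_mac_us: float = 2.0) -> float:
    """Delay added in front of the next heartbeat response by a train
    of identical forged records. Costs are illustrative assumptions."""
    per_record = t_decrypt_us + (t_mac_us if padding_valid else 0.0)
    return train_length * per_record


def timing_gap(train_length: int, **costs) -> float:
    """Difference the attacker must detect between the two cases."""
    return (response_delay(train_length, True, **costs)
            - response_delay(train_length, False, **costs))
```

Under these assumptions the gap grows linearly with the length of the train. A difference that drowns in network jitter for a single packet becomes measurable once you send enough of them back to back.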

All in all, a very clever attack. So clever, in fact, that it makes me despair that we'll ever have truly secure systems. I guess I'll have to be satisfied with one less insecure one.

Notes:

* N.J.A. Alfardan and K.G. Paterson, Plaintext-Recovery Attacks Against Datagram TLS, To appear in NDSS 2012.

** See here for one explanation. See also a post from this blog describing a padding oracle attack on XML encryption.

*** There is one very challenging padding oracle attack on standard TLS (also mitigated by current implementations). This deals with the problem of session drops/renegotiation by attacking data that remains constant across sessions -- things like passwords or cookies.

2 comments:

  1. Dear Matthew Green:

    I don't want to overgeneralize, and I don't want to toot my own horn too much, but don't you think developments like this lend weight to the hypothesis that new protocols should be designed to fit new purposes instead of TLS being adapted to new purposes?

    I'm thinking of the relative simplicity and (as far as I know) better security of SRTP + ZRTP versus DTLS.

    Regards,

    Zooko

  2. Hi Zooko,
    That's a good question. I do think that it's valuable to have general-purpose security protocols for things like this, even if you ultimately choose something specialized for a particular application.

    If you don't have these generic protocols, then people will try to adapt special-purpose protocols to applications for which they're not suited. So you'll end up with ZRTP being used for {ridiculous application X}, and people will still blame you (Zooko) when it blows apart. Maybe.

    As to the more general question, whether it's better to design new protocols or hack on top of TLS -- I guess there are two ways of thinking about it.

    1. Starting from scratch is a bad idea because it means you're likely to make all of the same mistakes everyone else has, and possibly some new ones (both in your spec and in your code). Plus nobody will study your protocol because it's not part of a 'major standard'.

    2. Basing a new protocol on TLS is an awful idea, because TLS sucks -- in the sense that it's all crapped up with insecure legacy nonsense like the things that led to this attack.

    The funny thing about DTLS is that the authors seem to have gotten the worst of both worlds. They kept the stupid MAC-then-Encrypt and other questionable aspects of TLS, but they still had to radically change the protocol and write all new code. But maybe I'm overstating things.
