Let’s talk about ZRTP

I just checked the date on my last post and it seems that I haven’t blogged in nearly a month. Believe me, this isn’t for lack of trying. The world has just been a very, very busy place.

But this is the Thanksgiving holiday and every (US) American knows the best part of Thanksgiving is sitting around for hours waiting for a huge bird to cook thoroughly enough that it won’t kill your family. And that means I finally have time to write about my favorite wonky topic: cryptographic protocols. And how utterly confusing they can be.

ZRTP

The subject of today’s post is the ZRTP key agreement protocol. ZRTP has recently gotten some press for being the primary key agreement used by Silent Circle, a new encrypted voice/video/text service launched by PGP inventor Phil Zimmermann and some other bright folks. But it’s also used in other secure VoIP solutions, like Zfone and Moxie’s RedPhone.

Silent Circle’s an interesting case, since it’s gotten some gentle criticism lately from a variety of folks — well, mostly Nadim Kobeissi — for being partially closed-source and for having received no real security audits. Nadim’s typical, understated critique goes like this:

And is usually followed by the whole world telling Nadim he’s being a jerk. Eventually a tech reporter notices the fracas and chimes in to tell us the whole Infosec community is a bunch of jerks:

And the cycle repeats, without anyone having actually learned much at all. (Really, it’s enough to make you think Twitter isn’t the right place to get your Infosec news.)

Now, as unhelpful as all this is, maybe we can make lemonade and let all of this serve as a teaching moment. For one thing, it gives us a (wonky) chance to learn a little bit about this ZRTP protocol that so many people seem to be using.

Overview of ZRTP

The ZRTP protocol is fully laid out in RFC 6189. This is one of the more confusing specs I’ve read — partly because the critical information is spread out across so many different sections, and partly because ZRTP seems determined to address every possible attack simultaneously.

Fortunately the Intro does a good job of telling us what the protocol’s up to:

ZRTP is a key agreement protocol that performs a Diffie-Hellman key exchange during call setup in the media path and is transported over the same port as the Real-time Transport Protocol (RTP) media stream which has been established using a signaling protocol such as Session Initiation Protocol (SIP). This generates a shared secret, which is then used to generate keys and salt for a Secure RTP (SRTP) session.

So: simple enough. ZRTP lets us establish keys over an insecure channel using Diffie-Hellman.

However we all know that Diffie-Hellman isn’t secure against active (Man-in-the-Middle) attacks. Normally we prevent these by signing Diffie-Hellman messages using keys distributed via a PKI. ZRTP is having none of it:

Although it uses a public key algorithm, [ZRTP] does not rely on a public key infrastructure (PKI) … [ZRTP] allows the detection of man-in-the-middle (MiTM) attacks by displaying a short authentication string (SAS) for the users to read and verbally compare over the phone. … But even if the users are too lazy to bother with short authentication strings, we still get reasonable authentication against a MiTM attack, based on a form of key continuity. It does this by caching some key material to use in the next call, to be mixed in with the next call’s DH shared secret, giving it key continuity properties analogous to Secure SHell (SSH).

So our protection is twofold: (1) every time we establish a connection with some remote party, we can verbally compare a “short authentication string” (SAS) to ensure that we’ve both agreed on the same key. Assuming that our attacker isn’t a voice actor, this should let us easily detect a typical MiTM. And just in case we’re too lazy to bother with this, even completing one ‘un-attacked’ connection leaves us with (2) a long-term shared secret that we can use to validate future connection attempts.

So is this a reasonable model? Well, you can draw your own conclusions — but it works fine for me. Moreover, I’m willing to accept just about any assumption that allows us to not think about the ‘Bill Clinton’ or ‘Rich Little’ attacks. So let’s just assume that this voice thing works… and move on to the interesting bits.

The Guts of the Protocol

ZRTP is an interaction between two parties, defined as an Initiator and a Responder. This figure from the spec illustrates the flow of a typical transaction:

Note that the identity of the party acting as the Initiator is determined during the protocol run — it’s the one that sends the Commit message (F5). The protocol breaks down into roughly four phases:

  1. Discovery and protocol negotiation (F1-F4), in which the parties start up a protocol transaction and agree on a supported ZRTP version and ciphersuites.
  2. Diffie-Hellman key establishment (F5-F7). This is almost ‘missionary position’ Diffie-Hellman, with one exception (the F5 message), which we’ll talk more about in a second.
  3. Key confirmation (F8-F10), in which the parties verify that they’ve agreed on the same key.
  4. Secure communication. That’s the last section, labeled “SRTP begins”.

Discovery and Protocol Negotiation

Messages F1-F4 are responsible for setting up a ZRTP connection. This stuff is (almost) entirely non-cryptographic, which means this is the part of the protocol where stuff is most likely to go wrong.

Let me explain: prior to completing the Diffie-Hellman key exchange in messages F5-F7, we can’t assume that Alice or Bob share a cryptographic key yet. Without a key, they can’t authenticate their messages — at least not until after the Diffie-Hellman transaction is complete. That means an attacker can potentially re-write any portion of those messages and get away with it. At least for a while.

So what’s in those messages? Well, just a couple of small things:
  1. A 4-character string containing the version of the ZRTP protocol.
  2. A Client Identifier string (cid), which is 4 words long and identifies the vendor and release of the ZRTP software.
  3. The 96-bit-long unique identifier for the ZRTP endpoint (ZID).
  4. A Signature-capable flag (S), which indicates that this Hello message was sent from a ZRTP endpoint able to parse and verify digital signatures.
  5. The MiTM flag (M), which has something to do with PBXes.
  6. The Passive flag (P), which is set to true if and only if this Hello message is sent from a device that is configured to always act as a Responder (not Initiator).
  7. The supported Hash algorithms, Cipher algorithms (including Diffie-Hellman handshake type), SRTP Auth Tag Types, Key Agreement Types, and SAS Types.

Which is to say: quite a lot! And some of these values are pretty critical. Changing the version number might allow an attacker to roll us back to an earlier (potentially insecure) version of the protocol. Changing the ciphers might (theoretically) allow us to switch to a weaker set of Diffie-Hellman parameters, hash function or Short Authentication String algorithm.

At this point a careful protocol reviewer will be asking ‘what does ZRTP do to prevent this?’ ZRTP gives us several answers, both good and not-so-good:
  • On the bad side, ZRTP allows us to send a hash of the Initiator’s Hello message through a separate integrity-protected channel, e.g., secure SIP. (This protection gets folded into later messages using a crazy hash-chain and MAC construction.) In general I’m not a fan of this protection — it’s like saying a life-preserver will keep you alive… as long as you have a separate lifeboat to wear it in. If you have a secure channel in the first place, why are you using ZRTP? (Even the ZRTP authors seem a little sheepish about this one.)
  • On the confusing side, Hello messages are MACed using a MAC key that isn’t revealed until the subsequent message (e.g., Commit). In theory this means you can’t forge a MAC until the Commit message has been delivered. In practice, a MiTM attacker can just capture both the Initiator Hello (F3) and the Commit message (F5), learn the MAC key, and then forge the Hello message to its heart’s content… before sending both messages on to the recipient. This entire mechanism baffles me. The less said about it the better.
  • A more reasonable protection comes from the key derivation process. In a later phase of the protocol, both ZRTP parties compute a hash over the handshake messages. This value is folded into the Key Derivation Function (KDF) that’s ultimately used to compute session keys. The hashing process looks like:

    total_hash = hash(Hello of responder || Commit || DHPart1 || DHPart2)
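
Here's a minimal sketch of that computation in Python, assuming SHA-256 as the negotiated hash and assuming each handshake message has already been serialized to bytes. Watch the inputs closely:

```python
import hashlib

def total_hash(responder_hello: bytes, commit: bytes,
               dhpart1: bytes, dhpart2: bytes) -> bytes:
    # Hash the handshake transcript in the order the spec lists it.
    h = hashlib.sha256()
    for msg in (responder_hello, commit, dhpart1, dhpart2):
        h.update(msg)
    return h.digest()
```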

But… zounds! One important message is missing: the Initiator’s Hello (F3). I can’t for the life of me figure out why this message would be left out. And unless there’s something really clever that I’m missing, this means an attacker can tamper with the Initiator’s Hello message without affecting the key at all.*

So is this a problem? Well, in theory it means you could roll back any field in the Initiator Hello message without detection. It’s not exactly clear what practical benefit this would have. You could certainly modify the Diffie-Hellman parameters or ciphers to turn off advanced options like ECDH or 256-bit AES. Fortunately ZRTP does not support ‘export weakened’ ciphers, so even the ‘weak’ options are pretty strong. Still: this seems like a pretty big oversight, and possibly an unforced error.

For the moment, let’s file it under ‘this protocol is really complicated‘ or ‘don’t analyze protocols at Thanksgiving‘. At very least I think this could use a good explanation.

(Update 11/25: After some back and forth, Moxie points out that the Hash, SAS and Cipher information is repeated in the Commit message [F5], so that should provide an extra layer of protection against rollback on those fields — but not the other fields like version or signature capability. And of course, an implementation might have the Responder prune its list of supported algorithms by basing it off the Initiator’s list, which would be a big problem.)
Diffie-Hellman Key Establishment

Assuming that the two parties have successfully completed the negotiation, the next phase of the protocol requires the two parties to establish a shared key. This is done using (nearly) straight-up Diffie-Hellman. The parameters are negotiated in the previous phase, and are drawn from a list defined by RFC 3526 and a NIST publication.
Normally a Diffie-Hellman agreement would work something like this (all exponentiation is in a group, i.e., mod p):

  1. Alice picks a, sends g^a (message F6).
  2. Bob picks b, sends g^b (message F7).
  3. Both parties compute g^ab and HMAC this value together with lots of other things to derive a session key s0.
  4. The parties further compute a 32-bit Short Authentication value as a function of s0, convert this into a human-readable Short Authentication String (SAS), and compare notes verbally.
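
For flavor, here's a rough sketch of that last step. The real 'B32' rendering in RFC 6189 and the exact KDF inputs differ in their details; the HMAC label and word-splitting below are illustrative assumptions:

```python
import hashlib, hmac

# z-base-32 style alphabet; assumed for illustration.
B32 = "ybndrfg8ejkmcpqxot1uwisza345h769"

def short_auth_string(s0: bytes) -> str:
    # Derive a 32-bit SAS value from the session secret (label assumed).
    sas_value = hmac.new(s0, b"SAS", hashlib.sha256).digest()[:4]
    bits = int.from_bytes(sas_value, "big")
    # Render the leftmost 20 bits as four 5-bit base-32 characters.
    return "".join(B32[(bits >> s) & 31] for s in (27, 22, 17, 12))

print(short_auth_string(b"\x00" * 32))  # both parties read this aloud
```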

The problem with the traditional Diffie-Hellman protocol is that it doesn’t play nice with the Short Authentication String mechanism. Say a MiTM attacker Eve has established a connection with Bob, during which they agreed on SAS value X. Eve now tries to run the Diffie-Hellman protocol with Alice. Once Alice sends her choice of g^a, Eve can now run an offline attack, trying millions of candidate b values until she gets one such that the derived SAS between her and Alice is also X. This is a problem, since Alice and Bob will now see the same SAS, even though Eve is duping them.

ZRTP deals with this by forcing one party (Bob) to commit to g^b before the protocol even begins. Thus, in ZRTP Bob picks b and sends H(g^b) before Alice sends her first message. This ‘commitment’ is transmitted in the “Commit” message (F5).

This prevents Bob from ‘changing his mind’ after seeing Alice’s input, and thus the remote party has at most one chance to get a colliding SAS per protocol run. Which means Eve is (probably) out of luck.**
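
Here's the commit-then-reveal dance as a toy, with textbook-sized group parameters (p=23, g=5) standing in for the real RFC 3526 groups:

```python
import hashlib, secrets

p, g = 23, 5  # toy parameters; real ZRTP uses RFC 3526 groups

# Bob picks b and commits to g^b before seeing anything from Alice (F5).
b = secrets.randbelow(p - 2) + 1
gb = pow(g, b, p)
commitment = hashlib.sha256(str(gb).encode()).digest()

# Alice then sends her share (F6)...
a = secrets.randbelow(p - 2) + 1
ga = pow(g, a, p)

# ...and when Bob reveals g^b (F7), Alice checks it against the earlier
# commitment. Bob never gets a chance to grind b for a colliding SAS.
assert hashlib.sha256(str(gb).encode()).digest() == commitment
assert pow(gb, a, p) == pow(ga, b, p)  # both sides now share g^ab
```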

Key Confirmation & Secure Communication

The rest of the protocol does all of the stuff you expect from a key agreement protocol: the two parties, having successfully completed a Diffie-Hellman exchange, now derive a session key by HMACing together the value g^ab with total_hash and any pre-shared secrets they hold from older sessions. If all goes well, the result should be a strong secret that only the two parties know — or an SAS mismatch that reveals Eve’s tampering.

From this point it’s all downhill, or it should be at least. Both parties now have a shared secret that they can use to derive secure encryption and MAC keys. They now construct “Confirm” messages (F8, F9), which they encrypt and (partially) MAC. In principle this exchange gives both parties a chance to detect a mismatch in keys and call the whole thing off.

There’s not a ton to say about this section except for one detail: the MAC on these messages is computed only over the encrypted portion (between the == signs below), and leaves out critical details like the Initialization Vector that’s used to encrypt them: ***

Structure of a ZRTP “Confirm” message (source).

Once again it’s not clear if there’s any practical impact here, but at least technically speaking, someone could tamper with this IV and thus change the decryption of the message (specifically, the first 64 bits of H0). I have no idea if this matters — or even if it’s a real attack — but in general it seems like the kind of thing you want to avoid. It’s another example of a place where I just plain don’t understand ZRTP.
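
To see the hazard, here's a toy that MACs only the ciphertext of a CFB-encrypted message, the way the Confirm messages do. The library, key sizes and message contents are all illustrative:

```python
import hmac, hashlib, os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

enc_key, mac_key, iv = os.urandom(16), os.urandom(16), os.urandom(16)
plaintext = b"H0 and other Confirm fields....."  # 32 bytes, two AES blocks

enc = Cipher(algorithms.AES(enc_key), modes.CFB(iv)).encryptor()
ct = enc.update(plaintext) + enc.finalize()
tag = hmac.new(mac_key, ct, hashlib.sha256).digest()  # MAC skips the IV!

# Flip a bit in the IV: the MAC still verifies, but the first decrypted
# block is now garbage (in CFB decryption, P1 = C1 xor E(IV)).
bad_iv = bytes([iv[0] ^ 0x80]) + iv[1:]
dec = Cipher(algorithms.AES(enc_key), modes.CFB(bad_iv)).decryptor()
garbled = dec.update(ct) + dec.finalize()

assert hmac.new(mac_key, ct, hashlib.sha256).digest() == tag
assert garbled != plaintext and garbled[16:] == plaintext[16:]
```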

Conclusion

I think it should be obvious at this point that I have no real point with this post — I just love protocols. If you’re still reading at this point, you must also love protocols… or else you did something really wrong, and reading this post is your punishment.

ZRTP is a fun protocol to analyze, since it’s a particularly complicated spec that deals with many use-cases at once. Plus it’s an old protocol and it’s been subjected to lots of analysis (including analysis by robots). That’s why I’m so surprised to see loose ends at this date, even if they’re not particularly useful loose ends.

Regardless of the results, anyone who’s interested in cryptography should try their hands at this from time to time. There are so many protocols that we rely on in our day-to-day existence, and far too few of these have really been poked at.

Notes:

* An older paper by Gupta and Shmatikov also notes the lack of authentication for ZID messages, and describes a clever attack that causes the peers to not detect mutually shared secrets. This seems to be fixed in later versions of the protocol, since the ZID values are now included in the Key Derivation Function. But not the rest of the fields in the Initiator Hello.

*** This is another place where the specification is a bit vague. It says that the MAC is computed over the “encrypted part of the message”, but isn’t entirely clear if the MAC is computed pre- or post-encryption. If it’s pre-encryption this is a bit non-standard, but not a major concern.

Reposted: A cryptanalysis of HDCP v2.1

Update 8/27: This post was originally published three weeks ago under a different title. I subsequently took it down to give affected vendors time to patch the bugs. As a result of the notification, Digital Content Protection LLC (DCP) has updated the spec to v2.2. 

Contrary to my understanding when I wrote the original post, HDCP v2 actually is used by a number of devices.

I would like to give credit to Alon Ziv at Discretix, who had previously discovered the Locality Check issue, and to Martin Kaiser who experimentally verified the master secret issue on a European Samsung TV and a Galaxy S II.

Finally, I would like to thank Hanni Fakhoury and Marcia Hofmann at the Electronic Frontier Foundation for all of their helpful advice. The EFF is one of the only organizations that represents security researchers. Please consider donating so they can keep doing it!

Over the past couple of weeks I’ve mostly been blogging about inconsequential things. Blame summer for this — it’s hard to be serious when it’s 104 degrees out. But also, the world just hasn’t been supplying much in the way of interesting stuff to write about.

Don’t get me wrong, this is a good thing! But in a (very limited) way it’s also too bad. One of the best ways to learn about security systems is to take them apart and see how they fail. While individual systems can be patched, the knowledge we collect from the process is invaluable.

Fortunately for us, we’re not completely helpless. If we want to learn something about system analysis, there are plenty of opportunities right out there in the wild. The best place to start is by finding a public protocol that’s been published, but not implemented yet. Download the spec and start poking!

This will be our task today. The system we’ll be looking at is completely public, and (to the best of my knowledge) has not yet been deployed anywhere (Update: see note above). It’s good practice for protocol cryptanalysis because it includes all kinds of complicated crypto that hasn’t been seriously reviewed by anyone yet.

(Or at least, my Google searches aren’t turning anything up. I’m very willing to be corrected.)

Best of all, I’ve never looked at this system before. So whatever we find (or don’t find), we’ll be doing it together.

A note: this obviously isn’t going to be a short post. And the TL;DR is that there is no TL;DR. This post isn’t about finding bugs (although we certainly will), it’s about learning how the process works. And that’s something you do for its own sake.

HDCPv2

The protocol we’ll be looking at today is the High Bandwidth Digital Content Protection (HDCP) protocol version 2. Before you get excited, let me sort out a bit of confusion. We are not going to talk about HDCP version 1, which is the famous protocol you probably have running in your TV right now.

HDCPv1 was analyzed way back in 2001 and found to be wanting. Things got much worse in 2010 when someone leaked the HDCPv1 master key — effectively killing the whole system.

What we’ll be looking at today is the replacement: HDCP v2. This protocol is everything that its predecessor was not. For one thing, it uses standard encryption: RSA, AES and HMAC-SHA256. It employs a certificate model with a revocation list. It also adds exciting features like ‘localization’, which allows an HDCP transmitter to determine how far away a receiver is, and stop people from piping HDCP content over the Internet. (In case they actually wanted to do that.)

HDCPv2 has barely hit shelves yet (Update: though it was recently selected as the transport security for MiraCast). The Digital Content Protection licensing authority has been keeping a pretty up-to-date set of draft protocol specifications on their site. The latest version at the time of this writing is 2.1, and it gives us a nice opportunity to see how industry ‘does’ protocols.

An overview of the protocol

As cryptographic protocols go, HDCPv2 has a pretty simple set of requirements. It’s designed to protect  high-value content running over a wire (or wireless channel) between a transmitter (e.g., a DVD player) and a receiver (a TV). The protocol accomplishes the following operations:

  1. Exchanging and verifying public key certificates.
  2. Establishing shared symmetric keys between the transmitter and receiver.
  3. Caching shared keys for use in later sessions.
  4. Verifying that a receiver is local, i.e., you’re not trying to proxy the data to some remote party via the Internet.

These functions are accomplished via three (mostly) separate protocols: a public-key Authenticated Key Agreement (AKE) protocol, a pairing protocol, where the derived key is cached for later use, and a locality check protocol to ensure that the devices are physically close.

I’m going to take these protocols one at a time, since each one involves its own messages and assumptions.

Phase (1): Authenticated Key Agreement (AKE)

The core of HDCPv2 is a custom key exchange protocol, which looks quite a bit like TLS. (In fact, the resemblance is so strong that you wonder why the designers didn’t just use TLS and save a lot of effort.) It looks like this:

 

HDCPv2 key agreement protocol (source). Click the image to enlarge.

Now, there’s lots going on here. But if we only look at the crypto, the summary is this:

The transmitter starts by sending ‘AKE_Init’ along with a random 64-bit nonce R_tx. In response, the receiver sends back its certificate, which contains its RSA public key and device serial number, all signed by the HDCP licensing authority.

If the certificate checks out (and is not revoked), the transmitter generates a random 128-bit ‘master secret’ K_m and encrypts it under the receiver’s public key. The result goes back to the receiver, which decrypts it. Now both sides share K_m and R_tx, and can combine them using a wacky custom key derivation function. The result is a shared session key K_d.

The last step is to verify that both sides got the same K_d. The receiver computes a value H’, using HMAC-SHA256 on inputs K_d, R_tx and some other stuff. If the receiver’s H’ matches a similar value computed at the transmitter, the protocol succeeds.

Simple, right?

Note that I’ve ignored one last message in the protocol, which turns out to be very important. Before we go there, let’s pause and take stock.

If you’re paying close attention, you’ve noticed a couple of worrying things:

  1. The transmitter doesn’t authenticate itself at all. This means anyone can pretend to be a transmitter.
  2. None of the handshake messages (e.g., AKE_Transmitter_Info) appear to be authenticated. An attacker can modify them as they transit the wire.
  3. The session key K_d is based solely on the inputs supplied by the transmitter. The receiver does generate a nonce R_rx, but it isn’t used in the above protocol.

None of these things by themselves are a problem, but they make me suspicious.

Phase (2): Pairing

Public-key operations are expensive. And you only really need to do them once. The designers recognized this, and added a feature called ‘pairing’ to cache the derived K_m for use in later sessions. This is quite a bit like what TLS does for session resumption.

However, there’s one catch, and it’s where things get complicated: some receivers don’t have a secure non-volatile storage area for caching keys. This didn’t faze the designers, who came up with a ‘clever’ workaround for the problem: the receiver can simply ask the transmitter to store K_m for it.

To do this, the receiver encrypts K_m under a fixed internal AES key K_h (which is derived by hashing the receiver’s RSA private key). In the last message of the AKE protocol the receiver now sends this ciphertext back to the transmitter for storage. This appears in the protocol diagram as the ciphertext E(K_h, K_m).

The obvious intuition here is that K_m is securely encrypted. What could possibly go wrong? The answer is to ask how K_m is encrypted. And that’s where things get worrying.

According to the spec, K_m is encrypted using AES in what amounts to CTR mode, where the ‘counter’ value is defined as some value m. On closer inspection, m turns out to be just the transmitter nonce R_tx padded with 0 bits. So that’s simple. Here’s what it looks like:

Encryption of the master key K_m with the receiver key K_h. The value m is equal to (R_tx || 0x0000000000000000).

Now, CTR is a perfectly lovely encryption mode provided that you obey one unbreakable rule: the counter value must never be re-used. Is that satisfied here? Recall that the counter m is actually chosen by another party — the transmitter. This is worrying. If the transmitter wants, it could certainly ask the receiver to encrypt anything it wants under the same counter.

Of course, an honest transmitter won’t do this. But what about a dishonest transmitter? Remember that the transmitter is not authenticated by HDCP. The upshot is that an attacker can pretend to be a transmitter, and submit her own K_m values to be encrypted under K_h and m.

Even this might be survivable, if it weren’t for one last fact: in CTR mode, encryption and decryption are the same operation.

All of this leads to the following attack: 

  1. Observe a legitimate communication between a transmitter and receiver. Capture the values R_tx and E(K_h, K_m) as they go over the wire.
  2. Now: pretend to be a transmitter and initiate your own session with the receiver.
  3. Replay the captured R_tx as your initial transmitter nonce. When you reach the point where you pick the master secret, don’t use a random value for K_m. Instead, set K_m equal to the ciphertext E(K_h, K_m) that you captured earlier. Recall that this ciphertext has the form:

     AES(K_h, R_tx || 000…) ⊕ K_m

     Now encrypt this value under the receiver’s public key and send it along.
  4. Sooner or later the receiver will encrypt the ‘master secret’ you chose above under its internal key K_h. The resulting ciphertext can be expanded to:

     AES(K_h, R_tx || 000…) ⊕ AES(K_h, R_tx || 000…) ⊕ K_m

Thanks to the beauty of XOR, the first two terms of this ciphertext simply cancel out. The result is the original K_m from the first session! Yikes!

This is a huge problem for two reasons. First, K_m is used to derive the session keys used to encrypt HDCP content, which means that you may now be able to decrypt any past HDCP content traces. And even worse, thanks to the ‘pairing’ process, you may be able to use this captured K_m to initiate or respond to further sessions involving this transmitter.
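
In fact, the whole attack fits in a few lines. Here's a sketch under my reading of the spec (keystream block = AES(K_h, R_tx || 0...0)), with the library and sizes chosen for illustration:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_km(k_h: bytes, r_tx: bytes, k_m: bytes) -> bytes:
    # The receiver's pairing encryption: one CTR-style block under K_h,
    # with the 'counter' chosen by the (unauthenticated!) transmitter.
    counter = r_tx + b"\x00" * 8          # 64-bit nonce padded to 16 bytes
    enc = Cipher(algorithms.AES(k_h), modes.ECB()).encryptor()
    keystream = enc.update(counter) + enc.finalize()
    return xor(keystream, k_m)

# Session 1: eavesdrop R_tx and E(K_h, K_m) off the wire.
k_h = os.urandom(16)                      # receiver-internal; never leaks
r_tx, k_m = os.urandom(8), os.urandom(16)
captured = encrypt_km(k_h, r_tx, k_m)

# Session 2: pose as a transmitter, replay r_tx, and submit the captured
# ciphertext as our 'master secret'. Same counter, same keystream: the
# XORs cancel and the receiver hands back the original K_m.
assert encrypt_km(k_h, r_tx, captured) == k_m
```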
 

Did I mention that protocols are hard?

Phase (3): The Locality Check

For all practical purposes, the attack above should be our stopping point. Once you have the stored K_m you can derive the session keys and basically do whatever you want. But just for fun, let’s go on and see what else we can find.

At its heart, the locality check is a pretty simple thing. Let’s assume the transmitter and receiver are both trusted, and have successfully established a session key K_d by running the AKE protocol above. The locality check is designed to ensure that the receiver is nearby — specifically, that it can provide a cryptographic response to a challenge, and can do it in < 7 milliseconds. This is a short enough time that it should prevent people from piping HDCP over a WAN connection.

(Why anyone would want to do this is a mystery to me.)

 
In principle the locality check should be simple. In practice, it turns out to be pretty complicated. Here’s the ‘standard’ protocol:

Simple version of the locality check. K_d is a shared key and R_rx is a receiver nonce.

Now this isn’t too bad: in fact, it’s about the simplest challenge-response protocol you can imagine. The transmitter generates a random nonce R_n and sends it to the receiver. The receiver has 7 milliseconds to kick back a response L’, which is computed as HMAC-SHA256 of {the session key K_d, challenge nonce R_n, and a ‘receiver nonce’ R_rx}. You may recall that the receiver nonce was chosen during the AKE.
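
Here's roughly what that looks like in code, assuming L' = HMAC-SHA256(K_d, R_n || R_rx); the exact input ordering is my reading of the spec:

```python
import hashlib, hmac, os, time

def receiver_response(k_d: bytes, r_n: bytes, r_rx: bytes) -> bytes:
    # L' over the session key, challenge nonce, and receiver nonce.
    return hmac.new(k_d, r_n + r_rx, hashlib.sha256).digest()

def locality_check(k_d: bytes, r_rx: bytes, send_to_receiver) -> bool:
    r_n = os.urandom(8)                    # the challenge
    start = time.monotonic()
    l_prime = send_to_receiver(r_n)        # the round trip we're timing
    elapsed = time.monotonic() - start
    ok = hmac.compare_digest(l_prime, receiver_response(k_d, r_n, r_rx))
    return ok and elapsed < 0.007          # the 7 ms deadline
```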
 
So far this looks pretty hard to beat.
 

But here’s a wrinkle: some devices are slow. Consider that the 7 milliseconds must cover the round-trip communication time, as well as the time required to compute the HMAC. There is a very real possibility that some slower, embedded devices might not be able to respond in time.

Will HDCP provide a second, optional protocol to deal with those devices? You bet it will.

The second protocol allows the receiver to pre-compute the HMAC response before the timer starts ticking. Here’s what it looks like:
 
‘Precomputed’ version of the protocol.

This is nearly the same protocol, with a few small differences. Notably, the transmitter gives the receiver all the time it wants to compute the HMAC. The timer doesn’t start until the receiver says it’s ready.

Of course, there has to be something keeping the RTT under 7ms. In this case the idea is to keep the receiver from speaking until it’s received some authenticator from the transmitter. This consists of the least significant 128 bits of the expected HMAC result (L’), which is computed in the same way as above. The receiver won’t speak until it sees those bits. Then it’ll kick back its own response, which consists of the most-significant 128 bits of the same value.
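
In code the change is tiny: it's the same HMAC, split in half. (Which half counts as 'least significant' is my reading of the spec.)

```python
import hashlib, hmac

def l_prime(k_d: bytes, r_n: bytes, r_rx: bytes) -> bytes:
    return hmac.new(k_d, r_n + r_rx, hashlib.sha256).digest()

full = l_prime(b"k" * 32, b"n" * 8, b"r" * 8)  # stand-in values

# The transmitter reveals one 128-bit half to start the clock; the
# receiver answers with the other half.
transmitter_reveals = full[16:]   # least-significant 128 bits
receiver_answers = full[:16]      # most-significant 128 bits
```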

Ok, so here we have a protocol that’s much more complicated. But considered on its own, this one looks pretty ok by me.

But here’s a funny question: what if we try running both protocols at once?

No, I’m not being ridiculous. You see, it turns out that the receiver and transmitter get to negotiate which protocol they support. By default they run the ‘simple’ protocol above. If both support the pre-computed version, they must indicate this in the AKE_Transmitter_Info and AKE_Receiver_Info messages sent during the handshake.

This leads to the following conjecture: what if, as a man-in-the-middle attacker, we could convince the transmitter to run the ‘pre-computed’ protocol, and at the same time convince the receiver to run the ‘simple’ one? Recall that none of the protocol flags (transmitted during the AKE) are authenticated. We might be able to trick both sides into seeing a different view of the other’s capabilities.

Here’s the setup: we have a receiver running in China, and a transmitter located in New York. We’re a man-in-the-middle sitting next to the transmitter. We want to convince the transmitter that the receiver is close — close enough to be on a LAN, for example. Consider the following attack:

  1. Modify the message flags so that the transmitter thinks we’re running the pre-computed protocol. Since it thinks we’re running the pre-computed protocol, it will hand us R_n and then give us all the time in the world to do our pre-computation.
  2. Now convince the receiver to run the ‘simple’ protocol. Send R_n to it, and wait for the receiver to send back the HMAC result (L’).
  3. Take a long bath, mow the lawn. Watch Season 1 of Game of Thrones.
  4. At your leisure, send the RTT_READY message to the transmitter, which has been politely waiting for the receiver to finish the pre-computation.
  5. The transmitter will now send us some bits. Immediately send it back the most significant bits of the value L’, which we got in step (2).
  6. Send video to China.

Now this attack may not always work — it hinges on whether we can convince the two parties to run different protocols. Still, this is a great teaching example in that it illustrates a key problem in cryptographic protocol design: parties may not share the same view of what’s going on.

A protocol designer’s most important job is to ensure that such disagreements can never happen. The best way to do this is to ensure that there’s only one view to be had — in other words, dispense with all the options and write a single clear protocol. But if you must have options, make sure that the protocol only succeeds if both sides agree on what those options are. This is usually accomplished by authenticating the negotiation messages, but even this can be a hard, hard problem.

Compared to the importance of learning those lessons, actually breaking localization is pretty trivial. It’s a stupid feature anyway.

In Conclusion

This has been a long post. To the readers I have left at this point: thanks for sticking it out.

The only remaining thing I’d like to say is that this post is not intended to judge HDCPv2, or to make it look bad. It may or may not be a good protocol, depending on whether I’ve understood the specification properly and depending on whether the above flaws make it into real devices. Which, hopefully they won’t now.

What I’ve been trying to do is teach a basic lesson: protocols are hard. They can fail in ruinous, subtle, unexpected, exciting ways. The best cryptographers — working with BAN logic analyzers and security proofs — still make mistakes. If you don’t have those tools, steer clear.

The best ‘fix’ for the problem is to recognize how dangerous protocols can be, and to avoid designing your own. If you absolutely must do so, please try to make yours as simple as possible. Too many people fail to grok this lesson, and the result is, well, HDCPv2.

===

Update 8/27: As I mentioned above, DCP has released a new version of the specification. Version 2.2 includes several updates: it changes the encryption of Km to incorporate both the Transmitter and Receiver nonces. It also modifies the locality check to patch the bug described above. Both of these changes appear to mitigate the bugs above, at least in new devices.

A missing post (updated)

Update (8/27): I’ve put the original post back up.

Update (8/9): I’ve re-written this post to include a vague, non-specific explanation of the bug. I’ve now confirmed the problem with one vendor, who has asked for a week to issue a patch. So far I haven’t had a response from the DCP’s technical people. And yes, I do realize someone PasteBinned the original post while it was up.

A few people have asked what happened to the post that was in this space just a few hours ago. No, you’re not going crazy. It was here.

The post contained a long, detailed evaluation of the HDCP v2 protocol. My idea was to do real-time evaluation of an industry protocol that hasn’t been deployed yet — a kind of ‘liveblogging’ cryptanalysis. What I expected to find was some bad practices, which I would gently poke fun at. I didn’t expect to find anything serious.

I was wrong in that initial judgement, with some caveats. I’m going to give a vague and non-specific summary here, and I hope to re-post the detailed technical post in a few days when I’ve heard (something, anything!) from DCP, the organization that maintains HDCP.

In case you’ve never heard of it, HDCP is a security protocol used to ‘protect’ video traveling over wired and wireless networks. There are two versions. Version 1 is in your TV today, and was seriously compromised in 2010. Version 2 is much better, but has only been deployed in a few products — including those that implement MiraCast (formerly Wi-Fi Display).

Version 2 contains a key agreement protocol that’s designed to establish a session encryption key between a transmitter (your phone, for example) and a receiver (a TV). Once this key is established, the transmitter can encrypt all video data going over the wire.

What I discovered in my brief analysis is a flaw in the key agreement protocol that may allow a man-in-the-middle to recover the session key (actually the ‘master secret’ used to derive the session key). This could potentially allow them to decrypt content. More on that in a minute, though.

I also discovered some slightly less serious flaws elsewhere in the protocol. It turns out that the DCP already knows about those, thanks to some enterprising work by a smart guy at an unnamed vendor (who deserves credit, and will get it once I put the original post back up).

Now for a few big caveats about the session key bug.

The bug I found does not get you all the way to decrypting HDCPv2 streams in practice, thanks to a tiny additional protection I missed while writing the original post. I don’t think much of this protection, since it involves a secret that’s stored in every single HDCPv2-compliant device. That’s a pretty lousy way to keep a secret.

And of course I haven’t personally verified this in any real HDCP devices, since I don’t own any. Although if I did, I could use this nifty HDCP plugin for WireShark to do some of the work.

The issue has been confirmed by one vendor, who is working on a patch for their product. Their products are used in real things that you’ve heard of, so I’m trusting that they’d know.

The last thing I want to address is why I published this, and why I subsequently pulled it.

When I wrote the original post I thought HDCP v2 was just a ‘paper spec’ — that there were no devices actually using it. Shortly after posting, I came across one commercial product that does claim to support HDCPv2. Later I discovered a few others. To be on the safe side, I decided to pull the post until I could notify the vendors. Then through sheer ineptitude I briefly re-posted it. Now I’m doing my best to put the toothpaste back in the tube.

As soon as I get some feedback I intend to put the post back up. A post which, incidentally, was not intended to break anything, but rather to serve as a lesson in just how complicated it is to design your own protocol. I suppose it’s achieved that goal.

Anyway, I’m putting this up as a placeholder in case you’re curious about what happened or why the heck I’m not blogging. Writing a long technical post and then having to can it is a drag. But hopefully we’ll be back to our regularly-scheduled programming in no time at all.

Where Things Fall Apart: Protocols (Part 2 of 2)

This is the fourth installment in a series on cryptographic engineering.  You can see the previous posts here:
 

Test Drive

Your vehicle ignition system is the result of a Darwinian arms race between the people who build cars — and those who like to drive them.

Vehicle ignition switches through history.  From left: vintage floor-mounted ignition button; 60s-era cylinder lock; 90s/2000-era high security key; 2011 dash-mounted ignition button.

The very first electronic vehicle ignition was nothing more than a switch that completed an electrical circuit. This worked fine in small towns and out on the farm, but things weren’t so simple in the big city. So manufacturers adapted, first adding a mechanical lock cylinder then hardening the wiring. This worked for a while, until inevitably the thieves got smarter. Worse, at this point the answer wasn’t so obvious: ignition lock technology stagnated. By the late 1980s and early 1990s, vehicle theft was a multi-billion dollar industry.

A few luxury manufacturers tried to improve the physical security of their locks using high-security keys and non-standard templates. But for most manufacturers there was already a more promising approach at hand. Cars themselves were becoming reliant on microcontrollers for engine control. Why not use a digital lock?

The result is the vehicle immobilizer. A typical first-gen immobilizer used a small chip embedded into the head of the car key. This chip had a single purpose: when the driver inserted the key into the lock cylinder, it would squirt out a code (or serial number), which could be received by an antenna in the lock cylinder. If this code matched what the vehicle expected to see, the engine would start. Expressed as a protocol, the transaction looked like this:
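
It's simple enough to render in a few lines. All names and values below are invented:

```python
# First-generation immobilizer, schematically: the key chip squirts a
# fixed code, and the car string-compares it against what it expects.
EXPECTED_CODE = 0x1A2B3C4D

def key_chip_code() -> int:
    return 0x1A2B3C4D   # never changes, so a replay is as good as a key

def car_allows_start(code: int) -> bool:
    return code == EXPECTED_CODE

assert car_allows_start(key_chip_code())  # and a code-grabber's replay, too
```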

Immobilizers effectively shut down traditional hotwiring and lock-picking. But they had a fatal flaw that criminals soon discovered. Since the code never changed, someone with the right equipment could eavesdrop on the communication (or borrow your keys), and later replay it to the car. This sounds complicated, but quickly became practical thanks to inexpensive devices called “code-grabbers”.

Once again manufacturers adapted. The next generation of immobilizers dropped the fixed key in favor of a simple challenge/response authentication protocol. In this approach, the immobilizer chip and car share a cryptographic key of some sort. When you insert your car key, the car generates a random “challenge” number and sends it to the key. The chip in your car key uses the cryptographic key to compute a response based on the challenge. This tops code-grabbers, since the key itself never goes over the air, and the challenge always changes.

Challenge response protocol between vehicle and immobilizer key.  The key and car share a deterministic cryptographic algorithm F() and a secret key. The car computes F(key, challenge) and compares it to the response value.

So we’ve laid out a cryptographic protocol, but it’s a simple one, and one that’s likely to be effective. What could go wrong?

40 bits of personal history

Back in 2005, along with some colleagues, I looked at the immobilizer system used in a few million Ford, Toyota and Nissan vehicles. This particular system was based on a Texas Instruments chip called the Digital Signature Transponder (DST), a tiny RFID chip with a cipher F() and a 40-bit secret key.

Two DST form factors.  The big one is a Mobil Speedpass, which also relies on the DST technology.

The DST uses exactly the challenge-response protocol I describe above. The reader (car) sends it a 40 bit challenge, the DST encrypts that value with its cipher, truncates the result down to a 24 bit response, ships it back. The car also has a copy of the secret key which it uses to verify the response.

The problem with the DST is not the protocol. Rather, it’s the number I mentioned above: 40. As in 40-bit key length. If an adversary — say, a malicious parking attendant — borrows your car key, she can issue a challenge to the chip. After collecting the response, she can, at her leisure, test every single one of the approximately 1.1 trillion possible Immobilizer keys until she finds one where F(key, challenge) is equal to the response she got from your DST chip.** This sounds hard, but it only takes a few hours on an FPGA.
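
Here's the shape of that attack. The real DST cipher isn't public, so dst40() below is a toy stand-in built from SHA-256, purely so the sketch executes:

```python
import hashlib

def dst40(key: int, challenge: int) -> int:
    # Toy stand-in for TI's proprietary 40-bit cipher (not public).
    data = key.to_bytes(5, "big") + challenge.to_bytes(5, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:5], "big")

def clone_key(challenge: int, response_24: int):
    # Grind all 2^40 candidates. A few hours on a 2005-era FPGA;
    # hopeless in pure Python, but it shows the shape of the search.
    for key in range(2 ** 40):
        if dst40(key, challenge) >> 16 == response_24:  # truncate to 24 bits
            return key  # a candidate; weed out collisions (see footnote **)
    return None
```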

This process is called “cloning”. It’s not the scariest attack since, in general, it requires the adversary to get your car key, or at least get close enough to scan it.

Now we come to the interesting part. Texas Instruments was well aware of the DST’s keysize limitation. To foil this attack, they deployed a new chip called the DST+. Rather than simply replace the weak 40-bit algorithm with a better one, which would have solved the problem, they decided to address cloning attacks using a protocol.

DST+ Mutual Authentication protocol.  From a presentation at the Fourth Conference on the Advanced Encryption Standard (AES) (2004).

What I know about the DST+ protocol comes from a public presentation posted by Dr. Ulrich Kaiser from Texas Instruments Germany. I freely admit that I’ve never verified this diagram against a real DST+, so caveat emptor.

The diagram is a little confusing: let’s walk through it step by step. The car (“Reader”) is on the left, and the DST+ chip (“Transponder”) is on the right. For the most part it’s exactly the same as the DST: the car generates a 40-bit challenge and sends it over to the chip. The chip encrypts the challenge under its secret (“Immobilizer”) key, truncates the result down to 24 bits, and sends back the response.
 
The new stuff is “tacked on” to the original protocol. Along with the challenge, the car transmits an extra 24 bits that it derives by: 1) encrypting its own challenge under the Immobilizer key, 2) encrypting the result again under a second “Mutual Authentication Key” (MAK), and 3) truncating that result down to 24 bits.

Since the DST+ chip shares both keys, it can verify that the car’s transmission is correct before it responds. If the value isn’t correct, the DST+ clams up. No response for you!

In principle this stumps our malicious valet. He doesn’t know the right keys. He can send the DST+ all the challenges he wants — but it won’t answer him. No responses, no attack.

All’s well that ends well?

A first observation is that the DST+ protocol only protects against challenges sent by an unauthorized reader. If our valet can eavesdrop on the communication between the DST+ and the legitimate reader in the car, he can still obtain a (challenge, response) pair.  Since these values are identical to those in the original DST protocol, the same attacks apply. He can use an FPGA to brute force the 40-bit Immobilizer key.

Here’s something else. Once he’s got the car’s Immobilizer key, he can go back and find the Mutual Authentication Key (MAK).  Given the challenge sent by the car, along with the 24-bit “additional authentication” string, he can:

  1. compute I = F(Immobilizer key, challenge),
  2. use the FPGA to test every single possible MAK value
  3. stop when he finds a MAK value s.t. F(MAK, I) matches the “additional authentication”.
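
Continuing with the same toy dst40() stand-in from above, recovering the MAK is the same grind pointed at a different equation:

```python
def recover_mak(imm_key: int, challenge: int, extra_auth_24: int):
    i = dst40(imm_key, challenge)           # step 1
    for mak in range(2 ** 40):              # step 2: another 2^40 search
        if dst40(mak, i) >> 16 == extra_auth_24:
            return mak                      # step 3: candidate MAK
    return None
```
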
This might seem a little pointless. After all, we already have the Immobilizer key, which is all we need to simulate a DST+ and thus start the car. Why bother going back for the MAK?

Into hypothetical territory

Yet imagine… What if a car manufacturer made a tiny mistake.  What if, speaking hypothetically, the manufacturer decided to use a single MAK across many different cars — say, every 2009 Toyota Camry? A tiny, harmless optimization.

And yet.

We know that knowledge of the Immobilizer Key makes it easy to find the car’s MAK.  But this works the other way, too. If many cars share a MAK, then anyone who knows that value can use it to derive the Immobilizer key for a car.

Even better (or worse, depending on your point of view) our attacker can do this without ever seeing the car key at all.  All you need is a challenge value and “additional authentication” value, both of which the car will give to you. The owner can be fast asleep with his keys safe on his nightstand next to him. You’re outside stealing his car.

So in other words, if you use the DST+ mutual authentication protocol, and make the small mistake of re-using a MAK across multiple vehicles, you’ve transformed a mild key cloning attack into something much worse. People can now steal your car even without scanning your key.

Please keep in mind that all of this is hypothetical and speculative. But the re-use of a MAK key could happen.  There’s evidence that it may have, at least in the past. What it goes to show is that if you’re not very careful about your goals and security properties, protocols can do unexpected things.  They can make you less secure.

Rolling it up

These posts were not intended to be an in-depth tutorial on the mysteries of protocol design and analysis.  I do hope to talk about that more in the future.  So far we’ve barely scratched the surface of what can go wrong in a cryptographic protocol.  And these are certainly not the best examples of “bad” protocols.

Instead, the purpose of this discussion was to provide a couple of case studies involving real protocols whose failure has implications for millions of people.  It was also to show you how tiny changes to a protocol can have a significant impact.

In the next few installments of this overview series we’ll look a bit at hardware, physical security, and the kind of things that can go wrong even when you build the best machines with the best intentions.

Footnotes:

** Note, since the response is only 24 bits long, there’s a possibility of collisions, i.e., many different keys will satisfy truncate(F(key, challenge)) == response.  To deal with this problem, all you need to do is obtain a second (challenge, response) pair and weed out the false positives.

Where Things Fall Apart: Protocols (Part 1 of 2)

This is the third installment in a series on cryptographic engineering.  You can see the previous posts here:
 

Protocols

Once upon a time there were no cryptographic protocols. Most cryptographic communication took the form of one-move operations like this one:

 

There are plenty of things that can go wrong in this scenario. But if you’re the sender (as opposed to the sap carrying the message) you could restrict your concerns to the security of the primitive (e.g., cipher), or perhaps to the method of key distribution.  Admittedly, through most of history it was almost impossible to get these things “right”.  Still, you knew where your weaknesses were.

Fast forward to present day. Modern cryptographic systems are almost never content with simple one-way communication. Consider the simple act of plugging in your high-definition TV via a digital connection. Thanks to modern rights management protocols such as HDCP and DTCP, this can lead to communication flows like this:

Not only is the communication here vastly more complicated, but it takes place in the presence of a more dangerous adversary.  For one thing, he owns the communication channel and the hardware — it’s probably sitting in his living room or lab.  Most importantly, he’s got time on his hands.  Your typical barbarian only got one chance to capture a ciphertext.  If this guy needs to, he can run the protocol 100,000 times.

Now, most people will tell you that cryptography is one of the most settled areas of computer security.  Relative to the disaster that is software security, this is true. Particularly when it comes to standardized cryptographic primitives, we’ve made enormous progress — to the point where attacks lag by years, if not decades (though, see my previous post).  If you use reasonable primitives, your attacker will not get you by breaking the algorithm.

As Gibson said, the future is here, but it’s not evenly distributed. And in the field of cryptography, protocol design is one area in which progress has definitely not been as fast.

We’ve learned a huge amount about what not to do.  We have tools that can sometimes catch us when we trip up.  Unfortunately, we still don’t have a widely-accepted methodology for building provably-secure protocols that are as practical as they are safe to use.

Maybe I’m getting ahead of myself.  What do I even mean when I refer to a “cryptographic protocol”?

For the purposes of this discussion, a protocol is an interactive communication between two or more parties. Obviously a cryptographic protocol is one that combines cryptographic primitives like encryption, digital signing, and key agreement to achieve some security goal.

What makes cryptographic protocols special is the threat.  When we analyze cryptographic protocols, we assume that the protocol will be conducted in the presence of an adversary, who (at minimum) eavesdrops on all communications. More commonly we assume that the adversary also has the ability to record, inject and modify communications flowing across the wire.  This feature is what makes cryptographic protocols so much more interesting — and troublesome — than the other types of protocol that crop up in computer networking.

A first example: SSL and TLS

SSL and its younger cousin TLS are probably the best-known security protocols on the Internet. The most recent version is TLS 1.2 and we think it’s a pretty solid protocol now. But it didn’t achieve this status in one fell swoop. The history of each SSL/TLS revision includes a number of patches, each improving on various flaws found in the previous version.

The original SSL v1 was a proprietary protocol created by Netscape Inc. back in 1994 to secure web connections between a browser and a web server. SSL v2 was standardized the next year. The protocol is a highly flexible framework by which two parties who have never met before can authenticate each other, agree on transport keys, and encrypt/authenticate data. Part of the flexible nature of the protocol is that it was designed to support many different ciphersuites and modes of operation. For example, it’s possible to configure SSL to authenticate data without encrypting it, even though almost nobody actually does this.

Now let me be clear. There’s nothing wrong with flexibility, per se. The benefits of flexibility, extensibility and modularity are enormous in software design — there’s nothing worse than re-factoring 100,000 lines of production code because the original designers didn’t consider the possibility of new requirements. It’s just that when it comes to building a secure cryptographic protocol, flexibility almost always works against you.

Let’s consider a concrete example. The SSL v2 protocol included a very nice feature known as “ciphersuite negotiation”. This allowed both parties to negotiate which ciphers they wanted to use from a (potentially) distinct list of candidates that each one supported. In practice, some of those ciphers were likely to be better than others. Some were designed to be worse: for example, they were created to support export-controlled browsers that max out at a 40-bit key length.

There’s nothing fundamentally wrong with ciphersuite negotiation. It clearly gives a lot of flexibility to SSL, since the protocol can easily be upgraded to support new ciphersuites without breaking compatibility with older implementations.  From the spec, you can see that this section of the protocol was treated as something of an abstraction by the designers. The negotiation is handled via its own dedicated set of messages.  Looking at this, you can almost hear some long-ago dry erase marker scribbling “ciphersuite negotiation messages go here”.

Unfortunately, in providing all of this flexibility, the designers of SSL v2 essentially created many implicit protocols, each of which could be instantiated by changing the details of the ciphersuite negotiation process. Worse, unlike the messages in the “main” protocol, the ciphersuite negotiation messages exchanged between client and server in the SSLv2 protocol were not authenticated.

In practice, this means that our adversary can sit on the wire in between the two, replacing and editing the messages passing by to make it look like each party only supports the lowest common ciphersuite. Thus, two parties that both support strong 128-bit encryption can be easily tricked into settling on the 40-bit variant.  And 40 bits is not good.
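
The failure reduces to a few lines. The ciphersuite names here are illustrative, not SSLv2's actual identifiers:

```python
# Toy downgrade: the negotiation isn't authenticated, so a MiTM can
# pare both sides' offers down to the weakest suite they share.
CLIENT_OFFERS = ["RC4_128", "3DES_168", "RC4_40_EXPORT"]
SERVER_OFFERS = ["RC4_128", "RC4_40_EXPORT"]

def mitm_rewrite(offers):
    return [s for s in offers if s.endswith("_EXPORT")]  # strip the strong ones

common = [s for s in mitm_rewrite(CLIENT_OFFERS) if s in SERVER_OFFERS]
assert common == ["RC4_40_EXPORT"]  # both sides 'agree' on 40 bits
```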

Now, the good news is that this vulnerability is a piece of history.  It was closed in SSL v3, and the fix has held up well through the present day.  But that doesn’t mean TLS and SSL are home free.

This post is continued in the next section: “Where Things Fall Apart: Protocols (Part 2)”.