Falling through the KRACKs

The big news in crypto today is the KRACK attack on WPA2 protected WiFi networks. Discovered by Mathy Vanhoef and Frank Piessens at KU Leuven, KRACK (Key Reinstallation Attack) leverages a vulnerability in the 802.11i four-way handshake in order to facilitate decryption and forgery attacks on encrypted WiFi traffic.

The paper is here. It’s pretty easy to read, and you should.

I don’t want to spend much time talking about KRACK itself, because the vulnerability is pretty straightforward. Instead, I want to talk about why this vulnerability continues to exist so many years after WPA was standardized. And separately, to answer a question: how did this attack slip through, despite the fact that the 802.11i handshake was formally proven secure?

A quick TL;DR on KRACK

For a detailed description of the attack, see the KRACK website or the paper itself. Here I’ll just give a brief, high level description.

The 802.11i protocol (also known as WPA2) includes two separate mechanisms to ensure the confidentiality and integrity of your data. The first is a record layer that encrypts WiFi frames, to ensure that they can’t be read or tampered with. This encryption is (generally) implemented using AES in CCM mode, although there are newer implementations that use GCM mode, and older ones that use RC4-TKIP (we’ll skip these for the moment.)

The key thing to know is that AES-CCM (and GCM, and TKIP) is a stream cipher, which means it’s vulnerable to attacks that re-use the same key and “nonce”, also known as an initialization vector. 802.11i deals with this by constructing the initialization vector using a “packet number” counter, which initializes to zero after you start a session, and always increments (up to 2^48, at which point rekeying must occur). This should prevent any nonce re-use, provided that the packet number counter can never be reset.
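To see why nonce re-use is fatal for a stream cipher, here's a minimal sketch in Python. It uses a hash-based toy keystream as a stand-in for AES-CTR (not real 802.11i code): encrypting two frames under the same key and packet-number nonce makes the keystreams cancel, leaking the XOR of the two plaintexts.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Deterministic keystream from (key, nonce) -- a toy stand-in for
    AES-CTR. Same key + same nonce always yields the same keystream."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"sixteen byte key"
nonce = (0).to_bytes(6, "big")   # packet number 0 used as the nonce

p1 = b"attack at dawn!!"
p2 = b"retreat at dusk!"

c1 = encrypt(key, nonce, p1)     # first frame, packet number 0
c2 = encrypt(key, nonce, p2)     # counter was reset: same nonce again!

# The keystreams cancel out: c1 XOR c2 equals p1 XOR p2,
# leaking the relationship between the two plaintexts.
assert xor(c1, c2) == xor(p1, p2)
```

As long as the packet number only ever increments, this situation can never arise; KRACK's whole trick is making the counter go backwards.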

The second mechanism you should know about is the “four way handshake” between the AP and a client (supplicant) that’s responsible for deriving the key to be used for encryption. The particular message KRACK cares about is message #3, which causes the new key to be “installed” (and used) by the client.

The four-way handshake. Client is on the left, AP is on the right. (Courtesy Wikipedia, used under CC.)

The key vulnerability in KRACK (no pun intended) is that the acknowledgement to message #3 can be blocked by adversarial nasty people.* When this happens, the AP re-transmits this message, which causes (the same) key to be reinstalled into the client (note: see update below*). This doesn’t seem so bad. But as a side effect of installing the key, the packet number counters all get reset to zero. (And on some implementations like Android 6, the key gets set to zero — but that’s another discussion.)
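A toy model of the flawed installation logic (a hypothetical sketch, not actual wpa_supplicant code) makes the side effect concrete: re-processing message #3 resets the packet number to zero, so the next frame re-uses a nonce that has already been spent.

```python
class ToySupplicant:
    """Sketch of the key-installation behavior KRACK exploits.
    Hypothetical model for illustration, not real supplicant code."""

    def __init__(self):
        self.ptk = None
        self.packet_number = None

    def on_message_3(self, session_key: bytes):
        # Installing the key -- even RE-installing the same key on a
        # retransmitted message #3 -- resets the packet number to zero.
        self.ptk = session_key
        self.packet_number = 0

    def next_nonce(self) -> int:
        n = self.packet_number
        self.packet_number += 1   # normally increments monotonically
        return n

sup = ToySupplicant()
sup.on_message_3(b"PTK")
used = [sup.next_nonce(), sup.next_nonce()]   # nonces 0, 1 consumed

sup.on_message_3(b"PTK")       # AP retransmits message #3
reused = sup.next_nonce()      # counter reset: nonce 0 comes back
assert reused in used          # nonce re-use under the same key
```

A correct implementation would either refuse to reinstall an already-installed key or, at minimum, leave the packet counters untouched on reinstallation.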

The implication is that by forcing the AP to replay this message, an adversary can cause a connection to reset nonces and thus cause keystream re-use in the stream cipher. With a little cleverness, this can lead to full decryption of traffic streams. And that can lead to TCP hijacking attacks. (There are also direct traffic forgery attacks on GCM and TKIP, but this is as far as we'll go for now.)
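The "little cleverness" is mostly known plaintext: if the attacker can guess the contents of one frame encrypted under the repeated nonce, XORing that guess against its ciphertext recovers the keystream, which then decrypts any other frame encrypted under the same (key, nonce) pair. A minimal sketch, with illustrative byte strings rather than real WiFi frames:

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Two ciphertexts captured under the same (key, nonce), i.e. the same
# keystream ks:  c1 = p1 XOR ks,  c2 = p2 XOR ks.
ks = bytes(range(16))                 # stand-in for the shared keystream
p1 = b"GET / HTTP/1.1\r\n"            # known or easily guessed plaintext
p2 = b"secret cookie ok"              # the victim's sensitive traffic
c1, c2 = xor(p1, ks), xor(p2, ks)

recovered_ks = xor(c1, p1)            # known plaintext reveals the keystream
assert xor(c2, recovered_ks) == p2    # ...which decrypts the other frame
```

Protocol headers and common requests make excellent known plaintext, which is why nonce re-use so reliably turns into full decryption in practice.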

How did this get missed for so long?

If you’re looking for someone to blame, a good place to start is the IEEE. To be clear, I’m not referring to the (talented) engineers who designed 802.11i — they did a pretty good job under the circumstances. Instead, blame IEEE as an institution.

One of the problems with IEEE is that the standards are highly complex and get made via a closed-door process of private meetings. More importantly, even after the fact, they’re hard for ordinary security researchers to access. Go ahead and google for the IETF TLS or IPSec specifications — you’ll find detailed protocol documentation at the top of your Google results. Now go try to Google for the 802.11i standards. I wish you luck.

The IEEE has been making a few small steps to ease this problem, but they’re hyper-timid incrementalist bullshit. There’s an IEEE program called GET that allows researchers to access certain standards (including 802.11) for free, but only after they’ve been public for six months — coincidentally, about the same time it takes for vendors to bake them irrevocably into their hardware and software.

This whole process is dumb and — in this specific case — probably just cost industry tens of millions of dollars. It should stop.

The second problem is that the IEEE standards are poorly specified. As the KRACK paper points out, there is no formal description of the 802.11i handshake state machine. This means that implementers have to implement their code using scraps of pseudocode scattered around the standards document. It happens that this pseudocode leads to the broken implementation that enables KRACK. So that’s bad too.

And of course, the final problem is implementers. One of the truly terrible things about KRACK is that implementers of the WPA supplicant (particularly on Linux) managed to somehow make Lemon Pledge out of lemons. On Android 6 in particular, replaying message #3 actually sets an all-zero key. There’s an internal logic behind why this happens, but Oy Vey. Someone actually needs to look at this stuff.

What about the security proof?

The fascinating thing about the 802.11i handshake is that despite all of the roadblocks IEEE has thrown in people’s way, it (the handshake, at least) has been formally analyzed. At least, for some definition of the term.

(This isn’t me throwing shade — it’s a factual statement. In formal analysis, definitions really, really matter!)

A paper by He, Sundararajan, Datta, Derek and Mitchell (from 2005!) looked at the 802.11i handshake and tried to determine its security properties. What they determined is that yes, indeed, it did produce a secret and strong key, even when an attacker could tamper with and replay messages (under various assumptions). This is good, important work. The proof is hard to understand, but this is par for the course. It seems to be correct.

Representation of the 4-way handshake from the paper by He et al. Yes, I know you’re like “what?“. But that’s why people who do formal verification of protocols don’t have many friends.

Even better, there are other security proofs showing that — provided the nonces are never repeated — encryption modes like CCM and GCM are highly secure. This means that given a secure key, it should be possible to encrypt safely.

So what went wrong?

The critical problem is that while people looked closely at the two components — handshake and encryption protocol — in isolation, apparently nobody looked closely at the two components as they were connected together. I’m pretty sure there’s an entire geek meme about this.

Two unit tests, 0 integration tests, thanks Twitter.

Of course, the reason nobody looked closely at this stuff is that doing so is just plain hard. Protocols have an exponential number of possible cases to analyze, and we’re just about at the limit of the complexity of protocols that human beings can truly reason about, or that peer-reviewers can verify. The more pieces you add to the mix, the worse this problem gets.

In the end we all know that the answer is for humans to stop doing this work. We need machine-assisted verification of protocols, preferably tied to the actual source code that implements them. This would ensure that the protocol actually does what it says, and that implementers don’t further screw it up, thus invalidating the security proof.

This needs to be done urgently, but we’re so early in the process of figuring out how to do it that it’s not clear what it will take to make this stuff go live. All in all, this is an area that could use a lot more work. I hope I live to see it.


* Update: An early version of this post suggested that the attacker would replay the third message. This can indeed happen, and it does happen in some of the more sophisticated attacks. But primarily, the paper describes forcing the AP to resend it by blocking the acknowledgement from being received at the AP. Thanks to Nikita Borisov and Kyle Birkeland for the fix!

28 thoughts on “Falling through the KRACKs”

  1. I don’t know how the underlying protocol works, but if the attacker manages to reset the client’s view of the packet counter, doesn’t it mean the client’s view doesn’t match the AP’s view anymore? Does the AP still accept the client’s packets after that point? I’m certainly misunderstanding something.

    1. Short answer: yes, you are right. The client and AP are not on the same page after message #3; from the client's point of view the handshake is complete and packets will start flowing. But this only lasts a short period of time: the time it takes the AP to realize that message #3 never got through. The AP will retransmit message #3 up to 19 times, giving an approximately 20-second gap.

  2. This article is excellent, except for two things. First: no, it’s not hard to put these proofs together, and there is no exponential blowup involved. If the standards people had required a proof before approving the protocol, it would have easily been done (and, in this case, the bug found). And second, while it’s good to prove that implementation code correctly implements its spec (I do it often), it’s a very different kind of task and the protocol itself is the correct verification target here.

  3. The explanation of that third message issue is lacking. If the client is blocked from receiving the first message but receives the second, how is that a problem? It got one message, as it expected.

  4. AFAIK they did release 802.11i draft 3 (if I remember correctly) to the public in late 2002 or so. It was not free, but you could buy it.

  5. > The key vulnerability in KRACK (no pun intended) is that message #3 can
    > be blocked by adversarial nasty people. When this happens, the AP re-
    > transmits this message…

    This is incorrect. As this protocol operates over the airwaves, it is impossible to block any reception of transmitted data, unless you're using a jammer of some sort (I'm not clear on how radio jamming works). In any case, what the attack described in the paper does instead is fake message #3 from the AP, which in effect achieves the same thing. (Note: see update below.)

    Update: an earlier version of this post implied that you hadn’t corrected your mistake when you actually had. Sorry for the confusion.

    1. Not true. From the paper:

      > First, the adversary uses a channel-based MitM attack
      > so she can manipulate handshake messages [70]. Then she blocks
      > message 4 from arriving at the authenticator.

      Furthermore the selective jamming techniques that you describe do in fact exist:

      > Inspired by this observation, an adversary could also
      > selectively jam message 4 [70],

      [70] Mathy Vanhoef and Frank Piessens. 2014. Advanced Wi-Fi attacks using commodity hardware. In ACSAC.

    2. To be more specific, the way to block transmissions is to get the client on a different channel than the AP. This is pretty easy to do since most people don’t manually configure a channel into their Wi-Fi settings. Now you present the client with a fake AP and the AP with a fake client, and selectively forward transmissions between them. Since the real client and real AP are on different channels, they won’t see each other’s transmissions directly.

      1. You can also block a transmission by just transmitting enough wide-band noise[1] during the transmission that the receiver's signal-to-noise level drops below the detection threshold[2]. Military-grade spread spectrum is designed to be hard to jam (and hard to detect), but commercial-grade spread spectrum just goes for noise immunity.

        [1] If the jammer transmits garbage or a spoofed packet[2] using the same spread spectrum modulation the system is using, so much the better (it needs less power). However, even a pure RF signal can cause enough LNA compression to disrupt reception. (The LNA is the low-noise amplifier that amplifies the signal from the antenna.)

        [2] Bonus I’ve read of people using this technique to spoof the payload of a transmitted packet.

      2. Is this possible? From the paper it looks as though the retransmitted message 3 is from the attacker acting as the AP on a different channel, and the blocked message 4 is on the channel with the “real” AP. The client accepts all messages from all channels with a given SSID:MAC pair, otherwise this wouldn't work (from memory of the paper).

  6. “The key thing to know is that AES-CCM (and GCM, and TKIP) is a stream cipher”

    My wife says AES-CCM is a block cipher.

    1. AES is a block cipher. AES-CCM is “AES-[CTR with CBC-MAC]”, and CTR is a tool (mode) for _building_ a stream cipher _from_ a block cipher, while CBC-MAC builds a MAC from same.

      In short, AES-CCM is an AEAD made from a stream cipher and a MAC, both of which are in turn built from a block cipher.
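To illustrate the commenter's point, here is a toy sketch of CTR mode, with a hash function standing in for the AES block cipher (illustrative only, not a real cipher): encrypting successive counter blocks produces a keystream, which is exactly what makes block-cipher-based AES-CCM behave like a stream cipher at the record layer.

```python
import hashlib

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: maps a 16-byte block to 16 bytes under a key.
    (A hash truncation, for illustration only -- not a real PRP.)"""
    return hashlib.sha256(key + block).digest()[:16]

def ctr_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """CTR mode: encrypt successive counter blocks to get a keystream.
    This is how a block cipher is turned into a stream cipher."""
    out = b""
    ctr = 0
    while len(out) < length:
        counter_block = nonce + ctr.to_bytes(16 - len(nonce), "big")
        out += toy_block_cipher(key, counter_block)
        ctr += 1
    return out[:length]

def ctr_encrypt(key: bytes, nonce: bytes, msg: bytes) -> bytes:
    ks = ctr_keystream(key, nonce, len(msg))
    return bytes(m ^ k for m, k in zip(msg, ks))

key, nonce = b"k" * 16, b"\x00" * 8
msg = b"any length works, unlike raw block encryption"
ct = ctr_encrypt(key, nonce, msg)
assert ctr_encrypt(key, nonce, ct) == msg   # XOR stream: decrypt == encrypt
```

Since the ciphertext is just plaintext XOR keystream, all the nonce re-use consequences discussed in the post apply directly, even though AES itself is a block cipher.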

  7. > And of course, the final problem is implementers. One of the
    > truly terrible things about KRACK is that implementers of the
    > WPA supplicant (particularly on Linux) managed to somehow
    > make Lemon Pledge out of lemons.

    Cheap and uncalled for.

    If you are going to attack someone’s body of work based on one implementation error, just ask yourself how much software of similar weight and significance you have contributed recently. hostapd and wpa_supplicant have been an integral part of the OSS (and non-OSS) wifi ecosystem for more than a decade and how many other critical issues have been uncovered during that period?

    Have you ever read hostapd/wpa_supplicant code or consulted their mailing-list? Because if you did, you would know that your description of “the implementers” is careless, inflammatory and without merit.

    1. Not an insult. Just a factual statement. The nonce reset was a problem, all by itself, but setting the key to 0 turned a problematic bug (with a challenging path to exploitation) into a really catastrophic one. Notably, of all the major implementations *only* the Linux/Android6 one had this property. In fact, iOS and Windows actually rejected the improper key installation message.

      When you have N implementations of a standard, each with its own slightly different unique behavior *and* the standard itself is flawed, you have a problem. This can’t be ignored. The probable answer — as I say at the end of the post — is to formally verify that implementations match the functional spec, after you’ve proven that the functional spec is secure.

  8. My wife also says there’s a newer security proof for the 4-way handshake, see https://eprint.iacr.org/2017/253.pdf section 5.2, “Analyzing the 4-Way-Handshake”, specifically Game 1 looks like it assumes no nonce reuse: “This game proceeds as the previous one, but aborts if not all the nonces in the game are distinct” (but I’m not an expert in this myself, so maybe there’s a more relevant bit; I guess your point stands nonetheless).

  9. Naive question. If something has gone wrong in the production of a food or a drug or a vehicle, and consumers are at risk, a recall is announced and the producer pays to replace everyone’s stuff. Am I right in thinking there’s no equivalent process for this kind of product and this kind of vulnerability? Shouldn’t there be?

    1. Perhaps not as depressing as Intel’s ME and AMD’s PSP which have access to everything including network interface…

    2. Understand the sentiment, but your phrasing is incorrect. This is not a “closed protocol” in the traditional sense (vendor controlled, knowledge of which requires an NDA). It is a standard which requires money to access. You can even have a formal role in influencing the standard (by joining IEEE and the relevant standards committee), which usually also requires money.

  10. Has anyone worked out how much of a bribe the IEEE would have to be offered
    to get them to do their standards development openly on github?

    I’m wondering if a crowd of enlightened entities couldn’t fund such a bribe.
    What is the actual money involved, and how does it flow, in what proportions,
    from what ultimate sources, through what intermediaries?

    My bet is the total is chump change in the big picture. E.g., I’m sure the government
    spills more daily than the IEEE drinks all year.

  11. Note IEEE is not unique in charging for standards. ISO and ANSI are two other non-profit organizations that charge for (most of) their standards. These organizations originally supported themselves with a “printing charge” for the standard. Now of course there is less cost to the non-profit given electronic delivery, but prices still start around $50 per document, topping out at over $200.

    And this is an active impediment. I contribute to Common Criteria (CC) standards (through CCUF), helping create standards for conducting product security evaluations through collaborative Protection Profiles (cPP). Each cPP usually depends upon and references a bunch of standards from a variety of organizations. Any time we depend upon an IEEE, ISO, or ANSI standard, the quality of review is lowered because much of the group does not have access to the standard in question. Worse, it is rarely a matter of purchasing a single standard: you almost always need to research related (and only potentially relevant) standards, meaning it is quite easy to spend over $1,000 just to research referencing a single standard in a work product. Even those CCUF members from large organizations (which in theory could afford purchasing all the standards, or purchase bulk access for employees) often don't have access because of internal administrative concerns (did you budget several thousand dollars for purchasing standards this year?).

    Note that Common Criteria is one of those exceptions: while technically an ISO standard, all CC standards are freely available. However, that may change: CC is undergoing its ISO five-year review, and there is a small faction that wants to move CC more closely under the ISO umbrella. I have had several discussions trying to explain the true cost that even a nominal fee for the standard can create. The better long-term solution is for IEEE, ISO, and ANSI to find a way to fund themselves that does not depend upon limiting access to the standards they produce.

  12. Pardon my ignorance of crypto and this protocol, but I still don’t understand why this re-installation introduces a weakness. So what if the AP is fooled into re-sending phase 3, which resets the nonce in the client: it had only just been set/reset by the first arrival of phase 3, and had not yet been used to encode any data. What has been lost by this resetting?
