So long False Start, we hardly knew ye

Last week brought the sad news that Google is removing support for TLS False Start from the next version of Chrome. This follows on Google’s decision to withdraw TLS Snap Start, and caps off the long line of failures that started with the demise of Google Wave. Industry analysts are already debating the implications for Larry Page’s leadership and Google’s ‘Social’ strategy.

(I am, of course, being facetious.)

More seriously, it’s sad to see False Start go, since the engineering that led to False Start is one of the most innovative things happening in Internet security today. It’s also something that only Google can really do, since they make a browser and run a very popular service. The combination means they can try out new ideas on a broad scale without waiting for IETF approval and buy-in. Nor are they shy about doing this: when the Google team spots a problem, they tackle it head-on, in what I call the ‘nuke it from orbit’ approach. It’s kind of refreshing.

Of course, the flipside of experimentation is failure, and Google has seen its share of failed experiments. The last was Snap Start, which basically tackled the same problem as False Start, but in a different way. And now False Start is going the way of the coelacanth: it’ll still exist, but you’ll have a hard time finding it.

The response from the security community has been a lot of meaningful chin-tapping and moustache-stroking, but not (surprisingly) a lot of strong opinions. This is because — as we all recently discovered — none of us had really spent much time thinking about False Start before Google killed it.* And so now we’re all doing it retroactively.

What the heck is (was) False Start anyway?

Secure web (TLS) is dirt slow. At least, that’s the refrain you’ll hear from web companies that don’t deploy it. There are many objections but they usually boil down to two points: (1) TLS requires extra computing hardware on the server side, and (2) it adds latency to the user experience.

Why latency? Well, to answer that you only need to look at the TLS handshake. The handshake establishes a communication key between the server and client. It only needs to happen once per browsing session. Unfortunately it happens at the worst possible time: the first time you visit a new site. You never get a second chance to make a first impression.

Why is the standard handshake so slow? Short answer: key establishment requires two round trips — four ‘flights’ — before the client can send any actual data (say, an HTTP GET request). If you’re communicating via undersea cable, or if you’re using a mobile phone with a huge ping time (my AT&T phone just registered 250ms!) this is going to add up.

Standard TLS handshake (source). Application data exchange only occurs at step (9) in the diagram. Note that step (5) is optional, and steps (4),(7) can be conducted in the same message ‘flight’.
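To put rough numbers on that round-trip cost, here is a back-of-the-envelope sketch. It assumes the only delay is network latency: it ignores the TCP handshake, server computation and packet loss. The function name and the sample ping times (apart from the 250ms figure above) are mine, chosen for illustration.

```python
def handshake_delay_ms(rtt_ms: float, round_trips: int = 2) -> float:
    """Network time spent on handshake round trips before the client
    can send its first byte of application data."""
    return rtt_ms * round_trips


# A few illustrative ping times, including the 250 ms mobile example.
for rtt in (20, 100, 250):
    delay = handshake_delay_ms(rtt)
    print(f"RTT {rtt:>3} ms: {delay:.0f} ms before the HTTP request goes out")
```

At a 250ms ping, that's half a second of waiting before the browser has even asked for the page.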

This is the problem that False Start tried to address. The designers — Langley, Modadugu and Moeller — approached this in the simplest possible way: they simply started sending data earlier.

You see, the bulk of the key exchange work in TLS really occurs in the first three messages, “client hello”, “server hello”, and “client key exchange”. By the time the client is ready to send this last message, it already knows the encryption key. Hence it can start transmitting encrypted data right then, cutting a full roundtrip off the handshake. The rest of the handshake is just ‘mop-up’: mostly dedicated to ensuring that nobody tampered with the handshake messages in-flight.

If your reaction to the previous sentence was: ‘isn’t it kind of important to know that nobody tampered with the handshake messages?’, you’ve twigged to the primary security objection to False Start. But we’ll come back to that in a moment.

False Start sounds awesome: why did Google yank it?

The problem with False Start is that there are many servers (most, actually) that don’t support it. The good news is that many of them will tolerate FS. That is, if the Client starts sending data early, they’ll hold onto the data until they’ve finished their side of the handshake.

But not every server is so flexible. A small percentage of the servers on the Internet — mostly SSL terminators, according to Langley — really couldn’t handle False Start. They basically threw up all over it.

Google knew about this before deploying FS in Chrome, and had taken what I will loosely refer to as the ‘nuke it from orbit squared’ approach. They scanned the Internet to compile a blacklist of IPs that didn’t handle False Start, and they shipped it with Chrome. In principle this should have done the job.

Unfortunately, it didn’t. It seems that the non-FS-compatible servers were non-compliant in a weird non-deterministic way. A bunch looked good when Google scanned them, but really weren’t. These slipped past Google’s scan, and ultimately Google gave up on dealing with them. False Start was withdrawn.

Ok, it has issues. But is it secure?

From my point of view this is the much more interesting question. A bunch of smart folks (and also: me) wasted last Friday afternoon discussing this on Twitter. The general consensus is that we don’t really know.

The crux of the discussion is the Server “finished” message (#8 in the diagram above). This message is not in TLS just for show: it contains a keyed hash over all of the handshake messages exchanged so far, as the server saw them. This allows the client to verify that no ‘man in the middle’ is tampering with handshake messages, and in normal TLS it’s supposed to verify this before it sends any sensitive data.

This kind of tampering isn’t theoretical. In the bad old days of SSL2, it was possible to quietly downgrade the ciphersuite negotiation so that the Browser & Server would use an ‘export-weakened’ 40-bit cipher even when both wanted to use 128-bit keys. Integrity checks (mostly) eliminate this kind of threat.
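Here is a toy sketch of how that integrity check catches a downgrade. It is a simplification, not the real TLS computation: actual TLS derives the Finished value with its own PRF and a fixed label, whereas this uses a plain HMAC-SHA256 over a running hash of the messages. The message strings, the secret and the function name are all made up for illustration.

```python
import hashlib
import hmac
from typing import Iterable


def finished_value(master_secret: bytes, handshake_messages: Iterable[bytes]) -> bytes:
    """Keyed hash over every handshake message one side has seen (simplified)."""
    transcript = hashlib.sha256()
    for message in handshake_messages:
        transcript.update(message)
    return hmac.new(master_secret, transcript.digest(), hashlib.sha256).digest()


# Toy values: real handshake messages are binary structures, not strings.
master_secret = b"toy master secret known only to client and server"
client_hello = b"ClientHello suites=AES128-SHA,EXP-RC4-40-MD5"
tampered_hello = b"ClientHello suites=EXP-RC4-40-MD5"  # attacker stripped the strong suite
server_hello = b"ServerHello suite=EXP-RC4-40-MD5"

# The server hashes what it received; the client hashes what it actually sent.
server_finished = finished_value(master_secret, [tampered_hello, server_hello])
client_expected = finished_value(master_secret, [client_hello, server_hello])

tamper_detected = not hmac.compare_digest(server_finished, client_expected)
print("tampering detected:", tamper_detected)
```

The two sides disagree about what the “client hello” said, so the values don’t match and the client knows to bail out. The catch, of course, is that the client only learns this once the Server “finished” message arrives.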

The FS designers were wise to this threat, so they include some mitigations to deal with it. From the spec:

Clients and servers MUST NOT use the False Start protocol modification in a handshake unless the cipher suite uses a symmetric cipher that is considered cryptographically strong … While an attacker can change handshake messages to force a downgrade to a less secure symmetric cipher than otherwise would have been chosen, this rule ensures that in such a downgrade attack no application data will be sent under an insecure symmetric cipher.
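In code, the rule amounts to a whitelist check on the negotiated cipher suite before any early data goes out. The sketch below is my own illustration, not anything taken from Chrome or NSS: the suite names are real TLS identifiers, but which ones belong on the ‘strong’ list is an assumption, and the function name is hypothetical.

```python
# Illustrative whitelist: an assumption, not the list from the spec or Chrome.
STRONG_SUITES = frozenset({
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_AES_256_CBC_SHA",
})


def may_false_start(negotiated_suite: str) -> bool:
    """Allow early application data only if the negotiated suite is whitelisted."""
    return negotiated_suite in STRONG_SUITES


for suite in ("TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_RSA_EXPORT_WITH_RC4_40_MD5"):
    action = "send data early" if may_false_start(suite) else "wait for server Finished"
    print(f"{suite}: {action}")
```

If an attacker forces a downgrade to an export suite, the client simply falls back to the full handshake, where the tampering gets caught.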

While this seems to deal with the obvious threats (‘let’s not use encryption at all!’), it’s not clear that FS handles every possible MITM threat that could come up. For example, a number of mitigations were proposed to deal with the BEAST attack, from basic things like ‘don’t use TLS 1.0’ to ‘activate the empty-fragment protection’. There’s some reason to fear that these protections could be bypassed or disabled by an attacker who can tamper with handshake messages.

Note that this does not mean that False Start was actually vulnerable to any of the above. The last thing I want to do is claim a specific vulnerability in FS. Or rather, if I had a specific, concrete vulnerability, I’d be blogging about it.

Rather, the concern here is that TLS has taken a long time to get to the point it is now — and we’re still finding bugs (like BEAST). When you rip out parts of the anti-tamper machinery, you’re basically creating a whole new world of opportunities for clever (or even not-so-clever) attacks on the protocol. And that’s what I worry about.

Hands off my NSS

There’s one last brief point I’d like to mention about False Start and Snap Start, and that has to do with the implementation. If you’ve ever spent time with the NSS or OpenSSL code, you know that these are some of the finest examples of security coding in the world.

Haha, I kid. They’re terrifying.

Every time Google adds another extension (Snap Start, False Start, etc.), NSS gets patched. This additional patch code is small, and it’s written by great coders — but even so, this patching adds new cases and complexity to an already very messy library. It would be relatively easy for a bug to slip into one of these patches, or even the intersection of two patches. And this could affect more than just Google users.

In summary

It may seem a little odd to spend a whole blog post on a protocol that’s already been withdrawn, but really: it’s worth it. First, examining protocols like False Start is the only way we learn, and making TLS faster is definitely something we want to learn about. Second, it’s not like False Start is the last we’ll hear from the Google TLS crew. It’ll certainly help if we’re up-to-speed when they release “False Start II: False Start with a Vengeance”.

Finally, False Start isn’t really dead — not yet, anyway. It lives on in Chrome for conducting Next Protocol Negotiation (NPN) transactions, used to negotiate SPDY.

On a final note, let me just say that I’m mostly enthusiastic about the work that Google is doing (modulo a few reservations noted above). I hope to see it continue, though I would like to see it get a little bit more outside scrutiny. After all, even though Google’s primarily acting as its own test case, many people use Chrome and Google! If they get something wrong, it won’t just be Google doing the suffering.


* With some notable exceptions: see this great presentation by Nate Lawson.