The network is hostile

Yesterday the New York Times and ProPublica posted a lengthy investigation based on leaked NSA documents, outlining the extensive surveillance collaboration between AT&T and the U.S. government. This surveillance includes gems such as AT&T’s assistance in tapping the main fiber connection supporting the United Nations, and that’s only the start.

The usual Internet suspects are arguing about whether this is actually news. The answer is both yes and no, though I assume the world at large will mostly shrug at this point. After all, we’ve learned so much about the NSA’s operations that we’re all suffering from revelation fatigue. It would take a lot to shock us now.

But this isn’t what I want to talk about. Instead, the effect of this story was to inspire me to look back on the NSA leaks overall, to think about what they’ve taught us. And more importantly — what they mean for the design of the Internet and our priorities as security engineers. That’s what I’m going to ruminate about below.

The network is hostile

Anyone who has taken a network security class knows that the first rule of Internet security is that there is no Internet security. Indeed, this assumption is baked into the design of the Internet and most packet-switched networks — systems where unknown third parties are responsible for handling and routing your data. There is no way to ensure that your packets will be routed as you want them, and there’s absolutely no way to ensure that they won’t be looked at.
Indeed, the implications of this were obvious as far back as ARPANET. If you connect from point A to point B, it was well known that your packets would traverse untrusted machines C, D and E in between. In the 1970s the only thing preserving the privacy of your data was a gentleman’s agreement not to peek. If that wasn’t good enough, the network engineers argued, you had to provide your own security between the endpoints themselves.
My take from the NSA revelations is that even though this point was ‘obvious’ and well-known, we’ve always felt it more intellectually than in our hearts. Even knowing the worst was possible, we still chose to believe that direct peering connections and leased lines from reputable providers like AT&T would make us safe. If nothing else, the NSA leaks have convincingly refuted this assumption.

We don’t encrypt nearly enough

The most surprising lesson of the NSA stories is that 20 years after the development of SSL encryption, we’re still sending vast amounts of valuable data in the clear.

Even as late as 2014, highly vulnerable client-to-server connections for services like Yahoo Mail were routinely transmitted in cleartext — meaning that they weren’t just vulnerable to the NSA, but also to everyone on your local wireless network. And web-based connections were the good news. Even if you carefully checked your browser connections for HTTPS usage, proprietary extensions and mobile services would happily transmit data such as your contact list in the clear. If you noticed and shut down all of these weaknesses, it still wasn’t enough — tech companies would naively transmit the same data through vulnerable, unencrypted inter-datacenter connections where the NSA could scoop them up yet again.

There is a view in our community that we’re doing much better now, and to some extent we may be. But I’m less optimistic. From an attacker’s point of view, the question is not how much we’re encrypting, but rather, which valuable scraps we’re not protecting. As long as we tolerate the existence of unencrypted protocols and services, the answer is still: way too much.

It’s the metadata, stupid

Even if we, by some miracle, manage to achieve 100% encryption of communications content, we still haven’t solved the whole problem. Unfortunately, today’s protocols still leak a vast amount of useful information via session metadata. And we have no good strategy on the table to defend against it.

Examples of metadata leaked by today’s protocols include protocol type, port number, and routing information such as source and destination addresses. It also includes traffic characteristics, session duration, and total communications bandwidth. Traffic analysis remains a particular problem: even knowing the size of the files requested by a TLS-protected browser connection can leak a vast amount of information about the user’s browsing habits.
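
To make this concrete, here’s a tiny sketch (using the scapy library; the capture file name is made up) of what a purely passive observer can log about a fully encrypted session without ever touching the ciphertext:

    # Purely illustrative: dump the metadata visible to an on-path observer
    # from a hypothetical packet capture of an encrypted TLS session.
    from scapy.all import rdpcap, IP, TCP

    for pkt in rdpcap("capture.pcap"):      # hypothetical capture file
        if IP in pkt and TCP in pkt:
            print(
                pkt[IP].src,     # who is talking
                pkt[IP].dst,     # to whom
                pkt[TCP].dport,  # which service (443 = HTTPS, 22 = SSH, ...)
                len(pkt),        # how much data -- fodder for traffic analysis
                pkt.time,        # when -- timing and session duration
            )

Every field printed above is delivered in the clear by design.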

Absolutely none of this is news to security engineers. The problem is that there’s so little we can do about it. Anonymity networks like Tor protect the identity of endpoints in a connection, but they do so at a huge cost in additional bandwidth and latency — and they offer only limited protection in the face of a motivated global adversary. IPSec tunnels only kick the can to a different set of trusted components that themselves can be subverted.

‘Full take’ culture

Probably the most eye-opening fact of the intelligence leaks is the sheer volume of data that intelligence agencies are willing to collect. This is most famously exemplified by the U.S. bulk data collection and international call recording programs — but for network engineers the more worrying incarnation is “full take” Internet collection devices like TEMPORA.
If we restrict our attention purely to the collection of such data — rather than how it’s accessed — it appears that the limiting factors are almost exclusively technical in nature. In other words, the amount of data collected is simply a function of processing power, bandwidth and storage. And this is bad news for our future.
That’s because while meaningful human communication bandwidth (emails, texts, Facebook posts, Snapchats) continues to increase substantially, storage and processing power increase faster. With some filtration, and no ubiquitous encryption, ‘full take’ is increasingly going to be the rule rather than the exception.

We’ve seen the future, and it’s not American

Even if you’re not inclined to view the NSA as an adversary — and contrary to public perception, that view is not uniform even inside Silicon Valley — the NSA is hardly the only intelligence agency capable of subverting the global communications network. Nations like China are increasingly gaining market share in telecommunications equipment and services, especially in developing parts of the world such as Africa and the Middle East.

While it’s cheap to hold China out as some sort of boogeyman, it’s significant that someday a large portion of the world’s traffic will flow through networks controlled by governments that are, at least to some extent, hostile to the core values of Western democracies.

If you believe that this is the future, then the answer certainly won’t involve legislation or politics. The NSA won’t protect us through cyber-retaliation or whatever plan is on the table today. If you’re concerned about the future, then the answer is to finally, truly believe our propaganda about network trust. We need to learn to build systems today that can survive such an environment. Failing that, we need to adjust to a very different world.

Attack of the week: Logjam

Credit: Sharon Mollerus (cc)

In case you haven’t heard, there’s a new SSL/TLS vulnerability making the rounds. Nicknamed Logjam, the new attack is ‘special’ in that it may admit complete decryption or hijacking of any TLS connection you make to an improperly configured web or mail server. Worse, there’s at least circumstantial evidence that similar (and more powerful) attacks might already be in the toolkit of some state-level attackers such as the NSA.

This work is the result of an unusual collaboration between a fantastic group of co-authors spread all around the world, including institutions such as the University of Michigan, INRIA Paris-Rocquencourt, INRIA Nancy, Microsoft Research, Johns Hopkins and the University of Pennsylvania. It’s rare to see this level of collaboration between groups with so many different areas of expertise, and I hope to see a lot more like it. (Disclosure: I am one of the authors.)

The absolute best way to understand the Logjam result is to read the technical research paper. This post is mainly aimed at people who want a slightly less technical form. For those with even shorter attention spans, here’s the TL;DR:

It appears that the Diffie-Hellman protocol, as currently deployed in SSL/TLS, may be vulnerable to a serious downgrade attack that restores it to 1990s “export” levels of security, and offers a practical “break” of the TLS protocol against poorly configured servers. Even worse, extrapolation of the attack requirements — combined with evidence from the Snowden documents — provides some reason to speculate that a similar attack could be leveraged against protocols (including TLS, IPSec/IKE and SSH) using 768- and 1024-bit Diffie-Hellman.

I’m going to tackle this post in the usual ‘fun’ question-and-answer format I save for this sort of thing.

What is Diffie-Hellman and why should I care about TLS “export” ciphersuites?

Diffie-Hellman is probably the most famous public key cryptosystem ever invented. Publicly discovered by Whit Diffie and Martin Hellman in 1976 (and a few years earlier, in secret, by the UK’s GCHQ), it allows two parties to negotiate a shared encryption key over a public connection.

Diffie-Hellman is used extensively in protocols such as SSL/TLS and IPSec, which rely on it to establish the symmetric keys that are used to transport data. To do this, both parties must agree on a set of parameters to use for the key exchange. In traditional (‘mod p‘) Diffie-Hellman, these parameters consist of a large prime number p, as well as a ‘generator’ g. The two parties now exchange keys as shown below:

Classical Diffie-Hellman (source).
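
If you prefer code to diagrams, here’s a toy sketch of that exchange. The parameters are chosen purely for illustration — a real deployment needs a vetted group of 2048 bits or more, for reasons we’ll get to shortly.

    # Toy 'mod p' Diffie-Hellman, mirroring the diagram above. Illustration only.
    import secrets
    from sympy import nextprime

    p = nextprime(2**512)   # stand-in prime; real groups are carefully chosen
    g = 2                   # stand-in generator, kept small for simplicity

    a = secrets.randbelow(p - 2) + 1     # Alice's secret
    b = secrets.randbelow(p - 2) + 1     # Bob's secret
    A = pow(g, a, p)                     # Alice sends g^a mod p
    B = pow(g, b, p)                     # Bob sends g^b mod p

    # Both sides arrive at the same shared secret g^(ab) mod p.
    assert pow(B, a, p) == pow(A, b, p)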

TLS supports several variants of Diffie-Hellman. The one we’re interested in for this work is the ‘ephemeral’ non-elliptic (“DHE”) protocol variant, which works in a manner that’s nearly identical to the diagram above. The server takes the role of Alice, selecting (p, g, g^a mod p) and signing this tuple (and some nonces) using its long-term signing key. The client responds with g^b mod p and the two sides then calculate a shared secret.

Just for fun, TLS also supports an obsolete ‘export’ variant of Diffie-Hellman. These export ciphersuites are a relic from the 1990s when it was illegal to ship strong encryption out of the country. What you need to know about “export DHE” is simple: it works identically to standard DHE, but limits the size of p to 512 bits. Oh yes, and it’s still out there today. Because the Internet.

How do you attack Diffie-Hellman?

The best known attack against a correct Diffie-Hellman implementation involves capturing the value g^a and solving to find the secret key a. The problem of finding this value is known as the discrete logarithm problem, and it’s thought to be mathematically intractable, at least when Diffie-Hellman is implemented in cryptographically strong groups (e.g., when p is of size 2048 bits or more).

Unfortunately, the story changes dramatically when p is relatively small — for example, 512 bits in length. Given a value g^a mod p for a 512-bit p, it should at least be possible to efficiently recover the secret a and read traffic on the connection.

Most TLS servers don’t use 512-bit primes, so who cares?

The good news here is that weak Diffie-Hellman parameters are almost never used purposely on the Internet. Only a trivial fraction of the SSL/TLS servers out there today will organically negotiate 512-bit Diffie-Hellman. For the most part these are crappy embedded devices such as routers and video-conferencing gateways.

However, there is a second class of servers that are capable of supporting 512-bit Diffie-Hellman when clients request it, using a special mode called the ‘export DHE’ ciphersuite. Disgustingly, these servers amount to about 8% of the Alexa top million sites (and a whopping 29% of SMTP/STARTTLS mail servers). Thankfully, most decent clients (AKA popular browsers) won’t willingly negotiate ‘export-DHE’, so this would also seem to be a dead end.

It isn’t.
ServerKeyExchange message (RFC 5246)

You see, before SSL/TLS peers can start engaging in all this fancy cryptography, they first need to decide which ciphers they’re going to use. This is done through a negotiation process in which the client proposes some options (e.g., RSA, DHE, DHE-EXPORT), and the server picks one.

This all sounds simple enough. However, one of the early, well-known flaws in SSL/TLS is the protocol’s failure to properly authenticate these ‘negotiation’ messages. In very early versions of SSL they were not authenticated at all. SSLv3 and TLS tacked on an authentication process — but one that takes place only at the end of the handshake.*

This is particularly unfortunate given that TLS servers often have the ability to authenticate their messages using digital signatures, but don’t really take advantage of this. For example, when two parties negotiate Diffie-Hellman, the parameters sent by the server are transmitted within a signed message called the ServerKeyExchange (shown at right). The signed portion of this message covers the parameters, but neglects to include any information about which ciphersuite the server thinks it’s negotiating. If you remember that the only difference between DHE and DHE-EXPORT is the size of the parameters the server sends down, you might start to see the problem.

Here it is in a nutshell: if the server supports DHE-EXPORT, the attacker can ‘edit’ the negotiation messages sent from the client — even if the client doesn’t support export DHE — replacing the client’s list of supported ciphers with only export DHE. The server will in turn send back a signed 512-bit export-grade Diffie-Hellman tuple, which the client will blindly accept — because it doesn’t realize that the server is negotiating the export version of the ciphersuite. From its perspective this message looks just like ‘standard’ Diffie-Hellman with really crappy parameters.

Overview of the Logjam active attack (source: paper).

All this tampering should run into a huge snag at the end of the handshake, when the client and server exchange Finished messages that include a MAC of the handshake transcript. At this point the client should learn that something funny is going on, i.e., that what it sent no longer matches what the server is seeing. However, the loophole is this: if the attacker can recover the Diffie-Hellman secret quickly — before the handshake ends — she can forge her own Finished messages. In that case the client and server will be none the wiser.
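
To see why the Finished check doesn’t save the day, here’s a deliberately over-simplified sketch. Real TLS derives verify_data with the TLS PRF over a hash of the handshake messages; a plain HMAC over a transcript string stands in for that here, and the ‘recovered’ secret is of course hypothetical.

    import hashlib, hmac

    def finished(master_secret: bytes, transcript: bytes) -> bytes:
        # Stand-in for verify_data = PRF(master_secret, label, Hash(handshake))
        return hmac.new(master_secret, transcript, hashlib.sha256).digest()

    # The client's (tampered) view of the handshake, plus the master secret the
    # attacker derives after solving the 512-bit discrete log in time.
    client_transcript = b"ClientHello ... ServerKeyExchange with 512-bit p ..."
    master_secret = b"derived-from-recovered-DH-secret"   # hypothetical

    # Without the secret, the attacker cannot produce the right value:
    assert finished(b"wrong-guess", client_transcript) != \
           finished(master_secret, client_transcript)
    # With it, she computes exactly the Finished value the client expects over
    # the tampered transcript, so the integrity check passes.
    forged = finished(master_secret, client_transcript)
    assert forged == finished(master_secret, client_transcript)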

The upshot is that executing this attack requires the ability to solve a 512-bit discrete logarithm before the client and server exchange Finished messages. That seems like a tall order.

Can you really solve a discrete logarithm before a TLS handshake times out?

In practice, the fastest route to solving the discrete logarithm in finite fields is via an algorithm called the Number Field Sieve (NFS). Using NFS to solve a single 512-bit discrete logarithm instance requires several core-years — or about a week of wall-clock time given a few thousand cores — which would seem to rule out solving discrete logs in real time.

However, there is a complication. In practice, NFS can actually be broken up into two different steps:

  1. Pre-computation (for a given prime p). This includes the process of polynomial selection, sieving, and linear algebra, all of which depend only on p. The output of this stage is a table for use in the second stage.
  2. Solving to find a (for a given g^a mod p). The final stage, called the descent, uses the table from the precomputation. This is the only part of the algorithm that actually involves a specific g and g^a.

The important thing to know is that the first stage of the attack consumes the vast majority of the time, up to a full week on a large-scale compute cluster. The descent stage, on the other hand, requires only a few core-minutes. Thus the attack cost depends primarily on where the server gets its Diffie-Hellman parameters from. The best case for an attacker is when p is hard-coded into the server software and used across millions of machines. The worst case is when p is re-generated routinely by the server.

I’ll let you guess what real TLS servers actually do.

In fact, large-scale Internet scans by the team at University of Michigan show that most popular web server software tends to re-use a small number of primes across thousands of server instances. This is done because generating prime numbers is scary, so implementers default to using a hard-coded value or a config file supplied by their Linux distribution. The situation for export Diffie-Hellman is particularly awful, with only two (!) primes used across up to 92% of enabled Apache/mod_ssl sites.

Number of seconds to solve a 512-bit discrete log (source: paper).

The upshot of all of this is that about two weeks of pre-computation is sufficient to build a table that allows you to perform the downgrade against most export-enabled servers in just a few minutes (see the chart at right). This is fast enough that it can be done before the TLS connection timeout. Moreover, even if this is not fast enough, the connection can often be held open longer by using clever protocol tricks, such as sending TLS warning messages to reset the timeout clock.

Keep in mind that none of this shared prime craziness matters when you’re using sufficiently large prime numbers (on the order of 2048 bits or more). It’s only a practical issue if you’re using small primes, like 512-bit, 768-bit or — and here’s a sticky one I’ll come back to in a minute — 1024-bit.

How do you fix the downgrade to export DHE?

The best and most obvious fix for this problem is to exterminate export ciphersuites from the Internet. Unfortunately, these awful configurations are the default in a number of server software packages (looking at you Postfix), and getting people to update their configurations is surprisingly difficult (see e.g., FREAK).

A simpler fix is to upgrade the major web browsers to resist the attack. The easy way to do this is to enforce a larger minimum size for received DHE keys. The problem here is that the fix itself causes some collateral damage — it will break a small but significant fraction of lousy servers that organically negotiate (non-export) DHE with 512 bit keys.

The good news here is that the major browsers have decided to break the Internet (a little) rather than allow it to break them. Each has agreed to raise the minimum size limit to at least 768 bits, and some to a minimum of 1024 bits. It’s still not perfect, since 1024-bit DHE may not be cryptographically sound against powerful attackers, but it does address the immediate export attack. In the longer term the question is whether to use larger negotiated DHE groups, or abandon DHE altogether and move to elliptic curves.
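
As a rough illustration, the client-side fix amounts to a few lines of parameter checking. The threshold and helper below are mine, not any browser’s actual code:

    MIN_DH_BITS = 1024   # some browsers chose 768, others 1024

    def check_server_dh_prime(p: int) -> None:
        # Reject any ServerKeyExchange whose Diffie-Hellman prime is too small.
        if p.bit_length() < MIN_DH_BITS:
            raise ConnectionError(
                f"server offered a {p.bit_length()}-bit DH prime; "
                f"refusing anything under {MIN_DH_BITS} bits"
            )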

What does this mean for larger parameter sizes?

The good news so far is that 512-bit Diffie-Hellman is only used by a fraction of the Internet, even when you account for active downgrade attacks. The vast majority of servers use Diffie-Hellman moduli of length at least 1024 bits. (The widespread use of 1024 is largely due to a hard-cap in older Java clients. Go away Java.)

While 2048-bit moduli are generally believed to be outside of anyone’s reach, 1024-bit DHE has long been considered to be at least within groping range of nation-state attackers. We’ve known this for years, of course, but the practical implications haven’t been quite clear. This paper tries to shine some light on that, using Internet-wide measurements and software/hardware estimates.

If you recall from above, the most critical aspect of the NFS attack is the need to perform large amounts of pre-computation on a given Diffie-Hellman prime p, followed by a relatively short calculation to break any given connection that uses p. At the 512-bit size the pre-computation only requires about a week. The question then is, how much does it cost for a 1024-bit prime, and how common are shared primes?

While there’s no exact way to know how much the 1024-bit attack would cost, the paper attempts to provide some extrapolations based on current knowledge. With software, the cost of the pre-computation seems quite high — on the order of 35 million core-years. Making this happen for a given prime within a reasonable amount of time (say, one year) would appear to require billions of dollars of computing equipment if we assume no algorithmic improvements. Even if we rule out such improvements, it’s conceivable that this cost might be brought down to a few hundred million dollars using hardware. This doesn’t seem out of bounds when you consider leaked NSA cryptanalysis budgets.

What’s interesting is that the descent stage, required to break a given Diffie-Hellman connection, is much faster. Based on some implementation experiments by the CADO-NFS team, it may be possible to break a Diffie-Hellman connection in as little as 30 core-days, with parallelization hugely reducing the wall-clock time. This might even make near-real-time decryption of Diffie-Hellman connections practical.

Is the NSA actually doing this?

So far all we’ve noted is that NFS pre-computation is at least potentially feasible when 1024-bit primes are re-used. That doesn’t mean the NSA is actually doing any of it.

There is some evidence, however, that suggests the NSA has decryption capability that’s at least consistent with such a break. This evidence comes from a series of Snowden documents published last winter in Der Spiegel. Together they describe a large-scale effort at NSA and GCHQ, capable of decrypting ‘vast’ amounts of Internet traffic, including IPSec, SSH and HTTPS connections.

NSA slide illustrating exploitation of IPSec encrypted traffic (source: Spiegel).

While the architecture described by the documents mentions attacks against many protocols, the bulk of the energy seems to be around the IPSec and IKE protocols, which are used to establish Virtual Private Networks (VPNs) between individuals and corporate networks such as financial institutions.

The nature of the NSA’s exploit is never made clear in the documents, but the diagram at right gives a lot of the architectural details. The system involves collecting Internet Key Exchange (IKE) handshakes, transmitting them to the NSA’s Cryptanalysis and Exploitation Services (CES) enclave, and feeding them into a decryption system that controls substantial high-performance computing resources to process the intercepted exchanges. This is at least circumstantially consistent with Diffie-Hellman cryptanalysis.

Of course it’s entirely possible that the attack is based on a bad random number generator, weak symmetric encryption, or any number of engineered backdoors. There are a few pieces of evidence that militate towards a Diffie-Hellman break, however:

  1. IPSec (or rather, the IKE key exchange) performs a fresh Diffie-Hellman exchange for every single connection, meaning that passive decryption requires breaking Diffie-Hellman itself or some other per-connection exploit — although this doesn’t rule out the other explanations.
  2. The IKE exchange is particularly vulnerable to pre-computation, since IKE uses a small number of standardized prime numbers called the Oakley groups, which are going on 17 years old now. Large-scale Internet scanning by the Michigan team shows that a majority of responding IPSec endpoints will gladly negotiate using Oakley Group 1 (768 bit) or Group 2 (1024 bit), even when the initiator offers better options.
  3. The NSA’s exploit appears to require the entire IKE handshake as well as any pre-shared key (PSK). These inputs would be necessary for recovery of IKEv1 session keys, but are not required in a break that involves only symmetric cryptography.
  4. The documents explicitly rule out the use of malware, or rather, they show that such malware (‘TAO implants’) is in use — but that malware allows the NSA to bypass the IKE handshake altogether.

I would stipulate that beyond the Internet measurements and computational analysis, this remains firmly in the category of ‘crazy-eyed informed speculation’. But while we can’t rule out other explanations, this speculation is certainly consistent with a hardware-optimized break of 768- and 1024-bit Diffie-Hellman, along with some collateral damage to SSH and related protocols.

So what next?

The paper gives a detailed set of recommendations on what to do about these downgrade attacks and (relatively) weak DHE groups. The website provides a step-by-step guide for server administrators. In short, probably the best long-term move is to switch to elliptic curves (ECDHE) as soon as possible. Failing this, clients and servers should enforce at least 2048-bit Diffie-Hellman across the Internet. If you can’t do that, stop using common primes.
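
For the last item, generating your own group is a one-off operation. Here’s a sketch using the pyca/cryptography package (the output file name is just an example); generation takes minutes rather than milliseconds, which is exactly why so many deployments lean on shared, hard-coded primes.

    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import dh

    # Generate a fresh 2048-bit Diffie-Hellman group instead of a shared default.
    parameters = dh.generate_parameters(generator=2, key_size=2048)

    pem = parameters.parameter_bytes(
        serialization.Encoding.PEM,
        serialization.ParameterFormat.PKCS3,
    )
    with open("dhparams.pem", "wb") as f:   # point your server config at this file
        f.write(pem)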

Making this all happen on anything as complicated as the Internet will probably consume a few dozen person-lifetimes. But it’s something we have to do, and will do, to make the Internet work properly.

Notes:

* There are reasons for this. Some SSL/TLS ciphersuites (such as the RSA encryption-based ciphersuites) don’t use signatures within the protocol, so the only way to authenticate the handshake is to negotiate a ciphersuite, run the key exchange protocol, then use the resulting shared secret to authenticate the negotiation messages after the fact. But SSL/TLS DHE involves digital signatures, so it should be possible to achieve a stronger level of security than this. It’s unfortunate that the protocol does not.

On the new Snowden documents

If you don’t follow NSA news obsessively, you might have missed yesterday’s massive Snowden document dump from Der Spiegel. The documents provide a great deal of insight into how the NSA breaks our cryptographic systems. I was very lightly involved in looking at some of this material, so I’m glad to see that it’s been published.

Unfortunately with so much material, it can be a bit hard to separate the signal from the noise. In this post I’m going to try to do that a little bit — point out the bits that I think are interesting, the parts that are old news, and the things we should keep an eye on.

Background

Those who read this blog will know that I’ve been wondering for a long time how NSA works its way around our encryption. This isn’t an academic matter, since it affects just about everyone who uses technology today.

What we’ve learned since 2013 is that NSA and its partners hoover up vast amounts of Internet traffic from fiber links around the world. Most of this data is plaintext and therefore easy to intercept. But at least some of it is encrypted — typically protected by protocols such as SSL/TLS or IPsec.

Conventional wisdom pre-Snowden told us that the increasing use of encryption ought to have shut the agencies out of this data trove. Yet the documents we’ve seen so far indicate just the opposite. Instead, the NSA and GCHQ have somehow been harvesting massive amounts of SSL/TLS and IPSEC traffic, and appear to be making inroads into other technologies such as Tor as well.

How are they doing this? To repeat an old observation, there are basically three ways to crack an encrypted connection:

  1. Go after the mathematics. This is expensive and unlikely to work well against modern encryption algorithms (with a few exceptions). The leaked documents give very little evidence of such mathematical breaks — though a bit more on this below.
  2. Go after the implementation. The new documents confirm a previously-reported and aggressive effort to undermine commercial cryptographic implementations. They also provide context for how important this type of sabotage is to the NSA.
  3. Steal the keys. Of course, the easiest way to attack any cryptosystem is simply to steal the keys. Yesterday we received a bit more evidence that this is happening.
I can’t possibly spend time on everything that’s covered by these documents — you should go read them yourself — so below I’m just going to focus on the highlights.

Not so Good Will Hunting

First, the disappointing part. The NSA may be the largest employer of cryptologic mathematicians in the United States, but — if the new story is any indication — those guys really aren’t pulling their weight.

In fact, the only significant piece of cryptanalytic news in the entire stack comes from a 2008 undergraduate research project looking at AES. Sadly, this is about as unexciting as it sounds — in fact it appears to be nothing more than a summer project by a visiting student. More interesting is the context it gives around the NSA’s efforts to break block ciphers such as AES, including the NSA’s view of the difficulty of such cryptanalysis, and confirmation that NSA has some ‘in-house techniques’.

Additionally, the documents include significant evidence that NSA has difficulty decrypting certain types of traffic, including Truecrypt, PGP/GPG, Tor and ZRTP from implementations such as RedPhone. Since these protocols share many of the same underlying cryptographic algorithms — RSA, Diffie-Hellman, ECDH and AES — some are presenting this as evidence that those primitives are cryptographically strong.

As with the AES note above, this ‘good news’ should also be taken with a grain of salt. With a small number of exceptions, it seems increasingly obvious that the Snowden documents are geared towards NSA’s analysts and operations staff. In fact, many of the systems actually seem aimed at protecting knowledge of NSA’s cryptanalytic capabilities from NSA’s own operational staff (and other Five Eyes partners). As an analyst, it’s quite possible you’ll never learn why a given intercept was successfully decrypted.

 To put this a bit more succinctly: the lack of cryptanalytic red meat in these documents may not truly be representative of the NSA’s capabilities. It may simply be an artifact of Edward Snowden’s clearances at the time he left the NSA.

Tor

One of the most surprising aspects of the Snowden documents — to those of us in the security research community anyway — is the NSA’s relative ineptitude when it comes to de-anonymizing users of the Tor anonymous communications network.

The reason for our surprise is twofold. First, Tor was never really designed to stand up against a global passive adversary — that is, an attacker who taps a huge number of communications links. If there’s one thing we’ve learned from the Snowden leaks, it’s that the NSA (plus GCHQ) is the very definition of the term. In theory at least, Tor should be a relatively easy target for the agency.

The real surprise, though, is that despite this huge signals intelligence advantage, the NSA has barely even tested their ability to de-anonymize users. In fact, this leak provides the first concrete evidence that NSA is experimenting with traffic confirmation attacks to find the source of Tor connections. Even more surprising, their techniques are relatively naive, even when compared to what’s going on in the ‘research’ community.

This doesn’t mean you should view Tor as secure against the NSA. It seems very obvious that the agency has identified Tor as a high-profile target, and we know they have the resources to make much more headway against the network. The real surprise is that they haven’t tried harder. Maybe they’re trying now.

SSL/TLS and IPSEC

A few months ago I wrote a long post speculating about how the NSA breaks SSL/TLS, because it’s increasingly clear that the NSA does break these protocols, and at relatively large scale.

The new documents don’t tell us much we didn’t already know, but they do confirm the basic outlines of the attack. The first portion requires endpoints around the world that are capable of performing the raw decryption of SSL/TLS sessions provided they know the session keys. The second is a separate infrastructure located on US soil that can recover those session keys when needed.

All of the real magic happens within the key recovery infrastructure. These documents provide the first evidence that a major attack strategy for NSA/GCHQ involves key databases containing the private keys for major sites. For the RSA key exchange ciphersuites of TLS, a single private key is sufficient to recover vast amounts of session traffic — in real time or even after the fact.

The interesting question is how the NSA gets those private keys. The easiest answer may be the least technical. A different Snowden leak gives some reason to believe that the NSA may have relationships with employees at specific named U.S. entities, and may even operate personnel “under cover”. This would certainly be one way to build a key database.

 

But even without the James Bond aspect of this, there’s every reason to believe that NSA has other means to exfiltrate RSA keys from operators. During the period in question, we know of at least one vulnerability (Heartbleed) that could have been used to extract private keys from software TLS implementations. There are still other, unreported vulnerabilities that could be used today.

 Pretty much everything I said about SSL/TLS also applies to VPN protocols, with the additional detail that many VPNs use broken protocols and relatively poorly-secured pre-shared secrets that can in some cases be brute-forced. The NSA seems positively gleeful about this.

Open Source packages: Redphone, Truecrypt, PGP and OTR

The documents provide at least circumstantial evidence that some open source encryption technologies may thwart NSA surveillance. These include Truecrypt, ZRTP implementations such as RedPhone, PGP implementations, and Off the Record messaging. These packages have a few commonalities:

  1. They’re all open source, and relatively well studied by researchers.
  2. They’re not used at terribly wide scale (as compared to, e.g., SSL or VPNs).
  3. They all work on an end-to-end basis and don’t involve service providers, software distributors, or other infrastructure that could be corrupted or attacked.

What’s at least as interesting is which packages are not included on this list. Major corporate encryption protocols such as iMessage make no appearance in these documents, despite the fact that they ostensibly provide end-to-end encryption. This may be nothing. But given all we know about NSA’s access to providers, this is definitely worrying.

A note on the ethics of the leak

Before I finish, it’s worth addressing one major issue with this reporting: are we, as citizens, entitled to this information? Would we be safer keeping it all under wraps? And is this all ‘activist nonsense‘?

This story, more than some others, skates close to a line. I think it’s worth talking about why this information is important.

To sum up a complicated issue, we live in a world where targeted surveillance is probably necessary and inevitable. The evidence so far indicates that NSA is very good at this kind of work, despite some notable failures in actually executing on the intelligence it produces.

Unfortunately, the documents released so far also show that a great deal of NSA/GCHQ surveillance is not targeted at all. Vast amounts of data are scooped up indiscriminately, in the hope that some of it will someday prove useful. Worse, the NSA has decided that bulk surveillance justifies its efforts to undermine many of the security technologies that protect our own information systems. The President’s own hand-picked review council has strongly recommended this practice be stopped, but their advice has — to all appearances — been disregarded. These are matters that are worthy of debate, but this debate hasn’t happened.

Unfortunately, if we can’t enact changes to fix these problems, technology is probably about all that’s left. Over the next few years encryption technologies are going to be widely deployed, not only by individuals but also by corporations desperately trying to reassure overseas customers who doubt the integrity of US technology.

In that world, it’s important to know what works and doesn’t work. Insofar as this story tells us that, it makes us all better off.

A letter from US security researchers

This week a group of more than fifty prominent security and cryptography researchers signed a letter protesting the mass surveillance efforts of the NSA, and attempts by NSA to weaken cryptography and privacy protections on the Internet. The full letter can be found here.

Most of you have already formed your own opinions on the issue over the past several months, and it’s unlikely that one letter is going to change that. Nonetheless, I’d like a chance to explain why this statement matters.

For academic professionals in the information security field, the relationship with NSA has always been a bit complicated. However, for the most part the public side of that relationship has been generally positive. Up until 2013 if you’d asked most US security researchers for their opinions on NSA, you would, of course, have heard a range of views. But you also might have heard notes of (perhaps grudging) respect. This is because many of the NSA’s public activities have been obviously in everyone’s interest — helping to fund research and secure our information systems.

Even where evidence indicated the possibility of unfair dealing, most researchers were content to dismiss these allegations as conspiracy theories. We believed the NSA would stay between the lines. Putting backdoors into US information standards was possible, of course. But would they do it? We thought nobody would be that foolish. We were wrong.

In my opinion this letter represents more than just an appeal to conscience. It measures the priceless trust and goodwill the NSA has lost — and continues to lose while our country fails to make serious reforms to this agency.

While I’m certain the NSA itself will survive this loss of faith in the short term, in the long term our economic and electronic security depend very much on the cooperation of academia, industry and private citizens. The NSA’s actions have destroyed this trust. And ironically, that makes us all less safe.

A few more notes on NSA random number generators

Last Friday, Joseph Menn from Reuters published an article claiming that RSA, the pioneering security firm and division of EMC, accepted $10 million to include the Dual EC random number generator as the default in their flagship BSAFE library. I’ve written a bit about Dual EC on this blog, so readers will know that I don’t think highly of it. For one thing, it’s a rotten choice for a commercial product, due to its lousy software performance. Much worse: it may introduce a practical backdoor into any system that uses it.

Given the numerous problems with Dual EC it’s baffling that RSA would select this particular generator as the default for their software. It’s particularly strange given that BSAFE is designed for use in Java-based and embedded systems where performance truly is at a premium. And none of this can be explained by the needs of a single RSA customer, since those needs could be satisfied merely by making BSAFE an option, rather than the default.

Of course there have been many people who had viewed RSA’s decision as potentially malicious. Unsupported rumors have been floating around since long before Reuters got involved. What’s new this time is that RSA insiders appear to be going on the record.

I suppose time will tell how the industry reacts to this news. In the interim I have a few small facts to add to the discussion, so I thought I’d sketch them out in this post.

#1: Dual_EC_DRBG’s ‘backdoor’ was known as of January 2005

It’s widely believed that the ‘vulnerability’ in Dual_EC was first identified by Microsoft employees Dan Shumow and Niels Ferguson in the summer of 2007. Tanja Lange (and nymble) recently tipped me off to the fact that this isn’t precisely true.

In point of fact, the possibility of a backdoor was known to at least some members of the ANSI X9.82 standardization committee as far back as January 2005. This surprising news comes via a patent application filed by Certicom employees Dan Brown and Scott Vanstone. The application claims a priority date of January 2005. Here’s the scary bit:

If P and Q are established in a security domain controlled by an administrator, and the entity who generates Q for the domain does so with knowledge of e (or indirectly via knowledge of d). The administrator will have an escrow key for every ECRNG that follows that standard. 

Escrow keys are known to have advantages in some contexts. They can provide a backup functionality. If a cryptographic key is lost, then data encrypted under that key is also lost. However, encryption keys are generally the output of random number generators. Therefore, if the ECRNG is used to generate the encryption key K, then it may be possible that the escrow key e can be used to recover the encryption key K. Escrow keys can provide other functionality, such as for use in a wiretap. In this case, trusted law enforcement agents may need to decrypt encrypted traffic of criminals, and to do this they may want to be able to use an escrow key to recover an encryption key.

For example, in the SSL and TLS protocols, which are used for securing web (HTTP) traffic, a client and server perform a handshake in which their first actions are to exchange random values sent in the clear.

The patent also describes a number of ways to close the backdoor in Dual_EC_DRBG. Indeed, it may be due to Brown and Vanstone that the NIST standard includes an alternative method to close the backdoor (by generating a random Q point).

The existence of this patent does not mean that Brown and Vanstone were responsible for Dual EC. In fact, the generator appears to be an NSA invention, and may date back to the early 2000s. What this patent demonstrates is that some members of the ANSI committee, of which RSA was also a member, had reason to at least suspect that Dual EC could be used to create a wiretapping backdoor. (Update: John Kelsey confirms this.) It would be curious to know how widely this information was shared, and whether anyone on the committee ever inquired as to the provenance of the default parameters.

#2. Dual_EC_DRBG is not really a NIST standard

This is hardly a secret, but it’s something that hasn’t been widely acknowledged in the press. Dual_EC_DRBG is generally viewed as a NIST standard, since it was published in NIST Special Publication 800-90. But that’s not the only place it appears, nor was it developed at NIST.

A complete history of Dual_EC_DRBG would begin with NSA’s drive to include it in the ANSI X9.82 DRBG standard, with a standardization process kicked off in the early 2000s. The draft ANSI standard includes Dual_EC_DRBG with all of the known parameters, along with several additional elliptic curve parameters that were not included in the NIST standards.

Members of the ANSI X9F1 Tool Standards and Guidelines Group which wrote ANSI X9.82.

However you’ll also find Dual_EC_DRBG in the international standard ISO 18031. In other words, Dual EC was widely propagated in numerous standards long before NIST finalized it, and it’s hardly limited to US borders. The ANSI and ISO drafts are in some sense worse, since they don’t include a technique for generating your own Q parameter.

#3. Dual_EC_DRBG is not the only asymmetric random number generator in the ANSI and ISO standards

Cryptographers generally think of Dual EC as the only ‘public key’ random number generator to be widely standardized. We also point to NSA’s generation of the public parameters as evidence that Dual_EC may be compromised.

But in fact, the ANSI X9.82 and ISO standards each include a second generator based on public key cryptographic techniques. And like Dual EC, this one ships with a complete set of default parameters! The additional generator is based on an algorithm due to Micali and Schnorr, and relies for its security on assumptions related to the hardness of factoring large composite numbers. It requires an RSA-type modulus, several of which are conveniently provided in the specification.

Two default MS-DRBG moduli from the ISO 18031 specification.

There’s no reason to suspect that MS-DRBG is used by any real products, let alone that there’s a backdoor in the standard. In fact, there’s a curious fact about it: it’s not obvious from the public literature how one would attack the generator even if one knew the factorization of the n values above — though it seems intuitively likely that an attack does exist. Solving the problem, and finding a practical attack for known factorization, would be a fun project for an enthusiastic mathematician.
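
For the curious, here’s a toy sketch of the Micali-Schnorr construction as described in the textbooks (e.g., HAC Algorithm 5.37). The modulus and exponent below are tiny stand-ins, and the standard’s size constraints are deliberately relaxed so it runs instantly — this is emphatically not the ISO 18031 instantiation.

    from math import gcd

    # Tiny illustrative RSA-type modulus; the real generator ships with large
    # moduli like the ones shown above.
    p, q = 2003, 2039
    n, e = p * q, 3
    assert gcd(e, (p - 1) * (q - 1)) == 1

    N = n.bit_length()
    r = N // 2           # bits of state carried forward each step
    k = N - r            # bits output each step

    def ms_drbg(seed: int, steps: int):
        x = seed % (1 << r)
        out = []
        for _ in range(steps):
            y = pow(x, e, n)                  # the public-key operation
            x = y >> k                        # top r bits become the new state
            out.append(y & ((1 << k) - 1))    # bottom k bits are the output
        return out

    print(ms_drbg(seed=12345, steps=5))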

Since MS-DRBG comes from the same people who brought you Dual EC, if you are using it you might want to think twice.

How does the NSA break SSL?

A few weeks ago I wrote a long post about the NSA’s ‘BULLRUN’ project to subvert modern encryption standards. I had intended to come back to this at some point, since I didn’t have time to discuss the issues in detail. But then things got in the way. A lot of things, actually. Some of which I hope to write about in the near future.

But before I get there, and at the risk of boring you all to tears, I wanted to come back to this subject at least one more time, if only to pontificate a bit about a question that’s been bugging me.

You see, the NSA BULLRUN briefing sheet mentions that NSA has been breaking quite a few encryption technologies, some of which are more interesting than others. One of those technologies is particularly surprising to me, since I just can’t figure how NSA might be doing it. In this extremely long post I’m going to try to dig a bit deeper into the most important question facing the Internet today.

Specifically: how the hell is NSA breaking SSL?

Section of the BULLRUN briefing sheet. Source: New York Times.

To keep things on target I’m going to make a few basic ground rules.

First, I’m well aware that NSA can install malware on your computer and pwn any cryptography you choose. That doesn’t interest me at all, for the simple reason that it doesn’t scale well. NSA can do this to you, but they can’t do it for an entire population. And that’s really what concerns me about the recent leaks: the possibility that NSA is breaking encryption for the purposes of mass surveillance.

For the same reason, we’re not going to worry about man-in-the-middle (MITM) attacks. While we know that NSA does run these, they’re also a very targeted attack. Not only are MITMs detectable if you do them at large scale, they don’t comport with what we know about how NSA does large-scale interception — mostly via beam splitters and taps. In other words: we’re really concerned about passive surveillance.

The rules above aren’t absolute, of course. We will consider limited targeted attacks on servers, provided they later permit passive decryption of large amounts of traffic; e.g., decryption of traffic to major websites. We will also consider arbitrary modifications to software and hardware — something we know NSA is already doing.

One last point: to keep things from going off the rails, I’ve helpfully divided this post into two sections. The first will cover attacks that use only known techniques. Everything in this section can be implemented by a TAO employee with enough gumption and access to software. The second section, which I’ve titled the ‘Tinfoil Hat Spectrum‘ covers the fun and speculative stuff — ranging from new side channel attacks all the way to that huge quantum computer the NSA keeps next to BWI.

We’ll start with the ‘practical’.

Attacks that use Known Techniques

Theft of RSA keys. The most obvious way to ‘crack’ SSL doesn’t really involve cracking anything. Why waste time and money on cryptanalysis when you can just steal the keys? This issue is of particular concern in servers configured for the TLS RSA handshake, where a single 128-byte server key is all you need to decrypt every past and future connection made from the device.

In fact, this technique is so obvious that it’s hard to imagine NSA spending a lot of resources on sophisticated cryptanalytic attacks. We know that GCHQ and NSA are perfectly comfortable suborning even US providers overseas. And inside our borders, they’ve demonstrated a willingness to obtain TLS/SSL keys using subpoena powers and gag orders. If you’re using an RSA connection to a major website, it may be sensible to assume the key is already known.

Of course, even where NSA doesn’t resort to direct measures, there’s always the possibility of obtaining keys via a remote software exploit. The beauty is that these attacks don’t even require remote code execution. Given the right vulnerability, it may simply require a handful of malformed SSL requests to map the full contents of the OpenSSL/SChannel heap.
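
To see why a single key is so valuable, here’s a sketch (not any real tool; the file names are made up) of what an attacker does with a stolen server key and a recorded RSA handshake:

    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    # A stolen or subpoenaed server key, plus the ClientKeyExchange body pulled
    # out of a recorded handshake (both hypothetical files).
    with open("stolen_server_key.pem", "rb") as f:
        server_key = serialization.load_pem_private_key(f.read(), password=None)
    with open("recorded_client_key_exchange.bin", "rb") as f:
        encrypted_pre_master = f.read()

    # The TLS RSA handshake encrypts the 48-byte pre-master secret directly
    # under the server's public key with PKCS#1 v1.5 padding.
    pre_master_secret = server_key.decrypt(encrypted_pre_master, padding.PKCS1v15())

    # From here, running the TLS PRF over pre_master_secret and the (public)
    # client/server randoms yields every session key -- for this connection and
    # any other one ever recorded against the same server key.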

Source: New York Times

Suborning hardware encryption chips. A significant fraction of SSL traffic on the Internet is produced by hardware devices such as SSL terminators and VPN-enabled routers. Fortunately we don’t have to speculate about the security of these devices — we already know NSA/GCHQ have been collaborating with hardware manufacturers to ‘enable’ decryption on several major VPN encryption chips.

The NSA documents aren’t clear on how this capability works, or if it even involves SSL. If it does, the obvious guess is that each chip encrypts and exfiltrates bits of the session key via ‘random’ fields such as IVs and handshake nonces. Indeed, this is relatively easy to implement on an opaque hardware device. The interesting question is how one ensures these backdoors can only be exploited by NSA — and not by rival intelligence agencies. (Some thoughts on that here.)

Side channel attacks. Traditionally when we analyze cryptographic algorithms we concern ourselves with the expected inputs and outputs of the system. But real systems leak all kinds of extra information. These ‘side channels’ — which include operation time, resource consumption, cache timing, and RF emissions — can often be used to extract secret key material.

The good news is that most of these channels are only exploitable when the attacker is in physical proximity to a TLS server. The bad news is that there are conditions in which the attacker can get close. The most obvious example involves virtualized TLS servers in the cloud setting, where a clever attacker may share physical resources with the target device.

A second class of attack uses remote timing information to slowly recover an RSA key. These attacks can be disabled via countermeasures such as RSA blinding, though amusingly, some ‘secure’ hardware co-processors may actually turn these countermeasures off by default! At very least, this makes the hardware vulnerable to attacks by a local user, and could even facilitate remote recovery of RSA keys.
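
For reference, blinding itself is a tiny amount of code — which makes it all the stranger when hardware turns it off. A toy sketch on raw integers (no padding, no real keys), assuming the blinding factor is coprime to n:

    import secrets

    def blinded_rsa_decrypt(c: int, d: int, e: int, n: int) -> int:
        # Multiply the ciphertext by r^e so the exponentiation's timing no longer
        # depends directly on attacker-chosen input, then strip the factor off.
        r = secrets.randbelow(n - 2) + 2            # fresh blinding factor per call
        c_blinded = (c * pow(r, e, n)) % n          # c' = c * r^e mod n
        m_blinded = pow(c_blinded, d, n)            # (c')^d = m * r mod n
        return (m_blinded * pow(r, -1, n)) % n      # unblind with r^-1 mod n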

Weak random number generators. Even if you’re using strong Perfect Forward Secrecy ciphersuites, the security of TLS depends fundamentally on the availability of unpredictable random numbers. Not coincidentally, tampering with random number generator standards appears to have been a particular focus of NSA’s efforts.

Random numbers are critical to a number of elements in TLS, but they’re particularly important in three places:

  1. On the client side, during the RSA handshake. The RNG is used to generate the RSA pre-master secret and encryption padding. If the attacker can predict the output of this generator, she can subsequently decrypt the entire session. Ironically, a failure of the server RNG is much less devastating to the RSA handshake.*
  2. On the client or server side, during the Diffie-Hellman handshake(s). Since Diffie-Hellman requires a contribution from each side of the connection, a predictable RNG on either side renders the session completely transparent.
  3. During long-term key generation, particularly of RSA keys. If this happens, you’re screwed.
And you just don’t need to be that sophisticated to weaken a random number generator. These generators are already surprisingly fragile, and it’s awfully difficult to detect when one is broken. Debian’s maintainers made this point beautifully back in 2008 when an errant code cleanup reduced the effective entropy of OpenSSL to just 16 bits. In fact, RNGs are so vulnerable that the challenge here is not weakening the RNG — any idiot with a keyboard can do that — it’s doing so without making the implementation trivially vulnerable to everyone else.
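
To see how little an attacker needs when the seed space is that small, here’s a sketch. Python’s random module stands in for the broken generator — it is not what OpenSSL used — but the arithmetic is the same: a PID-sized seed space can simply be enumerated.

    import random

    def keygen(seed: int) -> int:
        rng = random.Random(seed)       # effectively seeded by a process ID only
        return rng.getrandbits(128)     # the 'secret' key material

    victim_key = keygen(seed=23517)     # the victim's PID, unknown to the attacker

    # The attacker tries all ~32k possible PIDs and recovers the key in seconds.
    recovered_pid = next(pid for pid in range(1, 2**15 + 1)
                         if keygen(pid) == victim_key)
    assert recovered_pid == 23517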

The good news is that it’s relatively easy to tamper with an SSL implementation to make it encrypt and exfiltrate the current RNG seed. This still requires someone to physically alter the library, or install a persistent exploit, but it can be done cleverly without even adding much new code to the existing OpenSSL code. (OpenSSL’s love of function pointers makes it particularly easy to tamper with this stuff.)

If tampering isn’t your style, why not put the backdoor in plain sight? That’s the approach NSA took with the Dual_EC RNG, standardized by NIST in Special Publication 800-90. There’s compelling evidence that NSA deliberately engineered this generator with a backdoor — one that allows them to break any TLS/SSL connection made using it. Since the generator is (was) the default in RSA’s BSAFE library, you should expect every TLS connection made using that software to be potentially compromised.

And I haven’t even mentioned Intel’s plans to replace the Linux kernel RNG with its own hardware RNG.

Esoteric Weaknesses in PFS systems. Many web servers, including Google and Facebook, now use Perfect Forward Secrecy ciphersuites like ephemeral Diffie-Hellman (DHE and ECDHE). In theory these ciphersuites provide the best of all possible worlds: keys persist for one session and then disappear once the connection is over. While this doesn’t save you from RNG issues, it does make key theft a whole lot more difficult.

PFS ciphersuites are a good thing, but a variety of subtle issues can cramp their style. For one thing, the session resumption mechanism can be finicky: session keys must either be stored locally, or encrypted and given out to users in the form of session tickets. Unfortunately, the use of session tickets somewhat diminishes the ‘perfectness’ of PFS systems, since the keys used for encrypting the tickets now represent a major weakness in the system. Moreover, you can’t even keep them internal to one server, since they have to be shared among all of a site’s front-end servers! In short, they seem like kind of a nightmare.

A final area of concern is the validation of Diffie-Hellman parameters. The current SSL design assumes that DH groups are always honestly generated by the server. But a malicious implementation can violate this assumption and use bad parameters, which enable third party eavesdropping. This seems like a pretty unlikely avenue for enabling surveillance, but it goes to show how delicate these systems are.
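
The checks a client would want to run on server-supplied parameters aren’t complicated — they’re just not part of the protocol. A sketch (illustrative checks only, using sympy’s primality test for brevity):

    from sympy import isprime

    def validate_dh_params(p: int, g: int, min_bits: int = 2048) -> None:
        # p should be a large safe prime, and g should land in the big
        # prime-order subgroup.
        if p.bit_length() < min_bits:
            raise ValueError(f"prime too small: {p.bit_length()} bits")
        if not isprime(p):
            raise ValueError("p is not prime")
        q = (p - 1) // 2
        if not isprime(q):
            raise ValueError("p is not a safe prime")
        if not 1 < g < p - 1 or pow(g, q, p) != 1:
            raise ValueError("g does not generate the order-q subgroup")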

The Tinfoil Hat Spectrum

I’m going to refer to the next batch of attacks as ‘tinfoil hat‘ vulnerabilities. Where the previous issues all leverage well-known techniques, each of the following proposals requires totally new cryptanalytic techniques. All of which is a way of saying that the following section is pure speculation. It’s fun to speculate, of course. But it requires us to assume facts not in evidence. Moreover, we have to be a bit careful about where we stop.

So from here on out we are essentially conducting a thought-experiment. Let’s imagine the NSA has a passive SSL-breaking capability; and furthermore, that it doesn’t rely on the tricks of the previous section. What’s left?

The following list begins with the most ‘likely’ theories and works towards the truly insane.

Breaking RSA keys. There’s a persistent rumor in our field that NSA is cracking 1024-bit RSA keys. It’s doubtful this rumor stems from any real knowledge of NSA operations. More likely it’s driven by the fact that cracking 1024-bit keys is highly feasible for an organization with NSA’s resources.

How feasible? Several credible researchers have attempted to answer this question, and it turns out that the cost is lower than you think. Way back in 2003, Shamir and Tromer estimated $10 million for a purpose-built machine that could factor one 1024-bit key per year. In 2013, Tromer reduced those numbers to about $1 million, factoring in hardware advances. And it could be significantly lower. This is pocket change for NSA.

Along similar lines, Bernstein, Heninger and Lange examined the feasibility of cracking RSA using distributed networks of standard PCs. Their results are pretty disturbing: in principle, a cluster about the size of the real-life Conficker botnet could do serious violence to 1024-bit keys.

Given all this, you might ask why this possibility is even in the ‘tinfoil hat’ category. The simple answer is: because nobody’s actually done it. That means it’s at least conceivable that the estimates above are dramatically too high — or even too low. Moreover, RSA-1024 keys are rapidly being phased out. Cracking 2048-bit keys would require significant mathematical advances, taking us much deeper into the tinfoil hat.**

Cracking RC4. On paper, TLS supports a variety of strong encryption algorithms. In practice, about half of all TLS traffic is secured with the creaky old RC4 cipher. And this should worry you — because RC4 is starting to show its age. In fact, as used in TLS it’s already vulnerable to (borderline) practical attacks. Thus it seems like a nice candidate for a true cryptanalytic advance on NSA’s part.

Unfortunately the problem with this theory is that we simply don’t know of any attack that would allow the NSA to usefully crack RC4! The known techniques require an attacker to collect thousands or millions of ciphertexts that are either (a) encrypted with related keys (as in WEP) or (b) contain the same plaintext. The best known attack against TLS takes the latter form — it requires the victim to establish billions of sessions, and even then it only recovers fixed plaintext elements like cookies or passwords.
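
If you want a feel for what these biases look like, the toy experiment below (my own parameters) implements RC4 and estimates the classic Mantin-Shamir result: the second keystream byte is zero about twice as often as it should be, roughly 1/128 instead of 1/256. The TLS attacks stack up many small biases like this one across an enormous number of sessions.

    # Toy experiment: estimate the bias of RC4's second keystream byte toward zero.
    # Expected result: Pr[Z2 = 0] is roughly 2/256 rather than the uniform 1/256.
    # (Pure Python, so the loop below takes a few seconds to run.)
    import os

    def rc4_keystream(key: bytes, n: int) -> bytes:
        # Key-scheduling algorithm (KSA)
        S = list(range(256))
        j = 0
        for i in range(256):
            j = (j + S[i] + key[i % len(key)]) % 256
            S[i], S[j] = S[j], S[i]
        # Pseudo-random generation algorithm (PRGA)
        out, i, j = [], 0, 0
        for _ in range(n):
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            out.append(S[(S[i] + S[j]) % 256])
        return bytes(out)

    trials = 100_000
    hits = sum(rc4_keystream(os.urandom(16), 2)[1] == 0 for _ in range(trials))
    print(f"Pr[Z2 = 0] is about {hits / trials:.4f} (uniform would be {1/256:.4f})")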

The counterargument is that the public research community hasn’t been thinking very hard about RC4 for the past decade — in part because we thought it was so broken people had stopped using it (oops!) If we’d been focusing all our attention on it (or better, the NSA’s attention), who knows what we’d have today.

If you told me the NSA had one truly new cryptanalytic capability, I’d agree with Jake and point the finger at RC4. Mostly because the alternatives are far scarier.

New side-channel attacks. For the most part, remote timing attacks appear to have been killed off by the implementation of countermeasures such as RSA blinding, which confound timing by multiplying a random blinding factor into each ciphertext prior to decryption. In theory this should make timing information essentially worthless. In practice, many TLS implementations make compromises in the blinding code that might resurrect these attacks, such as squaring a blinding factor between decryption operations rather than generating a fresh one each time. It’s quite unlikely there are attacks here, but who knows.
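
Here’s roughly what blinding looks like, together with the sort of shortcut I’m describing. This is a schematic sketch using textbook RSA with no padding, and the names are mine, not any particular library’s.

    # Schematic RSA decryption with blinding (textbook RSA, no padding, my names).
    # The 'shortcut' class reuses a squared blinding factor between calls, which is
    # the kind of compromise that can chip away at the countermeasure.
    import secrets

    def decrypt_blinded(c: int, d: int, e: int, n: int) -> int:
        r = secrets.randbelow(n - 3) + 2      # fresh blinding factor for every call
        r_inv = pow(r, -1, n)                 # modular inverse (Python 3.8+); assumes
                                              # gcd(r, n) == 1, essentially guaranteed
                                              # for a real RSA modulus
        blinded = (c * pow(r, e, n)) % n      # the private-key op now sees c * r^e, not c
        return (pow(blinded, d, n) * r_inv) % n

    class ShortcutDecryptor:
        # Variant that squares a cached blinding factor instead of resampling it.
        def __init__(self, d: int, e: int, n: int):
            self.d, self.e, self.n = d, e, n
            self._r = secrets.randbelow(n - 3) + 2
            self._r_inv = pow(self._r, -1, n)

        def decrypt(self, c: int) -> int:
            blinded = (c * pow(self._r, self.e, self.n)) % self.n
            m = (pow(blinded, self.d, self.n) * self._r_inv) % self.n
            # Cheaper than fresh randomness, but successive blinds are now related.
            self._r = pow(self._r, 2, self.n)
            self._r_inv = pow(self._r_inv, 2, self.n)
            return m

Whether the shortcut actually leaks anything depends on the rest of the implementation; the point is simply that it trades away the ‘fresh randomness per operation’ property the countermeasure was designed to provide.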

Goofy stuff. Maybe NSA does have something truly amazing up its sleeve. The problem with opening this Pandora’s box is that it’s really hard to get it closed again. Did Jerry Solinas really cook the NIST P-curves to support some amazing new attack (which NSA knew about way back in the late 1990s, but we have not yet discovered)? Does the NSA have a giant supercomputer named TRANSLTR that can brute-force any cryptosystem? Is there a giant quantum computer at the BWI Friendship annex? For answers to these questions you may as well just shake the Magic 8-Ball, because I don’t have a clue.

Conclusion

We don’t know and can’t know the answer to these things, and honestly it’ll make you crazy if you start thinking about it. All we can really do is take NSA/GCHQ at their word when they tell us that these capabilities are ‘extremely fragile’. That should at least give us hope.

The question now is if we can guess well enough to turn that fragility from a warning into a promise.

Notes:

* A failure of the server RNG could result in some predictable values like the ServerRandom and session IDs. An attacker who can predict these values may be able to run active attacks against the protocol, but — in the RSA ciphersuite, at least — they don’t admit passive compromise.

** Even though 1024-bit RSA keys are being eliminated, many servers still use 1024-bit for Diffie-Hellman (mostly for efficiency reasons). The attacks on these keys are similar to the ones used against RSA — however, the major difference is that fresh Diffie-Hellman ‘ephemeral’ keys are generated for each new connection. Breaking large amounts of traffic seems quite costly.

RSA warns developers not to use RSA products

In today’s news of the weird, RSA (a division of EMC) has recommended that developers desist from using the (allegedly) ‘backdoored’ Dual_EC_DRBG random number generator — which happens to be the default in RSA’s BSafe cryptographic toolkit. Youch.

In case you’re missing the story here, Dual_EC_DRBG (which I wrote about yesterday) is the random number generator voted most likely to be backdoored by the NSA. The story here is that — despite many valid concerns about this generator — RSA went ahead and made it the default generator used for all cryptography in its flagship cryptography library. The implications for RSA and RSA-based products are staggering. In a modestly bad (but by no means worst) case, the NSA may be able to intercept SSL/TLS connections made by products implemented with BSafe.

So why would RSA pick Dual_EC as the default? You got me. Not only is Dual_EC hilariously slow — which has real performance implications — it was shown to be a just plain bad random number generator all the way back in 2006. By 2007, when Shumow and Ferguson raised the possibility of a backdoor in the specification, no sensible cryptographer would go near the thing.

And the killer is that RSA employs a number of highly distinguished cryptographers! It’s unlikely that they’d all miss the news about Dual_EC.

We can only speculate about the past. But here in the present we get to watch RSA’s CTO Sam Curry publicly defend RSA’s choices. I sort of feel bad for the guy. But let’s make fun of it anyway.

I’ll take his statement line by line (Sam’s statements are the quoted lines):

“Plenty of other crypto functions (PBKDF2, bcrypt, scrypt) will iterate a hash 1000 times specifically to make it slower.”

Password hash functions are built deliberately slow to frustrate dictionary attacks. Making a random number generator slow is just dumb.

“At the time, elliptic curves were in vogue”

Say what?

“and hash-based RNG was under scrutiny.”

Nonsense. A single obsolete hash-based generator (FIPS 186) was under scrutiny — and fixed. The NIST SP800-90 draft in which Dual_EC appeared ALSO provided three perfectly nice non-backdoored generators: two based on hash functions and one based on AES. BSafe even implements some of them. Sam, this statement is just plain misleading.

“The hope was that elliptic curve techniques—based as they are on number theory—would not suffer many of the same weaknesses as other techniques (like the FIPS 186 SHA-1 generator) that were seen as negative”

Dual-EC suffers exactly the same sort of weaknesses as FIPS 186. Unlike the alternative generators in NIST SP800-90 it has a significant bias and really should not be used in production systems. RSA certainly had access to this information after the analyses were published in 2006.

“and Dual_EC_DRBG was an accepted and publicly scrutinized standard.”

And every bit of public scrutiny said the same thing: this thing is broken! Grab your children and run away!

“SP800-90 (which defines Dual EC DRBG) requires new features like continuous testing of the output, mandatory re-seeding,”

The exact same can be said for the hash-based and AES-based alternative generators you DIDN’T choose from SP800-90.

“optional prediction resistance and the ability to configure for different strengths.”

So did you take advantage of any of these options as part of the BSafe defaults? Why not? How about the very simple mitigations that NIST added to SP800-90A as a means to remove concerns that the generator might have a backdoor? Anyone?

There’s not too much else to say here. I guess the best way to put it is: this is all part of the process. First you find the disease. Then you see if you can cure it.

The Many Flaws of Dual_EC_DRBG

The Dual_EC_DRBG generator from NIST SP800-90A.

Update 9/19: RSA warns developers not to use the default Dual_EC_DRBG generator in BSAFE. Oh lord.

As a technical follow up to my previous post about the NSA’s war on crypto, I wanted to make a few specific points about standards. In particular I wanted to address the allegation that NSA inserted a backdoor into the Dual-EC pseudorandom number generator.

For those not following the story, Dual-EC is a pseudorandom number generator proposed by NIST for international use back in 2006. Just a few months later, Shumow and Ferguson made cryptographic history by pointing out that there might be an NSA backdoor in the algorithm. This possibility — fairly remarkable for an algorithm of this type — looked bad and smelled worse. If true, it spelled almost certain doom for anyone relying on Dual-EC to keep their system safe from spying eyes.

Now I should point out that much of this is ancient history. What is news today is the recent leak of classified documents that points a very emphatic finger towards Dual_EC, or rather, to an unnamed ‘2006 NIST standard’. The evidence that Dual-EC is this standard has now become so hard to ignore that NIST recently took the unprecedented step of warning implementers to avoid it altogether.

Better late than never.

In this post I’m going to try to explain the curious story of Dual-EC. While I’ll do my best to keep this discussion at a high and non-mathematical level, be forewarned that I’m probably going to fail at least at a couple of points. If you’re not in the mood for all that, here’s a short summary:

  • In 2005-2006 NIST and NSA released a pseudorandom number generator based on elliptic curve cryptography. They released this standard — with very little explanation — both in the US and abroad.
  • This RNG has some serious issues with just being a good RNG. The presence of such obvious bugs was mysterious to cryptographers.
  • In 2007 a pair of Microsoft researchers pointed out that these vulnerabilities combined to produce a perfect storm, which — together with some knowledge that only NIST/NSA might have — opened a perfect backdoor into the random number generator itself.
  • This backdoor may allow the NSA to break nearly any cryptographic system that uses it.

If you’re still with me, strap in. Here goes the long version.

Dual-EC

For a good summary on the history of Dual-EC-DRBG, see this 2007 post by Bruce Schneier. Here I’ll just give the highlights.

Back in 2004-5, NIST decided to address a longstanding weakness of the FIPS standards, namely, the limited number of approved pseudorandom bit generator algorithms (PRGs, or ‘DRBGs’ in NIST parlance) available to implementers. This was actually a bit of an issue for FIPS developers, since the existing random number generators had some known design weaknesses.*

NIST’s answer to this problem was Special Publication 800-90, parts of which were later wrapped up into the international standard ISO 18031. The NIST pub added four new generators to the FIPS canon. None of these algorithms is a true random number generator in the sense that they collect physical entropy. Instead, what they do is process the (short) output of a true random number generator — like the one in Linux — conditioning and stretching this ‘seed’ into a large number of random-looking bits you can use to get things done.** This is particularly important for FIPS-certified cryptographic modules, since the FIPS 140-2 standards typically require you to use a DRBG as a kind of ‘post-processing’ — even when you have a decent hardware generator.
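
In code, the seed-and-stretch idea looks something like the sketch below. To be clear, this is a deliberately simplified toy of my own, not the actual SP800-90 Hash_DRBG, which adds reseed counters, personalization strings and a bunch of other machinery.

    # A deliberately simplified, hash-based sketch of the seed-and-stretch pattern.
    # This is NOT the actual SP800-90 Hash_DRBG; the names and details are mine.
    import hashlib
    import os

    class ToyHashDRBG:
        def __init__(self, seed: bytes):
            # 'Condition' the raw seed material into a fixed-size internal state.
            self.state = hashlib.sha256(b"instantiate" + seed).digest()
            self.counter = 0

        def generate(self, n_bytes: int) -> bytes:
            out = b""
            while len(out) < n_bytes:
                self.counter += 1
                out += hashlib.sha256(
                    b"generate" + self.state + self.counter.to_bytes(8, "big")
                ).digest()
            # Ratchet the state so a later compromise doesn't expose old outputs.
            self.state = hashlib.sha256(b"update" + self.state).digest()
            return out[:n_bytes]

    # Seed once from the OS's true randomness, then stretch as much as you like.
    drbg = ToyHashDRBG(os.urandom(32))
    print(drbg.generate(64).hex())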

The first three SP800-90 proposals used standard symmetric components like hash functions and block ciphers. Dual_EC_DRBG was the odd one out, since it employed mathematics more typically used to construct public-key cryptosystems. This had some immediate consequences for the generator: Dual-EC is slow in a way that its cousins aren’t. Up to a thousand times slower.

Now before you panic about this, the inefficiency of Dual_EC is not necessarily one of its flaws! Indeed, the inclusion of an algebraic generator actually makes a certain amount of sense. The academic literature includes a distinguished history of provably secure PRGs based on number-theoretic assumptions, and it certainly didn’t hurt to consider one such construction for standardization. Most developers would probably use the faster symmetric alternatives, but perhaps a small number would prefer the added confidence of a provably-secure construction.

Unfortunately, here is where NIST ran into their first problem with Dual_EC.

Flaw #1: Dual-EC has no security proof

Let me spell this out as clearly as I can. In the course of proposing this complex and slow new PRG where the only damn reason you’d ever use the thing is for its security reduction, NIST forgot to provide one. This is like selling someone a Mercedes and forgetting to attach the hood ornament.

I’d like to say this fact alone should have damned Dual_EC, but sadly this is par for the course for NIST — which treats security proofs like those cool Japanese cookies you can never find. In other words, a fancy, exotic luxury. Indeed, NIST has a nasty habit of dumping proof-writing work onto outside academics, often after the standard has been finalized and implemented in products everywhere.

So when NIST put forward its first draft of SP800-90 in 2005, academic cryptographers were left to analyze it from scratch. Which, to their great credit, they were quite successful at.

The first thing reviewers noticed is that Dual-EC follows a known design paradigm — it’s a weird variant of an elliptic curve linear congruential generator. However they also noticed that NIST had made some odd rookie mistakes.

Now here we will have to get slightly wonky — though I will keep mathematics to a minimum. (I promise it will all come together in the end!) Constructions like Dual-EC have basically two stages:

1. A stage that generates a series of pseudorandom elliptic curve points. An elliptic curve point is a pair (x, y) that satisfies an elliptic curve equation. In general, both x and y are elements of a finite field, which for our purposes means they’re just large integers.***

The main operation of the PRNG is to apply mathematical operations to points on the elliptic curve, in order to generate new points that are pseudorandom — i.e., are indistinguishable from random points in some subgroup.

And the good news is that Dual-EC seems to do this first part beautifully! In fact Brown and Gjøsteen even proved that this part of the generator is sound provided that the Decisional Diffie-Hellman problem is hard in the specific elliptic curve subgroup. This is a well studied hardness assumption so we can probably feel pretty confident in this proof.

2. Extract pseudorandom bits from the generated EC points. While the points generated by Dual-EC may be pseudorandom, that doesn’t mean the specific (x, y) integer pairs are random bitstrings. For one thing, ‘x‘ and ‘y‘ are not really bitstrings at all, they’re integers less than some prime number. Most pairs don’t satisfy the curve equation or are not in the right subgroup. Hence you can’t just output the raw x or y values and expect them to make good pseudorandom bits.

Thus the second phase of the generator is to ‘extract’ some (but not all) of the bits from the EC points. Traditional literature designs do all sorts of things here — including hashing the point or dropping up to half of the bits of the x-coordinate. Dual-EC does something much simpler: it grabs the x coordinate, throws away the most significant 16-18 bits, and outputs the rest.

In 2006, first Gjøsteen and later Schoenmakers and Sidorenko took a close look at Dual-EC and independently came up with a surprising result:

Flaw #2: Dual-EC outputs too many bits

Unlike those previous EC PRGs which output anywhere from 2/3 to half of the bits from the x-coordinate, Dual-EC outputs nearly the entire thing.

This is good for efficiency, but unfortunately it also gives Dual-EC a bias. Due to some quirks in the mathematics of the field operations, an attacker can now predict the next bits of Dual-EC output with a fairly small — but non-trivial — success probability, in the range of 0.1%. While this number may seem small to non-cryptographers, it’s basically a hanging offense for a cryptographic random number generator where probability of predicting a future bit should be many orders of magnitude lower.

What’s just plain baffling is that this flaw ever saw the light of day. After all, the specification was developed by bright people at NIST — in collaboration with NSA. Either of those groups should easily have discovered a bug like this, especially since this issue had been previously studied. Indeed, within a few months of public release, two separate groups of academic cryptographers found it, and were able to implement an attack using standard PC equipment.

So in summary, the bias is mysterious and it seems to be very much an ‘own-goal’ on the NSA’s part. Why in the world would they release so much information from each EC point? It’s hard to say, but a bit more investigation reveals some interesting consequences:

Flaw #3: You can guess the original EC point from looking at the output bits

By itself this isn’t really a flaw, but will turn out to be interesting in just a minute.

Since Dual-EC outputs so many bits from the x-coordinate of each point — all but the most significant 16 bits — it’s relatively easy to guess the original source point by simply brute-forcing the missing 16 bits and solving the elliptic curve equation for y. (This is all high-school algebra, I swear!)

While this process probably won’t uniquely identify the original (x, y), it’ll give you a modestly sized list of candidates. Moreover with only 16 missing bits the search can be done quickly even on a desktop computer. Had Dual_EC thrown away more bits of the x-coordinate, this search would not have been feasible at all.

So what does this mean? In general, recovering the EC point shouldn’t actually be a huge problem. In theory it could lead to a weakness — say predicting future outputs — but in a proper design you would still have to solve a discrete logarithm instance for each and every point in order to predict the next bytes output by the generator.

And here is where things get interesting.

Flaw #4: If you know a certain property about the Dual_EC parameters, and can recover an output point, you can predict all subsequent outputs of the generator

Did I tell you this would get interesting in a minute? I totally did.

The next piece of our puzzle was discovered by Microsoft researchers Dan Shumow and Niels Ferguson, and announced at the CRYPTO 2007 rump session. I think this result can best be described via the totally intuitive diagram below. (Don’t worry, I’ll explain it!)

Annotated diagram from Shumow-Ferguson presentation (CRYPTO 2007).
Colorful elements were added by yours truly. Thick green arrows mean ‘this part is
easy to reverse’. Thick red arrows should mean the opposite. Unless you’re the NSA.

The Dual-EC generator consists of two stages: a portion that generates the output bits (right) and a part that updates the internal state (left).

Starting from the “r_i” value (circled, center) and heading right, the bit generation part first computes the output point using the function “r_i * Q” — where Q is an elliptic curve point defined in the parameters — then truncates 16 bits off of its x-coordinate to get the raw generator output. The “*” operator here describes elliptic point multiplication, which is a complex operation that should be relatively hard to invert.

Note that everything after the point multiplication should be easy to invert and recover from the output, as we discussed in the previous section.

Every time the generator produces one block of output bits, it also updates its internal state. This is designed to prevent attacks where someone compromises the internal values of a working generator, then uses this value to wind the generator backwards and guess past outputs. Starting again from the circled “r_i” value, the generator now heads upwards and computes the point “r_i * P” where P is a different elliptic curve point also described in the parameters. It then does some other stuff.

The theory here is that P and Q should be random points, and thus it should be difficult to find “r_i * P” used for state update even if you know the output point “r_i * Q” — which I stress you do know, because it’s easy to find. Going from one point to the other requires you to know a relationship between P and Q, which you shouldn’t actually know since they’re supposed to be random values. The difficulty of this is indicated by the thick red arrow.

Looks totally kosher to me. (Source: NIST SP800-90A)

There is, however, one tiny little exception to this rule. What if P and Q aren’t entirely random values? What if you chose them yourself specifically so you’d know the mathematical relationship between the two points?

In this case it turns out you can easily compute the next PRG state after recovering a single output point (from 32 bytes of RNG output). This means you can follow the equations through and predict the next output. And the next output after that. And on forever and forever.****
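
To make this concrete, here is a toy demonstration of the whole attack, covering both the point-recovery trick from the previous section and the backdoor prediction. Everything in it is scaled down and made up by me: a small curve, a handful of truncated bits instead of 16, and a P = d*Q relationship planted on purpose. It is not the real Dual-EC or the NIST parameters, but the structure is the same.

    # Toy Dual-EC-style generator with a deliberately planted backdoor.
    # Small, made-up parameters for illustration only; this is NOT the real
    # Dual_EC_DRBG or the NIST curves, just the same structure at demo scale.
    # (Toy code: it ignores the negligible chance of hitting the point at infinity.)
    import secrets

    P_MOD, A, B = 2**31 - 1, 2, 2      # toy curve y^2 = x^3 + 2x + 2 over F_p
    TRUNC = 8                          # bits dropped from each output (real Dual-EC: 16)
    KEEP = P_MOD.bit_length() - TRUNC  # bits of the x-coordinate actually revealed

    def ec_add(p1, p2):
        # Affine point addition; None plays the role of the point at infinity.
        if p1 is None:
            return p2
        if p2 is None:
            return p1
        (x1, y1), (x2, y2) = p1, p2
        if x1 == x2 and (y1 + y2) % P_MOD == 0:
            return None
        if p1 == p2:
            lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
        else:
            lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
        x3 = (lam * lam - x1 - x2) % P_MOD
        return (x3, (lam * (x1 - x3) - y1) % P_MOD)

    def ec_mul(k, pt):
        # Double-and-add scalar multiplication.
        acc = None
        while k:
            if k & 1:
                acc = ec_add(acc, pt)
            pt = ec_add(pt, pt)
            k >>= 1
        return acc

    def lift_x(x):
        # Return some curve point with this x-coordinate, or None if there is none.
        rhs = (x * x * x + A * x + B) % P_MOD
        y = pow(rhs, (P_MOD + 1) // 4, P_MOD)  # square root works because p = 3 (mod 4)
        return (x, y) if (y * y) % P_MOD == rhs else None

    # The 'standards body' picks Q, keeps a secret scalar d, and publishes P = d*Q.
    Q = next(pt for pt in (lift_x(x) for x in range(2, 1000)) if pt is not None)
    d = secrets.randbelow(2**20) + 2
    P = ec_mul(d, Q)

    def dual_ec_step(state):
        # One step: output the truncated x-coordinate of state*Q, move to x(state*P).
        out = ec_mul(state, Q)[0] & ((1 << KEEP) - 1)
        return out, ec_mul(state, P)[0]

    state = secrets.randbelow(P_MOD - 2) + 1
    out1, state = dual_ec_step(state)
    out2, state = dual_ec_step(state)          # the value the attacker wants to predict

    # The attack: brute-force the missing high bits of out1, lift each candidate back
    # to a point R = s*Q, then use the secret d to jump to d*R = s*P, the next state.
    predictions = set()
    for hi in range(1 << TRUNC):
        cand = lift_x((hi << KEEP) | out1)
        if cand is None:
            continue
        next_pt = ec_mul(d, cand)              # x(d * s*Q) equals x(s*P)
        if next_pt is None:
            continue
        predictions.add(dual_ec_step(next_pt[0])[0])

    print("candidate predictions:", len(predictions))
    print("real next output among them?", out2 in predictions)

Run it a few times: the brute-force step leaves only a modest set of candidates, and the true next output shows up among the predictions. With a couple more output blocks an attacker could narrow down the exact state, much as the fourth footnote below describes.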

This is a huge deal in the case of SSL/TLS, for example. If I use the Dual-EC PRG to generate the “Client Random” nonce transmitted in the beginning of an SSL connection, then the NSA will be able to predict the “Pre-Master” secret that I’m going to generate during the RSA handshake. Given this information the connection is now a cleartext read. This is not good.

So now you should all be asking the most important question of all: how the hell did the NSA generate the specific P and Q values recommended in Appendix A of Dual-EC-DRBG? And do they know the relationship that allows them to run this attack? All of which brings us to:

Flaw #5: Nobody knows where the recommended parameters came from

And if you think that’s problematic, welcome to the club.

But why? And where is Dual-EC used?

The ten million dollar question of Dual-EC is why the NSA would stick such an obviously backdoored algorithm into an important standard. Keep in mind that cryptographers found the major (bias) vulnerabilities almost immediately after Dual-EC shipped. The possibility of a ‘backdoor’ was announced in summer 2007. Who would still use it?

A few people have gone through the list of CMVP-evaluated products and found that the answer is: quite a few people would. Most certify Dual-EC simply because it’s implemented in OpenSSL-FIPS, and they happen to use that library. But at least one provider certifies it exclusively. Yuck.

Hardcoded constants from the OpenSSL-FIPS
implementation of Dual_EC_DRBG. Recognize ’em?

It’s worth keeping in mind that NIST standards carry a lot of weight — even those that might have a backdoor. Folks who aren’t keeping up on the latest crypto results could still innocently use the thing, either by accident or (less innocently) because the government asked them to. Even if they don’t use it, they might include the code in their product — say through the inclusion of OpenSSL-FIPS or MS Crypto API — which means it’s just a function call away from being surreptitiously activated.

Which is why people need to stop including Dual-EC immediately. We have no idea what it’s for, but it needs to go away. Now.

So what about the curves?

The last point I want to make is that the vulnerabilities in Dual-EC have precisely nothing to do with the specifics of the NIST standard elliptic curves themselves. The ‘back door’ in Dual-EC comes exclusively from the relationship between P and Q — the latter of which is published only in the Dual-EC specification. The attack can work even if you don’t use the NIST pseudorandom curves.

However, the revelations about NIST and the NSA certainly make it worth our time to ask whether these curves themselves are somehow weak. The best answer to that question is: we don’t know. Others have observed that NIST’s process for generating the curves leaves a lot to be desired. But including some kind of hypothetical backdoor would be a horrible, horrific idea — one that would almost certainly blow back at us.

You’d think people with common sense would realize this. Unfortunately we can’t count on that anymore.

Thanks to Tanja Lange for her assistance proofing this post. Any errors in the text are entirely mine.

Notes:

* My recollection of this period is hazy, but prior to SP800-90 the two most common FIPS DRBGs in production were (1) the SHA1-based DSA generator of FIPS 186-2 and (2) ANSI X9.31. The DSA generator was a special-purpose generator based on SHA1, and was really designed just for that purpose. ANSI X9.31 used block ciphers, but suffered from some more subtle weaknesses it retained from the earlier X9.17 generator. These were pointed out by Kelsey, Schneier, Wagner and Hall.

** This is actually a requirement of the FIPS 140-2 specification. Since FIPS does not approve any true random number generators, it instead mandates that you run your true RNG output through a DRBG (PRNG) first. The only exception is if your true RNG has been approved ‘for classified use’.

*** Specifically, x and y are integers in the range 0 to p-1 where p is a large prime number. A point is a pair (x, y) such that y^2 = x^3 + ax + b mod p. The values a and b are defined as part of the curve parameters.

**** The process of predicting future outputs involves a few guesses, since you don’t know the exact output point (and had to guess at the missing 16 bits), but you can easily reduce this to a small set of candidates — then it’s just a question of looking at a few more bits of RNG output until you guess the right one.

A note on the NSA, the future and fixing mistakes

Readers of this blog will know this has been an interesting couple of days for me. I have very mixed feelings about all this. On the one hand, it’s brought this blog a handful of new readers who might not have discovered it otherwise. On the other hand, it’s made me a part of the story in a way I don’t deserve to be.

After speaking with my colleagues and (most importantly) with my wife, I thought I might use the last few seconds of my inadvertent notoriety to make some highly non-technical points about the recent NSA revelations and my decision to blog about them.

I believe my first point should be self-evident: the NSA has made a number of terrible mistakes. These range from policy decisions to technical direction, to matters of their own internal security. There may have been a time when these mistakes could have been mitigated or avoided, but that time has passed. Personally I believe it passed even before Edward Snowden made his first contact with the press. But the disclosures of classified documents have set those decisions in stone.

Given these mistakes, we’re now faced with the job of cleaning up the mess. To that end there are two sets of questions: public policy questions — who should the NSA be spying on and how far should they be allowed to go in pursuit of that goal? And a second set of more technical questions: how do we repair the technological blowback from these decisions?

There are many bright people — quite a few in Congress — who are tending to the first debate. While I have my opinions about this, they’re (mostly) not the subject of this blog. Even if they were, I would probably be the wrong person to discuss them.

So my concern is the technical question. And I stress that while I label this ‘technical‘, it isn’t a question of equations and logic gates. The tech sector is one of the fastest growing and most innovative areas of the US economy. I believe the NSA’s actions have caused long-term damage to our credibility, in a manner that threatens our economic viability as well as, ironically, our national security.

The interesting question to me — as an American and as someone who cares about the integrity of speech — is how we restore faith in our technology. I don’t have the answers to this question right now. Unfortunately this is a long-term problem that will consume the output of researchers and technologists far more talented than I. I only hope to be involved in the process.

I know there are people at NSA who must be cursing Edward Snowden’s name and wishing we’d all stop talking about this. Too late. I hope that they understand the game we’re playing now. Their interests as well as mine now depend on repairing the damage. Downplaying the extent of the damage, or trying to restrict access to (formerly) classified documents does nobody any good.

It’s time to start fixing things.

On the NSA

Let me tell you the story of my tiny brush with the biggest crypto story of the year.

A few weeks ago I received a call from a reporter at ProPublica, asking me background questions about encryption. Right off the bat I knew this was going to be an odd conversation, since this gentleman seemed convinced that the NSA had vast capabilities to defeat encryption. And not in a ‘hey, d’ya think the NSA has vast capabilities to defeat encryption?’ kind of way. No, he’d already established the defeating. We were just haggling over the details.

Oddness aside it was a fun (if brief) set of conversations, mostly involving hypotheticals. If the NSA could do this, how might they do it? What would the impact be? I admit that at this point one of my biggest concerns was to avoid coming off like a crank. After all, if I got quoted sounding too much like an NSA conspiracy nut, my colleagues would laugh at me. Then I might not get invited to the cool security parties.

All of this is a long way of saying that I was totally unprepared for today’s bombshell revelations describing the NSA’s efforts to defeat encryption. Not only does the worst possible hypothetical I discussed appear to be true, but it’s true on a scale I couldn’t even imagine. I’m no longer the crank. I wasn’t even close to cranky enough.

And since I never got a chance to see the documents that sourced the NYT/ProPublica story — and I would give my right arm to see them — I’m determined to make up for this deficit with sheer speculation. Which is exactly what this blog post will be.

‘Bullrun’ and ‘Cheesy Name’ 

If you haven’t read the ProPublica/NYT or Guardian stories, you probably should. The TL;DR is that the NSA has been doing some very bad things. At a combined cost of $250 million per year, they include:

  1. Tampering with national standards (NIST is specifically mentioned) to promote weak, or otherwise vulnerable cryptography.
  2. Influencing standards committees to weaken protocols.
  3. Working with hardware and software vendors to weaken encryption and random number generators.
  4. Attacking the encryption used by ‘the next generation of 4G phones‘.
  5. Obtaining cleartext access to ‘a major internet peer-to-peer voice and text communications system’ (Skype?)
  6. Identifying and cracking vulnerable keys.
  7. Establishing a Human Intelligence division to infiltrate the global telecommunications industry.
  8. And worst of all (to me): somehow decrypting SSL connections.

All of these programs go by different code names, but the NSA’s decryption program goes by the name ‘Bullrun’ so that’s what I’ll use here.

How to break a cryptographic system

There’s almost too much here for a short blog post, so I’m going to start with a few general thoughts. Readers of this blog should know that there are basically three ways to break a cryptographic system. In no particular order, they are:

  1. Attack the cryptography. This is difficult and unlikely to work against the standard algorithms we use (though there are exceptions like RC4.) However there are many complex protocols in cryptography, and sometimes they are vulnerable.
  2. Go after the implementation. Cryptography is almost always implemented in software — and software is a disaster. Hardware isn’t that much better. Unfortunately active software exploits only work if you have a target in mind. If your goal is mass surveillance, you need to build insecurity in from the start. That means working with vendors to add backdoors.
  3. Access the human side. Why hack someone’s computer if you can get them to give you the key?

Bruce Schneier, who has seen the documents, says that ‘math is good’, but that ‘code has been subverted’. He also says that the NSA is ‘cheating‘. Which, assuming we can trust these documents, is a huge sigh of relief. But it also means we’re seeing a lot of (2) and (3) here.

So which code should we be concerned about? Which hardware?

SSL Servers by OS type. Source: Netcraft.

This is probably the most relevant question. If we’re talking about commercial encryption code, the lion’s share of it uses one of a small number of libraries. The most common of these are probably the Microsoft CryptoAPI (and Microsoft SChannel) along with the OpenSSL library.

Of the libraries above, Microsoft is probably due for the most scrutiny. While Microsoft employs good (and paranoid!) people to vet their algorithms, their ecosystem is obviously deeply closed-source. You can view Microsoft’s code (if you sign enough licensing agreements) but you’ll never build it yourself. Moreover they have the market share. If any commercial vendor is weakening encryption systems, Microsoft is probably the most likely suspect.

And this is a problem because Microsoft IIS powers around 20% of the web servers on the Internet — and nearly forty percent of the SSL servers! Moreover, even third-party encryption programs running on Windows often depend on CAPI components, including the random number generator. That makes these programs somewhat dependent on Microsoft’s honesty.

Probably the second most likely candidate is OpenSSL. I know it seems like heresy to imply that OpenSSL — an open source and widely-developed library — might be vulnerable. But at the same time it powers an enormous amount of secure traffic on the Internet, thanks not only to the dominance of Apache SSL, but also due to the fact that OpenSSL is used everywhere. You only have to glance at the FIPS CMVP validation lists to realize that many ‘commercial’ encryption products are just thin wrappers around OpenSSL.

Unfortunately while OpenSSL is open source, it periodically coughs up vulnerabilities. Part of this is due to the fact that it’s a patchwork nightmare originally developed by a programmer who thought it would be a fun way to learn Bignum division.* Part of it is because crypto is unbelievably complicated. Either way, there are very few people who really understand the whole codebase.

On the hardware side (and while we’re throwing out baseless accusations) it would be awfully nice to take another look at the Intel Secure Key integrated random number generators that most Intel processors will be getting shortly. Even if there’s no problem, it’s going to be an awfully hard job selling these internationally after today’s news.

Which standards?

From my point of view this is probably the most interesting and worrying part of today’s leak. Software is almost always broken, but standards — in theory — get read by everyone. It should be extremely difficult to weaken a standard without someone noticing. And yet the Guardian and NYT stories are extremely specific in their allegations about the NSA weakening standards.

The Guardian specifically calls out the National Institute of Standards and Technology (NIST) for a standard they published in 2006. Cryptographers have always had complicated feelings about NIST, and that’s mostly because NIST has a complicated relationship with the NSA.

Here’s the problem: the NSA ostensibly has both a defensive and an offensive mission. The defensive mission is pretty simple: it’s to make sure US information systems don’t get pwned. A substantial portion of that mission is accomplished through fruitful collaboration with NIST, which helps to promote data security standards such as the Federal Information Processing Standards (FIPS) and NIST Special Publications.

I said cryptographers have complicated feelings about NIST, and that’s because we all know that the NSA has the power to use NIST for good as well as evil. Up until today there’s been no real proof of malice, despite some occasional glitches — and despite compelling evidence that at least one NIST cryptographic standard could have contained a backdoor. But now maybe we’ll have to re-evaluate that relationship. As utterly crazy as it may seem.

Unfortunately, we’re highly dependent on NIST standards, ranging from pseudo-random number generators to hash functions and ciphers, all the way to the specific elliptic curves we use in SSL/TLS. While the possibility of a backdoor in any of these components does seem remote, trust has been violated. It’s going to be an absolute nightmare ruling it out.

Which people?

Probably the biggest concern in all this is the evidence of collaboration between the NSA and unspecified ‘telecom providers’. We already know that the major US (and international) telecom carriers routinely assist the NSA in collecting data from fiber-optic cables. But all this data is no good if it’s encrypted.

While software compromises and weak standards can help the NSA deal with some of this, by far the easiest way to access encrypted data is to simply ask for — or steal — the keys. This goes for something as simple as cellular encryption (protected by a single key database at each carrier) all the way to SSL/TLS which is (most commonly) protected with a few relatively short RSA keys.

The good and bad thing is that as the nation hosting the largest number of popular digital online services (like Google, Facebook and Yahoo) many of those critical keys are located right here on US soil. Simultaneously, the people communicating with those services — i.e., the ‘targets’ — may be foreigners. Or they may be US citizens. Or you may not know who they are until you scoop up and decrypt all of their traffic and run it for keywords.

Which means there’s a circumstantial case that the NSA and GCHQ are either directly accessing Certificate Authority keys** or else actively stealing keys from US providers, possibly (or probably) without executives’ knowledge. This only requires a small number of people with physical or electronic access to servers, so it’s quite feasible.*** The one reason I would have ruled it out a few days ago is because it seems so obviously immoral if not illegal, and moreover a huge threat to the checks and balances that the NSA allegedly has to satisfy in order to access specific users’ data via programs such as PRISM.

To me, the existence of this program is probably the least unexpected piece of all the news today. Somehow it’s also the most upsetting.

So what does it all mean?

I honestly wish I knew. Part of me worries that the whole security industry will talk about this for a few days, then we’ll all go back to our normal lives without giving it a second thought. I hope we don’t, though. Right now there are too many unanswered questions to just let things lie.

The most likely short-term effect is that there’s going to be a lot less trust in the security industry. And a whole lot less trust for the US and its software exports. Maybe this is a good thing. We’ve been saying for years that you can’t trust closed code and unsupported standards: now people will have to verify.

Even better, these revelations may also help to spur a whole burst of new research and re-designs of cryptographic software. We’ve also been saying that even open code like OpenSSL needs more expert eyes. Unfortunately there’s been little interest in this, since the clever researchers in our field view these problems as ‘solved’ and thus somewhat uninteresting.

What we learned today is that they’re solved all right. Just not the way we thought.

Notes:

* The original version of this post repeated a story I heard recently (from a credible source!) about Eric Young writing OpenSSL as a way to learn C. In fact he wrote it as a way to learn Bignum division, which is way cooler. Apologies Eric!

** I had omitted the Certificate Authority route from the original post due to an oversight — thanks to Kenny Paterson for pointing this out — but I still think this is a less viable attack for passive eavesdropping (that does not involve actively running a man in the middle attack). And it seems that much of the interesting eavesdropping here is passive.

*** The major exception here is Google, which deploys Perfect Forward Secrecy for many of its connections, so key theft would not work here. To deal with this the NSA would have to subvert the software or break the encryption in some other way.