# The strange story of “Extended Random”

Yesterday, David Benjamin posted a pretty esoteric note on the IETF’s TLS mailing list. At a superficial level, the post describes some seizure-inducingly boring flaws in older Canon printers. To most people that was a complete snooze. To me and some of my colleagues, however, it was like that scene in X-Files where Mulder and Scully finally learn that aliens are real.

Those fossilized printers confirmed a theory we’d developed in 2014, but had been unable to prove: namely, the existence of a specific feature in RSA’s BSAFE TLS library called “Extended Random” — one that we believe to be evidence of a concerted effort by the NSA to backdoor U.S. cryptographic technology.

Before I get to the details, I want to caveat this post in two different ways. First, I’ve written about the topic of cryptographic backdoors way too much. In 2013, the Snowden revelations revealed the existence of a campaign to sabotage U.S. encryption systems. Since that time, cryptographers have spent thousands of hours identifying, documenting, and trying to convince people to care about these backdoors. We’re tired and we want to do more useful things.

The second caveat covers a problem with any discussion of cryptographic backdoors. Specifically, you never really get absolute proof. There’s always some innocent or coincidental explanation that could sort of fit the evidence — maybe it was all a stupid mistake. So you look for patterns of unlikely coincidences, and use Occam’s razor a lot. You don’t get a Snowden every day.

With all that said, let’s talk about Extended Random, and what this tells us about the NSA. First some background.

### Dual_EC_DRBG and RSA BSAFE

To understand the context of this discovery, you need to know about a standard called Dual EC DRBG. This was a proposed random number generator that the NSA developed in the early 2000s. It was standardized by NIST in 2007, and later deployed in some important cryptographic products — though we didn’t know it at the time.

Dual EC has a major problem, which is that it likely contains a backdoor. This was pointed out in 2007 by Shumow and Ferguson, and effectively confirmed by the Snowden leaks in 2013. Drama ensued. NIST responded by pulling the standard. (For an explainer on the Dual EC backdoor, see here.)

Somewhere around this time the world learned that RSA Security had made Dual EC the default random number generator in their popular cryptographic library, which was called BSAFE. RSA hadn’t exactly kept this a secret, but it was such a bonkers thing to do that nobody (in the cryptographic community) had known. So for years RSA shipped their library with this crazy algorithm, which made its way into all sorts of commercial devices.

The RSA drama didn’t quite end there, however. In late 2013, Reuters reported that RSA had taken \$10 million to backdoor their software. RSA sort of denies this. Or something. It’s not really clear.

Regardless of the intention, it’s known that RSA BSAFE did incorporate Dual EC. This could have been an innocent decision, of course, since Dual EC was a NIST standard. To shed some light on that question, in 2014 my colleagues and I decided to reverse-engineer the BSAFE library to see if it the alleged backdoor in Dual EC was actually exploitable by an attacker like the NSA. We figured that specific engineering decisions made by the library designers could be informative in tipping the scales one way or the other.

It turns out they were.

### Extended Random

In the course of reverse engineering the Java version of BSAFE, we discovered a funny inclusion. Specifically, we found that BSAFE supports a non-standard extension to the TLS protocol called “Extended Random”.

The Extended Random extension is an IETF Draft proposed by an NSA employee named Margaret Salter (at some point the head of NSA’s Information Assurance Directorate, which worked on “defensive” crypto for DoD) along with Eric Rescorla as a contractor. (Eric was very clearly hired to develop a decent proposal that wouldn’t hurt TLS, and would primarily be used on government machines. The NSA did not share their motivations with him.)

It’s important to note that Extended Random by itself does not introduce any cryptographic vulnerabilities. All it does is increase the amount of random data (“nonces”) used in a TLS protocol connection. This shouldn’t hurt TLS at all, and besides it was largely intended for U.S. government machines.

The only thing that’s interesting about Extended Random is what happens when that random data is generated using the Dual EC algorithm. Specifically, this extra data acts as “rocket fuel”, significantly increasing the efficiency of exploiting the Dual EC backdoor to decrypt TLS connections.

In short, if you’re an agency like the NSA that’s trying to use Dual EC as a backdoor to intercept communications, you’re much better off with a system that uses both Dual EC DRBG and Extended Random. Since Extended Random was never standardized by the IETF, it shouldn’t be in any systems. In fact, to the best of our knowledge, BSAFE is the only system in the world that implements it.

In addition to Extended Random, we discovered a variety of features that, combined with the Dual EC backdoor, could make RSA BSAFE fairly easy to exploit. But Extended Random is by far the strangest and hardest to justify.

So where did this standard come from? For those who like technical mysteries, it turns out that Extended Random isn’t the only funny-smelling proposal the NSA made. It’s actually one of four failed IETF proposals made by NSA employees, or contractors who work closely with the NSA, all of which try to boost the amount of randomness in TLS. Thomas Ptacek has a mind-numbingly detailed discussion of these proposals and his view of their motivation in this post.

### Oh my god I never thought spies could be so boring. What’s the new development?

Despite the fact that we found Extended Random in RSA BSAFE (a free version we downloaded from the Internet), a fly in the ointment was that it didn’t actually seem to be enabled. That is: the code was there but the switches to enable it were hard-coded to “off”.

This kind of put a wrench in our theory that RSA might have included Extended Random to make BSAFE connections more exploitable by the NSA. There might be some commercial version of BSAFE out there with this code active, but we were never able to find it or prove it existed. And even worse, it might appear only in some special “U.S. government only” version of BSAFE, which would tend to undermine the theory that there was something intentional about including this code — after all, why would the government spy on itself?

Which finally brings us to the news that appeared on the TLS mailing list the other day. It turns out that certain Canon printers are failing to respond properly to connections made using the new version of TLS (which is called 1.3), because they seem to have implemented an unauthorized TLS extension using the same number as an extension that TLS 1.3 needs in order to operate correctly. Here’s the relevant section of David’s post:

The web interface on some Canon printers breaks with 1.3-capable
ClientHello messages. We have purchased one and confirmed this with a
PIXMA MX492. User reports suggest that it also affects PIXMA MG3650
and MX495 models. It potentially affects a wide range of Canon
printers.

These printers use the RSA BSAFE library to implement TLS and this
library implements the extended_random extension and assigns it number
40. This collides with the key_share extension and causes 1.3-capable
handshakes to fail.

So in short, this news appears to demonstrate that commercial (non-free) versions of RSA BSAFE did deploy the Extended Random extension, and made it active within third-party commercial products. Moreover, they deployed it specifically to machines — specifically off-the-shelf commercial printers — that don’t seem to be reserved for any kind of special government use.

(If these turn out to be special Department of Defense printers, I will eat my words.)

Ironically, the printers are now the only thing that still exhibits the features of this (now deprecated) version of BSAFE. This is not because the NSA was targeting printers. Whatever devices they were targeting are probably gone by now. It’s because printer firmware tends to be obsolete and yet highly persistent. It’s like a remote pool buried beneath the arctic circle that preserves software species that would otherwise vanish from the Internet.

Which brings us to the moral of the story: not only are cryptographic backdoors a terrible idea, but they totally screw up the assigned numbering system for future versions of your protocol.

Actually no, that’s a pretty useless moral. Instead, let’s just say that you can deploy a cryptographic backdoor, but it’s awfully hard to control where it will end up.

# Attack of the week: DUHK

Before we get started, fair warning: this is going to be a post about a fairly absurd (but non-trivial!) attack on cryptographic systems. But that’s ok, because it’s based on a fairly absurd vulnerability.

This work comes from Nadia Heninger, Shaanan Cohney and myself, and follows up on some work we’ve been doing to look into the security of pseudorandom number generation in deployed cryptographic devices. We made a “fun” web page about it and came up with a silly logo. But since this affects something like 25,000 deployed Fortinet devices, the whole thing is actually kind of depressing.

The paper is called “Practical state recovery attacks against legacy RNG implementation“, and it attacks an old vulnerability in a pseudorandom number generator called ANSI X9.31, which is used in a lot of government certified products. The TL;DR is that this ANSI generator really sucks, and is easy to misuse. Worse, when it’s misused — as it has been — some very bad things can happen to the cryptography that relies on it.

First, some background.

### What is an ANSI, and why should I care?

A pseudorandom number generator (PRG) is a deterministic algorithm designed to “stretch” a short random seed into a large number of apparently random numbers. These algorithms are used ubiquitously in cryptographic software to supply all of the random bits that our protocols demand.

PRGs are so important, in fact, that the U.S. government has gone to some lengths to standardize them. Today there are three generators approved for use in the U.S. (FIPS) Cryptographic Module Validation Program. Up until 2016, there were four. This last one, which is called the ANSI X9.31 generator, is the one we’re going to talk about here.

ANSI X9.31 is a legacy pseudorandom generator based on a block cipher, typically AES. It takes as its initial seed a pair of values (K, V) where K is a key and V is an initial “seed” (or “state”). The generator now produces a long stream of pseudorandom bits by repeatedly applying the block cipher in the crazy arrangement below:

The diagram above illustrates one of the funny properties of the ANSI generator: namely, that while the state value V updates for each iteration of the generator, the key K never changes. It remains fixed throughout the entire process.

And this is a problem. Nearly twenty years ago, Kelsey, Schneier, Wagner and Hall pointed out that this fact makes the ANSI generator terribly insecure in the event that an attacker should ever learn the key K.

Specifically, if an attacker were to obtain K somehow, and then was able to learn only a single 16-byte raw output block (Ri) from a working PRG, she could do the following: (1) guess the timestamp T, (2) work backwards (decrypting using K) in order to recover the corresponding state value V, and now (3) run the generator forwards or backwards (with guesses for T) to obtain every previous and subsequent output of the generator.

Thus, if an application uses the ANSI generator to produce something like a random nonce (something that is typically sent in a protocol in cleartext), and also uses the generator to produce secret keys, this means an attacker could potentially recover those secret keys and completely break the protocol.

Of course, all of this requires that somehow the attacker learns the secret value K. At the time Kelsey et al. published their result, this was viewed as highly unlikely. After all, we’re really good at keeping secrets.

### I assume you’re joking?

So far we’ve established that the ANSI generator is only secure if you can forever secure the value K. However, this seems fairly reasonable. Surely implementers won’t go around leaking their critical secrets all over the place. And certainly not in government-validated cryptographic modules. That would be crazy.

Yet crazy things do happen. We figured someone should probably check.

To see how the X9.31 key is managed in real products, our team developed a sophisticated analytic technique called “making a graduate student read every FIPS document on the CMVP website”.

Most of the documents were fairly vague. And yet, a small handful of widely-used cryptographic modules had language that was troubling. Specifically, several vendors include language in their security policy that indicates the ANSI key was either hard-coded, or at least installed in a factory — as opposed to being freshly generated at each device startup.

Of even more concern: at least one of the hard-coded vendors was Fortinet, a very popular and successful maker of VPN devices and firewalls.

To get more specific, it turns out that starting (apparently in 2009, or perhaps earlier), every FortiOS 4.x device has shipped with a hardcoded value for K. This key has been involved in generating virtually every random bit used to establish VPN connections on those appliances, using both the TLS and IPSec protocols. The implication is that anyone with the resources to simply reverse-engineer the FortiOS firmware (between 2009 and today) could theoretically have been able to recover K themselves — and thus passively decrypt any VPN connection.

(Note: Independent of our work, the ANSI generator was replaced with a more secure alternative as of FortiOS 5.x. As a result of our disclosure, it has also been patched in FortiOS 4.3.19. There are still lots of unpatched firewalls out there, however.)

### What does the attack look like?

Running an attack against a VPN device requires three ingredients. The first is the key K, which can be recovered from the FortiOS firmware using a bit of elbow grease. Shaanan Cohney (the aforementioned graduate student) was able to pull it out with a bit of effort.

Next, the attacker must have access to some VPN or TLS traffic. It’s important to note that this is not an active attack. All you really need is a network position that’s capable of monitoring full two-sided TLS or IPSec VPN connections.

Specifically, the attacker needs a full AES block (16 bytes) worth of output from the ANSI generator, plus part of a second block to check success against. Fortunately both TLS and IPSec (IKE) include nonces of sufficient length to obtain this output, and both are drawn from the ANSI generator, which lives in the FortiOS kernel. The attacker also needs the Diffie-Hellman ephemeral public keys, which are part of the protocol transcript.

Finally, you need to know the timestamp Ti that was used to operate the generator. In FortiOS, these timestamps have a 1-microsecond resolution, so guessing them is actually a bit of a challenge. Fortunately, TLS and other protocols include the time-in-seconds as one of the outputs of the TLS protocol, so the actually guessing space is typically only about 2^20 at most. Still, this guessing proves to be one of the most costly elements of the attack.

Given all of the ingredients above, the attacker now decrypts the output block taken from the protocol nonce using K, guesses each possible Ti value, and then winds forward or backwards until she finds the random bits that were used to generate that party’s Diffie-Hellman secret key. Fortunately, the key and nonce are generated one after the other, so this is not quite as painful as it sounds. But it is fairly time consuming. Fortunately, computers are fast, so this is not a dealbreaker.

With the secret key in hand, it’s possible to fully decrypt the VPN connection, read all traffic, and modify the data as needed.

### Does the attack really work?

Since we’re not the NSA, it’s awfully hard for us to actually apply this attack to real Fortinet VPN connections in the wild. Not to mention that it would be somewhat unethical.

However, there’s nothing really unethical about scanning for FortiOS devices that are online and willing to accept incoming traffic from the Internet. To validate the attack, the team conducted a large-scale scan of the entire IPv4 address space. Each time we found a device that appeared to present as a FortiOS 4.x VPN, we initiated a connection with it and tested to see if we could break our own connection.

It turns out that there are a lot of FortiOS 4.x devices in the wild. Unfortunately, only a small number of them accept normal IPSec connections from strangers. Fortunately, however, a lot of them do accept TLS connections. Both protocol implementations use the same ANSI generator for their random numbers.

This scan allowed us to validate that — as of  October 2017 — the vulnerability was present and exploitable on more than 25,000 Fortinet devices across the Internet. And this count is likely conservative, since these were simply the devices that bothered to answer us when we scanned. A more sophisticated adversary like a nation-state would have access to existing VPN connections in flight.

In short, if you’re using a legacy Fortinet VPN you should probably patch.

### So what does it all mean?

There are really three lessons to be learned from a bug like this one.

The first is that people make mistakes. We should probably design our crypto and certification processes to anticipate that, and make it much harder for these mistakes to become catastrophic decryption vulnerabilities like the one in FortiOS 4.x. Enough said.

The second is that government crypto certifications are largely worthless. I realize that seems like a big conclusion to draw from a single vulnerability. But this isn’t just a single vendor — it’s potentially several vendors that all fell prey to the same well-known 20-year old vulnerability. When a vulnerability is old enough to vote, your testing labs should be finding it. If they’re not finding things like this, what value are they adding?

Finally, there’s a lesson here about government standards. ANSI X9.31 (and its cousin X9.17) is over twenty years old. It’s (fortunately) been deprecated as of 2016, but a huge number of products still use it. This algorithm should have disappeared ten years earlier — and yet here we are. It’s almost certain that this small Fortinet vulnerability is just the tip of the iceberg. Following on revelations of a possible deliberate backdoor in the Dual EC generator, none of this stuff looks good. It’s time to give serious thought to how we make cryptographic devices resilient — even against the people who are supposed to be helping us secure them.

But that’s a topic for a much longer post.

# How do you know if an RNG is working?

 No matter how much cryptographers accomplish, we’re always building on a questionable foundation. (illustration: Marc S. Rousseau)

Last week, Edward Snowden spoke to a packed crowd at SXSW about the many problems (and limited solutions) facing those of us who want to keep our communications private. Snowden said a number of things — including a shout out to Moxie’s company Whisper Systems, who certainly deserve it. But instead of talking about that, I wanted to focus on (in my opinion) one of Snowden’s most important quotes:

We need all those brilliant Belgian cryptographers to go “alright we know that these encryption algorithms we are using today work, typically it is the random number generators that are attacked as opposed to the encryption algorithms themselves. How can we make them [secure], how can we test them?”

Now it’s possible I’m a little biased, but it seems to me this cuts to the core of our problems with building secure systems in an increasingly hostile world. Namely: most encryption relies on some source of “random” numbers, either to generate keys or (particularly in the case of public key encryption) to provide semantic security for our ciphertexts.

What this means is that an attacker who can predict the output of your RNG — perhaps by taking advantage of a bug, or even compromising it at a design level — can often completely decrypt your communications. The Debian project learned this firsthand, as have many others. This certainly hasn’t escaped NSA’s notice, if the allegations regarding its Dual EC random number generator are true.

All of this bring us back to Snowden’s quote above, and the question he throws open for us. How do you know that an RNG is working? What kind of tests can we run on our code to avoid flaws ranging from the idiotic to the highly malicious? Unfortunately this question does not have an easy answer. In the rest of this post I’m going to try to explain why.

Background: Random and Pseudorandom Number Generation

I’ve written quite a bit about random number generation on this blog, but before we go forward it’s worth summarizing a few basic facts about random number generation.

First off, the ‘random’ numbers we use in most deployed cryptographic systems actually come from two different systems:

1. A ‘true’ random number generator (or entropy generator) that collects entropy from the physical world. This can include entropy collected from low-level physical effects like thermal noise and shot noise, or it can include goofy stuff like mouse movements and hard disk seek times.
2. An algorithmic ‘pseudorandom number generator’ (PRNG) that typically processes the output of (1) to both stretch the output to provide more bits and, in some cases, provide additional security protections in case the output of (1) proves to be biased.
It’s important to note that pseudorandom number generators aren’t “random number generators” at all. These generators typically use cryptographic algorithms (e.g., block ciphers or hash functions) to process a seed value from the RNG into many apparently random looking and unpredictable bytes.

In most cases, it’s quite rare for your application to ever see the raw output of a true random number generator.* Even the low-level entropy collector within Linux’s RNG uses cryptographic constructs like hash functions in order to ‘mix’ the output of various entropy sources. To produce the bits produced in /dev/random or /dev/urandom Linux then seeds a PRNG like Yarrow or Fortuna.**

Another similar pattern occurs inside of the Intel “secure key” random number generator included in Intel Ivy Bridge processors. When you buy one of these processors, you’re getting (free!) a hardware ‘1-shot‘ circuit that collects high-entropy electronic noise, which is then measured and processed into useful RNG output. The design looks like this:

 Hardware random number generator used on Intel Ivy Bridge processors. Left: the ‘1-shot’ circuit used to collect physical entropy. Right: the data flow from generator to output, including health checks and PRNG computation. (source).

Once again, with Intel’s design you (i.e., the application developer) don’t get access to this raw randomness. It’s first used to seed a PRNG based on AES (CTR-DRBG from NIST SP800-90A). What you actually get as an application developer is the processed output of that algorithm.

In practice this typical design some implications. On the positive side, the presence of a PRNG means that the underlying RNG circuit can get pretty borked (e.g., biased) without the results being detectable by your application. On the negative side, the underlying RNG circuit can get pretty borked without the results being detectable in your application.

In other words, with only a few ugly glitches — things that can happen in real life — you can easily get a broken random number generator that nobody notices until it’s way too late. And that’s without deliberate tampering, which makes things way, way worse.

Which brings us back to our fundamental question: how do systems know that their RNG is working. This turns out to be a question without a perfect answer.

### Statistical Tests

If you look at the literature on random number generators, you’ll find a lot of references to statistical randomness testing suites like Diehard or NIST’s SP 800-22. The gist of these systems is that they look a the output of an RNG and run tests to determine whether the output is, from a statistical perspective, “good enough” for government work (very literally, in the case of the NIST suite.)

The nature of these tests varies. Some look at simple factors like bias (the number of 1s and 0s) while others look for more sophisticated features such as the distribution of numbers when mapped into 3-D space.

Now I don’t want to knock these tests. They’re a perfectly valid way to detect serious flaws in a (true) RNG — I can attest to this, since I’ve built one that failed the tests miserably — but they probably won’t detect flaws in your system. That’s because like I said above, most deployed systems include a combination of RNG and PRNG, or even RNG plus “conditioning” via cryptographic hash functions or ciphers. The nature of these cryptographic, algorithmic processes is such that virtually every processed output will pass statistical tests with flying colors — even if the PRNG is initialized with ‘garbage’ input.

This means, unfortunately, that it can be very hard to use statistical tests to detect a broken RNG unless you properly test it only at the low level. And even there you won’t rule out intentional backdoors — as I’ll discuss in a moment.

Assuming that you’ve tested your true RNG properly and it’s passing all tests, it’s still important to test your PRNG. One approach to doing this is to use Known Answer Tests (KATs) that are essentially test vectors. These contain some input seed material as well as a set of output bytes that should be the algorithmic result of running the PRNG on that seed.

Since PRNGs are purely algorithmic, the theory here is that you can test them like algorithms. While this approach is valid, it raises two potential issues (both of which I’ve seen in practice).

First, you can only test your PRNG on so many points. Thus it’s quite possible that your PRNG will succeed on one particular test vector (i.e., it’ll output just so many valid bytes) but go completely off the reservation on some other input. This is unlikely, but not impossible in normal conditions. It’s very possible if someone is trying to build a malicious backdoor into your PRNG implementation.

Second, the process of instrumenting your PRNG implementation for testing can actually introduce vulnerabilities in your deployed system! Think about this for a second. Normal PRNGs take in real random seeds from your RNG. The last thing you’d ever want to do is run your PRNG on some predictable seed — if you did, everyone would be able to predict the PRNGs outputs. Yet adding a test harness your system means building in logic to re-seed your RNG to something predictable!

This is like adding an ejection seat to your car. Might make you safer — unless it goes off while you’re driving to work.

A quick glance through e.g., the OpenSSL code shows that indeed, exactly this sort of code exists and ships in some versions of the library. Of course, the experienced developers will note that surely such features could be surrounded by pre-processor directives (or the equivalent in your language of choice) ensuring that they’ll never be activated in production code. Sadly, at least in the case of FIPS, this is not possible — for reasons I’ll explain next.

### Runtime Health Checks

Another approach to testing RNGs is to test them while the system is running. This isn’t intended to rule out design-level flaws (as the above statistical and KAT tests are) but it is intended to catch situations where the RNG becomes broken during normal operation. This can occur for a variety of reasons, e.g., manufacturing defects, system damage, and even exposure to outside radiation.

Health checks can take different forms. FIPS 140, for example, mandates that all approved RNGs be tested at startup time using KATs. (This is why you can’t make your test harness conditional on compilation flags — it must ship in your production code!) They subsequently mandate a runtime health check that verifies the generator has not become ‘stuck’, i.e., is spitting out the same bytes over and over again.

While I’m sure this last test may have saved someone, somewhere, it seems totally inappropriate and useless when applied to the output of an RNG/PRNG pair, which is how NIST recommends it be used. This is because even the most broken algorithmic PRNGs will almost never spit out duplicate values — even if the underlying RNG fails completely.

The upshot of this decision is that NIST (FIPS) recommend a check that will almost never succeed in catching anything useful from a PRNG, but does introduce a whole bunch of extra logic that can suffer from flaws and/or malicious circumvention. I’m sure the good folks at NIST realize this, but they recommend it anyway — after all, what else are they going to do?

### Tampering

Which brings us to the \$10 million question. What happens if an attacker is deliberately tampering with our RNG/PRNG in order to make it fail? Note that this is not an academic question. We have excellent reason to believe it’s happened in some real systems.

Just for fun, let’s go back to the Intel Ivy Bridge RNG described above. We’ll take a look specifically at the PRNG portion of the design, which uses the NIST CTR-DRBG random number generator with AES:

 Portion of the Intel Ivy Bridge design, with a few annotations added by yours truly. (original source)

The CTR-DRBG design relies on two features. First, an AES key is selected at random along with some input seed. This pair goes into the AES cipher, where it is processed to derive a new key and data. The result should be unpredictable to most attackers.

But if you were able to change the way keys were updated (in the key_in_mux hilighted) so that instead of updating the key and/or using an unpredictable one, it chose a fixed key known to the attacker, you would now have a very powerful backdoor. Specifically, the output would still look statistically perfectly random. But an attacker who knows this key could simply decrypt one block of RNG output to obtain all future and past outputs of the generator until the next time it was reseeded.

Note that I am not saying the Intel system has a backdoor in it — far from it. I’m only considering how easily it might be made to have one if you were an attacker with control of Intel’s fabrication plants (or their microcode updates). And this is hardly Intel’s fault. It’s just the nature of this particular RNG design. Others could be just as vulnerable.

Actually using this knowledge to attack applications would be more complex, since many system-level RNGs (including the Linux Kernel RNG) combine the output of the RNG with other system entropy (through XOR, unfortunately, not hashing). But Intel has pushed hard to see their RNG output used directly, and there exist plugins for OpenSSL that allow you to use it similarly. If you used such a method, these hypothetical flaws could easily make their way all the way into your cryptography.

### Designing against these issues

Unfortunately, so far all I’ve done is call out the challenges with building trustworthy RNGs. And there’s a reason for this: the challenges are easy to identify, while the solutions themselves are hard. And unfortunately at this time, they’re quite manual.

Building secure RNG/PRNGs still requires a combination of design expertise, careful low-level (true) RNG testing — using expert design and statistical tests — and the use of certified algorithms with proper tests. All of the techniques above contribute to building a secure RNG, but none of them are quite sufficient.

Solving this problem, at least in software, so we can ensure that code is correct and does not contain hidden ‘easter eggs’, represents one of the more significant research challenges facing those of us who depend on secure cryptographic primitives. I do hope some enterprising graduate students will give these issues the attention they deserve.

Notes:

* Though there are some exceptions. See, for example, this FIPS certified smart card that included a bad RNG which was used to generate cryptographic secrets. In general FIPS disallows this except for a very small number of approved RNGs. Perhaps this was one.

** The original version of this post claimed that /dev/random seeds /dev/urandom. This is actually a mistake — both /dev/random and /dev/urandom use the same PRNG, but /dev/random simply keeps track of how much ‘entropy’ is in the pool and blocks when you have drawn too many bits. Thanks to Brendan Long and Thomas Ptacek for setting me straight.

# A few more notes on NSA random number generators

Last Friday, Joseph Menn from Reuters published an article claiming that RSA, the pioneering security firm and division of EMC, accepted \$10 million dollars to include the Dual EC random number generator as the default in their flagship BSAFE library. I’ve written a bit about Dual EC on this blog, so readers will know that I don’t think highly of it. For one thing, it’s a rotten choice for a commercial product, due to its lousy software performance. Much worse: it may introduce a practical backdoor into any system that uses it.

Given the numerous problems with Dual EC it’s baffling that RSA would select this particular generator as the default for their software. It’s particularly strange given that BSAFE is designed for use in Java-based and embedded systems where performance truly is at a premium. And none of this can be explained by the needs of a single RSA customer, since those needs could be satisfied merely by making BSAFE an option, rather than the default.

Of course there have been many people who had viewed RSA’s decision as potentially malicious. Unsupported rumors have been floating around since long before Reuters got involved. What’s new this time is that RSA insiders appear to be going on the record.

I suppose time will tell how the industry reacts to this news. In the interim I have a few small facts to add to the discussion, so I thought I’d sketch them out in this post.

### #1: Dual_EC_DRBG’s ‘backdoor’ was known as of January 2005

It’s widely believed that the ‘vulnerability’ in Dual_EC was first identified by Microsoft employees Dan Shumow and Niels Ferguson in the summer of 2007. Tanja Lange (and nymble) recently tipped me off to the fact that this isn’t precisely true.

In point of fact, the possibility of a backdoor was known to at least some members of the ANSI X9.82 standardization committee as far back in January 2005. This surprising news comes via a patent application filed by Certicom employees Dan Brown and Scott Vanstone. The application claims a priority date of January 2005. Here’s the scary bit:

If P and Q are established in a security domain controlled by an administrator, and the entity who generates Q for the domain does so with knowledge of e (or indirectly via knowledge of d). The administrator will have an escrow key for every ECRNG that follows that standard.

Escrow keys are known to have advantages in some contexts. They can provide a backup functionality. If a cryptographic key is lost, then data encrypted under that key is also lost. However, encryption keys are generally the output of random number generators. Therefore, if the ECRNG is used to generate the encryption key K, then it may be possible that the escrow key e can be used to recover the encryption key K. Escrow keys can provide other functionality, such as for use in a wiretap. In this case, trusted law enforcement agents may need to decrypt encrypted traffic of criminals, and to do this they may want to be able to use an escrow key to recover an encryption key.

For example, in the SSL and TLS protocols, which are used for securing web (HTTP) traffic, a client and server perform a handshake in which their first actions are to exchange random values sent in the clear.

The patent also describes a number of ways to close the backdoor in Dual_EC_DRBG. Indeed, it may be due to Brown and Vanstone that the NIST standard includes an alternative method to close the backdoor (by generating a random Q point).

The existence of this patent does not mean that Brown and Vanstone were responsible for Dual EC. In fact, the generator appears to be an NSA invention, and may date back to the early 2000s. What this patent demonstrates is that some members of the ANSI committee, of which RSA was also a member, had reason to at least suspect that Dual EC could be used to create a wiretapping backdoor. (Update: John Kelsey confirms this.) It would be curious to know how widely this information was shared, and whether anyone on the committee ever inquired as to the provenance of the default parameters.

### #2. Dual_EC_DRBG is not really a NIST standard

This is hardly a secret, but it’s something that hasn’t been widely acknowledged in the press. Dual_EC_DRBG is generally viewed as a NIST standard, since it was published in NIST Special Publication 800-90. But that’s not the only place it appears, nor was it developed at NIST.

A complete history of Dual_EC_DRBG would begin with NSA’s drive to include it in the ANSI X9.82 DRBG standard, with a standardization process kicked off in the early 2000s. The draft ANSI standard includes Dual_EC_DRBG with all of the known parameters, along with several additional elliptic curve parameters that were not included in the NIST standards.

 Members of the ANSI X9F1 Tool Standards and Guidelines Group which wrote ANSI X9.82.

However you’ll also find Dual_EC_DRBG in the international standard ISO 18031. In other words, Dual EC was widely propagated in numerous standards long before NIST finalized it, and it’s hardly limited to US borders. The ANSI and ISO drafts are in some sense worse, since they don’t include a technique for generating your own Q parameter.

### #3. Dual_EC_DRBG is not the only asymmetric random number generator in the ANSI and ISO standards

Cryptographers generally think of Dual EC as the only ‘public key’ random number generator to be widely standardized. We also point to NSA’s generation of the public parameters as evidence that Dual_EC may be compromised.

But in fact, the ANSI X9.82 and ISO standards each include a second generator based on public key cryptographic techniques. And like Dual EC, this one ships with a complete set of default parameters! The additional generator is based on an algorithm due to Micali and Schnorr, and relies for its security on assumptions related to the hardness of factoring large composite numbers. It requires an RSA-type modulus, several of which are conveniently provided in the specification.

 Two default MS-DRBG moduli from the ISO 18031 specification.

There’s no reason to suspect that MS-DRBG is used by any real products, let alone that there’s a backdoor in the standard. In fact, there’s a curious fact about it: it’s not obvious from the public literature how one would attack the generator even if one knew the factorization of the n values above — though it seems intuitively likely that an attack does exist. Solving the problem, and finding a practical attack for known factorization, would be a fun project for an enthusiastic mathematician.

Since MS-DRBG comes from the same people who brought you Dual EC, if you are using it you might want to think twice.

# RSA warns developers not to use RSA products

In today’s news of the weird, RSA (a division of EMC) has recommended that developers desist from using the (allegedly) ‘backdoored’ Dual_EC_DRBG random number generator — which happens to be the default in RSA’s BSafe cryptographic toolkit. Youch.

In case you’re missing the story here, Dual_EC_DRBG (which I wrote about yesterday) is the random number generator voted most likely to be backdoored by the NSA. The story here is that — despite many valid concerns about this generator — RSA went ahead and made it the default generator used for all cryptography in its flagship cryptography library. The implications for RSA and RSA-based products are staggering. In the worst case a modestly bad but by no means worst case, the NSA may be able to intercept SSL/TLS connections made by products implemented with BSafe.

So why would RSA pick Dual_EC as the default? You got me. Not only is Dual_EC hilariously slow — which has real performance implications — it was shown to be a just plain bad random number generator all the way back in 2006. By 2007, when Shumow and Ferguson raised the possibility of a backdoor in the specification, no sensible cryptographer would go near the thing.

And the killer is that RSA employs a number of highly distinguished cryptographers! It’s unlikely that they’d all miss the news about Dual_EC.

We can only speculate about the past. But here in the present we get to watch RSA’s CTO Sam Curry publicly defend RSA’s choices. I sort of feel bad for the guy. But let’s make fun of it anyway.

I’ll take his statement line by line (Sam is the boldface):

“Plenty of other crypto functions (PBKDF2, bcrypt, scrypt) will iterate a hash 1000 times specifically to make it slower.”

Password hash functions are built deliberately slow to frustrate dictionary attacks. Making a random number generator slow is just dumb.

At the time, elliptic curves were in vogue

Say what?

and hash-based RNG was under scrutiny.

Nonsense. A single obsolete hash based generator (FIPS 186) was under scrutiny — and fixed. The NIST SP800-90 draft in which Dual_EC appeared ALSO provided three perfectly nice non-backdoored generators: two based on hash functions and one based on AES. BSafe even implements some of them. Sam, this statement is just plain misleading.

The hope was that elliptic curve techniques—based as they are on number theory—would not suffer many of the same weaknesses as other techniques (like the FIPS 186 SHA-1 generator) that were seen as negative

Dual-EC suffers exactly the same sort of weaknesses as FIPS 186. Unlike the alternative generators in NIST SP800-90 it has a significant bias and really should not be used in production systems. RSA certainly had access to this information after the analyses were published in 2006.

and Dual_EC_DRBG was an accepted and publicly scrutinized standard.

And every bit of public scrutiny said the same thing: this thing is broken! Grab your children and run away!

SP800-90 (which defines Dual EC DRBG) requires new features like continuous testing of the output, mandatory re-seeding,

The exact same can be said for the hash-based and AES-based alternative generators you DIDN’T choose from SP800-90.

optional prediction resistance and the ability to configure for different strengths.

So did you take advantage of any of these options as part of the BSafe defaults? Why not? How about the very simple mitigations that NIST added to SP800-90A as a means to remove concerns that the generator might have a backdoor? Anyone?

There’s not too much else to say here. I guess the best way to put it is: this is all part of the process. First you find the disease. Then you see if you can cure it.

# The Many Flaws of Dual_EC_DRBG

 The Dual_EC_DRBG generator from NIST SP800-90A.

Update 9/19: RSA warns developers not to use the default Dual_EC_DRBG generator in BSAFE. Oh lord.

As a technical follow up to my previous post about the NSA’s war on crypto, I wanted to make a few specific points about standards. In particular I wanted to address the allegation that NSA inserted a backdoor into the Dual-EC pseudorandom number generator.

For those not following the story, Dual-EC is a pseudorandom number generator proposed by NIST for international use back in 2006. Just a few months later, Shumow and Ferguson made cryptographic history by pointing out that there might be an NSA backdoor in the algorithm. This possibility — fairly remarkable for an algorithm of this type — looked bad and smelled worse. If true, it spelled almost certain doom for anyone relying on Dual-EC to keep their system safe from spying eyes.

Now I should point out that much of this is ancient history. What is news today is the recent leak of classified documents that points a very emphatic finger towards Dual_EC, or rather, to an unnamed ‘2006 NIST standard’. The evidence that Dual-EC is this standard has now become so hard to ignore that NIST recently took the unprecedented step of warning implementers to avoid it altogether.

Better late than never.

In this post I’m going to try to explain the curious story of Dual-EC. While I’ll do my best to keep this discussion at a high and non-mathematical level, be forewarned that I’m probably going to fail at least at a couple of points. I you’re not the mood for all that, here’s a short summary:

• In 2005-2006 NIST and NSA released a pseudorandom number generator based on elliptic curve cryptography. They released this standard — with very little explanation — both in the US and abroad.
• This RNG has some serious issues with just being a good RNG. The presence of such obvious bugs was mysterious to cryptographers.
• In 2007 a pair of Microsoft researchers pointed out that these vulnerabilities combined to produce a perfect storm, which — together with some knowledge that only NIST/NSA might have — opened a perfect backdoor into the random number generator itself.
• This backdoor may allow the NSA to break nearly any cryptographic system that uses it.

If you’re still with me, strap in. Here goes the long version.

### Dual-EC

For a good summary on the history of Dual-EC-DRBG, see this 2007 post by Bruce Schneier. Here I’ll just give the highlights.

Back in 2004-5, NIST decided to address a longstanding weakness of the FIPS standards, namely, the limited number of approved pseudorandom bit generator algorithms (PRGs, or ‘DRBGs’ in NIST parlance) available to implementers. This was actually a bit of an issue for FIPS developers, since the existing random number generators had some known design weaknesses.*

NIST’s answer to this problem was Special Publication 800-90, parts of which were later wrapped up into the international standard ISO 18031. The NIST pub added four new generators to the FIPS canon. None these algorithms is a true random number generator in the sense that they collect physical entropy. Instead, what they do is process the (short) output of a true random number generator — like the one in Linux — conditioning and stretching this ‘seed’ into a large number of random-looking bits you can use to get things done.** This is particularly important for FIPS-certified cryptographic modules, since the FIPS 140-2 standards typically require you to use a DRBG as a kind of ‘post-processing’ — even when you have a decent hardware generator.

The first three SP800-90 proposals used standard symmetric components like hash functions and block ciphers. Dual_EC_DRBG was the odd one out, since it employed mathematics more that are typically used to construct public-key cryptosystems. This had some immediate consequences for the generator: Dual-EC is slow in a way that its cousins aren’t. Up to a thousand times slower.

Now before you panic about this, the inefficiency of Dual_EC is not necessarily one of its flaws! Indeed, the inclusion of an algebraic generator actually makes a certain amount of sense. The academic literature includes a distinguished history of provably secure PRGs based on on number theoretic assumptions, and it certainly didn’t hurt to consider one such construction for standardization. Most developers would probably use the faster symmetric alternatives, but perhaps a small number would prefer the added confidence of a provably-secure construction.

Unfortunately, here is where NIST ran into their first problem with Dual_EC.

### Flaw #1: Dual-EC has no security proof

Let me spell this out as clearly as I can. In the course of proposing this complex and slow new PRG where the only damn reason you’d ever use the thing is for its security reduction, NIST forgot to provide one. This is like selling someone a Mercedes and forgetting to attach the hood ornament.

I’d like to say this fact alone should have damned Dual_EC, but sadly this is par for the course for NIST — which treats security proofs like those cool Japanese cookies you can never find. In other words, a fancy, exotic luxury. Indeed, NIST has a nasty habit of dumping proof-writing work onto outside academics, often after the standard has been finalized and implemented in products everywhere.

So when NIST put forward its first draft of SP800-90 in 2005, academic cryptographers were left to analyze it from scratch. Which, to their great credit, they were quite successful at.

The first thing reviewers noticed is that Dual-EC follows a known design paradigm — it’s a weird variant of an elliptic curve linear congruential generator. However they also noticed that NIST had made some odd rookie mistakes.

Now here we will have to get slightly wonky — though I will keep mathematics to a minimum. (I promise it will all come together in the end!) Constructions like Dual-EC have basically two stages:

1. A stage that generates a series of pseudorandom elliptic curve points. Just like on the graph at right, an elliptic curve point is a pair (x, y) that satisfies an elliptic curve equation. In general, both x and y are elements of a finite field, which for our purposes means they’re just large integers.***

The main operation of the PRNG is to apply mathematical operations to points on the elliptic curve, in order to generate new points that are pseudorandom — i.e., are indistinguishable from random points in some subgroup.

And the good news is that Dual-EC seems to do this first part beautifully! In fact Brown and Gjøsteen even proved that this part of the generator is sound provided that the Decisional Diffie-Hellman problem is hard in the specific elliptic curve subgroup. This is a well studied hardness assumption so we can probably feel pretty confident in this proof.

2. Extract pseudorandom bits from the generated EC points. While the points generated by Dual-EC may be pseudorandom, that doesn’t mean the specific (x, y) integer pairs are random bitstrings. For one thing, ‘x‘ and ‘y‘ are not really bitstrings at all, they’re integers less than some prime number. Most pairs don’t satisfy the curve equation or are not in the right subgroup. Hence you can’t just output the raw x or y values and expect them to make good pseudorandom bits.

Thus the second phase of the generator is to ‘extract’ some (but not all) of the bits from the EC points. Traditional literature designs do all sorts of things here — including hashing the point or dropping up to half of the bits of the x-coordinate. Dual-EC does something much simpler: it grabs the x coordinate, throws away the most significant 16-18 bits, and outputs the rest.

In 2006, first Gjøsteen and later Schoenmakers and Sidorenko took a close look at Dual-EC and independently came up with a surprising result:

### Flaw #2: Dual-EC outputs too many bits

Unlike those previous EC PRGs which output anywhere from 2/3 to half of the bits from the x-coordinate, Dual-EC outputs nearly the entire thing.

This is good for efficiency, but unfortunately it also gives Dual-EC a bias. Due to some quirks in the mathematics of the field operations, an attacker can now predict the next bits of Dual-EC output with a fairly small — but non-trivial — success probability, in the range of 0.1%. While this number may seem small to non-cryptographers, it’s basically a hanging offense for a cryptographic random number generator where probability of predicting a future bit should be many orders of magnitude lower.

What’s just plain baffling is that this flaw ever saw the light of day. After all, the specification was developed by bright people at NIST — in collaboration with NSA. Either of those groups should easily have discovered a bug like this, especially since this issue had been previously studied. Indeed, within a few months of public release, two separate groups of academic cryptographers found it, and were able to implement an attack using standard PC equipment.

So in summary, the bias is mysterious and it seems to be very much an ‘own-goal’ on the NSA’s part. Why in the world would they release so much information from each EC point? It’s hard to say, but a bit more investigation reveals some interesting consequences:

### Flaw #3: You can guess the original EC point from looking at the output bits

By itself this isn’t really a flaw, but will turn out to be interesting in just a minute.

Since Dual-EC outputs so many bits from the x-coordinate of each point — all but the most significant 16 bits — it’s relatively easy to guess the original source point by simply brute-forcing the missing 16 bits and solving the elliptic curve equation for y. (This is all high-school algebra, I swear!)

While this process probably won’t uniquely identify the original (x, y), it’ll give you a modestly sized list of candidates. Moreover with only 16 missing bits the search can be done quickly even on a desktop computer. Had Dual_EC thrown away more bits of the x-coordinate, this search would not have been feasible at all.

So what does this mean? In general, recovering the EC point shouldn’t actually be a huge problem. In theory it could lead to a weakness — say predicting future outputs — but in a proper design you would still have to solve a discrete logarithm instance for each and every point in order to predict the next bytes output by the generator.

And here is where things get interesting.

### Flaw #4: If you know a certain property about the Dual_EC parameters, and can recover an output point, you can predict all subsequent outputs of the generator

Did I tell you this would get interesting in a minute? I totally did.

The next piece of our puzzle was discovered by Microsoft researchers Dan Shumow and Niels Ferguson, and announced at the CRYPTO 2007 rump session. I think this result can best be described via the totally intuitive diagram below. (Don’t worry, I’ll explain it!)

 Annotated diagram from Shumow-Ferguson presentation (CRYPTO 2007). Colorful elements were added by yours truly. Thick green arrows mean ‘this part is easy to reverse’. Thick red arrows should mean the opposite. Unless you’re the NSA.

The Dual-EC generator consists of two stages: a portion that generates the output bits (right) and a part that updates the internal state (left).

Starting from the “r_i” value (circled, center) and heading right, the bit generation part first computes the output point using the function “r_i * Q” — where Q is an elliptic curve point defined in the parameters — then truncates 16 bits its off its x-coordinate to get the raw generator output. The “*” operator here describes elliptic point multiplication, which is a complex operation that should be relatively hard to invert.

Note that everything after the point multiplication should be easy to invert and recover from the output, as we discussed in the previous section.

Every time the generator produces one block of output bits, it also updates its internal state. This is designed to prevent attacks where someone compromises the internal values of a working generator, then uses this value to wind the generator backwards and guess past outputs. Starting again from the circled “r_i” value, the generator now heads upwards and computes the point “r_i * P” where P is a different elliptic curve point also described in the parameters. It then does some other stuff.

The theory here is that P and Q should be random points, and thus it should be difficult to find “r_i * P” used for state update even if you know the output point “r_i * Q” — which I stress you do know, because it’s easy to find. Going from one point to the other requires you to know a relationship between P and Q, which you shouldn’t actually know since they’re supposed to be random values. The difficulty of this is indicated by the thick red arrow.

 Looks totally kosher to me. (Source: NIST SP800-90A)

There is, however, one tiny little exception to this rule. What if P and Q aren’t entirely random values? What if you chose them yourself specifically so you’d know the mathematical relationship between the two points?

In this case it turns out you can easily compute the next PRG state after recovering a single output point (from 32 bytes of RNG output). This means you can follow the equations through and predict the next output. And the next output after that. And on forever and forever.****

This is a huge deal in the case of SSL/TLS, for example. If I use the Dual-EC PRG to generate the “Client Random” nonce transmitted in the beginning of an SSL connection, then the NSA will be able to predict the “Pre-Master” secret that I’m going to generate during the RSA handshake. Given this information the connection is now a cleartext read. This is not good.

So now you should all be asking the most important question of all: how the hell did the NSA generate the specific P and Q values recommended in Appendix A of Dual-EC-DRBG? And do they know the relationship that allows them to run this attack? All of which brings us to:

### Flaw #5: Nobody knows where the recommended parameters came from

And if you think that’s problematic, welcome to the club.

### But why? And where is Dual-EC used?

The ten million dollar question of Dual-EC is why the NSA would stick such an obviously backdoored algorithm into an important standard. Keep in mind that cryptographers found the major (bias) vulnerabilities almost immediately after Dual-EC shipped. The possibility of a ‘backdoor’ was announced in summer 2007. Who would still use it?

A few people have gone through the list of CMVP-evaluated products and found that the answer is: quite a few people would. Most certify Dual-EC simply because it’s implemented in OpenSSL-FIPS, and they happen to use that library. But at least one provider certifies it exclusively. Yuck.

 Hardcoded constants from the OpenSSL-FIPS implementation of Dual_EC_DRBG. Recognize ’em?

It’s worth keeping in mind that NIST standards carry a lot of weight — even those that might have a backdoor. Folks who aren’t keeping up on the latest crypto results could still innocently use the thing, either by accident or (less innocently) because the government asked them to. Even if they don’t use it, they might include the code in their product — say through the inclusion of OpenSSL-FIPS or MS Crypto API — which means it’s just a function call away from being surreptitiously activated.

Which is why people need to stop including Dual-EC immediately. We have no idea what it’s for, but it needs to go away. Now.

### So what about the curves?

The last point I want to make is that the vulnerabilities in Dual-EC have precisely nothing to do with the specifics of the NIST standard elliptic curves themselves. The ‘back door’ in Dual-EC comes exclusively from the relationship between P and Q — the latter of which is published only in the Dual-EC specification. The attack can work even if you don’t use the NIST pseudorandom curves.

However, the revelations about NIST and the NSA certainly make it worth our time to ask whether these curves themselves are somehow weak. The best answer to that question is: we don’t know. Others have observed that NIST’s process for generating the curves leaves a lot to be desired. But including some kind of hypothetical backdoor would be a horrible, horrific idea — one that would almost certainly blow back at us.

You’d think people with common sense would realize this. Unfortunately we can’t count on that anymore.

Thanks to Tanja Lange for her assistance proofing this post. Any errors in the text are entirely mine.

Notes:

* My recollection of this period is hazy, but prior to SP800-90 the two most common FIPS DRBGs in production were (1) the SHA1-based DSA generator of FIPS 186-2 and (2) ANSI X9.31. The DSA generator was a special-purpose generator based on SHA1, and was really designed just for that purpose. ANSI X9.31 used block ciphers, but suffered from some more subtle weaknesses it retained from the earlier X9.17 generator. These were pointed out by Kelsey, Schneier, Wagner and Hall.

** This is actually a requirement of the FIPS 140-2 specification. Since FIPS does not approve any true random number generators, it instead mandates that you run your true RNG output through a DRBG (PRNG) first. The only exception is if your true RNG has been approved ‘for classified use’.

*** Specifically, x and y are integers in the range 0 to p-1 where p is a large prime number. A point is a pair (x, y) such that $y^2 = x^3 + ax + b$ mod p. The values a and b are defined as part of the curve parameters.

**** The process of predicting future outputs involves a few guesses, since you don’t know the exact output point (and had to guess at the missing 16 bits), but you can easily reduce this to a small set of candidates — then it’s just a question of looking at a few more bits of RNG output until you guess the right one.

A couple of weeks ago I wrote a long post about random number generation, which I find to be one of the most fascinating subjects in cryptography — mostly because of how terrible things get when people screw it up.

And oh boy, do people screw it up. Back in 2008 it was Debian, with their ‘custom’ OpenSSL implementation that could only produce 32,768 possible TLS keys (do you really need more?) In 2012 it’s 25,000 factorable TLS public keys, all of which appear to have been generated by embedded devices with crappy RNGs.

When this happens, people get nervous. They start to wonder: am I at risk? And if so, what can I do to protect myself?

Answering this question is easy. Answering it in detail is hard. The easy answer is that if you really believe there’s a problem with your RNG, stop reading this blog and go fix it!

The more complicated answer is that many bad things can happen if your RNG breaks down, and some are harder to deal with than others.

In the rest of this post I’m going to talk about this, and give a few potential mitigations. I want to stress that this post is mostly a thought-exercise. Please do not re-engineer OpenSSL around any of the ‘advice’ I give herein (I’m looking at you, Dan Kaminsky), and if you do follow any of my advice, understand the following:

When it all goes terribly wrong, I’ll quietly take down this post and pretend I never wrote it.

In other words, proceed at your own risk. First, some background.

Before we get started, it’s important to understand what it means for an RNG to be broken. In general, failure comes in three or four different flavors, which may or may not share the same root cause:

1. Predictable output. This usually happens when a generator is seeded with insufficient entropy. The result is that the attacker can actually predict, or at least guess, the exact bits that the RNG will output. This has all kinds of implications, none of them good.
2. Resetting output. This can occur when a generator repeatedly outputs the same stream of bits, e.g., every time the system restarts. When an attacker deliberately brings about this condition, we refer to it as a Reset Attack, and it’s a real concern for devices like smartcards.
3. Shared output. Sometimes exactly the same bits will appear on two or more different devices. Often the owners won’t have any idea this is happening until someone else turns up with their public key! This is almost always caused by some hokey entropy source or hardcoded seed value.

These aren’t necessarily distinct problems, and they can easily bleed into one another. For example, a resetting RNG can become a predictable RNG once the adversary observes the first round of outputs. Shared output can become predictable if the attacker gets his hands on another device of the same model. The Debian bug, for example, could be classified into all three categories.

In addition to the problems above, there’s also a fourth (potential) issue:

4. Non-uniform or biased output. It’s at least possible that the output of your generator will exhibit biased outputs, or strings of repeated characters (the kind of thing that tests like DIEHARD look for). In the worst case, it might just start outputting zero bytes.

The good news is that this is relatively unlikely as long as you’re using a standard crypto library. That’s because modern systems usually process their collected entropy through a pseudo-random generator (PRG) built from a hash function or a block cipher.

The blessing of a PRG is that it will usually give you nice, statistically-uniform output even when you feed it highly non-uniform seed entropy. This helps to prevent attacks (like this one) which rely on the presence of obvious biases in your nonces/keys/IVs. While this isn’t a rule, most common RNG failures seem to be related to bad entropy, not to some surprise failure of the PRG.

Unfortunately, the curse is that a good PRG can hide problems. Since most people only see the output of their PRG (rather than the seed material), it’s easy to believe that your RNG is doing a great job, even when it’s actually quite sick.* This has real-world implications: many standards (like FIPS-140) perform continuous tests on the final RNG/PRG output (e.g., for repeated symbols). The presence of a decent PRG renders these checks largely useless, since they’ll really only detect the most catastrophic (algorithmic) failures.

### Key generation

When it comes to generating keys with a bad (P)RNG, the only winning move is not to play. Algorithm aside, if an attacker can predict your ‘randomness’, they can generate the same key themselves. Game over. Incidentally, this goes for ephemeral keys as well, meaning that protocols like Diffie-Hellman are not secure in the presence of a predictable RNG (on either side).

If you think there’s any chance this will happen to you, then either (a) generate your keys on a reliable device, or (b) get yourself a better RNG. If neither option is available, then for god’s sake, collect some entropy from the user before you generate keys. Ask them to tap a ditty on a button, or (if a keyboard is available), get a strong, unpredictable passphrase and hash it through PBKDF2 to get a string of pseudo-random bits. This might not save you, but it’s probably better than the alternative.

What’s fascinating is that some cryptosystems are more vulnerable to bad, or shared randomness than others. The recent batch of factorable RSA keys, for example, appears to be the product of poor entropy on embedded devices. But the keys weren’t broken because someone guessed the entropy that was used. Rather, the mere fact that two different devices share entropy was enough to make both of their keys factorable.

According to Nadia Heninger, this is an artifact of the way that RSA keys are generated. Every RSA public modulus is the product of two primes. Some devices generate one prime, then reseed their RNG (with the time, say) before generating the second. The result is two different moduli, each sharing one prime. Unfortunately, this is the worst thing you can do with an RSA key, since anyone can now compute the GCD and efficiently factor both keys.

Although you’re never going to be safe when two devices share entropy, it’s arguable that you’re better off if they at least generate the same RSA key, rather than two moduli with a single shared prime. One solution is to calculate the second prime as a mathematical function of the first. An even easier fix is just to make sure that you don’t reseed between the two primes.

Of course it’s not really fair to call these ‘solutions’. Either way you’re whistling past the graveyard, but at least this might let you whistle a bit longer.

### Digital signatures and MACs

There’s a widely held misconception that digital signatures must be randomized. This isn’t true, but it’s understandable that people might think this, since it’s a common property of the signatures we actually use. Before we talk about this, let me stipulate that what we’re talking about here is the signing operation itself — I’m premising this discussion on the very important assumption that we have properly-generated keys.

MACs. The good news is that virtually every practical MAC in use today is deterministic. While there are probabilistic MACs, they’re rarely used. As long as you’re using a standard primitive like HMAC, that bad RNG shouldn’t affect your ability to authenticate your messages.

Signatures. The situation with signatures is a bit more complicated. I can’t cover all signatures, but let’s at least go over the popular ones. For reasons that have never been adequately explored, these are (in no particular order): ECDSA, DSA, RSA-PKCS#1v1.5 and RSA-PSS. Of these four signatures, three are randomized.

The major exception is RSA-PKCS#1v1.5 signature padding, which has no random fields at all. While this means you can give your RNG a rest, it doesn’t mean that v1.5 padding is good. It’s more accurate to say that the ‘heuristically-secure’ v1.5 padding scheme remains equally bad whether you have a working RNG or not.

If you’re signing with RSA, a much better choice is to use RSA-PSS, since that scheme actually has a reduction to the hardness of the RSA problem. So far so good, but wait a second: doesn’t the P in PSS stand for Probabilistic? And indeed, a close look at the PSS description (below) reveals the presence of random salt in every signature.

The good news is that this salt is only an optimization. It allows the designers to obtain a tighter reduction to the RSA problem, but the security proof holds up even if you repeat the salt, or just hardcode it to zero.

 The PSS signing algorithm. MGF is constructed from a hash function.

Having dispensed with RSA, we can get down to the serious offenders: DSA and ECDSA.

The problem in a nutshell is that every (EC)DSA signature includes a random nonce value, which must never be repeated. If you ever forget this warning — i.e., create two signatures (on different messages) using the same nonce — then anyone can recover your secret key. This is both easy to detect, and to compute. You could write a script to troll the Internet for repeated nonces (e.g., in X509 certificates), and then outsource the final calculation to a bright eighth-grader.

Usually when DSA/ECDSA go wrong, it’s because someone simply forgot to generate a random nonce in the first place. This appears to be what happened with the Playstation 3. Obviously, this is stupid and you shouldn’t do it. But no matter how careful your implementation, you’re always going to be vulnerable if your RNG starts spitting out repeated values. If this happens to you even once, you need to throw away your key and generate a new one.

There are basically two ways to protect yourself:

• Best: don’t to use (EC)DSA in the first place. It’s a stupid algorithm with no reasonable security proof, and as a special bonus it goes completely pear-shaped in the presence of a bad RNG. Unfortunately, it’s also a standard, used in TLS and elsewhere, so you’re stuck with it.
• Second best: Derive your nonces deterministically from the message and some secret data. If done correctly (big if!), this prevents two messages from being signed with the same nonce. In the extreme case, this approach completely eliminates the need for randomness in (EC)DSA signatures.There are two published proposals that take this approach. The best is Dan Bernstein’s (somewhat complex) EdDSA proposal, which looks like a great replacement for ECDSA. Unfortunately it’s a replacement, not a patch, since EdDSA uses different elliptic curves and is therefore not cross-compatible with existing ECDSA implementations.

Alternatively, Thomas Pornin has a proposal up that simply modifies (EC)DSA by using HMAC to derive the nonces. The best part about Thomas’s proposal is that it doesn’t break compatibility with existing (EC)DSA implementations. I will caution you, however: while Thomas’s work looks reasonable, his proposal is just a draft (and an expired one to boot). Proceed at your own risk.

### Encryption

There are various consequences to using a bad RNG for encryption, most of which depend on the scheme you’re using. Once again we’ll assume that the keys themselves are properly-generated. What’s at stake is the encryption itself.

Symmetric encryption. The good news is that symmetric encryption can be done securely with no randomness at all, provided that you have a strong encryption key and the ability to keep state between messages.

An obvious choice is to use CTR mode encryption. Since CTR mode IVs needn’t be unpredictable, you can set your initial IV to zero, then simply make sure that you always hang onto the last counter value between messages. Provided that you never ever re-use a counter value with a given key (even across system restarts) you’ll be fine.**

This doesn’t work with CBC mode, since that actually does require an unpredictable IV at the head of each chain. You can hack around this requirement in various ways, but I’m not going to talk about those here; nothing good will come of it.

Public-key encryption. Unfortunately, public-key encryption is much more difficult to get right without a good RNG.

Here’s the fundamental problem: if an attacker knows the randomness you used to produce a ciphertext, then (in the worst case) she can simply encrypt ‘guess’ messages until she obtains the same ciphertext as you. At that point she knows what you encrypted.***

Obviously this attack only works if the attacker can guess the message you encrypted. Hence it’s possible that high-entropy messages (symmetric keys, for example) will encrypt securely even without good randomness. But there’s no guarantee of this. Elgamal, for example, can fail catastrophically when you encrypt two messages with the same random nonce.****

Although I’m not going to endorse any specific public-key encryption scheme, it seems likely that some schemes will hold up better than others. For example, while predictably-randomized RSA-OAEP and RSA-OAEP+ will both be vulnerable to guessing attacks, there’s some (intuitive) reason to believe that they’ll remain secure for high-entropy messages like keys. I can’t prove this, but it seems like a better bet than using Elgamal (clearly broken) or older padding schemes like RSA-PKCS#1v1.5.

If my intuition isn’t satisfying to you, quite a lot of research is still being done in this area. See for example, recent works on deterministic public-key encryption, or hedged public-key encryption. Note that all of this work makes on the assumption that you’re encrypting high-entropy messages.

### Protocols

I can’t conclude this post without at least a token discussion of how a bad RNG can affect cryptographic protocols. The short version is that it depends on the protocol. The shorter version is that it’s almost always bad.

Consider the standard TLS handshake. Both sides use their RNGs to generate nonces. Then the client generates a random ‘Pre-master Secret’ (PMS), encrypts it under the server’s key, and transmits it over the wire. The ‘Master Secret’ (and later, transport key) is derived by hashing together all of the nonces and the PMS.

Since the PMS is the only real ‘secret’ in the protocol (everything else is sent in the clear), predicting it is the same as recovering the transport key. Thus TLS is not safe to use if the client RNG is predictable. What’s interesting is that the protocol is secure (at least, against passive attackers) even if the server’s RNG fails. I can only guess that this was a deliberate choice on the part of TLS’s designers.

 SSL handshake (source).

Protocols are already plenty exciting when you have a working RNG. Adding a bad RNG to the mix is like pouring fireworks on a fire. It’s at least possible to build protocols that are resilient to one participant losing their RNG, but it’s very tricky to accomplish — most protocols will fail in unexpected ways.

### In Summary

If you take nothing else from this post, I hope it’s this: using a broken RNG is just a bad idea. If you think there’s any chance your generator will stop working, then for god’s sake, fix it! Don’t waste your time doing any of the stuff I mention above.

That said, there legitimately are cases where your RNG can go wrong, or where you don’t have one in the first place. The purpose of this post was to help you understand these scenarios, and the potential consequences for your system. So model it. Think about it. Then spend your time on better things.

Notes:

The classic example is Debian’s 2008 OpenSSL release, which used a 15-bit process ID as the only seed for its PRG. This wasn’t obvious during testing, since the 32,768 possible RNG streams all looked pretty random. It was only after the public release that people noticed that many devices were sharing TLS keys.

** If you’re going to do this, you should also be sure to use a MAC on your ciphertext, including the initial counter value for each message.

*** A great example is unpadded, textbook RSA. If m is random, then it’s quite difficult to recover m given m^e mod N. If, however, you have a few good guesses for m and you know the public key (N, e), you can easily try each of your guesses and compare the results.

**** Given two Elgamal ciphertexts on the same key and randomness (g^r, y^r*M1), (g^r, y^r*M2) you can easily compute M1/M2. A similar thing happens with hash Elgamal. This may or may not be useful to you, depending on how much you know about the content of the various messages.