|No matter how much cryptographers accomplish, we’re always
building on a questionable foundation. (illustration: Marc S. Rousseau)
Last week, Edward Snowden spoke to a packed crowd at SXSW about the many problems (and limited solutions) facing those of us who want to keep our communications private. Snowden said a number of things — including a shout out to Moxie’s company Whisper Systems, who certainly deserve it. But instead of talking about that, I wanted to focus on (in my opinion) one of Snowden’s most important quotes:
We need all those brilliant Belgian cryptographers to go “alright we know that these encryption algorithms we are using today work, typically it is the random number generators that are attacked as opposed to the encryption algorithms themselves. How can we make them [secure], how can we test them?”
Now it’s possible I’m a little biased, but it seems to me this cuts to the core of our problems with building secure systems in an increasingly hostile world. Namely: most encryption relies on some source of “random” numbers, either to generate keys or (particularly in the case of public key encryption) to provide semantic security for our ciphertexts.
What this means is that an attacker who can predict the output of your RNG — perhaps by taking advantage of a bug, or even compromising it at a design level — can often completely decrypt your communications. The Debian project learned this firsthand, as have many others. This certainly hasn’t escaped NSA’s notice, if the allegations regarding its Dual EC random number generator are true.
All of this bring us back to Snowden’s quote above, and the question he throws open for us. How do you know that an RNG is working? What kind of tests can we run on our code to avoid flaws ranging from the idiotic to the highly malicious? Unfortunately this question does not have an easy answer. In the rest of this post I’m going to try to explain why.
Background: Random and Pseudorandom Number Generation
I’ve written quite a bit about random number generation on this blog, but before we go forward it’s worth summarizing a few basic facts about random number generation.
First off, the ‘random’ numbers we use in most deployed cryptographic systems actually come from two different systems:
- A ‘true’ random number generator (or entropy generator) that collects entropy from the physical world. This can include entropy collected from low-level physical effects like thermal noise and shot noise, or it can include goofy stuff like mouse movements and hard disk seek times.
- An algorithmic ‘pseudorandom number generator’ (PRNG) that typically processes the output of (1) to both stretch the output to provide more bits and, in some cases, provide additional security protections in case the output of (1) proves to be biased.
In most cases, it’s quite rare for your application to ever see the raw output of a true random number generator.* Even the low-level entropy collector within Linux’s RNG uses cryptographic constructs like hash functions in order to ‘mix’ the output of various entropy sources. To produce the bits produced in /dev/random or /dev/urandom Linux then seeds a PRNG like Yarrow or Fortuna.**
Another similar pattern occurs inside of the Intel “secure key” random number generator included in Intel Ivy Bridge processors. When you buy one of these processors, you’re getting (free!) a hardware ‘1-shot‘ circuit that collects high-entropy electronic noise, which is then measured and processed into useful RNG output. The design looks like this:
Once again, with Intel’s design you (i.e., the application developer) don’t get access to this raw randomness. It’s first used to seed a PRNG based on AES (CTR-DRBG from NIST SP800-90A). What you actually get as an application developer is the processed output of that algorithm.
In practice this typical design some implications. On the positive side, the presence of a PRNG means that the underlying RNG circuit can get pretty borked (e.g., biased) without the results being detectable by your application. On the negative side, the underlying RNG circuit can get pretty borked without the results being detectable in your application.
In other words, with only a few ugly glitches — things that can happen in real life — you can easily get a broken random number generator that nobody notices until it’s way too late. And that’s without deliberate tampering, which makes things way, way worse.
Which brings us back to our fundamental question: how do systems know that their RNG is working. This turns out to be a question without a perfect answer.
If you look at the literature on random number generators, you’ll find a lot of references to statistical randomness testing suites like Diehard or NIST’s SP 800-22. The gist of these systems is that they look a the output of an RNG and run tests to determine whether the output is, from a statistical perspective, “good enough” for government work (very literally, in the case of the NIST suite.)
The nature of these tests varies. Some look at simple factors like bias (the number of 1s and 0s) while others look for more sophisticated features such as the distribution of numbers when mapped into 3-D space.
Now I don’t want to knock these tests. They’re a perfectly valid way to detect serious flaws in a (true) RNG — I can attest to this, since I’ve built one that failed the tests miserably — but they probably won’t detect flaws in your system. That’s because like I said above, most deployed systems include a combination of RNG and PRNG, or even RNG plus “conditioning” via cryptographic hash functions or ciphers. The nature of these cryptographic, algorithmic processes is such that virtually every processed output will pass statistical tests with flying colors — even if the PRNG is initialized with ‘garbage’ input.
This means, unfortunately, that it can be very hard to use statistical tests to detect a broken RNG unless you properly test it only at the low level. And even there you won’t rule out intentional backdoors — as I’ll discuss in a moment.
Known Answer Tests (KATs)
Assuming that you’ve tested your true RNG properly and it’s passing all tests, it’s still important to test your PRNG. One approach to doing this is to use Known Answer Tests (KATs) that are essentially test vectors. These contain some input seed material as well as a set of output bytes that should be the algorithmic result of running the PRNG on that seed.
Since PRNGs are purely algorithmic, the theory here is that you can test them like algorithms. While this approach is valid, it raises two potential issues (both of which I’ve seen in practice).
First, you can only test your PRNG on so many points. Thus it’s quite possible that your PRNG will succeed on one particular test vector (i.e., it’ll output just so many valid bytes) but go completely off the reservation on some other input. This is unlikely, but not impossible in normal conditions. It’s very possible if someone is trying to build a malicious backdoor into your PRNG implementation.
Second, the process of instrumenting your PRNG implementation for testing can actually introduce vulnerabilities in your deployed system! Think about this for a second. Normal PRNGs take in real random seeds from your RNG. The last thing you’d ever want to do is run your PRNG on some predictable seed — if you did, everyone would be able to predict the PRNGs outputs. Yet adding a test harness your system means building in logic to re-seed your RNG to something predictable!
This is like adding an ejection seat to your car. Might make you safer — unless it goes off while you’re driving to work.
A quick glance through e.g., the OpenSSL code shows that indeed, exactly this sort of code exists and ships in some versions of the library. Of course, the experienced developers will note that surely such features could be surrounded by pre-processor directives (or the equivalent in your language of choice) ensuring that they’ll never be activated in production code. Sadly, at least in the case of FIPS, this is not possible — for reasons I’ll explain next.
Runtime Health Checks
Another approach to testing RNGs is to test them while the system is running. This isn’t intended to rule out design-level flaws (as the above statistical and KAT tests are) but it is intended to catch situations where the RNG becomes broken during normal operation. This can occur for a variety of reasons, e.g., manufacturing defects, system damage, and even exposure to outside radiation.
Health checks can take different forms. FIPS 140, for example, mandates that all approved RNGs be tested at startup time using KATs. (This is why you can’t make your test harness conditional on compilation flags — it must ship in your production code!) They subsequently mandate a runtime health check that verifies the generator has not become ‘stuck’, i.e., is spitting out the same bytes over and over again.
While I’m sure this last test may have saved someone, somewhere, it seems totally inappropriate and useless when applied to the output of an RNG/PRNG pair, which is how NIST recommends it be used. This is because even the most broken algorithmic PRNGs will almost never spit out duplicate values — even if the underlying RNG fails completely.
The upshot of this decision is that NIST (FIPS) recommend a check that will almost never succeed in catching anything useful from a PRNG, but does introduce a whole bunch of extra logic that can suffer from flaws and/or malicious circumvention. I’m sure the good folks at NIST realize this, but they recommend it anyway — after all, what else are they going to do?
Which brings us to the $10 million question. What happens if an attacker is deliberately tampering with our RNG/PRNG in order to make it fail? Note that this is not an academic question. We have excellent reason to believe it’s happened in some real systems.
Just for fun, let’s go back to the Intel Ivy Bridge RNG described above. We’ll take a look specifically at the PRNG portion of the design, which uses the NIST CTR-DRBG random number generator with AES:
|Portion of the Intel Ivy Bridge design, with a few annotations added by yours truly. (original source)|
The CTR-DRBG design relies on two features. First, an AES key is selected at random along with some input seed. This pair goes into the AES cipher, where it is processed to derive a new key and data. The result should be unpredictable to most attackers.
But if you were able to change the way keys were updated (in the key_in_mux hilighted) so that instead of updating the key and/or using an unpredictable one, it chose a fixed key known to the attacker, you would now have a very powerful backdoor. Specifically, the output would still look statistically perfectly random. But an attacker who knows this key could simply decrypt one block of RNG output to obtain all future and past outputs of the generator until the next time it was reseeded.
Note that I am not saying the Intel system has a backdoor in it — far from it. I’m only considering how easily it might be made to have one if you were an attacker with control of Intel’s fabrication plants (or their microcode updates). And this is hardly Intel’s fault. It’s just the nature of this particular RNG design. Others could be just as vulnerable.
Actually using this knowledge to attack applications would be more complex, since many system-level RNGs (including the Linux Kernel RNG) combine the output of the RNG with other system entropy (through XOR, unfortunately, not hashing). But Intel has pushed hard to see their RNG output used directly, and there exist plugins for OpenSSL that allow you to use it similarly. If you used such a method, these hypothetical flaws could easily make their way all the way into your cryptography.
Designing against these issues
Unfortunately, so far all I’ve done is call out the challenges with building trustworthy RNGs. And there’s a reason for this: the challenges are easy to identify, while the solutions themselves are hard. And unfortunately at this time, they’re quite manual.
Building secure RNG/PRNGs still requires a combination of design expertise, careful low-level (true) RNG testing — using expert design and statistical tests — and the use of certified algorithms with proper tests. All of the techniques above contribute to building a secure RNG, but none of them are quite sufficient.
Solving this problem, at least in software, so we can ensure that code is correct and does not contain hidden ‘easter eggs’, represents one of the more significant research challenges facing those of us who depend on secure cryptographic primitives. I do hope some enterprising graduate students will give these issues the attention they deserve.
* Though there are some exceptions. See, for example, this FIPS certified smart card that included a bad RNG which was used to generate cryptographic secrets. In general FIPS disallows this except for a very small number of approved RNGs. Perhaps this was one.
** The original version of this post claimed that /dev/random seeds /dev/urandom. This is actually a mistake — both /dev/random and /dev/urandom use the same PRNG, but /dev/random simply keeps track of how much ‘entropy’ is in the pool and blocks when you have drawn too many bits. Thanks to Brendan Long and Thomas Ptacek for setting me straight.