How do you know if an RNG is working?

No matter how much cryptographers accomplish, we’re always
building on a questionable foundation. (illustration: Marc S. Rousseau)

Last week, Edward Snowden spoke to a packed crowd at SXSW about the many problems (and limited solutions) facing those of us who want to keep our communications private. Snowden said a number of things — including a shout-out to Moxie’s company Whisper Systems, who certainly deserve it. But instead of talking about that, I wanted to focus on (in my opinion) one of Snowden’s most important quotes:

We need all those brilliant Belgian cryptographers to go “alright we know that these encryption algorithms we are using today work, typically it is the random number generators that are attacked as opposed to the encryption algorithms themselves. How can we make them [secure], how can we test them?”

Now it’s possible I’m a little biased, but it seems to me this cuts to the core of our problems with building secure systems in an increasingly hostile world. Namely: most encryption relies on some source of “random” numbers, either to generate keys or (particularly in the case of public key encryption) to provide semantic security for our ciphertexts.

What this means is that an attacker who can predict the output of your RNG — perhaps by taking advantage of a bug, or even compromising it at a design level — can often completely decrypt your communications. The Debian project learned this firsthand, as have many others. This certainly hasn’t escaped NSA’s notice, if the allegations regarding its Dual EC random number generator are true.

All of this brings us back to Snowden’s quote above, and the question he throws open for us. How do you know that an RNG is working? What kind of tests can we run on our code to avoid flaws ranging from the idiotic to the highly malicious? Unfortunately this question does not have an easy answer. In the rest of this post I’m going to try to explain why.

Background: Random and Pseudorandom Number Generation

I’ve written quite a bit about random number generation on this blog, but before we go forward it’s worth summarizing a few basic facts about random number generation.

First off, the ‘random’ numbers we use in most deployed cryptographic systems actually come from two different systems:

  1. A ‘true’ random number generator (or entropy generator) that collects entropy from the physical world. This can include entropy collected from low-level physical effects like thermal noise and shot noise, or it can include goofy stuff like mouse movements and hard disk seek times.
  2. An algorithmic ‘pseudorandom number generator’ (PRNG) that typically processes the output of (1) to both stretch the output to provide more bits and, in some cases, provide additional security protections in case the output of (1) proves to be biased.
It’s important to note that pseudorandom number generators aren’t “random number generators” at all. These generators typically use cryptographic algorithms (e.g., block ciphers or hash functions) to process a seed value from the RNG into many apparently random-looking and unpredictable bytes.
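
To make the stretching idea concrete, here is a toy sketch of a hash-based PRNG: a short seed is hashed together with an incrementing counter to produce as many output bytes as you like. This is an illustration only, not a vetted design; real constructions such as HMAC-DRBG add key separation, backtracking resistance, and reseeding.

```python
import hashlib

def stretch_seed(seed: bytes, n_bytes: int) -> bytes:
    """Toy PRNG: expand a short seed into n_bytes of output by hashing
    the seed with an incrementing counter (hash-in-counter-mode)."""
    out = b""
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n_bytes]

# 16 bytes of 'entropy' become as much output as we ask for. The result
# looks random to any statistical test, but it is entirely determined by
# (and only as unpredictable as) the seed.
print(stretch_seed(b"16 bytes of seed", 64).hex())
```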

In most cases, it’s quite rare for your application to ever see the raw output of a true random number generator.* Even the low-level entropy collector within Linux’s RNG uses cryptographic constructs like hash functions to ‘mix’ the output of various entropy sources. To produce the bits you read from /dev/random or /dev/urandom, Linux then feeds this pool into a cryptographic PRNG (other systems, notably FreeBSD, use designs like Yarrow or Fortuna for the same purpose).**

Another similar pattern occurs inside of the Intel “secure key” random number generator included in Intel Ivy Bridge processors. When you buy one of these processors, you’re getting (free!) a hardware ‘1-shot’ circuit that collects high-entropy electronic noise, which is then measured and processed into useful RNG output. The design looks like this:

Hardware random number generator used on Intel Ivy Bridge processors. Left: the ‘1-shot’ circuit used to collect physical entropy. Right: the data flow from generator to output, including health checks and
PRNG computation. (source).

Once again, with Intel’s design you (i.e., the application developer) don’t get access to this raw randomness. It’s first used to seed a PRNG based on AES (CTR-DRBG from NIST SP800-90A). What you actually get as an application developer is the processed output of that algorithm.
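
For readers who want to see the shape of that algorithm, here is a heavily simplified sketch of CTR-DRBG’s generate-and-update step, loosely following SP800-90A. It omits the derivation function, reseed counters, and additional input, and it leans on the third-party Python ‘cryptography’ package; treat it as a diagram in code, not an implementation.

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_ecb(key: bytes, block: bytes) -> bytes:
    """Encrypt a single 16-byte block under AES (the DRBG's core)."""
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

def ctr_drbg_generate(key: bytes, v: int, n_blocks: int):
    """Simplified AES-128 CTR-DRBG: returns (output, new_key, new_v)."""
    out = b""
    for _ in range(n_blocks):
        v = (v + 1) % (1 << 128)                    # increment the counter V
        out += aes_ecb(key, v.to_bytes(16, "big"))  # output block = AES_K(V)
    # Update step: two more counter blocks become the next (key, V) state,
    # so compromising the new state doesn't reveal output already produced.
    v = (v + 1) % (1 << 128)
    new_key = aes_ecb(key, v.to_bytes(16, "big"))
    v = (v + 1) % (1 << 128)
    new_v = int.from_bytes(aes_ecb(key, v.to_bytes(16, "big")), "big")
    return out, new_key, new_v
```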

In practice this typical design has some implications. On the positive side, the presence of a PRNG means that the underlying RNG circuit can get pretty borked (e.g., biased) without the results being detectable by your application. On the negative side, the underlying RNG circuit can get pretty borked without the results being detectable in your application.

In other words, with only a few ugly glitches — things that can happen in real life — you can easily get a broken random number generator that nobody notices until it’s way too late. And that’s without deliberate tampering, which makes things way, way worse.

Which brings us back to our fundamental question: how do systems know that their RNG is working? This turns out to be a question without a perfect answer.

Statistical Tests

If you look at the literature on random number generators, you’ll find a lot of references to statistical randomness testing suites like Diehard or NIST’s SP 800-22. The gist of these systems is that they look at the output of an RNG and run tests to determine whether the output is, from a statistical perspective, “good enough” for government work (very literally, in the case of the NIST suite).

The nature of these tests varies. Some look at simple factors like bias (the number of 1s and 0s) while others look for more sophisticated features such as the distribution of numbers when mapped into 3-D space.
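
To give a flavor of the genre, the simplest test in the NIST suite is the frequency (monobit) test, which just checks whether the 1s and 0s are in rough balance. Here is a minimal version of the SP 800-22 formula (a sketch, not the certified test code):

```python
import math

def monobit_test(bits: str) -> float:
    """Frequency (monobit) test from NIST SP 800-22: returns a p-value.
    A p-value below 0.01 suggests the sequence is biased."""
    n = len(bits)
    s = sum(1 if b == "1" else -1 for b in bits)  # +1/-1 random walk
    return math.erfc(abs(s) / math.sqrt(2 * n))
```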

Now I don’t want to knock these tests. They’re a perfectly valid way to detect serious flaws in a (true) RNG — I can attest to this, since I’ve built one that failed the tests miserably — but they probably won’t detect flaws in your system. That’s because, as I said above, most deployed systems include a combination of RNG and PRNG, or even RNG plus “conditioning” via cryptographic hash functions or ciphers. The nature of these cryptographic, algorithmic processes is such that virtually every processed output will pass statistical tests with flying colors — even if the PRNG is initialized with ‘garbage’ input.
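
You can see the problem for yourself. Below, a hypothetical dead entropy source hands the conditioning step an all-zero seed; the raw output fails the monobit test instantly, while the conditioned output sails through (this reuses stretch_seed and monobit_test from the sketches above):

```python
dead_seed = b"\x00" * 32                      # a completely broken RNG

raw_bits = "0" * (8 * 4096)                   # the raw, stuck output
conditioned = stretch_seed(dead_seed, 4096)   # same entropy: none at all
cond_bits = "".join(f"{byte:08b}" for byte in conditioned)

print(monobit_test(raw_bits))    # ~0.0: fails spectacularly
print(monobit_test(cond_bits))   # virtually always above 0.01: 'random'
```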

This means, unfortunately, that it can be very hard to use statistical tests to detect a broken RNG unless you apply them at the low level: directly to the raw output of the entropy source, before any conditioning. And even there you won’t rule out intentional backdoors — as I’ll discuss in a moment.

Known Answer Tests (KATs)

Assuming that you’ve tested your true RNG properly and it’s passing all tests, it’s still important to test your PRNG. One approach to doing this is to use Known Answer Tests (KATs) that are essentially test vectors. These contain some input seed material as well as a set of output bytes that should be the algorithmic result of running the PRNG on that seed.
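
In code, a KAT is nothing more than a fixed seed plus a hard-coded expected output. A hypothetical harness for the toy PRNG sketched earlier might look like the following; the expected value here is a placeholder, since real KATs ship vectors taken from the algorithm’s specification or a reference implementation.

```python
def prng_kat() -> bool:
    """Known Answer Test: run the PRNG on a fixed seed and compare the
    result against a precomputed reference vector."""
    KAT_SEED = bytes(range(16))   # fixed, publicly known test seed
    EXPECTED_HEX = "..."          # placeholder for the reference output
    return stretch_seed(KAT_SEED, 64).hex() == EXPECTED_HEX
```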

Since PRNGs are purely algorithmic, the theory here is that you can test them like algorithms. While this approach is valid, it raises two potential issues (both of which I’ve seen in practice).

First, you can only test your PRNG on so many points. Thus it’s quite possible that your PRNG will succeed on one particular test vector (i.e., it’ll output just so many valid bytes) but go completely off the reservation on some other input. This is unlikely, but not impossible in normal conditions. It’s very possible if someone is trying to build a malicious backdoor into your PRNG implementation.

Second, the process of instrumenting your PRNG implementation for testing can actually introduce vulnerabilities in your deployed system! Think about this for a second. Normal PRNGs take in real random seeds from your RNG. The last thing you’d ever want to do is run your PRNG on some predictable seed — if you did, everyone would be able to predict the PRNG’s outputs. Yet adding a test harness to your system means building in logic to re-seed your PRNG to something predictable!
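
The hazard is easy to state in code. Here is a sketch of the kind of hook a KAT harness forces into your seeding path, with a hypothetical TEST_MODE flag standing in for whatever mechanism selects the test seed. Everything rides on that one branch never being taken (or flipped by an attacker) outside the test rig.

```python
import os

TEST_MODE = False        # set during KATs; must NEVER be set in production
KAT_SEED = bytes(16)     # fixed, publicly known test seed (all zeros)

def get_seed() -> bytes:
    if TEST_MODE:
        return KAT_SEED      # deterministic: required for the KAT,
                             # fatal if this ever happens in production
    return os.urandom(16)    # normal path: fresh entropy from the OS
```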

This is like adding an ejection seat to your car. Might make you safer — unless it goes off while you’re driving to work.

A quick glance through, e.g., the OpenSSL code shows that indeed exactly this sort of code exists and ships in some versions of the library. Of course, experienced developers will note that surely such features could be surrounded by pre-processor directives (or the equivalent in your language of choice) ensuring that they’ll never be activated in production code. Sadly, at least in the case of FIPS, this is not possible — for reasons I’ll explain next.

Runtime Health Checks

Another approach to testing RNGs is to test them while the system is running. This isn’t intended to rule out design-level flaws (as the above statistical and KAT tests are) but it is intended to catch situations where the RNG becomes broken during normal operation. This can occur for a variety of reasons, e.g., manufacturing defects, system damage, and even exposure to outside radiation.

Health checks can take different forms. FIPS 140, for example, mandates that all approved RNGs be tested at startup time using KATs. (This is why you can’t make your test harness conditional on compilation flags — it must ship in your production code!) It subsequently mandates a runtime health check that verifies the generator has not become ‘stuck’, i.e., is not spitting out the same bytes over and over again.
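
Here is a sketch of what that ‘stuck’ check amounts to in practice: remember the previous output block and fail hard if a fresh block repeats it. (Real implementations vary in block size and error handling; this is just the idea.)

```python
class StuckOutputError(Exception):
    pass

class ContinuousCheck:
    """Fail if the generator emits the same block twice in a row."""
    def __init__(self):
        self._last = None

    def check(self, block: bytes) -> bytes:
        if self._last is not None and block == self._last:
            raise StuckOutputError("RNG repeated an output block")
        self._last = block
        return block
```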

While I’m sure this last test may have saved someone, somewhere, it seems totally inappropriate and useless when applied to the output of an RNG/PRNG pair, which is how NIST recommends it be used. This is because even the most broken algorithmic PRNGs will almost never spit out duplicate values — even if the underlying RNG fails completely.

The upshot of this decision is that NIST (FIPS) recommends a check that will almost never succeed in catching anything useful from a PRNG, but does introduce a whole bunch of extra logic that can suffer from flaws and/or malicious circumvention. I’m sure the good folks at NIST realize this, but they recommend it anyway — after all, what else are they going to do?

Tampering

Which brings us to the $10 million question. What happens if an attacker is deliberately tampering with our RNG/PRNG in order to make it fail? Note that this is not an academic question. We have excellent reason to believe it’s happened in some real systems.

Just for fun, let’s go back to the Intel Ivy Bridge RNG described above. We’ll take a look specifically at the PRNG portion of the design, which uses the NIST CTR-DRBG random number generator with AES:

Portion of the Intel Ivy Bridge design, with a few annotations added by yours truly. (original source)

The CTR-DRBG design relies on two features. First, an AES key is selected at random along with some input seed. This pair goes into the AES cipher, where it is processed to derive a new key and data. The result should be unpredictable to most attackers.

But if you were able to change the way keys are updated (in the key_in_mux highlighted above) so that, instead of updating the key and/or using an unpredictable one, it chose a fixed key known to the attacker, you would now have a very powerful backdoor. Specifically, the output would still look statistically perfectly random. But an attacker who knows this key could simply decrypt one block of RNG output to obtain all future and past outputs of the generator until the next time it was reseeded.
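
To see how devastating this is, here is the attack played out against the simplified ctr_drbg_generate sketch from earlier, with the key pinned to an attacker-chosen constant. A single observed output block is enough to recover the counter V, and with it the whole stream:

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_ecb_decrypt(key: bytes, block: bytes) -> bytes:
    dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
    return dec.update(block) + dec.finalize()

PINNED_KEY = b"attacker knows!!"  # 16 bytes fixed by the hypothetical backdoor

# The victim draws four blocks of 'random' output from the sabotaged DRBG.
out, _, _ = ctr_drbg_generate(PINNED_KEY, v=0x1234, n_blocks=4)

# The attacker observes only the second block and decrypts it to learn V...
v = int.from_bytes(aes_ecb_decrypt(PINNED_KEY, out[16:32]), "big")

# ...then rewinds two steps and regenerates every block, past and future.
recovered, _, _ = ctr_drbg_generate(PINNED_KEY, v - 2, n_blocks=4)
assert recovered == out   # statistically flawless, completely predictable
```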

Note that I am not saying the Intel system has a backdoor in it — far from it. I’m only considering how easily it might be made to have one if you were an attacker with control of Intel’s fabrication plants (or their microcode updates). And this is hardly Intel’s fault. It’s just the nature of this particular RNG design. Others could be just as vulnerable.

Actually using this knowledge to attack applications would be more complex, since many system-level RNGs (including the Linux Kernel RNG) combine the output of the RNG with other system entropy (through XOR, unfortunately, not hashing). But Intel has pushed hard to see their RNG output used directly, and there exist plugins for OpenSSL that allow you to use it similarly. If you used such a method, these hypothetical flaws could easily make their way all the way into your cryptography.
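
If you do have to combine sources yourself, the conservative construction is to hash them together rather than XOR them, so an adversary who controls one input and can observe the other still cannot steer the result. A minimal sketch (the function name and layout are mine):

```python
import hashlib
import os

def combine(hw_output: bytes, pool_entropy: bytes) -> bytes:
    """Fold two entropy sources together with a hash instead of XOR."""
    # Length-prefix the first input so distinct input pairs can't collide.
    data = len(hw_output).to_bytes(4, "big") + hw_output + pool_entropy
    return hashlib.sha256(data).digest()

seed = combine(os.urandom(16),   # stand-in for e.g. RDRAND output
               os.urandom(16))   # stand-in for other pool entropy
```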

Designing against these issues

Unfortunately, so far all I’ve done is call out the challenges with building trustworthy RNGs. And there’s a reason for this: the challenges are easy to identify, while the solutions themselves are hard. And unfortunately at this time, they’re quite manual.

Building secure RNG/PRNGs still requires a combination of design expertise, careful low-level (true) RNG testing — using expert design and statistical tests — and the use of certified algorithms with proper tests. All of the techniques above contribute to building a secure RNG, but none of them are quite sufficient.

Solving this problem, at least in software, so we can ensure that code is correct and does not contain hidden ‘easter eggs’, represents one of the more significant research challenges facing those of us who depend on secure cryptographic primitives. I do hope some enterprising graduate students will give these issues the attention they deserve.

Notes:

* Though there are some exceptions. See, for example, this FIPS-certified smart card that included a bad RNG which was used to generate cryptographic secrets. In general FIPS disallows this except for a very small number of approved RNGs. Perhaps this was one.

** The original version of this post claimed that /dev/random seeds /dev/urandom. This is actually a mistake — both /dev/random and /dev/urandom use the same PRNG, but /dev/random simply keeps track of how much ‘entropy’ is in the pool and blocks when you have drawn too many bits. Thanks to Brendan Long and Thomas Ptacek for setting me straight.

14 thoughts on “How do you know if an RNG is working?”

  1. This is a good article, but I want to respond to this:

    “Even low-level interfaces like Linux /dev/random typically use cryptographic constructs like hash functions in order to 'mix' the output of various entropy sources. Faster sources like /dev/urandom then seed a PRNG like Yarrow or Fortuna to produce the many output bits your application probably needs.”

    Actually, the Linux kernel always creates random numbers the same way for both /dev/urandom and /dev/random (although their entropy pools are kept separate). The only difference is that /dev/random will stop giving you output if the “entropy” estimate is too low.

    Greg KH just posted this on Google+ a little while ago:

    https://plus.googleapis.com/111049168280159033135/posts/AJkqHReK9x9
    http://www.2uo.de/myths-about-urandom/

  2. For crypto this sounds like a serious problem.
    But what about randomized algorithms like the Miller-Rabin primality test?
    Is the fact that the random numbers they are using aren't quite random a problem or not?
    I want to say they don't seem to be, since the Miller-Rabin primality test works in the real world — but do we really know that?

  3. Thank you for the article!

    “combine the output of the RNG with other system entropy (through XOR, unfortunately, not hashing)”

    Oops. I learned to do it by XORing.
    *looks in HoAC*, ah. Interesting. But why?
    Can you please suggest something to read why I shouldn't use XOR?

  4. Let's say I can control the output of the RNG. That is, I have a backdoor in your system that allows me to specify /any/ set of 'random' bits in response to a low-level RNG call, and I know the bits Y you're going to XOR my response with. Then I can choose a value X such that X xor Y = Z for any chosen Z. This is a pretty rarefied, ridiculous attack — but it's conceivably possible to engineer it if you have enough access. It's much harder to do this if the combination step is Z = H(X || Y).

  5. I'm not sure of a good citation, but if the procedure for mixing the entropy pool is too simple (and particularly if it is a linear operation in some Galois field or otherwise commutes with XOR) then a simple relationship exists between the added entropy and the resulting entropy pool state. This means that some common accidental relationships between added entropy values may actually remove entropy, and furthermore it may be easier for an attacker to manipulate entropy pool state. In the simplest case, if the entropy pool mixing operation is the identity function, if two inputs are successive values from a 128-bit counter, 50% of the time the net effect is to flip the least significant bit, 25% of the time the two least significant bits of the pool are flipped etc., which works out to 2 bits of actual entropy added, regardless of the amount of entropy present in the 128-bit counter.

    At a bare minimum, a strong avalanche effect is desirable. Flipping one bit in your input should flip about half of the bits in your entropy pool.

    In the case of the Linux /dev/random, I seem to remember the entropy pool mixing operation is a lagged Fibonacci generator (hopefully using addition instead of XOR, but I don't recall).

    Note that Keccak (SHA-3) uses a simple XOR to mix blocks of entropy into the entropy pool, but uses a very strong mixing function. So, if the mixing function is very carefully constructed for proper linear and differential, etc. properties, XOR isn't a problem. However, it takes several person-years of expert analysis to determine that a mixing function is *probably* okay with XOR.

    A hash function that's believed to be secure does, however, give you the properties you want: strong avalanche, good linear properties, good differential properties, etc. to make it as difficult as possible to deduce the entropy pool state from its outputs or to be able to deduce how to force the entropy pool into a state where new entropy is poorly mixed.

    In the end, maybe you can come up with your own function that has all of the needed properties, but many expert person-years went into making SHA-2 and SHA-3 achieve them in a minimum number of CPU cycles. It's very unlikely your own function would run as fast as SHA-2 or SHA-3, and if it does, it's almost certainly missing something.

  6. I appreciate the RNG article. The point about modifying the state of the DRBG should definitely be considered by all designers. Of course this is true of any DRBG/PRNG, not just SP800-90. If the DRBG is hardware-only then it's straightforward to disallow any modification of the key when not in test mode. For ASIC and FPGA designers requiring the best security, use a hardware-only SP800-90 AES-256 CTR-DRBG with prediction resistance always enabled. If used, the only attacks I can think of that may work are a fault-injection attack against the AES or somehow locking the TRNG entropy. Both attacks should be detected by the health monitor and simple anti-tamper measures.

    Of course the main security problem isn't that there aren't good solutions. It's that many commercial companies' business units don't care enough about security, don't understand it, or don't want to pay for high-quality solutions.

  7. About /dev/random and /dev/urandom using Yarrow: you were thinking of FreeBSD's versions (which are identical).

  8. Pondering again the question of XOR vs Hash for mixing: Shall we see mixing functions as distinct cryptographic primitive?

  9. The only thing that you and so many others seem to forget is that NIST/NSA has final say and oversight over encryption. If someone published an encryption algorithm and set it free to be downloaded and used by anybody, then that individual or group would be charged with aiding the enemy, which carries with it the possibility of execution. As has been evident with the Snowden revelations, the NSA has manipulated encryption to allow themselves an easier time at decrypting the encrypted information. Also, there is a 256-bit limit to consumer-grade encryption; anything greater than that and it would be illegal to just hand out.

    Encryption is not permitted to prevent the US government from being able to decrypt the information. Encryption is only used to try and prevent other people from easily decrypting the encrypted information.

  10. Both Blowfish and Threefish can have keys >= 256 bits (448 and 1024 bits, respectively). Algorithms can also be stacked, although not with linear security gains (or any in some cases). You can also always kick it old school with a one-time pad, although that has its own, separate problems. I'd be much more concerned about someone (NSA or otherwise) with a big pipe and few moral constraints than the algorithmic security of block ciphers though.

  11. I’m not entirely understanding the preference for hashing over XORing with entropy pools:

    If I understand this correctly, the weakness of XORing is when the attacker has control over one entropy stream and can read the other (let’s assume just two streams to make it simple). Then the attacker can control the output of the RNG. However, if the attacker has access to each of the entropy streams, presumably he can compute the output of the RNG regardless of whether a hash or XOR is used. The only benefit of the hash is the attacker cannot necessarily _control_ the output, but he would still be able to read it just fine.

    As far as I can think, the only time this would make a difference is in a (seemingly) contrived situation where the attacker has compromised the system enough to have an entropy stream that can see all the other entropy streams, and actively works against them, but not compromised enough for the attacker to know what’s going on once this has started.

    On the other hand, XORing the streams _guarantees_ entropy as good or better than the stream with the most entropy — that is, XORing will never make things less random, and all you need is one good independent stream at a time to be working right for the whole system to stay secure. I’m unaware of the same guarantee with any hash function, but I’m very interested if they exist. If you don’t have that guarantee, you’re right back to the primary topic of this article: even if a hash function seems pretty good and passes the statistical tests, you can never be sure if it might just be making things less random and wasting all the work of your entropy pools.

  12. One advantage of using a hash function is that it spreads entropy over all input bits. But it is also true that a bad hash algorithm can make things less random, while A XOR B will be at least as random as the more random of A and B. I guess taking all these things into consideration is part of designing a good crypto system.
