Wednesday, September 18, 2013

The Many Flaws of Dual_EC_DRBG

The Dual_EC_DRBG generator from NIST SP800-90A.
Update 9/19: RSA warns developers not to use the default Dual_EC_DRBG generator in BSAFE. Oh lord.

As a technical follow up to my previous post about the NSA's war on crypto, I wanted to make a few specific points about standards. In particular I wanted to address the allegation that NSA inserted a backdoor into the Dual-EC pseudorandom number generator.

For those not following the story, Dual-EC is a pseudorandom number generator proposed by NIST for international use back in 2006. Just a few months later, Shumow and Ferguson made cryptographic history by pointing out that there might be an NSA backdoor in the algorithm. This possibility -- fairly remarkable for an algorithm of this type -- looked bad and smelled worse. If true, it spelled almost certain doom for anyone relying on Dual-EC to keep their system safe from spying eyes.

Now I should point out that much of this is ancient history. What is news today is the recent leak of classified documents that points a very emphatic finger towards Dual_EC, or rather, to an unnamed '2006 NIST standard'. The evidence that Dual-EC is this standard has now become so hard to ignore that NIST recently took the unprecedented step of warning implementers to avoid it altogether.

Better late than never.

In this post I'm going to try to explain the curious story of Dual-EC. While I'll do my best to keep this discussion at a high and non-mathematical level, be forewarned that I'm probably going to fail at least at a couple of points. I you're not the mood for all that, here's a short summary:
  • In 2005-2006 NIST and NSA released a pseudorandom number generator based on elliptic curve cryptography. They released this standard -- with very little explanation -- both in the US and abroad
  • This RNG has some serious issues with just being a good RNG. The presence of such obvious bugs was mysterious to cryptographers.
  • In 2007 a pair of Microsoft researchers pointed out that these vulnerabilities combined to produce a perfect storm, which -- together with some knowledge that only NIST/NSA might have -- opened a perfect backdoor into the random number generator itself.
  • This backdoor may allow the NSA to break nearly any cryptographic system that uses it. 
If you're still with me, strap in. Here goes the long version.

Dual-EC

For a good summary on the history of Dual-EC-DRBG, see this 2007 post by Bruce Schneier. Here I'll just give the highlights.

Back in 2004-5, NIST decided to address a longstanding weakness of the FIPS standards, namely, the limited number of approved pseudorandom bit generator algorithms (PRGs, or 'DRBGs' in NIST parlance) available to implementers. This was actually a bit of an issue for FIPS developers, since the existing random number generators had some known design weaknesses.*

NIST's answer to this problem was Special Publication 800-90, parts of which were later wrapped up into the international standard ISO 18031. The NIST pub added four new generators to the FIPS canon. None these algorithms is a true random number generator in the sense that they collect physical entropy. Instead, what they do is process the (short) output of a true random number generator -- like the one in Linux -- conditioning and stretching this 'seed' into a large number of random-looking bits you can use to get things done.** This is particularly important for FIPS-certified cryptographic modules, since the FIPS 140-2 standards typically require you to use a DRBG as a kind of 'post-processing' -- even when you have a decent hardware generator.

The first three SP800-90 proposals used standard symmetric components like hash functions and block ciphers. Dual_EC_DRBG was the odd one out, since it employed mathematics more that are typically used to construct public-key cryptosystems. This had some immediate consequences for the generator: Dual-EC is slow in a way that its cousins aren't. Up to a thousand times slower.

Now before you panic about this, the inefficiency of Dual_EC is not necessarily one of its flaws! Indeed, the inclusion of an algebraic generator actually makes a certain amount of sense. The academic literature includes a distinguished history of provably secure PRGs based on on number theoretic assumptions, and it certainly didn't hurt to consider one such construction for standardization. Most developers would probably use the faster symmetric alternatives, but perhaps a small number would prefer the added confidence of a provably-secure construction.

Unfortunately, here is where NIST ran into their first problem with Dual_EC.
Flaw #1: Dual-EC has no security proof. 
Let me spell this out as clearly as I can. In the course of proposing this complex and slow new PRG where the only damn reason you'd ever use the thing is for its security reduction, NIST forgot to provide one. This is like selling someone a Mercedes and forgetting to attach the hood ornament.

I'd like to say this fact alone should have damned Dual_EC, but sadly this is par for the course for NIST -- which treats security proofs like those cool Japanese cookies you can never find. In other words, a fancy, exotic luxury. Indeed, NIST has a nasty habit of dumping proof-writing work onto outside academics, often after the standard has been finalized and implemented in products everywhere.

So when NIST put forward its first draft of SP800-90 in 2005, academic cryptographers were left to analyze it from scratch. Which, to their great credit, they were quite successful at.

The first thing reviewers noticed is that Dual-EC follows a known design paradigm -- it's a weird variant of an elliptic curve linear congruential generator. However they also noticed that NIST had made some odd rookie mistakes.

Now here we will have to get slightly wonky -- though I will keep mathematics to a minimum. (I promise it will all come together in the end!) Constructions like Dual-EC have basically two stages:
  1. A stage that generates a series of pseudorandom elliptic curve points. Just like on the graph at right, an elliptic curve point is a pair (x, y) that satisfies an elliptic curve equation. In general, both x and y are elements of a finite field, which for our purposes means they're just large integers.***

    The main operation of the PRNG is to apply mathematical operations to points on the elliptic curve, in order to generate new points that are pseudorandom -- i.e., are indistinguishable from random points in some subgroup.

    And the good news is that Dual-EC seems to do this first part beautifully! In fact Brown and Gjøsteen even proved that this part of the generator is sound provided that the Decisional Diffie-Hellman problem is hard in the specific elliptic curve subgroup. This is a well studied hardness assumption so we can probably feel pretty confident in this proof.
     
  2. Extract pseudorandom bits from the generated EC points. While the points generated by Dual-EC may be pseudorandom, that doesn't mean the specific (x, y) integer pairs are random bitstrings. For one thing, 'x' and 'y' are not really bitstrings at all, they're integers less than some prime number. Most pairs don't satisfy the curve equation or are not in the right subgroup. Hence you can't just output the raw x or y values and expect them to make good pseudorandom bits.

    Thus the second phase of the generator is to 'extract' some (but not all) of the bits from the EC points. Traditional literature designs do all sorts of things here -- including hashing the point or dropping up to half of the bits of the x-coordinate. Dual-EC does something much simpler: it grabs the x coordinate, throws away the most significant 16-18 bits, and outputs the rest.
In 2006, first Gjøsteen and later Schoenmakers and Sidorenko took a close look at Dual-EC and independently came up with a surprising result:
Flaw #2: Dual-EC outputs too many bits. 
Unlike those previous EC PRGs which output anywhere from 2/3 to half of the bits from the x-coordinate, Dual-EC outputs nearly the entire thing.

This is good for efficiency, but unfortunately it also gives Dual-EC a bias. Due to some quirks in the mathematics of the field operations, an attacker can now predict the next bits of Dual-EC output with a fairly small -- but non-trivial -- success probability, in the range of 0.1%. While this number may seem small to non-cryptographers, it's basically a hanging offense for a cryptographic random number generator where probability of predicting a future bit should be many orders of magnitude lower.

What's just plain baffling is that this flaw ever saw the light of day. After all, the specification was developed by bright people at NIST -- in collaboration with NSA. Either of those groups should easily have discovered a bug like this, especially since this issue had been previously studied. Indeed, within a few months of public release, two separate groups of academic cryptographers found it, and were able to implement an attack using standard PC equipment.

So in summary, the bias is mysterious and it seems to be very much an 'own-goal' on the NSA's part. Why in the world would they release so much information from each EC point? It's hard to say, but a bit more investigation reveals some interesting consequences:
Flaw #3: You can guess the original EC point from looking at the output bits.
By itself this isn't really a flaw, but will turn out to be interesting in just a minute.

Since Dual-EC outputs so many bits from the x-coordinate of each point -- all but the most significant 16 bits -- it's relatively easy to guess the original source point by simply brute-forcing the missing 16 bits and solving the elliptic curve equation for y. (This is all high-school algebra, I swear!)

While this process probably won't uniquely identify the original (x, y), it'll give you a modestly sized list of candidates. Moreover with only 16 missing bits the search can be done quickly even on a desktop computer. Had Dual_EC thrown away more bits of the x-coordinate, this search would not have been feasible at all.

So what does this mean? In general, recovering the EC point shouldn't actually be a huge problem. In theory it could lead to a weakness -- say predicting future outputs -- but in a proper design you would still have to solve a discrete logarithm instance for each and every point in order to predict the next bytes output by the generator.

And here is where things get interesting.
Flaw #4: If you know a certain property about the Dual_EC parameters, and can recover an output point, you can predict all subsequent outputs of the generator.
Did I tell you this would get interesting in a minute? I totally did.

The next piece of our puzzle was discovered by Microsoft researchers Dan Shumow and Niels Ferguson, and announced at the CRYPTO 2007 rump session. I think this result can best be described via the totally intuitive diagram below. (Don't worry, I'll explain it!)
Annotated diagram from Shumow-Ferguson presentation (CRYPTO 2007).
Colorful elements were added by yours truly. Thick green arrows mean 'this part is
easy to reverse'. Thick red arrows should mean the opposite. Unless you're the NSA.
The Dual-EC generator consists of two stages: a portion that generates the output bits (right) and a part that updates the internal state (left).

Starting from the "r_i" value (circled, center) and heading right, the bit generation part first computes the output point using the function "r_i * Q" -- where Q is an elliptic curve point defined in the parameters -- then truncates 16 bits its off its x-coordinate to get the raw generator output. The "*" operator here describes elliptic point multiplication, which is a complex operation that should be relatively hard to invert.

Note that everything after the point multiplication should be easy to invert and recover from the output, as we discussed in the previous section.

Every time the generator produces one block of output bits, it also updates its internal state. This is designed to prevent attacks where someone compromises the internal values of a working generator, then uses this value to wind the generator backwards and guess past outputs. Starting again from the circled "r_i" value, the generator now heads upwards and computes the point "r_i * P" where P is a different elliptic curve point also described in the parameters. It then does some other stuff.

The theory here is that P and Q should be random points, and thus it should be difficult to find "r_i * P" used for state update even if you know the output point "r_i * Q" -- which I stress you do know, because it's easy to find. Going from one point to the other requires you to know a relationship between P and Q, which you shouldn't actually know since they're supposed to be random values. The difficulty of this is indicated by the thick red arrow.
Looks totally kosher to me. (Source: NIST SP800-90A)

There is, however, one tiny little exception to this rule. What if P and Q aren't entirely random values? What if you chose them yourself specifically so you'd know the mathematical relationship between the two points?

In this case it turns out you can easily compute the next PRG state after recovering a single output point (from 32 bytes of RNG output). This means you can follow the equations through and predict the next output. And the next output after that. And on forever and forever.****

This is a huge deal in the case of SSL/TLS, for example. If I use the Dual-EC PRG to generate the "Client Random" nonce transmitted in the beginning of an SSL connection, then the NSA will be able to predict the "Pre-Master" secret that I'm going to generate during the RSA handshake. Given this information the connection is now a cleartext read. This is not good.

So now you should all be asking the most important question of all: how the hell did the NSA generate the specific P and Q values recommended in Appendix A of Dual-EC-DRBG? And do they know the relationship that allows them to run this attack? All of which brings us to:
Flaw #5Nobody knows where the recommended parameters came from.
And if you think that's problematic, welcome to the club.

But why? And where is Dual-EC used?

The ten million dollar question of Dual-EC is why the NSA would stick such an obviously backdoored algorithm into an important standard. Keep in mind that cryptographers found the major (bias) vulnerabilities almost immediately after Dual-EC shipped. The possibility of a 'backdoor' was announced in summer 2007. Who would still use it?

A few people have gone through the list of CMVP-evaluated products and found that the answer is: quite a few people would. Most certify Dual-EC simply because it's implemented in OpenSSL-FIPS, and they happen to use that library. But at least one provider certifies it exclusively. Yuck.

Hardcoded constants from the OpenSSL-FIPS
implementation of Dual_EC_DRBG. Recognize 'em?
It's worth keeping in mind that NIST standards carry a lot of weight -- even those that might have a backdoor. Folks who aren't keeping up on the latest crypto results could still innocently use the thing, either by accident or (less innocently) because the government asked them to. Even if they don't use it, they might include the code in their product -- say through the inclusion of OpenSSL-FIPS or MS Crypto API -- which means it's just a function call away from being surreptitiously activated.

Which is why people need to stop including Dual-EC immediately. We have no idea what it's for, but it needs to go away. Now.

So what about the curves?

The last point I want to make is that the vulnerabilities in Dual-EC have precisely nothing to do with the specifics of the NIST standard elliptic curves themselves. The 'back door' in Dual-EC comes exclusively from the relationship between P and Q -- the latter of which is published only in the Dual-EC specification. The attack can work even if you don't use the NIST pseudorandom curves.

However, the revelations about NIST and the NSA certainly make it worth our time to ask whether these curves themselves are somehow weak. The best answer to that question is: we don't know. Others have observed that NIST's process for generating the curves leaves a lot to be desired. But including some kind of hypothetical backdoor would be a horrible, horrific idea -- one that would almost certainly blow back at us.

You'd think people with common sense would realize this. Unfortunately we can't count on that anymore.

Thanks to Tanja Lange for her assistance proofing this post. Any errors in the text are entirely mine.

Notes:

* My recollection of this period is hazy, but prior to SP800-90 the two most common FIPS DRBGs in production were (1) the SHA1-based DSA generator of FIPS 186-2 and (2) ANSI X9.31. The DSA generator was a special-purpose generator based on SHA1, and was really designed just for that purpose. ANSI X9.31 used block ciphers, but suffered from some more subtle weaknesses it retained from the earlier X9.17 generator. These were pointed out by Kelsey, Schneier, Wagner and Hall.

** This is actually a requirement of the FIPS 140-2 specification. Since FIPS does not approve any true random number generators, it instead mandates that you run your true RNG output through a DRBG (PRNG) first. The only exception is if your true RNG has been approved 'for classified use'.

*** Specifically, x and y are integers in the range 0 to p-1 where p is a large prime number. A point is a pair (x, y) such that y^2 = x^3 + ax + b mod p. The values a and b are defined as part of the curve parameters.

**** The process of predicting future outputs involves a few guesses, since you don't know the exact output point (and had to guess at the missing 16 bits), but you can easily reduce this to a small set of candidates -- then it's just a question of looking at a few more bits of RNG output until you guess the right one.

23 comments:

  1. I have degree from ITT Technical Institute. I was taught that the U.S. government mandates no encryption standard can surpass theirs and they have to be able to crack all encryption standards in play.

    I guess a new trend in Information Technology is going to emerge. Specifically a heightened increase in national IT security and pride. Brazil has already announced plans for a new dedicated backbone bypassing America. So much for globalization.

    ReplyDelete
    Replies
    1. The US government tried to mandate backdoors in everything by law in the 90s and lost due to public outcry. What they couldn't mandate by law they have ever since resorted to trying to get by subterfuge. They will never listen to what the people want, they are against the people.

      And reducing USA-centric-ness of the internet isn't killing globalization, it's enhancing globalization. The whole definition of "globalization" excludes there being one massive central authority controlling everything like the US has been doing! Why do people have an "I must be in charge to ensure harmony by force, otherwise there's chaos" attitude to everything? Open Source has taught us that that is not the case, it's just an excuse that dictators use to keep power.

      Delete
    2. I have a degree from the Blaine School of Cosmetology and I would love for you to state an actually credible source for that.

      Delete
    3. Hey Javier,

      A quick question, how is Brazil being able to communicate with Europe without being MITM'd by USA bad for globalization?

      It skips America because America is hostile, it's just common sense. Connections still would be able to reach America, or go through America if that is the most convenient path.

      Delete
    4. I wonder if the choice of Brazil to bypass the US will help them.... Point of arrival probably will be a NSA-cooperative state. We all learned that the services of NSA and other secret services are so damned intertwined that it is impossible to transfer packets around the globe finally to hit one of their tapping points. And then.... With IPV6 we're lost.... Profiling is 50% of their information value, fully fully compatible with war time UK Y-stations, only capable to log the 'gibberish' from the Germans (Enigma traffic) but very very good in profiling the radio networks. At the start of the war the job actually was done by half. Look at the analogy.... Amazing, isn't it?
      www.cryptomuseum.com

      Delete
  2. You Mr Green, Lange, Schneier, Bernstein etc became famous because of hero whistleblower Mr Snowden and the evil NSA. Your life must have changed for the better. Keep it up. In math and cryptographers we trust.

    ReplyDelete
  3. Thank you, that was just the detailed analysis of the issue I was hoping to eventually find here :)

    I'm wondering about one thing: Assuming that somebody chose P and Q specifically for their mathematical relationship (i.e., they own the "secret key" to the DRBG), would they also be able to "rewind" the DRBG and recover previous outputs, or does this concern future outputs only?

    ReplyDelete
    Replies
    1. From 'just' knowing log_Q P you only get the future outputs. For getting past ones requires computing discrete logarithms. So far we don't know any way that this mathematical problem is easier on the NIST curves.

      Delete
  4. Thank your for the very informative and interesting read :)
    Maybe I got it wrong but
    - The EC points P and Q are defined in the Dual-EC standard appendix and are recommendations (?)
    - They are basically the seed for the algorithm

    So, without real knowledge of crypto, I would naively assume that, if we just use other EC points than the predefined P and Q, we could bypass the NSA-knows-the-ec-point-relationship problem and just use the basic algorithm of the standard.

    Or is the problem one level deeper, at the EC itself? Is the EC fundamentally corrupted?

    MK

    ReplyDelete
    Replies
    1. No, you're absolutely right. Replacing "Q" with one generated randomly *will* kill the backdoor. Later versions of the spec include an (optional) Appendix describing how to do this. However I've looked at a couple of open implementations -- including OpenSSL -- and they still use the default parameters.

      As I said above, the backdoor in Dual_EC has nothing to do with corruption of the elliptic curves themselves. But there's nothing wrong with an excess of caution right now.

      Delete
    2. Ok, thank you for the clarification :)

      MK

      Delete
  5. If the problem is in just two number P and Q, why hasn't somebody generated new set of numbers and explained to everyone how he did it? Seems like guys from OpenSSL could have done this. Why not?

    ReplyDelete
    Replies
    1. You can generate your own "Q" value. But it's not the default in most implementations.

      Delete
  6. This isn't about common sense -- it's about corruption and lies.

    ReplyDelete
  7. The reason why tou would use the NIST (read NSA) mandated EC Points is the reason you would use any NIST (read NSA) approved algorithm : It has supposedly been vetted by experts cryptographers and are A (not necessarily THE) good choice. Implementing DUAL-EC exactly as NIST specified requires programming skills while changing the points requires some mild understanding of what a Elliptic Curve is.
    And last but not least, I'm not sure you can get a product certified if you actually chose other parameters as the ones that were specified by NIST (read, say-it-with-me NSA).

    ReplyDelete
  8. Thanks for the post, Dr. Green. In the future I hope you tackle the question of the NIST elliptic curve generation "controversy" and let us know if there is truly anything fishy with that whole process.

    It's sad, but this whole Snowden debacle is going to cause a lot of distrust in NIST from now on. Perhaps that's a good thing as it might result in new findings from cryptographers who go back and really analyze these standards again.

    ReplyDelete
  9. Assume there are more obscure hacks hiding behind the obvious fix. In the intelligence biz, it's called a 'modified limited hangout': the ability to say "oh, yes, you caught us dead to rights" while distracting you from deeper wrongness.

    ReplyDelete
  10. I never fully understood what my teacher in a cryptography 101 lesson at university 20 years ago meant with: Be suspicious with algorithms where the design principles are not published. I simply couldn't imagine how a backdoor could be planted into an algorithms - now I can!
    Thanks for making this understandable to someone who is dealing with security in everyday practice, but who is not a mathematician. :)

    ReplyDelete
  11. Purely for the entertainment value to the audience here, I offer that it occurred to me that the suspect P-Q could have been a test case provided by the NSA, along the lines of "Given how the algorithm is supposed to work, if we corrupt the P-Q pair by making them non-random using a specific mathematical relationship between them, then the algorithm should be provably not secure. Demonstrating this should increase the confidence that the correctly implemented algorithm is secure." Then what happened is some arrogant scientist at NIST (full disclosure--I was formerly a NIST employee, and the terms of my departure still burn as a fire in the pit of my stomach) conveniently "forgot" to put the correct ones in the standard, or did it on purpose since "Anyone of modest skill in cryptography will detect the problem and come up with their own P-Q pair correctly. Anyone who doesn't deserves what they get." There are, in my estimation, people that arrogant employed by NIST.

    ReplyDelete
  12. There’s no word to describe such a great masterpiece. You made such an interesting piece to read, giving every subject enlightenment for us to gain knowledge and information Porcelain Veneers

    ReplyDelete
  13. I would appreciate your advice on how to publish a freeware encryption program I constructed.

    Please reply to DavidLNewton@comcast.net

    ReplyDelete
  14. This comment has been removed by the author.

    ReplyDelete
  15. Some of the comments ask why not just generate your own points. The naive answer is, "Sure, you can do that." NIST SP 800-90A even discusses that as an option. However, if you want to have your device or code validated as FIPS 140-2 compliant, it has to pass some known answer tests. (You feed in the supplied value as though it was entropy, and check the first so many output blocks match the known answers.) And those tests are (I think) dependent on P and Q being as published in NIST SP 800-90A.

    So the short answer is "yes, you can", but the catch is that you will have to have some sort of option to test with the published points, and the use your own, and you will have to convince the validation lab that this is all okay.

    The "I think" is because the published examples are on the NIST web site, but that is down due to the government shutdown, so I can't go look.

    ReplyDelete