How do Interception Proxies fail?

I have some substantive posts in the works, but mostly this week hasn’t been good for blogging. In the meantime, I wanted to point readers to this fascinating talk by researcher Jeff Jarmoc, which I learned about through the Corelan team blog:

SSL/TLS Interception Proxies and Transitive Trust

SSL/TLS is entrusted with securing many of the communications services we take for granted in our connected world. Threat actors are also aware of the advantages offered by encrypted communication channels, and increasingly utilize encryption for exploit delivery, malware command-and-control and data exfiltration.

To counter these tactics, organizations are increasingly deploying security controls that intercept end-to-end SSL/TLS channels. Web proxies, DLP systems, specialized threat detection solutions, and network IPSs now offer functionality to intercept, inspect and filter encrypted traffic. Similar functionality is also present in lawful intercept systems and solutions enabling the broad surveillance of encrypted communications by governments. Broadly classified as “SSL/TLS Interception Proxies,” these solutions act as man-in-the-middle, violating the end-to-end security guarantees promised by SSL/TLS.

In this presentation we’ll explore a phenomenon known as “transitive trust,” and explain how deployment of SSL/TLS interception solutions can introduce new vulnerabilities. We detail a collection of new vulnerabilities in widely used interception proxies first discovered by the Dell SecureWorks CTU and responsibly disclosed to the impacted vendors. These vulnerabilities enable attackers to more easily intercept and modify secure communications. In addition, we will introduce a public web site that organizations can use to quickly and easily test for these flaws.

I can’t find Jeff’s slides or whitepaper at the moment (Update: The slides are now public. There’s a lot more to his talk than I cover in this post.) What I can tell from the summary is that Jeff is doing us all an invaluable favor — essentially, putting his hands deep in the scuzz to find out what’s down there.

To make a long story short, the answer is nothing good. The details are in the Corelan post (which, ironically, gives me a TLS error), but to sum it up: Jeff mostly focuses on what interception proxies do when the proxy receives an invalid certificate from a remote website — for example, one that is expired or revoked.

Normally your browser would be the one dealing with this, but in a MITM scenario you’re totally dependent on the proxy. Even if the proxy checks the certificate properly in the first place, they’re still in a tough place. They essentially have the following options:

  1. Reject the connection altogether (probably safest)
  2. Give users the option to proceed or abort (no worse than standard TLS)
  3. Ignore the errors and make the connection anyway (run for the hills!) 

Jeff correctly points out that option (3) is the most dangerous, since it opens users up to all kinds of bad TLS connections that would normally ring alarm bells in your browser. Worse, this seems to be the default policy of a number of commercial interception proxies, mostly for unintentional/stupid reasons.

Beyond these default settings, it seems to be that there’s another question here, namely: how are these devices being configured in the field? My guess is that this depends greatly on whether the “victims” of interception know that their TLS traffic is being monitored. If deployers choose to do interception quietly, it could make a big difference in how a proxy will handle cert issues.

I stress that we’re now speculating, but let’s pretend that ACME corporation wants to intercept its employees’ TLS connections, but doesn’t actively want to advertise this fact.* This may restrict their options. For one thing, option (2) is probably out, since this would produce obvious messages on the end-users’ web browser. Even option (1) might be iffy, since some sites will simply not work, without any decent explanation. Hence — in speculation land — one could imagine some organizations deliberately choosing option (3), on the theory that being quiet is better than being secure.**

This is different than the vulnerabilities that Jeff addresses (which mainly deal with devices’ default settings), but it’s something I’ve been wondering about since I first heard of the practice. After all, you’ve gone to all this trouble to get a publicly-rooted MITM CA, now you’re going to advertise that you’re using it? Maybe, maybe not.

The world of TLS MITM interception is a fascinating one, and I can’t possibly learn enough about it. Hopefully we’ll soon learn even more, at least about the nasty CA-facilitated variant of it, as CAs start to respond to Mozilla’s recent ultimatum.

Notes:

* They may notify their employees somehow, but that’s different from reminding them on a daily basis. This isn’t totally nuts: it’s one speculative reason for deploying CA-generated MITM certificates, rather than generating an org certificate and installing it throughout your enterprise.

** I suppose there are workarounds for this case, such as re-writing the MITM cert to include the same class of errors (expiration dates, name errors) but I’d be utterly shocked if anyone uses them.

Surviving a bad RNG

A couple of weeks ago I wrote a long post about random number generation, which I find to be one of the most fascinating subjects in cryptography — mostly because of how terrible things get when people screw it up.

And oh boy, do people screw it up. Back in 2008 it was Debian, with their ‘custom’ OpenSSL implementation that could only produce 32,768 possible TLS keys (do you really need more?) In 2012 it’s 25,000 factorable TLS public keys, all of which appear to have been generated by embedded devices with crappy RNGs.

When this happens, people get nervous. They start to wonder: am I at risk? And if so, what can I do to protect myself?

Answering this question is easy. Answering it in detail is hard. The easy answer is that if you really believe there’s a problem with your RNG, stop reading this blog and go fix it!

The more complicated answer is that many bad things can happen if your RNG breaks down, and some are harder to deal with than others.

In the rest of this post I’m going to talk about this, and give a few potential mitigations. I want to stress that this post is mostly a thought-exercise. Please do not re-engineer OpenSSL around any of the ‘advice’ I give herein (I’m looking at you, Dan Kaminsky), and if you do follow any of my advice, understand the following:

When it all goes terribly wrong, I’ll quietly take down this post and pretend I never wrote it.

In other words, proceed at your own risk. First, some background.

What’s a ‘bad RNG’?

Before we get started, it’s important to understand what it means for an RNG to be broken. In general, failure comes in three or four different flavors, which may or may not share the same root cause:

  1. Predictable output. This usually happens when a generator is seeded with insufficient entropy. The result is that the attacker can actually predict, or at least guess, the exact bits that the RNG will output. This has all kinds of implications, none of them good.
  2. Resetting output. This can occur when a generator repeatedly outputs the same stream of bits, e.g., every time the system restarts. When an attacker deliberately brings about this condition, we refer to it as a Reset Attack, and it’s a real concern for devices like smartcards.
  3. Shared output. Sometimes exactly the same bits will appear on two or more different devices. Often the owners won’t have any idea this is happening until someone else turns up with their public key! This is almost always caused by some hokey entropy source or hardcoded seed value.

These aren’t necessarily distinct problems, and they can easily bleed into one another. For example, a resetting RNG can become a predictable RNG once the adversary observes the first round of outputs. Shared output can become predictable if the attacker gets his hands on another device of the same model. The Debian bug, for example, could be classified into all three categories.

In addition to the problems above, there’s also a fourth (potential) issue:

4. Non-uniform or biased output. It’s at least possible that the output of your generator will exhibit biased outputs, or strings of repeated characters (the kind of thing that tests like DIEHARD look for). In the worst case, it might just start outputting zero bytes.

The good news is that this is relatively unlikely as long as you’re using a standard crypto library. That’s because modern systems usually process their collected entropy through a pseudo-random generator (PRG) built from a hash function or a block cipher.

The blessing of a PRG is that it will usually give you nice, statistically-uniform output even when you feed it highly non-uniform seed entropy. This helps to prevent attacks (like this one) which rely on the presence of obvious biases in your nonces/keys/IVs. While this isn’t a rule, most common RNG failures seem to be related to bad entropy, not to some surprise failure of the PRG.

Unfortunately, the curse is that a good PRG can hide problems. Since most people only see the output of their PRG (rather than the seed material), it’s easy to believe that your RNG is doing a great job, even when it’s actually quite sick.* This has real-world implications: many standards (like FIPS-140) perform continuous tests on the final RNG/PRG output (e.g., for repeated symbols). The presence of a decent PRG renders these checks largely useless, since they’ll really only detect the most catastrophic (algorithmic) failures.

Key generation

When it comes to generating keys with a bad (P)RNG, the only winning move is not to play. Algorithm aside, if an attacker can predict your ‘randomness’, they can generate the same key themselves. Game over. Incidentally, this goes for ephemeral keys as well, meaning that protocols like Diffie-Hellman are not secure in the presence of a predictable RNG (on either side).

If you think there’s any chance this will happen to you, then either (a) generate your keys on a reliable device, or (b) get yourself a better RNG. If neither option is available, then for god’s sake, collect some entropy from the user before you generate keys. Ask them to tap a ditty on a button, or (if a keyboard is available), get a strong, unpredictable passphrase and hash it through PBKDF2 to get a string of pseudo-random bits. This might not save you, but it’s probably better than the alternative.

What’s fascinating is that some cryptosystems are more vulnerable to bad, or shared randomness than others. The recent batch of factorable RSA keys, for example, appears to be the product of poor entropy on embedded devices. But the keys weren’t broken because someone guessed the entropy that was used. Rather, the mere fact that two different devices share entropy was enough to make both of their keys factorable.

According to Nadia Heninger, this is an artifact of the way that RSA keys are generated. Every RSA public modulus is the product of two primes. Some devices generate one prime, then reseed their RNG (with the time, say) before generating the second. The result is two different moduli, each sharing one prime. Unfortunately, this is the worst thing you can do with an RSA key, since anyone can now compute the GCD and efficiently factor both keys.

Although you’re never going to be safe when two devices share entropy, it’s arguable that you’re better off if they at least generate the same RSA key, rather than two moduli with a single shared prime. One solution is to calculate the second prime as a mathematical function of the first. An even easier fix is just to make sure that you don’t reseed between the two primes.

Of course it’s not really fair to call these ‘solutions’. Either way you’re whistling past the graveyard, but at least this might let you whistle a bit longer.

Digital signatures and MACs

There’s a widely held misconception that digital signatures must be randomized. This isn’t true, but it’s understandable that people might think this, since it’s a common property of the signatures we actually use. Before we talk about this, let me stipulate that what we’re talking about here is the signing operation itself — I’m premising this discussion on the very important assumption that we have properly-generated keys.

MACs. The good news is that virtually every practical MAC in use today is deterministic. While there are probabilistic MACs, they’re rarely used. As long as you’re using a standard primitive like HMAC, that bad RNG shouldn’t affect your ability to authenticate your messages.

Signatures. The situation with signatures is a bit more complicated. I can’t cover all signatures, but let’s at least go over the popular ones. For reasons that have never been adequately explored, these are (in no particular order): ECDSA, DSA, RSA-PKCS#1v1.5 and RSA-PSS. Of these four signatures, three are randomized.

The major exception is RSA-PKCS#1v1.5 signature padding, which has no random fields at all. While this means you can give your RNG a rest, it doesn’t mean that v1.5 padding is good. It’s more accurate to say that the ‘heuristically-secure’ v1.5 padding scheme remains equally bad whether you have a working RNG or not.

If you’re signing with RSA, a much better choice is to use RSA-PSS, since that scheme actually has a reduction to the hardness of the RSA problem. So far so good, but wait a second: doesn’t the P in PSS stand for Probabilistic? And indeed, a close look at the PSS description (below) reveals the presence of random salt in every signature.

The good news is that this salt is only an optimization. It allows the designers to obtain a tighter reduction to the RSA problem, but the security proof holds up even if you repeat the salt, or just hardcode it to zero.

rsa-pss
The PSS signing algorithm. MGF is constructed from a hash function.

Having dispensed with RSA, we can get down to the serious offenders: DSA and ECDSA.

The problem in a nutshell is that every (EC)DSA signature includes a random nonce value, which must never be repeated. If you ever forget this warning — i.e., create two signatures (on different messages) using the same nonce — then anyone can recover your secret key. This is both easy to detect, and to compute. You could write a script to troll the Internet for repeated nonces (e.g., in X509 certificates), and then outsource the final calculation to a bright eighth-grader.

Usually when DSA/ECDSA go wrong, it’s because someone simply forgot to generate a random nonce in the first place. This appears to be what happened with the Playstation 3. Obviously, this is stupid and you shouldn’t do it. But no matter how careful your implementation, you’re always going to be vulnerable if your RNG starts spitting out repeated values. If this happens to you even once, you need to throw away your key and generate a new one.

There are basically two ways to protect yourself:

  • Best: don’t to use (EC)DSA in the first place. It’s a stupid algorithm with no reasonable security proof, and as a special bonus it goes completely pear-shaped in the presence of a bad RNG. Unfortunately, it’s also a standard, used in TLS and elsewhere, so you’re stuck with it.
  • Second best: Derive your nonces deterministically from the message and some secret data. If done correctly (big if!), this prevents two messages from being signed with the same nonce. In the extreme case, this approach completely eliminates the need for randomness in (EC)DSA signatures.There are two published proposals that take this approach. The best is Dan Bernstein’s (somewhat complex) EdDSA proposal, which looks like a great replacement for ECDSA. Unfortunately it’s a replacement, not a patch, since EdDSA uses different elliptic curves and is therefore not cross-compatible with existing ECDSA implementations.

    Alternatively, Thomas Pornin has a proposal up that simply modifies (EC)DSA by using HMAC to derive the nonces. The best part about Thomas’s proposal is that it doesn’t break compatibility with existing (EC)DSA implementations. I will caution you, however: while Thomas’s work looks reasonable, his proposal is just a draft (and an expired one to boot). Proceed at your own risk.

Encryption

There are various consequences to using a bad RNG for encryption, most of which depend on the scheme you’re using. Once again we’ll assume that the keys themselves are properly-generated. What’s at stake is the encryption itself.

Symmetric encryption. The good news is that symmetric encryption can be done securely with no randomness at all, provided that you have a strong encryption key and the ability to keep state between messages.

An obvious choice is to use CTR mode encryption. Since CTR mode IVs needn’t be unpredictable, you can set your initial IV to zero, then simply make sure that you always hang onto the last counter value between messages. Provided that you never ever re-use a counter value with a given key (even across system restarts) you’ll be fine.**

This doesn’t work with CBC mode, since that actually does require an unpredictable IV at the head of each chain. You can hack around this requirement in various ways, but I’m not going to talk about those here; nothing good will come of it.

Public-key encryption. Unfortunately, public-key encryption is much more difficult to get right without a good RNG.

Here’s the fundamental problem: if an attacker knows the randomness you used to produce a ciphertext, then (in the worst case) she can simply encrypt ‘guess’ messages until she obtains the same ciphertext as you. At that point she knows what you encrypted.***

Obviously this attack only works if the attacker can guess the message you encrypted. Hence it’s possible that high-entropy messages (symmetric keys, for example) will encrypt securely even without good randomness. But there’s no guarantee of this. Elgamal, for example, can fail catastrophically when you encrypt two messages with the same random nonce.****

Although I’m not going to endorse any specific public-key encryption scheme, it seems likely that some schemes will hold up better than others. For example, while predictably-randomized RSA-OAEP and RSA-OAEP+ will both be vulnerable to guessing attacks, there’s some (intuitive) reason to believe that they’ll remain secure for high-entropy messages like keys. I can’t prove this, but it seems like a better bet than using Elgamal (clearly broken) or older padding schemes like RSA-PKCS#1v1.5.

If my intuition isn’t satisfying to you, quite a lot of research is still being done in this area. See for example, recent works on deterministic public-key encryption, or hedged public-key encryption. Note that all of this work makes on the assumption that you’re encrypting high-entropy messages.

Protocols

I can’t conclude this post without at least a token discussion of how a bad RNG can affect cryptographic protocols. The short version is that it depends on the protocol. The shorter version is that it’s almost always bad.

Consider the standard TLS handshake. Both sides use their RNGs to generate nonces. Then the client generates a random ‘Pre-master Secret’ (PMS), encrypts it under the server’s key, and transmits it over the wire. The ‘Master Secret’ (and later, transport key) is derived by hashing together all of the nonces and the PMS.

Since the PMS is the only real ‘secret’ in the protocol (everything else is sent in the clear), predicting it is the same as recovering the transport key. Thus TLS is not safe to use if the client RNG is predictable. What’s interesting is that the protocol is secure (at least, against passive attackers) even if the server’s RNG fails. I can only guess that this was a deliberate choice on the part of TLS’s designers.

sy10660a
SSL handshake (source).

Protocols are already plenty exciting when you have a working RNG. Adding a bad RNG to the mix is like pouring fireworks on a fire. It’s at least possible to build protocols that are resilient to one participant losing their RNG, but it’s very tricky to accomplish — most protocols will fail in unexpected ways.

In Summary

If you take nothing else from this post, I hope it’s this: using a broken RNG is just a bad idea. If you think there’s any chance your generator will stop working, then for god’s sake, fix it! Don’t waste your time doing any of the stuff I mention above.

That said, there legitimately are cases where your RNG can go wrong, or where you don’t have one in the first place. The purpose of this post was to help you understand these scenarios, and the potential consequences for your system. So model it. Think about it. Then spend your time on better things.

Notes:

 The classic example is Debian’s 2008 OpenSSL release, which used a 15-bit process ID as the only seed for its PRG. This wasn’t obvious during testing, since the 32,768 possible RNG streams all looked pretty random. It was only after the public release that people noticed that many devices were sharing TLS keys.

** If you’re going to do this, you should also be sure to use a MAC on your ciphertext, including the initial counter value for each message.

*** A great example is unpadded, textbook RSA. If m is random, then it’s quite difficult to recover m given m^e mod N. If, however, you have a few good guesses for m and you know the public key (N, e), you can easily try each of your guesses and compare the results.

**** Given two Elgamal ciphertexts on the same key and randomness (g^r, y^r*M1), (g^r, y^r*M2) you can easily compute M1/M2. A similar thing happens with hash Elgamal. This may or may not be useful to you, depending on how much you know about the content of the various messages.

A brief update

My early-week post on the MITM certificate mess seems to have struck a nerve with readers. (Or perhaps I just picked the right time to complain!) Since folks seem interested in this subject, I wanted to follow up with a few quick updates:

  • The EFF has released a new version of HTTPS Everywhere, which includes a nifty ‘Decentralized SSL Observatory’ feature. This scans for unusual certificates (e.g., MITM certs, certs with weak keys) and reports them back to EFF for logging. A very nice step towards a better ‘net.
  • StalkR reminds me that Chrome 18 includes support for Public-key Pinning. This is an HTTP extension that allows a site operator to ‘pin’ their site to one (or more) pre-specified public keys for a given period of time. A pinned browser will reject any alternative keys that show up — even if they’re embedded in a valid certificate.
  • A couple of readers point out that popular sites (e.g., Google and Facebook) change their certificates quite frequently — possibly due to the use of load balancers — which poses a problem for “carry a list of legitimate certs with you” solutions. I recognize this. The best I can say is that we’re all better off if bogus certs are easy to detect. Hopefully site operators will find a compromise that makes this easy for us.

Appearances to the contrary, this blog is not going to become a forum for complaining about CAs. I’ll be back in a few days with more wonky crypto posts, including some ideas for dealing with bad randomness, some thoughts on patented modes of operation, and an update on the progress that researchers are making with Fully-Homomorphic Encryption.

The Internet is broken: could we please fix it?

Ok, this is a little embarrassing and I hate having to admit it publicly. But I can’t hold it in any longer: I think I’m becoming an Internet activist.

This is upsetting to me, since active is the last thing I ever thought I’d be. I have friends who live to make trouble for big corporations on the Internet, and while I admire their chutzpah (and results!), they’ve always made me a little embarrassed. Even when I agree with their cause, I still have an urge to follow along, cleaning up the mess and apologizing on behalf of all the ‘reasonable’ folks on the Internet.

But every man has a breaking point, and the proximate cause of mine is Trustwave. Or rather, the news that Trustwave — an important CA and pillar of the Internet — took it upon themselves to sell a subordinate root cert to some (still unknown) client, for the purposes of undermining the trust assumptions that make the Internet secure eavesdropping on TLS connections.

This kind of behavior is absolutely, unquestionably out of bounds for a trusted CA, and certainly deserves a response — a stronger one than it’s gotten. But the really frightening news is twofold:

  1. There’s reason to believe that other (possibly bigger) CAs are engaged in the same practice.
  2. To the best of my knowledge, only one browser vendor has taken a public stand on this issue, and that vendor isn’t gaining market share.

The good news is that the MITM revelation is exactly the sort of kick we’ve needed to improve the CA system. And even better, some very bright people are already thinking about it. The rest of this post will review the problem and talk about some of the potential solutions.

Certificates 101

For those of you who know the TLS protocol (and how certificates work), the following explanation is completely gratuitous. Feel free to skip it. If you don’t know — or don’t understand the problem — I’m going to take a minute to give some quick background.

TLS (formerly SSL) is probably the best-known security protocol on the Internet. Most people are familiar with TLS for its use in https — secure web — but it’s also used to protect email in transit, software updates, and a whole mess of other stuff you don’t even think about.

TLS protects your traffic by encrypting it with a strong symmetric key algorithm like AES or RC4. Unfortunately, this type of cryptography only works when the communicating parties share a key. Since you probably don’t share keys with most of the web servers on the Internet, TLS provides you with a wonderful means to do so: a public-key key agreement protocol.

I could spend a lot of time talking about this, but for our purposes, all you need to understand is this: when I visit https://gmail.com, Google’s server will send me a public key. If this key really belongs to Google, then everything is great: we can both derive a secure communication key, even if our attacker Mallory is eavesdropping on the whole conversation.

If, on the other hand, Mallory can intercept and modify our communications, the game is very different. In this case, she can overwrite Gmail’s key with her own public key. The result: I end up sharing a symmetric key with her! The worst part is that I probably won’t know this has happened: clever Mallory can make her own connection to Gmail and silently pass my traffic through — while reading every word. This scenario is called a Man in the Middle (MITM) Attack.

Man_in_the_middle_attack.svg
MITM attack. Alice is your grandfather, Bob is BankofAmerica.com, and Mallory establishes connections with both. (Wikipedia/CC license)

MITM attacks are older than the hills. Fortunately TLS has built-in protections to thwart them. Instead of transmitting a naked public key, the Gmail server wraps its key in a certificate; this is a simple file that embeds both the key and some identifying information, like “gmail.com”. The certificate is digitally signed by someone very trustworthy: one of a few dozen Certificate Authorities (CA) that your browser knows and trusts. These include companies like Verisign, and (yes) Trustwave.

TLS clients (e.g., web browsers) carry the verification keys for a huge number of CAs. When a certificate comes in, they can verify its signature to ensure that it’s legit. This approach works very well, under one very important assumption: namely, Mallory won’t be able to get a signed certificate on a domain she doesn’t own.

What’s wrong with the CA model?

The real problem with the CA model is that every root CA has the power to sign any domain, which completely unravels the security of TLS. So far the industry has policed itself using the Macaroni Grill model: If a CA screws up too badly, they face being removed from the ‘trusted’ list of major TLS clients. In principle this should keep people in line, since it’s the nuclear option for a CA — essentially shutting down their business.

Unfortunately while this sounds good it’s tricky to implement in practice. That’s because:

  1. It assumes that browser vendors are willing to go nuclear on their colleagues at the CAs.
  2. It assumes that browser vendors can go nuclear on a major CA, knowing that the blowback might very well hurt their product. (Imagine that your browser unilaterally stopped accepting Verisign certs. What would you do?)
  3. It assumes that someone will catch misbehaving CAs in the first place.

What’s fascinating about the Trustwave brouhaha is that it’s finally giving us some visibility into how well these assumptions play out in the real world.

So what happened with Trustwave?

In late January of this year, Trustwave made a cryptic update to their CA policy. When people started asking about it, they responded with a carefully-worded post on the company blog. When you cut through the business-speak, here’s what it says:

We sold the right to generate certificates — on any domain name, regardless of whether it belongs to one of our clients or not — and packed this magical capability into a box. We rented this box to a corporate client for the express purpose of running Man-in-the-Middle attacks to eavesdrop on their employees’ TLS-secured connections. At no point did we stop to consider how damaging this kind of practice was, nor did we worry unduly about its potential impact on our business — since quite frankly, we didn’t believe it would have any.

I don’t know which part is worse. That a company whose entire business is based on trust — on the idea that people will believe them when they say a certificate is legit — would think they could get away with selling a tool to make fraudulent certificates. Or that they’re probably right.

But this isn’t the worst of it. There’s reason to believe that Trustwave isn’t alone in this practice. In fact, if we’re to believe the rumors, Trustwave is only noteworthy in that they stopped. Other CAs may still be up to their ears.

And so this finally brings us to the important part of this post: what’s being done, and what can we do to make sure that it never happens again?

Option 1: Rely on the browser vendors

What’s particularly disturbing about the Trustwave fiasco is the response it’s gotten from the various browser manufacturers.

So far exactly one organization has taken a strong stand against this practice. The Mozilla foundation (makers of Firefox) recently sent a strongly-worded letter to all of their root CAs — demanding that they disclose whether such MITM certificates exist, and that they shut them down forthwith. With about 20% browser share (depending on who’s counting), Mozilla has the means to enforce this. Assuming the vendors are honest, and assuming Mozilla carries through on its promise. And assuming that Mozilla browser-share doesn’t fall any further.

That’s the good news. Less cheerful is the deafening silence from Apple, Microsoft and Google. These vendors control most of the remaining browser market, and to the best of my knowledge they’ve said nothing at all about the practice. Publicly, anyway. It’s possible that they’re working the issue privately; if so, more power to them. But in the absence of some evidence, I find it hard to take this on faith.

Option 2: Sunlight is the best disinfectant

The Trustwave fiasco exposes two basic problems with the CA model: (1) any CA can claim ownership of any domain, and (2) there’s no easy way to know which domains a CA has put its stamp on.

This last is very much by CA preference: CAs don’t want to reveal their doings, on the theory that it would harm their business. I can see where they’re coming from (especially if their business includes selling MITM certs!) Unfortunately, allowing CAs to operate without oversight is one of those quaint practices (like clicking on links sent by strangers) that made sense in a more innocent time, but no longer has much of a place in our world.

Hash_Tree.svg
Merkle tree (Wikipedia/CC)

Ben Laurie and Adam Langley feel the same way, and they’ve developed a plan to do something about it. The basic idea is this:

  1. Every new certificate should be published in a public audit log. This log will be open to the world, which means that everyone can scan for illegal entries (i.e., their own domain appearing in somebody else’s certificate.)
  2. Anytime a web server hands out a certificate, it must prove that the certificate is contained in the list.

The beautiful thing is that this proof can be conducted relatively efficiently using a Merkle hash tree. The resulting proofs are quite short (log(N) hashes, where N is the total number of certificates). Browsers will need to obtain the current tree root, which requires either (a) periodic scanning of the tree, or some degree of trust in an authority, who will periodically distribute signed root nodes.

Along the same lines, the EFF has a similar proposal called the Sovereign Keys Project. SKP also proposes a public log, but places stronger requirements on what it takes to get into the log. It’s quite likely that in the long run these projects will merge, or give birth to something even better.

Option 3: Eternal vigilance

The problem with SKP and the Laurie/Langley proposal is that both require changes to the CA infrastructure. Someone will need to construct these audit logs; servers will have to start shipping hash proofs. Both can be incrementally deployed, but will only be effective once deployment reaches a certain level.

Another option is to dispense with this machinery altogether, and deal with rogue CAs today by subjecting them to contant, unwavering surveillance. This is the approach taken by CMU’s Perspectives plugin and by Moxie Marlinspike’s Convergence.

The core idea behind both of these systems is to use ‘network perspectives’ to determine whether the certificate you’re receiving is the same certificate that everyone else is. This helps to avoid MITMs, since presumably the attacker can only be in the ‘middle’ of so many network paths. To accomplish this, both systems deploy servers called Notaries — run on a volunteer basis — which you can call up whenever you receive an unknown certificate. They’ll compare your version of the cert to what they see from the same server, and help you ring the alarm if there’s a mismatch.

A limitation of this approach is privacy; these Notary servers obviously learn quite a bit about the sites you visit. Convergence extends the Perspectives plugin to address some of these issues, but fundamentally there’s no free lunch here. If you’re querying some external party, you’re leaking information.

One solution to this problem is to dispense with online notary queries altogether, and just ask people to carry a list of legitimate certificates with them. If we assume that there are 4 million active certificates in the world, we could easily fit them into a < 40MB Bloom filter. This would allow us to determine whether a cert is ‘on the list’ without making an online query. Of course, this requires someone to compile and maintain such a list. Fortunately there are folks already doing this, including the EFF’s SSL Observatory project.

Option 4: The hypothetical

The existence of these proposals is definitely heartening. It means that people are taking this seriously, and there’s an active technical discussion on how to make things better.

Since we’re in this mode, let me mention a few other things that could make a big difference in detecting exploits. For one thing, it would be awfully nice if web servers had a way to see things through their clients’ eyes. One obvious way to do this is through script: use Javascript to view the current server certificate, and report the details back to the server.

Of course this isn’t perfect — a clever MITM could strip the Javascript or tamper with it. Still, obfuscation is a heck of a lot easier then de-obfuscation, and it’s unlikely that a single attacker is going to win an arms race against a variety of sites.

Unfortunately, this idea has to be relegated to the ‘could be, should be’ dustbin, mostly because Javascript doesn’t have access to the current certificate info. I don’t really see the reason for this, and I sure hope that it changes in the future.

Option 5: The long arm of the law

I suppose the last option — perhaps the least popular — is just to treat CAs the same way that you’d treat any important, trustworthy organization in the real world. That means: you cheat, you pay the penalty. Just as we shouldn’t tolerate Bank of America knowingly opening a credit line in the name of a non-customer, we shouldn’t tolerate a CA doing the same.

Option 6: Vigilante justice

Ok, I’m only kidding about this one, cowboy. You can shut down that LOIC download right now.

In summary

I don’t know that there’s a magical vaccine that will make the the CA system secure, but I’ve come to believe that the current approach is not working. It’s not just examples like Trustwave, which (some might argue) is a relatively limited type of abuse. It’s that the Trustwave revelation comes in addition to a steady drumbeat of news about stolen keys, illegitimately-obtained certificates, and various other abuses.

While dealing with these problems might not be easy, what’s shocking is how easy it would be to at least detect and expose the abuses at the core of it — if various people agreed that this was a worthy goal. I do hope that people start taking this stuff seriously, mostly because being a radical is hard, hard work. I’m just not cut out for it.

Random number generation: An illustrated primer

Last week we learned (from two different sources!) that certain RSA implementations don’t properly seed their random number generators before generating keys. One practical upshot is that a non-trivial fraction of RSA moduli share a prime factor. Given two such moduli, you can easily factor both.

This key generation kerfuffle is just the tip of the iceberg: a lot of bad things can happen when you use a weak, or improperly-seeded RNG. To name a few:

  • Re-using randomness with (EC)DSA can lead to key recovery.
  • Re-using randomness with Elgamal can lead to plaintext recovery and other ugliness.
  • Using predictable IVs in CBC or CTR mode encryption can lead to plaintext recovery.
  • When protocols use predictable nonces they may become vulnerable to e.g., replay attacks.

In the rest of this post I’m going to talk about the various ways that random number generators work, the difference between RNGs and PRGs, and some of the funny problems with both. Since the post has gotten horrifically long, I’ve decided to present it in a (fun!) question/answer style that makes it easy to read in any order you want. Please feel free to skip around.

What’s the difference between Randomness, Pseudo-Randomness and Entropy?

Before we get started, we have to define a few of our terms. The fact is, there are many, many definitions of randomness. Since for our purposes we’re basically interested in random bit generators, I’m going to give a workaday definition: with a truly random bit generator, nobody (regardless of what information or computing power they have) can predict the next output bit with probability greater than 1/2.

If we lived in an orderly universe, it would be hard to build generators that meet this standard. Fortunately, the universe we live in seems to be anything but orderly. Physicists tell us that at the quantum level certain events have measurable probabilities, but otherwise cannot be predicted in advance.
random
A hardware RNG.

The most expensive hardware RNGs take advantage of this, measuring such phenomena as radioactive decay or shot noise. Most consumer-grade RNGs don’t have radioactive particles lying around, so they instead measure macroscopic, but chaotic phenomena — typically highly-amplified electrical noise.

These devices are great if you’ve got ’em; unfortunately not everyone does. For the rest of us, the solution is to collect unpredictable values from the computer we’re working on. While this gunk may not be truly random, we hope that it has sufficient entropy — essentially a measure of unpredictability — that our attacker won’t know the difference.

If you’re using a standard PC, your system is probably filling its entropy pool right now: from unpredictable values such as drive seek or inter-keystroke timings. Taken individually none of these events provide enough entropy to do much; but by ‘stirring’ many such measurements together you can obtain enough to do useful cryptography.

Random vs. Pseudorandom. The big problem with RNGs is that they’re usually pretty inefficient. Hardware RNGs can only collect so many bits per second, and the standard OS entropy measurement techniques are even slower. For this reason, many security systems don’t actually use this entropy directly. Instead, they use it to seed a fast cryptographically-secure pseudo-random generator, sometimes called a CSPRNG or (to cryptographers) just a PRG.

PRGs don’t generate random numbers at all. Rather, they’re algorithms that take in a short random string (‘seed’), and stretch it into a long sequence of random-looking bits. Since PRGs are deterministic and computational in nature, they obviously don’t satisfy our definition of randomness (a sufficiently powerful attacker can simply brute-force her way through the seed-space.) But if our attackers are normal (i.e., computationally limited) it’s possible to build unpredictable PRGs from fairly standard assumptions.*

Combining RNGs and PRGs. As I said, most systems combine an RNG with a PRG, using the former to generate a seed for the latter. Some standards actually mandate this combination — not just because it’s faster, but because the additional layer of PRG is believed to offer some resilience in the event that the RNG contains a hardware flaw.

You can argue about whether this is a good idea, but the upshot is as follows: if you want to understand where ‘random’ numbers come from, you really need to understand both technologies and how they interoperate on your machine.

Where does my entropy come from?

Unless you’re running a server and have a fancy Hardware Security Module installed, chances are that your system is collecting entropy from the world around it. Most OSes do this at the kernel level, using a variety of entropy sources which are then ‘stirred’ together. These include:

  • Drive seek timings. Modern hard drives (of the spinning variety) are a wonderful source of chaotic events. In 1994 Davis, Ihaka and Fenstermacher argued that drive seek times are affected by air turbulence within the drive’s enclosure, which makes them an excellent candidate for cryptographic entropy sampling. It’s not clear how this technique holds up against solid-state drives; probably not well.
  • Mouse and keyboard interaction. People are unpredictable. Fortunately for us, that’s a good thing. Many RNGs collect entropy by measuring the time between a user’s keystrokes or mouse movements, then gathering a couple of low-order bits and adding them to the pool.
  • Network events. Although network events (packet timings, for example) seem pretty unpredictable, most systems won’t use this data unless you explicitly tell them to. That’s because the network is generally assumed to be under the adversary’s control (he may be the one sending you those ‘unpredictable’ packets!) You disable these protections at your own risk.
  • Uninitialized memory. Ever forget to initialize a variable? Then you know that RAM is full of junk. While this stuff may not be random, certain systems use it on the theory that it probably can’t hurt. Occasionally it can — though not necessarily in the way you’d think. The classic example is this Debian OpenSSL bug, which (via a comedy of errors) meant that the PRG had only 32,768 possible seed values.
  • Goofy stuff. Some systems will try to collect entropy by conducting unpredictable calculations. One example is to start many threads counting towards infinity, then stop one with a hardware interrupt. I’ve done this once before and evaluated the output. I assure you that YMMV. Significantly.
  • Trusted Platform Module. Many desktop machines these days include a TPM chip on the motherboard. The good news about this is that every TPM contains an internal hardware RNG, which your OS can access if it has the right drivers. It ain’t fast, and the design hasn’t been publicly audited. Still, folding some of this into your entropy pool is probably a good idea.
  • New processor RNGs. To save us all this trouble, the next generation of Intel processors will contain a built-in hardware RNG/PRG, which goes by the codename ‘Bull Mountain’. Perhaps this will be the solution to all of our problems. (h/t David Johnston in comments.)

The upshot of all of this is that on a typical machine there’s usually enough ‘unpredictable’ stuff going on to seed a decent entropy pool. The real problems come up in systems that aren’t typical.

What about VMs and embedded devices?

Life inside an embedded device.

The problem with classical entropy gathering is that it assumes that unpredictable things will actually happen on the system. Unfortunately, VMs and embedded devices defy this expectation, mostly by being very, very boring.

Imagine the following scenario: you have a VM instance running on a server. It has no access to keyboard or mouse input, and only mediated access to hardware, which it shares with eight other VM instances.

Worse yet, your VM may be a clone. Perhaps you just burped up fifty instances of that particular image from a ‘frozen’ state. Each of these VMs may have loads of entropy in its pool, but it’s all the same entropy, across every clone sibling. Whether this is a problem depends on what the VM does next. If it has enough time to replenish its entropy pool, the state of the VMs will gradually diverge. But if it decides to generate a key: not good at all.

Embedded devices present their own class of problems. Unfortunately (like every other problem in the embedded arena) there’s no general solution. Some people obtain entropy from user keypad timings — if there is a user and a keypad. Some use the low-order bits of the ADC output. Still others forgo this entirely and ship their devices with an externally-generated PRG seed, usually stored in NVRAM.

I don’t claim that any of these are good answers, but they’re better than the alternative — which is to pretend that you have entropy when you don’t.

How do pseudo-random number generators work?

You’ve read the books. You’ve seen the movies. But when it comes down to it you still don’t understand the inner workings of the typical pseudo-random number generator. I can’t possibly make up for this in a single blog post, but hopefully I can hit a few of the high points.

Block cipher-based PRGs. One common approach to PRG construction uses a block cipher to generate unpredictable bits. This seems like a reasonable choice, since modern block ciphers are judged for their quality as pseudo-random permutations, and because most crypto libraries already have one lying around somewhere.

ANSI X9.31 PRNG implemented with AES (source). At each iteration, the PRNG takes in a predictable ‘date-time vector’ (DTi) and updated state value (Si). It outputs a block of random bits Ri. The generator is seeded with a cipher key (k) and an initial state S0.

One inexplicably popular design comes from ANSI X9.31. This PRG is blessed by both ANSI and FIPS, and gets used in a lot of commercial products (OpenSSL also uses it in FIPS mode). It takes in two seeds, k and S0 and does pretty much what you’d expect, on two conditions: you seed both values, and you never, ever reveal k.

If k does leak out, things can get ugly. With knowledge of k your attacker can calculate every previous and future PRG output from one single block of output!** This is totally gratuitous, and makes you wonder why this particular design was ever chosen — much less promoted.

Before you dismiss this as a theoretical concern: people routinely make stupid mistakes with X9.31. For example, an early draft of the AACS standard proposed to share one k across many different devices! Moreover keys do get stolen, and when this happens to your RNG you risk compromising every previous transaction on the system — even supposedly ‘forward-secure’ ones like ephemeral ECDH key exchanges. You can mitigate this by reseeding k periodically.

Hash-based PRGs. Many PRGs do something similar, but using hash functions instead of ciphers. There are some good arguments for this: hash functions are very fast, plus they’re hard to invert — which can help to prevent rewinding attacks on PRG state. Since there are zillions of hash-based PRGs I’ll restrict this discussion to a few of the most common ones:

  1. FIPS 186-2 (Appendix 3) defines a SHA-based generator that seems to be all the rage, despite the fact that it was nominally defined only for DSA signing. Windows uses this as its default PRG.
  2. Linux uses a hash-based PRG based on two variants of SHA.
  3. The non-FIPS OpenSSL PRG also uses a hash-based design. Like everything else in OpenSSL, it’s clearly documented and follows standard, well-articulated design principles.
Left: the Linux PRG (circa 2006). Right: the non-FIPS OpenSSL PRG.

Number-theoretic PRGs. The problem with basing a PRG on, say, a hash function is it makes you dependent on the security of that primitive. If a the hash turns out to be vulnerable, then your PRG could be as well.*** (Admittedly, if this happens to a standard hash function, the security of your PRG may be the least of your concerns.)

One alternative is to use a PRG that relies on well-studied mathematical assumptions for its security. Usually, you pay a heavy cost for this hypothetical benefit — these generators can be 2-3 orders of magnitude slower than their hash-based cousins. Still, if you’re down for this you have various choices. An oldie (but goodie) is Blum-Blum-Shub, which is provably secure under the factoring assumption.

If you like standards, NIST also has a proposal called Dual-EC-DRBG. Dual-EC is particularly fascinating, for the following three reasons. First, it’s built into Windows, which probably makes it the most widely deployed number-theoretic PRG in existence. Second, it’s slightly biased, due to a ‘mistake’ in the way that NIST converted EC points into bits.**** Also, it might contain a backdoor.

This last was pointed out by Shumow and Ferguson at the Crypto 2007 rump session. They noticed that the standard parameters given with Dual-EC could easily hide a trapdoor. Anyone who knew this value would be able to calculate all future outputs of the PRG after seeing only a 32-byte chunk of its output! Although there’s probably no conspiracy here, NSA’s complicity in designing the thing doesn’t make anyone feel better about it.

shrinking
Shrinking generator.

The rest. There are many dedicated PRG constructions that don’t fit into the categories above. These include stream ciphers like RC4, not to mention a host of crazy LFSR-based things. All I can say is: if you’re going to use something nonstandard, please make sure you have a good reason.

How much entropy do I need?

The general recommendation is that you need to seed your PRG with at least as much entropy as the security level of your algorithms. If you’re generating 1024-bit RSA keys, the naive theory tells you that you need at least 80 bits of entropy, since this is the level of security provided by RSA at that key size.

In practice you need more, possibly as much as twice the security level, depending on your PRG. The problem is that many PRNGs have an upper bound on the seed size, which means they can’t practically achieve levels higher than, say, 256 bits. This is important to recognize, but it’s probably not of any immediate practical consequence.

I don’t care about any of this, just tell me how to get good random numbers on my Linux/Windows/BSD system!

The good news for you is that modern operating systems and (non-embedded) hardware provide most of what you need, meaning that you’re free to remain blissfully ignorant.

On most Unix systems you can get decent random numbers by reading from /dev/random and /dev/urandom devices. The former draws entropy from a variety of system sources and hashes it together, while the latter is essentially a PRG that seeds itself from the system’s entropy pool. Windows can provide you with essentially the same thing via the CryptoAPI (CAPI)’s CryptGenRandom call.

Care must be taken in each of these cases, particularly as your application is now dependent on something you don’t control. Many cryptographic libraries (e.g., OpenSSL) will run their own internal PRG, which they seed from sources like the above.

I’ve designed my own PRG. Is this a good idea?

Maybe. But to be completely honest, it probably isn’t.

If I seed my PRG properly, is it safe to use RSA again?

Yes. Despite the title of the recent Lenstra et al. paper, there’s nothing wrong with RSA. What seems to have happened is that some embedded systems didn’t properly seed their (P)RNGs before generating keys.

I’m sure there’s more to it than that, but at a high level: if you make sure to properly seed your PRG, the probability that you’ll repeat a prime is negligibly small. In other words, don’t sweat it.

Notes:

* The security definition for a PRG is simple: no (computationally limited) adversary should be able to distinguish the output of a PRG from a sequence of ‘true’ random numbers, except with a negligible probability. An equivalent definition is the ‘next bit test’, which holds that no adversary can predict the next bit output by a PRG with probability substantially different from 1/2.

** Decrypting Ri gives you (Si XOR Ii), and decrypting DTi gives you Ii. You can now calculate Si by XORing the results. If you know DT{i-1} you can now compute R{i-1} and start the process over again. This was first noted by Kelsey, Schneier, Wagner and Hall in the context of an early version (X9.17). It works even if you only have a rough guess for the timestamp values — a pretty reasonable assumption, since some implementations specify a counter for the DT values.

*** It’s also important to be clear what security properties you’re relying on with a hash-based PRG. Most of the high-profile attacks on hash functions (e.g., MD5) focus on finding collisions; they’re not attacks on the pseudo-random nature of the outputs. In practice, this means you usually get lots of warning before a hash function becomes unsuitable for use in a PRG. Or maybe you won’t! Fun stuff.

**** Dual-EC is another fun example of NIST developing provably-secure looking protocols, but not actually including a security proof. This is particularly bizarre, because the only conceivable reason to use something as slow as Dual-EC is to gain this level of provable security. The generator is divided into two parts: the first generates pseudo-random EC points (this part is provable under the DDH assumption). The other part turns these points into bits. It’s the latter part that has the biasing flaw. Amusingly, the potential ‘backdoor’ wouldn’t be possible if the designers had built this part differently.

RSA keys: no insight whatsoever

I have a deadline coming up so (substantial) posting will be light this week.

For those of you who don’t read the New York Times, the big story of the week is this paper by Lenstra, Hughes, Augier, Bos, Kleinjung and Wachterlet:

Ron was wrong, Whit is right

We performed a sanity check of public keys collected on the web. Our main goal was to test the validity of the assumption that different random choices are made each time keys are generated. We found that the vast majority of public keys work as intended. A more disconcerting finding is that two out of every one thousand RSA moduli that we collected offer no security. Our conclusion is that the validity of the assumption is questionable and that generating keys in the real world for “multiple-secrets” cryptosystems such as RSA is significantly riskier than for “single-secret” ones such as ElGamal or (EC)DSA which are based on Diffie-Hellman.

Lots of people have written insightfully on this topic. See Dan Kaminsky’s post here, for example, or Thomas Ptacek’s excellent multi-part Twitter musing. (Update: much better, see Nadia Heninger’s explanation at the end of this post.)

There must be something wrong with me, because I find it almost impossible to draw any deep insight at all from this work. Don’t get me wrong: the paper itself is a fantastic piece of research; it sets a new standard for data analysis on public keys and certs. I hope we see more like it.

But what’s the takeaway? That two-key systems are insecure? That intelligence agencies have known this for years? Maybe. Whatever. The takeaway to me is that one (or more) RSA keygen implementations had a crappy RNG, or didn’t properly seed its PRG.

That’s really good to know about, but it isn’t the big news that the paper’s title would imply. It doesn’t have any implications for the use of RSA or any other cryptosystem. I’d sure like to solve the mystery of which implementations we need to look out for, and how to make sure this doesn’t happen again, but that’s literally the only thing I take away from this — so far.

I don’t mean to sound like a curmudgeon. Really, I want to believe. Please help me!

Update: Mystery solved! Nadia Heninger has a post at Freedom to Tinker explaining that most of these keys were generated by embedded devices, and that — through a parallel research effort — they actually know which devices. Once again extremely nice work. Even nicer than Lenstra et al., since it’s actually useful. (I can only imagine how Nadia and her team have been feeling the past two days, seeing ‘their’ result all over the New York Times. That’s responsible disclosure for you.)

SSL MITM Update

The other day I snarked about Trustwave’s decision to selltrustwavelogo subordinate root (‘skeleton’) certificates to their corporate clients, for the explicit purpose of destabilizing the web’s Public Key Infrastructure ‘legitimately’* intercepting TLS connections. This practice (new to me) is ostensibly only permitted in limited, controlled settings (usually to spy on a company’s employees).

Trustwave argues that the key was always safe inside of a Hardware Security Module and besides, they’re not doing it any more. (Kind of like saying that you handed out the master key to every door on earth but it’s ok ’cause you chained it to a hubcap.)

Unfortunately, the really bad news is that Trustwave may not be the only major CA implicated in this practice. And at least one browser vendor is planning to do something about it:

Dear Certification Authority, 
This note requests a set of immediate actions on your behalf, as a 

participant in the Mozilla root program.  

Please reply by {date 2 weeks out} to confirm completion of the 
following actions or state when these actions will be completed.  

1) Subordinate CAs chaining to CAs in Mozilla’s root program may not be 
used for MITM purposes, regardless of whether it is in a closed and 
controlled environment or not. Please review all of your subordinate CAs 
to make sure that they may not be used for MITM purposes. Any existing 
subordinate CAs that may be used for that purpose must be revoked and 
any corresponding HSMs destroyed by {date TBD}. For each subordinate CA 
that is revoked, send me: 
a) The certificate that signed the subCA. If it is a root certificate in 
NSS, then the root certificate’s subject and SHA1 fingerprint. 
b) The Serial Number of the revoked certificate. 
c) The CRL that contains the serial number of the revoked certificate. 

As a CA in Mozilla’s root program you are ultimately responsible for 
certificates issued by you and your subordinate CAs. After {date TBD} if 
it is found that a subordinate CA is being used for MITM, we will remove 
the corresponding root certificate. Based on Mozilla’s assessment, we 
may also remove any of your other root certificates, and root 
certificates from other organizations that cross-sign your certificates.  

2) Please add a statement to your CP/CPS committing that you will not 
issue a subordinate certificate that may be used for MITM or traffic 
management of domain names or IPs that the party does not legitimately 
own or control. Send me the URL to the updated document(s) and the 
impacted sections or page numbers. 

Participation in Mozilla’s root program is at our sole discretion, and 
we will take whatever steps are necessary to keep our users safe. 
Nevertheless, we believe that the best approach to safeguard that 
security is to work with CAs as partners, to foster open and frank 
communication, and to be diligent in looking for ways to improve. Thank 
you for your participation in this pursuit. 

Regards, 
Kathleen Wilson 
Module Owner of Mozilla’s CA Certificates Module 

Now I’m no bomb-thrower, but if it were up to me, {Date TBD} would be yesterday and there would be no re-entry for the CAs caught doing this kind of thing. Still, I’m glad that Mozilla is doing this, and we’re all lucky that they have the independence and browser share to force this kind of change.

But not everything is sunshine and rainbows:

  1. We have to trust that the CAs in question will respond honestly to Mozilla’s inquiry and will voluntarily exit a (presumably) lucrative business. This relies very much on the honor system, and it’s hard to presume much honor in a CA that would sell such a product in the first place.
  2. Mozilla only represents 25% of the browser share, and that seems to be falling. That’s probably enough to make the difference — today — but it’d be nice to hear something similar from Microsoft or Google.
  3. We still lack a good client-side mechanism for detecting and reporting unusual (that is: correctly signed, but inconsistent) certificates. Given the news from Trustwave, such a mechanism seems more useful than ever.

We cannot possibly have faith in the security of the Internet when CAs are willing to engage in this kind of practice — even if they do it under the most ‘carefully-controlled’ conditions.

* Whatever that means.

Trustwave announces name change: henceforth will simply be ‘Wave’

This story has making the rounds for about a week and I’m still shockedtrustwavelogo by it. Here’s the postage stamp version: at some point in the past few years, certificate authority Trustwave basically handed out their root signing capability to a third party company. But don’t worry, it’s all better now.

As with any such story, there are bits we know and bits we have to speculate about. Speculation is more fun, so let’s start there:

Once upon a time there was a company — let’s call them ACME Inc — who really didn’t trust its employees. For ACME the solution was vigilance. Constant, invasive vigilance. ACME’s IT department was given the task of intercepting every packet sent to and from the corporate network, which seemed straightforward; until someone pointed out that they could intercept all the packets they wanted, but they couldn’t necessarily read them. Especially not the ones encrypted with SSL/TLS.

Now this isn’t a killer. ACME had a few options: they could (a) block SSL/TLS at their network gateway, forcing everyone to use cleartext connections. They could (b) force their employees to use some awkward SSL proxy. If they were feeling ambitious, they could even (c) run a man-in-the-middle on every SSL connection initiated from within their corporate network. The last option would result in some awkward certificate errors, however — which would be unpleasant for web users, and downright nasty for embedded devices or headless boxes.

But really, each of these solution is just a different version of flypaper. Why catch flies with flypaper, when you can totally screw with the trust model of the Internet?

And this is where we get to the facts. A few years back ACME — or some company like ACME — approached Trustwave with this problem. Trustwave seemed like a good group to ask, since they’re one of the select few companies that make SSL certificates, i.e., they’re one of the ‘authorities’ whose root certs are built into all of the major browsers and OSes.

Somehow the two companies cooked up the following plan. Trustwave would generate a new ‘subordinate root’ certificate with full signing authority. Anyone who possessed the signing key for this cert would essentially be Trustwave — meaning that they could vouch for any website they wanted. Of course, such a key would be enormously valuable (and dangerous). No responsible CA would allow such a thing to leave their facilities.

But apparently Trustwave’s motto is ‘think different’. So they cheerfully packed the signing key into a Hardware Security Module and sent it over to ACME. From that point on, ACME possessed the ability to transparently impersonate any SSL website on the Internet.

And impersonate they did; whenever some client initiated an SSL connection from within ACME’s corporate network, an ACME server would intercept the connection, sign a fresh certificate on the requested domain, then deliver that cert back to the client. To the client, it appeared that the connection went through perfectly. But of course the client was now talking to ACME’s server, not to the company whose name was on the certificate. ACME would in turn connect on to the target SSL server, thus completing the connection.

Technically this tampering wasn’t totally invisible; a clever user might notice that every certificate was now signed by Trustwave — and comparison with certificates received outside of ACME’s network would clearly reveal something funny going on. But since the vast majority of web users don’t check this kind of thing, the interception was basically transparent.

Now I hope I don’t need to tell you why this is a bad idea. Let’s just take it a a given that this is a bad idea. Even Trustwave now realizes it’s a bad idea, and have ‘proactively’ revoked the cert to make sure the evil thing doesn’t fall into the wrong hands. From their blog post about it:

Trustwave has decided to be open about this decision as well as stating that we will no longer enable systems of this type and are effectively ending this short journey into this type of offering.

I guess we can at least be thankful that Trustwave has decided to be open about this decision, despite the fact that they weren’t open about it while it was happening. Let’s all hope this is really the last journey Trustwave plans to take into this type of offering, where by ‘offering’ I mean — disastrous, short-sighted mistake.

Satellite phone encryption is terrible. Anyone surprised?

I adhere to a ‘one post, one topic’ rule on this blog, which means that this weekend I actually have to choose which bad-crypto news I’m going to blog about.

It’s a tough call, but the most interesting story comes via Erik Tews, who recently attended a talk on satellite phone security at Ruhr Universität Bochum. It seems that researchers Benedikt Driessen, Ralf Hund, Carsten Willems, Christof Paar, and Thorsten Holz have reverse-engineered and cryptanalyzed the proprietary ciphers used in the GMR-1 and GMR-2 satellite telephone standards.* If you’ve never heard of these standards, what you need to know is that they power the networks of satphone providers Thuraya and Inmarsat.

The verdict? Encrypting with these ciphers is better than using no encryption. But not necessarily by much.

I guess this shouldn’t come as a big shock — link privacy in mobile telephony has always been kind of a mess. And the GMR ciphers come from the same folks (ETSI) who brought us the A5-series GSM ciphers. If you pay attention to this sort of thing, you probably know that those ciphers have also had some problems. In fact, today it’s possible to download rainbow tables that permit (efficient) decryption of A5/1-encrypted GSM phone calls.

A5/1 is actually the strong member of the GSM family. For export purposes there’s A5/2 — a weakened version with a much shorter key. You don’t hear about people downloading huge A5/2 rainbow tables, mostly because you don’t need them. A5/2 is vulnerable to ciphertext-only attacks that run in a few minutes on a standard PC.

A5/2 GSM cipher. Image: Barkan, Biham, Keller.

ETSI seems to have had A5/2 in mind when developing the GMR-1 and GMR-2 ciphers. Both are custom designs, use short keys, and depend heavily on obscurity of design to make up for any shortcomings (the ciphers are only given to manufacturers who sign an NDA). This secrecy hardly inspires confidence, and worse yet, it doesn’t even do a good job of keeping things secret. The R.U.B. researchers didn’t have to break into Thuraya’s hardware lab; they simply reversed the ciphers from handset firmware updates.**

GMR-1 uses an LFSR-based cipher quite similar to A5/2 (pictured above), which means that it’s vulnerable to a similar class of attacks. Since the underlying plaintext has correctness checks built into it, it’s possible to recover the key using only ciphertext and about 30 minutes on a standard PC. The GMR-2 cipher is a bit more sophisticated (and weirder to boot), but it also appears to have weaknesses.

So why is this a big deal? The obvious answer is that satellite telephone security matters. In many underdeveloped rural areas it’s the primary means of communicating with the outside world. Satphone coverage is also important in war zones, where signal privacy is of more than academic interest.

Moreover, eavesdropping on satellite communications is (in principle) easier than eavesdropping on cellular signals. That’s because satellite ‘spot beams’ cover relatively broad geographic territories (Thuraya’s are 600km on average). So you don’t just have to worry about eavesdropping by your neighbor, you have to worry about eavesdropping by neighboring countries.

The really sad thing is that, unlike cellular networks — which are fundamentally vulnerable to government eavesdropping at the infrastructure level — satellite networks like Thuraya/Inmarsat don’t need local infrastructure. That means their systems really could have provided privacy for individuals persecuted by oppressive regimes. You can argue about whether the manufacturers even had the option to use strong ciphers; it’s quite possible they didn’t. Still, I suspect this will be cold comfort to those who suffer as a direct result of ETSI’s design choices.

Those who are really in the know (news organizations, for example) claim to use additional security measures beyond the built-in link encryption found in GMR-1 and GMR-2. Presumably these days the best way to do that is to run your own voice protocol via the packet data extensions. This practice ought to become more common going forward; now that the GMR-1 code is public, it looks like the barriers to eavesdropping are going to go down quite a bit.

The slides above come from this presentation.

Notes:

* Update 2/16/2012: I had some initial confusion about the authorship on this work, but the research paper clears it all up: see here.

** And by ‘simply’, I mean ‘with great expertise and difficulty’ — don’t read this as trivializing the effort involved. Obtaining the ciphers meant disassembling code written in a proprietary DSP instruction set, and then searching for a cipher without knowing exactly what it looks like. All in all a pretty significant accomplishment. The point here is that it could have been a lot harder. If you’re going to keep a cipher secret, you shouldn’t release it as software in the first place.

Multiple encryption

 

Not everything combines well.

While browsing some community websites, I noticed a few people talking about the security of double (or more generally, multiple) encryption. Multiple encryption addresses the following problem: you have two (or more) encryption schemes, and you’re worried that one of them might get compromised. Surely if you encrypt with both at the same time you’ll buy yourself an added safety margin.

 

Let me preface this by saying that multiple encryption addresses a problem that mostly doesn’t exist. Modern ciphers rarely get broken — at least, not in the Swordfish sense. You’re far more likely to get hit by malware or an implementation bug than you are to suffer from a catastrophic attack on AES.*

That said, you really are likely to get hit by malware or an implementation bug. And that’s at least one argument for multiple encryption — if you’re willing to encrypt on separate, heterogenous devices.** There’s also the future to think about. We feel good about AES today, but how will we feel in 2040?

I note that these are problems for the extremely paranoid — governments, mostly — not for the typical developer. The majority of us should work on getting single encryption right. But this kind of thing isn’t ridiculous — the NESSIE standards even recommend it. Moreover, my experience is that when people start asking questions about the security of X, it means that they’re already doing X, and have been for some time.

So for all that, it’s worth answering some of these questions. And roughly speaking, the questions are:

  1. Am I better off encrypting with two or more encryption schemes (or keys?)
  2. Could I be worse off?
  3. If I have to do it, how should I do it securely?
Given how little sleep I’ve gotten recently I don’t promise to answer these fully, or in any particular order. But I do hope I can provide a little bit of insight around the edges.

Preliminaries

There are many ways to double encrypt, but for most people ‘double encryption’ means this:

 

SuperDuperEncrypt(KA, KB, M) = EncryptA(KA, EncryptB(KB, M))

This construction is called a cascade. Sometimes EncryptA and EncryptB are different algorithms, but that’s not really critical. What does matter for our purposes is that the keys KA and KB are independently-generated.*** (To make life easier, we’ll also assume that the algorithms are published.)A lot has been written about cascade encryption, some good and some bad. The answer to the question largely depends on whether the algorithms are simply block ciphers, or if they’re true encryption algorithms (e.g., a mode of operation using a block cipher). It also depends on what security definition you’re trying to achieve.

The good

Let’s consider the positive results first. If either EncryptA or EncryptB is ‘semantically secure’, i.e., indistinguishable under chosen-plaintext attack, then so is the cascade of the two. This may seem wonky, but it’s actually very handy — since many common cryptosystems are specifically analyzed under (at least) this level of security. For example, in the symmetric setting, both CBC and CTR modes of operation can both be shown to achieve this security level, provided that they’re implemented with a secure block cipher.

So how do we know the combined construction is secure? A formal proof can be found in this 2002 paper by Herzberg, but the intuition is pretty simple. If there’s an attack algorithm that ‘breaks’ the combined construction, then we can use that algorithm to attack either of the two underlying algorithms by simply picking our own key for the other algorithm and simulating the double encryption on its ciphertexts.

This means that an attack on the combination is an attack on the underlying schemes. So if one is secure, you’re in good shape.

The not-so-good

Interestingly, Herzberg also shows that the above result does not apply for all definitions of security, particularly strong definitions such as adaptive-chosen ciphertext security. In the symmetric world, we usually achieve this level of security using authenticated encryption.

To give a concrete (symmetric encryption) example, imagine that the inner layer of encryption (EncryptB) is authenticated, as is the case in GCM-mode. Authenticated encryption provides both confidentiality (attackers can’t read your message) and authenticity (attackers can’t tamper with your message — or change the ciphertext in any way.)

Now imagine that the outer scheme (EncryptAdoesn’t provide this guarantee. For a simple example, consider CBC-mode encryption with padding at the end. CBC-mode is well known for its malleability; attackers can flip bits in a ciphertext, which causes predictable changes to the underlying plaintext.

The combined scheme still provides some authenticity protections — if the attacker’s tampering affects the inner (GCM) ciphertext, then his changes should be detected (and rejected) upon combined decryption. But if his modifications only change the CBC-mode padding, then the combined ciphertext could be accepted as valid. Hence the combined scheme is ‘benignly’ malleable, making it technically weaker than the inner layer of encryption.

Do you care about this? Maybe, maybe not. Some protocols really do require a completely non-malleable ciphertext — for example, to prevent replay attacks — but in most applications these attacks aren’t world-shattering. If you do care, you can find some alternative constructions here.

The ugly

Of course, so far all I’ve discussed is whether the combined encryption scheme is at least as secure as either underlying algorithm. But some people want more than ‘at least as’. More importantly, I’ve been talking about entire encryption algorithms (e.g., modes of operation), not raw ciphers.

So let’s address the first question. Is a combined encryption scheme significantly more secure than either algorithm on its own? Unfortunately the answer is: not necessarily. There are at least a couple of counterexamples here:

  1. The encryption scheme is a group. Imagine that EncryptA and EncryptB are the same algorithm, with the following special property: when you encrypt sequentially with KA and KB you obtain a ciphertext that can be decrypted with some third key KC.**** In this case, the resulting ciphertext ought to be at least as vulnerable as a single-encrypted ciphertext. Hence double-encrypting gives you no additional security at all. Fortunately modern block ciphers don’t (seem) to have this property — in fact, cryptographers explicitly design against it, as it can make the cipher weaker. But some number-theoretic schemes do, hence it’s worth looking out for.
  2. Meet-in-the-Middle Attacks. MiTM attacks are the most common ‘real-world’ counterexample that come up in discussions of cascade encryption (really, cascade encipherment). This attack was first discovered by Diffie and Hellman, and is a member of a class we call time-space tradeoff attacks. It’s useful in constructions that use a deterministic algorithm like a block cipher. For example:DOUBLE_DES(KA, KB, M) = DES_ENCRYPT(KA, DES_ENCRYPT(KB, M))

    On the face of it, you’d assume that this construction would be substantially stronger than a single layer of DES. If a brute-force attack on DES requires 2^56 operations (DES has a 56-bit key), you’d hope that attacking a construction with two DES keys would require on the order of 2^112 operations. But actually this hope is a false one — if the attacker has lots of storage.

    The attack works like this. First, obtain the encryption C of some known plaintext M under the two unknown secret keys KA and KB. Next, construct a huge table comprising the encipherment of M under every possible DES key. In our DES example there are 2^56 keys, this would take a corresponding amount of effort, and the resulting table will be astonishingly huge. But leave that aside for the moment.

    Finally, try decrypting C with every possible DES key. For each result, check to see if it’s in the table you just made. If you find a match, you’ve now got two keys: KA’ and KB’ that satisfy the encryption equation above.*****
    If you ignore storage costs (ridiculously impractical, but which may also be traded for time), this attack will run you (2^56)*2 = 2^57 cipher operations. That’s much less than the 2^112 we were hoping for. If you’re willing to treat it as a chosen plaintext attack you can even re-use the table for many separate attacks.

  3. Plaintext distribution issues. Maurer showed one more interesting result, which is that in a cascade of ciphers, the entire construction is guaranteed to be as secure as the first cipher, but not necessarily any stronger. This is because the first cipher may introduce certain patterns into its output that can assist the attacker in breaking the second layer of encipherment. Maurer even provides a (very contrived) counterexample in which this happens.

    I presume that this is the source of the following folklore construction, which is referenced in Applied Cryptography and other sources around the Internet:UberSuperEncrypt(KA, KB, M) = EncryptA(KA, R⊕M) || EncryptB(KB, R))

    Where || indicates concatenation, and R is a random string of the same length of the message. Since in this case both R and R⊕M both have a random distribution, this tends to eliminate the issue that Maurer notes. At the cost of doubling the ciphertext size!

Now the good news is that multiple encipherment (done properly) can probably make things more secure. This is precisely what constructions like DESX and 3DES try to achieve (using a single cipher). If you make certain strong assumptions about the strength of the cipher, it is possible to show that these constructions are harder to attack than the underlying cipher itself (see this analysis of DESX and this one of 3DES).

I warn you that these analyses use an unrealistic model for the security of the cipher, and they don’t treat multiple distinct ciphers., Still, they’re a useful guide — assuming that your attacker does not have any special attack against (at least one) of the underlying schemes. Your mileage may vary, and I would generally advise against assembling this sort of thing yourself unless you really know what you’re doing.

In summary

I’m afraid this post will end with a whimper rather than a bang. It’s entirely possible to combine encryption schemes in secure ways (many of which are not cascade constructions), but the amount of extra security you’ll get is subject to some debate.

In fact, this entire idea has been studied for a quite a while under the heading of (robust) combiners. These deal with combining cryptosystems (encryption, as well as hashing, signing, protocols, etc.) in a secure way, such that the combination remains secure even if some of the underlying schemes are broken.

If you’re interested, that’s the place to start. But in general my advice is that this is not something that most people should spend a lot of time doing, outside of (perhaps) the government and the academic world. If you want to do this, you should familiarize yourself with some of the academic papers already mentioned. Otherwise, think hard about why you’re doing it, and what it’s going to buy you.

Notes:

  
* And yes, I know about MD5 and the recent biclique attacks one AES. That still doesn’t change my opinion.

** Note that this is mostly something the government likes to think about, namely: how to use consumer off-the-shelf products together so as to achieve the same security as trusted, government-certified hardware. I’m dubious about this strategy based on my suspicion that all consumer products will soon be manufactured by Foxconn. Nonetheless I wish them luck.

*** This key independence is a big deal. If the keys are related (worst case: KA equals KB) then all guarantees are off. For example, consider a stream cipher like CTR mode, where encryption and decryption are the same algorithm. If you use the same algorithm and key, you’d completely cancel out the encryption, i.e.: CTR_ENC(K, IV, CTR_ENC(K, IV, M) = M.

**** Classical substitution ciphers (including the Vigenere cipher and Vernam One-Time Pad) have this structure.

***** The resulting KA’ and KB’ aren’t necessarily the right keys, however, due to false positives: keys that (for a single message M) satisfy DES(KA’, DES(KB’, M)) = DES(KA, DES(KB, M)). You can quickly eliminate the bad keys by obtaining the encryption of a second message M’ and testing it against each of your candidate matches. The chance that a given false positive will work on two messages is usually quite low.