Neat research ideas that went nowhere: Preventing offline dictionary attacks with CAPTCHAs

Once in a while I come across a research paper that’s so off the beaten path and so obviously applicable to a real problem that I know it’s going to sink like a stone and never be heard from again.

Today I’d like to talk about an idea that was introduced back in CRYPTO 2006, but doesn’t seem to have gotten any real traction out in the real world since. (If I’m wrong and someone is using it, feel free to email me. I’d be happy to eat my words and hear more.)

This particular idea comes from a 2006 paper by Ran Canetti, Shai Halevi and Michael Steiner,* and it addresses a very real problem that we have with deployed encryption schemes: namely, they tend to rely on passwords. This means that even if you use strong cryptography, all of your data is vulnerable to someone with a password dictionary and a lot of computing power.

Password-Based Encryption (and why we hate it)

To explain how Canetti, Halevi and Steiner propose to deal with this, I need to give a little bit of background on password-based encryption.

First of all, no matter what Dan Brown says, we’ve gotten pretty good at building encryption schemes that stand up against very motivated adversaries. If you’re using a modern scheme and are properly generating/storing your secret keys, then it’s awfully unlikely that a third party is going to decrypt your data — even if that party is a government agency.** (I’m leaving aside the wrench factor).

Unfortunately, the previous paragraph embeds one major assumption that makes it inapplicable to the way that many people secure their data. You see, in a cryptographer’s imagination, everyone encrypts with a strong, random 128+ bit secret key, which they memorize and never write down.

But in real life, people mostly prefer to use their cat’s name.

The use of passwords to derive encryption keys is depressingly common, even in applications where people really need security. Passwords are the default authentication method for many fancy encrypting hard drives and WDE packages. Windows just loves to encrypt things under your login credentials. (Many of these tools also support hardware tokens and TPMs, but those are such a drag that most people don’t bother.)

To your attacker, this is a feature, not a bug. If a key is derived from dictionary words, or some predictable mutation of dictionary words, then it can be guessed. All it takes is time and computing power. Moreover, since password guessing is embarrassingly parallel, it’s just the kind of enterprise that you can throw a data center or five at. (Ever wondered why the state of Maryland uses so much electricity?)

So far I haven’t said anything you didn’t already know. But just for fun, let’s quickly talk about what we currently do to prevent dictionary attacks. There are basically three approaches:

  1. Use some trusted, on-line server (or tamper-resistant hardware) to assist with the key derivation process. If done correctly, the server/hardware can shut off an attacker after a few invalid guesses, assuming that it isn’t also compromised. Unfortunately, this solution assumes that the server or hardware will be available when you need your data. That isn’t always possible.
  2. Use key stretching, which increases the time required to convert a password into a cryptographic key. This can slow down a dictionary attack, but it just slows it down. Moreover, it’s pretty ineffective against an attacker with specialized hardware. (A minimal example appears just after this list.)
  3. Use salt, which is hardly worth mentioning since if you aren’t salting your passwords then you’re crazy. Unfortunately all this does is prevent pre-computation. It does nothing against normal dictionary attacks.
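
For a sense of what option (2) looks like in practice, here is a minimal Python sketch using the standard library’s PBKDF2 implementation. The password, salt and iteration count are all placeholders; the point is just that the iteration count is what makes every guess expensive:

import hashlib

# Key stretching: a large iteration count makes each password guess cost real
# work, for the attacker as well as for the legitimate user.
key = hashlib.pbkdf2_hmac('sha256', b'my cat Mittens', b'a-random-16B-salt', 1_000_000, dklen=32)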

The human touch

In a perfect world, the online solution (1) above would probably be the best way to go. Unfortunately, we don’t live in a perfect world, and this may not be compatible with people’s requirements.

While it’s not possible to prevent dictionary attacks in an offline system, we may be able to slow them down. To do this, we have to make the decryption process much more resource-intensive. Unfortunately, it’s hard to win this game through computation alone, at least in a world full of FPGAs and elastic compute clouds.

Not this sort of brains. (XKCD).

What Canetti, Halevi and Steiner realized is that there’s one resource that the adversary might not have in abundance: human brains.

What if we could mandate some sort of human interaction each time a user tries to decrypt a file? This shouldn’t be too burdensome for the legitimate decryptor, since presumably he’s human, and moreover he only needs to try once.

But a password cracker needs to try millions of passwords. Requiring a human interaction for each attempt should throw a serious wrench into the process.

Using Human Puzzles to stop dictionary attacks

Fortunately we already have a whole class of technologies that differentiate human beings from computers. Generally, these are known as Reverse Turing Tests (RTTs). Canetti, Halevi and Steiner propose to use RTTs to construct “human puzzles”. At a high level, the idea is pretty simple.

In every PBE scheme, a password must first be converted into a key. Normally you do this by hashing the password and some salt via a standard key derivation function (e.g., PBKDF2). Usually this is where you’d stop. The output of this function, which we’ll call K, would be the key used to encrypt and decrypt.

However, now we’re going to take things one step further. The first key will be further processed by converting it into a puzzle that only a human can solve. When a human solves this puzzle, they create a new key that, ideally, a computer all by itself should not be able to recover. The process should be reliable — it needs to produce the same result every time it’s run, possibly across many different users.

Once you get past the theoretical stuff, the concrete example that Canetti, Halevi and Steiner propose is to use CAPTCHAs to implement human puzzles.

CAPTCHAs: don’t you love them?

In their example, the process of deriving an encryption key from a password now looks like this:

  1. First hash the password with some random salt r to derive a first key K. Now break this into a bunch of small chunks, each (for example) about 20 bits long.
  2. Use each of these 20-bit chunks to index into a huge table containing 2^20 (about 1 million) distinct CAPTCHA images. Each CAPTCHA contains a hard-to-read string with at least 20 bits of entropy in it. This string could be a bunch of numbers, or some randomly-selected dictionary words.
  3. Give each CAPTCHA to the human user to solve. Collect the results.
  4. Hash all of the user’s responses together with another salt s to derive a final decryption key.

If we instantiate the hash function with, say, PBKDF2 and a 160-bit output, then this requires the legitimate user to solve 8 puzzles the first time they encrypt, and every subsequent time they decrypt.
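
To make the flow concrete, here’s a rough Python sketch of the derivation. A couple of assumptions of my own: solve_captcha() stands in for the human looking up and solving CAPTCHA number i from the stored table, and the salts, iteration counts and chunk sizes are illustrative rather than anything taken from the paper:

import hashlib

CHUNK_BITS = 20
OUT_BITS = 160   # a 160-bit first key gives 8 chunks of 20 bits

def derive_key(password, r, s, solve_captcha):
    # Step 1: hash the password with salt r to get a first key K
    K = hashlib.pbkdf2_hmac('sha256', password.encode(), r, 100_000, OUT_BITS // 8)
    k = int.from_bytes(K, 'big')

    # Steps 2-3: each 20-bit chunk of K names one CAPTCHA; a human solves it
    answers = []
    for i in range(OUT_BITS // CHUNK_BITS):
        chunk = (k >> (i * CHUNK_BITS)) & ((1 << CHUNK_BITS) - 1)
        answers.append(solve_captcha(chunk))

    # Step 4: hash the human's answers with a second salt s to get the real key
    return hashlib.pbkdf2_hmac('sha256', '|'.join(answers).encode(), s, 100_000, 32)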

The dictionary attacker, however, has to solve eight puzzles for every password they try. If we assume that they have to guess a large number of passwords, this means that over time they’ll have to solve a substantial fraction (if not all) of the 1 million CAPTCHAs. Even if they can outsource this work at a rate of 5 cents per CAPTCHA, solving the whole collection will still cost $50,000 and a whole lot of hassle.

(This is a hugely simplified analysis. Obviously the attacker doesn’t have to solve all of the CAPTCHAs. Rather, their probability of guessing the key is a function of the number of CAPTCHAs, the fraction of the CAPTCHAs that they’ve been able to solve, the complexity of the password, etc. See the full paper for a much better analysis.)

The sticky wicket

Now, this really might work, in the sense that it will slow down attacks. But if you’re a practical-minded sort, you probably have one small question about this proposal. Namely: where do the CAPTCHAs come from?

Well, a simple answer is that you generate them all randomly the first time you encrypt, and you ship them along with the encrypted file. If we assume that each CAPTCHA is a 4KB image file, then storing the entire collection will require about 4GB.

That seems like a lot of space, but really it isn’t. After all, it’s not like we’re working with floppy drives. We live in a world where a 16GB USB Flash Drive can be had for about $10. So it’s perfectly reasonable to assume that we’ll have lots of storage for things we really care about.

In summary

Now maybe CAPTCHAs aren’t your thing. Maybe you don’t like the idea of storing 4GB worth of files just to ship someone an encrypted ZIP file. That’s fine. One of the neatest open problems in the paper is to find new sorts of human puzzles. In general, a human puzzle is any function that converts some input (say a 20-bit string) into an output, such that only a human being can do the transformation.

These don’t need to rely on bulky, stored CAPTCHAs. Hypothetically you could come up with a human puzzle that dynamically generates instances without a big stored database. For example, your puzzle might be based on recognizing human expressions, or inkblots. It just has to be reliable.

And that’s why I think this is such a neat paper. Whether or not this particular idea changes the world, it’s a new idea, it tries to solve a hard practical problem that most people have given up on, and it inspires some new ideas.

The crypto publishing game rarely produces papers with that particular combination. Just for that, I’d love to see it go somewhere.

Notes:

* Ran Canetti, Shai Halevi and Michael Steiner. 2006. Mitigating Dictionary Attacks on Password-Protected Local Storage. In CRYPTO ’06.

** Leaving aside what we all know the NSA’s got in its basement.

An update on our contest

Update 11/26/11: You can download all of these symbols in PDF and high-res PNG here.

A couple of weeks ago I sang the praises of the AIGA Symbol Signs package. I think it’s a great way to get nice looking presentations and diagrams, especially if you’re as graphically challenged as I am. (H/t goes to Thomas Ptacek for introducing me to it.)

The one thing Symbol Signs lacks is a good symbol for the Adversary. I’ve been using the diapered baby. Thomas likes the Martini glass. And yet both of these are really hacks. We can do better.

Thus I proposed a contest: $200 to the person who could come up with a new AIGA-consistent symbol to use for the adversary. Honestly, I kind of figured I’d never have to shell out. So it’s with pride and some financial trepidation that I today present not one, but two fantastic entries. The first, really a collection of adversaries, comes from my good friend Sam Small:

I particularly like that he provided me with the adversaries in both sexes. In my teaching I’ve been making a concerted effort to provide equal time to both female adversaries. I find that too many male (and even female!) cryptographers implicitly relate adversarial behavior to the male gender. (Apparently the whole Eve thing is just a figleaf.) But this isn’t fair. Women can be attackers too!

Also, I like the devil baby.

Our second entry comes straight from Thomas Ptacek himself. He only has one proposal, but what he lacks in quantity he makes up for in sinisterness:

And so here we are. I’m not quite sure who the winner is — it’ll take some thinking — but I can tell you that thanks to Sam and Thomas the security community is better off today than we were a few weeks ago.

As soon as I have time (this weekend) I’ll post both of these adversaries in downloadable PDF form. I urge you to use them frequently, and to use them wisely.

The first rule of vulnerability acknowledgement is: there is no vulnerability acknowledgement

Just for fun, today we’re going to look at two recent vulnerability acknowledgements. The first one’s pretty mild; on the Torino scale of vulnerability denial, it rates only about a three:

The research team notified Amazon about the issues last summer, and the company responded by posting a notice to its customers and partners about the problem. “We have received no reports that these vulnerabilities have been actively exploited,” the company wrote at the time. 

But this one from RSA, wow. The charts weren’t made for it. I suggest you read the entire interview, perhaps with a stiff drink to fortify you. I warn you, it only gets worse.

If our customers adopted our best practices, which included hardening their back-end servers, it would now become next to impossible to take advantage of any of the SecurID information that was stolen.

… We gave our customers best practices and remediation steps. We told our customers what to do. And we did it quickly and publicly. If the attackers had wanted to use SecurID, they would want to have done it quietly, effectively and under the covers. The fact that we announced the attack immediately, and the fact that we gave our customers these remediation steps, significantly disadvantaged the attackers from effectively using SecurID information.

… We think because we blew their cover we haven’t seen more evidence [of successful attacks].

I have a paper deadline midweek, so blogging will be light ’til then. Once that’s done, I’ll have something more substantial to say about all this.

Format Preserving Encryption, or, how to encrypt a credit card number with AES

People like to encrypt all kinds of things. This is really their own business. One of the lovely things about modern crypto is that for the most part, we don’t care.

You see, data is data. If you can encode it as a file then we can tell you how to encrypt it. There’s rarely any reason to develop special domain-specific encryption algorithms, like image encryptors. If you’re developing such an algorithm then chances are you’re doing it to implement some sort of goofy DRM scheme, rather than to provide serious cryptographic protection.

However, once in a blue moon, you do come across a case where you legitimately need an encryption algorithm that works with specific data types.

For example, let’s say I want to encrypt my credit card number. In principle this is no problem: card numbers are just numbers, after all.

Assuming that we want a general technique*, I could simply represent my 16-digit VISA card number as an integer between 0 and 9,999,999,999,999,999. In binary this value will require up to 54 bits to store (since lg 10,000,000,000,000,000 = 53.1508495).

In the absolute ‘best case’, assuming that we don’t need to pad the message or add an IV (for example, if we’re using a stateful stream cipher like AES-CTR) and we don’t care about authentication, we can encrypt this value without expansion. Thus, the ciphertext will be a randomish 54-bit string. Not bad at all.

But here’s a question: what if our software can’t handle a 54-bit string?

No, this isn’t crazy! For example, let’s say we need to store that encrypted card number in some legacy retail system that only accepts standard 16-digit decimal card numbers. Or alternatively, we want to transmit it through a normal payment processing network.** For both of these applications, we may need to first translate our ciphertext back into something that looks like a credit card number.

And this can be a problem. Certainly, the naive solution doesn’t work. If you’ve properly encrypted the card number, it should now be a uniformly distributed 54-bit string, which encodes to an integer between 0 and 18,014,398,509,481,983. Too many digits!

Moreover, there are applications where you might require additional properties from your encryption scheme. For example, it might be necessary to search a database for a particular card number. One way to do this is to use a deterministic encryption scheme, one that’s a permutation on the message space. This ensures that a given card number X will always encrypt to a unique number Y, which makes searching and comparison really easy.

If we didn’t care about the size of the ciphertext, we might be able to get away with encrypting the credit card number as a single block in ECB mode. Block ciphers are permutations, so this would satisfy the requirements just above.***

Unfortunately, this exacerbates our original problem even further. The block size of AES is 128 bits; even poor old DES uses a 64-bit block. If we use ECB mode to encrypt, that’s how long the resulting ciphertext will be. We’ll never be able to encode something that long back to a 16-digit decimal number.

So what to do?

Format-Preserving Encryption

Fortunately there’s an answer to these problems, and it goes by the name of “Format-Preserving Encryption“, or FPE. As the name implies, the goal of a Format-Preserving Encryption scheme is to securely encrypt while preserving the original formatting of the plaintext data. That means, in a nutshell, that our 16-digit card number can encrypt to a 16-digit number, no problem at all.

The seminal work in this area was written by Black and Rogaway back in 2002. Right off the bat, these guys made three very important observations:

  1. There are all sorts of applications where you need to preserve the format of input data.
  2. You could design your own cipher to do it, but it would probably suck.****
  3. It would sure be nice if we could do it securely using standard ciphers, like AES.

And then, in one of those amazing coincidences that always seem to turn up in research papers, Black and Rogaway discovered that you can do it with standard ciphers. And better yet, you can do it fairly efficiently.

Two Problems

What Black and Rogaway noticed is that there are really two problems here. The first is to build a cipher that works on arbitrary bit lengths. The second problem is to use such a cipher to encrypt arbitrary sets, like the set of all people named “Gino” who live in Atlanta. Or, more usefully, the set of integers from 0 to 9,999,999,999,999,999.

Even better, they pointed out, the first problem has already been solved. Various people, starting with Luby and Rackoff, had shown how to use a pseudorandom function or permutation (e.g., AES) to build ciphers of arbitrary block size. This is usually accomplished with a Feistel network construction like the one used in the DES cipher.

The basic idea is illustrated in the diagram below. To build a cipher that operates on 54-bit blocks, for example, we would cut the input block into two 27-bit halves, L0 and R0, and push them through a Feistel network for several rounds. This is an amazingly simple construction; it requires only XOR and some non-linear function F(), which we’ll implement using an existing cipher like AES (with truncation of the output).

Note that each call to AES must use a different key, although we can derive these all from one single key using standard techniques. Since AES is much better than the DES S-boxes, we won’t need to run the network for 16 rounds. Three or four rounds is probably sufficient.*****

Diagram of a Feistel network. The input (left side) is split into two halves and passed through a number of rounds. In the Luby-Rackoff variant, each function “F” is implemented via an independent, keyed pseudorandom function. Decryption is accomplished by reversing the process.
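
Here’s a bare-bones Python sketch of the idea. One caveat: where the construction calls for truncated AES as the round function, I’ve swapped in HMAC-SHA256 so the example runs with nothing but the standard library; the structure is what matters:

import hashlib, hmac

HALF_BITS = 27                      # 54-bit blocks, split into two 27-bit halves
MASK = (1 << HALF_BITS) - 1

def round_fn(key, i, half):
    # The round function F(). The construction suggests truncated AES; HMAC-SHA256
    # is used here purely to keep the sketch dependency-free.
    data = bytes([i]) + half.to_bytes(4, 'big')
    return int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), 'big') & MASK

def feistel_encrypt(key, x, rounds=4):
    L, R = x >> HALF_BITS, x & MASK
    for i in range(rounds):
        L, R = R, L ^ round_fn(key, i, R)
    return (L << HALF_BITS) | R

def feistel_decrypt(key, y, rounds=4):
    # Run the same rounds in reverse order to undo the encryption
    L, R = y >> HALF_BITS, y & MASK
    for i in reversed(range(rounds)):
        L, R = R ^ round_fn(key, i, L), L
    return (L << HALF_BITS) | R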

But like I said, this is only half the battle. Now we can construct a strong block cipher that operates on 54-bit inputs. I said above that this is big enough to handle a credit card number, but I also pointed out that it’s too big.

If we encrypt a card number with such a cipher (in ECB mode), we’ll end up with a randomly-distributed 54-bit ciphertext that might not easily translate back into a 16-digit integer. We still need something more.

Cycling

Black and Rogaway had a few ideas on how to deal with this. One of their approaches is something called ‘cycling’, which is so astonishingly straightforward that you’d assume it doesn’t work. It does!

Here’s how it works. Let’s say I want to encrypt my Visa card number (“4532294977918448“). I’ll follow these steps:

  1. First, I’ll encode the number as a 54-bit integer, then encrypt it using the fancy 54-bit block cipher I built above.
  2. At the end of this process I should have a 54-bit ciphertext C. Now, there’s a very good chance that C, viewed as an integer, will be ‘small’ enough to write as a 16-digit decimal number. If so, then I’m done. That’s my output!
  3. But what if C is too large? No problem. If at first you don’t succeed, just try again. Take that first ciphertext C, and encrypt it again using the block cipher to get a new C. Go back to step 2.

I’ll let you work out the decryption algorithm.
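
If you’d rather see it in code, here’s the encryption side as a tiny Python sketch, reusing the 54-bit feistel_encrypt from the earlier sketch (the decryption loop really is yours to work out):

MAX_CARD = 10**16        # card numbers live in the range [0, 10^16)

def fpe_encrypt(key, card_number):
    x = card_number
    while True:
        x = feistel_encrypt(key, x)      # the 54-bit block cipher from above
        if x < MAX_CARD:                 # landed back in the card-number range?
            return x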

Now, the pedantic among you will observe that technically speaking, this encryption process could put you in a ridiculously long (but finite) loop.****** At the very least, you can’t say in advance how long it will go on.

However, if we assume that the cipher behaves ‘randomly’, then we can easily get an idea of how likely we are to be satisfied at stage (2). Roughly speaking, there’s a 55% probability that we’ll get it right each time through the loop (10,000,000,000,000,000 / 2^54 ≈ 0.55). This means that on average we shouldn’t require more than a couple of encryptions.

If you’re not convinced, go grab a (slightly biased) coin and start flipping. You’re done when you get tails.

A note on security

Cycling is not the only technique that Black and Rogaway propose. They have a couple of others that may be better for your specific application. You can check out the paper if this kind of thing interests you.

Another neat thing about the cycling technique is that it actually doesn’t cost you anything. This is a very cryptographer kind of thing to say, since it actually does cost you quite a lot — in terms of time and extra computation. But what it doesn’t cost you much of is security. You might think that all this repetition would make the cipher somehow weaker. But Black and Rogaway show that it doesn’t, at least not in the ideal case.

In Summary: The real world 

So we know how to deterministically encipher arbitrary numbers. But a more important question is: is any of this a good idea?

Well, sometimes in security, application requirements trump perfection. Obviously it would be better to protect your data using standard, randomized encryption. This deterministic stuff only works well if there’s a certain amount of unpredictability in the underlying data, and that’s never guaranteed.

For example, your typical credit card number isn’t really 16 digits of random data at all. Typically it’s composed of a 4-6 digit Bank Identification Number (BIN), followed by a Personal Account Number (PAN), followed by a special checksum digit that’s computed deterministically based on the previous digits. The Personal Account number is the really unpredictable bit, and if you use an American Express card (15 digits total), it could be as short as 8 digits.

That means in an extreme case, there might be only about 100 million possible PAN numbers. Still a pretty large number, but if you knew the BIN and could gain access to an encryption oracle that lets you encrypt arbitrary card numbers at a rate of 1,000 per second, you could generate every possible ciphertext in a little over a day. This would let you decrypt any encrypted card number, simply by looking it up in a table.
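
As a sketch of that scenario, with make_card_number() standing in for a hypothetical helper that glues the known BIN, a candidate PAN and the check digit together, and encrypt_oracle() standing in for whatever lets you encrypt chosen card numbers:

def build_codebook(encrypt_oracle, make_card_number, pan_digits=8):
    # Enumerate every candidate PAN, encrypt it, and index the results by ciphertext.
    # With ~10^8 candidates at 1,000 encryptions per second this takes a bit over a day.
    table = {}
    for pan in range(10 ** pan_digits):
        card = make_card_number(pan)
        table[encrypt_oracle(card)] = card
    return table            # decrypt any ciphertext with a single lookup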

Probably this scenario is unrealistic, but it’s something to keep in mind if you plan to deploy a system like this.

Notes:

* Credit card numbers do have some structure, so in practice you could take advantage of that. For the moment, let’s not.

** In practice, this application requires more thinking. CCNs contain routing information called the Bank Identification Number (the first few digits) as well as a meaningful checksum digit (the last one).

*** Though be careful with this: the permutation property only holds if you can encode your data into a single cipher block. If you have to break it up across multiple blocks, it doesn’t work anymore.

**** As I’ve explained elsewhere, I am not being rude. In cryptography ‘suck’ is a purely technical term meaning “slow, complex, and probably insecure”.

***** There are various specific attacks that dictate how many rounds you need to use. Don’t take my word here as gospel.

****** Updated per Terence’s comment below. However I’m not sure that this is comforting to anyone but a cryptographer 🙂 Also see JoachimV’s comment for some updates on the field.

How not to redact a document: NHTSA and Toyota edition

Allow me to apologize in advance for today’s offtopic post, which has nothing to do with crypto. Consider it a reflection on large organizations’ ability to manage and protect sensitive data without cryptography. Report card: not so good.

Some backstory. You probably remember that last year sometime Toyota Motors had a small amount of trouble with their automobiles. A few of them, it was said, seemed prone to sudden bouts of acceleration. Recalls were issued, malfunctioning throttle cables were held aloft, the CEO even apologized. That’s the part most people have heard about.

What you probably didn’t hear too much about (except maybe in passing) was that NASA and NHTSA spent nearly a year poring through the Toyota engine control module code to figure out if software could be at fault. Their report, issued in February of this year, basically said the software was ok. Or maybe it didn’t. It’s not really clear what it said, because major portions — particularly of the software evaluation section — were redacted.

Now, like every major redaction of the digital age, these redactions were done thoughtfully, by carefully chopping out the sensitive portions using sophisticated digital redaction software, designed to ensure that the original meaning could never leak through.

Just kidding!

Seriously, as is par for the course in these things, NHTSA just drew big black squares over the parts they wanted to erase.

And this is where we get to the sophistication of organizations when it comes to managing secure data. You see, NHTSA released these reports online in February 2011, as a response to a FOIA request. They were up there for anybody to see, until about April — when somebody noticed that Google was magically unredacting these documents. Whoops. Time to put up some better documents!

Naturally NHTSA also remembered to contact Archive.org and ask that the old reports be pulled off of the Wayback Machine. No, really, I’m just kidding about that, too.

Of course, they’re all cached there for all to see, in their full un-redactable glory. All it takes is a copy and paste. For example, take this portion:

Where the redacted part decodes to:

The duty % is converted into three boolean flags, a flag describing the sign of the duty, a flag if the absolute value of the duty is greater than or equal to 88%, and a flag if the absolute value of the duty is less than 1%.  The 64 combinations of these flags and their previous values are divided into ten cases. Of the ten cases, five will open the throttle, two of the five will make the throttle more open than currently but not wide open, two will provide 100% duty instantaneously, and one will perpetually open the throttle. Any duty command from the PID controller greater than or equal to 88% will perpetually open the throttle and lead to WOT [wide open throttle]. This also means that any duty greater than 88% will be interpreted by the hardware as a 100% duty command.

Yikes!

So what’s the computer security lesson from this? Once data’s out on the wire, it’s gone for good. People need to be more careful with these kinds of things. On the bright side, this was just information, possibly even information that might be useful to the public. It’s not like it was sensitive source code, which I’ve also seen find its way onto Google.

In defense of Applied Cryptography

Over the weekend I found myself re-reading a few bits of Bruce Schneier’s Applied Cryptography for a historical research project, and it reminded me what a fantastically wonderful, completely insane piece of writing it is. I’m sure Bruce Schneier needs no additional validation in his life, but I do think it’s worth saying a few words about the book — and why we need more works like it in our field.

I should preface this all by saying that Applied Cryptography is probably one of the most influential crypto books ever written. It certainly played a part in getting me interested in the field. If you don’t own a copy, you should, if only to be awed by Bruce’s knowledge of bizarre, historical ciphers and all of the ways they’ve been broken. (Though the most recent edition is from 1996, so don’t go looking for up to date information in it.)

Much the way a hard-core Nirvana fan will roll their eyes if you mention “Nevermind”, people in the crypto research community tend to get a little bent out of shape if someone references Applied Cryptography. This is not because AC is a lousy book, or because the material is inaccessible or wrong. Quite the contrary. Rather, the case against Applied Cryptography is that it made cryptography too damned accessible. And that has led to a whole lot of fail, some on a pretty grand scale.

The detailed argument goes something like this: Applied Cryptography demystified cryptography for a lot of people. By doing so, it empowered them to experiment with crypto techniques, and to implement their own code. No problem so far.

Unfortunately, some readers, abetted by Bruce’s detailed explanations and convenient source code examples, felt that they were now ready to implement crypto professionally. Inevitably their code made its way into commercial products, which shipped full of horribly ridiculous, broken crypto implementations. This is the part that was not so good. We’re probably still dealing with the blowback today.

Just for one modest example, take this fragment of code spotted in a Diebold voting machine, circa 2003:

// LCG – Linear Conguential Generator – used to generate ballot serial numbers 
// A psuedo-random-sequence generator  
// (per Applied Cryptography, by Bruce Schneier, Wiley, 1996) 
#define LCG_MULTIPLIER 1366 
#define LCG_INCREMENTOR 150889 …

Thanks to Applied Cryptography, the Diebold coders were able to write a perfectly functional Linear Congruential Generator in no time at all. You certainly can’t blame Bruce for anything here — the LCG code is fine. It’s certainly not his fault that Diebold missed the part where he warned never to use LCGs for security applications. Whoops!

Although it’s all said with love, some people really do blame Applied Cryptography for this sort of thing. Even Bruce has at various points himself apologized for this aspect of the book.

(Not coincidentally, you’ll notice that his more recent books are nowhere near as brazenly useful as AC. Where Practical Cryptography is all crapped up with grave warnings about the dangers of rolling your own crypto implementations, Applied Cryptography just laid it all out there sans apology, like a copy of the Anarchist Cookbook left open in a middle school library.)

But I said that I’m writing this post to praise the book, not to damn it with faint praise.

What’s magical about Applied Cryptography is really two things. First of all, it’s an incredible historical document. If there’s a cipher that was used in the period 1970-1996, you’ll read about it in Applied Cryptography. Even if the cipher was based on the cryptographic equivalent of an abacus, even if it was broken in the same conference in which it was published, Bruce will still give you a full design description and the address of the guy who owns the patent.

And you have to love the understated way in which he sums up his opinions. Take, for example, the section on the FEAL cipher. After four dense paragraphs in which he lays out not one, but eleven devastating attacks culminating in the total demolition of FEAL (and possibly, the ritual suicide of its designers), he could have written something terrible, capping it with three exclamation points. Instead, he leaves it with the following, dry remark:

Whenever someone discovers a new cryptanalytic attack, he always seems to try it out on FEAL first.

All joking aside, I’ve found Applied Cryptography (and its massive bibliography) to be incredibly useful, particularly when examining historical crypto software, and (most importantly) when finding prior art to current crypto patents. Applied Cryptography is like a key to a time period that’s not well covered by Google Scholar or other Internet search tools. It’s helped me through more than one trip to the library.

The other magical thing about AC is that you really feel that after writing it, Bruce must have been a few pounds lighter — and some of that weight was his soul. I’m not saying he’s soulless! (Though he does work for BT now, doesn’t he?) It’s just that there’s just no way to read through this monster of a book, with its enormous compendium of bibliographies and source code, and imagine someone writing this thing just for the money or even for the fame. In my opinion it’s an achievement of practical cryptography that has not been matched either before or since.

And that’s too bad, because someone really should try.

On symbol signs, the adversary, and announcing a contest

Update 11/26/2011: I’ve had two excellent responses to my challenge. You can see them all on this page.

A couple of people have asked me about the symbols that I use to illustrate protocol diagrams. Let me first say that I claim no particular credit or blame for this style; all of that goes to Thomas Ptacek.

These symbols are part of the AIGA Symbol Signs package, a collection of freely downloadable stencils for the standard international symbols. They’re wonderful for someone as graphically challenged as I am. You can download a stencil set for Omnigraffle, or get the EPS templates from the AIGA site.

As Thomas points out, there’s all kinds of useful security stuff in them: the international symbols for Alice and Bob, the passport agent and keys, and some less-useful things like pictures of ships and planes. (Though even these turned out to be useful once, when I reviewed a security system for a cruise ship.)

However, I think what prompts questions is not Symbol Signs, but rather, it’s the symbol I’ve been using to represent the adversary. Some people seem to think the baby is an odd choice. (My guess is that these people simply don’t have one.)

However you feel about my choices, I think we can all agree that there’s a tremendous need for better design style in scientific presentations. It’s 2011, and Comic Sans + Microsoft clip art no longer cuts the mustard.

Announcing a contest. In order to do my small part to improve the appearance of security presentations, and to resolve this baby-as-adversary issue, I have decided to offer a whopping $200 (plus fame and glory) to any artist, or person of artistic bent, who can come up with a better AIGA-style symbol for the adversary, and/or other security-related symbols if they’re exceptional. This is a one-time prize, and will be offered to a single winner of my choice. I realize this isn’t a whole ton of money, but if I like the result I promise I’ll do my level best to praise your talents far and wide.

What is the Random Oracle Model and why should you care? (Part 4)

This is part four of a series on the Random Oracle Model.  See here for the previous posts:

Part 1: An introduction
Part 2: The ROM formalized, a scheme and a proof sketch
Part 3: How we abuse the ROM to make our security proofs work

This is the penultimate post in a series on the Random Oracle Model. I originally meant for this to be short and tidy, something that could be easily digested by a non-expert who was interested in what cryptographers were up to. More to the point, I expected to be done by now.

But the end is in sight, and I feel that there are only a few things left that I want to cover. Even better, one of these topics is of a very practical bent, which is in keeping with the theme of this blog.

Specifically: I keep talking about the random oracle model, but what impact does it have on anyone’s life? Where is this thing used?

Giving a complete answer to that question is hard — it’s kind of like coming up with a list of all the foods that have sodium benzoate in their ingredients list. So in this post I’m going to hit at least a couple of the high points, meaning: the schemes that are probably most relevant to our daily security lives.

RSA Signing and Encryption

RSA is probably the most famous cryptosystem in use today. I don’t know exactly why it’s so popular, but I suspect there are a few reasons. First of all, RSA is flexible; it provides signature and encryption in one scheme. It’s relatively simple — you can compute simple encryptions with a calculator. It’s been around for a while and hasn’t been broken yet. And I would be remiss if I didn’t mention that there are no current patents on RSA.

This is all great, but in some sense it’s also too bad. Over the past twenty years, attacks on the factoring problem (and by implication, RSA) have been getting better at the margins. This means that RSA key sizes have had to get bigger just to keep up. Sooner or later we’re going to lose this game.*

More interesting (to me) is RSA’s status as a sort-of-provably-secure cryptosystem. There’s a general understanding that RSA encryption is provably secure under the hardness of the RSA problem. However, this really isn’t true, at least given the way that most people have traditionally used RSA. You can get a reduction to the RSA assumption, but you need to take some very specific steps. Most importantly: you need to invoke the random oracle model.

To explain all this, I need to take you back in time to the late 1990s and I need to talk about padding.

If you’ve ever looked at a real RSA implementation, you probably know that we don’t use plain-old ‘textbook’ RSA (“m^e mod N“) to encrypt or sign raw messages. There are lots of reasons for this. One is that plain-old RSA isn’t randomized, meaning that a given message will always encrypt to the same ciphertext.

Another reason is that RSA is malleable, meaning that you can mess with ciphertexts and signatures in creative ways — for example, by multiplying them by constants of your choice. When you do this, the decrypted (resp: verified) message will be altered in predictable ways. This is bad for encryption. It’s downright awful for signatures.

We deal with these problems by applying padding to the message before encrypting or signing it. Padding schemes can add randomness, apply hashing in the case of signatures, and most importantly, they typically add structure that lets us know when a message has been tampered with.

Some historical background: RSA before the ROM

The first widely-used, standard padding schemes were described in RSA’s PKCS#1 standard. Nowadays we refer to these as the “PKCS #1 v1.5” standards, with the implication that the standard (now at version 2.1) has left them behind.

Back in the 90s these standards were the only game in town. Sure, there were a few pedantic cryptographers who objected to the lack of formal security justification for the standards. But these troublemakers were mainly confined to academic institutions, which meant that the real security professionals could get on with business.

And that business was to implement the PKCS padding standards everywhere. PKCS #1 v1.5 padding still turns up in products that I evaluate. Most notably at the time, it was all over a young standard known as SSL, which some people were using to encrypt credit card numbers traveling over the Internet.

Daniel Bleichenbacher shortly after his arrest. Or possibly just posing for a Bell Labs publicity photo.

Now, I said that almost nobody of a practical bent had a problem with PKCS. One notable exception was a bright cryptographer from Bell Labs named Daniel Bleichenbacher.

Dr. Bleichenbacher had a real problem with the PKCS padding standards. In fact, one could say that he made it his life’s mission to destroy them. This vendetta reached its zenith in 2006, when he showed how to break common implementations of PKCS signature with a pencil and paper, shortly before driving a fireworks-laden minivan into the headquarters of RSA Data Security.

(Ok, the last part did not happen. But the rest of it did.)

Bleichenbacher took his first crack at PKCS in CRYPTO 1998. In a surprisingly readable paper, he proposed the first practical adaptive chosen ciphertext attack against “protocols using the PKCS #1” encryption standard. Since, as I’ve just mentioned, the major protocol that met this description was SSL, it was a pretty big deal.

The attack goes something like this.

Whenever somebody sends their credit card info over an SSL connection to a web server, their browser and server execute a handshake protocol to derive a secret key. In particular, at one step of the protocol, the browser encrypts a very important value called the “pre-master secret” under the server’s RSA public key, and ships the resulting ciphertext to it over the net.

This encryption is important, because anyone who decrypts that ciphertext — immediately, or five years later — can ultimately recover the SSL transport key, and hence decrypt all communications between the server and browser.

What Bleichenbacher pointed out was very specific to how most web servers processed RSA messages. Specifically, every time the server receives an RSA ciphertext, it first decrypts it, and then checks that the padding is valid. For a typical RSA key, PKCS #1 padding looks like this:

0x 00 02 { at least 8 non-zero random bytes } 00 { message }

When an SSL webserver doesn’t like the padding it sees, it may kick back a specific ‘bad padding’ error. Bleichenbacher first showed that if he could intercept a legitimate RSA ciphertext “C = M^e mod N” on the wire, then he could ‘maul’ it by multiplying it by a chosen value. In other words, given some C, he would pick some “s” and compute:
C' = C * s^e mod N

This value C’ will decrypt to “M * s mod N”. Bleichenbacher showed that for some values of “s”, this mauled message might also be a valid PKCS-padded message, meaning that it would begin with “0x 00 02” etc. This wasn’t all that unlikely, mainly because just looking at a couple of bytes was a pretty crappy padding check in the first place.

It was quite common for servers to leak the result of the padding check back to the sender, by way of some specialized error code like “BAD PADDING”. Even if they didn’t do that, you could sometimes detect a failure in the padding check by measuring the time it took the server to get back to you.

Bleichenbacher’s main trick was noting that the “s” values that led to a successful padding check could also be used to learn something about the original message M. To make a long story short, he showed how to choose these values adaptively in such a way as to gradually zero in on the actual plaintext, until he had just one candidate. And that was the ballgame.

The actual attack could require a couple of million decryption queries, on average. This sounds like a lot, but it meant you could decrypt an SSL session overnight, as long as nobody was paying attention to the server logs.
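
The mauling step itself is just modular arithmetic. Here’s a tiny Python sketch of the core trick, with padding_oracle() standing in for the server’s “did the padding check pass?” signal; the real attack chooses the s values far more cleverly than this linear scan:

def maul(c, s, e, N):
    # C' = C * s^e mod N decrypts to (M * s) mod N under the same key
    return (c * pow(s, e, N)) % N

def find_conforming_s(c, e, N, padding_oracle, start=2):
    # Scan blinding values until the server accepts the mauled ciphertext as
    # valid PKCS #1 v1.5 padding; each such s leaks information about M.
    s = start
    while not padding_oracle(maul(c, s, e, N)):
        s += 1
    return s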

Random Oracles to the Rescue

The Bleichenbacher attack was a big deal for a whole variety of reasons. First of all, it was practical. It worked on a widely-deployed system, and it could be implemented and executed even by non-experts. It completely negated the protections offered by SSL, unless you were willing to monitor your SSL webserver literally night and day.

But to cryptographers the attack had special significance. First, it demonstrated that adaptive-chosen ciphertext attacks were a real problem, not just some airy academic topic. I suspect that even the cryptographers who worked in this area were a bit surprised by this. Some probably started looking for less useful topics to work on.

More importantly, the Bleichenbacher demonstration was seen as a victory for proponents of provable security. Up until now, their concerns had mostly fallen on deaf ears. Now a lot of people were listening.

And even better, the crypto research community had something to say to them. The overwhelming consensus was to immediately switch to a new RSA padding scheme that had been proposed all the way back in 1994. This scheme was called Optimal Asymmetric Encryption Padding (OAEP), and it was only a little bit more complex than PKCS #1. Even better, and unlike PKCS #1, RSA encryption with OAEP could be provably reduced to the hardness of the RSA problem, in the random oracle model. 

The success of OAEP also led to the adoption of the PSS padding scheme, which did essentially the same thing for RSA signatures.

OAEP padding, courtesy Wikipedia.  G and H are independent hash functions with different output lengths.
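
For the curious, here’s a heavily simplified Python sketch of the encoding step. Real OAEP, as specified in PKCS #1 v2.1, adds a hash label, a fixed separator byte and more careful length handling; here both G and H are built from SHA-256 and the lengths are placeholders:

import hashlib, os

def mgf(seed, n):
    # Expand a seed to n bytes; this plays the role of the hash functions G and H
    out = b''
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(seed + counter.to_bytes(4, 'big')).digest()
        counter += 1
    return out[:n]

def oaep_encode(message, db_len=96, seed_len=32):
    db = message + b'\x00' * (db_len - len(message))            # simplified data block
    r = os.urandom(seed_len)                                    # fresh randomness
    masked_db = bytes(a ^ b for a, b in zip(db, mgf(r, db_len)))
    masked_seed = bytes(a ^ b for a, b in zip(r, mgf(masked_db, seed_len)))
    return masked_seed + masked_db      # this two-round Feistel output goes into RSA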

Now, to hear some people tell it, this is where the story ends. And it’s a good story — bad, unproven security protocol gets broken. New, provably-secure protocol steps in and saves the day.

And for the most part the story is true. With some important caveats.

The first of these is implementation-related. Just as PKCS #1 was broken because the server gave too many detailed errors, even OAEP encryption can break if the server leaks too much info on its error conditions. And when OAEP implementations go bad, they can really go bad. James Manger showed that you can break 1024-bit RSA-OAEP in a mere 1100 queries, assuming your server leaks a little bit too much information on why a given decryption attempt went wrong.****

But even if you do everything correctly, the OAEP security proof only holds in the random oracle model. That means that whatever guarantees it supposedly offers are only valid if the hash function is ideal. That’s a good heuristic, meaning that OAEP is unlikely to be laden with obvious flaws.

But if you dig into the OAEP security proof, you’ll see that it’s using random oracles in about the strongest ways imaginable. The essential concept of OAEP is to use a pair of hash functions to construct a tiny, invertible Feistel network. The beauty is that every time you encrypt, you’re essentially sending the message (and some randomness) directly “through” the random oracle, in a way that permits all sorts of nonsense.

The reduction proof can then see what the adversary’s encrypting, which turns out to be very useful when you’re building a system that’s secure against chosen ciphertext attacks. This would never work if you implemented OAEP with a real hash function, like the ones that are actually used to implement it in real life.

So in one very real sense, we still don’t really have a practical, provably-secure way to encrypt with RSA. Moreover, there’s little reason to believe that we ever will, given a slew of impossibility results that seem to keep getting in the way.*****

All of which raises an important question. Given all the limitations of RSA, why are we so darn attached to it?

The Digital Signature Algorithm (DSA/DSS)

You might find it a bit strange that I’m mentioning DSA in a discussion about provable security. This is mainly because, unlike the Elgamal signature on which DSA is based, the DSA standard has no security proof whatsoever.******

There’s no really defensible reason for this. As best I can tell, NIST took a look at the Elgamal signature and thought it was cute, but the signatures were a little bit too long. So they took some scissors to it and produced something that looks basically the same, but doesn’t have the very nice reduction to the Discrete Logarithm assumption that Elgamal did.

This kind of behavior is par for the course, and honestly it gets me a little down on government standards bodies.

For the record, while DSA isn’t really provably-secure at all, the Elgamal scheme is. But unfortunately, the Elgamal scheme is only provably secure in the random oracle model. This means that if your hash function has the very nice property of being a perfectly random function and programmable, Elgamal signatures can be reduced to the hardness of the Discrete Logarithm problem. But otherwise you’re on your own.

Key Derivation and Beyond

There are probably a zillion cryptosystems that ought to be mentioned here, and we can’t cover them all. Still, there’s one more class of ‘things that hash functions are used for’ that probably isn’t really secure at all, unless you’re willing to make some pretty strong assumptions about the properties of your hash function.

The example I’m thinking of is the use of hashing to derive cryptographic keys. For example, many password-based encryption schemes use a hash function to combine a password and some salt to derive a key for encryption. This approach seems to work ok in most cases (assuming a high entropy password), but it doesn’t have any strong theoretical legs to stand on.

In general, if you’re trying to extract a uniform random key from a ‘lumpy’, non-uniform source like a password, the proper tool to use is something called a randomness extractor. These can be implemented in the standard model (no random oracles), but all of the existing constructions pretty much suck. This sounds rude, but it’s actually a technical term that cryptographers use to indicate that a scheme is complicated and slow.

So in real life, nobody uses these things. Instead they just hash the password and salt together, and assume that the result will be pretty good. If they’re feeling frisky they might even hash it a bunch of times, which is the crypto equivalent of shooting a turkey with an AK-47. It’s probably just fine, although technically you can never rule out bulletproof turkeys.*******
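
In code, the “hash it a bunch of times” approach amounts to something like this minimal sketch (the iteration count is arbitrary, and nothing about it is a formal extractor):

import hashlib

def naive_kdf(password, salt, iterations=100_000):
    # No extractor, no proof: just hash the password and salt, then keep hashing
    h = hashlib.sha256(salt + password.encode()).digest()
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h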

Whenever people bother to argue formally about the security of schemes like this (which is, essentially, never), the usual argument goes like this: let’s just assume the hash function is a random oracle. In the random oracle model, of course, every hash function is a perfect extractor. Stick any high-entropy source into an ideal hash function, no matter how ‘lumpy’ or ‘weird’ the input is, and you get a fantastic key out the other end.

So even if your favorite key derivation has no formal security proof whatsoever, you can bet somebody, somewhere has made this argument, if only to help themselves sleep better at night.

In Summary

This was supposed to be a post about the random oracle model, but I feel that it took on a life of its own. That’s not a bad thing. I promised that I’d spend some time talking about practical crypto, and I’ve certainly accomplished that.

Although I could say a million things about the random oracle model, I only have one more post left in me. And in some ways, this is going to be the most important post of all.

You see, we always knew that this ride wouldn’t last forever, we just thought we had more time. Unfortunately, the end is nigh. Just like the imaginary city that Leonardo DiCaprio explored during the boring part of Inception, the random oracle model is collapsing under the weight of its own contradictions.

This collapse — and what it means for cryptographers, security professionals, and everyone else — will be the subject of my final post.

This series concludes in Part 5.

Notes:

* NIST has begun to deprecate RSA in its FIPS standards. You can use it in some places, but it’s no longer a major supported algorithm.

**** Specifically, the OAEP decoding algorithm has a step in which it decodes the padded plaintext (a value between 0 and N-1) into a bitstring. This requires it to check that the decrypted RSA value is less than 2^n, where n is some number of bits that can be reliably encoded within the range. The Manger attack assumes that the decoding algorithm might leak the result of this check.

***** Although, on the signature side things are a little better. Hohenberger and Waters recently proposed a hash-and-sign RSA signature that’s secure under the RSA assumption without random oracles. This is really a very neat result. Unfortunately it’s inefficient as hell.

****** This is not to say people haven’t tried to piece a proof together. You can do it, if you’re willing to work in the random oracle model, and also assume the existence of leprechauns and unicorns and similarly unrealistic things.

******* In fact, repeated hashing is usually just as much about making the hash process slow as it is about achieving a strong key. This can be helpful in thwarting dictionary attacks.

Attack of the week: XML Encryption

Unfortunately I had to skip this year’s CCS because I was traveling for work. This means I missed out on a chance to go to my favorite Chicago restaurant and on the architecture cruise. I’m told that there were also some research papers being given somewhere.

The Internet can’t make up for every missed opportunity, but it can help with the research. In fact, today’s news is all about one of those papers. I’m talking about Tibor Jager and Juraj Somorovsky’s work entitled “How To Break XML Encryption“.*

In case I had any doubts, it only took one glance at the first page to confirm that we’re back in “someone screwed up basic symmetric encryption” territory. But really, to describe Jager and Somorovsky’s findings like this does them no justice. It’s like comparing the time that kid built a nuclear reactor in his garage to, say, the incident at Fukushima.

Some background is in order.

What is XML Encryption and why should I care?

You probably already know that XML is the world’s most popular way to get structured data from point A to point B.** Following the success of HTML, the computing world decided that we needed a single “flexible” and ASCII-based format for handling arbitrary data types. (We’ve spent much of the subsequent period regretting this decision.)

The one upshot of this XML epidemic was the emergence of some standards and libraries, which — among other things — were supposed to help application developers process data in a reliable and secure way. One of those standards is SOAP, which is used to transport data in web services frameworks.

Another standard is the W3C XML Encryption Standard, which was dropped like a log in 2002 and doesn’t seem to have been updated since. The point of this standard was to give developers a uniform (and secure!) way to protect their XML documents so that we wouldn’t wind up with five thousand incompatible, insecure ways of doing it. (Oh the irony.)

Finally, a very common implementation of both standards can be found in the Apache Axis2 web services framework and in the RedHat JBoss framework. These are probably the most common open-source SOAP frameworks. They power a number of systems you don’t necessarily think about, but you probably should because they carry a lot of your personal information.

So what about the encryption?

To make a very long story short, the W3C standard recommends to encrypt messages using a block cipher configured using (our old friend) CBC-mode. Here’s a quick recap on CBC mode, courtesy Wikipedia:

CBC mode encryption. The message is first subdivided into equal-length blocks and encrypted as in the diagram.  The circular plus symbol denotes XOR.

There are two basic things you need to know about CBC mode, and ought to know if you ever plan to use it.***

First: CBC requires that every plaintext be an even multiple of the block size. In the case of AES, this means 16 bytes. If the message is not the right length, then CBC implementations will pad the message with additional bytes. Of course it must be possible to recognize this padding so that it can be stripped off after decryption. There are various schemes for doing this.

The W3C standard uses the following padding approach. Let “MM” indicate message bytes, and let “NN” be the total number of padding bytes being added. “XX” represents arbitrary padding bytes, which can hold any value you want. A padded block will look like this:

0x MM MM MM MM MM MM MM MM MM MM XX XX XX XX XX NN

When a decryptor encounters a padded message, it looks at the last byte to figure out how many bytes to strip off. If it encounters padding that doesn’t make sense, i.e., NN > 16 or NN < 1, then it should recognize that something funny is going on and reject the message.

Second: CBC ciphertexts are malleable. This means that you can modify a CBC-encrypted ciphertext such that your modifications will carry through decryption, and have a meaningful effect on the resulting decrypted plaintext.

In the simplest case, you can truncate the message by stripping blocks off the end. This does the same thing to the resulting plaintext.

More interestingly, you can flip any bit (or set of bits) in the IV of a CBC-encrypted message, and upon decryption you’ll find that the same bits have been flipped in the first block of the decrypted plaintext. You can also do a lot more, but it’s not that important right now.
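
Both facts are easy to express in code. Here’s a minimal Python sketch of the W3C-style padding check and the IV tweak; the byte positions and values are whatever the attacker chooses:

def strip_padding(block):
    # W3C XML Encryption padding: the last byte says how many bytes to strip
    n = block[-1]
    if n < 1 or n > 16:
        raise ValueError('bad padding')   # this is the error the attacker listens for
    return block[:-n]

def tweak_iv(iv, index, delta):
    # XORing a byte of the IV XORs the same byte of the first decrypted block
    out = bytearray(iv)
    out[index] ^= delta
    return bytes(out)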

Anything else I should know?

Yup. You need to know how XML messages are formatted. They’re encoded using UTF-8 — essentially ASCII — with some special rules.

These are described in the paper. Briefly: any XML message that contains bytes in the range 0x00-0x1F (with the exception of tab, LF and CR) may be considered invalid. Similarly, the ampersand (&) character is used as an escape character, and must begin a valid entity reference. Lastly, open brackets “<” must be properly closed. Messages that don’t meet this standard should be rejected.  ****
Of course, the message will have to be decrypted (using the appropriate key) before any of these checks can be run.
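
If you squint, the checks amount to something like the following sketch (my own rough approximation of the rules above, not the real parser):

import re

def looks_acceptable(decrypted):
    # Control characters below 0x20 are forbidden, except tab, LF and CR.
    if any(b < 0x20 and b not in (0x09, 0x0A, 0x0D) for b in decrypted):
        return False
    try:
        text = decrypted.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Every '&' must start something that looks like a valid entity reference.
    if re.search(r"&(?![A-Za-z]+;|#[0-9]+;)", text):
        return False
    # Every '<' must eventually be closed (a very crude approximation).
    if text.count("<") > text.count(">"):
        return False
    return True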

This is all fascinating, but how does it lead to an attack?

There’s one more detail I haven’t given you. You see, in addition to the details above, the Axis2 server (ditto JBoss) is kind enough to let you know when you haven’t met its standards for an XML message.

Specifically, after it decrypts a message, it kicks back an error if the message either: (1) has bad padding, or (2) contains invalid characters. In both cases, the error is the same. And this error is your way in the door.

I’m not going to completely describe the attack, but I’ll try to give an overview.

Imagine that you’ve intercepted a legitimately-encoded, encrypted XML message (IV, C1, …, CN) and you want to know what it says. You also have access to an Axis2 or JBoss server that knows the key and can decrypt it. The server won’t tell you what the message says, but it will give you an error if the encoding doesn’t meet its standards.

Sending the original, intercepted message won’t tell you much. We know that it’s encoded correctly. But what if you tamper with the message? This is precisely what Jager and Somorovsky proceed to do.

Step 1: Truncate the ciphertext. A good way to start is to chop off everything after the first ciphertext block, and send the much-shorter message consisting of (IV, C1) in to be decrypted. Chances are good that this new message will produce a decryption error, since the server will interpret the last byte of the decrypted message as padding — a number that should be between 0x01 and 0x10.

Step 2: Tweak the padding. I already said that if you flip bits in the IV, this will result in a similar change to the decrypted plaintext. Using this concept, you can force the last byte of the IV through all possible values and ask the server to decrypt each version of the ciphertext. More formally, the ‘new’ last IV byte can be computed as (‘original last IV byte’ ⊕ i) for i from 0 to 255.

In the best case, 16 of these test ciphertexts will decrypt without error. These correspond to the decrypted padding values 0x01 through 0x10, i.e., valid padding. If fewer than 16 of your test ciphertexts decrypt, this means that there’s something wrong with the message: probably it contains an unclosed “<” tag.
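
In code, Step 2 is just a loop over the 256 possible values of the last IV byte. Here’s a sketch, where oracle_decrypts_ok is a hypothetical stand-in for ‘send (IV, C1) to the server and check whether it complains’:

def probe_last_iv_byte(iv, c1, oracle_decrypts_ok):
    good_masks = []
    for i in range(256):
        tweaked_iv = iv[:-1] + bytes([iv[-1] ^ i])      # XOR i into the last IV byte
        if oracle_decrypts_ok(tweaked_iv, c1):          # did the server accept (IV', C1)?
            good_masks.append(i)
    return good_masks                                   # in the best case, exactly 16 values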

Step 3: Squish the bug(s). This is no problem. If only one of your ciphertexts decrypts successfully, the open “<” character must be in the first byte of the message. That single success is the case where you forced the last byte to decrypt to 0x10 (16 decimal), so the decryptor treated the whole block as padding, and there are no errors in an empty message.

If two ciphertexts decrypt successfully, then the “<” character must be in the second position of the message, because now there are only two padding lengths (16 and 15) that would cause it to be stripped off. And so on. The number of successful decryptions tells you where the “<” is.

Now that you know where it is, kill it by flipping the last bit of the appropriate byte in the IV. This turns “<” into a harmless “=”.

Step 4: Learn the last byte of the block. You should now be able to come up with 16 test ciphertexts that decrypt successfully. This means you have 16 values of “i” such that ‘last byte of the original IV’ ⊕ i results in a successful decryption. One of these “i” values will differ from all the others in the fourth most significant bit (the bit with value 0x10). That one corresponds to the padding value 0x10.

If “x” is the original plaintext byte and “i” is the value you just identified, you now know the last byte of the block. Since we have x ⊕ i = 0x10, then through the very sophisticated process of rearranging a simple equation, we have 0x10 ⊕ i = x.

Now, using our knowledge of the last byte of the plaintext, manually tweak the IV so that the last byte (padding) of the plaintext will decrypt to 0x01.
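
The arithmetic here is nothing more than a couple of XORs. A sketch, with made-up values standing in for the intercepted IV and the special “i”:

import os

original_iv = os.urandom(16)        # stand-in for the intercepted message's IV
special_i = 0x5C                    # hypothetical: the one mask whose 0x10 bit differed

last_plaintext_byte = 0x10 ^ special_i           # x = 0x10 XOR i

# Flipping IV bits flips plaintext bits, so XOR in (x XOR 0x01) to make the last
# byte of the block decrypt to 0x01 (a single byte of padding) from now on.
new_last_byte = original_iv[-1] ^ last_plaintext_byte ^ 0x01
working_iv = original_iv[:-1] + bytes([new_last_byte])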

Step 5: Learn everything else. The rest of the attack hinges on the fact that all of the bytes in the message should be ‘acceptable’ UTF-8 characters. Thanks to our trick with the IV, we can now flip arbitrary bits in any given byte and see whether or not that leads to another ‘acceptable’ character.

Believe it or not, the results of this experiment tell us enough to figure out each character in turn. I won’t belabor this, because it takes a little bit of squinting at the ASCII table, but fundamentally there’s not much magic beyond that. It’s very simple and elegant, and it works.

Step 6: Finish it. You’ve done this for one block. Now do it for the rest. Just set your ‘ciphertext’ to be the next block in the message, and your ‘IV’ to be the ciphertext block immediately preceding it. Now go back to Step 1.

And that’s the ballgame.

Obviously all of this takes a lot of tests, which you can reduce a bit using some of the optimizations suggested in the paper. Jager and Somorovsky are able to recover plaintexts with “14 requests per plaintext byte on average.” But at the end of the day, who cares if it takes 50? The box is sitting there ready and willing to decrypt ciphertexts. And it’s incredibly unlikely that anyone is paying attention.

Ok. Can’t you fix this by authenticating the ciphertexts?

This entire attack is a special example of an adaptive chosen ciphertext attack. (Specifically, it’s a super-duper variation of Vaudenay’s padding oracle attack, which he discovered in 2002, the same year the W3C standard hit!)

These attacks can almost always be prevented with proper authentication of the ciphertexts. If the decryptor checks that the ciphertexts are valid before decrypting them, the attacker can’t tamper with them. Hence, no useful information should leak out to permit these attacks at all.

The designers could have added a mandatory MAC or even a signature on the ciphertext, or they could have used an authenticated mode of operation.
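
For instance, something as simple as an authenticated mode kills this whole class of attack, because tampered ciphertexts get rejected before any padding or character checks ever run. A quick sketch using AES-GCM via Python’s cryptography package (my example, not anything the W3C spec defines):

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)
aesgcm = AESGCM(key)

ciphertext = aesgcm.encrypt(nonce, b"<account>12345</account>", None)

# Flip a single bit anywhere in the ciphertext...
tampered = ciphertext[:-1] + bytes([ciphertext[-1] ^ 0x01])

try:
    aesgcm.decrypt(nonce, tampered, None)
except InvalidTag:
    print("tampering detected, message rejected")   # no padding oracle to consult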

But the W3C standard does provide for MACs and signatures!

Indeed. As the authors point out, there’s an optional field in the specification where you can add a signature or a MAC over the ciphertext and its meta data. But there’s one tiny, hilarious thing about that signature…

Did I mention it’s optional?

You can quite easily take a signed/MACed ciphertext and ‘convert’ it into one that’s not signed or MACed at all, simply by stripping the ciphertext contents out and placing them into a section that does not claim to have a signature. Since the spec doesn’t mandate that the authentication be on every message, the decryptor will be totally cool with this.

So in summary: optional MACs and signatures don’t buy you much.

Is there anything else to say about this?

Lots. Stop using unauthenticated CBC mode. Stop using any encryption standard designed before 2003 that hasn’t been carefully analyzed and recently updated. Stop using any unauthenticated encryption at all. Wrap your SOAP connections with TLS (a recent version!) if you can.

People didn’t really think much about active chosen ciphertext attacks on symmetric encryption before 2000, but now they’re the new hip thing. If your system is online and doesn’t have a solid, well-analyzed protection against them, don’t pretend that you’re doing anything at all to secure your data.

I wish I had a funny, pithy way to sum this all up. But honestly, I’m just a little depressed. It’s not Jager and Somorovsky’s fault at all — this is a great paper. It’s just that there’s too much embarrassingly bad crypto going around.

While I love reading these papers, I’m starting to feel that this one-off approach is not sufficient to the problem. Maybe what the community needs is a centralized clearinghouse for this stuff, a website where suspect crypto standards (and implementations) can be identified and listed. Crowdsourced and analyzed. Probably many are just fine, but I doubt most have been looked at (or even thought about in a while).

Anyone looking for a project?

Notes:
  
* Tibor Jager and Juraj Somorovsky, “How To Break XML Encryption”. In ACM CCS 2011. ACM Press.

** Someone is inevitably going to tell me that JSON is the world’s most popular way to move structured data. Ok. I don’t care.

*** But better yet, don’t use it. Use a standard authenticated encryption scheme like CCM, OCB or GCM, and use someone else’s implementation according to the standards. These modes should prevent all of this nonsense.

**** These details are quoted straight from the Jager and Somorovsky paper. I’m not 100% sure if all implementations enforce this, but Axis2 does.

What is the Random Oracle Model and why should you care? (Part 3)

This is part 3 of a series on the Random Oracle Model.  See here for the previous posts:

Part 1: An introduction
Part 2: The ROM formalized, a scheme and a proof sketch

I began this series in the hope of writing a true layperson’s introduction to the Random Oracle Model. At the outset I figured I could probably knock this thing out in a couple of not-too-technical posts. Yet here I am at post three, and there’s still a big chunk of technical material left to go.

Please bear with me. We’re going to get back to some solid, practical crypto with the next post in the series.

A quick recap

In the last two posts we’ve covered a bit of ground. We talked about cryptographic hash functions and their properties, and discussed why the cryptographer’s ‘ideal’ hash function would be a random function. We noted that random functions would never work in real life (too big to store, too slow to compute), so we came to a strange resolution: we would pretend that our hashes were random functions, just for the purposes of our security analysis. Then of course we would use real hash functions like SHA.

Of course, even this required us to tamper with the fabric of the universe, since we want to restrict our ‘adversaries’ to running in a reasonable amount of time, and they can’t do that if they’re busy evaluating random functions. So we created a new model of computation (aka ‘thought experiment’) called the Random Oracle Model. In this model all hashing is computed by a magical third party called an oracle.

If you know a little about the random oracle model, say from reading the previous posts, you could be forgiven for thinking that it isn’t all that bad. Basically we’re assuming that hash functions are super-duper “random”. Maybe a little more so than a real function can be, but big deal.

But unfortunately it’s much more than that.

The world’s shortest lesson on provable security

To explain what I mean, we first need to understand a little about provable security, and particularly, a type of proof called a reduction proof.

When we say that an encryption scheme is “provably secure”, we usually mean that it’s secure under some assumption. For most practical schemes the assumption is mathematical in nature. For example, we might found an encryption scheme on the assumption that it’s hard to factor large composite numbers, or to find the shortest vector in a lattice. In this case ‘hard’ doesn’t mean ‘I can’t figure out how to do it’. It means: there exists no (efficient) algorithm that can do it.1

Now, I mentioned that these are assumptions. We don’t absolutely know that these problems are hard. However, we like to think that if some bright people have looked at the problem for a while, and nobody has made any progress towards solving it, that means it’s probably not easy. In some cases the problem may even fall into a useful class of problems, i.e., it may be NP-Hard.

Even if you’re not willing to blindly trust in these assumptions, security proofs are still useful. Analyzing a new cryptosystem can soak up huge amounts of time and energy. If we can at least ‘reduce’ each new scheme to one of a small number of mathematical problems, we can pour our brainpower into attacking just those problems, knowing that our work will apply to every scheme that uses them.

All of this brings me to my main point: namely, how these proofs actually work. Getting specific, the general flow of such a proof is as follows:

  1. Assume that problem X is hard.1
  2. Show that if there exists an (efficient) algorithm that “breaks” our cryptographic scheme, then there exists an (efficient) algorithm that solves problem X.
  3. Snottily point out that (2) means if the scheme can be broken, then problem X can’t be hard.

If point (1) holds, then by point (2) no efficient breaking algorithm can exist, since such an algorithm would make problem X easy. In other words: if problem X really is hard, then there can’t be an algorithm that breaks our encryption scheme.

Of course I haven’t explained how to actually accomplish step (2), which is pretty important. It’s also the beautiful part. You see, in most reductions, step (2) consists of actually writing an efficient algorithm that solves problem X.

You probably think I’m crazy. I just claimed that problem X can’t be efficiently solved, now I’m telling you that we write an algorithm for solving it. But the genius of it is that our solver algorithm won’t quite work. It’s missing an important subroutine.

That subroutine is identical — at the API level — to an algorithm that ‘breaks’ our encryption scheme.

So here is how the argument breaks down: if there was, hypothetically, an algorithm that efficiently ‘broke’ our encryption scheme, then we could plug it into the hole we left in our ‘solver’ algorithm. This would yield an algorithm that efficiently solves problem X. Hence, it completely satisfies what we’re asking for in point (2).
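
In pseudocode-ish Python, the shape of such a reduction looks something like this (the helper functions are hypothetical placeholders, just to show the structure):

def solve_problem_x(instance, breaker):
    # Dress the problem instance up as something that looks like a normal public key,
    # ciphertext, etc. (hypothetical helper; the details depend on the scheme).
    challenge = embed_instance_into_challenge(instance)

    # Call the breaker purely as a black box: inputs go in, an output comes out.
    break_result = breaker(challenge)

    # Translate the breaker's output back into a solution to problem X
    # (another hypothetical helper).
    return extract_solution(instance, break_result)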

Black Boxes

With this foundation laid, I only need to make one more point. There’s a basic restriction that standard reduction proofs must obey. Our ‘Problem X solver’ algorithm must use the ‘encryption scheme breaker’ subroutine as a black box. It cannot make assumptions about how the breaker algorithm works internally.

Let me be more clear about this. We cannot, for example, have a step in our ‘solver’ algorithm where we decompile the breaker algorithm, or peek inside its internal memory registers, and then assume that they’ll contain a partial result we need.

This is because our reduction only works if it works with every possible algorithm that breaks the scheme. Since we can’t guess a priori how such an algorithm might work internally, we can’t depend on its internal workings. The breaker algorithm could hypothetically be horrifically obfuscated, to the point where we couldn’t understand it even if we held the code in our hands.

Thus in a traditional black box reduction, we can count on the breaker algorithm having a standard “API” (which we’ll define at some point). This typically takes in public keys and/or ciphertexts (or signatures, etc.). It outputs some useful information that constitutes our definition of ‘breaking’. And though it doesn’t always need to succeed, we expect that it does so with reasonable probability. But everything that happens inside the algorithm must be hidden from us.

Dirty laundry

You wouldn’t think this has anything to do with provable security.

Let’s pause for a moment to consider some related problems in the area of celebrity journalism.

Imagine you’re a muckraking celebrity reporter who’s been assigned to get a scoop on Brad Pitt and Angelina Jolie’s latest adopted baby. The Pitt/Jolies are extremely private, and won’t even acknowledge that they’re adopting. You’re a clever reporter, and more importantly, you have no morals.

You know that if you could just get a peek at the family laundry, you’d have your whole story. Sex, size, weight, favorite color. Even diet (Angelina insists on eco-safe cloth diapers). In a perfect world you could drive over to Pitt/Jolie’s laundry company, slip someone $500, and lo — your whole story would be there in the clothes. If you were really devious, you could start your own celebrity laundry service and trick them into sending it right to you.

But this is all wishful thinking. Pitt and Jolie don’t send out their laundry — they have it done at home.  In truth, you’re not sure that they even do laundry. You heard a rumor that Angelina has her clothes ritually burned after each use.

In case you don’t see the relevance, this is an analogy to reduction proofs. To you, the ‘breaker’ algorithm is a black box. You can see inputs go in, and see what comes out, but you can’t see what it’s doing in between. If your reduction depends on peeking at its dirty laundry, you’d better come up with a better reduction. It does its own laundry.

Back to the Random Oracle Model

The Random Oracle Model is different. In my previous post I explained that it’s just like the real world, with one exception: nobody can compute hashes by themselves. Every party — including the adversary (synonymous with ‘breaker’ algorithm) — must hash by sending messages to a special outside party. This party computes the hashes using a random function, and sends them back to the caller.

To make this abundantly clear, in the Random Oracle Model, everyone sends out their laundry.

The Random Oracle Model. The oracle computes hashes for all parties, including the adversary.

Speaking more formally, in the Random Oracle Model, an adversary (aka our ‘breaker’ algorithm) is no longer quite a black box. In one very important way we have visibility into it. It must call outside of itself if it ever wants to hash something.

If we’re building a reduction, that means we can take advantage of the ‘extra’ info that leaks from these calls, just as our gossip reporter could take advantage of celebrities’ laundry. We know that the algorithm has to make these hash calls, so there must exist a ‘port’ in the API for it to make them. Once we make this stipulation, our reduction can tap into this port and intercept the adversary’s hash calls to see which messages it’s asking for, and possibly even tamper with the results it gets back.

In an extreme case, our reduction can even run its own fake ‘oracle’ (aka laundry service), as long as we always hand back convincing responses.
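
Running your own oracle is easy, by the way. Here’s a minimal sketch of the usual lazy-sampling trick (my own illustration):

import os

class FakeRandomOracle:
    """Answer hash queries with fresh random values, consistently:
    the same query always gets the same answer."""
    def __init__(self, output_len=32):
        self.table = {}
        self.output_len = output_len

    def query(self, message):
        if message not in self.table:
            self.table[message] = os.urandom(self.output_len)   # sample on first use
        return self.table[message]

The point is that the reduction controls both sides of this exchange: it sees exactly which messages the adversary asks about (they’re all sitting in that table), and nothing stops it from handing back carefully-crafted values instead of random ones.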

And while this may not seem a big deal, it really is. Just as Brangelina’s dirty laundry could reveal major details about the stars’ private life, the choice of which messages a ‘breaker’ algorithm chooses to hash can reveal major details about what it’s thinking internally. By tampering with those responses we can also fundamentally change its operation. These details can be the difference between a reduction proof that works, and one that doesn’t.

Taking Stock

At this point, I think it’s worth taking a breath and asking what we’ve gotten ourselves into here. When we started with this nonsense, all we wanted was a better hash function. One that looked slightly more ‘random’ than any real function we could construct.

Along the way we realized that random functions were totally impractical. Even to pretend we could use them, we had to tamper with our computational model and “pull the random function” outside of the parties. After all, they run in polynomial time and couldn’t possibly evaluate such a function.

Each of these steps seemed perfectly reasonable, and yet somewhere we jumped the shark. We took the most sacrosanct principle of reductionist security and violated it. Instead of “our argument works with any adversary”, we now say “our argument works only with adversaries who kindly tell us whatever and whenever they need to hash”.

Real adversaries don’t work like this. If someone comes up with an algorithm that breaks an implementation of RSA-OAEP that uses a real hash function (say SHA-256), that algorithm won’t need to tell us what it’s hashing. It will just do it internally. And if its code is obfuscated, we won’t even be able to connect it to our reduction, if our reduction depends on knowing (or tampering with) the hash calls.

An Example: RSA Signing

At this point I’ve run out of ways to talk about the Random Oracle Model without actually invoking a real scheme. So let’s bring this back to something concrete.

The RSA scheme is probably the most famous cryptosystem of the modern age. Many people mistakenly believe that RSA is based on the hardness of factoring large integers. In fact, that’s not quite accurate. RSA is actually based on a slightly different, but related assumption not-creatively named the “RSA assumption“.

This assumption holds that the following problem is fundamentally hard, at least when the numbers get sufficiently large.1 Given three values (N, e, M), where:

  1. N is the product of two prime numbers (p, q) that you don’t know.
  2. 2 < e < N and gcd(e, (p-1)(q-1)) = 1.
  3. 0 <= M < N.

The problem is to find a value “S” such that S^e mod N = M.  2

It’s conjectured that this problem is hard1 if you only have the values given above. However, if you know (p, q) or an additional ‘secret key’ derived from those values, it’s quite easy to solve.

The RSA signature scheme depends on this. In a textbook RSA signature, you “sign” a message by first encoding it as an integer m, such that 0 <= m < N.  Then, using your secret key you compute a ‘signature’ “s”, such that s^e mod N = m. (See here for the exact algorithm.)
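
With absurdly small toy numbers (please never use numbers like these for anything real), textbook RSA signing looks like this:

# Toy parameters for illustration only: p = 61, q = 53, so N = 3233 and phi(N) = 3120.
N, e = 3233, 17        # public key
d = 2753               # secret exponent: (17 * 2753) mod 3120 == 1

m = 1234               # 'message', already encoded as an integer with 0 <= m < N
s = pow(m, d, N)       # sign: s = m^d mod N

assert pow(s, e, N) == m   # verify: s^e mod N must equal m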

The intuitive argument for RSA signatures rests on three legs:

  1. Since only you know the secret key, it’s easy for you to “sign” messages of your choice.
  2. Anybody else can verify your signatures given (m, s) and the “public key” (N, e).
  3. Intuitively, nobody else can sign messages on your behalf (this is known as ‘forging’), because doing so without the secret key would be equivalent to solving the RSA problem.

This is, of course, total nonsense.

If you look carefully, what the RSA assumption actually implies is that it’s hard to come up with a “signature” (S) on a particular message (M). If the adversary has the freedom to choose the message himself, then it’s easy for him to come up with a whole bunch of ‘signed messages’. This makes textbook RSA signatures pretty useless.3
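
To see just how cheap such a ‘forgery’ is (footnote 3 spells this out), continue with the toy numbers from above; the secret key is never needed:

N, e = 3233, 17                  # the same toy public key; no secret key required

s_forged = 42                    # pick literally any 'signature' you like
m_forged = pow(s_forged, e, N)   # work backwards to the 'message' it signs

assert pow(s_forged, e, N) == m_forged   # a perfectly valid (message, signature) pair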

More importantly, to prove that RSA signatures are secure under the RSA assumption, we can’t just have the adversary spitting out any old message/signature pair. To make a reduction work, we must use the adversary to solve a specific RSA problem instance (N, e, M). That means whatever the adversary outputs, we must be able to use that value in order to compute an “S” that satisfies S^e mod N = M.

And here, finally, is where the hash functions come through for us.

Adding a Hash

I’ve mentioned previously that most signature schemes don’t literally sign a raw message. Rather, they sign the hash of a message. This is convenient, since messages can be large and hashes are usually small. But in the Random Oracle Model it has a special significance.

Let’s say that we have a hash function H() with the odd property that it outputs an integer between 0 and N-1. This is called a “full domain hash”. This is a little unusual for a hash function, but I promise that such things can be built out of more “traditional” hash functions like SHA.

Instead of signing the message directly, we could have our signer first hash the message (which can now be an arbitrary string), and sign the resulting hash. More formally, a signature “s” must now satisfy:

s^e mod N = H(message)

Can we prove that this scheme is secure under the RSA assumption?

A proof sketch

Imagine that A is some algorithm that breaks the signature scheme, meaning that it takes in a public key (N, e), and eventually outputs an “existential forgery”, i.e., any pair (message, s’) that satisfies the verification equation above.

Our goal is to design a solver algorithm that can take in an arbitrary RSA problem instance (N, e, M) and somehow force A to help us solve it, eventually spitting out a value S where S^e mod N = M.

First, our solver will send A a “public key” consisting of (N, e). Since we’re in the Random Oracle Model, A will eventually have to hash something. Whenever A asks to hash a message, we’ll intercept its attempt to call the oracle, pick a random value 0 <= r < N and answer its query as follows:

H(message) = M * r^e mod N

I will claim that since “r” is random, then our “hash” result is also randomly-distributed, and thus “looks right” to A. So he won’t twig to what we’re doing. Most importantly, we’ve managed to tie every single hash value back to the one value “M” that we took from the RSA problem instance. Of course, we’ll also store each value “r” so we can retrieve it later.

We assumed that eventually A will output a valid “forged” message/signature pair of the form (message, s’), where this pair satisfies s’^e mod N = H(message). If this is the case, then all we have to do is look up the “r” value used to compute H(message) and output:

S = s’ / r mod N         (or more accurately, s’ * inverse(r) mod N)

This is the answer to our RSA problem instance, since S^e mod N = M. 4, 5

Why does this work? Well, let’s simplify the equation “s’^e mod N = H(message)” a bit:

s’^e mod N  =  H(message)
s’^e mod N  =  M * r^e mod N       (this is the hash we computed)
s’          =  M^{1/e} * r mod N   (raising both sides to the power 1/e)
s’          =  S * r mod N         (because S = M^{1/e} mod N)
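
For the concretely-minded, here’s roughly what the reduction’s bookkeeping looks like in code. This is a sketch with my own names, over numbers small enough to play with; a real proof also has to answer the adversary’s signing queries (see footnote 5):

import secrets

class FDHReduction:
    # Given an RSA instance (N, e, M), play the random oracle for the forger A,
    # embedding M into every hash answer.
    def __init__(self, N, e, M):
        self.N, self.e, self.M = N, e, M
        self.r_for = {}                        # remember the r used for each queried message

    def hash_query(self, message):
        # Answer A's oracle call with H(message) = M * r^e mod N for a fresh random r.
        if message not in self.r_for:
            self.r_for[message] = secrets.randbelow(self.N - 2) + 2
        r = self.r_for[message]
        return (self.M * pow(r, self.e, self.N)) % self.N

    def extract(self, message, s_forged):
        # Given A's forgery (message, s'), output S = s' * r^{-1} mod N, which
        # satisfies S^e mod N = M (r is invertible with overwhelming probability).
        r = self.r_for[message]
        return (s_forged * pow(r, -1, self.N)) % self.N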

A full version of this proof can be found here. None of it works in the real world, of course. The minute you sub in a real hash function, this whole proof is bunk.

A few notes and a conclusion

This has been a long post; even my footnotes are getting out of control. I do hope I was able to make my main point, which is that the Random Oracle Model is both deeply unnatural, and deeply useful to cryptographers writing security proofs.

This is a dangerous combination, since it allows us to take shortcuts and make assumptions that simply can’t be supported in the real world we live in. And yet, it’s so powerful that we can find it hard to stop.

More to the point, even when cryptographers stop relying on this crutch (and it’s started to happen), we have a hard time convincing end users to get rid of schemes that have been proven in the ROM. Heck, we have a hard time convincing our friends in the systems security community. They think we’re nuts.

And it’s hard to blame them. Even when we have proper replacements, they’re usually funny looking, almost always more “costly”, and sometimes rest on bizarre new mathematical assumptions that people don’t trust. Plus, it’s not like the random oracle schemes are broken.

In my next post I’m going to talk about all this, pull back from the crypto a bit, and focus on this dispute. I’ll also talk a bit about the state of the art in ROM/non-ROM schemes.

And oh yes, I’ll explain why cryptographers have started to get religion on this point.  Hint: someone proved something. Something very bad for the ROM.

Click here for the next post in this series.

Notes:

1. More formally, saying that a problem is “hard” means that all probabilistic polynomial time Turing machines solve it with at most negligible probability.

2. For a quick explanation of modular arithmetic, see here.

3. For example, he can pick an arbitrary number as “S”, then — using only the public key — work backwards to compute “M” as M = S^e mod N. This pair (M, S) is a valid message/signature pair. Similarly, he can mangle legitimate message/signature pairs in various ways, to get new ones.

4. In practice, there’s no division mod N. Instead, we compute the inverse of “r” mod N, then multiply by that number.

5. In a real proof, we would also consider the possibility that the adversary can ask for and obtain signatures on chosen messages. Our reduction would have to answer these requests. This can be done, but it adds a bit of complexity. See the full proof for some details on this process.