Neat research ideas that went nowhere: Preventing offline dictionary attacks with CAPTCHAs

Once in a while I come across a research paper that’s so off the beaten path and so obviously applicable to a real problem, that I know it’s going to sink like a stone and never be heard from again.

Today I’d like to talk about an idea that was introduced back in CRYPTO 2006, but doesn’t seem to have gotten any real traction out in the real world since. (If I’m wrong and someone is using it, feel free to email me. I’d be happy to eat my words and hear more.)

This particular idea comes from a 2006 paper by Ran Canetti, Shai Halevi and Michael Steiner,* and it addresses a very real problem that we have with deployed encryption schemes: namely, they tend to rely on passwords. This means that even if you use strong cryptography, all of your data is vulnerable to someone with a password dictionary and a lot of computing power.

Password-Based Encryption (and why we hate it)

To explain how Canetti, Halevi and Steiner propose to deal with this, I need to give a little bit of background on password-based encryption.

First of all of all, no matter what Dan Brown says, we’ve gotten pretty good at building encryption schemes that stand up against very motivated adversaries. If you’re using a modern scheme and are properly generating/storing your secret keys, then it’s awfully unlikely that a third party is going to decrypt your data — even if that party is a government agency.** (I’m leaving aside the wrench factor).

Unfortunately, the previous paragraph embeds one major assumption that makes it inapplicable to the way that many people secure their data. You see, in a cryptographer’s imagination, everyone encrypts with a strong, random 128+ bit secret key, which they memorize and never write down.

But in real life, people mostly prefer to use their cat’s name.

The use of passwords to derive encryption keys is a lot more common than it should be. It’s depressingly common in many applications where people really need security. Passwords are the default authorization method for many fancy encrypting hard-drives and WDE packages. Windows just loves to encrypt things under your login credentials. (Many of these tools also support hardware tokens and TPM, but these are such a drag that most people don’t bother.)

To your attacker, this is a feature not a bug. If a key is derived from dictionary words, or some predictable mutation of dictionary words, then it can be guessed. All it takes is time and  computing power. Moreover, since password guessing is embarrassingly parallel, it’s just the kind of enterprise that you can throw a data center or five at. (Ever wondered why the state of Maryland uses so much electricity?)

So far I haven’t said anything you didn’t already know. But just for fun, let’s quickly talk about what we currently do to prevent dictionary attacks. There are basically three approaches:

  1. Use some trusted, on-line server (or tamper-resistant hardware) to assist with the key derivation process. If done correctly, the server/hardware can shut off an attacker after a few invalid guesses, assuming that it isn’t also compromised. Unfortunately, this solution assumes that the server or hardware will be available when you need your data. That isn’t always possible.
  2. Use key stretching, which increases the time required to convert a password into a cryptographic key. This can slow down a dictionary attack, but it just slows it down. Moreover, it’s pretty ineffective against an attacker with specialized hardware.
  3. Use salt, which is hardly worth mentioning since if you aren’t salting your passwords then you’re crazy. Unfortunately all this does is prevent pre-computation. It does nothing against normal dictionary attacks.

The human touch

In a perfect world, the online solution (1) above would probably be the best way to go. Unfortunately, we don’t live in a perfect world, and this may not be compatible with people’s requirements.

While it’s not possible to prevent dictionary attacks in an offline system, we may be able to slow them down. To do this, we have to make the decryption process much more resource-intensive. Unfortunately, it’s hard to win this game through computation alone, at least not in a world full of FPGAs and elastic compute clouds.

Not this sort of brains. (XKCD).

What Canetti, Halevi and Steiner realized is that there’s one resource that the adversary might not have in abundance: human brains.

What if we could mandate some sort of human interaction each time a user tries to decrypt a file? This shouldn’t be too burdensome for the legitimate decryptor, since presumably he’s human, and moreover he only needs to try once.

But a password cracker needs to try millions of passwords. Requiring a human interaction for each attempt should throw a serious wrench into the process.

Using Human Puzzles to stop dictionary attacks

Fortunately we already have a whole class of technologies that differentiate human beings from computers. Generally, these are known as Reverse Turing Tests (RTTs). Canetti, Halevi and Steiner propose to use RTTs to construct “human puzzles”. At a high level, the idea is pretty simple.

In every PBE scheme, a password must first be converted into a key. Normally you do this by hashing the password and some salt via a standard key derivation function (e.g., PBKDF2). Usually this is where you’d stop. The output of this function, which we’ll call K, would be the key used to encrypt and decrypt.

However, now we’re going to take things one step further. The first key will be further processed by converting it into a puzzle that only a human can solve. When a human solves this puzzle, they create a new key that, ideally, a computer all by itself should not be able to recover. The process should be reliable — it needs to produce the same result every time it’s run, possibly across many different users.

Once you get past the theoretical stuff, the concrete example that Canetti, Halevi and Steiner propose is to use CAPTCHAs to implement human puzzles.

CAPTCHAs: don’t you love them?

In their example, the process of deriving an encryption key from a password now looks like this:

  1. First hash the password with some random salt r to derive a first key K. Now break this into a bunch of small chunks, each (for example) about 20 bits long.
  2. Use each of these 20-bit chunks to index into a huge table containing 2^20 (about 1 million) distinct CAPTCHA images. Each CAPTCHA contains a hard-to-read string with at least 20 bits of entropy in it. This string could be a bunch of numbers, or some randomly-selected dictionary words.
  3. Give each CAPTCHA to the human user to solve. Collect the results.
  4. Hash all of the user’s responses together with another salt s to derive a final decryption key.

If we instantiate the hash function with, say, PBKDF2 and a 160-bit output, then this requires the legitimate user to solve 8 puzzles the first time they encrypt, and every subsequent time they decrypt.

The dictionary attacker, however, has to solve eight puzzles for every password they try. If we assume that they have to guess a large number of passwords, this means that over time they’ll have to solve a substantial fraction (if not all) of the 1 million CAPTCHAs. Even if they can outsource this work at a rate of 5 cents per CAPTCHA, solving the whole collection will still cost $50,000 and a whole lot of hassle.

(This is a hugely simplified analysis. Obviously the attacker doesn’t have to solve all of the CAPTCHAs. Rather, their probability of guessing the key is a function of the number of CAPTCHAs, the fraction of the CAPTCHAs that they’ve been able to solve, the complexity of the password, etc. See the full paper for a much better analysis.)

The sticky wicket

Now, this really might work, in the sense that it will slow down attacks. But if you’re a practical-minded sort, you probably have one small question about this proposal. Namely: where do the CAPTCHAs come from?

Well, a simple answer is that you generate them all randomly the first time you encrypt, and you ship them along with the encrypted file. If we assume that each CAPTCHA is a 4KB image file, then storing the entire collection will require about 4GB.

That’s seems like a lot of space, but really it isn’t. After all, it’s not like we’re working with floppy drives. We live in a world where a 16GB USB Flash Drive can be had for about $10. So it’s perfectly reasonable to assume that we’ll have lots storage for things we really care about.

In summary

Now maybe CAPTCHAs aren’t your thing. Maybe you don’t like the idea of storing 4GB worth of files just to ship someone an encrypted ZIP file. That’s fine. One of the neatest open problems in the paper is to find new sorts of human puzzles. In general, a human puzzle is any function that converts some input (say a 20-bit string) into an output, such that only a human being can do the transformation.

These don’t need to rely on bulky, stored CAPTCHAs. Hypothetically you could come up with a human puzzle that dynamically generates instances without a big stored database. For example, your puzzle might be based on recognizing human expressions, or inkblots. It just has to be reliable.

And that’s why I think this is such a neat paper. Whether or not this particular idea changes the world, it’s a new idea, it tries to solve a hard practical problem that most people have given up on, and it inspires some new ideas.

The crypto publishing game rarely produces papers with that particular combination. Just for that, I’d love to see it go somewhere.


* Ran Canetti, Shai Halevi and Michael Steiner. 2006. Mitigating Dictionary Attacks on Password-Protected Local Storage. In CRYPTO ’06.

** Leaving aside what we all know the NSA’s got in its basement.