Zero Knowledge Proofs: An illustrated primer, Part 2

This post is the second in a two-part series on zero-knowledge proofs. Click here t2380271980_b2a66bd47d_zo read Part 1.

In this post I’m going to continue the short, (relatively) non-technical overview of zero knowledge proofs that I started a couple of years ago. Yes, that was a very long time! If you didn’t catch the first post, now would be an excellent time to go read it.

Before we go much further, a bit of a warning. While this series is still intended as a high-level overview, at a certain point it’s necessary to dig a bit deeper into some specific algorithms. So you should expect this post to get a bit wonkier than the last.

A quick recap, and a bit more on Zero Knowledge(ness)

First, a brief refresher.

In the last post we defined a zero knowledge proof as an interaction between two computer programs (or Turing machines) — respectively called a Prover and a Verifier — where the Prover works to convince the Verifier that some mathematical statement is true. We also covered a specific example: a clever protocol by Goldreich, Micali and Wigderson that allows us to prove, in zero knowledge, that a graph possesses a three-coloring.

In the course of that discussion, we described three critical properties that any zero knowledge proof must satisfy:

  • Completeness: If the Prover is honest, then she will eventually convince the Verifier.
  • Soundness: The Prover can only convince the Verifier if the statement is true.
  • Zero-knowledge(ness): The Verifier learns no information beyond the fact that the statement is true.

The real challenge turns out to be finding a way to formally define the last property. How do you state that a Verifier learns nothing beyond the truth of a statement?

In case you didn’t read the previous post — the answer to this question came from Goldwasser, Micali and Rackoff, and it’s very cool. What they argued is that a protocol can be proven zero knowledge if for every possible Verifier, you can demonstrate the existence of an algorithm called a ‘Simulator’, and show that this algorithm has some very special properties.

From a purely mechanical perspective, the Simulator is like a special kind of Prover. However, unlike a real Prover — which starts with some special knowledge that allows it to prove the truth of a statement — the Simulator gets no special knowledge at all.* Nonetheless, the Simulator (or Simulators) must be able to ‘fool’ every Verifier into believing that the statement is true, while producing a transcript that’s statistically identical top (or indistinguishable from) the output of a real Prover.

The logic here flows pretty cleanly: since Simulator has no ‘knowledge’ to extract in the first place, then clearly a Verifier can’t obtain any meaningful amount of information after interacting with it. Moreover, if the transcript of the interaction is distributed identically to a real protocol run with a normal Prover, then the Verifier can’t do better against the real prover than it can do against the Simulator. (If the Verifier could do better, then that would imply that the distributions were not statistically identical.) Ergo, the Verifier can’t extract useful information from the real protocol run.

This is incredibly wonky, and worse, it seems contradictory! We’re asking that a protocol be both sound — meaning that a bogus Prover can’t trick some Verifier into accepting a statement unless it has special knowledge allowing it to prove the statement — but we’re also asking for the existence of an algorithm (the simulator) that can literally cheat. Clearly both properties can’t hold at the same time.

The solution to this problem is that both properties don’t hold at the same time.

To build our simulator, we’re allowed to do things to the Verifier that would never happen in the real world. The example that I gave in the previous post was to use a ‘time machine’ — that is, our ‘Simulator’ can rewind the Verifier program’s execution in order to ‘fool’ it. Thus, in a world where we can wind the Verifier back in time, it’s easy to show that a Simulator exists. In the real world, of course it doesn’t. This ‘trick’ gets us around the contradiction.

As a last reminder, to illustrate all of these ideas, we covered one of the first general zero knowledge proofs, devised by Goldreich, Micali and Wigderson (GMW). That protocol allowed us to prove, in zero knowledge, that a graph supports a three-coloring. Of course, proving three colorings isn’t terribly interesting. The real significance of the GMW result is theoretical. Since graph three coloring is known to be in the complexity class NP-complete, the GMW protocol can be used to prove any statement in the class NP. And that’s quite powerful.

Let me elaborate slightly on what that means:

  1. If there exists any decision problem (that is, a problem with a yes/no answer) whose witness (solution) can be verified in polynomial time, then:
  2. We can prove that said solution exists by (1) translating the problem into an instance of the graph three-coloring problem, and (2) running the GMW protocol.*

This amazing result gives us interactive zero knowledge proofs for every statement in NP. The only problem is that it’s almost totally unusable.

From theory into practice

If you’re of a practical mindset, you’re probably shaking your head at all this talk of ZK proofs. That’s because actually using this approach would be an insanely expensive and stupid thing to do. Most likely you’d first represent your input problem as a boolean circuit where the circuit is satisfied if and only if you know the correct input. Then you’d have to translate your circuit into a graph, resulting in some further blowup. Finally you’d need to run the GMW protocol, which is damned expensive all by itself.

So in practice nobody does this. It’s really considered a ‘feasibility’ result. Once you show that something is possible, the next step is to make it efficient.

But we do use zero knowledge proofs, almost every day. In this post I’m going to spend some time talking about the more practical ZK proofs that we actually use. To do that I just need give just a tiny bit of extra background.

Proofs vs. Proofs of Knowledge

Before we go on, there’s one more concept we need to cover. Specifically, we need to discuss what precisely we’re proving when we conduct a zero knowledge proof.

Let me explain. At a high level, there are two kinds of statement you might want to prove in zero knowledge. Roughly speaking, these break up as follows.

Statements about “facts”. For example, I might wish to prove that “a specific graph has a three coloring” or “some number N is in the set of composite numbers“. Each of these is a statement about some intrinsic property of the universe.

Statements about my personal knowledge. Alternatively, I might wish to prove that I know some piece information. Examples of this kind of statement include: “I know a three coloring for this graph”, or “I know the factorization of N”. These go beyond merely proving that a fact is true, and actually rely on what the Prover knows.

It’s important to recognize that there’s a big difference between these two kinds of statements! For example, it may be possible to prove that a number N is composite even if you don’t know the full factorization. So merely proving the first statement is not equivalent to proving the second one.

The second class of proof is known as a “proof of knowledge”. It turns out to be extremely useful for proving a variety of statements that we use in real life. In this post, we’ll mostly be focusing on this kind of proof.

The Schnorr identification protocol

Now that we’ve covered some of the required background, it’s helpful to move on to a specific and very useful proof of knowledge that was invented by Claus-Peter Schnorr in the 1980s. At first glance, the Schnorr protocol may seem a bit odd, but in fact it’s the basis of many of our modern signature schemes today.

Schnorr wasn’t really concerned with digital signatures, however. His concern was with identification. Specifically, let’s imagine that Alice has published her public key to the world, and later on wants to prove that she knows the secret key corresponding to that public key. This is the exact problem that we encounter in real-world protocols such as public-key SSH, so it turns out to be well-motivated.

Schnorr began with the assumption that the public key would be of a very specific format. Specifically, let p be some prime number, and let g be a generator of a cyclic group of prime-order q. To generate a keypair, Alice would first pick a random integer a between 1 and q, and then compute the keypair as:

PK_{A} = g^a~mod~p, SK_{A} = a

(If you’ve been around the block a time or two, you’ll probably notice that this is the same type of key used for Diffie-Hellman and the DSA signing algorithm. That’s not a coincidence, and it makes this protocol very useful.)

Alice keeps her secret key to herself, but she’s free to publish her public key to the world. Later on, when she wants to prove knowledge of her secret key, she conducts the following simple interactive protocol with Bob:

There’s a lot going on in here, so let’s take a minute to unpack things.

First off, we should ask ourselves if the protocol is complete. This is usually the easiest property to verify: if Alice performs the protocol honestly, should Bob be satisfied at the end of it? In this case, completeness is pretty easy to see just by doing a bit of substitution:

Proving soundness

The harder property is soundness. Mainly because we don’t yet have a good definition of what it means for a proof of knowledge to be sound. Remember that what we want to show is the following:

If Alice successfully convinces Bob, then she must know the secret key a.

It’s easy to look at the equations above and try to convince yourself that Alice’s only way to cheat the protocol is to know a. But that’s hardly a proof.

When it comes to demonstrating the soundness of a proof of knowledge, we have a really nice formal approach. Just as with the Simulator we discussed above, we need to demonstrate the existence of a special algorithm. This algorithm is called a knowledge extractor, and it does exactly what it claims to. A knowledge extractor (or just ‘Extractor’ for short) is a special type of Verifier that interacts with a Prover, and — if the Prover succeeds in completing the proof — the Extractor should be able to extract the Prover’s original secret.

And this answers our question above. To prove soundness for a proof of knowledge, we must show that an Extractor exists for every possible Prover.

Of course this again seems totally contradictory to the purpose of a zero knowledge protocol — where we’re not supposed to be able to learn secrets from a Prover. Fortunately we’ve already resolved this conundrum once for the case of the Simulator. Here again, we take the same approach. The Extractor is not required to exist during a normal run of the protocol. We simply show that it exists if we’re allowed to take special liberties with the Prover — in this case, we’ll use ‘rewinding’ to wind back the Prover’s execution and allow us to extract secrets.

The extractor for the Schnorr protocol is extremely clever — and it’s also pretty simple. Let’s illustrate it in terms of a protocol diagram. Alice (the Prover) is on the left, and the Extractor is on the right:

The key observation here is that by rewinding Alice’s execution, the Extractor can ‘trick’ Alice into making two different proof transcripts using the same k. This shouldn’t normally happen in a real protocol run, where Alice specifically picks a new k for each execution of the protocol.

If the Extractor can trick Alice into doing this, then he can solve the following simple equation to recover Alice’s secret:

It’s worth taking a moment right now to note that this also implies a serious vulnerability in bad implementations of the Schnorr protocol. If you ever accidentally use the same k for two different runs of the protocol, an attacker may be able to recover your secret key! This can happen if you use a bad random number generator.

Indeed, those with a bit more experience will notice that this is similar to a real attack on systems (with bad random number generators) that implement ECDSA or DSA signatures! This is also not a coincidence. The (EC)DSA signature family is based on Schnorr. Ironically, the developers of DSA managed to retain this vulnerability of the Schorr family of protocols while at the same time ditching the security proof that makes Schnorr so nice.

Proving zero-knowledge(ness) against an honest Verifier

Having demonstrated that Schnorr signatures are complete and sound, it remains only to prove that they’re ‘zero knowledge’. Remember that to do this, normally we require a Simulator that can interact with any possible Verifier and produce a ‘simulated’ transcript of the proof, even if the Simulator doesn’t know the secret it’s proving it knows.

The standard Schnorr protocol does not have such a Simulator, for reasons we’ll get into in a second. Instead, to make the proof work we need to make a special assumption. Specifically, the Verifier needs to be ‘honest’. That is, we need to make the special assumption that it will run its part of the protocol correctly — namely, that it will pick its challenge “c” using only its random number generator, and will not choose this value based on any input we provide it. As long as it does this, we can construct a Simulator.

Here’s how the Simulator works.

Let’s say we are trying to prove knowledge of a secret a for some public key g^a~mod~pbut we don’t actually know the value aOur Simulator assumes that the Verifier will choose some value c as its challenge, and moreover, it knows that the honest Verifier will choose the value c only based on its random number generator — and not based on any inputs the Prover has provided.

  1. First, output some initial g^{k_1} as the Prover’s first message, and find out what challenge c the Verifier chooses.
  2. Rewind the Verifier, and pick a random integer z in the range \{0,\dots,q-1\}.
  3. Compute g^{k_2} = g^z * g^{a (-c)} and output g^{k_2} as the Prover’s new initial message.
  4. When the Verifier challenges on c again, output z.
Notice that the transcript g^k, c, z will verify correctly as a perfectly valid, well-distributed proof of knowledge of the value a. The Verifier will accept this output as a valid proof of knowledge of a, even though the Simulator does not know a in the first place!
What this proves is that if we can rewind a Verifier, then (just as in the first post in this series) we can always trick the Verifier into believing we have knowledge of a value, even when we don’t. And since the statistical distribution of our protocol is identical to the real protocol, this means that our protocol must be zero knowledge — against an honest Verifier.

From interactive to non-interactive

So far we’ve shown how to use the Schnorr protocol to interactively prove knowledge of a secret key a that corresponds to a public key g^{a}. This is an incredibly useful protocol, but it only works if our Verifier is online and willing to interact with us.

An obvious question is whether we can make this protocol work without interaction. Specifically, can I make a proof that I can send you without you even being online. Such a proof is called a non-interactive zero knowledge proof (NIZK). Turning Schnorr into a non-interactive proof seems initially quite difficult — since the protocol fundamentally relies on the Verifier picking a random challenge. Fortunately there is a clever trick we can use.

This technique was developed by Fiat and Shamir in the 1980s. What they observed was that if you have a decent hash function lying around, you can convert an interactive protocol into a non-interactive one by simply using the hash function to pick the challenge.

Specifically, the revised protocol for proving knowledge of a with respect to a public key g^a looks like this:

  1. The Prover picks g^k (just as in the interactive protocol).
  2. Now, the prover computes the challenge as c = H(g^k || M) where H() is a hash function, and M is an (optional) and arbitary message string.
  3. Compute ac + k~mod~q (just as in the interactive protocol).

The upshot here is that the hash function is picking the challenge c without any interaction with the Verifier. In principle, if the hash function is “strong enough” (meaning, it’s a random oracle) then the result is a completely non-interactive proof of knowledge of the value a that the Prover can send to the Verifier. The proof of this is relatively straightforward.

The particularly neat thing about this protocol is that it isn’t just a proof of knowledge, it’s also a signature scheme. That is, if you put a message into the (optional) value M, you obtain a signature on M, which can only be produced by someone who knows the secret key a. The resulting protocol is called the Schnorr signature scheme, and it’s the basis of real-world protocols like EdDSA.

Phew.

Yes, this has been a long post and there’s probably a lot more to be said. Hopefully there will be more time for that in a third post — which should only take me another three years.
Notes:


* In this definition, it’s necessary that the statement be literally true.

How do we pay for privacy?

 

If you haven’t read Julia Angwin’s excellent profile of GnuPG’s lead developer Werner Koch, now would be a great time to check it out. Koch, who single-handedly wrote GnuPG in 1997, has been doggedly maintaining the codebase ever since — and not getting paid very well for it. Despite good intentions on all sides, Koch has been essentially going broke.

The news is not all bad. In response to Angwin’s piece, ‘the Internet’ rallied to GnuPG’s aid. So far individual and corporate donors have coughed up over EU 200,000 to pay Koch and even hire him some help. It looks like GnuPG is saved, and so are its users — for the moment.

But is this model really sustainable? I’m pretty skeptical.

Sooner or later this money will run out. And next time this happens, the Internet might not have a quarter million in spare change to fork over. In fact, that’s already the normal state of affairs for most privacy tools — software ranging from GPGTools to OTR — most of which subsist on meager donations and volunteer time. The scary part is that thousands of people depend on these tools for their privacy and even their physical safety.

This raises a question: how can we support the long-term development and maintenance of privacy tools? It turns out that nobody really knows the answer to this — but a few people are groping towards a solution. In this (entirely non-technical) post I’m going to talk a bit about what people are doing today — and how we might do better in the future.NB: I should mention that most of the smart ideas in this post come from Meredith Whittaker, who leads Google’s Open Source Research Group, and helped found Simply Secure. The dumb ideas are all mine.

How we support privacy tools today

If you’re developing, or are interested in developing privacy tools in 2015, there are a handful of funding sources for you to choose from. They include:

  1. Self-funding. The vast majority of privacy tools come from engineers working in their spare time. This is a great way to develop prototypes and simple tools, but it tends not to be very sustainable — particularly when developers have to choose between family and code maintenance.
  2. Donations, charges and crowd-funding. A few major projects pull in some cash through donations, but this seems to work well only during a catastrophe or when the spotlight is on a particular project. GnuPG, for example, made a ton of money following their recent publicity, but before that they averaged a minuscule $20k per year — and this is for one of the most widely used privacy tools on the Internet! Projects like GPG Tools recently began charging for their software, which may work a bit better. Unfortunately this is anathema for young projects that rely on network effect for their success.
  3. Industry grants. From time to time major corporations give out modest chunks of money to tool developers, particularly when those companies use tools internally. Case in point: Google Stripe and Facebook just gave $50k each to GnuPG, and the Linux Foundation Core Infrastructure Initiative (essentially an industry funding group*) kicked in an additional $60k. Unfortunately this money is tragically difficult to come by — for the average developer it might as well not exist.
  4. Government and NGOs. Perhaps the most promising source of money comes from the U.S. government, and a set of NGOs that have sprung up to disburse it. The State Dept. directly funds the Tor Project, and the government also provides block grants to groups such as the Open Technology Fund (via Radio Free Asia) and the — confusingly similar — Open Technology Institute (at New America Foundation). OTF in particular has done a phenomenal job at funding both development and audit of privacy tools.
  5. Internal industry funding. Once in a blue moon a company proposes to internally develop a privacy tool like Google/Yahoo End-to-End. I’ll believe this works when I see it.
  6. Academic research funding. A few academic tools have managed to slip into this space, most notably OTR and Tor. But this model is awfully hard to sustain, mostly because academics don’t get paid to do things. We get paid to teach and write papers. It’s hard to sustain software development this way.
  7. Bitcoin wallet theft. This is mostly a joke.
Of these funding sources, the U.S. government is by far the heaviest hitter — responsible for funding many well-known projects such as Tor and TextSecure. While I tend to think any money is good money in the hands of right people, I should point out that this view is not universally shared. In part this is because we, admittedly, don’t have much of a process to validate code and convince non-experts that this process isn’t producing compromised code.

As Jillian York points out, US government funding also comes with some political baggage, and sadly, tends to attract more than its fair share of paranoia.

Developers need more than just money!

If you give a starving privacy tool a million bucks, you get a well-fed privacy tool. Unfortunately it may not actually be a better privacy tool. That’s not because people are trying to waste your cash. It turns out that software development, crypto, and security are just plain hard.

So yes, people need to eat — that’s a baseline. But beyond that what developers also need are things like expert guidance, security audits, analysis tools, and collaboration with other developers. They also really need help with the hardest problem in computer science, which is turning a pile of code into a product that people want to use.

A few groups like OTF (in case you’re not getting the hint, I really like them*) are trying to help with some of this. They fund professional code audits through groups like iSEC Partners. They fund usability resources and provide communications help. They host regular meetings where project members can get together and dork out about how to handle encrypted spam. A friend calls this a ‘dog park for developers who haven’t been outside for a while.’ This sounds silly, but it really, really helps.

Beyond those things, there’s still an awful problem of coordinating all of this technical stuff so that auditing results adhere to some consistent standard and produce knowledge that gets retained within the organization, as well as seeing that tools get proper academic analysis where necessary. And finally, there are useful services such as connecting developers with UI designers and helping them to turn their tools into real, usable products.

A few groups have done well at this all on their own. Moxie’s Open Whisper Systems not only launched a popular messaging app, but managed to get TextSecure (the protocol) into 600 million WhatsApp clients. Unfortunately this kind of success doesn’t come easy to people and requires a lot of assistance. Institutions can really help.

How can we do better?

There are a lot of answers to this question. But since this is a blog post, let’s swing for the fences. What’s really needed is a privacy incubator. A place that provides both funding (or at least, guides funding) as well as in-house technical staff and researchers, non-technical help such a communications, infrastructure, a great advisory board, and access to tools and engineers.

In essence this center would combine all the best parts of NGOs, academic institutions, and corporate research into one center. It would help with projects ranging from research to development, and would also provide infrastructure for developers — helping to keep them from re-inventing the wheel with each new idea, and perhaps even helping projects to merge when one has strengths. Connecting them with corporations who could conceivably deploy their tool.

This organization could also provide resources ranging from legal advice to marketing, two areas that software developers are notoriously bad at. It might even provide important, but miscellaneous resources like healthcare.

Please keep in mind I’m not advocating the creation of an entirely new organization — god forbid, we have enough of those already (the XKCD cartoon at right comes to mind). Instead, the goal should be to identify organizations that are already working and either connect that, or build up their capabilities with a large infusion of cash.

Anyway, we can all dream. But this is a dream that would actually make some difference.

So will this happen?

I guess it depends on the will and the money. It also depends on us: that is, on the willingness of the technically focused privacy/security community to accept that many of the elements we need to succeed are outside of our personal realm of expertise, and we need help with them.

Friends of mine also keep telling me that there are major philanthropic organizations out there looking to make a difference in this area. I’m still waiting to see it happen, but wheels turn slowly. One thing I can tell you: it wouldn’t take much to do better than what we have today.

* Full disclosure: I’m on the advisory board for Linux Foundation’s Core Infrastructure Initiative. I also once sat in on an advisory board meeting for OTF. Nobody paid me — but they did feed me lunch.

Noodling about IM protocols

The last couple of months have been a bit slow in the blogging department. It’s hard to blog when there are exciting things going on. But also: I’ve been a bit blocked. I have two or three posts half-written, none of which I can quite get out the door.

Instead of writing and re-writing the same posts again, I figured I might break the impasse by changing the subject. Usually the easiest way to do this is to pick some random protocol and poke at it for a while to see what we learn.

The protocols I’m going to look at today aren’t particularly ‘random’ — they’re both popular encrypted instant messaging protocols. The first is OTR (Off the Record Messaging). The second is Cryptocat’s group chat protocol. Each of these protocols has a similar end-goal, but they get there in slightly different ways.
I want to be clear from the start that this post has absolutely no destination. If you’re looking for exciting vulnerabilities in protocols, go check out someone else’s blog. This is pure noodling.

The OTR protocol

OTR is probably the most widely-used protocol for encrypting instant messages. If you use IM clients like Adium, Pidgin or ChatSecure, you already have OTR support. You can enable it in some other clients through plugins and overlays.

OTR was originally developed by Borisov, Goldberg and Brewer and has rapidly come to dominate its niche. Mostly this is because Borisov et al. are smart researchers who know what they’re doing. Also: they picked a cool name and released working code.

OTR works within the technical and usage constraints of your typical IM system. Roughly speaking, these are:

  1. Messages must be ASCII-formatted and have some (short) maximum length.
  2. Users won’t bother to exchange keys, so authentication should be “lazy” (i.e., you can authenticate your partners after the fact).
  3. Your chat partners are all FBI informants so your chat transcripts must be plausibly deniable — so as to keep them from being used as evidence against you in a court of law.

Coming to this problem fresh, you might find goal (3) a bit odd. In fact, to the best of my knowledge no court in the history of law has ever used a cryptographic transcript as evidence that a conversation occurred. However it must be noted that this requirement makes the problem a bit more sexy. So let’s go with it!

“Dammit, they used a deniable key exchange protocol” said no Federal prosecutor ever.

The OTR (version 2/3) handshake is based on the SIGMA key exchange protocol. Briefly, it assumes that both parties generate long-term DSA public keys which we’ll denote by (pubA, pubB). Next the parties interact as follows:

The OTRv2/v3 AKE. Diagram by Bonneau and Morrison, all colorful stuff added. There’s also
an OTRv1 protocol that’s too horrible to talk about here.

There are four elements to this protocol:

  1. Hash commitment. First, Bob commits to his share of a Diffie-Hellman key exchange (g^x) by encrypting it under a random AES key r and sending the ciphertext and a hash of g^x over to Alice.
  2. Diffie-Hellman Key Exchange. Next, Alice sends her half of the key exchange protocol (g^y). Bob can now ‘open’ his share to Alice by sending the AES key r that he used to encrypt it in the previous step. Alice can decrypt this value and check that it matches the hash Bob sent in the first message.Now that both sides have the shares (g^x, g^y) they each use their secrets to compute a shared secret g^{xy} and hash the value several ways to establish shared encryption keys (c’, Km2, Km’2) for subsequent messages. In addition, each party hashes g^{xy} to obtain a short “session ID”.

    The sole purpose of the commitment phase (step 1) is to prevent either Alice or Bob from controlling the value of the shared secret g^{xy}. Since the session ID value is derived by hashing the Diffie-Hellman shared secret, it’s possible to use a relatively short session ID value to authenticate the channel, since neither Alice nor Bob will be able to force this ID to a specific value.

  3. Exchange of long-term keys and signatures. So far Alice and Bob have not actually authenticated that they’re talking to each other, hence their Diffie-Hellman exchange could have been intercepted by a man-in-the-middle attacker. Using the encrypted channel they’ve previously established, they now set about to fix this.Alice and Bob each send their long-term DSA public key (pubA, pubB) and key identifiers, as well as a signature on (a MAC of) the specific elements of the Diffie-Hellman message (g^x, g^y) and their view of which party they’re communicating with. They can each verify these signatures and abort the connection if something’s amiss.**
  4. Revealing MAC keys. After sending a MAC, each party waits for an authenticated response from its partner. It then reveals the MAC keys for the previous messages.
  5. Lazy authentication. Of course if Alice and Bob never exchange public keys, this whole protocol execution is still vulnerable to a man-in-the-middle (MITM) attack. To verify that nothing’s amiss, both Alice and Bob should eventually authenticate each other. OTR provides three mechanisms for doing this: parties may exchange fingerprints (essentially hashes) of (pubA, pubB) via a second channel. Alternatively, they can exchange the “session ID” calculated in the second phase of the protocol. A final approach is to use the Socialist Millionaires’ Problem to prove that both parties share the same secret.
The OTR key exchange provides the following properties:

Protecting user identities. No user-identifying information (e.g., long-term public keys) is sent until the parties have first established a secure channel using Diffie-Hellman. The upshot is that a purely passive attacker doesn’t learn the identity of the communicating partners — beyond what’s revealed by the higher-level IM transport protocol.*

Unfortunately this protection fails against an active attacker, who can easily smash an existing OTR connection to force a new key agreement and run an MITM on the Diffie-Hellman used during the next key agreement. This does not allow the attacker to intercept actual message content — she’ll get caught when the signatures don’t check out — but she can view the public keys being exchanged. From the client point of view the likely symptoms are a mysterious OTR error, followed immediately by a successful handshake.

One consequence of this is that an attacker could conceivably determine which of several clients you’re using to initiate a connection.

Weak deniability. The main goal of the OTR designers is plausible deniability. Roughly, this means that when you and I communicate there should be no binding evidence that we really had the conversation. This rules out obvious solutions like GPG-based chats, where individual messages would be digitally signed, making them non-repudiable.

Properly defining deniability is a bit complex. The standard approach is to show the existence of an efficient ‘simulator’ — in plain English, an algorithm for making fake transcripts. The theory is simple: if it’s trivial to make fake transcripts, then a transcript can hardly be viewed as evidence that a conversation really occurred.

OTR’s handshake doesn’t quite achieve ‘strong’ deniability — meaning that anyone can fake a transcript between any two parties — mainly because it uses signatures. As signatures are non-repudiable, there’s no way to fake one without actually knowing your public key. This reveals that we did, in fact, communicate at some point. Moreover, it’s possible to create an evidence trail that I communicated with you, e.g., by encoding my identity into my Diffie-Hellman share (g^x). At very least I can show that at some point you were online and we did have contact.

But proving contact is not the same thing as proving that a specific conversation occurred. And this is what OTR works to prevent. The guarantee OTR provides is that if the target was online at some pointand you could contact them, there’s an algorithm that can fake just about any conversation with the individual. Since OTR clients are, by design, willing to initiate a key exchange with just about anyone, merely putting your client online makes it easy for people to fake such transcripts.***

Towards strong deniability. The ‘weak’ deniability of OTR requires at least tacit participation of the user (Bob) for which we’re faking the transcript. This isn’t a bad property, but in practice it means that fake transcripts can only be produced by either Bob himself, or someone interacting online with Bob. This certainly cuts down on your degree of deniability.

A related concept is ‘strong deniability‘, which ensures that any party can fake a transcript using only public information (e.g., your public keys).

OTR doesn’t try achieve strong deniability — but it does try for something in between. The OTR version of deniability holds that an attacker who obtains the network traffic of a real conversation — even if they aren’t one of the participants — should be able alter the conversation to say anything he wants. Sort of.

The rough outline of the OTR deniability process is to generate a new message authentication key for each message (using Diffie-Hellman) and then reveal those keys once they’ve been used up. In theory, a third party can obtain this transcript and — if they know the original message content — they can ‘maul’ the AES-CTR encrypted messages into messages of their choice, then they can forge their own MACs on the new messages.

OTR message transport (source: Bonneau and Morrison, all colored stuff added).

Thus our hypothetical transcript forger can take an old transcript that says “would you like a Pizza and turn it into a valid transcript that says, for example, “would you like to hack STRATFOR“… Except that they probably can’t, since the first message is too short and… oh lord, this whole thing is a stupid idea — let’s stop talking about it.

The OTRv1 handshake. Oh yes, there’s also an OTRv1 protocol that has a few issues and isn’t really deniable. Even better, an MITM attacker can force two clients to downgrade to it, provided both support that version. Yuck.

So that’s the OTR protocol. While I’ve pointed out a few minor issues above, the truth is that the protocol is generally an excellent way to communicate. In fact it’s such a good idea that if you really care about secrecy it’s probably one of the best options out there.

Cryptocat

Since we’re looking at IM protocols I thought it might be nice to contrast with another fairly popular chat protocol: Cryptocat‘s group chat. Cryptocat is a web-based encrypted chat app that now runs on iOS (and also in Thomas Ptacek’s darkest nightmares).

Cryptocat implements OTR for ‘private’ two-party conversations. However OTR is not the default. If you use Cryptocat in its default configuration, you’ll be using its hand-rolled protocol for group chats.

The Cryptocat group chat specification can be found here, and it’s remarkably simple. There are no “long-term” keys in Cryptocat. Diffie-Hellman keys are generated at the beginning of each session and re-used for all conversations until the app quits. Here’s the handshake between two parties:

Cryptocat group chat handshake (current revision). Setting is Curve25519. Keys are generated when the application launches, and re-used through the session.

If multiple people join the room, every pair of users repeats this handshake to derive a shared secret between every pair of users. Individuals are expected to verify each others’ keys by checking fingerprints and/or running the Socialist Millionaire protocol.

Unlike OTR, the Cryptocat handshake includes no key confirmation messages, nor does it attempt to bind users to their identity or chat room. One implication of this is that I can transmit someone else’s public key as if it were my own — and the recipients of this transmission will believe that the person is actually part of the chat.

Moreover, since public keys aren’t bound to the user’s identity or the chat room, you could potentially route messages between a different user (even a user in a different chat room) while making it look like they’re talking to you. Since Cryptocat is a group chat protocol, there might be some interesting things you could do to manipulate the conversation in this setting.****

Does any of this matter? Probably not that much, but it would be relatively easy (and good) to fix these issues.

Message transmission and consistency. The next interesting aspect of Cryptocat is the way it transmits encrypted chat messages. One of the core goals of Cryptocat is to ensure that messages are consistent between individual users. This means that all users should be able to verify that the other user is receiving the same data as it is.

Cryptocat uses a slightly complex mechanism to achieve this. For each pair of users in the chat, Cryptocat derives an AES key and an MAC key from the Diffie-Hellman shared secret. To send a message, the client:

  1. Pads the message by appending 64 bytes of random padding.
  2. Generates a random 12-byte Initialization Vector for each of the Nusers in the chat.
  3. Encrypts the message using AES-CTR under the shared encryption key for each user.
  4. Concatenates all of the N resulting ciphertexts/IVs and computes an HMAC of the whole blob under each recipient’s key.
  5. Calculates a ‘tag’ for the message by hashing the following data:

    padded plaintext || HMAC-SHA512alice || HMAC-SHA512bob || HMAC-SHA512carol || …

  6. Broadcasts the ciphertexts, IVs, MACs and the single ‘tag’ value to all users in the conversation.

When a recipient receives a message from another user, it verifies that:

  1. The message contains a valid HMAC under its shared key.
  2. This IV has not been received before from this sender.
  3. The decrypted plaintext is consistent with the ‘tag’.

Roughly speaking, the idea here is to make sure that every user receives the same message. The use of a hashed plaintext is a bit ugly, but the argument here is that the random padding protects the message from guessing attacks. Make what you will of this.

Anti-replay. Cryptocat also seeks to prevent replay attacks, e.g., where an attacker manipulates a conversation by simply replaying (or reflecting) messages between users, so that users appear to be repeating statements. For example, consider the following chat transcripts:

Replays and reflection attacks.

Replay attacks are prevented through the use of a global ‘IV array’ that stores all previously received and sent IVs to/from all users. If a duplicate IV arrives, Cryptocat will reject the message. This is unwieldy but it generally seems ok to prevent replays and reflection.

A limitation of this approach is that the IV array does not live forever. In fact, from time to time Cryptocat will reset the IV array without regenerating the client key. This means that if Alice and Bob both stay online, they can repeat the key exchange and wind up using the same key again — which makes them both vulnerable to subsequent replays and reflections. (Update: This issue has since been fixed).

In general the solution to these issues is threefold:

  1. Keys shouldn’t be long-term, but should be regenerated using new random components for each key exchange.
  2. Different keys should be derived for the Alice->Bob and Bob->Alice direction
  3. It would be be more elegant to use a message counter than to use this big, unwieldy key array.

The good news is that the Cryptocat developers are working on a totally new version of the multi-party chat protocol that should be enormously better.

In conclusion

I said this would be a post that goes nowhere, and I delivered! But I have to admit, it helps to push it out of my system.

None of the issues I note above are the biggest deal in the world. They’re all subtle issues, which illustrates two things: first, that crypto is hard to get right. But also: that crypto rarely fails catastrophically. The exciting crypto bugs that cause you real pain are still few and far between.

Notes:

* In practice, you might argue that the higher-level IM protocol already leaks user identities (e.g., Jabber nicknames). However this is very much an implementation choice. Moreover, even when using Jabber with known nicknames, you might access the Jabber server using one of several different clients (your computer, phone, etc.). Assuming you use Tor, the only indication of this might be the public key you use during OTR. So there’s certainly useful information in this protocol.

** Notice that OTR signs a MAC instead of a hash of the user identity information. This happens to be a safe choice given that the MAC used is based on HMAC-SHA2, but it’s not generally a safe choice. Swapping the HMAC out for a different MAC function (e.g., CBC-MAC) would be catastrophic.

*** To get specific, imagine I wanted to produce a simulated transcript for some conversation with Bob. Provided that Bob’s client is online, I can send Bob any g^x value I want. It doesn’t matter if he really wants to talk to me — by default, his client will cheerfully send me back his own g^y and a signature on (g^x, g^y, pub_B, keyid_B) which, notably, does not include my identity. From this point on all future authentication is performed using MACs and encrypted under keys that are known to both of us. There’s nothing stopping me from faking the rest of the conversation.

**** Incidentally, a similar problem exists in the OTRv1 protocol.

Attack of the week: OpenSSL Heartbleed

Ouch. (Logo from heartbleed.com)

I start every lecture in my security class by asking the students to give us any interesting security or crypto news they’ve seen recently, preferably with a focus on vulnerabilities. The start of my last class was pretty lame, which meant either (1) we’d finally learned how to make our crypto software secure, or (2) something terrible was about to happen.

I’ll let you take a guess which one it was.

The news today is all about the Heartbleed bug in OpenSSL (nb: a truly awful name that makes me glad there are no other vital organs referenced in the TLS code). Unlike all the fancy crypto-related attacks we’ve seen recently — BEAST, CRIME, Lucky13, etc. — Heartbleed doesn’t require any interesting crypto. In fact it’s the result of a relatively mundane coding error. And predictably, this makes it more devastating than all of those fancy attacks put together.

I’m going to talk about Heartbleed using the ‘fun’ question/answer format I save for this kind of post.

What’s Heartbleed and why should I care about OpenSSL?

In case you haven’t read the Heartbleed website, go do that. Here I’ll just give a quick overview.

Heartbleed is a surprisingly small bug in a piece of logic that relates to OpenSSL’s implementation of the TLS ‘heartbeat’ mechanism. The bug is present in OpenSSL versions 1.0.1 through 1.0.1f (and not in other versions). Sadly, these versions have seen a great deal of adoption lately, because security professionals have been urging developers to implement more recent versions of TLS (1.1 and 1.2). Deployment of TLS 1.2 currently stands at about 30% of the SSL Pulse data set. Many of those servers are likely vulnerable.

The problem is fairly simple: there’s a tiny vulnerability — a simple missing bounds check — in the code that handles TLS ‘heartbeat’ messages. By abusing this mechanism, an attacker can request that a running TLS server hand over a relatively large slice (up to 64KB) of its private memory space. Since this is the same memory space where OpenSSL also stores the server’s private key material, an attacker can potentially obtain (a) long-term server private keys, (b) TLS session keys, (c) confidential data like passwords, (d) session ticket keys.

Alleged Yahoo user credentials visible due to Heartbleed (source: Mark Loman).

Any of the above may allow an attacker to decrypt ongoing TLS sessions or steal useful information. However item (a) above is by far the worst, since an attacker who obtains the server’s main private keys can potentially decrypt past sessions (if made using the non-PFS RSA handshake) or impersonate the server going forward. Worst of all, the exploit leaves no trace.

You should care about this because — whether you realize it or not — a hell of a lot of the security infrastructure you rely on is dependent in some way on OpenSSL. This includes many of the websites that store your personal information. And for better or for worse, industry’s reliance on OpenSSL is only increasing.

What’s the remedy?

Unfortunately it’s pretty nasty.

You can test if a given server is vulnerable using one of these tools (note: use at your own risk). Having identified a problem, the first step is to patch OpenSSL. Fortunately this is relatively easy. The 1.0.1g version is not vulnerable, and Debian has a patch. You can also recompile OpenSSL with the –DOPENSSL_NO_HEARTBEATS option.

Sadly, this is only the beginning. Since there’s no way to tell whether a server has been exploited (and exploit code is now in the wild) you need to assume that it is. This means the safe move is to revoke your certificate and get a new one. Have fun.

What’s the bug?

The TLS Heartbeat mechanism is designed to keep connections alive even when no data is being transmitted. Heartbeat messages sent by one peer contain random data and a payload length. The other peer is suppose to respond with a mirror of exactly the same data.

   When a HeartbeatRequest message is received and sending a
HeartbeatResponse is not prohibited as described elsewhere in this
document, the receiver MUST send a corresponding HeartbeatResponse
message carrying an exact copy of the payload of the received
HeartbeatRequest.

The data structure of the request looks like this. Note the two-byte payload length:

   struct {
HeartbeatMessageType type;
uint16 payload_length;
opaque payload[HeartbeatMessage.payload_length];
opaque padding[padding_length]; 

   } HeartbeatMessage;

Which brings us to the bug. The original bug was introduced in this Git commit. The code appears in different files (for DTLS and TLS). Here we’ll look at the file t1_lib.c:

2412         /* Read type and payload length first */
2413         hbtype = *p++;
2414         n2s(p, payload);
2415         pl = p;
2417         if (s->msg_callback)
2418                 s->msg_callback(0, s->version, TLS1_RT_HEARTBEAT,
2419                         &s->s3->rrec.data[0], s->s3->rrec.length,
2420                         s, s->msg_callback_arg);
2422         if (hbtype == TLS1_HB_REQUEST)
2423                 {
2424                 unsigned char *buffer, *bp;
2425                 int r;
2427                 /* Allocate memory for the response, size is 1 bytes
2428                  * message type, plus 2 bytes payload length, plus
2429                  * payload, plus padding
2430                  */
2431                 buffer = OPENSSL_malloc(1 + 2 + payload + padding);
2432                 bp = buffer;
2433                 
2434                 /* Enter response type, length and copy payload */
2435                 *bp++ = TLS1_HB_RESPONSE;
2436                 s2n(payload, bp);
2437                 memcpy(bp, pl, payload);
2438                 
2439                 r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);
As you can see, the incoming (adversarially-generated) data contains a payload length (“payload”) which is trusted without bounds checks. OpenSSL then allocates a buffer for its response, and copies “payload” data bytes from the pointer “pl” into it. Unfortunately, there’s no check to make sure that there are actually “payload” bytes in data, or that this is in bounds. Hence the attacker gets a slice of data from main memory — one that’s up to 64KB in length.
The fix is equally simple. Just add a bounds check:
+    /* Read type and payload length first */
+ if (1 + 2 + 16 > s->s3->rrec.length)
+ return 0; /* silently discard */
+ hbtype = *p++;
+ n2s(p, payload);
+ if (1 + 2 + payload + 16 > s->s3->rrec.length)
+ return 0; /* silently discard per RFC 6520 sec. 4 */
+ pl = p;
+

Wasn’t that dull?

Come on guys, next time please: use a cache timing weakness in AES-GCM encryption. This whole ‘effectively breaking OpenSSL by finding simple C coding bugs’ isn’t even challenging.

Should we rail against the OpenSSL developers for this?

Don’t even think about it. The OpenSSL team, which is surprisingly small, has been given the task of maintaining the world’s most popular TLS library. It’s a hard job with essentially no pay. It involves taking other folks’ code (as in the case of Heartbeat) and doing a best-possible job of reviewing it. Then you hope others will notice it and disclose it responsibly before disasters happen.

The OpenSSL developers have a pretty amazing record considering the amount of use this library gets and the quantity of legacy cruft and the number of platforms (over eighty!) they have to support. Maybe in the midst of patching their servers, some of the big companies that use OpenSSL will think of tossing them some real no-strings-attached funding so they can keep doing their job.

Dear Apple: Please set iMessage free

Normally I avoid complaining about Apple because (a) there are plenty of other people carrying that flag, and (b) I honestly like Apple and own numerous lovely iProducts. I’m even using one to write this post.

Moroever, from a security point of view, there isn’t that much to complain about. Sure, Apple has a few irritating habits — shipping old, broken versions of libraries in its software, for example. But on the continuum of security crimes this stuff is at best a misdemeanor, maybe a half-step above ‘improper baby naming‘. Everyone’s software sucks, news at 11.

There is, however, one thing that drives me absolutely nuts about Apple’s security posture. You see, starting about a year ago Apple began operating one of the most widely deployed encrypted text message services in the history of mankind. So far so good. The problem is that they still won’t properly explain how it works.

And nobody seems to care.

I am, of course, referring to iMessage, which was deployed last year in iOS Version 5. It allows — nay, encourages — users to avoid normal carrier SMS text messages and to route their texts through Apple instead.

Now, this is not a particularly new idea. But iMessage is special for two reasons. First it’s built into the normal iPhone texting application and turned on by default. When my Mom texts another Apple user, iMessage will automatically route her message over the Internet. She doesn’t have to approve this, and honestly, probably won’t even know the difference.

Secondly, iMessage claims to bring ‘secure end-to-end encryption‘ (and authentication) to text messaging. In principle this is huge! True end-to-end encryption should protect you from eavesdropping even by Apple, who carries your message. Authentication should protect you from spoofing attacks. This stands in contrast to normal SMS which is often not encrypted at all.

So why am I looking a gift horse in the mouth? iMessage will clearly save you a ton in texting charges and it will secure your messages for free. Some encryption is better than none, right?

Well maybe.

To me, the disconcerting thing about iMessage is how rapidly it’s gone from no deployment to securing billions of text messages for millions of users. And this despite the fact that the full protocol has never been published by Apple or (to my knowledge) vetted by security experts. (Note: if I’m wrong about this, let me know and I’ll eat my words.)

What’s worse is that Apple has been hyping iMessage as a secure protocol; they even propose it as a solution to some serious SMS spoofing bugs. For example:

Apple takes security very seriously. When using iMessage instead of SMS, addresses are verified which protects against these kinds of spoofing attacks. One of the limitations of SMS is that it allows messages to be sent with spoofed addresses to any phone, so we urge customers to be extremely careful if they’re directed to an unknown website or address over SMS.

And this makes me nervous. While iMessage may very well be as secure as Apple makes it out to be, there are plenty of reasons to give the protocol a second look.

For one thing, it’s surprisingly complicated.

iMessage is not just two phones talking to each other with TLS. If this partial reverse-engineering of the protocol (based on the MacOS Mountain Lion Messages client) is for real, then there are lots of moving parts. TLS. Client certificates. Certificate signing requests. New certificates delivered via XML. Oh my.

As a general rule, lots of moving parts means lots of places for things to go wrong. Things that could seriously reduce the security of the protocol. And as far as I know, nobody’s given this much of  a look. It’s surprising.

Moreover, there are some very real questions about what powers Apple has when it comes to iMessage. In principle ‘end-to-end’ encryption should mean that only the end devices can read the connection. In practice this is almost certainly not the case with iMessage. A quick glance at the protocol linked above is enough to tell me that Apple operates as a Certificate Authority for iMessage devices. And as a Certificate Authority, it may be able to substantially undercut the security of the protocol. When would Apple do this? How would it do this? Are we allowed to know?

Finally, there have been several reports of iMessages going astray and even being delivered to the wrong (or stolen) devices. This stuff may all have a reasonable explanation, but it’s yet another set of reasons why we it would be nice to understand iMessage better than we do now if we’re going to go around relying on it.

So what’s my point with all of this?

This is obviously not a technical post. I’m not here to present answers, which is disappointing. If I knew the protocol maybe I’d have some. Maybe I’d even be saying good things about it.

Rather, consider this post as a plea for help. iMessage is important. People use it. We ought to know how secure it is and what risks those people are taking by using it. The best solution would be for Apple to simply release a detailed specification for the protocol — even if they need to hold back a few key details. But if that’s not possible, maybe we in the community should be doing more to find out.

Remember, it’s not just our security at stake. People we know are using these products. It would be awfully nice to know what that means.

A missing post (updated)

Update (8/27): I’ve put the original post back up.

Update (8/9): I’ve re-written this post to include a vague, non-specific explanation of the bug. I’ve now confirmed the problem with one vendor, who has asked for a week to issue a patch. So far I haven’t had a response from the DCP’s technical people. And yes, I do realize someone PasteBinned the original post while it was up.

A few people have asked what happened to the post that was in this space just a few hours ago. No, you’re not going crazy. It was here.

The post contained a long, detailed evaluation of the HDCP v2 protocol. My idea was to do real-time evaluation of an industry protocol that hasn’t been deployed yet — a kind of ‘liveblogging’ cryptanalysis. What I expected to find was some bad practices, which I would gently poke fun at. I didn’t expect to find anything serious.

I was wrong in that initial judgement, with some caveats. I’m going to give a vague and non-specific summary here, and I hope to re-post the detailed technical post in a few days when I’ve heard (something, anything!) from DCP, the organization that maintains HDCP.

In case you’ve never heard of it, HDCP is a security protocol used to ‘protect’ video traveling over wired and wireless networks. There are two versions. Version 1 is in your TV today, and was seriously compromised in 2010. Version 2 is much better, but has only been deployed in a few products — including those that implement MiraCast (formerly Wi-Fi Display).

Version 2 contains a key agreement protocol that’s designed to establish a session encryption key between a transmitter (your phone, for example) and a receiver (a TV). Once this key is established, the transmitter can encrypt all video data going over the wire.

What I discovered in my brief analysis is a flaw in the key agreement protocol that may allow a man-in-the-middle to recover the session key (actually the ‘master secret’ used to derive the session key). This could potentially allow them to decrypt content. More on that in a minute, though.

I also discovered some slightly less serious flaws elsewhere in the protocol. It turns out that the DCP already knows about those, thanks to some enterprising work by a smart guy at an unnamed vendor (who deserves credit, and will get it once I put the original post back up).

Now for a few big caveats about the session key bug.

The bug I found does not get you all the way to decrypting HDCPv2 streams in practice, thanks to a tiny additional protection I missed while writing the original post. I don’t think much of this protection, since it involves a secret that’s stored in every single HDCPv2-compliant device. That’s a pretty lousy way to keep a secret.

And of course I haven’t personally verified this in any real HDCP devices, since I don’t own any. Although if I did, I could use this nifty HDCP plugin for WireShark to do some of the work.

The issue has been confirmed by one vendor, who is working on a patch for their product. Their products are used in real things that you’ve heard of, so I’m trusting that they’d know.

The last thing I want to address is why I published this, and why I subsequently pulled it.

When I wrote the original post I thought HDCP v2 was just a ‘paper spec’ — that there were no devices actually using it. Shortly after posting, I came across one commercial product that does claim to support HDCPv2. Later I discovered a few others. To be on the safe side, I decided to pull the post until I could notify the vendors. Then through sheer ineptitude I briefly re-posted it. Now I’m doing my best to put the toothpaste back in the tube.

As soon as I get some feedback I intend to put the post back up. A post which, incidentally, was not intended to break anything, but rather to serve as a lesson in just how complicated it is to design your own protocol. I suppose it’s achieved that goal.

Anyway, I’m putting this up as a placeholder in case you’re curious about what happened or why the heck I’m not blogging. Writing a long technical post and then having to can it is a drag. But hopefully we’ll be back to our regularly-scheduled programming in no time at all.

Indifferentiability

After umpteen weeks writing about broken stuff, I’m thrilled to tell you that for once, nothing awful is happening in the crypto world. It won’t last. But while it does, let’s grab the opportunity to talk about something constructive. 

Now a word of warning: what I’m going to talk about today is fairly wonky and (worse), involves hash functions. If you’re not feeling up for this, this is your cue to bail and go read something nice about buffer overflows.

For those still with me, the subject of this post is the design of hash functions, and more specifically: the indifferentiability proofs that designers write to argue for their security. I was surprised to find that most people have never heard of these proofs, and thus have no idea why they’re useful. That’s too bad, since they’re extremely important to the way we analyze hash functions today.

Merkle-Damgård

This is not Ivan Damgård. (Seriously Google?)

The best way to begin any discussion of hash function design is to take a quick peek inside of the hash functions we actually use. Since the most popular hashes today are MD5 (ugh) and SHA, the right place to start is with the ‘Merkle-Damgård’ paradigm.

To understand Merkle-Damgård, you need to understand that cryptographers love to build complicated things out of simpler components. Under the hood of most block ciphers you’ll find S-boxes. Similarly, if you take the lid off a Merkle-Damgård hash function — surprise! — you find block ciphers. Or at least, something very much like them.

This approach dates to a 1979 proposal by a young cryptographer named Ralph Merkle. What Merkle showed is a way to build hash functions with a variable-length input, using any fixed one-way compression function (a one-way function that spits out fewer bits than it takes in). While Merkle wasn’t specific about the function, he suggested that DES might be a good candidate.

Expressed as a colorful diagram, the Merkle construction looks something like this:

Merkle-Damgård Construction (source: Wikipedia because I’m too lazy to
draw my own diagrams). IV is a fixed value. f is a one-way compression function.

The beauty of Merkle’s proposal is that it’s relatively simple to understand. You simply chop your message into blocks, then feed each block into the function f along with the output of the previous function evaluation. Throw in a finalization stage and you’re done.

Of course there’s a difference between proposing a technique and showing that it actually works. It would take ten more years, but at CRYPTO 1989, Merkle and another cryptographer named Ivan Damgård independently submitted formal analyses of Merkle’s proposal. What they showed is that as long as the function f has certain ideal properties, the resulting hash function is guaranteed to be collision-resistant.  The rest, as they say, is history.

The popularity of Merkle-Damgård can be attributed in part to its security proof. But it also owes something to some major practical advantages:

  1. You can use any secure block cipher as the function f, with just a few tweaks.
  2. M-D hash functions can be pretty darn fast, again depending on f and how you use it.
  3. M-D hashes allow you to digest ‘live’ data streams, where you don’t know in advance how much data you’re going to be hashing. 
Of course, Merkle-Damgård hashes also have serious weaknesses. The most famous is the ‘length extension attack‘ in which an attacker, given only H(M) for some unknown message M, can ‘tack on’ additional blocks of her own choosing. This issue spells big trouble for people who think that H(key || message) is a good Message Authentication CodeWhat’s interesting about the length-extension issue is not that it leads to broken MACs. I mean, that is interesting, and it’s why you should use HMAC. But what’s really interesting is that this flaw doesn’t represent a violation of the collision-resistance guarantee. The two issues are in fact completely orthogonal. And this tells us something fundamental. Namely: collision-resistance is not enough.Today’s implementers do all kinds of crazy things with hash functions, and many of those applications require much more than collision-resistance. To achieve the necessary properties, we first need to figure out what they are. And that requires us to think hard about the following question:

What the heck is a secure hash function?

If you crack a typical security textbook (or visit the Wikipedia page on hash functions), you’ll see a long list of things of things a hash function ‘must’ accomplish. The list usually starts with these:
  1. Collision resistance. It should be hard to find any pair of messages M1, M2 such that H(M1) == H(M2).
  2. Pre-image resistance. Given only h it should be hard to find a ‘pre-image’ M2 such that H(M2) == h.

Now leave aside the technical fact that none of the unkeyed hash functions we use today are ‘truly’ collision-resistant. Or that the above definition of pre-image resistance implies that I can hash my cat’s name (‘fluffy’) and nobody can invert the hash (note: not true. Go ask LinkedIn if you don’t believe me.) The real problem is that these definitions don’t cover the things that people actually do with hash functions.

For example, take the construction of PRNGs. A common PRNG design hashes together large pools of gathered entropy in the hope that the result will be sufficiently uniform for cryptographic work. This is so common that it’s probably happening somewhere on your computer right now. And yet, absolutely nothing in the definitions above implies that this technique is safe!* Similar problems exist for key derivation functions, and even for signature schemes like ECDSA which clearly require hash functions that are more than just collision-resistant.
The more you look into the way that people use hash functions, the more you realize that they really need something that produces ‘random-looking’ output. Unfortunately, this notion is surprisingly hard to formalize. Hash functions are unkeyed, so they’re not pseudo-random functions. What in the world are people asking for?

Random oracles and indifferentiability

The answer, if you dig hard enough, is that people want hash functions to be random oracles.

Random oracles are cryptographers’ conception of what an ‘ideal’ hash function should be. Put succinctly, a random oracle is a perfectly random function that you can evaluate quickly. Random functions are beautiful not just because the output is random-looking (of course), but also because they’re automatically collision-resistant and pre-image resistant. It’s the only requirement you ever need.

The problem with random functions is that you just can’t evaluate them quickly: you need exponential storage space to keep them, and exponential time to evaluate one. Moreover, we know of nothing in the ‘real’ world that can approximate them. When cryptographers try to analyze their schemes with random functions, they have to go off into an imaginary fantasy world that we call the ‘random oracle model.

But ok, this post is not to judge. For the moment, let’s imagine that we are willing to visit this fantasy world. An obvious question is: what would it take to build a random oracle? If we had a compression function that was good enough — itself a random function — could we use a technique like Merkle-Damgård to get the rest of the way?

In 2004, Maurer, Renner and Holenstein gave us a powerful tool for answering this question. What they showed is that it’s always possible to replace functionality A (e.g., a random oracle) with another functionality B (e.g., an ideal compression function) provided that the following rules are satisfied:

  1. There exists a way to ‘construct’ something ‘like’ A out of B.
  2. There exists a way to ‘simulate’ something ‘like’ B using A.
  3. An attacker who interacts with {constructed A-like thing, B} cannot tell the difference (i.e., can’t differentiate it) from {A, simulated B-like thing}

The definition of simulation gets a bit wonky. but expressed in simpler language all this means is: if you can show that your hash function, instantiated with an ‘ideal’ compression function, looks indistinguishable from a random oracle. And you can show that a manufactured compression function, built using a random oracle as an ingredient, looks indistinguishable from an ideal compression function, then you can always replace one with the other. That is, your hash function is ‘good enough’ to be a random oracle.

The following year, Coron, Dodis, Malinaud and Puniya applied this framework to Merkle-Damgård-hash functions. Their first result was immediate: such a proof does not work for Merkle-Damgård. Of course this shouldn’t actually surprise us. We already know that Merkle-Damgård doesn’t behave like a random oracle, since random oracles don’t exhibit length-extension attacks. Still it’s one thing to know this, and another to see a known problem actually turn up and screw up a proof. So far, no problem.

What Coron et al. showed next is much more interesting:
  • They proved formally that Merkle-Damgård can be made indifferentiable from a random oracle, as long as you apply a prefix-free encoding to the input before hashing it. Prefix-free encodings prevent length-extensions by ensuring that no message can ever be a prefix of another.
  • Next, they proved the security of HMAC applied to a Merkle-Damgård hash.
  • Finally, and best of all, they showed that if you simply drop some bits from the last output block — something called a ‘chop’ construction — you can make Merkle-Damgård hashes secure with much less work.

The best part of Coron et al.‘s findings is that the chop construction is already (inadvertently) in place on SHA384, which is constructed by dropping some output bits from its big-brother hash SHA512. The modern hash variants SHA512/224 and SHA512/256 also have this property.** So this theoretical work already has one big payoff: we know that (under certain assumptions) these hashes may be better than some of the others.

And these results have bigger implications. Now that we know how to do this, we can repeat the process for just about every candidate hash function anyone proposes. This lets us immediately weed out obvious bugs, and avoid standardizing another hash with problems like the length extension attack. This process has become so common that all of the SHA3 candidates now sport exactly such an indifferentiability proof.

Of course, in the real world, indifferentiability only takes you so far. It does tell us something, but it doesn’t tell us everything. Sure, if the compression function is perfect, you obtain a strong result about the hash function. But compression functions are never perfect. Real compression functions have glitches and oddities that can make these theoretical results irrelevant. This is why we’ll always need smart people to arm wrestle over which hash we get to use next.

In conclusion

If I had it in me, I’d go on to talk about the SHA3 candidates, and the techniques that each uses to achieve security in this model. But this has already been a long post, so that will have to wait for another time.

I want to say only one final thing.

This is a practical blog, and I admit that I try to avoid theory. What fascinates me about this area is that it’s a great example of a place where theory has directly come to the aid of practice. You may think of hash functions as whizzing little black boxes of ad-hoc machinery, and to some extent they are. But without theoretical analysis like this, they’d be a whole lot more ad-hoc. They might not even work.

Remember this when NIST finally gets around to picking Keccak BLAKE.

Notes:

* For a ridiculous example, imagine that you have a secure (collision-resistant, pre-image resistant) hash function H. Now construct a new hash function H’ such that H'(M) = {“long string of 0s” || H(M)}. This function is as collision-resistant as the original, but won’t be very useful if you’re generating keys with it.

** Thanks to Paulo Barreto for fixing numerous typos and pointing out that SHA512/256 and /224 make excellent candidates for chop hashes!