On Ashton Kutcher and Secure Multi-Party Computation

On Ashton Kutcher and Secure Multi-Party Computation

Back in March I was fortunate to spend several days visiting Brussels, where I had a chance to attend a panel on “chat control“: the new content scanning regime being considered by the EU Commission. Among various requirements, this proposed legislation would mandate that client-side scanning technology be incorporated into encrypted text messaging applications like Signal, WhatsApp and Apple’s iMessage. The scanning tech would examine private messages for certain types of illicit content, including child sexual abuse media (known as CSAM), along with a broad category of textual conversations that constitute “grooming behavior.”

I have many thoughts about the safety of the EU proposal, and you can read some of them here. (Or you’re interested in the policy itself, you can read this recent opinion by the EU’s Council’s Legal Service.) But although the EU proposal is the inspiration for today’s post, it’s not precisely what I want to talk about. Instead, I’d like to clear up some confusion I’ve noticed around the specific technologies that many have proposed to use for building these systems.

Also: I want to talk about Ashton Kutcher.

Ashton Kutcher visits the EU parliament in March 2023 (photo: Roberta Metsola.)

It turns out there were a few people visiting Brussels to talk about encryption this March. Only a few days before my own visit, Ashton Kutcher gave a major speech to EU Parliament members in support of the Commission’s content scanning proposal. (And yes, I’m talking about that Ashton Kutcher, the guy who played Steve Jobs and is married to Mila Kunis.)

Kutcher has been very active in the technical debate around client-side scanning. He’s the co-founder of an organization called Thorn, which aims to develop cryptographic technology to enable content scanning. In March he gave an impassioned speech to the EU Parliament urging the deployment of these technologies, and remarkably he didn’t just talk about the policy side of things. When asked how to balance user privacy against the needs of scanning, he even made a concrete technical proposal: to use fully-homomorphic encryption (FHE) as a means to evaluate encrypted messages.

Now let me take a breath here before my head explodes.

I promise I am not one of those researchers who believes only subject-matter experts should talk about cryptography. Really I’m not! I write this blog because I think cryptography is amazing and I want everyone talking about it all the time. Seeing mainstream celebrities toss around phrases like “homomorphic encryption” is literally a dream come true and I wish it happened every single day.

And yet, there are downsides to this much winning.

I ran face-first into some of those downsides when I spoke to policy experts about Kutcher’s proposal. Terms like fully homomorphic encryption can be confusing and off-putting to non-cryptographers. When filtered through people who are not themselves experts in the technology, these ideas can produce the impression that cryptography is magical pixie dust we can sprinkle on all the hard problems in the world. And oh how I wish that were true. But in the real world, cryptography is full of tradeoffs. Solving one problem often just makes new problems, or creates major new costs, or else shifts the risks and costs to other parts of the system.

So when people on various sides of the debate asked me whether “fully-homomorphic encryption” could really do what Kutcher said it would, I couldn’t give an easy five-word answer. The real answer is something like: (scream emoji) it’s very complicated. That’s a very unsatisfying thing to have to tell people. Out here in the real world the technical reality is eye-glazing and full of dragons.

Which brings me to this post.

What Kutcher is really proposing is that we to develop systems that perform privacy-preserving computation on encrypted data. He wants to use these systems to enable “private” scanning of your text messages and media attachments, with the promise that these systems will only detect the “bad” content while keeping your legitimate private data safe. This is a complicated and fraught area of computer science. In what goes below, I am going to discuss at a high and relatively non-technical level the concepts behind it: what we can do, what we can’t do, and how practical it all is.

In the process I’ll discuss the two most powerful techniques that we have developed to accomplish this task: namely, multi-party computation (MPC) and, as an ingredient towards achieving the former, fully-homomorphic encryption (FHE). Then I’ll try to clear up the relationship between these two things, and explain the various tradeoffs that can make one better than the other for specific applications. Although these techniques can be used for so many things, throughout this post I’m going to focus on the specific application being considered in the EU: the use of privacy-preserving computation to conduct content scanning.

This post will not require any mathematics or much computer science, but it will require some patience. So find a comfortable chair and buckle in.

Computing on private data

Encryption is an ancient technology. Roughly speaking, it provides the ability to convert meaningful messages (and data) into a form that only you, and your intended recipient(s) can read. In the modern world this is done using public algorithms that everyone can look at, combined with secret keys that are held only by the intended recipients.

Modern encryption is really quite excellent. So as long as keys are kept safe, encrypted data can be sent over insecure networks or stored in risky locations like your phone. And while occasionally people find a flaw in an implementation of encryption, the underlying technology works very well.

But sometimes encryption can get in the way. The problem with encrypted data is that it’s, well, encrypted. When stored in this form, such data is virtually useless for practical purposes like performing calculations. Before you can compute on that data, you often need to first decrypt it and thus remove all the beautiful protections we get from encryption.

If your goal is to compute on multiple pieces of data that originate from different parties, the problem can become even more challenging. Who can we trust to do the computing? An obvious solution is to decrypt all that data and hand it to one very trustworthy person, who will presumably swear not to steal it or get hacked. Finding those parties can be quite challenging.

Fortunately for us all, the first academic cryptographers also happened to be computer scientists, and so this was exactly the sort of problem that excited them. Those researchers quickly devised a set of specific and general techniques designed to solve these problems, and also came up with a cool name for them: secure multi-party computation, or MPC for short.

MPC: secure private computation (in six eight ten paragraphs)

The setting of MPC is fairly simple: imagine that we have two (or more!) parties that each have some private data they don’t want to give to anyone else. Yet each of the parties is willing to provide their data as input to some specific computation, and are all willing to reveal the output of that computation — either to everyone involved, or perhaps just to some agreed subset of the parties. Can these parties now perform the computation jointly, without appointing a trusted party?

Let’s make this easier by using a concrete example.

Imagine a group of workers all know their own salaries, but don’t know anything about anyone else’s salary. Yet they wish to compute some statistics over their salary data: for example, the average of their salaries. These workers aren’t willing to share their own salary data with anyone else, but they are willing to submit it as one input in a large calculation under the strict condition that only the final result is ever revealed.

This might seem contrived to you, but it is in fact a real problem that some people have used MPC to solve.

An MPC protocol allows the workers to do this, without appointing a trusted central party or revealing their inputs (and intermediate calculations) to anyone else. At the conclusion of the protocol each party will learn only the result of the calculation:

The “cloud” at the center of this diagram is actually a complicated protocol where every party sends messages to every other party.

MPC protocols typically provide strong provable guarantees about their properties. The details vary, but typically speaking: no party will learn anything about the other parties’ inputs. Indeed they won’t even learn any partial information that might be produced during the calculation. Even better, all parties can be assured that the result will be correct: as long as all parties submit valid inputs to the computation, none of them should be able to force the calculation to go awry.

Now obviously there are caveats.

In practice, using MPC is a bit like making a deal with a genie: you need to pay very careful attention to the fine print. Even when the cryptography works perfectly, this does not mean that computing your function is actually “safe.” In fact, it’s entirely possible to choose functions that when computed securely are still devastating to your privacy.

For example: imagine that I use an MPC protocol to compute an average salary between myself and exactly one other worker. This could be a very bad idea! Note that if the other worker is curious, then she can figure out how much I make. That is: the average of our two wages reveals enough information that she find my wage given knowledge of her own input. This (obvious) caveat applies to many other uses of MPC, even when the technology works perfectly.

This is not a criticism of MPC, just the observation that it’s a tool. In practice, MPC (or any other cryptographic technology) is not a privacy solution by itself, at least not in the sense of privacy that real-world human beings like to think about. It provides certain guarantees that may or may not be useful for providing privacy in the real world.

What does MPC have to do with client-side scanning?

We began this post by discussing client-side scanning for encrypted messaging apps. This is a perfect example of an application that fits the MPC (or two-party computation) use-case perfectly. That’s because in this setting we generally have multiple parties with secret data who want to perform some joint calculation on their inputs.

In this setting, the first party is typically a client (usually a person using an encrypted messaging app like WhatsApp or Signal), who possesses some private text message or perhaps a media file they wish to send to another user. Under proposed law in the EU, their app could be legally mandated to “scan” that image to see if it contains illegal content.

According to the EU Commission, this scanning can be done in a variety of ways. For example, the device could compare an images against a secret database of known illicit content (typically using a specialized perceptual hash function.) However, while the EU plan starts there, their plans also get much more ambitious: they also intend to look for entirely new instances of illicit content as well as textual “grooming” conversations, possibly using machine learning (ML) models, that is, deep neural networks that will be trained to recognize data that fits these patterns. These various models must be sophisticated enough to understand entirely new images, as well as to derive meaning from complex interactive human conversation. None of this is likely to be very simple.

Now most of this could be done using standard techniques on the client device, except for one major limitation. The challenge in this setting is that the provider doing the scanning usually wants to keep these hashes and/or ML models secret.

There are several reasons for this. The first reason is that knowledge of the scanning model (or database of illicit content) makes it relatively easy for bad actors to evade the model. In other words, with only modest transformations it’s possible to modify “bad” images so that they become invisible to ML scanners.

Knowledge of the model can also allow for the creation of “poisoned” imagery: these include apparently-benign images (e.g., a picture of a cat) that trigger false positives in the scanner. (Indeed this such “colliding” images have been developed for some hash-based CSAM scanning proposals.) More worryingly, in some cases the hashes and neural network models can be “reversed” to extract the imagery and textual content they were trained on: this has all kinds of horrifying implications, and could expose abuse victims to even more trauma.

So here the user doesn’t want to send its confidential data to the provider for scanning, and the provider doesn’t want to hand its confidential model parameters to the user (or even to expose them inside the user’s phone, where they might be stolen by reverse-engineers.) This is exactly the situation that MPC was designed to handle:

Sketch of a client-side scanning architecture that uses (two-party) MPC between the client and the Provider. The client inputs the content to be scanned, while the server provides its secret model and/or hash database. The protocol gives the provider a copy of the user’s content if and only if the model says it’s illicit content, otherwise the provider sees nothing. (Note in this variant, the output goes only to the Provider.)

This makes everything very complicated. In fact, there has only been one real-world proposal for client-side CSAM scanning that has ever come (somewhat) close to deployment: that system was designed by Apple for a (now abandoned) client-side photo scanning plan. The Apple approach is cryptographically very ambitious: it uses neural-network based perceptual hashing, and otherwise exactly follows the architecture described above. However, critically: it relied on a neural-network based hash function that was not kept secret. Disastrous results ensued (see further below.)

(If you’re interested in getting a sense of how complex this protocol is, here is a white paper describing how it works.)

A diagram from the Apple CSAM scanning protocol.

Ok, so what kind of MPC protocols are available to us?

Multi-party computation is a broad category. It describes a class of protocols. In practice there are many different cryptographic techniques that allow us to realize it. Some of these (like the Apple protocol) were designed for specific applications, while others are capable of performing general-purpose computation.

I promised this post would not go into the weeds, but it’s worth pointing out that general MPC techniques typically make use of (some combination of) three different techniques: secret sharing, circuit garbling, and homomorphic encryption. Often, efficient modern systems will use a mixture of two or three of those techniques, just to make everything more confusing because they’re to maximize efficiency.

What is it that you need to know about these techniques? Here I’ll try, in a matter of a few sentences (that will draw me endless grief) to try to summarize the strengths and disadvantages of each technique.

Both secret sharing and garbling techniques share a common feature, which is that they require a great deal of data to be sent between the parties. In practice the amount of data sent between the parties will grow with (at least) the size of the inputs they’re computing on, but often will grow according to the complexity of the calculation they’re performing. For things like deep neural networks where both the data and calculation are huge, this generally results in fairly surprising amounts of data transfer.

This is not usually considered to be a problem on the general Internet or within EC2 datacenters, where data transfer is cheap. It can be quite a challenge when one of those parties is using a cellphone, however. That makes any scheme using these technologies subject to some very serious limitations.

Homomorphic encryption schemes take a different approach. These systems make use of specialized encryption schemes that are malleable. This means that encrypted data can be “modified” in useful ways without ever decrypting it.

In a bit more detail: in fully-homomorphic encryption MPC systems, a first party can encrypt its data under a public key that it generates. It can then send the encrypted data to a second party. This second party can then perform calculations on the ciphertext while it is still encrypted — adding and multiplying it together with other data (including data encrypted by the second party) to perform some calculation. Throughout this process all of the data remains encrypted. At the conclusion of this process, the second party will end up with a “modified” ciphertext that internally contains a final calculation result, but that it cannot read. To finish the protocol, the second party can send that ciphertext back to the first party, who can then decrypt it using its secret key and obtain the final output.

The major upshot of the pure-FHE technique is that it substantially reduces the amount of data that the two parties need to transmit between them, especially compared to the other MPC techniques. The downside of this approach is… well, there are several. One is that FHE calculations typically require vastly more computational effort (and hence time and carbon emissions) than the other techniques. Moreover, they may still require a good deal of data transfer — in part because the number of calculations that one can perform on a given ciphertext is usually limited by “noise” that turns up within the ciphertext. Hence, calculations must either be very simple or else broken up into “phases”, where the partial calculation result is decrypted and re-encrypted so that more computation can be done. This can be done interactively between the parties, or by the second party alone (using a technique called “bootstrapping”) but in both cases the cost is either much more bandwidth exchanged or a great deal of extra computation.

In practice, cryptographers rarely commit to a single approach. They instead combine all these techniques in order to achieve an appropriate balance of data-transfer and computational effort. These “mixed systems” tend to have merely large amounts of data transfer and large amounts of computation, but are still amazingly efficient compared to the alternatives.

For an example of this, consider this very optimized two-party MPC scheme aimed at performing neural network classification. This scheme takes (from the client) a 32×32 image, and evaluates a tiny 7-layer neural network held by a server in order to perform classification. As you can see, evaluating the model even on a single image requires about 8 seconds of computation and 200 megabytes of bandwidth exchange, for each image being evaluated:

Source: MUSE paper, figure 8. These are the times for a 7-layer MiniONN network trained on the CIFAR-10 dataset.

These numbers may seem quite high, but in fact they’re actually really impressive as these things go. Previous systems used nearly an order of magnitude more time and bandwidth to do their work. Maybe there will be further improvements in the future! Even on a pure efficiency basis there is much work to be done.

What are the other risks of MPC in this setting?

The final point I would like to make is that secure MPC (or MPC built using FHE as a tool) is not itself enough to satisfy the requirements of a safe content scanning system. As I mentioned above, MPC systems merely evaluate some function on private data. The question of whether that function is safe is left largely to the system designer.

In the case of these content scanning systems, the safety of the resulting system really comes down to a question of whether the algorithms work well, particularly in settings where “bad guys” can find adversarial inputs that try to disrupt the system. It also requires new techniques to ensure that the system cannot be abused. That is: there must be guarantees within the computation to ensure that the provider (or a party who hacks the provider) cannot change the model parameters to allow them to access your private content.

This is a much longer conversation than I want to have in this post, because it fundamentally requires one to think about whether the entire system makes sense. For a much longer discussion of the risks, see this paper.

This was nice, but I would like to learn more about each of these technologies!

The purpose of this post was just to give the briefest explanation of the techniques that exist for performing all of these calculations. If you’re interested in knowing (a lot more!) about these technologies, take a look at this textbook by Evans, Kolesnikov and Rosulek. MPC is an exciting area, and one that is advancing every single (research) conference cycle.

And maybe that is the lesson of this post: these technologies are still research techniques. It’s probably not quite time to put them out in the world.

PRFs, PRPs and other fantastic things

PRFs, PRPs and other fantastic things

A few weeks ago I ran into a conversation on Twitter about the weaknesses of applied cryptography textbooks, and how they tend to spend way too much time lecturing people about Feistel networks and the boring details of AES. Some of the folks in this conversation suggested that instead of these things, we should be digging into more fundamental topics like “what is a pseudorandom function.” (I’d link to the thread itself, but today’s Twitter is basically a forgetting machine.)

This particular point struck a chord with me. While I don’t grant the premise that Feistel networks are useless, it is true that pseudorandom functions, also known as PRFs, are awfully fundamental. Moreover: these are concepts that get way too little coverage in (non-theory) texts. Since that seems bad for aspiring practitioners, I figured I’d spend a little time trying to explain these concepts in an intuitive way — in the hopes that I can bring the useful parts to folks who aren’t being exposed to these ideas directly.

This is going to be a high-level post and hence it will skip all the useful formalism. It’s also a little wonky, so feel free to skip it if you don’t really care. Also: since I need to be contrary: I’m going to talk about Feistel networks anyway. That bit will come later.

What’s a PRF, and why should I care?

Pseudorandom functions (PRFs) and pseudorandom permutations (PRPs) are two of the most fundamental primitives in modern cryptography. If you’ve ever implemented any cryptography yourself, there’s an excellent chance you relied on an algorithm like AES, HMAC or ChaCha20 to implement either encryption or authentication. If you did this, then you probably relied on some security property you assumed those primitives to have. But what precisely is that security property you’re relying on?

We could re-imagine this security definition from scratch every time we look at a new cipher. Alternatively, we could start from a much smaller number of general mathematical objects that provide security properties that we can reason about, and try to compare those to the algorithms we actually use. The second approach has a major advantage: it’s very modular. That is, rather than re-design every single protocol each time we come up it with a new type of cipher, all we really need to do is to analyze it with the idealized mathematical objects. Then we can realize it using actual ciphers, which hopefully satisfy these well-known properties.

Two of the most common such objects are the pseudorandom function (PRF) and the pseudorandom permutation (PRP). At the highest level, these functions have two critical properties that are extremely important to cryptographers:

  1. They are keyed functions: this means that they take in a secret key as well as some input value. (This distinguishes them from some hash functions.)
  2. The output of a PRF (or PRP), when evaluated on some unique input, typically appears “random.” (But explaining this rough intuition precisely will require some finesse, see below.)

If a function actually can truly achieve those properties, we can use it to accomplish a variety of useful tasks. At the barest minimum, these properties let us accomplish message authentication (by building MACs), symmetric encryption by building stream ciphers, and key derivation (or “pluralization”) in which a single key is turned into many distinct keys. We can also use PRFs and PRPs to build other, more fundamental primitives such as pseudorandom number generators and modes of operation, which happen to be useful when encrypting things with block ciphers.

The how and why is a little complex, and that’s the part that will require all the explanation.

Random functions

There are many ideal primitives we’d love to be able to use in cryptography, but are thwarted from using due to the fact that they’re inefficient. One of the most useful of these is the random function.

Computer programmers tend to get confused about functions. This is mostly because programming language designers have convinced us that functions are the same thing as the subroutines (algorithms) that we use to compute them. In the purely mathematical sense, it’s much more useful to forget about algorithms, and instead think of functions as simply being a mapping from some set of input values (the domain) to some set of output values (the range).

If we must think about implementing functions, then for any function with a finite domain and range, there is always a simple way to implement it: simply draw up a giant (and yet still finite!) lookup table that contains the mapping from each input to the appropriate output value. Given such a table, you can always quickly realize an algorithm for evaluating it, simply by hard-coding the table into your software and performing a table lookup. (We obviously try not to do this when implementing software — indeed, most of applied computer science can be summarized as “finding ways to avoid using giant lookup tables”.)

A nice thing about the lookup table description of functions is that it helps us reason about concepts like the number of possible functions that can exist for a specific domain and range. Concretely: if a function has M distinct input values and N outputs, then the number of distinct functions sharing that profile is N^M. This probably won’t scale very well for even modest values of M and N, but let’s put this aside for a moment. Given enough paper, we could imagine writing down each unique lookup table on a piece of paper: then we could stack those papers up and admire the minor ecological disaster we’d have just created.

Now let’s take this thought-experiment one step farther: imagine that we could walk out among those huge stacks of paper we’ll have just created, and somehow pick one of these unique lookup tables uniformly at random. If we could perform this trick routinely, the result would be a true “random function”, and it would make an excellent primitive to use for cryptography. We could use these random functions to build hash functions, stream ciphers and all sorts of other things that would make our lives much easier.

There are some problems with this thought experiment, however.

A big problem is that, for functions with non-trivial domain and range, there just isn’t enough paper in the world to enumerate every possible function. Even toy examples fall apart quickly. Consider a tiny hash function that takes in (and outputs) only 4-bit strings. This gives us M=16 inputs and N=16 outputs, and hence number of (distinct) mappings is 16^{16} = 2^{64}, or about 18 quintillion. It gets worse if you think about “useful” cryptographic functions, say those with the input/output profile of ChaCha20, which has 128-bit inputs and 512-bit outputs. There you’d need a whopping (2^{512})^{2^{128}} (giant) pieces of paper. Since there are only around 2^{272} atoms in the observable universe (according to literally the first Google search I ran on the topic), we would quickly run into shortages even if we were somehow able to inscribe each table onto a single atom.

Obviously this version of the thought experiment is pretty silly. After all: why bother to enumerate every possible function if we’re going to throw most of them away? It would be much more efficient if we could sample a single random function directly without all the overhead.

This also turns out to be fairly straightforward: we can write down a single lookup table with M rows (corresponding to all possible inputs); for each row, we can sample a random output from the set of N possible outputs. The resulting table will be M rows long and each row will contain log_2(N) bits of data.

While this seems like a huge improvement over the previous approach, it’s still not entirely kosher. Even a single lookup table is still going to huge — at least as large as the function’s entire domain. For example: if we wanted to sample a random function with the input/output profile of ChaCha20, the table would require enough paper to contain 512*2^{128} = 2^{137} bits.

And no, we are not going to be able to compress this table! It should be obvious now that a random function generated this way is basically just a file full of random data. Since it has maximal entropy, compression simply won’t work.

The fact that random functions aren’t efficient for doing cryptography does not always stop cryptographers from pretending that we might use them, most famously as a way to model cryptographic hash functions in our security proofs. We have an entire paradigm called the random oracle model that makes exactly this assumption. Unfortunately, in reality we can’t actually use random functions to implement cryptographic functions — sampling them, evaluating them and distributing their code are all fundamentally infeasible operations. Instead we “instantiate” our schemes with an efficient hash algorithm like SHA3, and then we pray.

However, there is one important caveat. While we generally cannot sample and share large random functions in practice, we hope we can do something almost as interesting. That is, we can build functions that appear to be random: and we can do this in a very powerful cryptographic sense.

Pseudorandom functions

Random functions represent a single function drawn from a family of functions, namely the family that consists of every possible function that has the appropriate domain and range. As noted above, the cost of this decision is that such functions cannot be sampled, evaluated or distributed efficiently.

Pseudorandom functions share a similar story to random functions. That is, they represent a single function sampled from a family. What’s different in this case is that the pseudorandom function family is vastly smaller. A benefit of this tradeoff is that we can demand that the description of the function (and its family) be compact: pseudorandom function families must possess a relatively short description, and it must be possible to both sample and evaluate them efficiently: meaning, in polynomial time.

Compared to the set of all possible functions over a given domain and range, a pseudorandom function family is positively tiny.

Let’s stick with the example of ChaCha20. As previously discussed, ChaCha has a 128-bit input, but it also takes in a 256-bit secret key. If we were to view ChaCha20 as a pseudorandom function family, then we could view it as a family of 2^{256} individual functions, where each key value selects exactly one function from the family.

Now let’s be clear: 2^{256} is still a really big number! However: it is vastly smaller than (2^{512})^{2^{128}}, which is the total number of possible functions with ChaCha20’s input profile. Sampling a random 256-bit key and sharing it with to Bob is eminently feasible; indeed, your browser did something like this when you loaded this website. Sampling a “key” of bit-length {512}*{2^{128}} is not.

This leaves us with an important question, however. Since ChaCha20’s key is vastly smaller than the description of a random function, and the algorithmic description of ChaCha20 is also much smaller than the description of even a single random function, is it possible for small-key function family like “ChaCha20” to be as good (for cryptographic purposes) as a true random function? And what does “good” even mean here?

Defining pseudorandomness

Mirriam-Webster defines the prefix pseudo as “being apparently rather than actually as stated.” The Urban Dictionary is more colorful: it defines pseudo as “false; not real; fake replication; bootleg; tomfoolery“, and also strongly hints that pseudo may be shorthand for pseudoephedrine (note: it is not.)

Clearly if we can describe a function using a compact algorithmic description and a compact key, then it cannot be a true random function: it is therefore bootleg. However that doesn’t mean it’s entirely tomfoolery. What pseudorandom means, in a cryptographic sense, is that a function of this form will be indistinguishable from a truly random function — at least to an adversary who does not know which function we have chosen from the family, and who has a limited amount of computing power.

Let’s unpack this definition a bit!

Imagine that I create a black box that contains one of two possible items, chosen with equal probability. Item (1) is an instance of a single function sampled at random from a purported pseudorandom function family; item (2) is a true random function sampled from the set of all possible functions. Both functions have exactly the same input/output profile, meaning they take in inputs of the same length, and produce outputs of the same length (here we are excluding the key.)

Now imagine that I give you “oracle” access to this box. What this means is: you will be allowed to submit any input values you want, and the box will evaluate your input using whichever function it contains. You will only see the output. (And no, you don’t get to time the box or measure any side channels it might compute, this is a thought experiment.) You can submit as many inputs as you want, using any strategy for choosing them that you desire: they simply have to be valid inputs, meaning that they’re within the domain of the function. We will further stipulate that you will be computationally limited: that means you will only be able to compute for a limited (polynomial in, say, the PRF’s key length) number of timesteps. At the end of the day, your goal is to guess which type of function I’ve placed in the box.

We say that a family of functions is pseudorandom if for every possible efficient strategy (meaning, using any algorithm that runs in time polynomial in the key size, provided these algorithms were enumerated before the function was sampled), the “advantage” you will have in guessing what’s in the box is very tiny (at most negligible in, say, the size of the function’s key.)

A fundamental requirement of this definition is that the PRF’s key/seed (aka the selector that chooses which function to use) has to remain secret from the adversary. This is because the description of the PRF family itself cannot be kept secret: that is both good cryptographic practice (known as Kerckhoff’s principle), but also due to the way we’ve defined the problem over “all possible algorithms”, which necessarily includes algorithms that have the PRF family’s description coded inside of them.

And pseudorandom functions cannot possibly be indistinguishable from random ones if the attacker can learn or guess the PRF’s secret key: this would allow the adversary to simply compute the function themselves and compare the results they get to the values that come out of the oracle (thus winning the experiment nearly 100% of the time.)

There’s a corollary to this observation: since the key length of the PRF is relatively short, the pseudorandomness guarantee can only be computational in nature. For example, imagine the key is 256 bits long: an attacker with unlimited computational resources could brute-force guess its way through all possible 256-bit keys and test each one against the results coming from the oracle. If the box truly contains a PRF, then with high probability she’ll eventually find a key that produces the same results as what comes out of the box; if the box contains a random function, then she probably won’t. To rule such attacks out of bounds we must assume that the adversary is not powerful enough to test a large fraction of the keyspace. (In practice this requirement is pretty reasonable, since brute forcing through an n-bit keyspace requires on the order of 2^n work, and we assume that there exist reasonable values of n for which no computing device exists that can succeed at this.)

So what can we do with pseudorandom functions?

As I mentioned above, pseudorandom functions are extremely useful for a number of basic cryptographic purposes. Let’s give a handful here.

Building stream ciphers. One of the simplest applications of a PRF is to use it to build an efficient stream cipher. Indeed, this is exactly what the ChaCha20 function is typically used for. Let us assume for the moment that ChaCha20 is a PRF family (I’ll come back to this assumption later.) Then we could select a random key and evaluate the function on a series of unique input values — the ChaCha20 IETF proposals suggest concatenating a 64-bit block number with a counter — and then concatenate the outputs of the function together to produce a keystream of bits. To encrypt a message we would simply exclusive-OR (XOR) this string of bits (called a “keystream”) with the message to be enciphered.

Why is this reasonable? The argument breaks down into three steps:

  1. If we had generated the keystream using a perfect random number generator (and kept it secret, and never re-used the keystream) then the result would be a one-time pad, which is known to be perfectly secure.
  2. And indeed, had we had been computing this output using a truly random function (with a ChaCha20-like I/O profile) where each input was used exactly once, the result of this evaluation would indeed have been such a random string.
  3. Of course we didn’t do this: we used a PRF. But here we can rely on the fact that our attackers cannot distinguish PRF output from that of a random function.

One can make the last argument the other way around, too. If our attacker is much better at “breaking” the stream cipher implemented with a PRF than they are at breaking one implemented with a random function, then they are implicitly “distinguishing” the two types of function with a substantial advantage — and this is precisely what the definition of a PRF says that an attacker cannot do!

Constructing MACs. A PRF with a sufficiently large range can also be used as a Message Authentication Code. Given a message M, the output of PRF(k, M) — the PRF evaluated on a secret key k and the message M — should itself be indistinguishable from the output of a random function. Since this output will effectively be a random string, this means that an attacker who has not previously seen a MAC on M should have a hard time guessing the appropriate MAC for a given message. (The “strength” of the MAC will be proportional to the output length of the PRF.)

Key derivation. Often in cryptography we have a single random key k and we need to turn this into several random-looking keys (k1, k2, etc.) This happens within protocols like TLS, which (at least in version 1.3) has an entire tree of keys that it derives from a single master secret. PRFs, it turns out, are an excellent for this task. To “diversify” a single key into multiple keys, one can simply evaluate the PRF at a series of distinct points (say, k1 = PRF(k, 1), k2 = PRF(k, 2), and so on), and the result is a set of keys that are indistinguishable from random; provided that the PRF does what it says it does.

There are, of course, many other applications for PRFs; but these are some pretty important ones.

Pseudorandom permutations

Up until now we’ve talked about pseudorandom functions (PRFs): these are functions that have output that is indistinguishable from a random function. A related concept is that of the pseudorandom permutation (PRP). Pseudorandom permutations share many of the essential properties of PRFs, with one crucial difference: these functions realize a permutation of their input space. That is: if we concentrate on a given function in the family (or, translating to practical terms, we fix one “key”) then each distinct input maps to a distinct output (and vice versa.)

A nice feature of permutations is that they are potentially invertible, which makes them a useful model for something we use very often in cryptography: block ciphers. These ciphers take in a key and a plaintext string, and output a ciphertext of the same length as the plaintext. Most critically, this ciphertext can be deciphered back to the original plaintext. Note that a standard (pseudo)random function doesn’t necessarily allow this: for example, a PRF instance F can map multiple inputs (A, B) such that F(A) = F(B), which makes it very hard to uniquely invert either output.

The definition of a pseudorandom permutation is very similar to that of a PRF: they must be indistinguishable from some idealized function — only in this case the ideal object is a random permutation. A random permutation is simply a function sampled uniformly from the set of all possible permutations over the domain and range. (Because really, why wouldn’t it be?)

There are two important mathematical features of PRPs that I should mention here:

PRPs are actually PRFs (to an extent.) A well-known result in cryptography, called the “PRP/PRF switching lemma” demonstrates that a PRP with sufficiently-large domain and range basically “is” a PRF. Put differently: a pseudorandom permutation placed into an oracle can be computationally indistinguishable from an oracle that contains a random function (with the same domain and range), provided the range of the function is large enough and the attacker doesn’t make too many queries.

The intuition behind this result is fairly straightforward. If we consider this from the perspective of an attacker interacting with some function in an oracle, the only difference between a random permutation and a random function is that the former will never produce any collisions — distinct inputs that produce the same output — while the latter may (occasionally) do so.

Feistel cipher diagram en.svg

From the adversary’s perspective, therefore, the ability to distinguish whether the oracle contains a random permutation or a random function devolves to querying the oracle to see if one can observe such a collision. Clearly if it sees even one collision of the form F(A) = F(B), then it’s not dealing with a permutation. But it may take many queries for the attacker to find such a collision in a random function, or to be confident one should already have occurred (and hence it is probably interacting with a PRP.)

In general the ability to distinguish the two is a function of the number of queries the attacker is allowed to make, as well as the size of the function’s range. After a single query, the probability of a collision (on a random function) is zero: hence the attacker has no certainty at all. After two queries, the probability is equal to 1/N where N is the number of possible outputs. As the attacker makes more queries this probability increases. Following the birthday argument the expected probability reaches p=0.5 after about \sqrt{N} queries. For functions like AES, which has output size 2^{128}, this will occur around 2^{64} queries.

PRFs can be used to build PRPs. The above result shows us that PRPs are usually good enough to serve as PRFs “without modification.” What if one has a PRF and wishes to build a PRP from it? This can also be done, but it requires more work. The standard technique was proposed by Luby and Rackoff and it involves building a Feistel network, where each “round function” in the PRP is built using a pseudorandom function. (See the picture at right.) This is a bit more involved than I want to get in this post, so please just take away the notion that the existence of one of these objects implies the existence of the other.

Why do I care about any of this?

I mean, you don’t have to. However: I find that many people just getting into cryptography tend to get very involved in the deep details of particular constructions (ciphers and elliptic curves being one source of rabbit-holing) and take much longer to learn about useful analysis tools like PRFs and PRPs.

Once you understand how PRPs and PRFs work, it’s much easier to think about protocols like block cipher modes of operation, or MAC constructions, or anything that involves deriving multiple keys.

Take a simple example, the CBC mode of operation: this is a “classical” mode of operation used in many block ciphers. I don’t recommend that you use it (there are better modes) but it’s actually a very good teaching example. CBC encryption requires the sender to first select a random string called an Initialization Vector, then to chop up their message into equal-size blocks. Encryption looks something like this:

Cipher block chaining (CBC) mode encryption
From Wikipedia. The plus signs are bitwise XOR.

If we’re willing to assume that the block cipher is a PRP, then analyzing the security of this construction shouldn’t be terribly hard. Provided the block size of the cipher is large enough, we can first use the PRP/PRF switching lemma to argue that a PRP is (computationally) indistinguishable from a random function. To think about the security of CBC-mode encryption, therefore, we can (mathematically) replace our block cipher with a random function of the appropriate domain and range. Now the question is whether CBC-mode is secure when realized with a random function.

So if we replace the block cipher with a random function, how does the argument work?

Well obviously in a real scheme both the encryptor and decryptor would need to have a copy of the same function, and we’ve already covered why that’s problematic: the function would need to be fully-sampled and then communicated between the two parties. Then they would have to scan through a massive table to find each entry. But let’s put that aside for a moment.

Instead let’s focus only on the encryptor. Since we don’t have to think about communicating the entire function to another party, we don’t have to sample it up front. Instead we can sample it “lazily” for the purposes of arguing security.

More specifically: means instead of sampling the entire random function in one go, we can instead imagine using an oracle that “builds” the function one query at a time. The oracle works as follows: anytime the encryptor queries it on some input value, the oracle checks to see if this value has been queried before. If it has previously been queried, the oracle outputs the value it gave previously. Otherwise it samples a new (uniformly random) output string using a random number generator, then writes the input/output values down so it can check for later duplicate inputs.

Now imagine that an encryptor is using CBC mode to encrypt some secret message, but instead of a block cipher they are using our “random function” oracle above. The encryption of a message will work like this:

  1. To encrypt each new message, the encryptor will first choose a uniformly-random Initialization Vector (IV).
  2. She will then XOR that IV with the first block of the message, producing a uniformly-distributed string.
  3. Then she’ll query the random function oracle to obtain the “encipherment” of this string. Provided the oracle hasn’t seen this input before, it will sample and output a uniformly random output string. That string will form the first block of ciphertext.
  4. Then the encryptor will take the resulting ciphertext block and treat it as the “IV” for the next message block, and will repeat steps (2-4) over and over again for each subsequent block.

Notice that this encryption is pretty good. As long as the oracle never gets called on the same input value twice, the output of this encryption process will be a series of uniformly-random bits that have literally nothing to do with the input message. This strongly imples that CBC ciphertexts will be very secure! Of course we haven’t really proven this: we have to consider the probability that the encryptor will query the oracle twice on the same input value. Fortunately, with a little bit of simple probability, we can show the following: since (1) each input is uniformly distributed, then (2) the probability of such a repeated input stays quite low.

(In practice the probability works out to be a function of the function’s output length and the total number of plaintext blocks enciphered. This analysis is part of the reason that cryptographers generally prefer ciphers with large block sizes, and why we place official limits on the number of blocks you’re allowed to encipher with modes like CBC before you change the key. To see more of the gory details, look at this paper.)

Notice that so far I’ve done this analysis assuming that the block cipher (encipherment) function is a random function. In practice, it makes much more sense to assume that the block cipher is actually a pseudorandom permutation. Fortunately we’ve got most of the tools to handle this switch. We need to add two final details to the intuition: (1) since a PRF is indistinguishable from a random function to all bounded adversaries, we can first substitute in a PRF for that random function oracle with only minimal improvement in the attacker’s ability to distinguish the ciphertext from random bits. Next: (2) by the PRP/PRF switching lemma we can exchange that PRF for a PRP with similarly minor impact on the adversary’s capability.

This is obviously not a proof of security: it’s merely an intuition. But it helps to set up the actual arguments that would appear in a real proof. And you can provide a similar intuition for many other protocols that use keyed PRF/PRP type functions.

What if the PRP/PRF key isn’t secret?

One of the biggest restrictions on the PRF concept is the notion that these functions are only secure when the secret key (AKA, the choice of which “function” to use from the family) is kept secret from the adversary. We already discussed why this is critical: in the PRF (resp. PRP) security game, an attacker who learns the key can instantly “distinguish” a pseudorandom function from a random one. In other words, knowledge of the secret key explodes the entire concept of pseudorandomness. Hence from a mathematical perspective, the security properties of a PRF are somewhere between non-existent and undefined in this setting.

But that’s not very satisfying, and this kind of non-intuitive behavior only makes people ask more questions. They come back wondering: what actually happens when you learn the secret key for a PRF? Does it explode or collapse into some kind of mathematical singularity? How does a function go from “indistinguishable from random” to “completely broken” based on learning a small amount of data?

And then, inevitably, they’ll try to build things like hash functions using PRFs.

The former questions are mainly interesting to cryptographic philosophers. However the latter question is practically relevant, since people are constantly trying to do things like build hash functions out of block ciphers. (NB: this is not actually a crazy idea. It’s simply not possible to do it based solely on the assumption that these functions are pseudorandom.)

So what happens to a PRF when you learn its key?

One answer to this question draws from the following (tempting, but incorrect) line of reasoning: PRFs must somehow produce statistically-“random looking” output all the time, whether you know the key or not. Therefore, the argument goes, the PRF is effectively as good as random even after one learns the key.

This intuition is backed up by the following thought-experiment:

  1. Imagine that at time (A) I do not know the key for a PRF, but I query an oracle on a series of inputs (for simplicity, let’s say I use the values 1, 2, …, q for some integer q that is polynomial in the key length.)
  2. Clearly at this point, the outputs of the PRF must be indistinguishable from those of a true random function. If the range of the function comprises \ell-bit strings, then any statistical “randomness test” I run on those outputs should “succeed”, i.e., tell me that they look pretty random.

    (Putting this differently: if any test reliably “fails” on the output of the PRF oracle, but “succeeds” on the output of a true random function, then you’ve just built a test that lets you distinguish the PRF from a random function — and this means the function was never a PRF in the first place! And your “PRF” will now disappear in a puff of logic.)
  3. Now imagine that at time (B)after I’ve obtained the oracle outputs — someone hands me the secret key for the PRF that was inside the oracle. Do the outputs somehow “stop” being random? Will the NIST test suite suddenly start failing?

The simple answer to the last question is “obviously no.” Any public statistical test you could have performed on the original outputs will still continue to pass, even after you learn the secret key. What has changed in this instance is that you can now devise new non-public statistical tests that are based on your knowledge of the secret key. For example, you might test to see if the values are outputs of the PRF (on input the secret key), which of course they would be — and true random numbers wouldn’t be.

So far this doesn’t seem so awful.

Where things get deeply unpleasant is if the secret key is known to the attacker at the time it queries the oracle. Then the calls to the PRF can behave in ways that deviate massively from the expected behavior of a random function. For example, consider a function called “Katy-Perry-PRF” that generally behaves like a normal PRF most of the time, but that spews out Katy Perry lyrics when queried on specific (rare) inputs.

Provided that these rare inputs are hard for any attacker to find — meaning, the attacker will find them only with negligible probability — then Katy-Perry-PRF will be a perfectly lovely PRF. (More concretely, we might imagine that the total number of possible input values is exponential in the key length, and the set of “Katy-Perry-producing” input values forms a negligible fraction of this set, distributed pseudorandomly within it, to boot.) We can also imagine that the location of these Katy-Perry-producing inputs is only listed in the secret key, which a normal PRF adversary will not have.

Clearly a standard attacker (without the secret key) is unlikely to find any inputs that produce Katy Perry lyrics. Yet an attacker who knows the secret key can easily obtain the entire output of Katy Perry’s catalog: this attacker will simply look through the secret key to find the appropriate inputs, and then query them all one at a time. The behavior of the Katy-Perry function on these inputs is clearly very different from what we’d expect from a random function and yet here is a function that still satisfies the definition of a PRF.

Now obviously Katy-Perry-PRF is a silly and contrived example. Who actually cares if your PRF outputs Katy Perry lyrics? But similar examples can be used to produce PRFs that enable easy “collisions”, which is generally a bad thing when one is trying to build things like hash functions. This is why the construction of such functions needs to either assume weaker properties (i.e., that you get only collision-resistance) or make stronger assumptions, such as the (crazy) assumption that the block cipher is actually a random function.

Finally: how do we build PRFs?

So far I’ve been using the ChaCha function as an example of something we’d really like to imagine is a PRF. But the fact of the matter is that nobody knows how to actually prove this. Most of the practical functions we use like PRFs, which include ChaCha, HMAC-SHA(x), and many other ciphers, are constructed from a handful of simple mathematical operations such as rotations, XORs, and additions. The result is then analyzed by very smart people to see if they can break it. If someone finds a flaw in the function, we stop using it.

This is theoretically less-than-elegant. Instead, it would be nice to have constructions we clearly know are PRFs. Unfortunately the world is not quite that friendly to cryptography.

From a theoretical perspective, we know that PRFs can be constructed from pseudorandom generators (PRGs). We further know that PRGs can in turn be constructed from one-way functions (OWFs). The existence of the latter functions is one of the most basic assumptions we make in cryptography, which is a good way of saying we have no idea if they exist but we are hopeful. Indeed, this is the foundation of what’s called the “standard model.” But in practice the existence of OWFs remains a stubbornly open problem, bound tightly to the P/NP problem.

If that isn’t entirely satisfying to you, you might also like to know that we can also build (relatively) efficient PRFs based on the assumed hardness of a number of stronger mathematical assumptions, including things like the Decisional Diffie-Hellman assumption and various assumptions in lattices. Such things are nice mainly because they let us build cool things like oblivious PRFs.

Phew!

This has been a long piece, on that I’m glad to have gotten off my plate. I hope it will be helpful to a few people who are just starting out in cryptography and are itching to learn more. If you are one of these people and you plan to keep going, I urge you to take a look at a cryptography textbook like Katz/Lindell’s excellent textbook, or Goldreich’s (more theoretical) Foundations of Cryptography.

Top photo by Flickr user Dave DeSandro, used under CC license.

In defense of crypto(currency)

In defense of crypto(currency)

Last week a group of technologists, including Bruce Schneier, sent a letter to Congress outlining their concerns around cryptocurrency and urging Congress to regulate the space.

Now let me be the first to say that I broadly support this goal. I have no problem with the idea of legislators (intelligently) passing laws to regulate cryptocurrency. Indeed, given the level of insanity and the number of outright scams that are happening in this area, it’s pretty obvious that our current regulatory framework is not up to the task. If the recent letter simply asked for intelligent regulation, I’d gladly sign onto it. Unfortunately that’s not at all what this letter says. Instead, it argues that the entire technology field is worthless and cannot be used for any practical purpose.

Don’t take my word for it. I urge you to stop reading this post right now and take a look for yourself. I’ve helpfully reproduced some of the critical pieces below (emphasis added):

By its very design, blockchain technology, specifically so-called “public blockchains”, are poorly suited for just about every purpose currently touted as a present or potential source of public benefit. From its inception, this technology has been a solution in search of a problem and has now latched onto concepts such as financial inclusion and data transparency to justify its existence, despite far better solutions already in use. After more than thirteen years of development, it has severe limitations and design flaws that preclude almost all applications that deal with public customer data and regulated financial transactions and are not an improvement on existing non-blockchain solutions.

The catastrophes and externalities related to blockchain technologies and crypto-asset investments are neither isolated nor are they growing pains of a nascent technology. They are the inevitable outcomes of a technology that is not built for purpose and will remain forever unsuitable as a foundation for large-scale economic activity.

Frankly, this whole letter bums me out. Over the years I’ve spent a decent amount of time on Twitter calling out cryptocurrency shills who spout technical nonsense while promoting outright confidence games. I took for granted that my technical colleagues would be more a little more reasonable, particularly when speaking as technical experts to Congress and regulators. This is not simply someone “being wrong on the Internet.” These are important claims that deserve serious attention, and there are real consequences to being wrong here.

So while I appreciate the authors’ intentions, against my better judgement — and you’d better believe my better judgement is shouting at me for writing these words — I feel compelled to say something in defense of this technology area. “Public blockchain” technology enables many stupid things: today’s cryptocurrency schemes can be venal, corrupt, overpromised. But the core technology is absolutely not useless. In fact, I think there are some pretty exciting things happening in the field, even if most of them are further away from reality than their boosters would admit. Moreover, many of crypto’s technical problems are also amenable to some really exciting technical solutions, many of which are already here or on their way to deployment.

So instead of enjoying a beautiful week of Baltimore summer, I’m going to spend my time indoors, writing about what I think those things are — and (more or less incidentally) why I think these distinguished authors are wrong that “public blockchain technology” is a technical dead end. This post is not precisely a rebuttal to the letter above: instead, I’ve decided to phrase it as a general response to some of the more common spurious objections I hear people make to public blockchain systems. It just happens to be the case that a few of these (but not all) come up in the letter.

Finally, in the interest of full disclosure: I have designed privacy-preserving cryptocurrency systems (and still serve on a Foundation board of one), and I’m currently working on a startup that is trying to add regulatory compliance capabilities to public blockchains. (You can decide for yourself if this makes me a shill for Big Blockchain or Big Regulation. Frankly I’m not sure.)

Objection: “Cryptocurrency is terrible for the environment”

It’s probably best not to beat around the bush on this one: the most serious current objection to cryptocurrency (as it’s currently construed) is the massive environmental impact of proof-of-work (PoW) mining. I won’t devote a single second towards minimizing this concern. Many defenders have tried to paint the electricity consumption of Bitcoin and other PoW currencies as “green” or define it as a form of energy storage. This is dishonest nonsense: estimates hold that at east 60% of mining energy consumption still comes from fossil sources.

And that’s a lot of energy.

The totals depend on which smallish nation we’re using as our standard unit of energy consumption: this September 2021 NYT article has Bitcoin (all by itself) consuming nearly as much energy as Finland. Yet whereas Finland produces Nokia cellphones and lovable wooden clogs, Bitcoin just produces… bitcoin. Not to mention a pathetic transaction rate of about 3.5 tx/second across the entire globe, as of this week.

Overview of mining pool activity on the Bitcoin chain. Does that look decentralized to you?

Regardless of where you stand on cryptocurrency as a technology, you should understand that this wasteful resource consumption colors the public’s views of cryptocurrency in a highly negative way, and it is absolutely right for people to have these feelings because we are in a climate crisis and wasting this much goddamn energy on a single consensus protocol is pointless, harmful, and quite possibly evil — particularly when the result of all this energy consumption isn’t even a particularly decentralized network.

However: before you call for some kind of misguided cryptocurrency ban, you should understand that this is a temporary situation and not an intrinsic component of public blockchain technology.

Proof-of-work was chosen early in Bitcoin’s history because Bitcoin was designed to be operated by volunteers with personal computers. The concern in this setting was that a single user could mount a “Sybil attack” and pretend to be many different computers, thus dominating the construction of blocks on the network. Since verifiable real-world identities didn’t exist on the Internet, Nakamoto chose proof-of-work as an elegant solution: this approach makes your “vote” in the network organization proportional to the amount of computing power you possess. Since the typical early Bitcoin user only had one or a small number of computer CPUs to mine with, this kept the early Bitcoin network relatively decentralized.

Unfortunately modern proof-of-work mining looks nothing at all like the early Bitcoin network: today’s mining is a capital-intensive industry that competes to build entire datacenters full of specialized ASIC-based mining hardware. This change has undone most of the early decentralization benefits, while pointlessly burning tons of coal and natural gas.

But all is not lost.

Proof-of-work is not the only technology we have on which to build consensus protocols. Today, many forward-looking networks are deploying proof-of-stake (PoS) for their consensus. In these systems, your “voting power” in the network is determined by your ownership stake in some valuable on-chain asset, such as a new or existing electronic token. Since cryptocurrency has coincidentally spent a lot of time distributing tokens, this means that new protocols can essentially “cut out the middleman” and simply use coin ownership directly as a proxy for voting power, rather than requiring operators to sell their coins to buy electricity and mining hardware. Proof-of-stake systems are not perfect: they still lead to some centralization of power, since in this paradigm the rich tend to get richer. However it’s hard to claim that the result will be worse than the semi-centralized mess that proof-of-work mining has turned into.

And proof-of-stake is no longer theory. It’s already been deployed in production within a number of successful projects, including Avalanche, Cardano, Algorand, and Tezos. The Ethereum project is in the process of rolling out a proof-of-stake upgrade they call “Ethereum 2”, and while the final plans still seem a little handwavy for this late date, there has at least been some real progress in launching parts of the system. Beyond proof-of-stake, there are other technologies in deployment, such as the proof-of-time-and-space construction used by Chia, or more centralized proof-of-authority systems.

Now, admittedly: none of this solves the problem that much cryptocurrency is dirty today.

The letter is actually surprisingly nuanced about this objection.

But the question we should be asking is not whether to be angry about the power consumption of proof-of-work mining. We should be trying to figure out the right path out of this mess. And more concretely, whether there’s a path forward which is more likely to produce a good outcome than what is already happening in the industry — namely, that projects are rapidly deploying cleaner technologies to replace proof-of-work. Because it’s very unlikely that shade or hypothetical cryptocurrency bans are going to fix the problem any faster, and in fact: government overreaction could make it much, much worse by driving resources away from the cleaner chains that are coming online to solve the problem.

Objection: “Public blockchains can never support banking features like transaction reversal.”

One of the biggest problems in the field of cryptocurrency is that too many technical experts stopped looking at the field around 2015. This means they’ve missed a lot of the more interesting developments that have occurred in the past couple of years.

To give an example of this phenomenon, let’s take one claim that occurs very near the top of my colleagues’ letter:

This claim is not technically accurate. And worse, it indicates that the readers have missed several years’ of business and technological development. Unfortunately, correcting mistakes like this requires diving pretty deep into the technical weeds.

Blockchains work by assembling a data structure called an append-only ledger. Much like a traditional pen-and-paper bank ledger (shown at right), this ledger represents a list of events, such as currency transactions. A common feature of blockchain tech is that the ledger is constructed using a kind of “adversarial collaboration” that runs between many different computers. The upshot of this process is that entries on the ledger are very difficult to tamper with.

In the earliest cryptocurrency systems (like Bitcoin), the ledger is used to record the ownership and transfer of made-up tokens, such as the bitcoin currency. Since there is no trusted party or “bank” that manages the accounts on these systems, Bitcoin’s transaction rules are very simple and work like cash. If I sent money to your account, only you (using a cryptographic private key) should be able to control where it goes next. This is exciting and also a bit scary: there is no “undo” feature in these cash-like tokens.

By contrast, the modern retail-facing credit card and banking industries work very differently. In those industries there exist trusted parties (your bank or credit card’s customer-service representative) who can and do “reverse” fraudulent or mistaken transactions under specific circumstances. (Whether they will do so is a very different question.)

While it’s technically accurate that blockchain ledgers cannot easily be overwritten, it’s critical to understand that ledgers really has nothing really to do with transaction reversibility — any more than the specific writing instrument used by a historical bank (pen vs. pencil) determines whether a bank can reverse transactions. In practice, transaction reversal has nothing to do with the way a ledger is written. Transaction reversal is not about ledger technology, it’s about transaction rules and trust: it requires that there is someone that you trust to make transaction in your accounts without your explicit permission.

In other words, transaction reversibility is not about the ledger, but rather about the transaction rules that a currency uses. A reversible currency requires that someone anoint this trusted party (or trusted parties) and that they use their powers to freeze/burn/transact currency in ways that are at odds with the recorded owners’ intentions. And indeed, this is a capability that many tokens now possess, thanks to the development of sophisticated smart contract systems like Ethereum, that allow parties to design currencies with basically any set of transaction rules they want.

And what’s fascinating is that none of this is hypothetical!

One of the most interesting developments of the past several years is the deployment of several government-regulated “stablecoins” that represent tokenized versions of real fiat currency (e.g., dollars) in a bank account. More or less without exception, these regulated currencies, which are issued by licensed and government-regulated organixations like USDC and BUSD (and are not the same as unregulated algorithmic scam coins like UST) each possess a centralized party/committee that can “freeze”, mint, or “burn” money owned by any user in the system [BUSD code, USDC code]. A centralized manager of the currency can therefore “lock” the account of any illegitimate recipient and compensate the spender directly, or in some cases, the centralized manager can “burn” and mint new currency to send back to the originator.

Indeed, this capability is explicitly mandated by regulators:

It is reasonable to point out that, compared to mature banking systems, current stablecoins’ transaction reversal capabilities are quite rudimentary. From a business perspective there is no guarantee that a coin issuer will return your stolen money, nor that they can do so in the event that a thief has already passed your money on to a fence. (Just as there’s no guarantee that Zelle will do so.) Modern banks implement these features with fraud-detection features and with a combination of delayed settlement, insurance, and trust. Some of these are technical capabilities, but many are largely business questions. In either case, none are “antithetical to the design of public blockchains.” If reversibility is actually important to you as a feature, then you should pay attention to how these new regulated systems develop.

Two regulated stablecoins with freeze capability: Binance USD (issued by Paxos) and
(issued by Circle/Centre). Chart shows market capitalization from near-$0 in 2020 to about $60bn today.

Objection: “Cryptocurrency doesn’t scale [or the fees are too damned high]”

The early Bitcoin protocol was designed to be many things, but fast and efficient was not one. The system’s famously low transaction rate is effectively the result of several tradeoffs in the design of the network’s consensus algorithm: where centralized payment systems like Visa or Mastercard can scale horizontally — by assigning different computers to handle various subsets of user transactions — in Bitcoin (and Ethereum, and most other extant platforms) every single participating node must validate every single transaction made by anyone in the system. This means that simply adding more computing power doesn’t produce a faster network.

The result is pretty dismal. Bitcoin’s transaction rate has historically topped out around 7 tx/sec worldwide (although recent upgrades may improve things slightly.) Ethereum pulls off maybe 20-30 tx/sec by sailing a bit closer to the wind. Meanwhile: networks like Visa handle about 1700 tx/sec on an ordinary day (!), and 10x that rate on major holidays.

This scaling problem makes cryptocurrency unworkable for just about any mainstream payments application. A single popular mobile game like Candy Crush probably conducts enough in-game transactions to challenge the Ethereum network.

via Etherscan.

And because the transaction rate in these so-called Layer 1 (L1) cryptocurrency networks is so low, competition for scarce network resources translates into high transaction fees. That’s why it recently cost $22 (!) to send a single token transaction on Ethereum, and much more to handle sophisticated transactions like DeFi swaps. These prices are fine if you’re a crypto speculator doing $1,000+ trades. But they rule out anything as mundane as paying people.

This sounds pretty bad. However, an important lesson I’ve learned in my career is this: if people are sufficiently motivated and the only barrier is an engineering problem, then it’s probably wise not to bet against them.

With some exceptions, the cryptocurrency community has acknowledged that existing techniques are unscalable, and they are engineering to try to get around this. The result has gone in two different directions. Both are unproven, and the resolution of these developments will determine exactly how well the field can grow in the future.

More money, more networks. The most obvious way to scale L1 cryptocurrencies is just to build more of them. This describes a lot of the action in 2019-2021: as networks like Ethereum saturate with transactions and become expensive, new entrants are deploying compatible (and sometimes incompatible) chains that provide more transaction capacity without the fees. These networks often look like Ethereum (in that they’re Nakamoto-consensus, proof-of-work mined systems), but sometimes they’re stripped down or use faster consensus technologies (examples include Avalanche, Polygon, Celo and Solana.) Some are just centralized clones of Ethereum.

This might stave off total chaos, but it probably isn’t sustainable. Even if all these networks work perfectly — and that’s a big assumption — the problem with adding more networks is that your ecosystem fragments. If funds are on the Ethereum chain and you want to use an application that lives on a different network, how do you get your funds over there? The solution today mainly involves “bridges” — semi-centralized parties that will accept your money on one L1 network and provide you with funds on another. If you want to move your funds back to the original network, that’s another pass through the bridge, with fees on both networks. It also means you have to trust the integrity of both the bridge itself and the destination network, which might itself be much less robust (and safe) than the original network you started from. It’s possible this approach will work well enough to enable most applications, but I wouldn’t bet on it.

Rollups. The second approach, heavily promoted by the Ethereum Foundation, is to improve the scaling of individual L1 networks like Ethereum by performing transactions on some second-layer, with the most common proposal being a “rollup server.”

Rollup servers are centralized machines that can validate many transactions quickly. One rollup server doesn’t handle every payment or smart contract on the chain. Instead, it operates over one or a small number of applications (say, a few specific smart contracts.) Users submit their transactions either to the chain itself, or directly to the server, which then validates the transactions and posts a short “proof” to the L1 chain asserting that a large collection of transactions has been checked and found to be valid. The idea here is to reduce the amount of computation and storage that the L1 nodes must perform to check these transactions: rather than validating 10,000 individual transactions, the L1 nodes simply verify one short assertion posted by the rollup server.

This sounds like magic, and it is, sort of. There are two approaches to building rollups, both of which are in “experimental” production today:

  • Optimistic rollups use a “trust and punish” approach. The rollup server posts a financial bond to assure the world that it will correctly validate all of its transactions. In the unlikely event that the rollup server “cheats” and authorizes an invalid transaction, any third party whistleblower can submit a “fraud proof” that proves the rollup server’s failure. The L1 network can check these proofs, which will invalidates the bad transactions and pay the whistleblower a large reward.
  • ZK rollups use cryptographic technology drawn from the field of zero-knowledge protocols, such as SNARK or STARK proofs, so the server can “prove” that all transactions were validated correctly before it posts the summary results to the chain. In principle this means the L1 chain can verify a short “proof” that covers many thousand complicated transactions, with (essentially) no possibility of cheating.

Rollups sound like a terrific idea, and the Ethereum community has bet heavily on the tech. But it’s worth pointing out that in practice nobody knows how well things will actually bear out when these systems are in widespread deployment.

One problem is that rollups today are largely focused on reducing the computational burden of verifying transactions. This is a big deal, particularly for Ethereum where verifying complex smart contract executions is quite costly. But even with today’s rollups, L1 nodes are still expected to store and transmit the raw transaction data: without keeping these transactions around, the loss of a rollup server could freeze an entire smart contract in place, blocking any further progress. This means scaling bottlenecks still exist — they’ll just be hit when nodes run low on bandwidth and storage, rather than compute.

That’s still a potentially huge improvement and many folks are optimistic: Vitalik Buterin calculates maximum possible scaling improvements on the order of 100x for rollup servers, though he quickly tempers this calculation by noting practical concerns.

In any case, the point of this section is not to claim that scaling is a solved problem. Rather, the point here is that scale improvements are on the minds of a lot of smart engineers, and there are solutions on the way. The results might not be perfect and presumably there will be much experimentation before we settle on a workable approach, but the end result will almost certainly work fine at some level of scalability. The main question is simply how robust and decentralized the result will be.

Objection: “There is no privacy on blockchains (or there is too much privacy)”

Public blockchains rely on volunteers to operate a network that verifies transactions. The implication of this design is that the transaction data itself must be publicly viewable. While a few naive people still believe that these currencies anonymous, the truth is quite different: these older public chains expose your transactions to anyone who wants to see them. In those systems, your main protection is that transactions use a pseudonym (called an address) in place of your real identity.

A random transaction from a Bitcoin block explorer.

Some advocates once felt that pseudonymity was good enough for privacy, but this belief has receded a bit as sophisticated “chain analysis” companies have grown up and made progress towards identifying the real owners of various accounts. The lack of privacy on these systems poses a real challenge for those who would like to build financial infrastructure on public chains.

The good news is that researchers have made a lot of progress on solving privacy problems around cryptocurrency. We now have deployed privacy tech that allows users to conduct transactions on public blockchains without revealing any information that they do not wish to reveal. These systems work by (effectively) encrypting transactions and using sophisticated zero-knowledge proofs to convince verifiers that the transactions data is consistent (i.e., that encrypted transaction amounts “add up” and do not create money out of thin air.) There now exist several deployed cryptocurrencies that provide strong privacy even against government surveillance, and law enforcement has expressed concerns about them.

Text from US Executive Order dated March 2022.

Still, to criticize an early technology for being both private and not-private-enough has a “nobody goes there, it’s too crowded” feel to it. What is true is that over many years the traditional financial system has learned to walk a tightrope where customer privacy is balanced on one hand against anti-money-laundering regulations on the other (as regulated by laws like the Bank Secrecy Act in the US.) Achieving those goals in today’s financial infrastructure has not been without cost, however. Our privacy as individuals has been vastly degraded by technological developments, with little opportunity for democratic debate. And the result sucks: collecting all of our private data and securing it is expensive, an expense we all pay for in high fees and catastrophic breaches. The cryptographic privacy offered by cryptocurrency is exciting because offers a different road forward: one that promises to keep irrelevant data in our own hands.

Even TornadoCash has regulatory compliance. Sort of.

The actual form that these mature systems take is unknown to me. Perhaps they’ll use zero-knowledge policies that keep smaller transactions private, while ensuring that larger transactions are known to regulators or other parties. I’m not excited to use those systems, because I think they’ll be risky. But at very least they’ll be better than our current collect-it-all-and-then-hand-it-to-hackers approach, which certainly has not done us very many favors.

So why do I care about any of this?

To put it simply: because payments are important. And because something is badly wrong.

Credit card merchant fees have risen since this ad was on TV.

Over the past four decades, computer networking has radically changed the economics of just about every industry that relies heavily on IT. Google made information access so easy that we can barely remember the world before it existed. My kids refuse to believe that I once paid $1/minute for long distance phone calls. It’s so inexpensive to start an online retailer that we now have more online pet food stores than we have drive-in movie theaters in the United States.

And yet if you looked at the money-transfer and payments industry, you’d see no such changes. Credit card merchant fees are similar, or have actually risen in the United States since the 1990s, and that is an absolute tragedy — since these fees are baked into the cost of most retail goods and thus fall heavily on the working poor (who pay them even if they use cash.)

If all you care about is technology, consumer payment tech has improved at a glacial pace. It took nearly two decades to roll out anti-fraud improvements like EMV chipchards and tap-to-pay (NFC) in the US. Shopping online in 1995 meant typing credit card numbers into webforms. In 2022… it mostly means exactly the same thing. As a result, online payment fraud has ballooned to around $200bn in 2020. And forget about real innovation like payment privacy or pay-by-phone (which is ubiquitous in Kenya and China but still in its infancy here in the US, and will likely only be “solved” by giving Apple and Google total control over payments.)

Why are these IT-focused industries so consistently immune to the same technological improvements and cost reductions we see everywhere else?

I’m only a computer scientist, so I’m going to let someone else answer that question. I can only tell you that what we have right now is not functioning properly: I suspect that legacy industry and regulators have smothered two generations of technological improvement, largely (I suspect) by building a (mostly) closed and permissioned financial system. And this is a big deal: payments are too important to our economy to entrust them to 1970s-era technology and an extractive industry. We don’t even know what novel applications — Googles, Facebooks, Wikipedias, Instagrams — we’re missing out on because the industry simply won’t allow them to exist.

I don’t know if blockchains are the solution to this problem. I see indications that the technology is finally starting to grow up in ways that seem like a harbinger of major positive changes on the horizon. Progress here is slow, though in some cases because the regulatory apparatus is throwing sand in the gears of cooperative products, and/or utterly failing to move expeditiously to uncover possible fraud. And maybe the result won’t even be a success for blockchain solutions: perhaps we’ll simply get more and better offerings from “traditional finance” industry as they start to wake up to the fact that more open systems can compete with their closed offerings.

So while I don’t know if cryptocurrency will be the answer, I’m just hopeful that something will be.

Title image by Flickr user Joegoauk Goa, used under CC license.

What is the random oracle model and why should you care? (Part 5)

What is the random oracle model and why should you care? (Part 5)

This is part five of a series on the Random Oracle Model.  See here for the previous posts:

Part 1: An introduction
Part 2: The ROM formalized, a scheme and a proof sketch
Part 3: How we abuse the ROM to make our security proofs work
Part 4: Some more examples of where the ROM is used

About eight years ago I set out to write a very informal piece on a specific cryptographic modeling technique called the “random oracle model”. This was way back in the good old days of 2011, which was a more innocent and gentle era of cryptography. Back then nobody foresaw that all of our standard cryptography would turn out to be riddled with bugs; you didn’t have to be reminded that “crypto means cryptography“. People even used Bitcoin to actually buy things.

That first random oracle post somehow sprouted three sequels, each more ridiculous than the last. I guess at some point I got embarrassed about the whole thing — it’s pretty cheesy, to be honest — so I kind of abandoned it unfinished. And that’s been a major source of regret for me, since I had always planned a fifth, and final post, to cap the whole messy thing off. This was going to be the best of the bunch: the one I wanted to write all along.

To give you some context, let me briefly remind you what the random oracle model is, and why you should care about it. (Though you’d do better just to read the series.)

The random oracle model is a bonkers way to model (reason about) hash functions, in which we assume that these are actually random functions and use this assumption to prove things about cryptographic protocols that are way more difficult to prove without such a model. Just about all the “provable” cryptography we use today depends on this model, which means that many of these proofs would be called into question if it was “false”.

And to tease the rest of this post, I’ll quote the final paragraphs of Part 4, which ends with this:

You see, we always knew that this ride wouldn’t last forever, we just thought we had more time. Unfortunately, the end is nigh. Just like the imaginary city that Leonardo de Caprio explored during the boring part of Inception, the random oracle model is collapsing under the weight of its own contradictions. 

As promised, this post will be about that collapse, and what it means for cryptographers, security professionals, and the rest of us.

First, to make this post a bit more self-contained I’d like to recap a few of the basics that I covered earlier in the series. You can feel free to skip this part if you’ve just come from there.

In which we (very quickly) remind the reader what hash functions are, what random functions are, and what a random oracle is.

As discussed in the early sections of this series, hash functions (or hashing algorithms) are a standard primitive that’s used in many areas of computer science. They take in some input, typically a string of variable length, and repeatably output a short and fixed-length “digest”. We often denote these functions as follows:

{\sf digest} \leftarrow H({\sf message})

Cryptographic hashing takes this basic template and tacks on some important security properties that we need for cryptographic applications. Most famously these provide  well-known properties like collision resistance, which is needed for applications like digital signatures. But hash functions turn up all over cryptography, sometimes in unexpected places — ranging from encryption to zero-knowledge protocols — and sometimes these systems demand stronger properties. Those can sometimes be challenging to put into formal terms: for example, many protocols require a hash function to produce output that is extremely “random-looking”.*

In the earliest days of provably security, cryptographers realized that the ideal hash function would behave like a “random function”. This term refers to a function that is uniformly sampled from the set of all possible functions that have the appropriate input/output specification (domain and range). In a perfect world your protocol could, for example, randomly sample one of vast number of possible functions at setup, bake the identifier of that function into a public key or something, and then you’d be good to go.

Unfortunately it’s not possible to actually use random functions (of reasonably-sized domain and range) in real protocols. That’s because sampling and evaluating those functions is far too much work.

For example, the number of distinct functions that consume a piddly 256-bit input and produce a 256-bit digest is a mind-boggling (2^{256})^{2^{256}}. Simply “writing down” the identity of the function you chose would require memory that’s exponential in the function’s input length. Since we want our cryptographic algorithms to be efficient (meaning, slightly more formally, they run in polynomial time), using random functions is pretty much unworkable.

So we don’t use random functions to implement our hashing. Out in “the real world” we use weird functions developed by Belgians or the National Security Agency, things like like SHA256 and SHA3 and Blake2. These functions come with blazingly fast and tiny algorithms for computing them, most of which occupy few dozen lines of code or less. They certainly aren’t random, but as best we can tell, the output looks pretty jumbled up.

Still, protocol designers continue to long for the security that using  truly random function could give their protocol. What if, they asked, we tried to split the difference. How about we model our hash functions using random functions — just for the sake of writing our security proofs —  and then when we go to implement (or “instantiate”) our protocols, we’ll go use efficient hash functions like SHA3? Naturally these proofs wouldn’t exactly apply to the real protocol as instantiated, but they might still be pretty good.

A proof that uses this paradigm is called a proof in the random oracle model, or ROM. For the full mechanics of how the ROM works you’ll have to go back and read the series from the beginning. What you do need to know right now is that proofs in this model must somehow hack around the fact that evaluating a random function takes exponential time. The way the model handles this is simple: instead of giving the individual protocol participants a description of the hash function itself — it’s way too big for anyone to deal with — they give each party (including the adversary) access to a magical “oracle” that can evaluate the random function H efficiently, and hand them back a result.

This means that any time one of the parties wants to compute the function H({\sf message}) they don’t do it themselves. They instead calling out to a third party, the “random oracle” who keeps a giant table of random function inputs and outputs. At a high level, the model looks like sort of like this:

b68a0-diagram

Since all parties in the system “talk” to the same oracle, they all get the same hash result when they ask it to hash a given message. This is a pretty good standin for what happens with a real hash function. The use of an outside oracle allows us to “bury” the costs of evaluating a random function, so that nobody else needs to spend exponential time evaluating one. Inside this artificial model, we get ideal hash functions with none of the pain.

This seems pretty ridiculous already…

It absolutely is!

However — I think there are several very important things you should know about the random oracle model before you write it off as obviously inane:

1. Of course everyone knows random oracle proofs aren’t “real”. Most conscientious protocol designers will admit that proving something secure in the random oracle model does not actually mean it’ll be secure “in the real world”. In other words, the fact that random oracle model proofs are kind of bogus is not some deep secret I’m letting you in on.

2. And anyway: ROM proofs are generally considered a useful heuristic. For those who aren’t familiar with the term, “heuristic” is a word that grownups use when they’re about to secure your life’s savings using cryptography they can’t prove anything about.

I’m joking! In fact, random oracle proofs are still quite valuable. This is mainly because they often help us detect bugs in our schemes. That is, while a random oracle proof doesn’t imply security in the real world, the inability to write one is usually a red flag for protocols. Moreover, the existence of a ROM proof is hopefully an indicator that the “guts” of the protocol are ok, and that any real-world issues that crop up will have something to do with the hash function.

3. ROM-validated schemes have a pretty decent track record in practice. If ROM proofs were kicking out absurdly broken schemes every other day, we would probably have abandoned this technique. Yet we use cryptography that’s proven (only) in the ROM just about ever day — and mostly it works fine.

This is not to say that no ROM-proven scheme has ever been broken, when instantiated with a specific hash function. But normally these breaks happen because the hash function itself is obvious broken (as happened when MD4 and MD5 both cracked up a while back.) Still, those flaws are generally fixed by simply switching to a better function. Moreover, the practical attacks are historically more likely to come from obvious flaws, like the discovery of hash collisions screwing up signature schemes, rather than from some exotic mathematical flaw. Which brings us to a final, critical note…

4. For years, many people believed that the ROM could actually be saved. This hope was driven by the fact that ROM schemes generally seemed to work pretty well when implemented with strong hash functions, and so perhaps all we needed to do was to find a hash function that was “good enough” to make ROM proofs meaningful. Some theoreticians hoped that fancy techniques like cryptographic obfuscation could somehow be used to make concrete hashing algorithms that behaved well enough to make (some) ROM proofs instantiable.**

So that’s kind of the state of the ROM, or at least — it was the state up until the late 1990s. We knew this model was artificial, and yet it stubbornly refused to explode or produce totally nonsense results.

And then, in 1998, everything went south.

CGH98: an “uninstantiable” scheme

For theoretical cryptographers, the real breaking point for the random oracle model came in the form of a 1998 STOC paper by Canetti, Goldreich and Halevi (henceforth CGH). I’m going to devote the rest of this (long!) post to explaining the gist of what they found.

What CGH proved was that, in fact, there exist cryptographic schemes that can be proven perfectly secure in the random oracle model, but that — terrifyingly — become catastrophically insecure the minute you instantiate the hash function with any concrete function.

This is a really scary result, at least from the point of view of the provable security community. It’s one thing to know in theory that your proofs might not be that strong. It’s a different thing entirely to know that in practice there are schemes that can walk right past your proofs like a Terminator infiltrating the Resistance, and then explode all over you in the most serious way.

Before we get to the details of CGH and its related results, a few caveats.

First, CGH is very much a theory result. The cryptographic “counterexample” schemes that trip this problem generally do not look like real cryptosystems that we would use in practice, although later authors have offered some more “realistic” variants. They are, in fact, designed to do very artificial things that no “real” scheme would ever do. This might lead readers to dismiss them on the grounds of artificiality.

The problem with this view is that looks aren’t a particularly scientific way to judge a scheme. Both “real looking” and “artificial” schemes are, if proven correct, valid cryptosystems. The point of these specific counterexamples is to do deliberately artificial things in order to highlight the problems with the ROM. But that does not mean that “realistic” looking schemes won’t do them.

A further advantage of these “artificial” schemes is that they make the basic ideas relatively easy to explain. As a further note on this point: rather than explaining CGH itseld, I’m going to use a formulation of the same basic result that was proposed by Maurer, Renner and Holenstein (MRH).

A signature scheme

The basic idea of CGH-style counterexamples is to construct a “contrived” scheme that’s secure in the ROM, but totally blows up when we “instantiate” the hash function using any concrete function, meaning a function that has a real description and can be efficiently evaluated by the participants in the protocol.

While the CGH techniques can apply with lots of different types of cryptosystem, in this explanation, we’re going to start our example using a relatively simple type of system: a digital signature scheme.

You may recall from earlier episodes of this series that a normal signature scheme consists of three algorithms: key generation, signing, and verification. The key generation algorithm outputs a public and secret key. Signing uses the secret key to sign a message, and outputs a signature. Verification takes the resulting signature, the public key and the message, and determines whether the signature is valid: it outputs “True” if the signature checks out, and “False” otherwise.

Traditionally, we demand that signature schemes be (at least) existentially unforgeable under chosen message attack, or UF-CMA. This means that that we consider an efficient (polynomial-time bounded) attacker who can ask for signatures on chosen messages, which are produced by a “signing oracle” that contains the secret signing key. Our expectation of a secure scheme is that, even given this access, no attacker will be able to come up with a signature on some new message that she didn’t ask the signing oracle to sign for her, except with negligible probability.****

Having explained these basics, let’s talk about what we’re going to do with it. This will involve several steps:

Step 1: Start with some existing, secure signature scheme. It doesn’t really matter what signature scheme we start with, as long as we can assume that it’s secure (under the UF-CMA definition described above.) This existing signature scheme will be used as a building block for the new scheme we want to build.*** We’ll call this scheme S.

Step 2: We’ll use the existing scheme S as a building block to build a “new” signature scheme, which we’ll call {\bf S_{\sf broken}}. Building this new scheme will mostly consist of grafting weird bells and whistles onto the algorithms of the original scheme S.

Step 3: Having described the working of {\bf S_{\sf broken}} in detail, we’ll argue that it’s totally secure in the ROM. Since we started with an (assumed) secure signature scheme S, this argument mostly comes down to showing that in the random oracle model the weird additional features we added in the previous step don’t actually make the scheme exploitable.

Step 4: Finally, we’ll demonstrate that {\bf S_{\sf broken}} is totally broken when you instantiate the random oracle with any concrete hash function, no matter how “secure” it looks. In short, we’ll show that one you replace the random oracle with a real hash function, there’s a simple attack that always succeeds in forging signatures.

We’ll start by explaining how {\bf S_{\sf broken}} works.

Building a broken scheme

To build our contrived scheme, we begin with the existing secure (in the UF-CMA sense) signature scheme S. That scheme comprises the three algorithms mentioned above: key generation, signing and verification.

We need to build the equivalent three algorithms for our new scheme.

To make life easier, our new scheme will simply “borrow” two of the algorithms from S, making no further changes at all. These two algorithms will be the key generation and signature verification algorithms So two-thirds of our task of designing the new scheme is already done.

Each of the novel elements that shows up in {\bf S_{\sf broken}} will therefore appear in the signing algorithm. Like all signing algorithms, this algorithm takes in a secret signing key and some message to be signed. It will output a signature.

At the highest level, our new signing algorithm will have two subcases, chosen by a branch that depends on the input message to be signed. These two cases are given as follows:

The “normal” case: for most messages M, the signing algorithm will simply run the original signing algorithm from the original (secure) scheme S. This will output a perfectly nice signature that we can expect to work just fine.

The “evil” case: for a subset of (reasonably-sized) messages that have a different (and very highly specific) form, our signing algorithm will not output a signature. It will instead output the secret key for the entire signature scheme. This is an outcome that cryptographers will sometimes call “very, very bad.”

So far this description still hides all of the really important details, but at least it gives us an outline of where we’re trying to go.

Recall that under the UF-CMA definition I described above, our attacker is allowed to ask for signatures on arbitrary messages. When we consider using this definition with our modified signing algorithm, it’s easy to see that the presence of these two cases could make things exciting.

Specifically: if any attacker can construct a message that triggers the “evil” case, her request to sign a message will actually result in her obtaining the scheme’s secret key. From that point on she’ll be able to sign any message that she wants — something that obviously breaks the UF-CMA security of the scheme. If this is too theoretical for you: imagine requesting a signed certificate from LetsEncrypt, and instead obtaining a copy of LetsEncrypt’s signing keys. Now you too are a certificate authority. That’s the situation we’re describing.

The only way this scheme could ever be proven secure is if we could somehow rule out the “evil” case happening at all.

More concretely: we would have to show that no attacker can construct a message that triggers the “evil case” — or at least, that their probability of coming up with such a message is very, very low (negligible). If we could prove this, then our scheme {\bf S_{\sf broken}} basically just reduces to being the original secure scheme. Which means our new scheme would be secure.

In short: what we’ve accomplished is to build a kind of “master password” backdoor into our new scheme {\bf S_{\sf broken}}. Anyone who knows the password can break the scheme. Everything now depends on whether an attacker can figure out that password.

So what is the “backdoor”?

The message that breaks the scheme {\bf S_{\sf broken}} isn’t a password at all, of course. Because this is computer science and nothing is ever easy, the message will actually be a computer program. We’ll call it P.

More concretely, it will be some kind of program that can decoded within our new signing algorithm, and then evaluated (on some input) by an interpreter that we will also place within that algorithm.

If we’re being formal about this, we’d say the message contains an encoding of a program for a universal Turing machine (UTM), along with a unary-encoded integer t that represents the number of timesteps that the machine should be allowed to run for. However, it’s perfectly fine with me if you prefer to think of the message as containing a hunk of Javascript, an Ethereum VM blob combined with some maximum “gas” value to run on, a .tgz encoding of a Docker container, or any other executable format you fancy.

What really matters is the functioning of the program P.

A program P that successfully triggers the “evil case” is one that contains an efficient (e.g., polynomial-sized) implementation of a hash function. And not just any hash function. To actually trigger the backdoor, the algorithm P must a function that is identical to, or at least highly similar to, the random oracle function H.

There are several ways that the signing algorithm can verify this similarity. The MRH paper gives a very elegant one, which I’ll discuss further below. But for the purposes of this immediate intuition, let’s assume that our signing algorithm verifies this similarity probabilistically. Specifically: to check that P matches H, it won’t verify the correspondence at every possible input. It might, for example, simply verify that P(x) = H(x) for some large (but polynomial) number of random input values x.

So that’s the backdoor.

Let’s think briefly about what this means for security, both inside and outside of the random oracle mode.

Case 1: in the random oracle model

Recall that in the random oracle model, the “hash function” H is modeled as a random function. Nobody in the protocol actually has a copy of that function, they just have access to a third party (the “random oracle”) who can evaluate it for them.

If an attacker wishes to trigger the “evil case” in our signing scheme, they will somehow need to download a description of the random function from the oracle. then encode it into a program P, and send it to the signing oracle. This seems fundamentally hard.

To do this precisely — meaning that P would match H on every input — the attacker would need to query the random oracle on every possible input, and then design a program P that encodes all of these results. It suffices to say that this strategy would not be practical: it would require an exponential amount of time to do any of these, and the size of P would also be exponential in the input length of the function. So this attacker would seem virtually guaranteed to fail.

Of course the attacker could try to cheat: make a small function P that only matches H on a small of inputs, and hope that the signer doesn’t notice. However, even this seems pretty challenging to get away with. For example, to perform a probabilistic check, the signing algorithm can simply verify that P(x) = H(x) for a large number of random input points x. This approach will catch a cheating attacker with very high probability.

(We will end up using a slightly more elegant approach to checking the function and arguing this point further below.)

The above is hardly an exhaustive security analysis. But at a high level our argument should now be clear: in the random oracle model, the scheme {\bf S_{\sf broken}} is secure because the attacker can’t know a short enough backdoor “password” that breaks the scheme. Having eliminated the “evil case”, the scheme {\bf S_{\sf broken}} simply devolves to the original, secure scheme S.

Case 2: In the “real world”

Out in the real world, we don’t use random oracles. When we want to implement a scheme that has a proof in the ROM, we must first “instantiate” the scheme by substituting in some real hash function in place of the random oracle H.

This instantiated hash function must, by definition, be efficient to evaluate and describe. This means implicitly that it possesses a polynomial-size description and can be evaluated in expected polynomial time. If we did not require this, our schemes would never work. Moreover, we must further assume that all parties, including the attacker, possess a description of the hash function. That’s a standard assumption in cryptography, and is merely a statement of Kerckhoff’s principle.

With these facts stipulated, the problem with our new signature scheme becomes obvious.

In this setting, the attacker actually does have access to a short, efficient program P that matches the hash function H. In practice, this function will probably be something like SHA2 or Blake2. But even in a weird case where it’s some crazy obfuscated function, the attacker is still expected to have a program that they can efficiently evaluate. Since the attacker possesses this program, they can easily encode it into a short enough message and send it to the signing oracle.

When the signing algorithm receives this program, it will perform some kind of test of this function P against its own implementation of H, and — when it inevitably finds a match between the two functions with high probability — it will output the scheme’s secret key.

Hence, out in the real world our scheme {\bf S_{\sf broken}} is always and forever, totally broken.

A few boring technical details (that you can feel free to skip)

If you’re comfortable with the imprecise technical intuition I’ve given above, feel free to skip this section. You can jump on to the next part, which tries to grapple with tough philosophical questions like “what does this mean for the random oracle model” and “I think this is all nonsense” and “why do we drive on a parkway, and park in a driveway?

All I’m going to do here is clean up a few technical details.

One of the biggest pieces that’s missing from the intuition above is a specification of how the signing algorithm verifies that the program P it receives from the attacker actually “matches” the random oracle function H. The obvious way is to simply evaluate P(x) = H(x) on every possible input x, and output the scheme’s secret key if every comparison succeeds. But doing this exhaustively requires exponential time.

The MRH paper proposes a very neat alternative way to tackle this. They propose to test the functions on a few input values, and not even random ones. More concretely, they propose checking that P(x) = H(x) for values of x \in \{1, \dots, q\} with the specific requirement that q is an integer such that q = 2|P| + k. Here |P| represents the length of the encoding of program P in bits, and k is the scheme’s adjustable security parameter (for example, k=128).

What this means is that to trigger the backdoor, the attacker must come up with a program P that can be described in some number of bits (let’s call it n) , and yet will be able to correctly match the outputs of H at e.g., q=2n+128 different input points. If we conservatively assume that H produces (at least) a 1-bit digest, that means we’re effectively encoding at least 2n+128 bits of data into a string of length n.

If the function H is a real hash function like SHA256, then it should be reasonably easy for the attacker to find some n-bit program that matches H at, say, q=2n+128 different points. For example, here’s a Javascript implementation of SHA256 that fits into fewer than 8,192 bits. If we embed a Javascript interpreter into our signing algorithm, then it simply needs to evaluate this given program on q = 2(8,192)+128 = 16,512 different input points, compare each result to its own copy of SHA256, and if they all match, output the secret key.

However, if H is a random oracle, this is vastly harder for the attacker to exploit. The result of evaluating a random oracle at q distinct points should be a random string of (at minimum) q bits in length. Yet in order for the backdoor to be triggered, we require the encoding of program P to be less than half that size. You can therefore think of the process by which the attacker compresses a random string into that program P to be a very effective compression algorithm, one takes in a random string, and compresses it into a string of less than half the size.

Despite what you may have seen on Silicon Valley (NSFW), compression algorithms do not succeed in compressing random strings that much with high probability. Indeed, for a given string of bits, this is so unlikely to occur that the attacker succeeds with at probability that is at most negligible in the scheme’s security parameter k. This effectively neutralizes the backdoor when H is a random oracle.

Phew.

So what does this all mean?

Judging by actions, and not words, the cryptographers of the world have been largely split on this question.

Theoretical cryptographers, for their part, gently chuckled at the silly practitioners who had been hoping to use random functions as hash functions. Brushing pipe ash from their lapels, they returned to more important tasks, like finding ways to kill off cryptographic obfuscation.

Applied academic cryptographers greeted the new results with joy — and promptly authored 10,000 new papers, each of which found some new way to remove random oracles from an existing construction — while at the same time making said construction vastly slower, more complicated, and/or based on entirely novel made-up and flimsy number-theoretic assumptions. (Speaking from personal experience, this was a wonderful time.)

Practitioners went right on trusting the random oracle model. Because really, why not?

And if I’m being honest, it’s a bit hard to argue with the practitioners on this one.

That’s because a very reasonable perspective to take is that these “counterexample” schemes are ridiculous and artificial. Ok, I’m just being nice. They’re total BS, to be honest. Nobody would ever design a scheme that looks so ridiculous.

Specifically, you need a scheme that explicitly parses an input as a program, runs that program, and then checks to see whether the program’s output matches a different hash function. What real-world protocol would do something so stupid? Can’t we still trust the random oracle model for schemes that aren’t stupid like that?

Well, maybe and maybe not.

One simple response to this argument is that there are examples of schemes that are significantly less artificial, and yet still have random oracle problems. But even if one still views those results as artificial — the fact remains that while we only know of random oracle counterexamples that seem artificial, there’s no principled way for us to prove that the badness will only affect “artificial-looking” protocols. In fact, the concept of “artificial-looking” is largely a human judgement, not something one can realiably think about mathematically.

In fact, at any given moment someone could accidentally (or on purpose) propose a perfectly “normal looking” scheme that passes muster in the random oracle model, and then blows to pieces when it gets actually deployed with a standard hash function. By that point, the scheme may be powering our certificate authority infrastructure, or Bitcoin, or our nuclear weapons systems (if one wants to be dramatic.)

The probability of this happening accidentally seems low, but it gets higher as deployed cryptographic schemes get more complex. For example, people at Google are now starting to deploy complex multi-party computation and others are launching zero-knowledge protocols that are actually capable of running (or proving things about the execution of) arbitrary programs in a cryptographic way. We can’t absolutely rule out the possibility that the CGH and MRH-type counterexamples could actually be made to happen in these weird settings, if someone is a just a little bit careless.

It’s ultimately a weird and frustrating situation, and frankly, I expect it all to end in tears.

Photo by Flickr user joyosity.

Notes:

* Intuitively, this definition sounds a lot like “pseudorandomness”. Pseudorandom functions are required to be indistinguishable from random functions only in a setting where the attacker does not know some “secret key” used for the function. Whereas hash functions are often used in protocols where there is no opporunity to use a secret key, such as in public key encryption protocols.

** One particular hope was that we could find a way to obfuscate pseudorandom function families (PRFs). The idea would be to wrap up a keyed PRF that could be evaluated by anyone, even if they didn’t actually know the key. The result would be indistinguishable from a random function, without actually being one.

*** It might seem like “assume the existence of a secure signature scheme” drags in an extra assumption. However: if we’re going to make statements in the random oracle model it turns out there’s no additional assumption. This is because in the ROM we have access to “secure” (at least collision-resistant, [second] pre-image resistant) hash function, which means that we can build hash-based signatures. So the existence of signature schemes comes “free” with the random oracle model.

**** The “except with negligible probability [in the adjustable security parameter of the scheme]” caveat is important for two reasons. First, a dedicated attacker can always try to forge a signature just by brute-force guessing values one at a time until she gets one that satisfies the verification algorithm. If the attacker can run for an unbounded number of time steps, she’ll always win this game eventually. This is why modern complexity-theoretic cryptography assumes that our attackers must run in some reasonable amount of time — typically a number of time steps that is polynomial in the scheme’s security parameter. However, even a polynomial-time bounded adversary can still try to brute force the signature. Her probability of succeeding may be relatively small, but it’s non-zero: for example, she might succeed after the first guess. So in practice what we ask for in security definitions like UF-CMA is not “no attacker can ever forge a signature”, but rather “all attackers succeed with at most negligible probability [in the security parameter of the scheme]”, where negligible has a very specific meaning.

Zero Knowledge Proofs: An illustrated primer

One of the best things about modern cryptography is the beautiful terminology. You could start any number of punk bands (or Tumblrs) named after cryptography terms like ‘hard-core predicate’, ‘trapdoor function’, or ‘impossible differential cryptanalysis’. And of course, I haven’t even mentioned the one term that surpasses all of these. That term is ‘zero knowledge‘.

In fact, the term ‘zero knowledge’ is so appealing that it leads to problems. People misuse it, assuming that zero knowledge must be synonymous with ‘really, really secure‘. Hence it gets tacked onto all kinds of stuff — like encryption systems and anonymity networks — that really have nothing to do with true zero knowledge protocols.

This all serves to underscore a point: zero-knowledge proofs are one of the most powerful tools cryptographers have ever devised. But unfortunately they’re also relatively poorly understood. In this series of posts I’m going try to give a (mostly) nonmathematical description of what ZK proofs are, and what makes them so special. In this post and the next I’ll talk about some of the ZK protocols we actually use.

Origins of Zero Knowledge

The notion of ‘zero knowledge’ was first proposed in the 1980s by MIT researchers Shafi Goldwasser, Silvio Micali and Charles Rackoff. These researchers were working on problems related to interactive proof systems, theoretical systems where a first party (called a ‘Prover’) exchanges messages with a second party (‘Verifier’) to convince the Verifier that some mathematical statement is true.*

Prior to Goldwasser et al., most work in this area focused the soundness of the proof system. That is, it considered the case where a malicious Prover attempts to ‘trick’ a Verifier into believing a false statement. What Goldwasser, Micali and Rackoff did was to turn this problem on its head. Instead of worrying only about the Prover, they asked: what happens if you don’t trust the Verifier? 

The specific concern they raised was information leakage. Concretely, they asked, how much extra information is the Verifier going to learn during the course of this proof, beyond the mere fact that the statement is true?

It’s important to note that this is not simply of theoretical interest. There are real, practical applications where this kind of thing matters.

Here’s one: imagine that a real-world client wishes to log into a web server using a password. The standard ‘real world’ approach to this problem involves storing a hashed version of the password on the server. The login can thus be viewed as a sort of ‘proof’ that a given password hash is the output of a hash function on some password — and more to the point, that the client actually knows the password.

Most real systems implement this ‘proof’ in the absolute worst possible way. The client simply transmits the original password to the server, which re-computes the password hash and compares it to the stored value. The problem here is obvious: at the conclusion of the protocol, the server has learned my cleartext password. Modern password hygiene therefore involves a good deal of praying that servers aren’t compromised.

What Goldwasser, Micali and Rackoff proposed was a new hope for conducting such proofs. If fully realized, zero knowledge proofs would allow us to prove statements like the one above, while provably revealing no information beyond the single bit of information corresponding to ‘this statement is true’.

A ‘real world’ example

So far this discussion has been pretty abstract. To make things a bit more concrete, let’s go ahead and give a ‘real’ example of a (slightly insane) zero knowledge protocol.

For the purposes of this example, I’d like you to imagine that I’m a telecom magnate in the process of deploying a new cellular communications network. My network structure is represented by the graph below. Each vertex in this graph represents a cellular radio tower, and the connecting lines (edges) indicate locations where two cells overlap, meaning that their transmissions are likely to interfere with each other.

This overlap is problematic, since it means that signals from adjacent towers are likely to scramble reception. Fortunately my network design allows me to configure each tower to one of three different frequency bands to avoid such interference.

Thus the challenge in deploying my network is to assign frequency bands to the towers such that no two overlapping cells share the same frequencies. If we use colors to represent the frequency bands, we can quickly work out one solution to the problem:

Of course, many of you will notice that what I’m describing here is simply an instance of the famous theory problem called the graph three-coloring problem. You might also know that what makes this problem interesting is that, for some graphs, it can be quite hard to find a solution, or even to determine if a solution exists. In fact, graph three-coloring — specifically, the decision problem of whether a given graph supports a solution with three colors — is known to be in the complexity class NP-complete.

It goes without saying that the toy example above is easy to solve by hand. But what if it wasn’t? For example, imagine that my cellular network was very large and complex, so much so that the computing power at my disposal was not sufficient to find a solution. In this instance, it would be desirable to outsource the problem to someone else who has plenty of computing power. For example, I might hire my friends at Google to solve it for me on spec.

But this leads to a problem.

Suppose that Google devotes a large percentage of their computing infrastructure to searching for a valid coloring for my graph. I’m certainly not going to pay them until I know that they really have such a coloring. At the same time, Google isn’t going to give me a copy of their solution until I’ve paid up. We’ll wind up at an impasse.

In real life there’s probably a common-sense answer to this dilemma, one that involves lawyers and escrow accounts. But this is not a blog about real life, it’s a blog about cryptography. And if you’ve ever read a crypto paper, you’ll understand that the right way to solve this problem is to dream up an absolutely crazy technical solution.

A crazy technical solution (with hats!)

The engineers at Google consult with Silvio Micali at MIT, who in consultation with his colleagues Oded Goldreich and Avi Wigderson, comes up with the following clever protocol — one so elegant that it doesn’t even require any computers. All it requires is a large warehouse, lots of crayons, and plenty of paper. Oh yes, and a whole bunch of hats.**

Here’s how it works.

First I will enter the warehouse, cover the floor with paper, and draw a blank representation of my cell network graph. Then I’ll exit the warehouse. Google can now enter enter, shuffle a collection of three crayons to pick a random assignment of the three agreed-upon crayon colors (red/blue/purple, as in the example above), and color in the graph in with their solution. Note that it doesn’t matter which specific crayons they use, only that the coloring is valid.

Before leaving the warehouse, Google covers up each of the vertices with a hat. When I come back in, this is what I’ll see:

Obviously this approach protects Google’s secret coloring perfectly. But it doesn’t help me at all. For all I know, Google might have filled in the graph with a random, invalid solution. They might not even have colored the graph at all.

To address my valid concerns, Google now gives me an opportunity to ‘challenge’ their solution to the graph coloring. I’m allowed to pick — at random — a single ‘edge’ of this graph (that is, one line between two adjacent hats). Google will then remove the two corresponding hats, revealing a small portion of their solution:

Notice that there are two outcomes to my experiment:

  1. If the two revealed vertices are the same color (or aren’t colored in at all!) then I definitely know that Google is lying to me. Clearly I’m not going to pay Google a cent.
  2. If the two revealed vertices are different colors, then Google might not be lying to me.
Hopefully the first proposition is obvious. The second one requires a bit more consideration. The problem is that even after our experiment, Google could still be lying to me — after all, I only looked under two of the hats. If there are E different edges in the graph, then Google could fill in an invalid solution and still get away with it most of the time. Specifically, after one test they could succeed in cheating me with probability up to (E-1)/E (which for a 1,000 edge graph works out to 99.9% of the time).

Fortunately Google has an answer to this. We’ll just run the protocol again!

We put down fresh paper with a new, blank copy of the graph. Google now picks a new (random) shuffle of the three crayons. Next they fill in the graph with a valid solution, but using the new random ordering of the three colors.

The hats go back on. I come back in and repeat the challenge process, picking a new random edge. Once again the logic above applies. Only this time if all goes well, I should now be slightly more confident that Google is telling me the truth. That’s because in order to cheat me, Google would have had to get lucky twice in a row. That can happen — but it happens with relatively lower probability. The chance that Google fools me twice in a row is now (E-1)/E * (E-1)/(or about 99.8% probability for our 1,000 edge example above).

Fortunately we don’t have to stop at two challenges. In fact, we can keep trying this over and over again until I’m confident that Google is probably telling me the truth.

But don’t take my word for it. Thanks to some neat Javascript, you can go try it yourself.

Note that I’ll never be perfectly certain that Google is being honest — there’s always going to be a tiny probability that they’re cheating me. But after a large number of iterations (E^2, as it happens) I can eventually raise my confidence to the point where Google can only cheat me with negligible probability — low enough that for all practical purposes it’s not worth worrying about. And then I’ll be able to safely hand Google my money.

What you need to believe is that Google is also protected. Even if I try to learn something about their solution by keeping notes between protocol runs, it shouldn’t matter. I’m foiled by Google’s decision to randomize their color choices between each iteration. The limited information I obtain does me no good, and there’s no way for me to link the data I learn between interactions.

What makes it ‘zero knowledge’?

I’ve claimed to you that this protocol leaks no information about Google’s solution. But don’t let me get away with this! The first rule of modern cryptography is never to trust people who claim such things without proof.

Goldwasser, Micali and Rackoff proposed three following properties that every zero-knowledge protocol must satisfy. Stated informally, they are:

  1. Completeness. If Google is telling the truth, then they will eventually convince me (at least with high probability).
  2. Soundness. Google can only convince me if they’re actually telling the truth.
  3. Zero-knowledgeness. (Yes it’s really called this.) I don’t learn anything else about Google’s solution.
We’ve already discussed the argument for completeness. The protocol will eventually convince me (with a negligible error probability), provided we run it enough times. Soundness is also pretty easy to show here. If Google ever tries to cheat me, I will detect their treachery with overwhelming probability.

The hard part here is the ‘zero knowledgeness’ property. To do this, we need to conduct a very strange thought experiment.

A thought experiment (with time machines)

First, let’s start with a crazy hypothetical. Imagine that Google’s engineers aren’t quite as capable as people make them out to be. They work on this problem for weeks and weeks, but they never manage to come up with a solution. With twelve hours to go until showtime, the Googlers get desperate. They decide to trick me into thinking they have a coloring for the graph, even though they don’t.

Their idea is to sneak into the GoogleX workshop and borrow Google’s prototype time machine. Initially the plan is to travel backwards a few years and use the extra working time to take another crack at solving the problem. Unfortunately it turns out that, like most Google prototypes, the time machine has some limitations. Most critically: it’s only capable of going backwards in time four and a half minutes.

So using the time machine to manufacture more working time is out. But still, it turns out that even this very limited technology can still be used to trick me.

I don’t really know what’s going on here
but it seemed apropos.

The plan is diabolically simple. Since Google doesn’t actually know a valid coloring for the graph, they’ll simply color the paper with a bunch of random colors, then put the hats on. If by sheer luck, I challenge them on a pair of vertices that happen to be different colors, everyone will heave a sigh of relief and we’ll continue with the protocol. So far so good.

Inevitably, though, I’m going to pull off a pair of hats and discover two vertices of the same color. In the normal protocol, Google would now be totally busted. And this is where the time machine comes in. Whenever Google finds themselves in this awkward situation, they simply fix it. That is, a designated Googler pulls a switch, ‘rewinds’ time about four minutes, and the Google team recolors the graph with a completely new random solution. Now they let time roll forward and try again.

In effect, the time machine allows Google to ‘repair’ any accidents that happen during their bogus protocol execution, which makes the experience look totally legitimate to me. Since bad challenge results will occur only 1/3 of the time, the expected runtime of the protocol (from Google’s perspective) is only moderately greater than the time it takes to run the honest protocol. From my perspective I don’t even know that the extra time machine trips are happening.

This last point is the most important. In fact, from my perspective, being unaware that the time machine is in the picture, the resulting interaction is exactly the same as the real thing. It’s statistically identical. And yet it’s worth pointing out again that in the time machine version, Google has absolutely no information about how to color the graph.

What the hell is the point of this?

What we’ve just shown is an example of a simulation. Note that in a world where time runs only forward and nobody can trick me with a time machine, the hat-based protocol is correct and sound, meaning that after E^2 rounds I should be convinced (with all but negligible probability) that the graph really is colorable and that Google is putting valid inputs into the protocol.

What we’ve just shown is that if time doesn’t run only forward — specifically, if Google can ‘rewind’ my view of time — then they can fake a valid protocol run even if they have no information at all about the actual graph coloring.

From my perspective, what’s the difference between the two protocol transcripts? When we consider the statistical distribution of the two, there’s no difference at all. Both convey exactly the same amount of useful information.

Believe it or not, this proves something very important.

Specifically, assume that I (the Verifier) have some strategy that ‘extracts’ useful information about Google’s coloring after observing an execution of the honest protocol. Then my strategy should work equally well in the case where I’m being fooled with a time machine. The protocol runs are, from my perspective, statistically identical. I physically cannot tell the difference.

Thus if the amount of information I can extract is identical in the ‘real experiment’ and the ‘time machine experiment’, yet the amount of information Google puts into the ‘time machine’ experiment is exactly zero — then this implies that even in the real world the protocol must not leak any useful information.

Thus it remains only to show that computer scientists have time machines. We do! (It’s a well-kept secret.)

Getting rid of the hats (and time machines)

Of course we don’t actually want to run a protocol with hats. And even Google (probably?) doesn’t have a literal time machine.

To tie things together, we first need to bring our protocol into the digital world. This requires that we construct the digital equivalent of a ‘hat’: something that both hides a digital value, while simultaneously ‘binding’ (or ‘committing’) the maker to it, so she can’t change her mind after the fact.

Fortunately we have a perfect tool for this application. It’s called a digital commitment scheme. A commitment scheme allows one party to ‘commit’ to a given message while keeping it secret, and then later ‘open’ the resulting commitment to reveal what’s inside. They can be built out of various ingredients, including (strong) cryptographic hash functions.******

Given a commitment scheme, we now have all the ingredients we need to run the zero knowledge protocol electronically. The Prover first encodes its vertex colorings as a set of digital messages (for example, the numbers 0, 1, 2), then generates digital commitments to each one. These commitments get sent over to the Verifier. When the Verifier challenges on an edge, the Prover simply reveals the opening values for the commitments corresponding to the two vertices.

So we’ve managed to eliminate the hats. But how do we prove that this protocol is zero knowledge?

Fortunately now that we’re in the digital world, we no longer need a real time machine to prove things about this protocol. A key trick is to specify in our setting that the protocol is not going to be run between two people, but rather between two different computer programs (or, to be more formal, probabilistic Turing machines.)

What we can now prove is the following theorem: if you could ever come up with a computer program (for the Verifier) that extracts useful information after participating in a run of the protocol, then it would be possible to use a ‘time machine’ on that program in order to make it extract the same amount of useful information from a ‘fake’ run of the protocol where the Prover doesn’t put in any information to begin with.

And since we’re now talking about computer programs, it should be obvious that rewinding time isn’t such an extraordinary feat at all. In fact, we rewind computer programs all the time. For example, consider using virtual machine software with a snapshot capability.

Example of rewinding through VM snapshots. An initial VM is played forward, rewound to an
initial snapshot, then execution is forked to a new path.

Even if you don’t have fancy virtual machine software, any computer program can be ‘rewound’ to an earlier state, simply by starting the program over again from the beginning and feeding it exactly the same inputs. Provided that the inputs — including all random numbers — are fixed, the program will always follow the same execution path. Thus you can rewind a program just by running it from the start and ‘forking’ its execution when it reaches some desired point.

Ultimately what we get is the following theorem. If there exists any Verifier computer program that successfully extracts information by interactively running this protocol with some Prover, then we can simply use the rewinding trick on that program to commit to a random solution, then ‘trick’ the Verifier by rewinding its execution whenever we can’t answer its challenge correctly. The same logic holds as we gave above: if such a Verifier succeeds in extracting information after running the real protocol, then it should be able to extract the same amount of information from the simulated, rewinding-based protocol. But since there’s no information going into the simulated protocol, there’s no information to extract. Thus the information the Verifier can extract must always be zero.

Ok, so what does this all mean?

So let’s recap. We know that the protocol is complete and sound, based on our analysis above. The soundness argument holds in any situation where we know that nobody is fiddling with time — that is, the Verifier is running normally and nobody is rewinding its execution.

At the same time, the protocol is also zero knowledge. To prove this, we showed that any Verifier program that succeeds in extracting information must also be able to extract information from a protocol run where rewinding is used and no information is available in the first place. Which leads to an obvious contradiction, and tells us that the protocol can’t leak information in either situation.

There’s an important benefit to all this. Since it’s trivial for anyone to ‘fake’ a protocol transcript, even after Google proves to me that they have a solution, I can’t re-play a recording of the protocol transcript to prove anything to anyone else (say, a judge). That’s because the judge would have no guarantee that the video was recorded honestly, and that I didn’t simply edit in the same way Google might have done using the time machine. This means that protocol transcripts themselves contain no information. The protocol is only meaningful if I myself participated, and I can be sure that it happened in real time.

Proofs for all of NP!

If you’ve made it this far, I’m pretty sure you’re ready for the big news. Which is that 3-coloring cellphone networks isn’t all that interesting of a problem — at least, not in and of itself.

The really interesting thing about the 3-coloring problem is that it’s in the class NP-complete. To put this informally, the wonderful thing about such problems is that any other problem in the class NP can be translated into an instance of that problem.In a single stroke, this result — due to Goldreich, Micali and Wigderson — proves that ‘efficient’ ZK proofs exists for a vast class of useful statements, many of which are way more interesting than assigning frequencies to cellular networks. You simply find a statement (in NP) that you wish to prove, such as our hash function example from above, then translate it into an instance of the 3-coloring problem. At that point you simply run the digital version of the hat protocol.

In summary, and next time

Of course, actually running this protocol for interesting statements would be an insanely silly thing for anyone to do, since the cost of doing so would include the total size of the original statement and witness, plus the reduction cost to convert it into a graph, plus the E^2 protocol rounds you’d have to conduct in order to convince someone that the proof is valid. Theoretically this is ‘efficient’, since the total cost of the proof would be polynomial in the input size, but in practice it would be anything but.

So what we’ve shown so far is that such proofs are possible. It remains for us to actually find proofs that are practical enough for real-world use.

In the next post I’ll talk about some of those — specifically, the efficient proofs that we use for various useful statements. I’ll give some examples (from real applications) where these things have been used. Also at reader request: I’ll also talk about why I dislike SRP so much.

See here for Part 2.

 

Notes:

* Formally, the goal of an interactive proof is to convince the Verifier that a particular string belongs to some language. Typically the Prover is very powerful (unbounded), but the Verifier is limited in computation.

** This example is based on the original solution of Goldwasser, Micali and Rackoff, and the teaching example using hats is based on an explanation by Silvio Micali. I take credit only for the silly mistakes.

****** A simple example of a commitment can be built using a hash function. To commit to the value “x” simply generate some (suitably long) string of random numbers, which we’ll call ‘salt’, and output the commitment C = Hash(salt || x). To open the commitment, you simply reveal ‘x’ and ‘salt’. Anyone can check that the original commitment is valid by recomputing the hash. This is secure under some (moderately strong) assumptions about the function itself.

Cryptographic obfuscation and ‘unhackable’ software

I have a thing for over-the-top cryptography headlines — mostly because I enjoy watching steam come out of researchers’ ears when their work gets totally misrepresented. And although I’ve seen quite a few good ones, last week WIRED managed a doozy.

The headline in question, Cryptography Breakthrough Could Make Software Unhackable, managed to accomplish something that few cryptography headlines do. It sent its own protagonist, Amit Sahai, into the comments section to perform intellectual garbage pickup.

In contrast to the headline, which is quite bad, the article is actually pretty decent. Still, the discussion around it has definitely led to some confusion. As a result, many now think an amazing breakthrough has taken place — one that will finally make software secure. They’re probably right about the first part. They may be disappointed about the rest.

The truth, as usual, is complicated. There is, indeed, something very neat going on with the new obfuscation results. They’re just not likely to make software ‘unhackable’ anytime soon. They might, however, radically expand what we can do with cryptography. Someday. When they’re ready. And in ways we don’t fully understand yet.

But before I go into all that, it’s probably helpful to give some background.

Program obfuscation

The Wired article deals with the subject of ‘program obfuscation‘, which is a term that software developers and cryptographers have long been interested in. The motivation here is pretty simple: find a way that we can give people programs they can run — without letting them figure out how the programs work.

Note that the last part necessarily covers a lot of ground. In principle it includes aspects ranging from the nature of specific secret algorithms used — which may be proprietary and thus worth money — to secret information like passwords and cryptographic keys that might be hardcoded into the program.

For a simple example, consider the following routine:

// Check a password and print out top secret information if it’s correct
//
SuperSecretPasswordProtectedStuff(string passwd) {
  if (password == “0xt438fh27266629zn28366492923aai3jnqobbyc4t!”) {
    print(“Congratulations. Here’s some super secret private information: ….\n”);
  } else {
    print(“Wrong password, fool.\n”);
  }
}

If you’re like me, you probably wrote a program like this at some point in your life. You may have eventually realized how ineffective it would be — against a smart attacker who could dump the program code. This is because most programs (and even compiled binaries) are pretty easy to read. An attacker could just look at this program to recover the secret password.

Program obfuscation is motivated by the idea that many useful programs would benefit if we could somehow ‘stop’ people from doing this, while still letting them possess and run the code on their own computers.

In real world software systems, ‘obfuscation’ usually refers to a collection of ad-hoc techniques that turn nice, sensible programs into a morass of GOTOs and spaghetti code. Sometimes important constants are chopped up and distributed around the code. Some portions of the code may even be encrypted — though only temporarily, since decryption keys must ship with the program so it can actually be run. Malware authors and DRM folks love this kind of obfuscation.

A chunk of ‘birken’, one of the winning entries in the 2013 obfuscated C contest.

While these techniques are nice, they’re not really ‘unhackable’ or even secure against reverse-engineering in the strong sense we’d like. Given enough time, brains and tools, you can get past most common software obfuscation techniques. If you have, say, Charlie Miller or Dion Blazakis on your side, you probably do it quickly. This isn’t just because those guys are smart (though they are), it’s because the techniques are not mathematically rigorous. Most practical obfuscation amounts to roadblocks — designed to slow reverse-engineers down and make them give up.

So what does it mean to securely ‘obfuscate’ a program?

The poor quality of existing software obfuscation set cryptographers up with a neat problem. Specifically, they asked, could we define a strong definition of program obfuscation that would improve on, to put it politely, the crap people were actually using? Given such an obfuscator I could hand you my obfuscated program to run while provably protecting all partial information — except for the legitimate inputs and outputs.

If this could be done, it would have an incredible number of applications. Stock traders could obfuscate their proprietary trading algorithms and send them to the cloud or to end-customers. The resulting programs would still work — in the sense that they produced the right results — but customers would never learn anything about ‘how they worked’. The internals of the program would be secret sauce.

Unfortunately, when the first researchers started looking at this problem, they ran into a very serious problem: nobody had any idea what it meant to securely ‘obfuscate a program in the first place.

Do think about it for a second before you roll your eyes. What do you think it means to obfuscate a program? You’re probably thinking something like ‘people shouldn’t learn stuff‘ about the program. But can you explain what stuff? And does ‘stuff‘ depend on the program? What about programs where you can efficiently learn the whole program from sending it inputs and seeing the results? What does it mean to obfuscate those?

Clearly before progress could begin on solving the problem, cryptographers needed to sit down and figure out what they were trying to do. And indeed, several cryptographers immediately began to do exactly this.

Black-box cryptographic obfuscation

The first definitions cryptographers came up addressed a very powerful type of obfuscation called ‘virtual black box obfuscation‘. Roughly speaking, it starts from the following intuitive thought experiment.

Imagine you have a program P, as well as some code obfuscation technique you’ve developed. It’s important that the obfuscation be efficient, meaning it doesn’t slow the program down too much. To determine whether the obfuscation is successful, you can conduct the following experiment.

  1. Give Alice a copy of the obfuscated program code Obf(P), in a form that she can look at and run on her own computer.
  2. Take the original (unobfuscated) program P, and seal it in a special computer located inside an armored ‘black box’. Let a user Sam interact with the program by sending it inputs and receiving outputs. But don’t let him access the code.

Roughly speaking, what we want from secure obfuscation is that Alice should learn ‘no more‘ information after seeing the obfuscated program code, than could some corresponding user Sam who interacts with the same program locked up in the black box.

What’s nice here is that this is we have the beginnings of an intuitive definition. In some sense the obfuscation should render the program itself basically unintelligible — Alice should not get any more information from seeing the obfuscated program than Sam could get simply by interacting with its input/output interface. If Sam can ‘learn’ how the program works just by talking to it, even that’s ok. What’s not ok is for Alice to learn more than Sam.

The problem with this intuition is, that of course, it’s hard to formulate. One thing you may have noticed is that the user Alice who views the obfuscated program truly does learn something more than the user Sam, who only interacts with it via the black box. When the experiment is over, the user who got the obfuscated program still has a copy of the obfuscated program. The user who interacted with the black box does not.

To give a practical example, let’s say this program P is a certificate signing program that has a digital signing key hard-coded inside of it — say, Trustwave’s — that will happily attach valid digital signatures on certificates (CSRs) of the user’s devising. Both Alice and Sam can formulate certificate signing requests and send them into the program (either the obfuscated copy or the version in the ‘black box’) and get valid, signed certificates out the other end.

But here’s the thing: when the experiment is over and we shut down Sam’s access to the black box, he won’t be able to sign any more certificates. On the other hand, Alice, who actually who learned the obfuscated program Obf(P), will still have the program! This means she can keep signing certificates on and forevermore. Knowing a program that can sign arbitrary certificates is a pretty darn significant ‘piece’ of information for Alice to have!

Worse, there’s no way the ‘black box’ user Sam can ever learn a similar piece of information — at least not if the signature scheme is any good. If Sam could ask the black box for a bunch of signatures, then somehow learn a program that could make more signatures, this would imply a fundamental weakness in the digital signature scheme. This cuts against a basic property we expect of secure signatures.

Barak et al. proposed a clever way to get around this problem — rather than outputting an arbitrary piece of information, Alice and Sam would be restricted to outputting a single bit at the end of their experiment (i.e., computing a predicate function). This helps to avoid the problems above and seems generally like the “right” definition.

An impossibility result

Having proposed this nice definition, Barak et al. went on to do an irritating thing that cryptographers sometimes do: they proved that even their definition doesn’t work. Specifically, they showed that there exist programs that simply can’t be obfuscated under this definition.

The reason is a little wonky, but I’m going to try to give the flavor for it below.

Imagine that you have two programs A and B, where A is similar to our password program above. That is, it contains a hard-coded, cryptographic secret which we’ll denote by password. When you run A(x), the program checks whether (x == password) and if so, it outputs a second cryptographically strong password, which we’ll cleverly denote by password_two. If you run A on any other input, it doesn’t output anything.

Now imagine the second program B works similarly to the first one. It contains both password and password_two hardcoded within it. A major difference is that B doesn’t take a string as input. It takes another computer program. You feed a program in as input to B, which then:

  1. Executes the given program on input password to get a result r.
  2. If (r == password_two)it outputs a secret bit.

It should be apparent to you that if you run the program B on input the program A — that is, you compute B(A) — it will always output the secret bit.* It doesn’t even matter if either B or A has been obfuscated, since obfuscation doesn’t change the behavior of the programs.

At the same time, Barak et al. pointed out that this trick only works if you actually have code for the program A. If I give you access to a black box that contains the programs A, B and will simply let you query them on chosen inputs, you’re screwed. As long as password and password_two are cryptographically strong, you won’t be able to get A to output anything in any reasonable amount of time, and hence won’t be able to get B to output the secret bit.

What this means is that Alice, who gets the obfuscated copies of both programs, will always learn the secret bit. But if Sam is a reasonable user (that is, he can’t run long enough to brute-force the passwords) he won’t be able to learn the secret bit. Fundamentally, the obfuscated pair of programs always gives more information than black-box access to them.

It remains only to show that the two programs can be combined into one program that still can’t be obfuscated. And this is what Barak et al. did, completing their work and showing the general programs can’t be obfuscated using their definition.

Now you can be forgiven for looking askance at this. You might, for example, point out that the example above only shows that it’s hard to obfuscate a specific and mildly ridiculous program. Or that this program is some kind of weird exception. But in general it’s not. There are many other useful programs that also can’t obfuscate, particularly when you try to use them in real systems. These points were quickly pointed out by Barak et al., and later in other practical settings by Goldwasser and Kalai, among others.

You might also think that obfuscation is plain impossible. But here cryptography strikes again — it’s never quite that simple.

We can obfuscate some things!

Before we completely write off black box obfuscation, let’s take a moment to go back to the password checking program I showed at the beginning of this post.

Here’s a slightly simpler version:

// Check a password and return true if it’s correct, false otherwise
//
bool SuperSecretPasswordProtectedStuff(string passwd) {
  if (password == “0xt438fh27266629zn28366492923aai3jnqobbyc4t!”) {
    return true;
  } else {
    return false;
  }
}

This function is an example of a ‘point function‘: that is, a functions that returns false on most inputs, but returns true at exactly one point. As you can see, there is exactly one point (the correct password) that makes this routine happy.

Point functions are interesting for a two simple reasons: the first is that we use them all the time in real systems — password checking programs being the obvious example. The second is that it turns out we can obfuscate them, at least under some strong assumptions. Even better, we can do it in a way that should be pretty familiar to real system designers.

Let be a secure hash function — we’ll get back to what that means in a second — and consider the following obfuscated password checking routine:

// Check a password and return ‘true’ if it’s correct, ‘false’ otherwise
// But do it all *obfuscated* and stuff
//
bool ObfuscatedSuperSecretPasswordProtectedStuff(string passwd) {

  static string HARDCODED_SALT          = 0x……; // this is a salt value

  static string HARDCODED_PASSWORD_HASH = 0x……; // this value is H(salt + target password)

  // First, hash the input password with the salt
  hashedPass = H(HARDCODED_SALT + passwd);

  if (hashedPass == HARDCODED_PASSWORD_HASH) {
    return true;
  } else {
    return false;
  }
}

Note that our ‘obfuscated’ program no longer stores the plaintext of the password. Instead we store its hash only (and salt), and wrap it inside a program that simply compares this hash to a hash of the user’s input. This should be familiar, since it’s the way you should be storing password files today.**

A number of cryptographers looked at formulations like this and showed the following: if the password is very hard to guess (for example, it’s secret and drawn at random from an exponentially-sized space of passwords) and the hash function is ‘strong’ enough, then the hash-checking program counts as a secure ‘obfuscation’ of the basic password comparison program above it.

The intuition for this is fairly simple: imagine Alice and Sam don’t know the password. If the password is strong, i.e., it’s drawn from a large enough space, then their probability of ever getting the program to output ‘true’ is negligible. If we assume an ideal hash function — in the simplest case, a random oracle — then the hard-coded hash value that Alice learns is basically just a useless random string, and hides all partial input about the real password. This means, in practice, that any general question Alice can answer at the end of the experiment, Sam can answer with about the same probability.***

Finding better definitions

More than a decade after the definition was formulated, there are basically two kinds of results about ‘strong’ virtual-black-box obfuscation. The first set shows that it’s impossible to do it for general programs, and moreover, that many of the interesting functions we want to obfuscate (like some signatures and pseudorandom functions) can’t be obfuscated in this powerful sense.

The second class of results shows that we can black-box obfuscate certain functions, but only very limited ones like, say, point functions and re-encryption. These results are neat, but they’re hardly going to set the world on fire.

What cryptographers have been trying to achieve since then is a different — and necessarily weaker — definition that could capture interesting things we wanted from obfuscating general programs, but without all the nasty impossibility results.

And that, finally, brings us to the advances described in the WIRED article.

Indistinguishability Obfuscation

In shooting down their own proposed definitions, Barak et al. also left some breadcrumbs towards a type of obfuscation that might actually work for general programs. They called their definition “indistinguishability obfuscation” (IO) and roughly speaking it says the following:

Imagine we have two programs C1, C2 — which we’ll describe as similarly-sized circuits — that compute the same function. More concretely, let’s say they have exactly the same input/output behavior, although they may be implemented very differently inside. The definition of indistinguishability obfuscation states that it should be possible to obfuscate the two circuits C1, C2 such that no efficient algorithm will be able to tell the difference between Obf(C1) from Obf(C2). 

While this idea was proposed years ago, nobody actually knew how to build any such thing, and it was left as one of those ‘open problems’ that cryptographers love to tear their hair out over. This remained the case until just last year, when a group of authors from UCLA, UT Austin and IBM Research proposed a ‘candidate construction’ for building such obfuscators, based on the new area of multilinear map-based cryptography.

Another interesting variant of this notion is called extractability obfuscation (EO), which implies not only that you can’t distinguish between Obf(C1) and Obf(C2), but moreover, that if you could distinguish the two, then you could necessarily find an input value on which both C1 and C2 would produce different outputs. Moreover, other work indicates IO and EO give essentially the ‘best possible’ obfuscation you can provide for general programs.

The question you’re probably asking is: so what? What can we do with indistinguishability obfuscation?

And this is where the WIRED article differs substantially from the reality. The truth is that IO will probably bring us major cryptographic and software advances — it’s already been shown to bring about succinct functional encryption for all circuits, as well as advances in deniable encryption. Moreover it can be used to build known forms of encryption such as public-key encryption based on new techniques that differ from what we use today — for example, by obfuscating ‘symmetric’ primitives like pseudorandom functions.****

And if this doesn’t seem quite as exciting as the WIRED article would imply, that’s because we’re still learning what the possibilities are. The new techniques could, for example, allow us to build exciting new types of security systems — or else show us radical new ways to build systems that we already have. Even the latter is very important, in case advances in our field result in ‘breaks’ to our existing constructions.

What IO and EO will probably not do is make programs unhackable, since that implies something a whole lot stronger than either of these techniques currently provide. In fact, it’s not clear what the heck you’d need to make programs unhackable.

And the reason for that is quite simple: even obfuscated software can still suck.

Notes:

Thanks to Susan Hohenberger and Zooko for comments on this post. 

The bit in the Barak et al. proof doesn’t have to be secret.

** With a pretty damn big caveat, which is that in real life (as opposed to cryptographic examples) users pick terrible passwords, which makes your password hashes vulnerable to dictionary attacks. We have a variety of countermeasures to slow down these attacks, including salt and computationally intensive password hashing functions.

*** God I hate footnotes. But it’s worth unpacking this. Imagine that Alice learns ‘some information’ from seeing an average program containing the hash of a strong password. If the password is strong, then Alice can only guess the right input password with negligible probability. Then Sam can ‘use’ Alice to get the same information as follows. He formulates a ‘fake’ program that contains a completely bogus password hash and mails it to Alice. Whatever information she learns from his program, he takes and outputs as the information he ‘learned’. It remains to show that if the password hash is strong (a random oracle), Alice isn’t going to be able to learn more from a random hash than she would from the hash of a strong password.

**** The basic idea here is that symmetric encryption (like AES, say) can’t be used for public key encryption, since anyone who learns your encryption key also learns your decryption key. But if you can obfuscate an encryption program that contains your symmetric encryption key, then the new program becomes like a public key. You can hand it out to people to encrypt with.

Is the cryptopocalypse nigh?

I’ve been traveling a bit over the past couple of weeks, so I haven’t had much of a chance to keep up on blogging. One consequence is that I completely missed my chance to say something about, well, anything that happened at BlackHat or Def Con.

Which is too bad, since a surprising amount of crypto stuff did go down! One thing that I wish I’d had a chance to talk about was a presentation at BlackHat called ‘The Factoring Dead‘ by Tom Ritter, Javed Samuel and Alex Stamos. (Thomas Ptacek was also involved, but he insists that he did nothing and they only included him out of pity.)

Although I don’t quite agree with the premise of this presentation, talks like it are fun — in the way that zombie movies are fun — because you get to abandon reality and think about some non-serious things for a while.

Factually, the presentation addresses some new results on the discrete logarithm problem published by Antoine Joux and his co-authors Razvan Barbulescu, Pierrick Gaudry and Emmanuel Thomé — developments the presenters cite as a very serious reason for people to get worried. And we’re not talking about the usual ‘you’re going to use crypto wrong’ kind of worry, but a more earthshaking kind: namely that RSA and Diffie-Hellman and DSA are soon going to be broken altogether.

Now let me be clear that Ritter, Samuel and Stamos and even lame, non-contributing Ptacek (henceforth RSSp) are all smart guys. In fact, I would venture to say they’re probably much more familiar with the Joux et al. work than I am since they seem interested in the details. Me, I like my hardness assumptions the way I like my hamburgers: neatly ground up and without the mooing. I could live my whole life without voluntarily digging into the efficiency of discrete logarithm solvers in fields of small characteristic.

Moreover, it’s hard to really argue with the content of RSSp’s presentation, since the bulk of what they do is to simply present facts. There really have been some major recent advances in solving discrete logarithms over certain special fields. There have been major attacks in the more distant past that took us by surprise. And yes, it would really awesome if people stopped fiddling with 1024-bit RSA keys and moved to elliptic curve crypto.

What’s concerning is the conclusions they (and other sources) have reached: namely, that factoring-based cryptosystems could be dead in just a few years. This kind of thing could incite panic! (I mean, it could if people actually cared about cryptography. Which unfortunately they mostly don’t.)

So let’s spend some time examining this.

Razvan Barbulescu, Emmanuel Thomé
and Antoine Joux hungrily eye a defenseless
discrete logarithm instance.
(Source: Steven Galbraith)

The background

The jumping off point for RSSp’s slides is a set of recent advances made by Joux and subsequently by Barbulescu, Gaudry, Joux and Thomé. The discrete logarithm problem (which you can learn about in this highly informative video) is noteworthy for two reasons. First, it’s believed that in many settings, the discrete logarithm problem is difficult to solve. Second: that assumption is critical to the security of many of the cryptosystems we know and love — for example, Diffie-Hellman and DSA.

Now the Joux and Barbulescu et al. results are important work, and really do deserve attention from cryptographers and non-cryptographers alike. What they show is that there exist relatively efficient algorithms for solving discrete logarithms in certain very specific types of field. Even more amazingly, the new algorithms are efficient enough to actually implement and run — against parameters that were previously thought to have cryptographic security!

Indeed this has already had some (limited) impact on practitioners in the research community. For example, many of the pairing-based cryptography libraries I work with ship with parameters that are now deemed to be too risky thanks to these new attacks. However — and this is key — these are research libraries. To my knowledge, none of these fields is actually being used in deployment, let alone standardized cryptography.

In other words, this is the kind of result that should receive (and has received!) lots of attention from cryptographers. But not necessarily from people who use cryptography. And here’s why.

You see, while the Joux and Barbulescu et al. algorithms really are efficient, they only work in fields with very specific properties. Namely, the fields must have small characteristic. Indeed, this feature of the field is critical to certain key steps of the algorithm. Take this property away and you still get some advances over the previous state of the art, but the advances are decidedly more theoretical.

Which brings us to the payoff: all of the fields we use to implement most cryptography — things like (non-elliptic-curve) DSA, Diffie-Hellman, and even the fields we use to implement NIST standard elliptic curves — are prime fields and hence don’t have the necessary properties to make the Joux results meaningful. Hence these attacks don’t seem to apply. Moreover there’s really no good reason to believe that they will anytime soon.

The BlackHat presentation

Which brings us to the RSSp BlackHat presentation. The overall premise of RSSp’s presentation is that advances happen rapidly. It’s not unprecedented for theoretical attacks in the literature to rapidly morph into real things that keep security engineers up at night. They also point out that attacks on the DLP have closely tracked attacks on factoring, both in the classical and the quantum world. (Ok, they don’t say the last part but it’s also true.)

RSSp also correctly imply that we should be switching away from cryptosystems that rely on the hardness of the (field-based) discrete logarithm problem, and should instead be moving to cryptosystems based on the elliptic curve discrete logarithm problem (ECDLP).* This is because none of the efficient attacks on DLP — including Joux’s algorithms — seem to apply in the (standardized) EC setting.

Lastly, they correctly point out that cryptosystems based on factoring and (field-based) Discrete Logarithms are already being deprecated by organizations like NIST for a variety of good — though not panic-related — reasons. Mostly this is because our current pre-Joux algorithms against those settings have made it too costly to get long-term (256-bit) security; you just need enormous keys. This was the case before Joux came along, and it’s still the case now.

The last point RSSp make is also absolutely correct: we should be switching to elliptic curve cryptography (ECC) as soon as possible, in part just so people can start using high-security cryptosystems without paying performance and bandwidth through the nose for the privilege. This isn’t totally academic, since — as Thomas Ptacek reminds me — we’re getting close to the death of 1024-bit keys. If your adversary is the NSA, anyway.

(It also doesn’t hurt that getting more people on this bandwagon will reduce the number of people rolling their own RSA implementation.)

So should we stock up on canned goods and move to Vermont?

Vermont is lovely. But you shouldn’t move there because of this presentation.

In fact this is hardly the first time we’ve seen a major breakthrough against an important cryptographic problem. In the 90s it was fast number field sieving against factoring-based systems and slightly later, things like the MOV attack on the ECDLP. In both cases, there was a small degree of panic, but ultimately a pretty mild result: cryptographers carefully examined the attack and chose new parameters that made it impractical. Then everyone went back to business.

In this case it looks like we’ve already got a set of parameters that keep us safe, so it’s even more unlikely — that except for a few researchers doing what researchers do — any of us will have to think about this in three years or five or even ten.

And by the way, you should not believe this because I say so — that would be foolish. You should believe it because the people who work in this area also don’t seem to think it’s an issue. If you doubt this, go to CRYPTO this week look for people running around with their hair on fire. The number should be barely higher than usual.

What would we do if there was a real cryptpocalypse?

Right! If we’re going to gin up a cryptpocalypse let’s have a real one. What if in 2015, Joux and his co-authors publish a new algorithm that efficiently solves the DLP in prime fields and at realistic key sizes, and moreover has a factoring analog that breaks RSA? Well, this would certainly be very bad for everything we’ve encrypted in the past, but at least we’d have an immediate solution: a rapid transition to elliptic curve crypto. Whew!

But this is not much fun: like watching an alien invasion movie where the aliens are allergic to water.

So let’s go way out on a limb and imagine that in 2017, after everyone has piled into ECC, Joux et al. and a team from the University of Waterloo team up to publish a revolutionary new attack that reduces ECDLP to roughly the hardness of field-based DLP. What would happen then?

Well, this would be really bad.

Let me reiterate that there’s a reason we like our current batch of public-key cryptosystems — EC, field-based and factoring-based systems. They’re relatively easy to understand, they’ve all been studied quite a bit. But most importantly: they’re really efficient.

Once you leave this domain you enter a region that the maps label with ‘here be dragons‘. Not because this space is empty. It’s just that there are relatively few efficient schemes that have received anywhere near the level of study that our beloved ones have.

Probably the oldest and most familiar of the alternative encryption schemes is the McEliece cryptosystem, which was developed way back in 1978 (that’s one year after RSA, in case you’re keeping score at home). McEliece and its modern variants are based on problems in algebraic coding theory: they depend for security on the hardness of decoding general codes, as well as some assumptions about the specific code used.

McEliece is surprisingly fast and (so far as we know) quite secure. There’s only one major problem: the public keys are big. According to a 2008 analysis by Bernstein, Lange and Peters, achieving security equivalent to a 3072-bit RSA key (aka the ‘128 bit’ symmetric-equivalent security level) requires a stunning 187 kilobyte McEliece public key. Moving up to 256-bit security — notably hard even for RSA — takes this to nearly 1MB. Recent improvements may cut that down a bit, but they’re still relatively unstudied.

Another possibility is to use Lattice-based cryptosystems. While there are several in the research literature, one of the most studied is the NTRU cryptosystem. I won’t confess to caring much about NTRU, except to note that it’s relatively well-studied by the standards of such alternative schemes and even shows up in some standards. Unfortunately that doesn’t mean everyone loves it. The inventors also hold a patent on it.

Lastly, for signatures at least we can always fall back on old standbys such as hash based signatures, which should hold us as long as Joan Daemen’s team can think up new hash functions.

Conclusion

We live in frightening times and yes, it’s always tempting to peek under the bed and imagine scary monsters. In practice, the reality is probably a bit more mundane.

As much as we love to read stories of solitary mathematicians making revolutionary leaps in their unlit apartment, this is rarely how things go. Even the most significant advances are usually telegraphed via a series of public, widely-read research papers.

In other words: when RSA and DSA really are in danger you’ll know about it. Just look for a bunch of cryptographers running around with their hair on fire.

Notes:

* By ‘based on’ I don’t mean that these cryptosystems necessarily reduce to the ECDLP, but rather that their security depends upon the hardness of the ECDLP.

The Ideal Cipher Model (wonky)

A friend who’s learning cryptography writes with a few questions about block ciphers:

(1) Let’s say we’re using AES-128 — 128 bit keys, 128 bit blocks.

  • For a given 128 bit block of plaintext “P” – if I was to iterate through all 2**128 key permutations and encrypt the same plaintext P with each key, would the outputs all be unique, or would there be collisions?
  • For a given 128 bit key “K”, if I was to use K to encrypt every possible (2**128) plaintext, would the outputs all be unique, or would there be collisions?
These are all reasonable questions with simple answers. But I’m not going to give them. Why bother with two simple answers when we can give one really complicated intuition?

What’s Claude Shannon got to do with it?

Back in the late 1940s when people were still thinking on this whole information theory business, a gentleman named Claude Shannon came up with a model for what a block cipher should do. His idea was — and yes, I’m embellishing a lot here — that an ‘ideal’ block cipher would work kind of like a magical elf.*

You could ask the elf to encipher and decipher things for you. But instead of using a mathematical algorithm, it would just keep a blackboard with a big table. The table would have three columns (Key, Plaintext, Ciphertext), and it would start off empty.

When you asked the elf to encipher a plaintext P under a key K, it would do the following:
  1. Check to see if (K, P) were already recorded in some row of the table. If they were, it would read off the corresponding ciphertext value C from the same row, and return C.
  2. If no matching row was found, it would generate a perfectly random ciphertext C.
  3. It would then check for an existing table entry containing the same key and ciphertext (K, C). If this entry was found in the table, it would throw C away and repeat step (2).
  4. Otherwise it would add (K, P, C) to the table and return C to the caller.

Now I realize this looks complicated, but if you think about this for a little while you’ll realize that this little thought experiment does ‘work’ a lot like a real block cipher.

Just like a block cipher, it will always give the same output when you pass it a given key and plaintext. Furthermore — for a given key K, no two plaintexts will ever encipher to the same ciphertext. This models the fact that a block cipher is a permutation.

Lastly, the output of the encipherment is very ‘random’ — in the sense that it’s intuitively not linked to the input. Calling the cipher on different inputs should produce values that are very much unrelated.

Of course, you also need the elf to decipher stuff as well. Decipherment works mostly like the process above. When you ask to decipher (K, C), it checks to see whether the given key and ciphertext are already in the table (i.e., they were previously enciphered). If so it looks up and returns the corresponding plaintext. Otherwise it generates a new random plaintext, makes sure it hasn’t previously appeared in the table with that key, and returns the result.

Under the assumption that AES works like our elf, we can now answer my friend’s questions. Clearly if I encrypt the same plaintext with many different keys, I will get many different ciphertexts. They’ll each be random and 128 bits long, which means the probability of any two ciphertexts being the same (colliding) is quite small. But there’s no rule preventing it, and over a large enough set it’s actually quite likely.

Similarly, the second question is answered by the fact that the cipher is a permutation, something that’s captured in our elf’s rule (3).

This is an awful lot of work to answer a few simple questions…

Yes, it certainly is. But the ‘Elf’ model is useful for answering much more interesting questions — like: is a given encryption scheme secure?

For example, take the CBC mode of operation, which is a way to turn a block cipher into an encryption scheme:

CBC encryption takes in a random key (K) and a random ‘Initialization Vector’ (IV), both of which are chosen by the encryptor. Now let’s see what happens if we CBC-encrypt an N-block message using our elf.

  1. The encryptor XORs the first plaintext block with the (random) Initialization Vector (IV), which should give a randomly-distributed value. We’ll call this P’.
  2. He then sends (K, P’) to be enciphered by the elf. Since the elf has never seen (K, P’) — meaning it’s not in the table — it will generate a random value (C1), which will become the first ciphertext block.
  3. The encryptor now XORs the next plaintext block with C1, which again should yield a randomly-distributed value which we’ll call P”.
  4. He sends (K, P”) to be enciphered by the elf. Since P” is also random (and, say, 128 bits long), the probability that it’s already in the table is astronomically low. Thus with high probability the elf will again generate a random value (C2), which becomes the second ciphertext block.
  5. He then repeats steps (3) and (4) for all the remaining blocks.
Note that with overwhelming probability — and unless you encrypt really long ciphertexts** — the elf will generate a random value for each ciphertext block. Hence the entire ciphertext will consist of a random string, which means it really shouldn’t leak anything about the message (except for the length).

Of course, an attacker might also talk to the elf to try to learn something about the message. But note that even this attack isn’t useful unless he can guess the (random) key K, since the elf will give him random results unless he asks for a value that includes K. Since K is a randomly-chosen secret key, the attacker should not be able to guess it.

Do ideal ciphers exist?

Now all of this is well and good, but it leaves us with an important question: is AES actually as good as an ideal cipher? Unfortunately, the resounding answer to this question is no.

The problem here is that ideal ciphers are totally unworkable. Obviously we can’t have an actual elf randomly filling in a bulletin board as we encrypt things. I want to carry a copy of the block cipher around with me (in software or hardware). I also want my copy of the cipher to be consistent with your copy, so that I can send you messages and you can decrypt them.

To make the elf idea work, we would each need to carry around a copy of the elf’s table that’s completely filled in from the start — meaning it would already contain entries for every possible plaintext (resp. ciphertext) and key I might ever need to encipher/decipher. When you consider that there would be 2^128 rows just for a single key, you realize that, no, this is not a question of running out to Best Buy and picking up some more SD cards. It’s fundamentally hard.

So carrying ideal ciphers around is not possible. The question then is: is there something that’s as good as an ideal cipher, without requiring us to carry around an exponentially-sized table.

The answer to that question is a mixed bag. The bad news is that nothing really works as well as an ideal cipher. Worse yet, there exists schemes that would be provably secure with an ideal cipher, but would fail catastrophically if you implemented them with any real block cipher. So that sucks.

On the other hand, those are theoretical results. Unless you’re doing some very specific things, the ideal cipher model is a moderately helpful tool for understanding what block ciphers are capable of. It’s also a good jumping off point for understanding the real proofs we actually use for modes like CBC. These proofs use more ‘realistic’ assumptions, e.g., that the block cipher is a pseudorandom permutation.

But those proofs will have to wait for another time. I’ve reached my wonkery limit for today.

Notes:

* For the record, at no point did Claude Shannon ever refer to an ‘elf’. At least not in writing.

** The probability of a ‘collision’ (i.e., asking the elf to encipher a value that’s already been enciphered) goes up as you put encipher more blocks. At a certain point (for AES, close to 2^64 blocks) it becomes quite high. Not coincidentally, this roughly coincides with the number of blocks NIST says you should encrypt with CBC mode.

Why I hate CBC-MAC

If you’re like most people, you don’t have a strong opinion about CBC-MAC. In fact, if you’re like most people, you don’t have a strong opinion about any crypto primitive.

This is healthy. Keep up the good work.

I’m not most people. I’ve spent the last week thinking about and dealing with CBC-MAC — or more specifically, code that uses it in various contexts — and I need to share with you how much I despise this little algorithm. And beg you never to use it.

Oh yes, I know the temptation. You have this nice block cipher just sitting around — maybe you’re encrypting something — and you’ve heard how serious this whole message authentication thing is. Maybe you’ve even thought about using one of those fancy authenticated encryption modes, but found them to be too exotic and complicated.

Then it comes to you that all your problems would be solved if you just used CBC-MAC. This is too bad, because now your troubles are just beginning.

Now a quick note: there’s nothing really wrong with CBC-MAC, when implemented correctly. And it’s not even that hard to implement properly. The problem is that many people who use CBC-MAC (rather than HMAC or a proper AEAD mode) seem incapable of actually doing this. They get it wrong in hilariously, embarassingly, stupid, complicated ways.

But of course you wanted examples. Ok, let’s give some.

1. Your implementation doesn’t handle variable-length messages.

A quick reminder. CBC-MAC is very similar to the classic CBC mode for encryption, with a few major differences. First, the Initialization Vector (IV) is a fixed value, usually zero. Second, CBC-MAC only outputs the last block of the ciphertext — this single value forms the MAC.

Many dumb implementations stop here. And that leads to big problems.

Most notably, if your system allows for variable-length messages — as it should — there is a simple attack that allows you to forge new messages. First, get a MAC T on a message M1. Now XOR the tag T into the first block of some arbitrary second message M2, and get a MAC on the modified version of M2.

The resulting tag T’ turns out to be a valid MAC for the combined message (M1 || M2). This is a valid forgery, and in some cases can actually be useful.

The standard fix to prepend the message length to the first block of the message before MACing it. But a surprisingly large number of (dumb) implementations skip this extra step. And many CBC-MAC implementations are dumb implementations.

2. Your implementation uses a random Initialization Vector.

If CBC-MAC with a fixed IV is great, surely CBC-MAC with a random IV must be super-great. But no, it isn’t.

Using a random (or variable IV) is bad for the simple reason that verifying a CBC-MAC requires you to know the IV, and to know the IV you probably need to read it from somewhere. Typically this means the same untrusted place where you were storing your message.

If the attacker can change the CBC-MAC IV, they can also change the first block of the MACed message in an equivalent manner. This works because the first step of CBC-MAC is to XOR the IV with the message. There are all kinds of silly variants of this problem, and all of them hurt.

3. You’ve used the same key for MAC and encryption.

A general rule in cryptography is that you shouldn’t use the same key for two different cryptographic primitives — encryption and signature, for example. Or encryption and MAC.

Some people figure that rules were made to be broken.

Note that shared keys can actually be ok, in some cases. Combined modes like CCM (short for CTR + CBC-MAC) actually do use the same key for both operations. However, these modes do it in a very careful, thoughtful manner. Your garden-variety implementation doesn’t.

One particularly ugly pattern I’ve seen is to use (dumb) CBC-MAC on a plaintext, then to encrypt said plaintext in CTR mode using some initial counter (C). This is insecure for a bunch of reasons, but specifically because I might be able to completely decrypt your ciphertext.

To do this, I simply ask you to encrypt a series of small files corresponding to the counter values C, C+1, etc. of the ciphertext I want to attack. The CBC-MAC of each of these files lets me recreate the CTR-mode keystream I need to decrypt the original ciphertext. Now I have your message.

4. You’ve used CBC-MAC as a hash function.

This one isn’t really a problem with CBC-MAC, but it does crop up. In fact, it happened recently to the file sharing site Mega.

To make a long story short: cryptographic hash functions are public functions (i.e., no secret key) that have the property of collision-resistance (it’s hard to find two messages with the same hash). MACs are keyed functions that (typically) provide message unforgeability — a very different property. Moreover, they guarantee this only when the key is secret.

If you attempt to use CBC-MAC with a non-secret key, it becomes a very bad candidate for anything. In fact, you can trivially find useful collisions in the output, something that’s very bad if you’re using it to authenticate code. Which is what Mega was doing with it.

This isn’t true of all MACs — HMAC, for example, should retain the collision resistance of the underlying hash function even if the MAC key is compromised. This is yet another reason to prefer it for cases where cryptographic expertise is not a sure bet.

In summary

I’ll repeat that none of these are really problems with CBC-MAC, which is a perfectly lovely algorithm if implemented and used correctly. The problems above only crop up when people try to whip it up themselves, without using a standard construction.

If you must write your own code, my recommendation is to use HMAC — which is extremely hard to screw up. If you’re doing combined encryption/MAC and you only have a block cipher, then look into the CCM spec, which is a patent free AEAD mode. This should address all of these problems and give you some nice test vectors too.

What you shouldn’t do is code up some half-assed CBC-MAC thing and expect you’ll be ok. The fact is, you probably won’t.

Indifferentiability

After umpteen weeks writing about broken stuff, I’m thrilled to tell you that for once, nothing awful is happening in the crypto world. It won’t last. But while it does, let’s grab the opportunity to talk about something constructive. 

Now a word of warning: what I’m going to talk about today is fairly wonky and (worse), involves hash functions. If you’re not feeling up for this, this is your cue to bail and go read something nice about buffer overflows.

For those still with me, the subject of this post is the design of hash functions, and more specifically: the indifferentiability proofs that designers write to argue for their security. I was surprised to find that most people have never heard of these proofs, and thus have no idea why they’re useful. That’s too bad, since they’re extremely important to the way we analyze hash functions today.

Merkle-Damgård

This is not Ivan Damgård. (Seriously Google?)

The best way to begin any discussion of hash function design is to take a quick peek inside of the hash functions we actually use. Since the most popular hashes today are MD5 (ugh) and SHA, the right place to start is with the ‘Merkle-Damgård’ paradigm.

To understand Merkle-Damgård, you need to understand that cryptographers love to build complicated things out of simpler components. Under the hood of most block ciphers you’ll find S-boxes. Similarly, if you take the lid off a Merkle-Damgård hash function — surprise! — you find block ciphers. Or at least, something very much like them.

This approach dates to a 1979 proposal by a young cryptographer named Ralph Merkle. What Merkle showed is a way to build hash functions with a variable-length input, using any fixed one-way compression function (a one-way function that spits out fewer bits than it takes in). While Merkle wasn’t specific about the function, he suggested that DES might be a good candidate.

Expressed as a colorful diagram, the Merkle construction looks something like this:

Merkle-Damgård Construction (source: Wikipedia because I’m too lazy to
draw my own diagrams). IV is a fixed value. f is a one-way compression function.

The beauty of Merkle’s proposal is that it’s relatively simple to understand. You simply chop your message into blocks, then feed each block into the function f along with the output of the previous function evaluation. Throw in a finalization stage and you’re done.

Of course there’s a difference between proposing a technique and showing that it actually works. It would take ten more years, but at CRYPTO 1989, Merkle and another cryptographer named Ivan Damgård independently submitted formal analyses of Merkle’s proposal. What they showed is that as long as the function f has certain ideal properties, the resulting hash function is guaranteed to be collision-resistant.  The rest, as they say, is history.

The popularity of Merkle-Damgård can be attributed in part to its security proof. But it also owes something to some major practical advantages:

  1. You can use any secure block cipher as the function f, with just a few tweaks.
  2. M-D hash functions can be pretty darn fast, again depending on f and how you use it.
  3. M-D hashes allow you to digest ‘live’ data streams, where you don’t know in advance how much data you’re going to be hashing. 
Of course, Merkle-Damgård hashes also have serious weaknesses. The most famous is the ‘length extension attack‘ in which an attacker, given only H(M) for some unknown message M, can ‘tack on’ additional blocks of her own choosing. This issue spells big trouble for people who think that H(key || message) is a good Message Authentication CodeWhat’s interesting about the length-extension issue is not that it leads to broken MACs. I mean, that is interesting, and it’s why you should use HMAC. But what’s really interesting is that this flaw doesn’t represent a violation of the collision-resistance guarantee. The two issues are in fact completely orthogonal. And this tells us something fundamental. Namely: collision-resistance is not enough.Today’s implementers do all kinds of crazy things with hash functions, and many of those applications require much more than collision-resistance. To achieve the necessary properties, we first need to figure out what they are. And that requires us to think hard about the following question:

What the heck is a secure hash function?

If you crack a typical security textbook (or visit the Wikipedia page on hash functions), you’ll see a long list of things of things a hash function ‘must’ accomplish. The list usually starts with these:
  1. Collision resistance. It should be hard to find any pair of messages M1, M2 such that H(M1) == H(M2).
  2. Pre-image resistance. Given only h it should be hard to find a ‘pre-image’ M2 such that H(M2) == h.

Now leave aside the technical fact that none of the unkeyed hash functions we use today are ‘truly’ collision-resistant. Or that the above definition of pre-image resistance implies that I can hash my cat’s name (‘fluffy’) and nobody can invert the hash (note: not true. Go ask LinkedIn if you don’t believe me.) The real problem is that these definitions don’t cover the things that people actually do with hash functions.

For example, take the construction of PRNGs. A common PRNG design hashes together large pools of gathered entropy in the hope that the result will be sufficiently uniform for cryptographic work. This is so common that it’s probably happening somewhere on your computer right now. And yet, absolutely nothing in the definitions above implies that this technique is safe!* Similar problems exist for key derivation functions, and even for signature schemes like ECDSA which clearly require hash functions that are more than just collision-resistant.
The more you look into the way that people use hash functions, the more you realize that they really need something that produces ‘random-looking’ output. Unfortunately, this notion is surprisingly hard to formalize. Hash functions are unkeyed, so they’re not pseudo-random functions. What in the world are people asking for?

Random oracles and indifferentiability

The answer, if you dig hard enough, is that people want hash functions to be random oracles.

Random oracles are cryptographers’ conception of what an ‘ideal’ hash function should be. Put succinctly, a random oracle is a perfectly random function that you can evaluate quickly. Random functions are beautiful not just because the output is random-looking (of course), but also because they’re automatically collision-resistant and pre-image resistant. It’s the only requirement you ever need.

The problem with random functions is that you just can’t evaluate them quickly: you need exponential storage space to keep them, and exponential time to evaluate one. Moreover, we know of nothing in the ‘real’ world that can approximate them. When cryptographers try to analyze their schemes with random functions, they have to go off into an imaginary fantasy world that we call the ‘random oracle model.

But ok, this post is not to judge. For the moment, let’s imagine that we are willing to visit this fantasy world. An obvious question is: what would it take to build a random oracle? If we had a compression function that was good enough — itself a random function — could we use a technique like Merkle-Damgård to get the rest of the way?

In 2004, Maurer, Renner and Holenstein gave us a powerful tool for answering this question. What they showed is that it’s always possible to replace functionality A (e.g., a random oracle) with another functionality B (e.g., an ideal compression function) provided that the following rules are satisfied:

  1. There exists a way to ‘construct’ something ‘like’ A out of B.
  2. There exists a way to ‘simulate’ something ‘like’ B using A.
  3. An attacker who interacts with {constructed A-like thing, B} cannot tell the difference (i.e., can’t differentiate it) from {A, simulated B-like thing}

The definition of simulation gets a bit wonky. but expressed in simpler language all this means is: if you can show that your hash function, instantiated with an ‘ideal’ compression function, looks indistinguishable from a random oracle. And you can show that a manufactured compression function, built using a random oracle as an ingredient, looks indistinguishable from an ideal compression function, then you can always replace one with the other. That is, your hash function is ‘good enough’ to be a random oracle.

The following year, Coron, Dodis, Malinaud and Puniya applied this framework to Merkle-Damgård-hash functions. Their first result was immediate: such a proof does not work for Merkle-Damgård. Of course this shouldn’t actually surprise us. We already know that Merkle-Damgård doesn’t behave like a random oracle, since random oracles don’t exhibit length-extension attacks. Still it’s one thing to know this, and another to see a known problem actually turn up and screw up a proof. So far, no problem.

What Coron et al. showed next is much more interesting:
  • They proved formally that Merkle-Damgård can be made indifferentiable from a random oracle, as long as you apply a prefix-free encoding to the input before hashing it. Prefix-free encodings prevent length-extensions by ensuring that no message can ever be a prefix of another.
  • Next, they proved the security of HMAC applied to a Merkle-Damgård hash.
  • Finally, and best of all, they showed that if you simply drop some bits from the last output block — something called a ‘chop’ construction — you can make Merkle-Damgård hashes secure with much less work.

The best part of Coron et al.‘s findings is that the chop construction is already (inadvertently) in place on SHA384, which is constructed by dropping some output bits from its big-brother hash SHA512. The modern hash variants SHA512/224 and SHA512/256 also have this property.** So this theoretical work already has one big payoff: we know that (under certain assumptions) these hashes may be better than some of the others.

And these results have bigger implications. Now that we know how to do this, we can repeat the process for just about every candidate hash function anyone proposes. This lets us immediately weed out obvious bugs, and avoid standardizing another hash with problems like the length extension attack. This process has become so common that all of the SHA3 candidates now sport exactly such an indifferentiability proof.

Of course, in the real world, indifferentiability only takes you so far. It does tell us something, but it doesn’t tell us everything. Sure, if the compression function is perfect, you obtain a strong result about the hash function. But compression functions are never perfect. Real compression functions have glitches and oddities that can make these theoretical results irrelevant. This is why we’ll always need smart people to arm wrestle over which hash we get to use next.

In conclusion

If I had it in me, I’d go on to talk about the SHA3 candidates, and the techniques that each uses to achieve security in this model. But this has already been a long post, so that will have to wait for another time.

I want to say only one final thing.

This is a practical blog, and I admit that I try to avoid theory. What fascinates me about this area is that it’s a great example of a place where theory has directly come to the aid of practice. You may think of hash functions as whizzing little black boxes of ad-hoc machinery, and to some extent they are. But without theoretical analysis like this, they’d be a whole lot more ad-hoc. They might not even work.

Remember this when NIST finally gets around to picking Keccak BLAKE.

Notes:

* For a ridiculous example, imagine that you have a secure (collision-resistant, pre-image resistant) hash function H. Now construct a new hash function H’ such that H'(M) = {“long string of 0s” || H(M)}. This function is as collision-resistant as the original, but won’t be very useful if you’re generating keys with it.

** Thanks to Paulo Barreto for fixing numerous typos and pointing out that SHA512/256 and /224 make excellent candidates for chop hashes!