On Ashton Kutcher and Secure Multi-Party Computation

Back in March I was fortunate to spend several days visiting Brussels, where I had a chance to attend a panel on “chat control”: the new content scanning regime being considered by the EU Commission. Among various requirements, this proposed legislation would mandate that client-side scanning technology be incorporated into encrypted text messaging applications like Signal, WhatsApp and Apple’s iMessage. The scanning tech would examine private messages for certain types of illicit content, including child sexual abuse material (known as CSAM), along with a broad category of textual conversations that constitute “grooming behavior.”

I have many thoughts about the safety of the EU proposal, and you can read some of them here. (Or if you’re interested in the policy itself, you can read this recent opinion by the EU Council’s Legal Service.) But although the EU proposal is the inspiration for today’s post, it’s not precisely what I want to talk about. Instead, I’d like to clear up some confusion I’ve noticed around the specific technologies that many have proposed to use for building these systems.

Also: I want to talk about Ashton Kutcher.

Ashton Kutcher visits the EU parliament in March 2023 (photo: Roberta Metsola.)

It turns out there were a few people visiting Brussels to talk about encryption this March. Only a few days before my own visit, Ashton Kutcher gave a major speech to EU Parliament members in support of the Commission’s content scanning proposal. (And yes, I’m talking about that Ashton Kutcher, the guy who played Steve Jobs and is married to Mila Kunis.)

Kutcher has been very active in the technical debate around client-side scanning. He’s the co-founder of an organization called Thorn, which aims to develop cryptographic technology to enable content scanning. In March he gave an impassioned speech to the EU Parliament urging the deployment of these technologies, and remarkably he didn’t just talk about the policy side of things. When asked how to balance user privacy against the needs of scanning, he even made a concrete technical proposal: to use fully-homomorphic encryption (FHE) as a means to evaluate encrypted messages.

Now let me take a breath here before my head explodes.

I promise I am not one of those researchers who believes only subject-matter experts should talk about cryptography. Really I’m not! I write this blog because I think cryptography is amazing and I want everyone talking about it all the time. Seeing mainstream celebrities toss around phrases like “homomorphic encryption” is literally a dream come true and I wish it happened every single day.

And yet, there are downsides to this much winning.

I ran face-first into some of those downsides when I spoke to policy experts about Kutcher’s proposal. Terms like fully homomorphic encryption can be confusing and off-putting to non-cryptographers. When filtered through people who are not themselves experts in the technology, these ideas can produce the impression that cryptography is magical pixie dust we can sprinkle on all the hard problems in the world. And oh how I wish that were true. But in the real world, cryptography is full of tradeoffs. Solving one problem often just creates new problems, or imposes major new costs, or else shifts the risks and costs to other parts of the system.

So when people on various sides of the debate asked me whether “fully-homomorphic encryption” could really do what Kutcher said it would, I couldn’t give an easy five-word answer. The real answer is something like: (scream emoji) it’s very complicated. That’s a very unsatisfying thing to have to tell people. Out here in the real world the technical reality is eye-glazing and full of dragons.

Which brings me to this post.

What Kutcher is really proposing is that we develop systems that perform privacy-preserving computation on encrypted data. He wants to use these systems to enable “private” scanning of your text messages and media attachments, with the promise that these systems will only detect the “bad” content while keeping your legitimate private data safe. This is a complicated and fraught area of computer science. In what follows, I am going to discuss the concepts behind it at a high and relatively non-technical level: what we can do, what we can’t do, and how practical it all is.

In the process I’ll discuss the two most powerful techniques that we have developed to accomplish this task: namely, multi-party computation (MPC) and, as an ingredient towards achieving the former, fully-homomorphic encryption (FHE). Then I’ll try to clear up the relationship between these two things, and explain the various tradeoffs that can make one better than the other for specific applications. Although these techniques can be used for so many things, throughout this post I’m going to focus on the specific application being considered in the EU: the use of privacy-preserving computation to conduct content scanning.

This post will not require any mathematics or much computer science, but it will require some patience. So find a comfortable chair and buckle in.

Computing on private data

Encryption is an ancient technology. Roughly speaking, it provides the ability to convert meaningful messages (and data) into a form that only you, and your intended recipient(s) can read. In the modern world this is done using public algorithms that everyone can look at, combined with secret keys that are held only by the intended recipients.

Modern encryption is really quite excellent. So as long as keys are kept safe, encrypted data can be sent over insecure networks or stored in risky locations like your phone. And while occasionally people find a flaw in an implementation of encryption, the underlying technology works very well.

But sometimes encryption can get in the way. The problem with encrypted data is that it’s, well, encrypted. When stored in this form, such data is virtually useless for practical purposes like performing calculations. Before you can compute on that data, you often need to first decrypt it and thus remove all the beautiful protections we get from encryption.

If your goal is to compute on multiple pieces of data that originate from different parties, the problem can become even more challenging. Who can we trust to do the computing? An obvious solution is to decrypt all that data and hand it to one very trustworthy person, who will presumably swear not to steal it or get hacked. Finding such a party can be quite challenging.

Fortunately for us all, the first academic cryptographers also happened to be computer scientists, and so this was exactly the sort of problem that excited them. Those researchers quickly devised a set of specific and general techniques designed to solve these problems, and also came up with a cool name for them: secure multi-party computation, or MPC for short.

MPC: secure private computation (in ten paragraphs)

The setting of MPC is fairly simple: imagine that we have two (or more!) parties that each have some private data they don’t want to give to anyone else. Yet each of the parties is willing to provide their data as input to some specific computation, and are all willing to reveal the output of that computation — either to everyone involved, or perhaps just to some agreed subset of the parties. Can these parties now perform the computation jointly, without appointing a trusted party?

Let’s make this easier by using a concrete example.

Imagine a group of workers all know their own salaries, but don’t know anything about anyone else’s salary. Yet they wish to compute some statistics over their salary data: for example, the average of their salaries. These workers aren’t willing to share their own salary data with anyone else, but they are willing to submit it as one input in a large calculation under the strict condition that only the final result is ever revealed.

This might seem contrived to you, but it is in fact a real problem that some people have used MPC to solve.

An MPC protocol allows the workers to do this, without appointing a trusted central party or revealing their inputs (and intermediate calculations) to anyone else. At the conclusion of the protocol each party will learn only the result of the calculation:

The “cloud” at the center of this diagram is actually a complicated protocol where every party sends messages to every other party.
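
To make the salary example concrete, here is a minimal sketch of one classical MPC ingredient: additive secret sharing (one of the techniques discussed later in this post). Each worker splits her salary into random shares that sum to the true value, hands one share to each participant, and only the final total (and hence the average) is ever reconstructed. This is a toy illustration under simplifying assumptions, not a hardened protocol; real MPC systems add authentication, protection against malicious parties, and a great deal of engineering.

    import secrets

    PRIME = 2**61 - 1  # all arithmetic is done modulo a public prime

    def share(value, n_parties):
        """Split `value` into additive shares that sum to it mod PRIME."""
        shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % PRIME)
        return shares

    # Three workers with private salaries (these numbers never leave their owners).
    salaries = [83_000, 91_500, 68_200]
    n = len(salaries)

    # Worker i sends all_shares[i][j] to worker j. Any single share looks
    # uniformly random, so nobody learns anyone else's salary.
    all_shares = [share(s, n) for s in salaries]

    # Each worker j locally adds up the shares it received ...
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]

    # ... and only these partial sums are published and combined.
    total = sum(partial_sums) % PRIME
    print("average salary:", total / n)   # 80900.0, computed without revealing inputs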

MPC protocols typically provide strong provable guarantees about their properties. The details vary, but generally speaking: no party will learn anything about the other parties’ inputs. Indeed they won’t even learn any partial information that might be produced during the calculation. Even better, all parties can be assured that the result will be correct: as long as all parties submit valid inputs to the computation, none of them should be able to force the calculation to go awry.

Now obviously there are caveats.

In practice, using MPC is a bit like making a deal with a genie: you need to pay very careful attention to the fine print. Even when the cryptography works perfectly, this does not mean that computing your function is actually “safe.” In fact, it’s entirely possible to choose functions that when computed securely are still devastating to your privacy.

For example: imagine that I use an MPC protocol to compute an average salary between myself and exactly one other worker. This could be a very bad idea! Note that if the other worker is curious, then she can figure out how much I make. That is: the average of our two wages reveals enough information that she can recover my wage given knowledge of her own input. This (obvious) caveat applies to many other uses of MPC, even when the technology works perfectly.

This is not a criticism of MPC, just the observation that it’s a tool. In practice, MPC (or any other cryptographic technology) is not a privacy solution by itself, at least not in the sense of privacy that real-world human beings like to think about. It provides certain guarantees that may or may not be useful for providing privacy in the real world.

What does MPC have to do with client-side scanning?

We began this post by discussing client-side scanning for encrypted messaging apps. This is an application that fits the MPC (or two-party computation) use-case perfectly. That’s because in this setting we generally have multiple parties with secret data who want to perform some joint calculation on their inputs.

In this setting, the first party is typically a client (usually a person using an encrypted messaging app like WhatsApp or Signal), who possesses some private text message or perhaps a media file they wish to send to another user. Under proposed law in the EU, their app could be legally mandated to “scan” that image to see if it contains illegal content.

According to the EU Commission, this scanning can be done in a variety of ways. For example, the device could compare an image against a secret database of known illicit content (typically using a specialized perceptual hash function.) However, while the EU plan starts there, it gets much more ambitious: the Commission also intends to look for entirely new instances of illicit content, as well as textual “grooming” conversations, possibly using machine learning (ML) models, that is, deep neural networks trained to recognize data that fits these patterns. These various models must be sophisticated enough to understand entirely new images, as well as to derive meaning from complex interactive human conversation. None of this is likely to be very simple.

Now most of this could be done using standard techniques on the client device, except for one major limitation. The challenge in this setting is that the provider doing the scanning usually wants to keep these hashes and/or ML models secret.

There are several reasons for this. The first reason is that knowledge of the scanning model (or database of illicit content) makes it relatively easy for bad actors to evade the model. In other words, with only modest transformations it’s possible to modify “bad” images so that they become invisible to ML scanners.

Knowledge of the model can also allow for the creation of “poisoned” imagery: these include apparently-benign images (e.g., a picture of a cat) that trigger false positives in the scanner. (Indeed, such “colliding” images have already been developed for some hash-based CSAM scanning proposals.) More worryingly, in some cases the hashes and neural network models can be “reversed” to extract the imagery and textual content they were trained on: this has all kinds of horrifying implications, and could expose abuse victims to even more trauma.

So here the user doesn’t want to send its confidential data to the provider for scanning, and the provider doesn’t want to hand its confidential model parameters to the user (or even to expose them inside the user’s phone, where they might be stolen by reverse-engineers.) This is exactly the situation that MPC was designed to handle:

Sketch of a client-side scanning architecture that uses (two-party) MPC between the client and the Provider. The client inputs the content to be scanned, while the server provides its secret model and/or hash database. The protocol gives the provider a copy of the user’s content if and only if the model says it’s illicit content, otherwise the provider sees nothing. (Note in this variant, the output goes only to the Provider.)

This makes everything very complicated. In fact, there has only been one real-world proposal for client-side CSAM scanning that has ever come (somewhat) close to deployment: that system was designed by Apple for a (now abandoned) client-side photo scanning plan. The Apple approach is cryptographically very ambitious: it uses neural-network based perceptual hashing, and otherwise exactly follows the architecture described above. Critically, however, the hash function it relied on was not kept secret. Disastrous results ensued (see further below.)

(If you’re interested in getting a sense of how complex this protocol is, here is a white paper describing how it works.)

A diagram from the Apple CSAM scanning protocol.

Ok, so what kind of MPC protocols are available to us?

Multi-party computation is a broad category. It describes a class of protocols. In practice there are many different cryptographic techniques that allow us to realize it. Some of these (like the Apple protocol) were designed for specific applications, while others are capable of performing general-purpose computation.

I promised this post would not go into the weeds, but it’s worth pointing out that general MPC techniques typically make use of (some combination of) three different techniques: secret sharing, circuit garbling, and homomorphic encryption. Often, efficient modern systems will use a mixture of two or three of those techniques, just to make everything more confusing (really, to maximize efficiency).

What is it that you need to know about these techniques? Here I’ll try, in a few sentences (that will draw me endless grief), to summarize the strengths and weaknesses of each.

Both secret sharing and garbling techniques share a common feature, which is that they require a great deal of data to be sent between the parties. In practice the amount of data sent between the parties will grow with (at least) the size of the inputs they’re computing on, but often will grow according to the complexity of the calculation they’re performing. For things like deep neural networks where both the data and calculation are huge, this generally results in fairly surprising amounts of data transfer.

This is not usually considered to be a problem on the general Internet or within EC2 datacenters, where data transfer is cheap. It can be quite a challenge when one of those parties is using a cellphone, however. That makes any scheme using these technologies subject to some very serious limitations.

Homomorphic encryption schemes take a different approach. These systems make use of specialized encryption schemes that are malleable. This means that encrypted data can be “modified” in useful ways without ever decrypting it.

In a bit more detail: in an MPC system based on fully-homomorphic encryption, the first party can encrypt its data under a public key that it generates. It can then send the encrypted data to a second party. This second party can then perform calculations on the ciphertext while it is still encrypted — adding and multiplying it together with other data (including data encrypted by the second party) to perform some calculation. Throughout this process all of the data remains encrypted. At the conclusion of this process, the second party will end up with a “modified” ciphertext that internally contains a final calculation result, but that it cannot read. To finish the protocol, the second party can send that ciphertext back to the first party, who can then decrypt it using its secret key and obtain the final output.
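
To give a feel for what "malleable" means here, below is a toy sketch of the Paillier cryptosystem in Python. Paillier is only additively homomorphic (you can add encrypted values together, not run arbitrary programs on them), and the parameters below are laughably small, so treat this purely as an illustration of the workflow: party one encrypts, party two computes on ciphertexts it cannot read, and party one decrypts the final result. Real FHE schemes are far more sophisticated.

    import math, secrets

    # Toy Paillier keypair with tiny hard-coded primes (insecure, illustration only).
    p, q = 5003, 4999
    n = p * q
    n2 = n * n
    g = n + 1
    lam = math.lcm(p - 1, q - 1)
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)   # precomputed decryption constant

    def encrypt(m):
        r = secrets.randbelow(n - 1) + 1          # random blinding factor
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c):
        return ((pow(c, lam, n2) - 1) // n * mu) % n

    # Party one encrypts its private values and sends only ciphertexts.
    c1, c2 = encrypt(1_200), encrypt(345)

    # Party two adds the values *under encryption*: multiplying Paillier
    # ciphertexts corresponds to adding the underlying plaintexts.
    c_sum = (c1 * c2) % n2

    # Back at party one, decryption reveals only the final result.
    print(decrypt(c_sum))   # 1545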

The major upshot of the pure-FHE technique is that it substantially reduces the amount of data that the two parties need to transmit between them, especially compared to the other MPC techniques. The downside of this approach is… well, there are several. One is that FHE calculations typically require vastly more computational effort (and hence time and carbon emissions) than the other techniques. Moreover, they may still require a good deal of data transfer — in part because the number of calculations that one can perform on a given ciphertext is usually limited by “noise” that turns up within the ciphertext. Hence, calculations must either be very simple or else broken up into “phases”, where the partial calculation result is decrypted and re-encrypted so that more computation can be done. This can be done interactively between the parties, or by the second party alone (using a technique called “bootstrapping”) but in both cases the cost is either much more bandwidth exchanged or a great deal of extra computation.

In practice, cryptographers rarely commit to a single approach. They instead combine all these techniques in order to achieve an appropriate balance of data transfer and computational effort. These “mixed systems” tend to have merely large (rather than enormous) amounts of data transfer and computation, but they are still amazingly efficient compared to the alternatives.

For an example of this, consider this very optimized two-party MPC scheme aimed at performing neural network classification. This scheme takes (from the client) a 32×32 image, and evaluates a tiny 7-layer neural network held by a server in order to perform classification. As you can see, evaluating the model requires about 8 seconds of computation and 200 megabytes of bandwidth exchange for each single image being evaluated:

Source: MUSE paper, figure 8. These are the times for a 7-layer MiniONN network trained on the CIFAR-10 dataset.

These numbers may seem quite high, but in fact they’re actually really impressive as these things go. Previous systems used nearly an order of magnitude more time and bandwidth to do their work. Maybe there will be further improvements in the future! Even on a pure efficiency basis there is much work to be done.

What are the other risks of MPC in this setting?

The final point I would like to make is that secure MPC (or MPC built using FHE as a tool) is not itself enough to satisfy the requirements of a safe content scanning system. As I mentioned above, MPC systems merely evaluate some function on private data. The question of whether that function is safe is left largely to the system designer.

In the case of these content scanning systems, the safety of the resulting system really comes down to a question of whether the algorithms work well, particularly in settings where “bad guys” can find adversarial inputs that try to disrupt the system. It also requires new techniques to ensure that the system cannot be abused. That is: there must be guarantees within the computation to ensure that the provider (or a party who hacks the provider) cannot change the model parameters to allow them to access your private content.

This is a much longer conversation than I want to have in this post, because it fundamentally requires one to think about whether the entire system makes sense. For a much longer discussion of the risks, see this paper.

This was nice, but I would like to learn more about each of these technologies!

The purpose of this post was just to give the briefest explanation of the techniques that exist for performing all of these calculations. If you’re interested in knowing (a lot more!) about these technologies, take a look at this textbook by Evans, Kolesnikov and Rosulek. MPC is an exciting area, and one that is advancing every single (research) conference cycle.

And maybe that is the lesson of this post: these technologies are still research techniques. It’s probably not quite time to put them out in the world.

PRFs, PRPs and other fantastic things

A few weeks ago I ran into a conversation on Twitter about the weaknesses of applied cryptography textbooks, and how they tend to spend way too much time lecturing people about Feistel networks and the boring details of AES. Some of the folks in this conversation suggested that instead of these things, we should be digging into more fundamental topics like “what is a pseudorandom function.” (I’d link to the thread itself, but today’s Twitter is basically a forgetting machine.)

This particular point struck a chord with me. While I don’t grant the premise that Feistel networks are useless, it is true that pseudorandom functions, also known as PRFs, are awfully fundamental. Moreover: these are concepts that get way too little coverage in (non-theory) texts. Since that seems bad for aspiring practitioners, I figured I’d spend a little time trying to explain these concepts in an intuitive way — in the hopes that I can bring the useful parts to folks who aren’t being exposed to these ideas directly.

This is going to be a high-level post and hence it will skip all the useful formalism. It’s also a little wonky, so feel free to skip it if you don’t really care. Also: since I need to be contrary: I’m going to talk about Feistel networks anyway. That bit will come later.

What’s a PRF, and why should I care?

Pseudorandom functions (PRFs) and pseudorandom permutations (PRPs) are two of the most fundamental primitives in modern cryptography. If you’ve ever implemented any cryptography yourself, there’s an excellent chance you relied on an algorithm like AES, HMAC or ChaCha20 to implement either encryption or authentication. If you did this, then you probably relied on some security property you assumed those primitives to have. But what precisely is that security property you’re relying on?

We could re-imagine this security definition from scratch every time we look at a new cipher. Alternatively, we could start from a much smaller number of general mathematical objects that provide security properties we can reason about, and try to compare those to the algorithms we actually use. The second approach has a major advantage: it’s very modular. That is, rather than re-designing every single protocol each time we come up with a new type of cipher, we only need to analyze each protocol once, using the idealized mathematical objects. Then we can instantiate it with actual ciphers, which hopefully satisfy these well-known properties.

Two of the most common such objects are the pseudorandom function (PRF) and the pseudorandom permutation (PRP). At the highest level, these functions have two critical properties that are extremely important to cryptographers:

  1. They are keyed functions: this means that they take in a secret key as well as some input value. (This distinguishes them from some hash functions.)
  2. The output of a PRF (or PRP), when evaluated on some unique input, typically appears “random.” (But explaining this rough intuition precisely will require some finesse, see below.)

If a function can truly achieve those properties, we can use it to accomplish a variety of useful tasks. At the barest minimum, these properties let us accomplish message authentication (by building MACs), symmetric encryption (by building stream ciphers), and key derivation (or “pluralization”), in which a single key is turned into many distinct keys. We can also use PRFs and PRPs to build other useful constructions such as pseudorandom number generators and modes of operation, which happen to be useful when encrypting things with block ciphers.

The how and why is a little complex, and that’s the part that will require all the explanation.

Random functions

There are many ideal primitives we’d love to be able to use in cryptography, but are thwarted from using because they’re inefficient. One of the most useful of these is the random function.

Computer programmers tend to get confused about functions. This is mostly because programming language designers have convinced us that functions are the same thing as the subroutines (algorithms) that we use to compute them. In the purely mathematical sense, it’s much more useful to forget about algorithms, and instead think of functions as simply being a mapping from some set of input values (the domain) to some set of output values (the range).

If we must think about implementing functions, then for any function with a finite domain and range, there is always a simple way to implement it: simply draw up a giant (and yet still finite!) lookup table that contains the mapping from each input to the appropriate output value. Given such a table, you can always quickly realize an algorithm for evaluating it, simply by hard-coding the table into your software and performing a table lookup. (We obviously try not to do this when implementing software — indeed, most of applied computer science can be summarized as “finding ways to avoid using giant lookup tables”.)

A nice thing about the lookup table description of functions is that it helps us reason about concepts like the number of possible functions that can exist for a specific domain and range. Concretely: if a function has M distinct input values and N possible outputs, then the number of distinct functions sharing that profile is N^M. This number blows up quickly even for modest values of M and N, but let’s put that aside for a moment. Given enough paper, we could imagine writing down each unique lookup table on a piece of paper: then we could stack those papers up and admire the minor ecological disaster we’d have just created.

Now let’s take this thought-experiment one step farther: imagine that we could walk out among those huge stacks of paper we’ll have just created, and somehow pick one of these unique lookup tables uniformly at random. If we could perform this trick routinely, the result would be a true “random function”, and it would make an excellent primitive to use for cryptography. We could use these random functions to build hash functions, stream ciphers and all sorts of other things that would make our lives much easier.

There are some problems with this thought experiment, however.

A big problem is that, for functions with non-trivial domain and range, there just isn’t enough paper in the world to enumerate every possible function. Even toy examples fall apart quickly. Consider a tiny hash function that takes in (and outputs) only 4-bit strings. This gives us M=16 inputs and N=16 outputs, and hence the number of (distinct) mappings is 16^{16} = 2^{64}, or about 18 quintillion. It gets worse if you think about “useful” cryptographic functions, say those with the input/output profile of ChaCha20, which has 128-bit inputs and 512-bit outputs. There you’d need a whopping (2^{512})^{2^{128}} (giant) pieces of paper. Since there are only around 2^{272} atoms in the observable universe (according to literally the first Google search I ran on the topic), we would quickly run into shortages even if we were somehow able to inscribe each table onto a single atom.

Obviously this version of the thought experiment is pretty silly. After all: why bother to enumerate every possible function if we’re going to throw most of them away? It would be much more efficient if we could sample a single random function directly without all the overhead.

This also turns out to be fairly straightforward: we can write down a single lookup table with M rows (corresponding to all possible inputs); for each row, we can sample a random output from the set of N possible outputs. The resulting table will be M rows long and each row will contain log_2(N) bits of data.
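
Here is roughly what that direct-sampling approach looks like in code, for the toy 4-bit example from above. The table is tiny here; for a ChaCha20-sized domain it would be astronomically large.

    import secrets

    # Sample one random function from 4-bit inputs to 4-bit outputs by filling
    # in the entire lookup table: one uniformly random output per possible input.
    M, N = 16, 16
    table = {x: secrets.randbelow(N) for x in range(M)}

    def random_function(x):
        return table[x]   # evaluating the function is just a table lookup

    print([random_function(x) for x in range(M)])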

While this seems like a huge improvement over the previous approach, it’s still not entirely kosher. Even a single lookup table is still going to be huge — at least as large as the function’s entire domain. For example: if we wanted to sample a random function with the input/output profile of ChaCha20, the table would require enough paper to contain 512*2^{128} = 2^{137} bits.

And no, we are not going to be able to compress this table! It should be obvious now that a random function generated this way is basically just a file full of random data. Since it has maximal entropy, compression simply won’t work.

The fact that random functions aren’t efficient for doing cryptography does not always stop cryptographers from pretending that we might use them, most famously as a way to model cryptographic hash functions in our security proofs. We have an entire paradigm called the random oracle model that makes exactly this assumption. Unfortunately, in reality we can’t actually use random functions to implement cryptographic functions — sampling them, evaluating them and distributing their code are all fundamentally infeasible operations. Instead we “instantiate” our schemes with an efficient hash algorithm like SHA3, and then we pray.

However, there is one important caveat. While we generally cannot sample and share large random functions in practice, we hope we can do something almost as interesting. That is, we can build functions that appear to be random: and we can do this in a very powerful cryptographic sense.

Pseudorandom functions

Random functions represent a single function drawn from a family of functions, namely the family that consists of every possible function that has the appropriate domain and range. As noted above, the cost of this decision is that such functions cannot be sampled, evaluated or distributed efficiently.

Pseudorandom functions share a similar story to random functions. That is, they represent a single function sampled from a family. What’s different in this case is that the pseudorandom function family is vastly smaller. A benefit of this tradeoff is that we can demand that the description of the function (and its family) be compact: pseudorandom function families must possess a relatively short description, and it must be possible to both sample and evaluate them efficiently: meaning, in polynomial time.

Compared to the set of all possible functions over a given domain and range, a pseudorandom function family is positively tiny.

Let’s stick with the example of ChaCha20. As previously discussed, ChaCha has a 128-bit input, but it also takes in a 256-bit secret key. If we were to view ChaCha20 as a pseudorandom function family, then we could view it as a family of 2^{256} individual functions, where each key value selects exactly one function from the family.

Now let’s be clear: 2^{256} is still a really big number! However: it is vastly smaller than (2^{512})^{2^{128}}, which is the total number of possible functions with ChaCha20’s input/output profile. Sampling a random 256-bit key and sharing it with Bob is eminently feasible; indeed, your browser did something like this when you loaded this website. Sampling a “key” of bit-length 512*2^{128} is not.

This leaves us with an important question, however. Since ChaCha20’s key is vastly smaller than the description of a random function, and the algorithmic description of ChaCha20 is also much smaller than the description of even a single random function, is it possible for a small-key function family like ChaCha20 to be as good (for cryptographic purposes) as a true random function? And what does “good” even mean here?

Defining pseudorandomness

Merriam-Webster defines the prefix pseudo as “being apparently rather than actually as stated.” The Urban Dictionary is more colorful: it defines pseudo as “false; not real; fake replication; bootleg; tomfoolery”, and also strongly hints that pseudo may be shorthand for pseudoephedrine (note: it is not.)

Clearly if we can describe a function using a compact algorithmic description and a compact key, then it cannot be a true random function: it is therefore bootleg. However that doesn’t mean it’s entirely tomfoolery. What pseudorandom means, in a cryptographic sense, is that a function of this form will be indistinguishable from a truly random function — at least to an adversary who does not know which function we have chosen from the family, and who has a limited amount of computing power.

Let’s unpack this definition a bit!

Imagine that I create a black box that contains one of two possible items, chosen with equal probability. Item (1) is an instance of a single function sampled at random from a purported pseudorandom function family; item (2) is a true random function sampled from the set of all possible functions. Both functions have exactly the same input/output profile, meaning they take in inputs of the same length, and produce outputs of the same length (here we are excluding the key.)

Now imagine that I give you “oracle” access to this box. What this means is: you will be allowed to submit any input values you want, and the box will evaluate your input using whichever function it contains. You will only see the output. (And no, you don’t get to time the box or measure any side channels it might leak; this is a thought experiment.) You can submit as many inputs as you want, using any strategy for choosing them that you desire: they simply have to be valid inputs, meaning that they’re within the domain of the function. We will further stipulate that you will be computationally limited: that means you will only be able to compute for a limited (polynomial in, say, the PRF’s key length) number of timesteps. At the end of the day, your goal is to guess which type of function I’ve placed in the box.

We say that a family of functions is pseudorandom if for every possible efficient strategy (meaning, using any algorithm that runs in time polynomial in the key size, provided these algorithms were enumerated before the function was sampled), the “advantage” you will have in guessing what’s in the box is very tiny (at most negligible in, say, the size of the function’s key.)
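
Here is a rough sketch of that experiment in code, with HMAC-SHA256 standing in as the candidate PRF family (an assumption made purely for illustration; nothing here is specific to HMAC) and a lazily-sampled lookup table playing the role of the true random function. The "distinguisher" shown is deliberately naive: the definition demands that no efficient strategy does much better than a coin flip.

    import hmac, hashlib, secrets

    OUT_LEN = 32  # both functions output 32 bytes

    def make_prf_box():
        key = secrets.token_bytes(32)                       # hidden from the adversary
        return lambda x: hmac.new(key, x, hashlib.sha256).digest()

    def make_random_function_box():
        table = {}                                          # lazily-sampled random function
        def f(x):
            if x not in table:
                table[x] = secrets.token_bytes(OUT_LEN)
            return table[x]
        return f

    # The challenger flips a coin and places one of the two objects in the box.
    secret_bit = secrets.randbelow(2)
    oracle = make_prf_box() if secret_bit == 0 else make_random_function_box()

    # The adversary gets only oracle access and must guess the coin. Staring at
    # a few outputs gives no advantage over guessing, which is exactly what PRF
    # security demands of *every* efficient strategy.
    samples = [oracle(i.to_bytes(8, "big")) for i in range(4)]
    guess = secrets.randbelow(2)
    print("adversary wins:", guess == secret_bit)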

A fundamental requirement of this definition is that the PRF’s key/seed (aka the selector that chooses which function to use) has to remain secret from the adversary. This is because the description of the PRF family itself cannot be kept secret: that is both good cryptographic practice (known as Kerckhoffs’ principle) and a consequence of the way we’ve defined the problem over “all possible algorithms”, which necessarily includes algorithms that have the PRF family’s description coded inside of them.

And pseudorandom functions cannot possibly be indistinguishable from random ones if the attacker can learn or guess the PRF’s secret key: this would allow the adversary to simply compute the function themselves and compare the results they get to the values that come out of the oracle (thus winning the experiment nearly 100% of the time.)

There’s a corollary to this observation: since the key length of the PRF is relatively short, the pseudorandomness guarantee can only be computational in nature. For example, imagine the key is 256 bits long: an attacker with unlimited computational resources could simply brute-force her way through all possible 256-bit keys, testing each one against the results coming from the oracle. If the box truly contains a PRF, then with high probability she’ll eventually find a key that produces the same results as what comes out of the box; if the box contains a random function, then she probably won’t. To rule such attacks out of bounds we must assume that the adversary is not powerful enough to test a large fraction of the keyspace. (In practice this requirement is pretty reasonable, since brute forcing through an n-bit keyspace requires on the order of 2^n work, and we assume that there exist reasonable values of n for which no computing device exists that can succeed at this.)

So what can we do with pseudorandom functions?

As I mentioned above, pseudorandom functions are extremely useful for a number of basic cryptographic purposes. Let’s give a handful here.

Building stream ciphers. One of the simplest applications of a PRF is to use it to build an efficient stream cipher. Indeed, this is exactly what the ChaCha20 function is typically used for. Let us assume for the moment that ChaCha20 is a PRF family (I’ll come back to this assumption later.) Then we could select a random key and evaluate the function on a series of unique input values — the IETF variant of ChaCha20 forms each input from a 96-bit nonce and a 32-bit block counter — and then concatenate the outputs of the function together to produce a keystream. To encrypt a message we would simply exclusive-OR (XOR) that keystream with the message to be enciphered.
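
Here is a minimal sketch of that construction, with HMAC-SHA256 standing in for the PRF simply because it ships with Python's standard library. A real implementation would use ChaCha20 itself through a vetted library, and must never reuse a (key, nonce) pair.

    import hmac, hashlib

    def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
        """Evaluate the PRF on a series of unique inputs (nonce || counter)
        and concatenate the outputs into a keystream."""
        out = b""
        counter = 0
        while len(out) < length:
            block_input = nonce + counter.to_bytes(8, "big")
            out += hmac.new(key, block_input, hashlib.sha256).digest()
            counter += 1
        return out[:length]

    def xor_encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
        ks = keystream(key, nonce, len(message))
        return bytes(m ^ k for m, k in zip(message, ks))

    key, nonce = b"k" * 32, b"n" * 8
    ct = xor_encrypt(key, nonce, b"attack at dawn")
    assert xor_encrypt(key, nonce, ct) == b"attack at dawn"   # XOR is its own inverse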

Why is this reasonable? The argument breaks down into three steps:

  1. If we had generated the keystream using a perfect random number generator (and kept it secret, and never re-used the keystream) then the result would be a one-time pad, which is known to be perfectly secure.
  2. And indeed, had we been computing this output using a truly random function (with a ChaCha20-like I/O profile) where each input was used exactly once, the result of this evaluation would indeed have been such a random string.
  3. Of course we didn’t do this: we used a PRF. But here we can rely on the fact that our attackers cannot distinguish PRF output from that of a random function.

One can make the last argument the other way around, too. If our attacker is much better at “breaking” the stream cipher implemented with a PRF than they are at breaking one implemented with a random function, then they are implicitly “distinguishing” the two types of function with a substantial advantage — and this is precisely what the definition of a PRF says that an attacker cannot do!

Constructing MACs. A PRF with a sufficiently large range can also be used as a Message Authentication Code. Given a message M, the output of PRF(k, M) — the PRF evaluated on a secret key k and the message M — should itself be indistinguishable from the output of a random function. Since this output will effectively be a random string, this means that an attacker who has not previously seen a MAC on M should have a hard time guessing the appropriate MAC for a given message. (The “strength” of the MAC will be proportional to the output length of the PRF.)

Key derivation. Often in cryptography we have a single random key k and we need to turn it into several random-looking keys (k1, k2, etc.) This happens within protocols like TLS, which (at least in version 1.3) has an entire tree of keys that it derives from a single master secret. PRFs, it turns out, are an excellent tool for this task. To “diversify” a single key into multiple keys, one can simply evaluate the PRF at a series of distinct points (say, k1 = PRF(k, 1), k2 = PRF(k, 2), and so on), and the result is a set of keys that are indistinguishable from random, provided that the PRF does what it says it does.
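
A sketch of that diversification trick, again with HMAC-SHA256 as a stand-in PRF and made-up labels. Real protocols use purpose-built constructions such as HKDF, but the underlying idea is the same: evaluate the PRF at distinct, fixed points.

    import hmac, hashlib

    master_key = bytes(32)   # pretend this came from a key exchange

    def derive(master: bytes, label: bytes) -> bytes:
        # kN = PRF(k, N): evaluate the PRF at a distinct point per purpose.
        return hmac.new(master, label, hashlib.sha256).digest()

    k_enc = derive(master_key, b"encryption")
    k_mac = derive(master_key, b"authentication")
    assert k_enc != k_mac   # distinct labels yield independent-looking keys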

There are, of course, many other applications for PRFs; but these are some pretty important ones.

Pseudorandom permutations

Up until now we’ve talked about pseudorandom functions (PRFs): these are functions that have output that is indistinguishable from a random function. A related concept is that of the pseudorandom permutation (PRP). Pseudorandom permutations share many of the essential properties of PRFs, with one crucial difference: these functions realize a permutation of their input space. That is: if we concentrate on a given function in the family (or, translating to practical terms, we fix one “key”) then each distinct input maps to a distinct output (and vice versa.)

A nice feature of permutations is that they are potentially invertible, which makes them a useful model for something we use very often in cryptography: block ciphers. These ciphers take in a key and a plaintext string, and output a ciphertext of the same length as the plaintext. Most critically, this ciphertext can be deciphered back to the original plaintext. Note that a standard (pseudo)random function doesn’t necessarily allow this: for example, a PRF instance F can map two distinct inputs A and B to the same output, so that F(A) = F(B), which makes it impossible to uniquely invert that output.
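
As a concrete illustration, here is a single AES block being enciphered and then inverted using the pyca/cryptography package (assuming a recent version of that library is installed). With the key fixed, AES-128 is a permutation on 16-byte blocks, so decryption recovers exactly the original input; a plain PRF offers no such guarantee.

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = bytes(16)               # fixing the key selects one permutation from the family
    block = b"sixteen byte blk"   # exactly one 16-byte block

    # ECB on a single block is just the raw permutation (never use ECB on real data).
    cipher = Cipher(algorithms.AES(key), modes.ECB())
    enc = cipher.encryptor()
    ct = enc.update(block) + enc.finalize()
    dec = cipher.decryptor()
    assert dec.update(ct) + dec.finalize() == block   # the permutation is invertible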

The definition of a pseudorandom permutation is very similar to that of a PRF: they must be indistinguishable from some idealized function — only in this case the ideal object is a random permutation. A random permutation is simply a function sampled uniformly from the set of all possible permutations over the domain and range. (Because really, why wouldn’t it be?)

There are two important mathematical features of PRPs that I should mention here:

PRPs are actually PRFs (to an extent.) A well-known result in cryptography, called the “PRP/PRF switching lemma” demonstrates that a PRP with sufficiently-large domain and range basically “is” a PRF. Put differently: a pseudorandom permutation placed into an oracle can be computationally indistinguishable from an oracle that contains a random function (with the same domain and range), provided the range of the function is large enough and the attacker doesn’t make too many queries.

The intuition behind this result is fairly straightforward. If we consider this from the perspective of an attacker interacting with some function in an oracle, the only difference between a random permutation and a random function is that the former will never produce any collisions — distinct inputs that produce the same output — while the latter may (occasionally) do so.

Diagram of a Feistel network (via Wikipedia).

From the adversary’s perspective, therefore, the ability to distinguish whether the oracle contains a random permutation or a random function devolves to querying the oracle to see if one can observe such a collision. Clearly if it sees even one collision of the form F(A) = F(B), then it’s not dealing with a permutation. But it may take many queries for the attacker to find such a collision in a random function, or to be confident one should already have occurred (and hence it is probably interacting with a PRP.)

In general the ability to distinguish the two is a function of the number of queries the attacker is allowed to make, as well as the size of the function’s range. After a single query, the probability of a collision (on a random function) is zero: hence the attacker has no certainty at all. After two queries, the probability is equal to 1/N where N is the number of possible outputs. As the attacker makes more queries this probability increases. Following the birthday argument, the expected probability reaches p=0.5 after about \sqrt{N} queries. For a cipher like AES, which has 128-bit outputs (N = 2^{128}), this will occur around 2^{64} queries.
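
For concreteness, the standard birthday bound says that after q queries to a random function with N possible outputs, the probability of seeing at least one collision is roughly 1 - e^{-q(q-1)/2N}, which crosses p=0.5 at about q \approx 1.18\sqrt{N}. Plugging in N = 2^{128} gives the 2^{64} figure above.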

PRFs can be used to build PRPs. The above result shows us that PRPs are usually good enough to serve as PRFs “without modification.” What if one has a PRF and wishes to build a PRP from it? This can also be done, but it requires more work. The standard technique was proposed by Luby and Rackoff and it involves building a Feistel network, where each “round function” in the PRP is built using a pseudorandom function. (See the picture at right.) This is a bit more involved than I want to get into in this post, so please just take away the notion that the existence of one of these objects implies the existence of the other.

Why do I care about any of this?

I mean, you don’t have to. However: I find that many people just getting into cryptography tend to get very involved in the deep details of particular constructions (ciphers and elliptic curves being one source of rabbit-holing) and take much longer to learn about useful analysis tools like PRFs and PRPs.

Once you understand how PRPs and PRFs work, it’s much easier to think about protocols like block cipher modes of operation, or MAC constructions, or anything that involves deriving multiple keys.

Take a simple example, the CBC mode of operation: this is a “classical” mode of operation used with many block ciphers. I don’t recommend that you use it (there are better modes) but it’s actually a very good teaching example. CBC encryption requires the sender to first select a random string called an Initialization Vector, then to chop up their message into equal-size blocks. Encryption looks something like this:

Cipher block chaining (CBC) mode encryption
From Wikipedia. The plus signs are bitwise XOR.

If we’re willing to assume that the block cipher is a PRP, then analyzing the security of this construction shouldn’t be terribly hard. Provided the block size of the cipher is large enough, we can first use the PRP/PRF switching lemma to argue that a PRP is (computationally) indistinguishable from a random function. To think about the security of CBC-mode encryption, therefore, we can (mathematically) replace our block cipher with a random function of the appropriate domain and range. Now the question is whether CBC-mode is secure when realized with a random function.

So if we replace the block cipher with a random function, how does the argument work?

Well obviously in a real scheme both the encryptor and decryptor would need to have a copy of the same function, and we’ve already covered why that’s problematic: the function would need to be fully-sampled and then communicated between the two parties. Then they would have to scan through a massive table to find each entry. But let’s put that aside for a moment.

Instead let’s focus only on the encryptor. Since we don’t have to think about communicating the entire function to another party, we don’t have to sample it up front. Instead we can sample it “lazily” for the purposes of arguing security.

More specifically: instead of sampling the entire random function in one go, we can imagine using an oracle that “builds” the function one query at a time. The oracle works as follows: anytime the encryptor queries it on some input value, the oracle checks to see if this value has been queried before. If it has previously been queried, the oracle outputs the value it gave previously. Otherwise it samples a new (uniformly random) output string using a random number generator, then writes the input/output pair down so it can check for later duplicate inputs.
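
That lazily-sampled oracle is only a few lines of code. Here is a minimal Python sketch (with an arbitrary 16-byte output length); the same class is reused in the CBC sketch that follows.

    import secrets

    class RandomFunctionOracle:
        """Builds a random function one query at a time ("lazy sampling")."""

        def __init__(self, out_len=16):
            self.out_len = out_len
            self.table = {}

        def query(self, x: bytes) -> bytes:
            if x not in self.table:
                # Fresh input: sample a new uniformly random output and record it.
                self.table[x] = secrets.token_bytes(self.out_len)
            # Repeated input: replay the previously recorded answer.
            return self.table[x]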

Now imagine that an encryptor is using CBC mode to encrypt some secret message, but instead of a block cipher they are using our “random function” oracle above. The encryption of a message will work like this:

  1. To encrypt each new message, the encryptor will first choose a uniformly-random Initialization Vector (IV).
  2. She will then XOR that IV with the first block of the message, producing a uniformly-distributed string.
  3. Then she’ll query the random function oracle to obtain the “encipherment” of this string. Provided the oracle hasn’t seen this input before, it will sample and output a uniformly random output string. That string will form the first block of ciphertext.
  4. Then the encryptor will take the resulting ciphertext block and treat it as the “IV” for the next message block, repeating steps (2) and (3) for each subsequent block. (A short code sketch of this loop appears below.)
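
Putting those steps together, here is a sketch of CBC "encryption" in which the block cipher has been replaced by the RandomFunctionOracle from the previous sketch. This is purely a tool for building intuition about the security argument, not a scheme anyone should deploy.

    import secrets

    def cbc_encrypt_with_oracle(oracle, plaintext_blocks):
        iv = secrets.token_bytes(oracle.out_len)        # step 1: fresh random IV
        chain, ciphertext = iv, [iv]
        for block in plaintext_blocks:
            xored = bytes(a ^ b for a, b in zip(block, chain))  # step 2: XOR with IV / previous block
            ct_block = oracle.query(xored)                      # step 3: "encipher" via the oracle
            ciphertext.append(ct_block)
            chain = ct_block                                    # step 4: chain into the next block
        return ciphertext

    oracle = RandomFunctionOracle(out_len=16)
    blocks = [b"A" * 16, b"B" * 16]
    print(cbc_encrypt_with_oracle(oracle, blocks))  # output blocks look uniformly random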

Notice that this encryption is pretty good. As long as the oracle never gets called on the same input value twice, the output of this encryption process will be a series of uniformly-random bits that have literally nothing to do with the input message. This strongly implies that CBC ciphertexts will be very secure! Of course we haven’t really proven this: we have to consider the probability that the encryptor will query the oracle twice on the same input value. Fortunately, with a little bit of simple probability we can show that, since each input to the oracle is uniformly distributed, the probability of such a repeated input stays quite low.

(In practice the probability works out to be a function of the cipher’s block size and the total number of plaintext blocks enciphered. This analysis is part of the reason that cryptographers generally prefer ciphers with large block sizes, and why we place official limits on the number of blocks you’re allowed to encipher with modes like CBC before you change the key. To see more of the gory details, look at this paper.)

Notice that so far I’ve done this analysis assuming that the block cipher (encipherment) function is a random function. In practice, it makes much more sense to assume that the block cipher is actually a pseudorandom permutation. Fortunately we’ve got most of the tools to handle this switch. We need to add two final details to the intuition: (1) since a PRF is indistinguishable from a random function to all bounded adversaries, we can first substitute in a PRF for that random function oracle with only minimal improvement in the attacker’s ability to distinguish the ciphertext from random bits. Next: (2) by the PRP/PRF switching lemma we can exchange that PRF for a PRP with similarly minor impact on the adversary’s capability.

This is obviously not a proof of security: it’s merely an intuition. But it helps to set up the actual arguments that would appear in a real proof. And you can provide a similar intuition for many other protocols that use keyed PRF/PRP type functions.

What if the PRP/PRF key isn’t secret?

One of the biggest restrictions on the PRF concept is the notion that these functions are only secure when the secret key (AKA, the choice of which “function” to use from the family) is kept secret from the adversary. We already discussed why this is critical: in the PRF (resp. PRP) security game, an attacker who learns the key can instantly “distinguish” a pseudorandom function from a random one. In other words, knowledge of the secret key explodes the entire concept of pseudorandomness. Hence from a mathematical perspective, the security properties of a PRF are somewhere between non-existent and undefined in this setting.

But that’s not very satisfying, and this kind of non-intuitive behavior only makes people ask more questions. They come back wondering: what actually happens when you learn the secret key for a PRF? Does it explode or collapse into some kind of mathematical singularity? How does a function go from “indistinguishable from random” to “completely broken” based on learning a small amount of data?

And then, inevitably, they’ll try to build things like hash functions using PRFs.

The former questions are mainly interesting to cryptographic philosophers. However the latter question is practically relevant, since people are constantly trying to do things like build hash functions out of block ciphers. (NB: this is not actually a crazy idea. It’s simply not possible to do it based solely on the assumption that these functions are pseudorandom.)

So what happens to a PRF when you learn its key?

One answer to this question draws from the following (tempting, but incorrect) line of reasoning: PRFs must somehow produce statistically-“random looking” output all the time, whether you know the key or not. Therefore, the argument goes, the PRF is effectively as good as random even after one learns the key.

This intuition is backed up by the following thought-experiment:

  1. Imagine that at time (A) I do not know the key for a PRF, but I query an oracle on a series of inputs (for simplicity, let’s say I use the values 1, 2, …, q for some integer q that is polynomial in the key length.)
  2. Clearly at this point, the outputs of the PRF must be indistinguishable from those of a true random function. If the range of the function comprises \ell-bit strings, then any statistical “randomness test” I run on those outputs should “succeed”, i.e., tell me that they look pretty random.

    (Putting this differently: if any test reliably “fails” on the output of the PRF oracle, but “succeeds” on the output of a true random function, then you’ve just built a test that lets you distinguish the PRF from a random function — and this means the function was never a PRF in the first place! And your “PRF” will now disappear in a puff of logic.)
  3. Now imagine that at time (B), after I’ve obtained the oracle outputs, someone hands me the secret key for the PRF that was inside the oracle. Do the outputs somehow “stop” being random? Will the NIST test suite suddenly start failing?

The simple answer to the last question is “obviously no.” Any public statistical test you could have performed on the original outputs will still continue to pass, even after you learn the secret key. What has changed in this instance is that you can now devise new non-public statistical tests that are based on your knowledge of the secret key. For example, you might test to see if the values are outputs of the PRF (on input the secret key), which of course they would be — and true random numbers wouldn’t be.

So far this doesn’t seem so awful.

Where things get deeply unpleasant is if the secret key is known to the attacker at the time it queries the oracle. Then the calls to the PRF can behave in ways that deviate massively from the expected behavior of a random function. For example, consider a function called “Katy-Perry-PRF” that generally behaves like a normal PRF most of the time, but that spews out Katy Perry lyrics when queried on specific (rare) inputs.

Provided that these rare inputs are hard for any attacker to find — meaning, the attacker will find them only with negligible probability — then Katy-Perry-PRF will be a perfectly lovely PRF. (More concretely, we might imagine that the total number of possible input values is exponential in the key length, and the set of “Katy-Perry-producing” input values forms a negligible fraction of this set, distributed pseudorandomly within it, to boot.) We can also imagine that the location of these Katy-Perry-producing inputs is only listed in the secret key, which a normal PRF adversary will not have.

Clearly a standard attacker (without the secret key) is unlikely to find any inputs that produce Katy Perry lyrics. Yet an attacker who knows the secret key can easily obtain the entire output of Katy Perry’s catalog: this attacker will simply look through the secret key to find the appropriate inputs, and then query them all one at a time. The behavior of the Katy-Perry function on these inputs is clearly very different from what we’d expect from a random function and yet here is a function that still satisfies the definition of a PRF.

Now obviously Katy-Perry-PRF is a silly and contrived example. Who actually cares if your PRF outputs Katy Perry lyrics? But similar examples can be used to produce PRFs that enable easy “collisions”, which is generally a bad thing when one is trying to build things like hash functions. This is why the construction of such functions needs to either assume weaker properties (i.e., that you get only collision-resistance) or make stronger assumptions, such as the (crazy) assumption that the block cipher is actually a random function.

Finally: how do we build PRFs?

So far I’ve been using the ChaCha function as an example of something we’d really like to imagine is a PRF. But the fact of the matter is that nobody knows how to actually prove this. Most of the practical functions we use as PRFs, which include ChaCha, HMAC-SHA(x), and many other ciphers, are constructed from a handful of simple mathematical operations such as rotations, XORs, and additions. The result is then analyzed by very smart people to see if they can break it. If someone finds a flaw in the function, we stop using it.

This is theoretically less-than-elegant. Instead, it would be nice to have constructions we clearly know are PRFs. Unfortunately the world is not quite that friendly to cryptography.

From a theoretical perspective, we know that PRFs can be constructed from pseudorandom generators (PRGs). We further know that PRGs can in turn be constructed from one-way functions (OWFs). The existence of the latter functions is one of the most basic assumptions we make in cryptography, which is a good way of saying we have no idea if they exist but we are hopeful. Indeed, this is the foundation of what’s called the “standard model.” But in practice the existence of OWFs remains a stubbornly open problem, bound tightly to the P/NP problem.
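For the curious, the PRG-to-PRF step is the classic GGM tree construction, which uses the input bits to walk down a binary tree of PRG evaluations. Here’s a minimal sketch, where SHA-256 stands in for a length-doubling PRG (an illustrative shortcut: the actual theorem needs a PRG whose security follows from the OWF assumption):

# GGM construction (sketch): a PRF built from a "length-doubling" PRG.
import hashlib

def prg(seed: bytes) -> tuple:
    """Stretch a 32-byte seed into two 32-byte outputs (left and right)."""
    left = hashlib.sha256(seed + b"\x00").digest()
    right = hashlib.sha256(seed + b"\x01").digest()
    return left, right

def ggm_prf(key: bytes, x: str) -> bytes:
    """Evaluate the PRF on a bitstring input, e.g. x = "0110"."""
    state = key
    for bit in x:
        left, right = prg(state)
        state = left if bit == "0" else right
    return state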

If that isn’t entirely satisfying to you, you might also like to know that we can also build (relatively) efficient PRFs based on the assumed hardness of a number of stronger mathematical assumptions, including things like the Decisional Diffie-Hellman assumption and various assumptions in lattices. Such things are nice mainly because they let us build cool things like oblivious PRFs.

Phew!

This has been a long piece, one that I’m glad to have gotten off my plate. I hope it will be helpful to a few people who are just starting out in cryptography and are itching to learn more. If you are one of these people and you plan to keep going, I urge you to take a look at a cryptography textbook like Katz and Lindell’s excellent Introduction to Modern Cryptography, or Goldreich’s (more theoretical) Foundations of Cryptography.

Top photo by Flickr user Dave DeSandro, used under CC license.

Book Review: Red Team Blues

As a rule, book reviews are not a thing I usually do.

So when I received an out-of-the-blue email from Cory Doctorow last week asking if I would review his latest book, Red Team Blues, it took a minute to overcome my initial skepticism. While I’m a fan of Cory’s work, this is a narrow/nerdy blog about cryptography, not a place where we spend much time on literature. Moreover, my only previous attempt to review a popular cryptography novel — a quick sketch of Dan Brown’s abysmal Digital Fortress — did not go very well for anyone.

But Cory isn’t Dan Brown. And Red Team Blues is definitely not Digital Fortress.

This became obvious in the middle of the first chapter, when a character began explaining the operation of a trusted execution environment and its various digital signing keys. While it’s always fun to read about gangsters and exploding cars, there’s something particularly nice about a book whose plot hangs around a piece of technology that most people don’t even think about. (And if that isn’t your thing, there are exploding cars and gangsters.)

This still leaves the question of how a cryptography blog reviews a work of fiction, even one centered on cryptography. The answer is pretty simple: I’m not going to talk much about the story. If you want that, there are other reviews out there. While I did enjoy the book immensely and I’m hopeful Cory will write more books in this line (with hopefully more cryptography), I’ll mainly focus on the plausibility of the core technical setup.

But even to do that, I have to provide a few basic details about the story. (Note: minor spoilers below, but really only two chapters’ worth.)

The protagonist of Red Team Blues is 67-year-old Martin Hench, an expert forensic accountant with decades of experience tracing and recovering funds for some of the most powerful people in Silicon Valley. Martin is on the brink of retirement, lives in a bus named “the Unsalted Hash” and loves bourbon nearly as much as he despises cryptocurrency. This latter position is presumably a difficult one for someone in Martin’s line of work, and sure enough his conviction is quickly put to the test.

Before long Martin is hired by his old friend Danny Lazer — sort of a cross between Phil Zimmerman, David Chaum and (maybe) Max Levchin — who begs him to take one last career-defining job: namely, to save his friend’s life by saving his newest project: a cryptocurrency called TrustlessCoin.

TrustlessCoin is a private cryptocurrency: not terribly different from real ones like Monero or Zcash. (As a founding scientist of a private cryptocurrency, let me say that none of the things in this novel have ever happened to me, and I’m slightly disappointed in that.)

Unlike standard cryptocurrencies, TrustlessCoin contains one unusual and slightly horrifying technological twist. Where standard cryptocurrencies rely on consensus algorithms to construct a public ledger (and zero-knowledge proofs for privacy), TrustlessCoin bases its integrity on the security of mobile Trusted Execution Environments (TEEs). This means that its node software runs inside of systems like Intel’s SGX, ARM’s TrustZone, or Apple’s Secure Enclave Processor.

Now, this idea isn’t entirely unprecedented. Indeed, some real systems like MobileCoin, Secret Network and Intel’s PoET take a fairly similar approach — although admittedly, these rely mainly on server-based TEEs rather than mobile ones. It is, however, an idea that makes me want to scream like a child who just found a severed human finger in his bowl of cornflakes.

You see, TEEs allow you to run software (more) securely inside of your own device, which is a good and respectable thing to do. But distributed systems often require more: they must ensure that everyone else in the network is also running the software in a similarly-trustworthy environment. If some people aren’t doing so — that is, if they’re running the software on a computer they can tamper with and control — then that can potentially harm the security of the entire network.

TEE designers have been aware of this idea for a long time, and for years have been trying to address this using secure remote attestation. Attestation systems provision each processor with a digital signing key (in turn certified by the manufacturer’s root signing key) that allows the processor to produce attestations. These signed messages “prove” to remote parties that you’re actually running the software inside a valid TEE, rather than on some insecure VMWare image or a Raspberry Pi. Provided these systems all work perfectly, everyone in the system can communicate with everyone else and know that they are running the software on secure hardware as well.
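To make the role of those keys concrete, here is a rough sketch of what a verifier checks. This is illustrative only: I’m assuming a simple two-level chain of elliptic-curve certificates and Python’s cryptography library, while real attestation formats (SGX quotes, Android key attestation and so on) are considerably more elaborate.

# Sketch: accept an attestation only if (1) the per-device certificate chains
# to the manufacturer's root, and (2) the device key signed the claimed
# software "measurement".
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

def verify_attestation(root_cert: x509.Certificate,
                       device_cert: x509.Certificate,
                       measurement: bytes,
                       attestation_sig: bytes) -> bool:
    try:
        root_cert.public_key().verify(
            device_cert.signature,
            device_cert.tbs_certificate_bytes,
            ec.ECDSA(device_cert.signature_hash_algorithm),
        )
        device_cert.public_key().verify(
            attestation_sig, measurement, ec.ECDSA(hashes.SHA256())
        )
        return True
    except InvalidSignature:
        return False

# Anyone who holds the manufacturer's root signing key can mint "valid"
# device certificates for hardware that doesn't exist at all.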

The problems crop up when that assumption breaks down. If even a single person can emulate the software inside a TEE on their own (non-trusted) device or VM, then all of your beautiful assumptions may go out the window. Indeed, something very similar to this recently happened to Secret Network: clever academic researchers found a way to extract a master decryption key from (one) processor, and were then able to use that key to destroy privacy guarantees across the whole network. (Some mitigations have since been deployed.)

It goes without saying that Red Team Blues is not about side-channel attacks on processors. The problem in this novel is vastly worse: Danny Lazer has gone and bribed someone to steal the secret root signing keys for every major mobile secure enclave processor: and, of course, they’ve all been stolen. Hench’s problem is to figure out whether it’s even possible to get them back. And that’s only the beginning of the story.

As its name implies, Red Team Blues is a novel about the difference between offense and defense: about how much more difficult it is to secure a system than it is to attack one. This metaphor applies to just about every aspect of life, from our assumptions about computer security to the way we live our lives and build our societies.

But setting all these heavy thoughts aside, mostly Red Team Blues is a quick fun read. You can get the eBook without DRM, or listen to an audiobook version narrated by Wil Wheaton (although I didn’t listen to it because I couldn’t put the book down.)

Remarks on “Chat Control”

On March 23 I was invited to participate in a panel discussion at the European Internet Services Providers Association (EuroISPA). The focus of this discussion was on recent legislative proposals, especially the EU Commission’s new “chat control” content scanning proposal, as well as the future of encryption and fundamental rights. These are the introductory remarks I prepared.

Thank you for inviting me today.

I should start by making a brief introduction. I am a professor of computer science and a researcher in the field of applied cryptography. On a day-to-day basis this means that I work on the design of encryption systems. Most of what I do involves building things: I design new encryption systems and try to make existing encryption technologies more useful.

Sometimes I and my colleagues also break encryption systems. I wish I could tell you this didn’t happen often, but it happens much more frequently than you’d imagine, and often in systems that have billions of users and that are very hard to fix. Encryption is a very exciting area to work in, but it’s also a young area. We don’t know all the ways we can get things wrong, and we’re still learning.

I’m here today to answer any questions about encryption in online communication systems. But mainly I’m here because the EU Commission has put forward a proposal that has me very concerned. This proposal, which is popularly called “chat control”, would mandate content scanning technology be added to private messaging applications. This proposal has not been properly analyzed at a technical level, and I’m very worried that the EU might turn it into law.

Before I get to those technical details, I would like to address the issue of where encryption fits into this discussion.

Some have argued that the new proposal is not about encryption at all. At some level these people are correct. The new legislation is fundamentally about privacy and confidentiality, and where law enforcement interests should balance against those things. I have opinions about this, but I’m not an EU citizen. Unfortunately this is a fraught debate that Europeans will have to have among themselves. I don’t envy you.

What concerns me is that the Commission does not appear to have a strong grasp on the technical implications of their proposal, and they do not seem to have considered how it will harm the security of our global communications systems. And this does affect me, because the security of our communications infrastructure is not localized to any one continent: if the 447 million citizens of the EU vote to weaken these technical systems, it could affect all consumers of computer security technology worldwide.

So why is encryption so critical to this debate?

Encryption matters because it is the single best tool we have for securing private data. My time here is limited, but if I thought that using all of it to convince you of this single fact was necessary, I would do that. Literally every other approach we’ve ever used to protect valuable data has been compromised, and often quite badly. And because encryption is the only tool that works for this purpose, any system that proposes to scan private data must — as a purely technical requirement — grapple with the technical challenges it raises when that data is protected with end-to-end encryption.

And those technical implications are significant. I have read the Impact Assessment authored by the Commission, and I hope I am not being rude to this audience when I say that I found it deeply naive and alarming. My impression is that the authors do not understand, at a purely technical level, that they are asking technology providers to deploy systems that none of them know how to build safely. Nor has the Commission consulted people with the technical and scientific expertise that would be needed to make this proposal viable.

In order to explain my concerns, I need to give some brief background on how content scanning systems work: both historically, and in the context that the EU is proposing.

Modern content scanning systems are a new creation. They have been deployed only since about 2009, and widely deployed only after about 2011. These systems normally evaluate messages uploaded to a server, often a social network or public repository. In historical systems — that is, older systems without end-to-end encryption — they would process unencrypted plaintext data, usually to look for known child sexual abuse media files (or CSAM.) Upon finding such an image, they undertake various reporting: typically alerting employees at the provider, who may then escalate to the police.

Historical scanning systems such as Microsoft’s PhotoDNA used a perceptual hashing algorithm to reduce each image to a “fingerprint” that can be checked against a database of known illicit content. These databases are maintained by child safety organizations such as NCMEC. The hashing algorithms themselves are deliberately imperfect: they are designed to produce similar fingerprints for files that appear (to the human eye) to be identical, even if a user has slightly altered the file’s data.
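To give a flavor of how fingerprinting works, here’s a toy “average hash.” This is emphatically not PhotoDNA, whose algorithm is confidential; it is just a minimal sketch of the perceptual-hash idea:

# Toy perceptual hash: shrink the image, compare each pixel to the average
# brightness, and pack the results into a 64-bit fingerprint. Small edits to
# the file flip only a few bits, so matching uses a Hamming-distance threshold.
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    avg = sum(pixels) / len(pixels)
    fingerprint = 0
    for p in pixels:
        fingerprint = (fingerprint << 1) | (p >= avg)
    return fingerprint

def is_match(fp1: int, fp2: int, threshold: int = 5) -> bool:
    return bin(fp1 ^ fp2).count("1") <= threshold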

A first limitation of these systems is that their inaccuracy can be exploited. It is relatively easy, using techniques that have only been developed recently, to make new images that appear to be harmless licit media files, but that will produce a fingerprint that is identical to harmful illicit CSAM.

A second limitation of these hash-based systems is that they cannot detect novel CSAM content. This means that criminals who post newly-created abuse media are effectively invisible to these scanners. Even a decade ago, the task of finding novel CSAM would have required human operators. However, recent advances in AI have made it possible to train deep neural networks on such imagery, so that these networks can try to detect new examples of it.

Of course, the key word in any machine-based image recognition system is “try.” All image recognition systems are somewhat fallible, and even when they work well, they often fail to differentiate between licit and illicit content. Moreover these systems can be exploited by malicious users to produce surprising results. I’ll come back to that in a moment.

But allow me to return to the key challenge: integrating these systems with encrypted communication systems.

In end-to-end encrypted systems, such as WhatsApp or Apple iMessage or Signal, server-side scanning is no longer viable. The problem here is that private data is encrypted when it reaches the server, and cannot be scanned. The Commission proposal isn’t specific about how these systems should be handled, but it hints that this scanning should be done on the user’s device before the content is encrypted. This approach is called client side scanning.

There are several challenges here.

First, client-side scanning represents an exception to the privacy guarantees of encrypted systems. In a standard end-to-end encrypted system, your data is private to you and your intended recipient. In a system with client-side scanning, your data is confidential… with an asterisk. That is, the data itself will be private unless the scanning system determines a violation has occurred, at which point your confidentiality will be (silently) revoked and unencrypted data will be transmitted to the provider (and thus, anyone who has compromised your provider.)

This ability to selectively disable encryption creates new opportunities for attacks. If an attacker can identify the conditions that will cause the model to reduce the confidentiality of your encryption, she can generate new — and apparently harmless — content that will cause this to happen. This will very quickly overwhelm the scanning system, rendering it useless. But it will also seriously reduce the privacy of many users.

A mirror version of this attacker exists as well: he will use knowledge of the model to evade these systems, producing new imagery and content that appear unchanged, but that these systems cannot detect at all. Your most sophisticated criminals — most likely the ones who create this awful content in the first place — will hide in plain sight.

Finally, a more alarming possibility exists: many neural-network classifiers allow for the extraction of the images that were used to train the model. This means every complex neural network model may potentially contain images of abuse victims, who would be exposed to further harm if these models were revealed.

The only known defense against all of these attacks is to tightly protect the models themselves: that is, to ensure that the complex systems of neural network weights and/or hash fingerprints are never revealed. Historical server-side systems went to great lengths to protect this data, even keeping their algorithms confidential. This was feasible in server-side scanning systems because the data only ever existed on a centralized server. It does not work well with client-side scanning, where models must be distributed to users’ phones. And so without some further technical ingredient, there is no safe home for these models: a server cannot scan content it only sees encrypted, and a model shipped to the user’s device is exposed to extraction.

The only serious proposal that has attempted to address this technical challenge was devised — and then subsequently abandoned — by Apple in 2021. That proposal aimed only at detecting known content using a perceptual hash function. The company proposed to use advanced cryptography to “split” the evaluation of hash comparisons between the user’s device and Apple’s servers: this ensured that the device never received a readable copy of the hash database.

Apple’s proposal failed for a number of reasons, but its technical failures provided important lessons that have largely been ignored by the Commission. While Apple’s system protected the hash database, it did not protect the code of the proprietary neural-network-based hash function Apple devised. Within two weeks of the public announcement, users were able to extract this code and devise both the collision attacks and evasion attacks I mentioned above.

One of the first “meaningful” collisions against NeuralHash, found by Gregory Maxwell.
Evasion attacks against Apple’s NeuralHash, from Struppek et al. (source)

The Commission’s Impact Assessment deems the Apple approach to be a success, and does not grapple with this failure. I assure you that this is not how it is viewed within the technical community, and likely not within Apple itself. One of the most capable technology firms in the world threw all their knowledge against this problem, and were embarrassed by a group of hackers: essentially before the ink was dry on their proposal.

This failure is important because it illustrates the limits of our capabilities: at present we do not have an efficient means for evaluating complex neural networks in a manner that allows us to keep them secret. And so model extraction is a real possibility in all proposed client-side scanning systems today. Moreover, as my colleagues and I have shown, even “traditional” perceptual hash functions like Microsoft’s PhotoDNA are vulnerable to evasion and collision attacks, once their code becomes available. These attacks will proliferate, if only because 4chan is a thing: and because some people on the Internet love nothing more than hurting other Internet users.

From Prokos et al. (source)
This example shows how a neural-network based hash function (NeuralHash) can be misled, by making imperceptible changes to an image.

In practice, the Commission’s proposal — if it is implemented in production systems — invites a range of technical attacks that we simply do not comprehend today, and that scientists have barely begun to think about. Moreover, the Commission is not content to restrain themselves to scanning for known CSAM content as Apple did. Their desire to target previously unknown content as well as textual content such as “grooming behavior” poses risks from many parties and requires countermeasures against abuse and surveillance that are completely undeveloped.

Worse: the “grooming behavior” requirement implies that untested, perhaps not-yet-developed AI language models will be a core part of tomorrow’s security systems. This is worrisome, since these models have failure modes and exploit opportunities that we are only beginning to explore.

In my discussion so far I have only scratched the surface of this issue. My analysis today does not consider even more basic issues, such as how we can trust that the purveyors of these opaque models are honest, and that the model contents have not been altered: perhaps by insider attack or malicious outside hackers. Each of these threats was once theoretical, and I have seen them all occur in just the last several years. Nor does it consider how the scope of these systems might be increased by future governments, and how this infrastructure will make future abuses more likely.

In conclusion, I hope that the Commission will rethink its hurried schedule and give this proposal enough time to be evaluated by scientists and researchers here in Europe and around the world. We should seek to understand these technical details as a precondition for mandating new technologies, rather than attempting to “build the airplane while we are flying in it”, which is very much what this proposal will encourage.

Thank you for your time.

Why encrypted backup is so important

You might have seen the news today that Apple is announcing a raft of improvements to Macs and iOS devices aimed at improving security and privacy. These include FIDO support, improvements to iMessage key verification, and a much anticipated announcement that the company is abandoning their plans for (involuntary) photo scanning.

While every single one of these is exciting, one announcement stands above the others. This is Apple’s decision to roll out (opt-in) end-to-end encryption for iCloud backups. While this is only one partial step in the right direction, it’s still a huge and decisive step — one that I think will substantially raise the bar for cloud security across the whole industry.

If you’re looking for precise details on all of these features, see Apple’s description here or their platform security guide. Others will no doubt have the time to do deep-dive explanations on each one. (I was given a short presentation by Apple today, and was provided the opportunity to ask a bunch of questions that their representative answered thoughtfully. But this is no substitute for a detailed look at the technical specs.)

In the rest of this post I want to zero in on end-to-end encrypted iCloud backup, and why I think this announcement is such a big deal.

Smartphones and cloud backup: the biggest consumer privacy compromise you never heard of

If you’re the typical smartphone or tablet user, your devices have become the primary repository for your private papers, photos and communications. Imagine some document that your grandparents would have kept on a shelf or inside of a locked drawer in their home. Today the equivalent document probably resides in one of your devices. This data is the most personal stuff in a human life: your private family photos, your mail, your financial records, even a history of the books you read and which pages you found meaningful. Of course, it also includes new types of information that are unimaginably more valuable and invasive than anything your grandparents could have ever imagined.

But this is only half the story.

If you’re the typical user, you don’t only keep this data in your device. An exact duplicate exists in a data center hundreds or thousands of miles away from you. Every time you snap a photo, each night while you sleep, this doppelganger is scrupulously synchronized through the tireless efforts of cloud backup software — usually the default software built into your device’s operating system.

It goes without saying that you, dear reader, might not be the typical user. You might be one of the vanishingly small fraction of users who change their devices’ default backup policies. You might be part of the even smaller fraction who back up their phone to a local computer. If you’re one of those people, congratulations: you’ve made good choices. But I would beg you to get over it. You don’t really matter.

The typical user does not make the same choices as you did.

The typical user activates cloud backup because their device urges them to do so at setup time and it’s just so easy to go along. The typical user sends their most personal photos to Apple or Google, not because they’ve thought deeply about the implications, but because they can’t afford to lose a decade of family memories when their phone or laptop breaks down. The typical user can’t afford to shell out an extra $300 to purchase extra storage capacity, so they buy a base-model phone and rely on cloud sync to offload the bulk of their photo library into the cloud (for a small monthly fee), so their devices can still do useful things.

And because the typical user does these things, our society does these things.

I am struggling to try to find an analogy for how crazy this is. Imagine your country held a national referendum to decide whether most citizens should be compelled to photocopy their private photos and store them in a centralized library — one that was available to both police and motivated criminals alike. Would anyone vote in favor of that, even if there was technically an annoying way to opt out? As ridiculous as this sounds, it’s effectively what we’ve done to ourselves over the past ten years: but of course we didn’t choose any of it. A handful of Silicon Valley executives made the choice for us, in pursuit of adoption metrics and a “magical” user experience.

What’s done is done, and those repositories now exist.

And that should scare you. It terrifies me, because these data repositories are not only a risk to individual user privacy, they’re effectively a surveillance super-weapon. However much damage we’ve done to our privacy with search engines and cellphone location data, the private content of our papers is the final frontier in the battle for our privacy. And in less than a decade, we’ve already lost the war.

Apple’s slow motion battle to encrypt your backups

To give credit where it’s due, I think the engineers at Apple and Google were the first to realize what they’d unleashed — maybe even before many of us on the outside were even aware of the scale of the issue.

In 2016, Apple began quietly deploying new infrastructure designed to secure user encryption keys in an “end-to-end” fashion: this means that keys would be accessible only to the user who generated them. The system Apple deployed was called the “iCloud Key Vault“, and it consists of hundreds of specialized devices called Hardware Security Modules (HSMs) that live in the company’s data centers. The devices store user encryption keys. Those keys are in turn gated by a user-chosen passcode, which is typically the same passcode you use daily to unlock your device. A user who knows their passcode can ask for a copy of their key. An attacker who can’t guess that passcode (in a small number of attempts) cannot. Most critically: Apple counts themselves in the category of people who might be attackers. This means they went to some trouble to ensure that even they cannot (be forced to) bypass this system.

When it comes to encrypted backup there is essentially one major problem: how to store keys. I’m not saying this is literally the only issue, far from it. But once you’ve found a way for users to securely store and recover their keys, every other element of the system can be hung from that.

The remaining problems are still important! There are still, of course, reasonable concerns that some users will forget their device passcode and thus lose access to backups. You need a good strategy when this does happen. But even if solving these problems took some time and experimentation, it should only have been a matter of time until Apple activated end-to-end encryption for at least a portion of their user base. Once broadly deployed, this feature would have sent a clear signal to motivated attackers that future abuse of cloud backup repositories wasn’t a good place to invest resources.

But this is not quite what happened.

What actually happened is unclear, and Apple refuses to talk about it. But the outlines of what we do know tell a story that is somewhere between “meh” and “ugh“. Specifically, reporting from Reuters indicates that Apple came under pressure from government agencies: these agencies wished Apple to maintain the availability of cleartext backup data, since this is now an important law enforcement priority. Whatever the internal details, the result was not so much a retreat as a rout:

Once the decision was made, the 10 or so experts on the Apple encryption project — variously code-named Plesio and KeyDrop — were told to stop working on the effort, three people familiar with the matter told Reuters.

For what it’s worth, some have offered alternative explanations. John Gruber wrote a post that purports to push back on this reporting, arguing that the main issues were with users who got locked out of their own backups. (Apple has recently addressed this by deploying a feature that allows you to set another user as your recovery assistant.) However even that piece acknowledges that government pressure was likely an issue — a key dispute is about whether the FBI killed the plan, or whether fear of angering the FBI caused Apple to kill its own plan.

Whatever caused it, this setback did not completely close the door on end-to-end encrypted backups, of course. Despite Apple’s reticence, other companies — notably Google and Meta’s WhatsApp — have continued to make progress by deploying end-to-end encrypted systems very similar to Apple’s. At present, the coverage is partial: Google’s system may not encrypt everything, and WhatsApp’s backups are opt-in.

Selective encryption and client-side scanning: a road not taken

As of July 2021 the near-term deployment of end-to-end encrypted backups seemed inevitable to me. In the future, firms would finally launch the technology and demonstrate that it works — at least for some users. This would effectively turn us back towards the privacy world of 2010 and give users a clear distinction between private data and non-private user data. There was another future where that might not happen, but I thought that was unlikely.

One thing I did not foresee was a third possible future: one where firms like Apple rebuilt their encryption so we could have both end-to-end encryption — and governments could have their surveillance too.

In August of last year, Apple proposed such a vision. In a sweeping announcement, the company unveiled a plan to deploy “client-side image scanning” to 1.75 billion iCloud users. The system, billed as part of the company’s “Child Safety” initiative, used perceptual hashing and cryptography to scan users’ private photo libraries for the presence of known child sexual abuse media, or CSAM. This would allow Apple to rapidly identify non-compliant users and, subject to an internal review process, report violators to the police.

Apple’s proposal was not the first system designed to scan cloud-stored photos for such imagery. It was, however, the first system capable of working harmoniously with end-to-end encrypted backups. This fact is due to the specific way that Apple proposed to conduct the scanning.

In previous content scanning systems, user files are scanned on a server. This requires that content be uploaded in plaintext, i.e., unencrypted form, so that the server can process it. Apple’s system, on the other hand, performed the necessary hashing and scanning on the user’s own device — before the data was uploaded. The technical implications of this design are critical: Apple’s scanning would continue to operate even if Apple eventually flipped the switch to activate end-to-end encryption for your private photos (as they did today.)

And let’s please not be dense about this. While Apple’s system did not yet encrypt cloud-stored photos last year (that’s the new announcement Apple made today), encryption plans were the only conceivable reason one would deploy a client-side scanning system. There was no other reasonable explanation.

Users have a difficult time understanding even simple concepts around encryption. And that’s not their fault! Firms constantly say things like “your files are encrypted” even when they store the decryption keys right next to the encrypted data. Now try explaining the difference between “encryption” and “end-to-end encryption” along with forty-six variants of “end-to-end encryption that has some sort of giant asterisk in which certain types of files can be decrypted by your cloud provider and reported to the police.” Who even knows what privacy guarantees those systems would offer you — and how they would evolve. To me it felt like the adoption of these systems would signal the end of a meaningful concept of user-controlled data.

Yet this came very close to happening. It could still happen.

It didn’t though. And to this day I’m not entirely sure why. Security and privacy researchers told the company exactly how dangerous the idea was. Apple employees reacted negatively to the proposal. But much to my surprise, the real clincher was the public’s negative reaction: as much as people hate CSAM, people really seemed to hate the idea that their private data might be subject to police surveillance. The company delayed the feature and eventually abandoned it, with today’s result being the end of the saga.

I would love to be a fly on the wall to understand how this went down inside of Apple. I doubt I’ll ever learn what happened. I’m just glad that this is where we wound up.

What’s next?

I wish I could tell you that Apple’s announcement today is the end of the story, and now all of your private data will be magically protected — from hackers, abusive partners and the government. But that is not how things work.

Apple’s move today is an important step. It hardens certain walls: very important, very powerful walls. It will send a clear message to certain attackers that deeper investment in cloud attacks is probably not worthwhile. Maybe. But there is still a lot of work to do.

For one thing, Apple’s proposal (which rolls out in a future release) is opt-in: users will have to activate “Advanced Protection” features for their iCloud account. With luck Apple will learn from this early adoption, and find ways to encourage more users to adopt this feature. But that’s a ways off.

And even if Apple does eventually move most of their users into end-to-end encrypted cloud backups, there will always be other ways to compromise someone’s data. Steal their phone, guess their password, jailbreak a partner’s phone, use sophisticated targeted malware. And of course a huge fraction of the world will still live under repressive governments that don’t need to bother with breaking into cloud providers.

But none of these attacks will be quite as easy as attacks on non-E2E cloud backup, and none will offer quite the same level of convenience and scale. Today’s announcement makes me optimistic that we seem to be heading — in fits and starts — to a world where your personal data will belong to you.

Cover photo by Scott Robinson, used under CC license.

One-Time Programs

One of the things I like to do on this blog is write about new research that has a practical angle. Most of the time (I swear) this involves writing about other folks’ research: it’s not that often that I write about work that comes out of my own lab. Today I’m going to make an exception to talk about a new paper that will be appearing at TCC ’22. This is joint work with my colleagues Abhishek Jain and Aarushi Goel along with our students Harry Eldridge and Max Zinkus.

This paper is fun for three reasons: (1) it addresses a cool problem, (2) writing about it gives me a chance to cover a bunch of useful, general background that fits the scope of this blog [indeed, our actual research won’t show up until late in the post!], and most critically (3) I want people to figure out how to do it better, since I think it would be neat to make these ideas more practical. (Note, if you will, that TCC stands for Theory of Cryptography conference, which is kind of a weird fit.)

Our work is about realizing a cryptographic primitive called the One-Time Program (OTP). This is a specific kind of cryptographically obfuscated computer program — that is, a program that is “encrypted” but that you can mail (literally) to someone who can run it on any untrusted computer, using input that the executing party provides. This ability to send “secure, unhackable” software to people is all by itself a holy grail of cryptography, since it would solve so many problems both theoretical and practical. One-time programs extend these ideas with a specific property that is foreshadowed by the name: the executing computer can only run a OTP once.

OTPs are one of the cooler ideas to pop out of our field, since they facilitate so many useful things. Aside from the obvious dark-side implications (perfect DRM! scary malware!) they’re useful for many constructive applications. Want to send a file encrypted under a weak password, but ensure nobody can just brute-force the thing? Just send a OTP that checks the password and outputs the file. Want to send a pile of sensitive data and let users compute statistical functions over it using differential privacy? Sure. Time-lock encryption? Why not. In fact, OTPs are powerful enough that you can re-invent many basic forms of cryptography from them… provided you’re not too worried about how efficient any of it will be.

As we’ll see in a minute, OTPs might be nice, but they are perhaps a little bit too good to be true. Most critically they have a fundamental problem: building them requires strong model-breaking assumptions. Indeed, many realizations of OTPs require the program author to deliver some kind of secure hardware to the person who runs the program.* This hardware can be ridiculously simple and lightweight (much simpler than typical smartcards) but the need to have some of it represents a very big practical limitation. This is likely why we don’t have OTPs out in the world — the hardware these schemes require does not actually exist.

In this work we tried to answer a very similar question. Specifically: can we use real hardware (that already exists inside of phones and cloud services) to build One-Time Programs? That’s the motivation at a high level.

Now for the details… and boy are there a lot of them.

One-Time Programs

One-Time Programs (OTPs) were first proposed by Goldwasser, Kalai and Rothblum (GKR) back in CRYPTO 2008. At a surface level, the idea is quite simple. Let’s imagine that Alice has some (secret) computer program P that takes in some inputs, cogitates for a bit, and then produces an output. Alice wishes to mail a copy of P to her untrustworthy friend Bob who will then be able to run it. However Alice (and Bob) have a few very strict requirements:

  1. Bob can run the program on any input he wants, and he will get a correct output.
  2. Bob won’t learn anything about the program beyond what he learns from the output (except, perhaps, an upper-bound on its size/runtime.)
  3. Bob can run the program exactly once.

Let’s use a very specific example to demonstrate how these programs might work. Imagine that Alice wants to email Bob a document encrypted under a relatively weak password such as a 4-digit PIN. If Alice employed a traditional password-based encryption scheme, this would be a very bad idea! Bob (or anyone else who intercepts the message before it reaches Bob) could attempt to decrypt the document by systematically testing each of the (10,000) different passwords until one worked correctly.

Using a one-time program, however, Alice can write a program with code that looks like this, and turn it into an OTP:

def P(password):
    # "<secret password>" and "<secret document>" are placeholders for
    # Alice's actual PIN and file contents.
    if password != "<secret password>":
        return "BAD"
    return "<secret document>"
I don’t know if the kids even get my meme choices anymore.

When Bob receives an OTP of the program above, he can then run it on some password input he chooses — even if Alice is no longer around to help him. More critically, because it’s a one-time program, Bob will only be able to try a single password guess. Assuming Bob knows the right password this is just fine. But a recipient who does not know the password will have to guess it correctly the first time. The nice implication is that even a “weak” 4-digit PIN is reasonably safe to use as a password.

(Of course if Alice is worried about Bob innocently fat-fingering his password, she can send him a few different copies of the same program. Put differently: one-time programs trivially imply N-time programs.)

One-time programs have many other useful applications. Once I can make “unhackable” limited-use software, I can send you all sorts of useful functionalities based on secret intellectual property or private data rather than keeping that stuff locked up on my own server. But before we can do those things, we need to actually build OTPs.

Why hardware?

If you spend a few minutes thinking about this problem, it should become obvious that we can’t build OTPs using (pure) software: at least not the kind of software that can run on any general-purpose computer.

The problem here stems from the “can only run once” requirement.

Imagine that you send me a pure software version of a (purported) One-Time Program. (By this I mean: a piece of software I can run on a standard computer, whether that’s a Macbook or a VM/emulator like QEMU.) I’m supposed to be able to run the program once on any input I’d like, and then obtain a valid output. The program is never supposed to let me run it a second time on a different input. But of course if I can run the software once that way, we run into the following obvious problem:

What stops me from subsequently wiping clean my machine (or checkpointing my VM) and then re-installing a fresh copy of the same software you sent—and running it a second time on a different input?

Sadly the answer is: nothing can prevent any of this. If you implement your purported “OTP” using (only) software then I can re-run your program as many times as I want, and each time the program will “believe” it’s running for the very first time. (In cryptography this is sometimes called a “reset attack”.)

Keanu experiences a reset attack.

For those who are familiar with multi-party computation (MPC), you’ll recognize that such attacks can be thwarted by requiring some interaction between the sender and recipient each time they want to run the program. What’s unique about OTPs is that they don’t require any further interaction once Alice has sent the program: OTPs work in a “fire and forget” model.

In their original paper, GKR noted this problem and proposed to get around it by changing the model. Since pure software (on a single machine) can’t possibly work, they proposed to tap the power of tamper-resistant hardware. In this approach, the program author Alice sends Bob a digital copy of the OTP along with a physical tamper-resistant hardware token (such as a USB-based mini-HSM). The little token would be stateful and act like one of those old copy-protection dongles from the 1990s: that is, it would contain cryptographic key material that the program needs in order to run. To run the program, Bob would simply pop this “hardware token” into his computer.

A single USB token might contain thousands of “one-time memories.”

Now you might object: doesn’t using hardware make this whole idea kind of trivial? After all, if you’re sending someone a piece of specialized tamper-resistant hardware, why not just pop a general-purpose CPU into that thing and run the whole program on its CPU? Why use fancy cryptography in the first place?

The answer here has to do with what’s inside that token. Running general programs on tamper-proof hardware would require a token with a very powerful and expensive (not to mention complex) general-purpose CPU. This would be costly and, worse, would embed a large software and hardware attack surface — something we have learned a lot about recently thanks to Intel’s SGX, which keeps getting broken by researchers. By contrast, the tokens GKR propose are absurdly weak and simple: they’re simple memory devices that spit out and erase secret keys when asked. The value of the cryptography here is that Bob’s (untrusted) computer can still do the overwhelming share of the actual computing work: the token merely provides the “icing” that makes the cake secure.

But while “absurdly weak and simple” might lower the hardware barrier to entry, this is not the same thing as having actual tokens exist. Indeed it’s worth noting that GKR proposed their ideas way back in 2008; it is now 2022 and nobody (to my knowledge) has ever built the necessary token hardware to deploy them in the world. (One could prototype such hardware using an open HSM platform, but how secure would that actually be — compared to a proper engineering effort by a major company like Apple, Google or Amazon?)

And yet One-Time Programs are neat. It would be useful to be able to write and run them on real devices! For many years I’ve run into problems that would melt away if we could deploy them easily on consumer devices. Wouldn’t it be great if we could build them using some hardware that already exists?

How to build a (GKR) One-Time Program

In order to explain the next part, it’s necessary to give an extremely brief overview of the GKR construction for One-Time Programs, and more specifically: their specialized tokens. This construction is based on garbled circuits and will make perfect sense if you’re already familiar with that technique. If not, it will require a little bit more explanation.

GKR’s idea is to rely on many individual tokens called One-Time Memories (OTM). An individual OTM token works like this:

  1. When a program author (Alice) sets one up, she gets to install two different strings into it: let’s call them K0 and K1. She can then mail the token to Bob.
  2. When Bob receives the token and wants to use it, he can ask the token for either of the two strings (0/1). The token will hand Bob the desired string (either K0 or K1.)
  3. Once the token has given Bob the string he asked for, it permanently erases the other string.

The strings themselves need not be very long: 128 bits is ideal. To use these tokens for building One-Time Programs, Alice might need to set up a few hundred or a few thousand of these “tokens” (which can technically all be glommed together inside a single USB device) and send them to Bob.
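In code, the interface of a single OTM is tiny. Here’s a minimal sketch (the class and method names are mine, not GKR’s):

# One-Time Memory (sketch): stores two strings, hands back the one you ask
# for, and erases the other. A real OTM must be tamper-resistant hardware;
# a software object like this one is trivially resettable, which is the
# whole problem discussed above.
class OneTimeMemory:
    def __init__(self, k0: bytes, k1: bytes):
        self._strings = {0: k0, 1: k1}

    def query(self, bit: int) -> bytes:
        chosen = self._strings[bit]
        self._strings = {}        # erase everything; only the returned copy survives
        return chosen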

Once you assume these tokens, the GKR style of building One-Time Programs is pretty straightforward if you’re a cryptographer. Summarized for someone who is familiar with garbled circuits: the basic idea is to take the classical Yao two-party computation (2PC) scheme and replace the (interactive) Oblivious Transfer portion by sending the evaluator a set of physical One-Time Memory tokens.

If that doesn’t work for you, a slightly more detailed explanation is as follows:

Alice first converts her program P into a boolean circuit, like the one below:

Having done that, she then assigns two random cryptographic keys (annoyingly called ‘labels’) to every single wire in the circuit. One key/label corresponds to the “0” value on that wire, and the other to the “1” bit. Notice that the input wires (top-left) also count here: they each get their own pair of keys (labels) that correspond to any input bit.


Alice next “garbles” the circuit (encrypting each gate) using a clever approach devised by Andrew Yao, which I won’t describe precisely here, but which Vitalik Buterin explains nicely in this blog post. The result is that each gate is replaced with an encrypted table of ciphertexts: anyone who has the two appropriate labels going into that gate will be able to evaluate it, and in return they will receive the appropriate label for the gate’s output wire. Hence all Bob needs is the appropriate keys/labels for the values he wishes to feed into the input wires at the top-left of the circuit — and then he can recursively evaluate the circuit all the way to the output wires.
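If it helps to see the mechanics, here’s a toy garbling of a single AND gate. It’s a bare-bones sketch: no “point-and-permute” or any of the other standard optimizations, and the evaluator cheats slightly by checking candidate decryptions against the output labels just to keep the demo short.

# Toy garbled AND gate: each wire gets two random labels; the gate becomes
# four ciphertexts, and only the correct pair of input labels recovers the
# correct output-wire label.
import os, random, hashlib

def H(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()[:16]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def new_wire():
    return {0: os.urandom(16), 1: os.urandom(16)}   # labels for bit 0 / bit 1

def garble_and(wa, wb, wc):
    rows = [xor(H(wa[a], wb[b]), wc[a & b]) for a in (0, 1) for b in (0, 1)]
    random.shuffle(rows)                            # hide which row is which
    return rows

def evaluate_and(rows, label_a, label_b, output_labels):
    for row in rows:
        candidate = xor(H(label_a, label_b), row)
        if candidate in output_labels:              # demo shortcut (see above)
            return candidate

# Alice garbles; Bob evaluates using the labels for inputs (1, 1).
wa, wb, wc = new_wire(), new_wire(), new_wire()
table = garble_and(wa, wb, wc)
assert evaluate_and(table, wa[1], wb[1], set(wc.values())) == wc[1]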

From Bob’s perspective, therefore, all he needs to do is obtain the labels that correspond to the input arguments he wants to feed into the circuit. He will also need a way to translate the labels on the output wires into actual bits. (Alice can send Bob a lookup table to help him translate those labels into actual bits, or she can just make the circuit output raw binary values instead of labels on those wires.)

A critical warning is that Bob must receive only one input label for each wire: if he had more than that, he could run the circuit on two “different inputs.” (And in this case, many other bad things would probably happen.) Hence the high-level problem is how to deliver the appropriate input labels to Bob, while ensuring that he gets exactly one label for each wire and never more.

And this is where the One-Time Memory tokens fit in perfectly. Alice will set up exactly one OTM token for each input wire: it will contain both labels for that wire. She’ll send it to Bob. Bob can then query each token to obtain exactly one label for that wire, and then use those labels to evaluate the rest of the circuit. The OTM token will destroy all of the unused labels: this ensures that Bob can only run the program on exactly one input. And that’s the ballgame.**
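Tying the last two sketches together: Alice loads each input wire’s two labels into its own OTM, and Bob redeems each token with his chosen input bit (again, all names here are just for illustration):

# One OTM per input wire: each token holds that wire's two labels, so Bob
# learns exactly one label per wire while the other is erased as he asks.
def make_input_tokens(input_wires):
    return [OneTimeMemory(w[0], w[1]) for w in input_wires]

def fetch_input_labels(tokens, input_bits):
    return [token.query(bit) for token, bit in zip(tokens, input_bits)]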

So where do I pick myself up some One-Time Memories?

You buy them at your local One-Time Memory store, obviously.

Seriously, this is a huge flaw in the hardware-based OTP concept. It would be awfully useful to have OTPs for all sorts of applications, assuming we had a bunch of One-Time Memory tokens lying around and it was easy to ship them to people. It would be even more awesome if we didn’t need to ship them to people.

For example:

  • Imagine there was a publicly-accessible cloud provider like Google or Apple that had lots of One-Time Memories sitting in its data centers that you could rent. Alice could log in to Google and set up a bunch of OTM “tokens” remotely, and then simply send Bob the URLs to access them (and hence evaluate a OTP that Alice mails him.) As long as the cloud provider uses really good trusted hardware (for example, fancy HSMs) to implement these memories, then even the cloud provider can’t hack into the secrets of Alice’s One-Time Programs or run them without Bob’s input.
  • Alternatively, imagine we had a bunch of One-Time Memories built into every smartphone. Alice couldn’t easily send these around to people, but she could generate One-Time Programs that she herself (or someone she sends the phone to) could later run. For example, Alice could build a sophisticated biometric analysis program that uses inputs from the camera to unlock secrets in her app, and she could ensure that the program stays safe and can’t be “brute-forced” through repeated execution. Even if someone stole Alice’s phone, they would only be able to run the program (once) on some input, but they would never be able to run it twice.

The problem is that, of course, cloud providers and phone manufacturers are not incentivized to let users build arbitrary “unhackable” software, nor are they even thinking about esoteric things like One-Time Memories. No accounting for taste.

And yet cloud providers and manufacturers increasingly are giving consumers access to specialized APIs backed by secure hardware. For example, every single iPhone ships with a specialized security chip called the Secure Enclave Processor (SEP) that performs specific cryptographic operations. Not every Android phone has such a processor, but most include a small set of built-in TrustZone “trustlets” — these employ processor virtualization to implement “secure” mini-apps. Cloud services like Google, Apple iCloud, Signal and WhatsApp have begun to deploy Hardware Security Modules (HSMs) in their data centers — these store encryption keys for consumers to use in data backup, and critically, are set up in such a way that even the providers can’t hack in and get the keys.

Unfortunately none of the APIs for these hardware services offer anything as powerful as One-Time Programs or (even) One-Time Memories. If anything, these services are locked down specifically to prevent people from doing cool stuff: Apple’s SEP supports a tiny number of crypto operations. TrustZone does support arbitrary computation, but today your applet must be digitally signed by a phone manufacturer or a TEE-developer like Qualcomm before you can touch it. Consumer-facing cloud providers definitely don’t expose such powerful functionality (for that you’d need something like AWS Nitro.) The same goes for “simple” functionalities like One-Time Memories: they may be “simple”, but they are also totally weird and why would any consumer manufacturer bother to support them?

But let’s never forget that cryptography is a sub-field of computer security. In that field when someone doesn’t let you run the machine you want to run, you figure out how to build it.

What hardware APIs do manufacturers and cloud providers support?

Not very many, unfortunately.

The Keymaster meets the Gatekeeper. I think there was also a dog in this movie?

If you hunt around the Apple Developer Documentation for ways to use the iPhone SEP, for example, you’ll find APIs for generating public keys and using them, as well as ways to store keys under your passcode. Android (AOSP) provides similar features via the “Gatekeeper” and “Keymaster” trustlets. Consumer-facing cloud services provide HSM-backed services that also let you store your keys by encrypting them under a password.

None of this stuff is a “One-Time Memory,” unfortunately. Most of it isn’t even stateful… with one exception.

“Counter lockboxes”

Reading through documentation reveals one functionality that isn’t exactly what we need, but at least has some promise. In its documentation, Apple calls this function a “counter lockbox“, and it’s designed to store encryption keys that are protected with a user-selected password. The same person (or someone else) can later retrieve the key by sending in the right passcode. In the event that the entered passcode is not the right one, the lockbox increments an “attempt counter” that caps the number of incorrect guesses that will be permitted before the key is locked forever.

And I do mean forever. When the user exceeds the maximum number of guesses, something interesting happens (here is Apple’s version):

Source: Apple Platform Security guide.

While I’m going to use Apple’s terminology, lockboxes are not just an Apple thing: a nearly-identical functionality exists inside every phone and in basically all of the consumer-facing cloud services I mentioned above.

Since counter lockboxes erase things, and they’re pretty much ubiquitous, this gives us hope. Even though they are not OTMs, maybe we can somehow use them to build OTMs. To explain how this works, we first need to give a clear description of how a basic counter lockbox works. Here’s a simplified version, with a small code sketch after the list:***

  1. To set up a fresh lockbox, Alice provides an encryption key K and a password P as well as a “maximum attempt counter” M. The initial attempt counter is set to A := 0. The token stores (K, P, A, M).
  2. When Bob wants to access the lockbox, he sends in a “guess” P’.
    1. If P’ == P (i.e., the guess is correct), the token returns K and resets A := 0.
    2. Else if P’ != P (the guess is incorrect), the token sets A := A + 1 and returns “bad password”.
  3. If A == M (i.e., the maximum attempts have been reached) then the token wipes its memory completely.
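Here’s the promised sketch of that behavior. The real thing lives inside an HSM or a secure enclave, where it can’t simply be copied or reset; a Python object obviously can be, so treat this as an illustration of the interface only.

# Counter lockbox (sketch): releases its key only on the correct password,
# and wipes itself after M consecutive wrong guesses.
class CounterLockbox:
    def __init__(self, key: bytes, password: str, max_attempts: int):
        self._key, self._password = key, password
        self._attempts, self._max = 0, max_attempts

    def guess(self, attempt: str):
        if self._key is None:
            return "wiped"
        if attempt == self._password:
            self._attempts = 0
            return self._key
        self._attempts += 1
        if self._attempts >= self._max:
            self._key = self._password = None       # erased forever
        return "bad password"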

If you remember our description of a one-time memory (OTM) further above, you’ll notice that a lockbox stores only one string (key), but that an OTM has two different strings stored in it. Moreover, the OTM will always give us one or the other string, but a counter lockbox only gives us its single string when we enter a password.

But perhaps we can simulate OTMs by using multiple lockboxes.

Imagine that Alice has access to two different (fresh) counter lockboxes, as illustrated below. Let’s assume she configures both lockboxes to permit exactly one password attempt (M=1). Next, she stores the string K0 into the lockbox on the left, and sets it up to use password “0”. She then places the string K1 into the lockbox on the right and sets it up with password “1”, as follows:

The picture above shows you what’s inside each lockbox. But you need to remember that to anyone other than Alice, lockboxes are kind of a black box. That is, they don’t reveal the key or password they’ve been set up with — until the lockbox is queried on a specific password. To exploit this fact, Alice can shuffle the two lockboxes and send them to Bob in a random ordering. Until he tries to access one, these lockboxes will look like this to Bob:

So how does Bob “open” the lockboxes to reliably obtain one of the two strings?

For a moment let’s assume that Bob is honest, and wants to get either K0 or K1. He does not know which of the two lockboxes contains the string he wants. He can, however, employ a simple strategy to obtain it.

Assuming Bob’s choice is “0”, he can simply query both lockboxes on password “0”. One of the two lockboxes will output K0, and the other will output “bad password”. (If his choice is “1”, then he can do the same using that as the password.) This strategy works perfectly. In either case Bob always obtains the string he wanted, and the other string ends up getting erased: just like a One-Time Memory!
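
Here’s a sketch of that two-lockbox construction, reusing the CounterLockbox class from the snippet above (the shuffle stands in for Alice handing Bob the boxes in a random order, and the label strings are just example values):

```python
import random

def make_two_lockbox_otm(k0, k1):
    # Alice: two single-attempt lockboxes, one per string, shuffled so that
    # Bob cannot tell which box holds which string.
    boxes = [CounterLockbox(k0, password="0", max_attempts=1),
             CounterLockbox(k1, password="1", max_attempts=1)]
    random.shuffle(boxes)
    return boxes

def honest_read(boxes, choice_bit):
    # Honest Bob queries *both* boxes on his chosen bit: one returns the
    # string he wants, the other wipes itself forever.
    results = [box.query(str(choice_bit)) for box in boxes]
    return next(r for r in results if r is not None)

boxes = make_two_lockbox_otm(b"label-for-0", b"label-for-1")
print(honest_read(boxes, 0))   # b"label-for-0"; the "1" label is now erased
```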

The problem with this approach is that Bob might not be honest.

A cheating Bob can query the first lockbox on password “0”, and if that lockbox hands him K0, he can switch strategies and query the remaining lockbox on password “1”. If this works, he will end up with both of the strings — something that should never happen in a One-Time Memory. And this is very bad! If we use such broken “memories” to build GKR one-time programs and Bob pulls this off for even a single input wire in our garbled circuit, then he will be able to query the circuit on multiple distinct inputs. (And it gets worse: if Bob obtains two labels on the same input wire, the garbled circuit construction will often unravel like a cheap sweater.)

The good news here is that Bob’s strategy doesn’t always work. He will sometimes get both strings, but sometimes he won’t. Half the time he’ll query the first lockbox on “0” and it will say “bad password”… and the key it holds will be gone forever. (Bob can still go on to get the other string from the untouched lockbox, but that’s not a problem.) This means we have at best a 1/2 probability of thwarting a cheating Bob. That’s good… but can we reduce Bob’s chances even more?

The most obvious approach is to use more lockboxes. Instead of two, imagine Alice sets up, say, 80 different shuffled lockboxes divided into two sets. The first forty lockboxes are programmed with the “0” password, and the other forty with password “1”; then all are mixed together. The goal is to ensure that Bob can recover K0 only if he queries every one of the “0” lockboxes correctly, and in particular that he cannot recover both strings unless he guesses exactly which box is which. To ensure this, Alice will use secret sharing to split K0 into 40 different “shares”, with the property that all of the shares are needed to recover the original string. Each share is placed into one of the forty “0”-password boxes. (A similar process is used to protect K1.)
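
Here’s a sketch of that amplification step, again reusing the CounterLockbox class from the earlier snippet, with simple XOR-based all-or-nothing sharing standing in for the secret-sharing scheme:

```python
import functools, random, secrets

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def xor_share(secret, m):
    # m-out-of-m sharing: every single share is needed to recover the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(m - 1)]
    shares.append(functools.reduce(_xor, shares, secret))
    return shares

def make_amplified_otm(k0, k1, m=40):
    boxes = [CounterLockbox(s, password="0", max_attempts=1) for s in xor_share(k0, m)]
    boxes += [CounterLockbox(s, password="1", max_attempts=1) for s in xor_share(k1, m)]
    random.shuffle(boxes)        # Bob sees 2m indistinguishable boxes
    return boxes

def honest_read(boxes, choice_bit):
    # Query every box on the chosen bit: the m matching boxes yield shares,
    # the other m wipe themselves. XOR the shares to recover K_choice.
    shares = [s for s in (b.query(str(choice_bit)) for b in boxes) if s is not None]
    return functools.reduce(_xor, shares)
```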

Of course Bob can still cheat and try to guess his way through this maze Alice has built, but he will need to correctly guess where each of the “0”-password lockboxes is located without making a single mistake, since even a single wrong guess will doom his chances of getting both strings. Such a lucky guessing strategy is extremely difficult for Bob to pull off. (The analysis is not quite equivalent to Bob flipping a coin “heads” forty times in a row [probability 1/2^40], but it results in a probability that’s similarly low.)

By adjusting the number of lockboxes she uses, Alice can carefully tune the security level of each “virtual one-time memory” so that cheating (almost) never works.

This sure seems like a lot of lockboxes

Well, I said this paper is appearing at the Theory of Cryptography Conference, didn’t I?

Obviously the proposal above is not the end of the story. In that solution, for a desired S-bit security level the “naive” approach requires Alice to set up 2S lockboxes (or O(S), if you don’t like constants). To build GKR one-time programs from this, Alice would need to use those O(S) lockboxes for every input bit Bob might want to feed into the program. Surely we can do better.

The rest of our paper looks at ways to reduce the number of lockboxes required, mainly focusing on the asymptotic case. Our second proposal reduces the cost to O(1) lockboxes per input bit (when there are many input wires), and the final proposal removes the dependence on the number of input bits entirely. This means that in principle we can run programs of arbitrary input length using some reasonable fixed number of lockboxes.

I can’t possibly do justice to the full details here, except to note that we rely on some very cool techniques proposed by others. Our first transform relies on a ‘robust garbling’ technique devised by Almashaqbeh, Benhamouda, Han, Jaroslawicz, Malkin, Nicita, Rabin, Shah and Tromer. The second uses an underappreciated tool called “Laconic OT” that was proposed by Cho, Dottling, Garg, Gupta, Miao and Polychroniadou. (Sadly, Laconic OT constructions are not quite “concretely” practical yet, so I hope these new applications will motivate some more practical research into that area.)

The upshot is that we can, in principle, manufacture programs that run on user inputs of arbitrary size with a fixed cost of at most several thousand lockboxes. While this is still quite a large number in practice(!), it isn’t infeasible. Such a quantity of lockboxes could be obtained by, for example, setting up boatloads of fake iCloud or Google accounts and then using Apple’s iCloud Key Vault or Google’s Titan Backup service to store a “backup key” for each of them.

(I do not recommend you actually do any of this: I suspect both Apple and Google find such things irritating, and will make your life unpleasant if you try it.)

Are there any downsides to this research?

Maybe. There are many applications of obfuscated programs (including one-time programs) that are extremely bad for the world. One of those applications is the ability to build extremely gnarly ransomware and malware. This is presumably one of the reasons that systems like TrustZone and Intel SGX require developers to possess a certificate in order to author code that can run in those environments.

For example: a ransomware author could encrypt a whole system and then store the keys inside of a One-Time Program that would, in turn, need to be executed on a valid fragment of a blockchain in order to release the keys. That fragment would incorporate proof that the system’s owner has paid some sum of money to the malware developer. This scheme would be fully ‘autonomous’ in the sense that your computer, once infected, would never need to phone home to any “command and control” infrastructure operated by the ransomware author. This would make malware networks harder to take down.

If systems designers are worried about malware on SGX, our research shows that (eventually) those designers may also need to think about the availability of “lockbox”-type functionalities. Or to give a shorter summary: don’t let the bad guys get more than a few thousand lockboxes, or they gain the power of secure computation and all that comes with it. And the number they actually need could be much smaller than the figures we achieve in this paper.

Perhaps at a higher level, the main lesson of this research is that computation is much easier than you’d expect it to be. If someone wants to do it badly enough, they’ll always find a way.

Notes:

* There is a second line of work that uses very powerful cryptographic assumptions and blockchains to build such programs. I am very enthusiastic about this work [and my co-authors and I have also written about this], but this post is going to ignore those ideas and stick with the hardware approach.

** Of course that’s not really the ballgame. As with all simple descriptions of complex protocol papers, this simple explanation omits all the subtle details that matter for the security proof, but that aren’t terribly relevant for the purposes of this blog. These include all sorts of interesting questions about how to prove security in an environment where an adversary can query the tokens in a weird, arbitrary order. It’s a good paper!

*** Some implementations will pick the key K for you. Others fix the maximum attempt counter at some constant (usually 10 attempts) rather than supporting any amount. All of these details can be worked around in practice.

In defense of crypto(currency)

In defense of crypto(currency)

Last week a group of technologists, including Bruce Schneier, sent a letter to Congress outlining their concerns around cryptocurrency and urging Congress to regulate the space.

Now let me be the first to say that I broadly support this goal. I have no problem with the idea of legislators (intelligently) passing laws to regulate cryptocurrency. Indeed, given the level of insanity and the number of outright scams that are happening in this area, it’s pretty obvious that our current regulatory framework is not up to the task. If the recent letter simply asked for intelligent regulation, I’d gladly sign onto it. Unfortunately that’s not at all what this letter says. Instead, it argues that the entire technology field is worthless and cannot be used for any practical purpose.

Don’t take my word for it. I urge you to stop reading this post right now and take a look for yourself. I’ve helpfully reproduced some of the critical pieces below (emphasis added):

By its very design, blockchain technology, specifically so-called “public blockchains”, are poorly suited for just about every purpose currently touted as a present or potential source of public benefit. From its inception, this technology has been a solution in search of a problem and has now latched onto concepts such as financial inclusion and data transparency to justify its existence, despite far better solutions already in use. After more than thirteen years of development, it has severe limitations and design flaws that preclude almost all applications that deal with public customer data and regulated financial transactions and are not an improvement on existing non-blockchain solutions.

The catastrophes and externalities related to blockchain technologies and crypto-asset investments are neither isolated nor are they growing pains of a nascent technology. They are the inevitable outcomes of a technology that is not built for purpose and will remain forever unsuitable as a foundation for large-scale economic activity.

Frankly, this whole letter bums me out. Over the years I’ve spent a decent amount of time on Twitter calling out cryptocurrency shills who spout technical nonsense while promoting outright confidence games. I took for granted that my technical colleagues would be a little more reasonable, particularly when speaking as technical experts to Congress and regulators. This is not simply someone “being wrong on the Internet.” These are important claims that deserve serious attention, and there are real consequences to being wrong here.

So while I appreciate the authors’ intentions, against my better judgement — and you’d better believe my better judgement is shouting at me for writing these words — I feel compelled to say something in defense of this technology area. “Public blockchain” technology enables many stupid things: today’s cryptocurrency schemes can be venal, corrupt, overpromised. But the core technology is absolutely not useless. In fact, I think there are some pretty exciting things happening in the field, even if most of them are further away from reality than their boosters would admit. Moreover, many of crypto’s technical problems are also amenable to some really exciting technical solutions, many of which are already here or on their way to deployment.

So instead of enjoying a beautiful week of Baltimore summer, I’m going to spend my time indoors, writing about what I think those things are — and (more or less incidentally) why I think these distinguished authors are wrong that “public blockchain technology” is a technical dead end. This post is not precisely a rebuttal to the letter above: instead, I’ve decided to phrase it as a general response to some of the more common spurious objections I hear people make to public blockchain systems. It just happens to be the case that a few of these (but not all) come up in the letter.

Finally, in the interest of full disclosure: I have designed privacy-preserving cryptocurrency systems (and still serve on a Foundation board of one), and I’m currently working on a startup that is trying to add regulatory compliance capabilities to public blockchains. (You can decide for yourself if this makes me a shill for Big Blockchain or Big Regulation. Frankly I’m not sure.)

Objection: “Cryptocurrency is terrible for the environment”

It’s probably best not to beat around the bush on this one: the most serious current objection to cryptocurrency (as it’s currently construed) is the massive environmental impact of proof-of-work (PoW) mining. I won’t devote a single second towards minimizing this concern. Many defenders have tried to paint the electricity consumption of Bitcoin and other PoW currencies as “green” or define it as a form of energy storage. This is dishonest nonsense: estimates hold that at least 60% of mining energy consumption still comes from fossil sources.

And that’s a lot of energy.

The totals depend on which smallish nation we’re using as our standard unit of energy consumption: this September 2021 NYT article has Bitcoin (all by itself) consuming nearly as much energy as Finland. Yet whereas Finland produces Nokia cellphones and lovable wooden clogs, Bitcoin just produces… bitcoin. Not to mention a pathetic transaction rate of about 3.5 tx/second across the entire globe, as of this week.

Overview of mining pool activity on the Bitcoin chain. Does that look decentralized to you?

Regardless of where you stand on cryptocurrency as a technology, you should understand that this wasteful resource consumption colors the public’s views of cryptocurrency in a highly negative way, and it is absolutely right for people to have these feelings because we are in a climate crisis and wasting this much goddamn energy on a single consensus protocol is pointless, harmful, and quite possibly evil — particularly when the result of all this energy consumption isn’t even a particularly decentralized network.

However: before you call for some kind of misguided cryptocurrency ban, you should understand that this is a temporary situation and not an intrinsic component of public blockchain technology.

Proof-of-work was chosen early in Bitcoin’s history because Bitcoin was designed to be operated by volunteers with personal computers. The concern in this setting was that a single user could mount a “Sybil attack” and pretend to be many different computers, thus dominating the construction of blocks on the network. Since verifiable real-world identities didn’t exist on the Internet, Nakamoto chose proof-of-work as an elegant solution: this approach makes your “vote” in the network organization proportional to the amount of computing power you possess. Since the typical early Bitcoin user only had one or a small number of computer CPUs to mine with, this kept the early Bitcoin network relatively decentralized.
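
For intuition, here’s a toy Python sketch of the underlying hash-puzzle idea: your chance of finding a valid nonce in any given second is proportional to how many hashes you can compute. (Real Bitcoin mining hashes a specific block header format with double SHA-256; the parameters below are made up for illustration.)

```python
import hashlib

def mine(block_header: bytes, difficulty_bits: int = 16) -> int:
    """Search for a nonce whose hash falls below a difficulty target."""
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce      # evidence that ~2^difficulty_bits hashes were tried
        nonce += 1

# mine(b"example header")  ->  some nonce, after ~65,000 attempts on average
```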

Unfortunately modern proof-of-work mining looks nothing at all like the early Bitcoin network: today’s mining is a capital-intensive industry that competes to build entire datacenters full of specialized ASIC-based mining hardware. This change has undone most of the early decentralization benefits, while pointlessly burning tons of coal and natural gas.

But all is not lost.

Proof-of-work is not the only technology we have on which to build consensus protocols. Today, many forward-looking networks are deploying proof-of-stake (PoS) for their consensus. In these systems, your “voting power” in the network is determined by your ownership stake in some valuable on-chain asset, such as a new or existing electronic token. Since cryptocurrency has coincidentally spent a lot of time distributing tokens, this means that new protocols can essentially “cut out the middleman” and simply use coin ownership directly as a proxy for voting power, rather than requiring operators to sell their coins to buy electricity and mining hardware. Proof-of-stake systems are not perfect: they still lead to some centralization of power, since in this paradigm the rich tend to get richer. However it’s hard to claim that the result will be worse than the semi-centralized mess that proof-of-work mining has turned into.

And proof-of-stake is no longer theory. It’s already been deployed in production within a number of successful projects, including Avalanche, Cardano, Algorand, and Tezos. The Ethereum project is in the process of rolling out a proof-of-stake upgrade they call “Ethereum 2”, and while the final plans still seem a little handwavy for this late date, there has at least been some real progress in launching parts of the system. Beyond proof-of-stake, there are other technologies in deployment, such as the proof-of-time-and-space construction used by Chia, or more centralized proof-of-authority systems.

Now, admittedly: none of this solves the problem that much cryptocurrency is dirty today.

The letter is actually surprisingly nuanced about this objection.

But the question we should be asking is not whether to be angry about the power consumption of proof-of-work mining. We should be trying to figure out the right path out of this mess. And more concretely, whether there’s a path forward which is more likely to produce a good outcome than what is already happening in the industry — namely, that projects are rapidly deploying cleaner technologies to replace proof-of-work. Because it’s very unlikely that shade or hypothetical cryptocurrency bans are going to fix the problem any faster, and in fact: government overreaction could make it much, much worse by driving resources away from the cleaner chains that are coming online to solve the problem.

Objection: “Public blockchains can never support banking features like transaction reversal.”

One of the biggest problems in the field of cryptocurrency is that too many technical experts stopped looking at the field around 2015. This means they’ve missed a lot of the more interesting developments that have occurred in the past couple of years.

To give an example of this phenomenon, let’s take one claim that occurs very near the top of my colleagues’ letter:

This claim is not technically accurate. And worse, it indicates that the authors have missed several years of business and technological development. Unfortunately, correcting mistakes like this requires diving pretty deep into the technical weeds.

Blockchains work by assembling a data structure called an append-only ledger. Much like a traditional pen-and-paper bank ledger (shown at right), this ledger represents a list of events, such as currency transactions. A common feature of blockchain tech is that the ledger is constructed using a kind of “adversarial collaboration” that runs between many different computers. The upshot of this process is that entries on the ledger are very difficult to tamper with.

In the earliest cryptocurrency systems (like Bitcoin), the ledger is used to record the ownership and transfer of made-up tokens, such as the bitcoin currency. Since there is no trusted party or “bank” that manages the accounts on these systems, Bitcoin’s transaction rules are very simple and work like cash. If I sent money to your account, only you (using a cryptographic private key) should be able to control where it goes next. This is exciting and also a bit scary: there is no “undo” feature in these cash-like tokens.

By contrast, the modern retail-facing credit card and banking industries work very differently. In those industries there exist trusted parties (your bank or credit card’s customer-service representative) who can and do “reverse” fraudulent or mistaken transactions under specific circumstances. (Whether they will do so is a very different question.)

While it’s technically accurate that blockchain ledgers cannot easily be overwritten, it’s critical to understand that the ledger really has nothing to do with transaction reversibility — any more than the specific writing instrument used by a historical bank (pen vs. pencil) determines whether that bank can reverse transactions. Transaction reversal is not about ledger technology: it’s about transaction rules and trust. It requires that there be someone you trust to make transactions in your accounts without your explicit permission.

In other words, transaction reversibility is not about the ledger, but about the transaction rules that a currency uses. A reversible currency requires that someone anoint a trusted party (or parties), and that those parties be able to freeze, burn, or transact currency in ways that may be at odds with the recorded owners’ intentions. And indeed, this is a capability that many tokens now possess, thanks to the development of sophisticated smart contract systems like Ethereum, which allow parties to design currencies with basically any set of transaction rules they want.
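
To make “transaction rules plus a trusted party” concrete, here’s a toy Python sketch (not the code of any real stablecoin contract) of a token whose rules anoint an administrator with freeze, mint and burn powers:

```python
class ManagedToken:
    """Toy token whose transaction rules include a trusted administrator."""

    def __init__(self, admin):
        self.admin = admin
        self.balances = {}      # account -> balance
        self.frozen = set()     # accounts the administrator has frozen

    def transfer(self, sender, recipient, amount):
        if sender in self.frozen or recipient in self.frozen:
            raise PermissionError("account is frozen")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

    # Admin-only rules: exactly the powers that cash-like tokens such as
    # early Bitcoin deliberately do not have.
    def freeze(self, caller, account):
        assert caller == self.admin
        self.frozen.add(account)

    def mint(self, caller, account, amount):
        assert caller == self.admin
        self.balances[account] = self.balances.get(account, 0) + amount

    def burn(self, caller, account, amount):
        assert caller == self.admin
        self.balances[account] = self.balances.get(account, 0) - amount
```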

And what’s fascinating is that none of this is hypothetical!

One of the most interesting developments of the past several years is the deployment of several government-regulated “stablecoins” that represent tokenized versions of real fiat currency (e.g., dollars) held in a bank account. More or less without exception, these regulated currencies (coins like USDC and BUSD, which are issued by licensed and government-regulated organizations, and which are not the same as unregulated algorithmic scam coins like UST) each possess a centralized party/committee that can “freeze”, mint, or “burn” money owned by any user in the system [BUSD code, USDC code]. A centralized manager of the currency can therefore “lock” the account of an illegitimate recipient and compensate the spender directly, or in some cases “burn” and mint new currency to send back to the originator.

Indeed, this capability is explicitly mandated by regulators:

It is reasonable to point out that, compared to mature banking systems, current stablecoins’ transaction reversal capabilities are quite rudimentary. From a business perspective there is no guarantee that a coin issuer will return your stolen money, nor that they can do so in the event that a thief has already passed your money on to a fence. (Just as there’s no guarantee that Zelle will do so.) Modern banks implement these features with fraud detection and a combination of delayed settlement, insurance, and trust. Some of these are technical capabilities, but many are largely business questions. In either case, none are “antithetical to the design of public blockchains.” If reversibility is actually important to you as a feature, then you should pay attention to how these new regulated systems develop.

Two regulated stablecoins with freeze capability: Binance USD (issued by Paxos) and USD Coin (issued by Circle/Centre). Chart shows market capitalization from near-$0 in 2020 to about $60bn today.

Objection: “Cryptocurrency doesn’t scale [or the fees are too damned high]”

The early Bitcoin protocol was designed to be many things, but fast and efficient was not one of them. The system’s famously low transaction rate is effectively the result of several tradeoffs in the design of the network’s consensus algorithm: where centralized payment systems like Visa or Mastercard can scale horizontally — by assigning different computers to handle various subsets of user transactions — in Bitcoin (and Ethereum, and most other extant platforms) every single participating node must validate every single transaction made by anyone in the system. This means that simply adding more computing power doesn’t produce a faster network.

The result is pretty dismal. Bitcoin’s transaction rate has historically topped out around 7 tx/sec worldwide (although recent upgrades may improve things slightly.) Ethereum pulls off maybe 20-30 tx/sec by sailing a bit closer to the wind. Meanwhile: networks like Visa handle about 1700 tx/sec on an ordinary day (!), and 10x that rate on major holidays.

This scaling problem makes cryptocurrency unworkable for just about any mainstream payments application. A single popular mobile game like Candy Crush probably conducts enough in-game transactions to challenge the Ethereum network.

via Etherscan.

And because the transaction rate in these so-called Layer 1 (L1) cryptocurrency networks is so low, competition for scarce network resources translates into high transaction fees. That’s why it recently cost $22 (!) to send a single token transaction on Ethereum, and much more to handle sophisticated transactions like DeFi swaps. These prices are fine if you’re a crypto speculator doing $1,000+ trades. But they rule out anything as mundane as paying people.

This sounds pretty bad. However, an important lesson I’ve learned in my career is this: if people are sufficiently motivated and the only barrier is an engineering problem, then it’s probably wise not to bet against them.

With some exceptions, the cryptocurrency community has acknowledged that existing techniques don’t scale, and it is engineering its way around the problem. This effort has gone in two different directions. Both are unproven, and how these developments resolve will determine exactly how well the field can grow in the future.

More money, more networks. The most obvious way to scale L1 cryptocurrencies is just to build more of them. This describes a lot of the action in 2019-2021: as networks like Ethereum saturate with transactions and become expensive, new entrants are deploying compatible (and sometimes incompatible) chains that provide more transaction capacity without the fees. These networks often look like Ethereum (in that they’re Nakamoto-consensus, proof-of-work mined systems), but sometimes they’re stripped down or use faster consensus technologies (examples include Avalanche, Polygon, Celo and Solana.) Some are just centralized clones of Ethereum.

This might stave off total chaos, but it probably isn’t sustainable. Even if all these networks work perfectly — and that’s a big assumption — the problem with adding more networks is that your ecosystem fragments. If funds are on the Ethereum chain and you want to use an application that lives on a different network, how do you get your funds over there? The solution today mainly involves “bridges” — semi-centralized parties that will accept your money on one L1 network and provide you with funds on another. If you want to move your funds back to the original network, that’s another pass through the bridge, with fees on both networks. It also means you have to trust the integrity of both the bridge itself and the destination network, which might itself be much less robust (and safe) than the original network you started from. It’s possible this approach will work well enough to enable most applications, but I wouldn’t bet on it.

Rollups. The second approach, heavily promoted by the Ethereum Foundation, is to improve the scaling of individual L1 networks like Ethereum by performing transactions on some second-layer, with the most common proposal being a “rollup server.”

Rollup servers are centralized machines that can validate many transactions quickly. One rollup server doesn’t handle every payment or smart contract on the chain. Instead, it operates over one or a small number of applications (say, a few specific smart contracts.) Users submit their transactions either to the chain itself, or directly to the server, which then validates the transactions and posts a short “proof” to the L1 chain asserting that a large collection of transactions has been checked and found to be valid. The idea here is to reduce the amount of computation and storage that the L1 nodes must perform to check these transactions: rather than validating 10,000 individual transactions, the L1 nodes simply verify one short assertion posted by the rollup server.

This sounds like magic, and it is, sort of. There are two approaches to building rollups, both of which are in “experimental” production today:

  • Optimistic rollups use a “trust and punish” approach. The rollup server posts a financial bond to assure the world that it will correctly validate all of its transactions. In the unlikely event that the rollup server “cheats” and authorizes an invalid transaction, any third-party whistleblower can submit a “fraud proof” demonstrating the rollup server’s failure. The L1 network can check these proofs, which will invalidate the bad transactions and pay the whistleblower a large reward.
  • ZK rollups use cryptographic technology drawn from the field of zero-knowledge protocols, such as SNARK or STARK proofs, so the server can “prove” that all transactions were validated correctly before it posts the summary results to the chain. In principle this means the L1 chain can verify a short “proof” that covers many thousands of complicated transactions, with (essentially) no possibility of cheating.

Rollups sound like a terrific idea, and the Ethereum community has bet heavily on the tech. But it’s worth pointing out that in practice nobody knows how well these systems will actually hold up once they are in widespread deployment.

One problem is that rollups today are largely focused on reducing the computational burden of verifying transactions. This is a big deal, particularly for Ethereum where verifying complex smart contract executions is quite costly. But even with today’s rollups, L1 nodes are still expected to store and transmit the raw transaction data: without keeping these transactions around, the loss of a rollup server could freeze an entire smart contract in place, blocking any further progress. This means scaling bottlenecks still exist — they’ll just be hit when nodes run low on bandwidth and storage, rather than compute.

That’s still a potentially huge improvement and many folks are optimistic: Vitalik Buterin calculates maximum possible scaling improvements on the order of 100x for rollup servers, though he quickly tempers this calculation by noting practical concerns.

In any case, the point of this section is not to claim that scaling is a solved problem. Rather, the point here is that scale improvements are on the minds of a lot of smart engineers, and there are solutions on the way. The results might not be perfect and presumably there will be much experimentation before we settle on a workable approach, but the end result will almost certainly work fine at some level of scalability. The main question is simply how robust and decentralized the result will be.

Objection: “There is no privacy on blockchains (or there is too much privacy)”

Public blockchains rely on volunteers to operate a network that verifies transactions. The implication of this design is that the transaction data itself must be publicly viewable. While a few naive people still believe that these currencies are anonymous, the truth is quite different: these older public chains expose your transactions to anyone who wants to see them. In those systems, your main protection is that transactions use a pseudonym (called an address) in place of your real identity.

A random transaction from a Bitcoin block explorer.

Some advocates once felt that pseudonymity was good enough for privacy, but this belief has receded a bit as sophisticated “chain analysis” companies have grown up and made progress towards identifying the real owners of various accounts. The lack of privacy on these systems poses a real challenge for those who would like to build financial infrastructure on public chains.

The good news is that researchers have made a lot of progress on solving privacy problems around cryptocurrency. We now have deployed privacy tech that allows users to conduct transactions on public blockchains without revealing any information they do not wish to reveal. These systems work by (effectively) encrypting transactions and using sophisticated zero-knowledge proofs to convince verifiers that the transaction data is consistent (i.e., that encrypted transaction amounts “add up” and do not create money out of thin air.) There now exist several deployed cryptocurrencies that provide strong privacy even against government surveillance, and law enforcement has expressed concerns about them.
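
To give a flavor of how encrypted amounts can “add up,” here’s a toy Python sketch using an additively homomorphic commitment. The parameters are toy values and are not secure; real systems use elliptic-curve Pedersen commitments together with zero-knowledge range proofs.

```python
import secrets

P = 2**127 - 1        # toy prime modulus (NOT a secure choice)
G, H = 3, 5           # toy "generators"

def commit(value, blinding):
    """Commit to a value without revealing it."""
    return (pow(G, value, P) * pow(H, blinding, P)) % P

# Two hidden inputs (4 and 6) and one hidden output (10) that balance.
r1, r2 = secrets.randbelow(P), secrets.randbelow(P)
c_in1, c_in2 = commit(4, r1), commit(6, r2)
c_out = commit(10, r1 + r2)

# Anyone can check that inputs and outputs balance, without seeing amounts:
assert (c_in1 * c_in2) % P == c_out
```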

Text from US Executive Order dated March 2022.

Still, criticizing an early technology for being both private and not-private-enough has a “nobody goes there, it’s too crowded” feel to it. What is true is that over many years the traditional financial system has learned to walk a tightrope, balancing customer privacy on one hand against anti-money-laundering regulations on the other (as required by laws like the Bank Secrecy Act in the US.) Achieving those goals in today’s financial infrastructure has not been without cost, however. Our privacy as individuals has been vastly degraded by technological developments, with little opportunity for democratic debate. And the result sucks: collecting all of our private data and securing it is expensive, an expense we all pay for in high fees and catastrophic breaches. The cryptographic privacy offered by cryptocurrency is exciting because it offers a different road forward: one that promises to keep irrelevant data in our own hands.

Even TornadoCash has regulatory compliance. Sort of.

The actual form that these mature systems take is unknown to me. Perhaps they’ll use zero-knowledge policies that keep smaller transactions private, while ensuring that larger transactions are known to regulators or other parties. I’m not excited to use those systems, because I think they’ll be risky. But at very least they’ll be better than our current collect-it-all-and-then-hand-it-to-hackers approach, which certainly has not done us very many favors.

So why do I care about any of this?

To put it simply: because payments are important. And because something is badly wrong.

Credit card merchant fees have risen since this ad was on TV.

Over the past four decades, computer networking has radically changed the economics of just about every industry that relies heavily on IT. Google made information access so easy that we can barely remember the world before it existed. My kids refuse to believe that I once paid $1/minute for long distance phone calls. It’s so inexpensive to start an online retailer that we now have more online pet food stores than we have drive-in movie theaters in the United States.

And yet if you looked at the money-transfer and payments industry, you’d see no such changes. Credit card merchant fees in the United States are similar to their levels in the 1990s, or have actually risen, and that is an absolute tragedy: these fees are baked into the cost of most retail goods and thus fall heavily on the working poor (who pay them even if they use cash.)

If all you care about is technology, consumer payment tech has improved at a glacial pace. It took nearly two decades to roll out anti-fraud improvements like EMV chip cards and tap-to-pay (NFC) in the US. Shopping online in 1995 meant typing credit card numbers into webforms. In 2022… it mostly means exactly the same thing. As a result, online payment fraud ballooned to around $200bn in 2020. And forget about real innovation like payment privacy or pay-by-phone (which is ubiquitous in Kenya and China but still in its infancy here in the US, and will likely only be “solved” by giving Apple and Google total control over payments.)

Why are these IT-focused industries so consistently immune to the same technological improvements and cost reductions we see everywhere else?

I’m only a computer scientist, so I’m going to let someone else answer that question. I can only tell you that what we have right now is not functioning properly: I suspect that legacy industry and regulators have smothered two generations of technological improvement, largely by building a (mostly) closed and permissioned financial system. And this is a big deal: payments are too important to our economy to entrust them to 1970s-era technology and an extractive industry. We don’t even know what novel applications — Googles, Facebooks, Wikipedias, Instagrams — we’re missing out on because the industry simply won’t allow them to exist.

I don’t know if blockchains are the solution to this problem. But I see indications that the technology is finally starting to grow up, in ways that seem like a harbinger of major positive changes on the horizon. Progress here is slow, though in some cases that’s because the regulatory apparatus is throwing sand in the gears of cooperative products, and/or utterly failing to move expeditiously to uncover possible fraud. And maybe the result won’t even be a success for blockchain solutions: perhaps we’ll simply get more and better offerings from the “traditional finance” industry as it starts to wake up to the fact that more open systems can compete with its closed offerings.

So while I don’t know if cryptocurrency will be the answer, I’m just hopeful that something will be.

Title image by Flickr user Joegoauk Goa, used under CC license.

An extremely casual code review of MetaMask’s crypto

An extremely casual code review of MetaMask’s crypto

NB: This post describes a very casual code review of a few cryptography functions used by MetaMask. It does not describe any vulnerabilities. If you’re the kind of person who likes a meandering and amateurish code review that goes absolutely nowhere, you’ll enjoy this post. Otherwise you might want to read something more exciting: I suggest Moxie’s recent post on web3.

For reasons I can’t fully explain, the other day I decided that it might be fun to spend an hour or two investigating the cryptography used by MetaMask.

For those who aren’t familiar with the tool, MetaMask is a browser-based cryptocurrency wallet that is used to access decentralized applications (dapps) on networks like Ethereum. My interest in MetaMask is mostly just based on curiosity: I recently invested about $100 into a decentralized finance application and I wanted to see how safe it really was. While there are lots of different ways I could lose that money, taking a look at MetaMask’s crypto seemed like as good a place to start as any.

I want to stress that this was an informal code review: I didn’t use any tooling, didn’t even download (most) code to my computer. In fact my “review” mostly involved poking around various Github repositories to see if I could find anything that immediately jumped out as incorrect, and failing that, at least could give me a feeling for the quality of MetaMask’s crypto code. (In fact I did about half the work on my phone while eating a burrito bowl at Chipotle.)

After an hour or two of hunting through dependencies, I made the mistake of tweeting about my feelings:

I swear that this offhand Tweet was not meant to make anyone unhappy or scare people that their funds were at risk. Although I had a couple of scary moments, those were entirely on me. So let me be clear as possible right now: I did not find any exploitable vulnerabilities in MetaMask’s crypto!

What I did come back with is an uncomfortable feeling about the complexity and quality of MetaMask’s (current) crypto code, and some unhappy feelings about its dependency structure. Some of this stuff is basically unavoidable: file it under “browser crypto is scary.” Some is specific to the way the code is laid out. And some of it is a (non-trivial) gripe: this code is much harder to audit than it should be.

And this last part isn’t just a personal gripe. My TL;DR is that finding and reviewing the correct code was made difficult because there were far too many different organizations owning the dependencies that led to the crypto routines themselves. This made me uncomfortable, given how much money these routines are responsible for securing.

In this post I’m going to justify these opinions by walking you through a casual skim of MetaMask’s code and crypto dependencies. To make you feel “like you were there”, I’m going to discuss the review more or less in the order it occurred — including the embarrassing part where I went off the rails and reviewed the wrong code.

If you enjoy reading crypto code for entertainment, this post might be for you. If you’re hoping that this will actually lead to anything exciting, I suspect you’ll be very disappointed.

Quick explainer: what is MetaMask?

If you already know what MetaMask does, skip this part.

Since many readers have probably never used DeFi, I figure I should give a quick background here on “web3” and MetaMask in particular.

From a technical perspective, web3 is pretty straightforward. It consists of many “decentralized apps” (dapps), each of which comprises a (typically Javascript) front-end web app, as well as some back-end business logic. What makes dapps special is the back-end portion, which is decentralized. Generally this means it relies on a smart contract running in a network like Ethereum.

Web3 front-ends are just web apps, and typically they’re served to your browser via standard web infrastructure. (Security here usually means HTTPS, so anyone who hacks a server or steals a Cloudflare API key can change these apps’ code in malicious ways.)

Of course, web apps can’t communicate directly to blockchains, nor are they a good place to store private keys. This is why MetaMask exists. In its most popular instantiation, MetaMask ships as a browser extension for Chrome and Firefox. When a web application needs to send a transaction to a smart contract (e.g., because you want to deposit money into Aave), MetaMask is responsible for signing the transaction and shipping the transaction to the chain.

On the bright side for this review, the actual cryptography in MetaMask is fairly limited: as a wallet, it must generate and store Ethereum public and private keys and it also needs to handle “simple” operations like ECDSA signing. On the other hand: we’re in a browser. So nothing is actually simple.

Climbing the dependency ladder

To put some bounds on the effort required, I decided to only look at the implementation of ECDSA signing and key generation (excluding the curve operations.) This seemed like an easy task I could get done in an hour or two while eating an extended lunch.

Finding the starting point for a review like this was harder than I expected. In short: it isn’t easy to grok the control flow of a complex event-based Javascript browser extension to find out exactly which calls are “real” and which are tests or dead code. To shortcut this problem, my approach amounts to “grepping and hoping”, starting from the “develop” branch of MetaMask’s extension repository.

A quick search for “sign” leads us to a promising call in the file metamask-controller.js:

Sadly, following this call path quickly leads us into dependency hell.

First, we reach a library called eth-keyring-controller, which presumably manages Ethereum keyrings. A quick scan of that library shows it calling a second dependency: eth-sig-util. (Both are NPM packages developed by MetaMask.) We’ll jump right to the latter package, which… you guessed it, calls yet another package:

This call takes us out of MetaMask-owned code and out to a new package called “ethereum-jsutil” that’s maintained by the “Ethereum Javascript community“, whoever they are. (Don’t worry, we won’t stay here long enough to care.) Of course, in this package we find yet another layer of dependency indirection:

From here, we head over to a package called “js-ethereum-cryptography“, which holy cow is actually a repository maintained by the Ethereum project itself!

At this point I’m about halfway into my allotted review time and asking myself questions like “why didn’t MetaMask just call this library directly” and “why do I make poor life choices”. But never mind, we’re almost there: surely we are going to see some actual crypto code soon!

Except no, we are not. This is what we find in the Ethereum library:

Yet another dependency: this time to something called “noble“. Here my quick-and-dirty approach to dependency resolution (Google “npm <package-name>”) fails me, since NPM says that “noble” is a “Node.js Bluetooth Low Energy library.” As cool as that would be — access the blockchain through Bluetooth! — I’m guessing this is not right.

A bit more searching leads me to believe that noble is actually the Noble cryptography library, which appears to have been developed by Paul Miller. And hey, this code looks promising! The page lists actual cryptography design goals that seem reasonable, and the code is written in TypeScript. Even better: the library has been subject to an audit by Cure53.

Nonetheless, and with no disrespect to Paul or anyone else in this community: I would like to take a moment to complain about this dependency structure:

  1. Resolving all these pointless dependencies has eaten up a lot more of my review time than you might imagine. (I’m leaving out all the times I accidentally visited the wrong libraries because they used some combination of “js” and “ethereum” and “cryptography” and just Googling is risky here.)
  2. More substantively, I can’t help but notice that there are a lot of code owners in this critical path. So far I’ve traveled through repositories owned by four different organizations, and the last one is someone’s personal Github account. Is this normal for a system that secures billions of dollars of user funds? Maybe it should not be.

But enough complaining. There is actual crypto code here! We can finally look at ECDSA.

Except… it turns out that we can’t do that, because I made a stupid mistake.

After speaking with Paul Miller on Twitter, I learned that this whole code review has been premised on a very foolish assumption: namely that the code in the main (development) branches of MetaMask and its dependencies is actually the code people are using in MetaMask today. That turns out to be wrong. In fact, Paul tells me, the noble library is slated for deployment in an upcoming release. The current release of MetaMask relies on a library called “elliptic“, which was written by Fedor Indutny.

I’m now scraping up those last bits of cheese in the burrito bowl.

Let’s look at elliptic

Ok, so forget everything I did above. That was my mistake. I promise to come back to those newer code paths soon, but that’s for the future. My goal in this review was to look at MetaMask as it exists today. Apparently that means I need to look at Fedor’s elliptic library.

(Note: if I was being professional then what I would really do is review all the release branches of MetaMask and all of its dependencies just to make sure I’m in the right place, and to see what the calls looked like. But life is too short: I’m going to trust Paul.)

The elliptic library is written in “plain old” Javascript, so types will tend to be confusing. On the bright side, it’s relatively compact. The core library supports Ed25519 (for EdDSA) and Secp256k1 for ECDSA on networks like Ethereum. Let’s focus on the latter.

I’m not a Javascript developer so certain things aggravate me about reviewing this code. One is that developers often put critical routines in stupidly-named files like “index.js”, which is where we finally reach the core signing implementation for ECDSA.

Signing

ECDSA is a stupid algorithm, but fortunately it’s not a very complicated one. Leaving aside the curve operations, there are basically two places where ECDSA implementations tend to go wrong. The first is in key generation. The other is in signing.

In ECDSA signing the main security risk is in how nonces are generated: it’s critical that the ECDSA nonce (“k”) is sampled uniformly from the range 1 … n-1, where n is the group order. Critically, the same nonce must never be used with two distinct messages (really, message hashes.) If this ever happens, it’s possible to recover a private key from two signatures, something that’s generally frowned upon.
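
To see why this is so thoroughly frowned upon, here’s the textbook recovery calculation sketched in Python (the variable names are generic, not taken from any particular wallet’s code):

```python
def recover_private_key(r, s1, h1, s2, h2, n):
    """Recover an ECDSA private key from two signatures that reused a nonce.

    Both signatures share the same r (because they share the same nonce k):
        s1 = (h1 + r*d) / k  mod n
        s2 = (h2 + r*d) / k  mod n
    """
    k = ((h1 - h2) * pow(s1 - s2, -1, n)) % n   # k = (h1 - h2) / (s1 - s2)
    d = ((s1 * k - h1) * pow(r, -1, n)) % n     # d = (s1*k - h1) / r
    return d
```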

Let’s look at how signing is handled in this library:

So looking at the code above, how do we get “k“?

A first observation is that there is no actual randomness here, no call to a random number generator. Instead, the signing routine instantiates a deterministic random bit generator (DRBG) based on HMAC. That algorithm stretches a shorter “seed” into a longer sequence of pseudorandom bits that can then be used to obtain our random nonce.

I initially had a small heart attack when I (briefly!) misread the code and thought that the only seed for the DRBG was the ECDSA private key: this would definitely lead to nonce re-use. On closer inspection, the DRBG is seeded with two fields: entropy is set to the ECDSA private key, and nonce is set to the (hashed) message to be signed. Within the underlying DRBG implementation both values get hashed into a value called seed, which actually is the seed for the DRBG. This design should mean that each (private key, message) pair will get different random bits, and thus different nonces “k”.

While I don’t love a purely deterministic generation of “k” (and I’ll explain why below), this isn’t some roll-your-own idea: in fact it appears very similar to the approach recommended by RFC 6979. And going one step farther, it appears that elliptic is actually directly implementing an algorithm from that RFC, specifically the one in section 3.3, “Alternate Description of the Generation of k.” Annoyingly, I did not learn this fact from the code itself, which would have saved me a lot of time. I realized it only while reading through the RFC for unrelated reasons (in fairness, elliptic’s README does mention it.)
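
For reference, here’s a minimal Python sketch of that HMAC-DRBG-style derivation for the secp256k1/SHA-256 case (where the hash length equals the order length). It’s an illustration of the RFC 6979 algorithm, not the elliptic library’s actual code path:

```python
import hashlib, hmac

Q = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 order

def _int2octets(x):
    return x.to_bytes(32, "big")

def _bits2octets(b):
    return _int2octets(int.from_bytes(b, "big") % Q)

def rfc6979_nonce(priv_key: int, msg_hash: bytes) -> int:
    """Derive a deterministic ECDSA nonce k from (private key, message hash)."""
    V, K = b"\x01" * 32, b"\x00" * 32
    seed = _int2octets(priv_key) + _bits2octets(msg_hash)
    K = hmac.new(K, V + b"\x00" + seed, hashlib.sha256).digest()
    V = hmac.new(K, V, hashlib.sha256).digest()
    K = hmac.new(K, V + b"\x01" + seed, hashlib.sha256).digest()
    V = hmac.new(K, V, hashlib.sha256).digest()
    while True:
        V = hmac.new(K, V, hashlib.sha256).digest()
        k = int.from_bytes(V, "big")
        if 1 <= k < Q:
            return k
        # Out-of-range candidate: update the state and try again.
        K = hmac.new(K, V + b"\x00", hashlib.sha256).digest()
        V = hmac.new(K, V, hashlib.sha256).digest()
```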

(A brief aside to developers: if you implement a standard algorithm, please please add a reference in comments at the point where you’re using it, and comment each step so reviewers can see exactly how it maps to the pseudocode. For a good example of how to do this, take a look at how noble implemented the same standard.)

So are there any issues with this implementation?

A particularly weird feature is that the caller of the sign() function can pass in a data structure called options. One element of this structure appears to be a function, which is named (not very descriptively) options.k. If included in the options structure, this function appears to override the built-in nonce generation and replaces it with logic the caller provides. This is weird. While I can see an argument for allowing callers to provide extra entropy, I can’t see a good reason why a caller should be allowed to entirely override nonce generation within the signing routine.

Overall, this seems like a giant footgun and also a good opportunity for some caller to slip in malicious code that could easily slip past a review. (Indeed, thanks to all this dependency confusion I’m not even sure which of MetaMask’s dependencies is calling this routine.)

As a weird added bonus, the same options struct can also be passed in the enc argument as well; if this happens, the structure will be copied over into options. I’m sure there are perfectly good reasons for this.

A few smaller notes:

  • The function _truncateToN() seems poorly defined and serves two slightly different purposes, triggered by a boolean flag — which is just omitted in some calls.
  • In the first mode it simply truncates the message to ensure it is the same length as the group order n. This mode is only used to truncate nonces, which is not something that should ever need to happen — except when you’re using the crazy options.k function discussed above.
  • When called with “truncOnly = false“, however, _truncateToN() will also ensure that the truncated result is less than or equal to n, and if not it will subtract n. This appears to implement the bits2octets subroutine of RFC 6979, which is fine I guess but took too long to figure out. Why not just write two routines and name them appropriately?

Neither of these two issues are critical, but they made the code harder to read.

As a final note: I would urge developers to avoid these purely deterministic ECDSA implementations, mostly because they make signing implementations very fragile. Since all of the “entropy” in this signing routine comes from the message and private key, this means that any non-determinism in the actual ECDSA implementation (e.g., caused by weird option flags or glitches in BigNum encoding) can potentially produce exploitable signatures if the same message is ever signed twice. (As Nadia points out: this could also happen if you install the same private key into two different wallet implementations and sign the same message.)

In either case: adding some genuine entropy to the nonce could remove these concerns in most cases. Section 3.6 of RFC 6979 describes how to do this by adding an additional random input.

Key generation

Update (1/14): the comments below discuss key generation in elliptic, but MetaMask uses a BIP39-based keyphrase, which means that most ECDSA keys are actually generated elsewhere (allegedly in bitcoinjs/bip39). In retrospect this should have been obvious to me, but thanks to @kumavis_ for pointing it out. I’m keeping the section below just to be complete, but this code (probably) isn’t used by MetaMask.

The last stop on this very brief tour is the ECDSA key generation algorithm. This can be found slightly higher up in the same file:

Once again we have an appearance from our old friend HMAC-DRBG. As with the signing routine, the caller can supply an options structure. Fortunately this one only contains optional entropy, and does not allow the caller to completely override key generation. Even better: the input to HMAC-DRBG appears to include actual randomness, drawn from a call to a function called rand().

Does rand() produce actual random numbers in all browsers? I think so, but really could not tell you for certain. This routine is implemented by a package called indutny/brorand, the entirety of which I’m pasting below, just because it made me burst into maniacal laughter:

Let’s just assume the entropy is good. The remainder of the (private) key generation function takes place in these lines:

This generates a random number priv between [0…2^{256} – 1] and then makes sure that priv is not greater than the group order n, in which case it re-generates priv. It then adds 1, which I think gives us a priv in the range [1…n+1], which seems wrong, since it should be in the range [1…n-1]. Am I doing my arithmetic backwards?

A basic point to make here is that much of this review has come down to examining code that solves a single problem: (deterministically) sampling integers within a precise range. So far I’ve reviewed at least two different custom implementations of the same process, both with slightly different results. Why is the same code repeated so often? Just make this a subroutine so we can analyze it and be sure it’s working correctly.
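
For what it’s worth, here’s roughly what such a shared subroutine could look like, sketched in Python rather than the library’s Javascript (plain rejection sampling, with no off-by-one adjustments):

```python
import secrets

N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 order

def random_scalar(order: int = N) -> int:
    """Sample an integer uniformly from [1, order-1] by rejection sampling."""
    nbytes = (order.bit_length() + 7) // 8
    while True:
        candidate = int.from_bytes(secrets.token_bytes(nbytes), "big")
        if 1 <= candidate <= order - 1:
            return candidate   # accept; otherwise loop and resample
```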

Anyway the (possible) issues above probably don’t really matter. I’ll assume in most cases private keys are generated correctly using a secure random number generator, which takes the most obvious risks off the table.

Conclusions

Ok, I am basically just exhausted now. This was a lot of work to evaluate a tiny piece of a crypto library that, frankly, might not even be the actual crypto library that MetaMask is using. I’m not sure if any of this was worth it, and I’m starting to get indigestion from the Chipotle.

If I could summarize my overall findings above, they would look something like this:

  1. I do not like reviewing Javascript for many reasons, not least of which is the lack of types. Fortunately many MetaMask dependencies are written in TypeScript. This library should be too. (Fortunately Noble appears to have made this upgrade.)
  2. The stupid dependency chain between MetaMask and its crypto libraries makes reviews more difficult than they need to be, and adds too many parties that need to be trusted. This should be greatly simplified, unless there is a reason for it that I don’t understand.
  3. The crypto code may be well-written in a cryptographic sense, but it was not really optimized for humans to review. This makes any review much harder, and I think anything that makes reviews harder is bad for users.
  4. Allowing callers to specify dangerous optional arguments to crypto routines seems bad.
  5. If you’re writing a security-critical routine like “sample random integer in a precise range,” write that routine one time, not multiple times. Please!
  • Too much dependency on random numbers can make ECDSA dangerous. I would argue that too much determinism can be just as risky. It might be good to find a solution that mixes the two, as discussed further above.

If it was up to me I would re-write the entire codebase in TypeScript and would try to use more standard libraries. I would remove layers of dependencies. I would tighten up the crypto APIs to make sure malicious calls are harder to get away with. Finally, I would make sure every single major block in this code had crystal-clear comments explaining precisely what it is doing.

As a last point, I would move all of this code back under the control of some more centralized organization(s), rather than leaving essential code in random personal repositories.

In summary: I don’t think this tiny review was entirely a waste of time. Although I don’t love everything about the way this code works, I’m now 15% more confident that MetaMask isn’t doing something utterly stupid with my cryptographic keys. That seems like a genuine win.

Next post: I’m going to review the noble-cryptography libraries to see what the new MetaMask code is planning to do.

Thinking about “traceability”


A few weeks back, the messaging service WhatsApp sued the Indian government over new legislation that could undermine its end-to-end encryption (E2EE) software. The legislation requires, among other things, that social media and messaging companies must include the ability to “trace” the source of harmful viral content.

This tracing capability has been a major issue in India due to several cases of misinformation content that led to brutal mob attacks. The ostensible goal of the new legislation is to make it possible for police to track down those who originate or disseminate this content. Put simply, what the authorities say they want is a means to identify a piece of content (for example, a video or a meme) that has gone to a large group of people, and then trace the content back to the WhatsApp account that originally sent it.

I don’t plan to weigh in on whether this policy is a good idea or viable on the merits, nor is it in my wheelhouse to say whether the Indian government is being forthright in their reasons for demanding this capability. (I will express a very grave degree of skepticism that this approach will catch any criminal who is remotely careful and takes steps to cover their tracks.) In this post I mostly want to talk about the technology implications for encrypted messaging services, and what tracing features might mean for end-to-end encrypted systems like WhatsApp.

Why is content tracing hard in the first place?

A first thing to keep in mind is that content tracing is not really a natural feature for a messaging system. That is, tracing content poses a challenge even for completely unencrypted messaging apps: the kind where service providers can see all data traveling through the system. Tracing back to a content originator requires that the provider be able to identify a file received at some end-user’s account, and then “chase” the content backwards in time through each user account that forwarded it. Even this description doesn’t quite capture the difficulty of the problem: not every user literally hits a “forward” button to send content onward. Many will save and re-upload a file, which breaks the forwarding chain and can even produce a slightly different file — thanks to the magic of digital compression.

These problems aren’t intractable. In a system with no end-to-end encryption, they could perhaps be solved using perceptual hash functions that identify similar media files by constructing a “fingerprint” of each file that can easily be compared against other files, and can survive re-encoding or minor edits. (This is the same technology that’s used in detecting child sexual abuse imagery, something I wrote about here.) What’s important is that even with this technology, the obvious approach to content tracing requires the provider to have plaintext access to (at least the hashes of) user content.
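To make the matching step concrete: once a perceptual hash has been computed for each file, deciding whether two files contain “the same” content usually reduces to comparing short fingerprints under a distance threshold. The sketch below is a toy illustration only; the fingerprint values and the threshold are invented, and real systems (PhotoDNA, PDQ and friends) use their own hashing algorithms that I’m not reproducing here:

```typescript
// Toy sketch: comparing two 64-bit perceptual fingerprints by Hamming
// distance. The hash values and threshold are made up for illustration;
// only the comparison logic is the point.
function hammingDistance(a: bigint, b: bigint): number {
  let x = a ^ b;
  let bits = 0;
  while (x > 0n) {
    bits += Number(x & 1n);
    x >>= 1n;
  }
  return bits;
}

const MATCH_THRESHOLD = 10; // bits of difference tolerated (invented)

const originalHash = 0xd1c4a2b89e03f571n; // fingerprint of the original file
const reuploadHash = 0xd1c4a2b89e03f573n; // same meme after re-compression

console.log(
  hammingDistance(originalHash, reuploadHash) <= MATCH_THRESHOLD
    ? "likely the same content"
    : "different content"
);
```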

This turns out to be a big problem for encrypted communication systems like WhatsApp, in which end-to-end encryption protects the confidentiality of content even from the gaze of the service provider. In WhatsApp, all messages (as well as file attachments) are encrypted directly from sender to recipient, using an encryption key that WhatsApp doesn’t possess. With a few engineering caveats,* tracing content in these systems is very difficult.

But difficult is not the same thing as impossible. A recent post by WhatsApp makes the case that tracing is fundamentally impossible to implement securely in an end-to-end encrypted system. While this claim seems intuitively correct, it’s also kind of unsatisfying. After all, “impossible” is a strong word, and it’s highly dependent on which assumptions you’re making. The problem with imprecise claims is that they invite arguments — and indeed WhatsApp’s claim has already been disputed by some in the field.

In this post I’m going to consider the following simple question: is traceability in end-to-end encrypted systems actually possible? And if so, what are the costs to privacy and security? For the record: I’m writing this post as much to answer the question for myself as to answer it for anyone else, so don’t expect this to be short or simple. I’m working things out as I go along.

Let’s start with the most basic question.

What is “traceability” in an end-to-end encrypted system?

The biggest problem with the debate over content tracing is that nobody seems to have a terribly rigorous definition of what, precisely, an end-to-end encrypted tracing scheme is supposed to do — or, just as important, what its limits should be. To understand whether these systems can be built, we need to think hard about what they’re supposed to do in the first place.

From the perspective of law enforcement, content tracing is a simple feature. A piece of “viral” content (say an image file) has been widely distributed through a messaging platform, and authorities have decided that this content is harmful. Since the content is widespread, we can assume that police have received a copy of the file at an account they control (e.g., their own accounts, or the account of a cooperating user.) The authorities wish to contact the service provider and ask them for the originator and/or major spreaders of the content. This gives us our first requirement:

  • Traceability: given a piece of “viral” content received by a device/account (plus any cryptographic keys at the device), a tracing scheme should be able to reliably trace the content back to the originator (or at least, earlier) accounts.

From the user’s side, E2EE systems are supposed to maintain the confidentiality of user communications. Confidentiality is a broad term and can mean a lot of things. In this case it has two specific flavors that are relevant, with names that I just made up now:

I wanted to illustrate this post with memes about the Swedish monarchy. Unfortunately, it turns out that Swedish Monarchy memes basically suck.
  • The confidentiality of the content itself: this ensures that forwarded files are known only to the sender and authorized recipient(s) of a conversation. Notice that for viral content, this property isn’t terribly important. Remember: our assumption is that the content itself has ultimately been forwarded widely, until (nearly) everyone has received a copy.
  • The confidentiality of “who sent what”: while the content itself may not be secret, the fact that a given user transmitted a piece of content is still quite sensitive. If I send you a political meme — perhaps the one at right, poking fun at the King of Sweden — then I might not care very much about the secrecy of the meme itself. But I sure might want to hide the fact that I sent it, to avoid retribution by a totalitarian Swedish government.** Proper end-to-end encryption is supposed to protect this sort of expression.

In short: traceability can really screw with the “who sent what” side of content confidentiality. It is a fairly harmless thing for, say, the tyrannical Swedish government to learn that specific memes about the King of Sweden exist. It is very different for them to know that I’ve been sending a lot of them to a specific group of friends.

Finally I need to clarify one more thing, since discussions with colleagues have made me realize that it is not obvious. Information revealed about “who sent what” in an E2E system is not the same as metadata. I feel stupid having to point this out, but metadata (information about data that we can’t easily hide from providers, such as the list of contacts you’ve communicated with) is a very different thing. WhatsApp might inevitably learn that I texted 500 people last month because they delivered my (encrypted) messages. They still shouldn’t learn that any of my messages are making fun of the Swedish monarchy.

So can traceability be accomplished without breaking E2E?

It really depends what you mean by “traceability” and what you mean by “breaking.”

While confidentiality and traceability may seem like they’re in conflict, it’s important to point out that some forms of tracing can be implemented in a non-coercive way that does not inherently violate confidentiality. For example, imagine Alice originates a meme, and this meme subsequently makes its way to police officer Eve via the following forwarding path: Alice → Bob → Charlie → Dave → Eve.

Provided that Bob, Charlie and Dave are willing to cooperate with the police, then Eve can use shoe-leather detective work to trace the content backwards towards Alice. After all: each participant (1) is an authorized recipient of the data and (2) knows who they received the content from. Nobody is “breaking” E2E if they perform this sort of cooperative tracing: it’s just people sharing information they’re already entitled to have.

It’s now time to say a stupid and obvious thing: what’s being proposed in India is not cooperative tracing.

Let’s be clear: if detective work and cooperation was sufficient to trace the originators of harmful content, the police wouldn’t be asking for new encryption laws, and WhatsApp wouldn’t be suing the Indian government.

By passing these laws, police are tacitly admitting that voluntary content tracing is not sufficient to meet their investigative needs. This implies that when police try to follow a chain like the one shown above, they’re running into people who are either (1) unwilling to share this information in a timely way, or (2) simply don’t have the information anymore — maybe because they deleted the messages or lost their phone.

Let’s draw a picture of this situation. Below, each node represents a WhatsApp account, with the red node being the originator of some viral content, and the blue node representing police. Green nodes represent users who are willing to cooperate with the police, provided they are contacted. Here the gray nodes are users who won’t cooperate — either because they didn’t keep the information, or maybe because they just don’t like police.

Police (blue) can try to cooperatively trace content backwards through this forwarding graph by talking to cooperative users (green), but they’ll never reach the originator (red) because there are too many non-cooperating nodes (gray) in the way.

The prevalence of uncooperative nodes in the above graph makes it virtually impossible for cooperative tracing to find the originator. It seems obvious that real-world situations like this will make voluntary tracing very difficult to achieve.

This brings us to the central challenge of all content tracing proposals so far: to make tracing possible, a tracing system needs to turn every WhatsApp user (including the originator) into a cooperative green circle — regardless of whether users actually want to cooperate with police. Moreover, to prevent users from losing their phones and/or going offline, the system will need to force users to place the necessary tracing information into escrow as soon as they receive content, so it will remain available even if users leave the network or lose their phones.

Not only that, but each of these newly “cooperative” users might even be forced to admit to police that they also forwarded the content. Don’t want to tell the Swedish government that you made fun of their beloved King? Then you’d better not use a system that follows this pattern of enforced traceability.

How do we force users to cooperate?

The major questions facing an end-to-end tracing system are twofold: (1) precisely how much information are recipients going to be forced to reveal against their will, and (2) who will be able to access this information?

There are at least two proposals that I’ve seen for adding traceability to E2EE communications schemes, and both start from similar assumptions. They both rely on making changes to the end-users’ client software to ensure that tracing information is stored in “escrow” at the provider every time content is sent from one user to another.

One proposal is academic, and it takes something like the following “strawman” approach:

  1. Each time someone sends content to another user, they will generate some fresh encryption key K.
  2. They will use this key to encrypt a record that contains (A) the content (or a hash of it) and (B) the sender and receiver identities. They will store the encrypted record on WhatsApp’s servers as a kind of “key escrow.” Critically, at this point they will not send WhatsApp the key K.
  3. The sender will transmit the record encryption key K to its recipient, using end-to-end encryption.
  4. When the next user forwards the same content on to another user, they will repeat steps (1)-(3), and they will also send along all the keys generated by previous users.

Now if the police receive a copy of some viral content on an account they control, they will have a list of encryption keys that correspond to everyone in the forwarding chain for that content. They can just go back to WhatsApp with a subpoena, obtain the encrypted records, and use the chain of keys to decrypt them. This will reveal the entire forwarding path back to the originator.

Alice sends a message to Bob and “escrows” some encrypted information on WhatsApp’s servers. She sends the encryption key to Bob. Bob forwards to Eve and “escrows” similar information, sending the key to Eve. Eve (a police officer) can now use her key to decrypt the records stored at WhatsApp until she learns who Alice is.

Of course sending thousands of keys along with each forwarded message is kind of a drag, so there are some efficiency optimizations one can use to compress this information. For example, each time a user forwards a message they can store the previous user’s encryption key inside the encrypted record they escrow with WhatsApp. That means if police get one key — corresponding to the last record in a chain — they can decrypt the escrow record, and then they will obtain the key for the previous record in the chain. They can repeat this process until the entire forwarding chain is “unzipped”.
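To pin down what this strawman actually requires from clients and from WhatsApp, here is a minimal sketch of the per-hop escrow record and the “unzipping” traceback, with the key-chaining optimization folded in. To be clear: this is my own illustration, not the Tyagi, Miers and Ristenpart construction. The record format, the in-memory “provider,” and every name below are invented, and the end-to-end delivery of the key to the recipient is abstracted away:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "crypto";

// Hypothetical sketch of the strawman escrow scheme described above,
// including the key-chaining optimization. Nothing here reflects a real
// deployment; it only illustrates the information flow.

interface EscrowRecord { ciphertext: Buffer; iv: Buffer; tag: Buffer; }
const providerEscrow = new Map<string, EscrowRecord>(); // stands in for WhatsApp's servers

// Run by the sending client on every forward. Returns the (recordId, key)
// pair that would travel to the recipient inside the E2E-encrypted message.
function escrowHop(
  sender: string, receiver: string, contentHash: string,
  prev: { recordId: string; key: Buffer } | null
): { recordId: string; key: Buffer } {
  const key = randomBytes(32); // fresh key K for this hop
  const record = JSON.stringify({
    contentHash, sender, receiver,
    // The previous hop's record ID and key go inside this record, so a
    // single key "unzips" the whole chain back to the originator.
    prevRecordId: prev ? prev.recordId : null,
    prevKey: prev ? prev.key.toString("hex") : null,
  });
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(record, "utf8"), cipher.final()]);
  const recordId = randomBytes(8).toString("hex");
  providerEscrow.set(recordId, { ciphertext, iv, tag: cipher.getAuthTag() });
  return { recordId, key }; // K is sent end-to-end; it is never uploaded
}

// Traceback: starting from the last hop's (recordId, key), which police can
// learn from any account that received the content, decrypt escrowed records
// one by one until the chain bottoms out at the originator.
function traceBack(recordId: string, key: Buffer): void {
  let cur: { recordId: string; key: Buffer } | null = { recordId, key };
  while (cur) {
    const rec = providerEscrow.get(cur.recordId);
    if (!rec) break;
    const d = createDecipheriv("aes-256-gcm", cur.key, rec.iv);
    d.setAuthTag(rec.tag);
    const p = JSON.parse(
      Buffer.concat([d.update(rec.ciphertext), d.final()]).toString("utf8")
    );
    console.log(`${p.sender} -> ${p.receiver}`);
    cur = p.prevRecordId
      ? { recordId: p.prevRecordId, key: Buffer.from(p.prevKey, "hex") }
      : null;
  }
}

// Alice -> Bob -> Eve (police). Eve's copy of the last key, plus a subpoena
// for the escrowed records, reveals the entire forwarding path.
const hop1 = escrowHop("alice", "bob", "hash(meme)", null);
const hop2 = escrowHop("bob", "eve", "hash(meme)", hop1);
traceBack(hop2.recordId, hop2.key);
```

Notice what the sketch makes plain: the scheme works only if every client faithfully uploads its escrow record, and anyone holding the last key learns the identity of every forwarder in the chain, not just the originator.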

A lazy diagram (at right) shows how this process might work with three participants. Essentially the whole thing is a form of key escrow, with WhatsApp acting as the escrow authority. If the police get included in any chain at all, they (Eve in this diagram) can subpoena WhatsApp to trace the chain back to the originator.

Of course, this is a very simple strawman explanation of the ideas: for a more fully-specified (academic) proposal, you can see this paper by Tyagi, Miers and Ristenpart. Not only does it support path traceback, but it also lets you figure out who else the message was forwarded to! The cryptography is a bit more optimized, but the security guarantees are roughly the same.

A second proposal by Dr V. Kamakoti of IIT Madras is far simpler: it essentially requires each person who originates new content into the network (as opposed to forwarding it) to attach a “watermark” to the content that identifies the account ID of the sender. This also assumes a trustworthy WhatsApp client, of course. Presumably that watermark could be encrypted using a key stored at WhatsApp, so this tracing will at least require the provider’s involvement.
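For comparison, here’s roughly what that watermarking idea could look like in code. This too is a hypothetical sketch: the mark format is invented, and a symmetric provider-held key stands in for whatever a real design would actually use (most likely public-key encryption to the provider):

```typescript
import { randomBytes, createCipheriv } from "crypto";

// Hypothetical sketch of the "encrypted watermark" idea: an honest client
// attaches the originator's account ID to each piece of fresh (non-forwarded)
// media, encrypted so that only the provider can open it under legal process.
const providerKey = randomBytes(32); // stand-in for a key held by the provider

function watermarkFreshMedia(media: Buffer, originatorId: string): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", providerKey, iv);
  const ct = Buffer.concat([cipher.update(originatorId, "utf8"), cipher.final()]);
  const mark = Buffer.concat([iv, cipher.getAuthTag(), ct]);
  // Forwarding clients pass the opaque mark through unchanged, so tracing
  // reveals only the originator, never the identities of the forwarders.
  return Buffer.concat([media, Buffer.from("|WM|"), mark]);
}

// Example: watermark a fresh meme before its first upload.
const marked = watermarkFreshMedia(Buffer.from("meme-bytes"), "alice@example");
console.log(marked.length);
```

The tradeoff is the same one discussed below: the watermark reveals much less about forwarders, but it leans just as heavily on everyone running an honest client.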

Ok, so what’s wrong with these traceability proposals?

Well, if you’re ok with the fact that police can determine the identity of every single person who forwarded a piece of viral content, even those who are not the originator of that content, then, I guess, nothing.

That’s the essence of what the Tyagi, Miers and Ristenpart proposal offers, and frankly I’m not particularly ok with it. Even if I accepted the logic that we should have the means to trace “content originators” — the actual justification governments have offered for building systems like this one — I surely would not want to reveal every random user account that happened to forward the content. That seems like a recipe for persecuting innocent people.

Moreover, regardless of whether the system finds “content originators” or just “everyone on the forwarding path”, I think these ideas are pretty much synonymous with mass surveillance — and certainly they buttress WhatsApp’s technical claim that “traceability” breaks end-to-end encryption.

But I want to go just a little farther, and point out that these ideas have a major weakness that makes the entire approach feel confused.

The approaches I describe above rely on a critical assumption: that all participants in the system are going to behave honestly — that is, everyone will run the official WhatsApp client, which will contain logic designed to store an escrow record on WhatsApp’s servers. Nobody will try to bypass the system by running an unofficial client, or by hacking their client to disable the escrow logic.

If you’re willing to make such a strong assumption, why bother with the complicated Tyagi, Miers and Ristenpart proposal? Why not just use the Kamakoti proposal: modify your WhatsApp client to add a small “watermark” to each fresh non-forwarded media file. After all: once you’ve assumed that everyone is running an honest client, you can assume that the content originator will be too — can’t you? This approach would still reveal a lot of information to police, but it wouldn’t reveal the identity of every random person who forwarded the content.

My guess is that Tyagi, Miers and Ristenpart have an answer to this that boils down to something like “maybe you can’t trust the originator to run the correct client, but loads of other people will be running it.” To me this invites a much more detailed discussion about what security assumptions you’re making, and how “bad” the bad guys really are.

One last note to academic authors: don’t help bad people build unrestricted surveillance systems and then punt “preventing abuse” to later papers, ok?

If you read this far to answer the (rarified) question of how traceability could work and whether it breaks E2E encryption, then you can stop here. The rest of this post is not about that. It’s just me alienating a whole bunch of my academic peers.

Here is what I want to say to them.

The debate around key escrow and law enforcement surveillance is a very hard one. People have a lot of opinions about whether this work is “helping the good guys” or “helping the bad guys”, i.e., whether it’s about helping police find criminals, or whether it’s going to build the infrastructure for authoritarianism and spying. I don’t know the answer to this. I suppose the answer depends to some extent on the integrity of the government(s) that are implementing them. I have opinions, but I don’t expect all of my colleagues to share them.

What I would ask my colleagues to think hard about is the following:

When you propose (or review a paper that proposes) a new “lawful access” system, is it solving the hard problems, or is it punting on the hard problems and solving only the easy ones?

Because at the end of the day, building systems that violate the confidentiality of E2E encryption is a relatively easy problem from a scientific perspective. We’ve known how to build key escrow systems from the earliest days of encryption. Building these systems is not interesting, scientifically. It is useful from an engineering perspective, of course — to parties who want to deploy such systems. When respected academics write such papers, it is also politically useful to those same parties.

What is scientifically interesting is whether we can build systems that actually prevent abuse, either by governments that misuse the technology or by criminals who steal keys. We don’t really know how to do that very well right now. This is the actual scientific problem of designing law enforcement access systems — not the “access” part, which is relatively easy and mostly consists of engineering. In short, the scientific problem is figuring out how to prevent the wrong type of access.

When I read a paper that builds a sophisticated surveillance system, I expect it to address those abuse problems in a meaningful way. If the paper punts the important problems to subsequent work — if what I get is a paragraph like the one at right — my thinking is that you aren’t solving the right problem. You’re just laying the engineering groundwork for a world I don’t want my kids to live in. I would politely ask you all to stop doing that.

Notes:

* Some messaging systems implement attachment forwarding by passing a pointer to an existing file that is stored on their servers. This is a nice storage optimization, since it avoids the need to make and store a full duplicate copy of each object whenever the user hits “forward”. The downside of this approach is that it makes tracing relatively easy for providers, since they can see exactly which users accessed a given file. Such optimizations are inimical to private systems and really should be avoided.

** All claims about the Swedish government are fictionalized.

A case against security nihilism


This week a group of global newspapers is running a series of articles detailing abuses of NSO Group’s Pegasus spyware. If you haven’t seen any of these articles, they’re worth reading — and likely will continue to be so as more revelations leak out. The impetus for the stories is a leak comprising more than 50,000 phone numbers that are allegedly the targets of NSO’s advanced iPhone/Android malware.

Notably, these targets include journalists and members of various nations’ political opposition parties — in other words, precisely the people who every thinking person worried would be the target of the mass-exploitation software that NSO sells. And indeed, that should be the biggest lesson of these stories: the bad thing everyone said would happen now has.

This is a technical blog, so I won’t advocate for, say, sanctioning NSO Group or demanding answers from the luminaries on NSO’s “governance and compliance” committee. Instead I want to talk a bit about some of the technical lessons we’ve learned from these leaks — and even more at a high level, precisely what’s wrong with shrugging these attacks away.

We should all want perfect security!

Don’t feel bad, targeted attacks are super hard!

A perverse reaction I’ve seen from some security experts is to shrug and say “there’s no such thing as perfect security.” More concretely, some folks argue, this kind of well-resourced targeted attack is fundamentally impossible to prevent — no matter how much effort companies like Apple put into stopping it.

And at the extremes, this argument is not wrong. NSO isn’t some script-kiddy toy. Deploying it costs hundreds of thousands of dollars, and fighting attackers with that level of resources is always difficult. Plus, the argument goes, even if we raise the bar for NSO, someone with even more resources will find their way into the gap — perhaps charging an even more absurd price. So let’s stop crapping on Apple, a company that works hard to improve the baseline security of their products, just because they’re failing to solve an impossible problem.

Still, that doesn’t mean today’s versions of those products are doing everything they could to stop these attacks. There is certainly more that corporations like Apple and Google could be doing to protect their users. However, the only way we’re going to get those changes is if we demand them.

Not all vectors are created equal

Because spyware is hard to capture, we don’t know precisely how Pegasus works. The forensic details we do have come from an extensive investigation conducted by Amnesty International’s technology group. They describe a sophisticated infection process that proved capable of infecting a fully-patched iPhone 12 running the latest version of Apple’s iOS (14.6).

Many attacks used “network injection” to redirect the victim to a malicious website. That technique requires some control of the local network, which makes it hard to deploy to remote users in other countries. A more worrying set of attacks appear to use Apple’s iMessage to perform “0-click” exploitation of iOS devices. Using this vector, NSO simply “throws” a targeted exploit payload at some Apple ID such as your phone number, and then sits back and waits for your zombie phone to contact its infrastructure.

This is really bad. While the cynics are probably correct (for now) that we can’t shut down every avenue for compromise, there’s good reason to believe we can close down this particular vector for 0-click compromise. And we should try to do that.

What can we do to make NSO’s life harder?

What we do know is that these attacks take advantage of fundamental weaknesses in Apple iMessage: most critically, the fact that iMessage will gleefully parse all sorts of complex data received from random strangers, and will do that parsing using crappy libraries written in memory-unsafe languages. These issues are hard to fix, since iMessage can accept so many data formats and has been allowed to sprout so much complexity over the past few years.

There is good evidence that Apple realizes the bind they’re in, since they tried to fix iMessage by barricading it behind a specialized “firewall” called BlastDoor. But firewalls haven’t been particularly successful at preventing targeted network attacks, and there’s no reason to think that BlastDoor will do much better. (Indeed, we know it’s probably not doing its job now.)

Adding a firewall is the cheap solution to the problem, and this is probably why Apple chose this as their first line of defense. But actually closing this security hole is going to require a lot more. Apple will have to re-write most of the iMessage codebase in some memory-safe language, along with many system libraries that handle data parsing. They’ll also need to widely deploy ARM mitigations like PAC and MTE in order to make exploitation harder. All of this work has costs and (more importantly) risks associated with it — since activating these features can break all sorts of things, and people with a billion devices can’t afford to have .001% of them crashing every day.

An entirely separate area is surveillance and detection: Apple already performs some remote telemetry to detect processes doing weird things. This kind of telemetry could be expanded as much as possible while not destroying user privacy. While this wouldn’t necessarily stop NSO, it would make the cost of throwing these exploits quite a bit higher — and make them think twice before pushing them out to every random authoritarian government.

It’s the scale, stupid

Critics are correct that fixing these issues won’t stop exploits. The problem that companies like Apple need to solve is not preventing exploits forever, but a much simpler one: they need to screw up the economics of NSO-style mass exploitation.

Targeted exploits have been around forever. What makes NSO special is not that they have some exploits. Rather: NSO’s genius is that they’ve done something that attackers were never incentivized to do in the past: democratize access to exploit technology. In other words, they’ve done precisely what every “smart” tech business is supposed to do: take something difficult and very expensive, and make it more accessible by applying the magic of scale. NSO is basically the SpaceX of surveillance.

But this scalability is not inevitable.

NSO can afford to maintain a 50,000 number target list because the exploits they use hit a particular “sweet spot” where the risk of losing an exploit chain — combined with the cost of developing new ones — is low enough that they can deploy them at scale. That’s why they’re willing to hand out exploitation to every idiot dictator — because right now they think they can keep the business going even if Amnesty International or CitizenLab occasionally catches them targeting some human rights lawyer.

But companies like Apple and Google can raise both the cost and risk of exploitation — not just everywhere, but at least on specific channels like iMessage. This could make NSO’s scaling model much harder to maintain. A world where only a handful of very rich governments can launch exploits (under very careful vetting and controlled circumstances) isn’t a great world, but it’s better than a world where any tin-pot authoritarian can cut a check to NSO and surveil their political opposition or some random journalist.

So how do we get to that world?

In a perfect world, US and European governments would wake up and realize that arming authoritarian regimes really is bad for democracy — and that whatever trivial benefit they get from NSO is vastly outweighed by the very real damage this technology is doing to journalism and democratic governance worldwide.

But I’m not holding my breath for that to happen.

In the world I inhabit, I’m hoping that Ivan Krstić wakes up tomorrow and tells his bosses he wants to put NSO out of business. And I’m hoping that his bosses say “great: here’s a blank check.” Maybe they’ll succeed and maybe they’ll fail, but I’ll bet they can at least make NSO’s life interesting.

But Apple isn’t going to do any of this if they don’t think they have to, and they won’t think they have to if people aren’t calling for their heads. The only people who can fix Apple devices are Apple (very much by their own design) and that means Apple has to feel responsible each time an innocent victim gets pwned while using an Apple device. If we simply pat Apple on the head and say “gosh, targeted attacks are hard, it’s not your fault” then this is exactly the level of security we should expect to get — and we’ll deserve it.