One-Time Programs

One of the things I like to do on this blog is write about new research that has a practical angle. Most of the time (I swear) this involves writing about other folks’ research: it’s not that often that I write about work that comes out of my own lab. Today I’m going to make an exception to talk about a new paper that will be appearing at TCC ’22. This is joint work with my colleagues Abhishek Jain and Aarushi Goel along with our students Harry Eldridge and Max Zinkus.

This paper is fun for three reasons: (1) it addresses a cool problem, (2) writing about it gives me a chance to cover a bunch of useful, general background that fits the scope of this blog [indeed, our actual research won’t show up until late in the post!], and most critically (3) I want people to figure out how to do it better, since I think it would be neat to make these ideas more practical. (Note, if you will, that TCC stands for the Theory of Cryptography Conference, which is kind of a weird fit.)

Our work is about realizing a cryptographic primitive called the One-Time Program (OTP). This is a specific kind of cryptographically obfuscated computer program — that is, a program that is “encrypted” but that you can mail (literally) to someone who can run it on any untrusted computer, using input that the executing party provides. This ability to send “secure, unhackable” software to people is all by itself a holy grail of cryptography, since it would solve so many problems both theoretical and practical. One-time programs extend these ideas with a specific property that is foreshadowed by the name: the executing computer can only run an OTP once.

OTPs are one of the cooler ideas to pop out of our field, since they facilitate so many useful things. Aside from the obvious dark-side implications (perfect DRM! scary malware!) they’re useful for many constructive applications. Want to send a file encrypted under a weak password, but ensure nobody can just brute-force the thing? Just send an OTP that checks the password and outputs the file. Want to send a pile of sensitive data and let users compute statistical functions over it using differential privacy? Sure. Time-lock encryption? Why not. In fact, OTPs are powerful enough that you can re-invent many basic forms of cryptography from them… provided you’re not too worried about how efficient any of it will be.

As we’ll see in a minute, OTPs might be nice, but they are perhaps a little bit too good to be true. Most critically they have a fundamental problem: building them requires strong model-breaking assumptions. Indeed, many realizations of OTPs require the program author to deliver some kind of secure hardware to the person who runs the program.* This hardware can be ridiculously simple and lightweight (much simpler than typical smartcards) but the need to have some of it represents a very big practical limitation. This is likely why we don’t have OTPs out in the world — the hardware these schemes require does not actually exist.

In this work we tried to answer a very similar question. Specifically: can we use real hardware (that already exists inside of phones and cloud services) to build One-Time Programs? That’s the motivation at a high level.

Now for the details… and boy are there a lot of them.

One-Time Programs

One-Time Programs (OTPs) were first proposed by Goldwasser, Kalai and Rothblum (GKR) back in CRYPTO 2008. At a surface level, the idea is quite simple. Let’s imagine that Alice has some (secret) computer program P that takes in some inputs, cogitates for a bit, and then produces an output. Alice wishes to mail a copy of P to her untrustworthy friend Bob who will then be able to run it. However Alice (and Bob) have a few very strict requirements:

  1. Bob can run the program on any input he wants, and he will get a correct output.
  2. Bob won’t learn anything about the program beyond what he learns from the output (except, perhaps, an upper-bound on its size/runtime.)
  3. Bob can run the program exactly once.

Let’s use a very specific example to demonstrate how these programs might work. Imagine that Alice wants to email Bob a document encrypted under a relatively weak password such as a 4-digit PIN. If Alice employed a traditional password-based encryption scheme, this would be a very bad idea! Bob (or anyone else who intercepts the message before it reaches Bob) could attempt to decrypt the document by systematically testing each of the (10,000) different passwords until one worked correctly.

Using a one-time program, however, Alice can write a program with code that looks like this, and turn it into an OTP:

Program P: takes an argument "password"
  1. if password != "<secret password>",
      output "BAD"
  2. else output <secret document>
I don’t know if the kids even get my meme choices anymore.

When Bob receives an OTP of the program above, he can then run it on some password input he chooses — even if Alice is no longer around to help him. More critically, because it’s a one-time program, Bob will only be able to try a single password guess. Assuming Bob knows the right password this is just fine. But a recipient who does not know the password will have to guess it correctly the first time. The nice implication is that even a “weak” 4-digit PIN is reasonably safe to use as a password.

(Of course if Alice is worried about Bob innocently fat-fingering his password, she can send him a few different copies of the same program. Put differently: one-time programs trivially imply N-time programs.)

One-time programs have many other useful applications. Once I can make “unhackable” limited-use software, I can send you all sorts of useful functionalities based on secret intellectual property or private data rather than keeping that stuff locked up on my own server. But before we can do those things, we need to actually build OTPs.

Why hardware?

If you spend a few minutes thinking about this problem, it should become obvious that we can’t build OTPs using (pure) software: at least not the kind of software that can run on any general-purpose computer.

The problem here stems from the “can only run once” requirement.

Imagine that you send me a pure software version of a (purported) One-Time Program. (By this I mean: a piece of software I can run on a standard computer, whether that’s a Macbook or a VM/emulator like QEMU.) I’m supposed to be able to run the program once on any input I’d like, and then obtain a valid output. The program is never supposed to let me run it a second time on a different input. But of course if I can run the software once that way, we run into the following obvious problem:

What stops me from subsequently wiping clean my machine (or checkpointing my VM) and then re-installing a fresh copy of the same software you sent—and running it a second time on a different input?

Sadly the answer is: nothing can prevent any of this. If you implement your purported “OTP” using (only) software then I can re-run your program as many times as I want, and each time the program will “believe” it’s running for the very first time. (In cryptography this is sometimes called a “reset attack”.)

Keanu experiences a reset attack.

For those who are familiar with multi-party computation (MPC), you’ll recognize that such attacks can be thwarted by requiring some interaction between the sender and recipient each time they want to run the program. What’s unique about OTPs is that they don’t require any further interaction once Alice has sent the program: OTPs work in a “fire and forget” model.

In their original paper, GKR noted this problem and proposed to get around it by changing the model. Since pure software (on a single machine) can’t possibly work, they proposed to tap the power of tamper-resistant hardware. In this approach, the program author Alice sends Bob a digital copy of the OTP along with a physical tamper-resistant hardware token (such as a USB-based mini-HSM). The little token would be stateful and act like one of those old copy-protection dongles from the 1990s: that is, it would contain cryptographic key material that the program needs in order to run. To run the program, Bob would simply pop this “hardware token” into his computer.

A single USB token might contain thousands of “one-time memories.”

Now you might object: doesn’t using hardware make this whole idea kind of trivial? After all, if you’re sending someone a piece of specialized tamper-resistant hardware, why not just pop a general-purpose CPU into that thing and run the whole program on it? Why use fancy cryptography in the first place?

The answer here has to do with what’s inside that token. Running general programs on tamper-proof hardware would require a token with a very powerful and expensive (not to mention complex) general-purpose CPU. This would be costly and worse, would embed a large software and hardware attack surface — something we have learned a lot about recently thanks to Intel’s SGX, which keeps getting broken by researchers. By contrast, the tokens GKR propose are absurdly weak and simple: they’re simple memory devices that spit out and erase secret keys when asked. The value of the cryptography here is that Bob’s (untrusted) computer can still do the overwhelming share of the actual computing work: the token merely provides the “icing” that makes the cake secure.

But while “absurdly weak and simple” might lower the hardware barrier to entry, this is not the same thing as having actual tokens exist. Indeed it’s worth noting that GKR proposed their ideas way back in 2008; it is now 2022 and nobody (to my knowledge) has ever built the necessary token hardware to deploy them in the world. (One could prototype such hardware using an open HSM platform, but how secure would that actually be — compared to a proper engineering effort by a major company like Apple, Google or Amazon?)

And yet One-Time Programs are neat. It would be useful to be able to write and run them on real devices! For many years I’ve run into problems that would melt away if we could deploy them easily on consumer devices. Wouldn’t it be great if we could build them using some hardware that already exists?

How to build a (GKR) One-Time Program

In order to explain the next part, it’s necessary to give an extremely brief overview of the GKR construction for One-Time Programs, and more specifically: their specialized tokens. This construction is based on garbled circuits and will make perfect sense if you’re already familiar with that technique. If not, it will require a little bit more explanation.

GKR’s idea is to rely on many individual tokens called One-Time Memories (OTM). An individual OTM token works like this:

  1. When a program author (Alice) sets one up, she gets to install two different strings into it: let’s call them K0 and K1. She can then mail the token to Bob.
  2. When Bob receives the token and wants to use it, he can ask the token for either of the two strings (0/1). The token will hand Bob the desired string (either K0 or K1.)
  3. Once the token has given Bob the string he asked for, it permanently erases the other string.

The strings themselves need not be very long: 128 bits is ideal. To use these tokens for building One-Time Programs, Alice might need to set up a few hundred or a few thousand of these “tokens” (which can technically all be glommed together inside a single USB device) and send them to Bob.
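
If it helps to see that interface written down, here’s a minimal Python sketch of a One-Time Memory (my own toy model for illustration, not code from the GKR paper or any real token):

import secrets

class OneTimeMemory:
    """Toy model of a GKR One-Time Memory: it stores two strings,
    reveals exactly one of them on request, and erases the other."""

    def __init__(self, k0: bytes, k1: bytes):
        # Alice installs the two strings (e.g., 128-bit labels) at setup time.
        self._strings = [k0, k1]

    def query(self, choice: int) -> bytes:
        # Bob asks for string 0 or string 1. The token hands back the
        # requested string and permanently erases the one he didn't pick.
        if self._strings is None:
            raise RuntimeError("token has already been used")
        selected = self._strings[choice]
        self._strings = None   # the other string is now gone forever
        return selected

# Alice sets up a token...
k0, k1 = secrets.token_bytes(16), secrets.token_bytes(16)
token = OneTimeMemory(k0, k1)
# ...and Bob gets exactly one query.
assert token.query(1) == k1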

Once you assume these tokens, the GKR style of building One-Time Programs is pretty straightforward if you’re a cryptographer. Summarized for someone who is familiar with garbled circuits: the basic idea is to take the classical Yao two-party computation (2PC) scheme and replace the (interactive) Oblivious Transfer portion by sending the evaluator a set of physical One-Time Memory tokens.

If that doesn’t work for you, a slightly more detailed explanation is as follows:

Alice first converts her program P into a boolean circuit, like the one below:

Having done that, she then assigns two random cryptographic keys (annoyingly called ‘labels’) to every single wire in the circuit. One key/label corresponds to the “0” value on that wire, and the other to the “1” value. Notice that the input wires (top-left) also count here: they each get their own pair of keys (labels) that correspond to the two possible input bits.

Alice next “garbles” the circuit (encrypting each gate) using a clever approach devised by Andrew Yao, which I won’t describe precisely here, but which Vitalik Buterin explains nicely in this blog post. The result is that each gate is replaced with an encrypted table of ciphertexts: anyone who has two of the appropriate labels going into that gate will be able to evaluate it, and in return they will receive the appropriate label for the gate’s output wire. Hence all Bob needs is the appropriate keys/labels for the values Bob wishes to feed into the input wires at the top-left of the circuit — and then he can recursively evaluate the circuit all the way to the output wires.
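
To give a flavor of the gate-level mechanics, here is a toy Python sketch that garbles and evaluates a single AND gate using hash-based “encryption” (a deliberately simplified illustration, not Yao’s optimized construction or the code in any real library):

import hashlib, secrets, random

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and_gate(labels_a, labels_b, labels_c):
    """Garble one AND gate. Each labels_* is a pair (label_for_0, label_for_1)
    of 16-byte wire labels. Returns four shuffled ciphertexts."""
    table = []
    for x in (0, 1):
        for y in (0, 1):
            # Pad the output label with zeros so the evaluator can recognize
            # which row decrypted correctly.
            plaintext = labels_c[x & y] + b"\x00" * 16
            table.append(xor(H(labels_a[x], labels_b[y]), plaintext))
    random.shuffle(table)
    return table

def eval_garbled_gate(table, label_a, label_b):
    """Evaluate with one label per input wire; recover one output label."""
    for ct in table:
        pt = xor(H(label_a, label_b), ct)
        if pt[16:] == b"\x00" * 16:
            return pt[:16]
    raise ValueError("no row decrypted -- wrong labels?")

# Quick check: AND of 1 and 0 should yield the label for output bit 0.
la = (secrets.token_bytes(16), secrets.token_bytes(16))
lb = (secrets.token_bytes(16), secrets.token_bytes(16))
lc = (secrets.token_bytes(16), secrets.token_bytes(16))
table = garble_and_gate(la, lb, lc)
assert eval_garbled_gate(table, la[1], lb[0]) == lc[0]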

From Bob’s perspective, therefore, all he needs to do is obtain the labels that correspond to the input arguments he wants to feed into the circuit. He will also need a way to translate the labels on the output wires into actual bits. (Alice can send Bob a lookup table to help him translate those labels into actual bits, or she can just make the circuit output raw binary values instead of labels on those wires.)

A critical warning is that Bob must receive only one input label for each wire: if he had more than that, he could run the circuit on two “different inputs.” (And in this case, many other bad things would probably happen.) Hence the high-level problem is how to deliver the appropriate input labels to Bob, while ensuring that he gets exactly one label for each wire and never more.

And this is where the One-Time Memory tokens fit in perfectly. Alice will set up exactly one OTM token for each input wire: it will contain both labels for that wire. She’ll send it to Bob. Bob can then query each token to obtain exactly one label for that wire, and then use those labels to evaluate the rest of the circuit. The OTM token will destroy all of the unused labels: this ensures that Bob can only run the program on exactly one input. And that’s the ballgame.**
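
Putting the pieces together, the input-label delivery step looks roughly like this (reusing the toy OneTimeMemory class from the sketch above, so this is again purely illustrative):

import secrets
# (OneTimeMemory is the toy class from the earlier sketch in this post.)

NUM_INPUT_BITS = 8   # pretend the garbled circuit has 8 input wires

# Alice: pick a pair of labels per input wire and load one OTM per wire.
input_labels = [(secrets.token_bytes(16), secrets.token_bytes(16))
                for _ in range(NUM_INPUT_BITS)]
tokens = [OneTimeMemory(l0, l1) for (l0, l1) in input_labels]

# Bob: choose an input, query each token exactly once, and collect one
# label per wire. Each token erases the label Bob didn't ask for.
bobs_input = [1, 0, 1, 1, 0, 0, 1, 0]
bobs_labels = [tokens[i].query(bit) for i, bit in enumerate(bobs_input)]

# Bob can now evaluate the garbled circuit gate by gate using bobs_labels
# (e.g., with eval_garbled_gate from the previous sketch).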

So where do I pick myself up some One-Time Memories?

You buy them at your local One-Time Memory store, obviously.

Seriously, this is a huge flaw in the hardware-based OTP concept. It would be awfully useful to have OTPs for all sorts of applications, assuming we had a bunch of One-Time Memory tokens lying around and it was easy to ship them to people. It would be even more awesome if we didn’t need to ship them to people.

For example:

  • Imagine there was a publicly-accessible cloud provider like Google or Apple that had lots of One-Time Memories sitting in its data centers that you could rent. Alice could log in to Google and set up a bunch of OTM “tokens” remotely, and then simply send Bob the URLs to access them (and hence evaluate an OTP that Alice mails him.) As long as the cloud provider uses really good trusted hardware (for example, fancy HSMs) to implement these memories, then even the cloud provider can’t hack into the secrets of Alice’s One-Time Programs or run them without Bob’s input.
  • Alternatively, imagine we had a bunch of One-Time Memories built into every smartphone. Alice couldn’t easily send these around to people, but she could generate One-Time Programs that she herself (or someone she sends the phone to) could later run. For example, Alice could build a sophisticated biometric analysis program that uses inputs from the camera to unlock secrets in her app, and she could ensure that the program stays safe and can’t be “brute-forced” through repeated execution. Even if someone stole Alice’s phone, they would only be able to run the program (once) on some input, but they would never be able to run it twice.

The problem is that, of course, cloud providers and phone manufacturers are not incentivized to let users build arbitrary “unhackable” software, nor are they even thinking about esoteric things like One-Time Memories. No accounting for taste.

And yet cloud providers and manufacturers increasingly are giving consumers access to specialized APIs backed by secure hardware. For example, every single iPhone ships with a specialized security chip called the Secure Enclave Processor (SEP) that performs specific cryptographic operations. Not every Android phone has such a processor, but most include a small set of built-in TrustZone “trustlets” — these employ processor virtualization to implement “secure” mini-apps. Cloud services like Google, Apple iCloud, Signal and WhatsApp have begun to deploy Hardware Security Modules (HSMs) in their data centers — these store encryption keys for consumers to use in data backup, and critically, are set up in such a way that even the providers can’t hack in and get the keys.

Unfortunately none of the APIs for these hardware services offer anything as powerful as One-Time Programs or (even) One-Time Memories. If anything, these services are locked down specifically to prevent people from doing cool stuff: Apple’s SEP supports a tiny number of crypto operations. TrustZone does support arbitrary computation, but today your applet must be digitally signed by a phone manufacturer or a TEE-developer like Qualcomm before you can touch it. Consumer-facing cloud providers definitely don’t expose such powerful functionality (for that you’d need something like AWS Nitro.) The same goes for “simple” functionalities like One-Time Memories: they may be “simple”, but they are also totally weird and why would any consumer manufacturer bother to support them?

But let’s never forget that cryptography is a sub-field of computer security. In that field when someone doesn’t let you run the machine you want to run, you figure out how to build it.

What hardware APIs do manufacturers and cloud providers support?

Not very many, unfortunately.

The Keymaster meets the Gatekeeper. I think there was also a dog in this movie?

If you hunt around the Apple Developer Documentation for ways to use the iPhone SEP, for example, you’ll find APIs for generating public keys and using them, as well as ways to store keys under your passcode. Android (AOSP) provides similar features via the “Gatekeeper” and “Keymaster” trustlets. Consumer-facing cloud services provide HSM-backed services that also let you store your keys by encrypting them under a password.

None of this stuff is a “One-Time Memory,” unfortunately. Most of it isn’t even stateful… with one exception.

“Counter lockboxes”

Reading through documentation reveals one functionality that isn’t exactly what we need, but at least has some promise. In its documentation, Apple calls this function a “counter lockbox”, and it’s designed to store encryption keys that are protected with a user-selected password. The same person (or someone else) can later retrieve the key by sending in the right passcode. In the event that the entered passcode is not the right one, the lockbox increments an “attempt counter” that caps the number of incorrect guesses that will be permitted before the key is locked forever.

And I do mean forever. When the user exceeds the maximum number of guesses, something interesting happens (here is Apple’s version):

Source: Apple Platform Security guide.

While I’m going to use Apple’s terminology, lockboxes are not just an Apple thing: a nearly-identical functionality exists inside every phone and in basically all of the consumer-facing cloud services I mentioned above.

Since counter lockboxes erase things, and they’re pretty much ubiquitous, this gives us hope. Even though they are not OTMs, maybe we can somehow use them to build OTMs. To explain how this works, we first need to give a clear description of how a basic counter lockbox works. Here’s a simplified version:***

  1. To set up a fresh lockbox, Alice provides an encryption key K and a password P as well as a “maximum attempt counter” M. The initial attempt counter is set to A := 0. The token stores (K, P, A, M).
  2. When Bob wants to access the lockbox, he sends in a “guess” P’.
    1. If P’ == P (i.e., the guess is correct), the token returns K and resets A := 0.
    2. Else if P’ != P (the guess is incorrect), the token sets A := A + 1 and returns “bad password”.
  3. If A == M (i.e., the maximum attempts have been reached) then the token wipes its memory completely.
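
Here’s the same simplified logic as a small Python sketch (a toy model of my description above, certainly not Apple’s implementation):

class CounterLockbox:
    """Toy model of a counter lockbox: one key, one password, and a
    bounded number of incorrect password attempts before self-destruction."""

    def __init__(self, key: bytes, password: str, max_attempts: int):
        self._key, self._password = key, password
        self._attempts, self._max = 0, max_attempts

    def unlock(self, guess: str):
        if self._key is None:
            return None                        # already wiped
        if guess == self._password:
            self._attempts = 0                 # a correct guess resets the counter
            return self._key
        self._attempts += 1                    # "bad password"
        if self._attempts >= self._max:
            self._key = self._password = None  # too many failures: wipe everything
        return None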

If you remember our description of a one-time memory (OTM) further above, you’ll notice that a lockbox stores only one string (key), but that an OTM has two different strings stored in it. Moreover, the OTM will always give us one or the other string, but a counter lockbox only gives us its single string when we enter the correct password.

But perhaps we can simulate OTMs by using multiple lockboxes.

Imagine that Alice has access to two different (fresh) counter lockboxes, as illustrated below. Let’s assume she configures both lockboxes to permit exactly one password attempt (M=1). Next, she stores the string K0 into the lockbox on the left, and sets it up to use password “0”. She then places the string K1 into the lockbox on the right and sets it up with password “1”, as follows:

The picture above shows you what’s inside each lockbox. But you need to remember that to anyone other than Alice, lockboxes are kind of a black box. That is, they don’t reveal the key or password they’ve been set up with — until the lockbox is queried on a specific password. To exploit this fact, Alice can shuffle the two lockboxes and send them to Bob in a random order. Until he tries to access one, these lockboxes will look like this to Bob:

So how does Bob “open” the lockboxes to reliably obtain one of the two strings?

For a moment let’s assume that Bob is honest, and wants to get either K0 or K1. He does not know which of the two lockboxes contains the string he wants. He can, however, employ a simple strategy to obtain it.

Assuming Bob’s choice is “0”, he can simply query both lockboxes on password “0”. One of the two lockboxes will output K0, and the other will output “bad password”. (If his choice is “1”, then he can do the same using that as the password.) This strategy works perfectly. In either case Bob always obtains the string he wanted, and the other string ends up getting erased: just like a One-Time Memory!
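
Using the toy CounterLockbox sketch from the previous section, honest Bob’s strategy looks something like this:

import random, secrets
# (CounterLockbox is the toy class from the sketch above.)

k0, k1 = secrets.token_bytes(16), secrets.token_bytes(16)

# Alice: one single-attempt lockbox per string (M = 1), shuffled together.
boxes = [CounterLockbox(k0, password="0", max_attempts=1),
         CounterLockbox(k1, password="1", max_attempts=1)]
random.shuffle(boxes)

# Honest Bob wants K0, so he queries *both* boxes with password "0".
results = [box.unlock("0") for box in boxes]
recovered = next(r for r in results if r is not None)
assert recovered == k0   # meanwhile, the other lockbox has wiped K1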

The problem with this approach is that Bob might not be honest.

A cheating Bob can query the first lockbox on password “0”, and if that lockbox hands him K0, he can switch strategies and query the remaining lockbox on password “1”. If this works, he will end up with both of the strings — something that should never happen in a One-Time Memory. And this is very bad! If we use such broken “memories” to build GKR one-time programs and Bob pulls this off for even a single input wire in our garbled circuit, then he will be able to query the circuit on multiple distinct inputs. (And it gets worse: if Bob obtains two labels on the same input wire, the garbled circuit construction will often unravel like a cheap sweater.)

The good news here is that Bob’s strategy doesn’t always work. He will sometimes get both strings, but sometimes he won’t. Half the time he’ll query the first lockbox on “0” and it will say “bad password”… and the key it holds will be gone forever. (Bob can still go on to get the other string from the untouched lockbox, but that’s not a problem.) This means we have at best a 1/2 probability of thwarting a cheating Bob. That’s good… but can we reduce Bob’s chances even more?

The most obvious approach is to use more lockboxes. Instead of two lockboxes, imagine Alice sets up, say, 80 different shuffled lockboxes divided into two sets. The first forty lockboxes can be programmed with the “0” password, and the other forty can be programmed with password “1”: then all eighty will be shuffled together. Our goal is to ensure that Bob will obtain K0 only if he guesses where all of the “0” lockboxes are. To ensure this, Alice will use secret sharing to split K0 into 40 different “shares”, with the property that all of the shares are needed to obtain the original string. Each share can be placed into one of the 40 boxes. (A similar process will be used to protect K1.)
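
The secret sharing step can be as simple as XOR-based all-or-nothing splitting. Here’s a quick sketch (illustrative only; the function names are mine):

import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_all_or_nothing(secret: bytes, n: int) -> list:
    """Split `secret` into n shares such that all n are needed to recover it."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, secret))  # last share = secret XOR the rest
    return shares

def recombine(shares: list) -> bytes:
    return reduce(xor_bytes, shares)

k0 = secrets.token_bytes(16)
k0_shares = split_all_or_nothing(k0, 40)   # one share per "0"-password lockbox
assert recombine(k0_shares) == k0          # miss even one share and you learn nothing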

Of course Bob can still cheat and try to guess his way through this maze Alice has built, but he will need to correctly guess where each of the “0”-password lockboxes are located without making a single mistake, since even a single wrong guess will doom his chances to get both strings. Such a lucky guessing strategy is extremely challenging for Bob to pull off. (The analysis is not quite equivalent to Bob flipping a coin “heads” forty times in a row [a probability of 1/2^40] but it results in a probability that’s similarly low.)

By adjusting the number of lockboxes she uses, Alice can carefully tune the security level of each “virtual one-time memory” so that cheating (almost) never works.

This sure seems like a lot of lockboxes

Well, I said this paper is appearing at the Theory of Cryptography Conference, didn’t I?

Obviously the proposal above is not the end of the story. In that solution, for a desired S-bit security level the “naive” approach requires Alice to set up 2S lockboxes (or O(S) if you don’t like constants.) To build GKR one-time programs from this, Alice would need to use O(S) lockboxes for every input bit Bob might want to feed into the program. Surely we can do better.

The rest of our paper looks at ways to reduce the number of lockboxes required, mainly focusing on the asymptotic case. Our second proposal reduces the cost to O(1) lockboxes per input bit (when there are many input wires) and the final proposal removes the dependence on the number of input bits entirely. This means in principle we can run programs of arbitrary input length using some reasonable fixed number of lockboxes.

I can’t possibly do justice to the full details here, except to note that we rely on some very cool techniques proposed by others. Our first transform relies on a ‘robust garbling’ technique devised by Almashaqbeh, Benhamouda, Han, Jaroslawicz, Malkin, Nicita, Rabin, Shah and Tromer. The second uses an underappreciated tool called “Laconic OT” that was proposed by Cho, Dottling, Garg, Gupta, Miao and Polychroniadou. (Sadly, Laconic OT constructions are not quite “concretely” practical yet, so I hope these new applications will motivate some more practical research into that area.)

The upshot is that we can, in principle, manufacture programs that run on user inputs of arbitrary size with a fixed cost of at most several thousand lockboxes. While this is still quite a large number in practice(!), it isn’t infeasible. Such a quantity of lockboxes could be obtained by, for example, setting up boatloads of fake iCloud or Google accounts and then using Apple’s iCloud Key Vault or Google’s Titan Backup service to store a “backup key” for each of them.

(I do not recommend you actually do any of this: I suspect both Apple and Google find such things irritating, and will make your life unpleasant if you try it.)

Are there any downsides to this research?

Maybe. There are many applications of obfuscated programs (including one-time programs) that are extremely bad for the world. One of those applications is the ability to build extremely gnarly ransomware and malware. This is presumably one of the reasons that systems like TrustZone and Intel SGX require developers to possess a certificate in order to author code that can run in those environments.

For example: a ransomware author could encrypt a whole system and then store the keys inside of a One-Time Program that would, in turn, need to be executed on a valid chunk of a blockchain in order to release the keys. This blockchain fragment would prove that the system owner has paid some sum of money to the malware developer. This system would be fully ‘autonomous’ in the sense that your computer, once infected, would never need to phone home to any “command and control” infrastructure operated by the ransomware author. This would make malware networks harder to take down.

If systems designers are worried about malware on SGX, our research shows that (eventually) those designers may also need to think about the availability of “lockbox”-type functionalities as well. Or to give a shorter summary: don’t let the bad guys get more than a few thousand lockboxes, or they gain the power of secure computation and all that comes with it. And the exact number they need could be much smaller than the results we achieve in this paper.

Perhaps at a higher level, the main lesson of this research is that computation is much easier than you’d expect it to be. If someone wants to do it badly enough, they’ll always find a way.

Notes:

* There is a second line of work that uses very powerful cryptographic assumptions and blockchains to build such programs. I am very enthusiastic about this work [and my co-authors and I have also written about this], but this post is going to ignore those ideas and stick with the hardware approach.

** Of course that’s not really the ballgame. As with all simple descriptions of complex protocol papers, this simple explanation omits all the subtle details that matter for the security proof, but that aren’t terribly relevant for the purposes of this blog. These include all sorts of interesting questions about how to prove security in an environment where an adversary can query the tokens in a weird, arbitrary order. It’s a good paper!

*** Some implementations will pick the key K for you. Others fix the maximum attempt counter at some constant (usually 10 attempts) rather than supporting any amount. All of these details can be worked around in practice.

An extremely casual code review of MetaMask’s crypto

NB: This post describes a very casual code review of a few cryptography functions used by MetaMask. It does not describe any vulnerabilities. If you’re the kind of person who likes a meandering and amateurish code review that goes absolutely nowhere, you’ll enjoy this post. Otherwise you might want to read something more exciting: I suggest Moxie’s recent post on web3.

For reasons I can’t fully explain, the other day I decided that it might be fun to spend an hour or two investigating the cryptography used by MetaMask.

For those who aren’t familiar with the tool, MetaMask is a browser-based cryptocurrency wallet that is used to access decentralized applications (dapps) on networks like Ethereum. My interest in MetaMask is mostly just based on curiosity: I recently invested about $100 into a decentralized finance application and I wanted to see how safe it really was. While there are lots of different ways I could lose that money, taking a look at MetaMask’s crypto seemed like as good a place to start as any.

I want to stress that this was an informal code review: I didn’t use any tooling, didn’t even download (most) code to my computer. In fact my “review” mostly involved poking around various Github repositories to see if I could find anything that immediately jumped out as incorrect, and failing that, at least could give me a feeling for the quality of MetaMask’s crypto code. (In fact I did about half the work on my phone while eating a burrito bowl at Chipotle.)

After an hour or two of hunting through dependencies, I made the mistake of tweeting about my feelings:

I swear that this offhand Tweet was not meant to make anyone unhappy or scare people that their funds were at risk. Although I had a couple of scary moments, those were entirely on me. So let me be clear as possible right now: I did not find any exploitable vulnerabilities in MetaMask’s crypto!

What I did come back with is an uncomfortable feeling about the complexity and quality of MetaMask’s (current) crypto code, and some unhappy feelings about its dependency structure. Some of this stuff is basically unavoidable: file it under “browser crypto is scary.” Some is specific to the way the code is laid out. And some of it is a (non-trivial) gripe: this code is much harder to audit than it should be.

And this last part isn’t just a personal gripe. My TL;DR is that finding and reviewing the correct code was made difficult because there were far too many different organizations owning the dependencies that led to the crypto routines themselves. This made me uncomfortable, given how much money these routines are responsible for securing.

In this post I’m going to justify these opinions by walking you through a casual skim of MetaMask’s code and crypto dependencies. To make you feel “like you were there”, I’m going to discuss the review more or less in the order it occurred — including the embarrassing part where I went off the rails and reviewed the wrong code.

If you enjoy reading crypto code for entertainment, this post might be for you. If you’re hoping that this will actually lead to anything exciting, I suspect you’ll be very disappointed.

Quick explainer: what is MetaMask?

If you already know what MetaMask does, skip this part.

Since many readers have probably never used DeFi, I figure I should give a quick background here on “web3” and MetaMask in particular.

From a technical perspective, web3 is pretty straightforward. It consists of many “decentralized apps” (dapps), each of which typically comprises a (typically Javascript) front-end web app, as well as some back-end business logic. What makes dapps special is the back-end portion, which is decentralized. Generally this means it relies on a smart contract running in a network like Ethereum.

Web3 front-ends are just web apps, and typically they’re served to your browser via standard web infrastructure. (Security here usually means HTTPS, so anyone who hacks a server or steals a Cloudflare API key can change these apps’ code in malicious ways.)

Of course, web apps can’t communicate directly with blockchains, nor are they a good place to store private keys. This is why MetaMask exists. In its most popular instantiation, MetaMask ships as a browser extension for Chrome and Firefox. When a web application needs to send a transaction to a smart contract (e.g., because you want to deposit money into Aave), MetaMask is responsible for signing the transaction and shipping it to the chain.

On the bright side for this review, the actual cryptography in MetaMask is fairly limited: as a wallet, it must generate and store Ethereum public and private keys and it also needs to handle “simple” operations like ECDSA signing. On the other hand: we’re in a browser. So nothing is actually simple.

Climbing the dependency ladder

To put some bounds on the effort required, I decided to only look at the implementation of ECDSA signing and key generation (excluding the curve operations.) This seemed like an easy task I could get done in an hour or two while eating an extended lunch.

Finding the starting point for a review like this was harder than I expected. In short: it isn’t easy to grok the control flow of a complex event-based Javascript browser extension to find out exactly which calls are “real” and which are tests or dead code. To shortcut this problem, my approach amounts to “grepping and hoping”, starting from the “develop” branch of MetaMask’s extension repository.

A quick search for “sign” leads us to a promising call in the file metamask-controller.js:

Sadly, following this call path quickly leads us into dependency hell.

First, we reach a library called eth-keyring-controller, which presumably manages Ethereum keyrings. A quick scan of that library shows it calling a second dependency: eth-sig-util. (Both are NPM packages developed by MetaMask.) We’ll jump right to the latter package, which… you guessed it, calls yet another package:

This call takes us out of MetaMask-owned code and out to a new package called “ethereumjs-util” that’s maintained by the “Ethereum Javascript community”, whoever they are. (Don’t worry, we won’t stay here long enough to care.) Of course, in this package we find yet another layer of dependency indirection:

From here, we head over to a package called “js-ethereum-cryptography“, which holy cow is actually a repository maintained by the Ethereum project itself!

At this point I’m about halfway into my allotted review time and asking myself questions like “why didn’t MetaMask just call this library directly” and “why do I make poor life choices”. But never mind, we’re almost there: surely we are going to see some actual crypto code soon!

Except no, we are not. This is what we find in the Ethereum library:

Yet another dependency: this time to something called “noble”. Here my quick-and-dirty approach to dependency resolution (Google “npm <package-name>”) fails me, since NPM says that “noble” is a “Node.js Bluetooth Low Energy library.” As cool as that would be — access the blockchain through Bluetooth! — I’m guessing this is not right.

A bit more searching leads me to believe that noble is actually the Noble cryptography library, which appears to have been developed by Paul Miller. And hey, this code looks promising! The page lists actual cryptography design goals that seem reasonable, and the code is written in TypeScript. Even better: the library has been subject to an audit by Cure53.

Nonetheless, and with no disrespect to Paul or anyone else in this community: I would like to take a moment to complain about this dependency structure:

  1. Resolving all these pointless dependencies has eaten up a lot more of my review time than you might imagine. (I’m leaving out all the times I accidentally visited the wrong libraries because they used some combination of “js” and “ethereum” and “cryptography” and just Googling is risky here.)
  2. More substantively, I can’t help but notice that there are a lot of code owners in this critical path. So far I’ve traveled through repositories owned by four different organizations, and the last one is someone’s personal Github account. Is this normal for a system that secures billions of dollars of user funds? Maybe it should not be.

But enough complaining. There is actual crypto code here! We can finally look at ECDSA.

Except… it turns out that we can’t do that, because I made a stupid mistake.

After speaking with Paul Miller on Twitter, I learned that this whole code review has been premised on a very foolish assumption: namely that the code in the main (development) branches of MetaMask and its dependencies is actually the code people are using in MetaMask today. That turns out to be wrong. In fact, Paul tells me, the noble library is slated for deployment in an upcoming release. The current release of MetaMask relies on a library called “elliptic“, which was written by Fedor Indutny.

I’m now scraping up those last bits of cheese in the burrito bowl.

Let’s look at elliptic

Ok, so forget everything I did above. That was my mistake. I promise to come back to those newer code paths soon, but that’s for the future. My goal in this review was to look at MetaMask as it exists today. Apparently that means I need to look at Fedor’s elliptic library.

(Note: if I was being professional then what I would really do is review all the release branches of MetaMask and all of its dependencies just to make sure I’m in the right place, and to see what the calls looked like. But life is too short: I’m going to trust Paul.)

The elliptic library is written in “plain old” Javascript, so types will tend to be confusing. On the bright side, it’s relatively compact. The core library supports Ed25519 (for EdDSA) and Secp256k1 for ECDSA on networks like Ethereum. Let’s focus on the latter.

I’m not a Javascript developer so certain things aggravate me about reviewing this code. One is that developers often put critical routines in stupidly-named files like “index.js”, which is where we finally reach the core signing implementation for ECDSA.

Signing

ECDSA is a stupid algorithm, but fortunately it’s not a very complicated one. Leaving aside the curve operations, there are basically two places where ECDSA implementations tend to go wrong. The first is in key generation. The other is in signing.

In ECDSA signing the main security risk is in how nonces are generated: it’s critical that the ECDSA nonce (“k”) is sampled uniformly from the range 1 … n-1, where n is the group order. Critically, the same nonce must never be used with two distinct messages (really, message hashes.) If this ever happens, it’s possible to recover a private key from two signatures, something that’s generally frowned upon.
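
To make the “recover a private key” step concrete, here is the standard calculation sketched in Python (this is just algebra on the ECDSA signing equation s = k^-1 * (h + r*d) mod n; the function and variable names are mine):

def recover_key_from_reused_nonce(r, s1, h1, s2, h2, n):
    """Given two ECDSA signatures (r, s1) and (r, s2) over message hashes h1
    and h2, produced with the *same* nonce k, recover k and the private key d.
    Since s = k^-1 * (h + r*d) mod n for both signatures, subtracting the two
    equations eliminates d and lets us solve for k, then for d."""
    k = (h1 - h2) * pow(s1 - s2, -1, n) % n
    d = (s1 * k - h1) * pow(r, -1, n) % n
    return k, d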

Let’s look at how signing is handled in this library:

So looking at the code above, how do we get “k“?

A first observation is that there is no actual randomness here, no call to a random number generator. Instead, the signing routine instantiates a deterministic random bit generator (DRBG) based on HMAC. That algorithm stretches a shorter “seed” into a longer sequence of pseudorandom bits that can then be used to obtain our random nonce.

I initially had a small heart attack when I (briefly!) misread the code and thought that the only seed for the DRBG was the ECDSA private key: this would definitely lead to nonce re-use. On closer inspection, the DRBG is seeded with two fields: entropy is set to the ECDSA private key, and nonce is set to the (hashed) message to be signed. Within the underlying DRBG implementation both values get hashed into a value called seed, which actually is the seed for the DRBG. This design should mean that each (private key, message) pair will get different random bits, and thus different nonces “k”.
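
As a rough mental model of that seeding flow, here’s a simplified Python sketch (an illustration of the idea, assuming a 256-bit curve; it is definitely not the exact HMAC-DRBG code in elliptic, nor the precise RFC 6979 algorithm):

import hashlib, hmac

def derive_nonce(private_key: int, msg_hash: bytes, n: int) -> int:
    """Deterministically derive an ECDSA nonce from (private key, message hash),
    in the spirit of the seeding described above. Purely illustrative."""
    seed = private_key.to_bytes(32, "big") + msg_hash   # both values feed the "seed"
    counter = 0
    while True:
        block = hmac.new(seed, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        candidate = int.from_bytes(block, "big")
        if 1 <= candidate < n:        # keep going until the candidate lands in [1, n-1]
            return candidate
        counter += 1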

While I don’t love a purely deterministic generation of “k” (and I’ll explain why below), this isn’t some roll-your-own idea: in fact this appears very similar to the approach recommended by RFC 6979. And going one step farther, it appears that elliptic is actually directly implementing an algorithm from that RFC, specifically the one in section 3.3, “Alternate Description of the Generation of k.” Annoyingly I did not learn this fact from the code itself, which would have saved me a lot of time. I realized it only while reading through the RFC for unrelated reasons (in fairness, elliptic’s README does mention it.)

(A brief aside to developers: if you implement a standard algorithm, please please add a reference in comments at the point where you’re using it, and comment each step so reviewers can see exactly how it maps to the pseudocode. For a good example of how to do this, take a look at how noble implemented the same standard.)

So are there any issues with this implementation?

A particularly weird feature is that the caller of the sign() function can pass in a data structure called options. One element of this structure appears to be a function, which is named (not very descriptively) options.k. If included in the options structure, this function appears to override the built-in nonce generation and replaces it with logic the caller provides. This is weird. While I can see an argument for allowing callers to provide extra entropy, I can’t see a good reason why a caller should be allowed to entirely override nonce generation within the signing routine.

Overall, this seems like a giant footgun and also a good opportunity for some caller to sneak in malicious code that could easily slip past a review. (Indeed, thanks to all this dependency confusion I’m not even sure which of MetaMask’s dependencies is calling this routine.)

As a weird added bonus, the same options struct can also be passed in the enc argument as well; if this happens, the structure will be copied over into options. I’m sure there are perfectly good reasons for this.

A few smaller notes:

  • The function _truncateToN() seems poorly defined and serves two slightly different purposes, triggered by a boolean flag — which is just omitted in some calls.
  • In the first mode it simply truncates the message to ensure it is the same length as the group order n. This mode is only used to truncate nonces, which is not something that should ever need to happen — except when you’re using the crazy options.k function discussed above.
  • When called with “truncOnly = false“, however, _truncateToN() will also ensure that the truncated result is less than or equal to n, and if not it will subtract n. This appears to implement the bits2octets subroutine of RFC 6979, which is fine I guess but took too long to figure out. Why not just write two routines and name them appropriately?

Neither of these two issues are critical, but they made the code harder to read.

As a final note: I would urge developers to avoid these purely deterministic ECDSA implementations, mostly because they make signing implementations very fragile. Since all of the “entropy” in this signing routine comes from the message and private key, this means that any non-determinism in the actual ECDSA implementation (e.g., caused by weird option flags or glitches in BigNum encoding) can potentially produce exploitable signatures if the same message is ever signed twice. (As Nadia points out: this could also happen if you install the same private key into two different wallet implementations and sign the same message.)

In either case: adding some genuine entropy to the nonce could remove these concerns in most cases. Section 3.6 of RFC 6979 describes how to do this by adding an additional random input.
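
In terms of the toy derive_nonce sketch above, that would just mean folding some fresh randomness into the seed, something like this (again, an illustration of the idea from Section 3.6, not its exact encoding):

import secrets
# (derive_nonce is the toy function from the earlier sketch.)

def derive_nonce_with_extra_entropy(private_key: int, msg_hash: bytes, n: int) -> int:
    # Same derivation as before, but the seed also absorbs fresh randomness,
    # so an accidental repeat of (key, message) no longer repeats the nonce.
    extra = secrets.token_bytes(32)
    return derive_nonce(private_key, msg_hash + extra, n)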

Key generation

Update (1/14): the comments below discuss key generation in elliptic, but MetaMask uses a BIP39-based keyphrase, which means that most ECDSA keys are actually generated elsewhere (allegedly in bitcoinjs/bip39). In retrospect this should have been obvious to me, but thanks to @kumavis_ for pointing it out. I’m keeping the section below just to be complete, but this code (probably) isn’t used by MetaMask.

The last stop on this very brief tour is the ECDSA key generation algorithm. This can be found slightly higher up in the same file:

Once again we have an appearance from our old friend HMAC-DRBG. As with the signing routine, the caller can supply an options structure. Fortunately this one only contains optional entropy, and does not allow the caller to completely override key generation. Even better: the input to HMAC-DRBG appears to include actual randomness, drawn from a call to a function called rand().

Does rand() produce actual random numbers in all browsers? I think so, but really could not tell you for certain. This routine is implemented by a package called indutny/brorand, the entirety of which I’m pasting below, just because it made me burst into maniacal laughter:

Let’s just assume the entropy is good. The remainder of the (private) key generation function takes place in these lines:

This generates a random number priv between [0…2^{256} – 1] and then makes sure that priv is not greater than the group order n, in which case it re-generates priv. It then adds 1, which I think gives us a priv in the range [1…n+1], which seems wrong, since it should be in the range [1…n-1]. Am I doing my arithmetic backwards?

A basic point to make here is that much of this review has come down to examining code that solves a single problem: (deterministically) sampling integers within a precise range. So far I’ve reviewed at least two different custom implementations of the same process, both with slightly different results. Why is the same code repeated so often? Just make this a subroutine so we can analyze it and be sure it’s working correctly.
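
For what it’s worth, the subroutine I’m asking for is only a few lines. Here’s a rough sketch (a production version would also want to worry about constant-time behavior and key encoding):

import secrets

def random_scalar(n: int) -> int:
    """Sample an integer uniformly from [1, n-1] using rejection sampling."""
    byte_len = (n.bit_length() + 7) // 8
    while True:
        candidate = int.from_bytes(secrets.token_bytes(byte_len), "big")
        if 1 <= candidate <= n - 1:
            return candidate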

Anyway the (possible) issues above probably don’t really matter. I’ll assume in most cases private keys are generated correctly using a secure random number generator, which takes the most obvious risks off the table.

Conclusions

Ok, I am basically just exhausted now. This was a lot of work to evaluate a tiny piece of a crypto library that, frankly, might not even be the actual crypto library that MetaMask is using. I’m not sure if any of this was worth it, and I’m starting to get indigestion from the Chipotle.

If I could summarize my overall findings above, they would look something like this:

  1. I do not like reviewing Javascript for many reasons, not least of which is the lack of types. Fortunately many MetaMask dependencies are written in TypeScript. This library should be too. (Fortunately Noble appears to have made this upgrade.)
  2. The stupid dependency chain between MetaMask and its crypto libraries makes reviews more difficult than they need to be, and adds too many parties that need to be trusted. This should be greatly simplified, unless there is a reason for it that I don’t understand.
  3. The crypto code may be well-written in a cryptographic sense, but it was not really optimized for humans to review. This makes any review much harder, and I think anything that makes reviews harder is bad for users.
  4. Allowing callers to specify dangerous optional arguments to crypto routines seems bad.
  5. If you’re writing a security-critical routine like “sample random integer in a precise range,” write that routine one time, not multiple times. Please!
  • Too much dependency on random numbers can make ECDSA dangerous. I would argue that too much determinism can be just as risky. It might be good to find a solution that mixes the two, as discussed further above.

If it was up to me I would re-write the entire codebase in TypeScript and would try to use more standard libraries. I would remove layers of dependencies. I would tighten up the crypto APIs to make sure malicious calls are harder to get away with. Finally, I would make sure every single major block in this code had crystal-clear comments explaining precisely what it is doing.

As a last point, I would move all of this code back under the control of some more centralized organization(s), rather than leaving essential code in random personal repositories.

In summary: I don’t think this tiny review was entirely a waste of time. Although I don’t love everything about the way this code works, I’m now 15% more confident that MetaMask isn’t doing something utterly stupid with my cryptographic keys. That seems like a genuine win.

Next post: I’m going to review the noble-cryptography libraries to see what the new MetaMask code is planning to do.

Statement on DMCA lawsuit

My name is Matthew Green. I am a professor of computer science and a researcher at Johns Hopkins University in Baltimore. I focus on computer security and applied cryptography.

Today I filed a lawsuit against the U.S. government, to strike down Section 1201 of the Digital Millennium Copyright Act. This law violates my First Amendment right to gather information and speak about an urgent matter of public concern: computer security. I am asking a federal judge to strike down key parts of this law so they cannot be enforced against me or anyone else.

A large portion of my work involves building and analyzing the digital security systems that make our modern technological world possible. These include security systems like the ones that protect your phone calls, instant messages, and financial transactions – as well as more important security mechanisms that safeguard property and even human life.

I focus a significant portion of my time on understanding the security systems that have been deployed by industry. In 2005, my team found serious flaws in the automotive anti-theft systems used in millions of Ford, Toyota and Nissan vehicles. More recently, my co-authors and I uncovered flaws in the encryption that powers nearly one third of the world’s websites, including Facebook and the National Security Agency. Along with my students, I’ve identified flaws in Apple’s iMessage text messaging system that could have allowed an eavesdropper to intercept your communications. And these are just a sampling of the public research projects I’ve been involved with.

I don’t do this work because I want to be difficult. Like most security researchers, the research I do is undertaken in good faith. When I find a flaw in a security system, my first step is to call the organization responsible. Then I help to get the flaw fixed. Such independent security research is an increasingly precious commodity. For every security researcher who investigates systems in order to fix them, there are several who do the opposite – and seek to profit from the insecurity of the computer systems our society depends on.

There’s a saying that no good deed goes unpunished. The person who said this should have been a security researcher. Instead of welcoming vulnerability reports, companies routinely threaten good-faith security researchers with civil action, or even criminal prosecution. Companies use the courts to silence researchers who have embarrassing things to say about their products, or who uncover too many of those products’ internal details. These attempts are all too often successful, in part because very few security researchers can afford a prolonged legal battle with a well-funded corporate legal team.

This might just be a sad story about security researchers, except for the fact that these vulnerabilities affect everyone. When security researchers are intimidated, it’s the public that pays the price. This is because real criminals don’t care about lawsuits and intimidation – and they certainly won’t bother to notify the manufacturer. If good-faith researchers aren’t allowed to find and close these holes, then someone else will find them, walk through them, and abuse them.

In the United States, one of the most significant laws that blocks security researchers is Section 1201 of the Digital Millennium Copyright Act (DMCA). This 1998 copyright law instituted a raft of restrictions aimed at preventing the “circumvention of copyright protection systems.” Section 1201 provides both criminal and civil penalties for people who bypass technological measures protecting a copyrighted work. While that description might bring to mind the copy protection systems that protect a DVD or an iTunes song, the law has also been applied to prevent users from reverse-engineering software to figure out how it works. Such reverse-engineering is a necessary part of effective security research.

Section 1201 poses a major challenge for me as a security researcher. Nearly every attempt to analyze a software-based system presents a danger of running afoul of the law. As a result, the first step in any research project that involves a commercial system is never science – it’s to call a lawyer; to ask my graduate students to sign a legal retainer; and to inform them that even with the best legal advice, they still face the possibility of being sued and losing everything they have. This fear chills critical security research.

Section 1201 also affects the way that my research is conducted. In a recent project – conducted in Fall 2015 – we were forced to avoid reverse-engineering a piece of software when it would have been the fastest and most accurate way to answer a research question. Instead, we decided to treat the system as a black box, recovering its operation only by observing inputs and outputs. This approach often leads to a less perfect understanding of the system, which can greatly diminish the quality of security research. It also substantially increases the time and effort required to finish a project, which reduces the quantity of security research.

Finally, I have been luckier than most security researchers in that I have access to legal assistance from organizations such as the Electronic Frontier Foundation. Not every security researcher can benefit from this.

The risk imposed by Section 1201 and the heavy cost of steering clear of it discourage me – and other researchers — from pursuing any project that does not appear to have an overwhelming probability of success. This means many projects that would yield important research and protect the public simply do not happen.

In 2015, I filed a request with the Library of Congress for a special exemption that would have exempted good faith security researchers from the limitations of Section 1201. Representatives of the major automobile manufacturers and the Business Software Alliance (a software industry trade group) vigorously opposed the request. This indicates to me that even reasonable good faith security testing is still a risky proposition.

This risk is particularly acute given that the exemption we eventually won was much more limited than what we asked for, and leaves out many of the technologies with the greatest impact on public health, privacy, and the security of financial transactions.

Section 1201 has prevented crucial security research for far too long. That’s why I’m seeking a court order that would strike Section 1201 from the books as a violation of the First Amendment.

A letter from US security researchers

This week a group of more than fifty prominent security and cryptography researchers signed a letter protesting the mass surveillance efforts of the NSA, and attempts by NSA to weaken cryptography and privacy protections on the Internet. The full letter can be found here.

Most of you have already formed your own opinions on the issue over the past several months, and it’s unlikely that one letter is going to change that. Nonetheless, I’d like a chance to explain why this statement matters.

For academic professionals in the information security field, the relationship with NSA has always been a bit complicated. However, for the most part the public side of that relationship has been generally positive. Up until 2013 if you’d asked most US security researchers for their opinions on NSA, you would, of course, have heard a range of views. But you also might have heard notes of (perhaps grudging) respect. This is because many of the NSA’s public activities have been obviously in everyone’s interest — helping to fund research and secure our information systems.

Even where evidence indicated the possibility of unfair dealing, most researchers were content to dismiss these allegations as conspiracy theories. We believed the NSA would stay between the lines. Putting backdoors into US information standards was possible, of course. But would they do it? We thought nobody would be that foolish. We were wrong.

In my opinion this letter represents more than just an appeal to conscience. It measures the priceless trust and goodwill the NSA has lost — and continues to lose while our country fails to make serious reforms to this agency.

While I’m certain the NSA itself will survive this loss of faith in the short term, in the long term our economic and electronic security depend very much on the cooperation of academia, industry and private citizens. The NSA’s actions have destroyed this trust. And ironically, that makes us all less safe.

Hey Amazon: Banning Security Researchers Isn’t Making Us Safer

Readers of this blog may recall that I’m a big fan of the RSA-key ‘cracking’ research of Nadia Heninger, Zakir Durumeric, Eric Wustrow and Alex Halderman. To briefly sum it up: these researchers scanned the entire Internet, discovering nearly 30,000 weak RSA keys installed on real devices. Which they then factored.
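For readers who haven’t seen the paper, here is a minimal sketch of why those keys were so easy to factor (my own toy illustration, not the authors’ code): many devices generated RSA moduli with bad randomness, so pairs of moduli out on the Internet end up sharing a prime factor, and a single GCD exposes it. The snippet below shows the two-key case with toy primes.

    import math

    # Two "weak" RSA moduli that share a prime factor p because of bad
    # randomness at key-generation time. (Toy primes for illustration only;
    # real keys use 512-bit or larger primes.)
    p, q1, q2 = 101, 103, 107
    n1, n2 = p * q1, p * q2

    # A single GCD recovers the shared prime, and both moduli factor instantly.
    shared = math.gcd(n1, n2)
    print(f"shared prime: {shared}")            # 101
    print(f"n1 = {shared} * {n1 // shared}")    # 101 * 103
    print(f"n2 = {shared} * {n2 // shared}")    # 101 * 107

At Internet scale a naive pairwise GCD over tens of thousands of moduli would be far too slow, which is why the paper instead uses a batch-GCD computation built on product and remainder trees.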

In the fast-paced world of security, this is already yesterday’s news. The problems have been responsibly disclosed and repaired, and the manufacturers have promised not to make, well, this particular set of mistakes again. The research even received the Best Paper award at Usenix Security.** So you might ask why I’m writing about it now. And the answer is: I’m not.

What I’m writing about today is not the research itself, but rather: the blowback from the research. You see, Heninger et al. were able to conduct their work mostly thanks to resources rented from Amazon’s Elastic Compute Cloud (EC2). And in response, Amazon has booted them off the service.

This is a real drag, and not just for the researchers in question.

Cloud services like EC2 are a huge resource for ethical security researchers. They help us to learn things about the Internet on a scale that we could never accomplish with the limited resources in most university labs. Cloud services also give us access to software and hardware that would be nigh on impossible to justify to a grant committee – stuff like GPU cluster instances, which are invaluable to cryptographers who want to run specialized cracking tasks.

But more importantly: the rise of cloud computing has given rise to a whole new class of security threats – things we never had to worry about before, like side-channel and covert-channel attacks between co-located VMs. Securing the cloud itself requires real-world analysis, and this means that researchers have to be trusted to do some careful, non-malicious work on actual platforms like EC2. Unfortunately, this is just the kind of research that the Heninger et al. ban could serve to discourage.

Now, I don’t pretend that I know all the details of this particular case. I haven’t spoken to the researchers about it, and although the paper makes their scan seem pretty benign, it’s always possible that it was more aggressive than it should have been.*

Moreover, I can’t challenge Amazon’s right to execute this ban. In fact, their Acceptable Use Policy explicitly prohibits security scans under a section titled ‘No Security Violations’:

  • Unauthorized Access. Accessing or using any System without permission, including attempting to probe, scan, or test the vulnerability of a System or to breach any security or authentication measures used by a System.

The question here is not whether Amazon can do this. It’s whether their — or anyone else’s — interests are being served by actually going through with such a ban. The tangible result of this one particular research effort is that thousands of vulnerable systems became secure. The potential result of Amazon’s ban is that millions of systems may remain insecure.

Am I saying that Amazon should let researchers run amok on their network? Absolutely not. But there has to be a balance between unfettered access and an outright ban. I think we’ll all be better off if Amazon can clearly articulate where that balance is, and provide us with a way to find it.

Update (9/3): Kenn White points me to this nice analysis of the public EC2 image-set. The authors mention that they worked closely with Amazon Security. So maybe this is a starting point.

Notes:

* Admittedly, this part is a little bit ambiguous in their paper. NMAP host discovery can be somewhere between a gentle poke and an ‘active scrub’, depending on the options you’ve set.

** In case you haven’t seen it, you may also want to check out Nadia’s (NSFW?) Usenix/CRYPTO rump session talk.

The first rule of vulnerability acknowledgement is: there is no vulnerability acknowledgement

Just for fun, today we’re going to look at two recent vulnerability acknowledgements. The first one’s pretty mild; on the Torino scale of vulnerability denial, it rates only about a three:

The research team notified Amazon about the issues last summer, and the company responded by posting a notice to its customers and partners about the problem. “We have received no reports that these vulnerabilities have been actively exploited,” the company wrote at the time. 

But this one from RSA, wow. The charts weren’t made for it. I suggest you read the entire interview, perhaps with a stiff drink to fortify you. I warn you, it only gets worse.

If our customers adopted our best practices, which included hardening their back-end servers, it would now become next to impossible to take advantage of any of the SecurID information that was stolen.

… We gave our customers best practices and remediation steps. We told our customers what to do. And we did it quickly and publicly. If the attackers had wanted to use SecurID, they would want to have done it quietly, effectively and under the covers. The fact that we announced the attack immediately, and the fact that we gave our customers these remediation steps, significantly disadvantaged the attackers from effectively using SecurID information.

… We think because we blew their cover we haven’t seen more evidence [of successful attacks].

I have a paper deadline midweek, so blogging will be light ’til then. Once that’s done, I’ll have something more substantial to say about all this.