Some thoughts about Anthropic’s new cryptanalysis results

Yesterday Anthropic published two new cryptanalysis results, both outputs of Claude Mythos, their (still) unreleased advanced model. The first of these results attacks a signature scheme called HAWK, while the second is an improved attack against reduced-round AES. Anthropic also released a blog post describing the research process that produced these results. A few people online have asked me what this all means. While I’m not sure I have all the answers, I figured it wouldn’t hurt to write a bit about my current understanding. These are only my thoughts and other folks will probably differ (including domain experts in the two areas at issue) so take them for what they are.

The two new results cover two very different areas, and are overall just very different in quality. Before we get to broad statements about the world, and whether you should sell all your cryptocurrency, let’s take a minute to talk about the substance.

Hawk. The first is a new key recovery algorithm against the non-standard signature scheme HAWK. HAWK is a proposed post-quantum-safe signature scheme that’s based on the module Lattice Isomorphism Problem (module-LIP). For a brief Claude-written summary of the result itself, see here. There are five things you need to know about this result:

HAWK is not a deployed or standards-adopted algorithm, it’s a proposed algorithm. It is related to the Falcon signature scheme, which is being standardized, but the attack does not transfer to that setting (which is based on a different hard problem.)
However, HAWK was somewhat far along in the process of being evaluated for a future standard.
The attack does not break “real deployed” HAWK in the sci-fi sense. The resulting attack is still exponential time, but roughly halves the number of “bits” of security in the algorithm. That means it could theoretically be fixed by doubling key sizes. The downside is that this makes the scheme less efficient, and, since HAWK is entirely motivated by being more efficient than alternatives, that makes the existence of the scheme much harder to justify.
The attack produced real code that runs in a few hours of wall-clock time against a weakened “challenge instance” of HAWK that the authors provided for this purpose. While this instance doesn’t use the parameters that were proposed for real deployment, it does demonstrate the cryptanalytic weakness well enough.
What’s particularly concerning (and so especially ripe for AI) is that the attack does not invent fundamentally new mathematics. It simply extends a bunch of tools that were lying around and well-known, and gets a good result.

This last part is important. I asked Claude for its thoughts, and it doesn’t mince words: “what makes this genuinely interesting — and, frankly, a little embarrassing for the field — is that none of the ingredients are exotic.” The TL;DR is that someone just did a much more thorough job applying all of our known tools. This is the sort of things that attack AIs excel at.

AES. The second result is a new attack on reduced-round AES. This result initially sounds more exciting, since most people hear “attack on AES” and panic. However, this is also the result that’s much, much less interesting.

Most folks reading this blog will know that AES is a standard block cipher that’s used just about everywhere. It’s been a standard since 2001, and the deployed version has so far withstood everything significant that’s been thrown at it: that includes a substantial amount of non-public testing performed by the NSA. Since attacking full ciphers is very difficult, it’s standard for cryptanalysts to do their work against weakened, or “reduced-round” versions of a cipher. The full AES cipher runs for either 10, 12 or 14 rounds depending on key size. The new Anthropic result attacks a weaker 7-round variant of the cipher.

Critically, attacks against 7-round AES are not new: there have been several of these. In fact, this new Anthropic result is a modest constant-factor improvement on previous work from back in 2013. To give you a sense of how far these attacks are from really “breaking” AES, I’d note the headline results: the new attack requires 2⁸⁹ cipher operations and, even worse, this work is only possible after you’ve somehow convinced a real encryptor to produce 2¹⁰⁵ encryptions of chosen plaintexts under their secret key! Neither of these things is remotely practical in the real world. And while the new result modestly speeds up this attack over the previous result, it’s not even clear how “real” the speedup in this result is: since the actual attack requires 2⁸⁹ operations and can’t really be “run”, what we have is an on-paper analysis that may or may not yield an actual runtime improvement if all details are actually worked out.

This does not make the result bad! In fact it’s still interesting from a techniques point of view. But it is very much a small increment in our knowledge, not a practical new attack like the HAWK work. So TL;DR: no wildly new mathematical results here. But still, real cryptanalytic progress of the sort that make scientists excited. And certainly the HAWK result is very meaningful, since that scheme had a real chance at standardization and is now (very likely) not going to be.

Now let’s talk about how we got here, and what it all means.

How did Anthropic get these results?

The Anthropic post is detailed about what they did, and honestly, it’s kind of hilarious. No, the team at Anthropic was not a large set of domain experts that carefully tuned their AI to find novel results. They appear to have just told it to get some results and then strapped its nose to the grindstone until it found some. If you doubt me, here are some examples of the prompts they used (cited from their post):

So yes, the AIs are getting pretty good. In short: they are now capable of understanding existing cryptanalysis results, synthesizing them into real new attacks, and even extending them. They can apparently do this without detailed human intervention. This isn’t yet super-intelligent cryptanalysis. but it’s pretty damn impressive.

Verifiability is now the bottleneck

As a researcher I’ve also been spending a lot of time with models, talking through various ideas. I don’t think I will surprise anyone when I say that they’re obviously getting better, even over the course of the past few months. While I don’t have Mythos and $100k to spend, I have been able to query at least one new advanced unreleased model, and I also have received some surprising new “results” to questions that I’ve been interested in for a few years.

Which brings me to the real problem: just because a model spits out an apparent new result, this does not mean the result is real. Even if models are good at producing real results, they’re much better at producing results that look real but are misleading. This can be enormously frustrating, and often means that human attention is more necessary than ever.

There are exceptions to this rule: for “full” attacks like HAWK, where the attack runs in a few hours (against a weaker version of the scheme), verification is extremely easy. You can just send over the code and let anyone check that it recovers keys and signs real chosen messages. For more subtle speedup attacks like the AES result, checking validity is not so easy. Here the approach is more specific: formally-verifiable Lean proofs can help here, but (even where these proofs are easy to make), such proofs are still highly sensitive to how you’ve formulated the theorem statement, and that often requires human experts to check.

You’ll probably notice that many of the exciting recent mathematical results have had this flavor: they either include a machine-checkable proof of a well-understood theorem, or (like the Jacobian conjecture) they involve finding a simple counterexample you can compute on. Alternatively, a bunch of experts spent a lot of time reviewing the result and were eventually convinced by it. This need for some humans to check the work is going to slow down our progress. For non-devastating examples of cryptanalysis, this is probably where we’re going to be for a while.

What are the implications for the real world?

The answer to this question really depends on whether you’re talking about the consumers of cryptography, scientists, or humanity at large. Let’s take these one at a time.

For users of cryptography: there are two pieces of good news and some mixed news. The first is that our symmetric ciphers are very messy and robust. Imagine a farmer who drags a tractor out into a patch of quicksand, and then buries it under cement. That’s what symmetric cipher design is like; it’s deliberately designed to come up with structures that are quick and easy to apply, but very messy and hard to untangle. The addition of many new raw intelligence-hours probably aren’t going to magically improve this. AIs may be able to eventually make real progress against these problems using entirely new techniques, but so far they’re not demonstrating the truly groundbreaking intuition that would be required to do so. And even if they do: there’s a good reason to hope that they’ll be able to improve the ciphers themselves to make them much harder to break.

Public-key cryptography is messier. Public-key crypto requires a mathematical object that admits fast calculations in one direction, but not the other, and yet has a convenient “trapdoor” that lets one party reverse the process. We humans have come up with only a handful of very conveniently-structured mathematical objects to enable this: they involve conjectured “hard problems” like the (EC)DLP problem, RSA, lattice problems, and various problems from the domain of coding theory. While we’ve given these our all, there simply have not been enough human beings dedicated to analyzing these problems (even the older ones like RSA!), such that we can be absolutely certain there are no more good attacks out there. This is particularly true for the novel areas like code-based crypto and even lattice-based cryptography.

That means there’s a lot of fertile ground for AIs to make real progress.

With that said, I said there was good news, and I meant it. Right now we’re in the midst of a historic transition from traditional public-key algorithms based on EC-based cryptography and RSA, moving over to new post-quantum algorithms based on novel problems. This is why there are so many standards like HAWK being considered. If there was ever a perfect time for a massive new public cryptanalysis capability to come on line, we’re in it. So unless AIs succeed in undermining all of our hard problems altogether (or we live in Impagliazzo’s Minicrypt) then this could not be a better time for AI to get good at cryptanalysis. In the best case, the result is that we gain real confidence in the problems we’ve identified, and the cryptanalysis literature gets a lot more robust. Hopefully.

For scientists: this is also a wonderful time. You now have a plastic pal who’s fun to be with, and you can talk over your hardest problems. At the same time it’s not yet smart enough that it can solve all of them without your assistance. And even better, the pace of new findings is speeding way up. This is mostly good! If you’re energetic. I still have many questions, like: “who should get credit for these new results” and “who will review all of these new results” but so far I’m not panicked. The world is getting modestly better. For now.

For the world: I don’t know. If you’re under the impression that these models are “glorified autocomplete” or that progress is slowing down, I need to urge you: stop thinking that. The models are very intelligent and capable, they are getting better at a fast clip. I can cite measurable and impressive progress over just the past five months on specific types of problem I’ve asked them to look at. If there’s a ceiling out there, I don’t yet see evidence of it. The people who think models are dumb are mostly using Google’s free AI search results, and not interacting with the high-end stuff (which only costs $20, so it’s not out of reach.) And they’re mostly not working in new areas.

On the other hand: if you think that models are super-intelligent or that AGI is already here, you should also stop thinking that. Working with these tools is like swimming in a pond where the ground drops off sharply. One minute you’re wading comfortably and there’s support under your feet. Then suddenly you cross a specific line, and you’re back to swimming on your own. This analogy is my best way to explain what it feels like when the model goes from helpful to clueless. Right now it’s easy for a human being to find that line if you’re doing advanced research, so you know where the intelligence drops off. But the line is moving. You can feel it slowly drifting outwards under your feet.

Whether this is good or bad depends whether you prefer that human beings should wade or swim, and also, whether you should be comfortable swimming in a pond where the ground itself is moving.

The only good news I can share with you is that we’re all in that same pond, scientists, lawyers, salespeople, even plumbers. Whatever happens next, it’s probably going to happen to us all. Let’s hope it’s a good thing.

A quick post on Chen’s algorithm

Update (April 19): Yilei Chen announced the discovery of a bug in the algorithm, which he does not know how to fix. This was independently discovered by Hongxun Wu and Thomas Vidick. At present, the paper does not provide a polynomial-time algorithm for solving LWE.

If you’re a normal person — that is, a person who doesn’t obsessively follow the latest cryptography news — you probably missed last week’s cryptography bombshell. That news comes in the form of a new e-print authored by Yilei Chen, “Quantum Algorithms for Lattice Problems“, which has roiled the cryptography research community. The result is now being evaluated by experts in lattices and quantum algorithm design (and to be clear, I am not one!) but if it holds up, it’s going to be quite a bad day/week/month/year for the applied cryptography community.

Rather than elaborate at length, here’s quick set of five bullet-points giving the background.

(1) Cryptographers like to build modern public-key encryption schemes on top of mathematical problems that are believed to be “hard”. In practice, we need problems with a specific structure: we can construct efficient solutions for those who hold a secret key, or “trapdoor”, and yet also admit no efficient solution for folks who don’t. While many problems have been considered (and often discarded), most schemes we use today are based on three problems: factoring (the RSA cryptosystem), discrete logarithm (Diffie-Hellman, DSA) and elliptic curve discrete logarithm problem (EC-Diffie-Hellman, ECDSA etc.)

(2) While we would like to believe our favorite problems are fundamentally “hard”, we know this isn’t really true. Researchers have devised algorithms that solve all of these problems quite efficiently (i.e., in polynomial time) — provided someone figures out how to build a quantum computer powerful enough to run the attack algorithms. Fortunately such a computer has not yet been built!

(3) Even though quantum computers are not yet powerful enough to break our public-key crypto, the mere threat of future quantum attacks has inspired industry, government and academia to join forces Fellowship-of-the-Ring-style in order to tackle the problem right now. This isn’t merely about future-proofing our systems: even if quantum computers take decades to build, future quantum computers could break encrypted messages we send today!

(4) One conspicuous outcome of this fellowship is NIST’s Post-Quantum Cryptography (PQC) competition: this was an open competition designed to standardize “post-quantum” cryptographic schemes. Critically, these schemes must be based on different mathematical problems — most notably, problems that don’t seem to admit efficient quantum solutions.

(5) Within this new set of schemes, the most popular class of schemes are based on problems related to mathematical objects called lattices. NIST-approved schemes based on lattice problems include Kyber and Dilithium (which I wrote about recently.) Lattice problems are also the basis of several efficient fully-homomorphic encryption (FHE) schemes.

This background sets up the new result.

Chen’s (not yet peer-reviewed) preprint claims a new quantum algorithm that efficiently solves the “shortest independent vector problem” (SIVP, as well as GapSVP) in lattices with specific parameters. If it holds up, the result could (with numerous important caveats) allow future quantum computers to break schemes that depend on the hardness of specific instances of these problems. The good news here is that even if the result is correct, the vulnerable parameters are very specific: Chen’s algorithm does not immediately apply to the recently-standardized NIST algorithms such as Kyber or Dilithium. Moreover, the exact concrete complexity of the algorithm is not instantly clear: it may turn out to be impractical to run, even if quantum computers become available.

But there is a saying in our field that attacks only get better. If Chen’s result can be improved upon, then quantum algorithms could render obsolete an entire generation of “post-quantum” lattice-based schemes, forcing cryptographers and industry back to the drawing board.

In other words, both a great technical result — and possibly a mild disaster.

As previously mentioned: I am neither an expert in lattice-based cryptography nor quantum computing. The folks who are those things are very busy trying to validate the writeup: and more than a few big results have fallen apart upon detailed inspection. For those searching for the latest developments, here’s a nice writeup by Nigel Smart that doesn’t tackle the correctness of the quantum algorithm (see updates at the bottom), but does talk about the possible implications for FHE and PQC schemes (TL;DR: bad for some FHE schemes, but really depends on the concrete details of the algorithm’s running time.) And here’s another brief note on a “bug” that was found in the paper, that seems to have been quickly addressed by the author.

Up until this week I had intended to write another long wonky post about complexity theory, lattices, and what it all meant for applied cryptography. But now I hope you’ll forgive me if I hold onto that one, for just a little bit longer.

On Ashton Kutcher and Secure Multi-Party Computation

Back in March I was fortunate to spend several days visiting Brussels, where I had a chance to attend a panel on “chat control“: the new content scanning regime being considered by the EU Commission. Among various requirements, this proposed legislation would mandate that client-side scanning technology be incorporated into encrypted text messaging applications like Signal, WhatsApp and Apple’s iMessage. The scanning tech would examine private messages for certain types of illicit content, including child sexual abuse media (known as CSAM), along with a broad category of textual conversations that constitute “grooming behavior.”

I have many thoughts about the safety of the EU proposal, and you can read some of them here. (Or you’re interested in the policy itself, you can read this recent opinion by the EU’s Council’s Legal Service.) But although the EU proposal is the inspiration for today’s post, it’s not precisely what I want to talk about. Instead, I’d like to clear up some confusion I’ve noticed around the specific technologies that many have proposed to use for building these systems.

Also: I want to talk about Ashton Kutcher.

*Ashton Kutcher visits the EU parliament in March 2023 (photo: Roberta Metsola.)*

It turns out there were a few people visiting Brussels to talk about encryption this March. Only a few days before my own visit, Ashton Kutcher gave a major speech to EU Parliament members in support of the Commission’s content scanning proposal. (And yes, I’m talking about that Ashton Kutcher, the guy who played Steve Jobs and is married to Mila Kunis.)

Kutcher has been very active in the technical debate around client-side scanning. He’s the co-founder of an organization called Thorn, which aims to develop cryptographic technology to enable content scanning. In March he gave an impassioned speech to the EU Parliament urging the deployment of these technologies, and remarkably he didn’t just talk about the policy side of things. When asked how to balance user privacy against the needs of scanning, he even made a concrete technical proposal: to use fully-homomorphic encryption (FHE) as a means to evaluate encrypted messages.

Now let me take a breath here before my head explodes.

I promise I am not one of those researchers who believes only subject-matter experts should talk about cryptography. Really I’m not! I write this blog because I think cryptography is amazing and I want everyone talking about it all the time. Seeing mainstream celebrities toss around phrases like “homomorphic encryption” is literally a dream come true and I wish it happened every single day.

And yet, there are downsides to this much winning.

I ran face-first into some of those downsides when I spoke to policy experts about Kutcher’s proposal. Terms like fully homomorphic encryption can be confusing and off-putting to non-cryptographers. When filtered through people who are not themselves experts in the technology, these ideas can produce the impression that cryptography is magical pixie dust we can sprinkle on all the hard problems in the world. And oh how I wish that were true. But in the real world, cryptography is full of tradeoffs. Solving one problem often just makes new problems, or creates major new costs, or else shifts the risks and costs to other parts of the system.

So when people on various sides of the debate asked me whether “fully-homomorphic encryption” could really do what Kutcher said it would, I couldn’t give an easy five-word answer. The real answer is something like: (scream emoji) it’s very complicated. That’s a very unsatisfying thing to have to tell people. Out here in the real world the technical reality is eye-glazing and full of dragons.

Which brings me to this post.

What Kutcher is really proposing is that we to develop systems that perform privacy-preserving computation on encrypted data. He wants to use these systems to enable “private” scanning of your text messages and media attachments, with the promise that these systems will only detect the “bad” content while keeping your legitimate private data safe. This is a complicated and fraught area of computer science. In what goes below, I am going to discuss at a high and relatively non-technical level the concepts behind it: what we can do, what we can’t do, and how practical it all is.

In the process I’ll discuss the two most powerful techniques that we have developed to accomplish this task: namely, multi-party computation (MPC) and, as an ingredient towards achieving the former, fully-homomorphic encryption (FHE). Then I’ll try to clear up the relationship between these two things, and explain the various tradeoffs that can make one better than the other for specific applications. Although these techniques can be used for so many things, throughout this post I’m going to focus on the specific application being considered in the EU: the use of privacy-preserving computation to conduct content scanning.

This post will not require any mathematics or much computer science, but it will require some patience. So find a comfortable chair and buckle in.

Computing on private data

Encryption is an ancient technology. Roughly speaking, it provides the ability to convert meaningful messages (and data) into a form that only you, and your intended recipient(s) can read. In the modern world this is done using public algorithms that everyone can look at, combined with secret keys that are held only by the intended recipients.

Modern encryption is really quite excellent. So as long as keys are kept safe, encrypted data can be sent over insecure networks or stored in risky locations like your phone. And while occasionally people find a flaw in an implementation of encryption, the underlying technology works very well.

But sometimes encryption can get in the way. The problem with encrypted data is that it’s, well, encrypted. When stored in this form, such data is virtually useless for practical purposes like performing calculations. Before you can compute on that data, you often need to first decrypt it and thus remove all the beautiful protections we get from encryption.

If your goal is to compute on multiple pieces of data that originate from different parties, the problem can become even more challenging. Who can we trust to do the computing? An obvious solution is to decrypt all that data and hand it to one very trustworthy person, who will presumably swear not to steal it or get hacked. Finding those parties can be quite challenging.

Fortunately for us all, the first academic cryptographers also happened to be computer scientists, and so this was exactly the sort of problem that excited them. Those researchers quickly devised a set of specific and general techniques designed to solve these problems, and also came up with a cool name for them: secure multi-party computation, or MPC for short.

MPC: secure private computation (in six eight ten paragraphs)

The setting of MPC is fairly simple: imagine that we have two (or more!) parties that each have some private data they don’t want to give to anyone else. Yet each of the parties is willing to provide their data as input to some specific computation, and are all willing to reveal the output of that computation — either to everyone involved, or perhaps just to some agreed subset of the parties. Can these parties now perform the computation jointly, without appointing a trusted party?

Let’s make this easier by using a concrete example.

Imagine a group of workers all know their own salaries, but don’t know anything about anyone else’s salary. Yet they wish to compute some statistics over their salary data: for example, the average of their salaries. These workers aren’t willing to share their own salary data with anyone else, but they are willing to submit it as one input in a large calculation under the strict condition that only the final result is ever revealed.

This might seem contrived to you, but it is in fact a real problem that some people have used MPC to solve.

An MPC protocol allows the workers to do this, without appointing a trusted central party or revealing their inputs (and intermediate calculations) to anyone else. At the conclusion of the protocol each party will learn only the result of the calculation:

*The “cloud” at the center of this diagram is actually a complicated protocol where every party sends messages to every other party.*

MPC protocols typically provide strong provable guarantees about their properties. The details vary, but typically speaking: no party will learn anything about the other parties’ inputs. Indeed they won’t even learn any partial information that might be produced during the calculation. Even better, all parties can be assured that the result will be correct: as long as all parties submit valid inputs to the computation, none of them should be able to force the calculation to go awry.

Now obviously there are caveats.

In practice, using MPC is a bit like making a deal with a genie: you need to pay very careful attention to the fine print. Even when the cryptography works perfectly, this does not mean that computing your function is actually “safe.” In fact, it’s entirely possible to choose functions that when computed securely are still devastating to your privacy.

For example: imagine that I use an MPC protocol to compute an average salary between myself and exactly one other worker. This could be a very bad idea! Note that if the other worker is curious, then she can figure out how much I make. That is: the average of our two wages reveals enough information that she find my wage given knowledge of her own input. This (obvious) caveat applies to many other uses of MPC, even when the technology works perfectly.

This is not a criticism of MPC, just the observation that it’s a tool. In practice, MPC (or any other cryptographic technology) is not a privacy solution by itself, at least not in the sense of privacy that real-world human beings like to think about. It provides certain guarantees that may or may not be useful for providing privacy in the real world.

What does MPC have to do with client-side scanning?

We began this post by discussing client-side scanning for encrypted messaging apps. This is a perfect example of an application that fits the MPC (or two-party computation) use-case perfectly. That’s because in this setting we generally have multiple parties with secret data who want to perform some joint calculation on their inputs.

In this setting, the first party is typically a client (usually a person using an encrypted messaging app like WhatsApp or Signal), who possesses some private text message or perhaps a media file they wish to send to another user. Under proposed law in the EU, their app could be legally mandated to “scan” that image to see if it contains illegal content.

According to the EU Commission, this scanning can be done in a variety of ways. For example, the device could compare an images against a secret database of known illicit content (typically using a specialized perceptual hash function.) However, while the EU plan starts there, their plans also get much more ambitious: they also intend to look for entirely new instances of illicit content as well as textual “grooming” conversations, possibly using machine learning (ML) models, that is, deep neural networks that will be trained to recognize data that fits these patterns. These various models must be sophisticated enough to understand entirely new images, as well as to derive meaning from complex interactive human conversation. None of this is likely to be very simple.

Now most of this could be done using standard techniques on the client device, except for one major limitation. The challenge in this setting is that the provider doing the scanning usually wants to keep these hashes and/or ML models secret.

There are several reasons for this. The first reason is that knowledge of the scanning model (or database of illicit content) makes it relatively easy for bad actors to evade the model. In other words, with only modest transformations it’s possible to modify “bad” images so that they become invisible to ML scanners.

Knowledge of the model can also allow for the creation of “poisoned” imagery: these include apparently-benign images (e.g., a picture of a cat) that trigger false positives in the scanner. (Indeed this such “colliding” images have been developed for some hash-based CSAM scanning proposals.) More worryingly, in some cases the hashes and neural network models can be “reversed” to extract the imagery and textual content they were trained on: this has all kinds of horrifying implications, and could expose abuse victims to even more trauma.

So here the user doesn’t want to send its confidential data to the provider for scanning, and the provider doesn’t want to hand its confidential model parameters to the user (or even to expose them inside the user’s phone, where they might be stolen by reverse-engineers.) This is exactly the situation that MPC was designed to handle:

Sketch of a client-side scanning architecture that uses (two-party) MPC between the client and the Provider. The client inputs the content to be scanned, while the server provides its secret model and/or hash database. The protocol gives the provider a copy of the user’s content if and only if the model says it’s illicit content, otherwise the provider sees nothing. (Note in this variant, the output goes only to the Provider.)

This makes everything very complicated. In fact, there has only been one real-world proposal for client-side CSAM scanning that has ever come (somewhat) close to deployment: that system was designed by Apple for a (now abandoned) client-side photo scanning plan. The Apple approach is cryptographically very ambitious: it uses neural-network based perceptual hashing, and otherwise exactly follows the architecture described above. However, critically: it relied on a neural-network based hash function that was not kept secret. Disastrous results ensued (see further below.)

(If you’re interested in getting a sense of how complex this protocol is, here is a white paper describing how it works.)

A diagram from the Apple CSAM scanning protocol.

Ok, so what kind of MPC protocols are available to us?

Multi-party computation is a broad category. It describes a class of protocols. In practice there are many different cryptographic techniques that allow us to realize it. Some of these (like the Apple protocol) were designed for specific applications, while others are capable of performing general-purpose computation.

I promised this post would not go into the weeds, but it’s worth pointing out that general MPC techniques typically make use of (some combination of) three different techniques: secret sharing, circuit garbling, and homomorphic encryption. Often, efficient modern systems will use a mixture of two or three of those techniques, ~~just to make everything more confusing~~ because they’re to maximize efficiency.

What is it that you need to know about these techniques? Here I’ll try, in a matter of a few sentences (that will draw me endless grief) to try to summarize the strengths and disadvantages of each technique.

Both secret sharing and garbling techniques share a common feature, which is that they require a great deal of data to be sent between the parties. In practice the amount of data sent between the parties will grow with (at least) the size of the inputs they’re computing on, but often will grow according to the complexity of the calculation they’re performing. For things like deep neural networks where both the data and calculation are huge, this generally results in fairly surprising amounts of data transfer.

This is not usually considered to be a problem on the general Internet or within EC2 datacenters, where data transfer is cheap. It can be quite a challenge when one of those parties is using a cellphone, however. That makes any scheme using these technologies subject to some very serious limitations.

Homomorphic encryption schemes take a different approach. These systems make use of specialized encryption schemes that are malleable. This means that encrypted data can be “modified” in useful ways without ever decrypting it.

In a bit more detail: in fully-homomorphic encryption MPC systems, a first party can encrypt its data under a public key that it generates. It can then send the encrypted data to a second party. This second party can then perform calculations on the ciphertext while it is still encrypted — adding and multiplying it together with other data (including data encrypted by the second party) to perform some calculation. Throughout this process all of the data remains encrypted. At the conclusion of this process, the second party will end up with a “modified” ciphertext that internally contains a final calculation result, but that it cannot read. To finish the protocol, the second party can send that ciphertext back to the first party, who can then decrypt it using its secret key and obtain the final output.

The major upshot of the pure-FHE technique is that it substantially reduces the amount of data that the two parties need to transmit between them, especially compared to the other MPC techniques. The downside of this approach is… well, there are several. One is that FHE calculations typically require vastly more computational effort (and hence time and carbon emissions) than the other techniques. Moreover, they may still require a good deal of data transfer — in part because the number of calculations that one can perform on a given ciphertext is usually limited by “noise” that turns up within the ciphertext. Hence, calculations must either be very simple or else broken up into “phases”, where the partial calculation result is decrypted and re-encrypted so that more computation can be done. This can be done interactively between the parties, or by the second party alone (using a technique called “bootstrapping”) but in both cases the cost is either much more bandwidth exchanged or a great deal of extra computation.

In practice, cryptographers rarely commit to a single approach. They instead combine all these techniques in order to achieve an appropriate balance of data-transfer and computational effort. These “mixed systems” tend to have merely large amounts of data transfer and large amounts of computation, but are still amazingly efficient compared to the alternatives.

For an example of this, consider this very optimized two-party MPC scheme aimed at performing neural network classification. This scheme takes (from the client) a 32×32 image, and evaluates a tiny 7-layer neural network held by a server in order to perform classification. As you can see, evaluating the model even on a single image requires about 8 seconds of computation and 200 megabytes of bandwidth exchange, for each image being evaluated:

Source: MUSE paper, figure 8. These are the times for a 7-layer MiniONN network trained on the CIFAR-10 dataset.

These numbers may seem quite high, but in fact they’re actually really impressive as these things go. Previous systems used nearly an order of magnitude more time and bandwidth to do their work. Maybe there will be further improvements in the future! Even on a pure efficiency basis there is much work to be done.

What are the other risks of MPC in this setting?

The final point I would like to make is that secure MPC (or MPC built using FHE as a tool) is not itself enough to satisfy the requirements of a safe content scanning system. As I mentioned above, MPC systems merely evaluate some function on private data. The question of whether that function is safe is left largely to the system designer.

In the case of these content scanning systems, the safety of the resulting system really comes down to a question of whether the algorithms work well, particularly in settings where “bad guys” can find adversarial inputs that try to disrupt the system. It also requires new techniques to ensure that the system cannot be abused. That is: there must be guarantees within the computation to ensure that the provider (or a party who hacks the provider) cannot change the model parameters to allow them to access your private content.

This is a much longer conversation than I want to have in this post, because it fundamentally requires one to think about whether the entire system makes sense. For a much longer discussion of the risks, see this paper.

This was nice, but I would like to learn more about each of these technologies!

The purpose of this post was just to give the briefest explanation of the techniques that exist for performing all of these calculations. If you’re interested in knowing (a lot more!) about these technologies, take a look at this textbook by Evans, Kolesnikov and Rosulek. MPC is an exciting area, and one that is advancing every single (research) conference cycle.

And maybe that is the lesson of this post: these technologies are still research techniques. It’s probably not quite time to put them out in the world.

A few thoughts on CSRankings.org

(Warning: nerdy inside-baseball academic blog post follows. If you’re looking for exciting crypto blogging, try back in a couple of days.)

If there’s one thing that academic computer scientists love (or love to hate), it’s comparing themselves to other academics. We don’t do what we do for the big money, after all. We do it — in large part — because we’re curious and want to do good science. (Also there’s sometimes free food.) But then there’s a problem: who’s going to tell is if we’re doing good science?

To a scientist, the solution seems obvious. We just need metrics. And boy, do we get them. Modern scientists can visit Google Scholar to get all sorts of information about their citation count, neatly summarized with an “H-index” or an “i10-index”. These metrics aren’t great, but they’re a good way to pass an afternoon filled with self-doubt, if that’s your sort of thing.

But what if we want to do something more? What if we want to compare institutions as well as individual authors? And even better, what if we could break those institutions down into individual subfields? You could do this painfully on Google Scholar, perhaps. Or you could put your faith in the abominable and apparently wholly made-up U.S. News rankings, as many academics (unfortunately) do.

Alternatively, you could actually collect some data about what scientists are publishing, and work with that.

This is the approach of a new site called “Computer Science Rankings”. As best I can tell, CSRankings is largely an individual project, and doesn’t have the cachet (yet) of U.S. News. At the same time, it provides researchers and administrators with something they love: another way to compare themselves, and to compare different institutions. Moreover, it does so with real data (rather than the Ouija board and blindfold that U.S. News uses). I can’t see it failing to catch on.

And that worries me, because the approach of CSRankings seems a bit arbitrary. And I’m worried about what sort of things it might cause us to do.

You see, people in our field take rankings very seriously. I know folks who have moved their families to the other side of the country over a two-point ranking difference in the U.S. News rankings — despite the fact that we all agree those are absurd. And this is before we consider the real impact on salaries, promotions, and awards of rankings (individual and institutional). People optimize their careers and publications to maximize these stats, not because they’re bad people, but because they’re (mostly) rational and that’s what rankings inspire rational people do.

To me this means we should think very carefully about what our rankings actually say.

Which brings me to the meat of my concerns with CSRankings. At a glance, the site is beautifully designed. It allows you to look at dozens of institutions, broken down by CS subfield. Within those subfields it ranks institutions by a simple metric: adjusted publication counts in top conferences by individual authors.

The calculation isn’t complicated. If you wrote a paper by yourself and had it published in one of the designated top conferences in your field, you’d get a single point. If you wrote a paper with a co-author, then you’d each get half a point. If you wrote a paper that doesn’t appear in a top conference, you get zero points. Your institution gets the sum-total of all the points its researchers receive.

If you believe that people are rational actors optimize for rankings, you might start to see the problem.

First off, what CSRankings is telling us is that we should ditch those pesky co-authors. If I could write a paper with one graduate student, but a second student also wants to participate, tough cookies. That’s the difference between getting 1/2 a point and 1/3 of a point. Sure, that additional student might improve the paper dramatically. They might also learn a thing or two. But on the other hand, they’ll hurt your rankings.

(Note: currently on CSRankings, graduate students at the same institution don’t get included in the institutional rankings. So including them on your papers will actually reduce your school’s rank.)

I hope it goes without saying that this could create bad incentives.

Second, in fields that mix systems and theory — like computer security — CSRankings is telling us that theory papers (which typically have fewer authors) should be privileged in the rankings over systems papers. This creates both a distortion in the metrics, and also an incentive (for authors who do both types of work) to stick with the one that produces higher rankings. That seems undesirable. But it could very well happen if we adopt these rankings uncritically.

Finally, there’s this focus on “top conferences”. One of our big problems in computer science is that we spend a lot of our time scrapping over a very limited number of slots in competitive conferences. This can be ok, but it’s unfortunate for researchers whose work doesn’t neatly fit into whatever areas those conference PCs find popular. And CSRankings gives zero credit for publishing anywhere but those top conferences, so you might as well forget about that.

(Of course, there’s a question about what a “top conference” even is. In Computer Security, where I work, CSRankings does not consider NDSS to be a top conference. That’s because only three conferences are permitted for each field. The fact that this number seems arbitrary really doesn’t help inspire a lot of confidence in the approach.)

So what can we do about this?

As much as I’d like to ditch rankings altogether, I realize that this probably isn’t going to happen. Nature abhors a vacuum, and if we don’t figure out a rankings system, someone else will. Hell, we’re already plagued by U.S. News, whose methodology appears to involve a popcorn machine and live tarantulas. Something, anything, has to be better than this.

And to be clear, CSRankings isn’t a bad effort. At a high level it’s really easy to use. Even the issues I mention above seem like things that could be addressed. More conferences could be added, using some kind of metric to scale point contributions. (This wouldn’t fix all the problems, but would at least mitigate the worst incentives.) Statistics could perhaps be updated to adjust for graduate students, and soften the blow of having co-authors. These things are not impossible.

And fixing this carefully seems really important. We got it wrong in trusting U.S. News. What I’d like is this time for computer scientists to actually sit down and think this one out before someone imposes a ranking system on top of us. What behaviors are we trying to incentivize for? Is it smaller author lists? Is it citation counts? Is it publishing only in a specific set of conferences?

I don’t know that anyone would agree uniformly that these should be our goals. So if they’re not, let’s figure out what they really are.

A Few Thoughts on Cryptographic Engineering

Some random thoughts about crypto. Notes from a course I teach. Pictures of my dachshunds.

Category: academics