Some random thoughts about crypto. Notes from a course I teach. Pictures of my dachshunds.
Author: Matthew Green
I'm a cryptographer and professor at Johns Hopkins University. I've designed and analyzed cryptographic systems used in wireless networks, payment systems and digital content protection platforms. In my research I look at the various ways cryptography can be used to promote user privacy.
On Monday a team of researchers from Münster, RUB and NXP disclosed serious cryptographic vulnerabilities in a number of encrypted email clients. The flaws, which go by the cute vulnerability name of “Efail”, potentially allow an attacker to decrypt S/MIME or PGP-encrypted email with only minimal user interaction.
By the standards of cryptographic vulnerabilities, this is about as bad as things get. In short: if an attacker can intercept and alter an encrypted email — say, by sending you a new (altered) copy, or modifying a copy stored on your mail server — they can cause many GUI-based email clients to send the full plaintext of the email to an attacker controlled-server. Even worse, most of the basic problems that cause this flaw have been known for years, and yet remain in clients.
The big (and largely under-reported) story of EFail is the way it affects S/MIME. That “corporate” email protocol is simultaneously (1) hated by the general crypto community because it’s awful and has a slash in its name, and yet (2) is probably the most widely-used email encryption protocol in the corporate world. The table at the right — excerpted from the paper — gives you a flavor of how Efail affects S/MIME clients. TL;DR it affects them very badly.
Efail also happens to affect a smaller, but non-trivial number of OpenPGP-compatible clients. As one might expect (if one has spent time around PGP-loving folks) the disclosure of these vulnerabilities has created something of a backlash on HN, and among people who make and love OpenPGP clients. Mostly for reasons that aren’t very defensible.
So rather than write about fun things — like the creation of CFB and CBC gadgets — today, I’m going to write about something much less exciting: the problem of vulnerabilitydisclosure in ecosystems like PGP. And how bad reactions to disclosure can hurt us all.
How Efail was disclosed to the PGP community
Putting together a comprehensive timeline of the Efail disclosure process would probably be a boring, time-intensive project. Fortunately Thomas Ptacek loves boring and time-intensive projects, and has already done this for us.
Briefly, the first Efail disclosures to vendors began last October, more than 200 days prior to the agreed publication date. The authors notified a large number of vulnerable PGP GUI clients, and also notified the GnuPG project (on which many of these projects depend) by February at the latest. From what I can tell every major vendor agreed to make some kind of patch. GnuPG decided that it wasn’t their fault, and basically stopped corresponding.
All parties agreed not to publicly discuss the vulnerability until an agreed date in April, which was later pushed back to May 15. The researchers also notified the EFF and some journalists under embargo, but none of them leaked anything. On May 14 someone dumped the bug onto a mailing list. So the EFF posted a notice about the vulnerability (which we’ll discuss a bit more below), and the researchers put up a website. That’s pretty much the whole story.
There are three basic accusations going around about the Efail disclosure. They can be summarized as (1) maintaining embargoes in coordinated disclosures is really hard, (2) the EFF disclosure “unfairly” made this sound like a serious vulnerability “when it isn’t”, and (3) everything was already patched anyway so what’s the big deal.
Disclosures are hard; particularly coordinated ones
I’ve been involved in two disclosures of flaws in open encryption protocols. (Both were TLS issues.) Each one poses an impossible dilemma. You need to simultaneously (a) make sure every vendor has as much advance notice as possible, so they can patch their software. But at the same time (b)you need to avoid tellingliterallyanyone, because nothing on the Internet stays secret. At some point you’ll notify some FOSS project that uses an open development mailing list or ticket server, and the whole problem will leak out into the open.
Disclosing bugs that affect PGP is particularly fraught. That’s because there’s no such thing as “PGP”. What we have instead is a large and distributed community that revolves around the OpenPGP protocol. The pillar of this community is the GnuPG project, which maintains the core GnuPG tool and libraries that many clients rely on. Then there are a variety of niche GUI-based clients and email plugin projects. Finally, there are commercial vendors like Apple and Microsoft. (Who are mostly involved in the S/MIME side of things, and may reluctantly allow PGP plugins.)
Then, of course there are thousands of end-users, who will generally fail to update their software unless something really bad and newsworthy happens.
The obvious solution to the disclosure problem to use a staged disclosure. You notify the big commercial vendors first, since that’s where most of the affected users are. Then you work your way down the “long tail” of open source projects, knowing that inevitably the embargo could break and everyone will have to patch in a hurry. And you keep in mind that no matter what happens, everyone will blame you for screwing up the disclosure.
For the PGP issues in Efail, the big client vendors are Mozilla (Thunderbird), Microsoft (Outlook) and maybe Apple (Mail). The very next obvious choice would be to patch the GnuPG tool so that it no longer spits out unauthenticated plaintext, which is the root of many of the problems in Efail.
The Efail team appears to have pursued exactly this approach for the client-side vulnerabilities. Sadly, the GnuPG team made the decision that it’s not their job to pre-emptively address problems that they view as ‘clients misusing the GnuPG API’ (my paraphrase), even when that misuse appears to be rampant across many of the clients that use their tool. And so the most obvious fix for one part of the problem was not available.
This is probably the most unfortunate part of the Efail story, because in this case GnuPG is very much at fault. Their API does something that directly violates cryptographic best practices— namely, releasing unauthenticated plaintext prior to producing an error message. And while this could be understood as a reasonable API design at design time, continuing to support this API even as clients routinely misuse it has now led to flaws across the ecosystem. The refusal of GnuPG to take a leadership role in preemptively safeguarding these vulnerabilities both increases the difficulty of disclosing these flaws, and increases the probability of future issues.
So what went wrong with the Efail disclosure?
Despite what you may have heard, given the complexity of this disclosure, very little went wrong. The main issues people have raised seem to have to do with the contents of an EFF post. And with some really bad communications from Robert J. Hansen at the Enigmail (and GnuPG) project.
The EFF post. The Efail researchers chose to use the Electronic Frontier Foundation as their main source for announcing the existence of the vulnerability to the privacy community. This hardly seems unreasonable, because the EFF is generally considered a trusted broker, and speaks to the right community (at least here in the US).
The EFF post doesn’t give many details, nor does it give a list of affected (or patched) clients. It does give two pretty mild recommendations:
Temporarily disable or uninstall your existing clients until you’ve checked that they’re patched.
Maybe consider using a more modern cryptosystem like Signal, at least until you know that your PGP client is safe again.
This naturally led to a huge freakout by many in the PGP community. Some folks, including vendors, have misrepresented the EFF post as essentially pushing people to “permanently” uninstall PGP, which will “put lives at risk” because presumably these users (whose lives are at risk, remember) will immediately fall back to sending incriminating information via plaintext emails — rather than temporarily switching their communications to one of several modern, well-studied secure messengers, or just not emailing for a few hours.
The most reasonable criticism I’ve heard of the EFF post is that it doesn’t give many details about which clients are patched, and which are vulnerable. This could presumably give someone the impression that this vulnerability is still present in their email client, and thus would cause them to feel less than secure in using it.
I have to be honest that to me that sounds like a really good outcome. The problem with Efail is that it doesn’t matter if your client is secure. The Efail vulnerability could affect you if even a single one of your communication partners is using an insecure client.
So needless to say I’m not very sympathetic to the reaction around the EFF post. If you can’t be sure whether your client is secure, you probably should feel insecure.
Bad communications from GnuPG and Enigmail. On the date of the disclosure, anyone looking for accurate information about security from two major projects — GnuPG and Enigmail — would not have been able to find it.
They wouldn’t have found it because developers from both Enigmail and GnuPGwere on mailing lists and Twitter claiming that they had never heard of Efail, and hadn’t been notified by the researchers. Needless to say, these allegations took off around the Internet, sometimes in place of real information that could have helped users (like, whether either project had patched.)
It goes without saying that neither allegation was actually true. In fact, both project members soon checked with their fellow developers (and their memories) and found out that they’d both been given months of notice by the researchers, and that Enigmail had even developed a patch. (However, it turned out that even this patch may not perfectly address the issue, and the community is still working to figure out exactly what still needs to be done.)
This is an understandable mistake, perhaps. But it sure is a bad one.
PGP is bad technology and it’s making a bad community
Now that I’ve made it clear that neither the researchers nor the EFF is out to get the PGP community, let me put on my mask and horns and tell you why someone should be.
I’ve written extensively about PGP on this blog, but in the past I’ve written mostly from a technical point of view about the problems with PGP. But what’s really problematic about PGP is not just the cryptography; it’s the story it tells about path dependence and how software communities work.
The fact of the matter is that OpenPGP is not really a cryptography project. That is, it’s not held together by cryptography. It’s held together by backwards-compatibility and (increasingly) a kind of an obsession with the idea of PGP as an end in and of itself, rather than as a means to actually make end-users more secure.
Let’s face it, as a protocol, PGP/OpenPGP is just not what we’d develop if we started over today. It was formed over the years out of mostly experimental parts, which were in turn replaced, bandaged and repaired — and then worked into numerous implementations, which all had to be insanely flexible and yet compatible with one another. The result is bad, and most of the software implementing it is worse. It’s the equivalent of a beloved antique sports car, where the electrical system is totally shot, but it still drives. You know, the kind of car where the owner has to install a hand-switch so he can turn the reverse lights on manually whenever he wants to pull out of a parking space.
Unfortunately for us all, (Open)PGP does exist. And that means that even fancy greenfield email projects feel like they need to support OpenPGP, or at least some subset of it. This in turn perpetuates the PGP myth, and causes other clients to use it. And as a direct result, even if some clients re-implement OpenPGP from scratch, other clients will end up using tools like GnuPG which will support unauthenticated encryption with bad APIs. And the cycle will go round and around, like a spaceship stuck near the event horizon of a black hole.
And as the standard perpetuates itself, largely for the sake of being a standard, it will fail to attract new security people. It will turn away exactly the type of people who should be working on these tools. Those people will go off and build encryption systems in a totally different area, or they’ll get into cryptocurrency. And — with some exceptions — the people who work in the community will increasingly work in that community because they’re supporting PGP, and not because they’re trying to seek out the best security technologies for their users. And the serious (email) users of PGP will be using it because they like the idea of using PGP better than they like using an actual, secure email standard.
And as things get worse, and fail to develop, people who work on it will become more dogmatic about its importance, because it’s something threatened and not a real security protocol that anyone’s using. To me that’s where PGP is going today, and that is why the community has such a hard time motivating itself to take these vulnerabilities seriously, and instead reacts defensively.
Maybe that’s a random, depressing way to end a post. But that’s the story I see in OpenPGP. And it makes me really sad.
Yesterday I happened upon a Wired piece by Steven Levy that covers Ray Ozzie’s proposal for “CLEAR”. I’m quoted at the end of the piece (saying nothing much), so I knew the piece was coming. But since many of the things I said to Levy were fairly skeptical — and most didn’t make it into the piece — I figured it might be worthwhile to say a few of them here.
Ozzie’s proposal is effectively a key escrow system for encrypted phones. It’s receiving attention now due to the fact that Ozzie has a stellar reputation in the industry, and due to the fact that it’s been lauded by law enforcement (and some famous people like Bill Gates). Ozzie’s idea is the just the latest bit of news in this second edition of the “Crypto Wars”, in which the FBI and various law enforcement agencies have been arguing for access to end-to-end encryption technologies — like phone storage and messaging — in the face of pretty strenuous opposition by (most of) the tech community.
In this post I’m going to sketch a few thoughts about Ozzie’s proposal, and about the debate in general. Since this is a cryptography blog, I’m mainly going to stick to the technical, and avoid the policy details (which are substantial). Also, since the full details of Ozzie’s proposal aren’t yet public — some are explained in the Levy piece and some in this patent — please forgive me if I get a few details wrong. I’ll gladly correct.
[Note: I’ve updated this post in several places in response to some feedback from Ray Ozzie. For the updated parts, look for the *. Also, Ozzie has posted some slides about his proposal.]
How to Encrypt a Phone
The Ozzie proposal doesn’t try tackle every form of encrypted data. Instead it focuses like a laser on the simple issue of encrypted phone storage. This is something that law enforcement has been extremely concerned about. It also represents the (relatively) low-hanging fruit of the crypto debate, for essentially two reasons: (1) there are only a few phone hardware manufacturers, and (2) access to an encrypted phone generally only takes place after law enforcement has gained physical access to it.
I’ve written about the details of encrypted phone storage in a couple of previousposts. A quick recap: most phone operating systems encrypt a large fraction of the data stored on your device. They do this using an encryption key that is (typically) derived from the user’s passcode. Many recent phones also strengthen this key by “tangling” it with secrets that are stored within the phone itself — typically with the assistance of a secure processor included in the phone. This further strengthens the device against simple password guessing attacks.
The upshot is that the FBI and local law enforcement have not — until very recently (more on that further below) — been able to obtain access to many of the phones they’ve obtained during investigation. This is due the fact that, by making the encryption key a function of the user’s passcode, manufacturers like Apple have effectively rendered themselves unable to assist law enforcement.
The Ozzie Escrow Proposal
Ozzie’s proposal is called “Clear”, and it’s fairly straightforward. Effectively, it calls for manufacturers (e.g., Apple) to deliberatelyput themselves back in the loop. To do this, Ozzie proposes a simple form of key escrow (or “passcode escrow”). I’m going to use Apple as our example in this discussion, but obviously the proposal will apply to other manufacturers as well.
Ozzie’s proposal works like this:
Prior to manufacturing a phone, Apple will generate a public and secret “keypair” for some public key encryption scheme. They’ll install the public key into the phone, and keep the secret key in a “vault” where hopefully it will never be needed.
When a user sets a new passcode onto their phone, the phone will encrypt a passcode under the Apple-provided public key. This won’t necessarily be the user’s passcode, but it will be an equivalent passcode that can unlock the phone.* It will store the encrypted result in the phone’s storage.
In the unlikely event that the FBI (or police) obtain the phone and need to access its files, they’ll place the phone into some form of law enforcement recovery mode. Ozzie describes doing this with some special gesture, or “twist”. Alternatively, Ozzie says that Apple itself could do something more complicated, such as performing an interactive challenge/response with the phone in order to verify that it’s in the FBI’s possession.
The phone will now hand the encrypted passcode to law enforcement. (In his patent, Ozzie suggests it might be displayed as a barcode on a screen.)
The law enforcement agency will send this data to Apple, who will do a bunch of checks (to make sure this is a real phone and isn’t in the hands of criminals). Apple will access their secret key vault, and decrypt the passcode. They can then send this back to the FBI.
Once the FBI enters this code, the phone will be “bricked”. Let me be more specific: Ozzie proposes that once activated, a secure chip inside the phone will now permanently “blow” several JTAG fuses monitored by the OS, placing the phone into a locked mode. By reading the value of those fuses as having been blown, the OS will never again overwrite its own storage, will never again talk to any network, and will become effectively unable to operate as a normal phone again.
When put into its essential form, this all seems pretty simple. That’s because it is. In fact, with the exception of the fancy “phone bricking” stuff in step (6), Ozzie’s proposal is a straightforward example of key escrow — a proposal that people have been making in various guises for many years. The devil is always in the details.
A vault of secrets
If we picture how the Ozzie proposal will change things for phone manufacturers, the most obvious new element is the key vault. This is not a metaphor. It literally refers to a giant, ultra-secure vault that will have to be maintained individually by different phone manufacturers. The security of this vault is no laughing matter, because it will ultimately store the master encryption key(s) for every single device that manufacturer ever makes. For Apple alone, that’s about a billion active devices.
Does this vault sound like it might become a target for organized criminals and well-funded foreign intelligence agencies? If it sounds that way to you, then you’ve hit on one of the most challenging problems with deploying key escrow systems at this scale. Centralized key repositories — that can decrypt every phone in the world — are basically a magnet for the sort of attackers you absolutely don’t want to be forced to defend yourself against.
So let’s be clear. Ozzie’s proposal relies fundamentally on the ability of manufacturers to secure extremely valuable key material for a massive number of devices against the strongest and most resourceful attackers on the planet. And not just rich companies like Apple. We’re also talking about the companies that make inexpensive phones and have a thinner profit margin. We’re also talking about many foreign-owned companies like ZTE and Samsung. This is key material that will be subject to near-constant access by the manufacturer’s employees, who will have to access these keys regularly in order to satisfy what may be thousands of law enforcement access requests every month.
If ever a single attacker gains access to that vault and is able to extract, a few “master” secret keys (Ozzie says that these master keys will be relatively small in size*) then the attackers will gain unencrypted access to every device in the world. Even better: if the attackers can do this surreptitiously, you’ll never know they did it.
Now in fairness, this element of Ozzie’s proposal isn’t really new. In fact, this key storage issue an inherent aspect of all massive-scale key escrow proposals. In the general case, the people who argue in favor of such proposals typically make two arguments:
We already store lots of secret keys — for example, software signing keys — and things works out fine. So this isn’t really a new thing.
It is certainly true that software manufacturers do store secret keys, with varying degrees of success. For example, many software manufacturers (including Apple) store secret keys that they use to sign software updates. These keys are generally locked up in various ways, and are accessed periodically in order to sign new software. In theory they can be stored in hardened vaults, with biometric access controls (as the vaults Ozzie describes would have to be.)
But this is pretty much where the similarity ends. You don’t have to be a technical genius to recognize that there’s a world of difference between a key that gets accessed once every month — and can be revoked if it’s discovered in the wild — and a key that may be accessed dozens of times per day and will be effectively undetectable if it’s captured by a sophisticated adversary.
Moreover, signing keys leak all the time. The phenomenon is so common that journalists have given it a name: it’s called “Stuxnet-style code signing”. The name derives from the fact that the Stuxnet malware — the nation-state malware used to sabotage Iran’s nuclear program — was authenticated with valid code signing keys, many of which were (presumably) stolen from various software vendors. This practice hasn’t remained with nation states, unfortunately, and has now become common in retail malware.
The folks who argue in favor of key escrow proposals generally propose that these keys can be stored securely in special devices called Hardware Security Modules (HSMs). Many HSMs are quite solid. They are not magic, however, and they are certainly not up to the threat model that a massive-scale key escrow system would expose them to. Rather than being invulnerable, they continue to cough up vulnerabilities like this one. A single such vulnerability could be game-over for any key escrow system that used it.
In some follow up emails, Ozzie suggests that keys could be “rotated” periodically, ensuring that even after a key compromise the system could renew security eventually. He also emphasizes the security mechanisms (such as biometric access controls) that would be present in such a vault. I think that these are certainly valuable and necessary protections, but I’m not convinced that they would be sufficient.
Assume a secure processor
Let’s suppose for a second that an attacker does get access to the Apple (or Samsung, or ZTE) key vault. In the section above I addressed the likelihood of such an attack. Now let’s talk about the impact.
Ozzie’s proposal has one significant countermeasure against an attacker who wants to use these stolen keys to illegally spy on (access) your phone. Specifically, should an attacker attempt to illegally access your phone, the phone will be effectively destroyed. This doesn’t protect you from having your files read — that horse has fled the stable — but it should alert you to the fact that something fishy is going on. This is better than nothing.
This measure is pretty important, not only because it protects you against evil maid attacks. As far as I can tell, this protection is pretty much the only measure by which theft of the master decryption keys might ever be detected. So it had better work well.
The details on how this might work aren’t very clear in Ozzie’s patent, but the Wired article describes it as follows. This quote to repeat Ozzie’s presentation at Columbia University:
What Ozzie appears to describe here is a secure processor contained within every phone. This processor would be capable if securely and irreversibly enforcing that once law enforcement has accessed a phone, that phone could no longer be placed into an operational state.
My concern with this part of Ozzie’s proposal is fairly simple: this processor does not currently exist. To explain why this, let me tell a story.
Back in 2013, Apple began installing a secure processor in each of their phones. While this secure processor (called the Secure Enclave Processor, or SEP) is not exactly the same as the one Ozzie proposes, the overall security architecture seems very similar.
One main goal of Apple’s SEP was to limit the number of passcode guessing attempts that a user could make against a locked iPhone. In short, it was designed to keep track of each (failed) login attempt and keep a counter. If the number of attempts got too high, the SEP would make the user wait a while — in the best case — or actively destroy the phone’s keys. This last protection is effectively identical to Ozzie’s proposal. (With some modest differences: Ozzie proposes to “blow fuses” in the phone, rather than erasing a key; and he suggests that this event would triggered by entry of a recovery passcode.*)
For several years, the SEP appeared to do its job fairly effectively. Then in 2017, everything went wrong. Two firms, Cellebrite and Grayshift, announced that they had products that effectively unlocked every single Applephone, without any need to dismantle the phone. Digging into the details of this exploit, it seems very clear that both firms — working independently — have found software exploits that somehow disable the protections that are supposed to be offered by the SEP.
The cost of this exploit (to police and other law enforcement)? About $3,000-$5,000 per phone. Or (if you like to buy rather than rent) about $15,000. Aso, just to add an element of comedy to the situation, the GrayKey source code appears to have recently been stolen. The attackers are extorting the company for two Bitcoin. Because 2018. (🤡👞)
Let me sum this up my point in case I’m not beating you about the head quite enough:
The richest and most sophisticated phone manufacturer in the entire worldtried to build a processor that achieved goals similar to those Ozzie requires. And as of April 2018, after five years of trying, they have been unable to achieve this goal — a goal that is critical to the security of the Ozzie proposal as I understand it.
Now obviously the lack of a secure processor today doesn’t mean such a processor will never exist. However, let me propose a general rule: if your proposal fundamentally relies on a secure lock that nobody can ever break, then it’s on you to show me how to build that lock.
While this mainly concludes my notes about on Ozzie’s proposal, I want to conclude this post with a side note, a response to something I routinely hear from folks in the law enforcement community. This is the criticism that cryptographers are a bunch of naysayers who aren’t trying to solve “one of the most fundamental problems of our time”, and are instead just rejecting the problem with lazy claims that it “can’t work”.
The reason so few of us are willing to bet on massive-scale key escrow systems is that we’ve thought about it and we don’t think it will work. We’ve looked at the threat model, the usage model, and the quality of hardware and software that exists today. Our informed opinion is that there’s no detection system for key theft, there’s no renewability system, HSMs are terrifically vulnerable (and the companies largely staffed with ex-intelligence employees), and insiders can be suborned. We’re not going to put the data of a few billion people on the line an environment where we believe with high probability that the system will fail.
Maybe that’s unreasonable. If so, I can live with that.
In general I try to limit this blog to posts that focus on generally-applicable techniques in cryptography. That is, I don’t focus on the deeply wonky. But this post is going to be an exception. Today, I’m going to talk about a topic that most “typical” implementers don’t — and shouldn’t — think about.
Specifically: I’m going to talk about various techniques for making public key encryption schemes chosen ciphertext secure. I see this as the kind of post that would have saved me ages of reading when I was a grad student, so I figured it wouldn’t hurt to write it all down (even though this absolutely shouldn’t serve as a replacement for just reading the original papers!)
Background: CCA(1/2) security
Early (classical) ciphers used a relatively weak model of security, if they used one at all. That is, the typical security model for an encryption scheme was something like the following:
I generate an encryption key (or keypair for public-key encryption)
I give you the encryption of some message of my choice
You “win” if you can decrypt it
This is obviously not a great model in the real world, for several reasons. First off, in some cases the attacker knows a lot about the message to be decrypted. For example: it may come from a small space (like a set of playing cards). For this reason we require a stronger definition like “semantic security” that assumes the attacker can choose the plaintext distribution, and can also obtain the encryption of messages of his/her own choice. I’ve written more about this here.
More relevant to this post, another limitation of the above game is that — in some real-world examples — the attacker has even more power. That is: in addition to obtaining the encryption of chosen plaintexts, they may be able to convince the secret keyholder to decrypt chosen ciphertexts of their choice.
At first blush this seems like a really stupid model. If you can ask the keyholder to decrypt chosen ciphertexts, then isn’t the scheme just obviously broken? Can’t you just decrypt anything you want?
The answer, it turns out, is that there are many real-life examples where the attacker has decryption capability, but the scheme isn’t obviously broken. For example:
Sometimes an attacker can decrypt a limited set of ciphertexts (for example, because someone leaves the decryption machine unattended at lunchtime.) The question then is whether they can learn enough from this access to decrypt other ciphertexts that are generated after she loses access to the decryption machine — for example, messages that are encrypted after the operator comes back from lunch.
Sometimes an attacker can submit any ciphertext she wants — but will only obtain a partial decryption of the ciphertext. For example, she might learn only a single bit of information such as “did this ciphertext decrypt correctly”. The question, then, is whether she can leverage this tiny amount of data to fully decrypt some ciphertext of her choosing.
The first example is generally called a “non-adaptive” chosen ciphertext attack, or a CCA1 attack (and sometimes, historically, a “lunchtime” attack). There are a few encryption schemes that totally fall apart under this attack — the most famous textbook example is Rabin’s public key encryption scheme, which allows you to recover the full secret key from just a single chosen-ciphertext decryption.
The more powerful second example is generally referred to as an “adaptive” chosen ciphertext attack, or a CCA2 attack. The term refers to the idea that the attacker can select the ciphertexts they try to decrypt based on seeing a specific ciphertext that they want to attack, and by seeing the answers to specific decryption queries.
In this article we’re going to use the more powerful “adaptive” (CCA2) definition, because that subsumes the CCA1 definition. We’re also going to focus primarily on public-key encryption.
With this in mind, here is the intuitive definition of the experiment we want a CCA2 public-key encryption scheme to be able to survive:
I generate an encryption keypair for a public-key scheme and give you the public key.
You can send me (sequentially and adaptively) many ciphertexts, which I will decrypt with my secret key. I’ll give you the result of each decryption.
Eventually you’ll send me a pair of messages (of equal length) and I’ll pick a bit at random, and return to you the encryption of , which I will denote as .
You’ll repeat step (2), sending me ciphertexts to decrypt. If you send me I’ll reject your attempt. But I’ll decrypt any other ciphertext you send me, even if it’s only slightly different from .
The attacker outputs their guess . They “win” the game if .
We say that our scheme is secure if the attacker wins only with a significantly greater probability than they would win with if they simply guessed at random. Since they can win this game with probability 1/2 just by guessing randomly, that means we want (Probability attacker wins the game) – 1/2 to be “very small” (typically a negligible function of the security parameter).
You should notice two things about this definition. First, it gives the attacker the full decryption of any ciphertext they send me. This is obviously much more powerful than just giving the attacker a single bit of information, as we mentioned in the example further above. But note that powerful is good. If our scheme can remain secure in this powerful experiment, then clearly it will be secure in a setting where the attacker gets strictly less information from each decryption query.
The second thing you should notice is that we impose a single extra condition in step (4), namely that the attacker cannot ask us to decrypt . We do this only to prevent the game from being “trivial” — if we did not impose this requirement, the attacker could always just hand us back to decrypt, and they would always learn the value of .
(Notice as well that we do not give the attacker the ability to request encryptions of chosen plaintexts. We don’t need to do that in the public key encryption version of this game, because we’re focusing exclusively on public-key encryption here — since the attacker has the public key, she can encrypt anything she wants without my help.)
With definitions out of the way, let’s talk a bit about how we achieve CCA2 security in real schemes.
A quick detour: symmetric encryption
This post is mainly going to focus on public-key encryption, because that’s actually the problem that’s challenging and interesting to solve. It turns out that achieving CCA2 for symmetric-key encryption is really easy. Let me briefly explain why this is, and why the same ideas don’t work for public-key encryption.
(To explain this, we’ll need to slightly tweak the CCA2 definition above to make it work in the symmetric setting. The changes here are small: we won’t give the attacker a public key in step (1), and at steps (2) and (4) we will allow the attacker to request the encryption of chosen plaintexts as well as the decryption.)
The first observation is that many common encryption schemes — particularly, the widely-used cipher modes of operation like CBC and CTR — are semantically secure in a model where the attacker does not have the ability to decrypt chosen ciphertexts. However, these same schemes break completely in the CCA2 model.
The simple reason for this is ciphertext malleability. Take CTR mode, which is particularly easy to mess with. Let’s say we’ve obtained a ciphertext at step (4) (recall that is the encryption of ), it’s trivially easy to “maul” the ciphertext — simply by flipping, say, a bit of the message (i.e., XORing it with “1”). This gives us a new ciphertext that we are now allowed to submit for decryption. We are now allowed (by the rules of the game) to submit this ciphertext, and obtain , which we can use to figure out .
So how do we fix this problem? The straightforward observation is that we need to prevent the attacker from mauling the ciphertext . The generic approach to doing this is to modify the encryption scheme so that it includes a Message Authentication Code (MAC) tag computed over every CTR-mode ciphertext. The key for this MAC scheme is generated by the encrypting party (me) and kept with the encryption key. When asked to decrypt a ciphertext, the decryptor first checks whether the MAC is valid. If it’s not, the decryption routine will output “ERROR”. Assuming an appropriate MAC scheme, the attacker can’t modify the ciphertext (including the MAC) without causing the decryption to fail and produce a useless result.
So in short: in the symmetric encryption setting, the answer to CCA2 security is simply for the encrypting parties to authenticate each ciphertext using a secret authentication (MAC) key they generate. Since we’re talking about symmetric encryption, that extra (secret) authentication key can be generated and stored with the decryption key. (Some more efficient schemes make this all work with a single key, but that’s just an engineering convenience.) Everything works out fine.
So now we get to the big question.
CCA security is easy in symmetric encryption. Why can’t we just do the same thing for public-key encryption?
As we saw above, it turns out that strong authenticated encryption is sufficient to get CCA(2) security in the world of symmetric encryption. Sadly, when you try this same idea generically in public key encryption, it doesn’t always work. There’s a short reason for this, and a long one. The short version is: it matters whois doing the encryption.
Let’s focus on the critical difference. In the symmetric CCA2 game above, there is exactly one person who is able to (legitimately) encrypt ciphertexts. That person is me. To put it more clearly: the person who performs the legitimate encryption operations (and has the secret key) is also the same person who is performing decryption.
Even if the encryptor and decryptor aren’t literally the same person, the encryptor still has to be honest. (To see why this has to be the case, remember that the encryptor has shared secret key! If that party was a bad guy, then the whole scheme would be broken, since they could just output the secret key to the bad guys.) And once you’ve made the stipulation that the encryptor is honest, then you’re almost all the way there. It suffices simply to add some kind of authentication (a MAC or a signature) to any ciphertext she encrypts. At that point the decryptor only needs to determine whether any given ciphertexts actually came from the (honest) encryptor, and avoid decrypting the bad ones. You’re done.
Public key encryption (PKE) fundamentally breaks all these assumptions.
In a public-key encryption scheme, the main idea is that anyone can encrypt a message to you, once they get a copy of your public key. The encryption algorithm may sometimes be run by good, honest people. But it can also be run by malicious people. It can be run by parties who are adversarial. The decryptor has to be able to deal with all of those cases. One can’t simply assume that the “real” encryptor is honest.
Let me give a concrete example of how this can hurt you. A couple of years ago I wrote a post about flaws in Apple iMessage, which (at the time) used simple authenticated (public key) encryption scheme. The basic iMessage encryption algorithm used public key encryption (actually a combination of RSA with some AES thrown in for efficiency) so that anyone could encrypt a message to my key. For authenticity, it required that every message be signed with an ECDSA signature by the sender.
When I received a message, I would look up the sender’s public key and first make sure the signature was valid. This would prevent bad guys from tampering with the message in flight — e.g., executing nasty stuff like adaptive chosen ciphertext attacks. If you squint a little, this is almost exactly a direct translation of the symmetric crypto approach we discussed above. We’re simply swapping the MAC for a digital signature.
The problems with this scheme start to become apparent when we consider that there might be multiple people sending me ciphertexts. Let’s say the adversary is on the communication path and intercepts a signed message from you to me. They want to change (i.e., maul) the message so that they can execute some kind of clever attack. Well, it turns out this is simple. They simply rip off the honest signature and replace it one they make themselves:
The new message is identical, but now appears to come from a different person (the attacker). Since the attacker has their own signing key, they can maul the encrypted message as much as they want, and sign new versions of that message. If you plug this attack into (a version) of the public-key CCA2 game up top, you see they’ll win quite easily. All they have to do is modify the challenge ciphertext at step (4) to be signed with their own signing key, then they can change it by munging with the CTR mode encryption, and request the decryption of that ciphertext.
Of course if I only accept messages from signed by some original (guaranteed-to-be-honest) sender, this scheme might work out fine. But that’s not the point of public key encryption. In a real public-key scheme — like the one Apple iMessage was trying to build — I should be able to (safely) decrypt messages from anyone, and in that setting this naive scheme breaks down pretty badly.
Ok, this post has gotten a bit long, and so far I haven’t actually gotten to the various “tricks” for adding chosen ciphertext security to real public key encryption schemes. That will have to wait until the next post, to come shortly.
Over the past several years I’ve been privileged to observe two contradictory and fascinating trends. The first is that we’re finally starting to use the cryptography that researchers have spent the past forty years designing. We see this every day in examples ranging from encrypted messaging to phone security to cryptocurrencies.
But before I get to all of that — much further below — let me stress that this is not a post about the quantum computing apocalypse, nor is it about the success of cryptography in the 21st century. Instead I’m going to talk about something much more wonky. This post will be about one of the simplest (and coolest!) cryptographic technologies ever developed: hash-based signatures.
Hash-based signature schemes were first invented in the late 1970s by Leslie Lamport, and significantly improved by Ralph Merkle and others. For many years they were largely viewed as an interesting cryptographic backwater, mostly because they produce relatively large signatures (among other complications). However in recent years these constructions have enjoyed something of a renaissance, largely because — unlike signatures based on RSA or the discrete logarithm assumption — they’re largely viewed as resistant to serious quantum attacks like Shor’s algorithm.
First some background.
Background: Hash functions and signature schemes
In order to understand hash-based signatures, it’s important that you have some familiarity with cryptographic hash functions. These functions take some input string (typically or an arbitrary length) and produce a fixed-size “digest” as output. Common cryptographic hash functions like SHA2, SHA3 or Blake2 produce digests ranging from 256 bits to 512 bits.
In order for a function to be considered a ‘cryptographic’ hash, it must achieve some specific security requirements. There are a number of these, but here we’ll just focus on three common ones:
1. Pre-image resistance (sometimes known as “one-wayness”): given some output , it should be time-consuming to find an input such that . (There are many caveats to this, of course, but ideally the best such attack should require a time comparable to a brute-force search of whatever distribution is drawn from.)
2. Second-preimage resistance: This is subtly different than pre-image resistance. Given some input , it should be hard for an attacker to find a different input such that .
3. Collision resistance: It should be hard to find any two values such that . Note that this is a much stronger assumption than second-preimage resistance, since the attacker has complete freedom to find any two messages of its choice.
The example hash functions I mentioned above are believed to provide all of these properties. That is, nobody has articulated a meaningful (or even conceptual) attack that breaks any of them. That could always change, of course, in which case we’d almost certainly stop using them. (We’ll discuss the special case of quantum attacks a bit further below.)
Since our goal is to use hash functions to construct signature schemes, it’s also helpful to briefly review that primitive.
A digital signature scheme is a public key primitive in which a user (or “signer”) generates a pair of keys, called the public key and private key. The user retains the private key, and can use this to “sign” arbitrary messages — producing a resulting digital signature. Anyone who has possession of the public key can verify the correctness of a message and its associated signature.
From a security perspective, the main property we want from a signature scheme is unforgeability, or “existential unforgeability“. This requirement means that an attacker (someone who does not possess the private key) should not be able to forge a valid signature on a message that you did not sign. For more on the formal definitions of signature security, see this page.
The Lamport One-Time Signature
The first hash-based signature schemes was invented in 1979 by a mathematician named Leslie Lamport. Lamport observed that given only simple hash function — or really, a one-way function — it was possible to build an extremely powerful signature scheme.
Powerful that is, provided that you only need to sign one message! More on this below.
For the purposes of this discussion, let’s suppose we have the following ingredient: a hash function that takes in, say, 256-bit inputs and produces 256-bit outputs. SHA256 would be a perfect example of such a function. We’ll also need some way to generate random bits.
Let’s imagine that our goal is to sign 256 bit messages. To generate our secret key, the first thing we need to do is generate a series of 512 separate random bitstrings, each of 256 bits in length. For convenience, we’ll arrange those strings into two separate lists and refer to each one by an index as follows:
The lists represent the secret key that we’ll use for signing. To generate the public key, we now simply hash every one of those random strings using our function . This produces a second pair of lists:
We can now hand out our public key to the entire world. For example, we can send it to our friends, embed it into a certificate, or post it on Keybase.
Now let’s say we want to sign a 256-bit message using our secret key. The very first thing we do is break up and represent as a sequence of 256 individual bits:
The rest of the signing algorithm is blindingly simple. We simply work through the message from the first bit to the last bit, and select a string from one of the two secret key list. The list we choose from depends on value of the message bit we’re trying to sign.
Concretely, for i=1 to 256: if the message bit , we grab the secret key string from the list, and output that string as part of our signature. If the message bit we copy the appropriate string from the list. Having done this for each of the message bits, we concatenate all of the strings we selected. This forms our signature.
Here’s a toy illustration of the process, where (for simplicity) the secret key and message are only eight bits long. Notice that each colored box below represents a different 256-bit random string:
When a user — who already has the public key — receives a message and a signature, she can verify the signature easily. Let represent the component of the signature: for each such string. She simply checks the corresponding message bit and computes hash . If the result should match the corresponding element from . If the result should match the element in .
The signature is valid if every single element of the signature, when hashed, matches the correct portion of the public key. Here’s an (admittedly) sketchy illustration of the verification process, for at least one signature component:
If your initial impression of Lamport’s scheme is that it’s kind of insane, you’re both a bit right and a bit wrong.
Let’s start with the negative. First, it’s easy to see that Lamport signatures and keys are be quite large: on the order of thousands of bits. Moreover — and much more critically — there is a serious security limitation on this scheme: each key can only be used to sign one message. This makes Lamport’s scheme an example of what’s called a “one time signature”.
To understand why this restriction exists, recall that every Lamport signature reveals exactly one of the two possible secret key values at each position. If I only sign one message, the signature scheme works well. However, if I ever sign two messages that differ at any bit position , then I’m going to end up handing out both secret key values for that position. This can be a problem.
Imagine that an attacker sees two valid signatures on different messages. She may be able to perform a simple “mix and match” forgery attack that allows her to sign a third message that I never actually signed. Here’s how that might look in our toy example:
The degree to which this hurts you really depends on how different the messages are and how many of them you’ve given the attacker to play with. But it’s rarely good news.
So to sum up our observations about the Lamport signature scheme. It’s simple. It’s fast. And yet for various practical reasons it kind of sucks. Maybe we can do a little better.
From one-time to many-time signatures: Merkle’s tree-based signature
While the Lamport scheme is a good start, our inability to sign many messages with a single key is a huge drawback. Nobody was more inspired by this than Martin Hellman’s student Ralph Merkle. He quickly came up with a clever way to address this problem.
While we can’t exactly retrace Merkle’s steps, let’s see if we can recover some of the obvious ideas.
Let’s say our goal is to use Lamport’s signature to sign many messages — say of them. The most obvious approach is to simply generate different keypairs for the original Lamport scheme, then concatenate all the public keys together into one mega-key.
(Mega-key is a technical term I just invented).
If the signer holds on to all secret key components, she can now sign different messages by using exactly one secret Lamport key per message. This seems to solve the problem without ever requiring her to re-use a secret key. The verifier has all the public keys, and can verify all the received messages. No Lamport keys are ever used to sign twice.
Obviously this approach sucks big time.
Specifically, in this naive approach, signing times requires the signer to distribute a public key that is times as large as a normal Lamport public key. (She’ll also need to hang on to a similar pile of secret keys.) At some point people will get fed up with this, and probably won’t every get to be very large. Enter Merkle.
What Merkle proposed was a way to retain the ability to sign different messages, but without the linear-cost blowup of public keys. Merkle’s idea worked like this:
First, generate separate Lamport keypairs. We can call those .
Next, place each public key at one leaf of a Merkle hash tree (see below), and compute the root of the tree. This root will become the “master” public key of the new Merkle signature scheme.
The signer retains all of the Lamport public and secret keys for use in signing.
Merkle trees are described here. Roughly speaking, what they provide is a way to collect many different values such that they can be represented by a single “root” hash (of length 256 bits, using the hash function in our example). Given this hash, it’s possible to produce a simple “proof” that an element is in a given hash tree. Moreover, this proof has size that is logarithmic in the number of leaves in the tree.
To sign the message, the signer simply selects the public key from the tree, and signs the message using the corresponding Lamport secret key. Next, she concatenates the resulting signature to the Lamport public key and tacks on a “Merkle proof” that shows that this specific Lamport public key is contained within the tree identified by the root (i.e., the public key of the entire scheme). She then transmits this whole collection as the signature of the message.
(To verify a signature of this form, the verifier simply unpacks this “signature” as a Lamport signature, Lamport public key, and Merkle Proof. She verifies the Lamport signature against the given Lamport public key, and uses the Merkle Proof to verify that the Lamport public key is really in the tree. With these three objectives achieved, she can trust the signature as valid.)
This approach has the disadvantage of increasing the “signature” size by more than a factor of two. However, the master public key for the scheme is now just a single hash value, which makes this approach scale much more cleanly than the naive solution above.
As a final optimization, the secret key data can itself be “compressed” by generating all of the various secret keys using the output of a cryptographic pseudorandom number generator, which allows for the generation of a huge number of (apparently random) bits from a single short ‘seed’.
Making signatures and keys (a little bit) more efficient
Merkle’s approach allows any one-time signature to be converted into an -time signature. However, his construction still requires us to use some underlying one-time signature like Lamport’s scheme. Unfortunately the (bandwidth) costs of Lamport’s scheme are still relatively high.
There are two major optimizations that can help to bring down these costs. The first was also proposed by Merkle. We’ll cover this simple technique first, mainly because it helps to explain the more powerful approach.
If you recall Lamport’s scheme, in order sign a 256-bit message we required a vector consisting of 512 separate secret key (and public key) bitstrings. The signature itself was a collection of 256 of the secret bitstrings. (These numbers were motivated by the fact that each bit of the message to be signed could be either a “0” or a “1”, and thus the appropriate secret key element would need to be drawn from one of two different secret key lists.)
But here’s a thought: what if we don’t sign all of the message bits?
Let’s be a bit more clear. In Lamport’s scheme we sign every bit of the message — regardless of its value — by outputting one secret string. What if, instead of signing both zero values and one values in the message, we signed only the message bits were equal to one? This would cut the public and secret key sizes in half, since we could get rid of the list entirely.
We would now have only a single list of bitstrings in our secret key. For each bit position of the message where we would output a string . For every position where we would output… zilch. (This would also tend to reduce the size of signatures, since many messages contain a bunch of zero bits, and those would now ‘cost’ us nothing!)
An obvious problem with this approach is that it’s horrendously insecure. Please do not implement this scheme!
As an example, let’s say an attacker observes a (signed) message that begins with “1111…”, and she want to edit the message so it reads “0000…” — without breaking the signature. All she has to do to accomplish this is to delete several components of the signature! In short, while it’s very difficult to “flip” a zero bit into a one bit, it’s catastrophically easy to do the reverse.
But it turns out there’s a fix, and it’s quite elegant.
You see, while we can’t prevent an attacker from editing our message by turning one bits into zero bits, we can catch them. To do this, we tack on a simple “checksum” to the message,then sign the combination of the original message and the checksum. The signature verifier must verify the entire signature over both values, and also ensure that the received checksum is correct.
The checksum we use is trivial: it consists of a simple binary integer that represents the total number of zero bits in the original message.
If the attacker tries to modify the content of the message (excluding the checksum) in order to turn some one bit into a zero bit, the signature scheme won’t stop her. But this attack have the effect of increasing the number of zero bits in the message. This will immediately make the checksum invalid, and the verifier will reject the signature.
Of course, a clever attacker might also try to mess with the checksum (which is also signed along with the message) in order to “fix it up” by increasing the integer value of the checksum. However — and this is critical — since the checksum is a binary integer, in order to increase the value of the checksum, she would always need to turn some zero bit of the checksum into a one bit. But since the checksum is also signed, and the signature scheme prevents this kind of change, the attacker has nowhere to go.
(If you’re keeping track at home, this does somewhat increase the size of the ‘message’ to be signed. In our 256-bit message example, the checksum will require an additional eight bits and corresponding signature cost. However, if the message has many zero bits, the reduced signature size will typically still be a win.)
Winternitz: Trading space for time
The trick above reduces the public key size by half, and reduces the size of (some) signatures by a similar amount. That’s nice, but not really revolutionary. It still gives keys and signatures that are thousands of bits long.
It would be nice if we could make a bigger dent in those numbers.
The final optimization we’ll talk about was proposed by Robert Winternitz as a further optimization of Merkle’s technique above. In practical use it gives a 4-8x reduction in the size of signatures and public keys — at a cost of increasing both signing and verification time.
Winternitz’s idea is an example of a technique called a “time-space tradeoff“. This term refers to a class of solutions in which space is reduced at the cost of adding more computation time (or vice versa). To explain Winternitz’s approach, it helps to ask the following question:
What if, instead of signing messages composed of bits (0 or 1), we treated our messages as though they were encoded using larger symbol alphabets? For example, what if we signed four-bit ‘nibbles’? Or eight-bit bytes?
In Lamport’s original scheme, we had two lists of bitstrings as part of the signing (and public) key. One was for signing zero message bits, and the other was for one bits.
Now let’s say we want to sign bytes rather than bits. An obvious idea would be to increase the number of secret key lists (and public key) from two such list to 256 such lists — one list for each possible value of a message byte. The signer could work through the message one byte at a time, and pick from the much larger menu of key values.
Unfortunately, this solution really stinks. It reduces the size of the signature by a factor of eight, at a cost of increasing the public and secret key size by a factor of 256. Even this might be fine if the public keys could be used for many signatures, but they can’t — when it comes to key re-use, this “byte signing version of Lamport” suffers from the same limitations as the original Lamport signature.
All of which brings us to Winternitz’s idea.
Since it’s too expensive to store and distribute 256 truly random lists, what if we generated those lists programatically only when we needed them?
Winternitz’s idea was to generate single list of random seeds for our initial secret key. Rather than generating additional lists randomly, he proposed to use the hash function on each element of that initial secret key, in order to derive the next such list for the secret key: . And similarly, one can use the hash function again on that list to get the next list . And so on for many possible lists.
This is helpful in that we now only need to store a single list of secret key values , and we can derive all the others lists on-demand just by applying the hash function.
But what about the public key? This is where Winternitz gets clever.
Specifically, Winternitz proposed that the public key could be derived by applying the hash function one more time to the final secret key list. This would produce a single public key list . (In practice we only need 255 secret key lists, since we can treat the final secret key list as the public key.) The elegance of this approach is that given any one of the possible secret key values it’s always possible to check it against the public key, simply by hashing forward multiple times and seeing if we reach a public key element.
The whole process of key generation is illustrated below:
To sign the first byte of a message, we would pick a value from the appropriate list. For example, if the message byte was “0”, we would output a value from in our signature. If the message byte was “20”, we would output a value from . For bytes with the maximal value “255” we don’t have a secret key list. That’s ok: in this case we can output an empty string, or we can output the appropriate element of .
Note as well that in practice we don’t really need to store each of these secret key lists. We can derive any secret key value on demand given only the original list . The verifier only holds the public key vector and (as mentioned above) simply hashes forward an appropriate number of times — depending on the message byte — to see whether the result is equal to the appropriate component of the public key.
Like the Merkle optimization discussion in the previous section, the scheme as presented so far has a glaring vulnerability. Since the secret keys are related (i.e., ), anyone who sees a message that signs the message “0” can easily change the corresponding byte of the message to a “1”, and update the signature to match. In fact, an attacker can increment the value of any byte(s) in the message. Without some check on this capability, this would allow very powerful forgery attacks.
The solution to this problem is similar to the we discussed just above. To prevent an attacker from modifying the signature, the signer calculates and also signs a checksum of the original message bytes. The structure of this checksum is designed to prevent the attacker from incrementing any of the bytes, without invalidating the checksum. I won’t go into the gory details right now, but you can find them here.
It goes without saying that getting this checksum right is critical. Screw it up, even a little bit, and some very bad things can happen to you. This would be particularly unpleasant if you deployed these signatures in a production system.
Illustrated in one terrible picture, a 4-byte toy example of the Winternitz scheme looks like this:
What are hash-based signatures good for?
Throughout this entire discussion, we’ve mainly been talking about the how of hash-based signatures rather than the why of them. It’s time we addressed this. What’s the point of these strange constructions?
One early argument in favor of hash-based signatures is that they’re remarkably fast and simple. Since they only require only the evaluation of a hash function and some data copying, from a purely computational cost perspective they’re highly competitive with schemes like ECDSA and RSA. This could hypothetically be important for lightweight devices. Of course, this efficiency comes at a huge tradeoff in bandwidth efficiency.
More concretely: the imminent arrival of quantum computers is going to have a huge impact on the security of nearly all of our practical signature schemes, ranging from RSA to ECDSA and so on. This is due to the fact that Shor’s algorithm (and its many variants) provides us with a polynomial-time algorithm for solving the discrete logarithm and factoring problems, which is likely to render most of these schemes insecure.
Most implementations of hash-based signatures are not vulnerable to Shor’s algorithm. That doesn’t mean they’re completely immune to quantum computers, of course. The best general quantum attacks on hash functions are based on a search technique called Grover’s algorithm, which reduces the effective security of a hash function. However, the reduction in effective security is nowhere near as severe as Shor’s algorithm (it ranges between the square root and cube root), and so security can be retained by simply increasing the internal capacity and output size of the hash function. Hash functions like SHA3 were explicitly developed with large digest sizes to provide resilience against such attacks.
So at least in theory, hash-based signatures are interesting because they provide us with a line of defense against future quantum computers — for the moment, anyway.
What about the future?
Note that so far I’ve only discussed some of the “classical” hash-based signature schemes. All of the schemes I described above were developed in the 1970s or early 1980s. This hardly brings us up to present day.
After I wrote the initial draft of this article, a few people asked for pointers on more recent developments in the field. I can’t possibly give an exhaustive list here, but let me describe just a couple of the more recent ideas that others brought up (thanks to Zooko and Claudio Orlandi):
Signatures without state. A limitation of all the signature schemes above is that they require the signer to keep state between signatures. In the case of one-time signatures the reasoning is obvious: you have to avoid using any key more than once. But even in the multi-time Merkle signature, you have to remember which leaf public key you’re using, so you can avoid using any leaf twice. Even worse, the Merkle scheme requires the signer to construct all the keypairs up front, so the total number of signatures is bounded.
In the 1980s, Oded Goldreich pointed out that one can build signatures without these limitations. The idea is as follows: rather than generate all signatures up front, one can generate a short “certification tree” of one-time public keys. Each of these keys can be used to sign additional one-time public keys at a lower layer of the tree, and so on and so forth. Provided all of the private keys are generated deterministically using a single seed, this means that the full tree need not exist in full at key generation time, but can be built on-demand whenever a new key is generated. Each signature contains a “certificate chain” of signatures and public keys starting from the root and going down to a real signing keypair at the bottom of the tree.
This technique allows for the construction of extremely “deep” trees with a vast (exponential) number of possible signing keys. This allows us to construct so many one-time public keys that if we pick a signing key randomly (or pseudorandomly), then with high probability the same signing will never be used twice. This is intuition, of course. For a highly optimized and specific instantiation of this idea, see the SPHINCS proposal by Bernstein et al. The concrete SPHINCS-256 instantiation gives signatures that are approximately 41KB in size.
Picnic: post-quantum zero-knowledge based signatures. In a completely different direction lies Picnic. Picnic is based on a new non-interactive zero-knowledge proof system called ZKBoo. ZKBoo is a new ZK proof system that works on the basis of a technique called “MPC in the head”, where the prover uses multi-party computation to calculate a function with the prover himself. This is too complicated to explain in a lot of detail, but the end result is that one can then prove complicated statements using only hash functions.
The long and short of it is that Picnic and similar ZK proof systems provide a second direction for building signatures out of hash functions. The cost of these signatures is still quite large — hundreds of kilobytes. But future improvements in the technique could substantially reduce this size.
Epilogue: the boring security details
If you recall a bit earlier in this article, I spent some time describing the security properties of hash functions. This wasn’t just for show. You see, the security of a hash-based signature depends strongly on which properties a hash function is able to provide.
(And by implication, the insecurity of a hash-based signature depends on which properties of a hash function an attacker has managed to defeat.)
Most original papers discussing hash-based signatures generally hang their security arguments on the preimage-resistance of the hash function. Intuitively, this seems pretty straightforward. Let’s take the Lamport signature as an example. Given a public key element , an attacker who is able to compute hash preimages can easily recover a valid secret key for that component of the signature. This attack obviously renders the scheme insecure.
However, this argument considers only the case where an attacker has the public key but has not yet seen a valid signature. In this case the attacker has a bit more information. They now have (for example) both the public key and a portion of the secret key: and . If such an attacker can find a second pre-image for the public key she can’t sign a different message. But she has produced a new signature. In the strong definition of signature security (SUF-CMA) this is actually considered a valid attack. So SUF-CMA requires the slightly stronger property of second-preimage resistance.
Of course, there’s a final issue that crops up in most practical uses of hash-based signature schemes. You’ll notice that the description above assumes that we’re signing 256-bit messages. The problem with this is that in real applications, many messages are longer than 256 bits. As a consequence, most people use the hash function to first hash the message as and then the sign the resulting value instead of the message.
This leads to a final attack on the resulting signature scheme, since the existential unforgeability of the scheme now depends on the collision-resistance of the hash function. An attacker who can find two different messages such that has now found a valid signature on two different messages. This leads to a trivial break of EUF-CMA security.
In Fall 2016 I was invited to come to Miami as part of a team that independently validated some alleged flaws in implantable cardiac devices manufactured by St. Jude Medical (now part of Abbott Labs). These flaws were discovered by a company called MedSec. The story got a lot of traction in the press at the time, primarily due to the fact that a hedge fund called Muddy Waters took a large short position on SJM stock as a result of these findings. SJM subsequently sued both parties for defamation. The FDA later issued a recall for many of the devices.
Due in part to the legal dispute (still ongoing!), I never had the opportunity to write about what happened down in Miami, and I thought that was a shame: because it’s really interesting. So I’m belatedly putting up this post, which talks a bit MedSec’s findings, and implantable device security in general.
By the way: “we” in this case refers to a team of subject matter experts hired by Bishop Fox, and retained by legal counsel for Muddy Waters investments. I won’t name the other team members here because some might not want to be troubled by this now, but they did most of the work — and their names can be found in this public expert report (as can all the technical findings in this post.)
Quick disclaimers: this post is my own, and any mistakes or inaccuracies in it are mine and mine alone. I’m not a doctor so holy cow this isn’t medical advice. Many of the flaws in this post have since been patched by SJM/Abbot. I was paid for my time and travel by Bishop Fox for a few days in 2016, but I haven’t worked for them since. I didn’t ask anyone for permission to post this, because it’s all public information.
A quick primer on implantable cardiac devices
Implantable cardiac devices are tiny computers that can be surgically installed inside a patient’s body. Each device contains a battery and a set of electrical leads that can be surgically attached to the patient’s heart muscle.
When people think about these devices, they’re probably most familiar with the cardiacpacemaker. Pacemakers issue small electrical shocks to ensure that the heart beats at an appropriate rate. However, the pacemaker is actually one of the least powerful implantable devices. A much more powerful type of device is the Implantable Cardioverter-Defibrillator (ICD). These devices are implanted in patients who have a serious risk of spontaneously entering a dangerous state in which their heart ceases to pump blood effectively. The ICD continuously monitors the patient’s heart rhythm to identify when the patient’s heart has entered this condition, and applies a series of increasingly powerful shocks to the heart muscle to restore effective heart function. Unlike pacemakers, ICDs can issue shocks of several hundred volts or more, and can both stop and restart a patient’s normal heart rhythm.
Like most computers, implantable devices can communicate with other computers. To avoid the need for external data ports – which would mean a break in the patient’s skin – these devices communicate via either a long-range radio frequency (“RF”) or a near-field inductive coupling (“EM”) communication channel, or both. Healthcare providers use a specialized hospital device called a Programmer to update therapeutic settings on the device (e.g., program the device, turn therapy off). Using the Programmer, providers can manually issue commands that cause an ICD to shock the patient’s heart. One command, called a “T-Wave shock” (or “Shock-on-T”) can be used by healthcare providers to deliberately induce ventrical fibrillation. This capability is used after a device is implanted, in order to test the device and verify it’s functioning properly.
Because the Programmer is a powerful tool – one that could cause harm if misused – it’s generally deployed in a physician office or hospital setting. Moreover, device manufacturers may employ special precautions to prevent spurious commands from being accepted by an implantable device. For example:
Some devices require that all Programmer commands be received over a short-range communication channel, such as the inductive (EM) channel. This limits the communication range to several centimeters.
Other devices require that a short-range inductive (EM) wand must be used to initiate a session between the Programmer and a particular implantable device. The device will only accept long-range RF commands sent by the Programmer after this interaction, and then only for a limited period of time.
From a computer security perspective, both of these approaches have a common feature: using either approach requires some form of close-proximity physical interaction with the patient before the implantable device will accept (potentially harmful) commands via the long-range RF channel. Even if a malicious party steals a Programmer from a hospital, she may still need to physically approach the patient – at a distance limited to perhaps centimeters – before she can use the Programmer to issue commands that might harm the patient.
In addition to the Programmer, most implantable manufacturers also produce some form of “telemedicine” device. These devices aren’t intended to deliver commands like cardiac shocks. Instead, they exist to provide remote patient monitoring from the patient’s home. Telematics devices use RF or inductive (EM) communications to interrogate the implantable device in order to obtain episode history, usually at night when the patient is asleep. The resulting data is uploaded to a server (via telephone or cellular modem) where it can be accessed by healthcare providers.
What can go wrong?
Before we get into specific vulnerabilities in implantable devices, it’s worth asking a very basic question. From a security perspective, what should we even be worried about?
There are a number of answers to this question. For example, an attacker might abuse implantable device systems or infrastructure to recover confidential patient data (known as PHI). Obviously this would be bad, and manufacturers should design against it. But the loss of patient information is, quite frankly, kind of the least of your worries.
A much scarier possibility is that an attacker might attempt to harm patients. This could be as simple as turning off therapy, leaving the patient to deal with their underlying condition. On the much scarier end of the spectrum, an ICD attacker could find a way to deliberately issue dangerous shocks that could stop a patient’s heart from functioning properly.
Now let me be clear: this isn’t not what you’d call a high probability attack. Most people aren’t going to be targeted by sophisticated technical assassins. The concerning thing about this the impact of such an attack is significantly terrifying that we should probably be concerned about it. Indeed, some high-profile individuals have already taken precautions against it.
The real nightmare scenario is a mass attack in which a single resourceful attacker targets thousands of individuals simultaneously — perhaps by compromising a manufacturer’s back-end infrastructure — and threatens to harm them all at the same time. While this might seem unlikely, we’ve already seen attackers systematically target hospitals with ransomware. So this isn’t entirely without precedent.
Securing device interaction physically
The real challenge in securing an implantable device is that too much security could hurt you. As tempting as it might be to lard these devices up with security features like passwords and digital certificates, doctors need to be able to access them. Sometimes in a hurry.
This is a big deal. If you’re in a remote emergency room or hospital, the last thing you want is some complex security protocol making it hard to disable your device or issue a required shock. This means we can forget about complex PKI and revocation lists. Nobody is going to have time to remember a password. Even merely complicated procedures are out — you can’t afford to have them slow down treatment.
At the same time, these devices obviously must perform some sort of authentication: otherwise anyone with the right kind of RF transmitter could program them — via RF, from a distance. This is exactly what you want to prevent.
Many manufacturers have adopted an approach that cut through this knot. The basic idea is to require physical proximity before someone can issue commands to your device. Specifically, before anyone can issue a shock command (even via a long-range RF channel) they must — at least briefly — make close physical contact with the patient.
This proximity be enforced in a variety of ways. If you remember, I mentioned above that most devices have a short-range inductive coupling (“EM”) communications channel. These short-range channels seem ideal for establishing a “pairing” between a Programmer and an implantable device — via a specialized wand. Once the channel is established, of course, it’s possible to switch over to long-range RF communications.
This isn’t a perfect solution, but it has a lot going for it: someone could still harm you, but they would have to at least get a transmitter within a few inches of your chest before doing so. Moreover, you can potentially disable harmful commands from an entire class of device (like telemedecine monitoring devices) simply by leaving off the wand.
St. Jude Medical and MedSec
So given this background, what did St. Jude Medical do? All of the details are discussed in a full expert report published by Bishop Fox. In this post we I’ll focus on the most serious of MedSec’s claims, which can be expressed as follows:
Using only the hardware contained within a “Merlin @Home” telematics device, it was possible to disable therapy and issue high-power “shock” commands to an ICD from a distance, and without first physically interacting with the implantable device at close range.
This vulnerability had several implications:
The existence of this vulnerability implies that – through a relatively simple process of “rooting” and installing software on a Merlin @Home device – a malicious attacker could create a device capable of issuing harmful shock commands to installed SJM ICD devices at a distance. This is particularly worrying given that Merlin @Home devices are widely deployed in patients’ homes and can be purchased on eBay for prices under $30. While it might conceivably be possible to physically secure and track the location of all PCS Programmer devices, it seems challenging to physically track the much larger fleet of Merlin @Home devices.
More critically, it implies that St. Jude Medical implantable devices do not enforce a close physical interaction (e.g., via an EM wand or other mechanism) prior to accepting commands that have the potential to harm or even kill patients. This may be a deliberate design decision on St. Jude Medical’s part. Alternatively, it could be an oversight. In either case, this design flaw increases the risk to patients by allowing for the possibility that remote attackers might be able to cause patient harm solely via the long-range RF channel.
If it is possible – using software modifications only – to issue shock commands from the Merlin @Home device, then patients with an ICD may be vulnerable in the hypothetical event that their Merlin @Home device becomes remotely compromised by an attacker. Such a compromise might be accomplished remotely via a network attack on a single patient’s Merlin @Home device. Alternatively, a compromise might be accomplished at large scale through a compromise of St. Jude Medical’s server infrastructure.
We stress that the final scenario is strictly hypothetical. MedSec did not allege a specific vulnerability that allows for the remote compromise of Merlin @Home devices or SJM infrastructure. However, from the perspective of software and network security design, these attacks are one of the potential implications of a design that permits telematics devices to send such commands to an implantable device. It is important to stress that none of these attacks would be possible if St. Jude Medical’s design prohibited the implantable from accepting therapeutic commands from the Merlin @Home device (e.g., by requiring close physical interaction via the EM wand, or by somehow authenticating the provenance of commands and restricting critical commands to be sent by the Programmer only).
Validating MedSec’s claim
To validate MedSec’s claim, we examined their methodology from start to finish. This methodology included extracting and decompiling Java-based software from a single PCS Programmer; accessing a Merlin @Home device to obtain a root shell via the JTAG port; and installing a new package of custom software written by MedSec onto a used Merlin @Home device.
We then observed MedSec issue a series of commands to an ICD device using a Merlin @Home device that had been customized (via software) as described above. We used the Programmer to verify that these commands were successfully received by the implantable device, and physically confirmed that MedSec had induced shocks by attaching a multimeter to the leads on the implantable device.
Finally, we reproduced MedSec’s claims by opening the case of a second Merlin @Home device (after verifying that the tape was intact over the screw holes), obtaining a shell by connecting a laptop computer to the JTAG port, and installing MedSec’s software on the device. We were then able to issue commands to the ICD from a distance of several feet. This process took us less than three hours in total, and required only inexpensive tools and a laptop computer.
What are the technical details of the attack?
Simply reproducing a claim is only part of the validation process. To verify MedSec’s claims we also needed to understand why the attack described above was successful. Specifically, we were interested in identifying the security design issues that make it possible for a Merlin @Home device to successfully issue commands that are not intended to be issued from this type of device. The answer to this question is quite technical, and involves the specific way that SJM implantable devices verify commands before accepting them.
MedSec described to us the operation of SJM’s command protocol as part of their demonstration. They also provided us with Java JAR executable code files taken from the hard drive of the PCS Programmer. These files, which are not obfuscated and can easily be “decompiled” into clear source code, contain the software responsible for implementing the Programmer-to-Device communications protocol.
By examining the SJM Programmer code, we verified that Programmer commands are authenticated through the inclusion of a three-byte (24 bit) “authentication tag” that must be present and correct within each command message received by the implantable device. If this tag is not correct, the device will refuse to accept the command.
From a cryptographic perspective, 24 bits is a surprisingly short value for an important authentication field. However, we note that even this relatively short tag might be sufficient to prevent forgery of command messages – provided the tag ws calculated using a secure cryptographic function (e.g., a Message Authentication Code) using a fresh secret key that cannot be predicted by an the attacker.
Based on MedSec’s demonstration, and on our analysis of the Programmer code, it appears that SJM does not use the above approach to generate authentication tags. Instead, SJM authenticates the Programmer to the implantable with the assistance of a “key table” that is hard-coded within the Java code within the Programmer. At minimum, any party who obtains the (non-obfuscated) Java code from a legitimate SJM Programmer can gain the ability to calculate the correct authentication tags needed to produce viable commands – without any need to use the Programmer itself.
Moreover, MedSec determined – and successfully demonstrated – that there exists a “Universal Key”, i.e., a fixed three-byte authentication tag, that can be used in place of the calculated authentication tag. We identified this value in the Java code provided by MedSec, and verified that it was sufficient to issue shock commands from a Merlin @Home to an implantable device.
While these issues alone are sufficient to defeat the command authentication mechanism used by SJM implantable devices, we also analyzed the specific function that is used by SJM to generate the three-byte authentication tag. To our surprise, SJM does not appear to use a standard cryptographic function to compute this tag. Instead, they use an unusual and apparently “homebrewed” cryptographic algorithm for the purpose.
Specifically, the PCS Programmer Java code contains a series of hard-coded 32-bit RSA public keys. To issue a command, the implantable device sends a value to the Programmer. This value is then “encrypted” by the Programmer using one of the RSA public keys, and the resulting output is truncated to produce a 24-bit output tag.
The above is not a standard cryptographic protocol, and quite frankly it is difficult to see what St. Jude Medical is trying to accomplish using this technique. From a cryptographic perspective it has several problems:
The RSA public keys used by the PCS Programmers are 32 bits long. Normal RSA keys are expected to be a minimum of 1024 bits in length. Some estimates predict that a 1024-bit RSA key can be factored (and thus rendered insecure) in approximately one year using a powerful network of supercomputers. Based on experimentation, we were able to factor the SJM public keys in less than one second on a laptop computer.
Even if the RSA keys were of an appropriate length, the SJM protocol does not make use of the corresponding RSA secret keys. Thus the authentication tag is not an RSA signature, nor does it use RSA in any way that we are familiar with.
As noted above, since there is no shared session key established between the specific implantable device and the Programmer, the only shared secret available to both parties is contained within the Programmer’s Java code. Thus any party who extracts the Java code from a PCS Programmer will be able to transmit valid commands to any SJM implantable device.
Our best interpretation of this design is that the calculation is intended as a form of “security by obscurity”, based on the assumption that an attacker will not be able to reverse engineer the protocol. Unfortunately, this approach is rarely successful when used in security systems. In this case, the system is fundamentally fragile – due to the fact that code for computing the correct authentication tag is likely available in easily-decompiled Java bytecode on each St. Jude Medical Programmer device. If this code is ever extracted and published, all St. Jude Medical devices become vulnerable to command forgery.
How to remediate these attacks?
To reiterate, the fundamental security concerns with these St. Jude Medical devices (as of 2016) appeared to be problems of design. These were:
SJM implantable devices did not require close physical interaction prior to accepting commands (allegedly) sent by the Programmer.
SJM did not incorporate a strong cryptographic authentication mechanism in its RF protocol to verify that commands are truly sent by the Programmer.
Even if the previous issue was addressed, St. Jude did not appear to have an infrastructure for securely exchanging shared cryptographic keys between a legitimate Programmer and an implantable device.
There are various ways to remediate these issues. One approach is to require St. Jude implantable devices to exchange a secret key with the Programmer through a close-range interaction involving the Programmer’s EM wand. A second approach would be to use a magnetic sensor to verify the presence of a magnet on the device, prior to accepting Programmer commands. Other solutions are also possible. I haven’t reviewed the solution SJM ultimately adopted in their software patches, and I don’t know how many users patched.
Implantable devices offer a number of unique security challenges. It’s naturally hard to get these things right. At the same time, it’s important that vendors take these issues seriously, and spend the time to get cryptographic authentication mechanisms right — because once deployed, these devices are very hard to repair, and the cost of a mistake is extremely high.
Last week Apple made an announcement describing changes to the iCloud service for users residing in mainland China. Beginning on February 28th, all users who have specified China as their country/region will have their iCloud data transferred to the GCBD cloud services operator in Guizhou, China.
Chinese news sources optimistically describe the move as a way to offer improved network performance to Chinese users, while Apple admits that the change was mandated by new Chinese regulations on cloud services. Both explanations are almost certainly true. But neither answers the following question: regardless of where it’s stored, how secure is this data?
Apple has strong data privacy and security protections in place and no backdoors will be created into any of our systems”
That sounds nice. But what, precisely, does it mean? If Apple is storing user data on Chinese services, we have to at least accept the possibility that the Chinese government might wish to access it — and possibly without Apple’s permission. Is Apple saying that this is technically impossible?
This is a question, as you may have guessed, that boils down to encryption.
Does Apple encrypt your iCloud backups?
Unfortunately there are many different answers to this question, depending on which part of iCloud you’re talking about, and — ugh — which definition you use for “encrypt”. The dumb answer is the one given in the chart on the right: all iCloud data probably is encrypted. But that’s the wrong question. The right question is: who holds the key(s)?
There’s a pretty simple thought experiment you can use to figure out whether you (or a provider) control your encryption keys. I call it the “mud puddle test”. It goes like this:
Imagine you slip in a mud puddle, in the process (1) destroying your phone, and (2) developing temporary amnesia that causes you to forget your password. Can you still get your iCloud data back? If you can (with the help of Apple Support), then you don’t control the key.
With one major exception — iCloud Keychain, which I’ll discuss below — iCloud fails the mud puddle test. That’s because most Apple files are not end-to-end encrypted. In fact, Apple’s iOS security guide is clear that it sends the keys for encrypted files out to iCloud.
However, there is a wrinkle. You see, iCloud isn’t entirely an Apple service, not even here in the good-old U.S.A. In fact, the vast majority of iCloud data isn’t actually stored by Apple at all. Every time you back up your phone, your (encrypted)
data is transmitted directly to a variety of third-party cloud service providers including Amazon, Google and Microsoft.
And this is, from a privacy perspective, mostly** fine! Those services act merely as “blob stores”, storing unreadable encrypted data files uploaded by Apple’s customers. At least in principle, Apple controls the encryption keys for that data, ideally on a server located in a dedicated Apple datacenter.*
So what exactly is Apple storing in China?
You see, it’s entirely possible that the new Chinese cloud stores will perform the same task that Amazon AWS, Google, or Microsoft do in the U.S. That is, they’re storing encrypted blobs of data that can’t be decrypted without first contacting the iCloud mothership back in the U.S. That would at least be one straightforward reading of Apple’s announcement, and it would also be the most straightforward mapping from iCloud’s current architecture and whatever it is Apple is doing in China.
Of course, this interpretation seems hard to swallow. In part this is due to the fact that some of the new Chinese regulations appear to include guidelines for user monitoring. I’m no lawyer, and certainly not an expert in Chinese law — so I can’t tell you if those would apply to backups. But it’s at least reasonable to ask whether Chinese law enforcement agencies would accept the total inability to access this data without phoning home to Cupertino, not to mention that this would give Apple the ability to instantly wipe all Chinese accounts. Solving these problems (for China) would require Apple to store keys as well as data in Chinese datacenters.
The critical point is that these two interpretations are not compatible. One implies that Apple is simply doing business as usual. The other implies that they may have substantially weakened the security protections of their system — at least for Chinese users.
And here’s my problem. If Apple needs to fundamentally rearchitect iCloud to comply with Chinese regulations, that’s certainly an option. But they should say explicitly and unambiguously what they’ve done. If they don’t make things explicit, then it raises the possibility that they could make the same changes for any other portion of the iCloud infrastructure without announcing it.
It seems like it would be a good idea for Apple just to clear this up a bit.
You said there was an exception. What about iCloud Keychain?
I said above that there’s one place where iCloud passes the mud puddle test. This is Apple’s Cloud Key Vault, which is currently used to implement iCloud Keychain. This is a special service that stores passwords and keys for applications, using a much stronger protection level than is used in the rest of iCloud. It’s a good model for how the rest of iCloud could one day be implemented.
For a description, see here. Briefly, the Cloud Key Vault uses a specialized piece of hardware called a Hardware Security Module (HSM) to store encryption keys. This HSM is a physical box located on Apple property. Users can access their own keys if and only if they know their iCloud Keychain password — which is typically the same as the PIN/password on your iOS device. However, if anyone attempts to guess this PIN too many times, the HSM will wipe that user’s stored keys.
The critical thing is that the “anyone” mentioned above includes even Apple themselves. In short: Apple has designed a key vault that even they can’t be forced to open. Only customers can get their own keys.
What’s strange about the recent Apple announcement is that users in China will apparently still have access to iCloud Keychain. This means that either (1) at least some data will be totally inaccessible to the Chinese government, or (2) Apple has somehow weakened the version of Cloud Key Vault deployed to Chinese users. The latter would be extremely unfortunate, and it would raise even deeper questions about the integrity of Apple’s systems.
Probably there’s nothing funny going on, but this is an example of how Apple’s vague (and imprecise) explanations make it harder to trust their infrastructure around the world.
So what should Apple do?
Unfortunately, the problem with Apple’s disclosure of its China’s news is, well, really just a version of the same problem that’s existed with Apple’s entire approach to iCloud.
Where Apple provides overwhelming detail about their best security systems (file encryption, iOS, iMessage), they provide distressingly little technical detail about the weaker links like iCloud encryption. We know that Apple can access and even hand over iCloud backups to law enforcement. But what about Apple’s partners? What about keychain data? How is this information protected? Who knows.
This vague approach to security might make it easier for Apple to brush off the security impact of changes like the recent China news (“look, no backdoors!”) But it also confuses the picture, and calls into doubt any future technical security improvements that Apple might be planning to make in the future. For example, this article from 2016 claims that Apple is planning stronger overall encryption for iCloud. Are those plans scrapped? And if not, will those plans fly in the new Chinese version of iCloud? Will there be two technically different versions of iCloud? Who even knows?
And at the end of the day, if Apple can’t trust us enough to explain how their systems work, then maybe we shouldn’t trust them either.
* This is actually just a guess. Apple could also outsource their key storage to a third-party provider, even though this would be dumb.
** A big caveat here is that some iCloud backup systems use convergent encryption, also known as “message locked encryption”. The idea in these systems is that file encryption keys are derived by hashing the file itself. Even if a cloud storage provider does not possess encryption keys, it might be able to test if a user has a copy of a specific file. This could be problematic. However, it’s not really clear from Apple’s documentation if this attack is feasible. (Thanks to RPW for pointing this out.)
If you’ve read this blog before, you know that secure messaging is one of my favorite topics. However, recently I’ve been a bit disappointed. My sadness comes from the fact that lately these systems have been getting too damned good. That is, I was starting to believe that most of the interesting problems had finally been solved.
If nothing else, today’s post helped disabuse me of that notion.
This result comes from a new paper by Rösler, Mainka and Schwenk from Ruhr-Universität Bochum (affectionately known as “RUB”). The RUB paper paper takes a close look at the problem of group messaging, and finds that while messengers may be doing fine with normal (pairwise) messaging, group messaging is still kind of a hack.
If all you want is the TL;DR, here’s the headline finding: due to flaws in both Signal and WhatsApp (which I single out because I use them), it’s theoretically possible forstrangers to add themselves to an encrypted group chat. However, the caveat is that these attacks are extremely difficult to pull off in practice, so nobody needs to panic. But both issues are very avoidable, and tend to undermine the logic of having an end-to-end encryption protocol in the first place. (Wired also has a good article.)
First, some background.
How do end-to-end encryption and group chats work?
In recent years we’ve seenplenty of evidence that centralized messaging servers aren’t a very good place to store confidential information. The good news is: we’re not stuck with them. One of the most promising advances in the area of secure communications has been the recentwidespread deployment of end-to-end (e2e) encrypted messaging protocols.
At a high level, e2e messaging protocols are simple: rather than sending plaintext to a server — where it can be stolen or read — the individual endpoints (typically smartphones) encrypt all of the data using keys that the server doesn’t possess. The server has a much more limited role, moving and storing only meaningless ciphertext. With plenty of caveats, this means a corrupt server shouldn’t be able to eavesdrop on the communications.
In pairwise communications (i.e., Alice communicates with only Bob) this encryption is conducted using a mix of public-key and symmetric key algorithms. One of the most popular mechanisms is the Signal protocol, which is used by Signal and WhatsApp (notable for having 1.3 billion users!) I won’t discuss the details of the Signal protocol here, except to say that it’s complicated, but it works pretty well.
A fly in the ointment is that the standard Signal protocol doesn’t work quite as well for group messaging, primarily because it’s not optimized for broadcasting messages to many users.
To handle that popular case, both WhatsApp and Signal use a small hack. It works like this: each group member generates a single “group key” that this member will use to encrypt all of her messages to everyone else in the group. When a new member joins, everyone who is already in the group needs to send a copy of their group key to the new member (using the normal Signal pairwise encryption protocol). This greatly simplifies the operation of group chats, while ensuring that they’re still end-to-end encrypted.
How do members know when to add a new user to their chat?
Here is where things get problematic.
From a UX perspective, the idea is that only one person actually initiates the adding of a new group member. This person is called the “administrator”. This administrator is the only human being who should actually do anything — yet, her one click must cause some automated action on the part of every other group members’ devices. That is, in response to the administrator’s trigger, all devices in the group chat must send their keys to this new group member.
(In Signal, every group member is an administrator. In WhatsApp it’s just a subset of the members.)
The trigger is implemented using a special kind of message called (unimaginatively) a “group management message”. When I, as an administrator, add Tom to a group, my phone sends a group management message to all the existing group members. This instructs them to send their keys to Tom — and to notify the members visually so that they know Tom is now part of the group. Obviously this should only happen if I reallydid add Tom, and not if some outsider (like that sneaky bastard Tom himself!) tries to add Tom.
And this is where things get problematic.
Ok, what’s the problem?
According to the RUB paper, both Signal and WhatsApp fail to properly authenticate group management messages.
The upshot is that, at least in theory, this makes it possible for an unauthorized person — not a group administrator, possibly not even a member of the group — to add someone to your group chat.
The issues here are slightly different between Signal and WhatsApp. To paraphrase Tolstoy, every working implementation is alike, but every broken one is broken in its own way. And WhatsApp’s implementation is somewhat worse than Signal. Here I’ll break them down.
Signal. Signal takes a pragmatic (and reasonable) approach to group management. In Signal, every group member is considered an administrator — which means that any member can add a new member. Thus if I’m a member of a group, I can add a new member by sending a group management message to every other member. These messages are sent encrypted via the normal (pairwise) Signal protocol.
The group management message contains the “group ID” (a long, unpredictable number), along with the identity of the person I’m adding. Because messages are sent using the Signal (pairwise) protocol, they should be implicitly authenticated as coming from me — because authenticity is a property that the pairwise Signal protocol already offers. So far, this all sounds pretty good.
The problem that the RUB researchers discovered through testing, is that while the Signal protocol does authenticate that the group management comes from me, it doesn’t actually check that I am a member of the group — and thus authorized to add the new user!
In short, if this finding is correct, it turns out that any random Signal user in the world can you send a message of the form “Add Mallory to the Group 8374294372934722942947”, and (if you happen to belong to that group) your app will go ahead and try to do it.
The good news is that in Signal the attack is very difficult to execute. The reason is that in order to add someone to your group, I need to know the group ID. Since the group ID is a random 128-bit number (and is never revealed to non-group-members or even the server**) that pretty much blocks the attack. The main exception to this is former group members, who already know the group ID — and can now add themselves back to the group with impunity.
(And for the record, while the group ID may block the attack, it really seems like a lucky break — like falling out of a building and landing on a street awning. There’s no reason the app should process group management messages from random strangers.)
So that’s the good news. The bad news is that WhatsApp is a bit worse.
WhatsApp. WhatsApp uses a slightly different approach for its group chat. Unlike Signal, the WhatsApp server plays a significant role in group management, which means that it determines who is an administrator and thus authorized to send group management messages.
Additionally, group management messages are not end-to-end encrypted or signed. They’re sent to and from the WhatsApp server using transport encryption, but not the actual Signal protocol.
When an administrator wishes to add a member to a group, it sends a message to the server identifying the group and the member to add. The server then checks that the user is authorized to administer that group, and (if so), it sends a message to every member of the group indicating that they should add that user.
The flaw here is obvious: since the group management messages are not signed by the administrator, a malicious WhatsApp server can add any user it wants into the group. This means the privacy of your end-to-end encrypted group chat is only guaranteed if you actually trust the WhatsApp server.
This undermines the entire purpose of end-to-end encryption.
But this is silly. Don’t we trust the WhatsApp server? And what about visual notifications?
One perfectly reasonable response is that exploiting this vulnerability requires a compromise of the WhatsApp server (or legal compulsion, perhaps). This seems fairly unlikely.
And yet, the entire point of end-to-end encryption is to remove the server from the trusted computing base. We haven’t entirely achieved this yet, thanks to things like key servers. But we are making progress. This bug is a step back, and it’s one a sophisticated attacker potentially could exploit.
A second obvious objection to these issues is that adding a new group member results in a visual notification to each group member. However, it’s not entirely clear that these messages are very effective. In general they’re relatively easy to miss. So these are meaningful bugs, and things that should be fixed.
How do you fix this?
The great thing about these bugs is that they’re both eminently fixable.
The RUB paper points out some obvious countermeasures. In Signal, just make sure that the group management messages come from a legitimate member of the group. In WhatsApp, make sure that the group management messages are signed by an administrator.*
Obviously fixes like this are a bit complex to roll out, but none of these should be killers.
Is there anything else in the paper?
Oh yes, there’s quite a bit more. But none of it is quite as dramatic. For one thing, it’s possible for attackers to block message acknowledgements in group chats, which means that different group members could potentially see very different versions of the chat. There are also several cases where forward secrecy can be interrupted. There’s also some nice analysis of Threema, if you’re interested.
I need a lesson. What’s the moral of this story?
The biggest lesson is that protocol specifications are never enough. Both WhatsApp and Signal (to an extent) have detailed protocol specifications that talk quite a bit about the cryptography used in their systems. And yet the issues reported in the RUB paper not obvious from reading these summaries. I certainly didn’t know about them.
In practice, these problems were only found through testing.
So the main lesson here is: test, test, test. This is a strong argument in favor of open-source applications and frameworks that can interact with private-garden services like Signal and WhatsApp. It lets us see what the systems are getting right and getting wrong.
The second lesson — and a very old one — is that cryptography is only half the battle. There’s no point in building the most secure encryption protocol in the world if someone can simply instruct your client to send your keys to Mallory. The greatest lesson of all time is that real cryptosystems are always broken this way — and almost never through the fancy cryptographic attacks we love to write about.
* The challenge here is that since WhatsApp itself determines who the administrators are, this isn’t quite so simple. But at very least you can ensure that someone in the group was responsible for the addition.
** According to the paper, the Signal group IDs are always sent encrypted between group members and are never revealed to the Signal server. Indeed, group chat messages look exactly like pairwise chats, as far as the server is concerned. This means only current or former group members should know the group ID.