Where Things Fall Apart: Primitives

This is the second installment in a series on cryptographic engineering.  See here for the Introduction.

In the previous post I argued that cryptographic systems often fail, at least in part, because there are so many places for them to go wrong.  Over the next several posts I’m going to give a few examples.  This section is intended as an overview rather than a deep examination of the issues.

Primitives: Making a call

The fundamental unit of any cryptographic system is the cryptographic primitive — codes, ciphers, digital signing algorithms. Traditionally, attacking a cryptographic system meant attacking the primitive itself. History is littered with broken primitives, including many that were believed to be “unbreakable” until, well, they weren’t.
This isn’t purely a history lesson. Many of these broken primitives are still with us today. To find one, all you need to do is take out your cellphone. The most common cellular technology in use today is GSM (Global System for Mobile Communications), which was developed by a consortium of European telecoms, equipment manufacturers and national governments in the 1980s.
In a move that was very advanced for the time, the GSM designers included a built-in cipher called “A5”, which was designed to protect the confidentiality of calls transmitted between a phone and the cell tower. Call encryption represented a huge improvement over the prevailing analog cellphone technology of the day, which provided essentially no privacy for voice calls (with sometimes embarrassing results).
The A5/1 cipher (courtesy Wikipedia). 

Ross Anderson describes the selection process for A5/1 in his excellent book. By his account, European governments were divided into two factions: those that wanted to strongly protect call confidentiality (mostly states bordering the Warsaw Pact) and others, such as France, that wanted to simplify government eavesdropping. The weak cipher faction won the day. Worse, they created an even weaker variant (A5/2) for export purposes.

One of the notable aspects of A5/1 is its simplicity. It was clearly designed with efficiency in mind, particularly efficiency in hardware. As with many hardware-optimized ciphers, it consists of several interlinked Linear Feedback Shift Registers (LFSRs). While it may be possible to build a secure cipher using this technique, LFSR-based ciphers seem to be overrepresented in the ranks of broken ciphers. Unfortunately A5/1 is no exception.
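To make the mechanism concrete, here’s a toy Fibonacci-style LFSR in Python. To be clear: these are not A5/1’s actual register lengths or tap positions, just an illustration of the basic building block — each clock the register shifts, and the new bit is the XOR of a few “tapped” positions.

```python
def lfsr_stream(state, taps, nbits, count):
    """Generate `count` output bits from a Fibonacci-style LFSR.

    state: integer holding the register contents (must be nonzero)
    taps:  bit positions XORed together to form the feedback bit
    nbits: register length in bits
    """
    out = []
    for _ in range(count):
        out.append(state & 1)                       # output the low bit
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1                  # XOR the tapped bits
        state = (state >> 1) | (fb << (nbits - 1))  # shift in the feedback bit
    return out

# A toy 4-bit register; these taps give a maximal-length cycle, so the
# register walks through all 15 nonzero states before repeating.
bits = lfsr_stream(state=0b1001, taps=[3, 0], nbits=4, count=15)
```

The trouble, loosely speaking, is that “linear” is the enemy of “secure”: the output of an LFSR is a linear function of its initial state, which gives cryptanalysts a lot of algebraic structure to work with. Designs like A5/1 combine several registers with irregular clocking to break up that linearity — evidently not enough.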

For a number of years there were no public attacks against A5/1, but that’s mostly because nobody knew how it worked. The cipher design was a well-guarded secret. This changed in the mid-1990s when the full cipher was published after being reverse-engineered from a GSM phone.

By 1997 the first attacks had been published in the research literature, and over the next decade these attacks were gradually refined. By 2006 the weakened A5/2 cipher was essentially broken, and academic attacks on A5/1 had reached a level of practicality that put them within reach of even private groups. Finally, in 2009, security researchers published and began distributing via BitTorrent a series of pre-computed tables that allow for the complete decryption of a recorded, A5/1 encrypted phone call in just a few minutes.

To a motivated adversary, calls encrypted with A5/1 are essentially cleartext.

So why has A5/1 stuck with us? There are many answers to that question. Standards tend to be “sticky” once they achieve widespread adoption. Unless there’s a viable and cost-effective plan for incrementally renewing security, it can be almost impossible to convince the herd to move. Complicating matters, many GSM phones implement A5/1 in hardware. And while software can be updated, hardware is forever.

Thus, with an installed base of millions of GSM phones (and tower equipment) throughout the world, it’s unlikely that the primitive will be replaced anytime soon. Instead, we’ll gradually drift to a newer technology, such as the UMTS “3G” standard which employs a new, proprietary block cipher called Kasumi, or A5/3. That is, assuming Kasumi makes it that long.

Misusing Primitives

Sometimes the problem isn’t with the primitive per se, but rather in the way it’s used. There are many, many examples of this. A classic example is the Wired Equivalent Privacy (WEP) protocol that used to be the standard for securing WiFi access points.

WEP was built around Ron Rivest’s RC4 stream cipher, developed in the late 1980s. These days security professionals are deeply suspicious of RC4, mostly because it’s non-standard and because there are some theoretical attacks against it. Still, as of today none of these attacks are quite practical… as long as RC4 is used correctly.

RC4 is a stream cipher, which means that, once you set it up with a key, it deterministically spits out a stream of pseudo-random bits. To encrypt you use the exclusive-OR (XOR) function to merge those bits (called a “keystream”) with your message. For reasons that we’ll discuss later, it’s very important that the encryptor never use the same keystream to encrypt two different messages. To avoid this problem, WEP uses a slightly different RC4 key for each data packet it sends.  These keys are derived by simply concatenating the packet’s sequence number, also known as an “Initialization Vector”, with a fixed 40-bit or 104-bit shared key. Using || to indicate concatenation, the keys look like this:

 Packet 1 RC4 encryption key = 00001 || {WEP secret key}
 Packet 2 RC4 encryption key = 00002 || {WEP secret key}
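It’s worth seeing why keystream reuse is such a disaster in the first place. Here’s a minimal sketch in Python — a random byte string stands in for an RC4 keystream, since any fixed keystream exhibits the same problem:

```python
import os

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in for an RC4 keystream (NOT real RC4 -- any fixed byte
# string demonstrates the issue).
keystream = os.urandom(32)

p1 = b"ATTACK AT DAWN, HOLD THE BRIDGE."
p2 = b"RETREAT AT ONCE, BURN THE NOTES."

c1 = xor_bytes(p1, keystream)   # encrypt message 1
c2 = xor_bytes(p2, keystream)   # encrypt message 2 -- same keystream!

# An eavesdropper who XORs the two ciphertexts recovers the XOR of
# the two plaintexts: the keystream cancels out entirely.
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)
```

The XOR of two plaintexts is highly structured — with known formatting (say, protocol headers) an attacker can often peel both messages apart. WEP’s per-packet IVs were meant to avoid exactly this, but as we’ll see, the fix created a new problem.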

At the time WEP debuted nobody had seriously considered the implications of setting RC4 up this way. However, in the late 1990s a trio of researchers named Fluhrer, Mantin and Shamir took a closer look. Their result? It turns out RC4 is devastatingly insecure in this configuration. Essentially what they showed is that when RC4 is used to encrypt different messages with keys that are mostly the same (differing only in, let’s say, the first few bytes), the resulting outputs are statistically correlated. Worse — much worse — given a large enough collection of such ciphertexts it’s actually possible to recover the shared portion of the key. In the case of WEP that means recovering the shared WEP secret key used by all users, essentially breaking WEP altogether.

Many of the vulnerabilities in WEP have been addressed in the WPA2 spec, which replaces RC4 with the much more standard AES cipher, and adds some other important features such as message authentication. For those legacy devices whose hardware couldn’t support AES, the WPA designers also proposed a band-aid fix called TKIP. TKIP’s an interesting case study in design compromises — but that’s a story for another post.

The Great Hash Function Adventure

Sometimes you do everything right — and you still wind up in trouble. Consider the sad story of the MD5 hash function. Once upon a time MD5 was king of the world — it was the standard used for digital signatures, for hashing software packages, and anything else you can think of. Not only was it designed by professionals, it was rigorously tested and validated by no less than the National Institute of Standards and Technology.

Today MD5 is like zucchini. If you have it, you really want to get rid of it.

If you’re not familiar with hash functions, you probably don’t realize how critical and pervasive they are in cryptographic systems. Briefly, a hash function takes a large file and reduces it to a small, fixed-size “digest” value. If you routinely download software from open source repositories you might be accustomed to checking a 20-byte SHA1 checksum to make sure you’re not getting the wrong file.

Hash functions are also used to make digital signatures more efficient — by hashing down large files so we can sign the resulting small digest, rather than the original file itself. This works because cryptographic hash functions are assumed to be “collision resistant”; namely, it’s hard to find two files A and B such that A and B both hash to the same digest. In principle if I give you a signature on the hash of A, it should be just as good as a signature on A itself.
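The hash-then-sign pattern looks something like the following sketch. The “signer” here is a deliberate stand-in (a keyed hash, not a real RSA or ECDSA signature — those names and any real API are beyond this toy); the only point is that the signature covers the 32-byte digest, not the possibly enormous message:

```python
import hashlib

def digest(msg: bytes) -> bytes:
    """Reduce an arbitrarily large message to a fixed 32-byte digest."""
    return hashlib.sha256(msg).digest()

# Stand-in signer: a real system would apply RSA/ECDSA to the digest.
def toy_sign(secret: bytes, msg: bytes) -> bytes:
    return hashlib.sha256(secret + digest(msg)).digest()

def toy_verify(secret: bytes, msg: bytes, sig: bytes) -> bool:
    return toy_sign(secret, msg) == sig

big_file = b"x" * 1_000_000          # signing cost barely depends on size
sig = toy_sign(b"secret-key", big_file)
assert toy_verify(b"secret-key", big_file, sig)

# Crucially, toy_sign depends on the message ONLY through digest(msg).
# Any two messages with the same digest share the same signature.
```

That last comment is the whole story of what follows: the signature’s guarantee is only as strong as the hash function’s collision resistance.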

But consider what would happen if that guarantee broke down. Say Alice wants to obtain an SSL certificate on her domain name “www.alice.com”. A digital certificate is nothing more than a file containing Alice’s identity and key, all signed by some trusted authority like Verisign. Imagine now that Alice is evil: she’s come up with one file that says she owns “www.alice.com”, and a second file that says she owns “www.bankofamerica.com”.

Moreover, because Alice has found a collision in the hash function, both of these files hash to the same value.  And Verisign doesn’t sign the files directly, it signs the hashes.

Thus if Alice can get Verisign to sign the hash of that harmless “www.alice.com” file, then she’s got a signature that also works for her malicious file as well. This is very bad, at least if you’re a Bank of America customer.

Thus, when in 2004 a team of researchers announced that they’d discovered collisions in the MD5 hash function, security professionals took it very seriously. Fortunately, the attack described above requires Alice to come up with not just any collision (i.e., two arbitrary files that hash to the same thing), but what we call a “meaningful” collision — those two values must each contain valid certificate data. Surely it wouldn’t be so easy to find such meaningful collisions!

Unfortunately, it turns out that the construction of the MD5 hash function has a particular property that makes it easy to “extend” one collision into many different collisions, some of which can be meaningful. Within a year, researchers had shown how to construct colliding software files, colliding PDF files, and most importantly, colliding digital certificates. The latter attack was particularly significant, due to the attack described above.  A malicious individual could request a signed SSL certificate on “www.my-pony-blog.com” and instead come away with a valid certificate for “www.microsoft.com”.  Or worse, a certificate designating them as a Certificate Authority, with the right to create new certificates on any domain!
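The structural property at work here is MD5’s iterated (Merkle–Damgård) construction: the hash chains a compression function over the message block by block, so if two equal-length messages collide, appending any common suffix preserves the collision. Here’s a toy illustration with a deliberately feeble compression function (a plain XOR, so collisions are trivial to find — real MD5’s compression function is far stronger, but is chained in the same fashion):

```python
# Toy Merkle-Damgard-style hash: iterate a compression function over
# one-byte "blocks". The XOR compression is deliberately weak so that
# collisions are easy to exhibit.
def toy_md_hash(msg: bytes, iv: int = 0xA5) -> int:
    state = iv
    for block in msg:
        state ^= block           # state = F(state, block)
    return state

a = b"ab"                        # two different messages...
b_ = b"ba"                       # ...that collide under the toy hash
assert a != b_
assert toy_md_hash(a) == toy_md_hash(b_)

# Because hashing proceeds block by block, any common suffix extends
# the collision: the chaining states are equal going in, so they stay
# equal through every subsequent block.
suffix = b"-- arbitrary trailing data --"
assert toy_md_hash(a + suffix) == toy_md_hash(b_ + suffix)
```

This is (very roughly) how one published collision pair could be stretched into families of colliding documents, including ones with valid certificate structure.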


 The “supercomputer” used to find colliding digital certificates
(yes, it’s a bunch of Sony PS3s).  From the link above.

In the short term, the solution is to remove MD5 wherever it’s used. That would almost be enough, but the next candidate for collision-finding appears to be the widely used SHA1 hash function. Fortunately NIST, the US organization responsible for certifying cryptography, is two steps ahead of this one. They’ve already begun a competition to choose a new hash function that will someday be designated SHA3.

So what if you’re using all the right primitives?  In the next sections we’ll discuss another way that security systems can go wrong: bad cryptographic protocols.
