TL;DR: It’s complicated.

Yesterday Zoom (the videoconferencing company, not the defunct telecom) put out a clarification post describing their encryption practices. This is a nice example of a company making necessary technical clarifications during a difficult time, although it comes following widespread criticism the company received over their previous, and frankly slightly misleading, explanation.

Unfortunately, Citizen Lab just put out some results of their own, based on reverse-engineering the Zoom software. These raise further concerns that Zoom isn’t being 100% clear about how much end-to-end security their service really offers.

This situation leaves Zoom users with a bit of a conundrum: now that everyone in the world is relying on this software for so many critical purposes, should we trust it? In this mostly non-technical post I’m going to talk about what we know, what we don’t know, and why it matters.

End-to-end encryption: the world’s shortest explainer

The controversy around Zoom stems from some misleading marketing material that could have led users to believe that Zoom offers “end-to-end encryption”, or E2E. The basic idea of E2E encryption is that each endpoint — e.g., a Zoom client running on a phone or computer — maintains its own encryption keys, and sends only encrypted data through the service.

In a truly E2E system, the data is encrypted such that the service provider genuinely cannot decrypt it, even if it wants to. This ensures that the service provider can’t read your data, nor can anyone who hacks into the service provider or its cloud services provider, etc. Ideally this would include various national intelligence agencies, which is important in the unlikely event that we start using the system to conduct sensitive government business.

While end-to-end encryption doesn’t necessarily stop all possible attacks, it represents the best path we have to building secure communication systems. It also has a good track record in practice. Videoconferencing apps like Apple’s FaceTime and messaging apps like WhatsApp and Signal already use this form of encryption routinely to protect your traffic, and it works.

Zoom: the good news

The great news from the recent Zoom blog post is that, if we take the company at its word, Zoom has already made some progress towards building a genuinely end-to-end encrypted videoconferencing app. Specifically, Zoom claims that:

[I]n a meeting where all of the participants are using Zoom clients, and the meeting is not being recorded, we encrypt all video, audio, screen sharing, and chat content at the sending client, and do not decrypt it at any point before it reaches the receiving clients.

Note that the emphasis, on the conditions about Zoom clients and recording, is mine. These conditions represent important caveats.

Taken at face value, this statement seems like it should calm any fears about Zoom’s security. It indicates that the Zoom client — meaning the actual Zoom software running on a phone or desktop computer — is capable of encrypting audio/video data to other Zoom clients in the conversation, without exposing your sensitive data to Zoom servers. This isn’t a trivial technical problem to solve, so credit to Zoom for doing the engineering work.

Unfortunately the caveats matter quite a bit. And this is basically where the good news ends.

The “unavoidably bad”

The simplest caveats are already present in Zoom’s statement above. Zoom provides a number of value-added services, including “cloud recording” and support for dial-in telephony conferencing. Unfortunately, each of these features is fundamentally incompatible with end-to-end encryption.

Zoom supports these services in a fairly rational way. When those services are active, they provide a series of “endpoints” within their network. These endpoints act like normal Zoom clients, meaning that they participate in your group conversation, and they obtain the keys to decrypt and access the audio/video data: either to record it, or bridge to normal phones.

In theory this isn’t so bad. Even an end-to-end encrypted system can optionally allow these features: a user (e.g., the conference host) could simply send its encryption keys to a Zoom endpoint, allowing it to participate in the call. This would represent a potential loss of security, but at least users would be making the decision themselves.

Unfortunately, in Zoom’s system the decision to share keys may not be entirely left up to the users. And this is where Zoom gets a little scary.

The “pretty-damn-bad”, AKA key management

The real magic in an end-to-end encrypted system is not necessarily the encryption. Rather, it’s the fact that decryption keys never leave the endpoint devices, and are therefore never available to the service provider.

(If you need a stupid analogy here, try this one: availability of keys is like the difference between me when I don’t have access to a cheesecake, and me when a cheesecake is sitting in my refrigerator.)

So the question we should all be asking is: does Zoom have access to the decryption keys? On this issue, Zoom’s blog post becomes maddeningly vague:

Zoom currently maintains the key management system for these systems in the cloud. Importantly, Zoom has implemented robust and validated internal controls to prevent unauthorized access to any content that users share during meetings, including – but not limited to – the video, audio, and chat content of those meetings.

In other words: it sounds an awful lot like Zoom has access to decryption keys.

Thankfully we don’t have to wait for Zoom to clarify their answers to this question. Bill Marczak and John Scott-Railton over at Citizen Lab have done it for us, by reverse-engineering and taking a close look at the Zoom protocol in operation. (I’ve worked with Bill and his speed at REing things amazes me.)

What they found should make your hair curl:

By default, all participants’ audio and video in a Zoom meeting appears to be encrypted and decrypted with a single AES-128 key shared amongst the participants. The AES key appears to be generated and distributed to the meeting’s participants by Zoom servers ….

In addition, during multiple test calls in North America, we observed keys for encrypting and decrypting meetings transmitted to servers in Beijing, China.

In short, Zoom clients may be encrypting their connections, but Zoom generates the keys for communication, sometimes overseas, and hands them out to clients. This makes it easy for Zoom to add participants and services (e.g., cloud recording, telephony) to a conversation without any user action.

It also, unfortunately, makes it easy for hackers or a government intelligence agency to obtain access to those keys. This is problematic.

The good news

From the limited information in the Zoom and Citizen Lab posts, the good news is that Zoom has already laid much of the groundwork for building a genuinely end-to-end encrypted service. That is, many of the hard problems have already been solved.

(NB: Zoom has some other cryptographic flaws, like using ECB mode encryption, eek, but compared to the key management issues this is a minor traffic violation.)
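To see why ECB mode earns an “eek”, here is a minimal Python sketch. It is not Zoom’s code and it is not real AES: `toy_encrypt_block` is a hypothetical stand-in block cipher (a 4-round Feistel network built from HMAC-SHA256) that exists only so the example runs with the standard library. The property it illustrates holds for any block cipher used in ECB mode: identical plaintext blocks produce identical ciphertext blocks, so patterns in the data leak through. A counter-style mode is shown for contrast as one common alternative, not as a claim about what Zoom should pick.

```python
import hashlib
import hmac

BLOCK = 16  # bytes; the same block size as AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    # Keyed round function for the toy Feistel network.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 16-byte block cipher (NOT AES, NOT secure): 4-round Feistel."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def blocks(data: bytes) -> list:
    return [data[i : i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """ECB: every block is encrypted independently with the same key."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    return b"".join(toy_encrypt_block(key, b) for b in blocks(plaintext))


def ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Counter mode: XOR each block with an encrypted (nonce, counter) pair."""
    if len(nonce) != BLOCK // 2:
        raise ValueError("nonce must be 8 bytes")
    out = bytearray()
    for index, block in enumerate(blocks(plaintext)):
        keystream = toy_encrypt_block(key, nonce + index.to_bytes(8, "big"))
        out += _xor(block, keystream)
    return bytes(out)


if __name__ == "__main__":
    key = b"k" * 16
    message = b"A" * 32 + b"B" * 16  # two identical blocks, then a different one
    ecb = blocks(ecb_encrypt(key, message))
    ctr = blocks(ctr_encrypt(key, b"\x00" * 8, message))
    print("ECB repeats the first block:", ecb[0] == ecb[1])
    print("CTR repeats the first block:", ctr[0] == ctr[1])
```

An eavesdropper who sees the ECB output learns that the first two blocks of the message were the same, without ever touching the key.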

What Zoom needs now is to very rapidly deploy a new method of agreeing on cryptographic session keys, so that only legitimate participants will have access to them. Fortunately this “group key exchange” problem is relatively easy to solve, and an almost infinite number of papers have been written on the topic.

(The naive solution is simply to obtain the public encryption keys of each participating client, and then have the meeting host encrypt a random AES session key to each one, thus cutting Zoom’s servers out of the loop.)
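As a rough illustration of that naive approach, the sketch below has a host generate a random 128-bit session key and wrap it separately for each participant’s public key. Everything in it is an assumption made for the example: the function names are hypothetical, the public-key scheme is a toy hashed-ElGamal construction over a small Diffie-Hellman group chosen so the code runs with only the standard library, and the parameters are far too weak for real use. A real client would use a vetted library, and the wrapped keys would also need integrity protection.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters. 2**255 - 19 is prime, but this is NOT a
# recommended group for real key exchange; it only keeps the sketch small.
P = 2**255 - 19
G = 2


def keygen():
    """Return a (secret_key, public_key) pair for one client."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)


def _pad(shared_secret: int) -> bytes:
    # Hash the shared group element down to a 16-byte one-time pad.
    return hashlib.sha256(shared_secret.to_bytes(32, "big")).digest()[:16]


def wrap(session_key: bytes, recipient_pk: int):
    """Encrypt a 16-byte session key to one recipient's public key."""
    eph_sk, eph_pk = keygen()  # fresh ephemeral key for every recipient
    pad = _pad(pow(recipient_pk, eph_sk, P))
    return eph_pk, bytes(a ^ b for a, b in zip(session_key, pad))


def unwrap(wrapped, recipient_sk: int) -> bytes:
    """Recover the session key using the recipient's secret key."""
    eph_pk, ciphertext = wrapped
    pad = _pad(pow(eph_pk, recipient_sk, P))
    return bytes(a ^ b for a, b in zip(ciphertext, pad))


def host_distribute(public_keys: dict):
    """Host side: pick a random session key and wrap it for each client."""
    session_key = secrets.token_bytes(16)  # 128 bits, as in AES-128
    wrapped = {name: wrap(session_key, pk) for name, pk in public_keys.items()}
    return session_key, wrapped


if __name__ == "__main__":
    clients = {name: keygen() for name in ("alice", "bob", "carol")}
    session_key, wrapped = host_distribute(
        {name: pk for name, (_sk, pk) in clients.items()}
    )
    # The server only ever relays `wrapped`; it never sees `session_key`.
    recovered = [unwrap(wrapped[n], sk) == session_key for n, (sk, _pk) in clients.items()]
    print("all participants recovered the session key:", all(recovered))
```

In this arrangement the service only relays the per-participant wrapped values. Whether it can still interfere depends on how the public keys were obtained in the first place.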

This won’t be a panacea, of course. Even group key exchange will still suffer from potential attacks if Zoom’s servers are malicious. It will still be necessary to authenticate the identity and public key of different clients who join the system, because a malicious provider, or one compelled by a government, can simply modify public keys or add unauthorized clients to a conversation. (Some Western intelligence agencies have already proposed to do this in practice.) There will be many hard UX problems here, many of which we have not solved even in mature E2E systems.
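One common way to approach the authentication problem is to let participants compare short fingerprints of each other’s public keys over some channel the provider doesn’t control. The sketch below only illustrates that idea under assumed names (`fingerprint` and `keys_match` are hypothetical, and the byte strings stand in for real public keys); it does not describe what Zoom or any particular app does.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes, groups: int = 8) -> str:
    """Return a short, human-comparable digest of a public key."""
    digest = hashlib.sha256(public_key).hexdigest()
    return " ".join(digest[i : i + 4] for i in range(0, groups * 4, 4))


def keys_match(key_from_server: bytes, fingerprint_from_peer: str) -> bool:
    """Check a key delivered by the server against a fingerprint that the
    peer communicated out of band (read aloud, shown as a code, etc.)."""
    return hmac.compare_digest(fingerprint(key_from_server), fingerprint_from_peer)


if __name__ == "__main__":
    alice_real_key = b"alice-public-key"       # placeholder bytes
    substituted_key = b"attacker-public-key"   # what a malicious server might send
    spoken = fingerprint(alice_real_key)       # Alice reads this to Bob directly
    print("honest server:", keys_match(alice_real_key, spoken))
    print("key swapped:  ", keys_match(substituted_key, spoken))
```

The cryptography here is the easy part. Getting people to actually perform the comparison is one of the UX problems mentioned above.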

We’ll also have to make sure the Zoom client software is trustworthy. All the end-to-end encryption in the world won’t save us if there’s a flaw in the endpoint software. And so far Zoom has given us some reasons to be concerned about this.

Still: the perfect is the enemy of the good, and the good news is that Zoom should be able to get better.

Are we being unfair to Zoom?

I want to close by saying that many people are doing the best they can during a very hard time. This includes Zoom’s engineers, who are dealing with an unprecedented surge of users, and somehow managing to keep their service from falling over. They deserve a lot of credit for this. It seems almost unfair to criticize the company over some hypothetical security concerns right now.

But at the end of the day, this stuff is important. The goal here isn’t to score points against Zoom, it’s to make the service more secure. And in the end, that will benefit Zoom as much as it will benefit all of the rest of us.

10 thoughts on “Does Zoom use end-to-end encryption?”

  1. Matt — I want to score points against people who don’t use Qubes OS, where can you sign me up?

  2. Matt – Have you looked at Wickr? The group I work for is looking at using it as opposed to Zoom due to the E2E systems. Thanks.

  3. All conference systems I have used so far have one big flaw: distributing the joining information to the participants.
    On Jitsi it is done like this: url meet.jit.si/superSecurePasdphrase. The same is true for 3CX.

    So whoever intercepts the URL can join in. The encryption might be great, but the meeting ID is the passphrase/key.

    So go figure.

    I’d expect that Signal groups would be a starting point for a key management system.

    1. The Jitsi URL includes the room name, not the passphrase. In a web browser on a laptop or desktop, go to the URL you suggested, move your mouse near the bottom-right, and click on the circled i. Some information should pop up, as well as a link to set a passphrase for that room, which I think will persist until the room is empty again.

      I think meet.jit.si gives everyone moderator privileges, but out of the box on Debian, it’ll give moderator privileges only to the first person to arrive in an empty room, and it can be set up to give the ability to create rooms only to authorized people.

  4. I have questions about this: “… we encrypt all video, audio, screen sharing, and chat content at the sending client, and do not decrypt it at any point before it reaches the receiving clients”.

    If they don’t decrypt it, does that mean that each receiving client is responsible for processing up to 49 incoming video streams simultaneously? Or is the meeting host’s client responsible for composing all the video streams into one master stream that gets sent to all the other clients?

    In either case, how many laptops or smartphones on WiFi connections would be capable of handling all that?

    I guess what I’m trying to say is that if Zoom is usable at all with up to 49 HD video streams on screen, they must be decrypting and processing the incoming video streams before sending them to the clients. Unless they’ve come up with some truly unbelievable breakthroughs in fully homomorphic encryption. And that seems unlikely, given that they were using ECB mode.

  5. This was a very helpful explanation. I hope you can take a look at the Google-Apple Bluetooth proximity detector for coronavirus contact tracing.
