Patching is hard; so what?

It’s now been about a week since Equifax announced the record-breaking breach that affected 143 million Americans. We still don’t know enough — but a few details have begun to come out about the causes of the attack. It’s now being reported that Equifax’s woes stem from an unpatched vulnerability in Apache Struts that dates from March 2017, nearly two months before the breach began. This flaw, which allows remote command execution on affected servers, somehow allowed an attacker to gain access to a whopping amount of Equifax’s customer data.

While many people have criticized Equifax for its failure, I’ve noticed a number of tweets from information security professionals making the opposite case. Specifically, these folks point out that patching is hard. The gist of these points is that you can’t expect a major corporation to rapidly deploy something as complex as a major framework patch across their production systems. The stronger version of this point is that the people who expect fast patch turnaround have obviously never patched a production server.

I don’t dispute this point. It’s absolutely valid. My very simple point in this post is that it doesn’t matter. Excusing Equifax for their slow patching is both irrelevant and wrong. Worse: whatever the context, statements like this will almost certainly be used by Equifax to excuse their actions. This actively makes the world a worse place.

I don’t operate production systems, but I have helped to design a couple of them. So I understand something about the assumptions you make when building them.

If you’re designing a critical security system you have choices to make. You can build a system that provides defense-in-depth — i.e., that makes the assumption that individual components will fail and occasionally become insecure. Alternatively, you can choose to build systems that are fragile — that depend fundamentally on the correct operation of all components at all times. Both options are available to system designers, and making the decision is up to those designers; or just as accurately, the managers that approve their design.

The key point is that once you’ve baked this cake, you’d better be willing to eat it. If your system design assumes that application servers will not contain critical vulnerabilities — and you don’t have resilient systems in place to handle the possibility that they do — then you’ve implicitly made the decision that you’re never ever going to allow those vulnerabilities to fester. Once an in-the-wild vulnerability is detected in your system, you’d damn well better have a plan to patch, and patch quickly. That may involve automated testing. It may involve taking your systems down, or devoting enormous resources to monitoring activity. If you can’t do that, you’d better have an alternative. Running insecure is not an option.

So what would those systems look like? Among more advanced system designs I’ve begun to see a move towards encrypting back-end data. By itself this doesn’t do squat to protect systems like Equifax’s, because those systems are essentially “hot” databases that have to provide cleartext data to application servers — precisely the systems that Equifax’s attackers breached.

The common approach to dealing with this problem is twofold. First, you harden the cryptographic access control components that handle decryption and key management for the data — so that a breach in an application server doesn’t lead to the compromise of the access control gates. Second, you monitor, monitor, monitor. The sole advantage that encryption gives you here is that your gates for access control are now reduced to only the systems that manage encryption. Not your database. Not your web framework. Just a — hopefully — small and well-designed subsystem that monitors and grants access to each record. Everything else is monitoring.
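
To make that a little more concrete, here is a minimal sketch of what such a gate might look like. This is not Equifax’s actual architecture, and the class and function names are purely illustrative; the point is simply that the only component holding key material decrypts one record per request, throttles bursts, and writes an audit record for every access. A real deployment would keep the keys in an HSM or external key-management service and ship the logs somewhere an application-server compromise cannot reach.

```python
# Minimal sketch of a record-level decryption gate (illustrative, not a
# production design). The gate is the only component that holds the key;
# it decrypts one record per call, rate-limits callers, and logs every
# access. Uses the third-party `cryptography` package for AES-GCM.

import logging
import os
import time
from collections import deque

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("record-gate")


class RecordGate:
    """Decrypts one record per call, audits every call, throttles bursts."""

    def __init__(self, key: bytes, max_requests_per_minute: int = 60):
        self._aead = AESGCM(key)
        self._limit = max_requests_per_minute
        self._recent = deque()  # timestamps of recent decryptions

    def _check_rate(self, caller: str) -> None:
        now = time.monotonic()
        while self._recent and now - self._recent[0] > 60:
            self._recent.popleft()
        if len(self._recent) >= self._limit:
            log.warning("THROTTLED caller=%s", caller)
            raise PermissionError("rate limit exceeded")
        self._recent.append(now)

    def decrypt_record(self, caller: str, record_id: str,
                       nonce: bytes, ciphertext: bytes) -> bytes:
        self._check_rate(caller)
        # Binding record_id as associated data means a ciphertext copied
        # from one row cannot be replayed under another record's identity.
        plaintext = self._aead.decrypt(nonce, ciphertext, record_id.encode())
        log.info("DECRYPT caller=%s record=%s", caller, record_id)
        return plaintext


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)
    gate = RecordGate(key, max_requests_per_minute=5)

    # Simulate one encrypted row from the "hot" database.
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, b"ssn=000-00-0000", b"record-42")

    print(gate.decrypt_record("app-server-1", "record-42", nonce, ciphertext))
```

The interesting property is not the cryptography itself but the choke point: even a fully compromised application server has to pull records one at a time through a component that is counting and logging them.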

Equifax claims to have resilient systems in place. Only time will tell if they looked like this. What seems certain is that whatever those systems are, they didn’t work. And given both the scope and scale of this breach, that’s a cake I’d prefer not to have to eat.

5 thoughts on “Patching is hard; so what?”

  1. Hi Matt,

    Thank you for the blogpost, I have a couple of remarks, though:

    1. Patching of production systems might indeed be hard, but do it often and you get good at it (some automation will help you out).
    The main problem is that the systems running on the servers are not as flexible as they should be. Old legacy monoliths make this painful, but newer lightweight microservices are much easier to maintain. There are two main parts to this patching process:
    – Keeping the OS and middleware up to date. This is much easier if the applications running on the systems can be brought down and back up easily. Doing it often avoids major incidents caused by the patches, keeping the impact in both time and effort small (again, automating this process becomes crucial over time). This is, for example, something we are working on in our company. (Avoiding WannaCry- and Petya-type attacks.)
    – Keeping the libraries and frameworks within the application up to date is much harder. However, in a continuous deployment cycle you can use tools to check the versions of your libraries, and it is also possible to check them against CVEs (a rough sketch of such a check follows at the end of this point). Small upgrades make this process far less painful than having to skip five major versions of a framework. I agree, this also assumes your application is flexible enough and part of a CI/CD lifecycle, or what I call continuous security. (Avoiding Equifax-type attacks.)

    Again, before starting a crazy polemic: I am making some assumptions here about the applications. We all have to deal with old legacy systems, which make this hard and painful, but if you want to get secure, you’d better start investing in upgrading/replacing these systems one by one, small steps at a time.
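
    To make that concrete, here is a rough sketch of the kind of CI gate I mean. The dependency list and the “first fixed version” table are made up for the example (the Struts entry reflects the commonly cited fix release for CVE-2017-5638); in practice you would feed this from a scanner such as OWASP Dependency-Check or a CVE feed rather than hand-maintaining it:

```python
# Sketch of a CI step that fails the build when a pinned dependency is
# older than its first patched release. The advisory table and pinned
# versions below are illustrative, not a real feed.

import sys

# Hypothetical advisory data: package -> (first fixed version, CVE id)
FIRST_FIXED = {
    "struts2-core": ((2, 3, 32), "CVE-2017-5638"),
    "examplelib": ((1, 4, 0), "CVE-0000-0000"),  # made-up placeholder entry
}

# What the build actually pins (normally parsed from pom.xml,
# requirements.txt, a lockfile, etc.; hard-coded here for the demo).
PINNED = {
    "struts2-core": (2, 3, 30),
    "examplelib": (1, 5, 2),
}


def main() -> int:
    failures = []
    for package, version in PINNED.items():
        advisory = FIRST_FIXED.get(package)
        if advisory and version < advisory[0]:
            failures.append(
                f"{package} {'.'.join(map(str, version))} is below "
                f"{'.'.join(map(str, advisory[0]))} ({advisory[1]})"
            )
    for line in failures:
        print("VULNERABLE:", line)
    return 1 if failures else 0  # non-zero exit code fails the CI job


if __name__ == "__main__":
    sys.exit(main())
```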

    2. I agree that limiting access to the data through an encryption system seems like a good idea, but it brings a bunch of other problems. For one, how do you keep your data easily searchable? It also seems this might hurt general performance quite a bit. On top of that, it introduces complexity and operational overhead that might be as hard to get right as patch management itself. Nevertheless, I agree that defense in depth is the way to go, and depending on the criticality of the data this solution is a very good investment to make.
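
    On the searchability point, one common workaround (a sketch, not a full solution) is a “blind index”: store a keyed hash of the searchable field next to the ciphertext so equality lookups stay cheap while the plaintext never reaches the database. It only supports exact-match queries and leaks which rows share a value, so it is a trade-off rather than a free lunch; the key handling and field names below are illustrative:

```python
# Sketch of a blind index: a deterministic HMAC of a searchable field
# (e.g. an SSN) stored alongside the encrypted record. Lookups compare
# HMACs, so the database never sees the plaintext value.

import hashlib
import hmac
import os

INDEX_KEY = os.urandom(32)  # in practice: a long-lived key from your KMS


def blind_index(value: str) -> str:
    """Deterministic keyed digest of a searchable field."""
    return hmac.new(INDEX_KEY, value.encode(), hashlib.sha256).hexdigest()


# "Database" rows hold only ciphertext (elided here) plus the index column.
rows = [
    {"record_id": 1, "ssn_index": blind_index("000-00-0001"), "blob": b"..."},
    {"record_id": 2, "ssn_index": blind_index("000-00-0002"), "blob": b"..."},
]


def lookup_by_ssn(ssn: str):
    """Exact-match lookup without ever querying the plaintext SSN."""
    needle = blind_index(ssn)
    return [row for row in rows if hmac.compare_digest(row["ssn_index"], needle)]


if __name__ == "__main__":
    print(lookup_by_ssn("000-00-0002"))  # finds record 2
```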

    In general, I agree with the people saying this is a hard problem, but I don’t think it is impossible, as some claim. Companies need the right attitude, focus, and priorities to actually do something about this. I know how hard that is, but go talk with management, tell them the risks, and tell them resources need to be devoted to good patch management. Ask for time to maintain your applications; they won’t maintain themselves. Use these types of breaches to convince people that it is a necessary evil. And maybe most importantly, don’t tell people it can’t be done: everything is possible given the right priority and at a certain price.

    Best,
    Gijs

  2. The company for which I work is still vulnerable to the Apache Struts vulnerability. Our externally-facing retail site — which captures millions of credit card details — is vulnerable. It’s one of many vulnerabilities of which I’m aware, but it’s the most serious.

    I work in the IT security team. The developers and operations teams don’t want to know, because it’s a lot of work to run the patch through development and pre-production environments. The scheduling, post-verification testing, regression testing, etc. are tough, and they don’t have the resources. Even shutting down the application gracefully takes a team of people, not to mention handling the system that monitors that application.

    The business don’t want to know, because it means downtime to patch, which means possible lost revenue. Our compliance and risk teams don’t want to know, because they can’t go up against the business. Risks cannot be pinned on anyone; there is no risk acceptance process, no individual’s name goes against anything.

    All arguments fall on deaf ears. Compliance arguments don’t work (we are, amazingly, PCI compliant), and we’re even currently being audited against our patch/vulnerability management standards. We somehow always pass, despite my monthly report showing that we have critical vulnerabilities in our credit card processing environment that are well over 365 days old. We have celebrated the fact that we’re PCI compliant — a great win for our CIO — so much that any mention that we’re actually not compliant, if you read the PCI clauses, is shot down. “We must be compliant!” they scream. “We have a certificate.”

    I’ve set up a meeting with the business to discuss patching this vulnerability. I’ve sent them links to the latest and greatest breach. I’ve rated the risk against our enterprise risk management matrix. It’s high.

    Will we patch? I doubt it. We need to be breached in order for something to happen, and it needs to make the headlines. I’m quietly hopeful that it’ll happen some day.

    1. That’s very relatable; it is a struggle sometimes, and compliance is often a joke in comparison with real security.
      – Are you sure you haven’t been breached already? It can be quite easy for attackers to infiltrate quietly. Maybe go through the logs and look for the attack signature (a rough sketch of such a scan is below).
      – What I did was run a PoC of the attack; there are small exploit libraries that help with that. I didn’t go as far as actually getting data out, but that would have been fairly simple in our case. Not sure if that would help, but it’s worth a try?
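
      For the log sweep, here is roughly what I mean (a sketch only; it assumes your access logs record the Content-Type header, which many setups don’t by default, and the regex is the commonly published CVE-2017-5638 signature rather than a complete detection rule):

```python
# Sketch of a log sweep for the CVE-2017-5638 exploit: flag requests
# whose Content-Type header carries an OGNL-looking expression.
# Assumes the logs actually contain the Content-Type header; adjust
# the parsing to whatever your servers really write.

import re
import sys

# Published exploit payloads start the header value with "%{" or "${"
# and use "#" variable references inside the OGNL expression.
SUSPICIOUS = re.compile(r"content-type:[^\n]*[%$]\{[^\n]*#", re.IGNORECASE)


def scan(path: str) -> int:
    hits = 0
    with open(path, errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if SUSPICIOUS.search(line):
                hits += 1
                print(f"{path}:{lineno}: {line.strip()[:200]}")
    return hits


if __name__ == "__main__":
    total = sum(scan(path) for path in sys.argv[1:])
    print(f"{total} suspicious request(s) found", file=sys.stderr)
```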

      But I get your struggle: if no one is responsible, this is a damn hard problem.

      Best,
      Gijs

  3. This is a multi-million (or billion) dollar company with many employees, executives with titles like CSO, CIO, and CPO, and vast resources in terms of expertise at its disposal. There is no reason to think they weren’t using the latest standard computer security technology available (why no encryption?), and yet no one at the company noticed the vast amount of data/packets leaving their network. Are we using similar technologies that give us a false sense of security? Was it just complete negligence by the employees, from the top down?
    This is why it’s important to make what happened at Equifax public, so other companies can update/adjust their security posture.

  4. Yes, patching is critical, but patching is just one layer of a defense-in-depth strategy. Many reverse proxy / load balancers (such as F5) are able to block many attacks, including CVE-2017-5638 by applying regex tests against incoming HTTP/HTTPS requests. Something like that could have been put in place very quickly while the development team went to work on the QA process for the patch. As mentioned in the blog post, some other application layer should have been between the web site and the database back end sanity-checking the types of requests and the rate of requests. A web site like this should be talking to hardened middleware, not a database.
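
    For illustration, a minimal sketch of that kind of stopgap is below. It is a Python WSGI filter standing in for a proxy/WAF rule (such as an F5 iRule), and the pattern is only the commonly published CVE-2017-5638 signature, so treat it as a band-aid while the real patch works through QA, not as a fix:

```python
# Sketch of an edge filter that rejects requests whose Content-Type
# header looks like an OGNL expression (the published signature of the
# CVE-2017-5638 Struts exploit). A WSGI middleware stand-in for a
# reverse-proxy or WAF rule; it buys time, it does not replace patching.

import re
from wsgiref.simple_server import make_server

OGNL_PATTERN = re.compile(r"[%$]\{")  # crude: block any OGNL-style header value


class BlockStrutsExploit:
    """Wraps a WSGI app and 403s requests with a suspicious Content-Type."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        content_type = environ.get("CONTENT_TYPE", "")
        if OGNL_PATTERN.search(content_type):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"request blocked\n"]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Placeholder for the real (still unpatched) application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    with make_server("127.0.0.1", 8080, BlockStrutsExploit(demo_app)) as server:
        print("listening on http://127.0.0.1:8080")
        server.serve_forever()
```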

    We will always have human error and management error as part of the production environment equation. By not relying on a single point of failure, we give ourselves more slack to deal with these issues.
