The Meltdown vulnerability leaked into the public news a full week before patches were available for many distributions. And when patches did become available, they sometimes caused further trouble.
Our vulnerable systems
Before patches were available, we downloaded the proof-of-concept exploit code, compiled it, and tested it across the environments we work in or run in production.
Here's a quick run-down of what we found affected, and what was not:
| Environment | Vulnerable? | Notes |
|---|---|---|
| Local 16.04 workstation | Yes | Exploit ran quickly and reliably. |
| Dev server (14.04 virtual machine, 10-year-old hardware) | Yes -- but slow | Exploit ran and revealed information, but unlike on our workstations, the information dripped out a character at a time, with some errors. |
| Amazon AWS servers | Yes -- but slow | Similar to our own virtual servers: the exploit ran and revealed secrets, but slowly and with errors. Amazon had already patched the underlying hosts. |
| Google Compute Engine | No | Google's Project Zero team was one of the groups that discovered the flaw, and Google has deployed something on their infrastructure that appears to completely foil this attack: it printed only garbage characters, no actual clear text. |
| Digital Ocean | Yes | The exploit ran perfectly, and very quickly, within our Digital Ocean guests. |
We did not attempt to exploit other guests on the same hardware -- all our testing was exploiting Meltdown within a single virtual (or dedicated) host.
What happened when we patched
Most of our infrastructure uses Ubuntu LTS (Long-Term Support) releases. Ubuntu published patches for Meltdown on Tuesday January 9, the original coordinated disclosure date. We updated our older 14.04 servers to use the 16.04 HWE kernel, and deployed Ubuntu's 4.4.0-108-generic pretty much across the board, aside from some hosts that used the AWS-specific kernel. We installed these updates on Tuesday afternoon, and rebooted all our hosts that evening.
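To audit a fleet like this for stragglers, one option is to compare each host's running kernel against the first build that behaved for us. This is a hypothetical helper, not something from our actual rollout -- the version strings are ours, the function name is made up, and it assumes only coreutils:

```shell
# Hypothetical audit helper: does this kernel release predate
# 4.4.0-109, the build that finally worked for us?
# Uses only coreutils (cut, sort -V), so it runs on any of our hosts.
needs_meltdown_patch() {
    # "4.4.0-108-generic" -> "4.4.0-108"
    base="$(printf '%s\n' "$1" | cut -d- -f1-2)"
    fixed="4.4.0-109"
    # Version-sort the two strings; if ours sorts first and differs,
    # this host is still on an older (vulnerable or broken) kernel.
    oldest="$(printf '%s\n%s\n' "$base" "$fixed" | sort -V | head -n 1)"
    [ "$base" != "$fixed" ] && [ "$oldest" = "$base" ]
}

needs_meltdown_patch "$(uname -r)" \
    && echo "this host still needs the patched kernel" || true
```

Something like this could run from the configuration management system itself, flagging any host that failed to pick up the new kernel on reboot.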
For the most part, everything went very smoothly. However, we had two incidents:
- One of our continuous integration workers failed to boot into the new kernel. It was dedicated hardware in our office with no remote console available, so it stayed down and all of our overnight scheduled maintenance jobs failed. The following day's kernel release, 4.4.0-109-generic, fixed the boot problem.
- Our configuration management server (Salt) started hitting extremely high load whenever a git commit was pushed.
Meltdown is an attack that abuses the CPU's speculative, out-of-order execution, and the patches for it (kernel page-table isolation, on Linux) essentially add overhead to every transition between user space and the kernel. Most sources suggest a 5% - 30% degradation in CPU performance after patching for Meltdown -- highly dependent on workload.
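That workload dependence is easiest to see with a syscall-heavy probe, since every syscall now pays the page-table-switch toll. A crude sketch -- the byte count is arbitrary, and the absolute number only means anything compared before and after patching on the same host:

```shell
# dd with bs=1 issues a read(2) and a write(2) per byte, so this run
# is dominated by syscall entry/exit cost -- exactly what the Meltdown
# patches make more expensive. Timing uses GNU date's nanosecond format.
start=$(date +%s%N)
dd if=/dev/zero of=/dev/null bs=1 count=100000 2>/dev/null
elapsed_ms=$(( ($(date +%s%N) - start) / 1000000 ))
echo "~200000 syscalls in ${elapsed_ms} ms"
```

A compute-bound loop run the same way before and after patching would barely move, which is why most of our infrastructure showed no visible slowdown.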
For the most part, we're not noticing big slowdowns, with the one exception of our Salt events.
Salt event.fire_master performance devastation
We've spent a lot of time automating our systems, and have a variety of triggers hooked up. Once we identify a useful trigger point, we often publish events into several systems, so that if we later decide to act on them, the events are already flowing. One of those is a git post-update hook -- whenever anyone pushes any git commits to our central git server, we publish an event in several different systems that any other system can subscribe to and act on.
In our SaltStack configuration management system, our bot uses "salt-call event.fire_master" to publish a system-wide Salt event. At the moment, we have a "Salt Reactor" listening for these events on a few of our repositories, but for the most part they end up entirely ignored. Yet our Salt master was ending up with a load north of 20 - 30, with a bunch of these event triggers stacked up.
When you run the event command in a shell, it normally fires and returns within a second or so. With the kernel patched for Meltdown, however, the exact same command took 2 - 3 minutes before the shell prompt reappeared -- even for repositories that had nothing subscribed to the event! Worse, our bot uses node.js to trigger these events, and in that environment it was taking more like 15 - 20 minutes before it timed out and cleaned up the process. With commits landing every minute or two, the CPU load quickly started climbing and triggering all sorts of monitoring alerts.
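One way to harden a hook like this is to stop waiting on salt-call at all: fire the event in the background with a hard cap on its runtime, so a slow master can never hold up a push. A sketch, assuming a post-update hook running in the bare repo -- the event tag scheme and the 30-second cap are invented for illustration; salt-call's event.fire_master genuinely takes a data payload and a tag:

```shell
# Sketch of a hardened git post-update hook: publish the Salt event
# asynchronously, with a hard timeout, so the push returns immediately
# even when the master is overloaded. Repo name and event tag are
# illustrative, not our real scheme.
repo="$(basename "$PWD" .git)"
if command -v salt-call >/dev/null 2>&1; then
    timeout 30 salt-call event.fire_master \
        "{\"repo\": \"$repo\"}" "git/push/$repo" \
        >/dev/null 2>&1 &
fi
```

This doesn't make the underlying slowdown go away, but it decouples push latency from master load, which is the part that was actually hurting us.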