Yesterday the Drupal security team gave a dire warning about extremely dangerous security vulnerabilities in multiple contributed modules. The fixes, and the details, would be released at 9am Pacific Time today.
I dropped what I was doing and started going through our customer sites, making sure they were all clean and ready for these updates when they were released.
When 9:00 this morning struck... it turned out to be completely anti-climactic. It wasn't the dozen widely used, highly vulnerable modules I was prepared to update on 60 different Drupal sites. It turned out to be... Three. None in our regular module selection.
Even so, I sprang into action. I went into our "configuration management" system, and did a search across the 39 servers and workstations we currently manage. Of the 3 modules that were affected, I found:
- Coder, which had the most dangerous exploit, on 4 sites.
- Restws, the other highly critical exploit, on 1 site.
- webform_multifile nowhere on any site we've touched in the past two years.
The Coder exploit is by far the most dangerous, as it can be exploited even if you have this module completely disabled. I found it on two OpenAtrium sites we manage, and quickly removed it. I also found it on our main site, freelock.com -- where I could do a test to see if it was even exploitable.
And... due to the way we have our production web servers set up, it was not possible to exploit on our servers, because we do not allow any PHP execution in subdirectories on any Drupal sites. Which means that even if you had this kind of dangerous code present on your server, if we set it up for you in the past year, you're immune to not just this attack, but any attack that depends on directly requesting a PHP file in a subdirectory!
How we can rush out an update to 60+ Drupal sites better than you can
It all starts with preparation. For every site we work on, we know what modules are running, we set up "visual regression tests" so we can see if a single pixel changes on a list of specific pages on your site after an upgrade, and we routinely check the production environment for anomalies that might cause problems when we need to rush something out in a hurry. We also make sure we have solid nightly backups of databases, files, and code in case something truly goes awry (or your site gets hacked through some previously unknown vulnerability).
Before we can provide adequate maintenance
Most of the sites we maintain are not sites we built -- they are sites that others built, and the customers realized they need ongoing support, or the peace of mind that somebody has their back in situations like these. Most sites we build go straight on our maintenance plan -- with our Base Business Site this is included for the first year.
For other sites, we highly recommend our Drupal Site Assessment. When we do a site assessment, we compare every module in use on the site with the original state, so we can determine whether a prior developer changed it in ways that will likely break on an upgrade. Based on the site assessment, we will recommend a budget for getting the site fully cleaned up and up-to-date so that future security updates can be rolled out without catastrophe. We also set up the site in our "continuous integration" system, complete with setting up a base set of tests that will run on every update so we can catch things that break before they reach production.
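The module comparison described above can be approximated by hashing every file in a pristine copy of a module and in the installed copy, then listing the differences. This is a minimal sketch rather than our actual assessment tooling: the function names and the idea of keeping an unpacked pristine copy in a local directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def tree_hashes(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_module(pristine_dir, installed_dir):
    """Report files that differ between a pristine copy and the installed copy."""
    pristine = tree_hashes(pristine_dir)
    installed = tree_hashes(installed_dir)
    return {
        # Present in both trees, but with different contents.
        "modified": sorted(
            f for f in pristine.keys() & installed.keys()
            if pristine[f] != installed[f]
        ),
        # Present only in the installed copy.
        "added": sorted(installed.keys() - pristine.keys()),
        # Present only in the pristine copy.
        "removed": sorted(pristine.keys() - installed.keys()),
    }
```

An empty report for a module suggests it can be upgraded without losing local changes; anything under "modified" is worth reading before the next update.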
Preparing for a big rollout
This is what I did to prepare for what could have been a much bigger deal today:
- Made sure our inventory of maintenance sites was current
- Checked each production site to make sure it is "clean" -- i.e. that there are no changes on production that might get clobbered by a release (we're on the verge of having this step done automatically, with unclean sites reported to us on a nightly basis)
- Created a spreadsheet to index the current sites by version, server, server software, so I could keep track of sites that might have one-off affected modules, and could triage appropriately once I learned what the vulnerabilities were
- Made sure our "continuous integration/deployment" systems were up and running and ready for a major workout!
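The "clean" check in the list above could look something like the following sketch. It assumes each production site is a git working tree, which is an assumption for this example rather than a description of every site we manage, and both function names are hypothetical. `git status --porcelain` prints one line per changed or untracked file, so an empty result means nothing would be clobbered.

```python
import subprocess


def pending_changes(porcelain_output):
    """Return the paths listed in `git status --porcelain` output."""
    changes = []
    for line in porcelain_output.splitlines():
        if line.strip():
            # Each line is a two-character status code, a space, then the path.
            changes.append(line[3:])
    return changes


def site_is_clean(site_dir):
    """True when the working tree at site_dir has no modified or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=site_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return not pending_changes(result.stdout)
```

Keeping the parsing separate from the git call makes it easy to run the same check on output collected from a remote server.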
Triage -- What happens when the vulnerabilities and fixes are announced
We have several tools set up and ready to help us decide where we should focus our efforts first. These include:
- Our "upstream" git repository, using gitolite. We curate a bunch of contributed modules here -- if any of these need an update, we can update it here and quickly pull it out to all of our sites, for Drupal 6, 7, or 8.
- Salt, a server management tool that is the backbone of our server maintenance plan, and allows us to search across our entire managed infrastructure. This ended up being the only tool we needed today!
- Matrix, a chat service, using Vector.im. This has become a crucial tool for us to manage all our projects, and we have our own custom bot that integrates it with our build system/test runner, Concourse CI. A full test run on any of our sites can take 10 - 15 minutes. Our bot tells us in Matrix when the tests are complete, provides a link to the test results, and lets us trigger the actual deployment to most of our production sites.
- The spreadsheet we generated before this round of updates.
On this spreadsheet, I added a column for each vulnerable module, conducted a system-wide search for each module, and marked which sites had the affected module.
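Here is a rough sketch of that search-and-mark step. We ran the real search through Salt; this example only illustrates the idea against site directories on a local disk. The helper names are hypothetical, and it assumes a module can be recognized by its `.info` (or `.info.yml`) file.

```python
import csv
import io
from pathlib import Path

MODULES = ["coder", "restws", "webform_multifile"]


def has_module(site_root, module):
    """True if <module>.info or <module>.info.yml exists anywhere under site_root."""
    root = Path(site_root)
    return any(root.rglob(f"{module}.info")) or any(root.rglob(f"{module}.info.yml"))


def module_matrix(sites, modules=MODULES):
    """Build one row per site with a yes/no column for each module."""
    rows = []
    for name, root in sorted(sites.items()):
        row = {"site": name}
        for module in modules:
            row[module] = "yes" if has_module(root, module) else "no"
        rows.append(row)
    return rows


def to_csv(rows, modules=MODULES):
    """Render the rows as CSV text, ready to paste into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["site", *modules], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because this looks at files on disk rather than at the list of enabled modules, it also catches modules that are present but disabled, which matters for an exploit like the one in Coder.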
Then I reviewed the security notices themselves -- already able to ignore the modules we don't have anywhere.
For each vulnerability, my biggest concerns are potential data loss (SQL injection-type of vulnerabilities, like DrupalGeddon) and "Remote code execution" flaws that might allow an attacker to gain a foothold on the server.
For each of these, I then analyze whether the other security measures we have in place provide effective protection. On our servers, we have some very effective measures against remote code execution -- the web server itself cannot write to anywhere that code can execute, so if a site is exploited via an upload, our servers protect against that. In the case of Coder, a different configuration we have in place prevents all direct access to PHP files in subdirectories of the webroot, and this proved to be effective on freelock.com.
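The real rule lives in the web server configuration, not in application code, but its logic is simple enough to model in a few lines. The sketch below is only that, a model: the function name and the example paths are hypothetical, and it ignores edge cases such as extra path segments after the script name.

```python
from pathlib import PurePosixPath


def is_blocked(request_path):
    """True when the request targets a .php file below the document root."""
    path = PurePosixPath(request_path)
    is_php = path.suffix.lower() == ".php"
    # "/index.php" has two parts ("/" and "index.php"); anything deeper
    # is inside a subdirectory.
    in_subdirectory = len(path.parts) > 2
    return is_php and in_subdirectory
```

Under this rule `/index.php` still runs, so Drupal works normally, while a request such as `/sites/all/modules/example/script.php` is refused before PHP ever sees it.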
Once I've done that analysis, I know which sites are most vulnerable, and to which vulnerabilities. At that point I order the list of sites by how much risk I think each one faces, and group them by version and production server environment.
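As a sketch, the ordering and grouping might look like this. The `Site` fields and the numeric `risk` score are assumptions for the example; in practice the ranking is a judgment call recorded in the spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    drupal_version: int
    server: str
    risk: int  # hypothetical score: higher means more exposed


def rollout_plan(sites):
    """Group sites by (version, server) and put the riskiest groups first."""
    groups = {}
    for site in sites:
        groups.setdefault((site.drupal_version, site.server), []).append(site)

    plan = []
    for key, members in groups.items():
        # Within a group, handle the most exposed site first.
        members.sort(key=lambda s: (-s.risk, s.name))
        plan.append((key, members))

    # Order groups by the risk of their most exposed site.
    plan.sort(key=lambda item: (-item[1][0].risk, item[0]))
    return plan
```

Grouping by version and server means one batch of updates can be applied to similar sites together, while the sort keeps the most exposed batch at the top of the list.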
List in hand, I can go on our dev servers and, for any of the sites that are "clean" (i.e. don't actually have work in progress on them), I can apply the updates in about a minute, including creating the release and triggering the tests.
By the time I've applied all the updates, the test results start filtering in -- I go to Matrix, and review the results one at a time. If they look good, I add a note and trigger the deployment to production. If they don't look good, I send them over to a developer to figure out why.
Then one more pass: I go back through the list in Matrix, where our bot kindly drops in the list of changes and whether or not each deployment succeeded. For each site, I paste the changelog into our Atrium project management system and notify the customer it's complete, or proceed with deploying manually if the automatic deployment failed (necessary for some sites hosted on shared hosts, or at Acquia or Pantheon, which we haven't entirely scripted yet).
We were entirely ready for a dozen different security updates today, to deploy across 60 sites. All of the sites on our maintenance plan would have had the updates applied, with testing, within 3 - 4 hours of the announcement. We've spent a good part of the past couple years getting systems in place to make this rapid response a reality, and to make it so we can maintain this response time with 200 or more sites, with perhaps one or two more people on the team to help during these crunches.
Other companies provide tools so you can manage this type of upgrade on your own site, if you have the skills and knowledge. I don't know of any who offer the kind of testing we do with such a fast response time at such an affordable rate.
Let us know if you'd like us to handle your maintenance, keep your site protected and up-to-date so you can rest easy when other site owners are scrambling to make sure they are safe!