DevOps is the union of development, operations, and quality assurance -- but it really works the other way around. You start with quality: developing tests to ensure that things that have broken in the past don't break again, making sure the production environment is in a known, fully reproducible state, and putting safeguards in place so you can roll back to that state if anything goes wrong. Next comes operations automation: building out tools that ensure snapshots are taken, deployments happen in an entirely consistent way, and environments are monitored for performance degradation as well as simple uptime. Finally, you're in a safe place where you can develop new functionality without risking major breakage.
This is the philosophy we followed while building out our DevOps practice. We apply the same principles to developing the very process we use to build new sites and add to existing ones -- as we flesh out one part of our pipeline, the next one becomes easier to put in place.
DevOps as a project
Our best example of a DevOps project is the system we have developed internally to manage dozens of production Drupal and WordPress websites. At the center of it all is a chat bot we've written, named Watney. Watney takes input from all sorts of internal systems, including git repositories, configuration management tools, continuous integration tools, project management systems, and developers. It then enables or disables pipelines, kicks off jobs as needed (and as authorized), updates issues, logs activities, and sends notifications when jobs complete.
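To make the routing idea concrete, here is a minimal Python sketch of a Watney-style dispatcher. It is not Watney's actual code: the `Event` and `Bot` classes, the source and event names, and the per-user permission table are all hypothetical. The sketch only shows the pattern of matching an incoming event to a handler, checking that the user is authorized, and logging every outcome.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Event:
    source: str  # where the event came from, e.g. "git", "ci", "chat" (hypothetical names)
    kind: str    # what happened, e.g. "push", "job_done", "deploy"
    user: str    # who triggered it
    payload: dict = field(default_factory=dict)


Handler = Callable[[Event], str]


class Bot:
    """Hypothetical dispatcher in the spirit of Watney: route, authorize, log."""

    def __init__(self) -> None:
        self.handlers: Dict[Tuple[str, str], Handler] = {}
        self.permissions: Dict[str, Set[str]] = {}  # user -> event kinds they may trigger
        self.log: List[str] = []

    def on(self, source: str, kind: str, handler: Handler) -> None:
        # Register a handler for one (source, kind) combination.
        self.handlers[(source, kind)] = handler

    def grant(self, user: str, kind: str) -> None:
        self.permissions.setdefault(user, set()).add(kind)

    def dispatch(self, event: Event) -> str:
        handler = self.handlers.get((event.source, event.kind))
        if handler is None:
            result = f"ignored {event.source}/{event.kind}"
        elif event.kind not in self.permissions.get(event.user, set()):
            result = f"denied {event.kind} for {event.user}"
        else:
            result = handler(event)
        self.log.append(result)  # every outcome is logged, including denials
        return result


bot = Bot()
bot.on("chat", "deploy", lambda e: f"deploying {e.payload['site']}")
bot.grant("alice", "deploy")
print(bot.dispatch(Event("chat", "deploy", "alice", {"site": "example.com"})))
```

Keying handlers on a (source, kind) pair keeps each integration independent: hooking up a new system means registering new handlers rather than changing the dispatch logic.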
Each day Watney triggers a job that checks the environment of every production website we manage. It alerts us to any changes that have not been properly tracked in configuration or code management, and to any sites that have changes staged but not yet deployed. As new work is done, Watney kicks off Behavior-Driven Development (BDD) tests automatically and reports the results. If the tests pass, the site is automatically marked as "Ok to stage"; otherwise it requires attention from a developer. When sites are staged, they are automatically run through visual regression testing, comparing the stage site with production. These tests often fail because of fresh content on production, different slides showing in carousels, or different ads loading -- so we allow a person to approve a test run even if the number of differing pixels exceeds the base threshold.
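The gating rules described above can be sketched as two small functions. This is an illustration under assumptions, not our production code: `bdd_status`, `visual_gate`, the per-page pixel counts, and the 500-pixel `BASE_THRESHOLD` are hypothetical, and the screenshot comparison that produces the pixel counts is left out. The key point is that a human approval overrides a threshold failure for the whole run.

```python
from typing import Dict, List, Optional, Tuple

BASE_THRESHOLD = 500  # hypothetical default: max differing pixels allowed per page


def bdd_status(tests_passed: bool) -> str:
    """Label a build after its BDD test run."""
    return "Ok to stage" if tests_passed else "Needs developer attention"


def visual_gate(
    diffs: Dict[str, int],
    threshold: int = BASE_THRESHOLD,
    approved_by: Optional[str] = None,
) -> Tuple[bool, List[str]]:
    """Check per-page pixel diffs (stage vs. production) against the threshold.

    Returns (passed, failing_pages). A run with failing pages still passes
    if a person has approved it, e.g. because production has fresher content.
    """
    failing = [page for page, pixels in diffs.items() if pixels > threshold]
    passed = not failing or approved_by is not None
    return passed, failing


run = {"/": 120, "/news": 9400}
print(visual_gate(run))                      # (False, ['/news'])
print(visual_gate(run, approved_by="dana"))  # (True, ['/news'])
```

Returning the failing pages even when the run is approved keeps a record of what the reviewer signed off on.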
For the actual deployment, Watney assembles release notes from git commits, developer notes, and the cases, issues, and user stories included in the release. It takes a fresh snapshot of the database, verifies that all tests have passed or been approved, tags the release, and then rolls out the code. The next day, Watney creates a fresh sanitized copy of the production database to put on stage, resets the pipeline, and bumps the version number.
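Here is a rough sketch of the release-note and version-bump steps. It makes several assumptions: commit subjects have already been collected (for example with `git log --format=%s <last-tag>..HEAD`), issues are referenced in commits as `#123`, and versions follow a `major.minor.patch` scheme where each release bumps the patch number. These conventions and the function names are hypothetical and serve only as an illustration.

```python
import re
from typing import Iterable

ISSUE_RE = re.compile(r"#(\d+)")  # assumed convention: commits reference issues as "#123"


def release_notes(version: str, commits: Iterable[str], dev_notes: Iterable[str] = ()) -> str:
    """Build plain-text release notes from commit subjects and developer notes."""
    commits = list(commits)
    # Collect every referenced issue number once, in numeric order.
    issues = sorted({int(n) for c in commits for n in ISSUE_RE.findall(c)})
    lines = [f"Release {version}", "", "Changes:"]
    lines += [f"- {c}" for c in commits]
    notes = list(dev_notes)
    if notes:
        lines += ["", "Developer notes:"] + [f"- {n}" for n in notes]
    if issues:
        lines += ["", "Issues: " + ", ".join(f"#{i}" for i in issues)]
    return "\n".join(lines)


def bump_patch(version: str) -> str:
    """Increment the patch component of a major.minor.patch version string."""
    major, minor, patch = (int(p) for p in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


print(release_notes("1.4.2", ["Fix login redirect #12", "Update core #7 #12"],
                    ["Clear caches after deploy"]))
print(bump_patch("1.4.2"))  # 1.4.3
```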
In the process of fleshing out this pipeline, we had to address a lot of requirements and concerns, many of them specific to Drupal and WordPress sites:
- How to prevent test sites that integrate with 3rd party APIs from polluting external production databases with test data
- How to make sure we only work with sanitized data
- For WordPress, how to consistently change domain names so the site functions on test and stage environments
- How to deploy configuration changes for Drupal 8's new configuration management system
- How to enable development-only modules on dev sites and disable them in production, and vice versa
- How to generate privileged logins while maintaining security around the sites
- How to turn pipelines on and off once we hit hard limits on scaling a pipeline across dozens of sites
- How to improve pipeline performance to be able to push out a fully tested security update to all maintained sites within 4 hours
- How to integrate our pipeline with a variety of hosting environments, including dedicated Drupal hosts such as Acquia and Pantheon, as well as Amazon AWS, Microsoft Azure, Google Compute Engine, DigitalOcean, and self-hosting
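Sanitizing data is one of the concerns above that is easy to show in miniature. The sketch below assumes user records have already been loaded as dictionaries with `uid`, `mail`, and `pass` keys. That shape is hypothetical and only loosely modeled on a Drupal users table. The function rewrites each email to a unique address on the reserved `example.com` domain and blanks password hashes, without mutating the input.

```python
from typing import Any, Dict, List


def sanitize_users(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return sanitized copies of user rows; the input rows are left untouched."""
    clean = []
    for row in rows:
        r = dict(row)  # shallow copy so the loaded production data is not modified
        # Unique per user, on a reserved domain, so test mail never reaches a real person.
        r["mail"] = f"user+{r['uid']}@example.com"
        r["pass"] = ""  # drop password hashes entirely
        clean.append(r)
    return clean


prod = [{"uid": 7, "name": "editor", "mail": "jane@client.org", "pass": "$S$Dabc"}]
print(sanitize_users(prod)[0]["mail"])  # user+7@example.com
```

On Drupal sites, Drush's `sql:sanitize` command performs a similar scrub directly against the database. A hand-rolled version like this is mainly useful for custom tables that a generic tool doesn't know about.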
Other operational systems
Operations is all about systems -- processes and tools to keep things running smoothly, without interruption. Surrounding our core chat bot and deployment process are more typical operations systems -- server monitoring, performance monitoring, ticketing systems, time tracking, billing/invoicing, phone systems, email, file sharing services. At Freelock we have rolled out self-hosted open source systems across the board, along with some custom glue holding it all together.
Drawing on this experience, we also provide consulting services to other organizations to help them build their own DevOps practices.
If we can help your organization build out a tailored Continuous Integration pipeline, or other DevOps process, let us know!