Not two weeks after my newsletter calling out how people take for granted that nothing bad will happen to their web sites, two of the biggest providers, Amazon and Akamai, went down yesterday in several separate incidents.
And it sounds like one of the Amazon incidents actually involved losing customer data.
Several very popular sites were affected by the outages, including Netflix, Foursquare, Quora, and Reddit.
If you really want to keep your service running in spite of a data center outage, you need to weigh three things: confidentiality, data integrity, and availability. Basic shared hosting gives you none of these.
Confidentiality. If you use FTP to upload your files to a server, your confidentiality is blown before you even get there: FTP sends everything in the clear, including your passwords. If you're on a shared server, confidentiality is also very hard to protect. For real confidentiality, you're generally looking for a data center with rigorous physical access policies that is willing to submit to an audit of its processes and systems. That level is only necessary if you're storing data that might be a target for attack. For a huge number of sites, simply encrypting connections to the server during any sensitive transactions may be enough.
Integrity, when it comes to your data, means making sure it stays intact. Has your data, or your backup, been corrupted somehow, whether through a virus, an attacker, a hardware failure, or an accident on your part? Backups and redundancy are the key drivers here, and this is where our services shine: we've spent a lot of time protecting the integrity of data and providing historical, redundant backups. The other cost factor with data integrity is how much data you're willing to lose. If you can live with losing a few hours of data, we can generally keep your losses within that window using one or two servers. Getting that number down generally involves running mirrors and setting up failover, and even then you're still not well protected against accidental deletion.
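To make the "how much data are you willing to lose" question concrete, here is a minimal Python sketch. It is an illustration only, not a description of our actual backup tooling: it assumes you keep a list of backup timestamps and know roughly when the data went bad, and the function names are hypothetical. The idea is that you restore from the newest backup taken before the corruption, so anything changed after that backup is lost.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def latest_clean_backup(backup_times: Sequence[datetime],
                        corrupted_at: datetime) -> Optional[datetime]:
    """Return the newest backup taken strictly before the data went bad.

    Backups taken after corrupted_at already contain the bad data, so they
    are skipped. Returns None if no backup predates the corruption.
    """
    clean = [t for t in backup_times if t < corrupted_at]
    return max(clean) if clean else None


def data_lost(backup_times: Sequence[datetime],
              corrupted_at: datetime) -> Optional[timedelta]:
    """Window of changes lost by restoring from the latest clean backup."""
    backup = latest_clean_backup(backup_times, corrupted_at)
    if backup is None:
        return None  # nothing usable to restore from
    return corrupted_at - backup


if __name__ == "__main__":
    # Hypothetical schedule: a backup every six hours.
    day = datetime(2024, 1, 1)
    backups = [day + timedelta(hours=h) for h in (0, 6, 12, 18)]
    bad_write = day + timedelta(hours=14, minutes=30)
    print(latest_clean_backup(backups, bad_write))  # 2024-01-01 12:00:00
    print(data_lost(backups, bad_write))            # 2:30:00
```

With backups every six hours, the worst case is just under six hours of lost changes. A live mirror doesn't fix this by itself: it copies a bad write or an accidental deletion almost immediately, so its copy is already past the corruption point. That's why historical backups still matter alongside failover.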
Availability is usually the first thing we sacrifice for clients with limited budgets. It is typically measured as a percentage of uptime per year. One request we've seen recently asked for 99.98% uptime, which allows only about 105 minutes of downtime per year. We generally achieve that level of availability if nothing goes wrong, but Amazon EC2 has already had two outages this year in its US East data center that together approach that many minutes of downtime. So to feel confident in having less than 2 hours of downtime a year, we would want to configure multiple servers in a failover arrangement, so that if the main service were disrupted we could simply point traffic at the backup, barely skipping a beat. This more than doubles the cost right out of the gate.
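The arithmetic behind those numbers is easy to check. The Python sketch below (function names are hypothetical) converts an uptime target into a yearly downtime budget, then gives a rough, optimistic estimate of what a failover group buys you. The failover function is an illustration under a loudly stated assumption: it treats the servers as failing independently, which is exactly what a data-center-wide outage breaks if both servers sit in the same facility, and it ignores the time it takes to switch traffic over.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year allowed by an uptime target."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


def failover_uptime_percent(single_uptime_percent: float, servers: int = 2) -> float:
    """Uptime of a failover group, ASSUMING servers fail independently.

    The service is down only when every server is down at the same time.
    This is optimistic: servers in the same data center tend to fail together,
    and the switchover itself takes time that this ignores.
    """
    p_down = 1 - single_uptime_percent / 100
    return (1 - p_down ** servers) * 100


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(99.98):.1f} minutes/year")  # 105.1 minutes/year
    print(f"{failover_uptime_percent(99.9):.4f}%")
```

Under that independence assumption, two servers at 99.9% each would come out around 99.9999%. Put both in the same data center, though, and a single outage can take them down together, so the real figure can be much closer to the single-server number.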
The bottom line here is that if your business depends upon your site in any kind of mission-critical way, you need to assess your tolerance for potential loss of confidentiality, data integrity, and system availability, and plan accordingly. And if it involves Drupal, we'd be happy to help!