When choosing any service provider, a crucial question is, "What happens if something goes wrong?" When you're choosing a hosting provider, we like to dig a bit deeper and ask which risks are likely to be an issue for you.
Here are some of our questions:
- How easy is it to hack?
- How easy is it to recover?
- How much downtime might you have?
- How much data might you lose?
- How much traffic can you handle?
- How much access do you have?
- What steps are involved in handling more traffic than you can handle today?
- What regulatory compliance needs do you have?
- Who is maintaining the site, OS, and supporting packages?
- How much support do you need?
- What is your budget?
Before drilling into these areas, let's quickly go over the basic types of hosting:
- Shared Hosting - Your site is one of many running on a large server. This is what you get when you go sign up for a "cheap" hosting plan. You generally get FTP access and a control panel -- if you're lucky there is "shell" access (SSH).
- Dedicated Virtual Private Server (VPS, or Cloud Server) - This is generally what most of our customers use, a private server that only runs your stuff. Generally you have full root access and the ability to run whatever software you like.
- Platform hosting - For Drupal this is hosting like Pantheon or Acquia -- infrastructure built up to allow you to manage your site, with tools to assist with deployment, multiple environments, and often abstracting away the hardware or virtual servers supporting the site.
- Dedicated hardware ("Bare metal") - This is the "old school" co-located server hosting that was very common before the "Cloud" became such a big deal -- you lease a server in a datacenter, or purchase one and lease a connection in the data center.
- Load-balanced, distributed system - Beyond a certain level of traffic, a single server cannot handle everything. Or, if any downtime means significant loss of business, you need redundant systems.
Yet another dimension to consider is the level of management you're getting, and the tools you have available to help you (or us) manage your site and server. This varies highly by the host.
Now let's take a look at how these stack up when we look at the various areas of risk:
How easy is it to hack?
We see hacked sites all the time. This single question basically eliminates most shared hosting from the discussion -- when the standard hosting package puts hundreds of sites on the same server, your site might get "infected" by a worm installed by any of those other sites. Or, if anyone on your team uses FTP to copy data up to the server, the username and password can be easily sniffed by any "man in the middle" or a script kiddie on that same coffeeshop wireless network.
Even if you take reasonable precautions, and have a security-conscious host, most shared hosts run your site with the same user account you upload files with -- meaning that somebody exploiting a vulnerability in Drupal or some other library can plant a malicious file and run it remotely, and from there attack the server itself, sniff out all traffic to your site, do whatever they want -- your site is owned by the attacker.
This really eliminates the vast majority of Shared Hosting for any site that you don't want to be easily hacked.
At Freelock, our shared hosting is not like this -- we don't run FTP anywhere, we don't even allow customers to log into production servers, and all executable code is locked down so that the web server cannot write to places where code can run -- an extra layer of security that hardens the site and makes it much harder for an attacker to get in.
This is the same configuration we put on all the servers we manage for our customers, too. So, scorecard:
|Shared hosting||Veto (-20)||Most shared hosting has really bad security, and no way to harden your site.|
|Managed VPS/Bare Metal||3||If you get a VPS managed by a host, typically they provide a control panel and stack that cannot easily be hardened.|
|Platform Hosting||5||The Drupal-specific platform hosts are generally hardened.|
|VPS, Bare Metal||5||Unmanaged VPSs or Bare Metal can be configured with a hardened stack and made secure.|
|Freelock Shared Hosting||5||We harden our stack!|
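The hardening idea described above, where the web server cannot write to places where code can run, can be spot-checked with a short script. This is only a minimal sketch under stated assumptions: the function name, the list of code file extensions, and the way the web server's user and group ids are passed in are all hypothetical, and it is not the configuration we actually deploy. It is deliberately conservative, flagging any code file the web server owns, can write through its group, or that is world-writable.

```python
import os
import stat
from pathlib import Path

# Extensions treated as executable code. This list is an assumption for a
# typical Drupal tree, not an exhaustive or official one.
CODE_SUFFIXES = {".php", ".module", ".inc", ".install", ".theme"}


def writable_code_files(root, web_uid, web_gids):
    """Return code files under `root` that the web server identity could modify.

    web_uid  -- numeric user id the web server runs as (hypothetical input)
    web_gids -- set of numeric group ids the web server belongs to
    """
    risky = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in CODE_SUFFIXES:
            continue
        info = path.stat()
        owned_by_web = info.st_uid == web_uid  # an owner can always chmod its own files
        group_write = info.st_gid in web_gids and bool(info.st_mode & stat.S_IWGRP)
        world_write = bool(info.st_mode & stat.S_IWOTH)
        if owned_by_web or group_write or world_write:
            risky.append(path)
    return sorted(risky)
```

An empty result is what a hardened site should produce. The sketch ignores writable directories, which also let an attacker drop in new files, so treat it as a starting point rather than an audit.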
How easy is it to recover?
Even the most secure systems sometimes have outages. Hard drives fail. Power supplies fail. Controllers fail. Passwords get compromised. Data gets corrupted.
How easy is it to recover? Depends entirely on your backups -- and the nature of the incident.
We generally recommend having two entirely different backups to cover all the risks. And with Drupal, you have 4 different components to consider:
- Site code -- Drupal core and contrib modules.
- Site assets -- all user-generated files, uploaded photos/images.
- Site configuration -- in Drupal 8, exported configuration. In Drupal 6 - 7, generally features.
- Database -- your content, and generally a lot of configuration.
So being able to recover effectively means you need to prepare for all of the risks. Hardware failure is generally the easiest risk to recover from. A corrupt or accidentally deleted file that you only discover weeks or months later is perhaps the hardest.
At a minimum, you should have a full server snapshot taken on a nightly basis, and keep a few of those around. We generally keep nightlies for 1 week.
A server snapshot can be spun up in a matter of minutes, to recover from the previous night. Extracting a single item starts to become a forensics activity, and is much harder to do, but if you have the data somewhere, it at least becomes possible.
So we couple our nightly snapshots with historical file-based backups that go back 16 months. And we keep this secondary backup system at an entirely different provider, so we have a copy of the data if a vendor shuts off your access entirely.
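The two retention windows mentioned above (nightly snapshots kept for one week, file-based history going back 16 months) boil down to simple date arithmetic. The following is a minimal sketch, assuming one snapshot per calendar day; the function and constant names are hypothetical, and this is not the script we run in production.

```python
from datetime import date, timedelta

NIGHTLY_KEEP_DAYS = 7      # "nightlies for 1 week"
FILE_HISTORY_MONTHS = 16   # file-based backups "go back 16 months"


def expired_snapshots(snapshot_dates, today, keep_days=NIGHTLY_KEEP_DAYS):
    """Return snapshot dates that are `keep_days` or more days old, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshot_dates if d <= cutoff)


def file_history_cutoff(today, months=FILE_HISTORY_MONTHS):
    """Return the first day of the month that lies `months` months before `today`."""
    index = today.year * 12 + (today.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)
```

With one snapshot per night, `expired_snapshots` leaves the seven most recent days in place, and `file_history_cutoff` gives the oldest month the secondary backup would still need to cover.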
In short, most hosts (other than Shared Hosting) provide some sort of image snapshot capability -- make sure you configure and use them! Here's a quick run-down of ones we have current experience with, and a rating to reflect how easy to set up, easy to use, and how powerful/flexible the system is:
|Most shared hosts||-5||Most shared hosting does not provide any auto-backup facility.|
|Amazon EC2 (AWS)||4||Must be scripted -- Snapshots may be easily taken on demand at any time. We provide a custom script to take snapshots nightly. Can also leverage S3 versioned files to maintain historical backups, Content Delivery Networks, and more.|
|Google Compute Engine (GCE)||3||Like AWS, these backups must be scripted, and we have a custom script for doing GCE snapshots.|
|Digital Ocean||3||Snapshots may be enabled when the image is created, for an extra charge. Automatic snapshots run on DO's schedule; you have no ability to trigger a snapshot on your schedule (unless you stop the server first).|
|Rackspace Cloud||5||Easy automatic backups, easy on-demand backups|
|Pantheon||5||Automatic and on-demand snapshots, very easy to configure. Can be downloaded with a shell tool.|
|Acquia||5||Automatic and on-demand snapshots, very easy to configure. Can be downloaded with a shell tool.|
|Pair.com||4||We have several clients at Pair, and they have a decent backup system, but it generally requires interacting with their support to set up and actually use.|
Generally with proper planning and extra work, any host can have proper backups set up. AWS is more work to set up properly, but extremely flexible with lots of options.
How much downtime might you have?
This roughly breaks down into several different kinds of downtime:
- Downtime due to platform maintenance, upgrades
- Downtime due to site maintenance
- Downtime as a result of system failure, time to recover from backups
For regular maintenance and upgrades, we have to give the crown to Pantheon. We have several clients at Pantheon, and have never noticed any downtime at all.
Acquia, on the other hand, seems to have frequent maintenance windows that involve brief amounts of downtime.
Our own Freelock server management also generally involves server reboots every few weeks that lead to a minute or two of downtime, and Docker container restarts that take a few seconds.
Otherwise, if you're on servers that never reboot, you open up the risk of running old server software that may contain more vulnerabilities.
Drupal site updates
Downtime due to Drupal site upgrades generally spans the period between when the code is updated and when the database update scripts have been applied -- and if something fails, the time it takes to recover. Our Drupal site maintenance provides testing of all updates before rolling out to production. Pantheon forces you through a deployment path that gives you the opportunity to test before rolling out, but doesn't test for you. Acquia also provides multiple environments for testing, but does not force you to use them.
Otherwise you're on your own.
Recovery from hardware failure
Ultimately a "Cloud Server" is a shorthand way of saying "Somebody else's computer." At some point, there's a hard drive, a controller, a power supply, a motherboard, RAM, and physical network equipment. And hardware fails sometimes.
There are more serious risks to consider as well -- vendors go away without notice, accounts get shut down, stuff gets corrupted and hacked -- but hardware failure is relatively common and routine, and one of the first risks you need to have covered.
So what does recovery from a local snapshot backup look like?
|Most shared hosts||-5||Never||What snapshot backup?|
|Freelock Shared Hosting||3||1 - 2 hours||We use cloud VPS providers for all our production hosting, and configure multiple backups.|
|Amazon EC2 (AWS), Google GCE, Rackspace, most cloud VPS providers||3||1 - 2 hours||Create a new server, attach disks from old server, reassign IP address. Create new disks from snapshot if necessary.|
|Pantheon, Acquia||4||< 1 hour||Contact host, get them to do the steps above|
|Dedicated hardware||1||1 - 2 days, could be a week or more||This is the big downside of dedicated hardware -- when it fails, it's not easy to migrate to new hardware. It needs to be replaced, and the site will be offline until replacements can be arranged.|
|Distributed systems||5||Theoretically, no downtime||Everything's great, in theory. A properly architected system can be far more resilient to hardware outages, but costs a lot more to implement, and introduces a lot more complexity that can have its own risks. And a distributed system might also make you more vulnerable to outages, with more systems that have to operate correctly to maintain uptime.|
The crucial point here is that with some effort ahead of time, and using a cloud server, you can generally recover from hardware failures within a matter of hours, but on dedicated hardware it could be days or weeks. Largely for this reason, we greatly prefer virtual private servers over dedicated hardware.
How much data could you lose?
With shared hosting, it's entirely up to you to create your own backup systems. Most platform and VPS providers make it easy to do nightly backups, and possible to do them more frequently. There's no reason you can't take a snapshot every hour if it matters that much to you, though each snapshot tends to put a relatively heavy load on the site, which may affect site performance and how much traffic it can handle.
So generally, if you're ok with losing any new data that was added today, most professional packages have you covered. That's what we cover with our server maintenance plans, and generally what most of our customers are comfortable with.
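To make the trade-off concrete, the amount of data at risk is simply the time between the last good backup and the failure. Here is a small sketch of that arithmetic; the function name and the example times are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Return the time between the most recent backup and the failure.

    Anything written to the site during this window is not in any backup.
    """
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(earlier)


# Illustrative comparison: nightly backups at 02:00 versus hourly snapshots.
failure = datetime(2024, 3, 3, 23, 30)
nightly = [datetime(2024, 3, day, 2, 0) for day in (1, 2, 3)]
hourly = [datetime(2024, 3, 3, hour, 0) for hour in range(24)]
print(data_loss_window(nightly, failure))  # 21:30:00
print(data_loss_window(hourly, failure))   # 0:30:00
```

A failure late in the evening costs most of a day's changes with nightly backups, and at most an hour's worth with hourly snapshots.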
What if you're not okay with losing several hours of data?
Really the only answer is adding redundancy, with failover servers. Usually out of the components of the Drupal site, the data you need to back up is all in the database. You can set up master-slave replication, and get a live copy of all data going through the site. Then if something happens to the main database -- something other than data corruption that gets replicated, that is -- you have up-to-the-minute data you can use when you get the service back up and running.
How much access do you have?
This boils down to what you can put and run on a server. For us to fully manage it, we need full root access, and the ability to install whatever software we need. Different providers will have their own management tools, and may dictate a particular stack of software that if you change, you break their tools and can no longer use their support (if you have the ability to change this at all). Here's our assessment of hosts we have current experience with:
|Digital Ocean, Google Compute Engine, Rackspace, most unmanaged dedicated servers||5||Can install anything you want, large range of distributions, you have full root access, AND a remote console so you can log into the individual server if something goes wrong with the network stack or SSH.|
|Amazon EC2, many other VPS providers||4||Full root access, even broader range of distributions available -- the one knock is no remote console, you have to have SSH/Networking running to administer a server. Recovering if you don't have that involves attaching the drive to another server.|
|Pair, Acquia, most shared hosts with SSH support||3||No root access, no ability to install server software, but sufficient access to manage a Drupal site effectively. We cannot provide server maintenance on these plans, if we don't have root access.|
|Pantheon||2||The only access you have is via Drupal itself, or the web interface/"terminus" shell tool.|
|FTP-only shared hosts||0||FTP just plain sucks, you can't script deployments, we can't provide our site maintenance plan with only FTP access.|
|Freelock hosting||1||Of course we have full access internally, but we don't provide access to clients. We will work with clients on arranging backup solutions, but we restrict access very tightly to maintain a very secure environment, and ensure we score highly everywhere else!|
So the main questions with access are relevant to who's doing the work. We have a lot of flexibility over what we can install and manage, but if you have another company also managing the server, it usually greatly restricts what we can do, and usually conflicts with our management.
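If you want a single number to compare hosts, one rough approach is to add up the scores from the tables so far. The sketch below does that for a few hosts. Treat it as an illustration only: the area names, the function name, and the optional weights are hypothetical, and giving Amazon EC2 and Rackspace the "VPS, Bare Metal" security score is our reading of the first table rather than something it states directly.

```python
# Scores copied from the security, backup, recovery, and access tables above.
# Assumption: EC2 and Rackspace take the unmanaged "VPS, Bare Metal" security score.
SCORES = {
    "Shared host with SSH": {"security": -20, "backups": -5, "recovery": -5, "access": 3},
    "Amazon EC2": {"security": 5, "backups": 4, "recovery": 3, "access": 4},
    "Rackspace Cloud": {"security": 5, "backups": 5, "recovery": 3, "access": 5},
    "Pantheon": {"security": 5, "backups": 5, "recovery": 4, "access": 2},
}


def rank_hosts(scores, weights=None):
    """Return (host, total) pairs, highest total first, ties broken by name.

    `weights` optionally maps an area name to a multiplier; missing areas count once.
    """
    weights = weights or {}
    totals = {
        host: sum(value * weights.get(area, 1) for area, value in areas.items())
        for host, areas in scores.items()
    }
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


for host, total in rank_hosts(SCORES):
    print(f"{total:>4}  {host}")
```

Passing something like `weights={"security": 2}` lets you emphasize the areas that matter most for your site. The security veto on shared hosting dominates the total either way.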
What steps are involved in handling more traffic than you can handle today?
This basically revolves around how much traffic you are provisioned to handle, and what is involved in increasing that.
The Platform hosting providers -- Acquia, Pantheon, Platform.sh -- handle this scaling for you, and have the infrastructure built out to make this a really easy thing to do.
The larger infrastructure providers make this kind of growth possible -- Amazon, GCE, and Rackspace (among others) will let you secure a dedicated IP address you can assign to a load balancer or another server that can greatly ease a move when it's time, and also provide database-as-a-service ("RDS" for Amazon) and a Content Delivery Network (CDN) for your assets that provide a path to scaling up to a fully distributed system.
Digital Ocean, other smaller VPS providers, and dedicated hardware don't necessarily provide an easy upgrade path here -- so scaling up usually means getting a bigger VPS and transferring your site over, or building out your own distributed system from generic servers.
What regulatory compliance needs do you have?
The most common one we deal with is PCI Compliance -- the "Payment Card Industry" standards you need to adhere to if you're collecting credit card payments.
In our opinion no shared host adequately meets these requirements while having the ability to harden the site to the level we think necessary for any sensitive data. To be compliant, you either need to send the customer away to a completely separate payment site, or host with a provider that has been certified for PCI Compliant hosting. There are a few hundred hosts on that list -- currently Amazon EC2, Google Compute Engine, and Rackspace are on this list -- but Digital Ocean and many smaller providers, including Freelock, are not.
If you need PCI compliance, we generally turn to Amazon or Google -- or a local/regional datacenter that has undergone compliance.
Other kinds of compliance include "PII" - Personally Identifiable Information, which is largely regulated state by state and not nationally. If you're storing PII in any way and get hacked, you could easily have a state Attorney General investigating you, and instituting large fines if you have not covered a whole range of best practices and measures to prevent compromise.
And the other big one: HIPAA, which covers health care information. Generally if you work with HIPAA-protected data, we would work with a data center that makes people available to help go through HIPAA compliance. We've spoken with several, though have not yet worked with any directly.
Who is maintaining the site, OS, and supporting packages?
Freelock stepped into this niche, because there is basically nobody else providing full management of the entire stack.
In the table below, the green cells show areas you don't have to even think about because they are handled for you. The orange ones illustrate areas where you might need to make some decisions or request support, but that are largely on you -- or where you may have limited options. The red cells are areas that are completely up to you to figure out and manage on your own (unless you hire somebody else to manage these areas for you).
|Host Provider||Manages Hardware||Manages OS||Manages Servers (Solr, Redis, Nginx, Apache, PHP, Databases)||Manages Drupal site updates||Manages content/ effectiveness of your site||Notes|
|Shared Host||Yes (unless reseller)||Yes||Some (though rarely up-to-date)||You||You|
|VPS Provider - Amazon, Google, Digital Ocean, etc||Yes||You||You||You||You||Everything is up to you above the hardware!|
|Managed dedicated - e.g. Pair (without root)||Yes||Yes||Yes - but cannot install extras beyond what they provide||You||You||They manage the server, you manage the site|
|Platform Hosting - Acquia, Pantheon||Yes||Yes||Yes - and useful extras available||Provides tools to make this easy, but you have to do it||You||Nice supporting tools if you want to be hands on managing your site|
|Unmanaged Dedicated||You (though with "remote hands" support)||You||You||You||You||It's all you|
|Freelock Server + Site maintenance on Cloud VPS||Relies on Amazon/DO/Google for this||Yes||Yes||Yes||Provides consulting and support||You get to focus on the content and goals for your site|
Our Site Maintenance plan can provide everything necessary for the "Manages Drupal Site Updates" column, and some feedback on managing content and site effectiveness (though actual implementation is usually done as hourly retainer or budget).
Our server maintenance plan covers managing the OS and the servers, and we can provide all the same server add-ons that Pantheon and Acquia offer on top of a more bare VPS from elsewhere.
How much support do you need?
Taking a look at the previous question, there are lots of different levels where support may be needed. Most often, we get the question of, "Do you have 24x7 support?" And the direct answer is, no we don't, but do you really need that? And if so, what sort of issue are you concerned about?
The biggest one is downtime. Oh crap, my site is offline, and it's 9:00 on a Friday, am I going to be down all weekend?
Unless you've been hacked, or unless you've mucked around in a production site doing stuff you really should have done in a test site first, chances are you really only need 24x7 support from the actual host, whoever is managing the actual hardware. If you really need that, you will need more premium support, and it's worth looking at some of the support plans for Amazon, Rackspace, or Pair.
While we don't offer an SLA (a "Service Level Agreement" that basically refunds you if your site is unavailable for a specific amount of time) with support terms, we do monitor all of our hosting and server maintenance clients, and do our best to assist with any outages at any hour of the day. We've been doing server maintenance and hosting since 2003. Our longest outage in over 13 years has been one incident that took a server down for about 13 hours, and a handful of incidents that affected particular customers for 3 - 6 hours. Otherwise we may see up to 30 minutes of downtime if we screw something up, perhaps once or twice within 5 years, with upgrades we run well after regular business hours, and routine interruptions of a minute or two as we deploy updates.
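When you compare SLAs, it helps to translate an availability percentage into actual downtime, and back again. Here is a short sketch of that arithmetic. The function names are hypothetical, the 30-day month is an assumption, and the figures it produces are illustrative rather than commitments from any host.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Return the minutes of downtime permitted in `period_days` at a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


def achieved_availability(downtime_hours, period_days):
    """Return the availability percentage achieved given total downtime over a period."""
    total_hours = period_days * 24
    return 100 * (1 - downtime_hours / total_hours)


# A "three nines" target over a 30-day month allows roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(99.9), 1))
# A single 13-hour outage in one 365-day year, expressed as availability.
print(round(achieved_availability(13, 365), 2))
```

The point of the exercise is that a single long outage moves the yearly number far more than a few minutes of routine update downtime each month.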
If you need 24x7 Drupal support, Acquia is the only place I'm aware of that offers that, and then only on enterprise plans that cost tens of thousands per year.
Otherwise, when you're looking for support be sure to get support for the kinds of things you need -- help using your site, help fixing things that break, help restoring backups, help setting up new sites, help doing your own development, help managing server access, etc.
What is your budget?
Finally, the question most people start with is where we're going to end. As you can see, there's a lot to consider when choosing a host, and an enormous range of what you get for your dollars. When it comes down to it, where are you going to get the most value for your money?
And this boils down to a huge number of factors we've only touched on in broad strokes.
Under $75/month
At this level, you don't get full coverage or support. If you have the technical skills, you can certainly run your own sites and servers and find bargains here, but if you're reliant on others for skills, your best bets are probably either Pantheon Personal sites (at $25/month) or Freelock Multi-site hosting (at $50/month). These are low traffic options, minimal support, and often poor security options -- we focus on providing security and maintenance at the expense of flexibility in our multi-site option.
$75 - $300/month
At this level both Pantheon and Acquia have some options to consider, if you want to handle upgrades yourself, and Freelock hosting with maintenance enters the picture for low-traffic but highly customized sites.
$300 - $500/month
This is a reasonable level to spend to get decent support and maintenance for both a server and a site, with a dedicated VPS. At this level you should be able to handle a decent amount of traffic, conduct e-commerce, and generally have a well-secured server, full access and ability to run whatever you want.
Several requirements push you over $500/month:
- High traffic levels, beyond what a single server can handle
- Business critical need for protecting against data loss of even a single minute
- Need for highly available, redundant systems
- 24x7 support
So where should I host?
Our current top 3 recommendations are Amazon EC2, Google Compute Engine, and Digital Ocean -- any of those combined with our site and server maintenance plans. These all are infrastructure-oriented cloud hosting services, and we generally provision virtual private servers at one of these three.
We don't tend to put people at Acquia or Pantheon mainly because there's a huge overlap with our services in terms of the deployment pipelines, server management and more -- and their toolchains tend to get in the way of ours (especially Pantheon's). But we do have active clients at both, and are very familiar with their offerings, and recommend both for people who aren't a good fit for our maintenance plans.
Digital Ocean
Digital Ocean used to be our "go-to" budget option, but Amazon and Google have been in a price war, and while the pricing might vary depending on exactly what you need, we're finding that for what we most commonly provision (2GB, 4GB, or 8GB servers) all three are right in the same ballpark.
We still like Digital Ocean as it's very easy to get started, very easy to use, and we like smaller startups. But the two downsides are:
- Lack of PCI Compliance
- No real control over timing of backup snapshots, without taking the server offline.
Google Compute Engine
Google Compute Engine seems to have it all -- a nice, streamlined management interface, console access to the servers, a range of server add-ons for persistent disk, database servers, and much more. Plus you get 2 months of free service, up to $300 of resources.
But the one big downside of GCE is the risk of going offline. We've had two accounts get suspended without warning or notice, over a weekend, taking hours to resolve, because of billing snafus at the end of the free trial. And this story scares the hell out of me (quoted on the right).
We like the Google service... we're just a bit unnerved by their "too smart" systems and lack of ability to reach a human to resolve any issues.
Amazon EC2 (AWS)
... That leaves Amazon. Amazon is the big behemoth of the "cloud" world, and tends to be an innovator. We don't use most of what they offer as a platform -- but given their steadily sinking prices, extreme flexibility, their compliance story, and heavy reliance by most of the industry, it's the current "go-to" host and we recommend it more often than not.
Compared to GCE and Digital Ocean, AWS strangely lacks a direct console to any VPS you spin up, and sometimes it "feels" slower -- server restarts can take a couple minutes instead of a few seconds. Otherwise there really isn't a downside to using AWS.
Whew! I'm exhausted!
This is our best overview of how to pick a host, with notes based on our experiences as both a (small) host and a maintenance provider. I'd love to hear more about your favorite hosts, and why, below. There are plenty of other options, but some of what you need to know to provide a proper evaluation can only come after taking the time to put something into production there, working with the backup systems, upgrading servers.
If you want to advertise your own service in the comments below, please make sure you answer the questions as they apply to your service -- general spam is ruthlessly deleted. And if you'd like our assistance with server or site maintenance, contact us to get started!
Your longest outage in over 13 years has been one incident that took a server down for about 13 hours, and a handful of incidents that affected particular customers for 3 - 6 hours. What happened?
The longest incident in our recent history turned out to be two things -- a disk filled, and our monitoring failed to alert us. It was a new client server and we did not get our monitoring properly set up, so we were not alerted to the problem until the next day (by the client...). Since then we have tightened up our monitoring setup a lot, and have added detection for specific text on the homepage as well as slow ( > 5 second) response times.
Most of the other longer outages have tended to be denial-of-service attacks during weekends/evenings when we're not always immediately available, or in one case a "Distributed Denial of Service" (DDoS) attack, which we can't really do anything about other than wait the attack out -- at least not without scaling up their infrastructure much further.
Downtime and outages are a fact of life; we strive to learn from each outage and find ways of preventing any given cause from taking us down in the future.
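For anyone curious what the homepage monitoring mentioned in this thread might look like, here is a minimal sketch of a check for expected text and slow (over 5 second) responses. It uses only the Python standard library. The function names, the example URL, and the expected text are hypothetical, and this is not our actual monitoring setup.

```python
import time
import urllib.request


def evaluate_response(status, body, elapsed_seconds, expected_text, max_seconds=5.0):
    """Return a list of problems found in one homepage response (empty means healthy)."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    if expected_text not in body:
        problems.append(f"expected text {expected_text!r} not found")
    if elapsed_seconds > max_seconds:
        problems.append(f"slow response: {elapsed_seconds:.1f}s")
    return problems


def check_homepage(url, expected_text, max_seconds=5.0, timeout=30):
    """Fetch `url` once and evaluate it; network errors are reported as problems."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError, and timeouts all derive from OSError
        return [f"request failed: {exc}"]
    elapsed = time.monotonic() - started
    return evaluate_response(status, body, elapsed, expected_text, max_seconds)


if __name__ == "__main__":
    # Hypothetical site and text; replace with something that only appears
    # on a correctly rendered homepage.
    for problem in check_homepage("https://www.example.com/", "Example Domain"):
        print("ALERT:", problem)
```

Checking for specific text catches the case where the server answers quickly with an error page, which a plain "is port 443 open" check would miss.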