When looking at high availability for any CMS, and particularly for Drupal, the list of contenders for part or all of the solution is growing and can be daunting. We'll take a look at the various parts of a solution and what options we have.
It seems nearly every answer to every problem in IT these days is "The Cloud", but is it?
Cloud is not perfect
We need to get this up front. There have been high profile failures of cloud providers that have left people relying on their services in the lurch. It isn't all bad news, but it does highlight that just moving your infrastructure into the cloud doesn't guarantee availability. What it also demonstrates is the need to build with failure in mind.
What the cloud does give you are the essential components of a flexible, scalable solution. Probably the most attractive benefit is the low barrier to entry. Unlike a traditional hardware-based solution that requires an expenditure on components even before you start building your system, cloud systems tend to have no initial deployment costs, only usage charges.
Before we get too deep into the cloud, let's go back to basics.
Breaking down the problem
A Drupal system is composed of a number of parts, each of which needs to be addressed in order to meet high availability requirements.
- HTTP Delivery Mechanism (web server)
- PHP processing
- Application code
- Static files
- Cache
- User Sessions
- Database
In a standard Drupal installation, the cache and user sessions are stored in the database.
In our previous post we already discussed the difference between Apache and Nginx and the potential benefits of moving to Nginx. What wasn't covered was how each interacts with PHP.
PHP is not thread safe, so when used with Apache you are forced to use the pre-fork MPM rather than the threaded version. With pre-fork, each process sticks around for a period of time and accumulates memory as the PHP code is exercised. It is not unusual to have to adjust the number of requests a process may serve before it is killed down to what may seem a ridiculously small number in order to stop the server exhausting system memory.
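The tuning described above might look something like the following in httpd.conf. The numbers are illustrative assumptions to be sized against your own memory budget, not recommendations; note that Apache 2.4 renames MaxRequestsPerChild to MaxConnectionsPerChild.

```apache
# Illustrative prefork MPM tuning; all values are placeholders.
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           50
    # Recycle workers early so memory accumulated by PHP is reclaimed
    MaxRequestsPerChild 500
</IfModule>
```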
Nginx, on the other hand, doesn't have any inbuilt mechanism for running PHP, but it does support PHP running as a FastCGI service. This normally means you have a separate daemon running the PHP and communicating results back to Nginx. In practice there are a lot of static files being delivered as part of a page request, so the number of requests that actually need to be processed by the FastCGI layer is relatively small, and you can still get quite a boost from Nginx handling the static files.
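A minimal sketch of that split in an nginx server block, assuming a PHP-FPM daemon listening on 127.0.0.1:9000 and a document root of /var/www/drupal (both placeholders):

```nginx
# Static files are served directly; everything else is handed
# to the PHP FastCGI daemon.
server {
    listen 80;
    root /var/www/drupal;
    index index.php;

    # Static assets never touch the PHP layer
    location ~* \.(css|js|png|jpg|gif|ico)$ {
        expires 7d;
    }

    location / {
        try_files $uri /index.php?q=$uri&$args;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;
    }
}
```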
Memcached
Memcached is a distributed, memory-resident cache system. By itself it isn't a high availability solution, but it is often used as part of a high-performance and highly available system.
Memcached works by setting aside memory on as many servers as you have available. Quite often you would run it on each web server. You can cache just about anything, as long as you can supply a key to identify the data and the data fits into memory. Memcached has been used to cache complete web pages, database lookups, even images. Since everything is in memory, cache hits are blindingly fast.
The keys are hashed and distributed amongst the available servers, and the data is set with a time to live. Each key/value pair will live on one and only one memcached server at any one time. If a server instance goes down, cache misses occur and the hashing will then redistribute inserts amongst the remaining servers. So while a crash will cause a performance dip, it will not cause the system to fail - provided the application handles cache misses.
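The key distribution can be sketched in a few lines of Python. This uses simple modulo hashing for clarity; real memcached clients normally use consistent hashing so that losing a server remaps fewer keys:

```python
import hashlib

def pick_server(key, servers):
    # Hash the key and map it onto one of the available servers.
    # Modulo hashing for illustration only; consistent hashing is
    # what production clients typically use.
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["web1:11211", "web2:11211", "web3:11211"]
chosen = pick_server("cache_page:/node/42", servers)

# If the chosen server dies, the same key simply maps to a survivor;
# the application sees a cache miss, not a failure.
survivors = [s for s in servers if s != chosen]
fallback = pick_server("cache_page:/node/42", survivors)
```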
Generally, to use memcached you need to modify the code you want to apply caching to. Luckily for Drupal sites there are modules that help. For both D6 and D7 the Memcache API and Integration module provides caching for user sessions and the Drupal cache, as well as memory-based locking (Drupal normally uses a database-backed locking mechanism to synchronise various tasks). The Memcache Storage module is a replacement that only works with D7 and provides a few nice extras. Note that these modules are more about performance than availability, and they don't address issues like slave lag.
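For D7 with the Memcache API and Integration module, the wiring lives in settings.php. A sketch, with the module path and server addresses as placeholder assumptions for your own install:

```php
<?php
// Route the Drupal cache through memcached.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
// Keep the form cache in the database; it doesn't tolerate eviction well.
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
// One memcached instance per web server (placeholder addresses).
$conf['memcache_servers'] = array(
  '10.0.0.11:11211' => 'default',
  '10.0.0.12:11211' => 'default',
);
```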
DRBD
DRBD (the Distributed Replicated Block Device) provides block-by-block replication of a Linux block device. By default DRBD has a primary node and a standby node that receives all the changes written to the primary. As such it is fine for failover situations, but it doesn't handle multiple front-ends writing to the same resource. There is a configuration that allows multiple primary nodes, but it requires an underlying filesystem that supports shared access, such as GFS2 or OCFS2. In either case, if you are using it to hold the filesystem the database server uses, you must ensure that the database storage engine can handle this, e.g. InnoDB.
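A two-node primary/standby pair might be described in drbd.conf along these lines. Hostnames, devices and addresses are placeholders; protocol C gives synchronous replication:

```
resource r0 {
    protocol C;                  # synchronous: writes confirmed on both nodes
    on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```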
SAN and NAS
The classic solution for the shared files problem is a SAN (Storage Area Network) or a NAS (Network Attached Storage). Both of these solutions are hardware-based and provide a network interface to the data, allowing multiple nodes to read and write it. The availability of the solution is dependent upon the architecture of the SAN/NAS. For a simple solution you could attach a filesystem via NFS.
MySQL Replication and Load Balancing Plugin for PHP
The mysqlnd replication and load balancing plugin (mysqlnd_ms) for PHP does exactly what it says on the tin. For those stuck with D6 it can work around the lack of application support for database replication and load balancing. For D7 it is not strictly necessary, except to cover some replication scenarios not handled by the inbuilt support.
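The plugin is driven by a JSON configuration file, named in php.ini via the mysqlnd_ms.enable and mysqlnd_ms.config_file settings. A sketch with placeholder hosts, one master and two slaves balanced round-robin:

```json
{
  "myapp": {
    "master": {
      "master_0": { "host": "10.0.0.20", "port": "3306" }
    },
    "slave": {
      "slave_0": { "host": "10.0.0.21", "port": "3306" },
      "slave_1": { "host": "10.0.0.22", "port": "3306" }
    },
    "filters": { "roundrobin": [ ] }
  }
}
```

The plugin then splits reads and writes transparently: statements beginning with SELECT go to a slave, everything else to the master.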
Pacemaker and Heartbeat
Pacemaker and Heartbeat are part of the Linux-HA project. Pacemaker is a cluster resource manager that works with either Heartbeat or Corosync to manage clustered systems. Together these allow you to take commodity Linux systems and build a highly available cluster with automated failover. If you are building your own HA system on your own hardware, these tools will give you the basics. Coupled with cluster resource agents, they can provide load balancing, automated failover, monitoring and control of your cluster.
MySQL and MariaDB Replication
It might seem obvious, but MySQL and MariaDB provide replication out of the box, which allows you to build systems that are resilient and performant. While multi-master replication is possible, it has some gotchas, so most systems using standard replication have a single master database server and often many slave servers. This works well if you have a system that has a high read load compared to its write load, which is fortunately the case for most web sites. When building out the mysql.com infrastructure I found that adding slaves was far more important in handling load than adding front-end web servers, as in that case the read performance of the database was the limiting factor.
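Classic binlog-position replication is set up with a handful of statements. The hosts, credentials and log coordinates below are placeholders; use SHOW MASTER STATUS on the master to find the real coordinates:

```sql
-- On the master (my.cnf): log-bin=mysql-bin, server-id=1
-- On each slave (my.cnf): a unique server-id, e.g. server-id=2

-- On the master, create an account the slaves replicate as:
CREATE USER 'repl'@'10.0.0.%' IDENTIFIED BY 'secret';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%';

-- On each slave, point at the master's binary log coordinates:
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.20',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
```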
If the master goes down, a slave can be promoted to master and all the other slaves pointed at it. This can be problematic if not all slaves are at the same point as the new master, so you have to ensure you pick the most up-to-date slave for the job. There are answers to this, though, as we'll discuss shortly.
Why not multiple masters? While it is possible, it is not simply a matter of pointing your application at the multiple masters. Take a simple example. You have a table whose primary key is an auto-increment field, so the database server is keeping track of each insert and making sure the key field is the next available number in the sequence. Now add a second master with the same table definition. Suddenly you can get duplicate keys being replicated between the masters. There are ways around this, like setting the auto-increment starting value and increment, but these are less than ideal when you have to replace a master.
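The workaround can be sketched numerically. With auto_increment_increment set to 2 on both masters and auto_increment_offset set to 1 on one and 2 on the other, each master draws from a disjoint id sequence:

```python
def id_sequence(offset, increment, count):
    # Simulate the primary keys a master hands out under the given
    # auto_increment_offset and auto_increment_increment settings.
    return [offset + i * increment for i in range(count)]

# Master 1 gets the odd ids, master 2 the even ids, so rows
# replicated between them never collide on the primary key.
master1 = id_sequence(1, 2, 5)  # 1, 3, 5, 7, 9
master2 = id_sequence(2, 2, 5)  # 2, 4, 6, 8, 10
```

The awkwardness the text mentions follows directly: adding or replacing a master means revisiting the offset/increment scheme on every node.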
Master High Availability (MHA)
MHA is a system that simplifies the management of a MySQL or MariaDB replication setup. Its overview page also gives an excellent description of the master failover problems hinted at above. For those taking the plunge into replicated database systems, this is a good choice for a simple but reliable management overlay.
MySQL Proxy
This started looking promising a few years ago, but MySQL Proxy apparently has not moved beyond the alpha stage. Added here for completeness: Proxy sits between your application and the database server or servers, and with Lua scripts it can handle load balancing and read-write splitting, or even deep query inspection to support complex decision making. It has the promise of supporting a wide range of scenarios.
Galera Cluster
Many people assume MySQL Cluster when talking about clustering with MySQL databases. Unfortunately MySQL Cluster is a completely different product from MySQL Server and is aimed at an entirely different problem domain. What you probably want when you go from a single MySQL server to a clustered solution is in fact Galera Cluster. This is now available in MariaDB, so it is the natural answer to building a clustered database solution. The benefits include multiple active masters, synchronous replication and no slave lag.
Amazon Web Services
OK, AWS is a big area, and putting it all into one paragraph is ludicrously oversimplifying. So let's get a bit more specific. In essence, AWS gives you load balancing (ELB), compute units (EC2), storage (S3 and EBS) and database services (RDS). There are also mechanisms for monitoring all of this (CloudWatch) and for handling automatic scaling. It can be a complete nightmare if you are starting from scratch and have to understand how all the components fit together. Luckily AWS also supplies Elastic Beanstalk, a mechanism that combines many of the aforementioned components into an application delivery system.
With AWS you have options. Many options. You can take your classic hardware model and effectively duplicate it in the cloud, or you can take advantage of some of the products like Elastic Beanstalk to quickly build a system that meets most of your needs. There are even tools like the SkySQL Cloud Data Suite that can configure your system and provision it in the cloud.
There are a number of interesting Drupal modules starting to appear that address parts of the AWS puzzle, although they are mainly targeted at D7. The CloudFront module provides an interface to the Imagecache module to allow images to be delivered via the AWS CloudFront CDN. The AWS SDK module looks like it might be a good start to accessing other AWS functionality.
OpenStack
With OpenStack you can even leverage cloud strategies in your own data center. This gives you the flexibility of cloud management with the privacy and security of having the data under your own roof. Private clouds also bring benefits such as finer control over latency, which can be an issue in the public cloud.
So much for the shopping list, what next?
We have a huge number of options to choose from, but now it is time for the rubber to hit the road. In our next post we will start to flesh out a solution and look at benchmarks between the options.