High Availability and Disaster Recovery for Small Businesses

Too often in the past, High Availability and Disaster Recovery have been marketed as expensive options reserved for businesses with deep pockets. The truth is that, with careful planning, there are sensible and economical solutions that let small businesses maintain business continuity when disaster strikes.

Introduction

There are a lot of misconceptions about the cost of high availability and disaster recovery (or HA/DR for short) for small businesses. Any conclusions based on these misconceptions can have severe consequences for the businesses concerned. Two of the more common misconceptions are that:

  • Protection against disaster is expensive. It takes at least a six-figure budget to have a robust disaster recovery plan, such as having a remote set of servers available to be turned on at a moment’s notice.
  • Making our databases highly available necessitates having the Enterprise edition of SQL Server. Standard Edition doesn’t have the features required.

In the following sections we’ll show that both these claims are patently and provably false. Yes, it’s true that you can spend millions of dollars creating all kinds of elaborate disaster recovery solutions. It’s also true that, especially since the introduction of AlwaysOn Availability Groups in SQL Server 2012, Enterprise edition offers some very attractive features when it comes to designing highly available solutions. But as we’ll see, it’s also possible to make our applications and services highly available and resilient to disaster without either spending a fortune or switching to the far more expensive Enterprise edition of SQL Server.

But first, I want to explain why it is so important to consider high availability and disaster recovery for small businesses.

The State Of Things

Small businesses are likely to suffer severely from any disaster. In 2011, Symantec surveyed 1,288 small businesses (defined as those with between 5 and 1,000 employees) about their plans and practices for disaster recovery. The results are startling.

Only half of the businesses surveyed said that they had any kind of plan in place. That number decreased to forty-three percent (43%) amongst the smallest companies surveyed. Fourteen percent stated that not only did they not have a plan, but they had no intention of creating one at all. Fifty-two percent of those currently without a plan said it was because they did not feel their IT assets were critical to their businesses, and forty-one percent said that the idea of planning for disaster had never even occurred to them. Forty percent went so far as to say that disaster planning was not even a priority.

This lax attitude is not justified by reality. Of the businesses surveyed, a sizeable majority (65 percent) felt they resided in areas “they consider susceptible to natural disasters.” The businesses also experienced an average of six outages per year, all due to preventable and/or mitigable causes, such as “power outages, employee errors, and upgrades.” Disasters also hit small businesses hard in their pockets, causing a median loss of $3,000 per day (the figure increased to $23,000 per day for medium businesses).

Finally, the pain of disaster was felt not only by the businesses themselves, but by their customers. Those customers surveyed as part of the study reported that outages of their SMB vendors cost them an average of $10,000 per day. Not surprisingly, this led to a lot of customer loss: over half (54 percent) of respondents reported switching vendors due to “unreliable computing systems”.

Based on these numbers, planning for disaster recovery and the availability of our systems is critical for those DBAs and IT professionals serving small businesses.

Two Strategies For HA/DR On A Budget

Since small businesses are generally cost-averse and don’t have an excess of capital, I’ve tried to pick out approaches that carry a minimum of unnecessary costs or overhead. That’s not to say they are cheap; in many cases as the level of complexity and protection increases, so do the costs, and we need to carefully consider the trade-offs between increasing our resiliency and paying out more in costs. However, there really is no valid reason not to have some kind of strategy in place, preferably one that includes some level of offsite redundancy.

Let’s dive into the two strategies well suited to small businesses.

Use The Cloud

Many small businesses are geo-locked, meaning that the majority (if not all) of their operations are located in one physical location. Given this, they can’t take advantage of branch offices for the purposes of redundancy. But rather than rent out a facility in some distant city or data center, why not use one of the many cloud providers available?

Let’s start with a very simple requirement: keeping all backups (full, differential, and transaction log) in an offsite location. Note that this is not a true “high availability” solution, in that it does not provide a means to dramatically reduce downtime in the event of a local failure (i.e. we still must rebuild everything from scratch, then restore backups). What it does do is protect data against total loss due to a localized disaster such as a fire, flood, or vandalism, all of which are situations to consider when planning for disaster recovery.

Here are some examples of the costs of storing data with a few cloud providers, as of the date of publication (please check the links included, as these prices often change). For comparison, we’ll assume a total of 1.4 TB of storage (100 GB of data × 14 days of backups), along with roughly 100 GB of data transferred in daily.

Provider Name | Cost Per GB | Other Costs | Source
------------- | ----------- | ----------- | ------
Amazon S3 | $0.03/GB for first 1 TB, $0.0295/GB for each additional GB up to 49 TB. Total: $42.52 | $0.005 per 1,000 requests; data transfer in is free | https://aws.amazon.com/s3/pricing/
Amazon Glacier | $0.01/GB flat rate. Total: $14.24 | $0.05 per 1,000 requests; data transfer in is free | https://aws.amazon.com/glacier/pricing/
Rackspace Cloud | $0.10/GB for first 1 TB, $0.09/GB for each additional GB up to 49 TB. Total: $138.40 | Incoming data transfer is free | http://www.rackspace.com/cloud/public-pricing/#cloud-files
Windows Azure | $0.03/GB for first 1 TB, $0.0295/GB for each additional GB up to 49 TB. Total: $42.52 | $0.0036 per 100,000 requests; data transfer in is free | http://azure.microsoft.com/en-us/pricing/details/storage/

With the cheapest option (currently, at least), Amazon Glacier, we’re looking at under twenty dollars a month to ensure that our backups are stored safely offsite. Twenty dollars. That’s less than the cost of a cafe latte every day. Yes, there will be some time and cost involved in acquiring software that can back up our databases to the cloud (for example, the Cloudberry Backup product currently retails for $79.99; the necessary disclaimer applies that I’m not specifically recommending this software, it’s just an illustration of what’s out there) and in setting up the backup process, but these seem small compared to the risks we’re mitigating.
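
If the offsite target happens to be Azure Blob Storage, SQL Server (2012 SP1 CU2 and later) can even write backups to the cloud natively, with no third-party software at all. Here’s a minimal sketch; the storage account, container, credential, and database names are placeholders, not a recommendation of any particular setup.

```sql
-- Minimal sketch: backing up directly to Azure Blob Storage.
-- Requires SQL Server 2012 SP1 CU2 or later; all names below are placeholders.

-- Store the storage account name and access key in a credential.
CREATE CREDENTIAL OffsiteBackupCredential
WITH IDENTITY = 'mystorageaccount',              -- Azure storage account name
     SECRET = '<storage account access key>';

-- Back up straight to a blob container, compressed.
BACKUP DATABASE SalesDB
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/SalesDB_Full.bak'
WITH CREDENTIAL = 'OffsiteBackupCredential',
     COMPRESSION,
     STATS = 10;
```

The same approach works for differential and log backups, so the whole offsite chain can be driven by ordinary SQL Agent jobs.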

Let’s briefly look at a more complex example, that of keeping a standby server in the cloud for recovery purposes. For simplicity, we’ll stick with just Amazon as a vendor. Looking at Amazon’s AWS calculator, I selected a “Windows and Standard SQL on m3.large” image, which includes the following: 2 vCPUs, 7.5 GB of RAM, and one 32 GB SSD drive. I then added a 100 GB data drive and a 25 GB log drive, both on “General Purpose SSD” storage. Assuming 100% uptime, this comes out to around $530 per month, not including data transfers in or out. So while we’re looking at a noticeably higher cost, it’s not anywhere near the six-figure mark. If we can do without a true “hot” standby (and keep something built but powered off and ready), we can make the price even more attractive by leaving the compute instance shut down until required. This reduces the cost to only $9.50 per month (mainly because we’re then paying for storage only).

Thanks to the power of cloud computing platforms, having offsite redundancy in one form or another is completely feasible, even for small businesses. Certainly there will be a learning curve, and perhaps the cost of bringing in specialized IT help for the initial setup. But this is still very much an avenue worth pursuing in our quest to protect our data.

Make Use Of What You’ve Got

I’ve heard people say in the past that if you want true high availability options, you must pay the higher cost of the Enterprise edition of SQL Server. Frankly, that’s just not the case. Yes, it’s true that Enterprise gives you a broader selection of options. AlwaysOn Availability Groups is a fantastic feature and one that I personally love, but it’s certainly not the only one we have in our toolbox.

SQL Standard Edition currently has three excellent, proven options when it comes to making our databases highly available: clustering, mirroring, and log shipping. Note: I say “currently” because Microsoft has stated that mirroring is a deprecated feature and will be removed in a future version of SQL Server. Still, it’s worthwhile to consider it along with the other two. Let’s examine the pros and cons of each.

Clustering

Failover clustering involves having a SQL Server instance that can run on two (or more, but that’s only in Enterprise edition) servers. The instance can fail over between the two, either on demand or automatically if the server it currently resides on fails for some reason. For example, if the server blue screens, the passive node (the server on which the SQL instance does not currently reside) detects this and brings SQL Server online after claiming all the relevant resources (disks, network name, etc.). Clustering protects against both hardware and, at least to some degree, operating system failure. In versions of SQL Server from 2008 R2 upwards, you can also use clustering to apply SQL patches without downtime, using a rolling upgrade process.

Clustering can be (at least in my experience) difficult to set up and keep running smoothly, though this has gotten notably better in more recent versions of Windows. Clustering also requires the Enterprise edition of Windows (except in Server 2012 and up, where Standard edition also includes this feature), which means higher costs. In addition, the necessity of shared storage increases both the cost and complexity of setup: you either have to have a SAN or configure a shared Direct Attached Storage (DAS) enclosure. You also have a built-in single point of failure, namely your backend storage.
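
As a quick sanity check after a failover (or just for routine monitoring), you can ask the instance which node it is currently running on. A minimal sketch; the second query assumes SQL Server 2012 or later.

```sql
-- Which physical node is currently hosting this clustered instance,
-- and is the instance clustered at all?
SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_host_node,
       SERVERPROPERTY('IsClustered') AS is_clustered;

-- On SQL Server 2012 and later, list all nodes in the cluster and their status.
SELECT NodeName, status_description
FROM sys.dm_os_cluster_nodes;
```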

Mirroring

Database mirroring streams a database’s transaction log records over the wire to a secondary copy (the mirror). The mirror stays in a restoring state and cannot be queried, but can be brought online if necessary. Optionally, a third “witness” server can be used, allowing for automatic failover if the principal (the server where the database normally lives) fails. In Standard edition, we can only choose the “synchronous” mode of mirroring, which means that transactions do not actually commit on the principal until their log records have also been hardened on the mirror. This has the advantage of ensuring zero data loss, but it also introduces latency if the network link between the principal and mirror servers either gets saturated or runs over a slower connection, such as a WAN.
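
For a sense of what’s involved, here is a minimal sketch of configuring synchronous mirroring for a single database. The server names, database name, and port are hypothetical, and the database must already have been restored on the mirror WITH NORECOVERY (a full backup plus at least one log backup).

```sql
-- Minimal sketch: synchronous database mirroring (names and port are placeholders).

-- On BOTH servers: create a mirroring endpoint.
CREATE ENDPOINT Mirroring
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = PARTNER);

-- On the mirror server (database restored WITH NORECOVERY beforehand):
ALTER DATABASE SalesDB SET PARTNER = 'TCP://principal.contoso.local:5022';

-- Then on the principal server:
ALTER DATABASE SalesDB SET PARTNER = 'TCP://mirror.contoso.local:5022';

-- Standard edition runs in high-safety (synchronous) mode; SAFETY FULL is the default.
ALTER DATABASE SalesDB SET PARTNER SAFETY FULL;
```

In practice you would also grant CONNECT on the endpoints to the service accounts involved, but the above captures the core moving parts.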

A nice feature of mirroring is what’s known as Automatic Page Repair. When SQL Server detects a torn or corrupted page, it will check the secondary to see if an up to date version of the same page is available there. If it is, then the page will be retrieved and repaired without further user intervention.
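
If you want to confirm whether automatic page repair has ever kicked in, SQL Server exposes a DMV for exactly that; a quick check might look like this, run on either partner.

```sql
-- Pages that have been automatically repaired (or are pending repair)
-- using the copy held by the mirroring partner.
SELECT DB_NAME(database_id) AS database_name,
       file_id,
       page_id,
       error_type,
       page_status,
       modification_time
FROM sys.dm_db_mirroring_auto_page_repair;
```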

Mirroring is currently the only high availability solution in Standard edition that guarantees zero data loss while still protecting against total failure. For example, you can make the mirror server wholly independent of the principal by giving it its own storage and network connections, and by placing its storage elsewhere, such as on a separate SAN or a locally attached array.

Since it is done at the database level, mirroring could conceivably generate quite a bit of overhead as the number of mirrored databases increases, especially on high transaction throughput systems. (Microsoft specifically states a maximum of ten mirrored databases on 32-bit systems.)
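
One simple way to keep an eye on that overhead is to watch the mirroring send queue, which shows how much log is waiting to be sent to the mirror. A rough sketch using the performance counter DMV (counter names taken from the Database Mirroring counter object):

```sql
-- How much transaction log (in KB) is queued up waiting to be sent to the mirror.
-- A steadily growing value suggests the network link or the mirror can't keep up.
SELECT instance_name AS database_name,
       cntr_value AS log_send_queue_kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Mirroring%'
  AND counter_name = 'Log Send Queue KB';
```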

Log Shipping

Log shipping is really quite simple: we restore a copy of a database on one or more secondary servers, which are left in either standby (basically read-only) or no-recovery mode (offline, waiting for a restore). We then set up jobs that copy the transaction log backups from the primary to a jointly accessible location, where the secondaries can pick them up and restore them. It’s probably the simplest of the three solutions to set up, at least in my experience. It’s also the only one that allows you to have a second copy of the data accessible (albeit read-only and interrupted when the restore job runs), as well as multiple copies of the data (i.e. multiple secondary servers, each restoring log backups on its own schedule). We can also configure a known delay between when the log backups are taken on the primary and when they are applied on the secondary, giving us some measure of protection against things like accidental deletes.
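
Under the covers, the restore job on each secondary is doing little more than the following (paths and names here are hypothetical); WITH STANDBY is what keeps the copy readable between restores.

```sql
-- Roughly what the log shipping restore job does on a standby secondary
-- (file paths and database name are placeholders).
RESTORE LOG SalesDB
FROM DISK = N'\\fileserver\logshipping\SalesDB_20150101_0100.trn'
WITH STANDBY = N'D:\SQLStandby\SalesDB_undo.dat';  -- readable between restores

-- Or, to keep the secondary offline and ready only for recovery:
-- RESTORE LOG SalesDB
-- FROM DISK = N'\\fileserver\logshipping\SalesDB_20150101_0100.trn'
-- WITH NORECOVERY;
```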

When configuring log shipping you must allow the log shipping framework to handle taking the transaction log backups of the shipped database(s). This means that you’ll need to find a way to exclude those databases from your regularly scheduled log backups. Naturally any decent maintenance framework (such as Ola Hallengren’s award-winning set) allows for this, but it still needs to be called out and planned for. There’s also nothing stopping you from rolling your own custom log shipping framework, though in many cases the added flexibility isn’t worth the extra expense and overhead of building and maintaining your own.
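
As an illustration of that exclusion, assuming Ola Hallengren’s maintenance solution is installed, the regular log backup job might call DatabaseBackup with the shipped database (SalesDB here, a hypothetical name) subtracted from the set:

```sql
-- Take log backups of all user databases EXCEPT the log-shipped SalesDB,
-- which the log shipping backup job handles on its own.
EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES, -SalesDB',   -- minus prefix excludes a database
    @Directory = N'\\fileserver\sqlbackups',
    @BackupType = 'LOG',
    @Verify = 'Y',
    @CleanupTime = 336;                        -- keep 14 days of log backups
```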

Mix and Match

There’s also nothing to say that you can’t mix and match the choices given above into your own special sauce of highly available goodness. In many cases no single solution meets all the needs, so combining them is the best approach. For example, you could set up a failover cluster to protect against hardware failure and allow for easier maintenance, as well as set up log shipping to a remote standby server in the cloud for disaster recovery purposes.

You could also combine local database mirroring (to ensure a standby copy with zero data loss) with log shipping to a remote instance or a reporting secondary. This is a well-known configuration, and Microsoft has helpfully published some guidelines around using it.

Ultimately, it’s all about defining your needs up front and then moving towards designing a solution that meets them.

Conclusion

At the beginning of the article we laid out two misconceptions that we wanted to examine and disprove:

  • Disaster recovery protection is costly
  • Only those using Enterprise edition can take advantage of SQL Server’s built-in capabilities for high availability.

Both are incorrect. Thanks to cloud technology, prices have decreased to the point where any business, whatever its size, can afford some measure of protection against disaster, and SQL Server Standard edition contains three well-proven technologies for providing highly available databases. We also saw how small and medium businesses are vulnerable to disasters and data loss, which makes a well-thought-out plan for coping with them a critical responsibility for the SQL Server professionals serving these organizations. We must be proactive about protecting our data, so that if disaster does occur, we can resume business in a timely fashion and get back to what small businesses do best: serving their customers.