The Shortcut Guide to Untangling the Differences Between High Availability and Disaster Recovery

by Richard Siddaway


Solutions and systems that address High Availability (HA) and Disaster Recovery (DR) have been paramount in preventing application downtime and its crippling effect on productivity and operational output. But the traditional differences between HA and DR have been blurred by new mechanisms introduced by emerging technologies. And while it is possible to provide HA and DR together, there are cases when it is important to keep the two activities separate in order to guarantee 24x7 access to business applications.

In The Shortcut Guide to Untangling the Differences Between High Availability and Disaster Recovery, author and IT expert Richard Siddaway examines HA and DR, and explains how understanding the many differences between the two will allow you to create a proper system to guard against application downtime. Siddaway offers his insight not only on how technology can be used in HA and DR solutions, but also on how management of the human element must factor into any successful HA / DR solution.


Chapter 1: What Is High Availability?

High availability and disaster recovery are two topics that are often tangled in thought and action. This guide will untangle the differences as well as explain the similarities and where the two areas converge. The guide starts with this chapter explaining high availability: what it is and what needs to be considered when implementing a highly available infrastructure.

Chapter 2 introduces disaster recovery, explaining the concept and comparing it with high availability; planning, implementation, and testing are discussed. The chapter closes with a look at how the technologies enabling high availability and disaster recovery are producing a convergence in how the two are implemented.

Chapter 3 returns to high availability and examines how you can configure your environment to be highly available. The chapter examines the reasons systems become unavailable and looks at traditional high-availability solutions such as clustering. We explore high availability from applications such as Microsoft Exchange Server and Microsoft SQL Server and discuss how virtualization brings its own availability challenges and solutions to the mix.

High availability is not created by technology alone. We also need to consider the people and the processes they operate, which we explore in Chapter 4. This chapter considers the causes of downtime and how you can eliminate the largest cause of unplanned downtime; it also discusses the impact of people and processes on high availability. A consideration of how you can apply the techniques and concepts of high availability to the disaster recovery arena closes the chapter. The first area we need to look at is high availability.

Chapter 2: Disaster Recovery Is Not High Availability

Chapter 1 looked at various aspects of high availability—what it is and the implications to your organization of not implementing it. In this chapter, the focus turns to the other side of the coin—namely, disaster recovery.

The first chapter clarified that disaster recovery is not the same as high availability. It is now time to consider disaster recovery in its own right and discover what is meant by that term. In order to cover the topic properly, let's consider what is meant by disaster.

It Can't Happen Here

Many organizations do not have a disaster recovery plan. If you take an honest look at the organizations that you know, how many of them actually have a disaster recovery plan that is tested and would actually work if it were invoked? My expectation is that the percentage of organizations with a workable disaster recovery plan is relatively small. The majority of IT administrators will go through their careers without being involved in a major disaster recovery incident. They may have to recover a server occasionally, but that is about the limit. If most organizations don't have a disaster recovery plan and most IT admins are never involved in a full disaster recovery operation, you don't have to worry because it's not going to happen to you. Right?


If you play the odds, eventually they will catch up with you and disaster will strike. "It can't happen here" is not a defense. If your organization is caught in a disaster without a recovery plan, the chances are that your organization will cease operations. Industry statistics suggest that one out of every five organizations that suffers a major disaster fails to recover and goes out of business.

Chapter 3: Configuring Your Environment for High Availability

The first two chapters looked at high availability and disaster recovery. Each topic was explained, with a discussion of how the two traditionally separate areas are converging due to advances in technology.

This chapter will return to the topic of high availability to consider how you can configure your environment for high availability. As we saw in Chapter 1, high availability is not simply a matter of implementing a particular technology. The environment has to be administered by people with the correct level of skills and knowledge. These administrators need to be working with the correct processes to ensure the systems remain available. No two environments will present completely identical issues; however, there are a number of themes that we can address that will aid you in creating a high availability environment of your own. The starting point of your configuration is ensuring that your systems are reliable.

Reliability and High Availability

Reliability might seem to be the same as availability, but they are quite distinct concepts. Chapter 1 provides definitions of availability. Reliability can be considered to be the ability of a system to keep working under normal operating conditions. Some perspectives might extend this definition to cover abnormal conditions, but I think doing so confuses the discussion as to where to draw the line. With such a definition, we would quickly overlap into high availability and disaster recovery, so we will stick with normal conditions.

Systems that remain available are designed from the ground up with reliability in mind. Trying to bolt a high‐availability solution onto a badly designed system is not going to work. Designing in high availability starts with ensuring that the underlying systems are reliable.

Chapter 4: People and Poor Processes: Eliminating the Biggest Causes of Application Downtime

In earlier chapters, we examined high availability and disaster recovery. We have looked at the differences, the similarities, and how, in some circumstances, the two are converging. The ability to deliver high availability and disaster recovery from the same technology can supply a cost saving to the organization as well as a greater depth of assurance that services can be maintained.

The configuration of high availability involves answering a number of questions, as we saw in the previous chapter. The high‐availability solution can be designed and built once you have that information. Creating an environment that will supply high availability for your business processes is just the first step. You have to keep the systems running so that there isn’t an interruption in service.

The continuation of service is a matter of people and processes. High availability can only be supplied, and maintained, when the people and processes are correct. In this chapter, we will examine how we can minimize, and preferably eliminate, the biggest causes of downtime:

  • Poor processes
  • Human error

The first step in eliminating downtime is identifying what causes it.