The Shortcut Guide to Availability, Continuity, and Disaster Recovery

by Dan Sullivan


It is only a matter of time before human error, hardware failure, natural disaster, or any other adverse condition throws a wrench into your perfectly running IT operation. Learning how to recover and maintain your critical business systems during these adverse conditions is vital to delivering on the increased demands for high availability. The Shortcut Guide to Availability, Continuity, and Disaster Recovery is a concise roadmap to uncovering the not-so-obvious business requirements of recovery management and meeting some of the most vexing challenges to traditional backup models, including virtualization and application-specific requirements. This excellent new book offers concrete advice on scheduling and monitoring, choosing storage media, managing growth, and controlling costs. With the right combination of backup systems, high-availability solutions, and sound recovery management procedures, businesses can have the backup protection they need without straining their staff or budget.


Chapter 1: The Business Case for Recovery Management

Professionals can develop their businesses with effective strategies, stay ahead of the competition by analyzing dynamic market conditions, and build brand loyalty with exceptional customer service, and it could all turn out to be for nothing if the IT systems that store, manage, and distribute our information fail and there is no recovery management process. It would be as if we had never done our work in the first place.

Data loss is an all-too-common problem. We lose information on the small scale with damaged laptops and misplaced flash drives. We lose information on the large scale with natural disasters that destroy entire data centers. Sometimes human error is at fault and sometimes applications fail with unfortunate consequences. Regardless of the cause of the initial data loss, the ripple effects can result in redundant work to reproduce the lost data or, in the worst case, in legal liabilities, brand damage, and business disruptions.

Recovery management is a framework to mitigate the risk of lost data and lost IT systems. It includes practices such as making backup copies of essential data, maintaining stand-by systems in case primary systems fail, and establishing policies and procedures that protect business assets cost-effectively, with the level of protection matched to the value of the data. Of course, no business practice will eliminate all risk or guarantee we can recover from calamitous events. We can, however, implement cost-effective measures that allow a business to continue to operate at, or near, normal operating levels in spite of adverse events.

The purpose of The Shortcut Guide to Availability, Continuity, and Disaster Recovery is to provide you with the information you need to understand the business drivers behind recovery management, the technical aspects of recovery management, some of the operational challenges you might face, and best practices for implementing recovery management.

The four chapters of this guide cover the following topics:

  • Chapter 1 discusses the obvious and sometimes not-so-obvious business drivers behind recovery management. The chapter also describes how to develop a recovery management strategy, including assessing threats and risks and outlining policies and applications needed to implement an effective recovery management strategy.
  • Chapter 2 examines the challenges posed by the increasing complexity of IT environments, including virtualization, application-specific backup requirements, and remote office protection.
  • Chapter 3 delves into common issues in recovery management, including scheduling and monitoring, media options, controlling costs, the growing volumes of data, and recovering in case of disaster.
  • Chapter 4 digs into the details of how different types of organizations frame their recovery management strategies. The chapter concludes with a discussion of best practices for availability, continuity, and disaster recovery.

We start our examination with the business requirements that drive the need for recovery management.

Chapter 2: Breaking Through Technical Barriers to Effective Recovery Management

Information technologies are constantly advancing in ways that enable businesses to execute their strategies more efficiently and effectively than was previously possible. Virtualization improves server utilization, relational databases provide high‐performance data services, and email offers what has become a dominant form of business communication. With these advances come new levels of complexity, some of which have a direct impact on recovery management practices.

In addition to technical advances, there are organizational structures that create technical challenges to effective recovery management. The physical distribution of offices, for example, affects how we implement recovery management practices. If a company has multiple sites, it may not be practical to have skilled IT support in each office. Centralized IT support is often more economical; however, it raises the question of how to remotely provide recovery management protection. What starts as an organizational issue quickly leads to technical issues.

This chapter will examine several technical barriers commonly encountered when implementing recovery management services. These common challenges include:

  • Protecting virtual environments
  • Meeting the specialized backup and recovery requirements of databases and content management systems
  • Solving remote office backup and recovery challenges
  • Ensuring continuity in disaster recovery operations

Throughout this chapter, we will see examples of the need to adapt recovery management techniques to application‐specific requirements and systems‐implementation–specific requirements. These examples show that recovery management is much more than simply a matter of backing up files.

Chapter 3: Top 5 Operational Challenges in Recovery Management and How to Solve Them

Maintaining effective recovery management procedures is not a trivial task. From making sure processes are running correctly to controlling the costs of operations, there is no shortage of challenges. In this chapter of The Shortcut Guide to Availability, Continuity, and Disaster Recovery, we will examine five of the top operational challenges we commonly face in recovery management:

  • Scheduling and monitoring
  • Choosing the right storage media option
  • Controlling the costs of off-site storage
  • Keeping up with growing data volumes
  • Recovering when disaster strikes

These five challenges are interrelated. For example, dealing with growing data volumes is directly related to controlling costs. Monitoring is less directly related to recovery operations but is just as important: we do not want to find out about a failed backup operation only when we try to restore critical data after a disaster. As we address each of these five challenges, we will consider both the fundamentals of the individual challenges and how the challenges influence each other.

Chapter 4: Putting It All Together: Recovery Management Scenarios for Small Businesses to Emerging Enterprises

Throughout The Shortcut Guide to Availability, Continuity, and Disaster Recovery, we have explored how to address the business and technical requirements of data protection. Some of the requirements are obvious and apply to all organizations: restoring from isolated failures, such as an accidentally deleted file, and recovering from catastrophic failures, such as a natural disaster that destroys a data center. There are also less obvious technical and business needs. For example, server virtualization is widely adopted for its ability to improve server utilization and help control costs, but it introduces additional technical challenges with regard to backup and recovery. Business strategies can also influence recovery management objectives. A move to improve customer service by providing longer periods of access to online data directly affects the cost and required resources of recovery services.

These and other considerations have been woven into both the business strategy discussions and the technical assessments documented in earlier chapters. In this chapter, we take a different approach and consolidate key recovery management issues according to business types and the special case of failover recovery. We will consider five scenarios. Each scenario delves into typical business and technical issues faced by particular types of businesses or technology use cases; in particular, we will consider:

  • Small business backup and recovery
  • Midsize business and remote office protection
  • Operational management and enterprise backup
  • Backup and recovery with virtual machines
  • Continuity and failover recovery

These scenarios are not mutually exclusive. Some of the discussion of small business backup and recovery services may be relevant to midsize businesses, especially those with remote offices. Similarly, virtual machine recovery management may be relevant to all types of businesses, regardless of size. Continuity and failover is such an important topic that we address it separately, although we will touch on failover in other sections when relevant. We will conclude this guide with a summary of best practices in availability, continuity, and disaster recovery.