The Definitive Guide to Business Service Management

by Greg Shields


The Definitive Guide to Business Service Management provides a deep and broad look at the challenges and solutions of properly monitoring the network, systems, applications, and user experience for a business. Linking the goals and requirements of the business with the metrics and data of IT, Business Service Management enables a real-time and historical heads-up glimpse into the specific metrics that show how IT operations drive dollars and cents for the business.

BSM looks at IT elements not as individual units, but as parts of a whole that provide value to the business and quality service to its customers. This process involves linking business-critical IT services to the health and performance of underlying infrastructure and application components.

This guide discusses the challenges of monitoring business systems from the perspective of their users and shows you how BSM software and methodology illuminate business-relevant solutions. Noted author, columnist, and speaker Greg Shields discusses the best ways to implement BSM and to get the best return from it. This guide will enlighten the business leader as much as the network administrator on the importance of BSM and the issues involved in implementing it.


Chapter 1: The Power of Business Service Management

It's 2:15am on Monday morning and First Class Glass (FCG), a global distributor of high-end automotive and industrial glass products, is about to experience a server outage of a "minor IT system" at its Denver data center. But this is no standard outage…

With domestic data centers in Denver and Baltimore as well as international operations in Geneva, Switzerland and Osaka, Japan, IT Operations for FCG has hundreds of interconnected systems and network devices under management. With rare exception, each of these systems is monitored under the watchful eye of First Class Glass' monitoring and notification system.

What's different about today's outage is that the alert for this minor IT system in Denver actually got lost in the daily shuffle of alerts and notifications brought up by FCG's network management system. The alert for the "minor IT system" was categorized with a very low priority and was missed by FCG's monitoring help desk. So, this outage will go unnoticed for nearly 6 hours before IT in the Denver office begins arriving for the Monday morning grind.

Chapter 2: The Alignment of IT and Business

It's 8:45am, two weeks after the outage of our "minor IT system" at First Class Glass' Denver data center, and IT Manager John Brown is finishing up his monthly metrics presentation to FCG's business leaders. Amped on too much coffee and not enough sleep, John reminds himself why he hates these end-of-the-month presentations.

"There're just too many manual steps in putting together these reports," he thinks to himself for what seems the hundredth time since late last night. "Every month, the FCG executive leadership wants all this statistical and trend line data so that they can see that our systems remain as operational as they always have," he thinks, "Yea, I know we sometimes run out of disk space on some of the servers, and rarely our critical applications may slow down or go down for a short period of time. But we've thrown so much money at this application and network, adding clustering and redundancy, disaster recovery, and everything else that we rarely have major problems anymore."

"What I really wish we had was a way to pull these metrics automatically. Then, once a month my senior staff and I wouldn't have to spend the night here pulling together statistics for this marginally-useful 30-minute presentation."

John adds the finishing touches to his PowerPoint slide deck, entering some last-minute figures he received from his help desk manager on work order closing metrics. He saves the completed file to the network and heads off to the boardroom for his presentation.

If you've ever pulled together these kinds of metrics for your company's monthly executive update meeting, you've probably had the same series of thoughts at the 11th hour. Figuring out what kinds of metrics the leadership wants to know is only part of the trouble. Compiling those metrics from dozens of separate and un-automated systems can have the effect of shutting down the department for a day as people scramble to gather the end-of-the-month statistics. Often, the presentation goes by with nary a question when senior leadership doesn't understand the statistics you're providing them.

Contrary to what the IT people in the trenches often believe, this data is critical to the smooth and continued operations of the business network. Executive leadership needs these statistics so that they can prove to themselves that the money they've "thrown at the network" is actually providing value and measurable return.

Where the difficulty often presents itself in these situations is in translating what is important to IT into information that is digestible to executive leadership. If business leaders can't understand the kind of data they're being presented at the monthly metrics meeting, they can't make good decisions on what to do with that data. This chapter talks about the dissonances between what IT believes is important and what the business leaders want to see.

When these two groups speak the same language and share the same priorities, we say that they have achieved alignment. This chapter will discuss the alignment issue as well as common failures in alignment. We'll talk about why misalignment occurs and what IT can do to develop itself both culturally and technologically to resolve the problem. Throughout the discussion, we'll incorporate what we've learned so far from Chapter 1 to show how the implementation of BSM into the operating environment helps enable the alignment of IT and the business.

Chapter 3: IT Service Management Evolution

“Beep, beep.”

“Oh, not again,” thinks First Class Glass COO Dan Bishop as he rolls over in bed for what seems like the third time this month. Picking up his mobile device he reads the message on its screen, “Another one? Why do these things always seem to happen in the middle of the night?”

‘DEN-RTR-02B-H, Failed Ping Response, 10/13, 4:23a,
Expected down. TKT 104328 assigned to NET_OPS.’

“What the heck is a DEN RTR?” he thinks as he starts to dial up his IT Director John Brown. “And why do I care if it failed a ping response? These sorts of middle-of-the-night mobile device beeps must be commonplace for the IT guys. But I'm too busy and too near retirement to get blasted out of bed like this once a week. How do these guys do this all the time?”

John answers the phone with an equally bleary voice. “What's a D-E-N-R-T-R and what happens when it fails a ping response?” Dan asks his IT top gun.

John responds groggily, “G'mornin', Dan. That means one of our backup routers in the customer DMZ couldn't be reached by the monitoring system. Lack of a ping response tells us that it's not talking on the network.”

“Is this bad?”

Chapter 4: Implementing BSM

As you can see, our continuing story on First Class Glass (FCG) gives us a glimpse into the maturation of their organization. That maturity aligns with their need for ever-better data quality. Two years ago, FCG found that by implementing a monitoring system, they would be notified when systems go down rather than waiting for customers to tell them about it. After 2 years of middle-of-the-night pages, they have gotten better at parsing through the data provided by that notification system. But one thing is still missing from that data: business relevance.

This is validated by the way notifications are arriving into Dan's mobile inbox. He is receiving data that doesn't make sense to him. Dan embodies the “business” side of the business, while John embodies the technology side. John recognizes FCG's need for a new router, but his focus on the tight IT budget and cost avoidance has led him towards running that device in production well past its useful life.

Throughout this guide, we've discussed how a well-defined BSM service model with links into the correct business systems can augment monitoring data with value. BSM provides a quantitative measure of the quality of a service by measuring it against financial rules specific to the business. Chapter 2 discussed how IT organizations must endure a process of maturation for them to recognize the need for data quality. Chapter 3 analyzed how that organizational maturity links to the evolution of IT Service Management and service management targeting. Our historical look there helped us better understand the gap filled by BSM.
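As a rough illustration of how a service model can add business relevance to a raw device alert, consider the following minimal Python sketch. It is not taken from any particular BSM product. The mapping table, the device name DEN-ORD-DB-01, the revenue_per_hour field, and every dollar figure are hypothetical values invented for the example; only the router name and alert text echo the story above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BusinessService:
    """A business service and an assumed hourly revenue figure (hypothetical)."""
    name: str
    revenue_per_hour: float


# Hypothetical service model: monitored device -> business service it supports.
# All figures are invented for illustration only.
SERVICE_MODEL: Dict[str, BusinessService] = {
    "DEN-RTR-02B-H": BusinessService("Customer DMZ (backup path)", 0.0),
    "DEN-ORD-DB-01": BusinessService("B2B Web ordering", 12000.0),
}


def describe_alert(device: str, condition: str, hours_down: float) -> str:
    """Restate a device-level alert in terms of the business service it affects."""
    service = SERVICE_MODEL.get(device)
    if service is None:
        return f"{device}: {condition} (no business service mapped)"
    impact = service.revenue_per_hour * hours_down
    return (
        f"{service.name} affected by {device} ({condition}); "
        f"estimated revenue at risk: ${impact:,.0f}"
    )


if __name__ == "__main__":
    print(describe_alert("DEN-RTR-02B-H", "Failed Ping Response", 1.0))
    print(describe_alert("DEN-ORD-DB-01", "Failed Ping Response", 6.0))
```

The point of the sketch is only that the same ping failure reads very differently once it is tied to a service and a financial rule. A real implementation would draw the mapping and the financial figures from the organization's own systems.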

In this chapter, we begin the process of implementing our BSM solution and its surrounding framework. We will begin by assuming that the implementing organization has the will and the way to incorporate the necessary software and processes to successfully complete the installation. The evolution of the implementing organization's service management has elevated them to recognizing the need for BSM's data quality measurements within their organization. All that is left is laying down the structure. At the conclusion of this chapter, you should be fully cognizant of the tasks and activities necessary to implement the BSM solution that is right for your environment.

Chapter 5: End-User Experience Monitoring

One of the key tenets of BSM is to provide to the business an understanding of its systems and how those systems relate to profitability and the satisfaction of its customers. BSM does this through the concept of Service Quality. In Chapter 1, we talked about how the quality of a service is impacted not only by that service but also its supporting services:

A loss in a sub-system to a business service feeds into the total quality of that service. A reduction in the performance of a system reduces its quality. And, most importantly, an increase in response time for a customer-facing system reduces its service quality.

We talk so heavily about this measurement of service quality because it is that quantitative measurement that helps the business come to understand its customers' experience with the services it provides. If those services are not of high quality, customers may take their business elsewhere. If services do not perform appropriately, a business' public relations can suffer. Ultimately, if a business cannot service its customers to the level that they require, that business cannot continue to operate.
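One way to picture this roll-up of service quality is the small Python sketch below. The scoring rule is an assumption made for illustration, not the calculation used by any specific BSM system: it takes the weakest health score among a service and its supporting services, then scales that score down when the measured response time exceeds a target. The function name, parameters, and 0-to-1 scale are all hypothetical.

```python
from typing import Dict


def service_quality(
    own_health: float,
    supporting: Dict[str, float],
    response_time_s: float,
    target_response_s: float,
) -> float:
    """Return a 0..1 quality score for a business service (illustrative rule).

    own_health and each supporting value are health scores between 0 and 1.
    """
    # A loss in any supporting sub-system drags down the whole service,
    # so the weakest component sets the ceiling.
    component_score = min([own_health, *supporting.values()])

    # Slower-than-target responses reduce quality proportionally.
    if response_time_s <= target_response_s:
        response_factor = 1.0
    else:
        response_factor = target_response_s / response_time_s

    return round(component_score * response_factor, 3)


if __name__ == "__main__":
    # Healthy components, but responses take twice as long as the target.
    print(service_quality(1.0, {"database": 1.0, "network": 1.0}, 4.0, 2.0))
```

Other aggregation rules, such as weighted averages, are equally plausible. The minimum was chosen here only because it makes a single failed sub-system visible in the final score.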

One problem, however, with quantitatively measuring that level of quality lies within the device-centric nature of IT itself. In Chapter 2, we talked at length about the maturity of the IT organization and how a maturing IT organization finds itself better aligned with the needs of its business. As we'll find out in this chapter, another benefit of a maturing IT organization is that they have a much better understanding of the types of metrics that are critical to correctly measuring their services' quality.

Chapter 6: Achieving Management Value

To this point we've been looking at the technical aspects of Business Service Management and its complement, End-User Experience monitoring. We've talked about the technical and process-based aspects of implementing such a system to the benefit of the organization. This chapter as well as the next two chapters will deviate from those discussions a bit to consider the value returned to the business by implementing such a system.

In this chapter we'll discuss the value associated with managing business systems. Here, we'll talk about the potential return that can be obtained by enterprises, outsourcers, and end users themselves. We'll show some examples of management dashboards that enable that return, and how the information gained through those dashboards improves business leaders' ability to better service their customers.

In the following two chapters we'll continue the conversation on value, delving into the achievement of operational and IT value. In Chapter 7, we'll focus on how BSM's information can reduce operational expenditures for an organization. We'll also talk about how BSM can be a management umbrella, under which management controls can be housed. There, we'll revisit the topic of dashboards, discussing best practices in building effective ones.

Chapter 7: Achieving Operational Value

As with any investment, getting the most return is critical to determining its worth to the business. A BSM system's internal calculations engine is itself involved in determining value and return. Those calculations apply not only to the systems it monitors; the same calculations can also be used to validate the BSM system itself. As such, BSM, through its measurements and internal financial calculations, is capable of determining that value.

As you can see in Figure 7.1, at the highest level, a BSM system is intended to be a sort of black box. Fed into that box is a set of raw data arriving through its own End User Experience instrumentation or through various connectors that plug into other management systems.

On the right side of Figure 7.1, we see the output of BSM's calculations. These are a series of visualizations that can be used to validate system health, understand the financial impact of IT systems, and ultimately make decisions based on data that has been presented in a digestible format.

Figure 7.1: In many ways, BSM's internal computational logic is like a black box. Raw data from connected systems goes in one end. Visualizations of that information in digestible formats are output on the other end.
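The black-box flow in Figure 7.1 can be sketched in a few lines of Python. This is a simplified stand-in, not a representation of any vendor's calculations engine. The sample tuple layout, the source labels, and the 99% and 95% status thresholds are assumptions chosen purely for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One raw sample: (source, service, is_up). The source might be BSM's own
# End User Experience instrumentation or a connector to another system.
Sample = Tuple[str, str, bool]


def summarize(samples: Iterable[Sample]) -> Dict[str, Tuple[float, str]]:
    """Turn raw up/down samples into a per-service availability and status."""
    totals = defaultdict(lambda: [0, 0])  # service -> [up_count, total_count]
    for _source, service, is_up in samples:
        totals[service][1] += 1
        if is_up:
            totals[service][0] += 1

    report = {}
    for service, (up, total) in totals.items():
        availability = up / total
        # Thresholds below are illustrative assumptions.
        if availability >= 0.99:
            status = "green"
        elif availability >= 0.95:
            status = "yellow"
        else:
            status = "red"
        report[service] = (round(availability * 100, 1), status)
    return report


if __name__ == "__main__":
    raw = [
        ("eue", "ordering", True),
        ("eue", "ordering", True),
        ("connector", "ordering", False),
        ("connector", "ordering", True),
    ]
    print(summarize(raw))
```

Raw samples from several sources go in one end, and a short, color-coded summary that a dashboard could display comes out the other.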

Chapter 8: Achieving IT Value

We've spent a lot of time in this guide discussing the output side associated with a BSM implementation. Dashboards, visualizations, and actionable information that IT and the business can use to make effective decisions and troubleshoot problems are all useful and easy-to-see results from a BSM implementation. They're also arguably the more exciting part. Seeing a BSM implementation's visualizations and understanding how they can be used effectively is easily the most impressive part of the concept. But as you can see in Figure 8.1, that output side is only one-third of the entire process.
Figure 8.1: BSM's visualizations and output information are only as good as the data ingested into the system and the calculations made on that data.

Bringing importance and value to the output from the system is the information that got you to those ultimate conclusions. The data collected by the End User Experience (EUE) monitoring product in combination with data from other management systems is the raw material that is fed into the BSM calculations engine—the center black box. Once those connections are set up and fully understood from a data perspective, the next step is to model the data to give it meaning. How the results of those calculations are interpreted is what ultimately drives what its end users see.

In this chapter, we'll talk about both of these pieces. Having already gone through a thorough discussion of the importance of EUE monitoring back in Chapter 5, this chapter will dig deep into the individual connectors that link into other IT systems. Understanding those, we'll also take a look at some sample calculations and how those calculations are built to feed BSM's visualizations.

Chapter 9: BSM and ITIL & Six Sigma

In this, our discussion of the basics, value, and utility of BSM, we've touched on many of the topics necessary for educating you on where and how BSM fits into your computing environment. In the past three chapters, we have focused specifically on three areas of value that BSM provides to management, operations, and IT.

In these past chapters, we've documented how dashboards and the data they provide enable management to make better decisions. We've illustrated the value to daily operations related to those dashboards and how good dashboard design brings visibility to otherwise complex data. Lastly, we drilled down into the technical IT elements that feed raw data into BSM's calculations engine. Figure 9.1 shows a representation of BSM in relation to those business-oriented inputs and outputs.

Figure 9.1: BSM provides reports and views to the business while ingesting expectations of service levels.

Chapter 10: A Primer on BSM

This guide has attempted to unravel the complexities surrounding BSM, all the while illustrating the value proposition associated with its implementation. As we've learned, BSM is a platform upon which various classes of monitoring data can reside. It is effectively a mechanism by which the goals of business are applied to the technology of IT. With BSM, IT data and metrics are ingested from numerous sources, including BSM's own internal monitoring instrumentation, and used to calculate metrics that determine the overall health of a business service.

The intent for this chapter is to provide you with a sort of crib sheet for the concepts and major topics we've discussed throughout each of the previous chapters. As a Definitive Guide, this book is intended to be fully comprehensive in the story it attempts to tell. But sometimes what you really need is a down-and-dirty explanation with plenty of useful images to be used in helping others understand the utility and value associated with BSM. That being said, this chapter intends to wrap up our conversation on this topic as well as reiterate the critical concepts and takeaways discussed in each of the earlier chapters.

BSM—More than Just a Framework

Throughout this guide, we've started each chapter with a story. That story has told the tale of John Brown and Dan Bishop's iterative embrace of BSM as a solution for quantitatively illustrating the quality of their online B2B Web service. The story started out in Chapter 1 with what appears at the outset to be a common problem in any IT environment. There, what was originally believed to be a “minor IT system” went down one night. But that system was quite a bit more important to the functionality of FCG's entire Web system than anyone had believed. Its outage resulted in a 6-hour loss of order state information for special orders placed through the Web service.

During that 6-hour outage, every order placed ended up in an incomplete state. Neither the individual placing the order nor FCG itself knew the status of the order. As a result, FCG's ordering department was forced to go through a highly time-consuming and costly reconciliation process to determine the state of each order and work with its owner to resolution. This added activity was expensive enough to significantly affect FCG's quarterly numbers. What we learned over the course of each chapter's example story was how an iterative approach to a BSM implementation in the end significantly benefitted FCG's ability to make better business decisions as well as track and resolve problems as they occur.