Creating Unified IT Monitoring and Management in Your Environment

by Don Jones


The IT industry continues to evolve more advanced and efficient tools for troubleshooting operational problems, monitoring operational stability, and improving service response times. However, many organizations are still carrying around decades-old philosophies and processes that hinder true improvement in IT management. In Creating Unified IT Monitoring and Management in Your Environment, IT author and analyst Don Jones examines these outdated processes, and proposes a new set of philosophies and approaches designed to help maintain improved service levels, reduce time to resolution for problems, and increase the communication and cooperation between all levels of the organization — both within IT as well as within the organization's larger user community. Told in part through composite case studies, readers will truly find themselves "inside the story," and discover new ways to make IT monitoring and management more effective. Jones brings a practical perspective to his discussion by helping readers to build a realistic "shopping list" for features and capabilities that need to be brought into their environments to realize his proposed process improvements and enhanced approaches.


Chapter 1: Managing Your IT Environment: Four Things You're Doing Wrong

At the very start of the IT industry, "monitoring" meant having a guy wander around inside the mainframe looking for burnt-out vacuum tubes. There wasn't really a way to locate the tubes that were working a bit harder than they were designed for, so monitoring, such as it was, was an entirely reactive affair.

In those days, the "help desk" was probably that same guy answering the phone when one of the other dozen or so "computer people" needed a hand feeding punch cards into a hopper, tracking down a burnt-out tube, and so on. The concepts of tickets, knowledge bases, service level agreements (SLAs), and so forth hadn't yet been invented.

IT management has certainly evolved since those days, but it unfortunately hasn't evolved as much as it could or should have. Our tools have definitely become more complex and more mature, but the way in which we use those tools (our IT management processes) is in some ways still stuck in the days of reactive tube-changing.

Some of the philosophies that underpin many organizations' IT management practices are really becoming a detriment to the organizations that IT is meant to support. The discussion in this chapter will revolve around several core themes, which will continue to drive the subsequent chapters in this book. The goal will be to help change your thinking about how IT management, particularly monitoring, should work, what value it should provide to your organization, and how you should go about building a better-managed IT environment.

This chapter examines the following problems:

  • Problem 1: You're Managing IT in Silos
  • Problem 2: You Aren't Connecting Your Users, Service Desk, and IT Management
  • Problem 3: You're Measuring the Wrong Things
  • Problem 4: You're Losing Knowledge

Chapter 2: Eliminating the Silos in IT Management

In the previous chapter, I proposed that one of the biggest problems in modern IT is the fact that we manage our environment in technology‐specific silos: database administrators are in charge of databases, Windows admins are in charge of their machines, VMware admins run the virtualization infrastructure, and so forth. I'm not actually proposing that we change that exact practice—having domain‐specific experts on the team is definitely a benefit. However, having these domain‐specific experts each using their own unique, domain‐specific tool definitely creates problems. In this chapter, we'll explore some of those problems, and see what we can do to solve them and create a more efficient, unified IT environment.

Too Many Tools Means Too Few Solutions

"Comparing apples to oranges" is an apt phrase when it comes to how we manage performance, troubleshooting, and other core processes in IT. Tell an Exchange Server administrator that there's a performance problem with the messaging system, and he'll likely jump right into Windows' Performance Monitor, perhaps with a pre‐created counter set that focuses on disk throughput, processor utilization, RPC request count, and so forth—as shown in Figure 2.1.

Figure 2.1: Monitoring Exchange.

Chapter 3: Connecting Everyone to the IT Management Loop

IT management has for too long involved discrete, disconnected processes that often leave key participants wondering what's going on. Bringing everyone (users, managers, IT professionals, and more) into the loop can create significant benefits as well as reduce the tendency to fall back into discipline-based silos. This is where the integration between monitoring and service desk truly happens, and these concepts deliver the most critical, central themes discussed throughout this book. It's all about communication: ways to better achieve communication as well as create opportunities for continuous improvement.

Users sometimes perceive their IT department as out-of-touch, ivory-tower geeks with poor people skills. Whether or not that's true depends on the actual IT team members, but the perception, fair or not, often exists. That's because IT can too often be the last ones to know about things that users perceive as problems. Sure, the server might be humming along within specs, but the order-entry application is incredibly slow. IT says that email is working fine, but I've been waiting on an incoming purchase order for an hour, so the email system can't possibly be working correctly!

IT has its own unique problems to deal with, and they sometimes involve a disconnect with management. Finding windows in which to make approved changes, for example, can be incredibly tricky. Simply coordinating the changes that are proposed, approved, under development, ready for implementation, and so forth can be difficult. Many organizations have adopted change management frameworks, such as those proposed by ITIL, that outline specific processes for reviewing and approving changes. Physically coordinating that process, however, can seem like herding cats. It's even worse when IT has been divided into silos: The database team might have a change scheduled for tonight, but that change is going to conflict with the power supply changes being implemented by the data center team. We need to get everyone on the same page.
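The scheduling clash described above can be caught mechanically once every team's planned changes live in one place. What follows is only a minimal sketch under stated assumptions: the `Change` record, the team names, and the dates are all hypothetical, and it treats any overlap in time between two different teams as a potential conflict. A real change calendar would also need to know which systems each change actually touches.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Change:
    """A scheduled change window (hypothetical record layout)."""
    team: str
    summary: str
    start: datetime
    end: datetime


def overlaps(a: Change, b: Change) -> bool:
    """Two windows overlap if each one starts before the other ends."""
    return a.start < b.end and b.start < a.end


def find_conflicts(changes: list[Change]) -> list[tuple[Change, Change]]:
    """Return every pair of changes from different teams whose windows overlap."""
    return [
        (a, b)
        for a, b in combinations(changes, 2)
        if a.team != b.team and overlaps(a, b)
    ]


# Illustrative data only: three teams, one shared calendar.
changes = [
    Change("database", "Schema update",
           datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 10, 23, 30)),
    Change("data center", "Power supply replacement",
           datetime(2024, 1, 10, 23, 0), datetime(2024, 1, 11, 1, 0)),
    Change("network", "Firmware upgrade",
           datetime(2024, 1, 11, 2, 0), datetime(2024, 1, 11, 3, 0)),
]

conflicts = find_conflicts(changes)
for a, b in conflicts:
    print(f"Possible conflict: {a.team} ({a.summary}) and {b.team} ({b.summary})")
```

The point of the sketch is not the overlap test itself, which is trivial, but that it only works when the database team and the data center team record their changes in the same calendar.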

Chapter 4: Monitoring: Look Outside the Datacenter

IT has moved beyond our own data centers, and nearly every organization has at least one or two outsourced IT services. Although we're probably always going to have on-premises assets to manage and monitor, we need to realize that in most cases, monitoring has to start outside the data center, both in the sense of accommodating off-premises services as well as focusing more closely on what end users are actually experiencing.

Monitoring Technical Counters vs. the End-User Experience

The traditional IT monitoring approach is what I call inside out: It starts within the data center and moves outward toward the end user. Figure 4.1 provides a visual for this idea, illustrating how typical monitoring focuses on the backend: database servers, application servers, Web servers, cloud services, and so forth. The general reason for this approach is that we have the best control and insight over what's inside the data center. If everything inside the data center is running smoothly, it stands to reason that the end users who consume the data center's services will be happy.

Most Service Level Agreements (SLAs) derive from this approach: We promise a certain amount of uptime, and we set up monitoring thresholds around data center-centric measurements like CPU utilization, network utilization, disk utilization, and so forth. We also tend to look at low-level response times: query response time, disk response time, network latency, and so on.

There's something deeply and inherently inaccurate about the underlying assumption of this approach: Even if you start with a perfect pile of bricks, there's no guarantee that you're going to end up with a stable building in the end. In other words, what end users experience isn't merely the sum of the data center's various metrics. A smoothly running data center usually leads to satisfied users, but that isn't always the case.

It's obviously important for us to continue monitoring these data center-centric measurements, but those can't be the only thing we monitor and measure. Current thinking in the industry is that we need to more directly measure what the end user experiences. In fact, "end user experience," or EUE, has become a common term in more forward-thinking management circles.
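To make the gap between the two views concrete, here is a minimal sketch that evaluates the same service both ways. Everything in it is an assumption for illustration: the threshold values, the metric names, the 3-second response target, and the sample timings are invented, and the 95th percentile is computed with a simple nearest-rank method rather than anything a particular monitoring product does.

```python
import math

# Hypothetical data center-centric thresholds (percent utilization).
BACKEND_THRESHOLDS = {"cpu_pct": 80.0, "disk_pct": 85.0, "network_pct": 70.0}

# Hypothetical end-user target: 95% of transactions finish within 3 seconds.
EUE_TARGET_SECONDS = 3.0


def backend_breaches(metrics: dict[str, float]) -> list[str]:
    """Return the names of backend metrics that exceed their thresholds."""
    return [
        name
        for name, limit in BACKEND_THRESHOLDS.items()
        if metrics.get(name, 0.0) > limit
    ]


def percentile_95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of the measured transaction times."""
    ordered = sorted(samples)
    index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[index]


# Illustrative readings: every backend counter is comfortably in range...
backend = {"cpu_pct": 35.0, "disk_pct": 60.0, "network_pct": 20.0}

# ...while order-entry transactions, timed from the user's side, are slow.
order_entry_seconds = [2.1, 2.4, 6.8, 7.2, 5.9, 2.2, 6.1, 7.5, 6.4, 5.8]

backend_ok = not backend_breaches(backend)
eue_ok = percentile_95(order_entry_seconds) <= EUE_TARGET_SECONDS

print(f"Backend within thresholds: {backend_ok}")
print(f"End-user experience within target: {eue_ok}")
```

With these made-up numbers the backend check passes and the end-user check fails, which is exactly the situation an inside-out approach cannot see.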

Here's another way to think of it: Suppose you go to a restaurant to eat. Your steak comes out cooked wrong, they brought the wrong side items, and the waitress is rude. The manager, standing back in the kitchen, thinks everything is fine: the steaks are hot, the veggies are hot, and the waitress smiles at him every time she goes back there. He's focused on the backend, with no knowledge of your expectations. Restaurants address this by having the manager periodically roam around and ask, "Is everything okay?" That's monitoring the EUE: Rather than looking at his back-of-house metrics, he's going out into the cube farm—er, restaurant floor—and testing the waters.

Chapter 5: Turning Problems into Solutions

The satirical news outlet The Onion recently ran a story related to the economy. In it, the publication claimed that a special kind of scientist called a historian was advancing the novel idea of looking at the past. "Sometimes," one pseudo-historian was quoted, "we can look at how people tried to solve problems which are similar to those problems we are having today. We can look and see how their solutions worked, and that can give us an idea of whether or not the same solution will work for us." Hah!

Although targeted at politicians who seem to keep making the same mistakes over and over, The Onion's jibe is pretty applicable to IT as well. "Look, if this same problem happened 3 months ago, and we solved it then, perhaps we can solve it more quickly now. What, exactly, did we do last time? Maybe doing the same thing again will have the same effect that it did then!"

I'll put it another way: Perhaps you have children, or at least know someone who does. Ever tell a kid not to touch the hot pot that's on the stove? Sure. Did they touch it? Of course. How many times? Usually just once. That's because human beings are designed to learn primarily by making mistakes. Provided we remember the mistake, and that we remember how to avoid it or solve it, we can do so in the future very quickly. Memory becomes the key factor, and as we get older, stop touching hot pots and start playing with computers at work, it sometimes gets harder to remember. This chapter is all about the final aspect of unified management: Taking problems that we've solved, and turning those into solutions for the future.
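One way to picture "remembering" at the organizational level is a lookup that compares a new symptom against problems already solved. The sketch below is deliberately naive and entirely hypothetical: the incident records are invented, the matching is plain word overlap (Jaccard similarity), and the 0.2 cutoff is an arbitrary assumption. It illustrates the idea, not how any particular service desk product searches its knowledge base.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a description and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical record of previously solved problems.
PAST_INCIDENTS = [
    {"symptom": "order entry application slow after nightly backup",
     "resolution": "Rescheduled backup job outside business hours"},
    {"symptom": "email delivery delayed for external senders",
     "resolution": "Cleared stuck messages in the outbound queue"},
    {"symptom": "file server disk full",
     "resolution": "Archived old logs and extended the volume"},
]


def suggest(symptom: str, incidents: list[dict], minimum: float = 0.2) -> list[dict]:
    """Return past incidents similar to the new symptom, best match first."""
    scored = sorted(
        ((similarity(symptom, incident["symptom"]), incident) for incident in incidents),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [incident for score, incident in scored if score >= minimum]


matches = suggest("order entry application is very slow", PAST_INCIDENTS)
for incident in matches:
    print(f"Seen before: {incident['symptom']} -> {incident['resolution']}")
```

Even a crude match like this only helps if the resolution was written down the first time, which is the real subject of this chapter.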

Chapter 6: Unified Management, Illustrated

In this final chapter of the book, I want to revisit everything from the first five chapters. However, I’m going to do so in the form of case studies. I’ve been fortunate enough to speak with several consulting clients of mine who’ve been struggling with the same issues I’ve outlined, and who’ve recently been trying solutions that follow the basic approach I’ve described. They’ve agreed to let me share their stories (although they’ve asked that I not use their names or company names) so that you can get a before-and-after look at how this “unified management” thing should work. Along the way, I’ll also share some of the challenges and roadblocks they’ve encountered. A switch to unified management isn’t always going to be hassle-free, so I think it’s valuable for you to see what they’ve had to deal with, and how they think they’re going to do so.

This chapter will also include some of the practical information on unified management that hasn’t made it into the previous chapters. I’ll provide a consolidated shopping list of unified management features so that when you start examining solutions, you can have that list in hand to help you. I’ll also look at different purchasing models that vendors are offering these days to give you an idea what kind of flexibility you might have for acquiring and implementing a solution.