The Shortcut Guide to IT Workload Automation and Job Scheduling

by Mark Scott


With the explosion of data systems in most enterprises, the need to integrate and share data across systems becomes vital. For most organizations, this means coordinating a delicate balance of processes that move data between systems. Orchestrating this movement, tracking and troubleshooting issues, and optimizing the use of shared enterprise resources is a daunting task. Add to that the changing face of service-oriented architectures and the need to maintain green IT operations, and the prospect can become overwhelming.

The Shortcut Guide to IT Workload Automation and Job Scheduling gives CIOs, IT managers, enterprise architects, and IT operations personnel an incisive look at the issues surrounding the scheduling and automation of jobs in heterogeneous enterprise environments. It offers insight into the key obstacles to overcome and presents best practices for designing and maintaining mission-critical tasks throughout the organization.


Chapter 1: Challenges of IT Workload Automation and Job Scheduling

The information technology (IT) landscape for many organizations has grown into a complex web of servers, networks, locations, products, and processes. Simple, siloed applications now need to interact with systems scattered across the country and across the globe. Keeping all these systems, processes, and procedures working in an efficient, orderly, and secure manner has become an enormous task.

The purpose of this guide is to identify the issues surrounding the design, control, and management of jobs within an organization. It provides insight into challenges and recommends best practices for establishing an efficient, maintainable system for scheduling, monitoring, and adjusting jobs within the enterprise.

Chapter 2: IT Workload Automation and Job Scheduling

IT must run many automation workloads regularly on servers throughout the organization. As mentioned in the previous chapter, the task of coordinating a series of individual jobs across servers, platforms, applications, firewalls, time zones, and wide area network (WAN) links can be daunting. When the jobs depend on one another, IT needs a system that allows these different tasks to work together as if they were designed as part of a single program.

This chapter will explore the process of scheduling those tasks and orchestrating them to work together. There are several aspects to consider when choreographing the jobs throughout the enterprise:

  • Defining Plan Requirements: Plan requirements should be outcome-based. The end result is paramount, and the steps required only facilitate that end. When considering plan requirements, one must identify the input sources and the tasks required to provide those inputs. The dependencies of the jobs on one another must be determined. The interfaces that jobs use to communicate with one another must be defined. One must choose the appropriate paradigm for linking jobs (time schedules, events, and so on). Because the jobs compete for the same computing resources, mechanisms must be devised for determining priorities. A system of monitoring and auditing must be established.
  • Time-Based Schedules: Time-based scheduling is the oldest and most familiar paradigm, yet it can be anything but simple. Processing windows and time constraints must be taken into consideration. The business calendar, including weekends, holidays, month-end closings, and fiscal events, must be factored into the schedule. Determining which events are synchronous and which are asynchronous, which events can occur in parallel and which in series, and how to keep everything interoperating correctly becomes the key to building a dependable, workable schedule.
  • Event-Driven Scheduling: Plans can be created more simply if one job can call another job directly through a programmatic interface. To develop event-driven schedules, the manner in which jobs are triggered and monitored must be defined. Because the jobs are not scheduled in advance, dynamic selection of available resources becomes especially important. The system must employ event-driven interfaces, such as Web services, message queues, file triggers, server system alerts, and other targets that listen for specific job occurrences. The system must be designed to audit for, and alert on, irregularities and the absence of expected events.
  • Job Execution: The execution of plans associated with a schedule must be responsive to the needs of the organization. The load must be balanced across resources, either manually or automatically. Resource utilization must be monitored on a regular basis to maintain optimal usage of resources. Error handling should be built into the system, and compensating resources employed to keep the schedule on track when failures occur. There must be a means of prioritizing a related stream of jobs that occur on different systems throughout the organization and of reorganizing those jobs as they relate to one another. Service Level Agreements (SLAs) help form the basis of these priorities and are critical to keeping the system well ordered.
  • Monitoring and Reporting: To govern enterprise solutions, monitoring becomes vital. Establishing guidelines and mechanisms for reporting on the execution and health of the entire automation system helps Operations validate that everything is running as designed. Alerts will help the staff respond quickly to errors and correct them within the established limits of the SLAs. The reporting can be used to satisfy regulatory and corporate compliance standards. And proactive analysis of the operational data can help the system adjust to impending risks before those risks become issues and create disruptions in the automation workflows.
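
To make the ideas of job dependencies and a business calendar more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the job names, the `PLAN` mapping, the `HOLIDAYS` set, and the helper functions are all hypothetical and do not come from any particular scheduling product. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) to derive a serial run order in which every job follows the jobs it depends on.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter

# Hypothetical plan: each job maps to the set of jobs it depends on.
PLAN = {
    "extract_orders": set(),
    "extract_inventory": set(),
    "load_warehouse": {"extract_orders", "extract_inventory"},
    "build_reports": {"load_warehouse"},
}

# Hypothetical business calendar: dates on which the plan must not run.
HOLIDAYS = {date(2024, 12, 25), date(2025, 1, 1)}


def is_business_day(day: date) -> bool:
    """Return True for Monday-Friday dates that are not listed holidays."""
    return day.weekday() < 5 and day not in HOLIDAYS


def next_run_date(start: date) -> date:
    """Return the first business day on or after `start`."""
    day = start
    while not is_business_day(day):
        day += timedelta(days=1)
    return day


def run_order(plan: dict) -> list:
    """Return one valid serial order in which each job follows its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(plan).static_order())
```

Calling `run_order(PLAN)` yields an order in which both extract jobs precede `load_warehouse`, which in turn precedes `build_reports`. A real workload automation system would also run independent jobs in parallel and handle time zones and fiscal calendars, which this sketch leaves out.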

Chapter 3: IT Workload Automation and Job Design and Configuration

Designing an IT automation job can be simple or difficult depending on where the job runs. For jobs that run on a single server, or on a group of servers that share the same operating system (OS), the design is straightforward: most OSs and applications provide systems for scripting and scheduling simple repetitive jobs. The task becomes challenging because of the varied landscape found in most enterprises.

Applications often do not run on the same types of servers. Many enterprises use a mix of mainframes; minicomputers; and UNIX, Linux, and Microsoft Windows servers. These servers often are found in data centers distributed throughout the world, separated by time zones, WAN links, firewalls, and network domains. For all their diversity, IT must help them work together as a single, seamless system that allows servers to share information and keep the entire organization current. This chapter will address many critical aspects of designing and configuring systems that can manage tasks that span the servers, applications, data centers, and networks used in enterprises:

  • Job Modeling: The job automation processes will need to conform to business processes and adjust as business processes evolve. The modeling process will need to compare traditional time-based scheduling with event-driven scheduling. The model must also determine the best use of appropriate resources to accomplish the work in a cost-effective manner.
  • Target Platforms: The individual job steps of each plan are performed on a specific server or group of servers, so those servers are a key consideration in planning and configuring job automation. Applications and servers do not necessarily store or share information in the same manner, so the data formats and the interfaces used to share data must be accounted for. Even the batch languages used to execute specific job tasks play a role in the maintenance and cost of the automation task sequences.
  • Communication Systems: Sharing information between servers is at the heart of multi-system workload automation. The choice of mechanisms to move data between servers will affect the effectiveness of the plans. When systems are connected unreliably or only occasionally, the mechanisms for sharing must be designed accordingly. Monitoring communications between servers then becomes critical for troubleshooting and proactively maintaining the schedule.
  • Conditional Logic: Plans can have individual jobs that adjust to the conditions of the moment if they have variables that can be populated and used to make decisions on how plans are processed. Implementing variables that take information from one server and pass it to another can allow conditional branching of the job steps. This can result in much more efficient plan processing. Conditional logic can also help with error correction and the construction of durable, reliable systems.
  • Security: Secure organizations employ defense in depth when building networks. The IT automation system will need to deal with network security and credentials that allow it to access servers safely. Secured network channels need to be built to connect servers to one another. The credentials need to be protected and yet maintained to allow the system to operate in compliance with corporate standards.
  • Enterprise Integration Architecture: Many enterprise applications provide their own mechanisms for sharing information. The IT workload automation system should leverage the Service-Oriented Architectures (SOAs) and application programming interfaces provided by the individual applications. Effective use of resources derived from Event-Driven Architectures (EDAs) should be included. Designers need to leverage the business modeling built into these applications so that the logic does not need to be duplicated. A system that can provide a single point of scheduling will prove easier to monitor and maintain. The system should provide mechanisms that ease deployment and reconfiguration as well as a unified reporting structure that helps operations oversee job automation functions as a whole.
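
The conditional logic described above, in which a variable populated by one job step decides which step runs next, can be sketched in a few lines of Python. This is a hedged illustration only: the shared `ctx` dictionary, the step functions, and the `source_rows` key are hypothetical stand-ins, and a real automation product would pass variables between servers through its own mechanisms.

```python
from typing import Any, Dict, List

# Shared variables passed from one job step to the next (hypothetical design).
Context = Dict[str, Any]


def count_new_records(ctx: Context) -> None:
    """Stand-in for a step that queries a source server for new data."""
    ctx["new_records"] = len(ctx.get("source_rows", []))


def transfer_records(ctx: Context) -> None:
    """Stand-in for a step that moves the new data to a target server."""
    ctx["transferred"] = ctx["new_records"]


def skip_transfer(ctx: Context) -> None:
    """Record that nothing was moved, so later steps can react."""
    ctx["transferred"] = 0


def run_plan(ctx: Context) -> List[str]:
    """Run the plan, branching on a variable set by the first step."""
    executed = []
    count_new_records(ctx)
    executed.append("count_new_records")
    # Conditional branch: only pay for the transfer when there is work to do.
    next_step = transfer_records if ctx["new_records"] > 0 else skip_transfer
    next_step(ctx)
    executed.append(next_step.__name__)
    return executed
```

With `{"source_rows": []}` as input, the plan skips the transfer step entirely, which is the kind of efficiency gain that conditional branching is meant to provide.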

Chapter 4: Best Practices for IT Workload Automation and Job Scheduling

When organizations set out to solve their business challenges, they want to know which approach is likely to work. A proven approach can save time, money, and frustration during development. This chapter will address approaches to IT workload automation and job scheduling that have been vetted by success in many other organizations. It should provide a place to begin when solving the specific challenges of a given enterprise. It outlines topics that every organization should address to determine how to define, deploy, and manage its automation system. These topics include:

  • Service Level Agreements: To build a system that meets the needs of the enterprise, IT and the business must agree on the level of service that they expect. The demands for performance and the limits of budget must be balanced and an agreement reached so that everyone knows what to expect. A clear set of service level agreements (SLAs) provides the key ingredients for planning a successful workflow and job automation system.
  • Job Deployment and Execution: The system must operate cost-effectively. Beyond the purchase price of the system, organizations must consider the operational costs, maintenance costs, and efficient use of existing resources. Mapping out these costs can help design and implement a system with a higher return on investment and a system that will provide years of additional savings to the organization.
  • Monitoring and Auditing: There is no way to operate or improve the system if there is no record of what the system is doing. A well-designed system provides timely notifications to system operators of issues and provides them with the right information to correct problems effectively. Integration with enterprise resource monitoring systems provides a more complete picture to allow automation planners to optimize the system and plan for the future. The ever-expanding regulation of corporate data and system operations also makes auditing a key component of successful automation systems.
  • Resource Management: The automation system works on the corporate servers that provide information to the entire organization. Using those servers effectively can help reduce server counts and bandwidth constraints. From making the best use of licenses to opening up data center real estate to conserving power, systems that make optimum use of corporate resources reduce costs.
  • System Architecture: Businesses grow, change, and adapt. The automation system must take these changes in stride. By providing a nimble, resilient architecture, the automation system can help the organization make the best changes with little anxiety over the system's ability to fulfill its mandated role.
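
As a rough illustration of how SLAs and monitoring fit together, the following Python sketch compares recorded job runs against per-job duration limits and reports the jobs that are out of bounds. Everything here is assumed for the example: the `JobRun` record, the `SLA_LIMITS` values, and the `sla_breaches` function are hypothetical, and an actual system would draw run data from its own audit records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class JobRun:
    name: str
    started: datetime
    finished: Optional[datetime]  # None means the job has not finished yet


# Hypothetical SLA terms: the maximum allowed duration for each job.
SLA_LIMITS = {
    "load_warehouse": timedelta(minutes=30),
    "build_reports": timedelta(minutes=45),
}


def sla_breaches(runs: List[JobRun], now: datetime) -> List[str]:
    """Return the names of jobs whose elapsed time exceeds their SLA limit."""
    breached = []
    for run in runs:
        limit = SLA_LIMITS.get(run.name)
        if limit is None:
            continue  # No SLA was agreed for this job.
        # For unfinished jobs, measure elapsed time up to the present moment.
        elapsed = (run.finished or now) - run.started
        if elapsed > limit:
            breached.append(run.name)
    return breached
```

Measuring unfinished jobs against the current time lets operators be alerted while a job is still running late, rather than only after it completes.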