SIZING-UP IT OPERATIONS MANAGEMENT: THREE KEY POINTS OF COMPARISON

MOVING BEYOND MONITORING, ALERTING AND REPORTING TO DELIVER AN AUTOMATED CONTROL SYSTEM FOR TODAY'S VIRTUALIZED DATA CENTERS

VMTurbo is a company founded on the belief that IT operations management needs to be fundamentally changed to allow companies to unlock the full value of virtualized infrastructure and cloud services. The approach must be shifted from bottom-up to top-down, and the emphasis must be transitioned from manual intervention to prescriptive analytics that identify the necessary actions to control the environment in an optimal state—one where applications get the resources they require to meet business goals while making most efficient use of network, storage and compute resources. Virtualization is a key enabler, but existing management tools fail to embrace and leverage this new architecture effectively.

With traditional approaches, too much time is spent collecting thousands of data points, triggering alerts based on perceived anomalies, and leaving IT operators with the difficult task of figuring out what to do to return the system to acceptable performance. These approaches, by definition, lead to unpredictable service, are OPEX-intensive, and become increasingly complex as shared-everything environments are constantly changing due to workload fluctuations, virtual machine movement, and all the self-service benefits that accompany virtual infrastructure.

To do it properly, the operations management framework must have three important characteristics in its design:

1. The approach must be oriented on a top-down view of the infrastructure, with a layer of abstraction allowing the environment to be modeled in rapid fashion and removing the data collection burden.

2. The model must be capable of understanding the entire environment and all the constraints and interdependencies that exist within it.

3. The analytic engine must be built with the goal of prescribing specific actions that will maintain the infrastructure in the desired state based on business rules—in effect, automating the decision-making process for IT operations.


RETHINKING THE APPROACH TO IT OPERATIONS MANAGEMENT

Conceptually, virtualization provides an incredible opportunity to change the way IT is managed through the ability to use software controls to dynamically change resource allocation and workload configurations for applications operating across a shared physical infrastructure. Examples of decisions that can be executed through these software controls include altering workload placement on servers and storage infrastructure; changing virtual machine resource limits and reservations; starting and stopping virtual machines; or cloning application instances. In effect, virtualization enables the environment to be horizontally scaled (to meet specific workload demand) and vertically scaled (to reallocate resources based on existing workload requirements) in rapid fashion.

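These control points are exposed directly by the virtualization layer's APIs. As a rough illustration only, here is how the four examples above might be exercised against vSphere using the open-source pyVmomi SDK; the vCenter address, credentials, and inventory names are placeholders, and production code would wait on each returned task before proceeding.

# Sketch: the four control examples above, expressed against the vSphere API
# via the open-source pyVmomi SDK. Hostnames, credentials and inventory names
# are placeholders; each *_Task call returns a vim.Task that real code would
# wait on (e.g., with pyVim.task.WaitForTask) before issuing the next action.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="changeme")
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    # Look up a managed object (VM, host, ...) by its inventory name.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find_by_name(vim.VirtualMachine, "app-vm-01")

# 1. Alter workload placement: live-migrate the VM to another host.
vm.RelocateVM_Task(spec=vim.vm.RelocateSpec(
    host=find_by_name(vim.HostSystem, "esx-02.example.com")))

# 2. Change resource limits and reservations (vertical scaling), in MB.
vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(
    memoryAllocation=vim.ResourceAllocationInfo(reservation=2048, limit=4096)))

# 3. Start and stop virtual machines.
vm.PowerOffVM_Task()
vm.PowerOnVM_Task()

# 4. Clone an application instance (horizontal scaling).
vm.CloneVM_Task(folder=vm.parent, name="app-vm-02",
                spec=vim.vm.CloneSpec(location=vim.vm.RelocateSpec(),
                                      template=False, powerOn=True))

Disconnect(si)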
the environment based on workload demands,

These controls provide an agile way for IT operators to optimize resource usage or address

resource capacity, and configuration and business constraints.

performance bottlenecks reported by monitoring systems. However, because of the

Key benefits to a new approach:

shared nature of virtualized infrastructure and



the dynamic fluctuation of workloads, decisions

Provides a more stable and

regarding how to orchestrate these controls

consistent user experience by

need to be taken with great care to prevent any

assuring the quality of service of

action from impacting performance and

virtualized applications

efficiency of other IT services. For example, re-



configuring the placement of virtual machines

Lowers operational costs by reducing

on servers in a cluster or re-sizing a virtual

the number of problems and

machine on a host might solve a specific

incidents IT must handle

resource bottleneck, but cause other resource



constraints across the environment. This knock-on

storage assets by driving higher levels

effect can impact applications or workloads

of utilization across the environment

that are more latency-sensitive or critical to the business.

Improves the ROI of compute and


When determining a strategy for performance assurance of virtualized services, it is important to draw the distinction between vendors based on several key differences in how the solutions approach solving this complex set of challenges.

A BOTTOM-UP COLLECTION MODEL VS. A TOP-DOWN ABSTRACTION MODEL

Solutions with a heritage in visibility and alerting often incorporate data analysis engines that focus on examining thousands of performance metrics to identify abnormal patterns in the data and infer impact or potential impact to service performance. In many cases these analytic engines focus on thresholds and correlate events to identify anomalies based on learned behavior. This is problematic, as threshold-driven events and learned behavior can be misleading if the environment is irregular, changes frequently, or—as is often the case—is not configured optimally.

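To make the pattern concrete, here is a minimal sketch of such a bottom-up engine: a static threshold plus a "learned" rolling baseline. This is our own simplified illustration, not any particular vendor's product. Note that its output is an alert for an operator, never an action, and that the learned baseline drifts with the very environment it is supposed to judge.

# Sketch of a bottom-up alerting engine: a static threshold plus a "learned"
# rolling baseline. Its output is an alert for a human, never an action.
from collections import deque

class MetricMonitor:
    def __init__(self, static_threshold=0.85, window=288, sigma=3.0):
        self.static_threshold = static_threshold  # e.g., 85% utilization
        self.history = deque(maxlen=window)       # "learned behavior" window
        self.sigma = sigma

    def observe(self, value):
        alerts = []
        if value > self.static_threshold:
            alerts.append("threshold breach: %.2f" % value)
        if len(self.history) >= 30:  # need some history before judging
            mean = sum(self.history) / len(self.history)
            std = (sum((x - mean) ** 2 for x in self.history)
                   / len(self.history)) ** 0.5
            if abs(value - mean) > self.sigma * std:
                alerts.append("anomaly vs. learned baseline: %.2f" % value)
        self.history.append(value)  # baseline shifts as the environment changes
        return alerts  # analysis ends here; deciding what to do is left to a human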
More importantly, these approaches are bottom-up methods that are not designed with the goal of determining the actions required to systematically control resource allocation and workload performance. Because they focus exclusively on the myriad of individual metrics at the infrastructure layer, they lack the necessary understanding of topological relationships and dependencies that are required to effectively drive intelligent decisions (and actions) across the IT environment and maintain the health of the infrastructure. At best, they present operators with huge amounts of event data and require them to drill into it with the hope of determining what actions are required to address the anomaly.

A better approach is a top-down one that understands the control points that can be leveraged to tune the environment and uses only the data it needs to prescribe the necessary actions to maintain the system in the optimal operating state. Doing this properly requires a layer of abstraction across the environment through which the analytic model can be run to determine the right actions based on business rules and system interdependencies. This solves for the "data collection at scale" issue that can manifest itself in larger environments and ensures that the analysis engine is designed to prescribe actions with a full understanding of the topological relationships in the infrastructure. By focusing specifically on prescriptive analytics, this type of solution approaches operations management with the goal of preventing performance constraints based on service level priorities and determining the specific actions that will allocate resources appropriately.

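One way to picture the abstraction layer is as a toy model of typed entities with explicit supply relationships, over which the engine reasons to emit actions rather than events. The sketch below uses our own naming and a deliberately crude greedy policy; it is an illustration of the idea, not VMTurbo's internal model.

# Toy top-down model: typed entities plus the supply relationships between
# them. The engine walks this topology and emits actions, not raw alerts.
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    cpu_demand: float  # e.g., GHz

@dataclass
class Host:
    name: str
    cpu_capacity: float
    vms: list = field(default_factory=list)

    def cpu_used(self):
        return sum(vm.cpu_demand for vm in self.vms)

def prescribe(hosts, headroom=0.75):
    """Prescribe moves that keep every host under the desired utilization,
    honoring the topological relationships captured in the model."""
    actions = []
    for host in hosts:
        while host.cpu_used() > headroom * host.cpu_capacity and host.vms:
            vm = max(host.vms, key=lambda v: v.cpu_demand)
            target = min(hosts, key=lambda h: h.cpu_used() / h.cpu_capacity)
            if target is host:  # nowhere better to go: capacity is the problem
                actions.append(("provision-host", vm.name, None))
                break
            host.vms.remove(vm)
            target.vms.append(vm)
            actions.append(("move", vm.name, target.name))
    return actions

hosts = [Host("esx-01", 10.0, [VM("web", 6.0), VM("db", 3.5)]),
         Host("esx-02", 10.0, [VM("batch", 1.0)])]
print(prescribe(hosts))  # -> [('move', 'web', 'esx-02')]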
ELEMENT- VS. ENVIRONMENT-CENTRIC RESOURCE OPTIMIZATION

Resource optimization is a key benefit marketed by vendors across the IT operations management landscape. However, it is important to properly assess what each vendor is actually delivering in this regard. Does the solution focus on individual metrics at the component level and "optimize" based on a narrow view of each element? Or is the solution more comprehensive in nature, understanding the constraints and interdependencies across the environment?

Element-centric optimization is fairly straightforward in that it focuses on specific requirements and constraints on an individual metric basis for a given workload or physical resource. The most common application of this in virtual environments is virtual machine rightsizing. For example, it is possible to look at an individual virtual machine and conclude that the allocated vMem, vCPU or vDisk should be increased due to usage exceeding a threshold.

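In code, the element-centric rule really is this simple. The deliberately naive sketch below judges each virtual machine against its own threshold in isolation, with no view of the host or its neighbors.

# Deliberately naive element-centric rightsizing: each VM is evaluated
# against its own threshold, blind to the hosts and neighboring workloads.
def rightsize(vm_stats, high=0.90, grow_factor=1.25):
    """vm_stats maps a VM name to (used_mem_gb, allocated_mem_gb)."""
    recommendations = []
    for vm, (used, alloc) in vm_stats.items():
        if used / alloc > high:  # the entirety of the "analysis"
            recommendations.append((vm, "grow vMem to",
                                    round(alloc * grow_factor, 1)))
    return recommendations

# Both VMs get "grow" advice even if their shared host has no memory left.
print(rightsize({"app-01": (7.5, 8.0), "db-01": (15.2, 16.0)}))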
However, taking these actions could create larger issues in the environment if they are not considered in the context of other workloads sharing those resources. Before increasing resource allocation, virtual machines may need to be moved to different hosts or data stores to create the headroom in the environment so the change does not impact the performance of other workloads. If there is simply not enough capacity in the environment to meet the increased demand, then physical resources might need to be added before allocating more virtual machine resources. Additionally, if no capacity is available and resources in the environment are constrained, understanding the service levels or business priority of this workload as compared to others in the system is required before addressing the need.

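The decision order just described can be sketched as follows (our own simplification, not a vendor algorithm): resize only when the host has headroom; otherwise try to create headroom by moving a neighbor; failing that, fall back to provisioning capacity or arbitrating on business priority.

# Environment-aware version of the same resize decision: the resize is only
# one possible outcome, and it is always checked against shared capacity.
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    mem_alloc: float  # GB
    priority: int = 1  # higher = more business-critical

@dataclass
class Host:
    name: str
    mem_capacity: float
    vms: list = field(default_factory=list)

    def free_mem(self):
        return self.mem_capacity - sum(v.mem_alloc for v in self.vms)

def decide_resize(vm, host, cluster_hosts, extra_mem):
    """Return an action plan for growing `vm` by `extra_mem` GB."""
    if host.free_mem() >= extra_mem:
        return [("resize", vm.name, extra_mem)]
    # No local headroom: try moving a neighbor elsewhere to create it first.
    for neighbor in sorted(host.vms, key=lambda v: v.mem_alloc):
        if neighbor is vm or host.free_mem() + neighbor.mem_alloc < extra_mem:
            continue
        for other in cluster_hosts:
            if other is not host and other.free_mem() >= neighbor.mem_alloc:
                return [("move", neighbor.name, other.name),
                        ("resize", vm.name, extra_mem)]
    # No headroom anywhere: add capacity, or arbitrate by business priority.
    if sum(h.free_mem() for h in cluster_hosts) < extra_mem:
        return [("provision-host-or-arbitrate", vm.name, vm.priority)]
    return [("defer-pending-rebalance", vm.name, extra_mem)]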
PROCESS VS. DECISION AUTOMATION

As with "resource optimization," the term automation is used extensively in the marketing lexicon of all IT operations management vendors. And with good reason—manual tasks are labor intensive and prone to error. When actions can be automated, they should be (IT process or run-book automation solutions do just that). These solutions automate many of the discrete tasks associated with running the virtual data center. However, they do not solve for the complex decision-making requirements that most IT operators face in maintaining the environment.

In reality, these solutions are well suited to automation where the individual steps in the process can be very clearly defined, programmed, and executed in a workflow engine. Unfortunately, determining the actions required to maximize performance and efficiency across the virtualized infrastructure is not an easy task, as each workload has its own personality and consumes resources differently from its neighbors. This means that very different results may be achieved depending on how workloads are combined on different server and storage resources, and based on how physical or virtual resources are sized.

Decision automation requires a deeper level of understanding beyond just how to procedurally execute a set of tasks. To effectively ensure performance, the solution must be capable of determining what tasks to carry out.


In effect, the process automation itself is the easy part. To solve the workload performance management challenge, a decision analysis engine must determine and prescribe resource allocations and workload configurations based on the assessment of multiple criteria on an ongoing basis. This includes individual workload demand and patterns, the capacity of allocated physical and virtual resources, and the environmental and business constraints that impact which decisions can actually be taken, all with a full understanding of the systematic effect of executing those decisions across the environment. Once the actions have been identified, the automation capabilities are readily available in the virtualization layer via APIs or through comprehensive run-book automation solutions.

Continuously ensuring workload performance while maximizing the utilization of the underlying infrastructure is a complex problem to solve. It requires a highly sophisticated decision analysis engine, a holistic view of the environment built on an abstraction layer that reduces complexity, and a top-down understanding of the control points in the virtualized infrastructure so that the right actions can be taken.

CONCLUSION

At VMTurbo, our operations management solution focuses specifically on applying this new approach for planning, onboarding, and controlling virtualized data centers. By automating the decision-making process in software, VMTurbo Operations Manager maximizes utilization of the physical infrastructure, ensures critical applications have the resources they require, and reduces the operational costs of running virtual data centers. To do it, the product employs an economic abstraction of the IT infrastructure and uses a market-based approach driven by pricing principles to derive the specific actions that tune the environment for optimal performance and utilization.

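The paper does not spell out the pricing function itself, so as a rough intuition only, here is a toy version of such a market: each resource prices itself by scarcity, and a workload "shops" for the cheapest adequate supplier, so congested components price themselves out of consideration. The formula is purely illustrative, not VMTurbo's published algorithm.

# Toy market abstraction: price rises steeply as a resource nears saturation,
# so workloads naturally flow toward under-utilized suppliers. The pricing
# function below is illustrative only, not VMTurbo's actual algorithm.
def price(used, capacity):
    utilization = min(used / capacity, 0.999)
    return 1.0 / (1.0 - utilization) ** 2  # scarcity pricing: steep near 100%

def cheapest_supplier(demand, hosts):
    """Each host quotes a price for hosting `demand`; the buyer takes the
    lowest quote, which is always the least-contended feasible host."""
    quotes = {name: price(used + demand, cap)
              for name, (used, cap) in hosts.items() if used + demand <= cap}
    return min(quotes, key=quotes.get) if quotes else None

hosts = {"esx-01": (90.0, 100.0), "esx-02": (40.0, 100.0)}
print(cheapest_supplier(10.0, hosts))  # -> "esx-02", the less-contended host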
VMTurbo is the only vendor that provides a closed-loop management system capable of holistically assuring workload QoS while maximizing infrastructure efficiency. Our solution continuously identifies inefficiencies, resource contention and bottlenecks in the system and is able to determine—and automate—the necessary actions that control the environment in the optimal operating zone. It changes the economics of managing virtualized data centers and delivers operational savings and productivity gains across the organization. And it is a better approach to IT operations management in today's virtualized data center.

www.vmturbo.com