SIZING-UP IT OPERATIONS MANAGEMENT: THREE KEY POINTS OF COMPARISON
MOVING BEYOND MONITORING, ALERTING AND REPORTING TO DELIVER AN AUTOMATED CONTROL SYSTEM FOR TODAY’S VIRTUALIZED DATA CENTERS
© 2012 VMTurbo, Inc. All Rights Reserved.
VMT-CONTROL-1212
VMTurbo is a company founded on the belief that IT operations management needs to be fundamentally changed to allow companies to unlock the full value of virtualized infrastructure and cloud services. The approach must be shifted from bottom-up to top-down, and the emphasis must be transitioned from manual intervention to prescriptive analytics that identify the necessary actions to control the environment in an optimal state: one where applications get the resources they require to meet business goals while making the most efficient use of network, storage and compute resources. Virtualization is a key enabler, but existing management tools fail to embrace and leverage this new architecture effectively.
With traditional approaches, too much time is spent collecting thousands of data points, triggering alerts based on perceived anomalies, and leaving IT operators with the difficult task of figuring out what to do to return the system to acceptable performance. These approaches, by definition, lead to unpredictable service, are OPEX-intensive, and become increasingly complex as shared-everything environments are constantly changing due to workload fluctuations, virtual machine movement, and all the self-service benefits that accompany virtual infrastructure.
To do it properly, the operations management framework must have three important characteristics in its design:
1. The approach must be oriented on a top-down view of the infrastructure with a layer of abstraction allowing the environment to be modeled in rapid fashion and removing the data collection burden.
2. The model must be capable of understanding the entire environment and all the constraints and interdependencies that exist within it.
3. The analytic engine must be built with the goal of prescribing specific actions that will maintain the infrastructure in the desired state based on business rules, in effect automating the decision-making process for IT operations.
RETHINKING THE APPROACH TO IT OPERATIONS MANAGEMENT
Conceptually, virtualization provides an incredible opportunity to change the way IT is managed through the ability to use software controls to dynamically change resource allocation and workload configurations for applications operating across a shared physical infrastructure. Examples of decisions that can be executed through these software controls include altering workload placement on servers and storage infrastructure; changing virtual machine resource limits and reservations; starting and stopping virtual machines; or cloning application instances. In effect, virtualization enables the environment to be horizontally scaled (to meet specific workload demand) and vertically scaled (to reallocate resources based on existing workload requirements) in rapid fashion.

These controls provide an agile way for IT operators to optimize resource usage or address performance bottlenecks reported by monitoring systems. However, because of the shared nature of virtualized infrastructure and the dynamic fluctuation of workloads, decisions regarding how to orchestrate these controls need to be taken with great care to prevent any action from impacting performance and efficiency of other IT services. For example, re-configuring the placement of virtual machines on servers in a cluster or re-sizing a virtual machine on a host might solve a specific resource bottleneck, but cause other resource constraints across the environment. This knock-on effect can impact applications or workloads that are more latency-sensitive or critical to the business.

The number of variables, constraints, and dependencies that must be considered and modeled to make effective decisions is immense. When these challenges are combined with the dynamic nature of the virtual infrastructure itself, it creates a problem that is exponentially complex in nature. And, as the environment grows beyond a handful of physical hosts, it becomes impossible to compute before variations in the data occur.

Fundamentally, the traditional approach of collecting data, generating alerts and manually troubleshooting cannot scale to meet the requirements of this new architecture. IT operators struggle to keep pace and the collection mechanism itself creates a significant tax on the overall system. Moreover, this approach is not designed to leverage the inherent fluidity in virtual data centers to control the environment based on workload demands, resource capacity, and configuration and business constraints.

Key benefits to a new approach:
• Provides a more stable and consistent user experience by assuring the quality of service of virtualized applications
• Lowers operational costs by reducing the number of problems and incidents IT must handle
• Improves the ROI of compute and storage assets by driving higher levels of utilization across the environment
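
The scale of the decision space described in this section can be illustrated with simple arithmetic. The sketch below is not taken from any product; it only counts the ways to assign each virtual machine to exactly one host, under the simplifying assumption that no capacity, affinity or storage constraints apply. The function name and the sample sizes are hypothetical.

```python
def placement_count(num_vms: int, num_hosts: int) -> int:
    """Count unconstrained placements: each VM independently picks one host."""
    return num_hosts ** num_vms


# Illustrative (made-up) environment sizes.
for vms, hosts in [(10, 2), (50, 5), (200, 10)]:
    count = placement_count(vms, hosts)
    # Report only the order of magnitude; the exact figure is not the point.
    print(f"{vms} VMs on {hosts} hosts: roughly 10^{len(str(count)) - 1} placements")
```

Real constraints remove many of these placements, but sizing, storage and priority choices add further dimensions, so exhaustive evaluation is still impractical beyond small environments.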
When determining a strategy for performance assurance of virtualized services it is important to draw the distinction between vendors based on several key differences in how the solutions approach solving this complex set of challenges.

A BOTTOM-UP COLLECTION MODEL VS. A TOP-DOWN ABSTRACTION MODEL

Solutions with a heritage in visibility and alerting often incorporate data analysis engines that focus on examining thousands of performance metrics to identify abnormal patterns in the data and infer impact or potential impact to service performance. In many cases these analytic engines focus on thresholds and correlate events to identify anomalies based on learned behavior. This is problematic as threshold-driven events and learned behavior can be misleading if the environment is irregular, changes frequently, or, as is often the case, is not configured optimally.

More importantly, these approaches are bottom-up methods that are not designed with the goal of determining the actions required to systematically control resource allocation and workload performance. Because they focus exclusively on the myriad of individual metrics at the infrastructure layer, they lack the necessary understanding of topological relationships and dependencies that are required to effectively drive intelligent decisions (and actions) across the IT environment that result in maintaining the health of the infrastructure. At best, they present operators with huge amounts of event data and require them to drill into it with the hope of determining what actions are required to address the anomaly.

A better approach is a top-down one that understands the control points that can be leveraged to tune the environment and uses only the data it needs to prescribe the necessary actions to maintain the system in the optimal operating state. Doing this properly requires a layer of abstraction across the environment through which the analytic model can be run to determine the right actions based on business rules and system interdependencies. This solves for the “data collection at scale” issue that can manifest itself in larger environments and ensures that the analysis engine is designed to prescribe actions with a full understanding of the topological relationships in the infrastructure. By focusing specifically on prescriptive analytics, this type of solution approaches operations management with the goal of preventing performance constraints based on service level priorities and determining the specific actions that will allocate resources appropriately.

ELEMENT- VS. ENVIRONMENT-CENTRIC RESOURCE OPTIMIZATION

Resource optimization is a key benefit marketed by vendors across the IT operations management landscape. However, it is important to properly assess what each vendor is actually delivering in this regard. Does the solution focus on individual metrics at the component level and “optimize” based on a narrow view of each element? Or is the solution more comprehensive in nature, understanding the constraints and interdependencies across the environment?
A decision analysis engine takes a top-down approach. It understands the control points that can be leveraged to tune the environment and uses only the data it needs to prescribe the necessary actions to maintain the system in the optimal operating state.
Element-centric optimization is fairly straightforward in that it focuses on specific requirements and constraints on an individual metric basis for a given workload or physical resource. The most common application of this in virtual environments is for virtual machine rightsizing. For example, it is possible to look at an individual virtual machine and conclude that the allocated vMem, vCPU or vDisk should be increased due to usage exceeding a threshold.

However, taking these actions could create larger issues in the environment if they are not considered in the context of other workloads sharing those resources. Before increasing resource allocation, virtual machines may need to be moved to different hosts or data stores to create the headroom in the environment so the change does not impact the performance of other workloads. If there is simply not enough capacity in the environment to meet the increased demand, then physical resources might need to be added before allocating more virtual machine resources. Additionally, if no capacity is available and resources in the environment are constrained, understanding the service levels or business priority of this workload as compared to others in the system is required before addressing the need.

PROCESS VS. DECISION AUTOMATION

As with “resource optimization,” the term automation is used extensively in the marketing lexicon of all IT operations management vendors. And with good reason: manual tasks are labor intensive and prone to error. When actions can be automated, they should be (IT process or run-book automation solutions do just that). These solutions automate many of the discrete tasks associated with running the virtual data center. However, they do not solve for the complex decision-making requirements that most IT operators face in maintaining the environment.

In reality, these solutions are well suited to automation where the individual steps in the process can be very clearly defined, programmed, and executed in a workflow engine. Unfortunately, determining the actions required to maximize performance and efficiency across the virtualized infrastructure is not an easy task, as each workload has its own personality and consumes resources differently from its neighbors. This means that very different results may be achieved depending on how workloads are combined on different server and storage resources and based on how physical or virtual resources are sized.

Decision automation requires a deeper level of understanding beyond just how to procedurally execute a set of tasks. To effectively ensure performance, the solution must be capable of determining what tasks to carry out.
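
The difference between an element-centric check and an environment-aware decision can be sketched in a few lines. This is a minimal illustration and not VMTurbo's algorithm: the `Host` and `VM` classes, the action names, the 90% threshold and the priority scale are all hypothetical, only memory is modeled, and the sketch moves the requesting virtual machine itself rather than its neighbors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Host:
    name: str
    mem_capacity_gb: int
    mem_allocated_gb: int

    @property
    def free_gb(self) -> int:
        return self.mem_capacity_gb - self.mem_allocated_gb


@dataclass
class VM:
    name: str
    host: str
    mem_gb: int
    priority: int  # hypothetical scale: larger means more business-critical


Action = Tuple[str, str, object]


def needs_more_memory(usage_pct: float, threshold_pct: float = 90.0) -> bool:
    """Element-centric check: looks only at the VM's own usage."""
    return usage_pct > threshold_pct


def decide(vm: VM, extra_gb: int, hosts: Dict[str, Host], vms: List[VM]) -> List[Action]:
    """Environment-aware decision: returns the ordered actions to carry out."""
    new_size = vm.mem_gb + extra_gb
    # Case 1: the current host has headroom, so a plain resize is safe.
    if hosts[vm.host].free_gb >= extra_gb:
        return [("resize", vm.name, new_size)]
    # Case 2: another host can take the VM at its new size; move first, then resize.
    for host in hosts.values():
        if host.name != vm.host and host.free_gb >= new_size:
            return [("move", vm.name, host.name), ("resize", vm.name, new_size)]
    # Case 3: no headroom anywhere; weigh business priority, then add capacity.
    lower = sorted(o.name for o in vms if o.host == vm.host and o.priority < vm.priority)
    if lower:
        return [("review_priority", vm.name, lower), ("add_capacity", vm.host, extra_gb)]
    return [("add_capacity", vm.host, extra_gb)]


hosts = {"h1": Host("h1", 64, 60), "h2": Host("h2", 64, 20)}
app = VM("app", "h1", 16, 2)
db = VM("db", "h1", 44, 3)
print(decide(app, 8, hosts, [app, db]))
```

Executing any one of the returned actions is the process-automation part; choosing which actions to return, and in what order, is the decision-automation part.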
In effect, the process automation itself is the easy part. To solve the workload performance management challenge, a decision analysis engine must determine and prescribe resource allocations and workload configurations based on the assessment of multiple criteria on an ongoing basis. This includes individual workload demand and patterns, the capacity of allocated physical and virtual resources, the environmental and business constraints which impact what decisions can actually be taken, and with full understanding of the systematic effect of executing those decisions across the environment. Once the actions have been identified, the automation capabilities are readily available in the virtualization layer via APIs or through comprehensive run-book automation solutions.

Continuously ensuring workload performance while maximizing the utilization of the underlying infrastructure is a complex problem to solve. It requires a highly sophisticated decision analysis engine, a holistic view of the environment built on an abstraction layer that reduces complexity, and a top-down understanding of the control points in the virtualized infrastructure so that the right actions can be taken.

CONCLUSION

At VMTurbo, our operations management solution focuses specifically on applying this new approach for planning, onboarding, and controlling virtualized data centers. By automating the decision-making process in software, VMTurbo Operations Manager maximizes utilization of the physical infrastructure, ensures critical applications have the resources they require, and reduces the operational costs of running virtual data centers. To do it, the product employs an economic abstraction on the IT infrastructure and uses a market-based approach driven by pricing principles to derive the specific actions that tune the environment for optimal performance and utilization. VMTurbo is the only vendor that provides a closed-loop management system capable of holistically assuring workload QoS while maximizing infrastructure efficiency. Our solution continuously identifies inefficiencies, resource contention and bottlenecks in the system and is able to determine, and automate, the necessary actions that control the environment in the optimal operating zone. It changes the economics of managing virtualized data centers and delivers operational savings and productivity gains across the organization. And, it is a better approach to IT operations management in today’s virtualized data center.
VMTurbo, Inc. One Burlington Woods Drive Burlington, MA 01803 USA Phone: (781) 373-3540 www.vmturbo.com