
Virtualization Authors: Elizabeth White, Pat Romanski, Liz McMillan, Maureen O'Gara, Dana Gardner

Monitoring in a Virtualized Environment

The power of virtualization lies in its dynamic nature

Virtualization Expo on Ulitzer

Monitoring is essential to ensure the availability, security and usability of IT infrastructure. However, as essential as monitoring is, it's never complete. There are always challenges in keeping pace with new innovations in infrastructure technologies. For example, with the rapid adoption of virtualization, IT organizations need new tools to monitor the additional complexity introduced by the technology.

A virtualized infrastructure introduces a new layer that needs to be monitored: the hypervisor. Previously, the applications, operating systems, and physical infrastructure, including storage and networking, were the primary objects of monitoring. With virtualization, however, the operating systems work with virtual resources made available to them by the hypervisor. While the hypervisor adds a new dimension to monitoring, the real challenge lies elsewhere.

The power of virtualization lies in its dynamic nature. Virtualization can pool the compute capacity of individual physical nodes into one large compute unit that all virtualized workloads can draw on. The reduced costs and increased efficiency that virtualization promises are a direct result of these dynamics, and it is precisely this dynamic nature that makes monitoring a virtualized infrastructure a real challenge.

To understand these challenges, it's important to know who is interested in monitoring and what they expect from their monitoring solutions.

System Administrators
An IT infrastructure has at least three specialized areas that are typically administered by different individuals or groups: virtual infrastructure (VI), storage, and networking, each with its own administrators. Virtualization brings interesting new interactions among these three domains. In a virtual environment it's often difficult to isolate a problem unless all three aspects are evaluated holistically. What these administrators share is the need for comprehensive reporting on the performance and health of these systems. Since their roles require them to ensure that the infrastructure meets its Service Level Agreements (SLAs), they expect the monitoring system to detect and alert them to any hardware or software failure, performance degradation, or capacity imbalance or overload.
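
As a minimal sketch of the detect-and-alert expectation described above, the following Python compares one round of collected samples against warning and critical thresholds that span the VI, storage, and network domains. The metric names (`host.cpu_pct`, `datastore.latency_ms`, `net.drop_pct`) and all threshold values are hypothetical placeholders; in practice they would be derived from the SLAs in force and from the metrics the collectors actually expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning/critical limits for one metric (hypothetical values, not SLA defaults)."""
    metric: str
    warning: float
    critical: float


def evaluate(sample, thresholds):
    """Return (metric, severity, value) for every metric that crosses a threshold.

    `sample` maps metric names to their latest values. A metric missing from
    the sample is reported as "unknown", because a silent collector is itself
    a monitoring failure.
    """
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            alerts.append((t.metric, "unknown", None))
        elif value >= t.critical:
            alerts.append((t.metric, "critical", value))
        elif value >= t.warning:
            alerts.append((t.metric, "warning", value))
    return alerts


# One evaluation covers all three administrative domains together.
thresholds = [
    Threshold("host.cpu_pct", warning=75.0, critical=90.0),
    Threshold("datastore.latency_ms", warning=20.0, critical=50.0),
    Threshold("net.drop_pct", warning=0.5, critical=2.0),
]
sample = {"host.cpu_pct": 93.0, "datastore.latency_ms": 22.5, "net.drop_pct": 0.1}

for metric, severity, value in evaluate(sample, thresholds):
    print(f"{severity.upper()}: {metric} = {value}")
```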

Application Owners
Many application owners are skeptical about installing and configuring applications in a virtualized environment. They are often unsure how their application will behave there, and this uncertainty is compounded by the fact that virtual machines hosting different application tiers, or even different applications, now share the same physical infrastructure and therefore interact and potentially conflict in ways that are difficult to anticipate. Application owners need to assure themselves that their end users are getting the performance they need to be productive and that response time and availability SLAs are being met. As a result, there is a greater need for application availability and response-time monitoring to detect issues as early as possible. In addition, the monitoring tools need to determine whether an observed problem is an application issue or an infrastructure issue and help assign responsibility to the right owner.

Capacity Management and Planning
In a dynamic virtualized environment, capacity management becomes more of a day-to-day activity than it is with physical infrastructure, where procurement delays limit flexibility. A virtualized environment can adapt readily to changing workloads, evolving applications, and the scheduling of maintenance windows. The analytics built on the collected data should give the capacity manager insight into the placement of virtual machines by examining historical patterns and current requirements. Similarly, capacity planners now need a global picture of the entire infrastructure rather than of the capacity assigned to individual applications. By looking at overall usage trends across the infrastructure, they need to know in advance which resource will become a constraint in the near future.
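
To make the idea of spotting a future constraint concrete, here is a minimal Python sketch that fits a straight-line trend to daily usage history and estimates how many days remain before each resource reaches its capacity. The resource names and numbers are invented for illustration, and a linear trend is itself an assumption; real usage often has seasonality that a simple least-squares fit will miss.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*x, where x = 0, 1, 2, ... (e.g. days)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def days_until_exhausted(ys, capacity):
    """Days after the last sample until the trend reaches capacity; None if usage is flat or falling."""
    a, b = fit_line(ys)
    if b <= 0:
        return None
    return max(0.0, (capacity - a) / b - (len(ys) - 1))


# Hypothetical history: (daily usage samples, total capacity), same units per resource.
history = {
    "datastore_gb": ([400, 410, 420, 430, 440], 500),
    "cluster_mem_gb": ([300, 302, 304, 306, 308], 512),
}

forecast = {name: days_until_exhausted(ys, cap) for name, (ys, cap) in history.items()}
growing = {name: days for name, days in forecast.items() if days is not None}
constraining = min(growing, key=growing.get) if growing else None
print(forecast, "-> constraining resource:", constraining)
```

Ranking resources by time-to-exhaustion, rather than by current utilization, is what lets a planner see which resource will bind first even if it is not the busiest one today.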

Facilities Management
Facilities managers need to understand power usage trends and to have predictive analysis of future consumption. They need recommendations on how to extract additional efficiency and cost reductions from the infrastructure. In addition, they would like a mechanism to consolidate workloads onto the minimum number of physical servers during off-peak hours.
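
The off-peak consolidation mechanism described above is, at its core, a bin-packing problem. The sketch below uses the first-fit decreasing heuristic, a common approximation rather than an optimal solver, and considers only a single dimension (CPU demand in GHz). The VM names, demands, and host capacity are hypothetical; a production placement engine would also weigh memory, affinity rules, and headroom.

```python
def consolidate(vm_demands, host_capacity):
    """First-fit decreasing: place each VM, largest first, on the first host with room.

    Returns one list of VM names per host that must stay powered on; any host
    not needed could be put in standby during the off-peak window.
    """
    spare = []      # remaining capacity of each powered-on host
    placement = []  # VM names on each host, same index as `spare`
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > host_capacity:
            raise ValueError(f"{vm} does not fit on any single host")
        for i, free in enumerate(spare):
            if demand <= free:
                spare[i] -= demand
                placement[i].append(vm)
                break
        else:
            # No existing host has room, so keep one more host powered on.
            spare.append(host_capacity - demand)
            placement.append([vm])
    return placement


# Hypothetical off-peak CPU demand (GHz) and usable capacity per host (GHz).
off_peak = {"web1": 2.0, "web2": 1.5, "db1": 6.0, "batch": 4.0, "mail": 1.0}
plan = consolidate(off_peak, host_capacity=10.0)
print(f"{len(plan)} host(s) needed: {plan}")
```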

IT Head
The IT head needs high-level reporting and analysis on resource usage, license audits, capacity planning, cost of ownership, chargeback, VI performance, trends and predictive analysis, security threats, compliance, ROI, and TCO, in addition to any perceived risks or threats. IT management needs these details to plan and set strategy.

IT Security Head
The security head needs to see reporting and analysis to identify threats and security violations with the help of security policy monitoring (configuration, patch management, VM sprawl, access control etc.), infrastructure security monitoring, and compliance monitoring.

Monitoring tools for physical infrastructure that collect key metrics and indicators from a variety of sources (excluding the hypervisor layer) already exist, but the analytics that convert those metrics into actionable information must now evolve to handle the dynamic nature of virtualized environments. They must provide a global picture of the overall infrastructure rather than a view of independent silos.

Now that we have discussed the various roles and their monitoring needs, the next step is to drill down into the entities and corresponding metrics that need to be monitored. Overall, virtual infrastructure monitoring must track a range of parameters, such as system performance, application performance, system health, resource inventory, resource configuration, resource capacity, resource utilization, and other important changes in a dynamic virtualized environment.

Virtualized environments consist of a number of physical and virtual entities, such as physical servers, virtual machine monitors (VMMs)/hypervisors, virtual machines (VMs), virtual disks, virtual networks, and the applications running inside the VMs. All of these elements are linked to one another through complex relationships. Data for analysis is captured by monitoring various attributes of these elements and their relationships. The key applications of this monitored data are as follows.
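
One way to capture these relationships is a dependency graph in which each edge records that one entity depends on another. The Python sketch below, built on a small hypothetical inventory, walks that graph to find every entity affected when a shared component such as a datastore, host, or virtual network fails. All entity names are invented; a real tool would populate the edges from each hypervisor's inventory data.

```python
from collections import defaultdict

# Hypothetical inventory: each pair reads "first depends on second".
DEPENDS_ON = [
    ("app-crm", "vm-app1"),
    ("vm-app1", "host-a"), ("vm-app1", "vdisk-1"), ("vm-app1", "vnet-prod"),
    ("vm-db1", "host-b"), ("vm-db1", "vdisk-2"), ("vm-db1", "vnet-prod"),
    ("vdisk-1", "datastore-1"), ("vdisk-2", "datastore-1"),
]


def build_dependents(edges):
    """Invert the edges: for each entity, the set of entities that depend on it directly."""
    dependents = defaultdict(set)
    for child, parent in edges:
        dependents[parent].add(child)
    return dependents


def impacted(entity, dependents):
    """Every entity that depends on `entity`, directly or transitively."""
    seen, stack = set(), [entity]
    while stack:
        current = stack.pop()
        for child in dependents.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


graph = build_dependents(DEPENDS_ON)
print(sorted(impacted("datastore-1", graph)))
print(sorted(impacted("host-a", graph)))
```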

Health Monitoring
Monitoring the health/status of the complete infrastructure requires monitoring of the physical server hardware status, hypervisor status, virtual machine status, physical and virtual network switches and routers, and storage systems.
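
A common way to present infrastructure-wide health is a "worst status wins" rollup across the monitored components. The sketch below assumes a simple three-level status scale and hypothetical component names; real products typically have richer states (for example, unknown or maintenance) that need their own ordering rules.

```python
from enum import IntEnum


class Health(IntEnum):
    """Ordered so that a larger value means a worse state."""
    OK = 0
    DEGRADED = 1
    FAILED = 2


def rollup(statuses):
    """Overall status is the worst component status; also return the components responsible."""
    worst = max(statuses.values(), default=Health.OK)
    culprits = sorted(name for name, s in statuses.items() if s == worst and s != Health.OK)
    return worst, culprits


# Hypothetical snapshot spanning hardware, hypervisor, VMs, network, and storage.
snapshot = {
    "host-a/hardware": Health.OK,
    "host-a/hypervisor": Health.OK,
    "vm-db1": Health.DEGRADED,
    "switch-core-1": Health.OK,
    "array-1/controller-b": Health.FAILED,
}
overall, culprits = rollup(snapshot)
print(overall.name, culprits)
```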

Performance Monitoring
Basic performance monitoring looks at CPU, memory, storage, and network performance metrics from the VM guest OS as well as from the hypervisor. These metrics are typically monitored even in non-virtualized environments. Virtualization-specific metrics cover entities introduced by particular virtualization technologies, for example, the cluster and datacenter concepts in VMware. The behavior of other virtualization features can also be measured, such as how frequently VM migrations occur or when availability features are engaged. Finally, there are specialized solutions built on virtualization, such as desktop virtualization (VDI). Monitoring these requires additional parameters from both the virtual machine and the hypervisor layer, for example, how quickly VMs are provisioned to a requesting end user.
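
As an example of turning a virtualization feature's behavior into a metric, the sketch below counts migrations per VM over a one-hour window and flags VMs that migrate unusually often, which can be a sign of placement thrashing. The event tuple layout, the event kind `"migrate"`, and the threshold of three migrations per hour are all hypothetical; actual event names depend on each hypervisor's event model.

```python
from collections import Counter
from datetime import datetime, timedelta


def migrations_per_vm(events, window_end, window=timedelta(hours=1)):
    """Count "migrate" events per VM with timestamps in (window_end - window, window_end]."""
    start = window_end - window
    return Counter(vm for ts, vm, kind in events if kind == "migrate" and start < ts <= window_end)


# Hypothetical event stream: (timestamp, vm, event kind).
events = [
    (datetime(2010, 1, 1, 10, 5), "vm-web1", "migrate"),
    (datetime(2010, 1, 1, 10, 20), "vm-web1", "migrate"),
    (datetime(2010, 1, 1, 10, 40), "vm-web1", "migrate"),
    (datetime(2010, 1, 1, 10, 55), "vm-web1", "migrate"),
    (datetime(2010, 1, 1, 10, 30), "vm-db1", "migrate"),
    (datetime(2010, 1, 1, 10, 45), "vm-db1", "power_on"),
    (datetime(2010, 1, 1, 8, 0), "vm-db1", "migrate"),  # outside the window
]

counts = migrations_per_vm(events, window_end=datetime(2010, 1, 1, 11, 0))
MAX_PER_HOUR = 3  # hypothetical alerting threshold
flagged = sorted(vm for vm, n in counts.items() if n > MAX_PER_HOUR)
print(dict(counts), "flagged:", flagged)
```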

Capacity Monitoring
Today's organizations are truly dynamic, and their resource utilization and requirements are continuously evolving. Servers, desktops, network, and storage therefore require continuous planning, which in turn demands periodic audits of physical as well as virtual resources. Capacity monitoring requires continuous, end-to-end tracking of the following key metrics:

  • Server utilization: Peak/average server resource utilization (CPU, memory, and other resources), server bottlenecks, and correlation with the number of users/VMs.
  • Memory usage: Memory utilization on each server, capacity bottlenecks, and relationship with the number of users/VMs.
  • Network usage: Peak/average network utilization, capacity/bandwidth bottlenecks, and relationship with the number of users/VMs.
  • Storage utilization: Overall storage capacity metrics, VM/virtual disk utilization, I/O performance metrics, snapshot monitoring, and correlation with the number of users/VMs.
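
The peak/average and "correlation with the number of users/VMs" measures in the list above reduce to a few lines of arithmetic. The Python sketch below computes them for one host from hypothetical hourly samples, using the Pearson correlation coefficient and a least-squares slope (utilization added per additional VM); the choice of these two statistics is an assumption, not a prescribed method.

```python
import math
from statistics import fmean


def peak_and_average(samples):
    """Peak and mean of a utilization series."""
    return max(samples), fmean(samples)


def relationship(vm_counts, utilization):
    """Pearson r between VM count and utilization, plus the least-squares slope (utilization per VM)."""
    mx, my = fmean(vm_counts), fmean(utilization)
    cov = sum((x - mx) * (y - my) for x, y in zip(vm_counts, utilization))
    var_x = sum((x - mx) ** 2 for x in vm_counts)
    var_y = sum((y - my) ** 2 for y in utilization)
    if var_x == 0 or var_y == 0:
        return 0.0, 0.0  # no variation: treat as no measurable relationship
    return cov / math.sqrt(var_x * var_y), cov / var_x


# Hypothetical hourly samples for one host.
vm_counts = [10, 12, 14, 16, 18]
cpu_pct = [40, 46, 52, 58, 64]

peak, average = peak_and_average(cpu_pct)
r, pct_per_vm = relationship(vm_counts, cpu_pct)
print(f"peak={peak}% average={average:.1f}% r={r:.2f} slope={pct_per_vm:.1f}% per VM")
```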

Security and Compliance Monitoring
Virtualization introduces a new set of security risks due to VM sprawl and new threat targets: the hypervisor layer, VI configurations, and potential conflicts in the way access control is managed and policies are applied. IT security and compliance monitoring becomes critical for securing the virtualized environment. Security and compliance monitoring requires end-to-end VI activity monitoring for:

  • VM sprawl: Metrics that track VM activity as VMs are cloned, copied, migrated with VMotion within the VI, moved to a different network, or moved to different storage media.
  • Configuration metrics: Virtual server configuration monitoring to ensure that servers comply with standards and hardening guidelines, and VM configuration monitoring to enforce software licensing policy. VI events that help enforce IT policy or detect violations of it, covering both individual and organizational security policies.
  • Access control: Access control monitoring and reports for role-based access control enforcement.
  • Compliance monitoring: Metrics to validate/audit IT for standards such as HIPAA, SOX, and GLBA.
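
Two of the checks above, configuration compliance and VM sprawl, can be sketched as simple comparisons against a desired state. In the Python below, the baseline setting names and values are hypothetical stand-ins for whatever a vendor hardening guide or internal policy specifies, and the "approved" inventory stands in for a configuration management database; neither reflects any real product's schema.

```python
# Hypothetical hardening baseline: setting name -> required value.
BASELINE = {
    "clipboard_copy_disabled": True,
    "usb_controller_present": False,
    "max_console_sessions": 1,
}


def compliance_violations(vm_configs, baseline):
    """Map each non-compliant VM to {setting: (actual, expected)}; missing settings show as None."""
    report = {}
    for vm, config in vm_configs.items():
        diffs = {
            key: (config.get(key), expected)
            for key, expected in baseline.items()
            if config.get(key) != expected
        }
        if diffs:
            report[vm] = diffs
    return report


def sprawl(observed_vms, approved_vms):
    """VMs running in the VI that are not in the approved inventory."""
    return sorted(set(observed_vms) - set(approved_vms))


configs = {
    "vm-web1": {"clipboard_copy_disabled": True, "usb_controller_present": False, "max_console_sessions": 1},
    "vm-dev7": {"clipboard_copy_disabled": False, "usb_controller_present": False},
}
print(compliance_violations(configs, BASELINE))
print(sprawl(["vm-web1", "vm-dev7", "vm-test-copy3"], approved_vms=["vm-web1", "vm-dev7"]))
```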

Monitoring for Chargebacks
In a virtualized environment, where the infrastructure is centralized, it's important to measure resource usage by different business units, groups, and users. This information can be used to distribute or amortize, and in some cases recover, costs correctly across the organization through a proper chargeback mechanism. Chargebacks can be based on dynamic parameters such as resource usage, on fixed parameters, or on both. To compute correct chargeback information in a dynamic virtualized environment, it's important to monitor virtual as well as physical resource usage and allocations and to normalize them across the infrastructure. Chargeback monitoring requires end-to-end VI activity monitoring for:

  • Standard metrics: All chargeable resource metrics, such as CPU, memory, storage, and network usage.
  • Key VI events: VI events marking the virtual resource life cycle, such as the start and end dates of VM creation and allocation.
  • Configuration monitoring: VM configuration in terms of assigned resources and reservations, as well as installed applications, to account for software licensing costs.
  • VM usage metrics: VM uptime, number of VMs, and similar measures; which ones matter depends on the charging model the organization employs.
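
A usage-based chargeback of the kind described above can be sketched as the product of rate and usage, summed per business unit, plus a fixed per-VM fee prorated by uptime. Everything in the Python below is an assumption for illustration: the resource units, the rates, the 720-hour billing month, and the record layout. The sketch also assumes usage has already been normalized upstream (for example, CPU expressed in comparable units across hosts with different processors).

```python
from collections import defaultdict

# Hypothetical per-unit rates for chargeable resources.
RATES = {"vcpu_hours": 0.05, "mem_gb_hours": 0.01, "storage_gb_months": 0.10, "net_gb": 0.02}
HOURS_PER_MONTH = 720          # assumed 30-day billing month
FIXED_FEE_PER_VM_MONTH = 10.0  # hypothetical fixed component, prorated by uptime


def chargeback(records, rates=RATES):
    """Total monthly charge per business unit from per-VM usage records."""
    totals = defaultdict(float)
    for rec in records:
        variable = sum(rec["usage"].get(resource, 0.0) * rate for resource, rate in rates.items())
        fixed = FIXED_FEE_PER_VM_MONTH * rec["uptime_hours"] / HOURS_PER_MONTH
        totals[rec["business_unit"]] += variable + fixed
    return dict(totals)


records = [
    {"business_unit": "finance", "vm": "vm-a", "uptime_hours": 720,
     "usage": {"vcpu_hours": 720, "mem_gb_hours": 1440, "storage_gb_months": 50, "net_gb": 100}},
    {"business_unit": "finance", "vm": "vm-b", "uptime_hours": 360,
     "usage": {"vcpu_hours": 720, "mem_gb_hours": 720, "storage_gb_months": 20}},
    {"business_unit": "marketing", "vm": "vm-c", "uptime_hours": 720,
     "usage": {"vcpu_hours": 1440, "mem_gb_hours": 2880, "storage_gb_months": 100, "net_gb": 50}},
]
for unit, amount in sorted(chargeback(records).items()):
    print(f"{unit}: {amount:.2f}")
```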

Application Monitoring
Application monitoring is especially important in a virtualized environment, because an application may have problems even when the VM or physical server on which it runs looks perfectly normal. Application monitoring tracks the basic health of application servers using application-specific response time and throughput metrics. The analytics on this data should be able to correlate application-observed metrics with all layers of the infrastructure to perform root-cause analysis when something goes wrong. Application performance monitoring based on captured network traffic is an interesting development in this area.
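
A first step toward correlating application-observed metrics with the infrastructure layers is to rank layer metrics by how closely they track the application's response time. The Python sketch below does this with the Pearson correlation coefficient over hypothetical, time-aligned samples. Correlation only points at suspects; it does not prove causation, and the metric names are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def rank_suspects(response_ms, layer_metrics):
    """Layer metrics sorted from most to least correlated with application response time."""
    scores = {name: pearson(response_ms, series) for name, series in layer_metrics.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical samples taken at the same five intervals.
response_ms = [120, 125, 300, 310, 130]
layers = {
    "vm.cpu_pct": [35, 36, 37, 35, 36],
    "host.cpu_ready_pct": [2, 2, 3, 2, 2],
    "datastore.latency_ms": [5, 6, 40, 42, 6],
    "net.retransmits": [0, 0, 0, 0, 0],
}
for name, r in rank_suspects(response_ms, layers):
    print(f"{name:22s} r={r:+.2f}")
```

Here the VM itself looks unremarkable while storage latency tracks the slowdown closely, which is exactly the case the paragraph above warns about.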

A few other aspects of VI monitoring add to the complexity of building a comprehensive monitoring solution. All hypervisors provide APIs for collecting metrics, but each hypervisor has its own object model and APIs, and there are wide differences in features and even in the behavior of common features. Therefore, the analytics built on the collected metrics must be developed on a per-hypervisor basis. The virtualization management APIs standardized by the DMTF are not yet available on most of these platforms.
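
One way to cope with per-hypervisor object models is a thin adapter per platform in front of a shared, normalized schema, so that at least the lower layers of the analytics can consume one format. The Python sketch below illustrates that adapter pattern with two hypothetical collectors, `VendorACollector` and `VendorBCollector`. Their raw field names and units are invented and do not correspond to any real hypervisor API, and the stubbed `raw_samples` methods stand in for actual API calls.

```python
from abc import ABC, abstractmethod


class HypervisorCollector(ABC):
    """Adapter interface: each platform fetches raw samples and maps them to one schema."""

    @abstractmethod
    def raw_samples(self):
        """Fetch platform-specific samples (stubbed here; a real adapter would call the platform API)."""

    @abstractmethod
    def normalize(self, raw):
        """Convert one raw sample to {"vm": str, "cpu_pct": float, "mem_mb": float}."""

    def collect(self):
        return [self.normalize(raw) for raw in self.raw_samples()]


class VendorACollector(HypervisorCollector):
    """Hypothetical platform reporting CPU in MHz and memory in KB."""

    def raw_samples(self):
        return [{"name": "vm1", "cpuUsageMhz": 1200, "hostMhz": 2400, "memKB": 2097152}]

    def normalize(self, raw):
        return {
            "vm": raw["name"],
            "cpu_pct": 100.0 * raw["cpuUsageMhz"] / raw["hostMhz"],
            "mem_mb": raw["memKB"] / 1024,
        }


class VendorBCollector(HypervisorCollector):
    """Hypothetical platform reporting CPU as a percentage and memory in bytes."""

    def raw_samples(self):
        return [{"id": "vm2", "cpu_percent": 35.0, "memory_bytes": 1073741824}]

    def normalize(self, raw):
        return {"vm": raw["id"], "cpu_pct": raw["cpu_percent"], "mem_mb": raw["memory_bytes"] / (1024 * 1024)}


samples = [s for c in (VendorACollector(), VendorBCollector()) for s in c.collect()]
print(samples)
```

Normalizing units this way does not remove the behavioral differences the paragraph describes; analytics that depend on platform-specific semantics would still need per-hypervisor logic.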

Conclusion
When we speak with the software providers we assist in developing monitoring applications for virtual infrastructure, a recurring requirement is a library that supports collecting metrics from a broad spectrum of hypervisors. The analysis of these metrics, however, is not well understood. In a virtual infrastructure, the configuration database is far less static than in a physical one and must be kept up to date with every change. Forecasting based on this analysis is a challenge because it requires a clear understanding of which parameters remain stable when VM placement changes. Virtualization also opens up opportunities to automate responses to system issues and problems in ways that were not possible with physical systems. Monitoring solutions therefore have every reason to evolve so that they can take corrective action automatically.
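
As an illustration of monitoring that acts on its own findings, the Python sketch below turns host utilization into a dry-run list of VM migrations that would bring overloaded hosts back under a utilization limit. The greedy rule (move the smallest VMs first, each to the currently least-loaded host, provided that host stays under the limit), the 80% limit, and all names and numbers are assumptions; executing such a plan would go through the platform's own migration API, which is not shown.

```python
def plan_rebalance(capacity, placement, demand, limit=0.8):
    """Greedy dry-run plan of (vm, source, destination) moves.

    capacity:  host -> capacity (e.g. GHz)
    placement: vm -> current host
    demand:    vm -> current demand (same units as capacity)
    limit:     target maximum utilization as a fraction of capacity
    """
    placement = dict(placement)  # work on a copy; the caller's mapping is unchanged
    load = {host: 0.0 for host in capacity}
    for vm, host in placement.items():
        load[host] += demand[vm]

    def util(host):
        return load[host] / capacity[host]

    actions = []
    for src in sorted(capacity, key=util, reverse=True):
        # Try the smallest VMs first: they are cheapest to migrate.
        for vm in sorted((v for v, h in placement.items() if h == src), key=demand.get):
            if util(src) <= limit:
                break
            others = [h for h in capacity if h != src]
            if not others:
                break  # nowhere to move to
            dst = min(others, key=util)
            if (load[dst] + demand[vm]) / capacity[dst] > limit:
                continue  # even the least-loaded host would be pushed over the limit
            load[src] -= demand[vm]
            load[dst] += demand[vm]
            placement[vm] = dst
            actions.append((vm, src, dst))
    return actions


capacity = {"host-a": 10.0, "host-b": 10.0, "host-c": 10.0}
placement = {"vm1": "host-a", "vm2": "host-a", "vm3": "host-a", "vm4": "host-b", "vm5": "host-c"}
demand = {"vm1": 5.0, "vm2": 3.0, "vm3": 1.5, "vm4": 4.0, "vm5": 2.0}
print(plan_rebalance(capacity, placement, demand))
```

Producing a plan rather than acting immediately keeps a human approval step available, which may be a sensible first stage before enabling fully automatic corrective action.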

More Stories By Jayant Walvekar

Jayant Walvekar is Associate Vice President, Practice Head for Virtualization, at Persistent Systems. As the Associate Vice President at Persistent, he is responsible for executing projects in the virtualization area, building the virtualization practice, and providing consulting and engineering services to software companies (ISVs) to enhance their products to adopt virtualization platforms. Prior to Persistent, he worked for Cognos Incorporated, Canada, and Infosys, Bangalore. Jayant earned his Bachelor’s in Computer Engineering from Walchand College of Engineering, Sangli.

More Stories By Midhun Chandran

Midhun Chandran is an Architect, Virtualization Practice, at Persistent Systems. He works with software companies building management, monitoring and automation products for the virtualized environment. He has over 10 years of experience in the software industry and a strong background in performance engineering of scalable software applications. Midhun has an MS in Software Systems from BITS, Pilani.
