Containers Expo Blog: Blog Post

The Problem with SLA Monitoring in Virtualized Environments

The time-keeping problem and how it impacts application performance

Because virtual machines work by time-sharing the host's physical hardware, a virtual machine cannot exactly duplicate the timing behaviour of a physical machine. This leads to the timekeeping problems described in the VMware white paper Timekeeping in Virtual Machines, which result in inaccurate time measurements within the virtual machine. The inaccuracy affects ALL performance metrics that rely on the operating system clock to keep track of time, including system counters such as CPU or I/O utilization. Performance management solutions therefore work with inaccurate metrics, which can lead to incorrect enforcement of SLAs or wrong assumptions about application performance.

This blog post explains the timekeeping problem, how it impacts Application Performance Management in virtualized environments, and what can be done to solve it.

Time keeping problem explained

Operating systems that use a tick-counting approach to keep track of time rely on hardware timer interrupts to count how many ticks have occurred since the system started. In a virtualized environment these interrupts are consumed by the virtualization infrastructure, which keeps track of what we call the “Real Time”. The interrupts are then forwarded to the hosted virtual machines, each of which keeps track of its own time, which we call the “Apparent Time”. In the best-case scenario, Real Time and Apparent Time are the same:

Timekeeping - Phase 1 - Real and Apparent Time are the same

A virtual machine is not “always on”: the virtual server deschedules it from time to time because it shares the hardware with other virtual machines. While it is descheduled, the virtual machine cannot handle hardware interrupts, so they are queued for later delivery.

Timekeeping - Phase 2 - Virtual Machine is descheduled

When the virtual machine is scheduled again, the operating system's Apparent Time is still the time at which it was descheduled, because it has not yet received the timer interrupts that occurred in the meantime. The Apparent Time has therefore drifted from the Real Time. Impact: any performance metric taken at this point reflects only the time the virtual machine believes has passed, not the time that really passed.

Over time, the virtual machine catches up by processing the interrupts it missed while it was descheduled.

Timekeeping - Phase 3 - Catching up with Time
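
The three phases can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not how any particular hypervisor works: the 10 ms tick period, the `simulate` function, and the rule that a running guest receives at most two ticks per period are all made up for illustration.

```python
TICK_MS = 10  # assumed timer period: one tick every 10 ms


def simulate(schedule, catch_up_per_slot=2):
    """Simulate a tick-counting guest clock.

    schedule: list of booleans, one per tick period; True means the
    virtual machine is running during that period.
    Returns a list of (real_ms, apparent_ms) pairs, one per period.
    """
    real_ticks = 0      # ticks seen by the virtualization layer (Real Time)
    apparent_ticks = 0  # ticks counted by the guest OS (Apparent Time)
    backlog = 0         # ticks queued but not yet delivered to the guest
    timeline = []
    for running in schedule:
        real_ticks += 1
        backlog += 1
        if running:
            # A running guest receives the current tick plus queued ones,
            # limited to catch_up_per_slot ticks per period (assumption).
            delivered = min(backlog, catch_up_per_slot)
            apparent_ticks += delivered
            backlog -= delivered
        timeline.append((real_ticks * TICK_MS, apparent_ticks * TICK_MS))
    return timeline


# Phase 1: running, Phase 2: descheduled for 4 periods, Phase 3: catching up
schedule = [True] * 3 + [False] * 4 + [True] * 6
timeline = simulate(schedule)
for real_ms, apparent_ms in timeline:
    drift_ms = real_ms - apparent_ms
    print(f"real={real_ms:4d} ms  apparent={apparent_ms:4d} ms  drift={drift_ms:3d} ms")
```

In this run the drift grows while the guest is descheduled and shrinks back to zero once the queued ticks have been delivered. Any measurement taken in between sees less elapsed time than really passed.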

Virtualization environments use several other techniques to bring the Apparent Time back to Real Time as quickly as possible. For more details, see the VMware white paper mentioned at the top of this post.

Impacts of the Time Keeping Problem

Any time-based metric captured from within the virtual machine is subject to the timekeeping problem. This includes CPU utilization, memory allocations per time interval, I/O accesses per time interval, and any custom time tracking, for example response time or transaction time monitoring. Operating system counters such as % CPU per process also run into another interesting problem: individual processes may be charged with time they never consumed. After a virtual machine is resumed, the queued timer interrupts are processed, and they arrive at a faster rate than normal. The currently active processes are charged with all of these timing events although they did no work during that time, because they were actually descheduled.
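
The mischarging effect can be shown with a toy model of tick-based CPU accounting. This is a deliberately simplified sketch: the process names, the `charge_ticks` helper, and the assumption that the whole backlog arrives within a single period are hypothetical.

```python
from collections import Counter


def charge_ticks(events):
    """Tick-based CPU accounting in its simplest form.

    events: list of (process_name, ticks_delivered) pairs, in order.
    Every delivered tick is charged to the process that is active
    when the tick arrives, whether or not it did any work.
    """
    charged = Counter()
    for process, ticks in events:
        charged[process] += ticks
    return charged


# Hypothetical workload: "web" and "batch" each really run for 3 tick periods.
# The VM is descheduled for 4 periods just before "batch" starts, so the
# 4 queued ticks arrive together with the first tick of "batch".
events = [
    ("web", 1), ("web", 1), ("web", 1),
    ("batch", 1 + 4),  # own tick plus the 4-tick backlog
    ("batch", 1), ("batch", 1),
]
charged = charge_ticks(events)
total = sum(charged.values())
for name in ("web", "batch"):
    share = 100 * charged[name] / total
    print(f"{name}: {charged[name]} ticks charged, {share:.0f}% of counted CPU time")
```

Both processes did the same amount of work, yet the counters attribute more than twice as much CPU time to the process that happened to be active when the backlog was delivered.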

You can see that the timekeeping issue can really mess up your performance counters. The more load there is on a virtual server, the more virtual machines must be scheduled and descheduled, and the greater the impact on accurate timing. Other factors, such as over-provisioning of CPU or memory, have an impact as well.

Using performance metrics from within the virtual machine for application performance management and SLA enforcement is therefore very questionable, as the results are neither accurate nor predictable.

Accurate Time Keeping with Pseudo Performance Counters

VMware is aware of this problem and explains its causes and effects in great detail in the white paper. For performance management tools, VMware provides a way to query the actual Real Time at any time from within the virtual machine: pseudo performance counters are exposed via virtual processor registers that any application within the virtual machine can access.

dynaTrace uses these counters for accurate time measurement when managing application performance in virtualized VMware environments. This allows accurate SLA enforcement and application performance management down to individual transactions or even methods. The following illustration shows a single captured transaction with accurate timings. dynaTrace captures both the Real Time and the Apparent Time at the method and transaction level:

Accurate Timing on Transaction and Method Level

Capturing the Real Time values and also showing the Apparent Time drift enables application performance management with accurate timing values. Accurate timings are the basis for accurate SLA enforcement in production as well as for application performance monitoring.
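
As a rough illustration of the idea, and not of dynaTrace's actual implementation, the sketch below times a unit of work against two clocks and reports the drift. The names `Timing`, `measure`, `read_real_ns`, and `read_apparent_ns` are hypothetical. In a real VMware guest the two readers would have to be backed by the pseudo performance counters described in the white paper; here they are replaced by scripted stand-in values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Timing:
    real_ns: int      # duration according to the Real Time source
    apparent_ns: int  # duration according to the guest's Apparent Time

    @property
    def drift_ns(self) -> int:
        """Time that really passed but that the guest clock did not see."""
        return self.real_ns - self.apparent_ns


def measure(work: Callable[[], None],
            read_real_ns: Callable[[], int],
            read_apparent_ns: Callable[[], int]) -> Timing:
    """Run work() and time it against both clocks."""
    real_start, apparent_start = read_real_ns(), read_apparent_ns()
    work()
    real_end, apparent_end = read_real_ns(), read_apparent_ns()
    return Timing(real_end - real_start, apparent_end - apparent_start)


# Stand-in clocks with scripted readings (assumption): 50 ms really pass,
# but the guest clock only advances by 20 ms during the transaction.
real_readings = iter([1_000_000, 51_000_000])
apparent_readings = iter([1_000_000, 21_000_000])

timing = measure(
    work=lambda: None,
    read_real_ns=lambda: next(real_readings),
    read_apparent_ns=lambda: next(apparent_readings),
)
print(f"real={timing.real_ns} ns apparent={timing.apparent_ns} ns drift={timing.drift_ns} ns")
```

Passing the clock readers in as parameters keeps the measurement logic independent of where the time values come from, so the same code can be exercised with scripted values outside a virtual machine.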

Is timekeeping a real issue in your environment?

The timekeeping problem is well known within the VMware community and poses challenges for accurate application performance management in virtualized environments. I am interested in your experience with this problem. Have you been aware of it? Do you live with the inaccuracy, or do you have other approaches to accurate measurement? Please share your thoughts on this topic.

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
