Welcome!

Containers Expo Blog Authors: Pat Romanski, Elizabeth White, Liz McMillan, XebiaLabs Blog, Stackify Blog

Related Topics: Containers Expo Blog

Containers Expo Blog: Blog Post

The Problem with SLA Monitoring in Virtualized Environments

The time-keeping problem and how it impacts application performance

Because virtual machines work by time-sharing host physical hardware, a virtual machine cannot exactly duplicate the timing behaviour of a physical machine. This leads to the timekeeping problems explained in the VMWare White Paper about Timekeeping in Virtual Machines that results in inaccurate time measurements within the virtual machine. This affects ALL performance metrics that rely on the operating system clock time to keep track of time which includes system counters like CPU or I/O Utilization. Performance Management solutions therefore run into the problem that the monitored metrics are inaccurate and can lead to incorrect enforcement of SLAs or wrong assumptions about application performance.

This blog explains the time keeping problem, how it impacts Application Performance Management in virtualized environments and what can be done to solve this problem.

Time keeping problem explained

Operating Systems that use a Tick Counting approach to keep track of time use hardware interrupts to count how many ticks have occurred since the system started. In a virtualized environment these interrupts are consumed by the virtualization infrastructure which keeps track of what we call the “Real Time”. The interrupts are then forwarded to the hosted virtual machines which itself keep track of the time that we call “Apparent Time“. In the best case scenario Real Time and Apparent Time are the same:

Timekeeping - Phase 1 - Real and Apparent Time are the same

Timekeeping - Phase 1 - Real and Apparent Time are the same

A virtual machine is not “always on” as it gets descheduled by the virtual server because of time-sharing with other virtual machines. In that time the hardware interrupts cannot be handled by the virtual machine and are therefore put into a queue for later consumption.

Timekeeping - Phase 2 - Virtual Machine is descheduled

Timekeeping - Phase 2 - Virtual Machine is descheduled

At the time the Virtual Machine gets scheduled again the operating system’s Apparent Time is still the time it was before it got descheduled as it has not yet received the timer interrupts that happened in the meantime. In that case the Apparent Time has drifted from the Real Time. Impact: Any performance metric taken at this time only shows the time that the Virtual Machine believes had passed which is not the time that really passed.

Over time the Virtual Machine catches up with the interrupts it missed while descheduled.

 

Timekeeping - Phase 3 - Catching up with Time
Timekeeping – Phase 3 – Catching up with Time

There are several other techniques that virtualization environments use to bring the Apparent Time back to Real Time as fast as possible. For more details have a look at the VMWare White Paper as mentioned on the top of this blog.

Impacts of the Time Keeping Problem

Any time based metrics captured from within the Virtual Machine are subject to the timekeeping problem including CPU Utilization, Memory Allocations per time interval, I/O access per time interval,… and any custom time tracking that can be used for e.g.: response time or transaction time monitoring. Operating system counters like % CPU per process also runs into another interesting problem where individual processes might get charged incorrectly with time that they never consumed. After a Virtual Machine is resumed the queued timer interrupts are processed. These interrupts come in a faster rate than normal. The currently active processes are charged with all these timing events although they have not done any work in that time because they were actually descheduled.

You can see that the timekeeping issue can really mess up your performance counters. The more load you have on a virtual server, the more virtual machines there are to schedule and de-schedule – the higher the impact on accurate timing will be. Other side effects like over-provisioning of CPU or Memory have an impact as well.

Using performance metrics from within the Virtual Machine for application performance management and enforcement of Service Level Agreements is therefore very questionable as the results are not accurate and not predictable.

Accurate Time Keeping with Pseudo Performance Counters

VMWare is aware of this problem and explains in great detail the reasons and the effects in their White Paper. As a solution for performance management solutions VMWare provides a way to query the actual Real Time at any time from within the Virtual Machine. Pseudo Performance Counters are made available via virtual processor registers that can be accessed from any application within the Virtual Machine.

dynaTrace is using these new counters for accurate time measurement when Managing Application Performance in virtualized VMWare environments. This allows accurate SLA enforcement and application performance management down to individual transactions or even methods. The following illustration shows a single captured transaction with accurate timings. dynaTrace captures the Real and the Apparent Time on method and transaction level:

 

Accurate Timing on Transaction and Method Level
Accurate Timing on Transaction and Method Level

Capturing the Real Time values and also showing the Apparent Time Drift enables Application Performance Management with accurate timing values. Accurate timings are the basics for accurate SLA Enforcement in Production as well as Application Performance Monitoring.

Is timekeeping a real issue in your environment?

The timekeeping problem is well known within the VMWare community and brings challenges to accurate application performance management in virtualized environments. I am interested in your experience with this problem. Have you been aware of it? Do you live with the inaccuracy or do you have other approaches for accurate measuring? Please share your thoughts on this topic.

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi

@ThingsExpo Stories
Detecting internal user threats in the Big Data eco-system is challenging and cumbersome. Many organizations monitor internal usage of the Big Data eco-system using a set of alerts. This is not a scalable process given the increase in the number of alerts with the accelerating growth in data volume and user base. Organizations are increasingly leveraging machine learning to monitor only those data elements that are sensitive and critical, autonomously establish monitoring policies, and to detect...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. Jack Norris reviews best practices to show how companies develop, deploy, and dynamically update these applications and how this data-first...
Intelligent Automation is now one of the key business imperatives for CIOs and CISOs impacting all areas of business today. In his session at 21st Cloud Expo, Brian Boeggeman, VP Alliances & Partnerships at Ayehu, will talk about how business value is created and delivered through intelligent automation to today’s enterprises. The open ecosystem platform approach toward Intelligent Automation that Ayehu delivers to the market is core to enabling the creation of the self-driving enterprise.
SYS-CON Events announced today that Grape Up will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company specializing in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the U.S. and Europe, Grape Up works with a variety of customers from emergi...
"We're a cybersecurity firm that specializes in engineering security solutions both at the software and hardware level. Security cannot be an after-the-fact afterthought, which is what it's become," stated Richard Blech, Chief Executive Officer at Secure Channels, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
Consumers increasingly expect their electronic "things" to be connected to smart phones, tablets and the Internet. When that thing happens to be a medical device, the risks and benefits of connectivity must be carefully weighed. Once the decision is made that connecting the device is beneficial, medical device manufacturers must design their products to maintain patient safety and prevent compromised personal health information in the face of cybersecurity threats. In his session at @ThingsExpo...
SYS-CON Events announced today that Massive Networks will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Massive Networks mission is simple. To help your business operate seamlessly with fast, reliable, and secure internet and network solutions. Improve your customer's experience with outstanding connections to your cloud.
Because IoT devices are deployed in mission-critical environments more than ever before, it’s increasingly imperative they be truly smart. IoT sensors simply stockpiling data isn’t useful. IoT must be artificially and naturally intelligent in order to provide more value In his session at @ThingsExpo, John Crupi, Vice President and Engineering System Architect at Greenwave Systems, will discuss how IoT artificial intelligence (AI) can be carried out via edge analytics and machine learning techn...
Everything run by electricity will eventually be connected to the Internet. Get ahead of the Internet of Things revolution and join Akvelon expert and IoT industry leader, Sergey Grebnov, in his session at @ThingsExpo, for an educational dive into the world of managing your home, workplace and all the devices they contain with the power of machine-based AI and intelligent Bot services for a completely streamlined experience.
When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be...
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big...
SYS-CON Events announced today that GrapeUp, the leading provider of rapid product development at the speed of business, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market acr...
In the enterprise today, connected IoT devices are everywhere – both inside and outside corporate environments. The need to identify, manage, control and secure a quickly growing web of connections and outside devices is making the already challenging task of security even more important, and onerous. In his session at @ThingsExpo, Rich Boyer, CISO and Chief Architect for Security at NTT i3, discussed new ways of thinking and the approaches needed to address the emerging challenges of security i...
SYS-CON Events announced today that CA Technologies has been named "Platinum Sponsor" of SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business - from apparel to energy - is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the applic...
In his opening keynote at 20th Cloud Expo, Michael Maximilien, Research Scientist, Architect, and Engineer at IBM, discussed the full potential of the cloud and social data requires artificial intelligence. By mixing Cloud Foundry and the rich set of Watson services, IBM's Bluemix is the best cloud operating system for enterprises today, providing rapid development and deployment of applications that can take advantage of the rich catalog of Watson services to help drive insights from the vast t...
There is only one world-class Cloud event on earth, and that is Cloud Expo – which returns to Silicon Valley for the 21st Cloud Expo at the Santa Clara Convention Center, October 31 - November 2, 2017. Every Global 2000 enterprise in the world is now integrating cloud computing in some form into its IT development and operations. Midsize and small businesses are also migrating to the cloud in increasing numbers. Companies are each developing their unique mix of cloud technologies and service...
WebRTC is great technology to build your own communication tools. It will be even more exciting experience it with advanced devices, such as a 360 Camera, 360 microphone, and a depth sensor camera. In his session at @ThingsExpo, Masashi Ganeko, a manager at INFOCOM Corporation, will introduce two experimental projects from his team and what they learned from them. "Shotoku Tamago" uses the robot audition software HARK to track speakers in 360 video of a remote party. "Virtual Teleport" uses a...
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
Recently, IoT seems emerging as a solution vehicle for data analytics on real-world scenarios from setting a room temperature setting to predicting a component failure of an aircraft. Compared with developing an application or deploying a cloud service, is an IoT solution unique? If so, how? How does a typical IoT solution architecture consist? And what are the essential components and how are they relevant to each other? How does the security play out? What are the best practices in formulating...
In his session at @ThingsExpo, Arvind Radhakrishnen discussed how IoT offers new business models in banking and financial services organizations with the capability to revolutionize products, payments, channels, business processes and asset management built on strong architectural foundation. The following topics were covered: How IoT stands to impact various business parameters including customer experience, cost and risk management within BFS organizations.