Welcome!

Containers Expo Blog Authors: Carmen Gonzalez, Sematext Blog, Elizabeth White, Pat Romanski, Stackify Blog

Related Topics: Containers Expo Blog

Containers Expo Blog: Blog Post

The Problem with SLA Monitoring in Virtualized Environments

The time-keeping problem and how it impacts application performance

Because virtual machines work by time-sharing host physical hardware, a virtual machine cannot exactly duplicate the timing behaviour of a physical machine. This leads to the timekeeping problems explained in the VMWare White Paper about Timekeeping in Virtual Machines that results in inaccurate time measurements within the virtual machine. This affects ALL performance metrics that rely on the operating system clock time to keep track of time which includes system counters like CPU or I/O Utilization. Performance Management solutions therefore run into the problem that the monitored metrics are inaccurate and can lead to incorrect enforcement of SLAs or wrong assumptions about application performance.

This blog explains the time keeping problem, how it impacts Application Performance Management in virtualized environments and what can be done to solve this problem.

Time keeping problem explained

Operating Systems that use a Tick Counting approach to keep track of time use hardware interrupts to count how many ticks have occurred since the system started. In a virtualized environment these interrupts are consumed by the virtualization infrastructure which keeps track of what we call the “Real Time”. The interrupts are then forwarded to the hosted virtual machines which itself keep track of the time that we call “Apparent Time“. In the best case scenario Real Time and Apparent Time are the same:

Timekeeping - Phase 1 - Real and Apparent Time are the same

Timekeeping - Phase 1 - Real and Apparent Time are the same

A virtual machine is not “always on” as it gets descheduled by the virtual server because of time-sharing with other virtual machines. In that time the hardware interrupts cannot be handled by the virtual machine and are therefore put into a queue for later consumption.

Timekeeping - Phase 2 - Virtual Machine is descheduled

Timekeeping - Phase 2 - Virtual Machine is descheduled

At the time the Virtual Machine gets scheduled again the operating system’s Apparent Time is still the time it was before it got descheduled as it has not yet received the timer interrupts that happened in the meantime. In that case the Apparent Time has drifted from the Real Time. Impact: Any performance metric taken at this time only shows the time that the Virtual Machine believes had passed which is not the time that really passed.

Over time the Virtual Machine catches up with the interrupts it missed while descheduled.

 

Timekeeping - Phase 3 - Catching up with Time
Timekeeping – Phase 3 – Catching up with Time

There are several other techniques that virtualization environments use to bring the Apparent Time back to Real Time as fast as possible. For more details have a look at the VMWare White Paper as mentioned on the top of this blog.

Impacts of the Time Keeping Problem

Any time based metrics captured from within the Virtual Machine are subject to the timekeeping problem including CPU Utilization, Memory Allocations per time interval, I/O access per time interval,… and any custom time tracking that can be used for e.g.: response time or transaction time monitoring. Operating system counters like % CPU per process also runs into another interesting problem where individual processes might get charged incorrectly with time that they never consumed. After a Virtual Machine is resumed the queued timer interrupts are processed. These interrupts come in a faster rate than normal. The currently active processes are charged with all these timing events although they have not done any work in that time because they were actually descheduled.

You can see that the timekeeping issue can really mess up your performance counters. The more load you have on a virtual server, the more virtual machines there are to schedule and de-schedule – the higher the impact on accurate timing will be. Other side effects like over-provisioning of CPU or Memory have an impact as well.

Using performance metrics from within the Virtual Machine for application performance management and enforcement of Service Level Agreements is therefore very questionable as the results are not accurate and not predictable.

Accurate Time Keeping with Pseudo Performance Counters

VMWare is aware of this problem and explains in great detail the reasons and the effects in their White Paper. As a solution for performance management solutions VMWare provides a way to query the actual Real Time at any time from within the Virtual Machine. Pseudo Performance Counters are made available via virtual processor registers that can be accessed from any application within the Virtual Machine.

dynaTrace is using these new counters for accurate time measurement when Managing Application Performance in virtualized VMWare environments. This allows accurate SLA enforcement and application performance management down to individual transactions or even methods. The following illustration shows a single captured transaction with accurate timings. dynaTrace captures the Real and the Apparent Time on method and transaction level:

 

Accurate Timing on Transaction and Method Level
Accurate Timing on Transaction and Method Level

Capturing the Real Time values and also showing the Apparent Time Drift enables Application Performance Management with accurate timing values. Accurate timings are the basics for accurate SLA Enforcement in Production as well as Application Performance Monitoring.

Is timekeeping a real issue in your environment?

The timekeeping problem is well known within the VMWare community and brings challenges to accurate application performance management in virtualized environments. I am interested in your experience with this problem. Have you been aware of it? Do you live with the inaccuracy or do you have other approaches for accurate measuring? Please share your thoughts on this topic.

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi

@ThingsExpo Stories
Bert Loomis was a visionary. This general session will highlight how Bert Loomis and people like him inspire us to build great things with small inventions. In their general session at 19th Cloud Expo, Harold Hannon, Architect at IBM Bluemix, and Michael O'Neill, Strategic Business Development at Nvidia, discussed the accelerating pace of AI development and how IBM Cloud and NVIDIA are partnering to bring AI capabilities to "every day," on-demand. They also reviewed two "free infrastructure" pr...
SYS-CON Events announced today that T-Mobile will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. As America's Un-carrier, T-Mobile US, Inc., is redefining the way consumers and businesses buy wireless services through leading product and service innovation. The Company's advanced nationwide 4G LTE network delivers outstanding wireless experiences to 67.4 million customers who are unwilling to compromise on ...
SYS-CON Events announced today that Super Micro Computer, Inc., a global leader in compute, storage and networking technologies, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology, is a premier provider of advanced server Building Block Solutions® for Data Center, Cloud Computing, Enterprise IT, Hadoop/...
NHK, Japan Broadcasting, will feature the upcoming @ThingsExpo Silicon Valley in a special 'Internet of Things' and smart technology documentary that will be filmed on the expo floor between November 3 to 5, 2015, in Santa Clara. NHK is the sole public TV network in Japan equivalent to the BBC in the UK and the largest in Asia with many award-winning science and technology programs. Japanese TV is producing a documentary about IoT and Smart technology and will be covering @ThingsExpo Silicon Val...
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high demand using interactive visualizations and salary indicator tools to maximize earning potential. Manish Dixit is VP of Product and Engineering at Dice. As the leader of the Product, Engineering and Data Sciences team at D...
The 20th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held June 6-8, 2017, at the Javits Center in New York City, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Containers, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal ...
The age of Digital Disruption is evolving into the next era – Digital Cohesion, an age in which applications securely self-assemble and deliver predictive services that continuously adapt to user behavior. Information from devices, sensors and applications around us will drive services seamlessly across mobile and fixed devices/infrastructure. This evolution is happening now in software defined services and secure networking. Four key drivers – Performance, Economics, Interoperability and Trust ...
SYS-CON Events announced today that CollabNet, a global leader in enterprise software development, release automation and DevOps solutions, will be a Bronze Sponsor of SYS-CON's 20th International Cloud Expo®, taking place from June 6-8, 2017, at the Javits Center in New York City, NY. CollabNet offers a broad range of solutions with the mission of helping modern organizations deliver quality software at speed. The company’s latest innovation, the DevOps Lifecycle Manager (DLM), supports Value S...
With billions of sensors deployed worldwide, the amount of machine-generated data will soon exceed what our networks can handle. But consumers and businesses will expect seamless experiences and real-time responsiveness. What does this mean for IoT devices and the infrastructure that supports them? More of the data will need to be handled at - or closer to - the devices themselves.
Web Real-Time Communication APIs have quickly revolutionized what browsers are capable of. In addition to video and audio streams, we can now bi-directionally send arbitrary data over WebRTC's PeerConnection Data Channels. With the advent of Progressive Web Apps and new hardware APIs such as WebBluetooh and WebUSB, we can finally enable users to stitch together the Internet of Things directly from their browsers while communicating privately and securely in a decentralized way.
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists will look at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deli...
Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the USA and Europe, we work with a variety of customers from emerging startups to Fortune 1000 companies.
Financial Technology has become a topic of intense interest throughout the cloud developer and enterprise IT communities. Accordingly, attendees at the upcoming 20th Cloud Expo at the Javits Center in New York, June 6-8, 2017, will find fresh new content in a new track called FinTech.
The Internet of Things is clearly many things: data collection and analytics, wearables, Smart Grids and Smart Cities, the Industrial Internet, and more. Cool platforms like Arduino, Raspberry Pi, Intel's Galileo and Edison, and a diverse world of sensors are making the IoT a great toy box for developers in all these areas. In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists discussed what things are the most important, which will have the most profound e...
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 20th Cloud Expo, which will take place on June 6-8, 2017 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 add...
SYS-CON Events announced today that Hitachi, the leading provider the Internet of Things and Digital Transformation, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Hitachi Data Systems, a wholly owned subsidiary of Hitachi, Ltd., offers an integrated portfolio of services and solutions that enable digital transformation through enhanced data management, governance, mobility and analytics. We help globa...
SYS-CON Events announced today that Grape Up will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company specializing in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the U.S. and Europe, Grape Up works with a variety of customers from emergi...
@ThingsExpo has been named the Most Influential ‘Smart Cities - IIoT' Account and @BigDataExpo has been named fourteenth by Right Relevance (RR), which provides curated information and intelligence on approximately 50,000 topics. In addition, Right Relevance provides an Insights offering that combines the above Topics and Influencers information with real time conversations to provide actionable intelligence with visualizations to enable decision making. The Insights service is applicable to eve...
SYS-CON Events announced today that Hitachi Data Systems, a wholly owned subsidiary of Hitachi LTD., will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City. Hitachi Data Systems (HDS) will be featuring the Hitachi Content Platform (HCP) portfolio. This is the industry’s only offering that allows organizations to bring together object storage, file sync and share, cloud storage gateways, and sophisticated search an...