Three Levels of Network Monitoring for DevOps By @Wall_Dirk | @DevOpsSummit #DevOps

By Dirk Wallerstorfer

Network communications are a top priority for DevOps teams working in support of modern globally distributed systems and microservices. But basic network interface statistics like received and sent traffic aren’t as useful as they once were because multiple microservices may share the same network interface. For meaningful analysis, you need to dig deeper and correlate network-traffic metrics with individual processes. This is, however, just the beginning. In this article, I’ll show you how deep the network monitoring rabbit hole goes. The types of meaningful analysis that DevOps engineers have been waiting for are finally available.

Back in the days when individual services ran on standalone machines, measuring performance was easy. Most of a machine’s CPU, memory, and network load was dedicated to supporting that one service. If the response time of the service increased, it was easy to determine whether the problem was processing, memory, or network related. But times have changed. Resource shortages are now largely a thing of the past. The way that applications are built, deployed, and operated is also different. And, once again, error-free network performance is a top priority.

Level 1: Host-based monitoring

Modern performance-monitoring tools provide network-related metrics by default. In addition to throughput data, though, you need to know the quality of your network connections. Knowing that your host transfers a certain number of kilobytes per second is interesting, but it’s only the beginning. Knowing that half of your traffic consists of TCP retransmissions, for example, is extremely valuable information. The amount of incoming and outgoing traffic, connectivity, and information about connection quality (i.e., the number of dropped packets and retransmissions) are the metrics that serious performance monitoring tools must provide.
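As a rough illustration of a host-level quality metric, the sketch below computes the share of retransmitted TCP segments from text in the format of Linux's /proc/net/snmp. It assumes a Linux host; the function name and the sample numbers are made up for the example, and the kernel counters are cumulative, so a real monitor would take the difference between two samples rather than read them once.

```python
def tcp_retransmission_ratio(snmp_text: str) -> float:
    """Return RetransSegs / OutSegs from text in /proc/net/snmp format."""
    # The file holds a header line and a value line for each protocol.
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp header/value pair found")
    header, values = tcp_lines[0], tcp_lines[1]
    counters = dict(zip(header[1:], (int(v) for v in values[1:])))
    out_segs = counters["OutSegs"]
    return counters["RetransSegs"] / out_segs if out_segs else 0.0


# Invented sample data; on a live host use open("/proc/net/snmp").read().
SAMPLE = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 150 90 4 2 12 50000 40000 800 0 30 0
"""

print(f"retransmitted: {tcp_retransmission_ratio(SAMPLE):.1%}")
```

A ratio like this describes the whole host. It cannot say which process is responsible for the retransmissions.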

When compared with overall traffic patterns passing through the host NIC, such metrics can provide important insights into network quality. If there is only one service process running on a host, all the host metrics are representative of that one process. If there are several processes running, these metrics provide information about the overall availability and connection quality of all the processes. But host-based monitoring can’t show you which process has a network problem or how much of a resource (e.g., network bandwidth) each process consumes. Host-based network metrics can, however, be good indicators that something has gone wrong in your network. The question is, who you gonna call to tell you exactly what’s gone wrong?

[Figure: Standard network metrics at the host level]

Level 2: Process-based monitoring

Monitoring resource consumption at the process level is a more sophisticated approach. Analyzing the throughput, connectivity, and connection quality of each process is a good starting point for productive analysis.

When monitoring at the process level you might expect to see network-volume metrics like incoming and outgoing network traffic for each process (i.e., the average rate at which data is transmitted to and from the process during a given time interval). But such volume-based metrics alone aren’t sufficient for meaningful analysis because they don’t tell you anything about the communication behavior of the process. If you take the number of TCP requests into account you have a three-dimensional model of process characteristics. High network traffic and few TCP requests can indicate, for example, an FTP server providing large files. Low traffic and many requests can indicate a service that has a small data footprint (e.g., an authentication service). If you only monitor network traffic volume, you won’t be able to tell the difference between an occasionally used, throttled FTP server and a frequently used web service. Clearly, the number of processed TCP requests is essential. You can use the combined network volume information to check your architectural design and expectations against empirical data and identify issues if something hasn’t worked as planned, or is getting out of hand.
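To make the FTP-versus-web-service comparison concrete, here is a minimal sketch that relates traffic volume to the number of TCP requests. The class, the field names, the sample figures, and the 1 MB threshold are all assumptions chosen for illustration, not values taken from any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class ProcessTraffic:
    name: str
    bytes_in: int       # bytes received during the interval
    bytes_out: int      # bytes sent during the interval
    tcp_requests: int   # TCP requests handled during the interval


def bytes_per_request(p: ProcessTraffic) -> float:
    total = p.bytes_in + p.bytes_out
    return total / p.tcp_requests if p.tcp_requests else float("inf")


def profile(p: ProcessTraffic, bulk_threshold: float = 1_000_000) -> str:
    """Label a process by how much data it moves per request.

    bulk_threshold is a hypothetical cut-off in bytes per request.
    """
    if bytes_per_request(p) >= bulk_threshold:
        return "bulk transfer"
    return "request-heavy"


# Same traffic volume, very different communication behavior.
ftp = ProcessTraffic("ftpd", bytes_in=1_000_000, bytes_out=599_000_000, tcp_requests=12)
web = ProcessTraffic("webapp", bytes_in=100_000_000, bytes_out=500_000_000, tcp_requests=150_000)

for p in (ftp, web):
    print(p.name, profile(p), round(bytes_per_request(p)))
```

Both sample processes move 600 MB in the interval, so a volume-only view would treat them as identical. The request count is what separates them.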

The rate of properly established TCP connections, both inbound and outbound, is representative of the connection availability of a process. The number of refused and timed-out TCP connections per second needs to be included in an integrated view that’s focused on process connectivity. With this information you can easily identify connectivity problems. Closed ports or full queues of pending connections can be the cause of connection refusals. Firewalls that silently drop packets instead of sending a TCP reset or an ICMP error, and hosts that die during transmissions, can be reasons for timeouts.
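One way to fold these counters into a single connectivity figure is sketched below. The formula (established attempts divided by all attempts) and both function names are assumptions made for this example. The cause hints simply restate the possibilities listed above.

```python
def connectivity(established: int, refused: int, timed_out: int) -> float:
    """Share of connection attempts that were properly established."""
    attempts = established + refused + timed_out
    # With no attempts at all there is nothing to flag, so report 1.0.
    return established / attempts if attempts else 1.0


def likely_causes(refused: int, timed_out: int) -> list:
    """Map each failure type to the causes named in the text above."""
    hints = []
    if refused:
        hints.append("refused: closed port or full queue of pending connections")
    if timed_out:
        hints.append("timed out: packets silently dropped by a firewall, or host died")
    return hints


print(f"connectivity: {connectivity(970, 20, 10):.1%}")
for hint in likely_causes(refused=20, timed_out=10):
    print(hint)
```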

In addition to quantitative data, a qualitative analysis of network connections is necessary for providing a holistic view of the network properties of a process. Assessing TCP retransmissions, round-trip times, and the effective use of network bandwidth provides additional insights. Comparing host and process retransmission rates can help in identifying the source of network connection problems.

Round-trip times are an important measure, especially when clients from remote locations or hosts in different availability zones play a role. The most precise measurement is the handshake round-trip time, measured during TCP session establishment. With persistent connections, for example in the backend of an application infrastructure, these handshakes occur rarely. Round-trip time during data transfer isn’t as accurate, but it reveals fluctuations in network latency. Typically these values don’t exceed a few milliseconds for hosts on the same LAN and 50-100 ms for geographically close nodes on different networks.
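The handshake round-trip time can be approximated from user space by timing how long a TCP connect takes, since connect returns once the SYN-ACK arrives. The sketch below is only that approximation: it includes some local overhead, it is not how a monitoring agent would measure the value, and the function name is hypothetical.

```python
import socket
import time


def handshake_rtt_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Approximate the TCP handshake round-trip time in milliseconds.

    Pass an IP address rather than a hostname so that DNS resolution
    is not included in the measurement.
    """
    start = time.perf_counter()
    # create_connection returns once the three-way handshake has
    # completed from the client's point of view.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0
```

Calling it needs a reachable listener, for example `handshake_rtt_ms("192.0.2.10", 443)` with an address of your own in place of the documentation address used here. A refused or timed-out attempt raises an `OSError` instead of returning a value.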

Apart from nominal network interface speed, the actual throughput that a process can realize is interesting data. Regardless of how fast a process responds, when large quantities of data need to be transferred, the bandwidth that is available to the process is the limiting factor. Keeping in mind that the network interface of the host running the process, the local network, and the Internet are shared resources, there are dozens of things that can affect data transfer and cause fluctuations over time. Average transfer speed per client session under current network conditions is vital information.

Obviously, having all this information about the quality of your network connections is useful and can provide exceptionally deep insights. Ultimately, this information enables you to pinpoint the exact processes that are having network problems. However, one piece of the puzzle is still missing: It takes two communicating parties to produce any sort of networking problem. Wouldn’t it be good to know what’s going on on the remote side of the network as well?

[Figure: Network metrics at the process level]

Level 3: Communications-based monitoring

Although network monitoring on the process level is innovative, you need more to properly diagnose and troubleshoot problems that can occur between the components of your application infrastructure. To get the best out of network monitoring you have to monitor the volume and quality of communication between processes. Only then can you unambiguously identify process pairs that have, for example, high traffic or connectivity problems.

With this approach you can check the bandwidth usage on both ends of a communication and identify which end might be the bottleneck. You can also single out process pairs that have connectivity problems or numerous TCP retransmissions. This is much faster and less error-prone than manual checks on both sides. Unless network overlays or SDN are involved, you can pinpoint erroneous connections down to a level where you can start doing health checks on cables and switch ports, because you know exactly which components participate in the conversation.
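Singling out problematic process pairs comes down to grouping connection metrics by their two endpoints. The sketch below assumes per-connection samples are already available in a simple record. The record layout, the process names, and the numbers are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class ConnSample(NamedTuple):
    src: str             # "host/process" that sent the segments
    dst: str             # "host/process" on the remote side
    segments_sent: int
    retransmitted: int


def retransmission_by_pair(samples):
    """Return (pair, retransmission rate) tuples, worst pair first."""
    totals = defaultdict(lambda: [0, 0])   # pair -> [sent, retransmitted]
    for s in samples:
        totals[(s.src, s.dst)][0] += s.segments_sent
        totals[(s.src, s.dst)][1] += s.retransmitted
    rates = {pair: (retrans / sent if sent else 0.0)
             for pair, (sent, retrans) in totals.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


samples = [
    ConnSample("web-1/nginx", "db-1/postgres", 1000, 5),
    ConnSample("web-1/nginx", "db-1/postgres", 1000, 15),
    ConnSample("web-1/nginx", "auth-1/authd", 500, 100),
]

for (src, dst), rate in retransmission_by_pair(samples):
    print(f"{src} -> {dst}: {rate:.1%}")
```

In this made-up data the host web-1 as a whole would show a moderate retransmission rate, while the per-pair view shows that almost all of it comes from the conversation with the authentication service.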

Monitoring volume and quality of network connections on the process/communications level makes detecting and resolving issues easier, more efficient, and more comfortable. It’s a bit like geotagging for network problems in your application environment.

The Ruxit way of network monitoring
Ruxit provides host-based network metrics and detailed metrics about each monitored process. Ruxit even goes the extra mile, providing all necessary information related to process-to-process communication.

[Figure: Process connections overview]

The process connections overview shows all processes running on a host in the middle of the connection graph. On the left side you see all incoming connections from other processes. On the far left you see the hosts where these processes are running. To the right, you see all outgoing connections to other processes. On the far right, you see the hosts where those processes are running. You can select any process within that view to see traffic, connectivity, and retransmission rates for each conversation.

You can even use this feature to verify your software architecture and infrastructure. Are all processes talking to the processes they are supposed to talk to? Are all necessary services connected to load balancers? Is network bandwidth causing a performance bottleneck between my web and database servers?
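Once observed process-to-process connections are available as data, the first two questions reduce to a set comparison against the intended design. The sketch below is a generic illustration and does not use any Ruxit API. The function name and the process names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]   # (calling process, called process)


def compare_topology(expected: Set[Edge], observed: Set[Edge]) -> Dict[str, List[Edge]]:
    """Report connections that are missing or that nobody planned for."""
    return {
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


expected = {("loadbalancer", "web"), ("web", "db")}
observed = {("web", "db"), ("web", "legacy-cache")}

report = compare_topology(expected, observed)
print("missing:", report["missing"])
print("unexpected:", report["unexpected"])
```

Here the web service is not reachable through the load balancer and talks to a cache that the design does not mention.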

This is the sort of information that DevOps engineers need. This feature eliminates the need to comb through log files and use tcpdump to locate networking problems and identify affected hosts and processes. It helps save time and spare nerves when analyzing networking problems. For insights into how networking errors can impact the performance of services, have a look at my recent article: Detecting network errors and their impact on services.

The post 3 levels of network monitoring for DevOps appeared first on #monitoringlife.

More Stories By Dynatrace Blog

Building a revolutionary approach to software performance monitoring takes an extraordinary team. With decades of combined experience and an impressive history of disruptive innovation, that’s exactly what ruxit has.

Get to know ruxit, and get to know the future of data analytics.
