Welcome!

Containers Expo Blog Authors: Moshe Kranc, Elizabeth White, Derek Weeks, Michel Courtoy, Pat Romanski

Related Topics: Containers Expo Blog, Java IoT, Microservices Expo, Linux Containers, Open Source Cloud, SDN Journal

Containers Expo Blog: Article

Preventing Performance Bottlenecks with Inline Deduplication

In a typical enterprise storage system the bottleneck to performance is in media bandwidth or computational overhead

Implementing high performance in enterprise storage is a constant battle to find and eliminate the next system bottleneck. Normally this alternates between limits of the underlying media and the computational overhead of metadata management, but choosing the wrong approach to deduplication can introduce a third performance challenge that can be impossible to overcome. Storage that implements a multi-pass approach to data optimization, such as staged or post-process deduplication, becomes inherently at a disadvantage for both computational and media overhead.

Common Performance Bottlenecks
In a typical enterprise storage system the bottleneck to performance is in one of two places: media bandwidth or computational overhead.

For the storage system designer, media overhead is the simplest to address - add more or faster media. In a hard disk-based storage system, this means adding faster drives, more drives, and larger drive sets. In a flash storage system bandwidth is increased by using SLC flash, adding more independent modules, allocating more over-provisioned space, and improving the flash translation layer.

Reducing computational overhead is a greater challenge. Once you have enough media bandwidth the problem becomes shuffling data to and from the storage initiators. Identifying the bottlenecks to performance here can be devilishly complex, as the designer must be concerned about matters such as system memory bandwidth, number of data copies and, especially in today's multi-core world, synchronization between multiple requests. There's no silver bullet here, so the only solution is having a very talented team of software engineers designing and optimizing the storage platform.

Adding deduplication introduces complexity directly into this most challenging area for performance improvement. Any deduplication implementation must interact directly with the storage metadata that is so critical to performance, since I/O requests are being redirected or eliminated based on the system's knowledge of duplicate data. Unless the deduplication technology has been designed and implemented in an inline, multi-core scalable, and low memory overhead way, system architects often try to separate deduplication into a separate layer and a second pass through the data. This is a mistake that harms storage performance in a way that cannot be repaired.

The Impossible Challenge of Multi-Pass Deduplication
This second pass through the data commonly occurs in two possible places: on the final storage media or when transferring data from a staging area to the final storage media.

The first case is always called post-process deduplication. Data are written to their resting media location and a separate process later reads them back, as time and bandwidth allow, determining if any portions are duplicates. If there are duplicates then storage metadata is updated to note this and space is freed for reuse. I've written extensively in the past about the risks of post-process deduplication; since it always requires additional media bandwidth and computational overhead it severely harms performance, and since there are no guarantees about when deduplication will occur it does not meet the requirements for high-change-rate use cases such as VDI.

Post-process Deduplication

The second case, where data is deduplicated as it is being moved from a staging location to a final media location, is often erroneously called inline - as it is inline with that destaging process - but is really just a modified form of post-process deduplication. As with conventional post-process deduplication, another round of data read and processing must occur. Additionally, now both the staging media and final media must provide the full system level of performance or either can become the bottleneck.

For example, some flash storage systems stage all data to a small arena of SLC flash prior to deduplication. This design doubles the number of possible performance bottlenecks in the architecture: performance writing to the staging area, front-end data ingestion, final media storage performance, and the deduplication and de-staging process itself. This sort of multi-pass deduplication process retains all of the negative performance aspects of a traditional post-process implementation.

Staged Post-process Deduplication

High Performance Requires Inline Deduplication
Any form of multi-pass deduplication introduces new bottlenecks that prevent an enterprise storage system from delivering the highest levels of performance. Post-process deduplication, whether on the final media or during a destaging process, creates additional overhead in both media access and data processing. For flash storage platforms requiring the highest levels of performance, only tightly integrated inline deduplication can meet all system requirements.

More Stories By Jered Floyd

Jered Floyd, Chief Technology Officer and Founder of Permabit Technology Corporation, is responsible for exploring strategic future directions for Permabit’s products, and providing thought leadership to guide the company’s data optimization initiatives. He has previously deployed Permabit’s effective software development methodologies and was responsible for developing Permabit product’s core protocol and initial server and system architectures.

Prior to Permabit, Floyd was a Research Scientist on the Microbial Engineering project at the MIT Artificial Intelligence Laboratory, working to bridge the gap between biological and computational systems. Earlier at Turbine, he developed a robust integration language for managing active objects in a massively distributed online virtual environment. Floyd holds Bachelor’s and Master’s degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@ThingsExpo Stories
SYS-CON Events announced today that MobiDev, a client-oriented software development company, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex software systems for startups and enterprises. Since 2009 it has grown from a small group of passionate engineers and business...
SYS-CON Events announced today that GrapeUp, the leading provider of rapid product development at the speed of business, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market acr...
SYS-CON Events announced today that Ayehu will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara California. Ayehu provides IT Process Automation & Orchestration solutions for IT and Security professionals to identify and resolve critical incidents and enable rapid containment, eradication, and recovery from cyber security breaches. Ayehu provides customers greater control over IT infras...
SYS-CON Events announced today that Datanami has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datanami is a communication channel dedicated to providing insight, analysis and up-to-the-minute information about emerging trends and solutions in Big Data. The publication sheds light on all cutting-edge technologies including networking, storage and applications, and the...
Artificial intelligence, machine learning, neural networks. We’re in the midst of a wave of excitement around AI such as hasn’t been seen for a few decades. But those previous periods of inflated expectations led to troughs of disappointment. Will this time be different? Most likely. Applications of AI such as predictive analytics are already decreasing costs and improving reliability of industrial machinery. Furthermore, the funding and research going into AI now comes from a wide range of com...
SYS-CON Events announced today that EnterpriseTech has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. EnterpriseTech is a professional resource for news and intelligence covering the migration of high-end technologies into the enterprise and business-IT industry, with a special focus on high-tech solutions in new product development, workload management, increased effi...
In this presentation, Striim CTO and founder Steve Wilkes will discuss practical strategies for counteracting fraud and cyberattacks by leveraging real-time streaming analytics. In his session at @ThingsExpo, Steve Wilkes, Founder and Chief Technology Officer at Striim, will provide a detailed look into leveraging streaming data management to correlate events in real time, and identify potential breaches across IoT and non-IoT systems throughout the enterprise. Strategies for processing massive ...
SYS-CON Events announced today that Conference Guru has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. A valuable conference experience generates new contacts, sales leads, potential strategic partners and potential investors; helps gather competitive intelligence and even provides inspiration for new products and services. Conference Guru works with conference organi...
SYS-CON Events announced today that Cloud Academy named "Bronze Sponsor" of 21st International Cloud Expo which will take place October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara, CA. Cloud Academy is the industry’s most innovative, vendor-neutral cloud technology training platform. Cloud Academy provides continuous learning solutions for individuals and enterprise teams for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and the most popular cloud com...
SYS-CON Events announced today that CA Technologies has been named "Platinum Sponsor" of SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business - from apparel to energy - is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the applic...
SYS-CON Events announced today that TMC has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo and Big Data at Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Global buyers rely on TMC’s content-driven marketplaces to make purchase decisions and navigate markets. Learn how we can help you reach your marketing goals.
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
"MobiDev is a Ukraine-based software development company. We do mobile development, and we're specialists in that. But we do full stack software development for entrepreneurs, for emerging companies, and for enterprise ventures," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
We build IoT infrastructure products - when you have to integrate different devices, different systems and cloud you have to build an application to do that but we eliminate the need to build an application. Our products can integrate any device, any system, any cloud regardless of protocol," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists looked at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deliver...
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...
SYS-CON Events announced today that Enzu will exhibit at SYS-CON's 21st Int\ernational Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive advantage. By offering a suite of proven hosting and management services, Enzu wants companies to focus on the core of their ...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.