Accelerating File Delivery for the World’s Most Demanding Data-Intensive Workflows

Jason Goodman


Background

The continuous high-volume growth of global unstructured data creates larger file system sizes and spawns increasingly larger numbers of files and directories.  For some customers, this translates to hundreds of millions of files of varying sizes and types.  The volume, variety, and velocity of file-based data sets are so great that traditional mechanisms for replicating and synchronizing files simply can’t keep up.  

Most conventional techniques for replicating and synchronizing files were designed for smaller files and for file systems containing fewer files.  Tools like rsync, for example, were designed in the 1990s, when there were fewer files, smaller file systems, and the network topology interconnecting systems was an afterthought.  Back then, for basic one- or two-way sync operations, using rsync over TCP may have been “good enough.”

By contrast, many Resilio customers have more demanding synchronization, distribution, and ingest scenarios that span a wide range of file sizes (from extremely small to very large).  In some cases, our customers’ jobs include thousands or millions of files and involve more dynamic and challenging data movement scenarios.  These more complex bi- and multi-directional sync scenarios (what we refer to as omni-directional file delivery) must now support replicating, in some cases, many millions of files over a variety of network conditions and topologies.  And they must do so efficiently on existing systems, with a finite amount of memory, I/O, and CPU.

And with some file data continuously changing, efficiently moving changed files in real time poses great challenges, especially when updating many files across multiple locations, many devices, and a variety of platforms (Windows, Mac, Linux, and others).

Moreover, it’s important that customers are able to centrally manage more data and devices.  We’ve heard from larger customers that they need more visibility and insight into what data is moving where, and how fast it’s getting there.  A key goal for this latest release of Resilio Platform is to provide greater visibility, improved diagnosability, and a massive gain in efficiency for managing and controlling larger numbers of files.

End-to-End System Performance Optimization

Resilio Platform 3.0 rapidly and efficiently identifies, processes, and synchronizes massive numbers of files at unprecedented speeds, in any direction. 

Resilio has taken a holistic approach to improving scalability and performance.  The end-to-end process of synchronizing millions of files, for example, requires optimizing all phases of the job:  reading, indexing, and merging file systems.  Merging is the non-trivial process of reconciling differences across N file systems, in parallel.  Considering that some file systems may be large, or deployed in a remote location over an unreliable or low-bandwidth link, this merging and sync process must be efficient, fast, and reliable.
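To make the merge step more concrete, here is a minimal sketch of reconciling file indexes from several peers. It is an illustration only and not Resilio's actual algorithm: the `FileEntry` type, the `merge_indexes` function, and the "newest modification time wins" rule are all assumptions chosen for simplicity, and the sketch ignores deletions, conflicts, and parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical index record for one file on one peer."""
    mtime: float  # last-modified time in seconds
    digest: str   # content hash of the file


def merge_indexes(indexes):
    """Reconcile N per-peer indexes into one target state.

    indexes maps peer name -> {path: FileEntry}.
    Returns (merged, plan), where merged maps path -> winning FileEntry
    and plan maps peer name -> sorted list of paths that peer must fetch.
    Assumption: the entry with the newest mtime wins (digest breaks ties).
    """
    merged = {}
    for index in indexes.values():
        for path, entry in index.items():
            best = merged.get(path)
            if best is None or (entry.mtime, entry.digest) > (best.mtime, best.digest):
                merged[path] = entry

    plan = {}
    for peer, index in indexes.items():
        # A peer needs a file if it lacks the path or holds different content.
        plan[peer] = sorted(
            path
            for path, entry in merged.items()
            if path not in index or index[path].digest != entry.digest
        )
    return merged, plan


if __name__ == "__main__":
    indexes = {
        "site-a": {"x.txt": FileEntry(10.0, "h1"), "y.txt": FileEntry(5.0, "h2")},
        "site-b": {"x.txt": FileEntry(20.0, "h3")},
        "site-c": {},
    }
    merged, plan = merge_indexes(indexes)
    for peer, paths in plan.items():
        print(peer, "needs", paths)
```

Even in this simplified form, the work grows with the total number of index entries across all peers, which is why the indexing and merging phases matter as much as raw transfer speed at the scale of millions of files.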

Resilio has improved performance for each phase mentioned above.  Each step in a given job (Loading, Indexing, Syncing, Synced) is displayed through counters in the Resilio Management Console.

[Figure: Resilio Connect: Enhanced Performance Monitoring and Statistics]

Memory Efficiency

One major focus area for Resilio engineering is memory utilization.  The engineering team has substantially reduced memory consumption, that is, the amount of physical memory required for each job.  Memory footprint requirements have been reduced by 80% on average across all jobs, which significantly lowers the amount of physical memory needed per agent.  This work translates to a major leap in scalability: the ability to replicate far more files across a broader range of use cases.

File System Scalability—and Synchronizing Hundreds of Millions of Files

Improved memory efficiency, combined with optimizations in startup time, indexing, merging, and end-to-end transport, enables Resilio Platform not only to transfer millions of files, but to synchronize millions of files in real time, in any direction.  Resilio engineering has tested and successfully synchronized 200 million files in a single job.  This number is by no means a design limitation; it represents a target validated for the release.

Pure Speed 

An ongoing focus area for Resilio is improving pure performance, measured in sustained, end-to-end transfer speeds.  The 3.0 release improves pure performance across a number of use cases, such as sustained agent-to-agent transfer speed, scale-out performance, cloud ingest and egress (upload/download), sync, and site-to-site transfer speeds (on-premises, hybrid cloud, and cloud native).  Resilio has verified over 10 Gbps for agent-to-agent transfers, cloud ingest, and site-to-site performance within any cloud.

Network Optimization

It’s important to note that Resilio continuously optimizes the core transport for both LAN and WAN environments.  Resilio’s proprietary ZGT protocol provides a turnkey, end-to-end WAN optimization module that is easy to deploy and use out of the box.

Systems Operations Scalability

The Resilio Management Console now gives customers the ability to centrally manage up to fifty thousand agents per console instance, and to scale out as needed.  Some of Resilio’s largest customers use the Management Console to centrally manage thousands of agents deployed globally.

Management Capabilities

As stated earlier, another goal of the release was to improve the monitoring and visualization capabilities of the Resilio Management Console.  

Customers will see an improved administration experience.  Much of the existing UI was redeveloped to simplify management and visually expose broader capabilities around statistics collection (performance counters, job progress, etc.) and job management. 

Enhanced Performance Monitoring and Statistics

In cases where customers would like to check or diagnose the status or progress of a job, Resilio has added a variety of counters to collect and display this information.  For example, the ability to track the phase of a job (from start to finish), along with information about bytes transferred and other lower-level operations, is now available for those who are curious or need the information.
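As a rough illustration of what per-job counters like these might track, the sketch below models a job moving through the four phases named earlier (Loading, Indexing, Syncing, Synced) while accumulating file and byte counts. The `Phase` enum, the `JobCounters` class, and all of its fields and methods are hypothetical names for this example; they are not the Management Console's actual data model or API.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Job phases as named in this post; the numeric order is an assumption."""
    LOADING = 1
    INDEXING = 2
    SYNCING = 3
    SYNCED = 4


@dataclass
class JobCounters:
    """Hypothetical progress counters for a single sync job."""
    phase: Phase = Phase.LOADING
    files_total: int = 0
    files_done: int = 0
    bytes_transferred: int = 0

    def advance(self, phase):
        # In this sketch a job only moves forward through its phases.
        if phase.value < self.phase.value:
            raise ValueError(f"cannot go back from {self.phase.name} to {phase.name}")
        self.phase = phase

    def record_transfer(self, nbytes):
        # Count one completed file and the bytes it took to move it.
        self.files_done += 1
        self.bytes_transferred += nbytes

    def summary(self):
        pct = 100.0 * self.files_done / self.files_total if self.files_total else 0.0
        return (
            f"{self.phase.name.title()}: {self.files_done}/{self.files_total} files "
            f"({pct:.0f}%), {self.bytes_transferred} bytes"
        )


if __name__ == "__main__":
    counters = JobCounters(files_total=4)
    counters.advance(Phase.INDEXING)
    counters.advance(Phase.SYNCING)
    counters.record_transfer(1024)
    counters.record_transfer(1024)
    print(counters.summary())
```

Keeping the phase alongside the counters lets a single status line show both where a job is and how far it has progressed.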

Other Enhancements

Prior to this release, Resilio added a number of enhancements that some customers may not know about.  These include:

  • Windows Cluster Support:  As of 2.12, Resilio Platform supports working with Microsoft Windows Cluster Server. 
  • Third-party systems management frameworks:  Resilio supports exporting management information such as log files, events, and webhooks from the Resilio Management Console to systems management frameworks such as Microsoft System Center Operations Manager (SCOM), Splunk, LCE, and a variety of open source tool sets such as Grafana and Prometheus, among others.

Learn more about Resilio Connect. To see or try Resilio Platform 3.0, please schedule a demo or start a free trial.
