Rsync: A Brief History & Overview
Rsync is still a popular tool for synchronizing smaller data sets in basic scenarios, mainly one-way file sync. When Andrew “Tridge” Tridgell and Paul Mackerras released rsync for Unix-like systems such as Linux in 1996, file sizes and file systems were relatively small: capacities were counted in gigabytes (not petabytes), often with no more than a few thousand files per file system.
For smaller data sets across relatively low-latency networks, rsync provides an efficient one-way approach. Rsync first scans the source and target file systems to build a list of files, which it keeps in memory on both sides. By default, it treats a file as changed if its size or modification time differs (rsync’s “quick check”). For each changed file, the receiving side splits its existing copy into fixed-size blocks and sends a weak “rolling” checksum and a strong checksum for each block; the sending side then slides over its own version of the file to find blocks that already match and transmits only the data that doesn’t match (the delta), plus instructions for reassembling the file. Rsync typically runs over SSH or connects to an rsync daemon over TCP.
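The key to rsync’s delta transfer is a weak checksum that can be “rolled” along a file one byte at a time at constant cost. The Python sketch below follows the two-sum checksum described in Tridgell and Mackerras’s original rsync technical report; it is a simplified illustration, not rsync’s actual C code, and the function names are hypothetical.

```python
M = 1 << 16  # both partial sums are kept modulo 2**16

def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two partial sums (a, b) for a block from scratch: O(len(block))."""
    n = len(block)
    a = sum(block) % M
    # The i-th byte (0-based) is weighted by n - i, so earlier bytes weigh more.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def combine(a: int, b: int) -> int:
    """Pack the two 16-bit sums into a single 32-bit weak checksum."""
    return a | (b << 16)

def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def find_candidates(data: bytes, block: bytes) -> list[int]:
    """Offsets in data whose weak checksum matches block's (candidates, not confirmed matches)."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = combine(*weak_checksum(block))
    a, b = weak_checksum(data[:n])
    hits = []
    for k in range(len(data) - n + 1):
        if combine(a, b) == target:
            hits.append(k)
        if k + n < len(data):
            a, b = roll(a, b, data[k], data[k + n], n)
    return hits
```

For example, find_candidates(data, data[10:18]) includes offset 10. Because a weak checksum can collide, real rsync confirms each candidate with a stronger hash before reusing a block.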
Rsync’s major limitation is the time it takes to scan a source and target file system for changes and to compare and synchronize those changes over networks of varying conditions. Through delta encoding and compression algorithms, rsync offers some level of optimization. Yet, without the ability to capture incremental file changes in real-time, rsync is not a practical solution for larger file systems containing millions of files; nor is rsync well suited to more complex synchronization scenarios requiring multi-directional synchronization over WANs.
Depending on factors such as file system size and network conditions, rsync may be useful in scenarios such as:
- Sync files and folders between two (2) offices. The offices can be located anywhere in the world as long as the file systems are small and network conditions are good (low latency and minimal packet loss).
- Distribute files from one office to another or several offices. DevOps, for example, faces this problem delivering builds, videos, or other files from the main office to regional offices.
- Consolidate (ingest) data from one or several offices using rsync to a single office.
Rsync Alternative for Windows
At a previous company I worked for, our IT manager, Roger, installed Rsync for Windows (which required Cygwin) on my Windows XP laptop. That’s right: Windows XP! The original Rsync was released in 1996, around the time of Windows NT 4.0 and well before Windows 2000, XP, and Vista. Today, Rsync for Windows lives on, and it’s probably OK for smaller files and directories with fewer files. Back in the day, it was a good Windows alternative to Microsoft “offline folders” sync and that horrific Windows Briefcase app. Instead of typing at the command line, I kicked off Rsync for Windows sync jobs by running a batch file: each time I finished work for the day, I’d double-click the batch file on my Windows desktop to run the job. It would occasionally crash Cygwin, or maybe it was Cygwin that crashed Rsync for Windows. Either way, I’m not sure which implementation it was: cwRsync or some other port. As an alternative to Xcopy and Robocopy, it worked pretty well, at least for synchronizing smaller files and home directories. Things like file permissions sometimes didn’t copy correctly. And did I mention Cygwin?
Rsync and Resilio Connect Similarities
Some file synchronization tools (like PeerSync) use timestamps to determine which files to synchronize; others, like GoodSync, use block-level change detection. Another approach, used by both Rsync and Resilio Connect, is differential sync at the file level, and a differential sync engine is the main thing the two have in common: only the delta (the changed portions of a file) is synchronized between the source and target. For file servers or file systems with larger files, or files that change continuously, this reduces the amount of data sent across the wire. Apart from that, and the fact that both offer command-line tools, the functional similarities end there.
Why Is Resilio Connect the Best Rsync Alternative?
Resilio Connect is an excellent Rsync alternative, especially when users need:
- Real-time synchronization of files anywhere in the world, where updates are efficiently captured and propagated in (near) real-time for files of any size.
- File synchronization scalability to support larger-capacity file systems containing many files (even many millions of files) of varying sizes, from small to large.
- Flexible N-way multi-directional synchronization (uni-, bi-, multi-directional, or full mesh): Resilio Connect enables massively scalable omni-directional replication and file synchronization. There’s nothing like it.
- WAN optimization for moving or synchronizing files over unreliable networks (VSAT, LTE, Wi-Fi, and/or WANs): Resilio Connect is WAN-optimized and versatile for use over any network, with built-in compression, delta detection, and efficient recovery from failures to minimize data transfer.
- Centralized management: Resilio Connect enables all jobs to be centrally managed and easily configured for Distribution, Sync, Consolidation, and Scripting.
- Automation: all data movement jobs can be automated, scheduled, scripted, or integrated into workflows through a complete REST API.
- Multi-cloud-ready with your cloud storage vendor of choice–for on-premises, hybrid, or cloud native deployments.
Other Rsync Limitations
In today’s world of massive file sets, rsync’s architecture poses a number of challenges for data-intensive global enterprises, including the following:
No real-time file change detection
As noted earlier, rsync is not designed for real-time change detection across large numbers of files, and it is usually very slow to synchronize folders with millions of files. Rsync is limited by the time it takes to scan large folders, find changes, and transfer them; as the directory structure grows in size and complexity, changes take longer and longer to be replicated. Resilio Connect instead uses real-time file system monitoring to detect and replicate changes on the fly, as they happen.
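To see why scan time grows with the size of the tree, here is a minimal Python sketch of scan-based change detection, loosely modeled on rsync’s default quick check (a file counts as changed if its size or modification time differs). It is an illustration, not rsync’s code; the function names are hypothetical.

```python
import os

def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Walk the whole tree and record (size, mtime_ns) for every file.

    Every call costs one stat() per file, no matter how few files changed.
    """
    state: dict[str, tuple[int, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state

def diff(old: dict[str, tuple[int, int]], new: dict[str, tuple[int, int]]):
    """Compare two snapshots: a file counts as changed if its size or mtime differs."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, changed
```

Calling diff() on consecutive snapshots finds every change, but the cost of each pass grows with the total number of files, not the number of changes. Event-based monitoring, by contrast, is notified of each change by the OS (for example, inotify on Linux), so its cost tracks the number of changes.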
Poor scalability
Rsync is notoriously slow at synchronizing folders with large numbers of files. As file systems grow into the millions of files, rsync may become impractical, because its scan-based architecture is limited by the time it takes to scan a folder or directory, find changes, and transfer those changes.
Rsync and WAN connections
Rsync is slow over WANs and unreliable networks (cellular, VSAT, et al.) with long round-trip times and varying degrees of packet loss. Rsync uses TCP/IP as its transport, and standard loss-based TCP congestion control treats every packet loss as a sign of network congestion, cutting its sending rate to reduce the load on the connection; long acknowledgement delays slow it down further. This approach lets TCP-based applications share a network and collectively settle on the maximum rate each can use. On wide-area networks, however, a delay or a packet loss doesn’t necessarily mean the network is congested; it may simply reflect distance or a noisy link. Because rsync inherits TCP’s behavior, it backs off even when spare capacity exists, which makes it a poor fit for WAN connectivity.
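The back-off described above is TCP’s additive-increase/multiplicative-decrease (AIMD) congestion control. The toy simulation below is a deliberately simplified Reno-style model, not any real TCP stack: the congestion window grows by one segment per round trip and halves whenever a loss is observed, whether the loss came from congestion or from a noisy link.

```python
def aimd(rounds: int, loss_rounds: set[int], start: float = 1.0) -> list[float]:
    """Congestion window (in segments) after each round trip.

    Simplified Reno-style AIMD: +1 segment per loss-free round trip,
    halve the window (minimum 1) on any round trip with a loss.
    Slow start, timeouts, and fast recovery are ignored.
    """
    cwnd = start
    history = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd = max(1.0, cwnd / 2)  # multiplicative decrease
        else:
            cwnd += 1.0  # additive increase
        history.append(cwnd)
    return history
```

For example, aimd(6, {3}) yields [2.0, 3.0, 4.0, 2.0, 3.0, 4.0]: a single loss undoes two round trips of growth. Since throughput is roughly window divided by round-trip time, every random loss on a long, lossy path halves the sending rate, and regrowing it takes many slow round trips.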
Quickly transfer files to more than one destination
It is rare these days for an organization to send or copy files to just one location or server; most companies need to synchronize across multiple locations or servers. A common approach with rsync and FTP is to “follow the sun”: jobs run one at a time, and each starts only after the previous one completes. What was reasonably quick for a one-to-one transfer becomes very slow when it has to be repeated many times in sequence, usually from the command line.
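For illustration, a “follow the sun” job often amounts to a wrapper like the Python sketch below, which runs one rsync per destination and waits for each to finish before starting the next. The destination hosts and paths are hypothetical placeholders; -a (archive), -z (compress), and --delete are standard rsync options.

```python
import subprocess

# Hypothetical destinations; replace with real host:path targets.
DESTINATIONS = ["office-a:/srv/builds/", "office-b:/srv/builds/", "office-c:/srv/builds/"]

def build_command(src: str, dest: str) -> list[str]:
    # -a preserves permissions and times, -z compresses data in transit,
    # --delete removes files at dest that no longer exist at src.
    return ["rsync", "-az", "--delete", src, dest]

def follow_the_sun(src: str, destinations: list[str]) -> None:
    """Sync to each destination in turn; each run starts only after the previous one ends."""
    for dest in destinations:
        # check=True raises on a non-zero exit, stopping the chain at the first failure.
        subprocess.run(build_command(src, dest), check=True)
```

Total wall-clock time is the sum of all the individual runs, and a single failed destination halts every transfer queued after it.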
Rsync and dynamic IPs
Rsync connects to a fixed, preconfigured address (an IP address or hostname). If a machine gets a new IP address that the configuration or DNS doesn’t reflect, rsync stops operating, and someone has to intervene before file transfers can resume.
Rsync and remote script execution
Rsync can be wrapped in a script that performs additional operations after a file is delivered or folders are synchronized. However, this becomes tricky when there is more than one destination and script execution must be coordinated across all of them (e.g., a software patch that should be applied only once every machine has received it). Add a mix of operating systems (Linux, Unix, Windows, macOS), and coordinating events across platforms becomes even more complex.
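One way to coordinate this with plain rsync is a two-phase wrapper: first sync to every destination, and only if all transfers succeed, run the post-delivery step everywhere. The Python sketch below assumes a hypothetical run(cmd) callable that executes a command and returns its exit code (for example, a thin wrapper around subprocess.run), so the orchestration logic can be shown without real hosts; the host names and post-step command are placeholders.

```python
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # hypothetical: executes a command, returns its exit code

def sync_then_execute(src: str, hosts: list[str], remote_dir: str,
                      post_cmd: str, run: Runner) -> bool:
    """Phase 1: rsync to every host. Phase 2: run post_cmd on every host,
    but only if phase 1 succeeded everywhere. Returns True if both phases succeed."""
    # A list comprehension (not all()) so that every host is attempted.
    failed = [h for h in hosts
              if run(["rsync", "-az", src, f"{h}:{remote_dir}"]) != 0]
    if failed:
        return False  # some host lacks the files: skip the post step everywhere
    # "ssh <host> <command>" runs the command on the remote machine.
    results = [run(["ssh", h, post_cmd]) for h in hosts]
    return all(code == 0 for code in results)
```

Even this is only a partial answer: if the post step fails on one host after succeeding on others, the machines end up out of step, and handling that case (plus differences between operating systems) is exactly the complexity described above.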
Resilio Connect: An Rsync Alternative
Through real-time data synchronization and other key functionality, Resilio Connect scales out data movement in parallel over any network, scaling transfer performance up to 20x faster than rsync. Resilio enables true multidirectional (n-way) data movement to overcome transfer bottlenecks–over any distance and location.
Architecturally, Resilio Connect is an agent-based solution. Resilio Agents are installed on all devices participating in data movement jobs. Job types include Distribution, Consolidation, Scripting, and Synchronization. Resilio Connect agents support popular operating systems such as Windows, macOS, Linux, FreeBSD, and Android. Connect also supports popular virtualization platforms, servers, storage, NAS devices, networks, and cloud storage service providers.
The Resilio Connect Management Console is a centralized, web-based management system used to manage and monitor all job functions through an easy-to-administer graphical user interface. Optionally, Resilio offers a complete API set to expose and automate all functions performed by the Management Console. You can install and configure the Management Console on Windows and Linux servers.
Rsync Optimization
How to make rsync faster?
It’s hard. Rsync’s performance is limited by its basic set of technologies: its optimizations come down to delta encoding and compression.
To get substantially faster file transfers than rsync can deliver, you need a replacement. Resilio Connect adds peer-to-peer scalable data transfer, WAN optimization, and real-time file system monitoring to speed up syncing for today’s enterprise.
Rsync & large file synchronization
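It is possible, but slow. To find the differences in a file, rsync has to read the entire file on both sides: the receiver checksums every block of its copy, and the sender rolls a checksum across its copy to look for matching blocks. Across large file sets, this makes calculating file differences extremely time-consuming. Rsync is also not very good at recovering from connection failures: by default, a partially transferred file is discarded when the connection drops (unless the --partial option is used), so a transfer of a large amount of data may have to start over.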
Resilio Connect optimizes the checksum calculations so that it can sync faster than rsync, with files of any size. It also moves files in small chunks and minimizes re-transmission in case of a failure.
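The general idea of chunked transfer with resume can be sketched as follows. This is a generic illustration, not Resilio’s implementation; the 4 KiB chunk size, the use of SHA-256, and the function names are arbitrary choices for the example. The file is split into fixed-size chunks, each identified by a hash, and after an interruption the sender resends only the chunks the receiver doesn’t already hold intact.

```python
import hashlib

CHUNK_SIZE = 4096  # arbitrary example size

def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """SHA-256 hex digest of each fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def chunks_to_send(source: bytes, received: dict[int, str], size: int = CHUNK_SIZE) -> list[int]:
    """Indices of chunks the receiver is missing or holds a corrupt copy of.

    `received` maps chunk index -> hash of what the receiver has already stored.
    """
    return [i for i, h in enumerate(chunk_hashes(source, size))
            if received.get(i) != h]
```

With received chunks recorded this way, an interrupted transfer of a very large file costs at most the chunks that were in flight, not the whole file.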
Rsync & transferring folders
Rsync is a file synchronization tool designed to scan each folder file by file, and it only looks for changes when it runs. On very large folder trees, this means it could take hours or even days before rsync discovers a changed file and transfers it to the destination.
Resilio Connect uses real-time notification events from the host OS to detect changed files. Because changes are picked up as they happen rather than on the next full scan, a changed file is delivered to its destination much faster than with rsync, and this holds true for any folder size.
Rsync & end-to-end encryption
Rsync itself does not encrypt data in transit. When it connects directly to an rsync daemon, traffic is sent unencrypted, so using rsync securely requires an additional encryption layer: in practice, running it over SSH (rsync’s default remote shell) or through a VPN.
Any good rsync alternative should have built-in end-to-end encryption of data in transit. Resilio Connect uses AES256 in CTR mode to encrypt all the traffic sent between endpoints.
Rsync & static IP addresses
Dynamic network environments present a challenge to rsync. Rsync must be pointed at a fixed address (an IP address or hostname) and port for the remote machine. As soon as that machine’s IP address changes without the configuration or DNS being updated, rsync fails.
Resilio Connect uses a dynamic routing approach. When a rule specifies that two machines need to exchange data, both machines use a tracker or multicast to discover the addresses of each other on the fly. No human intervention is necessary when a new IP is assigned.
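Conceptually, tracker-based discovery decouples a machine’s identity from its address. The toy in-memory tracker below is purely hypothetical and does not show Resilio’s actual tracker protocol: each agent announces its current address under a stable ID, and peers look each other up by ID right before connecting, so an IP change is picked up with the next announcement.

```python
import time
from typing import Optional

class Tracker:
    """Toy registry mapping stable peer IDs to their most recently announced address."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._peers: dict[str, tuple[str, int, float]] = {}

    def announce(self, peer_id: str, ip: str, port: int) -> None:
        # Called by each agent on startup and periodically; overwrites any old address.
        self._peers[peer_id] = (ip, port, time.monotonic())

    def lookup(self, peer_id: str) -> Optional[tuple[str, int]]:
        # Latest address, or None if the peer is unknown or hasn't announced recently.
        entry = self._peers.get(peer_id)
        if entry is None or time.monotonic() - entry[2] > self.ttl:
            return None
        return entry[0], entry[1]
```

If office-a first announces 203.0.113.5 and later re-announces 198.51.100.7, the next lookup returns the new address with no manual reconfiguration.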
Rsync & long-haul WAN connections
Rsync fails to utilize the available bandwidth over long, high-latency, or lossy connections, which leads to slow transfer speeds. The long distance between offices makes TCP packet travel time long (high latency) and increases the chances of packet loss due to equipment failure or congestion, and TCP slows down significantly on these types of networks.
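A widely used rule of thumb for how latency and loss cap a single loss-based TCP connection is the Mathis et al. approximation: throughput ≈ (MSS / RTT) × (C / √p), with C ≈ √(3/2). The Python sketch below evaluates it; it is an approximation for steady-state Reno-style TCP, and the link parameters in the example are illustrative, not measurements.

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput ceiling in bits/s (Mathis et al.).

    throughput ~= (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2).
    Only meaningful for loss_rate > 0; ignores timeouts and receive-window limits.
    """
    if loss_rate <= 0:
        raise ValueError("loss_rate must be > 0")
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

# Same 1% loss, short vs. long round-trip time (illustrative values).
for rtt in (0.01, 0.2):
    mbps = mathis_throughput_bps(1460, rtt, 0.01) / 1e6
    print(f"RTT {rtt * 1000:.0f} ms -> ~{mbps:.1f} Mbit/s")
```

This prints roughly 14.3 Mbit/s at 10 ms and 0.7 Mbit/s at 200 ms: a 20x longer round trip means a 20x lower ceiling per connection, however much bandwidth the link actually has.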
Resilio Connect has a built-in, pre-configured WAN optimization module. With Connect you can utilize 100% of the available bandwidth in your network independent of distance, latency, or loss. Resilio Connect uses a unique, UDP-based protocol called uTP3, which uses bulk packet transfer with selective acknowledgment of lost packets.
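Selective acknowledgment means the receiver reports exactly which packets arrived, so the sender retransmits only the gaps instead of everything after the first loss. The Python sketch below shows only that bookkeeping step, with packets identified by hypothetical integer sequence numbers; it is not the uTP wire format.

```python
def missing_ranges(received: set[int], highest_sent: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) ranges of sequence numbers 0..highest_sent not yet received."""
    gaps = []
    start = None
    for seq in range(highest_sent + 1):
        if seq not in received:
            if start is None:
                start = seq  # a new gap begins
        elif start is not None:
            gaps.append((start, seq - 1))  # the gap ended just before seq
            start = None
    if start is not None:
        gaps.append((start, highest_sent))
    return gaps
```

For received = {0, 1, 2, 5, 6, 9} and highest_sent = 10, the gaps are (3, 4), (7, 8), and (10, 10), so the sender resends five packets; a go-back-N sender relying only on cumulative acknowledgments would resend everything from 3 onward.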
Rsync & multiple destinations
Using rsync to synchronize files to multiple destinations is very inefficient. One option is to run multiple rsync instances at once, but this splits the network channel and increases the time to complete the transfer to any single destination. Another approach is “follow the sun”, where files are transferred first to one destination and then, once that transfer completes, to a second destination. This way each transfer uses the full bandwidth, but the second destination has to wait until the transfer to the first one is done. Both approaches are slow and fragile, and both leave most of your network underutilized.
Resilio Connect uses a scale-out, peer-to-peer approach that leverages the network links between all offices/servers and significantly speeds up data transfers. It splits each file into blocks and sends these blocks independently, and each recipient can forward a block to other recipients as soon as it has received it. This dramatically speeds up syncing: Resilio transfers concurrently to N destinations and makes efficient use of network capacity that would otherwise sit unused.
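A back-of-the-envelope model shows why relaying blocks between recipients helps. The Python sketch below makes idealized assumptions: every site has the same bandwidth, there is no protocol overhead, and in the relay case blocks are forwarded along a simple chain of full-duplex links. Real peer-to-peer scheduling is more sophisticated, the 4 MB block size is arbitrary, and these numbers are not Resilio benchmarks.

```python
def sequential_s(size_gb: float, bw_gbps: float, n_dest: int) -> float:
    """'Follow the sun': one destination at a time, each at full bandwidth."""
    return n_dest * size_gb * 8 / bw_gbps

def parallel_split_s(size_gb: float, bw_gbps: float, n_dest: int) -> float:
    """N simultaneous rsync streams share the sender's uplink, each at bw / N."""
    return size_gb * 8 / (bw_gbps / n_dest)

def relay_chain_s(size_gb: float, bw_gbps: float, n_dest: int, block_mb: float = 4.0) -> float:
    """Pipelined relay: each recipient forwards a block to the next as soon as it arrives.
    The last destination finishes one full-file time plus (n - 1) block hops later."""
    file_s = size_gb * 8 / bw_gbps
    block_s = block_mb * 8 / 1000 / bw_gbps
    return file_s + (n_dest - 1) * block_s

# 10 GB to 5 destinations over 1 Gbit/s links (illustrative values).
for name, fn in [("sequential", sequential_s), ("parallel split", parallel_split_s),
                 ("relay chain", relay_chain_s)]:
    print(f"{name:>15}: {fn(10, 1.0, 5):.1f} s")
```

Under these assumptions, both rsync strategies need about 400 s, while the relay chain delivers to all five sites in about 80 s, roughly the time it takes to send a single copy.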
Rsync & NAT
Unfortunately, rsync doesn’t mix well with NAT. To reach a device behind a NAT, you need to forward a port to it (typically TCP port 22 for rsync over SSH, or TCP port 873 for the rsync daemon).
Unlike rsync, Resilio Connect uses NAT traversal techniques that establish a direct connection between computers without needing manual configuration.
Resilio also provides a Resilio Connect Proxy Server and other enhancements in release 2.12 and later.
Resilio Connect vs. Rsync
Here is a handy summary table of the features needed in a synchronization solution today, and how Resilio Connect stacks up as an rsync alternative.
Feature | Resilio Connect | Rsync
--- | --- | ---
Delta encoding | + | +
Compression | + | +
Dynamic IP support | + | –
Built-in encryption | + | –
WAN optimization | + | –
NAT traversal | + | –
Cross-platform | + | +
1M+ files | + | –
Big folders | + | –
Real-time file sync | + | –
Resilio Connect: An Excellent Rsync Alternative
Through real-time file synchronization and other key functionality, Resilio Connect scales out data movement in parallel over any network, scaling transfer performance up to 20x faster than rsync. Resilio enables true multidirectional (n-way) file delivery, overcoming network bottlenecks–over any distance and location.
The Resilio Connect Management Console is a centralized, web-based management system used to manage and monitor all job functions through an easy-to-administer graphical user interface. Optionally, Resilio offers a complete API set to expose and automate all functions performed by the Management Console. You can install and configure the Resilio Connect Management Console on Windows and Linux (Ubuntu, et al.), including in virtual machines. Each server supports at least 10,000 endpoints (devices, cloud storage buckets, desktops, or servers running a Resilio agent), and the Management Console can be clustered to support massive numbers of endpoints. One Resilio customer has over 100,000 managed endpoints in its edge computing deployment.
Resilio Connect also supports a variety of operating systems: Microsoft Windows, Linux, Apple macOS, BSD (FreeBSD), Android, and a variety of NAS devices (too many to list).
Unlike cloud file sharing and collaboration tools (Google Drive, Microsoft OneDrive, Dropbox, WebDAV, et al.), Resilio Connect enables users to collaborate directly within and between companies, over any distance. Jobs for distribution, consolidation, and file synchronization enable rapid transfer and synchronization of very large files (there is no size limit) and of up to many millions of files.
Are you interested in learning more to see if Resilio Connect is the Rsync replacement or alternative you’ve been looking for?
Please schedule a Resilio Connect demo or start a free trial to see how much faster and more reliable Resilio Connect is compared to Rsync.