AWS DataSync is one of many file transfer solutions from AWS. It can be used for ingest, data migration, and a variety of file transfer and synchronization scenarios. There are even ways to parallelize data transfers from point to point. But, alas, that’s just what DataSync is: point-to-point transfer and sync.
DataSync is functional if you need to move or sync files in one direction or if you need a one-time data migration from on-prem to the cloud. It’s an ongoing, pay-as-you-go solution used for transferring or synchronizing files between (at most) two endpoints at a given time. Its performance suffers because it’s limited to the slowest endpoint, and each endpoint is a potential point of failure. When you add these problems to the complexity of AWS pricing (DataSync bills per gigabyte transferred, and using it can lead to other AWS fees downstream), it leaves ample room for alternatives, in particular Resilio Connect.
Businesses such as Blizzard Entertainment, Match.com, Skywalker Sound, Warner Bros, and more rely on Resilio Platform for blazing-fast synchronization, edge ingest, and more.
To see Resilio in action, schedule a demo with our team.
In contrast to AWS DataSync, Resilio Platform enables any-directional, real-time sync. You can have multiple users in multiple regions continuously synchronizing files in real-time. We have customers like Sunrise Productions that iterate on high-value assets in real-time across multiple AWS regions. They are able to extend their contributors to the far edge using Resilio Connect. How? Resilio’s n-way sync: they can scale their global pipeline by simply adding Resilio agents on their devices and adjusting bandwidth to AWS.
If you have users in remote locations across the globe, they often experience high latency using AWS. But Resilio all but eliminates cross-region latency in AWS. Because everyone can sync in real-time, a change made by one user in one location is reflected across all other endpoints and regions within about 4-5 seconds, depending on the bandwidth. On average, Resilio sees about a 10x speedup compared to conventional point-to-point tools like AWS DataSync.
Resilio lets you make better use of bandwidth and network capacity while using storage (file, block, or object) more efficiently. It can also be used as a file gateway, providing low-latency and efficient access to files stored anywhere: on-prem or in any cloud (not just AWS).
Another big advantage of Resilio is resilience. You can reliably move data in any direction: from thousands of endpoints collecting data at the remote edge to pushing updates to thousands of locations globally. Resilio makes exceptional use of any available network: Our proprietary WAN acceleration protocol makes Resilio the most rugged and fair user of your bandwidth on any network, including VSAT, Wi-Fi, 3G/4G/5G cell networks, or any IP network.
Suffice it to say, Resilio Platform outperforms AWS DataSync in every way. That said, DataSync may be the better choice in certain use cases. So, in this article, we’ll discuss:
- What AWS DataSync does well and what it does poorly (and how to use that information to decide if it’s right for you).
- How Resilio Platform provides a more reliable, faster, scalable alternative to AWS DataSync (and a better user experience to boot).
Customers rely on Resilio Platform to distribute and synchronize data for media workflows (Turner Sports, Innovative), gaming (Wargaming, Larian Studios), remote operations (Mercedes-Benz, Buckeye Power Sales), and more. To see for yourself how Resilio Platform can drastically improve data sync into, within, and out of the AWS cloud, schedule a demo.
Is AWS DataSync Right for You?
Below, learn the basics of what AWS DataSync can do, when to choose it, and where it falls short.
What AWS DataSync Offers
AWS DataSync is a point-to-point data sync solution designed to sync data between on-prem storage (and other remote devices) and AWS storage services. Data sources can be mounted in a variety of ways: using Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), Amazon Simple Storage Service (Amazon S3), and others. Your sources may be AWS Snowcone or Snowball devices, self-managed object storage, Amazon EFS (Amazon Elastic File System), or Amazon FSx (for Windows File Server, Lustre, NetApp ONTAP, and OpenZFS file systems), among others.
DataSync is neither the most complicated sync tool nor the easiest to use. You can create DataSync tasks through the DataSync console, AWS CLI (Command-Line Interface), or DataSync API. Meanwhile, DataSync Discovery scans and indexes files on on-premises devices to make suggestions about how much data to migrate to the cloud (note that Discovery does not work with AWS Direct Connect locations, but you can still move data in and out of VPC endpoints). If you want to monitor and report on transfers, you’ll need to spend the extra money for Amazon CloudWatch and AWS CloudTrail.
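To make the API route a little more concrete, here is a minimal Python sketch of the arguments a scheduled, bandwidth-limited task might take. The helper names, ARNs, and task name are hypothetical. The dictionary keys are intended to mirror the `create_task` operation of the DataSync API as exposed by boto3, and the rate-style schedule expression is an assumption, so check both against the current AWS documentation. The sketch only builds the request, so it runs without AWS credentials.

```python
from typing import Optional


def mbps_to_bytes_per_second(mbps: float) -> int:
    """Convert megabits per second to bytes per second."""
    return int(mbps * 1_000_000 / 8)


def build_task_request(source_arn: str, dest_arn: str, name: str,
                       limit_mbps: Optional[float], schedule: str) -> dict:
    """Assemble keyword arguments for a DataSync create_task call (sketch)."""
    if limit_mbps is None:
        bytes_per_second = -1  # -1 is treated as "use all available bandwidth"
    else:
        bytes_per_second = mbps_to_bytes_per_second(limit_mbps)
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Name": name,
        "Options": {
            "BytesPerSecond": bytes_per_second,
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify only what was copied
            "TransferMode": "CHANGED",               # copy only changed data
        },
        "Schedule": {"ScheduleExpression": schedule},
    }


# Hypothetical ARNs, for illustration only.
request = build_task_request(
    source_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
    dest_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
    name="nightly-nas-to-s3",
    limit_mbps=100,
    schedule="rate(1 hour)",
)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("datasync").create_task(**request)
```

Keeping the request as a plain dictionary makes it easy to review or version-control a task definition before anything is created in your AWS account.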
DataSync protects your data using TLS end-to-end encryption and performs data integrity checks on files both in transit and at rest (at both the source location and destination location). However, it only supports using default at-rest encryption for Amazon S3 buckets and supports encryption of data at rest and in transit for Amazon EFS and Amazon FSx. It also enables you to create policies that secure access to services and resources using AWS Identity and Access Management (IAM).
When to Use DataSync
DataSync is a solid choice for one-time data migrations moving in a single direction. For example, let’s say you have some filers (NAS systems) on-premises you need to mount via NFS or SMB and migrate to AWS. You could use either Resilio Platform or AWS DataSync to mount these using any number of agents and perform a brute-force copy or sync from the source to the destination. Resilio will give you some added benefits around reliability, scalability, and ease of management, but you can do a similar thing by mounting multiple AWS DataSync agents and partitioning the scan across the NAS. These one-time lift-and-shift projects are easy enough to do without Resilio Connect.
When to Consider Alternatives to DataSync
Where Resilio Platform shines (and DataSync falls short) is in remote areas and data flows where many files (or file changes) rapidly flow in multiple directions. For example, if you have data in a remote (yet human-accessible) Quonset hut, you could send out a Snowball or Snowcone device, transfer the files, and mail it back to AWS. But, if you have many remote (and perhaps not even easily human-accessible) huts or boats, drones, or moving trucks, it’s much easier to have Resilio installed on your target endpoints (or edge centers) and move data in and out as you please.
For example, you could set up a many-to-one flow from the extreme edge back to a core data center or AWS region and include two-way transfers back and forth between some of the endpoints or edge centers. You can also go remote-to-remote (endpoint-to-endpoint). For example, you may have construction crews working off the side of a cliff who need to share data. Resilio enables them to share data directly without reaching back up to a hub (e.g., the AWS cloud) for updates.
With DataSync, the data flows always need to be point-to-point: you’d transfer or sync from the edge to the hub and then back out to the edge again. Resilio does enable this (at a much vaster scale), but it also opens up pathways to reliable data sharing at the edge or anywhere.
But what about DataSync’s WAN optimization features? Unfortunately, the WAN optimization features offered by DataSync are basic and may not work across unreliable connections. They are also bound by its point-to-point replication architecture, yielding slower sync speeds and, if not architected correctly, presenting single points of failure. Other than a hub-and-spoke scenario, DataSync will not be able to provide true multi-directional synchronization such as a full mesh N-way sync. For customers needing to support more advanced scenarios — such as large-scale ingest across poor networks, global remote work and hybrid work initiatives, file-based “hot site” DR (disaster recovery), and web or app server deployment — we highly recommend you take a look at Resilio Connect.
Many customers use Resilio Platform to keep their global AWS data pipelines flowing and current within sub-five seconds of a file change. According to Chris Botha, the IT manager at Sunrise Productions, they simply couldn’t do what they do on AWS without Resilio Connect.
“After COVID, we saw a massive decentralization of talent,” said Botha. “It used to be all of the talent was in the Western US, around LA, London, India, and then we had a small subset of that in South Africa. But as soon as everyone realized that they could work from home, that talent just scattered all over the planet. So it’s been quite a big infrastructure redesign on our side, but there is, to be honest, zero chance we’d be able to do it without Resilio actually keeping all that data in sync, because there are 15 departments in our (AWS) pipeline, and everything’s dependent on the one before.”
“I don’t know of any other tool that would be able to let us have a central data source that is 99% in sync across 16 countries,” he added.
Directly Comparing AWS DataSync & Resilio Connect
While a table can’t capture every nuance, the following chart gives a succinct look at the similarities and differences between AWS DataSync and Resilio Connect. In the following section, you can explore more deeply the features of Resilio that interest you the most.
Feature | AWS DataSync | Resilio Connect |
Protocol | ⚬ NFS ⚬ SMB ⚬ AWS-designed WAN transfer protocol | ⚬ NFS ⚬ SMB ⚬ UDP-based Zero Gravity Transport (WAN optimization) |
Speed | Up to 10 Gbps per transfer (over network link only) | 100+ Gbps per transfer (See how scale-out, or horizontal scalability, works) |
Cloud Object Storage Support | ⚬ Works with all AWS object storage services ⚬ Limited options with Google Cloud Storage and Azure Blob Storage | Works with a variety of AWS and 3rd-party object storage services, located anywhere (remote, on-prem, or cloud). These include but are not limited to: ⚬ AWS S3 services ⚬ Azure Blobs ⚬ Google Cloud Storage ⚬ Cloudian ⚬ Ceph ⚬ MinIO ⚬ VAST Data ⚬ Wasabi ⚬ Weka IO |
Sync Types | Scheduled sync | ⚬ Selective sync on-demand ⚬ Partial selective sync ⚬ Scheduled sync ⚬ Multi-directional real-time sync |
Sync Architecture | Point-to-point: Transfer between at most two endpoints | Peer-to-peer: Transfer among any number of endpoints concurrently |
Data Flows | ⚬ Unidirectional point-to-point (one-way only) ⚬ One-to-many ⚬ Hub-and-spoke ⚬ Follow-the-sun or chaining (Transfer from A to B and then from B to C) ⚬ Many-to-one via multiple one-to-one transfers | ⚬ Unidirectional point-to-point (source/destination) ⚬ Reliable bidirectional (two-way) across any number of endpoints ⚬ One-to-many ⚬ Hub-and-spoke ⚬ Many-to-one via a single point of management ⚬ Many-to-many (or full mesh / N-way) transfer or sync |
Bandwidth Utilization Controls | Provides granular bandwidth controls | Provides granular bandwidth controls |
Security Features | ⚬ TLS encryption ⚬ Data integrity validation | ⚬ Mutual authentication ⚬ AES 256-bit encryption ⚬ Data immutability ⚬ Cryptographic data integrity validation ⚬ Access control |
WAN Optimization | AWS-designed WAN transfer protocol | UDP-based Zero Gravity Transport protocol |
Reliability | Single head architecture creates single points of failure (SPOF): The point-to-point sync architecture for AWS DataSync creates the potential for sync failure at each endpoint | ⚬ No single point of failure — all agents work together; if one fails another takes over ⚬ Cryptographic data integrity validation ⚬ Fault-tolerant transfer (the protocol automatically resumes every failed transfer from the point of interruption until the job is 100% completed) |
Monitoring & Reporting | Integrates with Amazon CloudWatch and AWS CloudTrail for monitoring and reporting (for an additional cost) | Native monitoring and reporting through Resilio Management Console |
Ready to see Resilio Platform in action? Schedule a demo now.
How Resilio Platform Provides Superior Synchronization for Data Stored in Any Cloud
Resilio Platform is a superior sync solution to AWS DataSync because it:
- Uses a P2P (peer-to-peer) replication architecture that syncs files up to 10x faster, eliminates single points of failure, syncs in any direction, and enables you to organically scale your sync environment.
- Uses a proprietary WAN optimization protocol to reliably sync files over any network, providing superior edge synchronization and cross-region replication.
- Works with any operating system and cloud storage provider so that you can manage your on-prem and cloud data from one unified location.
- Enables you to easily manage, monitor, and report on sync jobs through a built-in Management Console.
- Includes native, iron-clad security features to protect your data.
High-Performance Synchronization
Traditional sync solutions like AWS DataSync use point-to-point sync architectures, such as:
- Hub-and-spoke: One server acts as a central server. Every remote server must first send data to the central server, which then syncs the data with the other remote servers one by one. Remote servers can’t communicate with each other at all.
- Follow-the-sun: One server syncs data with another server sequentially (i.e., Server 1 syncs with Server 2; then Server 2 syncs with Server 3; and so forth).
Point-to-point architectures suffer from many weaknesses. The fact that synchronization occurs from one server to another means that syncing your entire system will take longer, particularly when syncing large files and large numbers of files, or when syncing to many endpoints.
Point-to-point architectures also introduce single points of failure. If any server or network goes down, the remaining servers must wait for the sync to complete before receiving updates. And, in a hub-and-spoke architecture, the central server must stay online at all times in order for synchronization to occur at all. This forces many organizations to invest in expensive backup servers and failover infrastructures.
But, with Resilio Connect’s P2P architecture, every server can share files with every other server simultaneously. Resilio also splits the file into several chunks that can transfer independently from each other (called file chunking). Altogether, this leads to sync speeds 3-10x faster than traditional solutions.
For example, imagine you want to sync a file among five servers. Resilio can split that file into five chunks. Server 1 can share the first chunk with Server 2, the second chunk with Server 3, and so on. Server 2 can immediately start sharing that file chunk with Server 3, even before it receives the rest of the file. With every server working together to sync your files, you can utilize the full bandwidth of your sync environment. We’ve seen sync speeds of 100+ Gbps site to site.
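The five-server example can be explored with a toy simulation in Python. This is a simplified model under stated assumptions, not Resilio’s actual scheduling logic: time is divided into rounds, every link has the same speed, and each server can upload one chunk and download one chunk per round. The function names and the "rarest chunk first" rule are illustrative choices borrowed from generic peer-to-peer designs.

```python
def hub_and_spoke_rounds(num_servers: int, num_chunks: int) -> int:
    """Rounds needed when only the hub uploads, one chunk to one peer per round."""
    return num_chunks * (num_servers - 1)


def p2p_rounds(num_servers: int, num_chunks: int) -> int:
    """Rounds needed when every server may upload one chunk per round."""
    have = [set() for _ in range(num_servers)]
    have[0] = set(range(num_chunks))  # server 0 starts with the whole file
    rounds = 0
    while any(len(chunks) < num_chunks for chunks in have):
        rounds += 1
        # Servers can only share chunks they held when the round started.
        snapshot = [set(chunks) for chunks in have]
        receiving = set()  # each server downloads at most one chunk per round
        for sender in range(num_servers):
            candidates = [
                r for r in range(num_servers)
                if r != sender
                and r not in receiving
                and snapshot[sender] - snapshot[r]
            ]
            if not candidates:
                continue
            # Serve the peer that currently holds the fewest chunks.
            receiver = min(candidates, key=lambda r: (len(snapshot[r]), r))
            missing = snapshot[sender] - snapshot[receiver]
            # Send the chunk held by the fewest servers (rarest first).
            chunk = min(missing, key=lambda c: (sum(c in s for s in snapshot), c))
            have[receiver].add(chunk)
            receiving.add(receiver)
    return rounds


if __name__ == "__main__":
    print("hub-and-spoke rounds:", hub_and_spoke_rounds(5, 5))
    print("peer-to-peer rounds:", p2p_rounds(5, 5))
```

In this model the hub-only approach always needs one round per chunk per receiving server, while the peer-to-peer approach finishes sooner because servers that already hold a chunk begin uploading it to others.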
On top of that, Resilio uses optimized checksum calculations and notifications from the host OS to immediately detect and sync only the changed portions of a file (real-time change detection). Not only does this speed up the process, but it also helps you save money (such as on AWS egress fees).
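The general idea of syncing only the changed portions of a file can be sketched with per-block checksums. The Python below is a generic illustration, not Resilio’s implementation: the block size, function names, and the choice of SHA-256 are assumptions, and fixed-offset blocks like these do not handle insertions that shift the rest of the file.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Indexes of blocks in `new` that differ from, or do not exist in, `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        index for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]


original = b"aaaabbbbcccc"
edited = b"aaaaXbbbcccc"  # one byte changed inside the second block
to_send = changed_blocks(original, edited)  # only these blocks need to travel
```

Only the blocks listed in `to_send` would need to cross the network, which is where the bandwidth and egress savings come from.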
Sync in Any Direction
While solutions like DataSync can only perform one-way and two-way sync, Resilio Platform can replicate data in any direction — one-way, two-way, one-to-many, many-to-one, and N-way. Because every server can share files with every other server, there are no single points of failure in your environment. If any server or network goes down, the necessary files or services can be retrieved from any other server in your system.
N-way sync is particularly useful when you need to sync large amounts of data across many geographically distributed endpoints (on-prem and cloud).
For example, organizations with remote workers or distributed workforces can use N-way sync to ensure that every team member in each office can make changes to files and have those changes immediately distributed to every other office, so everyone is working on the exact same files.
And, when recovering from a disaster (such as a server failure), Resilio can utilize all of your servers to achieve sub-five-second RPOs (Recovery Point Objectives) and RTOs (Recovery Time Objectives) measured in minutes.
Scale Organically
Since every server in a P2P environment can communicate with every other server, your sync environment is organically scalable. The more servers you add to your environment, the more servers you have that can contribute to synchronization (i.e., more bandwidth and resources). In other words, more demand creates more supply so that Resilio can handle ever-increasing workloads.
Resilio can sync hundreds of endpoints in roughly the same time it takes most solutions to sync two. Resilio can also sync files of any size and number (we successfully tested and synced 450+ million files in a single job).
Sync Types
AWS DataSync only supports scheduled syncs, which can be configured hourly, daily, or weekly.
But Resilio Platform supports both scheduled syncs and real-time syncs. You can optimize your sync configuration around your bandwidth consumption (such as syncing files outside business hours, when consumption is low) or sync data immediately to ensure every server has the most updated versions of files.
Efficient File Access
You can also use Resilio Platform as an efficient object storage gateway solution to ingest, sync, and access files stored in AWS or any other cloud storage provider.
When storing data in the AWS cloud, many organizations don’t properly account for AWS data egress pricing. AWS charges fees whenever you transfer data within the AWS cloud (within and across cloud regions) as well as from the AWS cloud out over the internet. And, in order to avoid expensive storage bills, organizations need to calculate and optimize AWS egress costs.
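A back-of-the-envelope egress estimate can be sketched in a few lines of Python. The per-gigabyte rate, dataset size, and change fraction below are placeholders chosen for illustration, not actual AWS prices or customer figures, and the function names are hypothetical. Substitute the rates that apply to your regions and traffic type.

```python
def egress_cost(gb_transferred: float, rate_per_gb: float) -> float:
    """Cost of moving a given number of gigabytes at a flat per-GB rate."""
    return gb_transferred * rate_per_gb


def monthly_egress(dataset_gb: float, daily_change_fraction: float,
                   days: int, rate_per_gb: float, changes_only: bool) -> float:
    """Compare re-sending the whole dataset daily with sending only the changes."""
    per_day_gb = dataset_gb * (daily_change_fraction if changes_only else 1.0)
    return egress_cost(per_day_gb * days, rate_per_gb)


# Placeholder inputs: 1,000 GB dataset, 2% changes per day, 30 days, $0.10 per GB.
full_copy = monthly_egress(1000, 0.02, 30, 0.10, changes_only=False)
delta_only = monthly_egress(1000, 0.02, 30, 0.10, changes_only=True)
print(f"full copy: ${full_copy:,.2f}  changes only: ${delta_only:,.2f}")
```

Even a rough model like this makes it easier to see how much of a bill comes from re-sending unchanged data.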
Unlike other cloud storage gateway vendors, Resilio Connect’s file gateway increases productivity and reduces costs. You can:
- Choose which files and folders are cached locally, so you can store frequently accessed files on on-premise servers and infrequently accessed files in the cloud. This reduces data egress costs and provides employees with quicker access to necessary files.
- Access files directly at each endpoint (rather than through centralized office servers), so you don’t need to invest in expensive servers for each office.
- Create policies that control which files get synced to which endpoints and how files are downloaded, locally cached, and purged — so you can automate the process and free employees to focus on their tasks rather than wasting time on manual syncs.
- Perform partial downloads as an end-user for quicker file access and reduced data egress traffic and charges.
- Give end users a unified view of files that operates much like Microsoft OneDrive.
Note: Resilio can also help you save on fees via smart routing and identifying the optimal path for transferring files to their necessary destination. Sign up for a Resilio Platform demo here.
Superior WAN Acceleration
Transferring data across cloud regions or geographically distributed, on-prem endpoints requires wide-area networks (WANs), which suffer from high latency and varying degrees of packet loss. And, when syncing data from edge devices, you need a sync solution that can optimize transfers in areas with little to no network connectivity (such as when syncing in remote locations or at sea).
DataSync enhances WAN transfer using an AWS-designed WAN transfer protocol. The protocol optimizes WAN transfers using:
- Incremental transfers
- Sparse file detection
- In-line compression
- Multi-threaded connections between the local DataSync agent and the in-cloud service components
Resilio Connect, however, uses a highly-resilient, WAN-optimized UDP-based transport protocol known as Zero Gravity Transport™ (ZGT). ZGT ensures data always arrives at its destination location and enables bulletproof predictability of transfers over any network by using:
- A congestion control algorithm that constantly probes the RTT (Round Trip Time) to calculate and maintain the ideal data packet send rate.
- Sending interval acknowledgments for groups of packets, rather than acknowledging each individual packet receipt.
- Retransmitting lost packets in groups once per RTT.
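ZGT is proprietary, so the following Python is only a generic illustration of the three ideas in the list above, not the protocol itself. It assumes a simple delay-based rule (back off when the measured RTT rises well above the baseline, otherwise probe upward) and a receiver that reports on a whole window of packets at once. All names, thresholds, and step sizes are hypothetical.

```python
def next_send_rate(rate: float, base_rtt: float, measured_rtt: float,
                   step: float = 0.1, tolerance: float = 1.25) -> float:
    """Delay-based adjustment: slow down when RTT inflates, otherwise speed up."""
    if measured_rtt > base_rtt * tolerance:
        return rate * (1 - step)  # queues are building, so back off
    return rate * (1 + step)      # path looks clear, so probe for more bandwidth


def missing_in_window(received: set, window_start: int, window_size: int) -> list:
    """Sequence numbers in the window that the receiver has not seen.

    One report like this can acknowledge a whole group of packets and tell
    the sender which ones to retransmit together.
    """
    return [
        seq for seq in range(window_start, window_start + window_size)
        if seq not in received
    ]


# The receiver saw packets 0, 1, 3, 4, and 7 out of a window of eight.
to_retransmit = missing_in_window({0, 1, 3, 4, 7}, window_start=0, window_size=8)
slower = next_send_rate(100.0, base_rtt=0.05, measured_rtt=0.10)
faster = next_send_rate(100.0, base_rtt=0.05, measured_rtt=0.05)
```

Acknowledging groups of packets cuts down on return traffic, which matters most on high-latency links where every round trip is expensive.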
Resilio can also utilize VSAT, cell (3/4/5G), Wi-Fi, or any IP connection — so it can reliably sync data using any device or operating system, ingest data from the edge into the cloud, and immediately sync it across your entire environment.
Case Study: Northern Marine Group
Northern Marine Group provides ship management and marine services to a global customer base. Previously, they had to mail CDs with software updates, and it still took weeks to troubleshoot all the installs on the ships. Now, they use Resilio Platform to quickly and reliably distribute and synchronize updates across their fleet of sea vessels.
“Being able to use the scripting engine meant that we could essentially have the updates distribute, install and report back on status all automatically, allowing us to avoid the installation going wrong because of user error,” says Clark. “For a few of the ships where we did have issues, all we had to do was create some modified scripts and distribute those using Resilio as well.”
What used to take six months now takes only two weeks thanks to Resilio. Learn more about how Resilio Platform helped Northern Marine Group keep their fleet of vessels in compliance 92% faster than their previous solution.
Versatile Hybrid and Multi-Cloud Management
Both Resilio Platform and AWS DataSync are agent-based solutions that can be deployed on your existing IT infrastructure.
But DataSync can only be used to sync data from on-prem devices and within AWS cloud storage services, plus some limited communication with Google Cloud Storage and Azure Blob Storage. If you’re storing data in multiple cloud platforms, you’ll need a separate solution for managing and syncing data there — increasing costs and the complexity of managing your data.
Resilio Platform, by contrast, is a hardware and cloud-vendor agnostic solution that works with:
- Any popular operating system, such as Microsoft Windows, Linux, macOS, OpenBSD, FreeBSD, Unix, iOS, Android, and more (Resilio also offers apps for mobile devices).
- Any cloud storage providers, such as AWS, Azure, Google Storage, MinIO, Wasabi, Backblaze, and more.
- Your current IT setup (i.e., the servers, desktops, networks, DAS, NAS, and SAN storage your team is currently using).
- VMware, Citrix, and other hypervisors and virtual machine platforms.
Resilio Platform can be deployed with minimal operational interruption. You can install Resilio agents on your devices and begin syncing in as little as two hours. And it enables you to manage your entire data storage environment (i.e., all on-prem and cloud endpoints) from a single, unified location.
Granular Control & Bandwidth Usage Automation
One of the key features DataSync promotes is its ability to control bandwidth utilization in your environment (so you can control costs and optimize transfers). For example, DataSync enables you to throttle transfer speeds up to 10 Gbps during off hours and set limits when network availability is needed elsewhere.
Resilio Platform provides users with granular control over bandwidth allocation for each endpoint. You can use Resilio’s Management Console to manually adjust bandwidth utilization and even create profiles for each endpoint that govern how much bandwidth it’s allotted at certain times of the day and on certain days of the week.
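A time-of-day bandwidth profile of the kind described here can be pictured as a small lookup table. The Python sketch below uses a hypothetical data structure and limits; it is not the Management Console’s format or API, just a way to show how such a profile resolves to a limit at a given moment.

```python
from datetime import datetime

WEEKDAYS = {0, 1, 2, 3, 4}  # Monday through Friday, as in datetime.weekday()
ALL_DAYS = set(range(7))

# Hypothetical profile rules: (days, start_hour, end_hour, limit_mbps).
# The first matching rule wins.
OFFICE_PROFILE = [
    (WEEKDAYS, 8, 18, 50),     # business hours: leave room for other traffic
    (ALL_DAYS, 0, 24, 1000),   # nights and weekends: use the full link
]


def limit_for(moment: datetime, profile=OFFICE_PROFILE) -> int:
    """Return the bandwidth limit (in Mbps) that applies at the given moment."""
    for days, start_hour, end_hour, limit_mbps in profile:
        if moment.weekday() in days and start_hour <= moment.hour < end_hour:
            return limit_mbps
    raise ValueError("profile does not cover this time")


print(limit_for(datetime.now()), "Mbps allowed right now")
```

Giving each endpoint its own profile list would let a busy office and an unattended edge site follow different rules.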
Centralized, User-Friendly Console
Resilio Platform includes built-in management and reporting through the Resilio Management Console. You can use the Management Console to:
- Control bandwidth allocation for each endpoint.
- Manage files stored in AWS (or any other cloud).
- Deploy instructions across public, private, and hybrid cloud storage.
- Collect logs and get notifications sent to email or Webhooks.
- Create and control replication jobs and collect real-time performance metrics.
- Configure replication parameters, such as disk I/O, buffer size, and more.
- Manage and monitor Resilio agents and job functions.
You can also create replication jobs using Resilio’s command-line interface, or use Resilio’s REST API to script any type of automation and functionality your job requires.
Iron-Clad Security Features
Resilio Platform encrypts all data at rest and in transit using AES 256-bit encryption. It also utilizes cryptographic data integrity validation to ensure data arrives at its destination intact and uncorrupted.
Resilio also stores immutable copies of files in the public cloud in order to protect you from ransomware and data loss (known as data immutability).
You can use Resilio’s Management Console to control permissions for specific files and folders.
All of Resilio’s security features were reviewed by 3rd party security experts. And, since they’re native to Resilio’s software, there’s no need for you to invest in 3rd party security solutions or VPNs.
Use Resilio Platform for Hybrid and Multi-Cloud Sync
While DataSync works well for synchronizing small files and smaller numbers of files in the AWS cloud, it fails to provide the versatility and benefits offered by Resilio Connect, such as:
- Faster sync: Resilio’s P2P architecture delivers sync speeds 3-10x faster than traditional solutions.
- Multi-directional sync: Resilio can sync data in any direction, including one-way, two-way, one-to-many, many-to-one, and N-way. This makes it a superior solution for syncing large numbers of geographically distributed endpoints.
- Organic scalability: As your sync environment grows, you’ll have more bandwidth and resources available. Resilio can sync files of any size and number, and can sync hundreds of endpoints in roughly the same time it takes most solutions to sync two.
- No single points of failure: P2P sync eliminates single points of failure. If any server goes down, other servers can step in to deliver the necessary files or services.
- Superior edge sync: Resilio’s proprietary WAN acceleration protocol enables it to optimize synchronization over any network. And it can utilize any network to reliably sync data in areas with little to no network connectivity.
- Versatility: Resilio can be deployed on your existing environment and supports any cloud storage provider — enabling you to manage your data infrastructure from one unified location.
- Built-in management: You can use Resilio’s Management Console to control, monitor, and report data transfers in real-time.
- Native security: Resilio includes native security features that encrypt and protect your data at rest and in transit.
Frequently Asked Questions
What Is AWS DataSync?
AWS (Amazon Web Services) DataSync is a software-only, agent-based data transfer service for data migration and synchronization between on-prem devices and AWS cloud storage services. The DataSync service uses a scanning and indexing feature known as Discovery to make suggestions on which files to transfer to the cloud and when. It protects data at rest and in transit using TLS encryption and data integrity validation. And it enables you to create sync schedules and utilize bandwidth throttling in order to control costs and reduce the burden on your network.
What Is the Difference between AWS DataSync and AWS Storage Gateway?
AWS DataSync is a service for transferring and syncing data between on-premises storage devices and your AWS account.
AWS Storage Gateway is a solution that provides low-latency access to datasets stored in AWS cloud storage. It includes Amazon Tape Gateway (for accessing iSCSI-based virtual tape libraries of virtual tape drives and a virtual media changer), Amazon S3 File Gateway (for accessing files stored as objects), and Amazon Volume Gateway (for accessing data stored as block storage volumes).
What Is the Difference between AWS DataSync and S3 Transfer Acceleration?
AWS DataSync and S3 Transfer Acceleration provide similar capabilities. But S3 Transfer Acceleration is designed for use cases involving web or mobile applications with widespread users or applications hosted far away from their S3 bucket. S3TA enhances transfer speeds by 50-500% by reducing the variability in internet routing, network congestion, and speeds.