S3 Replication Latency: What to Expect & How to Reduce Delays

Eleanor Parker

S3 Replication is a useful feature for asynchronously replicating objects across Amazon S3 buckets in the same or different AWS Regions. Most new objects typically replicate within 15 minutes, but delays of up to a few hours are possible, depending on the size and number of objects.

You can monitor the process by enabling Amazon S3 Replication Time Control (S3 RTC), which automatically turns on S3 replication metrics. These metrics publish data to CloudWatch, which lets you track:

  • Bytes pending.
  • Replication latency.
  • Operations pending.
  • Operations that failed replication.

Note: Check out AWS’s docs for a tutorial on how to enable replication metrics via the AWS Command Line Interface (CLI).
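If you prefer to read these metrics programmatically, the Python sketch below builds a CloudWatch `GetMetricStatistics` request for the `ReplicationLatency` metric (namespace `AWS/S3`, dimensions `SourceBucket`, `DestinationBucket`, `RuleId`) and, if boto3 and AWS credentials are available, fetches it. It assumes replication metrics are already enabled on the rule; the bucket names and rule ID you pass in are placeholders, and the 3-hour window and 5-minute period are arbitrary choices.

```python
from datetime import datetime, timedelta, timezone


def replication_latency_query(source_bucket, destination_bucket, rule_id,
                              hours=3, period_seconds=300):
    """Build kwargs for CloudWatch GetMetricStatistics (ReplicationLatency, in seconds)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "ReplicationLatency",
        "Dimensions": [
            {"Name": "SourceBucket", "Value": source_bucket},
            {"Name": "DestinationBucket", "Value": destination_bucket},
            {"Name": "RuleId", "Value": rule_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        # Maximum: the worst replication lag observed in each period
        "Statistics": ["Maximum"],
    }


def fetch_replication_latency(query):
    """Call CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**query)
    # Datapoints are not guaranteed to be ordered, so sort them by timestamp
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Calling `fetch_replication_latency(replication_latency_query("my-source", "my-destination", "my-rule"))` returns one datapoint per 5-minute period, each with a `Maximum` value in seconds.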

You can also set up Amazon S3 Event Notifications to get alerts whenever objects fail to replicate. These notifications can be delivered to AWS Lambda, Amazon SQS, or Amazon SNS.
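A minimal boto3 sketch of such an alert, assuming an existing SQS queue whose access policy already allows Amazon S3 to send messages to it (the queue ARN and bucket name are placeholders). It subscribes the source bucket to the `s3:Replication:OperationFailedReplication` event; SNS and Lambda targets use the `TopicConfigurations` and `LambdaFunctionConfigurations` keys instead. AWS’s documentation lists which replication events require S3 RTC or replication metrics to be enabled, so check it before relying on these alerts.

```python
def failed_replication_notification(queue_arn):
    """NotificationConfiguration that sends failed-replication events to an SQS queue."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:Replication:OperationFailedReplication"],
            }
        ]
    }


def apply_notification(source_bucket, config):
    """Apply the configuration to the source bucket; requires boto3 and credentials.

    This call replaces the bucket's entire existing notification configuration,
    so merge in any rules you already have before applying it.
    """
    import boto3  # imported lazily so the config builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=source_bucket, NotificationConfiguration=config
    )
```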

Despite its usefulness, S3 Replication can be slow and unreliable when replicating lots of large objects, especially with S3 Cross-Region Replication (S3 CRR), where the source and destination buckets are in different AWS Regions.

And there’s no simple way to reliably reduce replication latency without a third-party solution like Resilio Connect (you can learn more about Resilio and schedule a demo with our team).

Plus, many issues can cause the replication process to fail, including:

  • Permission misconfigurations in the replication rules, IAM roles, or AWS accounts.
  • Bucket and object ownership (when they’re owned by different accounts).
  • Versioning not enabled on the source or destination bucket, and more.
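Of these causes, versioning is the easiest to check automatically, because S3 Replication requires versioning to be enabled on both the source and destination buckets. The hypothetical helper below takes the responses from `GetBucketVersioning` (boto3’s `s3.get_bucket_versioning(Bucket=...)`) and flags buckets that would block replication. It is only a partial preflight check: it does not inspect IAM permissions or ownership.

```python
def versioning_problems(versioning_responses):
    """Return buckets whose versioning status would block S3 Replication.

    versioning_responses maps bucket name -> the dict returned by
    S3 GetBucketVersioning. The "Status" key is absent when versioning
    has never been enabled on the bucket.
    """
    problems = {}
    for bucket, response in versioning_responses.items():
        status = response.get("Status", "Never enabled")
        if status != "Enabled":
            problems[bucket] = status
    return problems


print(versioning_problems({
    "source-bucket": {"Status": "Enabled"},
    "destination-bucket": {"Status": "Suspended"},
}))
```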

This can cause lots of headaches for organizations that need their data distributed across the globe and kept up-to-date to meet workflow demands or compliance requirements. 

That’s why, in this article, we explore how Resilio Connect can help you drastically lower CRR latency and achieve more predictable replication times. 

Resilio Connect is our replication solution that delivers industry-leading replication speed thanks to its organically scalable P2P replication topology and proprietary WAN transfer technology. Our software is also:

  • Simple to set up and use, as you can deploy it on your existing infrastructure and start replicating in as little as two hours.

  • Flexible, as you can use it with any cloud provider (Amazon Web Services, Google Cloud, Microsoft Azure, etc.), on-prem, or in a hybrid cloud environment.

  • Secure, as it encrypts data at rest and in transit using AES 256 and employs other key data integrity mechanisms.

To learn how Resilio Connect can help your company replicate data across AWS regions, services, other cloud providers, and on-prem environments with drastically lower latency, schedule a demo with our team.

How Resilio Overcomes S3 Replication Latency

One of the biggest reasons for slow and unreliable replication speeds is the typical methodology used by replication tools. Most solutions rely on a client-server or “follow-the-sun” replication topology, both of which have significant downsides.

Specifically:

  • In the client-server model, one device is designated as the hub server and all others are clients. Only the hub can send objects to and receive objects from any device; clients can only send objects to the hub, so all replication must go through it first. For example, if Client 1 wants to replicate objects to the other clients in your environment, it must first send them to the hub, which then replicates them to the other clients.

  • In the “follow-the-sun” replication method, replication can only occur sequentially, from one device to the next. So if Device 1 wants to replicate objects to the other devices in your environment, it must first replicate them to Device 2, which then replicates them to Device 3, and so on.

It’s easy to see how both of these topologies create replication bottlenecks by limiting replication to only two devices at a time (be it from hub to client or from one device to another).

To overcome these issues, we’ve built a unique P2P (peer-to-peer) transfer architecture and a proprietary WAN (wide area network) optimization technology for Resilio Connect.

P2P Replication

P2P replication means that every device in your environment can replicate objects to any other device, without going through a hub server first.
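To make the difference concrete, here is a toy Python model (our own illustration, not a Resilio benchmark) that counts the transfer “rounds” needed to get one object from a source to every device. It assumes each transfer of the whole object takes one round and each device can send to only one other device per round; the function names are hypothetical.

```python
def rounds_hub(n_devices):
    """Client-server: the source client sends to the hub, then the hub serves
    the remaining n_devices - 2 clients one at a time (needs n_devices >= 2)."""
    return 1 + (n_devices - 2)


def rounds_chain(n_devices):
    """Follow-the-sun: each device forwards to the next, one hop per round."""
    return n_devices - 1


def rounds_p2p(n_devices):
    """P2P: every device that already has the object sends it to one new device
    per round, so the number of holders can double each round."""
    holders, rounds = 1, 0
    while holders < n_devices:
        holders = min(n_devices, holders * 2)
        rounds += 1
    return rounds


for n in (3, 11, 101):
    print(n, rounds_hub(n), rounds_chain(n), rounds_p2p(n))
```

With 11 devices, the hub and chain models both need 10 rounds while the doubling P2P model needs 4, and the gap widens as devices are added. Real networks add link-speed differences that this model ignores.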

Plus, Resilio Connect uses file chunking, splitting files into chunks when sharing them. Each chunk can be transferred independently of the others, resulting in transfer speeds 3-10x faster than traditional replication solutions.

For example, say Device A wants to replicate data to Devices B, C, D, and E. Device A can transfer the first chunk to Device B. Once Device B receives that chunk, it can share it with any other device in the network while Device A sends the remaining chunks.
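The example above can be simulated with a deliberately simple Python model. Resilio’s actual chunk scheduling isn’t described here, so this is a generic sketch under stated assumptions: all links have equal speed, each device uploads at most one chunk and downloads at most one chunk per round, and a sender always offers the lowest-numbered chunk the receiver lacks. The function names are hypothetical.

```python
def rounds_source_only(num_chunks, num_peers):
    """Only the source uploads: every chunk goes to every peer, one chunk per round."""
    return num_chunks * num_peers


def rounds_swarm(num_chunks, num_peers):
    """Toy swarm: peers forward chunks they already hold to other peers."""
    # Device 0 is the source (Device A); devices 1..num_peers start empty.
    have = [set(range(num_chunks))] + [set() for _ in range(num_peers)]
    rounds = 0
    while any(len(chunks) < num_chunks for chunks in have[1:]):
        rounds += 1
        start = [set(chunks) for chunks in have]  # holdings at the start of the round
        receiving = set()  # peers that already got a chunk this round
        for sender in range(len(have)):
            for receiver in range(1, len(have)):
                if receiver == sender or receiver in receiving:
                    continue
                missing = start[sender] - start[receiver]
                if missing:
                    have[receiver].add(min(missing))  # lowest-numbered missing chunk
                    receiving.add(receiver)
                    break  # this sender's upload slot is used for the round
    return rounds


print(rounds_source_only(4, 4), rounds_swarm(4, 4))
```

For Device A sending four chunks to Devices B through E, this toy model finishes in 7 rounds, versus 16 when Device A uploads every chunk to every device itself.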

The combination of P2P replication and file chunking makes Resilio one of the only real-time replication solutions that can perform true bidirectional sync, including:

  • One-to-one transfer.
  • One-to-many transfer.
  • Many-to-one transfer.
  • N-way transfer.

Figure: P2P vs. client-server architecture

This topology doesn’t have a single point of failure, unlike the client-server and “follow-the-sun” topologies, which can be affected by a single device in the network:

  • In the client-server context, the hub server must be constantly running in order for replication to go as planned. Plus, replication speeds are negatively affected by the need to “cloud hop” (i.e., send objects to a hub server before replicating them across your environment).

  • In the “follow-the-sun” context, any one device in the network can stop or slow down the replication process. For example, if one device goes down or is on a slow network, all others must wait for it before receiving any replicas.

The P2P topology overcomes both problems by allowing all devices to share objects with each other. This means the replication process doesn’t depend on any single device, so one device failing won’t bring replication to a halt.
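The single-point-of-failure argument can be checked with a plain graph search. This Python sketch (an illustration, not Resilio code) models each topology as “who can send to whom” and asks which devices can still receive data from Device A when one device is down; the device names and link maps are hypothetical.

```python
from collections import deque


def reachable(links, source, failed):
    """Devices that can still receive data from `source` when `failed` is down.

    `links` maps each device to the devices it can send to.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        device = queue.popleft()
        for neighbor in links.get(device, ()):
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


devices = ["A", "B", "C", "D", "E"]
chain = {d: [devices[i + 1]] for i, d in enumerate(devices[:-1])}  # A->B->C->D->E
hub = {"A": ["HUB"], "HUB": ["B", "C", "D", "E"]}  # clients go through the hub
mesh = {d: [p for p in devices if p != d] for d in devices}  # any peer to any peer

print(sorted(reachable(chain, "A", failed="C")))
print(sorted(reachable(hub, "A", failed="HUB")))
print(sorted(reachable(mesh, "A", failed="C")))
```

With Device C down, the chain only reaches B; with the hub down, nothing leaves A; in the mesh, every surviving device still receives the data.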

WAN Optimization Technology

Most replication solutions, including ones offered by AWS, use TCP/IP for transfers over WANs. But while TCP/IP is great for local area networks (LANs), it struggles with WAN transfers, which leads to replication delays.

First, packet loss — one of the defining characteristics of WANs — and latency disrupt replication for most conventional TCP-based replication solutions.

Second, TCP/IP treats packet loss as a sign of network congestion and reduces the transfer speed in response. On WANs, however, packet loss is often not caused by congestion, so this slowdown needlessly limits throughput.

As a result, you can get data transfer speeds that are significantly lower than the available bandwidth of your internet provider. This can be a big issue for companies that want to fully utilize expensive WAN connections and synchronize their data across multiple sites and cloud providers.
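A widely cited approximation, the Mathis et al. model for loss-based TCP congestion control (such as Reno), shows why loss and latency cap throughput regardless of link capacity: a single flow’s steady-state rate is bounded by roughly (MSS / RTT) × (C / √p), where p is the packet loss rate and C = √(3/2) ≈ 1.22. The sketch below evaluates it for a few example RTT and loss values, which are our own assumptions rather than measurements. Newer algorithms such as BBR behave differently, so treat the numbers as illustrative.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Approximate throughput ceiling of one loss-based TCP flow, in bits per second.

    Mathis et al. model: rate ~= (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2).
    """
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_rate))


for rtt_ms, loss in [(10, 0.0001), (100, 0.0001), (100, 0.01)]:
    mbps = mathis_throughput_bps(1460, rtt_ms / 1000, loss) / 1e6
    print(f"RTT {rtt_ms} ms, loss {loss:.2%}: about {mbps:.1f} Mbps per flow")
```

With 1% loss and a 100 ms round trip, the ceiling is roughly 1.4 Mbps per flow, even on a gigabit link.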

Resilio overcomes these limitations with Zero Gravity Transport™, our proprietary WAN optimization technology that distributes packets uniformly over time and uses:

  • A congestion control algorithm to calculate the ideal send rate and avoid overloading the network.

  • Interval acknowledgements for a group of packets, instead of sending acknowledgements after every packet receipt.

  • Delayed retransmission of lost packets to increase transfer speed.

Figure: Resilio Connect vs. competitors, transferring a 10 GB file to 10 endpoints over a 10 Mbps link

We have a detailed WAN optimization whitepaper if you want more details on how Resilio optimizes WAN transfers.
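Zero Gravity Transport itself is proprietary and its internals aren’t described here, but two of the general ideas in its feature list, pacing packets evenly at a target rate and acknowledging packets in groups, can be sketched in a few lines of Python. This is a generic illustration under stated assumptions: the 12 Mbps rate is a fixed hypothetical value rather than the output of a congestion control algorithm, the group size of 64 is arbitrary, and delayed retransmission is not modeled.

```python
import math


def paced_send_times(num_packets, packet_bytes, rate_bps):
    """Spread packets evenly over time at the target rate instead of sending bursts."""
    gap = packet_bytes * 8 / rate_bps  # seconds between packet departures
    return [i * gap for i in range(num_packets)]


def acks_required(num_packets, group_size):
    """One acknowledgement per group of packets (group_size=1 means per-packet ACKs)."""
    return math.ceil(num_packets / group_size)


times = paced_send_times(num_packets=5, packet_bytes=1500, rate_bps=12_000_000)
print([round(t * 1000, 3) for t in times])  # departure times in milliseconds
print(acks_required(10_000, 1), acks_required(10_000, 64))
```

At 12 Mbps, 1,500-byte packets leave exactly 1 ms apart, and acknowledging groups of 64 cuts 10,000 ACKs down to 157.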

Additional Resilio Benefits: Efficiency, Scalability, Security, and More

While the lightning-fast replication speed is one of the biggest advantages of Resilio Connect, it’s far from the only one. Our software’s unique P2P topology, along with its other functionality, also makes it:

  • Highly available and fault-tolerant.
  • Flexible.
  • Organically scalable.
  • Simple to set up and manage.
  • Equipped with enterprise-grade security.

1. High Availability and Fault Tolerance

As we said earlier, Resilio doesn’t have a single point of failure. Our P2P topology means that if one device fails, Resilio agents can always access data and services from other devices.

Thanks to this architecture, as well as other key features we’ve built on top of it, Resilio can:

  • Utilize any network connectivity in your environment, which lets it meet RPOs (Recovery Point Objectives) of under five seconds and RTOs (Recovery Time Objectives) of minutes after an outage.

  • Perform “checksum restarts” in the event of a network failure, which allows failed transfers to continue where they left off.

  • Use cryptographic file validation to guarantee that replicated files remain uncorrupted.
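The idea behind checksum restarts and cryptographic validation can be sketched with per-chunk SHA-256 hashes: the sender publishes a hash for each fixed-size chunk, and after an interruption the receiver only re-requests chunks that are missing or don’t match. Resilio’s actual chunk size and hashing scheme aren’t described here, so the 4-byte chunks and helper names below are hypothetical, chosen to keep the example small.

```python
import hashlib


def chunk_hashes(data, chunk_size):
    """SHA-256 digest of each fixed-size chunk."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def chunks_to_resend(source_hashes, partial_copy, chunk_size):
    """Indexes of chunks that are missing or corrupted at the destination."""
    dest_hashes = chunk_hashes(partial_copy, chunk_size)
    return [i for i, digest in enumerate(source_hashes)
            if i >= len(dest_hashes) or dest_hashes[i] != digest]


def is_intact(copy, expected_sha256):
    """Whole-file check after the transfer completes."""
    return hashlib.sha256(copy).hexdigest() == expected_sha256


source = b"abcdefghijkl"
manifest = chunk_hashes(source, chunk_size=4)
print(chunks_to_resend(manifest, b"abcdXfgh", chunk_size=4))  # [1, 2]
```

Here the destination holds a good first chunk, a corrupted second chunk, and nothing for the third, so only chunks 1 and 2 are sent again.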

This mix of capabilities makes Resilio a highly resilient solution ideal for all kinds of disaster recovery (DR) scenarios, including hot-site DR, warm-site DR, cold DR, and offsite copy.

Figure: Disaster recovery tiers (Hot/Live DR: multi-site active/active; Warm DR: active/active; Cold DR: active/passive; Offsite copy: backup copy)

Note: We’ve covered this topic in detail in our article on the top hot-site disaster recovery solutions.

2. Storage and Cloud Flexibility

Unlike S3 Replication (or other proprietary cloud storage and replication services), Resilio builds on open file formats, open standards, and an open, multi-cloud architecture. This means Resilio is cloud-agnostic and can be deployed on any infrastructure: single-cloud, multi-cloud, hybrid cloud, on-prem, and so on.

As a result, you can avoid vendor lock-in and replicate your data across a variety of regions, services, cloud providers, and on-prem storage. For example, Resilio Connect lets you:

  • Replicate your data quickly, efficiently, and reliably across AWS regions and services.

  • Use a variety of cloud storage services (including AWS, Azure, GCP, Wasabi, Backblaze, and more) and file storage solutions.

  • Browse and sync files on file, block, or object storage via popular tools on operating systems like macOS and Windows.

Lastly, Resilio also runs on VMware, Citrix, and other virtualization platforms. And it can be deployed on your existing infrastructure and begin replicating extremely quickly, as we’ll discuss in a bit.

3. Built-In Scalability

A big downside of traditional point-to-point replication topologies is that they become less reliable as the number of replication endpoints (and object sizes) increases. As a result, the replication speed and data accuracy take a hit.

Resilio’s P2P architecture makes our solution organically scalable, so the more servers you add, the better Resilio performs.

For example, Resilio can synchronize data 50% faster than point-to-point solutions in a 1:2 scenario and 500% faster in a 1:10 scenario. Our engineers have even successfully tested replicating 250+ million files in a single job with Resilio Connect. 

Put simply, Resilio’s built-in scalability makes it ideal for tons of replication use cases, regardless of the network, number of endpoints, regions, cloud providers, and number of objects (or files).

For a real-life example of Resilio Connect’s speed, scalability, and efficiency, check out our case study with VoiceBase.

VoiceBase is a speech-to-text solution for audio and video transcription. Keeping data accurate and current is the most demanding aspect of their work, as it requires distributing over 50 GB of files to more than 400 production servers as fast as possible. Thanks to Resilio Connect, the VoiceBase team now confidently overcomes this challenge.

“Resilio Connect enables us to reliably distribute our code, specifically new language models in a fraction of time. These copy jobs now take an hour, down from eight. Best of all, once Resilio Connect was installed, it just works: We never need to manually intervene in any way.” 

4. Simple Setup and Management

While replication in AWS can be complex to set up, manage, and debug, Resilio Connect makes the process much simpler.

Our agent-based software can be deployed on your existing infrastructure, including industry-standard desktops, servers, storage, and networks. It can be configured cross-platform on Linux, Ubuntu, OS X, Android, iOS, and Microsoft Windows servers. And as we said, it can be deployed on-prem, in the cloud, and in hybrid cloud scenarios.

In short, Resilio Connect can be configured on your infrastructure and begin replicating in as little as two hours.

You also don’t need to purchase any new hardware when working with Resilio. Instead, you can continue using the storage your team already uses, like DAS, NAS, or SAN. You can even blend storage capacity from any type of storage — hard drives, SSDs, storage systems, and so on.

After the setup, Resilio Connect also gives you an easy way to monitor, control, and debug the replication process with our REST API and Management Console.

Figure: Resilio Connect Management Console (Overview, General Info, and Statistics views)

The console gives you plenty of options to navigate the replication process and optimize its resource use by adjusting key parameters, like:

  • Buffer size, packet size, and network performance.
  • Disk I/O threads, data hashing, and file priorities.
  • Server bandwidth.

You can also schedule server bandwidth rules for different times of day or days of the week, optimize your file servers to cache files on demand and minimize data transfer costs, and much more.
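As a rough picture of how time-based bandwidth rules work, the Python sketch below looks up the limit that applies at a given moment. The rule format is hypothetical and is not the Management Console’s actual schema; the 50 Mbps business-hours limit and 1,000 Mbps default are example values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BandwidthRule:
    days: frozenset      # weekday numbers, 0 = Monday ... 6 = Sunday (datetime.weekday())
    start_hour: int      # inclusive
    end_hour: int        # exclusive
    limit_mbps: float


def limit_at(rules, when, default_mbps):
    """Return the first matching rule's limit, or the default outside all rules."""
    for rule in rules:
        if when.weekday() in rule.days and rule.start_hour <= when.hour < rule.end_hour:
            return rule.limit_mbps
    return default_mbps


business_hours = BandwidthRule(frozenset(range(5)), 9, 18, limit_mbps=50)
print(limit_at([business_hours], datetime(2024, 1, 1, 10), default_mbps=1000))  # Monday 10:00
```

Evaluating rules in order and taking the first match keeps overlapping rules predictable: put the most specific rules first.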

For example, MixHits Radio uses Resilio Connect to update metadata across their servers. Their team has experienced massive time savings thanks to Resilio’s simplicity and ability to coordinate and troubleshoot the process from a single place.

Here’s what MixHits Radio CEO Gary Hanna had to say about working with Resilio Connect:

“We have gone from spending 15 hours on average per week troubleshooting conflicts in the prior solution to spending no time at all with Resilio. We configure jobs once in the Resilio Connect Management Console and never have to look at it again.” 

5. Enterprise-Grade Security

Storing your data in S3 — or any other cloud storage service — and moving it across regions and services always comes with security risks. That’s why it’s imperative to take all possible measures for your data’s safety during its storage and replication. 

Resilio Connect helps you do that with its state-of-the-art security and data protection features, including: 

  • AES 256 encryption for protecting your data at rest and in transit.

  • Mutual authentication keys for guaranteeing your data gets delivered only to designated endpoints.

  • Cryptographic data integrity validation for preventing data loss and ensuring data arrives at its destination uncorrupted.

  • One-time session encryption keys.

Lastly, Resilio’s security features are reviewed by 3rd-party security experts to guarantee the highest level of protection for your data.

Overcome Delays and Get Reliable S3 Replication Every Time with Resilio

Resilio Connect can help you achieve real-time, bi-directional (two-way) replication with and across AWS regions, services, on-prem environments, and even other cloud providers. Besides providing industry-leading replication speeds, our software is also:

  • Highly resilient, making it an ideal solution for disaster recovery scenarios.

  • Flexible, as you can use it with various AWS services, other cloud providers, and on-prem environments.

  • Organically scalable, thanks to its P2P architecture that performs better as you add more replication endpoints. 

  • Simple to set up and manage, because it can be deployed on your existing infrastructure and start replicating in as little as two hours.

  • Secure, thanks to AES 256 protection and a myriad of other security features built by our engineering team and verified by 3rd-party security experts.

To learn more about how Resilio Connect can help your business, schedule a demo with our team.
