Collecting files from remote locations into the Azure cloud is difficult. Edge data ingestion is plagued by the challenge of transferring unstructured data over low-quality networks and the need to keep that data synchronized across your environment once it’s ingested. And, there’s confusion in the market over how best to ingest, and what solutions offer what capabilities.
There are a variety of ways to ingest and transfer files with Azure, such as the Azure Data Box and the Azure Data Box Gateway. But, these solutions suffer from limitations (e.g., file size limits for sync, unreliable network utilization) that may pose challenges to collecting files in remote and hard-to-reach places.
Alternatively, you can use Resilio Connect, our software-only ingest and gateway solution for securely collecting many files of any type and size from remote locations (using VSAT, cell, radio, Wi-Fi, or wired network connections) to any other location in predictable time frames.
Note: Customers rely on Resilio Connect to distribute and synchronize data for media workflows (Turner Sports, Innovative), gaming (Wargaming, Larian Studios), remote operations (Mercedes-Benz, Buckeye Power Sales), and more. To learn how Resilio Connect can drastically improve sync and gateway performance in the Azure cloud, schedule a demo.
In this article, we’ll discuss the capabilities and limitations of Azure Data Box Gateway, as well as how our multi-cloud file gateway and synchronization solution, Resilio Connect, can complement or replace Azure Data Box Gateway to provide superior file ingest and sync across all of your locations.
For example, Resilio Connect:
- Works reliably and efficiently across any type of network, from VSAT uplinks to cellular to Wi-Fi internet.
- Makes optimal and fair use of bandwidth, and offers fine-grained bandwidth control.
- Overcomes the limitations of hub-and-spoke architectures: unlike traditional endpoint-to-edge and edge-to-core data transfer solutions, it can transfer from multiple remote endpoints directly to a single core, cloud, or any other remote location.
- Provides complete control over remote transfers across as many locations as needed, located anywhere.
- Offers continuous and reliable offline access for remote systems.
- Can make use of multiple networks: an on-site LAN provides high-speed interconnect while a site is offline, and changes are efficiently synced once it is back online.
- Includes built-in WAN optimization, and can be configured to minimize usage across uplinks (satellite or cell) and take full advantage of remote Wi-Fi.
- Syncs files 3-10x faster than traditional solutions like Azure Data Box Gateway, and has successfully synced 450 million files in one job.
Comparing Azure Data Box Gateway & Resilio Connect
To transfer data into Azure cloud storage, Microsoft offers users several solutions:
- Azure Data Box Gateway: A virtual device that is installed on Azure Data Box or in a virtualized environment or hypervisor.
- Azure Data Box Edge: A physical device deployed on-premises that can transfer data to the Azure cloud, designed specifically to optimize data transfer in edge deployments.
- Azure Data Box: A physical device that can collect data on site and then be shipped to Microsoft for offline ingestion into your Azure storage account. Microsoft offers three variants: Azure Data Box (100TB capacity), Azure Data Box Heavy (1PB capacity), and Azure Data Box Disk (8TB SSD with a USB/SATA interface).
Azure Data Box Gateway can be used for archival ingestion, continuous ingestion, and incremental data transfer into the Azure cloud with automatic and real-time sync into Azure. It consists of the Data Box Gateway virtual device (which is installed in your virtual environment or hypervisor), the Data Box Gateway resource (a resource in the Azure portal that can be accessed via a web interface in order to manage the device, shares, users, and alerts), and the local web UI (which can be used to run diagnostics, shut down or restart the device, and file service requests).
But, Azure Data Box Gateway has several limitations that may incentivize certain users to seek an alternative gateway solution. For example, Azure Data Box Gateway can only be used to sync data from one data center into the Azure cloud, and can’t be used to keep data synchronized across multiple endpoints.
And, while it has some features that enable it to optimize transfer over networks, Azure Data Box Gateway is not an ideal solution for edge deployments. You either have to ship the physical Data Box to an Azure cloud data center for upload before syncing your data across your environment, or you must use Azure Data Box Edge (which isn’t always reliable and still requires data to be delivered to the Azure cloud before it can be synced across your environment). There are also limits to the size of files it can ingest (more on this later) as well as how it caches data locally.
For easy analysis, we’ve compiled a table that compares and contrasts the features of Azure Data Box Gateway and Resilio Connect. While the details are important (we cover them in the next section), this table provides an overview comparison.
| Feature | Azure Data Box Gateway | Resilio Connect |
| --- | --- | --- |
| Deployment Options | Virtual appliance & physical device (the Data Boxes) | Software-only: Runs on any VM or device (even drones, Android phones, field machines, and more). |
| Protocol | ⚬ NFS ⚬ SMB ⚬ iSCSI | ⚬ NFS ⚬ SMB ⚬ UDP-based Zero Gravity Transport (WAN optimization) |
| Management | Web UI that can be accessed from anywhere | Web UI that can be accessed from anywhere |
| Cloud Object Storage Support | Azure Blob Storage | Cloud-vendor agnostic. Works with any S3-compatible object storage, on-prem and cloud, such as: ⚬ Azure Blobs ⚬ AWS S3 ⚬ Google Object Storage ⚬ Cloudian ⚬ Ceph ⚬ MinIO ⚬ VAST Data ⚬ Wasabi ⚬ Weka IO |
| Sync Types | ⚬ Scheduled sync ⚬ One-way real-time sync | ⚬ Selective sync on-demand ⚬ Partial selective sync ⚬ Scheduled sync ⚬ Multi-directional real-time sync |
| Sync Architecture | Point-to-point (transfer between at most two endpoints) | Peer-to-peer (transfer among any number of endpoints concurrently) |
| Sync Topologies | ⚬ Hub-and-spoke ⚬ Follow-the-sun (Transfer from A to B and then from B to C) | ⚬ Hub-and-spoke ⚬ Two-way ⚬ One-to-many ⚬ Many-to-one ⚬ Full mesh (N-way) |
| Local Cache | ⚬ Stores recently used files locally | ⚬ Stores recently used files ⚬ Stores files on-demand using selective sync |
| Security Features | ⚬ Mutual authentication ⚬ AES 256-bit encryption | ⚬ Mutual authentication ⚬ AES 256-bit encryption ⚬ Data immutability ⚬ Cryptographic data integrity validation ⚬ Access control |
| WAN Optimization | None | UDP-based Zero Gravity Transport protocol |
| Reliability | ⚬ Checksums can be generated to verify data integrity ⚬ Single points of failure: The Data Box can get lost or damaged ⚬ The point-to-point sync architecture for Azure Data Box Gateway creates the potential for sync failure at each endpoint | ⚬ No single point of failure: always works over any network ⚬ Cryptographic data integrity validation ⚬ Resilio Connect’s fault-tolerant transfer protocol automatically resumes every failed transfer until the job is 100% completed; the automatic retry resumes from the point of interruption |
How Resilio Connect Provides a Superior (& Complementary) Gateway and Sync Solution
Resilio Connect is a file ingest and gateway solution that you can use to ingest, transfer, and sync files stored as objects in the Azure cloud (as well as in any other cloud storage solution). You can use Resilio Connect by itself, or use it in conjunction with Azure Data Box Gateway in order to overcome Azure’s shortcomings and keep your data synchronized across your entire environment.
Resilio Connect syncs data using a P2P replication architecture, which enables it to sync environments 3-10x faster than traditional solutions like Azure Data Box Gateway. It’s designed to handle large enterprise deployments, and can sync files of any size and number across hundreds of endpoints in the same time that most solutions take to sync two endpoints.
Resilio Connect also utilizes a proprietary WAN optimization protocol to enhance transfer over any unreliable, high-latency, loss-prone networks. This makes Resilio an ideal solution for use cases such as large-scale fleet data collection and control, ingestion and synchronization of data from the edge, and more.
Superior Edge Deployment & Network Utilization
When ingesting or syncing data from the edge (such as collecting data from a fleet of vessels at sea or delivering data to remote locations), it’s important that your gateway solution be able to optimize data transfer in areas with little to no network connectivity.
Resilio Connect is a software-only solution that works with VSAT, cell (3/4/5G), Wi-Fi, or any IP connection. Your edge data is reliably collected using any device and operating system. And, you can ingest the data directly into any cloud, including Azure, and immediately sync it across your entire environment.
Resilio optimizes bandwidth utilization and enables you to control how bandwidth is distributed across all endpoints. It provides continuous and reliable offline access for remote systems. You can also use multiple networks: an on-site LAN provides high-speed interconnect while a site is offline, and changes are efficiently synced once it is back online.
Resilio utilizes a built-in, highly resilient, WAN-optimized UDP-based transport known as Zero Gravity Transport™ (ZGT). No matter how poor or intermittent your network connection, ZGT enables bulletproof predictability of transfers and gets your data where it needs to be by:
- Using a congestion control algorithm that constantly probes the RTT (Round Trip Time) to identify and maintain the ideal data packet send rate.
- Sending interval acknowledgements — i.e., acknowledging groups of packets rather than each individual packet, which enhances transfer speed.
- Delayed retransmission — i.e., retransmitting lost packets in groups, rather than individually.
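The three mechanisms above can be illustrated with a toy Python sketch. This is not ZGT itself, whose implementation is proprietary and not described in this article. The function names, the 10% rate step, and the 1.25x RTT threshold are all assumptions chosen purely for illustration.

```python
import math


def next_send_rate(rate_pps, rtt_ms, base_rtt_ms, step=0.10, threshold=1.25):
    """Toy delay-based rate probe (illustrative, not ZGT's algorithm).

    If the measured RTT rises well above the baseline, queues are probably
    filling, so back off; otherwise probe upward for spare capacity.
    """
    if rtt_ms > base_rtt_ms * threshold:
        return rate_pps * (1 - step)
    return rate_pps * (1 + step)


def ack_count(packets_sent, group_size):
    """Acknowledgements needed when one ACK covers `group_size` packets."""
    return math.ceil(packets_sent / group_size)


def retransmit_batches(lost_seq_numbers, batch_size):
    """Collect lost sequence numbers and resend them in batches."""
    lost = sorted(lost_seq_numbers)
    return [lost[i:i + batch_size] for i in range(0, len(lost), batch_size)]
```

Under these assumptions, acknowledging packets in groups of 64 instead of one by one cuts the number of ACKs on the return path by roughly a factor of 64, which matters most on high-latency links such as VSAT.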
Case Study: Northern Marine Group
Northern Marine Group uses Resilio Connect to distribute and synchronize updates across their fleet of sea vessels.
“With Resilio, we can now proactively monitor these systems, validate ones that are likely to cause problems and fix them. So, we’re not just using it to keep these vessels up to date. We’re also using it to remotely administer these environments.”
Learn more about how Resilio Connect helped Northern Marine Group reduce time to compliance by 92%.
Versatility & Unified Management of Your Data Storage
Azure Data Box Gateway can only be used to sync data from devices with Windows operating systems to the Azure cloud. If you want to sync to another cloud or use a device with another OS, you’ll need to invest in another gateway solution — increasing costs and the complexity of managing your environment.
But, Resilio Connect is hardware and cloud-vendor agnostic. It works with:
- Windows and other popular operating systems, such as Mac, Linux, FreeBSD, OpenBSD, iOS, Android and more.
- Azure and any other cloud storage provider, such as Google storage, Wasabi, AWS, Backblaze, MinIO, and more.
- The desktops, servers, networks, DAS, NAS, and SAN storage you’re already using.
- Virtual machines and hypervisors, such as VMware, Microsoft Hyper-V, Citrix, and more.
Because of its versatility, Resilio can be deployed on your existing environment with minimal operational interruption. And, while Azure Data Box Gateway only handles files stored on-premises and in the Azure cloud, you can use Resilio Connect to manage all of your cloud and on-prem data from one unified location — which is especially useful in multi-cloud storage scenarios.
From Resilio’s Management Console, you can:
- Automate bandwidth utilization, and create policies that govern how much bandwidth each endpoint has access to at certain times of the day and on certain days of the week.
- Manage files stored in the Azure cloud (or whatever cloud storage you’re using).
- Create and control replication jobs and collect real-time performance metrics.
- Manage and monitor Resilio agents and job functions.
- Configure replication parameters, such as disk I/O, buffer size, and more.
- Collect logs and get notifications sent to email or Webhooks.
- Deploy instructions across public, private, and hybrid cloud storage.
- Use Resilio’s REST API to script any type of automation and functionality your job requires.
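As an illustration of the time-of-day and day-of-week bandwidth policies mentioned in the list above, here is a minimal Python sketch. It does not use Resilio’s Management Console or REST API; the `BandwidthRule` structure, the `limit_for` function, and the example limits are all hypothetical and only show how such a schedule could be evaluated.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class BandwidthRule:
    """Hypothetical policy entry, not a Resilio configuration format."""
    days: frozenset     # weekday numbers, Monday=0 ... Sunday=6
    start: time         # inclusive
    end: time           # exclusive
    limit_mbps: float


def limit_for(rules, when, default_mbps):
    """Return the limit of the first rule that matches `when`, else the default."""
    for rule in rules:
        if when.weekday() in rule.days and rule.start <= when.time() < rule.end:
            return rule.limit_mbps
    return default_mbps


# Example: throttle an endpoint to 2 Mbps during weekday business hours.
BUSINESS_HOURS = [
    BandwidthRule(frozenset(range(0, 5)), time(8, 0), time(18, 0), 2.0),
]
```

Calling `limit_for(BUSINESS_HOURS, datetime.now(), default_mbps=50.0)` would then return the throttled limit during weekday business hours and the default at all other times.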
End-users can also access your data in the Azure cloud or on-prem servers from a unified interface that operates much like Microsoft OneDrive.
Efficiency and Flexibility: Optimize Workflows and Reduce Costs
Resilio Connect offers capabilities that you can use to reduce operational costs and optimize workflows for enhanced productivity.
While Azure Data Box Gateway enables you to store recently accessed files on local devices, Resilio Connect lets you choose which files you want to store locally. This enables you to have full control over local cache, so you can store frequently accessed files locally and infrequently accessed files in long-term cloud storage — freeing up space on your on-prem devices and reducing the costs of data egress from the cloud. Selective caching also enables you to optimize workflows by giving employees faster access to the files they need most.
Resilio offers another beneficial feature known as Transparent Selective Sync (TSS). TSS gives you full control over how data is synchronized across your cloud and on-prem endpoints. You can configure files and folders to only be synchronized to specific endpoints and relevant employees/offices — reducing the amount of data that is transferred, the burden on your network, and the costs of data migration.
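The idea of syncing files and folders only to specific endpoints can be sketched as a small rule table. This is an assumption-laden illustration rather than how TSS is configured: the glob patterns, the endpoint names, and the `endpoints_for` helper are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (glob pattern, endpoints that should receive matches).
SYNC_RULES = [
    ("projects/berlin/*", {"berlin-office"}),
    ("shared/*", {"berlin-office", "tokyo-office", "vessel-07"}),
]


def endpoints_for(path, rules=SYNC_RULES):
    """Return the union of endpoints from every rule whose pattern matches `path`."""
    targets = set()
    for pattern, endpoints in rules:
        if fnmatchcase(path, pattern):
            targets |= endpoints
    return targets
```

A file that matches no rule maps to an empty set and is never transferred, which is where the savings in network load come from.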
You can automate file synchronization, either in real-time or on a fixed schedule. By automating syncs, you eliminate the need for employees to manually sync their data and enable them to focus on their tasks.
End-users can perform partial downloads of data, so they can acquire only the parts of files and folders that they need at the moment. This gives employees faster access to what they need and reduces costs.
With most gateway solutions, employees would need to access files from a local server in a nearby office or data center. In distributed workforce scenarios where employees are stationed in different locations, this requires you to invest in expensive servers for each office. But, because it is a P2P solution, Resilio Connect can deliver global file accessibility directly to each endpoint.
In other words, remote employees can access data directly from the desktops/laptops/workstations they’re using — making Resilio Connect a much more cost-effective option at scale.
High-Performance Synchronization with P2P Replication
Azure Data Box Gateway syncs data from on-prem devices to the cloud. Once the data is in the cloud, it can be synchronized across your other cloud regions and on-prem environments.
But this is an inefficient way to sync an environment, because data must first be sent to one endpoint (the cloud) and then distributed to every other endpoint one by one.
Resilio Connect’s P2P replication topology enables each endpoint in your system to communicate and share files with every other endpoint simultaneously. This gives you the power of omnidirectional synchronization — i.e., one-way, two-way, one-to-many, many-to-one, and N-way synchronization — so you can sync your environment much faster.
N-way sync is especially useful in situations where you need to sync a large number of servers quickly (as each server can contribute to the sync process simultaneously, resulting in faster sync of your environment) and when you have a geographically distributed workforce collaborating on the same files (as every team can make changes to files and have those changes synced to everyone else, irrespective of location).
When syncing files, Resilio replicates only the changed portions of data. It also uses a process known as file chunking to break the data down into multiple chunks that can be transferred independently of each other. Every endpoint can work together to share file chunks simultaneously, so you can utilize the full bandwidth of your system and sync files 3-10x faster than traditional solutions.
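To make chunking and change detection concrete, here is a minimal Python sketch. It assumes fixed-size chunks compared by SHA-256 hash at the same position; the article does not say which chunking scheme or hash Resilio actually uses, and the function names are illustrative.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int):
    """Split `data` into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes, chunk_size: int = 4):
    """Indices of chunks in `new` that differ from the same index in `old`.

    Chunks beyond the end of `old` count as changed because they are new data.
    """
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]
```

Only the chunks at the returned indices need to be sent, and because each chunk is independent, different peers can supply different chunks at the same time. One limitation of this simple scheme is that an insertion near the start of a file shifts every later chunk, so practical tools often use content-defined chunking instead.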
P2P replication also enables Resilio Connect to scale organically. Since every server can take part in the sync process, adding more servers to your system only increases the speed and resources available to you. So, Resilio can sync hundreds of servers in roughly the same amount of time as most solutions sync two.
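The scaling argument can be checked against the standard textbook lower-bound model for file distribution. The Python sketch below is an idealized estimate, not a Resilio benchmark: it ignores latency, protocol overhead, and disk speed, and it assumes every peer has the same upload and download bandwidth.

```python
def client_server_time(file_bits, n, server_up_bps, peer_down_bps):
    """Lower bound when one server uploads a full copy to each of n peers."""
    return max(n * file_bits / server_up_bps, file_bits / peer_down_bps)


def p2p_time(file_bits, n, server_up_bps, peer_up_bps, peer_down_bps):
    """Lower bound when peers also re-upload the chunks they have received."""
    total_up_bps = server_up_bps + n * peer_up_bps
    return max(
        file_bits / server_up_bps,     # the source must send each bit once
        file_bits / peer_down_bps,     # each peer must download the whole file
        n * file_bits / total_up_bps,  # n copies over the swarm's total upload
    )
```

For a 1 GB file (8e9 bits) on symmetric 1 Gbit/s links, this model gives a P2P bound of 8 seconds for both 2 and 100 peers, while the client-server bound grows from 16 seconds to 800 seconds.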
And, because every server can communicate with every other server, there’s no single point of failure. If one server goes down, the files or services can be retrieved from any other server. This makes Resilio an ideal solution for disaster recovery, as it can achieve sub-five-second RPOs (Recovery Point Objectives) and RTOs (Recovery Time Objectives) measured in minutes.
Azure’s documentation states that, when syncing multiple files, if the aggregate size of files is greater than 10GB, you should use a bulk-copy program like Robocopy or Rsync. So if you’re trying to sync multiple large files with Azure Data Box Gateway, you’ll need another solution (like Resilio).
But, Resilio can replicate any amount of data effectively. It can handle files of any size and number. Resilio’s engineers were able to successfully sync 450+ million files in a single job.
Case Study: Lindblad Expeditions
Lindblad Expeditions is an ecotourism and nature photography company that uses Resilio Connect to share data between their ships and HQ and maintain fleet software.
“Resilio Connect has been a game changer. It’s proven to be reliable in file transfer. It’s proven to be reliable in database replication. Overall, Resilio Connect jobs are very easy to set up and they just work!”
Additional Security Features for Extra Data Protection
In addition to AES 256-bit encryption and mutual authentication, Resilio includes several other security features that Azure Data Box Gateway doesn’t have to help you secure your data. These features include:
- Data immutability: Immutable copies of files are stored in the public cloud, protecting you from ransomware and data loss.
- Cryptographic data integrity validation: Resilio ensures files always arrive at their destination uncorrupted.
- Access control: You can control who gets access to specific folders and files.
Resilio’s security features were reviewed by 3rd-party security experts. They’re built into Resilio’s system, so you don’t have to invest in 3rd-party security tools or VPNs.
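As a generic illustration of cryptographic integrity validation, the Python sketch below hashes data as it streams and compares the result with the digest computed at the source. The article does not state which hash function Resilio uses, so SHA-256 and the function names here are assumptions.

```python
import hashlib
import hmac


def sha256_hex(chunks):
    """Hash an iterable of byte chunks without holding the whole file in memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def arrived_intact(received_chunks, expected_hex):
    """True if the received data hashes to the digest computed at the source."""
    return hmac.compare_digest(sha256_hex(received_chunks), expected_hex)
```

The sender would compute `sha256_hex` over the original data and transmit the digest alongside it; if even a single bit is altered in transit, `arrived_intact` returns `False` and the affected data can be requested again.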
Use Resilio Connect to Ingest, Sync, and Access Azure Files
Resilio Connect is the best solution for ingesting and syncing data in the Azure cloud because it:
- Uses proprietary WAN acceleration technology to optimize network utilization, so you can predictably transfer data over any network no matter how unreliable.
- Works reliably across any type of network, including VSAT uplinks, cellular, and Wi-Fi.
- Supports any hardware or cloud storage vendor, so you can deploy it on your existing environment with little operational interruption and manage your entire environment from one unified location.
- Provides complete control over remote transfers across as many locations as needed.
- Caches, downloads, and syncs data both efficiently and flexibly, so you can reduce cloud storage costs and optimize workloads.
- Has no reliance on hub-and-spoke architectures. It uses P2P replication to sync in any direction and can transfer from multiple remote endpoints directly to a single core, cloud, or another remote location.
- Scales organically to support large environments and large amounts of data (i.e., can handle files of any size and number).
- Offers continuous and reliable offline access for remote systems.
- Provides native security features that secure data and save money.
- Optimizes bandwidth utilization.
To learn how Resilio Connect can drastically improve sync and gateway performance in the Azure cloud, or any cloud, please schedule a demo.