When data locality and real-time replication matter, go Resilio.
I recently joined Resilio. I first learned about and started working with the product at my previous company, where I worked as a VDI and storage engineer. We were looking for faster, more reliable ways to replicate and sync data across the US. The company’s IT infrastructure is composed of on-prem data centers, Azure public cloud regions, and many widely distributed teams working on mission-critical, time-sensitive engineering projects.
We tested a variety of storage and VDI profile replication options: NAS filers, SANs, public cloud storage, and numerous global file systems from Nasuni, Panzura, PeerGFS, and others. Each has its own feature trade-offs, yet each failed to perform the way we needed.
Whatever the vendor, global file systems (GFS) suffer from a common problem: first-in, first-out, hub-and-spoke architectures. File replication and sync are serialized through the hub. We needed to keep files updated and in sync in near- or real-time—across all of our sites and endpoints. So the GFS model just didn’t work well for us, especially at the edge.
Another reason the GFS model didn’t work for us is that data locality matters.
As a customer, I picked Resilio Connect because it’s not sitting behind that first-in, first-out wall. In the Resilio model, any file can go anywhere at any time. You can instantly replicate and sync files in any direction—one-to-one, two-way, one-to-many, or many-to-many. You just can’t do that with a global file system. Or a distributed file system. Or a scale-out NAS.
Here’s one example from my previous company: we had an office in Kansas City and a remote facility in Washington State, which happened to be located on the side of a mountain. Remote teams of engineers needed immediate access to the latest plans and images for this project. I’m talking about design plans, building plans, CAD files, Microsoft Office docs, and other time-sensitive files. These are business-critical file data workflows.
Before Resilio, we tried a variety of global file systems and file services, from remote caching appliances on site to file servers attached to Fibre Channel SANs to cloud storage; the list goes on. The GFS approach couldn’t overcome the latency out to the edge—we had to wait for replication to a primary on-prem (hub) data center before updates trickled out to the edge. This GFS hub bottlenecked our entire workflow.
In short, with a global file system, you first have to sync to or from the central DC, or, in Nasuni’s case, the cloud. (Nasuni’s cloud file system is a GFS hosted on AWS.) If we used their remote caching appliance, it still had to sync back to the central hub for updates. The updates would, at best, trickle down. For example, if a change order was committed in Kansas City, the change had to be replicated to another data center before it could be replicated out to the caching VM—and then we had to wait again until the file update was replicated out to our field engineers (each representing a spoke connected to the hub).
Sometimes GFS file replication would take hours. One reason is that, when a lot of changes are happening, the appliance synchronizes them in the order they were committed; that is, first in, first out. In the GFS world, file replication and sync are serialized through the hub.
And the reality is, time is money. Our teams and company couldn’t afford to wait around for updates to percolate out to the field.
So instead of having to sync the GFS hub node with each of the endpoints, we deployed Resilio Connect. Resilio avoids that bottleneck by sending files directly to wherever they need to be. In the Resilio model, multiple transfers run in parallel across multiple nodes (endpoints) that can scale out to move and sync files rapidly, either on demand or automatically in real time.
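To get a feel for why parallel, peer-assisted distribution scales, here’s a toy back-of-envelope model of my own (not Resilio’s actual transfer logic): a hub pushes a full copy to one spoke at a time, while in a peer-to-peer model every node that already holds the file can seed another node, so coverage roughly doubles each round.

```python
def hub_and_spoke_rounds(n_endpoints: int) -> int:
    # The hub sends a full copy to one spoke per round,
    # so distribution time grows linearly with endpoint count.
    return n_endpoints

def peer_to_peer_rounds(n_endpoints: int) -> int:
    # Every node that already has the file can seed one more
    # node per round, so coverage roughly doubles each round.
    rounds, have = 0, 1  # start with the single source copy
    while have < n_endpoints + 1:  # +1 counts the source itself
        have *= 2
        rounds += 1
    return rounds

for n in (2, 20, 200):
    print(f"{n:>3} endpoints: hub-and-spoke ~{hub_and_spoke_rounds(n)} rounds, "
          f"peer-to-peer ~{peer_to_peer_rounds(n)} rounds")
```

The toy model ignores bandwidth asymmetry and file chunking, but it captures why adding endpoints helps rather than hurts in a peer-to-peer design.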
Using Resilio, you can immediately start syncing changes. Instead of waiting for the file from the home office to sync to the file store and then pulling it over a VPN or to the onsite appliance, I have the files right on my laptop when I need them. For example, if the engineer sitting next to me needs the files, there is no need to transfer them all the way from the home office and back out to the field. Resilio will see the files on the other laptop and pull them over to mine seamlessly using wireless or any network we set up. Resilio saved us countless hours.
Other benefits of Resilio over global file systems are flexibility and agility. I’m not stuck behind an appliance. Local caching is done directly on the endpoint. Any device with a local file system can be a storage location; so can remote object storage. It’s super flexible because you can enable an unlimited number of scenarios: working remotely, site-to-site sync, incorporating the public cloud, and so forth.
In our Washington State use case, there could be two guys in a truck with a hotspot. Sometimes it could be a Conex box (a shipping container with some IT gear and a VSAT uplink). Prior to using Resilio, some guys would have to drive down to McDonald’s and use the free WiFi to sync back to the GFS caching appliance. Or they’d have to phone home to the data center. With the global file system, we’d also need a VPN, which was slow and problematic. We couldn’t afford to wait on high-bandwidth connectivity or for a sync job to central NAS storage to complete.
With Resilio, we could replicate their project folder locally, from anywhere to anywhere, using any type of network. In one scenario, an engineer at a single location had to share all of the project files with other engineers. Resilio made that extremely easy. Once one person received the download, that user could immediately sync changes with all other users at the remote location. Being able to sync without user intervention, and without relying on a VPN, was big; with a GFS we would have needed a VPN. With Resilio, file integrity, security, and encryption are all built in. And features like file permissions (ACLs, etc.) are replicated along with the files.
In our traditional GFS setup, there were tons of complaints about sluggishness, or about files that didn’t arrive at all. There were lots of performance problems. With Resilio, these problems go away. If someone on site has the latest copy of a file, all of the other laptops on site (some of which may not be connected to the Internet) can still sync up with that one user’s device.
And with edge deployments, there are bound to be a few connectivity issues: the WAN, high latency, and slow uplinks. Resilio does a really good job getting around latency and irritating bandwidth issues. (Resilio’s built-in WAN optimization syncs faster and copes well with high-latency networks like VSAT, cell, radio, and WiFi.)
I mentioned earlier that time is money. Why else does high performance replication matter? Reliability.
From an IT perspective in the construction world, no news is good news. And there was a lot of “no news” using Resilio! What this meant for me and other IT folks was that user complaints went down, fewer tickets came in, and overall stress dropped. You don’t have end users calling you saying, “My files are missing.” In the traditional GFS world, you just have to wait. And waiting was stressful because there was nothing I could do to fix the problem.
Using Resilio Connect, if someone called in saying they couldn’t see a file update, I could log into the Resilio management console, see and track the file update, and then report back to that person on what was happening. Things like file transfer status and progress, and what’s been delivered and what hasn’t, are easy to see with Resilio. And with webhooks, it was easy to hook into Microsoft Teams and Slack for locked files or failed jobs. I could see what was happening and take action (like unlocking the file or reporting on status). We didn’t have to poke the system much. It just worked. And having an API at our fingertips helped extend this visibility even further.
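As a rough illustration of the kind of glue this enables (the event field names below are hypothetical placeholders, not Resilio’s documented webhook schema), a tiny receiver can accept a job-event POST and forward an alert to a Slack incoming webhook:

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Slack incoming-webhook URL; replace with your own workspace's URL.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

class JobEventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body posted by the replication system.
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")

        # "status" and "job_name" are assumed field names for illustration.
        status = event.get("status", "unknown")
        job = event.get("job_name", "unnamed job")

        # Only alert on the conditions we care about.
        if status in ("failed", "file_locked"):
            text = f"Replication alert: {job} reported '{status}'"
            body = json.dumps({"text": text}).encode()
            req = urllib.request.Request(
                SLACK_WEBHOOK_URL, data=body,
                headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req)

        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), JobEventHandler).serve_forever()
```

Point the replication system’s webhook at a listener like this, and failed jobs or locked files show up in the team channel within seconds.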
Then there’s the added benefit of being able to use any storage from any vendor. Not to mention any device: even a laptop (running Windows, macOS, or Linux) can be a caching point. And unlike a GFS, you have the flexibility to push and pull files automatically or on demand—in any direction over distance.
Speed and scalability are also important. Resilio is high performance across any network. Speed increases in lockstep with bandwidth. As you add endpoints, you actually increase performance, because files are chunked, hashed, and replicated in parallel across all (participating) endpoints, which can be configured or set automatically. This helps when you need to:
- Efficiently send files quickly and directly from any place to any place (across any network).
- Replicate in real time: as changes are made to files, they are instantly sent and synchronized.
- Be efficient. Resilio’s differential sync engine (how Resilio sends deltas) outperforms Nasuni’s and other GFS offerings, and it scales to sync many millions of files concurrently (a minimal sketch follows this list).
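To make “differential sync” concrete, here’s a minimal fixed-block sketch of the general technique (my own illustration, not Resilio’s proprietary engine): the receiver hashes fixed-size blocks of its stale copy, and the sender ships only the blocks whose hashes changed.

```python
import hashlib

BLOCK = 4096  # fixed block size for this toy example

def block_hashes(data: bytes) -> list[str]:
    # Hash each fixed-size block so the two sides can compare cheaply.
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def make_delta(new: bytes, old_hashes: list[str]) -> dict[int, bytes]:
    # Send only the blocks whose hashes changed (or are brand new).
    delta = {}
    for idx, h in enumerate(block_hashes(new)):
        if idx >= len(old_hashes) or old_hashes[idx] != h:
            delta[idx] = new[idx * BLOCK:(idx + 1) * BLOCK]
    return delta

def apply_delta(old: bytes, delta: dict[int, bytes], new_len: int) -> bytes:
    # Rebuild the new file from unchanged local blocks plus received ones.
    blocks = [old[i:i + BLOCK] for i in range(0, len(old), BLOCK)]
    for idx, chunk in delta.items():
        if idx < len(blocks):
            blocks[idx] = chunk
        else:
            blocks.append(chunk)
    return b"".join(blocks)[:new_len]

old = b"A" * 10000
new = old[:4096] + b"B" * 4096 + old[8192:] + b"tail"
delta = make_delta(new, block_hashes(old))
assert apply_delta(old, delta, len(new)) == new
print(f"changed blocks sent: {len(delta)} of {len(block_hashes(new))}")
```

Real delta engines also use rolling hashes so that inserted bytes don’t invalidate every downstream block, but the core idea is the same: move only what changed.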
Resilience was also key for us. Devices would periodically fail or just go offline when you least expect it. Or there could be actual network connectivity issues. Another cool thing about Resilio is that there’s no single point of failure (SPOF). Because Connect is based on a peer-to-peer architecture, Resilio routes around failures automatically. And then there’s file integrity. I think many people take this for granted. With Resilio, files are never corrupted. Since Resilio maintains a hash of each file, even if a failure occurs mid-replication, Resilio will always bring a bad file back to the correct version following a failure or outage.
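The underlying verify-then-repair pattern is straightforward. Here’s a minimal sketch of the general idea (my illustration, not Resilio’s internals): check a received file against the publisher’s hash, and pull an intact copy from another peer on a mismatch.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    # Stream the file in 1 MB chunks so large files never
    # need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_or_repair(local: Path, expected: str, peers: list[Path]) -> bool:
    # Accept the local copy only if it matches the published hash;
    # otherwise restore it from the first intact peer copy.
    if local.exists() and sha256_of(local) == expected:
        return True
    for peer in peers:
        if peer.exists() and sha256_of(peer) == expected:
            local.write_bytes(peer.read_bytes())  # the "re-fetch" in this toy
            return True
    return False  # no intact copy reachable yet; retry later
```

A corrupted or half-written file therefore never silently replaces a good one; it either verifies or gets repaired from a peer.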
Unlike a hub-and-spoke GFS, Resilio is peer-to-peer, which speeds everything up for everyone. As I said earlier, users didn’t need to go all the way back to the hub to retrieve updates. I could reach across the network and grab the update from another available or nearby endpoint. In the Conex scenario, for example, there are usually about 20 people working on a job. If one of them has most or all of the files, they can share among themselves using any storage device (like a laptop)—any one of them can sync, and it’s almost like a local peer-to-peer network. Someone new may arrive on site and need all of the files; Resilio can immediately update that new person’s laptop.
Across our brick-and-mortar data centers (a multi-site hybrid cloud), availability is also very important, both in terms of uptime during upgrades and during potential outages. Some storage models are by their very nature disruptive. For example, many NAS architectures can only replicate from a single source to a single target, whether on-prem or in Azure. And the standby target is usually passive; i.e., if there’s a failure, the standby has to be promoted to active, and then a failover has to happen. This all takes time and drives up RTOs and RPOs. Depending on your workloads, you may need higher levels of availability than traditional storage replication can provide. By contrast, Resilio keeps files in sync within seconds, and the setup for VDI and other use cases can be active-active across multiple sites.
Another storage issue many grapple with is complexity. Managing traditional storage replication requires numerous management layers, in some cases including metadata servers (as with cluster file systems). This is a huge issue if you are familiar with traditional enterprise storage solutions (name your vendor). With Resilio, there’s a tiny learning curve up front, but from a storage perspective, Resilio simplifies so many things. The Resilio model simply works with files (instead of LUNs, zones, volumes, file systems, RAID groups, and so forth). And when using the cloud, object storage can be presented as files as well.
Comparison — Where to use Resilio vs. a GFS
This is not to say that global file systems don’t have their place. So when should you use a GFS, and when should you use Resilio Connect?
Both Resilio and the GFS model—on the positive side—simplify data management. Resilio minimizes complexity through automation. File replication is real-time and automated. Files can be kept up to date within seconds. Resilio has no limits on file size or type. You can replicate and synchronize the largest data sets using NAS and other storage your company already owns.
In general, global file systems are good when:
- You can centralize all of your primary storage in a single location and don’t need fast access at the edge or across multiple locations.
- Hub-and-spoke replication architectures suffice to meet your workflows and SLAs, i.e., when one-way replication between at most 2 sites or 2 endpoints is “good enough.” Note that, if needed, Resilio can also be configured to distribute files in a hub-and-spoke-type topology.
- Disaster Recovery (DR) and other objectives can be met within the limits of 1 source to 1 target. For example, if your RTO is 15 minutes, and you only have 2 sites, you may be able to achieve that with scheduled snapshot-based replication on a filer or GFS. But if you have more than 2 sites or endpoints, or real-time update requirements, you may not be able to.
- Workflow objectives for file data can be met within these constraints.
- Global file system-specific features are needed (file locking, etc.). Resilio does not provide native file locking. However, Resilio works out of the box with global namespace technologies like Microsoft DFS and NFS v4.0 (or any modern version of SMB and NFS). If you have DFSR, Resilio offers an elegant DFSR replacement.
- You’re OK with implementing a single-vendor storage solution (which in some cases may be proprietary). With Resilio, you can use any storage from any vendor, and mix and match vendors.
Resilio Connect is preferred when:
- Data locality matters: Resilio stores files near users and applications. If the user or application moves (e.g. to the cloud or another location), Resilio makes it easy and fast to keep files with users, VDI profiles, and apps.
- Real-time updates matter: you need files current and up-to-date in multiple locations, across multiple devices or storage systems.
- You need to replicate or sync files quickly and resiliently across multiple locations. When there are more than 2 locations (or endpoints), Resilio’s scalability rocks, replicating updates to all locations in about the same time it takes to sync between 2 locations.
- You need the flexibility to replicate or sync files in any direction—one-to-one, bidirectional, one-to-many, many-to-one, or many-to-many.
- Eliminating single points of failure (SPOFs) and running active-active HA are important to your organization. With Resilio’s peer-to-peer file delivery, there are zero SPOFs and HA is active-active.
- You need fast time-to-access for all files located anywhere (e.g., for distributed file sharing and access, upgrades, DR, and VDI profiles). We saw a big speed-up with VDI in terms of faster login times (aka time-to-desktop).
- Infrastructure and platform flexibility are desired (storage, operating systems, clouds, etc.).
Summary
For use cases where data locality matters and you need to get files where they’re needed as fast as possible across multiple locations, consider Resilio Connect. When it comes to moving files fast and in real time—across multiple endpoints concurrently—Resilio can’t be beat in terms of speed, agility, resilience, and scalability. And of course, you have the flexibility of using storage you already own—on your choice of IT and cloud infrastructure.
We’d appreciate the opportunity to learn more about your goals and needs. Please feel free to schedule a demo or start a free trial to see if Resilio Connect can help your company replicate file data faster. In the meantime, feel free to check out the Resilio case study about my previous company’s VDI profile replication solution.