Active-Active High Availability for Citrix User Profile Management and VMware

Jason Goodman

Turbocharge Citrix UPM and VMware DEM with resilient, high performance, scale-out replication and sync

To paraphrase the car racing legend Carroll Shelby: it’s silly to use an archaic taxicab engine from 1918 in a high-performance race car of the 1960s. So Shelby ditched the old engine and designed the 427 Cobra engine based on a new standard V-8 platform.

The same holds true for VDI. You need the best performance from Citrix UPM and VMware DEM. So why use a slow and unreliable relic like DFSR?

The short answer is: Don’t. Resilio can help you turbocharge replication for VDI.  

Resilio offers a high-performance, highly resilient file replication alternative to legacy tools and storage replication. Resilio is fully compatible with Citrix UPM and VMware DEM, and is available through the Citrix and VMware marketplaces.

Bidirectional Replication for Active-Active High Availability 

Consider the need for two-way (vs. one-way) replication across two data centers. This could be all on-prem, a hybrid cloud, or cloud native. Resilio gives you a way to keep all sites active, whether there are 2, 20, 2,000, or more locations. By active-active we mean that all sites are serving VDI users concurrently. If there’s an outage, you can use global load balancing combined with Resilio Platform to redirect users to the closest site or the one with the lowest latency.
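To make the failover decision concrete, here is a minimal Python sketch of how a global load balancer might pick a site. It is an illustration only: the `Site` type, the site names, the health flags, and the latency figures are all hypothetical, and a real load balancer (or Resilio Platform itself) may decide quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    name: str          # hypothetical site label
    healthy: bool      # result of a health check assumed to run elsewhere
    latency_ms: float  # measured round-trip time from the user to the site


def pick_site(sites: list[Site]) -> Optional[Site]:
    """Return the healthy site with the lowest latency, or None if all are down."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.latency_ms)


sites = [
    Site("dc-east", healthy=False, latency_ms=12.0),  # simulated outage
    Site("dc-west", healthy=True, latency_ms=48.0),
    Site("cloud-eu", healthy=True, latency_ms=95.0),
]
chosen = pick_site(sites)
print(chosen.name if chosen else "no site available")  # prints "dc-west"
```

Redirecting a user this way only works if the chosen site already holds a current copy of that user's profile, which is what bidirectional replication is for.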

From a disaster recovery and high availability perspective, Resilio gives you fast and reliable bidirectional replication for user profiles, user stores, and other file-based data. You can replicate VHDs, Office files, file systems, file shares, and folders stored on DFS or almost any NAS or file server, with or without Microsoft DFS.

Nothing needs to change on the frontend for users: you can still use your Microsoft DFS namespace (DFS-N) hosted on Windows Servers or NAS for local and global access to all user stores and file shares and for load balancing and failover within and across sites.  On the backend, with Resilio under the hood, if a failure occurs, users will still have near instantaneous access to all of their user profiles, user data, and app data. That’s because Resilio is replicating changes concurrently for all user profiles in near real-time.

For example, your users may connect through a Microsoft DFS namespace, DNS, or NetScaler on the frontend, or you may use Citrix for global load balancing and failover. Resilio’s replication approach dynamically routes around failures and overcomes latency through native WAN optimization. Resilio builds resilience into the file replication process end-to-end to preserve file integrity and reliably replicate file changes. Architecturally, Resilio employs a peer-to-peer (P2P) design that distributes file chunks across multiple replication endpoints in parallel. From a VDI perspective, this gives you the flexibility to replicate file changes anywhere, at any time.

Another benefit to end users is faster time-to-desktop. One customer saw a 3x faster time-to-desktop for VMware DEM compared to snapshot-based storage replication. 

Data Availability Matters

According to James Moore, a customer solutions engineer with Resilio, there are two questions that a virtualization or storage manager never wants to hear from users:

  1. “Where are my files?” and 
  2. “Where are my new files?”  

If there hasn’t been a site outage, you may be wondering what the heck happened to those files.  And the answer is often as simple as: the file changes were never replicated because DFSR (or another tool) failed to replicate the data in time.  

So it’s absolutely critical that file changes are replicated as fast as possible. That way, when a user logs on to their operating system in a virtual desktop, their user profile is updated and ready for use. In systems like FSLogix Cloud Cache, which can provide active-active high availability, all of the updates are serialized through a single client/server instance—creating latency and degrading performance. 

Resilio gets around this through its peer-to-peer design, so all profiles can be updated in parallel, and at much higher speeds. Across any network—over any distance. 

Be Prepared for the VDI Storms—Scalability Readiness

According to Moore, file-based VDI solutions have been the profile standard for almost a decade now. It’s what we all know as VDI admins, and you would think it might be easier from a replication or HA perspective. Alas, several issues simply come down to problems with DFSR and compounding challenges using DFSR in larger, multi-site environments.

DFSR is especially problematic in larger environments with high user churn, mainly around log-off storms. A single log-off event can create several thousand files per user all at once.

For example, suppose hundreds or even thousands of users all log off at the same time. Even with Citrix active write-back mode enabled, you still need to replicate files to get the profile changes propagated. When 1,000 users log off concurrently and need their changes propagated immediately, you will likely overwhelm DFSR and cause it to crash or hang. Or worse, corrupt data.

Each file created during the log-off process gets stuck in the DFSR backlog. DFSR serializes all replication between (at most) two servers and lacks scalability. So a large batch of updates fails to replicate.
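A quick back-of-the-envelope calculation shows why a serialized backlog hurts. The Python sketch below uses the 1,000 users from the example above; the 3,000 files per user, the 10 ms per file, and the 10 parallel streams are assumptions picked purely for illustration, and the model assumes perfect parallel speedup, which real systems will not achieve.

```python
def backlog_files(users: int, files_per_user: int) -> int:
    """Total files queued when every user logs off at once."""
    return users * files_per_user


def drain_seconds(files: int, ms_per_file: int, parallel_streams: int = 1) -> float:
    """Idealized time to drain the backlog, assuming perfect parallel speedup."""
    return files * ms_per_file / 1000 / parallel_streams


files = backlog_files(users=1000, files_per_user=3000)  # files_per_user is an assumption
serial = drain_seconds(files, ms_per_file=10)           # one serialized stream
parallel = drain_seconds(files, ms_per_file=10, parallel_streams=10)

print(f"backlog: {files:,} files")                           # backlog: 3,000,000 files
print(f"serialized drain: {serial / 60:.0f} minutes")        # serialized drain: 500 minutes
print(f"10 parallel streams: {parallel / 60:.0f} minutes")   # 10 parallel streams: 50 minutes
```

Under these assumed numbers, a user who logs back on within the hour has a good chance of hitting a profile that has not finished replicating.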

Other challenges with DFSR include:  

  • Poor visibility and monitoring – IT may spend countless hours troubleshooting user profile replication.    
  • Poor reliability and scalability – DFSR may fail or not scale to support replicating many concurrent changes at once. 
  • No just-in-time replication – While DFSR does support “partial file replication” it is notorious for queuing up changes in a backlog and not fully syncing files. 
  • Slow replication – Because DFSR does not scale beyond two file servers, jobs must be synced between the two servers before replication can occur to a third server.

Moore says that these issues are not limited to DFSR. They also show up in some global file system (GFS) products, where the backlog of replicated content can cause problems and drive tickets about out-of-date or missing files. Most storage replication tools employ a hub-and-spoke or point-to-point architecture, as Moore describes in his blog post on GFS vs. Resilio.

By contrast, Resilio distributes replication across multiple systems to replicate changes in near real-time, in whichever direction your use case requires: bidirectional, one-to-many, many-to-one, or many-to-many.

Location Matters—Keeping Data Close to Users

With today’s variable workloads and workflows and the move to remote work, it’s becoming more difficult for IT professionals to keep widely distributed VDI profiles and their associated data (applications, user files, and other settings) in sync across multiple locations.

The conventional hub-and-spoke storage approach does not work well for distributed implementations, as all changes need to flow through a centralized hub. Resilio avoids that bottleneck by sending files directly to any other endpoint. In the Resilio model, multiple transactions run in parallel across multiple nodes (endpoints) that can scale out to move and sync the files rapidly, either on demand or automatically in real time.
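The difference between the two topologies can be illustrated with a toy round-based simulation in Python. This is not Resilio's actual protocol. It assumes a deliberately simple model in which every node can upload one chunk and download one chunk per round, and it uses a naive greedy schedule. The function names and the 8-receiver, 16-chunk example are made up for illustration.

```python
def hub_and_spoke_rounds(receivers: int, chunks: int) -> int:
    """Rounds needed when only the hub may upload (one chunk per round)."""
    return receivers * chunks


def p2p_rounds(receivers: int, chunks: int) -> int:
    """Rounds needed when any node holding a chunk may upload it."""
    # Node 0 is the source and starts with every chunk; receivers start empty.
    have = [set(range(chunks))] + [set() for _ in range(receivers)]
    rounds = 0
    while any(len(h) < chunks for h in have):
        rounds += 1
        # A chunk received in this round can only be forwarded in a later round.
        snapshot = [set(h) for h in have]
        receiving = set()  # receivers that already got their one chunk this round
        for sender, owned in enumerate(snapshot):
            for peer in range(1, receivers + 1):
                if peer == sender or peer in receiving:
                    continue
                missing = owned - have[peer]
                if missing:
                    have[peer].add(min(missing))
                    receiving.add(peer)
                    break  # each sender uploads at most one chunk per round
    return rounds


n, c = 8, 16  # hypothetical: 8 receiving endpoints, a file split into 16 chunks
print("hub-and-spoke rounds:", hub_and_spoke_rounds(n, c))
print("peer-to-peer rounds:", p2p_rounds(n, c))
```

In this model the peer-to-peer schedule never needs more rounds than the hub, because the source still uploads one chunk every round. It typically needs far fewer once receivers hold chunks they can pass along to each other.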

Using Resilio, you can immediately start syncing changes. Instead of waiting on the file from the home office to sync to the file store and then pulling it over a VPN or to the onsite appliance, I have the files right on my laptop when I need them. For example, if the engineer sitting next to me needs the files, there is no need to transfer them all the way from the home office and back out to the field. Resilio will see the files on the other laptop and pull them over to mine seamlessly using wireless or any network we set up. Resilio saved us countless hours.

In terms of WAN optimization, Resilio optimizes replication for use over high-latency or unreliable networks. Resilio can use either TCP or UDP, and the sync process is highly optimized for data compression, dedupe, and scaling to sync hundreds of millions of files, all of which speeds up replication of data in flight.

Moore says that many storage teams are continuously strategizing on ways to reduce downtime, either by being in multiple regions within a cloud provider or by moving data closer to end users, which is often a challenge in the Citrix user profile management world. He says Resilio can help solve these data locality problems by keeping all your clouds and regions in sync, so that data and users never have to wait for a failover to happen to keep working.

“Maybe part of your resiliency plan is to burst from on-prem to the cloud to support seasonal or contract workers using VDI, or for DR. Resilio can help get your data where you need it, when you need it.” Moore refers to this as just-in-time scalability.

“Oftentimes this has to be accomplished in a variety of ways. But implementing Resilio for VDI user profiles and profile containers simplifies replication and, for Citrix and VMware, makes it easy to scale. Simply provisioning new endpoints and user profiles will scale the system.”

Moreover, he likes that Resilio works out of the box with core functionality included with Citrix UPM, roaming profiles, and other profile management software: Citrix Workspace, XenApp, Citrix Virtual Apps, Microsoft Windows, Active Directory user groups and GPOs, and Microsoft DFS, among other apps stored in XenDesktop or Citrix Workspace.

Summary

Resilio Platform offers a turnkey scale-out file replication solution for your Citrix profile management and VMware DEM environments. Resilio gives you active-active high availability across multiple sites without compromising performance.  You get resilience for all files: zero data corruption; no single point of failure; and dynamic routing around failures. You get speed and scalability: add as many endpoints as you need (file servers, VHDs, file systems, user profiles, cloud instances, etc.).  And you get centralized management and automation for global control of user profile replication. 

In summary, this benefits VDI customers in several ways:

  1. Maintaining a high performance, predictable user experience for a widely distributed workforce.  The virtual desktop environment should load fast in predictable time frames when users log in. 
  2. Enabling multi-site active-active high availability for DR without compromising performance. User profiles, user data, and app data are always available, no matter the location—even across WANs.  
  3. Centralized monitoring and management.  As the popularity of VDI grows and larger numbers of users require virtual desktops, managing a large deployment should be just as easy as managing a smaller one.
  4. Azure cloud and multi-cloud readiness.  If your company decides to incorporate Azure or another cloud provider’s storage, Resilio provides an easy way of extending your on-prem facilities to incorporate the cloud for VDI, using your cloud provider of choice (such as Azure, AWS, GCP, among others). 

We’d appreciate the opportunity to learn more about your goals and needs. Please feel free to schedule a demo or start a free trial to see if Resilio Platform can help your company replicate user profile and user store data faster and more reliably.
