Windows Storage Replica

0 views
Skip to first unread message

Azalee Freas

unread,
Aug 3, 2024, 3:52:19 PM8/3/24
to paswebanksar

Storage Replica is Windows Server technology that enables replication of volumes between servers or clusters for disaster recovery. It also enables you to create stretch failover clusters that span two sites, with all nodes staying in sync.

Storage Replica offers disaster recovery and preparedness capabilities in Windows Server. Windows Server offers the peace of mind of zero data loss, with the ability to synchronously protect data on different racks, floors, buildings, campuses, counties, and cities. After a disaster strikes, all data exists elsewhere without any possibility of loss. The same applies before a disaster strikes; Storage Replica offers you the ability to switch workloads to safe locations prior to catastrophes when granted a few moments warning - again, with no data loss.

Storage Replica allows more efficient use of multiple datacenters. By stretching clusters or replicating clusters, workloads can be run in multiple datacenters for quicker data access by local proximity users and applications, as well as better load distribution and use of compute resources. If a disaster takes one datacenter offline, you can move its typical workloads to the other site temporarily.

Storage Replica may allow you to decommission existing file replication systems such as DFS Replication that were pressed into duty as low-end disaster recovery solutions. While DFS Replication works well over low bandwidth networks, its latency is high - often measured in hours or days. This is caused by its requirement for files to close and its artificial throttles meant to prevent network congestion. With those design characteristics, the newest and hottest files in a DFS Replication replica are the least likely to replicate. Storage Replica operates below the file level and has none of these restrictions.

Storage Replica also supports asynchronous replication for longer ranges and higher latency networks. Because it isn't checkpoint-based, and instead continuously replicates, the delta of changes tends to be far lower than snapshot-based products. Storage Replica operates at the partition layer and therefore replicates all VSS snapshots created by Windows Server or backup software. By using VSS snapshots it allows use of application-consistent data snapshots for point in time recovery, especially unstructured user data replicated asynchronously.

Stretch Cluster allows configuration of computers and storage in a single cluster, where some nodes share one set of asymmetric storage and some nodes share another, then synchronously or asynchronously replicate with site awareness. This scenario can utilize Storage Spaces with shared SAS storage, SAN and iSCSI-attached LUNs. It's managed with PowerShell and the Failover Cluster Manager graphical tool, and allows for automated workload failover.

Cluster to Cluster allows replication between two separate clusters, where one cluster synchronously or asynchronously replicates with another cluster. This scenario can utilize Storage Spaces Direct, Storage Spaces with shared SAS storage, SAN and iSCSI-attached LUNs. It's managed with Windows Admin Center and PowerShell, and requires manual intervention for failover.

Server to server allows synchronous and asynchronous replication between two standalone servers, using Storage Spaces with shared SAS storage, SAN and iSCSI-attached LUNs, and local drives. It's managed with Windows Admin Center and PowerShell, and requires manual intervention for failover.

Simple deployment and management. Storage Replica has a design mandate for ease of use. Creation of a replication partnership between two servers can utilize the Windows Admin Center. Deployment of stretch clusters uses intuitive wizard in the familiar Failover Cluster Manager tool.

Guest and host. All capabilities of Storage Replica are exposed in both virtualized guest and host-based deployments. This means guests can replicate their data volumes even if running on non-Windows virtualization platforms or in public clouds, as long as using Windows Server in the guest.

SMB3-based. Storage Replica uses the proven and mature technology of SMB 3, first released in Windows Server 2012. This means all of SMB's advanced characteristics - such as multichannel and SMB direct support on RoCE, iWARP, and InfiniBand RDMA network cards - are available to Storage Replica.

Security. Unlike many vendor's products, Storage Replica has industry-leading security technology baked in. This includes packet signing, AES-128-GCM full data encryption, support for Intel AES-NI encryption acceleration, and pre-authentication integrity man-in-the-middle attack prevention. Storage Replica utilizes Kerberos AES256 for all authentication between nodes.

High performance initial sync. Storage Replica supports seeded initial sync, where a subset of data already exists on a target from older copies, backups, or shipped drives. Initial replication only copies the differing blocks, potentially shortening initial sync time and preventing data from using up limited bandwidth. Storage replicas block checksum calculation and aggregation means that initial sync performance is limited only by the speed of the storage and network.

Consistency groups. Write ordering guarantees that applications such as Microsoft SQL Server can write to multiple replicated volumes and know the data is written on the destination server sequentially.

User delegation. Users can be delegated permissions to manage replication without being a member of the built-in Administrators group on the replicated nodes, therefore limiting their access to unrelated areas.

Network Constraint. Storage Replica can be limited to individual networks by server and by replicated volumes, in order to provide application, backup, and management software bandwidth.

Thin provisioning. Support for thin provisioning in Storage Spaces and SAN devices is supported in order to provide near-instantaneous initial replication times under many circumstances. Once initial replication is initiated, the volume won't be able to shrink or trim

Compression. Storage Replica offers compression for data transferred over the network betweenthe source and destination server. Storage Replica Compression for Data Transfer is only supported inWindows Server Datacenter: Azure Edition beginning with OS build 20348.1070 and later(KB5017381).

Storage Spaces with SAS JBODs, Storage Spaces Direct, fibre channel SAN, shared VHDX, iSCSI Target, or local SAS/SCSI/SATA storage. SSD or faster recommended for replication log drives. Microsoft recommends that the log storage is faster than the data storage. Log volumes must never be used for other workloads.

A network between servers with enough bandwidth to contain your IO write workload and an average of 5 ms round trip latency or lower, for synchronous replication. Asynchronous replication doesn't have a latency recommendation.

Disaster Recovery (DR) refers to a contingency plan for recovering from site catastrophes so that the business continues to operate. Data DR means multiple copies of production data in a separate physical location. For example, a stretch cluster, where half the nodes are in one site and half are in another. Disaster Preparedness (DP) refers to a contingency plan for preemptively moving workloads to a different location prior to an oncoming disaster, such as a hurricane.

Service level agreements (SLAs) define the availability of a business' applications and their tolerance of down time and data loss during planned and unplanned outages. Recovery Time Objective (RTO) defines how long the business can tolerate total inaccessibility of data. Recovery Point Objective (RPO) defines how much data the business can afford to lose.

Synchronous replication guarantees that the application writes data to two locations at once before completion of the IO operation. This replication is more suitable for mission critical data, as it requires network and storage investments, and risks degraded application performance by needing to perform writes in two locations.

When application writes occur on the source data copy, the originating storage doesn't acknowledge the IO immediately. Instead, those data changes replicate to the remote destination copy and return an acknowledgment. Only then does the application receive the IO acknowledgment. This ensures constant synchronization of the remote site with the source site, in effect extending storage IOs across the network. In the event of a source site failure, applications can fail over to the remote site and resume their operations with assurance of zero data loss.

Zero Data LossRPO1. Application writes data
2. Log data is written and the data is replicated to the remote site
3. Log data is written at the remote site
4. Acknowledgment from the remote site
5. Application write acknowledgedt & t1 : Data flushed to the volume, logs always write throughAsynchronous replicationContrarily, asynchronous replication means that when the application writes data, that data replicates to the remote site without immediate acknowledgment guarantees. This mode allows faster response time to the application and a DR solution that works geographically.

When the application writes data, the replication engine captures the write and immediately acknowledges to the application. The captured data then replicates to the remote location. The remote node processes the copy of the data and lazily acknowledges back to the source copy. Since replication performance is no longer in the application IO path, the remote site's responsiveness and distance are less important factors. There's risk of data loss if the source data is lost and the destination copy of the data was still in buffer without leaving the source.

Near zero data loss(depends on multiple factors)RPO1. Application writes data
2. Log data written
3. Application write acknowledged
4. Data replicated to the remote site
5. Log data written at the remote site
6. Acknowledgment from the remote sitet & t1 : Data flushed to the volume, logs always write throughKey evaluation points and behaviors

  • Network bandwidth and latency with fastest storage. There are physical limitations around synchronous replication. Because Storage Replica implements an IO filtering mechanism using logs and requiring network round trips, synchronous replication is likely make application writes slower. By using low latency, high-bandwidth networks, and high-throughput disk subsystems for the logs, you minimize performance overhead.

c80f0f1006
Reply all
Reply to author
Forward
0 new messages