Wazuh Cross-Cluster Replication (Failover)


Emar Flix

Feb 16, 2026, 6:21:51 AM
to Wazuh | Mailing List
Hi. I deployed Wazuh clusters at two sites, and now I want to connect the two sites in an active-passive failover setup.
This is not for two sites in different domains. Both sites serve the same agents, but the storage is in different zones, so if Zone A (PR) crashes I don't lose any data, because Zone B is in a different location.

I want to put a load balancer in front of the master nodes so that only the PR site receives logs from the agents, while the DR indexers pull data from the PR indexers and keep replicating it. If a disaster happens, I will switch the DR site to be the active site.


I cannot find official documentation on the Wazuh website that explains how to implement this. Can anybody show me the right way?


Olamilekan Abdullateef Ajani

Feb 16, 2026, 9:49:17 AM
to Wazuh | Mailing List
Hello Emarflix01,

Wazuh does not natively provide active-passive multi-site cluster failover. Instead, you build it by layering two independent mechanisms: load balancing for the agent traffic and the OpenSearch replication plugin for the indexer data.

Traffic Routing: For the agents to move between sites, you need a way to shift their traffic.

DNS Failover: You update the DNS record for your manager to point to Site B’s IP. Note: agents will keep using the old address until the DNS TTL expires.
Global Load Balancing (GSLB): The most robust method. A single entry point uses health checks to automatically route traffic to the healthy site.

Site A (Primary): Priority 10 (Weight 100)
Site B (Standby): Priority 20 (Weight 0)
This would be done on the load balancer configuration for proper routing.
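
The priority scheme above can be pictured with a small sketch. It is not tied to any particular load balancer: the `Site` type and `pick_active_site` function are hypothetical names, and it assumes the lower priority number wins and that a health check result is available for each site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    priority: int   # lower number is preferred, as in Priority 10 vs 20
    healthy: bool   # outcome of the balancer's health check (assumed input)


def pick_active_site(sites: List[Site]) -> Optional[Site]:
    """Return the healthy site with the lowest priority number, or None."""
    healthy = [s for s in sites if s.healthy]
    return min(healthy, key=lambda s: s.priority) if healthy else None


# Site A is preferred while healthy; Site B takes over only when A is down.
normal = [Site("site-a", 10, True), Site("site-b", 20, True)]
disaster = [Site("site-a", 10, False), Site("site-b", 20, True)]
print(pick_active_site(normal).name)
print(pick_active_site(disaster).name)
```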

Data Protection
Ref:
https://docs.opensearch.org/latest/tuning-your-cluster/replication-plugin/getting-started/

To ensure Site B already has the data when a failover occurs, you use Cross-Cluster Replication (CCR) on the Wazuh Indexers (OpenSearch).

Site A (PR): Actively receiving and indexing logs.
Site B (DR): Pulling a read-only copy of those indices in real-time.
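
As a rough illustration of how Site B could be told to follow Site A, the sketch below only builds the auto-follow request described in the OpenSearch replication documentation; it does not send it. The alias `pr-cluster` and rule name `autofollow-wazuh-alerts` are the ones that appear further down in this thread, while the index pattern and the role names are assumptions to be replaced with your own values. `autofollow_rule` is a hypothetical helper.

```python
import json


def autofollow_rule(leader_alias, name, pattern, leader_role, follower_role):
    """Build the auto-follow request that the follower cluster (Site B) would send."""
    body = {
        "leader_alias": leader_alias,   # alias of the remote connection to Site A
        "name": name,                   # name of the replication rule
        "pattern": pattern,             # leader indices to replicate
        "use_roles": {                  # needed when the security plugin is enabled
            "leader_cluster_role": leader_role,
            "follower_cluster_role": follower_role,
        },
    }
    return "POST", "/_plugins/_replication/_autofollow", body


method, path, body = autofollow_rule(
    leader_alias="pr-cluster",
    name="autofollow-wazuh-alerts",
    pattern="wazuh-alerts-*",      # assumed pattern
    leader_role="all_access",      # placeholder role name
    follower_role="all_access",    # placeholder role name
)
print(method, path)
print(json.dumps(body, indent=2))
```

The remote connection behind the alias has to exist on Site B before a rule like this is created.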

For failover, you must manually stop replication on Site B to make the indices writable so it can start accepting new logs.

Lastly, agent authentication (client.keys):
If the agents move to Site B but Site B doesn't have their keys, they will be rejected. So you must continuously synchronize the /var/ossec/etc/client.keys file from the Site A Master to the Site B Master (using a tool like rsync or lsyncd).

If you don't sync, you'll be forced to re-register all agents or restore that specific file from a backup before Site B will accept any connections.
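
One way to check that the sync is actually working is to compare the agent IDs in both copies of the file. This is a minimal sketch that assumes each entry is one line whose first whitespace-separated field is the agent ID; the helper names and the sample entries are made up for illustration.

```python
def agent_ids(client_keys_text):
    """Collect the first field (assumed to be the agent ID) of every non-empty line."""
    ids = set()
    for line in client_keys_text.splitlines():
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def missing_on_standby(primary_text, standby_text):
    """Agent IDs known to the Site A master but absent from the Site B copy."""
    return sorted(agent_ids(primary_text) - agent_ids(standby_text))


# Made-up sample content standing in for the two client.keys files.
primary = "001 web01 any aaaa\n002 db01 any bbbb\n"
standby = "001 web01 any aaaa\n"
print(missing_on_standby(primary, standby))
```

An empty result suggests the Site B master would recognize every agent that Site A does.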

Please let me know what you think.



Olamilekan Abdullateef Ajani

Feb 20, 2026, 10:28:18 AM
to Wazuh | Mailing List
Hello Emarflix01,

Please always use Reply all so that other community users can benefit from the thread. That said, stopping replication with POST /_plugins/_replication/<index>/_stop does exactly what the OpenSearch documentation states: the follower index permanently detaches from the leader and becomes a normal writable index.

However, this action only affects the specified index. Any other indices will remain follower indices and must be stopped individually if you intend to fully promote the DR cluster.

Before declaring DR fully active, it is important to verify replication status across all wazuh-* indices using the replication plugin APIs.
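
A minimal sketch of that verification step, under stated assumptions: it takes the parsed responses of GET /_plugins/_replication/<index>/_status for each index, treats any status other than "REPLICATION NOT IN PROGRESS" as still following, and builds the per-index _stop requests. The status strings should be checked against your OpenSearch version, and the function names and sample data are hypothetical.

```python
NOT_REPLICATING = "REPLICATION NOT IN PROGRESS"  # assumed status string, verify on your version


def indices_still_following(status_by_index):
    """Return the indices whose replication status shows they are still followers."""
    return sorted(
        name
        for name, status in status_by_index.items()
        if status.get("status") != NOT_REPLICATING
    )


def stop_requests(indices):
    """Build one _stop request (method, path, body) per remaining follower index."""
    return [("POST", f"/_plugins/_replication/{name}/_stop", {}) for name in indices]


# Illustrative status responses, keyed by index name.
statuses = {
    "wazuh-alerts-4.x-2026.02.19": {"status": "SYNCING"},
    "wazuh-archives-4.x-2026.02.19": {"status": NOT_REPLICATING},
}
remaining = indices_still_following(statuses)
for request in stop_requests(remaining):
    print(request)
```

DR would only be declared fully active once `remaining` comes back empty.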

CCR, as described in the OpenSearch documentation, handles the replication mechanics correctly, but promoting a cluster remains a manual, operational responsibility.

That being said, DR is achieved by combining:

Routing / DNS strategy
Indexer data strategy
Clear promotion procedures

Regards,

Emar Flix

Feb 20, 2026, 11:58:04 AM
to Wazuh | Mailing List
Thank you for your response, Olamilekan.
I did what you said and it works now: I shut down the PR cluster nodes from the F5, and the DR cluster now receives the agent logs.

Now I want to make the PR cluster the leader again and the DR cluster the follower again. How do I do that properly?

How I stopped replication and deleted the auto-follow rule:
=======================================================================
DELETE /_plugins/_replication/_autofollow
{
   "leader_alias" : "pr-cluster",
   "name": "autofollow-wazuh-alerts"
}

GET _cluster/health?pretty

GET _cat/indices/wazuh-*?h=index

POST /_plugins/_replication/wazuh-archives-4.x-2026.02.19/_stop
{}
=======================================================================

Olamilekan Abdullateef Ajani

Feb 20, 2026, 2:33:49 PM
to Wazuh | Mailing List
Hello again,

According to that documentation, "Once you run _stop on a follower index, replication cannot be resumed." The follower index detaches from the leader permanently.

When bringing the PR cluster back, you need to treat the DR cluster as the authoritative data source. This is the inverse of the original setup, in which PR was the authoritative source.

First, ensure the PR cluster is healthy. Then remove any stale Wazuh indices from the PR environment, reconfigure CCR with the DR cluster acting as the leader and the PR cluster as the follower, and allow the indices to fully initialize and replicate.
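
The order of operations can be laid out as a sketch. It only builds the list of requests to run against the PR cluster and sends nothing. The alias `dr-cluster`, the seed address, the rule name, the index pattern and the role names are all placeholders, and `failback_plan` is a hypothetical helper; review each step against the replication plugin documentation before running anything.

```python
def failback_plan(stale_indices, dr_alias, dr_seeds, rule_name, pattern, role):
    """List the requests to run on the PR cluster, in order, as (method, path, body)."""
    steps = []
    # 1. Remove stale Wazuh indices so PR can be re-seeded from DR.
    for index in stale_indices:
        steps.append(("DELETE", f"/{index}", None))
    # 2. Register the DR cluster as a remote connection on PR.
    steps.append((
        "PUT",
        "/_cluster/settings",
        {"persistent": {"cluster": {"remote": {dr_alias: {"seeds": dr_seeds}}}}},
    ))
    # 3. Create an auto-follow rule on PR with DR as the leader.
    steps.append((
        "POST",
        "/_plugins/_replication/_autofollow",
        {
            "leader_alias": dr_alias,
            "name": rule_name,
            "pattern": pattern,
            "use_roles": {"leader_cluster_role": role, "follower_cluster_role": role},
        },
    ))
    return steps


plan = failback_plan(
    stale_indices=["wazuh-archives-4.x-2026.02.19"],
    dr_alias="dr-cluster",                 # placeholder alias
    dr_seeds=["dr-indexer.example:9300"],  # placeholder transport address
    rule_name="autofollow-wazuh-alerts",
    pattern="wazuh-alerts-*",              # assumed pattern
    role="all_access",                     # placeholder role name
)
for step in plan:
    print(step[0], step[1])
```

Switching agent traffic back to PR and making PR the leader again would be a further, separate promotion that mirrors the original failover.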

Ref:
https://docs.opensearch.org/latest/tuning-your-cluster/replication-plugin/getting-started/


