Architecture advice needed — 3-node cluster for agentless M365 monitoring at high log volume

Asmit Desai

Jun 8, 2026, 4:20:54 AM

to Wazuh | Mailing List

Hi Wazuh community,

I'm building out a Wazuh deployment for internal security monitoring and would love some architecture advice before I commit to the full setup. I've done a fair amount of research (including your documentation and mailing list archives) but want to validate my decisions before proceeding.
USE CASE
We are monitoring:
- Microsoft 365 / Exchange — via the native Wazuh O365 wodle module (fields map as office365.* for rule matching)
- Entra ID (Azure AD) — sign-in and identity events
CURRENT STATE
We started with an all-in-one single-node deployment on an HDD machine. As alert volume grew, performance degraded and we began planning a distributed cluster. We currently have:

- Node 1 (existing, HDD): Running Wazuh Manager (master), Indexer, and Dashboard
- Node 2 (new, SSD): Intended as Wazuh Worker + Indexer + Dashboard
WHAT WE'VE FIGURED OUT SO FAR
1. 2-node indexer cluster has a quorum problem — with minimum_master_nodes = 2, losing either node makes the cluster refuse writes. This defeats the purpose of our HA goal. We found your mailing list guidance recommending a minimum of 3 indexer nodes to avoid split-brain.
2. We are planning to add a third node (SSD) to resolve this.
3. For our planned 3-node setup, we intend to use role separation as follows:
- Node 1 (HDD): Wazuh Master Manager + Indexer (master-only role, no data/ingest) + Dashboard
- Node 2 (SSD): Wazuh Worker + Indexer (data + ingest) + Dashboard
- Node 3 (SSD): Wazuh Worker + Indexer (data + ingest) — no Dashboard

Rationale: Node 1's HDD is a bottleneck for write-heavy indexing I/O. As the master-only indexer node, it only handles cluster coordination (in-memory) and avoids disk-intensive shard writes. Dashboard on Node 1 and Node 2 provides UI-level HA without wasting RAM on Node 3 that should be reserved for indexing.

4. We plan to use DNS round-robin with low TTL for load balancing and failover across the Wazuh Manager nodes, rather than Nginx or Keepalived, since we do not have a third machine solely for load balancing.

5. We are using OpenSearch shard replication with number_of_replicas: 1 and auto_expand_replicas: 0-1 to ensure data survives any single node going down and automatically resyncs on recovery.
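A minimal sketch of how `auto_expand_replicas: 0-1` is generally understood to behave: the replica count follows the number of data nodes minus one, clamped to the configured range. The function name `effective_replicas` is hypothetical, and the model deliberately ignores allocation filtering and other allocation rules, so treat it as an approximation rather than OpenSearch's actual logic.

```python
def effective_replicas(data_nodes: int, auto_expand: str = "0-1") -> int:
    """Approximate the replica count chosen for an auto_expand_replicas range.

    Assumption: replicas = data_nodes - 1, clamped to [low, high].
    An upper bound of "all" means no cap. Allocation filtering is ignored.
    """
    low_text, high_text = auto_expand.split("-")
    low = int(low_text)
    high = data_nodes - 1 if high_text == "all" else int(high_text)
    return max(low, min(high, data_nodes - 1))


# Two SSD data nodes (Nodes 2 and 3) is the case in this thread.
for nodes in (1, 2, 3):
    print(nodes, "data node(s) ->", effective_replicas(nodes), "replica(s)")
```

Under this model, two data nodes give one replica per shard, and a single surviving data node falls back to zero replicas until the other node rejoins.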
QUESTIONS
1. Does the proposed role separation make sense — specifically using Node 1 (HDD) as a dedicated master-only indexer while Nodes 2 and 3 (SSD) handle all data/ingest? Are there any caveats with this approach in Wazuh's indexer cluster?

2. Is it appropriate to run the Wazuh Dashboard only on Node 1 and Node 2, with Node 3 being dashboard-free? Or does Wazuh recommend any specific guidance on dashboard placement in a 3-node setup?

3. For our log volume (1–5M logs/day, agentless/rsyslog + O365 wodle), are 8 CPU cores and 16 GB RAM per indexer data node sufficient, or would you recommend higher specs?

4. With DNS round-robin as our load balancing strategy (no Nginx), are there known issues with Wazuh Manager cluster behaviour — particularly around agent connections or O365 wodle event routing when one node is temporarily unreachable?

5. Any general recommendations or pitfalls you've seen in similar agentless, high-volume deployments would be greatly appreciated.

Thank you for your time and for the excellent documentation — especially the mailing list thread on the 3-node minimum for split-brain avoidance, which was very helpful.

Best regards

ismail....@wazuh.com

Jun 8, 2026, 5:25:21 AM

to Wazuh | Mailing List

Hi,

After reviewing the proposed architecture against both the Wazuh and OpenSearch/Wazuh Indexer recommendations, the design is technically feasible. However, there are several considerations that should be evaluated before finalizing the deployment.

Using the existing HDD-based server as a dedicated cluster-manager-only Wazuh Indexer node is supported and can function correctly, provided that the node does not host data shards or perform ingest operations. In this configuration, the node primarily participates in cluster coordination activities such as maintaining cluster state, shard allocation decisions, node membership management, and leader election.

However, it is important to note that OpenSearch/Wazuh Indexer still persists cluster metadata and state information to disk. While cluster-manager nodes generally experience significantly less I/O than data nodes, SSD storage remains the preferred option because cluster state updates, recovery operations, shard reallocations, and cluster membership changes can still be impacted by storage performance. Therefore, using an HDD-based cluster-manager node is acceptable, but it should be viewed as a compromise rather than an optimal design.

Another important consideration is resource utilization. In the proposed architecture, Node 1 would contribute cluster coordination only, while Nodes 2 and 3 would be responsible for all indexing, storage, search, and ingest operations. As a result, the HDD server contributes little processing capacity toward actual log ingestion and storage. For small-to-medium Wazuh deployments, this can lead to underutilization of available hardware resources, particularly if the server has significant CPU and memory capacity.

The more significant concern is ensuring proper cluster-manager eligibility and quorum. OpenSearch relies on quorum-based decision making, and high availability depends on maintaining sufficient cluster-manager eligible nodes. If Node 1 is the only cluster-manager eligible node in the cluster, it becomes a single point of failure. In such a scenario, the loss of Node 1 could prevent cluster-manager election and impact cluster operations. For production environments, OpenSearch recommends maintaining multiple cluster-manager eligible nodes to ensure cluster resilience and proper leader election.

From a best-practice perspective, the preferred approach for a three-node Wazuh Indexer cluster is often to allow all three nodes to be cluster-manager eligible while using the SSD-backed nodes for data storage and ingest workloads. This provides better utilization of cluster resources while maintaining quorum and fault tolerance.

If the HDD server must remain part of the deployment, a reasonable compromise would be:

- Node 1: Cluster-manager eligible node (preferably no data role if storage performance is a concern)
- Node 2: Cluster-manager eligible + Data + Ingest
- Node 3: Cluster-manager eligible + Data + Ingest

This approach preserves quorum, avoids a single point of failure, and allows the cluster to continue operating if any individual node becomes unavailable.
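To make the quorum argument concrete, here is a rough sketch that compares the two layouts discussed in this thread. The role names follow the OpenSearch `node.roles` setting; the node names, the `PROPOSED` and `RECOMMENDED` dictionaries, and the helper functions are illustrative assumptions. `PROPOSED` reads the original post literally (only Node 1 is cluster-manager eligible), and the quorum model is a simplification that treats the voting set as fixed.

```python
from typing import Dict, Set

Layout = Dict[str, Set[str]]

# Original post, read literally: only node1 is cluster-manager eligible.
PROPOSED: Layout = {
    "node1": {"cluster_manager"},
    "node2": {"data", "ingest"},
    "node3": {"data", "ingest"},
}

# Compromise layout: all three nodes are cluster-manager eligible.
RECOMMENDED: Layout = {
    "node1": {"cluster_manager"},
    "node2": {"cluster_manager", "data", "ingest"},
    "node3": {"cluster_manager", "data", "ingest"},
}


def quorum(eligible: int) -> int:
    """Majority of the cluster-manager-eligible nodes."""
    return eligible // 2 + 1


def survives_single_failure(layout: Layout) -> Dict[str, bool]:
    """For each node, report whether losing it still leaves a quorum of
    cluster-manager-eligible nodes and at least one data node."""
    needed = quorum(sum("cluster_manager" in r for r in layout.values()))
    result = {}
    for lost in layout:
        remaining = [r for name, r in layout.items() if name != lost]
        eligible_left = sum("cluster_manager" in r for r in remaining)
        data_left = sum("data" in r for r in remaining)
        result[lost] = eligible_left >= needed and data_left >= 1
    return result


print("proposed:   ", survives_single_failure(PROPOSED))
print("recommended:", survives_single_failure(RECOMMENDED))
```

In this model the literal layout fails when Node 1 is lost, while the compromise layout tolerates the loss of any single node.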

In summary, the proposed architecture is supported and should work for the expected ingestion volume. However, the primary concern is not the use of HDD storage itself, but rather ensuring proper cluster-manager eligibility, quorum design, and efficient utilization of available hardware resources. SSD storage remains the recommended option for all OpenSearch/Wazuh Indexer nodes, including cluster-manager nodes, whenever possible.

References:
Wazuh Indexer Cluster Tuning: https://documentation.wazuh.com/current/user-manual/wazuh-indexer-cluster/wazuh-indexer-cluster-tuning.html
OpenSearch Cluster Tuning and Node Roles: https://docs.opensearch.org/2.19/tuning-your-cluster/
OpenSearch Dedicated Cluster Manager Nodes and Quorum Guidance: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-dedicatedmasternodes.html

Based on the information provided, the estimated ingestion volume is up to approximately 5 million events per day from Microsoft 365, Entra ID, and agentless Syslog sources.
Converting this to Events Per Second (EPS):

5,000,000 events/day ÷ 86,400 seconds/day = ~58 EPS average

However, infrastructure sizing should not be based solely on average EPS. Microsoft 365, Entra ID, and Syslog sources typically generate events in bursts rather than at a constant rate throughout the day. As a result, peak ingestion rates can be several times higher than the daily average.

Therefore, a practical sizing target would be approximately 300–600 EPS peak ingestion capacity.
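The arithmetic above can be checked with a few lines of Python. This is only a sketch: `average_eps` is a hypothetical helper, and the 300 and 600 EPS figures are the peak targets quoted in this reply, not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_eps(events_per_day: float) -> float:
    """Average events per second for a given daily event count."""
    return events_per_day / SECONDS_PER_DAY


avg = average_eps(5_000_000)
print(f"average: {avg:.1f} EPS")

# Express each peak target as a multiple of the daily average.
for peak in (300, 600):
    print(f"{peak} EPS peak is about {peak / avg:.1f}x the average")
```

The 300–600 EPS target therefore corresponds to a burst factor of roughly 5 to 10 times the daily average.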

Based on this workload, the proposed architecture should be sufficient provided that the indexer data nodes use SSD storage and proper retention and index management policies are implemented. Please find the recommended architecture in the attached diagram.

Regards,

arc.png

Asmit Desai

Jun 8, 2026, 7:13:06 AM

to Wazuh | Mailing List

Hi,

Thank you for the detailed response and the architecture diagram — both were very helpful in validating our design decisions.

Based on your guidance, we have finalised the following role distribution across our 3 nodes:

- Node 1 (HDD): Wazuh Manager (master) + Wazuh Indexer (cluster_manager only, no data/ingest role) + Wazuh Dashboard
- Node 2 (SSD): Wazuh Manager (worker) + Wazuh Indexer (cluster_manager + data + ingest) + Wazuh Dashboard
- Node 3 (SSD): Wazuh Manager (worker) + Wazuh Indexer (cluster_manager + data + ingest)

We have a follow-up question regarding the Wazuh Dashboard placement.

Would you recommend installing the Wazuh Dashboard on all 3 nodes (all-in-one style on each node), or is installing only the Wazuh Manager and Wazuh Indexer on Node 3 sufficient?

Our concern is that running the Dashboard on all 3 nodes may unnecessarily consume RAM on Node 3, which we would prefer to reserve entirely for the Wazuh Indexer data node workload — especially given our target ingestion volume of up to 5 million events per day.

We currently plan to run the Dashboard only on Node 1 and Node 2, which provides UI-level redundancy without impacting Node 3's indexing performance. We would appreciate your confirmation on whether this is the recommended approach, or if there are specific reasons to deploy the Dashboard on Node 3 as well.

Thank you again for your time and support.

Regards,
Asmit Desai

ismail....@wazuh.com

Jun 9, 2026, 2:45:36 AM

to Wazuh | Mailing List

Hi,

Thank you for the follow-up and for sharing the finalized architecture.

Regarding the Wazuh Dashboard placement, your proposed approach of deploying the Dashboard only on Node 1 and Node 2 is perfectly valid and is the approach we would recommend for this deployment.

The Wazuh Dashboard is a stateless component that provides the user interface and communicates with the Wazuh Indexer cluster. It does not participate in indexing, data storage, shard allocation, cluster management, or event processing. Therefore, there is no requirement to install the Dashboard on every node in the cluster. Deploying the Dashboard on Nodes 1 and 2 provides UI-level redundancy while allowing Node 3 to dedicate all available resources to Wazuh Manager and Indexer operations.

Additionally, multiple Dashboard instances are fully supported. All Dashboard instances connect to the same Indexer cluster and display the same data. Running multiple Dashboards does not duplicate data or increase indexing workload. The primary load generated by Dashboards comes from user searches, visualizations, and reporting activities, which are executed against the Indexer cluster regardless of the number of Dashboard instances.

One additional consideration is that Wazuh's recommended architecture for larger production environments is to separate the Wazuh Manager layer and the Wazuh Indexer layer onto dedicated servers. This recommendation exists because Managers are responsible for event processing, decoding, rule matching, and alert generation, while Indexers are responsible for indexing, searching, shard management, and storage operations. Separating these roles allows resources to be dedicated to each workload independently and simplifies future scaling.

However, for the estimated workload you described (approximately 5 million events per day, equivalent to ~58 average EPS with peak rates potentially several times higher), your proposed architecture remains a reasonable and practical compromise.

That said, we recommend carefully validating the available CPU and memory resources on each node. According to Wazuh sizing recommendations, a deployment handling this workload may require approximately 8 CPU cores and 16 GB RAM for the Wazuh Manager alone. In your design, the same servers are also hosting Wazuh Indexer and, on some nodes, the Wazuh Dashboard. As a result, the total resource requirements will be higher than those allocated for a standalone Manager deployment.

Before proceeding, we recommend reviewing the combined resource consumption of all components and ensuring that sufficient CPU, memory, and storage capacity are available, particularly on Nodes 2 and 3 where the Indexer data and ingest workloads will reside. Monitoring resource utilization after deployment is also recommended to confirm that adequate headroom remains for future growth.

I hope this helps. Please let us know if you have any further questions or concerns.

Regards,
