We’re using Amazon Aurora MySQL DB Cluster where we have 6 reader instances rep-1, rep-2, rep-3 and trx-1,trx-2, trx-3 . We've defined 2 custom endpoints in the Aurora reporting and transactionalReads.
We've assigned rep-1,rep-2,rep-3 to reporting custom endpoint and trx-1 to trx-3 to the transactionalReads custom endpoint.
We've done this because we have two distinct read workloads that need isolation and uniform load balancing of connections / queries across the reader instances in these respective reader groups reporting and transactionalReads -
- Reporting queries: Long-running, analytical. Dedicated reader group (custom endpoint: reporting).
- Transactional reads: Low-latency, user-facing. Separate reader group (transactionalReads).
Our goal is to:Uniformly load balance connections /queries within each reader group, no DNS-based round robin like Aurora MySQL as it assigns all connections to any 1 readers on connections burst on pool initialization OR when there is a load creating a burst of connections.
Control isolation of query workloads: For example - we create different datasources for reporting and transactionalReads with their own set of connection pools in our java backend applications. The queries executed using connections from the reporting datasource go to reporting custom endpoint and which internally go to reporting readers rep-1, rep-2, rep3.
Attempted Solutions
1. Amazon's advanced JDBC wrapper Driver: Failed to load balance across reader instances assigned under the custom endpoint. The driver has limitations that only supports load balancing for read/write splitting scenarios. Our unique case of custom endpoints is not supported, and Aurora's DNS-based load balancing causes all our connections on pool initialization or concurrent request load to go to any one of the reader instances.
We read about the ProxySQL and want to confirm that we can achieve the above goals with it -
What we've understand so far from documentation -
We can define 2 different hostgroups and add respective instance endpoints in this 2 host groups:
- hostgroup 10 → Reporting readers (IPs: reporting-reader-1:3306, etc.).
- hostgroup 20 → Transactional readers (transactional-reader-1:3306, etc.).
Can we route queries to reporting readers or the transactional readers based on SQL Comments in the query like this -
SQL comments (e.g., /* REPORTING */) or
Questions
Is this the right architecture for our use case? Can ProxySQL load balance say in a round-robin fashion when we establish the initial connection pool for reporting across the readers available in the reporting host-group ?
How to configure query rules to ensure:
- Reporting queries only use hostgroup 10.
- Transactional queries only use hostgroup 20.
How to handle reader failover (e.g., Aurora auto-scaling readers) with ProxySQL ?
Performance impact: Any tuning tips for high concurrency (1K+ connections)?