ProxySQL - Load balancing connections uniformly across readers mapped under a custom endpoint in Aurora DB cluster


kamlesh sharma

Jul 29, 2025, 8:46:55 AM
to proxysql
Hi

We're using an Amazon Aurora MySQL DB cluster with 6 reader instances: rep-1, rep-2, rep-3 and trx-1, trx-2, trx-3. We've defined 2 custom endpoints in Aurora: reporting and transactionalReads.

We've assigned rep-1, rep-2, rep-3 to the reporting custom endpoint and trx-1 to trx-3 to the transactionalReads custom endpoint.

We've done this because we have two distinct read workloads that need isolation, plus uniform load balancing of connections/queries across the reader instances within each of the reporting and transactionalReads groups:

  1. Reporting queries: Long-running, analytical. Dedicated reader group (custom endpoint: reporting).
  2. Transactional reads: Low-latency, user-facing. Separate reader group (transactionalReads).

Our goals are to:

  1. Uniformly load balance connections/queries within each reader group. Aurora's DNS-based round robin is not enough: when a connection pool initializes, or when load creates a burst of new connections, it can assign all of them to a single reader.

  2. Isolate the query workloads. For example, in our Java backend applications we create separate datasources for reporting and transactionalReads, each with its own connection pool. Queries executed on connections from the reporting datasource go to the reporting custom endpoint, which internally routes to the reporting readers rep-1, rep-2, rep-3.

Attempted Solutions

1. Amazon's Advanced JDBC Wrapper driver: failed to load balance across the reader instances assigned to a custom endpoint. The driver only supports load balancing in read/write splitting scenarios; our case of custom endpoints is not supported, so we fall back to Aurora's DNS-based load balancing, which sends all connections created at pool initialization or under concurrent request load to a single reader instance.

2. RDS Proxy (not implemented, only researched): unlike Aurora's custom reader endpoints, an RDS Proxy endpoint gives us no way to ensure that queries/connections go only to the reporting readers rep-1, rep-2, rep-3. (See the section "Limitations of proxy endpoints": https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-endpoints.html#rds-proxy-endpoints-limits)

We've read about ProxySQL and want to confirm that we can achieve the above goals with it.

What we understand so far from the documentation:

We can define two hostgroups and add the respective instance endpoints to them (sketched after the list below):

  • hostgroup 10 → Reporting readers (IPs: reporting-reader-1:3306, etc.).
  • hostgroup 20 → Transactional readers (transactional-reader-1:3306, etc.).
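
If we understand the admin interface correctly, the setup would look something like this (a minimal sketch; the hostnames are placeholders for the actual Aurora instance endpoints):

  -- Register each reader's instance endpoint in its hostgroup.
  INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight) VALUES
    (10, 'rep-1.xxxx.us-east-1.rds.amazonaws.com', 3306, 1000),
    (10, 'rep-2.xxxx.us-east-1.rds.amazonaws.com', 3306, 1000),
    (10, 'rep-3.xxxx.us-east-1.rds.amazonaws.com', 3306, 1000),
    (20, 'trx-1.xxxx.us-east-1.rds.amazonaws.com', 3306, 1000),
    (20, 'trx-2.xxxx.us-east-1.rds.amazonaws.com', 3306, 1000),
    (20, 'trx-3.xxxx.us-east-1.rds.amazonaws.com', 3306, 1000);
  LOAD MYSQL SERVERS TO RUNTIME;
  SAVE MYSQL SERVERS TO DISK;

(As we understand it, ProxySQL picks a backend within a hostgroup by weight, so equal weights should give an even spread of connections rather than a strict round robin.)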

Can we route queries to the reporting readers or the transactional readers based on SQL comments in the query (e.g., /* REPORTING */), roughly as sketched below?
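
This is what we imagine, if mysql_query_rules works the way we think (the rule IDs and the second comment tag are illustrative, and we gather the client/driver must actually send the comment; the mysql CLI, for one, strips comments unless run with --comments):

  -- Route comment-tagged queries to the matching reader hostgroup.
  INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES
    (1, 1, '^/\* REPORTING \*/', 10, 1),
    (2, 1, '^/\* TRANSACTIONAL \*/', 20, 1);
  LOAD MYSQL QUERY RULES TO RUNTIME;
  SAVE MYSQL QUERY RULES TO DISK;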

Questions

  1. Is this the right architecture for our use case? Can ProxySQL load balance, say in a round-robin fashion, across the readers in the reporting hostgroup when we establish the initial connection pool for reporting?
  2. How do we configure query rules to ensure that:
     • Reporting queries only use hostgroup 10.
     • Transactional queries only use hostgroup 20.
  3. How do we handle reader failover (e.g., Aurora auto-scaling readers) with ProxySQL?
  4. Performance impact: any tuning tips for high concurrency (1K+ connections)?

Thanks,
Kamlesh Sharma

Dave Rix

Sep 30, 2025, 7:20:13 PM
to proxysql
Hi Kamlesh,
I'm not part of the ProxySQL team, but I found this question interesting and hopefully I can provide some input based on my usage of ProxySQL.

First, you can put the Aurora replicas into hostgroups as you have indicated, '10' for one set of read traffic and '20' for the other. However, that means you would have to manage the Aurora instances in those hostgroups manually, and you could not take advantage of ProxySQL's auto-discovery of new instances as they come online or are replaced by Aurora (rare, but it can happen).

I had a similar situation with some very read-heavy workloads that needed isolation, so we built a second Aurora cluster that uses traditional binlog replication to receive data updates from the main Aurora cluster. This allowed the read-heavy workloads to run in isolation from the regular workloads, with binlog replication keeping the two clusters almost in sync. Occasionally the replication lags a little, usually because certain workloads overload the second cluster and the replication SQL thread cannot keep up with the IO thread. That wasn't a big issue for us, as the output of the read-heavy traffic was not required to be real-time.

If you go down this route, you can take advantage of ProxySQL's auto-discovery by adding one entry per cluster to the `mysql_aws_aurora_hostgroups` and `mysql_replication_hostgroups` tables. ProxySQL then manages the two clusters separately and adds or removes instances as required when you scale your clusters in or out (perhaps using Aurora auto-scaling). A rough sketch follows.
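
Something along these lines (a sketch only; the hostgroup numbers, domain names, and thresholds are placeholders to tune, and it's worth checking the column list against the current docs):

  -- One row per Aurora cluster: ProxySQL polls the cluster topology
  -- and populates the writer/reader hostgroups automatically.
  INSERT INTO mysql_aws_aurora_hostgroups
    (writer_hostgroup, reader_hostgroup, active, aurora_port, domain_name,
     max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader,
     new_reader_weight, comment) VALUES
    (1, 10, 1, 3306, '.cluster-xxxx.us-east-1.rds.amazonaws.com',
     2000, 1000, 800, 0, 1, 'main cluster'),
    (2, 20, 1, 3306, '.cluster-yyyy.us-east-1.rds.amazonaws.com',
     2000, 1000, 800, 0, 1, 'binlog replica cluster');
  LOAD MYSQL SERVERS TO RUNTIME;
  SAVE MYSQL SERVERS TO DISK;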

Whichever route you choose, I would recommend using a separate username for each application. That way you would not need /* REPORTING */ as a query hint to select the target hostgroup; instead you specify a default hostgroup per username, so each user is routed to the appropriate hostgroup, and thereby the required replica instances. Users with 'write' permissions would then be routed by default to the hostgroup of the 'writer' instance in the primary cluster. Something like the sketch below.
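
For example (hypothetical usernames and passwords, just to show the shape of it):

  -- One ProxySQL user per application; default_hostgroup does the routing.
  INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES
    ('reporting_app', 'secret', 1, 10),  -- reporting readers
    ('trx_reads_app', 'secret', 1, 20),  -- transactional readers
    ('writer_app',    'secret', 1, 1);   -- writer hostgroup
  LOAD MYSQL USERS TO RUNTIME;
  SAVE MYSQL USERS TO DISK;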

You may still see the problem of all connections from a particular source going to one instance when they start up as part of a pool, but we've found that configuring a delay in the pool warm-up, and keeping the initial minimum number of connections in the pool small, helps with that too. Don't ask me what the connection parameters are; I can't remember, as the app team deals with that side of things.

On the connection side of things, we have a small ProxySQL cluster in AWS running on three m7.large instances, and we regularly push over 5k connections to the backend Aurora clusters. I suggest tuning the ProxySQL variable mysql-connection_max_age_ms to a value other than zero (unlimited): we found that otherwise the connections to Aurora kept growing and consumed a significant amount of RAM on the instances. We have ours set to 2 hours, so ProxySQL drops idle connections to Aurora once they are 2 hours old.
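
In admin terms, that would be something like (the value is in milliseconds):

  -- Expire idle backend connections after 2 hours (0 = unlimited, the default).
  SET mysql-connection_max_age_ms = 7200000;
  LOAD MYSQL VARIABLES TO RUNTIME;
  SAVE MYSQL VARIABLES TO DISK;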

I hope that helps point you in the right direction - I'm not saying it's the best/ultimate solution for you, but it would satisfy most of your requirements.

Regards,
Dave