ProxySQL with Aurora cluster endpoint

Sulav Regmi

Mar 1, 2024, 10:02:45 AM

to proxysql

Hello,

I'm seeing an issue with Aurora cluster read/write endpoints behaving differently from RDS. Hostgroup 1 is the only writer hostgroup; the rest are reader hostgroups. According to the stats_mysql_errors table, the reader (xx-db-rr.xx.internal) is being used as a member of hostgroup 1. The exact same configuration ran against an RDS primary and a replica without any issues like this. Please see the table contents below. Am I misunderstanding something?

mysql> SELECT hostgroup, hostname, last_error FROM stats_mysql_errors\G
*************************** 1. row ***************************
hostgroup: 1
hostname: xx-db-rr.xx.internal
last_error: Cannot execute statement in a READ ONLY transaction.
*************************** 2. row ***************************
hostgroup: 1
hostname: xx-db-rr.xx.internal
last_error: The MySQL server is running with the --read-only option so it cannot execute this statement
2 rows in set (0.01 sec)

Thank you.

Sulav Regmi

Mar 18, 2024, 12:22:59 PM

to proxysql

I think I know what the issue might be (I have yet to confirm it). The `check_type` in `mysql_replication_hostgroups` should be `innodb_read_only` instead of the default `read_only`, or, to stay compatible during an RDS -> Aurora migration, `read_only|innodb_read_only`. With `mysql-monitor_writer_is_also_reader` set to `true`, the node will be in both the reader and writer hostgroups.
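For reference, a minimal sketch of that change made through the ProxySQL admin interface. It assumes writer hostgroup 1 and reader hostgroup 2 and has not been verified against this setup:

```sql
-- Run against the ProxySQL admin interface (not the backend database).
-- Assumption: the replication hostgroup pair is writer 1 / reader 2.
UPDATE mysql_replication_hostgroups
SET check_type = 'read_only|innodb_read_only'
WHERE writer_hostgroup = 1;

-- mysql_replication_hostgroups is loaded together with the servers tables.
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Confirm what the runtime configuration actually contains.
SELECT writer_hostgroup, reader_hostgroup, check_type
FROM runtime_mysql_replication_hostgroups;
```

Checking the `runtime_` table is useful because ProxySQL normally reads proxysql.cnf only when its on-disk database does not exist yet, so an edit to the file alone may never reach the running configuration.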

Sulav Regmi

Mar 19, 2024, 3:39:13 PM

to proxysql

A new issue around this is that the mysql_replication_hostgroups data from the proxysql.cnf file doesn't get applied when the check_type is set to read_only|innodb_read_only.

mysql_replication_hostgroups=
({
    writer_hostgroup = 1
    reader_hostgroup = 2
    check_type = "read_only|innodb_read_only"
    comment = "Database"
})

The above results in,

René Cannaò

Mar 19, 2024, 3:41:38 PM

to Sulav Regmi, proxysql

Hi Sulav,

ProxySQL supports Aurora with dedicated monitoring. Please use that, instead of replication hostgroups.

Details here:

https://proxysql.com/documentation/aws-aurora-configuration/

Thanks,

René
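A minimal sketch of what the native Aurora configuration could look like on the admin interface. The hostgroup numbers follow the thread; the domain name, the instance hostname, and the interval and timeout values are placeholders and assumptions, not values from this cluster:

```sql
-- Run against the ProxySQL admin interface.
-- Assumptions: writer hostgroup 1, reader hostgroup 2, port 3306.
-- domain_name is a placeholder; it must start with a dot and match the
-- suffix shared by the cluster's instance endpoints.
INSERT INTO mysql_aws_aurora_hostgroups
    (writer_hostgroup, reader_hostgroup, active, aurora_port, domain_name,
     check_interval_ms, check_timeout_ms, writer_is_also_reader, comment)
VALUES
    (1, 2, 1, 3306, '.xxxxxxxx.us-east-1.rds.amazonaws.com',
     1000, 800, 1, 'Database');

-- Seed one instance endpoint (hypothetical hostname) so the monitor has
-- a server to query; other instances are discovered from it.
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (1, 'instance-1.xxxxxxxx.us-east-1.rds.amazonaws.com', 3306);

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

With this in place, the `mysql_replication_hostgroups` entry for the same hostgroups would no longer be needed.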

Sulav Regmi

Mar 19, 2024, 5:18:25 PM

to proxysql

Thanks René. I need to get this set up quickly, which is why I'm trying to do this the easier way for now. I'll eventually look into using the dedicated monitoring.

René Cannaò

Mar 19, 2024, 6:02:06 PM

to Sulav Regmi, proxysql

Hi Sulav,

I respectfully disagree here, sorry.

I think the "easy way" is the right way: you have been struggling with this for over 2 weeks because you are using the wrong set of monitoring tools.

Use the right set of tools; it is the easy way. Use the native support for AWS Aurora.

Thanks,

René

Sulav Regmi

Apr 1, 2024, 11:32:29 PM

to proxysql

Hey René,

Thanks. I have now switched to using mysql_aws_aurora_hostgroups.

mysql> SELECT hostgroup_id, hostname, max_connections, max_replication_lag FROM mysql_servers;
+--------------+---------------------------------------------------+-----------------+---------------------+
| hostgroup_id | hostname                                          | max_connections | max_replication_lag |
+--------------+---------------------------------------------------+-----------------+---------------------+
| 1            | xx-db-rw.xx.internal                              | 200             | 5                   |
| 1            | xx-xx-rw.endpoint.us-east-1.rds.amazonaws.com     | 1000            | 0                   |
| 2            | xx-db-rr.xx.internal                              | 200             | 5                   |
| 2            | xx-db-rw.xx.internal                              | 200             | 0                   |
| 2            | xx-xx-rr.endpoint.us-east-1.rds.amazonaws.com     | 1000            | 0                   |
| 2            | xx-xx-rw.endpoint.us-east-1.rds.amazonaws.com     | 1000            | 0                   |
| 3            | xx-db-rr.xx.internal                              | 200             | 0                   |
| 5            | xx-xx-mirror.endpoint.us-east-1.rds.amazonaws.com | 200             | 0                   |
+--------------+---------------------------------------------------+-----------------+---------------------+

The xx-db-rw/rr.xx.internal endpoints point to the cluster rw/rr endpoints respectively, and it looks like ProxySQL discovers the instance endpoints and inserts them into the servers table.

The issue I'm currently seeing, which I didn't see with the RDS backend, is a high number of Aborted clients on Aurora when running load tests. In general everything works fine, but under heavy load, errors like these show up:

MySQL_Session.cpp:1690:handler_again___status_PINGING_SERVER(): [ERROR] Ping timeout during ping on xx-db-rw.xx.internal:3306 after 200092us (timeout 200ms)

MySQL_Monitor.cpp:5913:monitor_AWS_Aurora_thread_HG(): [ERROR] Timeout on AWS Aurora health check for xx-xx-rw.endpoint.us-east-1.rds.amazonaws.com:3306 after 1008ms. If the server is overload, increase mysql_aws_aurora_hostgroups.check_timeout_ms

MySQL_Monitor.cpp:6054:monitor_AWS_Aurora_thread_HG(): [ERROR] Error after 1000ms on server hs-db-rw.staging.internal:3306 : timeout check

mysql_connection.cpp:1178:handler(): [ERROR] Connect timeout on xx-db-rw.xx.internal:3306 : exceeded by 3077us

MySQL_Monitor.cpp:5862:monitor_AWS_Aurora_thread_HG(): [ERROR] Error on AWS Aurora check for xx-xx-rw.endpoint.us-east-1.rds.amazonaws.com:3306 after 1001ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 110

MySQL_Session.cpp:3101:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on xx-db-rw.xx.internal , 3306

They happen for about 30 seconds and then resolve themselves without any intervention. I even increased connect_timeout_server to 2s and monitor_read_only_timeout to 1.5s. The database is fine during that time: I can connect to it and query it directly, but for some reason ProxySQL says it can't reach it. Aurora CPU gets to about 20% usage, but that is nothing out of the ordinary for the load tests. I'm talking to AWS about whether there's anything on the Aurora side. Any ideas, or anything I can look out for on the ProxySQL end? Thanks!
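As a starting point on the ProxySQL side, here is a hedged sketch of where to look and which knobs the log messages above point at. The new values are illustrative assumptions, not recommendations, and longer timeouts may only hide the underlying cause:

```sql
-- Run against the ProxySQL admin interface.

-- Recent Aurora health checks recorded by the monitor module.
SELECT * FROM monitor.mysql_server_aws_aurora_log
ORDER BY time_start_us DESC
LIMIT 20;

-- Per-backend connection usage, connection errors, and measured latency.
SELECT hostgroup, srv_host, status, ConnUsed, ConnFree, ConnERR, Latency_us
FROM stats_mysql_connection_pool;

-- Timeouts named or implied by the errors (illustrative values only).
-- Assumption: the Aurora hostgroup entry uses writer hostgroup 1.
UPDATE mysql_aws_aurora_hostgroups
SET check_timeout_ms = 1500
WHERE writer_hostgroup = 1;

UPDATE global_variables SET variable_value = '1500'
WHERE variable_name = 'mysql-monitor_connect_timeout';

-- The "Ping timeout ... (timeout 200ms)" message corresponds to this variable.
UPDATE global_variables SET variable_value = '500'
WHERE variable_name = 'mysql-ping_timeout_server';

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL VARIABLES TO RUNTIME;
```

If `ConnUsed` sits at the `max_connections` limit of a backend during the 30-second window, that would suggest pool exhaustion under load rather than a network or Aurora problem.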
