Hey René,
Thanks. I've now switched to using the mysql_aws_aurora_hostgroups table.
mysql> SELECT hostgroup_id, hostname, max_connections, max_replication_lag FROM mysql_servers;
+--------------+---------------------------------------------------+-----------------+---------------------+
| hostgroup_id | hostname                                          | max_connections | max_replication_lag |
+--------------+---------------------------------------------------+-----------------+---------------------+
|            1 | xx-db-rw.xx.internal                              |             200 |                   5 |
|            1 | xx-xx-rw.endpoint.us-east-1.rds.amazonaws.com     |            1000 |                   0 |
|            2 | xx-db-rr.xx.internal                              |             200 |                   5 |
|            2 | xx-db-rw.xx.internal                              |             200 |                   0 |
|            2 | xx-xx-rr.endpoint.us-east-1.rds.amazonaws.com     |            1000 |                   0 |
|            2 | xx-xx-rw.endpoint.us-east-1.rds.amazonaws.com     |            1000 |                   0 |
|            3 | xx-db-rr.xx.internal                              |             200 |                   0 |
|            5 | xx-xx-mirror.endpoint.us-east-1.rds.amazonaws.com |             200 |                   0 |
+--------------+---------------------------------------------------+-----------------+---------------------+
The xx-db-rw.xx.internal and xx-db-rr.xx.internal endpoints point to the cluster's rw and rr endpoints, respectively, and it looks like ProxySQL discovers the instance endpoints and inserts them into the mysql_servers table.
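For completeness, the Aurora auto-discovery settings and the servers ProxySQL is actually using at runtime can be pulled from the admin interface with something like the following (a sketch; output not pasted here):

```sql
-- Aurora auto-discovery settings (writer/reader hostgroups, domain, check interval/timeout)
SELECT * FROM mysql_aws_aurora_hostgroups;

-- Servers as loaded at runtime, including any instance endpoints ProxySQL discovered
SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;
```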
The issue I'm currently seeing, which I didn't see with the RDS backend, is a high number of aborted clients (Aborted_clients) on Aurora when running load tests. In general everything works fine, but once the load gets high, errors like these show up:
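One way to quantify the aborted clients is to sample the server's counter directly on Aurora (connecting to the cluster endpoint, bypassing ProxySQL) before and after a load-test run and compare the two values:

```sql
-- Cumulative since server start; take the difference between two samples
-- to get the number of client connections aborted during the run.
SHOW GLOBAL STATUS LIKE 'Aborted_clients';
```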
MySQL_Session.cpp:1690:handler_again___status_PINGING_SERVER(): [ERROR] Ping timeout during ping on xx-db-rw.xx.internal:3306 after 200092us (timeout 200ms)
MySQL_Monitor.cpp:6054:monitor_AWS_Aurora_thread_HG(): [ERROR] Error after 1000ms on server hs-db-rw.staging.internal:3306 : timeout check
mysql_connection.cpp:1178:handler(): [ERROR] Connect timeout on xx-db-rw.xx.internal:3306 : exceeded by 3077us
MySQL_Monitor.cpp:5862:monitor_AWS_Aurora_thread_HG(): [ERROR] Error on AWS Aurora check for xx-xx-rw.endpoint.us-east-1.rds.amazonaws.com:3306 after 1001ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 110
MySQL_Session.cpp:3101:handler_again___status_CHANGING_USER_SERVER(): [ERROR] Change user timeout during COM_CHANGE_USER on xx-db-rw.xx.internal , 3306
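If it helps narrow things down, ProxySQL's own monitor history for the same window can be pulled from the admin interface. This is a sketch, assuming the standard monitor schema tables, where the error column is NULL for successful checks:

```sql
-- Most recent failed connect checks recorded by the ProxySQL monitor
SELECT hostname, port, time_start_us, connect_success_time_us, connect_error
  FROM monitor.mysql_server_connect_log
 WHERE connect_error IS NOT NULL
 ORDER BY time_start_us DESC
 LIMIT 20;

-- Most recent failed ping checks
SELECT hostname, port, time_start_us, ping_success_time_us, ping_error
  FROM monitor.mysql_server_ping_log
 WHERE ping_error IS NOT NULL
 ORDER BY time_start_us DESC
 LIMIT 20;
```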
These errors last for about 30 seconds and then resolve themselves without any intervention. I even increased connect_timeout_server to 2s and monitor_read_only_timeout to 1.5s. The database is fine during that time: I can connect to it and query it directly, but for some reason ProxySQL reports that it can't reach it. Aurora CPU rises to about 20% during the load tests, which is nothing out of the ordinary. I'm checking with AWS whether there's anything on the Aurora side. Any ideas, or anything I should look out for on the ProxySQL end? Thanks!
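For reference, the timeout changes above would look roughly like this in the ProxySQL admin interface (a sketch, assuming the standard mysql- variable prefix; both values are in milliseconds):

```sql
-- Raise the backend connect timeout to 2s and the read_only monitor timeout to 1.5s
UPDATE global_variables SET variable_value = '2000'
 WHERE variable_name = 'mysql-connect_timeout_server';
UPDATE global_variables SET variable_value = '1500'
 WHERE variable_name = 'mysql-monitor_read_only_timeout';

-- Apply to the running instance, then persist across restarts
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```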