Failover not working as expected

24 views

Skip to first unread message

Mahendra Singh Bisht

unread,

Jul 9, 2025, 3:31:00 PMJul 9

to proxysql

Hi team,

I am facing an issue with testing failover in ProxySQL. Below are the details:

ProxySQL version : ProxySQL version 2.4.7-5-gd467cc0

MySQL version : Percona Server for MySQL : 5.7.40

OS version : CentOS Linux release 7.9.2009 (Core)

Master : va-db-replica-015

Replica : va-db-main-002.

Here is how I am simulating a failover.

Stop replication on the Replica : va-db-main-002
Stop MySQL service on the Replica : va-db-main-002

The proxysql.log shows :

2025-07-09 12:48:56 MySQL_Monitor.cpp:1230:monitor_ping_thread(): [ERROR] Error after 0ms on server va-db-main-002:3306 : Lost connection to MySQL server during query
2025-07-09 12:48:56 MySQL_Monitor.cpp:1565:monitor_read_only_thread(): [ERROR] Got error: mmsd 0x7fc3474d1140 , MYSQL 0x7fc343004600 , FD 43 : Lost connection to MySQL server during query
2025-07-09 12:48:58 MySQL_Monitor.cpp:1388:monitor_read_only_thread(): [ERROR] Timeout on read_only check for va-db-main-002:3306 after 0ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout on creating new connection: Can't connect to MySQL server on '10.64.100.10' (115).
2025-07-09 12:49:00 MySQL_Monitor.cpp:2890:monitor_ping(): [ERROR] Server va-db-main-002:3306 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.

ProxySQLAdmin> select * from mysql_servers;
+--------------+-------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10 | va-db-replica-015 | 3306 | 0 | ONLINE | 100 | 0 | 1000 | 0 | 0 | 0 | |
| 20 | va-db-main-002 | 3306 | 0 | ONLINE | 100 | 0 | 1000 | 0 | 0 | 0 | |
| 1 | 127.0.0.1 | 9999 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | |
+--------------+-------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

ProxySQLAdmin> select * from runtime_mysql_servers;
+--------------+-------------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-------------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1 | 127.0.0.1 | 9999 | 0 | SHUNNED | 1 | 0 | 1000 | 0 | 0 | 0 | |
| 10 | va-db-replica-015 | 3306 | 0 | ONLINE | 100 | 0 | 1000 | 0 | 0 | 0 | |
| 20 | va-db-main-002 | 3306 | 0 | SHUNNED | 100 | 0 | 1000 | 0 | 0 | 0 | |
+--------------+-------------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

ChatGPT says try: SAVE MYSQL SERVERS TO DISK; to save the runtime status but this doesn't seems a valid approach because we are forcing the behavior whereas the OFFLINE/SHUNNED status should be automatic.

SHOW VARIABLES LIKE 'read_only';
SHOW VARIABLES LIKE 'super_read_only';

The output for the above variables in Master and Replica are as expected. Master : 0/OFF and Replica : 1/ON.

Should we not rely on the mysql_servers table and should focus on runtime_mysql_servers?
Do we need to manually persist server status? Because the documentation says :

status: the configured of the backend. This does not represent the current status, but the configured one:
ONLINE – backend server is fully operational
SHUNNED – backend sever is temporarily taken out of use because of either too many connection errors in a time that was too short, or the replication lag exceeded the allowed threshold
OFFLINE_SOFT – when a server is put into OFFLINE_SOFT mode, no new connections are created toward that server, while the existing connections are kept until they are returned to the connection pool or destructed. In other words, connections are kept in use until multiplexing is enabled again, for example when a transaction is completed. This makes it possible to gracefully detach a backend as long as multiplexing is efficient
OFFLINE_HARD – when a server is put into OFFLINE_HARD mode, no new connections are created toward that server and the existing free connections are immediately dropped, while backend connections currently associated with a client session are dropped as soon as the client tries to use them. This is equivalent to deleting the server from a hostgroup. Internally, setting a server in OFFLINE_HARD status is equivalent to deleting the server

UPDATE mysql_servers SET status='OFFLINE_HARD' WHERE hostname='host';

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

How would I test the failover in ProxySQL for the Master/Replica node ?

I am fairly new in ProxySQL and we are testing this at the moment as POC, we would like to use ProxySQL instead of HAProxy for MySQL backend servers.

Regards,

Dave Rix

unread,

Jul 9, 2025, 4:59:51 PMJul 9

to proxysql

Hey,

I'm not on the ProxySQL team, but I may be able to help here.

The `mysql_servers` table is your 'staging' area, where you define and prepare the servers, set weights and connection limits, etc.

The `runtime_mysql_servers` table is the runtime environment showing the actual status of the servers.

You use `load mysql servers to runtime;` to upload your staging changes to the runtime environment and make them active.

You use `save mysql servers from runtime;` to download the runtime environment to your staging environment.

The `save mysql servers to disk;` will ONLY save the staging environment to disk, not the runtime environment (basically don't trust ChatGPT 😬 ).

When you stop the MySQL service on a host, ProxySQL identifies that as shown in your logs, and updates the `runtime_mysql_servers` table.

No new application connections would be sent to that host until it comes back `ONLINE`.

So yes, ignore the `mysql_servers` table when looking at the realtime state of the cluster, refer to the `runtime_mysql_servers` table instead.

Also, by stopping the `replica` as you described above does not test a `failover` event, but a `failure` event of that host.

To test a `failover`, you would need to stop the MySQL service on the primary DB host, and then alter the config on the replica to remove the read-only setting making it read-write.

ProxySQL would then detect this, and move the (old) replica into the `writer` hostgroup, and route your users with the `writer` hostgroup as a default to the new primary node.

Remember, if you do this, your old primary is essentially then a stale database, and would need to be recreated as a new replica to the new primary.

And of course, always test this using a database backend cluster that you don't mind recreating if it all goes wrong.

Regards,

Dave

Mahendra Singh Bisht

unread,

Jul 9, 2025, 5:22:23 PMJul 9

to proxysql

Thank you for the explanation Dave, really appreciated.

While I was trying to find more details, I found this : https://proxysql.com/documentation/backend-server-configuration/ which has some details that corelates with what you explained. Going forward, I will refer runtime* tables. My bad in using the term 'failover' for failure, I meant failure :).

Following up with the failover, so after disabling the read-only on the replica, will ProxySQL detect it automatically or do I need to make some changes in ProxySQL tables/config as well?

Regards,

Dave Rix

unread,

Jul 10, 2025, 11:34:56 AMJul 10

to proxysql

From my experience, you only need to alter the configuration on the MySQL servers as part of the failover process, ProxySQL will detect that change and change the hostgroup allocations for the servers so that the new 'primary' is the host in the 'writer' hostgroup - it should also remain as 'ONLINE' so will be the one taking traffic.

The old 'primary' should be set to 'read-only' in the MySQL config so ProxySQL allocates it to the 'reader' hostgroup - and if you are failing over due to a failure of the old primary, then it will be offline, and ProxySQL will mark it as 'SHUNNED' and not send any new application traffic to it.

Hope that helps.

Dave

Reply all

Reply to author

Forward

0 new messages