Trouble understanding why ProxySQL sends queries to servers it cannot connect to

1,655 views
Skip to first unread message

dan.j...@sproutsocial.com

unread,
Mar 22, 2017, 9:38:03 AM3/22/17
to proxysql
Howdy,

I have a simple vagrant setup with a single master and a couple of read replicas.  When I turn off one of the read replicas, I see it's status set to "SHUNNED" in runtime_mysql_servers.  This is what I expect to happen.  I also see entries like the following in monitor.mysql_server_connect_log

mysql> select * from monitor.mysql_server_connect_log where hostname='192.168.33.102' LIMIT 1;
+----------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| hostname       | port | time_start_us    | connect_success_time_us | connect_error                                                                                          |
+----------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| 192.168.33.102 | 3306 | 1490189043197151 | 0                       | Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 107 |
+----------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

This clearly indicates that the server cannot be connected to.  Sometimes if I query the actual backend through ProxySQL, the instance will change from "SHUNNED" to "ONLINE" and the query will time out.

mysql> select count(*) from employees;
ERROR 9001 (HY000): Max connect timeout reached while reaching hostgroup 1 after 6733ms

Why is ProxySQL ever trying to send queries to a server that it cannot connect to?  Is there some variable I need to change to prevent this from occurring? 

Thanks! 

Dan

René Cannaò

unread,
Mar 23, 2017, 6:10:24 AM3/23/17
to proxysql
I forgot to copy the mailing list.

---------- Forwarded message ----------
From: René Cannaò <rene....@gmail.com>
Date: 23 March 2017 at 03:09
Subject: Re: Trouble understanding why ProxySQL sends queries to servers it cannot connect to
To: dan.j...@sproutsocial.com


Hi Dan,

In ProxySQL there are two modules that take care of the status of the backends:
* Hostgroup Manager
* Monitor

Let's discuss about Monitor on a later stage, and let's analyze just the Hostgroup Manager.

The Hostgroup Manager is responsible for managing the connection pool, and is able to detect a faulty backend while running queries.
If a backend generates too many errors, it is shunned for some time (10 seconds by default).
After this time passes, it is flagged again as ONLINE, that means it will try again to connect to it (this specifically answer your question).
Simplifying, the Hostgroup Manager:
* shuns a node when it generates errors
* brings it back online after some time

The Hostgroup Manager doesn't perform any background health check: it doesn't really need to, because is able to detect failures in real time.
Although, background check is performed by the Monitor.
If the Monitor detects that a node is down, it informs the Hostgroup Manager to shun it and immediately kill all the connections.
Monitor will never inform the Hostgroup Manager that a node is healthy.
That add some information to your question: Hostgroup Manager is the only responsible to bring a node online: it tries to communicate with it.

It is interesting to understand why Monitor exists, if Hostgroup Manager is the main module that shun and re-enable a node.
Hostgroup Manager can detect if a connection is broken most of the time, but there is one exception to this: if a query is sent to an already established connection and a network issue happens while running the query, proxysql doesn't know if the backend is still processing the query or if there is an network issue.
This is  why Monitor check the status of the backends with its own intervals and timeouts, and if pings are failing will inform the Hostgroup Manager that something is wrong with the backend and that all connections should be terminated immediately.

Note also that due to https://github.com/sysown/proxysql/issues/531 , if only one backend is available ProxySQL tries to bring it online after 1 second instead of the default 10 seconds. There is nothing to lose in trying to resume communication with a node down.


From your email I read you have multiple reads, but you shutdown only one of them.
Unless timeouts are misconfigured, you shouldn't get the error you reported.
What timeouts did you changed?
Can you please attach the output of:
SHOW GLOBAL VARIABLES;
SELECT * FROM mysql_servers;
SELECT * FROM mysql_query_rules;

Thanks


--
You received this message because you are subscribed to the Google Groups "proxysql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proxysql+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Dan Johnson

unread,
Mar 23, 2017, 9:43:17 AM3/23/17
to René Cannaò, proxysql
Hello,

Before anything else, I want to thank you for the incredibly thorough response on how the normal connection pool and monitor work.  This clarifies quite a bit so thanks so much for that!  Here is the output you requested.

mysql> show global variables;
+----------------------------------------+---------------------------------------+
| Variable_name                          | Value                                 |
+----------------------------------------+---------------------------------------+
| admin-admin_credentials                | admin:admin                           |
| admin-hash_passwords                   | true                                  |
| admin-mysql_ifaces                     | /tmp/proxysql_admin.sock;0.0.0.0:6032 |
| admin-read_only                        | false                                 |
| admin-refresh_interval                 | 2000                                  |
| admin-stats_credentials                | stats:stats                           |
| admin-telnet_admin_ifaces              | (null)                                |
| admin-telnet_stats_ifaces              | (null)                                |
| admin-version                          | 1.3.4-0-gd158707                      |
| mysql-client_found_rows                | true                                  |
| mysql-commands_stats                   | true                                  |
| mysql-connect_retries_delay            | 1                                     |
| mysql-connect_retries_on_failure       | 10                                    |
| mysql-connect_timeout_server           | 1000                                  |
| mysql-connect_timeout_server_max       | 3000                                  |
| mysql-connection_max_age_ms            | 0                                     |
| mysql-default_charset                  | utf8                                  |
| mysql-default_max_latency_ms           | 1000                                  |
| mysql-default_query_delay              | 0                                     |
| mysql-default_query_timeout            | 36000000                              |
| mysql-default_reconnect                | true                                  |
| mysql-default_schema                   | information_schema                    |
| mysql-enforce_autocommit_on_reads      | false                                 |
| mysql-eventslog_filename               |                                       |
| mysql-eventslog_filesize               | 104857600                             |
| mysql-free_connections_pct             | 10                                    |
| mysql-have_compress                    | true                                  |
| mysql-init_connect                     | (null)                                |
| mysql-interfaces                       | 0.0.0.0:6033;/tmp/proxysql.sock       |
| mysql-long_query_time                  | 1000                                  |
| mysql-max_allowed_packet               | 4194304                               |
| mysql-max_connections                  | 2048                                  |
| mysql-max_stmts_cache                  | 10000                                 |
| mysql-max_stmts_per_connection         | 20                                    |
| mysql-max_transaction_time             | 14400000                              |
| mysql-monitor_connect_interval         | 60000                                 |
| mysql-monitor_connect_timeout          | 200                                   |
| mysql-monitor_enabled                  | true                                  |
| mysql-monitor_history                  | 600000                                |
| mysql-monitor_password                 | monitorpass                           |
| mysql-monitor_ping_interval            | 10000                                 |
| mysql-monitor_ping_max_failures        | 3                                     |
| mysql-monitor_ping_timeout             | 1000                                  |
| mysql-monitor_query_interval           | 60000                                 |
| mysql-monitor_query_timeout            | 100                                   |
| mysql-monitor_read_only_interval       | 1500                                  |
| mysql-monitor_read_only_timeout        | 500                                   |
| mysql-monitor_replication_lag_interval | 10000                                 |
| mysql-monitor_replication_lag_timeout  | 1000                                  |
| mysql-monitor_slave_lag_when_null      | 60                                    |
| mysql-monitor_username                 | monitor                               |
| mysql-monitor_writer_is_also_reader    | true                                  |
| mysql-multiplexing                     | true                                  |
| mysql-ping_interval_server_msec        | 10000                                 |
| mysql-ping_timeout_server              | 500                                   |
| mysql-poll_timeout                     | 2000                                  |
| mysql-poll_timeout_on_failure          | 100                                   |
| mysql-query_cache_size_MB              | 256                                   |
| mysql-query_digests                    | true                                  |
| mysql-query_digests_lowercase          | false                                 |
| mysql-query_digests_max_digest_length  | 2048                                  |
| mysql-query_digests_max_query_length   | 65000                                 |
| mysql-query_processor_iterations       | 0                                     |
| mysql-query_retries_on_failure         | 1                                     |
| mysql-server_capabilities              | 45578                                 |
| mysql-server_version                   | 5.5.30                                |
| mysql-servers_stats                    | true                                  |
| mysql-session_idle_ms                  | 1000                                  |
| mysql-session_idle_show_processlist    | false                                 |
| mysql-sessions_sort                    | true                                  |
| mysql-shun_on_failures                 | 5                                     |
| mysql-shun_recovery_time_sec           | 10                                    |
| mysql-ssl_p2s_ca                       | (null)                                |
| mysql-ssl_p2s_cert                     | (null)                                |
| mysql-ssl_p2s_cipher                   | (null)                                |
| mysql-ssl_p2s_key                      | (null)                                |
| mysql-stacksize                        | 1048576                               |
| mysql-threads                          | 4                                     |
| mysql-threshold_query_length           | 524288                                |
| mysql-threshold_resultset_size         | 4194304                               |
| mysql-wait_timeout                     | 28800000                              |
+----------------------------------------+---------------------------------------+
81 rows in set (0.00 sec)

mysql> SELECT * FROM mysql_servers;  # Note it says 102 is ONLINE, but it is offline
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname       | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 0            | 192.168.33.77  | 3306 | ONLINE | 1      | 0           | 1000            | 20                  | 0       | 0              |         |
| 1            | 192.168.33.101 | 3306 | ONLINE | 1      | 0           | 1000            | 20                  | 0       | 0              |         |
| 1            | 192.168.33.102 | 3306 | ONLINE | 1      | 0           | 1000            | 20                  | 0       | 0              |         |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM mysql_query_rules;
+---------+--------+----------+------------+--------+-------------+------------+------------+--------------------+--------------+-----------------------+----------------------+---------+-----------------+-----------------------+-----------+-----------+---------+---------+-------+----------------+------------------+-----------+-----+-------+---------+
| rule_id | active | username | schemaname | flagIN | client_addr | proxy_addr | proxy_port | digest             | match_digest | match_pattern         | negate_match_pattern | flagOUT | replace_pattern | destination_hostgroup | cache_ttl | reconnect | timeout | retries | delay | mirror_flagOUT | mirror_hostgroup | error_msg | log | apply | comment |
+---------+--------+----------+------------+--------+-------------+------------+------------+--------------------+--------------+-----------------------+----------------------+---------+-----------------+-----------------------+-----------+-----------+---------+---------+-------+----------------+------------------+-----------+-----+-------+---------+
| 1       | 1      | NULL     | NULL       | 0      | NULL        | NULL       | NULL       | NULL               | NULL         | ^SELECT .* FOR UPDATE | 0                    | NULL    | NULL            | 0                     | NULL      | NULL      | NULL    | NULL    | NULL  | NULL           | NULL             | NULL      | NULL | 0     | NULL    |
| 2       | 1      | NULL     | NULL       | 0      | NULL        | NULL       | NULL       | NULL               | NULL         | ^SELECT .*            | 0                    | NULL    | NULL            | 1                     | NULL      | NULL      | NULL    | NULL    | NULL  | NULL           | NULL             | NULL      | NULL | 0     | NULL    |
| 3       | 1      | NULL     | NULL       | 0      | NULL        | NULL       | NULL       | 0xE8930CB2CC9E68D7 | NULL         | NULL                  | 0                    | NULL    | NULL            | NULL                  | 2000      | NULL      | NULL    | NULL    | NULL  | NULL           | NULL             | NULL      | NULL | 1     | NULL    |
+---------+--------+----------+------------+--------+-------------+------------+------------+--------------------+--------------+-----------------------+----------------------+---------+-----------------+-----------------------+-----------+-----------+---------+---------+-------+----------------+------------------+-----------+-----+-------+---------+
3 rows in set (0.00 sec)


--
You received this message because you are subscribed to a topic in the Google Groups "proxysql" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/proxysql/19BbntSrOtQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to proxysql+unsubscribe@googlegroups.com.

René Cannaò

unread,
Mar 23, 2017, 10:43:54 AM3/23/17
to Dan Johnson, proxysql
Hi Dan,

Thank you for the feedback!

According to the mysql-monitor_ping_interval , every 10 seconds Monitor will inform Hostgroup Manager that the node is down, and if the the node is not shunned already it will be.
At the same time, every 10 seconds (mysql-shun_recovery_time_sec) Hostgroup Manager will try to bring it back online.
A note here: there is no background thread in the Hostgroup Manager that set the node online every 10 seconds! When a session requires a connection for hostgroup 1 , the Hostgroup Manager will scan the servers in that hostgroup to find a suitable candidate and if it finds a node shunned for more than 10 seconds it bring it back online.
That means that while Monitor tells Hostgroup Manager to shun the node exactly every 10 seconds, Hostgroup Manager will bring it back online on-demand, but only if it was shunned for already 10 seconds.
Without Monitor informing Hostgroup Manager that the node is down, Hostgroup Manager will shun the node if it fails 5 times in a single second (mysql-shun_on_failures).
If you are using just one client connection, and with a mysql-connection_timeout_server of 1 second, it is impossible that the node will generates 5 errors in one second so Hostgroup Manager won't shun it.

The ways of tweaking this so that your client doesn't get an error are any of the follow:
a) reduce mysql-monitor_ping_interval (ex: 1000) and mysql-monitor_ping_timeout (ex: 500) : Monitor will inform Hostgroup Manager more frequently that the backend is down
b) reduce mysql-shun_on_failures to 1 : Hostgroup Manager will shun the backend at the first error
c) use many client connections sending queries in parallel: in this way a lot of connections will fail at the same time and Hostgroup Manager will understand something is wrong with the backend .

While option #a is always valid:
* option #b is suitable for testing purpose with 1-2 client connections
* option #c sounds odd for testing purpose, but in production you normally don't have just one client sending query.

Thanks,
René
Reply all
Reply to author
Forward
0 new messages