Excessive connect timeout errors using ProxySQL

3,769 views
Skip to first unread message

Steven Hsu

unread,
Jan 13, 2017, 7:39:53 PM1/13/17
to proxysql
Hi there,

We are getting many timeout errors in our proxysql logs. We are using MariaDB 10.1 with cross region master-slave replication managed by RDS. ProxySQL is 1.3.2a. We are not sure what's causing this issue. Any help would be greatly appreciated!

mysql_connection.cpp:595:handler(): [ERROR] Connect timeout on rds.edvisor.io:3306 : exceeded by 9740us

mysql> select hostgroup, status, ConnERR, Queries, Latency_ms from stats_mysql_connection_pool;
+-----------+--------+---------+---------+------------+
| hostgroup | status | ConnERR | Queries | Latency_ms |
+-----------+--------+---------+---------+------------+
| 0         | ONLINE | 24      | 1176210 | 151593     |
| 1         | ONLINE | 7       | 978411  | 996        |
+-----------+--------+---------+---------+------------+

mysql_variables=
{
interfaces="0.0.0.0:3306;/tmp/proxysql.sock"
max_connections=100
long_query_time=5000
max_transaction_time=600000
default_query_timeout=300000
connect_timeout_server=1000
connect_retries_delay=100

monitor_username="monitor"
monitor_password="monitor-password"
monitor_replication_lag_interval=2000
monitor_connect_timeout=1000
ping_timeout_server=1000
}

mysql_servers =
(
{
hostgroup = 0
address = "rds.edvisor.io"
port = 3306
max_connections = 50
compression = 1
},
{
hostgroup = 1
port = 3306
max_connections = 50
max_replication_lag = 5
}
)

mysql_users:
(
{
username = "production"
password = "production-password"
default_hostgroup = 0
max_connections = 100
transaction_persistent = 1
}
)

mysql_query_rules:
(
{
rule_id=1
active=1
match_pattern="^SELECT .* FOR UPDATE$"
destination_hostgroup=0
apply=1
},
{
rule_id=2
active=1
match_pattern="^SELECT"
destination_hostgroup=1
apply=1
}
)

René Cannaò

unread,
Jan 14, 2017, 1:33:54 AM1/14/17
to Steven Hsu, proxysql
Hi Steven,

It seems like you have a very high latency between the proxy and the RDS instance.

I would recommend to increase connect_timeout_server :
SET mysql-connect_timeout_server=5000;
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;


Thanks,
René

--
You received this message because you are subscribed to the Google Groups "proxysql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proxysql+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Steven Hsu

unread,
Jan 14, 2017, 2:57:07 AM1/14/17
to proxysql
Hi Rene,

Appreciate your quick reply, we'll definitely test out that change. 

Do you have ideas where we can start investigating the excessively high latency? Hostgroup 1 points to a slave in the same region, while hostgroup 0 points to a master that is in a different region, with approximately 150ms in latency ping wise. We are running on standard RDS with minor slave parallel replication tweaks.

Thanks!

René Cannaò

unread,
Jan 14, 2017, 1:06:00 PM1/14/17
to Steven Hsu, proxysql
Hi Steven,

From ProxySQL's Admin, you could run queries like the follows to better understand the latency:
SELECT * FROM monitor.mysql_server_ping_log;
SELECT * FROM monitor.mysql_server_connect_log;
SELECT * FROM monitor.mysql_server_replication_lag_log;

These tables have columns {ping_|connect_|}success_time_us that should report how long it took to complete the check, and also an error column (that probably will often report a timeout).
Cross region check could be expensive, and metrics collected in these tables could give some useful information.

Furthermore, you are not using IP addressed, therefore DNS lookup may also affect the time to establish a connection.


Steven Hsu

unread,
Jan 14, 2017, 1:50:42 PM1/14/17
to proxysql
Here are the stats, I found these numbers closely match the actual query times. 

In what ways can the ping/connect/success times can be way lower than the latency obtained from stats_mysql_connection_pool?

mysql> SELECT * FROM monitor.mysql_server_ping_log ORDER BY time_start_us DESC LIMIT 15;
+---------------------+------+------------------+----------------------+------------+
| hostname            | port | time_start_us    | ping_success_time_us | ping_error |
+---------------------+------+------------------+----------------------+------------+
| rds.edvisor.io       | 3306 | 1484419472344663 | 187093               | NULL       |
| rds-slave.edvisor.io | 3306 | 1484419472329521 | 1066                 | NULL       |
| rds.edvisor.io       | 3306 | 1484419412344547 | 165316               | NULL       |
| rds-slave.edvisor.io | 3306 | 1484419412329415 | 1074                 | NULL       |
| rds.edvisor.io       | 3306 | 1484419352344427 | 168123               | NULL       |
| rds-slave.edvisor.io | 3306 | 1484419352329285 | 989                  | NULL       |
| rds.edvisor.io       | 3306 | 1484419292344309 | 165308               | NULL       |
| rds-slave.edvisor.io | 3306 | 1484419292329189 | 1078                 | NULL       |
| rds.edvisor.io       | 3306 | 1484419232344177 | 165312               | NULL       |
| rds-slave.edvisor.io | 3306 | 1484419232329041 | 1062                 | NULL       |
| rds.edvisor.io       | 3306 | 1484419172344063 | 165284               | NULL       |
| rds-slave.edvisor.io | 3306 | 1484419172328924 | 1082                 | NULL       |
| rds.edvisor.io       | 3306 | 1484419112343907 | 165263               | NULL       |
| rds-slave.edvisor.io | 3306 | 1484419112328774 | 1045                 | NULL       |
| rds.edvisor.io       | 3306 | 1484419052343784 | 165261               | NULL       |
+---------------------+------+------------------+----------------------+------------+
15 rows in set (0.00 sec)

mysql> SELECT * FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 15;
+---------------------+------+------------------+-------------------------+---------------+
| hostname            | port | time_start_us    | connect_success_time_us | connect_error |
+---------------------+------+------------------+-------------------------+---------------+
| rds.edvisor.io       | 3306 | 1484419412637917 | 463611                  | NULL          |
| rds-slave.edvisor.io | 3306 | 1484419412607786 | 30919                   | NULL          |
| rds.edvisor.io       | 3306 | 1484419292637822 | 567643                  | NULL          |
| rds-slave.edvisor.io | 3306 | 1484419292607677 | 46569                   | NULL          |
| rds.edvisor.io       | 3306 | 1484419172637697 | 470262                  | NULL          |
| rds-slave.edvisor.io | 3306 | 1484419172607566 | 42096                   | NULL          |
| rds.edvisor.io       | 3306 | 1484419052637601 | 533496                  | NULL          |
| rds-slave.edvisor.io | 3306 | 1484419052607436 | 29135                   | NULL          |
| rds.edvisor.io       | 3306 | 1484418932637474 | 565035                  | NULL          |
| rds-slave.edvisor.io | 3306 | 1484418932607298 | 44143                   | NULL          |
+---------------------+------+------------------+-------------------------+---------------+
10 rows in set (0.00 sec)

<_server_replication_lag_log ORDER BY time_start_us DESC LIMIT 15;
+---------------------+------+------------------+-----------------+----------+-------+
| hostname            | port | time_start_us    | success_time_us | repl_lag | error |
+---------------------+------+------------------+-----------------+----------+-------+
| rds-slave.edvisor.io | 3306 | 1484419520561542 | 1234            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419518561401 | 1311            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419516561282 | 1292            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419514561220 | 1312            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419512561029 | 1269            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419510560915 | 1252            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419508560782 | 1342            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419506560671 | 1263            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419504560533 | 1288            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419502560410 | 1329            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419500560289 | 132890          | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419498560127 | 1191            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419496559995 | 1297            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419494559861 | 1272            | 0        | NULL  |
| rds-slave.edvisor.io | 3306 | 1484419492559702 | 1304            | 0        | NULL  |
+---------------------+------+------------------+-----------------+----------+-------+
15 rows in set (0.00 sec)



--
You received this message because you are subscribed to a topic in the Google Groups "proxysql" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/proxysql/gowOdj2o6eg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to proxysql+unsubscribe@googlegroups.com.

René Cannaò

unread,
Jan 14, 2017, 2:04:58 PM1/14/17
to Steven Hsu, proxysql
Steven,

About "In what ways can the ping/connect/success times can be way lower than the latency obtained from stats_mysql_connection_pool?" it turned out to be a reporting bug. The metric is in "us" instead of "ms". I created an issue for it:
https://github.com/sysown/proxysql/issues/882


Did changing mysql-connect_timeout_server fixed the issue?

Steven Hsu

unread,
Jan 14, 2017, 2:27:53 PM1/14/17
to proxysql
Thanks for filing the reporting bug, now things make sense. We have deployed the mysql-connect_timeout_server change and currently monitoring the situation.

Keep you updated.

Steven Hsu

unread,
Feb 6, 2017, 5:42:36 PM2/6/17
to proxysql
Hi,

We are still occasionally hitting PING and connect timeouts. Is there a way we can rule out proxysql as cause of the issue? Eg. mysql client commands to test ping and connect with a set timeout. 

Environment: RDS MariaDB 10.1.19

mysql_variables=
{
interfaces="0.0.0.0:3306;/tmp/proxysql.sock"
max_connections=100
long_query_time=5000
max_transaction_time=600000
default_query_timeout=300000
connect_timeout_server=5000
ping_timeout_server=1000
connect_retries_delay=100
connection_max_age_ms=900000

monitor_username="monitor"
monitor_password="monitor-password"
monitor_replication_lag_interval=2000
monitor_connect_timeout=1000
}

René Cannaò

unread,
Feb 7, 2017, 4:04:51 PM2/7/17
to proxysql
Hi Steven,

There are several reasons why a connection or a ping can timeout, ranging from the fact that the requests are through multiple AZ, to the fact that the RDS could be overloaded, to the fact that DNS lookup is a blocking operation and can increase the overall request time: you are using CNAMEs and not IPs, therefore it may take some time to resolve the name.

I would run the follow commands to start investigating a pattern, so to understand if the timeout is a rare or frequent event.

SELECT AVG(ping_success_time_us) FROM monitor.mysql_server_ping_log WHERE hostname='rds.edvisor.io' AND ping_error IS NULL;
SELECT AVG(connect_success_time_us) FROM monitor.mysql_server_connect_log WHERE hostname='rds.edvisor.io' AND connect_error IS NULL;
SELECT ping_error, COUNT(*) FROM monitor.mysql_server_ping_log WHERE hostname='rds.edvisor.io' GROUP BY ping_error;
SELECT connect_error, COUNT(*) FROM monitor.mysql_server_connect_log WHERE hostname='rds.edvisor.io' GROUP BY connect_error;

Thanks,
René

Steven Hsu

unread,
Mar 15, 2017, 7:57:04 PM3/15/17
to proxysql
Thanks for the queries, we ran them several times in the past month and the averages were acceptable. However we are still getting occasional spike in timeouts (once or twice a day - each time consisting of 10-20 error notifications)

Our slave is on an M3.medium instance while the master is on an M4.large. The cname DNS queries goes through R53 which routes to the closest slave node.


To unsubscribe from this group and stop receiving emails from it, send an email to proxysql+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "proxysql" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/proxysql/gowOdj2o6eg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to proxysql+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "proxysql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proxysql+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "proxysql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proxysql+u...@googlegroups.com.

Oytun Tez

unread,
Mar 19, 2017, 1:08:55 PM3/19/17
to proxysql
Hi Steven,

I am also restructering our DB flow these days. May I ask you a couple questions from your experience?

Current setup:
1 master in RDS, db.m3.xlarge
1 read replica in RDS, db.m3.large
1 ProxySQL instance in EC2, m4.large.

Questions:
  1. Did you benchmark the "transactions per second" and "95 percentile execution time"? (names are from sysbench, your benchmarking may use different wording). What was the difference between direct connection to RDS and through ProxySQL?
  2. How did you deal with the timeouts you mentioned in your previous email?
  3. How does "max_connections=100" work for you? Did you try increasing it? If yes, did it make a difference?
You can also refer to my last 2 posts:

Thank you very much!

René Cannaò

unread,
Mar 19, 2017, 3:15:16 PM3/19/17
to Steven Hsu, proxysql
Steven,

When you get the "occasional spike in timeouts (once or twice a day)" , what is the connection time reported in the monitor tables?
Is it possible there is really a a timeout?
Also, consider that you are using CNAME and not IPs, thus the time to resolve the name counts in the timeout as well: it the slave is slow in accepting a connection, or if R53 is slow in resolving a name, you will get a timeout.

Thanks,
René

To unsubscribe from this group and stop receiving emails from it, send an email to proxysql+unsubscribe@googlegroups.com.

Steven Hsu

unread,
Mar 23, 2017, 6:14:18 PM3/23/17
to proxysql, ste...@edvisor.io

Hi Rene, here are the results from the above commands:


AVG Ping Master: 162ms
AVG Ping Slave: 2ms
Max Ping Master: 163ms
Max Ping Slave: 137ms

AVG Connect Master: 534ms
AVG Connect Slave: 66ms
Max Connect Master: 927ms
Max Connect Slave: 158ms


We are trying to work with AWS to find the cause of the issue too.


2017-03-23 21:30:21 ProxySQL_Admin.cpp:2906:flush_mysql_variables___database_to_runtime(): [ERROR] Impossible to set not existing variable session_debug with

2017-03-23 21:39:00 MySQL_Session.cpp:2200:handler(): [ERROR] Detected an offline server during query: rds-slave.edvisor.io, 3306

Steven Hsu

unread,
Mar 23, 2017, 6:15:46 PM3/23/17
to proxysql
Hey Oytun,

We haven't done many benchmarks, the max connection is enough for us right now since we only have 4 instances (each capped at 20) accessing the database. 

Regards

Steven Hsu

unread,
Mar 24, 2017, 2:01:42 PM3/24/17
to proxysql, ste...@edvisor.io
Our proxysql resides inside a docker container, each app instance is linked to its own proxysql container.

Do you think this might cause an issue?

Thanks

Oytun Tez

unread,
Mar 24, 2017, 2:32:40 PM3/24/17
to Steven Hsu, proxysql
This is a valid use case, I don't think it would be an issue. Rene of course could comment more healthily.


---
Oytun Tez

M O T A W O R D - The World's Fastest Human Translation Platform.
J S O N - L A N G - JSON specification for localization file exchange.
I AM A FACT - Entrepreneurification!

To unsubscribe from this group and all its topics, send an email to proxysql+unsubscribe@googlegroups.com.

René Cannaò

unread,
Apr 7, 2017, 10:15:57 PM4/7/17
to proxysql
Steven,

I recently came across a similar issue, and it turned out that the problem was that the kernel on the database server detecting a flood of connections from the proxy and disabling connections.

Try to run this on the database server:
echo 0 > /proc/sys/net/ipv4/tcp_syncookies

Thanks,
René

To unsubscribe from this group and stop receiving emails from it, send an email to proxysql+unsubscribe@googlegroups.com.

Steven Hsu

unread,
Apr 19, 2017, 8:35:58 PM4/19/17
to proxysql
Hi Rene,

Thanks for following up. I'll try to find a way to disable tcp_syncookies on RDS.

Thanks!
Reply all
Reply to author
Forward
0 new messages