Lost connection to MySQL server during query


iass

Jan 5, 2010, 2:45:04 PM
Hi,

I have a MySQL 5.0.66sp1 source distribution on Red Hat 5.4 64-bit, and
my problem is that when I make a remote connection to the MySQL server,
an idle connection (select sleep(28800)) gets disconnected after exactly
two hours with a "Lost connection to MySQL server during query" error.
If I look in MySQL I can see the process is still present. All timeout
settings in MySQL are set to 8 hours. I suspect there is a firewall (or
IDS?) in the middle which is killing idle connections, but the network
guys say there are no such settings.
I've been logging the network traffic with tcpdump and using Wireshark
to identify the possible cause, but so far without success.
The MySQL settings are identical to an environment where we had no
problem at all with lost connections.
What else could I do to check the network?

regards,

Ivan

Axel Schwenke

Jan 5, 2010, 5:30:18 PM
iass <ia...@gmail.com> wrote:
>
> I have a MySQL 5.0.66sp1 source distribution on Red Hat 5.4 64-bit, and
> my problem is that when I make a remote connection to the MySQL server,
> an idle connection (select sleep(28800)) gets disconnected after exactly
> two hours with a "Lost connection to MySQL server during query" error.
> If I look in MySQL I can see the process is still present.

This very much looks like somebody or something closing the TCP
connection between the client and the MySQL server. If the server
thread (SHOW PROCESSLIST) is still there, then it was not the MySQL
server that did this.
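
Both of those facts can be checked quickly from a second session; a
minimal sketch, assuming a local root login (adjust credentials to your
setup):

  # the hung connection's thread should still show up here
  mysql -u root -p -e 'SHOW PROCESSLIST'
  # the relevant timeouts; 28800 seconds = 8 hours
  mysql -u root -p -e "SHOW VARIABLES LIKE '%timeout%'"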

> I suspect there is a
> firewall (or IDS?) in the middle which is killing idle connections, but
> the network guys say there are no such settings.

I don't believe that.

> I've been logging the network traffic with tcpdump and using Wireshark
> to identify the possible cause, but so far without success.

Have you been able to log the RST packet that closes the TCP
connection? If this packet is generated somewhere between the
MySQL server and the client, then it matters where you sniff
the network. If you sniff at both the MySQL server and the
client, but see the RST packet only at the latter, then you
have proof that some network device in between injected it.
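
A minimal capture for that, assuming the default MySQL port 3306 and an
interface named eth0 (both are guesses, adjust to your setup), run
simultaneously on the server and the client:

  # show only RST segments involving the MySQL port, with timestamps
  tcpdump -i eth0 -nn -tttt 'tcp port 3306 and tcp[tcpflags] & tcp-rst != 0'

Wherever the RST shows up first is the side (or middlebox) that
generated it.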

> What else could I do to check the network?

There's one odd coincidence here. Two hours is the default TCP
keepalive timeout. So it is possible that the connection was
terminated (silently) much earlier and this was only detected by
TCP keepalive which closed the connection (and thus interrupted
the client waiting in read() and triggered error 2013).

In this case your sniffer will not see any RST packet. It also
means that the rogue network device *silently* cuts idle TCP
connections. This would be *very* bad.

Maybe TCP keepalive is also the solution. You can use it to make
the network connection look less idle. MySQL activates TCP keepalive
on all network sockets. But the keepalive timeout is set by the
operating system and the defaults are pretty insane.

For example, on Linux:

~ $ sysctl -a | fgrep -i keep
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75

Read: "after 7200 seconds with no traffic, send a keepalive probe.
Unless there is an ACK, continue sending probe packets every 75
seconds. If there was no ACK after 9 probes, the connection is dead"

Much better numbers are 300, 10, 6.
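
Applying those is straightforward on Linux, assuming root access (the
keyword order is time, probes, interval):

  # takes effect immediately, but is lost on reboot
  sysctl -w net.ipv4.tcp_keepalive_time=300
  sysctl -w net.ipv4.tcp_keepalive_probes=10
  sysctl -w net.ipv4.tcp_keepalive_intvl=6

  # for a permanent change, put the same three settings
  # into /etc/sysctl.conf

This only matters for sockets that enable SO_KEEPALIVE, but as said
above, mysqld does that on all its network sockets.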


HTH,
XL

Peter H. Coffin

Jan 5, 2010, 3:23:11 PM

"the network guys" where? The ones responsible for wherever you're
connecting FROM, or the ones responsible for wherever you're connecting
TO? Do not forget that individual machines can have their OWN firewalls
or filtering rules that may not be the responsibility of people
responsible for networks themselves.

In short, not enough information.

--
'Cluids' - def: "Fluids having the effect of restoring or imparting
Clue. Eg; beer, coffee, sulphuric acid, etc."
-- Lionel, in the Monastery

toby

Jan 6, 2010, 11:21:30 AM
On Jan 5, 2:45 pm, iass <i...@gmail.com> wrote:
> Hi,
>
> I have a MySQL 5.0.66sp1 source distribution on Red Hat 5.4 64-bit, and
> my problem is that when I make a remote connection to the MySQL server,
> an idle connection gets disconnected after exactly two hours, ...

Apart from anything else that you need to fix, set your client to
auto-reconnect.

>
> regards,
>
> Ivan

Axel Schwenke

Jan 6, 2010, 12:12:21 PM
toby <to...@telegraphics.com.au> wrote:
>
> Apart from anything else that you need to fix, set your client to
> auto-reconnect.

Strong NAK. Auto-reconnect is considered evil because it clobbers the
complete session state: started transactions, temporary tables, session
(user) variables, connection charset, etc.

When the connection is lost, this should be caught and the connection
explicitly re-established (including all session initialization that
might be necessary). A silent reconnect can lead to erroneous
application behavior that is very hard to debug. BTST!
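
As an illustration of the difference, the stock mysql command-line
client has both modes (the host/user/database names below are made up):

  # default behaviour: silently reconnect when the connection is
  # lost, dropping temp tables, session variables, open transaction
  mysql --reconnect -h dbhost -u appuser -p appdb

  # fail hard instead, so the application (or user) notices
  mysql --skip-reconnect -h dbhost -u appuser -p appdb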


XL

iass

Jan 6, 2010, 12:42:18 PM

Axel,

I did monitor the network between the client and the MySQL server with
tcpdump (by the way, the tcp_keepalive settings have the same values as
yours: 7200, 9, 75) and could see that after 2 hours the client sent a
TCP keep-alive packet to the MySQL port and MySQL responded with a RST.
I downloaded the tcpdump file to my PC and analyzed it with Wireshark.
When the client sends the TCP keep-alive, the SEQ/ACK analysis says:
"the rtt to ack the segment was: 7200.047751 seconds"
Immediately MySQL responds with a RST, ACK packet. The client then
aborts with "Lost connection to MySQL server during query".
The IP address the RST is coming from corresponds to the MySQL server.

I'm going to trace again, and this time I'll run tcpdump at the MySQL
server side to see whether it sends the RST packet itself or not.

Thanks for your help so far!

regards,

Ivan

iass

Jan 6, 2010, 12:45:00 PM

Toby,

That could be an option, but we need to know why this is happening. The
database, apps, etc. are being moved from one datacenter to another one
managed by a different provider, and the apps and database were working
perfectly before. Making changes to the apps/configuration should not
be needed.

regards,

Ivan

Sherm Pendley

Jan 6, 2010, 2:34:52 PM

Ugh, newbies!

sherm--

iass

Jan 7, 2010, 1:49:31 PM

Axel,

I did some more monitoring on the MySQL server and on the client, and I
can see that after two hours the MySQL server sends 9 TCP keep-alive
packets to the client; apparently it receives nothing from the client,
so the MySQL server sends a RST packet.
This is exactly the meaning of:

Read: "after 7200 seconds with no traffic, send a keepalive probe.
Unless there is an ACK, continue sending probe packets every 75
seconds. If there was no ACK after 9 probes, the connection is dead"

Looking at the tcpdump on the client, I can see that slightly after two
hours the client also sends a TCP keep-alive packet to the server and
then receives a RST packet.

So it seems to me the client isn't receiving the keep-alive packets
from the MySQL server.

Has anyone seen this behaviour before?

At this moment I don't think this is a MySQL problem at all, but more a
Linux issue.

regards,

Ivan

Gordon Burditt

Jan 7, 2010, 3:42:21 PM
>>> This very much looks like somebody or something closing the TCP
>>> connection between the client and the MySQL server. If the server
>>> thread (SHOW PROCESSLIST) is still there, then it was not the MySQL
>>> server that did this.
>>>
>>>> I suspect there is a
>>>> firewall (or IDS?) in the middle which is killing idle connections, but
>>>> the network guys say there are no such settings.

The behavior you are seeing *could be* that of a stateful firewall
(perhaps as part of a NAT gateway) which has an expiration time,
or which has a limited table size and eventually has to drop
connections to make room for new ones. The stateful firewall might
be on the server host, on the client host, or anywhere in between.

A stateful firewall *can* work without a timeout for TCP (if it has
enough memory to keep all the state), unlike UDP, where there's no
way to tell when the conversation is over besides a timeout.
Sometimes such a firewall can be configured to test whether the
connection is still alive (by sending its own keepalives).
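
If that firewall happens to be Linux/netfilter, its idle-TCP timeout
can be read out directly; the sysctl name differs between kernel
generations, so both spellings below are guesses to try:

  # newer kernels
  sysctl net.netfilter.nf_conntrack_tcp_timeout_established
  # older 2.6 kernels (RHEL 5 era)
  sysctl net.ipv4.netfilter.ip_conntrack_tcp_timeout_established

A value below the idle time of your connections would produce exactly
the silent drops described above.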


>>> There's one odd coincidence here. Two hours is the default TCP
>>> keepalive timeout. So it is possible that the connection was terminated
>>> (silently) much earlier and this was only detected by TCP keepalive
>>> which closed the connection (and thus interrupted the client waiting in
>>> read() and triggered error 2013).

*IF* there's a stateful firewall causing this problem, it's timing out
at somewhat less than two hours, killing the connection silently, but the
hosts involved don't find out until a keepalive is sent (and gets dropped).

>I did some more monitoring on the MySQL server and on the client, and I
>can see that after two hours the MySQL server sends 9 TCP keep-alive
>packets to the client; apparently it receives nothing from the client,
>so the MySQL server sends a RST packet.
>This is exactly the meaning of:
>
> Read: "after 7200 seconds with no traffic, send a keepalive probe.
> Unless there is an ACK, continue sending probe packets every 75
> seconds. If there was no ACK after 9 probes, the connection is dead"
>
>Looking at the tcpdump on the client, I can see that slightly after two
>hours the client also sends a TCP keep-alive packet to the server and
>then receives a RST packet.
>
>So it seems to me the client isn't receiving the keep-alive packets
>from the MySQL server.
>
>Has anyone seen this behaviour before?

It corresponds to a stateful firewall with a timeout.

Is there anything more complicated than an Ethernet cable connecting
the server and client machines? Does either machine have a firewall
active?
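
For the second question, a quick look on a Linux host (run as root on
both the server and the client):

  # list all filtering rules with packet counters; empty chains
  # with an ACCEPT policy mean no local filtering is in effect
  iptables -L -n -v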

Axel Schwenke

Jan 7, 2010, 5:10:56 PM
Hi,

iass <ia...@gmail.com> wrote:

> I did some more monitoring on the MySQL server and on the client, and I
> can see that after two hours the MySQL server sends 9 TCP keep-alive
> packets to the client; apparently it receives nothing from the client,
> so the MySQL server sends a RST packet.

...

> Looking at the tcpdump on the client, I can see that slightly after two
> hours the client also sends a TCP keep-alive packet to the server and
> then receives a RST packet.

Right. Both sides are expected to begin sending keepalive packets.
What bothers me: why do none of the keepalive packets from the
server reach the client (I assume that; you did not say whether you
saw the server's keepalive probes on the client side) - but the RST
packet *is* delivered?

We can assume there is some NAT device or stateful firewall between
your client and server. At some point in time the connection table
overflows (or the connection is silently purged). Then this device
would reject any future packet that claims to belong to the dropped
connection. But why is the RST packet then relayed?

Looks like somewhere on the network path between your server and your
client there is a firewall/NAT router which is severely broken or
misconfigured. Not only does it silently drop TCP connections that it
regards as dead, it also has a strange policy on what to do with packets
belonging to such dropped connections.

I agree this is not a MySQL problem.


XL

The Natural Philosopher

Jan 7, 2010, 5:25:26 PM
Gordon Burditt wrote:
>>>> This very much looks like somebody or something closing the TCP
>>>> connection between the client and the MySQL server. If the server
>>>> thread (SHOW PROCESSLIST) is still there, then it was not the MySQL
>>>> server that did this.
>>>>
>>>>> I suspect there is a
>>>>> firewall (or IDS?) in the middle which is killing idle connections, but
>>>>> the network guys say there are no such settings.
>
> The behavior you are seeing *could be* that of a stateful firewall
> (perhaps as part of a NAT gateway) which has an expiration time,
> or which has a limited table size and eventually has to drop
> connections to make room for new ones. The stateful firewall might
> be on the server host, on the client host, or anywhere in between.
>
> A stateful firewall *can* work without a timeout for TCP (if it has
> enough memory to keep all the state), unlike UDP, where there's no
> way to tell when the conversation is over besides a timeout.
> Sometimes such a firewall can be configured to test whether the
> connection is still alive (by sending its own keepalives).
>

That was my thought, yet the OP swears blind it's an identical config.

And he says he got a RST *from the MySQL server*... unless the stateful
firewall is proxying that, I suppose.


>
>>>> There's one odd coincidence here. Two hours is the default TCP
>>>> keepalive timeout. So it is possible that the connection was terminated
>>>> (silently) much earlier and this was only detected by TCP keepalive
>>>> which closed the connection (and thus interrupted the client waiting in
>>>> read() and triggered error 2013).
>
> *IF* there's a stateful firewall causing this problem, it's timing out
> at somewhat less than two hours, killing the connection silently, but the
> hosts involved don't find out until a keepalive is sent (and gets dropped).
>

Yes. That fits.

Sherm Pendley

Jan 8, 2010, 10:13:52 AM
Sherm Pendley <pshe...@gmail.com> writes:

> Ugh, newbies!

This person is stalking me. He has created (so far) seven fake profiles
on Facebook using my name and picture, and now he has followed me to
Usenet as well.

His ISP is Cox Internet in Metairie, Louisiana. Please report this
abuse to <ab...@cox.net>.

sherm--

iass

Jan 8, 2010, 1:55:31 PM

Axel and all others,

Thank you for your help. My findings will be given to OS/network
management. I'll report back to you as soon as possible.

regards,

Ivan

Ivan Saez

Jan 17, 2010, 10:06:06 AM

Hi,

It was after all a firewall problem. It was denied, but when the
network guys started to monitor it themselves, they saw it :-(

Thank you all for your help.

regards,

Ivan
