RabbitMQ claims client closed connection, tcpdump proves otherwise


Roy Reznik

Jun 15, 2017, 8:15:12 AM
to rabbitmq-users
Hi,

We are using RabbitMQ 3.6.5 on Ubuntu 14.04 with Erlang 18.2.

We see a large number (tens per second) of these messages in the log:

=WARNING REPORT==== 15-Jun-2017::12:09:36 ===
closing AMQP connection <0.22731.27> (10.0.9.10:47534 -> 10.0.4.15:5672):
client unexpectedly closed TCP connection

However, when I run tcpdump, it looks like the *server* is the one sending an RST packet to the client.
Our client never closes TCP connections unexpectedly; it uses channel.abort() and connection.abort() to shut down.
There is nothing else in the log, and everything looks fine in the Java log. Shutdown notifications show this:

com.rabbitmq.client.ShutdownSignalException: clean connection shutdown; protocol method: #method<connection.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)

How can we debug this further to determine the root cause?
Could it be heartbeats? In other threads I saw that missed heartbeats produce their own log message, which doesn't look like this one.

Thanks,
Roy.
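
To put a number on "tens per second" and to see which client addresses are involved, the warning quoted above can be tallied from the log file. The following is a minimal sketch, assuming the log entries have exactly the three-line shape shown in this post; the function name `count_unexpected_closes` and the regular expressions are illustrative and are not part of RabbitMQ.

```python
import re
from collections import Counter

# Report header, e.g. "=WARNING REPORT==== 15-Jun-2017::12:09:36 ==="
HEADER = re.compile(r"^=(\w+) REPORT==== (\S+) ===$")
# Connection line, e.g.
# "closing AMQP connection <0.22731.27> (10.0.9.10:47534 -> 10.0.4.15:5672):"
CONN = re.compile(r"^closing AMQP connection \S+ \(([\d.]+):\d+ -> [\d.]+:\d+\):$")
REASON = "client unexpectedly closed TCP connection"


def count_unexpected_closes(lines):
    """Return (per_second, per_client) Counters for the warning above.

    Hypothetical helper: it assumes each entry is a header line, a
    "closing AMQP connection" line, and then the reason line.
    """
    per_second, per_client = Counter(), Counter()
    timestamp = client = None
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A new report starts; forget any client from the previous one.
            timestamp, client = header.group(2), None
            continue
        conn = CONN.match(line)
        if conn:
            client = conn.group(1)
            continue
        if line == REASON and timestamp and client:
            per_second[timestamp] += 1
            per_client[client] += 1
            client = None
    return per_second, per_client


# The single entry quoted in this post, used as sample input.
sample = """\
=WARNING REPORT==== 15-Jun-2017::12:09:36 ===
closing AMQP connection <0.22731.27> (10.0.9.10:47534 -> 10.0.4.15:5672):
client unexpectedly closed TCP connection
""".splitlines()

per_second, per_client = count_unexpected_closes(sample)
print(per_second.most_common(3))
print(per_client.most_common(3))
```

Run against a real log (for example `count_unexpected_closes(open(path))`, with `path` pointing at the node's log file), the busiest seconds and the most frequent client addresses can then be lined up against the tcpdump capture.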

Roy Reznik

Jun 15, 2017, 8:19:14 AM
to rabbitmq-users
Also posting rabbitmqctl status:

[{pid,19686},
 {running_applications,
     [{rabbitmq_management_visualiser,"RabbitMQ Visualiser","3.6.5"},
      {rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
      {webmachine,"webmachine","1.10.3"},
      {mochiweb,"MochiMedia Web Server","2.13.1"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.5"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
      {rabbit,"RabbitMQ","3.6.5"},
      {rabbit_common,[],"3.6.5"},
      {os_mon,"CPO  CXC 138 46","2.4"},
      {rabbitmq_auth_mechanism_ssl,
          "RabbitMQ SSL authentication (SASL EXTERNAL)","3.6.5"},
      {mnesia,"MNESIA  CXC 138 12","4.13.2"},
      {ssl,"Erlang/OTP SSL application","7.2"},
      {public_key,"Public key infrastructure","1.1"},
      {crypto,"CRYPTO","3.6.2"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
      {compiler,"ERTS  CXC 138 10","6.0.2"},
      {inets,"INETS  CXC 138 49","6.1"},
      {xmerl,"XML parser","1.3.9"},
      {asn1,"The Erlang ASN1 compiler version 4.0.1","4.0.1"},
      {syntax_tools,"Syntax tools","1.7"},
      {sasl,"SASL  CXC 138 11","2.6.1"},
      {stdlib,"ERTS  CXC 138 10","2.7"},
      {kernel,"ERTS  CXC 138 10","4.1.1"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 18 [erts-7.2] [source] [64-bit] [smp:16:16] [async-threads:256] [kernel-poll:true]\n"},
 {memory,
     [{total,60668869304},
      {connection_readers,147039384},
      {connection_writers,40792544},
      {connection_channels,512760944},
      {connection_other,242067936},
      {queue_procs,17082884440},
      {queue_slave_procs,0},
      {plugins,97262000},
      {other_proc,970627664},
      {mnesia,25225952},
      {mgmt_db,21536},
      {msg_index,531401728},
      {other_ets,63286416},
      {binary,40853242312},
      {code,27844069},
      {atom,1000601},
      {other_system,73411778}]},
 {alarms,[]},
 {listeners,
     [{clustering,25672,"::"},
      {amqp,5672,"::"},
      {'amqp/ssl',4183,"::"},
      {'amqp/ssl',4184,"::"}]},
 {vm_memory_high_watermark,0.6},
 {vm_memory_limit,142001801625},
 {disk_free_limit,50000000},
 {disk_free,17442390355968},
 {file_descriptors,
     [{total_limit,63900},
      {total_used,12860},
      {sockets_limit,57508},
      {sockets_used,5339}]},
 {processes,[{limit,1048576},{used,62018}]},
 {run_queue,511},
 {uptime,6953},
 {kernel,{net_ticktime,60}}]


We're very far from hitting the limits.
Disk, CPU and RAM are all far from fully utilized.
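
The "far from the limits" claim can be checked directly against the status output. This is a small sketch that uses only the used/limit pairs copied from the output above; the dictionary labels and the `utilisation` helper are illustrative names, not rabbitmqctl fields.

```python
# (used, limit) pairs copied from the rabbitmqctl status output above.
usage = {
    "memory (total vs vm_memory_limit)": (60668869304, 142001801625),
    "file descriptors": (12860, 63900),
    "sockets": (5339, 57508),
    "Erlang processes": (62018, 1048576),
}


def utilisation(used, limit):
    """Fraction of the limit currently in use."""
    return used / limit


for name, (used, limit) in usage.items():
    print(f"{name}: {utilisation(used, limit):.1%}")
```

With these figures, memory is at roughly 43% of the watermark limit, file descriptors at about 20%, sockets at about 9% and Erlang processes at about 6%. The run_queue value (511) has no limit in this output, so these ratios do not cover it.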

Roy Reznik

Jun 15, 2017, 8:55:10 AM
to rabbitmq-users

Following the TCP stream in Wireshark shows that the server sends Connection.Start and then, less than 1 ms later, sends an RST on the connection.


Michael Klishin

Jun 15, 2017, 9:21:28 AM
to rabbitm...@googlegroups.com
Hi Roy,

That log message also appears for certain other TCP socket exceptions.

Can you please post, say, a few minutes' worth of logs around this event, including the SASL one?
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Roy Reznik

Jun 15, 2017, 10:33:00 AM
to rabbitmq-users
Attached are the last 10K lines of each log.
The issue reproduces very frequently, so there are quite a few instances of it in there.
rabbitlog.log
rabbitlog-sasl.log

Michael Klishin

Jun 15, 2017, 11:04:18 AM
to rabbitm...@googlegroups.com
The sasl log is full of messages from https://github.com/rabbitmq/rabbitmq-server/issues/953
and some variation of https://github.com/rabbitmq/rabbitmq-management/issues/81 which I haven't seen
since 3.6.7.

My best theory is that the latter caused one of the higher up supervisors to terminate,
which means new connection processes have no "parent to attach to". In case
none of that makes sense but you'd like to learn more, see http://learnyousomeerlang.com/supervisors, or
just ignore it :)

Nothing else stands out, so please upgrade to 3.6.10 (see the release notes; upgrading from 3.6.5 to 3.6.7+
requires a cluster-wide shutdown) and to Erlang/OTP 19.3.6, which fixes an issue with TCP sockets on node shutdown.

Restarting nodes one by one could help temporarily.





--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Roy Reznik

Jun 18, 2017, 4:18:50 AM
to rabbitmq-users
Michael, thanks for your help.
While working on upgrading the environment, I noticed that there is no deb package (Ubuntu) for Erlang 19.3.6, only for 19.3, which I believe does not include the TCP socket fix.
That is the case even with the up-to-date repo at https://packages.erlang-solutions.com/debian/
I know that you maintain a minimal RPM here: https://github.com/rabbitmq/erlang-rpm/releases but that is RPM-only and does not help us.

Other than building from source (which I'd prefer not to do), is there any repository out there that also contains deb packages for minor versions?

Thanks,
Roy.

Michael Klishin

Jun 18, 2017, 4:57:57 AM
to rabbitm...@googlegroups.com
I see a 19.3.6 .deb on https://www.erlang-solutions.com/resources/download.html,
would installing via dpkg be a major inconvenience for you?

The Erlang Solutions apt repository is not always up-to-date and has differences
between Debian and Ubuntu releases (somewhat understandable).

We will ask someone from ESL to make 19.3.6 available via the apt repo but
I can't provide an ETA. What distribution do you use?



Roy Reznik

Jun 18, 2017, 6:39:16 AM
to rabbitmq-users
Those packages are named esl-erlang rather than erlang, and I think rabbitmq depends on the erlang package?

Michael Klishin

Jun 18, 2017, 4:37:36 PM
to rabbitm...@googlegroups.com

"[one of the dependencies is] erlang-nox (>= 1:16.b.3) | esl-erlang.
Erlang can be installed either from the standard repositories, backport repositories or Erlang Solutions…"


Roy Reznik

Jun 19, 2017, 6:36:15 AM
to rabbitmq-users
Thanks, got it. I will try with the esl-erlang package.

Michael Klishin

Jun 19, 2017, 4:47:06 PM
to rabbitm...@googlegroups.com
Hi Roy,

If you plan on going with the ESL apt repo, keep in mind that they provision OTP 20-rc2 by
default at the moment and will provision 20 GA later this week. No released RabbitMQ version
supports OTP 20 yet, so we highly recommend using apt pinning.

See https://groups.google.com/forum/#!topic/rabbitmq-users/_imbAavBYjY for more details and updates on OTP 20
support.



Roy Reznik

Jun 22, 2017, 1:53:30 AM
to rabbitmq-users
Hey Michael,

We upgraded successfully (yey!) to RabbitMQ 3.6.10 with Erlang 19.3.6 (Pinned the version, thanks for the tip).
The errors did go away, however, apparently, they were not the root issue.

We're still getting very poor message rates on a pretty huge server that is barely utilized (IO, CPU, memory or network).
We've tried modifying all sorts of credit parameters (even though none of the connections, channels or queues seem to be in "flow"), but nothing improved.

How can we get any leads to what's stopping RabbitMQ from working faster?

Thanks,
Roy.

Roy Reznik

Jun 22, 2017, 9:59:21 AM
to rabbitmq-users
Michael, I'll provide more info:

We have a "star" architecture, where remote nodes have local RabbitMQ clusters which send messages using the shovel plugin to a centralized remote RabbitMQ instance.
That centralized instance has multiple queues (> 300) and multiple consumers (> 3000).
For some reason, the remote queues send messages VERY slowly to the centralized RabbitMQ, even though no connections, channels or queues are in "flow" mode and the instance's resources are barely utilized.
If we turn our consumers off, messages are sent very quickly and millions can drain within tens of minutes.

Can you provide us with any clue why RabbitMQ becomes so slow when consumers are running?

Thanks.

Michael Klishin

Jun 22, 2017, 11:28:14 AM
to rabbitm...@googlegroups.com
The RabbitMQ management UI exposes quite a few metrics, including
rates, consumer utilisation, message location (RAM vs. disk) breakdown,
GC activity for the node or particular queues/channels/connections, disk operation
rates and "volume", network bandwidth used by connections, and so on.

rabbitmq-top further helps identify individual Erlang processes that use most VM scheduler
resources ("CPU") and RAM.

Are you collecting and analysing any of them?

Application-level settings (that are also applicable to, say, Shovel) can directly impact

Some systems choose to use a single queue or a couple of queues. See
CPU Usage and Parallelism Considerations in http://www.rabbitmq.com/queues.html.

It is not uncommon to see publishers that publish large-ish messages (say, tens of kB if not megabytes)
over a 100 Mbit network and expect to see many thousands of messages per second, which is mathematically
impossible.
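
The arithmetic behind that claim is easy to check. This is a rough sketch only: it ignores all protocol, framing and TCP overhead, so real rates would be lower, and the message sizes in the loop are illustrative assumptions rather than figures from this thread.

```python
def max_messages_per_second(link_mbit_per_s, message_bytes):
    """Upper bound on message rate for a link, ignoring all overhead."""
    link_bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return link_bytes_per_s / message_bytes


# Hypothetical message sizes on a 100 Mbit/s link.
for size in (1_000, 10_000, 100_000, 1_000_000):
    rate = max_messages_per_second(100, size)
    print(f"{size:>9} bytes -> at most {rate:,.1f} msg/s")
```

A 100 Mbit/s link carries at most 12.5 MB/s, so 10 kB messages top out at 1,250 per second and 1 MB messages at 12.5 per second, before any overhead is counted.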

I absolutely cannot give you an informed
answer with the amount of information provided. I know nothing about your applications,
network bandwidth, node configuration and so on.



Michael Klishin

Jun 22, 2017, 11:28:33 AM
to rabbitm...@googlegroups.com
Also, consider starting new threads for new questions.