Broken pipe - Could not connect to RabbitMQ server

Luong Nguyen Tien

Dec 11, 2018, 9:37:29 PM
to rabbitmq-users
Hi community,

We hit a problem when our server tries to publish messages to RabbitMQ (probably only under high load).
All stats seem fine (RAM, CPU, disk space, file descriptors, ...).

I searched for the same keyword, "Broken pipe". Someone had the same problem, but in the end no solution or clue was provided.

Does anyone see the same RabbitMQ behavior? Please help or share your solution here.

Many thanks!

java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method) [rt.jar:1.7.0_201]
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115) [rt.jar:1.7.0_201]
    at java.net.SocketOutputStream.write(SocketOutputStream.java:161) [rt.jar:1.7.0_201]
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) [rt.jar:1.7.0_201]
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95) [rt.jar:1.7.0_201]
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153) [rt.jar:1.7.0_201]
    at com.rabbitmq.client.impl.Frame.writeTo(Frame.java:185) [amqp-client-4.1.0.jar:4.1.0]
    at com.rabbitmq.client.impl.SocketFrameHandler.writeFrame(SocketFrameHandler.java:171) [amqp-client-4.1.0.jar:4.1.0]
    at com.rabbitmq.client.impl.AMQConnection.writeFrame(AMQConnection.java:549) [amqp-client-4.1.0.jar:4.1.0]
    at com.rabbitmq.client.impl.AMQCommand.transmit(AMQCommand.java:104) [amqp-client-4.1.0.jar:4.1.0]
    at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:363) [amqp-client-4.1.0.jar:4.1.0]
    at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:339) [amqp-client-4.1.0.jar:4.1.0]
    at com.rabbitmq.client.impl.ChannelN.exchangeDeclareNoWait(ChannelN.java:729) [amqp-client-4.1.0.jar:4.1.0]
    at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.exchangeDeclareNoWait(AutorecoveringChannel.java:260) [amqp-client-4.1.0.jar:4.1.0]


Regards,

LuongNT 

Michael Klishin

Dec 11, 2018, 9:54:40 PM
to rabbitm...@googlegroups.com
That's UNIX speak for "broken connection", and it is on the client end (even though monitoring
server metrics is always useful).

Connection recovery should handle it, assuming that a new TCP connection can be successfully opened.
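
To illustrate where this kind of exception comes from, here is a minimal sketch that uses only the JDK and no RabbitMQ client at all. It assumes a loopback listener that accepts one connection and drops it abruptly; the class and method names are made up for the example. Depending on the OS and JDK, the message may read "Broken pipe" or "Connection reset", so treat it as a demonstration of a failed socket write rather than a reproduction of the exact trace above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class BrokenPipeDemo {

    /**
     * Keeps writing to a socket whose peer has already gone away and returns
     * the IOException that surfaces, or null if none was observed.
     */
    static IOException writeToDeadPeer() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {

            // The hypothetical "server" accepts the connection and drops it abruptly.
            Socket accepted = server.accept();
            accepted.setSoLinger(true, 0); // close with a TCP reset instead of a graceful shutdown
            accepted.close();

            OutputStream out = client.getOutputStream();
            byte[] frame = new byte[1024];
            for (int attempt = 0; attempt < 200; attempt++) {
                try {
                    // A write issued right after the peer vanished may still succeed locally.
                    out.write(frame);
                    out.flush();
                } catch (IOException writeFailure) {
                    return writeFailure;
                }
                Thread.sleep(10);
            }
            return null;
        } catch (IOException setupFailure) {
            throw new UncheckedIOException(setupFailure);
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        IOException failure = writeToDeadPeer();
        System.out.println(failure == null
                ? "no write failure observed"
                : failure.getClass().getName() + ": " + failure.getMessage());
    }
}
```

The point of the sketch is that the client only learns about the dead connection when it tries to write, which is why the exception shows up in the publishing code path.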

--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Luong Nguyen Tien

Dec 11, 2018, 10:03:53 PM
to rabbitm...@googlegroups.com
Thanks for your input, Michael.

We already enable AutomaticRecoveryEnabled, but it does not seem to re-create the broken connection automatically.
Is there anything we might be missing here?

factory.setAutomaticRecoveryEnabled(true);

Regards,

Tien Luong NGUYEN - Chief Architect Officer
MPOS J.S.C | MPOS Global Group
M 09.06.27.64.24 | S tienluong.nguyen

Michael Klishin

Dec 11, 2018, 10:11:27 PM
to rabbitm...@googlegroups.com
That should be enough [1]. Automatic recovery attempts happen periodically with a 5 second
interval by default. Java client logs them. RabbitMQ will log all successful connections, authentication failures
and abruptly closed client TCP connections. It will not and cannot log unsuccessful TCP connections.

We cannot suggest much without server and Java client logs posted to this list.
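
One case worth checking: as far as I recall from the Java client guide, automatic recovery does not kick in when the initial connection attempt fails, so applications have to retry that themselves. Below is a minimal, JDK-only sketch of a fixed-interval retry helper that could wrap such a call. It is not how the client implements recovery; `FixedIntervalRetry`, `retry` and the simulated connect step are hypothetical names for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class FixedIntervalRetry {

    /**
     * Calls connect until it succeeds or maxAttempts is exhausted,
     * sleeping intervalMillis between failed attempts and logging each failure.
     */
    static <T> T retry(Callable<T> connect, int maxAttempts, long intervalMillis) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;
                System.err.println("attempt " + attempt + " failed: " + e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the thread is asked to shut down
                }
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        // Simulated connect step: fails twice, then succeeds.
        AtomicInteger calls = new AtomicInteger();
        String result = retry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("connection refused (simulated)");
            }
            return "connected";
        }, 5, 100);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

In a real application the `Callable` would wrap the call that opens the connection, and the interval would typically match the 5 second default mentioned above. Logging every failed attempt is what makes it possible to tell later why reconnection did not happen.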

Michael Klishin

Dec 11, 2018, 10:22:38 PM
to rabbitm...@googlegroups.com
It's worth mentioning that a client can fail to reconnect for a fairly broad range of reasons:
genuine TCP or IP connectivity issues, hostname resolution issues, server nodes *or clients*
running out of file descriptors when mass disconnect/reconnect events happen, and other things.

If you have evidence of a mass client disconnect and reconnection attempt (from server logs, tcpdump and so on),
make sure that the TCP listener is configured to accept at least a few hundred concurrent connections [1]
(note: accepting connections is not the same thing as maintaining them here; the TCP connection backlog
controls what peak rate of inbound connections a node will sustain before the kernel starts rejecting connection attempts).

Again, this only matters if you have evidence of mass client reconnect. If you have 10 client connections total you cannot
surpass the default backlog of 128 even in theory.

Michael Klishin

Dec 11, 2018, 10:24:30 PM
to rabbitm...@googlegroups.com
Sent previous response too soon.

And then there's an entire area of things to worry about if there's evidence of high connection churn [1][2].

Luong Nguyen Tien

Dec 12, 2018, 9:12:16 PM
to rabbitmq-users
Hi team,

We added a bit more logging and see that sometimes the Java connection object is null.
We assume this is what throws "Broken pipe" when we try to use this connection to publish a message to RabbitMQ.

Does anyone have a clue under which circumstances the Java connection object can become null and cause this problem?

Many thanks,

Michael Klishin

Dec 12, 2018, 10:23:57 PM
to rabbitm...@googlegroups.com
An I/O exception by definition cannot stem from an operation on a null. That would be an NPE, a NullPointerException.

As mentioned earlier the exception merely indicates a failed socket write. Those things happen from time to time in any realistic system.

I see no path in ConnectionFactory#newConnection that would return a null [1].
Tracking down nulls in your app is something only your team can do I'm afraid.
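
To make the distinction concrete, here is a small JDK-only sketch; the class and method names are made up for the example. It attempts a write on a null reference and on an unconnected `PipedOutputStream`, which stands in for a broken transport, and reports which exception type each one produces.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedOutputStream;

class NullVersusIoFailure {

    /** Attempts a one-byte write and reports which kind of failure, if any, occurred. */
    static String classify(OutputStream out) {
        try {
            out.write(1);
            return "ok";
        } catch (IOException e) {
            // The stream object exists, but the underlying transport failed.
            return "IOException";
        } catch (NullPointerException e) {
            // There was no object to call write() on in the first place.
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(null));                    // prints NullPointerException
        System.out.println(classify(new PipedOutputStream())); // prints IOException (pipe not connected)
    }
}
```

So if the logs show a SocketException with a stack trace going through the client's frame writing code, a connection object existed at that point; a null would have failed earlier and differently.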

Michael Klishin

Dec 13, 2018, 5:23:37 PM
to rabbitm...@googlegroups.com
I forgot to mention that client recovery and resiliency towards e.g. severe network latency can be tested
with Toxiproxy and similar tools. Bunny uses Toxiproxy for some specific tests around error detection and recovery
if it's available, for example, and we expect to use it more in more clients and other projects in the future.

Toxiproxy can easily simulate socket write failures as well as less trivial scenarios.