What causes heartbeat_timeout and java.net.SocketException: Broken pipe ?

456 views
Skip to first unread message

Kohei Nozaki

unread,
Apr 25, 2016, 2:01:45 AM4/25/16
to rabbitmq-users
Hello, I'm running into a strange problem which is Channel#basicPublish() constantly produces "java.net.SocketException: Broken pipe" for every invocation as stacktrace of my application:

    [2016/04/25 05:01:38.443][WARN ][pool-6-thread-2][com.lisb.ourapp.server.service.MqService] Broken pipe
    java.net.SocketException: Broken pipe
            at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_91]
            at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_91]
            at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_91]
            at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_91]
            at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_91]
            at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[na:1.7.0_91]
            at com.rabbitmq.client.impl.SocketFrameHandler.flush(SocketFrameHandler.java:142) ~[amqp-client-3.1.4.jar:na]
            at com.rabbitmq.client.impl.AMQConnection.flush(AMQConnection.java:488) ~[amqp-client-3.1.4.jar:na]
            at com.rabbitmq.client.impl.AMQCommand.transmit(AMQCommand.java:125) ~[amqp-client-3.1.4.jar:na]
            at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:316) ~[amqp-client-3.1.4.jar:na]
            at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:292) ~[amqp-client-3.1.4.jar:na]
            at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:636) ~[amqp-client-3.1.4.jar:na]
            at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:619) ~[amqp-client-3.1.4.jar:na]
            at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:610) ~[amqp-client-3.1.4.jar:na]
            at com.lisb.ourapp.server.mq.MqChannel.basicPublishEvent(MqChannel.java:64) ~[classes/:na]
            at com.lisb.ourapp.server.mq.MqChannel.basicPublish(MqChannel.java:50) ~[classes/:na]
            at com.lisb.ourapp.server.service.MqService$EnqueueTask.run(MqService.java:247) ~[classes/:na]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_91]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_91]
            at java.lang.Thread.run(Thread.java:745) [na:1.7.0_91]

My application manages threads that execute Channel#basicPublish() until it succeeds, and they constantly retries and fails in every 60 seconds. I can see it through following result of grep:

    $ grep pool-6-thread-2 /var/log/ourapp.log
    ...
    [2016/04/25 05:00:38.426][WARN ][pool-6-thread-2][com.lisb.ourapp.server.service.MqService] Broken pipe
    [2016/04/25 05:01:38.443][WARN ][pool-6-thread-2][com.lisb.ourapp.server.service.MqService] Broken pipe
    [2016/04/25 05:02:38.483][WARN ][pool-6-thread-2][com.lisb.ourapp.server.service.MqService] Broken pipe
    ...

Meanwhile I can see that Rabbit produces error reports in exactly same moments (05:01:38, 05:02:38 and 05:03:38) as my application produces "Broken pipe". that states "heartbeat_timeout,running" as follows:

    $ grep eartbeat -B 2 rabbit\@ip-10-0-63-228.log | tail -20
    --
    =ERROR REPORT==== 25-Apr-2016::05:01:38 ===
    closing AMQP connection <0.23213.141> (10.0.50.64:46004 -> 10.0.63.228:5672):
    {heartbeat_timeout,running}
    --
    =ERROR REPORT==== 25-Apr-2016::05:02:35 ===
    closing AMQP connection <0.26088.141> (10.0.50.64:46042 -> 10.0.63.228:5672):
    {heartbeat_timeout,running}
    --
    =ERROR REPORT==== 25-Apr-2016::05:02:38 ===
    closing AMQP connection <0.26236.141> (10.0.50.64:46043 -> 10.0.63.228:5672):
    {heartbeat_timeout,running}
    --
    =ERROR REPORT==== 25-Apr-2016::05:03:35 ===
    closing AMQP connection <0.28791.141> (10.0.50.64:46087 -> 10.0.63.228:5672):
    {heartbeat_timeout,running}
    --
    =ERROR REPORT==== 25-Apr-2016::05:03:38 ===
    closing AMQP connection <0.28946.141> (10.0.50.64:46088 -> 10.0.63.228:5672):
    {heartbeat_timeout,running}

I can also see Rabbit constantly accepts and closes connection from my application in every 60 seconds as following log states:

    ...
    =INFO REPORT==== 25-Apr-2016::05:49:39 ===
    accepting AMQP connection <0.17185.145> (10.0.50.64:47854 -> 10.0.63.228:5672)
    ...
    
    =ERROR REPORT==== 25-Apr-2016::05:50:39 ===
    closing AMQP connection <0.17185.145> (10.0.50.64:47854 -> 10.0.63.228:5672):
    {heartbeat_timeout,running}
    ...

Note that my app always gets new Connection via ConnectionFactory#newConnection() before executing Channel#basicPublish(). despite that, it looks like ConnectionFactory#newConnection() constantly returns an Connection instance which tied to broken TCP connection for a thread of my app.

Is my assumption correct? or do you think of any other causes or solutions? any other thoughts or any request of more detailed information for further investigating would be greatly appreciated. thank you.

My environment:

* RabbitMQ v3.1.5 (two-node cluster)
* amqp-client (Java) v3.1.4
    * note: I set ConnectionFactory#setRequestedHeartbeat to 20 while Rabbit regards it as 60.

P.S. @MK Thank you for the answer for my question about the message "Publishers will be blocked...". that's makes sense.

Michael Klishin

unread,
Apr 25, 2016, 7:55:16 AM4/25/16
to rabbitm...@googlegroups.com
See http://rabbitmq.com/heartbeats.html. Typically a problem with previously open TCP connection.
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages