RabbitMQ channel hangs

springusr

Nov 29, 2016, 9:01:06 PM
to rabbitmq-users
Hi, we are facing an issue with RabbitMQ on Cloud Foundry: sometimes our application hangs, and every thread that communicates with RabbitMQ is stuck. It turned out that the RabbitMQ channels are waiting indefinitely for a reply, and that causes the hang. In our case, multiple producers and multiple consumers access and delete the same queues. We also observed that when the issue occurs, the RabbitMQ dashboard shows those channels in the running state forever. If I manually close the connection from the dashboard, within a few seconds my application throws a connection error, reconnects, and starts processing again. That means it is definitely something with RabbitMQ. It could be the way clustering is set up on the RabbitMQ side, or something else. Ideally the call would have timed out, which would have helped a lot. Below is a stack trace taken while the issue was occurring. (We cannot reproduce this locally.)

Please advise.

{
"threadName": "taskExector-10",
"threadId": 77,
"blockedTime": -1,
"blockedCount": 317,
"waitedTime": -1,
"waitedCount": 379,
"lockName": "com.rabbitmq.utility.BlockingValueOrException@105f30b9",
"lockOwnerId": -1,
"lockOwnerName": null,
"inNative": false,
"suspended": false,
"threadState": "WAITING",
"stackTrace": [
{
"methodName": "wait",
"fileName": "Object.java",
"lineNumber": -2,
"className": "java.lang.Object",
"nativeMethod": true
},
{
"methodName": "wait",
"fileName": "Object.java",
"lineNumber": 502,
"className": "java.lang.Object",
"nativeMethod": false
},
{
"methodName": "get",
"fileName": "BlockingCell.java",
"lineNumber": 50,
"className": "com.rabbitmq.utility.BlockingCell",
"nativeMethod": false
},
{
"methodName": "uninterruptibleGet",
"fileName": "BlockingCell.java",
"lineNumber": 89,
"className": "com.rabbitmq.utility.BlockingCell",
"nativeMethod": false
},
{
"methodName": "uninterruptibleGetValue",
"fileName": "BlockingValueOrException.java",
"lineNumber": 33,
"className": "com.rabbitmq.utility.BlockingValueOrException",
"nativeMethod": false
},
{
"methodName": "getReply",
"fileName": "AMQChannel.java",
"lineNumber": 361,
"className": "com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation",
"nativeMethod": false
},
{
"methodName": "privateRpc",
"fileName": "AMQChannel.java",
"lineNumber": 226,
"className": "com.rabbitmq.client.impl.AMQChannel",
"nativeMethod": false
},
{
"methodName": "exnWrappingRpc",
"fileName": "AMQChannel.java",
"lineNumber": 118,
"className": "com.rabbitmq.client.impl.AMQChannel",
"nativeMethod": false
},
{
"methodName": "queueDeclare",
"fileName": "ChannelN.java",
"lineNumber": 844,
"className": "com.rabbitmq.client.impl.ChannelN",
"nativeMethod": false
},
{
"methodName": "queueDeclare",
"fileName": "ChannelN.java",
"lineNumber": 61,
"className": "com.rabbitmq.client.impl.ChannelN",
"nativeMethod": false
},
{
"methodName": "invoke",
"fileName": null,
"lineNumber": -1,
"className": "sun.reflect.GeneratedMethodAccessor176",
"nativeMethod": false
},
{
"methodName": "invoke",
"fileName": "DelegatingMethodAccessorImpl.java",
"lineNumber": 43,
"className": "sun.reflect.DelegatingMethodAccessorImpl",
"nativeMethod": false
},
{
"methodName": "invoke",
"fileName": "Method.java",
"lineNumber": 498,
"className": "java.lang.reflect.Method",
"nativeMethod": false
},
{
"methodName": "invoke",
"fileName": "CachingConnectionFactory.java",
"lineNumber": 916,
"className": "org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler",
"nativeMethod": false
},
{
"methodName": "queueDeclare",
"fileName": null,
"lineNumber": -1,
"className": "com.sun.proxy.$Proxy166",
"nativeMethod": false
},
{
"methodName": "declareQueues",
"fileName": "RabbitAdmin.java",
"lineNumber": 577,
"className": "org.springframework.amqp.rabbit.core.RabbitAdmin",
"nativeMethod": false
},
{
"methodName": "access$200",
"fileName": "RabbitAdmin.java",
"lineNumber": 67,
"className": "org.springframework.amqp.rabbit.core.RabbitAdmin",
"nativeMethod": false
},
{
"methodName": "doInRabbit",
"fileName": "RabbitAdmin.java",
"lineNumber": 209,
"className": "org.springframework.amqp.rabbit.core.RabbitAdmin$3",
"nativeMethod": false
},
{
"methodName": "doInRabbit",
"fileName": "RabbitAdmin.java",
"lineNumber": 206,
"className": "org.springframework.amqp.rabbit.core.RabbitAdmin$3",
"nativeMethod": false
},
{
"methodName": "doExecute",
"fileName": "RabbitTemplate.java",
"lineNumber": 1394,
"className": "org.springframework.amqp.rabbit.core.RabbitTemplate",
"nativeMethod": false
},
{
"methodName": "execute",
"fileName": "RabbitTemplate.java",
"lineNumber": 1367,
"className": "org.springframework.amqp.rabbit.core.RabbitTemplate",
"nativeMethod": false
},
{
"methodName": "execute",
"fileName": "RabbitTemplate.java",
"lineNumber": 1343,
"className": "org.springframework.amqp.rabbit.core.RabbitTemplate",
"nativeMethod": false
},
{
"methodName": "declareQueue",
"fileName": "RabbitAdmin.java",
"lineNumber": 206,
"className": "org.springframework.amqp.rabbit.core.RabbitAdmin",
"nativeMethod": false
},
{
"methodName": "sendData",
"fileName": "QDispatcherService.java",
"lineNumber": 59,
"className": "com.mycompany.QDispatcherService",
"nativeMethod": false
},
....
"lockedMonitors": [
{
"className": "java.lang.Object",
"identityHashCode": 285810320,
"lockedStackFrame": {
"methodName": "invoke",
"fileName": "CachingConnectionFactory.java",
"lineNumber": 916,
"className": "org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler",
"nativeMethod": false
},
"lockedStackDepth": 13
}
],
"lockedSynchronizers": [
{
"className": "java.util.concurrent.ThreadPoolExecutor$Worker",
"identityHashCode": 372417558
}
],
"lockInfo": {
"className": "com.rabbitmq.utility.BlockingValueOrException",
"identityHashCode": 274673849
}
},

Michael Klishin

Nov 30, 2016, 5:47:52 AM
to rabbitm...@googlegroups.com
That does not definitely mean "something with RabbitMQ."
Channels are not supposed to be used concurrently by applications.

Depending on the RabbitMQ version, deleting a queue that does not exist
is either a channel error (< 3.0) or a no-op that returns success (3.0+).

Recent Java client versions have a configurable continuation timeout for
(nearly) all operations. Set it e.g. to 10 or 15 seconds and also see http://www.rabbitmq.com/heartbeats.html.
Delayed responses and unresponsive peers are a sad fact of life in distributed systems and when
working with data services, so your application needs to be prepared to handle them.

Finally, use Java client 3.6.6 or 4.0 (they both support the same set of RabbitMQ server versions: 2.0+)
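
One way an application can protect itself from a broker call that never answers is to run the call on a worker thread and wait for it with a deadline. The following is a minimal sketch using only the JDK, not anything from the RabbitMQ client; the names DeadlineCall and CallTimedOutException are made up for illustration, and the call passed in stands in for something like a queue declaration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Runs a potentially blocking call on a worker thread and gives up after a deadline. */
final class DeadlineCall {

    /** Unchecked signal that the call did not answer in time (hypothetical name). */
    static final class CallTimedOutException extends RuntimeException {
        CallTimedOutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final ExecutorService workers;

    DeadlineCall(ExecutorService workers) {
        this.workers = workers;
    }

    /** Returns the call's result, or throws CallTimedOutException after timeoutMillis. */
    <T> T call(Callable<T> brokerCall, long timeoutMillis) {
        Future<T> future = workers.submit(brokerCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Frees the caller. The worker may stay blocked if the call ignores interrupts.
            future.cancel(true);
            throw new CallTimedOutException("no reply within " + timeoutMillis + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            future.cancel(true);
            throw new IllegalStateException("interrupted while waiting for reply", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("broker call failed", e.getCause());
        }
    }
}
```

Note the limitation: this only unblocks the calling thread. If the underlying call waits uninterruptibly, the worker thread stays stuck until the connection is closed, so a timeout inside the client library is the better fix when one is available.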


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Nov 30, 2016, 5:48:49 AM
to rabbitm...@googlegroups.com
By "used concurrently" I mean "shared between threads".
See the "Channels and Concurrency Considerations" section of the Java client API guide.
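
A common way to avoid sharing a non-thread-safe object between threads is to give each thread its own instance. The sketch below shows that idea with a plain ThreadLocal. It is generic on purpose: the type parameter C stands in for a channel type, the supplier stands in for "open a new channel", and the class name PerThreadResource is hypothetical. Libraries that manage channels for you, such as Spring AMQP's RabbitTemplate mentioned later in this thread, are expected to handle this themselves.

```java
import java.util.function.Supplier;

/**
 * Gives each thread its own instance of a non-thread-safe resource.
 * C stands in for a channel type; the supplier stands in for "open a new channel".
 */
final class PerThreadResource<C> {

    private final ThreadLocal<C> holder;

    PerThreadResource(Supplier<C> opener) {
        // withInitial calls the supplier once per thread, on that thread's first get().
        this.holder = ThreadLocal.withInitial(opener);
    }

    /** Returns the calling thread's own instance, creating it on first use. */
    C get() {
        return holder.get();
    }

    /** Forgets the calling thread's instance (closing it is the caller's job). */
    void release() {
        holder.remove();
    }
}
```

With pooled threads, call release() when a task is done with its instance, otherwise the instance lives as long as the pool thread does.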

springusr

Nov 30, 2016, 1:40:03 PM
to rabbitmq-users
Thanks Mike. We are using Spring Boot 1.4, so it uses client version 3.6. We also use RabbitTemplate, so channels will not be shared between threads. Let me know if you need any other information. How do I set such a timeout using Spring's RabbitTemplate and the Spring AMQP admin objects?

Gary Russell

Nov 30, 2016, 2:14:55 PM
to rabbitm...@googlegroups.com
@Michael, it looks like this is the same as [1].

@springusr - if that's the case, it would be helpful to say so when cross-posting like this.

And, if that's the case, one thing they are doing that's a bit odd is declaring the queue before each send, but there should not be any concurrent use of the channel.

According to the stack trace, the queueDeclare is stuck in BlockingCell.uninterruptibleGet(), which can never time out as far as I can see.


Michael Klishin

Nov 30, 2016, 2:30:10 PM
to rabbitm...@googlegroups.com
There is a version of BlockingCell#uninterruptibleGet that accepts a timeout,
but it is only used by one protocol method (connection initiation).

It should be reasonably easy to introduce continuation ("blocking cell") timeouts
e.g. per channel.

Declaring a queue before each publish is completely unnecessary, of course, and publishing
uses no continuations because there is no response from the server in the protocol (publisher confirms
are sent asynchronously and are a protocol extension).
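
To make the "blocking cell with a timeout" idea concrete, here is a minimal sketch of a one-shot cell where one thread stores a reply and another waits for it with a deadline, ignoring interrupts while it waits. This is an illustration written from scratch, not the client's BlockingCell source; the names TimedBlockingCell and CellTimeoutException are hypothetical.

```java
/**
 * A one-shot cell: one thread sets a value, another waits for it with a deadline.
 * Illustrative only; this is not the RabbitMQ client's BlockingCell source.
 */
final class TimedBlockingCell<T> {

    /** Unchecked signal that no value arrived in time (hypothetical name). */
    static final class CellTimeoutException extends RuntimeException {
        CellTimeoutException(String message) {
            super(message);
        }
    }

    private boolean filled;
    private T value;

    /** Stores the reply and wakes every waiter. A cell may be set only once. */
    synchronized void set(T newValue) {
        if (filled) {
            throw new IllegalStateException("cell already set");
        }
        value = newValue;
        filled = true;
        notifyAll();
    }

    /** Waits up to timeoutMillis for the value, ignoring interrupts until done. */
    synchronized T uninterruptibleGet(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        boolean interrupted = false;
        try {
            while (!filled) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new CellTimeoutException("no value within " + timeoutMillis + " ms");
                }
                try {
                    wait(remainingMillis);
                } catch (InterruptedException e) {
                    interrupted = true; // remember it, keep waiting for the reply
                }
            }
            return value;
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the flag for the caller
            }
        }
    }
}
```

Without the deadline check, a waiter whose reply never arrives would loop in wait() forever, which matches the WAITING state shown in the stack trace at the top of this thread.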

springusr

Nov 30, 2016, 2:44:17 PM
to rabbitmq-users
Gary, yes, I will keep that in mind when cross-posting. My bad.

Declaring the queue is a requirement, and I don't believe it is what causes this situation. We have to declare queues because we are going to create thousands of them, one per customer, and each queue is deleted once the consumer has processed it. We can't use a single queue, and the order of messages matters to our processing. That is why producers create the queues and consumers delete them after processing.

Now back to the problem: how do I set a timeout? I am using Spring Boot 1.4, RabbitTemplate, Cloud Foundry, and RabbitMQ 3.6.3.

As mentioned, when this issue occurs I can see all those channels in the running state on the dashboard. DevOps just found this kind of entry in the RabbitMQ server log:

operation queue.declare caused a channel exception not_found: "no queue 'ourqueueName' in vhost '778fb055-6bd0-405c-913b-11a9925ec2f3'"

and

another error like this: rabbit@d793bde59d57926276778ec63b36215c.log:Channel error on connection <0.25252.369> (10.999.999.12:52784 -> 10.999.999.14:5672, vhost: '778fb055-6bd0-405c-913b-11a9925ec2f3', user: 'df6w349b-ef3a-41e5-bec2-a0316eb98be3'), channel 7:

That is my user. These errors seem to be what keeps us hanging.

springusr

Nov 30, 2016, 2:55:57 PM
to rabbitmq-users
Also, our .NET client hits the same situation, but it is luckier: it gets the exception below instead of blocking like the Java client.

RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
     OUT          at RabbitMQ.Client.Impl.ModelBase.BasicGet(String queue, Boolean noAck)


Arnaud Cogoluègnes

Dec 1, 2016, 8:52:07 AM
to rabbitm...@googlegroups.com
We've filed an issue for this [1]; the fix will ship in 4.1.0.


springusr

May 28, 2017, 11:46:39 AM
to rabbitmq-users
Hi Gents, 

I want to give feedback on this issue. We started seeing the same problem again, so I upgraded to version 4.1.0 and lowered the default channel timeout from 10 minutes to 1 minute, and our hang is gone :) That is a good thing. Thanks for enhancing the library and providing this hook.

I have a question, though: what is causing it, and what should the remedy be? Is something wrong with how we use RabbitMQ, or with the network or cluster configuration?

After upgrading, I see a periodic exception in the log, shown below, which prevents the hang.

com.rabbitmq.client.ChannelContinuationTimeoutException: Continuation call for method #method<channel.open>(out-of-band=) on channel AMQChannel(amqp://4ak19088-e413-4b62...@X.X.X.X:5672/e455646b-0d81-4516-fht5-823160b104kl,69) (#69) timed out

Thank you again.
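
For readers looking for the setting itself: with amqp-client 4.1.0 or later, the channel timeout described above can be set on the client's ConnectionFactory with setChannelRpcTimeout (milliseconds) and handed to Spring AMQP's CachingConnectionFactory. The fragment below is a sketch under assumptions: it needs amqp-client 4.1.0+ and spring-rabbit on the classpath, the host name and the 30-second heartbeat are placeholder values, and the class name RabbitTimeoutConfig is hypothetical.

```java
import com.rabbitmq.client.ConnectionFactory;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitAdmin;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

/** Wires a channel RPC timeout into the connection factory used by Spring AMQP. */
final class RabbitTimeoutConfig {

    static CachingConnectionFactory connectionFactory(String host) {
        ConnectionFactory rabbit = new ConnectionFactory();
        rabbit.setHost(host);
        rabbit.setChannelRpcTimeout(60_000); // 1 minute, in milliseconds (amqp-client 4.1.0+)
        rabbit.setRequestedHeartbeat(30);    // seconds; placeholder value, tune for your network
        return new CachingConnectionFactory(rabbit);
    }

    public static void main(String[] args) {
        CachingConnectionFactory connections = connectionFactory("localhost"); // placeholder host
        // Both objects share the factory, so their channel operations get the same timeout.
        RabbitTemplate template = new RabbitTemplate(connections);
        RabbitAdmin admin = new RabbitAdmin(connections);
        System.out.println("configured " + template + " and " + admin);
        connections.destroy();
    }
}
```

In a Spring Boot application the same factory would normally be exposed as a bean rather than built in main.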

springusr

May 31, 2017, 9:58:08 AM
to rabbitmq-users
Also, I noticed that the default channel timeout is 10 minutes. That is a lot. Why is it set to such a high value?


Michael Klishin

May 31, 2017, 10:17:40 AM
to rabbitm...@googlegroups.com
The default heartbeat timeout was about 10 minutes not so long ago. I suspect
at some point channel operation timeouts were set to match it (because
it doesn't make much sense for channel timeouts to be lower).

What Java client version do you use?

springusr

May 31, 2017, 11:29:37 AM
to rabbitmq-users
Hi Mike, I am using 4.1.0 now. Also, it seems you might have missed the comment I posted in this thread on May 28.

Michael Klishin

May 31, 2017, 12:09:55 PM
to rabbitmq-users
OK, so you used a different timeout but the default is still the same. We'll see if we can reduce it, e.g. for 4.2.0.

As for why a timeout can happen, there can be all kinds of reasons:

 * The node this client was connected to was in an alarmed state and this client published something in that time frame (http://www.rabbitmq.com/alarms.html, https://www.rabbitmq.com/connection-blocked.html)
 * The node could be under high load
 * There was a network throughput slowdown

Some of these are less likely with the timeout of 1 minute than, say, 10 seconds, but
it's impossible to confidently say without having a certain number of metrics collected.
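
Since causes like these are usually transient, one way for an application to cope with an occasional timeout is a bounded retry with backoff. The sketch below uses only the JDK. OperationTimedOutException is a made-up stand-in for whatever timeout exception the caller actually sees, TimeoutRetry is a hypothetical name, and whether a retry is safe depends on the operation being repeatable.

```java
import java.util.function.Supplier;

/** Retries an operation a bounded number of times when it reports a timeout. */
final class TimeoutRetry {

    /** Stand-in for the timeout exception seen by the caller (hypothetical, unchecked). */
    static final class OperationTimedOutException extends RuntimeException {
        OperationTimedOutException(String message) {
            super(message);
        }
    }

    /** Runs the operation up to maxAttempts times, doubling the pause after each timeout. */
    static <T> T withRetry(Supplier<T> operation, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (OperationTimedOutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the timeout to the caller
                }
            }
            try {
                Thread.sleep(backoff);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", e);
            }
            backoff *= 2; // exponential backoff between attempts
        }
    }
}
```

Only timeouts are retried here; any other exception propagates immediately so that real errors are not hidden.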

springusr

May 31, 2017, 12:21:47 PM
to rabbitmq-users
Thanks, we will continue testing this and will probably lower the timeout to a few seconds.

Amit Khosla

Jan 19, 2018, 9:41:12 AM
to rabbitmq-users
Hi,

I am also facing a similar issue. I am using spring-amqp/spring-rabbit 1.6.7 and amqp-client 3.4.2.

@springusr, you mentioned above that you tried amqp-client 4.1 and it worked. Are you still using Spring AMQP, or are you using amqp-client directly?

Has this issue been addressed in new versions of spring-amqp which are using newer clients?

We are using RabbitMQ 3.6.10 and Erlang 19.3.4 on CentOS Linux 7 (Core).

Thanks & Regards
Amit

Michael Klishin

Jan 19, 2018, 5:10:07 PM
to rabbitmq-users
Please start new threads for new questions. This is mailing list etiquette 101.

You’d get a more informed response quicker if you post Spring AMQP questions to Stack Overflow and tag
them as spring-amqp. That’s the discussion forum the maintainers of that project prefer.