Queue in stopped state, and service unable to connect to it with an error


getaju69

Oct 21, 2019, 9:10:46 AM
to rabbitmq-users
Hi Team,

We recently encountered an issue with one of the queues set up on RabbitMQ (HA on Kubernetes). The service could no longer connect to the queue, and it threw this error:

2019-10-09T07:24:54.970Z - error: RabbitMQ channel error: Channel closed by server: 404 (NOT-FOUND) with message "NOT_FOUND - queue 'job_queue' in vhost 'arhost' process is stopped by supervisor"
2019-10-09T07:24:54.972Z - warn: RabbitMQ channel closed, exiting.
error: Forever detected script exited with code: 1

On the RabbitMQ admin page, we could see:

[Attached screenshot: stopped.JPG]

We had to delete the queue and let the service create it again.

We didn't have HA policies set up when the issue occurred, but we do now (ha-mode, ha-sync and ha-params). Are these related in any way? (We had faced other issues without the HA policies.)

What exactly is a 'stopped' state and how can we recover from this state automatically?

Wesley Peng

Oct 21, 2019, 9:17:01 AM
to rabbitm...@googlegroups.com
Hi


Before the community can help you, we must know this information:

    Version of Erlang and RabbitMQ
    Operating system and version
    RabbitMQ configuration files
    RabbitMQ log files, or log file entries
    Exact error output
    Exact commands you are running, or code you are running

Regards 

On Mon, Oct 21, 2019 at 9:10 PM, getaju69 <geta...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/f6488062-8027-4eba-8cf8-394e03fb75ee%40googlegroups.com.

getaju69

Oct 21, 2019, 9:33:47 AM
to rabbitmq-users
I could provide some information:

    Version of Erlang and RabbitMQ: Erlang 22.0.5 and RabbitMQ 3.7.15
    Operating system and version: Alpine Linux v3.8
    RabbitMQ configuration files: It is not clear what is required; where exactly can I get these from?
    RabbitMQ log files, or log file entries: The issue no longer occurs, and we did not collect the RabbitMQ log files at the time
    Exact error output: As pasted in the original message
    Exact commands you are running, or code you are running: It is an application running on Docker, whose source code we do not have access to


Wesley Peng

Oct 21, 2019, 9:58:26 PM
to rabbitm...@googlegroups.com
Hi

on 2019/10/21 21:33, getaju69 wrote:
> I could provide some information:
>
>     Version of Erlang and RabbitMQ: Erlang 22.0.5 and RabbitMQ 3.7.15
>     Operating system and version: Alpine Linux v3.8
>     RabbitMQ configuration files: Not clear on what is required, where
> exactly I can get these from?
>     RabbitMQ log files, or log file entries: The issue doesn't occur
> anymore, we did not collect the RabbitMQ log files then
>     Exact error output: As pasted in the message,
>   Exact commands you are running, or code you are running: It is an
> application running on docker, the source code of which we do not have
> access to

A simple fix is to restart the node that hosts the queue.

regards.

getaju69

Oct 22, 2019, 4:01:02 AM
to rabbitmq-users
That would work, but we do not want to end up in a situation where the service cannot connect for a long time and we have to recover from it manually.

So is there a way to recover from it automatically?

What exactly does it mean for a queue to be in the stopped state, and what does the error mean? If we knew that, we could do something about it. Is there some documentation for this?

Wesley Peng

Oct 22, 2019, 4:05:05 AM
to rabbitm...@googlegroups.com
Hi

on 2019/10/22 16:01, getaju69 wrote:
> That would work, but then we do not want to end up in a situation that
> it doesn't connect for long and we have to manually recover from it.
>
> So is there a  way to recover from it automatically?
>
> What exactly does it mean by a queue to be in stopped state and what
> does the error mean ?  Then we could do something. Is there some
> documentation for this?

This is an environment problem, not a RabbitMQ issue.
Please see:
https://stackoverflow.com/questions/7732371/how-to-properly-manage-rabbitmq-with-supervisord

regards.

getaju69

Oct 23, 2019, 6:52:25 AM
to rabbitmq-users
It probably is, but I haven't yet figured out how the linked issue is similar to mine.

My specific question is: what does it mean when a queue is in the stopped state? I haven't found any explanation of this anywhere. Also, after the HA setup, we noticed that the queue, although in the stopped state, still has consumers and is working as expected. Can you tell us more about this? Are there any steps we could take?

Wesley Peng

Oct 23, 2019, 7:28:32 AM
to rabbitm...@googlegroups.com
If you see the stopped state in the web admin, it may just be stale information. You could flush the DB cache.

Regards 


getaju69

Oct 23, 2019, 7:54:30 AM
to rabbitmq-users
Thanks. Can you also tell me how I could do that? Is it possible from the web admin?



getaju69

Oct 24, 2019, 3:44:11 AM
to rabbitmq-users
Hi,

Do you know how I could flush the DB cache to clear the stale information? Do I need to ask this question separately?



Wesley Peng

Oct 24, 2019, 3:46:33 AM
to rabbitm...@googlegroups.com
getaju69 wrote:
>
> Do you know how I could flush the DB cache to clear the stale
> information ? Do I need to ask this question separately?

Hello

I did it once, but I have forgotten the procedure. I have no RabbitMQ instance
at hand, so you may need to search for it and test it yourself.

regards.

getaju69

Oct 24, 2019, 4:36:00 AM
to rabbitmq-users
Thank you. 


rabbitmqctl eval "rabbit_mgmt_storage:reset_all()."

Before running this on our production clusters, I wanted to confirm with you: is this what you meant? Does resetting the stats also reset the queue state, or only the statistics such as message rates?

getaju69

Oct 24, 2019, 8:34:54 AM
to rabbitmq-users
Hello Wesley,

Investigating a bit more, I do not think it is stale information, because when I query the queue with rabbitmqctl I still see it as stopped.

bash-4.4$  rabbitmqctl list_queues -p cmhost name state | grep items_queue
items_queue    stopped

This is the case on all 3 nodes. So the actual state is indeed stopped, correct? Then why would it be working?
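
A check like the one above could be scripted so that stopped queues are noticed without anyone looking at the admin page. The following is only a minimal sketch, not an official tool: the function name `stopped_queues` is hypothetical, and it assumes the two-column `name state` output shown above, with queue names that contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read `rabbitmqctl list_queues ... name state` output
# on stdin and print the name of every queue whose state is "stopped".
# Assumption: each queue line has exactly two whitespace-separated fields;
# informational lines with a different field count are skipped.
stopped_queues() {
    awk 'NF == 2 && $2 == "stopped" { print $1 }'
}

# Usage against a live node (vhost name taken from the command above):
#   rabbitmqctl list_queues -q -p cmhost name state | stopped_queues
```

Running this periodically would turn a silently stopped queue into something that can raise an alert; it only detects the condition and does not recover the queue.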



Wesley Peng

Oct 24, 2019, 8:39:43 AM
to rabbitm...@googlegroups.com
getaju69 wrote:
> bash-4.4$  rabbitmqctl list_queues -p cmhost name state | grep items_queue
> items_queue    stopped
>
> This is also on all 3 nodes. So it means the actual state is indeed
> stopped, correct ? Then why would it be working ? .
>

Hello

This may be a bug. What are your RabbitMQ and Erlang versions?

regards.

getaju69

Oct 24, 2019, 8:45:05 AM
to rabbitmq-users
RabbitMQ: 3.7.15
Erlang: Erlang 22.0.5

Wesley Peng

Oct 24, 2019, 8:48:59 AM
to rabbitm...@googlegroups.com
getaju69 wrote:
> RabbitMQ: 3.7.15
> Erlang: Erlang 22.0.5
>
> On Thursday, 24 October 2019 14:39:43 UTC+2, Wesley Peng wrote:
>
> getaju69 wrote:
> > bash-4.4$  rabbitmqctl list_queues -p cmhost name state | grep items_queue
> > items_queue    stopped

This stopped status looks strange.

It may be related to an internal system error.

Can @Michael, @Luke and @Karl confirm?

Thanks.

getaju69

Oct 25, 2019, 3:23:19 AM
to rabbitmq-users
Dear Team,

Any updates on this?

getaju69

Oct 25, 2019, 10:48:46 AM
to rabbitmq-users
Hi Wesley and team,

Are there any updates on this?

getaju69

Oct 28, 2019, 5:42:32 AM
to rabbitmq-users
Dear Team,

I haven't got an update yet. Do you have any updates on this, and can you please tell me how I could proceed? Thanks in advance.

Wesley Peng

Oct 28, 2019, 5:46:58 AM
to rabbitm...@googlegroups.com


on 2019/10/28 17:42, getaju69 wrote:
> I haven't got an update yet. Do you have any updates on this and can you
> please tell me how I could proceed with this. Thanks in advance.
>
>

Is the information below, from the documentation, helpful?

It's possible that when you shut down a master node that all available
slaves are unsynchronised. A common situation in which this can occur is
rolling cluster upgrades. By default, RabbitMQ will refuse to fail over
to an unsynchronised slave on controlled master shutdown (i.e. explicit
stop of the RabbitMQ service or shutdown of the OS) in order to avoid
message loss; instead the entire queue will shut down as if the
unsynchronised slaves were not there. An uncontrolled master shutdown
(i.e. server or node crash, or network outage) will still trigger a
failover even to an unsynchronised slave.

If you would prefer to have master nodes fail over to unsynchronised
slaves in all circumstances (i.e. you would choose availability of the
queue over avoiding message loss) then you can set the
ha-promote-on-shutdown policy key to always rather than its default
value of when-synced.

https://www.rabbitmq.com/ha.html
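
The policy key mentioned in the quoted documentation could be set along these lines. This is a sketch under stated assumptions, not a recommended configuration: the helper `ha_policy_definition` and the policy name `ha-job-queue` are hypothetical, `ha-mode: all` and `ha-sync-mode: automatic` are illustrative choices rather than the poster's actual settings, and the vhost and queue name are taken from the error message at the start of the thread.

```shell
#!/bin/sh
# Hypothetical helper: print a mirrored-queue policy definition as JSON.
# $1 is the value for ha-promote-on-shutdown ("always" or "when-synced").
# ha-mode and ha-sync-mode below are illustrative assumptions.
ha_policy_definition() {
    printf '{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-shutdown":"%s"}\n' "$1"
}

# Usage against a live node (policy name "ha-job-queue" is made up):
#   rabbitmqctl set_policy -p arhost --apply-to queues ha-job-queue \
#       '^job_queue$' "$(ha_policy_definition always)"
```

As the quoted text notes, `always` favours availability of the queue over avoiding message loss, so it is a trade-off rather than a pure fix.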

getaju69

Oct 28, 2019, 6:08:58 AM
to rabbitmq-users
That still doesn't explain the queue's stopped state.

It is working well: the queue still has consumers and messages are being consumed. So I am happy with the current condition, except that the queue is still in the stopped state.

Before I set up the HA policies, the application showed this error and the queue was in the stopped state. Now, although the application doesn't throw any errors, the queue is still in the stopped state.

Wesley Peng

Oct 28, 2019, 6:13:38 AM
to rabbitm...@googlegroups.com


on 2019/10/28 18:08, getaju69 wrote:
> Before I setup the ha-policies. this error was shown in the application
> and the stopped state. Now although application doesn't throw any
> errors, the queue is still in the stopped state.

As the manual says, when the master refuses to fail over to an unsynced
slave, the entire queue will shut down. So you can continue to read, but
the queue content may be stale.

regards.

getaju69

Oct 29, 2019, 10:29:16 AM
to rabbitmq-users
I'm not sure I understand. What do you mean when you say the queue content may be stale? New messages are coming into the queue and are being consumed at a normal rate, which means the queue is functioning. Then why would it be in a stopped state?

Let's say this happened during a shutdown (a controlled one, as described in the documentation). We might not have had synced mirrors then, so the queue would have been stopped. Would setting ha-promote-on-shutdown to always in the policy make sure that the queue doesn't go into a stopped state?

But that doesn't explain the current state of the queue, right? How can I make the state of the queue "Running" again?

getaju69

Nov 1, 2019, 6:31:02 AM
to rabbitmq-users
Hi Wesley,

Could you please check and let me know?

Wesley Peng

Nov 1, 2019, 6:40:49 AM
to rabbitm...@googlegroups.com
getaju69 wrote:
> Hi Wesley,
>
> Could you please check and let me know?
>
> On Tuesday, 29 October 2019 15:29:16 UTC+1, getaju69 wrote:
>
> I'm not sure I understand. What do you mean when you say the queue
> content is maybe stale ? There are new messages coming into the
> queue, and also being consumed at a normal rate. Which means queue
> is functioning. Then why would it be in a stopped state?
>
> Let's say this happened when there was a shutdown (graceful as in
> the documentation) . We might not have had synced mirrors then and
> it would have been stopped. So updating the policy to have it always
> (ha-promote-on-shutdown) will make sure that queue doesn't go to a
> stopped state?
>
> But that doesn't explain the current state of the queue, right ? How
> can I make the state of the queue "Running" again ?

getaju,

I did try to check the source code to find the cause, but had no luck.
I hope the staff from @Pivotal will dig into it.

regards.