Hi, dear friends, we have run into a weird problem with RabbitMQ: publishing messages to RabbitMQ is extremely slow.
We have Queue A and Queue B on the rabbitmq-server, and we use Celery with RabbitMQ. The status of the queues is:
| Queue Name | Ready Messages | Unacked Messages |
|---|---|---|
| Queue A | 2 | 482,063 |
| Queue B | 3 | 31 |
We have 3 rabbitmq-servers deployed in HA mode, and all queues are durable. We run 9 consumers (Celery workers) on 9 separate servers (the consumers do not run on the same servers as the rabbitmq-servers).
When we publish messages to Queue A, it is extremely slow: a publish takes more than 10 seconds, and sometimes more than 1 minute. We used tcpdump to capture the packets between the client side (celery.send_task) and the rabbitmq-server, and the client side is indeed stuck for more than 10 seconds waiting for the response from the rabbitmq-server.
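This is roughly how we measure it on the client side (a minimal sketch; the broker URL, task name, and queue name are placeholders for our real ones):

# Sketch: time a single celery.send_task call against Queue A.
# Broker URL, task name and queue name are placeholders.
import time
from celery import Celery

app = Celery(broker="amqp://guest:guest@rabbitmq-host-1:5672//")

start = time.monotonic()
app.send_task("tasks.example", args=[1, 2], queue="queue_a")
print(f"send_task took {time.monotonic() - start:.1f}s")  # often > 10s for Queue A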
Meanwhile, publishing messages to Queue B is fast. If we stop all the consumers, publishing to Queue A becomes fast as well, but once we start all 9 consumers again, publishing to Queue A gets stuck again.
We suspected network problems at first, but the network looks fine according to ping/mtr results, and all the machines are in the same private subnet. We also logged in to the RabbitMQ servers and published messages there directly; enqueueing a message is just as slow. CPU, memory, disk I/O, and network all look fine on the rabbitmq-servers, producers, and consumers.
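This is roughly how we publish directly on a broker node to take Celery and the network out of the picture (a sketch with pika; the queue name and the guest credentials on localhost are assumptions), timing each AMQP step to see where it stalls:

# Sketch: publish straight from a RabbitMQ node with a plain AMQP client (pika),
# timing each step. Queue name and credentials are placeholders.
import time
import pika

def timed(label, fn):
    start = time.monotonic()
    result = fn()
    print(f"{label}: {time.monotonic() - start:.1f}s")
    return result

conn = timed("connect", lambda: pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost")))
ch = timed("open channel", conn.channel)
timed("declare queue", lambda: ch.queue_declare(queue="queue_a", durable=True))
timed("publish", lambda: ch.basic_publish(
    exchange="", routing_key="queue_a", body=b"ping"))
conn.close()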
Is this a known issue? Is there anything I can do to help troubleshoot this problem?
Our rabbitmq-server version is:
ii rabbitmq-server 3.6.9-1 all Multi-protocol messaging broker
The Erlang version is:
# dpkg -l | grep erlang
ii esl-erlang 1:19.3 amd64 Erlang
The OS is:
Ubuntu 16.04 (kernel 4.4.0-21-generic)
Looking forward to your response. Thank you in advance.
Hi @Michael Klishin, thanks a lot for your quick response.
> a queue only has messages to accept
Do you mean that when a queue has a lot of unacked messages, the throughput of that queue becomes terrible without affecting other queues?
> When it also has a consumer, it needs to do a certain amount of extra work and has a potentially slow
> "downstream component" (the socket, the consumer, the channel prefetch [3]). It does not get twice the throughput,
> of course, so internal flow control [1] makes sure that the publisher is throttled (or else the RAM usage
> will keep growing and growing until the node is killed by the kernel).
Yes, we checked the state of the connections via rabbitmqctl, but none of them is in the flow or blocked state. Would it get better if we reduced the number of consumers, for example running only 3 consumers instead of 9?
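For reference, this is roughly how we check the connection states (a sketch, run on one of the RabbitMQ nodes; name and state are standard rabbitmqctl connection info items):

# Sketch: list connection states via rabbitmqctl and flag any that look throttled.
import subprocess

out = subprocess.run(
    ["rabbitmqctl", "list_connections", "name", "state"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    if any(word in line for word in ("flow", "blocking", "blocked")):
        print("throttled connection:", line)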
I also read the implementation of rabbit_reader.erl and found:
-record(throttle, {
    %% never | timestamp()
    last_blocked_at,
    %% a set of the reasons why we are
    %% blocked: {resource, memory}, {resource, disk}.
    %% More reasons can be added in the future.
    blocked_by,
and
control_throttle(State = #v1{connection_state = CS,
                             throttle = #throttle{blocked_by = Reasons} = Throttle}) ->
    Throttle1 = case credit_flow:blocked() of
                    true  -> Throttle#throttle{blocked_by = sets:add_element(flow, Reasons)};
                    false -> Throttle#throttle{blocked_by = sets:del_element(flow, Reasons)}
                end,
    State1 = State#v1{throttle = Throttle1},
    case CS of
        running -> maybe_block(State1);
        %% unblock or re-enable blocking
        blocked -> maybe_block(maybe_unblock(State1));
        _       -> State1
    end.
Is there any way to get the value of last_blocked_at from the RabbitMQ API, rabbitmqctl, or the web UI, without changing the rabbitmq-server code to print it into the log?
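A sketch of what I have in mind, assuming the management plugin is listening on port 15672 with guest credentials (both assumptions); since I am not sure which blocked-related fields the API actually exposes, it just dumps each connection's state plus any field whose name contains "blocked":

# Sketch: query the management HTTP API for connections and print their state
# plus any "blocked"-related fields. Host, port and credentials are assumptions.
import requests

resp = requests.get("http://localhost:15672/api/connections", auth=("guest", "guest"))
resp.raise_for_status()

for conn in resp.json():
    blocked_fields = {k: v for k, v in conn.items() if "blocked" in k}
    print(conn.get("name"), conn.get("state"), blocked_fields)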
--
Best Regards,
Haosdent Huang
This does not look right, because in rabbit_reader.erl, if the connection is blocked by credit flow, it updates connection_state to blocked as well:
maybe_block(State = #v1{connection_state = CS, throttle = Throttle}) ->
    case should_block_connection(Throttle) of
        true ->
            State1 = State#v1{connection_state = blocked,
                              throttle = update_last_blocked_at(Throttle)},
Hmm, confirmed that it should not be caused by flow control, because after we changed the setting to
{credit_flow_default_credit,{10000000,1000000}},
on all RabbitMQ servers, the issue still happens.
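For what it is worth, this is a sketch of how we double-check that the running nodes actually picked up the new value after the restart:

# Sketch: confirm the value the running node uses for credit_flow_default_credit.
import subprocess

out = subprocess.run(
    ["rabbitmqctl", "eval",
     "application:get_env(rabbit, credit_flow_default_credit)."],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # expect {ok,{10000000,1000000}} after the change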
Now we have a new finding: when publishing messages is slow, the memory usage of beam.smp is always above 10 GB; when publishing is fast, it is always around 8 GB. Our servers have 64 GB of RAM and always have about 40 GB of memory available, and vm_memory_high_watermark and vm_memory_high_watermark_paging_ratio are both 0.9, so it should not be caused by a lack of memory.
Could it be that Erlang is stuck in GC and cannot accept messages? Or could the Erlang version we use cause problems for RabbitMQ?
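To keep an eye on this, a sketch that reads the node memory figures (mem_used, mem_limit, mem_alarm) from the management API; the host, port, and credentials are assumptions:

# Sketch: compare memory used by each node against its configured high watermark.
# Host, port and credentials are assumptions.
import requests

resp = requests.get("http://localhost:15672/api/nodes", auth=("guest", "guest"))
resp.raise_for_status()

for node in resp.json():
    used_gb = node["mem_used"] / 1024 ** 3
    limit_gb = node["mem_limit"] / 1024 ** 3
    print(f'{node["name"]}: {used_gb:.1f} GB used / {limit_gb:.1f} GB limit, '
          f'alarm={node["mem_alarm"]}')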
Hi @Michael, thanks a lot for your explanation. Sorry, I only saw your message sent at "6:42 PM" after I sent my last message.
> This is what a queue process has to deal with: "serving" publishers and also delivering to consumers.
Do you mean that if there is no consumer, the queue process does not need to worry about delivering messages to consumers, so it is faster, right?
> rabbitmq-sharding is much less prone to this behavior because sharded "logical" queues are process groups (if you have more than one consumer).
Since we find that this issue still happens even when we start only one consumer, this plugin would not help in our case, right? The two factors we think cause the issue are the large number of unacked messages in the queue and the 10 GB memory usage of beam.smp. Are there any bottlenecks or known issues for such a situation?
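To see whether the Queue A process itself is the big memory consumer while it holds all those unacked messages, we use a sketch like this (name, memory, messages_ready, and messages_unacknowledged are standard rabbitmqctl queue info items):

# Sketch: show each queue's memory footprint next to its ready/unacked counts.
import subprocess

out = subprocess.run(
    ["rabbitmqctl", "list_queues",
     "name", "memory", "messages_ready", "messages_unacknowledged"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    parts = line.split("\t")
    if len(parts) == 4 and parts[1].isdigit():
        name, memory, ready, unacked = parts
        print(f"{name}: {int(memory) / 1024 ** 2:.0f} MB, ready={ready}, unacked={unacked}")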
Yep, it has been available since 3.6.3; we just need to run rabbitmq-plugins enable rabbitmq_top.
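In case it helps anyone else, a rough sketch of how we plan to read it programmatically; I believe the plugin adds a /api/top/<node> endpoint to the management API, but that endpoint name is from memory, so treat it as an assumption (the "Top Processes" page in the management UI shows the same data):

# Sketch: ask rabbitmq_top for the busiest Erlang processes on one node.
# The /api/top/<node> endpoint, node name, host and credentials are assumptions.
import requests

node = "rabbit@my-rabbit-1"  # placeholder node name
resp = requests.get(f"http://localhost:15672/api/top/{node}", auth=("guest", "guest"))
resp.raise_for_status()
print(resp.json())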