PerfTest on latest rabbitmq version fails


Jeff S

Nov 2, 2022, 12:32:00 PM
to rabbitmq-users
I set up a 3-node cluster (each node is 16 CPUs/16 GB RAM) on Windows Server 2019 using 3.10.8. I have a few questions regarding PerfTest.

I'm running:

```
runjava.bat com.rabbitmq.perf.PerfTest -H amqp://username:password@ip_address:5672/%2f -z 1800 -f persistent -q 1000 -c 1000 -ct -1 -ad false --rate 80 --size 50 --queue-pattern 'perf-test-%d' --queue-pattern-from 1 --queue-pattern-to 16 -qa auto-delete=false,durable=false,x-queue-type=quorum --producers 200 --consumers 200 --consumer-latency 10000 --producer-random-start-delay 30
```

1) It fails with an IOException if I change --queue-pattern-to to anything but 16, like 32, 64 or 128. Any idea why?
2) I see all the queues in the management UI created on node 1. Is there any way to balance them across all 3 nodes?
3) When hitting over 20K msg/sec, every queue goes red (minority) with CPU and memory below 30%. What should I look at to figure out why all 16 queues go into minority? What can I troubleshoot?

Thank you in advance.

Arnaud Cogoluègnes

Nov 3, 2022, 8:30:34 AM
to rabbitmq-users
For 1) we'd need more information, like the stack trace of the exception.

For 2) provide the URL of each node separated by commas. PerfTest will create the queues on the different nodes.

Make sure to start from a fresh cluster, without any existing resources, to be sure PerfTest does not try to create resources that already exist with different parameters.
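For illustration, passing the node URIs as a comma-separated list to -H could look like this (user, pass and node1..node3 are placeholders; the remaining options from the original command line stay as they were):

```
rem Placeholders: user, pass, node1..node3.
runjava.bat com.rabbitmq.perf.PerfTest ^
  -H amqp://user:pass@node1:5672/%2f,amqp://user:pass@node2:5672/%2f,amqp://user:pass@node3:5672/%2f ^
  --queue-pattern "perf-test-%d" --queue-pattern-from 1 --queue-pattern-to 16
```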

Jeff S

Nov 3, 2022, 9:41:16 AM
to rabbitmq-users
For #1, the exception is below.
For #2, thanks, although it's odd: a round-robin load balancer is in use, so I would have assumed it would balance the queues.
For #3, the cluster is brand new, with no queues or policies defined. What can I troubleshoot?

```
java.lang.RuntimeException: java.io.IOException
        at com.rabbitmq.perf.Consumer.registerAsynchronousConsumer(Consumer.java:231)
        at com.rabbitmq.perf.Consumer.run(Consumer.java:165)
        at com.rabbitmq.perf.MulticastSet.startConsumers(MulticastSet.java:435)
        at com.rabbitmq.perf.MulticastSet.run(MulticastSet.java:241)
        at com.rabbitmq.perf.PerfTest.main(PerfTest.java:418)
        at com.rabbitmq.perf.PerfTest.main(PerfTest.java:542)
Caused by: java.io.IOException: null
        at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:129)
        at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:125)
        at com.rabbitmq.client.impl.ChannelN.basicConsume(ChannelN.java:1384)
        at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.basicConsume(AutorecoveringChannel.java:543)
        at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.basicConsume(AutorecoveringChannel.java:520)
        at com.rabbitmq.perf.Consumer.registerAsynchronousConsumer(Consumer.java:227)
        ... 5 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'perf-test-2' in vhost '/', class-id=60, method-id=20)
        at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
        at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
        at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:502)
        at com.rabbitmq.client.impl.ChannelN.basicConsume(ChannelN.java:1378)
        ... 8 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'perf-test-2' in vhost '/', class-id=60, method-id=20)
        at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:517)
        at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:341)
        at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:182)
        at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:114)
        at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:739)
        at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:47)
        at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:666)
        at java.lang.Thread.run(Unknown Source)
```


Michal Kuratczyk

Nov 3, 2022, 11:56:06 AM
to rabbitm...@googlegroups.com
Hi,

Are you sure these nodes actually formed a valid cluster? Do you see 3 nodes in the Management UI? Can you share screenshots of the Overview and Queues pages?

Best,

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/68cf67cd-6de6-4283-9c0c-10606962aec0n%40googlegroups.com.


--
Michał
RabbitMQ team

AceMQ

Nov 3, 2022, 12:40:59 PM
to rabbitm...@googlegroups.com
Yes, PerfTest runs fine with 16 queues. I also tried 3 individual nodes and it ran fine, just not with more queues. My more immediate request is to figure out what to troubleshoot when load exceeds 26k messages/sec and the queues all go red and say minority.
But I would love a response on both if possible.
[image attachment]


Michal Kuratczyk

Nov 3, 2022, 2:38:26 PM
to rabbitm...@googlegroups.com
I've never seen queues going into minority in such a situation, but:
1. All queue leaders ending up on a single node is (almost certainly) due to PerfTest declaring the queues through a dedicated connection, combined with the default queue leader locator of client-local; add `queue_leader_locator = balanced` to your config file or set it via `--queue-args`.
2. The logs should provide some clues; my guess is that your nodes are overloaded and start missing heartbeats, or something along those lines.
3. Check `rabbitmq-queues quorum_status` for some of the affected queues.
4. Browse through the Prometheus/Grafana metrics, looking especially for fully utilized CPUs and saturated Erlang distribution buffers (the latter should show up in the logs as well).
5. By the way, you are trying to set durable=false, which is not a thing for quorum queues.
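A quick sketch of both of those steps (the queue name below is just one of the queues from the run):

```
rem In rabbitmq.conf, on each node:
rem   queue_leader_locator = balanced

rem Then inspect the Raft membership of one of the affected queues:
rabbitmq-queues quorum_status --vhost "/" "perf-test-2"
```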

Here are some results I see (with balanced queue leaders): first 16 queues, then 20, then 24. I'm not sure whether your goal is to keep 200 producers/consumers while increasing the number of queues, or to have as many queues as possible with 80 msgs/s each; you mentioned setting a higher --queue-pattern-to, but you also mentioned load exceeding 26k msgs/s, which would require a higher rate or more producers. For now I only changed --queue-pattern-to, so we see 16k msgs/s published in all cases:

Screenshot 2022-11-03 at 19.34.58.png



--
Michał
RabbitMQ team

Arnaud Cogoluègnes

Nov 4, 2022, 4:02:15 AM
to rabbitmq-users
> For #2, thanks although it's weird since a load balancer with round robin is used so I would assume it would work.

When only one URL is specified, PerfTest creates one "configuration" connection and uses it to create all the queues; that's why they all end up on the same node in your case. With 3 URLs, PerfTest will use 3 "configuration" connections and use them in a round-robin fashion to create the queues. You can try to fool PerfTest by providing the load balancer URL 3 times; it should work.
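That trick could look like this (user, pass and lb-host are placeholders):

```
rem Hypothetical load balancer address, listed three times so PerfTest
rem opens three "configuration" connections; a round-robin balancer
rem should then land each one on a different node.
runjava.bat com.rabbitmq.perf.PerfTest ^
  -H amqp://user:pass@lb-host:5672/%2f,amqp://user:pass@lb-host:5672/%2f,amqp://user:pass@lb-host:5672/%2f ^
  --queue-pattern "perf-test-%d" --queue-pattern-from 1 --queue-pattern-to 16
```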

For #1, PerfTest does not find a queue when it tries to consume from it. This can happen if the queue is e.g. exclusive and the configuration / consumer connections are not the same, but apparently it's not the case here. Another cause could be that the nodes are not clustered, so the queue is created on one node and the consumer is connected to another node (that's what Michal suggested). Yet another cause could be a network partition that makes the queue not visible on a node.

Could you skip the load balancer and go directly to the node (just to remove one layer)?

Note you can use the --quorum-queue flag [1]; it's a shortcut for `--flag persistent --queue-args x-queue-type=quorum --auto-delete false` and will make the command line shorter.
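Applied to the original command line, that could look like this (a sketch; credentials and address are placeholders):

```
runjava.bat com.rabbitmq.perf.PerfTest ^
  -H amqp://user:pass@node1:5672/%2f ^
  -z 1800 -q 1000 -c 1000 -ct -1 --rate 80 --size 50 ^
  --queue-pattern "perf-test-%d" --queue-pattern-from 1 --queue-pattern-to 16 ^
  --quorum-queue --producers 200 --consumers 200 ^
  --consumer-latency 10000 --producer-random-start-delay 30
```

Here --quorum-queue replaces the original `-f persistent -ad false -qa auto-delete=false,durable=false,x-queue-type=quorum` trio, which also drops the durable=false argument that does not apply to quorum queues.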


Jeff S

Nov 4, 2022, 9:23:26 AM
to rabbitmq-users
1) I created a policy and set `queue_leader_locator = balanced`
2) I removed the unnecessary -qa arguments and added --quorum-queue instead
3) I specified all 3 URIs/nodes in PerfTest and now see queues created on all 3 nodes

I see about 13 queues out of 16 going red; hovering over them says minority. The cluster did form and works fine at low rates; at over 20K msg/sec, this is what I see.
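For reference, a policy along those lines could be created like this (the policy name and pattern are placeholders; in a policy definition the key is spelled queue-leader-locator, and the JSON quoting below is cmd.exe style):

```
rem Hypothetical policy applying balanced leader placement to the test queues.
rabbitmqctl set_policy leader-balance "^perf-test-" "{""queue-leader-locator"":""balanced""}" --apply-to queues
```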

Michal Kuratczyk

Nov 4, 2022, 9:57:07 AM
to rabbitm...@googlegroups.com
Please share the logs



--
Michał
RabbitMQ team

Jeff S

Nov 4, 2022, 10:35:17 AM
to rabbitmq-users

Michal Kuratczyk

Nov 4, 2022, 11:23:21 AM
to rabbitm...@googlegroups.com
Seems like there's something wrong with file access:
```
crasher:
  initial call: ra_log_segment_writer:init/1
  pid: <0.961.0>
  registered_name: ra_log_segment_writer
  exception error: no match of right hand side value {error,eacces}
    in function  ra_log_segment_writer:handle_cast/2 (src/ra_log_segment_writer.erl, line 180)
    in call from gen_server:try_dispatch/4 (gen_server.erl, line 1123)
    in call from gen_server:handle_msg/6 (gen_server.erl, line 1200)
```



--
Michał
RabbitMQ team

Michal Kuratczyk

Nov 4, 2022, 11:26:13 AM
to rabbitm...@googlegroups.com
Sent too fast. Anyway, you have file access errors like the one above, and also this:
 <0.5849.136> segment_writer: skipping segment as directory c:/Users/283985/AppData/Roaming/RabbitMQ/db/rabbit@PMELRABMQ00-mnesia/quorum/rabbit@PMELRABMQ00/2F_C0ANWON5SCDXRDW does not exist

I don't know what's going on, perhaps some antivirus or intrusion prevention system kicks in when there's too much activity?


--
Michał
RabbitMQ team

Jeff S

Nov 4, 2022, 4:06:46 PM
to rabbitmq-users
Super helpful advice; we're investigating. Thank you!

AceMQ

Nov 9, 2022, 10:53:30 AM
to rabbitm...@googlegroups.com
An update on this issue. Quorum queues keep going into minority with the same error message (segment file error). Switching to classic queues on the same cluster with ha-params of exactly 3 (7-node cluster), there are no problems at all; everything is green. We were able to hit 30-35k messages/sec with publisher confirms on. I was hoping to use quorum queues and not rely on classic ones, but it looks like I have no choice. Any insights would be super helpful.
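For comparison, a classic mirrored-queue policy matching that description might look like this (policy name and pattern are placeholders; cmd.exe-style JSON quoting; note classic queue mirroring is deprecated in favor of quorum queues):

```
rem Hypothetical policy: mirror matching classic queues across exactly 3 nodes,
rem synchronise automatically, and only promote synchronised mirrors.
rabbitmqctl set_policy perf-ha "^perf-test-" "{""ha-mode"":""exactly"",""ha-params"":3,""ha-sync-mode"":""automatic"",""ha-promote-on-failure"":""when-synced""}" --apply-to queues
```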

Luke Bakken

Nov 9, 2022, 11:35:15 AM
to rabbitmq-users
Hello,

In a previous message, Michal suggested that your Windows servers may be running software that is interfering with RabbitMQ. We haven't heard back about that.

The permission errors and missing directories point to something like that in your environment.

Thanks,
Luke

AceMQ

Nov 9, 2022, 11:38:29 AM
to rabbitm...@googlegroups.com
We've disabled every single service on the server. Literally no external agents or services are running; no antivirus, scanning, or snapshotting is happening. The only IOPS we see is from erl.exe against %APPDATA%\RabbitMQ\roaming.....

Luke Bakken

Nov 9, 2022, 11:39:22 AM
to rabbitmq-users
Thank you for confirming.

AceMQ

Nov 9, 2022, 12:45:50 PM
to rabbitm...@googlegroups.com
We've consistently recreated the issue with quorum queues: they go red with the same message as in the original post. Classic queues consistently work with no issues (with publisher confirms, ha-params set to a majority number, when-synced, automatic). I would totally expect the opposite, but quorum queues somehow give us exceptions.

I've tried switching back and forth between classic and quorum and can recreate it every single time, so it's not something that kicks in randomly or scans randomly.