Degraded performance for random stream queues on identical consumers


Christian Ehmig

Jun 25, 2025, 5:42:33 AM
to rabbitmq-users
Hi,

Setup

RabbitMQ 4.1.1
3 node cluster
64-core AMD EPYC 7742, 2.25 GHz
512 GB RAM
[screenshot: Bildschirmfoto 2025-06-25 um 11.31.24.png]


Data

154 stream queues with the following config (identical for all)

(sample stream list sorted by number of messages)
[screenshot: Bildschirmfoto 2025-06-25 um 11.32.17.png]
x-max-age: 172800s (2 days)
x-queue-type: stream
x-queue-leader-locator: least-leaders
durable: true

Message size: 28 bytes
Stream sizes vary between a few thousand messages per day and several billion per day.
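(For context, a sketch of how such a stream could be declared from the Go client - assuming declaration happens in Go; SetMaxAge is the client-side equivalent of x-max-age:)

// Declares a stream with a 2-day retention window (x-max-age: 172800s).
// Streams are always durable; env is the stream.Environment from the setup code below.
err := env.DeclareStream(streamName,
	stream.NewStreamOptions().
		SetMaxAge(48*time.Hour))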

Producers
1 producer per stream
latest Java client


Consumers


3 consumer instances
each "consumer instance" consumes all 154 streams
one connection is used per stream

-> 154 x 3 = 462 consumer connections (from the consumers to the RMQ cluster)



Issue
The read performance is randomly inconsistent among all consumers.
In total, each consumer is capable of reading around 1.5 - 2 million messages per second.
However, this only applies to certain streams.

Example
A stream with 5.2 billion entries is consumed fine by consumers A and B, but very slowly by consumer C.
For another stream from the same setup, read performance is fine on consumers B and C, but A is slow.

By observation, the read rate for "slow" streams drops to 10,000 messages per second.
This was even worse when we used one connection per consumer instead of one connection per stream.

We already tried different initialCredit sizes - no luck.

What we expect: read from all streams at full speed on any consumer.

The consumers themselves just put the read messages in memory - there is no slowdown on the consumer end.


Go consumer setup code

import (
	"time"

	"github.com/rabbitmq/rabbitmq-stream-go-client/pkg/ha"
	"github.com/rabbitmq/rabbitmq-stream-go-client/pkg/stream"
)

// SetMaxConsumersPerClient(1) gives every stream its own connection.
env, err := stream.NewEnvironment(
	stream.NewEnvironmentOptions().
		SetMaxConsumersPerClient(1).
		SetRPCTimeout(1 * time.Minute),
)

...

consumer, err := ha.NewReliableConsumer(env, streamName,
	stream.NewConsumerOptions().
		SetConsumerName(consumerName).
		SetCRCCheck(false). // skip CRC checks to save CPU on the read path
		SetAutoCommit(stream.NewAutoCommitStrategy().
			SetCountBeforeStorage(10000).      // store the offset every 10k messages
			SetFlushInterval(10*time.Second)). // ...or at least every 10 s
		SetOffset(stream.OffsetSpecification{}.First()),
	ts.HandleRMQMessage)


Is there any tuning we could try? Thanks for any help.
Should we try with "autoOffset" disabled?


Best Regards
Christian


kjnilsson

Jun 25, 2025, 12:02:58 PM
to rabbitmq-users
If you have tried various higher initialCredit sizes then the only reason could be, as you suspect, the auto tracking.

Do you need to track offsets?

The current offset tracking implementation embeds offset tracking data in the stream. This can make it such that
the actual data is spread further apart, affecting the read performance of the stream.

At some point we may build an alternative tracking store, but for now, if you still need tracking, you could implement
your own in a DB table, Redis, or similar.

Just turning the auto tracking off won't immediately fix it; you need to write new data that doesn't also include the tracking
data and see if that increases read performance between streams.
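For illustration, a minimal sketch of such external tracking with the Go stream client - loadOffset/saveOffset are hypothetical helpers backed by a DB table or Redis, not part of the client API:

import (
	"github.com/rabbitmq/rabbitmq-stream-go-client/pkg/amqp"
	"github.com/rabbitmq/rabbitmq-stream-go-client/pkg/ha"
	"github.com/rabbitmq/rabbitmq-stream-go-client/pkg/stream"
)

// Resume from the externally stored offset; fall back to the stream start.
start := stream.OffsetSpecification{}.First()
if offset, err := loadOffset(consumerName, streamName); err == nil {
	start = stream.OffsetSpecification{}.Offset(offset + 1)
}

consumer, err := ha.NewReliableConsumer(env, streamName,
	stream.NewConsumerOptions().
		SetConsumerName(consumerName).
		SetCRCCheck(false).
		SetOffset(start), // no SetAutoCommit: no tracking data is written into the stream
	func(ctx stream.ConsumerContext, msg *amqp.Message) {
		ts.HandleRMQMessage(ctx, msg)
		// Persist every 10k messages rather than per message to keep the hot path cheap.
		if off := ctx.Consumer.GetOffset(); off%10000 == 0 {
			_ = saveOffset(consumerName, streamName, off)
		}
	})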

Cheers
Karl

Christian Ehmig

Jun 26, 2025, 4:53:45 AM
to rabbitmq-users
Hi Karl,

Thanks a lot for your response. We disabled auto-tracking yesterday (and yes, we need it to correctly spin up consumers again after failures).
Of course, we can implement manual tracking.

But still, the issue persists. To give a specific example again:

We re-created all streams yesterday night.
Stream S has 300 million messages right now with an append rate of about 100k messages per second.
No issue on producer side.

Consumers A, B and C exist - they concurrently consume messages from Stream S.

Consumers A and B are up-to-date with a consume rate of 80k - 150k msg/s. Consumer C has a consume rate of 10k messages per second.
As a result, message timestamps on C are 2 hours behind A and B.

For another stream in the same size and update category, it’s completely different.
Consumers A and C are up-to-date while consumer B is behind several hours.

Cheers
Christian

Karl Nilsson

Jun 26, 2025, 5:18:00 AM
to rabbitm...@googlegroups.com
Oh ok, that does sound very strange indeed. So it isn't different streams, it is different consumers of the same stream.

I assume the consumers have identical configurations. Are they connected to different nodes?

Can you use the `rabbitmq-streams stream_status` tool to check that it isn't the stream replication for a given node that is lagging behind?
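(That would be e.g. `rabbitmq-streams stream_status <stream>` on one of the nodes; the output lists per-node role, offsets, committed offset, readers, and segments.)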





--
Karl Nilsson

Christian Ehmig

Jun 26, 2025, 5:47:58 AM
to rabbitmq-users
role    | node  | epoch | offset    | committed_offset | first_offset | readers | segments
--------|-------|-------|-----------|------------------|--------------|---------|---------
replica | node1 | 3     | 643862139 | 643862138        | 0            | 0       | 49
writer  | node2 | 3     | 643862139 | 643862138        | 0            | 5       | 49
replica | node3 | 3     | 643862139 | 643862138        | 0            | 0       | 49

Readers from node2 (the writer) are: node1, node3, and consumers A, B, and C. I don't know why they are all connected to the writer.
Are there any other limitations? Is the "maximum read speed" usually evenly distributed among consumers?

Karl Nilsson

Jun 26, 2025, 6:03:03 AM
to rabbitm...@googlegroups.com
There are no artificial limitations to read speed per se apart from the consumer flow control (initial credits).
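For reference, a sketch of raising it in the Go client - assuming SetInitialCredits is the option the thread refers to as initialCredit:

// More initial credits let the server keep more chunks in flight before
// the client requests more; the trade-off is client-side memory.
options := stream.NewConsumerOptions().
	SetConsumerName(consumerName).
	SetInitialCredits(64) // the client default is small (on the order of 10 chunks)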

If you restart consumer C (the slow one) - does it behave differently?

What kind of disks do you have?



--
Karl Nilsson

Christian Ehmig

Jun 27, 2025, 3:19:28 AM
to rabbitmq-users
For some reason, I cannot post anything - messages get deleted instantly. Do certain keywords trigger the automatic deletion?

Trying again - we have NVMe disks.

After the restart, the consume rate of the restarted consumer rises significantly, from 560k messages per second to 2.6 million messages per second.
However, since the restart two hours ago, the rate has been decreasing continuously; we're at 900k messages per second now.

And this even though some larger streams are still way behind (several hours). The measured rate is the consume rate of all streams (157 in total) summed up for one consumer.

Christian Ehmig

Jun 27, 2025, 3:27:30 AM
to rabbitmq-users
Trying to post a chart of the consume rate. After the restart (yesterday around 14:45), the rate was fine. You can see the degrading performance over the day. This morning, the consume rate would need to be around 650k - 700k to keep up with incoming data, but we barely get to 400k.

[screenshot: Bildschirmfoto 2025-06-27 um 09.24.19.png]

Christian Ehmig

Jun 27, 2025, 3:54:03 AM
to rabbitmq-users
We have NVMe disks (nvme-Micron_9300). The restart of one node made things better, but as you can see in the attached chart of the node insert rate, the consume rate slows down over time, although several consumed streams are still behind and need to catch up.

[chart: tps.png]

Christian Ehmig

Jun 27, 2025, 3:54:07 AM
to rabbitmq-users
[screenshot: Bildschirmfoto 2025-06-26 um 11.38.12.png]

Readers are node1, node2, and consumers A, B, and C, which sums up to 5. I don't know why they are all connected to the writer, though.

Michal Kuratczyk

Jun 27, 2025, 3:57:08 AM
to rabbitm...@googlegroups.com
Hey,

For whatever reason, Google Groups marked some of your messages as potential spam.
I've now approved some of them, so hopefully all the info is in this thread. In general, I personally prefer GitHub Discussions as they have
no such issues, but feel free to stay here - just keep in mind we may need to approve some messages for them to appear.




--
Michal
RabbitMQ Team


Gabriele Santomaggio

Jun 30, 2025, 9:48:05 AM
to rabbitmq-users
Christian,
Do you have a chance to test this [1] commit?
It reduces the potential in-memory chunks.

(As Michal said, a GitHub Discussion would be better, btw.)

-
Gabriele

1- https://github.com/rabbitmq/rabbitmq-stream-go-client/commit/2d47c02351472f5e608aa05078a742781bdb5e48
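(A note on testing an unreleased commit like this with standard Go module tooling: `go get github.com/rabbitmq/rabbitmq-stream-go-client@2d47c02351472f5e608aa05078a742781bdb5e48`, then rebuild.)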

Christian Ehmig

Jul 3, 2025, 8:39:41 AM
to rabbitmq-users
Hi Gabriele,

Thanks - I will be able to test the commit in the next few days. I can switch to GitHub Discussions, of course.

Best
Christian


Christian Ehmig

Jul 8, 2025, 2:28:20 AM
to rabbitmq-users
Moved to https://github.com/rabbitmq/rabbitmq-stream-go-client/issues/414 now. The issue still persists.