RabbitMQ streams internals documentation


Francesco Pessina

Aug 10, 2023, 11:01:31 AM
to rabbitmq-users
Hello everyone,
I'm new to RabbitMQ streams and I wish to study it thoroughly.
Is there any documentation regarding its internal mechanisms?


For example, some questions I can't answer by myself:
- how is the size of a "chunk" determined?
- is there a way to achieve "exactly once" semantics?
- which configuration achieves the lowest possible latency?

Knowing the internals of the system would help me a lot.

Thank you!

kjnilsson

Aug 14, 2023, 4:46:00 AM
to rabbitmq-users
Hi,

1. It depends on the ingress rate: a higher rate means larger chunks. You can't control it directly. See: https://github.com/rabbitmq/gen-batch-server
2. You are going to have to give me a lot more detail about what you want to achieve here. :)
3. Not really. Using a stream client rather than AMQP is the best option. Streams don't fsync, so latency is very low anyway.

Artur Wroblewski

Aug 14, 2023, 1:10:15 PM
to rabbitm...@googlegroups.com
On Mon, Aug 14, 2023 at 01:46:00AM -0700, kjnilsson wrote:
> Hi,
>
> 1. It depends on the ingress rate: a higher rate means larger chunks. You
> can't control it directly. See: https://github.com/rabbitmq/gen-batch-server

I might be missing something, or maybe I misunderstood the original
question, but doesn't it depend on the client, which implements the
protocol?

I believe the protocol itself only supports sending a number of messages
in a chunk (which can be fixed or can vary from chunk to chunk). The broker
accepts a chunk and does *not* split or merge chunks (though there is a
concept of a segment on the broker side). How that feature of the protocol
is used depends on the client implementation.

For example, in the Java client, the maximum number of messages accumulated
by a client (producer instance?) can be set with the `batchSize` parameter:

https://rabbitmq.github.io/rabbitmq-stream-java-client/stable/htmlsingle/#creating-a-producer

And, byte-wise, a chunk will be no bigger than the stream protocol frame size:

https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbitmq_stream/docs/PROTOCOL.adoc#frame-structure

I believe there is a timeout factor here as well, so maybe that's the
indirect part that depends on ingress.
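
To make that concrete, here is a minimal sketch with the Java client (the
stream name and values are arbitrary; batchSize and, if I read the docs
right, batchPublishingDelay are the relevant producer settings):

    import com.rabbitmq.stream.Environment;
    import com.rabbitmq.stream.Producer;
    import java.time.Duration;

    public class BatchingSketch {
        public static void main(String[] args) throws Exception {
            // Connect to a local broker with the stream plugin enabled (default port 5552).
            try (Environment environment = Environment.builder()
                    .uri("rabbitmq-stream://localhost:5552")
                    .build()) {
                Producer producer = environment.producerBuilder()
                        .stream("my-stream")
                        // max number of messages the client accumulates before publishing a batch
                        .batchSize(100)
                        // how long a partially filled batch may wait before being flushed
                        .batchPublishingDelay(Duration.ofMillis(100))
                        .build();
                producer.send(
                        producer.messageBuilder().addData("hello".getBytes()).build(),
                        confirmationStatus -> { /* confirmed or errored by the broker */ });
                producer.close();
            }
        }
    }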

However, in RbFly (sorry for the shameless plug, but this is what I know in
detail :), you can use a publisher implemented either by the
PublisherBatchLimit class or the PublisherBatchFast class:

https://wrobell.dcmod.org/rbfly/publish.html

The former blocks when an application hits its own limit on the number of
messages. On flush, it will send a chunk of messages, up to that limit.

The latter publisher allows sending a fixed number of messages to a broker
in a loop.

In both cases, messages might be split into multiple chunks when their
total length goes beyond the protocol frame size. This is a client feature,
though.

[...]

Best regards,

Artur

--
https://mortgage.diy-labs.eu/

kjnilsson

Aug 15, 2023, 3:43:25 AM
to rabbitmq-users
The broker _does_ split any batches sent by stream clients before it sends them on to the stream process, so the batching that is done on the client side does not form a chunk. The chunk size is all handled by the gen batch server code, which depends on ingress.

(The exception to this is "sub batches", which are a special type of multi-message record.)
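
For reference, sub batches are something the client opts in to explicitly;
for example, the Java client exposes this as the producer's subEntrySize
setting. A rough sketch (the stream name and value are arbitrary, and
`environment` is a connected com.rabbitmq.stream.Environment):

    // subEntrySize > 1 makes the client group several messages into one
    // sub batch (a single multi-message record within a chunk).
    com.rabbitmq.stream.Producer producer = environment.producerBuilder()
            .stream("my-stream")
            .subEntrySize(10)
            .build();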

IMO stream clients should not arbitrarily delay the sending of messages to form batches. Batches should only be built up during the time it takes for the client to format and send the previous batch of messages, which is roughly how gen_batch_server works.

Cheers
Karl

Artur Wroblewski

Aug 15, 2023, 5:48:00 AM
to rabbitm...@googlegroups.com
On Tue, Aug 15, 2023 at 12:43:25AM -0700, kjnilsson wrote:
> The broker _does_ split any batches sent by stream clients before it sends
> them on to the stream process, so the batching that is done on the client
> side does not form a chunk. The chunk size is all handled by the gen batch
> server code, which depends on ingress.
>
> (The exception to this is "sub batches", which are a special type of
> multi-message record.)
>
> IMO stream clients should not arbitrarily delay the sending of messages to
> form batches. Batches should only be built up during the time it takes for
> the client to format and send the previous batch of messages, which is
> roughly how gen_batch_server works.

The protocol documentation says nothing about this, so I can only guess.
Based on the above and on what I observe when sending messages (my
observations might be out of date):

1. If I send a single message in a chunk and *wait* for confirmation from the
broker, I will always get chunks containing just one message. No merging
will happen.

2. If I send single-message chunks *without* waiting for confirmation, then
the chunk size is handled by the server.

Are both statements above correct?
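
To make the two cases concrete, here is roughly what I mean, sketched with
the Java client for illustration (not RbFly; assume a Producer built as in
my earlier sketch, and the helper names are made up):

    import com.rabbitmq.stream.Producer;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    class PublishModes {
        // Case 1: publish one message and block until the broker confirms it
        // before publishing the next one.
        static void sendAndWait(Producer producer, byte[] payload) throws InterruptedException {
            CountDownLatch confirmed = new CountDownLatch(1);
            producer.send(
                    producer.messageBuilder().addData(payload).build(),
                    status -> confirmed.countDown());
            confirmed.await();
        }

        // Case 2: keep publishing without waiting for confirmations; how the
        // messages end up grouped into chunks is then up to the client and the broker.
        static void sendWithoutWaiting(Producer producer, List<byte[]> payloads) {
            for (byte[] p : payloads) {
                producer.send(producer.messageBuilder().addData(p).build(), status -> {});
            }
        }
    }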

Karl Nilsson

Aug 15, 2023, 5:53:01 AM
to rabbitm...@googlegroups.com
1. will only be true if you are the only publisher to that stream at that time.



--
Karl Nilsson

Artur Wroblewski

Aug 15, 2023, 6:14:20 AM
to rabbitm...@googlegroups.com
On Tue, Aug 15, 2023 at 10:52:36AM +0100, Karl Nilsson wrote:
> 1. will only be true if you are the only publisher to that stream at that
> time.

Good to know. Thanks!
[...]

Francesco Pessina

Dec 6, 2023, 4:21:19 AM
to rabbitmq-users
Hi kjnilsson,
I took a look at gen-batch-server, and it seems that the chunk size is determined when the mailbox of the gen-batch-server is full or (if the ingress rate is slow) by a timeout (whose default is 5 seconds).
As I'm not so familiar with Erlang, I couldn't figure out whether and how this timeout is configurable.

How is gen-batch-server used in RabbitMQ? What is the value of the timeout?

Thanks

kjnilsson

Dec 6, 2023, 4:39:32 AM
to rabbitmq-users
There is no timeout; if there is only 1 message and no backlog, the batch will only contain 1 message. We don't like to artificially introduce delays like that.

Francesco Pessina

Dec 6, 2023, 4:43:25 AM
to rabbitmq-users
So, how does gen-batch-server determine whether to send a chunk or not?

How is the "Timeout" attribute of the call function used? (See here: https://github.com/rabbitmq/gen-batch-server#callserverref-request-timeout---reply)

In general, I'd need some explanation of how single messages are grouped into chunks and what the logic behind this is.

kjnilsson

Dec 6, 2023, 4:56:48 AM
to rabbitmq-users
gen batch server doesn't send chunks; it dequeues as many messages as it can from its internal process mailbox and passes these messages as a list to the `handle_batch` callback. Once this callback returns, it goes back to check the mailbox, and so on.
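
If it helps to see the principle outside of Erlang, here is a rough Java
analogue (this is not the actual gen batch server code, just the idea): block
only when nothing is queued, otherwise drain whatever accumulated while the
previous batch was being handled and hand it to the batch callback.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class NaturalBatcher<T> implements Runnable {
        private final BlockingQueue<T> mailbox = new LinkedBlockingQueue<>();

        // Producers "cast" messages into the mailbox at whatever rate they like.
        public void cast(T msg) {
            mailbox.add(msg);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    List<T> batch = new ArrayList<>();
                    // Block only when nothing is pending (a batch of one is fine).
                    batch.add(mailbox.take());
                    // Then grab everything else already queued, without waiting.
                    mailbox.drainTo(batch);
                    handleBatch(batch);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        protected void handleBatch(List<T> batch) {
            System.out.println("handling a batch of " + batch.size() + " messages");
        }
    }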

The timeout in call is for external callers into the gen batch server process. It is the same as in all other gen abstractions.

Francesco Pessina

Dec 6, 2023, 4:58:38 AM
to rabbitmq-users
Ok, sorry for my misunderstanding.
So, where can I find some documentation on where, how and when these chunks are generated?

kjnilsson

Dec 6, 2023, 5:02:54 AM
to rabbitmq-users
Why don't you spend a bit of time with gen-batch-server and write a simple application that uses it? Then you can get a bit more of a feel for it. We don't have any more documentation on the current behaviour, as it is very much an internal consideration and not something that anyone would need to work with from the outside. Also, it could well be subject to change.