Preserving offsets when re-creating a Stream


Nicolas Piguet

Jun 7, 2023, 11:17:21 AM
to rabbitmq-users
Hello,

I would like to know if there is a way, when creating a stream, to specify the offset that is going to be assigned to the first message published to a stream.

This is important for us, because we track our position in the stream through offsets (though those offsets are persisted in the DB). We also have data integrity checks that verify that every message that is applied to an entity has been published after the previous message that was already applied to this entity. We do this by comparing offsets.

However, this whole mechanism breaks down if, for whatever reason (operating error, hardware crash, migrating the streams to a different RMQ cluster, etc.), the stream has to be re-created. If we do this and the offsets of the new streams are reset to 0, all our integrity checks will fail, and our systems will basically be dead in the water.

I looked in the documentation, but could not find any reference to a queue argument that could be used to set the initial message offset of a stream. Does this exist?

We could implement a workaround on our side by tracking (streamVersion, offset) where streamVersion is incremented each time we handle a message with offset 0, but it seems somewhat brittle and adds some extra complexity that we would rather avoid, if possible.
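A minimal sketch of that (streamVersion, offset) workaround, assuming the last applied position is loaded from the DB before each message is handled. The names `Position` and `next_position` are hypothetical and not part of any RabbitMQ client API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class Position:
    # Field order matters: positions compare by stream_version first,
    # then by offset, like the tuple (stream_version, offset).
    stream_version: int
    offset: int


def next_position(last: Optional[Position], offset: int) -> Position:
    """Derive the position of an incoming message from the last applied one."""
    if last is None:
        # Nothing applied yet: start at version 0.
        return Position(0, offset)
    if offset == 0:
        # Offset 0 seen again: assume the stream was re-created.
        # A redelivery of offset 0 would also bump the version, which is
        # part of why this approach is brittle.
        return Position(last.stream_version + 1, offset)
    return Position(last.stream_version, offset)
```

An integrity check can then compare positions directly, for example `next_position(last, offset) > last`, instead of comparing raw offsets.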

Thanks for your help,

Nicolas Piguet

kjnilsson

Jun 7, 2023, 11:55:47 AM
to rabbitmq-users
You could perhaps store a tuple of (offset, timestamp) to detect the valid case where the offsets go backwards (but the timestamp is greater)?
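One way this check might look, as a rough sketch: the function name `classify` and the three labels are illustrative assumptions, and the timestamp is taken to be whatever monotonic-enough message timestamp the consumer has available.

```python
from typing import Tuple

# (offset, timestamp) as stored alongside the last applied message.
Marker = Tuple[int, float]


def classify(prev: Marker, cur: Marker) -> str:
    """Compare the incoming marker against the previously applied one."""
    prev_offset, prev_ts = prev
    cur_offset, cur_ts = cur
    if cur_offset > prev_offset:
        # Normal case: offsets move forward.
        return "in_order"
    if cur_ts > prev_ts:
        # Offset went backwards but the message is newer:
        # treat it as a valid message from a re-created stream.
        return "stream_recreated"
    # Offset and timestamp both fail to advance: reject.
    return "out_of_order"
```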

Nicolas Piguet

Jun 8, 2023, 6:33:58 AM
to rabbitmq-users
Yes, this is similar to the workaround I planned to use. In our particular architecture, though, using timestamps will open the door to annoying race conditions. (We use an active-active "competing consumers" architecture, where the same message is handled concurrently by multiple consumers, and the first one to finish wins. This allows us to smooth out some latency peaks and allows for 0-downtime upgrades or failure recovery.) It may also not be enough to establish an absolute order if there are multiple overlapping offset ranges (not likely, but people screwing things up multiple times in a row has happened in the past :-) ).

That being said, it doesn't directly answer my question: Is there a mechanism in the RMQ config that allows me to create a stream where I can specify the offset that will be assigned to the very first message that is published to the stream?

This would be a simple solution to our problem.

Arnaud Cogoluègnes

Jun 8, 2023, 8:58:00 AM
to rabbitmq-users
There's no such mechanism for now, but that's something that could be added. We'll follow up on this thread.

Artur Wroblewski

Jun 8, 2023, 9:38:23 AM
to rabbitm...@googlegroups.com
I have a similar problem in one of my applications. I solved it by using a UUID
when publishing messages, using a timestamp offset when reading messages from
a stream, and an SQL merge when adding data into a database.
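
The database side of this approach could be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite's `INSERT OR IGNORE` as a stand-in for an SQL `MERGE` statement (SQLite has no `MERGE`), and the table name `events` and the helpers `make_store` and `apply_message` are hypothetical.

```python
import sqlite3
import uuid


def make_store() -> sqlite3.Connection:
    """Create an in-memory table keyed by the publisher-assigned UUID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (message_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def apply_message(conn: sqlite3.Connection, message_id: str, payload: str) -> bool:
    """Insert the message once; return False if it was already applied."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO events (message_id, payload) VALUES (?, ?)",
        (message_id, payload),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cur.rowcount == 1


# The publisher attaches a UUID to each message; re-reading the same message
# after a stream is re-created then becomes a harmless no-op.
conn = make_store()
message_id = str(uuid.uuid4())
first = apply_message(conn, message_id, "order created")
second = apply_message(conn, message_id, "order created")
```

Because deduplication relies on the message's own ID rather than on the stream offset, the consumer can restart from a timestamp and tolerate overlapping reads.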

I wonder if it would be feasible for RabbitMQ to support UUID ver. 7 for
stream offset values:

https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-uuid-version-7

IMHO, it could simplify the problem of synchronizing stream data
with 3rd party storage.

Best regards,

Artur

--
https://mortgage.diy-labs.eu/

Mc Polu

Jun 9, 2023, 2:45:19 AM
to rabbitmq-users
Hello, this would be useful for me too: the ability to specify a start number for IDs.