Kafka Questions

Phil Wieland

Oct 3, 2025, 3:56:27 AM
to A gathering place for the Open Rail Data community
I've been experimenting with the splendid Passenger Train Allocation and Consist feed this week, and very useful it is.

But I've got a couple of questions at the Kafka level:
1. If I want two completely independent systems to consume the feed (e.g. test server and live server) what settings have to be different?  I made sure the client.id was unique, which is what is needed on the STOMP feeds, but I found the consumers were sharing the data, getting half each.

2. For this feed, what is the preferred commit strategy? Every message, time-based, message-count based, or something else?

TIA for any advice,

Phil

Peter Hicks

Oct 3, 2025, 4:18:44 AM
to openrail...@googlegroups.com
Hi Phil

On Friday, 3 October 2025 at 08:56, Phil Wieland <philw...@gmail.com> wrote:

> I've been experimenting with the splendid Passenger Train Allocation and Consist feed this week, and very useful it is.

I know several people who will be very happy with your feedback!

> But I've got a couple of questions at the Kafka level:
>
> 1. If I want two completely independent systems to consume the feed (e.g. test server and live server) what settings have to be different? I made sure the client.id was unique, which is what is needed on the STOMP feeds, but I found the consumers were sharing the data, getting half each.

Unfortunately (you can guess what I'm about to say), RDM doesn't have a concept of live or test environments. This is a particular issue for me, as I consume a lot of the static data sources by having RDM push them to an S3 bucket on my side. Despite the fact that I can have multiple file destinations, I can only send a given data source to one of them at any time.

This is less of an issue for APIs, as you can probably get away with using the same credentials in two environments. I'll speak to RDG and put this forward as an enhancement, albeit a significant one, with justification (the difficult bit).

> 2. For this feed, what is the preferred commit strategy? Every message, time-based, message-count based, or something else?

I'm not sure. If you're reading and processing messages one at a time, I'd commit after each message to keep as close as possible to processing each message exactly once. That may have a performance impact, but since messages are produced every five minutes and have an expiry of (I think) an hour, it might not matter, as this feed isn't truly real-time. If you're reading messages and re-queueing them locally, I'd do time-based or message-count commits. Finally, if you have multiple consumers in the same consumer group, I'm not entirely sure of the best way to handle it - you could have consumer 1 process a message and fail, but consumer 2 succeed and commit a higher offset than consumer 1, so you'd lose the message that consumer 1 failed to process.
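
If it helps to see it concretely, committing after each message looks roughly like the sketch below in Python with confluent-kafka. This is only an illustration - the broker address, consumer group, topic name and the handle() function are all placeholders I've made up, not the real RDM connection details.

from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'rdm-broker.example.com:9092',  # placeholder, not the real broker
    'group.id': 'your-assigned-consumer-group',          # placeholder
    'enable.auto.commit': False,                         # we commit manually below
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['pta-consist-topic'])                # placeholder topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        handle(msg.value())                              # handle() stands in for your own processing
        consumer.commit(message=msg, asynchronous=False) # commit this offset before taking the next
finally:
    consumer.close()

The synchronous commit on every message is where the performance cost would come from, but at one delivery every five minutes it should be negligible.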


Peter

Gaelan Steele

Oct 3, 2025, 4:26:23 AM
to openrail...@googlegroups.com
Hi Phil,
> On Oct 3, 2025, at 8:56 AM, Phil Wieland <philw...@gmail.com> wrote:
>
> 1. If I want two completely independent systems to consume the feed (e.g. test server and live server) what settings have to be different? I made sure the client.id was unique, which is what is needed on the STOMP feeds, but I found the consumers were sharing the data, getting half each.

I believe the relevant Kafka concept is the consumer group. But RDM seems to specify one consumer group for you to use, so I’m not sure if you have the flexibility needed here.
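
To make the distinction concrete, in plain Kafka terms it would look something like the sketch below (Python, confluent-kafka). The broker address and group names are made up, the auth settings are omitted, and as I say RDM may only accept the consumer group it assigns you.

from confluent_kafka import Consumer

# Shared connection settings - placeholders only, plus whatever
# authentication RDM actually requires.
common = {'bootstrap.servers': 'rdm-broker.example.com:9092'}

# Two consumers with the SAME group.id split the topic's partitions between
# them, so each sees roughly half the messages - which matches what Phil saw.
# Consumers with DIFFERENT group.ids keep separate offsets and each receive
# every message independently; client.id is only a label for monitoring.
live = Consumer({**common, 'group.id': 'myapp-live', 'client.id': 'live-server'})
test = Consumer({**common, 'group.id': 'myapp-test', 'client.id': 'test-server'})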

Best wishes,
Gaelan

Ben Woodward

Oct 3, 2025, 4:30:07 AM
to openraildata-talk
Yes, you would need a second consumer group.

I'm using Redpanda Connect to pull it into my own queue and then splitting off that. I could knock up a Docker Compose setup if you wanted.


Phil Wieland

Oct 3, 2025, 4:39:43 AM
to A gathering place for the Open Rail Data community
Sigh. When you've been coding as long as I have (Pause, counts decades on fingers, er, fifty-five years), this modern fatware seems very confusing and, in this case, unhelpful. I guess I'll be needing a second RDM account for my test environment if I want to run a test server.

Ben: But what about the test Redpanda machine? Would that need a second account as well?

Anyway, many thanks to all for the helpful advice.

Phil 

Ben Woodward

Oct 3, 2025, 4:49:44 AM
to openraildata-talk
No, it would only use one RDM account, and then you could set up as many consumer groups as you need and hang them off it.

You could also use Kafka, RabbitMQ, NATS, or any other similar solution.
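
To give a rough idea of the shape of it, here's a sketch of the bridge in Python with confluent-kafka (Redpanda Connect does the same job from a config file). Every name and address below is made up.

from confluent_kafka import Consumer, Producer

# One bridge process uses the single consumer group RDM assigns you, and
# republishes each message to a broker you control, where you can create
# as many consumer groups (live, test, ...) as you like.
src = Consumer({
    'bootstrap.servers': 'rdm-broker.example.com:9092',   # placeholder
    'group.id': 'your-assigned-consumer-group',           # placeholder
    'enable.auto.commit': False,
})
src.subscribe(['pta-consist-topic'])                       # placeholder

dst = Producer({'bootstrap.servers': 'localhost:9092'})    # your own broker

while True:
    msg = src.poll(1.0)
    if msg is None or msg.error():
        continue
    dst.produce('pta-consist-local', key=msg.key(), value=msg.value())
    dst.flush()                    # make sure it's safely on the local broker...
    src.commit(message=msg)        # ...before committing the RDM offset

Flushing after every message is heavy-handed, but it's fine at this feed's volume.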

David Wheatley

Oct 3, 2025, 4:53:32 AM
to openrail...@googlegroups.com
A week and a bit ago I posted on the RDM community about wanting support for additional consumer groups for this exact reason of per-env separation. My suggestion of up to 4 consumer groups per user came from the idea someone might have a local, development, staging and production environment.

Currently it means we only have Gemini ingestion in our prod environment; I was in a bit of a rush to integrate the feed, so I haven't set up any kind of connector into our own message queue.


I did also check, and RDM does actually validate the consumer group you provide when connecting to the feed.

David


Gaelan Steele

Oct 3, 2025, 5:23:23 AM
to openrail...@googlegroups.com


> On Oct 3, 2025, at 9:53 AM, 'David Wheatley' via A gathering place for the Open Rail Data community <openrail...@googlegroups.com> wrote:
>
> A week and a bit ago I posted on the RDM community about wanting support for additional consumer groups for this exact reason of per-env separation. My suggestion of up to 4 consumer groups per user came from the idea someone might have a local, development, staging and production environment.

Ideally there wouldn’t be any sort of arbitrary limit at all - I don’t get the impression consumer groups are particularly heavyweight, so I doubt there’s a technical need for one. But yes, any more than one consumer group would be very welcome.

Best wishes,
Gaelan