Overall approach to consuming Darwin Push Port?

Peter Wood

Jun 25, 2025, 1:02:59 PM

to A gathering place for the Open Rail Data community

Good Evening Folks,

I've been "tinkering" with the schedule feeds from TRUST (is that technically correct?), consuming the TOC snapshot and then the daily updates. I've also been happily consuming VSTP and movements over STOMP. I've been able to get a resolved schedule, see the planned times, see the actual times, capture cancellations/reinstatements, etc. There are a few edge cases I need to iron out, but in general it's good.

I've been ignoring Darwin up until now; however, there are various pieces of information in Darwin I want: formations, loading, platform changes and estimates. I've got the PushPort feed from Kafka into my own message streaming solution (NATS), so that side is all good.

My question is about whether I need to consume both the Darwin TimeTable files and the PP feed, given that I've already materialised the schedule from TRUST.

Can I "get away with" just consuming the contents of the Kafka topic? Or will I be missing critical data if I don't consume the file-based snapshots as well?

My naïve view of the snapshots suggests there are no details in there that I haven't already materialised from the ITPS files and feed. However, the one thing that does seem to necessitate it is the mapping from the WTT UID/Date/Signalling ID combination to a RID?
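Push Port payloads posted later in this thread carry `rid`, `uid` and `ssd` together inside the `uR.TS` element, so part of that mapping can be learned from the feed itself. A minimal sketch, assuming payloads shaped like those dumps; the function names are hypothetical, the signalling ID is not covered, and it only knows about services that have already appeared in a message, so it does not settle whether the timetable files can be skipped:

```python
from typing import Dict, Optional, Tuple

# (uid, ssd) -> rid, filled in as Push Port messages arrive.
RidIndex = Dict[Tuple[str, str], str]


def index_payload(index: RidIndex, payload: dict) -> None:
    """Record (uid, ssd) -> rid for each TS element in one decoded payload."""
    ts_elements = payload.get("uR", {}).get("TS", [])
    if isinstance(ts_elements, dict):
        # Assumption: a single TS element may be an object rather than a list.
        ts_elements = [ts_elements]
    for ts in ts_elements:
        if {"rid", "uid", "ssd"} <= ts.keys():
            index[(ts["uid"], ts["ssd"])] = ts["rid"]


def lookup_rid(index: RidIndex, uid: str, ssd: str) -> Optional[str]:
    """Return the RID seen for this UID and start date, or None if unseen."""
    return index.get((uid, ssd))


index: RidIndex = {}
index_payload(index, {
    "uR": {"TS": {"rid": "202506268019489", "uid": "P19489", "ssd": "2025-06-26"}}
})
print(lookup_rid(index, "P19489", "2025-06-26"))
```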

Could someone just give me a nod if I've got the right view of the data?

A huge thanks to everyone who's contributed to the Google Group in the past; it's been hugely useful to read through and gain more understanding.

Incidentally, does anyone know what partition key is being used on the PP Kafka topics? I noticed that the topic has two partitions, and as such message times shuffle around now and again, depending on where you are on each partition (there is no global ordering across partitions in Kafka topics). Are we safe to assume messages about a RID (for example) will always be in the same partition, and thus there's no ordering concern?

Peter.

Peter Hicks

Jun 25, 2025, 1:14:27 PM

to openrail...@googlegroups.com

Hi Peter

On Wednesday, 25 June 2025 at 18:03, Peter Wood <pe...@alastria.net> wrote:

> I've been ignoring (up until now) Darwin, however there's various pieces of information in Darwin I want, formations, loading, platform changes, estimates. I've got the PushPort feed from Kafka into my own message streaming solutions (NATS), so that side is all good.
>
> My question is around the necessity of consuming both the Darwin TimeTable files and the PP feed, with the view I've already materialised the schedule from TRUST.
>
> Can I "get away with" just consuming the contents of the Kafka topic? Or will I be missing critical data if I don't consume the file-based snapshots as well?

TRUST and Darwin are different systems, with TRUST being a historical view of 'what has just happened' and a coarse view of 'what might happen'. Darwin is a system more concerned with what is about to happen, and can reflect in quite some detail things that TRUST cannot, such as:

- Trains terminating short, prior to the train arriving at the location where it terminates
- Locations where a train is planned not to stop, prior to the train not stopping there (and I know there are sites which work this out historically, but that's no good if you're using it to catch a train!)
- Additional stops

Rather than trying to merge two dissimilar worlds, I'd recommend creating a "TRUST view" and a "Darwin view". Use TRUST for history, Darwin for now-and-future. Avoid the temptation to try to resolve the differences between the two, because they're different beasts.

> Incidentally, does anyone know what partition key is being used on the PP Kafka topics? I noticed that the topic has two partitions, and as such message times shuffle around now and again, depending where you are on each partition (no global ordering across partitions in Kafka topics). Are we safe to assume messages about a RID (for example) will always be in the same partition, and thus there's no ordering concern?

I don't actually know the answer to this - but I know exactly who to ask!

Peter

RailAleFan

Jun 26, 2025, 12:03:36 PM

to A gathering place for the Open Rail Data community

Also interested in the answer to this. I'm just starting out with RDM/PushPort, and messages relating to the same RID are being received on both partitions:

RdKafka\Message Object
(
    [err] => 0
    [topic_name] => prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-JSON
    [timestamp] => 1750953015301
    [partition] => 0
    [payload] => stdClass Object
        (
            [ts] => 2025-06-26T16:50:15.3018776+01:00
            [version] => 18.0
            [uR] => stdClass Object
                (
                    [updateOrigin] => TD
                    [TS] => stdClass Object
                        (
                            [rid] => 202506268019489
                            [uid] => P19489
                            [ssd] => 2025-06-26
                        )
                )
        )
    [len] => 3272
    [key] => {"messageID":"ID:liv1-dwnpp102-49752-638840716752370682-1:22:1:1:59026703"}
    [offset] => 123565974
    [headers] => Array
        (
        )
    [opaque] =>
)

RdKafka\Message Object
(
    [err] => 0
    [topic_name] => prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-JSON
    [timestamp] => 1750953015561
    [partition] => 1
    [payload] => stdClass Object
        (
            [ts] => 2025-06-26T16:50:15.5608327+01:00
            [version] => 18.0
            [uR] => stdClass Object
                (
                    [updateOrigin] => TD
                    [TS] => stdClass Object
                        (
                            [rid] => 202506268019489
                            [uid] => P19489
                            [ssd] => 2025-06-26
                        )
                )
        )
    [len] => 1160
    [key] => {"messageID":"ID:liv1-dwnpp102-49752-638840716752370682-1:22:1:1:59026709"}
    [offset] => 123568412
    [headers] => Array
        (
        )
    [opaque] =>
)

Cheers!

Peter Wood

Jun 26, 2025, 12:15:48 PM

to openrail...@googlegroups.com

Hello Peter et al,

> Rather than trying to merge two dissimilar worlds, I'd recommend creating a "TRUST view" and a "Darwin view". Use TRUST for history, Darwin for now-and-future. Avoid the temptation to try to resolve the differences between the two, because they're different beasts.

Thanks for the overall recommendation, the temptation is there for sure. On the assumption that I build two views for actual "what's happened now", using Darwin for passenger, TRUST for other - do I really need to ingest the Darwin TimeTable files, or are we lucky enough that when trains are activated we get a full copy of their schedule over PP - or is that not guaranteed/is often missed?

> > Incidentally, does anyone know what partition key is being used on the PP Kafka topics?
>
> I don't actually know the answer to this - but I know exactly who to ask!

RailAleFan's message has answered the practical side of it at least. It sounds like I need to consume each partition separately and reorder on my side into a well-ordered stream. That shouldn't be too hard, though it is a bit of a pain. I was going to pre-process them into different subjects anyway... so... :shrug:.
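A minimal sketch of that client-side merge, assuming each partition is already ordered by the payload `ts` field (the thread does not confirm this) and that the fraction always has at least six digits, as in the dumps above. `parse_ts` and `merge_partitions` are hypothetical names:

```python
import heapq
import re
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse a payload 'ts' such as 2025-06-26T16:50:15.3018776+01:00."""
    # datetime holds microseconds only, so trim the fraction to six digits.
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", ts)
    return datetime.fromisoformat(trimmed)


def merge_partitions(*partitions):
    """Lazily merge per-partition iterables into one stream ordered by 'ts'."""
    return heapq.merge(*partitions, key=lambda msg: parse_ts(msg["ts"]))


# Two hypothetical batches, one per partition, each already in 'ts' order.
partition0 = [
    {"ts": "2025-06-26T16:50:15.3018776+01:00", "partition": 0},
    {"ts": "2025-06-26T16:50:16.0000000+01:00", "partition": 0},
]
partition1 = [
    {"ts": "2025-06-26T16:50:15.5608327+01:00", "partition": 1},
]

merged = list(merge_partitions(partition0, partition1))
for msg in merged:
    print(msg["ts"], "from partition", msg["partition"])
```

`heapq.merge` has to see the next item from every input before it can emit anything, so a live consumer would more likely hold messages back for a short window and release them in order, rather than merge finished batches like this.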

Thanks for your advice!

P

--

Peter Wood

pe...@alastria.net

Peter Hicks

Jun 26, 2025, 12:17:43 PM

to openrail...@googlegroups.com

On Thursday, 26 June 2025 at 17:03, RailAleFan <raila...@gmail.com> wrote:

> Also interested in the answer to this - just starting out with RDM/PushPort and messages relating to the same RID are being received on both partitions

Just to set expectations, due to summer having arrived and people taking holiday, it might be a couple of weeks before I find out the answer.

It's possible the partition key isn't being set properly and is doing some kind of round-robin, but rather than speculate, let's wait for the concrete answer.

Peter

Peter Hicks

Jun 26, 2025, 12:29:22 PM

to openrail...@googlegroups.com

Hi Peter

On Thursday, 26 June 2025 at 17:15, Peter Wood <pe...@alastria.net> wrote:

> Thanks for the overall recommendation, the temptation is there for sure. On the assumption that I build two views for actual "what's happened now", using Darwin for passenger, TRUST for other - do I really need to ingest the Darwin TimeTable files, or are we lucky enough that when trains are activated we get a full copy of their schedule over PP - or is that not guaranteed/is often missed?

I'd always ingest the Darwin timetable, as my use-cases rely on having details of everything, including buses and non-running train services. But if you just want a subset of the data, you might be able to get away with it. I'm not sure about the licence conditions and whether you'd still be compliant if you do this; that's a question RDG can answer directly.

Peter

David Wheatley

Aug 21, 2025, 11:45:03 AM

to A gathering place for the Open Rail Data community

Hi Peter,

Did you ever hear back about RDM Push Port wrt the partitions?

Receiving messages about the same service (potentially) out-of-order due to this issue is currently blocking one of the projects I was working on.

Thanks,

David

Peter Hicks

Aug 21, 2025, 11:47:36 AM

to openrail...@googlegroups.com

Hi David

On Thursday, 21 August 2025 at 16:45, 'David Wheatley' via A gathering place for the Open Rail Data community <openrail...@googlegroups.com> wrote:

> Did you ever hear back about RDM Push Port wrt the partitions?
>
> Receiving messages about the same service (potentially) out-of-order due to this issue is currently blocking one of the projects I was working on.

Nothing back so far, and it has been a few weeks. I'll give out some gentle nudges.

Peter

Peter Hicks

Sep 19, 2025, 7:05:41 AM

to openrail...@googlegroups.com

Hi David

I still haven't heard anything back (which is a pity), but I found the answer while troubleshooting the consist feed!

It looks like there are two partitions, numbered '0' and '1'. The partition is selected based on a hash of the message key, which has been set to the message ID from the Darwin Push Port ActiveMQ server, e.g. {"messageID": "ID:liv1-dwnpp102-49747-638894340902160133-1:22:1:1:102857828"}.

So this will likely be the reason you're getting some messages out-of-order. I'll feed it back upstream. Ideally, the RID would be the key, but it's possible to have messages referencing more than one RID (e.g. split/join/link associations). The ATOC code might be a better option, but I'll leave that up to those that run the platform.
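To illustrate why a per-message key scatters one train's updates, here is a minimal sketch. It uses CRC32 as a stand-in hash (the platform's actual partitioner is not stated in this thread and may differ) and made-up message IDs; only the RID is taken from the dumps earlier in the thread:

```python
import zlib

NUM_PARTITIONS = 2  # the thread reports partitions '0' and '1'


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition from a hash of the key (stand-in hash: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Ten hypothetical messages about one RID, each with its own made-up message ID.
messages = [
    {"message_id": f"ID:example-{n}", "rid": "202506268019489"}
    for n in range(10)
]

by_message_id = {partition_for(m["message_id"]) for m in messages}
by_rid = {partition_for(m["rid"]) for m in messages}

print("partitions used when keyed by message ID:", sorted(by_message_id))
print("partitions used when keyed by RID:", sorted(by_rid))
```

With this stand-in hash the message-ID key uses both partitions, while the RID key keeps every update for the train on a single partition, which is the only scope within which Kafka guarantees ordering.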

Best wishes,

Peter

Andrew Belcher

Oct 27, 2025, 11:55:57 AM

to A gathering place for the Open Rail Data community

Has there been any response/progress on this? I see two fairly significant problems with multiple partitions that could make consumers significantly more complex:

1. The order in which messages for a single train are processed actually matters. It feels like this could be solved by choosing a sensible partition key (e.g. RID or ATOC code). Split/join/link associations may not be too big a concern though, as I don't believe they really modify the data in a conflicting manner?

2. We are currently monitoring the sequence numbers to check for missing messages. I guess this could be done at a per-partition level without too much difficulty, but it definitely seems less than ideal.

Having had a play with trying to implement it (as a result of the weekend's STOMP issues), I think it's going to take a lot of consumer code to handle re-ordering the messages effectively and accurately. I was struggling to completely eliminate missed sequences, and it wasn't clear whether that was due to issues with re-ordering on my side.
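For reference, the re-ordering and gap detection described above can be sketched as a small reorder buffer. This is a sketch under assumptions, not a tested consumer: it assumes every message exposes a single integer sequence number shared across both partitions, it ignores wrap-around, and `ReorderBuffer` and `max_pending` are hypothetical names:

```python
class ReorderBuffer:
    """Release messages in sequence order and record numbers that never arrive."""

    def __init__(self, first_seq: int, max_pending: int = 100):
        self.next_seq = first_seq       # next sequence number due for release
        self.max_pending = max_pending  # how many early messages to hold back
        self.pending = {}               # seq -> message, waiting for a gap to fill
        self.missing = []               # sequence numbers given up on

    def push(self, seq: int, message):
        """Accept one message; return the messages that can now be released."""
        if seq < self.next_seq:
            return []  # duplicate, or arrived after its slot was given up
        self.pending[seq] = message
        released = self._drain()
        while len(self.pending) > self.max_pending:
            # Too much is waiting behind a gap: record the gap and move on.
            oldest = min(self.pending)
            self.missing.extend(range(self.next_seq, oldest))
            self.next_seq = oldest
            released.extend(self._drain())
        return released

    def _drain(self):
        """Pop the contiguous run of messages starting at next_seq."""
        out = []
        while self.next_seq in self.pending:
            out.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return out


buf = ReorderBuffer(first_seq=1, max_pending=2)
for seq, msg in [(2, "b"), (1, "a"), (4, "d"), (5, "e"), (6, "f")]:
    print(seq, "->", buf.push(seq, msg))
print("missing:", buf.missing)
```

The size of `max_pending` is the trade-off: too small and a message that is merely late on the other partition gets reported as missing, too large and everything behind a genuine gap is delayed.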