Overall approach to consuming Darwin Push Port?


Peter Wood

Jun 25, 2025, 1:02:59 PM
to A gathering place for the Open Rail Data community
Good Evening Folks,

I've been "tinkering" with the schedule feeds from TRUST (is that technically correct?), consuming the TOC snapshot and then the daily updates. I've also been happily consuming VSTP and movements over STOMP. I've been able to get a resolved schedule, see the planned times, see the actual times, capture cancellations/reinstatements, etc. There are a few edge cases I need to iron out, but in general it's good.

I've been ignoring Darwin up until now, but there are various pieces of information in Darwin I want: formations, loading, platform changes, estimates. I've got the PushPort feed from Kafka into my own message streaming solution (NATS), so that side is all good.

My question is around the necessity of consuming both the Darwin TimeTable files and the PP feed, with the view I've already materialised the schedule from TRUST.

Can I "get away with" just consuming the contents of the Kafka topic? Or will I be missing critical data if I don't consume the file-based snapshots as well?

My naïve view of the snapshots suggests there are no details in there that I haven't already materialised from the ITPS files and feed. However, the one thing that does seem to necessitate it is the mapping from the WTT UID/Date/Signalling ID combination to a RID?

Could someone just give me a nod if I've got the right view of the data?

A huge thanks to everyone who's contributed to the google group in the past, it's been hugely useful to read through and gain more understanding.

Incidentally, does anyone know what partition key is being used on the PP Kafka topics? I noticed that the topic has two partitions, and as such message times shuffle around now and again, depending on where you are in each partition (there is no global ordering across partitions in Kafka topics). Are we safe to assume messages about a RID (for example) will always be in the same partition, and thus there's no ordering concern?

Peter.


Peter Hicks

Jun 25, 2025, 1:14:27 PM
to openrail...@googlegroups.com
Hi Peter

On Wednesday, 25 June 2025 at 18:03, Peter Wood <pe...@alastria.net> wrote:

I've been ignoring Darwin up until now, but there are various pieces of information in Darwin I want: formations, loading, platform changes, estimates. I've got the PushPort feed from Kafka into my own message streaming solution (NATS), so that side is all good.

My question is around the necessity of consuming both the Darwin TimeTable files and the PP feed, with the view I've already materialised the schedule from TRUST.

Can I "get away with" just consuming the contents of the Kafka topic? Or will I be missing critical data if I don't consume the file-based snapshots as well?

TRUST and Darwin are different systems, with TRUST being a historical view of 'what has just happened' and a coarse view of 'what might happen'.  Darwin is a system more concerned with what is about to happen, and can reflect in quite some detail things that TRUST cannot, such as:

  1. Trains terminating short, prior to the train arriving at the location where the train terminates
  2. Locations where a train is planned not to stop, prior to the train not stopping there (and I know there are sites which work this out historically, but that's no good if you're using it to catch a train!)
  3. Additional stops

Rather than trying to merge two dissimilar worlds, I'd recommend creating a "TRUST view" and a "Darwin view".  Use TRUST for history, Darwin for now-and-future.  Avoid the temptation to try to resolve the differences between the two, because they're different beasts.

Incidentally, does anyone know what partition key is being used on the PP Kafka topics? I noticed that the topic has two partitions, and as such message times shuffle around now and again, depending on where you are in each partition (there is no global ordering across partitions in Kafka topics). Are we safe to assume messages about a RID (for example) will always be in the same partition, and thus there's no ordering concern?

I don't actually know the answer to this - but I know exactly who to ask!


Peter

RailAleFan

Jun 26, 2025, 12:03:36 PM
to A gathering place for the Open Rail Data community
Also interested in the answer to this - just starting out with RDM/PushPort and messages relating to the same RID are being received on both partitions:

RdKafka\Message Object
(
    [err] => 0
    [topic_name] => prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-JSON
    [timestamp] => 1750953015301
    [partition] => 0
    [payload] => stdClass Object
        (
            [ts] => 2025-06-26T16:50:15.3018776+01:00
            [version] => 18.0
            [uR] => stdClass Object
                (
                    [updateOrigin] => TD
                    [TS] => stdClass Object
                        (
                            [rid] => 202506268019489
                            [uid] => P19489
                            [ssd] => 2025-06-26
                        )
                )
        )
    [len] => 3272
    [key] => {"messageID":"ID:liv1-dwnpp102-49752-638840716752370682-1:22:1:1:59026703"}
    [offset] => 123565974
    [headers] => Array
        (
        )
    [opaque] =>
)

RdKafka\Message Object
(
    [err] => 0
    [topic_name] => prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-JSON
    [timestamp] => 1750953015561
    [partition] => 1
    [payload] => stdClass Object
        (
            [ts] => 2025-06-26T16:50:15.5608327+01:00
            [version] => 18.0
            [uR] => stdClass Object
                (
                    [updateOrigin] => TD
                    [TS] => stdClass Object
                        (
                            [rid] => 202506268019489
                            [uid] => P19489
                            [ssd] => 2025-06-26
                        )
                )
        )
    [len] => 1160
    [key] => {"messageID":"ID:liv1-dwnpp102-49752-638840716752370682-1:22:1:1:59026709"}
    [offset] => 123568412
    [headers] => Array
        (
        )
    [opaque] =>
)
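
A quick way to check how widespread this is would be to group consumed messages by RID and list the partitions each RID appears on. The following is a minimal sketch in Python rather than PHP, assuming messages have already been decoded into dictionaries shaped like the dumps above; `rids_in`, `partitions_by_rid` and `split_rids` are illustrative names, and treating `TS` as either a single object or a list is an assumption.

```python
from collections import defaultdict


def rids_in(payload):
    """Yield every RID found under uR.TS.

    Assumption: TS may be a single object (as in the dumps above) or a list.
    """
    ts = payload.get("uR", {}).get("TS", [])
    if isinstance(ts, dict):
        ts = [ts]
    for entry in ts:
        if "rid" in entry:
            yield entry["rid"]


def partitions_by_rid(messages):
    """Map each RID to the set of partitions it has been seen on."""
    seen = defaultdict(set)
    for message in messages:
        for rid in rids_in(message["payload"]):
            seen[rid].add(message["partition"])
    return seen


def split_rids(messages):
    """Return the RIDs that appeared on more than one partition."""
    by_rid = partitions_by_rid(messages)
    return sorted(rid for rid, parts in by_rid.items() if len(parts) > 1)


# The two messages from the dumps above, reduced to the fields used here.
ts_block = {"rid": "202506268019489", "uid": "P19489", "ssd": "2025-06-26"}
sample = [
    {"partition": 0, "offset": 123565974,
     "payload": {"uR": {"updateOrigin": "TD", "TS": dict(ts_block)}}},
    {"partition": 1, "offset": 123568412,
     "payload": {"uR": {"updateOrigin": "TD", "TS": dict(ts_block)}}},
]

print(split_rids(sample))
```

Note that the `key` in both dumps is a per-message `messageID`, which differs between the two messages, so a key-based partitioner would not be expected to keep a RID on one partition.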


Cheers!

Peter Wood

Jun 26, 2025, 12:15:48 PM
to openrail...@googlegroups.com
Hello Peter et al,

Rather than trying to merge two dissimilar worlds, I'd recommend creating a "TRUST view" and a "Darwin view".  Use TRUST for history, Darwin for now-and-future.  Avoid the temptation to try to resolve the differences between the two, because they're different beasts.

Thanks for the overall recommendation, the temptation is there for sure. On the assumption that I build two views for actual "what's happened now", using Darwin for passenger, TRUST for other - do I really need to ingest the Darwin TimeTable files, or are we lucky enough that when trains are activated we get a full copy of their schedule over PP - or is that not guaranteed/is often missed?
 
Incidentally, does anyone know what partition key is being used on the PP Kafka topics?
I don't actually know the answer to this - but I know exactly who to ask!

RailAleFan's message has answered the practical side of it at least - sounds like I need to consume each partition separately and reorder on my side into a well-ordered stream. Shouldn't be too hard, a bit of a pain though - though I was going to pre-process them into different subjects anyway... so... :shrug:.
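
One possible shape for that client-side reordering is a small hold-back buffer: messages from both partitions go into a min-heap and are only released once something sufficiently newer has been seen. This is a minimal sketch, assuming the Kafka message timestamp in milliseconds (the `timestamp` field in the dumps earlier in the thread) is a good enough ordering key, which nobody has confirmed yet. `Msg`, `ReorderBuffer` and `hold_ms` are illustrative names, not part of any library.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Msg:
    # Ordering uses timestamp first, then partition and offset as tie-breakers.
    timestamp_ms: int
    partition: int
    offset: int
    payload: dict = field(compare=False, default_factory=dict)


class ReorderBuffer:
    """Hold messages back for hold_ms, then release them in timestamp order."""

    def __init__(self, hold_ms=2000):
        self.hold_ms = hold_ms
        self._heap = []
        self._newest = 0

    def push(self, msg):
        """Add a message; return any messages that are now safe to release."""
        heapq.heappush(self._heap, msg)
        self._newest = max(self._newest, msg.timestamp_ms)
        cutoff = self._newest - self.hold_ms
        released = []
        while self._heap and self._heap[0].timestamp_ms <= cutoff:
            released.append(heapq.heappop(self._heap))
        return released

    def flush(self):
        """Release everything still buffered, in order (e.g. on shutdown)."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap))
        return released


# Interleaved arrivals from two partitions, deliberately out of time order.
arrivals = [Msg(1000, 0, 10), Msg(1500, 1, 20), Msg(1200, 0, 11), Msg(3300, 1, 21)]
buffer = ReorderBuffer(hold_ms=2000)
ordered = []
for msg in arrivals:
    ordered.extend(buffer.push(msg))
ordered.extend(buffer.flush())
print([m.timestamp_ms for m in ordered])
```

The trade-off is latency against safety: a message arriving more than `hold_ms` behind the newest one seen will still come out late, so the window needs to cover the worst lag observed between the two partitions.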
 
Thanks for your advice!

P

Peter Hicks

Jun 26, 2025, 12:17:43 PM
to openrail...@googlegroups.com

On Thursday, 26 June 2025 at 17:03, RailAleFan <raila...@gmail.com> wrote:

Also interested in the answer to this - just starting out with RDM/PushPort and messages relating to the same RID are being received on both partitions

Just to set expectations, due to summer having arrived and people taking holiday, it might be a couple of weeks before I find out the answer.

It's possible the partition key isn't being set properly and is doing some kind of round-robin, but rather than speculate, let's wait for the concrete answer.


Peter

Peter Hicks

Jun 26, 2025, 12:29:22 PM
to openrail...@googlegroups.com
Hi Peter

On Thursday, 26 June 2025 at 17:15, Peter Wood <pe...@alastria.net> wrote:

Thanks for the overall recommendation, the temptation is there for sure. On the assumption that I build two views for actual "what's happened now", using Darwin for passenger, TRUST for other - do I really need to ingest the Darwin TimeTable files, or are we lucky enough that when trains are activated we get a full copy of their schedule over PP - or is that not guaranteed/is often missed?

I'd always ingest the Darwin timetable, as my use-cases rely on having details of everything, including buses and non-running train services.  But if you just want a subset of data, you might be able to get away with it.  I'm not sure about the licence conditions and whether you'd still be compliant with them if you do this; that's a question RDG can answer directly.
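
For the narrower point raised at the start of the thread, the mapping from UID and date to RID, the Push Port messages shown earlier already carry `rid`, `uid` and `ssd` together. Below is a minimal sketch of building that lookup from the feed alone. It assumes payloads are decoded into dictionaries shaped like the dumps earlier in the thread, and it only covers services that have actually appeared in a message; `build_rid_index` is an illustrative name, and keying on `(uid, ssd)` without the signalling ID is an assumption.

```python
def build_rid_index(payloads):
    """Build a {(uid, ssd): rid} lookup from decoded Push Port payloads.

    Assumption: uR.TS is either a single object or a list of objects, each
    carrying rid, uid and ssd as in the dumps earlier in the thread.
    """
    index = {}
    for payload in payloads:
        ts = payload.get("uR", {}).get("TS", [])
        if isinstance(ts, dict):
            ts = [ts]
        for entry in ts:
            if all(key in entry for key in ("rid", "uid", "ssd")):
                index[(entry["uid"], entry["ssd"])] = entry["rid"]
    return index


# One payload reduced to the fields visible in the earlier dumps.
payloads = [
    {"uR": {"updateOrigin": "TD",
            "TS": {"rid": "202506268019489", "uid": "P19489", "ssd": "2025-06-26"}}},
]
rid_index = build_rid_index(payloads)
print(rid_index[("P19489", "2025-06-26")])
```

Services that never generate a message, such as the buses and non-running services mentioned above, would be missing from an index built this way.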


Peter

David Wheatley

Aug 21, 2025, 11:45:03 AM
to A gathering place for the Open Rail Data community
Hi Peter,

Did you ever hear back about RDM Push Port wrt the partitions?

Receiving messages about the same service (potentially) out-of-order due to this issue is currently blocking one of the projects I was working on.

Thanks,
David

Peter Hicks

Aug 21, 2025, 11:47:36 AM
to openrail...@googlegroups.com
Hi David

On Thursday, 21 August 2025 at 16:45, 'David Wheatley' via A gathering place for the Open Rail Data community <openrail...@googlegroups.com> wrote:

Did you ever hear back about RDM Push Port wrt the partitions?

Receiving messages about the same service (potentially) out-of-order due to this issue is currently blocking one of the projects I was working on.

Nothing back so far, and it has been a few weeks.  I'll give out some gentle nudges.


Peter

