Issue with Darwin Push Port via Rail Data Marketplace?

Ian Morrison

Mar 9, 2026, 2:42:37 PM
to A gathering place for the Open Rail Data community
Hi everyone,

Is it just me not getting any messages delivered by the Darwin Push Port via Rail Data Marketplace? I'm subscribed to the XML feed. It failed at some point this morning, and has been down all day for me. I've checked the username/password etc., and all are correct. The program seems to create the Consumer object fine, but it sits forever waiting on a message to arrive, which never does. 

Regards,
Ian

David Wheatley

Mar 9, 2026, 2:46:05 PM
to openrail...@googlegroups.com
Nothing since 10:10:32 for me either.

--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/openraildata-talk/d77cd780-1ea4-4acd-8ed0-6543bbac6302n%40googlegroups.com.

Peter Hicks

Mar 9, 2026, 2:47:13 PM
to openrail...@googlegroups.com
Hi Ian

On Monday, 9 March 2026 at 18:42, Ian Morrison <ianmorr...@gmail.com> wrote:

Is it just me not getting any messages delivered by the Darwin Push Port via Rail Data Marketplace? I'm subscribed to the XML feed. It failed at some point this morning, and has been down all day for me. I've checked the username/password etc., and all are correct. The program seems to create the Consumer object fine, but it sits forever waiting on a message to arrive, which never does.

It's not just you - I've logged on to the RDM and clicked 'Preview' on the data product page (https://raildata.org.uk/dashboard/dataProduct/P-3f10bf96-d8e8-4041-aa5e-d75d82c45c4e/pubSub), and I'm not seeing any data.

Since there's no heartbeat to identify whether the problem is within the RDM or upstream, you're best off opening a support ticket on the RDM.


Peter

Ian Morrison

Mar 9, 2026, 2:48:07 PM
to A gathering place for the Open Rail Data community
Thanks Peter, will do so now.

David Wheatley

Mar 9, 2026, 2:48:37 PM
to openrail...@googlegroups.com
I have opened RDM0003126. I'll reply if I hear anything.

David

Ian Morrison

Mar 9, 2026, 2:50:56 PM
to A gathering place for the Open Rail Data community
I've also just opened a ticket, RDM0003127. Hopefully it will get sorted soon.

Alex Barber

Mar 10, 2026, 5:14:26 AM
to openrail...@googlegroups.com
Has RDM responded at all? It's concerning that this has been down for so long.

Regards,

AleBarber

Alex Barber

Mar 10, 2026, 5:33:54 AM
to openrail...@googlegroups.com
And just as I sent that, we seem to be receiving data.


AleBarber

goo...@notacrab.net

Mar 10, 2026, 11:21:22 AM
to A gathering place for the Open Rail Data community
Hi,

I got a response to my ticket asking when it'd be up, but they didn't answer my other questions about what kind of monitoring / on-call provisions they have :(

Best,

Jez Smith

Mar 10, 2026, 11:52:50 AM
to A gathering place for the Open Rail Data community
Afternoon all,

RDM picked up the ticket at 8.50am this morning and the issue was resolved around 9.30.  Only P1 incidents (where the whole RDM platform is down) are dealt with out of hours, so I would encourage everyone to raise an incident on RDM when you identify an issue.  We don't monitor this wiki for problems.

Right now, we are trying to get to the bottom of the root cause, to see if there is anything to do there.  We are also looking at ways in which we can monitor data feeds from the publisher.  This is easier with APIs where we can set a threshold, but trickier for pub/sub.  If we can identify a reliable way, we will implement it.

...but belt and braces; if you identify a problem, please raise an incident on RDM.  We would rather have 2 incidents for the same thing than none.

Cheers

Jez

David Wheatley

Mar 10, 2026, 12:06:00 PM
to openrail...@googlegroups.com
Hi Jez,

> Right now, we are trying to get to the bottom of the root cause, to see if there is anything to do there.  We are also looking at ways in which we can monitor data feeds from the publisher.  This is easier with APIs where we can set a threshold, but trickier for pub/sub.  If we can identify a reliable way, we will implement it.

Darwin has heartbeat messages on the Pub/Sub feed every 60 seconds as per the Push Port spec, so in this case downtime monitoring could be as simple as detecting if no messages are received for more than a couple of minutes.
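
As a rough illustration of that check (a minimal sketch, not RDM's or Darwin's actual monitoring): record when each message arrives and treat the feed as stale once nothing has been seen for a chosen threshold. The class name `FeedWatchdog`, the 120-second default and the injectable clock are assumptions made for this example; only the 60-second heartbeat interval comes from the description above.

```python
import time
from typing import Callable


class FeedWatchdog:
    """Flags a feed as stale when no message has arrived within max_silence seconds."""

    def __init__(self, max_silence: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # 120 s is an assumed threshold: roughly two missed 60-second heartbeats.
        self.max_silence = max_silence
        self._clock = clock
        # Treat start-up as the first "message" so a fresh watchdog is not stale.
        self._last_seen = clock()

    def record_message(self) -> None:
        """Call for every message received, heartbeats included."""
        self._last_seen = self._clock()

    def silence(self) -> float:
        """Seconds elapsed since the last recorded message."""
        return self._clock() - self._last_seen

    def is_stale(self) -> bool:
        """True once the feed has been silent for longer than max_silence."""
        return self.silence() > self.max_silence
```

In use, the consumer loop would call `record_message()` for each message, and a timer or separate thread would poll `is_stale()` and raise an alert (or page someone) when it returns true.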

Cheers,
David

Tom Cairns

Mar 10, 2026, 12:22:50 PM
to openrail...@googlegroups.com
I’m getting real déjà vu here for those of us who went through the worst bits of the NROD outages.

It is really concerning to learn that RDM doesn’t individually monitor what is perhaps one of the most critical systems in terms of downstream usage. As far as I suspect most people here are concerned, Darwin is an RDG product on an RDG platform, and the intricacies of how it gets there are entirely unimportant to those who use it.

I think this raises very serious questions about whether RDM can be relied upon for production use if it was down for nearly 24 hours.

Tom

Seb Dazeley

Mar 10, 2026, 12:37:49 PM
to openrail...@googlegroups.com
With the National Rail Data Portal closing down soon, it's especially important that the Darwin feed on RDM can be relied upon because it will be the only way non-industry users can access it. 

David Wheatley

Mar 21, 2026, 6:13:30 AM
to openrail...@googlegroups.com
Is anyone else not receiving messages from the push port again, or is it just me?

David

Peter Hicks

Mar 21, 2026, 6:21:16 AM
to openrail...@googlegroups.com
Hi David

On Saturday, 21 March 2026 at 10:13, 'David Wheatley' via A gathering place for the Open Rail Data community <openrail...@googlegroups.com> wrote:

Is anyone else not receiving messages from the push port again, or is it just me?

Push Port via RDM?  No - the last message that was queued was timestamped 08:47:32:

image.png

Push Port direct?  Yup, working fine.  Push Port via NRDP?  Fine too.

I find it really difficult to champion use of the RDM when outages like this happen.  Combined with the difficulty in obtaining and processing a snapshot, I wouldn't be happy running anything on my side off the Push Port data via RDM.

Just to put some context on it, a direct Push Port setup is just a case of consuming messages from an ActiveMQ server.  Very little stateful data there and something like a 5-10 minute expiry on messages.  There are lots of strategies for high availability in this kind of scenario.


Peter

David Wheatley

Mar 21, 2026, 6:25:52 AM
to openrail...@googlegroups.com
I have raised a ticket, RDM0003159.

Adam Williams

Mar 21, 2026, 8:55:04 AM
to A gathering place for the Open Rail Data community
I say this in a personal capacity:

>  I think this raises very serious questions about whether RDM can be relied upon for production use if it was down for nearly 24 hours.

I think it's unfortunately looking abundantly clear now it's entirely unsuitable for any sort of production use-case.

The response to incidents shouldn't depend on RDM tickets being raised (though I'm glad to hear folks are sparing time out of their day, and effectively doing free QA for Rail Delivery Group, to raise tickets that flag up problems). Someone should be paged as soon as it stops publishing messages, which, as David has already pointed out, is a trivial health check to put in place from a technical perspective. These are basic measures that a two-person tech start-up can get right.

This particular feed should probably be considered one of the flagship data products on RDM that sets the standard for other products. Things need to change both in terms of technical architecture and the approach to service management if the marketplace can remotely be considered fit-for-purpose at doing its job of delivering data - rather than just being an increasingly flaky single point of failure that sits in front of other systems that (for the most part) do work.

I was critical when it came to the GEMINI issues, but it's become clearer that that technical architecture was yet another symptom of a lot of investment being directed to some places, and not enough to the places that actually matter when it comes to delivering a reliable, fit-for-purpose technical service.

> Just to put some context on it, a direct Push Port setup is just a case of consuming messages from an ActiveMQ server.  Very little stateful data there and something like a 5-10 minute expiry on messages.  There are lots of strategies for high availability in this kind of scenario.

Yes; I hope RDG will now look to engage with the people that can advise them on how to approach this properly (and probably could've advised on the problems with e.g. the broken partitioning approach from the start, and how to engage with the community).

Peter Hicks

Mar 21, 2026, 9:06:23 AM
to openrail...@googlegroups.com
On Saturday, 21 March 2026 at 12:55, Adam Williams <adam.lu....@gmail.com> wrote:

Yes; I hope RDG will now look to engage with the people that can advise them on how to approach this properly (and probably could've advised on the problems with e.g. the broken partitioning approach from the start, and how to engage with the community).

Given my long and proven history with Open Data and 'getting things working properly for everyone', I'd like to think I'd fit in to this paid role perfectly.  However, there is apparently no budget available, so I just have to sit by and let things crash and burn.


Peter

jezinwo...@gmail.com

Mar 21, 2026, 11:55:46 AM
to openrail...@googlegroups.com, openrail...@googlegroups.com
David,

The feed should have been up and working again from around 12.40.

Hope that you are now seeing messages coming through.

The team will do a root cause on Monday.

Cheers

Jez

Chris Bagnall

Mar 23, 2026, 5:51:02 AM
to A gathering place for the Open Rail Data community
Hi all,

We also had the same issue on our test system.  Unfortunately, I did not check until around 9pm on Saturday (I was actually checking the data I am gathering to verify the ordering of messages).  This logging comes from the C# Confluent Kafka library:

2026-03-21 02:16:23.966 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-XML [0]: desired partition is no longer available (Local: Unknown partition)
2026-03-21 02:16:23.967 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-XML [1]: desired partition is no longer available (Local: Unknown partition)
2026-03-21 02:16:23.968 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-XML [0]: topic does not exist (Broker: Unknown topic or partition)
2026-03-21 02:16:23.969 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-XML [1]: topic does not exist (Broker: Unknown topic or partition)
2026-03-21 02:16:36.574 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: prod-1010-Darwin-Train-Information-Push-Port-IIII2_0-XML [1]: desired partition is not available (Local: Unknown partition)

After this, there is no logging (or messages received) until I restarted the interface, then it worked normally.
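
One way to avoid a consumer sitting silent after errors like these is to recognise them and recreate the consumer automatically. The following is only a sketch working on the log text shown above: the function name `restart_reason` is hypothetical, and the choice of which reasons justify a restart is an assumption, since the underlying cause has not been identified. A real client would more likely react in its error callback than parse log lines.

```python
import re
from typing import Optional

# Matches the "(Local: ...)" or "(Broker: ...)" suffix seen in the log lines above.
_ERROR_SUFFIX = re.compile(r"\((Local|Broker): ([^)]+)\)\s*$")

# Assumption: these reasons mean the subscription is gone and will not
# recover without recreating the consumer.
_RESTART_REASONS = {"Unknown partition", "Unknown topic or partition"}


def restart_reason(log_line: str) -> Optional[str]:
    """Return the error reason if the line suggests the consumer should be recreated."""
    match = _ERROR_SUFFIX.search(log_line)
    if match and match.group(2) in _RESTART_REASONS:
        return match.group(2)
    return None
```

Lines such as the later "Disconnected" and "Failed to resolve" errors do not match, which fits the observation below that the connection recovered from those without intervention.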

I note that the interface quite happily maintained the connection later that evening despite more problems.  Log sample:

2026-03-21 22:59:17.998 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: GroupCoordinator: b1-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092: Disconnected (after 5725186ms in state UP)
2026-03-21 22:59:27.479 +00:00 [ERR] [thrd:sasl_ssl://b3-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092/]: sasl_ssl://b3-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092/3: Failed to resolve 'b3-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092': No such host is known.  (after 11054ms in state CONNECT)
2026-03-21 22:59:27.481 +00:00 [ERR] [thrd:sasl_ssl://b2-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092/]: sasl_ssl://b2-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092/2: Failed to resolve 'b2-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092': No such host is known.  (after 11054ms in state CONNECT)
2026-03-21 22:59:27.482 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: sasl_ssl://b3-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092/3: Failed to resolve 'b3-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092': No such host is known.  (after 11054ms in state CONNECT)
2026-03-21 22:59:27.483 +00:00 [ERR] [thrd:app]: rdkafka#consumer-2: sasl_ssl://b2-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092/2: Failed to resolve 'b2-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092': No such host is known.  (after 11054ms in state CONNECT)
2026-03-21 22:59:27.644 +00:00 [ERR] [thrd:GroupCoordinator]: GroupCoordinator: b1-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092: Failed to resolve 'b1-pkc-z3p1v0.europe-west2.gcp.confluent.cloud:9092': No such host is known.  (after 9645ms in state CONNECT)

I have suspicions as to what the problem is, but I have not yet had time to investigate.

Cheers,
Chris