Am I counting trains correctly?


Jack Murphy

Sep 11, 2025, 9:55:29 AM
to A gathering place for the Open Rail Data community
Hi everyone,

I am using the Train Movement feed to count the number of delayed trains (per train company) over the course of a day. There are two flaws with my approach currently, one of which I'd like to address here.

The number of delayed trains just seems implausibly high. It has recorded 130 delays for Northern Rail in 1 and a half hours. 

My methodology is to check the train ID, and if the train ID has not been seen before, and if the variation status is "LATE", then I add it to a table where each row is a different train company. 

Here is my code in case it is useful:

            if train_entry["train_id"] not in registered_journeys and train_entry["variation_status"] == "LATE":
                registered_journeys.append(train_entry["train_id"])
                train_entry["toc_id"] = company_lookup[train_entry["toc_id"]]

                # this is also where the leaderboard is updated, alongside updating the allow_message variable
                update_leaderboard(train_entry, self.delayed_train_leaderboard)
                allow_message = True

and

import pprint as pp  # assumption: pp refers to the standard pprint module


def update_leaderboard(train, delayed_train_leaderboard):
    """Add one to the late-train count for this train's operating company."""
    leaderboard = delayed_train_leaderboard["leaderboard"]
    toc_id = train["toc_id"]

    # dict.get covers both the first sighting of a company and later increments
    leaderboard[toc_id] = leaderboard.get(toc_id, 0) + 1
    pp.pprint(leaderboard)


So does anybody know what's going wrong here? Does 'train ID' change for the same train at some point?? Is one of the other fields better for me to use, like  "train_service_code"? The same journeys must in some way be recounted, perhaps many times.

I would appreciate any advice on this.

All the best,
Jack Murphy

Matthew Burdett

Sep 11, 2025, 10:41:53 AM
to openrail...@googlegroups.com
If I were doing it, my logic would be the same as yours: look up distinct TRUST IDs with late movements. Although I'd personally set my lateness tolerance to maybe 3 minutes or more. I wouldn't care if a train was 60 seconds late, as I don't know how reliable the report is. If it's GPS from Darwin then fine, maybe, but if it's TD then I don't know. For my area there's no TD between Newhaven and Seaford, so I'd be relying on berth offsets from SMART, and I've had some speed restrictions lately at Tide Mills, so I wouldn't trust it to be completely accurate for lateness reporting.
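
A minimal sketch of the tolerance idea, assuming each movement message carries a `timetable_variation` field (minutes of variation, often delivered as a string) alongside `variation_status`. The helper name `is_late_beyond` and the 3-minute default are illustrative and not part of the feed.

```python
def is_late_beyond(movement, tolerance_minutes=3):
    """Return True if the movement is LATE by at least tolerance_minutes.

    Assumes `timetable_variation` holds whole minutes (possibly as a string);
    missing or non-numeric values are treated as zero.
    """
    if movement.get("variation_status") != "LATE":
        return False
    try:
        minutes = int(movement.get("timetable_variation", 0))
    except (TypeError, ValueError):
        minutes = 0
    return minutes >= tolerance_minutes


# Made-up messages: one minute late is ignored, five minutes late is counted.
print(is_late_beyond({"variation_status": "LATE", "timetable_variation": "1"}))
print(is_late_beyond({"variation_status": "LATE", "timetable_variation": "5"}))
```

Swapping the `variation_status == "LATE"` check for a helper like this would ignore the marginal one- and two-minute reports.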

--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/openraildata-talk/d6d13647-c1fe-4665-9bd7-5609e5c4e252n%40googlegroups.com.

Peter Hicks

Sep 11, 2025, 10:44:09 AM
to openrail...@googlegroups.com
Hi Jack

On Thursday, 11 September 2025 at 14:55, Jack Murphy <jackfran...@outlook.com> wrote:

> I am using the Train Movement feed to count the number of delayed trains (per train company) over the course of a day. There are two flaws with my approach currently, one of which I'd like to address here.
>
> The number of delayed trains just seems implausibly high. It has recorded 130 delays for Northern Rail in 1 and a half hours.
>
> My methodology is to check the train ID, and if the train ID has not been seen before, and if the variation status is "LATE", then I add it to a table where each row is a different train company.

There are two problems here.

First, you are conflating 'late' and 'delayed'.  Briefly, a delay is a change in lateness, and lateness is the number of minutes behind a scheduled time that an event occurs.  They are very different things - a train will incur a delay and then run late.
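
To make the distinction concrete, here is a small sketch with made-up numbers: lateness is measured at each reporting point, while delay is the change in lateness between consecutive points. The function name and the sample values are hypothetical.

```python
def delays_from_lateness(lateness_minutes):
    """Return the change in lateness between consecutive reports.

    The train is assumed to start on time, so the first delay equals
    the first lateness value.
    """
    delays = []
    previous = 0
    for current in lateness_minutes:
        delays.append(current - previous)
        previous = current
    return delays


# Hypothetical lateness (minutes) at four successive reporting points.
lateness = [0, 5, 5, 3]
print(delays_from_lateness(lateness))  # [0, 5, 0, -2]
```

In this example the train incurs a single 5-minute delay but is reported late at three points, so counting LATE reports and counting delays give different answers.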

Second, you haven't said what your train_id value is.  If it's four characters in the format NANN (number, alpha, number, number), then you're going to find there are numerous trains with the same identifier running on a particular day.  But if it's the 10-character TRUST train identity, you're on the right path.
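
A quick way to check which kind of identifier you have is to look at its shape. This is only a sketch: it tests the NANN pattern and the 10-character length described above and nothing more, and the function name and sample values are made up for illustration.

```python
import re

# NANN: number, alpha, number, number (for example "2H04").
HEADCODE_PATTERN = re.compile(r"[0-9][A-Z][0-9]{2}")


def classify_train_id(train_id):
    """Classify an identifier by shape only: 'headcode', 'trust' or 'unknown'."""
    if HEADCODE_PATTERN.fullmatch(train_id):
        return "headcode"
    if len(train_id) == 10:
        return "trust"
    return "unknown"


# Made-up sample values.
print(classify_train_id("2H04"))
print(classify_train_id("512H04MI11"))
```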

I'd recommend you write some unit tests, providing your code with some specific inputs and checking its outputs.
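
As a starting point, a test along these lines feeds the counting logic a few hand-written messages and checks the totals. It is a sketch only: `count_late_trains` is a simplified stand-in for your code rather than your actual implementation, and the train IDs and operator codes are invented.

```python
import unittest


def count_late_trains(movements):
    """Count each train_id at most once, per operator, the first time it is LATE."""
    seen = set()
    counts = {}
    for movement in movements:
        train_id = movement["train_id"]
        if movement["variation_status"] == "LATE" and train_id not in seen:
            seen.add(train_id)
            toc_id = movement["toc_id"]
            counts[toc_id] = counts.get(toc_id, 0) + 1
    return counts


class CountLateTrainsTest(unittest.TestCase):
    def test_same_train_reported_late_twice_counts_once(self):
        movements = [
            {"train_id": "512H04MI11", "toc_id": "TOC_A", "variation_status": "LATE"},
            {"train_id": "512H04MI11", "toc_id": "TOC_A", "variation_status": "LATE"},
        ]
        self.assertEqual(count_late_trains(movements), {"TOC_A": 1})

    def test_on_time_train_is_not_counted(self):
        movements = [
            {"train_id": "511B22MI11", "toc_id": "TOC_B", "variation_status": "ON TIME"},
        ]
        self.assertEqual(count_late_trains(movements), {})
```

Saved in a file such as `test_counts.py` (a hypothetical name), this runs with `python -m unittest test_counts`.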

> So does anybody know what's going wrong here? Does 'train ID' change for the same train at some point?? Is one of the other fields better for me to use, like "train_service_code"? The same journeys must in some way be recounted, perhaps many times.

Train IDs are assigned by TRUST and, for passenger trains, do not change mid-journey.  For non-passenger movements, primarily those run by freight operators, the TRUST identifier may change, but only the train class, and only where it makes sense.

Train service codes apply to groups of services, and a train may run under different service codes during its journey.  It doesn't sound like this is the unique identifier you need.

A final question - is this a programming exercise to 'do something' (which is fine - better than working entirely in the theoretical world!), or are you looking to design a metric to show something about performance?


Peter

Evelyn Snow

Sep 11, 2025, 10:47:59 AM
to openrail...@googlegroups.com
Hi Jack,

If the data is from roughly the time of writing, this seems extremely plausible to me. I was on my
way home from work in Huddersfield - there were delays due to a thunderstorm, and the signalling
got hit by lightning!

Evelyn

Jack Murphy

Sep 11, 2025, 12:00:17 PM
to A gathering place for the Open Rail Data community
Okay, my most basic response here is to ask how to quote text and have it turn up purple and indented :P That would be useful in responding to all of you!

Although I can respond to you now, Evelyn. That's very interesting to know. The rail network is also pretty big... But somehow my instinct is that some trains are being recounted multiple times.

Tom Cairns

Sep 11, 2025, 2:57:46 PM
to openrail...@googlegroups.com

Given that Northern run around 2500 services a day, I'd say 150 Northern services being delayed in 90 minutes is quite possible.

 

Tom

 


Jack Murphy

Sep 19, 2025, 7:07:01 AM
to A gathering place for the Open Rail Data community
Hi everyone,

May I ask again how to quote a message in purple and indented as Peter has here?

All the best,
Jack Murphy

Peter Hicks

Sep 19, 2025, 7:15:30 AM
to openrail...@googlegroups.com
Hi Jack

On Friday, 19 September 2025 at 12:07, Jack Murphy <jackfran...@outlook.com> wrote:

> May I ask again how to quote a message in purple and indented as Peter has here?

That's entirely down to your client (or MUA if we're talking old-school tech).  Emails can be in plain text, or they can include MIME headers and include other data types.  Outlook used to enjoy creating a text-only MIME part and a verbose HTML MIME part to ensure maximum message sizes(!)
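
For illustration, this is roughly what a client does when it sends both a plain-text part and an HTML part. The sketch uses Python's standard `email` package; the subject and body strings are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Re: Am I counting trains correctly?"

# The first call sets a text/plain body; adding an alternative converts
# the message to multipart/alternative with a second text/html part.
msg.set_content("Plain-text body")
msg.add_alternative("<p>HTML body</p>", subtype="html")

print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type())
```

A receiving client picks whichever part it prefers to display, and quoting styles such as coloured, indented text live in the HTML part.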

Peter

Jack Murphy

Sep 19, 2025, 7:40:29 AM
to openrail...@googlegroups.com
> That's entirely down to your client (or MUA if we're talking old-school tech).  Emails can be in plain text, or they can include MIME headers and include other data types.  Outlook used to enjoy creating a text-only MIME part and a verbose HTML MIME part to ensure maximum message sizes(!)

Oh yeah! The functionality is just missing when answering within Google Groups.

Thanks for clarifying.

My application is having some weird new issues today which I'm currently debugging. I'll ask a question if it seems relevant to this group. You know when you're convinced that you're 90% done with a coding project but then the final 10% extends for miles? 🙃



From: 'Peter Hicks' via A gathering place for the Open Rail Data community <openrail...@googlegroups.com>
Sent: Friday, September 19, 2025 12:15
To: openrail...@googlegroups.com <openrail...@googlegroups.com>
Subject: Re: [openraildata-talk] Am I counting trains correctly?