Passenger Train Allocation Data Query


Marsh Lane

Oct 10, 2025, 6:45:41 PM
to A gathering place for the Open Rail Data community
Evening all,
Finally managed a look at the new feed today (well done once again to Peter and everyone involved, not only for getting the feed, but the amount of various data it contains).  I have it up and running with a Python connector to Kafka.

However, wading through some of the XML data, I wanted to check a couple of things.  In the TransportOperationalIdentification we have a <Core> field giving, for example, '1N64Y0215712'.  This does not appear in the FOI redacted v095 document.  I am assuming the first four characters are the headcode. Are characters 5-10 the TRUST Train UID (Y02157 in this case) and, if so, are the last two a reference to the original departure hour?  Assuming I am right, I note that 1N64 terminated at 13:50, but the message is date stamped at 18:35. Would that have been an update message?  If so, how should such messages be treated?  Should a late update replace previous messages just for that working (or the workings referenced)?  Any advice would be welcomed (although I acknowledge I could just be unwittingly making things more complicated than they need to be!).

Also, a small number of units are showing defect reports, with the text in the 'DefectDescription' field giving an overview of the fault.  Looking at Peter's FOI request document, the schema seems to show that the maximum length of that field in the XML is 50 characters.  Several of the 'DefectDescription' records are cut off part way through a word, so the RAVERS entry is obviously longer, and I wondered why the field in the XML output is shorter.

I assume there is a reason, or was 50 characters selected on the basis that it should cover the content, when actually it does not?  It's a minor issue in terms of the fields people are likely to use, but I thought I'd ask and flag it in case nobody had yet noticed.

Rich

Matthew Burdett

Oct 11, 2025, 12:57:05 AM
to openrail...@googlegroups.com
Hi Rich

I'm sure someone will give a more thorough reply in due course.

The <Core> field is, as you guessed, the headcode + Train UID + hour of departure.

Yes, there can be updates to allocations hours (even days) after the train has run. Likewise, some schedules may not get an allocation until after the train has run, although most are planned in beforehand.

You'll also find that the defects are very out of date and carry no timestamp, so there's no way of knowing what is new and what is old. It's up to you whether you want to show this data.

Kind regards 

--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/openraildata-talk/a2e8129d-1f6d-4727-a1a6-631dd0eb86c9n%40googlegroups.com.

Peter Hicks

Oct 11, 2025, 3:44:47 PM
to openrail...@googlegroups.com
Hi Rich

On Friday, 10 October 2025 at 23:45, Marsh Lane <marsh...@outlook.com> wrote:

However, wading through some of the XML data, I wanted to check a couple of things. In the TransportOperationalIdentification we have a <Core> field giving, for example, '1N64Y0215712'. This does not appear in the FOI redacted v095 document. I am assuming the first four characters are the headcode. Are characters 5-10 the TRUST Train UID (Y02157 in this case) and, if so, are the last two a reference to the original departure hour? Assuming I am right, I note that 1N64 terminated at 13:50, but the message is date stamped at 18:35. Would that have been an update message? If so, how should such messages be treated? Should a late update replace previous messages just for that working (or the workings referenced)? Any advice would be welcomed (although I acknowledge I could just be unwittingly making things more complicated than they need to be!).

You're correct in that it doesn't appear in the v095 schema file, because it's one of the elements imported from the 'GB TAF TAP TSI complete.xsd' file:

<xs:element name="Core">
  <xs:annotation>
    <xs:documentation>It is the main part of identifier and is determined by the company that creates  it.
This is a 5.3.1.GB Schema revision to replace pattern validation of 'Core' element to support values used in CS to LINX and LINX to TM usse cases.
Core element now has two xs:patterns and element value is valid if either pattern matches
1. LINX Train Object / Path Object [Concatenation of (Train ID/headcode + Train UID + WTT Hour of Origin)]  e.g 1Q04Z0012315 or 1Q04 0150615 or 4491U0140312, etc
2. To validate TRUST Train ID (with two leading '-' padding characters) e.g. --702W32MA27 or --334O33C326, etc
    </xs:documentation>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="12"/>
      <xs:maxLength value="12"/>
      <xs:pattern value="[0-9]{1}[0-9A-Z]{1}[0-9]{2}[ A-Z]{1}[0-9]{7}"/>
      <xs:pattern value="[\-]{2}[0-9]{3}[0-9A-Z]{1}[0-9]{2}[0-9A-Z]{2}[0-9]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
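
As a rough illustration of the two patterns above, here is a minimal Python sketch that splits a Core value into its parts. The names `LinxCore`, `parse_linx_core` and `parse_trust_id` are made up for this example, and the capture groups are an assumption: the schema only says "seven digits" after the letter or space, so reading them as five Train UID digits followed by a two-digit WTT hour of origin is inferred from the documentation text rather than enforced by the XSD.

```python
import re
from typing import NamedTuple, Optional


class LinxCore(NamedTuple):
    """Hypothetical container for the three parts of a LINX-style Core."""
    headcode: str     # characters 1-4, e.g. "1N64"
    train_uid: str    # characters 5-10, e.g. "Y02157" (may begin with a space)
    origin_hour: int  # characters 11-12, assumed to be the WTT hour of origin


# Python equivalents of the two xs:pattern values. XSD patterns are
# implicitly anchored, so fullmatch() is used below. The grouping is an
# assumption about how the 12 characters break down.
LINX_RE = re.compile(r"([0-9][0-9A-Z][0-9]{2})([ A-Z][0-9]{5})([0-9]{2})")
TRUST_RE = re.compile(r"--([0-9]{3}[0-9A-Z][0-9]{2}[0-9A-Z]{2}[0-9]{2})")


def parse_linx_core(core: str) -> Optional[LinxCore]:
    """Return the split Core if it matches pattern 1, otherwise None."""
    m = LINX_RE.fullmatch(core)
    if m is None:
        return None
    return LinxCore(m.group(1), m.group(2), int(m.group(3)))


def parse_trust_id(core: str) -> Optional[str]:
    """Return the 10-character TRUST Train ID if Core matches pattern 2."""
    m = TRUST_RE.fullmatch(core)
    return m.group(1) if m else None
```

With the example from the original question, `parse_linx_core("1N64Y0215712")` gives headcode `1N64`, Train UID `Y02157` and hour `12`. Note that the pattern does not restrict the hour to 00-23, so a consumer may want to check that separately.
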

XSDs are often not standalone - they reference other files.  For example, the 'complete.xsd' file imports (includes) taf_tap_codelists.xsd.  Whilst it sounds horribly cumbersome, it's just like importing libraries in Python - one library might import two others, which also import some more, etc.

It's possible for the allocation for a train to be updated after it's terminated.  This might happen if some kind of disruption occurred - I think we'd all rather the train ran than waited for somebody to update the resource allocations!  The reason an allocation may be updated retrospectively is likely because GEMINI updates mileages for resources, and the TOC will want to use other systems driven by GEMINI to advise when the resource is due for its mileage-based maintenance.

How you handle this is up to you - if you want a historical record of what GEMINI thinks happened, you could process the update.  But if you're building a system that is more real-time, then you might want to ignore the updated allocation because when the train's terminated, it's not much use.
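
To make the two options concrete, here is a minimal Python sketch of a store that always keeps a history of every allocation message, but can optionally refuse to overwrite the "current" allocation once the train has terminated. Everything in it is an assumption for illustration: the class name `AllocationStore`, the idea of keying on the Core value plus a running date, and the notion that your own code supplies a termination time (for example from TRUST movements). None of these come from the feed's schema.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, date]  # (Core value, running date): an assumed key


@dataclass
class AllocationStore:
    # True = "real-time" behaviour: ignore updates after termination.
    # False = "historical" behaviour: the latest message always wins.
    ignore_after_termination: bool = False
    latest: Dict[Key, List[str]] = field(default_factory=dict)
    history: List[Tuple[Key, datetime, List[str]]] = field(default_factory=list)

    def apply(
        self,
        core: str,
        run_date: date,
        received_at: datetime,
        resources: List[str],
        terminated_at: Optional[datetime] = None,
    ) -> bool:
        """Record a message; return True if it replaced the current allocation."""
        key = (core, run_date)
        # Every message is kept, so a historical view is always available.
        self.history.append((key, received_at, resources))
        late = terminated_at is not None and received_at > terminated_at
        if self.ignore_after_termination and late:
            return False
        # Simple replacement of whatever was held for this working.
        self.latest[key] = resources
        return True
```

Treating an update as a straight replacement for that working is itself a design choice here, not something the thread confirms the feed guarantees.
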

Also, a small number of units are showing defect reports, with the text in the 'DefectDescription' field giving an overview of the fault. Looking at Peter's FOI request document, the schema seems to show that the maximum length of that field in the XML is 50 characters. Several of the 'DefectDescription' records are cut off part way through a word, so the RAVERS entry is obviously longer, and I wondered why the field in the XML output is shorter.

It's not actually RAVERS any more, but R2, which is a combined Rolling Stock Library and RAVERS-like system.  As I understand it, operators interface their systems into R2, uploading a summary of defect data and a system-specific identifier for that defect.  I think it's likely the original mainframe GEMINI system (it's called GEMINI because it had a mainframe component and a 'micro' - or PC - component to interact with it, hence two systems) allowed 50 characters because that would fit nicely on an 80-column-wide screen, and that 50-character limit just stuck.  It's not just a case of changing a column in a database - GEMINI might not actually use a SQL database and could just store data in partitioned data sets (PDS).

I don't think the 50-character limit is a problem, because there's also a load more data in the maintainer's defect management system that won't be in GEMINI, and the first 50 characters of a defect is probably good enough to give an idea of what type of defect it is.
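
If you do display the descriptions, one small thing a consumer could do is mark text that fills the whole field as possibly cut short. The sketch below is only a heuristic under the assumption that the limit is 50 characters, as described above. The names `MAX_DEFECT_LEN` and `possibly_truncated` are invented for the example, and a description that happens to be exactly 50 characters long will be flagged as well.

```python
MAX_DEFECT_LEN = 50  # assumed field limit, per the schema discussion above


def possibly_truncated(description: str) -> bool:
    """Heuristic: text that fills the whole field may have been cut off."""
    return len(description) >= MAX_DEFECT_LEN


def display_text(description: str) -> str:
    """Append an ellipsis marker when the text looks truncated."""
    return description + "..." if possibly_truncated(description) else description
```
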


Peter

Marsh Lane

Oct 11, 2025, 5:40:59 PM
to 'Peter Hicks' via A gathering place for the Open Rail Data community

Matthew/Peter,

Many thanks for both of your replies. Nothing in my original email was intended to be negative or aggrieved; I just was not sure how subsequent updates to an allocation should be treated, but I assume they are a straightforward replacement for the original message.

Peter, thanks for your background on R2. My query over the 50-character limit was purely about whether that had been set for this exported XML dataset rather than in the core system behind it, but yes, what you've said makes total sense.  I suspect from our (my) point of view, and thinking about Matthew's comment, it's probably best ignored, especially as it could be out of date.

Thanks again, as ever, really helpful.

Rich
