Project for conversion of DTD data to CSV etc.: rdg-to-db

79 views
Skip to first unread message

Eoghan Murray

unread,
Aug 9, 2017, 7:45:47 AM8/9/17
to A gathering place for the Open Rail Data community
Hi,

I've published a repository to automatically download DTD Zip files from https://dtdportal.atocrsp.org/ (you have to supply login credentials obviously).
This is similar in intent to the open-track dtd2mysql project: https://github.com/open-track/dtd2mysql in terms of having some tooling to get started on working with DTD data.
There's a proof-of-concept conversion to CSV included and I intend to also target the open-track mysql schema as a backend.

One thing that might be generally reusable is the file-fields spec: https://github.com/GoPixie/rdg-to-db/blob/master/file-fields.json
I've coded the spec up in JSON so that it could be easily reused in other projects/contexts.

I wanted to get a taste from this group whether this project could be useful for others and to find out whether there is any other existing open efforts in this area that I'm missing.

Cheers!

Eoghan

Peter Hicks

unread,
Aug 9, 2017, 7:49:08 AM8/9/17
to Eoghan Murray, A gathering place for the Open Rail Data community
Hi Eoghan

Naturally this would only be useful if you already have an account on the DTD Portal (which limits it to, I guess, approved suppliers).  But I believe earlier in the year at the RDG Developer Day, it was announced some of this data will be made publicly available later this year.

Parsers for data are always good, even better is a codification of the field structure (especially for non-JSON or XML data), but the very best thing is a tutorial on how to use the data.


Peter


--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To post to this group, send email to openrail...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Eoghan Murray

unread,
Aug 9, 2017, 8:07:59 AM8/9/17
to A gathering place for the Open Rail Data community, eoghan...@gmail.com
Cheers Peter, there is also a `./download --datadotatoc` to download Fares & Timetables only from data.atoc.org (which is what it originally focused on). 

Eoghan Murray

unread,
Aug 16, 2017, 6:42:10 PM8/16/17
to A gathering place for the Open Rail Data community, eoghan...@gmail.com

> very best thing is a tutorial on how to use the data

Not sure if this qualifies but the following Python model works off a postgresql 'route_code' VIEW which joins route_code information between the RJFA (Fares) and RJRG (Routeing) dataset:

I created an account on the DTD Portal as the Fares feed didn't include info on which TOCs were included/excluded with route codes (as is the subject of a previous post).

I'll be keeping the rdg-to-db project as general as possible with the goal of enabling others to use it as a basis for exploring the data or adding useful database views to encode some of the business logic — that's the dream anyhow!

Here's an example of querying the route_code VIEW which allows you to see possible discrepancies between the RJFA/RJRG route_code data (crs_inclusions/exclusions) and gives a more complete picture of route codes than either data set:

rdg=# SELECT * FROM route_code
WHERE route_code in ('00028', '00091', '00014', '00003', '00121') AND date_range @> current_date;
-[ RECORD 1 ]------+------------------------
route_code         | 00003
date_range         | [2017-05-24,)    ;; single postgresql daterange instead of separate START_DATE/END_DATE columns (open end date as is set to 2999-12-31)
quote_date         | 2017-04-07
description        | STRATFORD/LONDON
atb_desc           | STRATFORD/LONDON
crs_inclusions     | {LST,SRA}
crs_exclusions     | {}
rgk_crs_inclusions | {}
rgk_crs_anys       | {LST,SRA}  ;; The RJFA data doesn't have the concept of 'either/or'; the Routeing dataset does
rgk_crs_exclusions | {}
london_marker      | 3
toc_inclusions     | {}
toc_exclusions     | {}
mode_inclusions    | {}
mode_exclusions    | {}
-[ RECORD 2 ]------+------------------------
route_code         | 00028
date_range         | [2014-09-09,)
quote_date         | 2017-04-07
description        | GT NORTHERN ONLY
atb_desc           | GT NORTHERN ONLY
crs_inclusions     | 
crs_exclusions     | 
rgk_crs_inclusions | {}
rgk_crs_anys       | {}
rgk_crs_exclusions | {}
london_marker      | 3
toc_inclusions     | {GN,TL}  ;; Both Great Northern toc_ids Fares data doesn't have the toc ids
toc_exclusions     | {}
mode_inclusions    | {}
mode_exclusions    | {}
-[ RECORD 3 ]------+------------------------
route_code         | 00014
date_range         | [2015-01-02,2016-10-21)
quote_date         | 2017-04-07
description        | VIA SWANLEY
atb_desc           | VIA SWANLEY
crs_inclusions     | {SAY}
crs_exclusions     | {}
rgk_crs_inclusions | {SAY}
rgk_crs_anys       | {}
rgk_crs_exclusions | {}
london_marker      | 3
toc_inclusions     | {}
toc_exclusions     | {}
mode_inclusions    | {}
mode_exclusions    | {}
-[ RECORD 4 ]------+------------------------
route_code         | 00091
date_range         | [2017-05-24,)
quote_date         | 2017-04-07
description        | DUMFRIES NOT LDN
atb_desc           | DUMFRIES NOT LONDON
crs_inclusions     | {DMF}
crs_exclusions     | {WFJ,EUS}
rgk_crs_inclusions | {DMF}
rgk_crs_anys       | {}
rgk_crs_exclusions | {WFJ}    ;; Routeing data misses that London Euston is specifically excluded as well as Watford Junction
london_marker      | 0
toc_inclusions     | {}
toc_exclusions     | {}
mode_inclusions    | {}
mode_exclusions    | {}
-[ RECORD 5 ]------+------------------------
route_code         | 00121
date_range         | [2014-10-15,)
quote_date         | 2017-04-07
description        | BUS & TRAM ONLY
atb_desc           | BUS & TRAM ONLY
crs_inclusions     | 
crs_exclusions     | 
rgk_crs_inclusions | {}
rgk_crs_anys       | {}
rgk_crs_exclusions | {}
london_marker      | 2
toc_inclusions     | {}
toc_exclusions     | {}
mode_inclusions    | {2,7}  ;; {0: train, 1: walk, 2: bus, 3: ferry, 4: tube, 5: transfer, 6: metro, 7: tram, 100: hovercraft, 101: jetfoil }
mode_exclusions    | {}



Reply all
Reply to author
Forward
0 new messages