GMPTE data: progress towards an API?


Ric Roberts

Feb 22, 2011, 6:34:07 AM
to Open Data Manchester
Does anyone know if anyone has made any decent progress with the GMPTE
data yet (such as converting it into a more intelligible format, or
making it accessible via an API)? Or who to get in touch with
regarding this?

I am planning to get this data online as Linked Data in the near
future, so that people wanting to make apps or sites which use the
data can just access it via HTTP requests and SPARQL queries rather
than having to download and decipher large files (i.e. a simple API).
Once this is done, I'd be happy to help people regarding accessing the
data.
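To illustrate the kind of access described here, this is a minimal Python sketch of a SPARQL query sent over HTTP using the standard SPARQL Protocol, where the query travels in a query URL parameter. The endpoint URL is a hypothetical placeholder (no GMPTE endpoint existed at this point), and the query assumes stops would be described with the W3C WGS84 geo:lat and geo:long properties.

```python
from urllib.parse import urlencode
from urllib.request import Request

# HYPOTHETICAL endpoint: a placeholder, not a real GMPTE service.
SPARQL_ENDPOINT = "http://example.org/sparql"

# Assumes stops are described with the W3C WGS84 geo vocabulary.
STOPS_QUERY = """PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?stop ?lat ?long WHERE {
  ?stop geo:lat ?lat ;
        geo:long ?long .
} LIMIT 10
"""


def build_sparql_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Build a SPARQL Protocol GET request: the query goes in the ?query= parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(STOPS_QUERY)
print(req.full_url)
# urllib.request.urlopen(req) would send the request; it is not called here
# because the endpoint above is only a placeholder.
```

The point of the sketch is that an app needs nothing beyond a plain HTTP client to consume the data, which is what makes Linked Data behave like a simple API.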

But if an effort is already being made in this direction, it would
make sense not to replicate that work. Or if something *is* already
underway, then maybe we could work together?

If we can get the GMPTE data online as Linked Data soon, it might be a
useful set of data for people to work with at Hack Days, such as the
MDDA one coming up next month
http://groups.google.com/group/opendatamanchester/browse_thread/thread/087cd1d3a26146be.
Making it available as Linked Data would mean that people could get on
with making useful stuff rather than worrying about the data formats.

Ben Gibbs

Feb 22, 2011, 7:36:52 AM
to opendatam...@googlegroups.com
Hi Ric,

I tried to get support for importing the ATCO CIF data from GMPTE into a MySQL database and then building an API on top of that in Rails last autumn, but no-one else offered to help, and the sheer amount of data made hosting such a database and API fairly costly, so it went on the back burner.

You can now access the latest ATCO CIF zip from GMPTE at the new DataGM data store: http://datagm.org.uk/package/gmpte-atco-cif

GMPTE update this zip file weekly, and sometimes daily near holiday periods, so you'll probably want to download it weekly, then extract and parse the data however you need to.
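Because the zip is republished weekly (and sometimes daily), one way to avoid re-parsing unchanged data is to hash each download and only extract it when the hash changes. This is a minimal Python sketch under that assumption: the file names and the state file holding the previous hash are hypothetical, and the download step is left out because the direct zip URL is not given here.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large zips need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_if_changed(zip_path: Path, out_dir: Path, state_file: Path) -> bool:
    """Extract zip_path into out_dir only if it differs from the last extracted zip.

    state_file (a hypothetical name) remembers the previous hash between runs.
    Returns True if the zip was extracted, False if it was unchanged.
    """
    current = file_sha256(zip_path)
    previous = state_file.read_text().strip() if state_file.exists() else None
    if current == previous:
        return False
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)  # creates out_dir if needed
    state_file.write_text(current)
    return True


# Demo with a tiny stand-in zip (not real GMPTE data).
work = Path(tempfile.mkdtemp())
sample_zip = work / "sample.zip"
with zipfile.ZipFile(sample_zip, "w") as zf:
    zf.writestr("example.cif", "placeholder content\n")

first = extract_if_changed(sample_zip, work / "out", work / "last.sha256")
second = extract_if_changed(sample_zip, work / "out", work / "last.sha256")
print(first, second)  # True False: the second call skips extraction
```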

Hope that helps, Ben

--

www.bobop.co.uk
07811 197374
WordPress Specialist, Ruby On Rails Developer, Web Consultant to the Third Sector


Christopher Osborne

Feb 22, 2011, 2:46:58 PM
to opendatam...@googlegroups.com

We (ITO World) were thinking of releasing a GTFS export. Is this something the community would be interested in using?

On Feb 22, 2011 5:54 PM, <opendatamanch...@googlegroups.com> wrote:

          Ric Roberts <r...@swirrl.com> Feb 21 11:34AM -0800
           
          A transport-data-related hack day sounds good, and MadLab would be a
          great place to hold it. The 12th of March is about 3 weeks away,
          which I guess might be a bit short notice for some.
           
          What exactly are we hoping to get out of the day? Do we need to define
          something more focused, or are we happy with people just rocking up
          and doing what they like? And what's in it for those participating?
           
          I agree with Paul that we're more likely to get developers to attend
          on a Saturday. Would the stakeholders who stand to benefit from the
          output of the day be happy to attend a weekend event? I realise that
          it might not be part of their culture, and that we might need to work
          on raising awareness etc. If this proves tricky I'd be happy to do it
          on a weekday (given enough notice), as I work for myself and have a
          specific interest in this area.
           
          In fact, my colleague is helping to organise a Scottish Linked Data
          hack day this week in Glasgow
          (http://scottishlinkeddatahackday.eventbrite.com/) and we're donating
          hosting on our "PublishMyData" Linked Data publishing platform to help
          with getting the data online in a re-usable, query-able format on the
          day itself.
           
          We've been planning on sorting out some Manchester-themed Linked Data
          for a while, and this event sounds like a good place to get started on
          that! A couple of us from my company could turn up and lead an effort
          along these lines, and donate hosting on our platform if people are
          interested.
           
          I'm planning on meeting up with Ian Moss later this week, and I'm also
          planning on coming to the ODM meet-up this week if anyone wants to
          chat about this.
           
          Cheers!
          Ric
          (I'm a coder, but I sometimes have ideas too).
           

           

          Alan Holding <alanh...@me.com> Feb 22 02:40AM -0800
           

          >> (I'm a coder, but I sometimes have ideas too)
           
          Heh. Sorry about that. :)

           

Julian Tait

Feb 22, 2011, 7:42:19 PM
to opendatam...@googlegroups.com
Hi Chris

Yes, I think that GTFS would be really useful. Under what conditions would it be made available?

Julian

Julian Tait

Feb 23, 2011, 4:43:27 AM
to opendatam...@googlegroups.com
Hi Chris,

I assume that the GTFS will comply with the Traveline T&Cs.

Julian

Ric Roberts

Feb 23, 2011, 5:14:56 AM
to Open Data Manchester
Sounds good, Chris: the ATCO CIF format is pretty awkward to work
with. How do you deal with updates to the data (if at all)? It
changes quite frequently.

Christopher Osborne

Feb 23, 2011, 8:46:36 AM
to Open Data Manchester
Gathering the points below together:

I thought we'd throw this one out to the ODM group, see if there's any
demand, and then work out what fits with the community without
committing ourselves to anything right now. For starters I'd like to
do a one-off export and make that available to the group. I will
consult with GMPTE to see what their views are on the matter, but in
their recent presentations they stated that their release of data is
unaffected by the Traveline T&Cs.

Updates are not a problem for us: we already run data quality checks on
all of the UK's transport schedules on a weekly basis. But I'd like to
repeat that we're not committing to anything just yet, just exploring
the possibilities of having the data in a more accessible format. All
of the open source trip planners etc. that I've seen are GTFS-based, so
I believe it should push things forward a great deal.

aph

Feb 23, 2011, 6:11:16 PM
to Open Data Manchester
I had started looking at the GMPTE data and agree that the data format
is pretty awful: it clearly dates back to the days of fixed-position
records, which reminds me of my (long ago) days working with Fortran.
However, as the data follows a defined structure, it would be
relatively easy to parse each line and extract the individual elements
into a more structured format that is better suited to further data
extraction. I was assuming that this would be XML and had spent a
little time looking at various options. For open data I would like to
find an existing open standard but hadn't found a suitable one (I was
very surprised to find that there isn't an easy migration from
ATCO-CIF to some of the existing XML formats like TransXChange). I am
reluctant to create a new XML schema unless it can be shown that there
isn't one that can be made to fit the data. That said, there will
probably be a need to extract the data into a number of different
formats (GTFS is obviously one) depending on the types of applications
that appear and use the data.
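The fixed-position parsing described above boils down to slicing each line at known column offsets, keyed on the two-character record identity at the start of the line. In this minimal Python sketch the "XX" record type and its offsets are made up purely for illustration; the real offsets for each record type would have to be taken from the ATCO-CIF specification.

```python
from typing import Dict, List, Tuple

# (field name, start, end) as 0-based, end-exclusive column offsets.
FieldSpec = List[Tuple[str, int, int]]

# HYPOTHETICAL layout: "XX" is a made-up record type with made-up offsets.
# Real layouts for each record type come from the ATCO-CIF specification.
EXAMPLE_LAYOUTS: Dict[str, FieldSpec] = {
    "XX": [
        ("record_identity", 0, 2),
        ("stop_code", 2, 14),
        ("time", 14, 18),
    ],
}


def parse_fixed_width(line: str, layouts: Dict[str, FieldSpec]) -> Dict[str, str]:
    """Split one fixed-position line into named, whitespace-trimmed fields."""
    line = line.rstrip("\r\n")
    record_type = line[:2]  # the first two characters identify the record type
    spec = layouts.get(record_type)
    if spec is None:
        # Unknown record type: keep the raw text so nothing is silently dropped.
        return {"record_identity": record_type, "raw": line}
    return {name: line[start:end].strip() for name, start, end in spec}


print(parse_fixed_width("XXSTOP0000001 0732\n", EXAMPLE_LAYOUTS))
# {'record_identity': 'XX', 'stop_code': 'STOP0000001', 'time': '0732'}
```

Once each line is a dictionary of named fields, writing it out as XML, GTFS or anything else becomes a separate, simpler step.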

BTW I have a copy of the ATCO-CIF spec, but it is version 5.1 (November
2000), whereas the GMPTE data appears to be version 5.0. I don't know
what the differences between the two versions are, but as it is a minor
version number change, I assume they aren't very dramatic. The data
also appears to contain 'proprietary' extensions: lines beginning with
Z. Does anyone have a specification of these extensions?

Anthony


Ben Gibbs

Feb 24, 2011, 4:18:31 AM
to opendatam...@googlegroups.com
Hi Anthony and others,

I have attached the ATCO-CIF documentation which was distributed when the zips were first released to the Open Data Manchester group.

The "Atco-cif_formats.pdf" document has the definitions of the lines beginning with Z, and the "atco-cif-spec.pdf" document is version 5.10.

Hope this helps, Ben

Atco-cif_formats.pdf
atco-cif-spec.pdf

Ric Roberts

Feb 24, 2011, 4:48:48 AM
to Open Data Manchester
Thanks, Ben and Anthony: that's useful info.

Looks like it shouldn't be *too* hard to get this data online as
Linked Data in the next few weeks. I'll post here when there's
something to see; I just need to get to grips with the data model and
how best to represent it as Linked Data.
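As a rough illustration of what representing a single bus stop as Linked Data might look like, here is a minimal Python sketch that emits N-Triples using rdfs:label and the W3C WGS84 geo:lat and geo:long properties. The http://example.org URI scheme and the sample stop are hypothetical; choosing the real URIs and vocabularies is exactly the modelling work still to be done.

```python
from typing import List

# HYPOTHETICAL URI scheme for stops; real URIs would be chosen at publication time.
BASE = "http://example.org/id/stop/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def escape_literal(text: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def stop_to_ntriples(atco_code: str, name: str, lat: str, lon: str) -> List[str]:
    """Describe one stop as N-Triples lines. Assumes atco_code is URI-safe (letters and digits)."""
    subject = f"<{BASE}{atco_code}>"
    return [
        f'{subject} <{RDFS_LABEL}> "{escape_literal(name)}" .',
        f'{subject} <{GEO}lat> "{lat}"^^<{XSD_DECIMAL}> .',
        f'{subject} <{GEO}long> "{lon}"^^<{XSD_DECIMAL}> .',
    ]


# Made-up stop for illustration.
for triple in stop_to_ntriples("STOP0000001", 'Example "Stop"', "53.48", "-2.24"):
    print(triple)
```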

Cheers,
Ric.

aph

Feb 27, 2011, 6:21:04 PM
to Open Data Manchester
I have now started processing the data; the documents from Ben were a
great help. I have written a Java program which processes the file and
dumps out a decoded version of each record. In the sample files I have
processed, only a small subset of the ATCO record types is used (QS,
QI, QO, QT), together with the following extension codes: ZA, ZD, ZJ,
ZL, ZN, ZS. As I have only looked at a few of the files, there may be
additional types to decode, but at least it is a start.
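A quick way to check which record types a file actually uses is to tally the first two characters of each line and split out the Z-prefixed extensions. This is a minimal Python sketch; the sample lines are truncated placeholders rather than real GMPTE records, and for a real file you would pass in an open file object, with the file's text encoding checked first (that encoding is not stated here).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_record_types(lines: Iterable[str]) -> Counter:
    """Tally records by their two-character record identity (QS, QI, ZA, ...)."""
    counts: Counter = Counter()
    for line in lines:
        record_type = line[:2]
        # Skip blank or too-short lines.
        if len(record_type) == 2 and not record_type.isspace():
            counts[record_type] += 1
    return counts


def split_standard_and_extensions(counts: Counter) -> Tuple[List[str], List[str]]:
    """Separate standard ATCO-CIF types from the Z-prefixed extension records."""
    standard = sorted(t for t in counts if not t.startswith("Z"))
    extensions = sorted(t for t in counts if t.startswith("Z"))
    return standard, extensions


# Truncated placeholder lines, not real GMPTE records.
sample = ["QS...", "QO...", "QI...", "QI...", "QT...", "ZA...", "\n"]
counts = count_record_types(sample)
standard, extensions = split_standard_and_extensions(counts)
print(standard, extensions)  # ['QI', 'QO', 'QS', 'QT'] ['ZA']
```

Running this over every file in the zip, rather than a few samples, would show whether any further record types need decoding.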

The dumped file isn't really usable, so I have toyed with the idea of
creating an XML file. I have looked at TransXChange and need to create
some form of mapping from the ATCO data fields to the corresponding
TransXChange fields. This will take some time. However, I have started
to create some of the GTFS files from the data (I think there are 6
files required). It is clear that these files will need additional
data that is not present in the GMPTE files, but I have downloaded the
NaPTAN data from data.gov.uk, and once I can load it into a database
it should enable easy translation of the bus stops into the
latitude/longitude that GTFS requires.
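The NaPTAN lookup could be sketched in Python roughly as follows: load ATCO code to name and coordinates from a NaPTAN stops CSV, write GTFS stops.txt rows (stop_id, stop_name, stop_lat and stop_lon are GTFS field names), and report any codes with no NaPTAN match. The NaPTAN column names ATCOCode, CommonName, Latitude and Longitude are assumptions about the CSV export, and the sample rows are made up.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def load_naptan_coords(csv_file: TextIO) -> Dict[str, Tuple[str, str, str]]:
    """Map ATCO code -> (name, latitude, longitude).

    ASSUMES the NaPTAN stops CSV has ATCOCode, CommonName, Latitude and Longitude columns.
    """
    return {
        row["ATCOCode"]: (row["CommonName"], row["Latitude"], row["Longitude"])
        for row in csv.DictReader(csv_file)
    }


def write_gtfs_stops(atco_codes: Iterable[str],
                     naptan: Dict[str, Tuple[str, str, str]],
                     out: TextIO) -> List[str]:
    """Write GTFS stops.txt rows for the codes found; return codes missing from NaPTAN."""
    writer = csv.writer(out)
    writer.writerow(["stop_id", "stop_name", "stop_lat", "stop_lon"])
    missing = []
    for code in sorted(set(atco_codes)):  # de-duplicate stops used by many journeys
        if code in naptan:
            name, lat, lon = naptan[code]
            writer.writerow([code, name, lat, lon])
        else:
            missing.append(code)
    return missing


# Made-up sample data standing in for the NaPTAN CSV and the ATCO stop codes.
naptan = load_naptan_coords(io.StringIO(
    "ATCOCode,CommonName,Latitude,Longitude\n"
    "STOP0000001,Example Stop,53.48,-2.24\n"
))
stops_txt = io.StringIO()
missing = write_gtfs_stops(["STOP0000001", "STOP0000002", "STOP0000001"], naptan, stops_txt)
print(stops_txt.getvalue())
print("No NaPTAN match:", missing)
```

Returning the unmatched codes rather than silently dropping them makes it easy to report stops that are missing from the data.gov.uk copy of NaPTAN.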

When I have a full set of GTFS files for one of the routes (anyone got
a favourite route to try?) I will post the data for review.

Anthony

Julian Tait

Feb 28, 2011, 1:10:59 AM
to opendatam...@googlegroups.com
This sounds great. Would love to see what you come up with.

BTW, if there are any NaPTAN points that don't exist in the data.gov.uk database, please let us know, as I would be interested. The data.gov.uk database is updated quarterly, so if new stops come into being (which I can't imagine happens too often), they will take a while to show up in the data.gov.uk dataset.

Cheers

Julian

aph

Mar 6, 2011, 5:50:12 PM
to Open Data Manchester
I finally managed to create GTFS files for several routes, together
with a KML file (for use in Google Earth). The files validate
(according to the validator tools, see
http://code.google.com/p/googletransitdatafeed/) and can be displayed
using the local schedule viewer (see
http://code.google.com/p/googletransitdatafeed/wiki/ScheduleViewer).
I can make these available if required, but it looks as if the full
GTFS data has now been made available (see
https://groups.google.com/group/opendatamanchester/t/1115922b526e3794?hl=en),
which is probably more useful.

What I have discovered:

1. Not all of the bus stops' ATCO codes have equivalent NaPTAN codes
2. There is a lot of ATCO data that isn't needed by GTFS
3. The mapping isn't wonderful, as the bus stops are linked together by
straight lines rather than following the road
4. The dates need some manipulation, as routes with an end date of
00000000 are not valid for GTFS
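For point 4, one option is to treat 00000000 as "no end date" and substitute a chosen feed horizon, since GTFS calendar.txt requires a real YYYYMMDD end_date. Both that interpretation of 00000000 and the choice of horizon are assumptions in this minimal Python sketch; a sensible horizon would depend on how often the feed is regenerated.

```python
from datetime import date, datetime

OPEN_ENDED = "00000000"  # ATCO-CIF end date seen on routes with no fixed end


def gtfs_end_date(atco_end: str, horizon: date) -> str:
    """Turn an ATCO-CIF end date (YYYYMMDD) into a valid GTFS calendar end_date.

    ASSUMPTION: 00000000 means "no end date", so the caller-chosen horizon is used.
    """
    if atco_end == OPEN_ENDED:
        return horizon.strftime("%Y%m%d")
    # Raises ValueError if the string is not a real calendar date.
    datetime.strptime(atco_end, "%Y%m%d")
    return atco_end


print(gtfs_end_date("00000000", date(2012, 3, 6)))  # 20120306
print(gtfs_end_date("20110630", date(2012, 3, 6)))  # 20110630
```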

My algorithm takes a long time to process the data, so I need to do
some optimisation before it could be considered robust enough for
production use.

I will start looking at some GTFS apps, but I intend to get the GTFS
schedule loaded into a database so that I can start doing some
interesting data mining.
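Loading a GTFS feed into a database for that kind of mining can start as simply as copying each text file into an SQLite table. This minimal Python sketch stores every column as TEXT and uses a small made-up stops.txt; a real run would load stop_times.txt and the other files the same way and add indexes for whatever queries turn out to matter.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_gtfs_table(conn: sqlite3.Connection, gtfs_dir: Path, name: str) -> int:
    """Copy one GTFS file (e.g. stops.txt) into a table of the same name.

    Every column is stored as TEXT; returns the number of rows inserted.
    """
    with (gtfs_dir / f"{name}.txt").open(newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{col}" TEXT' for col in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        cursor = conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', reader)
    conn.commit()
    return cursor.rowcount


# Made-up two-stop feed for illustration.
feed = Path(tempfile.mkdtemp())
(feed / "stops.txt").write_text(
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "STOP0000001,Example Stop,53.48,-2.24\n"
    "STOP0000002,Another Stop,53.47,-2.23\n",
    encoding="utf-8",
)
conn = sqlite3.connect(":memory:")
inserted = load_gtfs_table(conn, feed, "stops")
print(inserted)  # 2
```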

Regards

Anthony