Any Beginner Tutorials On Using NYC GTFS Subway Data With Python?


webscra...@gmail.com

Sep 19, 2018, 7:39:07 PM
to GTFS-realtime
Hello,

I'm a newbie Python developer, and have built a few Python based Twitter webscrapers, and have made a couple of fake websites hosted on my blog page:


I am very interested in using the subway data provided by MTA, and applied for / received an API key.

My main problem is that when I print out r.content using the requests library, I get a bunch of what looks like byte data.

I'm kinda confused about how to work with this data, since I want to make a Python based app that would cater to a person's particular subway route and give route times. I've read through Susan's post (though she and the others who posted are a bit more experienced than me), and it appears that this would be better suited to a webpage app because of the ping restrictions.

Is there any guide on how to deal with this data in Python, or, worst case scenario, PHP?

I ask because I can find some documentation sprinkled throughout Google's pages, but it's mostly about CREATING feeds, and not really about working with EXISTING feeds.

Thanks for reading my post!

Sincerely,

Sam

Andrew Byrd

Sep 20, 2018, 12:26:54 AM
to gtfs-r...@googlegroups.com
Hello,

It sounds like you’re seeing un-decoded protocol buffer messages. GTFS-realtime uses a system called Protocol Buffers. See the Data format and Data structure sections of: https://developers.google.com/transit/gtfs-realtime/#data-format

There is a specification document (gtfs-realtime.proto) describing the messages that will be transmitted, and you use a protocol buffers compiler for your language of choice to generate code that will decode or produce those messages. This allows for a structured stream of messages that can be handled by a wide range of languages, and aims to be more compact, efficient, and typed (compared to e.g. gzipped JSON).
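To make the "byte data" less mysterious, here is a minimal sketch of the protocol buffer wire format in plain Python. It is only an illustration, not how you would read a real feed (generated code does that for you). The helper names read_varint and read_fields and the hand-built sample bytes are made up for this example; the field numbers (header = 1, gtfs_realtime_version = 1, timestamp = 3) are taken from gtfs-realtime.proto.

```python
def read_varint(data, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # a clear high bit marks the last byte
            return value, pos
        shift += 7


def read_fields(data):
    """Split one serialized message into (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint: ints, enums, bools
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: strings, nested messages
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        elif wire_type == 1:  # fixed 64-bit
            value = data[pos:pos + 8]
            pos += 8
        elif wire_type == 5:  # fixed 32-bit
            value = data[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields


# Hand-built sample (hypothetical): a FeedMessage whose header carries
# gtfs_realtime_version = "2.0" (field 1) and timestamp = 300 (field 3).
raw = bytes([0x0A, 0x08, 0x0A, 0x03]) + b"2.0" + bytes([0x18, 0xAC, 0x02])

header_bytes = read_fields(raw)[0][2]      # field 1 of FeedMessage is the header
header_fields = read_fields(header_bytes)  # [(1, 2, b"2.0"), (3, 0, 300)]
```

The bytes carry only field numbers and values, never field names, which is why the .proto file is needed to know that field 3 of the header is the timestamp.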

-Andrew

--
You received this message because you are subscribed to the Google Groups "GTFS-realtime" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gtfs-realtim...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gtfs-realtime/fb117b9b-6c7a-4e4f-87fd-920d25e89e74%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

webscra...@gmail.com

Sep 20, 2018, 1:24:47 PM
to GTFS-realtime

Sean Barbeau

Sep 20, 2018, 2:04:18 PM
to GTFS-realtime
To consume all the data in the NYCT Subway feed you'll need their official .proto file that defines their extensions to the GTFS-rt spec:

You could then use the normal protocol buffer tools to generate Python client code to parse the feed.

The https://github.com/google/gtfs-realtime-bindings/tree/master/python library is pre-generated client code using the normal GTFS-rt .proto:

You can use gtfs-realtime-bindings to parse the NYCT feed, but some fields that are custom to NYCT's feed and not in the standard won't be recognized.
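As a rough sketch of that approach, assuming the requests and gtfs-realtime-bindings packages are installed: fetch_feed downloads and decodes a feed with the standard bindings, and count_entity_kinds tallies what it contains. Both function names are invented for this example, and the url argument is assumed to already include whatever credentials the agency requires.

```python
from collections import Counter


def fetch_feed(url):
    """Download one GTFS-realtime feed and decode it with the standard bindings."""
    # Third-party imports are kept local so count_entity_kinds() below can be
    # tried without either package installed.
    import requests
    from google.transit import gtfs_realtime_pb2

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors such as a bad key

    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(response.content)
    return feed


def count_entity_kinds(feed):
    """Count how many entities carry a trip update, a vehicle position, or an alert."""
    kinds = Counter()
    for entity in feed.entity:
        for kind in ("trip_update", "vehicle", "alert"):
            if entity.HasField(kind):
                kinds[kind] += 1
    return kinds
```

Usage would look like count_entity_kinds(fetch_feed(feed_url)), where feed_url is your own feed address.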

Sean

webscra...@gmail.com

Sep 25, 2018, 1:28:09 PM
to GTFS-realtime
Gotcha. Yeah, I'm looking at the .proto file that corresponds to that specification you're talking about.

I read through all of it, but the thing is, how do I parse through this data to get info about specific trains?

So far I've been able to just write it all to a file, but obviously I want to actively parse through it. Here's what I have so far:

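import requests
# Assumes the gtfs-realtime-bindings package mentioned above
from google.transit import gtfs_realtime_pb2

MTADataFeed = (NOT SHOWING MY KEY)

r = requests.get(MTADataFeed)

print("\n\nResponse Code = " + str(r.status_code))

# Decode the protocol buffer bytes into a FeedMessage
feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(r.content)
# print(feed.entity[1].vehicle.trip.trip_id)

for entity in feed.entity:
    print(entity)

# Write the text form of every entity to a file
f = open("MTADataStream.txt", "w")

for entity in feed.entity:
    f.write(str(entity))

f.close()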

Sean Barbeau

Sep 25, 2018, 1:46:52 PM
to gtfs-r...@googlegroups.com
Predictions for when the next train will arrive should be in TripUpdate entities, while positions of the vehicle will be in VehiclePosition entities.  You should be able to loop through these in your code.  To see more information about the fields for each object see the spec at https://github.com/google/transit/blob/master/gtfs-realtime/spec/en/reference.md.
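A minimal sketch of such a loop, assuming feed is an already decoded FeedMessage and that the producer fills in arrival times. The function name arrivals_at_stop is made up for this example; the field names (trip_update, stop_time_update, stop_id, arrival.time, trip.route_id, trip.trip_id) come from the GTFS-realtime reference.

```python
from datetime import datetime, timezone


def arrivals_at_stop(feed, stop_id):
    """Return sorted (arrival_time, route_id, trip_id) tuples predicted for one stop."""
    arrivals = []
    for entity in feed.entity:
        if not entity.HasField("trip_update"):
            continue  # skip VehiclePosition and Alert entities
        trip = entity.trip_update.trip
        for update in entity.trip_update.stop_time_update:
            if update.stop_id == stop_id and update.HasField("arrival"):
                # arrival.time is POSIX time (seconds since the epoch, UTC)
                when = datetime.fromtimestamp(update.arrival.time, tz=timezone.utc)
                arrivals.append((when, trip.route_id, trip.trip_id))
    return sorted(arrivals)
```

The times come back in UTC; converting them to local time for display is left to the caller.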

Note that you may need to reference the GTFS data (zip file) to get information about where stops are and which trip_ids serve which routes, along with headsign information and schedules.
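One way to do that lookup with only the standard library is sketched below. It assumes the static GTFS zip contains stops.txt and trips.txt with the usual stop_id, stop_name, trip_id and route_id columns; the function names are invented for this example, and whether the realtime trip_ids match the static ones exactly depends on the agency.

```python
import csv
import io
import zipfile


def load_gtfs_table(gtfs_zip, filename):
    """Read one CSV table from a GTFS zip (path or file object) as a list of dicts."""
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig drops the byte order mark that some feeds include
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def stop_names(gtfs_zip):
    """Map each stop_id to its human-readable stop_name."""
    return {row["stop_id"]: row["stop_name"]
            for row in load_gtfs_table(gtfs_zip, "stops.txt")}


def routes_by_trip(gtfs_zip):
    """Map each trip_id to the route_id it belongs to."""
    return {row["trip_id"]: row["route_id"]
            for row in load_gtfs_table(gtfs_zip, "trips.txt")}
```

With these dictionaries, a stop_id from a TripUpdate can be turned into a station name via stop_names(path).get(stop_id).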

Sean


webscra...@gmail.com

Oct 10, 2018, 1:39:37 PM
to GTFS-realtime
This is my current code so far. Note that Line 115 should be adjusted with your own personal MTA data feed key. However, I'm having trouble figuring out which "id" matches which train, and which "stop_id" matches which NYC stop.

If I knew those two things, I could query this data and make it usable for a user. So far, I'm only displaying the first train's info just to get a sense of what to do. This is in Python too; here's my code:

https://pastebin.com/SLNRmKj3

How do I determine the train's line, as well as the stop_id? I tried referring to this PDF's "message NyctTripDescriptor" section:


But the first train id is literally "000001", which doesn't correspond to the first-two-digits format in the PDF (there's no Zero train line in NYC). Or else this is a red herring, in the sense that it's just arbitrarily labeled "000001" because it's the first train, and it doesn't refer to the actual train line.
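For what it's worth, the GTFS-realtime reference describes FeedEntity.id only as a feed-unique identifier for the entity, so a value like "000001" is consistent with it being a counter rather than a train line. The line, when the producer fills it in, should be in the trip descriptor's route_id. A small sketch under that assumption (the function name trips_in_feed is made up, and feed is an already decoded FeedMessage):

```python
def trips_in_feed(feed):
    """Map each entity id to the (route_id, trip_id) of the trip it describes."""
    trips = {}
    for entity in feed.entity:
        if entity.HasField("trip_update"):
            trip = entity.trip_update.trip
        elif entity.HasField("vehicle"):
            trip = entity.vehicle.trip
        else:
            continue  # alerts are not tied to a single trip descriptor here
        trips[entity.id] = (trip.route_id, trip.trip_id)
    return trips
```

Printing the result for a live feed should show quickly whether route_id holds the line letters and numbers you expect.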

Sean Barbeau

Oct 10, 2018, 3:58:29 PM
to gtfs-r...@googlegroups.com
Questions about the data you see in MTA's feed are best directed to https://groups.google.com/forum/m/#!forum/mtadeveloperresources

Sean

webscra...@gmail.com

Oct 16, 2018, 1:15:28 PM
to GTFS-realtime
Yeah, I've been trying to do so, but barely anyone responds.

Is there anyone willing to help me out?

I just want to finally query this data and actually make the train_id and stop_id fields usable, and I'm so lost on how to work with this.

Nikhil VJ

Oct 25, 2018, 10:45:43 PM
to GTFS-realtime
Hello,
I've posted some sample programs here:

https://groups.google.com/d/msg/gtfs-realtime/c4Wae9D6I7Q/_GzyMymDBQAJ


They cover reading and writing a vehicle positions .pb file in Python 3.


- Nikhil VJ, Pune, India
