RFC differential updates over websockets

Stefan de Konink

Jun 5, 2013, 8:26:40 PM

to gtfs-r...@googlegroups.com

Hi,

After having implemented differential and full_dataset for alerts, vehiclepositions and stoptimeupdates, we want to get back to what we were actually developing for: realtime distribution of transit information.

From a certain source we have heard that, for a standardisation effort, WebSockets would be preferred over other message queuing systems:

Method 0) Legacy client would connect to GTFS-RT aware webserver
- webserver serves cached full_dataset

Method 1) each client can trigger rendering full_dataset
- webserver serves full_dataset
- webserver sends differential updates

Method 2) a client has to wait for the next full_dataset to have a guaranteed replicated state, prior to receiving differential updates
- webserver serves full_dataset (cached)
- webserver serves full_dataset (new)
- webserver sends differential updates

In all cases, if a client gets out of sync with the differential updates, it could force itself to restart the connection, which results in the full_dataset being sent again.

Any thoughts on the above?

Stefan

Jeremy Baron

Jun 5, 2013, 10:11:50 PM

to gtfs-r...@googlegroups.com

On Thu, Jun 6, 2013 at 12:26 AM, Stefan de Konink <ste...@konink.de> wrote:
> From a certain source we have heard that for a standardisation effort
> websockets would be preferred over other message queuing systems;

How about PubSubHubbub? I imagine if there's demand for websocket
clients then it should be pretty easy to write a service which
subscribes to a firehose at the hub and broadcasts to websocket users.

-Jeremy

Frumin, Michael

Jun 6, 2013, 9:07:25 AM

to gtfs-r...@googlegroups.com

Only that ZeroMQ has proven to be absurdly fast, reliable, flexible, and easy to use.

--
You received this message because you are subscribed to the Google Groups "GTFS-realtime" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gtfs-realtim...@googlegroups.com.
To post to this group, send email to gtfs-r...@googlegroups.com.
Visit this group at http://groups.google.com/group/gtfs-realtime?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Brian Ferris

Jun 6, 2013, 9:25:53 AM

to gtfs-r...@googlegroups.com

Since I'm the "certain source", my thoughts are the following:

WebSockets are an IETF Standard that's relatively vendor agnostic. I don't deny that ZeroMQ is cool, but I'm reluctant to bake a specific vendor's transport protocol into the standard, no matter how cool it is.

More generally, I'm not convinced some of the more exotic features of ZeroMQ and other message queue technologies are really needed here. The most typical use-case here is an agency who wants to share their GTFS-realtime feed with a number of subscribers (1-to-N), which WebSockets supports just fine. Plus, since WebSocket connections are HTTP based (with an Upgrade request on the initial HTTP connection), I think it will lead to less pain when dealing with transit agency firewalls.

Thoughts?

Brian

Frumin, Michael

Jun 6, 2013, 9:33:25 AM

to gtfs-r...@googlegroups.com

Fair enough, though since ZMQ is totally open source I’m a little reluctant to use the word “vendor” (not that there isn’t a company behind it…).

But I’m all for standards (and easy solutions to dealing with firewalls). For the ZMQ-obsessives among us, it’s probably not too hard to bridge from that to WebSockets (and vice versa).

Thanks,

Mike

Stefan de Konink

Jun 6, 2013, 9:44:53 AM

to gtfs-r...@googlegroups.com

On Thursday, June 6, 2013 3:33:25 PM CEST, Frumin, Michael wrote:
> Fair enough, though since ZMQ is totally open source I’m a
> little reluctant to use the word “vendor” (not that there isn’t
> a company behind it…).

Give Pieter some credit business please!! But I see what Brian means, because the next cool thing would be called nanomsg, etc.

> But I’m all for standards (and easy solutions to dealing with
> firewalls). For the ZMQ-obsessives among us, it’s probably not
> too hard to bridge from that to WebSockets (and vice versa).

We will support all flavors :) HTTP/ZeroMQ/WebSockets

I just heard that OpenTripPlanner does HTTP/ZeroMQ and as you mention going to WebSockets is a triviality.

But I really want to get this ball rolling. Currently we put a lot of effort into pushing the development of OpenTripPlanner as a consumer of data. It would be a really good gesture from other vendors to step in and commit to support as producer or consumer with a reasonable deadline.

Stefan

Brian Ferris

Jun 6, 2013, 9:56:25 AM

to gtfs-r...@googlegroups.com

First off, I want to say thanks to Stefan for pushing this forward. I actually spent some time hacking some stuff together in the OneBusAway GTFS-realtime library to support this:

https://github.com/OneBusAway/onebusaway-gtfs-realtime-api/blob/master/src/main/resources/com/google/transit/realtime/gtfs-realtime-OneBusAway.proto

I have a few thoughts from that process.

First, I'd propose adding an "incremental_index" field to FeedHeader. For an incremental feed, the field represents the index of the current incremental FeedMessage. Each incremental FeedMessage sent to a client should sequentially increment the index, such that a client can detect missed messages by looking for gaps in the index value. It is not required that the index of the first message sent to a client be zero.
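
For readers who want to see the gap check spelled out, here is a minimal sketch in plain Python (no protobuf involved). It only models the proposed incremental_index as an integer; the class name `GapDetector` and its `observe` method are made up for illustration, and an index that goes backwards is deliberately not handled.

```python
from typing import Optional


class GapDetector:
    """Tracks the last seen incremental_index and reports missed messages."""

    def __init__(self) -> None:
        self.last_index: Optional[int] = None

    def observe(self, incremental_index: int) -> int:
        """Return how many messages were skipped before this one (0 if none)."""
        if self.last_index is None:
            # First message on this connection: any starting value is allowed.
            missed = 0
        else:
            missed = incremental_index - self.last_index - 1
        self.last_index = incremental_index
        return missed


detector = GapDetector()
# Indices 44 and 45 never arrive, so the fourth message reports a gap of 2.
gaps = [detector.observe(i) for i in (41, 42, 43, 46, 47)]
print(gaps)
```

A client that sees a non-zero result would presumably drop the connection and reconnect to get a fresh full_dataset.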

I've got some more thoughts, but I figured I'd toss this out first.

Second, could you give a bit more explanation of Method 2 in your list? I'm not 100% certain I understand what's going on there.

Brian

Stefan de Konink

Jun 6, 2013, 10:11:49 AM

to gtfs-r...@googlegroups.com

Hi,

On Thursday, June 6, 2013 3:56:25 PM CEST, Brian Ferris wrote:
> First, I'd propose adding an "incremental_index" field to
> FeedHeader. For an incremental feed, the field represents the
> index of the current incremental FeedMessage. Each incremental
> FeedMessage sent to a client should sequentially increment the
> index, such that a client can detect missed messages by looking
> for gaps in the index value. It is not required that the index
> of the first message sent to a client be zero.

Great idea, also combined with full_dataset as 'starting point'.

> I've got some more thoughts, but I figured I'd toss this out first.
>
> Second, could you give a bit more explanation of Method 2 in
> your list? I'm not 100% certain I understand what's going on
> there.

I'll elaborate. Currently we have a producer that takes about 1.6s to render all the active messages (about 3k) to a file; previously it took us 3 minutes to do the same, but then all stops were populated with predictions. A lot of clients could bring the system to its knees, in effect a denial-of-service attack, since the rendering to protobuf actually locks the processing, which is bad.

If a full_dataset is made at certain times, the client could start off from the 'last run'. Especially for alerts etc. this will be all fine. Because the client has already missed some differential messages, the state cannot be guaranteed. (To detect this your incremental_index would help.) At a certain moment in time (for example every minute) a cached full_dataset update will be made. From this point on, a newly connected client has guaranteed coverage of every next message.
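
The "render at certain times, serve from cache" idea can be sketched roughly as follows. This is an illustration only, not the producer described above: `SnapshotCache`, `expensive_render` and the 60 second `max_age` are assumptions, and the current time is passed in explicitly so the example stays deterministic.

```python
from typing import Callable, List, Optional


class SnapshotCache:
    """Serves a cached full_dataset and re-renders it at most once per max_age."""

    def __init__(self, render: Callable[[], bytes], max_age: float = 60.0) -> None:
        self.render = render
        self.max_age = max_age
        self.snapshot: Optional[bytes] = None
        self.rendered_at: Optional[float] = None

    def get(self, now: float) -> bytes:
        stale = self.rendered_at is None or now - self.rendered_at >= self.max_age
        if stale:
            # Only this branch pays the rendering cost (and takes the lock).
            self.snapshot = self.render()
            self.rendered_at = now
        return self.snapshot


calls: List[int] = []


def expensive_render() -> bytes:
    # Stand-in for the slow protobuf rendering step.
    calls.append(1)
    return f"full_dataset #{len(calls)}".encode()


cache = SnapshotCache(expensive_render, max_age=60.0)
first = cache.get(0.0)    # renders
second = cache.get(30.0)  # served from cache
third = cache.get(61.0)   # cache is stale, renders again
render_count = len(calls)
```

However many clients connect within a minute, the rendering runs once, which is the point of serving the cached full_dataset.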

Visually:

client1  index  -  client2  index

F1       1      -
D        2      -
D        3      -  F1       1
D        4      -  D        4
D        5      -  D        5
D        6      -  D        6
D        7      -  F2       7
D        8      -  D        8

Method 2 would state that differential messages 4-6 would not be sent to the client. Reviewing it, we could just send them, but still maintain that the client stepped in after the first differential was sent out.
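
To make the diagram concrete, here is a small sketch of how a client could work out from which message onwards its state is guaranteed. It is an illustration, not code from any implementation mentioned in this thread. It assumes that a full dataset tagged with index N reflects every differential up to and including N, which is one reading of the diagram; the function name `guaranteed_from` and the (kind, index) tuples are hypothetical.

```python
from typing import List, Optional, Tuple

Message = Tuple[str, int]  # ("F" or "D", index)


def guaranteed_from(messages: List[Message]) -> Optional[int]:
    """Return the position of the full dataset from which no index was skipped."""
    expected: Optional[int] = None       # index the next differential must carry
    guaranteed_at: Optional[int] = None  # position of the anchoring full dataset
    for pos, (kind, index) in enumerate(messages):
        if kind == "F":
            expected = index + 1
            if guaranteed_at is None:
                guaranteed_at = pos
        elif expected is None or index != expected:
            # A differential was missed: state is unknown until the next "F".
            expected = None
            guaranteed_at = None
        else:
            expected = index + 1
    return guaranteed_at


client1 = [("F", 1)] + [("D", i) for i in range(2, 9)]
client2 = [("F", 1), ("D", 4), ("D", 5), ("D", 6), ("F", 7), ("D", 8)]

client1_pos = guaranteed_from(client1)
client2_pos = guaranteed_from(client2)
print(client1_pos, client2_pos)
```

Under that reading, client1 is in sync from its first message, while client2 only reaches a guaranteed state at F2, because differentials 2 and 3 were sent before it connected.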

Stefan

Stefan de Konink

Jun 6, 2013, 10:17:37 AM

to gtfs-r...@googlegroups.com

I hope this mail client doesn't play tricks;

On Thu, 6 Jun 2013, Stefan de Konink wrote:

> Visually:

client1  index  /  client2  index

F1       1      /
D        2      /
D        3      /  F1       1
D        4      /  D        4
D        5      /  D        5
D        6      /  D        6
D        7      /  F2       7
D        8      /  D        8

F1 = First Full Dataset (cached)
F2 = Second Full Dataset (cached)
D = Differential

Stefan

Stefan de Konink

Jun 6, 2013, 10:22:05 AM

to gtfs-r...@googlegroups.com

On Thursday, June 6, 2013 4:17:37 PM CEST, Stefan de Konink wrote:
> I hope this mail client doesn't play tricks;

Going to blame the Google Group then.

<http://img812.imageshack.us/img812/6699/visuallypng>

Stefan

Stefan de Konink

Jun 6, 2013, 10:22:56 AM

to gtfs-r...@googlegroups.com

Sorry.

http://img812.imageshack.us/img812/6699/visually.png

--
Kind regards,

Stefan de Konink

Brian Ferris

Jun 6, 2013, 11:38:47 AM

to gtfs-r...@googlegroups.com

So in my mind, the server would always be maintaining a full dataset that is a complete representation of all the incremental updates up to the current point in time. Thus, when client 2 connects (as in your diagram) it would immediately receive a full update and wouldn't start initially out of sync (if I'm interpreting your diagram correctly). This simplifies things a bit as you don't really have to maintain any state for each client after the initial connection.

Put another way:

https://docs.google.com/spreadsheet/ccc?key=0AmT8yNIP0VUQdGpubnUxX0g0RGVMZWN4NTY4U01IcHc&usp=sharing
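
A rough sketch of that idea, for illustration only: the server folds every incremental update into one full dataset, hands a copy to each new connection, and otherwise broadcasts the same incremental to everyone. `FeedServer`, `replay`, the "FULL"/"DIFF" tags and the dictionary payloads are all hypothetical stand-ins for real FeedMessages.

```python
from typing import Dict, List, Tuple

Outbox = List[Tuple[str, int, Dict[str, str]]]  # (kind, index, payload)


class FeedServer:
    """Keeps one full dataset current; no per-client state after connect."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.index = 0
        self.clients: List[Outbox] = []

    def connect(self) -> Outbox:
        # A new client immediately gets the full dataset as of the current index.
        outbox: Outbox = [("FULL", self.index, dict(self.state))]
        self.clients.append(outbox)
        return outbox

    def publish(self, changes: Dict[str, str]) -> None:
        self.state.update(changes)
        self.index += 1
        for outbox in self.clients:
            outbox.append(("DIFF", self.index, dict(changes)))


def replay(outbox: Outbox) -> Dict[str, str]:
    """Client side: a full dataset replaces everything, a diff is merged in."""
    state: Dict[str, str] = {}
    for kind, _index, payload in outbox:
        if kind == "FULL":
            state = dict(payload)
        else:
            state.update(payload)
    return state


server = FeedServer()
c1 = server.connect()
server.publish({"v1": "a"})
server.publish({"v2": "b"})
c2 = server.connect()  # late joiner, starts from the full dataset at index 2
server.publish({"v1": "c"})

print(replay(c1) == replay(c2) == server.state)
```

The late joiner never sees differentials 1 and 2, yet ends up with the same state as the first client.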

Does that seem practical?

Brian Ferris

Jun 6, 2013, 12:01:58 PM

to gtfs-r...@googlegroups.com

Ok, after discussing this offline, I'd make the following proposal.

"Servers will send a full-dataset on initial connection, followed by subsequent incremental updates. Servers are free to periodically send new full-datasets at their discretion. Any such full-dataset supersedes and replaces all incremental updates received up to that point. Clients cannot explicitly request a new full-dataset, but they may disconnect and reconnect, which will trigger the sending of a new full-dataset by the server."

Brian Ferris

Jul 11, 2013, 9:04:43 AM

to gtfs-r...@googlegroups.com

So I've been playing with GTFS-realtime over WebSockets and I've got a couple of things to add to the discussion:

1) I've updated the onebusaway-gtfs-realtime-exporter to support producing incremental GTFS-realtime updates via WebSockets:

Source Code: https://github.com/OneBusAway/onebusaway-gtfs-realtime-exporter/wiki

Example Code with Incremental Support: http://developer.onebusaway.org/modules/onebusaway-gtfs-realtime-exporter/current-SNAPSHOT/

2) I've updated the onebusaway-gtfs-realtime-from-nextbus-cli tool (converts NextBus API data to GTFS-realtime) to use the new exporter with incremental support:

Source Code: https://github.com/OneBusAway/onebusaway-gtfs-realtime-from-nextbus-cli

3) I've updated the onebusaway-gtfs-realtime-visualizer demo tool (visualizes GTFS-realtime vehicle positions on a map) to consume incremental GTFS-realtime updates:

Source Code: https://github.com/OneBusAway/onebusaway-gtfs-realtime-visualizer

Basically, I've got example code for both producing and consuming incremental GTFS-realtime data over websockets.

At this point, I'm interested in testing my code against Stefan's to see if we both agree on what should be coming over the pipe ; ) Also, I'd like to play around with WebSockets in more real-world scenarios to see how they hold up in practice against disconnects, firewalls, etc. Hopefully I can get the OneBusAway SIRI=>GTFS-realtime converter updated with incremental support as well.

Thanks,

Brian

Jim Mittler

Jul 22, 2013, 10:47:10 AM

to gtfs-r...@googlegroups.com

Hello, I managed to stumble across this thread and am very happy to have found it.

Server-Sent Events (HTML5 EventSource) may be a more straightforward protocol for your needs than websockets. It seems to behave nicely with proxies and firewalls. Since the subscriber isn't sending any data back to the publisher, the one-way paradigm may be a better match.

It would make sense to me to send the full dataset as the first payload, then follow with incrementals. This could be the default behavior, with something different offered via query parameters.

I would love to see something like this available as an alternative to polling a server on an interval. It's nice to be able to say "send me all your updates!" (but only when something actually changes). In practice you may need some sort of throttling behavior. I'm not sure how often the positional data updates internally; for other EventSources we've had some success limiting updates to one "delta" payload per second.
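
One way to get the "one delta payload per second" behaviour is to coalesce changes per entity and flush at most once per interval. A minimal sketch, assuming updates are keyed by an entity id and that only the latest value per id matters to a subscriber; `DeltaThrottle` and its methods are made-up names, and the clock is passed in explicitly so the example is deterministic.

```python
from typing import Dict, Optional


class DeltaThrottle:
    """Coalesces updates and releases at most one delta per min_interval."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self.pending: Dict[str, str] = {}
        self.last_sent: Optional[float] = None

    def update(self, entity_id: str, payload: str) -> None:
        # A newer value for the same entity overwrites the older one.
        self.pending[entity_id] = payload

    def poll(self, now: float) -> Optional[Dict[str, str]]:
        """Return the delta to send now, or None if there is nothing to send yet."""
        if not self.pending:
            return None
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return None
        delta, self.pending = self.pending, {}
        self.last_sent = now
        return delta


throttle = DeltaThrottle(min_interval=1.0)
throttle.update("v1", "a")
sent_first = throttle.poll(0.0)   # nothing sent before, goes out immediately

throttle.update("v1", "b")
throttle.update("v2", "c")
throttle.update("v1", "d")
sent_early = throttle.poll(0.4)   # too soon, held back
sent_late = throttle.poll(1.0)    # interval elapsed, one coalesced delta
sent_idle = throttle.poll(5.0)    # nothing changed, nothing to send
```

Three updates arriving within the same second leave as a single payload, and quiet periods produce no traffic at all.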

If you are sending JSON, then you can do some really fun stuff like sending "diffs" and patching them back together on the browser side.
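
As a sketch of what such diffs could look like (written in Python here rather than browser-side JavaScript, to keep the examples in one language): the diff below is a shallow, ad hoc format over top-level keys, not the standard JSON Patch format, and `make_diff` / `apply_diff` are hypothetical helper names.

```python
import json
from typing import Any, Dict


def make_diff(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow diff: keys to set (added or changed) and keys to remove."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "remove": sorted(k for k in old if k not in new),
    }


def apply_diff(state: Dict[str, Any], diff: Dict[str, Any]) -> Dict[str, Any]:
    """Patch a copy of state with a diff produced by make_diff."""
    patched = {k: v for k, v in state.items() if k not in diff["remove"]}
    patched.update(diff["set"])
    return patched


old = {
    "v1": {"lat": 52.0, "lon": 4.3},
    "v2": {"lat": 52.1, "lon": 4.4},
}
new = {
    "v2": {"lat": 52.2, "lon": 4.4},  # moved
    "v3": {"lat": 52.5, "lon": 4.9},  # appeared
}                                     # v1 disappeared

diff = make_diff(old, new)
wire = json.dumps(diff)               # what would travel to the subscriber
patched = apply_diff(old, json.loads(wire))
print(patched == new)
```

Only the changed and removed keys go over the wire; the receiver rebuilds the full picture by patching its previous copy.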

Jim

Kurt Raschke

Oct 4, 2013, 8:41:06 PM

to gtfs-r...@googlegroups.com

If anyone is looking for an additional consumer to test GTFS-realtime over WebSocket, I've developed a patch for OneBusAway which allows it to act as a WebSocket consumer:

https://github.com/kurtraschke/onebusaway-application-modules/compare/gtfs-rt-ws

I have tested this with onebusaway-gtfs-realtime-exporter and it worked as expected.

-Kurt

Og Crudden

Oct 4, 2014, 4:23:33 PM

to gtfs-r...@googlegroups.com

Hi,

I like this method as it will allow the consumer to choose between polling or publish/subscribe. Has anyone tried to implement this method?

Cheers,

Sean.

Stefan de Konink

Oct 5, 2014, 12:46:12 PM

to gtfs-r...@googlegroups.com

On Sat, 4 Oct 2014, Og Crudden wrote:

> Hi, I like this method as it will allow the consumer to choose between polling
> or publish/subscribe. Has anyone tried to implement this method?

This is the default for websockets. Hence either do a GET download, or a
Connection Upgrade to websockets.

Stefan

Og Crudden

Oct 7, 2014, 4:38:26 PM

to gtfs-r...@googlegroups.com

Hi Stefan,

I like the publish/subscribe aspect of PubSubHubbub for a few reasons, one of these being that the distribution responsibilities are given to a hub. This would, I guess, help with any firewall or disconnect issues mentioned by Brian Ferris in a later post.

Cheers,

Sean.

Stefan de Konink

Oct 7, 2014, 4:40:51 PM

to gtfs-r...@googlegroups.com

On Tuesday, October 7, 2014 10:38:26 PM CEST, Og Crudden wrote:
> I like the publish/subscribe aspect of PubSubHubbub for a few
> reasons, one of these being that the distribution responsibilities
> are given to a hub. This would, I guess, help with any firewall
> or disconnect issues mentioned by Brian Ferris in a later post.

You just like the message broker idea :) I like it too, but I don't want to
tie myself to yet another technology.

Stefan
