Schedule data order

ageh...@gmail.com

Aug 18, 2016, 9:28:01 AM

to A gathering place for the Open Rail Data community

Hey,

With the schedule data, the sequence number is obviously important to ensure that each schedule is processed in order, but do the messages inside each schedule also have to be processed in order, or is this not important?

I'm trying to split my processing across worker processes, but that becomes pretty difficult when the messages have to be processed in order, so I'm hoping not :)

Peter Hicks

Aug 18, 2016, 9:29:55 AM

to ageh...@gmail.com, A gathering place for the Open Rail Data community

Hello

Yes, they need to be processed in order within a file. You might get a 'delete' for a particular UID/start date/STP indicator, and then a 'new' for the same one.
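
As a rough illustration of why order matters within a file, here is a minimal Python sketch. The record layout (a dict with `transaction`, `uid`, `start_date` and `stp` fields) and the `apply` helper are hypothetical, invented for this example, and are not the actual feed format.

```python
# Hypothetical in-memory store keyed by (UID, start date, STP indicator).
def apply(store, record):
    key = (record["uid"], record["start_date"], record["stp"])
    if record["transaction"] == "delete":
        store.pop(key, None)   # remove the schedule if it is present
    else:                      # "new": add the schedule
        store[key] = record
    return store

delete = {"transaction": "delete", "uid": "A12345",
          "start_date": "2016-01-01", "stp": "P"}
new = {"transaction": "new", "uid": "A12345",
       "start_date": "2016-01-01", "stp": "P"}

in_order = {}
for rec in (delete, new):      # the order they appear in the file
    apply(in_order, rec)

swapped = {}
for rec in (new, delete):      # the same records, processed out of order
    apply(swapped, rec)

print(len(in_order))  # 1: the schedule exists after delete-then-new
print(len(swapped))   # 0: the new schedule was wrongly removed
```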

Have you looked at calculating a hash of the UID/start date/STP indicator such that it returns a value from 1 to 'n', where 'n' is the number of threads you have processing, and load-balancing based on that?
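
A minimal sketch of that hashing idea in Python might look like the following. The function name `worker_for` and the `|` separator are assumptions made for illustration. It uses `hashlib` rather than the built-in `hash()`, because `hash()` on strings is randomised per process by default and so would not give the same answer in every worker process.

```python
import hashlib

def worker_for(uid: str, start_date: str, stp: str, n_workers: int) -> int:
    """Map a schedule key to a worker number from 1 to n_workers."""
    key = f"{uid}|{start_date}|{stp}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Treat the first 8 bytes as an integer; modulo gives 0..n-1, so add 1.
    return int.from_bytes(digest[:8], "big") % n_workers + 1

# The same key always maps to the same worker, so its transactions stay in order.
print(worker_for("A12345", "2016-01-01", "P", 4))
```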

Peter

Adrian Hooper

Aug 18, 2016, 9:41:25 AM

to A gathering place for the Open Rail Data community, ageh...@gmail.com

Ah shame, that's what I was afraid of.

I'll take a look at something like that. Although even with that identifier, surely processing in the right order is still pretty awkward? If say 10 threads are processing 10 messages, they've got to know between them which one is first, and which other thread is processing the one before/after?

Peter Hicks

Aug 18, 2016, 9:45:12 AM

to Adrian Hooper, A gathering place for the Open Rail Data community

Hi Adrian

I think many users and consumers of the schedule feeds probably process them in a single thread. But if you're using multiple threads and you send all transactions for, say, A12345 starting on 2016-01-01 with STP indicator P to thread #1 and only thread #1, it'll be the only thread that needs to process the create/delete/revise transactions for that unique key in order.

Alternatively, you could load-balance across threads using the UID, which might be easier to do.
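
One way this could be sketched in Python is with one FIFO queue per worker thread, routing on the UID alone. The record layout, `N_WORKERS`, `worker_index` and `run` are all hypothetical names for illustration, and the "processing" is just a stand-in that records what each worker saw.

```python
import queue
import threading
import zlib

N_WORKERS = 4  # assumed number of worker threads

def worker_index(uid: str) -> int:
    # crc32 is stable between runs, unlike the built-in hash() for strings.
    return zlib.crc32(uid.encode("utf-8")) % N_WORKERS

def run(records):
    queues = [queue.Queue() for _ in range(N_WORKERS)]
    processed = [[] for _ in range(N_WORKERS)]  # what each worker handled

    def worker(i):
        while True:
            rec = queues[i].get()
            if rec is None:            # sentinel: no more work
                break
            processed[i].append(rec)   # stand-in for real processing

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(N_WORKERS)]
    for t in threads:
        t.start()
    # Dispatch in file order; each queue is FIFO, so per-UID order is kept.
    for rec in records:
        queues[worker_index(rec["uid"])].put(rec)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return processed

records = [
    {"transaction": "delete", "uid": "A12345"},
    {"transaction": "new", "uid": "B67890"},
    {"transaction": "new", "uid": "A12345"},
]
processed = run(records)
```

Because every record for a given UID lands on the same queue, the threads never need to coordinate with each other about which transaction comes first.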

Peter

Adrian Hooper

Aug 18, 2016, 9:47:33 AM

to A gathering place for the Open Rail Data community, ageh...@gmail.com

Ah right, that makes sense. So the order is only really important for the same identifiers. Alright, well, I'll try it out in a single thread, see how well it performs, and then look at splitting it if I need to.

I'm not really expecting a huge amount of processing time for a daily update; they don't appear to be massive from what I've seen.
