Schedule data order

ageh...@gmail.com

Aug 18, 2016, 9:28:01 AM
to A gathering place for the Open Rail Data community
Hey,

With the schedule data, the sequence number is obviously important to ensure that each schedule file is processed in order, but do the messages inside each file also have to be processed in order, or is that not important?

I'm trying to split my processing across worker processes, but that becomes pretty difficult when the messages have to be processed in order, so I'm hoping not :)

Peter Hicks

Aug 18, 2016, 9:29:55 AM
to ageh...@gmail.com, A gathering place for the Open Rail Data community
Hello

Yes, they need to be processed in order within a file.  You might get a 'delete' for a particular UID/start date/STP indicator, and then a 'new' for the same one.

Have you looked at calculating a hash of the UID/start date/STP indicator such that it returns a value from 1 to 'n', where 'n' is the number of threads you have processing, and load-balancing based on that?
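
A minimal sketch of that hashing idea in Python. The function name, the argument names and the use of CRC32 are assumptions made for illustration; any hash that is stable between runs would do.

```python
import zlib


def worker_for(uid: str, start_date: str, stp: str, n_workers: int) -> int:
    """Map a UID/start date/STP indicator key to a worker number from 1 to n_workers."""
    # Join the three parts with a separator so that different keys
    # cannot collapse into the same string.
    key = f"{uid}|{start_date}|{stp}".encode("utf-8")
    # zlib.crc32 gives the same value on every run, unlike Python's
    # built-in hash() for strings, which is randomised per process.
    return zlib.crc32(key) % n_workers + 1


# The same key always lands on the same worker, so a 'delete' followed
# by a 'new' for that key is seen by one worker, in file order.
first = worker_for("A12345", "2016-01-01", "P", 4)
second = worker_for("A12345", "2016-01-01", "P", 4)
print(first == second)
```

The worker number only decides where a message goes. Each worker still has to handle its own messages in the order they appeared in the file.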


Peter

--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To post to this group, send an email to openrail...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Adrian Hooper

Aug 18, 2016, 9:41:25 AM
to A gathering place for the Open Rail Data community, ageh...@gmail.com
Ah shame, that's what I was afraid of.
I'll take a look at something like that, although even with that identifier, surely processing in the right order is still pretty awkward? If, say, 10 threads are processing 10 messages, they've got to know between them which one is first, and which other thread is processing the one before/after?

Peter Hicks

Aug 18, 2016, 9:45:12 AM
to Adrian Hooper, A gathering place for the Open Rail Data community
Hi Adrian

I think many users and consumers of the schedule feeds probably process them in a single thread.  But if you're using multiple threads and you send all transactions for, say, A12345 starting on 2016-01-01 with STP indicator P to thread #1 and only thread #1, it'll be the only thread that needs to apply the create/delete/revise transactions for that unique key in order.

Alternatively, you could load-balance across threads using the UID, which might be easier to do.
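
As a rough sketch of how that could look in Python: one FIFO queue per worker thread, with each transaction routed by a hash of its UID. The tuple layout `(action, uid, start_date, stp, payload)`, the action names and the in-memory dictionaries are hypothetical stand-ins for real schedule records and a real database.

```python
import queue
import threading
import zlib


def worker_index(uid: str, n_workers: int) -> int:
    """Pick a worker (0-based) from the UID alone, using a run-stable hash."""
    return zlib.crc32(uid.encode("utf-8")) % n_workers


def process(transactions, n_workers=4):
    """Apply transactions in parallel while keeping per-UID ordering."""
    queues = [queue.Queue() for _ in range(n_workers)]
    stores = [{} for _ in range(n_workers)]  # one private store per worker

    def work(q, store):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                break
            action, uid, start_date, stp, payload = item
            key = (uid, start_date, stp)
            if action == "delete":
                store.pop(key, None)
            else:  # "create" or "revise"
                store[key] = payload

    threads = [threading.Thread(target=work, args=(q, s))
               for q, s in zip(queues, stores)]
    for t in threads:
        t.start()
    # Dispatch in file order; each queue is FIFO, so every UID's
    # transactions reach its worker in the original sequence.
    for txn in transactions:
        queues[worker_index(txn[1], n_workers)].put(txn)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()

    merged = {}
    for s in stores:
        merged.update(s)
    return merged


result = process([
    ("create", "A12345", "2016-01-01", "P", "v1"),
    ("delete", "A12345", "2016-01-01", "P", None),
    ("create", "A12345", "2016-01-01", "P", "v2"),
    ("create", "B00001", "2016-01-01", "P", "x"),
])
print(result)
```

Because every transaction for a given UID goes through the same queue, no thread needs to know what the others are doing.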


Peter



Adrian Hooper

Aug 18, 2016, 9:47:33 AM
to A gathering place for the Open Rail Data community, ageh...@gmail.com
Ah right, that makes sense. So the order is only really important for the same identifiers. Alright, well, I'll try it out in a single thread, see how well it performs, and then look at splitting it if I need to.
I'm not really expecting a huge amount of processing time for a daily update; they don't appear to be massive from what I've seen.