On Monday, 28 May 2012 at 09:26, Adam Sutton wrote:
How frequently do you ingest data from your sources (BBC, PA, etc.)? Or do they push data to you as and when it changes?
Hi Adam, we ingest from our main sources every 15 minutes. In the case of PA this covers any changes they've made in that period regardless of day; for the BBC it's changes made to today's schedule. We also refresh 14 days' worth of BBC data every two hours.
These are the current timings and are always subject to change, usually to make things run more frequently rather than less, though we are often bound by constraints on the source system.
I'm starting to look at the complexities of merging multiple data sources, notably XMLTV, my own data (based on Atlas) and EIT. Ultimately I think XMLTV vs. my script will be a no-go (at least initially), as the two are basically different representations of the same data (albeit mine is clearly much better ;) ). But there will be people who still want to use EIT data for picking up last-minute scheduling changes.
As you say, the XMLTV data and the PA data are the same: it's the same underlying data set.
However, it might still be possible to do this with my script alone, as long as Atlas is getting the updates as soon as they're made (or at least before they become relevant, i.e. before the show airs or would have stopped, etc.). My import for short periods is quick enough to be usable as long as I know the data is there.
For today this should be the case, assuming the overrun or schedule change is known before we get very close to the start or end of the programme.
Also, any further thoughts on having something in place to detect "changes" and remove the need to pull the entire schedule every time? It could be something as simple (forgive the use of the word, I mean from a high-level perspective) as allowing the user to pass a time to the schedule.json endpoint, so that only schedule changes since that time are returned. It's easy enough for me to record the time just before I make the request (with a small margin for time differences). This would greatly reduce the amount of data that needs to be shifted for most queries, though I guess it also depends as much on the database performance?
Nothing to report here yet. But as Chris mentioned in another thread, we are starting to work towards an even more scalable architecture for Atlas, which should make a big difference here.
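For reference, the client side of the "changes since" idea could look roughly like the sketch below. This is only an illustration of the proposal, not something Atlas supports today: the `updated_since` and `channel` parameter names, the placeholder base URL and the two-minute default margin are all assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Placeholder host: not the real Atlas endpoint.
BASE_URL = "https://atlas.example.invalid/schedule.json"


def build_changes_query(last_request_started, channel,
                        margin=timedelta(minutes=2)):
    """Build a URL asking only for schedule changes since the last request.

    last_request_started: timezone-aware datetime recorded just before the
        previous request was sent.
    channel: channel identifier (hypothetical parameter).
    margin: safety margin subtracted to allow for clock differences between
        client and server; the default is an arbitrary illustrative value.
    """
    # Step back by the margin and normalise to UTC before formatting.
    since = (last_request_started - margin).astimezone(timezone.utc)
    params = {
        "channel": channel,
        # "updated_since" is a hypothetical parameter name.
        "updated_since": since.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return BASE_URL + "?" + urlencode(params)


# Record the time just before making a request, then reuse it next time.
request_started = datetime.now(timezone.utc)
next_url = build_changes_query(request_started, "bbcone")
```

Subtracting the margin means a few unchanged items may be fetched twice, but that is cheaper than missing a change because the two clocks disagree slightly.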
Cheers
Jonathan