Query Performance

Adam Sutton

Mar 28, 2012, 10:32:32 AM
to atla...@googlegroups.com
I'm starting to use my atlas->(e)xmltv script in more realistic environments and I'm realising that I may well need to revisit the performance issues.

I'm doing 14-day grabs of about 126 channels (possibly a bit excessive, but it's a nearly full Freesat grab), using publisher-specific and PA data (which I'm then combining myself). However, it's (sometimes) taking an age to grab all the data.

As a specific example, querying 7 days of data from BBC One with this query:


is taking 5-6 seconds.

This could mean as much as 10s per channel, which brings the total grab time up to about 20 minutes (often it's more).
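
As a quick check of that estimate, here is the arithmetic as a minimal sketch. It assumes the channels are fetched strictly one after another; the function name is made up for illustration.

```python
def sequential_grab_minutes(channels, seconds_per_channel):
    """Total time in minutes if channels are fetched one after another."""
    return channels * seconds_per_channel / 60


print(sequential_grab_minutes(126, 10))  # prints 21.0
```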

I intend to mitigate this a bit by doing a nightly 14-day grab plus more frequent 24-hour grabs to pick up potential last-minute changes. However, even this takes more time than I'd like.

Is this the level of performance you would expect, and am I simply fighting a losing battle? Or is there something slowing things down, or should I reorganise my queries for better performance?

I've tried multiple channels (for the same publisher) for a single day, a single channel for a single day, and a single channel for multiple days (my current approach).

Any thoughts?

Fred van den Driessche

Apr 2, 2012, 11:00:26 AM
to atla...@googlegroups.com
Hi Adam,

Unfortunately the performance of the schedule end-point is not going to change any time soon. We're currently investigating ways to improve it but these modifications won't happen for a few months. 

We generally find that requesting a single channel-day at a time, and parallelising those requests as much as possible, is the best approach. The series_summary and brand_summary annotations will also have an effect, especially over a long schedule interval. You may find that it's better to maintain a temporary cache of brands/series that is populated from the content end-point as required.
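
A minimal sketch of the channel-day approach, for illustration only. It assumes the caller supplies a fetch_channel_day(channel, day) function that performs the actual schedule request; that function, the other names and the default worker count are hypothetical and not part of the Atlas API.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from itertools import product


def channel_days(channels, start, days):
    """Return one (channel, day) pair for every channel-day in the window."""
    dates = [start + timedelta(days=i) for i in range(days)]
    return list(product(channels, dates))


def grab_schedule(channels, start, days, fetch_channel_day, max_workers=32):
    """Fetch each channel-day as its own request, running them in parallel.

    fetch_channel_day is a caller-supplied (hypothetical) function that
    takes a channel and a date and returns that day's schedule.
    """
    jobs = channel_days(channels, start, days)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with jobs.
        results = pool.map(lambda job: fetch_channel_day(*job), jobs)
        return dict(zip(jobs, results))
```

Because every job is a single channel-day, one slow response only delays that job rather than a whole multi-day query.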

Cheers,
Fred

Adam Sutton

Apr 2, 2012, 12:07:36 PM
to atla...@googlegroups.com
Thanks for the feedback Fred,

I had already figured out that using a parallel approach improved the performance significantly. I think I've now got 32 threads (I haven't really played around with that number much; it was a first stab in the dark), which means no more than 4 channels handled per thread.

This has brought the times down to something like 30s for 1 day and 5 minutes for 14 days.
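
To illustrate the "no more than 4 channels per thread" figure, here is a rough sketch that deals 126 channels out across 32 threads. The round-robin split and the placeholder channel names are assumptions for the example, not necessarily how the script divides the work.

```python
def partition(channels, threads):
    """Deal channels out round-robin so group sizes differ by at most one."""
    return [channels[i::threads] for i in range(threads)]


channels = [f"channel-{n}" for n in range(126)]  # placeholder names
groups = partition(channels, 32)
print(max(len(g) for g in groups))  # prints 4
```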

While I appreciate what you say about the brand_summary and series_summary options, these were mainly added at my own request in the hope they would generally speed things up. Previously I was doing exactly what you mentioned and was caching all the brand and series info. The problem was that (at least in initial grabs) this was very slow due to the sheer number of queries that needed to be performed (but again, I wasn't doing this in parallel, so possibly some improvement could be had there).

Generally I prefer the idea of having Atlas do this work for me. However, because publisher overlaying is not yet done (I'm having to do it client side), there appear to be some issues with brand/series matching that cause my EPG system some headaches, and if I went back to doing the brand/series caching on my side I might improve this.

But for now it's just about good enough for my needs as is, so I'll stick with what I've got for a bit.
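
For reference, a client-side brand/series cache that is safe to share between worker threads could look roughly like this. It is a sketch under assumptions: fetch_container stands in for whatever content end-point lookup the client performs, and keying records by URI is also an assumption.

```python
import threading


class ContainerCache:
    """In-memory cache of brand/series records, safe to share across threads."""

    def __init__(self, fetch_container):
        self._fetch = fetch_container  # hypothetical caller-supplied lookup
        self._items = {}
        self._lock = threading.Lock()
        self.misses = 0  # number of lookups that had to go to the server

    def get(self, uri):
        with self._lock:
            if uri in self._items:
                return self._items[uri]
        # Fetch outside the lock so lookups for different URIs can overlap.
        record = self._fetch(uri)
        with self._lock:
            self.misses += 1
            # If another thread stored this URI first, keep its copy.
            return self._items.setdefault(uri, record)
```

Two threads that miss on the same URI at the same moment will both fetch it; that costs one redundant query but keeps the locking simple.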

Adam

Fred van den Driessche

Apr 10, 2012, 6:40:28 AM
to atla...@googlegroups.com
Hi Adam,

We're looking at putting something together in the shorter term that should speed up these queries for you. We'll let you know when we have something in place.

Cheers,
Fred

Adam Sutton

May 28, 2012, 6:05:39 AM
to atla...@googlegroups.com
Hi Fred,

Not sure whether any progress has been made on this? I have made further improvements to my script and generally the performance is much better now (the main change was that I was rather stupidly opening a new TCP connection for each query; now I open one connection at the start of each thread). But I'm not sure whether any of the performance improvement is down to changes at your end.
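
The connection-per-thread change can be sketched roughly as follows. This is an illustration, not the actual script: connect and request are caller-supplied placeholders, and the thread pool's initializer hook is used to open one connection when each worker thread starts.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # holds one connection per worker thread


def fetch_all(paths, connect, request, max_workers=32):
    """Run request(conn, path) for every path, reusing one connection per thread.

    connect() opens a connection and request(conn, path) performs one query
    on it; both are hypothetical callables supplied by the caller.
    """
    def open_connection():
        # Runs once in each worker thread, before it handles any path.
        _local.conn = connect()

    def run(path):
        return request(_local.conn, path)

    with ThreadPoolExecutor(max_workers=max_workers,
                            initializer=open_connection) as pool:
        return list(pool.map(run, paths))
```

With the standard library, connect could be something like lambda: http.client.HTTPConnection(host), with request calling conn.request("GET", path) and then reading conn.getresponse(); the host and paths would be whatever the script already uses. The sketch does not close the connections when the pool shuts down.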

Regards
Adam

Fred van den Driessche

May 29, 2012, 9:34:15 AM
to atla...@googlegroups.com
Hi Adam,

Nothing on production that would have an impact on schedule performance has changed yet. We're aiming to make anything we provide in the short term compatible with the v4 API. Glad that you've found a way to speed things up.

Fred