Tilde-delimited
===============
+ current years-old implementation, works and well supported by
various listings grabbers
+ export/mapping to XMLTV well supported and heavily used
+ machine-parsable fields
+ one .dat file per channel
- difficult to extend with new fields tacked on to end of each
programme record (row)
- difficult to improve existing fields (e.g. programme timings)
without breaking existing implementations - data fields essentially
fixed
- data needs to be parsed and converted to valid XMLTV before it can be used
- inconsistent data formats in certain fields (e.g. season/episode numbering)
- raw data not human-friendly
- some +1 channels supplied natively have had different programme
content in the past
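A minimal Python sketch of why a fixed, positional tilde-delimited record is hard to extend. The field names and sample records are hypothetical and chosen for illustration. They are not the actual Radio Times .dat column order.

```python
# Hypothetical field layout for illustration; not the actual RT .dat column order.
FIELDS = ("title", "sub_title", "episode", "date", "start", "end")


def parse_record(line):
    """Split one tilde-delimited programme record into named fields by position."""
    values = line.rstrip("\n").split("~")
    if len(values) < len(FIELDS):
        raise ValueError(f"expected at least {len(FIELDS)} fields, got {len(values)}")
    # Meaning comes purely from position: zip() ignores extra trailing fields,
    # but inserting or reordering a field silently shifts every later value.
    return dict(zip(FIELDS, values))


record = parse_record("Newsnight~~~23/01/2012~22:30~23:20\n")
extended = parse_record("Newsnight~~~23/01/2012~22:30~23:20~NEW-FIELD")  # still parses
shifted = parse_record("Newsnight~EXTRA~~~23/01/2012~22:30~23:20")  # "start" now holds the date
```

Tacking a field on the end keeps old parsers working, but any change in the middle of the record breaks them without an error. This is why the fields are "essentially fixed".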
JSON
====
+ direct access to Atlas API without need for data being provided in
intermediate format/service to grabbers
+ potential for access to richest set of data available
+ extra/new data fields easily provided in JSON reply
+ little additional burden on metabroadcast team (compared with an XMLTV service)
- data needs to be parsed and converted to valid XMLTV before it can be used
- grabbers will need to be extended to support new fields as and when
they appear in data
- may need JSON libraries to be packaged for distros providing XMLTV packages
- data in Atlas feed may differ from that provided in current XMLTV
service implementation - this will likely cause issues for software
that uses programme titles/episodes/descriptions for duplicate matching.
- heaviest rewriting for existing grabbers
XMLTV
=====
+ data supplied in ready-to-use valid XMLTV format
+ no further processing needed (although may be desirable for some
fields), resulting in much faster listings grabbing
+ programme timings available in ISO 8601 format - no GMT/BST uncertainty
+ grabbers will not need to be extended to support new fields unless
they are further reprocessing them
+ fixed element formats - no uncertainty for fields currently
providing data in several different formats
- relies on metabroadcast team to configure and ensure output
validates against XMLTV DTD (and possible future XMLTV schema) and
also to add new
- less rewriting for existing grabbers
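To illustrate the GMT/BST point, here is a Python sketch that writes a minimal XMLTV `<programme>` element whose times carry an explicit UK offset, in the XMLTV DTD's "YYYYMMDDhhmmss +hhmm" style. The sketch hard-codes the current UK summer-time rule (BST from 01:00 UTC on the last Sunday in March to 01:00 UTC on the last Sunday in October) instead of using a time zone database. The helper names and the channel id are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Midnight UTC on the last Sunday of the given month."""
    first_of_next = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)


def uk_offset(utc):
    """+1h during BST (01:00 UTC last Sunday of March to last Sunday of October), else 0."""
    bst_start = last_sunday(utc.year, 3).replace(hour=1)
    bst_end = last_sunday(utc.year, 10).replace(hour=1)
    return timedelta(hours=1) if bst_start <= utc < bst_end else timedelta(0)


def xmltv_time(utc):
    """Format an aware UTC datetime as 'YYYYMMDDhhmmss +hhmm' in UK local time."""
    local = utc.astimezone(timezone(uk_offset(utc)))
    return local.strftime("%Y%m%d%H%M%S %z")


def programme(channel, title, start, stop):
    """Build a minimal XMLTV <programme> element from UTC start/stop times."""
    prog = ET.Element("programme", start=xmltv_time(start), stop=xmltv_time(stop),
                      channel=channel)
    ET.SubElement(prog, "title", lang="en").text = title
    return prog


# "bbc2.example" is a hypothetical channel id.
p = programme("bbc2.example", "Newsnight",
              datetime(2012, 7, 2, 21, 30, tzinfo=timezone.utc),
              datetime(2012, 7, 2, 22, 20, tzinfo=timezone.utc))
print(ET.tostring(p, encoding="unicode"))
```

Because every timestamp states its own offset, a consumer never has to guess whether a wall-clock time was GMT or BST. That guess matters most around the October changeover hour, which occurs twice in local time.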
I think everyone would like as rich and consistent data as possible.
The current delimited data service has worked well for several years
since the move from scraping the Radio Times, but the growing
enthusiasm concerning metabroadcast's Atlas system and how it could
provide even richer data for the UK's XMLTV users makes providing more
structured data seem sensible. As to whether this takes the form of
JSON or XML - if the same data were available in either format I
would probably come down on the side of XML (if providing both is not
possible), purely because the data is immediately usable by any
XMLTV-compliant consuming application without requiring further
processing.
Along with the XMLTV project, I am interested in looking at further
extending the XMLTV DTD (and possibly moving to an easily validated
XML Schema) to support additional data that consumers and applications
could use to give their users a richer experience. Moving over to an
XML or JSON-based service would permit such new elements to be added
easily in the future.
Cheers,
Nick
tv_grab_uk_rt dev
--
Nick Morrott
MythTV Official wiki: http://mythtv.org/wiki/
MythTV users list archive: http://www.gossamer-threads.com/lists/mythtv/users
"An investment in knowledge always pays the best interest." - Benjamin Franklin
Thanks for such a thoughtful email. Jonathan is literally up a
mountain this week, but fortunately we were discussing this before he
went away, so I'll give some initial response from us with apologies
in advance if I misunderstand any details, or get anything horribly
wrong.
On 23 January 2012 13:07, Nick Morrott <knowled...@gmail.com> wrote:
> Tilde-delimited
> ===============
>
> + current years-old implementation, works and well supported by
> various listings grabbers
> + export/mapping to XMLTV well supported and heavily used
> + machine-parsable fields
> + one .dat file per channel
>
> - difficult to extend with new fields tacked on to end of each
> programme record (row)
> - difficult to improve existing fields (e.g. programme timings)
> without breaking existing implementations - data fields essentially
> fixed
> - data needs to be parsed and converted to valid XMLTV before it can be used
> - inconsistent data formats in certain fields (e.g. season/episode numbering)
> - raw data not human-friendly
> - some +1 channels supplied natively have had different programme
> content in the past
We would like to gracefully retire the tilde-separated format. Main
reasons for us are that it's non-standard, and hard to extend/improve.
Obviously there will need to be a proper transition, and we will give
due notice to the list.
> JSON
> ====
>
> + direct access to Atlas API without need for data being provided in
> intermediate format/service to grabbers
> + potential for access to richest set of data available
> + extra/new data fields easily provided in JSON reply
> + little additional burden on metabroadcast team - XMLTV service
>
> - data needs to be parsed and converted to valid XMLTV before it can be used
> - grabbers will need to be extended to support new fields as and when
> they appear in data
> - may need JSON libraries to be packaged for distros providing XMLTV packages
> - data in Atlas feed may differ from that provided in current XMLTV
> service implementation - this will likely cause issues for software
> that uses programme titles/episodes/descriptions for duplicate matching.
> - heaviest rewriting for existing grabbers
Our JSON API is the first place our new features are implemented. We
like JSON because it is easy to read, extensible and particularly easy to
use in browsers. Our plan is to extend this to include basic listings
info without the need for an API key, i.e. a superset of the data
currently available in the tilde-separated format. The JSON also
contains all kinds of other goodness, including on-demand locations,
images, channel details, with much more in the pipeline.
We currently have parallel XML and RDF/XML feeds with very similar
contents, but we see these as secondary to the JSON, and they will
tend to lag behind a bit for new features.
Both JSON and XMLTV could well be possible.
Aside from our APIs, we also offer a range of custom feeds for
specific platforms, including formats specified by Google, Microsoft,
the BBC, the Radioplayer initiative and the current RadioTimes
tilde-separated format.
We're big fans of XMLTV. Tom and I were at university with, then
worked in a startup with Ed Avis (who wrote the original XMLTV spec).
We would love to add XMLTV as a further feed output from Atlas. This
could eventually allow us to provide data from other sources and
countries too. But to add XMLTV we'd like some help from the
community. Ideally someone would volunteer to write and maintain the
XMLTV feed adaptor, although we're willing to give lots of help as
well as running an instance of the code.
So, that's our thoughts. Keen to hear others, and Jonathan will likely
pick up for us in a week, when he gets off that mountain.
Cheers,
Chris
> Please remember that not everyone is using your data feed in
> browsers! There will be a great many people who will read the data
> stream and store the data in some other format (e.g. I store it in a
> MySQL database) - if you are doing this on a server then JSON format
> is *not* helpful ;-)
You appear to have the misconception that JSON is a browser-only format.
I'm slightly confused why you think a better format, which allows you to quickly grab the fields you want with human-readable code, is harder:
Table("Field name") = JSON("Wanted Field Name")
Given that you have the field names as well as the values, you can usually iterate across all the fields really easily.
Feeds that describe themselves are just better to read and debug.
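A small Python version of the one-liner above, storing JSON fields into table rows by name. Each value carries its field name, so a consumer can pick the columns it wants or walk every field generically. The sample document and the table columns are hypothetical and do not reflect the actual Atlas schema.

```python
import json

# Hypothetical feed excerpt; field names are illustrative, not the Atlas schema.
SAMPLE = """
{"items": [
  {"title": "Newsnight", "channel": "bbc2", "start": "2012-01-23T22:30:00Z", "duration": 3000},
  {"title": "Film 2012", "channel": "bbc1", "start": "2012-01-23T23:35:00Z",
   "duration": 1800, "image": "http://example.com/film.jpg"}
]}
"""

# Columns of a hypothetical database table
COLUMNS = ("title", "channel", "start", "duration")


def to_rows(doc):
    """Map each programme object onto the table columns by field name."""
    rows = []
    for item in json.loads(doc)["items"]:
        # Unknown extra fields (e.g. "image") are ignored, and a missing field
        # becomes None rather than shifting the other columns along.
        rows.append({col: item.get(col) for col in COLUMNS})
    return rows


# Because the feed is self-describing, every field can also be walked generically.
for item in json.loads(SAMPLE)["items"]:
    for name, value in item.items():
        print(f"{name} = {value!r}")

rows = to_rows(SAMPLE)
```

The same `rows` could then be handed to a database driver's parameterised insert. New fields added to the feed later do not disturb this code.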
> Ok so you do plan to support other feed formats apart from just JSON
> and XML. In which case is there any problem in retaining the tilde-
> separated RT format as one of those options? You don't have to keep
> adding new data items as and when you think of them - think of the
> tilde-delimited format as a "basic" data feed service.
and
> The tilde-delim format is there already (so there is no up-front
> development cost) and if you freeze the content then there is no
> ongoing costs either. Please keep this format going!
The problem with this "well, just leave it as it is" approach is that at some point you will want to make a change that affects (or could affect) all the feeds, and any change like that needs further testing.
All I'm saying is that when you're doing production systems the "well just leave that one" is invariably the thing that gets missed when stuff breaks.
If people are parsing on Topfields I can understand the desire for simpler, more brittle formats, but for any modern language, and I can even include Perl in this, there are parsers for JSON and XML. Personally I avoid delimited data any time a more formalised structure is available.
G
This is possible in the Atlas API, and we will be expanding the range
of queries that are possible. See here:
http://atlas.metabroadcast.com/#apiExplorer
> On Jan 25, 9:45 am, Adam Sutton <a...@adamsutton.me.uk> wrote:
> [...]
>> especially if there could be some efficient means within atlas to determine
>> whether programming has changed for a given interval. Meaning incremental
>> updates would be possible.
>
> I concur; incremental updates would be a big help and a significant
> advance, esp. for the RT data. At the moment most systems restrict
> themselves to downloading once a day (due to the volume of data and
> load on the source) but then have to process a whole 7 day's worth of
> programme data even though only 1 day's worth is actually new.
We can do this via HTTP headers. I think these are currently quite
conservative, changing once per day for all data. We could make this
more selective, updating headers only for files that changed.
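Here is a sketch of how a grabber could use such per-file headers with a standard HTTP conditional GET. The grabber sends `If-Modified-Since` with the `Last-Modified` value saved from its previous download and treats `304 Not Modified` as "nothing new". The URL is hypothetical, and the sketch assumes the server honours `If-Modified-Since`.

```python
import urllib.error
import urllib.request


def build_request(url, last_modified=None):
    """GET request that is conditional on the file having changed since last_modified."""
    req = urllib.request.Request(url)
    if last_modified:
        # Echo back the Last-Modified value saved from the previous download.
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_if_changed(url, last_modified=None):
    """Return (body, last_modified); body is None when the server replies 304."""
    try:
        with urllib.request.urlopen(build_request(url, last_modified), timeout=30) as resp:
            return resp.read(), resp.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, last_modified
        raise


# Hypothetical URL and saved header value:
req = build_request("http://example.com/listings/bbc1.dat",
                    "Mon, 23 Jan 2012 13:07:00 GMT")
```

If `Last-Modified` only changes when a file's content changes, unchanged days cost a single short round trip each.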
> Some mechanism whereby
> (a) only additions can be selected for download
> (b) only changes can be selected
> would IMO be a great bonus and enable a new raft of useful
> functionality in the target systems, e.g. ability to retrieve last
> minute updates to broadcast schedules - which is one of the biggest
> complaints from end-users of EPG systems!
Data in the main Atlas API updates throughout the day, with a pretty
short latency between a provider update and the data changing in our
feed. Custom feeds such as the tilde-delimited format are generated on
a fixed timetable, and fast updates are harder to accommodate. In the
future we might be able to add a change feed to the API, so users can
quickly get updates.
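Until a change feed exists, a client could approximate (a) "only additions" and (b) "only changes" itself by comparing successive downloads. Below is a minimal Python sketch. It assumes each programme can be keyed by channel and start time, and the record layout and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# A programme is keyed by (channel id, start time); values are its other fields.
Key = Tuple[str, str]


def diff_schedules(old: Dict[Key, dict], new: Dict[Key, dict]):
    """Compare two downloads and return (additions, changes, removals)."""
    additions = {k: v for k, v in new.items() if k not in old}
    changes = {k: v for k, v in new.items() if k in old and old[k] != v}
    removals: List[Key] = sorted(k for k in old if k not in new)
    return additions, changes, removals


# Hypothetical sample data
yesterday = {
    ("bbc1", "2012-01-24T21:00Z"): {"title": "Example Drama"},
    ("bbc1", "2012-01-24T22:00Z"): {"title": "News"},
}
today = {
    ("bbc1", "2012-01-24T21:00Z"): {"title": "Example Drama"},
    ("bbc1", "2012-01-24T22:00Z"): {"title": "News", "desc": "Extended bulletin"},
    ("bbc1", "2012-01-30T21:00Z"): {"title": "Example Drama"},
}
added, changed, removed = diff_schedules(yesterday, today)
```

Keying on start time means a programme that moves slot shows up as one removal plus one addition. A stable programme identifier from the feed, if one is available, would avoid that.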
I'm keen to hear more thoughts on what other formats we should
support, beyond our existing Atlas-specific JSON, XML and RDF/XML.
There is clearly some support for maintaining the current
tilde-delimited format. Would people be keen to see XMLTV, or other
formats?
Chris
That's basically how all descendants of swedb work. See tv_grab_se_swedb.
You can easily generate output in xmltv format plus an index on a
central server.
I have just extended NonameTV to support the OzTivo style data
availability signaling to our API. This addition allows clients to avoid asking
the web server about updates on a file-by-file basis.
See http://www.oztivo.net/twiki/bin/view/TVGuide/StaticXMLGuideAPI
Here's how we do it in NonameTV.
We write out one list of all channels in xmltv format.
For each channel we write one xmltv file per channel per day, compress
it and see if it is different from the last one (mind the timestamp
in the compressed format).
If it has changed we move it to the live feed; if it's unchanged we
simply delete it.
The timestamps of last modification of all files are then collected and
added to the channel list.
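The publishing steps above can be sketched in Python as follows. Setting the gzip header `mtime` to 0 handles the "mind the timestamp" point. Without it, two compressions of identical XML differ byte for byte. The directory layout, file naming and index shape are hypothetical, not NonameTV's actual ones.

```python
import gzip
import tempfile
from pathlib import Path


def publish_day(xml, name, staging, live):
    """Compress one channel-day XMLTV file; move it live only if it changed."""
    # mtime=0 fixes the timestamp in the gzip header, so identical XML
    # always compresses to identical bytes and can be compared directly.
    data = gzip.compress(xml, mtime=0)
    staged = staging / (name + ".xml.gz")
    staged.write_bytes(data)
    target = live / staged.name
    if target.exists() and target.read_bytes() == data:
        staged.unlink()  # unchanged: delete it, live file keeps its old mtime
        return False
    # New or changed: rename over the live copy (assumes same filesystem).
    staged.replace(target)
    return True


def file_timestamps(live):
    """Last-modification times of all live files, for adding to the channel list."""
    return {p.name: int(p.stat().st_mtime) for p in sorted(live.glob("*.xml.gz"))}


with tempfile.TemporaryDirectory() as root:
    staging, live = Path(root, "staging"), Path(root, "live")
    staging.mkdir()
    live.mkdir()
    results = [publish_day(xml, "bbc1.example_2012-01-23", staging, live)
               for xml in (b"<tv/>", b"<tv/>", b"<tv><programme/></tv>")]
    index = file_timestamps(live)
```

A live file's modification time therefore moves only when its content changes. This is what makes the collected timestamps useful to clients.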
Now we just need to write a new client that supports the options for
caching data.
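On the client side, a caching grabber could compare the published timestamps with the ones it recorded last time. It would then download only new or changed files and drop files the server no longer lists. This sketch represents the index as a hypothetical mapping from file name to timestamp, not the exact OzTivo format.

```python
def files_to_fetch(server_index, cached_index):
    """Files that are new on the server or newer than the cached copy."""
    return sorted(
        name for name, stamp in server_index.items()
        if name not in cached_index or cached_index[name] < stamp
    )


def files_to_discard(server_index, cached_index):
    """Cached files no longer listed by the server (e.g. days now in the past)."""
    return sorted(set(cached_index) - set(server_index))


# Hypothetical indexes: file name -> last-modified time (Unix seconds)
server = {"bbc1_2012-01-23.xml.gz": 1327312000,
          "bbc1_2012-01-24.xml.gz": 1327398400,
          "bbc1_2012-01-30.xml.gz": 1327398400}
cache = {"bbc1_2012-01-22.xml.gz": 1327225600,
         "bbc1_2012-01-23.xml.gz": 1327312000,
         "bbc1_2012-01-24.xml.gz": 1327312000}

fetch = files_to_fetch(server, cache)
discard = files_to_discard(server, cache)
```

A daily run then transfers only the new last day plus any day that was actually revised, instead of the whole week.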
Regards,
Karl
Atlas is a data aggregator. We don't employ teams of people to actually write the listings, but one of the ways we add value is by automatically matching feeds from various sources. Atlas is also used as a foundation for some of our other products (like Voila, which provides many things, including watchlists, recommendations and buzz charts).
> While you are trying to "add value" please don't lose sight of the
> core purpose, namely providing programme data in a simple easy to use
> format.
We won't. We see that as the foundation on which everything else is built. It's important to us that Atlas is as easy to use as possible.
> And before you stop the tilde-delim feed can I ask you what analysis
> you have done to check its demand?
Before any decision is made to do this, we will monitor usage of the different formats over an extended period. Obviously everyone using the data for XMLTV is currently using the format and we know how many requests that is.
> Perhaps I am just 'old school' but delimited format has served us well
> in computing for over 45 years. It's simple fast and efficient.
A good old split in Perl has certainly been used by me plenty of times over the years ;)
> The tilde-delim format is there already (so there is no up-front
> development cost) and if you freeze the content then there is no
> ongoing costs either. Please keep this format going!
Unfortunately these things are never that simple. Systems change and things need to be kept up to date even if their outputs are not changing. If we do decide to retire the tilde format, Atlas is open source and we would be very happy if someone stepped in as the maintainer for that output format.
> Thanks for reading
Thanks for the suggestions, I have been reading and collating everything since I got back. It's great there's been so much to get through!
Cheers
Jonathan
We agree and would love to see this happen.
Anyone out there with good Java skills fancy having a crack at it?
Cheers
Jonathan
That's very interesting Karl. Thanks for sharing that and the results of the S3 testing.
I think it would make sense for any future XMLTV feed from Atlas to follow the same structure.
Cheers
Jonathan