Changing the XMLTV service data format (XML/JSON)


Nick Morrott

Jan 23, 2012, 8:07:20 AM
to atla...@googlegroups.com
Some initial thoughts (please add further thoughts inline) on the
recent discussions about the current/proposed data formats for the
XMLTV service going forwards:


Tilde-delimited
===============

+ current years-old implementation, works and well supported by
various listings grabbers
+ export/mapping to XMLTV well supported and heavily used
+ machine-parsable fields
+ one .dat file per channel

- difficult to extend with new fields tacked on to end of each
programme record (row)
- difficult to improve existing fields (e.g. programme timings)
without breaking existing implementations - data fields essentially
fixed
- data needs to be parsed and converted to valid XMLTV before it can be used
- inconsistent data formats in certain fields (e.g. season/episode numbering)
- raw data not human-friendly
- some +1 channels supplied natively have had different programme
content in the past
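To make the "machine-parsable fields" point concrete, here is a minimal sketch of parsing one tilde-delimited programme record in Python. The field layout below is invented for illustration; the real feed has many more fields in a fixed order:

```python
# Illustrative field layout only -- NOT the actual RT field order.
FIELDS = ["title", "sub_title", "episode", "channel", "start", "end"]

def parse_record(line):
    """Split one row on '~' and key the values by (assumed) field names."""
    return dict(zip(FIELDS, line.rstrip("\n").split("~")))

record = parse_record("Doctor Who~~The Eleventh Hour~BBC One~20:00~21:05")
print(record["title"], record["start"])  # Doctor Who 20:00
```

The simplicity is the attraction, but it also shows the extension problem: any field added mid-row shifts every later column for existing parsers.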


JSON
====

+ direct access to Atlas API without need for data being provided in
intermediate format/service to grabbers
+ potential for access to richest set of data available
+ extra/new data fields easily provided in JSON reply
+ little additional burden on metabroadcast team compared with running an XMLTV service

- data needs to be parsed and converted to valid XMLTV before it can be used
- grabbers will need to be extended to support new fields as and when
they appear in data
- may need JSON libraries to be packaged for distros providing XMLTV packages
- data in Atlas feed may differ from that provided in current XMLTV
service implementation - this will likely cause issues for software
that uses programme titles/episodes/descriptions for duplicate matching.
- heaviest rewriting for existing grabbers


XMLTV
=====

+ data supplied in ready-to-use valid XMLTV format
+ no further processing needed (although may be desirable for some
fields), resulting in much faster listings grabbing
+ programme timings available in ISO8601 format - no GMT/BST uncertainty
+ grabbers will not need to be extended to support new fields unless
they are further reprocessing them
+ fixed element formats - no uncertainty for fields currently
providing data in several different formats

- relies on metabroadcast team to configure and ensure output
validates against XMLTV DTD (and possible future XMLTV schema) and
also to add new elements

+ less rewriting for existing grabbers
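The ISO8601-style timings point above is worth spelling out: XMLTV start/stop attributes carry an explicit UTC offset ("YYYYMMDDhhmmss +HHMM"), so there is no GMT/BST guesswork. A minimal Python sketch:

```python
from datetime import datetime, timezone

def parse_xmltv_time(value):
    """Parse an XMLTV start/stop attribute such as '20120723200000 +0100'.
    The explicit offset removes any GMT/BST ambiguity."""
    return datetime.strptime(value, "%Y%m%d%H%M%S %z")

start = parse_xmltv_time("20120723200000 +0100")  # 8pm during BST
print(start.astimezone(timezone.utc))  # 2012-07-23 19:00:00+00:00
```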

I think everyone would like as rich and consistent data as possible.
The current delimited data service has worked well for several years
since the move from scraping the Radio Times, but the growing
enthusiasm concerning metabroadcast's Atlas system and how it could
provide even richer data for the UK's XMLTV users makes providing more
structured data seem sensible. As to whether this takes the form of
JSON or XML - if the same data were available in either format I
would probably come down on the side of XML (if providing both is not
possible), purely for the fact that the data is immediately usable by any
XMLTV-compliant consuming application without requiring further
processing.

Along with the XMLTV project, I am interested in looking at further
extending the XMLTV DTD (and possibly moving to an easily validated
XML Schema) to support additional data that consumers and applications
could use to give their users a richer experience. Moving over to an
XML or JSON-based service would permit such new elements to be added
easily in the future.

Cheers,
Nick
tv_grab_uk_rt dev

--
Nick Morrott

MythTV Official wiki: http://mythtv.org/wiki/
MythTV users list archive: http://www.gossamer-threads.com/lists/mythtv/users

"An investment in knowledge always pays the best interest." - Benjamin Franklin

Chris Jackson

Jan 23, 2012, 1:48:56 PM
to atla...@googlegroups.com
Nick,

Thanks for such a thoughtful email. Jonathan is literally up a
mountain this week, but fortunately we were discussing this before he
went away, so I'll give some initial response from us with apologies
in advance if I misunderstand any details, or got anything horribly
wrong.

On 23 January 2012 13:07, Nick Morrott <knowled...@gmail.com> wrote:

> Tilde-delimited
> ===============
>
> + current years-old implementation, works and well supported by
> various listings grabbers
> + export/mapping to XMLTV well supported and heavily used
> + machine-parsable fields
> + one .dat file per channel
>
> - difficult to extend with new fields tacked on to end of each
> programme record (row)
> - difficult to improve existing fields (e.g. programme timings)
> without breaking existing implementations - data fields essentially
> fixed
> - data needs to be parsed and converted to valid XMLTV before it can be used
> - inconsistent data formats in certain fields (e.g. season/episode numbering)
> - raw data not human-friendly
> - some +1 channels supplied natively have had different programme
> content in the past

We would like to gracefully retire the tilde-separated format. Main
reasons for us are that it's non-standard, and hard to extend/improve.
Obviously there will need to be a proper transition, and we will give
due notice to the list.

> JSON
> ====
>
> + direct access to Atlas API without need for data being provided in
> intermediate format/service to grabbers
> + potential for access to richest set of data available
> + extra/new data fields easily provided in JSON reply
> + little additional burden on metabroadcast team - XMLTV service
>
> - data needs to be parsed and converted to valid XMLTV before it can be used
> - grabbers will need to be extended to support new fields as and when
> they appear in data
> - may need JSON libraries to be packaged for distros providing XMLTV packages
> - data in Atlast feed may differ from that provided in current XMLTV
> service implementation - this will likely cause issues for software
> that uses programme titles/episodes/descriptions for duplicate matching.
> - heaviest rewriting for existing grabbers

Our JSON API is the first place our new features are implemented. We
like JSON because it is easy to read, extensible, particularly easy to
use in browsers. Our plan is to extend this to include basic listings
info without the need for an API key, i.e. a superset of the data
currently available in the tilde-separated format. The JSON also
contains all kinds of other goodness, including ondemand locations,
images, channel details, with much more in the pipeline.

We currently have parallel XML and RDF/XML feeds with very similar
contents, but we see these as secondary to the JSON, and they will
tend to lag behind a bit for new features.

Both JSON and XMLTV could well be possible.

Aside from our APIs, we also offer a range of custom feeds for
specific platforms, including formats specified by Google, Microsoft,
the BBC, the Radioplayer initiative and the current RadioTimes
tilde-separated format.

We're big fans of XMLTV. Tom and I were at university with, then
worked in a startup with Ed Avis (who wrote the original XMLTV spec).
We would love to add XMLTV as a further feed output from Atlas. This
could eventually allow us to provide data from other sources and
countries too. But to add XMLTV we'd like some help from the
community. Ideally someone would volunteer to write and maintain the
XMLTV feed adaptor, although we're willing to give lots of help as
well as running an instance of the code.

So, that's our thoughts. Keen to hear others, and Jonathan will likely
pick up for us in a week, when he gets off that mountain.

Cheers,
Chris

spitfires

Jan 24, 2012, 4:09:37 AM
to Atlas

On Jan 23, 1:07 pm, Nick Morrott <knowledgejun...@gmail.com> wrote:
>
> Tilde-delimited
> - data needs to be parsed and converted to valid XMLTV before it can be used

Not necessarily; it depends on what you are using it for. If you are
assuming that all EPG data will only ever be fed into XMLTV databases
then you might have a point, but many of us use programme listing
systems which aren't based on XMLTV (shock horror! ;-) ). For me the
delimited format is ideal.



On Jan 23, 6:48 pm, Chris Jackson <ch...@metabroadcast.com> wrote:
>
> We would like to gracefully retire the tilde-separated format. Main
> reasons for us are that it's non-standard, and hard to extend/improve.

Please don't. Yes it's hard to extend etc. but for many applications
it does the job (and admirably so). I can parse a complete programme
entry in 1 line of program code, and it works on every server without
the need to have any special libraries (i.e. no XML or JSON parsers)
installed.

w.r.t. extensibility, there is no need to have all the so-called
"rich" metadata in a delimited feed. Perhaps the delimited feed could
be retained as a 'lite' data service, for those of us who don't want
all the extra bumph?



> Our JSON API is the first place our new features are implemented. We
> like JSON because it is easy to read, extensible, particularly easy to
> use in browsers.

Please remember that not everyone is using your data feed in
browsers! There will be a great many people who will read the data
stream and store the data in some other format (e.g. I store it in a
MySQL database) - if you are doing this on a server then JSON format
is *not* helpful ;-)

Therein lies another question... I'm a little confused as to where
Atlas sees itself going. Is it merely a data provider or do you
intend it being a listings *service*. Hmm that's not very clear... is
Atlas simply an agglomerator of raw data (like the RT data stream), or
do you intend to be a service provider (like Digiguide)?

While you are trying to "add value" please don't lose sight of the
core purpose, namely providing programme data in a simple easy to use
format.



> Aside from our APIs, we also offer a range of custom feeds for
> specific platforms, including formats specified by Google, Microsoft,
> the BBC, the Radioplayer initiative and the current RadioTimes
> tilde-separated format.

Ok so you do plan to support other feed formats apart from just JSON
and XML. In which case is there any problem in retaining the tilde-
separated RT format as one of those options? You don't have to keep
adding new data items as and when you think of them - think of the
tilde-delimited format as a "basic" data feed service.

And before you stop the tilde-delim feed can I ask you what analysis
you have done to check its demand?



I'm not saying don't have XML, JSON, or whatever output format - hey
I'll probably use XML *as well as* tilde-delimited for different
purposes - but don't scrap the delimited format just because it's not
trendy.

Here's a parallel: think of the number of different spreadsheet data
formats out there... yet what do we often fall back on when we want to
transfer data from one to another? Why comma-separated-values of
course! It's not 'clever', it's not 'elegant', but it works.

Perhaps I am just 'old school' but delimited format has served us well
in computing for over 45 years. It's simple fast and efficient.

The tilde-delim format is there already (so there is no up-front
development cost) and if you freeze the content then there is no
ongoing costs either. Please keep this format going!

Thanks for reading,
Geoff

The Gareth

Jan 24, 2012, 5:58:26 AM
to atla...@googlegroups.com

On 24 Jan 2012, at 09:09, spitfires wrote:

> Please remember that not everyone is using your data feed in
> browsers! There will be a great many people who will read the data
> stream and store the data in some other format (e.g. I store it in a
> MySQL database) - if you are doing this on a server then JSON format
> is *not* helpful ;-)

You appear to have the misconception that JSON is a browser-only format.

I'm slightly confused why you think a better format, one which allows you to quickly grab the fields you want with human-readable code, is harder:

Table("Field name") = JSON("Wanted Field Name")

Given that you have the field names as well as the values, you can usually iterate across all the fields really easily.

Feeds that describe themselves are just better to read and debug.
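That point can be made concrete: in a self-describing format, fields are picked out by name, so an unknown new field is simply ignored rather than shifting every later column. A small Python sketch (the field names are illustrative, not the real Atlas schema):

```python
import json

# Hypothetical programme record; field names are made up for illustration.
raw = '{"title": "Doctor Who", "channel": "BBC One", "brand_new_field": "safely ignored"}'
programme = json.loads(raw)

# Access by name is order-independent, so new fields break nothing.
title = programme["title"]

# The field names travel with the values, which makes debugging easy.
for name, value in programme.items():
    print(name, "=", value)
```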

> Ok so you do plan to support other feed formats apart from just JSON
> and XML. In which case is there any problem in retaining the tilde-
> separated RT format as one of those options? You don't have to keep
> adding new data items as and when you think of them - think of the
> tilde-delimited format as a "basic" data feed service.

and

> The tilde-delim format is there already (so there is no up-front
> development cost) and if you freeze the content then there is no
> ongoing costs either. Please keep this format going!

The problem with "well just leave it as it is" is that at some point you want to make a change that affects/could affect all the feeds, and any changes like that need to be further tested.

All I'm saying is that when you're doing production systems the "well just leave that one" is invariably the thing that gets missed when stuff breaks.

If people are parsing on Topfields I can understand the desire for simpler, more brittle formats, but for any modern language, and I can even include Perl in this, there are parsers for JSON and XML. Personally I avoid delimited data any time a more formalised structure is available.

G

spitfires

Jan 24, 2012, 7:35:05 AM
to Atlas

On Jan 24, 10:58 am, The Gareth <thegar...@gmail.com> wrote:
> You appear to have the misconception that JSON is a browser-only format.

No, I was replying to Chris who suggested their reason for having JSON
as their primary output format was because it is easier to use in
browsers. I was countering that not everyone uses browsers, and for
many of those people JSON is not the best format to use.


> I'm slightly confused why you think a better format, which allows you to quickly grab the fields you want with human readable code is harder:
> Table("Field name") = JSON("Wanted Field Name")
> Given that you have the field names as well as the values, you can usually iterate across the entire fields really easily.

...As you can with a simple array of field labels.

Data = Record[ indexof [ Labels("Wanted Field Name") ]]

Sure the labels array has to change if the feed content changes but
that's not your point ;-) You seem to think that parsing delimited
data is all squiggles and unreadable code; it can be just as human-
readable (at the cost of only 1 more line of code).

What's the primary purpose of the data feed - is it to be "human
readable" or is it for automatic parsing? If it's the latter then I
would suggest that human readability isn't that important.


> Feeds that describe themselves, are just better to read and debug.

Have you tried reading JSON without a prettifier?! ;-)


> [...] for any modern language, and I can even include Perl in this, there are parsers for JSON and XML.

But why *force* people to install libraries to parse a feed if they
don't need/want to? KISS?


> Personally I avoid delimited data any time a more formalised structure is available.

*nods* Horses for courses: sometimes I use a *less* formalised
structure such as XML, other times I'll use a simple delimited
format. It all depends on which is best for the task.

Cheers,
Geoff

Adam Sutton

Jan 25, 2012, 4:45:00 AM
to atla...@googlegroups.com
All,

I've just started a new thread as I'm interested in looking at writing something to handle the Atlas native output and generate XMLTV-formatted output (at least initially), though currently this requires a license key, which may or may not be a problem going forward.

Would it not make sense, as has previously been suggested, to leave the delimited format data alone (at least medium term) for those that want to use it, and to investigate writing a native Atlas grabber?

I can see the benefits in leaving the existing data formats for those that already use them. Increase in data size is generally a minor issue, especially if there could be some efficient means within atlas to determine whether programming has changed for a given interval. Meaning incremental updates would be possible.

The richer format/source of data within Atlas is likely to be beneficial in the long run for many applications. Certainly my needs require a much more robust set of data (which may even mean XMLTV needs tweaking).

Anyway,

Just my two pence!

spitfires

Jan 25, 2012, 6:35:44 AM
to Atlas

I agree. I can foresee a time when there is a much larger data set
available and we will reach the point where people will want the
ability to select which data items they want rather than having to
download a dump of the whole schedule, especially if they only want
part of it.

Perhaps grouping the data set into logical subsets which can then be
selected for download? Either as alternatives (e.g. set_A or set_B)
or cumulatives (e.g. subset_Core + subset_C + ...).

On Jan 25, 9:45 am, Adam Sutton <a...@adamsutton.me.uk> wrote:
[...]
> especially if there could be some efficient means within atlas to determine
> whether programming has changed for a given interval. Meaning incremental
> updates would be possible.

I concur; incremental updates would be a big help and a significant
advance, esp. for the RT data. At the moment most systems restrict
themselves to downloading once a day (due to the volume of data and
load on the source) but then have to process a whole 7 day's worth of
programme data even though only 1 day's worth is actually new.

Some mechanism whereby
(a) only additions can be selected for download
(b) only changes can be selected
would IMO be a great bonus and enable a new raft of useful
functionality in the target systems, e.g. ability to retrieve last
minute updates to broadcast schedules - which is one of the biggest
complaints from end-users of EPG systems!


(Caveat: apologies if this is already there/possible in the XML feeds
- I only use the raw RT feed).

Chris Jackson

Jan 25, 2012, 2:36:56 PM
to atla...@googlegroups.com
On 25 January 2012 11:35, spitfires <spitfire...@gmail.com> wrote:
>
> I agree.  I can foresee a time when there is a much larger data set
> available and we will reach the point where people will want the
> ability to select which data items they want rather than having to
> download a dump of the whole schedule, especially if they only want
> part of it.

This is possible in the Atlas API, and we will be expanding the range
of queries that are possible. See here:

http://atlas.metabroadcast.com/#apiExplorer

> On Jan 25, 9:45 am, Adam Sutton <a...@adamsutton.me.uk> wrote:
> [...]
>> especially if there could be some efficient means within atlas to determine
>> whether programming has changed for a given interval. Meaning incremental
>> updates would be possible.
>
> I concur; incremental updates would be a big help and a significant
> advance, esp. for the RT data.  At the moment most systems restrict
> themselves to downloading once a day (due to the volume of data and
> load on the source) but then have to process a whole 7 day's worth of
> programme data even though only 1 day's worth is actually new.

We can do this via HTTP headers. I think these are currently quite
conservative, changing once per day for all data. We could make this
more selective, updating headers only for files that changed.
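The header-based approach can be sketched as follows: the client records when it last fetched each file and compares that against the server's Last-Modified header. This is only an illustration of the idea; a real client would normally send If-Modified-Since and act on a 304 Not Modified reply:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def needs_refresh(last_modified_header, cached_at):
    """True if the server's copy is newer than our cached download."""
    return parsedate_to_datetime(last_modified_header) > cached_at

cached = datetime(2012, 1, 25, 6, 0, tzinfo=timezone.utc)
print(needs_refresh("Wed, 25 Jan 2012 12:00:00 GMT", cached))  # True
```

With per-file headers updated only on real changes, a grabber could skip most channel files on most runs.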

> Some mechanism whereby
> (a) only additions can be selected for download
> (b) only changes can be selected
> would IMO be a great bonus and enable a new raft of useful
> functionality in the target systems, e.g. ability to retrieve last
> minute updates to broadcast schedules - which is one of the biggest
> complaints from end-users of EPG systems!

Data in the main Atlas API updates throughout the day, with a pretty
short latency between a provider update and the data changing in our
feed. Custom feeds such as the tilde-delimited format are generated on
a fixed timetable, and fast updates are harder to accommodate. In the
future we might be able to add a change feed to the API, so users can
quickly get updates.

I'm keen to hear more thoughts on what other formats we should
support, beyond our existing Atlas-specific JSON, XML and RDF/XML.
There is clearly some support for maintaining the current
tilde-delimited format. Would people be keen to see XMLTV, or other
formats?

Chris


--
ch...@metabroadcast.com -- +44 7967 756705

Adam Sutton

Jan 25, 2012, 7:02:09 PM
to atla...@googlegroups.com
With regards to the incremental updates, this definitely requires changes to the upstream data to make it work (in my opinion). I include in my RT XMLTV grabber script the option to do an incremental update. Basically I carve up the data into days and then process only the days where changes are detected. However, since I have to compare the entire contents (there is no per-day timestamp from the source), it actually didn't end up any quicker. Some of this was probably due to my implementation, which was trying to cope with the potential for the output data being missing / not properly matching the input.

Much of this complication would go away if the upstream data was easy to break down and check for updates etc... With regards to the XMLTV format, to make it at all efficient I think you'd need to keep caches of the actual output XML, and then the final output would be a collection of all the cached (and newly generated) files. If that makes any sense.

Or an entirely new architecture is required that actually deals in updates rather than just full EPG streams.

Actually the main issue is that the RT grabber is horribly inefficient, which is why it is run so infrequently (this is not an inherent limitation, just poor coding). The XMLTV grabber takes about 1hr to grab data on my PVR machine (old kit); my own version takes 2-3 mins. This at least makes it feasible to run fairly frequently.


dazzle

Jan 26, 2012, 5:15:11 AM
to Atlas
Personally I can see extra problems with supplying an XMLTV feed
directly.

If you supply an XMLTV file per channel then my grabber would either
have to download each file I wanted then combine them into another
XMLTV file, or I would have to grab the XMLTV data from each file I
wanted and generate another combined XMLTV file.

The way around this would be to make an http request which then
generated an XMLTV file with all the channels I wanted for the number
of days I wanted and I just download the single file.
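That single-request idea amounts to putting the channel list and day count in the query string. A hypothetical sketch (the endpoint and parameter names are invented for illustration, not a real Atlas API):

```python
from urllib.parse import urlencode

BASE = "http://atlas.example.com/xmltv"  # hypothetical endpoint

def build_listing_url(channels, days):
    """Build one request for a combined XMLTV file covering several
    channels over a number of days, instead of one file per channel."""
    return BASE + "?" + urlencode({"channels": ",".join(channels), "days": days})

print(build_listing_url(["bbcone", "bbctwo"], 7))
# http://atlas.example.com/xmltv?channels=bbcone%2Cbbctwo&days=7
```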

The other problem with XMLTV is if other, richer datafields were being
included, for example multiple star-ratings or reviews, and I didn't
want any of those fields then I would have to go through the XMLTV
file removing the elements I don't want and saving the resultant file
as a different XMLTV file.

The way around that would be similar to the one above: an http request
specifying which data I wanted.

For me XMLTV is the final format I want not the beginning format. In
JSON format you could include all the data available, my grabber goes
off gets the data I want and creates an XMLTV file from that - I would
be going from JSON to XMLTV rather than XMLTV to XMLTV (which seems
rather superfluous).

I already use JSON, using Ruby, from the RottenTomatoes API
(http://developer.rottentomatoes.com) to create star-ratings for films in my
XMLTV file and it is quick and simple to manage - I get the data, save
it into an sqlite3 database and then match and add the star-ratings
element into the relevant programme elements in my XMLTV file (JSON -
sqlite3 - XML rather than XML - sqlite3 - XML).
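That JSON -> sqlite3 -> XML flow can be sketched in a few lines of Python (the ratings data here is made up, not a real RottenTomatoes response):

```python
import json
import sqlite3

# Made-up ratings payload standing in for an API response.
ratings_json = '[{"title": "Alien", "score": 97}, {"title": "Avatar", "score": 82}]'

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ratings (title TEXT PRIMARY KEY, score INTEGER)")
db.executemany("INSERT INTO ratings VALUES (:title, :score)",
               json.loads(ratings_json))

def star_rating(title):
    """Look up a film and format its score for a <star-rating> element."""
    row = db.execute("SELECT score FROM ratings WHERE title = ?",
                     (title,)).fetchone()
    return None if row is None else "%d/100" % row[0]

print(star_rating("Alien"))  # 97/100
```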

Also most modern browsers have native JSON support built-in via
JSON.parse() and just about all programming languages have a JSON
library, it is also a smaller datafeed which makes it more portable
across different platforms (desktop pc, smartphone, tablet, smart tv),
not all of them accessing the internet via fixed line broadband.

I think JSON should be the base format and if anyone wants to
contribute code to Atlas to create XMLTV and / or tilde delimited
formats then they could.

To the question of how much data to supply: I foresee two possible
JSON formats - a free basic dataset which would be equivalent to the
current RT tilde delimited format, and then another more data rich
format which is behind an API key which could be paid (a subscription
method) in the same way SchedulesDirect does for US / Canadian TV data
(http://www.schedulesdirect.org/).

Anyway just my thoughts,

Paul

Karl Dietz

Jan 26, 2012, 3:58:37 AM
to atla...@googlegroups.com
On 26.01.2012 01:02, Adam Sutton wrote:
> With regards to the incremental updates this definitely requires changes
> to the upstream data to make it work (in my opinion). I include in my RT
> XMLTV grabber script to option to do an incremental update. Basically I
> carve up the data into days and then process only where no changes are
> detected. However since I have to basically compare the entire contents
> (since there is no per day timestamp from source) it actually didn't end
> up any quicker. Some of this was probably due to my implementation which
> was trying to cope with potential for the output data being missing /
> not properly matching the input.
>
> Much of this complication would go away if the upstream data was easy to
> break down and check for updates etc... With regards to the XMLTV format
> to make it at all efficient I think you'd need to keep caches of the
> actual output XML and then the final output would be a collection of all
> the cached (and generated files). If that makes any sense.

That's basically how all descendants of swedb work. See tv_grab_se_swedb.

You can easily generate output in xmltv format plus an index on a
central server.

I have just extended NonameTV to support the OzTivo style data
availability signaling to our API. This addition allows clients to avoid asking
the web server about updates on a file-by-file basis.

See http://www.oztivo.net/twiki/bin/view/TVGuide/StaticXMLGuideAPI


Here's how we do it in NonameTV.

We write out one list of all channels in xmltv format.
For each channel we write one xmltv file per channel per day, compress
it and see if it is different from the last one. (mind the time stamp
in the compressed format)
If it has changed we move it to the live feed; if it's unchanged we
simply delete it.

The timestamps of last modification of all files are then collected and
added to the channel list.


Now we just need to write a new client that supports the options for
caching data.

Regards,
Karl

Jonathan Tweed

Feb 7, 2012, 12:37:59 PM
to atla...@googlegroups.com
On Tuesday, 24 January 2012 at 09:09, spitfires wrote:
> Therein lies another question... I'm a little confused as to where
> Atlas sees itself going. Is it merely a data provider or do you
> intend it being a listings *service*. Hmm that's not very clear... is
> Atlas simply an agglomerator of raw data (like the RT data stream), or
> do you intend to be a service provider (like Digiguide)?


Atlas is a data aggregator. We don't employ teams of people to actually write the listings, but one of the ways we add value is by automatically matching feeds from various sources. Atlas is also used as a foundation for some of our other products (like Voila, which provides many things, including watchlists, recommendations and buzz charts).

> While you are trying to "add value" please don't lose sight of the
> core purpose, namely providing programme data in a simple easy to use
> format.


We won't. We see that as the foundation on which everything else is built. It's important to us that Atlas is as easy to use as possible.


> And before you stop the tilde-delim feed can I ask you what analysis
> you have done to check its demand?


Before any decision is made to do this, we will monitor usage of the different formats over an extended period. Obviously everyone using the data for XMLTV is currently using the format and we know how many requests that is.

> Perhaps I am just 'old school' but delimited format has served us well
> in computing for over 45 years. It's simple fast and efficient.


A good old split in Perl has certainly been used by me plenty of times over the years ;)



> The tilde-delim format is there already (so there is no up-front
> development cost) and if you freeze the content then there is no
> ongoing costs either. Please keep this format going!


Unfortunately these things are never that simple. Systems change, and things need to be kept up to date even if their outputs are not changing. If we do decide to retire the tilde format, Atlas is open source and we would be very happy if someone stepped in as the maintainer for that output format.

> Thanks for reading

Thanks for the suggestions, I have been reading and collating everything since I got back. It's great there's been so much to get through!

Cheers
Jonathan

Jonathan Tweed

Feb 7, 2012, 12:40:42 PM
to atla...@googlegroups.com
On Thursday, 26 January 2012 at 10:15, dazzle wrote:
> I think JSON should be the base format and if anyone wants to
> contribute code to Atlas to create XMLTV and / or tilde delimited
> formats then they could.


We agree and would love to see this happen.

Anyone out there with good Java skills fancy having a crack at it?

Cheers
Jonathan


Jonathan Tweed

Feb 7, 2012, 12:42:08 PM
to atla...@googlegroups.com
On Thursday, 26 January 2012 at 08:58, Karl Dietz wrote:
> See http://www.oztivo.net/twiki/bin/view/TVGuide/StaticXMLGuideAPI

That's very interesting Karl. Thanks for sharing that and the results of the S3 testing.

I think it would make sense for any future XMLTV feed from Atlas to follow the same structure.

Cheers
Jonathan
