An Introduction to PBCore

John Tynan

Dec 9, 2008, 1:01:40 PM
to django-...@googlegroups.com
Riffing on Scot's dictum that it's "all about the data model. Get
this right and you win," I thought I would pass along something to the
group that might be useful to consider when thinking about the model
for django-newsroom.

While news organizations have long made use of NewsML as a metadata
standard for the delivery and archiving of multimedia news, some folks
in public broadcasting have been working on PBCore, a similar industry
standard: http://www.pbcore.org/

To view a story built using PBCore as a model, see this story
(designed by Jack Brighton, a new member of the django-newsroom group):

http://will.illinois.edu/prairiefire/episode/pf2007-05-10/

To view a PBCore record for this same story, see:

http://will.illinois.edu/prairiefire/pbcoreEpisode/pf2007-05-10/

To view a tool used to create PBCore records, have a look at
this Rails application that David Rice and Mike Castleman from WGBH
put together:

http://pbcore.vermicel.li/

I asked them if I could see the data model and they sent this along
(see attached). I could see where Django and Pinax could be used to
create a similar tool.

Along those same lines, NPR has created an API and NPRML, their own
custom XML structure, to exchange rich media content:
http://www.npr.org/api/

It might be useful to think about what a django-newsroom API might
look like and how it might be used, and about which standards, in
addition to RSS, would be worth targeting for distributing and
archiving stories.

pbcore_structure.sql

johntynan

Dec 9, 2008, 2:05:10 PM
to Django Newsroom
One correction: the PBCore vermicelli tool is being developed at WNET
(Channel 13, New York), not at WGBH (my apologies).

If you are interested in learning more about PBCore you may also want
to visit: http://pbcoreresources.org/

I was asked to mention that the PBCore vermicelli tool is not an
official product of pbcore.org and is developed independently. The
opinions expressed in this representation of PBCore's XSD in SQL are
those of the authors and do not necessarily reflect the opinions of
the creators of PBCore.

Milan Andric

Dec 9, 2008, 3:28:37 PM
to django-...@googlegroups.com
Hey John, thanks for this post. So a system without a public API is
handicapped and we should use something like PBCore or other
standards/solutions if they exist. It looks like PBCore is ready to
go. Does anyone know of anything else like PBCore or NewsML that we
should keep in mind?

I also played around with the NPRML/API generator and it's awesome. I
was able to get nprml/rss/xml, json, html, javascript all via the REST
API. Within a few minutes I had a widget that can dynamically syndicate
NPR content anywhere on the web based on my custom query. This is the
heart of building a web service, and it is what news is at its core: an
informational service to the public. I imagine NPR began writing its
own XML spec because the existing ones were not sufficient at the
time.

So this gets us into the API conversation, which is something I have
not spent time on, but this is a start. There needs to be a
documented (ideally, standard) interface to our data. Be it NewsML or
Dublin Core/PBCore, this is a must-have for django-newsroom.

Also just spent a little time on http://will.illinois.edu/. I am very
impressed ... it also runs at my alma mater. Totally missed this gem of
a site. I contacted the webmaster and asked about the CMS they are
using, and he responded with EE (ExpressionEngine). I was totally
surprised! Oh, I should also welcome him to the group. Welcome Jack
and thanks for joining! Your site is an inspiration for this project.

Also came across your NPR simile demo. Another gem!
http://npr-simile-timeline.googlecode.com/svn/trunk/newsample.html
Again, vital to a news org. Any content that goes into a
django-newsroom cms should be able to plug right into this type of
application.

--
Milan

johntynan

Dec 10, 2008, 2:02:32 PM
to Django Newsroom
Milan, I appreciate your pointing out my work with the NPR API.
Thanks also for mentioning widgets. I think publishing stories by way
of widgets is a good, practical justification for having an API.

I wanted to mention two things with regards to thinking about
standards and interoperability:

1) There are many gradations of interoperability. PBCore and NPRML
get a lot of data out there in a very structured way, but there is a
vast middle ground between RSS and these formats, MediaRSS and Atom
being two widely accepted standards that have the benefit of allowing
multiple enclosures per item.

2) It may be that some of the models that would contribute to a story
in django-newsroom arise from the use of third-party Django
apps. It would be interesting to see how a rich media feed from
django-newsroom would be influenced by this. It would also give us
the benefit of using some existing code as a point of reference. I
think it's good to keep these kinds of outputs and approaches in
mind, particularly because if the data isn't in the database at the
start, it will be difficult to get it into a feed.

For instance, based on some recent experience with MediaRSS, there are
fields like bitrate, framerate and channels specifically for audio/mpeg
content. When we add an audio file to the site we do not store that
metadata along with the file, and so there is no way to get this data
out into the feed when we need it.
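
To make the point about storing this metadata up front a little more concrete, here is a minimal sketch in Python. Everything named here is an assumption for illustration: the AudioAsset record, its field names and the example.com URLs are hypothetical and not part of django-newsroom. The attribute names on media:content follow my reading of the Media RSS spec (framerate is left out since it describes video). The sketch shows one item carrying two encodings of the same audio, which plain RSS enclosures would not allow.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET

# Namespace used by Media RSS elements such as media:group and media:content.
MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


@dataclass
class AudioAsset:
    """Hypothetical record of the metadata captured when a file is uploaded."""
    url: str
    mime_type: str
    bitrate_kbps: int
    sampling_rate_khz: float
    channels: int
    duration_seconds: int


def build_item(title: str, assets: List[AudioAsset]) -> ET.Element:
    """Build one feed item with a media:content element per stored asset."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # media:group collects different representations of the same content.
    group = ET.SubElement(item, "{%s}group" % MEDIA_NS)
    for asset in assets:
        ET.SubElement(group, "{%s}content" % MEDIA_NS, {
            "url": asset.url,
            "type": asset.mime_type,
            "bitrate": str(asset.bitrate_kbps),
            "samplingrate": str(asset.sampling_rate_khz),
            "channels": str(asset.channels),
            "duration": str(asset.duration_seconds),
        })
    return item


if __name__ == "__main__":
    assets = [
        AudioAsset("http://example.com/story-128.mp3", "audio/mpeg", 128, 44.1, 2, 300),
        AudioAsset("http://example.com/story-64.mp3", "audio/mpeg", 64, 22.05, 1, 300),
    ]
    print(ET.tostring(build_item("Example story", assets), encoding="unicode"))
```

If the bitrate or channel count was never stored, there is nothing to pass to build_item, which is exactly the problem described above.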

Two additional notes:

It may also be useful to look at what the New York Times is doing with
their API: http://developer.nytimes.com/

It may be useful to experiment with something like the multiresponse
Python class

http://github.com/toastdriven/multiresponse/tree/master

for html, xml and json rendering.
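
As a rough illustration of the idea (and not the multiresponse API itself, which I am not reproducing here), the sketch below renders one story as HTML, XML or JSON depending on a requested format. The render_story function, the story keys and the element names are all hypothetical, and it uses only the Python standard library.

```python
import json
from html import escape
import xml.etree.ElementTree as ET


def render_story(story, fmt="html"):
    """Return a (content_type, body) pair for one story in the given format."""
    if fmt == "json":
        return "application/json", json.dumps(story, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("story")
        for key, value in story.items():
            # Each dictionary key becomes a child element of <story>.
            ET.SubElement(root, key).text = str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    if fmt == "html":
        body = "<h1>{}</h1><p>{}</p>".format(
            escape(story["title"]), escape(story["summary"]))
        return "text/html", body
    raise ValueError("unsupported format: {}".format(fmt))


if __name__ == "__main__":
    story = {"title": "Budget vote", "summary": "Council approves the budget."}
    for fmt in ("html", "xml", "json"):
        print(render_story(story, fmt))
```

In a Django view, the format would presumably come from the URL or the Accept header, with the same story data feeding every output.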


Milan Andric

Dec 10, 2008, 6:29:43 PM
to django-...@googlegroups.com
Hey Folks,

Comments below ...

On Wed, Dec 10, 2008 at 1:02 PM, johntynan <jgt...@gmail.com> wrote:
>
> Milan, I appreciate your pointing out my work with the NPR API.
> Thanks also for mentioning widgets. I think publishing stories by way
> of widgets is a good, practical justification for having an api.
>
> I wanted to mention two things with regards to thinking about
> standards and interoperability:
>
> 1) there are many gradations of interoperability, pbcore and nprml
> gets a lot of data out there in a very structured way. But there is a
> vast middle ground between RSS and these formats, MediaRSS and ATOM
> being two widely accepted standards that have the benefit of allowing
> multiple enclosures per item.
>
> 2) It may be that some of the models that would contribute to a story
> in django-newsroom may arise from the use of third-party django
> apps. It would be interesting to see how a rich media feed from
> django-newsroom would be influenced by this. It would also give us
> the benefit of using some existing code as a point of reference. I
> think it's good to keep these kinds of outputs and approaches in
> mind. Particularly with the idea that if the data isn't in the
> database at the start it will be difficult to get this into a feed.
>
> For instance, based on some recent experience with MediaRSS there are
> fields like bitrate, framerate, channels specifically for audio/mpeg
> content, when we add an audio file to the site we do not store that
> metadata along with the file and so there is no way to get this data
> out into the feed when we need it.

Ah, that's a good point. So it helps to know which API standards you
want to support so you can build the requirements into your
application. A little forethought can save you adjustment time later.
I think the principle here is to gather as much metadata as possible
about your data while limiting the grief for the end user. Media
libraries exist to extract header information from formats like MP3
and JPEG. A content management system that supports this is not so
common. Add PBCore support and there is only one of you?
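
A minimal sketch of that kind of header extraction, assuming WAV input so that Python's standard wave module is enough (MP3 or JPEG files would need a third-party library, which is not shown here). The function name and dictionary keys are made up for illustration. The idea is that an upload handler could call something like this and store the result, so the end user never has to type it in.

```python
import wave


def read_wav_metadata(path):
    """Read basic technical metadata from the header of a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()   # samples per second
        sample_width = wav.getsampwidth()  # bytes per sample
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sampling_rate_hz": sample_rate,
        # Uncompressed PCM bitrate: rate * bits per sample * channels.
        "bitrate_kbps": sample_rate * sample_width * 8 * channels / 1000,
        "duration_seconds": frames / sample_rate if sample_rate else 0.0,
    }
```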

Did a quick Google search for "pbcore cms" and came up with some
crufty-looking stuff:
http://findarticles.com/p/articles/mi_m0EIN/is_2005_April_6/ai_n13559449/pg_1?tag=artBody;col1
http://www.leadingedgedesign.com/html/solutions/products-mas.php

And Jack's presentation from Feb:
http://www.pbcore.org/resources/items/InternetMediaLibrary.pdf

That's another area where things are converging in terms of media
solutions/media asset management, libraries and news/broadcasting
organizations.

>
> Two additional notes:
>
> It may also be useful to look at what the New York Times is doing with
> their API: http://developer.nytimes.com/

Was looking at this yesterday as well. And here's a quote from the FAQ:

"3. Why are you offering APIs?
...
But we also have a simpler, more compelling reason: journalism. To
inform the public or tell a story, we use articles, photos, videos,
interactive graphics, slideshows and more. Data has always been the
primary force behind those features, and now it can become a feature
in its own right. Our APIs help us fulfill the newspaper's
journalistic mission by putting more information in the hands of the
public — and they also expand that mission by giving users the ability
to find and tell their own stories."

Now we just need to convince them that it's also in their journalistic
mission to contribute to Django Newsroom. ;)
Grassroots iz we.

So it looks like they are using a 3rd party to provide the API services.

http://www.mashery.com/product/

Kind of lame considering PBS/NPR and other organizations are working
on all their stuff in-house and coming up with standards to
interoperate? Or maybe just a smart move? Financially? Maybe we can
sic a reporter on it? I've already left some droppings in the forums,
might as well ask away. I wonder what the NPR/PBS folks think of the
NYTimes API?

>
> It may be useful to experiment with something like the multiresponse
> python class
>
> http://github.com/toastdriven/multiresponse/tree/master
>
> for html, xml and json rendering.

Noted.

Well, I will keep the API in mind when working on the video and
slideshow apps and keep you posted.

--
Milan
