Playing with the NPR API

Chris Beer

Mar 30, 2010, 6:13:26 PM
to public media
I've been experimenting with ingesting items from the NPR API into Solr (a
Lucene-based search engine) to see what kinds of interesting analysis
and functionality can be quickly prototyped. The demonstrator, with
about 4 years of content, is at http://cbeer.info/~chris/npr-solr/npr.html.
I've blogged about it at http://authoritativeopinion.com/blog/2010/03/19/npr-api-solr/
and pushed the code out to GitHub at http://github.com/cbeer/npr-solr.
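
As a rough illustration of the ingest step described above (not necessarily how the npr-solr code does it), the sketch below flattens NPRML stories into field dictionaries and builds a Solr XML update message. The sample NPRML fragment and the choice of Solr field names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NPRML fragment; real API responses carry more fields.
SAMPLE_NPRML = """<nprml version="0.92">
  <list>
    <story id="15404041">
      <title>Example Story</title>
      <teaser>A short summary.</teaser>
      <storyDate>Tue, 30 Mar 2010 18:13:00 -0400</storyDate>
      <byline><name>Jane Reporter</name></byline>
    </story>
  </list>
</nprml>"""


def stories_to_docs(nprml_text):
    """Flatten each <story> element into a dict of field values."""
    root = ET.fromstring(nprml_text)
    docs = []
    for story in root.iter("story"):
        docs.append({
            "id": story.get("id"),
            "title": story.findtext("title", default=""),
            "teaser": story.findtext("teaser", default=""),
            "storyDate": story.findtext("storyDate", default=""),
            # A story may have several bylines, so keep them as a list.
            "byline": [n.text for n in story.findall("byline/name") if n.text],
        })
    return docs


def docs_to_solr_add(docs):
    """Build a Solr XML update message of the form <add><doc><field .../></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Multi-valued fields become repeated <field> elements.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = v
    return ET.tostring(add, encoding="unicode")


solr_add_xml = docs_to_solr_add(stories_to_docs(SAMPLE_NPRML))
```

The resulting string would be POSTed to Solr's update handler with an XML content type, followed by a commit; the Solr schema would need matching field definitions.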

I'd be interested in hearing what kinds of use cases people have for
something like this. The first prototype I developed on top of it was a
"More like this" feature to pull in similar items. Maybe it's
something as simple as seeing what topics different correspondents
cover, comparing bylines over time, etc.
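
For reference, a "More like this" lookup of the kind mentioned above might be requested from Solr roughly as follows. This is a minimal sketch using Solr's MoreLikeThis query parameters; the Solr URL and the field names (`id`, `title`, `teaser`) are assumptions, not details taken from the demonstrator.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust to wherever the index actually lives.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def more_like_this_url(story_id, fields=("title", "teaser"), count=5):
    """Build a select URL asking Solr for items similar to one story."""
    params = {
        "q": f"id:{story_id}",        # the story to find neighbours for
        "mlt": "true",                # enable the MoreLikeThis component
        "mlt.fl": ",".join(fields),   # fields used to judge similarity
        "mlt.count": count,           # number of similar items to return
        "mlt.mintf": 1,               # minimum term frequency in the source doc
        "mlt.mindf": 1,               # minimum document frequency in the index
        "wt": "json",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = more_like_this_url("15404041")
```

The fields named in `mlt.fl` generally need to be stored or have term vectors enabled for the component to work with them.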

Hope someone finds this interesting.

Chris

johntynan

Mar 31, 2010, 7:59:23 AM
to public media
Chris,

Thanks so much for doing this. I think one of the great strengths of
your using Solr is its ability to get beyond the 20 item per request
limitation.
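
To fill a local index despite the per-request limit, one option is to page through the API at ingest time. The sketch below only builds the page URLs; it assumes the `numResults` and `startNum` query parameters with a 1-based `startNum`, and uses a placeholder topic id and API key, so check all of these against the API reference before relying on them.

```python
from urllib.parse import urlencode

API_QUERY_URL = "http://api.npr.org/query"
PAGE_SIZE = 20  # the per-request limit mentioned above


def page_urls(topic_id, api_key, total_wanted):
    """Yield one query URL per page of up to PAGE_SIZE results."""
    for start in range(1, total_wanted + 1, PAGE_SIZE):
        params = {
            "id": topic_id,
            # The last page may need fewer than PAGE_SIZE results.
            "numResults": min(PAGE_SIZE, total_wanted - start + 1),
            "startNum": start,  # assumed to be a 1-based offset
            "output": "NPRML",
            "apiKey": api_key,
        }
        yield f"{API_QUERY_URL}?{urlencode(params)}"


# Placeholder topic id and key, for illustration only.
urls = list(page_urls(1007, "YOUR_API_KEY", 45))
```

Once the pages are indexed, searches run against Solr and are no longer bound by the API's page size.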

I have two ideas that I would like to work on. I think the NPR API
needs an "editorial layer" where you can generate a large set of
results, then "curate" them according to the relevance to the story or
topic that you are hoping to show. I have a use case with the NPR
Timeline widget that I'd be glad to talk about with you further.

I also have a second idea that I was intending to work on here:
http://code.google.com/p/nprmusicmegaphone/

I call it the NPR Music Megaphone:

NPR Music Megaphone aims to create a seamless listening experience for
NPR Music content. The goal is to leverage the NPR API and web
standards like xspf (and eventually HTML5) to provide a continuous
music/interview stream based on user preferences.

Starting with the NPR API Reference:
http://www.npr.org/api/inputReference.php

I am thinking of creating a web-based music station with:

Choice of Genres:
http://api.npr.org/list?id=3018

Choice of Artists:
http://api.npr.org/list?id=3009

Then I would create XML parsers (and an XML-to-XSPF converter).

Specifically, I will want to look for links to m3u files (among other
resources) in a feed similar to this:
http://api.npr.org/query?id=15404041&fields=title,teaser,storyDate,audio,album,artist&output=NPRML&apiKey=MDAxNzgwMDQ5MDEyMTQ4NzYyMjU4YmY1Yw004
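
A converter along these lines could be sketched as below. The NPRML layout in the sample (an `audio/format/mp3` element with `type="m3u"`) is an assumption made for illustration, and the audio URL is a placeholder; the XSPF side uses the standard `playlist`, `trackList`, `track`, `location` and `title` elements.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"

# Hypothetical, simplified NPRML fragment with a placeholder audio URL.
SAMPLE_NPRML = """<nprml>
  <list>
    <story id="1">
      <title>Example Session</title>
      <audio>
        <format>
          <mp3 type="m3u">http://example.org/audio/1.m3u</mp3>
        </format>
      </audio>
    </story>
  </list>
</nprml>"""


def nprml_to_xspf(nprml_text):
    """Turn every m3u link found in the NPRML into an XSPF track."""
    root = ET.fromstring(nprml_text)
    ET.register_namespace("", XSPF_NS)  # serialize XSPF as the default namespace
    playlist = ET.Element(f"{{{XSPF_NS}}}playlist", version="1")
    track_list = ET.SubElement(playlist, f"{{{XSPF_NS}}}trackList")
    for story in root.iter("story"):
        title = story.findtext("title", default="")
        for mp3 in story.iterfind("audio/format/mp3"):
            # Skip non-m3u entries and empty elements.
            if mp3.get("type") != "m3u" or not mp3.text:
                continue
            track = ET.SubElement(track_list, f"{{{XSPF_NS}}}track")
            ET.SubElement(track, f"{{{XSPF_NS}}}location").text = mp3.text.strip()
            ET.SubElement(track, f"{{{XSPF_NS}}}title").text = title
    return ET.tostring(playlist, encoding="unicode")


xspf = nprml_to_xspf(SAMPLE_NPRML)
```

Since an m3u file is itself a playlist, a real converter would probably need to fetch each m3u and use the MP3 URL it lists as the track location.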

Then, in the browser, use the Wax Mp3 player to play the audio:
http://wiki.github.com/waxmp3/player

I'd like to host this on Google App Engine.

johntynan

Mar 31, 2010, 8:11:05 AM
to public media
Also, while I'm dreaming about it, I'd like people to be able to log in
to this app using their Google account, OpenID or Facebook Connect
account and save preset stations/artist searches as well as "favorite"
a song or artist.

I'd like to allow people to add comments (via status updates to
facebook or twitter).

Also, in addition to posting to Facebook, I'd like to have a live chat
around each song/story.

So every time you enter a song/story you have the option to enter the
chat with people who are currently listening to this same piece.

We could perhaps use XMPP (the Jabber protocol) on Google App Engine. These
real-time chat sessions could feature group and private modes.

Since the XML parser would use Media RSS or Atom over NPRML, we could
open this service up to Jamendo or other music services.

Could we also make use of Google Wave as the realtime chat protocol?
The idea of playing chats back in realtime... if these were timed to a
piece of music, that would be interesting.

Keith Hopper

Apr 1, 2010, 12:39:10 PM
to public...@googlegroups.com
> I have two ideas that I would like to work on.  I think the NPR API
> needs a "editorial layer" where you can generate a large set of
> results, then "curate" them according to the relevance to the story or
> topic that you are hoping to show.  I have a use case with the NPR
> Timeline widget that I'd be glad to talk about with you further.

I understand that KPCC has built something like this, and we at Public Interactive have been looking at how to incorporate something like this into our Core Publisher pilot project.

Definitely interested in what you come up with, John.

Keith
--
Keith Hopper

johntynan

Apr 6, 2010, 11:43:14 AM
to public media
Following up on the question:

http://twitter.com/pubmedia/status/11670649082
From the backlog last week: What can #pubmedia learn from chat
roulette (and now NY Times roulette -- http://bit.ly/dxaiiZ).

http://twitter.com/pubmedia/status/11670658750
Specifically, how can #pubmedia incorporate serendipity and planned
randomness in content discovery (using APIs, deep archives, etc.)

I proposed the idea of

* a mobile app that used geographic context to present #pubmedia
content relating to physical location

* in talking with http://twitter.com/rgutel, she suggested a public
media "roadtrip app": listen to stories relevant to your travel route.
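
One way to picture the "roadtrip" idea: assuming stories have already been geotagged with a latitude and longitude (for example by a place-extraction service), pick the ones that fall within some distance of any waypoint on the route. The story records, coordinates and radius below are all hypothetical; the distance calculation is the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def stories_near_route(stories, waypoints, radius_km=100.0):
    """Keep stories located within radius_km of at least one waypoint."""
    return [
        s for s in stories
        if any(haversine_km(s["location"], w) <= radius_km for w in waypoints)
    ]


# Hypothetical geotagged stories and a two-waypoint route (Boston to New York).
stories = [
    {"title": "Providence story", "location": (41.82, -71.41)},
    {"title": "Chicago story", "location": (41.88, -87.63)},
]
route = [(42.36, -71.06), (40.71, -74.01)]
nearby = stories_near_route(stories, route)
```

A fuller version would sample points along the actual driving route rather than just its endpoints, and could rank results by distance or by editorial picks.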

As I recall, Daniel Jacobson created a Google Maps / NPR / NYT mashup
for OSCON 2009:

http://www.danieljacobson.com/NewsMap/

As Daniel describes it, NewsMap "sends content from both NPR's API and
feeds from NYTimes through Yahoo!'s PlaceMaker API
http://developer.yahoo.com/geo/placemaker/ to identify geo-information
about the stories. The stories are then placed on a Google Map.
Code is available at" http://github.com/danieljacobson/NewsMap/tree/master

Amanda Hirsch

Apr 6, 2010, 11:52:40 AM
to public...@googlegroups.com
I'd love to see a roadtrip app and/or online guide that includes some human curation, in addition to whatever could be automated through the API...someone to pick out a selection of great cultural stories that are "recommended public media content" for whatever place you're traveling through...

Chris Beer

Apr 6, 2010, 12:07:27 PM
to public media
The NewsMap thing is pretty cool; if I get a chance I'll play around
with it.

So, what would it take to get local stations to expose their content
in a scrapable way? If I were an optimist, I'd say the new NPR ingest
API could solve all sorts of problems and provide a nice dataset. As a
pessimist though, it looks like we'd have to combine a number of
sources:

- it looks like PI provides RSS feeds, which should capture some swath
of content. Is there a deep archive behind all that?
- some stations (including my own...) don't provide (discoverable?)
RSS feeds for their content; is there a way to incorporate all this
content?
- what can we do about local arts + culture content, which probably
isn't captured in an existing feed? Are there metadata hints page
creators could offer? Would we need a manual ingest/tagging interface
(which is one approach PBS is taking for content metadata)?
- can we tap into the COVE API and get useful information?

The more I think about this, the more excited I get -- there's so much
potential in aggregating and exposing local material (and then using
that to justify the importance of local public media).


Chris


JohnMcMellen

Apr 6, 2010, 1:32:31 PM
to public media
I'm sure lots of stations are exposing their content in a scrapable
way, if you define that as something like RSS or MediaRSS. I have all
kinds of feeds possible from our content management system, but
nowhere to plug them into. Also not sure what metadata is needed to
derive the desired functionality.

Can you plug other feeds besides the NPR API into your Solr app, Chris?
I had a dream about a search engine that indexed every story in public
media together.

Katie Kemple

Apr 6, 2010, 1:35:24 PM
to public...@googlegroups.com
"I had a dream about a search engine that indexed every story in public
media together." This is great... preach it!

Seems one big problem for smaller stations is that they don't have a permanent (or even temporary in some instances) home for their stories online. What can we do about that?

--
Katie Kemple
703-981-7322
http://twitter.com/kkemple


Jonathan Coffman

Apr 6, 2010, 1:55:49 PM
to public...@googlegroups.com
Sounds a lot like PBS’ current project to facilitate local/national content sharing via a db of metadata...
-Jonathan
__________________________________________________________
Jonathan Coffman
Product Manager, Social Media
http://www.pbs.org
Twitter: @jdcoffman


Chris Beer

Apr 8, 2010, 1:33:30 PM
to public media
John -- With Solr, if you can map your input to the Solr schema, you
can ingest it. For my NPR API demo, I set the schema up using some of
the NPR terms, but it would be fairly trivial to map the NPR API to
something more like Dublin Core terms and create a nice generic schema
that way. If I had been thinking, I would have cached the NPR API
responses to make it easier to build different indexes.
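
To make the "map your input to the schema" idea concrete, here is a minimal sketch in which two sources (NPR API stories and ordinary RSS items) are mapped onto one generic set of Dublin Core style fields. The `dc_*` field names, the `origin` field, and the particular source-to-target pairings are illustrative assumptions, not an agreed mapping.

```python
# Hypothetical mapping from NPR API field names to Dublin Core style fields.
NPR_TO_DC = {
    "id": "dc_identifier",
    "title": "dc_title",
    "teaser": "dc_description",
    "storyDate": "dc_date",
    "byline": "dc_creator",
}

# Hypothetical mapping from RSS 2.0 item elements to the same fields.
RSS_TO_DC = {
    "guid": "dc_identifier",
    "title": "dc_title",
    "description": "dc_description",
    "pubDate": "dc_date",
    "author": "dc_creator",
}


def to_generic_doc(record, mapping, origin):
    """Rename a source record's fields to the generic schema, dropping empty values."""
    doc = {"origin": origin}  # remember which feed the item came from
    for source_field, target_field in mapping.items():
        value = record.get(source_field)
        if value:
            doc[target_field] = value
    return doc


npr_doc = to_generic_doc(
    {"id": "15404041", "title": "Example Story", "teaser": "A short summary."},
    NPR_TO_DC,
    "npr-api",
)
rss_doc = to_generic_doc(
    {"guid": "http://example.org/story/1", "title": "Local Story",
     "pubDate": "Tue, 06 Apr 2010 13:32:31 GMT"},
    RSS_TO_DC,
    "station-rss",
)
```

With a mapping table per source, adding another feed means writing one more dictionary rather than changing the index schema.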

I think what both PBS and NPR are doing to start capturing local
content is great. I'm really curious to see what kind of information
stations contribute back, although I fear it will be a subset of
what's actually out there (because they self-curate, because of rights
restrictions, whatever), and both NPR and PBS have an interest in
collecting media assets, not just metadata. Maybe the CPB American
Archive could pick up that end of things.

Anyway, it is all very interesting and makes me wish I had more time
to throw at playing around with aggregating materials.

Daniel Jacobson

Apr 10, 2010, 9:20:10 AM
to public media
Chris,
In addition to the API Ingest project (at NPR) and the corresponding
efforts at PBS, we are both (along with PRX, APM and PRI)
participating in the Public Media Platform initiative that is focused
on solving this exact problem. That said, the PMP is a big project
that will take some time to grow, so if this group (or others) is
able to make headway on doing something lighter-weight, that would be
great!

On a separate note, I am not sure that building systems around
scrapable content is the way to go in the long run. RSS-like content
is really just reference material that drives people back to a web
page, which is great for the web. But to make the content really
portable, to have it appear on mobile devices for example, it is much
better to get the full content centralized.

By the way, let me know if you want to collaborate on mapping the NPR
API to Dublin Core. Jack, Dave and I spent time creating
a mapping from NPRML to PBCore, so that could be a decent starting
point.

Chris Beer

Apr 28, 2010, 7:16:41 PM
to public media
Sorry it took so long to respond to this -- been crazy around here.

I'd love to hear more about what PMP is going to do. I've only read
some high-level stuff, but it seems like an exciting project with
potential to help cross-pollinate our content distribution (and maybe
get organizations out of siloed distribution; one can only hope).

I think there's definitely space for both centralized aggregation and
distributed federation. Certainly in the library, university and
cultural heritage spaces, people are working at making it a decent
user experience (and maybe, with a solid technical framework like
linked data in place, achievable?). The nice thing about a distributed
model, in my opinion, is its reflection of the organization of public
media anyway -- distributed content creators sharing and distributing
content for various audiences with different needs and interests. I'm
not sure it's an approach ready for prime time yet, but I'm
interested in exploring it further.

Chris Beer
WGBH Interactive