New API Methods


Doug Kaye

Dec 23, 2009, 12:03:08 AM
to spokenw...@googlegroups.com
As discussed previously, the /search method is a mess.  It was too ambitious. It turned out to be impossible to implement, and the queries required to pull it off were ridiculously inefficient. So much so that I had to turn it off. I'm therefore moving in a different direction, and I'd like to get everyone's input *before* I go down the wrong path, rather than after.

The plan is to replace /search with object-specific methods: /programs, /feeds, /collections and /members. On one hand, you won't be able to combine search criteria in an arbitrary manner. OTOH, I can guarantee that everything that's allowed will be reasonably efficient. I've also spec'd requests that return almost every possible field in the database -- probably many more than you'll ever want.

Below is a first cut at the spec for the new /programs method. What do you think?

Thanks again, folks.

   ...doug

/programs (unimplemented; proposed)

Search/browse programs.

URL

http://api.spokenword.org/programs[.format]?querystring

HTTP Method(s)

GET

Search/Order Options

You may specify zero or one of the following querystring options. You may not combine them (i.e., you may not use more than one).

  • Submitter (m)
    • ?submitterId=memberId
  • Highest-Rated
    • ?order=ratingAverage
  • Most-Popular (using an algorithm that averages aged ratings)
    • ?order=ratingAged
  • Recently Collected
    • ?order=recentlyCollected
  • Recently Rated
    • ?order=recentlyRated
  • Most Collected
    • ?order=mostCollected
  • Most Played/Downloaded (m)
    • ?order=playCount
  • Pageviews (p)
    • ?order=pageviews
  • Most-Rated
    • ?order=ratingCount
  • Publication Date, Most Recent (m)
    • ?order=pubDate
  • Recording Date, Most Recent (p)
    • ?order=recordingDate
  • Submission Date, Most Recent (m,p)
    • ?order=submissionDate

Filter Options

You may specify zero or more of the following, with or without one of the Search/Order Options above.
  • Media
    • &media=[a][v]: a = audio, v = video (default='av'). Only with Search/Order Options above labeled (m).
  • Language
    • &lang=language: see Language section. Only with Search/Order Options above labeled (p).
  • Explicit Content
    • &noexp=0|1: exclude content tagged as explicit (default=0, don't exclude). Only with Search/Order Options above labeled (p).

Result Paging

  • &count=int: max number of items to return (default=20, max=200)
  • &page=int: offset into search results (default=0)
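The combination rules above can be expressed as a small client-side pre-check. This is a minimal sketch, not part of the proposed API: the option-to-label table is transcribed from the lists above, `check_programs_query` is a hypothetical helper, and treating the filters as unrestricted when no Search/Order Option is given is an assumption, since the spec doesn't say.

```python
from __future__ import annotations

# Labels from the proposed spec: (m), (p), or both. Options with no label
# accept none of the label-restricted filters.
SUBMITTER_LABELS = {"m"}  # ?submitterId=memberId
ORDER_LABELS = {
    "ratingAverage": set(),
    "ratingAged": set(),
    "recentlyCollected": set(),
    "recentlyRated": set(),
    "mostCollected": set(),
    "playCount": {"m"},
    "pageviews": {"p"},
    "ratingCount": set(),
    "pubDate": {"m"},
    "recordingDate": {"p"},
    "submissionDate": {"m", "p"},
}
# Each filter is allowed only with an option carrying this label.
FILTER_LABELS = {"media": "m", "lang": "p", "noexp": "p"}


def check_programs_query(params: dict[str, str]) -> list[str]:
    """Return rule violations for a /programs query; an empty list means it looks valid."""
    errors: list[str] = []
    labels = None  # labels of the chosen Search/Order Option, if any
    if "submitterId" in params and "order" in params:
        errors.append("use at most one Search/Order Option")
    elif "submitterId" in params:
        labels = SUBMITTER_LABELS
    elif "order" in params:
        labels = ORDER_LABELS.get(params["order"])
        if labels is None:
            errors.append(f"unknown order value: {params['order']}")
    # Assumption: with no Search/Order Option, filters are not restricted.
    if labels is not None:
        for name, label in FILTER_LABELS.items():
            if name in params and label not in labels:
                errors.append(f"{name} requires an option labeled ({label})")
    # Paging: max=200 is from the spec; rejecting values below 1 is an assumption.
    count = int(params.get("count", "20"))
    if not 1 <= count <= 200:
        errors.append("count must be between 1 and 200")
    return errors
```

For instance, `order=pageviews` with `media=a` is flagged, because pageviews is labeled (p) while the media filter needs (m).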

Usage Examples

http://api.spokenword.org/programs?cat=G015
http://api.spokenword.org/programs?tag=publicRadio&count=25&page=3
http://api.spokenword.org/programs?tag=+publicRadio+kqed-news
http://api.spokenword.org/programs?order=submissionDate&media=a&lang=en&noexp=1
http://api.spokenword.org/programs?order=playCount&media=v&count=50&page=0
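
A request URL like the examples above can be assembled with Python's standard `urllib.parse.urlencode`. This is a sketch: `programs_url` is a hypothetical helper, and `json` as a value for the optional `[.format]` suffix is an assumption, since the spec doesn't list the formats.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://api.spokenword.org/programs"


def programs_url(fmt: Optional[str] = None, **params: object) -> str:
    """Build a /programs URL; `fmt` fills the optional [.format] suffix (hypothetical values)."""
    url = BASE_URL + (f".{fmt}" if fmt else "")
    # urlencode keeps the keyword order and converts ints with str().
    return f"{url}?{urlencode(params)}" if params else url


print(programs_url(order="playCount", media="v", count=50, page=0))
# prints http://api.spokenword.org/programs?order=playCount&media=v&count=50&page=0
```

Note that `urlencode` escapes a literal `+` as `%2B`, so a query like the `tag=+publicRadio+kqed-news` example, whose `+` syntax the spec doesn't define, may need to be written by hand.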
Response

Returns a searchResult object followed by an array of the specified objects: program, feed or collection.

Method-Specific Errors

    TBD


Thilo Planz

Dec 23, 2009, 1:12:07 AM
to spokenw...@googlegroups.com
Hi Doug,

> The plan is to replace /search with object-specific methods: /programs,
> /feeds, /collections and /members.

That makes sense.
Most people want to search that way anyway, I believe.
And we can combine the results client-side if we feel like it.

> OTOH, I can guarantee that
> everything that's allowed will be reasonably efficient.

That's a bold statement ;-)


> I've also spec'd
> requests that return almost every possible field in the database -- probably
> many more than you'll ever want.

If this becomes a problem, we could have a "returnFields" parameter.
"all": would be every field
"id": would be only the program id
"short": something in between


> *Search/Order Options*
> - *Highest-Rated*
> - ?order=ratingAverage
> - ?order=ratingAged
> - ?order=recentlyCollected
> - ?order=recentlyRated
> - ?order=mostCollected
> - ?order=playCount
> - ?order=pageviews
> - ?order=ratingCount
> - ?order=pubDate
> - ?order=recordingDate
> - ?order=submissionDate
>

I'd like to be able to use all these order options
also as filter options.

minPlayCount=4
maxPlayCount=10
minPubDate=20091010

And then, maybe playCount, rating, collected could be for a given user only:

user_maxCollectedCount=0
user_minCollectedCount=1

would return only stuff that I have not already collected,
or only stuff that I have collected, which could be very useful.


Cheers,

Thilo

Doug Kaye

Dec 23, 2009, 1:26:29 AM
to spokenw...@googlegroups.com
On Tue, Dec 22, 2009 at 10:12 PM, Thilo Planz <thilo...@googlemail.com> wrote:
>> OTOH, I can guarantee that
>> everything that's allowed will be reasonably efficient.
>
> That's a bold statement ;-)

I can safely say that because no query uses more than one table.


>> I've also spec'd
>> requests that return almost every possible field in the database -- probably
>> many more than you'll ever want.
>
> If this becomes a problem, we could have a "returnFields" parameter.
> "all": would be every field
> "id": would be only the program id
> "short": something in between

I didn't say that properly. What I meant to write was that you can query or sort by almost every field. As it is, we're already returning virtually all fields in the program object, for example.


 
>> *Search/Order Options*
>> - *Highest-Rated*
>> - ?order=ratingAverage
>> - ?order=ratingAged
>> - ?order=recentlyCollected
>> - ?order=recentlyRated
>> - ?order=mostCollected
>> - ?order=playCount
>> - ?order=pageviews
>> - ?order=ratingCount
>> - ?order=pubDate
>> - ?order=recordingDate
>> - ?order=submissionDate
>
> I'd like to be able to use all these order options
> also as filter options.
>
>   minPlayCount=4
>   maxPlayCount=10
>   minPubDate=20091010
>
> And then, maybe playCount, rating, collected could be for a given user only:
>
>   user_maxCollectedCount=0
>   user_minCollectedCount=1
>
> would return only stuff that I have not already collected,
> or only stuff that I have collected, which could be very useful.

That's exactly what we can't do, and it's why I need to rewrite this part of the API. It's what got us into trouble with the previous /search method. The tables are too large: about a half-million records each. If a query requires a JOIN, it can often create an intermediate table of more than 100,000 rows, which is too slow to offer via the API. These queries often take a few seconds, and that's more than we can afford to support unless we implement a lot of throttling. That's why the spec says you can't combine these query parameters except as specifically shown. For example, you can combine a filter for language with ordering by pageviews. But you can't combine ordering by playCount with a pubDate min or max. Those data are in two different tables, so the query can generate a 500,000-row JOIN, which is a killer.

Thilo Planz

Dec 23, 2009, 1:38:59 AM
to spokenw...@googlegroups.com

>> I'd like to be able to use all these order options
>> also as filter options.
>>
>> minPlayCount=4
>> maxPlayCount=10
>> minPubDate=20091010
>>
>> And then, maybe playCount, rating, collected could be for a given user
>> only:
>>
>> user_maxCollectedCount=0
>> user_minCollectedCount=1
>>
>> would return only stuff that I have not already collected,
>> or only stuff that I have collected, which could be very useful.
>>
>
> That's exactly what we can't do and why I need to rewrite this part of the
> API.
> For example, you can combine a
> filter for language with ordering by pageviews. But you can't specify both
> ordering by playCount with a pubDate min or max. Those data are in two
> different tables, so it can generate a 500,000 JOIN, which is a killer.

Okay, understood.
But we could have a min/max using the same field as the ordering?

How about the per-user counts?
Completely out of the question?

Thilo

Doug Kaye

Dec 23, 2009, 1:41:20 AM
to spokenw...@googlegroups.com
On Tue, Dec 22, 2009 at 10:38 PM, Thilo Planz <thilo...@googlemail.com> wrote:
> But we could have a min/max using the same field as the ordering?

Yes, that's reasonable and a good idea. I'll work on that.
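
A minimal sketch of that rule as a client-side pre-check, assuming the minX/maxX parameter naming from Thilo's examples (minPlayCount, maxPlayCount, minPubDate). That naming and `range_params_ok` are hypothetical; neither is in the spec yet.

```python
def range_params_ok(params: dict) -> bool:
    """True if every min*/max* parameter targets the field named by ?order=.

    Hypothetical naming: order=playCount pairs with minPlayCount/maxPlayCount.
    """
    order = params.get("order")
    allowed = set()
    if order:
        suffix = order[0].upper() + order[1:]
        allowed = {"min" + suffix, "max" + suffix}
    range_keys = {k for k in params if k.startswith(("min", "max"))}
    # Every range parameter must be one of the two allowed names.
    return range_keys <= allowed
```

With this rule, order=pubDate&minPubDate=20091010 passes, while order=playCount&minPubDate=20091010 (the cross-table case described above) is rejected.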


> How about the per-user counts?
> Completely out of the question?

Give me an example.

...doug

Thilo Planz

Dec 23, 2009, 1:48:04 AM
to spokenw...@googlegroups.com
>> How about the per-user counts?
>> Completely out of the question?
>
> Give me an example.

And then, maybe playCount, rating, collected could be for a given user
only:

user_maxCollectedCount=0
user_minCollectedCount=1

would return only stuff that I have not already collected,
or only stuff that I have collected, which could be very useful.


Thilo

Doug Kaye

Dec 23, 2009, 2:22:01 AM
to spokenw...@googlegroups.com
Hi, Thilo.

I don't understand your examples below, so start from scratch and try
again, if you don't mind.

But specifically as to filtering on what you have or haven't
collected, here are some thoughts...

1. You can always find out what you've collected by using /collection
and specifying your "history" collection. Unless you've turned off
history tracking in your profile, that will show you everything you've
ever collected.

2. To find out what you *haven't* collected, you have to do #1 and
then use that list to remove program IDs from the list of all
programs. (I know you know this, I'm just going through it for the
sake of explanation.)

3. I think the question you're asking is can #1 and #2 be used as
server-side filters, combined with some other ordered request. Is that
right? Is that what you're looking for? For example (using English,
which is best for this):

"Give me all the programs sorted by highest rating, with a rating
of at least 4.1, and which I have never collected before."

Hmmm...it's possible to combine sorting by a parameter (like average
rating) with a min and max value for the same parameter. In general,
combining that with other tables is what's prohibitively expensive,
and that's what's required to check your collection history. It can
get really nasty, in fact. Suppose you asked "...and which is
currently in at least one of my active collections." Now we can't
check the history collection. We have to check the union of all of
your non-history collections. We do that now when we display various
pages on the site, but that usually means only checking a max of 100
programs or so to see if they're in a collection or not. The queries
you're asking for require us to check hundreds of thousands of
programs for whether they're in one of your collections.

Again, there may be a reasonable substitute for this query, but it's
not going to be a generalized query facility.
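
The English query above can be approximated client-side with what the spec already offers: request ?order=ratingAverage, stop once ratings drop below the threshold, and skip IDs found in the history collection. A sketch, assuming results arrive highest-rated first; `Program` and `uncollected_top_programs` are hypothetical client-side names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Program:  # hypothetical client-side record
    id: int
    rating_average: float


def uncollected_top_programs(
    ordered: Iterable[Program], collected_ids: set[int], min_rating: float = 4.1
) -> Iterator[Program]:
    """Yield programs rated >= min_rating that are not in collected_ids.

    Assumes `ordered` is sorted by rating_average, highest first, as
    ?order=ratingAverage is expected to return.
    """
    for program in ordered:
        if program.rating_average < min_rating:
            break  # sorted input: nothing after this can qualify
        if program.id not in collected_ids:
            yield program
```

Because the input is sorted, the loop stops at the first rating under 4.1, so the client only needs to page until then; the exclusion itself is one set lookup per program.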

The best thing is to keep thinking about your specific needs and give
me the queries you need to make, expressed in English. That way I have
the best chance of understanding them, and I can possibly translate
them into something that matches the scheme.

Thanks again.

...doug

Thilo Planz

Dec 23, 2009, 2:46:22 AM
to spokenw...@googlegroups.com
Doug,

> I don't understand your examples below, so start from scratch and try
> again, if you don't mind.

Sorry, that example was quite terse.

I was looking to a) search only within the programs that I have already
rated or collected, or b) to exclude those programs from the search
(and search in all other programs).

> "Give me all the programs sorted by highest rating, with a rating
> of at least 4.1, and which I have never collected before."

Yes, that would be one example.
(But I am thinking more of ratings than collection).

> 1. You can always find out what you've collected by using /collection
> and specifying your "history" collection. Unless you've turned off
> history tracking in your profile, that will show you everything you've
> ever collected.

That is great. I did not know we have this feature.

Is there a similar list of everything you have ever rated?


> Again, there may be a reasonable substitute for this query, but it's
> not going to be a generalized query facility.

Yes, I understand that.
Generalized ad-hoc queries will not work with those large tables.

The cases I am thinking of can be covered by either a join or
an anti-join (for exclusion) against the history collection or
the speculative rating history collection.

That join could also happen client-side.
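
Both cases, restricting to and excluding a member's history, reduce to set membership once the history IDs have been fetched. A minimal sketch of that client-side join; `filter_by_history` and its `mode` values are hypothetical.

```python
from __future__ import annotations


def filter_by_history(program_ids: list[int], history_ids: set[int], mode: str) -> list[int]:
    """Client-side join against a member's history (e.g. rated or collected IDs).

    mode="only":    keep programs that are in the history (a semi-join).
    mode="exclude": drop programs that are in the history (an anti-join).
    The input order, e.g. a server-side sort, is preserved.
    """
    if mode == "only":
        return [pid for pid in program_ids if pid in history_ids]
    if mode == "exclude":
        return [pid for pid in program_ids if pid not in history_ids]
    raise ValueError(f"unknown mode: {mode!r}")
```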


Thanks,

Thilo

Doug Kaye

Dec 23, 2009, 11:52:15 AM
to spokenw...@googlegroups.com
On Tue, Dec 22, 2009 at 11:46 PM, Thilo Planz <thilo...@googlemail.com> wrote:
>>     "Give me all the programs sorted by highest rating, with a rating
>> of at least 4.1, and which I have never collected before."
>
> Yes, that would be one example.
> (But I am thinking more of ratings than collection).

OK, I think we can support "Give me all the programs sorted by highest
rating, with a rating of at least 4.1, and which I have never rated
before." New options to support this are in the works.


> Is there a similar list of everything you have ever rated?

There's an RSS feed for programs a member has rated. I'll probably add
this to the APIs, and I may (eventually) deprecate the RSS feed.


> The cases I am thinking of can be covered by either a join or
> an anti-join (for exclusion) against the history collection or
> the speculative rating history collection.

I can probably add an option for "in collection <#>" as this would
create a JOIN of no more than 1,000 rows. But "Except in collection
<#>" would still create a huge JOIN.

Another update to the spec will be coming soon. Thanks for your help!
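
A rough bound shows why "in collection" and "except in collection" differ so sharply. Assume the programs table P holds about 500,000 rows (the half-million figure given earlier in the thread) and a single collection C holds at most 1,000 programs (inferred from the 1,000-row figure above):

```latex
\[
\lvert P \cap C \rvert \le \lvert C \rvert \le 1{,}000,
\qquad
\lvert P \setminus C \rvert \ge \lvert P \rvert - \lvert C \rvert \approx 500{,}000 - 1{,}000 = 499{,}000 .
\]
```

The "in collection" result is capped by the collection's size, while the "except in collection" result is bounded below by nearly the whole table.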

...doug

Ken Kennedy

Jan 11, 2010, 1:24:50 PM
to spokenw...@googlegroups.com


On Wed, Dec 23, 2009 at 11:52 AM, Doug Kaye <do...@rds.com> wrote:

> On Tue, Dec 22, 2009 at 11:46 PM, Thilo Planz <thilo...@googlemail.com> wrote:
>
>> Is there a similar list of everything you have ever rated?
>
> There's an RSS feed for programs a member has rated. I'll probably add
> this to the APIs, and I may (eventually) deprecate the RSS feed.


What's the location/format of that feed, Doug? I'm poking around, but I'm not seeing it. That would be interesting to see.

Thanks!

--
Ken Kennedy
Contact info: http://kenzoid.com/me/contact

Doug Kaye

Jan 11, 2010, 1:28:30 PM
to spokenw...@googlegroups.com
The feed is at: http://rss.spokenword.org/recentlyRated

It's standard RSS plus the use of our namespace:
http://conversationsnetwork.org/rssNamespace-1.0/

I made this as a one-off for Thilo. I'm not sure if this is the best
thing in the long term or whether we should just make it part of the
API. The latter is my tendency at the moment.
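
A client could read that feed with the standard library's `xml.etree.ElementTree`. This sketch extracts only the standard RSS `<item>` titles and links; it ignores elements from the Conversations Network namespace, whose names aren't given here, and `rated_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def rated_items(rss_xml: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return (title, link) for each standard RSS <item> in the document."""
    root = ET.fromstring(rss_xml)
    # Standard RSS 2.0 elements are not namespaced, so plain tag names match.
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]
```

For example, `rated_items(urllib.request.urlopen("http://rss.spokenword.org/recentlyRated").read())` would fetch and parse the feed, assuming it is still served at that URL.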

...doug

Ken Kennedy

Jan 11, 2010, 1:37:50 PM
to spokenw...@googlegroups.com
I like the API idea as well, especially with some options (for a specific user, etc.)

--
You received this message because you are subscribed to the Google Groups "SpokenWord.org APIs" group.
To post to this group, send email to spokenw...@googlegroups.com.
To unsubscribe from this group, send email to spokenword-ap...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/spokenword-api?hl=en.


