"Programs" searches with mediaUrlLike working?

Ken Kennedy

Jun 10, 2011, 5:28:52 PM

to SpokenWord.org APIs

Hey Doug, hope all is well. Just checking on the status of
mediaUrlLike "Programs" searches...I seem to remember that this was
something that wasn't working well, but there were hopes of improving
it. I've actually finally gotten a decent Python module working to
parse my Google Listen feed of "read" items, and I am looking to wire
that into SpokenWord if I can. But I need some way to link one to the
other...mediaUrl seems to be the closest thing to an absolute
reference, so I'm headed down that path first. I'd love to be able to
use SpokenWord as the central repository for my podcast ratings,
history, comments, etc.!

Doug Kaye

Jun 12, 2011, 6:14:56 PM

to spokenw...@googlegroups.com

Hi, Ken. It's supposed to be working. I just ran my test suite and it passed. Do you have an example of it failing?

...doug

Ken Kennedy

Jun 14, 2011, 7:25:07 PM

to spokenw...@googlegroups.com

Oh, cool...hopefully I'm just doing something silly. One of these URLs should return me _something_ from the Public Knowledge podcast, at least as far as I can tell.

Direct query to recent podcast: (mediaUrlLike = http://media.publicknowledge.org/pkitk-20110610.mp3)

http://api.spokenword.org/programs.json?mediaUrlLike=http%3A%2F%2Fmedia.publicknowledge.org%2Fpkitk-20110610.mp3

More open query to PublicKnowledge in general (mediaUrlLike = publicknowledge)

http://api.spokenword.org/programs.json?mediaUrlLike=publicknowledge

Both of those are returning an empty set to me. Do you have an example of something that definitely works? Maybe my formatting is off or something.

Thanks for the help, Doug!

--
Ken Kennedy
Contact info: http://kenzoid.com/me/contact

Doug Kaye

Jun 16, 2011, 10:36:26 AM

to spokenw...@googlegroups.com

Try using this as your mediaUrl: %publicknowledge.org%

mediaUrlLike=%publicknowledge.org%

The search string is passed as-is to MySQL, hence the % as leading or trailing wildcards.
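One practical wrinkle with the wildcard form: a literal % is itself the escape character in URLs, so a client library will normally send it as %25. The following is a minimal sketch of building such a request URL in Python, using the endpoint and the mediaUrlLike parameter quoted earlier in this thread. The helper name `like_query_url` is hypothetical, and it is an assumption that the server decodes the query string in the standard way before handing the pattern to MySQL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as quoted earlier in the thread.
API_BASE = "http://api.spokenword.org/programs.json"


def like_query_url(fragment, leading=True, trailing=True):
    """Build a mediaUrlLike query URL (hypothetical helper).

    The % wildcards are added around the fragment, then the whole
    pattern is percent-encoded, so each % goes over the wire as %25.
    """
    pattern = ("%" if leading else "") + fragment + ("%" if trailing else "")
    return API_BASE + "?" + urlencode({"mediaUrlLike": pattern})


url = like_query_url("publicknowledge.org")
print(url)

# Decoding the query string recovers the pattern with its wildcards.
decoded = parse_qs(urlsplit(url).query)["mediaUrlLike"][0]
print(decoded)
```

Passing `leading=False` produces a prefix-only pattern, which matters for the speed concerns raised later in the thread.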

I could not find a URL such as http://media.publicknowledge.org/pkitk-20110610.mp3. Can you give me the URL of the program that has that media URL so I can track it down?

Note that this is a VERY slow query. There's an index for that table, but the index has more than 1 million entries and is going to be searched linearly. Keep that in mind in your application design.

...doug

Ken Kennedy

Jun 16, 2011, 5:36:56 PM

to spokenw...@googlegroups.com

Oh... duh. Now I just feel silly. Thanks, Doug, I will try that this evening!

Ken Kennedy

Jun 17, 2011, 1:17:01 PM

to spokenw...@googlegroups.com

As for the specific link I used, I was "lucky" enough to choose a feed (basically at random from my subscriptions) that isn't currently validating:

http://feedvalidator.org/check.cgi?url=http://feeds.publicknowledge.org/publicknowledge-intheknow

Your most recent record in the spokenword db is the 2011-03-04 episode. I'll touch base with them and see if they know they have an issue (in fact, I think I may have before, now that I think about it...)

I'm going to pick another unique/specific URL, check and make sure it's in your database, and then try with it to see what the result is.

Hopefully a query against mediaUrlLike that's an exact match will either run fairly quickly, or maybe a non-LIKE option could be opened up if it doesn't?

mediaUrlEquals, perhaps? If the field is indexed, and there's no wildcard matching, it should either return the results or an empty set pretty quickly. *fingers crossed*

Doug Kaye

Jun 17, 2011, 2:08:00 PM

to spokenw...@googlegroups.com

Yes, if you don't need a LIKE option, it could be much faster. And it might be much faster without a leading "%". Give it a try.
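To see why dropping the leading "%" can help, here is a rough analogy in plain Python, not a description of MySQL's internals. A sorted list stands in for an ordered index: exact and prefix lookups can jump to the right spot with a binary search, while a "contains" match has to look at every entry. The sample URLs and function names are made up for illustration.

```python
from bisect import bisect_left

# A sorted list stands in for an ordered index on the media URL column.
urls = sorted([
    "http://example.com/a.mp3",
    "http://example.com/b.mp3",
    "http://example.org/show/ep1.mp3",
    "http://example.org/show/ep2.mp3",
])


def exact(index, value):
    """Exact match: binary search, then one comparison."""
    i = bisect_left(index, value)
    return [index[i]] if i < len(index) and index[i] == value else []


def prefix(index, head):
    """Like 'head%': binary search to the first candidate, then walk forward."""
    i = bisect_left(index, head)
    out = []
    while i < len(index) and index[i].startswith(head):
        out.append(index[i])
        i += 1
    return out


def contains(index, fragment):
    """Like '%fragment%': the ordering does not help, so scan every entry."""
    return [u for u in index if fragment in u]


print(exact(urls, "http://example.com/b.mp3"))
print(prefix(urls, "http://example.org/"))
print(contains(urls, "example.org"))
```

With four entries the difference is invisible; with the million-plus entries mentioned above, the full scan is the part that hurts.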

...doug

Ken Kennedy

Jun 17, 2011, 2:28:43 PM

to spokenw...@googlegroups.com

I literally just did...were you watching the API log? *grin*

I just ran 3 exact matches through with no '%'s; they returned pretty quickly. They checked out (as hopefully they will when the length of the returned list is 1) as the program I expected them to be. The good news is that I got the mediaUrls from my Google Reader Listen Subscriptions "read" feed; so I can now:

a) Listen to podcasts on my Android device with Google Listen (or for that matter, in any fashion that uses Google Reader for podcast feeds, and marks them read properly).

b) Periodically have a script check my Listen "read" feed for newly listened programs, and pull out the mediaUrls (you can go arbitrarily far back on this, so if I haven't queried it for, say...a week, that's ok. I just change the timestamp that I go back to).

c) Take those mediaUrls, use the programs/search?mediaUrlLike query from the SpokenWord API to get programIds, and then add those programIds to my "Recently Listened" collection.
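Steps b) and c) above could be wired together roughly as follows. This is only a sketch under several assumptions that the thread does not confirm: the "read" feed has already been parsed into dicts with hypothetical `mediaUrl` and `readAt` keys, the API returns a JSON list of program objects, and each object carries an `id` field. The endpoint is the programs.json URL quoted earlier. Adding the ids to a collection is left out because that call is not described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as quoted earlier in the thread.
API_BASE = "http://api.spokenword.org/programs.json"


def fetch_programs(media_url):
    """Query the API for one exact media URL (no % wildcards).

    Assumption: the response body is a JSON list of program objects.
    """
    query = urlencode({"mediaUrlLike": media_url})
    with urlopen(API_BASE + "?" + query, timeout=30) as resp:
        return json.load(resp)


def newly_listened(read_items, since):
    """Step b: media URLs of items marked read after the saved timestamp.

    'mediaUrl' and 'readAt' are hypothetical keys for parsed feed items.
    """
    return [item["mediaUrl"] for item in read_items if item["readAt"] > since]


def program_ids_for(media_urls, fetch=fetch_programs):
    """Step c: look up each media URL and collect the program ids.

    'id' is an assumed field name; 'fetch' can be swapped out for testing.
    """
    ids = []
    for url in media_urls:
        for program in fetch(url):
            ids.append(program["id"])
    return ids
```

A cron job would load the last timestamp, call `newly_listened` and `program_ids_for`, then store the new timestamp. Passing a stub as `fetch` lets the logic be exercised without hitting the live API.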

No more manual "trying to find what I just listened to in SpokenWord"! This makes it much, much easier for me to quickly scan this collection and do things like rate programs, move programs to other (more manually curated) collections, etc.

Seems to be working fine so far...it's just been hand-tested Python at this point. I'm wiring things together into a test script now, and hopefully should put things into a cron job over the weekend.

Please let me know if the queries become burdensome on the MySQL db, Doug. I don't query unless I have a "listened" mediaUrl, and in this model, they're all intended to be unique; no '%'s. So it shouldn't be too bad, but if things go awry, let me know and I'll change frequency (or strategy) as needed.

There are some edge-case, ill-behaved "podcasts" that reuse media URLs...ick. Nothing much to be done about that, but AFAIK they are pretty rare.
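One way a script might cope with those cases is to accept a lookup only when exactly one program comes back, and set everything else aside. The sketch below assumes results have already been gathered into a dict of media URL to list of program dicts, and that each program has an `id` field; both the shape and the function name are hypothetical.

```python
def classify_matches(results_by_url):
    """Split lookups into clean matches, misses, and ambiguous hits.

    results_by_url maps a media URL to the list of programs returned
    for it (assumed shape; 'id' is an assumed field name).
    """
    matched, missing, ambiguous = {}, [], {}
    for url, programs in results_by_url.items():
        if len(programs) == 1:
            # Exactly one program: safe to add automatically.
            matched[url] = programs[0]["id"]
        elif not programs:
            # Nothing found, e.g. the feed has not been crawled recently.
            missing.append(url)
        else:
            # Several programs share this media URL: leave for manual review.
            ambiguous[url] = [p["id"] for p in programs]
    return matched, missing, ambiguous


sample = {
    "http://example.com/a.mp3": [{"id": 1}],
    "http://example.com/b.mp3": [],
    "http://example.com/reused.mp3": [{"id": 2}, {"id": 3}],
}
matched, missing, ambiguous = classify_matches(sample)
print(matched, missing, ambiguous)
```

Misses can simply be retried on a later run, since the program may show up once the feed is picked up again.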

Doug Kaye

Jun 22, 2011, 9:30:42 AM

to spokenw...@googlegroups.com

Thanks, Ken. The only way I'd know if there's a problem is if I tested a specific query. But if you're not using any % or wildcards, you should be fine. I may actually go back and disable anything but exact searches.

...doug
