Holiday homework - juicer meta data extraction API

matth

Jan 1, 2012, 6:46:32 PM

to Meta Meta Project

Hi dudes,

Hope everybody had a great Christmas and New Year!

Back at the Hackfest I was chatting with a few of you about entity
extraction and also "Readability" style body text extraction.

It's still an area I am interested in, and as I wanted an excuse to
try out Scala on the new Heroku stack, I put together a little app ...

http://juicer.herokuapp.com/

TL;DR - You post a URL to the API, it extracts the page title, body
text, keywords, etc., then runs the lot through a named entity
extractor. Essentially Readability + Stanford NER. You can also just
post a load of plain text and it will do the NER on that.
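
As a rough illustration of that pipeline (not the actual juicer code, which is Scala), here is a minimal Python sketch. Everything in it is an assumption made for the example: the names `TitleAndBodyParser`, `naive_entities` and `extract_metadata` are hypothetical, body text is taken to be whatever sits inside `<p>` tags, and a capitalised-word regex stands in for Stanford NER. Keyword extraction is left out.

```python
import re
from html.parser import HTMLParser


class TitleAndBodyParser(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def naive_entities(text):
    """Toy stand-in for a real NER model: runs of two or more capitalised words."""
    return re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text)


def extract_metadata(html):
    """Extract title and body text, then run the toy entity extractor on the body."""
    parser = TitleAndBodyParser()
    parser.feed(html)
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {
        "title": parser.title.strip(),
        "body": body,
        "entities": naive_entities(body),
    }
```

Calling `extract_metadata(page_html)` returns a dict with `title`, `body` and `entities` keys. A real "Readability" style extractor scores blocks of markup rather than just trusting `<p>` tags, so treat this only as the shape of the idea.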

Please feel free to use it / hack it / break it / etc., with the
caveat that it's still a "toy" project at this stage. If you'd like to
include this type of functionality in any meta-meta APIs I'd be happy
to get involved there. As it runs on Heroku you can boot up your own
instance for free in about 10 mins!

If anybody wants to install it or mess about with some Scala insanity
then the GitHub URL is ...

http://github.com/matth/juicer/

Cheers, and Happy New Year!

Matt

Mark

Jan 3, 2012, 5:24:36 AM

to meta-met...@googlegroups.com

Hi Matt,

Wow! Nice work :)

I'm sure this will be of interest to many. I need to figure out what meta we can extract from audio and video and feed this back into the project.

Looking forward to working with you and other meta-hackers in 2012.

Happy New Year all!

Mark

Matthew Haynes

Jan 3, 2012, 6:27:12 AM

to meta-met...@googlegroups.com

Thanks dude! Hopefully it'll be of use!

> I need to figure out what meta we can extract from audio and video and feed this back into the project.

What kind of thing are you thinking of? I could possibly extend the juicer app to pull out links to media files (YouTube embeds, MP3s, etc.), which might help provide some context to the media.
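
A minimal sketch of what pulling out media links could look like, in Python rather than juicer's Scala. This is not an existing juicer feature: `MediaLinkParser`, `find_media_links` and the `MEDIA_EXTENSIONS` list are hypothetical, and the rules (YouTube hosts on `<iframe>`/`<embed>`, file extensions everywhere else) are assumptions chosen to keep the example short.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed list of extensions that count as "media files" for this sketch.
MEDIA_EXTENSIONS = (".mp3", ".ogg", ".mp4", ".webm")


class MediaLinkParser(HTMLParser):
    """Collects (kind, url) pairs for YouTube embeds and direct media file links."""

    def __init__(self):
        super().__init__()
        self.media = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            url = attributes.get("href")
        elif tag in ("iframe", "embed", "audio", "video", "source"):
            url = attributes.get("src")
        else:
            url = None
        if not url:
            return

        parsed = urlparse(url)
        host = parsed.netloc.lower()
        is_youtube = host == "youtube.com" or host.endswith(".youtube.com")
        if is_youtube and tag in ("iframe", "embed"):
            self.media.append(("youtube", url))
        elif parsed.path.lower().endswith(MEDIA_EXTENSIONS):
            self.media.append(("file", url))


def find_media_links(html):
    """Return the media links found in an HTML string, in document order."""
    parser = MediaLinkParser()
    parser.feed(html)
    return parser.media
```

Relative URLs are returned as found; resolving them against the page URL (for example with `urllib.parse.urljoin`) would be a natural next step.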

Then I guess there is other stuff embedded in media files, such as ID3 tags etc. I think my colleague Abdel made a start on a service for this kind of thing. I'll try and get the details if it sounds useful!

Cheers,

Matt

Mark

Jan 3, 2012, 6:34:46 AM

to meta-met...@googlegroups.com

All info is useful I think - but I was especially thinking of the embedded info.

Eventually word-level timings could be part of the equation, and perhaps even beats per minute and other music-related info.

I've started playing about with colour info in video - http://happyworm.com/jPlayerLab/videofingerprint/v04/ - not sure if detected scene changes or even average colours would be useful meta to extract from video, but I guess the beauty of this project is that everything *could* be useful to someone, so let's just grab the data we can :)
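
For anyone wondering what "average colour" means per frame, here is a minimal Python sketch. It is not the code behind the demo linked above (that runs in the browser); it assumes a frame has already been decoded into a sequence of (r, g, b) tuples, and the names `average_colour` and `fingerprint` are made up for the example.

```python
def average_colour(pixels):
    """Return the mean (r, g, b) of an iterable of (r, g, b) tuples, as floats."""
    total_r = total_g = total_b = 0
    count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("frame has no pixels")
    return (total_r / count, total_g / count, total_b / count)


def fingerprint(frames):
    """Return one average colour per frame, in frame order."""
    return [average_colour(frame) for frame in frames]
```

For example, `average_colour([(0, 0, 0), (255, 255, 255)])` gives `(127.5, 127.5, 127.5)`, and `fingerprint` over all frames gives the kind of colour strip that could be drawn as a graph.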

Cheers

Mark

Matthew Haynes

Jan 3, 2012, 8:58:56 AM

to meta-met...@googlegroups.com

Great stuff Mark!

The average frame colour detection is very cool! Looks as if there is real potential for
it to be used as automatic scene / shot detection (well, it does shots already!)

Does it have a github url? Would love to have a play about!

Matt

Mark

Jan 3, 2012, 9:57:48 AM

to meta-met...@googlegroups.com

Glad you like it :) It's on GitHub here: https://github.com/maboa/videofingerprint (Note I haven't used Canvas in the most efficient way - just wanted to get something working). The next stage is to try and figure out the scene changes. I'm wondering actually if this should all take place server-side, as you tend to get different results in Firefox and Chrome, and depending on how powerful/overloaded your system/browser is.

However, what I'd like to try first is to skip forward in a series of (say) 500ms increments to get the average colour, maybe using the same video in a hidden div so that this process isn't seen. We could then grab the scene change timings and thumbnails in advance and also fit the video 'fingerprint' (the first graph) to the progress bar (convert to png?).
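
The stepping idea above can be sketched roughly as follows, in Python rather than in the browser. It assumes something else has already produced one average colour per sample time; `sample_times`, `colour_distance` and `scene_changes` are hypothetical names, and the threshold of 60 is an arbitrary guess that would need tuning against real footage.

```python
def sample_times(duration_ms, step_ms=500):
    """Return the timestamps (in ms) to seek to: 0, step, 2*step, ... below duration."""
    return list(range(0, duration_ms, step_ms))


def colour_distance(a, b):
    """Sum of absolute per-channel differences between two (r, g, b) colours."""
    return sum(abs(x - y) for x, y in zip(a, b))


def scene_changes(samples, threshold=60):
    """Return the times at which the average colour jumps by more than threshold.

    samples is a list of (time_ms, (r, g, b)) pairs in time order.
    """
    changes = []
    for (_, previous), (time_ms, current) in zip(samples, samples[1:]):
        if colour_distance(previous, current) > threshold:
            changes.append(time_ms)
    return changes
```

With samples taken every 500ms, a reported change at 1000 means the cut happened somewhere between 500ms and 1000ms, so the step size sets the timing accuracy. Those times could then double as the points at which to grab thumbnails.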

Also there are probably better ways to detect scene changes than using the average frame colour - but I like to try the simple things first :)

Cheers

Mark
