Grabbing meta-data from media

Maboa

Nov 21, 2011, 10:32:08 AM
to meta-met...@googlegroups.com
Hi - finding some time for my first post on this group, so hello everyone!

I'm working on a project that we call Hyperaudio which is really all about taking web-based audio and enhancing it with interactive transcripts and other data. Hyperaudio has so far been applied to the spoken word in both audio and video but it could also be applied to music.

I think it would be useful to collaborate with the meta-meta project on the media side of things, and I hope to help contribute to the specification and solutions.

Some examples of the data we could extract:

- word timings and transcripts
- volume
- pitch
- language
- keywords

I am looking at http://cmusphinx.sourceforge.net/ as something I can mess around with, although it's been a while since I wrote C. One possibility is to use something called Emscripten https://github.com/kripken/emscripten/wiki to convert it to JavaScript. This is ambitious stuff to be sure, but it would be interesting to see how far we can get with free and open source software.

Cheers

Mark


Dan Schultz

Nov 21, 2011, 10:51:01 AM
to meta-met...@googlegroups.com
Hey Mark,

 Very interesting, I hadn't thought about non-spoken audio.  Are you planning to make us all amazing mashup artists ;)

 Just to address your Meta Meta points -- We started in Python while planning for a more decentralized architecture that would allow for other languages (and therefore other tools).  Laurian and Raynor were the two leading that charge so maybe they can describe in a bit more detail what they had in mind at some point?

 In the meantime, that would be great in any language, even if you don't explicitly tie it into the MetaMeta API up front (we can do that after we figure out the architecture for that type of use case).  Of course, if you have thoughts about how to kill two birds with one stone, that would be epic too.  There's also a middle ground (e.g. providing the C functionality over an API that MetaMeta could call, but not thinking too hard yet about creating a decentralized standard for that type of interaction) that would get the functionality in place, with the intent to refactor the integration points down the line.

 Just let me know if I can help at all!

Best,

Mark

Nov 21, 2011, 12:02:08 PM
to meta-met...@googlegroups.com
Well, of course there is karaoke to consider :)

Regarding APIs - are you thinking HTTP and URI based? I'm working on something right now where I am taking this approach. You can specify JSONP (to get round the cross-site restrictions) for client-side calls or just straight JSON for server-side. The language that each module is written in needn't be exposed; the only sticking point is that it must be accessible over HTTP.
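
To make the JSON / JSONP idea concrete, here is a minimal sketch in Python of how one module might format its output. The function name `render_response` and the `callback` parameter are assumptions made for illustration; they are not taken from Hyperaudio or MetaMeta code.

```python
import json
import re

# Hypothetical helper, for illustration only.
# Only plain identifier-style callback names are accepted, so a caller
# cannot inject arbitrary script through the callback parameter.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def render_response(data, callback=None):
    """Return (content_type, body): plain JSON, or JSONP if a callback is named."""
    body = json.dumps(data, sort_keys=True)
    if callback is None:
        # Server-side callers just get straight JSON.
        return "application/json", body
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    # Client-side callers get the JSON wrapped in a function call.
    return "application/javascript", "%s(%s);" % (callback, body)
```

A server-side caller would use `render_response(result)`, while a browser would request the same URL with a callback name and receive the payload wrapped as a script.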

Mark

Laurian

Nov 21, 2011, 3:53:39 PM
to meta-met...@googlegroups.com
I think we could keep the MetaMeta API as a spec, with implementations in whatever languages are available. Think of the MetaMeta API as the front controller that dispatches requests to whatever can do the job:
- python
- "raw" CGI scripts
- redirect to external service
- proxy (+/- some processing) to external service
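
The front controller idea above could be sketched roughly as follows. This is only an illustration: the class, the task names, and the handler functions are all hypothetical, and only the in-process Python and redirect cases from the list are shown.

```python
from typing import Callable, Dict

# A backend is anything that takes a request dict and returns a response dict.
Handler = Callable[[dict], dict]


class FrontController:
    """Hypothetical dispatcher: maps a task name to whatever can do the job."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, task: str, handler: Handler) -> None:
        self._handlers[task] = handler

    def dispatch(self, task: str, request: dict) -> dict:
        handler = self._handlers.get(task)
        if handler is None:
            return {"status": 404, "error": "no backend for task %r" % task}
        return handler(request)


def python_keywords(request: dict) -> dict:
    """In-process Python backend: a naive keyword list (words longer than 5 letters)."""
    words = request.get("text", "").lower().split()
    return {"status": 200, "keywords": sorted({w for w in words if len(w) > 5})}


def redirect_to(url: str) -> Handler:
    """Backend that sends the caller on to an external service."""
    def handler(request: dict) -> dict:
        return {"status": 302, "location": url}
    return handler
```

A CGI script or a proxy would be registered the same way, as one more handler behind the same `dispatch` call, so callers never need to know which language did the work.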

I used Sphinx for some experiments in 2009. It was running as a component in RePlay [1], triggered as a command from an Ant script, and RePlay was validating each "word" against Wikipedia to be sure that it was a real word. The results were quite funny, see

But those words were never displayed; they were used for search, a Metaphone or Soundex search…
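
For anyone unfamiliar with phonetic search, here is a small sketch of the Soundex variant of that idea in Python. It is not RePlay's code: it assumes the common American Soundex rules, and the `build_index` and `search` helpers are hypothetical names for illustration.

```python
# Letter groups and their Soundex digits (American Soundex).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex key for a word ("" if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]


def build_index(words):
    """Group recognised words by their Soundex key."""
    index = {}
    for w in words:
        index.setdefault(soundex(w), []).append(w)
    return index


def search(index, query):
    """Return every indexed word that sounds like the query."""
    return index.get(soundex(query), [])
```

With this, a misrecognised word can still be found by a query that merely sounds similar, which is why the raw recogniser output never needs to be shown to the user.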

Now, if you have some context for a video, such as related articles (from the links around it, etc.), you can easily align concepts from those articles with their sound-alike equivalents in the Sphinx output.


BTW, RePlay was also dumping each keyframe as an image and running OCR on it to pick up words to be indexed.


Laurian