Hyperaudio Applications and Demos


Mark Boas

Nov 17, 2011, 11:57:14 AM
to hyper...@googlegroups.com
Applications and Demos

Proof of concept :
http://happyworm.com/jPlayerLab/audiotextsync/v13/

Danish Radio Demo - Hyperdisken Programme : http://yoyodyne.cc/h/

Minnesota Public Radio - http://minnesota.publicradio.org/proto/hyperAudio/  (based on Hyperdisken)

Radiolab / Soundcloud : http://hyper-audio.org/r/

Henrik Moltke Interview on DR : http://happyworm.com/clientarea/hyperaudio/htdemo/

(work in progress - click on the latest version number to start)


Source Code


http://github.com/maboa/

Mark Boas

Dec 13, 2011, 1:12:42 PM
to hyper...@googlegroups.com
A screencast I made yesterday demoing the Hyperaudio Pad. http://happyworm.com/screencams/hyperaudiopad/2011-12-12/

Matthew Terenzio

Dec 13, 2011, 7:27:54 PM
to hyper...@googlegroups.com
Hey Mark,

Very good. I'm beginning to get it. Wheels are turning . . .

Mark Boas

Dec 14, 2011, 5:54:25 AM
to hyper...@googlegroups.com
Good to hear Matt - if you have any ideas or can think of any use cases, I'd love to hear them.

Mark Boas

Jan 30, 2012, 4:43:53 AM
to hyper...@googlegroups.com
This popped up in my tweet-stream.

“@MattWilcox: My problem with video blogs is they take too damned long to consume. Give me the details in text, if I'm interested, I'll watch.”

https://twitter.com/maboa/status/163911920003465217

It seems there is a need for Hyperaudio in all sorts of circumstances; the trick is to make the process of making Hyperaudio easy enough for the everyday producer. This is what I hope we can focus on.

Mark Boas

May 3, 2012, 5:01:41 AM
to hyper...@googlegroups.com
A type of hyper-transcript : https://developer.mozilla.org/en-US/demos/detail/html5-audio-read-along/launch by Weston Ruter.

Cool that you can control the speed (if your browser supports it).

Mariano Blejman

May 3, 2012, 6:15:55 AM
to hyper...@googlegroups.com
Hi, I'm using Firefox 12 and I can't see the speed control (I can see it in Google Chrome, but sound doesn't work).

HTMLMediaElement.playbackRate

why?
--
Mariano Blejman
Editor-in-chief Suplemento NO 
Editor of Digital Culture section
Página/12, Buenos Aires, Argentina
http://www.marianoblejman.com
http://www.pagina12.com.ar
Hacks/Hackers Buenos Aires
mail: mari...@he.net
mob: +5491150080629
skype: jfk_knoppixxx
twitter: @blejman @supleno @cult_digital @hackshackersba
LinkedIn: http://linkd.in/97kuH9

Mark

May 3, 2012, 6:20:43 AM
to hyper...@googlegroups.com
Hi Mariano,

I haven't had a chance to play with playbackRate much yet, so I'm not quite sure what's causing the issue. You can check whether your browser supports it by trying out http://areweplayingyet.org/
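A minimal feature-detection sketch for this (illustrative only — not code from the demo being discussed; in a page you would pass it `document.createElement('audio')`):

```javascript
// Check whether a media element exposes playbackRate before
// offering a speed control. Browsers without the feature simply
// leave the property undefined. Note: some older browsers expose
// the property but ignore assignments, so this is only a first
// approximation of real support.
function supportsPlaybackRate(mediaEl) {
  return typeof mediaEl.playbackRate === 'number';
}

// Browser usage (assumption, not from the demo):
//   if (supportsPlaybackRate(document.createElement('audio'))) {
//     showSpeedControl();
//   }
```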

Might be worth contacting the author? I would be interested in knowing why too! :)

Thanks

Mark

J. Reuben Wetherbee

May 17, 2012, 1:58:43 PM
to hyper...@googlegroups.com

My name is Reuben Wetherbee and I work for the University of Pennsylvania School of Arts and Sciences. I was recently given a project to help members of the Writing Program come up with a way to align a whole bunch of readings by poets with the corresponding text. I ended up using jPlayer and Hyperaudio concepts to get the job done. The final product has both a player and a creator/editor module where the Writing Program can do some manual alignment for themselves, starting from plain text. Manual alignment works since most of the poems are rather short. You can choose to align by word or by line, and edit already-tagged text.

 

Anyway, if anybody from the group wants to check it out and give me feedback that would be great.

 

The code is on GitHub at www.github.com/jrweth/MediaAlignedText

 

You can see some documentation and some demos at:

http://jrweth.github.com/MediaAlignedText

 

Some Poems that have already been aligned and rolled out:

http://www.writing.upenn.edu/pennsound/x/Williams-WC/the_red_wheelbarrow.php

http://www.writing.upenn.edu/pennsound/x/Ashbery/crossroads_in_the_past.php
http://www.writing.upenn.edu/pennsound/x/Ashbery/the_skaters.php

 

Disclaimer – this is my first foray into a jQuery plugin and really heavy-duty JavaScript, so I'm sure there are some bugs / kinks / idiotic ways of doing things that could use some work. I also haven't tested on a really long selection yet, so there might be performance problems.

Mark Boas

May 21, 2012, 6:53:16 AM
to hyper...@googlegroups.com
Hi Reuben,

It's great to see this type of thing being developed. I tried it out and it works well. I especially like the timeline view.

What I'm aiming to set up is an algorithm that will create a good first pass of speech-to-text timings which can then be tweaked. An algorithm rarely gets everything right! I'm not quite sure yet what form the tweaking interface will take.

I'll keep you posted, and if you want to be involved in the process of creating it let me know. (I'm thinking of basing it on the open-source CMU Sphinx http://cmusphinx.sourceforge.net/) Note this is currently a Java-based library, although I'd love to see it working purely in JavaScript using an advanced audio API in the browser.

All very exciting! Keep us posted.

Best

Mark

PS Have you got a demo version of the editor online somewhere? - I find that's often a useful complement to having the code on github as it lets people try tools out really easily.

J. Reuben Wetherbee

May 23, 2012, 9:09:28 AM
to hyper...@googlegroups.com
Hi Mark,

Yes, keep me posted on the speech-to-text timings project. I fooled around with CMU Sphinx a bit as well. One of the problems I ran into was just the sheer time it took to do the parsing for longer files, which didn't necessarily work out well in a web context. I also fooled around with the aligner provided by the Linguistics department here at Penn, which I got somewhat working, spitting out hyperaudio, but I ran into similar issues with longer files (e.g. over 10 minutes).

-Reuben

P.S. There are demos for the editor at the GitHub documentation site (http://jrweth.github.com/MediaAlignedText/editor_demo.html). Do you think additional demos would be helpful?

tara shankar

May 29, 2012, 11:16:14 PM
to hyper...@googlegroups.com
I'm really happy to see this kind of textual and sound interplay work. I especially like the more designed treatment of the transcript in the Radiolab / Soundcloud demo, where the transcript (I assume) is algorithmically placed based on certain attributes from the voice audio track. I'm guessing there's a constant vertical scroll animation and the word onset determines the y placement. Is the size of the words based on average amplitude, or a marked-up feature? Really nice and amusing treatment of the programme for online consumption.

There is some very early textual animation work done at the MIT Media Lab by folks like Suguru Ishizaki (I think still at CMU) and Yin Yin Wong (1995 -- mostly designed, not algorithmic). I contributed some work that built on their phenomenal design and programming sense by using prosodic characteristics of the text to animate the "transcript". My work was called Prosodic Font (1998); I'm not sure the demo still works (Java 1.1 :) ). There was no backend to it -- the system accepted a marked-up form of the transcript to produce the visualization.

I remain very interested in this area. It seems like the time is drawing very near when there are complete mash-ups between speech recognition, prosodic recognition and semantics, all accessible from a Web scripting language. I'd reference the AT&T speech mashup portal and a few other efforts. The big missing piece for me is still the lack of good audio recording through HTML5 or other open technologies.

Thought Bubble

May 30, 2012, 4:19:08 PM
to hyper...@googlegroups.com
Hey Mark. 

Great to see this kind of development. I'm curious if there is any way to auto-transcribe the VO being spoken into a text stream (with associated time markers) of sorts. I know from experience that Adobe has a very strong auto-transcriber built into Adobe Premiere which can literally transcribe any given audio file. If you provide it with a matching VO script it is over 90% accurate. Have you had a chance to experiment with it?

Mark

Jun 1, 2012, 7:46:11 AM
to hyper...@googlegroups.com
Fantastic to see this discussion and great to have some time to finally respond ....

@Reuben regarding CMU Sphinx, I have still yet to install it successfully, but I would love to play about with it. I think I should find time to make a much more concerted effort in this area. After all, the holy grail would be to create a web-based service that would do most of the transcription, plus an easy-to-use tool to 'tidy up' the results and get them talking! I don't think the time it takes to process is a show-stopper, as we can email the user when the job is ready; depending on server resources and popularity we may have to queue jobs up. I may ping you if I have issues installing. I hope that's OK! :)

@Tara with the Radiolab demo, many people perceive some sort of algorithm at work with the word sizes, but all we are actually doing is displaying the text in different colours depending on who is speaking. The size is actually random; it's as if our brains read more into it. I'd love, however, to take account of any extra meta-data we could associate with each word, such as volume and pitch, and express that through colour and size. I'd love to see any work you have created in this area!

@Thought I have heard of that feature, yes, but I've still to stump up the cash for the latest version of Adobe Premiere. What I'm aiming to do is create a web-based tool that is free, easy to use and acceptably quick.

Mark

Mark Boas

Jun 1, 2012, 11:54:58 AM
to hyper...@googlegroups.com
Just found this...

HTML5 Audio Karoke – a JavaScript audio text aligner http://johndyer.name/html5-audio-karoke-a-javascript-audio-text-aligner/

Uses CMU Sphinx and looks promising.

Weston Ruter

Jun 1, 2012, 1:28:15 PM
to hyper...@googlegroups.com
@Mark: Thanks for the share.

@Mariano: Greetings. Yes, Firefox does not support the playbackRate property yet, so that's why the speed option is not available. It is supported by both Chrome and Safari, however, so you can try it in those browsers.

Also, happy to say that I won 2nd place in the audio Dev Derby. May this further spread the cause of hyperaudio!

Steve Raskin

Jun 3, 2012, 3:06:12 PM
to hyper...@googlegroups.com
Hi Mark,

Presently my primary concern is how best to obtain the hypertranscripts and implement them in both HTML and JavaScript, as my prospective project involves potentially a thousand pages, and word-alignment is required.

I've been checking out the various hyperaudio demos, and I have questions ;)

1. My understanding was that legit HTML5 data attributes must be prefixed with 'data-'; where did 'oval' and 'm' come from, and are they legit HTML5 data attributes? I'm also wondering about the pros/cons of each word being duped as an attribute of itself (e.g., oval=), versus IDs for each word (as in my hack prototype); I suppose the former is more semantically meaningful and easier to read in an array.

2. In paneltext.js I see the .subtitle arrays; how did you go about getting the time values on these projects? This certainly seems to be the most tedious aspect of the project I'm looking at (approximately 1000 word-aligned pages, btw); even with my Logic chops however, it's not a quick task and I have no way to export into anything useful, e.g., xml. I had a go with Soundbooth, using 'Analyze Speech' + 'Export ... Speech Analysis'. Only with a reference text were the results useful, though the exported xml file has so much extraneous taggery that to extract/convert the time values to a data-attribute will require a load of editing in BBEdit. As to CMU Sphinx, I've installed it and ran a couple of the demos but many hours later I have no idea how exactly to proceed with it for my objectives. Some granny-proof documentation/tutorials would be most welcome, but that's another board ;)

3. In the hyperpadaudio and Hyperdisken demos (http://happyworm.com/clientarea/hyperaudio/hap/v22/pad.htm, http://happyworm.com/clientarea/hyperaudio/htdemo/), where is the paneltext.js or its equivalent? Are you achieving the highlighting differently than in the other demos? Specifically, are you using an alternate method which doesn't require an array with timings for each word?

As to this: http://hyper-audio.org/r/, well that's just a thing of beauty, hella nice work man.

I thank you in advance for indulging my noob questions.

p.s. Mark, only after hours of investigation did I notice your presentation, which, IMO, is essential reading. Perhaps you can add a link to it on this Google discussion page? For me, the key discovery therein is the available transcription services, should my results with Soundbooth prove too time-consuming. On a paid project I could easily justify hiring one of those services, though in my case, as I'll have the texts, I require only marked-up timestamping.

Cheers, sr


Mark Boas

Jun 3, 2012, 5:56:47 PM
to hyper...@googlegroups.com
Hi Steve,

My answers inline.


On Sunday, June 3, 2012 9:06:12 PM UTC+2, Steve Raskin wrote:
Hi Mark,

Presently my primary concern is how best to obtain the hypertranscripts and implement them in both HTML and JavaScript, as my prospective project involves potentially a thousand pages, and word-alignment is required.

I've been checking out the various hyperaudio demos, and I have questions ;)

1. My understanding was that legit HTML5 data attributes must be prefixed with 'data-'; where did 'oval' and 'm' come from, and are they legit HTML5 data attributes? I'm also wondering about the pros/cons of each word being duped as an attribute of itself (e.g., oval=), versus IDs for each word (as in my hack prototype); I suppose the former is more semantically meaningful and easier to read in an array.

Yes - technically we should use data-t or similar; I cut it down to t for some demos just for efficiency's sake. Transcripts can get rather large, and using just t instead of a data-t attribute can cut down the size significantly. That said, I am leaning towards using data-t in future for validation and accessibility reasons. The duplication present in earlier transcripts with the oval attribute was just because this was the format in which the 3playmedia.com service exported word-aligned transcripts in HTML.
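To make the idea concrete, here's a minimal sketch of how per-word start times can drive highlighting on 'timeupdate' (the data-t name and the markup below are illustrative, not a fixed format):

```javascript
// Hypothetical markup, one span per word, start time in seconds:
//   <p><span data-t="0.00">Welcome</span> <span data-t="0.42">to</span> ...</p>

// Pure helper: index of the last word whose start time has passed,
// or -1 if playback hasn't reached the first word yet.
// Assumes startTimes is sorted ascending.
function currentWordIndex(startTimes, currentTime) {
  var idx = -1;
  for (var i = 0; i < startTimes.length; i++) {
    if (startTimes[i] <= currentTime) idx = i;
    else break;
  }
  return idx;
}

// Browser wiring (assumption — names are illustrative):
//   audio.addEventListener('timeupdate', function () {
//     var i = currentWordIndex(times, audio.currentTime);
//     wordSpans.forEach(function (w, j) {
//       w.classList.toggle('active', j === i);
//     });
//   });
```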
 

2. In paneltext.js I see the .subtitle arrays; how did you go about getting the time values on these projects? This certainly seems to be the most tedious aspect of the project I'm looking at (approximately 1000 word-aligned pages, btw); even with my Logic chops however, it's not a quick task and I have no way to export into anything useful, e.g., xml. I had a go with Soundbooth, using the 'Analyze Speech’ + 'Export ... Speech Analysis'. Only with a reference text were the results useful, though the exported xml file has so much extraneous taggery that to extract/convert the time values to a data-attribute will require a load of editing in BBEdit. As to CMU Sphinx, I've installed it and ran a couple of the demos but many hours later I have no idea how exactly to proceed with it for my objectives. Some granny-proof documentation/tutorials would be most welcome, but that's another board ;)

At the moment I am using a third-party service such as 3playmedia, or I'm grabbing subtitled media and guessing the word timings, which surprisingly is often good enough.
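One way to sketch that guessing: spread a subtitle cue's words across the cue's duration. The length-weighting below is an assumption for illustration, not necessarily what the demos do:

```javascript
// Estimate a start time for each word in a subtitle cue by
// distributing the cue's duration in proportion to word length
// (longer words get more time).
function guessWordTimings(text, cueStart, cueEnd) {
  var words = text.split(/\s+/).filter(Boolean);
  var totalChars = words.reduce(function (n, w) { return n + w.length; }, 0);
  var duration = cueEnd - cueStart;
  var t = cueStart;
  var out = [];
  words.forEach(function (w) {
    out.push({ word: w, start: t });
    t += duration * (w.length / totalChars);
  });
  return out;
}
```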

3. In the hyperpadaudio and Hyperdisken demos (http://happyworm.com/clientarea/hyperaudio/hap/v22/pad.htm, http://happyworm.com/clientarea/hyperaudio/htdemo/), where is the paneltext.js or its equivalent? Are you achieving the highlighting differently than in the other demos? Specifically, are you using an alternate method which doesn't require an array with timings for each word?

Nothing special going on here; the filenames may have changed, but it is essentially the same approach.
 

As to this: http://hyper-audio.org/r/, well that's just a thing of beauty, hella nice work man.

Thanks, there were a few people involved in that one; very much a team effort.
 

I thank you in advance for indulging my noob questions.

p.s. Mark, only after hours of investigation did I notice your presentation, which, IMO, is essential reading. Perhaps you can add a link to it on this Google discussion page? For me, the key discovery therein is the available transcription services, should my results with Soundbooth prove too time-consuming. On a paid project I could easily justify hiring one of those services, though in my case, as I'll have the texts, I require only marked-up timestamping.

Sure, I'll post a link to it. However, somewhere in this group there should be a post containing a list of the various transcription services. I plan to put a website together at some point, which should make the key information more obvious. I should also investigate Soundbooth.
 

Cheers, sr


It's encouraging to hear about other people working in this area. I'd love to hear how you get on.

Cheers

Mark

Steve Raskin

Jun 5, 2012, 5:04:24 PM
to hyper...@googlegroups.com
Thanks again Mark. My research into hyperaudio was prompted by an invitation from a client to advise them regarding a project which, I believe, is a most fitting application for hyperaudio. I’m awaiting their response and I’ll let you know what happens when I hear from them. Cheers, sr

Mark Boas

Nov 7, 2012, 11:47:16 AM
to hyper...@googlegroups.com
Sorry for being quiet. I've been working on some stuff for Al Jazeera English in the run-up to the election; the good news is it was related to hyperaudio.

here's an example:
http://www.aljazeera.com/indepth/interactive/2012/10/20121023134433218846.html

I included some undocumented URL params so you can do stuff like this http://www.aljazeera.com/indepth/interactive/2012/10/20121023134433218846.html?k=economy&t=1000
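For illustration, a sketch of how such params might be read client-side. Since they're undocumented, the meanings assumed here (k = keyword to highlight, t = start time) are guesses from the example link:

```javascript
// Parse a location.search string like '?k=economy&t=1000' into the
// two hypothetical hyperaudio options: a keyword and a start time.
function parseHyperaudioParams(search) {
  var params = {};
  search.replace(/^\?/, '').split('&').forEach(function (pair) {
    if (!pair) return;
    var kv = pair.split('=');
    params[decodeURIComponent(kv[0])] = decodeURIComponent(kv[1] || '');
  });
  return {
    keyword: params.k || null,
    startTime: params.t ? Number(params.t) : null
  };
}

// In a page you would call parseHyperaudioParams(window.location.search).
```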

Any thoughts of course appreciated.

Cheers

Mark

Weston Ruter

Nov 7, 2012, 11:57:34 AM
to hyper...@googlegroups.com
Nice work, Mark. How did you obtain the word timing data?

Mark

Nov 7, 2012, 5:31:33 PM
to hyper...@googlegroups.com
Thanks! We used 3playmedia.com

Mark Boas

Nov 14, 2012, 6:16:25 AM
to hyper...@googlegroups.com
Latest screencast demonstrating typed natural language transitions http://happyworm.com/screencams/hyperaudiopad/2012-10-24/HyperaudioPad-Oct12.m4v

Zev Averbach

Nov 14, 2012, 12:38:13 PM
to hyper...@googlegroups.com
I was blown away by the Al Jazeera transcripts, Mark -- the hypertranscripts (sp?) are beautiful and eminently readable.

Grabbed your code from GitHub and made this, using Inqscribe + the SRT--->Hypertranscript utility (minus the "oval"s).


Not as sexy as 3Play's timing, I know -- but our first attempt.

We are hoping to develop a transcription app (manual, not ASR) which generates hypertranscript and possibly WebVTT/SRT files by recording a timecode every time the space bar is pressed, then compensating for the hearing-to-typing latency before publishing the transcript.
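That workflow can be sketched as: stamp the media's current time on each space-bar press, then subtract a fixed hearing-to-typing latency. The 500 ms figure below is an arbitrary placeholder; in practice it would be tuned or measured per typist:

```javascript
// Assumed constant: how long after hearing a word the typist
// finishes typing it and hits space (seconds).
var TYPING_LATENCY = 0.5;

// Record one word with a compensated start time. pressTime is the
// media's currentTime when the space bar was pressed. Clamped so
// compensation never produces a negative timestamp.
function stampWord(word, pressTime, stamps) {
  stamps.push({
    word: word,
    start: Math.max(0, pressTime - TYPING_LATENCY)
  });
  return stamps;
}

// Browser wiring (assumption — names are illustrative):
//   textarea.addEventListener('keydown', function (e) {
//     if (e.key === ' ') stampWord(lastTypedWord(), audio.currentTime, stamps);
//   });
```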

It sounds like Inqscribe's roadmap might eventually support something like this, and possibly these guys (Wreally Transcribe) too.

We're here to help, if you or anyone needs input on UI/UX, as well as transcription services.

Zev Averbach

Mark Boas

Nov 15, 2012, 4:56:23 PM
to hyper...@googlegroups.com
That's great Zev - thanks!