|Hyperaudio Applications and Demos||Mark Boas||11/17/11 8:57 AM|
Applications and Demos
Proof of concept : http://happyworm.com/jPlayerLab/audiotextsync/v13/
Danish Radio Demo - Hyperdisken Programme : http://yoyodyne.cc/h/ -
Minnesota Public Radio - http://minnesota.publicradio.org/proto/hyperAudio/ (based on Hyperdisken)
Radiolab / Soundcloud : http://hyper-audio.org/r/
Henrik Moltke Interview on DR : http://happyworm.com/clientarea/hyperaudio/htdemo/
Hyperaudio Pad Prototype : http://happyworm.com/clientarea/hyperaudio/hap/
(work in progress - click on the latest version number to start)
|Re: Hyperaudio Applications and Demos||Mark Boas||12/13/11 10:12 AM|
A screencast I made yesterday demoing the Hyperaudio Pad. http://happyworm.com/screencams/hyperaudiopad/2011-12-12/
|Re: Hyperaudio Applications and Demos||Matthew Terenzio||12/13/11 4:27 PM|
Very good. I'm beginning to get it. Wheels are turning . . .
|Re: Hyperaudio Applications and Demos||Mark Boas||12/14/11 2:54 AM|
Good to hear Matt - if you have any ideas or can think of any use cases, I'd love to hear them.
|Re: Hyperaudio Applications and Demos||Mark Boas||1/30/12 1:43 AM|
This popped up in my tweet-stream.
“@MattWilcox: My problem with video blogs is they take too damned long to consume. Give me the details in text, if I'm interested, I'll watch.”
It seems there is a need for Hyperaudio in all sorts of circumstances; the trick is to make the process of creating Hyperaudio easy enough for the everyday producer. This is what I hope we can focus on.
|Re: Hyperaudio Applications and Demos||Mark Boas||5/3/12 2:01 AM|
A type of hyper-transcript : https://developer.mozilla.org/en-US/demos/detail/html5-audio-read-along/launch by Weston Ruter.
Cool that you can control the speed (if your browser supports it).
|Re: [hyperaudio] Re: Hyperaudio Applications and Demos||Mariano Blejman||5/3/12 3:15 AM|
Hi, I'm using Firefox 12 and I can't see the speed control (I can see it in Google Chrome, but there the sound doesn't work).
Editor-in-chief Suplemento NO
Editor of Digital Culture section
Página/12, Buenos Aires, Argentina
Hacks/Hackers Buenos Aires
twitter: @blejman @supleno @cult_digital @hackshackersba
|Re: [hyperaudio] Re: Hyperaudio Applications and Demos||Mark Boas||5/3/12 3:20 AM|
I haven't had a chance to play with playbackRate much yet, so I'm not quite sure what's causing the issue. You can check whether your browser supports it by trying out http://areweplayingyet.org/
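For anyone wanting to check support in code rather than via that site, here is a minimal feature-detection sketch (my own illustration, not from any of the demos). The object passed in would normally be an HTMLAudioElement; some browsers expose the property but silently ignore writes, hence the write-and-read-back check:

```javascript
// Check whether an audio element supports changing playback speed.
// Browsers lacking the feature either omit the property entirely
// or ignore attempts to write to it.
function supportsPlaybackRate(audio) {
  if (!('playbackRate' in audio)) return false;
  const original = audio.playbackRate;
  try {
    audio.playbackRate = 2.0;            // try to double the speed
    const works = audio.playbackRate === 2.0;
    audio.playbackRate = original;       // restore the previous rate
    return works;
  } catch (e) {
    return false;
  }
}
```

In a page you would call it as `supportsPlaybackRate(document.querySelector('audio'))` and hide the speed control when it returns false.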
Might be worth contacting the author? I would be interested in knowing why too! :)
|Re: Hyperaudio Applications and Demos||J. Reuben Wetherbee||5/17/12 10:58 AM|
My name is Reuben Wetherbee and I work for the University of Pennsylvania School of Arts and Sciences. I was recently given a project to help members of the Writing Program come up with a way to align a whole bunch of readings by poets with the corresponding text. I ended up using jPlayer and Hyperaudio concepts to get the job done. The final product has both a player and a creator/editor module where the Writing Program can do some manual alignment for themselves, starting from plain text. Manual alignment works since most of the poems are rather short. You can choose to align by word or by line, and edit already-tagged text.
Anyway, if anybody from the group wants to check it out and give me feedback that would be great.
The code is on github at www.github.com/jrweth/MediaAlignedText
You can see some documentation and some demos at:
Some Poems that have already been aligned and rolled out:
|Re: Hyperaudio Applications and Demos||Mark Boas||5/21/12 3:53 AM|
It's great to see this type of thing being developed. I tried it out and it works well. I especially like the timeline view.
What I'm aiming to set up is an algorithm that will produce a good first pass of speech-to-text timings, which can then be tweaked. An algorithm rarely gets everything right! I'm not quite sure yet what form the interface for the tweaking will take.
All very exciting! Keep us posted.
PS Have you got a demo version of the editor online somewhere? - I find that's often a useful complement to having the code on github as it lets people try tools out really easily.
|Re: Hyperaudio Applications and Demos||J. Reuben Wetherbee||5/23/12 6:09 AM|
Yes. Keep me posted on the speech-to-text timings project. I fooled around with CMU Sphinx a bit as well. One of the problems I ran into was just the sheer time it took to do the parsing for longer files, which didn't necessarily work out well in a web context. I also fooled around with the aligner provided by the Linguistics department here at Penn, which I got working well enough to spit out hyperaudio, but I ran into similar issues with longer files (e.g. over 10 minutes).
p.s. there are demos for the editor at the github documentation site (http://jrweth.github.com/MediaAlignedText/editor_demo.html). Do you think additional demos would be helpful?
|Re: Hyperaudio Applications and Demos||tara shankar||5/29/12 8:16 PM|
I'm really happy to see this kind of textual and sound interplay work. I especially like the more designed treatment of the transcript in the Radiolab / Soundcloud demo, where the transcript (I assume) is algorithmically placed based on certain attributes of the voice audio track. I'm guessing there's a constant vertical scroll animation and the word onset determines the y placement. Is the size of the words based on average amplitude, or a marked-up feature? Really nice and amusing treatment of the program for online consumption.
There is some very early textual animation work done at the MIT Media lab by folks like Suguru Ishizaki (I think still at CMU), Yin Yin Wong (1995 -- mostly designed, not algorithmic). I contributed some work that built on their phenomenal design and programming sense by using prosodic characteristics of the text to animate the "transcript". My work was called Prosodic Font (1998) -- I'm not sure the demo still works (Java 1.1 :) There was no backend to it -- the system accepted a marked up form of the transcript to produce the visualization.
I remain very interested in this area. It seems the time is drawing very near when there will be complete mash-ups between speech recognition, prosodic recognition and semantics, all accessible from a Web scripting language; the AT&T speech mashup portal and a few other efforts point that way. The big missing piece for me is still the lack of good audio recording through HTML5 or other open technologies.
|Re: Hyperaudio Applications and Demos||Thought Bubble||5/30/12 1:19 PM|
Great to see this kind of development. I'm curious whether there is any way to auto-transcribe the VO being spoken into a text stream (with associated time markers) of sorts. I know from experience that Adobe has a very strong auto-transcriber built into Adobe Premiere, which can transcribe practically any given audio file. If you provide it with a matching VO script it is over 90% accurate. Have you had a chance to experiment with it?
|Re: [hyperaudio] Re: Hyperaudio Applications and Demos||Mark Boas||6/1/12 4:46 AM|
Fantastic to see this discussion and great to have some time to finally respond ....
@Reuben, regarding CMU Sphinx: I have yet to install it successfully, but I would love to play about with it. I think I should find time to make a much more concerted effort in this area. After all, the holy grail would be to create a web-based service that would do most of the transcription, plus an easy-to-use tool to 'tidy up' the results and get them talking! I don't think the time it takes to process is a show-stopper, as we can email the user when the job is ready, and depending on server resources and popularity we may have to queue jobs up. I may ping you if I have issues installing. I hope that's OK! :)
@Tara with the Radiolab demo, many people perceive some sort of algorithm at work with the word sizes, but all we are actually doing is displaying the text in different colours depending on who is speaking. The size is actually random, but it's like our brains read more into it. I'd love however to take account of any extra meta-data we could associate with each word, such as volume, pitch and express that through colour and size. I'd love to see any work you have created in this area!
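What Mark describes for the Radiolab demo (colour keyed to the speaker, size random) could be sketched roughly like this. The speaker names and colour palette below are made up for illustration and are not the demo's actual values:

```javascript
// Style one transcript word the way the Radiolab demo is described:
// colour keyed to the speaker, size random within a band.
// No audio analysis is involved; our brains just read meaning into it.
const SPEAKER_COLOURS = { jad: '#e8593c', robert: '#4aa3c7' }; // illustrative palette

function styleWord(word, speaker, rand) {
  const colour = SPEAKER_COLOURS[speaker] || '#999'; // fallback for unknown voices
  const size = 12 + Math.floor(rand() * 18);         // 12-29px, purely random
  return { text: word, color: colour, fontSize: size + 'px' };
}
```

Passing `Math.random` as `rand` gives the random sizing; passing a fixed function makes the output deterministic for testing. Swapping `rand` for a function of per-word volume or pitch would give the metadata-driven sizing Mark says he'd like to try.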
@Thought I have heard of that feature, yes, but I've yet to stump up the cash for the latest version of Adobe Premiere. What I'm aiming to do is create a web-based tool that is free, easy to use and acceptably quick.
|Re: Hyperaudio Applications and Demos||Mark Boas||6/1/12 8:54 AM|
Just found this...
Uses CMU Sphinx and looks promising.
|Re: [hyperaudio] Re: Hyperaudio Applications and Demos||Weston Ruter||6/1/12 10:28 AM|
@Mark: Thanks for the share.
@Mariano: Greetings. Yes, Firefox does not support the playbackRate property yet, so that's why the speed option is not available. It is supported by both Chrome and Safari, however, so you can try it in those browsers.
Also, happy to say that I won 2nd place in the audio Dev Derby. May this further spread the cause of hyperaudio!
|Re: Hyperaudio Applications and Demos||Steve Raskin||6/3/12 12:06 PM|
I've been checking out the various hyperaudio demos, and I have questions ;)
1. My understanding was that legit HTML5 data attributes must be prefixed with 'data-'; where did 'oval' and 'm' come from, and are they legit HTML5 data attributes? I'm also wondering about the pros/cons of each word being duped as an attribute of itself (e.g., oval=), versus IDs for each word (as in my hack prototype); I suppose the former is more semantically meaningful and easier to read in an array.
2. In paneltext.js I see the .subtitle arrays; how did you go about getting the time values on these projects? This certainly seems to be the most tedious aspect of the project I'm looking at (approximately 1000 word-aligned pages, btw); even with my Logic chops, it's not a quick task, and I have no way to export into anything useful, e.g. XML. I had a go with Soundbooth, using 'Analyze Speech' + 'Export ... Speech Analysis'. Only with a reference text were the results useful, though the exported XML file has so much extraneous taggery that extracting/converting the time values to a data attribute will require a load of editing in BBEdit. As for CMU Sphinx, I've installed it and run a couple of the demos, but many hours later I have no idea how exactly to proceed with it for my objectives. Some granny-proof documentation/tutorials would be most welcome, but that's another board ;)
3. In the Hyperaudio Pad and Hyperdisken demos (http://happyworm.com/clientarea/hyperaudio/hap/v22/pad.htm, http://happyworm.com/clientarea/hyperaudio/htdemo/), where is paneltext.js or its equivalent? Are you achieving the highlighting differently than in the other demos? Specifically, are you using an alternate method that doesn't require an array with timings for each word?
As to this: http://hyper-audio.org/r/, well that's just a thing of beauty, hella nice work man.
I thank you in advance for indulging my noob questions.
p.s. Mark, only after hours of investigation did I notice your presentation, which, IMO, is essential reading. Perhaps you can add a link to it on this Google discussion page? For me, the key discovery therein is the list of available transcription services, should my results with Soundbooth prove too time-consuming. On a paid project I could easily justify hiring one of those services, though in my case, as I'll have the texts, I require only marked-up timestamping.
|Re: Hyperaudio Applications and Demos||Mark Boas||6/3/12 2:56 PM|
My answers inline.
Yes - technically we should use data-t or similar; I cut it down to t in some demos just for efficiency's sake. Transcripts can get rather large, and using just t instead of a full data-t attribute can cut the size down significantly. That said, I am leaning towards using data-t in future for validation and accessibility reasons. The duplication present in earlier transcripts with the oval attribute was just because that was the format in which the 3playmedia.com service exported word-aligned transcripts in HTML.
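Whichever attribute name is used, the highlighting only needs the word start times once they are read off the spans. A minimal sketch of the lookup (my own illustration of the general technique, not code from the demos), assuming start times in seconds collected in document order:

```javascript
// Given the start times (seconds) harvested from each word's t/data-t
// attribute, find the index of the word to highlight for the player's
// current time. Assumes `times` is sorted ascending, one entry per word.
function currentWordIndex(times, currentTime) {
  let active = -1;
  for (let i = 0; i < times.length; i++) {
    if (times[i] <= currentTime) active = i;
    else break;
  }
  return active; // -1 before the first word starts
}
```

In a page this would be wired to the media element's `timeupdate` event, toggling a CSS class on the span at the returned index.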
At the moment I am using a third-party service such as 3playmedia, or I'm grabbing subtitled media and guessing the word timings, which, surprisingly, is often good enough.
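The "guessing" isn't spelled out in the thread; one plausible interpolation (an assumption on my part, not necessarily what Mark did) is to spread each subtitle cue's duration across its words in proportion to word length:

```javascript
// Guess per-word start times inside one subtitle cue by distributing
// the cue's duration across its words proportionally to word length.
// This is one plausible heuristic, not the demos' documented method.
function guessWordTimings(cueStart, cueEnd, text) {
  const words = text.split(/\s+/).filter(Boolean);
  const totalChars = words.reduce((n, w) => n + w.length, 0);
  const duration = cueEnd - cueStart;
  let t = cueStart;
  return words.map(w => {
    const start = t;
    t += duration * (w.length / totalChars);       // longer words get more time
    return { word: w, start: Number(start.toFixed(3)) };
  });
}
```

Running this over every cue in an SRT/WebVTT file yields a word-level timing array good enough for rough highlighting, which can then be hand-tweaked.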
Nothing special going on here, the filenames may have changed but it is essentially the same approach.
Thanks, there were a few people involved in that one; very much a team effort.
Sure, I'll post a link to it. However, somewhere in this group there should be a post containing a list of the various transcription services. I plan to put a website together at some point, which should make the key information more obvious. I should also investigate Soundbooth.
It's encouraging to hear about other people working in this area. I'd love to hear how you get on.
|Re: Hyperaudio Applications and Demos||Steve Raskin||6/5/12 2:04 PM|
Thanks again Mark. My research into hyperaudio was prompted by an invitation from a client to advise them regarding a project which, I believe, is a most fitting application for hyperaudio. I’m awaiting their response and I’ll let you know what happens when I hear from them. Cheers, sr
|Re: Hyperaudio Applications and Demos||Mark Boas||11/7/12 8:47 AM|
Sorry for being quiet. I've been working on some stuff for Al Jazeera English in the run-up to the election; the good news is that it was related to hyperaudio.
here's an example:
I included some undocumented URL params, so you can do stuff like this: http://www.aljazeera.com/indepth/interactive/2012/10/20121023134433218846.html?k=economy&t=1000
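For anyone poking at those params, a small parser sketch. The meanings assumed here (k = keyword to search for, t = time offset) are inferred from the example URL; Mark doesn't document them or the units of t in the post:

```javascript
// Extract the undocumented k (keyword) and t (time) parameters from a
// page URL like the Al Jazeera example. t's units are not documented,
// so it is simply returned as a number.
function hyperaudioParams(url) {
  const query = url.split('?')[1] || '';
  const params = {};
  for (const pair of query.split('&')) {
    const [key, value] = pair.split('=');
    if (key) params[key] = decodeURIComponent(value || '');
  }
  return {
    keyword: params.k || null,
    time: params.t !== undefined ? Number(params.t) : null
  };
}
```

In the page itself one would presumably use `keyword` to filter or highlight the transcript and `time` to seek the player on load.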
Any thoughts of course appreciated.
|Re: [hyperaudio] Re: Hyperaudio Applications and Demos||Weston Ruter||11/7/12 8:57 AM|
|Re: [hyperaudio] Re: Hyperaudio Applications and Demos||Mark Boas||11/7/12 2:31 PM|
Thanks! We used 3playmedia.com
|Re: Hyperaudio Applications and Demos||Mark Boas||11/14/12 3:16 AM|
Latest screencast demonstrating typed natural language transitions http://happyworm.com/screencams/hyperaudiopad/2012-10-24/HyperaudioPad-Oct12.m4v
|Re: Hyperaudio Applications and Demos||Zev Averbach||11/14/12 9:38 AM|
I was blown away by the Al-Jazeera transcripts, Mark -- the hypertranscripts (sp?) are beautiful and eminently readable.
Grabbed your code from GitHub and made this, using Inqscribe + the SRT ---> Hypertranscript utility (minus the "oval"s).
Not as sexy as 3Play's timing, I know -- but our first attempt.
We are hoping to develop a transcription app (manual, not ASR) which produces hypertranscript and possibly WebVTT/SRT files by capturing a timecode at every space-bar press, then compensating for the "hearing-->typing" latency before publishing the transcript.
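A sketch of the latency-compensation step Zev describes, plus WebVTT-style timestamp formatting. The 0.3 s offset is a placeholder of mine; a real app would measure the delay per transcriber:

```javascript
// Shift captured space-bar timestamps back by the hearing->typing delay.
// The 300 ms figure is an assumed placeholder, not a measured value.
const DEFAULT_LATENCY = 0.3; // seconds

function compensate(tapTimes, latency = DEFAULT_LATENCY) {
  return tapTimes.map(t => Math.max(0, t - latency)); // never go negative
}

// Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTime(seconds) {
  const pad = (n, w) => String(n).padStart(w, '0');
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = (seconds % 60).toFixed(3);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 6)}`;
}
```

Pairing consecutive compensated times then gives each cue's start and end for the WebVTT/SRT output.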
It sounds like Inqscribe's roadmap might eventually support something like this, and possibly these guys (Wreally Transcribe) too.
We're here to help, if you or anyone needs input on UI/UX, as well as transcription services.
|Re: Hyperaudio Applications and Demos||Mark Boas||11/15/12 1:56 PM|
That's great Zev - thanks!