audio upload / post-processing

Laurent Savaete

Mar 5, 2012, 11:55:01 AM

to ductus-d...@googlegroups.com

Here's a long overdue email with a few thoughts regarding audio
processing upon upload.

At the moment, we can upload files in ogg or aac format, the server
saves them directly and returns a urn for them. End of story.

This scheme has a few limitations, such as:
1- online recording is (for now) only able to deliver wav files. They
are excessively big, and since they are recorded through flash, the
actual audio quality is not very high anyway.
2- when concatenating several audio files together (like when
producing the podcast), it would be a good idea to:
2a- clean up the audio bits, to avoid clicks when playing the
"tape" between two bits (in audacity, you'd do edit/Find zero
crossings)
2b- standardise audio volume,
2c- possibly do noise removal, and such.
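As an illustration of what 2a and 2b involve, here is a minimal sketch that works on a plain list of signed 16-bit samples. This is not Audacity's or ductus's code. The function names are hypothetical, "zero crossing" is taken to mean a sign change between adjacent samples, and real audio would have to be decoded first (e.g. with ffmpeg).

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def nearest_zero_crossing(samples: List[int], index: int) -> int:
    """Return the crossing sample closest to `index`.

    A crossing exists between i and i + 1 when samples[i] is 0 or the two
    samples have opposite signs; of the pair, the quieter sample is used.
    Returns `index` unchanged if the clip never crosses zero.
    """
    best, best_dist = index, None
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if a == 0 or (a < 0) != (b < 0):
            candidate = i if abs(a) <= abs(b) else i + 1
            dist = abs(candidate - index)
            if best_dist is None or dist < best_dist:
                best, best_dist = candidate, dist
    return best


def trim_to_zero_crossings(samples: List[int]) -> List[int]:
    """Cut a clip so it starts and ends near zero, to avoid clicks at joins."""
    if len(samples) < 2:
        return list(samples)
    start = nearest_zero_crossing(samples, 0)
    end = nearest_zero_crossing(samples, len(samples) - 1)
    return samples[start:end + 1]


def normalize_peak(samples: List[int], target: float = 0.9) -> List[int]:
    """Scale so the loudest sample reaches `target` of full scale (simple
    peak normalization, one possible reading of "standardise volume")."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)
    gain = target * INT16_MAX / peak
    # Clamp so rounding can never overflow the 16-bit range.
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]
```

Peak normalization is the simplest option. Loudness-based normalization would match perceived volume better, but it needs more machinery than fits here.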

1- is more pressing than any of the 2- series. As Jim mentioned in
another email, allowing wav files onto the server now pretty much
means allowing the format forever, and wasting the corresponding disk
space/bandwidth.

Here are two ways I see to tackle these issues:
a- get the server to do all the processing (like converting wav to ogg),
which is all CPU intensive. This is the most straightforward way to do
it, and (almost) all the code exists, but it raises the issue of server
load (scaling up), and Jim mentioned possible issues with DoS attacks.
From testing with a few wav samples taken from the FSI material, system
running time for a wav->ogg conversion of a sample a few seconds long
was 20-30 ms using ffmpeg (on devbox). For comparison, the CPU time used
by sql queries (on my machine, with a tiny sqlite db) is already around
10 ms per page.

b- delegate as much as possible to the client.
= b1- Converting wav to ogg is possible in flash (there is a proof of
concept I mentioned earlier), but it relies on "lab" features of flash
("Alchemy", which allows linking C/C++ code into flash binaries) which
may or may not exist in future releases. Adobe seems to have stated
they would include this in upcoming versions of the player, but I
wouldn't rely much on that.
= b2- there are partial solutions like firefogg.org which provides a
web layer over ffmpeg and may help create an ogg encoding solution for
firefox users (who are willing to install ffmpeg+firefogg...). This is
all free, but VERY user unfriendly.
= b3- as stupid as it may sound, we could probably port a limited
version of the ogg/vorbis encoder to javascript. Looking at some code
on xiph.org, porting the code would be a substantial amount of work,
but nothing mindblowing.
= b4- webaudio/webRTC seem to offer what we need (see second example
here: http://www.w3.org/TR/webrtc/#examples). That would allow us to
get rid of the proprietary flash thing. All formats we need seem to be
supported (in Chromium/Chrome dev branches at least). Problem is: when
will this be available in standard releases?
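For option a-, the server-side work amounts to one ffmpeg call per upload. A minimal sketch, assuming ffmpeg built with libvorbis is on the PATH; the helper names and the 30 s timeout are hypothetical, not ductus's actual code.

```python
import os
import subprocess
import tempfile
from typing import List


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    # Same invocation as the timing test later in this thread
    # (ffmpeg -i in.wav -acodec libvorbis out.ogg); "-y" only avoids an
    # interactive overwrite prompt if dst already exists.
    return ["ffmpeg", "-y", "-i", src, "-acodec", "libvorbis", dst]


def wav_bytes_to_ogg(wav_data: bytes, timeout: float = 30.0) -> bytes:
    """Convert an uploaded WAV payload to Ogg/Vorbis and return the result.

    The temporary directory (including the WAV) is deleted on exit, so only
    the compressed bytes survive.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.wav")
        dst = os.path.join(tmp, "out.ogg")
        with open(src, "wb") as f:
            f.write(wav_data)
        # check=True raises CalledProcessError on a non-zero exit status;
        # timeout kills ffmpeg and raises TimeoutExpired, so a single
        # upload cannot hold a worker indefinitely.
        subprocess.run(build_ffmpeg_command(src, dst),
                       check=True, capture_output=True, timeout=timeout)
        with open(dst, "rb") as f:
            return f.read()
```

Passing the arguments as a list (no shell) also keeps user-supplied filenames out of shell parsing.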

As a way forward, I'd suggest the following path:
1 - we allow wav upload for now (maybe finding a way to let only stuff
recorded online through), and convert to ogg on server side as uploads
come in. We currently have (very) low traffic, so it shouldn't
overload the server. It's easy to implement, as it just requires a few
lines of code. We store no wav files, so no need to worry about them
once we close the door in the future. All we get is a temporary impact
on bandwidth consumed by initial upload of sounds recorded online.
2 - We improve our online recorder to support webaudio/webrtc api in
browsers that support it, with the current flash as fallback.
Hopefully chrome/chromium users will benefit from it very soon, then
firefox, and what else? :) I haven't managed to figure this out from
specs, but we may even be able to convert the output of flash
recording to ogg using webaudio on the client.
3 - we set up stats so we can track usage of each type of recording
method, and get rid of the old flash blurb as soon as it falls below a
certain percentage of users. When that is done, we can dump flash, and
ban wav from uploads.
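Step 3 boils down to counting recordings per method and comparing Flash's share against a cut-off. A sketch only: the method labels, the 5% threshold and the minimum sample size are hypothetical placeholders, since the plan does not fix a percentage.

```python
from collections import Counter
from typing import Dict, Iterable


def method_shares(events: Iterable[str]) -> Dict[str, float]:
    """Fraction of recordings made with each method (e.g. 'flash', 'webaudio')."""
    counts = Counter(events)
    total = sum(counts.values())
    return {m: n / total for m, n in counts.items()} if total else {}


def can_drop_flash(events: Iterable[str], threshold: float = 0.05,
                   min_samples: int = 100) -> bool:
    """True once Flash recordings fall below `threshold` of all recordings."""
    events = list(events)
    if len(events) < min_samples:
        return False  # not enough data to make the call yet
    return method_shares(events).get("flash", 0.0) < threshold
```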

With those 3 steps, we can deploy a satisfactory UI to the main server
next week with online recording (which I think is very important),
write only a little temporary code, and limit the risk of server
overload, while not polluting the storage db with fat wav files.

Any opinions?

Ian Sullivan

Mar 5, 2012, 12:05:02 PM

to ductus-d...@googlegroups.com

On 03/05/2012 11:55 AM, Laurent Savaete wrote:
> As a way forward, I'd suggest the following path:
> 1 - we allow wav upload for now (maybe finding a way to let only stuff
> recorded online through), and convert to ogg on server side as uploads
> come in. We currently have (very) low traffic, so it shouldn't
> overload the server. It's easy to implement, as it just requires a few
> lines of code. We store no wav files, so no need to worry about them
> once we close the door in the future. All we get is a temporary impact
> on bandwidth consumed by initial upload of sounds recorded online.
> 2 - We improve our online recorder to support webaudio/webrtc api in
> browsers that support it, with the current flash as fallback.
> Hopefully chrome/chromium users will benefit from it very soon, then
> firefox, and what else? :) I haven't managed to figure this out from
> specs, but we may even be able to convert the output of flash
> recording to ogg using webaudio on the client.
> 3 - we set up stats so we can track usage of each type of recording
> method, and get rid of the old flash blurb as soon as it falls below a
> certain percentage of users. When that is done, we can dump flash, and
> ban wav from uploads.
>
> With those 3 steps, we can deploy a satisfactory UI to the main server
> next week with online recording (which I think is very important),
> write only a little temporary code, and limit the risk of server
> overload, while not polluting the storage db with fat wav files.
>
> Any opinions?

I think that sounds like a very reasonable plan, especially with the
addition of stat monitoring you mention in #3.

Having talked with a couple of people about the summer project this past
week and done some touring with them through the main site, dev site,
etc, I very much agree that getting the dev UI and online recording up
on the main site is more important at the moment than the server load.

-Ian

Laurent Savaete

Mar 7, 2012, 12:17:21 PM

to ductus-d...@googlegroups.com

As the plan is so popular, I acted upon it today :)
If you play around with the editor on devbox, you can now record
online in Wav format, which is encoded to ogg/vorbis on the server
when you upload the file.
From that point on, the wav file is gone, and every subsequent action
(like listening, etc...) is performed on the compressed file.
So it is transparent from a user perspective (you'll have to check
transfers in Firebug to actually see the difference: the compressed
ogg file is 5-10 times smaller than the wav file).
Manually uploading a wav file is not prevented yet, but I'll try to
think of something smart to limit wasted bandwidth.

Jim, I pushed the corresponding code on a new branch
(audio-processing). Would be ace if you could check the few commits in
it, and merge into master. Then the ui-editor branch should only be js
code.

I also attempted to put together a simple html5 test using
getUserMedia() to record directly in chrome-dev, but the browser keeps
crashing. I'll keep you posted on my progress.

Jim Garrison

Mar 7, 2012, 11:42:48 PM

to ductus-d...@googlegroups.com

Looks like March 26 according to
http://www.chromium.org/developers/calendar
(which is when Chrom[ium] 18 will become stable). I'm not sure if it
will be enabled by default, but a user of Chrom[ium] 18 (currently in
the beta channel) can go to chrome://flags and enable it.

Anyway, I think that 'b4' is clearly the best of these options.

> As a way forward, I'd suggest the following path:
> 1 - we allow wav upload for now (maybe finding a way to let only stuff
> recorded online through), and convert to ogg on server side as uploads
> come in. We currently have (very) low traffic, so it shouldn't
> overload the server. It's easy to implement, as it just requires a few
> lines of code. We store no wav files, so no need to worry about them
> once we close the door in the future. All we get is a temporary impact
> on bandwidth consumed by initial upload of sounds recorded online.

Some notes on this below

> 2 - We improve our online recorder to support webaudio/webrtc api in
> browsers that support it, with the current flash as fallback.
> Hopefully chrome/chromium users will benefit from it very soon, then
> firefox, and what else? :) I haven't managed to figure this out from
> specs, but we may even be able to convert the output of flash
> recording to ogg using webaudio on the client.

this would be awesome!

> 3 - we set up stats so we can track usage of each type of recording
> method, and get rid of the old flash blurb as soon as it falls below a
> certain percentage of users. When that is done, we can dump flash, and
> ban wav from uploads.
>
> With those 3 steps, we can deploy a satisfactory UI to the main server
> next week with online recording (which I think is very important),
> write only a little temporary code, and limit the risk of server
> overload, while not polluting the storage db with fat wav files.
>
> Any opinions?

I suppose this all makes sense, especially since we only plan to allow
the server to accept WAV files for a limited time. But there are some
denial of service issues, as you noted...

* Currently there is no max size of an accepted WAV file that might be
passed to the encoder. On wikiotics.org we have nginx configured not to
accept files above 20 megabytes (though this does not help other users
of ductus). How long does a 20 megabyte WAV file take to encode, and
what is the RAM footprint while this happens?

* Currently we have only a few processes of gunicorn running. If these
are all occupied, then all requests to the site are locked out until one
finishes. In order to combat DoS attacks, the mediacache uses a
separate pool of gunicorn processes which are /only/ used for media
conversion. This ensures that only a few threads of media conversion
are going at a time, and also that people can still get to the content
of the main site even if there is a backlog of media waiting to be
converted. If we do conversion upon upload, we won't have these safety
features. One solution would be to create a third pool for all POST
requests. Or, we could just hope that nobody sends us too much audio
all at once, but if this were to happen it would break the entire site,
as things stand.

Laurent Savaete

Mar 9, 2012, 8:39:40 AM

to ductus-d...@googlegroups.com

Alright people, here is the show stopper I found today:
https://code.google.com/p/chromium/issues/detail?id=112367

it's a bug in chrome that prevents audio capture. Video is apparently
more important to most people, what are they thinking? :)
It's about a month old, so it's a hot issue, I suppose.

So anyway, we can help get this bug prioritized by going to the link
above and starring the issue. Please do so if you can, and spread the
word!
Which means html5 *audio* capture probably won't work by the end of the month...
(and apparently, opera has the same problem)

Jim Garrison

Mar 10, 2012, 3:05:43 PM

to ductus-d...@googlegroups.com

On 03/07/2012 09:17 AM, Laurent Savaete wrote:
> Jim, I pushed the corresponding code on a new branch
> (audio-processing). Would be ace if you could check the few commits in
> it, and merge into master.

Okay, I cleaned it all up, making it into one patch since there is no
reason to allow WAV files into the ResourceDatabase if we are converting
them all to Ogg/vorbis upon upload anyway. Attached is the current
state of the patch. It would be very nice to have some stats on the
maximum cpu time and ram we expect the conversion to take before we
deploy this to the live site.

Also, is there a reason you used "audio/x-wav" instead of "audio/wav"?
Per <http://tools.ietf.org/html/draft-ietf-appsawg-xdash-03> (which is
not yet approved, currently a draft), I would be inclined to lean toward
making it "audio/wav".

Another concern I have is testing for this text "Stream #0.0: Audio:
pcm_s16le". This precise string might /never/ exist if the program is
not being run from an English locale. It would be safer to test for
something that will always exist, such as simply "pcm_s16le". Which
brings me to another point: it looks like if we test for this string
specifically we are only allowing stereo 16-bit little endian wav files.
I think this could cause problems, /unless/ we are sure that wami will
always return this format *and* if we only plan to support wav files
recorded with wami. :)
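Matching only the codec token, as suggested above, could look like the sketch below. Assumptions: ffmpeg prints its stream description to stderr when run as `ffmpeg -i file`, and that line's layout differs between ffmpeg versions (older builds print "Stream #0.0", newer ones "Stream #0:0"), which is one more reason not to match the whole line. The helper name is hypothetical.

```python
import re

# Whole-token match: accepts "pcm_s16le" but not "pcm_s16be" or
# "pcm_s16le_planar" (the underscore blocks the trailing word boundary).
_PCM_S16LE = re.compile(r"\bpcm_s16le\b")


def is_pcm_s16le(ffmpeg_stderr: str) -> bool:
    """True if ffmpeg's stream description mentions the pcm_s16le codec."""
    return _PCM_S16LE.search(ffmpeg_stderr) is not None
```

This only checks the sample encoding; sample rate and channel count would need separate checks.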

0001-Audio-upload-accepts-WAV-files-converting-to-ogg-vor.patch

Jim Garrison

Mar 10, 2012, 6:25:34 PM

to ductus-d...@googlegroups.com

Thanks for finding this. It's too bad it doesn't work yet, but good to
know now nonetheless!

Laurent Savaete

Mar 11, 2012, 10:43:01 AM

to ductus-d...@googlegroups.com

> Okay, I cleaned it all up, making it into one patch since there is no
> reason to allow WAV files into the ResourceDatabase if we are converting
> them all to Ogg/vorbis upon upload anyway. Attached is the current

thanks for the clean up, it does make sense indeed not to let wav through.

> state of the patch. It would be very nice to have some stats on the
> maximum cpu time and ram we expect the conversion to take before we
> deploy this to the live site.

Here are some values, measured with tstime
(https://bitbucket.org/gsauthof/tstime), which is apparently more
accurate than builtins like time (see
http://unix.stackexchange.com/questions/18841/measuring-ram-usage-of-a-program
for more details):

command line used: ffmpeg -i in.wav -acodec libvorbis out.ogg
=============================
wav size  duration   real     user     sys      rss      vm
300kb     3s         0.193 s  0.092 s  0.008 s  5860 kb  105672 kb
1.5Mb     17s        0.283 s  0.244 s  0.016 s  5992 kb  105872 kb
10.5Mb    2min 2s    2.130 s  2.120 s  0.008 s  6060 kb  105932 kb

So RAM values seem fairly stable. Runtime is roughly proportional to
file size. If we want to, we can simply cap the wav file size in the
validation stage, or do it in the flash recorder by allowing only a
certain recording duration.
We could cap it at, say, 30 s, which would result in roughly 1.5Mb wav
files (based on testing on my local box; the files above do not all
have the same bitrate), hence a compression time of around 0.3 s (real).
I'm a bit reluctant to get my hands into flash code again (I just find
it really crappy :) but if you feel better that way, we'll just tame
the beast!
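The link between recording length and WAV size is plain arithmetic for uncompressed PCM: bytes = sample rate × channels × bytes per sample × seconds, plus a 44-byte header for a canonical WAV file. A sketch; the 22050 Hz and 16-bit defaults follow the wami settings described later in the thread, while the channel count is an assumption, so it is left as a parameter.

```python
WAV_HEADER_BYTES = 44  # canonical RIFF/WAVE header for plain PCM


def _bytes_per_second(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    return sample_rate * channels * bits_per_sample // 8


def wav_size_bytes(seconds: float, sample_rate: int = 22050,
                   channels: int = 1, bits_per_sample: int = 16) -> int:
    """Approximate size of an uncompressed PCM WAV of the given length."""
    rate = _bytes_per_second(sample_rate, channels, bits_per_sample)
    return WAV_HEADER_BYTES + int(rate * seconds)


def max_duration_seconds(max_bytes: int, sample_rate: int = 22050,
                         channels: int = 1, bits_per_sample: int = 16) -> float:
    """Longest recording that fits under a byte cap (inverse of the above)."""
    rate = _bytes_per_second(sample_rate, channels, bits_per_sample)
    return max(0, max_bytes - WAV_HEADER_BYTES) / rate
```

With these formulas a 30 s cap gives 1,323,044 bytes (about 1.3 MB) in mono, close to the 1.5Mb estimate, or 2,646,044 bytes in stereo. The measured 10.5Mb for 2 min 2 s works out to roughly 86,000 bytes/s, which is nearer the 88,200 bytes/s of 16-bit stereo at 22,050 Hz (or mono at 44,100 Hz), so the actual channel count is worth confirming before choosing a cap.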

> Also, is there a reason you used "audio/x-wav" instead of "audio/wav"?
> Per <http://tools.ietf.org/html/draft-ietf-appsawg-xdash-03> (which is
> not yet approved, currently a draft), I would be inclined to lean toward
> making it "audio/wav".

one thing I'm sure of is that I tried audio/wav first, and it didn't work :)
Wami actually specifically marks data as audio/x-wav in its code,
which is probably why I went for that option in the end.

> Another concern I have is testing for this text "Stream #0.0: Audio:
> pcm_s16le". This precise string might /never/ exist if the program is
> not being run from an English locale. It would be safer to test for
> something that will always exist, such as simply "pcm_s16le". Which
> brings me to another point: it looks like if we test for this string
> specifically we are only allowing stereo 16-bit little endian wav files.
> I think this could cause problems, /unless/ we are sure that wami will
> always return this format *and* if we only plan to support wav files
> recorded with wami. :)

Wami hardcodes its recording settings as 22 kHz, 16-bit, little endian.
Since we don't actually intend to accept wav, but we do it just for
wami recordings, I suppose it does make sense to restrict what we
accept?
About the non-English versions of ffmpeg, I looked at the manpage,
binaries, source code, skimmed through the whole ffmpeg site
documentation, searched around on the web, but haven't been able to
find anything that shows it is even i18n'd. So I wouldn't worry about
this :)
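If WAV is only accepted as a stopgap for wami recordings, the restriction could also be checked from the WAV header itself rather than from ffmpeg's text output, which sidesteps the locale question entirely. This sketch swaps in Python's standard `wave` module (a different technique from the patch's ffmpeg-output check); "22khz" is assumed to mean 22050 Hz, and the allowed channel counts are left configurable because the thread does not settle them. The function name is hypothetical.

```python
import io
import wave

EXPECTED_RATE = 22050      # assumption: wami's "22khz" taken as 22050 Hz
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def looks_like_wami_wav(data: bytes, allowed_channels=(1, 2)) -> bool:
    """Accept only PCM WAV data matching the recorder's fixed settings.

    The wave module only parses PCM WAV (which is little endian by
    definition) and raises wave.Error for anything else.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            return (w.getframerate() == EXPECTED_RATE
                    and w.getsampwidth() == EXPECTED_SAMPLE_WIDTH
                    and w.getnchannels() in allowed_channels)
    except (wave.Error, EOFError):
        return False
```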

From my end, I don't see any reason to block this patch; as you
updated it, it looks good to me.

Jim Garrison

Mar 11, 2012, 11:51:29 AM

to ductus-d...@googlegroups.com

On 03/11/2012 07:43 AM, Laurent Savaete wrote:
> So RAM values seem fairly stable. Runtime is roughly proportional to
> file size. If we want to, we can simply cap the wav file size in the
> validation stage, or do it in the flash recorder by allowing only a
> certain recording duration.
> We could cap it at, say, 30 s, which would result in roughly 1.5Mb wav
> files (based on testing on my local box; the files above do not all
> have the same bitrate), hence a compression time of around 0.3 s (real).
> I'm a bit reluctant to get my hands into flash code again (I just find
> it really crappy :) but if you feel better that way, we'll just tame
> the beast!

I'm fine with how things stand, unless/until we find that it actually
causes problems. Knowing that the RAM usage is stable puts an upper
bound on the resources required (since we know we will never have to
thrash while encoding).

>> Also, is there a reason you used "audio/x-wav" instead of "audio/wav"?
>> Per <http://tools.ietf.org/html/draft-ietf-appsawg-xdash-03> (which is
>> not yet approved, currently a draft), I would be inclined to lean toward
>> making it "audio/wav".
>
> one thing I'm sure of is that I tried audio/wav first, and it didn't work :)
> Wami actually specifically marks data as audio/x-wav in its code,
> which is probably why I went for that option in the end.

It looks like the real reason it didn't work is because python-magic
refers to wav as audio/x-wav. So I took the liberty of recognizing this
but still referring to it as audio/wav in the code. Let me know if this
breaks anything :)
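Recognizing what python-magic reports while using audio/wav internally amounts to a small alias table. A sketch: the table and function name are hypothetical, and only "audio/x-wav" is confirmed by this thread as python-magic's answer; "audio/wave" is included as an assumed extra spelling.

```python
# Hypothetical alias table: detected MIME type -> name used inside ductus.
_MIME_ALIASES = {
    "audio/x-wav": "audio/wav",  # what python-magic reports, per this thread
    "audio/wave": "audio/wav",   # assumption: another common spelling
}


def canonical_mime_type(detected: str) -> str:
    """Normalize case, drop parameters such as '; charset=binary', map aliases."""
    base = detected.split(";", 1)[0].strip().lower()
    return _MIME_ALIASES.get(base, base)
```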

>> Another concern I have is testing for this text "Stream #0.0: Audio:
>> pcm_s16le". This precise string might /never/ exist if the program is
>> not being run from an English locale. It would be safer to test for
>> something that will always exist, such as simply "pcm_s16le". Which
>> brings me to another point: it looks like if we test for this string
>> specifically we are only allowing stereo 16-bit little endian wav files.
>> I think this could cause problems, /unless/ we are sure that wami will
>> always return this format *and* if we only plan to support wav files
>> recorded with wami. :)
>
> Wami hardcodes its recording settings as 22 kHz, 16-bit, little endian.
> Since we don't actually intend to accept wav, but we do it just for
> wami recordings, I suppose it does make sense to restrict what we
> accept?
> About the non-English versions of ffmpeg, I looked at the manpage,
> binaries, source code, skimmed through the whole ffmpeg site
> documentation, searched around on the web, but haven't been able to
> find anything that shows it is even i18n'd. So I wouldn't worry about
> this :)

Sure, that works for me.

> From my end, I don't see any reason to block this patch; as you
> updated it, it looks good to me.

Great, it's been cherry-picked into master. Thanks for your work on this!

Cheers,
Jim

Laurent Savaete

Mar 11, 2012, 7:14:02 PM

to ductus-d...@googlegroups.com

> It looks like the real reason it didn't work is because python-magic
> refers to wav as audio/x-wav. So I took the liberty of recognizing this
> but still referring to it as audio/wav in the code. Let me know if this
> breaks anything :)

ok, I just rebased ui-editor on top of master, and it seems to work
fine on my laptop. (thanks Jim for helping me out with git!)
I pushed to gitorious and devbox.

Jim Garrison

Mar 11, 2012, 7:31:47 PM

to ductus-d...@googlegroups.com

On 03/11/2012 04:14 PM, Laurent Savaete wrote:
> ok, I just rebased ui-editor on top of master, and it seems to work
> fine on my laptop. (thanks Jim for helping me out with git!)
> I pushed to gitorious and devbox.

I removed a couple of the early commits that allowed WAV files to be
contributed and played. The result is now on the ui-editor-cleanup branch.

I will look through the giant diff of everything and try to catalogue
any issues that I find.

A few things off the top of my head:

* How is it that one is supposed to include wami? (I notice it's not in
the repository; is the precise version you're using from elsewhere?).
Some options for moving forward may include using git-submodule and
keeping a fork of everything in a separate repository. In general I
like to keep all the javascript code we depend on in the repository
along with source. If the flash code is too big, then I suppose we'll
just keep it elsewhere... I know you have a wami branch on google code
or somewhere, but please update me on the latest about how you think
things /should/ work.

* I'll probably add swfobject.js to the repository instead of having it
be pulled from google.

Laurent Savaete

Mar 11, 2012, 8:40:19 PM

to ductus-d...@googlegroups.com

> I removed a couple of the early commits that allowed WAV files to be
> contributed and played. The result is now on the ui-editor-cleanup branch.
>
> I will look through the giant diff of everything and try to catalogue
> any issues that I find.
>
> A few things off the top of my head:
>
> * How is it that one is supposed to include wami? (I notice it's not in
> the repository; is the precise version you're using from elsewhere?).
> Some options for moving forward may include using git-submodule and
> keeping a fork of everything in a separate repository. In general I
> like to keep all the javascript code we depend on in the repository
> along with source. If the flash code is too big, then I suppose we'll
> just keep it elsewhere... I know you have a wami branch on google code
> or somewhere, but please update me on the latest about how you think
> things /should/ work.

the only thing that needs to be included from wami is the flash
object. I'm not sure what the proper way to do it is.
Unless I'm mistaken, all the javascript we need is already in our git
repository.
The code should be at
https://code.google.com/r/laurentsavaete-wami-for-ductus/ (but the
Makefile is missing from what I see; I'll add it tomorrow)
With what I've done on the ui, most of the wami JS code was rewritten,
so we don't actually need/want any of the original wami JS code now.
Flash code: I added functions to the original wami (passing data to
JS...) that the author doesn't seem interested in. So we have to take
care of that code ourselves. (The current flash code weighs about 176K.)

> * I'll probably add swfobject.js to the repository instead of having it
> be pulled from google.

that sounds like a good idea.

Laurent Savaete

Mar 12, 2012, 10:20:45 AM

to ductus-d...@googlegroups.com

Jim,

I just pushed a Makefile and a simple README to the hg repository for
the wami recorder.
Let me know what you'd like me to do with it in order to integrate
properly with ductus (in terms of installation, I suppose it would be
ace if we could have ductus install pull in the flex code
automatically)
Just to clarify, the few js files that are in the wami repository
(mine) are just examples, we don't actually need or want them for
ductus.

Seeing how the original wami code keeps going back and forth,
introducing API changes and bugs and then reverting them, I'd rather stay
a bit away from it. We could convert the google code hg repository to
a git one, if that makes pulling the code into ductus easier (you
mentioned git-submodule)
Or we can go as far as to include all the flex code into the ductus
code base, but that's your call.
Unless there are API changes in the flash player, I don't really see
any reason for the flash code to change much now that it works (bugs
being an exception).
Changes to use html5 getUserMedia() would happen in the ductus code.

Also if you need any help/explanations regarding the js code, you know
where I am.

Jim Garrison

Mar 13, 2012, 9:52:46 PM

to ductus-d...@googlegroups.com

On 03/12/12 07:20, Laurent Savaete wrote:
> I just pushed a Makefile and a simple README to the hg repository for
> the wami recorder.
> Let me know what you'd like me to do with it in order to integrate
> properly with ductus (in terms of installation, I suppose it would be
> ace if we could have ductus install pull in the flex code
> automatically)
> Just to clarify, the few js files that are in the wami repository
> (mine) are just examples, we don't actually need or want them for
> ductus.

I'm leaning toward including both the swf file and the latest source
from mercurial directly in the git repository, but without the full
revision history (just as we do for external javascript libraries). The
source isn't very big, especially if we can leave out the examples.
Since I don't know how to build it and what is safe to leave out (beyond
what I can tell from looking at the Makefile), ping me off list when you
wake up, and we can hash out the details. Or, feel free to make a
commit to ductus that just adds the flash object with source alongside
it (but without the .hg directory ;).
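Vendoring the recorder's source into the git repository without its Mercurial history could be done with a copy that skips `.hg`. A sketch using `shutil.copytree` with `shutil.ignore_patterns`; the function name, the target layout, and the choice to also skip an `examples` directory (following the remark that the example JS files are not needed) are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable


def vendor_tree(src: str, dst: str,
                skip: Iterable[str] = (".hg", ".hgignore", ".hgtags", "examples")) -> None:
    """Copy a checked-out source tree into the repo, leaving out hg metadata.

    ignore_patterns matches names at every directory level, so a nested
    'examples' directory is skipped too.
    """
    dst_path = Path(dst)
    if dst_path.exists():
        shutil.rmtree(dst_path)  # replace a previously vendored copy wholesale
    shutil.copytree(src, dst_path, ignore=shutil.ignore_patterns(*skip))
```

The built .swf would be copied alongside the source the same way, so the repository holds both the binary and what it was built from.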
