Uploaded torrent w/ raw audio of Open Source Bridge 2009, help us process it!


Igal Koshevoy

Oct 16, 2009, 12:01:53 PM
to osbr...@googlegroups.com
I uploaded a torrent with raw audio for the 2009 event, about 3 GB of MP3s:


http://opensourcebridge.org/torrents/Open_Source_Bridge_2009_conference_audio,_raw.torrent


HELP WANTED:

We need to process this raw audio into delicious content that the
public will want to listen to. However, I won't be able to help much
with this beyond making sure that you have the data and suggesting how
to organize this effort for those who would like to take it on:

1. At least one of you should confirm that you could download and play
the audio.

2. Write instructions for how to crowdsource the processing of these raw
files. I'd recommend creating a page on the planning wiki
<http://opensourcebridge.org/planning/> for a new "Audio Processing"
team that describes how to process these files into their final form,
such as:
    a. Recommended tools with links, e.g., Audacity
    b. Target format, e.g., MP3 mono VBR 128kbit/s normalized to -0.3 dB
    c. What to do to the file, e.g., ensure the audio starts with a
description of the talk (add one if it's missing), then trim out
unnecessary spaces, suggested things to cut out, etc.
    d. File naming conventions for the files (e.g., underscores rather
than spaces, sequence number, event name, name of talk, etc.)
    e. Guidelines for declaring status of the files, e.g., a table of
raw filenames on the wiki that volunteers can edit to declare who is
working on what file and its state, e.g.:
        - Unclaimed file that needs a volunteer to process it
        - Claimed file that will be processed by a specific person, with
date of claim
        - Uploaded file that's been processed by a specific person and
needs reviewing
        - Rejected file that failed review, with reasons why
        - Accepted file that's passed review by a specific person

3. Publish a broader call for volunteers to process the files, with
links to the finalized and tested instructions

4. Develop an easy way for folks to upload the processed audio files for
review and consolidation. Is there a free service people could upload
these files to easily, something like RapidShare but without the
annoyances? Otherwise, would setting up a group login on the production
servers that lets people use SFTP to upload to a chrooted directory be okay?

5. Develop a way to publish the consolidated set of processed audio
files. Maybe this is a torrent of everything and individual blip.tv
uploads of the rest.

6. Publish processed audio.

7. Broadcast announcements widely.
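
For item 2b, the "normalized to -0.3 dB" target can be sketched roughly as follows. This is only an illustration, assuming that the figure means peak level relative to full scale and that samples are floats in [-1.0, 1.0]; the function names are hypothetical, and in practice an editor such as Audacity would apply this for you.

```python
def db_to_linear(db):
    """Convert a level in dB (relative to full scale) to a linear amplitude ratio."""
    return 10 ** (db / 20.0)


def normalize_peak(samples, target_db=-0.3):
    """Scale samples so the loudest one sits at target_db.

    Assumes samples are floats in [-1.0, 1.0]; silent or empty input is
    returned unchanged because there is no peak to scale against.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = db_to_linear(target_db) / peak
    return [s * gain for s in samples]
```

A target of -0.3 dB corresponds to a peak amplitude of about 0.966 of full scale, which leaves a small margin below clipping.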

-igal

M. Edward (Ed) Borasky

Oct 16, 2009, 4:06:37 PM
to osbr...@googlegroups.com
I'll volunteer to collect the audio processing tools, since I happen
to be doing that for my own stuff at the moment anyway. But I'd like
someone else to do the wiki page. I suspect Audacity can do nearly all
of it, but I haven't listened to any of the files yet. In any event, I can
get some of the really fancy / tricky digital signal processing
software if we need it.

--
M. Edward (Ed) Borasky
http://borasky-research.net

"I've always regarded nature as the clothing of God." ~Alan Hovhaness

M. Edward (Ed) Borasky

Oct 17, 2009, 1:48:32 AM
to osbridge


On Oct 16, 9:01 am, Igal Koshevoy <i...@pragmaticraft.com> wrote:
> I uploaded a torrent with raw audio for the 2009 even, about ~3 GB of MP3s:
>
> http://opensourcebridge.org/torrents/Open_Source_Bridge_2009_conferen...
>
> HELP WANTED:
>
> We need to process these raw audio into delicious content that the
> public will want to listen to. However, I won't be able to help much
> with this beyond making sure that you have the data and suggest how to
> organize this effort for those that would like to take this on:
>
> 1. At least one of you should confirm that you could download and play
> the audio.

Download in progress.

>
> 2. Write instructions for how to crowdsource the processing of these raw
> files. I'd recommend creating a page on the planning wiki
> <http://opensourcebridge.org/planning/> for a new "Audio Processing"
> team that describes how to process these files into their final form,
> such as:
>     a. Recommended tools with links, e.g., Audacity
>     b. Target format, e.g., MP3 mono VBR 128kbit/s normalized to -0.3 dB

You'll probably get a knee-jerk reaction from a lot of open source
folk at the mention of MP3. Open source purists / "software patent
haters" will tell you that you "should" use OGG format for that
reason. I don't want to get caught in the crossfire on this one, but I
will say that there *are* open source media players that run on
Windows and Macs that will play OGGs. And the OGG format is *slightly*
higher quality for a given compressed file size and *slightly* smaller
files for a given audio quality than MP3. ;-)

>     c. What to do to the file, e.g. ensure audio starts with the
> description of the talk else add it, then trim out unnecessary spaces,
> suggested things to cut out, etc.

There may be software out there that deletes (or trims) silences
automatically. I don't recall running into any, but it's simple enough
to do.
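
A minimal sketch of the idea, assuming the audio has already been decoded into a list of float samples in [-1.0, 1.0]. The function name and the threshold value are made up for illustration, not a recommendation:

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below threshold.

    Assumes samples are floats in [-1.0, 1.0]. Quiet stretches in the
    middle of the recording are left alone.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0]:loud[-1] + 1])
```

This only trims the two ends. Shortening pauses inside a talk would need a minimum-duration rule as well, so that normal gaps between sentences survive.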

>     d. File naming conventions for the files (e.g., underlines rather
> than spaces, sequence number, event name, name of talk, etc.

I think there is a sort-of-kind-of "standard" for this. If you look at
what's on your system after you rip a CD, you'll see a distinct
structure. Let me see if I can find that documented somewhere - I
don't want to re-invent *that* wheel. ;-)

>     e. Guidelines for declaring status of the files, e.g., a table of
> raw filenames on the wiki that volunteers can edit to declare who is
> working on what file and it's state, e.g.,:
>         - Unclaimed file that needs a volunteer to process it.
>         - Claimed file that will be processed by a specific person, with
> date of claim
>         - Uploaded file that's been processed by a specific person and
> needs reviewing
>         - Rejected file that failed review, with reasons why
>         - Accepted file that's passed review by a specific person

I'll let someone else set this up. ;-)
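
For whoever does pick it up, the five states in the quoted list could be modeled along these lines. This is a sketch only: the state names come from the list, but the allowed transitions between them are an assumption, since the list does not spell them out.

```python
from enum import Enum


class Status(Enum):
    UNCLAIMED = "unclaimed"
    CLAIMED = "claimed"
    UPLOADED = "uploaded"
    REJECTED = "rejected"
    ACCEPTED = "accepted"


# Allowed moves between states. These are assumed for illustration;
# the original list names the states but not how they connect.
TRANSITIONS = {
    Status.UNCLAIMED: {Status.CLAIMED},
    Status.CLAIMED: {Status.UPLOADED, Status.UNCLAIMED},
    Status.UPLOADED: {Status.ACCEPTED, Status.REJECTED},
    Status.REJECTED: {Status.CLAIMED, Status.UPLOADED},
    Status.ACCEPTED: set(),
}


def advance(current, new):
    """Return new if the move is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new
```

A wiki table would carry the same information by hand: one row per raw file, with columns for status, volunteer, date of claim, and review notes.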

>
> 3. Publish a broader call for volunteers to process the files, with
> links to the finalized and tested instructions
>
> 4. Develop an easy way for folks to upload the processed audio files for
> review and consolidation. Is there a free service people could upload
> these files to easily, something like RapidShare but without the
> annoyances? Otherwise, would setting up a group login on the production
> servers that lets people use SFTP to upload to a chrooted directory be okay?

There are lots of "free" services out there, but they all have some
kind of "catch" - ads, getting on email marketing lists, etc. Does
Open Source Bridge or a sponsor have the resources to host stuff?

Then again, if it's "only" 2.7 GB raw, perhaps we don't have a big
problem. That's only 4 CDs uncompressed and unedited. That's only
*one* DVD uncompressed and unedited.

M. Edward (Ed) Borasky

Oct 17, 2009, 2:42:16 AM
to osbridge
Download completed and task research in progress:

1. The files are already in MP3 format. I don't have the gory details
yet - I need to load some software that inspects them. So I don't know
if they're compressed or not.

2. It looks like there are three major tasks:

a. Tag the files. This could be done manually, but there are loads
of libraries in everyone's favorite scripting languages to do this, so
all we really need is a CSV file with the info that goes into the tags
with the file names and a couple hours of hacker time.

Once the files are tagged, the organization falls out automatically.
If there's a machine-readable document with the session descriptions,
that would be a great place to start. I'll dig up the libraries for
Perl and experiment with tagging a file or two.

b. Audio editing of the tagged files. That we will undoubtedly need
to do manually, and we might want to have two editors per session if
there are enough "curator" cycles to put on this.

c. Server design and deployment for the edited files.
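
Task (a) could look something like the sketch below. It is in Python rather than Perl and uses only the standard library, so it builds the old fixed-width ID3v1 tag, which is the simplest format to write by hand; a real run would more likely use a tagging library and ID3v2. The CSV column names (filename, title, speakers, year) are hypothetical.

```python
import csv
import io


def id3v1_tag(title, artist, album, year, comment=""):
    """Build a 128-byte ID3v1 tag block (fixed-width, Latin-1 fields)."""

    def field(text, width):
        # Truncate to the field width, then pad with NUL bytes.
        return text.encode("latin-1", "replace")[:width].ljust(width, b"\x00")

    return (
        b"TAG"
        + field(title, 30)
        + field(artist, 30)
        + field(album, 30)
        + field(year, 4)
        + field(comment, 30)
        + b"\xff"  # genre byte; 255 means "not set"
    )


def tags_from_csv(csv_text):
    """Map each raw filename in the CSV to its ID3v1 tag bytes.

    The column names used here are hypothetical.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["filename"]: id3v1_tag(
            row["title"], row["speakers"], "Open Source Bridge 2009", row["year"]
        )
        for row in rows
    }
```

An ID3v1 tag lives in the last 128 bytes of the file, so appending the block to an untagged MP3 is enough to attach it. Note that titles longer than 30 characters get truncated, which is one reason to prefer ID3v2 for the real thing.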

Igal Koshevoy

Oct 17, 2009, 2:49:13 AM
to osbr...@googlegroups.com
M. Edward (Ed) Borasky wrote:
>> b. Target format, e.g., MP3 mono VBR 128kbit/s normalized to -0.3 dB
>>
>
> You'll probably get a knee-jerk reaction from a lot of open source
> folk at the mention of MP3. Open source purists / "software patent
> haters" will tell you that you "should" use OGG format for that
> reason. I don't want to get caught in the crossfire on this one, but I
> will say that there *are* open source media players that run on
> Windows and Macs that will play OGGs. And the OGG format is *slightly*
> higher quality for a given compressed file size and *slightly* smaller
> files for a given audio quality than MP3. ;-)
>
I love open source, but MP3 is the only format that the vast majority of
devices can play, so we must support it. We can publish OGG files in
addition if someone writes the spec for how to encode these.

>> d. File naming conventions for the files (e.g., underlines rather
>> than spaces, sequence number, event name, name of talk, etc.
>>
>
> I think there is a sort-of-kind-of "standard" for this. If you look at
> what's on your system after you rip a CD, you'll see a distinct
> structure. Let me see if I can find that documented somewhere - I
> don't want to re-invent *that* wheel. ;-)
>

The files have names like "Wed_110_SESSION4.mp3", which are useless to
the public. We need to come up with a consistent way to give them useful
names like:

osbridge2009-0025-Spindle,_Mutilate_and_Metaprogram-by_Markus_Roberts_and_Matt_Youell.mp3
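
A name in that shape could be generated from the session data with something like the following. This is a rough sketch; the function name and its parameters are made up for illustration:

```python
import re


def processed_filename(sequence, title, speakers, event="osbridge2009"):
    """Build a name like osbridge2009-0025-Talk_Title-by_Speaker_Name.mp3."""

    def underscored(text):
        # Collapse each run of whitespace into one underscore.
        return re.sub(r"\s+", "_", text.strip())

    by = underscored(" and ".join(speakers))
    return f"{event}-{sequence:04d}-{underscored(title)}-by_{by}.mp3"
```

Calling it with sequence 25, the title "Spindle, Mutilate and Metaprogram", and the two speakers reproduces the example name above. It does not strip characters such as "/" that are unsafe in filenames, so titles would need checking first.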

>> 4. Develop an easy way for folks to upload the processed audio files for
>> review and consolidation. Is there a free service people could upload
>> these files to easily, something like RapidShare but without the
>> annoyances? Otherwise, would setting up a group login on the production
>> servers that lets people use SFTP to upload to a chrooted directory be okay?
>>
>
> There are lots of "free" services out there, but they all have some
> kind of "catch" - ads, getting on email marketing lists, etc. Does
> Open Source Bridge or a sponsor have the resources to host stuff?
>

This task is about sharing intermediate files between members of the
processing team, NOT with the general public; publishing to the public
is addressed elsewhere[1]. I'm fine with ads, but not fine with RapidShare's "you
must wait 60 seconds before downloading" approach. If we can find a free
and easy 3rd party way to share these somewhat large files, then I don't
have to engineer and support another custom service -- managing
accounts, locking them down, enforcing quotas, backing up data, etc.
This isn't "hard", I just want to avoid doing this if I can.

> Then again, if it's "only" 2.7 GB raw, perhaps we don't have a big
> problem. That's only 4 CDs uncompressed and unedited. That's only
> *one* DVD uncompressed and unedited.

I could dedicate 7GB of space for these files on the existing production
servers, while still leaving space for other things. However, the total
disk space used will depend on the workflow. For example, if we're
exchanging lossless files that are then used to produce the MP3 and OGG
files, we'll need more space.
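
As a rough back-of-the-envelope check, here is how the size of uncompressed copies could be estimated. Every default below is an assumption: I have not stated the bitrate, sample rate, or channel count of the raw files, so treat the result as an order-of-magnitude guess only.

```python
def pcm_size_gb(mp3_bytes, mp3_kbps=128, sample_rate=44100, channels=1, bytes_per_sample=2):
    """Estimate the uncompressed PCM size of audio currently stored as MP3.

    All defaults are assumed values, not measurements of the raw files.
    """
    # Duration implied by the MP3 size at a constant bitrate.
    seconds = mp3_bytes * 8 / (mp3_kbps * 1000)
    # Size of the same duration as uncompressed PCM, in decimal gigabytes.
    return seconds * sample_rate * channels * bytes_per_sample / 1e9
```

With those assumed defaults, 2.7 GB of MP3 works out to roughly 15 GB of uncompressed mono audio, about double the 7GB above. A lossless codec would shrink that by some amount that depends on the material.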

-igal


[1] The actual effort of publishing the final content to the general
public is described in these tasks:

M. Edward (Ed) Borasky

Oct 17, 2009, 3:29:50 AM
to osbr...@googlegroups.com
On Fri, Oct 16, 2009 at 11:49 PM, Igal Koshevoy <ig...@pragmaticraft.com> wrote:
> The files have names like "Wed_110_SESSION4.mp3", which are useless to
> the public. We need to come up with a consistent way to give them useful
> names like:
>
> osbridge2009-0025-Spindle,_Mutilate_and_Metaprogram-by_Markus_Roberts_and_Matt_Youell.mp3

See my previous email - tag them and software will rename them in any
number of organized ways with a directory structure and all that. I've
got stuff on my Windows box to do that already, actually - a side
benefit is that I can load them on my Zunes. ;-)

> I could dedicate 7GB of space for these files on the existing production
> servers, while still leaving space for other things. However, the total
> disk space used will depend on the workflow, for example, if we're
> exchanging lossless files that are then used to produce the MP3 and OGG
> files, then we need more space.

Are the "raw" files I downloaded lossless? They won't get any bigger
if they are. ;-) If they're compressed, do they need to be
decompressed to edit them?

We can store and exchange files with a lossless compression scheme -
flac is the obvious one. I'm guessing we will get pretty good
compression, since they're mostly a single human voice talking rather
than the Shostakovich 11th Symphony. ;-)

Lance

Oct 17, 2009, 8:57:29 AM
to osbr...@googlegroups.com

If you have problems finding a place to host the processed audio
files, the OSL might be able to host them on our FTP servers if the
total is no more than a couple of gigs of compressed audio. You
know how to get ahold of me whenever you get to this task :)

-Lance
