Audio file conversion formats

Evelyn McLellan

Feb 10, 2010, 7:44:24 PM

to kbra...@nla.gov.au, archiv...@googlegroups.com

Hi Kevin,

Peter mentioned that you're an audio preservation expert, so I'm hoping
you have some time to answer a few questions relating to the
Archivematica project.

We are looking at converting audio files and audio streams in
multi-media files to uncompressed PCM. I know that
Broadcast Wave is considered a standard audio preservation format, but
that's just a file format using the PCM encoding, correct? I'm asking
because when we convert multi-media files the resulting audio stream
isn't WAV, it's PCM. Please see the page for AVI preservation on our
wiki at
http://www.archivematica.org/wiki/index.php?title=Audio/Video_Interleaved_Format, especially the Conversion Test Results Section and you'll see what I've been doing.

You'll see that I specified the audio codec during the conversions as
pcm_s16le. But FFmpeg, which I was using for the conversions, offers the
following variants of PCM:

pcm_alaw PCM A-law
pcm_dvd PCM signed 20|24-bit big-endian
pcm_f32be PCM 32-bit floating point big-endian
pcm_f32le PCM 32-bit floating point little-endian
pcm_f64be PCM 64-bit floating point big-endian
pcm_f64le PCM 64-bit floating point little-endian
pcm_mulaw PCM mu-law
pcm_s16be PCM signed 16-bit big-endian
pcm_s16le PCM signed 16-bit little-endian
pcm_s16le_planar PCM 16-bit little-endian planar
pcm_s24be PCM signed 24-bit big-endian
pcm_s24daud PCM D-Cinema audio signed 24-bit
pcm_s24le PCM signed 24-bit little-endian
pcm_s32be PCM signed 32-bit big-endian
pcm_s32le PCM signed 32-bit little-endian
pcm_s8 PCM signed 8-bit
pcm_u16be PCM unsigned 16-bit big-endian
pcm_u16le PCM unsigned 16-bit little-endian
pcm_u24be PCM unsigned 24-bit big-endian
pcm_u24le PCM unsigned 24-bit little-endian
pcm_u32be PCM unsigned 32-bit big-endian
pcm_u32le PCM unsigned 32-bit little-endian
pcm_u8 PCM unsigned 8-bit
pcm_zork PCM Zork

I used pcm_s16le as the codec because it was the default for a number of
multimedia container formats. However, I'm not certain it's the right
choice.

I'm assuming that when a .WAV file is ingested, as long as its encoding
is uncompressed PCM then we should leave the file the way it is.

And last but perhaps not least, what do you think of FLAC as a
preservation format?

Thanks very much.

Evelyn McLellan
Systems Archivist
Artefactual Systems Inc.

Bigelow, Sue

Feb 11, 2010, 1:35:21 PM

to archiv...@googlegroups.com

Just saw this . . .

Evelyn, if this helps, for preservation purposes we need not just PCM
but *linear* PCM. Some of the variants you have there are not linear.

Digressive explanation:
For instance "mulaw" and "alaw" (representing Greek symbols mu and
alpha, and law) refer to non-linear variants used in telecommunications.
Since the international standard is to use 8 bits for digital telecomm,
they needed to find a way to represent the human voice realistically using
only 8 bits. Mulaw and alaw give you more bits in the higher audio
registers, where humans can distinguish more different sounds, and fewer
in the lower registers, where it all sounds muddy to us anyway. They are
different standards used in different countries. (No, I have no idea how
they talk to each other.)

So I think that many of the variations shown will not be useful for us.
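
To make "non-linear" concrete, here is a minimal Python sketch of the continuous mu-law curve (the formula that the G.711 telephony standard approximates, with mu = 255). In this standard formulation the non-linearity is applied to sample amplitude: one 8-bit code step covers a much smaller slice of amplitude near silence than near full scale. The function names are illustrative, not taken from FFmpeg or any other tool in this thread.

```python
import math

MU = 255  # mu value used by G.711 mu-law

def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a linear sample in [-1, 1] onto the mu-law curve, also in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress: back from the companded value to linear."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# Width, in linear amplitude, of one 8-bit code step at two signal levels
step = 2 / 256  # 256 codes spread evenly over the companded range [-1, 1]
quiet_step = mu_law_expand(step) - mu_law_expand(0.0)
loud_step = mu_law_expand(1.0) - mu_law_expand(1.0 - step)

print(f"step near silence:    {quiet_step:.6f}")
print(f"step near full scale: {loud_step:.6f}")
```

With linear PCM both steps would be the same size, which is the property asked for above.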

Sue Bigelow
Digital Conservator
City of Vancouver Archives

--
You received this message because you are subscribed to the Google
Groups "archivematica" group.
To post to this group, send email to archiv...@googlegroups.com.
To unsubscribe from this group, send email to
archivematic...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/archivematica?hl=en.

Evelyn McLellan

Feb 11, 2010, 1:44:53 PM

to archiv...@googlegroups.com

Thanks, Sue. I'll narrow the list down to codecs that produce LPCM.
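
One way to narrow the list is sketched below in Python. The codec names are copied from the FFmpeg listing earlier in the thread. The filter keeps only plain signed or unsigned integer variants, which is an assumption about what "LPCM" should mean here: the floating-point, DVD and D-Cinema variants are arguably linear too, and are left out only as a conservative cut that the thread itself did not decide.

```python
import re

# Codec names as listed by FFmpeg earlier in this thread
FFMPEG_PCM_CODECS = [
    "pcm_alaw", "pcm_dvd", "pcm_f32be", "pcm_f32le", "pcm_f64be", "pcm_f64le",
    "pcm_mulaw", "pcm_s16be", "pcm_s16le", "pcm_s16le_planar", "pcm_s24be",
    "pcm_s24daud", "pcm_s24le", "pcm_s32be", "pcm_s32le", "pcm_s8",
    "pcm_u16be", "pcm_u16le", "pcm_u24be", "pcm_u24le", "pcm_u32be",
    "pcm_u32le", "pcm_u8", "pcm_zork",
]

# Assumption: keep signed/unsigned integer samples with an optional
# endianness suffix, and nothing else.
LINEAR_INT_PCM = re.compile(r"^pcm_[su](8|16|24|32)(le|be)?$")

def linear_pcm_codecs(names):
    """Return the codec names that match the integer linear-PCM pattern."""
    return [name for name in names if LINEAR_INT_PCM.match(name)]

shortlist = linear_pcm_codecs(FFMPEG_PCM_CODECS)
print(shortlist)
```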

Any thoughts on the video preservation formats? Discussion thread at
http://groups.google.ca/group/archivematica/browse_thread/thread/516b2f0436e1d0c3

I'd really like to hear what you think. Thanks.

Evelyn

Bigelow, Sue

Feb 11, 2010, 6:31:35 PM

to archiv...@googlegroups.com

Sorry--guess I wasn't too clear.

Yes on the JPEG2K/MXF. Same conclusion that Jim Lindner arrived at back
in 2006 (JP2K wrapped in MXF) as discussed here, starting page 42:
http://www.danceheritage.org/preservation/DigitalVideoPreservation1.pdf
(codec analysis, p. 54). And since he consulted with the Presto project, no
surprise that they lean to JPEG2k, too. (Not sure if they use MXF.)

And it looks like Presto
http://wiki.prestospace.org/pmwiki.php?n=Main.TechRef considers a
datarate of 90 Mb/s to be of master quality when used with lossless
JPEG2000 of standard def TV material, which is useful to know.
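
For a sense of what that data rate means for storage, here is a small Python sketch. It assumes "Mb/s" means decimal megabits per second and a constant rate for the whole recording. The function name is illustrative.

```python
def gigabytes_per_hour(megabits_per_second: float) -> float:
    """Storage for one hour of a constant-rate stream, in decimal gigabytes."""
    bits = megabits_per_second * 1_000_000 * 3600  # bits in one hour
    return bits / 8 / 1_000_000_000                # bits -> bytes -> GB

print(gigabytes_per_hour(90))
```

Under those assumptions, an hour of standard-definition video at 90 Mb/s comes to roughly 40 GB before any audio or wrapper overhead.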

And now our problem is to find a way to produce that using open source
tools. A lot of places convert to JP2K using hardware or proprietary
tools, and of course that's not something we can replicate into the
future, but might explain the dearth of open-source alternatives--maybe
there just isn't the demand.


Evelyn McLellan

Feb 11, 2010, 7:10:53 PM

to archivematica

I'll just summarize our phone call:

Our preferred option is MJPEG2000/LPCM wrapped in MXF. Unfortunately,
however, FFmpeg doesn't convert to MJPEG2000 - the best we can do
using that program is MPEG-2/LPCM/MXF. I haven't been able to find an
open-source Linux-based program that converts video to MJPEG2000. So
maybe we need to keep our options open - add a script to Archivematica
to convert video files to MPEG-2/LPCM/MXF but if desired the user
could switch off that option and use another program/operating system
to convert to MJPEG2000/LPCM/MXF pre-ingest.

Evelyn


Evelyn McLellan

Apr 1, 2010, 2:30:36 PM

to Kevin Bradley, archiv...@googlegroups.com

Hi Kevin,

Thanks very much for your response. What I've been doing is using FFmpeg
to convert various formats to .wav formats, using the default encoding
of pcm_s16le, which as you point out is 16-bit audio. The sampling rate
does not have to be limited this way - I'm able to default it to
whatever I like. I could also choose, say, pcm_s24le if I wanted to. I
understand that the higher the settings, the better the quality and the
larger the file, & we need to make choices based on what kind of audio
files we're preserving (e.g. animal sounds for scientific research vs.
commercial music recordings).

You said that "But using riff, it will populate the file information
according to the bit and sampling rate of the audio." I'm not sure how
to use riff, per se, aside from the fact that .wav is a sub-format of
riff. Could you clarify?

I guess one of my main concerns is that all the standards and
information on the web seem to be about how to make high-quality digital
copies from analogue. I think it's easier then just to choose the
highest-quality settings and standardize them across the board. But what
about when you're converting a variety of formats to one format? Should
I be taking, say, a 16-bit .wma file with a sampling rate of 22050 Hz and
converting it to a 24-bit 96 kHz .wav file? Am I actually creating an
unacceptably altered (i.e. improved) version of the original?

Re converting to Motion JPEG 2000, that's what we originally wanted to
do but we couldn't find an open-source Linux-based tool to do the job. I
think that conversions to that format are mainly being done from
analogue at this point. We've settled for MPEG-2: please see the wiki
page at http://www.archivematica.org/wiki/index.php?title=Video.

Any advice you could give me would be much appreciated.

Evelyn

On Fri, 2010-03-26 at 16:19 +1100, Kevin Bradley wrote:

Hi Evelyn,

I've been away for a bit and while sitting in the plane began to feel
guilt for the little time I have had to spend on Archivematica, and so I
was just dragging all the emails into one place to start going through
them when I saw your email. My apologies for the delay in responding;
this was so long ago that being away is no excuse, and I'm forced to
fall back on being busy.

Yes, wav, and bwf, are file formats which may encode PCM audio. PCM
means pulse code modulation, and generally is shorthand for uncompressed
audio. The RIFF File Reference defines how that audio is stored.
Waveformatex is the syntax that defines the audio structure, and
according to the RIFF spec when wFormatTag equals WAVE_FORMAT_EXTENSIBLE
you have an uncompressed wave file.

If you use pcm_s16le you limit the file to 16 bit audio (and perhaps
even the sampling rate ... I'm not sure). But using riff, it will
populate the file information according to the bit and sampling rate of
the audio.
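
The file information mentioned here lives in the "fmt " chunk of the RIFF/WAVE header. As a minimal Python sketch, the function below reads the format tag, channel count, sampling rate and bits per sample from that chunk. The two tag values are the ones registered for WAVE files: 0x0001 (WAVE_FORMAT_PCM) marks plain integer PCM, while 0xFFFE (WAVE_FORMAT_EXTENSIBLE) means the actual encoding is named by a further sub-format field, which this sketch does not parse. The function name is illustrative, and the test file is generated in memory with Python's own wave module rather than with FFmpeg.

```python
import io
import struct
import wave

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE

def read_fmt_chunk(stream):
    """Return the core fields of the 'fmt ' chunk of a RIFF/WAVE stream."""
    riff, _size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            data = stream.read(chunk_size)
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", data[:16]
            )
            return {
                "format_tag": tag,
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # Skip other chunks; chunk data is padded to an even number of bytes
        stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)

# Build a tiny 24-bit, 48 kHz stereo WAV in memory and read its header back
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(3)       # 3 bytes per sample = 24 bits
    w.setframerate(48000)
    w.writeframes(b"\x00" * 6 * 10)  # ten silent stereo frames
buf.seek(0)
info = read_fmt_chunk(buf)
print(info)
```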

My quick read of what you've been doing (an impressive amount), makes me
think I need to understand it better so that I can discuss when you are
applying normalisation and how you intend to do it. You state somewhere
(I read) that the sampling rate should be more than the original
sampling rate, but I'd add that the sampling rate should be a whole-number
multiple of the original (so if the original is 48kHz, the file
should be 48kHz, or 96kHz or 192kHz). Applying fractional changes
(e.g. 44.1kHz to 48kHz) adds variables due to the conversion process, as
there are a number of ways of calculating the change and it becomes
difficult to describe it in the metadata in a meaningful way.
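
The whole-number rule can be checked mechanically. The Python sketch below is only an illustration of that rule; the function name is made up for the example.

```python
from fractions import Fraction

def is_whole_multiple(original_hz: int, target_hz: int) -> bool:
    """True if target_hz is the original rate times 1, 2, 3, ..."""
    return target_hz >= original_hz and target_hz % original_hz == 0

print(is_whole_multiple(48000, 96000))   # True: exactly 2x
print(is_whole_multiple(48000, 192000))  # True: exactly 4x
print(is_whole_multiple(44100, 48000))   # False
print(Fraction(48000, 44100))            # 160/147, the awkward ratio involved
```

A ratio like 160/147 means the converter has to interpolate new sample values rather than simply repeat or insert samples at regular positions, which is where the extra variables come from.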

I also remember seeing an email about motion JPEG2000 file conversion,
and it made me wonder about the use cases for doing on the fly
conversions.

However, you may already have answers to this, but I'd be keen to pick
up the discussion now.

Bigelow, Sue

Apr 1, 2010, 2:37:21 PM

to archiv...@googlegroups.com

Upsampling isn't going to improve the file, just make it bigger. You
just create more data for us to store. Keeping the sampling rate and
bit-depth of the original would be the best option--as you say, the
problem is how to do it using the available tools.

Uncompressing a compressed audio format for normalization to WAV is
worth the creation of the extra bits for us to manage and store because
we have the advantage of not having compression. But I don't see an
advantage to upsampling.
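
To put a number on "just make it bigger", here is a small Python sketch of uncompressed PCM size. It uses the example figures from earlier in the thread (16-bit at 22050 Hz versus 24-bit at 96 kHz); the stereo channel count and one-minute duration are assumptions made for the illustration.

```python
def pcm_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size of the raw PCM audio data, ignoring the small file header."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds

original = pcm_bytes(22050, 16, channels=2, seconds=60)
upsampled = pcm_bytes(96000, 24, channels=2, seconds=60)

print(original, upsampled, round(upsampled / original, 2))
```

Under those assumptions the upsampled copy is about six and a half times the size of a copy that keeps the original settings, with no new information in it.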

Sue Bigelow
Digital Conservator
City of Vancouver Archives


Evelyn McLellan

Apr 1, 2010, 2:42:01 PM

to archiv...@googlegroups.com

Thanks, Sue, that's helpful. I'll try to figure out how to do that. If
we do end up upsampling and increasing bit-depth, though, is the only
drawback file size? Not that that's not a significant drawback, but I'm
not 100% sure I can figure out how to achieve retention of the original
settings when converting to .wav.

Evelyn

Evelyn McLellan

Apr 1, 2010, 4:17:08 PM

to archiv...@googlegroups.com

OK, to answer part of my own question, it's easy to copy the original
sampling rate but not the original bit depth.

Bigelow, Sue

Apr 1, 2010, 4:28:24 PM

to archiv...@googlegroups.com

And this review says there *is* an improvement in sound quality with
upsampling http://www.stereophile.com/asweseeit/344/#

Still, you aren't creating any new information, so it would annoy me on
principle to be creating more bits to store. It looks like the increase
in sound quality is because the playback filters can do a better job.


Evelyn McLellan

Apr 1, 2010, 4:46:26 PM

to archiv...@googlegroups.com

Interesting article. But even if it had said definitively that sound
quality is improved, are we supposed to take the creator's digital audio
recordings and improve them? I'm thinking no. It comes back to the
difference between converting from analogue as opposed to converting
from born-digital.

The bit depth is problematic, though. I need to do some more work on
this.

Evelyn

Evelyn McLellan

Apr 2, 2010, 1:56:07 PM

to archivematica

OK, as I mentioned, it is easy to retain the original sampling
frequency in converted audio files but not the bit depth. What happens
when I use FFmpeg to convert audio files to wav is that, unless I
specify otherwise, the original sampling rate is retained but the bit
depth is always set to 16. So when a 16-bit audio file is converted
the bit depth stays the same, and when an 8-bit audio file is
converted the bit depth is converted to 16 (which makes for a larger
file but is otherwise not a problem).

So what happens when I feed a 24-bit audio file into the converter?
The conversion fails. This is a good thing because we wouldn't want
Archivematica blithely converting all our incoming 24-bit (or higher)
audio files to 16-bit files. We would want an error notification and
the opportunity to convert the file properly. And of course if we were
ingesting a SIP with a bunch of 24-bit audio files we would just set
the normalization default to 24 bits.

Evelyn


Bigelow, Sue

Apr 2, 2010, 3:40:06 PM

to archiv...@googlegroups.com

It would, of course, be nice if Archivematica would set the normalization bit depth based on the file metadata. It would have to adjust it per file, not per SIP, since a SIP could potentially have audio files with different bit depths.

No idea how complicated that would be to code.
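
As a rough idea of the per-file logic, here is a minimal Python sketch. It assumes the source bit depth is already known for each file (getting it reliably is a separate problem), picks the smallest signed little-endian PCM codec that can hold it, and builds an FFmpeg argument list without running it. The function names and file paths are hypothetical. `-i`, `-vn` and `-acodec` are existing FFmpeg options. No sampling rate is passed, on the understanding reported earlier in the thread that FFmpeg then keeps the original rate.

```python
def pcm_codec_for(bit_depth: int) -> str:
    """Smallest signed little-endian PCM codec that holds bit_depth bits."""
    for bits in (16, 24, 32):
        if bit_depth <= bits:
            return f"pcm_s{bits}le"
    raise ValueError(f"unsupported bit depth: {bit_depth}")

def ffmpeg_wav_command(src: str, dst: str, bit_depth: int) -> list:
    """Argument list for converting one file's audio to WAV at a safe bit depth."""
    return ["ffmpeg", "-i", src, "-vn", "-acodec", pcm_codec_for(bit_depth), dst]

# Hypothetical SIP containing files with different bit depths
sip = {"interview.wma": 16, "fieldtape.aiff": 24, "voicemail.au": 8}
for name, depth in sip.items():
    print(ffmpeg_wav_command(name, name.rsplit(".", 1)[0] + ".wav", depth))
```

The list form can be handed to `subprocess.run` unchanged, which avoids shell quoting problems with unusual file names.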

Sue Bigelow
Digital Conservator
City of Vancouver Archives

1150 Chestnut Street,
Vancouver, B.C. V6J 3J9
604.829.4271 Tel
604.736.0626 Fax


eve...@artefactual.com

Apr 2, 2010, 4:01:30 PM

to archiv...@googlegroups.com

Hmm, right now the problem would not just be coding but getting the file
metadata. FITS doesn't report bit-depths for all audio file types - if you
look at the various conversion test results on the wiki you'll see that it
either doesn't report the bit depth, reports it but calls it something
else, or gets it wrong.

Kevin Bradley

Apr 6, 2010, 7:18:37 PM

to archiv...@googlegroups.com

Can I just pipe in on this thread a little late?

Upsampling does not, of itself, improve the quality of the audio file.

However, the digital-to-analogue process requires very tight analogue filtering after the conversion (it's called anti-aliasing), and that can degrade the perceived quality of the audio when the artefacts of the filtering impact on the audio. They will, because the sharper the curve, the worse the side effects, while the "softer" the curve, the more audio gets cut out. So, if you upsample to a higher frequency (say 48kHz to 96kHz), you can use cheaper analogue filters, and the effect of those filters is way outside the range of human hearing. Hence it sounds better. But, if you want that, it only matters at the point of conversion to analogue (aka listening), and so it makes more sense (and it's more archivally responsible) to do it on the fly if, and when, it's required, rather than create and store a file. (And if the a/d manufacturers spent a little more on decent filters we wouldn't need it at all.)

So, store files at their original sampling frequency.

Bit depth: The bit depth is described as 8, 16 or 24 bits. It's like a bucket. If the bucket is as big as or bigger than the original, you'll be able to store all the information, but if it's smaller, then some will be lost. The process of shortening the length of the word without processing is truncating and is generally agreed to be a bad thing. Bit depth equates directly to dynamic range.

So, keep the original bit depth, or use 24 bit.
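
The bucket analogy and the link to dynamic range can be sketched in a few lines of Python. The figure used is the theoretical ratio between full scale and one quantization step, 20·log10(2^N), or about 6 dB per bit; it ignores dither and real converter noise. The function names are illustrative.

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of N-bit linear PCM, about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)

def widen_16_to_24(sample: int) -> int:
    """Bigger bucket: pad a 16-bit sample with 8 zero bits (reversible)."""
    return sample << 8

def truncate_24_to_16(sample: int) -> int:
    """Smaller bucket: drop the low 8 bits of a 24-bit sample (lossy)."""
    return sample >> 8

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 2))

s16 = 0x1234
print(truncate_24_to_16(widen_16_to_24(s16)) == s16)    # nothing lost

s24 = 0x123456
print(widen_16_to_24(truncate_24_to_16(s24)) == s24)    # low bits are gone
```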

(and I think this was the decision anyway).

A thing to watch is that digitally processing audio often requires that information describing the processing is attached to each word. This means the processing needs to be at a greater word length than the audio, or you are effectively truncating the word length. This used to be a problem in the dim, dark and ancient days of digital audio processing (mid to late 1990s). A lot of technology would talk about 32-bit processing, which means you got an extra 8 bits of information for processes (and which didn't interfere with the audio content); some of the recent high-end systems use an underlying 64-bit processing capability. It's worth watching that the open source system designers haven't made a decision that those extra bits don't count.

File Storage Standards: The International Association of Sound and Audiovisual Archives standard for a preservation storage file is: 24 bit, 48 kHz, 96 kHz or better (material dependent), BWF (Broadcast Wave Format) files (wave file with metadata) [previously EBU Tech 3285, now AES31-2-2006]. (Guidelines on the Production and Preservation of Digital Audio Objects, page 11, paragraph 2.8.2).

I'd like to have further discussion about normalisation to ensure the tools don't add artefacts (which is an issue esp with , and because I am not at all sure about this forum, I'll introduce myself. I'm the representative from the UNESCO memory of the world sub committee on Technology for this project, and President of the International Association of Sound and Audiovisual archives, and am keen to encourage sound and audiovisual materials in Archivematica as well as other digital materials.

Kevin Bradley | Curator, Oral History and Folklore | Director, Sound Preservation
National Library of Australia | Canberra ACT 2600

Ph + 61 2 6262 1636 | Fax +61 2 6262 1653 | kbra...@nla.gov.au | http://www.nla.gov.au/

Expect change, except from a vending machine


Evelyn McLellan

Apr 7, 2010, 12:29:06 PM

to archivematica

Hi Kevin,

Thanks, you can jump in any time! I think we're ok with sampling rates
and bit depths (original sampling rate plus whatever bit depth we
choose, which can be 24 across the board), but otherwise I'm not 100%
sure about the quality of our audio conversions using FFmpeg. We're
converting to Wave, not Broadcast Wave, as far as I can tell, which as
I understand it does not affect sound quality but may affect
transparency of documentation (is that correct?). Second, I don't know
how to measure whether the conversion process is leaving the artefacts
you refer to.

Would it be possible for us to send you some pre- and post-normalized
audio files for analysis? We would love to have an expert opinion on
whether we're meeting acceptable conversion standards.

Evelyn


Kevin Bradley

Apr 11, 2010, 9:11:51 PM

to archiv...@googlegroups.com

I'm having a "little think" about the best way to test the conversion process and measure the LSB etc. etc. Then, fine! It might take me a little while though (not that I'm a slow thinker, just busy!)

Kevin Bradley


Evelyn McLellan

Apr 12, 2010, 12:24:22 PM

to archivematica

Thanks, Kevin!
