Best practices for normalizing audio files for preservation and access


Creighton Barrett

Dec 10, 2021, 9:32:40 AM
to archiv...@googlegroups.com
Hi everyone,

Our digitization vendor currently provides “preservation” and “access” WAV files that DROID and Siegfried recognize as PRONOM fmt/704. By default, Archivematica does not recognize fmt/704 as a preservation format, and it does not have a normalize for preservation rule. I was curious how Archivematica and ffmpeg would handle these files if I decided to normalize them for preservation, so I set up a local rule for fmt/704 in our test environment.

ffmpeg appears to take the "preservation" WAV files and normalize them into fmt/143 files. And it normalized the "access" WAV files into fmt/141 files. fmt/143 is not recognized as a preservation format, so I don't want that to be the result of any effort to normalize fmt/704 files for preservation. fmt/141 *is* considered a preservation format, but I see in PRONOM that it has been superseded by fmt/141 and fmt/142.

After poking around at other recommendations, including the Library of Congress' recommended format statement for media-independent digital audio, I would think the desirable WAV format for preservation would be fmt/2 or fmt/6.

This all leads me to a few questions I am hoping list members can help answer: 
  1. Why does ffmpeg produce two different WAV formats from our fmt/704 files? The only difference between the "preservation" and "access" formats that I can see is bitrate and, therefore, file size.

  2. Is it possible to configure ffmpeg to produce a fmt/2 file from a fmt/704 file (or from other digital audio files)? What would that command look like?

  3. Why does Archivematica recognize fmt/141 as a preservation format but not fmt/704?

For preservation planning purposes, I'm interested in continuing to test audio normalization for preservation and access and would love to know if we can tell Archivematica to produce fmt/2 or fmt/6 files when we normalize digital audio for preservation.
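As a starting point for question 2, here is a minimal sketch of how such an ffmpeg call might be assembled. It assumes ffmpeg's WAV muxer option `-write_bext`, which asks ffmpeg to write a Broadcast Wave `bext` chunk, and the `pcm_s24le` encoder. The helper name `build_bwf_command` and the file names are hypothetical. Whether DROID or Siegfried identifies the result as fmt/2 is not something this sketch can promise; that would need to be checked against real output.

```python
import shlex


def build_bwf_command(src: str, dst: str, codec: str = "pcm_s24le") -> list[str]:
    """Assemble (but do not run) an ffmpeg command that requests a bext chunk.

    Assumption: the installed ffmpeg supports the WAV muxer option -write_bext.
    The resulting PRONOM identification is untested here.
    """
    return [
        "ffmpeg",
        "-i", src,           # input file, e.g. the vendor's fmt/704 WAV
        "-c:a", codec,       # uncompressed PCM; pick the codec matching the source bit depth
        "-write_bext", "1",  # ask the WAV muxer to write a Broadcast Wave bext chunk
        dst,
    ]


# Hypothetical file names, for illustration only.
command = build_bwf_command("vendor_preservation.wav", "normalized.wav")
print(shlex.join(command))
```

The list form can be passed to `subprocess.run(command, check=True)` on a machine with ffmpeg installed, and the output then run through Siegfried to see which PUID it actually receives.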

But the last question is maybe the most important. I have no reason to believe that our fmt/704 "preservation" WAV files will somehow become inaccessible or unreadable in the future while fmt/141 or other iterations of WAV persist. In other words, normalizing preservation files received from our vendor into other preservation formats just doesn't seem to make sense. So do I just tell Archivematica that fmt/704 *is* a preservation format?

Curious to hear what others think, and I welcome any general thoughts or comments on digital audio preservation.

Thank you!

Creighton

Sarah Romkey

Dec 14, 2021, 2:09:53 PM
to archiv...@googlegroups.com
Hi Creighton,

I'll take a stab at your third question and hope for some community members with more audio expertise to weigh in on the other two.

My answer is based more on my knowledge of the Format Policy Registry than on the merits of the formats in question. The original design of the FPR, in which a format is flagged as a Preservation format (or not) or an Access format (or not), was more about providing a flag to users during Normalization than about affecting any preservation actions. As time wore on and PRONOM grew, we haven't really continued to add the Preservation or Access format designation to new formats, largely due to lack of resources but also at times due to lack of expertise (with so many formats from so many specific fields, we didn't really feel qualified to assess them all). So fmt/704 may have been added to PRONOM, and therefore to Archivematica, after the time we stopped proactively using those flags.

So yes, you can tell Archivematica fmt/704 is a Preservation format if you wish; the FPR defaults are really meant to be a reference implementation that can be altered for local needs. Just be aware that the Preservation flag doesn't actually do anything besides report itself in the Normalization report. Archivematica will enact a normalization rule only if there is one for that format (which I don't think is the case by default).

Hope that helps!


Cheers,

Sarah

Sarah Romkey, MAS, MLIS
Archivematica Program Manager
@archivematica / @accesstomemory

Creighton Barrett

Dec 16, 2021, 11:48:41 AM
to archiv...@googlegroups.com
Hi Sarah,

Thanks so much, this is very helpful! I can certainly appreciate the resources and expertise needed to maintain those flags for new formats added to PRONOM and the FPR. There are a lot of formats!

I think the bigger puzzle that prompted all of this is why ffmpeg is producing two different formats from the same format. That made me pause and consider our approach to these audio files. Would love to hear from anyone with more audio expertise about why that might be happening!
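One way to compare the two outputs directly is to look at the `fmt ` chunk in each file's RIFF header, since the WAV variants differ mainly in that chunk's size and format tag. Below is a minimal sketch using only the Python standard library; the function name `describe_wav` is hypothetical. As a working assumption to verify against PRONOM, a 16-byte `fmt ` chunk with tag 1 corresponds to PCMWAVEFORMAT (fmt/141), and a 40-byte chunk with tag 0xFFFE to WAVEFORMATEXTENSIBLE (fmt/143). It is also an assumption, not something confirmed in this thread, that ffmpeg chooses the extensible header when the audio exceeds 16 bits or 48 kHz, which would explain the split if the "preservation" files are higher resolution than the "access" files.

```python
import io
import struct
import wave


def describe_wav(data: bytes) -> dict:
    """Walk the RIFF chunks of a WAV file and summarize its fmt chunk."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info = {"chunks": []}
    pos = 12  # first chunk starts after 'RIFF', the size field, and 'WAVE'
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        info["chunks"].append(chunk_id.decode("ascii", "replace"))
        if chunk_id == b"fmt ":
            # The first 16 bytes are common to every fmt chunk variant.
            tag, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            info.update(
                fmt_size=size,
                format_tag=tag,
                channels=channels,
                sample_rate=rate,
                bits_per_sample=bits,
            )
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    info["has_bext"] = "bext" in info["chunks"]  # bext marks a Broadcast Wave file
    return info


# Build a small 16-bit stereo WAV in memory so the sketch runs without sample files.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00" * 400)

sample = describe_wav(buf.getvalue())
print(sample)
```

Running `describe_wav(open(path, "rb").read())` on one "preservation" and one "access" output should show whether the two differ in `fmt_size`, `format_tag`, bit depth, or sample rate. It reads the whole file into memory, so it is only suitable for spot checks.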

Thanks,
Creighton
