Hello all,
My colleague, format enthusiast Ross Spencer, and I have been doing a little
research on WAVE files. I went down a rabbit hole and now have two questions
about PRONOM's history and future direction regarding signatures for WAVE
files.
I have two WAVE files from our organization, Archives New Zealand. Both are
encoded as PCM audio. One was made using Apple software and has a RIFF FLLR
chunk with about 4 KB of zeroes before the data chunk. The other file doesn't
have the FLLR chunk, and its data chunk is 16 bytes after the format chunk.
The file with the FLLR chunk identifies as fmt/6 (WAVE classic, I suppose!)
and the one without identifies as fmt/142 (WAVEFORMATEX). I am wondering
whether these are indeed two different formats. Is fmt/142 the preferred
signature, and could it be improved to include WAVE files that were encoded
with the FLLR chunk? I guess this could be done by changing the sig
from 52494646{4}57415645666D7420[!10]{3}[!FEFF]{16-1000}64617461 to
something like 52494646{4}57415645666D7420[!10]{3}[!FEFF]{16-4000}64617461.
Or was the WAVEFORMATEX signature specifically developed to exclude WAVE
files that have the FLLR chunk, because they are understood to be two different
formats? I'm happy to share these files with interested parties.
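For anyone who wants to measure how wide that wildcard would need to be for a given file, the chunk layout can be listed directly. What follows is a minimal Python sketch, not a PRONOM or DROID tool. The function names list_chunks and gap_before_data are made up for illustration, and it assumes a well-formed little-endian RIFF file that contains both a fmt and a data chunk. The gap is counted from the end of the two-byte format tag to the "data" chunk ID, which is my reading of what the {16-1000} part of the signature spans.

```python
import struct


def list_chunks(buf: bytes):
    """Return (chunk_id, header_offset, payload_size) for each top-level chunk."""
    if buf[:4] != b"RIFF" or buf[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # first chunk header follows 'RIFF', the RIFF size, and 'WAVE'
    while pos + 8 <= len(buf):
        chunk_id, size = struct.unpack_from("<4sI", buf, pos)
        chunks.append((chunk_id, pos, size))
        # Skip the 8-byte header and the payload; odd-sized payloads
        # are followed by one pad byte.
        pos += 8 + size + (size & 1)
    return chunks


def gap_before_data(buf: bytes) -> int:
    """Bytes between the end of the format tag and the 'data' chunk ID."""
    offsets = {chunk_id: offset for chunk_id, offset, _ in list_chunks(buf)}
    format_tag_end = offsets[b"fmt "] + 8 + 2  # chunk header, then 2-byte tag
    return offsets[b"data"] - format_tag_end
```

Only the start of the file is needed, so passing in the first several kilobytes (enough to reach the data chunk header) should be sufficient. For a file with an 18-byte format chunk followed directly by the data chunk, gap_before_data returns 16, which matches the lower bound in the current signature. Running it on the FLLR file would show whether an upper bound of 4000 is actually large enough.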
Also, it looks like the reason our fmt/142 files are not identifying as
fmt/141 (PCMWAVEFORMAT) is that the format chunk is 18 bytes long, not 16.
Do the extra two bytes fundamentally change the file's format? In the same
vein, I was wondering how useful it would be to develop a signature for
wave_format_pcm, or instead to refine fmt/141 to include cases where the
format chunk is longer than expected.
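To make the 16 versus 18 byte comparison concrete, here is a small Python sketch that reads the format chunk size, the format tag, and (when the chunk is at least 18 bytes) the trailing two-byte field that Microsoft's WAVEFORMATEX structure calls cbSize. The function name describe_fmt is hypothetical, and the sketch assumes the fmt chunk directly follows the WAVE identifier, as the PRONOM signatures quoted above also do.

```python
import struct

WAVE_FORMAT_PCM = 0x0001  # format tag value for uncompressed PCM


def describe_fmt(buf: bytes) -> dict:
    """Summarise the 'fmt ' chunk, assuming it directly follows 'WAVE'."""
    if buf[:4] != b"RIFF" or buf[8:16] != b"WAVEfmt ":
        raise ValueError("expected RIFF....WAVEfmt at the start of the file")
    (fmt_size,) = struct.unpack_from("<I", buf, 16)
    (format_tag,) = struct.unpack_from("<H", buf, 20)
    info = {
        "fmt_size": fmt_size,
        "format_tag": format_tag,
        "is_pcm": format_tag == WAVE_FORMAT_PCM,
        "cb_size": None,  # only present when the chunk is 18 bytes or longer
    }
    if fmt_size >= 18:
        # The first 16 payload bytes are the PCMWAVEFORMAT fields;
        # the next two bytes are the extra-data length field.
        info["cb_size"] = struct.unpack_from("<H", buf, 36)[0]
    return info
```

If the 18-byte files report is_pcm as True and cb_size as 0, the extra two bytes would only be an empty extension-length field, which might be useful evidence when deciding whether fmt/141 should be widened.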
I'd be happy to discuss/research/develop this further if it's wanted by the community.
Thanks!
Andrea Byrne