Hi all,
It sounds like we have some similar MXF files and we have decided not to normalize them at this point. Partly this is because of the same issues surrounding captions and other streams, but our thinking has also been informed by looking at the larger processes that produce these files, at storage costs, and at potential contexts for re-use. Taking each of these considerations in order:
1. The processes that create the files
In our case, the MXF files represent television shows where each episode is an edited version of a live event held by the museum (generally a talk or panel discussion). The creation of these files is outsourced to a third-party vendor who specializes in meeting broadcast requirements. It proved very difficult to create these particular files in-house (all production up to the creation of the MXF is done in-house), and having to re-create them would be a real burden. Even if we normalized, we would keep the MXF for at least as long as it met broadcaster needs.
One thing we might change, since we're in the position of creating the files in the first place, is having captions delivered as sidecar files in addition to being embedded in the MXF files. The most recent delivery of episodes included subtitles as .scc files but this does not seem to have been the result of a deliberate request. If you have input into how files are delivered, it might be worth trying to go further up the workflow to get captions in a format that could be later combined with normalized MKV.
2. Storage costs
In my experience, transcoding to FFV1/MKV from a format like MXF increases the file size, sometimes substantially. Of course this depends on the specific characteristics of the source, but while I've seen significant savings going from v210/uncompressed to FFV1/MKV, I haven't seen that coming from most other digital video sources in our collection[1]. Combined with keeping the original MXF, normalization would add significantly to storage costs.
As a side note, there would also be an additional transcoding step if we were to store as MKV but deliver in another format. This would be difficult to fit into our current workflows, though it might not be a problem in your context. If we were to store both the original and the normalized file, then the question is largely around storage costs and risk management via file format choices[2].
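To make the storage comparison above concrete, here is a minimal sketch of the arithmetic. The function name and all of the numbers are hypothetical, chosen only for illustration; they are not measurements from our collection.

```python
def storage_after_normalization(original_gb, size_ratio, keep_original=True):
    """Estimate total storage (in GB) after normalizing one file.

    original_gb   -- size of the source file (hypothetical figure)
    size_ratio    -- normalized size divided by original size (assumed;
                     above 1.0 means the FFV1/MKV copy is larger)
    keep_original -- whether the source file is retained alongside the copy
    """
    normalized_gb = original_gb * size_ratio
    retained_gb = original_gb if keep_original else 0.0
    return normalized_gb + retained_gb


# Hypothetical MXF source that grows by half when transcoded, original kept.
print(storage_after_normalization(100, 1.5))

# Hypothetical v210 source that halves in size, original not kept.
print(storage_after_normalization(100, 0.5, keep_original=False))
```

With these made-up ratios, the first case needs two and a half times the original storage, while the second needs half of it, which is the shape of the difference described above.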
3. Contexts of re-use
The MXF files aren't the only files we preserve from live events. There are also edit masters (usually in ProRes) and highly compressed H.264 versions that are uploaded to YouTube. The MXF files are actually shorter than the other files because they've been edited to fit a television time slot. Most requests for partial footage of an event actually go through the ProRes file.
This means the most likely re-use scenario for the MXF version is for re-broadcast, in which case it likely would need to be delivered as MXF again. This actually came up once last year when a broadcast partner had an issue with their asset management system and couldn't access two episodes previously sent to them. We sent them the MXF files, on a hard drive via overnight delivery, and they were able to fulfill their broadcast schedule. So even though we don't have a lot of internal need for the MXF files, they still serve a purpose as MXF and are likely to continue to do so for a while.
Apologies for going on at length about considerations that might not apply to you, but I think it can be worth looking beyond technical file format assessments when analyzing whether or not to normalize. I should also add that our decision is specifically a decision not to normalize at the time of ingest. The question can be reopened, and I would welcome a clear normalization target format for born-digital video in the future.
Andrew Berger
Computer History Museum
[1] File size also generally increases when transcoding from ProRes, H.264, and DV, which we also do not normalize. In a context where storage costs were less of an issue, we might save the originals plus normalized files. But at the moment we are relying more on the preservation and maintenance of A/V playback software, rather than particular formats, for continued access in the future. We do check that every file can be rendered in ffmpeg and/or VLC before ingest.
[2] We are going to move to FFV1/MKV for digitization, as the storage savings are substantial over v210 and offset any extra file management work that this might create. At the moment, our in-house team needs files in a format like ProRes to be able to use them with their editing tools, but FFV1/MKV could be more widely supported in the future.
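
For anyone wanting to script a pre-ingest render check like the one mentioned in [1], the following is a rough sketch and not our actual workflow. It assumes ffmpeg is installed and on the PATH; the function names are hypothetical. It asks ffmpeg to decode the streams it selects by default and discard the output via the null muxer, then treats a non-zero exit code or any error-level message as a failure.

```python
import shutil
import subprocess


def build_decode_command(path, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that decodes the file and discards the result.

    "-v error" limits logging to errors, "-nostdin" disables interactive
    input, and "-f null -" sends the decoded output to the null muxer.
    """
    return [ffmpeg, "-v", "error", "-nostdin", "-i", str(path), "-f", "null", "-"]


def can_be_decoded(path, ffmpeg="ffmpeg"):
    """Return (ok, messages) for a full decode pass over the file.

    ok is True only if ffmpeg exits with code 0 and logs no errors.
    """
    if shutil.which(ffmpeg) is None:
        raise FileNotFoundError(f"{ffmpeg} was not found on the PATH")
    result = subprocess.run(
        build_decode_command(path, ffmpeg),
        capture_output=True,
        text=True,
    )
    messages = result.stderr.strip()
    return result.returncode == 0 and not messages, messages


if __name__ == "__main__":
    import sys

    # Usage: python check_decode.py episode1.mxf episode2.mxf
    for name in sys.argv[1:]:
        ok, messages = can_be_decoded(name)
        print(name, "OK" if ok else "FAILED")
        if messages:
            print(messages)
```

A check like this only shows that the file decodes with the installed ffmpeg build; it does not confirm that captions or other data streams are intact, so it complements a manual look in a player such as VLC and does not replace it.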