What is happening?
In the next PRONOM/DROID signature file release, we will be making changes to the identification signature sequences of MPEG-1 and MPEG-2. For some users of DROID this means that a file currently identified as MPEG-1 may, if re-characterised, be newly identified as MPEG-2, and vice versa.
Why are we making these changes?
Signature sequences within PRONOM are generally devised through our own research, or the contributions of external parties. Where available we make use of published file format technical specification documents, and during the course of research we gather or create many sample files to determine and validate signature sequences. Our aim is to assert, with a strong degree of confidence, that a particular sequence is as accurate as possible and not prone to clashing with existing signatures. On occasion we have cause to revisit certain signature sequences, where we ourselves find, or external parties report, problems with a particular signature.
On this occasion we received two separate but similar reports of non-identification of valid MPEG files, from Dave Thompson at The Wellcome Library, and from Steve Murphy at LDS Church. The initial outcomes of these reports were the creation of a new file format entry within PRONOM (MPEG-2 Elementary Stream), and the removal of the End Of File (EOF) sequence from the MPEG-2 Video Format. However, this prompted a more thorough research thread within The National Archives. It was found during this research process that our existing signature sequences for MPEG-1 and MPEG-2 were flawed.
What is changing?
The Beginning Of File (BOF) sequence for MPEG-1 Video Format is currently (prior to the June 2014 PRONOM/DROID signature release) as follows:
000001BA[20:2F]{7-11}000001BB
The BOF for MPEG-2 Video Format is:
000001BA[40:7F]{7-11}000001BB
The difference between these sequences is the single byte at offset 4 (hex position 0x04). This distinction seems to have been made on the basis of observations of the sample MPEG files available to us at the time. Our recent research has noted that this byte is not the determining factor for distinguishing between MPEG-1 and MPEG-2.
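As a rough illustration of how narrow that distinction was, the following Python sketch classifies a file by the byte at offset 4 alone. It is not DROID's matching code: the function name and the returned labels are hypothetical, and the sketch deliberately ignores the {7-11}000001BB tail that both old sequences shared.

```python
PACK_START = bytes.fromhex("000001BA")  # first four bytes of both old sequences


def old_byte4_class(data: bytes) -> str:
    """Report which old byte range the byte at offset 4 falls into.

    Hypothetical helper: it mirrors only the [20:2F] versus [40:7F]
    part of the old sequences, not the full signatures.
    """
    if len(data) < 5 or not data.startswith(PACK_START):
        return "no pack header"
    byte4 = data[4]
    if 0x20 <= byte4 <= 0x2F:
        return "old MPEG-1 range"
    if 0x40 <= byte4 <= 0x7F:
        return "old MPEG-2 range"
    return "neither range"
```

Everything after offset 4 was matched identically by both old sequences, so this single byte decided the outcome.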
MPEG-2 is similar to the MPEG-1 video standard, but significantly extends it with many additional features. Describing them is beyond the scope of this post, but these extensions are the key to distinguishing between the formats. A web page hosted by the Telemedia, Networks and Systems Group at the Massachusetts Institute of Technology (http://tns-www.lcs.mit.edu/manuals/mpeg2/FAQ) discusses this issue:
"25. How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream?
A. All MPEG-2 bitstreams must contain specific extension headers that *immediately* follow MPEG-1 headers. At the highest layer, for example, the MPEG-1 style sequence_header() is followed by sequence_extension() exclusive to MPEG-2. Some extension headers are specific to MPEG-2 profiles. For example, sequence_scalable_extension() is not allowed in Main Profile bitstreams.
A simple program need only scan the coded bitstream for byte-aligned start codes to determine whether the stream is MPEG-1 or MPEG-2."
The start code of the MPEG Sequence Header is 0x00 00 01 B3, whilst that of the Sequence Extension is 0x00 00 01 B5. We can therefore predict that, if an MPEG file contains a Sequence Header, and this is followed immediately by the Sequence Extension header, then the file cannot be MPEG-1, as the Sequence Extension is exclusive to MPEG-2. We have gathered a large number of MPEG video files from a range of sources in order to test and confirm this.
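The kind of 'simple program' the FAQ describes might look like the Python sketch below. It is a minimal illustration rather than the method DROID uses: it assumes the whole stream is held in memory, the function names are hypothetical, and reporting 'MPEG-1' whenever the next start code is not a Sequence Extension is a simplification of the rule quoted above.

```python
START_PREFIX = b"\x00\x00\x01"  # every byte-aligned start code begins with this
SEQUENCE_HEADER = 0xB3
SEQUENCE_EXTENSION = 0xB5


def find_start_code(data: bytes, pos: int = 0):
    """Return (offset, code byte) of the next start code at or after pos, or None."""
    i = data.find(START_PREFIX, pos)
    if i == -1 or i + 3 >= len(data):
        return None
    return i, data[i + 3]


def guess_mpeg_version(data: bytes) -> str:
    """Look at the start code that follows the first Sequence Header."""
    pos = 0
    while True:
        hit = find_start_code(data, pos)
        if hit is None:
            return "unknown"  # no Sequence Header found at all
        offset, code = hit
        if code == SEQUENCE_HEADER:
            break
        pos = offset + 4  # skip this start code and keep scanning
    following = find_start_code(data, offset + 4)
    if following is None:
        return "unknown"  # stream ends before another start code
    return "MPEG-2" if following[1] == SEQUENCE_EXTENSION else "MPEG-1"
```

In a program stream the video data is wrapped in further packet structure, so a sketch like this is best read as an illustration of the principle rather than a replacement for the signatures.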
We have also developed a signature for MPEG-1 elementary stream files, and will be adding this to the next release.
In full, the following changes are taking place:
x-fmt/385 - MPEG-1 Video Format
We will be changing the name of this format to 'MPEG-1 Program Stream' for accuracy and consistency.
The new BOF sequence will be:
000001BA{8-12}000001BB
We are removing the EOF sequence. We previously expected a 'Program End' marker within 32 bytes of the end of the file; however, we have found that an arbitrary number of bytes (potentially padding, metadata etc.) may follow this marker.
x-fmt/386 - MPEG-2 Video Format
We will be changing the name of this format to 'MPEG-2 Program Stream' for accuracy and consistency.
The new BOF sequence will be:
000001BA{8-12}000001BB{8-65536}000001B3{8-128}000001B5
This format will take priority over x-fmt/385.
fmt/585 - MPEG 2 Transport Stream
We will be changing the name of this format to 'MPEG-2 Transport Stream' for consistency.
fmt/640 - MPEG-2 Elementary Stream
We will be adding 'mpeg' and 'm2v' as additional extensions.
The new BOF sequence will be:
000001B3{8-256}000001B5{6-256}000001B8
fmt/649 (provisional PUID - subject to change) - MPEG-1 Elementary Stream
This will be added as a new format to the PRONOM registry.
The BOF sequence will be:
000001B3{8}000001B8
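To experiment with the new sequences listed above, they can be translated into regular expressions. The Python sketch below is an approximation under stated assumptions, not DROID's matching engine: it handles only hex bytes and {n} or {n-m} gaps, it assumes each BOF sequence is anchored at offset 0, and the names bof_to_regex, NEW_BOF, PRIORITY and identify are hypothetical. Because the x-fmt/385 sequence is a prefix of the x-fmt/386 sequence, any file matching the latter also matches the former, which is why the priority rule is needed.

```python
import re

# One token of the subset of sequence syntax handled here.
TOKEN = re.compile(
    r"\{(\d+)(?:-(\d+))?\}"        # gap: {n} or {n-m} arbitrary bytes
    r"|((?:[0-9A-Fa-f]{2})+)"      # run of literal hex bytes
)

NEW_BOF = {
    "x-fmt/385": "000001BA{8-12}000001BB",
    "x-fmt/386": "000001BA{8-12}000001BB{8-65536}000001B3{8-128}000001B5",
    "fmt/640": "000001B3{8-256}000001B5{6-256}000001B8",
    "fmt/649": "000001B3{8}000001B8",  # provisional PUID
}

# winner -> loser; only the priority stated in the post is modelled.
PRIORITY = {"x-fmt/386": "x-fmt/385"}


def bof_to_regex(sequence: str):
    """Translate a sequence string into a compiled bytes regex."""
    parts = []
    pos = 0
    while pos < len(sequence):
        m = TOKEN.match(sequence, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}")
        low, high, literal = m.groups()
        if literal is not None:
            parts.append(re.escape(bytes.fromhex(literal)))
        else:
            parts.append(b".{%d,%d}" % (int(low), int(high or low)))
        pos = m.end()
    # DOTALL so that a gap byte may also be 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def identify(data: bytes) -> list:
    """Return the PUIDs whose BOF sequence matches at offset 0."""
    hits = [puid for puid, seq in NEW_BOF.items()
            if bof_to_regex(seq).match(data)]
    for winner, loser in PRIORITY.items():
        if winner in hits and loser in hits:
            hits.remove(loser)
    return hits
```

Calling identify() on the first few kilobytes of a file returns a list of candidate PUIDs. For real identification work, DROID and the released signature file remain the reference.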
What’s the distinction between the Transport, Elementary, and Program MPEG streams?
Some of the more flexible multimedia container formats (MPEG-2, MPEG-4, MXF etc.) can be sub-divided into different variants or ‘flavours’ depending on stream content, file structure and intended use. These may carry an extension different from that of the generic form (e.g. *.m2v rather than *.mpg).
There are, for example, various types of MPEG-2, including ‘Program’, ‘Elementary’ and ‘Transport’.
What impact will these changes have?
If you re-characterise with DROID or any other tool that uses the PRONOM registry, you may find that a file previously identified as MPEG-1 Video Format is newly identified as MPEG-2 Program Stream. Conversely, a file formerly identified as MPEG-2 Video Format may be newly identified as MPEG-1 Program Stream.
When will this happen?
The next PRONOM/DROID signature release is scheduled for 25th June, 2014. This may be subject to change.
Conclusion
We strive to ensure the PRONOM registry is as accurate as possible and we continue to actively support it. We encourage users who encounter inaccuracies within the PRONOM data to let us know via the online submission form (http://www.nationalarchives.gov.uk/contact/contactform.asp?id=13), so that we can correct them. We are extremely thankful to Dave Thompson and Steve Murphy for alerting us to their particular issues with their MPEG files as their feedback prompted this research.
David Clipsham
Digital Records Infrastructure Support Engineer, The National Archives.