[NOTICE] forthcoming changes to MPEG-1 and MPEG-2 variants in next PRONOM/DROID signature release


Dclipsham

Jun 6, 2014, 7:41:01 AM

What is happening?

In the next PRONOM/DROID signature file release, we will be making changes to the identification signature sequences of MPEG-1 and MPEG-2. For some users of DROID this may mean that a file currently identified as MPEG-1 may, if re-characterised, be newly identified as MPEG-2, and vice-versa.

Why are we making these changes?

Signature sequences within PRONOM are generally devised through our own research or the contributions of external parties. Where available, we make use of published file format technical specification documents, and during the course of research we gather or create many sample files to determine and validate signature sequences. Our aim is to assert, with a strong degree of confidence, that a particular sequence is as accurate as possible and not prone to clashing with existing signatures. On occasion we have cause to revisit certain signature sequences, where we ourselves find, or external parties report, problems with a particular signature.

On this occasion we received two separate but similar reports of valid MPEG files not being identified, from Dave Thompson at The Wellcome Library and from Steve Murphy at LDS Church. The initial outcomes of these reports were the creation of a new file format entry within PRONOM (MPEG-2 Elementary Stream) and the removal of the End Of File (EOF) sequence from the MPEG-2 Video Format. However, this prompted a more thorough research thread within The National Archives, during which it was found that our existing signature sequences for MPEG-1 and MPEG-2 were flawed.

What is changing?

The Beginning Of File (BOF) sequence for MPEG-1 Video Format is currently (prior to the June 2014 PRONOM/DROID signature release) as follows:

000001BA[20:2F]{7-11}000001BB

The BOF for MPEG-2 Video Format is:

000001BA[40:7F]{7-11}000001BB

The only difference between these sequences is the permitted range of the single byte at offset 4 (hex position 0x04): 0x20 to 0x2F for MPEG-1, and 0x40 to 0x7F for MPEG-2. This distinction seems to have been made on the basis of observations of the sample MPEG files available to us at the time. Our recent research has found that this byte is not the determining factor for distinguishing between MPEG-1 and MPEG-2.
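To make the two legacy sequences concrete, here is a minimal Python sketch that translates a small subset of the signature syntax used in this post (literal hex bytes, byte ranges such as [20:2F], and gaps such as {7-11}) into a regular expression anchored at the beginning of the file. The names signature_to_regex and old_identification are hypothetical and are not part of DROID; this is an illustration of how the sequences read, not how DROID itself evaluates them.

```python
import re

# One token of the signature subset handled here: a byte range such as
# [20:2F], a gap such as {7-11} or {8}, or a literal hex byte such as BA.
_TOKEN = re.compile(
    r"\[([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2})\]"
    r"|\{(\d+)(?:-(\d+))?\}"
    r"|([0-9A-Fa-f]{2})"
)


def signature_to_regex(signature: str) -> "re.Pattern[bytes]":
    """Translate a simplified signature sequence into a bytes regex."""
    parts = []
    pos = 0
    while pos < len(signature):
        token = _TOKEN.match(signature, pos)
        if token is None:
            raise ValueError(f"unsupported syntax at offset {pos}")
        low, high, gap_min, gap_max, literal = token.groups()
        if literal is not None:
            parts.append(f"\\x{literal}")            # exact byte
        elif low is not None:
            parts.append(f"[\\x{low}-\\x{high}]")    # inclusive byte range
        else:
            # gap of arbitrary bytes; {8} means exactly 8
            parts.append(f".{{{gap_min},{gap_max or gap_min}}}")
        pos = token.end()
    # DOTALL so that '.' also matches the newline byte 0x0A
    return re.compile("".join(parts).encode("ascii"), re.DOTALL)


OLD_MPEG1 = signature_to_regex("000001BA[20:2F]{7-11}000001BB")
OLD_MPEG2 = signature_to_regex("000001BA[40:7F]{7-11}000001BB")


def old_identification(data: bytes) -> str:
    """Apply the pre-June-2014 BOF sequences (match() anchors at offset 0)."""
    if OLD_MPEG1.match(data):
        return "MPEG-1 Video Format"
    if OLD_MPEG2.match(data):
        return "MPEG-2 Video Format"
    return "unidentified"
```

With these two patterns, the outcome depends entirely on the byte at offset 4, which is the behaviour the research above calls into question.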

MPEG-2 is similar to the MPEG-1 video standard, but significantly extends it with many additional features. Describing them all is beyond the scope of this post, but these extensions are the key to distinguishing between the formats. A web page hosted by the Telemedia, Networks and Systems Group at the Massachusetts Institute of Technology (http://tns-www.lcs.mit.edu/manuals/mpeg2/FAQ) discusses this issue:

"25. How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream?

A. All MPEG-2 bitstreams must contain specific extension headers that *immediately* follow MPEG-1 headers.  At the highest layer, for example, the MPEG-1 style sequence_header() is followed by sequence_extension() exclusive to MPEG-2. Some extension headers are specific to MPEG-2 profiles.  For example, sequence_scalable_extension()  is not allowed in Main Profile bitstreams.

A simple program need only scan the coded bitstream for byte-aligned start  codes to determine whether the stream is MPEG-1 or MPEG-2."

The Sequence Header start code for MPEG video is hex 00 00 01 B3, whilst the Sequence Extension start code is 00 00 01 B5. We can therefore infer that, if an MPEG file contains a Sequence Header and this is followed immediately by a Sequence Extension header, then the file cannot be MPEG-1, as the Sequence Extension is exclusive to MPEG-2. We have gathered a large number of MPEG video files from a range of sources in order to test and confirm this.
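The "simple program" mentioned in the FAQ can be sketched roughly as follows in Python. It looks for the first Sequence Header start code, then checks whether the next byte-aligned start code is the Sequence Extension. The function name classify_video_stream and its return strings are hypothetical, and the sketch assumes the header payload itself contains no 00 00 01 byte run.

```python
START_CODE_PREFIX = bytes.fromhex("000001")
SEQUENCE_HEADER = bytes.fromhex("000001B3")
SEQUENCE_EXTENSION_CODE = 0xB5


def classify_video_stream(data: bytes) -> str:
    """Guess MPEG-1 vs MPEG-2 from the start code after the Sequence Header."""
    header_at = data.find(SEQUENCE_HEADER)
    if header_at == -1:
        return "unknown"  # no Sequence Header found at all

    # Locate the next start code prefix after the Sequence Header start code.
    next_prefix = data.find(START_CODE_PREFIX, header_at + len(SEQUENCE_HEADER))
    if next_prefix == -1 or next_prefix + 3 >= len(data):
        return "unknown"  # stream ends before another start code

    # The byte after the 00 00 01 prefix identifies the kind of start code.
    next_code = data[next_prefix + 3]
    return "MPEG-2" if next_code == SEQUENCE_EXTENSION_CODE else "MPEG-1"
```

A file in which the Sequence Header is followed by 00 00 01 B5 is reported as MPEG-2; any other following start code (for example 00 00 01 B8) is treated as MPEG-1.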

We have also developed a signature for MPEG-1 elementary stream files, and will be adding this to the next release.

In full, the following changes are taking place:

x-fmt/385 - MPEG-1 Video Format

We will be changing the name of this format to 'MPEG-1 Program Stream' for accuracy and consistency.

The new BOF sequence will be:

000001BA{8-12}000001BB

We are removing the EOF sequence. Previously we expected a 'Program End' marker within 32 bytes of the end of the file; however, we have found that an arbitrary number of bytes (potentially padding, metadata etc.) may follow this marker.


x-fmt/386 - MPEG-2 Video Format

We will be changing the name of this format to 'MPEG-2 Program Stream' for accuracy and consistency.

The new BOF sequence will be:

000001BA{8-12}000001BB{8-65536}000001B3{8-128}000001B5

This format will take priority over x-fmt/385.
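The priority rule matters because the new x-fmt/386 sequence begins with the whole of the new x-fmt/385 sequence, so any file that matches the former also matches the latter. A minimal Python sketch of the idea, with the two BOF sequences hand-translated into regular expressions (identify_program_stream is a hypothetical helper, not DROID's own logic):

```python
import re

# Hand-translated from the new BOF sequences; a {m-n} gap becomes .{m,n}.
MPEG1_PS = re.compile(
    rb"\x00\x00\x01\xba.{8,12}\x00\x00\x01\xbb",
    re.DOTALL,
)
MPEG2_PS = re.compile(
    rb"\x00\x00\x01\xba.{8,12}\x00\x00\x01\xbb"
    rb".{8,65536}\x00\x00\x01\xb3.{8,128}\x00\x00\x01\xb5",
    re.DOTALL,
)


def identify_program_stream(data: bytes):
    """Return a PUID, testing the higher-priority signature first."""
    if MPEG2_PS.match(data):   # x-fmt/386 takes priority
        return "x-fmt/386"
    if MPEG1_PS.match(data):   # fall back to the shorter sequence
        return "x-fmt/385"
    return None
```

Without the priority ordering, an MPEG-2 Program Stream would produce two competing matches.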


fmt/585 - MPEG 2 Transport Stream

We will be changing the name of this format to 'MPEG-2 Transport Stream' for consistency.


fmt/640 - MPEG-2 Elementary Stream

We will be adding 'mpeg' and 'm2v' as additional extensions.

The new BOF sequence will be:

000001B3{8-256}000001B5{6-256}000001B8


fmt/649 (provisional PUID - subject to change) - MPEG-1 Elementary Stream

This will be added as a new format to the PRONOM registry.

The BOF sequence will be:

000001B3{8}000001B8

 

What’s the distinction between the Transport, Elementary, and Program MPEG streams?

Some of the more flexible multimedia container formats (MPEG-2, MPEG-4, MXF etc.) can be sub-divided into different variants or ‘flavours’ depending on stream content, file structure and intended use. These may carry an extension different from that of the generic form (e.g. *.m2v rather than *.mpg).

There are, for example, various types of MPEG-2, including ‘Program’, ‘Elementary’ and ‘Transport’.

  • MPEG-2 ‘Program’ - may contain both video and audio, and has the potential to carry a variety of other data.
  • MPEG-2 ‘Elementary’ (*.m2v) - holds either the video or audio stream extracted from an MPEG-2 Program file. Generally used for editing purposes.
  • MPEG-2 ‘Transport’ (*.m2t files) - used in broadcasting to send digital video and audio signals. These are streamlined versions of ‘Program’ or ‘Elementary’ files.

What impact will these changes have?

If you re-characterise with DROID, or any other tool that uses the PRONOM registry, you may find that a file previously identified as MPEG-1 Video Format is now identified as MPEG-2 Program Stream. Conversely, a file formerly identified as MPEG-2 Video Format may now be identified as MPEG-1 Program Stream.

When will this happen?

The next PRONOM/DROID signature release is scheduled for 25th June, 2014. This may be subject to change.

Conclusion

We strive to ensure the PRONOM registry is as accurate as possible and we continue to actively support it. We encourage users who encounter inaccuracies within the PRONOM data to let us know via the online submission form (http://www.nationalarchives.gov.uk/contact/contactform.asp?id=13), so that we can correct them. We are extremely thankful to Dave Thompson and Steve Murphy for alerting us to their particular issues with their MPEG files as their feedback prompted this research.


David Clipsham

Digital Records Infrastructure Support Engineer, The National Archives.

Tyler Thorsted

May 5, 2016, 5:57:20 PM
to droid-list
David,

I have a few MPEG-2 program streams which have slightly more than 128 bytes between 000001B3 and 000001B5. Is this a specification in the standard or just from your observations?

Tyler Thorsted

Dclipsham

May 11, 2016, 11:05:21 AM
to droid...@googlegroups.com
Hi Tyler, just to publicly describe what we've discussed offline, where you provided the PRONOM team with an image of the affected hex dump:

With reference to this web page and other sources we hold locally:

http://dvd.sourceforge.net/dvdinfo/mpeghdrs.html

The 'Sequence Header' (00 00 01 B3) will contain at least 8 bytes (64 bits) after the start code, determining horizontal size (12 bits), vertical size (12 bits), aspect ratio (4 bits), frame rate (4 bits), bit rate (18 bits), 1 'always set' bit (1 bit), VBV buffer size (10 bits), then three flags: constrained parameters flag, load intra quantiser matrix, and load non-intra quantiser matrix (1 bit each, 3 total).

According to the table, "If either load quantiser matrix flag is =1, it is immediately followed by the 64 byte table (moving the "load non-intra quantiser matrix" flag, in the case of "load intra quantiser matrix")."

The first flag appears in the '0x82' byte at position 58 in your hex table. 0x82 = 1000 0010; the second-to-last bit is the 'load intra quantiser matrix' flag, so this flag is set and the next 64 bytes from here would be the 'intra quantiser matrix' (except the final bit, which would be the next flag, as it has shifted as per the point in parentheses above).

The next flag is in the '0xA7' byte at position 122. 0xA7 = 1010 0111; the final bit is the 'load non-intra quantiser matrix' flag, so this flag is also set and the next 64 bytes from here would be the 'non-intra quantiser matrix'.

So it appears that the 8-128 byte range in PRONOM is based on a misinterpretation of the Sequence Header, and the byte range should be either 8 bytes (main header, with neither quantiser matrix flag set), 72 bytes (1 flag set, so 8 byte main header with one 64-byte quantiser matrix), or 136 bytes (both flags set, so 8 byte main header, with two 64-byte quantiser matrices).
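The 8 / 72 / 136 arithmetic can be checked with a short Python sketch. It takes the bytes that follow the 00 00 01 B3 start code and works out how long the Sequence Header payload is from the two quantiser matrix flags. The function name sequence_header_payload_length is hypothetical, and the bit positions are those implied by the field widths listed above (the first flag is the second-to-last bit of the 8th byte; the second flag is the last bit of the 8th byte, or of the 72nd byte when an intra matrix is present).

```python
def sequence_header_payload_length(payload: bytes) -> int:
    """Length in bytes of the Sequence Header after its start code."""
    if len(payload) < 8:
        raise ValueError("need at least the 8-byte fixed header")

    length = 8  # fixed fields: 12+12+4+4+18+1+10+1+1+1 = 64 bits

    load_intra = bool(payload[7] & 0x02)
    if load_intra:
        length += 64  # 64-byte intra quantiser matrix follows
        if len(payload) < 72:
            raise ValueError("payload truncated inside intra matrix")
        # The intra matrix pushes the second flag 64 bytes further on.
        load_non_intra = bool(payload[71] & 0x01)
    else:
        load_non_intra = bool(payload[7] & 0x01)

    if load_non_intra:
        length += 64  # 64-byte non-intra quantiser matrix follows

    return length
```

For a payload whose 8th byte is 0x82 and whose 72nd byte is 0xA7, as in the hex dump discussed here, the function returns 136, which is why a {8-128} gap fails to reach the following 00 00 01 B5.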

Hopefully that all makes at least some sense! The upshot is that we'll amend the signatures in the next release, which we currently intend for early June.

Best wishes,
David

Dclipsham

May 11, 2016, 12:58:19 PM
to droid-list
Apologies for weird formatting above, was correcting a point on byte range (replacing 76 with 72) via cellphone. Will fix ASAP.