H.264 media types

Jeremy Noring

Jul 15, 2009, 2:13:04 PM

I have a question about H.264 subtypes as defined by this
documentation in MSDN:

http://msdn.microsoft.com/en-us/library/dd757808(VS.85).aspx

...so I see that the formats basically break down into two types:
those with start codes, and those without. My question pertains to
actually delivering samples to a downstream filter.

If I'm using the subtypes prefixed by the NAL unit start code
(0x00000001), when I deliver a sample, must it be a single NAL unit,
like so:

00 00 00 01 ...NALU follows...

Or could I have multiple NAL units in a single IMediaSample, like so:

00 00 00 01 ...NALU follows... 00 00 00 01 ...NALU follows...

The reason I ask is I'm not sure if I must deliver the SPS and PPS NAL
units as individual samples, or if it's OK to squash them into a
single IMediaSample and pass that down the pipe. The documentation is
clear that NAL units cannot span multiple samples, but it's not clear
as to whether or not a single sample can contain multiple NAL units.
One encoder I use pukes out giant blobs of data containing multiple
NAL units in a single buffer, so if I can't pass multiple NAL units
downstream in one sample, I'll have to parse them out and manually
break them up.
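
For the case where an encoder hands back one buffer holding several NAL
units, the parsing step could be sketched roughly as follows. This assumes
the buffer is an H.264 Annex B byte stream and that start codes may be 3 or
4 bytes long. The name split_annexb is made up for illustration and is not
part of DirectShow.

```python
from typing import List

START3 = b"\x00\x00\x01"  # a 4-byte start code is this with one more leading zero


def split_annexb(buf: bytes) -> List[bytes]:
    """Return the NAL units found in an Annex B buffer, start codes removed."""
    nalus = []
    pos = buf.find(START3)
    while pos != -1:
        begin = pos + 3                      # first byte after the start code
        nxt = buf.find(START3, begin)        # start of the next start code, if any
        end = len(buf) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros are either the
        # leading zero of a 4-byte start code or padding; drop them.
        nalu = buf[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


if __name__ == "__main__":
    blob = bytes.fromhex("000000016742001e0000000168ce3880000001658884")
    for nalu in split_annexb(blob):
        # The low 5 bits of the first byte are nal_unit_type (7 = SPS, 8 = PPS, 5 = IDR slice).
        print(nalu[0] & 0x1F, nalu.hex())
```

Each returned NAL unit can then be delivered on its own or regrouped,
depending on what the downstream filter accepts.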

The same question applies to the subtypes prefixed with the sample
size--can I have multiple NAL units in a single sample?

Any and all advice is welcome--thanks!

Alessandro Angeli

Jul 15, 2009, 3:51:38 PM

From: "Jeremy Noring"

> If I'm using the subtypes prefixed by the NAL unit start
> code (0x00000001), when I deliver a sample, must it be a
> single NAL unit, like so:

> Or could I have multiple NAL units in a single
> IMediaSample, like so:

[...]

> The same question applies to the subtypes prefixed with
> the sample size--can I have multiple NAL units in a
> single sample?

My understanding is that an AVC1 media sample must contain
exactly 1 NALU without any prefix (that's how the data is
stored in MP4 and other containers), while H264/X264 samples
can contain any number of NALUs and the NALU boundaries
don't even have to respect the sample boundaries.

--
// Alessandro Angeli
// MVP :: DirectShow / MediaFoundation
// mvpnews at riseoftheants dot com
// http://www.riseoftheants.com/mmx/faq.htm

Jeremy Noring

Jul 15, 2009, 5:34:08 PM

On Jul 15, 12:51 pm, "Alessandro Angeli" <nob...@nowhere.in.the.net>
wrote:

> My understanding is that an AVC1 media sample must contain
> exactly 1 NALU without any prefix (that's how the data is
> stored in MP4 and other containers), while H264/X264 samples
> can contain any number of NALUs and the NALU boundaries
> don't even have to respect the sample boundaries.

Thanks for the reply. I'm getting close to being able to test this
myself, so hopefully I'll be able to clarify shortly, but I think
you're right about H264/X264 being happy with multiple NALUs in a
single sample.

From MSFT's documentation, it seems like AVC1 does expect a prefix
(i.e. "each NALU is prefixed by a length field, which gives the length
of the NALU in bytes. The size of the length field can vary, but is
typically 1, 2, or 4 bytes."). I use a 4 byte prefix, since it makes
it trivial to overwrite the existent 4 byte NALU start code.
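
The overwrite trick described above could be sketched like this. It is
only an illustration: the function name is invented, and it assumes every
start code in the sample is exactly 4 bytes and that the length field is
big-endian. A 3-byte start code would not be detected and its NALU would be
folded into the previous one.

```python
import struct

START4 = b"\x00\x00\x00\x01"


def overwrite_start_codes(sample: bytearray) -> None:
    """Replace each 4-byte start code with a 4-byte big-endian NALU length, in place."""
    if not sample.startswith(START4):
        raise ValueError("sample must begin with a 4-byte start code")
    pos = 0
    while pos != -1:
        # Look for the next start code after the current NALU's header.
        nxt = sample.find(START4, pos + 4)
        end = len(sample) if nxt == -1 else nxt
        # The NALU payload runs from pos + 4 up to end.
        struct.pack_into(">I", sample, pos, end - (pos + 4))
        pos = nxt


if __name__ == "__main__":
    data = bytearray.fromhex("000000016742001e0000000168ce3880")
    overwrite_start_codes(data)
    print(data.hex())
```

Because the prefix and the start code have the same size, the buffer length
does not change and no copy is needed.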

Alessandro Angeli

Jul 16, 2009, 7:16:57 AM

From: "Jeremy Noring"

> From MSFT's documentation, it seems like AVC1 does expect
> a prefix (i.e. "each NALU is prefixed by a length field,
> which gives the length of the NALU in bytes. The size of

That's what the doc says, but it seems unnecessary, unless
you can pack multiple AVC1 NALUs per sample. You'll know for
sure when you start testing :-)

> the length field can vary, but is typically 1, 2, or 4
> bytes."). I use a 4 byte prefix, since it makes it
> trivial to overwrite the existent 4 byte NALU start code.

Can't the start code also be only 24 bits instead of 32?

jack wini

Jul 16, 2009, 9:08:11 AM

First, you must deliver the SPS and PPS NAL units to the decoder
filter. You can squash them into a single IMediaSample or not; it does
not matter.

If the NAL units look like this:

00 00 00 01 ...NALU follows... 00 00 00 01 ...NALU follows...

you don't need to worry about it either, because the H.264 decoder can
parse them.

So you can put multiple NAL units in a single sample, or not.

Give it a try.

--wini

url:http://www.ureader.com/msg/14714812.aspx

Jeremy Noring

Jul 16, 2009, 10:01:03 AM

On Jul 16, 5:16 am, "Alessandro Angeli" <nob...@nowhere.in.the.net>
wrote:

> From: "Jeremy Noring"
>
> > From MSFT's documentation, it seems like AVC1 does expect
> > a prefix (i.e. "each NALU is prefixed by a length field,
> > which gives the length of the NALU in bytes. The size of
>
> That's what the doc says, but it seems unnecessary, unless
> you can pack multiple AVC1 NALUs per sample. You'll know for
> sure when you start testing :-)

Well, sure, all of it seems a little weird and unnecessary given that
the IMediaSample itself has methods to specify the size of the sample
data. I'm not sure if MSFT did this for a reason, or if they were
just following suit with other filter implementations in the open
source community. Or, maybe the point is to be able to include
multiple NALU in a single sample (the overall size is returned by the
IMediaSample method, and each NALU in the buffer is prefixed with its
own size).
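
If a sample really can hold several length-prefixed NALUs, the receiving
side would walk the buffer along these lines. This is a sketch under
assumptions: the helper name is hypothetical, the length field is taken to
be big-endian, and its size (1, 2, or 4 bytes) is passed in by the caller.

```python
from typing import List


def split_length_prefixed(sample: bytes, length_size: int = 4) -> List[bytes]:
    """Walk a sample made of [length][NALU] records and return the NALUs."""
    nalus = []
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        n = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + n > len(sample):
            raise ValueError("NALU runs past the end of the sample")
        nalus.append(sample[pos:pos + n])
        pos += n
    return nalus


if __name__ == "__main__":
    sample = bytes.fromhex("000000046742001e0000000468ce3880")
    for nalu in split_length_prefixed(sample):
        print(len(nalu), nalu.hex())
```

The overall sample size bounds the walk, and each prefix says where the
next NALU begins, which is the split of responsibilities guessed at above.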

I have two encoders--one sends the SPS/PPS info in-stream, and the
other sends it out of stream (specifically, in the SDP exchange).
They bundle samples in completely different ways. It's a bit of a
pain.

>
> > the length field can vary, but is typically 1, 2, or 4
> > bytes."). I use a 4 byte prefix, since it makes it
> > trivial to overwrite the existent 4 byte NALU start code.
>
> Can't the start code also be only 24 bits instead of 32?

I believe so, but both of my encoders (always???) use a 32 bit start
code.
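
If 3-byte start codes do turn up, overwriting in place with a 4-byte length
no longer fits, so the sample has to be rebuilt. A rough sketch, assuming a
big-endian length field and using an invented function name:

```python
import re

# Two or more zero bytes followed by 0x01: matches both 3-byte and 4-byte
# start codes, along with any extra zero padding in front of them.
_START_CODE = re.compile(rb"\x00{2,}\x01")


def annexb_to_length_prefixed(buf: bytes, length_size: int = 4) -> bytes:
    """Rebuild an Annex B buffer as [length][NALU] records."""
    out = bytearray()
    # Element 0 is whatever preceded the first start code; skip it.
    for nalu in _START_CODE.split(buf)[1:]:
        nalu = nalu.rstrip(b"\x00")  # zero padding after the final NALU
        if nalu:
            out += len(nalu).to_bytes(length_size, "big")
            out += nalu
    return bytes(out)


if __name__ == "__main__":
    mixed = bytes.fromhex("000000016742001e00000168ce3880")
    print(annexb_to_length_prefixed(mixed).hex())
```

This costs a copy per sample but does not depend on which start code
length the encoder happens to emit.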

Geraint Davies

Jul 17, 2009, 5:12:15 AM

On Thu, 16 Jul 2009 07:01:03 -0700 (PDT), Jeremy Noring
<kid...@gmail.com> wrote:

>On Jul 16, 5:16 am, "Alessandro Angeli" <nob...@nowhere.in.the.net>
>wrote:
>> From: "Jeremy Noring"
>>
>> > From MSFT's documentation, it seems like AVC1 does expect
>> > a prefix (i.e. "each NALU is prefixed by a length field,
>> > which gives the length of the NALU in bytes. The size of

You are supposed to have a single *access unit* in a sample, not a
single NALU. So one whole picture. The first sample is going to have
the SPS, the PPS, and then however many slice NALUs the picture is
broken up into. You can search for AUD SEI NALUs (sorry about the
abbreviation mush) to break the stream into pictures if necessary.
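
Breaking a list of NALUs into pictures on AUD boundaries could look
roughly like this. It is a simplified sketch: it assumes the encoder emits
an access unit delimiter (nal_unit_type 9) at the start of every access
unit, which not all encoders do, and the function names are hypothetical.

```python
from typing import List

NAL_TYPE_AUD = 9  # access unit delimiter


def nal_type(nalu: bytes) -> int:
    """nal_unit_type is the low 5 bits of the first NALU byte."""
    return nalu[0] & 0x1F


def group_by_aud(nalus: List[bytes]) -> List[List[bytes]]:
    """Start a new access unit at every AUD; NALUs before the first AUD form their own unit."""
    units: List[List[bytes]] = []
    for nalu in nalus:
        if nal_type(nalu) == NAL_TYPE_AUD or not units:
            units.append([])
        units[-1].append(nalu)
    return units


if __name__ == "__main__":
    stream = [b"\x09\xf0", b"\x67\x42", b"\x68\xce", b"\x65\x88", b"\x09\xf0", b"\x41\x9a"]
    for unit in group_by_aud(stream):
        print([nal_type(n) for n in unit])
```

Each inner list would then go into one media sample. Streams without AUDs
need the fuller boundary rules from the H.264 spec, which this does not
attempt.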

However, with many decoders, I've been successful in delivering a
stream of NALUs without worrying about frame boundaries, so long as
you have a whole number of NALUs in each sample.

So clearly you need the start codes or the access unit delimiters to
separate the NALUs.

There are three types of stream format:

1. Byte Stream Format (00 00 01 start codes) with the SPS and PPS
   appended to the media type format block (format block length is
   sizeof(VIDEOINFOHEADER) plus the SPS/PPS).
2. Byte Stream Format with the PPS and SPS inband only, so you need to
   search for them in the stream. This is the more basic and more
   common type.
3. Length-prepended, the same as MP4 files. With this type, you need
   FORMAT_Mpeg2Video, where the dwFlags field is used to store the
   size of the prepended length field. Here you always have the
   parameter sets out of band in the sequence header field of the
   format block (with a NALU length field of 2 bytes for the param
   sets).
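
For the length-prepended type, packing the parameter sets for the sequence
header field might be sketched as below. The byte order of the 2-byte
length is assumed here to be big-endian (network order), and the helper
name is invented. Filling in the rest of the format block is not shown.

```python
from typing import List


def build_sequence_header(param_sets: List[bytes]) -> bytes:
    """Concatenate SPS/PPS NALUs, each prefixed with a 2-byte big-endian length."""
    out = bytearray()
    for ps in param_sets:
        if len(ps) > 0xFFFF:
            raise ValueError("parameter set too large for a 2-byte length")
        out += len(ps).to_bytes(2, "big")
        out += ps
    return bytes(out)


if __name__ == "__main__":
    sps = bytes.fromhex("6742001e")  # placeholder bytes, not a complete SPS
    pps = bytes.fromhex("68ce3880")  # placeholder bytes, not a complete PPS
    header = build_sequence_header([sps, pps])
    print(len(header), header.hex())
```

The parameter sets always use the 2-byte prefix here, regardless of the
length-field size stored in dwFlags for the samples themselves.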

G