Use of the 0x8 flag for invisible frames in WebM


fishor

Jul 4, 2011, 3:28:27 AM
to WebM Discussion
Hello all,

I see that vpxenc sets the 0x8 flag on each Matroska block in the case of
an invisible frame.
Is it used for debugging purposes? Or is there any documentation on how to
use it for decoding?

Regards,
Alexey

John Koleszar

Jul 4, 2011, 8:13:25 AM
to webm-d...@webmproject.org

Hi Alexey,

You don't need to use it when decoding. I don't think recent versions
of vpxenc set that flag any more.

fishor

Jul 4, 2011, 10:15:17 AM
to WebM Discussion
OK. Another question: according to the documentation, an altref frame
should have a timestamp close to the frame that depends on it, but it may
also have the same timestamp as that dependent frame. I took the easy route
and used the same timestamp as the dependent frame, but it looks like many
players have problems with it, especially when seeking.
Maybe it still makes sense to use some flag to mark altref frames? If so,
it should be documented.

On Jul 4, 12:13 pm, John Koleszar <jkoles...@google.com> wrote:

John Koleszar

Jul 6, 2011, 9:14:58 AM
to webm-d...@webmproject.org
Could you provide more specifics? I don't see how marking the alt ref frame affects seeking at all.



fishor

Jul 7, 2011, 2:02:47 AM
to WebM Discussion
You're right, I was confused by another bug. It has nothing to do with
seeking.
The only thing that currently bugs me about altref is the duration:
according to the documentation, an altref frame has duration=0. You can
express that by setting the BlockDuration field in the container; normal
frames use the default duration.
vpxenc does not set BlockDuration but used to set 0x8, so it was possible
to interpret it in the demuxer.

So what is the right way to mux it? Set BlockDuration?

fishor

Jul 7, 2011, 7:01:42 AM
to WebM Discussion
I just gathered some statistics.

If I encode with:
http://www.webmproject.org/tools/encoder-parameters/
vpxenc input_1280_720_30fps.yuv -o output_vp8.webm \
--i420 -w 1280 -h 720 -p 2 -t 4 \
--best --target-bitrate=2000 --end-usage=vbr \
--auto-alt-ref=1 --fps=30000/1001 -v \
--minsection-pct=5 --maxsection-pct=800 \
--lag-in-frames=16 --kf-min-dist=0 --kf-max-dist=360 \
--token-parts=2 --static-thresh=0 --drop-frame=0 \
--min-q=0 --max-q=60

then for a video with 5000 frames I get about 200 altref frames. If I mux
it properly and explicitly set duration=0 on every altref, it adds a small
amount of useless overhead in the container (about 800 bytes per 5000
frames, i.e. roughly 4 extra bytes for each of the ~200 altref frames).

It seems cheaper to use the flag instead of the duration approach.

John Koleszar

Jul 7, 2011, 9:18:09 AM
to webm-d...@webmproject.org
You shouldn't need to use the BlockDuration element at all, apart from
maybe the last block in the track. From the spec, "...When not written
and with no DefaultDuration, the (BlockDuration) value is assumed to
be the difference between the timecode of this Block and the timecode
of the next Block in "display" order." So just set the timecodes
correctly and you should be ok. There's no need to signal the alt ref
frames in the container, either via a flag or explicit duration.
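
To make the quoted spec behaviour concrete, here is a minimal C sketch (not from the thread; the function and array names are hypothetical) of how a demuxer could derive a block's duration from timecodes alone when neither BlockDuration nor DefaultDuration is written:

#include <stddef.h>
#include <stdint.h>

/* timecodes: block timecodes in display order, in TimecodeScale units */
static int64_t derived_duration(const int64_t *timecodes, size_t count, size_t i)
{
    if (i + 1 < count)
        return timecodes[i + 1] - timecodes[i]; /* difference to the next block */
    return -1; /* last block: unknown unless an explicit BlockDuration is written */
}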

fishor

Jul 7, 2011, 10:01:41 AM
to WebM Discussion
Good point, thanks for the tip.

fishor

Jul 7, 2011, 11:10:42 AM
to WebM Discussion
Seems like after each answer I have more questions :)

I just checked several different WebM videos, including some from
YouTube. All of them set a default duration. The spec also says: "All
absolute (block + cluster) timecodes must be monotonically
increasing." Given that, a default duration makes sense. So you
propose disabling the default duration if the altref option is enabled?

If I do not use the DefaultDuration setting, I do not know what to
expect from the stream. Should I normalize the framerate or not? Were
some frames lost? If so, it is better to wait for a keyframe. If the
stream fails, there are two ways to display it: stuttering or
artifacts. I think stuttering is the lesser evil.

I work on VP8 support in GStreamer. The problem is that the timestamp is
set in the vp8enc element. Timestamps are 64 bits long (nanoseconds); in
webmmux they are rounded to milliseconds if timestampscale=1000000.
Because I do not know how the stream will be muxed, I can't decide
what TS to set for the AR frame. If I set it too close to D-1 it will end
up with the same TS and can be wrongly interpreted by the demuxer. If I
set it too close to the D frame, it can be dropped on some QoS event.

I need some way to know what frame type it is; if it is an invisible
frame, I can send it to the decoder immediately after D-1 was decoded. No
problems with TS or QoS, and everyone is happy.
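
For illustration only (the numbers below are made up, not from the thread): with the default TimecodeScale of 1,000,000 ns, nanosecond timestamps collapse to millisecond block timecodes, which is how an alt-ref placed just before its following frame can end up with the same TS:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const int64_t timecode_scale = 1000000;  /* 1 ms, the WebM default */
    const int64_t pts_ns[3] = { 33366666,    /* hypothetical D-1 frame */
                                66700000,    /* hypothetical AR frame */
                                66733333 };  /* hypothetical D frame */
    for (int i = 0; i < 3; i++)
        printf("%lld ms\n", (long long)(pts_ns[i] / timecode_scale));
    /* prints 33, 66, 66: the AR frame and the D frame collide after rounding */
    return 0;
}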

Frank Galligan

Jul 7, 2011, 1:52:49 PM
to webm-d...@webmproject.org
CIL

On Thu, Jul 7, 2011 at 11:10 AM, fishor <lexa....@gmail.com> wrote:
Seems like after each answer i have more question :)

I currently checked some different webm videos inclusive some from
youtube. All of them set default duration.
We will look into this.
 
Spec also say: "All
absolute (block + cluster) timecodes must be monotonically
increasing." According to this, default duration make sense.
I'm not sure what the correlation between monotonically increasing and default duration is. 

So you
propose disabling default duration if altref option is enabled?
I think John was proposing not to use DefaultDuration at all, because it would bloat the container: according to the spec, every block MUST then include a BlockDuration.

Same with BlockDuration, except on the last block.
 

If i do not use defaultduration setting, i do not know what can i
expect from the stream. Should i normalize framerate or not.
This depends on what you are trying to accomplish. Do you need to seek a lot? live?

Are there
was some frame lost? If yes, it is good to wait for a keyframe. If
stream fail, there is two ways to display this failed stream:
stuttering, or artifacts. I think stuttering is less worst.

I work on vp8 support in gstreamer. The problem is, the timestamp will
be set in vp8enc element. Timestamps are 64bit long (nanoseconds), in
webmmux they are rounded to milisconds if timestampscale=1000000.
Because i do not know how the stream will be muxed, i can't decide
what TS to set for AR-frame.
You should just take the TS from vpxenc and divide by timestampscale.
 
If i set it too close to D-1 it will have
same TS and can be wrongly interpreted by demuxer.
FFmpeg right now sets the altref raw timecode to match the block that precedes it; hence the timecodes are monotonically increasing.

Does GStreamer have an issue with this now?


If i set it too
close to D-frame, it can be dropped on some QoS event.
If you drop one frame all the frames after it will have artifacts until the decoder receives a key-frame. So I don't see the issue here.
 

I should some how know what frame type is it, if it is an invisible
frame, i can send it to decoder immediately after D-1 was decoded.
You should be able to send any frames, not just alt-ref,  to the decoder after D-1 frame was decoded.

No
problems with TS or QoS and every one is happy.

fishor

Jul 7, 2011, 3:08:30 PM
to WebM Discussion


On 7 Jul., 19:52, Frank Galligan <fgalli...@google.com> wrote:
> CIL
>
> On Thu, Jul 7, 2011 at 11:10 AM, fishor <lexa.fis...@gmail.com> wrote:
> > Seems like after each answer i have more question :)
>
> > I currently checked some different webm videos inclusive some from
> > youtube. All of them set default duration.
>
> We will look into this.
>
> > Spec also say: "All
> > absolute (block + cluster) timecodes must be monotonically
> > increasing." According to this, default duration make sense.
>
> I'm not sure what the correlation between monotonically increasing
> and default duration is.
>
> So you> propose disabling default duration if altref option is enabled?
>
> I think John was proposing not to use Default duration at all because it
> will bloat the container because according to the spec every block MUST
> include block duration.
>
> Same with block duration unless on the last block.

OK, I need to take a look at that in webmmux.

> > If i do not use defaultduration setting, i do not know what can i
> > expect from the stream. Should i normalize framerate or not.
>
> This depends on what you are trying to accomplish. Do you need to seek a
> lot? live?

I have two targets:
- make WebM more attractive in GStreamer (it is the default decoder and
encoder framework for Ubuntu), so people can use it for their offline
collections or whatever, just to get this code path well tested and
used.
- the primary target is streaming: unstable sources in an unstable
environment, cameras (variable framerate) over a network (packet drops,
delays). It should be recordable, and the recorded stream should be
seekable live.

> Are there
>
> > was some frame lost? If yes, it is good to wait for a keyframe. If
> > stream fail, there is two ways to display this failed stream:
> > stuttering, or artifacts. I think stuttering is less worst.
>
> > I work on vp8 support in gstreamer. The problem is, the timestamp will
> > be set in vp8enc element. Timestamps are 64bit long (nanoseconds), in
> > webmmux they are rounded to milisconds if timestampscale=1000000.
> > Because i do not know how the stream will be muxed, i can't decide
> > what TS to set for AR-frame.
>
> You should just take the TS from vpxenc and divide by timestampscale.
>
> > If i set it too close to D-1 it will have
> > same TS and can be wrongly interpreted by demuxer.
>
> FFmpeg right now sets the altref raw timecode which matches the block that
> precedes it. Hence the monotonically increasing.
>
> Does the gstreamer have an issue with this now?

GStreamer had a bug where it did not set a TS on AR frames. I decided to
fix it the easy way, by setting the same TS as the D frame. That was
probably not a very good solution.

> If i set it too> close to D-frame, it can be dropped on some QoS event.
>
> If you drop one frame all the frames after it will have artifacts until the
> decoder receives a key-frame. So I don't see the issue here.
>
>
>
> > I should some how know what frame type is it, if it is an invisible
> > frame, i can send it to decoder immediately after D-1 was decoded.
>
> You should be able to send any frames, not just alt-ref,  to the decoder
> after D-1 frame was decoded.

I mean in the right order.

Thank you for your answer.

Frank Galligan

Jul 8, 2011, 10:40:48 AM
to webm-d...@webmproject.org

Default encoder settings are hard because the video can be used in different environments. I would probably let the encoder pick the key-frames as a default. I.e. Pick a high key-frame interval like 9999. This way if the encoder plug-in is used as an exporter in an application it should get the best quality. If a developer wants to use the encoder plug-in in an application in a lossy network he/she can change the key-frame frequency to a value that would work better in that situation.

Setting the TS of Alt frame to the preceding frame is how FFmpeg handles it. Some encoders have set the Altref time in the middle if the timebase is granular enough. And I think others have set the Alt TS to the same time as the following frame. All of these should work. At some point we should describe the best practice. I will send an email to the list soon, I just want to clear something up on timing first.
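
As a rough sketch of the three placements described above (an assumed helper, not any particular muxer's API), the choice boils down to something like:

#include <stdint.h>

/* Pick a block timecode for an alt-ref frame, given the timecodes of the
 * preceding (pf) and following (ff) frames, in TimecodeScale units. */
enum altref_policy { ALTREF_AT_PF, ALTREF_MIDWAY, ALTREF_AT_FF };

static int64_t altref_timecode(int64_t pf, int64_t ff, enum altref_policy p)
{
    switch (p) {
    case ALTREF_AT_PF:  return pf;                 /* what FFmpeg does per this thread */
    case ALTREF_MIDWAY: return pf + (ff - pf) / 2; /* needs a granular enough timebase */
    default:            return ff;                 /* same timecode as the following frame */
    }
}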



Steve Lhomme

Jul 9, 2011, 10:37:19 AM
to webm-d...@webmproject.org
Following a discussion I had with fishor on IRC, I need some
clarifications on something:
Can alt-ref frames be considered keyframes? Or are they just "secondary"
reference frames?

Right now GStreamer seems to use alt frames with BlockGroup and
BlockDuration of 0 and possibly marked as invisible. This is OK, but
that frame has no reference set, so it can be considered as a
keyframe, and would be described as such in the Cue entries.

So are they keyframes usable for seeking? If not, I have to check
mkclean to make sure it doesn't list them in the Cues. For that they
will need to be specifically marked as Invisible (a 0 duration may be
valid as a keyframe for seeking).

--
Steve Lhomme
Matroska association Chairman

Steve Lhomme

Jul 10, 2011, 5:45:02 AM
to webm-d...@webmproject.org
Looking more closely, it seems alt-ref frames can use SimpleBlock. The
timecode only needs to be one increment before that of the next frame it
is attached to. It is only necessary to use Block/BlockGroup when a
default duration has been set, in which case the duration of the
alt-ref would otherwise be assumed to be the default and might disturb playback.

Steve

Frank Galligan

Jul 11, 2011, 1:34:07 PM
to webm-d...@webmproject.org
We need to decide what constitutes a valid file and what the ideal layout for alt-ref frames is with respect to WebM.

First some background info. Alt-ref frames are generated frames that are not supposed to have any duration associated with them. (If anyone disagrees with this definition or has a better one please reply.)

PF = preceding frame
AF = alt-ref frame
FF = following frame

Next, I think we need to define the spec for 2 different cases: one with the Track DefaultDuration set, one without.

1. Without DefaultDuration being set.

I think there are 3 possibilities that should be considered valid files.
1.a. PF, then AF with time = PF, then FF with time > AF
1.b. PF, then AF with time > PF, then FF with time > AF
1.c. PF, then AF with time > PF, then FF with time = AF

I think muxers should strive to output alt-ref frames as in 1.c. This case is the closest to what the alt-ref frame is supposed to be, i.e. PF and FF have the correct time and duration, and AF has a time and 0 duration.
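
For concreteness, a 1.c stream could look roughly like this (hypothetical millisecond timecodes, in the same sketch style used later in this thread):

SimpleBlock
timestamp=0
data=PF
SimpleBlock (keyframe flag not set)
timestamp=33
data=AF (altref frame; its duration is implicitly 0 because FF shares its timecode)
SimpleBlock
timestamp=33
data=FF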


2. With DefaultDuration being set.

I think there are 6 possibilities that all players might run into.
2.a. PF, then AF with time = PF, then FF with time > AF (with PF + AF + FF duration > PF + FF source duration)
2.b. PF, then AF with time > PF, then FF with time > AF (with PF + AF + FF duration > PF + FF source duration)
2.c. PF, then AF with time > PF, then FF with time = AF (with PF + AF + FF duration > PF + FF source duration)
2.d. PF, then AF with time = PF, then FF with time > AF (with PF + AF + FF duration = PF + FF source duration)
2.e. PF, then AF with time > PF, then FF with time > AF (with PF + AF + FF duration = PF + FF source duration)
2.f. PF, then AF with time > PF, then FF with time = AF (with PF + AF + FF duration = PF + FF source duration)

With 2.a.-2.c. the problem is that the muxer is adding time to the track when it shouldn't be. So far I have only heard of files that have the Track's DefaultDuration set and then PF + AF + FF do not have their BlockDuration set, so all blocks should receive the DefaultDuration as their duration. I think all the players currently handle these files without any problems. But trying to create a general rule in the spec for demuxers might be impossible, because demuxers will not know whether a frame is an AF. I think if any of PF, AF, or FF has a BlockDuration set and the PF + AF + FF duration != PF + FF source duration, then the file should be considered malformed. But should we try to special-case when the DefaultDuration is set and the BlockDuration is not?

I think 2.d.-2.f should be considered valid files. I think muxers should strive to output alt-ref frames in 2.f. with AF setting the BlockDuration to 0.

Frank

fishor

Jul 13, 2011, 2:05:12 AM
to WebM Discussion
I implemented the 2.f form in GStreamer. Some tests show that not all
programs can handle it:
mplayer - failed to seek
vlc - ok
firefox - ok
chromium - failed to seek
mkvmerge - marked frames with duration=0 as keyframes (should be fixed
by now)

I will wait until this is official before filing bug reports.

Frank Galligan

Jul 13, 2011, 9:38:33 AM
to webm-d...@webmproject.org
Just making sure: if you implement 2.f you will need to add a ReferenceBlock element to the altref frame, otherwise that block will be treated as a key-frame.

Can you implement 1.c in gstreamer? I think that is probably the best solution for altref frames.

Frank




Frank Galligan

Jul 13, 2011, 9:39:20 AM
to webm-d...@webmproject.org
Also could you send me some sample files? I would like to test in some other players.

Thanks,
Frank

fishor

Jul 13, 2011, 3:25:25 PM
to WebM Discussion
Just to make sure: in the case of 2.f the stream would look like:

DefaultDuration=3333

SimpleBlock
timestamp=1
data=PF
BlockGroup
timestamp=3
BlockDuration=0
ReferenceBlock=3 (timestamp of following frame)
data=AF (altref frame)
SimpleBlock
timestamp=3
data=FF

Is it OK that the reference timestamp points to the same TS as the block it is in?


Frank Galligan

Jul 13, 2011, 4:51:50 PM
to webm-d...@webmproject.org
(Quick warning I haven't done this myself yet.)

I think the AF should reference the PF. So like this:

DefaultDuration=3333

SimpleBlock
 timestamp=1
 data=PF
BlockGroup
 timestamp=3
 BlockDuration=0
 ReferenceBlock=1 (timestamp of PF)

 data=AF (altref frame)
SimpleBlock
 timestamp=3
 data=FF

Frank


fishor

Jul 14, 2011, 3:50:47 AM
to WebM Discussion
The spec on matroska.org has been corrected now. It says: "The duration
of the Block (based on TimecodeScale). This element is mandatory when
DefaultDuration is set for the track (but can be omitted as other
default values). When not written and with no DefaultDuration, the
value is assumed to be the difference between the timecode of this
Block and the timecode of the next Block in "display" order (not
coding order). This element can be useful at the end of a Track (as
there is not other Block available), or when there is a break in a
track like for subtitle tracks. When set to 0 that means the frame is
not a keyframe."

Frank Galligan

Jul 14, 2011, 11:51:38 AM
to webm-d...@webmproject.org
I think we are going to have trouble with that last sentence in the paragraph. I think it will break a good number of muxers/demuxers.

Frank




Steve Lhomme

Jul 14, 2011, 12:32:49 PM
to webm-d...@webmproject.org
You're right, we need to find something else to store alt-ref frames
(AF). You also want the AF to have the same timecode as the following
frame (FF). This is not good. In the same track no frame should have
the same timecode. That's a basic requirement, especially when
references are used.

Here are the possibilities I can think of:

1/ in BlockGroup with a 0 duration. The problem is that for most
existing demuxers this is considered as a keyframe. When remuxing it
can result in a Cue entry, which should not happen. So this is not
enough.
One possibility could be to add a fake reference in the BlockGroup,
namely to itself (its own timecode). That would fool current demuxers
and should not disturb systems that handle these references properly
(I don't know any). Now that would be a problem if our AF and its FF
had the same timecode. One good reason to avoid this.

2/ Use a SimpleBlock for the AF without the keyframe flag and with
the invisible flag set (which it is); see the flag sketch at the end of this message.
This solution, like the previous, would require the AF and the FF to
have different timecodes. This may not be possible if the
TimecodeScale matches the framerate of the video. ie, the timecode is
incremented by 1 in each consecutive Block/SimpleBlock, leaving no
room for a frame in between displayed ones.

3/ Put the AF and FF together in the same SimpleBlock using lacing.
The problem is that in this case the timecode of the second frame is
"unknown" (left to the framework to decide how to handle this). That
means they would likely get the default duration (if any). They might
also get split by a remuxer.

4/ Put the AF in BlockAdditions:
http://www.matroska.org/technical/specs/index.html#BlockAdditions
As you can see, it is left to the codec to know how to handle the
BlockAdditional element. That means the FF would be in Block and the
AF in BlockAdditional. They get the same timecode. And the AF is
marked with BlockAddID=1 which for VP8 would mean an alt-ref frame. No
need for 0 duration as it's just extra data to handle.

IMO only #4 is a clean solution. It doesn't require any hacking of the
timecodes. The drawback is that BlockAdditions is currently poorly
supported. So it's likely the data in current frameworks will not be
passed to the VP8 decoder. I suppose that would break decoding.

There's one last option: frame packing. The VP8 encoder/decoder is
responsible for (un)packing the AF and FF frames together in one
buffer (one Frame as seen by Matroska). Just like the MPEG 4.2
B-frames were done in AVI.

Steve
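
Since option 2 relies on the SimpleBlock header flags (the same byte the 0x8 flag from the start of this thread lives in), here is a small C sketch of reading them; it assumes the flags byte has already been located after the track number and the 16-bit relative timecode:

#include <stdint.h>

/* SimpleBlock flag bits from the Matroska/WebM container spec */
#define SB_FLAG_KEYFRAME    0x80
#define SB_FLAG_INVISIBLE   0x08  /* the 0x8 flag this thread started with */
#define SB_FLAG_LACING_MASK 0x06
#define SB_FLAG_DISCARDABLE 0x01

struct sb_flags { int keyframe, invisible, discardable; };

static struct sb_flags parse_simpleblock_flags(uint8_t flags_byte)
{
    struct sb_flags f;
    f.keyframe    = !!(flags_byte & SB_FLAG_KEYFRAME);
    f.invisible   = !!(flags_byte & SB_FLAG_INVISIBLE);
    f.discardable = !!(flags_byte & SB_FLAG_DISCARDABLE);
    return f;
}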


Vladimir Pantelic

Jul 14, 2011, 12:47:54 PM
to webm-d...@webmproject.org
Steve Lhomme wrote:

> 4/ Put the AF in BlockAdditions:
> http://www.matroska.org/technical/specs/index.html#BlockAdditions
> As you can see, it is left to the codec to know how to handle the
> BlockAdditional element. That means the FF would be in Block and the
> AF in BlockAdditional. They get the same timecode. And the AF is
> marked with BlockAddID=1 which for VP8 would mean an alt-ref frame. No
> need for 0 duration as it's just extra data to handle.

5/ Packed AF:

> There's one last option: frame packing. The VP8 encoder/decoder is
> responsible for (un)packing the AF and FF frames together in one
> buffer (one Frame as seen by Matroska). Just like the MPEG 4.2
> B-frames were done in AVI.

I "vote" for 4 or 5

fishor

Jul 15, 2011, 2:35:34 AM
to WebM Discussion
Hi,

On 14 Jul., 18:32, Steve Lhomme <slho...@matroska.org> wrote:
> You're right, we need to find something else to store alt-ref frames
> (AF). You also want the AF to have the same timecode as the following
> frame (FF). This is not good. In the same track no frame should have
> the same timecode. That's a basic requirement, especially when
> references are used.
>
> Here are the possibilities I can think of:
>
> 1/ in BlockGroup with a 0 duration. The problem is that for most
> existing demuxers this is considered as a keyframe. When remuxing it
> can result to a Cue entry which should not happen. So this is not
> enough.
> One possibility could be to add a fake reference in the BlockGroup,
> namely it's to itself (own timecode). That would fool current demuxers
> and should not disturb systems that handle these references properly
> (I don't know any). Now that would be a problem if our AF and its FF
> had the same timecode. One good reason to avoid this.

It is easy, but too tricky.

> 2/ Use a SimpleBlock for the AF without they keyframe flag and with
> the invisible flag (which it is).
> This solution, like the previous, would require the AF and the FF to
> have different timecodes. This may not be possible if the
> TimecodeScale matches the framerate of the video. ie, the timecode is
> incremented by 1 in each consecutive Block/SimpleBlock, leaving no
> room for a frame in between displayed ones.

IIRC the WebM spec says "The TimecodeScale element should (must) be set to
a default of 1.000.000 nanoseconds."

> 3/ Put the AF and FF together in the same SimpleBlock using lacing.
> The problem is that in this case the timecode of the second frame is
> "unknown" (left to the framework to decide how to handle this). That
> means they would likely get the default duration (if any). They might
> also get split by a remuxer.

The timecodes of FF and PF are known; AF's is unknown and would have to be
"generated".

> 4/ Put the AF in BlockAdditions:http://www.matroska.org/technical/specs/index.html#BlockAdditions
> As you can see, it is left to the codec to know how to handle the
> BlockAdditional element. That means the FF would be in Block and the
> AF in BlockAdditional. They get the same timecode. And the AF is
> marked with BlockAddID=1 which for VP8 would mean an alt-ref frame. No
> need for 0 duration as it's just extra data to handle.

For GStreamer it means we would have to keep one frame in memory while
muxing. That is not so good.

> IMO only #4 is a clean solution. It doesn't require any hacking of the
> timecodes. The drawback is that BlockAdditions is currently poorly
> supported. So it's likely the data in current frameworks will not be
> passed to the VP8 decoder. I suppose that would break decoding.

Hm... this is exactly the difference between VP8 and Theora. VP8
produces small packets, which are good for the network.

> There's one last option: frame packing. The VP8 encoder/decoder is
> responsible for (un)packing the AF and FF frames together in one
> buffer (one Frame as seen by Matroska). Just like the MPEG 4.2
> B-frames were done in AVI.

fishor

Jul 15, 2011, 5:37:45 AM
to WebM Discussion
The most important question for me is: can the timestamp of an altref
frame change during stream demuxing/reconstruction? We can't seek between
keyframes, so everything between keyframes is almost like a monolithic
block. If it stays in the correct order everything is fine, and even if
the AF shares a time with the PF or FF, it will be decoded in the same
order it was written.

The only problems are seeking and the wrong stream duration in forms 2.[a,b,c].

Or am I missing something?

Vladimir Pantelic

Jul 15, 2011, 6:03:08 AM
to webm-d...@webmproject.org
fishor wrote:
> Most impotent question for me is: what timestamp of altref frame can
> change by stream demuxing/reconstruction? We can't seek between
> keyframes, so every thing between KF is almost like monolithic block.
> If it goes in correct order every thing is fine, and even if AF share
> time with with PF oder FF, it will be decoded in same order it was
> written.
>
> Only problem is seeking and wrong stream duration in form 2.[a,b,c]
>
> Or i miss some thing?

the more I think about it, the more I favor option 5 aka frame packing

Paul Wilkins

Jul 15, 2011, 8:58:17 AM
to WebM Discussion

A TRUE key frame must not only be decodable without reference to any
other frame, it must also be possible to decode all subsequent
frames without reference to anything that precedes the key frame.

In practice this means that a key frame must not only be intra coded,
it must reset ALL reference buffers and the entropy context to a
defined state. Also the key frame header is different as it contains
extra data to support, for example, spatial re-sampling / resizing.

However, it may be possible to create fast seeking algorithms that
use alt refs to skip through video more quickly and if it is
desirable (????) I think it would be possible (from a bit stream
perspective) to create key frames that are marked as hidden and never
actually displayed.

fishor

Jul 15, 2011, 9:11:15 AM
to WebM Discussion
That should be done for future releases. For the current release, probably
the only way is to use a dirty ReferenceBlock hack. Or fix the players?
Either way, some part has to be changed. And currently it looks like only
the FFmpeg demuxer is affected by the duration=0 issue?

Frank Galligan

Jul 15, 2011, 2:00:06 PM
to webm-d...@webmproject.org
CIL


On Thu, Jul 14, 2011 at 12:32 PM, Steve Lhomme <slh...@matroska.org> wrote:
You're right, we need to find something else to store alt-ref frames
(AF). You also want the AF to have the same timecode as the following
frame (FF). This is not good. In the same track no frame should have
the same timecode. That's a basic requirement, especially when
references are used.
This is a question that came up before. I think the only mention of same block timecodes in the spec is this: "There can be many Blocks in a BlockGroup provided they all have the same timecode." But that statement doesn't necessarily preclude blocks and simple blocks from having the same timecode.

I do agree that referencing a timecode that has more than one block would be undefined currently, but worst case we could always put rules in the spec about this.

Currently FFmpeg is creating files with simple blocks that have the same timecodes. And the players I tested have no issues. I will talk about this later in detail.



Here are the possibilities I can think of:

1/ in BlockGroup with a 0 duration. The problem is that for most
existing demuxers this is considered as a keyframe. When remuxing it
can result to a Cue entry which should not happen. So this is not
enough.
Correct.
 
One possibility could be to add a fake reference in the BlockGroup,
namely it's to itself (own timecode). That would fool current demuxers
and should not disturb systems that handle these references properly
(I don't know any). Now that would be a problem if our AF and its FF
had the same timecode. One good reason to avoid this.
I'm not sure why you would want to add a fake reference.

If your file was muxed like 2.e. or 2.f. you should just put a real reference to PF. (If your file was muxed like 2.d. you could still have a reference but as said above we will need to make rules about referencing > 1 blocks with the same timestamp)


2/ Use a SimpleBlock for the AF without they keyframe flag and with
the invisible flag (which it is).
Muxers should be able to mark the block as invisible but I don't think this should be mandatory. I think we should leave the invisible flag out of this discussion.

This solution, like the previous, would require the AF and the FF to
have different timecodes. 
Again not sure why they cannot have the same timecode.
 
This may not be possible if the
TimecodeScale matches the framerate of the video. ie, the timecode is
incremented by 1 in each consecutive Block/SimpleBlock, leaving no
room for a frame in between displayed ones.
With same timecodes this would be fine.
 

3/ Put the AF and FF together in the same SimpleBlock using lacing.
The problem is that in this case the timecode of the second frame is
"unknown" (left to the framework to decide how to handle this). That
means they would likely get the default duration (if any). They might
also get split by a remuxer.
I think this would break almost all players today. I don't know for sure though.
 

4/ Put the AF in BlockAdditions:
http://www.matroska.org/technical/specs/index.html#BlockAdditions
As you can see, it is left to the codec to know how to handle the
BlockAdditional element. That means the FF would be in Block and the
AF in BlockAdditional. They get the same timecode. And the AF is
marked with BlockAddID=1 which for VP8 would mean an alt-ref frame. No
need for 0 duration as it's just extra data to handle.

IMO only #4 is a clean solution. It doesn't require any hacking of the
timecodes. The drawback is that BlockAdditions is currently poorly
supported. So it's likely the data in current frameworks will not be
passed to the VP8 decoder. I suppose that would break decoding.
This would break all players today.

Also I'm not sure this is the cleanest solution, as muxers and demuxers might have to have intimate knowledge of the data. I.e. I think mkv/webm muxers will need to know that the compressed frame that was just sent to them is an altref frame. There is no way in current multimedia platforms to distinguish altref frames from other non key-frames. So muxers would need to know that the data is a VP8 frame and how to parse the VP8 header to see if it is an alt-ref frame. The same is true for demuxers sending frames down the pipeline.
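
For reference, detecting an invisible (e.g. alt-ref) VP8 frame from the compressed data itself only needs the 3-byte frame tag described in RFC 6386; a minimal sketch of such a check:

#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the VP8 frame is invisible (show_frame == 0, e.g. an alt-ref),
 * 0 if it is visible, -1 if the buffer is too short. */
static int vp8_frame_is_invisible(const uint8_t *data, size_t size)
{
    if (size < 3)
        return -1;
    /* Little-endian frame tag: bit 0 = frame type, bits 1-3 = version,
     * bit 4 = show_frame, bits 5-23 = size of the first data partition. */
    uint32_t tag = (uint32_t)data[0] |
                   ((uint32_t)data[1] << 8) |
                   ((uint32_t)data[2] << 16);
    return ((tag >> 4) & 1) ? 0 : 1;
}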



There's one last option: frame packing. The VP8 encoder/decoder is
responsible for (un)packing the AF and FF frames together in one
buffer (one Frame as seen by Matroska). Just like the MPEG 4.2
B-frames were done in AVI.
Ahh yes. I agree this would be the best option, but we cannot do this option without breaking the bitstream (which we cannot do), or defining a new codec ID with frame packed header ala VP6 vs VP6F. For the next generation I have made sure that the codec guys will support frame packing, but that does not help us today.


I created a bunch of files and tested them on different players. I put the files up here:

sync_PF-0_FF-33.webm
sync_PF-0_AF-33-SB_FF-33.webm
sync_PF-0_AF-16-SB_FF-33.webm
sync_noaud_PF-0_AF-33-SB_FF-34.webm
sync_noaud_PF-0_AF-1-SB_FF-33.webm
sync_def_PF-0_FF-33.webm
sync_def_PF-0_AF-33-SB_FF-33.webm
sync_def_PF-0_AF-33-BG_FF-33.webm
sync_def_PF-0_AF-16-SB_FF-33.webm

All files are 30fps.
def = DefaultDuration is set > 0
noaud = no audio
PF-X, AF-X, FF-X  X = relative millisecond to get an idea where the timecodes of the frame are.
AF-X-SB = Alt-ref frames muxed in Simple Block
AF-X-BG = Alt-ref frames muxed in Block Group

1.b. = sync_PF-0_AF-16-SB_FF-33.webm
1.c. = sync_PF-0_AF-33-SB_FF-33.webm

2.e. = sync_def_PF-0_AF-16-SB_FF-33.webm

two versions of 2.f.
2.f. = sync_def_PF-0_AF-33-SB_FF-33.webm
2.f. = sync_def_PF-0_AF-33-BG_FF-33.webm

Currently FFmpeg creates files with altref frames like sync_def_PF-0_AF-33-SB_FF-33.webm. I.E. DefaultDuration is set. All blocks are muxed in SimpleBlocks and AF timecode equals FF timecode.

I tested playback on VLC, Opera, Firefox, Chromium, and IE9. All of the players could play and seek in sync without artifacts in all of the files (except IE9 couldn't play sync_def_PF-0_AF-33-BG_FF-33.webm, but we can fix that).



I think we should stay away from defining muxing alt-ref frames that would break current players as it will take time for the players to support the new format. Plus you will always have users that have not upgraded or can't because they are on a device and then the new files will not work for them.

I think the only file types that are currently being created are 1.b, 1.c, and 2.f with SimpleBlocks.

I think it would be best if the spec said 1.a,1.b, 1.c, 2.e SB, 2.e BG, 2.f SB and 2.f BG are valid files that must be handled by players. I think most players can handle these files today (with the IE9 exception on BG). For muxers we would strongly suggest 1.c, unless there is a good reason. I can't think of any, if anyone has one please reply. 

This will be dependent on block timecodes being monotonically increasing vs strictly increasing. So a few questions:
Is strictly increasing explicitly referenced in the Matroska spec today?
What issues could arise from allowing monotonically vs strictly increasing timecodes?

I think the cleanest solution would be to allow monotonically increasing timestamps. The muxing/demuxing is straightforward (for the most part) and all/most of the players can handle it today.


Another question I raised is the interaction of DefaultDuration and SimpleBlocks. If you read the spec and look at the files that FFmpeg currently generates with altref frames, it would seem that with every altref frame the DefaultDuration is being added to the overall duration of the video stream. This would mess with A/V sync, but the timestamps are correct. All players can handle these files correctly. It seems like most players just ignore DefaultDuration wrt SimpleBlocks. Is this the correct behavior?


Truthfully I don't fully understand why you would need to set a DefaultDuration on a video stream in a timestamp based container. If the framerate is non-integral it seems like they are going to be at odds with each other.

Frank



Frank Galligan

Jul 15, 2011, 2:03:47 PM
to webm-d...@webmproject.org


On Fri, Jul 15, 2011 at 2:35 AM, fishor <lexa....@gmail.com> wrote:
Hi,



> 2/ Use a SimpleBlock for the AF without they keyframe flag and with
> the invisible flag (which it is).
> This solution, like the previous, would require the AF and the FF to
> have different timecodes. This may not be possible if the
> TimecodeScale matches the framerate of the video. ie, the timecode is
> incremented by 1 in each consecutive Block/SimpleBlock, leaving no
> room for a frame in between displayed ones.

ircc webm spec say "The TimecodeScale element should (must) be set to
a default of 1.000.000 nanoseconds."
I have never seen a file with a timecodescale different than the default. I'm sure there are some but I'm guessing they are pretty rare.
 

> 3/ Put the AF and FF together in the same SimpleBlock using lacing.
> The problem is that in this case the timecode of the second frame is
> "unknown" (left to the framework to decide how to handle this). That
> means they would likely get the default duration (if any). They might
> also get split by a remuxer.

Timecodes of FF and PF are known, AF is unknown and should be
"generated"
AF is known, it is created by the encoder.
 

Frank Galligan

Jul 15, 2011, 2:08:18 PM
to webm-d...@webmproject.org
Why can't you use SimpleBlocks?

I also don't think you need a dirty ReferenceBlock hack if you output BlockGroups. Just put in the reference to the previous frame like any other non key-frame. See http://code.google.com/p/webm/downloads/detail?name=sync_def_PF-0_AF-33-BG_FF-33.webm&can=1&q=Type%3DWebM#makechanges for reference.

Steve Lhomme

Jul 16, 2011, 4:41:35 AM
to webm-d...@webmproject.org
On Fri, Jul 15, 2011 at 8:00 PM, Frank Galligan <fgal...@google.com> wrote:
> On Thu, Jul 14, 2011 at 12:32 PM, Steve Lhomme <slh...@matroska.org> wrote:
>>
>> You're right, we need to find something else to store alt-ref frames
>> (AF). You also want the AF to have the same timecode as the following
>> frame (FF). This is not good. In the same track no frame should have
>> the same timecode. That's a basic requirement, especially when
>> references are used.
>
> This is a question that came up before. I think the only mention of same
> block timecodes in the spec is this "There can be many Blocks in a
> BlockGroup provided they all have the same timecode. " But
> that statement doesn't necessarily preclude blocks and simple blocks from
> have the same timecode.
> I do agree that referencing a timecode that has more than would block would
> undefined currently, but worst case we could always put rules in the spec
> about this.

I agree. I thought there was already something about frames never
having the same timecode in the specs, but there isn't. For me it's so
obvious that it's implied.

>> One possibility could be to add a fake reference in the BlockGroup,
>> namely it's to itself (own timecode). That would fool current demuxers
>> and should not disturb systems that handle these references properly
>> (I don't know any). Now that would be a problem if our AF and its FF
>> had the same timecode. One good reason to avoid this.
>
> I'm not sure why you would want to add a fake reference.

Simply to make sure demuxers know it is not a keyframe, as this is
the only way to signal that when BlockGroup is used.

> If your file was muxed like 2.e. or 2.f. you should just put a real
> reference to PF. (If your file was muxed like 2.d. you could still have
> a reference but as said above we will need to make rules about referencing >
> 1 blocks with the same timestamp)
>>
>> 2/ Use a SimpleBlock for the AF without they keyframe flag and with
>> the invisible flag (which it is).
>
> Muxers should be able to mark the block as invisible but I don't think this
> should be mandatory. I think we should leave the invisible flag out of this
> discussion.

It is mandatory to put correct information about the data in the file
format. Just yesterday someone reported a bug to me in FFmpeg that doesn't
always put the keyframe flag on FFV1 frames even though they are all
supposed to be keyframes. Sure, it works, but it is not correct.
Again, a rule that is written nowhere because it's so obvious that
it's implied.

>> This solution, like the previous, would require the AF and the FF to
>> have different timecodes.
>
> Again not sure why they cannot have the same timecode.

OK, this seems to be a crucial point here. So let me explain a bit.

Matroska stores Presentation TimeStamps (PTS) and no Decoding
TimeStamps (DTS). And the frames are in coding order (ie DTS always
increasing, but not necessarily PTS). It seems very odd that 2 frames
should be rendered at the same time on the same data pipeline. It may
not be the case in the few programs you tested, but I'm not sure you
are planning to check with all multimedia framework makers to make
sure such a rule doesn't break their internal working. Also because
the PTS can be decreasing in some case (when B frames are involved),
that means there has to be a reordering in the pipeline, based on the
timecode. Reordering 2 frames that have the same timecode could lead
to unexpected behaviours.

The original BlockGroup+Block+BlockReference(s) was designed so that a
program that only knows about the container (not the codec internals)
can edit a file and keep all the frames that are needed to display the
desired edit. That's where the invisible flag was introduced, so that
reference frames that are outside of the edit are still part of the
file, but not displayed.

Imagine this example of video frames:
A(k) B(alt) C(-1) D(-2)
(k: keyframe, alt: alt-ref frame, -x: references the frame x positions back)

If you cut at D, it has a reference to B. With BlockGroup the
BlockReference would be the timecode of B. But if the timecodes of B
and C are the same, which one are you supposed to keep? That's why
it's better to avoid having frames with the same timecode.

Of course when using only SimpleBlock you lose that editing
possibility. But does that mean VP8 will always be prevented from being
used in non-linear editors?

Plus an alt-ref frame doesn't have a PTS, only a range for the DTS. If it
has no PTS there's no reason to invent a fake one; it only creates
problems. It also breaks the rule of one frame in / one frame out
that most frameworks use.

Now in the light of this, frame packing would not be ideal either. If
you want D, you'd need B+C. But then B+C would be marked as invisible.
You only lose some CPU decoding C, which is never used.

> This would break all players today.

That's one of the problems you get when designing a format with
features that didn't exist before. Most of the time they are not used
because no framework handles them. It's a chicken-and-egg problem.

> Also I'm not sure this is the cleanest solution as muxers and demuxers might
> have to have intimate knowledge of the data. I.E. I think mkv/webm muxers

This is the opposite. Just like the CodecPrivate is opaque to the
(de)mux stage, the BlockAdditions are also opaque. They just carry an
ID passed to the codec in case more than one is possible.

> will need to know that the compressed frame that was just sent to them is an
> altref frame. There is no way in current multimedia platforms to distinguish
> altref frames from other non key-frames.  So muxers would need to know  that
> the data is a vp8 frame and how to parse the vp8 header to see if it is an
> alt-ref frame. Same is true with demuxers sending frames down the pipeline.

It's mostly that frameworks are not made for BlockAdditions.

This system was created for the WavPack hybrid mode. The main track is
a lossy codec and the BlockAddition part is the complementary data to
have the lossless version. In this case a muxer can strip the
additional data to keep only the lossy part. In the case of the
alt-ref frame it cannot be removed otherwise it breaks decoding.

>> There's one last option: frame packing. The VP8 encoder/decoder is
>> responsible for (un)packing the AF and FF frames together in one
>> buffer (one Frame as seen by Matroska). Just like the MPEG 4.2
>> B-frames were done in AVI.
>
> Ahh yes. I agree this would be the best option, but we cannot do this option
> without breaking the bitstream (which we cannot do), or defining a new codec
> ID with frame packed header ala VP6 vs VP6F. For the next generation I have
> made sure that the codec guys will support frame packing, but that does not
> help us today.

Isn't there a bit left to handle this? It would be pretty easy to
handle the extra check in your decoder library.

> I think we should stay away from defining muxing alt-ref frames that would
> break current players as it will take time for the players to support the
> new format. Plus you will always have users that have not upgraded or
> can't because they are on a device and then the new files will not work for
> them.

In the light of all this, I still don't see a clean & working solution
to the alt-ref problem. It's sad to be stuck by the legacy of existing
files for such a new format...

> I think the only file types that are currently being created are 1.b, 1.c,
> and 2.f with SimpleBlocks.
> I think it would be best if the spec said 1.a,1.b, 1.c, 2.e SB, 2.e BG, 2.f
> SB and 2.f BG are valid files that must be handled by players. I think most
> players can handle these files today (with the IE9 exception on BG). For
> muxers we would strongly suggest 1.c, unless there is a good reason. I can't
> think of any, if anyone has one please reply.
> This will be dependent on block timecodes being monotonically increasing
> vs strictly increasing. So a few questions:
> Is strictly increasing explicitly referenced in the Matroska spec today?

No, but see above why it should be avoided.

> What issues could arise form allowing monotonically vs strictly increasing
> timecodes?
> I think the cleanest solution would be to allow monotonically increasing
> timestamps. The muxing/demuxing is straight forward (for the most part) and
> all/most of the players can handle it today.

Increasing timecodes may be good for VP8. But for any format with B
frames this is not going to be the case.

> Another question I raised if the interaction of DefaultDuration and
> SimpleBlocks? If you read the spec and the files that FFmpeg currently
> generates with altref frames it would seem that the with every altref frame
> DefaultDuration time is being added to the overall duration of the video

It should not be added. It replaces the duration when it is not set
(in SimpleBlock for example).

> stream. This would mess with av sync but the timestamps are correct. All
> players can handle these files correctly. It seems like most players just
> ignore DefaultDuration wrt SimpleBlocks. Is this the correct behavior?

I assume some frameworks don't use durations for frames. This is usually
fine for decoding; in the end you only need the PTS (hence we don't
use a DTS).

> Truthfully I don't fully understand why you would need to set a
> DefaultDuration on a video stream in a timestamp based container. If the
> framerate is non-integral it seems like they are going to be at odds with
> each other.

It can be useful to recover the start timecode in laces (thus it
should not be used with Vorbis which doesn't have a fixed framerate).
It is also used to tell the fps of a track for video.

Frank Galligan

Jul 16, 2011, 11:00:05 AM
to webm-d...@webmproject.org
I know why you need a reference; I'm just asking why not reference the previous frame instead of adding a fake reference?

> If your file was muxed like 2.e. or 2.f. you should just put a real
> reference to PF. (If your file was muxed like 2.d. you could still have
> a reference but as said above we will need to make rules about referencing >
> 1 blocks with the same timestamp)
>>
>> 2/ Use a SimpleBlock for the AF without they keyframe flag and with
>> the invisible flag (which it is).
>
> Muxers should be able to mark the block as invisible but I don't think this
> should be mandatory. I think we should leave the invisible flag out of this
> discussion.

It is mandatory to put the correct information about data in the file
format. Just yesterday I was reported a bug in FFMPEG that doesn't
always put the keyframe flag on ffv1 frames even though they are
supposed to be all keyframes. Sure it works but this is not correct.
Again, a rule that is written nowhere because it's so obvious that
it's implied.
A keyframe and an invisible frame are different things. Keyframes have implications for the file format, e.g. seeking, splitting files in editors, etc.

Invisible frames have no bearing on the format. They only have a bearing on the decoder and/or renderer.

I agree in an academic world that all altrefs would be marked invisible frames.  But in real life most multimedia frameworks will not have support for invisible frames. So that will force all muxers in multimedia frameworks that do not have support for the notion of invisible frames to include a vp8 frame reader to see if any vp8 frame is an altref frame.

This has come up many times before and we thought it would be best to not force all muxers to have a serious restriction for little benefit.

>> This solution, like the previous, would require the AF and the FF to
>> have different timecodes.
>
> Again not sure why they cannot have the same timecode.

OK, this seems to be a crucial point here. So let me explain a bit.

Matroska stores Presentation TimeStamps (PTS) and no Decoding
TimeStamps (DTS). And the frames are in coding order (ie DTS always
increasing, but not necessarily PTS). It seems very odd that 2 frames
should be rendered at the same time on the same data pipeline.
I do agree with you.
 
It may
not be the case in the few programs you tested, but I'm not sure you
are planning to check with all multimedia framework makers to make
sure such a rule doesn't break their internal working. Also because
the PTS can be decreasing in some case (when B frames are involved),
that means there has to be a reordering in the pipeline, based on the
timecode.
But don't the demuxers have to have reference blocks for them to do the reordering?

Do any demuxers do reordering without reference blocks? 
 
Reordering 2 frames that have the same timecode could lead
to unexpected behaviours.
Reordering any frames could lead to unexpected behaviors. If editors/demuxers get confused when frames have the same timestamp, then we could make rules about that. I haven't seen anything get confused by such frames yet, but as you said, I haven't tested everything.



The original BlockGroup+Block+BlockReference(s) was designed so that a
program that only knows about the container (not the codec internals)
can edit a file and keep all the frames that are needed to display the
desired edit. That's where the invisible flag was introduced, so that
reference frames that are outside of the edit are still part of the
file, but not displayed.
I understand this.
 

Imagine this example of video frames :
A(k) B(alt) C(-1) D(-2)
(k: keyframe, alt: alt-ref frame, -: references the x frame behind

If you cut at D, it has a references to B. With BlockGroup the
BlockReference would be the timecode of B. But if the timecode of B
and C are the same, which one are you supposed to keep ?
Why not keep both? Or if they are all BlockGroups B & C should be in the same BlockGroup according to the spec (but then C wouldn't be droppable anymore).
 
That's why
it's better to avoid having frames with the same timecode.

Of course when using only SimpleBlock you lose that editing
possibility.
Why? The only ability you lose is that C can't be droppable wrt the NLE.

But does it mean VP8 will always be prevented from being
used in non linear editors ?
Do many NLE's use the invisible bit to do cutting in real life? The ones I have used would re-encode D until the next key frame. (Most of my work previously has been in formats that do not have an invisible state.)

I still don't understand why you need to use BlockGroup with VP8. Maybe we should make a rule that if you are using VP8 and creating altrefs, they must be muxed in SimpleBlocks.


Plus a alt-frame doesn't have a PTS, only a range for the DTS. If it
has no PTS there's no reason to have a fake one, it only creates
problems.
 

 
Plus it also breaks the rule of one frame in/one frame out
that most frameworks use.
That was already broken when VP8 was released without frame packing. We can't put that cat back.

Now in the light of this, frame packing would not be ideal either. If
you want D, you'd need B+C. But then B+C would be marked as invisible.
You only lose some decoding CPU decoding C which is never used.
Correct. I also think that the case you reference is so rare in real life that it is not worth making rules to keep C droppable when using an NLE to cut on D, just so players will not have to decode an extra frame.

Also, currently wrt VP8, C cannot be droppable. (This is not guaranteed in the future.)



> This would break all players today.

That's one of the problems you get when desiging a format with
features not existing before. Most of the time they are not used
because no framework handles it. It's chicken and egg problem.
Yeah I know. 

> Also I'm not sure this is the cleanest solution as muxers and demuxers might
> have to have intimate knowledge of the data. I.E. I think mkv/webm muxers

This is the opposite. Just like the CodecPrivate is opaque to the
(de)mux stage, the BlockAdditions are also opaque. They just carry an
ID passed to the codec in case more than one is possible.

> will need to know that the compressed frame that was just sent to them is an
> altref frame. There is no way in current multimedia platforms to distinguish
> altref frames from other non key-frames.  So muxers would need to know  that
> the data is a vp8 frame and how to parse the vp8 header to see if it is an
> alt-ref frame. Same is true with demuxers sending frames down the pipeline.

It's mostly that frameworks are not made for BlockAdditions.
Correct. 

This system was created for the WavPack hybrid mode. The main track is
a lossy codec and the BlockAddition part is the complementary data to
have the lossless version. In this case a muxer can strip the
additional data to keep only the lossy part. In the case of the
alt-ref frame it cannot be removed otherwise it breaks decoding.

>>> There's one last option: frame packing. The VP8 encoder/decoder is
>>> responsible for (un)packing the AF and FF frames together in one
>>> buffer (one Frame as seen by Matroska). Just like the MPEG 4.2
>>> B-frames were done in AVI.
>>
>> Ahh yes. I agree this would be the best option, but we cannot do this option
>> without breaking the bitstream (which we cannot do), or defining a new codec
>> ID with frame packed header ala VP6 vs VP6F. For the next generation I have
>> made sure that the codec guys will support frame packing, but that does not
>> help us today.

> There's not a bit left to handle this? That would be pretty easy to
> handle the extra check on your decoder library.
So I have been told.
Unless we make the rule that, if a ReferenceBlock references more than one Block, all of those Blocks must be kept in a cut, which is going to mark the non-displayed frames invisible if one or more of those frames are droppable.

Again, I just don't think this will happen much in real life, if ever. Even if it does, the worst case is that the decoder may have to decode some additional frames on top of the invisible frames it already needs to decode to display the cut frame.

The main reason I don't think this situation will ever happen is that other codecs do not have a notion of frames with the same timestamp, so you will never get into the situation above. And currently VP8 cannot create the above situation because C cannot be droppable. In the future VP8 could get into that situation, but I think taking the extra decode hit in that case is more than reasonable.

>> Another question I raised is the interaction of DefaultDuration and
>> SimpleBlocks. If you read the spec and the files that FFmpeg currently
>> generates with altref frames, it would seem that with every altref frame
>> the DefaultDuration time is being added to the overall duration of the video

> It should not be added. It replaces the duration when it is not set
> (in SimpleBlock for example).

>> stream. This would mess with av sync but the timestamps are correct. All
>> players can handle these files correctly. It seems like most players just
>> ignore DefaultDuration wrt SimpleBlocks. Is this the correct behavior?

> I assume some frameworks don't use duration for frames. This is usually
> fine for decoding; in the end you only need the PTS (hence we don't
> use a DTS).
Yeah, I know. Duration on video frames is pretty much academic unless you are using an editor.

>> Truthfully I don't fully understand why you would need to set a
>> DefaultDuration on a video stream in a timestamp based container. If the
>> framerate is non-integral it seems like they are going to be at odds with
>> each other.

> It can be useful to recover the start timecode in laces (thus it
> should not be used with Vorbis which doesn't have a fixed framerate).
> It is also used to tell the fps of a track for video.
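As a small arithmetic sketch of why a non-integral frame rate and DefaultDuration sit awkwardly together (the only fact relied on here is that DefaultDuration is expressed in nanoseconds; the C snippet and its numbers are purely illustrative):

  #include <stdio.h>

  /* DefaultDuration is stored in nanoseconds, so fps is just its
   * reciprocal. For 30000/1001 fps the exact per-frame duration is
   * 33366666.67 ns, which has to be rounded when written, so the fps
   * recovered from it is only an approximation. */
  int main(void) {
    const double default_duration_ns = 33366667.0; /* rounded 1001/30000 s */
    printf("fps ~= %f\n", 1e9 / default_duration_ns); /* ~29.97 */
    return 0;
  }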


--

John Koleszar

unread,
Jul 16, 2011, 3:31:25 PM7/16/11
to webm-d...@webmproject.org
Just a couple additional points wrt VP8:

On Sat, Jul 16, 2011 at 4:41 AM, Steve Lhomme <slh...@matroska.org> wrote:
> On Fri, Jul 15, 2011 at 8:00 PM, Frank Galligan <fgal...@google.com> wrote:
>> On Thu, Jul 14, 2011 at 12:32 PM, Steve Lhomme <slh...@matroska.org> wrote:

[...]


>
> Matroska stores Presentation TimeStamps (PTS) and no Decoding
> TimeStamps (DTS). And the frames are in coding order (ie DTS always
> increasing, but not necessarily PTS). It seems very odd that 2 frames
> should be rendered at the same time on the same data pipeline. It may
> not be the case in the few programs you tested, but I'm not sure you
> are planning to check with all multimedia framework makers to make
> sure such a rule doesn't break their internal working. Also because
> the PTS can be decreasing in some case (when B frames are involved),
> that means there has to be a reordering in the pipeline, based on the
> timecode. Reordering 2 frames that have the same timecode could lead
> to unexpected behaviours.

The reordering happens in the decoder. Any framework that supports
b-frames has to support passing a frame to the decoder but not getting
one back, or passing one and getting multiple back. We rely on the
first behavior in VP8.
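As a minimal sketch of that behavior against the public libvpx decoder API (initialisation with vpx_codec_dec_init(&codec, vpx_codec_vp8_dx(), NULL, 0) and most error handling are omitted; frame_buf/frame_size are placeholders for whatever the demuxer hands you):

  #include <stddef.h>
  #include <stdint.h>
  #include "vpx/vpx_decoder.h"
  #include "vpx/vp8dx.h"

  /* Decode one compressed frame and drain whatever images come out:
   * zero for an invisible alt-ref, one for a normal frame. */
  static void decode_one(vpx_codec_ctx_t *codec,
                         const uint8_t *frame_buf, size_t frame_size) {
    if (vpx_codec_decode(codec, frame_buf, (unsigned int)frame_size, NULL, 0))
      return; /* decode error */

    vpx_codec_iter_t iter = NULL;
    vpx_image_t *img;
    while ((img = vpx_codec_get_frame(codec, &iter)) != NULL) {
      /* hand img to the renderer; this body never runs for an alt-ref,
       * which only updates the decoder's reference buffers */
    }
  }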

>
> The original BlockGroup+Block+BlockReference(s) was designed so that a
> program that only knows about the container (not the codec internals)
> can edit a file and keep all the frames that are needed to display the
> desired edit. That's where the invisible flag was introduced, so that
> reference frames that are outside of the edit are still part of the
> file, but not displayed.
>
> Imagine this example of video frames :
> A(k) B(alt) C(-1) D(-2)
> (k: keyframe, alt: alt-ref frame, -x: references the frame x positions behind)
>
> If you cut at D, it has a reference to B. With BlockGroup the
> BlockReference would be the timecode of B. But if the timecodes of B
> and C are the same, which one are you supposed to keep? That's why
> it's better to avoid having frames with the same timecode.
>

In VP8, D references A, B, and C, unless you go out of your way to
avoid that, as you might for error resiliency. There is a quality
penalty for doing that though, so any time you're using a reliable
transport like local disk or HTTP, you would leave that feature off.
It's off by default in libvpx.

In a little more detail, each macroblock in VP8 can be reconstructed
from one of 4 frames (the current frame, and one of 3 reference
buffers). In addition, it has an entropy model which can receive
partial updates on every frame, and they persist from frame to frame.
You can get information about which frames are referenced and whether
a frame is droppable at encode time, but it's expensive on the decode
side.
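For what it's worth, the encode-time information is exposed through the per-packet flags, so a muxer built on libvpx can tell key/invisible/droppable frames apart without parsing the VP8 payload itself. A rough sketch (encoder setup omitted; this only assumes the standard vpx_encoder.h flag names):

  #include "vpx/vpx_encoder.h"
  #include "vpx/vp8cx.h"

  /* After each vpx_codec_encode() call, drain the compressed packets and
   * look at the flags the encoder reports for each frame packet. */
  static void drain_packets(vpx_codec_ctx_t *codec) {
    vpx_codec_iter_t iter = NULL;
    const vpx_codec_cx_pkt_t *pkt;
    while ((pkt = vpx_codec_get_cx_data(codec, &iter)) != NULL) {
      if (pkt->kind != VPX_CODEC_CX_FRAME_PKT)
        continue;
      const vpx_codec_frame_flags_t f = pkt->data.frame.flags;
      const int is_key       = (f & VPX_FRAME_IS_KEY) != 0;
      const int is_invisible = (f & VPX_FRAME_IS_INVISIBLE) != 0;
      const int is_droppable = (f & VPX_FRAME_IS_DROPPABLE) != 0;
      /* pkt->data.frame.buf/.sz is the payload; .pts and .duration are in
       * the encoder timebase. How an invisible frame is mapped onto a
       * Block is the muxer's decision. */
      (void)is_key; (void)is_invisible; (void)is_droppable;
    }
  }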

> Of course when using only SimpleBlock you lose that editing
> possibility. But does it mean VP8 will always be prevented from being
> used in non linear editors ?
>

Frame-accurate cutting is not easy with VP8 if you don't encode
with this use in mind, regardless of how it's muxed.

> Plus a alt-frame doesn't have a PTS, only a range for the DTS. If it
> has no PTS there's no reason to have a fake one, it only creates
> problems. Plus it also breaks the rule of one frame in/one frame out
> that most frameworks use.
>

It only breaks the 1i1o rule end to end, which doesn't affect any
framework we know about. At the decoder, 1i1o is not expected.

> Now in the light of this, frame packing would not be ideal either. If
> you want D, you'd need B+C. But then B+C would be marked as invisible.
> You only lose some decoding CPU decoding C which is never used.
>

It's possible to skip the bulk of the decoding of C if you told the
decoder you weren't going to display it. Not supported in libvpx
today, but I think ffvp8 supports this.

[...]

>>> There's one last option: frame packing. The VP8 encoder/decoder is
>>> responsible for (un)packing the AF and FF frames together in one
>>> buffer (one Frame as seen by Matroska). Just like the MPEG 4.2
>>> B-frames were done in AVI.
>>
>> Ahh yes. I agree this would be the best option, but we cannot do this option
>> without breaking the bitstream (which we cannot do), or defining a new codec
>> ID with frame packed header ala VP6 vs VP6F. For the next generation I have
>> made sure that the codec guys will support frame packing, but that does not
>> help us today.
>
> There's not a bit left to handle this ? That would be pretty easy to
> handle the extra check on your decoder library.
>

There are multiple decoders and device drivers that all would have to
be updated. Ignoring the problems of rollout and existing content,
there are a few ways this could be done, but it's fairly tricky to do
and still support frame parallel decoding due to the way the VP8
payload is defined. In any case, it boils down to putting a container
within a container. There are also use cases I can imagine for having
runs of N invisible frames, which doesn't fit well with a packed
format at all. Bottom line, I think we can do better next time, but
it's not worth piling on hacks when the current practice of using
SimpleBlocks and either monotonically increasing or synthetic
timestamps works and is reasonably intuitive, imo. The other methods
presented seem like more work with no benefit to me.
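Purely as an illustration of the "synthetic timestamps" option (this is an assumption about one possible scheme, not a description of what vpxenc or any existing muxer does): if two packets come out with the same PTS, the muxer can bump the second one by a single timecode tick so Block timecodes stay strictly increasing.

  #include <stdint.h>

  /* Hypothetical helper: keep Block timecodes strictly increasing by
   * bumping any repeated timecode by one tick. last_ticks should start
   * below the first valid timecode (e.g. -1). */
  static int64_t next_block_timecode(int64_t pts_ticks, int64_t *last_ticks) {
    int64_t tc = pts_ticks;
    if (tc <= *last_ticks)
      tc = *last_ticks + 1; /* synthetic, monotonically increasing */
    *last_ticks = tc;
    return tc;
  }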

[...]

>> Truthfully I don't fully understand why you would need to set a
>> DefaultDuration on a video stream in a timestamp based container. If the
>> framerate is non-integral it seems like they are going to be at odds with
>> each other.
>
> It can be useful to recover the start timecode in laces (thus it
> should not be used with Vorbis which doesn't have a fixed framerate).
> It is also used to tell the fps of a track for video.
>

It's not any better for determining fps than the FrameRate parameter
is (ie, informational). It doesn't have enough resolution at most time
code scales.

Steve Lhomme

unread,
Aug 14, 2011, 9:36:51 AM8/14/11
to webm-d...@webmproject.org
Sorry for taking so long to answer but I've been busy on various other fronts...

From the discussion it seems the crucial part is that demuxers don't
actually reorder frames on their own. So having 2 frames with the same
timecode may not be such a big deal after all. It should just be
defined that, in the case of "raw editing", both frames should be kept
(and in that order) because of the possible mismatch with a reference.

Does everyone agree with this?

Steve

Frank Galligan

unread,
Aug 15, 2011, 10:36:40 AM8/15/11
to webm-d...@webmproject.org
I'm fine with this. I will try and create some text that will be added to the WebM spec and post it here to see if everyone is okay with it.

Side note: The first file I put up that had BlockGroups with a ReferenceBlock was incorrect. I had the ReferenceBlock as absolute when it should have been relative. I fixed the file. You can read more info if you're interested here: http://code.google.com/p/webm/issues/detail?id=351

Frank

Frank Galligan

unread,
Aug 15, 2011, 12:12:42 PM8/15/11
to webm-d...@webmproject.org
What about adding this to the WebM spec?

#### Monotonically Increasing Frames

The muxer MUST NOT re-order frames with the same timestamp. The muxer MUST output the frames in the order they are given.

The demuxer MUST NOT re-order frames with the same timestamp. The demuxer MUST send the frames in the order they are read from the WebM container.

Note in some WebM files that are using BlockGroups/Blocks with ReferenceBlock elements there could be a case that a ReferenceBlock is referencing more than one Block with the same timestamp. In this case the muxer/demuxer may not know which of the Blocks is referenced and MUST treat the ReferenceBlock as referencing all of the Blocks with the same timestamp.

James Zern

unread,
Aug 16, 2011, 9:02:28 PM8/16/11
to webm-d...@webmproject.org
On Mon, Aug 15, 2011 at 09:12, Frank Galligan <fgal...@google.com> wrote:
> What about adding this to the WebM spec?
> #### Monotonically Increasing Frames
> The muxer MUST NOT re-order frames with the same timestamp. The muxer MUST
> output the frames in the order they are given.
> The demuxer MUST NOT re-order frames with the same timestamp. The demuxer
> MUST send the frames in the order they are read from the WebM container.
>
These two sound alright.

> Note in some WebM files that are using BlockGroups/Blocks with
> ReferenceBlock elements there could be a case that a ReferenceBlock is
> referencing more than one Block with the same timestamp. In this case the
> muxer/demuxer may not know which of the Blocks is referenced and MUST treat
> the ReferenceBlock as referencing all of the Blocks with the same timestamp.
>

I think this could be worded more simply. Maybe something like:
If the timecode stored in a ReferenceBlock is shared by multiple
Blocks, the muxer/demuxer MUST treat all the Blocks as referenced.
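To make the rule concrete, a hedged sketch of what it means on the demuxer side (the struct below is hypothetical, not a libwebm type): when resolving a ReferenceBlock, collect every Block in the cluster that carries the referenced timecode instead of assuming a single match.

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical demuxer-side view of the Blocks in a cluster. */
  struct block_info {
    int64_t timecode;  /* absolute timecode of the Block */
    int referenced;    /* set if a ReferenceBlock points at it */
  };

  /* Every Block sharing the referenced timecode is treated as referenced.
   * Returns the number of Blocks that matched. */
  static size_t mark_referenced(struct block_info *blocks, size_t count,
                                int64_t reference_timecode) {
    size_t matched = 0;
    for (size_t i = 0; i < count; ++i) {
      if (blocks[i].timecode == reference_timecode) {
        blocks[i].referenced = 1;
        ++matched;
      }
    }
    return matched;
  }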

Steve Lhomme

unread,
Aug 17, 2011, 11:26:43 AM8/17/11
to webm-d...@webmproject.org
Whichever way it's worded, I agree it needs to be there in text
somewhere. I'll add the same thing to the Matroska specs.

Oleksij Rempel

unread,
Mar 1, 2016, 12:40:11 PM3/1/16
to 16.jace...@pccs.k12.id.us, WebM Discussion
It is used only for debugging. Since all frames are decoded by libvpx anyway, you only need to feed the library the frames in the right order.
The reason I asked about it was a debugging issue in GStreamer where an invisible frame was passed in the wrong order because it shares the same timestamp as the visible frame.

2016-02-29 3:51 GMT+01:00 <16.jace...@pccs.k12.id.us>:


On Monday, July 4, 2011 at 2:28:27 AM UTC-5, fishor wrote:
Hallo all,

i see vpxenc set 0x8 flag on each matroska block in case of invisible
frame.
Is it used for debug purpose? Or there any documentation hot to use in
for decoding?

Regards,
 Alexey
