Muxing audio

fnordware

Aug 5, 2014, 8:51:50 PM
to apps-...@webmproject.org
Just a few questions about muxing audio in Matroska with libwebm:

1. Does it matter if the audio "frames" are marked as keyframes or not?

2. What is the preferred method of interleaving audio? Should I aim for audio blocks to accompany each video frame, or write bigger audio chunks so there's only an audio block every few frames?

3. I notice that libwebm is letting me write multiple audio blocks with the same timecode. Is this allowed in Matroska? Recommended?

Frank Galligan

Aug 6, 2014, 2:29:43 PM
to apps-...@webmproject.org
1. Audio should be marked as keyframes.

2. If your audio block rate is greater than your video frame rate, then it is best for audio blocks to accompany each video frame. Here are some guidelines on muxing audio and video. [1]

3. This is allowed, as blocks MUST be written in monotonically increasing (that is, non-decreasing) timestamp order. But I don't understand why you would have audio blocks with the same timestamp. We relaxed the timestamp restriction in Matroska for VP8 altref frames, not for audio.
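The ordering rule in point 3 can be illustrated with a small checker. This is a minimal sketch in plain C++ with no libwebm involved. The `Block` struct and `IsNonDecreasing()` are hypothetical names, not part of any library. The checker accepts a write order in which no time code is smaller than the one before it, so repeated time codes pass and a step backwards fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one block, in the order it was written to a cluster.
struct Block {
  std::uint64_t track;       // 1 = video, 2 = audio in the dumps in this thread
  std::int64_t timecode_ms;  // block time code in milliseconds
};

// Returns true if no block's time code is smaller than the one written
// before it. Equal time codes are accepted: the rule is non-decreasing,
// not strictly increasing.
bool IsNonDecreasing(const std::vector<Block>& blocks) {
  for (std::size_t i = 1; i < blocks.size(); ++i) {
    if (blocks[i].timecode_ms < blocks[i - 1].timecode_ms) {
      return false;
    }
  }
  return true;
}
```

For example, the write order "audio @ 0, video @ 0, audio @ 42, video @ 42" passes. "Video @ 42" followed by "audio @ 0" does not.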




--
You received this message because you are subscribed to the Google Groups "Application Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to apps-devel+...@webmproject.org.
To post to this group, send email to apps-...@webmproject.org.
Visit this group at http://groups.google.com/a/webmproject.org/group/apps-devel/.
For more options, visit https://groups.google.com/a/webmproject.org/d/optout.

fnordware

Aug 8, 2014, 12:02:16 AM
to apps-...@webmproject.org
On Wednesday, August 6, 2014 11:29:43 AM UTC-7, Frank Galligan wrote:
> 1. Audio should be marked as keyframes.
>
> 2. If your audio block rate is greater than your video frame rate, then it is best for audio blocks to accompany each video frame. Here are some guidelines on muxing audio and video. [1]
>
> 3. This is allowed, as blocks MUST be written in monotonically increasing (that is, non-decreasing) timestamp order. But I don't understand why you would have audio blocks with the same timestamp. We relaxed the timestamp restriction in Matroska for VP8 altref frames, not for audio.


You can see what I'm doing with libwebm here:


I supply Vorbis with audio and it emits packets. As long as each packet falls within the current frame, I write it using Segment::AddFrame(). The timecode is the same for all these packets, as it must be if I'm going to write a video frame with that timecode right after the audio. But when I do this, the result looks like this in webminspector.py:

Cluster (head size: 12 bytes, data: 104549 bytes, pos: 4664, '0x1238')
  TimeCode (head size: 2 bytes, data: 1 bytes, pos: 4676, '0x1244') : 0
  SimpleBlock (head size: 2 bytes, data: 62 bytes, pos: 4679, '0x1247') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 362 bytes, pos: 4743, '0x1287') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 327 bytes, pos: 5108, '0x13f4') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 243 bytes, pos: 5438, '0x153e') : '[VP8] key: Yes, ver: 0, sf: Yes, pl: 217'
    track number : 1, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 331 bytes, pos: 5684, '0x1634') : 'binary'
    track number : 2, keyframe : False, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 42, time code(absolute) : 42
  SimpleBlock (head size: 3 bytes, data: 344 bytes, pos: 6018, '0x1782') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 42, time code(absolute) : 42
  SimpleBlock (head size: 2 bytes, data: 34 bytes, pos: 6365, '0x18dd') : '[VP8] key: No, ver: 0, sf: Yes, pl: 26'
    track number : 1, keyframe : False, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 42, time code(absolute) : 42


Tom Finegan

Aug 8, 2014, 1:59:21 PM
to apps-...@webmproject.org
The granulepos of each ogg packet returned by vorbis_bitrate_flushpacket()[1] is the time (in samples) for that packet. I don't think your audio timestamps are accurate as you currently have things implemented. 

[1] For each set of successful sequential calls to vorbis_analysis_blockout(), vorbis_analysis(), vorbis_bitrate_add_block().
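Tom's point amounts to converting a sample position into the timestamp handed to the muxer. This is a minimal sketch, assuming the timestamp is wanted in nanoseconds; to my understanding that is the unit of libwebm's Segment::AddFrame() timestamp argument, but check the header of the libwebm version in use. The function name is hypothetical. Whole seconds and leftover samples are scaled separately, so that multiplying by 10^9 cannot overflow 64 bits on long recordings. Whether a packet's granulepos marks the start or the end of its samples is worth checking against the Vorbis specification before using it as a block's start time.

```cpp
#include <cassert>
#include <cstdint>

// Converts a sample count (for example an ogg_packet's granulepos) into
// nanoseconds. Scaling whole seconds and the leftover samples separately
// keeps samples * 1e9 from overflowing 64 bits on long recordings.
std::int64_t SamplesToNanoseconds(std::int64_t samples, std::int64_t sample_rate) {
  assert(samples >= 0);      // a negative granulepos means "unknown"
  assert(sample_rate > 0);
  const std::int64_t kNsPerSecond = 1000000000;
  const std::int64_t whole_seconds = samples / sample_rate;
  const std::int64_t leftover = samples % sample_rate;
  return whole_seconds * kNsPerSecond + leftover * kNsPerSecond / sample_rate;
}
```

At 48 kHz, a granulepos of 1024 maps to 21333333 ns (about 21 ms), and 48000 maps to exactly one second.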

Brendan Bolles

Aug 8, 2014, 2:20:48 PM
to apps-...@webmproject.org
On Aug 8, 2014, at 10:59 AM, 'Tom Finegan' via Application Developers wrote:

> The granulepos of each ogg packet returned by vorbis_bitrate_flushpacket()[1] is the time (in samples) for that packet. I don't think your audio timestamps are accurate as you currently have things implemented.
>
> [1] For each set of successful sequential calls to vorbis_analysis_blockout(), vorbis_analysis(), vorbis_bitrate_add_block().


I'm not sure that would matter for this issue anyway. The point is I'm calling muxer_segment->AddFrame() with a certain time stamp several times, once for each vorbis packet. I think libwebm is supposed to make a single block with a frame for each call, while right now it's creating a new block for each call.


If you can tell me specifically where you think my audio sample math is wrong, I'm all ears. It seems to sync up just fine in the movies I've made. Players don't seem to be balking at having multiple audio blocks with the same time stamp, but it seems non-optimal to me, especially for things like seeking.


Brendan

Tom Finegan

Aug 8, 2014, 5:03:02 PM
to apps-...@webmproject.org
On Fri, Aug 8, 2014 at 11:20 AM, Brendan Bolles <bre...@fnordware.com> wrote:
> On Aug 8, 2014, at 10:59 AM, 'Tom Finegan' via Application Developers wrote:
>
> > The granulepos of each ogg packet returned by vorbis_bitrate_flushpacket()[1] is the time (in samples) for that packet. I don't think your audio timestamps are accurate as you currently have things implemented.
> >
> > [1] For each set of successful sequential calls to vorbis_analysis_blockout(), vorbis_analysis(), vorbis_bitrate_add_block().
>
> I'm not sure that would matter for this issue anyway. The point is I'm calling muxer_segment->AddFrame() with a certain time stamp several times, once for each vorbis packet. I think libwebm is supposed to make a single block with a frame for each call, while right now it's creating a new block for each call.


libwebm will write what you give it, with the exceptions noted in the container guidelines for WebM muxers [1]. If you want to write multiple ogg packets in one frame:

1. Accumulate each packet returned by vorbis_bitrate_flushpacket() in a buffer.
2. Write them all at once using a timestamp calculated from the granulepos of the first packet you received from vorbis_bitrate_flushpacket().

Even with all your packets sharing the same timestamp, the timestamps are still, strictly speaking, monotonically "increasing" (non-decreasing). In other words, this is valid behavior.
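Tom's two steps can be sketched in C++. The `AudioPacket` and `PendingFrame` types are hypothetical stand-ins for ogg_packet and for whatever buffer is eventually passed to AddFrame(). Packets are appended to one buffer, and the buffer's timestamp comes from the first packet's granulepos, assuming nanoseconds as the target unit. A Vorbis packet does not carry its own length, so the sketch also keeps each packet's size: a decoder needs those boundaries to split the buffer back into packets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one packet from vorbis_bitrate_flushpacket():
// the packet bytes and its granulepos, in samples.
struct AudioPacket {
  std::vector<std::uint8_t> bytes;
  std::int64_t granulepos;
};

// Hypothetical buffer that will be handed to a single AddFrame() call.
struct PendingFrame {
  std::vector<std::uint8_t> data;         // packet bytes, concatenated in order
  std::vector<std::size_t> packet_sizes;  // one entry per accumulated packet
  std::int64_t first_granulepos = -1;     // -1 until a packet arrives
};

// Step 1: append a packet, remembering the granulepos of the first one.
void Accumulate(PendingFrame& frame, const AudioPacket& packet) {
  assert(packet.granulepos >= 0);
  if (frame.packet_sizes.empty()) {
    frame.first_granulepos = packet.granulepos;
  }
  frame.data.insert(frame.data.end(), packet.bytes.begin(), packet.bytes.end());
  frame.packet_sizes.push_back(packet.bytes.size());
}

// Step 2: timestamp in nanoseconds for the whole buffer, taken from the
// first packet's granulepos.
std::int64_t FrameTimestampNs(const PendingFrame& frame, std::int64_t sample_rate) {
  assert(!frame.packet_sizes.empty());
  assert(sample_rate > 0);
  const std::int64_t kNsPerSecond = 1000000000;
  const std::int64_t g = frame.first_granulepos;
  return (g / sample_rate) * kNsPerSecond +
         (g % sample_rate) * kNsPerSecond / sample_rate;
}
```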
 

> If you can tell me specifically where you think my audio sample math is wrong, I'm all ears. It seems to sync up just fine in the movies I've made. Players don't seem to be balking at having multiple audio blocks with the same time stamp, but it seems non-optimal to me, especially for things like seeking.


Am I looking in the wrong place? I see timeStamp assigned at line 1249 and then used with AddFrame() for the audio track at line 1362, without any adjustment based on the granulepos or on the sample rate as the time base. Sorry if I've confused myself and you!

If you seek within the output of your current code, is the A/V sync correct? I expected otherwise, but I've been wrong before. :)

On the other hand, if the player you're using for playback testing does A/V sync based on the audio playback clock, it may well be ignoring your timestamps (except for the first frame and after seeking).

 

> Brendan

Brendan Bolles

Aug 8, 2014, 7:11:50 PM
to apps-...@webmproject.org
On Aug 8, 2014, at 2:03 PM, 'Tom Finegan' via Application Developers wrote:

> 1. Accumulate each packet returned by vorbis_bitrate_flushpacket() in a buffer.
> 2. Write them all at once using a timestamp calculated from the granulepos of the first packet you received from vorbis_bitrate_flushpacket().
>
> Even with all your packets sharing the same timestamp, the timestamps are still, strictly speaking, monotonically "increasing" (non-decreasing). In other words, this is valid behavior.


This is what I'm doing, except that I encode the next packet of audio between writes instead of having them all lined up beforehand. But they all have the same timestamp, as you can see in the output from webminspector.py.

>
> Am I looking in the wrong place? I see timeStamp assigned at line 1249 and then used with AddFrame() for the audio track at line 1362, without any adjustment based on the granulepos or on the sample rate as the time base. Sorry if I've confused myself and you!


Yeah, that is right. And you'll notice that timeStamp is const, so I'm definitely writing each audio packet in a frame with the same timeStamp.

I use timeStamp and nextTimeStamp and the sample rate to figure out the granulepos range ("nextBlockAudioSample") I want to put in the file during the current frame. Then I get packets from vorbis, check the granulepos, and write them if they're in the range.
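The per-frame selection Brendan describes might look roughly like the sketch below. Only the names timeStamp, nextTimeStamp and nextBlockAudioSample come from the thread. The helper functions, the nanosecond unit for the video timestamps, and the half-open range test are assumptions about how such a check could be written. A packet that passes the test is still written with the frame's single timeStamp in the code under discussion, which is why those blocks share a time code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a video timestamp in nanoseconds to a sample
// position, scaling whole seconds and the remainder separately.
std::int64_t NanosecondsToSamples(std::int64_t ns, std::int64_t sample_rate) {
  assert(ns >= 0);
  assert(sample_rate > 0);
  const std::int64_t kNsPerSecond = 1000000000;
  return (ns / kNsPerSecond) * sample_rate +
         (ns % kNsPerSecond) * sample_rate / kNsPerSecond;
}

// True if a packet's granulepos falls in the sample range covered by the
// video frame [timeStamp, nextTimeStamp). The half-open range is an
// assumption, chosen so that every position lands in exactly one frame.
bool PacketBelongsToFrame(std::int64_t granulepos, std::int64_t timeStamp,
                          std::int64_t nextTimeStamp, std::int64_t sample_rate) {
  assert(nextTimeStamp >= timeStamp);
  const std::int64_t thisBlockAudioSample = NanosecondsToSamples(timeStamp, sample_rate);
  const std::int64_t nextBlockAudioSample = NanosecondsToSamples(nextTimeStamp, sample_rate);
  return granulepos >= thisBlockAudioSample && granulepos < nextBlockAudioSample;
}
```

At 24 fps and 48 kHz, the first frame spans 0 to 41666666 ns, which maps to samples 0 through 1998.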


Brendan

fnordware

Aug 8, 2014, 7:43:04 PM
to apps-...@webmproject.org
Here's an easier test that you'll be able to replicate.

I built the sample_muxer program that comes with libwebm. I had it process the Tears of Steel WebM file from the Xiph site.

The output looks a lot like mine in webminspector.py: groups of audio blocks with the same time code. The original file (made with FFmpeg) does not look like this.

Cluster (head size: 12 bytes, data: 39533 bytes, pos: 4281, '0x10b9')
  TimeCode (head size: 2 bytes, data: 1 bytes, pos: 4293, '0x10c5') : 0
  SimpleBlock (head size: 3 bytes, data: 2606 bytes, pos: 4296, '0x10c8') : '[VP8] key: Yes, ver: 0, sf: Yes, pl: 2588'
    track number : 1, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 2 bytes, data: 81 bytes, pos: 6905, '0x1af9') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 528 bytes, pos: 6988, '0x1b4c') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 436 bytes, pos: 7519, '0x1d5f') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 443 bytes, pos: 7958, '0x1f16') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 449 bytes, pos: 8404, '0x20d4') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 447 bytes, pos: 8856, '0x2298') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 448 bytes, pos: 9306, '0x245a') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 459 bytes, pos: 9757, '0x261d') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 0, time code(absolute) : 0
  SimpleBlock (head size: 3 bytes, data: 156 bytes, pos: 10219, '0x27eb') : '[VP8] key: No, ver: 0, sf: Yes, pl: 148'
    track number : 1, keyframe : False, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 42, time code(absolute) : 42
  SimpleBlock (head size: 3 bytes, data: 156 bytes, pos: 10378, '0x288a') : '[VP8] key: No, ver: 0, sf: Yes, pl: 148'
    track number : 1, keyframe : False, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 83, time code(absolute) : 83
  SimpleBlock (head size: 3 bytes, data: 156 bytes, pos: 10537, '0x2929') : '[VP8] key: No, ver: 0, sf: Yes, pl: 148'
    track number : 1, keyframe : False, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 125, time code(absolute) : 125
  SimpleBlock (head size: 3 bytes, data: 446 bytes, pos: 10696, '0x29c8') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
  SimpleBlock (head size: 3 bytes, data: 445 bytes, pos: 11145, '0x2b89') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
  SimpleBlock (head size: 3 bytes, data: 452 bytes, pos: 11593, '0x2d49') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
  SimpleBlock (head size: 3 bytes, data: 439 bytes, pos: 12048, '0x2f10') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
  SimpleBlock (head size: 3 bytes, data: 445 bytes, pos: 12490, '0x30ca') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
  SimpleBlock (head size: 3 bytes, data: 452 bytes, pos: 12938, '0x328a') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
  SimpleBlock (head size: 3 bytes, data: 440 bytes, pos: 13393, '0x3451') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
  SimpleBlock (head size: 3 bytes, data: 441 bytes, pos: 13836, '0x360c') : 'binary'
    track number : 2, keyframe : True, invisible : 'no', discardable : 'no'
    lace : 'no lacing', time code : 141, time code(absolute) : 141
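The pattern in a dump like this one can be spotted programmatically by grouping consecutive blocks of the same track that share a time code. This is a minimal sketch. The `Block` and `Run` types and `SharedTimecodeRuns()` are hypothetical, and the input is simply (track, time code) pairs transcribed by hand from webminspector.py output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical (track, time code) pair transcribed from an inspector dump.
struct Block {
  std::uint64_t track;
  std::int64_t timecode_ms;
};

// A run of consecutive blocks on one track with the same time code.
struct Run {
  std::uint64_t track;
  std::int64_t timecode_ms;
  std::size_t count;
};

// Groups consecutive blocks of the same track that share a time code and
// returns only the groups containing more than one block.
std::vector<Run> SharedTimecodeRuns(const std::vector<Block>& blocks) {
  std::vector<Run> runs;
  std::size_t i = 0;
  while (i < blocks.size()) {
    std::size_t j = i + 1;
    while (j < blocks.size() && blocks[j].track == blocks[i].track &&
           blocks[j].timecode_ms == blocks[i].timecode_ms) {
      ++j;
    }
    if (j - i > 1) {
      runs.push_back({blocks[i].track, blocks[i].timecode_ms, j - i});
    }
    i = j;
  }
  return runs;
}
```

The input here is the sequence in this dump: video at 0, eight audio blocks at 0, video at 42, 83 and 125, then eight audio blocks at 141. From it, the function reports two runs of eight audio blocks each, at 0 ms and at 141 ms.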





Tom Finegan

Aug 11, 2014, 2:17:59 PM
to apps-...@webmproject.org
On Fri, Aug 8, 2014 at 4:11 PM, Brendan Bolles <bre...@fnordware.com> wrote:
> On Aug 8, 2014, at 2:03 PM, 'Tom Finegan' via Application Developers wrote:
>
> > 1. Accumulate each packet returned by vorbis_bitrate_flushpacket() in a buffer.
> > 2. Write them all at once using a timestamp calculated from the granulepos of the first packet you received from vorbis_bitrate_flushpacket().
> >
> > Even with all your packets sharing the same timestamp, the timestamps are still, strictly speaking, monotonically "increasing" (non-decreasing). In other words, this is valid behavior.
>
> This is what I'm doing, except that I encode the next packet of audio between writes instead of having them all lined up beforehand. But they all have the same timestamp, as you can see in the output from webminspector.py.


The code in your repo appears to call AddFrame() once per ogg packet, using the same timestamp for each one. Unless you use the granulepos to derive the timestamp passed to libwebm, the timestamp will remain the same for all ogg packets passed to AddFrame() until you return to the top of your loop and calculate a new timestamp.

 
> >
> > Am I looking in the wrong place? I see timeStamp assigned at line 1249 and then used with AddFrame() for the audio track at line 1362, without any adjustment based on the granulepos or on the sample rate as the time base. Sorry if I've confused myself and you!
>
> Yeah, that is right. And you'll notice that timeStamp is const, so I'm definitely writing each audio packet in a frame with the same timeStamp.
>
> I use timeStamp and nextTimeStamp and the sample rate to figure out the granulepos range ("nextBlockAudioSample") I want to put in the file during the current frame. Then I get packets from vorbis, check the granulepos, and write them if they're in the range.


Right, and you use the timestamp calculated at the top of the loop for all N ogg_packets returned by vorbis_bitrate_flushpacket(). Those N ogg_packets become N SimpleBlocks, all with the same timestamp. The way to avoid multiple SimpleBlocks with the same timestamp is to use each ogg_packet's own timestamp, derived from its granulepos. Alternatively, accumulate the packets in a single buffer and pass it to AddFrame().
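The first option, giving each ogg_packet its own timestamp, can be sketched as follows. The function names are hypothetical, and the nanosecond output unit is an assumption about what AddFrame() expects. The input is the list of granulepos values flushed during one video frame, and the output is one timestamp per packet. Each value would go to its own AddFrame() call instead of all packets reusing the frame's timeStamp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Converts a sample position to nanoseconds without overflowing on long streams.
std::int64_t GranuleToNs(std::int64_t granulepos, std::int64_t sample_rate) {
  assert(granulepos >= 0);
  assert(sample_rate > 0);
  const std::int64_t kNsPerSecond = 1000000000;
  return (granulepos / sample_rate) * kNsPerSecond +
         (granulepos % sample_rate) * kNsPerSecond / sample_rate;
}

// One timestamp per flushed packet, instead of reusing the video frame's
// timeStamp for all of them.
std::vector<std::int64_t> PerPacketTimestamps(
    const std::vector<std::int64_t>& granuleposes, std::int64_t sample_rate) {
  std::vector<std::int64_t> timestamps;
  timestamps.reserve(granuleposes.size());
  for (std::int64_t g : granuleposes) {
    timestamps.push_back(GranuleToNs(g, sample_rate));
  }
  return timestamps;
}
```

At 48 kHz, packets with granulepos 0, 1024 and 2048 get 0, 21333333 and 42666666 ns, so the resulting SimpleBlocks no longer share a time code.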

Tom Finegan

Aug 11, 2014, 2:23:55 PM
to apps-...@webmproject.org
On Fri, Aug 8, 2014 at 4:43 PM, fnordware <fnor...@gmail.com> wrote:
> Here's an easier test that you'll be able to replicate.
>
> I built the sample_muxer program that comes with libwebm. I had it process the Tears of Steel WebM file from the Xiph site.
>
> The output looks a lot like mine in webminspector.py: groups of audio blocks with the same time code. The original file (made with FFmpeg) does not look like this.

The Tears of Steel file from the Xiph site is laced, and sample_muxer doesn't lace its output, so this is expected behavior. This is the first SimpleBlock from the original file (output from mkvinfo):

| + SimpleBlock (key, track number 2, 8 frame(s), timecode 0.000s = 00:00:00.000)
|  + Frame with size 77
|  + Frame with size 524
|  + Frame with size 432
|  + Frame with size 439
|  + Frame with size 445
|  + Frame with size 443
|  + Frame with size 444
|  + Frame with size 455
 
All of these frames will be added with time=0.

sample_muxer is a very simple example program: it remuxes the frames of the input file as they are read, with little consideration for the payload. The only check it makes on the content of the blocks it reads is for the presence of discard padding (because omitting it would break playback of the file when discard padding was present in the input).
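A laced block stores several frames under one time code while keeping their individual sizes. To illustrate, here is a sketch of the Xiph-style lace header that the Matroska specification defines for SimpleBlocks. It consists of one byte holding the frame count minus one, followed by the size of every frame except the last, each written as a run of 255s plus a final byte below 255. The mkvinfo output does not say whether this file uses Xiph, EBML or fixed-size lacing, so Xiph lacing is chosen here only as an example. The function name is hypothetical and is not a libwebm API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a Matroska Xiph-lacing header for the given frame sizes: one byte
// holding (number of frames - 1), then the size of every frame except the
// last, each coded as a run of 255s followed by one byte below 255.
// The last frame's size is implied by the block size, so it is not stored.
std::vector<std::uint8_t> XiphLaceHeader(const std::vector<std::uint64_t>& frame_sizes) {
  assert(!frame_sizes.empty());
  assert(frame_sizes.size() <= 256);  // the count must fit in one byte
  std::vector<std::uint8_t> header;
  header.push_back(static_cast<std::uint8_t>(frame_sizes.size() - 1));
  for (std::size_t i = 0; i + 1 < frame_sizes.size(); ++i) {
    std::uint64_t size = frame_sizes[i];
    while (size >= 255) {
      header.push_back(255);
      size -= 255;
    }
    header.push_back(static_cast<std::uint8_t>(size));
  }
  return header;
}
```

For the eight frame sizes listed above (77, 524, 432, 439, 445, 443, 444, 455), this yields a 15-byte header beginning 7, 77, 255, 255, 14.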

Brendan Bolles

Aug 11, 2014, 2:42:41 PM
to apps-...@webmproject.org
On Aug 11, 2014, at 11:23 AM, 'Tom Finegan' via Application Developers wrote:

> Tears of Steel from the Xiph site is laced, and sample_muxer doesn't lace output. This is expected behavior. This is the first simple block from the original file (output from mkvinfo):
>
> | + SimpleBlock (key, track number 2, 8 frame(s), timecode 0.000s = 00:00:00.000)
> | + Frame with size 77
> | + Frame with size 524
> | + Frame with size 432
> | + Frame with size 439
> | + Frame with size 445
> | + Frame with size 443
> | + Frame with size 444
> | + Frame with size 455
>
> All of these frames will be added with time=0



Right, and I had hoped that by calling AddFrame() multiple times with the same timeStamp, I'd get the same result. I guess not. There doesn't seem to be a way to do this with libwebm.


Brendan

Brendan Bolles

Aug 11, 2014, 2:50:15 PM
to apps-...@webmproject.org
On Aug 11, 2014, at 11:17 AM, 'Tom Finegan' via Application Developers wrote:

> The code in your repo appears to be calling AddFrame() once per ogg packet, and while doing so it uses the same timestamp for each ogg packet. Without reading the granulepos to use as the timestamp value passed to libwebm the timestamp is going to remain the same for all ogg packets passed to AddFrame until you return to the top of your loop and calculate a new timestamp.


OK, so I should be ticking up the timeStamp I'm using for writing audio based on the granulepos. I wasn't doing that because I was following the example I've seen in Tears of Steel and others.

Thanks for explaining this to me!


Brendan
