YouTube and WebM transcodes


Kevin Carle

Oct 11, 2010, 9:25:22 PM
to WebM Discussion
Update to an earlier thread: we finally have it so that any videos uploaded in WebM will be transcoded into WebM regardless of resolution.

Took longer than I originally anticipated but it's live now! Sorry for the delay. If this does _not_ work for you, let me know.

-Kevin

Bryan Donnovan

Oct 11, 2010, 11:38:48 PM
to WebM Discussion
If something is uploaded in WebM, why would it need to be transcoded
into WebM? Wouldn't this identity transcode degrade the quality?

Pascal Massimino

Oct 12, 2010, 12:59:06 AM
to webm-d...@webmproject.org
Bryan,

On Mon, Oct 11, 2010 at 8:38 PM, Bryan Donnovan <bryand...@gmail.com> wrote:
If something is uploaded in WebM, why would it need to be transcoded
into WebM?  Wouldn't this identity transcode degrade the quality?

The file needs to be transcoded anyway to make sure the result is adequate
for the streamers, storage, etc. For instance, if you upload a WebM at 380 Mb/s,
it certainly isn't something that can go through "as is". Resolution needs to be
bracketed too.
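The bracketing Pascal describes can be sketched as a clamp against an encoding ladder. The ladder values and function below are purely illustrative, not YouTube's actual targets:

```python
# Hypothetical sketch of why an identity transcode is still needed:
# incoming uploads are clamped ("bracketed") to a ladder of output
# targets. The ladder values here are illustrative, not YouTube's.

ENCODING_LADDER = [
    # (max_width, max_height, max_bitrate_kbps)
    (640, 360, 800),
    (854, 480, 1200),
    (1280, 720, 2500),
]

def bracket(width, height, bitrate_kbps):
    """Pick the largest ladder rung that does not exceed the source,
    clamping the bitrate to the rung's ceiling."""
    chosen = ENCODING_LADDER[0]
    for rung in ENCODING_LADDER:
        if rung[0] <= width and rung[1] <= height:
            chosen = rung
    w, h, cap = chosen
    return w, h, min(bitrate_kbps, cap)

# A 1080p WebM upload at an absurd 380000 kbps still comes out as a
# bounded stream:
print(bracket(1920, 1080, 380000))  # (1280, 720, 2500)
```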

Pascal

@Kevin: great news!


Matthew Heaney

Oct 12, 2010, 10:47:35 AM
to webm-d...@webmproject.org, Kevin Carle

There's a problem with the webm file here:

http://www.youtube.com/watch?v=cRdxXPV9GNQ

The problem is that the CuePoints are incorrectly formatted -- the
Block element points to the wrong block of the cluster.

Unfortunately, neither the MuxingApp nor the WritingApp elements of
that file have been populated, so its provenance is unknown.
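The kind of check that would catch this bug can be sketched against hypothetical pre-parsed structures; a real checker would have to parse the EBML Cues and Cluster elements, and all names here are made up for illustration:

```python
# A minimal sketch of a cue check: each cue should point at a block
# that actually sits at the cue's timecode and is a keyframe. The
# data layout is a hypothetical stand-in for a parsed WebM file.

def check_cues(cues, clusters):
    """cues: list of (timecode, cluster_index, block_index).
    clusters: list of lists of (timecode, is_keyframe) blocks.
    Returns a list of human-readable problems (empty == OK)."""
    problems = []
    for tc, ci, bi in cues:
        block_tc, is_key = clusters[ci][bi]
        if block_tc != tc:
            problems.append(f"cue {tc}: points at block with timecode {block_tc}")
        elif not is_key:
            problems.append(f"cue {tc}: target block is not a keyframe")
    return problems

clusters = [[(0, True), (40, False)], [(80, True), (120, False)]]
good = [(0, 0, 0), (80, 1, 0)]
bad = [(0, 0, 1)]          # points at the wrong block of the cluster
print(check_cues(good, clusters))  # []
print(check_cues(bad, clusters))   # one problem reported
```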

-Matt

Andy Berkheimer

Oct 12, 2010, 11:00:32 AM
to webm-d...@webmproject.org, Kevin Carle
Thanks Matt. This comes from our ffmpeg muxer, will take a look...

-Andy


Vladimir Pantelic

Oct 12, 2010, 11:06:38 AM
to webm-d...@webmproject.org

If you could point us to your FFmpeg tree and/or patches, FFmpeg could
also take a look...

Regards,

Vladimir

Steve Lhomme

Oct 12, 2010, 2:14:32 PM
to webm-discuss
In general when you develop something with WebM, especially a writer,
you should check that the files are valid. At least until it goes to
production. It might be something that you add to your encoder farm
output, not sure how you do things. But that would avoid putting
bogus files online.

There is a tool to verify the Matroska side of things in WebM files:
http://www.matroska.org/downloads/mkvalidator.html
It doesn't check bitstream issues though.

Steve


Matthew Heaney

Oct 12, 2010, 2:21:52 PM
to webm-d...@webmproject.org
On Tue, Oct 12, 2010 at 2:14 PM, Steve Lhomme <slh...@matroska.org> wrote:
>
> There is a tool to verify the Matroska side of things in WebM files:
> http://www.matroska.org/downloads/mkvalidator.html
> It doesn't check bitstream issues though.

But note that the last time I checked (back in July 2010), this
validation tool was yielding false positives. The tool says,
incorrectly, that a cluster must start with a keyframe, but the tool
does not conform to the WebM container standard, which requires
that the audio that goes with a video keyframe must appear in the same
cluster as the keyframe itself, and that the audio block(s) must
precede the video.

The WebM container standard is here:

http://www.webmproject.org/code/specs/container/

-Matt

Gregory Maxwell

Oct 12, 2010, 2:37:00 PM
to webm-d...@webmproject.org
On Tue, Oct 12, 2010 at 2:21 PM, Matthew Heaney
<matthew...@google.com> wrote:
> validation tool was yielding false positives of an error.  The tool
> says, incorrectly, that a cluster must start with a keyframe, but that
[snip]

Can you restate this? I think you're saying that the specification
does not require encoders to write bitstreams with clusters that begin
with a key-frame.

Muxers should treat all guidelines marked _should_ in this section as must.
[...]
* Key frames _should_ be placed at the beginning of clusters.
* Having key frames at the beginning of clusters should make seeking
faster and easier for the client.


I don't think it would be inappropriate for a validation tool to warn
on clusters which begin with non-keyframes. Am I misunderstanding
anything obvious?

Matthew Heaney

Oct 12, 2010, 2:46:46 PM
to webm-d...@webmproject.org
On Tue, Oct 12, 2010 at 2:37 PM, Gregory Maxwell <gmax...@gmail.com> wrote:
>
> Can you restate this?  I think you're saying that the specification
> does not require encoders to write bitstreams with clusters that begin
> with a key-frame.

The requirement is that if there's a keyframe, it should be near the
front of the cluster, and that the audio block that immediately
precedes it must be in the same cluster. This means that if your
container has both audio and video, then a cluster cannot begin with a
keyframe; it must begin with audio. Note also that there might be one
or more non-key video frames between the audio frame (that begins the
cluster) and the keyframe, so they would also precede the keyframe.


> Muxers should treat all guidelines marked _should_ in this section as must.
> [...]
> * Key frames _should_ be placed at the beginning of clusters.

Two other bullet points in that section state that:

* Audio blocks that contain the video key frame's timecode should be
in the same cluster as the video key frame block.

* Audio blocks that have same absolute timecode as video blocks
should be written before the video blocks.

It would not be correct to interpret "keyframe at beginning of
cluster" to mean "keyframe must be first block of cluster", since that
would violate the two rules above.
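Taken together, the guidelines amount to a simple sort rule: order blocks by timecode, breaking ties audio-first. A minimal sketch, with illustrative track labels:

```python
# A sketch of the interleaving rule described above: sort blocks by
# timecode, and on ties put audio before video. With both tracks
# present, the cluster then starts with audio even when it contains a
# video keyframe. Track types and labels here are illustrative.

AUDIO, VIDEO = 0, 1  # tie-break order: audio first

def mux_order(blocks):
    """blocks: list of (timecode, track_type, label)."""
    return sorted(blocks, key=lambda b: (b[0], b[1]))

cluster = [
    (100, VIDEO, "keyframe"),
    (100, AUDIO, "audio"),
    (140, VIDEO, "P-frame"),
    (120, AUDIO, "audio"),
]
ordered = mux_order(cluster)
print([label for _, _, label in ordered])
# ['audio', 'keyframe', 'audio', 'P-frame']
```

Note how the audio block at timecode 100 lands ahead of the keyframe it ties with, so the keyframe is near the front of the cluster but not first.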


>  * Having key frames at the beginning of clusters should make seeking
> faster and easier for the client.

The keyframe should be near the beginning of the cluster, and if
there's audio, then it cannot also be the first block in the cluster.

-Matt

Steve Lhomme

Oct 12, 2010, 2:48:40 PM
to webm-discuss
It does warn you if the first video frame in a cluster is not a
keyframe, not if the audio is not in front.

Matthew Heaney

Oct 12, 2010, 2:51:44 PM
to webm-d...@webmproject.org
On Tue, Oct 12, 2010 at 2:48 PM, Steve Lhomme <slh...@matroska.org> wrote:
> It does warn you if the first video frame in a cluster is not a
> keyframe, not if the audio is not in front.

But that is a useless warning, which is my point.

Danny Piccirillo

Oct 12, 2010, 3:47:05 PM
to WebM Discussion
Awesome, great to hear! Any chance videos uploaded in Ogg Theora could
also be made available in WebM automatically?

Steve Lhomme

Oct 13, 2010, 1:51:08 AM
to webm-discuss
It's far from useless: when seeking in a video the important factor is
where the keyframe you need is, not whether the audio is before or after
the keyframe. The real audio problem is if the matching audio for
the video is found in the previous cluster, which would suck. Luckily
in WebM there are no B-frames; otherwise it would be possible to have the
audio matching a video frame as you described, followed by an old B-frame
which doesn't have the corresponding audio in that Cluster.

I may add the audio check later specifically for WebM (because in
Matroska you can't tell if you'll have B-frames or not).

Steve

Vladimir Pantelic

Oct 13, 2010, 6:28:07 AM
to webm-d...@webmproject.org
Matthew Heaney wrote:
> On Tue, Oct 12, 2010 at 2:14 PM, Steve Lhomme<slh...@matroska.org> wrote:
>>
>> There is a tool to verify the Matroska side of things in WebM files:
>> http://www.matroska.org/downloads/mkvalidator.html
>> It doesn't check bitstream issues though.
>
> But note that the last time I checked (back in July 2010), this
> validation tool was yielding false positives of an error. The tool
> says, incorrectly, that a cluster must start with a keyframe, but that
> tool does not conform to the WebM container standard, which requires
> that the audio that goes with a video keyframe must appear on the same
> cluster as the keyframe itself, and that the audio block(s) must
> precede the the video.

could you explain the reasoning behind this requirement? Why does audio
need to be before the video?

Matthew Heaney

Oct 13, 2010, 10:46:37 AM
to webm-d...@webmproject.org
On Wed, Oct 13, 2010 at 6:28 AM, Vladimir Pantelic <vlad...@gmail.com> wrote:
>
> could you explain the reasoning behind this requirement? Why does audio
> need to be before the video?

When you seek, so that you're now pointing to a new cluster
(containing a keyframe), you want to have all the audio you need (for
that keyframe) in that same cluster, so you don't have to seek
backwards to the previous cluster simply to get a bit of audio.
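That seek path can be sketched roughly as follows; the cue structure is a hypothetical stand-in for parsed Cues:

```python
# Sketch of the seek path: jump to the cluster holding the last
# keyframe at or before the target time. Because the overlapping
# audio is guaranteed to be in that same cluster, nothing before it
# has to be re-read. Structures are hypothetical stand-ins for Cues.

def seek(cue_points, target_tc):
    """cue_points: sorted list of (keyframe_timecode, cluster_offset).
    Returns the cluster offset to start reading from."""
    best = cue_points[0]
    for tc, offset in cue_points:
        if tc <= target_tc:
            best = (tc, offset)
        else:
            break
    return best[1]

cues = [(0, 1000), (5000, 42000), (10000, 91000)]
print(seek(cues, 7500))  # 42000: the cluster with the keyframe at 5000
```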

When there's a tie, and blocks across tracks have the same timecode,
then the tie goes to audio, and so the audio block physically
precedes the video block. I forget the reasons for that requirement,
but I remember someone did specifically ask for it.

-Matt

Andy Shaules

Oct 13, 2010, 11:54:20 AM
to webm-d...@webmproject.org
Speaking of live encoding, most capture routines may provide a video frame
first, but if you look at the timestamps, the audio is almost always latent
by a second or two. Early audio is dropped and video is queued, until the
desired offset.

Speaking of playback, it takes longer to prepare a stream and render audio
than to bitblt a video frame, so in addition, audio is given control of the
system clock so that tiny adjustments can be made, ensuring smooth audio
playback.

Steve Lhomme

Oct 13, 2010, 2:38:44 PM
to webm-discuss
But unlike video, audio is always buffered at some point. So whether
it's exactly before or after the corresponding video frame in the
container doesn't matter much.

Matthew Heaney

Oct 13, 2010, 3:01:47 PM
to webm-d...@webmproject.org
On Wed, Oct 13, 2010 at 1:51 AM, Steve Lhomme <slh...@matroska.org> wrote:
> It's far from useless, when seeking in a video the important factor is
> where the keyframe you need is, not if the audio is before or after
> the keyframe. The audio problem is really if the matching audio with
> the video is found in the previous cluster, which would suck.

Please fix your mkv validator tool to stop issuing false positive
warnings for webm files. Thanks.

Andy Shaules

Oct 13, 2010, 3:05:31 PM
to webm-d...@webmproject.org
If audio is always buffered, and video is not, then audio should be before
video; otherwise you force the video to be buffered as well, in cases where
audio is latent in the container.

Steve Lhomme

Oct 13, 2010, 4:51:31 PM
to webm-discuss

What false positive? The audio in front of video? I explained in
this thread that audio doesn't matter as much as a keyframe, as long
as it's in the same cluster (and obviously not too far), because audio
needs caching. And since WebM started being discussed I've only heard
from the guy who heard someone say it would be nice if audio was
at the front. I'd like the actual explanation. I'd be surprised if any
container in this world has any such requirement at all; otherwise
playback is not guaranteed.

Now I never claimed mkvalidator is perfect. But it's not a tool to
prove a Matroska/WebM file is correct. It's a tool to find out if it's
incorrect. Maybe the word 'validator' is misleading, but that's the
idea.

Steve Lhomme

Oct 13, 2010, 5:01:24 PM
to webm-discuss
This is not how it works. I don't think any sane person would decode
the audio and the video in the order they come in the file in the same
thread. Each track is handled independently and then synchronized at
the end. And the video is always waiting for the audio to reach the
right timecode to be actually displayed, at which point the next video
frame can start to be decoded and hopefully finish before the audio is
done with its chunk of decoded data. None of that is instantaneous, and
since you don't want audio glitches you always have to have some audio
decoded ahead of what's playing; same thing for video.

So your proposition would imply that a lot of audio needs to come
before the matching video frame in case it's blocking for a while and
the audio would be starving? What duration would be needed to go
ahead of the video then? The duration of one video frame would be the
minimum, but it could be more if the video decoder is too slow and you
are not reading ahead in your file/stream before the data read has
been fully decoded and rendered.

Steve Lhomme

Oct 13, 2010, 5:07:40 PM
to webm-discuss
On Wed, Oct 13, 2010 at 10:56 PM, Andy Shaules <bowl...@gmail.com> wrote:
> Containers:
>
> nsv, flv, avi,
>
> Try them otherwise to the detriment of av sync.
>
> unsigned integer values for avi/nsv offset represent the amount of time in
> milliseconds audio is currently ahead.

And this is solving what problem? Certainly not the a/v sync. In the
case of AVI there are no timecodes for audio. This is actually the
first reason that led me to create Matroska. Try to mux VBR audio in
AVI and get good sync with that. I don't know much about NSV or FLV.


Gregory Maxwell

Oct 13, 2010, 5:08:30 PM
to webm-d...@webmproject.org
On Wed, Oct 13, 2010 at 4:51 PM, Steve Lhomme <slh...@matroska.org> wrote:
> On Wed, Oct 13, 2010 at 9:01 PM, Matthew Heaney
> <matthew...@google.com> wrote:
>> On Wed, Oct 13, 2010 at 1:51 AM, Steve Lhomme <slh...@matroska.org> wrote:
>>> It's far from useless, when seeking in a video the important factor is
>>> where the keyframe you need is, not if the audio is before or after
>>> the keyframe. The audio problem is really if the matching audio with
>>> the video is found in the previous cluster, which would suck.
>>
>> Please fix your mkv validator tool to stop issuing false positive
>> warnings for webm files.  Thanks.
>
> What false positive ? The audio in front of video ? I explained in
> this thread that audio doesn't matter as much as a keyframe, as long
> as it's in the same cluster (and obviously not too far). Because audio
> needs caching. And since WebM was being discussed I've only heard
> about the guy who heard someone who said it would be nice if audio was
> at the front. I'd like the actual explanation. I'd be surprised if any
> container in this world has any such requirement at all otherwise
> playback is not guaranteed.

There are three interacting requirements here (remember: should is
must for a muxer/validator):

* Key frames should be placed at the beginning of clusters.


* Audio blocks that contain the video key frame's timecode should be
in the same cluster as the video key frame block.
* Audio blocks that have same absolute timecode as video blocks should
be written before the video blocks.

The argument here, I believe, is that the audio must come before the
key frames it overlaps and must be in the same cluster. Because audio
frame durations do not perfectly align with video frames (and, in
fact, are often longer), it may not be possible to put a key frame at
the start of the cluster while obeying the rest of the requirements.

E.g. with an audio frame covering time 10-12, a P-frame at time 10, and an
I-frame at time 11, you could not start the cluster with a keyframe while
obeying the audio requirements. The keyframe forces a particular audio
block, which then forces also bringing along the P-frame.
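Gregory's numbers can be worked through mechanically. This sketch just applies the two audio rules to his example timecodes:

```python
# Gregory's example, worked through: an audio frame spanning time 10-12,
# a P-frame at 10, and an I-frame at 11. All numbers come from his post;
# the code simply applies the two audio rules mechanically.

audio_start, audio_end = 10, 12
frames = [(10, "P-frame"), (11, "I-frame")]  # (timecode, label)
keyframe_tc = 11

cluster = []
# Rule 1: the audio block containing the keyframe's timecode must be
# in the keyframe's cluster, and (Rule 2) it is written before the video.
if audio_start <= keyframe_tc < audio_end:
    cluster.append("audio")
# The audio block starts at 10, so the P-frame at 10 is dragged into the
# same cluster ahead of the I-frame: the keyframe cannot be first.
for tc, label in frames:
    if tc >= audio_start:
        cluster.append(label)

print(cluster)  # ['audio', 'P-frame', 'I-frame']
```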

Steve Lhomme

Oct 13, 2010, 5:16:39 PM
to webm-discuss
On Wed, Oct 13, 2010 at 11:01 PM, Steve Lhomme <slh...@matroska.org> wrote:
> This is not how it works. I don't think any sane person would decode
> the audio and the video in the order they come in the file in the same
> thread. Each track is handled independently and then synchronized at
> the end. And the video is always waiting for the audio to reach the
> right timecode to be actually displayed. At which point the next video
> frame can start to be decoded and hopefully finish before the audio is
> done with its chunk of decoded data. None of that is instantaneous and
> since you don't want audio glitches you always have to have some audio
> decoded ahead of what's playing, same thing for video.
>
> So your proposition would imply that a lot of audio needs to come
> before the matching video frame in case it's blocking for a while and
> the audio would be starving ? What duration would be needed to go
> ahead of the video then ? The duration of one video frame would be the
> minimum, but it could be more if the video decoder is too slow and you
> are not reading ahead in your file/stream before the data read has
> been fully decoded and rendered.

...And such a system/design would fail as soon as it encounters a
frame that takes too long to decode (audio starving, video played out
of sync). That's what happens in a system that can't cache coded
and/or decoded data independently for each track it's playing. It's
also subject to terrible muxing, which is currently the norm for WebM
creation (I doubt the muxers always put at least the duration of a
video frame of audio in front of a video frame).

Now there's something else I don't understand about this concept.
Suppose you have an audio frame and then the video frame with the same
timecode. Does that mean you decode the audio and start playing it
before the video is actually decoded and rendered? Or do you wait until
*both* are decoded to render them? In the latter case (obviously the
correct one) it doesn't matter which decoder finished first. You need
them *both* before you can render.

Andy Shaules

Oct 13, 2010, 4:56:59 PM
to webm-d...@webmproject.org
Containers:

NSV, FLV, AVI.

Try them otherwise, to the detriment of A/V sync.

Unsigned integer values for the AVI/NSV offset represent the amount of time in
milliseconds that audio is currently ahead.


Steve Lhomme

Oct 13, 2010, 5:21:58 PM
to webm-discuss
I agree that audio should always be in the same cluster as the video
frame whose timecode is inside the start-end timecodes of that audio
frame. I should add such a check in mkvalidator, although I fear most
files I know will be rejected. But that's certainly a goal for all
muxers.
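The proposed check might look like this sketch; the data layout is a hypothetical stand-in for the parsed file, not mkvalidator's actual internals:

```python
# A sketch of the proposed check: for each video block, the audio
# frame whose [start, end) interval contains the video timecode must
# live in the same cluster. Data layout is a hypothetical stand-in
# for a parsed Matroska/WebM file.

def check_av_pairing(clusters):
    """clusters: list of dicts with 'audio': [(start, end)] and
    'video': [timecode]. Returns offending (cluster, timecode) pairs."""
    bad = []
    for i, cl in enumerate(clusters):
        for tc in cl["video"]:
            covered = any(s <= tc < e for s, e in cl["audio"])
            if not covered:
                bad.append((i, tc))
    return bad

ok = [{"audio": [(0, 30), (30, 60)], "video": [10, 40]}]
broken = [{"audio": [(30, 60)], "video": [10, 40]}]  # 10 has no audio here
print(check_av_pairing(ok))      # []
print(check_av_pairing(broken))  # [(0, 10)]
```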

Andy Shaules

Oct 13, 2010, 5:25:34 PM
to webm-d...@webmproject.org
The entire Windows DirectShow multimedia platform does run on a single
stream thread, believe it or not. Although an individual filter can spawn
child threads, the filtergraph stream exists as one thread.

By definition of FLV tags, their order is their order in time.

Think of it this way: since audio samples determine the stream clock time in
almost all cases, don't you think they should come first? The entire
duration of a segment would be known.

In the end you are talking about chunks of time. The audio spans the entire
time of the cluster. The video does not. If you have ever written your own
muxer from scratch, you will find that audio and video samples never arrive
at the mux process in any predetermined order or latency. How do YOU solve
the problem at the point of encoding? Sure, as a decoder, you could spawn
any number of cache threads, but there is more to A/V than just playback. In
muxing, audio arrives second to the muxer, and it arrives with earlier
timestamps than the video. It takes longer to render audio also... Seems to
me a no-brainer to ask for audio ahead of video, just the same as other
modern and legacy containers do.

Andy Shaules

Oct 13, 2010, 5:43:28 PM
to webm-d...@webmproject.org
Thanks Gregory for making it clear.

Steve, you will probably find that most muxers are fighting audio that is
stamped too far ahead of the video before the reverse is true. The audio
stream is always late to the encoder and arrives with early timestamps, so
video frames must wait for their corresponding audio event.


Andy Shaules

Oct 13, 2010, 5:48:53 PM
to webm-d...@webmproject.org
I know it sounds counterintuitive, but I will receive a video frame stamped by
the camera at about 3 seconds stream time, and then receive an audio segment
that is one half second in duration, with a start time of 1 second stream-time.

We muxers must hold the video as long as we can and finally release it when
the audio catches up to, say, about 100 milliseconds prior to the very first
video frame I have. All audio prior to then is dumped. There can be up to
three seconds of audio at a stream time earlier than the very first video
frame.


Steve Lhomme

Oct 13, 2010, 5:53:12 PM
to webm-discuss
On Wed, Oct 13, 2010 at 11:25 PM, Andy Shaules <bowl...@gmail.com> wrote:
> The entire windows directshow multimedia platform does run on a single
> stream thread.

Yeah, sorry, I meant more like handled independently and then
synchronized at the end, much like multi-threaded tasks. On a single CPU it
can be threads or an internal scheduler/sequencer that just handles
the synchronisation between 2 processes that have their own
cache/queue.

> believe it or not. Although the individual filter can spawn children
> threads, the filtergraph stream exists as one thread.
>
> By definition of flv tags, their order is thier order in time.
>
>
> Think of it this way. Since audio samples determine the stream clock time in
> almost all cases, don't you think they should come first? the entire
> duration of a segment would be known.

Well, like in my example of audio and video having the same timecode,
they need to play at the same time and thus wait for each other. Which
one will take more time to decode? The video one (most likely), so
maybe it should come first if you follow that logic? In the end I
think it's a non-issue as long as you realise there has to be caching
somewhere to have proper synchronisation. After that, proper muxing is
mostly keeping similar timecodes together (and obviously in the right
order).

> In the end you are talking about chunks of time. The audio spans the entire
> time of the cluster. The video does not. If you have ever written your own
> muxer from scratch, you will find that audio and video samples never arrive
> to the mux process in any predetermined order or latency. How do YOU solve
> the problem at the point of encoding? Sure, as a decoder, you could spawn
> any number of cache threads, but there is more to A/V than just playback. In
> muxing, audio arrives second to the muxer, and it arrives with earlier
> timestamps than then video. It taks longer to render audio also.... Seems to
> me a no-brainer to ask for audio ahead of video, just the same as other
> modern and legacy containers do.

Well, in fact the main Matroska muxer, used for 95% of the existing
files, is mkvmerge. And it would need major code shaking to achieve
that. Right now it cannot even start a cluster with a keyframe in all
cases (because it doesn't want to have too-long clusters). And the
order in which packets appear is, I think, based on the order of the
track definitions in the header.

I did implement the "audio at the front" in mkclean, and it was a
tricky task too (as I don't handle a buffer between tracks there).

Vladimir Pantelic

Oct 14, 2010, 4:13:10 AM
to webm-d...@webmproject.org
Matthew Heaney wrote:
> On Wed, Oct 13, 2010 at 6:28 AM, Vladimir Pantelic<vlad...@gmail.com> wrote:
>>
>> could you explain the reasoning behind this requirement? Why does audio
>> need to be before the video?
>
> When you seek, so that you're now pointing to a new cluster
> (containing a keyframe), you want to have all the audio you need (for
> that keyframe frame) on that same cluster, so you don't have to seek
> backwards for the previous cluster, simply to get a bit of audio.

Yes, I want the audio that goes with a keyframe in the same cluster,
but why would I need it *before* the video? Audio and video go
to two different rendering pipelines anyway, and both have some
"delay" (decoding, PCM buffer queue, video frame queue) and need
to be synced much later in the chain.

> When there's a tie, and blocks across tracks have the same timecode,
> then the ties goes to audio, and so the audio block physically
> precedes the video block. I forget the reasons for that requirement,
> but I remember someone did specifically ask for it.

anybody remember? :)


Vladimir Pantelic

Oct 14, 2010, 4:13:18 AM
to webm-d...@webmproject.org
Andy Shaules wrote:
> The entire windows directshow multimedia platform does run on a single
> stream thread.
>
> believe it or not. Although the individual filter can spawn children
> threads, the filtergraph stream exists as one thread.

So, WebM is ruled by the (current) behaviour of windows directshow?
