If something is uploaded in WebM, why would it need to be transcoded
into WebM? Wouldn't this identity transcode degrade the quality?
There's a problem with the WebM file here:
http://www.youtube.com/watch?v=cRdxXPV9GNQ
The problem is that the CuePoints are incorrectly formatted -- the
Block element points to the wrong block of the cluster.
Unfortunately, neither the MuxingApp nor the WritingApp elements of
that file have been populated, so its provenance is unknown.
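For reference, here's roughly what a cue entry carries, written as a
Python dict purely for illustration (the element names are from the
Matroska spec; the values here are made up):

    # CueBlockNumber is the 1-based index of the keyframe's block within
    # the referenced cluster; that is the piece that is wrong in this file.
    cue_point = {
        "CueTime": 160,  # timecode of the keyframe being indexed
        "CueTrackPositions": {
            "CueTrack": 1,                  # track number of the video track
            "CueClusterPosition": 1234567,  # byte offset of the cluster
            "CueBlockNumber": 2,            # must point at the keyframe's block
        },
    }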
-Matt
If you could point us to your FFmpeg tree and/or patches, FFmpeg could
also take a look...
Regards,
Vladimir
There is a tool to verify the Matroska side of things in WebM files:
http://www.matroska.org/downloads/mkvalidator.html
It doesn't check bitstream issues though.
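If I remember correctly, you just run it against the file:

    mkvalidator video.webm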
Steve
But note that the last time I checked (back in July 2010), this
validation tool was yielding false positive errors. The tool says,
incorrectly, that a cluster must start with a keyframe; that does not
conform to the WebM container standard, which requires that the audio
that goes with a video keyframe must appear in the same cluster as the
keyframe itself, and that the audio block(s) must precede the video.
The WebM container standard is here:
http://www.webmproject.org/code/specs/container/
-Matt
Can you restate this? I think you're saying that the specification
does not require encoders to write bitstreams with clusters that begin
with a key-frame.
Muxers should treat all guidelines marked _should_ in this section as must.
[...]
* Key frames _should_ be placed at the beginning of clusters.
* Having key frames at the beginning of clusters should make seeking
faster and easier for the client.
I don't think warning on clusters which begin with non-keyframes would
be inappropriate for a validation tool. Am I misunderstanding anything
obvious?
The requirement is that if there's a keyframe, it should be near the
front of the cluster, and the audio block that immediately precedes it
must be in the same cluster. This means that if your container has both
audio and video, then a cluster cannot begin with a keyframe; it must
begin with audio. Note also that there might be one or more non-key
video frames between the audio frame (that begins the cluster) and the
keyframe, so they would also precede the keyframe.
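So a legal cluster with both tracks could look like this (a sketch
using hypothetical (track, timecode, is_keyframe) tuples, not any real
API):

    cluster = [
        ("audio", 100, False),  # audio whose duration covers the keyframe's time
        ("video", 100, False),  # non-key frame tied to the audio's timecode
        ("video", 102, True),   # the keyframe: near the front, but not first
    ]
    # The first block is audio; any video block before the keyframe is
    # only there because the tie rule forced it behind that audio block.
    assert cluster[0][0] == "audio"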
> Muxers should treat all guidelines marked _should_ in this section as must.
> [...]
> * Key frames _should_ be placed at the beginning of clusters.
Two other bullet points in that section state that:
* Audio blocks that contain the video key frame's timecode should be
in the same cluster as the video key frame block.
* Audio blocks that have same absolute timecode as video blocks
should be written before the video blocks.
It would not be correct to interpret "keyframe at beginning of
cluster" to mean "keyframe must be first block of cluster", since that
would violate the two rules above.
> * Having key frames at the beginning of clusters should make seeking
> faster and easier for the client.
The keyframe should be near the beginning of the cluster, and if
there's audio, then it cannot also be the first block in the cluster.
-Matt
But that is a useless warning, which is my point.
I may add the audio check later specifically for WebM (because in
Matroska you can't tell if you'll have B frames or not).
Steve
Could you explain the reasoning behind this requirement? Why does the
audio need to be before the video?
When you seek, so that you're now pointing to a new cluster (containing
a keyframe), you want to have all the audio you need (for that
keyframe) in that same cluster, so you don't have to seek backwards to
the previous cluster simply to get a bit of audio.
When there's a tie, and blocks across tracks have the same timecode,
then the tie goes to audio, and so the audio block physically
precedes the video block. I forget the reasons for that requirement,
but I remember someone did specifically ask for it.
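In muxer terms the tie-break is just part of the sort key; a minimal
Python sketch (hypothetical Block type, not any muxer's real code):

    from collections import namedtuple

    Block = namedtuple("Block", "track_type timecode")

    blocks = [Block("video", 40), Block("audio", 40), Block("audio", 20)]
    # Order by timecode; when timecodes are equal, audio sorts first.
    blocks.sort(key=lambda b: (b.timecode, b.track_type != "audio"))
    # Result: audio@20, audio@40, video@40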
-Matt
Speaking of playback, it takes longer to prepare a stream and render
audio than to bitblt a video frame, so in addition, audio is given
control of the system clock so that tiny adjustments can be made,
ensuring smooth audio playback.
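In other words the presentation clock is derived from the audio device,
not the OS timer. A toy sketch (hypothetical names):

    def media_clock(samples_played, sample_rate=48000):
        # Media time is however much audio has actually been heard.
        return samples_played / sample_rate

    def video_frame_is_due(frame_pts, samples_played):
        # Video frames are scheduled against the audio-derived clock.
        return frame_pts <= media_clock(samples_played)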
Please fix your mkv validator tool to stop issuing false positive
warnings for webm files. Thanks.
What false positive? The audio in front of video? I explained in this
thread that audio doesn't matter as much as a keyframe, as long as it's
in the same cluster (and obviously not too far away), because audio
needs caching. And ever since WebM started being discussed, I've only
heard from the guy who heard from someone who said it would be nice if
audio was at the front. I'd like the actual explanation. I'd be
surprised if any container in this world has any such requirement at
all, where playback is otherwise not guaranteed.
Now I never claimed mkvalidator is perfect. But it's not a tool to
prove a Matroska/WebM file is correct. It's a tool to find out if it's
incorrect. Maybe the word 'validator' is misleading but that's the
idea.
So your proposal would imply that a lot of audio needs to come before
the matching video frame, in case decoding blocks for a while and the
audio would otherwise starve? What duration would need to go ahead of
the video then? The duration of one video frame would be the minimum,
but it could be more if the video decoder is too slow and you are not
reading ahead in your file/stream before the data already read has been
fully decoded and rendered.
And this is solving what problem? Certainly not A/V sync. In the case
of AVI there are no timecodes for audio. That is actually the first
reason that led me to create Matroska. Try to mux VBR audio in AVI and
get good sync with that. I don't know much about NSV or FLV.
> ----- Original Message ----- From: "Steve Lhomme" <slh...@matroska.org>
> To: "webm-discuss" <webm-d...@webmproject.org>
> Sent: Wednesday, October 13, 2010 1:51 PM
> Subject: Re: YouTube and WebM transcodes
>
>
There are three interacting requirements here (remember, should is must
for a muxer/validator):
* Key frames should be placed at the beginning of clusters.
* Audio blocks that contain the video key frame's timecode should be
in the same cluster as the video key frame block.
* Audio blocks that have same absolute timecode as video blocks should
be written before the video blocks.
The argument here, I believe, is that audio must come before the key
frames it overlaps and must be in the same cluster. Because audio frame
durations do not perfectly align with video frames (and, in fact, are
often longer), it may not be possible to put a key frame at the start
of the cluster while obeying the rest of the requirements.
E.g. with an audio frame covering time 10-12, a P-frame at time 10, and
an I-frame at time 11, you could not start the cluster with the
keyframe while obeying the audio requirements. The keyframe forces a
particular audio block, which then also forces bringing along the
P-frame.
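The same example in code form (a sketch; the tuples are hypothetical,
not any muxer's API):

    # Audio covering 10-12, P-frame at 10, I-frame at 11.
    blocks = [
        ("video", 10, False),  # P-frame
        ("audio", 10, False),  # covers 10-12, so it contains timecode 11
        ("video", 11, True),   # I-frame (the keyframe)
    ]
    # Rule 2 pulls the audio block into the keyframe's cluster; rule 3
    # puts audio before video on the timecode tie at 10, so the P-frame
    # at 10 cannot stay behind in an earlier cluster without landing
    # *before* that audio block. The cluster cannot start with the I-frame.
    cluster = sorted(blocks, key=lambda b: (b[1], b[0] != "audio"))
    assert [track for track, _, _ in cluster] == ["audio", "video", "video"]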
...And such a system/design would fail as soon as it encounters a frame
that takes too long to decode (audio starving, video played out of
sync). That's what happens in a system that can't cache coded and/or
decoded data independently for each track it's playing. It's also
subject to terrible muxing, which is currently the norm for WebM
creation (I doubt the muxers always put at least one video frame's
duration of audio in front of a video frame).
Now there's something else I don't understand about this concept.
Suppose you have an audio frame and then a video frame with the same
timecode. Does that mean you decode the audio and start playing it
before the video is actually decoded and rendered? Or do you wait until
*both* are decoded to render them? In the latter case (obviously the
correct one) it doesn't matter which decoder finishes first. You need
them *both* before you can render.
NSV, FLV, AVI: try them otherwise, to the detriment of A/V sync. The
unsigned integer offset values for AVI/NSV represent the amount of
time, in milliseconds, that audio is currently ahead.
----- Original Message -----
From: "Steve Lhomme" <slh...@matroska.org>
To: "webm-discuss" <webm-d...@webmproject.org>
Sent: Wednesday, October 13, 2010 1:51 PM
Subject: Re: YouTube and WebM transcodes
Believe it or not, although the individual filter can spawn child
threads, the filtergraph stream exists as one thread.
By definition of FLV tags, their order is their order in time.
Think of it this way: since audio samples determine the stream clock
time in almost all cases, don't you think they should come first? The
entire duration of a segment would be known.
In the end you are talking about chunks of time. The audio spans the
entire time of the cluster. The video does not. If you have ever
written your own muxer from scratch, you will find that audio and video
samples never arrive at the mux process in any predetermined order or
latency. How do YOU solve the problem at the point of encoding? Sure,
as a decoder, you could spawn any number of cache threads, but there is
more to A/V than just playback. In muxing, audio arrives second at the
muxer, and it arrives with earlier timestamps than the video. It takes
longer to render audio also... Seems to me a no-brainer to ask for
audio ahead of video, just the same as other modern and legacy
containers do.
Steve, you will probably find that most muxers are fighting audio that
is stamped too far ahead of the video, rather than the other way
around. The audio stream is always late to the encoder and arrives with
early timestamps, so that video frames must wait for their
corresponding audio event.
----- Original Message -----
From: "Steve Lhomme" <slh...@matroska.org>
To: "webm-discuss" <webm-d...@webmproject.org>
Sent: Wednesday, October 13, 2010 2:21 PM
Subject: Re: YouTube and WebM transcodes
We muxers must hold the video as long as we can and finally release it
when the audio catches up to, say, about 100 milliseconds prior to the
very first video frame I have. All audio prior to then is dumped. There
can be up to three seconds of audio at a stream time earlier than the
very first video frame.
Yeah, sorry, I meant more like handled independently and then
synchronized at the end, much like multi-threaded tasks. On a single
CPU it can be threads or an internal scheduler/sequencer that just
handles the synchronisation between two processes that each have their
own cache/queue.
> Believe it or not, although the individual filter can spawn child
> threads, the filtergraph stream exists as one thread.
>
> By definition of FLV tags, their order is their order in time.
>
>
> Think of it this way: since audio samples determine the stream clock
> time in almost all cases, don't you think they should come first? The
> entire duration of a segment would be known.
Well, as in my example of audio and video having the same timecode,
they need to play at the same time and thus wait for each other. Which
one will take more time to decode? The video one (most likely), so
maybe it should come first if you follow that logic? In the end I think
it's a non-issue as long as you realise there has to be caching
somewhere to have proper synchronisation. After that, proper muxing is
mostly a matter of keeping similar timecodes together (and obviously in
the right order).
> In the end you are talking about chunks of time. The audio spans the
> entire time of the cluster. The video does not. If you have ever
> written your own muxer from scratch, you will find that audio and
> video samples never arrive at the mux process in any predetermined
> order or latency. How do YOU solve the problem at the point of
> encoding? Sure, as a decoder, you could spawn any number of cache
> threads, but there is more to A/V than just playback. In muxing, audio
> arrives second at the muxer, and it arrives with earlier timestamps
> than the video. It takes longer to render audio also... Seems to me a
> no-brainer to ask for audio ahead of video, just the same as other
> modern and legacy containers do.
Well, in fact the main Matroska muxer, used for 95% of the existing
files, is mkvmerge. And it would need a major code shake-up to achieve
that. Right now it cannot even start a cluster with a keyframe in all
cases (because it doesn't want to have clusters that are too long). And
the order in which packets appear is, I think, based on the order of
the track definitions in the header.
I did implement the "audio at the front" rule in mkclean, and it was a
tricky task too (as I don't keep a buffer between tracks there).
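For what it's worth, when a muxer does keep a small queue per track,
the reordering itself is cheap; a Python sketch (hypothetical names,
not mkclean's actual code):

    import heapq

    def interleave(*tracks):
        # Merge per-track block iterators (each already in time order)
        # so that audio is written before video on equal timecodes.
        # heapq.merge is lazy, so only one block per track is buffered.
        return heapq.merge(*tracks, key=lambda b: (b[0], b[1] != "audio"))

    audio = [(0, "audio", b"a0"), (40, "audio", b"a1")]
    video = [(0, "video", b"v0"), (40, "video", b"v1")]
    for block in interleave(audio, video):
        print(block)  # audio@0, video@0, audio@40, video@40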
Yes, I want the audio that goes with a keyframe in the same cluster,
but why would I need it *before* the video? Audio and video go to two
different rendering pipelines anyway, and both have some "delay"
(decoding, PCM buffer queue, video frame queue) and need to be synced
much later in the chain.
> When there's a tie, and blocks across tracks have the same timecode,
> then the tie goes to audio, and so the audio block physically
> precedes the video block. I forget the reasons for that requirement,
> but I remember someone did specifically ask for it.
Anybody remember? :)
So, WebM is ruled by the (current) behaviour of Windows DirectShow?