Hi all,
The issue with audio muting is that we want to still start up the audio
decoder even when audio is muted, so that there's no delay when playback
starts if audio is unmuted.
Our current behaviour also incurs memory overhead to buffer the decoded
data: we currently buffer 1s of decoded audio in advance of the current
playback position in order to avoid playback glitches, which for 48 kHz
stereo audio requires 192,000 bytes. So if we start up the audio decoder
as well as the video decoder, we'll do a bunch of work and allocate a
bunch of memory that we don't need to.
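For concreteness, that 192,000 byte figure follows directly from the audio
parameters (assuming 16-bit PCM samples; the function below is just
illustrative arithmetic, not anything in the tree):

```javascript
// Size of the decoded-audio lookahead buffer.
// Assumes 16-bit (2-byte) PCM samples.
function decodedAudioBufferBytes(sampleRateHz, channels, bytesPerSample, seconds) {
  return sampleRateHz * channels * bytesPerSample * seconds;
}

// 1s of 48 kHz stereo 16-bit audio:
decodedAudioBufferBytes(48000, 2, 2, 1); // 192000 bytes
```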
There is an API for deselecting streams on video elements, the track API,
which would do what you're implying mute should do, but we don't
implement it yet, and AFAIK no other browser does either.
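For reference, if we did implement the track API, deselecting audio would
look roughly like this. This is a sketch only: `audioTracks` is the HTML
spec's AudioTrackList, which Gecko doesn't implement yet.

```javascript
// Sketch: deselect every audio stream via the HTML track API.
// `videoEl.audioTracks` is an AudioTrackList (length + indexed getter);
// setting `enabled = false` deselects a track, so in principle the
// decoder wouldn't need to run for it at all.
function deselectAudioTracks(videoEl) {
  const tracks = videoEl.audioTracks;
  for (let i = 0; i < tracks.length; i++) {
    tracks[i].enabled = false;
  }
}
```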
>
> 2. What happened with the requirement of selecting a key frame?
I think they dropped it because they thought they were more likely to
get something agreed upon without that requirement.
I think we should have the thumbnail API choose the keyframe for us, and
provide no way for the caller to specify what the keyframe should be.
The implementation should just pick the right frame. Because we control
the implementation, we can ensure that.
> Earlier in the thread it seemed like that is a hard requirement for
> getting a quality thumbnail generated, is that no longer the case?
Now they're discussing writing a demuxer in JavaScript that parses the
MP4 file's index and finds the keyframe they want.
I think this is a bad solution, because it means the code we use for
playback and the code we use for getting thumbnails are different, so
it's inevitable that their behaviour will differ and we'll fail to get a
thumbnail for some files. We also don't really want to have to write a
parser for every other format we want to support, like WebM, Ogg, etc.
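To illustrate what that JS demuxer entails: even the simplest piece,
reading the MP4 sync-sample ('stss') table that lists keyframe sample
numbers, is format-specific code with no WebM or Ogg equivalent. A rough
sketch (assuming `payload` is a DataView over the stss box payload, i.e.
after the size/type header):

```javascript
// Parse the payload of an MP4 'stss' (sync sample) box: a FullBox holding
// version (1 byte), flags (3 bytes), entry_count (uint32), then entry_count
// uint32 sample numbers identifying the keyframes.
function parseStss(payload) {
  const entryCount = payload.getUint32(4); // skip version + flags
  const syncSamples = [];
  for (let i = 0; i < entryCount; i++) {
    syncSamples.push(payload.getUint32(8 + 4 * i));
  }
  return syncSamples;
}
```

And that's before stts/stsz/stco parsing to map a sample number to a time
and a file offset.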
Despite this being a bad solution, it's one that can be made to work
"well enough" quickly for the next B2G milestone, so I'm OK with it as a
short-term solution until we have a useful thumbnail API implementation
in Gecko.
> I'm asking this because the API you're proposing above has no notion
> of key frames.
I think Jonas' earlier proposal is a good one, and we should allow video
URLs in it. For video, the thumbnail API should select the keyframe with
the largest encoded size out of the first 10 keyframes (which is what
Android does). I think we should not require or even allow the caller to
specify which frame should be the thumbnail, because it's much more
reliable to have the video stack do that.
>
> 3. What is the issue with seeking the video through the usual
> HTMLVideoElement API, and why is that unacceptable but a time argument
> to mozCreateThumbnail is acceptable?
We don't want to use the normal seeking path because that requires us to
load the video to be thumbnailed through our regular pipeline, and if
we're not actually going to play it back, that means we've done a lot of
extra work that we don't need to. This increases the latency of the API
and the amount of work we have to do.
Additionally, on some B2G devices we can only have one hardware decoder
active at once (the software decoder is buggy and can only handle H.264
Baseline, so it's no use here). If we use the regular <video> path we
lock the hardware decoder for the entire time we've got a <video> active
in the active document. So while we're doing all that extra work that we
don't need to do (decoding audio and video frames for buffering, under
the assumption that we'll be playing back the media), we can't thumbnail
another video. Or play another video, for that matter, though you could
argue the Player App should also tiptoe around this limitation.
Whereas if we have a dedicated thumbnail API, we can initialize and lock
the HW decoder only while we decode the desired thumbnail keyframe; we
don't need to keep the HW decoder locked while we're demuxing/parsing
the file to find that keyframe, as we would in the normal <video> path.
The primary use case here is the Video Player App "Gallery" screen. On
first run we'll have a number of unthumbnailed videos on disk, and we
need to iterate through them all and generate thumbnails. We want the
thumbnails to be generated ASAP.
> To me, they both seem like two ways to do the same thing, and I don't
> understand why we cannot rely on normal seeking here.
>
> Now, let's talk about your proposal specifically. We are working on
> an API for resizing images without incurring a high memory usage cost
> (see Jonas' post earlier this thread for a rough sketch.) It seems to
> me like there is no reason why we cannot integrate HTMLVideoElement
> with that API. That gives you the ability to rescale the captured
> thumbnail to the requested size, which is part of your proposal
> above. That API will be based on Promises which will give you a
> better syntax for handling the asynchronous response. I'm actively
> working on the API for now, focusing on the image resizing issue for
> now, but I would like to start thinking about the video thumbnail
> generation use case as well, so I would really appreciate if we can
> discuss the questions above.
I think it is completely reasonable to have the image thumbnail API also
work on videos. The sketch Jonas proposed earlier seems a reasonable API
to me.
As I said above, I think for video we should let the
navigator.createResizedImage() implementation choose which frame to use
for the thumbnail, based on the sizes of the keyframes. We shouldn't
force the JS App to decide which frame should be the thumbnail, or try to
expose the sizes of keyframes to the web somehow.
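To make that concrete, caller-side usage for the gallery case might look
something like the sketch below. This is hypothetical: the exact signature
of navigator.createResizedImage() hasn't been settled, and the option
names here are invented. The one firm point is that the caller supplies
only a target size, never a frame.

```javascript
// Hypothetical sketch: thumbnail a batch of videos with the proposed API.
// `nav` is passed in rather than using the global `navigator` so the
// shape is easy to exercise; createResizedImage is assumed to return a
// Promise resolving to the thumbnail.
async function thumbnailAll(videoBlobs, nav) {
  const thumbs = [];
  for (const blob of videoBlobs) {
    // The implementation picks the keyframe; we only say how big we want it.
    thumbs.push(await nav.createResizedImage(blob, { width: 160, height: 120 }));
  }
  return thumbs;
}
```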
>
> Does that sound good?
>
Sounds good to me! Did what I said make sense?
Cheers,
Chris P.