[RFC] WebM metadata

James Zern

Jan 18, 2012, 3:20:38 PM

to WebM Discussion

A metadata specification for WebM has been a long-requested feature.
We plan on addressing this in the near term, but wanted to have a
general discussion before we propose a solution.

One requirement we have is the support for temporal metadata to store
things like geolocation data, e.g., GPS coordinates. The other is the
traditional global metadata, e.g., title, author, etc.
Initially we were thinking of defining a separate standard for both
global and temporal, which would be stored in its own track. After a
bit of discussion it came to feel as though global was merely a
special case of the temporal, which we knew we wanted.

Consider some examples:
i) A video clip with an audio soundtrack that changes sources through the clip.
With one audio and one video track it would be simple enough to have
an artist label that covered each or both. With multiple audio sources
in one track you could similarly have an "artists" tag. This, though,
begins to decouple the attribution for each segment.
ii) Appending 2 or more unrelated videos.
This is a similar problem. You get into an issue with having to merge
metadata from each segment into one global blob. At that point which
takes precedence as the 'artist' for the clip?
iii) Live presentations.
Once more the performer, etc. can change throughout the clip, like a
shoutcast stream.

It is true that a global mapping could be constructed to handle the
above cases, but it seems that it might be simpler to have a timed
metadata track that could indicate the duration that the value applied
to. Global or live metadata could simply have no duration to indicate
it was for the entire clip or until a new value was encountered (for
the live case).
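
A minimal sketch of how a reader might resolve such a timed track, assuming each entry carries a start time and an optional duration. The names `MetadataEntry` and `value_at` and the keys used are hypothetical and not part of any WebM proposal. An entry with no duration is treated as open-ended, so it stands for the whole clip (global) or holds until a later entry takes over (live).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataEntry:
    key: str                          # hypothetical key name, e.g. "artist"
    value: str
    start: float                      # seconds from the start of the clip
    duration: Optional[float] = None  # None means no duration was given


def value_at(entries, key, t):
    """Return the value of `key` in effect at time `t`, or None."""
    best = None
    for e in sorted(entries, key=lambda e: e.start):
        if e.key != key or e.start > t:
            continue
        # Open-ended entries always cover t; bounded ones only inside their span.
        if e.duration is None or t < e.start + e.duration:
            best = e  # a later matching entry replaces an earlier one
    return best.value if best else None


entries = [
    MetadataEntry("title", "Concert", 0.0),         # global: no duration
    MetadataEntry("artist", "Band A", 0.0, 120.0),  # first two minutes only
    MetadataEntry("artist", "Band B", 120.0),       # until a new value appears
]
```

Under this assumed rule, a bounded entry that has expired falls back to any earlier open-ended entry for the same key. That is one possible interpretation, not something settled here.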

Does anyone have any thoughts on this? Is there any reason we
shouldn’t explore a single metadata solution using a timed track?

Basil Mohamed Gohar

Jan 18, 2012, 3:31:11 PM

to webm-d...@webmproject.org

(Note, I'm neither a codec nor a container format developer, just a
geeky user, so take all of this with that knowledge first.)

I see the argument for the global metadata being a special case of the
temporal, but in the global case, there is usually a high likelihood
that the metadata is always in the same place (e.g., either at the
beginning or the end of the file) which yields a much, much simpler
method of extraction for client software.

In the temporal case, I suppose it could be constructed in such a way
that, in the case that it doesn't really change, then it would effectively
behave as the global case - that is, it would always be located in the
same place, and remain easily extracted.

I would suggest something along the lines of temporal data being
considered updates to existing metadata, and if a previous value is not
marked as having been replaced by subsequent metadata, then it should
persist as long as its associated tracks do.

Finally, and I'm sure this has already been considered, but what already
exists within the Matroska world with regards to metadata? I would
strongly urge that whatever is developed be made as compatible as
possible with what is already out there.

James Zern

Jan 18, 2012, 7:39:07 PM

to webm-d...@webmproject.org

Semantically this sounds similar to special casing the global
metadata. It sounds as though you're more concerned about where the
base data lives in the file and how to extract that.

> Finally, and I'm sure this has already been considered, but what already
> exists within the Matroska world with regards to metadata? I would
> strongly urge that whatever is developed be made as compatible as
> possible with what is already out there.

Matroska defines a hierarchical tag system [1], which allows tags to
be attached to tracks or chapters. Note chapters are not part of the
webm specification currently.
I agree that using an existing system for which there are tools to
manipulate the data is usually best, which is part of the reason for
this thread. With tags, for instance though, I think we'd have to
extend them to reference clusters or blocks. That or an excessive
amount of chapters might be needed.
Beyond this, with all tag systems, Matroska tags, XMP, etc., we will
need to have a list of well-known entities as well as define new
entities where we see a need. In this case some of the tools for
extraction will need to be updated as well. Given we're talking about
a web format, should we use these as a reference and provide a
solution that fits better in this framework rather than worry about
compatibility?

[1]: http://matroska.org/technical/specs/tagging/index.html
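
A rough model of the tag hierarchy referenced in [1], assuming only the element names from the Matroska tagging spec (Tag, Targets, TargetTypeValue, TagTrackUID, TagChapterUID, SimpleTag, TagName, TagString). The Python classes and the `tags_for_track` helper are hypothetical illustrations, not an existing API. Note that Targets can point at tracks or chapters but has nothing that references a time range, cluster or block, which is the gap described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleTag:
    name: str    # TagName, e.g. "ARTIST"
    string: str  # TagString


@dataclass
class Targets:
    target_type_value: int = 50                            # TargetTypeValue
    track_uids: List[int] = field(default_factory=list)    # TagTrackUID
    chapter_uids: List[int] = field(default_factory=list)  # TagChapterUID


@dataclass
class Tag:
    targets: Targets
    simple_tags: List[SimpleTag]


def tags_for_track(tags, track_uid):
    """Collect name -> value for tags that apply to `track_uid`.

    A Tag whose Targets lists no UIDs is treated as applying to the whole
    file; track-specific values are applied second so they take precedence.
    """
    result = {}
    for want_global in (True, False):
        for tag in tags:
            t = tag.targets
            is_global = not t.track_uids and not t.chapter_uids
            applies = is_global if want_global else track_uid in t.track_uids
            if applies:
                for s in tag.simple_tags:
                    result[s.name] = s.string
    return result
```

Letting track-level values override file-level ones is an assumption made for the sketch.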

Silvia Pfeiffer

Jan 18, 2012, 7:51:52 PM

to webm-d...@webmproject.org

Talking about metadata and timed metadata: have you considered how
this hooks into what is being done in HTML5?

There, we have defined WebVTT as a file format for external
time-synchronized data (including captions, subtitles, descriptions,
and - yes - general metadata).

In the past it has been stated that the idea for captions in WebM
would be to use the solution that is being used in HTML5 and
encapsulate it into WebM, i.e. encapsulate WebVTT into WebM. Since
WebVTT is similar to SRT and there are existing specs for how to
encapsulate SRT into Matroska, putting WebVTT into WebM should be
possible in a similar manner.

I would be totally supportive of solving the timed metadata challenge
with WebVTT and by creating a WebVTT encapsulation spec for WebM.

As for the non-timed metadata: Matroska also has a solution for
header-style metadata, i.e. metadata such as id3 tags that need to be
available to search in apps such as iTunes. I would suggest building on
that existing solution.

HTH.

Cheers,
Silvia.


Frank Galligan

Jan 18, 2012, 9:46:23 PM

to webm-d...@webmproject.org

> I see the argument for the global metadata being a special case of the
> temporal, but in the global case, there is usually a high likelihood
> that the metadata is always in the same place (e.g., either at the
> beginning or the end of the file) which yields a much, much simpler
> method of extraction for client software.

I agree, if you are only looking at supporting global metadata vs temporal metadata. If you want to support both, then having global metadata be a special case of temporal is most likely simpler, because you only need to support one method of extraction/insertion vs two if they use separate storage methods.

> In the temporal case, I suppose it could be constructed in such a way
> that, in the case that it doesn't really change, then it would effectively
> behave as the global case - that is, it would always be located in the
> same place, and remain easily extracted.

Yeah, that is the idea. If you wanted some temporal metadata to behave like global metadata, you would set its time range from the start of the file to the duration of the file.

> I would suggest something along the lines of temporal data being
> considered updates to existing metadata, and if a previous value is not
> marked as having been replaced by subsequent metadata, then it should
> persist as long as its associated tracks do.

One issue with this is that there will be temporal metadata that should only be valid during playback at certain times and not at setup time. The other issue is that, implementation-wise, we would still be using two different formats to store the data.

> Finally, and I'm sure this has already been considered, but what already
> exists within the Matroska world with regards to metadata? I would
> strongly urge that whatever is developed be made as compatible as
> possible with what is already out there.

As James mentioned, Tags would work for global metadata but not for temporal metadata. Silvia mentioned WebVTT, which I think would be a nice fit for temporal metadata. It is very close to the SRT format, which is already in use by Matroska for some subtitles today.

Basil Mohamed Gohar

Jan 18, 2012, 11:34:18 PM

to webm-d...@webmproject.org

On 01/18/2012 07:51 PM, Silvia Pfeiffer wrote:
> Talking about metadata and timed metadata: have you considered how
> this hooks into what is being done in HTML5?
>
> There, we have defined WebVTT as a file format for external
> time-synchronized data (including captions, subtitles, descriptions,
> and - yes - general metadata).
>
> In the past it has been stated that the idea for captions in WebM
> would be to use the solution that is being used in HTML5 and
> encapsulate it into WebM, i.e. encapsulate WebVTT into WebM. Since
> WebVTT is similar to SRT and there are existing specs for how to
> encapsulate SRT into Matroska, putting WebVTT into WebM should be
> possible in a similar manner.
>
> I would be totally supportive of solving the timed metadata challenge
> with WebVTT and by creating a WebVTT encapsulation spec for WebM.
>
> As for the non-timed metadata: Matroska also has a solution for
> header-style metadata, i.e. metadata such as id3 tags that need to be
> available to search in apps such as iTunes. I would suggest building on
> that existing solution.
>
> HTH.
>
> Cheers,
> Silvia.

I don't know the specifics of WebVTT nor SRT, but if what Silvia has
described really is feasible, then it sounds great. I would break it
down as follows:

For "global" metadata, we use Matroska's existing features for
"tag"-style information. This would require basically zero modification
to existing implementations that already support it.

For temporal metadata, as this is a new feature, this WebVTT
encapsulation seems to be appropriate. It would not conflict with
anything, and support for temporal metadata already requires new code to
be written, and it would be one less bit of technology to have to deal
with (namely, HTML5 is already pushing for this format for metadata of a
similar nature).

Now, does the HTML5 spec itself mandate that the metadata actually be
external to the stream, or just that it be stored separately in WebVTT
format (allowing it to be encapsulated and/or extracted)? This is more
a question for Silvia.

James Zern

Jan 19, 2012, 1:31:20 AM

to webm-d...@webmproject.org

Thanks for adding the background to the thread.

On Wed, Jan 18, 2012 at 16:51, Silvia Pfeiffer
<silviap...@gmail.com> wrote:
> Talking about metadata and timed metadata: have you considered how
> this hooks into what is being done in HTML5?
>
> There, we have defined WebVTT as a file format for external
> time-synchronized data (including captions, subtitles, descriptions,
> and - yes - general metadata).
>

Yes this is what initially got our discussion going. Using something
along these lines we'd then only need one implementation for metadata
rather than two.

Vladimir Pantelic

Jan 19, 2012, 4:28:47 AM

to webm-d...@webmproject.org, James Zern

James Zern wrote:

> Does anyone have any thoughts on this? Is there any reason we
> shouldn't explore a single metadata solution using a timed track?

In 99% of all use cases, you will have only global metadata, so
whatever you do, please make sure that global metadata, aka artist,
album, song title can be extracted extremely fast and easily.

This means making it part of the Matroska header section and not
having to parse a data track to access that. Even for a video
that has N different performers, it's annoying to have to seek
through a potentially huge file to access the metadata only;
the file could be on some slow HTTP connection, after all.

Silvia Pfeiffer

Jan 19, 2012, 10:39:58 PM

to webm-d...@webmproject.org

On Thu, Jan 19, 2012 at 3:34 PM, Basil Mohamed Gohar
<abu_hu...@hidayahonline.org> wrote:
>
> I don't know the specifics of WebVTT nor SRT, but if what Silvia has
> described really is feasible, then it sounds great. I would break it down
> as follows:
>
> For "global" metadata, we use Matroska's existing features for "tag"-style
> information. This would require basically zero modification to existing
> implementations that already support it.
>
> For temporal metadata, as this is a new feature, this WebVTT encapsulation
> seems to be appropriate. It would not conflict with anything, and support
> for temporal metadata already requires new code to be written, and it would
> be one less bit of technology to have to deal with (namely, HTML5 is already
> pushing for this format for metadata of a similar nature).
>
> Now, does the HTML5 spec itself mandate that the metadata actually be
> external to the stream, or just that it be stored separately in WebVTT
> format (allowing it to be encapsulated and/or extracted)?

The TextTrack API for HTML5 supports both in-band and external text
tracks. WebVTT is the format that all browsers have basically agreed
to implement support for as the baseline external text track file
format. On top of that, the TextTrack API also interfaces with text
tracks provided in-band. They are basically mapped to the same data
structure as is being parsed from a WebVTT file. What format browsers
use there is still up in the air, but since we have the choice for
WebM it makes sense to pick WebVTT itself.

For some side information:
1. I've heard MPEG people talk about defining an encapsulation means
for WebVTT in MPEG-4, too.
2. I also think that WebVTT in Ogg would be simple and should be
supported. We could even re-use KATE tracks for WebVTT encapsulation.

Cheers,
Silvia.

Silvia Pfeiffer

Jan 19, 2012, 11:13:09 PM

to webm-d...@webmproject.org

On Thu, Jan 19, 2012 at 1:46 PM, Frank Galligan <fgal...@google.com> wrote:
>>
>> I see the argument for the global metadata being a special case of the
>> temporal, but in the global case, there is usually a high likelihood
>> that the metadata is always in the same place (e.g., either at the
>> beginning or the end of the file) which yields a much, much simpler
>> method of extraction for client software.
>
> I agree, if you are only looking at supporting global metadata vs temporal
> metadata. If you want to support both then having global metadata be a
> special case of temporal is most likely simpler because you only need to
> support one method of extraction/insertion vs two if they use separate
> storage methods.

I think it would be a mistake to deal with them in the same manner.

Applications such as iTunes and other media players as well as asset
management systems typically find metadata in a defined location in a
media file without having to seek, without having to parse more than
the first few KB of data, and without having to parse all text tracks
to determine where to find metadata.

Thus, the disadvantages of having a single method of dealing with
metadata in my mind far outweigh the advantages. Header-style metadata
(i.e. "tags") and timed metadata are two fundamentally different types
of data and should be handled in two separate ways.

In Ogg we solved the problem of having changing file-wide metadata by
allowing chaining of media files, i.e. essentially concatenating media
resources. I am not sure that would be possible with WebM nor whether
that is a good solution. I would actually much prefer introducing the
concept of playlists for this situation and managing it in this way.
It is a problem that I don't think we should solve on the file level.

Cheers,
Silvia.

Ralph Giles

Jan 24, 2012, 8:56:37 PM

to webm-d...@webmproject.org

On 20 January 2012 17:13, Silvia Pfeiffer <silviap...@gmail.com> wrote:

> Header-style metadata
> (i.e. "tags") and timed metadata are two fundamentally different types
> of data and should be handled in two separate ways.

I generally agree with this. File level metadata like title, creator,
language, license, etc. have a different use case. Moreover, Matroska
already has an established tag mechanism which many media players
already know. I am in favour of using that encapsulation for WebM over
a custom format, or the XMP blob idea mentioned on IRC.

That doesn't give a fixed offset for the metadata, but one can at
least require it to be at the start (or end) of the file so the
buffering requirements are minimal. The ebml parser subset needed is
quite small.
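
To give a feel for how small that parser subset can be, here is a sketch that walks the top-level children of the Segment and returns the raw Tags payload. It assumes the standard EBML variable-length integer encoding and the well-known element IDs for the EBML header, Segment and Tags. The function names are hypothetical, and unknown-size child elements (as seen in live streams) are not handled.

```python
import io

EBML_HEADER_ID = 0x1A45DFA3
SEGMENT_ID = 0x18538067
TAGS_ID = 0x1254C367


def _vint_length(first_byte):
    """Length in bytes of an EBML variable-length integer, from its first byte."""
    for n in range(1, 9):
        if first_byte & (0x80 >> (n - 1)):
            return n
    raise ValueError("invalid EBML vint")


def read_id(stream):
    """Read an element ID (marker bit kept). Returns None at end of stream."""
    first = stream.read(1)
    if not first:
        return None
    n = _vint_length(first[0])
    return int.from_bytes(first + stream.read(n - 1), "big")


def read_size(stream):
    """Read an element data size (marker bit stripped)."""
    first = stream.read(1)
    if not first:
        raise EOFError("truncated element")
    n = _vint_length(first[0])
    value = first[0] & (0xFF >> n)
    for b in stream.read(n - 1):
        value = (value << 8) | b
    return value


def find_tags(stream):
    """Return the raw payload of the first top-level Tags element, or None."""
    if read_id(stream) != EBML_HEADER_ID:
        raise ValueError("not an EBML file")
    stream.seek(read_size(stream), io.SEEK_CUR)  # skip the EBML header body
    if read_id(stream) != SEGMENT_ID:
        raise ValueError("no Segment element")
    read_size(stream)                            # descend into the Segment
    while True:
        element_id = read_id(stream)
        if element_id is None:
            return None
        size = read_size(stream)
        if element_id == TAGS_ID:
            return stream.read(size)
        stream.seek(size, io.SEEK_CUR)           # skip any other element
```

If Tags sits before the first Cluster, this touches only the start of the file. If it sits at the end, the skips become seeks over the media data.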

For Ogg we did define a baseline semantic vocabulary for tags. I think
that was helpful for interoperability, so I would support publishing
such a set for WebM. Afaict Matroska itself doesn't do so?

Metadata of this type is generally for the convenience of indexers
(online or in a media player). For example, our media code in Firefox
completely ignores the tags. I'm not sure it's especially necessary to
provide an interface for this either, since one can parse the resource
directly in javascript.

I'm also excited to hear you're interested in timed metadata. For that
I would point out the kind=metadata variant of webvtt.

Timed metadata is always going to be a niche application, and needs
specific application support to be useful. We're already implementing
webvtt as a format for HTML5 <track> elements, so it's easy to attach
to files; the specification also allows embedding the same data in the
media resource itself. The most webby thing to do would be to use the
same format for subtitles and for timed metadata so web content can use
the same interface to support whatever it wants to. Right now the
interpretation of kind=metadata webvtt is completely up to the
application, but for interoperability we could suggest a general
framework, like json cue text with a defined semantic vocabulary for
common use cases like geolocation, camera parameters and so on.
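
As an illustration of the json-cue-text idea, the sketch below writes geolocation samples as WebVTT cues whose payload is a single JSON line. The key names (`lat`, `lon`) and both function names are made up for the example; no vocabulary has been defined.

```python
import json


def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def build_metadata_vtt(samples):
    """Build WebVTT text from (start_s, end_s, dict) tuples.

    Each dict becomes one line of JSON, so the cue payload never contains
    a blank line that would end the cue early.
    """
    lines = ["WEBVTT", ""]
    for start, end, payload in samples:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(json.dumps(payload, sort_keys=True))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_metadata_vtt([
        (0, 5, {"lat": 37.42, "lon": -122.08}),
        (5, 10, {"lat": 37.43, "lon": -122.09}),
    ]))
```

A page would load such a file through a track with kind=metadata and parse each cue's text as JSON. What the keys mean is left to the application until a vocabulary is agreed.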

FWIW,
-r

--
Ralph Giles
Xiph.org Foundation for open multimedia

Frank Galligan

Jan 25, 2012, 10:23:32 AM

to webm-d...@webmproject.org

CIL

On Tue, Jan 24, 2012 at 8:56 PM, Ralph Giles <gi...@xiph.org> wrote:

> On 20 January 2012 17:13, Silvia Pfeiffer <silviap...@gmail.com> wrote:
>
> > Header-style metadata
> > (i.e. "tags") and timed metadata are two fundamentally different types
> > of data and should be handled in two separate ways.
>
> I generally agree with this. File level metadata like title, creator,
> language, license, etc. have a different use case. Moreover, Matroska
> already has an established tag mechanism which many media players
> already know. I am in favour of using that encapsulation for WebM over
> a custom format, or the XMP blob idea mentioned on IRC.
>
> That doesn't give a fixed offset for the metadata, but one can at
> least require it to be at the start (or end) of the file so the
> buffering requirements are minimal. The ebml parser subset needed is
> quite small.
>
> For Ogg we did define a baseline semantic vocabulary for tags. I think
> that was helpful for interoperability, so I would support publishing
> such a set for WebM. Afaict Matroska itself doesn't do so?

Matroska does define a baseline of tags. See http://matroska.org/technical/specs/tagging/index.html and scroll down to the "Official Tags" section.

The plan is to define a set of standard metadata for WebM, both global and temporal.

> Metadata of this type is generally for the convenience of indexers
> (online or in a media player). For example, our media code in Firefox
> completely ignores the tags. I'm not sure it's especially necessary to
> provide an interface for this either, since one can parse the resource
> directly in javascript.

I'm not sure about this either. It will be up to the clients whether they want to expose this information.

> I'm also excited to hear you're interested in timed metadata. For that
> I would point out the kind=metadata variant of webvtt.

Good to know (my vote too, by the way). We first wanted to get everyone's opinion on the idea of just changing all metadata to be stored temporally. The next step, I think, will be to decide how to store the metadata within WebM files, and then to define a standard set of metadata.

> Timed metadata is always going to be a niche application, and needs
> specific application support to be useful. We're already implementing
> webvtt as a format for HTML5 <track> elements, so it's easy to attach
> to files; the specification also allows embedding the same data in the
> media resource itself.

This was one of the pros for storing all data temporally: if we store it as WebVTT, we will pretty much get all metadata supported very quickly by the browsers.

> The most webby thing to do would be to use the
> same format for subtitles and for timed metadata so web content can use
> the same interface to support whatever it wants to. Right now the
> interpretation of kind=metadata webvtt is completely up to the
> application, but for interoperability we could suggest a general
> framework, like json cue text with a defined semantic vocabulary for
> common use cases like geolocation, camera parameters and so on.

The thinking was we would standardize a set of metadata for temporal as well.

> FWIW,
> -r
>
> --
> Ralph Giles
> Xiph.org Foundation for open multimedia

Basil Mohamed Gohar

Jan 25, 2012, 10:38:12 AM

to webm-d...@webmproject.org

On 01/25/2012 10:23 AM, Frank Galligan wrote:
> CIL
>
> On Tue, Jan 24, 2012 at 8:56 PM, Ralph Giles <gi...@xiph.org
> <mailto:gi...@xiph.org>> wrote:
>
> On 20 January 2012 17:13, Silvia Pfeiffer

I know that WebM is frequently touted as a Web format, and sometimes I
worry that this loses sight of the fact that it will be used, regardless
of the intentions of the original implementers, as a normal file format
as well. Why I bring this is up is that a lot of the decisions related
to WebM seem to be focused on its usage as planned in the web browser,
and sometimes that is to the detriment of usages outside of that framework.

For example, as Matroska already has a lot of metadata support, and
implementations of Matroska already have good support for that, it seems
like it should be paramount that this also be utilized as extensively as
possible within WebM so as to make support for WebM within the wider
Matroska infrastructure as easy as possible. Yes, WebM has the
distinction of being a Web-prioritized format, but I don't think this
should come at the expense of compatibility with other implementations
of the parent format.

What I mean is, if something that this metadata proposal hopes to
achieve can already be done by existing implementations, and doing so
would basically require little to no work to support, then it should be
the goal to make it be so. If it's not possible (as opposed to being
just not favorable), then fine, we should go ahead and propose new
ideas, but still in a way that wouldn't be "annoying" to existing
implementations.

I hope I don't sound overly hostile, but I want WebM to reach the widest
audience possible, and there is an installed base of Matroska users and
developers already existing, and just dumping a new set of requirements
to implement that are incompatible will just cause an annoyance and, in my
humble opinion, will stifle adoption of an otherwise outstanding free
format.

<key points>
So, to summarize my concerns, it seems that the temporal metadata
appears to be a novel idea, and there is nothing significantly
standardized or pre-existing within the Matroska world to support it,
and HTML5 already is standardizing on WebVTT, and there exists an
encapsulation of WebVTT for Matroska already (I hope I got that right),
so the way forward for this seems clear - use and/or build on the
existing support for WebVTT in Matroska, in a way that is compliant with
the spirit and the letter of the HTML5 standard for temporal metadata.

As for so-called global and non-temporal metadata, I find (again, as a
non-developer and a user only) no reason to not use Matroska's existing
infrastructure for metadata/tag support. It already works, and doing
something else seems foolish. On top of that, Matroska *does* seem to
have published some guidelines as to a standard set of metadata, and we
should take as much as possible from that, so as to take advantage of
existing infrastructure.
</key points>

Could someone better in the know point out to me what, if anything,
seems faulty in the above two paragraphs that needs to be addressed, or
is a case of "devil in the details", and we all agree on the basic idea
of not reinventing the wheel, just making sure it's round enough? :)

James Zern

Jan 25, 2012, 2:09:57 PM

to webm-d...@webmproject.org

Yes I think so. We can now handle these in two separate tracks and
eventually start a new discussion around webvtt metadata and how we
might extend or update the global entries using it.

That's fair. Some of the initial debate was focused on making life
easier for the browser in extracting/parsing, that's why we brought
the discussion here. I think the cons presented currently outweigh
any pros (presumed or otherwise), however.

> I hope I don't sound overly hostile, but I want WebM to reach the widest
> audience possible, and there is an installed base of Matroska users and
> developers already existing, and just dumping a new set of requirements
> to implement that are incompatible will just cause an annoyance and, in my
> humble opinion, will stifle adoption of an otherwise outstanding free
> format.
>
> <key points>
> So, to summarize my concerns, it seems that the temporal metadata
> appears to be a novel idea, and there is nothing significantly
> standardized or pre-existing within the Matroska world to support it,
> and HTML5 already is standardizing on WebVTT, and there exists an
> encapsulation of WebVTT for Matroska already (I hope I got that right),

Not quite yet, that will be coming in parallel with some of this.

Basil Mohamed Gohar

Jan 25, 2012, 2:23:08 PM

to webm-d...@webmproject.org

On 01/25/2012 02:09 PM, James Zern wrote:
> On Wed, Jan 25, 2012 at 07:38, Basil Mohamed Gohar
> <abu_hu...@hidayahonline.org> wrote:
>
>> <key points>
>> So, to summarize my concerns, it seems that the temporal metadata
>> appears to be a novel idea, and there is nothing significantly
>> standardized or pre-existing within the Matroska world to support it,
>> and HTML5 already is standardizing on WebVTT, and there exists an
>> encapsulation of WebVTT for Matroska already (I hope I got that right),
> Not quite yet, that will be coming in parallel with some of this.

Right. I misremembered something Silvia mentioned earlier in the
thread. *SRT* has an encapsulation, WebVTT is *like* SRT, so it should
be *possible*. I hope I got *that* right now...

Ralph Giles

Jan 25, 2012, 5:18:54 PM

to webm-d...@webmproject.org

On 26 January 2012 08:23, Basil Mohamed Gohar
<abu_hu...@hidayahonline.org> wrote:

> Right. I misremembered something Silvia mentioned earlier in the
> thread. *SRT* has an encapsulation, WebVTT is *like* SRT, so it should
> be *possible*. I hope I got *that* right now...

That's my understanding as well. There's no proposal for how exactly
to encapsulate webvtt in webm, but we should be able to reuse
the current matroska srt mapping:
http://matroska.org/technical/specs/subtitles/srt.html

An open question is how to handle any of the positioning directives.
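
A sketch of what reusing the SRT-style mapping could look like, assuming each cue becomes one block whose timestamp is the cue start, whose duration is end minus start, and whose data is the cue text. `cues_to_blocks` is a hypothetical helper, not a proposed encapsulation. Cue settings such as positioning are only split off and returned unchanged, since how to carry them is the open question.

```python
import re

# Timing line: start --> end, with optional hours and optional trailing settings.
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})(.*)"
)


def _to_ms(h, m, s, ms):
    """Convert timestamp fields to milliseconds; hours may be absent."""
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def cues_to_blocks(vtt_text):
    """Return a list of (start_ms, duration_ms, payload, settings) per cue."""
    blocks = []
    text = vtt_text.replace("\r\n", "\n").strip()
    for chunk in re.split(r"\n{2,}", text):
        lines = chunk.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if match:
                g = match.groups()
                start = _to_ms(*g[0:4])
                end = _to_ms(*g[4:8])
                payload = "\n".join(lines[i + 1:])
                blocks.append((start, end - start, payload, g[8].strip()))
                break  # chunks without a timing line (header, notes) are skipped
    return blocks
```

Cue identifiers are dropped in this sketch. A real mapping would have to decide whether to keep them.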

Silvia Pfeiffer

Jan 25, 2012, 9:25:41 PM

to webm-d...@webmproject.org

On Thu, Jan 26, 2012 at 2:38 AM, Basil Mohamed Gohar
<abu_hu...@hidayahonline.org> wrote:
<...>

>
> <key points>
> So, to summarize my concerns, it seems that the temporal metadata
> appears to be a novel idea, and there is nothing significantly
> standardized or pre-existing within the Matroska world to support it,
> and HTML5 already is standardizing on WebVTT, and there exists an
> encapsulation of WebVTT for Matroska already (I hope I got that right),
> so the way forward for this seems clear - use and/or build on the
> existing support for WebVTT in Matroska, in a way that is compliant with
> the spirit and the letter of the HTML5 standard for temporal metadata.
>
> As for so-called global and non-temporal metadata, I find (again, as a
> non-developer and a user only) no reason to not use Matroska's existing
> infrastructure for metadata/tag support. It already works, and doing
> something else seems foolish. On top of that, Matroska *does* seem to
> have published some guidelines as to a standard set of metadata, and we
> should take as much as possible from that, so as to take advantage of
> existing infrastructure.
> </key points>
>
> Could someone better in the know point out to me what, if anything,
> seems faulty in the above two paragraphs that needs to be addressed, or
> is a case of "devil in the details", and we all agree on the basic idea
> of not reinventing the wheel, just making sure it's round enough? :)

So, there are several things at work here:

Firstly we want to make sure WebM has the best possible solution both
for global and timed metadata.
Secondly we want to make sure that it works both with the Web and with
Desktop applications.
Thirdly it would be nice to be able to re-use existing tools that had
been built for Matroska also for WebM.

I must admit that I am not completely opposed to reinventing the wheel
if that means we get a better wheel. I.e. I would take the first goal
over the third goal. However, it has to be very clear that it is
indeed a better solution. For example, if WebM had a much better way
of storing global metadata than the existing Matroska way, I would go
for it - since we want to look forward to the future with WebM and not
back.

Now, what criteria would we use for deciding whether one thing is
better than another? I have some that I care about - do add your own,
cause I'm sure I've overlooked a few.

I believe a good solution must satisfy the following:
* it works well with what HTML5 has specified
* it does not conflict with how existing tools work for Matroska
* the way to use it is obvious: e.g. global metadata is not used for
the same purpose as timed metadata
* it does not confuse people as to what to do: e.g. there is only one
way of doing global metadata

I can't think of anything else right now, so do add your own.

So, back to the technical discussion about global/multiplexed metadata:

* I agree that WebVTT's timed metadata mechanism is the way to go for
timed metadata in WebM, simply because that's how we do it in HTML5
and there is no mechanism for that in Matroska yet. Thus, as we
determine how to encapsulate captions, subtitles and descriptions in
WebM, we should encapsulate WebVTT metadata in exactly the same way
and use that as the solution for timed metadata in WebM.

* Global metadata is a different issue: it really depends on what we
want to be able to handle.

Is the global metadata that we want to support per track? If so, then
it relates to more than just the text tracks and needs to also be able
to be carried for audio and video tracks. In Ogg we do this with
skeleton by having message header fields, and also by having
VorbisComment headers on each track. This is useful because as you rip
out certain tracks from a multitrack resource, you retain the metadata
that relates to the individual track.

Or is it really global metadata that we want to support here? Then it
should logically be carried independently of a track (independently of
WebVTT tracks, too, which can have their own separate metadata). In
Ogg we don't actually have a means for such global metadata.

Matroska's existing metadata (tagging) mechanism is very general: it
allows tags to be attached to a track, to the full resource, to a
chapter, to an edition, or to an attachment (IIUC). This may be too
generic for us to support in WebM - in particular it seems to allow
mixing track-based metadata with truly global metadata, which may be
too confusing.

So, overall, I don't think we've properly specified the problem yet
that we want to solve.

Also note that in HTML5 we actually don't have a way to expose global
metadata to JavaScript (not truly global, not track-based global
metadata): there is no API. The only thing that HTML5 knows right now
is timed metadata through WebVTT. I would like to see such an API,
which would also be used to e.g. expose ID3 tags for MPEG-based
content to JavaScript, but I don't have high hopes for it because
there also is no API to expose image metadata (e.g. EXIF) to
JavaScript. The point that has been made in the past is that if
JavaScript needs access to such metadata, the server should extract it
from the file and hand it to the Web app rather than JS doing it
itself.

Cheers,
Silvia.

James Zern

Jan 26, 2012, 2:48:52 AM

to webm-d...@webmproject.org

A thread regarding WebVTT encapsulation was started here [1].

[1]: https://groups.google.com/a/webmproject.org/group/webm-discuss/browse_thread/thread/41d8d9db7c0c2e85#

Frank Galligan

Jan 26, 2012, 8:46:37 AM

to webm-d...@webmproject.org

Hi Basil,

I don't think there is anything wrong with what you are saying. I think most of us go out of our way to make sure any new features are compatible with what is out there. I know when I'm looking at new features I always think about how they are going to affect current tools, and I regularly shoot down ideas that would force a change to existing muxers/demuxers if there is a viable alternative. With that being said, sometimes you need to look at ideas that might change some things if you think in the long run the benefit will outweigh the pain. This thread was started as an RFC to see what people thought of the idea of treating all metadata as temporal. This wasn't to benefit the web, but to benefit all implementations, as they wouldn't have to treat metadata separately, since all data is temporal in media files (global being from start to duration). So far pretty much everyone has said they would not like to make this change.

I pretty much agree with your point that if a feature that people want implemented in WebM can already be achieved in Matroska we should look there first. But that doesn't mean we should use that feature without looking to make sure that is the best route long term for all groups. (Personally I look at current Matroska tools and clients, HTML5 browsers, and other video tools and clients.) So yes I don't want to reinvent the wheel, unless it is not round enough.

It looks like pretty much most people want separate metadata, global and temporal, and most people have said they would like to use Matroska Tags for the global metadata. How do people use MKV tags today? Are there any deficiencies with the tags spec?

Frank

Frank Galligan

Jan 26, 2012, 9:10:59 AM

to webm-d...@webmproject.org

When defining global metadata I think we should go with what most people think of. Most people think of global metadata as "title", "creator", "license", ... These are tied to the file. I do understand that if you tie metadata to the track you satisfy the requirements of some smaller use cases (e.g. adding a music track to a video sequence), but for most use cases you are going to be adding complexity and rules to define how global metadata should be resolved.

I think we should define global metadata as:

* Pertaining to the duration of the file (or live stream). I.e. not to individual streams.

* Not temporal.

Anything else anyone want to add?

Frank

Basil Mohamed Gohar

Jan 26, 2012, 9:18:33 AM

to webm-d...@webmproject.org

On 01/26/2012 09:10 AM, Frank Galligan wrote:

When defining global metadata I think we should go with what most people think of. Most people think of global metadata as "title", "creator", "license", ... These are tied to the file. I do understand that if you tie metadata to the track you satisfy the requirements of some smaller use cases (e.g. adding a music track to a video sequence), but for most use cases you are going to be adding complexity and rules to define how global metadata should be resolved.

I believe this would cover the vast majority of cases and is also what people expect, because, at least in my experience, most metadata/tags in multimedia files are treated globally, and applied to all content of the file as a whole, as opposed to individual tracks. I believe the same is true for any multimedia used in a streaming or Web context (e.g., YouTube videos, streaming media, etc.), which could be considered some of the possible new uses of WebM that were not so widespread before in a standard format.

I think we should define global metadata as:

* Pertaining to the duration of the file (or live stream). I.e. not to individual streams.

* Not temporal.

Anything else anyone want to add?

I think this sounds fine. Now, to be "that guy", I want to ask whether this means there will basically be no way left to add useful and/or equivalent metadata on a track-by-track basis (e.g., maybe you do want to label a secondary audio track as a translation and attribute it to the translator as "creator"). Is that something that would be completely ruled out? I am assuming that such a feature is already supported by Matroska.

I think this is a minor feature (i.e., being able to add metadata to tracks individually), but it would be nice to have it be *possible*, if not required. This way, pretty-much all options are there, and if an implementation does not want to bother with track-by-track metadata, then there'll be no blame on them, as long as they're still properly supporting the global metadata (which would apply to the file as a whole).

Frank Galligan

Jan 26, 2012, 10:14:33 AM

to webm-d...@webmproject.org

I think what you are asking for basically boils down to a specific subset of non-standard global metadata.

Like I said before I think the high-level process to define metadata in WebM should be:

1. Define what metadata is. I.E. global vs temporal, global definition, ...

2. Specify how metadata is stored. I.E. MKV tags, xmp, ...

3. Standardize a set of the metadata vocabulary.

Currently for 2. I have only heard of Tags, and some people mention XMP. Both have support for track-specific metadata: MKV tags have the TrackUID element, and XMP has pantries.

So I think what you want to do, you could do two different ways (unless someone comes up with another solution for #2 that everyone decides they want to use).

The first way would be to use the unsupported metadata implementation (assuming most people want to keep global metadata to the file only) that supports track specific metadata. e.g. for MKV tags you would set the TrackUID for your metadata. I think this will work for most muxers/demuxers but there could be some muxers/demuxers out there that would treat this file as invalid or strip the data.

The second way would be to use the supported metadata implementation with non-standard metadata. So basically you could define your names of the metadata like "audio track 1 creator", "audio 2 creator", "audio 3 creator", ...

I would vote to use the second method as it will be guaranteed to work with muxers/demuxers that implement the WebM metadata standard, even if the base standard has support for the first method.
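
To make the two ways concrete, here is a minimal Python sketch that builds a small XML description of each. The element names (Tags, Tag, Targets, TrackUID, Simple, Name, String) are modelled on the XML tag files used by Matroska tooling and should be treated as an assumption rather than a WebM definition; the function names, the track UID and the tag values are purely illustrative.

```python
import xml.etree.ElementTree as ET


def track_scoped_tag(track_uid: int, name: str, value: str) -> str:
    """First way: the tag's target names one track explicitly."""
    tags = ET.Element("Tags")
    tag = ET.SubElement(tags, "Tag")
    targets = ET.SubElement(tag, "Targets")
    ET.SubElement(targets, "TrackUID").text = str(track_uid)
    simple = ET.SubElement(tag, "Simple")
    ET.SubElement(simple, "Name").text = name
    ET.SubElement(simple, "String").text = value
    return ET.tostring(tags, encoding="unicode")


def file_scoped_tag(track_label: str, name: str, value: str) -> str:
    """Second way: a file-level tag with the track folded into the name."""
    tags = ET.Element("Tags")
    tag = ET.SubElement(tags, "Tag")
    ET.SubElement(tag, "Targets")  # empty target: applies to the whole file
    simple = ET.SubElement(tag, "Simple")
    ET.SubElement(simple, "Name").text = f"{track_label} {name}"
    ET.SubElement(simple, "String").text = value
    return ET.tostring(tags, encoding="unicode")


if __name__ == "__main__":
    print(track_scoped_tag(1234, "creator", "John Smith"))
    print(file_scoped_tag("audio track 1", "creator", "John Smith"))
```

In the first form a reader that does not understand track targets has to decide what to do with the tag; in the second form every reader sees an ordinary file-level tag, at the cost of a name that only the producing application can interpret.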

Frank

Silvia Pfeiffer

Jan 26, 2012, 4:25:09 PM

to webm-d...@webmproject.org, webm-d...@webmproject.org

This is a lot harder to parse with tools than the structured approach where the track is called out explicitly in a field. Why not use the field when it already exists? Have you come across tools that ignore that field for MKV files?

Silvia.

Frank Galligan

Jan 26, 2012, 11:18:31 PM

to webm-d...@webmproject.org

On Thu, Jan 26, 2012 at 4:25 PM, Silvia Pfeiffer <silviap...@gmail.com> wrote:

This is a lot harder to parse with tools than the structured approach where the track is called out explicitly in a field.

I don't understand. What is a lot harder to parse?

Why not use the field when it already exists?

Just because something exists doesn't necessarily mean it is a good idea to use it.

If we think it is a good idea to allow tracks to have metadata separate from the file metadata, that adds complexity to the whole system. E.g. take your example of adding an API to HTML5 to get the metadata. With only file metadata the API might just return a list of name-value pairs. With file and track metadata the API must have a way of distinguishing the file metadata from the metadata of each track that may have it. The documentation becomes more complex for developers/users as now they need to know what a media file is and what a stream/track is. We may also have to define precedence rules. E.g. what happens if the file and tracks define different licenses?

Also it is much easier to add a feature to a spec or API later than it is to deprecate one.

Have you come across tools that ignore that field for MKV files?

No, but most of my experience has been with WebM tools (tags aren't in spec) and FFmpeg which has support for it. I haven't come across any MKV files that had track specific data either, or maybe I have but the client app didn't have a track specific metadata UI. If you know of any I would like to check them out.

Frank

Silvia Pfeiffer

Jan 27, 2012, 2:39:59 AM

to webm-d...@webmproject.org

On Fri, Jan 27, 2012 at 3:18 PM, Frank Galligan <fgal...@google.com> wrote:
> On Thu, Jan 26, 2012 at 4:25 PM, Silvia Pfeiffer <silviap...@gmail.com>
> wrote:
>>
>> This is a lot harder to parse with tools than the structured approach
>> where the track is called out explicitly in a field.
>
> I don't understand. What is a lot harder to parse?

If we bury the fact that a certain name-value metadata pair applies to
a track rather than the full file in the name of the name-value pair,
then it becomes difficult to determine what that name-value pair
applies to. For example:

name="audio track 1 creator"
value="John Smith"

rather than

name="creator"
value="John Smith"
track="audio track 1"

I was under the impression that you were suggesting the first option.
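
A small Python sketch of the parsing difference, assuming tags are held as plain dictionaries. The naming pattern the flat parser guesses at ("<kind> track <n> <field>") is hypothetical; nothing in this thread defines one, which is exactly the problem.

```python
import re

# Structured form: the track is called out in its own field.
STRUCTURED = {"name": "creator", "value": "John Smith", "track": "audio track 1"}

# Flat form: the track is buried in the name.
FLAT = {"name": "audio track 1 creator", "value": "John Smith"}

# Hypothetical naming convention a tool would have to guess.
FLAT_PATTERN = re.compile(r"(audio|video) track (\d+) (.+)")


def target_of_structured(tag):
    """Return (track, field); track is None for file-level tags."""
    return tag.get("track"), tag["name"]


def target_of_flat(tag):
    """Return (track, field), falling back to file level when the guess fails."""
    match = FLAT_PATTERN.fullmatch(tag["name"])
    if match is None:
        return None, tag["name"]
    kind, number, field = match.groups()
    return f"{kind} track {number}", field


if __name__ == "__main__":
    print(target_of_structured(STRUCTURED))
    print(target_of_flat(FLAT))
    # A slightly different spelling silently becomes a file-level tag.
    print(target_of_flat({"name": "audio 2 creator", "value": "Jane Doe"}))
```

The structured lookup never inspects the name, while the flat lookup depends on every producer spelling the name the same way.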

>> Why not use the field when it already exists?
>
> Just because something exists doesn't necessarily mean it is a good idea to
> use it.

Sure.

> If we think it is a good idea to allow Tracks to have separate metadata from
> themselves and the file metadata that adds complexity to the whole system.

Sure, I was replying under the assumption that we wanted to support it
and your suggestion was to include it in the "name" field to which I
object. If we don't want to support it, then sure we don't need this.
But in this case we also should not try to support a hack for
track-related metadata.

> E.g. if you take your example of adding an API to HTML5 to get the metadata.
> With only file metadata the api might just return a list of name value
> pairs. With file and tracks metadata the api must have a way of
> distinguishing the file metadata vs all of the tracks that may have
> metadata.

Right. There is one more field to parse. And that field already exists
in the Matroska Tags.

We could certainly instead decide not to support that extra field and
instead expect the tracks themselves to take care of their metadata.
For example, text tracks in WebVTT may have metadata header fields
that will be encoded in the CodePrivateData as per the spec proposal.
That's another option for supporting track-related metadata. IIUC
VorbisComment headers are already included in Vorbis tracks. So,
track-related metadata would be another parsing step away, but it
would be possible already.

> The documentation becomes more complex for developers/users as now
> they need to know what a media file is and what a stream/track is. We may
> also have to define precedence rules. E.g. what happens if the file and
> tracks define different licenses?

It's actually already a legal problem that a combination of content
can have a different license to the individual components. This would
be a way to represent the actual legal situation.

> Also it is much easier to add a feature to a spec or API later than it is to
> deprecate one.

Yes, sure. I didn't want to say that we have to do it. I was just
objecting to the proposed solution.

We should probably make a collection of Matroska files that have
metadata and inspect them to see what is currently done. I'm pretty
sure VLC supports it, but not sure to what extent.

Cheers,
Silvia.

Frank Galligan

Jan 27, 2012, 9:29:04 AM

to webm-d...@webmproject.org

Hi Silvia,

CIL

On Fri, Jan 27, 2012 at 2:39 AM, Silvia Pfeiffer <silviap...@gmail.com> wrote:

On Fri, Jan 27, 2012 at 3:18 PM, Frank Galligan <fgal...@google.com> wrote:
> On Thu, Jan 26, 2012 at 4:25 PM, Silvia Pfeiffer <silviap...@gmail.com>
> wrote:
>>
>> This is a lot harder to parse with tools than the structured approach
>> where the track is called out explicitly in a field.
>
> I don't understand. What is a lot harder to parse?

If we bury the fact that a certain name-value metadata pair applies to
a track rather than the full file in the name of the name-value pair,
then it becomes difficult to determine what that name-value pair
applies to. For example:

name="audio track 1 creator"
value="John Smith"

rather than

name="creator"
value="John Smith"
track="audio track 1"

I was under the impression that you were suggesting the first option.

Yes I was, but I wrote "assuming that we didn't want to support track metadata". And that Basil wanted to add metadata that was not standardized. The metadata would only be useful for that solution, so it only matters what Basil wants to put in for the name.

If the WebM specification supports track specific metadata then the latter option would be the easy choice (well track would probably be just a track number or id) whether we standardize "creator" or not.

>> Why not use the field when it already exists?
>
> Just because something exists doesn't necessarily mean it is a good idea to
> use it.

Sure.

> If we think it is a good idea to allow Tracks to have separate metadata from
> themselves and the file metadata that adds complexity to the whole system.

Sure, I was replying under the assumption that we wanted to support it
and your suggestion was to include it in the "name" field to which I
object. If we don't want to support it, then sure we don't need this.
But in this case we also should not try to support a hack for
track-related metadata.

I don't think it is too much of a hack if we decide not to support track metadata in the WebM spec, well at least as the spec is concerned. I agree it would be a hack wrt Basil's solution. But to the webm spec it would just be non-standard file metadata.

This brings up another point which we will need to decide about the spec. Do we want the WebM spec to support non-standard metadata? I have just always assumed yes, but I guess it doesn't hurt to ask.

> E.g. if you take your example of adding an API to HTML5 to get the metadata.
> With only file metadata the api might just return a list of name value
> pairs. With file and tracks metadata the api must have a way of
> distinguishing the file metadata vs all of the tracks that may have
> metadata.

Right. There is one more field to parse. And that field already exists
in the Matroska Tags.

We could certainly instead decide not to support that extra field and
instead expect the tracks themselves to take care of their metadata.
For example, text tracks in WebVTT may have metadata header fields
that will be encoded in the CodePrivateData as per the spec proposal.
That's another option for supporting track-related metadata.

Not really, as we are proposing that WebVTT data will be contained in its own WebM track. Right now we don't have any facility tying a WebVTT WebM track to another WebM track. But I would also argue that you do not want to do this. I don't think today there is any facility tying a WebVTT file to a specific stream in a file.

IIUC
VorbisComment headers are already included in Vorbis tracks. So,
track-related metadata would be another parsing step away, but it
would be possible already.

I would argue against this too for supporting track metadata wrt the WebM spec. One, it would be specific to one type of stream. Two, it would add more complexity as file global metadata would be stored differently than Vorbis track metadata. Then if the WebM spec were to add support for track-specific metadata for streams you would have to put reconciliation rules into effect too.

Of course specific solutions can use this data already, but it is not supported, so the solution would do so at its own risk.

> The documentation becomes more complex for developers/users as now
> they need to know what a media file is and what a stream/track is. We may
> also have to define precedence rules. E.g. what happens if the file and
> tracks define different licenses?

It's actually already a legal problem that a combination of content
can have a different license to the individual components. This would
be a way to represent the actual legal situation.

So here might be a pro for track-specific metadata. Would it resolve the problem you mention? Maybe. Assuming it did, the next question is how often does this use case occur?

I really wouldn't want something in a specification that everyone is going to have to support if it is only useful 0.1% of the time.

> Also it is much easier to add a feature to a spec or API later than it is to
> deprecate one.

Yes, sure. I didn't want to say that we have to do it. I was just
objecting to the proposed solution.

I know.

We should probably make a collection of Matroska files that have
metadata and inspect them to see what is currently done.

Good idea.

Anyone using MKV with Tags today that they would like to share? Might not be a lot of people on a WebM specific list but if you know people using Matroska please ask them.

I'm pretty
sure VLC supports it, but not sure to what extent.

I was using VLC to inspect/add metadata, but their UI only had support for global file metadata.

Ralph Giles

Jan 27, 2012, 6:09:17 PM

to webm-d...@webmproject.org

On 27 January 2012 20:39, Silvia Pfeiffer <silviap...@gmail.com> wrote:

> For example, text tracks in WebVTT may have metadata header fields
> that will be encoded in the CodePrivateData as per the spec proposal.

Do you have a link to this proposal?

http://matroska.org/technical/specs/subtitles/srt.html doesn't put
anything in CodecPrivate, but that would be the logical place to store
file header metadata of the sort we've discussed adding to WebVTT.

> That's another option for supporting track-related metadata. IIUC
> VorbisComment headers are already included in Vorbis tracks. So,
> track-related metadata would be another parsing step away, but it
> would be possible already.

This is an excellent point. We should at least offer guidelines about
this. For example, an implementation MAY support both in-stream
metadata like the vorbis comment packet in CodecPrivate and Matroska
tag elements, but the Tag element takes precedence.

In general, I think track-specific metadata is useful for recording
distinguishing information about a track, like the language, kind,
specific authorship information (e.g. translator) and the license if
it differs from the overall work. Otherwise global metadata should be
preferred.
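
One way to read that guideline, sketched in Python: start from the global tags, let in-stream comments refine them for a given track, and let Matroska Tag elements targeted at that track win over the in-stream comments. The layering of track values over global ones is an assumption made for this illustration, and the function name and dictionary layout are hypothetical.

```python
def effective_track_metadata(global_tags, in_stream_comments, track_tags):
    """Merge three metadata sources for one track; later sources win.

    Field names are upper-cased before merging because Vorbis comment
    field names are case-insensitive.
    """
    merged = {}
    for source in (global_tags, in_stream_comments, track_tags):
        for name, value in source.items():
            merged[name.upper()] = value
    return merged


if __name__ == "__main__":
    file_level = {"TITLE": "Bike ride", "LICENSE": "CC-BY"}
    vorbis_comment = {"artist": "Unknown", "language": "en"}
    matroska_track_tag = {"ARTIST": "John Smith"}
    print(effective_track_metadata(file_level, vorbis_comment, matroska_track_tag))
```

Reversing the order of the sources gives the opposite policy, where global values overrule track values, so whichever rule a spec chooses is a one-line decision for an implementation once it is written down.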

Silvia Pfeiffer

Jan 27, 2012, 6:23:29 PM

to webm-d...@webmproject.org

On Sat, Jan 28, 2012 at 10:09 AM, Ralph Giles <gi...@xiph.org> wrote:
> On 27 January 2012 20:39, Silvia Pfeiffer <silviap...@gmail.com> wrote:
>
>> For example, text tracks in WebVTT may have metadata header fields
>> that will be encoded in the CodePrivateData as per the spec proposal.
>
> Do you have a link to this proposal?

Matthew sent it on a separate thread to webm-discuss:
https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit

Cheers,
Silvia.

Silvia Pfeiffer

Jan 27, 2012, 6:46:06 PM

to webm-d...@webmproject.org

Hi Frank, all,

On Sat, Jan 28, 2012 at 1:29 AM, Frank Galligan <fgal...@google.com> wrote:
> On Fri, Jan 27, 2012 at 2:39 AM, Silvia Pfeiffer <silviap...@gmail.com>
> wrote:
>
> This brings up another point which we will need to decide about the spec. Do
> we want the WebM spec to support non-standard metadata? I have just always
> assumed yes, but I guess it doesn't hurt to ask.

Yes, I agree we should.

>> We could certainly instead decide not to support that extra field and
>> instead expect the tracks themselves to take care of their metadata.
>> For example, text tracks in WebVTT may have metadata header fields
>> that will be encoded in the CodePrivateData as per the spec proposal.
>> That's another option for supporting track-related metadata.
>
> Not really as we are proposing WebVTT data will be contained in their own
> WebM track. Right now we don't have any facility tying a WebVTT WebM track
> to another WebM track.

That was not what I meant: I didn't think that a WebVTT track would
contain metadata about other tracks. Just about itself.

>> IIUC
>> VorbisComment headers are already included in Vorbis tracks. So,
>> track-related metadata would be another parsing step away, but it
>> would be possible already.
>
> I would argue against this too for supporting track metadata wrt the WebM
> spec. One it would be specific to one type of stream. Two it would add more
> complexity as file global metadata would be stored differently than vorbis
> track metadata. Then if the WebM spec were to add support for
> track specific metadata for streams you would have to
> put reconciliation rules into effect too.

We need those rules already. What if a WebVTT file has metadata and
the WebM file has global metadata: which wins? I would think we want
the global ones to overrule the track ones.

>> It's actually already a legal problem that a combination of content
>> can have a different license to the individual components. This would
>> be a way to represent the actual legal situation.
>
> So here might be a pro for track specific metadata. Would it resolve the
> problem you mention? Maybe. Assume it did, next question is how often
> does this use case occur?
>
> I really wouldn't want something in a specification that everyone is going
> to have to support if it is only useful 0.1% of the time.

Considering the number of mashups online I think the problem is bigger
than 0.1%. But whether we want to solve it in this way is a different
question.

>> We should probably make a collection of Matroska files that have
>> metadata and inspect them to see what is currently done.
>
> Good idea.
>
> Anyone using MKV with Tags today that they would like to share? Might not be
> a lot of people on a WebM specific list but if you know people using
> Matroska please ask them.

I've done a Google search for "filetype:mkv" which takes you to a lot
of dodgy sites. So I am not sure that's the best way to discover
files. Are there no collections of mkv files for validation of format?
Also, is there a site where we could upload such files? Should we
start a collection?

Cheers,
Silvia.

Ralph Giles

Jan 27, 2012, 7:05:42 PM

to webm-d...@webmproject.org

On 28 January 2012 12:09, Ralph Giles <gi...@xiph.org> wrote:

>> For example, text tracks in WebVTT may have metadata header fields
>> that will be encoded in the CodePrivateData as per the spec proposal.
>
> Do you have a link to this proposal?

Having recently looked at Matroska for the Opus mapping, I wrote a
proposal for adding WebVTT to WebM at
https://wiki.xiph.org/MatroskaWebVTT#DRAFT

Comments welcome.

James Zern

Jan 27, 2012, 10:33:48 PM

to webm-d...@webmproject.org

On Fri, Jan 27, 2012 at 16:05, Ralph Giles <gi...@xiph.org> wrote:
> On 28 January 2012 12:09, Ralph Giles <gi...@xiph.org> wrote:
>
>>> For example, text tracks in WebVTT may have metadata header fields
>>> that will be encoded in the CodePrivateData as per the spec proposal.
>>
>> Do you have a link to this proposal?
>
> Having recently looked at Matroska for the Opus mapping, I wrote a
> proposal for adding WebVTT to WebM at
> https://wiki.xiph.org/MatroskaWebVTT#DRAFT
>
> Comments welcome.
>

"- CodecID is S_WEBVTT"
"We need some way to signal the 'kind' attribute from the html5
embedding. That is, whether a given track is subtitles, captions,
description, or metadata."
See Matt's proposal [1], S_TEXT/VTT/kind.

We should merge any differences between these and work from one
source. I believe Matt was working on moving the doc to a wiki on
webmproject.

[1]: https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit

Frank Galligan

Jan 27, 2012, 11:24:31 PM

to webm-d...@webmproject.org

I think we only have to define it for standardized metadata. I think WebVTT metadata would be used for temporal. I could see where we standardize on a GPS location for global. (e.g. Like a picture, I took my video around here.) I also think we want to standardize on temporal GPS metadata.

Even though the global and temporal metadata are both standardized, are they really going to be competing with each other? In one use case the global value says where, in general, the file was created. In another use case you could follow along on the map for a bike ride. I'm not sure one really needs to take precedence in this case.

Maybe there will be a case where global and temporal metadata that is standardized will need to take priority. If that is the case then we can come up with rules to put in the spec.

>> It's actually already a legal problem that a combination of content
>> can have a different license to the individual components. This would
>> be a way to represent the actual legal situation.
>
> So here might be a pro for track specific metadata. Would it resolve the
> problem you mention? Maybe. Assume it did, next question is how often
> does this use case occur?
>
> I really wouldn't want something in a specification that everyone is going
> to have to support if it is only useful 0.1% of the time.

Considering the number of mashups online I think the problem is bigger
than 0.1%. But whether we want to solve it in this way is a different
question.

>> We should probably make a collection of Matroska files that have
>> metadata and inspect them to see what is currently done.
>
> Good idea.
>
> Anyone using MKV with Tags today that they would like to share? Might not be
> a lot of people on a WebM specific list but if you know people using
> Matroska please ask them.

I've done a Google search for "filetype:mkv" which takes you to a lot
of dodgy sites. So I am not sure that's the best way to discover
files.

Heh.

Are there no collections of mkv files for validation of format?

I'm sure there are some. I think FFmpeg used to have a bunch of different files. I'm not sure these collections would be the best to base decisions on. Usually they are files made to try out different parts of the specification, not files that people really use day-to-day.

Also, is there a site where we could upload such files?

Maybe we could put them up on webmproject.org?

Should we
start a collection?

Sure. But I still think we need to figure out how people are using Matroska Tags today.

Steve Lhomme

Jan 29, 2012, 10:12:33 AM

to webm-d...@webmproject.org

Hello everyone,

A lot of comments pop in my mind when reading this...

First of all, the current Matroska tags system is actually the 3rd
version we came up with. It was made simple and flexible as much as we
could, while being generic enough to support all we could think of. In
the case of WebM you probably don't need all the cases, but as for the
rest of the WebM specs, you can define a profile of what you use in that
system and what you don't. For example you don't need the AttachmentUID
target or maybe no Targets at all if you only want tags global to the
Segment.

Chained Matroska files (file concatenation) have tags in each segment,
so the tags remain clean from this operation. In a live stream a new
Segment could be created whenever the metadata need to change and the
Tags put at the front of the live stream.
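
A minimal Python sketch of the chained-segment idea, assuming each Segment carries its own tags and a start time on a shared stream timeline. The Segment class and the tags_at helper are hypothetical names for this illustration and are not part of any Matroska library.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # seconds from the beginning of the stream (assumed timeline)
    tags: dict = field(default_factory=dict)


def tags_at(segments, t):
    """Return the tags of the segment that covers time t.

    `segments` must be sorted by start time. Times before the first
    segment yield an empty dict.
    """
    starts = [segment.start for segment in segments]
    index = bisect_right(starts, t) - 1
    return segments[index].tags if index >= 0 else {}


if __name__ == "__main__":
    stream = [
        Segment(0.0, {"ARTIST": "First performer"}),
        Segment(120.0, {"ARTIST": "Second performer"}),
    ]
    print(tags_at(stream, 60.0))
    print(tags_at(stream, 120.0))
```

Concatenating files then amounts to appending segments to the list, and no merging of tag values is needed because each segment keeps its own.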

Temporal vs "global". I think the terminology is important here,
temporal tags like 'live' GPS info should be tracks, there is no
question about that. I am not sure this is the focus of this discussion
though. Even though a standard way to do it could be useful for some at
some point... I consider everything non temporal as a "global" tag,
because it needs to be extracted without parsing/playing the whole file.
Also if you want to give metadata for the first half of a video, you're
not going to repeat the tag info every other second in a track. That's
why Matroska Tags sit on top of Chapters when needed. And whatever
solution you pick in the end, it has to cover this use case.

I don't think the idea of having metadata outside of the content file is
a good idea. All tags could be lost if you move a file (to your phone)
and you forget the tag file (or the upload system only accepts certain
types of file). On the other hand it is a bit faster to parse a lot of
metadata for many files. On a server, for instance, you still need
to ask which tag file to download for a particular file. Not sure
the round trip helps much.

As you can see here, Tags can either be found at the front or at the very
bottom of the file: http://www.matroska.org/technical/order/index.html
I don't know if anyone is like me, but I tag my own files with custom
genres and grouping names. So the most common case for me would be that
tags are at the back of the file.

WebM could be bold and force tags to be always at the front. Meaning a
big remux is necessary whenever you modify tags a lot. Among
requirements it should also be mandatory to put the tag name before the
tag value. Also the TagSimple may not be recursive (can be used to add
the Twitter/email info about the artist in that tag, for example).
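
To illustrate the recursion mentioned above, here is a small Python sketch of a nested simple tag and a helper that flattens it into path-like names. The SimpleTag class, the "/" separator and the example values are assumptions made for this illustration; they do not describe the Matroska element layout.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleTag:
    name: str  # kept before the value, as suggested above
    value: str
    children: list = field(default_factory=list)


def flatten(tag, prefix=""):
    """Return (path, value) pairs for a tag and all of its nested tags."""
    path = f"{prefix}/{tag.name}" if prefix else tag.name
    rows = [(path, tag.value)]
    for child in tag.children:
        rows.extend(flatten(child, path))
    return rows


if __name__ == "__main__":
    artist = SimpleTag(
        "ARTIST",
        "John Smith",
        [SimpleTag("EMAIL", "john@example.org")],
    )
    for path, value in flatten(artist):
        print(path, "=", value)
```

A profile that forbids nesting only ever sees the top-level pair, so a reader written for the flat case can ignore `children` and still show something meaningful.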

Matroska tags have free name strings, that's so they can easily be
extended and still be meaningful when displayed to the user, even if the
name is not semantically interpreted. I think any good tag system should
have this kind of extension possible.

Steve

Steve Lhomme

Jan 29, 2012, 10:14:34 AM

to webm-d...@webmproject.org

Le 28/01/2012 00:46, Silvia Pfeiffer a écrit :
>> Anyone using MKV with Tags today that they would like to share? Might not be
>> a lot of people on a WebM specific list but if you know people using
>> Matroska please ask them.
>
> I've done a Google search for "filetype:mkv" which takes you to a lot
> of dodgy sites. So I am not sure that's the best way to discover
> files. Are there no collections of mkv files for validation of format?
> Also, is there a site where we could upload such files? Should we
> start a collection?

All the files here are properly tagged (although not using any advanced
features of Matroska tags) :
http://www.matroska.org/downloads/test_w1.html

Steve Lhomme

Jan 29, 2012, 10:22:15 AM

to webm-d...@webmproject.org

I am not sure about the codec ID S_TEXT/VTT/kind
"S_" stands for subtitles.

At some point we had something called 'control tracks', such would have
had a "C_" prefix. Maybe we could use a "D_" prefix for all non video,
audio and subtitle data. D is for data. I also think it is wrong to use
the type 0x11 for all these WebVTT data as they are not subtitles.

If WebVTT subtitles are just UTF-8 strings, then the regular
S_TEXT/UTF-8 codec ID should be used for broader compatibility.

Chapters inside a track are not very useful. And using CodecPrivate will
completely break after a remuxing/editing.

Basil Mohamed Gohar

Jan 29, 2012, 11:37:23 AM

to webm-d...@webmproject.org

On 01/29/2012 10:12 AM, Steve Lhomme wrote:
> Hello everyone,
Hello, Steve. Glad to see participation from the Matroska team on this
specific discussion. :)

>
> A lot of comments pop in my mind when reading this...
>
> First of all, the current Matroska tags system is actually the 3rd
> version we came up with. It was made simple and flexible as much as we
> could, while being generic enough to support all we could think of. In
> the case of WebM you probably don't need all the cases, but as for the
> rest of the WebM specs, you can define a profile of what you use in
> that system and what you don't. For example you don't need the
> AttachmentUID target or maybe no Targets at all if you only want tags
> global to the Segment.
>
> Chained Matroska files (file concatenation) have tags in each segment,
> so the tags remain clean from this operation. In a live stream a new
> Segment could be created whenever the metadata need to change and the
> Tags put at the front of the live stream.

Does this mean that the way to update metadata for a live stream (e.g.,
title, creator/performer, etc.) is to place tags at the front of a new
segment within the stream? Is this a behavior that's already supported
by some/most players? If I'm not mistaken, I believe this is how it
works with Ogg as well.

>
> Temporal vs "global". I think the terminology is important here,
> temporal tags like 'live' GPS info should be tracks, there is no
> question about that. I am not sure this is the focus of this
> discussion though. Even though a standard way to do it could be useful
> for some at some point... I consider everything non temporal as a
> "global" tag, because it needs to be extracted without parsing/playing
> the whole file. Also if you want to give metadata for the first half
> of a video, you're not going to repeat the tag info every other second
> in a track. That's why Matroska Tags sit on top of Chapters when
> needed. And whatever solution you pick in the end, it has to cover
> this use case.

This is a compelling argument to make "data" that changes regularly
throughout the file be a track, as opposed to just metadata. I think
the distinction is do we consider it data in and of itself, or do we
look at it solely as data that is describing other data, and we're not
interested in it alone. Given your stance, I tend to agree with that
view point. Why not just make data that's changing throughout the life
of the file be its own track? Then it will have all the timing
information we could want. Since the "temporal metadata" would still
have to be spread across the file anyway, and it wouldn't have been
placed at, say, the head of the file, this shouldn't really cause a lot
of complications.
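
For the other half of the distinction (metadata that covers a span of the file without being repeated in a track), tags scoped to chapters can be sketched as a lookup. This is a minimal illustration assuming chapters are reduced to millisecond ranges; `tags_at` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# (start_ms, end_ms, tags that apply only inside that chapter)
ChapterTags = Tuple[int, int, Dict[str, str]]


def tags_at(time_ms: int,
            global_tags: Dict[str, str],
            chapter_tags: List[ChapterTags]) -> Dict[str, str]:
    """Resolve the tags in effect at a playback time.

    Global tags apply everywhere; a chapter's tags override them while the
    time falls inside that chapter (the override rule is an assumption).
    """
    resolved = dict(global_tags)
    for start_ms, end_ms, tags in chapter_tags:
        if start_ms <= time_ms < end_ms:
            resolved.update(tags)
    return resolved


global_tags = {"TITLE": "Concert", "ARTIST": "Various"}
chapters = [
    (0, 60_000, {"ARTIST": "First Act"}),
    (60_000, 120_000, {"ARTIST": "Second Act"}),
]
print(tags_at(30_000, global_tags, chapters))
print(tags_at(90_000, global_tags, chapters))
```

Everything needed for the lookup sits in two small tables, so it can be read up front without parsing or playing the whole file.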

>
> I don't think the idea of having metadata outside of the content file
> is a good idea. All tags could be lost if you move a file (to your
> phone) and you forget the tag file (or the upload system only accepts
> certain types of file). On the other hand it is a bit faster to parse
> a lot of metadata for many files. On a server, for instance, you
> still need to ask which is the tag file to download for a particular
> file. Not sure the round trip helps much.

Personally, I totally agree with this. I know that there's been some
discussion elsewhere about putting associated information (I think it
was subtitles?) in a separate file, and that is for ease of parsing by
Javascript, I think. However, it would be nice if WebM itself supported
such metadata inside the file, and not require it to be outside.

>
> As you can see here, Tags can be found either at the front or at the very
> bottom of the file: http://www.matroska.org/technical/order/index.html
> I don't know if anyone is like me, but I tag my own files with custom
> genres and grouping names. So the most common case for me would be
> that tags are at the back of the file.
>
> WebM could be bold and force tags to be always at the front. Meaning a
> big remux is necessary whenever you modify tags a lot. Among
> requirements it should also be mandatory to put the tag name before
> the tag value. Also the TagSimple may not be recursive (can be used to
> add the Twitter/email info about the artist in that tag, for example).

I think this doesn't matter so much. If I'm not mistaken, the general
way to deal with this is to just leave some extra space where additional
tags could be, so they could be updated. However, I think the remuxing
case isn't that annoying because it won't come up that often, and not
often enough to write the standard around it.

>
> Matroska tags have free name strings, that's so they can easily be
> extended and still be meaningful when displayed to the user, even if
> the name is not semantically interpreted. I think any good tag system
> should have this kind of extension possible.
>
> Steve
>

Thanks again!

Steve Lhomme

Jan 30, 2012, 4:13:04 AM

to webm-d...@webmproject.org

>> Chained Matroska files (file concatenation) have tags in each segment, so
>> the tags remain clean from this operation. In a live stream a new Segment
>> could be created whenever the metadata need to change and the Tags put at
>> the front of the live stream.
>
> Does this mean that the way to update metadata for a live stream (e.g.,
> title, creator/performer, etc.) is to place tags at the front of a new
> segment within the stream? Is this a behavior that's already supported by
> some/most players? If I'm not mistaken, I believe this is how it works with
> Ogg as well.

Not sure it's already supported, the most likely would be GStreamer.
But yes, that's how it should work. I am not sure players that can
handle live streams are OK with new Segments in the stream either.

>> WebM could be bold and force tags to be always at the front. Meaning a big
>> remux is necessary whenever you modify tags a lot. Among requirements it
>> should also be mandatory to put the tag name before the tag value. Also the
>> TagSimple may not be recursive (can be used to add the Twitter/email info
>> about the artist in that tag, for example).
>
> I think this doesn't matter so much. If I'm not mistaken, the general way
> to deal with this is to just leave some extra space where additional tags
> could be, so they could be updated. However, I think the remuxing case
> isn't that annoying because it won't come up that often, and not often
> enough to write the standard around it.

You have to consider cover art. This is usually a JPEG file that is
many KB large. Requiring 50 KB of padding at the front in case you
want to add cover art later is not a good idea. In Matroska the cover art is put in
Attachments (usually found at the back of the file, mkclean puts it at
the front). See http://www.matroska.org/technical/cover_art/index.html

--
Steve Lhomme
Matroska association Chairman

Frank Galligan

Jan 30, 2012, 10:20:54 AM

to webm-d...@webmproject.org

Hi Steve,

On live WebM global metadata: I think most people want to switch to some kind of adaptive solution for live, but changing global metadata could be added as a use case. As for how live streams are currently done, I guess if there was big enough support from the clients for chained segments or something like that, we could always add it.

At this point I don't think there are any plans to support global metadata outside of the file.

As for tags placement, I would probably leave it up to the muxer to decide where to put the tags, with a SeekHead offset pointing at them. Pretty much like Cues: let the use case dictate where the tags should be.

Frank


Frank Galligan

Jan 30, 2012, 10:23:23 AM

to webm-d...@webmproject.org

Thanks. Do you also know of any solutions using tags with track specific ids? I'm trying to come up with use cases.

Frank


Steve Lhomme

Jan 30, 2012, 10:59:09 AM

to webm-d...@webmproject.org

On Mon, Jan 30, 2012 at 4:20 PM, Frank Galligan <fgal...@google.com> wrote:
> Hi Steve,
>
> Live webm global metadata. I think most people want to switch to some kind
> of adaptive solution for live. But changing global metadata could be added
> as a use case. As for how currently live streams are done I guess if there
> was big enough support from the clients for supporting chained segments or
> something like that we could always add it.

I guess adaptive streaming is just like concatenating segments, except
they come from different URLs. Also for (live) adaptive streaming I
think the metadata are supposed to be available in the manifest file
(which is reloaded each time for a live stream).

Frank Galligan

Jan 30, 2012, 11:00:24 AM

to webm-d...@webmproject.org

Understood on the "S_...". I'm fine with changing this. Anyone else really want to use the "S_" for embedding WebVTT data?

As for chapters not being useful in tracks: I agree that WebVTT chapters feel like they should live in MKV chapters (actually this is where I first had them). But there is a use case that is not covered (we could change this but it would break a lot of current demuxers). Also, translating between WebVTT chapters and MKV chapters increases complexity in muxers/demuxers. In the end the goal of both kinds of chapters is to present the user with a list of name-to-time values. Whether it is stored in one or more tracks or in some global data, the demuxer should still be able to give that mapping to the client.

I do agree that remuxing/editing could break the CodecPrivate solution and should be added to the doc.

Frank

On Sun, Jan 29, 2012 at 10:22 AM, Steve Lhomme <slh...@matroska.org> wrote:

On 28/01/2012 00:23, Silvia Pfeiffer wrote:

On Sat, Jan 28, 2012 at 10:09 AM, Ralph Giles<gi...@xiph.org> wrote:

On 27 January 2012 20:39, Silvia Pfeiffer<silviapfeiffer1@gmail.com> wrote:

For example, text tracks in WebVTT may have metadata header fields
that will be encoded in the CodecPrivate data as per the spec proposal.

Do you have a link to this proposal?

Matthew sent it on a separate thread to webm-discuss:
https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit

I am not sure about the codec ID S_TEXT/VTT/kind
"S_" stands for subtitles.

At some point we had something called 'control tracks', such would have had a "C_" prefix. Maybe we could use a "D_" prefix for all non video, audio and subtitle data. D is for data. I also think it is wrong to use the type 0x11 for all these WebVTT data as they are not subtitles.

If WebVTT subtitles are just UTF-8 strings, then the regular S_TEXT/UTF-8 codec ID should be used for broader compatibility.

Chapters inside a track are not very useful. And using CodecPrivate will completely break after a remuxing/editing.


Steve Lhomme

Jan 30, 2012, 11:01:23 AM

to webm-d...@webmproject.org

I don't have a file but a use case could be the speaker in a
commentary audio track, whereas using the main track ID you would
describe metadata about the general audio track for a movie.
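
The commentary-track case can be sketched as a lookup keyed by target. This is a minimal illustration that assumes tags are reduced to `(target track UID or None, name, value)` tuples and that track-specific tags override segment-wide ones; the helper name, the UIDs and the override rule are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# (target_track_uid, name, value); a target of None means "whole segment".
Tag = Tuple[Optional[int], str, str]


def tags_for_track(tags: List[Tag], track_uid: int) -> Dict[str, str]:
    """Segment-wide tags first, then tags aimed at this track on top."""
    resolved = {name: value for target, name, value in tags if target is None}
    resolved.update(
        {name: value for target, name, value in tags if target == track_uid}
    )
    return resolved


tags = [
    (None, "TITLE", "Example Movie"),
    (None, "ARTIST", "Film Composer"),
    (2, "ARTIST", "Director (commentary)"),
]
print(tags_for_track(tags, 1))  # main audio track
print(tags_for_track(tags, 2))  # commentary track
```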

On Mon, Jan 30, 2012 at 4:23 PM, Frank Galligan <fgal...@google.com> wrote:
> Thanks. Do you also know of any solutions using tags with track specific
> ids? I'm trying to come up with use cases.
>
> Frank
>
>
> On Sun, Jan 29, 2012 at 10:14 AM, Steve Lhomme <slh...@matroska.org> wrote:
>>
>> On 28/01/2012 00:46, Silvia Pfeiffer wrote:
>>
>>>> Anyone using MKV with Tags today that they would like to share? Might
>>>> not be
>>>> a lot of people on a WebM specific list but if you know people using
>>>> Matroska please ask them.
>>>
>>>
>>> I've done a Google search for "filetype:mkv" which takes you to a lot
>>> of dodgy sites. So I am not sure that's the best way to discover
>>> files. Are there no collections of mkv files for validation of format?
>>> Also, is there a site where we could upload such files? Should we
>>> start a collection?
>>
>>
>> All the files here are properly tagged (although not using any advanced
>> features of Matroska tags) :
>> http://www.matroska.org/downloads/test_w1.html

Basil Mohamed Gohar

Jan 30, 2012, 11:03:39 AM

to webm-d...@webmproject.org

Perhaps this is a little silly of an example, but imagine a "make your
own subtitle" contest, where a clip is given many different subtitles by
contestants, or just for plain humor. Downfall, anyone...?

Frank Galligan

Jan 30, 2012, 11:08:47 AM

to webm-d...@webmproject.org

Hi Basil,

CIL

On Sun, Jan 29, 2012 at 11:37 AM, Basil Mohamed Gohar <abu_hu...@hidayahonline.org> wrote:

On 01/29/2012 10:12 AM, Steve Lhomme wrote:

Hello everyone,

Hello, Steve. Glad to see participation from the Matroska team on this specific discussion. :)

A lot of comments pop in my mind when reading this...

First of all, the current Matroska tags system is actually the 3rd version we came up with. It was made simple and flexible as much as we could, while being generic enough to support all we could think of. In the case of WebM you probably don't need all the cases, but as for the rest of the WebM specs, you can define a profile of what you use in that system and what you don't. For example you don't need the AttachmentUID target or maybe no Targets at all if you only want tags global to the Segment.

Chained Matroska files (file concatenation) have tags in each segment, so the tags remain clean from this operation. In a live stream a new Segment could be created whenever the metadata need to change and the Tags put at the front of the live stream.

Does this mean that the way to update metadata for a live stream (e.g., title, creator/performer, etc.) is to place tags at the front of a new segment within the stream? Is this a behavior that's already supported by some/most players? If I'm not mistaken, I believe this is how it works with Ogg as well.

I think chained Matroska would break a lot of players, but I don't know for sure. If someone wanted to make a table of player support that would be great.

Temporal vs "global". I think the terminology is important here, temporal tags like 'live' GPS info should be tracks, there is no question about that. I am not sure this is the focus of this discussion though. Even though a standard way to do it could be useful for some at some point... I consider everything non temporal as a "global" tag, because it needs to be extracted without parsing/playing the whole file. Also if you want to give metadata for the first half of a video, you're not going to repeat the tag info every other second in a track. That's why Matroska Tags sit on top of Chapters when needed. And whatever solution you pick in the end, it has to cover this use case.

This is a compelling argument to make "data" that changes regularly throughout the file be a track, as opposed to just metadata. I think the distinction is do we consider it data in and of itself, or do we look at it solely as data that is describing other data, and we're not interested in it alone. Given your stance, I tend to agree with that view point. Why not just make data that's changing throughout the life of the file be its own track? Then it will have all the timing information we could want. Since the "temporal metadata" would still have to be spread across the file anyway, and it wouldn't have been placed at, say, the head of the file, this shouldn't really cause a lot of complications.

I think this is what everyone is advocating.

I don't think the idea of having metadata outside of the content file is a good idea. All tags could be lost if you move a file (to your phone) and you forget the tag file (or the upload system only accepts certain types of file). On the other hand it is a bit faster to parse a lot of metadata for many files. On a server, for instance, you still need to ask which is the tag file to download for a particular file. Not sure the round trip helps much.

Personally, I totally agree with this. I know that there's been some discussion elsewhere about putting associated information (I think it was subtitles?) in a separate file, and that is for ease of parsing by Javascript, I think. However, it would be nice if WebM itself supported such metadata inside the file, and not require it to be outside.

That is what we are trying to do with the embedding WebVTT in WebM spec. Matt posted a doc here https://docs.google.com/a/google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit . I think he is trying to move it to a wiki.

As you can see here, Tags can be found either at the front or at the very bottom of the file: http://www.matroska.org/technical/order/index.html
I don't know if anyone is like me, but I tag my own files with custom genres and grouping names. So the most common case for me would be that tags are at the back of the file.

WebM could be bold and force tags to be always at the front. Meaning a big remux is necessary whenever you modify tags a lot. Among requirements it should also be mandatory to put the tag name before the tag value. Also the TagSimple may not be recursive (can be used to add the Twitter/email info about the artist in that tag, for example).

I think this doesn't matter so much. If I'm not mistaken, the general way to deal with this is to just leave some extra space where additional tags could be, so they could be updated. However, I think the remuxing case isn't that annoying because it won't come up that often, and not often enough to write the standard around it.

A muxer could have a void element afterwards, but I agree with you that remuxing because of tags shouldn't happen much in most use cases. If your solution changes global metadata a lot then you could always have the muxer reserve a lot of extra space.

Matroska tags have free name strings, that's so they can easily be extended and still be meaningful when displayed to the user, even if the name is not semantically interpreted. I think any good tag system should have this kind of extension possible.

Steve

Thanks again!


Frank Galligan

Jan 30, 2012, 11:19:06 AM

to webm-d...@webmproject.org

Ok, we can add this as a use case for track specific global metadata.

Basil Mohamed Gohar

Jan 30, 2012, 11:19:04 AM

to webm-d...@webmproject.org

On 01/30/2012 11:08 AM, Frank Galligan wrote:
> Hi Basil,
>
> CIL
Likewise. :)

>
>
> On Sun, Jan 29, 2012 at 11:37 AM, Basil Mohamed Gohar
> <abu_hu...@hidayahonline.org

If testing this is as simple as just concatenating a bunch of Matroska
files, then I'm willing to do so for at least the following players (on
Fedora 16):
- Totem
- VLC
- MPlayer
- gstreamer directly via gst-launch
- Any other suggestions?
By the time I get home, though, I'm sure many others could test this.

And a void element is a fully-supported part of the spec, right? It can
just be replaced with whatever you want that is valid in that location
plus another, truncated void element (I assume) to fill-up whatever
space is left. Moreover, I think, worst case, metadata can still be put
anywhere else in the file, through the equivalent of pointers, is that
correct? My point is, even if remuxing isn't an option, a supported
method still remains for putting the metadata wherever one likes? This
can get messy, but I think it would still be supported.

UPDATE: I just read the page at the link above, and I think what I'm
referring to is the meta seek, and that having more than one meta seek
in a file is deprecated. So, hopefully someone that understands better
can let me know if both methods are still available or it's just fine to
swallow it and say, "Look, tags go here. Remux if you need more
space." Or perhaps I'm just confused.

>
>
>
> Matroska tags have free name strings, that's so they can
> easily be extended and still be meaningful when displayed to
> the user, even if the name is not semantically interpreted. I
> think any good tag system should have this kind of extension
> possible.
>
> Steve
>
> Thanks again!

Frank Galligan

Jan 30, 2012, 11:30:16 AM

to webm-d...@webmproject.org

Hi Basil,

As for concatenating a bunch of files together, that would be the simplest form. If we were to add support for chained segments my guess is we wouldn't want to do that, but that might be a good place to see what works currently. Please add a seek column.

I wouldn't put Tags anywhere you want. I would either stick to before the cluster data or after. I would also stick to one Tags element.

The idea of adding a void element after the Nth element is for the muxer to know that there is X amount of spare space after the Nth element, so that it can rewrite the Nth element and the void element following it without having to remux the entire file.
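
The in-place rewrite can be sketched as follows. This is a minimal illustration, not a real muxer: it treats the new Tags element as an opaque, already-encoded byte string, and the helper names `make_void` and `rewrite_region` are hypothetical. The only format facts it relies on are that an EBML Void element has ID 0xEC and that element sizes are stored as variable-length integers.

```python
VOID_ID = b"\xec"  # EBML Void element ID


def make_void(total: int) -> bytes:
    """Build a Void element occupying exactly `total` bytes."""
    if total < 2:
        raise ValueError("a Void element needs at least 2 bytes")
    if total <= 128:
        # 1-byte size field: marker bit 0x80 plus a payload length of 0..126.
        data_len = total - 2
        size_field = bytes([0x80 | data_len])
    else:
        # 8-byte size field: marker byte 0x01 followed by a 56-bit length.
        data_len = total - 9
        size_field = ((1 << 56) | data_len).to_bytes(8, "big")
    return VOID_ID + size_field + bytes(data_len)


def rewrite_region(region_size: int, new_element: bytes) -> bytes:
    """Fit `new_element` plus Void filler into a region of fixed size."""
    spare = region_size - len(new_element)
    if spare < 0 or spare == 1:
        # Too big, or one byte left over (no 1-byte Void exists).
        raise ValueError("does not fit; a remux or relocation is needed")
    return new_element + (make_void(spare) if spare else b"")


region = rewrite_region(64, b"T" * 40)  # stand-in for an encoded Tags element
print(len(region))
```

The region keeps its original length, so every offset after it (clusters, cues, seek entries) stays valid; only when the new element outgrows the reserved space does the muxer have to fall back to a remux.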

Frank

James Zern

Jan 30, 2012, 2:27:27 PM

to webm-d...@webmproject.org

On Mon, Jan 30, 2012 at 08:00, Frank Galligan <fgal...@google.com> wrote:
> Understood on the "S_...". I'm fine with changing this. Anyone else really
> want to use the "S_" for embedding WebVTT data?
>

Going to something like "D_" seems to make sense given that only a
portion of the webvtt file contains subtitles and we don't want to get
into splitting out specific cues.


Ralph Giles

Jan 29, 2012, 2:19:05 PM

to webm-d...@webmproject.org

> On 28/01/2012 00:23, Silvia Pfeiffer wrote:

>> Matthew sent it on a separate thread to webm-discuss:
>>
>> https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit

Great, thanks.

On 30 January 2012 04:22, Steve Lhomme <slh...@matroska.org> wrote:

> If WebVTT subtitles are just UTF-8 strings, then the regular S_TEXT/UTF-8
> codec ID should be used for broader compatibility.

For kind=subtitle and kind=caption, webvtt can have angle-bracket
markup, so I don't think this will work in general; the decoder needs
to know if it must parse the internal markup.

What do you think about having /kind in the CodecID vs having to look
in codec private to determine this?

> Chapters inside a track are not very useful. And using CodecPrivate will
> completely break after a remuxing/editing.

I agree. I suggested chapters be translated into the equivalent
matroska chapter elements in my draft.

-r

Frank Galligan

Jan 30, 2012, 5:52:39 PM

to webm-d...@webmproject.org

Hi all,

We have been talking about this for a while here and we are thinking of making these major changes to the proposal.

1. Storing WebVTT chapter tracks in MKV Tracks.

2. All other WebVTT data will be stored in a webm track. The block payload would be a WebVTT cue minus the webvtt timing information. The "-->" would still be there as a placeholder so the WebVTT decoder will know how to recreate the WebVTT cue.

3. No data would be stored in the webm CodecPrivate.

(This is actually how the proposal first looked.)

I think we can still support live chapters if we treat them just like the other webvtt data. So live chapters would get inserted into a webm track and VOD chapters would get stored as MKV chapters.

So it will be up to the client demuxer if it wants to translate the WebM chapters into WebVTT chapters. It will be the job of a WebVTT decoder to handle the WEBVTT tracks in a WebM file.
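
Point 2 can be sketched as a small conversion. This is only a guess at the shape of the payload: it assumes the timing line is reduced to a bare "-->" followed by any cue settings, and that the start time and duration move to the block's own timing fields. The function name `cue_to_block` and the exact payload layout are assumptions, not the proposal's wording.

```python
import re

# WebVTT timestamp: optional hours, then mm:ss.ttt
_TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
_TIMING = re.compile(rf"^{_TS}[ \t]+-->[ \t]+{_TS}(.*)$")


def _ms(hours, minutes, seconds, millis) -> int:
    """Convert timestamp parts to milliseconds (hours may be None)."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def cue_to_block(cue: str):
    """Return (start_ms, duration_ms, payload) for one WebVTT cue."""
    lines = cue.strip("\n").split("\n")
    for i, line in enumerate(lines):
        match = _TIMING.match(line)
        if match:
            groups = match.groups()
            start, end = _ms(*groups[0:4]), _ms(*groups[4:8])
            # Keep the arrow (and any cue settings) as the placeholder line.
            lines[i] = "-->" + groups[8]
            return start, end - start, "\n".join(lines)
    raise ValueError("no timing line found")


cue = "intro\n00:00:01.500 --> 00:00:04.000 align:start\nHello <b>world</b>"
print(cue_to_block(cue))
```

A demuxer would do the reverse: format the block timestamp and duration back into the two WebVTT timestamps and splice them around the retained arrow before handing the cue to a WebVTT decoder.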

Frank

Silvia Pfeiffer

Jan 30, 2012, 6:07:58 PM

to webm-d...@webmproject.org

On Tue, Jan 31, 2012 at 6:27 AM, James Zern <jz...@google.com> wrote:
> On Mon, Jan 30, 2012 at 08:00, Frank Galligan <fgal...@google.com> wrote:
>> Understood on the "S_...". I'm fine with changing this. Anyone else really
>> want to use the "S_" for embedding WebVTT data?
>>
> Going to something like "D_" seems to make sense given that only a
> portion of the webvtt file contains subtitles and we don't want to get
> into splitting out specific cues.

FWIW I agree with the "D_" prefix.

Cheers,
Silvia.

Silvia Pfeiffer

Jan 30, 2012, 6:12:16 PM

to webm-d...@webmproject.org

On Tue, Jan 31, 2012 at 9:52 AM, Frank Galligan <fgal...@google.com> wrote:
> Hi all,
>
> We have been talking about this for a while here and we are thinking of
> making these major changes to the proposal.
>
> 1. Storing WebVTT chapter tracks in MKV Tracks.

Did you mean MKV chapters? Or .. can you explain the difference
between WebM tracks and MKV tracks?

> 2. All other WebVTT data will be stored in a webm track. The block payload
> would be a WebVTT cue minus the webvtt timing information. The "-->" would
> still be there as a placeholder so the WebVTT decoder will know how to recreate
> the WebVTT cue.
> 3. No data would be stored in the webm CodecPrivate.
> (This is actually how the proposal first looked.)
>
> I think we can still support live chapters if we treat them just like the
> other webvtt data. So live chapters would get inserted into a webm track and
> VOD chapters would get stored as MKV chapters.
>
> So it will be up to the client demuxer if it wants to translate the WebM
> chapters into WebVTT chapters. It will be the job of a WebVTT decoder to
> handle the WEBVTT tracks in a WebM file.

SGTM.

What about global metadata? Should we just make one spec that deals
with WebVTT and global metadata and chapters?

Cheers,
Silvia.

Frank Galligan

Jan 31, 2012, 8:02:18 AM

to webm-d...@webmproject.org

Yes. WebVTT chapters in MKV chapters. (This is what happens when you are rushing to get home.)

I'd still like to keep it as two separate specs for now. One spec may end up referencing the other.

Frank

James Zern

Jan 31, 2012, 2:12:01 PM

to WebM Discussion

On Wed, Jan 18, 2012 at 12:20, James Zern <jz...@google.com> wrote:
> A metadata specification for WebM has been a long requested feature.
> We plan on addressing this in the near term, but wanted to have a
> general discussion before we propose a solution.
>
> One requirement we have is the support for temporal metadata to store
> things like geolocation data, e.g., GPS coordinates. The other is the
> traditional global metadata, e.g., title, author, etc.
> Initially we were thinking to define a separate standard for both
> global and temporal, which would be stored in its own track. After a
> bit of discussion it came to feel as though global was merely a
> special case of the temporal, which we knew we wanted.
>
> Consider some examples:
> i) A video clip with an audio soundtrack that changes sources through the clip.
> With one audio and one video track it would be simple enough to have
> an artist label that covered each or both. With multiple audio sources
> in one track you could similarly have an artists tag. This though
> begins to decouple the attribution for each segment.
> ii) Appending 2 or more unrelated videos.
> This is a similar problem. You get into an issue with having to merge
> metadata from each segment into one global blob. At that point which
> takes precedence as the 'artist' for the clip?
> iii) Live presentations.
> Once more the performer, etc. can change throughout the clip, like a
> shoutcast stream.
>
> It is true that a global mapping could be constructed to handle the
> above cases, but it seems that it might be simpler to have a timed
> metadata track that could indicate the duration that the value applied
> to. Global or live metadata could simply have no duration to indicate
> it was for the entire clip or until a new value was encountered (for
> the live case).
>
> Does anyone have any thoughts on this? Is there any reason we
> shouldn’t explore a single metadata solution using a timed track?

After getting feedback on the global portion of the metadata the
consensus seems to be for using the Matroska tag system as a base.
This will allow extraction of the data with minimal webm file support
and provide some existing tool support for manipulating these files. I
think using a similar wiki setup, as is being proposed for webvtt
embedding, to decide what is added/removed for webm from the existing
framework will work for this.
If anyone feels strongly about not using matroska tags as the basis
for global metadata, now is the time to speak up.

Matthew Heaney

Feb 3, 2012, 4:28:03 PM

to webm-d...@webmproject.org

On Sun, Jan 29, 2012 at 10:22 AM, Steve Lhomme <slh...@matroska.org> wrote:
>

> I am not sure about the codec ID S_TEXT/VTT/kind
> "S_" stands for subtitles.

What do you use for SRT? Why are you treating WebVTT subtitles
differently from SRT subtitles?

> At some point we had something called 'control tracks', such would have had
> a "C_" prefix. Maybe we could use a "D_" prefix for all non video, audio and
> subtitle data. D is for data.

To clarify: you think the codec ID should be D_TEXT/VTT/kind then?

> I also think it is wrong to use the type 0x11
> for all these WebVTT data as they are not subtitles.

If not 0x11, then what value?

> If WebVTT subtitles are just UTF-8 strings, then the regular S_TEXT/UTF-8
> codec ID should be used for broader compatibility.

Are SRT subtitles "just strings"?

> Chapters inside a track are not very useful. And using CodecPrivate will
> completely break after a remuxing/editing.

OK, we have tentatively decided to convert WebVTT chapters to Matroska chapters.

Steve Lhomme

Feb 4, 2012, 8:02:36 AM

to webm-d...@webmproject.org

On Fri, Feb 3, 2012 at 10:28 PM, Matthew Heaney
<matthew...@google.com> wrote:
> On Sun, Jan 29, 2012 at 10:22 AM, Steve Lhomme <slh...@matroska.org> wrote:
>>
>> I am not sure about the codec ID S_TEXT/VTT/kind
>> "S_" stands for subtitles.
>
> What do you use for SRT? Why are you treating WebVTT subtitles
> differently from SRT subtitles?

We use this: http://www.matroska.org/technical/specs/subtitles/srt.html
In short, the juicy information is used in a S_TEXT/UTF8 track.

>> At some point we had something called 'control tracks', such would have had
>> a "C_" prefix. Maybe we could use a "D_" prefix for all non video, audio and
>> subtitle data. D is for data.
>
> To clarify: you think the codec ID should be D_TEXT/VTT/kind then?

No, whatever can be mapped to existing things should be done that way.
Other things (like temporal data) should use new tracks IDs. For
example D_GPS/VTT for a track that has GPS data in the format used in
WebVTT.

>> I also think it is wrong to use the type 0x11
>> for all these WebVTT data as they are not subtitles.
>
> If not 0x11, then what value?

http://www.matroska.org/technical/specs/index.html#TrackType
You could use 0x20 or something not defined in there.

>> If WebVTT subtitles are just UTF-8 strings, then the regular S_TEXT/UTF-8
>> codec ID should be used for broader compatibility.
>
> Are SRT subtitles "just strings"?

See the link above.

>> Chapters inside a track are not very useful. And using CodecPrivate will
>> completely break after a remuxing/editing.
>
> OK, we have tentatively decided to convert WebVTT chapters to Matroska chapters.

\o/

Silvia Pfeiffer

Feb 4, 2012, 8:23:20 AM

to webm-d...@webmproject.org

On Sun, Feb 5, 2012 at 12:02 AM, Steve Lhomme <slh...@matroska.org> wrote:
> On Fri, Feb 3, 2012 at 10:28 PM, Matthew Heaney
> <matthew...@google.com> wrote:
>> On Sun, Jan 29, 2012 at 10:22 AM, Steve Lhomme <slh...@matroska.org> wrote:
>>>
>>> I am not sure about the codec ID S_TEXT/VTT/kind
>>> "S_" stands for subtitles.
>>
>> What do you use for SRT? Why are you treating WebVTT subtitles
>> differently from SRT subtitles?
>
> We use this: http://www.matroska.org/technical/specs/subtitles/srt.html
> In short, the juicy information is used in a S_TEXT/UTF8 track.
>
>>> At some point we had something called 'control tracks', such would have had
>>> a "C_" prefix. Maybe we could use a "D_" prefix for all non video, audio and
>>> subtitle data. D is for data.
>>
>> To clarify: you think the codec ID should be D_TEXT/VTT/kind then?
>
> No, whatever can be mapped to existing things should be done that way.
> Other things (like temporal data) should use new tracks IDs. For
> example D_GPS/VTT for a track that has GPS data in the format used in
> WebVTT.

Is the "S_" and "D_" indication used anywhere or are these IDs used as
mime type equivalents? I.e. are there any apps that care about all
"S_" type tracks rather than "S_TEXT/UTF8" as the identifier of a
subtitle track? Other than applications that would count how many
subtitle tracks a video collection typically has, I fail to see a use
case for separating out the "subtitle" information from the format
identifying information.

Basically, I am asking because I wonder if we'd rather just have
something like "D_TEXT/VTT/kind" for all kinds of WebVTT tracks, or
instead have "S_TEXT/VTT" (for subtitles) and "C_TEXT/VTT" (for
captions) and "D_TEXT/VTT" (for description), and "M_TEXT/VTT/type"
(for metadata with type providing further information on the cue
content format).

Cheers,
Silvia.

Steve Lhomme

Feb 5, 2012, 10:51:58 AM

to webm-d...@webmproject.org

On Sat, Feb 4, 2012 at 2:23 PM, Silvia Pfeiffer
<silviap...@gmail.com> wrote:
>>> To clarify: you think the codec ID should be D_TEXT/VTT/kind then?
>>
>> No, whatever can be mapped to existing things should be done that way.
>> Other things (like temporal data) should use new tracks IDs. For
>> example D_GPS/VTT for a track that has GPS data in the format used in
>> WebVTT.
>
>
> Is the "S_" and "D_" indication used anywhere or are these IDs used as
> mime type equivalents? I.e. are there any apps that care about all
> "S_" type tracks rather than "S_TEXT/UTF8" as the identifier of a

These names are made human readable to make it easy to search for the
proper codec when you don't know what it is. The type prefix is just a
naming convention.

> subtitle track? Other than applications that would count how many
> subtitle tracks a video collection typically has, I fail to see a use
> case for separating out the "subtitle" information from the format
> identifying information.
>
> Basically, I am asking because I wonder if we'd rather just have
> something like "D_TEXT/VTT/kind" for all kinds of WebVTT tracks, or
> instead have "S_TEXT/VTT" (for subtitles) and "C_TEXT/VTT" (for
> captions) and "D_TEXT/VTT" (for description), and "M_TEXT/VTT/type"
> (for metadata with type providing further information on the cue
> content format).

Yes, each part should be separated and mapped to existing Matroska
structures to describe the same things. For example S_TEXT/VTT would
not exist, it's just S_TEXT/UTF8 with the same mapping as SRT.
Captions are handled like a subtitle track. Chapters would not go
inside a track and metadata probably not either.

> Cheers,
> Silvia.

Silvia Pfeiffer

Feb 5, 2012, 11:56:43 PM

to webm-d...@webmproject.org

We're talking about timed metadata here, so it would. I agree that
header-style metadata and chapters should not go in a track.

I hadn't really expected that the encapsulation would explicitly know
about the kind that goes into the track, but if there is already a
convention that subtitle/caption tracks start with "S_" and "D_" is
data (not "M_" as I mentioned above), then we should likely stick with
that. What do we do with descriptions? They are time-aligned text that
the screen reader turns into voice. Since "D_" is taken, maybe we
should use "A_" for "audio description"?

Silvia.

Silvia Pfeiffer

Feb 6, 2012, 12:08:06 AM

to webm-d...@webmproject.org

On Mon, Feb 6, 2012 at 3:56 PM, Silvia Pfeiffer

Another question that just came to mind: if the codec ID field does
not contain information on what format the content will be in, how
does the decoder know which code path to use? WebVTT has markup that
is different from SRT, so we can't just pretend it is the same. I
think it would need to be "S_WEBVTT" or "S_TEXT/VTT".

Silvia.

Matthew Heaney

Feb 7, 2012, 6:36:11 PM

to webm-d...@webmproject.org

On Sat, Feb 4, 2012 at 8:02 AM, Steve Lhomme <slh...@matroska.org> wrote:
>
> No, whatever can be mapped to existing things should be done that way.

But that's an argument for using "S_TEXT/VTT/kind" as the codec ID.

> Other things (like temporal data) should use new tracks IDs. For
> example D_GPS/VTT for a track that has GPS data in the format used in
> WebVTT.

There is no such thing as "temporal metadata". Yes there has been
interest in specifying some mechanism for carrying temporal metadata
in WebM, and in particular for specifying GPS payload that can vary
over time, but this is a separate discussion. Our immediate goal is
very narrow: decide how to embed WebVTT files in a WebM file.

>>> I also think it is wrong to use the type 0x11
>>> for all these WebVTT data as they are not subtitles.
>>
>> If not 0x11, then what value?
>
> http://www.matroska.org/technical/specs/index.html#TrackType
> You could use 0x20 or something not defined in there.

Well, this strikes me as a distinction without a difference. But I
have no particular opinion about what the correct value should be.
(To be honest: I don't know what a "control" track type is.)

-Matt

Silvia Pfeiffer

Feb 7, 2012, 6:46:07 PM

to webm-d...@webmproject.org

So is the TrackType repeating what the ID is getting? I mean: if
TrackType is set to 0x12 for a WebVTT track with kind=subtitles (or
captions), then the ID has to start with "S_"?

Silvia.

James Zern

Feb 7, 2012, 8:39:14 PM

to WebM Discussion

This page is up [1]. For now it only contains edits to the supported
elements. For official tags should we similarly use Matroska as the
basis and pare down where necessary?

[1] http://wiki.webmproject.org/webm-metadata/global-metadata

Basil Mohamed Gohar

Feb 7, 2012, 9:09:01 PM

to webm-d...@webmproject.org

If the Matroska spec supports enough official tags to cover the uses that
WebM needs, then it seems like the obvious choice to me.

Steve Lhomme

Feb 12, 2012, 11:15:31 AM

to webm-d...@webmproject.org

Le 06/02/2012 05:56, Silvia Pfeiffer a écrit :
> On Mon, Feb 6, 2012 at 2:51 AM, Steve Lhomme<slh...@matroska.org> wrote:
>> Yes, each part should be separated and mapped to existing Matroska
>> structures to describe the same things. For example S_TEXT/VTT would
>> not exist, it's just S_TEXT/UTF8 with the same mapping as SRT.
>> Captions are handled like a subtitle track. Chapters would not go
>> inside a track and metadata probably not either.
>

> I hadn't really expected that the encapsulation would explicitly know
> about the kind that goes into the track, but if there is already a
> convention that subtitle/caption tracks start with "S_" and "D_" is
> data (not "M_" as I mentioned above), then we should likely stick with
> that. What do we do with descriptions? They are time-aligned text that
> the screen reader turns into voice. Since "D_" is taken, maybe we
> should use "A_" for "audio description"?

Technically any subtitle codec that is text based could be used for text
to speech. S_TEXT/UTF8 would be the easiest candidate. The other ones
have some presentation data in them but can always be stripped to only
have the text.

Creating an audio codec which in fact only has text inside is also
doable. The container doesn't need to know about that trick.

Steve Lhomme

Feb 12, 2012, 11:17:45 AM

to webm-d...@webmproject.org

Le 06/02/2012 06:08, Silvia Pfeiffer a écrit :
> Another question that just came to mind: if the codec ID field does
> not contain information on what format the content will be in, how
> does the decoder know which code path to use? WebVTT has markup that
> is different from SRT, so we can't just pretend it is the same. I
> think it would need to be "S_WEBVTT" or "S_TEXT/VTT".

What do you call "markup"? If the only usable information is the
start/stop timecodes and the text to render, then S_TEXT/UTF8 is fine.

Steve Lhomme

Feb 12, 2012, 11:20:28 AM

to webm-d...@webmproject.org

Le 08/02/2012 00:36, Matthew Heaney a écrit :
> On Sat, Feb 4, 2012 at 8:02 AM, Steve Lhomme<slh...@matroska.org> wrote:
>>
>> No, whatever can be mapped to existing things should be done that way.
>
> But that's an argument for using "S_TEXT/VTT/kind" as the codec ID.

I don't know enough about VTT to tell if it fits exactly into something
pre-existing.

>>>> I also think it is wrong to use the type 0x11
>>>> for all these WebVTT data as they are not subtitles.
>>>
>>> If not 0x11, then what value?
>>
>> http://www.matroska.org/technical/specs/index.html#TrackType
>> You could use 0x20 or something not defined in there.
>
> Well, this strikes me as a distinction without a difference. But I
> have no particular opinion about what the correct value should be.
> (To be honest: I don't know what a "control" track type is.)

A control track can, for example, tell the player to switch a track
on or off, seek to another position in the file, etc. It "controls" the
playback. There is no real-life example of this, but it could be done.
For now we have done it using chapter "codecs", which is more flexible.

Steve Lhomme

Feb 12, 2012, 11:21:39 AM

to webm-d...@webmproject.org

Le 08/02/2012 00:46, Silvia Pfeiffer a écrit :
> So is the TrackType repeating what the ID is getting? I mean: if
> TrackType is set to 0x12 for a WebVTT track with kind=subtitles (or
> captions), then the ID has to start with "S_"?

No, it's just a convention for users to know what codec they are
supposed to have.

Silvia Pfeiffer

Feb 12, 2012, 5:28:29 PM

to webm-d...@webmproject.org

With markup I mean HTML-style markup: <ruby>, <v>, <c>, <b>, <i>, etc.

Silvia.

Steve Lhomme

Feb 13, 2012, 3:33:54 AM

to webm-d...@webmproject.org

OK, these are not supported (although I'm told some SRT files have <i>
and a DirectShow filter handles them).

So "S_TEXT/VTT" would be the nicest codec ID, IMO.

Ralph Giles

Feb 14, 2012, 7:09:03 PM

to webm-d...@webmproject.org

On 13 February 2012 00:33, Steve Lhomme <slh...@matroska.org> wrote:

> So "S_TEXT/VTT" would be the nicest codec ID, IMO.

Yes. S_TEXT/UTF8 only makes sense if the VTT file has no markup.
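
As a rough illustration of that rule, the sketch below picks a codec ID depending on whether any cue text carries WebVTT-style tags. The helper names (`has_markup`, `pick_codec_id`) and the simple tag pattern are assumptions made for this example, and "S_TEXT/VTT" is only the ID floated in this thread, not a registered value.

```python
import re

# Assumption: anything that looks like <tag>, </tag>, <v Name> or a
# timestamp tag such as <00:00:01.000> counts as markup.
_TAG = re.compile(r"</?[A-Za-z0-9][^<>]*>")


def has_markup(cue_text):
    """Return True if the cue text appears to contain WebVTT-style tags."""
    return _TAG.search(cue_text) is not None


def pick_codec_id(cue_texts):
    """Use plain UTF-8 only when no cue in the file carries markup."""
    if any(has_markup(text) for text in cue_texts):
        return "S_TEXT/VTT"   # ID discussed in this thread (hypothetical)
    return "S_TEXT/UTF8"      # existing Matroska plain-text subtitle ID
```

A real muxer would more likely rely on a proper WebVTT parser than on a regular expression; the point is only that the decision can be made per file at mux time.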

Is there another way to indicate the 'kind' attribute for the track if
you don't like the S_TEXT/VTT/<kind> codec id? It makes sense to use a
different Track Type for metadata than for subtitles, but what about
distinguishing captions and descriptions from subtitles? Should
parsers just look for the header metadata in the CodecPrivate element?

-r

--
Ralph Giles
Xiph.org Foundation for open multimedia

Frank Galligan

Feb 15, 2012, 10:01:49 AM

to webm-d...@webmproject.org

Matt updated the proposal on the wiki. http://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-in-webm

The contents of the WebVTT file are stored as a separate WebM track. The information that would appear as attributes of the HTML5 track tag can be embedded in the WebM Track element as follows:

The TrackType sub-element value is 0x11 for WebVTT SUBTITLES and CAPTIONS, and 0x21 for WebVTT DESCRIPTIONS, CHAPTERS, and METADATA.
The label attribute is stored as the Name sub-element.
The srclang attribute is stored as the Language sub-element.

Per the convention (see [MKVCODECID]) used for flavors of a particular video or audio codec, the CodecID for a WebVTT track is "D_WEBVTT/kind", where kind is one of SUBTITLES, CAPTIONS, DESCRIPTIONS, CHAPTERS, or METADATA.

So the idea is that parsers that care about subtitles but not necessarily WebVTT can just check the TrackType element for the value 0x11 and then check if the client has a decoder that handles "D_WEBVTT/kind".

Then parsers that care about WebVTT but not other types of subtitles can check the CodecID for "D_WEBVTT/kind".
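
To make the mapping concrete, here is a minimal sketch of how a muxer and a parser might apply it. The TrackType values, the "D_WEBVTT/kind" pattern, and the Name/Language mapping come from the proposal text above; the function names and the dictionary representation of a Track element are assumptions for illustration only, and the srclang value is passed through unchanged rather than converted to whatever language code form the container expects.

```python
# Values taken from the proposal above.
SUBTITLE_TRACK_TYPE = 0x11   # WebVTT SUBTITLES and CAPTIONS
OTHER_TRACK_TYPE = 0x21      # WebVTT DESCRIPTIONS, CHAPTERS, METADATA

TRACK_TYPE_BY_KIND = {
    "SUBTITLES": SUBTITLE_TRACK_TYPE,
    "CAPTIONS": SUBTITLE_TRACK_TYPE,
    "DESCRIPTIONS": OTHER_TRACK_TYPE,
    "CHAPTERS": OTHER_TRACK_TYPE,
    "METADATA": OTHER_TRACK_TYPE,
}


def track_fields(kind, label, srclang):
    """Map HTML5 <track> attributes to Track sub-elements (hypothetical helper)."""
    kind = kind.upper()
    if kind not in TRACK_TYPE_BY_KIND:
        raise ValueError("unknown WebVTT kind: " + kind)
    return {
        "TrackType": TRACK_TYPE_BY_KIND[kind],
        "CodecID": "D_WEBVTT/" + kind,
        "Name": label,        # label attribute
        "Language": srclang,  # srclang attribute, passed through as-is
    }


def is_subtitle_track(track):
    """A parser that cares about subtitles in general checks TrackType."""
    return track["TrackType"] == SUBTITLE_TRACK_TYPE


def is_webvtt_track(track):
    """A parser that cares about WebVTT specifically checks CodecID."""
    return track["CodecID"].startswith("D_WEBVTT/")
```

For example, `track_fields("captions", "English CC", "en")` yields a track that satisfies both `is_subtitle_track` and `is_webvtt_track`, while a METADATA track satisfies only the second.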

Frank

Silvia Pfeiffer

Feb 15, 2012, 3:30:16 PM

to webm-d...@webmproject.org, webm-d...@webmproject.org

I like this.

Cheers,

Silvia.

James Zern

Feb 21, 2012, 9:06:58 PM

to WebM Discussion

The element definitions will be put into the main format guide soon,
probably by the end of the week.
The official tags have been pared down slightly to start with to
remove some audio / commercial focused values. These will be restored
and new ones added on an as needed basis.

> [1] http://wiki.webmproject.org/webm-metadata/global-metadata

Steve Lhomme

Feb 26, 2012, 5:47:28 AM

to webm-d...@webmproject.org

I don't understand the point of putting CHAPTERS in a muxed track,
unless we have another definition of what chapters are. If it's just
to let the user know in what part of the movie/content it's in, OK. If
the plan is to deliver like a table of contents, then it is bogus. The
same may go with METADATA.

Steve Lhomme

Feb 26, 2012, 6:28:25 AM

to webm-d...@webmproject.org

I just read this document, it's a bit clearer now:
http://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-in-webm

# the same in stream/out of stream differentiation should be done for metadata

# the BlockGroup+BlockDuration approach seems to be the correct one

# file-wide data should be stored in the codec private, the language
should be extracted. I'm not sure it should be removed.

# the Default Cue data seem to manage a codec state
(http://www.w3.org/WAI/PF/HTML/wiki/Media_WebVTT_Changes#Default_Cue_Settings),
so it should be in a BlockGroup. Having a null duration seems good too
(if it's wrongly interpreted, it should not render anyway). So using a
SimpleBlock is OK too.
Given it's a state that is preserved, maybe further Blocks of data
could reference this Default Block (like a P frame). But it becomes
impossible if there are different default state for different
parameters changing over time.
Maybe the only clean solution would be to "expand" the default values
in every block. After all it's just a compression of data (for the
writer). When extracting, a smart program could "compress" back the
data using default states too.
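
The BlockGroup+BlockDuration point above can be sketched as a conversion from a WebVTT cue timing line to a block start time and duration. This is only an illustration under stated assumptions: it assumes the default Matroska TimecodeScale of 1,000,000 ns (so timecodes are in milliseconds), it returns an absolute start time rather than the cluster-relative value a real muxer would write, and the function names are made up for this example.

```python
import re

# WebVTT timestamps are [hh:]mm:ss.ttt; the hours part is optional.
_TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def timestamp_to_ms(text):
    """Convert a WebVTT timestamp string to milliseconds."""
    match = _TIMESTAMP.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a WebVTT timestamp: " + repr(text))
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    seconds = int(match.group(3))
    millis = int(match.group(4))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def cue_timing_to_block(timing_line):
    """Return (start_ms, duration_ms) for a 'start --> end [settings]' line."""
    start_text, rest = timing_line.split("-->", 1)
    end_text = rest.split()[0]  # anything after the end time is cue settings
    start = timestamp_to_ms(start_text)
    end = timestamp_to_ms(end_text)
    if end < start:
        raise ValueError("cue ends before it starts")
    return start, end - start
```

With this, `cue_timing_to_block("00:01:02.500 --> 00:01:05.000 align:start")` gives a start of 62500 ms and a duration of 2500 ms, which would map to the block timecode and the BlockDuration respectively.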

Matthew Heaney

Mar 1, 2012, 5:58:01 PM

to webm-d...@webmproject.org, Steve Lhomme

On Sun, Feb 26, 2012 at 6:28 AM, Steve Lhomme <slh...@matroska.org> wrote:
> I just read this document, it's a bit clearer now:
> http://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-in-webm
>

> # the same in stream/out of stream differentiation should be done for metadata

Can you elaborate on this a bit? It's not clear what you mean.

> # the BlockGroup+BlockDuration approach seems to be the correct one

OK

> # file-wide data should be stored in the codec private, the language
> should be extracted. I'm not sure it should be removed.

That part of the WebVTT spec hasn't been finalized, so putting it in
the CodecPrivate is just how we could do it, if file-wide metadata is
standardized.

> # the Default Cue data seem to manage a codec state
> (http://www.w3.org/WAI/PF/HTML/wiki/Media_WebVTT_Changes#Default_Cue_Settings),
> so it should be in a BlockGroup. Having a null duration seems good too
> (if it's wrongly interpreted, it should not render anyway). So using a
> SimpleBlock is OK too.

We just need to figure out how to handle the timestamp of the block.

> Given it's a state that is preserved, maybe further Blocks of data
> could reference this Default Block (like a P frame).

Right. But to make this work, we might need some notion of an
I-frame, in which we collect all the current defaults, and write a
proper non-default WebVTT cue.

> But it becomes
> impossible if there are different default state for different
> parameters changing over time.
> Maybe the only clean solution would be to "expand" the default values
> in every block.

That's another option -- when you embed a WebVTT cue, you write a proper cue.

It does mean that the muxer would have to be a state machine, with
intimate knowledge of WebVTT cue syntax, in order to synthesize a
proper (non-default) cue value. This is the same thing a WebVTT
renderer (or "decoder") must do, so perhaps this can be reused in
library form.
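
A minimal sketch of that muxer-side state machine follows. Since the default cue settings proposal is not final, the representation here is an assumption: settings are modelled as plain dictionaries, a defaults entry is assumed to update (not replace) the running defaults, and a cue's own settings are assumed to override them. The name `expand_cues` is hypothetical.

```python
def expand_cues(events):
    """Fold running defaults into every cue.

    `events` is a list of ("defaults", settings) or ("cue", settings)
    tuples in file order, where settings is a dict such as
    {"align": "start"}. Returns the fully expanded settings per cue.
    """
    defaults = {}
    expanded = []
    for kind, settings in events:
        if kind == "defaults":
            # Assumption: later defaults are merged into earlier ones.
            defaults.update(settings)
        elif kind == "cue":
            merged = dict(defaults)
            merged.update(settings)  # the cue's own settings win
            expanded.append(merged)
        else:
            raise ValueError("unknown event kind: " + repr(kind))
    return expanded
```

Each output entry is self-contained, so a block written from it can be rendered without having seen any earlier defaults, at the cost of repeating the settings in every block.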

> After all it's just a compression of data (for the
> writer). When extracting, a smart program could "compress" back the
> data using default states too.

Right.

-Matt

Silvia Pfeiffer

Mar 1, 2012, 7:11:22 PM

to webm-d...@webmproject.org

On Sun, Feb 26, 2012 at 10:28 PM, Steve Lhomme <slh...@matroska.org> wrote:
> # the Default Cue data seem to manage a codec state
> (http://www.w3.org/WAI/PF/HTML/wiki/Media_WebVTT_Changes#Default_Cue_Settings),
> so it should be in a BlockGroup. Having a null duration seems good too
> (if it's wrongly interpreted, it should not render anyway). So using a
> SimpleBlock is OK too.
> Given it's a state that is preserved, maybe further Blocks of data
> could reference this Default Block (like a P frame). But it becomes
> impossible if there are different default state for different
> parameters changing over time.
> Maybe the only clean solution would be to "expand" the default values
> in every block. After all it's just a compression of data (for the
> writer). When extracting, a smart program could "compress" back the
> data using default states too.

We need to be careful about the default cue settings. They have not
been accepted into the WebVTT specification and are just a proposal at
this stage.

When encoding them, I would prefer them to be header-like data rather
than replicated across each cue, which both takes up more space and is
harder to identify as a default setting.

Cheers,
Silvia.

Silvia Pfeiffer

Mar 1, 2012, 7:20:02 PM

to webm-d...@webmproject.org, Steve Lhomme

On Fri, Mar 2, 2012 at 9:58 AM, Matthew Heaney
<matthew...@google.com> wrote:
> On Sun, Feb 26, 2012 at 6:28 AM, Steve Lhomme <slh...@matroska.org> wrote:
>
>> # the Default Cue data seem to manage a codec state
>> (http://www.w3.org/WAI/PF/HTML/wiki/Media_WebVTT_Changes#Default_Cue_Settings),
>> so it should be in a BlockGroup. Having a null duration seems good too
>> (if it's wrongly interpreted, it should not render anyway). So using a
>> SimpleBlock is OK too.
>
> We just need to figure out how to handle the timestamp of the block.
>
>
>> Given it's a state that is preserved, maybe further Blocks of data
>> could reference this Default Block (like a P frame).
>
> Right. But to make this work, we might need some notion of an
> I-frame, in which we collect all the current defaults, and write a
> proper non-default WebVTT cue.
>
>
>> But it becomes
>> impossible if there are different default state for different
>> parameters changing over time.
>> Maybe the only clean solution would be to "expand" the default values
>> in every block.
>
> That's another option -- when you embed a WebVTT cue, you write a proper cue.

Actually, if default cue settings were used within the file rather
than just as header-type data, they would need to follow the cue
parsing convention, so could easily also be embedded as a cue.

> It does mean that the muxer would have to be a state machine, with
> intimate knowledge of WebVTT cue syntax, in order to synthesize a
> proper (non-default) cue value. This is the same thing a WebVTT
> renderer (or "decoder") must do, so perhaps this can be reused in
> library form.

The only problem would be if you're seeking through a file and miss
that cue, so your settings on the cues that come thereafter may be
wrong. For this case we could indeed have a "i-frame" type link back
to the last default cue settings cue. Or alternatively we could move
such cues to the header of the file. Or finally, if we really can't
avoid it, we could indeed add these cue settings to every cue
thereafter. But we'd need to do some of the cue settings
interpretation during muxing in this case, because some of the default
settings may be overwritten by a cue. I'd prefer to avoid that last
option.
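
The "link back" option could look roughly like the following on the demuxer side. This is a sketch under assumptions, not part of any proposal: it supposes the demuxer keeps a time-sorted index of default-settings cues, that each entry holds the complete defaults in effect from that time on, and the name `defaults_at` is invented for the example.

```python
import bisect


def defaults_at(default_blocks, seek_ms):
    """Return the default cue settings in effect at `seek_ms`.

    `default_blocks` is a list of (time_ms, settings) tuples sorted by
    time, each holding the full set of defaults from that time onward.
    """
    times = [time_ms for time_ms, _ in default_blocks]
    index = bisect.bisect_right(times, seek_ms)
    if index == 0:
        return {}  # no defaults have been declared before this point
    return default_blocks[index - 1][1]
```

After a seek, the player would call this once and apply the result to the cues that follow, instead of needing every cue to carry the expanded settings.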

Silvia.
