One requirement we have is the support for temporal metadata to store
things like geolocation data, e.g., GPS coordinates. The other is the
traditional global metadata, e.g., title, author, etc.
Initially we were thinking to define a separate standard for both
global and temporal, which would be stored in its own track. After a
bit of discussion it came to feel as though global was merely a
special case of the temporal, which we knew we wanted.
Consider some examples:
i) A video clip with an audio soundtrack that changes sources through the clip.
With one audio and one video track it would be simple enough to have
an artist label that covered each or both. With multiple audio sources
in one track you could similarly have an artists tag. This though
begins to decouple the attribution for each segment.
ii) Appending 2 or more unrelated videos.
This is a similar problem. You get into an issue with having to merge
metadata from each segment into one global blob. At that point which
takes precedence as the 'artist' for the clip?
iii) Live presentations.
Once more the performer, etc. can change throughout the clip, like a
shoutcast stream.
It is true that a global mapping could be constructed to handle the
above cases, but it seems that it might be simpler to have a timed
metadata track that could indicate the duration that the value applied
to. Global or live metadata could simply have no duration to indicate
it was for the entire clip or until a new value was encountered (for
the live case).
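To make the idea concrete, here is a rough sketch of how a reader might resolve "what is the artist at time t" from a single timed track. The record layout and field names are purely hypothetical and are not taken from any spec; an entry with no duration stands for the global case, or for the live case where a value holds until it is replaced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataEntry:
    # All field names are hypothetical, for illustration only.
    name: str
    value: str
    start: float                      # seconds from the start of the clip
    duration: Optional[float] = None  # None: whole clip, or until replaced


def value_at(entries, name, t):
    """Return the value of `name` in effect at time t, or None."""
    best = None
    for e in sorted(entries, key=lambda e: e.start):
        if e.name != name or e.start > t:
            continue
        if e.duration is not None and t >= e.start + e.duration:
            continue  # this timed entry has already expired
        best = e      # entries that start later override earlier ones
    return best.value if best else None


entries = [
    MetadataEntry("artist", "Various", start=0.0),  # global: no duration
    MetadataEntry("artist", "Band A", start=10.0, duration=20.0),
    MetadataEntry("artist", "Band B", start=30.0, duration=15.0),
]
```

With these entries the global value applies wherever no timed entry is active, so a lookup at 12 seconds gives "Band A" and a lookup at 50 seconds falls back to "Various".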
Does anyone have any thoughts on this? Is there any reason we
shouldn’t explore a single metadata solution using a timed track?
I see the argument for the global metadata being a special case of the
temporal, but in the global case, there is usually a high likelihood
that the metadata is always in the same place (e.g., either at the
beginning or the end of the file) which yields a much, much simpler
method of extraction for client software.
In the temporal case, I suppose it could be constructed in such a way
that, in the case that it doesn't really change, then it would effectively
behave as the global case - that is, it would always be located in the
same place, and remain easily extracted.
I would suggest something along the lines of temporal data being
considered updates to existing metadata, and if a previous value is not
marked as having been replaced by subsequent metadata, then it should
persist as long as its associated tracks do.
Finally, and I'm sure this has already been considered, but what already
exists within the Matroska world with regards to metadata? I would
strongly urge that whatever is developed be made as compatible as
possible with what is already out there.
Semantically this sounds similar to special casing the global
metadata. It sounds as though you're more concerned about where the
base data lives in the file and how to extract that.
> Finally, and I'm sure this has already been considered, but what already
> exists within the Matroska world with regards to metadata? I would
> strongly urge that whatever is developed be made as compatible as
> possible with what is already out there.
Matroska defines a hierarchical tag system [1], which allows tags to
be attached to tracks or chapters. Note chapters are not part of the
webm specification currently.
I agree that using an existing system for which there are tools to
manipulate the data is usually best, which is part of the reason for
this thread. With tags, for instance though, I think we'd have to
extend them to reference clusters or blocks. That or an excessive
amount of chapters might be needed.
Beyond this, with all tag systems, Matroska tags, XMP, etc. we will
need to have a list of well-known entities as well as define new
entities where we see a need. In this case some of the tools for
extraction will need to be updated as well. Given we're talking about
a web format, should we use these as a reference and provide a
solution that fits better in this framework rather than worry about
compatibility?
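Talking about metadata and timed metadata: have you considered how
this hooks into what is being done in HTML5?
There, we have defined WebVTT as a file format for external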
time-synchronized data (including captions, subtitles, descriptions,
and - yes - general metadata).
In the past it has been stated that the idea for captions in WebM
would be to use the solution that is being used in HTML5 and
encapsulate it into WebM, i.e. encapsulate WebVTT into WebM. Since
WebVTT is similar to SRT and there are existing specs for how to
encapsulate SRT into Matroska, putting WebVTT into WebM should be
possible in a similar manner.
I would be totally supportive of solving the timed metadata challenge
with WebVTT and by creating a WebVTT encapsulation spec for WebM.
As for the non-timed metadata: Matroska also has a solution for
header-style metadata, i.e. metadata such as id3 tags that need to be
available to search in apps such as iTunes. I would suggest building on
that existing solution.
HTH.
Cheers,
Silvia.
For "global" metadata, we use Matroska's existing features for
"tag"-style information. This would require basically zero modification
to existing implementations that already support it.
For temporal metadata, as this is a new feature, this WebVTT
encapsulation seems to be appropriate. It would not conflict with
anything, and support for temporal metadata already requires new code to
be written, and it would be one less bit of technology to have to deal
with (namely, HTML5 is already pushing for this format for metadata of a
similar nature).
Now, does the HTML5 spec itself mandate that the metadata actually be
external to the stream, or just that it be stored separately in WebVTT
format (allowing it to be encapsulated and/or extracted)? This is more
a question for Silvia.
On Wed, Jan 18, 2012 at 16:51, Silvia Pfeiffer
<silviap...@gmail.com> wrote:
> Talking about metadata and timed metadata: have you considered how
> this hooks into what is being done in HTML5?
>
> There, we have defined WebVTT as a file format for external
> time-synchronized data (including captions, subtitles, descriptions,
> and - yes - general metadata).
>
Yes this is what initially got our discussion going. Using something
along these lines we'd then only need one implementation for metadata
rather than two.
> Does anyone have any thoughts on this? Is there any reason we
> shouldn't explore a single metadata solution using a timed track?
In 99% of all use cases, you will have only global metadata, so
whatever you do, please make sure that global metadata, aka artist,
album, song title can be extracted extremely fast and easily.
This means making it part of the matroska header section and not
having to parse a data track to access that. Even for a video
that has N different performers, it's annoying to have to seek
through a potentially huge file to access the metadata only,
the file could be on some slow HTTP connection after all.
The TextTrack API for HTML5 supports both in-band and external text
tracks. WebVTT is the format that all browsers have basically agreed
to implement support for as the baseline external text track file
format. On top of that, the TextTrack API also interfaces with text
tracks provided in-band. They are basically mapped to the same data
structure as is being parsed from a WebVTT file. What format browsers
use there is still up in the air, but since we have the choice for
WebM it makes sense to pick WebVTT itself.
For some side information:
1. I've heard MPEG people talk about defining an encapsulation means
for WebVTT in MPEG-4, too.
2. I also think that WebVTT in Ogg would be simple and should be
supported. We could even re-use KATE tracks for WebVTT encapsulation.
Cheers,
Silvia.
I think it would be a mistake to deal with them in the same manner.
Applications such as iTunes and other media players as well as asset
management systems typically find metadata in a defined location in a
media file without having to seek, without having to parse more than
the first few KB of data, and without having to parse all text tracks
to determine where to find metadata.
Thus, the disadvantages of having a single method of dealing with
metadata in my mind far outweigh the advantages. Header-style metadata
(i.e. "tags") and timed metadata are two fundamentally different types
of data and should be handled in two separate ways.
In Ogg we solved the problem of having changing file-wide metadata by
allowing chaining of media files, i.e. essentially concatenating media
resources. I am not sure that would be possible with WebM nor whether
that is a good solution. I would actually much prefer introducing the
concept of playlists for this situation and managing it in this way.
It is a problem that I don't think we should solve on the file level.
Cheers,
Silvia.
On 20 January 2012 17:13, Silvia Pfeiffer <silviap...@gmail.com> wrote:
> Header-style metadata
> (i.e. "tags") and timed metadata are two fundamentally different types
> of data and should be handled in two separate ways.
I generally agree with this. File level metadata like title, creator,
language, license, etc. have a different use case. Moreover, Matroska
already has an established tag mechanism which many media players
already know. I am in favour of using that encapsulation for WebM over
a custom format, or the XMP blob idea mentioned on IRC.
That doesn't give a fixed offset for the metadata, but one can at
least require it to be at the start (or end) of the file so the
buffering requirements are minimal. The ebml parser subset needed is
quite small.
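As a rough illustration of how small that subset is, here is a sketch of the core of an EBML reader: decoding the variable-length integers used for element IDs and sizes. The function names are my own. It does not handle the reserved "unknown size" value or integers longer than 8 bytes.

```python
def read_vint(data, pos=0, keep_marker=False):
    """Decode one EBML variable-length integer starting at data[pos].

    Returns (value, length_in_bytes). Element IDs keep the length marker
    bit (keep_marker=True); element sizes drop it.
    """
    first = data[pos]
    if first == 0:
        raise ValueError("vints longer than 8 bytes are not supported")
    length = 8 - first.bit_length() + 1  # leading zero bits, plus one
    if pos + length > len(data):
        raise ValueError("truncated vint")
    if keep_marker:
        value = first
    else:
        value = first & ((1 << (8 - length)) - 1)  # strip the marker bit
    for b in data[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, length


def read_element_header(data, pos=0):
    """Return (element_id, payload_size, header_length) at data[pos]."""
    element_id, id_len = read_vint(data, pos, keep_marker=True)
    size, size_len = read_vint(data, pos + id_len)
    return element_id, size, id_len + size_len
```

A tag extractor would loop over element headers from the start of the file, skip payloads it does not care about, and stop once it has read the Tags element.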
For Ogg we did define a baseline semantic vocabulary for tags. I think
that was helpful for interoperability, so I would support publishing
such a set for WebM. AFAICT Matroska itself doesn't do so?
Metadata of this type is generally for the convenience of indexers
(online or in a media player). For example, our media code in Firefox
completely ignores the tags. I'm not sure it's especially necessary to
provide an interface for this either, since one can parse the resource
directly in javascript.
I'm also excited to hear you're interested in timed metadata. For that
I would point out the kind=metadata variant of webvtt.
Timed metadata is always going to be a niche application, and needs
specific application support to be useful. We're already implementing
webvtt as a format for HTML5 <track> elements, so it's easy to attach
to files; the specification also allows embedding the same data in the
media resource itself. The most webby thing to do would be to use the
same format for subtitles and for timed metadata so web content can use
the same interface to support whatever it wants to. Right now the
interpretation of kind=metadata webvtt is completely up to the
application, but for interoperability we could suggest a general
framework, like json cue text with a defined semantic vocabulary for
common use cases like geolocation, camera parameters and so on.
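To sketch what JSON cue text could look like in practice, here is a small example. The "lat" and "lon" keys are a made-up vocabulary, not anything that has been specified, and the parser is deliberately simplified: it only handles well-formed cues whose whole payload is one JSON value, and it is not a conforming WebVTT parser.

```python
import json

SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:05.000
{"lat": 37.422, "lon": -122.084}

00:00:05.000 --> 00:00:10.000
{"lat": 37.423, "lon": -122.085}
"""


def parse_timestamp(ts):
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = ts.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_metadata_cues(text):
    """Yield (start, end, payload) for each cue whose text is JSON."""
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                start, rest = line.split("-->", 1)
                end = rest.split()[0]  # ignore any cue settings
                payload = json.loads("\n".join(lines[i + 1:]))
                yield parse_timestamp(start.strip()), parse_timestamp(end), payload
                break
```

An application would then look up the cue covering the current playback time and read whatever keys the agreed vocabulary defines.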
FWIW,
-r
--
Ralph Giles
Xiph.org Foundation for open multimedia
For example, as Matroska already has a lot of metadata support, and
implementations of Matroska already have good support for that, it seems
like it should be paramount that this also be utilized as extensively as
possible within WebM so as to make support for WebM within the wider
Matroska infrastructure as easy as possible. Yes, WebM has the
distinction of being a Web-prioritized format, but I don't think this
should come at the expense of compatibility with other implementations
of the parent format.
What I mean is, if something that this metadata proposal hopes to
achieve can already be done by existing implementations, and doing so
would basically require little to no work to support, then it should be
the goal to make it be so. If it's not possible (as opposed to being
just not favorable), then fine, we should go ahead and propose new
ideas, but still in a way that wouldn't be "annoying" to existing
implementations.
I hope I don't sound overly hostile, but I want WebM to reach the widest
audience possible, and there is an installed base of Matroska users and
developers already existing, and just dumping a new set of requirements
to implement that are incompatible will just cause an annoyance and, in my
humble opinion, will stifle adoption of an otherwise outstanding free
format.
<key points>
So, to summarize my concerns, it seems that the temporal metadata
appears to be a novel idea, and there is nothing significantly
standardized or pre-existing within the Matroska world to support it,
and HTML5 already is standardizing on WebVTT, and there exists an
encapsulation of WebVTT for Matroska already (I hope I got that right),
so the way forward for this seems clear - use and/or build on the
existing support for WebVTT in Matroska, in a way that is compliant with
the spirit and the letter of the HTML5 standard for temporal metadata.
As for so-called global and non-temporal metadata, I find (again, as a
non-developer and a user only) no reason to not use Matroska's existing
infrastructure for metadata/tag support. It already works, and doing
something else seems foolish. On top of that, Matroska *does* seem to
have published some guidelines as to a standard set of metadata, and we
should take as much as possible from that, so as to take advantage of
existing infrastructure.
</key points>
Could someone better in the know point out to me what, if anything,
seems faulty in the above two paragraphs that needs to be addressed, or
is a case of "devil in the details", and we all agree on the basic idea
of not reinventing the wheel, just making sure it's round enough? :)
Yes I think so. We can now handle these in two separate tracks and
eventually start a new discussion around webvtt metadata and how we
might extend or update the global entries using it.
That's fair. Some of the initial debate was focused on making life
easier for the browser in extracting/parsing, that's why we brought
the discussion here. I think the cons presented currently outweigh
any pros (presumed or otherwise), however.
> I hope I don't sound overly hostile, but I want WebM to reach the widest
> audience possible, and there is an installed base of Matroska users and
> developers already existing, and just dumping a new set of requirements
> to implement that incompatible will just cause an annoyance and, in my
> humble opinion, will stifle adoption of an otherwise outstanding free
> format.
>
> <key points>
> So, to summarize my concerns, it seems that the temporal metadata
> appears to be a novel idea, and there is nothing significantly
> standardized or pre-existing within the Matroska world to support it,
> and HTML5 already is standardizing on WebVTT, and there exists an
> encapsulation of WebVTT for Matroska already (I hope I got that right),
Not quite yet, that will be coming in parallel with some of this.
> Right. I misremembered something Silvia mentioned earlier in the
> thread. *SRT* has an encapsulation, WebVTT is *like* SRT, so it should
> be *possible*. I hope I got *that* right now...
That's my understanding as well. There's no proposal for how exactly
to encapsulate webvtt in webm, but we should be able to reuse
the current matroska srt mapping:
http://matroska.org/technical/specs/subtitles/srt.html
An open question is how to handle any of the positioning directives.
So, there are several things at work here:
Firstly we want to make sure WebM has the best possible solution both
for global and timed metadata.
Secondly we want to make sure that it works both with the Web and with
Desktop applications.
Thirdly it would be nice to be able to re-use existing tools that had
been built for Matroska also for WebM.
I must admit that I am not completely opposed to reinventing the wheel
if that means we get a better wheel. I.e. I would take the first goal
over the third goal. However, it has to be very clear that it is
indeed a better solution. For example, if WebM had a much better way
of storing global metadata than the existing Matroska way, I would go
for it - since we want to look forward to the future with WebM and not
back.
Now, what criteria would we use for deciding whether one thing is
better than another? I have some that I care about - do add your own,
cause I'm sure I've overlooked a few.
I believe a good solution must satisfy the following:
* it works well with what HTML5 has specified
* it does not conflict with how existing tools work for Matroska
* the way to use it is obvious: e.g. global metadata is not used for
the same purpose as timed metadata
* it does not confuse people as to what to do: e.g. there is only one
way of doing global metadata
I can't think of anything else right now, so do add your own.
So, back to the technical discussion about global/multiplexed metadata:
* I agree that WebVTT's timed metadata mechanism is the way to go for
timed metadata in WebM, simply because that's how we do it in HTML5
and there is no mechanism for that in Matroska yet. Thus, as we
determine how to encapsulate captions, subtitles and descriptions in
WebM, we should encapsulate WebVTT metadata in exactly the same way
and use that as the solution for timed metadata in WebM.
* Global metadata is a different issue: it really depends on what we
want to be able to handle.
Is the global metadata that we want to support per track? If so, then
it relates to more than just the text tracks and needs to also be able
to be carried for audio and video tracks. In Ogg we do this with
skeleton by having message header fields, and also by having
VorbisComment headers on each track. This is useful because as you rip
out certain tracks from a multitrack resource, you retain the metadata
that relates to the individual track.
Or is it really global metadata that we want to support here? Then it
should logically be carried independently of a track (independently of
WebVTT tracks, too, which can have their own separate metadata). In
Ogg we don't actually have a means for such global metadata.
Matroska's existing metadata (tagging) mechanism is very general: it
allows tags to be attached to a track, to the full resource, to a
chapter, to an edition, or to an attachment (IIUC). This may be too
generic for us to support in WebM - in particular it seems to allow
mixing track-based metadata with truly global metadata, which may be
too confusing.
So, overall, I don't think we've properly specified the problem yet
that we want to solve.
Also note that in HTML5 we actually don't have a way to expose global
metadata to JavaScript (neither truly global nor track-based global
metadata): there is no API. The only thing that HTML5 knows right now
is timed metadata through WebVTT. I would like to see such an API,
which would also be used to e.g. expose ID3 tags for MPEG-based
content to JavaScript, but I don't have high hopes for it because
there also is no API to expose image metadata (e.g. EXIF) to
JavaScript. The point that has been made in the past is that if
JavaScript needs access to such metadata, the server should extract it
from the file and hand it to the Web app rather than JS doing it
itself.
Cheers,
Silvia.
--
When defining global metadata I think we should go with what most people
think of. Most people think of global metadata as "title", "creator",
"license", etc. These are tied to the file. I do understand that if you
tie metadata to the track you satisfy the requirements of some smaller
use cases (e.g. adding a music track to a video sequence), but for most
use cases you are going to be adding complexity and rules to define how
global metadata should be resolved.
I think we should define global metadata as:
* Pertaining to the duration of the file (or live stream), i.e. not to
individual streams.
* Not temporal.
Anything else anyone wants to add?
This is a lot harder to parse with tools than the structured approach where the track is called out explicitly in a field.
Why not use the field when it already exists?
Have you come across tools that ignore that field for MKV files?
On Fri, Jan 27, 2012 at 3:18 PM, Frank Galligan <fgal...@google.com> wrote:
> I don't understand. What is a lot harder to parse?
If we bury the fact that a certain name-value metadata pair applies to
a track rather than the full file in the name of the name-value pair,
then it becomes difficult to determine what that name-value pair
applies to. For example:
name="audio track 1 creator"
value="John Smith"
rather than
name="creator"
value="John Smith"
track="audio track 1"
I was under the impression that you were suggesting the first option.
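To show why the second form is easier for tools, here is a small sketch with a hypothetical record type. In real Matroska tags the target would be a track UID inside the Targets element rather than a free-form string, so treat the field names and values as placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    # Hypothetical layout mirroring the name/value/track example above.
    name: str
    value: str
    track: Optional[str] = None  # None: applies to the whole file


tags = [
    Tag("title", "Holiday 2011"),
    Tag("creator", "Jane Doe"),
    Tag("creator", "John Smith", track="audio track 1"),
]


def tags_for(tags, track=None):
    """Return {name: value} for one track, or for the file if track is None."""
    return {t.name: t.value for t in tags if t.track == track}
```

Selecting file-level or per-track values is then a plain field comparison, and the well-known name "creator" stays the same in both cases, with no string matching on names like "audio track 1 creator".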
>> Why not use the field when it already exists?
>
> Just because something exists doesn't necessarily means it is a good idea to
> use it.
Sure.
> If we think it is a good idea to allow Tracks to have separate metadata from
> themselves and the file metadata that adds complexity to the whole system.
Sure, I was replying under the assumption that we wanted to support it
and your suggestion was to include it in the "name" field to which I
object. If we don't want to support it, then sure we don't need this.
But in this case we also should not try to support a hack for
track-related metadata.
> E.g. if you take your example of adding an API to HTML5 to get the metadata.
> With only file metadata the api might just return a list of name value
> pairs. With file and tracks metadata the api must have a way of
> distinguishing the file metadata vs all of the tracks that may have
> metadata.
Right. There is one more field to parse. And that field already exists
in the Matroska Tags.
We could certainly instead decide not to support that extra field and
instead expect the tracks themselves to take care of their metadata.
For example, text tracks in WebVTT may have metadata header fields
that will be encoded in the CodePrivateData as per the spec proposal.
That's another option for supporting track-related metadata. IIUC
VorbisComment headers are already included in Vorbis tracks. So,
track-related metadata would be another parsing step away, but it
would be possible already.
> The documentation becomes more complex for developers/users as now
> they need to know what a media file is and what a stream/track is. We may
> also have to define precedence rules. E.g. what happens if the file and
> tracks define different licenses?
It's actually already a legal problem that a combination of content
can have a different license to the individual components. This would
be a way to represent the actual legal situation.
> Also it is much easier to add a feature to a spec or API later then it is to
> deprecate one.
Yes, sure. I didn't want to say that we have to do it. I was just
objecting to the proposed solution.
We should probably make a collection of Matroska files that have
metadata and inspect them to see what is currently done. I'm pretty
sure VLC supports it, but not sure to what extent.
Cheers,
Silvia.
> For example, text tracks in WebVTT may have metadata header fields
> that will be encoded in the CodePrivateData as per the spec proposal.
Do you have a link to this proposal?
http://matroska.org/technical/specs/subtitles/srt.html doesn't put
anything in CodecPrivate, but that would be the logical place to store
file header metadata of the sort we've discussed adding to WebVTT.
> That's another option for supporting track-related metadata. IIUC
> VorbisComment headers are already included in Vorbis tracks. So,
> track-related metadata would be another parsing step away, but it
> would be possible already.
This is an excellent point. We should at least offer guidelines about
this. For example, an implementation MAY support both in-stream
metadata like the vorbis comment packet in CodecPrivate and Matroska
tag elements, but the Tag element takes precedence.
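A guideline like that is cheap to implement. Here is a minimal sketch, assuming both sources have already been parsed into plain name-to-value dictionaries. Folding names to upper case is my own assumption, made because Vorbis comment field names are case-insensitive.

```python
def effective_metadata(in_stream, matroska_tags):
    """Merge two {name: value} dicts; Matroska Tag values win on conflict."""
    # Start from the in-stream values (e.g. a Vorbis comment packet) ...
    merged = {k.upper(): v for k, v in in_stream.items()}
    # ... then let the container-level Tag elements override them.
    merged.update({k.upper(): v for k, v in matroska_tags.items()})
    return merged
```

Anything present only in the stream still shows up, so a reader loses nothing by supporting both sources.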
In general, I think track-specific metadata is useful for recording
distinguishing information about a track, like the language, kind,
specific authorship information (e.g. translator) and the license if
it differs from the overall work. Otherwise global metadata should be
preferred.
Matthew sent it on a separate thread to webm-discuss:
https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit
Cheers,
Silvia.
On Sat, Jan 28, 2012 at 1:29 AM, Frank Galligan <fgal...@google.com> wrote:
> On Fri, Jan 27, 2012 at 2:39 AM, Silvia Pfeiffer <silviap...@gmail.com>
> wrote:
>
> This brings up another point which we will need to decide about the spec. Do
> we want the WebM spec to support non-standard metadata? I have just always
> assumed yes, but I guess it doesn't hurt to ask.
Yes, I agree we should.
>> We could certainly instead decide not to support that extra field and
>> instead expect the tracks themselves to take care of their metadata.
>> For example, text tracks in WebVTT may have metadata header fields
>> that will be encoded in the CodePrivateData as per the spec proposal.
>> That's another option for supporting track-related metadata.
>
> Not really as we are proposing WebVTT data will be contained in their own
> WebM track . Right now we don't have any facility tying a WebVTT WebM track
> to another WebM track.
That was not what I meant: I didn't think that a WebVTT track would
contain metadata about other tracks. Just about itself.
>> IIUC
>> VorbisComment headers are already included in Vorbis tracks. So,
>> track-related metadata would be another parsing step away, but it
>> would be possible already.
>
> I would argue against this too for supporting track metadata wrt the WebM
> spec. One it would be specific to one type of stream. Two it would add more
> complexity as file global metadata would be stored differently than vorbis
> track metadata. Then if the WebM spec were to add support for
> track specific metadata for streams you would have to
> put reconciliation rules into effect too.
We need those rules already. What if a WebVTT file has metadata and
the WebM file has global metadata: which wins? I would think we want
the global ones to overrule the track ones.
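A minimal sketch of the reconciliation rule suggested here, where global (file-level) values overrule track-level ones. The function name and the dictionary representation of tags are hypothetical and not part of any WebM or Matroska API.

```python
def effective_metadata(global_tags, track_tags):
    """Merge track-level and global tags; global values win on conflict."""
    merged = dict(track_tags)   # start from what the track says about itself
    merged.update(global_tags)  # global tags overrule track tags
    return merged


# Hypothetical example: a WebVTT track header and file-level tags disagree on TITLE.
track = {"LANGUAGE": "en", "TITLE": "Subtitle draft"}
file_level = {"TITLE": "Final cut", "ARTIST": "Example Artist"}
result = effective_metadata(file_level, track)
```

Keys that exist only on the track (LANGUAGE here) survive the merge, which matches the idea that track metadata records what distinguishes a track from the overall work.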
>> It's actually already a legal problem that a combination of content
>> can have a different license to the individual components. This would
>> be a way to represent the actual legal situation.
>
> So here might be a pro for track specific metadata. Would it resolve the
> problem you mention? Maybe. Assume it did, next question is how often
> does this use case occur?
>
> I really wouldn't want something in a specification that everyone is going
> to have to support if it is only useful 0.1% of the time.
Considering the number of mashups online I think the problem is bigger
than 0.1%. But whether we want to solve it in this way is a different
question.
>> We should probably make a collection of Matroska files that have
>> metadata and inspect them to see what is currently done.
>
> Good idea.
>
> Anyone using MKV with Tags today that they would like to share? Might not be
> a lot of people on a WebM specific list but if you know people using
> Matroska please ask them.
I've done a Google search for "filetype:mkv" which takes you to a lot
of dodgy sites. So I am not sure that's the best way to discover
files. Are there no collections of mkv files for validation of format?
Also, is there a site where we could upload such files? Should we
start a collection?
Cheers,
Silvia.
>> For example, text tracks in WebVTT may have metadata header fields
>> that will be encoded in the CodePrivateData as per the spec proposal.
>
> Do you have a link to this proposal?
Having recently looked at Matroska for the Opus mapping, I wrote a
proposal for adding WebVTT to WebM at
https://wiki.xiph.org/MatroskaWebVTT#DRAFT
Comments welcome.
"- CodecID is S_WEBVTT"
"We need some way to signal the 'kind' attribute from the html5
embedding. That is, whether a given track is subtitles, captions,
description, or metadata."
See Matt's proposal [1], S_TEXT/VTT/kind.
We should merge any differences between these and work from one
source. I believe Matt was working on moving the doc to a wiki on
webmproject.
[1]: https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit
A lot of comments pop in my mind when reading this...
First of all, the current Matroska tags system is actually the 3rd
version we came up with. It was made simple and flexible as much as we
could, while being generic enough to support all we could think of. In
the case of WebM you probably don't need all the cases, but as for the
rest of the WebM specs, you can define a profile of what you use in that
system and what you don't. For example you don't need the AttachmentUID
target or maybe no Targets at all if you only want tags global to the
Segment.
Chained Matroska files (file concatenation) have tags in each segment,
so the tags remain clean from this operation. In a live stream a new
Segment could be created whenever the metadata need to change and the
Tags put at the front of the live stream.
Temporal vs "global". I think the terminology is important here,
temporal tags like 'live' GPS info should be tracks, there is no
question about that. I am not sure this is the focus of this discussion
though. Even though a standard way to do it could be useful for some at
some point... I consider everything non temporal as a "global" tag,
because it needs to be extracted without parsing/playing the whole file.
Also if you want to give metadata for the first half of a video, you're
not going to repeat the tag info every other second in a track. That's
why Matroska Tags sit on top of Chapters when needed. And whatever
solution you pick in the end, it has to cover this use case.
I don't think the idea of having metadata outside of the content file is
a good idea. All tags could be lost if you move a file (to your phone)
and you forget the tag file (or the upload system only accepts certain
types of file). On the other hand it is a bit faster to parse a lot of
metadata for many files. On a server, for instance, you still need
to ask which is the tag file to download for a particular file. Not sure
the round trip helps much.
As you can see here, Tags can be found either at the front or at the very
bottom of the file: http://www.matroska.org/technical/order/index.html
I don't know if anyone is like me, but I tag my own files with custom
genres and grouping names. So the most common case for me would be that
tags are at the back of the file.
WebM could be bold and force tags to be always at the front. Meaning a
big remux is necessary whenever you modify tags a lot. Among
requirements it should also be mandatory to put the tag name before the
tag value. Also the TagSimple may not be recursive (can be used to add
the Twitter/email info about the artist in that tag, for example).
Matroska tags have free name strings, that's so they can easily be
extended and still be meaningful when displayed to the user, even if the
name is not semantically interpreted. I think any good tag system should
have this kind of extension possible.
Steve
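The points above about free name strings, name-before-value ordering, and nested simple tags (for example contact details attached to an artist tag) can be sketched as a small data structure. This is only an illustration: the class and function names are hypothetical, and it does not read or write real EBML.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleTag:
    name: str    # free-form name string, stored before the value
    value: str
    children: List["SimpleTag"] = field(default_factory=list)  # nested tags


def flatten(tag, prefix=""):
    """Yield (path, value) pairs so even unknown names can be displayed."""
    path = f"{prefix}/{tag.name}" if prefix else tag.name
    yield path, tag.value
    for child in tag.children:
        yield from flatten(child, path)


# Hypothetical tag names and values.
artist = SimpleTag("ARTIST", "Example Band",
                   [SimpleTag("EMAIL", "band@example.org")])
pairs = list(flatten(artist))
```

A WebM profile that disallowed recursion would simply reject any tag whose `children` list is non-empty.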
All the files here are properly tagged (although not using any advanced
features of Matroska tags) :
http://www.matroska.org/downloads/test_w1.html
I am not sure about the codec ID S_TEXT/VTT/kind
"S_" stands for subtitles.
At some point we had something called 'control tracks', which would have
had a "C_" prefix. Maybe we could use a "D_" prefix for all non video,
audio and subtitle data. D is for data. I also think it is wrong to use
the type 0x11 for all these WebVTT data as they are not subtitles.
If WebVTT subtitles are just UTF-8 strings, then the regular
S_TEXT/UTF-8 codec ID should be used for broader compatibility.
Chapters inside a track are not very useful. And using CodecPrivate will
completely break after a remuxing/editing.
Not sure it's already supported; the most likely candidate would be GStreamer.
But yes, that's how it should work. I am not sure players that can
handle live streams are OK with new Segments in the stream either.
>> WebM could be bold and force tags to be always at the front. Meaning a big
>> remux is necessary whenever you modify tags a lot. Among requirements it
>> should also be mandatory to put the tag name before the tag value. Also the
>> TagSimple may not be recursive (can be used to add the Twitter/email info
>> about the artist in that tag, for example).
>
> I think this doesn't matter so much. If I'm not mistaken, the general way
> to deal with this is to just leave some extra space where additional tags
> could be, so they could be updated. However, I think the remuxing case
> isn't that annoying because it won't come up that often, and not often
> enough to write the standard around it.
You have to consider cover art. This is usually a JPEG file that is
many KB large. Requiring 50 KB of padding at the front in case you
want to add cover art later is not a good idea. In Matroska the cover art is put in
Attachments (usually found at the back of the file, mkclean puts it at
the front). See http://www.matroska.org/technical/cover_art/index.html
--
Steve Lhomme
Matroska association Chairman
--
You received this message because you are subscribed to the Google Groups "WebM Discussion" group.
To post to this group, send email to webm-d...@webmproject.org.
To unsubscribe from this group, send email to webm-discuss+unsubscribe@webmproject.org.
I guess adaptive streaming is just like concatenating segments, except
they come from different URLs. Also for (live) adaptive streaming I
think the metadata are supposed to be available in the manifest file
(which is reloaded each time for a live stream).
On Mon, Jan 30, 2012 at 4:23 PM, Frank Galligan <fgal...@google.com> wrote:
> Thanks. Do you also know of any solutions using tags with track specific
> ids? I'm trying to come up with use cases.
>
> Frank
>
>
> On Sun, Jan 29, 2012 at 10:14 AM, Steve Lhomme <slh...@matroska.org> wrote:
>>
>> On 28/01/2012 00:46, Silvia Pfeiffer wrote:
>>
>>>> Anyone using MKV with Tags today that they would like to share? Might
>>>> not be
>>>> a lot of people on a WebM specific list but if you know people using
>>>> Matroska please ask them.
>>>
>>>
>>> I've done a Google search for "filetype:mkv" which takes you to a lot
>>> of dodgy sites. So I am not sure that's the best way to discover
>>> files. Are there no collections of mkv files for validation of format?
>>> Also, is there a site where we could upload such files? Should we
>>> start a collection?
>>
>>
>> All the files here are properly tagged (although not using any advanced
>> features of Matroska tags) :
>> http://www.matroska.org/downloads/test_w1.html
On 01/29/2012 10:12 AM, Steve Lhomme wrote:
> Hello everyone,
Hello, Steve. Glad to see participation from the Matroska team on this specific discussion. :)
Does this mean that the way to update metadata for a live stream (e.g., title, creator/performer, etc.) is to place tags at the front of a new segment within the stream? Is this a behavior that's already supported by some/most players? If I'm not mistaken, I believe this is how it works with Ogg as well.
A lot of comments pop in my mind when reading this...
First of all, the current Matroska tags system is actually the 3rd version we came up with. It was made simple and flexible as much as we could, while being generic enough to support all we could think of. In the case of WebM you probably don't need all the cases, but as for the rest of the WebM specs, you can define a profile of what you use in that system and what you don't. For example you don't need the AttachmentUID target or maybe no Targets at all if you only want tags global to the Segment.
Chained Matroska files (file concatenation) have tags in each segment, so the tags remain clean from this operation. In a live stream a new Segment could be created whenever the metadata need to change and the Tags put at the front of the live stream.
This is a compelling argument to make "data" that changes regularly throughout the file be a track, as opposed to just metadata. I think the distinction is whether we consider it data in and of itself, or look at it solely as data describing other data that we're not interested in on its own. Given your stance, I tend to agree with that viewpoint. Why not just make data that changes throughout the life of the file be its own track? Then it will have all the timing information we could want. Since the "temporal metadata" would still have to be spread across the file anyway, and it wouldn't have been placed at, say, the head of the file, this shouldn't really cause a lot of complications.
Temporal vs "global". I think the terminology is important here, temporal tags like 'live' GPS info should be tracks, there is no question about that. I am not sure this is the focus of this discussion though. Even though a standard way to do it could be useful for some at some point... I consider everything non temporal as a "global" tag, because it needs to be extracted without parsing/playing the whole file. Also if you want to give metadata for the first half of a video, you're not going to repeat the tag info every other second in a track. That's why Matroska Tags sit on top of Chapters when needed. And whatever solution you pick in the end, it has to cover this use case.
Personally, I totally agree with this. I know that there's been some discussion elsewhere about putting associated information (I think it was subtitles?) in a separate file, and that is for ease of parsing by Javascript, I think. However, it would be nice if WebM itself supported such metadata inside the file, and not require it to be outside.
I don't think the idea of having metadata outside of the content file is a good idea. All tags could be lost if you move a file (to your phone) and you forget the tag file (or the upload system only accepts certain types of file). On the other hand it is a bit faster to parse a lot of metadata for many files. On a server, for instance, you still need to ask which is the tag file to download for a particular file. Not sure the round trip helps much.
I think this doesn't matter so much. If I'm not mistaken, the general way to deal with this is to just leave some extra space where additional tags could be, so they could be updated. However, I think the remuxing case isn't that annoying because it won't come up that often, and not often enough to write the standard around it.
As you can see here, Tags can be found either at the front or at the very bottom of the file: http://www.matroska.org/technical/order/index.html
I don't know if anyone is like me, but I tag my own files with custom genres and grouping names. So the most common case for me would be that tags are at the back of the file.
WebM could be bold and force tags to be always at the front. Meaning a big remux is necessary whenever you modify tags a lot. Among requirements it should also be mandatory to put the tag name before the tag value. Also the TagSimple may not be recursive (can be used to add the Twitter/email info about the artist in that tag, for example).
Thanks again!
Matroska tags have free name strings, that's so they can easily be extended and still be meaningful when displayed to the user, even if the name is not semantically interpreted. I think any good tag system should have this kind of extension possible.
Steve
UPDATE: I just read the page at the link above, and I think what I'm
referring to is the meta seek, and that having more than one meta seek
in a file is deprecated. So, hopefully someone that understands better
can let me know if both methods are still available or it's just fine to
swallow it and say, "Look, tags go here. Remux if you need more
space." Or perhaps I'm just confused.
>
>
>
> Matroska tags have free name strings, that's so they can
> easily be extended and still be meaningful when displayed to
> the user, even if the name is not semantically interpreted. I
> think any good tag system should have this kind of extension
> possible.
>
> Steve
>
> Thanks again!
> On Sun, Jan 29, 2012 at 10:22 AM, Steve Lhomme <slh...@matroska.org> wrote:
>>
>> On 28/01/2012 00:23, Silvia Pfeiffer wrote:
>>
>>> On Sat, Jan 28, 2012 at 10:09 AM, Ralph Giles<gi...@xiph.org> wrote:
>>>>
>>>> On 27 January 2012 20:39, Silvia Pfeiffer<silviap...@gmail.com>
>> Matthew sent it on a separate thread to webm-discuss:
>>
>> https://docs.google.com/document/d/1-tVXd1mRlWNvZrdIkLAJEp5xt3gDVDwfVubyUm9oNJ4/edit
Great, thanks.
On 30 January 2012 04:22, Steve Lhomme <slh...@matroska.org> wrote:
> If WebVTT subtitles are just UTF-8 strings, then the regular S_TEXT/UTF-8
> codec ID should be used for broader compatibility.
For kind=subtitle and kind=caption, webvtt can have angle-bracket
markup, so I don't think this will work in general; the decoder needs
to know if it must parse the internal markup.
What do you think about having /kind in the CodecID vs having to look
in codec private to determine this?
> Chapters inside a track are not very useful. And using CodecPrivate will
> completely break after a remuxing/editing.
I agree. I suggested chapters be translated into the equivalent
matroska chapter elements in my draft.
-r
FWIW I agree with the "D_" prefix.
Cheers,
Silvia.
Did you mean MKV chapters? Or .. can you explain the difference
between WebM tracks and MKV tracks?
> 2. All other WebVTT data will be stored in a webm track. The block payload
> would be a WebVTT cue minus the webvtt timing information. The "-->" would
> still be there as a placeholder so the WebVTT decoder will know how to recreate
> the WebVTT cue.
> 3. No data would be stored in the webm CodecPrivate.
> (This is actually how the proposal first looked.)
>
> I think we can still support live chapters if we treat them just like the
> other webvtt data. So live chapters would get inserted into a webm track and
> VOD chapters would get stored as MKV chapters.
>
> So it will be up to the client demuxer if it wants to translate the WebM
> chapters into WebVTT chapters. It will be the job of a WebVTT decoder to
> handle the WEBVTT tracks in a WebM file.
SGTM.
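To make the quoted block layout concrete, here is a rough sketch of splitting a WebVTT cue into block timing plus a payload that keeps "-->" as a placeholder. The function names are hypothetical, the cue text is made up, and real WebVTT parsing has more cases than this handles.

```python
def parse_timestamp_ms(ts):
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to milliseconds."""
    parts = ts.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return round((hours * 3600 + minutes * 60 + seconds) * 1000)


def split_cue(cue):
    """Return (start_ms, duration_ms, payload) for one WebVTT cue."""
    lines = cue.split("\n")
    idx = next(i for i, line in enumerate(lines) if "-->" in line)
    start, _, rest = lines[idx].partition("-->")
    fields = rest.split()
    end, settings = fields[0], fields[1:]
    # Drop the timestamps but keep "-->" and any cue settings in the payload.
    lines[idx] = " ".join(["-->"] + settings)
    start_ms = parse_timestamp_ms(start.strip())
    return start_ms, parse_timestamp_ms(end) - start_ms, "\n".join(lines)


cue = "1\n00:00:01.000 --> 00:00:04.500 align:start\nHello"
start_ms, duration_ms, payload = split_cue(cue)
```

The start time would become the block timestamp and the duration the block duration, so a demuxer can rebuild the original timing line from the container alone.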
What about global metadata? Should we just make one spec that deals
with WebVTT and global metadata and chapters?
Cheers,
Silvia.
After getting feedback on the global portion of the metadata the
consensus seems to be for using the Matroska tag system as a base.
This will allow extraction of the data with minimal webm file support
and provide some existing tool support for manipulating these files. I
think using a similar wiki setup, as is being proposed for webvtt
embedding, to decide what is added/removed for webm from the existing
framework will work for this.
If anyone feels strongly about not using matroska tags as the basis
for global metadata, now is the time to speak up.
What do you use for SRT? Why are you treating WebVTT subtitles
differently from SRT subtitles?
> At some point we had something called 'control tracks', such would have had
> a "C_" prefix. Maybe we could use a "D_" prefix for all non video, audio and
> subtitle data. D is for data.
To clarify: you think the codec ID should be D_TEXT/VTT/kind then?
> I also think it is wrong to use the type 0x11
> for all these WebVTT data as they are not subtitles.
If not 0x11, then what value?
> If WebVTT subtitles are just UTF-8 strings, then the regular S_TEXT/UTF-8
> codec ID should be used for broader compatibility.
Are SRT subtitles "just strings"?
> Chapters inside a track are not very useful. And using CodecPrivate will
> completely break after a remuxing/editing.
OK, we have tentatively decided to convert WebVTT chapters to Matroska chapters.
We use this: http://www.matroska.org/technical/specs/subtitles/srt.html
In short, the juicy information is used in a S_TEXT/UTF8 track.
>> At some point we had something called 'control tracks', such would have had
>> a "C_" prefix. Maybe we could use a "D_" prefix for all non video, audio and
>> subtitle data. D is for data.
>
> To clarify: you think the codec ID should be D_TEXT/VTT/kind then?
No, whatever can be mapped to existing things should be done that way.
Other things (like temporal data) should use new tracks IDs. For
example D_GPS/VTT for a track that has GPS data in the format used in
WebVTT.
>> I also think it is wrong to use the type 0x11
>> for all these WebVTT data as they are not subtitles.
>
> If not 0x11, then what value?
http://www.matroska.org/technical/specs/index.html#TrackType
You could use 0x20 or something not defined in there.
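For reference, the TrackType values as I understand them from the Matroska table linked above can be written out as an enum. Treat the list as a sketch to be checked against the spec; the helper name is hypothetical.

```python
from enum import IntEnum


class TrackType(IntEnum):
    VIDEO = 0x01
    AUDIO = 0x02
    COMPLEX = 0x03
    LOGO = 0x10
    SUBTITLE = 0x11
    BUTTONS = 0x12
    CONTROL = 0x20


def is_defined(value):
    """Return True if value is one of the TrackType values listed above."""
    return value in {int(t) for t in TrackType}
```

If this list is right, 0x20 is the value reserved for control tracks, so a new kind of data track would need a value that `is_defined` rejects.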
>> If WebVTT subtitles are just UTF-8 strings, then the regular S_TEXT/UTF-8
>> codec ID should be used for broader compatibility.
>
> Are SRT subtitles "just strings"?
See the link above.
>> Chapters inside a track are not very useful. And using CodecPrivate will
>> completely break after a remuxing/editing.
>
> OK, we have tentatively decided to convert WebVTT chapters to Matroska chapters.
\o/
Is the "S_" and "D_" indication used anywhere or are these IDs used as
mime type equivalents? I.e. are there any apps that care about all
"S_" type tracks rather than "S_TEXT/UTF8" as the identifier of a
subtitle track? Other than applications that would count how many
subtitle tracks a video collection typically has, I fail to see a use
case for separating out the "subtitle" information from the format
identifying information.
Basically, I am asking because I wonder if we'd rather just have
something like "D_TEXT/VTT/kind" for all kinds of WebVTT tracks, or
instead have "S_TEXT/VTT" (for subtitles) and "C_TEXT/VTT" (for
captions) and "D_TEXT/VTT" (for description), and "M_TEXT/VTT/type"
(for metadata with type providing further information on the cue
content format).
Cheers,
Silvia.
These names are made human readable to make it easy to search for the
proper codec when you don't know what it is. The type prefix is just a
naming convention.
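Since the prefix is only a naming convention, a reader of the codec ID can treat it as a hint and split it off from the rest. A small sketch, assuming the usual V_/A_/S_ prefixes plus the "D_" prefix proposed in this thread; the function and table names are hypothetical.

```python
PREFIX_HINTS = {
    "V": "video",
    "A": "audio",
    "S": "subtitle",
    "D": "data (proposed in this thread)",
}


def split_codec_id(codec_id):
    """Split "S_TEXT/UTF8" into a human-readable hint and its path parts."""
    prefix, _, rest = codec_id.partition("_")
    return PREFIX_HINTS.get(prefix, "unknown"), rest.split("/")


subtitle = split_codec_id("S_TEXT/UTF8")
gps = split_codec_id("D_GPS/VTT")
```

A decoder would still select its code path from the full string, not from the prefix alone.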
> subtitle track? Other than applications that would count how many
> subtitle tracks a video collection typically has, I fail to see a use
> case for separating out the "subtitle" information from the format
> identifying information.
>
> Basically, I am asking because I wonder if we'd rather just have
> something like "D_TEXT/VTT/kind" for all kinds of WebVTT tracks, or
> instead have "S_TEXT/VTT" (for subtitles) and "C_TEXT/VTT" (for
> captions) and "D_TEXT/VTT" (for description), and "M_TEXT/VTT/type"
> (for metadata with type providing further information on the cue
> content format).
Yes, each part should be separated and mapped to existing Matroska
structures to describe the same things. For example S_TEXT/VTT would
not exist, it's just S_TEXT/UTF8 with the same mapping as SRT.
Captions are handled like a subtitle track. Chapters would not go
inside a track and metadata probably not either.
> Cheers,
> Silvia.
We're talking about timed metadata here, so it would. I agree that
header-style metadata and chapters should not go in a track.
I hadn't really expected that the encapsulation would explicitly know
about the kind that goes into the track, but if there is already a
convention that subtitle/caption tracks start with "S_" and "D_" is
data (not "M_" as I mentioned above), then we should likely stick with
that. What do we do with descriptions? They are time-aligned text that
the screen reader turns into voice. Since "D_" is taken, maybe we
should use "A_" for "audio description"?
Silvia.
Another question that just came to mind: if the codec ID field does
not contain information on what format the content will be in, how
does the decoder know which code path to use? WebVTT has markup that
is different from SRT, so we can't just pretend it is the same. I
think it would need to be "S_WEBVTT" or "S_TEXT/VTT".
Silvia.
But that's an argument for using "S_TEXT/VTT/kind" as the codec ID.
> Other things (like temporal data) should use new tracks IDs. For
> example D_GPS/VTT for a track that has GPS data in the format used in
> WebVTT.
There is no such thing as "temporal metadata". Yes there has been
interest in specifying some mechanism for carrying temporal metadata
in WebM, and in particular for specifying GPS payload that can vary
over time, but this is a separate discussion. Our immediate goal is
very narrow: decide how to embed WebVTT files in a WebM file.
>>> I also think it is wrong to use the type 0x11
>>> for all these WebVTT data as they are not subtitles.
>>
>> If not 0x11, then what value?
>
> http://www.matroska.org/technical/specs/index.html#TrackType
> You could use 0x20 or something not defined in there.
Well this strikes me as a distinction without a difference. But I
have no particular opinion about what the correct value should be.
(To be honest: I don't know what a "control" track type is.)
-Matt
So is the TrackType repeating what the ID is getting? I mean: if
TrackType is set to 0x11 for a WebVTT track with kind=subtitles (or
captions), then the ID has to start with "S_"?
Silvia.
This page is up [1]. For now it only contains edits to the supported
elements. For official tags should we similarly use Matroska as the
basis and pare down where necessary?
[1] http://wiki.webmproject.org/webm-metadata/global-metadata
Technically any subtitle codec that is text based could be used for text
to speech. S_TEXT/UTF8 would be the easiest candidate. The other ones
have some presentation data in them but can always be stripped to only
have the text.
Creating an audio codec which in fact only has text inside is also
doable. The container doesn't need to know about that trick.
What do you call "markup" ? If the only usable information are
start/stop timecodes and the text to render, then S_TEXT/UTF8 is fine.
I don't know enough about VTT to tell if it fits exactly into something
pre-existing.
>>>> I also think it is wrong to use the type 0x11
>>>> for all these WebVTT data as they are not subtitles.
>>>
>>> If not 0x11, then what value?
>>
>> http://www.matroska.org/technical/specs/index.html#TrackType
>> You could use 0x20 or something not defined in there.
>
> Well this strikes me as a distinction without a difference. But I
> have no particular opinion about what the correct value should be.
> (To be honest: I don't know what a "control" track type is.)
A control track can, for example, tell the player to switch on/off a
track, seek to another position in the file, etc. It "controls" the
playback. There is no real-life example of this, but it could be done.
For now we have done it using chapter "codecs" which is more flexible.
No, it's just a convention for users to know what codec they are
supposed to have.
With markup I mean HTML-style markup: <ruby>, <v>, <c>, <b>, <i>, etc.
Silvia.
OK, these are not supported (although I'm told some SRT files have <i>
and a DirectShow filter handles them).
So "S_TEXT/VTT" would be the nicest codec ID, IMO.
> So "S_TEXT/VTT" would be the nicest codec ID, IMO.
Yes. S_TEXT/UTF8 only makes sense if the VTT file has no markup.
Is there another way to indicate the 'kind' attribute for the track if
you don't like the S_TEXT/VTT/<kind> codec id? It makes sense to use a
different Track Type for metadata than for subtitles, but what about
distinguishing captions and descriptions from subtitles? Should
parsers just look for the header metadata in the CodecPrivate element?
-r
--
Ralph Giles
Xiph.org Foundation for open multimedia
> [1] http://wiki.webmproject.org/webm-metadata/global-metadata
--
# the same in stream/out of stream differentiation should be done for metadata
# the BlockGroup+BlockDuration approach seems to be the correct one
# file-wide data should be stored in the codec private, the language
should be extracted. I'm not sure it should be removed.
# the Default Cue data seem to manage a codec state
(http://www.w3.org/WAI/PF/HTML/wiki/Media_WebVTT_Changes#Default_Cue_Settings),
so it should be in a BlockGroup. Having a null duration seems good too
(if it's wrongly interpreted, it should not render anyway). So using a
SimpleBlock is OK too.
Given it's a state that is preserved, maybe further Blocks of data
could reference this Default Block (like a P frame). But it becomes
impossible if there are different default states for different
parameters changing over time.
Maybe the only clean solution would be to "expand" the default values
in every block. After all it's just a compression of data (for the
writer). When extracting, a smart program could "compress" back the
data using default states too.
Can you elaborate on this a bit? It's not clear what you mean.
> # the BlockGroup+BlockDuration approach seems to be the correct one
OK
> # file-wide data should be stored in the codec private, the language
> should be extracted. I'm not sure it should be removed.
That part of the WebVTT spec hasn't been finalized, so putting it in
the CodecPrivate is just one way we could do it, if file-wide metadata is
standardized.
> # the Default Cue data seem to manage a codec state
> (http://www.w3.org/WAI/PF/HTML/wiki/Media_WebVTT_Changes#Default_Cue_Settings),
> so it should be in a BlockGroup. Having a null duration seems good too
> (if it's wrongly interpreted, it should not render anyway). So using a
> SimpleBlock is OK too.
We just need to figure out how to handle the timestamp of the block.
> Given it's a state that is preserved, maybe further Blocks of data
> could reference this Default Block (like a P frame).
Right. But to make this work, we might need some notion of an
I-frame, in which we collect all the current defaults, and write a
proper non-default WebVTT cue.
> But it becomes
> impossible if there are different default states for different
> parameters changing over time.
> Maybe the only clean solution would be to "expand" the default values
> in every block.
That's another option -- when you embed a WebVTT cue, you write a proper cue.
It does mean that the muxer would have to be a state machine, with
intimate knowledge of WebVTT cue syntax, in order to synthesize a
proper (non-default) cue value. This is the same thing a WebVTT
renderer (or "decoder") must do, so perhaps this can be reused in
library form.
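
To make the "expand the defaults" idea concrete, here is a minimal
sketch of what the muxer-side state machine might look like. It is
only an illustration under stated assumptions: cue settings are
treated as space-separated name:value pairs, a later defaults block
overrides earlier defaults per setting name (the proposal has not
settled this), and the event tuples and function names
(parse_settings, format_settings, expand_defaults) are hypothetical,
not part of any WebVTT or Matroska library.

```python
def parse_settings(text):
    """Parse space-separated name:value cue settings into a dict."""
    settings = {}
    for token in text.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            settings[name] = value
    return settings


def format_settings(settings):
    """Serialize a settings dict back into a cue settings string."""
    return " ".join(f"{name}:{value}" for name, value in settings.items())


def expand_defaults(events):
    """Merge the running defaults into every cue.

    events is a list of hypothetical tuples:
      ("defaults", settings_text)
      ("cue", settings_text, payload)
    Returns a list of (settings_text, payload) with no defaults left.
    """
    defaults = {}
    expanded = []
    for event in events:
        if event[0] == "defaults":
            # Assumption: new defaults override old ones per setting name.
            defaults.update(parse_settings(event[1]))
        else:
            _, settings_text, payload = event
            merged = dict(defaults)
            # Settings written on the cue itself win over the defaults.
            merged.update(parse_settings(settings_text))
            expanded.append((format_settings(merged), payload))
    return expanded


events = [
    ("defaults", "align:start line:0"),
    ("cue", "", "Hello"),
    ("cue", "align:end", "World"),
    ("defaults", "line:5"),
    ("cue", "position:50%", "Again"),
]
for settings_text, payload in expand_defaults(events):
    print(settings_text, "|", payload)
```

Each cue written this way is self-contained, so a reader that seeks
into the middle of the track does not need any earlier block to
interpret it. The cost is the extra bytes per cue and the loss of the
information that a given setting came from a default.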
> After all it's just a compression of data (for the
> writer). When extracting, a smart program could "compress" back the
> data using default states too.
Right.
-Matt
We need to be careful about the default cue settings. They have not
been accepted into the WebVTT specification and are just a proposal at
this stage.
When encoding them, I would prefer them to be header-like data rather
than replicated across each cue, which both takes up more space and is
harder to identify as a default setting.
Cheers,
Silvia.
Actually, if default cue settings were used within the file rather
than just as header-type data, they would need to follow the cue
parsing convention, so they could easily be embedded as a cue as well.
> It does mean that the muxer would have to be a state machine, with
> intimate knowledge of WebVTT cue syntax, in order to synthesize a
> proper (non-default) cue value. This is the same thing a WebVTT
> renderer (or "decoder") must do, so perhaps this can be reused in
> library form.
The only problem would be if you're seeking through a file and miss
that cue, so your settings on the cues that come thereafter may be
wrong. For this case we could indeed have an "I-frame" type link back
to the last default cue settings cue. Or alternatively we could move
such cues to the header of the file. Or finally, if we really can't
avoid it, we could indeed add these cue settings to every cue
thereafter. But we'd need to do some of the cue settings
interpretation during muxing in this case, because some of the default
settings may be overwritten by a cue. I'd prefer to avoid that last
option.
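
As a rough illustration of the link-back option, the sketch below
tags every cue with the index of the most recent defaults block, so
that a player landing on a cue after a seek can fetch the settings it
missed. Everything here is hypothetical: the block dictionaries, the
defaults_ref field and the function names are made up for the
example and are not Matroska elements. It also assumes blocks are
sorted by time and that only the latest defaults block matters.

```python
from bisect import bisect_left


def link_defaults(blocks):
    """Return copies of blocks where each cue carries 'defaults_ref'.

    blocks: list of dicts with 'time', 'kind' ('defaults' or 'cue')
    and 'text', sorted by 'time'. 'defaults_ref' is the index of the
    most recent defaults block, or None if there was none yet.
    """
    last_defaults = None
    linked = []
    for index, block in enumerate(blocks):
        entry = dict(block)
        if block["kind"] == "defaults":
            last_defaults = index
        else:
            entry["defaults_ref"] = last_defaults
        linked.append(entry)
    return linked


def seek(linked, target_time):
    """Find the first cue at or after target_time.

    Returns (defaults_block_or_None, cue_block_or_None).
    """
    times = [block["time"] for block in linked]
    start = bisect_left(times, target_time)
    for block in linked[start:]:
        if block["kind"] == "cue":
            ref = block["defaults_ref"]
            return (linked[ref] if ref is not None else None), block
    return None, None


blocks = [
    {"time": 0, "kind": "defaults", "text": "align:start"},
    {"time": 1, "kind": "cue", "text": "A"},
    {"time": 5, "kind": "defaults", "text": "align:end"},
    {"time": 6, "kind": "cue", "text": "B"},
    {"time": 9, "kind": "cue", "text": "C"},
]
defaults, cue = seek(link_defaults(blocks), 7)
print(defaults["text"], "|", cue["text"])
```

A single back-reference like this only works if one defaults block
fully describes the current state. If different settings can change
at different times, the referenced block would itself have to carry
the complete set of defaults in force at that point.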
Silvia.