[whatwg] [media] startOffsetTime, also add startTime?


Odin Hørthe Omdal

Mar 7, 2012, 5:56:42 AM
to wha...@lists.whatwg.org, sean.o...@gmail.com, ingar....@gmail.com
startOffsetTime seems to leave people confused. I often have to explain it,
and yesterday I read the spec[1] and old emails and got confused myself.
It hasn't been implemented after almost 2 years.


Having the UTC time of the clip you're getting would be very useful. But
it'd be really nice to get the start of the non-normalized timestamp
exposed to JavaScript for synchronizing out-of-band metadata with the live
streamed media.

Browsers are currently supposed to take the timestamp and normalize it to
0 for currentTime. Chromium currently does not do that; it starts at 3:15,
if I join a streamed video that I started streaming 3 minutes, 15 seconds
ago.

I don't think using the UTC time as the sync point is very stable at the
moment. It'd be a much quicker, more stable, and easier win to get a startTime,
timelineStartTime or timeSinceStart or similar that exposes the
NON-normalized timestamp value at the start of the stream. So that, if you
do

startTime + currentTime

you're able to get the actual timestamp that the stream is at, at that
point. And in contrast with startOffsetTime this one won't ever change, so
startTime + currentTime will always be continuously increasing.
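
A minimal sketch of that sum, assuming a hypothetical startTime property (no browser exposes one; the plain objects below stand in for media elements). It shows two viewers who joined the same live stream at different points arriving at the same stream timestamp:

```javascript
// Hypothetical: `startTime` is the proposed (unimplemented) attribute holding
// the stream's first non-normalized timestamp, in seconds.
function streamTimestamp(media) {
  return media.startTime + media.currentTime;
}

// Two viewers of the same live stream, modelled as plain objects.
const early = { startTime: 0, currentTime: 200 }; // joined at the very start
const late = { startTime: 195, currentTime: 5 };  // joined 3:15 into the stream

console.log(streamTimestamp(early), streamTimestamp(late)); // 200 200
```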

The DateUTC value that startOffsetTime would use seems to vary quite a
bit. You need to know your streaming server and what it does in order to
understand the result. Even different media from the same server might
give different results if the streaming server implementation just reads
the UTC time directly from the file. The information could be useful, but
for more advanced uses.


startOffsetTime and initialTime came out of this conversation in 2010:
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-May/thread.html#26342>

And introduced here:
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-August/028004.html>


Sean O'Halpin of BBC recently mentioned[2] some of the confusion:

> There seems to be some confusion here in how the HTML5 media elements
> specification is dealing with logical stream addressing versus physical
> stream addressing. The excerpt above talks about a user agent being able
> to "seek to an earlier point than the first frame originally provided by
> the server" but does not explain how this could possibly happen without
> communication back to the server, in which case we are effectively
> dealing with a request for a different physical resource. At the very
> least, the fact that the Firefox and Chrome teams came up with different
> interpretations shows that this part of the specification would benefit
> from clarification.


And an earlier blog post about startOffsetTime specifically[3]:

> The reason for setting this out is that we'd like to see consistent
> support for startOffsetTime across all commonly used codecs and for
> browser vendors to bring their implementations into line with the
> published HTML5 media elements specification. There are ambiguities in
> the specification itself, such as the interpretation of 'earliest
> seekable position', which could be clarified, especially with respect to
> continuous live streaming media. Browser vendors need to agree on a
> common interpretation of attributes such as currentTime so others can
> experiment with the exciting possibilities this new technology is
> opening up.

Sooo... It would be nice to get some real cleanups to the whole media +
time thing. :D

1.
<http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#offsets-into-the-media-resource>
2.
<http://www.bbc.co.uk/blogs/researchanddevelopment/2012/02/what-does-currenttime-mean-in.shtml>
3.
<http://www.bbc.co.uk/blogs/researchanddevelopment/2012/01/implementing-startoffsettime-f.shtml>
--
Odin Hørthe Omdal · Core QA, Opera Software · http://opera.com /

Rick Waldron

Mar 7, 2012, 10:04:33 AM
to Odin Hørthe Omdal, wha...@lists.whatwg.org, sean.o...@gmail.com, ingar....@gmail.com
Thanks for putting this together Odin -- this has long been a point of
interest for all of us on the Popcorn.js dev team.

Rick


Philip Jägenstedt

Mar 8, 2012, 5:11:06 AM
to wha...@lists.whatwg.org
On Wed, 07 Mar 2012 11:56:42 +0100, Odin Hørthe Omdal <odi...@opera.com>
wrote:

> startOffsetTime seem to leave people confused, I often have to explain
> it, and yesterday I read the spec[5] and old emails and got confused
> myself. It hasn't been implemented after almost 2 years.

We (Opera) have wanted to implement this for a long time, but it has been
stalled by the fact that the spec is confusing to the point that we
haven't been able to agree on what it's actually trying to say. Let's fix
that.

> Having the UTC time of the clip you're getting would be very useful. But
> it'd be really nice to get the start of the non-normalized timestamp
> exposed to javascript for synchronizing out-of-band metadata with the
> live streamed media.
>
> Browsers are currently supposed to take the timestamp and normalize it
> to 0 for currentTime. Chromium currently does not do that; it starts at
> 3:15, if I join a streamed video that I started streaming 3 minutes, 15
> seconds ago.
>
> I don't think using the UTC time as the sync point is very stable at the
> moment. It'd be a much quicker, stable, and easier win to get a
> startTime, timelineStartTime or timeSinceStart or similar that exposes
> the NON-normalized timestamp value at the start of the stream. So that,
> if you do
>
> startTime + currentTime
>
> you're able to get the actual timestamp that the stream is at, at that
> point. And in contrast with startOffsetTime this one won't ever change,
> so startTime + currentTime will always be continuously increasing.

I agree that it would be useful to expose the constant by which timestamps
are adjusted to guarantee that currentTime starts at 0 and ends at
duration. I think that a name like startTime (or initialTime) would
suggest that it is the initial value of currentTime, which it is not.

I suggest the property offsetTime, defined as the stream time in seconds
which currentTime and duration are relative to. In practice it would often
be understood as the "time since the server began streaming" and would be
useful to sync live streams with out-of-band content simply by letting the
out-of-band content be relative to the start of the stream. No round-trip
with Date representations should be necessary in the common case.
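
As a rough sketch of that kind of sync, assuming the proposed offsetTime existed (it is hypothetical, as are the cue list and the plain object standing in for a media element), out-of-band cues timed relative to the start of the stream could be looked up like this:

```javascript
// Out-of-band cues, timed in seconds since the server began streaming.
// The cue contents are made up for illustration.
const cues = [
  { at: 0, text: "Stream begins" },
  { at: 180, text: "Interview starts" },
  { at: 240, text: "Q&A" },
];

// `offsetTime` is the property proposed in this thread; it is hypothetical.
function activeCue(media, cueList) {
  const streamTime = media.offsetTime + media.currentTime;
  let active = null;
  for (const cue of cueList) {
    // Assumes cueList is sorted by `at`; the last cue not in the future wins.
    if (cue.at <= streamTime) active = cue;
  }
  return active;
}

const media = { offsetTime: 195, currentTime: 10 }; // stand-in for a media element
console.log(activeCue(media, cues).text); // "Interview starts"
```

No Date object is involved; everything stays in seconds on the stream's own timeline.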

As hinted above, I don't think that startOffsetTime should really be the
first choice for trying to sync live streams. However, knowing the date of
a video is still useful, potentially even for the streaming case, so we do
want to expose the DateUTC field from WebM. However, startOffsetTime is a
bad name for it, since it's not using the same unit as currentTime. I
suggest offsetDate, to go with offsetTime.

Finally, what about initialTime? It can be set to a non-zero value at two
points in the spec:

"Establish the media timeline for the purposes of the current playback
position, the earliest possible position, and the initial playback
position, based on the media data."

"If either the media resource or the address of the current media resource
indicate a particular start time, then set the initial playback position
to that time and"

Does any format expose something like this in-band? I don't know of any
that do, or how to implement this, so the only thing that remains is
exposing the start time of media fragments. This seems rather useless to
me, so unless someone has already implemented initialTime and can explain
what it means, I suggest dropping it from the spec.

--
Philip Jägenstedt
Core Developer
Opera Software

Philip Jägenstedt

Mar 8, 2012, 5:55:55 AM
to wha...@lists.whatwg.org
On Thu, 08 Mar 2012 11:11:06 +0100, Philip Jägenstedt <phi...@opera.com>
wrote:

> As hinted above, I don't think that startOffsetTime should really be the
> first choice for trying to sync live streams. However, knowing the date
> of a video is still useful, potentially even for the streaming case, so
> we do want to expose the DateUTC field from WebM. However,
> startOffsetTime is a bad name for it, since it's not using the same unit
> as currentTime. I suggest offsetDate, to go with offsetTime.

We discussed this some more internally, specifically if the date is an
offset at all and if so what it is relative to. In WebM, the DateUTC field
is defined as "Date of the origin of timecode (value 0), i.e. production
date." [1] Exposing this directly would mean that it is the date at
currentTime=-offsetTime, an origin time that you can't actually seek to in
the streaming case.

We discussed the concatenation of two clips and how to represent the date.
At least chained WebM and chained Ogg should be able to represent this.

To reduce the possibility for confusion about what date is represented and
to allow the recording date to be preserved in editing, how about exposing
currentDate instead?

[1] http://www.webmproject.org/code/specs/container/
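
The relationship between the WebM DateUTC value and the proposed currentDate can be sketched as below. This is only an illustration: offsetTime and currentDate are the hypothetical properties discussed in this thread, and originDate is an assumed name for the DateUTC value (the date at timecode 0).

```javascript
// `originDate` stands for the WebM DateUTC value: the date at timecode 0.
// `offsetTime` and the resulting "currentDate" are hypothetical, as proposed above.
function currentDate(originDate, offsetTime, currentTime) {
  const streamSeconds = offsetTime + currentTime; // position on the stream's own timeline
  return new Date(originDate.getTime() + streamSeconds * 1000);
}

const originDate = new Date(Date.UTC(2012, 2, 8, 17, 0, 0)); // 2012-03-08 17:00:00 UTC
const now = currentDate(originDate, 195, 5); // joined 3:15 in, then played 5 seconds
console.log(now.toISOString()); // "2012-03-08T17:03:20.000Z"
```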

Odin Hørthe Omdal

Mar 8, 2012, 7:30:49 AM
to Ingar Mæhlum Arntzen, wha...@lists.whatwg.org, sean.o...@gmail.com
On Thu, 08 Mar 2012 12:50:41 +0100, Ingar Mæhlum Arntzen
<ingar....@gmail.com> wrote:
> Here's my reasoning. The progress value that is visualized in the video
> element (i.e. currentTime) is part of the end-user experience. For this
> reason it is important that it communicates the appropriate abstraction
> consistently to all end-users.


Ah, but that is up to the user agent to decide how to show the time code.
The currentTime should be normalized from 0 until duration. That makes the
API behave in a common way for all easy tasks. If you write a video player
for your small cat clip, that video player will also work with streaming
video without any problem. That is a good thing.

However, the user agent is free to show you (the user) your "real"
position. And I agree that doing that makes sense. They don't exclude
each other.

> Maybe "joinTime" or some other property could be added to hold that
> information (which Chromium appears to lack - according to Sean O'Halpin's
> comments).
>
> Alternatively, to match your suggestion, if it is the sum ("startTime" +
> "currentTime") that is visualized in the video element, that might be OK
> too, but possibly more prone to confusion?

Only video player authors will actually see and use those attributes. They
should be built for being robust and working nicely for different usages.
Like I said, making the "dumb video player" also work for live streamed
video without any changes.

If you want to do a more advanced media player that is live video
streaming aware, you will have to opt-in to that instead. All the same is
possible, only one way is more backward-proof than the other.

Philip Jägenstedt proposed "offsetTime" for what we've called "startTime",
which IMHO is a clearer name.


> In addition, I wonder if negative values for currentTime are legal. For
> instance, when streaming a Formula 1 race that starts at 17.00, I would
> not be surprised to see negative currentTime if I join the stream before
> the race starts.

They are not, and shouldn't be. currentTime is always normalized to 0 ->
duration.

However, you would be perfectly able to write a video player that does
that by using offsetTime and currentTime together. Even better, the
proposed "currentDate" exposes the underlying "date of recording" (or
similar date) of the media, which you can then just look for 2012-03-08
17:00. Actually, you could also build your video player to show that date
on-screen, because 17:00 on the screen might be 18:13 at my place, because
a) I'm in a different time zone, and b) there's 13 minutes worth of
buffering between the Formula 1 production cameras and my computer.
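
A small sketch of such a player-side display, under stated assumptions: the media date would come from the proposed (hypothetical) currentDate, the race start is taken to be 17:00 UTC purely for illustration, and both function names are made up.

```javascript
// Seconds relative to an event; negative before the event starts.
function secondsSinceEvent(mediaDate, eventStart) {
  return (mediaDate.getTime() - eventStart.getTime()) / 1000;
}

// Format as [-]mm:ss, so a player can show a countdown before the race.
function formatRelative(seconds) {
  const sign = seconds < 0 ? "-" : "";
  const abs = Math.abs(Math.round(seconds));
  const mm = String(Math.floor(abs / 60)).padStart(2, "0");
  const ss = String(abs % 60).padStart(2, "0");
  return `${sign}${mm}:${ss}`;
}

const raceStart = new Date(Date.UTC(2012, 2, 8, 17, 0, 0));  // assumed: 17:00 UTC
const mediaDate = new Date(Date.UTC(2012, 2, 8, 16, 58, 30)); // would come from currentDate
console.log(formatRelative(secondsSinceEvent(mediaDate, raceStart))); // "-01:30"
```

The negative value lives only in the player's display; currentTime itself is untouched.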

Ian Hickson

Mar 8, 2012, 1:01:42 PM
to Odin Hørthe Omdal, Rick Waldron, wha...@lists.whatwg.org, sean.o...@gmail.com, ingar....@gmail.com
On Wed, 7 Mar 2012, Odin Hørthe Omdal wrote:
>
> startOffsetTime seem to leave people confused, I often have to explain
> it, and yesterday I read the spec[5] and old emails and got confused
> myself. It hasn't been implemented after almost 2 years.

Can you elaborate on how it's confusing? I don't really understand.


> Having the UTC time of the clip you're getting would be very useful. But
> it'd be really nice to get the start of the non-normalized timestamp
> exposed to javascript for synchronizing out-of-band metadata with the
> live streamed media.

What is the "start of the non-normalized timestamp"?


> Browsers are currently supposed to take the timestamp and normalize it
> to 0 for currentTime. Chromium currently does not do that; it starts at
> 3:15, if I join a streamed video that I started streaming 3 minutes, 15
> seconds ago.

If you start streaming at 3:15 (no date) into a stream that had a finite
start at an implicit 0:00, then it is conforming (recommended as "should",
in fact) for the media's first frame to be at currentTime=195s.

In fact if the media has a timestamp, browsers are explicitly urged
("should") to only rebase it to 0 if the timeline has a negative
component. If the media has a discontinuous timeline, the timeline used
for the first part is required ("must") to be extended to the rest of the
resource, but it is still used as is. Only if no timeline is present at
all (e.g. an MJPEG stream) is the user agent supposed to use a zero origin
for the timeline, and even then it's still only a "should".
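
One simplified reading of those rules, as a sketch only: the function names are made up, and the negative case is reduced to "the first timestamp is the earliest point of the timeline", which the spec text does not say in those words.

```javascript
// Convert a "m:ss" clock string to seconds, e.g. "3:15" -> 195.
function parseClock(text) {
  const [minutes, seconds] = text.split(":").map(Number);
  return minutes * 60 + seconds;
}

// currentTime of the first received frame under the reading described above.
// `firstTimestamp` is the in-band timestamp in seconds, or null if there is none.
function firstFrameCurrentTime(firstTimestamp) {
  if (firstTimestamp === null) return 0; // no timeline at all (e.g. MJPEG): zero origin
  if (firstTimestamp < 0) return 0;      // negative component: rebase (simplified)
  return firstTimestamp;                 // otherwise the timestamp is used as is
}

console.log(firstFrameCurrentTime(parseClock("3:15"))); // 195
```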


> I don't think using the UTC time as the sync point is very stable at the
> moment. It'd be a much quicker, stable, and easier win to get a startTime,
> timelineStartTime or timeSinceStart or similar that exposes the NON-normalized
> timestamp value at the start of the stream. So that, if you do
>
> startTime + currentTime
>
> you're able to get the actual timestamp that the stream is at, at that
> point. And in contrast with startOffsetTime this one won't ever change,
> so startTime + currentTime will always be continuously increasing.

I don't understand what this is asking for. Can you give a concrete
example with a specific media stream I can look at?


> The Date UTC which startOffsetTime would use, seems to be varying quite
> a bit. You need to know your streaming server and what it does in order
> to understand the result. Even different media from the same server
> might give different results if the streaming server implementation just
> reads the UTC time directly from the file. The information could be
> useful, but for more advanced uses.

startOffsetTime is only useful when there's a date component. The only
time I'm aware of that being available is for something like a cable TV
DVR. Does any Web media format have a way to specify a date?


> Sean O'Halpin of BBC recently mentioned[2] some of the confusion:
>
> > There seems to be some confusion here in how the HTML5 media elements
> > specification is dealing with logical stream addressing versus
> > physical stream addressing. The excerpt above talks about a user agent
> > being able to "seek to an earlier point than the first frame
> > originally provided by the server" but does not explain how this could
> > possibly happen without communication back to the server, in which
> > case we are effectively dealing with a request for a different
> > physical resource.

You'll definitely need communication with the server; even for just
straight streaming without seeking, or for seeking forwards, you'll
probably want client-to-server communication (e.g. for bandwidth
management).

Not sure what the difference is between a "logical" and "physical"
resource in this case though.


> > At the very least, the fact that the Firefox and Chrome teams came up
> > with different interpretations shows that this part of the
> > specification would benefit from clarification.

The spec intentionally allows different behaviours with respect to how
much of the stream the user is allowed to seek to. A browser could just
make .seekable return a single empty range consisting of just the
currentTime, for instance.


I'm happy to change the spec on this point, but I don't understand what
the problem is, so it's hard for me to make changes.

--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'

Ian Hickson

Mar 8, 2012, 1:16:40 PM
to Philip Jägenstedt, Odin Hørthe Omdal, wha...@lists.whatwg.org, Ingar Mæhlum Arntzen, sean.o...@gmail.com

(Oops, sorry. Missed these e-mails in my earlier reply.)

On Thu, 8 Mar 2012, Philip Jägenstedt wrote:
> On Wed, 07 Mar 2012 11:56:42 +0100, Odin Hørthe Omdal
> <odi...@opera.com> wrote:
> >
> > startOffsetTime seem to leave people confused, I often have to explain
> > it, and yesterday I read the spec[5] and old emails and got confused
> > myself. It hasn't been implemented after almost 2 years.
>
> We (Opera) have wanted to implement this for a long time, but it has
> been stalled by the fact that the spec is confusing to the point that we
> haven't been able to agree on what it's actually trying to say. Let's
> fix that.

I'm happy to make it clearer, but it seems clear to me. What are your
interpretations, so that I can explicitly rule out in the spec the ones
that are not intended?


> I agree that it would be useful to expose the constant by which
> timestamps are adjusted

Time stamps should not be adjusted.


> to guarantee that that currentTime starts at 0 and ends at duration.

That is not what the spec requires.


> I think that both a name like startTime (or initialTime) would suggest
> that it is the initial value of currentTime, which it is not.

initialTime is the initial value of currentTime.


> I suggest the property offsetTime, defined as the stream time in seconds
> which currentTime and duration are relative to.

I don't understand what this means. The currentTime is relative to the
media timeline, which is UA-defined and "should" be based on the media
timeline.


> In practice it would often be understood as the "time since the server
> began streaming" and would be useful to sync live streams with
> out-of-band content simply by letting the out-of-band content be
> relative to the start of the stream.

That "should" be zero. I can change that to a "must" if you like; it's
a "should" because in some cases (e.g. MJPEG) you don't know what the
media timeline is or how to interpret it, so there's no way to do it.


> No round-trip with Date representations should be necessary in the
> common case.

The startOffsetTime attribute is intended for display, no? Why would you
round-trip with it?


> As hinted above, I don't think that startOffsetTime should really be the
> first choice for trying to sync live streams.

Indeed.


> However, knowing the date of a video is still useful, potentially even
> for the streaming case, so we do want to expose the DateUTC field from
> WebM. However, startOffsetTime is a bad name for it, since it's not
> using the same unit as currentTime. I suggest offsetDate, to go with
> offsetTime.

I don't mind renaming startOffsetTime if people think that would help. I
don't think "offsetDate" is any clearer though.

How about "mediaTimelineOriginDate"?


> Finally, what about initialTime? It can be set to a non-zero value at
> two points in the spec:
>
> "Establish the media timeline for the purposes of the current playback
> position, the earliest possible position, and the initial playback
> position, based on the media data."
>
> "If either the media resource or the address of the current media
> resource indicate a particular start time, then set the initial playback
> position to that time and"
>
> Does any format expose something like this in-band? I don't know of any
> that do and how to implement this, so the only thing that remains is
> exposing the start time of media fragments. This seems rather useless to
> me, so unless someone has already implemented initialTime and explain
> what it means, I suggest dropping it from the spec.

The address of the current media resource can indicate a particular start
time if you implement media fragments.


On Thu, 8 Mar 2012, Philip Jägenstedt wrote:
>
> currentTime=-offsetTime, an origin time that you can't actually seek to
> in the streaming case.

Whether you can seek there or not depends entirely on the protocol and
server. It's not a given that you can't seek to it.


> We discussed the concatenation of two clips and how to represent the
> date. At least chained WebM and chained Ogg should be able to represent
> this.

The spec requires ("must") that in the case of chained clips with
discontinuous timelines, the first clip's timeline be extended to cover
the others, and any data regarding the timeline in the subsequent clips is
dropped.
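
A sketch of that rule as described here, with made-up clip objects (the field names start and length are assumptions for illustration): only the first clip's own start timestamp is kept, and each later clip is placed directly after the previous one.

```javascript
// Each clip carries its own in-band start timestamp and a length, in seconds.
function chainTimeline(clips) {
  const result = [];
  // Only the first clip's timeline survives; later `start` values are ignored.
  let cursor = clips.length > 0 ? clips[0].start : 0;
  for (const clip of clips) {
    result.push({ name: clip.name, start: cursor, end: cursor + clip.length });
    cursor += clip.length;
  }
  return result;
}

const chained = chainTimeline([
  { name: "a", start: 195, length: 60 },
  { name: "b", start: 0, length: 30 }, // its own start of 0 is dropped
]);
console.log(chained);
// a runs from 195 to 255, b from 255 to 285
```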


> To reduce the possibility for confusion about what date is represented
> and to allow the recording date to be preserved in editing, how about
> exposing currentDate instead?

What's the use case?


On Thu, 8 Mar 2012, Odin Hørthe Omdal wrote:
>
> Ah, but that is up to the user agent to decide how to show the time
> code. The currentTime should be normalized from 0 until duration.

I don't really understand what this means, but for some interpretations,
I disagree.

I agree that duration should be a time on the media timeline (and not a
length of time independent of timeline). I'm not sure what you mean by 0.


> > In addition, I wonder if negative values for currentTime are legal.
> > For instance, when streaming a Formula 1 race that starts at 17.00, I
> > would not be surprised to see negative currentTime if I join the
> > stream before the race starts.
>
> They are not, and shouldn't be.

The spec doesn't actually disallow it, though it does discourage it. I
could explicitly disallow a timeline with negative components.


> currentTime is always normalized to 0 -> duration.

I don't think the spec supports this assertion.

Philip Jägenstedt

Mar 9, 2012, 9:40:26 AM
to wha...@lists.whatwg.org
On Thu, 08 Mar 2012 19:16:40 +0100, Ian Hickson <i...@hixie.ch> wrote:

>
> (Oops, sorry. Missed these e-mails in my earlier reply.)
>
> On Thu, 8 Mar 2012, Philip Jägenstedt wrote:
>> On Wed, 07 Mar 2012 11:56:42 +0100, Odin Hørthe Omdal
>> <odi...@opera.com> wrote:
>> >
>> > startOffsetTime seem to leave people confused, I often have to explain
>> > it, and yesterday I read the spec[5] and old emails and got confused
>> > myself. It hasn't been implemented after almost 2 years.
>>
>> We (Opera) have wanted to implement this for a long time, but it has
>> been stalled by the fact that the spec is confusing to the point that we
>> haven't been able to agree on what it's actually trying to say. Let's
>> fix that.
>
> I'm happy to make it clearer, but it seems clear to me. What are your
> interpretations, so that I can explicitly rule out in the spec the ones
> that are not intended?

Excellent, see below.

>> I agree that it would be useful to expose the constant by which
>> timestamps are adjusted
>
> Time stamps should not be adjusted.
>
>
>> to guarantee that that currentTime starts at 0 and ends at duration.
>
> That is not what the spec requires.
>
>
>> I think that both a name like startTime (or initialTime) would suggest
>> that it is the initial value of currentTime, which it is not.
>
> initialTime is the initial value of currentTime.
>
>
>> I suggest the property offsetTime, defined as the stream time in seconds
>> which currentTime and duration are relative to.
>
> I don't understand what this means. The currentTime is relative to the
> media timeline, which is UA-defined and "should" be based on the media
> timeline.

The BBC wrote a blog post [1] about how currentTime varies between Firefox
and Chrome. Opera does the same as Firefox here. You're right, however,
that the way "media timeline" is defined doesn't make any guarantee that
currentTime starts at 0 or that duration is the duration. I think that the
implementations predate the "media timeline" concept, and I agree with the
BBC blog post that the Opera/Firefox behavior is better. Controls written
assuming that currentTime goes from 0 to duration won't break and duration
will actually mean duration.

>> In practice it would often be understood as the "time since the server
>> began streaming" and would be useful to sync live streams with
>> out-of-band content simply by letting the out-of-band content be
>> relative to the start of the stream.
>
> That "should" be zero. I can change that to a "must" if you like; it's
> a "should" because in some cases (e.g. MJPEG) you don't know what the
> media timeline is or how to interpret it, so there's no way to do it.

Which "should" are you referring to here?

>> No round-trip with Date representations should be necessary in the
>> common case.
>
> The startOffsetTime attribute is intended for display, no? Why would you
> round-trip with it?
>
>
>> As hinted above, I don't think that startOffsetTime should really be the
>> first choice for trying to sync live streams.
>
> Indeed.

I really don't know what startOffsetTime is intended for. AFAICT it's a
piece of metadata that you could just as well provide out-of-band, but for
convenience it is exposed via the DOM API. I think it could be handy to
have and would like to implement it, but I don't understand if it's any
different from other metadata like producer or location of a video.

>> However, knowing the date of a video is still useful, potentially even
>> for the streaming case, so we do want to expose the DateUTC field from
>> WebM. However, startOffsetTime is a bad name for it, since it's not
>> using the same unit as currentTime. I suggest offsetDate, to go with
>> offsetTime.
>
> I don't mind renaming startOffsetTime if people think that would help. I
> don't think "offsetDate" is any clearer though.
>
> How about "mediaTimelineOriginDate"?

Simply "originDate" or "startDate", perhaps? It could also do with a good
example. The spec says:

"If the media resource specifies an explicit start time and date, then
that time and date should be considered the zero point in the media
timeline; the timeline offset will be the time and date, exposed using the
startOffsetTime attribute."

I interpret this as a date at currentTime=0 in the spec's definition of
currentTime, and currentTime=-initialTime (unless media fragments are
used) in the Opera/Firefox definition of currentTime. However, there's a
weird spec example which can lead one into thinking otherwise:

"The startOffsetTime attribute would return a Date object with a time
corresponding to 2010-03-20 23:15:00 UTC. However, if a different user
agent connected five minutes later, it would (presumably) receive
fragments covering timestamps 2010-03-20 23:20:00 UTC to 2010-03-21
00:05:00 UTC and 2010-02-12 14:25:00 UTC to 2010-02-12 14:35:00 UTC, and
would expose this with a media timeline starting at 0s and extending to
3,300s (fifty five minutes)."

This seems like a rather atypical streaming scenario. It would be a lot
nicer if the single example of startOffsetTime was for the common scenario
where each client gets the same stream that thus has the same timeline and
the same startOffsetTime.
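
For reference, the arithmetic in the quoted spec example can be checked with a few lines. The two fragment ranges are copied from the quote above; the helper name is made up.

```javascript
// Length of a timestamp range in seconds.
function spanSeconds(startIso, endIso) {
  return (Date.parse(endIso) - Date.parse(startIso)) / 1000;
}

// The two fragments the later-connecting user agent receives in the spec example.
const fragments = [
  ["2010-03-20T23:20:00Z", "2010-03-21T00:05:00Z"], // 45 minutes
  ["2010-02-12T14:25:00Z", "2010-02-12T14:35:00Z"], // 10 minutes
];

const total = fragments.reduce((sum, [start, end]) => sum + spanSeconds(start, end), 0);
console.log(total); // 3300
```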

>> Finally, what about initialTime? It can be set to a non-zero value at
>> two points in the spec:
>>
>> "Establish the media timeline for the purposes of the current playback
>> position, the earliest possible position, and the initial playback
>> position, based on the media data."
>>
>> "If either the media resource or the address of the current media
>> resource indicate a particular start time, then set the initial playback
>> position to that time and"
>>
>> Does any format expose something like this in-band? I don't know of any
>> that do and how to implement this, so the only thing that remains is
>> exposing the start time of media fragments. This seems rather useless to
>> me, so unless someone has already implemented initialTime and explain
>> what it means, I suggest dropping it from the spec.
>
> The address of the current media resource can indicate a particular start
> time if you implement media fragments.

Yes, but why do we need to expose that in the DOM API, what is the use
case? For media fragments I think it's just as well to parse the URL to
get the end time as well, while the initial value of currentTime can
trivially be saved in the loadedmetadata event handler. It would certainly
help if the spec didn't suggest that initialTime can be given in-band,
unless there are formats that support this. Unless initialTime solves a
problem, just dropping it would be preferable, of course.
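
To illustrate the URL-parsing alternative, here is a deliberately simplified sketch. It handles only plain-seconds temporal fragments such as #t=10,20 (optionally with an npt: prefix), not the full Media Fragments syntax, and the function name and example URL are made up.

```javascript
// Parse a simplified temporal media fragment: "#t=start,end" in plain seconds.
// Returns { start, end } (end is null if absent), or null if there is no t= part.
function parseTemporalFragment(url) {
  const hash = new URL(url).hash.slice(1); // drop the leading "#"
  for (const part of hash.split("&")) {
    const [name, value] = part.split("=");
    if (name !== "t" || value === undefined) continue;
    const [start, end] = value.replace(/^npt:/, "").split(",");
    return {
      start: start ? Number(start) : 0, // "#t=,20" means "from the beginning"
      end: end !== undefined ? Number(end) : null,
    };
  }
  return null;
}

console.log(parseTemporalFragment("http://example.com/video.webm#t=10,20"));
// { start: 10, end: 20 }

// In a browser, the initial currentTime can be saved like this:
//   video.addEventListener("loadedmetadata", () => { initial = video.currentTime; });
```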

>> We discussed the concatenation of two clips and how to represent the
>> date. At least chained WebM and chained Ogg should be able to represent
>> this.
>
> The spec requires ("must") that in the case of chained clips with
> discontinuous timelines, the first clip's timeline be extended to cover
> the others, and any data regarding the timeline in the subsequent clips
> is dropped.

So the second and subsequent clips of a chain have their timelines
normalized, but not the first?

>> To reduce the possibility for confusion about what date is represented
>> and to allow the recording date to be preserved in editing, how about
>> exposing currentDate instead?
>
> What's the use case?

The use case is "don't be confusing", so let me first try to summarize
what I think the spec says:

* currentTime need not start at 0; for streams it will typically represent
how long the server has been serving a stream.

* duration is not the duration, it is the last timestamp of a resource.

* startOffsetTime is the date at time 0, it's not an offset. It has
nothing to do with syncing live streams.

* initialTime is the first timestamp of the stream or the start time of a
media fragment URL, if one is used.

* For chained streams, the 2nd and subsequent clips have their timelines
normalized and appended to the first clip's timeline.

Is that correct?

[1]
http://www.bbc.co.uk/blogs/researchanddevelopment/2012/02/what-does-currenttime-mean-in.shtml

Philip Jägenstedt

Mar 13, 2012, 6:56:41 AM3/13/12
to wha...@lists.whatwg.org, Philip Jägenstedt, Ian Hickson
On Fri, 09 Mar 2012 15:40:26 +0100, Philip Jägenstedt <phi...@opera.com>
wrote:

> let me first try to summarize what I think the spec says:
>
> * currentTime need not start at 0; for streams it will typically
> represent how long the server has been serving a stream.
>
> * duration is not the duration, it is the last timestamp of a resource.
>
> * startOffsetTime is the date at time 0, it's not an offset. It has
> nothing to do with syncing live streams.
>
> * initialTime is the first timestamp of the stream or the start time of
> a media fragment URL, if one is used.
>
> * For chained streams, the 2nd and subsequent clips have their timelines
> normalized and appended to the first clip's timeline.

I think this is mostly correct, but Odin pointed out to me this section of
the spec:

"In the absence of an explicit timeline, the zero time on the media
timeline should correspond to the first frame of the media resource. For
static audio and video files this is generally trivial. For streaming
resources, if the user agent will be able to seek to an earlier point than
the first frame originally provided by the server, then the zero time
should correspond to the earliest seekable time of the media resource;
otherwise, it should correspond to the first frame received from the
server (the point in the media resource at which the user agent began
receiving the stream)."

There are multiple problems here, and I think it's responsible for some of
the confusion.

* What is an "explicit timeline"? For example, does an Ogg stream that
starts with a non-zero timestamp have an explicit timeline?

* Does "For streaming resources ..." apply only in the absence of an
explicit timeline, or in general? In other words, what's the scope of "In
the absence of an explicit timeline"?

* Why does the spec differentiate between static and streaming resources
at all? This is not a distinction Opera makes internally; the only "mode
switch" we have depends on whether or not a resource is seekable, which
for HTTP means support for byte-range requests. A static resource can be
served by a server without support for byte-range requests such that the
size and duration are known up front, and I certainly wouldn't call that
streaming.

These definitions can be tweaked/clarified in one of two ways:

1. currentTime always reflects the underlying timestamps, such that a
resource can start playing at a non-zero offset and seekable.start(0)
could be non-zero even for a fully seekable resource. This is what the
spec already says, modulo the "streaming resources" weirdness.

2. Always normalize the timeline to start at 0 and end at duration.

I think that the BBC blog post is favoring option 2, and while that's
closest to our implementation I don't feel strongly about it. A benefit of
option 1 is that currentTime=300 represents the same thing on all clients,
which should solve the syncing problem without involving any kinds of
dates.
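To make the difference between the two options concrete, here is a small sketch. It models a client only by the first and last timestamp it can see; `timelineView` and its fields are names invented for this example, not anything from the spec.

```javascript
// Option 1 (normalize = false): currentTime reflects the underlying timestamps.
// Option 2 (normalize = true): the timeline is shifted to start at 0.
function timelineView(firstTimestamp, lastTimestamp, normalize) {
  const shift = normalize ? firstTimestamp : 0;
  return {
    earliest: firstTimestamp - shift,
    duration: lastTimestamp - shift,
    toCurrentTime: (timestamp) => timestamp - shift,
  };
}

// Two clients join the same stream at different points (timestamps 195 and 250).
for (const normalize of [false, true]) {
  const early = timelineView(195, 900, normalize);
  const late = timelineView(250, 900, normalize);
  console.log(
    normalize ? "option 2" : "option 1",
    early.toCurrentTime(300),
    late.toCurrentTime(300)
  );
}
```

Under option 1 both clients report 300 for the same frame, while under option 2 they report 105 and 50, which is why option 1 helps with syncing.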

To sum up, here are the spec changes I still think should be made:

* Make it pedantically clear which of the above two options is correct,
preferably with a pretty figure of a timeline with all the values clearly
marked out.

* Rename startOffsetTime to make it clear that it represents the date at
currentTime=0 and document that it's intended primarily for display. I
wouldn't object to just dropping it until we expose other kinds of
metadata like producer/location, but don't care deeply.

* Drop initialTime.

Philip Jägenstedt

Apr 3, 2012, 5:28:56 AM4/3/12
to Ian Hickson, wha...@lists.whatwg.org, Odin Omdal Hørthe
Thanks for the spec changes, startDate is now in a state where I'd be
happy to implement it! More comments inline:

On Tue, 03 Apr 2012 02:21:43 +0200, Ian Hickson <i...@hixie.ch> wrote:

> On Fri, 9 Mar 2012, Philip Jägenstedt wrote:
>> On Thu, 08 Mar 2012 19:16:40 +0100, Ian Hickson <i...@hixie.ch> wrote:
>> > On Thu, 8 Mar 2012, Philip Jägenstedt wrote:

>> I really don't know what startOffsetTime is intended for. AFAICT it's a
>> piece of metadata that you could just as well provide out-of-band, but
>> for convenience it is exposed via the DOM API. I think it could be handy
>> to have and would like to implement it, but I don't understand if it's
>> any different from other metadata like producer or location of a video.
>
> The startOffsetTime is useful for controllers who want to display a
> controller with real times, e.g. like TiVo's DVR UI, even when the
> underlying media resource has some more or less arbitrary timeline.
>
> e.g. if a TV station starts broadcasting on some Friday at 2pm, that would
> be its zero time for its timeline, but eight months later, a user joining
> that stream doesn't care that the stream is 21 megaseconds old -- they
> just want to see 14:20 as the time that corresponds to what was streaming
> at 2:20pm.

This makes sense, and the new spec example makes it clearer.
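A sketch of that DVR-style display, assuming the page already has a Date for the timeline's zero point and a currentTime in seconds. The helper names are made up, and the example start date is arbitrary; formatting is done in UTC to keep the example deterministic.

```javascript
// Map a position on the media timeline to a wall-clock Date.
function wallClockAt(startDate, currentTime) {
  return new Date(startDate.getTime() + currentTime * 1000);
}

// Format as HH:MM in UTC; a real controller would likely use local time.
function formatUtcClock(date) {
  return date.toISOString().slice(11, 16);
}

// A stream whose zero point is a Friday at 14:00 UTC, joined 243 days
// and 20 minutes later (roughly the "21 megaseconds" case).
const start = new Date("2011-07-01T14:00:00Z");
const position = 243 * 86400 + 20 * 60;
console.log(formatUtcClock(wallClockAt(start, position)));
```

The controller shows 14:20 regardless of how large the raw timeline position has grown.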

>> It could also do with a good example. The spec says:
>>
>> "If the media resource specifies an explicit start time and date, then
>> that time and date should be considered the zero point in the media
>> timeline; the timeline offset will be the time and date, exposed using
>> the startOffsetTime attribute."
>>
>> I interpret this as a date at currentTime=0 in the spec's definition of
>> currentTime
>
> Right.
>
>> and currentTime=-initialTime (unless media fragments are used) in the
>> Opera/Firefox definition of currentTime.
>
> Not sure what this means.

In current Opera and Firefox the timeline is always normalized to start at
0, so the time that corresponds to 0 in the original timeline would be at
a negative currentTime. We will have to change this at the same time as
implementing startDate, since otherwise everything will be a mess...

>> > > Finally, what about initialTime? It can be set to a non-zero value
>> > > at two points in the spec:
>> > >
>> > > "Establish the media timeline for the purposes of the current
>> > > playback position, the earliest possible position, and the initial
>> > > playback position, based on the media data."
>> > >
>> > > "If either the media resource or the address of the current media
>> > > resource indicate a particular start time, then set the initial
>> > > playback position to that time and"
>> > >
>> > > Does any format expose something like this in-band? I don't know of
>> > > any that do and how to implement this, so the only thing that
>> > > remains is exposing the start time of media fragments. This seems
>> > > rather useless to me, so unless someone has already implemented
>> > > initialTime and can explain what it means, I suggest dropping it from
>> > > the spec.
>> >
>> > The address of the current media resource can indicate a particular
>> > start time if you implement media fragments.
>>
>> Yes, but why do we need to expose that in the DOM API, what is the use
>> case?
>
> Allows controllers to trivially implement UI to jump back to where the
> stream started, while still showing the full seekable range.

Unless I'm missing something, initialTime is just the initial value of
currentTime, so this is already easy. Also, if media fragments are not
used, just setting currentTime=0 will clamp and seek to the earliest
position. However, I've never actually seen such UI for <video>, do you
have a real world example? It seems to me like this is a <1% use case that
is already easy to solve and that it's not worth adding an API to go from
easy to trivial.

> On Tue, 13 Mar 2012, Philip Jägenstedt wrote:
>>
>> "In the absence of an explicit timeline, the zero time on the media
>> timeline should correspond to the first frame of the media resource. For
>> static audio and video files this is generally trivial. For streaming
>> resources, if the user agent will be able to seek to an earlier point
>> than the first frame originally provided by the server, then the zero
>> time should correspond to the earliest seekable time of the media
>> resource; otherwise, it should correspond to the first frame received
>> from the server (the point in the media resource at which the user agent
>> began receiving the stream)."
>>
>> There are multiple problems here, and I think it's responsible for some
>> of the confusion.
>>
>> * What is an "explicit timeline"? For example, does an Ogg stream that
>> starts with a non-zero timestamp have an explicit timeline?
>
> If there's a timestamp in the resource, then yes, it has an explicit
> timeline. That seems self-evident, but if you can think of a way that I
> could clarify this, I would be happy to do so.
>
> An example of a video resource without an explicit timeline would be
> a multipart/x-replace JPEG stream. There, the time between the frames is
> determined by the server's transmission rate, and the data itself has no
> timing information.

AFAIK, no browser supports any format for <video> that does not have
timestamps. I don't think there's any practical need to say how to handle
this until some implementor actually wants to do it, but if you really
want to I would have been less confused if the lack of "explicit timeline"
were portrayed as an exception, using something like multipart/x-replace
as an example.

>> * Does "For streaming resources ..." apply only in the absence of an
>> explicit timeline, or in general? In other words, what's the scope of
>> "In the absence of an explicit timeline"?
>
> I've updated the second sentence to explicitly state that it also only
> applies in the absence of a timeline.

Thanks, that's much better!

>> * Why does the spec differentiate between static and streaming resources
>> at all?
>
> If you receive the entire file, there's no complication with respect to
> streaming to a point before the first rendered frame. The distinction is
> not intended to be normatively detectable, it's only intended to
> distinguish the easy case from the harder case. Again, if you think
> there's some way I could clarify that, please let me know.

IIUC, the spec is trying to handle resources that have no timestamps, are
not (known to be) finite and where "the user agent will be able to seek to
an earlier point than the first frame originally provided by the server",
i.e. with server-side seeking. Do such resources actually exist? I don't
see how they could, because how could the server seek without some concept
of timestamps?

All in all, simply demanding that all formats used have a timeline mapping
seems like a good way to deal with this, for now at least.

>> These definitions can be tweaked/clarified in one of two ways:
>>
>> 1. currentTime always reflects the underlying timestamps, such that a
>> resource can start playing at a non-zero offset and seekable.start(0)
>> could be non-zero even for a fully seekable resource. This is what the
>> spec already says, modulo the "streaming resources" weirdness.
>>
>> 2. Always normalize the timeline to start at 0 and end at duration.
>>
>> I think that the BBC blog post is favoring option 2, and while that's
>> closest to our implementation I don't feel strongly about it. A benefit
>> of option 1 is that currentTime=300 represents the same thing on all
>> clients, which should solve the syncing problem without involving any
>> kinds of dates.
>
> The spec definitely intends #1 if the format supports it. I don't think
> #2 makes sense for many cases (e.g. broadcast TV, any case where you can
> seek to before the first rendered frame), and more importantly, if you
> connect to a stream and then later start discarding earlier data, you end
> up in #1 even if you started in #2 so I see no benefit to going out of
> our way to start in #2.

I (now) agree, and will try to align Opera with #1 when we poke at this
next.

>> Make it pedantically clear which of the above two options is correct,
>> preferably with a pretty figure of a timeline with all the values
>> clearly marked out.
>
> I would be happy to add such a diagram, but I have no idea how to do it,
> given the bazillions of edge cases here.
>
> If anyone wants to make such a diagram, I recommend doing it by writing
> code for this tool:
>
> http://software.hixie.ch/utilities/js/canvas/
>
> ...and then sending me the code. :-)
>
> (Ideally, using little parameterised functions for any repeated bits, so
> it's really easy to adjust.)

Odin, you made some diagrams; do you think any of those could be ported to
a script?

Ian Hickson

Apr 3, 2012, 1:13:12 PM4/3/12
to Philip Jägenstedt, wha...@lists.whatwg.org, Odin Omdal Hørthe
On Tue, 3 Apr 2012, Philip Jägenstedt wrote:
> > >
> > > It could also do with a good example. The spec says:
> > >
> > > "If the media resource specifies an explicit start time and date,
> > > then that time and date should be considered the zero point in the
> > > media timeline; the timeline offset will be the time and date,
> > > exposed using the startOffsetTime attribute."
> > >
> > > I interpret this as [...] currentTime=-initialTime (unless media
> > > fragments are used) in the Opera/Firefox definition of currentTime.
> >
> > Not sure what this means.
>
> In current Opera and Firefox the timeline is always normalized to start
> at 0, so the time that corresponds to 0 in the original timeline would
> be at a negative currentTime.

I still don't really understand what you mean by "start" here.

The idea is that all the times are unsigned, though. So if there's any way
to seek to one of these times that are before what you're calling the
"start", then yeah, it'll be a mess, because the naive approach of simply
drawing a seek bar from 0 to duration (rather than seekable.start(0) to
duration) will fail.
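A sketch of the difference between the naive seek bar and one that starts at seekable.start(0). It takes plain numbers rather than a media element so it can run anywhere; `seekBarFraction` is a hypothetical helper name.

```javascript
// Fraction (0..1) of the seek bar that should be filled, for a bar that
// spans seekableStart..duration rather than 0..duration.
function seekBarFraction(currentTime, seekableStart, duration) {
  const span = duration - seekableStart;
  if (!(span > 0)) return 0; // empty, unknown or non-finite range
  const fraction = (currentTime - seekableStart) / span;
  return Math.min(1, Math.max(0, fraction));
}

// Timeline runs from 200 to 600 and playback is at 300.
console.log(seekBarFraction(300, 200, 600)); // bar based on the seekable range
console.log(300 / 600);                      // naive 0..duration bar
```

With a timeline that does not start at 0, the naive bar puts the thumb at the halfway mark although only a quarter of the seekable range has been played.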


> We will have to change this at the same time as implementing startDate,
> since otherwise everything will be a mess...

So long as startDate gives the Date at media timeline's 0 point, it
doesn't really matter exactly what the media timeline is.


> > > > > Finally, what about initialTime? [...]
> > >
> > > Yes, but why do we need to expose that in the DOM API, what is the
> > > use case?
> >
> > Allows controllers to trivially implement UI to jump back to where the
> > stream started, while still showing the full seekable range.
>
> Unless I'm missing something, initialTime is just the initial value of
> currentTime, so this is already easy.

Only if the controller is around when the video is created. Don't forget
that one of the design principles of this API is that you should be able
to hook up a controller at any time and have it be able to provide a
fully-fledged controller.


> Also, if media fragments are not used, just setting currentTime=0 will
> clamp and seek to the earliest position. However, I've never actually
> seen such UI for <video>, do you have a real world example? It seems to
> me like this is a <1% use case that is already easy to solve and that
> it's not worth adding an API to go from easy to trivial.

Yeah, that's probably fair. I've removed initialTime.


> > An example of a video resource without an explicit timeline would be a
> > multipart/x-replace JPEG stream. There, the time between the frames is
> > determined by the server's transmission rate, and the data itself has
> > no timing information.
>
> AFAIK, no browser supports any format for <video> that does not have
> timestamps. I don't think there's any practical need to say how to
> handle this until some implementor actually wants to do it, but if you
> really want to I would have been less confused if the lack of "explicit
> timeline" were portrayed as an exception, using something like
> multipart/x-replace as an example.

I've made this more explicit using some notes.

BTW, browsers do support formats that do not have explicit timelines or
even explicit timings. Animated GIFs only have inter-frame timings,
there's no explicit timeline. (A frame's position is implied by the number
of delays that come before it.) And the usual way of sending MJPEG
streams, namely multipart/x-mixed-replace, has no explicit timings
whatsoever. <video> is designed such that these formats could be supported
with the media API.
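To illustrate how a position can be implied by the delays alone, here is a minimal sketch. It assumes the per-frame delays have already been read from the file (animated GIF stores them in hundredths of a second); `framePositions` is a name invented for the example.

```javascript
// Derive each frame's position on an implied timeline from the list of
// inter-frame delays, given in hundredths of a second.
function framePositions(delays) {
  const positions = [];
  let elapsed = 0;
  for (const delay of delays) {
    positions.push(elapsed); // a frame starts after all earlier delays
    elapsed += delay;
  }
  return { positions, duration: elapsed };
}

console.log(framePositions([10, 10, 50]));
```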


> > > * Why does the spec differentiate between static and streaming
> > > resources at all?
> >
> > If you receive the entire file, there's no complication with respect
> > to streaming to a point before the first rendered frame. The
> > distinction is not intended to be normatively detectable, it's only
> > intended to distinguish the easy case from the harder case. Again, if
> > you think there's some way I could clarify that, please let me know.

I've removed the confusing bit about static resources vs streaming
resources, so hopefully this will be clearer now.


> IIUC, the spec is trying to handle resources that have no timestamps,
> are not (known to be) finite and where "the user agent will be able to
> seek to an earlier point than the first frame originally provided by the
> server", i.e. with server-side seeking. Do such resources actually
> exist? I don't see how they could, because how could the server seek
> without some concept of timestamps?

You could seek to them using frame numbers.

I'm not aware of such a format currently. I've added a note to that effect
to the spec.


> All in all, simply demanding that all formats used have a timeline
> mapping seems like a good way to deal with this, for now at least.

There are formats supported by browsers that do not have timelines. I
don't think we should exclude those ab initio.

Just covering all the bases in the spec doesn't mean we require anything
of browsers, but it does mean that if a browser wants to go beyond the
call of duty and support, say, animated GIFs, they can do so in an
unambiguous way without having to invent ways around the spec's limitations.

Philip Jägenstedt

Apr 4, 2012, 4:36:12 AM4/4/12
to Ian Hickson, wha...@lists.whatwg.org, Odin Omdal Hørthe
On Tue, 03 Apr 2012 19:13:12 +0200, Ian Hickson <i...@hixie.ch> wrote:

> On Tue, 3 Apr 2012, Philip Jägenstedt wrote:
>> > >
>> > > It could also do with a good example. The spec says:
>> > >
>> > > "If the media resource specifies an explicit start time and date,
>> > > then that time and date should be considered the zero point in the
>> > > media timeline; the timeline offset will be the time and date,
>> > > exposed using the startOffsetTime attribute."
>> > >
>> > > I interpret this as [...] currentTime=-initialTime (unless media
>> > > fragments are used) in the Opera/Firefox definition of currentTime.
>> >
>> > Not sure what this means.
>>
>> In current Opera and Firefox the timeline is always normalized to start
>> at 0, so the time that corresponds to 0 in the original timeline would
>> be at a negative currentTime.
>
> I still don't really understand what you mean by "start" here.
>
> The idea is that all the times are unsigned, though. So if there's any
> way to seek to one of these times that are before what you're calling the
> "start", then yeah, it'll be a mess, because the naive approach of simply
> drawing a seek bar from 0 to duration (rather than seekable.start(0) to
> duration) will fail.

What I mean with "normalized to start at 0" is that when playing the whole
resource, currentTime will start at 0 and end at duration. (This was not
really a deliberate choice in Opera, it's just what GStreamer does and I
never thought about it until this issue came up.)

>> We will have to change this at the same time as implementing startDate,
>> since otherwise everything will be a mess...
>
> So long as startDate gives the Date at media timeline's 0 point, it
> doesn't really matter exactly what the media timeline is.

Yeah, I guess we could shift startDate by the same (undetectable) offset, but
if we're going to spend effort shifting things around we might as well
shift currentTime into alignment with the spec :)
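A sketch of why the shift would be undetectable, assuming a normalized timeline that subtracts the first timestamp from everything. `normalizeWithDate` is a hypothetical helper, and the date and timestamps are arbitrary.

```javascript
// Normalize a timeline to start at 0 and move the zero-point date forward
// by the same amount, so date + currentTime is unchanged for every frame.
function normalizeWithDate(firstTimestamp, startDate) {
  return {
    toCurrentTime: (timestamp) => timestamp - firstTimestamp,
    startDate: new Date(startDate.getTime() + firstTimestamp * 1000),
  };
}

const originalStart = new Date("2012-04-04T08:00:00Z");
const shifted = normalizeWithDate(195, originalStart);

// Wall-clock time of the frame with timestamp 300, computed both ways.
const direct = originalStart.getTime() + 300 * 1000;
const viaShift = shifted.startDate.getTime() + shifted.toCurrentTime(300) * 1000;
console.log(direct === viaShift);
```

Script can only observe the sum, so either approach gives the same displayed dates; aligning currentTime with the spec additionally makes positions comparable across clients.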

>> > > > > Finally, what about initialTime? [...]
>> > >
>> > > Yes, but why do we need to expose that in the DOM API, what is the
>> > > use case?
>> >
>> > Allows controllers to trivially implement UI to jump back to where the
>> > stream started, while still showing the full seekable range.
>>
>> Unless I'm missing something, initialTime is just the initial value of
>> currentTime, so this is already easy.
>
> Only if the controller is around when the video is created. Don't forget
> that one of the design principles of this API is that you should be able
> to hook up a controller at any time and have it be able to provide a
> fully-fledged controller.

Right, I keep forgetting that...

>> Also, if media fragments are not used, just setting currentTime=0 will
>> clamp and seek to the earliest position. However, I've never actually
>> seen such UI for <video>, do you have a real world example? It seems to
>> me like this is a <1% use case that is already easy to solve and that
>> it's not worth adding an API to go from easy to trivial.
>
> Yeah, that's probably fair. I've removed initialTime.

Thanks!

[snip]

The spec changes around explicit timelines and static/streaming resources
are a big improvement, thanks! However, it now talks about both "explicit
timeline" and "explicit timings" in a way that makes me uncertain about
Ogg. Ogg (at least without skeleton) is just a stream of timestamped
packets, so the timeline simply spans the timestamp of the first packet to
the timestamp of the last packet. WebM is similar in the streaming case in
that the timestamps don't start at 0. Clarification of whether or not
"explicit timestamps" (Ogg, WebM) implies an "explicit timeline" would be
welcome. I assume that's the intention, which I also agree with. (Perhaps
saying "explicit frame durations" instead of "explicit timings" would also
help.)

Finally, a typo: "no explicit timings ofd any kind"
