Small change proposed for media overlays

Marisa DeMeglio

Apr 19, 2011, 3:55:56 AM

to EPUB WG, Casey Dougherty, Al Cannistraro, Raymond Walsh

Hi all,

Just a note about a proposed change to Media Overlays in order to
accommodate a use case. The use case is pairing abridged audio books
with the full-length text. The issue is that Media Overlays currently
mandates a 1:1 relationship between the MO and the Content Doc; this
model assumes that the Content Doc is in charge of the playback order,
whereas, for this particular use case, the MO should be in charge.
The reason behind having a 1:1 relationship between MO and Content Doc
was to make it easy for UAs to locate their position in MO starting
from the Content Doc. However, this same benefit could also be gained
from a 1:many MO:Content Doc relationship, because it still preserves
the idea that, given a Content Doc, its corresponding audio narration
exists in only one MO.
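As a rough illustration of the property described above (a Content Doc's narration lives in only one MO, even when that MO spans several Content Docs), here is a minimal Python sketch. The data shape is an assumption made for illustration only: each overlay is reduced to a list of text references of the form "document.xhtml#fragment", and the file names and function names are hypothetical, not taken from the spec.

```python
from collections import defaultdict


def overlays_by_document(overlays):
    """Map each Content Document to the set of overlays that reference it.

    `overlays` is a hypothetical dict: overlay file name -> list of text
    references such as "ch1.xhtml#p1".
    """
    owners = defaultdict(set)
    for overlay_name, text_refs in overlays.items():
        for ref in text_refs:
            document = ref.split("#", 1)[0]  # drop the fragment identifier
            owners[document].add(overlay_name)
    return owners


def documents_with_multiple_overlays(overlays):
    """Return the documents whose narration is spread over more than one MO."""
    owners = overlays_by_document(overlays)
    return sorted(doc for doc, names in owners.items() if len(names) > 1)


# One overlay covering parts of two Content Docs: allowed under 1:many.
overlays = {
    "abridged.smil": ["ch1.xhtml#p1", "ch3.xhtml#p4", "ch3.xhtml#p5"],
}
print(documents_with_multiple_overlays(overlays))  # prints []
```

A document that appears in the returned list would be one where a UA could no longer go from the Content Doc to a single MO.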

So, the proposed changes are:

1. Change 1:1 to 1:many regarding the relationship between Media
Overlays and Content Documents

2. Add some prose to remind Reading Systems that, given a Content
Document, the corresponding starting point in the Media Overlay might
be in the middle of the file instead of at the beginning.
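To make point 2 concrete, here is a small Python sketch of how a Reading System might locate its starting point. It assumes, purely for illustration, that the overlay has already been flattened into an ordered list of text references ("document.xhtml#fragment"); the names below are hypothetical.

```python
def first_entry_for(document, text_refs):
    """Return the index of the first overlay entry that points into `document`.

    Returns None when the overlay has no narration for that document.
    """
    for index, ref in enumerate(text_refs):
        if ref.split("#", 1)[0] == document:
            return index
    return None


# Hypothetical overlay that covers chapters 1 and 3 but skips chapter 2.
text_refs = ["ch1.xhtml#p1", "ch1.xhtml#p2", "ch3.xhtml#p4", "ch3.xhtml#p5"]

print(first_entry_for("ch3.xhtml", text_refs))  # prints 2
print(first_entry_for("ch2.xhtml", text_refs))  # prints None
```

The starting point for ch3.xhtml is the third entry, not the beginning of the overlay, which is the situation the added prose would warn about.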

I think it's quite a good idea to make these changes, as it is only a
minor adjustment and will allow MO to better meet publishers' needs.

Let me know if you have any thoughts.

Marisa

Bill Kasdorf

Apr 19, 2011, 10:02:11 AM

to epub-work...@googlegroups.com, Casey Dougherty, Al Cannistraro, Raymond Walsh

I wonder if the ratio isn't the reverse. I would think rather than one MO to many Content Docs, the more common situation would be one Content Doc with many MOs--one full text MO, plus various abridgement or subset MOs. I don't envision an EPUB having a bunch of Content Docs in a case like that; instead, I'd envision the MOs needing to address locations (portions, segments) within a single Content Doc.

Actually, perhaps both scenarios would be the reality. An abridgement could take a selection of individual chapters, each of which is a Content Doc in its entirety; but another abridgement could take selected content from one Content Doc (or portions of several).

Just a thought.

--Bill Kasdorf

Kotrch, Steve

Apr 19, 2011, 11:00:45 AM

to epub-work...@googlegroups.com, Casey Dougherty, Al Cannistraro, Raymond Walsh

I concur with Bill—at least as far as S&S is concerned.

--steve kotrch

George Kerscher

Apr 19, 2011, 11:06:50 AM

to epub-work...@googlegroups.com, Casey Dougherty, Al Cannistraro, Raymond Walsh

We may find that certain items are best with an overlay, e.g. poetry read
while a short story or article is just the text.

Best
George

Peter Sorotokin

Apr 19, 2011, 12:14:05 PM

to epub-work...@googlegroups.com, Casey Dougherty, Al Cannistraro, Raymond Walsh

Marisa,

When you say "abridged audio" you mean that some text may not have
corresponding audio, not that some part of the text was rewritten to
condense it or something is rearranged, right? In other words, audio has
some gaps which are still available as text, but otherwise audio is a
faithful reproduction of text.

Peter

Peter Sorotokin

Apr 19, 2011, 12:35:43 PM

to Alan Cannistraro, epub-work...@googlegroups.com, Casey Dougherty, Raymond Walsh

Yes it does, but I'd like to express it more clearly. I think your
original wording can be read as trying to package the full-text and abridged
versions of the book in the same EPUB + provide a media overlay for the latter.
I would not say "abridged" at all. Just say that (1) it is allowed for a media
overlay not to cover all of the text and (2) a Reading System may use either
text or media overlay as the primary content source and, in the latter case, some
text content is skipped if no corresponding audio exists for it.

Peter

On 4/19/11 9:20 AM, "Alan Cannistraro" <al...@apple.com> wrote:

> Peter - that would be one embodiment of this. The general case is that you
> may have a recording of audio that skips over entire sections (even documents)
> of an epub. In this case, we would want the audio to be the driver of
> sequence, rather than respecting every entry in the spine. To solve this, we
> would like a single MO to specify a sequence that spans Content Documents.
>
> Does this make sense?
>
> Alan

Daniel Weck

Apr 19, 2011, 12:58:42 PM

to epub-work...@googlegroups.com, Casey Dougherty, Al Cannistraro, Raymond Walsh

In my understanding, there are essentially 2 "major" types of
synchronized text/audio books, leading to very different design
requirements:

(1) The audio narration faithfully reproduces the text content flow.
In this case, the text document is effectively the "master" in terms
of the resulting sequence of audio phrases during playback (bar
"skippability" and "escapability" artifacts, which may occur due to
footnotes, external image descriptions, special structures such as
tables, etc.) In terms of recording workflow, a reader/narrator would
create the audio overlay by following the spine and document(s) order
of a given EPUB. In terms of reading experience, the resulting EPUB +
Media-Overlay publication could be consumed in text-only mode, or in
synchronized text/audio mode, with no major content discrepancy.

(2) An existing pre-recorded audio book is "mapped" onto an existing
EPUB text-only publication, in such a way that the content flow
present in the audio is preserved. For example, the audio book may
skip paragraphs, or may jump forwards and backwards through various
major sections of the book. In this case, the audio narration is
effectively the "master" when the EPUB3 + Media Overlay publication is
consumed in synchronized text+audio mode. Conversely, the linear text-
only reading experience may result in a different sequence.

With (1), text+audio reading systems follow the spine order, and they
execute the playback sequence described in the SMIL overlay for each
Content Document encountered. The fact that a given Content Document
can only have a single Media Overlay makes it easy for reading systems
to re-sync the audio when the user decides to jump to an arbitrary
chapter of the book.

With (2), text+audio reading systems must be able to ignore the spine
order, which is enabled by the ability for a given Media Overlay to
reference different Content Documents. Basically, the declarative SMIL
timing structure becomes the orchestrator of the playback experience,
picking arbitrary parts of the text content along the way, not
necessarily following the text-centric flow of the book.
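A minimal Python sketch of the difference between the two modes, under the assumption (mine, for illustration) that the spine is a list of document names and the overlay is an ordered list of "document.xhtml#fragment" text references. All names are hypothetical.

```python
from itertools import groupby


def document_of(text_ref):
    """Strip the fragment identifier from a text reference."""
    return text_ref.split("#", 1)[0]


def overlay_driven_order(text_refs):
    """Return the sequence of documents visited when the overlay drives playback.

    Consecutive references into the same document are collapsed into one visit.
    """
    return [doc for doc, _ in groupby(document_of(ref) for ref in text_refs)]


# Text-only reading follows the spine.
spine = ["ch1.xhtml", "ch2.xhtml", "ch3.xhtml", "ch4.xhtml", "ch5.xhtml"]

# A pre-recorded audio book that starts in chapter 3, jumps back to
# chapter 1, then skips ahead to chapter 5.
text_refs = ["ch3.xhtml#p4", "ch1.xhtml#p1", "ch1.xhtml#p2", "ch5.xhtml#p9"]

print(overlay_driven_order(text_refs))
# prints ['ch3.xhtml', 'ch1.xhtml', 'ch5.xhtml']
```

In case (2) the reading system would follow this overlay-derived sequence rather than `spine`, so chapters 2 and 4 are never visited in text+audio mode.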

If I understand correctly, this is what justifies the 1:1 -> 1:many
mapping between Content Documents and Media Overlays. Marisa, do I get
this right ? :)

Regards, Daniel

Marisa DeMeglio

Apr 19, 2011, 6:10:00 PM

to epub-work...@googlegroups.com, Casey Dougherty, Al Cannistraro, Raymond Walsh

Hi Daniel,

That's exactly correct!

Regarding Peter's comments:

Basically, the abridged audio example is the motivation for this
change, but the spec prose will not specifically cite this use case.

We already mention that Media Overlays do not have to cover every
element in the Content Document, so that should be taken care of.

And, the Reading System can faithfully render the publisher's intent
of primary media source by just playing the Media Overlay. The RS
doesn't need to be aware of whether audio or text is intended as the
master.

Marisa

On Tue, Apr 19, 2011 at 18:58, Daniel Weck <danie...@gmail.com> wrote:
> In my understanding, there are essentially 2 "major" types of synchronized
> text/audio books, leading to very different design requirements:
>
> (1) The audio narration faithfully reproduces the text content flow. In this
> case, the text document is effectively the "master" in terms of the
> resulting sequence of audio phrases during playback (bar "skippability" and
> "escapability" artifacts, which may occur due to footnotes, external image
> descriptions, special structures such as tables, etc.) In terms of recording
> workflow, a reader/narrator would create the audio overlay by following the
> spine and document(s) order of a given EPUB. In terms of reading experience,
> the resulting EPUB + Media-Overlay publication could be consumed in
> text-only mode, or in synchronized text/audio mode, with no major content
> discrepancy.
>
> (2) An existing pre-recorded audio book is "mapped" onto an existing EPUB
> text-only publication, in such a way that the content flow present in the
> audio is preserved. For example, the audio book may skip paragraphs, or may
> jump forwards and backwards through various major sections of the book. In
> this case, the audio narration is effectively the "master" when the EPUB3 +
> Media Overlay publication is consumed in synchronized text+audio mode.

> Conversely, the linear text-only reading experience may result in a

Marisa DeMeglio

Apr 19, 2011, 6:15:46 PM

to epub-work...@googlegroups.com

Hi Bill,

That's an interesting idea, to have multiple variants of Media
Overlays. If we start packaging many MO with an EPUB, and only one
type of MO will be played at once, then we also need to provide a
mechanism to select which type the user wants to play. That would be
a great addition in a future revision of EPUB when we also handle
other types of switching in MO, such as language or other presentation
options. At the moment, however, there is not a switch mechanism
associated with MO, so I don't think we could do it now.

Marisa

Bill Kasdorf

Apr 19, 2011, 6:46:50 PM

to epub-work...@googlegroups.com

Okay, thanks for the explanation. Given our schedule, it certainly makes sense to defer this refinement.--Bill

Marisa DeMeglio

Apr 26, 2011, 10:17:55 PM

to epub-work...@googlegroups.com, Casey Dougherty, Al Cannistraro, Raymond Walsh

Just an update on this: I made these changes to the media overlays
spec today. You can see them in sections 3.5.1 and 4.1. Feedback
welcome!