let's start?

Daniel Glazman

Feb 26, 2013, 12:19:36 PM

to epu...@googlegroups.com

Hi there.

Dave started contributing blog entries to the topic so it's probably time to start here. Things I would like to see addressed by a simpler epub format:

1. ZIP is the only manifest
2. the top-level document is like Apache's top-level document; in other words, it's index.(htm|html|xht|xhtml)
3. that document contains an OL used for all ToC purposes
4. the package metadata is inside that document
5. no more ID/IDREF mechanisms in metadata; if we still need to refine metadata, let's use nesting
6. the cover image is also declared there through one simple <meta rel="cover" href="..."/>
7. I have mixed feelings about the mimetype file. It's useful for downloads over low bandwidth to detect whether a package is supposed to be conformant, but it's also a blocker when a tool like iBooks Author forks that mimetype. At this time, I'm more in favor of dropping it.
8. I want to get rid of the properties declaring mathml, svg and script. I think they should be replaced by the presence of the corresponding namespace on the <html> element of the target document or the presence of at least a <script> element inside the <head> of the document. That way, non-conforming UAs don't have to parse the whole thing to know they will fail rendering the document

That's a first quick start. Let's discuss these 8 first ideas and add to them afterwards.
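
To make the first two ideas concrete, here is a minimal sketch of how a reading system might locate the top-level document when the ZIP itself is the only manifest. It is an illustration under assumptions, not part of any spec: the function names are hypothetical, and it assumes the index file must sit at the root of the archive.

```python
import io
import re
import zipfile
from typing import Optional

# Assumption: the entry point is a root-level file named index.htm,
# index.html, index.xht, or index.xhtml, as in the proposal above.
INDEX_PATTERN = re.compile(r"index\.(htm|html|xht|xhtml)")


def find_toplevel_document(archive: zipfile.ZipFile) -> Optional[str]:
    """Return the name of the first root-level index document, or None."""
    for name in archive.namelist():
        # fullmatch rejects nested paths such as "images/index.html".
        if INDEX_PATTERN.fullmatch(name):
            return name
    return None


def build_demo_archive() -> zipfile.ZipFile:
    """Build a tiny in-memory ZIP to exercise the lookup."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("chapter_001.html", "<html></html>")
        archive.writestr("images/index.html", "<html></html>")
        archive.writestr("index.html", "<html></html>")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

Because four extensions are allowed, two index files could coexist in one archive; this sketch simply takes the first one listed, which is one of the details a spec would have to settle.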

</Daniel>

Hadrien Gardeur

Feb 26, 2013, 12:29:19 PM

to Daniel Glazman, epu...@googlegroups.com

Bonjour Daniel,

Here are my thoughts about all of these points:

> ZIP is the only manifest
> toplevel document is like Apache's toplevel document, in other words it's index.(htm|html|xht|xhtml)

Agree.

> that document contains a OL used for all ToC purposes

I still believe that we should separate the concept of reading order (required) from an (optional) table of contents.

I've posted extensively on the blog about why I believe that this is necessary.

That said, it's not really incompatible with what you're saying here.

> metadata of the package are inside that document
> no more ID/IDREF mechanisms in metadata, if we still need to refine metadata, let's use nesting

Sure.

> image cover is also declared through one simple <meta rel="cover" href="..."/> there

meta or link?

Also, I'd rather follow the IETF's lead regarding rel values: they're either already registered, or we rely on a URL for any extension.

> I have mixed feelings about the mimetype file. It's useful for downloads over low-bandwidth to detect if a package is supposed to be conformant but it's also a blocker when a tool like iBooksAuthor forks that mimetype. At this time, I'm more in favor of dropping it.

Drop it.

> I want to get rid of the properties declaring mathml, svg and script. I think they should be replaced by the presence of the corresponding namespace on the <html> element of the target document or the presence of at least a <script> element inside the <head> of the document. That way, non-conforming UAs don't have to parse the whole thing to know they will fail rendering the document

I also agree.

Hadrien

Dave Cramer

Feb 26, 2013, 12:36:47 PM

to epu...@googlegroups.com

Thanks for getting us started! One problem is that we have three members so far—you, me, and Hadrien. OK if I announce the mailing list on the blog? I'll send invitations to a few more folks—suggestions welcome.

1. Agree

2. Agree

3. I think that OL should be in nav element. I'm still concerned about divergence between navigation function and reading order function, especially for educational content where assessments, etc. may not be in what we'd call the linear reading order in EPUB3. Thinking on how to do this in context of HTML5 without duplication of content.

4. Agree. Section-level metadata can live in the appropriate document... very useful.

5. Can we nest metadata while keeping valid HTML5? Agree on avoiding id/idref. I liked the "dotted" pattern that was proposed for a while for Dublin Core:

<meta name="dc.creator.file-as" content="Melville, Herman"/>
<meta name="dc.creator.role" content="aut"/>

but this was deprecated, I think, and would be problematic if more than one dc.creator.

Where does HTML5 author/title/etc fit in with DC metadata?

6. Will have to think about this.

7. Get rid of mimetype!

8. Sounds reasonable. I'm hoping simple books that don't need SVG/MathML can work without namespaces.
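
As a rough illustration of idea 8, the sketch below reads only the root element's namespace declarations and the contents of <head>, then stops. It assumes well-formed XHTML and uses the standard library's `xml.etree.ElementTree.iterparse`; the function name and the feature labels are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Namespaces whose declaration on <html> would signal a required feature.
FEATURE_NAMESPACES = {
    "http://www.w3.org/1998/Math/MathML": "mathml",
    "http://www.w3.org/2000/svg": "svg",
}


def sniff_features(xhtml: bytes) -> set:
    """Report features announced on <html> or by a <script> in <head>."""
    features = set()
    root_seen = False
    events = ("start-ns", "start", "end")
    for event, payload in ET.iterparse(io.BytesIO(xhtml), events=events):
        if event == "start-ns":
            _prefix, uri = payload
            # Only declarations that precede the root's start tag count.
            if not root_seen and uri in FEATURE_NAMESPACES:
                features.add(FEATURE_NAMESPACES[uri])
        elif event == "start":
            root_seen = True
            if payload.tag == f"{{{XHTML_NS}}}script":
                features.add("script")
        elif event == "end" and payload.tag == f"{{{XHTML_NS}}}head":
            break  # nothing after </head> is examined
    return features


MATH_AND_SCRIPT = (
    b'<html xmlns="http://www.w3.org/1999/xhtml" '
    b'xmlns:m="http://www.w3.org/1998/Math/MathML">'
    b'<head><title>t</title><script src="a.js"></script></head>'
    b"<body><p>x</p></body></html>"
)
PLAIN = (
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b"<head><title>t</title></head><body/></html>"
)
```

A simple book like `PLAIN` declares nothing beyond XHTML and yields an empty set, which matches the hope that such books need no extra namespaces.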

Dave

Daniel Glazman

Feb 26, 2013, 12:39:15 PM

to epu...@googlegroups.com

On 26/02/13 18:29, Hadrien Gardeur wrote:

> image cover is also declared through one simple <meta rel="cover"
> href="..."/> there
>
> meta or link ?

Sorry, yes, <link>. My bad.
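
For illustration, here is a small sketch of how a reading system could pick up such a cover declaration from the index file using Python's standard `html.parser`. The class and function names are hypothetical, and `cover` is used as a rel value only because it is the one proposed in this thread; whether it would be registered or expressed as a URL is still open.

```python
from html.parser import HTMLParser


class CoverLinkFinder(HTMLParser):
    """Remember the href of the first <link> whose rel contains 'cover'."""

    def __init__(self):
        super().__init__()
        self.cover_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "cover" in rel_values and self.cover_href is None:
            self.cover_href = attributes.get("href")


def find_cover(index_html: str):
    """Return the cover href declared in index_html, or None."""
    finder = CoverLinkFinder()
    finder.feed(index_html)
    return finder.cover_href


SAMPLE_INDEX = (
    "<html><head><title>Book</title>"
    '<link rel="cover" href="images/cover.jpg"/>'
    "</head><body></body></html>"
)
```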

</Daniel>

Daniel Glazman

Feb 26, 2013, 12:50:52 PM

to epu...@googlegroups.com

On 26/02/13 18:36, Dave Cramer wrote:

> Thanks for getting us started! One problem is that we have three members
> so far--you, me, and Hadrien. OK if I announce the mailing list on the
> blog? I'll send invitations to a few more folks--suggestions welcome.

Sure.

> 1. Agree
> 2. Agree
> 3. I think that OL should be in nav element. I'm still concerned about
> divergence between navigation function and reading order function,
> especially for educational content where assessments, etc. may not be in
> what we'd call the linear reading order in EPUB3. Thinking on how to do
> this in context of HTML5 without duplication of content.

I am sure we can find a way of having one single list that serves all
needs/purposes.

> 5. Can we nest metadata while keeping valid HTML5?

We can certainly add things with our own namespace to xhtml5.

> Agree on avoiding
> id/idref. I liked the "dotted" pattern that was proposed for a while for
> Dublin Core:
>
> <meta name="dc.creator.file-as" content="Melville, Herman"/>
> <meta name="dc.creator.role" content="aut"/>
>
> but this was deprecated, I think, and would be problematic if more than
> one dc.creator.
>
> Where does HTML5 author/title/etc fit in with DC metadata?

We'll have to discuss and specify that collision, yes.
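
To show why the dotted pattern gets awkward with more than one creator, here is a small sketch that folds dotted names into a nested structure. The function name and the "#values" key are hypothetical, and the second creator in the sample data is made up purely for illustration.

```python
def nest_dotted(pairs):
    """Fold (name, content) pairs with dotted names into nested dicts.

    Each leaf keeps its contents in a list under the "#values" key.
    """
    tree = {}
    for name, content in pairs:
        node = tree
        for part in name.split("."):
            node = node.setdefault(part, {})
        node.setdefault("#values", []).append(content)
    return tree


ONE_CREATOR = [
    ("dc.creator.file-as", "Melville, Herman"),
    ("dc.creator.role", "aut"),
]

# Illustrative data only: a second, invented creator entry.
TWO_CREATORS = ONE_CREATOR + [
    ("dc.creator.file-as", "Hawthorne, Nathaniel"),
    ("dc.creator.role", "edt"),
]
```

With `TWO_CREATORS`, the two `file-as` values and the two `role` values end up in separate lists, and only document order hints at which role belongs to which name. That is the ambiguity that nesting (or id/idref) is meant to remove.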

> 8. Sounds reasonable. I'm hoping simple books that don't need SVG/MathML
> can work without namespaces.

Exactly.

</Daniel>

Daniel Glazman

Feb 26, 2013, 12:51:04 PM

to epu...@googlegroups.com

On 26/02/13 18:36, Dave Cramer wrote:

> Thanks for getting us started! One problem is that we have three members
> so far--you, me, and Hadrien. OK if I announce the mailing list on the
> blog? I'll send invitations to a few more folks--suggestions welcome.

I'll announce it on my blog too; people can request invites through
the Google Groups page. The group should be invite-only but readable by
all, right?

</Daniel>

Dave Cramer

Feb 26, 2013, 1:36:26 PM

to epu...@googlegroups.com

Exactly. Anyone can request an invitation, anyone can see the content, but the managers need to approve the requests to join. This helps avoid spammers :)

Dave

On Tue, Feb 26, 2013 at 12:51 PM, Daniel Glazman <daniel....@gmail.com> wrote:

On 26/02/13 18:36, Dave Cramer wrote:

Thanks for getting us started! One problem is that we have three members

so far—you, me, and Hadrien. OK if I announce the mailing list on the
blog? I'll send invitations to a few more folks—suggestions welcome.

Hadrien Gardeur

Feb 26, 2013, 1:42:47 PM

to Daniel Glazman, epub-ng

> I am sure we can find a way of having one single list that serves all
> needs/purposes.

I don't really think so. Things we can drop are:

* NCX (deprecated)
* manifest (it's all in the zip)
* guide (deprecated)

Which means that we still need:

* a reading order (currently the spine)
* a table of contents (navigation document)

When exactly do we need the table of contents to be different from the reading order?

A few things come to mind:

* when the table of contents omits things that are in the spine (very common)
* when a table of contents references the same content document multiple times through the use of fragments
* when the table of contents provides a navigation that is different from the reading order (very common with textbooks)
* for guided navigation (think panel-by-panel navigation in a comic)

That said, quite often, the reading order and the table of contents will be the same thing. Which is why I would make the following suggestions:

* both reading order and alternate navigation should be defined as OL in index.html (I agree with Dave that this should still be in a <nav> element with an epub:type for alternate navigation)
* while reading order should be required, alternate navigation is optional
* if no alternate navigation is provided, consider that the reading order is also the table of contents
* drop the requirement for all content documents to be referenced in the reading order
* drop the notion of linearity; all documents in the reading order are accessible
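
The fallback rule in these suggestions can be sketched as follows. This is only an illustration under assumptions: the markup (one <nav> per list, distinguished by epub:type) follows the suggestion above, but the value "reading-order" is invented for the example, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class NavListCollector(HTMLParser):
    """Collect link targets per <nav epub:type="..."> block."""

    def __init__(self):
        super().__init__()
        self.lists = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "nav":
            self._current = attributes.get("epub:type")
            if self._current:
                self.lists.setdefault(self._current, [])
        elif tag == "a" and self._current and attributes.get("href"):
            self.lists[self._current].append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "nav":
            self._current = None


def load_navigation(index_html: str):
    """Return (reading_order, toc); the toc defaults to the reading order."""
    collector = NavListCollector()
    collector.feed(index_html)
    reading_order = collector.lists.get("reading-order")
    if not reading_order:
        raise ValueError("a reading order is required")
    toc = collector.lists.get("toc") or reading_order
    return reading_order, toc


READING_ORDER_ONLY = (
    '<nav epub:type="reading-order"><ol>'
    '<li><a href="cover.html">Cover</a></li>'
    '<li><a href="chapter_001.html">One</a></li>'
    "</ol></nav>"
)
WITH_TOC = READING_ORDER_ONLY + (
    '<nav epub:type="toc"><ol>'
    '<li><a href="chapter_001.html">One</a></li>'
    "</ol></nav>"
)
```

In `WITH_TOC` the table of contents omits the cover page, which is the "very common" case from the list above; in `READING_ORDER_ONLY` the reading order doubles as the table of contents.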

Alberto Pettarin

Feb 26, 2013, 3:54:46 PM

to epu...@googlegroups.com

First of all, thanks for setting up this much-needed group.

About Daniel Glazman's proposal:
1. Agree
2. Agree. What about choosing a unique filename, for the sake of easing the parsing phase for a reading app? I would choose "index" without extension, but I know Win/Mac fellows won't like it.
3. Disagree. We need a) reading order list and b) ToC list. For example, ToC might contain MORE elements than reading order list, as when a single page contains more headers with their own id. I like Hadrien Gardeur's suggestion: b) is optional, and it is assumed equivalent to a), when not present.
4. Agree, at element (= page) level.
5. Agree but not sure about the technical details of using nesting.
6. link instead of meta; an equivalence assumption (between <link rel="cover" ...> and <meta rel="cover" ...>) might be acceptable
7. mimetype not needed
8. Agree.

Daniel Glazman

Feb 26, 2013, 3:57:44 PM

to epu...@googlegroups.com

On 26/02/13 21:54, Alberto Pettarin wrote:

> 3. Disagree. We need a) reading order list and b) ToC list. For example,
> ToC might contain MORE elements than reading order list, as when a
> single page contains more headers with their own id. I like Hadrien
> Gardeur's suggestion: b) is optional, and it is assumed equivalent to
> a), when not present.

My point is that we can have the larger list, play with visibility,
and use attributes to specify the reading order or the ToC
order. We need to discuss that in detail but I'm sure it's
feasible.

</Daniel>

Dave Cramer

Feb 26, 2013, 4:02:39 PM

to epu...@googlegroups.com

It's easy enough to cope with multiple links to the same file. Reading order is just determined by the base file, not anything after #. So if we have this in nav in index.html:

chapter_001.html#x1

chapter_001.html#x2
chapter_001.html#x3

chapter_002.html#x4

that just tells us that chapter_002.html comes after chapter_001.html in the reading order. You can have as many other links as you like.
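
A minimal sketch of that rule, assuming the links have already been extracted from the nav in document order (the function name is hypothetical):

```python
from urllib.parse import urldefrag


def reading_order(hrefs):
    """Derive the file-level reading order from a list of nav links.

    Fragments are ignored and each file is kept at its first appearance.
    """
    seen = set()
    order = []
    for href in hrefs:
        base = urldefrag(href).url  # drop everything from '#' onwards
        if base and base not in seen:
            seen.add(base)
            order.append(base)
    return order


NAV_LINKS = [
    "chapter_001.html#x1",
    "chapter_001.html#x2",
    "chapter_001.html#x3",
    "chapter_002.html#x4",
]
```

Note that this only works if a file's first mention in the nav reflects its position in the book; a ToC that jumps back and forth between files would need an explicit rule.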

Dave

Alberto Pettarin

Feb 26, 2013, 4:49:24 PM

to epu...@googlegroups.com

On 02/26/2013 10:02 PM, Dave Cramer wrote:
> It's easy enough to cope with multiple links to the same file. Reading
> order is just determined by the base file, not anything after #. So if
> we have this in nav in index.html:
>
> chapter_001.html#x1
> chapter_001.html#x2
> chapter_001.html#x3
> chapter_002.html#x4
>
> that just tells us that chapter_002.html comes after chapter_001.html in
> the reading order. You can have as many other links as you like.

You are right.

However I still see "reading order" and "ToC" as two separate and
sufficiently independent concepts, and even though I appreciate the
compactness of a unified list, I worry about the additional complexity
that it brings (more for human inspection/editing, than for an automated
framework to write it or for a reading app to parse it).

AlPe

Hadrien Gardeur

Feb 26, 2013, 5:58:21 PM

to Alberto Pettarin, epu...@googlegroups.com

I also have concerns over a single list. We could certainly do it, but we'd have to add attributes to show/hide specific items (which reminds me of the linear="no" attribute in the spine, I hate those) and provide specific processing rules.

A default reading order with an optional ToC feels much more straightforward to me.

Peter Hatch

Feb 26, 2013, 11:48:08 PM

to epu...@googlegroups.com

Regarding point #5, the ID/IDREF mechanisms can be used for more than refining metadata - it lets you add metadata to any element in the document.

Whether that capability is worth the complexity is an open question (I think it's not), but it's important to understand it before removing it.

Hadrien Gardeur

Feb 27, 2013, 6:51:56 AM

to Peter Hatch, epub-ng

> Regarding point #5, the ID/IDREF mechanisms can be used for more than refining metadata - it lets you add metadata to any element in the document.
>
> Whether that capability is worth the complexity is an open question (I think it's not), but it's important to understand it before removing it.

By "any element in the document" you mean any element referenced in the manifest, right?

It seems that the consensus was that each document in the publication could also have its own metadata.

Daniel Glazman

Feb 27, 2013, 8:01:14 AM

to epu...@googlegroups.com

Maybe we should start with requirements instead of technical chats...
My requirements are the following:

1. unzip an ebook behind a web server and the book is immediately
readable on the Web as is

2. zip a set of web pages with an index file and that's an ebook

3. modern browsers can render an ebook w/o code additions

4. navigation and fallbacks work "out of the box" in a modern browser

5. all ebook metadata can be extracted from the index file; using CSS,
they can be rendered in the index file
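
Requirement 2 can be sketched in a few lines with Python's standard `zipfile` module. This is only an illustration: the function names are hypothetical, and the check for a root-level index file reflects the naming proposed earlier in the thread rather than any agreed rule.

```python
import tempfile
import zipfile
from pathlib import Path

INDEX_NAMES = {"index.htm", "index.html", "index.xht", "index.xhtml"}


def zip_folder_as_ebook(folder: Path, target: Path) -> Path:
    """Zip a folder of web pages; refuse if there is no top-level index."""
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    if not any(p.parent == folder and p.name in INDEX_NAMES for p in files):
        raise ValueError("no index file at the top level of the folder")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to the folder, with forward slashes.
            archive.write(path, path.relative_to(folder).as_posix())
    return target


def demo():
    """Create a two-file book in a temp directory and list the ZIP entries."""
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp) / "book"
        book.mkdir()
        (book / "index.html").write_text("<html></html>")
        (book / "chapter_001.html").write_text("<html></html>")
        out = zip_folder_as_ebook(book, Path(tmp) / "book.zip")
        with zipfile.ZipFile(out) as archive:
            return sorted(archive.namelist())
```

Nothing here is ebook-specific beyond the index check, which is the point of the requirement: any ordinary ZIP tool would produce the same result.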

</Daniel>

Hadrien Gardeur

Feb 27, 2013, 8:07:29 AM

to Daniel Glazman, epub-ng

> Maybe we should start with requirements instead of technical chats...

Sounds reasonable.

> My requirements are the following:
>
> 1. unzip an ebook behind a web server and the book is immediately
> readable on the Web as is
>
> 2. zip a set of web pages with an index file and that's an ebook
>
> 3. modern browsers can render an ebook w/o code additions

Sure, as long as the browser can open the zip and understand index.html. Unzipping an ebook behind a web server should be an option, not the only way to render EPUB NG in a browser.

> 4. navigation and fallbacks work "out of the box" in a modern browser

What exactly do we need for fallbacks? EPUB3 has fallbacks, bindings and epub:trigger. None of them are widely used or supported.

I'd like a good definition of what we'd call "fallback" in our own context.

> 5. all ebook metadata can be extracted from the index file; using CSS,
> they can be rendered in the index file

Then, no document-level metadata? Only publication-level metadata?

Hadrien

Markus Gylling

Feb 27, 2013, 8:59:09 AM

to epu...@googlegroups.com

> What exactly do we need for fallbacks? EPUB3 has fallbacks, bindings and epub:trigger. None of them are widely used or supported.
> I'd like a good definition of what we'd call "fallback" in our own context.

While I wasn't around at the time, my belief is that the manifest-level fallbacks that EPUB has had since version 1 were mainly targeted at 1) supporting the notion of content documents using any XML grammar, and 2) supporting fallbacks for HTML elements with no intrinsic fallback capabilities, such as <img>. (For example, consider that support for the PNG format was shaky for some years.)

At this juncture, it would appear that a valid starting assumption is that the intrinsic fallback capabilities provided by HTML5, SVG and MathML are sufficient; there might simply be no need anymore for additional layers of fallback provision.

> 5. all ebook metadata can be extracted from the index file; using CSS,
> they can be rendered in the index file
>
> Then, no document level metadata ? Only publication level metadata ?

Keep in mind that complete bibliographic records (onix, marc, mods et al) are extremely rich and go beyond what can reasonably be captured in html metadata (nor can it be captured in current epub package metadata btw).

So the important requirement here is what the scope of inlined metadata is. One approach is to make a clear distinction:

1) "real" or complete bibliographic records are always out-of-line, and can be referenced via unambiguous links

2) inlined metadata is only for the most fundamental RS needs (approx: identity, version, and "bookshelf display" fields such as author, title and the common dublin core stuff)

Note there are communities out there that nurture the dream of the completely self-contained publication in terms of bibliographic metadata (i.e. there should not be a dependency on an external object). Whether or not it is enough to say "your onix/marc/mods record can always be in the zip" I do not know at this point...


Markus Gylling

Feb 27, 2013, 9:22:00 AM

to epu...@googlegroups.com

On Tuesday, February 26, 2013 10:49:24 PM UTC+1, Alberto Pettarin wrote:

> However I still see "reading order" and "ToC" as two separate and
> sufficiently independent concepts, and even though I appreciate the
> compactness of a unified list, I worry about the additional complexity
> that it brings (more for human inspection/editing, than for an automated
> framework to write it or for a reading app to parse it).

Ditto. For the record, the DAISY format tried the one-list-to-rule-them-all approach some years ago, but abandoned it as our experience was that it will accumulate pretty heavy complexity in the end.

* combining file-level reading order (aka spine) and the user-targeted TOC in one means that you push the problem to the file level, and not all authors have the capabilities to control the exact constitution of files. (For example: a single chapter should have 1 entry in the TOC for the user, but consists of 4 files due to size constraints or other authoring time factors. )

* on top of reading order + toc, once you add page lists, landmarks, and domain specific navigation to the same list (yes these are things that not all communities need, but some definitely do) the result is a list that has to be filtered by the RS in various and possibly complex ways in order to present something reasonable to the user. The design philosophy question here is whether to burden authoring tools or reading systems with the task of getting things in shape for display...

So I'll argue that

* the basic principle of the epub3 navigation document, which is solely user-centric and uses separate lists for each navigation "purpose", is not broken. (And I'll note that it meets Daniel G's requirement to just-work-in-browsers as well)

* the need for a dedicated spine remains, but can perhaps be solved by using another approach than a dedicated or combined list. Daniel G mentioned html head introspection; how about just requiring link@rel=next and prev in content document head metadata?

Hadrien Gardeur

Feb 27, 2013, 9:48:56 AM

to Markus Gylling, epub-ng

Hello Markus,

Glad to have you on board here, your insights are always useful.

> While I wasn't around at the time, my belief is that the manifest-level fallbacks that EPUB has had since version 1 were mainly targeted at 1) supporting the notion of content documents using any XML grammar, and 2) supporting fallbacks for html elements with no intrinsic fallback capabilities, such as <img>. (For example, consider that support for the PNG format was shaky during some years.)
>
> At this juncture, it would appear that a valid starting point assumption is to assume that the intrinsic fallback capabilities provided by html5, svg and mathml are sufficient; there might simply be no need anymore for additional layers of fallback provision.

We're on the same page.

The only other argument that I've heard was accessibility: if you reference an image in spine, an HTML fallback would provide some accessibility features.

I don't really buy that argument: the only way one could make images really accessible would be with metadata in SVG (not that easy) or with the kind of work that the AHL group is doing.

> Keep in mind that complete bibliographic records (onix, marc, mods et al) are extremely rich and go beyond what can reasonably be captured in html metadata (nor can it be captured in current epub package metadata btw).

Fully agree with you, but I think that the distinction is not clear in EPUB3:

* weak recommendations about the use of Dublin Core
* pretty much any element from ONIX, MARC etc. can be included in the package
* and at the same time you can also link to out-of-line resources for the same metadata

I would argue that stronger recommendations about core metadata + only allowing ONIX/MARC/MODS and such as out-of-line resources would be a better path to follow.

> So the important requirement here is what the scope of inlined metadata is. One approach is to make a clear distinction:
>
> 1) "real" or complete bibliographic records are always out-of-line, and can be referenced via unambiguous links

What's your definition of unambiguous then?

The EPUB3 spec relies too much on rel values for that (a rel value per record format), instead of a mix of rel value and media type.

I would strongly argue in favor of a unique rel value for out-of-line records, with a few recommendations for the expected media types (so we don't close the door to other vocabularies or serializations).

> 2) inlined metadata is only for the most fundamental RS needs (approx: identity, version, and "bookshelf display" fields such as author, title and the common dublin core stuff)

It would be interesting to list such metadata.

While the title and contributors are usually listed, many key metadata are missing most of the time in EPUB 2/3 files: covers, description, series title (and position in the series), categories, nature of the publication (a sample? a dictionary that can be indexed?). The RS needs those to organize the user's shelf or display information about the book.

Instead of telling content creators that they can use anything that they want for metadata, and having reading systems that support very little, we could reach a baseline with much more pragmatic recommendations.

> Note there are communities out there that nurture the dream of the completely self-contained publication in terms of bibliographic metadata (i.e. there should not be a dependency on an external object). Whether or not it is enough to say "your onix/marc/mods record can always be in the zip" I do not know at this point...

Relying on links to discover additional services and reference metadata that can potentially change a lot is a much more pragmatic option.

Hadrien Gardeur

Feb 27, 2013, 9:59:33 AM

to Markus Gylling, epub-ng

> So I'll argue that
> * the basic principle of the epub3 navigation document, which is solely user-centric and uses separate lists for each navigation "purpose", is not broken. (And I'll note that it meets Daniel G's requirement to just-work-in-browsers as well)

It's not broken. I feel we're instead building on it to create a real alternative to OPF.

> * the need for a dedicated spine remains, but can perhaps be solved by using another approach than a dedicated or combined list. Daniel G mentioned html head introspection; how about just requiring link@rel=next and prev in content document head metadata?

link@rel=next would be much more respectful of HATEOAS (http://en.wikipedia.org/wiki/HATEOAS), but we're not really designing an API here.

In terms of complexity for the spec, authoring and RS, I'm not convinced that link@rel=next is better than a simple list of files in OL. I'd rather have all the "glue" in index.html than all over the place.

Eric Daspet

Feb 27, 2013, 10:04:48 AM

to epu...@googlegroups.com

> how about just requiring link@rel=next and prev in content document head metadata?

Hi

I have mixed feelings about this one. It would be for sure an elegant
solution but I see two drawbacks:

* You have to open and parse all elements to be able to construct the
sequence. Not sure it is a blocker (do we need to construct the whole
sequence to read a book?), but this definitely adds complexity.

* You restrict yourself to elements that can embed a link@rel=next or
similar metadata. HTML can, but should we restrict elements to HTML
files?
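
The first drawback is easy to see in a sketch: rebuilding the full sequence means parsing one document after another. The example below assumes the documents are available in memory as a name-to-markup mapping and that the first document is already known; all names are hypothetical.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="next"> in a document."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "next" in rel_values and self.next_href is None:
            self.next_href = attributes.get("href")


def spine_from_next_links(documents: dict, first: str) -> list:
    """Follow rel=next from document to document to rebuild the sequence."""
    order = []
    current = first
    while current is not None:
        if current in order:
            raise ValueError(f"cycle detected at {current}")
        order.append(current)
        finder = NextLinkFinder()
        finder.feed(documents[current])  # every document has to be parsed
        current = finder.next_href
    return order


DOCUMENTS = {
    "a.html": '<html><head><link rel="next" href="b.html"/></head></html>',
    "b.html": '<html><head><link rel="prev" href="a.html"/>'
              '<link rel="next" href="c.html"/></head></html>',
    "c.html": '<html><head><link rel="prev" href="b.html"/></head></html>',
}
```

The sketch also hints at the second drawback: it can only follow the chain through files that are able to carry a <link> element.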

--
Éric Daspet

Dave Cramer

Feb 27, 2013, 10:10:28 AM

to Hadrien Gardeur, Markus Gylling, epub-ng

>> * the need for a dedicated spine remains, but can perhaps be solved by using another approach than a dedicated or combined list. Daniel G mentioned html head introspection; how about just requiring link@rel=next and prev in content document head metadata?
>
> link@rel=next would be much more respectful of HATEOAS (http://en.wikipedia.org/wiki/HATEOAS), but we're not really designing an API here.
>
> In terms of complexity for the spec, authoring and RS, I'm not convinced that link@rel=next is better than a simple list of files in OL. I'd rather have all the "glue" in index.html than all over the place.

That's a good way to put it--having all the "glue" in one place. I think rel="next" would make it harder to edit or change books; it reminds me of the current situation where a small change in an EPUB requires touching all sorts of different things. This strikes me as a case where the "webby" thing to do is not necessarily best for the idea of a book.

Dave

Hadrien Gardeur

Feb 27, 2013, 10:15:31 AM

to Eric Daspet, epub-ng

> * You restrict yourself to elements that can embed a link@rel=next or
> similar metadata. HTML can, but should we restrict elements to HTML
> files?

That's an excellent point, since expressing link@rel=next in SVG or other formats can be widely different or impossible.

Alberto Pettarin

Feb 27, 2013, 10:14:29 AM

to epu...@googlegroups.com

On 02/27/2013 03:59 PM, Hadrien Gardeur wrote:
> * the need for a dedicated spine remains, but can perhaps be
> solved by using another approach than a dedicated or combined list.
> Daniel G mentioned html head introspection; how about just requiring
> link@rel=next and prev in content document head metadata?
>
>
> link@rel=next would be much more respectful of HATEOAS
> (http://en.wikipedia.org/wiki/HATEOAS), but we're not really designing
> an API here.
> In terms of complexity for the spec, authoring and RS, I'm not convinced
> that link@rel=next is better than a simple list of files in OL. I'd
> rather have all the "glue" in index.html than all over the place.

From a purely authoring point of view, having a separate list enables
re-using the same "page" files for different publishing products (e.g.,
subsetting a "full eBook"), requiring rebuilding just the separate list,
whereas if I embed "link@rel=next" into the files, I need to change them
if the next element changes.

I'll throw in another element: working with Audio-eBooks, I would like
to be able to create a navigable playlist (like a regular M3U playlist),
so that the user might choose whether to read+listen or just listen.
(Right now I create a "fake" playlist XHTML page for that purpose, but
it is an ugly workaround to limited EPUB 3 support in reading apps.)
Clearly I do not want to resort to some naming convention of the audio
files to infer their sequence, but rather specify it in the "rendition
order" list.

AlPe

Bill McCoy

Feb 27, 2013, 11:49:42 AM

to EPUB NG

I may seem like the counter-revolutionary reactionary voice, but rest
assured I like the baggage of OEBPS that made it into EPUB and then
was kept in EPUB 3 no more than anyone (perhaps less than most, since I
had to attempt to draft prose to more clearly specify some vaguer
parts of it in the EPUB 3 cycle). I personally argued to switch to RSS for
the "spine" in EPUB 2. But there were 3 flavors of RSS/Atom plus
further fragmentation from de facto dialects, so that didn't turn out
to be a very good idea (and it's also a cautionary tale about making
something simpler to author not necessarily leading to overall
simplicity: ask anyone who's written a commercial-grade news reader
about the complexity of handling OPF and they will laugh you out of
the room, b/c handling arbitrary RSS feeds in all their flavors and
idiosyncrasies is exponentially harder).

Anyway, I encourage you guys to think about even higher-level
requirements, inc. what is different about an "ebook" from a packaged
web app, for which there are already several dueling examples (Google
Packaged Apps, Mozilla Open Web Apps, etc., and now a new W3C System
Applications WG to try to harmonize same). Presumably you guys don't
want to aim to reinvent that wheel and deal with all the implications
(inc. security). But, taken literally, Daniel's requirement #2 sounds
no different than a webapp-in-ZIP.

So what's different about a publication? To me a key overall
requirement is that a publication is "data" - it can be manipulated
downstream and presented in different ways, whereas all you can really
do with an arbitrary website or web app is "play" it and see what
happens. Is that requirement valid for you all? Is it more or less
important than the requirement that you can unzip a book and the book
is immediately readable on the Web as is?

Another example to consider is slideshows in HTML5. There are dozens
if not hundreds of ways to do these things, and all the cool kids
presenting at conferences use one of them. Mostly they all have the
property of Daniel's requirements #1, #2, and #3. But they are
generally hand-authored, somewhat laboriously, none of them work with
each other as they take wildly different approaches to basic things
like how pages are represented and if you just asked any random two
such people at a conference to concatenate their presentations the
editing party would be lengthy. I presume you don't want to just
create a 101st way to do something like this, because it provides no
interoperability, no reliable way for a reading system to introspect
things (unless it knows about all 101 variants), and no means for
making the experience accessible. And this is a subset of the problem
you are trying to solve b/c you are talking about content that isn't
just static pages but will presumably be dynamically paginated at
least some of the time.

I'm confident that an EPUB that doesn't maintain compatibility with
EPUB 2 can be much simpler. I'm not at all confident that now is the time
to embark on it in earnest (given the RSS/Atom fragmentation
cautionary tale, and opportunity to harmonize with parallel work going
on to unify packaged Web apps, and the overriding need this year to
get publishers and reading systems onto the modern browser stack). To
me, arguing about EPUB 3 vs. something a bit less wart-y seems
small beans in the big picture. But I certainly support thinking out
of the box about it, even if not everyone may agree with me that it
should be for an EPUB 4 and not for 2013.

I would further urge that you guys define a requirement that there be
a well-defined transformation from the new format to EPUB 3.0. This
would mitigate concerns about fragmentation, provide a cheap way to
get validation, and also clarify the scope of the effort... i.e. what
you are doing provides a proper subset of EPUB 3.0 functionality not
an intersecting set.

Another way to say it is that if you are aiming for 80% solution vs.
EPUB 3.0, what is in the 20% that is left out? How much are you
removing "syntactic salt" (some of which delivers something concrete -
the opportunity for backwards compatibility of content with EPUB 2
Reading Systems - but is nevertheless in my view not what we
necessarily want for long-term) vs. removing functionality?

--Bill

Peter Hatch

Mar 1, 2013, 2:51:05 AM

to Daniel Glazman, epu...@googlegroups.com

What would you think about relaxing the second requirement from
zipping a set of web pages to running a custom tool on them? Assuming
it was open-source, and created as part of the spec process.

Then decisions about what is easier for authoring and changing books
could be made somewhat independently of what is good for reading
systems. So using link@rel=next and prev for spine navigation might
work, because no one would need to hand-author it.
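
As a rough illustration of the kind of tool meant here, a minimal sketch that derives the prev/next links from an ordered list of page files, so that nobody writes them by hand. The function name `spine_links` and the file names are hypothetical, not part of any proposal in this thread.

```python
from html import escape


def spine_links(files):
    """Map each page to the <link> tags a tool could inject into its <head>.

    `files` is the book's pages in reading order (hypothetical input).
    """
    links = {}
    for i, name in enumerate(files):
        tags = []
        if i > 0:
            # Every page except the first points back to its predecessor.
            tags.append(f'<link rel="prev" href="{escape(files[i - 1], quote=True)}"/>')
        if i < len(files) - 1:
            # Every page except the last points forward to its successor.
            tags.append(f'<link rel="next" href="{escape(files[i + 1], quote=True)}"/>')
        links[name] = tags
    return links
```

A reading system could then recover the spine by starting at the first page and following rel=next until no such link remains.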

--
Peter Hatch

Daniel Glazman

unread,

Mar 1, 2013, 3:11:59 AM3/1/13

to Peter Hatch, Daniel Glazman, epu...@googlegroups.com

On 01/03/13 08:51, Peter Hatch wrote:

> What would you think about relaxing the second requirement from
> zipping a set of web pages to running a custom tool on them? Assuming
> it was open-source, and created as part of the spec process.

Built for all platforms, maintained across OS versions? I have the
gut feeling this is just reinventing ZIP, which is now built into OS X,
Windows and Linux. All these OSes can not only open a ZIP from their
respective Finder windows, but also view its contents w/o unzipping
into a directory on disk.
No, I don't think it's worth the pain, time, maintenance, effort and,
all in all, money.

</Daniel>

Dave Cramer

unread,

Mar 1, 2013, 8:13:21 AM3/1/13

to Peter Hatch, epub-ng

On Fri, Mar 1, 2013 at 2:51 AM, Peter Hatch <peter...@gmail.com> wrote:

> What would you think about relaxing the second requirement from
> zipping a set of web pages to running a custom tool on them? Assuming
> it was open-source, and created as part of the spec process.

Just the way EPUB zips files is a huge obstacle to most users. We have lots of editors who can easily unzip an EPUB and make a minor correction to a file, but they can't turn it back into an EPUB. We should keep in mind that not everything is produced by tools, and we want to make things easy for a classroom of school kids writing their own little books. That's one reason I keep pushing for a folder full of files, zipped in the ordinary way, with the "bookness" embedded in index.html.

Dave

Daniel Glazman

unread,

Mar 1, 2013, 8:22:26 AM3/1/13

to epu...@googlegroups.com

On 01/03/13 14:13, Dave Cramer wrote:

> Just the way EPUB zips files is a huge obstacle to most users. We have
> lots of editors who can easily unzip an EPUB and make a minor correction
> to a file, but they can't turn it back into an EPUB. We should keep in
> mind that not everything is produced by tools, and we want to make
> things easy for a classroom of school kids writing their own little
> books. That's one reason I keep pushing for a folder full of files,
> zipped in the ordinary way, with the "bookness" embedded in index.html.

Exactly. The extra field at zero in the EPUB zipping constraints is a
huge pain. Not to mention that dealing with it inside a web
page is not easy and comes at the cost of a full zip library in JS.
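
For reference, the constraint under discussion is that the `mimetype` entry must be the first file in the archive, stored uncompressed, with no extra field in its ZIP header. A minimal sketch of repacking an unzipped folder that way with Python's standard `zipfile` module follows; the function name `pack_epub` is made up for illustration, and this is not a validated EPUB packager.

```python
import os
import zipfile

EPUB_MIMETYPE = "application/epub+zip"


def pack_epub(folder, out_path):
    """Zip `folder` into `out_path` with the EPUB-style mimetype entry first."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # First entry: "mimetype", stored (no compression). A fresh ZipInfo
        # carries an empty extra field, which is what the constraint requires.
        info = zipfile.ZipInfo("mimetype")
        info.compress_type = zipfile.ZIP_STORED
        zf.writestr(info, EPUB_MIMETYPE)

        # Everything else can be added the ordinary way, deflated.
        for root, dirs, files in os.walk(folder):
            dirs.sort()
            for name in sorted(files):
                full = os.path.join(root, name)
                arc = os.path.relpath(full, folder).replace(os.sep, "/")
                if arc == "mimetype":
                    continue  # already written above
                zf.write(full, arc, compress_type=zipfile.ZIP_DEFLATED)
```

The ordinary "compress folder" command in a desktop file manager gives no control over entry order or per-entry compression, which is why the round trip fails for editors without such a script.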

</Daniel>

Bill McCoy

unread,

Mar 1, 2013, 9:24:49 AM3/1/13

to Daniel Glazman, epu...@googlegroups.com

It is a PITA although this is not because ZIP files are produced by hand without tools: ZIP is actually a counter-example to needing things to be hand-coding friendly. It's only that the GUI tool we normally use to make a ZIP can't make an EPUB flavor zip. Of course this can be fixed two ways.

We discussed in EPUB 3.0 WG relaxing the MIMETYPE at zero requirement which most reading systems don't even care about, but compatibility with EPUB 2 trumped. To me there is only one question that needs to be thought about in removing the requirement: what will be consequences for sniffing? This is a practical concern both in firewalls (since ZIP container is a well-known vector for malware) and browsers (since people may not serve up EPUB with proper MIME type). But I consider optionality (presumably it wouldn't be *illegal* to have it in an EPUB NG) just fine. If it turns out that adding this MIMETYPE helps grease transporting EPUBs then people will use it. And passing a sniff test does not mean you aren't malware.
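
To make the sniffing question concrete: because the `mimetype` entry comes first, uncompressed and without an extra field, the media type string sits at a fixed byte offset and can be checked from the first 58 bytes of the file. A minimal sketch of such a check, assuming the standard ZIP local file header layout; the function name `sniff_epub` is hypothetical and this is not how any particular firewall or browser is known to do it.

```python
import struct

EPUB_MIMETYPE = b"application/epub+zip"


def sniff_epub(head: bytes) -> bool:
    """Guess from the leading bytes whether a file is an EPUB 2/3 container."""
    if len(head) < 58 or head[:4] != b"PK\x03\x04":
        return False  # too short, or not a ZIP local file header
    (method,) = struct.unpack_from("<H", head, 8)              # compression method
    name_len, extra_len = struct.unpack_from("<HH", head, 26)  # name / extra lengths
    return (
        method == 0                       # stored, not deflated
        and name_len == 8
        and extra_len == 0                # the "extra field at zero" constraint
        and head[30:38] == b"mimetype"
        and head[38:58] == EPUB_MIMETYPE
    )
```

Dropping the requirement means a sniffer would instead have to read the central directory at the end of the archive, or fall back on the file extension and the served MIME type.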

--Bill

P.S. the extra field at zero was inherited from the wholesale adoption of OpenDocument ODF packaging, and was never considered for EPUB on its own merits. The modularization of ODF packaging was first done by Adobe and then "sold" to IDPF (at Adobe we then ended up also using this for AIR package, the short-lived XML serialization of PDF, etc.).

Alberto Pettarin

unread,

Mar 1, 2013, 9:36:31 AM3/1/13

to epu...@googlegroups.com

On 03/01/2013 03:24 PM, Bill McCoy wrote:
> But I consider optionality
> (presumably it wouldn't be *illegal* to have it in an EPUB NG) just
> fine. If it turns out that adding this MIMETYPE helps grease
> transporting EPUBs then people will use it.

Indeed I was reflecting on the fact that, so far, nothing prevents
creating a ZIP file that contains *both* a valid "EPUB NG" and a valid
"EPUB 2/3". Clearly which resources are rendered/supported will
depend on the User Agent, but, in my opinion, this is not a negative
characteristic.
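
A small sketch of how a User Agent might tell which flavours a given archive carries, assuming the "EPUB NG" entry point is a top-level index.(htm|html|xht|xhtml) as proposed at the start of this thread, and that EPUB 2/3 is recognised by its `mimetype` and `META-INF/container.xml` entries. The function name and the label strings are invented for illustration.

```python
import re

# Top-level index document, per the proposal at the start of the thread.
NG_INDEX = re.compile(r"^index\.(htm|html|xht|xhtml)$")


def flavours(names):
    """Return the set of package flavours suggested by a list of ZIP entry names."""
    found = set()
    if any(NG_INDEX.match(n) for n in names):
        found.add("epub-ng")
    if "mimetype" in names and "META-INF/container.xml" in names:
        found.add("epub-2/3")
    return found
```

The entry names would typically come from `zipfile.ZipFile(path).namelist()`; a hybrid archive simply yields both labels.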

AlPe

Baldur Bjarnason

unread,

Mar 1, 2013, 10:20:48 AM3/1/13

to epub-ng

On 1 Mar 2013, at 14:24, Bill McCoy <whm...@gmail.com> wrote:

> It is a PITA although this is not because ZIP files are produced by hand without tools: ZIP is actually a counter-example to needing things to be hand-coding friendly. It's only that the GUI tool we normally use to make a ZIP can't make an EPUB flavor zip. Of course this can be fixed two ways.
>
> We discussed in EPUB 3.0 WG relaxing the MIMETYPE at zero requirement which most reading systems don't even care about, but compatibility with EPUB 2 trumped. To me there is only one question that needs to be thought about in removing the requirement: what will be consequences for sniffing? This is a practical concern both in firewalls (since ZIP container is a well-known vector for malware) and browsers (since people may not serve up EPUB with proper MIME type). But I consider optionality (presumably it wouldn't be *illegal* to have it in an EPUB NG) just fine. If it turns out that adding this MIMETYPE helps grease transporting EPUBs then people will use it. And passing a sniff test does not mean you aren't malware.
>
> --Bill
>
> P.S. the extra field at zero was inherited from the wholesale adoption of OpenDocument ODF packaging, and was never considered for EPUB on its own merits. The modularization of ODF packaging was first done by Adobe and then "sold" to IDPF (at Adobe we then ended up also using this for AIR package, the short-lived XML serialization of PDF, etc.).

Since we can't expect all major OSes to update their zip tools to automatically create archives with a mimetype field, I'd suggest that requiring one for e0 is a non-starter. As both Daniel and Dave have said, EPUB's archive requirements cause huge problems for a lot of regular users trying to make ebooks.

Add to that the fact that requiring an e0-specific mimetype field would, as Alberto points out, make it impossible for somebody to create valid hybrid EPUB3/e0 books and the case for the mimetype at zero is pretty close to dead. Being able to create hybrid archives is, IMO, a huge plus, even if it's only done by converting an e0 file into an EPUB3/e0 hybrid and never authored directly.

I appreciate why the mimetype field is a part of EPUB, of course, but as I understand it, this exercise is about seeing how easy and simple a web-based ebook format we can make when we don't have to worry about backwards compatibility.

And since we're explicitly not bound by backwards compatibility I see no reason for keeping the mimetype field.

My former employer was a vendor specialising in large-scale corporate anti-malware and anti-spam scanning so I feel that there are a few points on the malware angle I can highlight.

1. Whether or not a zip archive has a mimetype field has zero bearing on whether it is flagged as malware by any of the major anti-malware scanners. They scan based on either signatures (is this a known malware file or not) or heuristics and the heuristics are too sophisticated to be fooled or assuaged by a mimetype field. In fact, every single anti-malware engineer I know would consider a scanner flagging files merely based on the existence or not of the mimetype field to be a bug and would try to fix it. Which means that it is not our problem.

2. Firewalls that have strict behaviours when it comes to what files they let through are either misconfigured (and thus not our problem) or they are this strict intentionally. A firewall that behaves like this in a large organisation is going to encounter dozens if not hundreds of false positives a day in a variety of file formats. The sysadmins almost certainly prefer it to behave this way and *want* it to block unknown files. These guys are security fascists and proud of it. We have no standing to try and second-guess how they do their job. We may think they're stupid but they are employed to secure large corporate networks regularly under attack with hundreds of clueless employees constantly falling prey to social attack vectors and we aren't. Many of them, were they aware of it, would consider our attempts to work around their blocks and try to pass file types they don't know through their firewall to be attempts to exploit bugs in their setup. Which again, means it's not our problem.

3. Corporate firewalls are much less of a concern now for ebooks since most, if not all, users are using their own phones and tablets to read and since Bring Your Own Device is becoming more and more common in companies. Both of these are major changes from a few years ago.

- best
- baldur

Bill McCoy

unread,

Mar 1, 2013, 10:57:39 AM3/1/13

to Baldur Bjarnason, epub-ng

Baldur, I agree with you 100% (there's a first time for everything! ;-) ). I was not expecting that we could realistically get OS's to upgrade ZIP libraries, only pointing out that this isn't an issue of "hand coding" but of being able to use widespread tools.

I was not clear in my comments about firewall/email attachment scanning. The concern (which I don't think trumps the proposed simplification) is only that if an EPUB NG were so laissez-faire as to be indistinguishable from a zipped website, it might be blocked by default by more systems. Anecdotally I know lots of systems have blocked ZIP files full stop, but now that there are more uses of ZIP-based formats (.docx of course being the largest of these) inspection is done. .docx doesn't seem to have the brute-force MIMETYPE-at-0 of EPUB but it does have a particular fixed structure.

--Bill

Hadrien Gardeur

unread,

Mar 1, 2013, 11:22:13 AM3/1/13

to Bill McCoy, Baldur Bjarnason, epub-ng

Talking about media types: having the same media type for both DRM and non-DRM books in EPUB 2/3 is a major headache.

In OPDS we had to consider that no catalog ever links to a DRM file and instead relies on a different format in between (ACSM for ACS4).

If EPUB NG adopts a different media type than EPUB (it should), DRM files should be forbidden to share the exact same media type (at the very least, a media parameter should be provided).
