Long spine issue

Ishii, Koji a | Koji | EBJB

Jan 5, 2013, 10:21:13 AM
to epub-work...@googlegroups.com
Hello WG,

This may or may not be a spec issue, but I hope you don't mind my sending it to the ML to ask for opinions.

The issue I heard from a publisher is that they have a long publication in which they do not want page breaks at all. The physical book is formatted that way, as specified by the author.

They created it as a single XHTML file of 1.6MB. Tests on several RSes showed that the file takes from 30 seconds to open on the fastest RS to 3 minutes on the slowest. The publication opens instantly on Kindle and in all the traditional formats they tested.

A quick study indicates that at least 0.5% of publications do not want page breaks at all. One of them is selling so well these days that the number of customers affected by this issue has increased.

Is there anything the EPUB WG can do to help this situation, or is this an issue that is solely up to RS implementations?

Although the EPUB spec does not define behavior at spine boundaries (correct?), given that all RSes break pages there, and given that no single RS can load a long spine quickly, I wonder whether there is something we can do here.

Thoughts?

/koji


Bill McCoy

Jan 5, 2013, 10:53:09 AM
to epub-work...@googlegroups.com
> they have a long publication in which they do not want
> page breaks at all. 
> The physical book is formatted that way, 
> as specified by the author.

Is this a Torah scroll, with formatting specification presumably courtesy of Yahweh? :-)

Seriously though - what kind of physical book has no page breaks, and how do you get to the data that 1 out of every 200 books fits in this category?

Best,

--Bill

Matt Garrish

Jan 5, 2013, 12:01:52 PM
to epub-work...@googlegroups.com
I may be wrong in my interpretation here, but I believe the question isn’t physical page breaks but that chapters are run together so that there are no empty or partially empty pages where one ends and the next begins, in the print and now also in the ebook.
 
Reflowable ebooks kind of augment this ugliness, as it’s not uncommon to find a few widowed words as the last “page”, since you lack control over how the content will fit in any given device at any given font size, spacing, line height, etc. I’m sure I remember at least one person asking in the forums how they could ensure their content could render evenly throughout to avoid gaps, but I think that was in the context of the other problem of images breaking onto new pages.
 
From an accessibility perspective, I’ve always wondered myself if there were a way to effectively load a book in full just to get away from the general problems that data chunking entails (loss of the original document structure, part headings often sitting alone in their own file, etc.). Some means of indicating how to progressively load a large document, in other words.
 
I just don’t know that it’s technically feasible all in one go due to CSS rendering and paginating requirements, but I don’t develop reading systems, either. To be in the specification, we’d need some known way of implementing such a requirement, otherwise it would be just a nice fantasy to suggest that reading systems should fast load large documents.
 
Whether it’s realistic to expect a reading system to detect the last page and load the next document immediately into the same “page” appears to be the question below.
 
Matt

MicheleR

Jan 5, 2013, 12:12:12 PM
to epub-work...@googlegroups.com
On 1/5/2013 12:01 PM, Matt Garrish wrote:
> I may be wrong in my interpretation here, but I believe the question isn’t
> physical page breaks but that chapters are run together so that there are no
> empty or partially empty pages where one ends and the next begins, in the print
> and now also in the ebook.
> Reflowable ebooks kind of augment this ugliness, as it’s not uncommon to find a
> few widowed words as the last “page”, since you lack control over how the
> content will fit in any given device at any given font size, spacing, line
> height, etc.

Am I understanding correctly: there is currently no way to specify that chapters
NOT begin a new "page" when rendered on a reading system, except by putting them
all in one big content document?

Michele

Matt Garrish

Jan 5, 2013, 12:20:21 PM
to epub-work...@googlegroups.com
All in one content document ensures no page breaks, not necessarily one
single content document for the whole book. The question here is how to
ensure no page breaks anywhere in an ebook, though.

If your content is together in a single file, it is typically not broken
onto new pages during rendering (unless someone is supporting the CSS page
break properties). When a reading system reaches the end of one content
document in the spine, the next is loaded as a new page. It's become a de
facto means of ensuring page breaks because the CSS properties aren't
reliable.

Matt


Garth Conboy

Jan 5, 2013, 12:26:28 PM
to epub-work...@googlegroups.com
Hmmm… In EPUB 2.0.1 there is:

> 2.3.2: body Element
>
> It is assumed, in formatting, that the default rendering for body is consistent with the CSS property page-break-before having been set to right (which behaves like always on one-page Reading Systems), but may be overridden by an appropriate style sheet declaration.

Thus, one should be able to have multiple <spine> items and start each one with a "page-break-before: avoid" on the <body> and they would be "run together".  However, not all EPUB 2-ish reading systems supported that feature -- some would force page-breaks at <spine>/<itemref> boundaries regardless of the <body> CSS, and others would force maximum sized <spine> items and insert hrs breaks if needed for memory/pagination constraints.

Somewhat to my surprise we seem (unless I'm missing something) to be silent on this topic in EPUB 3.  Is this an oversight, or a conversation/decision that I don't remember?  It seems pretty strange not to say that page-breaks are "expected" (and perhaps over-ride-able) between the rendering of the various <spine>/<itemref>s.  The closest I could find in EPUB 3 is:

> The spine represents an ordered subset of the Publication Resources listed in the manifest, with content items not being referenced being ancillary to those that do.
>
> Reading Systems must provide a means of rendering a Publication in the order defined by the spine, which includes: 1) recognizing the first primary (linear='yes') item in the spine as the beginning of the main reading order of the Publication; and, 2) rendering successive primary items in the order given in the spine.

Which is silent on what happens between <itemref>s.  If this is an oversight, we should take it up in the EPUB 3.0.1 effort.
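
As a rough illustration of the reading-order rule quoted above, the following Python sketch lists the primary spine items of a package document in order. The function name `primary_spine_hrefs` and the sample OPF are made up for this example. It only resolves the order; it says nothing about what a reading system does at the boundary between two items, which is the gap under discussion.

```python
import xml.etree.ElementTree as ET

# Namespace of EPUB package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document, reduced to manifest and spine.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="notes" href="notes.xhtml" media-type="application/xhtml+xml"/>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="notes" linear="no"/>
    <itemref idref="c2" linear="yes"/>
  </spine>
</package>"""


def primary_spine_hrefs(opf_xml):
    """Return the hrefs of primary spine items in reading order."""
    root = ET.fromstring(opf_xml)
    # Map manifest ids to hrefs so each itemref can be resolved.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    hrefs = []
    for itemref in root.findall("opf:spine/opf:itemref", OPF_NS):
        # An itemref is primary unless it is marked linear="no".
        if itemref.get("linear", "yes") == "yes":
            hrefs.append(manifest[itemref.get("idref")])
    return hrefs


print(primary_spine_hrefs(SAMPLE_OPF))
```

A reading system that wanted to run items together would still have to decide, for each adjacent pair in this list, whether to start a new page.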

All of this said, for memory/performance reasons, it may be nearly impossible for some Reading Systems to support either single huge <spine> items or synthetically stitch many smaller ones together.

Best,
  Garth

Matt Garrish

Jan 5, 2013, 12:42:04 PM
to epub-work...@googlegroups.com
> Somewhat to my surprise we seem (unless I'm missing something) to be silent on this topic in EPUB 3.
 
I don’t recall coming across any prose to that effect in my many passes, so I’m reasonably certain it’s not in 3. I guess I may have been incorrectly assuming that page breaking was simply not under authoring control any more. Would be good to clarify if that was (not) an intentional omission, even if, as you suggest, it may not be implementable for all reading systems.
 
Thanks for clarifying,
 
Matt

Bill McCoy

Jan 5, 2013, 2:07:42 PM
to epub-work...@googlegroups.com
I don't recall it being discussed during EPUB 3 nor believe it was an intentional omission - implying it was an accidental editing artifact from the major document reformatting that should be addressed in EPUB 3.0.1. So I entered it as issue #250.

However I'm not sure it fully addresses the case of multiple spine items each with "page-break-before:avoid" on the body element, since in the browser with web pages you can't run together different HTML pages in that manner. Maybe we need further prose to be more explicit about the intended handling?

(Also, my silliness about Torah aside, I do think there's an actual but separate issue about expressing a preference for a scrolling presentation a la normal web pages - by spine item or globally - instead of any pagination at all. This is perhaps something the FXL metadata++ effort or the AHL WG can consider.)

--Bill

Ishii, Koji a | Koji | EBJB

Jan 6, 2013, 12:13:26 AM
to epub-work...@googlegroups.com

Thank you all. My original wording wasn’t clear enough, sorry about that, and thank you for guessing the right meaning. The book in question has 647 pages with no forced page breaks at all, even at chapter boundaries, to keep a fast reading rhythm.

I hope we can discuss the following points when we discuss #250:

1.     The EPUB 2.0.1 language defines page-break-before:right. Should this be right for both LTR and RTL books, or should it be left for RTL books?

2.     As a recommendation for authors who want no forced page breaks at all, "page-break-before:avoid" on body, as Garth suggested, is possibly a good one from a spec perspective, but I’d like more discussion here, including opinions from implementers. Concatenating two XHTML documents in a rendering engine might not be technically easy. It also might not be interoperable if implementations do it differently, since how much margin to expect between two body elements is not clearly defined in CSS. I wish the spec to be good, implementable, and interoperable.

As far as I could observe Kindle's behavior by turning pages very fast, it loads incrementally. It is not clear whether the spines were split beforehand or whether it loads long spines incrementally. But I’d like the feature to be implementable with a similar amount of effort as for the competitors.

Anyway, thanks for the clarifications, Garth, Matt, and Bill. This definitely helps us all.

/koji

Lee Passey

Jan 6, 2013, 1:39:18 PM
to epub-work...@googlegroups.com
On 1/5/2013 10:13 PM, Ishii, Koji a | Koji | EBJB wrote:

[snip]

> 2. In terms of the recommendation for authors if s/he wants no
> forced page breaks at all, "page-break-before:avoid" on body as Garth
> suggested is a possible good one from spec perspective, but I'd like
> more discussions here including opinions from implementations.

Suppose I load, parse, and display a single spine element. The last
"page" occupies only half of the screen due to reflowing. When I load
the next spine element for display, if I then encounter the
"page-break-before" style element it's already too late to merge it with
the previous "page", because that half-page has already been displayed.

Instead, I should mark the previous <body> or other division with
"page-break-after: avoid". That way there is a signal to the user agent
as soon as the spine element is loaded that the next spine element must
be loaded and parsed before rendition of the current spine element can
be completed.

Some decision will have to be made as to precedence. For example, if
spine element 3 is marked with "page-break-after:avoid" and spine
element 4 is marked with "page-break-before:always", which has precedence?
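
One possible answer to the precedence question is to borrow the rule CSS 2.1 applies inside a single document, where the forcing values (always, left, right) take precedence over avoid. Whether that rule should carry across spine items is exactly what is undecided here, so the Python below is only a sketch under that assumption. The function name `break_at_boundary` and its `default` parameter are hypothetical.

```python
# Values of page-break-before / page-break-after that force a break.
FORCED = {"always", "left", "right"}


def break_at_boundary(after_value, before_value, default="always"):
    """Decide whether to start a new page between two spine items.

    after_value:  page-break-after on the body of the earlier item
    before_value: page-break-before on the body of the later item
    default:      behavior assumed when both values are 'auto'
                  ('always' here, i.e. break at every spine boundary)
    """
    values = {after_value, before_value}
    if values & FORCED:
        # Assumption: forcing values win over 'avoid', as within one document.
        return True
    if "avoid" in values:
        return False
    return default in FORCED


# The case raised above: item 3 says avoid, item 4 says always.
print(break_at_boundary("avoid", "always"))
```

Under this assumption the case in the question resolves to a page break. Note that the look-ahead problem remains: the value on the later item is only known once that item has been loaded.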

Ishii, Koji a | Koji | EBJB

Jan 7, 2013, 1:12:09 AM
to epub-work...@googlegroups.com
Thank you Lee, those are all good, valid points.

I see more possible issues, such as what happens if the user jumps to the following spine item and then turns pages back, or what happens if the two documents link to different CSS or contain different JavaScript.

If we follow W3C specs as they are defined, we can just refer to those specs. But a case like this, loading two XHTML documents into one rendering engine, is beyond what W3C defines, and if we want to use them that way, we need to take on the burden of defining the exact behavior and making it interoperable.

I'm not pushing any specific conclusion yet, but I think (this may repeat what Lee wanted to say) it's not as easy as just specifying how an author can write the CSS to make it happen.

/koji


Bill McCoy

Jan 7, 2013, 11:14:36 AM
to epub-work...@googlegroups.com
There is one context where W3C specs define results of loading multiple XHTML documents into a rendering engine: iframes. But in this case the multiple documents are cleanly sandboxed from each other both logically (totally separate DOMs) and in terms of rendering real estate (simple rectangles).

I think one issue with EPUB is that it doesn't fully define its execution model semantics. This helps give reading systems more flexibility to treat content as data but it has serious issues esp. with spine-level scripting.

I would be in favor of trying to more completely define the execution model of EPUB perhaps as soon as EPUB 3.0.1. We can hopefully be informed by related new W3C work such as on System Applications (see e.g.  http://abarth.github.com/sysapps/drafts/runtime.html ). In the context of the execution model of EPUB, perhaps there's an implicit outer document into which each spine item is then loaded. HTML5 iframe has "seamless" attribute, maybe in our case it goes further than that.

But I think there are going to be issues that will limit how seamlessly we can couple multiple XHTML documents for rendering purposes, as each spine item potentially has its own CSS, JS, etc., as Ishii-san says. Surely they can be combined onto one logical page without breaks - after all, multiple iframes can coexist on a page - but I can't see, for example, a paragraph that has parts from two different spine items. The situation where the bodies of two spine items could be in effect lexically combined is always going to be at best a special case or a heuristic that some content will break (just like not all web pages can be viewed well through rendering add-ons like Readability/Instapaper).

--Bill

mrot...@twcny.rr.com

Jan 7, 2013, 11:24:05 AM
to epub-work...@googlegroups.com
Hi all --

I don't know if you have been following the recent "long spine" discussion on the epub list, but it does have some marginal relevance to our work in that it seems clear that "chunked" indexes will not be automatically integrated back into a single document for display purposes -- each new "chunk" will begin on a new "page". This is pertinent to our discussion about breaking up very large indexes, whether to repeat an index:group title, etc.

There is nothing we need to do related to this; I just wanted you all to be aware of the discussion, since it's tangentially related to our work. Part of the discussion is present below; you can review the messages in the list archives if you want to trace it back completely.

Michele

Cramer, Dave

Jan 7, 2013, 7:58:40 PM
to epub-work...@googlegroups.com
We run chapters together all the time, to make the print book use fewer pages. We haven't been concerned about copying this design treatment in EPUB. In situations where we want to avoid page breaks between chapters, we always have the option of putting large sections of text into a single HTML file—a lot of text will fit in a 300k file, and even for longer books five or six chunks doesn't seem unreasonable.
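
The chunking approach described here can be sketched as a simple greedy packer: keep adding consecutive chapters to the current file until the next one would push it over the size budget. This is only an illustration, not a description of any publisher's actual workflow. The function name `pack_chapters` and the chapter sizes are invented; the 300k budget is the figure from the message.

```python
def pack_chapters(chapter_sizes, budget=300 * 1024):
    """Group consecutive chapters into files of at most `budget` bytes.

    chapter_sizes: chapter sizes in bytes, in reading order.
    Returns a list of lists of chapter indexes, one inner list per file.
    A single chapter larger than the budget gets a file of its own.
    """
    files, current, used = [], [], 0
    for index, size in enumerate(chapter_sizes):
        # Start a new file when this chapter would exceed the budget.
        if current and used + size > budget:
            files.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        files.append(current)
    return files


# Hypothetical book: 40 chapters of 40 KB each, about 1.6 MB in total.
files = pack_chapters([40 * 1024] * 40)
print(len(files), "files")
```

Chapters inside one file run together, but a page break would still be expected at each boundary between the resulting files.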

From my narrow perspective in U.S. trade publishing, I don't think adding this sort of complexity to the spec is worth it. 

Dave Cramer
Hachette



Ishii, Koji a | Koji | EBJB

Jan 7, 2013, 8:49:12 PM
to epub-work...@googlegroups.com

Hachette, thank you for the valuable feedback. I agree with you that we should try to keep things as simple as possible, and “use a long spine item if you don’t want forced page breaks” is one of the possible solutions.

The problem right now is that we don’t have a clear message for authors on how to create a long document without forced page breaks. 300k might be OK, but 900k, or 1.6MB in my case, usually gives a bad experience. Googling “epub slow” gives some real examples.
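
For authors who want to know whether a publication is in the range described here, a small check along the following lines could report which content documents exceed a chosen size. It is a sketch only: the function name `oversized_documents` is made up, and the 300k default is simply the figure mentioned in this thread, not a limit defined by any spec.

```python
import io
import zipfile


def oversized_documents(epub_file, limit=300 * 1024):
    """Return (name, size) for each (X)HTML entry larger than `limit` bytes.

    epub_file: a path or a binary file-like object holding the EPUB,
    which is a ZIP container. Sizes are uncompressed sizes.
    """
    with zipfile.ZipFile(epub_file) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith((".xhtml", ".html", ".htm"))
            and info.file_size > limit
        ]


# Demo with an in-memory archive standing in for a real EPUB file.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("OEBPS/book.xhtml", "x" * (1600 * 1024))
    archive.writestr("OEBPS/toc.xhtml", "x" * 2048)

for name, size in oversized_documents(demo):
    print(name, size)
```

A file size is only a proxy for load time; how slow a given document is depends on the reading system.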

 

If you ask vendors for a better experience, they will likely recommend splitting spines, saying it is known to be the best practice, but you can’t avoid page breaks if you go that route, so you’re stuck.

I would like to have a single message that authors and vendors both agree on. It can be either yours or Bill’s.

 

/koji

Bill McCoy

Jan 7, 2013, 9:10:24 PM
to epub-work...@googlegroups.com
I don't see my message and Dave Cramer's as inconsistent: I agree with everything he wrote (esp. as he clarified that his perspective was re: U.S. trade publishing) and I'm not necessarily advocating that we define a metadata means for publications to express a preference for coalesced page rendering across spine items: even if we did do so many reading systems surely would ignore it to keep their implementations simple.

But I am in favor of clarifying the basic execution model of an EPUB reading system, and I also think the language that Garth noted was accidentally dropped between EPUB 2 and EPUB 3 is a bona fide issue. So I would advocate that we a) decide what we want to do about it and b) make the spec as clear as possible reflecting that decision.

--Bill

AUDRAIN LUC

Jan 8, 2013, 2:06:43 AM
to epub-work...@googlegroups.com, epub-work...@googlegroups.com
At Hachette Livre France, we have never had concerns from our publishers about page breaks between spine elements either. We started massive ePub 2 production in 2009 and had to split large HTML files to keep them under 300 kB. We still do.

I agree with Dave that 300 kB already allows for large trade books, and if a book is larger, one or two page breaks aren't a serious problem.

I agree it would be fine to do better, but please let's not bring in more complexity. We still have so much to solve, and IMO, as far as text is concerned, good vertical typography should be addressed first.

And conceptually, I think an HTML rendering engine is a "page display" engine: it consumes one "page file" at a time. Making it aware that the end of the consumed "page file" will finish in the middle of the actual "device page" would need a meta level of understanding.
The HTML rendering engine would then consume a "content file" and grab some more content from the next spine content file to fill the rest of the "device page".
Perhaps there are some discussions in that direction in the HTML5 W3C working group?

Luc

Ishii, Koji a | Koji | EBJB

Jan 8, 2013, 9:39:07 AM
to epub-work...@googlegroups.com

First of all, if you have anything that should be addressed in text and vertical typography, please post to www-...@w3.org, or post here and I’ll forward it to the CSS WG. I’m not hearing many requests in this area, so if you have any, I’d appreciate knowing about them.

Second, I agree that we should try our best not to increase complexity, and I’d like the solution to be as simple as possible too. But since we want EPUB to be a single, global, open standard, two members saying 300KB is enough does not invalidate the wish for 1.6MB from other members. I agree that it needs to be properly prioritized depending on how many members wish for it, though.

Third, HTML, from the W3C perspective, at least as of now, is unfortunately not a page display engine. Technically speaking, this is the CSS WG’s responsibility, not the HTML WG’s. The CSS WG often hears wishes in this area, but there are no volunteers to work on paged media today, so there is very little progress. Actually, since most browser users have little interest in improving printing, a.k.a. paged media, the CSS WG would appreciate volunteers from the e-book world.

Last, on Bill’s comment: apart from the wish, I agree that clarifying the basic execution model of an EPUB RS helps us be more interoperable and is a good thing to work on. I agree that discussing and clarifying spine boundary behavior is also great. Knowing that these two are separate topics from the wish, I would appreciate it if the wish could also be discussed when we start EPUB 3.(0).1.

 

/koji

AUDRAIN LUC

Jan 8, 2013, 9:53:46 AM
to epub-work...@googlegroups.com

Third: you are right, HTML itself is not “a page rendering engine”, but a web browser is, is it not? Don’t we call an HTML file a “web page”? And is it not consumed as such by the web browser?

Luc

 

 


Ishii, Koji a | Koji | EBJB

Jan 8, 2013, 10:17:07 AM
to epub-work...@googlegroups.com

> Third: you are right, HTML itself is not “a page rendering engine”, but a web browser is, is it not? Don’t we call an HTML file a “web page”? And is it not consumed as such by the web browser?

 

Maybe I understood “page” incorrectly, sorry about that. HTML does not define layout at all; CSS does. CSS defines two modes: scrollable media and paged media. Browsers use paged media only for printing, so most browser vendors do not put much effort into it, but e-books rely on it heavily, so issues that happen only in paged media are hard to resolve.

That’s what I meant by “page,” and it looks like I misunderstood. Sorry about this.
