--
You received this message because you are subscribed to the Google Groups "EPUB Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to epub-working-gr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
In practice, the original XHTML "source" is not available at the JavaScript / DOM layer, so if I am not mistaken, real-world implementations (such as the Readium-SDK [1]) compute CFI character offsets based on DOM text nodes. At this stage in the XML parsing / processing pipeline, "insignificant" whitespace (such as source indentation / spacing characters used for pretty-formatting the original source) has already been discarded, and adjacent whitespace characters within mixed-content XML fragments have already been collapsed into single spaces.
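To make the concern concrete, here is a minimal sketch of how the same word can end up at two different character offsets depending on whether whitespace has been collapsed. The sample string and the `collapse_whitespace` helper are hypothetical, and the collapsing rule is a simplification of what a rendering pipeline might do, not a description of any particular engine.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space (illustrative rule only)."""
    return re.sub(r"\s+", " ", text)


# Hypothetical pretty-printed text content: a newline plus six spaces of
# indentation, then three spaces between two words.
raw = "The quick\n" + " " * 6 + "brown" + " " * 3 + "fox"
collapsed = collapse_whitespace(raw)

# The same word sits at a different character offset in each view.
raw_offset = raw.index("fox")
collapsed_offset = collapsed.index("fox")

print(raw_offset, collapsed_offset)
```

An offset recorded against one view would point at the wrong character in the other, which is why the choice of data model matters.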
So, as much as I appreciate the design motivation behind CFI's "DOM-independent" approach, we need to confirm that this is a realistically implementable approach. What I have seen so far seems to contradict this assumption, but perhaps I am missing something. Jim, Boris, any thoughts?
[1]
https://github.com/readium/readium-sdk/blob/master/ePub3/ePub/cfi.cpp
https://github.com/readium/SDKLauncher-OSX/blob/master/Scripts/lib/epub_cfi.js
https://github.com/readium/SDKLauncher-OSX/blob/master/Scripts/js/views/cfi_navigation_logic.js
The Indexes Working Group discussed this quite a bit, though it doesn’t impinge on the draft spec since we allowed CFIs or id/href “linking”.
But we did note in discussion that indexers have their own tools for creating indexes (embedded within InDesign or Word, or stand-alone CINDEX, Sky …) which are unlikely to provide CFI support anytime soon. The timing of when indexes are written, relative to the writing of the text and its publishing to an EPUB, also works against this.
It seemed more likely that id/href based index links would be provided by publishers. Then the EPUB “build” process/software might convert id/href linking into CFIs when the index is added to the EPUB.
I would like to hear other thoughts on this issue.
Dave
At first blush, I think that DOMRange and CFI could, in effect, be losslessly round-tripped,
There’s also the annoying pragmatics of round-tripping CFIs to Adobe Location Strings (from RMSDK). We do that trivially now, and we have millions of location strings stored that must still work when we switch to an ePub 3 engine. Something that breaks that (i.e., a non-round-trippable solution) is a non-starter for us.
From: epub-work...@googlegroups.com [mailto:epub-work...@googlegroups.com]
On Behalf Of Jim Dovey
Sent: Tuesday, April 2, 2013 9:11 AM
To: <epub-work...@googlegroups.com>
Cc: epub-work...@googlegroups.com
Subject: Re: [301][NeedsDiscussion] CFI (W3C DOM Range, text-node vs text-range)
I should clarify my reference to node-index when generating CFIs: I look through the list from 1 to n and count the number of element nodes I encounter.
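A minimal sketch of that counting approach, assuming the CFI convention that elements receive even step indices and the text between them receives odd ones. The function name `cfi_step_index` and the sample markup are hypothetical, and the sketch uses Python's `xml.dom.minidom` rather than any Reading System's actual DOM; it is not taken from the Readium code.

```python
from xml.dom import minidom, Node


def cfi_step_index(target):
    """Return the CFI step index of `target` among its parent's children.

    Walks the sibling list in order and counts the element nodes seen
    before `target`. Elements get even indices, other content odd ones.
    """
    elements_seen = 0
    for child in target.parentNode.childNodes:
        if child is target:
            break
        if child.nodeType == Node.ELEMENT_NODE:
            elements_seen += 1
    if target.nodeType == Node.ELEMENT_NODE:
        return 2 * (elements_seen + 1)
    return 2 * elements_seen + 1


# Hypothetical content: three paragraphs with loose text after the first.
doc = minidom.parseString("<body><p>one</p>text<p>two</p><p>three</p></body>")
body = doc.documentElement
paragraphs = body.getElementsByTagName("p")
loose_text = body.childNodes[1]

print([cfi_step_index(p) for p in paragraphs], cfi_step_index(loose_text))
```

Because only element nodes are counted, the result does not depend on how a parser splits or merges neighbouring text nodes.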
This electronic mail message contains information that (a) is or
may be CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE
PROTECTED
BY LAW FROM DISCLOSURE, and (b) is intended only for the use of
the addressee(s) named herein. If you are not an intended
recipient, please contact the sender immediately and take the
steps necessary to delete the message completely from your
computer system.
Not Intended as a Substitute for a Writing: Notwithstanding the
Uniform Electronic Transaction Act or any other law of similar
effect, absent an express statement to the contrary, this e-mail
message, its contents, and any attachments hereto are not
intended
to represent an offer or acceptance to enter into a contract and
are not otherwise intended to bind this sender,
barnesandnoble.com
llc, barnesandnoble.com inc. or any other person or entity.
> Bookmark/annotation implementations that are browser-based (99.9% of them?)
Also, as was mentioned, CFIs can be compared without parsing the document to create a DOM, a property that we rely on for the millions (and constantly growing) of locations (bookmarks, annotations, reading positions) that we have stored for our users.
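As a rough illustration of DOM-free comparison, the sketch below orders CFI strings using only their text. It is a simplification under stated assumptions: the `cfi_sort_key` helper is hypothetical, it handles only plain paths with an optional character offset, and it ignores ranges, temporal and spatial offsets, and `^` escaping. The sample CFIs are made up for the example.

```python
import re


def cfi_sort_key(cfi: str):
    """Build a sortable key from a simple (non-range) CFI string."""
    if not (cfi.startswith("epubcfi(") and cfi.endswith(")")):
        raise ValueError(f"not a CFI: {cfi!r}")
    inner = cfi[len("epubcfi("):-1]
    # Drop bracketed assertions such as [chap01ref]; they do not affect order.
    inner = re.sub(r"\[[^\]]*\]", "", inner)
    # Numeric steps, in document order, across any '!' indirection.
    steps = tuple(int(n) for n in re.findall(r"/(\d+)", inner))
    # Optional trailing character offset; a missing offset is treated as 0.
    match = re.search(r":(\d+)", inner)
    offset = int(match.group(1)) if match else 0
    return (steps, offset)


a = "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3)"
b = "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)"
c = "epubcfi(/6/4[chap01ref]!/4[body01]/16[svgimg])"

ordered = sorted([c, b, a], key=cfi_sort_key)
print(ordered)
```

Nothing here opens the publication, which is what makes this kind of comparison cheap to run over a large store of saved locations.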
--
-jim
--
It’s more complicated than that.
We will have ePub 2.0 renderers for several years; indefinitely, perhaps, on eInk devices. These devices use RMSDK and use location strings for bookmarks, annotations, etc.
Future Reading Systems will support ePub 3.0, but MUST support interoperable bookmarks, et al., with older engines, or people get very annoyed with us. There have been lawsuits filed over “lost” annotations.
CFIs were designed to be interoperable with location strings (they were at least partly designed by the same guy).
Unless there’s a trivial, accurate mapping from a location string to whatever-you-envision-to-replace CFIs, again, it’s a non-starter for us.
From: epub-work...@googlegroups.com [mailto:epub-work...@googlegroups.com]
On Behalf Of Jim Dovey
Sent: Tuesday, April 2, 2013 11:25 AM
To: <epub-work...@googlegroups.com>
Subject: Re: [301][NeedsDiscussion] CFI (W3C DOM Range, text-node vs text-range)
On 2013-04-02, at 12:52 PM, Roger Webster <rweb...@book.com> wrote:
We can leave the CFI specification as it is, i.e. character offsets are based on the raw XHTML text stream.
The other option is for CFIs to be designed for / aligned with web browser technology.
Sorry for jumping into the discussion a bit late, I do not read this group too often. Could someone clarify what the problem is with implementing CFI in JavaScript?
I think this is the case for the HTML DOM in the browser window (or for APIs like innerHTML). I think XML parsing (in DOMParser or XMLHttpRequest.responseXML) is much more consistent (and does not strip whitespace, for instance).
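The point about XML parsing can be sketched outside the browser. The example below uses Python's `xml.dom.minidom` as a stand-in for a non-validating XML parser; that browser `DOMParser` behaves the same way is the claim above, not something this snippet verifies. The sample markup is hypothetical.

```python
from xml.dom import minidom, Node

# Hypothetical pretty-printed fragment with indentation inside the paragraph.
source = "<p>\n  <em>Hello</em>\n  world\n</p>"

doc = minidom.parseString(source)
paragraph = doc.documentElement

# Collect the character data of the paragraph's direct text-node children.
text_nodes = [
    child.data
    for child in paragraph.childNodes
    if child.nodeType == Node.TEXT_NODE
]

# The indentation before <em> survives parsing as whitespace-only text.
print([repr(t) for t in text_nodes])
```

Whether such whitespace counts toward a CFI character offset is exactly the data model question under discussion.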
Why is "termstep" optional in "local_path" used by "range" in the grammar? If "termstep" were required, then it would be clear that DOM ranges and CFI ranges cannot be round-tripped — a CFI range would always use spots on two leaves, while DOM could also put its hands on each side of a branch. Since "termstep" is optional, I'd suggest the spec should allow the final step to refer to both text collections AND elements not actually present in the document. This retains DOM-independence for purposes of sorting and comparison.
So, this is a pretty critical issue, and we seem to have various diverging opinions on what the CFI data model should be (whitespace handling, entity expansion), compatibility with Adobe's RMSDK locations (non-DOM), the practical reality of EPUB3 implementations (DOM), etc. I suggest that we allocate some discussion time during next week's conference call (I hope that Peter Sorotokin and Jim Dovey can make it, as well as anyone else involved in Adobe's RMSDK or Readium-SDK-related products). In the meantime, please feel free to continue to comment in this email thread.
--
The side discussion about DOM (raw character stream, XML InfoSet, etc.) emerged from the lack of clarity in the CFI normative prose regarding the data model to use (which made it difficult to discuss the original issue at hand). I will file a separate issue.