E0 and parallel texts


Alberto Pettarin

Sep 5, 2013, 3:55:01 AM
to epu...@googlegroups.com
Hi everyone,

I apologize in advance, because the topic might be out-of-scope for E0.
In that case, please consider this email an "informal" RFC on my
students' project.

Currently the EPUB AHL group is discussing some gadgets for supporting
multilingual books. Unfortunately, they are focusing on switching
between languages, and on linking alternate renditions (e.g., text-only
vs. replica). The words "parallel text(s)", as defined in
http://en.wikipedia.org/wiki/Parallel_text seem not to appear in the
publicly available AHL documents.

In Italy, however, publishers are keen to find a good way of producing
ebooks of parallel texts, because we consume a lot of Latin/ancient
Greek/foreign literature with original + Italian translation. The
current state-of-the-art solutions:

a) fixed layout
b) reflowable with internal links
c) reflowable with the two languages interleaved

do not seem handy enough for the job, to the extent that some publishers
have backed themselves into the "proprietary app" dead end.

I have recently supervised a project at my (former) University, where
three students coded a proof-of-concept Android app for EPUB 2/3, which
has a split-screen feature: you can split the screen to read two ebooks
independently, to open the footnotes, and also to consult parallel texts.

In particular, we adopted the following naming convention: if in the
package there are two XHTML pages named:

foo.XX.xhtml and foo.YY.xhtml

where XX and YY are two-letter ISO 639-1 codes (e.g., "en" and "it"),
those two chapters must be considered "parallel". Other resources, like
bar.xhtml, are considered "common" to all languages (e.g., cover.xhtml
or a single-language appendix). This enables the app to let the user
choose the two languages to be shown simultaneously, and to synchronize
the chapter turns.
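
A minimal sketch of how a reading system might apply this convention,
written in Python rather than taken from the app. The function name
`group_parallel` and the regular expression are illustrative
assumptions, not part of the app's code.

```python
import re
from collections import defaultdict

# Matches "<base>.<two-letter code>.xhtml", e.g. "foo.en.xhtml".
PARALLEL_RE = re.compile(r"^(?P<base>.+)\.(?P<lang>[a-z]{2})\.xhtml$")

def group_parallel(filenames):
    """Split package resources into parallel groups and common files."""
    parallel = defaultdict(dict)   # base name -> {language code: filename}
    common = []                    # resources shared by all languages
    for name in filenames:
        match = PARALLEL_RE.match(name)
        if match:
            parallel[match.group("base")][match.group("lang")] = name
        else:
            common.append(name)
    return dict(parallel), common

# Hypothetical package contents, for illustration only.
files = ["cover.xhtml", "ch1.en.xhtml", "ch1.it.xhtml",
         "ch2.en.xhtml", "ch2.it.xhtml"]
parallel, common = group_parallel(files)
```

Here `parallel` maps each chapter base name to its per-language files,
and `common` holds everything else. The sketch does not decide what to
do with a base name that exists in only one language.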

You can see some pictorial screenshots here:
http://www.albertopettarin.it/rs2.html

My questions are:

A) Shall we embed this tiny convention into E0?
B) Do you think it is out-of-scope for E0? If so, can you propose an
alternative for supporting parallel texts within the current EPUB/E0
specification?

Apologies for the long mail, have a nice day,

Alberto Pettarin



PS: the apk and the source code of the Android app will be published in
a couple of weeks; moreover, I have some more students willing to work
on this project, and I plan to have one of them add support for E0 (in
addition to EPUB).

Dave Cramer

Sep 5, 2013, 8:53:03 AM
to Alberto Pettarin, epub-ng
Hi Alberto,

I don't think anything is out-of-scope here!

The filenaming scheme seems handy, but is that really enough to
accomplish the task? Do you need to describe a mapping at the
paragraph level or below between the two texts? What would the main
navigation document look like?

Lots of interesting questions here. One could say that the reading
system should be a blank slate (or an empty viewport!), and if the
content author wants to present two works side-by-side they should
just author the HTML that way, and the author would have an obligation
to create the necessary interface.

Or a reading system could provide a feature to display two books at
once, and if the two books had matching document structures they could
be viewed in parallel.

Or perhaps we could view this as a special case of annotations. A
primary text has other text associated with it, which can be displayed
at the option of the user. Having a parallel stream could be used for
footnotes, instructor's editions, pedagogical notes, extended
examples, all sorts of things even including an alternate version of
the primary text. It's interesting to see the original manuscript of
T.S. Eliot's "The Waste Land" side-by-side with the published version.

I'm sorry I don't know what the AHL group is envisioning here...

Dave

Alberto Pettarin

Sep 5, 2013, 9:42:56 AM
to epub-ng
On 09/05/2013 02:53 PM, Dave Cramer wrote:
> Hi Alberto,
>
> I don't think anything is out-of-scope here!
>
> The filenaming scheme seems handy, but is that really enough to
> accomplish the task? Do you need to describe a mapping at the
> paragraph level or below between the two texts? What would the main
> navigation document look like?

That is an argument I omitted in the previous email for the sake of
brevity, but you are absolutely spot on.

In general, for the "casual" reader, chapter/section-level sync is
enough, while the "pro/academic" reader needs (at least) paragraph-level
sync.

Limiting our focus to parallel texts for the moment, at least for
classics (think about the Loeb series) things are simple: for example,
the Latin text has 7 <p>'s or verses, and the translation has 7 <p>'s or
verses. In this case, even without manual explicit markup, the
RS/packaging SW can easily build the mapping between two texts, based on
DOM information only. With modern languages the problem worsens a bit,
but there are still good heuristics that allow reconstructing the
mapping automatically. Clearly these "automatic" mechanisms can be
broken by a suitably adversarial choice of the contents (e.g., Latin
poetry translated into Italian prose).
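
The simplest case above (same number of <p>'s on both sides) could be
sketched as follows in Python. This is only an illustration of
positional matching, not the app's method; the function names and the
placeholder texts are assumptions.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def paragraphs(xhtml_source):
    """Return the text of every <p> element, in document order."""
    root = ET.fromstring(xhtml_source)
    return ["".join(p.itertext()).strip() for p in root.iter(XHTML_NS + "p")]

def align_by_position(source_a, source_b):
    """Pair the n-th paragraph of one text with the n-th of the other.

    Returns None when the counts differ, i.e. when this naive heuristic
    cannot be trusted and explicit markup would be needed.
    """
    pars_a, pars_b = paragraphs(source_a), paragraphs(source_b)
    if len(pars_a) != len(pars_b):
        return None
    return list(zip(pars_a, pars_b))

# Placeholder content standing in for two parallel chapters.
latin = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
         '<p>Latin verse 1</p><p>Latin verse 2</p></body></html>')
italian = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
           '<p>Italian verse 1</p><p>Italian verse 2</p></body></html>')
pairs = align_by_position(latin, italian)
```

When the paragraph counts differ (the poetry-to-prose case), the sketch
gives up rather than guessing.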

> Lots of interesting questions here. One could say that the reading
> system should be a blank slate (or an empty viewport!), and if the
> content author wants to present two works side-by-side they should
> just author the HTML that way, and the author would have an obligation
> to create the necessary interface.
>
> Or a reading system could provide a feature to display two books at
> once, and if the two books had matching document structures they could
> be viewed in parallel.
>
> Or perhaps we could view this as a special case of annotations. A
> primary text has other text associated with it, which can be displayed
> at the option of the user. Having a parallel stream could be used for
> footnotes, instructor's editions, pedagogical notes, extended
> examples, all sorts of things even including an alternate version of
> the primary text. It's interesting to see the original manuscript of
> T.S. Eliot's Wasteland side-by-side with the published version.

Correct, the problem can get much more abstract and general than just
"parallel texts".

I have quite a strong gut feeling against requiring authors/authoring SW
to create "presentational" structures, when that burden can be replaced
by markup + smarter reading systems. The main problem I see is the
following: if I have 4 languages in my ebook, I would need to prepare
and package 6 different 2-language renditions. Moreover, I would still
be preventing the reader from perusing 3 or even 4 languages in
parallel unless I provide 5 additional renditions (four 3-language ones
and one 4-language one)! That seems to get out of control quickly.
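
The counting argument can be checked with a few lines of Python. The
language list is a hypothetical four-language ebook, and the helper name
is an assumption made for illustration.

```python
from itertools import combinations

def renditions_needed(languages, group_size):
    """List every distinct group of `group_size` languages."""
    return list(combinations(languages, group_size))

languages = ["la", "it", "en", "de"]  # hypothetical four-language ebook
pairs = renditions_needed(languages, 2)
triples = renditions_needed(languages, 3)
quads = renditions_needed(languages, 4)
print(len(pairs), len(triples), len(quads))  # 6 4 1
```

The number of pre-built renditions grows with every language added,
whereas a reading system that combines per-language resources on demand
only needs one resource per language.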

On the other hand, I recognize that a "unified" approach to all these
problems (rather than a quick-and-dirty, naming-based approach) is more
robust and elegant. Perhaps the best approach is a (light) markup to
signal the "mapping anchors" between different resources, which the RS
can leverage to show parallel/detailed renditions.

Bests,

AlPe

richardigp

Sep 8, 2013, 10:29:39 AM
to epu...@googlegroups.com
Hi Alberto,

This raises an interesting direction for E0, one that I am very keen to follow. Once the core packaging is agreed, and kept as simple as possible, adding functionality through JavaScript libraries with recommended tagging patterns for specific requirements becomes a straightforward business. It also considerably reduces development and production costs.

Because your requirement is limited to specific types of content (as are most of ours), I would like to see this as a named extension for E0, e.g., "E0-Parallel Languages".

The heart of E0 as I understand it is the simplest possible framework for the simplest possible packaging of a basic book (linear reading, no interactivity). Once that is set the world of E0 is potentially an amazingly flexible container that can pretty much do anything on an agreed specified or proprietary basis.

I am assuming that E0 will be HTML5/CSS3 and JavaScript moving forward, or that an implementation will be in native platform code if such a requirement is felt necessary. I know that the specification cannot require that, but it does seem to make the most sense.

Complex Content Needs Matching Solutions
================================
Complex tasks like parallel languages do require relatively complex production and, more importantly, production tagging decisions that yield something the presentation engine can use.

Whereas you are currently matching translations at the page level, there is also a case for matching at the sentence/paragraph level. This allows the interleaving and side-by-side presentation of the content at the paragraph and section level. It would then be the JavaScript presentation engine that knows how to present this content in multiple ways. This doesn't affect E0, only the presentation engine.

For example, the Wikipedia reference mentions that "...sentences can be split, merged, deleted, inserted or reordered by the translator. This makes alignment a non-trivial task." This suggests that sentences would need two sequencing IDs, one addressing the original sequence and one the parallel sequence in the translated text. With more than two languages, this could get a lot more complex in some areas. This discussion is probably not for the E0 list, but it does highlight the requirement for a reading system to maintain its content dynamic potential.
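
A rough sketch of the two-sequencing-ID idea in Python, assuming each translated sentence carries both its own position and the position of the original sentence it corresponds to. The class and field names are hypothetical, and splits and merges (one sentence mapping to several) are deliberately left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sentence:
    text: str
    source_seq: int   # position of the matching sentence in the original
    target_seq: int   # position of this sentence in the translation

def in_translation_order(sentences):
    """Order sentences as the translator arranged them."""
    return sorted(sentences, key=lambda s: s.target_seq)

def in_original_order(sentences):
    """Order translated sentences to follow the original text."""
    return sorted(sentences, key=lambda s: s.source_seq)

# Placeholder data: the translator moved the second sentence first.
translated = [
    Sentence("Second original sentence, moved first.", source_seq=2, target_seq=1),
    Sentence("First original sentence.", source_seq=1, target_seq=2),
    Sentence("Third original sentence.", source_seq=3, target_seq=3),
]
```

A presentation engine could use `in_original_order` for a side-by-side view aligned to the source and `in_translation_order` when the translation is read on its own.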

You expressed concern about the complexity of this for the author, but multiple parallel texts are demanding, and there is a production effort that must match the presentation requirements. The tagging, lang and ID production effort must be usable by the presentation engine. So the sales pitch on a "free" E0 multi-language parallel presentation engine would be "if you tag your content this way, the rendering engine will enable the features." If you don't like it, do your own thing.

E0 Packaging for Parallel Language Sections:
=================================
The parallel language page construction directly affects the core packaging. An E0:Parallel language extension can set packaging rules because they match the capabilities of the renderer. That does not seem to fight with the core package. While E0 would not change its rules, perhaps a packaged parallel language rendering engine can state its file-naming requirements. I think that would remain within the E0 concept and only affect those pushing into more challenging content presentation directions.

E0 Enhanced Functionality
====================
Once the core E0 package is fixed it would be good to see the E0 project move to include these magical content presentation tools. We would be very interested in contributing the AZARDI Interactive Engine code for various interactivity, Q&A and Widgets for example. (It has already been shamelessly ripped off).

By making E0 externally extensible multiple functionality extensions such as fixed-layout, interactivity, external resources, notes and messaging, etc. can be provided from available library components or packaged in with appropriate content. This lets E0 keep up with the relentless march of technology and only get complicated in a content context.

E0 then only has to use jQuery-like techniques to manage the separation of specific reading system enhancements and versions. It also has parallels with the MIT-licensed JavaScript strategy that is such a large part of the Internet.

This ensures no lock-ins or lock-outs. A reading system can go on its own merry proprietary way for other new features but should support core E0 packaging and packaged components. They would be packaged with applicable content rather than trying to build infinite "all content in the universe" complexity into a reading engine (an approach that is doomed to failure).

Parallel languages is exciting E0/eBook internationalization functionality. We have done a simple five parallel language set of narrations with highlighting (SMIL-like) for Hopkins University and a reasonably large (and decorative) Sanskrit+English Rigveda with audio. It is non-trivial in larger dimensions.

Based on your requirement/ideas/submission I would like to suggest that E0 Core is extensible by nominated specialist content types to have any combination of package-ready presentation engine extensions available. Parallel language is an appropriately complex candidate for getting the test reading systems working with an extension functional package.

Richard

Alberto Pettarin

Sep 9, 2013, 6:22:51 AM
to epu...@googlegroups.com
Hi Richard,

I agree on the proposed idea of modularity --- the main argument is
indeed the fact that an omni-comprehensive spec is doomed (EPUB3
anyone?). As for how to realize it, I think other people on this mailing
list are more knowledgeable than I am.

I also agree that parallel texts, maybe coupled with Media Overlays-like
synchronized audio, are a good candidate for testing ideas.
(Full disclosure: my soon-to-be-launched new startup, ReadBeyond, will
offer an automated text/audio synchronization service.)

Last but not least: this morning we published the aforementioned Android
app (APK+code+demo EPUB) on GitHub (comments are welcome):

https://github.com/pettarin/epub3reader

Have a great week,

AlPe

Kjartan Müller

Sep 11, 2013, 3:25:17 AM
to epu...@googlegroups.com
Hi,

I'm totally for modularity in E0, but I think what we are talking about here could/should be solved by using core mechanisms of E0. We could allow more than one link per toc-item, having one default and optional extra links using rel=alternate. Alternate links could be differentiated by using the type, media and/or hreflang attributes. I think this could be valuable not only for parallel language scenarios, but also for accessibility, handling fallbacks for media types and screen issues, etc. Reading systems could automate the handling of the reading order of these, or present them as active user choices based on hardware, configuration and preferences. I could read one chapter on the train, and listen to the next while walking the dog. An advanced reading system could present language alternatives side by side (and yes, having separate streams, viewports, regions, whatever, is a nice idea for etextbooks and other complex publication types).
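
To make the idea concrete, here is a small Python sketch of how a reading system might choose among such links. The toc-item markup is hypothetical (E0 defines no such syntax), and `pick_link` is an illustrative name, not an existing API.

```python
import xml.etree.ElementTree as ET

# Hypothetical toc-item: one default link plus rel="alternate" links.
TOC_ITEM = """
<li>
  <a href="ch1.en.xhtml" hreflang="en">Chapter 1</a>
  <a rel="alternate" href="ch1.it.xhtml" hreflang="it">Capitolo 1</a>
  <a rel="alternate" href="ch1.en.mp3" hreflang="en"
     type="audio/mpeg">Chapter 1 (audio)</a>
</li>
""".strip()

def pick_link(toc_item_xml, lang=None, media_type=None):
    """Return the href matching the preferences, else the default link."""
    links = ET.fromstring(toc_item_xml).findall("a")
    # The default link is the one not marked rel="alternate".
    default = next(a for a in links if a.get("rel") != "alternate")
    for a in links:
        if lang is not None and a.get("hreflang") != lang:
            continue
        if a.get("type") != media_type:
            continue
        return a.get("href")
    return default.get("href")

italian_text = pick_link(TOC_ITEM, lang="it")
audio = pick_link(TOC_ITEM, media_type="audio/mpeg")
fallback = pick_link(TOC_ITEM, lang="fr")
```

With this item, `italian_text` is the Italian chapter, `audio` is the MP3 alternate, and `fallback` is the default link because no French alternate exists. A side-by-side view would simply call the function once per chosen language.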

I also think it could be useful to distinguish between modularity and profiling: modularity when we need to extend the mechanisms of ebooks, and profiling when we need to talk about how certain mechanisms and their uses would be useful/required for a specific type of publication, like cookbooks, etextbooks etc., and how they could be handled by reading systems. Media overlays could be a module, while children's picture books, which often use media overlays, could be a profile. I don't mean that we should necessarily standardize on profiles, but it could be a way of addressing specific needs and separating them from the modules that could be useful for various publication types. For me, 'parallel language edition' sounds like a profile, while the mechanisms needed would be on a more general level in core or in a module. For example, mechanisms for referencing locations in a book could be a module, while using these mechanisms for syncing between language versions could be part of a profile.

Kjartan
