Suggestion: Downgrade-scripts from EPUB3 to DAISY 2.02, DAISY 3 and ZedAI


Jostein Austvik Jacobsen

Jan 23, 2013, 8:51:38 AM
to daisy-pip...@googlegroups.com
I didn't find any mention of this in the charter, so here goes...

For accessible libraries like here at NLB, the majority of our blind patrons have DAISY 2.02 hardware players. It is unlikely that these patrons will receive EPUB3-capable hardware players (or firmware upgrades) in the near future. So while we would like to gradually move to EPUB3, the reality is that we still have to support DAISY 2.02 hardware players. I believe an EPUB3 to DAISY 2.02 script would be of great use to several organizations.

Similarly, part of our (NLB's) production line is currently based on DTBook input, and it will take some time to switch from DTBook to EPUB3 or ZedAI. So an EPUB3 to DTBook (or DAISY 3) script would also be useful.

As more and more publishers produce EPUB-books, that will be the input-format for our production lines. If an organization wants to incorporate ZedAI in their production line, an EPUB3 to ZedAI script is also very useful.

In short, these scripts would ease the transition to EPUB3 significantly.

Jostein

Greg Kearney

Jan 23, 2013, 6:11:08 PM
to daisy-pip...@googlegroups.com
We have the same issue here. Plextor has given no indication as to when, how, or even if the players we use will be upgraded. So while ePub/Daisy3 is fine in theory, in practice we and many others are locked into Daisy 2.02.


Greg Kearney
Association for the Blind of Western Australia

Romain Deltour

Jan 24, 2013, 4:55:10 AM
to daisy-pip...@googlegroups.com
Jostein and Greg,

Thanks for the always-welcome feedback on concrete library needs.

I can see 2 different use cases there:
- production of distribution formats (DAISY 2.02, DAISY 3)
- production of archival / interchange documents (DTBook)

These conversion needs look totally legitimate to me.

As always, going from a permissive grammar like HTML (as used in EPUB3) to a stricter grammar like DTBook is not easy and not always predictable. The EPUB 3 ToC can be leveraged to get the basic structure, but the html-to-dtbook part often has to make assumptions on the structure of the input, making the conversion quite brittle when used with different input sources.
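To make the "assumptions on the structure of the input" point concrete, here is a minimal Python sketch. It is not Pipeline 2 code: the function name `html_body_to_dtbook` and the input shape are hypothetical. It assumes namespace-free, well-formed XHTML whose body children are flat siblings and whose heading ranks never skip a level, and it nests them into DTBook-style `level1` to `level6` containers. Any input that breaks those assumptions is rejected, which is exactly where a real converter would have to guess.

```python
import xml.etree.ElementTree as ET

# Map heading tag names to their rank: h1 -> 1 ... h6 -> 6.
HEADINGS = {f"h{n}": n for n in range(1, 7)}


def html_body_to_dtbook(body_xml: str) -> ET.Element:
    """Nest a flat run of headings and blocks into DTBook-style levels.

    Assumptions (illustrative only): the input is well-formed,
    namespace-free XHTML; the children of <body> are flat siblings;
    heading ranks never skip a level (no h1 followed directly by h3).
    Only the direct text of each element is copied; inline markup
    is ignored in this sketch.
    """
    body = ET.fromstring(body_xml)
    bodymatter = ET.Element("bodymatter")
    # stack[i] is the open container at depth i; stack[0] is bodymatter.
    stack = [bodymatter]
    for child in body:
        rank = HEADINGS.get(child.tag)
        if rank is not None:
            if rank > len(stack):
                raise ValueError(f"heading rank skips a level: {child.tag}")
            # Close any open levels at this rank or deeper.
            del stack[rank:]
            level = ET.SubElement(stack[-1], f"level{rank}")
            stack.append(level)
            heading = ET.SubElement(level, child.tag)
            heading.text = child.text
        else:
            if len(stack) == 1:
                raise ValueError("content before the first heading")
            block = ET.SubElement(stack[-1], child.tag)
            block.text = child.text
    return bodymatter


if __name__ == "__main__":
    sample = "<body><h1>A</h1><p>x</p><h2>B</h2><p>y</p><h1>C</h1></body>"
    print(ET.tostring(html_body_to_dtbook(sample), encoding="unicode"))
```

With input from a single, controlled source the two `ValueError` branches would rarely fire; with arbitrary publisher EPUBs they would fire often, which is the brittleness described above.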

Going from EPUB3 to DAISY 2.02 is probably an easier task to tackle, although I'm not sure about the exact needs there. I would assume that by the time you can expect to receive EPUB 3 with Media Overlays from publishers (i.e. not so soon) device support will be there. I therefore assume that you're thinking of text-only EPUB 3 input. Do you intend to convert them to text-only DAISY 2.02 books? Or text+audio DAISY 2.02 books with TTS?

I think the production of text+audio DAISY 2.02 can fit in the TTS-based production development that will start this year. Production of text-only DAISY 2.02 might be started as an off-track development, if it is a high priority. In any case, I understand that DAISY 2.02 production is still in high demand.

On the project management side, as you know there are still pretty big items in the current charter's objectives that will consume most of our current resources; these are legitimate and sorely wanted too (e.g. TTS-based production). We can always play with priorities and do some little off-track development, but in the end the charter needs to be respected. That said, as always, if an organization wants to start and take the reins of the development of a particular conversion, we'll fully support it.

Romain.

Greg Kearney

Jan 24, 2013, 5:47:33 AM
to daisy-pip...@googlegroups.com
We would convert to 2.02 text and audio with TTS.


Greg Kearney
Association for the Blind of Western Australia


Jostein Austvik Jacobsen

Jan 24, 2013, 1:39:17 PM
to daisy-pip...@googlegroups.com
Thanks Romain. I agree we don't currently have resources to focus on this in the core Pipeline 2 team. I'll provide a little more info about our situation at NLB though:

Most of the time there is no digital copy of the books we produce. So when we want to produce full-text books (mainly TTS textbooks for students, but increasingly also narrated books, both textbooks and novels), we cut out the physical pages, scan them, and perform OCR on them to create DTBooks. In cooperation with the other Nordic countries (MTM/SPSM/Celia/Nota), NLB outsources the OCR/DTBook conversion service to partners in India. The current agreement we have with these companies will expire at the end of 2013. So we will soon start to formulate a new procurement and new markup guidelines for our books that will take effect from 2014, and it's very likely that we will request books as ZedAI or EPUB3 instead of (or possibly in addition to) DTBook (which is the main reason why I'm mentioning all this now). However, if we want to order ZedAI or EPUB3 books, then we have to know whether or not these files can be integrated into our production line.

For braille-production, we currently use NorBraille[1] which takes DTBook as input and gives PEF as output. Of course, as we move to Pipeline 2, we will discuss internally whether we should switch to the ZedAI-to-PEF script or DTBook-to-PEF script instead.

For narrated books, we currently use Dolphin Publisher[2], but we plan to switch to Hindenburg Audio Book Creator[3][4]. Dolphin Publisher uses DAISY 2.02. In pre-production we essentially create a DAISY 2.02 book with no text and no audio which reflects the chapters/headlines of the audio book, which we use as input when narrating audio-only DAISY 2.02 books. With Hindenburg we can do the same with EPUB3, so I personally expect EPUB3 to be our new master format for audio books, provided that we are able to down-convert to DAISY 2.02. Hindenburg does pretty much the same as Tobi as far as I can tell (although I haven't looked closely at either), so organizations using Tobi will probably face the same issues as us if they are moving to EPUB3. We wish to produce more full-text publications, although we don't know how big a percentage of our productions will be full-text. I'm pretty sure they will be EPUB3 though, assuming we are able to convert them to DAISY 2.02.

We distribute textbooks in XHTML 1.0 to students and I believe DAISY 3 full-text as well (I would have to check to what extent though). I'm sure we could start distributing EPUB3 to our students instead, so an EPUB3-to-DAISY3 full-text script and/or EPUB3-to-HTML script isn't really too important for us.

We expect to get more and more EPUB files from publishers which we can use as input to Hindenburg when narrating full-text books. We hope to use the same EPUB files to create PEF-files for embossing.

Anyway, the conclusion is that if this is to be implemented, the organization(s) that need it would have to implement it themselves (with support from the Pipeline 2 team), right?


Disclaimer: much of this is my own opinion and may not reflect NLB's future decisions.
Jostein



Romain Deltour

Jan 24, 2013, 4:43:42 PM
to daisy-pip...@googlegroups.com
Thanks for the clarifications Jostein, very informative!

On the technical side your (future) requirements look feasible; again the most tricky task is HTML to DTBook, but it doesn't seem to be top-prio. That said, the task would be easier to develop anyway if you have a good control on the input HTML, which you can have if they all come from the same outsourcing program.
The EPUB3-based audio books to audio-only DAISY 2.02 is quite an unknown territory (btw, I'd be curious to see the output format of Hindenburg's solution), but from where I stand I do not foresee any major hurdle.

> Anyway, the conclusion is that if this is to be implemented, the organization(s) that need it would have to implement it themselves (with support from the Pipeline 2 team), right?

Let me clarify: the current charter was approved by the board in October 2011 and runs until October 2013. We cannot deviate much from it (i.e. postpone an objective by putting resources elsewhere) unless the board approves changes or the change is deemed top priority by DAISY's executives.

So, there are several options to implement the mentioned conversions:

- have the board agree on a shift of priorities before the end of the current charter.
- include it in the objectives of a renewed charter after Oct 2013.
- if you need the development to start ASAP, find proper resources (i.e. you cannot rely on the current pipeline's staff).

Note that the line between "chartered" staff and "organization-specific" staff is very thin: it's obviously fine if you (Jostein) start working on these conversions before Oct 2013 on behalf of NLB, but ideally you should not work 100% off-charter.
For the sake of completeness, note that an extreme option would be for NLB to decide that you work 100% on these off-charter objectives; we would accept that and report to the board that this or that could not be implemented due to an unforeseen lack of resources.

To summarize: it's all a matter of priorities. Priorities can be shifted if the new task is considered top-priority or with the consent of the board. In any case, everything you're describing definitely goes in the right direction for the project.

Finally, note that all I said above is based on my understanding of DAISY's project management process. If you think that priorities *should* be changed wrt the current charter, I'll ask the powers that be if/how we could "officially" bend the rules.

Romain.




Jostein Austvik Jacobsen

Jan 24, 2013, 7:37:03 PM
to daisy-pip...@googlegroups.com
> On the technical side your (future) requirements look feasible; again the most tricky task is HTML to DTBook, but it doesn't seem to be top-prio. That said, the task would be easier to develop anyway if you have a good control on the input HTML, which you can have if they all come from the same outsourcing program.
Good point. If we decide to order EPUB3 from India then the markup requirements for that would probably be based on our current DTBook markup requirements.
 
> The EPUB3-based audio books to audio-only DAISY 2.02 is quite an unknown territory (...)
We don't have an explicit agreement with the publishers to include text in all of our books yet, which is the reason why we might want to strip out the text and only include the audio.
 
> - if you need the development to start ASAP, find proper resources (i.e. you cannot rely on the current pipeline's staff).
> Note that the line between "chartered" staff and "organization-specific" staff is very thin: it's obviously fine if you (Jostein) start working on these conversions before Oct 2013 on behalf of NLB (...)
Yes, spending 30% of my time (or something) throughout 2013 on these scripts might be the consequence of this (if possible). Of course, 60% of my time would still be dedicated to the Pipeline 2 charter, I don't want to change that. I'll let you know whether or not we decide to allocate more of my time to Pipeline 2. I personally wouldn't mind; I like developing scripts. :) In any case, I'd be happy to hear from other organizations that have the same or similar requirements as us, what timeframe they need it in, and whether or not they are able to allocate resources to help get this done.

Jostein



Greg Kearney

Jan 24, 2013, 8:02:07 PM
to daisy-pip...@googlegroups.com
We face the same problem here. The issue is that there are thousands of machines in use in our case, and hundreds of thousands in the case of the United States, that at this point have no support for ePub. Unless and until the makers of these devices start supporting ePub/DAISY there is not going to be much use for it. So we have to keep using DAISY 2.02 because everyone's players can handle it.

This is the same fate that befell DAISY/NISO 2005: nice in theory, but if no one's players can play it, it matters little how good it might be.


Gregory Kearney | Manager Accessible Media
Association for the Blind of WA - Guide Dogs WA
PO Box 101, Victoria Park WA 6979 | 61 Kitchener Ave, Victoria Park WA 6100
Tel: 08 9311 8246 | Fax: 08 9361 8696 | www.guidedogswa.com.au
Tel: 307-224-4022 (North America)
Email: greg.k...@guidedogswa.com.au
Email: gkea...@gmail.com

Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.
Article 19 of the UN Universal Declaration of Human Rights