IRC chat logs for 1/24/10 posted


Efraim Feinstein

Jan 24, 2010, 6:49:17 PM
to jewishlitu...@googlegroups.com
Hi,

The logs for today's IRC chat are now available on the wiki
<http://wiki.jewishliturgy.org/IRC_Conference/logs/2010-01-24>. A
summary
<http://wiki.jewishliturgy.org/IRC_Conference/summary/2010-01-24> of the
discussion is also available.

Thanks to those who attended. If you didn't make it and want to
comment, please use this discussion list.

--
---
Efraim Feinstein
Lead Developer
Open Siddur Project
http://opensiddur.net
http://wiki.jewishliturgy.org

Ze'ev Clementson

Jan 25, 2010, 1:43:52 PM
to jewishlitu...@googlegroups.com
Hi all,

It was an interesting chat on Sunday and I appreciated the opportunity
to talk to everyone. Two follow-up questions I have after re-reading
the logs:

1. The Sarissa and xslt/xpath in Javascript libs look interesting for
providing certain types of developer and (eventually) end-user
"front-ends" to opensiddur texts. It sounds like Azriel (and maybe
Efraim) have the most familiarity with these libs. Could one (or both)
of you please comment on the pros/cons of using these - in particular:
a. Completeness of xslt/xpath implementation (1.0 or 2.0 or subset?).
b. Performance of libs when transforming large xml docs (for
example, have you tried doing in-browser transforms of one or more of
the Tanach jlptei files into html?).
c. Pros/cons of doing in-browser transforms as opposed to just
recording links between documents and doing static transforms. In
other words, what specific functionality will be in the client app
that requires in-browser transforms?
d. Or, have I misunderstood what is intended with the client app and
is the intention primarily to just use the xml & xpath libs (not xslt)
to provide listings of available jlptei segments so that the developer
can select which segments need to be output in a composite Siddur
(and, perhaps, the actual transforms will be done using Saxon with
generated xslt from the client app?)?

2. There was some discussion (after I left the chat) about problems
generating PDF's. It was unclear (at least to me) whether these
problems are:
a. Just problems with generating PDF's from an XHTML document (if
so, what problems?).
b. Problems with generating Hebrew texts correctly with certain
static PDF-generating utilities.
c. Something else.

Could someone please summarize what the PDF-related issues are and
what alternatives have already been looked at.

Thanks Efraim for posting the logs/summary and thanks to everyone for
an interesting chat,
Ze'ev


Efraim Feinstein

Jan 25, 2010, 2:35:16 PM
to jewishlitu...@googlegroups.com
Hi,

I'm going to leave most of the Sarissa questions to Azriel, since he
knows a lot more about it than I do.

Ze'ev Clementson wrote:
> 1. The Sarissa and xslt/xpath in Javascript libs look interesting for
> providing certain types of developer and (eventually) end-user
> "front-ends" to opensiddur texts. It sounds like Azriel (and maybe
> Efraim) have the most familiarity with these libs. Could one (or both)
> of you please comment on the pros/cons of using these - in particular:
> a. Completeness of xslt/xpath implementation (1.0 or 2.0 or subset?).
>

To the best of my knowledge, no browser has native support for XSLT
2.0. Which is kind of sad. XSLT 2.0 rocks. (As a side note: it will
be interesting to see what happens when Qt completes development of
their XPath 2.0, XQuery 1.0 and XSLT 2.0 libraries. As a C++
implementation in a popular library package, it could be a game changer.)

[skip to...]


> d. Or, have I misunderstood what is intended with the client app and
> is the intention primarily to just use the xml & xpath libs (not xslt)
> to provide listings of available jlptei segments so that the developer
> can select which segments need to be output in a composite Siddur
> (and, perhaps, the actual transforms will be done using Saxon with
> generated xslt from the client app?)?
>

By design, there are no large documents in the database. What's
supposed to happen is that a composite document (which we like to call a
"siddur recipe") has reference links to other composite documents, and so
on, until the documents containing the texts are found. The XSLT 2.0
is running in Saxon, which is running within a Java applet which is
running in the browser, so we're not using the in-browser XSLT
facilities for transforms at all. The JavaScript-based XML/XPath
libraries are primarily operating on individual documents and on JLPTEI
fragments. If we find that there are some commonly used queries, we'll
move them out of JavaScript and into XQuery on the database in order to
take advantage of the speed increase from indexing.
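
To give a rough picture (this is an illustration only, with invented
element names and file names, not actual JLPTEI markup), a recipe
document is essentially nothing but pointers:

<!-- illustrative sketch: a "recipe" points at other documents
     instead of containing any text of its own -->
<div type="recipe">
  <ptr target="shacharit.xml#birkhot-hashachar"/>
  <ptr target="psalms.xml#psalm-30"/>
  <ptr target="shacharit.xml#pesukei-dezimra"/>
</div>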

We may use in-browser XSLT (1.0) for simple styling operations. For
example, for a two-column translated text, if we want the Hebrew to go
on the right and the English on the left or vice versa, that might be an
in-browser operation. (It can also be done with JavaScript; AFAIK, it
can't be done with CSS alone).
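
To illustrate the kind of thing I mean, here is a minimal XSLT 1.0
sketch (the class names "parallel-row", "he", and "en" are invented
for the example, not our actual markup) that chooses the column order
for a two-language row:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <!-- which language goes in the first column -->
  <xsl:param name="primary" select="'he'"/>

  <!-- identity template: copy everything else through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- reorder the two language cells inside each parallel row -->
  <xsl:template match="xhtml:div[@class='parallel-row']">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="xhtml:div[@class=$primary]"/>
      <xsl:apply-templates select="xhtml:div[@class!=$primary]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>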

> 2. There was some discussion (after I left the chat) about problems
> generating PDF's. It was unclear (at least to me) whether these
> problems are:
>

Azriel wrote a summary of our issues and possible approaches on the wiki
<http://wiki.jewishliturgy.org/Target_Survey>; it's a useful start,
although it's a little dated with respect to our current knowledge.

The summary version is: In order to write voweled Hebrew correctly, the
renderer has to support right-to-left languages and OpenType fonts with
complex layouts. In order to run in-browser, it has to be able to run
in a Java environment. Whatever we use in the applet can't have a
license incompatibility that prevents us from linking with Saxon's MPL,
and has to be free software.

We've found a number of solutions that will get us part way there, but
none that will get us all the way.

In our dream world, we would be able to process JLPTEI into either
XHTML+CSS (in paged mode) *or* XSL-FO and use a library to convert that
to PDF.

Azriel Fasten

Jan 25, 2010, 2:36:39 PM
to jewishliturgy-discuss
On Mon, Jan 25, 2010 at 1:43 PM, Ze'ev Clementson <bere...@gmail.com> wrote:
Hi all,

It was an interesting chat on Sunday and I appreciated the opportunity
to talk to everyone. Two follow-up questions I have after re-reading
the logs:

1. The Sarissa and xslt/xpath in Javascript libs look interesting for
providing certain types of developer and (eventually) end-user
"front-ends" to opensiddur texts. It sounds like Azriel (and maybe
Efraim) have the most familiarity with these libs. Could one (or both)
of you please comment on the pros/cons of using these - in particular:
 a. Completeness of xslt/xpath implementation (1.0 or 2.0 or subset?).

Sarissa only normalizes the API to use the browser's xslt/xpath; it depends on the browser for version and conformance. Aside from this, Javeline provides an alternate pure-JS implementation which drops in with the pseudo-standard API across browsers, and is thus compatible with Sarissa. There are other such libraries as well. These would cut out the need for Sarissa, but see the next question. All of these libraries, as far as I know, only support 1.0.
 
 b. Performance of libs when transforming large xml docs (for
example, have you tried doing in-browser transforms of one or more of
the Tanach jlptei files into html?).

Performance with Sarissa means using the browser's natively compiled XML library, which shouldn't be too bad. Performance with a pure-JS implementation would be mighty slow, especially in browsers like IE. As to your question, I am not certain what you mean by "in-browser" transforms, but the demo does the transforms in a Java applet. Try it out if you haven't yet.
 
 c. Pros/cons of doing in-browser transforms as opposed to just
recording links between documents and doing static transforms. In
other words, what specific functionality will be in the client app
that requires in-browser transforms?

By "client app" you could be referring to many things. For a recipe editor or JLPTEI encoding application, etc., no rendering needs to be done, but XML manipulation is made a lot easier using XPath; XPath 1.0 is mostly sufficient for this so far. For the demo application that we have made, each client will have different choices in the "recipe" that occur at the transformation level, so yes, the client requires transforms. However, we do not foresee this occurring in-browser because there is actually only a single (open source) XSLT 2.0 implementation: Saxon. Saxon is written in Java, and is linked in as part of the demo applet.
 
 d. Or, have I misunderstood what is intended with the client app and
is the intention primarily to just use the xml & xpath libs (not xslt)
to provide listings of available jlptei segments so that the developer
can select which segments need to be output in a composite Siddur
(and, perhaps, the actual transforms will be done using Saxon with
generated xslt from the client app?)?

Exactly. The browser's implementations are just too weak to do any transforms that we need.
 

2. There was some discussion (after I left the chat) about problems
generating PDF's. It was unclear (at least to me) whether these
problems are:
 a. Just problems with generating PDF's from an XHTML document (if
so, what problems?).

There are few libraries to do so. See http://wiki.jewishliturgy.org/Target_Survey for a very old look around at what's available. I have found more libraries and checked out some of those. One of these days, I will update that page.
 
 b. Problems with generating Hebrew texts correctly with certain
static PDF-generating utilities.

Yes. For example, iText is the primary library for generating PDFs. With every library I tried, there is one or more of the following problems:
  • No bidi/RTL support; that is, the text will show up left to right instead of right to left (backwards, reversed).
  • No OpenType font support; that is, either the text won't display at all, or the diacritics will appear in sequence after the characters they belong to, instead of being positioned on them. We use the Ezra SIL font, which is required for the complex diacritics found in the Tanach and in the Siddur. Ezra SIL is an OpenType font.
The PDF format itself allows OpenType fonts. XeTeX is the only way we know of right now that can generate our PDFs correctly (short of drawing the text as vectors, which can be done but takes away from the semantic content of the PDF; i.e., the text is not selectable).
 
 c. Something else.

Could someone please summarize what the PDF-related issues are and
what alternatives have already been looked at.

I should really update the Target Survey. I will hopefully get to it later today, and get back to you.

Ze'ev Clementson

Jan 25, 2010, 4:16:43 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

Thanks for the clarifications. More comments in my response to Azriel.

- Ze'ev

Ze'ev Clementson

Jan 25, 2010, 4:16:51 PM
to jewishlitu...@googlegroups.com
Hi Azriel,

On Mon, Jan 25, 2010 at 11:36 AM, Azriel Fasten <fst...@gmail.com> wrote:
>
>
> On Mon, Jan 25, 2010 at 1:43 PM, Ze'ev Clementson <bere...@gmail.com>
> wrote:
>>
>> Hi all,
>>
>> It was an interesting chat on Sunday and I appreciated the opportunity
>> to talk to everyone. Two follow-up questions I have after re-reading
>> the logs:
>>
>> 1. The Sarissa and xslt/xpath in Javascript libs look interesting for
>> providing certain types of developer and (eventually) end-user
>> "front-ends" to opensiddur texts. It sounds like Azriel (and maybe
>> Efraim) have the most familiarity with these libs. Could one (or both)
>> of you please comment on the pros/cons of using these - in particular:
>>  a. Completeness of xslt/xpath implementation (1.0 or 2.0 or subset?).
>
> Sarissa only normalizes the API to use the browser's xslt/xpath; it depends
> on the browser for version and conformance. Aside from this,
> Javeline provides an alternate pure-JS implementation which drops in with the
> pseudo-standard API across browsers, and is thus compatible with Sarissa.
> There are other such libraries as well. These would cut out the need for
> Sarissa, but see the next question. All of these libraries, as far as I know,
> only support 1.0.

Thanks. It sounds (based on both your comments and Efraim's) like
this won't be an issue, as there will be no (or relatively little)
need for XSLT transforms in the browser app.

>>  b. Performance of libs when transforming large xml docs (for
>> example, have you tried doing in-browser transforms of one or more of
>> the Tanach jlptei files into html?).
>
> Performance with Sarissa means using the browser's natively compiled XML
> library, which shouldn't be too bad. Performance with a pure-JS
> implementation would be mighty slow, especially in browsers like IE. As to
> your question, I am not certain what you mean by "in-browser" transforms,
> but the demo does the transforms in a Java applet. Try it out if you haven't
> yet.

Yes, I have already tried out the Java app. From the comments in the
irc chat, I had assumed that your intention was to do something
similar using Sarissa and browser xslt/xpath transforms. I understand
now that this is not the case.

>>  c. Pros/cons of doing in-browser transforms as opposed to just
>> recording links between documents and doing static transforms. In
>> other words, what specific functionality will be in the client app
>> that requires in-browser transforms?
>
> By "client app"  there can be many things that you are trying to refer to.
> For a recipe editor or JLPTEI encoding application etc., no rendering needs
> to be done, but xml manipulation is made a lot easier using xpath. Xpath 1.0
> is mostly sufficient for this so far. For the demo application that we have
> made, each client will have different choices in the "recipe" that occur at
> the transformation level, so yes, the client requires transforms. However,
> we do not foresee this occurring in browser because there is actually only a
> single (open source) XSLT 2.0 implementation: Saxon. Saxon is written in
> Java, and is linked in as part of the demo applet.

Yes, that's clear now.

>>  d. Or, have I misunderstood what is intended with the client app and
>> is the intention primarily to just use the xml & xpath libs (not xslt)
>> to provide listings of available jlptei segments so that the developer
>> can select which segments need to be output in a composite Siddur
>> (and, perhaps, the actual transforms will be done using Saxon with
>> generated xslt from the client app?)?
>
> Exactly. The browser's implementations are just too weak to do any transforms
> that we need.

Unfortunate, as a browser-based non-Java option would have been nice.

>> 2. There was some discussion (after I left the chat) about problems
>> generating PDF's. It was unclear (at least to me) whether these
>> problems are:
>>  a. Just problems with generating PDF's from an XHTML document (if
>> so, what problems?).
>
> There are few libraries to do so.
> See http://wiki.jewishliturgy.org/Target_Survey for a very old look around
> at what's available. I have found more libraries and checked out some of
> those. One of these days, I will update that page.
>
>>
>>  b. Problems with generating Hebrew texts correctly with certain
>> static PDF-generating utilities.
>
> Yes. For example, iText is the primary library to generate PDFs with. With
> every library I tried there is one or more of the following problems:
>
> No bidi/rtl; that is, the text will show up left to right instead of right
> to left (backwards, reversed)
> No OpenType font support; that is, either the text won't display at all, or
> the diacritics will be in sequence, after the character it belongs to,
> instead of in its midst. We use the Ezra SIL font, which is required for the
> complex diacritics that is found in Tanach and in the Siddur. Ezra SIL is an
> OpenType font
>
> The PDF format itself allows OpenType fonts. XeTeX is the only way we know
> of right now that can generate our PDFs correctly (short of drawing the text
> as vectors, which can be done but takes away from the semanticness of the
> PDF; ie. it is not selectable).

I didn't see XeTeX mentioned in your Target_Survey page. What are the
cons of using it (I know nothing about XeTeX other than that it is
based on TeX but has better Unicode and font support)? Could TeXML be
used as an "intermediate" format that the transforms output to prior
to conversion to PDF using XeTeX?

Looking through the Target_Survey page, iText seemed like a reasonable
option as well. There were some cons listed in the summary, but the
general gist seemed to be that it was a workable option that didn't
require the TeX toolchain installed. Have you an updated opinion on
it?

>>  c. Something else.
>>
>> Could someone please summarize what the PDF-related issues are and
>> what alternatives have already been looked at.
>
> I should really update the Target Survey. I will hopefully get to it later
> today, and get back to you.

Great - thanks!

- Ze'ev

Azriel Fasten

Jan 25, 2010, 4:28:17 PM
to jewishliturgy-discuss
On Mon, Jan 25, 2010 at 4:16 PM, Ze'ev Clementson <bere...@gmail.com> wrote:

I didn't see XeTeX mentioned in your Target_Survey page. What are the
cons of using it (I know nothing about XeTeX other than that it is
based on TeX but has better Unicode and font support)? Could TeXML be
used as an "intermediate" format that the transforms output to prior
to conversion to PDF using XeTeX?

That is exactly what Efraim was doing originally (before we revamped the JLPTEI spec; see the proof of concept on the wiki homepage). However, it is a beast, and not in Java. Thus it cannot be used in an applet. It is still a good target, though. We were also looking into eXTeX, which implements TeX in Java, but it is not complete enough.
 

Looking through the Target_Survey page, iText seemed like a reasonable
option as well. There were some cons listed in the summary, but the
general gist seemed to be that it was a workable option that didn't
require the TeX toolchain installed. Have you an updated opinion on
it?

Many of the other libraries there actually use iText. iText is definitely a great library. However, it does not have OpenType support and thus will not display the vowels in the correct place (it displays them sequentially). It does have some bidi support, though. Another thing to note is that as of January (IIRC), iText moved to a very restrictive AGPL license from an MPL/LGPL dual license. This means that, short of a fork, any changes made to the new iText to fix this issue will not help us. There are a few other exciting possibilities, but frustratingly, anything that is promising in one area fails in another. I will go through the details on the wiki page when I get the chance, but feel free to ask any questions (these answers will fill the page ;) ).

Efraim Feinstein

Jan 25, 2010, 4:51:40 PM
to jewishlitu...@googlegroups.com
Heh -- I was in the middle of writing almost the same thing as Azriel
just did, but he sent his out first :-)

Azriel Fasten wrote:
>
> That is exactly what Efraim was doing originally (before we revamped
> the JLPTEI spec,

The decision not to use XeTeX as a primary path to PDF had less to do with
the XML spec change than with a change in project focus. Originally, when I
started coding this thing, I was envisioning an entirely offline
development model of the type used in most open source coding projects
(an svn repository), and was hoping that someone else would develop the
front end at some unspecified point.

Aharon successfully convinced me that integrating this whole thing
together as a web-based application was a better way to go about it. It
brings with it a lot of coding pain, but it is the right way to go about
developing of a project of this scale. Unlike most "web services,"
though, the entire infrastructure is being offered for download, so it
doesn't fall into the Software as a Service trap, where even if the
content is free, the infrastructure is subject to the whims of the
service provider. I also see it as something of an insurance policy
against the project's failure. If, at any point, we can't continue
operation, all the work our contributors put into it won't be lost in a
black hole.

Since Azriel developed the Java demo, the XSLT code that I write and
test offline magically becomes part of a web app. :-)

Ze'ev Clementson

Jan 25, 2010, 5:43:03 PM
to jewishlitu...@googlegroups.com
Hi Azriel,

On Mon, Jan 25, 2010 at 1:28 PM, Azriel Fasten <fst...@gmail.com> wrote:
>
>
> On Mon, Jan 25, 2010 at 4:16 PM, Ze'ev Clementson <bere...@gmail.com>
> wrote:
>>
>> I didn't see XeTeX mentioned in your Target_Survey page. What are the
>> cons of using it (I know nothing about XeTeX other than that it is
>> based on TeX but has better Unicode and font support)? Could TeXML be
>> used as an "intermediate" format that the transforms output to prior
>> to conversion to PDF using XeTeX?
>
> That is exactly what Efraim was doing originally (before we revamped the
> JLPTEI spec, see the proof of concept on the wiki homepage). However, it is
> a beast, and not in Java. Thus it cannot be used in an applet. It is still a
> good target though. We were also looking into eXTeX, which implements TeX
> in Java, but it is not complete enough.

Seems like most of the non-TeX options are 80% (+/-) solutions. That's too bad.

>> Looking through the Target_Survey page, iText seemed like a reasonable
>> option as well. There were some cons listed in the summary, but the
>> general gist seemed to be that it was a workable option that didn't
>> require the TeX toolchain installed. Have you an updated opinion on
>> it?
>
> Many of the other libraries there actually use iText. iText is definitely a
> great library. However, it does not have OpenType support and thus will not
> display the vowels in the correct place (it displays them sequentially). It
> does have some Bidi support though. Another thing to note, is that as of
> January (IIRC), iText moved to a very restrictive AGPL license from MPL/LGPL
> dual license. This means that short of a fork, any changes made to the new
> iText to fix this issue will not help us. There are a few other exciting
> possibilities, but frustratingly anything that is promising in one area
> fails in another. I will go through the details on the wiki page when I get
> the chance, but feel free to ask any questions (these answers will fill the
> page ;) ).

Looks like iText is becoming a non-starter then too.

I have some additional comments in my reply to Efraim.

Thanks,
Ze'ev

Ze'ev Clementson

Jan 25, 2010, 5:44:33 PM
to jewishlitu...@googlegroups.com
On Mon, Jan 25, 2010 at 1:51 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Heh -- I was in the middle of writing almost the same thing as Azriel just
> did, but he sent out first :-)

hehe - just goes to show that you can't think too much before posting! ;-)

> Azriel Fasten wrote:
>>
>> That is exactly what Efraim was doing originally (before we revamped the
>> JLPTEI spec,
>
> The decision not to use XeTeX as a primary path to PDF had less to do with the
> XML spec change than a change in project focus.  Originally, when I started
> coding this thing, I was envisioning an entirely offline development model
> of the type used in most open source coding projects (an svn repository),
> and was hoping that someone else would develop the front end at some
> unspecified point.
>
> Aharon successfully convinced me that integrating this whole thing together
> as a web-based application was a better way to go about it.  It brings with
> it a lot of coding pain, but it is the right way to go about developing a
> project of this scale.  Unlike most "web services," though, the entire
> infrastructure is being offered for download, so it doesn't fall into the
> Software as a Service trap, where even if the content is free, the
> infrastructure is subject to the whims of the service provider.  I also see
> it as something of an insurance policy against the project's failure.  If,
> at any point, we can't continue operation, all the work our contributors put
> into it won't be lost in a black hole.

That's always a nice plus! :-)

> Since Azriel developed the Java demo, the XSLT code that I write and test
> offline magically becomes part of a web app. :-)

So, you magically become a web developer as well. Hopefully, your pay
is increased accordingly to reflect your dual responsibilities. ;-)

Since there is no longer a need to have the entire tool-chain runnable
in an off-line browser or stand-alone client, does using TeX (or
derivative like XeTeX) become a more reasonable option now? The pros:
1. Complete support for bidi text and rich Hebrew, and the greatest
level of control over the output.
2. Web-service generation of the PDF document means that the user is not
required to install any software.
3. At a later date, if alternatives (like XSL-FO) are enhanced and
become an option, swapping the PDF-generator component should not be a big
issue (so long as current coding efforts "black box" the interface).
4. Allows us to defer making a choice on a long-term solution.

The cons:
1. Harder to run stand-alone (needs a bigger tool-chain installed).
This could probably be offset by good installation documentation and
install scripts.
2. More complex to code against.
3. I'm sure there are some more cons.

- Ze'ev


Efraim Feinstein

Jan 25, 2010, 9:27:45 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
>
> So, you magically become a web developer as well. Hopefully, your pay
> is increased accordingly to reflect your dual responsibilities. ;-)
>

It doubled and quadrupled at the same time! Amazing!


As for TeX, I'm not sure what you're getting at:


> Since there is no longer a need to have the entire tool-chain runnable
> in an off-line browser or stand-alone client, does using TeX (or
> derivative like XeTeX) become a more reasonable option now?

The reason for XeTeX as opposed to another form of TeX is that it *does*
support bidi and OpenType fonts with complex layouts and the others don't.

The problem is that to use it, you either need to have the entire
toolchain downloaded to the client or you need to have a server
supporting the toolchain. Installation procedures tend to scare away
users. XeTeX uses a huge amount of processing power and memory (let
alone two parallel TeX processes!), and on servers, processing power and
memory cost money. Our current VPS has just about enough processor and
memory to run the database.

One last-resort option we were discussing involves running XeTeX in the
cloud and having to charge for online PDF generation. (A user who was
willing to go through the installation procedure could get the TeX file
from the web application and install it him/herself)

> The pros:


> 2. Web-service generation of the PDF document means that the user is not
> required to install any software.
> 3. At a later date, if alternatives (like XSL-FO) are enhanced and
> become an option, swapping the PDF-generator component should not be a big
> issue (so long as current coding efforts "black box" the interface).
>

As for issue #3, all of the core JLPTEI processing is independent of the
output format. A different set of output-specific stylesheets would be
needed for each target format; doing that is a lot easier than having to
rewrite all of the logic.

> 4. Allows us to defer making a choice on a long-term solution.
>
> The cons:
> 1. Harder to run stand-alone (needs a bigger tool-chain installed).
> This could probably be offset by good installation documentation and
> install scripts.
>

Maintaining the install scripts could be an issue. We're trying to
support at least 3 platforms (Windows, OS X, and Linux) that have very
different installation procedures. Add to that that our entire
dependency tree is made up of moving targets. It's mitigated now a bit
by the fact that we're shipping every one of our non-system-standard
dependencies in svn.

Ze'ev Clementson

Jan 26, 2010, 6:17:56 PM
to jewishlitu...@googlegroups.com
Hi Efraim/Azriel,

On Jan 25, 6:27 pm, Efraim Feinstein <efraim.feinst...@gmail.com> wrote:
> The reason for XeTeX as opposed to another form of TeX is that it *does*
> support bidi and OpenType fonts with complex layouts and the others don't.
>
> The problem is that to use it, you either need to have the entire
> toolchain downloaded to the client or you need to have a server
> supporting the toolchain. Installation procedures tend to scare away
> users. XeTeX uses a huge amount of processing power and memory (let
> alone two parallel TeX processes!), and on servers, processing power and
> memory cost money. Our current VPS has just about enough processor and
> memory to run the database.

One alternative might be to use RenderX's XSL-FO product (XEP) for
generating PDF's. Although it is a commercial product, they provide a
free Personal Edition that is for non-commercial use:
http://www.renderx.com/download/personal.html

The benefits of this are:
1. Open Siddur output would be in XSL-FO, thus keeping to the XML
standards for formatting and PDF generation.
2. XEP is a relatively small Java download that runs on all the
platforms the other Open Siddur tools run on.
3. XEP appears to support Hebrew and bidi well (however, see my comment below).
4. Potentially (if FOP ever supports Hebrew and bidi properly), we
could move to FOP in the future.
5. TeX doesn't need to be installed, thereby eliminating the installation
and processing-power overhead of XeTeX.
6. Users who want to generate their own PDFs (for non-commercial use)
can do so easily from either a server or locally on their own PCs.
7. Azriel said "If it were possible to get bidi working in FOP, that
would probably be best" - XEP isn't FOP; however, it has similar
technical advantages (but is not open source, of course).

The cons are:
1. It's a commercial product and not open source.
2. There is a small RenderX stamp at the bottom of any page generated
with the free version of the product.
3. The XEP free Personal Edition's license seems to allow server-based
PDF generation for non-commercial purposes; however, this would need
to be confirmed by a lawyer.
4. Anyone wanting to produce a PDF that would be sold would need to
buy an XEP commercial license (however, this impacts commercial users
only and non-commercial users get the benefit of a well-supported
commercial product that is easy to install and use).

I've attached a sample XSL-FO file with the corresponding PDF output.
On my Mac, the vowels/dageshim/taamim don't display very well, but
that is probably just the known issue with Mac Hebrew fonts. It would
be good if someone could confirm that everything displays correctly
when generated from a Linux or Windows box.

If you want to download XEP (from the above link) and play with it,
you'll have to add the appropriate font declarations to the XEP ini
file (xep.xml) - I added Ezra SIL with the following on my Mac (the
xml:base value will of course need to be changed):

<font-group xml:base="file://Users/bc/Library/Fonts/"
            label="TrueType" embed="true" subset="true">
  <font-family name="Ezra SIL">
    <font><font-data ttf="SILEOT.ttf"/></font>
    <font style="oblique"><font-data ttf="SILEOT.ttf"/></font>
    <font weight="bold"><font-data ttf="SILEOT.ttf"/></font>
    <font weight="bold" style="oblique"><font-data ttf="SILEOT.ttf"/></font>
  </font-family>
</font-group>
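
For reference, a minimal XSL-FO file that exercises this font
registration would look something like the following (just an
illustrative sketch, not the file I attached):

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-height="29.7cm" page-width="21cm" margin="2cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- a right-to-left container so the Hebrew block lays out RTL -->
      <fo:block-container writing-mode="rl-tb">
        <fo:block font-family="Ezra SIL" font-size="14pt">בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ</fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>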

Thoughts?

- Ze'ev

XEP-XSLFO-Hebrew.xml
XEP-XSLFO-Hebrew.pdf

Efraim Feinstein

Jan 26, 2010, 7:01:47 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
>
> One alternative might be to use RenderX's XSL-FO product (XEP) for
>

[snip]


> 4. Potentially (if FOP ever supports Hebrew and bidi properly), we
> could move to FOP in the future.
>

It also has a toxic component -- it discourages developers from
contributing to the development of working free software solutions,
lulling them into a false sense that the problem has been solved. I'm
reminded of Linus Torvalds' reliance on BitKeeper as the Linux kernel's
source code management system. It had a non-open-source "free for open
source developers" license, and technically worked quite well. One day,
the commercial entity behind BitKeeper decided to end support for the
free-as-in-beer version, leaving the Linux kernel developers totally out
of luck. The free alternative that was developed was the source code
management system now known as git.

> The cons are:
> 1. It's a commercial product and not open source.
>

The latter is a fatal flaw.

> 2. There is a small RenderX stamp at the bottom of any page generated
> with the free version of the product.
>

The watermark requirement is merely annoying.

> 3. The XEP free Personal Edition's license seems to allow server-based
> PDF generation for non-commercial purposes; however, this would need
> to be confirmed by a lawyer.
>

Putting a check on how our texts are used is definitely a violation of
our principles. It's also probably a violation of the CC-BY-SA license
on the content (which requires that no additional restrictions be placed
on copying or distribution of the content).

> Thoughts?
>

The PDF renderer is what I would call an essential part of the
toolchain. Requiring use of non-free software, and particularly
non-free software that infects the content, would compromise our
principles and our mission.

Aharon Varady

Jan 26, 2010, 7:34:38 PM
to jewishlitu...@googlegroups.com
On Tue, Jan 26, 2010 at 7:01 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:

The cons are:
1. It's a commercial product and not open source.
 

The latter is a fatal flaw.


I just want to second this. The "open" in Open Siddur is not for display -- it is tachlis (i.e., fundamental).


Thoughts?
 

The PDF renderer is what I would call an essential part of the toolchain.  Requiring use of non-free software, and particularly non-free software that infects the content, would compromise our principles and our mission.



Like Efraim, I see no problem in Open Siddur developers using non-free/open software as part of their development activity. So long as the non-free software does not become a required part of the toolchain, we're good. The danger is, as Efraim indicated, that we become reliant on it. I think we're ok with watermarks as well, so long as the content is submitted under free and open terms. This came up while we were talking with Google about using their PDFs of scanned public domain books carrying the Google watermark.

It seems to me that the issue of kavanah (intention) is important here. If it were possible to use this non-free software and still retain our focus on replacing it when free/open software becomes available, I'd think some nuance would be called for. But like Efraim, I'm worried that if we start using it and finding bugs with it, we'll be tempted to kluge around it.

This also came up in the IRC chat while a potential contributor discussed using ABBYY's commercial OCR software to convert scanned images of text to machine-readable text. (The text was temporarily redacted at the request of the IRC participant since the offer to contribute was unofficial.) There is free and open source OCR software for Hebrew available (HOCR, http://hocr.berlios.de); it's just not yet mature enough to be useful to us or our contributors. Until then, we'll be transcribing by hand and proofing text contributed with whatever OCR software a contributor wishes to use prior to contributing to us.

Aharon

Ze'ev Clementson

Jan 26, 2010, 7:44:56 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

On Tue, Jan 26, 2010 at 4:01 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Hi,
>
> Ze'ev Clementson wrote:
>>
>> One alternative might be to use RenderX's XSL-FO product (XEP) for
>>
>
> [snip]
>>
>> 4. Potentially (if FOP ever supports Hebrew and bidi properly), we
>> could move to FOP in the future.
>>
>
> It also has a toxic component -- it discourages developers from contributing
> to the development of working free software solutions, lulling them into a
> false sense that the problem has been solved.  I'm reminded of Linus
> Torvalds' reliance on BitKeeper as the Linux kernel's source code management
> system.  It had a non-open-source "free for open source developers" license,
> and technically worked quite well.  One day, the commercial entity behind
> BitKeeper decided to end support for the free-as-in-beer version, leaving
> the Linux kernel developers totally out of luck.   The free alternative that
> was developed was the source code management system now known as git.

The difference is that BitKeeper was an integral part of the Linux
development process and required a huge effort to replace. For Open
Siddur, the PDF-generation component is the last component in the tool
chain, is optional for some users (those who may want HTML output
rather than PDF) and can be replaced much more easily (with XeTeX or
one of the other alternatives) if necessary.

>> The cons are:
>> 1. It's a commercial product and not open source.
>>
>
> The latter is a fatal flaw.

Why? If it ever becomes an issue, XEP can always be replaced. It's
just the last piece in the tool-chain.

>> 2. There is a small RenderX stamp at the bottom of any page generated
>> with the free version of the product.
>>
>
> The watermark requirement is merely annoying.

It is a watermark (which is annoying), but it's pretty unobtrusive -
it's just a line of text (in a small font) at the bottom of the page.

>> 3. The XEP free Personal Edition's license seems to allow server-based
>> PDF generation for non-commercial purposes; however, this would need
>> to be confirmed by a lawyer.
>>
>
> Putting a check on how our texts are used is definitely a violation of our
> principles.  It's also probably a violation of the CC-BY-SA license on the
> content (which requires that no additional restrictions be placed on copying
> or distribution of the content).

That's why I said that server use would need to be confirmed by a lawyer.

>> Thoughts?
>>
>
> The PDF renderer is what I would call an essential part of the toolchain.
>  Requiring use of non-free software, and particularly non-free software that
> infects the content, would compromise our principles and our mission.

So, is the favored alternative at present to use XeTeX (which, as has
been already noted, is not XML-based, uses TeX, is difficult to
install, and uses lots of processing power on the server)?

- Ze'ev

Ze'ev Clementson

Jan 26, 2010, 7:49:22 PM
to jewishlitu...@googlegroups.com
Hi Aharon,

The difference here is that we would be writing XSL-FO, which is an XML
standard, not writing to a proprietary format that is only understood
by XEP. If XEP became an issue, we could either replace XEP with FOP (if it
had reached the stage where it could support Hebrew & bidi correctly)
or transform the XSL-FO into some other format that could be processed
by a different PDF generator.

> This also came up in the IRC chat while a potential contributor discussed
> using ABBYY's commercial OCR software to convert scanned images of text to
> machine readable text. (The text was temporarily redacted at the request of
> the IRC participant since the offer to contribute was unofficial.) There is
> free and open source OCR software for Hebrew available (HOCR,
> http://hocr.berlios.de), it's just not mature yet to be useful to us or our
> contributors. Until then, we'll be transcribing by hand and proofing text
> contributed with whatever OCR software a contributor wishes to use prior to
> contributing to us.

- Ze'ev

Efraim Feinstein

Jan 26, 2010, 11:24:38 PM
to jewishlitu...@googlegroups.com
Hi Ze'ev,

Ze'ev Clementson wrote:
> The difference is that BitKeeper was an integral part of the Linux
> development process and required a huge effort to replace. For Open
> Siddur, the PDF-generation component is the last component in the tool
> chain, is optional for some users (those who may want HTML output
> rather than PDF) and can be replaced much more easily (with XeTeX or
> one of the other alternatives) if necessary.
>

We don't have infinite developer time, so if we're going to aim for a
target at this point, it's going to be one where we have a plan for it
to work within our mission requirements. XeTeX requires a very
different set of transforms than XSL-FO, so if we're going to go for
XSL-FO, there's going to have to be some way for it to get us there
using a free toolchain.

My priority for now is the generic transform and the XHTML transform.

One possibility we haven't discussed yet is going via XSL-FO to ODT
(OpenDocument text) instead of PDF directly. Azriel found
http://fo2odf.sourceforge.net/ , which is a set of XSLT 1.0 stylesheets; it
doesn't support everything we need, and would need some testing before
we commit to it. It also makes rendering the job of the
OpenDocument application. That might be a problem on the Mac, given
that its system support for OpenType fonts is horrible. I also don't
know about font embedding in ODT. It won't do it directly, but if it's
possible, we could enhance it.

I generated the test.odt from the XSL-FO file you provided and I used
OpenOffice.org to convert it to PDF. Some of the Hebrew text is
missing, but it's because you used fo:bidi-override for the last two
verses and that's not supported by fo2odf. (The first two used fo:block).

> Why? If it ever becomes an issue, XEP can always be replaced. It's
> just the last piece in the tool-chain.
>

It could potentially be just as hard to replace it then as it is now.
FOP (probably the best hope for open source XSL-FO) looks like it's not
anywhere near supporting the necessary features.

test.odt
test.pdf

Azriel Fasten

Jan 27, 2010, 12:57:10 AM
to jewishliturgy-discuss

I updated the Target Survey wiki page. To only read what I changed: http://wiki.jewishliturgy.org/w/index.php?title=Target_Survey&diff=24441&oldid=23244.

Azriel Fasten

Jan 27, 2010, 1:05:32 AM
to jewishliturgy-discuss
On Tue, Jan 26, 2010 at 11:24 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
Hi Ze'ev,


Ze'ev Clementson wrote:
The difference is that BitKeeper was an integral part of the Linux
development process and required a huge effort to replace. For Open
Siddur, the PDF-generation component is the last component in the tool
chain, is optional for some users (those who may want HTML output
rather than PDF) and can be replaced much more easily (with XeTeX or
one of the other alternatives) if necessary.
 

We don't have infinite developer time, so if we're going to aim for a target at this point, it's going to be one where we have a plan for it to work within our mission requirements.  XeTeX requires a very different set of transforms than XSL-FO, so if we're going to go for XSL-FO, there's going to have to be some way for it to get us there using a free toolchain.

My priority for now is the generic transform and the XHTML transform.

One possibility we haven't discussed yet is going via XSL-FO to ODT (OpenDocument text) instead of PDF directly.  Azriel found http://fo2odf.sourceforge.net/ , which is XSLT 1.0 stylesheets ; it doesn't support everything we need, and would need some testing before we commit to it.  It also makes the rendering the job of the OpenDocument application.  That might be a problem on the Mac, given that its system support for OpenType fonts is horrible.  I also don't know about font embedding in ODT.  It won't do it directly, but if it's possible, we could enhance it.

The Mac provides a horrible font renderer, but it's left up to the application to decide whether to do the font rendering itself or rely on the system. It's very similar to depending on Uniscribe on Windows. Some applications do, and some don't. I don't know what OpenOffice does.
 

I generated the test.odt from the XSL-FO file you provided and I used OpenOffice.org to convert it to PDF.  Some of the Hebrew text is missing, but it's because you used fo:bidi-override for the last two verses and that's not supported by fo2odf.  (The first two used fo:block).

This is beginning to sound promising. Just generate an ODF and ignore any font issues; provide the font files alongside the ODF and leave the ugliness for the renderer to deal with. It would also be interesting to extend the XSLT sheets to support bidi override, etc. In the worst case, can't we just put the Unicode override characters in the output instead of fo:bidi-override?
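
If it comes to that, the relevant characters are U+202E (RIGHT-TO-LEFT OVERRIDE) and U+202C (POP DIRECTIONAL FORMATTING). A rough sketch of such a template (an illustration only, not existing code; the template name is made up):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- wrap text in Unicode RLO/PDF control characters instead of fo:bidi-override -->
  <xsl:template name="rtl-override">
    <xsl:param name="text"/>
    <xsl:value-of select="concat('&#x202E;', $text, '&#x202C;')"/>
  </xsl:template>
</xsl:stylesheet>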

If we go this route, XSL-FO can also be considered somewhat of a complete target, for things like XEP, where someone can take our XSL-FO result and feed it to their favorite commercial renderer.


Why? If it ever becomes an issue, XEP can always be replaced. It's
just the last piece in the tool-chain.
 

It could potentially be just as hard to replace it then as it is now.  FOP (probably the best hope for open source XSL-FO) looks like it's not anywhere near supporting the necessary features.




Azriel Fasten

Jan 27, 2010, 1:11:14 AM
to jewishliturgy-discuss


2010/1/26 Ze'ev Clementson <bere...@gmail.com>
Hi Efraim/Azriel,

I've attached a sample XSL-FO file with the corresponding PDF output.
On my Mac, the vowels/dageshim/taamim don't display very well, but
that is probably just the known issue with Mac Hebrew fonts. It would
be good if someone could confirm that everything displays correctly
when generated from a Linux or Windows box.

It didn't render correctly in the PDF. If you have Gmail, you can view it in the Gmail viewer. I also downloaded the file and checked in case Ezra SIL was not embedded; it is embedded, and still doesn't display correctly. However, Efraim's rendering does display. Admittedly, he used another beast, OpenOffice, but there are many ODT renderers now (though a look around hasn't found a Java one that outputs PDF). I think ODT might be the way to go as a "complete target" (http://wiki.jewishliturgy.org/Target_Survey#Some_Terms), and we can ignore how it gets rendered to PDF etc., as Efraim is suggesting.

Ze'ev Clementson

Jan 27, 2010, 1:25:19 AM
to jewishlitu...@googlegroups.com
Hi Efraim,

On Tue, Jan 26, 2010 at 8:24 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Hi Ze'ev,
>
> Ze'ev Clementson wrote:
>>
>> The difference is that BitKeeper was an integral part of the Linux
>> development process and required a huge effort to replace. For Open
>> Siddur, the PDF-generation component is the last component in the tool
>> chain, is optional for some users (those who may want HTML output
>> rather than PDF) and can be replaced much more easily (with XeTeX or
>> one of the other alternatives) if necessary.
>>
>
> We don't have infinite developer time, so if we're going to aim for a target
> at this point, it's going to be one where we have a plan for it to work
> within our mission requirements.  XeTeX requires a very different set of
> transforms than XSL-FO, so if we're going to go for XSL-FO, there's going to
> have to be some way for it to get us there using a free toolchain.
>
> My priority for now is the generic transform and the XHTML transform.

Sounds like the correct approach at this stage of development. My only
reason for bringing up the PDF output at the moment is that it's
listed as one of the "Complete Targets" and one that is most likely to
be preferred by anyone who wants printed output. It's nice to have a
clear understanding as to how this target might be achievable.

> One possibility we haven't discussed yet is going via XSL-FO to ODT
> (OpenDocument text) instead of PDF directly.  Azriel found
> http://fo2odf.sourceforge.net/ , which is XSLT 1.0 stylesheets ; it doesn't
> support everything we need, and would need some testing before we commit to
> it.  It also makes the rendering the job of the OpenDocument application.
>  That might be a problem on the Mac, given that its system support for
> OpenType fonts is horrible.  I also don't know about font embedding in ODT.
>  It won't do it directly, but if it's possible, we could enhance it.

Do you really feel it is viable to go through so many layers of
transforms? To rely on this means:
JLPTEI -> XSL-FO -> ODT -> PDF

Every intermediate transform means (potential) loss of formatting information.

> I generated the test.odt from the XSL-FO file you provided and I used
> OpenOffice.org to convert it to PDF.  Some of the Hebrew text is missing,
> but it's because you used fo:bidi-override for the last two verses and
> that's not supported by fo2odf.  (The first two used fo:block).

So if we use fo2odf, we are forced to conform to a "least common
denominator" set of functionality? The last updates to fo2odf appear
to have been made 18 months ago.

>> Why? If it ever becomes an issue, XEP can always be replaced. It's
>> just the last piece in the tool-chain.
>>
>
> It could potentially be just as hard to replace it then as it is now.  FOP
> (probably the best hope for open source XSL-FO) looks like it's not anywhere
> near supporting the necessary features.

Again, I ask what the preferred alternative is. If output is to XSL-FO
and we recommend usage of XEP (and, perhaps offer a server-based
generation option using either the free XEP or a commercial XEP) for
quality PDF output, then there is an immediate PDF solution if we go
the XSL-FO route. If we go for a 100% open source solution, then what
is the preferred option?

- Ze'ev

Ze'ev Clementson

Jan 27, 2010, 1:25:31 AM
to jewishlitu...@googlegroups.com
On Tue, Jan 26, 2010 at 9:57 PM, Azriel Fasten <fst...@gmail.com> wrote:
> I updated the Target Survey wiki page. To only read what I
> changed: http://wiki.jewishliturgy.org/w/index.php?title=Target_Survey&diff=24441&oldid=23244.

Thanks Azriel!

- Ze'ev

Ze'ev Clementson

Jan 27, 2010, 1:26:01 AM
to jewishlitu...@googlegroups.com
Hi Azriel,

Due to the Hebrew font issues on the Mac, it's hard to experiment with
different options. Maybe I'll just start testing in a vmware session
so that I get a better feel for the alternatives.

>> I generated the test.odt from the XSL-FO file you provided and I used
>> OpenOffice.org to convert it to PDF.  Some of the Hebrew text is missing,
>> but it's because you used fo:bidi-override for the last two verses and
>> that's not supported by fo2odf.  (The first two used fo:block).
>
> This is beginning to sound promising. Just generate an ODF and ignore any
> font issues; provide the font files along side the ODF and leave the
> ugliness for the renderer to deal with. It would also be interesting
> extending the XSLT sheets to support bidi override etc. In worst case, can't
> we just put the Unicode characters for overriding in the output in the stead
> of fo:bidi-override?
> If we go this route, XSL-FO can also be considered somewhat of a complete
> target, for things like XEP, where someone can take our XSL-FO result and
> feed it to their favorite commercial renderer.

I like the idea of going with XSL-FO as an intermediate format. We can
always recommend XEP as one option and other alternatives as they
become more robust.

- Ze'ev

Ze'ev Clementson

Jan 27, 2010, 1:26:24 AM
to jewishlitu...@googlegroups.com
Hi Azriel,

Did you try downloading XEP and creating the PDF using my xml file? It
would be interesting to see if the XEP PDF output was much improved
when generated from a Linux box. If I have time tomorrow, I'll setup a
vmware session and retry generating the XEP PDF file (if you haven't
already tested it).

- Ze'ev

Azriel Fasten

Jan 27, 2010, 9:23:47 AM
to jewishliturgy-discuss
There might actually be one more:

JLPTEI -> muXHTML -> XSL-FO -> ODT -> PDF

We already have muXHTML. It relies on the CSS box model to format everything. Every possible formattable type of text is surrounded by classed divs so that the text can be styled in the correct manner. Out-of-line text is easiest to imagine; for example, a footnote is in a <div class="footnote">some text</div>. Transforming from this format would be easier.
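
For instance (a sketch only, assuming the muXHTML divs are in the XHTML namespace; this is not code we have written), the footnote div above could be mapped to XSL-FO along these lines:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- map the classed footnote div onto an XSL-FO footnote -->
  <xsl:template match="xhtml:div[@class='footnote']">
    <fo:footnote>
      <fo:inline baseline-shift="super" font-size="70%">
        <xsl:number level="any" count="xhtml:div[@class='footnote']"/>
      </fo:inline>
      <fo:footnote-body>
        <fo:block><xsl:apply-templates/></fo:block>
      </fo:footnote-body>
    </fo:footnote>
  </xsl:template>
</xsl:stylesheet>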

muXHTML is made to keep all formatting information possible, so little loss there.

Again, XSL-FO can be considered a worthy target of its own. And ODT is definitely a complete target.

XSL-FO might have restrictions on certain things, but as a standard, we would, and perhaps should, work within those restrictions (not be too fancy). Thus any loss we have would come from working within a good formatting standard, directly from our almost lossless (formatting-wise) muXHTML.

I imagine that ODT, as a word-processing format, is even more powerful than XSL-FO, something closer to the level of PDF.

I am not sure what the ODT specs say, but are conforming ODT renderers required to render everything the same? If so, there would be no loss from ODT->PDF. Keep in mind, these are display formats, so lots of information is already lost.
 

> I generated the test.odt from the XSL-FO file you provided and I used
> OpenOffice.org to convert it to PDF.  Some of the Hebrew text is missing,
> but it's because you used fo:bidi-override for the last two verses and
> that's not supported by fo2odf.  (The first two used fo:block).

So if we use fo2odf, we are forced to conform to a "least common
denominator" set of functionality? The last updates to fo2odf appear
to have been made 18 months ago.

fo2odf is not ready for use. If we were to use it, we would have to bring its functionality up to par with what we require. Since we have some experience with XSLT, this might not be difficult, depending on how easily it translates to ODF. We are also not restricted by XSLT 1.0.
 

>> Why? If it ever becomes an issue, XEP can always be replaced. It's
>> just the last piece in the tool-chain.
>>
>
> It could potentially be just as hard to replace it then as it is now.  FOP
> (probably the best hope for open source XSL-FO) looks like it's not anywhere
> near supporting the necessary features.

Again, I ask what the preferred alternative is. If output is to XSL-FO
and we recommend usage of XEP (and, perhaps offer a server-based
generation option using either the free XEP or a commercial XEP) for
quality PDF output, then there is an immediate PDF solution if we go
the XSL-FO route. If we go for a 100% open source solution, then what
is the preferred option?

I don't want to *recommend* XEP. I simply want to present a target: XSL-FO. There are many commercial implementations of XSL-FO.

I gotta run, but I just found another possible render path to PDF:


I'll check it out later.

Efraim Feinstein

Jan 27, 2010, 9:55:26 AM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
>> My priority for now is the generic transform and the XHTML transform.
>>
>
> Sounds like the correct approach at this stage of development. My only
> reason for bringing up the PDF output at the moment is that it's
> listed as one of the "Complete Targets" and one that is most likely to
> be preferred by anyone who wants printed output. It's nice to have a
> clear understanding as to how this target might be achievable.
>

Agreed! And, this is certainly a topic that should be discussed, and
our options considered.

So far, I think it's achievable with free software, but not out of the
box. Committing to a solution means working on third party projects and
getting them up to speed for the features we need. The questions, from
my perspective, are: which one will get us there fastest? Which one
will get us most of the needed functionality? and -- What are the
trade-offs we're making in committing to that solution?

As you'll see from later in this email, another factor may be who is
doing the work: if a Java developer decides he/she can devote time to
the problem, it would probably result in a Java solution; if an XSLT
developer does, it would probably result in an XSLT solution.

>> OpenType fonts is horrible. I also don't know about font embedding in ODT.
>> It won't do it directly, but if it's possible, we could enhance it.
>>
>
> Do you really feel it is viable to go through so many layers of
> transforms? To rely on this means:
> JLPTEI -> XSL-FO -> ODT -> PDF
>

In this case, the ODT would be the "complete target" output format. A
user would get either the ODT or the XSL-FO from the web application.

>
>> I generated the test.odt from the XSL-FO file you provided and I used
>> OpenOffice.org to convert it to PDF. Some of the Hebrew text is missing,
>> but it's because you used fo:bidi-override for the last two verses and
>> that's not supported by fo2odf. (The first two used fo:block).
>>
>
> So if we use fo2odf, we are forced to conform to a "least common
> denominator" set of functionality? The last updates to fo2odf appear
> to have been made 18 months ago.
>

My coding biases are probably showing here. I think I'd have an easier
time understanding how to modify fo2odf (which is a set of XSLT
stylesheets that convert one XML dialect to another; both dialects have
open specifications) than modifying the internals of a rendering engine.
Before committing to fo2odf as part of our toolchain, I would want to
check with its developer whether it's still under active development.
(It appears to be a project with a single developer.) If it isn't, we
could adopt or fork it.
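For anyone who hasn't looked inside fo2odf, the stylesheets boil down to
templates of roughly this shape (an illustrative sketch only, not actual
fo2odf code; the style name is made up):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <!-- Map an XSL-FO block onto an ODF paragraph with an automatic style -->
  <xsl:template match="fo:block">
    <text:p text:style-name="P1">
      <xsl:apply-templates/>
    </text:p>
  </xsl:template>
</xsl:stylesheet>

Extending it would mostly be a matter of adding (or fixing) templates
like that for the FO constructs we actually emit.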

>
> Again, I ask what the preferred alternative is.

I'm not sure yet. I think we need to do more testing of the types of
constructs we're going to use before committing to something.

I am sure that I want the version of the PDF rendering component that we
ship to be free software. (Users who want to use non-free software,
are, of course, welcome to do so).

Efraim Feinstein

unread,
Jan 27, 2010, 10:37:28 AM1/27/10
to jewishlitu...@googlegroups.com
Hi,

Please snip quotations to the minimum required to understand the thread.
5 lines of quote per paragraph of reply is usually more than enough. :-)

Azriel Fasten wrote:
>
> There might actually be one more:
>
> JLPTEI -> muXHTML -> XSL-FO -> ODT -> PDF

This may be how it's written at first (or not?), but it's not necessary.
You can hook into the transforms and output directly to a non-HTML
format. (It would involve "rewriting" the main entry point to the
transforms, transform.xsl2, which is literally just a set of xsl:include
statements.)
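In other words, a non-HTML entry point could look something like this
(the module names here are invented for illustration; only
transform.xsl2 itself is real):

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- the shared JLPTEI handling stays the same -->
  <xsl:include href="common.xsl2"/>
  <!-- swap the XHTML output module for an XSL-FO (or other) output module -->
  <xsl:include href="fo-output.xsl2"/>
</xsl:stylesheet>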

>
> muXHTML is made to keep all formatting information possible, so little
> loss there.

There are some choices that are made in the XHTML output renderer. For
example, where do out of line comments get rendered? A simple
in-browser stylesheet (or Javascript) could fix most of them, though.

>
> Again, XSL-FO can be considered a worthy target of its own. And ODT
> is definitely a complete target.
>
> XSL-FO might have restrictions on certain things, but as a standard,

W3C standards have a tendency to specify things that either nothing
supports or nothing supports well. We do have to keep an eye on how
well the available renderers work.

>
> I imagine that ODT, as a word processing format is even more powerful
> than XSL-FO, something closer to a level of PDF.

Not really. PDF is a printing language (the successor to PostScript)
and pretty much anything that can be expressed on paper can be expressed
in PDF. I don't remember where I read/heard the following, but these
witticisms are pretty much true:
- SAT analogy: word processor:word :: food processor:food
- A word processor is a cross between a bad text editor and a bad
desktop publishing application.

The point is that a word processor is meant to express only a very
limited set of operations well.

In terms of output format quality:
- PDF can express anything that can be expressed on a page, but it's a
low level format. We need a higher level library to get there.
- You won't find anything out there that will beat the output quality
that TeX will give you. It is a professional-grade typesetting language.
- XSL-FO is limited in what it can describe. It's not made for complex
documents. It was written as a paged styling language for DocBook and HTML.
- It's not clear to me that there's anything you can do in XSL-FO that
can't be done in XHTML+CSS3 with paging extensions. Which you use is a
matter of convenience and available renderers.
- ODT is a word processing language, and as such, it's stuck in the
word-processing mindset. It does have the advantage that a user could
easily edit/tweak the result without having to go through the entire
compilation process. So, it might be a worthy complete target even if
we find a way to PDF.

>
> I am not sure what the ODT specs say, but are conforming ODT renderers
> required to render everything the same?

I'm not sure. I know that, de facto, there's a lot of variation in the
ODT-rendering space. Compare KWord's implementation to OpenOffice's to
see what I mean.

>
> I gotta run, but I just found another possible render path to PDF:
>
> http://www.reportlab.com/software/opensource/

ReportLab is used by pisa as its PDF rendering backend. Using ReportLab
directly would mean plugging a different rendering description language
(e.g., XSL-FO or XHTML+CSS) into it. It is Python-based.

Ze'ev Clementson

unread,
Jan 27, 2010, 9:05:20 PM1/27/10
to jewishlitu...@googlegroups.com
Hi all,

Today, I installed Windows XP (I was initially going to install Linux,
but I need Windows XP for some other work I'll be doing later this
week so used that instead) in a vmware session along with Java, XEP,
and the Ezra SIL font. Unfortunately, the Hebrew PDF output was no
better than the output from my Mac, so it looks like XEP may not be an
option even if there were no strong objections to using a commercial
PDF-generator.

Therefore, for PDF output, it would appear that XeTeX is still the
strongest option so far with the ODT option another possibility. The
nice thing about outputting to ODT would be that it would make it easy
for end users to do any "final tweaking" of texts. However, it would
be necessary to either target ODT directly (not sure how much work is
involved in this?) or utilize the fo2odt XSLT work (meaning we would
have to output to XSL-FO and enhance the fo2odt code where necessary).

The other option that Efraim mentioned (XHTML/CSS3) is also
potentially quite attractive. If we output to XHTML and can control
both visual and print formatting using CSS3, that may be a very
effective, low-overhead, consistent way to deal with output. It would
also be relatively easy for an end-user to do "final tweaking" of
texts (although perhaps not as easy as with output to ODT). I haven't
been following the CSS3 developments so know nothing about what is
currently available/supported and what is still just proposed. Can
anyone provide a short synopsis of what the current state of CSS3 is
and what might be the pros/cons of using it for print output?

- Ze'ev

Aharon Varady

unread,
Jan 27, 2010, 9:14:44 PM1/27/10
to jewishlitu...@googlegroups.com
On Wed, Jan 27, 2010 at 9:05 PM, Ze'ev Clementson <bere...@gmail.com> wrote:

Therefore, for PDF output, it would appear that XeTeX is still the
strongest option so far with the ODT option another possibility. The
nice thing about outputting to ODT would be that it would make it easy
for end users to do any "final tweaking" of texts. However, it would
be necessary to either target ODT directly (not sure how much work is
involved in this?) or utilize the fo2odt XSLT work (meaning we would
have to output to XSL-FO and enhance the fo2odt code where necessary).


I like this option in any case since indeed we want to provide siddur crafters with a more editable end product than PDF (the latter not being intended for editing). I believe we currently have the Reb Zalman Siddur available for download in both ODT and PDF.


Aharon

Azriel Fasten

unread,
Jan 27, 2010, 9:19:42 PM1/27/10
to jewishliturgy-discuss
On Wed, Jan 27, 2010 at 9:05 PM, Ze'ev Clementson <bere...@gmail.com> wrote:


Today, I installed Windows XP (I was initially going to install Linux,
but I need Windows XP for some other work I'll be doing later this
week so used that instead) in a vmware session along with Java, XEP,
and the Ezra SIL font. Unfortunately, the Hebrew PDF output was no
better than the output from my Mac, so it looks like XEP may not be an
option even if there were no strong objections to using a commercial
PDF-generator.

There are other commercial FO implementations.
 

The other option that Efraim mentioned (XHTML/CSS3) is also
potentially quite attractive. If we output to XHTML and can control
both visual and print formatting using CSS3, that may be a very
effective, low-overhead, consistent way to deal with output. It would
also be relatively easy for an end-user to do "final tweaking" of
texts (although perhaps not as easy as with output to ODT). I haven't
been following the CSS3 developments so know nothing about what is
currently available/supported and what is still just proposed. Can
anyone provide a short synopsis of what the current state of CSS3 is
and what might be the pros/cons of using it for print output?

See http://www.w3.org/TR/css3-page/; it has very good illustrations of how powerful it is.
CSS3 allows much more control over paged media. It has counters, which can be incremented as a property of an element; it adds rules like @page; and it can place content in page footers and margin boxes, etc. So, for example:

body {counter-reset: chapter;}
div.chapter {counter-increment: chapter;}
@page {
  margin: 10%;
  @top-center { content: "Chapter " counter(chapter) }
}
This is exactly why Flying Saucer / xhtmlrenderer is so exciting; it supports some of CSS3 and can easily be extended to support more of it as needed. Unfortunately, it uses iText as its PDF backend and does not do BIDI reordering. BIDI rendering is a larger job, and if we use this library, it would have to be implemented. For the backend, xhtmlrenderer uses pluggable renderers, and it would not be difficult to put another PDF renderer there; in fact, if we are willing to live with vectors instead of semantic text, it's probably not too difficult (see http://wiki.jewishliturgy.org/Target_Survey#PDF.2Fpostscript), as there is already an image renderer (we can also easily render to SVG, etc.).

Efraim Feinstein

unread,
Jan 27, 2010, 10:01:31 PM1/27/10
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> option even if there were no strong objections to using a commercial
> PDF-generator.
>

Just to be clear, my objection is not to using a *commercial* product in
the toolchain. It's to using a non-free product in the toolchain. Our
XSLT transformer, Saxon HE, is both a commercial product and free
software. (Saxon EE and Saxon PE are commercial products and non-free
software).

> Therefore, for PDF output, it would appear that XeTeX is still the
>

Of all the available options, XeTeX has the best chance of giving the
nicest output. For running on the server (let alone trying to run it on
the client through the web), it's got issues.

> strongest option so far with the ODT option another possibility. The
> nice thing about outputting to ODT would be that it would make it easy
> for end users to do any "final tweaking" of texts. However, it would
> be necessary to either target ODT directly (not sure how much work is
> involved in this?)

There's no reason to reinvent the wheel. If we go this route -- or even
if we *don't* target PDF through ODT -- I would target ODT through
XSL-FO and modify fo2odt. So far, it seems to be the best way we know
of to get there, and it would give us an XSL-FO output too.

Azriel and I have edited the Target Survey on the wiki.

Ze'ev Clementson

unread,
Jan 27, 2010, 10:03:06 PM1/27/10
to jewishlitu...@googlegroups.com
On Wed, Jan 27, 2010 at 6:19 PM, Azriel Fasten <fst...@gmail.com> wrote:
>
>
> On Wed, Jan 27, 2010 at 9:05 PM, Ze'ev Clementson <bere...@gmail.com>
> wrote:
>>
>>
>> Today, I installed Windows XP (I was initially going to install Linux,
>> but I need Windows XP for some other work I'll be doing later this
>> week so used that instead) in a vmware session along with Java, XEP,
>> and the Ezra SIL font. Unfortunately, the Hebrew PDF output was no
>> better than the output from my Mac, so it looks like XEP may not be an
>> option even if there were no strong objections to using a commercial
>> PDF-generator.
>
> There are other commercial FO implementations.
>
>>
>> The other option that Efraim mentioned (XHTML/CSS3) is also
>> potentially quite attractive. If we output to XHTML and can control
>> both visual and print formatting using CSS3, that may be a very
>> effective, low-overhead, consistent way to deal with output. It would
>> also be relatively easy for an end-user to do "final tweaking" of
>> texts (although perhaps not as easy as with output to ODT). I haven't
>> been following the CSS3 developments so know nothing about what is
>> currently available/supported and what is still just proposed. Can
>> anyone provide a short synopsis of what the current state of CSS3 is
>> and what might be the pros/cons of using it for print output?
>
> See http://www.w3.org/TR/css3-page/; it has very good illustrations of how
> powerful it is.

According to this page:
http://www.w3.org/Style/CSS/current-work

CSS Paged Media is currently at a status of "Last Call". The stages
that a W3C proposal goes through are:
5.2.1 Working Draft (WD)
5.2.2 Last Call Working Draft
5.2.3 Candidate Recommendation (CR)
5.2.4 Proposed Recommendation (PR)
5.2.5 Recommendation (REC)

So, this proposal is currently only at #2. Are there many projects
that are building stuff based on the Last Call Working Draft? As I
said, I haven't been following CSS3, so I don't know whether it's
already being implemented or is just a formalization of certain
practices that vendors have already incorporated in their products.
I'm just wondering whether it's technology that is likely to be usable
in its current form and whether there is already code that we can
make use of.

This page:
http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(CSS)#CSS_version_support

seems to show that (for browsers, at least) support of CSS3 is still
fairly low. But, maybe that doesn't matter if the timeframe for CSS3
support (or, at least the page-related functionality) is targeted at
the near future.

- Ze'ev

Efraim Feinstein

unread,
Jan 27, 2010, 10:03:55 PM1/27/10
to jewishlitu...@googlegroups.com
Azriel,

Azriel Fasten wrote:
>
> another PDF renderer there; in fact, if we are willing to live with
> vectors instead of semantic text, its probably not too difficult
> (see http://wiki.jewishliturgy.org/Target_Survey#PDF.2Fpostscript) as
> there is already an image renderer (we can also easily render to SVG
> etc.).
>

How will use of vector graphics impact things like page breaks and line
breaks?

Ze'ev Clementson

unread,
Jan 27, 2010, 10:11:46 PM1/27/10
to jewishlitu...@googlegroups.com
Hi Efraim,

On Tue, Jan 26, 2010 at 8:24 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> One possibility we haven't discussed yet is going via XSL-FO to ODT
> (OpenDocument text) instead of PDF directly.  Azriel found
> http://fo2odf.sourceforge.net/ , which is XSLT 1.0 stylesheets ; it doesn't
> support everything we need, and would need some testing before we commit to
> it.  It also makes the rendering the job of the OpenDocument application.
>  That might be a problem on the Mac, given that its system support for
> OpenType fonts is horrible.  I also don't know about font embedding in ODT.
>  It won't do it directly, but if it's possible, we could enhance it.
>
> I generated the test.odt from the XSL-FO file you provided and I used
> OpenOffice.org to convert it to PDF.  Some of the Hebrew text is missing,
> but it's because you used fo:bidi-override for the last two verses and
> that's not supported by fo2odf.  (The first two used fo:block).

I attempted to use fo2odf to transform my XSL-FO example to PDF as you
had done; however, following the instructions in the README.TXT file
(under the "Installation of the 'OpenOffice' version" section), I was
unable to get the XSL-FO document to display as an ODT document. I
also tried using Saxon to directly transform it with
fo2odf-one_xml.xsl (again unsuccessfully). What were the steps that
you did to produce the test.pdf file using fo2odf?

Thanks,
Ze'ev

Efraim Feinstein

unread,
Jan 27, 2010, 10:28:23 PM1/27/10
to jewishlitu...@googlegroups.com
Hi Ze'ev,

Ze'ev Clementson wrote:
>
> fo2odf-one_xml.xsl (again unsuccessfully). What were the steps that
> you did to produce the test.pdf file using fo2odf?
>

fo2odf uses an extension that Saxon doesn't support. (This isn't a
problem; XSLT 1.0 transforms frequently use extensions because XSLT 1.0
was under-functional).

The commands I used were (from the fo2odt source directory, with your
.fo file named test.fo ):
xsltproc xsl/odf/fo2odf-archive.xsl test.fo
cd transformed
zip ../test.odt *

The test.odt file is the ODT I attached. The PDF was generated by
running OOo, opening the file and pressing File | Export as PDF

It probably won't work on a Mac because OOo uses the system renderer,
and that won't support the complex OpenType font.

Ze'ev Clementson

unread,
Jan 27, 2010, 11:25:14 PM1/27/10
to jewishlitu...@googlegroups.com
Hi Efraim,

On Wed, Jan 27, 2010 at 7:28 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Hi Ze'ev,
>
> Ze'ev Clementson wrote:
>>
>> fo2odf-one_xml.xsl (again unsuccessfully). What were the steps that
>> you did to produce the test.pdf file using fo2odf?
>>
>
> fo2odf uses an extension that Saxon doesn't support.  (This isn't a problem;
> XSLT 1.0 transforms frequently use extensions because XSLT 1.0 was
> under-functional).
>
> The commands I used were (from the fo2odt source directory, with your .fo
> file named test.fo ):
> xsltproc xsl/odf/fo2odf-archive.xsl test.fo
> cd transformed
> zip ../test.odt *
>
> The test.odt file is the ODT I attached.  The PDF was generated by running
> OOo, opening the file and pressing File | Export as PDF
>
> It probably won't work on a Mac because OOo uses the system renderer, and
> that won't support the complex OpenType font.

I just tried it and you're right - the transform worked, but when I
attempt to open the resulting test.odt file, OpenOffice thinks the
file is corrupt. However, the test.odt file that you posted was
viewable/editable in OpenOffice. Shouldn't your test.odt have been the
same as my test.odt file, since we both processed the same XSL-FO
file with the same XSLT stylesheet? And what is OOo? I thought you
were referring to the word processing application from OpenOffice.org,
but it seems like you might be talking about something else, as
OpenOffice on the Mac does open your test.odt file without any
problems - is OOo something different?

- Ze'ev

Azriel Fasten

unread,
Jan 27, 2010, 11:52:31 PM1/27/10
to jewishliturgy-discuss
On Wed, Jan 27, 2010 at 10:03 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
Azriel,


Azriel Fasten wrote:

another PDF renderer there; in fact, if we are willing to live with vectors instead of semantic text, its probably not too difficult (see http://wiki.jewishliturgy.org/Target_Survey#PDF.2Fpostscript) as there is already an image renderer (we can also easily render to SVG etc.).


How will use of vector graphics impact things like page breaks and line breaks?


Line breaks will work perfectly. The renderer is simply required to tell flying saucer the dimensions a rendered piece of text would take up; flying saucer would then break the text up as appropriate to make it fit per line.

Page breaks are an issue that I discussed on the flying saucer mailing list here <http://markmail.org/message/vq2wjmehtkoreviw>. In short, right now, it simply renders to one huge page, but it shouldn't be difficult to set a size and render it to separate pages.

 

Efraim Feinstein

unread,
Jan 28, 2010, 12:55:59 AM1/28/10
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> I just tried it and you're right - the transform worked but, when I
> attempt to open the resulting test.odt file, OpenOffice thinks the
> file is corrupt.

Check that the zip file you created (read: the odt) looks something like
this (use unzip -l)
  Length      Date    Time    Name
---------  ---------- -----   ----
     6707  2010-01-27 22:14   content.xml
        0  2010-01-27 22:14   META-INF/
      510  2010-01-27 22:14   meta.xml
       39  2010-01-27 22:14   mimetype
      287  2010-01-27 22:14   settings.xml
     3324  2010-01-27 22:14   styles.xml

OOo is shorthand for OpenOffice.org :-)

> OpenOffice on the Mac does open your test.odt file without any
> problems - is OOo something different?
>

Last I checked, the Mac version has rendering problems with complex
OpenType layout. *But*, I was also testing on OS X 10.5, and, as we
discovered, some applications work better in 10.6.

Ze'ev Clementson

unread,
Jan 28, 2010, 1:22:10 AM1/28/10
to jewishlitu...@googlegroups.com
Hi Efraim,

On Wed, Jan 27, 2010 at 9:55 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Hi,
>
> Ze'ev Clementson wrote:
>>
>> I just tried it and you're right - the transform worked but, when I
>> attempt to open the resulting test.odt file, OpenOffice thinks the
>> file is corrupt.
>
> Check that the zip file you created (read: the odt) looks something like
> this (use unzip -l)
>  Length      Date    Time    Name
> ---------  ---------- -----   ----
>    6707  2010-01-27 22:14   content.xml
>       0  2010-01-27 22:14   META-INF/
>     510  2010-01-27 22:14   meta.xml
>      39  2010-01-27 22:14   mimetype
>     287  2010-01-27 22:14   settings.xml
>    3324  2010-01-27 22:14   styles.xml

This is what I get:

~/Desktop/fo2odf-1.2.1 $ xsltproc xsl/odf/fo2odf-archive.xsl test.fo
INFO: === Expanding FO shorthand properties... ===
INFO: === Computing FO properties values... ===
INFO: === Copying inherited properties... ===
INFO: === Preprocessing FOs for easier transformation to ODF... ===
INFO: === Transforming preprocessed FOs to ODF elements... ===
INFO: === Fixing structure of the ODF elements... ===
INFO: === Compressing ODF automatic styles... ===
INFO: === Generating ODF automatic styles... ===
INFO: === Generating ODF text content... ===
~/Desktop/fo2odf-1.2.1 $ cd transformed
~/Desktop/fo2odf-1.2.1/transformed $ zip ../test.odt *
adding: META-INF/ (stored 0%)
adding: content.xml (deflated 78%)
adding: meta.xml (deflated 51%)
adding: mimetype (stored 0%)
adding: settings.xml (deflated 46%)
adding: styles.xml (deflated 75%)
~/Desktop/fo2odf-1.2.1/transformed $ cd ..
~/Desktop/fo2odf-1.2.1 $ ls
LICENCE.TXT bin etc lib
samples test.odt xsl
README.TXT doc java php
test.fo transformed
~/Desktop/fo2odf-1.2.1 $ cp test.odt ../test/
~/Desktop/fo2odf-1.2.1 $ cd ../test/
~/Desktop/test $ ls
test.odt
~/Desktop/test $ unzip -l test.odt
Archive: test.odt
  Length     Date    Time    Name
 --------    ----    ----    ----
        0  01-27-10 20:04   META-INF/
     6709  01-27-10 20:04   content.xml
      510  01-27-10 20:04   meta.xml
       39  01-27-10 20:04   mimetype
      287  01-27-10 20:04   settings.xml
     3330  01-27-10 20:04   styles.xml
 --------                   -------
    10875                   6 files
~/Desktop/test $


> OOo is shorthand for OpenOffice.org :-)

So there is some logic in the world!

>> OpenOffice on the Mac does open your test.odt file without any
>> problems - is OOo something different?
>>
>
> Last I checked, the Mac version has rendering problems with complex OpenType
> layout.  *But*, I was also testing on OS X 10.5, and, as we discovered, some
> applications work better in 10.6.

The odd thing is that I am able to open your test.odt file and it
looks fine; however, OOo thinks my test.odt file is corrupt. We
processed the same FO file with the same XSLT file using the same
xsltproc engine, but our output differs by a few bytes:

Yours:
6707 2010-01-27 22:14 content.xml
3324 2010-01-27 22:14 styles.xml

Mine:
6709 01-27-10 20:04 content.xml
3330 01-27-10 20:04 styles.xml

Here's a diff of the two different files (your files are in ~/Desktop,
mine are in ~/Desktop/test):

~/Desktop $ diff -u content.xml test/content.xml
--- content.xml 2010-01-26 19:58:26.000000000 -0800
+++ test/content.xml 2010-01-27 20:04:04.000000000 -0800
@@ -4,7 +4,7 @@
<style:style style:family="paragraph" style:name="P1">
<style:paragraph-properties fo:text-align="start" style:font-name="F1"/>
</style:style>
- <style:style style:family="paragraph" style:name="P_2"
style:master-page-name="MP_id343384">
+ <style:style style:family="paragraph" style:name="P_2"
style:master-page-name="MP_id35946642">
<style:paragraph-properties style:line-height-at-least="20.4pt"
fo:margin-top="NaNpt" fo:margin-bottom="NaNpt" fo:text-align="center"
fo:orphans="2" fo:widows="2"/>
<style:text-properties fo:color="#000000" fo:font-family="Ezra
SIL" fo:font-size="17.3pt" fo:font-weight="bold"/>
</style:style>
~/Desktop $ diff -u styles.xml test/styles.xml
--- styles.xml 2010-01-26 19:58:26.000000000 -0800
+++ test/styles.xml 2010-01-27 20:04:04.000000000 -0800
@@ -1,7 +1,7 @@
<?xml version="1.0"?>
<office:document-styles
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:foIn="http://www.w3.org/1999/XSL/Format"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:ooow="http://openoffice.org/2004/writer"
xmlns:oooc="http://openoffice.org/2004/calc"
xmlns:dom="http://www.w3.org/2001/xml-events"
xmlns:xforms="http://www.w3.org/2002/xforms" version="1.0">
<office:automatic-styles>
- <style:page-layout style:name="PL_id343328">
+ <style:page-layout style:name="PL_id35946588">
<style:page-layout-properties fo:color="#000000"
fo:margin-top="36pt" fo:margin-bottom="36pt" fo:margin-left="36pt"
fo:margin-right="36pt" fo:page-width="612pt" fo:page-height="792pt"/>
<style:header-style>
<style:header-footer-properties fo:color="#000000"
fo:min-height="17.00787402pt" fo:border-bottom="1pt solid #000000"
style:join-border="false" style:vertical-align="middle"/>
@@ -27,7 +27,7 @@
</style:style>
</office:styles>
<office:master-styles>
- <style:master-page style:name="MP_id343384"
style:page-layout-name="PL_id343328">
+ <style:master-page style:name="MP_id35946642"
style:page-layout-name="PL_id35946588">
<style:header>
<text:p text:style-name="STATIC-P_1"> XSL-FO Hebrew test
using XEP </text:p>
</style:header>
~/Desktop $

Any ideas why yours is different from mine and why yours is readable
by OOo but mine isn't?

- Ze'ev

test.odt

Ze'ev Clementson

unread,
Jan 28, 2010, 1:52:58 AM1/28/10
to jewishlitu...@googlegroups.com

I subsequently diff'ed the other files just to make certain there were
no differences (even though the byte counts were the same). There were
no differences in those files:

~/Desktop $ diff -u meta.xml test/meta.xml
~/Desktop $ diff -u mimetype test/mimetype
~/Desktop $ diff -u settings.xml test/settings.xml
~/Desktop $

> Any ideas why yours is different from mine and why yours is readable
> by OOo but mine isn't?

I don't normally use OOo and I don't know the ODT file definitions,
but the diffs seem to indicate that the only variance between your
test.odt and mine are:

- <style:style style:family="paragraph" style:name="P_2"
style:master-page-name="MP_id343384">
+ <style:style style:family="paragraph" style:name="P_2"
style:master-page-name="MP_id35946642">

and

- <style:page-layout style:name="PL_id343328">
+ <style:page-layout style:name="PL_id35946588">

and

- <style:master-page style:name="MP_id343384"
style:page-layout-name="PL_id343328">
+ <style:master-page style:name="MP_id35946642"
style:page-layout-name="PL_id35946588">

Any idea why these would mean the difference between a readable ODT
file and a non-readable one?

- Ze'ev

Efraim Feinstein

unread,
Jan 28, 2010, 10:02:51 AM1/28/10
to jewishlitu...@googlegroups.com
Hi,

Azriel Fasten wrote:
>
> Page breaks is an issue that I discussed in the flying saucer mailing
> list here <http://markmail.org/message/vq2wjmehtkoreviw>. In short,
> right now, it simply renders to one huge page, but it shouldn't be
> difficult to set a size and render it to separate pages.

Page breaking is a nontrivial operation. Consider that you could have
multiple text streams on one page (the amount of space taken up by
footnotes is variable, depending on how many footnotes the page has).
Wherever the layout engine is, it has to be aware of page breaks in
order to do placement of text (I think).

If we go the flying saucer route, we need to do a few things: (1) see if
we can plug in an alternative PDF library in place of iText. That's
because iText switched from an LGPL/MPL dual license to AGPL in the
latest version. Once flying saucer upgrades the version of iText it
uses, we won't be able to link iText into our applet because of license
incompatibility. [We could in fact delay this step and continue using
the old version of iText as long as flying saucer does.] (2) Introduce
bidi into flying saucer and OpenType/complex layout into the PDF engine
(do I have this right?).

My sense is that if we're going to do this, we need to (as flying saucer
dev Peter Brant called it) "do it right" or we're going to hit
functionality walls in the kluge solution very quickly.

Azriel Fasten

unread,
Jan 28, 2010, 10:19:13 AM1/28/10
to jewishliturgy-discuss
On Thu, Jan 28, 2010 at 10:02 AM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
Hi,

Azriel Fasten wrote:

Page breaks is an issue that I discussed in the flying saucer mailing list here <http://markmail.org/message/vq2wjmehtkoreviw>. In short, right now, it simply renders to one huge page, but it shouldn't be difficult to set a size and render it to separate pages.

Page breaking is a nontrivial operation.  Consider that you could have multiple text streams on one page (the amount of space taken up by footnotes is variable, depending on how many footnotes the page has).  Wherever the layout engine is, it has to be aware of page breaks in order to do placement of text (I think).

Page breaking is already handled by flying saucer, just not for images. It should be a matter of informing the CSS layout engine of where the page should end, and resetting the Graphics2D to work on a different image after a page break.
 

If we go the flying saucer route, we need to do a few things: (1) see if we can plug in an alternate PDF library to iText:  That's because iText switched from an LGPL/MPL dual license to AGPL in the latest version.  Once flying saucer upgrades the version of iText it uses, we won't be able to link iText into our applet because of license incompatibility.  [we could in fact delay this step and continue using the old version of iText as long as flying saucer does]

I don't think they can upgrade their iText, since AGPL forces AGPL or GPLv3.
 
 (2) Introduce bidi into flying saucer and OpenType/complex layout into the PDF engine (do I have this right?).

Yes. This is currently possible with a PDF Graphics2D API, but the output wouldn't be semantic text, just drawn vectors.

Efraim Feinstein

unread,
Jan 28, 2010, 10:59:32 AM1/28/10
to jewishlitu...@googlegroups.com
I think we have 3 viable options so far, and I want to summarize where I
think we are before we get too far into the details of any particular
implementation:

(1) Output XHTML+CSS w/paging extensions. The JLPTEI transforms are
pretty much "free" with the browser output code. Most of the additional
work is in coding paged CSS. Use flying saucer to output to PDF.
Requires extensive modification to rendering libraries (Java).

(2) Output XSL-FO. Requires an additional set of JLPTEI transforms to
output XSL-FO. Use fo2odt to transform XSL-FO to ODT; User can export
to PDF through third party software such as OpenOffice.org; Requires
extensive modification to fo2odt stylesheets (XSLT). Has the advantage
that other XSL-FO processors may be able to handle the transform output
as well.

(3) Output Xe(La)TeX->PDF; In theory, transforms can be written
immediately (in reality, it will likely be done only after the generic
backend transform reaches alpha status). In alpha stage, the user would
be presented with a TeX file and could install and run XeTeX on their
own. As we get to beta stage, we could offer a PDF transform on the
cloud for a fee. While I haven't done a complete study of the economics
or available providers, it looks like this could be done for under
$2/PDF (not per copy) on Amazon EC2 <http://aws.amazon.com/ec2/>. We
would have to write a TeX compilation server. (Would non-technical
users pay for PDF output or would they just see that a free way exists,
then give up saying "it's too hard to install and use?") The great
white hope is that ExTeX <http://extex.org> takes off and allows us to
run TeX via applet.

Comments?

Ze'ev Clementson

unread,
Jan 28, 2010, 1:08:06 PM1/28/10
to jewishlitu...@googlegroups.com
Hi Efraim,

I think this is a good technical summary, but it leaves out the
"end-user experience" factors associated with each option (which I
think are important to list too). I've added my comments regarding
what I perceive to be the pros/cons of each (from both technical and
end-user perspectives) below.

On Thu, Jan 28, 2010 at 7:59 AM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> I think we have 3 viable options so far, and I want to summarize where I
> think we are before we get too far into the details of any particular
> implementation:
>
> (1) Output XHTML+CSS w/paging extensions.  The JLPTEI transforms are pretty
> much "free" with the browser output code.  Most of the additional work is in
> coding paged CSS.  Use flying saucer to output to PDF.  Requires extensive
> modification to rendering libraries (Java).

Pros:
1. Technically, this is very attractive as it would allow us to make
use of one target output for content and multiple CSS3 stylesheets for
presentation (XHTML, PDF, etc).
2. Would allow us to leverage upcoming CSS3 tools/utilities.
3. XHTML5 & CSS3 are a good "future direction" to be aligned with.

Cons:
1. There is a "non-trivial" amount of work involved in modifying the
rendering libraries to support Hebrew/bidi.
2. This option relies on iText which is problematic (license issues,
potential long-term support issues with both iText & Flying Saucer).
3. The end-user would get a PDF so it's pretty much WYSIWYG with no
ability to do any "tweaks".
4. It is uncertain when CSS3 will become 'mainstream' and
developer/end-user tools will be available.

> (2) Output XSL-FO.  Requires an additional set of JLPTEI transforms to
> output XSL-FO.  Use fo2odt to transform XSL-FO to ODT; User can export to
> PDF through third party software such as OpenOffice.org; Requires extensive
> modification to fo2odt stylesheets (XSLT).  Has the advantage that other
> XSL-FO processors may be able to handle the transform output as well.

Pros:
1. Allows us to output ODT (allowing end-user to do "tweaks" to the
document) as an intermediate step prior to generating PDF.
2. Much of the XSL-FO -> ODT work has been done already with fo2odt.
3. User can (potentially) use commercial XSL-FO processors (allowing
for a number of different "final" document outputs).
4. XSL-FO is a standard for document formatting, so we can build our
own document renderers in the future if that becomes a desired
direction.
5. There are also open source and commercial XSL-FO editors that can
be used to directly edit the XSL-FO document.
6. Since XSL-FO is an XML standard, it means that our entire
development process (from input source documents to output formats) is
consistent and easily manipulated using XSLT.

Cons:
1. The fo2odt code would still need to be enhanced, as it is incomplete
in its current form (however, it could be enhanced with just those
XSL-FO constructs that are needed).
2. The XSL-FO standard is not very "popular" and (while there are
existing commercial vendors) it is unclear whether 3rd-party
commercial products will provide viable options for producing quality
Hebrew/bidi texts.
3. The XSL-FO standard is quite rich in being able to specify most
types of formatting; however, it is not as rich as TeX (though this
may not be an issue, as it is "good enough" for basic documents and the
end-user will be able to "tweak" the resulting document if necessary).
4. There is no open-source solution currently available that would
allow us to directly generate PDFs (using fo2odt requires the manual
step of opening the ODT document in OpenOffice and saving it as a
PDF).

> (3) Output Xe(La)TeX->PDF; In theory, transforms can be written immediately
> (in reality, it will likely be done only after the generic backend transform
> reaches alpha status).  In alpha stage, the user would be presented with a
> TeX file and could install and run XeTeX on their own.  As we get to beta
> stage, we could offer a PDF transform on the cloud for a fee.  While I
> haven't done a complete study of the economics or available providers, it
> looks like this could be done for under $2/PDF (not per copy) on Amazon EC2
> <http://aws.amazon.com/ec2/>.  We would have to write a TeX compilation
> server.  (Would non-technical users pay for PDF output or would they just
> see that a free way exists, then give up saying "it's too hard to install
> and use?")  The great white hope is that ExTeX <http://extex.org> takes off
> and allows us to run TeX via applet.

Note: In my comments below, I use "TeX" as a "generic" reference to
TeX-derivative options.

Pros:
1. A TeX-based solution provides the ultimate flexibility in
customizing the output.
2. Can utilize cloud-based solution for users who either can't or
don't want to install TeX locally.
3. TeX has been around for years, is very robust/mature, and there are
a lot of products/utilities/add-ons.
4. Potentially, TeX could be converted to ODT for end-user "tweaking"
(see: http://ubuntuforums.org/showthread.php?t=1033441).
5. There are end-user editors for WYSIWYG editing of TeX documents, so
(potentially) TeX output could be "tweaked" by end-users.

Cons:
1. TeX is a huge beast. It is a non-trivial install and complicated
for an end-user to use.
2. More difficult for developers to learn.
3. TeX is "old technology".
4. Although you can do "anything" with TeX, we only require a small
subset of its functionality for the software-generated output, so it
is "overkill".
5. It is unclear whether the TeX -> ODT option is going to work well.
6. Most users would prefer to use a more "conventional" editor (like
OpenOffice or Word) to do "tweaking" of documents rather than a TeX
editor (even if it was a WYSIWYG editor).

> Comments?

In my opinion, Option#3 (TeX) is the safest option. It requires no
immediate development effort prior to utilizing it (Efraim has already
used it in the proof-of-concept); however, it is not a very "elegant"
solution. Option#1 (XHTML/CSS3) is attractive technically; however, it
requires us to be early adopters of CSS3 and makes us dependent on
both the iText and Flying Saucer projects (for which there are license
and long-term viability issues) and would probably require substantial
Java development work in order to be usable. There is also no
guarantee that the technical issues can be resolved in a robust and
timely manner. Option#2 (XSL-FO) also requires a coding effort before
it will be a usable option; however, it gives us more flexibility, as
it is an "intermediate" format (XSL-FO) that can be transformed into
multiple end-user "target" formats (ODT, PDF, PS, RTF) using both open
source (fo2odt) and commercial products.

- Ze'ev

Russel Neiss

unread,
Jan 28, 2010, 2:22:39 PM1/28/10
to jewishlitu...@googlegroups.com
>> (1) Output XHTML+CSS w/paging extensions.  The JLPTEI transforms are pretty
>> much "free" with the browser output code.  Most of the additional work is in
>> coding paged CSS.  Use flying saucer to output to PDF.  Requires extensive
>> modification to rendering libraries (Java).
>
> Pros:
> 1. Technically, this is very attractive as it would allow us to make
> use of one target output for content and multiple CSS3 stylesheets for
> presentation (XHTML, PDF, etc).
> 2. Would allow us to leverage upcoming CSS3 tools/utilities.
> 3. XHTML5 & CSS3 are a good "future direction" to be aligned with.
>
> Cons:
> 1. There is a "non-trivial" amount of work involved in modifying the
> rendering libraries to support Hebrew/bidi.
> 2. This option relies on iText which is problematic (license issues,
> potential long-term support issues with both iText & Flying Saucer).
> 3. The end-user would get a PDF so it's pretty much WYSIWYG with no
> ability to do any "tweaks".
> 4. It is uncertain when CSS3 will become 'mainstream' and
> developer/end-user tools will be available.

Maybe I'm missing something -- but can't you call the xml using php
then convert it to pdf using the following -- http://www.fpdf.org/

Ze'ev Clementson

unread,
Jan 28, 2010, 3:08:53 PM1/28/10
to jewishlitu...@googlegroups.com
Hi Russel,

I don't know anything about fpdf and I don't know PHP; however, the
following from the FAQ would appear to rule it out for Hebrew:
"Don't use UTF-8 encoding. Standard FPDF fonts use ISO-8859-1 or Windows-1252."

It does apparently support "cp1255 (Hebrew)" but we would be using
Unicode. However, as I said, I don't use PHP, so I can't comment
further.

- Ze'ev

Efraim Feinstein

unread,
Jan 28, 2010, 3:11:45 PM1/28/10
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> "Don't use UTF-8 encoding. Standard FPDF fonts use ISO-8859-1 or Windows-1252."
>
> It does apparently support "cp1255 (Hebrew)" but we would be using
> Unicode. However, as I said, I don't use PHP, so I can't comment
> further.
>

TCPDF (http://www.tcpdf.org) modified FPDF for Unicode. They even claim
support for RTL and OpenType (although I couldn't tell you if that means
"OpenType" with complex layouts or not).

I think Azriel tried it and I think it broke down somewhere, but I can't
recall where.

Azriel Fasten

unread,
Jan 28, 2010, 4:51:43 PM1/28/10
to jewishlitu...@googlegroups.com


On Thu, Jan 28, 2010 at 2:22 PM, Russel Neiss <russel...@gmail.com> wrote:

Maybe I'm missing something -- but can't you call the xml using php
then convert it to pdf using the following -- http://www.fpdf.org/

 
IIRC, it does not support OpenType. See homepage: "TrueType, Type1 and encoding support".

Azriel Fasten

unread,
Jan 28, 2010, 4:53:30 PM1/28/10
to jewishlitu...@googlegroups.com


On Thu, Jan 28, 2010 at 3:11 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
Ze'ev Clementson wrote:
"Don't use UTF-8 encoding. Standard FPDF fonts use ISO-8859-1 or Windows-1252."

It does apparently support "cp1255 (Hebrew)" but we would be using
Unicode. However, as I said, I don't use PHP, so I can't comment
further.
 

TCPDF (http://www.tcpdf.org) modified FPDF for Unicode.  They even claim support for RTL and OpenType (although I couldn't tell you if that means "OpenType" with complex layouts or not).

I think Azriel tried it and I think it broke down somewhere, but I can't recall where.

It claimed to support OpenType, but it does not support Ezra. I was unable to definitively determine why. If you can get it to work, I would definitely port it to Java.

Efraim Feinstein

unread,
Jan 28, 2010, 4:56:58 PM1/28/10
to jewishlitu...@googlegroups.com
Azriel,

Azriel Fasten wrote:
>
> It claimed to support OpenType, but it does not support Ezra. I was
> unable to definitively determine why. If you can get it to work, I
> would definitely port it to Java.

Did you go through the instructions for incorporating new fonts?
<http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf_fonts>

What was the failure mode? (Do you still have a PDF example lying around?)

Azriel Fasten

unread,
Jan 28, 2010, 5:11:46 PM1/28/10
to jewishlitu...@googlegroups.com


On Thu, Jan 28, 2010 at 4:56 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
Azriel,

Did you go through the instructions for incorporating new fonts?  <http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf_fonts>

Yes. There was a message in the conversion log to the effect of "No kerning data found". I believe this was the cause of failure. 

What was the failure mode?  (Do you still have a PDF example lying around?)

It simply rendered the vowels in sequence with the letters. I do have a PDF, but I won't have access to it until Sunday.

