"knowing" LaTeX from a generic publishing aspect

Charles P. Schaum

Aug 19, 2011, 2:12:15 PM

Dear TeXers,

As for "knowing" TeX, LaTeX, etc., I see in the publishing world that
things are headed in the direction of Scribe's Well Formed Document
Workflow, or home-grown iterations thereof. The idea is to use a rigorous
DTD in a CMS like Drupal or Philo to maintain an XML database. People can
tag Word files and submit them, whereupon they are run through Perl
scripts and pulled into the database.

From there, one can pull the XML into Quark, InDesign, and so on, and
even modify and export back to XML. From there you can do ebooks, etc.

ConTeXt has a multi-target approach built in. Not sure if the
implementation can be said to be as seamless in LaTeX via tex4ht, etc. If
LaTeX is to join this party, it has to become less of a text/code hybrid,
more "tagged" and formalized, and probably less friendly to creative
workarounds. That may cramp some people's style.

Right now, LaTeX is a niche tool for specialized disciplines. People need
science books, and only a select few people using their preferred tools
can write them well. Thus, Springer and others have a workflow that meets
that setup. It is a peer-review situation that moves most of the editing
and typesetting (thus, the cost) off the publisher. The style files do
the rest.

This will not work in the humanities or in general publishing. It is
about all one can do to get humanities authors to tag Word files
properly. I have tried. XML is the future.

Biblical and classical studies can benefit from LaTeX, but I tried that
and got the big NO when I did a master's thesis, even though Word is not
portable with RTL fonts beyond the Uniscribe compositing engine in
Windows and many profs use Macs. Still, they look at LaTeX and get a
little weirded out. But you might have another niche there.

Maybe the solution would be to work out a front end like Classical Text
Editor (there was once also something with DOS and EDMAC or ledmac) and
have biblatex and its contributed styles (especially Chicago in the US)
as the bibliography part. Probably a scaled-down distro would be the key.
But LaTeX in general publishing will probably fail from a business aspect
based on the human-intensive costs.

As an automated back-end, however, LaTeX might have possibilities. See
also:

http://www.scribenet.com/services/latex
http://www.scribenet.com/workflow/well-formed-document-workflow
http://www.scribedata.com/
http://learning.scribenet.com/about/wfdw/demo
http://learning.scribenet.com/node/609

Charles
--
Remove nospam to reply

OKB (not okblacke)

Aug 19, 2011, 3:15:51 PM

Charles P. Schaum wrote:

> ConTeXt has a multi-target approach built in. Not sure if the
> implementation can be said to be as seamless in LaTeX via tex4ht,
> etc. If LaTeX is to join this party, it has to become less of a
> text/code hybrid, more "tagged" and formalized, and probably less
> friendly to creative work- arounds. That may cramp some people's
> style.

I believe this is a (possibly THE) key problem that makes TeX seem
backwards and old-fashioned from the point of view of modern software.
It goes back to Knuth's statement about TeX, that it is "designed for
making beautiful books" --- it's not designed for making easily parsable
document files. But in this day and age, it is absolutely essential for
documents to be portable, queryable, reformattable, etc. LaTeX improves
on TeX in this regard, but not enough (and it doesn't stop you from
using all the plain-TeX horrors).

I've recently been looking at some tools that attempt to convert
LaTeX to other formats (e.g., PlasTeX, Pandoc). It's heartbreaking to
see what these programs have to go through just to get the text of a
document out of TeX.

It would really be great if TeX had an API for its document
CREATION abilities (i.e., making beautiful books) that was separate from
the input file FORMAT, so you could create your document tree (in XML or
whatever you like) description and feed it in via the API to get your
PDF (or HTML, or whatever). Although TeX has some advantages, its
failure to separate these two aspects of the task makes me more and more
skeptical that it can survive outside of a math-based niche.
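
To make the idea concrete, here is a minimal sketch in Python of what such a separation might look like. Everything in it is an assumption for illustration: the `Section` and `Para` classes and the `to_latex` and `to_html` functions are hypothetical names, not part of TeX or of any existing library. The document tree is built in memory, and TeX is only one of several back ends.

```python
from dataclasses import dataclass, field
from html import escape as html_escape
from typing import List

# Hypothetical document tree: the names below are illustrative only.
@dataclass
class Para:
    text: str

@dataclass
class Section:
    title: str
    paras: List[Para] = field(default_factory=list)

def tex_escape(text: str) -> str:
    # Only a few special characters are handled; a real back end needs more.
    for ch in "&%$#_":
        text = text.replace(ch, "\\" + ch)
    return text

def to_latex(sections: List[Section]) -> str:
    """Render the tree as a LaTeX source string (the TeX back end)."""
    out = [r"\documentclass{article}", r"\begin{document}"]
    for s in sections:
        out.append(r"\section{" + tex_escape(s.title) + "}")
        for p in s.paras:
            out.extend([tex_escape(p.text), ""])
    out.append(r"\end{document}")
    return "\n".join(out)

def to_html(sections: List[Section]) -> str:
    """Render the same tree as HTML, without involving TeX at all."""
    out = []
    for s in sections:
        out.append("<h1>" + html_escape(s.title) + "</h1>")
        out.extend("<p>" + html_escape(p.text) + "</p>" for p in s.paras)
    return "\n".join(out)

doc = [Section("Costs & benefits", [Para("TeX is used for output only.")])]
print(to_latex(doc))
print(to_html(doc))
```

Turning the LaTeX string into a PDF would still mean running a TeX engine on it, which is outside this sketch.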

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown

Peter Flynn

Aug 20, 2011, 6:57:27 PM

On 19/08/11 19:12, Charles P. Schaum wrote:
> As for "knowing" TeX, LaTeX, etc., I see in the publishing world that
> things are headed in the direction of Scribe's Well Formed Document
> Workflow, or home-grown iterations thereof. The idea is to use a rigorous
> DTD in a CMS like Drupal or Philo to maintain an XML database. People can
> tag Word files and submit them, whereupon they are run through Perl
> scripts and pulled into the database.
>
> From there, one can pull the XML into Quark, InDesign, and so on,

LaTeX should also be such a target. Transforming XML to LaTeX is the
easy bit.

> If
> LaTeX is to join this party, it has to become less of a text/code hybrid,
> more "tagged" and formalized, and probably less friendly to creative work-
> arounds. That may cramp some people's style.

Possibly, but I think this approach conflates two things: the editing
interface and the file format.

[La]TeX, probably uniquely among current typesetting systems, does not
have its own editing interface. It is therefore not possible for any
central authority to mandate what you can do in the code or how you can
do it. This has worked both for and against LaTeX: it has made it easy
to get started with a simple text editor, and it has allowed
sophisticated users to perform deep surgery on the innards; but now it
is making it hard to attract new users, who expect a full
typographically synchronous interface, and will not use any program that
does not have one.

[Avoid the term WYSIWYG, as it is misleading.]

The file format can only be parsed by TeX: no-one has ever written
another program that does this to the same extent. It is currently not
possible to transform arbitrary LaTeX documents into XML unassisted,
without requiring very significant analytic resources.
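
A small illustration of why, offered only as a sketch: the Python function below (a hypothetical `naive_text`, not taken from any converter) strips markup by pattern matching alone. It copes with ordinary commands but garbles the text as soon as the author defines a macro, because only TeX itself actually expands macros. The macro `\bk` is made up for the example.

```python
import re

def naive_text(latex: str) -> str:
    """Extract text from LaTeX by pattern matching alone (deliberately naive)."""
    # Unwrap one-argument commands such as \emph{...}, keeping the argument.
    text = re.sub(r"\\[A-Za-z]+\{([^{}]*)\}", r"\1", latex)
    # Delete any remaining control words.
    text = re.sub(r"\\[A-Za-z]+", "", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())

simple = r"Some \emph{important} text."
homebrew = r"\def\bk{beautiful books} TeX makes \bk{} daily."

print(naive_text(simple))    # Some important text.
print(naive_text(homebrew))  # books TeX makes daily.
```

TeX would typeset the second input as "TeX makes beautiful books daily."; the pattern matcher has no way of knowing that.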

> Right now, LaTeX is a niche tool for specialized disciplines. People need
> science books, and only a select few people using their preferred tools
> can write them well. Thus, Springer and others have a workflow that meets
> that setup. It is a peer-review situation that moves most of the editing
> and typesetting (thus, the cost) off the publisher. They style files do
> the rest.

In some senses this is a rather narrow view. LaTeX is widely used
outside the natural sciences, something which scientists appear to be
unaware of. But LaTeX is not more widely used for three reasons:

1. the lack of a reliable typographically synchronous editor
2. the lack of good training for authors, to get them to use the
markup and packages provided instead of reinventing the wheel
3. a lack of marketing, which I have already posted about.

> This will not work in the humanities or in general publishing. It is
> about all one can do to get humanities authors to tag Word files
> properly. I have tried.

I have typeset many books for my institution and for publishers, written
by Humanities authors who took to LaTeX like a duck to water once they
were given a proper introduction. Too often their first encounter is via
a scientist colleague who "explains" it in terminology utterly foreign
to them.

> XML is the future.

Absolutely. But the XML can be typeset with LaTeX.

> Biblical and classical studies can benefit from LaTeX, but I tried that
> and got the big NO when I did a master's thesis. Even though Word is not
> portable with RTL fonts beyond the Uniscribe compositing engine in
> Windows and many profs use Macs. Still, they look at LaTeX and get a
> little weirded out. But you might have there another niche.

Right. Because they are shown raw markup and plaintext editors.

> Maybe the solution would be to work out a front end like Classical Text
> Editor (there was once also something with DOS and EDMAC or ledmac) and
> have biblatex and its contributed styles (especially Chicago in the US)
> as the bibliography part. Probably a scaled-down distro would be the key.
> But LaTeX in general publishing will probably fail from a business aspect
> based on the human-intensive costs.
>
> As an automated back-end, however, LaTeX might have possibilities.

There are several things that would need to be done, starting with the
hardest:

1. Write a new version of TeX that outputs XML. The invention of pdftex
was the first move away from DVI: we need another one to move to XML.
This is IMHO the ONLY way to get a transformation that is 100%
guaranteed to have taken everything into account. The hard bits are
defining the target schema, and rewriting TeX :-) Once the output is in
XML, a second stage (eg in XSLT) can transform the document to a more
commonly-used structure.

2. Write, beg, borrow, steal, or otherwise acquire a typographically
synchronous editor that will complete what LyX has started. It would
have to avoid completely any sign of ERT, and would need to have the
entire repertoire of CTAN packages built in, so that it would be able to
represent accurately any LaTeX document that did not have homebrew
macros that tinkered with plain TeX internals.

3. By far the easiest: switch all authors to an XML editor, using
whatever DTD/Schema is appropriate, and using LaTeX as the back-end
formatter, via an XSL transformation (instead of the XSL-FO solution
used in some systems now). This is what everyone working with XML and
LaTeX already does. The only problem is...guess what?...the lack of a
suitably usable typographically synchronous editor.
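
As a rough sketch of the third route, here is the transformation step written in Python rather than XSLT, using only the standard library. The element names (`article`, `title`, `para`, `emph`) are a made-up vocabulary for the example, and the mapping to LaTeX is one assumption among many possible ones.

```python
import xml.etree.ElementTree as ET

# Characters that LaTeX treats specially in running text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def escape(text):
    """Escape LaTeX specials one character at a time (None becomes empty)."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in (text or ""))

def inline(elem):
    """Render an element's mixed content: its text plus inline children."""
    out = [escape(elem.text)]
    for child in elem:
        if child.tag == "emph":
            out.append(r"\emph{" + inline(child) + "}")
        else:
            out.append(inline(child))  # unknown inline element: keep its text
        out.append(escape(child.tail))
    return "".join(out)

def to_latex(xml_source):
    """Map the hypothetical vocabulary to a complete LaTeX document."""
    root = ET.fromstring(xml_source)
    lines = [r"\documentclass{article}", r"\begin{document}"]
    for child in root:
        if child.tag == "title":
            lines.append(r"\section*{" + inline(child) + "}")
        elif child.tag == "para":
            lines.extend([inline(child), ""])
    lines.append(r"\end{document}")
    return "\n".join(lines)

doc = ("<article><title>Costs &amp; benefits</title>"
       "<para>XML is <emph>the</emph> future, 100% so.</para></article>")
print(to_latex(doc))
```

The point of the sketch is that the author never sees a backslash: escaping and command choice live entirely in the transformation.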

Verb. sap.
Lots of people here will dismiss this on the basis that this isn't what
TeX was written for, and that it does the author's soul good to have to
understand the markup and realise the complexity of what is happening
underneath the commands that get typed. I won't disagree: for those with
the time and aptitude for learning interior technologies, this may well
be true. But most authors just want to write, with the minimum of
interference from the system. And most of them, IMHE, do appreciate that
there are boundaries, and things that must be learned: but not down to
the level that authoring in plain text markup requires.

<plug>More on this and related stuff at next month's XML Summer School
in Oxford: http://www.xmlsummerschool.com</plug>

///Peter

Peter Flynn

Aug 20, 2011, 7:17:49 PM

On 19/08/11 20:15, OKB (not okblacke) wrote in comp.text.tex:
[...]

> It would really be great if TeX had an API for its document
> CREATION abilities (i.e., making beautiful books) that was separate from
> the input file FORMAT, so you could create your document tree (in XML or
> whatever you like) description and feed it in via the API to get your
> PDF (or HTML, or whatever).

I am hoping to discuss this next month at the XML Summer School I just
mentioned. I raised it with several people last year, and they were very
supportive of the idea. Among the options would be a LaTeX output option
for XSLT (or more likely, an unofficial one that a processor would be
free to support or ignore), which would allow the specification of the
target tree in a way similar to the XHTML output option -- result tree
fragments in XML syntax plus xsl:environment and xsl:command. Or
something like. If I have time, I will post a strawman proposal to the
xml-tex-pdf mailing list (this message CC'd).

///Peter

OKB (not okblacke)

Aug 20, 2011, 8:50:51 PM

Peter Flynn wrote:

> There are several things that would need to be done: starting with
> the hardest:
>
> 1. Write a new version of TeX that outputs XML. The invention of
> pdftex was the first move away from DVI: we need another one to
> move to XML. This is IMHO the ONLY way to get a transformation that
> is 100% guaranteed to have taken everything into account. The hard
> bits are defining the target schema, and rewriting TeX :-) Once
> the output is in XML, a second stage (eg in XSLT) can transform the
> document to a more commonly-used structure.

Hmmmm, why do you think this is necessary? To me it seems the
wrong way around. Frankly, I think TeX is beyond hope as an input
language. It's just not rigorously structured enough, and allows too
much messiness. It seems to me the better approach is the opposite:
create all your nice tools using XML or whatever sort of input format
you want, and then map those to TeX for output only.

From my perspective, basically the only good thing about TeX is
that it produces nice-looking PDFs. Even there, it seems that for most
practical purposes it's limited to producing nice-looking PDFs only for
certain kinds of documents (e.g., trying to get wrappable figures or the
like in a completely general way is a real problem). Therefore, my
instinct is to try to make use of this core strength of TeX by using it
only as the last step in some toolchain --- you map your document markup
language to TeX at the very end, just to get a PDF. If you don't want a
PDF (you want HTML or something), you don't use TeX at all.

So what I see as the main task is getting a big enough arsenal of
TeX packages that make things look right, and making sure they work
completely seamlessly with one another. This means no conflicts or
unexpected problems when mixing multiple packages, and no gaps of the
type where "you can't use structure X inside/next to/around structure
Y".

Oleg Paraschenko

Aug 21, 2011, 8:17:03 AM

to XML-T...@lists.ucc.ie

Hello Peter,

On 21 Aug., 01:17, Peter Flynn <pe...@silmaril.ie> wrote:
> On 19/08/11 20:15, OKB (not okblacke) wrote in comp.text.tex:
> [...]
>
> > It would really be great if TeX had an API for its document

> > CREATION abilities ...., so you could create your document
> > tree (in XML or whatever you like) description ....
...
> Among the options would be a LaTeX output option ...
> ... result tree fragments in XML syntax plus xsl:environment

> and xsl:command. Or something like.

Please look also at TeXML, it is based on a very similar idea:

http://getfo.org/texml/

TeXML: Resurrecting TeX in the XML world
http://www.tug.org/TUGboat/tb28-1/tb88parashchenko.pdf

A generalization:

Generate TeX documents using pdfscript
https://www.tug.org/members/TUGboat/tb31-3/tb99parashchenko.pdf

> If I have time, I will post a strawman proposal to the
> xml-tex-pdf mailing list (this message CC'd).
>
> ///Peter

--
Oleg

--
Oleg Parashchenko olpa@ http://uucode.com/
http://uucode.com/blog/ XML, TeX, Python, Mac, Chess

Peter Flynn

Aug 21, 2011, 1:11:18 PM

On 21/08/11 01:50, OKB (not okblacke) wrote:
> Peter Flynn wrote:
>
>> There are several things that would need to be done: starting with
>> the hardest:
>>
>> 1. Write a new version of TeX that outputs XML. The invention of
>> pdftex was the first move away from DVI: we need another one to
>> move to XML. This is IMHO the ONLY way to get a transformation that
>> is 100% guaranteed to have taken everything into account. The hard
>> bits are defining the target schema, and rewriting TeX :-) Once
>> the output is in XML, a second stage (eg in XSLT) can transform the
>> document to a more commonly-used structure.
>
> Hmmmm, why do you think this is necessary? To me it seems the
> wrong way around.

Yes, it is. But there will always be people who want the control right
down to the last little bit.

> Frankly, I think TeX is beyond hope as an input
> language. It's just not rigorously structured enough, and allows too
> much messiness. It seems to me the better approach is the opposite:
> create all your nice tools using XML or whatever sort of input format
> you want, and then map those to TeX for output only.

Right.

> From my perspective, basically the only the good thing about TeX is
> that it produces nice-looking PDFs. Even there, it seems that for most
> practical purposes it's limited to producing nice-looking PDFs only for
> certain kinds of documents (e.g., trying to get wrappable figures or the
> like in a completely general way is a real problem).

It's a typesetter, not a page-layout program.

> Therefore, my
> instinct is to try to make use of this core strength of TeX by using it
> only as the last step in some toolchain --- you map your document markup
> language to TeX at the very end, just to get a PDF. If you don't want a
> PDF (you want HTML or something), you don't use TeX at all.
>
> So what I see as the main task is getting a big enough arsenal of
> TeX packages that make things look right, and making sure they work
> completely seamlessly with one another. This means no conflicts or
> unexpected problems when mixing multiple packages, and no gaps of the
> type where "you can't use structure X inside/next to/around structure
> Y".

Unfortunately, even if you can get that in the package coordination,
implementing the same rules inside an XML editor is hard, if you want to
be able to let people put anything anywhere (like wordprocessors do).
The moment you start needing to impose structural constraints, you need
some VERY comprehensive stylesheets for the editor so that you can
reasonably cover all eventualities. Plus you need to have stylesheets
for every common Schema/DTD that people are likely to want to use, plus
a really good stylesheet-creator that will let you implement new ones at
a moment's notice.
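
To show the kind of rule an editor would have to enforce on every keystroke, here is a toy content-model check in Python. The table `ALLOWED_CHILDREN` is invented for the example; a real editor would take these rules from the DTD or Schema rather than from a hand-written dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each parent may contain.
ALLOWED_CHILDREN = {
    "article": {"title", "section"},
    "section": {"title", "para", "figure"},
    "para": {"emph"},
    "title": {"emph"},
    "figure": set(),
    "emph": set(),
}

def violations(elem, path=""):
    """Return a list of messages for children that the model does not allow."""
    here = path + "/" + elem.tag
    allowed = ALLOWED_CHILDREN.get(elem.tag)
    if allowed is None:
        return [here + ": unknown element"]
    problems = []
    for child in elem:
        if child.tag not in allowed:
            problems.append("%s: <%s> not allowed here" % (here, child.tag))
        else:
            problems.extend(violations(child, here))
    return problems

good = ET.fromstring(
    "<article><title>T</title><section><para>text</para></section></article>")
bad = ET.fromstring(
    "<article><title>T</title><section><para>text <figure/></para></section></article>")

print(violations(good))  # []
print(violations(bad))   # ['/article/section/para: <figure> not allowed here']
```

Detecting the violation is the easy half; presenting it to an author who just wants a figure in the middle of a paragraph is where the comprehensive stylesheets come in.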

///Peter

OKB (not okblacke)

Aug 21, 2011, 2:36:34 PM

Oleg Paraschenko wrote:

> Please look also at TeXML, it is based on a very similar idea:
>
> http://getfo.org/texml/

Interesting. Is this a purely syntactic mapping between XML and
TeX? From looking at the examples, it looks like you've created a one-
to-one mapping where every TeX command, environment, argument, etc. is
mapped to an XML element, so the XML duplicates the structural
arrangement of TeX tags. This is in itself quite a feat, but I think
what I'm looking for is a system where you mark up semantic units, not
just what TeX wants as syntactic units.

OKB (not okblacke)

Aug 21, 2011, 8:39:44 PM

Peter Flynn wrote:

>> From my perspective, basically the only the good thing
>> about TeX is
>> that it produces nice-looking PDFs. Even there, it seems that for
>> most practical purposes it's limited to producing nice-looking
>> PDFs only for certain kinds of documents (e.g., trying to get
>> wrappable figures or the like in a completely general way is a
>> real problem).
>
> It's a typesetter, not a page-layout program.

Yes, so everyone says, but that's part of the problem, because the
typesetting facilities aren't exposed except as part of a page layout,
are they? Everyone likes to talk about how TeX is really good at
formatting paragraphs and doing the correct ligatures and spacing and
whatnot between various glyphs, but is there any way to leverage those
abilities as part of any larger document? Could you "plug in" TeX to a
page layout program and tell it to do its paragraph-formatting stuff
within the context of, say, a column on a page, or a text box, or
whatever?

As far as I've seen, TeX IS a page layout program, whether it wants
to be or not, because it creates pages; it's just a crappy one, because
it doesn't give you much control over the layout.

David Kastrup

Aug 21, 2011, 8:52:57 PM

"OKB (not okblacke)" <brenNOS...@NObrenSPAMbarn.net> writes:

> Peter Flynn wrote:
>
>>> From my perspective, basically the only the good thing
>>> about TeX is
>>> that it produces nice-looking PDFs. Even there, it seems that for
>>> most practical purposes it's limited to producing nice-looking
>>> PDFs only for certain kinds of documents (e.g., trying to get
>>> wrappable figures or the like in a completely general way is a
>>> real problem).
>>
>> It's a typesetter, not a page-layout program.
>
> Yes, so everyone says, but that's part of the problem, because the
> typesetting facilities aren't exposed except as part of a page layout,
> are they? Everyone likes to talk about how TeX is really good at
> formatting paragraphs and doing the correct ligatures and spacing and
> whatnot between various glyphs, but is there any way to leverage those
> abilities as part of any larger document? Could you "plug in" TeX to a
> page layout program and tell it to do its paragraph-formatting stuff
> within the context of, say, a column on a page, or a text box, or
> whatever?
>
> As far as I've seen, TeX IS a page layout program, whether it wants
> to be or not, because it creates pages; it's just a crappy one, because
> it doesn't give you much control over the layout.

Oh, but it does. It is just not much fun programming output routines.

--
David Kastrup
UKTUG FAQ: <URL:http://www.tex.ac.uk/cgi-bin/texfaq2html>

Charles P. Schaum

Aug 22, 2011, 12:01:03 AM

On Sat, 20 Aug 2011 23:57:27 +0100, Peter Flynn wrote:

> There are several things that would need to be done:

I have been wondering how something like this can be done for some time.
Thanks for presenting these well-structured ideas. It definitely
registers as an "oh, yeah" with me.

Granted, from the standpoint of an editor, I am woefully inadequate to
wrap my head around the details. The more I look around in the LaTeX
kernel and try to understand TeX by Topic, the more I see that it is much
easier to be a basic user than to get too far under the hood. I wish
there were a more visual description of "what goes on." As I get older,
my abstract abilities seem to be waning in favor of visual learning.
Pretty soon, I'll be playing with blocks...

I wish for a modularized API where you define an object with certain
properties and stick this text in it, etc. I really like the memoir class
in that regard. Memoir and biblatex allow me to just get things done.

Oleg Paraschenko

Aug 22, 2011, 2:07:06 AM

Hello Brendan,

On 21 Aug., 20:36, "OKB (not okblacke)"
<brenNOSPAMb...@NObrenSPAMbarn.net> wrote:
> Oleg Paraschenko wrote:
> > Please look also at TeXML, it is based on a very similar idea:
>
> >http://getfo.org/texml/
>
> Interesting. Is this a purely syntactic mapping between XML and
> TeX?

Formally, yes.

> From looking at the examples, it looks like you've created a one-
> to-one mapping where every TeX command, environment, argument, etc. is
> mapped to an XML element, so the XML duplicates the structural
> arrangement of TeX tags.

But in practice, I translate nearly 1-to-1 from XML structural units
to LaTeX environments, thus preserving semantics. Something like:

(XML)

<introduction>
<InformalTitle>About smth</InformalTitle>
<para>para1</para>
<para>para2</para>
</introduction>

-> (TeXML)

<env name="introduction">
<cmd name="InformalTitle"><parm>About smth</parm></cmd>
<env name="para">para1</env>
<env name="para">para2</env>
</env>

-> (TeX)

\begin{introduction}
\InformalTitle{About smth}
\begin{para}para1\end{para}
\begin{para}para2\end{para}
\end{introduction}
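
The last step can be sketched in a few lines of Python. This is not the actual TeXML processor, only a toy serializer (a hypothetical `texml_to_tex`) for the three element types used above; it does no escaping and ignores everything else in the TeXML vocabulary.

```python
import xml.etree.ElementTree as ET

def texml_to_tex(elem):
    """Serialize a tiny subset of TeXML-like markup (env, cmd, parm) to TeX."""
    if elem.tag == "env":
        name = elem.get("name")
        # Keep the element's own text, then each child followed by its tail.
        body = (elem.text or "") + "".join(
            texml_to_tex(child) + (child.tail or "") for child in elem
        )
        return "\\begin{%s}%s\\end{%s}" % (name, body, name)
    if elem.tag == "cmd":
        # Each <parm> child becomes one braced argument.
        args = "".join("{" + (p.text or "") + "}" for p in elem.findall("parm"))
        return "\\" + elem.get("name") + args
    raise ValueError("unsupported element: " + elem.tag)

sample = """<env name="introduction">
<cmd name="InformalTitle"><parm>About smth</parm></cmd>
<env name="para">para1</env>
<env name="para">para2</env>
</env>"""

print(texml_to_tex(ET.fromstring(sample)))
```

Because the mapping is purely syntactic, the meaning of `introduction` or `InformalTitle` has to be supplied on the LaTeX side.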

> This is in itself quite a feat, but I think
> what I'm looking for is a system where you mark up semantic units, not
> just what TeX wants as syntactic units.
>
> --
> --OKB (not okblacke)
> Brendan Barnwell
> "Do not follow where the path may lead. Go, instead, where there is
> no path, and leave a trail."
> --author unknown

--
Oleg

Charles P. Schaum

Aug 22, 2011, 9:53:10 AM

On Sun, 21 Aug 2011 23:07:06 -0700, Oleg Paraschenko wrote:

> Hello Brendan,
>
> On 21 Aug., 20:36, "OKB (not okblacke)"
> <brenNOSPAMb...@NObrenSPAMbarn.net> wrote:
>> Oleg Paraschenko wrote:
>> > Please look also at TeXML, it is based on a very similar idea:
>>
>> >http://getfo.org/texml/
>>
>> Interesting. Is this a purely syntactic mapping between
>> XML and
>> TeX?
>
> Formally, yes.
>

That is just plain cool. I was wondering if something like that existed.

Peter Flynn

Aug 22, 2011, 2:33:12 PM

It is very clever, but inefficient for authoring. I habitually write or
edit in DocBook, TEI, and other large and well-known vocabularies. The
semantic match to LaTeX is in many cases very close, and XSLT is a good
tool for transforming to LaTeX. But it is common for authors and
publishers to require additional features or behaviours to those
envisaged by the creators of the vocabulary. Both DocBook and TEI can be
extended to provide this, and mappings written in XSLT to handle them.
I'm not sure I want to introduce another transformation into the
pipeline, but for those without an established workflow I think TeXML
has a lot to offer.

The critical point is always to get the document into XML as early in
the process as possible. From then on it is controllable.

///Peter

Peter Flynn

Aug 22, 2011, 2:42:42 PM

On 22/08/11 01:39, OKB (not okblacke) wrote:
> Peter Flynn wrote:
>
>>> From my perspective, basically the only the good thing
>>> about TeX is
>>> that it produces nice-looking PDFs. Even there, it seems that for
>>> most practical purposes it's limited to producing nice-looking
>>> PDFs only for certain kinds of documents (e.g., trying to get
>>> wrappable figures or the like in a completely general way is a
>>> real problem).
>>
>> It's a typesetter, not a page-layout program.
>
> Yes, so everyone says, but that's part of the problem, because the
> typesetting facilities aren't exposed except as part of a page layout,
> are they? Everyone likes to talk about how TeX is really good at
> formatting paragraphs and doing the correct ligatures and spacing and
> whatnot between various glyphs, but is there any way to leverage those
> abilities as part of any larger document? Could you "plug in" TeX to a
> page layout program and tell it to do its paragraph-formatting stuff
> within the context of, say, a column on a page, or a text box, or
> whatever?

This has been done many times, but only as part of a commercial product.
The TeX engine has formed part of a lot of systems, from the early
Arbortext Publisher to the 3B2 typesetter. It's hard to do because of the
way Knuth wrote the program. The (apparently) defunct Textures system
from Blue Sky was the first TeX editor to implement synchronous
re-typesetting of the edit window as you typed, as far as I know.

It is also exposed, although not quite as you imply, in the Emacs C-c
C-r command, which runs TeX on the highlighted region, and (I believe)
an equivalent feature in AUCTeX.

> As far as I've seen, TeX IS a page layout program, whether it wants
> to be or not, because it creates pages; it's just a crappy one, because
> it doesn't give you much control over the layout.

ConTeXt does. But otherwise it is as David said.

///Peter

Khaled Hosny

Aug 22, 2011, 2:49:17 PM

On Aug 21, 12:57 am, Peter Flynn <pe...@silmaril.ie> wrote:
> 1. Write a new version of TeX that outputs XML. The invention of pdftex
> was the first move away from DVI: we need another one to move to XML.
> This is IMHO the ONLY way to get a transformation that is 100%
> guaranteed to have taken everything into account. The hard bits are
> defining the target schema, and rewriting TeX :-) Once the output is in
> XML, a second stage (eg in XSLT) can transform the document to a more
> commonly-used structure.

Recent versions of ConTeXt can output XML[1] without the need to
rewrite TeX, thanks to LuaTeX capabilities.

[1] http://wiki.contextgarden.net/epub

Regards,
Khaled

OKB (not okblacke)

Aug 22, 2011, 2:56:05 PM

Oleg Paraschenko wrote:

> But in practice, I translate nearly 1-to-1 from XML structural units
> to LaTeX environments, thus preserving semantics. Something like:
>
> (XML)
>
> <introduction>
> <InformalTitle>About smth</InformalTitle>
> <para>para1</para>
> <para>para2</para>
> </introduction>
>
> -> (TeXML)
>
> <env name="Introduction">
> <cmd name="Informaltitle"><parm>About smth</parm></cmd>
> <env name="para">para1</env>
> <env name="para">para2</env>
> </env>
>
> -> (TeX)
>
> \begin{introduction}
> \InformalTitle{About smth}
> \begin{para}para1\end{para}
> \begin{para}para2\end{para}
> \end{introduction}

I see. And then you have specialized LaTeX packages set up to
interpret these environments?

Khaled Hosny

Aug 22, 2011, 2:57:28 PM

On Aug 19, 9:15 pm, "OKB (not okblacke)"
<brenNOSPAMb...@NObrenSPAMbarn.net> wrote:
> It would really be great if TeX had an API for its document
> CREATION abilities (i.e., making beautiful books) that was separate from
> the input file FORMAT, so you could create your document tree (in XML or
> whatever you like) description and feed it in via the API to get your
> PDF (or HTML, or whatever). Although TeX has some advantages, its
> failure to separate these two aspects of the task makes me more and more
> skeptical that it can survive outside of a math-based niche.

LuaTeX can be, more or less, used that way:
http://wiki.luatex.org/index.php/TeX_without_TeX

Regards,
Khaled

Peter Flynn

Aug 22, 2011, 3:05:47 PM

On 19/08/11 20:15, OKB (not okblacke) wrote:
[...]

> It would really be great if TeX had an API for its document
> CREATION abilities (i.e., making beautiful books) that was separate from
> the input file FORMAT, so you could create your document tree (in XML or
> whatever you like) description and feed it in via the API to get your
> PDF (or HTML, or whatever).

Just re-reading this...this is basically what an XSLT transformation
does. I feed an XML file into it, and it works like an API: it writes
LaTeX and out comes a PDF.

> Although TeX has some advantages, its
> failure to separate these two aspects of the task makes me more and more
> skeptical that it can survive outside of a math-based niche.

As I mentioned, the biggest growth area I am seeing right now is in the
Humanities.

///Peter

OKB (not okblacke)

Aug 22, 2011, 3:12:13 PM

Peter Flynn wrote:

> On 22/08/11 01:39, OKB (not okblacke) wrote:
>> Yes, so everyone says, but that's part of the problem,
>> because the
>> typesetting facilities aren't exposed except as part of a page
>> layout, are they? Everyone likes to talk about how TeX is really
>> good at formatting paragraphs and doing the correct ligatures and
>> spacing and whatnot between various glyphs, but is there any way
>> to leverage those abilities as part of any larger document? Could
>> you "plug in" TeX to a page layout program and tell it to do its
>> paragraph-formatting stuff within the context of, say, a column on
>> a page, or a text box, or whatever?
>
> This has been done many times, but only as part of a commercial
> product. The TeX engine has formed part of a lot of systems, from
> the early Arbortext Publisher to the 3B2 typesetter. Its hard to do
> because of the way Knuth wrote the program. The (apparently)
> defunct Textures system from Blue Sky was the first TeX editor to
> implement synchronous re-typesetting of the edit window as you
> typed, as far as I know.
>
> It is also exposed, although not quite as you imply, in the Emacs
> C-c C-r command, which runs TeX on the highlighted region, and (I
> believe) an equivalent feature in AucTeX.

Hmm, interesting! I looked up the programs you mentioned. I can't
totally tell how they work just from perusing the websites, but it looks
like they're a bit different from what I mean. I'm not just talking
about a synchronous editor that compiles your code to TeX on the fly.
I'm talking about a system that compiles TeX code to PART of a document.
So, for instance, I draw a newspaper page layout, or something, with
multiple columns, and then I somehow run TeX on some input file, and the
resulting text is flowed into a particular column in my layout file.

That is what it would mean to me for TeX to be "just a typesetter".
Even the programs you describe, though, still seem to be compiling TeX
to a document, not to something that can be integrated as part of a
larger document.

Peter Flynn

Aug 22, 2011, 3:52:32 PM

On 22/08/11 20:12, OKB (not okblacke) wrote:
[...]

> Hmm, interesting! I looked up the programs you mentioned. I can't
> totally tell how they work just from perusing the websites, but it looks
> like they're a bit different from what I mean. I'm not just talking
> about a synchronous editor that compiles your code to TeX on the fly.
> I'm talking about a system that compiles TeX code to PART of a document.
> So, for instance, I draw a newspaper page layout, or something, with
> multiple columns, and then I somehow run TeX on some input file, and the
> resulting text is flowed into a particular column in my layout file.

Like InDesign does? Parts of the Adobe engines originally had some TeX
code in them, I believe. If you can define the API specs, I'm sure
someone can write an implementation.

> That is what it would mean to me for TeX to be "just a typesetter".
> Even the programs you describe, though, still seem to be compiling TeX
> to a document, not to something that can be integrated as part of a
> larger document.

Yes, it's monolithic as it stands. Maybe you should talk to the LaTeX3
people, though.

///Peter

OKB (not okblacke)

Aug 22, 2011, 10:16:13 PM

Peter Flynn wrote:

> On 22/08/11 20:12, OKB (not okblacke) wrote:
> [...]
>> Hmm, interesting! I looked up the programs you mentioned. I can't
>> totally tell how they work just from perusing the websites, but it
>> looks like they're a bit different from what I mean. I'm not just
>> talking about a synchronous editor that compiles your code to TeX
>> on the fly. I'm talking about a system that compiles TeX code to
>> PART of a document. So, for instance, I draw a newspaper page
>> layout, or something, with multiple columns, and then I somehow
>> run TeX on some input file, and the resulting text is flowed into
>> a particular column in my layout file.
>
> Like InDesign does? Parts of the Adobe engines originally had some
> TeX code in them, I believe. If you can define the API specs, I'm
> sure someone can write an implementation.

Yeah, something like that.

>> That is what it would mean to me for TeX to be "just a typesetter".
>> Even the programs you describe, though, still seem to be compiling
>> TeX to a document, not to something that can be integrated as part
>> of a larger document.
>
> Yes, it's monolithic as it stands. Maybe you should talk to the
> LaTeX3 people, though.

Hmmmm, I haven't looked much at LaTeX3, I'll take a gander.

Wolfgang Keller

Aug 23, 2011, 6:32:24 AM

> This has been done many times, but only as part of a commercial
> product.

http://wiki.scribus.net/canvas/Working_with_latex_frames

> The (apparently) defunct Textures system from Blue Sky

http://www.bluesky.com/news/220b.html

> was the first TeX editor to implement synchronous re-typesetting of
> the edit window as you typed, as far as I know.

Sincerely,

Wolfgang

--
Führungskräfte leisten keine Arbeit (D'Alembert)

Robin Fairbairns

Aug 23, 2011, 6:50:23 AM

Wolfgang Keller <feli...@gmx.net> writes:

>> The (apparently) defunct Textures system from Blue Sky
>
> http://www.bluesky.com/news/220b.html
>
>> was the first TeX editor to implement synchronous re-typesetting of
>> the edit window as you typed, as far as I know.

not really -- the vortex system used two sun workstations, one editing
and the other producing images, and did synchronous operation. it was
never updated as far as tex 3.0, iirc.

as for the status of textures, that link has been totally static for a
very long time. every so often i send a mail to bluesky asking whether
i should still list textures in the faq; the mail never bounces, and
every so often it's even answered. each time it _is_ answered, they
assure me a shiny new release is about to arrive. (the latest i got
such an answer was april 2007, when they had a preliminary release for
macos x [native]; apparently i've had nothing since. time to ask
again.)
--
Robin Fairbairns, Cambridge
my address is @cl.cam.ac.uk, regardless of the header. sorry about that.

as

Aug 24, 2011, 2:33:32 AM

On Tue, 23 Aug 2011 11:50:23 +0100,
Robin Fairbairns <rf...@cl.cam.ac.uk> wrote:

> as for the status of textures, that link has been totally static for a
> very long time. every so often i send a mail to bluesky asking
> whether i should still list textures in the faq; the mail never
> bounces, and every so often it's even answered. each time it _is_
> answered, they assure me a shiny new release is about to arrive.
> (the latest i got such an answer was april 2007, when they had a
> preliminary release for macos x [native]; apparently i've had nothing
> since. time to ask again.)

They are waiting for the STIX fonts to be released.
Someone should tell them :)

--
Arnaud

Charles P. Schaum

Aug 24, 2011, 11:22:01 AM

On Mon, 22 Aug 2011 11:57:28 -0700, Khaled Hosny wrote:

> luaTeX can be, more or less, used that way:
> http://wiki.luatex.org/index.php/TeX_without_TeX
>
> See also:
> http://speedata.github.com/publisher/
>
> Regards,
> Khaled

Looks like this might be the pathway to an app that can TeX things at the
console but also have a UI (reminiscent of GIMP).

William F Hammond

Aug 24, 2011, 12:03:58 PM

"OKB (not okblacke)" <brenNOS...@NObrenSPAMbarn.net> writes:

> ... From looking at the examples, it looks like you've created a
> one-to-one mapping where every TeX command, environment, argument, etc. is
> mapped to an XML element, so the XML duplicates the structural
> arrangement of TeX tags. This is in itself quite a feat, but I think
> what I'm looking for is a system where you mark up semantic units, not
> just what TeX wants as syntactic units.

Have you looked at gellmu?

-- Bill

William F Hammond

Aug 24, 2011, 12:18:15 PM

Peter Flynn <pe...@silmaril.ie> writes:

> It is very clever, but inefficient for authoring. I habitually write
> or edit in DocBook, TEI, and other large and well-known
> vocabularies. The semantic match to LaTeX is in many cases very close,
> and XSLT is a good tool for transforming to LaTeX. But it is common
> for authors and publishers to require additional features or
> behaviours to those envisaged by the creators of the vocabulary. Both
> DocBook and TEI can be extended to provide this, and mappings written
> in XSLT to handle them. I'm not sure I want to introduce another
> transformation into the pipeline, but for those without an established
> workflow I think TexML has a lot to offer.
>
> The critical point is always to get the document into XML as early in
> the process as possible. From then on it is controllable.

Pipelining transformations is useful. For example, when going from,
say, DocBook to LaTeX, particularly if one wants flexibility for
accommodating extensions of DocBook, I think there is gain in
translating first to an xml document type modeling LaTeX and then
using a second translation to get to LaTeX. The xml document type
modeling LaTeX should be constructed so that it also admits sane
transformation to html. Because DocBook is so large, I've only tried
this with small subsets of DocBook, but I liked the results better
than those obtained with someone else's direct route.
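
A minimal sketch of such a two-stage pipeline, using only Python's standard library. The element names on both sides ("article", "para", "document", "par", and so on) are illustrative assumptions, not taken from DocBook or from any existing LaTeX-modeling document type, and no escaping of LaTeX special characters is attempted.

```python
import xml.etree.ElementTree as ET

# Stage 1 mapping: source vocabulary -> intermediate "LaTeX-like" XML.
# All element names here are hypothetical, chosen only for illustration.
SOURCE_TO_INTERMEDIATE = {
    "article": "document",
    "title": "section",
    "para": "par",
    "emphasis": "emph",
}


def to_intermediate(node):
    """Rename source elements to intermediate ones, keeping text and order."""
    out = ET.Element(SOURCE_TO_INTERMEDIATE.get(node.tag, node.tag))
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(to_intermediate(child))
    return out


def to_latex(node):
    """Stage 2: serialize the intermediate tree as LaTeX source."""
    inner = (node.text or "") + "".join(
        to_latex(child) + (child.tail or "") for child in node
    )
    if node.tag == "document":
        return inner.strip() + "\n"
    if node.tag == "section":
        return "\\section{%s}\n" % inner.strip()
    if node.tag == "par":
        return inner.strip() + "\n\n"
    if node.tag == "emph":
        return "\\emph{%s}" % inner
    return inner  # unknown elements pass their text through unchanged


source = ET.fromstring(
    "<article><title>Pipelines</title>"
    "<para>Two <emphasis>small</emphasis> steps.</para></article>"
)
print(to_latex(to_intermediate(source)))
```

Extensions of the source vocabulary would then only touch the first stage, and a second serializer over the same intermediate tree could target HTML instead.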

-- Bill

OKB (not okblacke)

Aug 24, 2011, 2:40:55 PM

William F Hammond wrote:

I just took a look at it now. It looks similar to some other tools
(e.g., pandoc), but the requirement to install emacs and Perl makes it
unattractive to me.

OKB (not okblacke)

Aug 24, 2011, 3:18:28 PM

Khaled Hosny wrote:

> On Aug 19, 9:15 pm, "OKB (not okblacke)"
> <brenNOSPAMb...@NObrenSPAMbarn.net> wrote:
>> It would really be great if TeX had an API for its document
>> CREATION abilities (i.e., making beautiful books) that was
>> separate from the input file FORMAT, so you could create your
>> document tree (in XML or whatever you like) description and feed
>> it in via the API to get your PDF (or HTML, or whatever).
>> Although TeX has some advantages, its failure to separate these
>> two aspects of the task makes me more and more skeptical that it
>> can survive outside of a math-based niche.
>
> luaTeX can be, more or less, used that way:
> http://wiki.luatex.org/index.php/TeX_without_TeX
>
> See also:
> http://speedata.github.com/publisher/

Yes, I've looked a bit at that, it's intriguing. In a way it looks
like the most promising existing route. The issue for me is that luaTeX
seems to be hooking in at a very low level, manipulating things like
glues and glyphs, as opposed to an XML-style deal where you're
manipulating relatively coarse semantic units. It's therefore not clear
to me how luaTeX would be used to leverage TeX's skills; the examples on
the site seem to be either patching up input before TeX sees it, or
tweaking it after.

To put it in Python-ish pseudocode, I'd like to be able to do stuff
like this:

def processSomeEnvironment(colspec, rows):
    # Add a new right-aligned column to the beginning of each row.
    # Could also be "rightAlignedColumn + colspec", I don't care.
    colspec = "r" + colspec

    # This is supposed to mean "create an aligned environment based on
    # a tabular-style column specification using "c", "l", "r", etc."
    align = alignmentEnv(colspec)

    rowCounter = 1
    for row in rows:
        # Output the row, but with an additional cell at the front
        # containing the row number.
        align.addRow([rowCounter] + row)
        rowCounter += 1
    return align

That is, I want to manipulate units like rows and cells of tables, not
the individual "boxes" with their individual dimensions. Likewise I'd
like to be able to say "Put this image in the upper left corner of the
page and slice that area out of the text area (so text will wrap
around)". Does luaTeX allow document manipulation at this level of
granularity?
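
To make the idea concrete, here is a minimal runnable sketch of that kind of row-level interface. AlignmentEnv is a hypothetical class invented for this illustration; it is not part of luaTeX or of any existing package, and it sidesteps TeX's internals entirely by emitting LaTeX tabular source as text, with no escaping of special characters.

```python
class AlignmentEnv:
    """Hypothetical row-level wrapper that emits a LaTeX tabular."""

    def __init__(self, colspec):
        self.colspec = colspec
        self.rows = []

    def add_row(self, cells):
        # Store every cell as a string; LaTeX special characters are not escaped.
        self.rows.append([str(cell) for cell in cells])

    def to_latex(self):
        lines = [r"\begin{tabular}{%s}" % self.colspec]
        lines += [" & ".join(row) + r" \\" for row in self.rows]
        lines.append(r"\end{tabular}")
        return "\n".join(lines)


def process_some_environment(colspec, rows):
    # Prepend a right-aligned column that will hold the row number.
    align = AlignmentEnv("r" + colspec)
    for row_number, row in enumerate(rows, start=1):
        align.add_row([row_number] + list(row))
    return align


table = process_some_environment("lc", [["apples", "3"], ["pears", "5"]])
print(table.to_latex())
```

The caller only ever sees column specifications, rows, and cells; boxes and glue appear nowhere in the interface.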

Are there any basic intro tutorials for luaTeX? I'd be especially
interested in something that shows how to do concrete tasks in luaTeX
(e.g., "how to implement a tabular in luaTeX").

Thanks

Khaled Hosny

Aug 24, 2011, 4:42:40 PM

On Aug 24, 9:18 pm, "OKB (not okblacke)"

One can write a lua library that provides a higher level typesetting
API, or do something similar to "ConTeXt Lua Documents":
http://wiki.contextgarden.net/cld
No idea how hard it would be to do a similar thing for LaTeX.

>
> Are there any basic intro tutorials for luaTeX? I'd be especially
> interested in something that shows how to do concrete tasks in luaTeX
> (e.g., "how to implement a tabular in luaTeX").

There are the pages on wiki.luatex.org and there are some articles in
wiki.contextgarden.net, TUGboat, MAPS etc.

Regards,
Khaled

zappathustra

Aug 24, 2011, 4:51:55 PM

On 24 Aug, 21:18, "OKB (not okblacke)"
<brenNOSPAMb...@NObrenSPAMbarn.net> wrote:
> Khaled Hosny wrote:
> > On Aug 19, 9:15 pm, "OKB (not okblacke)"
> > <brenNOSPAMb...@NObrenSPAMbarn.net> wrote:
> >> It would really be great if TeX had an API for its document
> >> CREATION abilities (i.e., making beautiful books) that was
> >> separate from the input file FORMAT, so you could create your
> >> document tree (in XML or whatever you like) description and feed
> >> it in via the API to get your PDF (or HTML, or whatever).
> >> Although TeX has some advantages, its failure to separate these
> >> two aspects of the task makes me more and more skeptical that it
> >> can survive outside of a math-based niche.
>
> > luaTeX can be, more or less, used that way:
> > http://wiki.luatex.org/index.php/TeX_without_TeX
>
> > See also:
> > http://speedata.github.com/publisher/
>
> Yes, I've looked a bit at that, it's intriguing. In a way it looks
> like the most promising existing route. The issue for me is that luaTeX
> seems to be hooking in at a very low level, manipulating things like
> glues and glyphs, as opposed to an XML-style deal where you're
> manipulating relatively coarse semantic units. It's therefore not clear
> to me how luaTeX would be used to leverage TeX's skills; the examples on
> the site seem to be either patching up input before TeX sees it, or
> tweaking it after.

LuaTeX indeed manipulates glyphs and glues, but it can also manipulate
input files. What you need is to write an XML-to-TeX, or
semantics-to-typography, converter, and LuaTeX lets you do it. I think it
exists in ConTeXt; also I'll recommend my Interpreter package (if I may
indulge in a little bit of self-advertising), although it's not
especially tailored for XML, but you'll get the idea.

>
> To put it in Python-ish pseudocode, I'd like to be able to do stuff
> like this:
>
> def processSomeEnvironment(colspec, rows):
>     # Add a new right-aligned column to the beginning of each row.
>     # Could also be "rightAlignedColumn + colspec", I don't care.
>     colspec = "r" + colspec
>
>     # This is supposed to mean "create an aligned environment based on
>     # a tabular-style column specification using "c", "l", "r", etc."
>     align = alignmentEnv(colspec)
>
>     rowCounter = 1
>     for row in rows:
>         # Output the row, but with an additional cell at the front
>         # containing the row number.
>         align.addRow([rowCounter] + row)
>         rowCounter += 1
>     return align
>
> That is, I want to manipulate units like rows and cells of tables, not
> the individual "boxes" with their individual dimensions.

Rows and cells are typographic units, not semantic units; manipulating
boxes is the closest you can get to working on them. Again, what you
need is an interface to work with "rows", i.e. nothing more than
something that calls "row" a horizontal list (in TeX jargon). As for
columns, they admittedly do not exist per se (basic tables are lines
of cells), but an interface wouldn't be so difficult to implement to
give the impression you're manipulating columns, especially with
LuaTeX.

> Likewise I'd
> like to be able to say "Put this image in the upper left corner of the
> page and slice that area out of the text area (so text will wrap
> around)". Does luaTeX allow document manipulation at this level of
> granularity?

Yes, although you probably won't say so in TeX, i.e. you'll go through
technical details that bear little resemblance to your English
sentence. But (again!) the problem is not with TeX, but with an
interface that lets you speak as you wish.

> Are there any basic intro tutorials for luaTeX? I'd be especially
> interested in something that shows how to do concrete tasks in luaTeX
> (e.g., "how to implement a tabular in luaTeX").

The wiki is a good place to start: http://wiki.luatex.org/
Some papers in TUGboat might interest you: http://www.tug.org/tugboat/contents.html

Best,
Paul

OKB (not okblacke)

Aug 24, 2011, 6:39:44 PM

zappathustra wrote:

> On 24 Aug, 21:18, "OKB (not okblacke)"
>> Yes, I've looked a bit at that, it's intriguing. In a
>> way it looks like the most promising existing route. The issue
>> for me is that luaTeX seems to be hooking in at a very low level,
>> manipulating things like glues and glyphs, as opposed to an
>> XML-style deal where you're manipulating relatively coarse
>> semantic units. It's therefore not clear to me how luaTeX would
>> be used to leverage TeX's skills; the examples on the site seem to
>> be either patching up input before TeX sees it, or tweaking it
>> after.
>
> LuaTeX indeed manipulates glyphs and glues, but it can also
> manipulate input files. What you need is to write an XML-to-TeX, or
> semantics-to-typography, converter, and LuaTeX lets you do it. I
> think it exists in ConTeXt; also I'll recommend my Interpreter
> package (if I may indulge in a little bit of self-advertising),
> although it's not especially tailored for XML, but you'll get the
> idea.

Yes, I'm actually currently working on something like that,
although who knows if I'll produce anything worth releasing. However,
if I'm going to work at that level, there's no need to involve TeX at
all until the very end --- i.e., no need to use TeX, ConTeXt, LaTeX,
luaTeX, or any other *TeX in my preprocessing. I'm writing my thing in
Python because that's the language I'm most comfortable with.

This is fine such as it is, but it means that I'm not really
working with TeX/semantic/document-structure units of ANY kind -- I'm
working with a plain text file, and performing textual manipulations on
that text to produce other text. The structural units are only those
that I (or whatever preprocessing package you like) define.

>> That is, I want to manipulate units like rows and cells of tables,
>> not the individual "boxes" with their individual dimensions.
>
> Rows and cells are typographic units, not semantic units;
> manipulating boxes is the closest you can get to working on them.
> Again, what you need is an interface to work with "rows", i.e.
> nothing more than something that calls "row" a horizontal list (in
> TeX jargon). As for columns, they admittedly do not exist per se
> (basic tables are lines of cells), but an interface wouldn't be so
> difficult to implement to give the impression you're manipulating
> columns, especially with LuaTeX.

I don't really agree with that. Or, rather, you can say that rows
and cells are typographical units, but my point is that I want to work
on the semantic units which are represented in that way, for which we
have no convenient term other than "row" or "cell". For instance, if I
have a table of quarterly sales receipts or whatever, there is a
semantic unit for "the set of all sales receipts data for a given
quarter", which contains semantic subunits for each particular data
point. These might correspond to rows and cells in the table. Likewise
there is a unit for "the set of all sales receipt data for widgets",
which might correspond to one column of the table (supposing the data
separated out widget sales from other sales).

But the names "row", "cell", "column" for these things don't arise
just from the fact that they're represented as rows and columns in the
output. Rather, they happen to be represented as rows and columns in the
INPUT --- i.e., separated by linebreaks or \\ or whatever (for rows), or
by & or some other separator (for columns). I specifically don't want
to consider them as typographical units, because they're not --- it
should be possible for someone to specify the data in these "rows" and
"columns" in an input file, but perform transformations on them before
they're typeset, e.g. by adding extra columns (say a row number),
removing columns (by combining some) or whatever.

So if you want to call them "data sets" or something instead of
"rows" and "columns", fine, but I'm talking about dimensions into which
the data are conceptually divided on the input side, irrespective of
whether they happen to be represented as rows and cells in the output.
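
As a rough illustration of working on the input-side data sets first and only turning them into rows and cells at the very end, here is a small sketch. The field names, the figures, and the helper functions are all made up for this example; nothing here comes from TeX, LaTeX, or an existing library.

```python
# Each record is one semantic unit: the sales figures for a quarter.
# Field names and numbers are invented for illustration only.
receipts = [
    {"quarter": "Q1", "widgets": 120, "gadgets": 80},
    {"quarter": "Q2", "widgets": 150, "gadgets": 95},
]


def add_row_number(records):
    """Return new records with a 1-based 'n' field in front."""
    return [{"n": i, **rec} for i, rec in enumerate(records, start=1)]


def combine(records, fields, into):
    """Replace several numeric fields by their sum under a new name."""
    result = []
    for rec in records:
        kept = {k: v for k, v in rec.items() if k not in fields}
        kept[into] = sum(rec[f] for f in fields)
        result.append(kept)
    return result


def to_tabular(records, colspec):
    """Only at this point do the records become rows and cells."""
    body = [" & ".join(str(v) for v in rec.values()) + r" \\" for rec in records]
    return "\n".join([r"\begin{tabular}{%s}" % colspec, *body, r"\end{tabular}"])


combined = combine(receipts, ["widgets", "gadgets"], "total")
print(to_tabular(add_row_number(combined), "rlr"))
```

Adding the row number and merging the two sales columns both happen on the data; the tabular output is just one possible rendering of the result.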

>> Likewise I'd
>> like to be able to say "Put this image in the upper left corner of
>> the page and slice that area out of the text area (so text will
>> wrap around)". Does luaTeX allow document manipulation at this
>> level of granularity?
>
> Yes, although you probably won't say so in TeX, i.e. you'll go
> through technical details that bear little resemblance to your
> English sentence. But (again!) the problem is not with TeX, but
> with an interface that lets you speak as you wish.

Sure, I'd agree with that. The problem with TeX is simply that it
doesn't support any of this, forcing me to write the stuff in another
language. More unfortunate, perhaps, is that LaTeX, although it could
theoretically be the entry point to a more structured document
representation, is not, because it allows too much TeX to seep through.

A while ago, for instance, I looked at PlasTeX, a Python library
for generating HTML from TeX. But PlasTeX knows it can't rely on
parsing TeX right, so it forces you to explicitly define the structure
and semantics of every TeX construct it doesn't know about. Even if
you're using LaTeX, you can't rely on a particular well-specified
structure for the document. Presumably this is because LaTeX didn't
attempt to be a separate "program", just a veneer over TeX, so it
doesn't create any sort of accessible data structure for the structural
units it superficially seems to be using.

>> Are there any basic intro tutorials for luaTeX? I'd be
>> especially interested in something that shows how to do concrete
>> tasks in luaTeX (e.g., "how to implement a tabular in luaTeX").
>
> The wiki is a good place to start: http://wiki.luatex.org/
> Some papers in TUGboat might interest you:
> http://www.tug.org/tugboat/contents.html

Thanks, I'm looking through some of that stuff. Most of the
examples I see on the luaTeX site are, again, much more about
manipulating typographical units than about mapping semantic units to
typographical units. This makes me agree that, as you say, the only
solution is to do preprocessing. But if I'm going to do that, I see no
reason to involve TeX in that process at all; I'll just do it in Python
and relegate TeX to an output-only status.

William F Hammond

Aug 25, 2011, 4:35:47 PM

"OKB (not okblacke)" <brenNOS...@NObrenSPAMbarn.net> writes:

> ... I just took a look at it now. It looks similar to some other
> tools (e.g., pandoc), but the requirement to install emacs and Perl
> makes it unattractive to me.

Not so similar.

Pandoc, as advertised, is designed to be a universal converter from
format X to format Y. But at version 1.5.1.1, for example, it seems
to be insufficiently developed to convert the standard example
"small2e.tex" into HTML although quite interestingly it does seem to
manage getting it into Texinfo (from which one can certainly go to
HTML).

Also note that for some (X, Y) pairs, the possibility for translating
from X to Y is intrinsically limited to the point that it's unwise to
try.

As to gellmu's installation needs: given that one has a working TeX
installation, it should be child's play to install emacs and perl (if
they're not already there).

-- Bill