In the end I decided to go with LaTeX largely because the document I'm
going to produce will contain a lot of mathematics. While in theory one
could mix MathML with DocBook, I haven't fully resolved how that works
and I have a feeling that the tools are not mature enough for it to work
well anyway. Also this project will be a good excuse to get into LaTeX
more deeply... something I've been wanting to do for a while as well.
However, in the long run, once the XML tools are mature, it seems like
XML could do the job quite satisfactorily. Furthermore, XML would have the
advantage of being machine readable using off-the-shelf XML parsers.
While I don't see that being an issue for my current project, that is a
nice feature and I can appreciate that for some documents it might be
critical. I get the feeling that LaTeX documents are not easily digested
with "generic" software components. (That is, processing TeX documents
requires software that understands TeX).
So does this mean that TeX/LaTeX's days are numbered? Will this very
nice system fade into obscurity once the XML tools and standards have
finished maturing? Or is the world large enough for both technologies?
Peter
>>>>> "Peter" == Peter <p...@ecet.vtc.edu> writes:
Peter> In the end I decided to go with LaTeX largely because the
Peter> document I'm going to produce will contain a lot of
Peter> mathematics. While in theory one could mix MathML with
Peter> DocBook, I haven't fully resolved how that works and I have a
Peter> feeling that the tools are not mature enough for it to work
Peter> well anyway. Also this project will be a good excuse to get
Peter> into LaTeX more deeply... something I've been wanting to do
Peter> for a while as well.
Peter> However, in the long run, once the XML tools are mature, it
Peter> seems like XML could do the job quite
Peter> satisfactorily. Furthermore, XML would have the advantage of
Peter> being machine readable using off-the-shelf XML parsers.
Peter> While I don't see that being an issue for my current project,
Peter> that is a nice feature and I can appreciate that for some
Peter> documents it might be critical. I get the feeling that LaTeX
Peter> documents are not easily digested with "generic" software
Peter> components. (That is, processing TeX documents requires
Peter> software that understands TeX).
Peter> So does this mean that TeX/LaTeX's days are numbered? Will
Peter> this very nice system fade into obscurity once the XML tools
Peter> and standards have finished maturing? Or is the world large
Peter> enough for both technologies?
XML is a syntactic carrier. In a nutshell it's a bunch of angle
brackets. There is no semantics there. Of course it's possible that in
time someone will develop an XML instance which could replace tex. But
why would they? Yes you could parse the documents with xerces, but so
what? Parsing is not that hard, compared to the much bigger task of
understanding the semantic meaning of the tags. Certainly docbook is
not a replacement for tex, nor is it meant to be.
Besides, it's easy to convert between tex and XML. Try this Perl script
which takes any latex document and converts it instantly to XML....
#!/usr/bin/perl -w
print "<latex>\n";
while(<>){
print;
}
print "</latex>\n";
Cheers
Phil
> So does this mean that TeX/LaTeX's days are numbered? Will this very
> nice system fade into obscurity once the XML tools and standards have
> finished maturing? Or is the world large enough for both technologies?
They are complementary. XML can describe the logical
structure of a document, while LaTeX is a typesetting
engine. You can use _both_. See for example xmltex and
PassiveTeX (or the somewhat outdated but still very
useful The LaTeX Web Companion, by Goossens and Rahtz,
Addison-Wesley, 1999).
Regards
Javier
___________________________________________________________
Javier Bezos | TeX y tipografia
jbezos at wanadoo dot es | http://perso.wanadoo.es/jbezos/
...........................................................
CervanTeX http://apolo.us.es/CervanTeX/CervanTeX.html
> XML is a syntactic carrier. In a nutshell its a bunch of angle
> brackets. There is no semantics there. Of course its possible that in
> time someone will develop a XML instance which could replace tex. But
> why would they?
I'm not an expert, but it seems like the goals of XSL-FO are very
similar to the goals of TeX in that it is interested in specifying the
placement of information on a page. I was just browsing around in
http://www.w3.org/TR/xsl/ and the description of the "Area Model" used
by XSL formatting objects sounds similar, to my untutored ear, to the
descriptions I've heard of how TeX uses nested boxes to place things. It
doesn't seem far-fetched to say that styling DocBook into XSL-FO using
XSLT is providing the same sort of functionality that LaTeX does.
> Yes you could parse the documents with xerces, but so
> what? Parsing is not that hard, compared to the much bigger task of
> understanding the semantic meaning of the tags.
But there are situations where the applications are not required to
interpret semantics. For example, I understand that there exists a
generic tool for extracting elements from an XML document using an XPath
expression on the command line. Thus I could use this tool to, for
example, display the third paragraph of the second chapter of a DocBook
document. The tool does not need to know anything about DocBook for this
to be useful to me. Since the tool is using generic XML technology it
would be equally useful on any XML document... even "pure" data
documents with no presentational semantics at all. (In such a situation
it would serve as a kind of database query tool). Creating such a tool
would potentially offer a greater return on the investment than creating
similar kinds of TeX specific tools --- at least that would seem to be
the position of the XML community.
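For concreteness, here is a minimal sketch of that kind of generic
extraction, using the CPAN module XML::LibXML as one off-the-shelf
choice rather than the command-line tool alluded to above; the file
name and the DocBook-ish element names are illustrative assumptions
only.

#!/usr/bin/perl -w
# Sketch: pull "the third paragraph of the second chapter" out of an
# XML file with a generic XPath query, knowing nothing about DocBook
# beyond the element names written into the expression.
use strict;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_file('book.xml');   # hypothetical file

for my $node ($doc->findnodes('//chapter[2]/para[3]')) {
    print $node->textContent, "\n";
}

The same script works unchanged on any XML document; only the XPath
expression mentions DocBook element names.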
> Certainly docbook is
> not a replacement for tex, nor is it meant to be.
I will agree with you there. But the question really is can DocBook
together with XSL (both XSLT and XSL-FO) serve as a replacement for
LaTeX?
Peter
> I'm not an expert, but it seems like the goals of XSL-FO are very
> similar to the goals of TeX in that it is interested in specifying the
> placement of information on a page.
XSL-FO is interested in *specifying the placement* of information
on a page. TeX is interested in *placing* the information on the page.
You are mixing XSL-FO and the application rendering a document
with XSL-FO markup. They are different things and in fact you
can use TeX as the application. So, as I said, they are complementary.
> XSL-FO is interested in *specifying the placement* of information
> on a page. TeX is interested in *placing* the information on the page.
Well, I understand that actually. For example, the Apache project has an
FO processor called FOP that does the rendering. So it would be more
accurate to compare the TeX program to FOP and its ilk. Earlier I was
using "TeX" to refer to the file format that the TeX program accepts
rather than the executable itself.
I was reading a little in FOP's documentation... FOP worries about font
metrics and hyphenation issues. In fact the documentation even suggests
using TeX's hyphenation patterns file (you have to convert it to a
particular XML format first). So it seems clear that the two systems are
very much overlapping.
> You are mixing XSL-FO and the application rendering a document
> with XSL-FO markup. They are different things and in fact you
> can use TeX as the application. So, as I said, they are complementary.
I don't deny that it would be possible to use TeX as the final
formatting engine for a document. In fact, on a related thread that I
started on comp.text.xml several people have pointed out various
packages that do exactly that. I guess the real question is: why bother?
The answer may be: TeX currently does a better job than FO processors at
formatting... certainly it does a better job than FOP v0.20. However, I
could foresee the day when that would not be true. When (if??) that day
arrives, the advantage of a pure XML solution from start to finish may
push TeX out of the picture. At least that is my question.
> Cute!
The script as it exists won't quite work. It would be necessary to
translate certain characters into entities as well (such as '&' and
'<'). However, that fix wouldn't be difficult.
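For the record, a minimal sketch of that fix (the ampersand has to be
translated first, or it would mangle the entities the other
substitutions introduce):

#!/usr/bin/perl -w
# The same joke script, with the characters XML reserves escaped as
# entities so the output at least parses. Range dashes and other
# TeX-isms are still left untouched.
print "<latex>\n";
while (<>) {
    s/&/&amp;/g;   # must come first
    s/</&lt;/g;
    s/>/&gt;/g;
    print;
}
print "</latex>\n";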
Peter
> In article <GV_I8.2423$155.1...@news2.west.cox.net>, wkeh...@cox.net
> says...
>
> > Cute!
But useless and of little relevance to the thread unless the point
being made is that the XML document type involved is critical.
> The script as it exists won't quite work. It would be necessary to
> translate certain characters into entities as well (such as '&' and
> '<'). However, that fix wouldn't be difficult.
Correct. (One would also need to catch LaTeX's range-dash and
punctuation-dash markup.) For example, the script's output for
sample2e.tex will not pass rxp validation.
As I've said before, though not recently, another (AFAIK useless)
observation is that DVI format can be cast as an XML document type.
This is based on the observations that (1) Geoffrey Tobin has provided
a text equivalent for DVI format called DT, (2) DT has the structure
of a classical assembly language, and (3) any classical assembly
language can be modeled as an XML document type by giving element
names to the instructions and their parameters. ("dv2dt" and "dt2dv"
are included with pre-compiled forms of TeXLive.)
The point here, however, is that bringing something up as an XML
document type is useful only to the extent that objects under that
document type can be usefully processed from that format using
XML processing frameworks (for which there are choices involving a
selection of programming languages).
-- Bill
I presume I am missing the point, but suppose your latex document
contains math; then it would have to be translated to MathML, which is
an XML vocabulary. I know of some MathML-to-LaTeX converters, but not
the other way around.
Uwe Brauer
William> Peter <p...@ecet.vtc.edu> writes:
>> In article <GV_I8.2423$155.1...@news2.west.cox.net>,
>> wkeh...@cox.net says...
>>
>> > Cute!
William> But useless and of little relevance to the thread unless
William> the point being made is that the XML document type involved
William> is critical.
No, actually it's very relevant. XML is a syntax, nothing more and
nothing less. Tex is not a syntax, although of course it has one.
XML can not replace tex. Something which uses XML as a syntax could
replace tex. However at the moment there is nothing there which
competes. Docbook does something different for instance.
And, as my script shows, tex and XML are orthogonal. You could replace
tex's existing syntax with XML, and still have something which was
both tex and XML. But why would you? You might now be able to parse
the document with xerces, but you'd still need a tex (based on an XML
syntax) to actually get a document out the other end. There is a
marginal possibility that it would be easy to write tools for
dealing with an XML-based tex. But even that would be massively
outweighed by the requirement for recoding existing tools.
>> The script as it exists won't quite work. It would be necessary
>> to translate certain characters into entities as well (such as
>> '&' and '<'). However, that fix wouldn't be difficult.
William> Correct. (One would also need to catch LaTeX's range-dash
William> and punctuation-dash markup.) For example, the script's
William> output for sample2e.tex will not pass rxp validation.
I am utterly distressed that what I thought was a highly complete,
industrial-strength piece of coding has fallen apart so easily under
the examination of the esteemed individuals of c.t.t.
Phil
>> Yes you could parse the documents with xerces, but so what?
>> Parsing is not that hard, compared to the much bigger task of
>> understanding the semantic meaning of the tags.
Peter> But there are situations where the applications are not
Peter> required to interpret semantics. For example, I understand
Peter> that there exists a generic tool for extracting elements from
Peter> an XML document using an XPath expression on the command
Peter> line. Thus I could use this tool to, for example, display the
Peter> third paragraph of the second chapter of a DocBook
Peter> document. The tool does not need to know anything about
Peter> DocBook for this to be useful to me.
No, I am afraid this is wrong.
XPath, and something like XSLT would allow you to extract the third
"para" tag, occurring as a child of the second "chapter" tag, which
was a child of the root node.
But this relates to the logical structure of the XML document, not of
the docbook. You are for instance assuming that the ordering of the
document on the printed page ("the third para of the second chapter")
is the same as in the typed version. This might be true. But if
docbook supports "include" type statements it might not be. Either way
you are assuming something about the semantics of docbook.
Besides which, how big a thing is this? A regexp could achieve the
same end, and would be just as easy to write.
>> Certainly docbook is not a replacement for tex, nor is it meant
>> to be.
Peter> I will agree with you there. But the question really is can
Peter> DocBook together with XSL (both XSLT and XSL-FO) serve as a
Peter> replacement for LaTeX?
I don't know. Not for a long time anyway. The main reason that I moved
toward latex is that it is stable. A 10-year-old document will
compile today. My experience with XML technologies is that stability
is not something you could accuse them of.
The other issue with docbook is that it's entirely a logical
markup. Tex gives you the ability to do arbitrary things. So when I
wanted to pad a number to the left with zeros, someone told me how to do
it within 10 minutes. To do this with docbook, you'd have to extend
the docbook schema, add support in the stylesheets and then write your
document. I'm not convinced.
Phil
Well, who knows? AFAIAC, what I want is an outstanding tool, and
its name is not relevant. Currently that tool is TeX, and the future
is still in the future.
> . . . ago I embarked on a somewhat longish writing project and I
> had to decide if I should use LaTeX or the DocBook XML
> vocabulary. It seems like, in principle, XML could offer the same
> sort of abilities that LaTeX does.
In some sense, yes.
> For example one could style an XML document using XSLT into XSL
> formatting objects (XSL-FO) and then render the XSL-FO on a suitable
> output device.
Just what tool(s) did you have in mind for processing XSL-FO on, say,
a PostScript-compatible printer? The printer does not understand
XSL-FO.
> . . .
> In the end I decided to go with LaTeX largely because the document I'm
> going to produce will contain a lot of mathematics. While in theory one
> could mix MathML with DocBook, I haven't fully resolved how that works
> and I have a feeling that the tools are not mature enough for it to work
> well anyway.
Right now what really works for MathML in the wider world is the
incorporation of _presentation_ MathML, version 1 (not 2), in XHTML
when rendered on the screen by (1) Netscape 7.0, (2) Mozilla 1.0, and
(3) W3C's Amaya. Have I missed anything else?
Writing freehand code to format an XML document type that incorporates
MathML into LaTeX is time-consuming but not difficult. I do not know,
however, of anything in this direction to mention.
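To give a flavour of what such freehand code might look like, here is a
toy sketch that walks the parse stream with XML::Parser and emits LaTeX
for a handful of presentation-MathML elements. The element-to-LaTeX
mappings are illustrative assumptions only, and two-argument constructs
such as mfrac and msup are deliberately omitted because they need child
counting.

#!/usr/bin/perl -w
# Toy sketch only: map a few presentation-MathML elements to LaTeX.
# Handles mi/mn/mo (character data passes through), mrow, msqrt and
# math; anything that must distinguish its children (mfrac, msup, ...)
# is omitted here.
use strict;
use XML::Parser;

my %open  = (math => '$', mrow => '{', msqrt => '\\sqrt{');
my %close = (math => '$', mrow => '}', msqrt => '}');

my $p = XML::Parser->new(Handlers => {
    Start => sub { my (undef, $el) = @_; print $open{$el}  if exists $open{$el};  },
    End   => sub { my (undef, $el) = @_; print $close{$el} if exists $close{$el}; },
    Char  => sub { my (undef, $txt) = @_; print $txt; },
});
$p->parsefile($ARGV[0]);
print "\n";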
I would only dream of using _content_ MathML in DocBook, and that is
not actually something that I am considering for now.
The more serious current issue is just how TeX-based authoring -- for
example the whole body of academic mathematical literature that is
presently TeX-based, as represented in part by the ArXiv repository at
Cornell (http://www.arxiv.org/) (formerly at LANL) -- will undertake
to address MathML.
> . . . However, in the long run, once the XML tools are mature, it
> seems like XML could do the job quite satisfactorily.
(Actually, XML "does" nothing, as already said.)
> Furthermore, XML would have the advantage of being machine readable
> using off-the-shelf XML parsers. While I don't see that being an
> issue for my current project, that is a nice feature and I can
> appreciate that for some documents it might be critical.
Consider, for example, the desirability of being able to include
various articles by different authors in a single "collection"
document without the need for technical markup level intervention by
an editor.
> I get the feeling that LaTeX documents are not easily digested with
> "generic" software components. (That is, processing TeX documents
> requires software that understands TeX).
As shown by the long history of projects to translate LaTeX to HTML.
> So does this mean that TeX/LaTeX's days are numbered? Will this very
> nice system fade into obscurity once the XML tools and standards have
> finished maturing? Or is the world large enough for both technologies?
Not at all. There are many ways that TeX/LaTeX -- and I would add
Context -- can be interfaced with XML. However, there is not yet much
that is approaching maturity other than projects for translating XML
document type instances into TeX-based documents.
Projects beyond middle childhood that can be seen and tried include
(1) a DVI formatter for DocBook by Norman Walsh
(http://sourceforge.net/projects/docbook/) using the TeX backend of
the DSSSL engine "jade" (http://www.jclark.com/jade/) and (2) a DVI
formatter for TEI using the XSL-FO translator "passivetex" by
Sebastian Rahtz (CTAN:macros/passivetex). My GELLMU project, in
early childhood, translates its own LaTeX-article-like document type
(still just a sketch) into LaTeX source using a Perl-based umbrella.
Note that formatting based on "stylesheets", whether CSS, DSSSL, or
XSLT, viewed from the world of TeX should be construed as a limited
method of typesetting -- perhaps as if coding with one's hands tied
behind one's back. While there are inter-operability advantages to
that approach, much more flexibility and power is available if one
does not restrict one's self to stylesheet formatting.
-- Bill
What a coincidence! I know of a few TeX-to-MathML converters, but not the
other way around.
I think there's a certain amount of talking past each other in this
discussion, but....
Phillip Lord <p.l...@russet.org.uk> writes:
>>>>>> "Peter" == Peter <p...@ecet.vtc.edu> writes:
> Peter> document. The tool does not need to know anything about
> Peter> DocBook for this to be useful to me.
>No I am afraid this is wrong.
>[...]
>But this relates to the logical structure of the XML document, not of
>the docbook. You are for instance assuming that the ordering of the
>document on the printed page ("the third para of the second chapter")
>is the same as in the typed version. This might be true. But if
>docbook supports "include" type statements it might not be. Either way
>you are assuming something about the semantics of docbook.
I'm not sure what distinction you're making in `the logical structure
of the XML document, not of the docbook'. XPath expressions are
defined as assertions (conceptually akin to regexps) on the element
(and attribute and ...) tree which is the result of parsing an XML
document: thus all issues of inclusions and the like are resolved
before any XPath expressions are asserted.
It's a purely syntactical expression: if there is indeed an element
which matches the expression "/chapter[2]/para[3]" then you get that
back; if not, you get nothing. Semantics don't come into it until you
have to decide what to do with the element you just received, which is
not (I think) what's under discussion just here.
>Besides which how big an thing is this? A regexp could achieve the
>same end, and would be just as easy to write.
Manipulating LaTeX documents (or XML ones) using regexps is blue
murder, unless you have extremely constrained documents, which means
extremely disciplined authors. It's not impossible -- I and others
here can show you the scars if you're interested -- but it's not very
nice and it's a lot harder than using XSLT or DSSSL or something
designed for that.
>I don't know. Not for a long time anyway. The main reason that I moved
>toward latex is because it is stable. A 10 year old document will
>compile today. My experience with XML technologies is that stability
>is not something you could accuse them of.
The XML auxiliary technologies like Schemas, XPointer, blah, are still
being developed, but the core technology of XML is essentially SGML,
and you don't get more stable than that (in electronic documents,
before someone brings up cuneiform). LaTeX is less stable than SGML --
we had a change which slightly broke a significant number of documents
as recently as ten years ago.
>The other issue with docbook is that its entirely a logical
>markup. Tex gives you the ability to do arbitrary things.
That is the _really_ bad thing about (La)TeX from some points of view.
It allows authors to be clever, and thus gives them enough rope to
hang you (as the document processor) very high.
This doesn't mean that it's bad, just that it falls nicely between the
two stools of structure and flexibility, which means it's exactly the
right thing (and XML, etc, are not) when between those stools is where
you happen to be. A great number of folk are and will remain between
those stools, so LaTeX isn't about to disappear any time soon. If what
you want is real structure, however, or long-term reusability, LaTeX can
be a real pain.
Oh, and in answer to (what I think was) the original question, xmltex
is the business! It allows LaTeX to parse XML documents, formatting
them with all the power and flexibility of LaTeX. It can't do
anything much with MathML, if I recall correctly, but if the maths in
your XML document is TeX-style, there's no problem.
All the best,
Norman
--
---------------------------------------------------------------------------
Norman Gray http://www.astro.gla.ac.uk/users/norman/
Converting TeX to MathML is not at all a straightforward -- or even a
well-defined -- task.
Converting MathML to the typeset format of your choice is rather
straightforward, though somewhat tedious.
-- Bill
> William> But useless and of little relevance to the thread unless
> William> the point being made is that the XML document type involved
> William> is critical.
>
> No, actually it's very relevant. XML is a syntax, nothing more and
> nothing less. Tex is not a syntax, although of course it has one.
But an XML document type provides a structured markup vocabulary that
can be subjected to free-standing validation. An XML document type
is much more than a syntax. However, it is true that there is no
canonical formatting of a given XML document type.
If it is viewed as bad news that there is no canonical formatting, the
good news is that there can be many formattings, and one can certainly
provide one's own in the style that one prefers under the typesetting
format of one's choice.
> XML can not replace tex. Something which uses XML as a syntax could
> replace tex. However at the moment there is nothing there which
> competes. Docbook does something different for instance.
There is an SGML document type known as "sgmltexi" that models
Texinfo, the language of the GNU Documentation System. See
http://master.swlibero.org/~daniele/software/sgmltexi/ by Daniele
Giacomini. It comes with a free-standing Perl formatter for
converting its SGML markup to Texinfo. (In fact, I now notice that
there's a May 2002 revision that I've not seen.)
So an SGML document under the sgmltexi document type is essentially
the same thing as a Texinfo document.
> And, as my script shows, tex and XML are orthogonal. You could replace
> tex's existing syntax with XML, and still have something which was
> both tex and XML.
(Actually -- and quite hypothetically -- one might be able to do
something like this with an SGML document type by altering the
default syntax of SGML.)
-- Bill
This thread reminds me that a few weeks ago I asked
the following question on a technical writer's
mailing list ('tech...@lists.raycomm.com') in response
to a thread about "the directions of tomorrow's technical
writing" - I got no reponse, I think all the Framemaker
users weren't sure what to make of the question. :-)
So I'll ask it again here:
++++++
My question, for those that are interested in such things, is this: if
you have already invested in using a semantically rich and readable
mark-up language such as LaTeX (with your own macros for semantic
elements), which gives you the best available typesetting for free, is
it worth investing the extra effort in moving to DocBook or some other
xml DTD, then introducing the extra step (conversion from DocBook to
TeX) for rendering to paper or PDF? Is there a tangible gain, and is it
worth the extra work? Or should the increasing interest in XML just be
seen as evidence that other logical/semantic markup systems such as
LaTeX were the right approach all along?
Does anyone with experience of this have any comments?
++++++++
I understand that XSLT -> TeX is only one method of rendering
DocBook/xml to PDF or PS or whatever, and that FO is another.
I also understand that various tools both commercial and free
are in various stages of development / beta, as discussed here
and on Peter's other thread on comp.text.xml. I was just wondering
if any of the experts out there care to share their thoughts?
What should a technical author who manages the documentation
for a software company's API be keeping himself aware of? :-)
Chris.
Christopher Gooch, Technical Author
LightWork Design, Sheffield, UK.
chris...@lightworkdesign.com www.lightworkdesign.com
>It [xmltex] can't do
>anything much with MathML, if I recall correctly, but if the maths in
>your XML document is TeX-style, there's no problem.
How do you put TeX-style maths in an XML document?
--
Timothy Murphy
e-mail: t...@maths.tcd.ie
tel: 086-233 6090
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
IMO it depends on the nature of the documents one is dealing with.
In the case of a software company's API I'd seriously consider an
XML-based solution, on account of the flexibility this gives you,
along with the ability to check the structure of documents very
easily. You will no doubt find, though, that it is currently
considerably more difficult to get high-quality typeset results
with such a system -- so if good quality typesetting is very
important to you that would weigh in favor of staying with LaTeX.
The problem with typesetting XML is that one is generally (OK, I
don't know anything about direct FO systems so take with a pinch
of salt if you wish) using TeX at one or more removes. That's
inherently more problematic than using TeX directly.
FWIW, I continue to use LaTeX for papers and books (and would
consider it madness to use anything else), while I use XML, XSL,
jadetex and friends for maintaining a software manual. In the
latter case I'm willing to trade off relatively deficient
typesetting quality (the major bugbear: orphan section headings)
for the sake of easy production of PDF, ordinary HTML and HTML
suitable for compilation into Microsoft CHM files (yuck, but
there you are).
Allin Cottrell
Wake Forest University
> > For example one could style an XML document using XSLT into XSL
> > formatting objects (XSL-FO) and then render the XSL-FO on a suitable
> > output device.
>
> Just what tool(s) did you have in mind for processing XSL-FO on, say,
> a PostScript-compatible printer? The printer does not understand
> XSL-FO.
I've played around a bit with the Apache project's FOP. It accepts an
XSL-FO input file and outputs a variety of formats including PostScript,
PDF, and PCL. They claim that support for PDF is the most complete.
However, I wasn't particularly impressed with the results.
Of course some of what I saw may well have been due to the style sheet I
was using. In particular, I downloaded Norman Walsh's style sheets for
DocBook. My procedure was: a) create a small DocBook document, b) Use
Xalan (an XSLT engine) to style the document into XSL-FO using the
aforementioned style sheet, c) use FOP to convert the XSL-FO to PDF, d)
use Acrobat to view the PDF. Issues in the final document might be
issues with the style sheet for all I know. However, FOP did complain
about a bunch of unimplemented things as it processed the XSL-FO input I
provided. That can't be good.
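As a sketch, the whole pipeline wrapped up in Perl -- the Xalan class
name and the option letters are quoted from memory of Xalan-J and
FOP 0.20 and may differ between versions, and the file names are
placeholders:

#!/usr/bin/perl -w
# Sketch of the DocBook -> XSL-FO -> PDF pipeline described above.
use strict;

my ($xml, $xsl, $fo, $pdf) = ('doc.xml', 'fo/docbook.xsl', 'doc.fo', 'doc.pdf');

# b) style the DocBook source into XSL-FO with Xalan
system('java', 'org.apache.xalan.xslt.Process',
       '-IN', $xml, '-XSL', $xsl, '-OUT', $fo) == 0
    or die "xalan failed: $?\n";

# c) render the XSL-FO to PDF with FOP
system('fop', '-fo', $fo, '-pdf', $pdf) == 0
    or die "fop failed: $?\n";

# d) view the result with Acrobat, xpdf, or whatever is handy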
Peter
P.S. Norman Walsh's style sheets seem to style DocBook into HTML quite
nicely... at least for the simple documents I used in my experiments.
With DBTeXMath
http://ricardo.ecn.wfu.edu/~cottrell/dbtexmath/
"1. Executive Summary
If you write SGML documents using the DocBook DTD, the files offered here let
you embed TeX equations directly in your SGML source files, and arrange for the
mathematical notation to be fed directly to TeX on output--hence avoiding both
(a) the need to code mathematics in MathML on the input side, and (b) the need
to rely upon experimental and unfinished dsssl-based mathematical typesetting
code. Provision is made for substituting graphical variants of mathematical
formulae in the case of output to HTML."
s/SGML/XML/ preserves truth value.
Allin Cottrell.
It shouldn't address MathML. Not only is the conversion practically
impossible, it also wouldn't lead to any significant advantage from a
scientific point of view. Why change when the existing system does the
job already?
Kasper
Would you share your knowledge with us?
Andre'
--
Those who desire to give up Freedom in order to gain Security,
will not have, nor do they deserve, either one. (T. Jefferson)
Norman> Greetings,
Norman> I think there's certain amount of talking past each other in
Norman> this discussion, but....
Norman> Phillip Lord <p.l...@russet.org.uk> writes:
>>>>>>> "Peter" == Peter <p...@ecet.vtc.edu> writes:
Peter> document. The tool does not need to know anything about
Peter> DocBook for this to be useful to me.
>> No I am afraid this is wrong. [...] But this relates to the
>> logical structure of the XML document, not of the docbook. You
>> are for instance assuming that the ordering of the document on
>> the printed page ("the third para of the second chapter") is the
>> same as in the typed version. This might be true. But if docbook
>> supports "include" type statements it might not be. Either way
>> you are assuming something about the semantics of docbook.
Norman> I'm not sure what distinction you're making in `the logical
Norman> structure of the XML document, not of the docbook'.
"The third para of the second chapter" is description of part of the
end product. The book that you are producing. It may be that the third
para of the second chapter in fact maps to the third occurrence of
"para" with the second occurrence of "chapter". There again it may
not. You have to understand something about the semantics of XML to be
sure.
Norman> XPath expressions are defined as assertions (conceptually
Norman> akin to regexps) on the element (and attribute and ...) tree
Norman> which is the result of parsing an XML document: thus all
Norman> issues of inclusions and the like are resolved before any
Norman> XPath expressions are asserted.
No, they are not.
If I write an XSD which defines a tag "<include>" then XPath will not
take account of this and include the sub-document. This is because
understanding that the effect of typing "include" will be to add in a
sub-document clearly requires knowledge of the semantics of the tags.
>> Besides which how big an thing is this? A regexp could achieve
>> the same end, and would be just as easy to write.
Norman> Manipulating LaTeX documents (or XML ones) using regexps is
Norman> blue murder, unless you have extremely constrained
Norman> documents, which means extremely disciplined authors.
I suspect that many of the difficulties in manipulating latex
comes from its semantic complexity. For the same reason. Saying "find
the third occurrence of "\section" in this file" is not hard. Saying
"find the third section in the document" is much harder. What if there
is a \input? What if someone has done
\newcommmand{\irritatingsection}{\section}?
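To make the contrast concrete, the file-level search really is a few
lines of Perl; the questions above are exactly the cases it cannot see:

#!/usr/bin/perl -w
# Naive sketch: report where the third literal \section occurs in the
# files given on the command line. It knows nothing about \input'd
# files or macros that hide or rename \section -- which is the point.
use strict;

my $count = 0;
while (my $line = <>) {
    while ($line =~ /\\section\b/g) {
        if (++$count == 3) {
            print "third \\section at line $. of $ARGV\n";
            exit 0;
        }
    }
}
print "fewer than three \\section commands found\n";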
>> I don't know. Not for a long time anyway. The main reason that I
>> moved toward latex is because it is stable. A 10 year old
>> document will compile today. My experience with XML technologies
>> is that stability is not something you could accuse them of.
Norman> The XML auxiliary technologies like Schemas, XPointer, blah,
Norman> are still being developed, but the core technology of XML is
Norman> essentially SGML, and you don't get more stable than that
If you don't have the auxiliary technologies then you have nothing
at all, except some angle brackets. Even the parser technologies seem
to change rapidly.
Don't get me wrong here, I have nothing against XML technologies
(well, I do...they keep changing, and XSLT is the hell-spawn of
Beelzebub if you ask me). Having a universal syntax is fine, but I
don't think it's as big a step as some claim.
>> The other issue with docbook is that its entirely a logical
>> markup. Tex gives you the ability to do arbitrary things.
Norman> That is the _really_ bad thing about (La)TeX from some
Norman> points of view. It allows authors to be clever, and thus
Norman> gives them enough rope to hang you (as the document
Norman> processor) very high.
Norman> This doesn't mean that it's bad, just that it falls nicely
Norman> between the two stools of structure and flexibility
Well I would agree with this. Personally I think that this will remain
so for a long time. Document presentation is just very hard, and needs
to be very flexible.
Phil
t...@maths.tcd.ie (Timothy Murphy) writes:
>Norman Gray <nor...@astro.gla.ac.uk> writes:
>>It [xmltex] can't do
>>anything much with MathML, if I recall correctly, but if the maths in
>>your XML document is TeX-style, there's no problem.
>How do you put TeX-style maths in an XML document?
Glibly:
<maths>E=mc^2</maths>
...plus a <!NOTATION ...> declaration and a suitable declaration for <maths>
if you feel like making things legal, which I can't recall off the top
of my head.
How you process that is, of course, another problem entirely, and does
give you a bit of a headache for those output formats which aren't
TeX-based, but it's a perfectly well-defined problem, which is the
point.
Allin Cottrell's DBTeXMath is one way of managing the various
conversions involved, I've home-rolled others.
Phillip Lord <p.l...@russet.org.uk> writes:
> Norman> XPath expressions are defined as assertions (conceptually
> Norman> akin to regexps) on the element (and attribute and ...) tree
> Norman> which is the result of parsing an XML document: thus all
> Norman> issues of inclusions and the like are resolved before any
> Norman> XPath expressions are asserted.
>No they are not.
>In I write a XSD which defines a tag "<include>" then Xpath will not
>take account of this and include the sub-document. This is because the
>to understand that the effect of typing "include" will be to add in a
>sub-document clearly requires knowledge of the semantics of the tags.
Well, yes, I see what you mean, here. But that's surely rather a special
case. Element types such as your <include> are essentially indications
to the document processor to `change the element tree at this point',
and so any searches on the document structure will inevitably change in
effect depending on whether they're defined as happening before or
after that edit.
XPath expressions are defined as acting on essentially syntactic
elements as manifested in an element tree; if the XPath engine chooses
to change that tree before evaluating the expressions (possibly as a
result of processing that tree in some way), then the result of that
evaluation will be different than if it had not changed the tree. So
there is a semantic _aspect_ to this, in the sense that the XPath
engine may or may not have been prompted to edit the tree, but (a) I'm
not sure how LaTeX is any different to XML here, and (b) there's a
confusing contribution from that long-debated XML auxiliary standard
XPurposes.
The only fully general way of identifying the third paragraph in the
fifth chapter is to print the damn thing out and count them.
>I suspect that many of the difficulties in manipulating latex
>comes from its semantic complexity. For the same reason. Saying "find
>the third occurrence of "\section" in this file" is not hard. Saying
>"find the third section in the document" is much harder. What if there
>is a \input? What if someone has done
>\newcommmand{\irritatingsection}{\section}?
Or indeed
\renewcommand\input[1]{I decided not to include #1 here\dots}
I think you're right about LaTeX's complexity. Another way of putting
that might be that the underlying problem is that TeX is
turing-complete, whereas XML declares no processing at all!
> Phillip Lord <p.l...@russet.org.uk> writes:
> . . .
> >I suspect that many of the difficulties in manipulating latex
> >comes from its semantic complexity. For the same reason. Saying "find
> >the third occurrence of "\section" in this file" is not hard. Saying
> >"find the third section in the document" is much harder. What if there
> >is a \input? What if someone has done
> >\newcommmand{\irritatingsection}{\section}?
>
> Or indeed
>
> \renewcommand\input[1]{I decided not to include #1 here\dots}
But XML, independent of any document type, provides a method for
inclusions. If that is used, then the document instance is the whole
thing, and it is trivial to create an XML tool operating on a linear
parse stream to locate the third <par> in the second <section>.
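A minimal sketch of such a stream tool, using XML::Parser (expat) as
one parser choice and the hypothetical <section>/<par> names from the
paragraph above:

#!/usr/bin/perl -w
# Sketch: print the character data of the third <par> inside the
# second <section>, working from the linear parse stream alone.
use strict;
use XML::Parser;

my ($sections, $pars, $wanted) = (0, 0, 0);

my $p = XML::Parser->new(Handlers => {
    Start => sub {
        my (undef, $el) = @_;
        if ($el eq 'section') { $sections++; $pars = 0; }
        elsif ($el eq 'par' and $sections == 2) {
            $pars++;
            $wanted = 1 if $pars == 3;
        }
    },
    End  => sub { my (undef, $el) = @_; $wanted = 0 if $el eq 'par'; },
    Char => sub { my (undef, $txt) = @_; print $txt if $wanted; },
});
$p->parsefile($ARGV[0]);
print "\n";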
> I think you're right about LaTeX's complexity. Another way of putting
> that might be that the underlying problem is that TeX is
> turing-complete, whereas XML declares no processing at all!
It is possible to write well-structured LaTeX, say article, documents
that are almost functionally equivalent to document instances of a
hypothetical XML document type "LaTeXArticle". But within the bounds
of observed use of LaTeX, it is also possible to stray very far from
that. The resulting typesetting quality is largely orthogonal to
these different choices of markup style.
An author is likely to find that her well-structured LaTeX source ages
better than her source with loose structure, possibly even with TeX
primitives and catcode manipulation.
On the other hand, it is a very small step from well-structured LaTeX
article source to LaTeXArticle, and taking that step opens a whole new
world of re-usability.
It's simply silly not to go there.
-- Bill
> Ralph Furmaniak <sug...@sympatico.ca> wrote:
> > What a coincidence! I know of a few TeX to mml convertors, but not
> >the other way around.
>
> Would you share your knowledge with us?
>
> Andre'
I think I found what he meant: it is TtM, written by Ian Hutchinson,
who also wrote TtH. There was a free version in beta state available
some years ago, then it vanished. To my surprise it is available
again.
Uwe
> My question, for those that are interested in such things, is this;
> if you have already invested in using a semantically rich and
> readable mark-up language such as LaTeX (with your own macros for
> semantic elements), which gives you the best available typesetting
> for free, is it worth investing the extra effort in moving to
> DocBook or some other xml DTD, then introducing the extra step
> (conversion from DocBook to TeX) for rendering to paper or PDF? Is
> there a tangible gain, and is it worth the extra work?
Yes, though perhaps for some other document type. Stay tuned.
> Or should the increasing interest in XML just be seen as evidence
> that other logical/semantic markup systems such as LaTeX were the
> right approach all along?
Also yes insofar as well-structured LaTeX markup is almost the same
thing as markup under a suitable corresponding XML document type.
> I understand that XSLT -> TeX is only one method of rendering
> DocBook/xml to PDF or PS or whatever, and that FO is another.
> I also understand that various tools both commercial and free
> are in various stages of development / beta, as discussed here
> and on Peter's other thread on comp.text.xml. I was just wondering
> if any of the experts out there care to share their thoughts?
> What should a technical author who manages the documentation
> for a software company's API be keeping himself aware of? :-)
This discussion is mostly about the future.
But note: now that MathML is about to get serious browser deployment,
the idea of being able to translate _new_ LaTeX documents into suitable
web formats becomes an even more serious idea.
It will be interesting to see what Leslie Lamport has to say at the
MathML Conference in Chicago at the end of June.
-- Bill
>> If I write an XSD which defines a tag "<include>" then XPath will
>> not take account of this and include the sub-document. This is
>> because understanding that the effect of typing "include"
>> will be to add in a sub-document clearly requires knowledge of
>> the semantics of the tags.
Norman> Well, yes, I see what you mean, here. But that's surely
Norman> rather a special case.
In one way, yes, it is a special case. Perhaps this is a blindingly
obvious point, but XML tools will only operate over XML...that is they
can only operate generically on XML syntax. The semantics of data sets
are normally much more complex.
Don't get me wrong here. I am a bioinformaticist. We have an
incredible plethora of syntaxes, some of them very complex. Often many
of them describing semantically identical data, and often the
semantics is very simple. It's a total pain in the ass. If everyone
used XML it would solve many of these problems. But not all of them,
particularly not if we ended up with a plethora of different DTDs!
>> I suspect that many of the difficulties in manipulating latex
>> comes from its semantic complexity. For the same reason. Saying
>> "find the third occurrence of "\section" in this file" is not
>> hard. Saying "find the third section in the document" is much
>> harder. What if there is a \input? What if someone has done
>> \newcommmand{\irritatingsection}{\section}?
Norman> Or indeed
Norman> \renewcommand\input[1]{I decided not to include #1
Norman> here\dots}
Norman> I think you're right about LaTeX's complexity. Another way
Norman> of putting that might be that the underlying problem is that
Norman> TeX is turing-complete, whereas XML declares no processing
Norman> at all!
Just so....
Phil
> > The more serious current issue is just how TeX-based authoring -- for
> > example the whole body of academic mathematical literature that is
> > presently TeX-based, as represented in part by the ArXiv repository at
> > Cornell (http://www.arxiv.org/) (formerly at LANL) -- will undertake
> > to address MathML.
>
> It shouldn't address MathML. Not only is the conversion practically
> impossible,
for most legacy documents at ArXiv, yes, conversion is impractical.
> it also wouldn't lead to any significant advantage from a
> scientific point of view.
Eh? To grok ArXiv documents you need either to have eyes or to be
TeX, the Program (or pdftex, or omega, ...). You think this is not a
significant disadvantage from the standpoint of the dissemination of
scientific knowledge? And you don't think that the scientific
community was massively dissed in 1995 when the browser makers coerced
W3 into tossing out HTML-3.0 math? And you have no interest in being
able to search inside a document for symbols?
> Why change when the existing system does the job already?
Why have we already changed since 1995 from DVI to PDF as the
preferred typesetting output?
-- Bill
William> Norman Gray <nor...@astro.gla.ac.uk> writes:
>> Phillip Lord <p.l...@russet.org.uk> writes: . . .
>> > I suspect that many of the difficulties in manipulating latex
>> > comes from its semantic complexity. For the same reason. Saying
>> > "find the third occurrence of "\section" in this file" is not
>> > hard. Saying "find the third section in the document" is much
>> > harder. What if there is a \input? What if someone has done
>> > \newcommmand{\irritatingsection}{\section}?
>>
>> Or indeed
>>
>> \renewcommand\input[1]{I decided not to include #1 here\dots}
William> But XML, independent of any document type, provides a
William> method for inclusions. If that is used, then the document
William> instance is the whole thing, and it is trivial to create an
William> XML tool operating on a linear parse stream to locate the
William> third <par> in the second <section>.
Yes I agree. XML is in fact not entirely a syntax. It has some
semantics there.
Of course the semantics of XML inclusion might not be good enough for
a document description language. Perhaps you want something equivalent
to \include, rather than the \input?
Besides, you are still using the semantics of the XML in this case. You
can only know that the third para of the second chapter in the final
document is represented by the third para tag within the second
chapter tag if you understand something about the processing step
between them.
William> An author is likely to find that her well-structured LaTeX
William> source ages better than her source with loose structure,
William> possibly even with TeX primitives and catcode manipulation.
William> On the other hand, it is a very small step from
William> well-structured LaTeX article source to LaTeXArticle, and
William> taking that step opens a whole new world of re-usability.
William> It's simply silly not to go there.
I am not sure that the whole new world of re-usability is as large as
you think it is. Only time will tell.
Phil
> In one way, yes, it is a special case. Perhaps this is a blindly
> obvious point, but XML tools will only operate over XML...that is they
> can only operate generically on XML syntax. The semantics of data sets
> are normally much more complex.
Yes. Leaving XML ideas like schemas aside, however, for a given XML
document type one may use a processing umbrella in the language of
one's choice to do exactly what one wants to do. In fact, go beyond
stylesheets.
For example, SP (http://www.jclark.com/) is a C++ _library_.
For example, with sgmlspl (CPAN) the Perl handlers for events in an
ESIS stream can be as complicated as arbitrary Perl code or as simple
as ASP-like substitutions.
In particular, one can write tools that do data-level validation very
specific to one's particular needs. (Just don't expect to use a "web
browser" as the engine for such processing.)
-- Bill
But does s/SGML/XML/g ?
<g>
I might just be saying this due to lack of experience, but it could be
interesting and even perhaps useful to have a version of tex that, before
generating the output, prints out a tex file with all of the inputs,
macros, and definitions applied. Just the basics, which can then be more
easily used by external programs.
> But an XML document type provides a structured markup vocabulary that
> can be subjected to free-standing validation. An XML document type
> is much more than a syntax. However, it is true that there is no
> canonical formatting of a given XML document type.
It seems to me that a Yacc grammar (say) could achieve the same as a
DTD or an XML schema. So it's just syntax.
IMHO the real problem is to grok what it means when \author{John
Smith} or <au>John Smith</au> are being read by the program. It is
easier to write a program that reads <au>John Smith</au> because one
can use an existing XML parser, but in the end, the problem is to know
what it means for a string to be marked up as `author'. The word
tells something to a human, but not all element names (or macro names)
do.
Moving from TeX to XML could eliminate the quite nasty problems of
parsing TeX source that uses low-level markup like changing the \
character into something else or suchlike.
kai
--
Silence is foo!
First of all, the above statement is not exactly true (I, for one,
still work in DVI because of the inverse search feature, which gives
much easier correction capabilities). Second, DVI and PDF are
essentially functionally equivalent (for sure, more than, say, (La)TeX
and XML), so it's quite a different situation.
--
Giuseppe "Oblomov" Bilotta
Axiom I of the Giuseppe Bilotta
theory of IT:
Anything is better than MS
> IMHO the real problem is to grok what it means when \author{John
> Smith} or <au>John Smith</au> are being read by the program. It is
> easier to write a program that reads <au>John Smith</au> because one
> can use an existing XML parser,
Not only existing XML parsers but also existing XML processing
frameworks.
> Moving from TeX to XML could eliminate the quite nasty problems of
> parsing TeX source that uses low-level markup like changing the \
> character into something else or suchlike.
The input front end of the GELLMU project enables one to use
LaTeX-like markup for writing XML, at the same time providing
\newcommand-style macros that take arguments.
-- Bill
Not to mention that existing DVI viewers are at least an order of
magnitude faster than PDF viewers.
Kasper
No, not at all. I have eyes (most of us do, in case you missed that
;-) ). And TeX is available for free for all platforms on the
planet. See also below.
> And you don't think that the scientific community was massively
> dissed in 1995 when the browser makers coerced W3 into tossing out
> HTML-3.0 math?
No. Nobody would author in HTML-3.0 math anyway, for the same reasons
that MathML is impossible to edit by hand. HTML-3.0 math was not a
serious alternative to TeX, so it was good to drop it altogether.
> And you have no interest in being able to search
> inside a document for symbols?
No. I want to search for keywords and text in the abstract of papers,
not for individual symbols. This is already possible with all papers
at arXiv.org. A well-written abstract and title is a much, much better
alternative than the option of brute force searching for symbols in
equations.
Kasper
And they don't lock the DVI file (ok, I know that only the Win32
version of AcroRead does this, but GS has rather poor Type1 rendering
anyway ...)
> William F. Hammond wrote:
> >
> > > Why change when the existing system does the job already?
> >
> > Why have we already changed since 1995 from DVI to PDF as the
> > preferred typesetting output?
>
> First of all, the above statement is not exactly true (I, for one,
ArXiv changed (though one can still get DVI). Don't we imagine that
ArXiv changed because PDF is what users outside the TeX world have?
> still work in DVI because of the inverse search features which gives
> much easier correction capabilities).
This is certainly useful, but isn't it based on dvi specials?
And aren't dvi anchors also still based on dvi specials? Is DVI
still really supported? (Don't misinterpret: I often prefer DVI.)
> Second, DVI and PDF are
> essentially functionally equivalent (for sure, more than, say, (La)TeX
> and XML), so it's quite a different situation.
No and no. 1. A DVI reader requires a full font installation,
pragmatically a TeX installation. 2. LaTeXArticle and
well-structured actual LaTeX are also functionally equivalent.
If, otherwise, it is true (I'm not sure) that DVI and PDF are
functionally equivalent, then shouldn't it be possible to have a
sleek PDF reader parallel to xdvi?
-- Bill
What do you mean by "still really supported"?
> > Second, DVI and PDF are
> > essentially functionally equivalent (for sure, more than, say, (La)TeX
> > and XML), so it's quite a different situation.
>
> No and no. 1. A DVI reader requires a full font installation,
> pragmatically a TeX installation. 2. LaTeXArticle and
> well-structured actual LaTeX are also functionally equivalent.
>
> If, otherwise, it is true (I'm not sure) that DVI and PDF are
> functionally equivalent, then shouldn't it be possible to have a
> sleek PDF reader parallel to xdvi?
Like xpdf?
[...]
% > still work in DVI because of the inverse search features which gives
% > much easier correction capabilities).
%
% This is certainly useful, but isn't it based on dvi specials?
% And aren't dvi anchors also still based on dvi specials? Is DVI
% still really supported? (Don't misinterpret: I often prefer DVI.)
I'm not sure what your point is. You could think of dvi specials
as equivalent to pdf dictionaries. Is PDF not really supported because
many useful features are implemented using dictionaries?
% If, otherwise, it is true (I'm not sure) that DVI and PDF are
% functionally equivalent, then shouldn't it be possible to have a
% sleek PDF reader parallel to xdvi?
If by sleek you mean smallish: on my machine, xdvi is 466,360 bytes and
xpdf is 377,948 bytes. xpdf doesn't support every feature of acrobat,
and it requires libt1 and libttf which push the size up by another
few hundred kb, but it's still fairly svelte by the standards of pdf
readers.
--
Patrick TJ McPhee
East York Canada
pt...@interlog.com
Looks like binary only. Anyway, I'll have a look - or does anybody know
whether it works well?
Andre'
>On the other hand, it is a very small step from well-structured LaTeX
>article source to LaTeXArticle, and taking that step opens a whole new
>world of re-usability.
>It's simply silly not to go there.
Does that include a LaTeX article with mathematics?
--
Timothy Murphy
e-mail: t...@maths.tcd.ie
tel: 086-233 6090
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
> "William F. Hammond" <ham...@whitehead.math.albany.edu> writes:
>
> >On the other hand, it is a very small step from well-structured LaTeX
> >article source to LaTeXArticle, and taking that step opens a whole new
> >world of re-usability.
> >It's simply silly not to go there.
>
> Does that include a LaTeX article with mathematics?
Yes. For specifics of the GELLMU sketch see the Weierstrass product
of the Gamma function in my TUG 2001 slides:
http://math.albany.edu:8000/math/pers/hammond/Presen/tug2001/ .
-- Bill
Not really; from experience I can inform you that most if not all
physicists in theoretical particle physics have a full TeX
installation themselves, and use either xdvi or ghostview.
One reason for offering PDF is to enable physicists visiting weird
places with basic Windows-only machines (like internet cafes or
institutes taken over by Microsoft 'sponsoring') to at least get some
work done by being able to view/print these papers with IE and its
standard helpers. It would be incorrect to conclude from this that PDF
viewers are actually what people prefer to use if there is a choice.
Kasper
> > Eh? To grok ArXiv documents you need either to have eyes or to be
> > TeX, the Program (or pdftex, or omega, ...). You think this is not a
> > significant disadvantage from the standpoint of the dissemination of
> > scientific knowledge?
>
> No, not at all. I have eyes (most of us do, in case you missed that
> ;-) ). And TeX is available for free for all platforms on the
> planet. See also below.
I need to be in front of a suitable screen to read pdf. Sometimes I
just want to do a quick lookup, and robust HTML renditions are great
for that -- especially if I'm confined to a vt100 window, because
sometimes where I am I can't get directly into MathSciNet and X11 is
much too slow.
> No. Nobody would author in HTML-3.0 math anyway, for the same reasons
> that MathML is impossible to edit by hand. HTML-3.0 math was not a
> serious alternative to TeX, so it was good to drop it altogether.
I agree that HTML, with or without any form of math markup, should be
auto-spun from a more suitable language for authors.
> > And you have no interest in being able to search
> > inside a document for symbols?
>
> No. I want to search for keywords and text in the abstract of papers,
> not for individual symbols. This is already possible with all papers
> at arXiv.org. A well-written abstract and title is a much, much better
> alternative than the option of brute force searching for symbols in
> equations.
You did quote me saying "inside a document". I meant while I am reading
a specific document on the screen.
My meaning of "symbol" was, however, somewhat opaque. An example is
whatever notation I might use, for example, for the unitary normalizer
of the group of unitary Heisenberg operators on the Hilbert space of 3
dimensional Euclidean space. In my preamble I want something like
\newcommand{\Rn}[1]{\mathbb{R}^{#1}}
\newcommand{\unho}[1]{\mbox{Aut}_{U}(\mbox{L}^2(#1))}
-- perfectly standard -- and then
\mathsym{\unho3}{\unho{\Rn{3}}}
(Bear in mind that this is LaTeX-like markup for writing SGML.)
In my article body when I enter $\unho3$ I get the typesetting I want
but also with LaTeXArticle the name "unho3" is a search key that is
automatically available at every invocation.
Let's look ahead a bit. Today's Mozilla-style MathML rendering
foreshadows what _could_ be done with "stylesheets". Then the idea is
that when the reading user probes a typeset instance of $\unho3$ with
her mouse, she gets a cloud revealing the search key.
This just involves searching for instances of an XML element called
"Sym" having attribute "key" with value equal to the author's search
key in this case "unho3".
In fact, I'm inclined to assume that this user is simultaneously using
a paper typeset version.
Here's complete buildable GELLMU source for a demo:
-----
\documenttype{article}
\newcommand{\Rn}[1]{\mathbb{R}^{#1}}
\newcommand{\unho}[1]{\mbox{Aut}_{U}(\mbox{L}^2(#1))}
\mathsym{\unho3}{\unho{\Rn{3}}}
\title{Mathsym Demo}
\begin{document}
\[ \unho3 \]
\end{document}
-----
-- Bill
> \newcommand{\unho}[1]{\mbox{Aut}_{U}(\mbox{L}^2(#1))}
To match the text description this should have been:
\newcommand{\unho}[1]{\mbox{Aut}_{U}(\mbox{Hs}(#1))}
(The argument can be any locally compact abelian group.)
-- Bill
I cannot imagine that I would ever want to search for symbols that
way, but if you say you need it then you apparently have different
needs.
(If the user really wanted to know about "the unitary normalizer of
the group of unitary Heisenberg operators on the Hilbert space of 3
dimensional Euclidean space", why didn't she just look it up in the
index or table of contents? If a document is so badly written that the
only way to figure out the meaning of a symbol is to search for it by
brute force, I'm not sure whether it is worth reading at all.)
Kasper
> > Then the idea is that when the reading user probes a typeset
> > instance of $\unho3$ with her mouse, she gets a cloud revealing the
> > search key.
>
> I cannot imagine that I would ever want to search for symbols that
> way, but if you say you need it then you apparently have different
> needs.
Different individuals work differently. The idea of symbol-based
searching was first brought to my attention at an AMS meeting in 1993,
though I still don't know of any current broad-based implementation.
> (If the user really wanted to know about "the unitary normalizer of
> the group of unitary Heisenberg operators on the Hilbert space of 3
> dimensional Euclidean space", why didn't she just look it up in the
> index or table of contents?
It might not be important enough for a sectional title in the context
of a given article.
There would be a TOC entry only if there was a sectional title not
below specified depth. Few journal articles, even long ones, have
indices.
Of course, if \mathsym were to be used, it could be the basis for a
compiled index of symbols.
It's just one example of a usability feature that becomes cheap with
LaTeXArticle.
There is also the whole arena of re-usability.
Think of markup languages as the objects in a category. It's in the
interest of ArXiv to be able to handle objects that lend themselves to
as many emanating arrows as might someday be desired. This is what
makes the XML subcategory so important.
-- Bill
> (If the user really wanted to know about "the unitary normalizer of
> the group of unitary Heisenberg operators on the Hilbert space of 3
> dimensional Euclidean space", why didn't she just look it up in the
> index or table of contents? If a document is so badly written that the
> only way to figure out the meaning of a symbol is to search for it by
> brute force, I'm not sure whether it is worth reading at all.)
Yahoo and Google are both useful. So searching in the table of
contents and searching in the full text are also both useful.