
YACC or Bison grammar for TeX or LaTeX


Justin R. Smith

Dec 5, 1998
Do these exist anywhere? I want to write a C++ program that can parse TeX.

--
______________________________________________________________________
|
Time blows wildly against my door | Justin R. Smith
Stirring discarded sorrows | Department of Mathematics and
Like dead leaves of summers past | Computer Science
Memories of forgotten lore 11/21/98 | Drexel University
Making way for new tomorrows | Philadelphia, PA 19104
New hopes, new fears, |
and new ways that last | Office: (215) 895-1847
|
© Justin R. Smith, March 14, 1994 | Fax: (215) 895-1582

Home page: http://www.mcs.drexel.edu/~jsmith


James Kilfiger

Dec 5, 1998
Justin R. Smith wrote:
> Do these exist anywhere? I want to write a C++ program that can parse TeX.

It doesn't exist, and probably can't. TeX is a macro expansion
language, and catcodes (etc., etc.) can be changed arbitrarily, so
any sequence of characters has the potential to be valid TeX.
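To make the catcode point concrete, here is a minimal C++ sketch of TeX's first stage: turning characters into tokens under a category-code table that the document itself may change. The class `Tokenizer` and its `setCatcode` method are hypothetical names for illustration; the catcode numbers (0 escape, 1 begin-group, 2 end-group, 10 space, 11 letter, 12 other) are TeX's own. It deliberately ignores end-of-line handling, `^^` notation, comments and the skipping of spaces after control words.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// A few of TeX's category codes.
enum Cat : int { Escape = 0, BeginGroup = 1, EndGroup = 2, Space = 10, Letter = 11, Other = 12 };

struct Token {
    bool isControlSequence;  // true for \name, false for a character token
    std::string text;        // control-sequence name, or the single character
    int cat;                 // category code (meaningful for character tokens only)
};

// Hypothetical, heavily simplified tokenizer driven by a mutable catcode table.
class Tokenizer {
public:
    Tokenizer() {
        cat_.fill(Other);
        for (char c = 'a'; c <= 'z'; ++c) cat_[static_cast<unsigned char>(c)] = Letter;
        for (char c = 'A'; c <= 'Z'; ++c) cat_[static_cast<unsigned char>(c)] = Letter;
        cat_['\\'] = Escape;
        cat_['{'] = BeginGroup;
        cat_['}'] = EndGroup;
        cat_[' '] = Space;
    }

    // Roughly what \catcode`c=n does; in real TeX this can happen mid-document.
    void setCatcode(char c, int n) { cat_[static_cast<unsigned char>(c)] = n; }

    std::vector<Token> tokenize(const std::string& in) const {
        std::vector<Token> out;
        std::size_t i = 0;
        while (i < in.size()) {
            char c = in[i];
            if (catOf(c) == Escape && i + 1 < in.size()) {
                std::size_t j = i + 1;
                if (catOf(in[j]) == Letter) {
                    // Control word: escape followed by a maximal run of letters.
                    while (j < in.size() && catOf(in[j]) == Letter) ++j;
                } else {
                    // Control symbol: escape followed by exactly one non-letter.
                    ++j;
                }
                out.push_back({true, in.substr(i + 1, j - i - 1), Escape});
                i = j;
            } else {
                out.push_back({false, std::string(1, c), catOf(c)});
                ++i;
            }
        }
        return out;
    }

private:
    int catOf(char c) const { return cat_[static_cast<unsigned char>(c)]; }
    std::array<int, 256> cat_{};
};
```

With the default table, `\@tempa` comes out as the control symbol `\@` followed by five letter tokens; after `setCatcode('@', Letter)`, which is roughly what LaTeX's `\makeatletter` does, the same seven characters form the single control word `\@tempa`. A grammar written over characters cannot see that difference, because it depends on state set earlier in the input.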

James


Simon Cozens

Dec 5, 1998
James Kilfiger (comp.text.tex):

Yes, I've tried something similar, and found this out. The only way to
parse TeX, IMHO, is like TeX parses TeX. Have a look at TeX: The Program.
(Or weave and print out the web file)

--
In most countries selling harmful things like drugs is punishable.
Then how come people can sell Microsoft software and go unpunished?
(By ha...@rost.abo.fi, Hasse Skrifvars)

Thorsten Ohl

Dec 8, 1998
Simon Cozens <pemb...@sable.ox.ac.uk> writes:

> Yes, I've tried something similar, and found this out. The only way to
> parse TeX, IMHO, is like TeX parses TeX.

If you restrict yourself to LaTeX (catcode changes are verboten!),
then you can parse it in a more regular fashion.

Hevea (ftp://ftp.inria.fr/INRIA/Projects/para/hevea) does a good job.
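A rough idea of what "parsing in a more regular fashion" can mean: if catcodes are fixed, there is no `\def` or `\catcode` in the input, every command's number of arguments is known in advance, and arguments are always braced, then an ordinary recursive-descent parser is enough. The sketch below assumes exactly that hypothetical subset; `SubsetParser`, `Node` and the arity table are illustrative names and are not taken from Hevea.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Parse-tree node for a hypothetical fixed-catcode LaTeX subset.
struct Node {
    enum Kind { Text, Command, Group } kind;
    std::string value;                        // text content or command name
    std::vector<std::unique_ptr<Node>> kids;  // command arguments or group contents
};

class SubsetParser {
public:
    // Hypothetical arity table: command name -> number of braced arguments.
    explicit SubsetParser(std::map<std::string, int> arity) : arity_(std::move(arity)) {}

    std::vector<std::unique_ptr<Node>> parse(const std::string& src) {
        s_ = src;
        pos_ = 0;
        auto nodes = parseSequence();
        if (pos_ != s_.size()) throw std::runtime_error("unbalanced '}'");
        return nodes;
    }

private:
    // sequence := (text | command | group)*, stopping at '}' or end of input
    std::vector<std::unique_ptr<Node>> parseSequence() {
        std::vector<std::unique_ptr<Node>> nodes;
        while (pos_ < s_.size() && s_[pos_] != '}') {
            if (s_[pos_] == '\\') nodes.push_back(parseCommand());
            else if (s_[pos_] == '{') nodes.push_back(parseGroup());
            else nodes.push_back(parseText());
        }
        return nodes;
    }

    // group := '{' sequence '}'
    std::unique_ptr<Node> parseGroup() {
        ++pos_;  // consume '{'
        auto g = std::make_unique<Node>();
        g->kind = Node::Group;
        g->kids = parseSequence();
        if (pos_ >= s_.size()) throw std::runtime_error("missing '}'");
        ++pos_;  // consume '}'
        return g;
    }

    // command := '\' letters group{arity}, with arity looked up (default 0)
    std::unique_ptr<Node> parseCommand() {
        std::size_t start = ++pos_;  // skip '\'
        while (pos_ < s_.size() && std::isalpha(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (pos_ == start && pos_ < s_.size()) ++pos_;  // control symbol such as \%
        auto c = std::make_unique<Node>();
        c->kind = Node::Command;
        c->value = s_.substr(start, pos_ - start);
        auto it = arity_.find(c->value);
        int n = (it == arity_.end()) ? 0 : it->second;
        for (int k = 0; k < n; ++k) {
            while (pos_ < s_.size() && s_[pos_] == ' ') ++pos_;  // spaces before an argument
            if (pos_ >= s_.size() || s_[pos_] != '{')
                throw std::runtime_error("expected '{' after \\" + c->value);
            c->kids.push_back(parseGroup());
        }
        return c;
    }

    // text := run of characters other than '\', '{' and '}'
    std::unique_ptr<Node> parseText() {
        std::size_t start = pos_;
        while (pos_ < s_.size() && s_[pos_] != '\\' && s_[pos_] != '{' && s_[pos_] != '}') ++pos_;
        auto t = std::make_unique<Node>();
        t->kind = Node::Text;
        t->value = s_.substr(start, pos_ - start);
        return t;
    }

    std::map<std::string, int> arity_;
    std::string s_;
    std::size_t pos_ = 0;
};
```

The arity table is where this approach runs out: primitives such as `\ifnum` or `\hskip` take operands that are not braced groups, and a `\newcommand` or `\def` in the document changes the table while the input is being read.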
--
Thorsten Ohl, Physics Department, TU Darmstadt -- o...@hep.tu-darmstadt.de
http://heplix.ikp.physik.tu-darmstadt.de/~ohl/ [<=== PGP public key here]

Donald Arseneau

Dec 8, 1998
In article <ueg1aqy...@heplix4.ikp.physik.tu-darmstadt.de>, Thorsten Ohl <ohl@*RemoveTheStars*hep.tu-darmstadt.de> writes...

>Simon Cozens <pemb...@sable.ox.ac.uk> writes:
>
>> Yes, I've tried something similar, and found this out. The only way to
>> parse TeX, IMHO, is like TeX parses TeX.
>
>If you restrict yourself to LaTeX (catcode changes are verboten!),
>then you can parse it in a more regular fashion.

This is obviously not true. As a counterexample, take the bit I
showed yesterday (?) from cite.sty:

\@tempskipa\lastskip \edef\@tempa{\the\@tempskipa}\unskip
\ifnum\lastpenalty=\z@ \penalty\@highpenalty \fi
\ifx\@tempa\@zero@skip \spacefactor1001 \fi % if no space before, set flag
\ifnum\spacefactor>\@m \ \else \hskip\@tempskipa \fi

If you parse in a regular fashion, but not like TeX, you are restricted
to a regularized subset of TeX *and* LaTeX. The snippet shown is used
in LaTeX. If you can't parse the TeX definitions of LaTeX packages, then
you have to rewrite those packages in some other form (perl, caml...)
to support them. That's a far cry from automatic document conversion.

Does hevea (?) support the array package, including \newcolumntype?
How about xspace?

Donald Arseneau as...@triumf.ca

Thorsten Ohl

Dec 9, 1998
as...@reg.triumf.ca (Donald Arseneau) writes:

> >If you restrict yourself to LaTeX (catcode changes are verboten!),
> >then you can parse it in a more regular fashion.
>
> This is obviously not true.

Sorry. I should have said `LaTeX as in Lamport's Book' (w/ some
popular extensions).

> That's a far cry from automatic document conversion.

Definitely.

> Does hevea (?) support the array package, including \newcolumntype?
> How about xspace?

AFAIK, no.

Here's the blurb from the HEVEA people:

ADVERTISEMENT

HEVEA is a LaTeX to HTML translator. The input language is a fairly
complete subset of LaTeX2e (old LaTeX style is also accepted) and the
output language is HTML that is (hopefully) correct with respect to
version 3.2.

Exotic symbols are translated into symbols
pertaining to the symbol font of the HTML browser, using the
non-standard FACE attribute of the FONT tag.
This allows the translation to HTML of quite a lot of the symbols
used in LaTeX.

HEVEA understands LaTeX macro definitions. Simple user style
files are understood with little or no modifications.
Furthermore, HEVEA customization is done by writing LaTeX code.

HEVEA is written in Objective Caml, as a collection of lexers. It is
quite fast and flexible. Using HEVEA it is possible to translate large
documents such as manuals, books, etc. very quickly. All documents are
translated into one single HTML file. The output file can then be cut
into smaller files, using the companion program HACHA.

Simon Cozens

Dec 9, 1998
Simon Cozens (comp.text.tex):

> Yes, I've tried something similar, and found this out. The only way to
> parse TeX, IMHO, is like TeX parses TeX. Have a look at TeX: The Program.
> (Or weave and print out the web file)

Having said that, there's a Text::TeX perl module, so it must be possible.

--
"A word to the wise: a credentials dicksize war is usually a bad idea on the
net."
(David Parsons in c.o.l.development.system, about coding in C.)

Robin Fairbairns

Dec 10, 1998
In article <74l907$qcj$1...@aslan.lewell>,

Simon Cozens <si...@brecon.co.uk> wrote:
>Simon Cozens (comp.text.tex):
>> Yes, I've tried something similar, and found this out. The only way to
>> parse TeX, IMHO, is like TeX parses TeX. Have a look at TeX: The Program.
>> (Or weave and print out the web file)
>
>Having said that, there's a Text::TeX perl module, so it must be possible.

hence the well known phrase "the difficult we do, the impossible
requires a perl module".
--
Robin Fairbairns, Cambridge

Alan Shutko

Dec 10, 1998
>>>>> "S" == Simon Cozens <pemb...@sable.ox.ac.uk> writes:

S> Having said that, there's a Text::TeX perl module, so it must be
S> possible.

Does it work with David Carlisle's document in
<yg4soeo...@openmath.nag.co.uk>?

--
Alan Shutko <a...@acm.org> - By consent of the corrupted
Fats Loves Madelyn.

Timothy Murphy

Dec 10, 1998
Thorsten Ohl <ohl@*RemoveTheStars*hep.tu-darmstadt.de> writes:

>If you restrict yourself to LaTeX (catcode changes are verboten!),
>then you can parse it in a more regular fashion.

>Hevea (ftp://ftp.inria.fr/INRIA/Projects/para/hevea) does a good job.

As a matter of interest, does TeX/LaTeX belong to
the class of formal languages yacc/bison can parse? [Maybe LL(1)?]


--
Timothy Murphy
e-mail: t...@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
