What symbols are fundamental, and what ones are derived?
Knuth's TeX book mumbles something about "registers" that
hold things like page numbers or whatever. Where is there
a complete list of these registers, uses, and limitations?
Knuth's TeX book is an abomination, describing lexing and parsing
as mouth, gullet and stomach nonsense.
[Well, he invented most of what we know about parsing, he gets to
explain it any way he wants. Chapters 7 and 8 describe the syntax
operationally. -John]
> I've looked high and low without success. Where can I find
> something resembling the BNF of Knuth's TeX typesetting syntax?
Most of what is described in the TeXbook is plain TeX, which uses many
macros in plain.tex. I believe the book describes as primitives those
that are actually built in to the processor before any macros are
loaded (virtex).
> What symbols are fundamental, and what ones are derived?
As I mentioned earlier today in another newsgroup, TeX can't be
compiled. Among the things that can be changed are which characters
are letters and which are not. That can be changed just about up to
the point that they are read in and processed. You can't even say
what a symbol is until you have described which characters are
letters.
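
A minimal sketch of that point, assuming plain TeX. The macro name \hi and the choice of "!" are made up for illustration. The same characters \hi! are split into tokens differently depending on the category code that "!" has at the moment the line is read:

```tex
% plain TeX
\def\hi{[hi]}
\hi!                     % two tokens: the control word \hi, then the character "!"
\begingroup
  \catcode`\!=11         % inside this group, "!" counts as a letter
  \def\hi!{[hi-bang]}    % defines ONE control word whose name is "hi!"
  \hi!                   % one token: the control word named "hi!"
\endgroup
\hi!                     % "!" is "other" again: \hi followed by "!"
\bye
```

The group is there only so that the catcode change and the second definition are undone at \endgroup.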
> Knuth's TeX book mumbles something about "registers" that hold things
> like page numbers or whatever. Where is there a complete list of
> these registers, uses, and limitations?
I believe the fundamental integer registers are \count0 through
\count255, but most that are actually used are defined through macros.
The TeXbook describes them pretty well.
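
As a rough illustration, assuming plain TeX: the numbered registers can be used directly, but plain.tex normally hands them out through \newcount, and \pageno is simply a name for \count0. The register name \mycount below is hypothetical.

```tex
% plain TeX; \mycount is a hypothetical name chosen for this sketch
\newcount\mycount          % plain.tex macro that allocates a free \count register
\mycount=41
\advance\mycount by 1
\message{mycount=\the\mycount, pageno=\the\pageno, count0=\the\count0}
% terminal shows: mycount=42, pageno=1, count0=1
\count255=\mycount         % registers can also be addressed by number
\multiply\count255 by 2
\message{count255=\the\count255}
% terminal shows: count255=84
\bye
```

\count255 is used here because plain TeX treats it as a scratch register. The other register families (\dimen, \skip, \toks, \box and so on) follow the same numbered pattern.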
> Knuth's TeX book is an abomination, describing lexing and parsing as
> mouth, gullet and stomach nonsense.
That is pretty important, as in some cases macros change things
just before they are used. If you get it wrong, they are changed
too late. Consider \def\x{\y}\x: is \x defined before it is
expanded, or not? How about \def\x{\y} \x ?
In some places white space is significant, and others it is not.
That is not going to be easy to do in BNF.
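
A small sketch of the timing question, assuming plain TeX. The macros \y, \hi and \late are invented for the example. The first two lines work because TeX reads tokens on demand: the \def has been carried out before the following \x is even read. The last line shows the "too late" case, where an argument has already been turned into tokens before the catcode change happens:

```tex
% plain TeX
\def\y{Y}
\def\x{\y}\x     % \x is defined before it is read and expanded: typesets Y
\def\x{\y} \x    % also typesets Y; the space after } is a real space token
\def\hi{[hi]}
\def\late#1{\begingroup \catcode`\!=11 #1\endgroup}
\late{\hi!}      % typesets [hi]! because "!" was still "other" when the
                 % argument was read; the later catcode change cannot
                 % re-tokenise it into a control word named "hi!"
\bye
```

The space token in the third line is ignored in vertical mode but produces an interword space inside a paragraph, which is one of the places where white space matters.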
-- glen
Well, the stuff is in the TeXbook, but for most practical applications
that won't get you very far because TeX is a macro-processing
language, which has a tendency to blur (in practice although not in
theory) the syntax and the semantics of the language. Typical
user-visible features in plain or LaTeX involve many layers of macros.
When I try to parse LaTeX I usually just write a crude parser that
recognises LaTeX's special characters and gobbles up {} and []
delimited parameters. The BNF is trivial.
Adrian
> Russell Shaw wrote:
> > Knuth's TeX book mumbles something about "registers" that hold things
> > like page numbers or whatever. Where is there a complete list of
> > these registers, uses, and limitations?
>
> I believe the fundamental integer registers are \count0 through
> \count255, but most that are actually used are defined through macros.
> The TeXbook describes them pretty well.
>
> > Knuth's TeX book is an abomination, describing lexing and parsing as
> > mouth, gullet and stomach nonsense.
>
> That is pretty important, as in some cases macros change things
> just before they are used. If you get it wrong, they are changed
> too late. Consider \def\x{\y}\x: is \x defined before it is
> expanded, or not? How about \def\x{\y} \x ?
Knuth's TeXbook is a manual for using TeX. It is a wonderful book.
It is volume A of a 5 volume work whose volumes are:
A. The TeXbook
B. TeX: The Program
C. The METAFONTbook
D. METAFONT: The Program
E. Computer Modern Typefaces
Volumes A and C are manuals for TeX and METAFONT, respectively. Volumes
B and D are literate documentation of these two programs. TeX and METAFONT
were written in Knuth's WEB system of literate programming. The sources
were therefore WEB files. A WEB file can be processed in two ways. One
processor for a WEB file is called weave. When one applies weave to a
WEB file, one gets a TeX file which is a literate account of the workings
of the program in all detail. The other processor is called tangle. When
one applies tangle to a WEB file, one gets a Pascal program, which is the
program being documented by the TeX file. One gets the program to work by
compiling the Pascal program with a Pascal compiler. (Of course, to get
around the limitations of Pascal, one instead runs the Pascal program
through p2c to convert it to a C program and then compiles the C program.)
The book "TeX: The Program" is the result of taking the WEB source for
TeX, running it through weave, and then running the resulting TeX file
through TeX. This method of documenting programs is so good that the
resulting documentation is publishable, in this case as the book "TeX:
The Program". Similarly, the book "METAFONT: The Program" is the result
of giving the WEB source of METAFONT the same treatment.
So, to make a long story short, if you want details about the actual
workings of TeX and METAFONT, at a level that one can't really expect
a manual such as The TeXbook or The METAFONTbook to provide, you should
look at volumes B and D of Knuth's 5 volume masterpiece.
Volume E explains how all of Knuth's computer modern fonts were designed
in case you want to learn the art of designing professional quality fonts.
When they first appeared in the 1980s, I purchased all 5 volumes for $150
and I have never regretted it, nor have I ever stopped learning from them.
If you want to learn more about the concept of literate programming, Knuth
published an article about it and I think there is also a book on it. You
can also follow the newsgroup comp.programming.literate. There are now many
programs for literate documentation of programs. For C programs, Knuth and
Levy developed CWEB, with processors cweave and ctangle instead of weave and
tangle, and these programs have themselves been literately documented using
CWEB. One of the more popular literate programming programs was developed by
Norman Ramsey and is called noweb. It can be used to document just about any
programming language but loses the automatic formatting of code that one gets
with CWEB, although some people have developed packages to compensate for this.
--
Ignorantly,
Allan Adler <a...@zurich.csail.mit.edu>
> Where can i find something resembling the BNF of Knuth's TeX typesetting
> syntax?
>
> Knuth's TeX book is an abomination, describing lexing and parsing as
> mouth, gullet and stomach nonsense.
You have already been referred to "TeX the Program" for a detailed
explanation of the internal workings. Victor Eijkhout's "TeX by Topic"
(<http://www.eijkhout.net/tbt/>) provides another, quite readable,
introduction to TeX's parsing process (see Chapters 1--3 and 11--14).
--
Philipp Lucas
phl...@f-m.fm
>> > Knuth's TeX book is an abomination, describing lexing and parsing as
>> > mouth, gullet and stomach nonsense.
>Knuth's TeXbook is a manual for using TeX. It is a wonderful book.
>It is volume A of a 5 volume work whose volumes are:
>A. The TeXbook
>B. TeX: The Program
>C. The METAFONTbook
>D. METAFONT: The Program
>E. Computer Modern Typefaces
I had no patience for any of these books. On the other hand, The Art of
Computer Programming is excellent.
Anyway, I learned TeX with these:
Raymond Seroul and Silvio Levy's "A Beginner's Book of TeX"
(ISBN 0-387-97562-4 and ISBN 3-540-97562-4) and David Salomon's "The
Advanced TeXbook" (ISBN 0-387-94556-3).
I made my own preprocessor/format for TeX based on these books. I
remember playing a lot of games with the macro name space. For
example, only a limited character set can be used in macro names, but
I wanted to use any character. I ended up escaping: AA maps to A, AB
maps to backslash, AC to {, etc.
--
/* jha...@world.std.com AB1GO */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
> I made my own preprocessor/format for TeX based on these books. I
> remember playing a lot of games with the macro name space. For
> example, only a limited character set can be used in macro names, but
> I wanted to use any character. I ended up escaping: AA maps to A, AB
> maps to backslash, AC to {, etc.
Just about any character can be used in macro names if one really
wants to do it. LaTeX uses @ in many of its internal names, by
first changing the catcode of @ to letter, then defining macros
with @ in the names, or references to those macros. After
defining all the internal macros, it changes @ back to other.
You can also change the escape character used for macro references
from \ to something else. I believe catcode is described fairly
early in the TeXbook.
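
A minimal sketch of both tricks, assuming plain TeX, where "@" and "|" normally have category code 12 ("other"). The names \my@secret and \reveal are hypothetical.

```tex
% plain TeX; \my@secret and \reveal are hypothetical names
\catcode`\@=11             % "@" is now a letter (what LaTeX's \makeatletter does)
\def\my@secret{hidden}
\def\reveal{\my@secret}    % the body is tokenised now, while "@" is a letter
\catcode`\@=12             % back to "other" (what LaTeX's \makeatother does)
\reveal                    % still typesets "hidden": the token \my@secret
                           % was formed when \reveal was defined
\catcode`\|=0              % "|" becomes a second escape character
|reveal                    % the same control word as \reveal
\bye
```

After the catcode is restored, typing \my@secret directly would be read as \my followed by "@secret", which is why such names stay out of reach of ordinary user input.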
-- glen
>> I made my own preprocessor/format for TeX based on these books. I
>> remember playing a lot of games with the macro name space. For
>> example, only a limited character set can be used in macro names, but
>> I wanted to use any character. I ended up escaping: AA maps to A, AB
>> maps to backslash, AC to {, etc.
>Just about any character can be used in macro names if one really
>wants to do it. LaTeX uses @ in many of its internal names, by
>first changing the catcode of @ to letter, then defining macros
>with @ in the names, or references to those macros. After
>defining all the internal macros, it changes @ back to other.
This is getting off-topic, but may be useful information for those
wanting to use TeX as a back end; certainly there is a good
possibility that compiler writers would be consulted for such a task
:-)
I wanted to have any user defined string in a macro name, so it is not
practical to play evil catcode games. Incidentally, this is for
writing cross reference data as a set of macro definitions to an
auxiliary file.
I was writing the pre-processor, so it was very easy to just remap
characters in the way I describe above. Later, I learned a more
conventional way to do this using \meaning (which is more or less how
\label{} and \ref{} in LaTeX work):
Create a macro called "\a \_$" without messing with the catcodes:
\expandafter\def\csname \meaning a\meaning \meaning \\meaning _\meaning $\endcsname{test}
Expand it:
This is a \csname \meaning a\meaning \meaning \\meaning _\meaning $\endcsname.
The TeXbook would say something like:
Exercise for the reader: create macro \label{foo_} which
writes "\expandafter\def\csname \meaning f\meaning o\meaning o\meaning _\endcsname{12}"
(where 12 is the current page number) to the .aux file.
Also write the matching \ref{}.
Extra points: have \meaning appear only on characters which are not letters.
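
For comparison, a small sketch of the \csname approach, assuming plain TeX. It uses \string rather than \meaning, which is my substitution: \string turns a token into its characters, whereas \meaning a produces the text "the letter a". Ordinary character tokens such as "_" and "$" can go inside \csname ... \endcsname as they are, whatever their catcode. Active characters such as "~" would be expanded there, so they are the ones that need \string. The name "foo_~$" is only an example.

```tex
% plain TeX; the name "foo_~$" is just an example
\expandafter\def\csname foo_\string~$\endcsname{test}
This is a \csname foo_\string~$\endcsname.
% typesets: This is a test.
\bye
```

The \expandafter makes \csname build the control sequence first, so that \def sees a single token to define.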
TeX doesn't really parse at all, and its lexical entities are defined at
runtime by the input it's lexing: spelling and boundaries are defined,
and quite commonly (even for fundamental symbols) redefined several
times, in the input files themselves.
Most people would agree these would be bad things in a programming language.