
Bug in TeX ?


ks.vladimir

Mar 10, 2009, 4:24:04 PM
Try the following code:
\showthe\count10!!!

The output is:
> 0.
<to be read again>
!
<*> \showthe\count25!
!!

Look at the last two lines. They point to the character '!' after
\count25 which is an error because '!' is not part of the <internal
quantity> eaten by \showthe. Correct lines would be:

<*> \showthe\count25
!!!

It correctly works for single variables (\showthe\endlinechar) and for
countdef tokens (\countdef\x25\showthe\x), but not for registers with
indexes. Can I ask Knuth for the legendary $327.68? :-)

ks.vladimir

Mar 10, 2009, 4:30:13 PM
Hm, it seems that tex eats that '!' when parsing <number>. Indeed, the
'!' is visible in output. It seems that I was wrong this time.

Donald Arseneau

Mar 10, 2009, 5:26:15 PM
On Mar 10, 12:24 pm, "ks.vladimir" <ks.vladi...@gmail.com> wrote:
> Try the following code:
> \showthe\count10!!!
>
> The output is:
> > 0.
>
> <to be read again>
> !
> <*> \showthe\count25!
> !!
>
> Look at the last two lines. They point to the character '!' after
> \count25 which is an error because '!' is not part of the <internal
> quantity> eaten by \showthe.

I think the real error is reporting "count25" instead of
"count10"! :-)

I suspect you have sorted it out now, but the messages
are entirely correct and detailed enough to be clear.
TeX had to read the first "!" to find that it was not
another digit for the register index. It has put it aside
"to be read again". Two more exclamations are still
pending in the buffer and have not been read.

> It correctly works for single variables (\showthe\endlinechar)

It works correctly in both cases, but for \endlinechar it is
as you expected because TeX did not look for additional
digits.
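
The lookahead can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not TeX's actual scanner: tokens are single characters, the optional space that TeX skips after a number is not modelled, and the names `scan_index` and `pushed_back` are made up for the example.

```python
def scan_index(chars, pos):
    """Read decimal digits starting at pos; return (value, new_pos, pushed_back).

    The scanner cannot know that the number has ended until it has read
    one non-digit character. That character is not part of the number:
    it is handed back as `pushed_back`, like TeX's "<to be read again>".
    """
    value = 0
    while pos < len(chars) and chars[pos].isdigit():
        value = value * 10 + int(chars[pos])
        pos += 1
    pushed_back = None
    if pos < len(chars):
        pushed_back = chars[pos]  # read to detect the end, then put aside
        pos += 1                  # the input pointer has moved past it
    return value, pos, pushed_back


line = r"\showthe\count10!!!"
start = len(r"\showthe\count")
value, pos, pushed_back = scan_index(line, start)
print(value)        # the register index
print(pushed_back)  # the token "to be read again"
print(line[pos:])   # what is still unread in the buffer
```

For \endlinechar no such scan takes place, so nothing beyond the control sequence itself has been read when \showthe reports.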

Donald Arseneau

ks.vladimir

Mar 10, 2009, 6:38:06 PM
Yes, now I understand it. Thanks for the explanation.

The reason for suspecting a bug was that \showthe is executed after
the number was parsed and TeX already knows where it stops. Currently
in my implementation of a TeX parser (http://code.google.com/p/texpp/)
the \showthe execution routine gets only the tokens which form the number
itself, so emulating this TeX behavior requires additional effort.

Jonathan Fine

Mar 11, 2009, 5:53:37 AM
to ks.vladimir
Was: Re: Bug in TeX ?

ks.vladimir wrote:

> The reason for suspecting a bug was that \showthe is executed after
> the number was parsed and TeX already knows where it stops. Currently
> in my implementation of a TeX parser (http://code.google.com/p/texpp/)
> the \showthe execution routine gets only the tokens which form the number
> itself, so emulating this TeX behavior requires additional effort.

Please tell us - or at least me - more about this project. I'm
particularly interested in knowing why you wish to write it (not that I
think it is a bad idea).

--
Jonathan

ks.vladimir

Mar 11, 2009, 6:12:51 AM
On Mar 11, 12:53 pm, Jonathan Fine <J.F...@open.ac.uk> wrote:
> Please tell us - or at least me - more about this project.  I'm
> particularly interested in knowing why you wish to write it (not that I
> think it is a bad idea).
I'm writing it to solve my particular task: automatic LaTeX document
modification. A very important requirement for me is never to break
either the formatting or the meaning of the document. I've tested some of
already existing projects (LaTeX::TOM, plasTeX) but they do not meet
this requirement. I've also spent some time digging into TeX source
code, but re-using it seems to be not an easy task for me. So I've
decided to write my own solution. It may also be useful for others
(for example for implementing TeX auto-completion in a TeX editor, a
robust detex solution, or a converter from TeX to another format), so
I've decided to release the code as a separate project.

Currently TeXpp parses TeX documents into a simple document tree
(which should be especially convenient when using the Python interface).
The commands are actually executed, so TeX's self-modifying features
will work. Of course it still requires a huge amount of work to be
completed...

Jonathan Fine

Mar 11, 2009, 8:06:25 AM
ks.vladimir wrote:
> On Mar 11, 12:53 pm, Jonathan Fine <J.F...@open.ac.uk> wrote:
>> Please tell us - or at least me - more about this project. I'm
>> particularly interested in knowing why you wish to write it (not that I
>> think it is a bad idea).
> I'm writing it to solve my particular task: automatic LaTeX document
> modification. A very important requirement for me is never to break
> either the formatting or the meaning of the document.

Thank you. I have some suggestions which might help you. But first,
could you give some examples of 'difficult features' in the documents
you are processing.

For example, do they change catcodes. (Most maths documents don't.)

> I've tested some of
> already existing projects (LaTeX::TOM, plasTeX) but they do not meet
> this requirement. I've also spent some time digging into TeX source
> code, but re-using it seems to be not an easy task for me.

I didn't know about LaTeX::TOM. Thank you.

--
Jonathan

ks.vladimir

Mar 11, 2009, 9:11:40 AM
On Mar 11, 3:06 pm, Jonathan Fine <J.F...@open.ac.uk> wrote:
> Thank you.  I have some suggestions which might help you.  But first,
> could you give some examples of 'difficult features' in the documents
> you are processing.
>
> For example, do they change catcodes.  (Most maths documents don't.)

Yes, catcodes are almost never touched (except for @). The most
problematic feature is macros and their arguments. Another problem is
the huge number of packages which define their own curious macros. I've
tested LaTeX::TOM, plasTeX and detex on a huge number of random articles
from arxiv.org and the failure rate was very high, for various
reasons.

Actually, due to time limits, my initial plan was to implement only
the most essential subset of TeX commands in the library and stubs for
other TeX and LaTeX commands in the Python program on top of the
library. The direction for further work will be determined by testing
the program on real articles.

ks.vladimir

Mar 11, 2009, 9:20:35 AM
I've found one more strange behavior in TeX: the code \count0=`\zzz
assigns the value 48 to \count0. The command is bad since only
one-character tokens are allowed after `, but in the case of any other
bad command TeX assigns zero to \count0.

Jonathan Fine

Mar 11, 2009, 9:34:38 AM

I get the same result, using
\message{\number`\zzz}

Note that 48 is the ASCII code for '0'. So it seems as though TeX is
inserting a '0' character as error recovery, and then applying \zzz.

I've also tried this:
===
$ tex
This is TeX, Version 3.141592 (MiKTeX 2.4)
**\def\zzz{123}

*\message{\number`\zzz}
! Improper alphabetic constant.
<to be read again>
\zzz
<*> \message{\number`\zzz
}
?
48123
*
===

I'd have to look at the source of TeX (which I don't have to hand) to
know if this is the intended behaviour. It does look a bit strange.

--
Jonathan

Jonathan Fine

Mar 11, 2009, 9:37:22 AM
ks.vladimir wrote:
> On Mar 11, 3:06 pm, Jonathan Fine <J.F...@open.ac.uk> wrote:
>> Thank you. I have some suggestions which might help you. But first,
>> could you give some examples of 'difficult features' in the documents
>> you are processing.
>>
>> For example, do they change catcodes. (Most maths documents don't.)
>
> Yes, catcodes are almost never touched (except for @). The most
> problematic feature is macros and their arguments. Another problem is
> the huge number of packages which define their own curious macros. I've
> tested LaTeX::TOM, plasTeX and detex on a huge number of random articles
> from arxiv.org and the failure rate was very high, for various
> reasons.

So is it arXiv articles that you want to process? There are others
doing this sort of thing.
http://kwarc.eecs.iu-bremen.de/projects/arXMLiv/

They use Bruce Miller's LaTeXML, and their agenda might be similar to yours.

--
Jonathan

Philipp Stephani

Mar 11, 2009, 9:42:20 AM
ks.vladimir wrote:

It always assigns 48 (= the digit zero):

~$ tex
This is TeX, Version 3.1415926 (Web2C 7.5.7)
**\showthe\count0
> 1.
<*> \showthe\count0

?

*\count0=0

*\count0=`\par


! Improper alphabetic constant.
<to be read again>

\par
<*> \count0=`\par

?

*\showthe\count0
> 48.
<*> \showthe\count0

?


This is defined in tex.web:

if cur_val>255 then
  begin print_err("Improper alphabetic constant");
@.Improper alphabetic constant@>
  help2("A one-character control sequence belongs after a ` mark.")@/
    ("So I'm essentially inserting \0 here.");
  cur_val:="0"; back_error;
  end
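
For anyone who prefers to see the recovery spelled out, here is a rough Python sketch of the same idea. It is an assumption-laden model, not a translation of tex.web: tokens are plain strings, active characters are ignored, the test on the token's length stands in for the `cur_val>255` check, and the name `scan_alphabetic_constant` is invented for the example.

```python
def scan_alphabetic_constant(tokens):
    """Consume the token after a ` mark and return (value, remaining_tokens).

    Tokens are modelled as strings: "b" is a character token, "\\b" is a
    one-character control sequence, "\\zzz" is a longer control sequence.
    """
    tok, rest = tokens[0], tokens[1:]
    if len(tok) == 1:                      # plain character token
        return ord(tok), rest
    if len(tok) == 2 and tok[0] == "\\":   # one-character control sequence
        return ord(tok[1]), rest
    # Improper alphabetic constant: use the code of "0" and put the
    # offending token back so that it is read again afterwards.
    return ord("0"), [tok] + rest


print(scan_alphabetic_constant(["b"]))
print(scan_alphabetic_constant(["\\b"]))
print(scan_alphabetic_constant(["\\zzz", "}"]))
```

In the last call the value is the code of "0" and \zzz is still at the front of the remaining tokens, which fits the 48123 reported earlier: \number yields 48 and the backed-up \zzz is then expanded.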

--
Replace “READ-MY-SIG” by “tcalveu” to answer by mail.

Jonathan Fine

Mar 11, 2009, 9:46:50 AM
Philipp Stephani wrote:

> *\count0=`\par
> ! Improper alphabetic constant.
> <to be read again>
> \par
> <*> \count0=`\par
>
> ?
>
> *\showthe\count0
>> 48.
> <*> \showthe\count0
>
> ?
>
>
> This is defined in tex.web:
>
> if cur_val>255 then
> begin print_err("Improper alphabetic constant");
> @.Improper alphabetic constant@>
> help2("A one-character control sequence belongs after a ` mark.")@/
> ("So I'm essentially inserting \0 here.");
> cur_val:="0"; back_error;
> end

It seems to me that Don Knuth gets to keep his money. Inserting \0 is a
reasonable thing to do (although not the only reasonable thing, and
perhaps not the best).

I didn't think of using 'help2' to find out what was going on. Well done.

--
Jonathan

ks.vladimir

Mar 11, 2009, 10:15:16 AM

Yes, I understand. Thanks for pointing me to the code, I should have
thought about using help2 :)

ks.vladimir

Mar 11, 2009, 10:32:24 AM

Thanks a lot for pointing me to this project, looks very interesting.
The agenda is very similar, but not the same. What I need is not to
convert articles but to change them while keeping them suitable for
further editing by the author, which requires preserving original
formatting (i.e. keeping all characters that are normally thrown away
by TeX input processor). On the other hand my task is a lot easier
because I don't have to attach any meaning to commands except small
subset which is interesting to me - most commands have to be just
parsed but not converted.

Another note is that the TeXpp approach is more general: I'm trying to
make it fully compatible with TeX itself. For example, I have
automated tests that compare the behavior of TeXpp and TeX
(http://code.google.com/p/texpp/source/browse/#svn/trunk/tests/tex).
In the future this should make TeXpp able to load any TeX package.

Heiko Oberdiek

Mar 11, 2009, 11:16:57 AM
"ks.vladimir" <ks.vl...@gmail.com> wrote:

> I've found one more strange behavior in TeX: the code \count0=`\zzz
> assigns the value 48 to \count0.

You should study the .log file and press h for more error
information:

\count0=`\zzz
\message{<\the\count0>}
\end

The result:

| This is TeX, Version 3.14159 (Web2C 7.5.2)
| (./test.tex
| ! Improper alphabetic constant.
| <to be read again>
| \zzz
| l.1 \count0=`\zzz
|
| ? h
| A one-character control sequence belongs after a ` mark.
| So I'm essentially inserting \0 here.
|
| ?
| ! Undefined control sequence.
| <recently read> \zzz
|
| l.1 \count0=`\zzz
|
| ? h
| The control sequence at the end of the top line
| of your error message was never \def'ed. If you have
| misspelled it (e.g., `\hobx'), type `I' and the correct
| spelling (e.g., `I\hbox'). Otherwise just continue,
| and I'll forget about whatever was undefined.
|
| ?
| <48> )
| No pages of output.
| Transcript written on test.log.

The TeXbook writes:

| But TeX actually provides another kind of <number> that makes it
| unnecessary for you to know ASCII at all! The token `_{12} (left quote),
| when followed by any character token or by any control sequence token
| whose name is a single character, stands for TeX's internal code for the
| character in question. For example, \char`b and \char`\b are also
| equivalent to \char98.

Yours sincerely
Heiko <ober...@uni-freiburg.de>

Jonathan Fine

Mar 11, 2009, 11:50:19 AM
to ks.vladimir
ks.vladimir wrote:

>> They use Bruce Miller's LaTeXML, and their agenda might be similar to yours.
>
> Thanks a lot for pointing me to this project, looks very interesting.
> The agenda is very similar, but not the same. What I need is not to
> convert articles but to change them while keeping them suitable for
> further editing by the author, which requires preserving original
> formatting (i.e. keeping all characters that are normally thrown away
> by TeX input processor). On the other hand my task is a lot easier
> because I don't have to attach any meaning to commands except small
> subset which is interesting to me - most commands have to be just
> parsed but not converted.

It seems to me that you want something like a pretty-printer, that might
also be able to do some macro expansion (and perhaps contraction). In
particular, you'd like to keep author comments.

Is this correct? I ask because I would like something like this also.

--
Jonathan

ks.vladimir

Mar 11, 2009, 12:06:47 PM

Essentially yes. I want to replace some words and constructions in the
document but only in certain contexts and when it will not break
anything. As a simple example consider replacing all occurrences of
word "enumerate" in the text by "\href{http://something}{enumerate}".
It should not touch "\begin{enumerate}" but still should replace it in
"\section{About enumerate}". Another requirement is that resulting
document should still be easily editable by the author and it means
preserving comments, newlines, etc.
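
To make the example concrete, here is a deliberately naive Python sketch of that replacement. It is not how TeXpp works (TeXpp actually executes the commands); it only uses a regular expression to skip \begin{...} and \end{...}, other control sequences, and comments, and it copies everything else unchanged. The function name `link_word` is invented for the illustration, and leaving comments untouched is my assumption about the desired behavior.

```python
import re


def link_word(source, word, url):
    """Wrap `word` in \\href{url}{...} except where it is not ordinary text."""
    pattern = re.compile(
        r"(?P<skip>"
        r"\\(?:begin|end)\s*\{[^{}]*\}"   # environment names
        r"|\\[A-Za-z@]+"                  # control words
        r"|\\."                           # control symbols such as \%
        r"|%[^\n]*"                       # comments
        r")"
        r"|(?P<word>\b" + re.escape(word) + r"\b)"
    )

    def replace(match):
        if match.group("word") is None:
            return match.group(0)         # copy skipped material unchanged
        return "\\href{" + url + "}{" + match.group("word") + "}"

    return pattern.sub(replace, source)


sample = (
    "\\begin{enumerate}\n\\item one\n\\end{enumerate}\n"
    "\\section{About enumerate} % enumerate stays\n"
)
print(link_word(sample, "enumerate", "http://something"))
```

A sketch like this breaks down quickly on real documents (verbatim material, arguments of \label or \ref, user-defined macros), which is exactly why I want a real parser.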

And what is your use case?

Jonathan Fine

Mar 11, 2009, 1:16:30 PM
to ks.vladimir
ks.vladimir wrote:

>> It seems to me that you want something like a pretty-printer, that might
>> also be able to do some macro expansion (and perhaps contraction). In
>> particular, you'd like to keep author comments.
>>
>> Is this correct? I ask because I would like something like this also.
>
> Essentially yes. I want to replace some words and constructions in the
> document but only in certain contexts and when it will not break
> anything. As a simple example consider replacing all occurrences of
> word "enumerate" in the text by "\href{http://something}{enumerate}".
> It should not touch "\begin{enumerate}" but still should replace it in
> "\section{About enumerate}". Another requirement is that resulting
> document should still be easily editable by the author and it means
> preserving comments, newlines, etc.

Your use case is interesting. Here's my understanding of it. You
want to automatically create metadata, and use that metadata, and allow
the author to edit the metadata.

My use case is rather simpler. I've got quite a few TeX files with
rather long lines, which I'd like to reformat.

In addition, I'm looking for better tools for parsing and processing TeX
source files.

I'll post soon about what I've been doing in this area.

--
Jonathan

Donald Arseneau

Mar 11, 2009, 6:06:41 PM
On Mar 10, 2:38 pm, "ks.vladimir" <ks.vladi...@gmail.com> wrote:
> The reason for suspecting a bug was that \showthe is executed after
> the number was parsed and TeX already knows where it stops.

Indeed, it does know. However that ! had to have been read to
know where the number stops, and the context lines, showing
the token stream, aren't under the control of \showthe at all.
The context lines, with the line-breaks to indicate what has
been read, are displayed just as for all error messages,
under the (minimal) control of \errorcontextlines even though
\showthe isn't really an error.
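
As a rough picture of how the two context lines come about, here is a small Python sketch. It is an illustration only, with an invented function name `context_lines`; it assumes the display depends on nothing but how much of the input line has been read so far.

```python
def context_lines(prefix, line, n_read):
    """Split an input line into the two context lines TeX would display.

    The first line holds what has already been read; the second holds what
    is still pending, indented so that it starts where the first line ends.
    """
    first = prefix + " " + line[:n_read]
    second = " " * len(first) + line[n_read:]
    return first, second


line = r"\showthe\count25!!!"
n_read = len(r"\showthe\count25!")   # the first "!" was read as lookahead
for text in context_lines("<*>", line, n_read):
    print(text)
```

The break falls after the first "!" simply because the reading position is there, whatever command happens to be reporting.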

Donald Arseneau

ks.vladimir

Mar 14, 2009, 3:28:00 PM
On Mar 11, 8:16 pm, Jonathan Fine <J.F...@open.ac.uk> wrote:
> ks.vladimir wrote:
> >> It seems to me that you want something like a pretty-printer, that might
> >> also be able to do some macro expansion (and perhaps contraction).  In
> >> particular, you'd like to keep author comments.
>
> >> Is this correct?  I ask because I would like something like this also.
>
> > Essentially yes. I want to replace some words and constructions in the
> > document but only in certain contexts and when it will not break
> > anything. As a simple example consider replacing all occurrences of
> > word "enumerate" in the text by "\href{http://something}{enumerate}".
> > It should not touch "\begin{enumerate}" but still should replace it in
> > "\section{About enumerate}". Another requirement is that resulting
> > document should still be easily editable by the author and it means
> > preserving comments, newlines, etc.
>
> Your use case is interesting.  Here's my understanding of it.  You
> want to automatically create metadata, and use that metadata, and allow
> the author to edit the metadata.
>
> My use case is rather simpler.  I've got quite a few TeX files with
> rather long lines, which I'd like to reformat.
>
> In addition, I'm looking for better tools for parsing and processing TeX
> source files.
TeXpp could be such a tool once it is finished :)
Currently I've already implemented all TeX data types and all internal
parameters and registers. I hope very soon TeXpp will reach a stage
where it can parse and load the plain.tex format.

Jonathan Fine

Mar 14, 2009, 7:03:11 PM
to ks.vladimir
ks.vladimir wrote:

>> My use case is rather simpler. I've got quite a few TeX files with
>> rather long lines, which I'd like to reformat.
>>
>> In addition, I'm looking for better tools for parsing and processing TeX
>> source files.

> TeXpp could be such a tool once it is finished :)
> Currently I've already implemented all TeX data types and all internal
> parameters and registers. I hope very soon TeXpp will reach a stage
> where it can parse and load the plain.tex format.

I'm interested in solving relatively simple special cases of this
problem. Such as well-behaved mathematics papers.

>> I'll post soon about what I've been doing in this area.

I'm a bit late on this. If you look at
http://pytex.svn.sourceforge.net/viewvc/pytex/trunk/pytex/sandbox/jfine/macroload/compile.py?revision=58&view=markup
you'll see that it's fairly easy to tokenize an input stream, provided
you can write the regular expressions.

I'd make things like
r'\begin{center}'
a single token.

Once that tokenizing is done, I think pretty-printing, and also the
transformation you want to do, will be straightforward.

I'd also make
r'''\makeatletter

% macro definitions

\makeatother'''
a single token.
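
To give a flavour of what I mean, here is a minimal sketch of such a tokenizer. It is not the code at the URL above; the token categories and the names `TOKEN_RE` and `tokenize` are chosen just for this example, and it assumes default catcodes throughout.

```python
import re

TOKEN_RE = re.compile(
    r"""
      (?P<atblock>\\makeatletter(?![A-Za-z]).*?\\makeatother(?![A-Za-z]))
    | (?P<env>\\(?:begin|end)\s*\{[^{}]*\})
    | (?P<comment>%[^\n]*)
    | (?P<control>\\[A-Za-z]+|\\.)
    | (?P<brace>[{}])
    | (?P<space>\s+)
    | (?P<text>[^\\%{}\s]+)
    | (?P<other>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return (kind, text) pairs whose texts concatenate back to source."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


sample = (
    "\\begin{center}\nHi % note\n\\end{center}\n"
    "\\makeatletter\n\\def\\x@y{1}\n\\makeatother\n"
)
for kind, text in tokenize(sample):
    print(kind, repr(text))
```

Because every character of the input ends up in exactly one token, joining the token texts gives back the original file, comments and newlines included. A pretty-printer or a transformation can then rewrite selected tokens and leave the rest alone.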

--
Jonathan
