
PATGEN performance problem on Digital UNIX


Piet Tutelaers

Mar 17, 2000

We want to run PATGEN, a tool for generating TeX hyphenation patterns,
on a big Dutch dictionary. This dictionary is only available on a
Digital UNIX system. Unfortunately, PATGEN performs extremely poorly
on these machines.

The sources and data files to reproduce the problem are available from
http://euridice.tue.nl/~ptutelae/TeX/patronen/performance.html

Hopefully somebody can provide help.

--Piet

e-mail:                     __o      Piet Tutelaers
P.T.H.T...@tue.nl         _`\<,_     ICTS / Room LG 1.82
phone: +31 (0)40 2474541 (_)/ (_)    Eindhoven University of Technology
fax:   +31 (0)40 2434438 Save nature P.O. Box 513, 5600 MB Eindhoven, NL

Jeff Sullivan

Mar 20, 2000
I can take a look at the reproducer that you posted. If it's
a C/C++ compiler problem, we would be very interested in
resolving it.

I'll reply directly if I find out more or need more info.

Thanks,
-Jeff
--
Jeff Sullivan | Compaq Computer Corp. | mailto:j...@zk3.dec.com
Compaq C/C++ | Nashua, NH 03062-2698 | Jeff.S...@compaq.com

Jeff Sullivan

Mar 21, 2000
This looks like a source code problem in the patgen program.

What is happening is that the stack is being corrupted by a
call to the null_terminate routine. A later call to exit()
ends up in a random loop doing free()s of bad addresses.
This loop is what consumes most of the time in the bad
case.

I used the atom tool Third Degree (see man atom, man third) to diagnose
the memory overwrite. This is what it told me when I ran it:

strpascal.c: 25: reading invalid heap at 0x140460000
    null_terminate    patgen, strpascal.c, line 25
    make_c_string     patgen, strpascal.c, line 15
    xfopen_pas        patgen, xfopen-pas.c, line 15
    dodictionary      patgen, patgen2.c, line 1398
    main_body         patgen, patgen2.c, line 1698
    main              patgen, main.c, line 30
    __start           patgen

When I ran this in the debugger, I found that there were many
calls to make_c_string and did not quickly find where the problem
was occurring.

However, when I made a simple "debugging" change to null_terminate,
I saw this (bad) case:

null_terminate: s=pattmp.5, len=8
null_terminate: i=528

In a "good" case, I expect that i would be len-1. It
certainly is not in this case.

The debugging change I made to null_terminate was this:

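int i = 0;
/* strlen() returns size_t; cast so the argument matches %d */
printf("null_terminate: s=%s, len=%d\n", s, (int) strlen(s));
/* scan for the first space; nothing stops this at the end of s */
while (*s != ' ')
{ s++; i++; }
printf("null_terminate: i=%d\n", i);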

In the (bad) case of pattmp.5, it looks like the string does
NOT have a trailing space, so the null_terminate function scans
past the end of the string and writes the terminating NUL at some
unknown location. It may sometimes work, but it looks like a
source code bug to me.
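
To illustrate the failure mode described above, here is a minimal sketch
of a bounded scan. It is not the web2c code: the name
null_terminate_bounded, the max_len parameter, and the -1 return value
are all hypothetical, chosen only to show how a length limit keeps the
scan from running off the end when the trailing space is missing.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical bounded variant of null_terminate (an illustration, not
 * the routine from strpascal.c). Scans at most max_len bytes of s for
 * the first space and overwrites it with '\0'.
 *
 * Returns the index of the new terminator, or -1 if the scan ends
 * (at an existing '\0' or at max_len) without finding a space, in
 * which case s is left unchanged.
 */
long null_terminate_bounded(char *s, size_t max_len)
{
    size_t i;

    assert(s != NULL);

    for (i = 0; i < max_len; i++) {
        if (s[i] == ' ') {
            s[i] = '\0';        /* replace the padding space */
            return (long) i;
        }
        if (s[i] == '\0')
            break;              /* already a C string; stop here */
    }
    return -1;                  /* no space found within the bound */
}
```

With a buffer holding "pattmp.5 " (trailing space) the sketch returns 8
and leaves "pattmp.5"; with "pattmp.5" and no trailing space it returns
-1 instead of writing past the end of the buffer.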

Let me know if this helps.

Piet Tutelaers

Mar 31, 2000

I have found a solution for the performance problem. Patgen now runs
in 10 minutes on a 600MHz Digital UNIX system. On our four-year-old
OSF/Alpha system (32 MBytes memory) it still takes 33 minutes. My
simple Pentium II based Linux system (64 MBytes) outperforms these
Alphas with 8 minutes and 30 seconds. For a detailed description of
the problem and the solution see my HTML page
http://euridice.tue.nl/~ptutelae/TeX/patgen

I hope the solution will find its way into the web2c sources of patgen.

Donald Arseneau

Mar 31, 2000
Piet Tutelaers <P.T.H.T...@tue.nl> writes:

> alpha's with 8 minutes and 30 seconds. For a detailed description of
> the problem and the solution see my HTML page
> http://euridice.tue.nl/~ptutelae/TeX/patgen

No mention of a bad null_terminate routine?

Donald Arseneau as...@triumf.ca

Piet Tutelaers

Mar 31, 2000
Donald Arseneau <as...@triumf.ca> writes:

>Piet Tutelaers <P.T.H.T...@tue.nl> writes:

>> alpha's with 8 minutes and 30 seconds. For a detailed description of
>> the problem and the solution see my HTML page
>> http://euridice.tue.nl/~ptutelae/TeX/patgen

>No mention of a bad null_terminate routine?

I had some discussion with Jeff Sullivan about this problem. The
problem occurred after patgen had finished with the patterns and wanted
to write the resulting pattmp.N file (N depends on the requested
hyphenation level). For some strange reason patgen, only on a Digital
UNIX system, expects a space after the filename "pattmp.N". We could
solve this easily in the patgen2.c version, but the bad performance
was still there. When I switched over to the patgen sources from the
latest TeXlive CD I did not see the null_terminate problem anymore.
That is the reason why I forgot about it.

>Donald Arseneau as...@triumf.ca

--Piet

P.T.H.T...@tue.nl
