MathTran and texd: Use fork() instead of running a daemon?

Jonathan Fine

May 11, 2007, 9:27:14 AM

At the recent EuroTeX conference Jerzy Ludwichowski kindly gave
a talk on MathTran (http://www.mathtran.org) on my behalf. He
told me that some people suggested extending TeX so that it
could fork() itself, rather than running TeX as a daemon.

I can see the attraction of this approach. It makes it possible
to use existing macros without change. So I did some investigations,
and wrote some test code. Roughly speaking, my conclusion is that
running TeX as a daemon can give a 60-fold speed up, while forking
can give you a doubling of speed. (Sorry, it's not possible to do
both and get a 120-fold speed up.)

I've started a thread on this on a Sourceforge forum
http://sourceforge.net/forum/forum.php?thread_id=1733446&forum_id=310581

Please respond to this thread, or the forum, or both as you please.

Below is the text of the Sourceforge forum post.
===
In this thread we discuss a possible alternative to running TeX as a daemon.

Running TeX as a daemon allows one to avoid the start-up cost, which is
about 100ms (milliseconds). TeX can then typeset a formula in perhaps 2ms.
Some people suggest instead using an extension of TeX that forks itself on
demand. One advantage of doing this is that it allows one to use code that
changes TeX's global state. This includes many existing macros.

I've written some code to test the performance cost of forking.
http://texd.cvs.sourceforge.net/texd/py/misc/time_fork.py?view=markup

Basically, running this test code shows that forking can be used with
moderate penalty, provided the child process does not change memory very
much. We open a process and give it 4 MB of memory (this is about right, I
think, for LaTeX). We then time changing one byte in every 4096 in the
process. Then we time forking, and then we time forking AND modifying the
memory.
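
For readers who cannot reach the CVS link, here is a minimal sketch of the kind of measurement described above. It is not the original time_fork.py; the names PAGE, SIZE, touch_pages, time_in_parent and time_fork are assumptions made for illustration, and it needs a Unix-like system where os.fork() is available. Timings will differ from machine to machine.

```python
import os
import time

PAGE = 4096              # one byte is changed in every 4096, as in the post
SIZE = 4 * 1024 * 1024   # 4 MB, the figure the post assumes for LaTeX


def touch_pages(buf):
    """Change one byte in every PAGE-byte stretch of buf."""
    for i in range(0, len(buf), PAGE):
        buf[i] = (buf[i] + 1) % 256


def time_in_parent(buf):
    """Seconds taken to modify the memory in the current process."""
    start = time.perf_counter()
    touch_pages(buf)
    return time.perf_counter() - start


def time_fork(buf, modify):
    """Seconds taken to fork, optionally modify memory in the child, and reap it."""
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        # Child: every write lands on a page shared with the parent,
        # so the OS has to copy that page first (copy-on-write).
        try:
            if modify:
                touch_pages(buf)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return time.perf_counter() - start


if __name__ == "__main__":
    buf = bytearray(SIZE)
    touch_pages(buf)  # make sure the pages really exist before timing
    print("modify in parent: %.1f ms" % (1000 * time_in_parent(buf)))
    print("fork, do nothing: %.1f ms" % (1000 * time_fork(buf, False)))
    print("fork and modify:  %.1f ms" % (1000 * time_fork(buf, True)))
```

The parent's buffer is unchanged after the child exits, which is the isolation that makes forking attractive in the first place.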

Here are the timing results:
Modify memory in original process: 2ms (Python is an interpreted program.)
Time to fork and do nothing: 23ms (This is the moderate penalty.)
Time to fork and modify: 83ms (Thus, 60ms to modify, as opposed to 2ms in
parent.)

The reason for the extra time in the child is that when its memory is
modified, the OS has to allocate a new piece of memory, and 'stitch it in' to
the child's memory.

Starting and closing LaTeX takes about 170ms. Thus, very roughly, forking
could give a doubling of speed. Running TeX as a daemon gives a 60-times or
so speedup with LaTeX.
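
The arithmetic behind these rough figures can be spelled out as below. This is only a back-of-envelope sketch using the numbers quoted in this post; the variable names are made up for illustration, and the fork figure ignores the time spent actually typesetting.

```python
# Figures quoted in the post, in milliseconds.
latex_start_stop_ms = 170    # starting and closing LaTeX
fork_and_modify_ms = 83      # fork plus touching every page in the child
daemon_speedup_claimed = 60  # the "60-times or so" stated for the daemon

# Forking replaces a 170ms start-up with roughly 83ms of fork overhead.
fork_speedup = latex_start_stop_ms / fork_and_modify_ms

# Per-formula time implied by the claimed daemon speed-up.
implied_daemon_ms = latex_start_stop_ms / daemon_speedup_claimed

print("fork speed-up: about %.2f times" % fork_speedup)
print("implied daemon time per formula: about %.1f ms" % implied_daemon_ms)
```

So the doubling comes from 170/83, and a 60-fold speed-up corresponds to a little under 3ms per formula, in line with the "perhaps 2ms" mentioned earlier.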

--
Jonathan

David Kastrup

May 11, 2007, 9:38:36 AM

"Jonathan Fine" <J.F...@open.ac.uk> writes:

I think that to get the most from the forking approach, it would be
beneficial to have the compaction of "\dump" occur as close as
possible before the fork.

So I'd suggest trying to use mylatex.ltx in order to dump and reread a
format file including LaTeX as well as a suitable document preamble
just before performing the fork.

--
David Kastrup

Jonathan Fine

May 11, 2007, 10:27:16 AM

"David Kastrup" <d...@gnu.org> wrote

> I think that to get the most from the forking approach, it would be
> beneficial to have the compaction of "\dump" occur as close as
> possible before the fork.

Interesting idea. I can see that it might help a bit, by making
the problem of the child having 'copy-on-write' memory smaller.

My guess is that it would reduce 60ms to perhaps 40ms, as
compared to the daemon value of 2ms.

--
Jonathan

David Kastrup

May 11, 2007, 10:34:12 AM

"Jonathan Fine" <J.F...@open.ac.uk> writes:

But the daemon does not offer sufficient separation from previous
contents.

This could be amended by changes in the executable: it should help
quite a bit if one were to maintain a -1 level in the table of
equivalents not affected by global assignments and just initialized
once at the time of a pseudo-fork. Then setting up another task would
involve copying this -1 level to the global level 0 again.

However, global states and fonts and hyphenation patterns and similar
things not routed through the table of equivalents would still leak
information.
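
As a toy model of this idea (not TeX's actual data structures), the sketch below keeps a frozen "level -1" copy of a table and restores it before each task. The class ToyEngine and all of its attribute and method names are hypothetical. The separate fonts set stands in for state that is not routed through the table and therefore survives the reset.

```python
class ToyEngine:
    """Toy stand-in for an engine with a table of equivalents."""

    def __init__(self, preamble_bindings):
        # "Level -1": frozen once, at the time of the pseudo-fork.
        self.baseline = dict(preamble_bindings)
        # "Level 0": what global assignments modify.
        self.eqtb = dict(self.baseline)
        # State NOT routed through the table (fonts, patterns, ...).
        self.fonts = set()

    def global_assign(self, name, value):
        """Model of a global assignment: changes level 0 only."""
        self.eqtb[name] = value

    def load_font(self, name):
        """Model of state kept outside the table."""
        self.fonts.add(name)

    def start_task(self):
        """Set up another task by copying level -1 back to level 0."""
        self.eqtb = dict(self.baseline)


engine = ToyEngine({"\\foo": "original"})
engine.global_assign("\\foo", "changed by task 1")
engine.load_font("cmr17")
engine.start_task()
print(engine.eqtb["\\foo"])  # the table entry is back to its baseline value
print(engine.fonts)          # the font loaded by task 1 is still there
```

In this model the reset undoes the assignment but the loaded font leaks into the next task, which is the limitation noted above.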

--
David Kastrup

Jonathan Fine

May 11, 2007, 10:57:15 AM

"David Kastrup" <d...@gnu.org> wrote

> But the daemon does not offer sufficient separation from previous
> contents.
>
> This could be amended by changes in the executable: it should help
> quite a bit if one were to maintain a -1 level in the table of
> equivalents not affected by global assignments and just initialized
> once at the time of a pseudo-fork. Then setting up another task would
> involve copying this -1 level to the global level 0 again.
>
> However, global states and fonts and hyphenation patterns and similar
> things not routed through the table of equivalents would still leak
> information.

I don't think it is possible to do such a trick with the
TeX daemon used by MathTran:
http://www.mathtran.org/cgi-bin/secplain

Try to load a font, and it will refuse.
http://www.mathtran.org/cgi-bin/secplain?tex=%5Cfont%5Cxxx+cmr17

The same goes if you try to make an assignment.

The macros that do this trick can be found at
http://texd.cvs.sourceforge.net/texd/secsty/

In addition, the input string is filtered - if it contains
an ASCII NUL, it is rejected.
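
The input filter amounts to something like the following sketch. The function name accept_input is hypothetical, and the real check in the MathTran front end may be written differently; only the NUL rule comes from this post.

```python
def accept_input(tex_source):
    """Return False if the submitted string contains an ASCII NUL."""
    return "\x00" not in tex_source


print(accept_input(r"\int_0^1 x^2 \, dx"))  # ordinary formula
print(accept_input("x\x00y"))               # contains a NUL
```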

--
Jonathan

Michael D. Sofka

May 11, 2007, 11:31:43 AM

"Jonathan Fine" <J.F...@open.ac.uk> writes:

> Here are the timing results:
> Modify memory in original process: 2ms (Python is an interpreted program.)
> Time to fork and do nothing: 23ms (This is the moderate penalty.)
> Time to fork and modify: 83ms (Thus, 60ms to modify, as opposed to 2ms in
> parent.)
>
> The reason for the extra time in the child is that when its memory is
> modified, the OS has to allocate a new piece of memory, and 'stitch it in' to
> the child's memory.
>
> Starting and closing LaTeX takes about 170ms. Thus, very roughly, forking
> could give a doubling of speed. Running TeX as a daemon gives a 60-times or
> so speedup with LaTeX.
>
> --
> Jonathan
>
>

In the Perl CGI world we use either FastCGI, which pre-forks copies of
the perl process, or modPerl, which builds Perl into Apache (which
itself pre-forks).

There are advantages and disadvantages to each approach. FastCGI is
usually easier to use for existing Perl code. modPerl provides hooks
between Apache and Perl, but usually requires the application be built
for modPerl from the ground up. (These are generalities not meant to
launch a Perl CGI religious war on comp.lang.tex.)

With either approach, it is usually necessary to include a reset-state
command, or set a limit to how many times a process is reused, since, as
with TeX macro packages, many Perl modules are non-reentrant.
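
To make the reuse limit concrete, here is a rough Python sketch. It is not FastCGI or modPerl code, and the names run_batch, serve and max_reuse are hypothetical. Each forked worker handles at most max_reuse requests and is then thrown away, so state that a non-reentrant handler leaves behind cannot build up indefinitely. It assumes a Unix-like system and results that contain no newlines.

```python
import os


def run_batch(batch, handler):
    """Fork one worker, let it handle every request in batch, collect results."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: handle the requests, write one result per line, then exit.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "w") as out:
                for request in batch:
                    out.write(handler(request) + "\n")
        finally:
            os._exit(0)
    # Parent: read until the child closes its end of the pipe, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as inp:
        results = inp.read().splitlines()
    os.waitpid(pid, 0)
    return results


def serve(requests, handler, max_reuse=3):
    """Handle requests, replacing the worker after max_reuse requests."""
    results = []
    for i in range(0, len(requests), max_reuse):
        results.extend(run_batch(requests[i:i + max_reuse], handler))
    return results


# A deliberately non-reentrant handler: it remembers every request it has seen.
_seen = []


def leaky_handler(request):
    _seen.append(request)
    return "%s:%d" % (request, len(_seen))


if __name__ == "__main__":
    print(serve(list("abcde"), leaky_handler, max_reuse=2))
```

With max_reuse=2 the leaked count in each result never exceeds 2, because every new worker starts from the parent's clean state.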

Mike

--
Michael D. Sofka sof...@rpi.edu
C&MT Sr. Systems Programmer, Postmaster pro tem
Rensselaer Polytechnic Institute, Troy, NY. http://www.rpi.edu/~sofkam/

Jonathan Fine

May 11, 2007, 11:41:50 AM

"Michael D. Sofka" <sof...@rpi.edu> wrote

> In the Perl CGI world we use either FastCGI, which pre-forks copies of
> the perl process, or modPerl, which builds Perl into Apache (which
> itself pre-forks).
>
> There are advantages and disadvantages to each approach. FastCGI is
> usually easier to use for existing Perl code. modPerl provides hooks
> between Apache and Perl, but usually requires the application be built
> for modPerl from the ground up. (These are generalities not meant to
> launch a Perl CGI religious war on comp.lang.tex.)
>
> With either approach, it is usually necessary to include a reset-state
> command, or set a limit to how many times a process is reused, since, as
> with TeX macro packages, many Perl modules are non-reentrant.

Hello Mike

Thank you for these comments, which help situate the matter in context.

You've made a key point. ModPerl and the TeX daemon, in general terms,
require the application to be built from the ground up with that
use in mind.

I also appreciate your comment about the possibility of an unproductive
conflict between the advocates of the two approaches.

--
Jonathan