LaTeX documents that last

Fabián

Aug 20, 2008, 11:46:44 AM
to LaTeX-EPN
At http://dw.tug.org/pracjourn/ you will find the online journal of the
TeX Users Group: The PracTeX Journal, which always contains very
interesting articles such as the following:

LaTeX documents that endure

at http://dw.tug.org/pracjourn/2008-2/carnes1/

Fabián Barba

LaTeX documents that endure

The title of this opinion piece may seem a little strange. After all,
if I keep my document source files safely stored away, and have a
LaTeX system to format them, they should always work, right? Well,
sometimes. More often than not, though, a set of LaTeX files more than
a few years old will not format the same way today as it did in the
original edition.

Some examples

A common occurrence is a book author who is having difficulty making a
new edition. A few years ago the author used LaTeX to format his or
her book, and spent a lot of time getting the correct page breaks and
figure placements. Now, when it's time to revise the book for a second
edition, the author uses the same source files but finds that in many
places the book does not break pages or float figures the same way as
the original edition. There may even be LaTeX errors that definitely
weren't there before.

Another situation that can be confusing is with authors who are
collaborating on a book. Sometimes they find that they cannot format
their source files to get a consistent rendering of their book. The
two authors use the same source files, but Author A formats the book
and gets different page breaks than Author B. In some cases one of the
authors may get an "Undefined control sequence" or other error while
trying to format the files sent by the other author.

In the first case the difference is that time has elapsed between
attempts to format a document, and in the second case two LaTeX
systems are being used in different locations and most likely on
different computer platforms. In the first case the author's LaTeX
system was updated, and something changed which caused different page
breaks and figure floats. In the second case, the LaTeX systems used
by the collaborating authors are probably of the same vintage but
something is causing the authors to get different results.

LaTeX documents that endure

Anyone who has used LaTeX for a few years has a collection of document
source files. Many of these older documents will format successfully
with an up-to-date LaTeX system, but there will be some documents that
will not look the same as the original edition, or they will cause
LaTeX error messages. It sometimes takes a lot of work to get these
older documents to format successfully again.

If the older documents are not critical, such as letters, homework
assignments, or other material that is not published, the fact that
these may not format correctly is not a problem. But for books and
articles and other published material that may be revised and used
again, it would save a lot of time if they could be formatted
identically to the original edition.

LaTeX systems that work the same

When two or more authors are collaborating it would be best if the
various LaTeX systems used could format the exact same document from
the same source files. Another form of collaboration is a LaTeX
consultant working with a client; it would save a lot of time and
expense if they could each reproduce the exact same document on their
respective systems. But what often happens is that the LaTeX systems
are different. For example, one person is using a Linux distribution
and the other is using a PC distribution. Formatting the same source
files with different LaTeX systems will often give different results.

Your ever-shifting LaTeX system

There are approximately 3,000 LaTeX system files. Some of these files
are uniform throughout most LaTeX systems. The LaTeX "kernel" files
and the common class files, such as article.cls, are identical on most
LaTeX distributions. The files that are the cause of variability are
the contributed class and style files. These files come into play when
a document contains a \documentclass or \usepackage command. There are
so many of these files and so many versions of each, that it is
unlikely that any two LaTeX systems in the world contain the same set
of LaTeX system files at any point in time.
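
One way to see which versions a given run actually loaded is the standard LaTeX command \listfiles: placed in the preamble, it makes LaTeX write a "*File List*" section to the .log file. The following is a minimal Python sketch, not a finished tool, that reads such a section into a dictionary. The function name parse_file_list and the sample log excerpt are made up for illustration, and entries that carry no date are simply skipped.

```python
import re

# Matches one entry of the "*File List*" that LaTeX writes to the .log
# file when the document preamble contains \listfiles, for example:
#   " article.cls    2007/10/19 v1.4h Standard LaTeX document class"
ENTRY = re.compile(r"^\s*(\S+\.\w+)\s+(\d{4}/\d{2}/\d{2})\s*(.*)$")


def parse_file_list(log_text):
    """Return {file name: (date, description)} from a \\listfiles log."""
    versions = {}
    in_list = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "*File List*":
            in_list = True
        elif in_list and stripped.startswith("***"):
            break  # a row of asterisks closes the list
        elif in_list:
            match = ENTRY.match(line)
            if match:
                name, date, description = match.groups()
                versions[name] = (date, description.strip())
    return versions


# Hypothetical log excerpt, made up for illustration.
sample_log = """\
 *File List*
 article.cls    2007/10/19 v1.4h Standard LaTeX document class
 graphicx.sty    1999/02/16 v1.0f Enhanced LaTeX Graphics (DPC,SPQR)
 ***********
"""

for name, (date, description) in parse_file_list(sample_log).items():
    print(name, date, description)
```

Running this on the logs produced by two collaborators, or on an old log and a new one, would show which class and style files differ between the two systems.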

Given all this possible variability in LaTeX system files, is it
possible that documents can be formatted the same over time, and that
LaTeX systems on different platforms can generate identical documents?

A current solution for enduring documents

One LaTeX system that can consistently format document editions is at
Mathematical Sciences Publishers (http://mathscipub.com) at the
University of California, Berkeley. Their journal issues and articles
can be formatted identically to the original editions at any point in
time. They accomplish this by using a version control system to
guarantee that the original set of LaTeX system files used with a
particular document is always used when formatting that document.

For further information on this system and its goals, see the
abstract "Long-time preservation strategies for TeX-sourced content"
by Paulo Ney de Souza [1] and the video presentation [2].

Some other possibilities for enduring documents

One of the original goals of the TeX system is that given a valid TeX
formatting program and a set of source files, the identical document
can be formatted at any point in time and on any platform.

The problem is that the LaTeX system itself is a set of source files.
When a LaTeX document is formatted, the TeX program first reads all
the LaTeX system files and then inputs the author's source files.
Since the TeX program is completely consistent, and the author
controls his or her source files, the variable is the LaTeX system
files.

One way to create an enduring document is to keep the current set of
LaTeX system files with the author's source files. This guarantees
that the set of input files is consistent over time. This is
essentially how Mathematical Sciences Publishers is able to format
documents consistently. Their version control system will pull the
identical set of files from its archive each time a document is
formatted.
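
As a rough illustration of keeping system files next to the source, the Python sketch below copies a list of files into a bundle directory and records a SHA-256 checksum for each, so that a later run can confirm nothing has changed. It is only a sketch under stated assumptions: the function name snapshot_files, the directory name texmf-snapshot, and the file mystyle.sty are hypothetical, and finding where a TeX installation stores its files is left out.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def snapshot_files(paths, bundle_dir):
    """Copy each file into bundle_dir and return {file name: SHA-256 digest}."""
    bundle = Path(bundle_dir)
    bundle.mkdir(parents=True, exist_ok=True)
    digests = {}
    for path in map(Path, paths):
        target = bundle / path.name
        shutil.copy2(path, target)  # copy contents and timestamps
        digests[path.name] = hashlib.sha256(target.read_bytes()).hexdigest()
    return digests


# Demonstration with a throwaway stand-in for a real style file.
with tempfile.TemporaryDirectory() as tmp:
    style = Path(tmp) / "mystyle.sty"  # hypothetical file name
    style.write_bytes(b"\\ProvidesPackage{mystyle}[2008/08/20 v0.1 demo]\n")
    manifest = snapshot_files([style], Path(tmp) / "texmf-snapshot")
    for name, digest in manifest.items():
        print(name, digest)
```

The returned dictionary could be stored with the document under version control. Making TeX read the bundled copies instead of the installed ones is a separate step that depends on the distribution.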

Suppose there was a way that all LaTeX system files could be bundled
together at a point in time, and kept along with the document source
files. This would be one way to solve the problem of enduring
documents. There might be some technical issues with organizing file
system directories so that a LaTeX system could process the document.
But if this problem could be solved it would provide a way to have
enduring LaTeX documents.

One problem with this approach is that there would be a large cluster
of files that could be used for one document only. This would work
well for a single author working on a single computer system. However,
if the author wanted to share this document with a collaborator, or
send it to a publisher, there could be problems. The size of the file
bundle would make this cumbersome, and it could be difficult for the
person receiving the file bundle to get it all to work.

One way to reduce the size of the file cluster is to distribute the
set of LaTeX system files to an online archive. Many current LaTeX
systems fetch needed files from online archives. If the set of needed
LaTeX system file versions could be kept in an index, a LaTeX system
could use the index to fetch the needed files from an online archive.
This would reduce the size of the file cluster since authors could
exchange just their source document files along with an index of LaTeX
system files.
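
The index idea can be sketched in a few lines of Python. Assuming an index is just a mapping from file name to a version string, the function below lists the files a receiving system would have to fetch because they are missing locally or present in a different version. The name files_to_fetch and both sample indexes are hypothetical, and the download itself is deliberately not shown, since no such archive interface is described here.

```python
def files_to_fetch(document_index, local_index):
    """Return index entries whose version is missing or different locally."""
    return {
        name: version
        for name, version in document_index.items()
        if local_index.get(name) != version
    }


# Hypothetical indexes: file name -> version date recorded for that file.
document_index = {
    "article.cls": "2007/10/19",
    "graphicx.sty": "1999/02/16",
    "geometry.sty": "2002/07/08",
}
local_index = {
    "article.cls": "2007/10/19",
    "graphicx.sty": "2006/02/20",
}

for name, version in sorted(files_to_fetch(document_index, local_index).items()):
    print(f"fetch {name} version {version}")
```

With an index like this, authors would exchange only their source files and a small text file, as the paragraph above suggests.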

In any case, it seems that the problem is one of ensuring that a
unique set of files can be maintained so that a LaTeX document will
format identically over time and across computer systems. Once this
problem is solved, it should be much easier and less time-consuming to
maintain a set of documents that will format consistently.

What are your thoughts?

A few colleagues who read this piece agreed that consistently
reproducible LaTeX documents are something that should be available to
every author. There are various ways this could be done. Give it some
thought and then let us know your ideas.

Some suggested that the LaTeX and TeX community, even with its
slightly inconsistent set of system files, is probably better off than
other documentation users. For example, how do those authors with
15-year-old MS Word, InDesign, and other files ever get them to work
again?