using sphinx latex with scientific journal templates

foobaron

Jan 29, 2011, 5:38:25 PM
to sphinx-dev
Hi,
I've been using Sphinx a long time for software projects and more
recently for writing all my scientific papers. I love being able to
run it right on my iPad to view and edit papers with lots of equations
(HTML output with jsMath), and at the same time generate latex & PDFs
I can circulate to people for comments.

I've found that the problems start when I want to submit the paper to
a journal, because each journal requires the latex source of the
manuscript to conform exactly to their custom latex template, which of
course Sphinx's latex output does not. After studying how Sphinx /
docutils generate latex output, I decided not to mess with that code
at all but instead just wrote a separate Python script to extract the
relevant sections of Sphinx's latex output and insert them into the
latex template supplied by the journal. This is pretty easy, works
great, and is easy to adapt to different journals (so far I've done it
for PNAS, PLOS, and Information; if people want me to post an example
rewriter script I can do that). But it feels like a kluge; it seems
like lots of people want this kind of latex output customization and
we should instead all be using docutils latex templates etc. For
example, if Sphinx alters little details of the latex it outputs, that
might make my re-writer script stop working (because it has to search
for specific strings in Sphinx's output, and transform them). An
example of a latex paper produced this way is viewable here:
http://www.mdpi.com/2078-2489/2/1/17/

I'm totally sold on Sphinx as my long-term solution for being able to
"cross-compile" my content to many different outputs. I would now
like to work on this latex output template customization problem in a
general way that would be usable, extensible and customizable by
others. My question is what approach Sphinx developers would
recommend. I'll quickly note a few requirements:

- this is not just a matter of changing a template file. The Sphinx
code itself contains many fragments of latex that must be customized
for each possible output. For example,
sphinx.ext.mathbase.wrap_displaymath() outputs displaymath using the
non-standard environments "gather" and "split". This does not match
any scientific journal's template, so either this Sphinx code itself
must change, or we are stuck with my klugey approach (external scripts
that rewrite the latex output by Sphinx, inserting it into a journal's
template).

- scientific journals supply precise templates and demand that authors
follow them exactly. Nowadays they input the latex file directly into
their typesetting production, so to ensure a uniform appearance and
standard across all the papers they publish, they *require* that the
manuscript follow the template. As an author, I cannot deviate from
their template at all. For example, they won't permit the inclusion
of any packages other than a specified list that is used by their
template. Unfortunately, much of the Sphinx code for latex support
assumes the use of many custom packages. Again either that code has
to change, or we're stuck with the external re-writer script approach.

- this will require user-settable options for what to do with figures
and tables. For example, during the initial submission / review
phase, PLOS journals want the figures and tables included at the end
of the manuscript (not in the middle of the text where Sphinx inserts
them). However, once the paper is accepted for production, they want
*only* the figure legends included at the end of the manuscript (i.e.
do not include the figure images in the manuscript at all; they must
be submitted separately). With my re-writer script this is easy; it
just takes an optional argument that controls whether it includes the
figures in the output or not.
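
To make the figure option concrete, here is a minimal Python sketch of that kind of switch. It is not the actual re-writer script: the function name `relocate_figures`, the two `mode` values, and the assumption that every figure is a plain `\begin{figure}...\end{figure}` block whose `\caption{...}` contains no nested braces are all made up for illustration.

```python
import re

# Assumed shape of a figure in the LaTeX body (hypothetical simplification).
FIGURE_RE = re.compile(r"\\begin\{figure\}.*?\\end\{figure\}\n?", re.DOTALL)
CAPTION_RE = re.compile(r"\\caption\{([^{}]*)\}")


def relocate_figures(body, mode="submission"):
    """Move figures out of the running text to the end of the manuscript.

    mode="submission": full figure environments are appended at the end.
    mode="production": only the figure legends are appended (no images).
    """
    figures = FIGURE_RE.findall(body)
    text = FIGURE_RE.sub("", body)
    if mode == "submission":
        tail = [fig.strip() for fig in figures]
    elif mode == "production":
        tail = []
        for number, fig in enumerate(figures, start=1):
            match = CAPTION_RE.search(fig)
            legend = match.group(1) if match else ""
            tail.append("\\paragraph{Figure %d.} %s" % (number, legend))
    else:
        raise ValueError("unknown mode: %r" % mode)
    return text.rstrip("\n") + "\n\n" + "\n\n".join(tail) + "\n"
```

A journal-specific script would then only need to pick the mode, e.g. `relocate_figures(body, "production")` once the paper has been accepted.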

I'd like to get some advice about what approach people think would be
best. A few options come to mind:

- external rewriter scripts: the rewriter takes a Sphinx latex output
file and a latex template file, and inserts the relevant pieces of
content into the template. This could be designed in a relatively
modular way. I.e. a parser that extracts relevant sections from the
Sphinx latex output; a "standardizer" that removes non-standard things
like "gather" and "split". Then for each output target there could be
a very small amount of code that processes journal-specific options
like "submission format" vs. "production format". While in my
experience such "re-writers" are compact and easy to write, there is
clearly a disadvantage that if Sphinx changes its latex output, that
could break the parser or standardizer.

- for this reason, it might make sense to make the "parser" and
"standardizer" components of this actually part of the sphinx
codebase, along with a bunch of automated tests that ensure they are
working. Since these pieces must be kept in sync with the Sphinx
code, that argues that they should be part of the Sphinx mercurial
tree. Then the set of journal-specific "writer" scripts (which will
be *very* simple, since all they have to do is process various little
options) could either also be included with Sphinx, or distributed as
a separate project.

- "the full Monty": instead of using an external re-writer script, we
modify the Sphinx latex code (e.g. sphinx.ext.mathbase,
sphinx.writers.latex) to make it easy to customize the latex output in
a truly general way (i.e. to produce output that does *not* assume
non-standard packages, that inserts directly into any template file the
user specifies, etc.). Having browsed the sphinx code a bit, this
seems like a fair amount of work, as it requires understanding what
both docutils and sphinx are doing to produce the latex output, and a
fair amount of code is involved...
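
As a rough illustration of the "parser" and "standardizer" pieces from the first option, here is a short Python sketch. The function names are hypothetical, and the regular expression assumes the simplest form of Sphinx's display math wrapper (one unlabeled `split` inside a `gather`, followed by `\notag`); labeled or multi-part equations would need more work.

```python
import re


def split_latex(source):
    """Return (preamble, body) of a complete LaTeX document string."""
    begin = "\\begin{document}"
    end = "\\end{document}"
    start = source.index(begin)
    stop = source.rindex(end)
    return source[:start], source[start + len(begin):stop].strip()


# Assumed wrapper around display math:
# \begin{gather} \begin{split} ... \end{split} \notag \end{gather}
DISPLAYMATH_RE = re.compile(
    r"\\begin\{gather\}\s*\\begin\{split\}(.*?)\\end\{split\}"
    r"(?:\s*\\notag)?\s*\\end\{gather\}",
    re.DOTALL,
)


def standardize_math(body):
    """Rewrite gather/split display math as plain \\[ ... \\] blocks.

    Only safe for single-line equations: alignment marks (&) and line
    breaks inside the equation are left untouched.
    """
    return DISPLAYMATH_RE.sub(
        lambda m: "\\[\n" + m.group(1).strip() + "\n\\]", body)
```

Because both functions search for specific strings, they show exactly the fragility described above: any change in how Sphinx wraps display math would silently stop the pattern from matching.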

All comments and suggestions welcome. Sorry for the long post!

-- Chris Lee

Guenter Milde

Feb 1, 2011, 3:29:47 AM
to sphin...@googlegroups.com
On 2011-01-29, foobaron wrote:
> Hi,
> I've been using Sphinx a long time for software projects and more
> recently for writing all my scientific papers. I love being able to
> run it right on my iPad to view and edit papers with lots of equations
> (HTML output with jsMath), and at the same time generate latex & PDFs
> I can circulate to people for comments.

For single documents, I suggest Docutils over Sphinx.
Missing some of the bells and whistles, but easier to set up and
configure.

OTOH, for my taste reStructuredText is still too limited for scientific
papers: no citation support, no referencing to numbered
tables/figures via labels, ...

> I've found that the problems start when I want to submit the paper to
> a journal, because each journals requires the latex source of the
> manuscript to conform exactly to their custom latex template, which of
> course Sphinx's latex output does not. After studying how Sphinx /
> docutils generate latex output, I decided not to mess with that code
> at all but instead just wrote a separate Python script to extract the
> relevant sections of Sphinx's latex output and insert them into the
> latex template supplied by the journal. This is pretty easy, works
> great, and is easy to adapt to different journals (so far I've done it
> for PNAS, PLOS, and Information; if people want me to post an example
> rewriter script I can do that). But it feels like a kluge; it seems
> like lots of people want this kind of latex output customization and
> we should instead all be using docutils latex templates etc.

Docutils has a "publisher" API for this kind of work. It lets you
write custom back-ends that get the document parts and are free to
combine them as required.

As no one has used it with the LaTeX writer so far, there might be issues.
Report them on the docutils-devel list (if you want to go this way).
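
As a rough illustration of the "parts" idea, the sketch below calls `docutils.core.publish_parts` with the LaTeX writer and picks pieces out of the returned dictionary. It assumes Docutils is installed; the helper name `latex_parts` and the sample source are made up, and the exact set of keys may differ between Docutils versions.

```python
def latex_parts(rst_source):
    """Return the dictionary of LaTeX document parts for an rst string."""
    # Imported here so the module still loads where Docutils is missing.
    from docutils.core import publish_parts
    return publish_parts(source=rst_source, writer_name="latex")


if __name__ == "__main__":
    parts = latex_parts("Some *emphasised* text.\n")
    # "body" holds the converted content on its own;
    # "whole" is the complete LaTeX document.
    print(parts["body"])
```

A custom back-end would take `parts["body"]` (and whichever other parts it needs) and place them into the journal's template itself, instead of parsing a finished document.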

> example, if Sphinx alters little details of the latex it outputs, that
> might make my re-writer script stop working (because it has to search
> for specific strings in Sphinx's output, and transform them).

There is the open task of making Sphinx use the Docutils LaTeX writer to
reduce code duplication. This will definitely change lots of these
details.

...

> I'm totally sold on Sphinx as my long-term solution for being able to
> "cross-compile" my content to many different outputs. I would now
> like to work on this latex output template customization problem in a
> general way that would be usable, extensible and customizable by
> others. My question is what approach Sphinx developers would
> recommend. I'll quickly note a few requirements:

> - this is not just a matter of changing a template file. The Sphinx
> code itself contains many fragments of latex that must be customized
> for each possible output. For example,
> sphinx.ext.mathbase.wrap_displaymath() outputs displaymath using the
> non-standard environments "gather" and "split". This does not match
> any scientific journal's template, so either this Sphinx code itself
> must change, or we are stuck with my klugey approach (external scripts
> that rewrite the latex output by Sphinx, inserting it into a journal's
> template).

I am very cautious regarding changes to the Sphinx/Docutils code just to
satisfy the special requirements of some publisher.

* Some of Sphinx's LaTeX output is still very specific (due to its
ancestry as a special-purpose Python documentation writer). These
parts should be changed to be more "mainstream".

* Some of these customizations go "away from the mainstream" (e.g. "gather"
and "split" are provided by the "amsmath" package, which can be regarded
as a requirement for any serious math typesetting, unless special
packages are used that emulate its behaviour).

In these cases, I propose subclassing the Sphinx (or Docutils) writer
and doing the customizations in the child.

> - scientific journals supply precise templates and demand that authors
> follow them exactly. Nowadays they input the latex file directly into
> their typesetting production, so to ensure a uniform appearance and
> standard across all the papers they publish, they *require* that the
> manuscript follow the template. As an author, I cannot deviate from
> their template at all. For example, they won't permit the inclusion
> of any packages other than a specified list that is used by their
> template. Unfortunately, much of the Sphinx code for latex support
> assumes the use of many custom packages. Again either that code has
> to change, or we're stuck with the external re-writer script approach.

This is (IMO) a clear case for using the publisher (to get the parts)
together with a custom writer (inheriting from the default one).

It may require emulating (or copying/pasting) the definitions of the
"prohibited" packages in the document preamble.


Generally, generating content that strictly follows a template is
troublesome.

> - this will require user-settable options for what to do with figures
> and tables. For example, during the initial submission / review
> phase, PLOS journals want the figures and tables included at the end
> of the manuscript (not in the middle of the text where Sphinx inserts
> them). However, once the paper is accepted for production, they want
> *only* the figure legends included at the end of the manuscript (i.e.
> do not include the figure images in the manuscript at all; they must
> be submitted separately). With my re-writer script this is easy; it
> just takes an optional argument that controls whether it includes the
> figures in the output or not.

Much of this can be done with custom style-sheets and existing latex packages
that can also be embedded by Docutils.

To leave out objects (e.g. figures), the "strip-elements-with-class"
option can be used.

> I'd like to get some advice about what approach people think would be
> best. A few options come to mind:

- use what is already available (custom style sheets, config options,
preamble code)

This will trim down the amount of necessary changes.

> - external rewriter scripts:
...

- external back-ends:

the back-end takes from Docutils (or Sphinx) the parts of the
latex output and a LaTeX template



> and inserts the relevant pieces of
> content into the template. This could be designed in a relatively
> modular way. I.e. a parser that extracts relevant sections from the
> Sphinx latex output; a "standardizer" that removes non-standard things
> like "gather" and "split". Then for each output target there could be
> a very small amount of code that processes journal-specific options
> like "submission format" vs. "production format".

As there is no parsing of a complete document, things are less likely
to break (but still not fail-safe across Docutils/Sphinx versions).


> - for this reason, it might make sense to make the "parser" and
> "standardizer" components of this actually part of the sphinx
> codebase, along with a bunch of automated tests that ensure they are
> working. Since these pieces must be kept in sync with the Sphinx
> code, that argues that they should be part of the Sphinx mercurial
> tree. Then the set of journal-specific "writer" scripts (which will
> be *very* simple, since all they have to do is process various little
> options) could either also be included with Sphinx, or distributed as
> a separate project.

This will only make sense if there is someone committed to keeping
them in sync.

> - "the full Monty": instead of using an external re-writer script, we
> modify the Sphinx latex code (e.g. sphinx.ext.mathbase,
> sphinx.writers.latex) to make it easy to customize the latex output in
> a truly general way (i.e. to produce output that does *not* assume non-
> standard packages, that inserts directly into any template file the
> user specifies, etc.). Having browsed the sphinx code a bit, this
> seems like a fair amount of work, as it requires understanding what
> both docutils and sphinx are doing to produce the latex output, and a
> fair amount of code is involved...

I would suggest a mixed approach. Doing "the right thing at the right place"
requires a lot of understanding of the code and discussion, but will be best
for improving both Docutils and Sphinx for everyone.

Günter

foobaron

Feb 1, 2011, 1:38:35 PM
to sphinx-dev


On Feb 1, 12:29 am, Guenter Milde <mi...@users.berlios.de> wrote:

> OTOH, for my taste reStructuredText is too limited for scientific
> papers yet: no citation support, no referencing to numbered
> tables/figures via labels, ...

When we first tried writing papers in reST, our first step was to
solve the problem of supporting citations in both HTML and latex. On
the latex side, I just used the bibcite Sphinx extension and modified
it slightly (so it would automatically copy the .bib bibliography
database file to the _build/latex directory). On the HTML side, I
modified bibcite slightly to put each citation name in a <span
class="citation">, and added a _templates/layout.html that just adds a
javascript link to the document's script_files. The javascript
(written by my colleague Marc Harper) just scans the document for
citations, queries our bibliography server to get the citation info,
and inserts a formatted bibliography at the end of the document. All
of this took about an hour to figure out and implement. For both HTML
and latex it uses a single standard bibliography system (i.e. a bibtex
bibliography database), and works across all my platforms: latex
output, HTML on my desktop web browser, HTML on my iPad web browser,
etc.

The key point is that *none* of this required any changes to Sphinx.
So I would say the problem is not that reST doesn't support citations,
or even that Sphinx can't handle them, but only that the simple
extension for doing it (bibcite) hasn't been made a standard part of
the package.

The (trivial) modifications I made to bibcite are here:
https://bitbucket.org/foobaron/bibcite

My _templates/layout.html is just:
{% extends "!layout.html" %}
{% set script_files = script_files + [pathto("http://citation.marcallenharper.com/site_media/citation_fetch.js", 1)] %}

I too want automatic referencing to numbered figures and tables via
labels. It seems like that should be straightforward to implement.
How hard do you think it would be? I am interested in trying to
implement this.

I'll take some more time to digest your very helpful comments, before
replying further.

-- Chris

gilberto dos santos alves

Feb 1, 2011, 5:28:27 PM
to sphin...@googlegroups.com, cjle...@gmail.com
Very interesting article! After reading the URL you posted: do you have the rst source available for public viewing? If possible, we would like to try to reproduce this in the Debian-based public library environment at http://www.telecentros.sp.gov.br/. Thanks for sharing this very nice use of Sphinx.

--
gilberto dos santos alves

(11) 8646-5049
são paulo - SP - Brasil

foobaron

Feb 27, 2011, 6:58:00 PM
to sphinx-dev
Taking your very helpful suggestions and information into account, I
decided to produce a separate tool (ReLaTeX) that "re-templates"
Sphinx latex output into any latex file you want to use as a
template. That is, it extracts the relevant content from Sphinx's
latex output, and uses a standard templating system (Jinja2) to
"inject" that content back into whatever latex file you want to use as
a template. This has worked well for me on a number of papers for
different journals, and hopefully could be useful to other Sphinx
users. It has the added advantage of being useful for "re-templating"
*any* latex input, not just latex produced by Sphinx. I'll post
details about the ReLaTeX release separately so that other Sphinx
users can see them, but I wanted to thank you for your very helpful
feedback.
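
The re-templating idea can be sketched in a few lines. This is not ReLaTeX's code or interface: to stay within the standard library it uses Python's `string.Template` as a stand-in for Jinja2, and the placeholder names (`title`, `body`), the toy template, and the `retemplate` helper are hypothetical.

```python
import re
from string import Template

# A stand-in for a journal's template file; real templates are longer.
# Note that a literal "$" in a string.Template must be written as "$$".
JOURNAL_TEMPLATE = Template(
    "\\documentclass{article}\n"
    "\\begin{document}\n"
    "\\title{$title}\n"
    "\\maketitle\n"
    "$body\n"
    "\\end{document}\n"
)


def retemplate(source, template=JOURNAL_TEMPLATE):
    """Extract title and body from LaTeX source and fill the template."""
    title = re.search(r"\\title\{([^{}]*)\}", source)
    body = re.search(
        r"\\begin\{document\}(.*)\\end\{document\}", source, re.DOTALL)
    if body is None:
        raise ValueError("no document environment found")
    content = body.group(1)
    # Drop the source's own \maketitle; the template supplies one.
    content = re.sub(r"\\maketitle\s*", "", content)
    return template.substitute(
        title=title.group(1) if title else "",
        body=content.strip(),
    )
```

Since the extraction step only looks for `\title` and the `document` environment, the same function works on any LaTeX input with that shape, not only on Sphinx output.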