How to reduce memory usage of sphinx-build

Neck Acm

Jan 6, 2012, 9:44:13 AM
to sphinx-dev
Hi all, I am new to Sphinx.
I am trying to convert massive plain text files to well-organized HTML.
This is not program documentation, just plain text records.

The source text files total about 116 MB. I used `sphinx-quickstart` to
create the config file and `make html` to build. `sphinx-build` then
starts to consume lots of memory and eventually eats all of it (1.5 GB),
raising a MemoryError and aborting the build process.

I've tried building with fewer files (9.8 MB), which successfully created
beautiful HTML, so the config is fine. I also modified some options in
conf.py, like:

html_domain_indices = False
html_use_index = False
primary_domain = 'None'

But the build still failed, and the output files are enormously big.

Is there any way to reduce memory usage during the build and the file size
of the output HTML?

Thank you :)

Georg Brandl

Jan 6, 2012, 10:20:08 AM
to sphin...@googlegroups.com
Hi,

the question is how you organized your files. When you say "the output
files are enormous big", what exactly do you mean? Of course the HTML
will be bigger than the text file, due to added markup, but that should
not amount to more than, say, a factor of 3.

Are you by chance working with lots of "include" directives, making
one big document out of multiple files?

Georg

Guenter Milde

Jan 6, 2012, 3:04:21 PM
to sphin...@googlegroups.com
On 2012-01-06, Neck Acm wrote:

> Hi all, I am new to sphinx, I am trying to convert massive plain text
> files to well-organized html, not program documentation, just some
> plain text record.

> The source text files is about 116 MB

...


> eventually eats all my memory ( 1.5 GB ), returning MemoryError, abort
> the build process

This is gigantic. How many lines is this?

> I've tried to build with fewer files (9.8 MB), successfully creating
> beautiful HTML,

On the docutils-users list
http://docutils.sourceforge.net/docs/user/mailing-lists.html#docutils-users
there is a recent thread about exactly this problem: even with input files of
about 4 MB, compilation took half an hour.

Investigation showed that the Docutils parser does not scale well - Docutils
is simply not built for massive input files.

As Sphinx uses Docutils for the document conversion, the problem should be
the same here.

> Is there any way to reduce memory usage in building

* No easy way. You might try to fix some issues in the Docutils
parser/writer, but the developers currently have no resources to deal with
this.

* The recommended way is to split the document into separate documents.
Sphinx provides good support for inter-document links.
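
As a rough illustration of the splitting approach, here is a minimal Python sketch. The function names (`split_at_blank_lines`, `write_parts`), the `max_lines` default, and the `_partNNN` file naming are all hypothetical choices made for this example, not part of Sphinx or Docutils. It assumes it is safe to cut the source at blank lines, which may not hold for indented or nested markup.

```python
from pathlib import Path


def split_at_blank_lines(text, max_lines=2000):
    """Split text into chunks of roughly max_lines lines, cutting only at blank lines."""
    chunks, current = [], []
    for line in text.splitlines():
        # Cut only at a blank line, so a paragraph or line block is never
        # split in the middle. The blank line itself is dropped.
        if not line.strip() and len(current) >= max_lines:
            chunks.append("\n".join(current) + "\n")
            current = []
        else:
            current.append(line)
    # Keep the remainder unless it is empty or whitespace only.
    if any(l.strip() for l in current):
        chunks.append("\n".join(current) + "\n")
    return chunks


def write_parts(source, out_dir, max_lines=2000):
    """Write each chunk of `source` to its own .rst file; return the document names."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    text = source.read_text(encoding="utf-8")
    for i, chunk in enumerate(split_at_blank_lines(text, max_lines), start=1):
        name = f"{source.stem}_part{i:03d}"
        title = f"{source.stem} (part {i})"
        # Give every part a title so it can be listed in a toctree. The
        # underline is doubled so it stays long enough for double-width
        # (e.g. CJK) characters in the title.
        heading = f"{title}\n{'=' * (2 * len(title))}\n\n"
        (out_dir / f"{name}.rst").write_text(heading + chunk, encoding="utf-8")
        names.append(name)
    return names
```

The returned names could then be listed under a `toctree` directive in index.rst. Whether smaller documents lower peak memory enough for a given project would need to be checked by running the build.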

> and the file size of output html ?

* Does the HTML size scale linearly with the input file size?

* You might consider converting to e.g. EPUB, which is basically zipped HTML.

Günter

Neck Acm

Jan 7, 2012, 10:54:12 AM
to sphinx-dev
> Hi,
>
> the question is how you organized your files.  When you say "the output
> files are enormous big", what exactly do you mean? Of course the HTML
> will be bigger than the text file, due to added markup, but that should
> not amount to more than, say, a factor of 3.
>
With 116 MB of source files, the _build directory is 669 MB after
sphinx-build aborted the build process:
265 MB _build/html
404 MB _build/doctrees
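
Taking the figures above at face value, 265 / 116 is roughly 2.3 for the HTML and 404 / 116 is roughly 3.5 for the doctrees (from an aborted build, so the final numbers may differ). A minimal Python sketch for measuring these ratios on disk follows. The function names `dir_size_bytes` and `size_report` are hypothetical, and the default of skipping a directory named `_build` inside the source tree is an assumption based on the default `sphinx-quickstart` layout.

```python
import os


def dir_size_bytes(path, exclude=()):
    """Sum the sizes of all regular files under path, skipping excluded directory names."""
    total = 0
    for root, dirs, files in os.walk(path):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def size_report(source_dir, build_dir, exclude=("_build",)):
    """Return {subdir: (size_in_bytes, ratio_to_source_size)} for html and doctrees."""
    src = dir_size_bytes(source_dir, exclude=exclude)
    report = {}
    for sub in ("html", "doctrees"):
        size = dir_size_bytes(os.path.join(build_dir, sub))
        report[sub] = (size, size / src if src else float("nan"))
    return report


if __name__ == "__main__":
    # Assumes the script is run from the Sphinx source directory.
    for sub, (size, ratio) in size_report(".", "_build").items():
        print(f"{sub}: {size / 1e6:.1f} MB ({ratio:.2f}x source)")
```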

I didn't use much markup, because I am not yet very familiar with
reST markup.
But I use lots of line blocks ("|" at the start of almost every line) to
preserve the document layout.
(Those documents were converted from PDF; I haven't figured out how to
transform them to a better layout yet.)

Might this affect the file size?

> Are you by chance working with lots of "include" directives, making
> one big document out of multiple files?
The index.rst looks like the one at the link below:
http://pastebin.com/gdqTQgcj

In the source directory there are 90 rst files,
each about 1.5 MB in size.

Thanks for the reply :)

Neck Acm

Jan 7, 2012, 11:47:39 AM
to sphinx-dev

> > eventually eats all my memory ( 1.5 GB ), returning MemoryError, abort
> > the build process
>
> This is gigantic. How many lines are this?
90 source rst files,
averaging 12,000 lines each (Traditional Chinese)

> > Is there any way to reduce memory usage in building
> * The recommended way is to split the document into separate documents.
>   Sphinx provides good support for inter-document links.
Ok, I will give it a try

> > and the file size of output html ?
source directory: 116 MB

265 MB _build/html
404 MB _build/doctrees
total: 669 MB

But the output size seems acceptable after Georg Brandl's explanation.
> * does the html size scale linear with the input file size?
yes
> * you might consider converting to e.g. epub, which is basically zipped HTML.
The EPUB output directory is 193 MB,
but I want to share these files on the internet. Others would need to
download the whole EPUB file
and install an EPUB reader, so it does not seem like a good option for me.
> Günter
Thank you for the reply :)

Roger Binns

Jan 8, 2012, 1:05:06 AM
to sphin...@googlegroups.com
On 07/01/12 07:54, Neck Acm wrote:
> With 116 mb source file, the _build directory is 669 mb after
> sphinx-build aborted the build process

If you just want to get something built, then I suggest trying a 64-bit
operating system (you are almost certainly using a 64-bit capable
processor).

When you use a 32-bit operating system, the Python process will be limited
to approximately 2 GB of address space (wiggle room varies based on OS,
shared libraries and other details). It looks like it is the address
space that you are running out of.

If you use a 64-bit operating system, then the address space is considerably
larger. However, you may run out of swap space, so it is a good idea to
configure lots of it.
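
A quick way to check which case applies is to ask the Python interpreter itself, since a 32-bit Python build has the 32-bit address space limit even on a 64-bit operating system. The helper names below are made up for this sketch; `struct.calcsize` and `sys.maxsize` are standard library features.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits (typically 32 or 64)."""
    return struct.calcsize("P") * 8


def is_64bit_python():
    """True if the running interpreter is a 64-bit build."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"This Python is a {pointer_bits()}-bit build")
```

Run it with the same interpreter that runs `sphinx-build`, since several Python installations can coexist on one machine.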

Roger