[Caml-list] OCaml runtime using too much memory in 64-bit Linux

Adam Chlipala

Nov 7, 2007, 12:29:18 PM
to caml...@inria.fr
I've encountered a problem where certain OCaml programs use orders of
magnitude more RAM when compiled/run in 64-bit Linux instead of 32-bit
Linux. Some investigation led to the conclusion that the difference has
to do with the size of OCaml page tables. (Here I mean the page tables
maintained by the OCaml runtime system, not any OS stuff.)

A program that should be using just a few megabytes of RAM ends up using
200+ MB to store a page table. It seems that a C macro is defined by
default on 64-bit Linux to use mmap() instead of malloc(). Ironically,
a comment says that this was done to avoid being given blocks of memory
that are very far apart from each other, forcing the creation of overly
large page tables. It's ironic because that is exactly the problem that
is showing up now with mmap(). It ends up called twice for the program
I'm looking at, and the two addresses it returns are far enough apart to
lead to creation of a 200 MB page table.

Has anyone else experienced this problem? Would the runtime system need
to be changed to avoid it?

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Gerd Stolpmann

Nov 7, 2007, 1:21:11 PM
to Adam Chlipala, caml...@inria.fr
On Wednesday, 07.11.2007, at 12:28 -0500, Adam Chlipala wrote:
> I've encountered a problem where certain OCaml programs use orders of
> magnitude more RAM when compiled/run in 64-bit Linux instead of 32-bit
> Linux. Some investigation led to the conclusion that the difference has
> to do with the size of OCaml page tables. (Here I mean the page tables
> maintained by the OCaml runtime system, not any OS stuff.)
>
> A program that should be using just a few megabytes of RAM ends up using
> 200+ MB to store a page table. It seems that a C macro is defined by
> default on 64-bit Linux to use mmap() instead of malloc(). Ironically,
> a comment says that this was done to avoid being given blocks of memory
> that are very far apart from each other, forcing the creation of overly
> large page tables. It's ironic because that is exactly the problem that
> is showing up now with mmap(). It ends up called twice for the program
> I'm looking at, and the two addresses it returns are far enough apart to
> lead to creation of a 200 MB page table.
>
> Has anyone else experienced this problem? Would the runtime system need
> to be changed to avoid it?

We are using O'Caml on 64 bit Linux, and aren't aware of such problems.

Did you observe a debug GC message that proves it? 200 MB means that an
address space of 200M * 4K = 8E is covered.

Also consider Linux modifications that do address randomization, i.e.
that prevent contiguous addresses from being allocated.

Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

Adam Chlipala

Nov 7, 2007, 2:12:54 PM
to Gerd Stolpmann, caml...@inria.fr
Gerd Stolpmann wrote:
> On Wednesday, 07.11.2007, at 12:28 -0500, Adam Chlipala wrote:
>
>> I've encountered a problem where certain OCaml programs use orders of
>> magnitude more RAM when compiled/run in 64-bit Linux instead of 32-bit
>> Linux. Some investigation led to the conclusion that the difference has
>> to do with the size of OCaml page tables. (Here I mean the page tables
>> maintained by the OCaml runtime system, not any OS stuff.)
>>
>> ...

>>
>
> We are using O'Caml on 64 bit Linux, and aren't aware of such problems.
>
> Did you observe a debug GC message that proves it? 200 MB means that an
> address space of 200M * 4K = 8E is covered.
>

Here's one run, cut off after allocation seems to settle down:

OCAMLRUNPARAM="v=12" ./program_name.exe
Growing heap to 960k bytes
Growing page table to 204151332 entries
Growing heap to 1440k bytes
Growing heap to 1920k bytes
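
As a rough check of the figures in this log, the sketch below converts the reported entry count into a table size and a covered address range. It assumes 4 KiB pages and one byte per page table entry; both are assumptions made for illustration, and the constant and function names are hypothetical.

```python
# Rough arithmetic on the "Growing page table" figure from the log above.
# Assumptions (not confirmed by the log itself): 4 KiB pages, 1 byte per entry.
PAGE_SIZE = 4096          # bytes per page (assumed)
BYTES_PER_ENTRY = 1       # dense table, one byte per page (assumed)


def page_table_footprint(entries):
    """Return (table_bytes, covered_address_bytes) for a dense page table."""
    table_bytes = entries * BYTES_PER_ENTRY
    covered_bytes = entries * PAGE_SIZE
    return table_bytes, covered_bytes


if __name__ == "__main__":
    table, covered = page_table_footprint(204151332)
    print(f"table size:    {table / 2**20:.1f} MiB")
    print(f"address range: {covered / 2**30:.1f} GiB")
```

Under these assumptions the table alone is close to 200 MB while the heap it describes is under 2 MB, which is consistent with the memory use reported at the start of the thread.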

> Also consider Linux modifications that do address randomization, i.e.
> that prevent contiguous addresses from being allocated.

That would definitely cause trouble. Thanks for the suggestion; I'll
look into it.

Samuel Mimram

Nov 8, 2007, 7:57:08 AM
to caml...@inria.fr, Gerd Stolpmann
Hi,

Adam Chlipala wrote:
> Gerd Stolpmann wrote:
>> On Wednesday, 07.11.2007, at 12:28 -0500, Adam Chlipala wrote:
>>
>>> I've encountered a problem where certain OCaml programs use orders of
>>> magnitude more RAM when compiled/run in 64-bit Linux instead of
>>> 32-bit Linux. Some investigation led to the conclusion that the
>>> difference has to do with the size of OCaml page tables. (Here I
>>> mean the page tables maintained by the OCaml runtime system, not any
>>> OS stuff.)
>>>
>>> ...
>>>
>>
>> We are using O'Caml on 64 bit Linux, and aren't aware of such problems.
>>
>> Did you observe a debug GC message that proves it? 200 MB means that an
>> address space of 200M * 4K = 8E is covered.

We are observing the same memory consumption problems with
liquidsoap[1]. On 32-bit machines, ps reports around 50M / 10M of
VSZ / RSS, whereas on 64-bit machines it reports 200M / 100M, which
is much bigger!

Here is the initial stack and heap allocation on 32-bit archs:

% OCAMLRUNPARAM="v=12" ./liquidsoap 'output.dummy(blank())'
Growing heap to 480k bytes
Growing page table to 3040 entries
Growing heap to 720k bytes
Growing page table to 3354 entries
Growing heap to 960k bytes
Growing page table to 3532 entries
Growing heap to 1200k bytes
Growing page table to 4251 entries
Growing heap to 1440k bytes
Growing page table to 4416 entries
Growing heap to 1680k bytes
Growing page table to 4478 entries
Growing heap to 1920k bytes
Growing page table to 4540 entries

And on 64-bit archs:

$ OCAMLRUNPARAM="v=12" ./liquidsoap 'output.dummy(blank())'
Growing heap to 960k bytes
Growing page table to 118256149 entries
Growing heap to 1440k bytes

I'm not sure I fully understand what these figures mean. Is the page
table expected to be that many orders of magnitude bigger on
64-bit archs?

Thanks!

Samuel.

[1] http://savonet.sf.net/

Romain Beauxis

Nov 8, 2007, 3:52:12 PM
to caml...@yquem.inria.fr
On Wednesday 07 November 2007 18:28:49, Adam Chlipala wrote:
> A program that should be using just a few megabytes of RAM ends up using
> 200+ MB to store a page table.  It seems that a C macro is defined by
> default on 64-bit Linux to use mmap() instead of malloc().  Ironically,
> a comment says that this was done to avoid being given blocks of memory
> that are very far apart from each other, forcing the creation of overly
> large page tables.  It's ironic because that is exactly the problem that
> is showing up now with mmap().  It ends up called twice for the program
> I'm looking at, and the two addresses it returns are far enough apart to
> lead to creation of a 200 MB page table.


Unfortunately, you can't compile without that option
on amd64 archs; you'll get this error:

> boot/ocamlrun boot/ocamlc -nostdlib -I boot -linkall -o ocaml.tmp toplevel/toplevellib.cma toplevel/topstart.cmo
>Fatal error: exception Out_of_memory


Romain

Romain Beauxis

Nov 13, 2007, 11:21:27 PM
to caml...@inria.fr
Hi all !

On Wednesday 07 November 2007 20:12:16, Adam Chlipala wrote:
> Gerd Stolpmann wrote:
> > On Wednesday, 07.11.2007, at 12:28 -0500, Adam Chlipala wrote:
> >> I've encountered a problem where certain OCaml programs use orders of
> >> magnitude more RAM when compiled/run in 64-bit Linux instead of 32-bit
> >> Linux. Some investigation led to the conclusion that the difference has
> >> to do with the size of OCaml page tables. (Here I mean the page tables
> >> maintained by the OCaml runtime system, not any OS stuff.)
> >>
> >> ...
> >
> > We are using O'Caml on 64 bit Linux, and aren't aware of such problems.
> >
> > Did you observe a debug GC message that proves it? 200 MB means that an
> > address space of 200M * 4K = 8E is covered.
>
> Here's one run, cut off after allocation seems to settle down:
>
> OCAMLRUNPARAM="v=12" ./program_name.exe
> Growing heap to 960k bytes
> Growing page table to 204151332 entries
> Growing heap to 1440k bytes
> Growing heap to 1920k bytes

Following Sam's answer about a similar issue with our application, here are two
outputs to compare, for the same information:

-- On i386:
5:13 toots@selassie ~% OCAMLRUNPARAM="v=12" liquidsoap 'output.dummy(blank())'
Growing heap to 480k bytes
Growing page table to 2648 entries
Growing heap to 720k bytes
Growing page table to 2710 entries
Growing heap to 960k bytes
Growing page table to 2815 entries

-- On amd64:
5:12 toots@ras-macintosh ~/sources/svn/savonet/trunk/liquidsoap/src%
OCAMLRUNPARAM="v=12" ./liquidsoap 'output.dummy(blank())'
Growing heap to 960k bytes
Growing page table to 104640820 entries
Growing heap to 1440k bytes
Growing heap to 1920k bytes

It seems that the "Growing page table to 104640820 entries" line in the amd64
log is enormous compared to the corresponding values for i386.

Sorry, I can't debug further; I'm not an expert on this topic at all.
However, I'll be glad to dig deeper if someone tells me what to do.


Romain

Vladimir Shabanov

Nov 14, 2007, 7:04:04 AM
to caml...@inria.fr
2007/11/14, Romain Beauxis <romain....@gmail.com>:

> Following Sam's answer about a similar issue with our application, here are two
> outputs to compare, for the same information:
>
> -- On i386:
> 5:13 toots@selassie ~% OCAMLRUNPARAM="v=12" liquidsoap 'output.dummy(blank())'
> Growing heap to 480k bytes
> Growing page table to 2648 entries
> Growing heap to 720k bytes
> Growing page table to 2710 entries
> Growing heap to 960k bytes
> Growing page table to 2815 entries
>
> -- On amd64:
> 5:12 toots@ras-macintosh ~/sources/svn/savonet/trunk/liquidsoap/src%
> OCAMLRUNPARAM="v=12" ./liquidsoap 'output.dummy(blank())'
> Growing heap to 960k bytes
> Growing page table to 104640820 entries
> Growing heap to 1440k bytes
> Growing heap to 1920k bytes
>
> It seems that the "Growing page table to 104640820 entries" line in the amd64
> log is enormous compared to the corresponding values for i386.

I also have problems with my application on amd64. The difference is
that I get the additional memory allocated only in the bytecode executable.

native amd64:
$ OCAMLRUNPARAM="v=12" ./_build/game.opt
Growing heap to 960k bytes
Growing page table to 72391 entries
.. (program output stripped)
Growing heap to 1440k bytes
Growing page table to 90522 entries
..

bytecode amd64:
$ OCAMLRUNPARAM="v=12" ./_build/game
Initial stack limit: 8192k bytes
Growing gray_vals to 32k bytes
Growing heap to 960k bytes
Growing page table to 141518746 entries
..
Growing heap to 1440k bytes
..

It gives me 80--300MB of additional memory allocated (virt & res).
Interestingly enough, the number of page table entries differs
from run to run (hence the non-constant additional memory size). In
the native executable, the page table entry count is constant.

Xavier Leroy

Nov 14, 2007, 7:56:20 AM
to Vladimir Shabanov, caml...@inria.fr
Hello,

Concerning this issue with large page tables on 64-bit architectures,
I opened a problem report on the bug-tracking system to help gather
more information. I'd like to ask all members of this list that
reported the problem to kindly visit

http://caml.inria.fr/mantis/view.php?id=4448

and add the required information as a note. That will help
pinpoint the problem.

Some more explanation on what's going on. The Caml run-time system
needs to keep track of which memory areas belong to the major heap,
and uses a page table for this purpose, with a dense representation
(an array of bytes). If the major heap areas are closely spaced, this
table is very small compared with the size of the heap itself.
However, if these areas are widely spaced in the addressing space, the
table can get big.
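
To illustrate why spacing matters for a dense representation, here is a minimal sketch of such a table. It is not the runtime's actual code: the class and method names are hypothetical, 4 KiB pages are assumed, and Python stands in for the C implementation.

```python
# Sketch of a dense page table: one byte per page between the lowest and
# highest heap page. All names are hypothetical; 4 KiB pages are assumed.
PAGE_LOG = 12  # log2 of the assumed page size


class DensePageTable:
    def __init__(self):
        self.base = None          # page number of the lowest tracked page
        self.table = bytearray()  # one byte per page in [base, base + len)

    def add_chunk(self, addr, size):
        """Mark the pages of a heap chunk [addr, addr + size) as 'in heap'."""
        first = addr >> PAGE_LOG
        last = (addr + size - 1) >> PAGE_LOG
        if self.base is None:
            self.base, self.table = first, bytearray(last - first + 1)
        else:
            # Grow the array so it spans both the old and the new range.
            new_base = min(self.base, first)
            new_end = max(self.base + len(self.table), last + 1)
            grown = bytearray(new_end - new_base)
            offset = self.base - new_base
            grown[offset:offset + len(self.table)] = self.table
            self.base, self.table = new_base, grown
        for page in range(first, last + 1):
            self.table[page - self.base] = 1

    def in_heap(self, addr):
        """Constant-time lookup: is this address inside a tracked chunk?"""
        index = (addr >> PAGE_LOG) - (self.base or 0)
        return 0 <= index < len(self.table) and self.table[index] == 1


if __name__ == "__main__":
    close = DensePageTable()
    close.add_chunk(0x10000000, 1 << 20)
    close.add_chunk(0x10100000, 1 << 20)
    print(len(close.table))  # two adjacent 1 MiB chunks

    far = DensePageTable()
    far.add_chunk(0x10000000, 1 << 20)
    far.add_chunk(0x10000000 + (1 << 32), 1 << 20)
    print(len(far.table))    # same heap size, chunks 4 GiB apart
```

Both tables track 2 MiB of heap, but the second needs an entry for every page in the 4 GiB gap as well. The lookup stays a single array index in either case, which is what makes the dense form attractive.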

For 32-bit platforms, this isn't much of a problem since the maximum
size of the page table is 1 megabyte. For 64-bit platforms, the sky
is the limit, however. So far, the only 64-bit platform where this
has been a problem in the past is Linux with glibc, where blocks
allocated by malloc() can come either from sbrk() or mmap(), two areas
that are spaced several *exa*bytes apart. We worked around the
problem by allocating all major heap areas directly with mmap(),
obtaining closely spaced addresses.

Apparently, this trick is no longer working on some systems, but I
need to understand better which ones exactly. (I suspect some Linux
distros that applied address randomization patches to the stock Linux
kernel.) So, please provide feedback in the BTS.

If the problem is confirmed, there are several ways to go about it.
One is to implement the page table with a sparse data structure,
e.g. a hash table. However, the major GC and some primitives like
polymorphic equality perform *lots* of page table lookups, so a
performance hit is to be expected. The other is to revise OCaml's
data representations so that the GC and polymorphic primitives no
longer need to know which pointers fall in the major heap. This seems
possible in principle, but will take quite a bit of work and break a
lot of badly written C/OCaml interface code. You've been warned :-)
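
The first option can be sketched as follows. This is only an illustration of the trade-off, not a proposed patch: a Python set stands in for a hash table, the names are hypothetical, and 4 KiB pages are assumed.

```python
# Sketch of a sparse page table: only pages that belong to the heap are
# stored, so memory use follows heap size rather than address spread.
# Names are hypothetical; 4 KiB pages are assumed.
PAGE_LOG = 12  # log2 of the assumed page size


class SparsePageTable:
    def __init__(self):
        self.pages = set()  # page numbers known to be in the major heap

    def add_chunk(self, addr, size):
        """Record every page of the heap chunk [addr, addr + size)."""
        first = addr >> PAGE_LOG
        last = (addr + size - 1) >> PAGE_LOG
        self.pages.update(range(first, last + 1))

    def in_heap(self, addr):
        """Hash lookup instead of an array index."""
        return (addr >> PAGE_LOG) in self.pages


if __name__ == "__main__":
    table = SparsePageTable()
    table.add_chunk(0x10000000, 1 << 20)
    table.add_chunk(0x7F0000000000, 1 << 20)  # very far from the first chunk
    print(len(table.pages))  # 512 entries regardless of the distance
    print(table.in_heap(0x7F0000000010), table.in_heap(0x20000000))
```

The table size no longer depends on how far apart the chunks are, but every lookup now pays for hashing and probing, which is the performance cost mentioned above for code that performs many lookups.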

- Xavier Leroy

Romain Beauxis

Nov 14, 2007, 9:17:14 AM
to caml...@yquem.inria.fr
Hi Xavier !

On Wednesday 14 November 2007 13:55:40, Xavier Leroy wrote:
> Apparently, this trick is no longer working on some systems, but I
> need to understand better which ones exactly.  (I suspect some Linux
> distros that applied address randomization patches to the stock Linux
> kernel.)  So, please provide feedback in the BTS.

While gathering the data requested for your bug report, we found out that
/proc/sys/kernel/randomize_va_space
can be set to 0.

After that, the issue no longer appears.


Quite a good workaround until a real fix appears. :)


Romain
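
For anyone checking their own machine, a small sketch that reads the current value of this setting is shown below. The function name is hypothetical and the file exists only on Linux; changing the value requires root privileges and affects the whole system, so the sketch only reads it.

```python
# Read the Linux address-space randomization setting, if available.
# The helper name is hypothetical; the path is the one mentioned above.
from pathlib import Path

RANDOMIZE_VA_SPACE = Path("/proc/sys/kernel/randomize_va_space")


def read_randomize_va_space(path=RANDOMIZE_VA_SPACE):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_randomize_va_space()
    if value is None:
        print("setting not available on this system")
    else:
        print(f"randomize_va_space = {value}")
```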

Markus Mottl

Nov 14, 2007, 10:56:51 AM
to caml...@inria.fr
On 11/14/07, Xavier Leroy <Xavier...@inria.fr> wrote:
> Concerning this issue with large page tables on 64-bit architectures,
> I opened a problem report on the bug-tracking system to help gather
> more information. I'd like to ask all members of this list that
> reported the problem to kindly visit
>
> http://caml.inria.fr/mantis/view.php?id=4448

I have just added a small note there describing a very simple fix,
which, though it may not be totally general, might be good enough and
requires very little effort to implement. If anybody wants to give it
a try...

Best regards,
Markus

--
Markus Mottl http://www.ocaml.info markus...@gmail.com

Stefan Monnier

Nov 14, 2007, 11:29:38 AM
to caml...@inria.fr
> and uses a page table for this purpose, with a dense representation
> (an array of bytes). If the major heap areas are closely spaced, this
[...]

> For 32-bit platforms, this isn't much of a problem since the maximum
> size of the page table is 1 megabyte. For 64-bit platforms, the sky

How about allocating this array of bytes via mmap and then leave it
uninitialized (relying on POSIX's guarantee that it's already
initialized to zeros)?
This way you can easily have a 4GB "dense" table which doesn't use much
RAM since most of the 4GB will be mapped (via copy-on-write) to the same
"zero page".
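
A minimal sketch of the idea, using Python's mmap module as a stand-in for a direct mmap() call in C: an anonymous mapping is created and read without any initialization pass. The size and names are arbitrary choices for the demo. Whether untouched pages really cost no RAM depends on the kernel and on the mapping flags, which this sketch does not verify; it only shows that the contents start out as zeros.

```python
# Anonymous mapping used as a zero-initialized "dense" table.
# TABLE_SIZE and the helper name are arbitrary, hypothetical choices.
import mmap

TABLE_SIZE = 64 * 1024 * 1024  # 64 MiB, kept small for the demo


def make_zeroed_table(size=TABLE_SIZE):
    """Create an anonymous mapping; its contents start out as zero bytes."""
    return mmap.mmap(-1, size)


if __name__ == "__main__":
    table = make_zeroed_table()
    # Entries read as 0 without any explicit initialization loop.
    print(table[0], table[TABLE_SIZE // 2], table[TABLE_SIZE - 1])
    # Marking one page as "in heap" writes a single byte.
    table[12345] = 1
    print(table[12345])
    table.close()
```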


Stefan


PS: Obviously this is orthogonal to the potential change in page-size
recommended by Brian.

Brian Hurt

Nov 14, 2007, 11:36:48 AM
to Stefan Monnier, caml...@inria.fr
Stefan Monnier wrote:

>How about allocating this array of bytes via mmap and then leave it
>uninitialized (relying on POSIX's guarantee that it's already
>initialized to zeros)?
>This way you can easily have a 4GB "dense" table which doesn't use much
>RAM since most of the 4GB will be mapped (via copy-on-write) to the same
>"zero page".
>
>
>

Even on a system like Linux, which optimistically allocates memory (i.e.
the actual underlying memory isn't allocated until you touch
it), once you read the page, it has to actually exist in memory. Even
using 1 byte per 4M page, mapping a whole 64-bit memory space requires 4
TB of RAM. And many systems do not optimistically allocate memory, as
it causes a lot of problems (for example, allocations can appear to
succeed and then segfault when you first touch the memory because
it can't really be allocated).

Brian

Lionel Elie Mamane

Nov 14, 2007, 11:46:19 AM
to Stefan Monnier, caml...@inria.fr
On Wed, Nov 14, 2007 at 11:22:46AM -0500, Stefan Monnier wrote:
>> and uses a page table for this purpose, with a dense representation
>> (an array of bytes). If the major heap areas are closely spaced, this
> [...]
>> For 32-bit platforms, this isn't much of a problem since the maximum
>> size of the page table is 1 megabyte. For 64-bit platforms, the sky

> How about allocating this array of bytes via mmap and then leave it
> uninitialized (relying on POSIX's guarantee that it's already
> initialized to zeros)?
> This way you can easily have a 4GB "dense" table which doesn't use much
> RAM since most of the 4GB will be mapped (via copy-on-write) to the same
> "zero page".

I think this will fail on a GNU/Linux 2.6 system with
/proc/sys/vm/overcommit_memory set to 2, or any other system that
behaves as Linux with overcommit to 2. (Meaning, it actually reserves
place from the swap+ram pool, so that any mmapped/sbrk'd memory can
actually be used. It permits the kernel to guarantee that even if all
programs actually use all the memory they allocate, it will be able to
serve them - albeit slowly (swap use)).

In particular, the addressing space of a 64 bit machine is, well... 64
bits, by definition. For 4kiB = 2^12 B pages, one thus needs a table
of size 2^(64-12) = 2^52 bytes, that is 4 EB. That is, on any machine
with less than that of memory (and overcommit to 2), the program will
not run. Even at one bit (and not byte) per page, that is still
16PB...

Big pages don't get you out of the problem. 4MB pages only buy you a
factor 1024, that is 4PB and 16GB.


--
Lionel

Lionel Elie Mamane

Nov 14, 2007, 12:08:56 PM
to Stefan Monnier, caml...@inria.fr
On Wed, Nov 14, 2007 at 05:45:44PM +0100, Lionel Elie Mamane wrote:

> In particular, the addressing space of a 64 bit machine is, well... 64
> bits, by definition. For 4kiB = 2^12 B pages, one thus needs a table
> of size 2^(64-12) = 2^52 bytes, that is 4 EB. That is, on any machine
> with less than that of memory (and overcommit to 2), the program will
> not run. Even at one bit (and not byte) per page, that is still
> 16PB...

> Big pages don't get you out of the problem. 4MB pages only buy you a
> factor 1024, that is 4PB and 16GB.

I got my prefixes all wrong... It is 4PiB, 16TiB, 4TiB and 16GiB...
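
The sizes under discussion can be recomputed with a few lines of arithmetic. The sketch below assumes a full 64-bit address space and a dense table; the function names are hypothetical. The byte-per-page results agree with the corrected 4 PiB and 4 TiB figures. For the bit-per-page cases this arithmetic gives 512 TiB and 512 GiB, so the 16 TiB and 16 GiB values may be worth double-checking.

```python
# Size of a dense page table covering a whole address space.
# Function names are hypothetical; binary (power-of-two) prefixes are used.

def table_bytes(address_bits, page_bytes, bits_per_entry):
    """Bytes needed for one entry per page across the whole address space."""
    pages = 2**address_bits // page_bytes
    return pages * bits_per_entry // 8


def human(n):
    """Format a byte count with binary prefixes (exact for powers of two)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} ZiB"


if __name__ == "__main__":
    for page_bytes, label in ((4 * 2**10, "4 KiB pages"), (4 * 2**20, "4 MiB pages")):
        for bits, kind in ((8, "byte"), (1, "bit")):
            size = table_bytes(64, page_bytes, bits)
            print(f"{label}, one {kind} per page: {human(size)}")
    # prints:
    # 4 KiB pages, one byte per page: 4 PiB
    # 4 KiB pages, one bit per page: 512 TiB
    # 4 MiB pages, one byte per page: 4 TiB
    # 4 MiB pages, one bit per page: 512 GiB
```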

Lionel Elie Mamane

Nov 14, 2007, 12:09:17 PM
to Brian Hurt, Stefan Monnier, caml...@inria.fr
On Wed, Nov 14, 2007 at 11:36:16AM -0500, Brian Hurt wrote:
> Stefan Monnier wrote:

>> How about allocating this array of bytes via mmap and then leave it
>> uninitialized (relying on POSIX's guarantee that it's already
>> initialized to zeros)?
>> This way you can easily have a 4GB "dense" table which doesn't use much
>> RAM since most of the 4GB will be mapped (via copy-on-write) to the same
>> "zero page".

> Even on a system like Linux, which optimistically allocates memory
> (i.e. the actual underlying memory isn't allocated until you
> actually touch it), once you read the page, it has to actually exist
> in memory.

This may not be a problem (but only people who know the system
intimately will *know*): it is plausible that the OCaml runtime system
would not read any entry in the table corresponding to a page that is
not allocated. Or maybe not.

--
Lionel

Stefan Monnier

Nov 14, 2007, 12:27:31 PM
to Brian Hurt, caml...@inria.fr
> Even on a system like Linux, which optimistically allocates memory (i.e. the
> actual underlying memory isn't allocated until you actually touch it),
> once you read the page, it has to actually exist in memory.

It exists in memory: it's the zero page (a page that contains all zero
bytes). And it's the same physical (RAM) page used for all pages that
have been allocated but not yet written. So as long as you don't write
to it, it shouldn't use any RAM space.

Of course, it may cost in swap use (depending on optimistic allocation
and the use of MAP_NORESERVE), and it will cost in kernel memory because
the kernel has to maintain the process's page table.

But it seems like a good quick fix, which preserves the advantages of
a dense array of bytes (i.e. fast and simple lookup, compact
representation using less cache space, ...).


Stefan
