[Caml-list] Understanding usage by the runtime

orb...@ezabel.com

Dec 30, 2011, 6:46:05 PM

to Caml List

I am running a fork of mfp's ocamlmq and trying to track down an apparent memory leak: after a week of heavy usage the process is using 2GB of RAM. I have looked in all the obvious places. All of the containers tracked in the mq's state object, including the queues, appear to be empty or steady in size; data is moving in and out of them but not accumulating, so memory consumption should be nearly O(1). I then decided to see what the gc could tell me. I've been printing out gc stats and am wondering if someone can help me grok how the runtime works. In this case I'm trying to compare top to what the gc says. The exact values are below, but the essence is that top states the process is using 238m of RAM whereas the gc states that the heap size is about 66 megs. My understanding is that the heap size is the total amount of memory that the runtime has under its supervision.

I am running Ubuntu Linux on a 64-bit virtual machine running in VMware on an Ubuntu Linux host.

My questions are:
- Is top untrustworthy here? Or is the heap_words value not the full story?
- Are there any tools that make it easier for me to track down memory leaks?

Thank you

Top:
VIRT = 250m
RES = 238m

GC:
Heap size according to gc output = 66536k

minor_words: 6843117723
promoted_words: 271790582
major_words: 1752005911
minor_collections: 210261
major_collections: 2358
heap_words: 8516608
heap_chunks: 16
top_heap_words: 30736896
live_words: 6936631
live_blocks: 1083582
free_words: 1579967
free_blocks: 1457
largest_free: 399643
fragments: 10
compactions: 465
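
For anyone comparing these figures with top: the Gc module reports sizes in words, not bytes. Below is a minimal sketch of the conversion, assuming a 64-bit runtime (8-byte words). kib_of_words is a name made up for this example, not part of the Gc module.

```ocaml
(* Gc statistics are expressed in words; convert them to KiB so they can be
   compared with top. kib_of_words is a hypothetical helper. *)
let kib_of_words ?(word_bytes = Sys.word_size / 8) words =
  words * word_bytes / 1024

let () =
  (* quick_stat avoids a full heap traversal; heap_words and top_heap_words
     are filled in, the live_* and free_* fields are not. *)
  let s = Gc.quick_stat () in
  Printf.printf "heap_words     = %d (%dk)\n"
    s.Gc.heap_words (kib_of_words s.Gc.heap_words);
  Printf.printf "top_heap_words = %d (%dk)\n"
    s.Gc.top_heap_words (kib_of_words s.Gc.top_heap_words)
```

With the numbers above, heap_words = 8516608 gives the 66536k quoted, and top_heap_words = 30736896 works out to 240132k (about 234 MiB). That is the largest the major heap has been, and it is in the same range as the RES figure from top.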

--
Caml-list mailing list. Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

David Baelde

Dec 31, 2011, 4:12:34 AM

to orb...@ezabel.com, Caml List

Hi,

My thoughts are not so fresh on that topic but, seeing the figures, it
could very well be that your memory leak is on the C side. Memory
allocated using malloc in C to Caml bindings won't show up in the Gc
info.

By the way, if you're sure that the leak is on the OCaml side, you
might be interested in ocaml-memprof. It's a patch by Fabrice Le
Fessant to get precise info about what kind of object is allocated by
the Gc over time. We've been able to use it a while ago on liquidsoap,
after Samuel Mimram adapted it for Ocaml 3.10 (you can find the
updated patch on his page).

Hope this helps,

David

orb...@ezabel.com

Dec 31, 2011, 10:33:49 AM

to david....@ens-lyon.org, Caml List

Being on the C side is not even something I had considered. In this case, I think the only piece of code not part of the Ocaml RTS that is talking to C is Lwt. It is possible that there is a memory leak in there somewhere. The upside, though, is there seems to be some residue of it in the Ocaml side. My heap numbers given earlier are ~65megs which is significantly larger than it should be, so I might be able to track it down from the Ocaml side.

Thank you for the suggestion of ocaml-memprof.

/M

Richard W.M. Jones

Jan 1, 2012, 7:45:28 AM

to orb...@ezabel.com, david....@ens-lyon.org, Caml List

On Sat, Dec 31, 2011 at 10:33:19AM -0500, orb...@ezabel.com wrote:
> Being on the C side is not even something I had considered. In this
> case, I think the only piece of code not part of the Ocaml RTS that is
> talking to C is Lwt. It is possible that there is a memory leak in
> there somewhere. The upside, though, is there seems to be some
> residue of it in the Ocaml side. My heap numbers given earlier are
> ~65megs which is significantly larger than it should be, so I might be
> able to track it down from the Ocaml side.

A couple of other ideas:

Is compaction disabled? lablgtk disables it unconditionally by
setting the global Gc max_overhead (see also the Gc documentation):

src/gtkMain.ml:
let () = Gc.set {(Gc.get()) with Gc.max_overhead = 1000000}

If something in your program or Lwt does the same, you may get
fragmentation of the C malloc heap or perhaps the OCaml heap. I've
experienced fragmentation in very long-running C programs and it's
insidious because it's very hard to understand what's really going on,
and impossible IME to remedy it.
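
A quick way to check this from inside a running program is to read the setting back. This is a minimal sketch; it relies on the rule stated in the Gc documentation that max_overhead >= 1000000 means compaction is never triggered. compaction_disabled is a hypothetical helper name.

```ocaml
(* True when the given GC parameters turn automatic compaction off.
   Per the Gc documentation, max_overhead >= 1000000 disables it. *)
let compaction_disabled (c : Gc.control) = c.Gc.max_overhead >= 1_000_000

let () =
  let c = Gc.get () in
  Printf.printf "max_overhead = %d, automatic compaction %s\n"
    c.Gc.max_overhead
    (if compaction_disabled c then "disabled" else "enabled")
```

Running this after all libraries have been initialised (for example just before entering the main loop) should show whether one of them changed the setting behind your back.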

Second suggestion is to look at /proc/<pid>/maps and/or smaps.
That'll tell you without doubt where the 2GB of memory is being used.
Most likely in the heap from the way you describe it, but it is worth
checking that top isn't reporting something innocuous such as a big
file-backed mmap in one of your C libraries.

Attached is a script that you can adapt to help you interpret
/proc/<pid>/maps.
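
For a first impression without the script, here is a minimal OCaml sketch (not the attached maps.pl) that totals the "Rss:" lines of an smaps file. It assumes the usual Linux layout ("Rss:   <n> kB" on its own line), and the helper names are invented for the example.

```ocaml
(* Extract the value of an "Rss:   <n> kB" line, or None for other lines. *)
let rss_kb_of_line line =
  try Scanf.sscanf line "Rss: %d kB" (fun kb -> Some kb)
  with Scanf.Scan_failure _ | End_of_file | Failure _ -> None

(* Sum every Rss entry found in the given smaps-style file. *)
let total_rss_kb path =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | line ->
      (match rss_kb_of_line line with
       | Some kb -> loop (acc + kb)
       | None -> loop acc)
    | exception End_of_file -> close_in ic; acc
  in
  loop 0

let () =
  (* /proc/self/smaps describes the current process; substitute
     /proc/<pid>/smaps to inspect another one. *)
  let path = "/proc/self/smaps" in
  try
    if Sys.file_exists path then
      Printf.printf "total Rss: %d kB\n" (total_rss_kb path)
    else
      print_endline "no smaps file on this system"
  with Sys_error msg -> prerr_endline msg
```

Grouping the totals by the mapping name on the header lines (heap, anonymous, file-backed) is the natural next step and is left out here to keep the sketch short.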

Rich.

--
Richard Jones
Red Hat

maps.pl

Damien Doligez

Jan 4, 2012, 1:04:15 PM

to Caml List

On 2012-01-01, at 13:44, Richard W.M. Jones wrote:

> Is compaction disabled? lablgtk disables it unconditionally by
> setting the global Gc max_overhead (see also the Gc documentation):
>
> src/gtkMain.ml:
> let () = Gc.set {(Gc.get()) with Gc.max_overhead = 1000000}

Anyone who disables compaction should seriously consider switching
to the first-fit allocation policy:

let () = Gc.set {(Gc.get ()) with Gc.allocation_policy = 1}

This may slow down allocations a bit, but the theory tells us that
it completely prevents unbounded fragmentation of the OCaml heap.

-- Damien

Adrien

Jan 4, 2012, 1:49:02 PM

to Damien Doligez, Caml List

On 04/01/2012, Damien Doligez <damien....@inria.fr> wrote:
> On 2012-01-01, at 13:44, Richard W.M. Jones wrote:
>
>> Is compaction disabled? lablgtk disables it unconditionally by
>> setting the global Gc max_overhead (see also the Gc documentation):
>>
>> src/gtkMain.ml:
>> let () = Gc.set {(Gc.get()) with Gc.max_overhead = 1000000}
>
> Anyone who disables compaction should seriously consider switching
> to the first-fit allocation policy:
>
> let () = Gc.set {(Gc.get ()) with Gc.allocation_policy = 1}
>
> This may slow down allocations a bit, but the theory tells us that
> it completely prevents unbounded fragmentation of the OCaml heap.

I've often wondered what I should do when using lablgtk. It's a pretty
annoying issue and, as far as I understand, OCaml will only return the
memory to the OS upon compactions.

There is however something to do. Quoting lablgtk's README:
> IMPORTANT: Some Gtk data structures are allocated in the Caml heap,
> and their use in signals (Gtk functions internally cally callbacks)
> relies on their address being stable during a function call. For
> this reason automatic compation is disabled in GtkMain. If you need
> it, you may use compaction through Gc.compact where it is safe
> (timeouts, other threads...), but do not enable automatic compaction.

I've never really understood why it worked: I'm surprised the GC would
update addresses stored in the C side of GTK.

If you want to use timeouts, the following should work:
ignore (Glib.Timeout.add ~ms:0 ~callback:(fun () -> Gc.compact (); false))

I guess that Glib.Idle.add would work too.

That guarantees nothing about the time the compaction will run however
and in practice, adding a timeout or an idle and starting a long-running
and uninterruptible computation right after will severely delay the
compaction.

I haven't had the time to try it but it should be possible to pump
glib's event loop by hand in order to trigger the compaction. Another
possibility would be to spawn a thread and use a mutex to wait until the
compaction is done. And in case you're using Lwt, well, I don't know but
I'd expect the callback to be callable whenever threads can be switched.

Maybe if it were possible to have a callback called each time the
runtime would like to do a compaction, this could be automated.

Regards,
Adrien Nader

John Carr

Jan 4, 2012, 2:38:19 PM

to Adrien, Caml List

> There is however something to do. Quoting lablgtk's README:
> > IMPORTANT: Some Gtk data structures are allocated in the Caml heap,
> > and their use in signals (Gtk functions internally cally callbacks)
> > relies on their address being stable during a function call. For
> > this reason automatic compation is disabled in GtkMain. If you need
> > it, you may use compaction through Gc.compact where it is safe
> > (timeouts, other threads...), but do not enable automatic compaction.
>
> I've never really understood why it worked: I'm surprised the GC would
> update addresses stored in the C side of GTK.

I think the problem is, a C function can invoke a callback that calls
ocaml code that moves the object being operated on by the C function.
Because the C function is precompiled it does not register its copy of
the pointer as a GC root. When the callback returns the C function's
pointer is invalid.

This should be fixable with another level of indirection. Finalization
can free the C object. I infer from reading the source that the extra
level of indirection is considered an unacceptable penalty.

orb...@ezabel.com

Jan 7, 2012, 12:43:51 AM

to Richard W.M. Jones, david....@ens-lyon.org, Caml List

Hello everyone!

I would like to reassure you that all is right in the world. After a large number of tests I finally tracked the problem down to an entry in a Hashtbl not being deleted. It was a one line fix!
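
The actual ocamlmq fix is not shown here. The sketch below only illustrates the general pattern, with made-up names (pending, send, ack, ack_leaky): a table whose entries are added on one code path and never removed on another, and how Hashtbl.length makes that visible.

```ocaml
(* Hypothetical table of in-flight messages, keyed by message id. *)
let pending : (int, string) Hashtbl.t = Hashtbl.create 16

let send id payload = Hashtbl.replace pending id payload

(* Buggy acknowledgement: forgets to drop the entry, so the table grows. *)
let ack_leaky (_id : int) = ()

(* Fixed acknowledgement: the "one line" that removes the entry. *)
let ack id = Hashtbl.remove pending id

let () =
  for id = 1 to 1000 do send id "msg"; ack_leaky id done;
  Printf.printf "after leaky acks: %d entries\n" (Hashtbl.length pending);
  for id = 1 to 1000 do ack id done;
  Printf.printf "after real acks:  %d entries\n" (Hashtbl.length pending)
```

Logging Hashtbl.length for each long-lived table next to the gc stats is a cheap way to spot this kind of growth.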

One question does remain though: In my tests I would do some work that would cause ~1GB of RAM to be under control of the Gc. Then I would do something that, for reasons I didn't understand at the time, would cause the Gc to compact all of its memory and go back down to less than 1 meg, but the RES value in top would only drop to about 400 megs. Is this expected behavior? I know the malloc implementation might hold on to some memory for itself, but 400x the amount of memory the OCaml RTS actually needs seems a bit excessive. I know there is a bug report floating around from Martin about large Arrays not being properly freed; in this case my issue was with a Hashtbl. I do not know if Hashtbl is implemented with an Array underneath, but if so, could that be the cause of my overhead?

Thank you

> <maps.pl>

Richard W.M. Jones

Jan 8, 2012, 1:45:31 PM

to orb...@ezabel.com, david....@ens-lyon.org, Caml List

On Sat, Jan 07, 2012 at 12:43:22AM -0500, orb...@ezabel.com wrote:
> One question does remain though: In my tests I would do some work
> that would cause ~1GB of RAM to be under control of the Gc. Then I
> would do something that, at the time I didn't understand, would cause
> the Gc to compact all of its memory and go back down to less than 1
> meg, but the RES value in top would only drop to about 400 megs. Is
> this expected behavior? I know the malloc implementation might hold
> on to some data for itself but 400x the amount of memory the Ocaml RTS
> actually needs seems a bit excessive.

I would say it's unusual, but not necessarily unexpected.

You have to understand (a) how C malloc works, (b) under what
conditions memory may be given back to the OS, and (c) whether it's
even necessary to give back memory to the OS.

Now (a) depends on what malloc implementation you're using. We can
assume it's Linux glibc, although even that has changed several times,
so it really depends on which precise version of glibc you've got, but
for this discussion I'll assume it's the latest version. All of these
details could be completely different for other operating systems ...

glibc currently has three strategies to allocate memory.

For small amounts (under 512 bytes in the current impl) it has a
linked list of cached blocks of fixed sizes that are used to satisfy
requests quickly.

For medium amounts (512 - 128K, adjustable) it has a complex
algorithm described as a combination of best fit and LRU.

For large allocations (128K and over, but tunable), it uses mmap.

Furthermore, for allocations < 128K, when more core is required from
the OS, it will either use sbrk(2) to increase the heap linearly, or
it will use mmap(2) to allocate >= 1MB chunks scattered around the
address space.
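
As a rough mental model only, the size classes described above can be written down as a small OCaml function. This mirrors the description in this message, not glibc's real code; the thresholds are the ones quoted here, and the type and function names are invented.

```ocaml
(* Toy model of the three strategies described above. *)
type strategy =
  | Cached_fixed_size  (* small: lists of cached fixed-size blocks *)
  | Best_fit           (* medium: best fit combined with LRU *)
  | Mmap               (* large: a dedicated mmap per allocation *)

(* mmap_threshold defaults to the 128K figure, which is tunable. *)
let strategy_for ?(mmap_threshold = 128 * 1024) size =
  if size < 512 then Cached_fixed_size
  else if size < mmap_threshold then Best_fit
  else Mmap

let describe = function
  | Cached_fixed_size -> "cached fixed-size block"
  | Best_fit -> "best fit / LRU"
  | Mmap -> "mmap"

let () =
  List.iter
    (fun size ->
       Printf.printf "%8d bytes -> %s\n" size (describe (strategy_for size)))
    [ 64; 4096; 1024 * 1024 ]
```

The point of the model is that only the last class maps one-to-one onto something the kernel can take back when it is freed; the first two share memory with whatever else was allocated nearby.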

Basically what this means for (b) is that it's phenomenally hard to
predict if it will be possible to give back memory to the OS. It
depends on how the OCaml runtime requested it (what size, what order
of requests). It will depend on how random C allocations (libraries
and the OCaml runtime) happen to be spread around, since those cannot
be moved and will prevent memory from being given back. And it will
depend on the malloc control structures themselves which also cannot
be moved and their location will be highly dependent on the order in
which requests were made (maybe even not predictable if you have a
multithreaded program). It may be that just one struct is preventing
a whole mmapped area from being given back.

So you might think that your program "just allocated 1GB of RAM and
freed it" at the OCaml level, but what's happening at the allocator
level is likely to be far more complex.

And that brings us to (c): does it even make sense to give back memory
to the OS? Here's the news: the OS doesn't need you to give back
memory. Because of virtual memory and swap, the OS will quite happily
take back your memory whenever it wants without asking you. It could
even be more efficient this way.

Rich.

--
Richard Jones
Red Hat

Richard W.M. Jones

Jan 8, 2012, 2:01:13 PM

to orb...@ezabel.com, david....@ens-lyon.org, Caml List

On Sun, Jan 08, 2012 at 06:45:05PM +0000, Richard W.M. Jones wrote:
> And that brings us to (c): does it even make sense to give back memory
> to the OS?

I forgot to mention one way in which this is more efficient: If you
munmap a piece of memory and later decide you need more memory so you
call mmap, then the kernel has to give you zeroed memory. You
probably didn't want zeroed memory, but you pay the penalty anyway.

(The converse of this is that if your unused memory is swapped out,
then it has to be written to disk and read back, which is even less
efficient.)

There is an madvise flag "MADV_DONTNEED" which is better than munmap +
mmap, although not as optimal as it could be. See links below.

http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00733.html
http://www.reddit.com/r/programming/comments/dp5up/implementations_for_many_highlevel_programming/c120n77

Probably the OCaml GC should be setting madvise hints anyway.

While we're at it, the GC may be able to cooperate better with the
new(-ish) Transparent Hugepages feature of Linux.

I wonder if anyone has looked into these things to see if there are
any quick wins to be had?

Török Edwin

Jan 8, 2012, 5:34:19 PM

to caml...@inria.fr

On 01/08/2012 09:00 PM, Richard W.M. Jones wrote:
> On Sun, Jan 08, 2012 at 06:45:05PM +0000, Richard W.M. Jones wrote:
>> And that brings us to (c): does it even make sense to give back memory
>> to the OS?

BTW you can try calling malloc_stats(), it should print statistics
on how much total memory malloc() is using, and how much of that is reclaimable/unreclaimable free memory.

Sometimes you may have this situation (fragmented memory):
| malloced bytes | .... large range of free bytes ... | malloced bytes |

AFAIK glibc is not able to give back the middle portion to the OS, you'll
have to use your own memory pool allocator that can munmap() the middle bit
when no longer needed.

A quick way to see if the increased mem usage you see in top is due to malloc() is
to switch temporarily to a different malloc impl. You can try linking with jemalloc, or tcmalloc.

>
> I forgot to mention one way in which this is more efficient: If you
> munmap a piece of memory and later decide you need more memory so you
> call mmap, then the kernel has to give you zeroed memory. You
> probably didn't want zeroed memory, but you pay the penalty anyway.

Also mmap() and munmap() are quite expensive in threaded apps because they
have to take a process-wide lock in the kernel, and that lock also used
to be held during page-fault I/O. I think that's why glibc "caches"
the mmap arenas. This doesn't really matter for OCaml though, as it already has
a process-wide lock for OCaml threads.

>
> (The converse of this is that if your unused memory is swapped out,
> then it has to be written to disk and read back, which is even less
> efficient.)
>
> There is an madvise flag "MADV_DONTNEED" which is better than munmap +
> mmap, although not as optimal as it could be. See links below.

You can also try to map fresh anonymous memory over the already mapped
area, saves an munmap call.

>
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00733.html
> http://www.reddit.com/r/programming/comments/dp5up/implementations_for_many_highlevel_programming/c120n77
>
> Probably the OCaml GC should be setting madvise hints anyway.

It should mmap()/munmap() instead of malloc/realloc/free in that case, right?
Which probably wouldn't be a bad idea, as you don't get the fragmentation issues
as much as you do with malloc.

>
> While we're at it, the GC may be able to cooperate better with the
> new(-ish) Transparent Hugepages feature of Linux.

Does it suffice to allocate the major heap in 2MB increments to take advantage of that?

Best regards,
--Edwin

orb...@ezabel.com

Jan 8, 2012, 5:51:07 PM

to Richard W.M. Jones, david....@ens-lyon.org, Caml List

Thank you for the detailed response Rich.

Isn't the goal of compaction to keep all of these blocks of memory as close as possible? I should have noted the fragmentation of my heap after compaction, but it seems unlikely that my < 1meg of actual data could be fragmented across 400megs worth of chunks.

> Here's the news: the OS doesn't need you to give back
> memory. Because of virtual memory and swap, the OS will quite happily
> take back your memory whenever it wants without asking you.

For most cases this is true. However, in my case (which is not the usual case), my OS has no swap. We actually prefer things to fail rather than be swapped, because we are doing computations that could take months if we get into a swapping situation. I'm no Linux expert, so our solution is to not have swap and to keep our VMs light when it comes to I/O. Perhaps this is a poor solution, but it does change things for us.

Thanks again Rich.

Richard W.M. Jones

Jan 8, 2012, 6:02:59 PM

to orb...@ezabel.com, david....@ens-lyon.org, Caml List

On Sun, Jan 08, 2012 at 05:50:40PM -0500, orb...@ezabel.com wrote:
> Isn't the goal of compaction to keep all of these blocks of memory
> as close as possible? I should have noted the fragmentation of my
> heap after compaction, but it seems unlikely that my < 1meg of actual
> data could be fragmented across 400megs worth of chunks.

I might not have been clear: memory can only be given back to the OS
at the C / malloc allocator level. OCaml compaction has nothing to do
with this because C allocations (and data structures used by malloc
itself) cannot ever be moved.

However you can get a clearer picture if you look at /proc/<pid>/maps
or smaps and also if you have a debugging malloc implementation.

> > Here's the news: the OS doesn't need you to give back
> > memory. Because of virtual memory and swap, the OS will quite happily
> > take back your memory whenever it wants without asking you.
>
> For most cases this is true, however in my case (which is not the
> usual case), my OS has no swap. We actually prefer things to fail
> than to be swapped because we are doing computations that could take months
> if we get into a swapping situation. I'm no linux expert so our
> solution is to not have swap and to keep our VMs light when it comes
> to I/O. Perhaps this is a poor solution but it does change things for
> us.

Sure, this is a perfectly valid case, we have many customers who use
RHEL like this.

orb...@ezabel.com

Jan 8, 2012, 6:26:34 PM

to Richard W.M. Jones, david....@ens-lyon.org, Caml List

In this case, as far as I know, all of the memory I was creating was under OCaml, not a C extension, which is why I would have expected the memory to be given back to malloc, which would then give it back to the OS. I understand that the malloc implementation might decide to retain some data, but an overhead of 400x the amount of active data startled me, and I feel like something else was going on. I used your maps.pl script, and what the runtime seemed to be doing was growing the anonymous mapped region, then moving it into the heap and shrinking the anonymous mapped region. So in my case the heap grew to 1 gig and then went down to 450 megs once the Gc could finally free the data. I don't know what was really going on under the hood, though, and unfortunately I am not sure how to figure it out. Thankfully, at this point it's just a curiosity, not a production problem.

Thank you

Richard W.M. Jones

Jan 9, 2012, 9:32:21 AM

to Török Edwin, caml...@inria.fr

On Mon, Jan 09, 2012 at 12:33:49AM +0200, Török Edwin wrote:
> On 01/08/2012 09:00 PM, Richard W.M. Jones wrote:
> >Probably the OCaml GC should be setting madvise hints anyway.
>
> It should mmap()/munmap() instead of malloc/realloc/free in that
> case, right? Which probably wouldn't be a bad idea, as you don't
> get the fragmentation issues as much as you do with malloc.

Simply ensuring the mallocs are aligned to pages (ie using
posix_memalign) should be sufficient to allow madvise to be used. As
you say it may be better to use mmap for other reasons.

> >While we're at it, the GC may be able to cooperate better with the
> >new(-ish) Transparent Hugepages feature of Linux.
>
> Does it suffice to allocate the major heap in 2MB increments to take advantage of that?

Yes.

Check this file: /sys/kernel/mm/transparent_hugepage/enabled
If it says:

[always] madvise never

then any contiguous anonymous (ie. malloc) virtual memory mapping
which is 2MB or larger and aligned to 2MB is a candidate for being
turned into THPs. It's thus very easy to use and most processes get
it for free.
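
The same check can be done from OCaml. This is a minimal sketch, assuming the sysfs file lists the available modes on one line with the active one in square brackets; selected_mode and read_first_line are names invented for the example.

```ocaml
(* Return the bracketed word of a line such as "always [madvise] never". *)
let selected_mode s =
  match String.index_opt s '[' with
  | None -> None
  | Some i ->
    (match String.index_from_opt s i ']' with
     | None -> None
     | Some j -> Some (String.sub s (i + 1) (j - i - 1)))

(* First line of a file, or None if it is missing, unreadable or empty. *)
let read_first_line path =
  try
    let ic = open_in path in
    let line = try Some (input_line ic) with End_of_file -> None in
    close_in ic;
    line
  with Sys_error _ -> None

let () =
  let path = "/sys/kernel/mm/transparent_hugepage/enabled" in
  match read_first_line path with
  | None -> print_endline "THP setting not readable on this system"
  | Some line ->
    (match selected_mode line with
     | Some mode -> Printf.printf "THP mode: %s\n" mode
     | None -> Printf.printf "unrecognised contents: %s\n" line)
```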

Certain kernel operations cause huge pages to be split. Things like
creating a futex in a page. So you have to be a bit careful. This
talk by my colleague explains THP (in the context of KVM, but applies
to any process):

http://www.linux-kvm.org/wiki/images/9/9e/2010-forum-thp.pdf

Rich.

--
Richard Jones
Red Hat

Richard W.M. Jones

Jan 9, 2012, 4:07:49 PM

to Török Edwin, caml...@inria.fr

While we're on the subject of mmap tricks, here's another one that may
be worth benchmarking. (The trick comes from examining the glibc
sources).

If you mmap a large contiguous area of memory that is more than you
immediately need, mmap it PROT_NONE. The reason is that Linux won't
swap out this memory. When you need to use the memory, you call
mprotect PROT_READ|PROT_WRITE (on the part that you need) and use it
as normal.