
malloc w/ mmap() in multithreaded code


ldb

Feb 5, 2007, 4:54:38 PM
I have a long-running program that eventually crashes when valloc()
returns 0. The program is non-trivial: it's written in
Ada, is multithreaded, and has a lot of SSE routines. A memory leak would
be the most obvious cause, but this appears to be more sinister than a
simple memory leak.

So... after a lot of running around and searching through the code I
found an anomaly that I'd like to explain, and to understand whether
it's the cause of valloc() returning 0. It may be unrelated to my
problem above, but I can't be sure. I've recreated this anomaly in a
very simple program.

Basically, mallinfo() seems to produce garbage results in multithreaded
code. In a very simple program where I fire up 2 pthreads and
have them malloc() and free() a bunch of stuff, once all the threads are
finished I print out malloc_stats() and mallinfo(), and I seem to get
garbage for the mmap()-related fields.

Most of the time I run the code, the hblks and hblkhd fields of
mallinfo() come back 0 and 0, but a fair percentage of the time I get
a strange answer where hblks is 2, 5, -3, -1 or something
like that. It's almost like there's a race condition inside the
malloc()/free() code that updates these fields.

This is out-of-the-box Ubuntu with gcc 4.1.2 and libc6 2.4

I've included the code at the bottom, but here is an example output:
Arena 0:
system bytes = 135168
in use bytes = 288
Arena 1:
system bytes = 135168
in use bytes = 1128
Total (incl. mmap):
system bytes = 4045234176
in use bytes = 4044965256
max mmap regions = 1
max mmap bytes = 250003456

hblks : -1 hblkshd : -250003456

The mmap() and hblk (from mallinfo()) data seems to be totally
corrupted, to me. (In this particular case, they've gone negative). In
this code, the "answer" should be 0 since everything has been freed,
should it not? Are these numbers supposed to be meaningful?

I've looked through the malloc code and it appears that n_mmaps of the
malloc state isn't protected by a mutex? Am I doing something wrong?

(This code has a pretty large malloc, but I get similar results with
more reasonably sized mallocs like 10 megs, enough to trip the mmap()
threshold.)
---------------------------------
Built with:
gcc main.c -lpthread

Here is the code I'm running:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <malloc.h>

void *detection_thread(void *arg);

int main(int argc, char *argv[])
{
    pthread_t thread1, thread2;
    struct mallinfo mi;

    // spawn threads
    pthread_create(&thread1, NULL, detection_thread, NULL);
    pthread_create(&thread2, NULL, detection_thread, NULL);

    // wait for threads to return
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);

    printf("********************************\n");
    malloc_stats();
    mi = mallinfo();
    printf("hblks : %d hblkshd : %d\n", mi.hblks, mi.hblkhd);

    return 0;
}

// pthread start routines must take and return void *
void *detection_thread(void *arg)
{
    int *slappy;
    int i;

    for (i = 0; i < 5000; i++)
    {
        // 250 MB per request, far above the mmap() threshold
        slappy = malloc(1000*1000*250);

        if (slappy == NULL)
        {
            printf("CRASH\n");
            exit(1);
        }
        free(slappy);
    }

    printf("Done!\n");
    return NULL;
}

Kaz Kylheku

Feb 6, 2007, 1:23:02 AM
On Feb 5, 1:54 pm, "ldb" <ldb_nos...@hotmail.com> wrote:
> Built with:
> gcc main.c -lpthread

This isn't right. It might be worthwhile to repeat the test after
rebuilding with:

gcc -pthread main.c


Wolfram Gloger

Feb 6, 2007, 6:08:24 AM
"ldb" <ldb_n...@hotmail.com> writes:

> I have a long-running program that eventually crashes when valloc()
> returns 0. The program is non-trivial: it's written in
> Ada, is multithreaded, and has a lot of SSE routines. A memory leak would
> be the most obvious cause, but this appears to be more sinister than a
> simple memory leak.

When valloc() returns 0, you are out of memory, that is for sure.
Causes can include:

- hitting a ulimit (ulimit -v)
- running out of address space (32-bit platform, in particular with many
  threads, as their stacks take up lots of space) -> try to obtain
  /proc/self/maps when valloc() returns 0 to make sure
- fragmentation
- memory leak

> So... after a lot of running around and searching through the code I
> found an anomaly that I'd like to explain, and to understand whether
> it's the cause of valloc() returning 0. It may be unrelated to my
> problem above, but I can't be sure.

I'd say this is very probably unrelated.

> Basically, mallinfo() seems to produce garbage results in multi-

...


> hblks : -1 hblkshd : -250003456
>
> The mmap() and hblk (from mallinfo()) data seems to be totally
> corrupted, to me. (In this particular case, they've gone negative). In
> this code, the "answer" should be 0 since everything has been freed,
> should it not? Are these numbers supposed to be meaningful?
>
> I've looked through the malloc code and it appears that n_mmaps of the
> malloc state isn't protected by a mutex? Am I doing something wrong?

No, those particular statistics are simply unreliable with multiple
threads. It would be far too expensive to add a mutex just for this
number.

Regards,
Wolfram.

Robert Redelmeier

Feb 6, 2007, 8:27:01 AM
Wolfram Gloger <wm...@dent.med.uni-muenchen.de> wrote in part:

> "ldb" <ldb_n...@hotmail.com> writes:
>
>> I have a long-running program that eventually crashes when valloc()
>> returns 0. The program is non-trivial: it's written in
>> Ada, is multithreaded, and has a lot of SSE routines. A memory leak would
>> be the most obvious cause, but this appears to be more sinister than a
>> simple memory leak.
>
> When valloc() returns 0, you are out of memory, that is for sure.
> Causes can include:
>
> - hitting a ulimit (ulimit -v)
> - running out of address space (32-bit platform, in particular with
> many threads as their stacks take up lots of space) -> try to obtain
> /proc/self/maps when valloc() returns 0 to make sure
> - fragmentation
> - memory leak


Agreed. In addition, any pgm that expects and needs to be
long running (not always the same as running a long time)
should use "checkpointing" techniques (write intermediates
to files) and have restart capabilities.

Mem mgmt has to be especially tight, and OOM handled.
These pgms can start to resemble an OS.

-- Robert

phil-new...@ipal.net

Feb 6, 2007, 12:05:14 PM

Sounds like Firefox. I run one instance, which is its normal mode
of operating (try to start a new one, and it hands the window you
open over to the existing process, so you really still have just one
instance running). As you have many windows and tabs open over hours
and days of usage, you (certainly, I) do not want to quit and restart,
as that means losing lots of "where was I on that" state.

Yet Firefox just isn't anywhere near the quality of Linux with regard
to things like memory management. That's not to say I would expect to
have such a program reach that level. But they could do better. The
point here, though, is to agree with you that quite many userland programs
do need to be considered much more than a simple processing element or
even more than just a simple app (like editing a file). That or else
they need to be architected differently (like make Firefox really deal with
multiple instances so each doesn't have to step around all the memory
fragments allocated to windows in other instances).

Contrast that with short-lived programs like web CGI (which I write in
the C language). In these cases, I know the program is very short lived
and I don't even bother calling free() and make more use of alloca()
instead of malloc(), if I even allocate anything at all. Very very soon
return will invoke exit() which invokes _exit() which calls the kernel
exit() syscall which cleans up virtual memory and makes all those unfreed
allocations moot.

--
|---------------------------------------/----------------------------------|
| Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
| first name lower case at ipal.net / spamtrap-200...@ipal.net |
|------------------------------------/-------------------------------------|

Robert Redelmeier

Feb 6, 2007, 5:50:11 PM
phil-new...@ipal.net wrote in part:

> Yet Firefox just isn't anywhere near the quality of Linux with
> regard to things like memory management. That's not to say I
> would expect to have such a program reach that level. But they
> could do better. The point here, though, is to agree with you
> that quite man[y] userland programs do need to be considered much
> more than a simple processing element or even more than just
> a simple app (like editing a file). That or else they need
> to be architected differently (like make Firefox really deal with
> multiple instances so each doesn't have to step around all the
> memory fragments allocated to windows in other instances).

I close FF frequently, so don't have as much trouble.
But then, I pay for it in startups. Still, FF ought to be
tighter. It might well be the single app on a kiosk machine.

> Contrast that with short-lived programs like web CGI (which I
> write in the C language). In these cases, I know the program
> is very short lived and I don't even bother calling free() and
> make more use of alloca() instead of malloc(), if I even allocate
> anything at all. Very very soon return will invoke exit() which
> invokes _exit() which calls the kernel exit() syscall which cleans
> up virtual memory and makes all those unfreed allocations moot.

My first instinct is caution: you do not know how your pgms
might be used [adapted] by others. OTOH, by making your
code egregiously sloppy [no offense], misuse will quickly fail.
90% fixes are worse than no fix at all when 99.9+% is required.

-- Robert

phil-new...@ipal.net

Feb 6, 2007, 8:58:07 PM
On Tue, 06 Feb 2007 22:50:11 GMT Robert Redelmeier <red...@ev1.net.invalid> wrote:

| phil-new...@ipal.net wrote in part:
|> Yet Firefox just isn't anywhere near the quality of Linux with
|> regard to things like memory management. That's not to say I
|> would expect to have such a program reach that level. But they
|> could do better. The point here, though, is to agree with you
|> that quite man[y] userland programs do need to be considered much
|> more than a simple processing element or even more than just
|> a simple app (like editing a file). That or else they need
|> to be architected differently (like make Firefox really deal with
|> multiple instances so each doesn't have to step around all the
|> memory fragments allocated to windows in other instances).
|
| I close FF frequently, so don't have as much trouble.
| But then, I pay for it in startups. Still, FF ought to be
| tighter. It might well be the single app on a kiosk machine.

I actually tricked it into letting me start multiple instances.
So I can quit each instance when done, and the others stay. And
each doesn't flood the others' data structures, either.


|> Contrast that with short-lived programs like web CGI (which I
|> write in the C language). In these cases, I know the program
|> is very short lived and I don't even bother calling free() and
|> make more use of alloca() instead of malloc(), if I even allocate
|> anything at all. Very very soon return will invoke exit() which
|> invokes _exit() which calls the kernel exit() syscall which cleans
|> up virtual memory and makes all those unfreed allocations moot.
|
| My first instinct is caution: you do not know how your pgms
| might be used [adapted] by others. OTOH, by making your
| code egregiously sloppy [no offense], misuse will quickly fail.
| 90% fixes are worse than no fix at all when 99.9+% is required.

That's a valid concern. Fortunately these have little or no general
public use, or are just too obviously short lived (check referrer
before delivering an image, or deliver a bogus image if the referrer
is not the correct domain).

techs...@gmail.com

Feb 11, 2007, 9:16:47 AM
I am also facing serious problems with Firefox, especially when using it
with Flash Player, but I just can't stop using it.
Is there any other browser that can match it?

As for a workaround: I use FF 2.0, and every time my system
hangs, I log into some other machine and telnet to my machine just to
kill FF. Once I do this, my system returns to normal, and when I
open FF again, it asks me whether to restore the old session. If I
choose to, FF restores every tab that was open, with the same page and
offset. I think this is a better solution because we do not want to
lose the open tabs very often.

On Feb 7, 3:50 am, Robert Redelmeier <red...@ev1.net.invalid> wrote:
> phil-news-nos...@ipal.net wrote in part:
