Crashes in malloc() and free() on RedHat 9.0

Chris Pritchard

May 7, 2003, 5:40:57 AM
I am experiencing a strange problem once again with RedHat Linux 9.0,
where my programs all segfault, usually in free():

#0 0x4207494e in malloc_consolidate () from /lib/tls/libc.so.6
#1 0x42074838 in _int_free () from /lib/tls/libc.so.6
#2 0x420734d6 in free () from /lib/tls/libc.so.6

If however I include the following function in one of my C files..

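#include <stdio.h>

extern int Bodge(void)
{
    FILE *pFile = fdopen(1, "w+");  /* never executed; only the reference to fdopen() matters */
    return pFile != NULL;
}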

..then the problem no longer occurs. Note that the args to fdopen can
be any old junk, and that the function Bodge() is never called.

My programs do not include calls to malloc() or free() directly, my
memory functions are in a shared library that all my programs link
with.

I had this exact same problem a couple of years ago with just a few of
my programs, and the Bodge() function hid the problem. Now, every
single program needs this function. I assume it is affecting the
linking in some way, but I can't spot anything obvious.

Anyone have any ideas?

Thanks,

Chris

Kevin Easton

May 7, 2003, 7:05:06 AM

You could be corrupting the heap... try running your program under
valgrind.

- Kevin.

Jens.T...@physik.fu-berlin.de

May 7, 2003, 7:10:46 AM
Chris Pritchard <ch...@bakbone.co.uk> wrote:
> I am experiencing a strange problem once again with RedHat Linux 9.0,
> where my programs all segfault, usually in free():

> #0 0x4207494e in malloc_consolidate () from /lib/tls/libc.so.6
> #1 0x42074838 in _int_free () from /lib/tls/libc.so.6
> #2 0x420734d6 in free () from /lib/tls/libc.so.6

> If however I include the following function in one of my C files..

> extern int Bodge(void)
> {
> FILE *pFile = fdopen(1,"w+");
> }

> ..then the problem no longer occurs. Note that the args to fdopen can
> be any old junk, and that the function Bodge() is never called.

> My programs do not include calls to malloc() or free() directly, my
> memory functions are in a shared library that all my programs link
> with.

Usually, this is a sure sign that there's some memory corruption
going on in your program, either in your code or in the library.
It's extremely unlikely that you found a real bug in malloc() or
free(). Adding a function or changing some code sometimes makes
the problem seem to go away by changing the layout of the program
and the places where it stores its data, thus moving the place where
you now get the segfault from to a different position which then
doesn't get corrupted anymore - but the problem is still there...

Regards, Jens
--
_ _____ _____
| ||_ _||_ _| Jens.T...@physik.fu-berlin.de
_ | | | | | |
| |_| | | | | | http://www.physik.fu-berlin.de/~toerring
\___/ens|_|homs|_|oerring

Kasper Dupont

May 7, 2003, 7:33:32 AM
Chris Pritchard wrote:
>
> Anyone have any ideas?

http://www.daimi.au.dk/~kasperd/comp.os.linux.development.faq.html#SIGSEGV

--
Kasper Dupont -- der bruger for meget tid på usenet.
For sending spam use mailto:aaa...@daimi.au.dk
for(_=52;_;(_%5)||(_/=5),(_%5)&&(_-=2))putchar(_);

xrix

May 7, 2003, 1:37:12 PM
Thanks for the responses so far. I tried valgrind over the executable, and
there were no errors. Also, the program works perfectly when under valgrind,
but does not work when run normally.

I do not think I have found a bug in malloc(), but I also do not think that
I am corrupting memory. It seems to be related to the fact that there are no
direct references to standard c library calls from the C file itself, only
indirectly via my shared library.

For example if I make my main function call fopen() to open a non-existing
file, there is no crash. If I move the fopen() into a function in my shared
library, and call that function from main(), there is now a crash in
fopen(). After this, if I put my Bodge() function (still unused) into my
main C file, the crash goes away.
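
The arrangement described above can be sketched roughly as follows. This is only an illustration: the names lib_try_open() and app_entry() are hypothetical, and in the real setup the first function would be built into the shared library while the second would be the whole of the main program.

```c
#include <stdio.h>

/* Hypothetical stand-in for a function exported by the shared library.
 * In the real setup this lives in its own .so, not next to main(). */
int lib_try_open(const char *path)
{
    FILE *fp = fopen(path, "r");   /* the libc call made from inside the library */
    if (fp == NULL)
        return 0;                  /* expected result for a non-existent file */
    fclose(fp);
    return 1;
}

/* Hypothetical stand-in for the main program: it makes no direct libc
 * calls of its own, only a call into the library. */
int app_entry(void)
{
    return lib_try_open("/no/such/file");
}
```

Compiled into a single file like this, the call simply returns 0; the crash reported in the thread only appears once the two halves are split between an executable and a shared library.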

What happens differently when I include this Bodge() function? I have run
ldd over the executables and there seems to be no difference in the
dependencies.

I had the same problem before when RedHat moved from 5.x up to 6.0, which
IIRC was a major change in libc version. Could this be something to do with
the problem this time?

Thanks,

Chris

"Chris Pritchard" <ch...@bakbone.co.uk> wrote in message
news:c137d2d1.03050...@posting.google.com...

Jens.T...@physik.fu-berlin.de

May 7, 2003, 3:08:57 PM
xrix <ch...@vivid.force9.co.uk> wrote:
> Thanks for the responses so far. I tried valgrind over the executable, and
> there were no errors. Also, the program works perfectly when under valgrind,
> but does not work when run normally.

> I do not think I have found a bug in malloc(), but I also do not think that
> I am corrupting memory. It seems to be related to the fact that there are no
> direct references to standard c library calls from the C file itself, only
> indirectly via my shared library.

> For example if I make my main function call fopen() to open a non-existing
> file, there is no crash. If I move the fopen() into a function in my shared
> library, and call that function from main(), there is now a crash in
> fopen(). After this, if I put my Bodge() function (still unused) into my
> main C file, the crash goes away.

> What happens differently when I include this Bodge() function? I have run
> ldd over the executables and there seems to be no difference in the
> dependencies.

I am still convinced you have some memory corruption. You probably are
writing via a stray pointer somewhere into memory. When you add your
Bodge() function you just move some critical place in memory, which you
normally hit with the stray pointer, out of the way, so the segfault
does not happen. If you move your Bodge() function into the shared
library this critical place won't get moved, so the segfault still
happens. I am not familiar enough with exactly how valgrind works to
be able to explain why it doesn't find the problem and even seems to
turn it into a Heisenbug, but there are usually ways in which such
tools can be fooled, and you may have hit one. Have you tried setting
MALLOC_CHECK_ to 2 and/or switching on libc's internal checking by
calling mcheck() at the start of your program (it's described in detail
in the info pages for libc)?
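
As a rough illustration of the general idea behind this kind of checking, here is a minimal sketch in portable C that brackets each block with known guard bytes and tests them later. It is not how glibc implements MALLOC_CHECK_ or mcheck(); the names guarded_alloc(), guarded_ok() and guarded_free() and the guard values are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8      /* hypothetical guard size */
#define GUARD_BYTE 0xA5   /* hypothetical guard pattern */

/* Allocates `size` usable bytes, preceded by the stored size and followed
 * by guard bytes. The returned pointer is only size_t-aligned, which is
 * enough for byte buffers but not for every type. */
void *guarded_alloc(size_t size)
{
    unsigned char *p = malloc(sizeof(size_t) + size + GUARD_LEN);
    if (p == NULL)
        return NULL;
    memcpy(p, &size, sizeof size);                           /* remember the size */
    memset(p + sizeof(size_t) + size, GUARD_BYTE, GUARD_LEN); /* trailing guard */
    return p + sizeof(size_t);
}

/* Returns 1 if the guard bytes after the block are intact, 0 if something
 * has written past the end of the usable area. */
int guarded_ok(const void *user)
{
    const unsigned char *u = user;
    size_t size;
    memcpy(&size, u - sizeof(size_t), sizeof size);
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (u[size + i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Releases a block obtained from guarded_alloc(). */
void guarded_free(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - sizeof(size_t));
}
```

Calling guarded_ok() at a few points in the program narrows down where an overrun happens, instead of waiting for a later free() to trip over the damage.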

Allen McIntosh

May 7, 2003, 4:51:32 PM
In article <Kybua.15949$9C6.8...@wards.force9.net>,

xrix <ch...@vivid.force9.co.uk> wrote:
>Thanks for the responses so far. I tried valgrind over the executable, and
>there were no errors. Also, the program works perfectly when under valgrind,
>but does not work when run normally.

This is evidence you are doing something you shouldn't.
It doesn't look to me like valgrind can tell if you write to allocated
chunk B when you are supposed to write to allocated chunk A.
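
A minimal sketch of that situation, with two adjacent buffers standing in for "chunk A" and "chunk B". The struct and the function name write_from_a() are hypothetical; the point is only that a write with the wrong offset can still land in memory the program legitimately owns, so an addressability check has nothing to object to.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: two adjacent buffers playing the roles of
 * chunk A and chunk B. */
struct chunks {
    char a[8];
    char b[8];
};

/* Writes `value` at byte `offset`, counted from the start of chunk A.
 * An offset of 8..15 is a logic error from the caller's point of view,
 * yet the byte still lands inside the same valid object (in chunk B). */
void write_from_a(struct chunks *c, size_t offset, char value)
{
    unsigned char *bytes = (unsigned char *)c;   /* view the object as raw bytes */
    if (offset < sizeof *c)
        bytes[offset] = (unsigned char)value;
}
```

With an offset of 9 the call silently changes b[1] rather than anything in a, which is the kind of mistake that a tool watching only for invalid addresses will not report.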

>For example if I make my main function call fopen() to open a non-existing
>file, there is no crash.

fopen() likely calls malloc(). This will change the memory allocation
pattern of your program.

>What happens differently when I include this Bodge() function?

It adds a string, changing the pattern of data.

Paul Pluzhnikov

May 8, 2003, 12:05:40 AM
"xrix" <ch...@vivid.force9.co.uk> writes:

> Thanks for the responses so far. I tried valgrind over the executable, and
> there were no errors. Also, the program works perfectly when under valgrind,
> but does not work when run normally.

This *may* be an indication that one of your shared libraries
overrides some libc function(s) it should not. Since valgrind
intercepts calls to malloc() and friends, your 'erroneous' entries
never get called, and it all works.

If your program also works with LD_PRELOAD=/lib/libc.so.6,
my theory would become even more probable.

I would examine output from "nm -D *.so | grep ' [TDWB] '"
and check that *none* of the symbols appear in any of libc.so.6,
libpthread.so.0, /lib/ld-linux.so.2 or /lib/libdl.so.2.
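
The same filter can be sketched in C for anyone who wants to automate the comparison. This assumes the usual "address type name" layout of `nm -D` output lines; the function names and the tiny list of libc names are hypothetical, and a real check would compare against the full symbol lists of the system libraries named above.

```c
#include <stdio.h>
#include <string.h>

/* Parses one line of `nm -D` output. Returns 1 and copies the symbol name
 * into `name` if the line is a defined symbol of type T, D, W or B. */
int exported_definition(const char *line, char name[256])
{
    char type;
    if (sscanf(line, "%*x %c %255s", &type, name) != 2)
        return 0;                    /* undefined symbols carry no address */
    return type != '\0' && strchr("TDWB", type) != NULL;
}

/* Hypothetical, deliberately tiny list of names that belong to libc. */
static const char *const libc_names[] = {
    "malloc", "free", "calloc", "realloc", "fopen"
};

/* Returns 1 if the line defines a symbol whose name is in libc_names. */
int clashes_with_libc(const char *line)
{
    char name[256];
    if (!exported_definition(line, name))
        return 0;
    for (size_t i = 0; i < sizeof libc_names / sizeof libc_names[0]; i++)
        if (strcmp(name, libc_names[i]) == 0)
            return 1;
    return 0;
}
```

Feeding each line of the `nm -D` output for the shared library through clashes_with_libc() would flag any definition that shadows one of the listed names.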

Cheers,
--
In order to understand recursion you must first understand recursion.

Chris Pritchard

May 12, 2003, 7:26:25 AM
I have resolved this issue, but still do not clearly understand it. My main
application did not make any calls to libc, but my shared library did. I
had forgotten to link my shared library against libc, but my application
did link against it. I reduced my application to a single call to a
function in my shared library (so there is no chance of memory corruption
here), and that library function called a libc function (in this case
fopen()). It would seem that the C library functions then corrupted memory
themselves, since they would randomly crash from then on.

By adding -lc to the link line for my shared library there was no further
problem.

What on earth is going on??? The functions in libc were obviously being
resolved or I would have gotten runtime link errors...

Thanks,

Chris

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Paul Pluzhnikov

May 12, 2003, 11:03:12 AM
Chris Pritchard <ch...@bakbone.co.uk> writes:

> I have resolved this issue, but still do not clearly understand it.

[...]


> It would seem that the C library function then corrupted memory
> themselves, since they would randomly crash from then on.

A very unlikely explanation, especially given that your app is
"valgrind-clean".

> By adding -lc to the link line for my shared library there was no further
> problem.

So the issue has just "gone away" by random chance, only to reappear
when the next glibc update comes along.

> What on earth is going on???

What makes you believe it is glibc that corrupts memory, and not
your library?

Have you checked whether your library overrides any of glibc
functions, like I suggested in another followup?

> The functions in libc were obviously being
> resolved or I would have gotten runtime link errors...

The change you made is extremely unlikely to change anything,
except slightly shift memory layout. The bug is still there with
99.99% probability, and will show up again.

If you want to know what the bug is, mail me your shared library,
your test program, and a note written on the back of twenty-one
dollar bill ;-) [A postcard from the UK would also do ;-]

jimmy

May 15, 2003, 10:13:21 AM
Chris Pritchard <ch...@bakbone.co.uk> wrote in message news:<opro14qb...@news.gxn.co.uk>...

> I have resolved this issue, but still do not clearly understand it.


I can only imagine that's because you're stupid, as is shown by your
decision to use a free operating system. That's why you have to post on
news groups, whilst I simply phone for support.


>
> What on earth is going on??? The functions in libc were obviously being
> resolved or I would have gotten runtime link errors...
>

I would help, but no one who punctuates with three question marks
deserves it.

Kevin Easton

May 15, 2003, 11:19:17 AM
jimmy <jconn...@hotmail.com> wrote:
> Chris Pritchard <ch...@bakbone.co.uk> wrote in message news:<opro14qb...@news.gxn.co.uk>...
>> I have resolved this issue, but still do not clearly understand it.

I think I might have worked it out. Just to clarify, the issue is:

* Main program doesn't use libc functions.

* Loadable shared object does use libc functions.

* Former dynamically loads the latter. Program crashes at call into
libc functions.

* If you modify the main program to require a libc symbol, even if it
never calls it, the problem goes away.

If so, I think the problem is that when the main program doesn't use any
libc symbols, the libc link is "optimised-away", and the libc startup
code never gets called. If so, you could probably fix it by including a
line like "extern void *malloc();" in your main code source.
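
A minimal sketch of forcing such a reference, assuming the hypothesis above is right (the thread does not confirm it). A bare declaration on its own normally leaves no trace in the object file, so the sketch stores the address of malloc() in a visible object instead; the variable name keep_libc_reference is made up for the example.

```c
#include <stdlib.h>

/* Storing the address of malloc() in an externally visible object makes the
 * compiler emit a reference to the symbol. malloc() is never called here. */
void *(*const keep_libc_reference)(size_t) = malloc;
```

Bodge() presumably has its effect the same way: its body mentions fdopen(), so the object file refers to that symbol whether or not Bodge() ever runs.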

- Kevin.
