memcached segfault (running as root and using nss_db)

Ricky Zhou

Aug 2, 2008, 1:27:48 AM

to memc...@googlegroups.com

Hi, I'm getting a strange segfault on running
/usr/bin/memcached -l 127.0.0.1 -p 11000 -m 64 -u memcached
gdb output: http://ricky.fedorapeople.org/memcached/gdb

Here are the specific conditions that seem to trigger it (it seems to
run fine otherwise):
* Running it as root with the -u option
* We are using nss_db to store user information (passwd, group, shadow)
on this system, although the memcached user happens to be in
/etc/passwd. When I remove /var/db/passwd.db, memcached starts fine.

For nss_db, we have the following lines in /etc/nsswitch.conf:
passwd: db files
shadow: db files
group: db files

If it helps any, here are straces when /var/db/passwd.db is there:
http://ricky.fedorapeople.org/memcached/memcached.bad.log
and when it isn't:
http://ricky.fedorapeople.org/memcached/memcached.good.log

I've confirmed that it happens with 1.2.6 and 1.2.5. 1.2.3 worked fine,
though (and I haven't tested with 1.2.4 yet).

All of my testing has been on a RHEL 5.1 box with libevent-1.1a. Any
idea what could be causing this?

Thanks a lot,
Ricky

dormando

Aug 2, 2008, 1:45:24 AM

to memc...@googlegroups.com

fwiw, I don't know what's going on here. So don't warnock this guy on my
account :)

Ricky Zhou

Aug 2, 2008, 1:54:18 AM

to memc...@googlegroups.com

On 2008-08-02 01:27:48 AM, Ricky Zhou wrote:
> I've confirmed that it happens with 1.2.6 and 1.2.5. 1.2.3 worked fine,
> though (and I haven't tested with 1.2.4 yet).

Update: 1.2.4 worked fine, so maybe something changed in 1.2.5.

Thanks,
Ricky

dormando

Aug 2, 2008, 2:05:30 AM

to memc...@googlegroups.com

If you're bored and inclined, use `git bisect` to narrow down the exact
patch :)

Otherwise, someone who can reproduce it will have to track that down. A
lot of stuff went into 1.2.5.


Ricky Zhou

Aug 2, 2008, 3:37:48 AM

to memc...@googlegroups.com

(Apologies for replying to the wrong message; I apparently still haven't
received the latest replies to my post.)

I just tried a git bisect, and got:
f3e522bcc5a211198f587bb63ce08f310a0b2783 is first bad commit
commit f3e522bcc5a211198f587bb63ce08f310a0b2783
Author: dormando <dormando@b0b603af-a30f-0410-a34e-baf09ae79d0b>
Date:   Wed Feb 27 03:37:18 2008 +0000

    Enable UDP by default, clean up server socket code (Brian Aker)

    git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@726 b0b603af-a30f-0410-a34e-baf09ae79d0b

:100644 100644 b4598b1ccf50715c12a7d5db211fc3337455f541 03f8bb70c72bc502a912d1aa85ccd0444aa07455 M memcached.c

Also, I forgot to mention that my version is compiled with --enable-threads.

Thanks a lot,
Ricky

Ricky Zhou

Aug 2, 2008, 5:49:16 PM

to memc...@googlegroups.com

On 2008-08-01 11:05:30 PM, dormando wrote:
> Otherwise, someone who can reproduce it will have to track that down. A
> lot of stuff went into 1.2.5

By the way, here are instructions for reproducing it on a RHEL or Fedora
machine:

Edit /etc/nsswitch.conf, replace the lines:

passwd: files
shadow: files
group: files

with

passwd: db files
shadow: db files
group: db files

As root, cd /var/db and run make.

Then (also as root), run:

memcached -l 127.0.0.1 -p 11000 -m 64 -u memcached

and this is where it segfaulted for me.

Hope this helps,
Ricky

Ricky Zhou

Aug 2, 2008, 7:11:24 PM

to memc...@googlegroups.com

On 2008-08-02 03:37:48 AM, Ricky Zhou wrote:
> I just tried a git bisect, and got:
> f3e522bcc5a211198f587bb63ce08f310a0b2783 is first bad commit

Disclaimer: I don't have any C experience

I think the git-bisect (and all of the other weird conditions causing
the bug) were red herrings. Here's what I think the problem is now:

In thread.c:thread_init, there's a:

threads = malloc(sizeof(LIBEVENT_THREAD) * nthreads);

setup_thread is then called on each member of threads. In
setup_thread, me->base is initialized with:

if (! me->base) {
    me->base = event_init();
    if (! me->base) {
        fprintf(stderr, "Can't allocate event base\n");
        exit(1);
    }
}

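So me->base probably didn't get initialized properly: malloc doesn't
zero the memory it returns, so me->base can start out as non-NULL
garbage, the (! me->base) check is skipped, and event_init() never runs.
When I switched to a calloc, memcached stopped segfaulting (patch at
http://ricky.fedorapeople.org/memcached/memcached-calloc.patch).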
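
To illustrate the malloc vs. calloc difference, here is a minimal sketch. It is not the memcached code: fake_thread, dummy_event_base, setup_thread_sketch and count_initialized are made-up names, and the 0xAB fill only simulates leftover heap contents, since real malloc garbage is unpredictable. It also assumes a typical platform where a NULL pointer is all zero bits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for LIBEVENT_THREAD; only the base field is modeled. */
typedef struct {
    void *base;
} fake_thread;

/* Stands in for the pointer event_init() would return. */
static int dummy_event_base;

/* Same guard as the snippet above: initialize only if base is NULL.
 * Returns 1 if initialization ran, 0 if it was skipped. */
static int setup_thread_sketch(fake_thread *me) {
    if (!me->base) {
        me->base = &dummy_event_base;
        return 1;
    }
    return 0;
}

/* Allocates nthreads entries and returns how many got initialized.
 * use_calloc == 0: malloc, then fill with 0xAB to simulate dirty heap memory.
 * use_calloc != 0: calloc, which returns zero-filled memory. */
static int count_initialized(size_t nthreads, int use_calloc) {
    fake_thread *threads;
    int initialized = 0;
    size_t i;

    if (use_calloc) {
        threads = calloc(nthreads, sizeof(fake_thread));
    } else {
        threads = malloc(sizeof(fake_thread) * nthreads);
        if (threads != NULL)
            memset(threads, 0xAB, sizeof(fake_thread) * nthreads);
    }
    assert(threads != NULL);

    for (i = 0; i < nthreads; i++)
        initialized += setup_thread_sketch(&threads[i]);

    free(threads);
    return initialized;
}
```

With the simulated dirty memory, count_initialized(4, 0) initializes none of the four entries because every guard is skipped, while count_initialized(4, 1) initializes all four.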

Does this look like the right description of the problem/solution?

Thanks,
Ricky

dormando

Sep 1, 2008, 1:47:18 AM

to memc...@googlegroups.com

Hey,

We should probably be using zeroed out memory there anyway. Although the
fact that this only comes up in a specific condition under centos makes me
want to believe redhat's doing something retarded.

I've committed this and pushed to stable. Thanks!

-Dormando
