Here are the specific conditions that seem to trigger it (it seems to
run fine otherwise):
* Running it as root with the -u option
* We are using nss_db to store user information (passwd, group, shadow)
on this system, although the memcached user happens to be in
/etc/passwd. When I remove /var/db/passwd.db, memcached starts fine.
For nss_db, we have the following lines in /etc/nsswitch.conf:
    passwd: db files
    shadow: db files
    group: db files
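
For context, here is a minimal sketch of the kind of lookup the -u option implies: resolving a user name through NSS with getpwnam(), which consults the "passwd:" sources from /etc/nsswitch.conf in order (so "db" before "files" here). The helper name lookup_uid is hypothetical and this is not memcached's actual code; it only illustrates where nss_db gets involved.

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper (not from memcached): resolve a user name to a uid
 * via NSS. Returns 0 and stores the uid in *out on success, or -1 if the
 * name cannot be resolved. */
int lookup_uid(const char *name, uid_t *out)
{
    assert(name != NULL && out != NULL);

    /* getpwnam() walks the "passwd:" sources in nsswitch.conf order. */
    struct passwd *pw = getpwnam(name);
    if (pw == NULL)
        return -1;

    *out = pw->pw_uid;
    return 0;
}
```

Calling lookup_uid("memcached", &uid) on the affected box would go through /var/db/passwd.db first and fall back to /etc/passwd only if the db source has no entry.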
If it helps any, here are straces when /var/db/passwd.db is there:
http://ricky.fedorapeople.org/memcached/memcached.bad.log
and when it isn't:
http://ricky.fedorapeople.org/memcached/memcached.good.log
I've confirmed that it happens with 1.2.6 and 1.2.5. 1.2.3 worked fine,
though (and I haven't tested with 1.2.4 yet).
All of my testing has been on a RHEL 5.1 box with libevent-1.1a. Any
idea what could be causing this?
Thanks a lot,
Ricky
Otherwise, someone who can reproduce it will have to track that down. A
lot of stuff went into 1.2.5.
On Sat, 2 Aug 2008, Ricky Zhou wrote:
I just tried a git bisect, and got:
    f3e522bcc5a211198f587bb63ce08f310a0b2783 is first bad commit
    commit f3e522bcc5a211198f587bb63ce08f310a0b2783
    Author: dormando <dormando@b0b603af-a30f-0410-a34e-baf09ae79d0b>
    Date: Wed Feb 27 03:37:18 2008 +0000

        Enable UDP by default, clean up server socket code (Brian Aker)

        git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@726 b0b603af-a30f-0410-a34e-baf09ae79d0b

    :100644 100644 b4598b1ccf50715c12a7d5db211fc3337455f541 03f8bb70c72bc502a912d1aa85ccd0444aa07455 M memcached.c
Also, I forgot to mention that my version is compiled with --enable-threads.
Thanks a lot,
Ricky
Edit /etc/nsswitch.conf and replace the lines:
    passwd: files
    shadow: files
    group: files
with
    passwd: db files
    shadow: db files
    group: db files
As root, cd to /var/db and run make.
Then (also as root), run:
    memcached -l 127.0.0.1 -p 11000 -m 64 -u memcached
and this is where it segfaulted for me.
Hope this helps,
Ricky
I think the git-bisect (and all of the other weird conditions causing
the bug) were red herrings. Here's what I think the problem is now:
In thread.c:thread_init, there's this allocation:
    threads = malloc(sizeof(LIBEVENT_THREAD) * nthreads);
setup_thread is then called on each member of threads. In
setup_thread, me->base is initialized with:
    if (! me->base) {
        me->base = event_init();
        if (! me->base) {
            fprintf(stderr, "Can't allocate event base\n");
            exit(1);
        }
    }
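malloc doesn't zero the memory it returns, so me->base can start out as
non-NULL garbage. In that case the if (! me->base) check is skipped,
event_init() never runs, and me->base is left pointing at junk. When I
switched to a calloc, memcached stopped segfaulting (patch at
http://ricky.fedorapeople.org/memcached/memcached-calloc.patch).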
Does this look like the right description of the problem/solution?
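
To make the failure mode concrete, here is a small standalone sketch. It does not use memcached or libevent: struct worker, fake_event_init, setup_worker, skipped_when_dirty and workers_create are all hypothetical stand-ins, and the memset with 0xAA only simulates recycled heap contents (it assumes an all-0xAA pointer compares non-NULL, which holds on mainstream platforms).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for LIBEVENT_THREAD; only the relevant field. */
struct worker {
    void *base;                 /* stand-in for struct event_base * */
};

static int init_calls = 0;      /* how many times the init path ran */
static int dummy_base;          /* something non-NULL to point at */

/* Stand-in for event_init(): counts calls and returns a valid pointer. */
static void *fake_event_init(void)
{
    init_calls++;
    return &dummy_base;
}

/* Same guard shape as the snippet quoted above. */
void setup_worker(struct worker *me)
{
    if (!me->base)
        me->base = fake_event_init();
}

/* Returns how many of n workers the guard skipped when the memory
 * starts out dirty, as recycled malloc memory may. */
size_t skipped_when_dirty(size_t n)
{
    struct worker *w = malloc(n * sizeof(*w));
    if (w == NULL)
        return 0;
    memset(w, 0xAA, n * sizeof(*w));    /* simulate leftover heap garbage */

    int before = init_calls;
    for (size_t i = 0; i < n; i++)
        setup_worker(&w[i]);
    size_t skipped = n - (size_t)(init_calls - before);

    free(w);
    return skipped;
}

/* calloc zero-fills, so every base starts NULL and the guard fires. */
struct worker *workers_create(size_t n)
{
    struct worker *w = calloc(n, sizeof(*w));
    if (w == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        setup_worker(&w[i]);
    return w;
}
```

With dirty memory every worker is skipped by the guard, while the calloc version initializes all of them. That would also fit the bug only showing up under certain conditions: whether malloc hands back zeroed or recycled memory depends on what was allocated and freed earlier in the process.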
Thanks,
Ricky
We should probably be using zeroed-out memory there anyway. Although the
fact that this only comes up in a specific condition under CentOS makes me
want to believe Red Hat's doing something odd.
I've committed this and pushed to stable. Thanks!
-Dormando