I just committed an interesting change to Redis: it can now print a
useful stack trace on SIGSEGV/SIGBUS, though only under Linux and
Mac OS X.
This will help a lot in tracking down and fixing bugs in the future,
and in the long term it should improve the stability of the project.
The following is a short story of this implementation. A few days ago
I tried to implement this using the backtrace() call available on both
glibc and *BSD systems. I removed that implementation because it was
badly broken: no symbols for functions declared static, no trace for
the last stack frame since the signal handler overwrites it, and so
on. It was not worth the code at all.
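For reference, the backtrace() approach looks roughly like the sketch
below. This is not the code that was removed from Redis, just a minimal
illustration; dump_stack() and MAX_FRAMES are names made up for the
example.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd() */
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_FRAMES 100  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: dump the current call stack to stderr and
 * return the number of frames captured. */
int dump_stack(void) {
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Writes one line per frame directly to the file descriptor,
     * so no buffer has to be allocated for the symbol strings. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Called from inside a signal handler, this is where the limits described
above show up: static functions come out without a name, and the frame
that faulted is not in the list.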
Then Diego Rosario Brogna sent me a patch that fixed at least one of
the problems: it contained the low level code to access the
uc_mcontext structure and get the EIP (instruction pointer) register,
in order to have a full stack trace including the latest function
called (*the function* that caused the SIGSEGV, so pretty important to
know ;)
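To give an idea of what that low level code involves, here is a minimal
sketch, not Diego's actual patch. It assumes a handler installed with
SA_SIGINFO, whose third argument points to a ucontext_t, and it only
covers x86 and x86-64 on Linux and Mac OS X (on 64 bit the register is
RIP instead of EIP); everywhere else it returns NULL. The names
context_pc(), on_signal(), install_pc_handler(), last_pc and
handler_ran are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes REG_EIP / REG_RIP on glibc */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#if defined(__linux__)
#include <ucontext.h>
#endif

/* Hypothetical helper: return the program counter saved in the signal
 * context, or NULL on platforms this sketch does not cover. */
static void *context_pc(ucontext_t *uc) {
#if defined(__APPLE__) && defined(__i386__)
    return (void *)(uintptr_t)uc->uc_mcontext->__ss.__eip;
#elif defined(__APPLE__) && defined(__x86_64__)
    return (void *)(uintptr_t)uc->uc_mcontext->__ss.__rip;
#elif defined(__linux__) && defined(__i386__) && defined(REG_EIP)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__linux__) && defined(__x86_64__) && defined(REG_RIP)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#else
    (void)uc;
    return NULL;
#endif
}

void *volatile last_pc;              /* written by the handler */
volatile sig_atomic_t handler_ran;   /* set to 1 once the handler ran */

/* Three-argument handler: `secret` is the ucontext_t of the
 * interrupted code. */
static void on_signal(int sig, siginfo_t *info, void *secret) {
    (void)sig;
    (void)info;
    last_pc = context_pc((ucontext_t *)secret);
    handler_ran = 1;
}

/* Install the handler for `sig`; returns 0 on success, -1 on error. */
int install_pc_handler(int sig) {
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;       /* request the three-argument form */
    act.sa_sigaction = on_signal;
    return sigaction(sig, &act, NULL);
}
```

The sketch only records the address. A real crash handler would log it
and then terminate the process, since returning from a SIGSEGV handler
just runs the faulting instruction again.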
This was a major step forward, and I suggested that Diego implement
his own symbol table for static functions. This turned out to work
well, and Diego sent me an implementation that I tried to enhance a bit
by adding the #ifdefs needed to make it available for Mac OS X too.
Ok, too many words and too few examples: in the following example I'll
use the new "DEBUG" command to write to an invalid address. "DEBUG
SEGFAULT" does just this. This is what I see in the backtrace dumped
by Redis in the log file:
07 Jun 18:21:57 * ======= Ooops! Redis 0.101 got signal: -11- =======
07 Jun 18:21:57 * redis_version:0.101; uptime_in_seconds:4;
connected_clients:1; connected_slaves:0; used_memory:3168;
changes_since_last_save:0; bgsave_in_progress:0;
last_save_time:1244398913; total_connections_received:1;
total_commands_processed:0; role:master;
07 Jun 18:21:57 * 1 redis-server 0xb696 debugCommand + 54
07 Jun 18:21:57 * 2 ??? 0xffffffff 0x0 + 4294967295
07 Jun 18:21:57 * 3 redis-server 0x62f2 processCommand + 482
07 Jun 18:21:57 * 4 redis-server 0x6610 readQueryFromClient + 320
07 Jun 18:21:57 * 5 redis-server 0x00002f08 aeProcessEvents + 648
07 Jun 18:21:57 * 6 redis-server 0x000032f0 aeMain + 48
07 Jun 18:21:57 * 7 redis-server 0x0000ccf4 main + 484
07 Jun 18:21:57 * 8 redis-server 0x00002436 start + 54
That's pretty cool, as you can see! We know that the invalid access
happened at debugCommand + 54, and we can do a lot with this
information:
% gdb ./redis-server /Users/antirez/hack/redis
GNU gdb 6.3.50-20050815 (Apple version gdb-962) (Sat Jul 26 08:14:40 UTC 2008)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries ... done
(gdb) list *debugCommand+54
0xb696 is in debugCommand (redis.c:4090).
4085
4086 /* ================================= Debugging ============================== */
4087
4088 static void debugCommand(redisClient *c) {
4089     if (!strcasecmp(c->argv[1]->ptr,"segfault")) {
4090         *((char*)-1) = 'x';
4091     } else if (!strcasecmp(c->argv[1]->ptr,"object") && c->argc == 3) {
4092         dictEntry *de = dictFind(c->db->dict,c->argv[2]);
4093         robj *key, *val;
4094
As you can see, the bad instruction is right there: "*((char*)-1) = 'x';".
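The "name + offset" entries in the trace come from a symbol table
lookup. A rough sketch of the idea follows; it is not the code in the
patch, and struct sym, sym_lookup() and sym_format() are hypothetical
names. It assumes a table of function start addresses and picks the
entry with the highest start address that is not past the address being
resolved.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical entry: a function name paired with its start address. */
struct sym {
    const char *name;
    uintptr_t addr;
};

/* Return the entry with the highest start address that is still <= pc,
 * or NULL if pc lies below every entry in the table. */
const struct sym *sym_lookup(const struct sym *tab, size_t n, uintptr_t pc) {
    const struct sym *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].addr <= pc && (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    }
    return best;
}

/* Write "name + offset" into buf, or "???" when nothing matches.
 * Returns what snprintf() returns. */
int sym_format(const struct sym *tab, size_t n, uintptr_t pc,
               char *buf, size_t len) {
    const struct sym *s = sym_lookup(tab, n, pc);
    if (s == NULL)
        return snprintf(buf, len, "???");
    return snprintf(buf, len, "%s + %lu", s->name,
                    (unsigned long)(pc - s->addr));
}
```

With the numbers from the trace, debugCommand would start at 0xb660
(0xb696 minus 54), so resolving 0xb696 against a table holding that
entry gives "debugCommand + 54". Note that this simple version has no
notion of where a function ends: any address past the last known
function is attributed to it.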
Hopefully these changes will not create problems on other systems, but
it's better to check if you can. AFAIK Redis will now not compile on
Mac OS X < 10.5.x. The fix is trivial and I'll apply it once somebody
with Mac OS X 10.4.x complains and has time to test it.
Cheers,
Salvatore
--
Salvatore 'antirez' Sanfilippo
http://invece.org
"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay