I sometimes observe broken smtpd.
ar% ps
...
none 1948578 0:01 44:55 264K Broken smtpd
none 2521172 0:01 44:58 264K Broken smtpd
none 3190037 0:01 45:03 264K Broken smtpd
...
ar% ACID 3190037
/proc/3190037/text:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5
syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/
syslog.c:46
err=0x616d2f27
d=0x0
ctim=0x0
buf=0x0
p=0x0
arg=0x0
n=0x0
catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68
rv=0x1
notifier+0x30 /sys/src/libc/port/atnotify.c:15
acid:
ar% pr -tn /sys/src/libc/9sys/syslog.c
...
34 void
35 syslog(int cons, char *logname, char *fmt, ...)
36 {
37 char buf[1024];
38 char *ctim, *p;
39 va_list arg;
40 int n;
41 Dir *d;
42 char err[ERRMAX];
43
44 err[0] = ' 0';
45 errstr(err, sizeof err);
46 lock(&sl);
47
...
Is this phenomenon specific to me?
Kenji Arisawa
No, it isn't:
term% ps|grep Broken
none 917 2:08 42:50 264K Broken smtpd
impressive!
i don't see this problem, but i have implemented
some fairly agressive spam control. perhaps this
shows up after i tend to hang up.
(it is possible to use nupas smtpd with regular old
upas.)
would anyone mind sending me a full backtrace?
- erik
Today, I observed smtpd that was in endless loop and then broken.
ar% ps
...
none 20392 0:01 28:27 264K Running smtpd
...
ar% ACID 20392
/proc/20392/text:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
receiver(path=0x41f58)+0x1c5 /sys/src/cmd/upas/smtp/smtpd.c:625
sender=0x379bf
rcpt=0x41ad8
zzparse()+0x2c9 /sys/src/cmd/upas/smtp/smtpd.y:773
save1=0x0
save2=0x0
save3=0x0
save4=0x0
zzstate=0xf
zzchar=0xffffffff
zzs=0x3f36b
zzp=0xdfffe41c
zzn=0xfffffc18
main(argv=0xdfffefa4,argc=0x0)+0x261 /sys/src/cmd/upas/smtp/smtpd.c:194
netdir=0xdfffefbc
_argc=0x6e
_args=0x376c2
buf=0x0
_main+0x31 /sys/src/libc/386/main9.s:16
acid: src(receiver+0x1c5)
/sys/src/cmd/upas/smtp/smtpd.c:625
620
621 logged = 0;
622
623 /* forwarding() can modify 'path' on loopback request */
624 if(filterstate == ACCEPT && fflag && !authenticated &&
forwarding(path)) {
>625 syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
626 s_to_c(senders.last->p), him, nci->rsys, s_to_c(path));
627 rejectcount++;
628 reply("550 5.7.1 we don't relay. send to your-path@[] for "
629 "loopback.\r\n");
630 return;
I don't observe "Bad Forward" of this mail in /sys/log/smtpd.
After some amount of time,
ar% ps
...
none 20392 0:02 44:57 264K Broken smtpd
...
acid: lstk()
errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5
syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/
syslog.c:46
err=0x616d2f27
d=0x0
ctim=0x0
buf=0x0
p=0x0
arg=0x0
n=0x0
catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68
rv=0x1
notifier+0x30 /sys/src/libc/port/atnotify.c:15
acid: src(syslog+0x29)
/sys/src/libc/9sys/syslog.c:46
41 Dir *d;
42 char err[ERRMAX];
43
44 err[0] = '\0';
45 errstr(err, sizeof err);
>46 lock(&sl);
47
48 /*
49 * paranoia makes us stat to make sure a fork+close
50 * hasn't broken our fd's
51 */
acid:
I hope the above information is helpful to fix the problem.
Kenji Arisawa
[...] A handler must
return a non-zero number if the note was recognized (and
resolved); otherwise it must return zero. When the system
i think you're getting into some sort of note loop. i think it would
be helpful to log all notes. i would try something like this
static int
catchalarm(void *, char *msg)
{
static int chattycathy;
if(chattycathy++ > 5)
return 0;
if(senders.first && rcvers.first)
syslog(0, "smtpd", "note: %s->%s: %s",
s_to_c(senders.first->p),
s_to_c(rcvers.first->p), msg);
else
syslog(0, "smtpd", "note: %s", msg);
if(pp){
syskillpg(pp->pid);
pp = 0;
}
return strstr(msg, "alarm") != 0;
}
but at a minimum, i would reverse the return values.
- erik
if the note interrupts the program while in syslog() or the allocator,
it will deadlock.
--
cinap
excellent point. in this case, i think it is more likely that the problem
is more mundane than that. in any event, we are chewing gobs
of cpu time, so we don't have a literal deadlock.
- erik
i didn't really understand why it was trying to continue after the alarm,
rather than logging and quitting. does it not always mean that a caller
has connected but hasn't completed a request after a long time?
as cinap noted(!) there's a separate problem of what happens if the trap or alarm
happens when locks are held, but i think that's less likely to be a problem in practice:
an alarm will probably go off when nothing much has happened (ie, the program isn't
holding any other lock but waiting for input). it's harder to be sure about
bad pointer traps, but if those happen with locks held,
i'd suspect those are more likely to be locks unrelated to those needed by syslog and print.
I added some debugging codes to catchalarm() in smtpd.c
static int
catchalarm(void *a, char *msg)
{
int rv = 1;
static int count = 0; // DEBUG by Kenar
count++;
if(count > 3)
return 0;
USED(a);
/* log alarms but continue */
if(strstr(msg, "alarm")){
if(senders.first && rcvers.first)
syslog(0, "smtpd", "note: %s->%s: %s",
s_to_c(senders.first->p),
s_to_c(rcvers.first->p), msg);
else
syslog(0, "smtpd", "note: %s", msg);
rv = 0;
}
syslog(0, "smtpd", "note: %s", msg); // DEBUG by Kenar
/* kill the children if there are any */
if(pp)
syskillpg(pp->pid);
return rv;
}
after that I found broken smtpd:
none 50392 0:00 0:00 264K Broken smtpd
note that wasted cpu time is now 0:00
/sys/log/smtpd shows:
ar Nov 19 02:23:05 ehlo from 124.8.67.36 as sotcndhz.com
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
Kenji Arisawa
acid: src(0x7463)
/sys/src/cmd/upas/smtp/smtpd.c:628
623 logged = 0;
624
625 /* forwarding() can modify 'path' on loopback request */
626 if(filterstate == ACCEPT && fflag && !authenticated && forwarding(path)) {
627 syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
>628 s_to_c(senders.last->p), him, nci->rsys, s_to_c(path));
629 rejectcount++;
630 reply("550 5.7.1 we don't relay. send to your-path@[] for "
631 "loopback.\r\n");
632 return;
633 }
replace lines 627,628 with
syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
sender, him, nci->rsys, rcpt);
i made this change some time ago and it appears to be working fine.
i may have seen a similar fault.
- erik