
Long running Bourne Shell scripts


Mark A Gebert

Nov 17, 2000, 3:00:00 AM
We have a long-running shell script that is started out of the inittab and does
some monitoring on some of our systems; if it detects a problem, it sends email
out. After about a week, the program detects the problems but does not send out
email. We've tried several things but no dice. This is under Solaris 2.6.

Thanks in advance.

--geeb

-------------------------------------------------------------
Mark A. Gebert Email: ge...@merit.edu
Senior Research Programmer Voice:+1 734 936 2655
Merit Network, Inc Fax: +1 734 647 3185
4251 Plymouth Rd, Suite C, Ann Arbor, MI 48105-2785
-------------------------------------------------------------
I'd never thought I'd say this... But can I go work now?

% cat std.disclaimers


Mark A Gebert

Nov 17, 2000, 3:00:00 AM
The question is: does anyone have a clue why this happens?

--geeb

--

Rick Carter

Nov 17, 2000, 3:00:00 AM
Flying blind here, but whenever I hear "long-running" and "doesn't work"
in the same sentence, I start wondering if there's a Kerberos ticket
expiration involved somewhere.

- Rick

-------
Rick Carter, System Administrator, Physics Dept., University of Michigan
Rick....@umich.edu Voice: (734) 764-3348 FAX: (734) 763-9694
For Physics computer support, please use Physi...@umich.edu
"As mommy used to say, 'be nice to the man with the death-beam.'" - Kriegman


Mark A Gebert

Nov 17, 2000, 3:00:00 AM
Here is the script (remember it does work perfectly for the first week).

--geeb


#!/bin/sh

#
# daemoncheck - control process for running various system and daemon checks
#
# Written By: Mark A Gebert
# Date: July, 1999
#
# Thanks to Mark Giuffrida (ma...@umich.edu) for the basic script
#

umask 077

BINDIR="/usr/private/dc"
LOGFILE="/var/adm/daemonrestart.log"
HOST=`hostname | cut -d. -f1`
DOMAIN=merit.edu
MONITOREDEMAIL="n...@noc.ns.itd.umich.edu"

echo $$ > /etc/daemoncheck.pid

/usr/private/dc/bootcheck

cd ${BINDIR}

while true
do

    #
    # Setup Daemoncheck
    #
    DCSLEEPTIME=900
    . /etc/hostconfig
    export STARTX SENDMAIL
    sleep ${DCSLEEPTIME}

    mv ${LOGFILE} ${LOGFILE}.old
    echo "daemonmonitor starting loop at `date`" > ${LOGFILE} 2>&1
    startsize=`/bin/ls -l ${LOGFILE} | awk '{print $5}'`

    #
    # Perform system checks
    #
    SYSCHECKS="*.sys"
    for SYSSCRIPT in $SYSCHECKS
    do
        if [ -s ${BINDIR}/${SYSSCRIPT} ]; then
            ( ${BINDIR}/${SYSSCRIPT} >> ${LOGFILE} 2>&1 )
        fi
    done

    #
    # Perform daemon checks
    #
    DAEMONCHECKS="*.daemon"
    for DAEMONSCRIPT in $DAEMONCHECKS
    do
        if [ -s ${BINDIR}/${DAEMONSCRIPT} ]; then
            ( ${BINDIR}/${DAEMONSCRIPT} >> ${LOGFILE} 2>&1 )
        fi
    done

    endsize=`/bin/ls -l ${LOGFILE} | awk '{print $5}'`

    #
    # Do we need to notify people?
    #
    if [ "$startsize" != "$endsize" ]; then
        to=root+$HOST@$DOMAIN
        if [ "${MONITORED:=-NO-}" = "-YES-" ]; then
            to="$to, $MONITOREDEMAIL"
        fi

        # The here-document body and its EOF terminator must stay in
        # column 0: a leading blank would turn "To:" into a header
        # continuation line, and "<<" (unlike "<<-") does not strip tabs.
        /usr/lib/sendmail -t << EOF
To: $to
Subject: Daemoncheck alerts on `/bin/hostname`

`cat ${LOGFILE}`
EOF

    fi
    echo "daemonmonitor ending loop at `date`" >> ${LOGFILE} 2>&1
done



At 15:21 -0500 17 November 2000, Jason Presnell <presnell> wrote:

> On Fri, 17 Nov 2000, Mark A Gebert wrote:
>
> > The question is does anyone have a clue why this happens?
> >
>

> It would help if we could all see the script in question that is not
> working (remove anything that is private if you wish).
>
> -j

--

Jason Presnell

Nov 17, 2000, 3:00:00 AM

John M. Lockard

Nov 17, 2000, 3:00:00 AM
Nada, none, nyet, nicht...

It's an irritating problem that seems to happen no matter
how the process is started. I'm assuming that it's the
dc (DaemonCheck) process that you're talking about...

On Fri, Nov 17, 2000 at 04:55:22PM -0500, Mark A Gebert wrote:
> No Kerberos in this one... Zip, zero, zilch.
>
> --geeb


>
> At 15:36 -0500 17 November 2000, Rick Carter <Rick.Carter> wrote:
>
> > Flying blind here, but whenever I hear "long-running" and "doesn't work"
> > in the same sentence, I start wondering if there's a kerberos ticket
> > expiration involved somewhere.
> >
> > - Rick
> >
> >

> > On Fri, 17 Nov 2000, Mark A Gebert wrote:
> >
> > > The question is does anyone have a clue why this happens?
> > >

> > > --geeb
> > >
> > > At 14:45 -0500 17 November 2000, Mark A Gebert <geeb> wrote:
> > >
> > > > We have a long running shell script that is started out of the inittab and does
> > > > some monitoring on some of our systems if it detects a problem it sends email
> > > > out. After about a week the program detects the problems but does not send out
> > > > Email. We've tried several things but no dice. This is under Solaris 2.6.
> > > >
> > > > Thanks in advance.
> > > >
> > > > --geeb

--
--jlockard - "Welcome to the Psychic Admin hotline.
Don't call us, we'll call you." - KSon

Mark A Gebert

Nov 17, 2000, 3:00:00 AM

John A. Lauro

Nov 17, 2000, 3:00:00 AM
> Here is the script (remember it does work perfectly for the first week).

Personally, I run something with a 15-minute delay out of cron
instead of an infinite loop with a sleep....

How do you know it keeps monitoring? Is it because
/var/adm/daemonrestart.log.old is being updated like it should every
15 minutes + processing time?

Try logging *.debug in /etc/syslog.conf to somewhere, and check for any
entries for sendmail when it should be sending out the emails. Maybe
it will give a clue if sendmail is running but refusing to send the
email...

Are any portions that the script runs, or logs, or checks, etc... on
NFS mounts, or other remote file systems?

---------------------------------------------------------------------------
John Lauro email: jla...@flint.umich.edu
University of Michigan - Flint jla...@umich.edu
Information Technology Services
303 E. Kearsley St. phone: (810) 762-3123
Flint, MI 48502 fax: (810) 766-6805

John M. Lockard

Nov 17, 2000, 3:00:00 AM
On Fri, Nov 17, 2000 at 05:32:27PM -0400, John A. Lauro wrote:
> > Here is the script (remember it does work perfectly for the first week).
>
> Personally, I run something with a 15 minute delay out of cron
> instead of an infinite loop with a sleep....

The reason for not running it out of cron is that if cron dies, you
lose the script as well. This way, if the script dies, init will
restart it from inittab.

> How do you know it keeps monitoring? Is it because
> /var/adm/daemonrestart.log.old is being updated like it should every
> 15 minutes + processing time?

You can actually see that the logs (update info) are being written,
just not mailed out.

> Try logging *.debug in /etc/syslog.conf to somewhere, and check for any
> entries for sendmail when it should be sending out the emails. Maybe
> it will give a clue if sendmail is running but refusing to send the
> email...
>
> Are any portions that the script runs, or logs, or checks, etc... on
> NFS mounts, or other remote file systems?

Nope, in this case, everything is completely local on ufs or ffs
partitions.


--
--jlockard - "Why do you have to be so small?" - Lane Myer
-------------------------------------------------------------------
John M. Lockard | U of Michigan - School of Information
Sys Admin III | 400 West Hall - 550 E. University. Ave.
jloc...@umich.edu | Ann Arbor, MI 48109-1092
www.umich.edu/~jlockard | 734-615-8776 | 734-764-2475 FAX
-------------------------------------------------------------------

Dan Pritts

Nov 19, 2000, 3:00:00 AM
Here are a few suggestions.

Add some error checking to the script and check the exit status of the
sendmail process. That might give a clue as to what is going wrong.
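A minimal sketch of that error check, in the same Bourne shell as daemoncheck. MAILER and send_alert are hypothetical names, not part of the original script; the idea is only that a non-zero sendmail exit status gets appended to the log, so a silent failure leaves a trace:

```shell
#!/bin/sh
# Hypothetical helper: MAILER and send_alert are illustrative names only.
# MAILER defaults to the sendmail path daemoncheck already uses.
MAILER=${MAILER:-/usr/lib/sendmail}

# send_alert RECIPIENTS LOGFILE
# Mails the contents of LOGFILE, then appends a note to LOGFILE if the
# mailer exits non-zero. Returns the mailer's exit status.
send_alert() {
    to=$1
    log=$2
    # Here-document lines stay in column 0 so the headers are not mangled.
    $MAILER -t <<EOF
To: $to
Subject: Daemoncheck alerts on `uname -n`

`cat "$log"`
EOF
    status=$?
    if [ $status -ne 0 ]; then
        echo "sendmail exited with status $status at `date`" >> "$log"
    fi
    return $status
}
```

In the loop, the existing sendmail block would then become something like `send_alert "$to" "${LOGFILE}"`, and the next pass's mail (or a look at the .old log) would show whether sendmail itself is failing.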

You might be hitting a resource limit, although it's not clear which limit
you would be hitting.

There might be a bug in Solaris sh causing this... try running it under
bash, ksh, zsh, or something else instead. Also double-check that you
are up to date on any /bin/sh patches.

Along the same lines, reimplement the thing in Perl.

Once it stops sending email, hit it with a truss -p and see what happens
when it attempts to fork sendmail.

Kludge it so that it keeps a counter and exits after a day or so.
Init will restart it (as long as its inittab entry uses the respawn
action) and you'll be happy.
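A rough sketch of the counter kludge, assuming daemoncheck's inittab entry uses the respawn action. The inittab line in the comment is hypothetical, as are MAXLOOPS, run_checks and main_loop. It counts with expr rather than $((...)) because the Solaris 2.6 /bin/sh is a Bourne shell without arithmetic expansion:

```shell
#!/bin/sh
# Hypothetical inittab entry (id:rstate:action:process) that makes init
# start a fresh copy whenever the script exits:
#   dc:3:respawn:/usr/private/dc/daemoncheck
#
# MAXLOOPS, run_checks and main_loop are illustrative names only.
DCSLEEPTIME=${DCSLEEPTIME:-900}   # 15 minutes, as in daemoncheck
MAXLOOPS=${MAXLOOPS:-96}          # 96 passes x 15 minutes = 24 hours

run_checks() {
    # Placeholder for the body of daemoncheck's while loop
    # (the *.sys and *.daemon checks and the sendmail step).
    :
}

main_loop() {
    count=0
    while [ "$count" -lt "$MAXLOOPS" ]
    do
        sleep "$DCSLEEPTIME"
        run_checks
        count=`expr $count + 1`
    done
}

# At the end of the script: run a bounded number of passes, then exit
# so that init respawns a fresh shell.
#   main_loop
#   exit 0
```

The design trade-off is that this treats the symptom rather than the cause, but a shell that lives for one day at most never reaches the one-week mark where mail stops going out.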

On Fri, 17 Nov 2000, Mark A Gebert wrote:
> We have a long running shell script that is started out of the inittab and does
> some monitoring on some of our systems if it detects a problem it sends email
> out. After about a week the program detects the problems but does not send out
> Email. We've tried several things but no dice. This is under Solaris 2.6.
>
> Thanks in advance.
>
> --geeb

they say i shot a man named gray
and took his wife to italy dan pritts
she inherited a million bucks 734/996-0169
and when she died it came to me da...@umich.edu
i can't help it if i'm lucky...


Neil Brian Tweedy

Nov 20, 2000, 3:00:00 AM
I'd look closely at the environment the shell is carrying. If
sendmail throws a few errors to stdout or stderr, who is catching
them? Maybe a buffer set up for init.d isn't working out for a
long-lived beast.
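One way to answer the "who is catching them?" question is to send sendmail's stdout and stderr, plus a snapshot of the environment, to a file of your own instead of wherever init left them. A minimal sketch: MAILER, DEBUGLOG and mail_with_capture are hypothetical names, and the message file is assumed to already contain the To: and Subject: headers that sendmail -t reads:

```shell
#!/bin/sh
# Hypothetical names: MAILER, DEBUGLOG, mail_with_capture.
MAILER=${MAILER:-/usr/lib/sendmail}
DEBUGLOG=${DEBUGLOG:-/var/adm/daemoncheck.debug}

# mail_with_capture MESSAGEFILE
# MESSAGEFILE holds a complete message, headers included, for sendmail -t.
mail_with_capture() {
    # Snapshot of the environment the long-lived shell is carrying.
    {
        echo "=== mail attempt at `date` ==="
        env
    } >> "$DEBUGLOG" 2>&1

    # Anything the mailer prints on stdout or stderr lands in DEBUGLOG.
    $MAILER -t < "$1" >> "$DEBUGLOG" 2>&1
    status=$?
    echo "=== mailer exit status $status ===" >> "$DEBUGLOG"
    return $status
}
```

Comparing the environment dumps from the first day with those from the day mail stops would show whether something the shell carries has drifted.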

One could use (shudder) process accounting to look at whether
sendmail is getting started.

Just playing the tyro here. :).


neil
--
Neil Tweedy
Mathematics Computer Group
LS&A Information Technology
twe...@umich.edu

----------------------------------------------------------------------
