
Implementing a timeout in bash


andy thomas

Mar 31, 2003, 4:35:19 AM
We are using the lynx text browser within a bash script to access a test
page on an apache web server, dump the text in the page to a file and then
see if we can read a phrase within it, thereby proving that apache is
serving pages. Very occasionally, due to client overloads and some
interoperability problems with Cold Fusion server, apache will spawn
hundreds of child processes and stop serving pages (although still
remaining in memory) and the script should detect this, kill all the httpd
processes and restart apache.

This happens very rarely (about once every 4-6 weeks) and cannot be
reproduced or provoked by ourselves. But the last time it happened
(yesterday afternoon), what appeared to happen was that lynx made a
valid HTTP connection and then hung there,
waiting for a page that never came, halting the script. What we need is
a timeout in the script so that if lynx fails to return after a certain
period, it gets terminated and the script continues to execute.

Does anyone know of a way this can be done easily in bash (or preferably,
even in Bourne or csh)?

Andy

Stephane CHAZELAS

Mar 31, 2003, 5:23:47 AM
andy thomas wrote:
> We are using the lynx text browser within a bash script to access a test
> page on an apache web server,
[...]

> What we need is a timeout in the script so that if lynx fails
> to return after a certain period, it gets terminated and the
> script continues to execute.
[...]

Use curl or wget instead of lynx.

curl -m <seconds>

wget -T <seconds> -t <tries>

For how to set timeouts in shells, look at this group's archive.
There was a huge thread on this not long ago, and there's an
example in the Advanced Bash-Scripting Guide.
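For the check described in the original post, those flags would slot in roughly like this; the URL and test phrase are placeholders, and the printf line stands in for a fetched page so the check can be demonstrated without a live server:

```shell
# check_page reads a page dump on stdin and reports whether the
# test phrase is present; wget (with -T for the timeout) would
# normally supply the page.
check_page() {
    grep -q "$1"
}

# Real use would look something like (not run here):
#   wget -q -T 20 -t 1 -O - http://localhost/test.html | check_page 'test phrase'

# Stand-in demonstration with a fixed page body:
printf 'some html with the test phrase inside\n' | check_page 'test phrase' \
    && RESULT=found || RESULT=missing
echo $RESULT    # prints "found"
```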

--
Stéphane

Brendon Caligari

Mar 31, 2003, 6:00:17 AM

"andy thomas" <an...@ic.ac.uk> wrote in message
news:Pine.LNX.4.44.03033...@anahata.ma.ic.ac.uk...

'wget' does a very good job of downloading pages. I inherited some
'dodgy' scripts using lynx but replaced everything with wget.

To avoid running into situations like yours, I often log 'successful' runs of
scripts/jobs similar to your apache monitoring script to a remote syslog
server, on which I track entry 'ageing' and alert admins accordingly.

B.


andy thomas

Mar 31, 2003, 10:35:20 AM
On Mon, 31 Mar 2003, Brendon Caligari wrote:

>
> "andy thomas" <an...@ic.ac.uk> wrote in message
> news:Pine.LNX.4.44.03033...@anahata.ma.ic.ac.uk...
> > We are using the lynx text browser within a bash script to access a test
> > page on an apache web server, dump the text in the page to a file and then
> > see if we can read a phrase within it, thereby proving that apache is
> > serving pages. Very occasionally, due to client overloads and some
> > interoperability problems with Cold Fusion server, apache will spawn
> > hundreds of child processes and stop serving pages (although still
> > remaining in memory) and the script should detect this, kill all the httpd
> > processes and restart apache.
> >
> > This happens very rarely (about once every 4-6 weeks) and cannot be
> > reproduced or provoked by ourselves. But the last time it happened
> > (yesterday afternoon), what appeared to happen was that lynx made a
> > valid HTTP connection and then hung there,
> > waiting for a page that never came, halting the script. What we need is
> > a timeout in the script so that if lynx fails to return after a certain
> > period, it gets terminated and the script continues to execute.
> >
> > Does anyone know of a way this can be done easily in bash (or preferably,
> > even in Bourne or csh)?
> >
> > Andy
> >
>
> 'wget' does a very good job of downloading pages. I inherited some
> 'dodgy' scripts using lynx but replaced everything with wget.

I've just looked at wget for the first time and it seems to have been
designed for this sort of thing! I've now replaced lynx with
wget -T 20 <URL> so that it times out after 20 seconds if it can't access
the page. Now I'll probably have to wait a few weeks for the problem to
occur again to see what happens but thanks very much for putting me on the
right track.

> To avoid running into situations like yours, I often log 'successful' runs of
> scripts/jobs similar to your apache monitoring script to a remote syslog
> server, on which I track entry 'ageing' and alert admins accordingly.

Yes, the script runs every 60 seconds and logs the date/time followed by
the test phrase to a log file. So I suppose another script could have
run to verify the log was being updated every 60 seconds and detect the
situation where the first script had stalled owing to lynx still trying to
read a page. But it looks like wget will do the trick.
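As an aside, a 60-second cycle like ours can be driven straight from cron; the script path and log file here are placeholders, not our real ones:

```
# m h dom mon dow command
* * * * * /usr/local/bin/check-apache >> /var/log/apache-check.log 2>&1
```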

Many thanks,

Andy

Brendon Caligari

Mar 31, 2003, 5:08:22 PM

"andy thomas" <an...@ic.ac.uk> wrote in message
news:Pine.LNX.4.44.030331...@anahata.ma.ic.ac.uk...

Glad to have been of help, but ouch... isn't 60 seconds a bit aggressive?
Avoiding scripts that monitor other scripts was a prime motivator
for moving a number of monitoring/administrative functions to a centralised
machine on the network. However, so as not to have too many scripts fired by
cron, I often check that the previous cron job has terminated, and instead of
scanning log files for the 'last entry', I periodically 'touch' a file (that
sounds sick) and check its vintage using 'date', thereby working out when some
event last occurred. The following snippet might not be very portable.

#!/bin/bash
# seconds since the epoch of the file's last modification
# (the -r option to date is a GNU extension)
TIME_FILE=`date +%s -r somefile`
TIME_NOW=`date +%s`
let TIME_DIFF=${TIME_NOW}-${TIME_FILE}
echo $TIME_DIFF
if [ ${TIME_DIFF} -gt 120 ]
then
        echo "File is older than 2 minutes"
fi

B


andy thomas

Apr 2, 2003, 6:21:12 AM
On Tue, 1 Apr 2003, Brendon Caligari wrote:

>
> "andy thomas" <an...@ic.ac.uk> wrote in message
> news:Pine.LNX.4.44.030331...@anahata.ma.ic.ac.uk...
> > On Mon, 31 Mar 2003, Brendon Caligari wrote:
> >
> > I've just looked at wget for the first time and it seems to have been
> > designed for this sort of thing! I've now replaced lynx with
> > wget -T 20 <URL> so that it times out after 20 seconds if it can't access
> > the page. Now I'll probably have to wait a few weeks for the problem to
> > occur again to see what happens but thanks very much for putting me on the
> > right track.
> >
> > > To avoid running into situations like yours, I often log 'successful' runs of
> > > scripts/jobs similar to your apache monitoring script to a remote syslog
> > > server, on which I track entry 'ageing' and alert admins accordingly.
> >
> > Yes, the script runs every 60 seconds and logs the date/time followed by
> > the test phrase to a log file. So I suppose another script could have
> > run to verify the log was being updated every 60 seconds and detect the
> > situation where the first script had stalled owing to lynx still trying to
> > read a page. But it looks like wget will do the trick.
> >
>
> Glad to have been of help, but ouch... isn't 60 seconds a bit aggressive?

No, some of the customers whose web sites are hosted on this server get
very annoyed if there's any downtime. The server gets 2.8 million hits/day
and pushes out 11 Gb every 24 hours across 40 sites. In fact, one client
employs a remote monitoring company to access a page on their site from
several global locations every 2 minutes and calculate the average
response time. If the delay is over a certain threshold, we're sent emails
and SMS text messages warning us.

> Avoiding scripts that monitor other scripts was a prime motivator
> for moving a number of monitoring/administrative functions to a centralised
> machine on the network. However, so as not to have too many scripts fired by
> cron, I often check that the previous cron job has terminated, and instead of
> scanning log files for the 'last entry', I periodically 'touch' a file (that
> sounds sick) and check its vintage using 'date', thereby working out when some
> event last occurred. The following snippet might not be very portable.
>
> #!/bin/bash
> TIME_FILE=`date +%s -r somefile`
> TIME_NOW=`date +%s`
> let TIME_DIFF=${TIME_NOW}-${TIME_FILE}
> echo $TIME_DIFF
> if [ ${TIME_DIFF} -gt 120 ]
> then
> echo "File is older than 2 minutes"
> fi

This is the sort of thing I tend to do, both on the server being
monitored and on several remote machines as well. We've had cases
where an apache server has gone haywire; the server it's running on
notices and sends out alerts by mail, but port 25 has been blocked on its
own local subnet owing to a coincidental network problem. Having remote
monitors as well picks these conditions up and alerts me.

cheers,

Andy

Michael Paoli

Apr 5, 2003, 1:30:22 AM
How about something roughly* like this for a general timeout
capability? It should be sufficiently portable to work under any
reasonably Bourne-compatible shell (e.g. Bourne, Bash, Korn, POSIX,
ash, ...).

#!/bin/sh
#timeout in seconds
timeout=120

#must exceed timeout (preferably by a fair amount) and be within range
#sleep(1) can handle
longer=315360000

#we later terminate this sleep early
#and use its exit value (depending how it was terminated) to determine
#our course of action
sleep $longer &
longerPID=$!

{
#PROCESS_CHECKING_STUFF_THAT_MAY_HANG:
lynx #blah blah whatever...
#sleep(1) internally uses SIGALRM via alarm(2)
#so a premature SIGALRM will cause sleep(1) to prematurely exit 0
2>>/dev/null kill -14 $longerPID
} &

{
sleep $timeout
#try to trigger abnormal termination of our other sleep(1) (that PID
#may already be gone)
2>>/dev/null kill $longerPID
} &

wait $longerPID || {
#wait returns non-zero if $longerPID exits non-zero, i.e. the
#timeout fired before the checked command finished
echo "timed out -- handle the problem case here"
}

RTFM:
sleep(1)
alarm(2)
signal(2)
kill(1)
kill(2)
exit(3)
exit(2)
wait(2)
sh(1)
bash(1)

*caveats: check for bugs/typos, probably not suitable as-is for production
(remember to add checks, for example, that the sleep(1) processes were
successfully forked, etc.)
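For comparison, a leaner (and less careful about signal semantics) variant of the same background/watchdog idea, with "sleep 60" standing in for the possibly-hanging lynx call:

```shell
#!/bin/sh
# Run the guarded command in the background, start a watchdog that
# kills it after $timeout seconds, and inspect wait's exit status.
timeout=2

sleep 60 &                      # stands in for the hanging lynx call
cmdpid=$!

( sleep $timeout; kill $cmdpid 2>/dev/null ) &
watchpid=$!

if wait $cmdpid
then
    RESULT=ok                   # command finished within the timeout
else
    RESULT=timedout             # killed by the watchdog
fi
kill $watchpid 2>/dev/null      # tidy up the watchdog if still around
echo $RESULT                    # prints "timedout" here (2s < 60s)
```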

andy thomas <an...@ic.ac.uk> wrote in message news:<Pine.LNX.4.44.03033...@anahata.ma.ic.ac.uk>...
