System Crash - iSeries more vulnerable than previously believed...

Frank Whittemore

Nov 18, 2002, 7:54:52 PM
I'm an I.S. Manager of a small shop. While on vacation Friday I
received a phone call informing me that the system had crashed. It seems
a programmer ran the DSPJRN command, which filled up all our disk
space and caused a complete system crash. This
occurred on V5R1. I had previously been led to believe that in these
modern times it was impossible for any one user or programmer to crash
the entire system. Boy! Was I wrong or what?

IBM technical support has informed me that if the amount of storage
left on the system is less than the maximum that the user can use,
then yes, the system can go down when it runs out of storage. I am
also told that the system default for user disk usage is *NOMAX. Do
most of you actually set maximum amounts of storage for each and every
user including programmers? How do you determine such maximums for
users? For programmers?

Guess OS/400 isn't as bullet proof as I had been led to believe.

Has anyone else had this happen at their site? Has anyone had a
program go into a loop while writing to disk and had this happen?

What preventive measures are you taking to prevent such an occurrence?

Help! What to do?

Oliver Wenzel

Nov 19, 2002, 12:32:27 AM
Frank Whittemore wrote:

> Guess OS/400 isn't as bullet proof as I had been led to believe.
>
> Has anyone else had this happen at their site? Has anyone had a
> program go into a loop while writing to disk and had this happen?
>
> What preventive measures are you taking to prevent such an occurrence?
>
> Help! What to do?
>

Loops are always a possibility. I've also seen user queries sucking up to
20GB of temp storage - if we hadn't caught it in time - boom.

OS/400 will send out warning messages when the ASP threshold is reached.
If you ignore these - well, it's YOUR problem.

As preventive measures, we have tools installed that inform us about
critical system states, either by OS/400 message, e-mail or SMS. Also, at
least one person is doing checks periodically.

To determine user disk usage, you can do a dspobjd (*all/*all) to an
outfile. Then you can run queries on this file and see what each user
has.
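A rough sketch of the kind of per-owner query described above, written in Python against a CSV export of the outfile. The column names `owner` and `size_bytes` and the sample rows are assumptions made up for illustration; they are not the real DSPOBJD outfile field names, so map them to whatever your export actually contains.

```python
import csv
import io
from collections import defaultdict


def storage_by_owner(rows):
    """Sum object sizes per owner and return (owner, total) pairs, largest first."""
    totals = defaultdict(int)
    for row in rows:
        # "owner" and "size_bytes" are hypothetical column names for this sketch.
        totals[row["owner"]] += int(row["size_bytes"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data standing in for an exported object-description file.
SAMPLE = """owner,object,size_bytes
PGMR1,JRNDUMP,5000000
QUSER1,QRYTMP,1200000
PGMR1,TESTLIB,300000
"""

report = storage_by_owner(csv.DictReader(io.StringIO(SAMPLE)))
for owner, total in report:
    print(f"{owner:<10} {total:>12}")
```

A report like this only shows what each user owns right now. It could serve as a starting point when picking a storage maximum per user, with some headroom on top of current usage.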

HTH,

Oliver

Charles Wilt

Nov 20, 2002, 9:08:58 AM
A minor correction to your thinking....

It's as good as impossible for a user program to crash the AS/400 by
accessing memory not allocated to that program or by any other illegitimate
operation, i.e. the kind of failure behind a Windows blue screen of death.

In your case, the program was performing a perfectly legitimate
operation...writing to a file. The program didn't crash the system, the
system crashed because it had run out of space...an important difference.


In any event, as others have mentioned, OS/400 sends a message to the
QSYSOPR queue when your ASP usage goes over its defined threshold. The
default is 80% but you can change it. But it is up to you to do something
about it. You might want to look at monitoring software that will send
you an email or page for important messages such as this. As a small shop
ourselves with no operators, we find such software quite beneficial.

For users using query, *NOMAX on their storage limit is asking for
trouble.

Charles Wilt

In article <bcdcd347.0211...@posting.google.com>,
fra...@adelphia.net says...

Frank Whittemore

Nov 20, 2002, 1:21:35 PM
Oliver -


>
> "As preventive measures, we have tools installed that inform us about
> critical system states, either by OS/400 message, e-mail or SMS. Also, at
> least one person is doing checks periodically."

Are the tools something that you purchased or something that you
developed in house?

If you purchased them, please document the name of the tools and the
company that you purchased them from.

Thanks.

Steven Kendall

Nov 20, 2002, 7:12:18 PM
On 18 Nov 2002 16:54:52 -0800, fra...@adelphia.net (Frank Whittemore)
wrote:

The first time I saw this happen was on a system running V3R0M5, where
a looping job created itself a 360,000 page joblog.

I would strongly recommend reviewing how the QSTGLOWLMT and QSTGLOWACN
system values are set on your AS/400. Since these were introduced in
V4R2, I have set QSTGLOWLMT to 97% and QSTGLOWACN to *ENDSYS. This
will cause the system to shut down to a restricted state once the
system ASP is 97% full. The system is still down, but at least you
have the console available so that you can free up some space and get
the system back up quickly.

Beyond this, it is worth making sure that the QSYSOPR message queue is
monitored. In this case, you would have had "Serious storage
condition may exist. Press HELP." messages breaking once the
utilisation on the system ASP reached the storage threshold set in
SST. (If I recall correctly, this is set to 90% by default.)
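As a minimal sketch of the two-level scheme described above (a warning threshold, then a harder limit that ends the system to a restricted state), here is some Python that maps a usage percentage to an action. The function name, the action labels, and the use of 90% and 97% as "percent used" figures are assumptions taken from this post for illustration. This is not how OS/400 implements it; check how your release expresses these values before copying the numbers.

```python
def storage_action(used_pct, warn_pct=90.0, shutdown_pct=97.0):
    """Return the action for a given system ASP usage percentage.

    warn_pct and shutdown_pct are illustrative defaults taken from the post,
    not values read from any real system.
    """
    if not 0.0 <= used_pct <= 100.0:
        raise ValueError("used_pct must be between 0 and 100")
    if used_pct >= shutdown_pct:
        # Comparable in spirit to ending the system to a restricted state.
        return "end_to_restricted_state"
    if used_pct >= warn_pct:
        # Comparable in spirit to the warning message sent to the operator.
        return "warn_operator"
    return "ok"


for pct in (50.0, 92.5, 97.0):
    print(pct, storage_action(pct))
```

The point of the gap between the two levels is that it gives whoever watches the message queue time to free up space before the harder limit is reached.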

It may also be worth creating a separate user ASP for programmers'
libraries so that they can be confined to a set amount of disk space
that won't bring the entire system down if it fills up.

I hope these suggestions are of some help to you.


Regards,


Steven Kendall
Blue Lake Technology Solutions

www.bluelake.co.nz

Oliver Wenzel

Nov 21, 2002, 12:48:40 AM
Frank Whittemore wrote:

> Are the tools something that you purchased or something that you
> developed in house?
>
> If you purchased them, please document the name of the tools and the
> company that you purchased them from.

We use Messenger Plus from Bytware. It lets you monitor for lots of error
conditions on the AS/400.

Regards,

Oliver

Charles Wilt

Nov 21, 2002, 9:12:37 AM
I should have mentioned that Messenger Plus from Bytware is the tool we
use also. http://www.bytware.com

Another one to look at is Robot/Alert + Robot/Console from Help Systems.
www.helpsystems.com

Charles Wilt

In article <arhs3o$aqe$00$1...@news.t-online.com>, owe...@aranea.de says...

Charles Wilt

Nov 21, 2002, 9:15:26 AM
In article <3ddc1eaa...@news.free.net.nz>,
ste...@NOSPAM.bluelake.co.nz says...
<snip>
> It may also be worth creating a separate user ASP for programmers
> libraries so that they can be confined to a set amount of disk space,
> that won't bring the entire system down if it fills up.
>
> I hope these suggestions are of some help to you.

I'm not sure this would be of any help. When a User ASP fills up, it
simply overflows to the system ASP. So assuming the job continues to
run, it will eventually fill up the system ASP and the server will crash.

Unless you know of a way to prevent the user ASP from overflowing to the
system ASP?


Charles

Karl Hanson

Nov 21, 2002, 5:47:22 PM

V5R2 has independent (switchable) ASPs, that cannot overflow into the
system ASP.
http://publib.boulder.ibm.com/iseries/v5r2/ic2924/index.htm?info/rzaix/rzaixcontrasting.htm

--
Karl Hanson
