The server appears to page once every 1 to 3 days, causing it to slow.
What's the best way I can track what process is causing this?
Thanks,
John Seed
# vmstat 1 5
PROCS PAGING SYSTEM CPU
r b w frs dmd sw cch fil pft frp pos pif pis rso rsi sy cs us su id
3 161 104 476736 6 663 130 191 30 893 12 0 8 0 0 505 2226 0 67 33
2 160 99 477120 2 507 104 127 4 610 0 0 7 0 0 505 1582 0 50 50
0 162 98 477376 0 836 258 147 0 1037 9 0 9 0 0 595 2554 8 54 38
1 162 100 477496 0 713 123 191 0 934 5 0 3 0 0 524 2241 0 100 0
1 162 98 477576 0 649 100 123 0 801 0 0 2 0 0 378 1821 8 23 69
# mpsar 1 5
SCO_SV kettsco 3.2v5.0.5 i80386 05/02/2002
14:38:52 %usr %sys %wio %idle (-u)
14:39:03 0 2 97 1
14:39:16 0 2 98 0
14:39:19 0 2 96 2
14:39:30 0 1 97 1
14:39:46 0 2 97 1
Average 0 2 97 1
#
First of all, what does mpsar -r say?
Secondly, INCREASE the RAM in your system.
Lastly, you may want to decrease NBUF and NHBUF, but I would just install
more memory.
Brian
ps -el|sort -rn +9
will show processes sorted by size. 'memhog' is also useful; do a deja google
on it.
John
--
John DuBois spc...@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
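For anyone trying this on a newer system: `sort -rn +9` uses the old zero-based field syntax, which some current sort implementations no longer accept. A minimal sketch of the same idea in the POSIX `-k` form follows. It assumes the size column (SZ) is the 10th field of `ps -el`, as in the output in this thread; `top_by_size` is just an illustrative name.

```shell
# List the largest processes from `ps -el`-style input on stdin.
# Assumes SZ is the 10th whitespace-separated field.
# Usage: ps -el | top_by_size 5
top_by_size() {
    # -k 10,10nr: sort on field 10 only, numerically, largest first.
    # Non-numeric lines (such as the header) sort as 0 and end up last.
    sort -k 10,10nr | head -n "${1:-5}"
}
```

Running `ps -el | top_by_size 3` should put a runaway process at the top of the list.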
The next time I was able to catch it happening;
# mpsar -r 1 5
SCO_SV kettsco 3.2v5.0.5 i80386 05/09/2002
09:37:59 freemem freeswp availrmem availsmem (-r)
09:38:12 44 210256 124009 868
09:38:22 35 207000 124011 1057
09:38:30 40 203752 124007 988
09:38:50 49 201136 124011 1057
09:38:58 36 195616 124009 1038
Average 40 203552 124009 1002
# ps -el|sort -rn +9
20 B 0 24450 3763 0 51 20 fb1236e0 1459692 - ? 00:00:43 vfsd
20 S 0 17601 1 0 76 24 fb11f910 10004 f028ea24 ? 00:00:02 _mprosrv
20 S 0 25724 1 0 76 24 fb123c40 9972 f028ea24 ? 00:00:00 _mprosrv
0 S 0 25588 373 0 76 4 fb1199a8 4824 f028ea24 tty02 00:00:01 Xsco
<All the other processes snipped out>
As you have probably gathered, I'm no expert at this, but I think the
ps was telling me that process 24450 was using about 1.5gig of memory.
24450 is a VisionFS process:
# ps -ef | grep 24450
root 24450 3763 1 09:01:27 ? 00:00:44 vfsd --profile /usr/vision/vfsprofile
There is no good reason for Visionfs to be using this much memory.
500meg of memory would be overspecced for the entire system on this
server, let alone 2gig! Putting more memory in won't fix anything,
I'm sure this fault will spiral up and chomp up whatever memory is
there.
The system sped up again after a few minutes. I checked the VisionFS
log;
2002/05/09 10:04:13.380 (pid 24450)
SCO VisionFS(3.0) FATAL ERROR:
The program has encountered an error that means it cannot continue.
It will now exit. A technical description is given below to help
establish the cause.
server/process/abort
[unknown time] process/error SCO VisionFS(3.0)
Process se3660 has had a fatal signal: SEGV - segmentation violation
Aborting process
This explains why everything works happily again. The 24450 process
turned into a <defunct> process. I have a good dozen of these on the
server, which I presume are all the aftermath of the same problem.
My VisionFS level is 9.00.925 which I know is old, but this is quite
an old legacy system which is pretty much frozen until it is replaced
and I fear change when it comes to VisionFS. Also, I don't get this
on any other VisionFS installation.
Any other ideas as to what I can do to investigate/resolve this issue?
Thanks again,
John
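The leftover <defunct> entries mentioned above can be listed along with their parents, which helps confirm they all come from the same place. This is only a sketch: it assumes the `ps -el` layout shown in this thread (state in field 2, PID in field 4, PPID in field 5) and that zombies are reported with state `Z`. `list_defunct` is an illustrative name.

```shell
# Print "pid ppid" for every zombie in `ps -el`-style input on stdin.
# Assumes: field 2 = state, field 4 = PID, field 5 = PPID.
# Usage: ps -el | list_defunct
list_defunct() {
    awk '$2 == "Z" { print $4, $5 }'
}
```

If every line reports the same parent PID, that parent is the process failing to reap its dead children.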
No ideas regarding visionfs specifically, but you can reduce the impact of this
problem by limiting the amount of memory available to vfsd via ulimit.
Find where vfsd is started up, and if it's a script, replace its invocation
with something like this:
ksh -c 'ulimit -v 20000; vfsd [vfsd arguments ...]'
That will limit it to using 20MB. If it tries to use more than that, it will
die in the same manner that it does now at 1.5+GB. You can probably pick a
better number by looking at ps output to see what its normal size is.
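The same one-liner can be wrapped in a small helper so the limit applies only to the child and never to the calling shell. This is a sketch, not a tested SCO recipe: it assumes a shell whose `ulimit -v` takes a value in kilobytes (ksh and bash do), and `run_limited` is an illustrative name.

```shell
# Run a command with a cap on its virtual memory, in kilobytes.
# The function body is a subshell, so the caller's own limit is untouched.
# Usage: run_limited 20000 vfsd [vfsd arguments ...]
run_limited() (
    limit_kb=$1
    shift
    # Give up rather than start the program unlimited if the limit fails.
    ulimit -v "$limit_kb" || exit 1
    exec "$@"
)
```

Using a subshell rather than setting the limit in the startup script itself means other daemons launched from the same script keep their normal limits.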
Something will need to start a new vfsd when it dies else you won't have
visionfs service, but from your description it sounds like that's already
happening automatically. To determine where it's being started, look at its
parent process (pid 3763 in the output above). If it's not a script, you can
probably put a memory limit on whatever ancestor of it is started at multiuser
time, most likely from one of the files under /etc/rc2.d
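To follow the chain of parents without chasing PIDs by hand, the `ps -ef` output can be walked with awk. A rough sketch, assuming the usual `ps -ef` layout (PID in field 2, PPID in field 3); `ancestors_of` is an illustrative name.

```shell
# Print the ps line of a process, then its parent, grandparent, and so on.
# Reads `ps -ef`-style input on stdin; assumes field 2 = PID, field 3 = PPID.
# Usage: ps -ef | ancestors_of 24450
ancestors_of() {
    awk -v pid="$1" '
        { parent[$2] = $3; line[$2] = $0 }
        END {
            # Stop at the top of the tree, or if a PID repeats.
            while ((pid in parent) && !(pid in seen)) {
                print line[pid]
                seen[pid] = 1
                pid = parent[pid]
            }
        }'
}
```

The last line printed before init is the one to look for in the startup files.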
> The next time I was able to catch it happening;
>
> SCO_SV kettsco 3.2v5.0.5 i80386 05/09/2002
>
> 09:37:59 freemem freeswp availrmem availsmem (-r)
> 09:38:12 44 210256 124009 868
> 09:38:22 35 207000 124011 1057
> 09:38:30 40 203752 124007 988
> 09:38:50 49 201136 124011 1057
> 09:38:58 36 195616 124009 1038
>
> Average 40 203552 124009 1002
> # ps -el|sort -rn +9
> 20 B 0 24450 3763 0 51 20 fb1236e0 1459692 - ? 00:00:43 vfsd
As John DuBois suggested, you can bandage this with a strategically
placed "ulimit" command.
It sounds like the vfsd process is spinning out of control in some sort
of memory allocation loop. If it is like typical programs that fail
this way, it will spin out its entire loop, consume all of memory, then
die with a coredump. Unfortunately, a process which is in the throes of
dumping core continues to hold all of its memory until the coredump
completes. So the entire event looks like this:
- program gets caught in an allocation loop
- system-wide memory is depleted, eventually getting close to zero
(you can see that availsmem got down to <1000 on your system)
- at the low-water mark for memory, the spinning process gets an
allocation failure and dumps core
- dumping such a large core takes a long time; especially since large
portions of the dumping process have probably gotten shuffled out to
swap, so this is a disk-to-disk copy, possibly on the same disk
- during this long core dump, systemwide memory is very low, so other
processes may fail (fortunately this can be somewhat
self-correcting, since other processes that die leave a little more
headroom)
- eventually the spinning process finishes dumping core and system
memory availability leaps upwards
In `sar -r` output an event will look like this: availsmem is cruising
along at a fairly stable high value, then it starts dropping rapidly.
The rapid drop continues until it gets down to a low water value
somewhere in the 0-1000 range. Then it will stay near that low water
value for a long time (could be several minutes and might creep up
slightly during this time). Finally, it will pop back up to the
original stable value, or possibly a bit higher. Grotty ASCII graphic:
|                                _____________    ^
| ______        process         | done            |
|       \       spins           |                 |
|        \      out             |                 |
|         \     of              |                 |
|          \    control         |             availsmem
|           \   .               |                 |
|            \  .               |                 |
|             \ .dumps core...__|                 |
|              \____---___----                    |
+-----Time------------------------------------    v
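The low-water phase described above can be picked out of saved `sar -r` output with a small awk filter. This is a sketch under assumptions: it expects the column layout shown in this thread (timestamp, freemem, freeswp, availrmem, availsmem), and both the name `low_availsmem` and the default threshold of 2000 are arbitrary choices.

```shell
# Print "time availsmem" for each sample below a threshold.
# Reads `sar -r`-style input on stdin; assumes availsmem is field 5.
# Usage: mpsar -r 1 5 | low_availsmem 1000
low_availsmem() {
    awk -v min="${1:-2000}" '
        # Only consider sample lines: hh:mm:ss timestamp plus a numeric field 5.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $5 ~ /^[0-9]+$/ && $5 + 0 < min {
            print $1, $5
        }'
}
```

The header and Average lines are skipped, so the output is just the timestamps at which memory was nearly exhausted.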
Please see the following article for a set of tools which allow size
limits to be established for every process on your system:
http://groups.google.com/groups?selm=980401074...@vagabond.armory
>Bela<
>_mprosrv
> 20 S 0 25724 1 0 76 24 fb123c40 9972 f028ea24 ?
How big is the shared memory cache for your Progress DB?
I think you can see that with ps.
S.Marquardt
hagebau dd
germany
Thanks,
John
20 S root 28603 1 0 76 24 fb11da28 4084 f028ea24 May-10 ? 00:00:00 /u/dlc83b/bin/_mprosrv /u/41r2/unixdb/openacc -n 15 -B 1000 -L 100000 -N TCP -S
20 S root 28610 1 0 76 24 fb11db80 1832 f028ea24 May-10 ? 00:00:00 /u/dlc83b/bin/_mprosrv /u/41r2/unixdb/openstrt -n 15 -B 500 -L 20000 -N TCP -S
20 S root 28617 1 0 76 24 fb11ecf8 2412 f028ea24 May-10 ? 00:00:00 /u/dlc83b/bin/_mprosrv /u/41r2/unixdb/openlog -n 15 -B 1000 -L 20000 -N TCP -S
20 S root 28638 1 0 76 24 fb11efa8 1832 f028ea24 May-10 ? 00:00:00 /u/dlc83b/bin/_mprosrv /u/41r2/unixdb/openfood -n 15 -B 500 -L 20000 -N TCP -S
20 S root 16753 16084 3 76 24 fb128e38 64 f0fb4380 16:11:32 ttyp0 00:00:00 grep L
Stefan Marquardt <nospam.stef...@hagebau.de> wrote in message news:<ah4vdukmcs7ph60l8...@4ax.com>...
Thanks for your help on this. I will ulimit VisionFS which will, as
you say, bandage the problem (I suspect I just won't FIX this properly
without upgrading VisionFS - I've already tried a re-install).
Bela, could I ask a big favour and get you to check the hyperlink for
the ulimit info you gave me? I click on the link and it gives me
nothing. I have tried searching on the whole string and on parts of it,
and the only thing that comes up is your posting in this thread.
Thanks,
John
> Thanks for your help on this. I will ulimit VisionFS which will, as
> you say, bandage the problem (I suspect I just won't FIX this properly
> without upgrading VisionFS - I've already tried a re-install).
It should be a pretty good bandage (it would suck if VisionFS just died,
but since it apparently restarts itself it shouldn't be too bad).
> Bela, could I ask a big favour and get you to check the hyperlink for
> the ulimit info you gave me. I click on the link and it gives me
> nothing. I try putting either the whole or parts of the search string
> and the only thing I get up is your posting in this thread.
Whoops, sorry about that. Somehow the last ".com" went missing from my
post. The full URL should be:
http://groups.google.com/groups?selm=980401074...@vagabond.armory.com
>Bela<