I'm looking for the article called: [OpenVMS] Reasons for RWMPB/RWMPE
States and PAGEFRAG/PAGECRIT Messages. I'm not able to find it on the HP
website nor on Google... Any help would be appreciated!
BR
Adrian
Here you go.
Alex
Copyright (c) 1998, 2001. Compaq Computer Corporation. All rights reserved.
PRODUCT: Compaq OpenVMS VAX, All Versions
COMPONENT: Memory Management
SOURCE: Compaq Computer Corporation
OVERVIEW:
This article contains information about certain process states: RWMPB
and RWMPE and console messages: PAGEFRAG and PAGECRIT. These states
and/or messages may be indicative of resource limitations that can
severely impact the performance of your VAX system.
This article covers methods for monitoring and understanding parts of
the VAX system as they pertain to paging, swapping, and basic working
set management. It is a long and complex article that addresses these
problems from different levels of understanding. Some of the
information may be too complex for a casual user.
This article contains 6 sections:
1. Overview
2. General analysis
3. Symptoms
4. Causes/Solution
5. Detailed Analysis
6. References
The SYMPTOMS section has been prioritized with the most commonly
observed symptom at the top of the list. Each symptom has a reference
to causes. These causes have been prioritized and have an associated
solution.
GENERAL ANALYSIS:
When a process page faults a page out of its working set, and that page
has been modified, that page is placed on the Modified Page List
(MPL). If the MPL has more than the SYSGEN parameter 'MPW_WAITLIMIT'
pages on it, that process will be placed in a RWMPx state until the
Modified Page Writer can transfer those pages from the MPL to a
pagefile on disk.
RWMPB (Resource Wait Modified Page writer Busy) indicates that the
Modified Page Writer is busy and trying to write pages out to the
pagefile(s). The Modified Page Writer is a portion of the SWAPPER
process that handles the writing of these pages.
RWMPE (Resource Wait Modified Page writer Emptying the MPL - pre V5.2)
indicates that the Modified Page Writer is trying to flush the entire
MPL to a pagefile(s).
It is normal for processes to go in and out of RWMPB or RWMPE states.
However, it is not normal for them to hang in these states for
extended periods of time.
The PAGEFRAG error indicates that the Modified Page Writer was unable
to locate up to 16 contiguous blocks in the first quarter of the
pagefile.
The PAGECRIT error indicates that the Modified Page Writer was able to
locate up to 16 contiguous blocks in only the last quarter of the
pagefile.
Note:
The PAGEFRAG message does not mean that the pagefile is fragmented
on the disk. It is a warning message stating that the pagefile is
becoming full and getting internally fragmented as pages are written
to and from the pagefiles.
Doing a BACKUP and RESTORE of the disk will NOT fix this internal
file fragmentation.
SYMPTOM 1:
Multiple SHOW SYSTEM commands show processes in RWMPB with *NO*
increase in CPU time, I/Os, or pagefaults. DECps rule R0240 or
R0245 may also fire.
Example:
VAX/VMS V5.4 on node ROTTIE 22-OCT-1993 13:09:25.77 Uptime 3 05:58
  Pid    Process Name  State  Pri   I/O      CPU      Page flts  Ph.Mem
24000401 SWAPPER       HIB     16     0  00:10:58.67          0       0
24000406 CONFIGURE     HIB     10    93  00:07:07.14        173     259
24000409 IPCACP        HIB     10     7  00:00:01.04        121     166
2400040C HIENZ_57      RWMPB    4     6  00:00:00.39         84     215
                         ^           ^        ^              ^
                         |           |        |              |
                         +-----+-----+--------+--------------+
                               |
                               +--( Process State, CPU time, I/Os, and )
                                  ( Pagefaults remain the same         )
See CAUSES 1,2,3,6,8 for more information.
SYMPTOM 2:
The following messages are observed on the console.
SYSTEM-W-PAGEFRAG, Pagefile badly fragmented, system continuing
or
SYSTEM-W-PAGECRIT, Pagefile space critical, system trying
See CAUSES 1,2,3,6 for more information.
SYMPTOM 3:
Multiple SHOW SYSTEM commands show processes in RWMPB with *SOME*
increase in CPU time, I/Os, or Pagefaults. DECps rule R0240 or
R0245 may also fire.
See CAUSES 1,2,3,4,5,7,8,9,10 for more information.
SYMPTOM 4: (Pre OpenVMS v5.2)
Multiple SHOW SYSTEM commands show processes in RWMPE.
See CAUSES 1,2,3,4,5,6 for more information.
SYMPTOM 5:
Slow reboots with the STARTUP process spending excessive time in RWMPB
state.
See CAUSE 2 for more information (consider increasing PQL_MWSDEFAULT).
CAUSE 1:
A pagefile on the system is over 50% consumed causing the Modified
Page Writer to take more time to look for contiguous free blocks in
the pagefile. If one or more pagefiles becomes full, the system
could hang.
To determine if a pagefile is full, issue the following command:
$ SHOW MEMORY/FILES
System Memory Resources on 22-OCT-1993 13:20:54.97
Paging File Usage (pages):        Free  Reservable     Total
  CPAGE:[000000]SWAPFILE.SYS;1   99992       99992     99992
  CPAGE:[000000]PAGEFILE.SYS;1    5819      161511    399992
                                    ^                    ^
                                    |                    |
                                    +-+------------------+
                                      |
                                      +--("Free" space is less than 50% of the "Total")
Example:
(Total-Free) > (Total/2)
(399992-5819) > (399992/2)
394173 > 199996
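The same comparison can be scripted once the "Total" and "Free" figures
have been read from the SHOW MEMORY/FILES display. The following is a
minimal sketch only; the function name is hypothetical and is not part
of any OpenVMS tool, and the figures must be entered by hand.

```python
def pagefile_over_half_consumed(total_pages: int, free_pages: int) -> bool:
    """Return True when more than 50% of a pagefile is consumed.

    Implements the article's test (Total - Free) > (Total / 2) using
    integer arithmetic so no rounding is involved.
    """
    consumed = total_pages - free_pages
    return consumed * 2 > total_pages


if __name__ == "__main__":
    # Figures taken from the SHOW MEMORY/FILES example above.
    print(pagefile_over_half_consumed(399992, 5819))   # PAGEFILE.SYS
    print(pagefile_over_half_consumed(99992, 99992))   # SWAPFILE.SYS
```

For the example display, the check is true for PAGEFILE.SYS and false
for SWAPFILE.SYS.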
If the 'SHOW MEMORY/FILES' command does not reveal an over consumed
pagefile, there is another technique that can be used to determine
if the pagefile(s) has ever been over 50% consumed. This technique
uses the SYS$SYSTEM:AGEN$FEEDBACK.EXE application as shown below:
$ DIRECTORY/SIZE CPAGE:[000000]PAGEFILE.SYS
Directory CPAGE:[000000]
PAGEFILE.SYS;1 400000
Total of 1 file, 400000 blocks.
$ RUN SYS$SYSTEM:AGEN$FEEDBACK
$ SEARCH SYS$SYSTEM:AGEN$FEEDBACK.DAT PAGEFILE
PAGEFILE1_NAME = "CPAGE:[000000]PAGEFILE.SYS;1"
PAGEFILE1_PEAK = 394173
According to AGEN$FEEDBACK.EXE, the pagefile was over 50% full at
one time.
Note:
If more than 1 pagefile is installed subsequent pagefiles
will be listed as "PAGEFILE2_", "PAGEFILE3_", etc.
SOLUTION 1:
Ensure all pagefiles on the system have 50% or more free space.
Methods for increasing pagefile space are listed below:
1. If you need to expand the pagefile located in SYS$SYSTEM, use
the command procedure SWAPFILES.COM in SYS$UPDATE. If you
receive an error message 'File Header is too full', you must do
an IMAGE BACKUP and RESTORE to compress the disk.
2. Use the SYSGEN Utility to create or expand any pagefiles as
follows:
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> CREATE disk[directory]:pagefile_name/SIZE=<value>
The <value> you specify in the SYSGEN CREATE command depends on
the needs of the system to maintain 50% free space in a pagefile.
The maximum size of any single pagefile should be restricted to
1,048,575 blocks.
Using the output from the SHOW MEMORY/FILES command in CAUSE 1,
you see that 394,173 pages of the pagefile were consumed.
Multiplying this value by 2 will help determine the requirements
for pagefile space on your system. For example:
Total - Free = Consumed
399992 - 5819 = 394173
Consumed * 2 = Required
394173 * 2 = 788346
The calculated "Required" value will be the value specified with
the /SIZE qualifier.
If there are multiple pagefiles on the system, then all pages
consumed will be the base for the calculation. Distribute the
amount determined from the "Pagefile_increase" calculation to
all pagefiles for performance load purposes. For example:
Required - Total = Pagefile_increase
788346 - 399992 = 388354
Divide the "Pagefile_increase" by the number of pagefiles
installed. Then add this value to the "Total" of each pagefile
to be used as the value for the /SIZE qualifier. For example,
assuming the system has 3 pagefiles that are 100000 blocks, the
total pagefile space will be 300000 blocks. SHOW MEMORY/FILES
shows a total of 100000 blocks "Free". Therefore, the
following calculation would be made:
Total - Free = Consumed
300000 - 100000 = 200000
Consumed * 2 = Required
200000 * 2 = 400000
Required - Total = Pagefile_increase
400000 - 300000 = 100000
Pagefile_increase
----------------------------- = Increase_Amount_per_pagefile
Number_of_Pagefiles_Installed
100000
------ = 33000 (Approx.)
3
The /SIZE qualifier for the CREATE command in SYSGEN will have
a value of 133000. If multiple files of dissimilar size are
installed, apply the "Increase_Amount_per_pagefile" proportionally
to the size of the pagefile.
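The sizing arithmetic above can be sketched as a small calculation.
This is an illustration under stated assumptions, not a supported
procedure: the function names are hypothetical, the increase is spread
in proportion to each file's current size, and integer division rounds
each share down.

```python
from typing import List


def required_pagefile_space(total: int, free: int) -> int:
    """Required = Consumed * 2, where Consumed = Total - Free."""
    return (total - free) * 2


def new_pagefile_sizes(current_sizes: List[int], free: int) -> List[int]:
    """Return a /SIZE value for each installed pagefile.

    current_sizes: size of each installed pagefile
    free:          combined "Free" figure across all pagefiles
    """
    total = sum(current_sizes)
    # Pagefile_increase = Required - Total (never shrink a file).
    increase = max(required_pagefile_space(total, free) - total, 0)
    # Share the increase in proportion to each file's current size.
    return [size + (increase * size) // total for size in current_sizes]


if __name__ == "__main__":
    # Single pagefile from the CAUSE 1 display.
    print(new_pagefile_sizes([399992], 5819))
    # Three pagefiles of 100000 with 100000 free in total.
    print(new_pagefile_sizes([100000, 100000, 100000], 100000))
```

For the single pagefile this gives 788346. For the three-pagefile case
it gives about 133333 per file; the worked example above rounds the
per-file increase down to 33000. Any single result above 1,048,575
blocks should be split across additional pagefiles.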
Note:
When using the SYSGEN CREATE command to extend an existing
pagefile, SYSGEN will extend the size of the file, but
the system must be rebooted so that OpenVMS can map the new
blocks as pagefile space. If the pagefile does not exist,
SYSGEN will create the new file, but you must then use
the SYSGEN INSTALL command to map the new file as a pagefile.
For example, to install the file PAGEFILE2.SYS, just created
on DUA2: in the [000000] directory, use the following command:
SYSGEN> INSTALL DUA2:[000000]PAGEFILE2.SYS/PAGEFILE
Remember, the number of page files that can be installed is
controlled by the SYSGEN parameter 'PAGFILCNT'. If you
create new pagefiles, ensure that this parameter is set
correctly.
CAUSE 2:
Processes are pagefaulting heavily. How often a process pagefaults
and the speed at which each pagefault is resolved depend on the
environment the process is running under. What might be considered
heavy pagefaulting on one system may be very acceptable on another.
Heavy pagefaulting can occur for several reasons. Most of these
reasons have the same general approach and some are normal and
temporary reasons for the process to be in certain resource wait
states.
Conditions that cause processes to go into RWMPB are listed below:
Condition A
-----------
MPL is at or above MPW_WAITLIMIT. The process pagefaults removing a
page from its working set placing it on the MPL. The following is
displayed from a SHOW MEMORY command:
$ SHOW MEMORY/PHYSICAL
System Memory Resources
Phy Mem Use (pages): Total Free In Use Modified
Main Mem (256.00Mb) 524288 35237 472274 21102
^
|
+---------------------------------------------------------------+
| $ MCR SYSGEN
| SYSGEN> SHOW MPW_WAITLIMIT
| Param Name Current Default Min.
| ---------- ------- ------- -------
| MPW_WAITLIMIT 21091 620 0 .....
| ^
| |
+-----------------------+
|
+----< The current size of the MPL is greater than the MPW_WAITLIMIT
parameter in SYSGEN.
Condition B
-----------
MPL is at or above MPW_LOWAITLIMIT and the SWAPPER's Modified Page
Writer is currently writing out pages from the MPL. The process
pagefaults and must remove a page from its working set to place it
on the MPL.
Condition C
-----------
A process is doing a pagefault and must remove a page from the
working set to pagefault another page into the working set. The
page it removes is a Process Page Table page that points to other
pages on the MPL.
Note:
The process is put into RWMPB state waiting for the SWAPPER
to wake up and write these pages out to the pagefile. This
mechanism is known as "Dead Page Table Scan". The memory
location PMS$GL_DPTSCN is incremented every time this occurs.
After OpenVMS v5.4, processes may remain in a RWMPB state from
zero to two seconds as the SWAPPER checks for this state every
other second. The only way to make this occur less often is to
reduce the process pagefault rate.
On Pre-OpenVMS v5.2 systems, processes were placed in RWMPE while
the entire contents of the MPL was flushed.
SOLUTION 2:
One way to reduce the chance of a process being put into RWMPB/RWMPE
state is to reduce process pagefaulting. To do this, use one of
the methods listed below:
1. Avoid the use of System Services '$PURGWS' and '$ADJWSL' which
can force pages out of the working set. When this occurs, the
process is then forced to fault these pages back in.
2. Avoid excessive image or process activations since the process'
working set for process space must start over again for each
image or process getting started.
3. Avoid the use of Automatic Working Set Decrementing by setting
the PFRATL SYSGEN parameter to the default value of 0.
4. On memory constrained systems, increase physical memory and
allow processes to have larger working set sizes. This will
reduce the overall pagefault rate.
5. Adjust the process working set sizes: WSDEFAULT, WSQUOTA,
WSEXTENT and the WSMAX SYSGEN parameter to allow any memory
limited processes that are still pagefaulting heavily to get a
larger working set size. This is assuming there is sufficient
physical memory to allow the increase.
6. Adjust the SYSGEN parameters listed below to allow the working
set size to be adjusted more frequently and more generously:
Note:
Actual settings could vary from these suggested values.
o lower QUANTUM from 20 to 10 (or 5 if a very fast CPU)
o lower AWSTIME to QUANTUM
o lower PFRATH from 120 to 60
o increase WSINC from 150 to 403 (or higher if a memory rich
system)
o verify PFRATL is set to 0
7. Increase the "Buffer Zone" that the SWAPPER uses to determine
whether or not to place a process in this state. This is
defined in SOLUTION 7.
8. If the problem is occurring during high image activations, then
increasing PQL_MWSDEFAULT may alleviate the problem.
Use the following formula for calculating the lower threshold
for PQL_MWSDEFAULT.
PQL_MWSDEFAULT = ((VIRTUALPAGECNT+WSMAX)/512)+10
If PQL_MWSDEFAULT's new value is greater than PQL_MWSQUOTA and/or
PQL_MWSEXTENT, then these parameters should be increased.
Note:
The system must be rebooted, in order for changes to the
PQL_MWSDEFAULT parameter to take effect.
9. If the problem is occurring during boot and the STARTUP process
is spending extensive amounts of time in RWMPB, then you should
increase PQL_MWSDEFAULT to a value higher than
(VIRTUALPAGECNT+WSMAX)/512.
10. If the system is configured with a high PQL_MWSEXTENT and
PQL_DWSEXTENT (typically only seen on large memory systems,
i.e., 3 gigabytes or more), lower these values to 150000 or
lower.
Note:
The PQL_DWSEXTENT and PQL_MWSEXTENT are both dynamic
parameters. The new value's effectiveness may be
tested on the running system.
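The PQL_MWSDEFAULT threshold from item 8 above can be worked through
as follows. This is a sketch only: the function names and the sample
parameter values are hypothetical, and the division is assumed to be
integer division.

```python
from typing import List


def pql_mwsdefault_floor(virtualpagecnt: int, wsmax: int) -> int:
    """Lower threshold: ((VIRTUALPAGECNT + WSMAX) / 512) + 10."""
    return (virtualpagecnt + wsmax) // 512 + 10


def parameters_to_raise(new_default: int, pql_mwsquota: int,
                        pql_mwsextent: int) -> List[str]:
    """List the parameters that are smaller than the new PQL_MWSDEFAULT."""
    names = []
    if new_default > pql_mwsquota:
        names.append("PQL_MWSQUOTA")
    if new_default > pql_mwsextent:
        names.append("PQL_MWSEXTENT")
    return names


if __name__ == "__main__":
    # Hypothetical values; read the real ones with SYSGEN SHOW.
    floor = pql_mwsdefault_floor(virtualpagecnt=200000, wsmax=16384)
    print(floor)
    print(parameters_to_raise(floor, pql_mwsquota=400, pql_mwsextent=1000))
```

With these sample values the threshold is 432, and only PQL_MWSQUOTA
(400 in the sample) would need to be raised.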
CAUSE 3:
The system is swapping and there is insufficient swapfile space. If
a process must be outswapped from memory, the SWAPPER will attempt to
find enough contiguous space in a swapfile. If it can not find the
space in a swapfile, it will try to outswap the process to the
pagefile. This takes more pagefile space. (See CAUSE 1)
If the SWAPPER can not find contiguous swap space, it can take
considerable amounts of CPU time searching both swap and pagefiles.
This will further delay both the system and processes in RWMPB state.
For example, the following SHOW SYSTEM command shows a process that
is in a RWMPB state and is outswapped:
$ SHOW SYSTEM
VAX/VMS V5.4 on node ROTTIE 22-OCT-1993 13:09:25.77 Uptime 3 05:58
  Pid    Process Name  State  Pri   I/O      CPU      Page flts  Ph.Mem
24000401 SWAPPER       HIB     16     0  00:10:58.67          0       0
24000406 CONFIGURE     HIB     10    93  00:07:07.14        173     259
24000409 IPCACP        HIB     10     7  00:00:01.04        121     166
2400040C HIENZ_57      RWMPB    4     6  00:00:00.39         84     215
24000410 GOLDEN        RWMPB    4     6  00:00:01.20        182     340
24000416 DOBBIE        LEF      4   184  00:08:55.08      29166     923
24000419 CORGIE        RWMPB    4   174  00:11:59.22      20328     471
2400041C SHEPARD       LEFO     9       -- swapped out --           461
24000420 P_BULL        LEFO     9       -- swapped out --           461
24000421 LASSA_A       HIBO     9       -- swapped out --           285
                         ^
                         |
                         +--- Processes in both RWMPB and outswapped.
SOLUTION 3:
Ensure there is sufficient swapfile space. The general rule is to
keep the swapfile at least 30% free on a swapping system and to make
the swapfile large enough so that the pagefile will not have to be
used for swapping.
CAUSE 4:
The pagefile(s) are on very busy disks, or disks are shared in a
cluster environment and together all the nodes are keeping the disk
very busy. Processes may remain in RWMPB state slightly longer than
normal due to disk contention waiting for the I/Os to complete.
To determine if the pagefiles are on very busy disks, use the
MONITOR DISK command. If this is a clustered environment, use the
MONITOR CLUSTER command. You can also use other performance tools
available such as DECPS.
Note:
If the pagefile is on a device that can not be accessed, this
problem will also occur. Perform the following steps prior to
any other performance tuning:
1. If a SHOW ERROR command shows errors on a device that
contains pagefiles, address this problem first.
2. The modified page writer has a limited number of IRPs.
The number of IRPs is defined by the SYSGEN parameter
MPW_IOLIMIT with a default of 4. It can only have
MPW_IOLIMIT I/Os outstanding.
To determine how many I/Os the modified page writer
has issued, execute the following and subtract the
"number of elements" from the value of MPW_IOLIMIT:
$ ANALYZE/SYSTEM
SDA> READ/EXECUTIVE
SDA> EXAMINE PAGE_MANAGEMENT+4B70 ! For OpenVMS 5.5 - 5.5-2
SDA> EXAMINE PAGE_MANAGEMENT+638C ! For OpenVMS 6.2 - 7.0
This is the listhead (MPW$GL_IRPFL) of the pre-allocated
I/O Request Packets (IRPs) for the modified page writer.
SOLUTION 4:
If the pagefile(s) are on very busy disks consider moving the file
to a less active volume. If the disk is in a cluster and all nodes
share the same disk for pagefiles, consider giving each node in the
cluster its own disk to use for its pagefile(s).
CAUSE 5:
The Modified Page Writer may have too many pages to flush at one
time.
SOLUTION 5:
Make sure that MPW_LOWAITLIMIT is not significantly lower than
MPW_HILIMIT. By default AUTOGEN will set up MPW_LOWAITLIMIT to
equal 'MPW_HILIMIT - MPW_WRTCLUSTER'. Check to see if
MPW_LOWAITLIMIT is hard coded in MODPARAMS.DAT to a value other
than the AUTOGEN default.
If MPW_LOWAITLIMIT is significantly lower than MPW_HILIMIT, the
Modified Page Writer may have to write out more pages than
necessary before freeing processes in a RWMPB state.
Note:
As the DECps rule R0240 suggests (see DECPS area in DETAILED
ANALYSIS section), you could also try reducing disk contention
by tuning the MPL to write out fewer pages when it is cleaning
up the list.
If the MPW parameters are hardcoded in SYS$SYSTEM:MODPARAMS.DAT,
you might first consider commenting them out and rerunning
AUTOGEN to retune the parameters. If the results are not
sufficient, then you could look into raising MPW_LOLIMIT to a
higher value to reduce the number of pages the SWAPPER must
write out. However, never raise MPW_LOLIMIT greater than
MPW_LOWAITLIMIT.
CAUSE 6:
The system dump is taking pages out of your pagefile. All of the
following items MUST be true in order for a system dump to be
consuming pages out of the pagefile:
1. The system does NOT have a SYS$SYSTEM:SYSDUMP.DMP file. To
determine if the system has a SYSDUMP.DMP file, issue the
following commands:
$ DIRECTORY SYS$SYSTEM:SYSDUMP.DMP
%DIRECT-W-NOFILES, no files found
2. The system does have a SYS$SYSTEM:PAGEFILE.SYS file as shown
below:
$ DIRECTORY/SIZE SYS$SYSTEM:PAGEFILE.SYS
Directory SYS$SYSROOT:[SYSEXE]
PAGEFILE.SYS;1 525000
3. The SYS$SYSTEM:PAGEFILE.SYS file is larger than the total size
of physical memory. To determine the size of physical memory,
issue the command 'SHOW MEMORY/PHYSICAL'. For example:
$ SHOW MEMORY/PHYSICAL
System Memory Resources
Phy Mem Use (pages): Total Free In Use Modified
Main Mem (256.00Mb) 524288 35237 472274 21102
Note:
In this example, the size of the pagefile (from the previous
command) is larger than the "Total" memory size displayed above.
4. The parameter 'SAVEDUMP' in SYSGEN is set to 1. To determine
this setting, issue the SYSGEN command below:
$ MCR SYSGEN
SYSGEN> SHOW SAVEDUMP
Parameter Name Current Default Min.
-------------- ------- ------- -----
SAVEDUMP 1 0 0 .........
Note:
To determine if your current pagefile contains a system dump,
issue the following command:
$ ANALYZE/CRASH SYS$SYSTEM:PAGEFILE.SYS
If you receive an 'SDA>' prompt from the above command, the
pagefile contains a system dump.
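Because all four conditions must hold at once, the check can be
summarized as a single boolean test. The sketch below is illustrative
only; the function and argument names are hypothetical, and the values
must be collected by hand from the DIRECTORY, SHOW MEMORY/PHYSICAL, and
SYSGEN displays shown above. It compares pagefile blocks directly with
physical pages, as the article does, which assumes the VAX page size of
512 bytes.

```python
def dump_may_use_pagefile(has_sysdump_file: bool,
                          has_system_pagefile: bool,
                          pagefile_blocks: int,
                          physical_memory_pages: int,
                          savedump: int) -> bool:
    """Return True when all four CAUSE 6 conditions are met."""
    return (
        not has_sysdump_file                         # 1. no SYSDUMP.DMP
        and has_system_pagefile                      # 2. PAGEFILE.SYS exists
        and pagefile_blocks > physical_memory_pages  # 3. pagefile > memory
        and savedump == 1                            # 4. SAVEDUMP set to 1
    )


if __name__ == "__main__":
    # Figures from the example displays: 525000 blocks, 524288 pages.
    print(dump_may_use_pagefile(False, True, 525000, 524288, 1))
```

With the example figures the result is true, so the pagefile is a
candidate for the ANALYZE/CRASH check described in the note above.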
SOLUTION 6:
If you receive an 'SDA>' prompt, locate enough free space on another
disk and COPY the contents of the system dump file for later analysis.
The following SDA commands can be used:
SDA> COPY ddcu:[save_crash]SAVEDUMP.DMP
SDA> EXIT
Once you have a copy of the system dump, or if you are not concerned
with the contents of the crash, you can free the space in the pagefile
with the following command:
$ ANALYZE/CRASH/RELEASE SYS$SYSTEM:PAGEFILE.SYS
CAUSE 7:
The SWAPPER process is pre-empted by a higher priority real time
process. The SHOW SYSTEM command will show processes with a
priority in the range of 16 - 31 ("Pri" field).
SOLUTION 7:
To fix this cause, choose one of the solutions listed below:
1. If you do run with real time processes, you may want to consider
raising MPW_WAITLIMIT to a higher value. This will allow the
SWAPPER more elapsed time to get to the MPL before processes are
put into RWMPB state.
To allow the SWAPPER more elapsed time to notice that the MPL is
above MPW_HILIMIT but below MPW_WAITLIMIT, raise MPW_WAITLIMIT.
This technique gives the SWAPPER a larger "Buffer Zone" before
placing processes in RWMPB.
One starting point might be:
MPW_WAITLIMIT = MPW_HILIMIT + ( 5 * MPW_WRTCLUSTER )
2. You might also consider whether or not processes running as real
time processes need to run at such a high priority. They may work
just as well in a priority range of 4 - 15.
CAUSE 8:
The SYSGEN parameter 'MPW_WAITLIMIT' is less than MPW_HILIMIT.
SOLUTION 8:
Ensure MPW_WAITLIMIT is greater than or equal to MPW_HILIMIT. If
not, processes could hang in RWMPB indefinitely or the system could
hang. By default, AUTOGEN will set MPW_WAITLIMIT to equal
'MPW_HILIMIT + MPW_WRTCLUSTER'.
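The MPW_WAITLIMIT relationships from SOLUTION 7 and SOLUTION 8 can be
checked with a short calculation. This is a sketch under assumptions:
the function names are hypothetical, and the sample values (MPW_HILIMIT
of 500, MPW_WRTCLUSTER of 120) are for illustration only; use the
values reported by SYSGEN on your own system.

```python
def autogen_default_waitlimit(hilimit: int, wrtcluster: int) -> int:
    """AUTOGEN default: MPW_HILIMIT + MPW_WRTCLUSTER."""
    return hilimit + wrtcluster


def suggested_waitlimit(hilimit: int, wrtcluster: int,
                        clusters: int = 5) -> int:
    """Starting point from SOLUTION 7: MPW_HILIMIT + (5 * MPW_WRTCLUSTER)."""
    return hilimit + clusters * wrtcluster


def waitlimit_is_safe(waitlimit: int, hilimit: int) -> bool:
    """SOLUTION 8: MPW_WAITLIMIT must be >= MPW_HILIMIT."""
    return waitlimit >= hilimit


if __name__ == "__main__":
    hilimit, wrtcluster = 500, 120   # sample values only
    print(autogen_default_waitlimit(hilimit, wrtcluster))
    print(suggested_waitlimit(hilimit, wrtcluster))
    print(waitlimit_is_safe(400, hilimit))
```

With the sample values, the AUTOGEN default is 620 and the larger
"Buffer Zone" starting point is 1100. A MPW_WAITLIMIT of 400 fails the
safety check because it is below MPW_HILIMIT.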
CAUSE 9:
The process is deleting virtual address space. If the virtual
address space deletion code is deleting a page with an outstanding
reference, it assumes that the SWAPPER is currently writing out
that page. The process will then pause in RWMPB.
SOLUTION 9:
Once the page has been written, the SWAPPER's Modified Page Writer
routine will free the process from this state.
CAUSE 10:
A rare problem may occur when processes remove pages off of the MPL
after another process has been placed into the RWMPB state.
Listed below is the detailed sequence of events that occur under this
condition:
1. MPL is greater than MPW_WAITLIMIT.
2. A process pagefaults a modified page to the MPL causing it
to go into RWMPB.
3. Before the SWAPPER wakes up to do its once a second checks,
other processes pagefault pages off the MPL causing it to
drop below MPW_HILIMIT.
4. The SWAPPER wakes up and sees the MPL below MPW_HILIMIT. It
assumes it has no work to do leaving the process described
in '#2' above in a RWMPB state.
Note:
This problem is usually seen on systems with multiple real
time processes which are at a priority higher than the
SWAPPER (priority 16). However, it can occur on any system
depending on the process pagefault behavior.
SOLUTION 10:
A process hung in a RWMPB state will be freed later by other
processes going in and out of the state. If this problem is
suspected, MPW_WAITLIMIT can be raised to reduce the chance of this
occurring. (See the description of increasing the "Buffer Zone"
defined in SOLUTION 7.)
If you continue to see processes hang in RWMPB state for extended
periods of time, you may increase the "Buffer Zone" by raising
MPW_WAITLIMIT up to 20000 pages higher than MPW_HILIMIT. This issue
has been forwarded to OpenVMS Engineering.
DETAILED ANALYSIS:
This section answers the following questions:
1. What are the process pagefault dynamics and working set size?
A process gets WSDEFAULT number of WSLEs in its working set at
process activation time. These WSLEs will be made valid as a
process pagefaults. Based on the pagefault rate of the process,
the working set will grow to WSQUOTA by WSINC increments at
periodic intervals.
Note:
A process is checked to determine the necessity for working
set growth at AWSTIME intervals. AWSTIME is expressed in
units of 10 milliseconds and its default value is 20. The
amount a process pagefaults in AWSTIME is compared to an
upper threshold for pagefaults per 10 CPU seconds defined
by the SYSGEN parameter 'PFRATH'. For example, if AWSTIME
is set to 20 and PFRATH 120, a process would get WSINC WSLEs
if it pagefaulted 3 pages in AWSTIME as shown in the chart
below:
AWSTIME set to 20 = 200 milliseconds
3 pagefaults in 200 milliseconds = 15 pagefaults per CPU second
15 pagefaults per CPU second = 150 pagefaults per 10 CPU seconds
150 > 120 (Value of PFRATH)
----------------------------------------------
Process gets WSINC number of WSLEs
If it is determined that the process has exceeded the
pagefault threshold (PFRATH) within the specified interval
(AWSTIME), it will receive WSINC WSLEs when it has validated
more than 75% of the WSLEs previously given to the process.
The process will be allowed to grow, incrementally, beyond WSQUOTA
to WSEXTENT if there are at least BORROWLIM number of pages on the
FPL. This initial growth will be in WSLEs. The WSLEs can only be
validated with an actual page from the FPL when there are at least
GROWLIM number of pages on the FPL.
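The AWSTIME/PFRATH comparison in the note above can be reproduced with
a few lines of arithmetic. This is a sketch of the calculation only,
not of the actual OpenVMS code; the function names are hypothetical.

```python
def faults_per_10_cpu_seconds(faults: int, awstime: int) -> float:
    """Scale the faults seen in one AWSTIME interval to 10 CPU seconds.

    AWSTIME is in units of 10 milliseconds, so the interval lasts
    awstime * 10 ms and 10 seconds (10000 ms) holds 1000 / awstime
    such intervals.
    """
    return faults * 1000 / awstime


def exceeds_pfrath(faults: int, awstime: int, pfrath: int) -> bool:
    """True when the scaled fault rate is above the PFRATH threshold."""
    return faults_per_10_cpu_seconds(faults, awstime) > pfrath


if __name__ == "__main__":
    # Values from the note: AWSTIME = 20, PFRATH = 120, 3 pagefaults.
    print(faults_per_10_cpu_seconds(3, 20))
    print(exceeds_pfrath(3, 20, 120))
```

For the values in the note the scaled rate is 150, which is above
PFRATH (120), so the process qualifies for WSINC additional WSLEs. With
only 2 pagefaults in the same interval the rate is 100 and the
threshold is not exceeded.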
2. What occurs when a process pagefaults?
When a process tries to pagefault another page into its working
set, the OpenVMS Pagefault Handler searches the process' current
working set list for an EMPTY entry or a VALID entry that is not
in the Translation Buffer Cache (TB Cache). It will only search
the SYSGEN parameter 'TBSKIPWSL' working set list entries.
If an EMPTY working set list entry (WSLE) is found, a check is
made to see if another page can be added to the working set.
There are two conditions that must be met to add a page to the
working set:
1. The current working set in use (Process Page Count + Global
Page Count (PPGCNT+GPGCNT)) must not be at the currently
allowed Working Set Size (WSSIZE).
2. If the WSSIZE is greater than WSQUOTA, the number of free
pages on the system's Free Page List (FPL) must be greater
than or equal to GROWLIM free pages.
If we can not add a new page to the working set at this time,
the search will continue looking for a page to replace in the
existing working set.
If a valid WSLE is found and the page it describes is not in
the TB Cache, then that page is chosen to be replaced.
If the TBSKIPWSL count has expired and no EMPTY or invalid TB
Cache entries have been found, the next valid page is chosen
to be replaced. Processes could be performing page replacement
even though they have free WSLEs available in their working set.
Note:
Some sites have increased TBSKIPWSL to try and reduce page
replacement. However, it is not recommended that TBSKIPWSL
be changed from its default value. In doing this, some sites
have caused system hangs.
If the process is doing page replacement and the page it is
replacing is a modified page (a page the process has written to,
such as a data or buffer page), then that page will be placed on
the MPL. When this occurs, a check is made and the process is
put into a RWMPB state if one of the following is true:
1. The MPL contains more pages than the SYSGEN parameter
'MPW_WAITLIMIT'.
2. The MPL contains more pages than the SYSGEN parameter
'MPW_LOWAITLIMIT' and the Modified Page Writer is active
writing modified pages to the pagefile.
By placing these processes into RWMPB state, OpenVMS gives the
SWAPPER's Modified Page Writer time to clean up the MPL. It also
stalls those processes that are pagefaulting heavily and adding
pages to the MPL. This gives CPU time to other processes on the
system. Generally, the RWMPB state occurs on a system whose MPL
has grown faster than it could be written.
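The two RWMPB conditions listed above reduce to a simple test. The
sketch below only models the rule as this article states it; the
function name is hypothetical, and the MPW_LOWAITLIMIT value in the
example is an assumed figure.

```python
def enters_rwmpb(mpl_pages: int, mpw_waitlimit: int,
                 mpw_lowaitlimit: int, writer_active: bool) -> bool:
    """Decide whether a process adding a modified page must wait.

    Condition 1: the MPL holds more pages than MPW_WAITLIMIT.
    Condition 2: the MPL holds more pages than MPW_LOWAITLIMIT and the
                 Modified Page Writer is currently writing pages.
    """
    if mpl_pages > mpw_waitlimit:
        return True
    return writer_active and mpl_pages > mpw_lowaitlimit


if __name__ == "__main__":
    # MPL size (21102) and MPW_WAITLIMIT (21091) are taken from the
    # displays under "Condition A" in CAUSE 2; 20000 is an assumed
    # MPW_LOWAITLIMIT.
    print(enters_rwmpb(21102, 21091, 20000, writer_active=False))
    print(enters_rwmpb(20500, 21091, 20000, writer_active=True))
    print(enters_rwmpb(20500, 21091, 20000, writer_active=False))
```

The first two calls return true (one for each condition). The third
returns false because the MPL is below MPW_WAITLIMIT and the Modified
Page Writer is idle.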
3. How is the MPL written?
The SWAPPER process wakes up once a second and checks to see
if the MPL is greater than MPW_HILIMIT. If so, it will start
writing some of the modified pages out to the pagefile until
the MPL reaches MPW_LOLIMIT (pre V5.2) or MPW_LOWAITLIMIT (V5.2
and above).
When the MPL drops below MPW_LOWAITLIMIT, the Modified Page
Writer will free those processes in a RWMPB state allowing them
to continue. Once a page has been written from the MPL, it is
placed on the Free Page List (FPL).
Parameters that control the MPL, and a diagram of the MPL are
listed below:
Parameters:
MPW_WAITLIMIT
Number of pages on the MPL that will cause a
pagefaulting process to be placed in RWMPB until
the next time the modified page writer writes the
list. The default is 'MPW_HILIMIT + MPW_WRTCLUSTER'.
Ensure that MPW_WAITLIMIT is greater than or
equal to MPW_HILIMIT so that a system deadlock does
not occur.
MPW_HILIMIT
Upper limit for the number of pages on the MPL that
causes the Modified Page Writer to write the list to
a pagefile. Use a maximum of 500 and 4% of MEMSIZE.
MPW_WRTCLUSTER
Number of pages to be written to the pagefile during 1
I/O transfer. The default is 120.
MPW_LOWAITLIMIT
Number of pages on the MPL at which processes in the
RWMPB state will be made computable. On VMS V5.2 and later
systems, this is the lower limit for the number of pages on
the MPL at which the Modified Page Writer stops writing
pages to the pagefile. The default is 'MPW_HILIMIT -
MPW_WRTCLUSTER'. Ensure that MPW_LOWAITLIMIT is greater
than or equal to MPW_LOLIMIT.
MPW_LOLIMIT
Lower limit for the number of pages on the MPL at which
the Modified Page Writer stops writing pages to the
pagefile on pre V5.2 systems. Use '3 * BALSETCNT', but no
more than 120.
MODIFIED PAGE LIST
+---------------------+
| |
MPW_WAITLIMIT --> +---------------------+
| |
| | <--- MPW_WRTCLUSTER
| |
MPW_HILIMIT --> +---------------------+
| |
| | <--- MPW_WRTCLUSTER
| |
MPW_LOWAITLIMIT -> +---------------------+
| |
| |
| |
| |
| |
| |
MPW_LOLIMIT --> +---------------------+
| |
+---------------------+
Parameters that control the Free Page List (FPL) are listed below:
Parameters:
BORROWLIM
Minimum number of pages that must be on the FPL before
the system will permit a process to grow past WSQUOTA
when doing automatic working set adjustment. Set to
'FREEGOAL + 25%'. If 'MMG_CTLFLAGS .NE. 0' then
'BORROWLIM = FREELIM'.
FREEGOAL
Number of pages to try and keep on the FPL. Use
maximum of a multiple of FREELIM and 1% of MEMSIZE.
If 'MMG_CTLFLAGS .NE. 0' then 'FREEGOAL = MEMSIZE/8,'
but stay between 1600 and 12000.
GROWLIM
Number of pages that must be on the FPL before a process
that is above WSQUOTA can add a page to its working set
during pagefault processing. Set to 'FREEGOAL - 1' so
that working set can be increased at every opportunity.
If 'MMG_CTLFLAGS .NE. 0' then 'GROWLIM = FREELIM'.
FREELIM
Minimum number of pages that must be on the FPL. Set
to 'BALSETCNT + 20', but stay between 32 and 150.
Note:
References to MMG_CTLFLAGS (proactive memory management)
are for Post- OpenVMS v5.4-2. This parameter is defined
in other database articles.
4. What are the reasons for RWMPE (Pre-OpenVMS v5.2)?
The RWMPE state indicates that the Modified Page Writer is trying
to 'EMPTY' or 'FLUSH' the MPL by writing all pages on the list out
to the pagefiles.
Starting with OpenVMS v5.2, the MPL is no longer flushed or
completely written to the pagefiles. Instead, it is selectively
scanned for pages to be written.
Note:
This new behavior is discussed in the "OpenVMS v5.2 Release
Notes" and in CAUSE 2 of this article under "Condition C".
There are 4 reasons to flush the MPL:
1) A process deletes a global section using the $DGBLSC system
service and that global section has pages on the MPL.
2) A process has been outswapped and its Process Header (PHD)
maps transition pages that are on the MPL.
3) OPCCRASH is run.
4) A process pagefaults a Page Table Page that references Page
Table Entries that are on the MPL. This is the most common
cause for flushing the MPL.
Every page in use by a process has a corresponding Page Table
Entry (PTE). PTEs are longwords that are stored in a process
page called a Page Table Page (PTP). PTPs are part of the user's
working set. OpenVMS takes certain steps to prevent the working
set from becoming consumed by PTPs.
If a pagefault occurs that requires page replacement (see
"Condition C" in CAUSE 2 above), and the page chosen to be
removed is a PTP, then the following steps are taken:
1) OpenVMS checks to see if there are any pages in the
working set that are defined by any PTE in the PTP selected
for replacement. If so, the page must be left alone. The
pagefault handler will look for another page to remove.
2) A calculation is made of the number of dynamic pages
available in the working set (pages not locked, not in the
process header, and not a PTP). If there are sufficient
dynamic pages available, then the pagefault handler will
leave this PTP alone and look for another page to remove.
3) If there are no pages on the FPL or MPL, referenced by the
PTEs in the PTP, then the PTP can be removed from the
working set.
4) If there are pages on the FPL but not on the MPL
(referenced by the PTEs in the PTP), then links are severed
between the process and the pages on the FPL. The PTP can
then be removed from the working set.
5) If there are pages on the MPL, referenced by the PTEs in the
PTP, they must be written to the pagefile before those links
can be broken.
To break this link the MPW parameters 'MPW_HILIMIT' and
'MPW_LOLIMIT' are temporarily set to zero and the SWAPPER process
is awakened. Once awakened, the SWAPPER will flush the entire
contents of the MPL. (The flush is forced due to the temporary
setting of the MPW parameters.) This is the only way to guarantee
that the page being referenced by the PTE in a PTP has been written
to a pagefile. Once written to the pagefile, the pages are placed
on the FPL and the link to the PTE in the PTP can be broken. While
this flush is occurring, the process will be placed in a RWMPE
state.
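The five checks above can be read as a single decision. The Python
sketch below is an illustration under stated assumptions, not the
pagefault handler's actual code: the enum, the function, and its
parameters are hypothetical names, and the inputs are assumed to have
been counted already by the caller.

```python
from enum import Enum


class PtpAction(Enum):
    KEEP = "leave the PTP alone and look for another page"
    REMOVE = "remove the PTP from the working set"
    UNLINK_FPL_THEN_REMOVE = "sever links to FPL pages, then remove the PTP"
    FLUSH_MPL_THEN_REMOVE = "flush the MPL (process waits in RWMPE), then remove"


def ptp_replacement_action(valid_pages_mapped: int,
                           enough_dynamic_pages: bool,
                           pages_on_fpl: int,
                           pages_on_mpl: int) -> PtpAction:
    """Choose what happens to a PTP selected for replacement (pre-V5.2)."""
    # Step 1: the PTP still maps pages that are in the working set.
    if valid_pages_mapped > 0:
        return PtpAction.KEEP
    # Step 2: enough dynamic pages exist, so another page is chosen.
    if enough_dynamic_pages:
        return PtpAction.KEEP
    # Step 5: pages on the MPL must be written before links are broken.
    if pages_on_mpl > 0:
        return PtpAction.FLUSH_MPL_THEN_REMOVE
    # Step 4: pages on the FPL only; links are severed directly.
    if pages_on_fpl > 0:
        return PtpAction.UNLINK_FPL_THEN_REMOVE
    # Step 3: nothing on either list references this PTP.
    return PtpAction.REMOVE
```

Only the FLUSH_MPL_THEN_REMOVE outcome corresponds to the RWMPE wait
described in this section.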
5. What causes a PAGEFRAG and PAGECRIT condition?
When the Modified Page Writer is invoked to write pages from the
MPL, it must first determine the number of contiguous blocks that
can be allocated from the pagefile.
The number of contiguous blocks it will attempt to allocate is
initially defined by the SYSGEN parameter 'MPW_WRTCLUSTER'. The
search for this initial allocation size is begun at the beginning
of the pagefile and proceeds to the end.
If the Modified Page Writer is unable to allocate MPW_WRTCLUSTER
number of contiguous pages, it will reduce this value by 16,
store it in PFL$B_ALLOCSIZ of the associated pagefile, and
re-attempt the allocation.
When the value PFL$B_ALLOCSIZ has been reduced to less than 16
and no contiguous pages have been found in the first 25% of the
pagefile, the PAGEFRAG console message is displayed.
If the value PFL$B_ALLOCSIZ is less than 16 and contiguous pages
have been found in the last 25% of the pagefile, the PAGECRIT
console message is displayed.
Note:
It is possible that neither the PAGEFRAG nor the PAGECRIT message
will be displayed even if all pagefile space is consumed.
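The shrinking allocation size described above can be sketched as
follows. This is a simplified Python model, not the Modified Page
Writer's code: the pagefile is represented as a hypothetical list of
booleans (True meaning a free block), the function names are invented
for illustration, and the sketch assumes the search simply stops once
the size falls below 16. It does not model the 25% tests that select
between PAGEFRAG and PAGECRIT.

```python
def find_free_run(free, length):
    """Return the start index of the first run of `length` free blocks,
    scanning from the beginning of the pagefile, or None if none exists."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == length:
            return i - length + 1
    return None


def allocate_cluster(free, mpw_wrtcluster, step=16):
    """Try MPW_WRTCLUSTER contiguous blocks, reducing the request by
    `step` after each failed search.

    Returns (start, size) on success, or (None, size) once the size has
    dropped below `step`, which is the point at which the article says a
    PAGEFRAG or PAGECRIT message may be displayed.
    """
    size = mpw_wrtcluster          # plays the role of PFL$B_ALLOCSIZ
    while size >= step:
        start = find_free_run(free, size)
        if start is not None:
            return start, size
        size -= step
    return None, size
```

With a pagefile in which every other block is in use, no request of 16
or more blocks can succeed, so the model falls through to the
(None, size) result.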
7. What are the indications in DECps V1.1 when processes are in a
RWMPB state?
DECps may trigger one of the following rules (R0240 or R0245)
when it finds processes in a RWMPB state so that the System
Manager can investigate possible problems:
{R0245} The following images were waiting because the Modified
Page Writer was busy.
This wait state can occur when a process tries to put a page on
the modified list and it is at MPW_WAITLIMIT. It also can occur
when a process tries to take a page back from the modified
list when page-writing is active and above MPW_LOWAITLIMIT.
The reason for this wait state is that the SYSGEN parameter
'MPW_WAITLIMIT' is less than the SYSGEN parameter 'MPW_HILIMIT'.
In this case, the system may hang. You should raise the SYSGEN
parameter 'MPW_WAITLIMIT' to at least be equal to the SYSGEN
parameter 'MPW_HILIMIT'. When changing the SYSGEN parameters,
always use AUTOGEN. However, you can also use SYSGEN to change
this parameter (MPW_WAITLIMIT) to affect the running system.
Total number of samples supporting this conclusion: ##
{R0240} The following images were waiting because modified page
writer was busy.
This wait state can occur when a process tries to put a page on
the modified list and it is at MPW_WAITLIMIT. It also can occur
when a process tries to take back a page from the modified list
when page-writing is active and above MPW_LOWAITLIMIT.
Typically, the reasons for this wait state are:
1. Insufficient page file space.
2. A real-time job may be blocking the SWAPPER process.
3. Heavy paging activity.
4. Too many pages have to be flushed at a time.
When the pagefile(s) are on shared disks in a cluster, tune the
swapper for smaller and more frequent writes from the modified
page list to avoid disk contention and longer delays. The SYSGEN
parameters 'MPW_HILIMIT' and 'MPW_LOLIMIT' determine the number
of pages which the SWAPPER will write to the pagefile(s).
Total number of samples supporting this conclusion: ##
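The condition behind rule R0245 (MPW_WAITLIMIT set below MPW_HILIMIT)
is easy to check against a list of parameter values. The Python sketch
below is illustrative only: the function name is hypothetical, and the
values are assumed to have been copied by hand from a SYSGEN display
into a dictionary. It does not read anything from a running system.

```python
def mpw_waitlimit_warnings(params):
    """Return warnings for the MPW_WAITLIMIT / MPW_HILIMIT relationship.

    `params` maps SYSGEN parameter names to integer values supplied by
    the caller.
    """
    warnings = []
    waitlimit = params["MPW_WAITLIMIT"]
    hilimit = params["MPW_HILIMIT"]
    if waitlimit < hilimit:
        warnings.append(
            "MPW_WAITLIMIT (%d) is less than MPW_HILIMIT (%d); "
            "raise MPW_WAITLIMIT to at least %d"
            % (waitlimit, hilimit, hilimit)
        )
    return warnings
```

An empty list means the relationship recommended by R0245 already
holds; it says nothing about the other causes listed under R0240.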
REFERENCES:
"VMS Internals and Data Structures", V5.2, 1990, EY-C171E-DP-ECG.
"Guide to Setting Up a VMS System", April 1988, (A-LA25A-TE), Chapter
6: "Performing AUTOGEN and SYSGEN Operations".
"VMS Version 5.2 Release Notes", June 1989, (AA-LB22B-TE),
Section 3.29: "Modified-Page Writer - Flushing of Modified_Page List
Eliminated", page 3-57.
"VMS Version 5.0 Release Notes", April 1988, (AA-LB22A-TE), Section
8.70: "VMS Executive -- Changes", pages 8-64 to 8-66.
RELATED ARTICLE(S):
Other articles in the OPENVMS database describe proactive memory
management, pagefile sizes, and inactive SWAPPER problems. These
articles can be found using search strings of:
"Maximum Size Installed Pagefile"
"Details Proactive Memory Reclamation MMG_CTLFLAGS"
"SWAPPER Inactive System Hang"
Adrian
Alex Daniels wrote:
>
> Here you go.
>
> Alex
>
http://h18000.www1.hp.com/support/asktima/operating_systems/CHAMP_SRC931013001093.html