Virtual Memory Swapping

Khien Mien Kennedy Chew

May 8, 1992, 11:10:32 AM

Hello,
I have a question about how swap space for virtual memory
is allocated on disk. When is the allocation done: at process creation
time, or when a page is actually swapped out (not likely)? How is
the swap space managed? Do different processes have their own areas
on disk within the swap space, or do they share the same large space?
Where can I find documentation about the above for different
Unix and non-Unix operating systems that support VM?

Thanks
Kennedy Chew

Barry Margolin

May 8, 1992, 9:28:05 PM

In article <l0l6f8...@oasis.cs.utexas.edu> ken...@cs.utexas.edu (Khien Mien Kennedy Chew) writes:
> I have a question about how swap space for virtual memory
>is allocated on disk. When is the allocation done (at process creation
>time or when a page is actually swapped out (not likely)?

On most Unix systems, swap space is mainly allocated when a process is
created. In addition, more swap space may be allocated to a process when it
allocates more virtual memory, e.g. by calling brk(2) (probably via
malloc(3)) or by creating a shared memory segment.
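
As a rough illustration of allocation at process creation and at brk time, here is a minimal Python sketch. It is a toy accounting model, not code from any Unix kernel; the class EagerSwap, its methods, and the page counts are all hypothetical.

```python
class SwapExhausted(Exception):
    """Raised when a swap reservation cannot be satisfied."""


class EagerSwap:
    """Toy model: every page of virtual memory reserves a swap page up front."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.reserved = {}  # pid -> pages reserved for that process

    def free_pages(self):
        return self.total_pages - sum(self.reserved.values())

    def _reserve(self, pid, pages):
        if pages > self.free_pages():
            raise SwapExhausted(f"pid {pid}: need {pages}, have {self.free_pages()}")
        self.reserved[pid] = self.reserved.get(pid, 0) + pages

    def fork(self, parent, child):
        # Assumption: the child reserves as much as the parent holds, at creation.
        self._reserve(child, self.reserved.get(parent, 0))

    def brk(self, pid, extra_pages):
        # Growing the data segment reserves more swap immediately.
        self._reserve(pid, extra_pages)

    def exit(self, pid):
        # Releasing a process returns its whole reservation to the pool.
        self.reserved.pop(pid, None)


swap = EagerSwap(total_pages=100)
swap.brk(1, 40)      # process 1 grows to 40 pages
swap.fork(1, 2)      # the child reserves another 40 pages
try:
    swap.brk(2, 30)  # only 20 pages are left, so this fails up front
    grew = True
except SwapExhausted:
    grew = False
```

In this model a request fails at reservation time, before any page is ever written to disk, which is the behavior described above.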

> How is
>the swap space managed? - different processes have their own spaces
>on disk within the swap space or do they share the same large space.

There are generally one or more large areas, called swap partitions, that
are configured for the system and shared by all processes. On versions of
Unix that permit memory-mapped files (i.e. the mmap(2) system call) you can
map a file and use it as a private swap area.
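
A small sketch of the mapped-file idea, using Python's standard mmap module rather than calling mmap(2) directly from C. The file is a temporary file and the four-page size is an arbitrary assumption; a shared, writable mapping means dirty pages go back to this file instead of to the system swap area.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
SIZE = 4 * PAGE  # hypothetical size of the private backing area

# Create a file of the desired size to act as the backing store.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, SIZE)

# ACCESS_WRITE gives a writable mapping whose changes reach the file.
area = mmap.mmap(fd, SIZE, access=mmap.ACCESS_WRITE)
area[0:5] = b"hello"
area[PAGE:PAGE + 5] = b"world"
area.flush()  # force the dirty pages out to the file

# Read the file back through ordinary I/O to see what landed on disk.
with open(path, "rb") as f:
    on_disk = f.read()

area.close()
os.close(fd)
os.remove(path)
```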

> Where can I find documentation about the above for different
>unix and non-unix operating systems that support vm?

There are many operating systems textbooks, and I'm sure most of them
discuss this. Sorry, I don't have any particular ones to recommend.

--
Barry Margolin
System Manager, Thinking Machines Corp.

bar...@think.com {uunet,harvard}!think!barmar

Larry McVoy

May 9, 1992, 6:11:35 PM

ken...@cs.utexas.edu (Khien Mien Kennedy Chew) writes:

: I have a question about how swap space for virtual memory
: is allocated on disk. When is the allocation done (at process creation
: time or when a page is actually swapped out (not likely)? How is
: the swap space managed? - different processes have their own spaces
: on disk within the swap space or do they share the same large space.

Allocation is done in fork(2) and in sbrk(2). Most Unix implementations
are very conservative as to when they allocate - they always preallocate
to make sure that if the process needs to be swapped it won't get killed
because there is no swap space. This is a bummer on systems that have
more memory than disk (such systems exist). SunOS 5.0 fixes this problem
to some extent by considering unbacked memory legit for swap space.
Allocation is delayed until the pages are pushed out, at which time the
swap device is asked to allocate space. If space is allocated, then the
page[s] is/are renamed from memory "backing" to swap device backing.
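
A toy Python model of the delayed scheme just described, for illustration only. It is not the SunOS implementation; LazySwap, touch, and push_out are hypothetical names, and the two-slot swap device is an arbitrary choice.

```python
class LazySwap:
    """Toy model: a page gets a swap slot only when it is pushed out."""

    def __init__(self, swap_slots):
        self.free_slots = list(range(swap_slots))
        self.backing = {}  # page id -> ("memory", None) or ("swap", slot)

    def touch(self, page):
        # A new anonymous page starts out backed only by memory.
        self.backing.setdefault(page, ("memory", None))

    def push_out(self, page):
        """Try to move a page to swap; return True if it is swap-backed."""
        kind, _ = self.backing[page]
        if kind == "swap":
            return True
        if not self.free_slots:
            return False  # no space: the page has to stay in memory
        slot = self.free_slots.pop(0)
        self.backing[page] = ("swap", slot)  # rename memory -> swap backing
        return True


vm = LazySwap(swap_slots=2)
for page in "abc":
    vm.touch(page)                              # no swap is consumed here
results = [vm.push_out(page) for page in "abc"]  # third push finds no slot
```

Nothing is reserved when a page is created, so three pages can exist with only two swap slots; the shortage shows up only when the third page is pushed.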

: Where can I find documentation about the above for different
: unix and non-unix operating systems that support vm?

Howard Chartok did this work in SunOS. You can send him mail at
how...@eng.sun.com and ask for his docs. The BSD book documents this
as does the Bach book.

Strong opinion on>

I think the way Unix does swapping is crazy. Swapping whole processes only
made sense when the process was small. Given that we can have 100MB
processes, swapping just doesn't make any sense at all. My opinion is that
the swapper should be tossed out. Use the pager instead, and teach the
pager how to do large I/O. Currently, the pager scans pages in random
order (physical page frame order, which has nothing to do with file page
order). The pager should page vnodes, not pages. The vnode should keep
statistics as to # of pages, activity, maybe contiguous ranges, etc,
etc. Then the pager can scan the (small) list of vnodes and quickly figure
out which is causing the problem and how much can be done to fix it.
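
To make the idea concrete, a minimal Python sketch of a pager that works per vnode. Everything here is an assumption for illustration: the Vnode fields, the victim-selection ratio in pick_victim, and the longest-run-first order in page_out are hypothetical and not taken from any existing pager.

```python
from dataclasses import dataclass, field


@dataclass
class Vnode:
    name: str
    resident: set = field(default_factory=set)  # file page numbers in memory
    recent_refs: int = 0                        # activity counter

    def contiguous_runs(self):
        """Group resident file pages into (start, length) runs."""
        runs = []
        for page in sorted(self.resident):
            if runs and page == runs[-1][0] + runs[-1][1]:
                runs[-1] = (runs[-1][0], runs[-1][1] + 1)
            else:
                runs.append((page, 1))
        return runs


def pick_victim(vnodes):
    # Hypothetical policy: most resident pages per recent reference.
    return max(vnodes, key=lambda v: len(v.resident) / (1 + v.recent_refs))


def page_out(vnode, want):
    """Evict up to `want` pages, longest runs first; return the I/Os issued."""
    ios = []
    for start, length in sorted(vnode.contiguous_runs(), key=lambda r: -r[1]):
        if want <= 0:
            break
        n = min(length, want)
        vnode.resident -= set(range(start, start + n))
        ios.append((start, n))  # one large I/O per contiguous range
        want -= n
    return ios


big = Vnode("big.dat", resident={0, 1, 2, 3, 10, 11}, recent_refs=1)
hot = Vnode("hot.db", resident={5, 6}, recent_refs=50)
victim = pick_victim([big, hot])
ios = page_out(victim, 5)
```

Here five pages leave memory in two I/Os in file page order, instead of five single-page writes in physical frame order.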

The swapper is busted.

The pager is stupid.

I claim that VM performance is not even within a factor of 10 of what it
could be if there were a properly implemented pager. Of course, the speedup or
slowdown is only seen when you are operating under the condition that is
known as "10 pounds of sh*t in a 5 pound bag".

opinion off>
---
Larry McVoy (415) 336-7627 l...@sun.com

J Anthony Fitzgerald

May 9, 1992, 7:42:40 PM

In article <l0ojgn...@appserv.Eng.Sun.COM> l...@slovax.Eng.Sun.COM (Larry McVoy) writes:
>
>Strong opinion on>
>
>I think the way Unix does swapping is crazy. Swapping whole processes only
>made sense when the process was small. Given that we can have 100MB
>processes, swapping just doesn't make any sense at all. My opinion is that
>the swapper should be tossed out. Use the pager instead, and teach the
>pager how to do large I/O. Currently, the pager scans pages in random
>order (physical page frame order, which has nothing to do with file page
>order). The pager should page vnodes, not pages. The vnode should keep
>statistics as to # of pages, activity, maybe contiguous ranges, etc,
>etc. Then the pager can scan the (small) list of vnodes and quickly figure
>out which is causing the problem and how much can be done to fix it.

Perhaps this is an area where UNIX can take a lesson from IBM's MVS
operating system ;-) The system administrator can choose to have swap
data sets or to perform swapping to the page data sets. The decision
depends on the mix of work in the installation. A primarily TSO workload
would indicate swapping to page data sets; however, when the installation
runs an application (like CICS) which is very sensitive to page delays, it
makes sense to have swap data sets and move the contention of TSO and
batch swapping away from the page volumes. In an ideal situation one
creates enough swap data sets so that the average working set of a TSO
session can be swapped out in parallel then (in theory) swapped in in
parallel to get the user going again with a reasonable approximation of
the working set in pretty close to a single revolution of the disks.

The most recent version of MVS/ESA also has some pretty sophisticated
paging algorithms to determine which pages should be removed from memory
to auxiliary storage. It even allows the application to indicate which
pages will not be required for a while and allows them to be block paged,
so that the group of pages will come in together when referenced. It can
make some pretty impressive improvements in performance of applications
which make sequential sweeps through large arrays.

Moreover, the MVS auxiliary storage manager does not have to pre-allocate
auxiliary storage to back the real storage of applications. Slots on
page or swap are allocated only when pages must be moved from frames of
real storage to disk. Not all is rosy, however: the operating system
makes an estimate of how much auxiliary storage each running application
will require, and when the total exceeds a threshold percentage of the
installed page space, new work will not be started, even though in a system
with large amounts of paging files the threshold can be unreasonable :-(
There are ways around this problem but IBM frowns on them :-)
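
The threshold check can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the MVS algorithm: the function can_start, the 70 percent threshold, and the slot counts are hypothetical, and it assumes the new work's estimate is counted in the total.

```python
def can_start(estimates, new_estimate, installed_slots, threshold_pct):
    """Admit new work only if estimated demand stays under the threshold."""
    projected = sum(estimates) + new_estimate
    # Integer comparison avoids floating-point rounding at the boundary.
    return projected * 100 <= installed_slots * threshold_pct


running = [400_000, 290_000]  # estimated slots for work already running
installed = 1_000_000         # slots across all page data sets

refused = not can_start(running, 20_000, installed, threshold_pct=70)
admitted = can_start(running, 5_000, installed, threshold_pct=70)
idle_slots = installed - sum(running)
```

With these numbers the 20,000-slot job is refused while 310,000 slots sit unused, which is the kind of case where a percentage threshold looks unreasonable on a system with a lot of paging space.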

Sorry, I know this is UNIX.wizards; however, a look at another world can
sometimes lead to better ideas for improvement at home.
--
J. Anthony Fitzgerald j...@UNB.ca (506) 453-4573
Computing Services UofNB Box 4400 (506) 453-3590 (FAX)
Fredericton, NB, CANADA, E3B 5A3