
History of copy on write


John Levine

Jan 21, 2011, 3:57:34 PM
For a project I've been doing, I've been trying to trace the history
of copy-on-write in operating systems. It's ubiquitous now, but it
took surprisingly long for a lot of systems to make their addressing
hardware support it. Most notably, the IBM 360/67 had virtual memory
in 1967, but IBM mainframes couldn't do copy on write until the mid
1990s.

I put it on my blog so I can update it as I fill in the gaps.
Comments welcome.

http://obvious.services.net/2011/01/history-of-copy-on-write-memory.html

Regards,
John Levine, jo...@iecc.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. http://jl.ly

Anne & Lynn Wheeler

Jan 21, 2011, 4:36:17 PM

John Levine <jo...@iecc.com> writes:
> For a project I've been doing, I've been trying to trace the history
> of copy-on-write in operating systems. It's ubiquitous now, but it
> took surprisingly long for a lot of systems to make their addressing
> hardware support it. Most notably, the IBM 360/67 had virtual memory
> in 1967, but IBM mainframes couldn't do copy on write until the mid
> 1990s.
>
> I put it on my blog so I can update it as I fill in the gaps.
> Comments welcome.
>
> http://obvious.services.net/2011/01/history-of-copy-on-write-memory.html

I'm pretty sure that unix under vm370 projects in the early to mid 80s
were supporting copy-on-write. It is also possible that the tss/370 SSUP
layer for AT&T unix in the early to mid 80s also supported
copy-on-write.

... a different kind of copy-on-write (from mid-70s):

also ... original 370 virtual memory architecture had provisions for
segment protect. when retrofitting virtual memory hardware to 370/165
ran into schedule problems ... several features were dropped from 370
virtual memory ... including segment protect. other 370 models and any
software had to drop back to working w/o the dropped features.

vm370 was implementing cms protected "shared" segments using the new
"segment protect" facility ... but when the feature was dropped ...
they had to fall back to a really funky/ugly hack using "storage protect
keys" (had to fiddle the virtual machine's psw and virtual storage
protect keys ... to something different than the virtual machine had
set).

in release 2, there was "virtual machine assist" microcode for 370/158
and 370/168 ... which had hardware support for some privileged
instructions to be directly executed in virtual machine mode. this
included lpsw and ssk instructions. however this was incompatible with
the storage key hack for shared segments ... so VMA couldn't be
activated for CMS users (with shared segments).

A hack was done for vm370 release 3 ... that allowed virtual machines to
be run w/o the storage protect hack (and with VMA active). Cross-user
integrity was preserved by having the system scan all shared pages
(whenever switching to different users) and if any were found to be
changed ... the previously running user had the changed page made
private ... and the remaining users then refreshed an unchanged page
from disk.

The trade-off was that the benefit of running with VMA turned on (and not
performing the storage protect key hacks) ... more than offset the
changed page scanning overhead. At the time that the decision was made
... typical CMS user ran with only 16 shared pages. However, also part
of vm370 release 3 ... was significant increase in the number of shared
pages that typical CMS user ran with (inverting the trade-off measures)
... aka the scanning/vma decision was made in isolation from changing
vm370 release 3 to have a lot more shared pages.

By the time the issue escalated ... it was claimed to be too late. Some
number of (cms intensive) customers had been advised of the upcoming
change to support VMA for CMS users ... and they had already purchased
the VMA hardware upgrade for 370/168 (at substantial price). Nobody was
willing to tell those customers 1) that they shouldn't actually run
CMS shared segment users with vma turned on and/or 2) shared segment CMS
wouldn't actually ship with VMA support (since it was no longer a
performance benefit).

This was also a pure single-processor environment ... when
multiprocessor support was shipped ... it was now necessary to have a
unique copy of all shared pages for every processor (since otherwise
more than one user could be running concurrently with the same shared
pages ... where either could corrupt the environment of the other). Now
in addition to
having to scan the ever increasing number of shared pages (before
switching to a different user) ... the next user to be dispatched had to
have its tables scanned so that they were pointing to the set that were
specific to the processor that the user would be executing on.

the increase in number of (vm370 release 3) shared pages ... came from a
small subset of changes that I had converted from cp67 to vm370 ... some
old email:
http://www.garlic.com/~lynn/2006v.html#email731212
http://www.garlic.com/~lynn/2006w.html#email750102
http://www.garlic.com/~lynn/2006w.html#email750430

some recent posts mentioning storage protect key hack &/or segment
protect being dropped from 370 virtual memory architecture (because of
370/165 hardware issues):
http://www.garlic.com/~lynn/2011.html#44 CKD DASD
http://www.garlic.com/~lynn/2011.html#74 shared code, was Speed of Old Hard Disks - adcons

--
virtualization experience starting Jan1968, online at home since Mar1970

EricP

Jan 21, 2011, 6:58:01 PM
Anne & Lynn Wheeler wrote:
> John Levine <jo...@iecc.com> writes:
>> For a project I've been doing, I've been trying to trace the history
>> of copy-on-write in operating systems. It's ubiquitous now, but it
>> took surprisingly long for a lot of systems to make their addressing
>> hardware support it. Most notably, the IBM 360/67 had virtual memory
>> in 1967, but IBM mainframes couldn't do copy on write until the mid
>> 1990s.
>>
>> I put it on my blog so I can update it as I fill in the gaps.
>> Comments welcome.
>>
>> http://obvious.services.net/2011/01/history-of-copy-on-write-memory.html
>
> I'm pretty sure that unix under vm370 projects in the early to mid 80s
> were supporting copy-on-write. It is also possible that the tss/370 SSUP
> layer for AT&T unix in the early to mid 80s also supported
> copy-on-write.

VMS had COW in version 1.0, 1979

> .... a different kind of copy-on-write (from mid-70s):
>
> <snip>


>
> A hack was done for vm370 release 3 ... that allowed virtual machines to
> be run w/o the storage protect hack (and with VMA active). Cross-user
> integrity was preserved by having the system scan all shared pages
> (whenever switching to different users) and if any were found to be
> changed ... the previously running user had the changed page made
> private ... and the remaining users then refreshed an unchanged page
> from disk.
>

We ported some software from VAX/VMS to VM/CMS about 1984
that used shared memory to communicate between processes.
The problem was getting VM to _NOT_ COW the writable shared segment pages.
I was told by the person who discovered the appropriate hack
that it came down to flipping a bit in the PTE.

Eric

Robert Myers

Jan 21, 2011, 8:25:32 PM
Thus, the self-important tone with which people post to comp.arch.

An EE professor at [east coast branch of Cal Tech] told me I was in the
wrong department. I'm sure the detail-minders here will see why he
was wrong.

Robert.


Anne & Lynn Wheeler

Jan 21, 2011, 8:30:37 PM

EricP <ThatWould...@thevillage.com> writes:
> We ported some software from VAX/VMS to VM/CMS about 1984
> that used shared memory to communicate between processes.
> The problem was getting VM to _NOT_ COW the writable shared segment pages.
> I was told by the person who discovered the appropriate hack
> that it came down to flipping a bit in the PTE.

re:
http://www.garlic.com/~lynn/2011.html#96 History of copy on write

original relational/sql was done in the 70s on vm370 (370/145) in
bldg. 28. Part of the internal extensions to vm370 for system/r was
"DWSS" ... dynamic writeable shared segments (communication between
multiple processes). In the early 80s, for a while DWSS was part of the
tech transfer from bldg. 28 to Endicott for SQL/DS ... but for whatever
reason, Endicott eventually decided to not use it.

misc. past posts mentioning system/r
http://www.garlic.com/~lynn/submain.html#systemr

misc. past posts mentioning DWSS:
http://www.garlic.com/~lynn/2000.html#18 Computer of the century
http://www.garlic.com/~lynn/2000b.html#55 Multics dual-page-size scheme
http://www.garlic.com/~lynn/2004f.html#23 command line switches [Re: [REALLY OT!] Overuse of symbolic
http://www.garlic.com/~lynn/2004f.html#26 command line switches [Re: [REALLY OT!] Overuse of symbolic
http://www.garlic.com/~lynn/2006t.html#16 Is the teaching of non-reentrant HLASM coding practices ever defensible?
http://www.garlic.com/~lynn/2006t.html#39 Why these original FORTRAN quirks?
http://www.garlic.com/~lynn/2006w.html#11 long ago and far away, vm370 from early/mid 70s
http://www.garlic.com/~lynn/2006y.html#26 moving on
http://www.garlic.com/~lynn/2007f.html#14 more shared segment archeology
http://www.garlic.com/~lynn/2009q.html#19 Mainframe running 1,500 Linux servers?

John Levine

Jan 21, 2011, 8:59:14 PM
>> in 1967, but IBM mainframes couldn't do copy on write until the mid
>> 1990s.
>> http://obvious.services.net/2011/01/history-of-copy-on-write-memory.html
>
>I'm pretty sure that unix under vm370 projects in the early to mid 80s
>were supporting copy-on-write. It is also possible that the tss/370 SSUP
>layer for AT&T unix in the early to mid 80s also supported
>copy-on-write.

VM/IX did something to handle fork calls efficiently, but unless they
had custom microcode for the 4361 or 4381, which I'm reasonably sure
they didn't, it had to be copy on touch. I know the RT/PC VRM was
copy-on-touch, I was there.

There was plenty of code that made pages read-only, but as far as I
can tell, until ESA/390 there wasn't any way to restart an instruction
that faulted due to trying to write to a read-only page.

R's,
John

John Levine

Jan 21, 2011, 9:10:51 PM
>We ported some software from VAX/VMS to VM/CMS about 1984 that used
>shared memory to communicate between processes. The problem was
>getting VM to _NOT_ COW the writable shared segment pages. I was
>told by the person who discovered the appropriate hack that it came
>down to flipping a bit in the PTE.

Not to cavil or anything, but the ESA/390 Principles of Operation (of
which I have a quaint paper copy) is quite clear that up through
ESA/370 a program couldn't restart after a fault due to writing
a read-only page. So I believe it did something, but unless the
manuals are lying 1984 IBM mainframes didn't have the hardware
ability to do COW. COT sure, but not COW.

R's,
John

Anne & Lynn Wheeler

Jan 21, 2011, 10:17:55 PM

John Levine <jo...@iecc.com> writes:
> Not to cavil or anything, but the ESA/390 Principles of Operation (of
> which I have a quaint paper copy) is quite clear that up through
> ESA/370 a program couldn't restart after a fault due to writing
> a read-only page. So I believe it did something, but unless the
> manuals are lying 1984 IBM mainframes didn't have the hardware
> ability to do COW. COT sure, but not COW.

re:

http://www.garlic.com/~lynn/2011.html#97 History of copy on write

until they put "page protect" into virtual memory architecture ... after
having dropped "segment protect" from original 370 virtual memory
(before even shipping) ... vm370 was doing protect with storage keys
(and then vm370 release 3 mechanism effectively did copy after write ...
anybody could write ... but if they did, they were given private copy)

(360/)370 storage key protect says "store protect" suppresses the instruction.

online esa/390
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/CCONTENTS?SHELF=EZ2HW125&DN=SA22-7201-04&DT=19970613131822

states some flavor of "segment protect" eventually shipped in 370 ...
but was dropped for 370/xa and replaced with "page protect"
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/F.1.4?SHELF=EZ2HW125&DT=19970613131822&CASE=

(new) esa/390 "suppression on protect" (for page protect) is useful for
aix/esa copy-on-write
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/3.4.5?SHELF=EZ2HW125&DT=19970613131822

Melinda's history mentions 370 finally having segment protect introduced
in 1982 (barely a year before 370/xa avail in early 83, which replaced
segment protect with page protect) ... melinda's pages moving:
http://web.me.com/melinda.varian/

a decade after 370 virtual memory announce
http://en.wikipedia.org/wiki/IBM_System/370

... with the corporation's continued efforts to kill off vm370 ... the
architecture became somewhat less general/"clean" and more & more
tailored around POK's favorite son operating system.

John Levine

Jan 21, 2011, 11:03:18 PM
>until they put "page protect" into virtual memory architecture ... after
>having dropped "segment protect" from original 370 virtual memory
>(before even shipping) ... vm370 was doing protect with storage keys
>(and then vm370 release 3 mechanism effectively did copy after write ...
>anybody could write ... but if they did, they were given private copy)

Ugh. Remember when that was? 1976?

>(360/)370 storage key protect says "store protect" suppresses the instruction.

In my 370 PoO, the table on page 6-15 and the text on pages 6-21 and
6-22 say that a fetch protect interrupt suppresses, but store protect
usually terminates.

R's,
John

Joe Pfeiffer

Jan 22, 2011, 1:42:16 AM
John Levine <jo...@iecc.com> writes:

Not clear why that would imply the hardware couldn't do it; just that
the software couldn't. I'm not familiar with the acronym COT?
--
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)

Morten Reistad

Jan 22, 2011, 4:42:06 AM
In article <ihdebb$2oo7$1...@gal.iecc.com>, John Levine <jo...@iecc.com> wrote:

>Not to cavil or anything, but the ESA/390 Principles of Operation (of
>which I have a quaint paper copy) is quite clear that up through
>ESA/370 a program couldn't restart after a fault due to writing
>a read-only page. So I believe it did something, but unless the
>manuals are lying 1984 IBM mainframes didn't have the hardware
>ability to do COW. COT sure, but not COW.

Ultrix was a laggard in this respect too. ISTR you could do shared
pages with some mmap-like functions, but all processes had pre-copied
pages, which ate memory.

-- mrr

nm...@cam.ac.uk

Jan 22, 2011, 4:28:43 AM
In article <1b8vydv...@snowball.wb.pfeifferfamily.net>,

Joe Pfeiffer <pfei...@cs.nmsu.edu> wrote:
>John Levine <jo...@iecc.com> writes:
>
>>>We ported some software from VAX/VMS to VM/CMS about 1984 that used
>>>shared memory to communicate between processes. The problem was
>>>getting VM to _NOT_ COW the writable shared segment pages. I was
>>>told by the person who discovered the appropriate hack that it came
>>>down to flipping a bit in the PTE.
>>
>> Not to cavil or anything, but the ESA/390 Principles of Operation (of
>> which I have a quaint paper copy) is quite clear that up through
>> ESA/370 a program couldn't restart after a fault due to writing
>> a read-only page. So I believe it did something, but unless the
>> manuals are lying 1984 IBM mainframes didn't have the hardware
>> ability to do COW. COT sure, but not COW.
>
>Not clear why that would imply the hardware couldn't do it; just that
>the software couldn't. I'm not familiar with the acronym COT?

Er, the System/360 and successors were and are not RISC systems,
and not all non-restartable instructions were microcoded. As an
absolute statement, the manuals and John are definitely right,
though I cannot now remember if the insoluble problems occurred
only for privileged instructions and/or the vector extension.

It's a long time since I wrote System/370 interrupt handlers, but
almost all unprivileged instructions could be retried after even a
store protect, though the software to do it was non-trivial.

Taking a look at the green card (POP is in my office), which was
by then a yellow book, the only doubtful ones were the decimal
ones and adjuncts (including PACK etc.) and vector ones (which I
never used). MVC/MVCL etc. were not problems.

That does, of course, assume that the 0C4 interrupt was precise,
which I believe that it was in practice; the 0C5 was wildly
imprecise, and therein hangs a tale (or three) :-)


Regards,
Nick Maclaren.

nm...@cam.ac.uk

Jan 22, 2011, 4:43:21 AM
In article <ihddli$2nb5$1...@gal.iecc.com>, John Levine <jo...@iecc.com> wrote:
>
>>> in 1967, but IBM mainframes couldn't do copy on write until the mid
>>> 1990s.
>>> http://obvious.services.net/2011/01/history-of-copy-on-write-memory.html
>>
>>I'm pretty sure that unix under vm370 projects in the early to mid 80s
>>were supporting copy-on-write. It is also possible that the tss/370 SSUP
>>layer for AT&T unix in the early to mid 80s also supported
>>copy-on-write.
>
>VM/IX did something to handle fork calls efficiently, but unless they
>had custom microcode for the 4361 or 4381, which I'm reasonably sure
>they didn't, it had to be copy on touch. I know the RT/PC VRM was
>copy-on-touch, I was there.

Not necessarily. In the early days, even VM/CMS had some pretty
severe restrictions on the applications codes it would run, and
most compilers didn't generate the problem instructions. For
example, Fortran needed only a few changes to the run-time system
to support CMS (or, indeed, COW).

>There was plenty of code that made pages read-only, but as far as I
>can tell, until ESA/390 there wasn't any way to restart an instruction
>that faulted due to trying to write to a read-only page.

It was done in the 1960s and 1970s, but the key was that it relied
on the program not doing one of the things that couldn't be fixed
up. That wasn't as hard as it sounds, for someone with the skills
of that date (now largely forgotten).


However, back to the history of COW as such. I should be surprised
if it much predated the rise of Unix, because it isn't particularly
useful in the absence of Unix's horrible fork/exec model. Indeed,
it is possible to say that its ONLY purpose is an optimisation to
cover up the defects of the ghastly fork/exec design.

The main vendor I would look at for early traces would be DEC,
followed by Prime and the others of that ilk. I doubt that it
would have been implemented by mainframe vendors - or, indeed,
any company that wasn't associated with Unix in some way.


Regards,
Nick Maclaren.

Edward Feustel

Jan 22, 2011, 6:21:44 AM
John,
You might look at our patent 4,812,981, Mar 14, 1989 assigned to
Prime Computer.
Ed Feustel


Rob Warnock

Jan 22, 2011, 6:25:29 AM
<nm...@cam.ac.uk> wrote:
+---------------

| However, back to the history of COW as such. I should be surprised
| if it much predated the rise of Unix, because it isn't particularly
| useful in the absence of Unix's horrible fork/exec model. Indeed,
| it is possible to say that its ONLY purpose is an optimisation to
| cover up the defects of the ghastly fork/exec design.
+---------------

I'm not going to argue with the Unix's fork/exec model being perhaps
the *first* major justification of COW, but...

Another very important use was in "zero-copy DMA" output, wherein when
the user program writes a (possibly large, optimally page-aligned and
page-multiple-size) buffer the operating system simply tags the pages COW,
starts the I/O, and returns to the user. In the fast-path case, the user
will not try to write that particular buffer (that is, the COW-flagged
pages) until the output is complete, whereupon the I/O completion handler
would de-COW the pages again. But if the user *did* try to write the
buffer's pages before the I/O was complete, then the COW would be triggered
and a fresh set of kernel pages would be allocated to the user, the
contents of the old set copied to them, and the user's page table
(and TLB) updated to point to the new copy. [In the latter case the
I/O completion handler would be responsible for deallocating the
original buffer pages.]
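
[A minimal C sketch of the mark-COW-on-output scheme described above;
all structure and function names here are invented stand-ins for
whatever primitives a real kernel provides.]

struct page;
struct io_req { struct page **pages; int npages; };

extern void set_cow(struct page *p);         /* write-protect + mark COW */
extern void clear_cow(struct page *p);       /* undo the marking */
extern int  page_was_copied(struct page *p); /* did a COW fault hit it? */
extern void free_page(struct page *p);
extern void start_dma(struct io_req *r);     /* device DMAs from the pages */

/* write path: tag the user's buffer pages COW, start the I/O, and
   return to the user before the transfer completes */
void zero_copy_write(struct io_req *req)
{
    for (int i = 0; i < req->npages; i++)
        set_cow(req->pages[i]);
    start_dma(req);
}

/* completion path: the fast case just de-COWs in place; if the user
   wrote mid-I/O and was given a private copy, free the originals */
void io_complete(struct io_req *req)
{
    for (int i = 0; i < req->npages; i++) {
        if (page_was_copied(req->pages[i]))
            free_page(req->pages[i]);
        else
            clear_cow(req->pages[i]);
    }
}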

At SGI in the late 1980s & early 1990s, we were able to get *enormous*
performance improvements in certain types of I/O -- particularly TCP
network traffic -- with this mark-COW-on-output approach, which avoids
[in the fast-path case] CPU-based copying of user data into kernel buffers.
This was especially important since the MIPS CPU's maximum block-copying
rate was *far* below the maximum DMA rate.

We also used "page-flipping" on input [where possible], which is sort
of the reverse of the above, but requires slightly different algorithms
and constraints on the buffers. [E.g., page-aligned and page-multiple-
sized buffers were not just "desirable", but *required*.]

[See also SGI's Direct I/O mode for the XFS filesystem, which added
the O_DIRECT file status flag, and was even *more* restrictive on
the user program than the above, but similarly avoided CPU-based
copying of user data from/to kernel filesystem buffers. I say "more
restrictive" since, unlike the TCP case, COW was *not* used -- altering
output data before the "write()" completed was "undefined", and could
corrupt data.]


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

nm...@cam.ac.uk

Jan 22, 2011, 6:07:31 AM
In article <-cidndrm6Ps0I6fQ...@speakeasy.net>,

Rob Warnock <rp...@rpw3.org> wrote:
>|
>| However, back to the history of COW as such. I should be surprised
>| if it much predated the rise of Unix, because it isn't particularly
>| useful in the absence of Unix's horrible fork/exec model. Indeed,
>| it is possible to say that its ONLY purpose is an optimisation to
>| cover up the defects of the ghastly fork/exec design.
>
>I'm not going to argue with the Unix's fork/exec model being perhaps
>the *first* major justification of COW, but...
>
>Another very important use was in "zero-copy DMA" output, wherein when
>the user program writes a (possibly large, optimally page-aligned and
>page-multiple-size) buffer the operating system simply tags the pages COW,
>starts the I/O, and returns to the user. In the fast-path case, the user
>will not try to write that particular buffer (that is, the COW-flagged
>pages) until the output is complete, whereupon the I/O completion handler
>would de-COW the pages again. But if the user *did* try to write the
>buffer's pages before the I/O was complete, then the COW would be triggered
>and a fresh set of kernel pages would be allocated to the user, the
>contents of the old set copied to them, and the user's page table
>(and TLB) updated to point to the new copy. [In the latter case the
>I/O completion handler would be responsible for deallocating the
>original buffer pages.]

Grrk. I would question the "very important", because the number of
programs dominated by the output of large buffers is small, and
that trick does not work for input (which is almost always more
important) and is counter-productive for small buffers. A useful
trick if you already have COW, yes, but not one worth implementing
COW for.


Regards,
Nick Maclaren.


Terje Mathisen

Jan 22, 2011, 7:28:29 AM
Rob Warnock wrote:
> [See also SGI's Direct I/O mode for the XFS filesystem, which added
> the O_DIRECT file status flag, and was even *more* restrictive on
> the user program than the above, but similarly avoided CPU-based
> copying of user data from/to kernel filesystem buffers. I say "more
> restrictive" since, unlike the TCP case, COW was *not* used -- altering
> output data before the "write()" completed was "undefined", and could
> corrupt data.]

I believe Win* has similar restrictions when doing unbuffered/async IO:

Page-aligned buffers, don't touch until the OS signals that the work is
done.

It seems rather obvious that this is the set of restrictions that allow
maximum performance.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


Paul A. Clayton

Jan 22, 2011, 12:22:17 PM
On Jan 22, 1:42 am, Joe Pfeiffer <pfeif...@cs.nmsu.edu> wrote:
> John Levine <jo...@iecc.com> writes:
[snip]

> > manuals are lying 1984 IBM mainframes didn't have the hardware
> > ability to do COW.  COT sure, but not COW.
>
> Not clear why that would imply the hardware couldn't do it; just that
> the software couldn't.  I'm not familiar with the acronym COT?

From context provided in Message-ID: <ihddli$2nb5$1...@gal.iecc.com>
COT is "Copy On Touch".

Paul A. Clayton
just a technophile

John Levine

Jan 22, 2011, 12:46:19 PM
>> Not to cavil or anything, but the ESA/390 Principles of Operation (of
>> which I have a quaint paper copy) is quite clear that up through
>> ESA/370 a program couldn't restart after a fault due to writing
>> a read-only page. So I believe it did something, but unless the
>> manuals are lying 1984 IBM mainframes didn't have the hardware
>> ability to do COW. COT sure, but not COW.
>
>Not clear why that would imply the hardware couldn't do it; just that
>the software couldn't. I'm not familiar with the acronym COT?

You're right, as Lynn noted, they could do it by scanning the page
change bits to find out after the fact that a page had been changed
and needed to be separated from other users of it.

R's,
John

EricP

Jan 22, 2011, 12:53:07 PM

I'm not familiar with Copy On Touch but it sounds
like that would copy all pages. That isn't necessary.

If the 370 could only recover from a PTE marked invalid,
but not from a write to a read-only page, then it could do a
mostly-copy-on-write. The PTE could be invalid but marked with
whether it should be read-only (shared) or read-write (private).
If you touch a read-only page then it binds to the shared frame.
If you touch a read-write page then you get a private copy.
The cost over true COW is copying read-write access pages that were
read but never actually written, which is probably not that much.
(It appears though that the copy/no-copy was controlled by
a bit in the PTE and not by the page protection bits).
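
[A C sketch of that mostly-copy-on-write touch fault, with invented
names: every PTE starts invalid, carrying a software bit that says
whether first touch binds the shared frame or makes a private copy.]

struct pte   { unsigned pfn, valid, writable, sw_private; };
struct frame { unsigned pfn; };

extern struct frame *alloc_frame(void);
extern void copy_frame(struct frame *dst, struct frame *src);

void touch_fault(struct pte *pte, struct frame *shared)
{
    if (!pte->sw_private) {
        pte->pfn = shared->pfn;     /* read-only page: bind shared frame */
        pte->writable = 0;
    } else {
        struct frame *priv = alloc_frame();
        copy_frame(priv, shared);   /* read-write page: private copy on
                                       first touch, even if never written */
        pte->pfn = priv->pfn;
        pte->writable = 1;
    }
    pte->valid = 1;                 /* the faulting access now retries */
}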

That would be 'good enough' given the original intent of
VM was to ease OS development, but wasteful for Unix fork.

Eric


Bakul Shah

Jan 22, 2011, 12:57:07 PM
On 1/22/11 3:25 AM, Rob Warnock wrote:
> <nm...@cam.ac.uk> wrote:
> +---------------
> | However, back to the history of COW as such. I should be surprised
> | if it much predated the rise of Unix, because it isn't particularly
> | useful in the absence of Unix's horrible fork/exec model. Indeed,
> | it is possible to say that its ONLY purpose is an optimisation to
> | cover up the defects of the ghastly fork/exec design.
> +---------------
>
> I'm not going to argue with the Unix's fork/exec model being perhaps
> the *first* major justification of COW, but...

Unix didn't have paging until 1979 (3BSD). 3BSD also added
vfork() which *didn't* use copy-on-write. IIRC, COW wasn't
used for another few years but at least by 1981 some of us
wondered why they didn't just use COW and do a `proper
fork()'! So the idea was in the air but it was based on what
was *already* available (per page "dirty" bit). So while "it
is _possible_ to say that its ONLY purpose is to ... cover up
the defects of the ghastly fork/exec design", it is not
accurate. COW's only purpose is to share data that is
modified infrequently but to me the more innovative part is
per-page dirty & access bits (& related traps). With them one
can do all sorts of interesting things, COW being one of
them.
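
[A small self-contained C illustration of building copy-on-first-write
out of exactly those per-page protection traps, using only POSIX
mprotect/sigaction; here the "copy" is a snapshot taken before the
first store is allowed through. Error handling omitted.]

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *page;      /* write-protected working page */
static char *snapshot;  /* receives the copy on first write */

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    memcpy(snapshot, page, getpagesize());                /* copy ...    */
    mprotect(page, getpagesize(), PROT_READ|PROT_WRITE);  /* ... on write */
}

int main(void)
{
    size_t sz = getpagesize();
    page = mmap(NULL, sz, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    snapshot = malloc(sz);

    struct sigaction sa = {0};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    page[0] = 'x';      /* faults once; the handler snapshots, then the
                           store retries and succeeds */
    printf("wrote '%c'; snapshot taken on first write\n", page[0]);
    return 0;
}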

When I first saw fork() I thought it was a beautiful design
choice! vfork() on the other hand is indeed ghastly :-)

John Levine

Jan 22, 2011, 1:00:32 PM
>Taking a look at the green card (POP is in my office), which was
>by then a yellow book, the only doubtful ones were the decimal
>ones and adjuncts (including PACK etc.) and vector ones (which I
>never used). MVC/MVCL etc. were not problems.

how about something like

 MVC 0(L,R),4(R)

where R is not word aligned and the data block crosses a page
boundary? Can you reliably assume that it stored everything up to
the boundary?

>That does, of course, assume that the 0C4 interrupt was precise,
>which I believe that it was in practice;

Yes, it was precise. I suppose that the response to some of the
arcane questions might have been Don't Do That. I certainly ran into
enough of those while trying to write programs that ran on TSS/360.

R's,
John


John Levine

Jan 22, 2011, 1:02:11 PM
>If the 370 could only recover from a PTE marked invalid,
>but not a write to a read-only page then they can do a
>mostly-copy-on-write. The PTE could be invalid but marked with
>whether it should be read-only (shared) or read-write (private).
>If you touch a read-only page then it binds to the shared frame.
>If you touch a read-write page then you get a private copy.

That's Copy on Touch, which applies only to R/W pages. Sharing read
only pages was easy and common on the /67 and everything since.

R's,
John

EricP

Jan 22, 2011, 1:14:59 PM
nm...@cam.ac.uk wrote:
>
> However, back to the history of COW as such. I should be surprised
> if it much predated the rise of Unix, because it isn't particularly
> useful in the absence of Unix's horrible fork/exec model. Indeed,
> it is possible to say that it's ONLY purpose is an optimisation to
> cover up the defects of the ghastly fork/exec design.

It is used when you want a mapped file page to come in
from disk, but not save any changes back to that file.
That is, initialized read-write program variables,
and relocated code.

Eric

John Levine

Jan 22, 2011, 1:23:10 PM
>However, back to the history of COW as such. I should be surprised
>if it much predated the rise of Unix, because it isn't particularly
>useful in the absence of Unix's horrible fork/exec model. Indeed,
>it is possible to say that it's ONLY purpose is an optimisation to
>cover up the defects of the ghastly fork/exec design.

Prepare to be surprised, since COW was in production use 15 years
before it showed up in Unix systems. As I said in the blog entry:

The earliest widely used copy on write implementation I'm aware of
was Tenex in 1972. One of the papers about Tenex says COW was
adapted from BBN Lisp on the SDS 940. Tenex ran on a PDP-10 modified
to add paged address hardware and larger physical addressing. The
main use of copy on write was to initialize writable data from
program files, for almost but not quite reentrant code, and large
data areas that were changed infrequently. (Tenex had a fork
primitive which it used to create new processes, although its use
was somewhat different from Unix's.)

The paper I read said they had Lisp applications which slurped big
data structures into memory, and only occasionally changed anything.
COW was great for that. The PDP-10 had four different subroutine call
instructions, two of which (JSR and JSA) modified the target address
and two of which (JSP and PUSHJ) didn't, so you had a fair amount of
mostly but not quite entirely pure code, depending on programmers'
programming style.


Dunno why you think fork/exec is so ghastly. Its key advantage is
that it's really fork/stuff/exec, with the stuff including I/O
rearrangements, dropping privilege, and so forth. If you combine fork
and exec into a single spawn operator, you have to add all of the
stuff as options to spawn. DTSS (a brilliant and underappreciated
system) did that, which worked but it was ugly.
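
[A minimal C sketch of that fork/stuff/exec pattern; run_logged() and
its parameters are illustrative, not from any of the systems discussed.]

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

pid_t run_logged(const char *prog, char *const argv[],
                 const char *logfile, uid_t uid)
{
    pid_t pid = fork();
    if (pid == 0) {                 /* child: do the "stuff" */
        int fd = open(logfile, O_WRONLY|O_CREAT|O_TRUNC, 0644);
        dup2(fd, STDOUT_FILENO);    /* I/O rearrangement */
        close(fd);
        setuid(uid);                /* dropping privilege */
        execv(prog, argv);          /* then exec */
        _exit(127);                 /* reached only if exec failed */
    }
    return pid;                     /* parent carries on */
}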

Or you can do what Tenex did, a fork that created the child process
and system calls that allow the parent to do the stuff one call at a
time to the child, then start it. That wasn't a bad design, but it
was messier than fork/stuff/exec. You could do a Unix-style fork, new
process with a copy of the address space and a different return, but
it wasn't popular because the equivalent of exec was a lot easier to
do from the parent.

R's,
John

nm...@cam.ac.uk

Jan 22, 2011, 12:44:58 PM
In article <ihf600$2vut$1...@gal.iecc.com>, John Levine <jo...@iecc.com> wrote:
>
>>Taking a look at the green card (POP is in my office), which was
>>by then a yellow book, the only doubtful ones were the decimal
>>ones and adjuncts (including PACK etc.) and vector ones (which I
>>never used). MVC/MVCL etc. were not problems.
>
>how about something like
>
> MVC 0(R,L),4(R)
>
>where R is not word aligned and the data block crosses a page
>boundary? Can you reliably assume that it stored everything up to
>the boundary?

Yes. Its specification was and is byte-by-byte. That is one of
the reasons that it was very slow.

Regards,
Nick Maclaren.

John Levine

Jan 22, 2011, 1:33:54 PM
>You might look at our patent 4,812,981, Mar 14, 1989 assigned to
>Prime Computer.

Hm, interesting patent. But I have to say it looks an awful lot like
what the Tenex fork JSYS did in 1972.

R's,
John

John Levine

Jan 22, 2011, 1:49:21 PM
>> MVC 0(L,R),4(R)
>>
>>where R is not word aligned and the data block crosses a page
>>boundary? Can you reliably assume that it stored everything up to
>>the boundary?
>
>Yes. Its specification was and is byte-by-byte. That is one of
>the reasons that it was very slow.

There's an as-if rule. On page 5-38 of my S/370 POO it says

When an instruction has two storage operands the first of which
causes a store and the second a fetch reference, it is unpredictable
how much of the second operand is fetched before the results are
stored. In the case of destructively overlapping operands, the
portion of the second operand which is common to the first is not
necessarily fetched from storage.

So on a system with 32 or 64 bit wide memory, it could do the
operation in chunks, so long as the final result was right, which
could cause surprises if one of the chunks crossed a boundary.

I don't know to what extent the faster 370s optimized that situation,
particularly once you could fill memory more efficiently with MVCL.
But I expect that decimal arithmetic and ED or EDMK would be hopeless.

R's,
John

nm...@cam.ac.uk

Jan 22, 2011, 1:07:30 PM
In article <ihf7ae$5r0$1...@gal.iecc.com>, John Levine <jo...@iecc.com> wrote:
>
>>However, back to the history of COW as such. I should be surprised
>>if it much predated the rise of Unix, because it isn't particularly
>>useful in the absence of Unix's horrible fork/exec model. Indeed,
>>it is possible to say that its ONLY purpose is an optimisation to
>>cover up the defects of the ghastly fork/exec design.
>
>Prepare to be surprised, since COW was in production use 15 years
>before it showed up in Unix systems. ...

Interesting. It certainly didn't spread all that widely before
the rise of Unix, but temporal association does not prove that
there was a causal link!

>Dunno why you think fork/exec is so ghastly. Its key advantage is
>that it's really fork/stuff/exec, with the stuff including I/O
>rearrangements, dropping privilege, and so forth. If you combine fork
>and exec into a single spawn operator, you have to add all of the
>stuff as options to spawn. DTSS (a brilliant and underappreciated
>system) did that, which worked but it was ugly.

Like Bakul Shah, when I first saw the fork/exec design, I thought
that it was quite good, but I soon learnt better. A lot of the
RAS (and even security) disasters of Unix turn out to be caused
by it, and some are fundamental. A single spawn may be ugly, but
it's a HELL of a lot cleaner!

The point is that, unless BOTH the parent and child cooperate AND
they get all the details right, there are semantic entanglements
of extremely evil forms that few people even imagine exist. At
one stage, I wrote my own nohup because the standard ones were
broken and discovered just how impossible it was to do using only
documented facilities. And that excludes the utterly unspeakable
ways in which one process can bugger up another using a shared
file descriptor, memory segment, shared library etc. - including
children doing that to their parents. NOT good if you are changing
protection domain :-(

If there were a proper library call to do the 'stuff', it might
be tolerable - but there isn't.


Regards,
Nick Maclaren.

Bakul Shah

Jan 22, 2011, 2:05:36 PM
On 1/22/11 10:07 AM, nm...@cam.ac.uk wrote:
> Like Bakul Shah, when I first saw the fork/exec design, I thought
> that it was quite good, but I soon learnt better. A lot of the
> RAS (and even security) disasters of Unix turn out to be caused
> by it, and some are fundamental. A single spawn may be ugly, but
> it's a HELL of a lot cleaner!

I still think that! fork() actually has quite simple, clean
semantics: share everything with the child and then, in the
forked child, block off things you don't want before exec(). To
do something equivalent in the spawn() case is much uglier. Even
today some things are easier to share between parent/child.
Replacements are much uglier (e.g. sending file descriptors
through send()). No time to get into other issues, but security
disasters have nothing to do with fork() itself. Anything can
be abused.

nm...@cam.ac.uk

Jan 22, 2011, 1:22:54 PM
In article <ihf8rh$b70$1...@gal.iecc.com>, John Levine <jo...@iecc.com> wrote:
>
>>> MVC 0(L,R),4(R)
>>>
>>>where R is not word aligned and the data block crosses a page
>>>boundary? Can you reliably assume that it stored everything up to
>>>the boundary?
>>
>>Yes. Its specification was and is byte-by-byte. That is one of
>>the reasons that it was very slow.
>
>There's an as-if rule. On page 5-38 of my S/370 POO it says

Yes.

>So on a system with 32 or 64 bit wide memory, it could do the
>operation in chunks, so long as the final result was right, which
>could cause surprises if one of the chunks crossed a boundary.

I would have to look at my copy again, but I am pretty sure that
the surprises were all of natures that could be recovered from.

>I don't know to what extent the faster 370s optimized that situation,
>particularly once you could fill memory more efficiently with MVCL.

Quite a lot, but they had fallback code (in the microcode) for the
problematic cases.

>But I expect that decimal arithmetic and ED or EDMK would be hopeless.

Could well be. It's too long ago for me to remember their semantics
to that level of detail. They weren't used in most compilers (PL/I
and COBOL excepted, of course, and possibly RPG). Fortran didn't
need the decimal option and had code to 'run' even without the
floating-point option, though I don't know if it worked.


Regards,
Nick Maclaren.

nm...@cam.ac.uk

Jan 22, 2011, 1:46:41 PM
In article <4D3B2A80...@bitblocks.com>,
Bakul Shah <use...@bitblocks.com> wrote:
Er, no. There is a window of vulnerability caused by the model
that cannot be eliminated and is not present in spawn models,
which is most clearly seen when the parent and child need to
communicate through pipes.

A pipe is created in the parent, which then forks the child,
and then BOTH do some cleaning up. If high-RAS or high-security
is an issue, that must be completed before the exec, but there
is no mechanism to require the proper sequencing. This leads to
race conditions where a kill of one process by, say, a scheduler
can impact on the other.

That sort of problem was why I wrote my nohup, and the foulness
of the task (given how conceptually simple it is) surprised me.
I then thought about the issue, and realised that it was almost
insoluble within the Unix fork/exec model. Conceivably some
other fork/exec model might make it soluble, but I couldn't
think of one.

Another problem is that there are many circumstances under which
a process tree needs controlling as a whole. The Unix fork/exec
model doesn't make it possible to do that, which has led to many
systems having weird and wonderful extensions and some of the
most revolting code you can imagine in job schedulers. This,
being the dual of the nohup issue, interferes with it horribly.

The executive summary is that the model works well, UNTIL you
need serious RAS - at least up to mainframe standards - and then
it fails dismally.


Regards,
Nick Maclaren.

Anne & Lynn Wheeler

Jan 22, 2011, 2:58:46 PM
nm...@cam.ac.uk writes:
> Yes. Its specification was and is byte-by-byte. That is one of
> the reasons that it was very slow.

re:
http://www.garlic.com/~lynn/2011.html#96 History of copy on write
http://www.garlic.com/~lynn/2011.html#97 History of copy on write
http://www.garlic.com/~lynn/2011.html#98 History of copy on write

360 ... all the storage operands would be pre-checked ... both starting
and ending (for at least store & fetch & 360/67 for page fault)
... before starting the instruction ... and any exception would abort
the instruction w/o doing anything. One of the byte-by-byte features
was to propagate a value
thru a field using overlapping operands, place a zero in 1st byte of
source operand, and then do a move with the start of the target operand
at +1 (the 2nd byte of the source is the same as first byte of the
target, the zero from the 2nd byte of the source isn't there until it
had been moved there). some more recent machines will attempt to
optimize with larger chunks if not overlapping operands.

370 ... introduced the "long" instructions ... which were defined to be
incrementally executed a byte at a time, interruptable, and restartable.
some of the early implementations would precheck the long instructions
operand addresses ... and abort w/o executing (instead of incrementally
executing until the problem address happened) ... which sometimes could
be unpredictable results.

somewhere recently somebody "APAR'ed" the TR/TRT instructions ... which
got it changed. The TR/TRT takes each byte of the first operand, indexes
displacement into the 2nd operand and either replaces the 1st operand
with the contents of the indexed byte (or stops because of
condition). The default was assume that the 2nd operand was always a
256byte table and precheck both the starting and ending storage
locations (for things like fetch protect). The APAR was that the 1st
operand/source might contain a limited set of values and programmer
takes advantage of the fact to build a table much less than 256 bytes
(always checking 2nd operand +256 could give erroneous results).

newer implementations now check to see if the (table) 2nd operand +256
bytes crosses 4k boundary (page fault, fetch protect, etc) ... if it
doesn't, the instruction is executed as before. If +256 crosses 4k
boundary, the instruction is pre-executed to see if any (source) byte
value results in address crossing a 4k boundary. New "performance"
recommendation is to never place starting address of 2nd operand/table
within 256 bytes of 4k boundary.

one of the digressions with regard to segment versus page protect. In
the original (370 virtual memory) segment protect design ... there was a
single page table (per virtual segment) ... with the possibility that
some virtual address spaces (sharing the segment) had r/o, store protect
and other virtual address spaces (sharing the same segment) had r/w
access.

with segment protect, all virtual address spaces could utilize the same
physical page table ... with the protection specification back in each
virtual address space specific segment table entry (i.e. extra bit in
pointer to page table). with page protect ... the protect indicator is
in the (shared) page table entry (rather than in the non-shared segment
table entry) ... resulting in all virtual address spaces sharing that
(segment/) page table ... will have the same protection.

semi-related recent discussion about virtual memory protection
mechanisms versus (360) storage key protection mechanisms:
http://www.garlic.com/~lynn/2011.html#44 CKD DASD
http://www.garlic.com/~lynn/2011.html#74 shared code, was Speed of Old Hard Disks - adcons
http://www.garlic.com/~lynn/2011.html#79 Speed of Old Hard Disks - adcons

--
virtualization experience starting Jan1968, online at home since Mar1970

Benny Amorsen

Jan 22, 2011, 4:01:27 PM
rp...@rpw3.org (Rob Warnock) writes:

> At SGI in the late 1980s & early 1990s, we were able to get *enormous*
> performance improvements in certain types of I/O -- particularly TCP
> network traffic -- with this mark-COW-on-output approach, which avoids
> [in the fast-path case] CPU-based copying of user data into kernel buffers.
> This was especially important since the MIPS CPU's maximum block-copying
> rate was *far* below the maximum DMA rate.

It is interesting that Linus Torvalds refuses to use similar tricks in
Linux, arguing that the manipulation of page tables is about as
expensive as simply copying the data. Perhaps modern systems are
simply that much faster at copying and slower at page table tricks.

AFAIK all Linux zero-copy I/O is done without page table trickery, which
means that you can e.g. change the data in a file AFTER you call
sendfile() or splice() and have the changed data hit the wire -- or a
somewhat unpredictable mixture of the old data and the new data, if the
timing is right.
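
[For reference, a hedged C sketch of the sendfile() path mentioned
above; as noted, the pages are not COW-protected, so concurrent
modification of the file can mix old and new data on the wire.]

#include <sys/sendfile.h>
#include <sys/types.h>

ssize_t send_whole_file(int sock_fd, int file_fd, size_t len)
{
    off_t off = 0;                  /* sendfile() advances this for us */
    while ((size_t)off < len) {
        ssize_t n = sendfile(sock_fd, file_fd, &off, len - (size_t)off);
        if (n <= 0)
            return -1;              /* error or unexpected end of file */
    }
    return (ssize_t)off;
}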


/Benny

EricP

Jan 22, 2011, 4:24:13 PM

That is possibly because in an SMP system changes to page tables
may require broadcasting TLB shootdowns and interrupting your peers.

Eric

Bakul Shah

Jan 22, 2011, 5:32:26 PM

I know nothing about your application so my question may make
no sense but why can't you do `high-RAS' before forking? Or
alternately after whatever race condition has been resolved?
What does a pipe between two processes have to do with
high-RAS? In any case what you are illustrating is that it
may not fit certain applications but that is very far from
being universally "ghastly".

A lot of "security" problems these days have to do with
buffer overflows, people using dumb passwords, people not
upgrading their system to close off known vulnerabilities,
phishing, man in the middle attacks, bad code injection and
so on. I don't think any of them are made any worse (or
better) by use of fork/exec instead of spawn.

> That sort of problem was why I wrote my nohup, and the foulness
> of the task (given how conceptually simple it is) surprised me.

I don't know.... Seems to me you are trying to make it do
something it is not designed for, which makes the cleanup a
royal pain. But I *am* curious so feel free to describe the
problem in some detail (via email if you wish).

> Another problem is that there are many circumstances under which
> a process tree needs controlling as a whole.

What's wrong with the process group concept? Not sure what
this has to do with fork/exec.

> The executive summary is that the model works well, UNTIL you
> need serious RAS - at least up to mainframe standards - and then
> it fails dismally.

Most of us seem to get by without this.

Andy "Krazy" Glew

Jan 22, 2011, 9:25:34 PM
On 1/22/2011 3:25 AM, Rob Warnock wrote:
> <nm...@cam.ac.uk> wrote:
> +---------------
> | However, back to the history of COW as such. I should be surprised
> | if it much predated the rise of Unix, because it isn't particularly
> | useful in the absence of Unix's horrible fork/exec model. Indeed,
> | it is possible to say that its ONLY purpose is an optimisation to
> | cover up the defects of the ghastly fork/exec design.
> +---------------
>
> I'm not going to argue with the Unix's fork/exec model being perhaps
> the *first* major justification of COW, but...
>
> Another very important use was in "zero-copy DMA" output,

I have also described how much I like what I call Myrias COW-forking,
one of the few forms of shared memory parallelism that I know of that is
deterministic:

When forking, COW all pages
run the threads separately
When joining, merge the pages that were COWed according to some sort of
deterministic rule.
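
[A C sketch of such a join, with invented names and an invented (but
deterministic) rule: the lowest-numbered task that COWed a page wins,
independent of how the tasks were actually scheduled. A word-wise
merge or reduction would fit the same shape.]

struct task;

extern int   page_was_cowed(struct task *t, int page);
extern void *cow_copy(struct task *t, int page);
extern void  commit_page(int page, void *data);

void join_merge(struct task **tasks, int ntasks, int npages)
{
    for (int p = 0; p < npages; p++) {
        for (int t = 0; t < ntasks; t++) {
            if (page_was_cowed(tasks[t], p)) {
                commit_page(p, cow_copy(tasks[t], p)); /* fixed order */
                break;
            }
        }
    }
}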

Andy "Krazy" Glew

Jan 22, 2011, 9:39:12 PM
On 1/22/2011 2:32 PM, Bakul Shah wrote:
> On 1/22/11 10:46 AM, nm...@cam.ac.uk wrote:
>> In article<4D3B2A80...@bitblocks.com>,
>> Bakul Shah<use...@bitblocks.com> wrote:

> A lot of "security" problems these days have to do with
> buffer overflows, people using dumb passwords, people not
> upgrading their system to close off known vulnerabilities,
> phishing, man in the middle attacks, bad code injection and
> so on. I don't think any of them are made any worse (or
> better) by use of fork/exec instead of spawn.

I've been reading this thread on fork/vfork somewhat bemused.

Myself, I fall into what John Levine calls the Tenex camp: for Gould
Secure UNIX follow-ons, I wanted to spawn a non-running child, and then
give the parent syscalls to manipulate the child before it started the
child running.

(BTW, I was on Gould Real-Time UNIX, not Secure UNIX; but I was the
wannabe computer architect who wanted to design hardware and OSes for
all of our products.)

As for fork security: there were several examples of security holes
that amounted to a resource, such as a file descriptor, surviving across
a fork, not being closed inside the child before the exec.

fork/exec means that the coder has to close down all resources that were
inherited by the child that you do not want to pass to the exec.

Including resources, file descriptors, that the original programmer did
not know about; indeed, resources that may have been added by later
revisions of the OS.

It is better, security-wise, to require everything that you want the
exec'ed or spawned child to inherit to be explicitly specified, rather
than to be passively inherited when it might not be needed.

Andy "Krazy" Glew

Jan 22, 2011, 9:56:17 PM

E.g. the parent had a file descriptor to a file open that he should have
closed down before exec'ing a setuid child...

Now the child, running as a different user, can access a file he should
not be able to access.

Repeat for any of the umpteen forms of IPC that later UNIXes added.

(IMHO one thing that the original UNIX got right was making everything a
file/file descriptor; later UNIXes diverged.)

Bakul Shah

Jan 22, 2011, 11:35:32 PM
On 1/22/11 6:56 PM, Andy "Krazy" Glew wrote:
> On 1/22/2011 6:39 PM, Andy "Krazy" Glew wrote:
>> As for fork security: there were several examples of security holes that
>> amounted to a resource, such as a file descriptor, surviving across a
>> fork, not being closed inside the child before the exec.

Security holes can be closed. Different from race conditions
like what Nick was alluding to. I don't think it is fair to
blame sins of lazy programmers on fork/exec! In any case you
can set the close-on-exec flag on files where it matters.
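
[The two standard ways to set that flag, both real POSIX/Linux
interfaces; opening with O_CLOEXEC sets it atomically at creation,
closing the window where another thread forks between open() and
fcntl().]

#include <fcntl.h>

/* mark an already-open descriptor close-on-exec */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags < 0 ? -1 : fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* or set it atomically when the descriptor is created */
static int open_cloexec(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}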

>> fork/exec means that the coder has to close down all resources that were
>> inherited by the child that you do not want to pass to the exec.
>>
>> Including resources, file descriptors, that the original programmer did
>> not know about; indeed, resources that may have been added by later
>> revisions of the OS.
>>
>> It is better, security-wise, to require everything that you want the
>> exec'ed or spawned child to inherit to be explicitly specified, rather
>> than to be passively inherited when it might not be needed.

Basically _sharing_ complicates security. And security
complicates getting things done (how do you like TSA?). But
if you make sharing hard that does not necessarily make
things more secure. Just slower and more painful. And you
have to deal with a different set of problems. IMHO one
should use appropriate security mechanisms at various kinds
of borders but to have pervasive security in everything (a
lock on every door) doesn't increase security (and actually
has the opposite effect as you stop being aware). In the
end you still have to verify that everything is secure.

If you have to have pervasive security then even the spawn
model is not strong enough. You need capabilities or something
of its ilk. And probably h/w support (or do it cleverly like
in KeyKOS & its descendants -- IIRC, it relied on clever PTE
manipulation).

> E.g. the parent had a file descriptor to a file open that he should have closed down before exec'ing
> a setuid child...

You can set the close-on-exec flag. Then there is rfork()
(as in Plan 9, don't recall what IRIX rfork did) which allows
you to specify which resources are shared.

> Now the child, running as a different user, can access a file he should not be able to access.
>
> Repeat for any of the umpteen forms of IPC that later UNIXes added.
>
> (IMHO one thing that the original UNIX got right was making everything a file/file descriptor; later
> UNIXes diverged.)

Agreed. This is why I like Plan 9.

nm...@cam.ac.uk

Jan 23, 2011, 4:28:04 AM
In article <4D3B5AFA...@bitblocks.com>,

Bakul Shah <use...@bitblocks.com> wrote:
>
>I know nothing about your application so my question may make
>no sense but why can't you do `high-RAS' before forking? ...

Look, there is no point in me pursuing this, because it is clear
that we are talking at different levels. I am NOT talking about
the script kiddie level of security, but about how one writes
code that is close to bulletproof and/or defensive against an
intelligent, professional attack. And it's not MY application;
it's a complete class of them.

You completely missed my point about temporal ordering, though a
quick glance at POSIX and/or Microsoft documentation would show
it clearly. Without temporal ordering control, you NECESSARILY
have race conditions.

For another case, consider that nohup. You DO know that signals
are sent to processes, process groups, all processes with a
particular controlling terminal, all processes with a descriptor
open for a particular device, all processes whose PARENTS have
particular properties, and more, don't you? Breaking ALL of the
connexions is HARD, does NOT have enough POSIX support, and is
system-dependent.

>> The executive summary is that the model works well, UNTIL you
>> need serious RAS - at least up to mainframe standards - and then
>> it fails dismally.
>
>Most of us seem to get by without this.

Sigh. I know :-(

Back in the 1970s, the mainframes were quite rightly damned for
being insecure and unreliable by the proponents of capability
systems (and similarly in languages). Since then, we have gone
so far in the opposite direction that most of you have never used
even a SEMI-secure system!


Regards,
Nick Maclaren.

Andy "Krazy" Glew

Jan 23, 2011, 1:58:14 PM
On 1/22/2011 8:35 PM, Bakul Shah wrote:
> On 1/22/11 6:56 PM, Andy "Krazy" Glew wrote:
>> On 1/22/2011 6:39 PM, Andy "Krazy" Glew wrote:
>>> As for fork security: there were several examples of security holes that
>>> amounted to a resource, such as a file descriptor, surviving across a
>>> fork, not being closed inside the child before the exec.
>
> Security holes can be closed. Different from race conditions
> like what Nick was alluding to. I don't think it is fair to
> blame sins of lazy programmers on fork/exec! In any case you
> can set the close-on-exec flag on files where it matters.

Lazy and error prone programmers are a major cause of security problems.

Interfaces that make it easier for lazy and error prone programmers to
commit security errors are an even more important cause of security
problems:

More important because, while it is unlikely that we will ever get rid
of lazy and error prone programmers, since there are just too many of
them, there are far fewer interfaces.

Go for the points of greatest leverage.


Example:

http://stackoverflow.com/questions/1643304/how-to-set-close-on-exec-by-default

>No, there is no way to set close on exec as the default. You simply
need to be careful....


---


By the way, I can't find a full timeline for O_CLOEXEC, but it is fairly
recent, as witnessed by

http://www.linuxfoundation.org/news-media/lwf/userspace

User Space
By Corbet - January 26, 2009 - 12:08pm


glibc

Glibc is the GNU C library, the lowest-level system library which
interfaces directly with the kernel. Several libc implementations exist,
but glibc is the version shipped with most distributions; it is used
almost everywhere except in some embedded Linux deployments. The current
glibc release is 2.6.1. There is a 2.7 release in the works with no
planned release date; expected features include better fortification
(protection against some security exploits) for C++, better cpuset
support, support for the new O_CLOEXEC flag, x86_64 vDSO support, and more.


If glibc support is less than a few years old, then there is still a lot
of code out there that can't use it.

Which sounds to me like a good example of a band-aid being applied to
fix a fork/exec security problem, 30+ years late.


nm...@cam.ac.uk

unread,
Jan 23, 2011, 1:49:43 PM1/23/11
to
In article <a6idnWcg4I_R56HQ...@giganews.com>,

Andy \"Krazy\" Glew <an...@SPAM.comp-arch.net> wrote:
>On 1/22/2011 8:35 PM, Bakul Shah wrote:
>
>>>> As for fork security: there were several examples of security holes that
>>>> amounted to a resource, such as a file descriptor, surviving across a
>>>> fork, not being closed inside the child before the exec.
>>
>> Security holes can be closed. Different from race conditions
>> like what Nick was alluding to. I don't think it is fair to
>> blame sins of lazy programmers on fork/exec! In any case you
>> can set the close-on-exec flag on files where it matters.
>
>Lazy and error prone programmers are a major cause of security problems.
>
>Interfaces that make it easier for lazy and error prone programmers to
>commit security errors are an even more important cause of security
>problems:

Far more common, I agree. However, the current ones not merely have
that property, but also the one where even the most conscientious and
reliable programmer cannot close all of the known, soluble security
loopholes. That is often because there is no way to clean up
properly without causing other problems - e.g. shared library use.

In particular, when increasing privilege, the higher privilege
code doesn't get control until AFTER the exec, and therefore
fundamentally cannot protect itself against being passed a bad
environment. I have written code that did the following, to
reduce such exposures to a tolerable level:

Executable A is the one called by the 'user', and runs with
UID P and an unprivileged GID.
It does some cleaning up, forks to a child, and kills itself,
to render the child parentless.
The child then sets GID to Q (which is one of UID P's), and
then execs to an executable B, which is inaccessible to UID P,
owned by root, AND is accessible to GID Q.
Executable B is then called in a known environment, and does
the real work.

Hacking your way through that is left as an exercise for the
reader - yes, it's possible, under most Unices ....
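
In code, executable A reduces to something like this (a sketch under
the stated assumptions: GID_Q and the path of B are placeholders, how
GID Q becomes settable - e.g. installing A setgid Q - is
system-dependent, and error handling collapses to _exit):

    #include <unistd.h>

    #define GID_Q 2001              /* placeholder: GID Q */

    int main(void)
    {
        /* ... clean up descriptors, signals, environment ... */

        pid_t pid = fork();
        if (pid != 0)
            _exit(0);               /* parent kills itself; the child
                                       is reparented, i.e. orphaned */

        if (setgid(GID_Q) != 0)     /* drop into GID Q */
            _exit(1);

        /* B: owned by root, inaccessible to UID P directly,
           executable only via GID Q.  It now runs in a known
           environment and does the real work. */
        execl("/usr/libexec/B", "B", (char *)0);
        _exit(1);                   /* exec failed */
    }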


Regards,
Nick Maclaren.

Robert Myers

unread,
Jan 23, 2011, 6:40:25 PM1/23/11
to
On Jan 23, 1:49 pm, n...@cam.ac.uk wrote:

> Hacking your way through that is left as an exercise for the
> reader

All past history, ain't it, Nick? The perfect moment comes only once,
even to computer snots.

Robert.

Bakul Shah

unread,
Jan 23, 2011, 9:09:49 PM1/23/11
to
On 1/23/11 10:58 AM, Andy "Krazy" Glew wrote:
> On 1/22/2011 8:35 PM, Bakul Shah wrote:
>> On 1/22/11 6:56 PM, Andy "Krazy" Glew wrote:
>>> On 1/22/2011 6:39 PM, Andy "Krazy" Glew wrote:
>>>> As for fork security: there were several examples of security holes that
>>>> amounted to a resource, such as a file descriptor, surviving across a
>>>> fork, not being closed inside the child before the exec.
>>
>> Security holes can be closed. Different from race conditions
>> like what Nick was alluding to. I don't think it is fair to
>> blame sins of lazy programmers on fork/exec! In any case you
>> can set the close-on-exec flag on files where it matters.
>
> Lazy and error prone programmers are a major cause of security problems.
>
> Interfaces that make it easier for lazy and error prone programmers to commit security errors are an
> even more important cause of security problems:

By the same logic no one should program in C/C++/assembly
language. But even if everyone moves to type safe languages,
you will still find security problems (programmer forgets to
validate an input field and the bad guy does an SQL injection
attack using it -- there are techniques to catch such things
but lazy programmers won't use them :-))

Fundamentally openness comes with security problems.

The point is, one makes pragmatic decisions based on what is
available and one's goals. dmr/ken could not have foreseen
today's security problems in 1972 -- just as the designers of
internet protocols didn't (and they were designing it *for*
the defence department!). If dmr/ken had thought of all the
security issues or had a perfectionist POV there might not
have been Unix-- though I suppose that would have made Nick
happy! I will still say for what fork() was designed to do,
it has served excellently. At any rate, fork() is not going
to go away. All one can do is to move forward. I mentioned
rfork(); close-on-exec exists for the same reason. FreeBSD for instance now
has an experimental feature called 'capsicum' that adds
capabilities and aids in application compartmentalisation
etc. Then there are facilities like `jail' in freebsd that
already aid in compartmentalisation.

> More important because, while it is unlikely that we will ever get rid of lazy and error prone
> programmers, since there are just too many of them, there are far fewer interfaces.

> Go for the points of greatest leverage.

No disagreement here. But I claim fixing fork() is not going
to help! The issue is we don't do quality any more.

Though at this point I will bow out.

van...@vsta.org

unread,
Jan 23, 2011, 10:56:37 PM1/23/11
to
"Andy \"Krazy\" Glew" <an...@spam.comp-arch.net> wrote:

> It is better, security-wise, to require everything that you want the
> exec'ed or spawned child to inherit to be explicitly specified, rather
> than to be passively inherited when it might not be needed.

Wasn't that the impetus for posix_spawn()? I seem to recall that the push at
the time was to deprecate fork, but (obviously) that never came to pass.
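
For reference, the shape of the interface - a minimal sketch, with an
arbitrary command and redirection. The file-actions list makes the
child's descriptor setup explicit rather than assembled by hand
between fork and exec, though descriptors not marked close-on-exec
still flow through, so it is no cure-all:

    #include <fcntl.h>
    #include <spawn.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void)
    {
        pid_t pid;
        char *argv[] = { "ls", "-l", "/tmp", (char *)0 };
        posix_spawn_file_actions_t fa;

        posix_spawn_file_actions_init(&fa);
        /* Explicit: the child's stdout goes to this file. */
        posix_spawn_file_actions_addopen(&fa, 1, "/tmp/ls.out",
                                         O_WRONLY | O_CREAT | O_TRUNC,
                                         0600);

        if (posix_spawn(&pid, "/bin/ls", &fa, (posix_spawnattr_t *)0,
                        argv, environ) == 0)
            waitpid(pid, (int *)0, 0);

        posix_spawn_file_actions_destroy(&fa);
        return 0;
    }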

--
Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html

Tony Finch

unread,
Jan 24, 2011, 1:53:50 PM1/24/11
to
"Andy \"Krazy\" Glew" <an...@SPAM.comp-arch.net> wrote:
>
>(IMHO one thing that the original UNIX got right was making everything a
>file/file descriptor; later UNIXes diverged.)

Except for pids, signals, the command line, environment variables, ...

Tony.
--
f.anthony.n.finch <d...@dotat.at> http://dotat.at/
FISHER: NORTHWESTERLY 6 TO GALE 8 BECOMING VARIABLE 3 OR 4. MODERATE OR ROUGH.
RAIN THEN SHOWERS. MODERATE OR GOOD, OCCASIONALLY POOR.

Tony Finch

unread,
Jan 24, 2011, 1:57:26 PM1/24/11
to
nm...@cam.ac.uk wrote:
>
>Like Bakul Shah, when I first saw the fork/exec design, I thought
>that it was quite good, but I soon learnt better. A lot of the
>RAS (and even security) disasters of Unix turn out to be caused
>by it, and some are fundamental. A single spawn may be ugly, but
>it's a HELL of a lot cleaner!

I think the problems are more to do with setuid than with fork.

Tony.
--
f.anthony.n.finch <d...@dotat.at> http://dotat.at/

SOLE: NORTHEAST BACKING NORTH, 4 OR 5, OCCASIONALLY 6. SLIGHT OR MODERATE.
SHOWERS. GOOD.

nm...@cam.ac.uk

unread,
Jan 24, 2011, 1:34:54 PM1/24/11
to
In article <wVz*Ll...@news.chiark.greenend.org.uk>,

Tony Finch <d...@dotat.at> wrote:
>>
>>Like Bakul Shah, when I first saw the fork/exec design, I thought
>>that it was quite good, but I soon learnt better. A lot of the
>>RAS (and even security) disasters of Unix turn out to be caused
>>by it, and some are fundamental. A single spawn may be ugly, but
>>it's a HELL of a lot cleaner!
>
>I think the problems are more to do with setuid than with fork.

Not really, because most of them occur using simply fork and exec.

There have been lots of systems that use 'setuid' bits on executables
that escaped almost all of the problems (and, in some cases, ALL of
the problems) - but the common factor was that they used a 'spawn'
model of process creation and not a fork/exec one.


Regards,
Nick Maclaren.

Rick Jones

unread,
Jan 24, 2011, 3:28:34 PM1/24/11
to
Rob Warnock <rp...@rpw3.org> wrote:
> At SGI in the late 1980s & early 1990s, we were able to get
> *enormous* performance improvements in certain types of I/O --
> particularly TCP network traffic -- with this mark-COW-on-output
> approach, which avoids [in the fast-path case] CPU-based copying of
> user data into kernel buffers. This was especially important since
> the MIPS CPU's maximum block-copying rate was *far* below the
> maximum DMA rate.

SGI weren't alone in that regard.
ftp://ftp.cup.hp.com/dist/networking/briefs/copyavoid.pdf (at least
for as long as ftp.cup.hp.com remains up. I'll be taking it down by
June I suspect.)

In the case of HP-UX, it was a little bit uglier under the covers -
the virtually indexed caches in PA-RISC made it "quite inconvenient"
to have a second virtual address for the page(s) that would go to the
same cache lines. So advantage was taken of the presence of CKO
(ChecKsum Offload) on the NIC, which meant the data going down the
stack did not need to be touched in the normal case: a physical
address found its way down the stack, with some "special" routines to
cover the case when a connection migrated from a COWable to a
non-COWable interface, and other "exceptions."

> We also used "page-flipping" on input [where possible], which is
> sort of the reverse of the above, but requires slightly different
> algorithms and constraints on the buffers. [E.g., page-aligned and
> page-multiple- sized buffers were not just "desirable", but
> *required*.]

HP-UX called it page-remapping. A rose by another name. Alas, FDDI
and its nice MTU size didn't "win" and page sizes didn't stay 4096
bytes :(

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Rick Jones

unread,
Jan 24, 2011, 3:38:36 PM1/24/11
to
nm...@cam.ac.uk wrote:
> Grrk. I would question the "very important", because the number of
> programs dominated by the output of large buffers is small, and that
> trick does not work for input (which is almost always more
> important) and is counter-productive for small buffers. A useful
> trick if you already have COW, yes, but not one worth implementing
> COW for.

Pfft. The folks behind HP-UX thought it worth implementing for
networking :) Now, in the rarified academic scheme of things, perhaps
it doesn't seem worthwhile, but the ability to get link-rate FDDI on
an HP 9000 8X2 system (*) and still have some CPU for those pesky
applications, helped push some tin. Beyond simple netperf tests, it
did some very nice things to FTP performance, and was well-received by
customers, before we had sendfile() for that.

Is bulk transfer of data in large packets the be-all and end-all? Of
course not - heck, if it were, we'd never have bothered creating
netperf to replace ttcp :) But that does not mean that accelerating
bulk transfer by getting rid of the last of the non-DMA memory bus
crossings wasn't worthwhile.

rick jones

(*) Alas, each successive HP 9000 system moved the NIO/HP-PB bus
farther and farther from memory. On the 8x2's HP-PB *was* the memory
bus. The 8x7's (later the F,G,H,I class) had it one converter away,
and the K Class had it two converters away. And by the K Class, the
"all pages of 4096 bytes" bit went away and 100Base-T with its puny
1500 byte MTU was replacing FDDI.

--
oxymoron n, Hummer H2 with California Save Our Coasts and Oceans plates

nm...@cam.ac.uk

unread,
Jan 24, 2011, 5:05:10 PM1/24/11
to
In article <ihko0c$fj3$2...@usenet01.boi.hp.com>,

Rick Jones <rick....@hp.com> wrote:
>
>nm...@cam.ac.uk wrote:
>> Grrk. I would question the "very important", because the number of
>> programs dominated by the output of large buffers is small, and that
>> trick does not work for input (which is almost always more
>> important) and is counter-productive for small buffers. A useful
>> trick if you already have COW, yes, but not one worth implementing
>> COW for.
>
>Pfft. The folks behind HP-UX thought it worth implementing for
>networking :) Now, in the rarified academic scheme of things, perhaps
>it doesn't seem worthwhile, but the ability to get link-rate FDDI on
>an HP 9000 8X2 system (*) and still have some CPU for those pesky
>applications, helped push some tin. Beyond simple netperf tests, it
>did some very nice things to FTP performance, and was well-received by
>customers, before we had sendfile() for that.

Hmm. I remember having conversations like:

Salesdroid: gimmick A will improve performance by 50%.
Me: I measured it, and it was 10% slower.
Salesdroid: That's because your program wasn't suitable for it.
Me: What programs are suitable?
Salesdroid: Well, our benchmarketing suite is.


Regards,
Nick Maclaren.

Rick Jones

unread,
Jan 24, 2011, 5:58:13 PM1/24/11
to

I'm sorry if you got a bum steer with COW :)

I won't doubt you had discussions like that, and I won't pretend to
assert that COW was worthwhile for small packets or anything beyond
bulk transfers. I was simply taking exception to your ROI assertion.
And that sequence doesn't necessarily indicate that what you called
gimmick A was a failure and not worth having. It indicates that the
failure was in the benchmarking suite's ability to model you :) Or in
the salesdroid's selective cherry picking of the results across the
components of their company's benchmarking suite.

Also, in HP-UX at least, no attempt was made to have a COW with a
user's send() unless it was explicitly enabled via a setsockopt() call
for that socket, and met the other pre-conditions such as alignment,
so if the user application wasn't one where it would benefit, none of
that path was taken. So, the negative effect of having COW support in
the HP-UX stack was essentially epsilon for other cases.

rick jones
--
I don't interest myself in "why". I think more often in terms of
"when", sometimes "where"; always "how much." - Joubert

Terje Mathisen

unread,
Jan 25, 2011, 2:19:14 AM1/25/11
to
Rick Jones wrote:
> rick jones
> -- firebug n, the idiot who tosses a lit cigarette out his car window

idiot n, anyone who uses his car window as an ashtray or waste basket

(doing it because they don't want to dirty their car?)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

nm...@cam.ac.uk

unread,
Jan 25, 2011, 3:12:33 AM1/25/11
to
In article <ihl065$kdi$1...@usenet01.boi.hp.com>,

Rick Jones <rick....@hp.com> wrote:
>> >
>> >> Grrk. I would question the "very important", because the number of
>> >> programs dominated by the output of large buffers is small, and that
>> >> trick does not work for input (which is almost always more
>> >> important) and is counter-productive for small buffers. A useful
>> >> trick if you already have COW, yes, but not one worth implementing
>> >> COW for.
>> >
>> >Pfft. The folks behind HP-UX thought it worth implementing for
>> >networking :) Now, in the rarified academic scheme of things, perhaps
>> >it doesn't seem worthwhile, but the ability to get link-rate FDDI on
>> >an HP 9000 8X2 system (*) and still have some CPU for those pesky
>> >applications, helped push some tin. Beyond simple netperf tests, it
>> >did some very nice things to FTP performance, and was well-received by
>> >customers, before we had sendfile() for that.
>
>I'm sorry if you got a bum steer with COW :)

It wasn't COW that the conversation was about :-)

But we are at cross-purposes. What I am querying is specifically
the description "very important" for that use - I am not denying
that it is useful for that. And I agree that, if COW is easy to
provide, that use alone might well justify it. It's a clean and
well-localised facility, after all.

My main point is that, in a system without a Unix-style fork, it
is a fairly minor facility. While it may improve a FEW benchmarks
by a large factor, it's pretty unlikely to make a large or even
significant difference to most real workloads.

The same is true of a better integer multiplication, of the sort
that PA-RISC does NOT have! There are quite a few important HPC
applications where that architecture is unnecessarily slow, for
that reason alone - one use came to me with one, and I had to say
that he was out of luck, because his code inherently relied on it.
But HP chose NOT to implement one, because it wasn't important
enough, OVERALL.


Regards,
Nick Maclaren.

John Levine

unread,
Jan 25, 2011, 11:09:43 AM1/25/11
to
>My main point is that, in a system without a Unix-style fork, it
>is a fairly minor facility. While it may improve a FEW benchmarks
>by a large factor, it's pretty unlikely to make a large or even
>significant difference to most real workloads.

Speaking off the top of my head (or other body part), it seems to me
that for implementing fork calls, copy-on-touch is probably better.
It's rare to touch a stack page without changing it, and COT should be
cheaper to set up, since you can leave the page tables as though the
pages are all non-resident.

I think the place where it really wins is shared libraries,
particularly Windows style where you can usually but not always map in
the library at the address where it was expected to be.

R's,
John

Bakul Shah

unread,
Jan 25, 2011, 12:12:41 PM1/25/11
to
On 1/25/11 8:09 AM, John Levine wrote:
> Speaking off the top of my head (or other body part), it seems to me
> that for implementing fork calls, copy-on-touch is probably better.
> It's rare to touch a stack page without changing it, and COT should be
> cheaper to set up, since you can leave the page tables as though the
> pages are all non-resident.

? Aren't most of the major costs of COT and COW the same? Taking the
trap, allocating a free page, copying, (& doing the same for the
parent page table the first time around).

> I think the place where it really wins is shared libraries,
> particularly Windows style where you can usually but not always map in
> the library at the address where it was expected to be.

I always wondered if one could somehow combine page tables and shared
lib jump tables to decrease the cost of accessing shared libs.

Tim McCaffrey

unread,
Jan 25, 2011, 3:58:25 PM1/25/11
to
In article <4D3CDF6D...@bitblocks.com>, use...@bitblocks.com says...

Funny, other mainframe OSs figured out those problems in the same time period.

Let's face it, Unix was supposed to be Multics-lite. It was supposed to do
multitasking/multiuser on a budget (a very small budget). Security (both from
the HS sense and the sandbox sense) wasn't a high priority, in other words
they didn't try to make it absolute, since their attitude (whether they knew
it or not) was that they controlled the environment and they "just had to be
careful". Example: I remember reading that there was NO recovery code in the
disk drivers, since they just felt if the disk was going bad, buy a new one.


BTW, you can program in assembly in a secure OS, and even on secure machines.
Suppose there was an OS that took full advantage of segments on the 386: You
can use pointers, but you can't use buffer overflow, execute data or otherwise
touch things you're not supposed to. Still, assembly language is valid in such
an OS (yes, I wish there was one, it would be much more reliable).

- Tim


Chris Gray

unread,
Jan 25, 2011, 4:26:14 PM1/25/11
to
nm...@cam.ac.uk writes:

> My main point is that, in a system without a Unix-style fork, it
> is a fairly minor facility. While it may improve a FEW benchmarks
> by a large factor, it's pretty unlikely to make a large or even
> significant difference to most real workloads.

Well, the ability to use mmap/segv to implement COW was pretty crucial to all
the ports of Myrias semantics to workstations. Map parent page in as
read-only. Trap on write, allocate child copy, copy data, enable writes,
return. The PA-RISC needed an extra data copy, if I remember correctly.
(Something about not allowing the same physical page to be mapped into two
different processes at the same time? It's been too long!) However, the fact
that Myrias is long gone means this use doesn't invalidate your argument!
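
The trap-and-copy idea is easy to sketch on any modern POSIX system
with SA_SIGINFO. A toy version (not the Myrias code: one page,
single-threaded, it "copies" in place rather than allocating a fresh
frame, and little of what the handler does is formally
async-signal-safe - fine for a demo, not for production):

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static size_t pagesz;
    static char copybuf[65536];     /* assumes page size <= 64 KiB */

    /* A write to a read-only page lands here.  A real implementation
       would allocate a fresh frame, copy the parent's data into it,
       and map it at the faulting address; this toy copies aside,
       enables writes, copies back, and lets the store retry. */
    static void on_write_fault(int sig, siginfo_t *si, void *ctx)
    {
        char *p = (char *)((uintptr_t)si->si_addr &
                           ~(uintptr_t)(pagesz - 1));
        (void)sig; (void)ctx;
        memcpy(copybuf, p, pagesz);
        mprotect(p, pagesz, PROT_READ | PROT_WRITE);
        memcpy(p, copybuf, pagesz);
    }

    int main(void)
    {
        struct sigaction sa;
        char *page;

        pagesz = (size_t)sysconf(_SC_PAGESIZE);
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_write_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, (struct sigaction *)0);

        page = mmap((void *)0, pagesz, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        strcpy(page, "parent data");
        mprotect(page, pagesz, PROT_READ);  /* arm the trap */

        page[0] = 'P';                      /* fault, copy, retry */
        printf("%s\n", page);               /* "Parent data" */
        return 0;
    }

The faulting store simply retries when the handler returns, which is
exactly the "trap on write ... enable writes, return" sequence above.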

--
Chris Gray c...@GraySage.COM
http://www.Nalug.ORG/ (Lego)
http://www.GraySage.COM/cg/ (Other)

Anne & Lynn Wheeler

unread,
Jan 25, 2011, 4:53:19 PM1/25/11
to

timca...@aol.com (Tim McCaffrey) writes:
> Funny, other mainframe OSs figured out those problems in the same time period.
>
> Let's face it, Unix was supposed to be Multics-lite. It was supposed to do
> multitasking/multiuser on a budget (a very small budget). Security (both from
> the HS sense and the sandbox sense) wasn't a high priority, in other words
> they didn't try to make it absolute since their attitude (if they knew it or
> not) was that they controlled the environment and they "just had to be
> careful". Example: I remember reading that there was NO recovery code in the
> disk drivers, since they just felt if the disk was going bad, buy a new one.
>
> BTW, you can program in assembly in a secure OS, and even on secure machines.
> Suppose there was an OS that took full advantage of segments on the 386: You
> can use pointers, but you can't use buffer overflow, execute data or otherwise
> touch things you're not supposed to. Still, assembly language is valid in such
> an OS (yes, I wish there was one, it would be much more reliable).

some number of the CTSS people went to 5th flr of 545 tech sq for
Multics and others went to the science center on the 4th flr and did
things like virtual machines (cp40, cp67, vm370).

multics was done w/pli ... and had none of the buffer overflow problems
of unix. old post
http://www.garlic.com/~lynn/2002l.html#42 Thirty Years Later: Lessons from the Multics Security Evaluation

with references to
http://www.acsac.org/2002/papers/classic-multics.pdf

and
http://csrc.nist.gov/publications/history/karg74.pdf

and a reference to virtual machine work done on the 4th flr:
http://www.nsa.gov/research/selinux/list-archive/0409/8362.shtml

part of the unix issue was that string/array conventions in C makes it
almost as hard to *NOT* shoot yourself in the foot ... as it is in many
other environments to actually shoot yourself in the foot (conventions
in many other environments result in having to work really hard to have
buffer overflows ... even for some assembler environments where there
are specific kinds of coding conventions).

--
virtualization experience starting Jan1968, online at home since Mar1970

robert...@yahoo.com

unread,
Jan 25, 2011, 5:14:42 PM1/25/11
to
On Jan 25, 2:58 pm, timcaff...@aol.com (Tim McCaffrey) wrote:
> BTW, you can program in assembly in a secure OS, and even on secure machines.  
> Suppose there was an OS that took full advantage of segments on the 386:  You
> can use pointers, but you can't use buffer overflow, execute data or otherwise
> touch things your not supposed to.  Still, assembly language is valid in such
> an OS (yes, I wish there was one, it would be much more reliable).


It's important to not overstate the utility of 386 style segments in
this role - they're pretty much nowhere near fine-grained enough to
stop many cases (although they could stop many others). Consider separate
objects in C - you could quite reasonably put them in separate
segments (ignoring the performance hit and the very limited number of
segments available), and you would prevent any sort of inter-object
buffer overflow. OTOH, intra-object buffer overflows aren't impeded
at all. Consider:

 struct A {char c[8]; void (*f)(void);} a;

strcpy(a.c, "abcdefgh\x01\x23\x45\x67");
a.f();

Will generate a call to 0x01234567 or 0x67452301 on most 32 bit
implementations, and making “a” its own segment doesn’t help at all.

John Levine

unread,
Jan 25, 2011, 8:55:21 PM1/25/11
to
>> Speaking off the top of my head (or other body part), it seems to me
>> that for implementing fork calls, copy-on-touch is probably better.
>> It's rare to touch a stack page without changing it, and COT should be
>> cheaper to set up, since you can leave the page tables as though the
>> pages are all non-resident.
>
>? Aren't most of the major costs of COT and COW the same? Taking the
>trap, allocating a free page, copying, (& doing the same for the
>parent page table the first time around).

Depends on the architecture. If you have inverted page tables as on
the PowerPC, a given physical page can only be mapped to one place, so
while a COW page is shared, you have to remap it every time a
different process tries to touch it, potentially at every context
switch. With COT you don't map it at all until it's touched, then you
make a private copy. Or there's the pre-ESA/390 IBM mainframe where
you had to poll the change bits on all the writable pages on each
context switch to see if you needed to make another copy.

>> I think the place where it really wins is shared libraries,
>> particularly Windows style where you can usually but not always map in
>> the library at the address where it was expected to be.
>
>I always wondered if one could somehow combine page tables and shared
>lib jump tables to decrease the cost of accessing shared libs.

Ewwww.

R's,
John


Andy "Krazy" Glew

unread,
Jan 26, 2011, 2:08:27 AM1/26/11
to

Hold on: although I criticize(d) fork/exec, I think I understand that it
has some nice features:

Basically, it avoids the need for syscall proliferation. You don't have
to have two flavors of every syscall: you just have to have the flavor
that applies to yourself, and then arrange to inherit across exec.

Plus, fork by itself is quite useful. And here communication between
parent and child is accomplished by the memory image.

I must admit that it struck me as a very expensive way to do forking and
parameter passing. But if you ignore cost, then it is rather elegant.

I've used many of those other OSes. Their process forking mechanisms
were often so expensive that people avoided them. Think VAX/VMS - gag!!!!

However, fork/exec has security problems. Just like C.



Tim McCaffrey

unread,
Jan 26, 2011, 11:36:29 AM1/26/11
to
In article
<b67406a8-5ddf-486a...@k21g2000prb.googlegroups.com>,
robert...@yahoo.com says...
>
>On Jan 25, 2:58 pm, timcaff...@aol.com (Tim McCaffrey) wrote:
>> BTW, you can program in assembly in a secure OS, and even on secure
>> machines. Suppose there was an OS that took full advantage of segments
>> on the 386: You can use pointers, but you can't use buffer overflow,
>> execute data or otherwise touch things you're not supposed to. Still,
>> assembly language is valid in such an OS (yes, I wish there was one,
>> it would be much more reliable).
>
>
>It's important to not overstate the utility of 386 style segments in
>this role - they're pretty much not anywhere fine-grained enough to
>stop many cases (although they could stop many). Consider separate
>objects in C - you could quite reasonably put them in separate
>segments (ignoring the performance hit and the very limited number of
>segments available), and you would prevent any sort of inter-object
>buffer overflow. OTOH, intra-object buffer overflows aren't impeded
>at all. Consider:
>
> struct A {char c[8]; void (*f)(void);} a;
>
> strcpy(a.c, "abcdefgh\x01\x23\x45\x67");
> a.f();
>
>Will generate a call to 0x01234567 or 0x67452301 on most 32 bit
>implementations, and making "a" its own segment doesn't help at all.

Ok, I would claim that this shows how C doesn't like segmentation (it was
developed with a flat memory model, and you really do run into problems when
you run it on a segmented machine).

Anyway, the function call above, in the 386 model, would either be a "near" or
"far" pointer.

If it is a near pointer it can only point to an address in the same code
segment as the caller. If it is a far pointer then the segment has to be a
valid code segment for a call through *f to work.

Of course, with an architecture that wasn't so tight with selectors (8K is just
not enough), a.c would have its own segment. And languages like Pascal and
Algol just didn't have these string overflow problems (if you kept bounds
checking on).

- Tim

Bakul Shah

unread,
Jan 26, 2011, 1:35:01 PM1/26/11
to
On 1/26/11 12:38 AM, Morten Reistad wrote:

> In article<ihnuu9$1q24$1...@gal.iecc.com>, John Levine<jo...@iecc.com> wrote:
>
>>> ? Aren't most of the major costs of COT and COW the same? Taking the
>>> trap, allocating a free page, copying, (& doing the same for the
>>> parent page table the first time around).
>>
>> Depends on the architecture. If you have inverted page tables as on
>> the PowerPC, a given physical page can only be mapped to one place, so
>> while a COW page is shared, you have to remap it every time a
>> different process tries to touch it, potentially at every context
>> switch. With COT you don't map it at all until it's touched, then you
>> make a private copy. Or there's the pre-ESA/390 IBM mainframe where
>> you had to poll the change bits on all the writable pages on each
>> context switch to see if you needed to make another copy.
>
> This strategy works badly in a modern OS setting. First, it does
> not work at all with SMP; and it has severe issues with multiple
> threads inside the same memory space.

Which strategy? Inverted page tables or copy-on-{write,touch}?
I don't think COW/COT have problems with SMP (you do have to
protect the shared operation with a mutex) or with multiple
threads. SGI did COW & threads. So do more modern OSes.

>>> I always wondered if one could somehow combine page tables and shared
>>> lib jump tables to decrease the cost of accessing shared libs.

I didn't express that well but what I was wondering about is
if there is a cheaper way to do dynamic linking, with linking
cost similar to current dynamic linking and access cost similar
to static linking. Probably not. If you did complete runtime
linking that resolves references just like in static linking,
there would be no difference in the function calling or global
data access cost but doing such complete reference resolution
would be rather expensive. IIRC, prior to shared libs, some
programs did this (I think they used "ld -A" or some such
on their own binary -- maybe the ELK scheme interpreter did
this?).

> Modern systems have copped out of this, and make linkage
> segments for the impure pointers, so the shared libs and the
> code referencing them can remain pure.
>
> Having COW on linkage segments enables the linker to be a fully
> user-mode process. Otherwise the linker will have to be
> manipulating page entries that requires some priviliges.
> Like on the 360 systems.

IIRC, rtld (ld-elf.so), the runtime loader, just uses mmap tricks;
I didn't think COW has anything to do with full link-editing in
user-mode operation, but I could be wrong; my knowledge of this
has paged out (I looked into this in the 2003 timeframe).

> A linker is a beast where I can envision a whole Pandora's
> box of attacks, and is definitely best kept in user space at
> a low privilege level.

Agreed.

nm...@cam.ac.uk

unread,
Jan 26, 2011, 1:07:58 PM1/26/11
to
In article <4D406955...@bitblocks.com>,

Bakul Shah <use...@bitblocks.com> wrote:
>
>I didn't express that well but what I was wondering about is
>if there is a cheaper way to do dynamic linking, with linking
>cost similar to current dynamic linking and access cost similar
>to static linking. Probably not. If you did complete runtime
>linking that resolves references just like in static linking,
>there would be no difference in the function calling or global
>data access cost but doing such complete reference resolution
>would be rather expensive. IIRC, prior to shared libs, some
>programs do did this (I think they used "ld -A" or some such
>on their own binary -- may be the ELK scheme interpreter did
>this?).

I haven't studied the methods used by modern dynamic linkers in
any detail, but you do NOT need any particular hardware support,
or even any operating system support. Been there - done that.


Regards,
Nick Maclaren.

robert...@yahoo.com

unread,
Jan 26, 2011, 2:21:47 PM1/26/11
to
On Jan 26, 10:36 am, timcaff...@aol.com (Tim McCaffrey) wrote:
> In article
> <b67406a8-5ddf-486a-b6f6-52528df84...@k21g2000prb.googlegroups.com>,
> robertwess...@yahoo.com says...


If you can generate a valid flat address, it's not going to be much
harder to generate a valid segmented address. The real value comes
from not being able to inject new code into executable regions (and
most systems can accomplish that with page attributes), but even then
if you can coerce existing code to do what you want, munging a
function pointer will suffice. In C for example, if you can arrange
to call system() with a string of your choosing, you can probably
accomplish some interesting things. Address space randomization makes
life tougher for the attacker here (which again is usually done
without segments).


> Of course, with a architecture that wasn't so tight with selectors (8K is just
> not enough), a.c would have it's own segment.  And languages like Pascal and
> Algol just didn't have these string overflow problems (if you kept bounds
> checking on).


Making a.c its own segment is really tough in C. You are, after all,
allowed to memcpy() the contents of one instance of struct A to
another, or from one instance of A to an appropriately sized array of
unsigned chars and then back to an instance of A. And you can do this
in code that no longer knows that some pointer is pointing to an
instance of A.

C really does like to consider objects as contiguous runs of bytes.

nm...@cam.ac.uk

unread,
Jan 26, 2011, 1:54:56 PM1/26/11
to
In article <ihpiid$i09$1...@USTR-NEWS.TR.UNISYS.COM>,
>> struct A {char c[8]; void (*f)(void);} a;
>>
>> strcpy(a.c, "abcdefgh\x01\x23\x45\x67");
>> a.f();
>>
>>Will generate a call to 0x01234567 or 0x67452301 on most 32 bit
>>implementations, and making 'a' its own segment doesn't help at all.

>
>Ok, I would claim that this shows how C doesn't like segmentation (it was
>developed with a flat memory model, and you really do run into problems when
>you run it on a segmented machine).

I am sorry, but that is complete nonsense. There are problems with
POSIX and segmentation, but none at all with C, and the systems it
was originally developed for were segmented, though not in the same
way the Intel x86 was/is.

Robert Wessel is correct that segmentation doesn't help at all with
such checking, but that's an ENTIRELY different matter.


Regards,
Nick Maclaren.

Bakul Shah

unread,
Jan 26, 2011, 3:00:25 PM1/26/11
to

You need OS support to say this here block of data (read from
a file) is actually code and please allow me to execute it. Other
stuff can be done in user mode. mmap() for example!

nm...@cam.ac.uk

unread,
Jan 26, 2011, 2:32:18 PM1/26/11
to
In article <4D407D59...@bitblocks.com>,

You don't even need that. The bare minimum is NOTHING but the
ability to load an executable and extract its symbol table (often
as separate operations). If you can't mark data as executable,
you get an extra load on some (but not all) architectures, but
that is all.


Regards,
Nick Maclaren.

Tim McCaffrey

unread,
Jan 26, 2011, 7:10:39 PM1/26/11
to
In article <ihpqm0$fsf$1...@gosset.csi.cam.ac.uk>, nm...@cam.ac.uk says...

>
>In article <ihpiid$i09$1...@USTR-NEWS.TR.UNISYS.COM>,
>Tim McCaffrey <timca...@aol.com> wrote:

>>Ok, I would claim that this shows how C doesn't like segmentation (it was
>>developed with a flat memory model, and you really do run into problems
when
>>you run it on a segmented machine).
>
>I am sorry, but that is complete nonsense. There are problems with
>POSIX and segmentation, but none at all with C, and the systems it
>was originally developed for were segmented, though not in the same
>way the Intel x86 was/is.

Was that the PDP-9? The PDP-11 certainly didn't have segments.

A lot of C code relied (past tense, I think this is less so now) on flat
address space math with pointers (and sizeof(ptr) == sizeof(int)).

>
>Robert Wessel is correct that segmentation doesn't help at all with
>such checking, but that's an ENTIRELY different matter.
>

Well, I don't agree, having used a system/OS/language where it did help (a
lot).

C is not the best language if you want RAS/security/etc.
Unix was written in C, so it will tend to have the same attitudes/blind
spots.


All IMHO, of course.

- Tim

Robert Myers

unread,
Jan 26, 2011, 7:20:28 PM1/26/11
to
On Jan 26, 7:10 pm, timcaff...@aol.com (Tim McCaffrey) wrote:

>
> C is not the best language if you want RAS/security/etc.
> Unix was written in C, so it will tend to have the same attitudes/blind
> spots.
>

Best language, Tim? Nothing aside from RAS/Security matters.

Robert

N.B. I *****hate***** C and its descendents. I'd spend the rest of
my life ridding the world of it if there were a plausible replacement.

John Levine

unread,
Jan 26, 2011, 11:50:18 PM1/26/11
to
>I haven't studied the methods used by modern dynamic linkers in
>any detail, but you do NOT need any particular hardware support,
>or even any operating system support. Been there - done that.

I have studied them in considerable detail* and mostly agree with you.
On systems with a single process address space, all you need is
something like mmap() to map disk files into the process. Systems
with ELF shared libraries (BSD, Linux, etc.) work this way. On an
exec() call, the kernel maps in the main program, maps in ld.so the
dynamic loader, sets up the stack, and starts the loader. It then
uses normal user mode code to map in all the libraries, do any
required linking and relocation, and then jumps to the start of the
program.

The only other place you might need kernel help is on systems where
code and data addresses are different, so you need the OS to help
you mark code pages as executable.
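
The same user-mode machinery can also be driven piecemeal at run time
through dlopen() - a trivial sketch (the SONAME is Linux-flavoured;
older glibc wants -ldl at link time):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* ld.so's work, on demand: mmap the library in, relocate
           it, resolve a symbol - all in user mode. */
        void *h = dlopen("libm.so.6", RTLD_NOW);
        if (!h) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        double (*cosine)(double) =
            (double (*)(double))dlsym(h, "cos");
        if (cosine)
            printf("cos(0) = %f\n", cosine(0.0));

        dlclose(h);
        return 0;
    }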

R's,
John

* http://astore.amazon.com/theinvincibleele/detail/1558604960

John Levine

unread,
Jan 26, 2011, 11:52:07 PM1/26/11
to
>POSIX and segmentation, but none at all with C, and the systems it
>was originally developed for were segmented, though not in the same
>way the Intel x86 was/is.

You must be thinking of some other language. The first versions of C
ran on the GE 635 and PDP-11, both of which have a flat address space.

R's,
John

Bakul Shah

unread,
Jan 27, 2011, 2:07:39 AM1/27/11
to

You need OS support. Modern processors won't allow you to fetch
code from a page whose PTE doesn't have the execute bit set &
that marking is in the OS's domain. You may be thinking of
earlier Intel processors which didn't have a separate execute
bit in PTEs, which was exploited by virus writers to inject
code in the data segment and jump to it (or "return" to it).
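
A minimal sketch of that OS-mediated dance on a POSIX system (the
machine code bytes are x86-64, error checks are elided, and hardened
W^X platforms add further hoops):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* x86-64 for: mov eax, 42 ; ret */
        static const unsigned char code[] =
            { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
        size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

        /* Stage the bytes in a writable, non-executable page... */
        unsigned char *buf = mmap((void *)0, pagesz,
                                  PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memcpy(buf, code, sizeof code);

        /* ...then ask the OS to flip it executable.  Without this,
           instruction fetch from the page traps. */
        mprotect(buf, pagesz, PROT_READ | PROT_EXEC);

        int (*fn)(void) = (int (*)(void))buf;
        printf("%d\n", fn());           /* prints 42 */
        return 0;
    }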

On dynamic linking you may wish to read John Levine's excellent
book on Linkers & Loaders (don't think much has changed since
1999 so still mostly up-to-date).

nm...@cam.ac.uk

unread,
Jan 27, 2011, 9:38:48 AM1/27/11
to

I have more respect for Dennis Ritchie than THAT! He was designing
a semi-portable assembler, and there is no way that he would have
not developed the language for a reasonable range of the machines
it was likely to be used on.

I cannot now remember exactly which machines had separate code
and data spaces (and occasionally even more than one data space),
where code address 0x1234 did not refer to the same location as
data address 0x1234, but they were fairly widespread, and I think
that some PDPs were among them.

And such a system does NOT have a flat address space - I described it
as segmented, but not in the same way the Intel x86 was/is, which
is true.


Regards,
Nick Maclaren.

Andy "Krazy" Glew

unread,
Jan 27, 2011, 2:24:07 PM1/27/11
to
On 1/26/2011 10:35 AM, Bakul Shah wrote:
> On 1/26/11 12:38 AM, Morten Reistad wrote:
>> In article<ihnuu9$1q24$1...@gal.iecc.com>, John Levine<jo...@iecc.com>
>> wrote:
> I don't think COW/COT have problems with SMP (you do have to
> protect the shared operation with a mutex) or with multiple
> threads. SGI did COW & threads. So do more modern OSes.

COW/COT don't have problems with SMP - nothing impossible to do.

However, any modification of the page tables potentially requires a TLB
shootdown. Or at least those modifications that tighten permission.

E.g. a R/W PTE is cached in many TLBs. You COW it to read-only, and you
must invalidate the copy in all remote TLBs.

The relative cost of TLB shootdown has increased over the years.

nm...@cam.ac.uk

unread,
Jan 27, 2011, 3:31:48 PM1/27/11
to
In article <67GdnRzDc-nDW9zQ...@giganews.com>,

Not to say, the potential for chaos when things go wrong :-(

That's one of the many reasons that I think that the approach of
confounding threads and processes is a mistake. With my preferred
approach, the cost of COW/COT for threaded programs would go up
very considerably, but it would be much simpler and cleaner.
I.e. you would never have a TLB shootdown between threads - it
would quiesce the whole process first.

Regards,
Nick Maclaren.

Rick Jones

unread,
Jan 27, 2011, 3:41:30 PM1/27/11
to
"Andy \"Krazy\" Glew" <an...@spam.comp-arch.net> wrote:
> E.g. a R/W PTE is cached in many TLBs. You COW it to read-only, and
> you must invalidate the copy in all remote TLBs.

> The relative cost of TLB shootdown has increased over the years.

Well, if the CPU designers had been able to keep giving us ever faster
cores, we wouldn't have so many TLBs to go shoot-at in the first place
:)

rick jones
--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

MitchAlsup

unread,
Jan 27, 2011, 5:14:19 PM1/27/11
to
On Jan 27, 2:41 pm, Rick Jones <rick.jon...@hp.com> wrote:

> "Andy \"Krazy\" Glew" <a...@spam.comp-arch.net> wrote:
>
> > E.g. a R/W PTE is cached in many TLBs.  You COW it to read-only, and
> > you must invalidate the copy in all remote TLBs.
> > The relative cost of TLB shootdown has increased over the years.
>
> Well, if the CPU designers had been able to keep giving us ever faster
> cores, we wouldn't have so many TLBs to go shoot-at in the first place
> :)

I have (had?) been advocating coherent TLBs for about 15 years. {A
coherent TLB sees modifications to the page tables and takes
appropriate action to avoid the need to shootdowns.}

At least ONE system could not BOOT with a coherent TLB, and required
the TLB to hold onto translations while some boot process was moving
the page tables around in memory. {So rather than fix the firmware
(which had a new release 3 times a year), they accepted the overhead of
the MP shootdowns and the inherent software overhead}

None of the CPU architectures I suggested this to were capable of accepting
the microarchitectural changes necessary to support this. Even though
it would get rid of all of the IPIs associated with the shootdowns
(and some of the changes would have made TLB loading faster as well).
In the end, MS basically pulled off a delayed IPI software subsystem
that dramatically reduced the number of IPIs due to TLB shootdowns at
some cost in recycling pages from active to allocatable.

Mitch

John Levine

unread,
Jan 27, 2011, 5:26:59 PM1/27/11
to
>I have more respect for Dennis Ritchie than THAT! He was designing
>a semi-portable assembler, and there is no way that he would have
>not developed the language for a reasonable range of the machines
>it was likely to be used on.

You don't have to speculate. In 1993 he wrote an article about it:

http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

The first target was the PDP-11, then the GE 635, and IBM 360/370, all
of which have flat address spaces.

>I cannot now remember exactly which machines had separate code
>and data spaces (and occasionally even more than one data space),

Oh, that. Yes, the PDP-11/45 and PDP-11/70 had separate address and
data spaces, but it's really a stretch to call that segmented. All
the data were in one address space and all the code was in the other.
Data pointers were flat 16 bit addresses, code pointers were flat 16
bit addresses.

Having done quite a lot of Unix kernel hackery in that era (there was
an unused third system mode which I used to provide direct access to
bitmap terminal screen memory), and having studied the PDP-11 C
compiler enough to use its code generation pass unmodified in a
production Fortran 77 compiler, I can promise you that there was
nothing in the compiler affected by split instructions and data other
than a few lines in the code generator to tell the asssembler whether
to emit into the code section or the data section of the object file.

We had an 11/05, with a single 16 bit address space, that did the
terminal emulation, sharing the screen memory. I compiled the C code
for it using the regular Unix compiler. The only tricky bit was
avoiding code that would generate multiply or divide instructions,
since the 11/05 didn't have those.

R's,
John

nm...@cam.ac.uk

unread,
Jan 27, 2011, 5:12:42 PM1/27/11
to
In article <ihsrfj$2e2q$1...@gal.iecc.com>, John Levine <jo...@iecc.com> wrote:
>
>>I have more respect for Dennis Ritchie than THAT! He was designing
>>a semi-portable assembler, and there is no way that he would have
>>not developed the language for a reasonable range of the machines
>>it was likely to be used on.
>
>You don't have to speculate. In 1993 he wrote an article about it:
>
>http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
>
>The first target was the PDP-11, then the GE 635, and IBM 360/370, all
>of which have flat address spaces.

I am not speculating. I said that I have more respect for Dennis
Ritchie than you imply I should. A competent language designer
will allow for not merely the immediate targets, but the ones for
which it is reasonably likely that might be used.

There are a fair number of restrictions and constraints in most
languages that don't get used initially, or sometimes at all,
because the designers didn't want to constrain their target systems
too much, but where those targets weren't actually used.

>>I cannot now remember exactly which machines had separate code
>>and data spaces (and occasionally even more than one data space),
>
>Oh, that. Yes, the PDP-11/45 and PDP-11/70 had separate address and
>data spaces, but it's really a stretch to call that segmented. All
>the data were in one address space and all the code was in the other.
>Data pointers were flat 16 bit addresses, code pointers were flat 16
>bit addresses.

In the context of this thread, which was PRECISELY about a union
of a data pointer and code pointer, that is precisely the non-flat
property that was being discussed. What would YOU call that, if
not segmented? Because it is assuredly not flat in the BCPL, C
and POSIX sense.


Regards,
Nick Maclaren.


Andy "Krazy" Glew

unread,
Jan 27, 2011, 8:24:45 PM1/27/11
to
On 1/27/2011 2:14 PM, MitchAlsup wrote:
> On Jan 27, 2:41 pm, Rick Jones<rick.jon...@hp.com> wrote:
>> "Andy \"Krazy\" Glew"<a...@spam.comp-arch.net> wrote:
>>
>>> E.g. a R/W PTE is cached in many TLBs. You COW it to read-only, and
>>> you must invalidate the copy in all remote TLBs.
>>> The relative cost of TLB shootdown has increased over the years.
>>
>> Well, if the CPU designers had been able to keep giving us ever faster
>> cores, we wouldn't have so many TLBs to go shoot-at in the first place
>> :)
>
> I have (had?) been advocating coherent TLBs for about 15 years. {A
> coherent TLB sees modifications to the page tables and takes
> appropriate action to avoid the need to shootdowns.}

Present TLB architectures (on x86):

* non-coherent (non-snoopy) TLBs. Essentially, software managed TLB
coherence.

* no hardware TLB shootdown mechanism. Ya gotta do it yourself using
interprocessor interrupts.

Various other ISAs have had hardware TLB shootdown mechanisms - e.g,.
instructions that caused a message to be sent around to some subset of
the other processors in the system, saying "invalidate all of your TLB"
or "invalidate the TLB entry for such and such a virtual address"
(requires some notion of ASIDs) or "invalidate all TLB entries for such
and such a physical address" (which, by the way, requires that all
processors have the same notion of physical address, or a translation).

Most of these schemes work well in some configurations, but fail in
others. E.g. the Itanium TLB shootdown instruction originally waited
until all remote processors had performed the TLB invalidation. And
there could only be one such at a time. Can you say BOTTLENECK?

So, the hardware guys go off and design asynchronous and/or pipelined
and/or batched versions of the TLB invalidation instruction. And in
the meantime, the software guys fall back to using ... interrupt based
TLB shootdown.

And, as Mitch mentions, there are OSes and BIOSes and, yes, even CPU
microcode routines that depend (in an implicit, non-documented, FUD-ful
manner) on not being able to put off TLB invalidations until some
critical section is done. For CPU microcode, this happens naturally
because of interrupt blocking.

If you have a TLB shootdown instruction, you have to be able to block it
in certain critical sections. Much like you need to block interrupts.

If you want to build a coherent, snoopy, TLB, well, you *COULD* block
writes to the pages holding TLB entries in those critical sections.
But, that is a recipe for deadlock (P1's write is blocked waiting for
P2 to get out of a TLB critical section, but P2 needs a flag held by P1).


Better to provide instructions or operations so that OS and microcode
can hold translations in registers - thereby holding in registers all of
the state needed for those critical sections.

Then you can start doing coherent TLBs or TLB shootdown instructions.


In any case, you'll have problems with those existing but undocumented
FUDdish TLB critical sections. Either take a deep breath and go after
them, validating carefully.

(This is probably what you would do. Probably with special caveats
like "You can use the TLB shootdown for all "ordinary" user pages, but
you may want to fall back to interrupt-based TLB shootdown for pages
mapping certain hardware data structures."
But... virtual machines say that all privileged pages look to
somebody else like ordinary user pages.)


Or come up with a scheme that allows the existing code to run
(especially microcode). This might be one of the best uses of
transactional memory: put all such microcode flows in a transaction.
If TLB shootdown affects them, restart the transaction. If not making
forward progress, stop the world while that instruction finishes.


This works okay for microcode flows, since we know to put the TX_START
at TX_END at the beginning and end of flow.

It would not work so well for OS code, since we don't know where to put
the transactions. Wrapping all OS code in such a transaction might be
correct, but badly slow. (And not even correct if you ever do
non-interrupt based interprocessor communication - "I'll set a flag
and spin until you clear it".)

There's nothing conceptually hard here. It's just a simple matter of
programming, making changes to code whose original author retired many
years ago.

Andy "Krazy" Glew

unread,
Jan 27, 2011, 8:32:15 PM1/27/11
to

I would call that a flat but partitioned address space.

Or, rather, two different address spaces that are each, respectively,
flat. And fixed size to boot. And non-overlapping (although you could
arrange for them to overlap).

The term segment is usually used in the sense of multiple
variable-sized spans or objects or ranges.

Yes, the term segment is overloaded. But the variable-sized sense is
the most common sense.

Bakul Shah

unread,
Jan 28, 2011, 1:36:57 AM1/28/11
to

It's a one-time cost, and worth it if the cost of copying a page is less
than that of invalidating the TLB on other cores. The other thing to note
is that regardless of COW, running threads on N cores is going to scale
only so far.

Terje Mathisen

unread,
Jan 28, 2011, 1:40:41 AM1/28/11
to
nm...@cam.ac.uk wrote:
> In the context of this thread, which was PRECISELY about a union
> of a data pointer and code pointer, that is precisely the non-flat
> property that was being discussed. What would YOU call that, if
> not segmented? Because it is assuredly not flat in the BCPL, C
> and POSIX sense.

Nick, even if you store a code (function) pointer in a data segment,
that does not automatically turn said segment into a code segment. (Or
did the PDP require all function pointers to be located in the code
segment? That would indeed make it much harder to update such things!)

It is the act of doing an indirect branch via the contents of said
pointer that turns it into a code pointer, but viewed in memory it is
still just a 16-bit value without any annotations.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Bakul Shah

unread,
Jan 28, 2011, 2:13:59 AM1/28/11
to
On 1/27/11 2:24 PM, Morten Reistad wrote:
> In article<4D406955...@bitblocks.com>,
> Bakul Shah<use...@bitblocks.com> wrote:
>> On 1/26/11 12:38 AM, Morten Reistad wrote:
>>> In article<ihnuu9$1q24$1...@gal.iecc.com>, John Levine<jo...@iecc.com> wrote:
>
>>> This strategy works badly in a modern OS setting. First, it does
>>> not work at all with SMP; and it has severe issues with multiple
>>> threads inside the same memory space.
>>
>> Which strategy? Inverted page tables or copy-on-{write,touch}?

>> I don't think COW/COT have problems with SMP (you do have to
>> protect the shared operation with a mutex) or with multiple
>> threads. SGI did COW & threads. So do more modern OSes.
>
> To have the linker manipulate (kernel) tables directly; like
> on the various OS'es for the 360 architecture.

Ah, I get it. Thanks!

> Linkers are complicated animals, and very prone to various
> attacks. Lots of the windows attacks get in via manupulation
> of the linkage handling through some strategic bytecode
> injection.
>
> Having this stuff running with kernel privilges is not a good
> strategy.

Agreed on both counts. I put the idea out here in the hope it
triggers someone's synapses to come up with a *better* solution
(I gave the criterion). Not surprising that it has not.

nm...@cam.ac.uk

unread,
Jan 28, 2011, 2:58:48 AM1/28/11
to
In article <b05a18-...@ntp6.tmsw.no>,

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
>> In the context of this thread, which was PRECISELY about a union
>> of a data pointer and code pointer, that is precisely the non-flat
>> property that was being discussed. What would YOU call that, if
>> not segmented? Because it is assuredly not flat in the BCPL, C
>> and POSIX sense.
>
>Nick, even if you store a code (function) pointer in a data segment,
>that does not automatically turn said segment into a code segment. (Or
>did the PDP require all function pointers to be located in the code
>segment? That would indeed make it much harder to update such things!)

As far as I know, it didn't.

>It is the act of doing an indirect branch via the contents of said
>pointer that turns it into a code pointer, but viewed in memory it is
>still just a 16-bit value without any annotations.

Precisely. But the point is that addresses were NOT a simple 0:N
for all of memory, which is what is usually meant by a flat
addressing model. And, unlike (most of) the BCPL designs, you
could not simply use a function pointer as a data pointer and get
access to the code of the function.


Regards,
Nick Maclaren.


Tim McCaffrey

unread,
Jan 28, 2011, 6:08:14 PM1/28/11
to
In article <0150d12d-2c25-4314...@21g2000prv.googlegroups.com>,
rbmye...@gmail.com says...

I won't claim I know all languages, so my choices are from a somewhat limited
background.

I'm not thrilled with Java (or any GC based language).
I do like Pascal, although its got its own problems (Delphi seemed very "sane"
to me, not sure why it didn't take off).
Algol is a bit primitive (at least the one I worked with).
Snobol was fun, but like Perl, you can get a bit lost.
Cobol, well....
Fortran 77 needs help; I haven't learned Fortran 90.
Lisp and I just didn't get along (although I did port an interpreter).
APL looked useful if you need the ultimate calculator.
I never used Modula2, but it always looked like slightly updated Pascal to me.

Perhaps after I retire (which is hopefully a long ways off, unless I win the
lottery) I'll write the compiler for the language I wish I had all these
years. Of course, it'll be too late to do anybody any good... :(


- Tim

MitchAlsup

unread,
Jan 28, 2011, 7:21:48 PM1/28/11
to
On Jan 26, 6:20 pm, Robert Myers <rbmyers...@gmail.com> wrote:
> N.B.  I *****hate***** C and its descendents.  I'd spend the rest of
> my life ridding the world of it if there were a plausible replacement.

But as of now, there is not, and looking towards the future, I do not
see how one could ever be--unless it looked and smelled so much
like C that one could cross compile C into it--and at that point, why
bother?

In addition, there are people who can write error-free C (or at least
as error-free as can be written in languages that watch out really hard
for the programmer--trying to detect errors before .....) There might not
be many of these people, though.....

Mitch

Bill Findlay

unread,
Jan 28, 2011, 8:19:55 PM1/28/11
to
On 28/01/2011 23:08, in article ihvi8u$q15$1...@USTR-NEWS.TR.UNISYS.COM, "Tim
McCaffrey" <timca...@aol.com> wrote:

> I won't claim I know all languages, so my choices are from a somewhat limited
> background.
>
> I'm not thrilled with Java (or any GC based language).
> I do like Pascal, although its got its own problems (Delphi seemed very "sane"
> to me, not sure why it didn't take off).
> Algol is a bit primitive (at least the one I worked with).
> Snobol was fun, but like Perl, you can get a bit lost.
> Cobol, well....
> Fortran 77 needs help, I'm haven't learned Fortran 90.
> Lisp and I just didn't get along (although I did port an interpreter).
> APL looked useful if you need the ultimate calculator.
> I never used Modula2, but it always looked like slightly updated Pascal to me.
>
> Perhaps after I retire (which is hopefully a long ways off, unless I win the
> lottery) I'll write the compiler for the language I wish I had all these
> years. Of course, it'll be too late to do anybody any good... :(

Why do you consider only ancient languages?
If you like Pascal, you should love Ada 2012.
(And so should anyone, who is serious about RAS.)

--
Bill Findlay
with blueyonder.co.uk;
use surname & forename;


John Levine

unread,
Jan 29, 2011, 12:21:03 AM1/29/11
to
>>Nick, even if you store a code (function) pointer in a data segment,
>>that does not automatically turn said segment into a code segment. (Or
>>did the PDP require all function pointers to be located in the code
>>segment? That would indeed make it much harder to update such things!)

Heck, no. Pointers were in the data segment. The only data fetched
from the code segment was immediate values in instructions, using the
(PC) or (PC)+ addressing mode.

>Precisely. But the point is that addresses were NOT a simple 0:N
>for all of memory, which is what is usually meant by a flat
>addressing model. And, unlike (most of) the BCPL designs, you
>could not simply use a function pointer as a data pointer and get
>access to the code of the function.

As I recall, we considered it a feature that you couldn't go smashing
your code by storing into it. In any event, we're in Humpty Dumpty
land here. The usual term for separate instruction and data
addressing is a Harvard architecture. (Not a very good name, since it
has little to do with the computers Aiken built at Harvard, but it's
too late to fix now.) If you want to call that segments, nobody's
going to stop you, but it's not what anyone else I know would call
segmented addressing. Dennis can certainly speak for himself, but I
would be pretty surprised if the C type system was invented to deal
with PDP-11 addressing, rather than to improve the error checking and
code generation.

R's,
John
