Multics Concepts For the Contemporary Computing World

John Ahlstrom

Jun 27, 2003, 3:07:47 PM

In alt.os.multics
Tom Van Vleck wrote:
>
> I don't have time now to do this topic justice, but here are a
> few remarks, from the point of view of someone who left the
> OS design team in 1981.
>
--snip snip
>
> 7. Discussion on the "kernel" point has missed a key aspect.
> Microkernel systems such as Mach work by message passing. Multics
> had a notion of a "kernel" and there was a design project to
> separate ring 0 into kernel and non-kernel, and multiple projects
> to move stuff out of ring 0, mostly never shipped. But these two
> are not the same thing: there was never any proposal to introduce
> message-passing calls into the Multics architecture. So this is
> a big choice, to be made at the very beginning. Message passing
> architectures like Mach's are great for structure, but there's a
> heavy performance penalty you pay up front, in argument
> marshaling and so on. I worked on Tandem systems, and because
> they were fundamentally message passing, they were able to expand
> to multiprocessors and clusters with ease.
-snip snip

What about architectural support for message passing?
IIRC the GEC 4080 had such support.
From: http://www.cucumber.demon.co.uk/geccl/4000series/4080sales.html

> 4000 NUCLEUS TIMES
> (microseconds, typical)
[JKA: machine had 550nsec memory cycle]
> Semaphore operations
> no program change - 4.95
> program change - 35
> Segment load
> - 7.5
> Inter-chapter branch
> no segment change - 4.6
> segment change - 9.1
> Start input/output
> - 20.0
> Interrupt
> no program change- 8.7
> program change - 42
> Inter-process message
> no program change - 35
> program change - 55
>


Does the 4080 have any successors?
Any similar support in other architectures?
Can such support change the performance penalty enough to
make message passing cost-effective?

--
I don't think average programmers would get along very well
with languages that force them to think about their design
decisions before they plunge into coding.
Brian Inglis

Stephen Fuld

Jun 27, 2003, 3:40:50 PM

"John Ahlstrom" <jahl...@cisco.com> wrote in message
news:3EFC9603...@cisco.com...

>
> In alt.os.multics
> Tom Van Vleck wrote:
> >
> > I don't have time now to do this topic justice, but here are a
> > few remarks, from the point of view of someone who left the
> > OS design team in 1981.
> >
> --snip snip
> >
> > 7. Discussion on the "kernel" point has missed a key aspect.
> > Microkernel systems such as Mach work by message passing. Multics
> > had a notion of a "kernel" and there was a design project to
> > separate ring 0 into kernel and non-kernel, and multiple projects
> > to move stuff out of ring 0, mostly never shipped. But these two
> > are not the same thing: there was never any proposal to introduce
> > message-passing calls into the Multics architecture. So this is
> > a big choice, to be made at the very beginning. Message passing
> > architectures like Mach's are great for structure, but there's a
> > heavy performance penalty you pay up front, in argument
> > marshaling and so on. I worked on Tandem systems, and because
> > they were fundamentally message passing, they were able to expand
> > to multiprocessors and clusters with ease.
> -snip snip
>
> What about architectural support for message passing?

Didn't the Elxsi "mini-super" computer have such support?

--
- Stephen Fuld
e-mail address disguised to prevent spam


Russell Williams

Jun 30, 2003, 7:01:45 PM
"Stephen Fuld" <s.f...@PleaseRemove.att.net> wrote in message
news:651La.24517$3o3.1...@bgtnsc05-news.ops.worldnet.att.net...

Elxsi implemented message passing in hardware and microcode, and
used the control of link (port in Mach terms) ownership as the
fundamental system security mechanism, (along with the fact that
only the memory manager process had the hardware page tables
in its virtual address space). I/O completions showed up as messages
from controllers. It was basically a multi-server (GNU Hurd-like)
system. Message passing was at least an order of magnitude
slower than a function call (more if you sent much data by value).

1-2 orders of magnitude is well within the bounds where reasonable
partitioning of the OS would make the cost of message passing
insignificant. (On the other hand, a couple of bad partitioning
decisions were made that made those costs painful; refactoring had
to occur). The benefit was that we got excellent scaling from 1-12
processors, including the first (AFAIK) observations of
super-linear speedup (because adding processors added cache).

The machine was strange by today's standards in other ways: 64 bit
registers / integers, but only 32 bit virtual addresses. Cobol screamed
because you could do decimal in registers. It also had the first
fast implementations of full IEEE SP/DP floating point.

The hardware based messages and multi-server structure made for
some strange effects: on a machine with lots of RAM, you could be
using the source debugger on the memory manager while other users
continued their work without pause. We had a Unix server that
accepted "system call" messages from Posix processes (again, a good
partitioning got us lots of parallelism by farming out work to other
servers without too much time spent in message passing).

A technically interesting and successful design, but both its technical and
marketing niches were closed by the advance of the killer micros. Our
big competition was high-end VAXes, at a time when VAX software was
already entrenched, and the market for that class of hardware was being
supplanted by RISC workstations.

Russell Williams
not speaking for Adobe Systems


Stephen Fuld

Jun 30, 2003, 11:32:10 PM

"Russell Williams" <williams...@adobe.com> wrote in message
news:tj3Ma.2789$Ry3.1...@monger.newsread.com...

> "Stephen Fuld" <s.f...@PleaseRemove.att.net> wrote in message
> news:651La.24517$3o3.1...@bgtnsc05-news.ops.worldnet.att.net...
> >
> > "John Ahlstrom" <jahl...@cisco.com> wrote in message
> > news:3EFC9603...@cisco.com...
> > >
> > > In alt.os.multics
> > > Tom Van Vleck wrote:
> > > What about architectural support for message passing?
> >
> > Didn't the Elxsi "mini-super" computer have such support?
>
> Elxsi implemented message passing in hardware and microcode,

Rest of very good explanation snipped

Thanks, I am glad I remembered correctly and your explanation of both the
technical and business issues was well done. I remember that it used huge
boards with ECL circuitry and big fans and thus was unsuitable for what we
were looking for at the time, but I remember being impressed with the
thought that went into its design.

So, the obvious question is then, is there something that makes sense from
that idea to adapt into current microprocessor designs in order to give the
advantages of low cost message passing, and ease the development of more
modular software that would use it?

Cliff Sojourner

Jul 1, 2003, 12:21:24 AM
> So, the obvious question is then, is there something that makes sense from
> that idea to adapt into current microprocessor designs in order to give the
> advantages of low cost message passing, and ease the development of more
> modular software that would use it?

if it were easy to get the benefits of a message passing OS then it would have
happened a long time ago.

programming a Tandem, for example, requires a very different mindset than
programming any *NIX system. by "programming" I mean "doing it properly".

also, as was pointed out earlier in this thread, not all applications can or
should pay the huge cost of message passing for the relatively minor gains
of scalability, atomicity, fault tolerance, manageability, reliability, etc.

but you're on the right track - how can we make message passing systems
attractive to "regular" applications?

tough question!


Rupert Pigott

Jul 1, 2003, 4:41:57 AM
"Cliff Sojourner" <c...@employees.org> wrote in message
news:8%7Ma.4154$Xm3.1087@sccrnsc02...

> > So, the obvious question is then, is there something that makes sense from
> > that idea to adapt into current microprocessor designs in order to give the
> > advantages of low cost message passing, and ease the development of more
> > modular software that would use it?
>
> if it were easy to get the benefits of message passing OS then it would have
> happened a long time ago.
>
> programming a Tandem, for example, requires a very different mindset than
> programming any *NIX system. by "programming" I mean "doing it properly".
>
> also, as was pointed out earlier in this thread, not all applications can or
> should pay the huge cost of message passing for the relatively minor gains
> of scalability, atomicity, fault tolerance, manageability, reliability, etc.

By 'cost' do you mean that it takes longer to communicate
via message passing than shared memory?

I don't see why this should be so. In a NUMA system or a
message passing system, for a message to get from CPU A to
CPU B it will still have to travel along a very similar
signal path. So it can't be the plumbing that slows it down.

If you are talking about a locally delivered message then
perhaps it could be slower, simply because you are eating
bandwidth to make a copy (and pranging the cache to boot).

The trick seems to be to make messages cheap in the
hardware, this has been done many many times. From my
point of view keeping it simple is the way to go here,
so you start by throwing your Ethernet gear into a skip. :)

I've seen a few people smugly compare TCP over Ethernet
with custom built shared-memory pipes down the years and
make the bogus leap of intuition that message passing
is slow... I suppose to counter that they would really
need to see how shared memory performs across TCP +
Ethernet. My guess would be : considerably more shite
than message passing.

Cheers,
Rupert


Sander Vesik

Jul 1, 2003, 8:37:06 AM

Couldn't you add something like that onto a "conventional" processor?

--
Sander

+++ Out of cheese error +++

Anne & Lynn Wheeler

Jul 1, 2003, 10:47:10 AM

"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> writes:
> By 'cost' do you mean that it takes longer to communicate
> via message passing than shared memory ?
>
> I don't see why this should be so. In a NUMA system or a
> message passing system for a message to get from CPU A to
> CPU B it will still have to travel along a very similar
> signal path. So it can't be the plumbing that slows it down
>
> If you are talking about a locally delivered message then
> perhaps it could be slower, simply because you are eating
> bandwidth to make a copy (and pranging the cache to boot).
>
> The trick seems to be to make messages cheap in the
> hardware, this has been done many many times. From my
> point of view keeping it simple is the way to go here,
> so you start by throwing your Ethernet gear into a skip. :)
>
> I've seen a few people smugly compare TCP over Ethernet
> with custom built shared-memory pipes down the years and
> make the bogus leap of intuition that message passing
> is slow... I suppose to counter that they would really
> need to see how shared memory performs across TCP +
> Ethernet. My guess would be : considerably more shite
> than message passing.

SCI was an attempt to be all things to all people (mappings for
cache memory, disk protocol, etc):
http://www.scizzl.com/
http://hsi.web.cern.ch/HSI/sci/sci.html
http://www.computer.org/proceedings/lcn/1591/15910691abs.htm
http://lists.insecure.org/linux-kernel/2001/Jul/1421.html

i just ran across a hardware announcement about targeting IPv6 in OC192
environments and doing 266 million searches/second.
http://www.commsdesign.com/story/OEG20030630S0053

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Internet trivia 20th anv http://www.garlic.com/~lynn/rfcietff.htm

Peter da Silva

Jul 1, 2003, 1:38:36 PM
In article <8%7Ma.4154$Xm3.1087@sccrnsc02>,

Cliff Sojourner <c...@employees.org> wrote:
> if it were easy to get the benefits of message passing OS then it would have
> happened a long time ago.

If the message passing is cheap enough (significantly less than system
call overhead in a "traditional" OS) then the message-passing system
can be faster than the traditional one. The problem with message
passing systems isn't the message passing overhead, it's that you have
to do a lot of work trying to avoid any service becoming a bottleneck.
Even on the Amiga where it was four instructions to put a message on a
queue the bottlenecks in the file system became a problem.

In a monolithic UNIX kernel this kind of thing comes for free: since
each system call automagically gets its own process context to handle
the whole operation from start to finish, you never end up blocked on
a read because some server somewhere was blocked on someone else's
request.

But now that I've mentioned the Amiga, I have to say that it did happen
a long time ago. There are unfortunately non-technical reasons why one
system or another becomes dominant or fails (for example, getting used
as a pawn in a war between Jack Tramiel and his former employers didn't
do the Amiga any good).

> but you're on the right track - how can we make message passing systems
> attractive to "regular" applications?

Message passing systems are a natural for GUI applications, and may
turn out still to be what they need. God knows there needs to be SOME
kind of fundamental paradigm shift in that environment.

--
#!/usr/bin/perl
$/="%\n";chomp(@_=<>);print$_[rand$.]

Peter da Silva, just another Perl poseur.

Geoff Lane

Jul 1, 2003, 1:49:33 PM
In alt.folklore.computers Peter da Silva <pe...@abbnm.com> wrote:
> If the message passing is cheap enough (significantly less than system
> call overhead in a "traditional" OS) then the message-passing system
> can be faster than the traditional one.

Message passing also has another advantage - it defines interfaces that
cannot be subverted. Monolithic kernels allow poor programmers to bypass
defined interfaces in the interests of "efficiency".

--
Geoff Lane

Barry Margolin

Jul 1, 2003, 1:57:05 PM
In article <3f01c9ad$0$56600$bed6...@pubnews.gradwell.net>,

On the other hand, it also traps you into using those interfaces. If you
don't get the design right, it can be difficult to work around it. Ideally
this shouldn't be a problem, but in a practical sense it often is.

--
Barry Margolin, barry.m...@level3.com
Level(3), Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Tom Van Vleck

Jul 1, 2003, 2:54:04 PM
"Rupert Pigott" wrote:
> I don't see why this should be so. In a NUMA system or a
> message passing system for a message to get from CPU A to
> CPU B it will still have to travel along a very similar
> signal path. So it can't be the plumbing that slows it down
>
> If you are talking about a locally delivered message then
> perhaps it could be slower, simply because you are eating
> bandwidth to make a copy (and pranging the cache to boot).
>
> The trick seems to be to make messages cheap in the
> hardware, this has been done many many times.

One cost of message based systems is making copies of
things. To make a message passing call, one has to at
minimum determine the size of the arguments, allocate a
message object, marshal the arguments into it, queue and
dequeue the message, and free the message object. If the
calling site and the called site do not share memory, then
additional copying and buffering is necessary. The storage
for the copies is either preallocated and mostly idle, or
is allocated and freed from a pool of storage, at the cost
of additional complexity; in either case it adds to memory
pressure.

Another cost is synchronization. Each allocation, freeing,
queueing, or dequeueing operation needs atomicity; whether
hidden in the hardware or done explicitly in software, this
synchronization requires some cost even if there is never a
conflict that causes one thread to delay.

My experience with message passing systems is that they
start out penalized by a factor of about two compared to
direct call systems, and that by employing many clever
strategies, can make up about half the deficit after years
of improvement. Sometimes the elegance, uniformity, and
protection provided by the message passing design is worth
it.

Stephen Fuld

Jul 1, 2003, 3:01:34 PM

"Sander Vesik" <san...@haldjas.folklore.ee> wrote in message
news:10570630...@haldjas.folklore.ee...

> In comp.arch Stephen Fuld <s.f...@pleaseremove.att.net> wrote:

snip

> > So, the obvious question is then, is there something that makes sense from
> > that idea to adapt into current microprocessor designs in order to give the
> > advantages of low cost message passing, and ease the development of more
> > modular software that would use it?
> >
>
> Couledn't you add something like that onto a "conventional" processor?

I think you essentially restated part of my question. In the Elxsi, Russell
pointed out that it needed both hardware and microcode. Now microcode is
passe on most current "conventional" processors, so you have to figure
something else out. In order to cross domains, you probably have to do some
fiddling with page tables or something. You want to avoid the overhead of a
full system call if possible. ISTM that there are some issues here to
resolve that may make it not worth while. Hence my question, and the second
part, which is, assuming that you had cheap message passing, what would it
take for much software to take advantage of it?

Pete Fenelon

Jul 1, 2003, 3:08:26 PM
In alt.folklore.computers Rupert Pigott <r...@dark-try-removing-this-boong.demon.co.uk> wrote:
>
> The trick seems to be to make messages cheap in the
> hardware, this has been done many many times. From my
> point of view keeping it simple is the way to go here,
> so you start by throwing your Ethernet gear into a skip. :)
>


Back when I was a youngster slaving away for the military-industrial
complex I did a fair bit of work on Multibus 2 crates for sonar
systems that had quite an amusing little Message Passing Coprocessor
on them (the Intel 82389 rings a bell). Made inter-processor comms on
Multibus about as easy as DMA between a CPU and a peripheral. Nice, even
though Multibus 2 wasn't particularly elegant itself.

pete
--
pe...@fenelon.com "there's no room for enigmas in built-up areas" HMHB

Peter da Silva

Jul 1, 2003, 2:57:40 PM
In article <3f01c9ad$0$56600$bed6...@pubnews.gradwell.net>,
Geoff Lane <zza...@buffy.sighup.org.uk> wrote:

Not that any OS has ever moved a component into the kernel to do the same
thing. :)

Peter da Silva

Jul 1, 2003, 2:59:44 PM
In article <RXjMa.11$5P2...@paloalto-snr1.gtei.net>,

Barry Margolin <barry.m...@level3.com> wrote:
> In article <3f01c9ad$0$56600$bed6...@pubnews.gradwell.net>,
> Geoff Lane <zza...@buffy.sighup.org.uk> wrote:
> >In alt.folklore.computers Peter da Silva <pe...@abbnm.com> wrote:
> >> If the message passing is cheap enough (significantly less than system
> >> call overhead in a "traditional" OS) then the message-passing system
> >> can be faster than the traditional one.

> >Message passing also has another advantage - it defines interfaces that
> >cannot be subverted. Monolithic kernels allow poor programmers to bypass
> >defined interfaces in the interests of "effiency"

> On the other hand, it also traps you into using those interfaces. If you
> don't get the design right, it can be difficult to work around it. Ideally
> this shouldn't be a problem, but in a practical sense it often is.

No more than any other formalised interface does. If you need to redesign
to get rid of a poorly chosen interface, it's better to be faced
with it up front than to have a new interface grow organically as components
start bypassing it.

Peter da Silva

Jul 1, 2003, 3:15:55 PM
In article <thvv-7F26CE.1...@news.comcast.giganews.com>,

Tom Van Vleck <th...@multicians.org> wrote:
> One cost of message based systems is making copies of
> things.

You can use techniques similar to the ones used to cut down or even
eliminate copies in network stacks. All objects, all objects over a certain
size, or all objects designated as "fast copy" are mapped rather than
copied... and may even be allocated out of a shared memory area to cut
down on the amount of page table rearrangement needed. You just need
to agree that the sending component doesn't access the object after
it's sent.

> My experience with message passing systems is that they
> start out penalized by a factor of about two compared to
> direct call systems, and that by employing many clever
> strategies, can make up about half the deficit after years
> of improvement.

My experience is with one particular system where message passing was only a
few times slower than a subroutine call. Also, all messages were queued:
rather than making a system call (which meant a context switch), and
then another, and another, a program sends multiple messages and only
then enters a wait, where it takes the context switch once.

This is similar to what X11 does in bundling multiple operations in one
message, but it applies to all the concurrent operations performed by
one component... so after initialization (which tends to be serialised)
it may be making more "system calls" but only a fraction of them actually
involve a context switch.

It ran into serialization problems, mostly due to components that didn't
keep multiple messages in flight but instead ran each to completion before
attending to the next.

Rupert Pigott

Jul 1, 2003, 5:27:21 PM

"Tom Van Vleck" <th...@multicians.org> wrote in message
news:thvv-7F26CE.1...@news.comcast.giganews.com...

> "Rupert Pigott" wrote:
> > I don't see why this should be so. In a NUMA system or a
> > message passing system for a message to get from CPU A to
> > CPU B it will still have to travel along a very similar
> > signal path. So it can't be the plumbing that slows it down
> >
> > If you are talking about a locally delivered message then
> > perhaps it could be slower, simply because you are eating
> > bandwidth to make a copy (and pranging the cache to boot).
> >
> > The trick seems to be to make messages cheap in the
> > hardware, this has been done many many times.
>
> One cost of message based systems is making copies of
> things. To make a message passing call, one has to at
> minimum determine the size of the arguments, allocate a
> message object, marshal the arguments into it, queue and
> dequeue the message, and free the message object. If the

That's not quite true. Allocate a receiving buffer (once)
for each "channel", not for each "message". In OCCAM this
was frequently done statically (no run time cost).

Also in a NUMA or SMP machine with cache-coherency the
copying and locking in effect perform the same kind of
interactions that message passing does. On a more
dogmatic day I'd assert that it is *precisely* the same
interaction but with different semantics presented to
the code.

> calling site and the called site do not share memory, then
> additional copying and buffering is necessary. The storage
> for the copies is either preallocated and mostly idle, or
> is allocated and freed from a pool of storage, at the cost
> of additional complexity; in either case it adds to memory
> pressure.

Not a huge concern in recent years judging by the
bloatage that applications have exhibited.

> Another cost is synchronization. Each allocation, freeing,
> queueing, or dequeueing operation needs atomicity; whether
> hidden in the hardware or done explicitly in software, this
> synchronization requires some cost even if there is never a
> conflict that causes one thread to delay.

This is required for Shared Memory too. The minimum
synchronisation and data transfer requirements will remain
the same at the application level in a modern SMP with
local cache for each CPU (unless the shared memory is
accessed directly word by word - slow).

I come from a CSP background which is more like Ada's
Rendezvous* mechanisms than this queued nonsense sitting
on top of heavyweight transports... If I wanted queueing
I implemented it in a tiny little process that sat
between the clients and the server.

> My experience with message passing systems is that they
> start out penalized by a factor of about two compared to
> direct call systems, and that by employing many clever
> strategies, can make up about half the deficit after years
> of improvement. Sometimes the elegance, uniformity, and
> protection provided by the message passing design is worth
> it.

In which case I'd guess that you were using relatively
heavyweight message passing compared to the form I am
most familiar with (and would advocate as the One True
Programming Paradigm...).

* = Please tell me Ada's Rendezvous doesn't do queueing...

Cheers,
Rupert


Rupert Pigott

Jul 1, 2003, 5:31:25 PM

"Peter da Silva" <pe...@abbnm.com> wrote in message
news:bdsln0$292k$6...@jeeves.eng.abbnm.com...

Or you could do what many kludgers do : Add another interface and
botch the internals to fit.

Cheers,
Rupert


Rupert Pigott

Jul 1, 2003, 5:32:36 PM

"Peter da Silva" <pe...@abbnm.com> wrote in message
news:bdslj4$292k$5...@jeeves.eng.abbnm.com...

> In article <3f01c9ad$0$56600$bed6...@pubnews.gradwell.net>,
> Geoff Lane <zza...@buffy.sighup.org.uk> wrote:
> > In alt.folklore.computers Peter da Silva <pe...@abbnm.com> wrote:
> > > If the message passing is cheap enough (significantly less than system
> > > call overhead in a "traditional" OS) then the message-passing system
> > > can be faster than the traditional one.
>
> > Message passing also has another advantage - it defines interfaces that
> > cannot be subverted. Monolithic kernels allow poor programmers to bypass
> > defined interfaces in the interests of "efficiency".
>
> Not that any OS has ever moved a component into the kernel to do the same
> thing. :)

God forbid that you put NFS servers, HTTP servers, and GUIs into
kernel ! That would be lunacy ! Who would do such a thing ? :)

Cheers,
Rupert


Chris Hedley

Jul 1, 2003, 5:42:40 PM
According to Rupert Pigott <r...@dark-try-removing-this-boong.demon.co.uk>:

> > Not that any OS has ever moved a component into the kernel to do the same
> > thing. :)
>
> God forbid that you put NFS servers, HTTP servers, and GUIs into
> kernel ! That would be lunacy ! Who would do such a thing ? :)

Some people could jump to the conclusion that MVT's memory scheme
is still state of the art...

Chris.
--
"If the world was an orange it would be like much too small, y'know?" Neil, '84
Currently playing: random early '80s radio stuff
http://www.chrishedley.com - assorted stuff, inc my genealogy. Gan canny!

Nick Maclaren

Jul 1, 2003, 6:12:46 PM
In article <10570948...@saucer.planet.gong>,

Rupert Pigott <r...@dark-try-removing-this-boong.demon.co.uk> wrote:
>
>"Tom Van Vleck" <th...@multicians.org> wrote in message
>news:thvv-7F26CE.1...@news.comcast.giganews.com...
>> "Rupert Pigott" wrote:
>> > I don't see why this should be so. In a NUMA system or a
>> > message passing system for a message to get from CPU A to
>> > CPU B it will still have to travel along a very similar
>> > signal path. So it can't be the plumbing that slows it down
>> >
>> > If you are talking about a locally delivered message then
>> > perhaps it could be slower, simply because you are eating
>> > bandwidth to make a copy (and pranging the cache to boot).
>> >
>> > The trick seems to be to make messages cheap in the
>> > hardware, this has been done many many times.
>>
>> One cost of message based systems is making copies of
>> things. To make a message passing call, one has to at
>> minimum determine the size of the arguments, allocate a
>> message object, marshal the arguments into it, queue and
>> dequeue the message, and free the message object. If the
>
>That's not quite true. Allocate a receiving buffer (once)
>for each "channel", not for each "message". In OCCAM this
>was frequently done statically (no run time cost).

It's not even remotely true. Both Rupert Pigott and you are talking
about techniques that were widely known a long time back, and which
work well.

>Also in a NUMA or SMP machine with cache-coherency the
>copying and locking in effect perform the same kind of
>interactions that message passing does. On a more
>dogmatic day I'd assert that it is *precisely* the same
>interaction but with different semantics presented to
>the code.

I think that I agree with you. The only case I know of where shared
memory 'scores' is when transferring data from the middle of (say)
one stack to the middle of another. Shared memory can do that with
one copy; message passing sometimes (but not always) needs two.

>> calling site and the called site do not share memory, then
>> additional copying and buffering is necessary. The storage
>> for the copies is either preallocated and mostly idle, or
>> is allocated and freed from a pool of storage, at the cost
>> of additional complexity; in either case it adds to memory
>> pressure.
>
>Not a huge concern in recent years judging by the
>bloatage that applications have exhibited.

If the buffer management is competent, it isn't a major problem
except for massive processor counts. And only SGI produces a shared
memory system with above c. 100 CPUs.

>> Another cost is synchronization. Each allocation, freeing,
>> queueing, or dequeueing operation needs atomicity; whether
>> hidden in the hardware or done explicitly in software, this
>> synchronization requires some cost even if there is never a
>> conflict that causes one thread to delay.
>
>This is required for Shared Memory too. The minimum
>synchronisation and data transfer requirements will remain
>the same at the application level in a modern SMP with
>local cache for each CPU (unless the shared memory is
>accessed directly word by word - slow).

What is often worse is that shared memory interfaces often don't
provide decent synchronised transfers, and so you have to use
inappropriate ones (e.g. barriers).

>I come from a CSP background which is more like ADA's
>Rendezvous* mechanisms than this queued nonsense sitting
>on top of heavyweight transports... If I wanted queueing
>I implemented it in a tiny little process that sat
>between the clients and the server.

Heavyweight AND MISDESIGNED transports. N copies, each stage
done synchronously and no decent diagnostics.

>> My experience with message passing systems is that they
>> start out penalized by a factor of about two compared to
>> direct call systems, and that by employing many clever
>> strategies, can make up about half the deficit after years
>> of improvement. Sometimes the elegance, uniformity, and
>> protection provided by the message passing design is worth
>> it.
>
>In which case I'd guess that you were using relatively
>heavyweight message passing compared to the form I am
>most familiar with (and would advocate as the One True
>Programming Paradigm...).

Now, there I differ .... It is One of The True Programming Paradigms,
but I am a heretic (from whatever viewpoint) :-)


Regards,
Nick Maclaren.

Sander Vesik

Jul 1, 2003, 6:10:37 PM
In comp.arch Stephen Fuld <s.f...@pleaseremove.att.net> wrote:
>
> "Sander Vesik" <san...@haldjas.folklore.ee> wrote in message
> news:10570630...@haldjas.folklore.ee...
>> In comp.arch Stephen Fuld <s.f...@pleaseremove.att.net> wrote:
>
> snip
>
>> > So, the obvious question is then, is there something that makes sense from
>> > that idea to adapt into current microprocessor designs in order to give the
>> > advantages of low cost message passing, and ease the development of more
>> > modular software that would use it?
>> >
>>
>> Couldn't you add something like that onto a "conventional" processor?
>
> I think you essentially restated part of my question. In the Elxsi, Russel
> pointed out that it needed both hardware and microcode. Now microcode is

Yes - in a shorter (and, I'm afraid, infinitely worse spelled) version. By the
time I reached the end I had forgotten all about the text before the
description.

> passe on most current "conventional" processors, so you have to figure
> something else out. In order to cross domains, you probably have to do some
> fiddling with page tables or something. You want to avoid the overhead of a

Instead of microcode, one might use a special operating mode / exception level
and support instructions. Such a mode could use alternate registers, have
access to data using more than one ASID, and so on. With some input checking
in hardware it could be both fast and RISCy.

> full system call if possible. ISTM that there are some issues here to
> resolve that may make it not worth while. Hence my question, and the second
> part, which is, assuming that you had cheap message passing, what would it
> take for much software to take advantage of it?
>

Hmmm... depending on how ingrained their present message passing interfaces
and implementations are, Mach or some of the newer microkernels might be
portable to such? Couldn't you, as a first step, eliminate some of their
present inefficiency and then extend to achieve more performance?

Stephen Fuld

unread,
Jul 2, 2003, 1:01:43 AM7/2/03
to

"Sander Vesik" <san...@haldjas.folklore.ee> wrote in message
news:10570974...@haldjas.folklore.ee...

> In comp.arch Stephen Fuld <s.f...@pleaseremove.att.net> wrote:

snip

> Instead of microcode, one might use a special operating mode / exception
> level and support instructions. Such a mode could use alternate registers,
> have access to data using more than one ASID, and so on. With some input
> checking in hardware it could be both fast and RISCy.

Yes, I think you could use something like that. I guess I was looking for a
variety of potential solutions with some analysis of what fits the best, is
most efficient, is easiest to use, etc. You have indeed provided the
outline for one such method. Would the lower-numbered rings (but still >0)
be sufficient, or do we need another mode?

> > full system call if possible. ISTM that there are some issues here to
> > resolve that may make it not worth while. Hence my question, and the
> > second part, which is, assuming that you had cheap message passing, what
> > would it take for much software to take advantage of it?
> >
>
> Hmmm... depending on how ingrained their present message passing interfaces
> and implementations are, Mach or some of the newer microkernels might be
> portable to such? Couldn't you, as a first step, eliminate some of their
> present inefficiency and then extend to achieve more performance?

I think so. And you could have a compatibility "trap" routine that took
what are now kernel calls and turned them into the appropriate message
passes. Eventually, code could migrate toward the native interfaces for
increased performance and perhaps functionality.

James Cownie

unread,
Jul 2, 2003, 4:08:49 AM7/2/03
to
Rupert Pigott wrote:

>
> That's not quite true. Allocate a receiving buffer (once)
> for each "channel", not for each "message". In OCCAM this
> was frequently done statically (no run time cost).
>

I don't think so. In Occam all communication is synchronised,
so there is no need for _any_ receive buffer. Data can _always_
be transferred directly into the user's target variable.

Similarly on send, data can always be transferred directly
from the user's variable. (Or constant or ?, of course).

--
-- Jim

James Cownie <jco...@etnus.com>
Etnus, LLC. +44 117 9071438
http://www.etnus.com

Pete Fenelon

unread,
Jul 2, 2003, 5:10:29 AM7/2/03
to
In alt.folklore.computers Rupert Pigott <r...@dark-try-removing-this-boong.demon.co.uk> wrote:
> God forbid that you put NFS servers, HTTP servers, and GUIs into
> kernel ! That would be lunacy ! Who would do such a thing ? :)
>

Thinking of no open-source OS in particular.... the script kiddies who
hack the Linux kernel have managed 2 out of 3 ;) Fortunately they're
optional ;)

I don't think I've seen an in-kernel GUI on any Unix system since
Whitechapel MG1s, but I'm sure someone could prove me wrong ;)

Morten Reistad

unread,
Jul 2, 2003, 6:41:24 AM7/2/03
to
In article <vg58c5l...@corp.supernews.com>,

Pete Fenelon <pe...@fenelon.com> wrote:
>In alt.folklore.computers Rupert Pigott <r...@dark-try-removing-this-boong.demon.co.uk> wrote:
>> God forbid that you put NFS servers, HTTP servers, and GUIs into
>> kernel ! That would be lunacy ! Who would do such a thing ? :)
>>
>
>Thinking of no open-source OS in particular.... the script kiddies who
>hack the Linux kernel have managed 2 out of 3 ;) Fortunately they're
>optional ;)

The Linux people have the nfs server still in user mode last I saw.
The BSDs have had the NFS server tightly connected to the rest of the fs
code, and even if it is a separate process, it is still executing
kernel code at a high privilege level.

<rant>
Why do the file systems have to be so tightly integrated in the "ring0"
core? This is one subsystem that screams for standard callouts and
"ring1" level.
</rant off>


>I don't think I've seen an in-kernel GUI on any Unix system since
>Whitechapel MG1s, but I'm sure someone could prove me wrong ;)

GUI's, no; unless you count the fancy tty screen drivers.

Pete Fenelon

unread,
Jul 2, 2003, 8:07:08 AM7/2/03
to
In alt.folklore.computers Morten Reistad <m...@reistad.priv.no> wrote:
> In article <vg58c5l...@corp.supernews.com>,
> Pete Fenelon <pe...@fenelon.com> wrote:
>>In alt.folklore.computers Rupert Pigott <r...@dark-try-removing-this-boong.demon.co.uk> wrote:
>>> God forbid that you put NFS servers, HTTP servers, and GUIs into
>>> kernel ! That would be lunacy ! Who would do such a thing ? :)
>>>
>>
>>Thinking of no open-source OS in particular.... the script kiddies who
>>hack the Linux kernel have managed 2 out of 3 ;) Fortunately they're
>>optional ;)
>
> The Linux people have the nfs server still in user mode last I saw.
> The BSD has had the nfs server tightly connected to the rest of the fs
> code, and even if it is a separate process, it still is executing
> kernel code in a high privilige level.


AFAIR, Acting as an NFS server under Linux doesn't need kernel
support (but can use optional kernel-side support). Acting as an
NFS client requires kernel support (to wire NFS into the supported
set of filesystems.)

>
> <rant>
> Why do the file systems have to be so tightly integrated in the "ring0"
> core? This is one subsystem that screams for standard callouts and
> "ring1" level.
> </rant off>

Agreed.

>
>
>>I don't think I've seen an in-kernel GUI on any Unix system since
>>Whitechapel MG1s, but I'm sure someone could prove me wrong ;)
>
> GUI's, no; unless you count the fancy tty screen drivers.

--

Holger Veit

unread,
Jul 2, 2003, 8:28:28 AM7/2/03
to
Pete Fenelon <pe...@fenelon.com> wrote:
> In alt.folklore.computers Morten Reistad <m...@reistad.priv.no> wrote:
[...]

>
> AFAIR, Acting as an NFS server under Linux doesn't need kernel
> support (but can use optional kernel-side support). Acting as an
> NFS client requires kernel support (to wire NFS into the supported
> set of filesystems.)
>
>>
>> <rant>
>> Why do the file systems have to be so tightly integrated in the "ring0"
>> core? This is one subsystem that screams for standard callouts and
>> "ring1" level.
>> </rant off>
>
> Agreed.

Seconded. The problem is that the old VAX days - the days that
introduced the several privilege rings which the 386 copied rather
precisely - are meanwhile gone. With the exception of OS/2 and older
WinNT, it seems no modern OS has actually used the feature of multiple
privilege levels at all, beyond the common distinction of "supervisor"
(or "kernel") and "user" modes. Earlier (like the 68000) and later
processors (PPC, MIPS, etc.) only have those two levels at all. I.e.
the knowledge of layered privileges seems to be gone and lost - it is
now just an "everything" or "nothing" difference, which makes such
systems rather vulnerable. Ring 1 for file systems or hicore parts of
drivers is appropriate - but then, as M$ demonstrated by destroying the
rather clean concept of WinNT, there are performance issues due to
lousy application code that "force" the OS writers to circumvent such
clean ring callouts by throwing the whole garbage into ring 0. Or, as
in Linux, there is no explicit driver API at all (like NT's HAL or
OS/2's DevHlp); every driver can mess up everything else in kernel
mode without being prevented. Ideas like microkernels have been beaten
to death by the Mach crap that didn't build up the kernel from the
ground up, but instead made the nonsense attempt to strip a monolithic
kernel and move parts into user mode without first defining where the
ring border is supposed to be. Needless to say, the result was
catastrophic.

Holger

Barry Margolin

unread,
Jul 2, 2003, 11:02:13 AM7/2/03
to
In article <kscudb.krn1.ln@acer>, Morten Reistad <m...@reistad.priv.no> wrote:
>In article <vg58c5l...@corp.supernews.com>,
>Pete Fenelon <pe...@fenelon.com> wrote:
>>In alt.folklore.computers Rupert Pigott
><r...@dark-try-removing-this-boong.demon.co.uk> wrote:
>>> God forbid that you put NFS servers, HTTP servers, and GUIs into
>>> kernel ! That would be lunacy ! Who would do such a thing ? :)
>>>
>>
>>Thinking of no open-source OS in particular.... the script kiddies who
>>hack the Linux kernel have managed 2 out of 3 ;) Fortunately they're
>>optional ;)
>
>The Linux people have the nfs server still in user mode last I saw.

And I frequently hear complaints about how poor Linux's NFS support is.
Coincidence?

Unfortunately, the design of NFS practically screams for kernel
implementation. Most file system APIs implement an
open/do-lots-of-operations/close model of file access. NFS doesn't have
open or close operations; each request identifies the file using an opaque
"handle", and the file handle maps most naturally into Unix's inode model;
when implementing NFS servers on other operating systems, it's often
necessary to design kludges to support its file handles. Since the
standard API only deals with accessing files by name, not inode, it's
necessary to put the server in the kernel to get past the name requirement
(user-mode servers typically have to have the same kinds of kludges as
non-Unix implementations, or you need to add system calls that allow
by-inode access).
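The stateless, handle-based shape of the protocol can be sketched in a few lines. This is a toy for illustration only - `HandleServer`, `lookup` and `read_file` are invented names, and a dict stands in for the by-inode access that a user-mode Unix server lacks:

```python
# Toy sketch of a stateless, handle-based service in the spirit of NFS:
# there is no open() or close(), every request carries an opaque handle,
# and a dict stands in for by-inode access to file contents.

class HandleServer:
    def __init__(self, files):
        self.inodes = {}            # "inode table": handle -> contents
        self.names = {}             # directory: name -> handle
        for ino, (name, data) in enumerate(files.items()):
            self.inodes[ino] = data
            self.names[name] = ino

    def lookup(self, name):
        # Like NFS LOOKUP: translate a name to an opaque handle once.
        return self.names[name]

    def read_file(self, handle, offset, count):
        # Like NFS READ: served purely from the handle, with no prior
        # open and no per-client session state on the server.
        return self.inodes[handle][offset:offset + count]

srv = HandleServer({"motd": b"welcome to comp.arch"})
h = srv.lookup("motd")
assert srv.read_file(h, 0, 7) == b"welcome"   # no open() ever happened
```

The client can present the handle at any time, in any order, which is what makes the server stateless - and what makes it awkward to build on top of a name-only file API.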

Shmuel (Seymour J.) Metz

unread,
Jul 2, 2003, 10:22:32 AM7/2/03
to
In <g8vsdb...@teabag.cbhnet>, on 07/01/2003

at 10:42 PM, c...@ieya.co.REMOVE_THIS.uk (Chris Hedley) said:

>Some people could jump to the conclusion that MVT's memory scheme is
>still state of the art...

Even MVT had storage protection; Supervisor didn't automatically give
you key 0. So moving graphics into the kernel on an IA-32 is even
worse than what MVT had. A company that would do a thing like that
would be capable of anything, even allowing users to include code in
e-mail that the company's software would automatically execute on
receipt of the e-mail. We all know that no one could be that stupid.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT

Any unsolicited bulk E-mail will be subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail.

Reply to domain Patriot dot net user shmuel+news to contact me. Do not reply
to spam...@library.lspace.org


Rupert Pigott

unread,
Jul 2, 2003, 11:46:13 AM7/2/03
to
"James Cownie" <jco...@etnus.com> wrote in message
news:lqwMa.866$Wp.61...@news-text.cableinet.net...

> Rupert Pigott wrote:
>
> >
> > That's not quite true. Allocate a receiving buffer (once)
> > for each "channel", not for each "message". In OCCAM this
> > was frequently done statically (no run time cost).
> >
>
> I don't think so. In Occam all communication is synchronised,
> so there is no need for _any_ receive buffer. Data can _always_
> be transferred directly into the user's target variable.
>
> Similarly on send, data can always be transferred directly
> from the user's variable. (Or constant or ?, of course).

Buffer == user variable.

The confusion is arising because I'm trying to be
language agnostic here, just looking at the basics
of sending bits across systems and synchronisation. :)

Cheers,
Rupert


Peter Ibbotson

unread,
Jul 2, 2003, 11:52:49 AM7/2/03
to
"Pete Fenelon" <pe...@fenelon.com> wrote in message
news:vg58c5l...@corp.supernews.com...

> In alt.folklore.computers Rupert Pigott
<r...@dark-try-removing-this-boong.demon.co.uk> wrote:
> > God forbid that you put NFS servers, HTTP servers, and GUIs into
> > kernel ! That would be lunacy ! Who would do such a thing ? :)
> >
>
> Thinking of no open-source OS in particular.... the script kiddies who
> hack the Linux kernel have managed 2 out of 3 ;) Fortunately they're
> optional ;)
>
> I don't think I've seen an in-kernel GUI on any Unix system since
> Whitechapel MG1s, but I'm sure someone could prove me wrong ;)


I learned C on the MG1; I don't remember the GUI as being in kernel, but
then at the time I'm not sure I'd have spotted the distinction. I always
liked the idea of a separate mouse co-processor, and all windows having
their contents stored in a raster rather than having to repaint when they
overlapped. Are there technical documents on the web anywhere?

--
Work pet...@lakeview.co.uk.plugh.org | remove magic word .org to reply
Home pe...@ibbotson.co.uk.plugh.org | I own the domain but theres no MX


George Coulouris

unread,
Jul 2, 2003, 1:29:21 PM7/2/03
to
In article <10570948...@saucer.planet.gong>, Rupert Pigott wrote:
[snip]

> Also in a NUMA or SMP machine with cache-coherency the
> copying and locking in effect perform the same kind of
> interactions that message passing does. On a more
> dogmatic day I'd assert that it is *precisely* the same
> interaction but with different semantics presented to
> the code.

This reminds me of field theories vs. gluon exchange..

--
george coulouris
not speaking for ncbi
remove 's' from my email address to reply

Thomas

unread,
Jul 2, 2003, 3:45:30 PM7/2/03
to
Morten Reistad wrote:

> The Linux people have the nfs server still in user mode last I saw.

Debian offers a choice: user or kernel mode NFS server.

User mode has a real nfsd running handling requests.

Kernel mode also starts an nfsd that in turn starts kernel threads, which are
part of the kernel - just like the scheduler and interrupt handlers are part
of the kernel.


Thomas

Eric Lee Green

unread,
Jul 2, 2003, 4:40:50 PM7/2/03
to
Morten Reistad wrote:
> In article <vg58c5l...@corp.supernews.com>,
> Pete Fenelon <pe...@fenelon.com> wrote:
>>In alt.folklore.computers Rupert Pigott
>><r...@dark-try-removing-this-boong.demon.co.uk> wrote:
>>> God forbid that you put NFS servers, HTTP servers, and GUIs into
>>> kernel ! That would be lunacy ! Who would do such a thing ? :)
>>>
>>
>>Thinking of no open-source OS in particular.... the script kiddies who
>>hack the Linux kernel have managed 2 out of 3 ;) Fortunately they're
>>optional ;)
>
> The Linux people have the nfs server still in user mode last I saw.

Which apparently was over two years ago. The standard NFS server in Linux has
been the kernel one since the release of the Linux 2.4 operating system
kernel. The user mode NFS server is still available, but is unsupported and
only implements NFS V2, whereas the kernel mode NFS server implements NFS V3.
The NFS V4 reference implementation for Linux is also a kernel-mode server.

> <rant>
> Why do the file systems have to be so tightly integrated in the "ring0"
> core? This is one subsystem that screams for standard callouts and
> "ring1" level.
> </rant off>

Performance. Device drivers reading straight into filesystem buffers is
difficult to achieve in userland. You end up having to modify your VM to be
able to lock pages in memory, then have to go through the overhead of managing
said locking before you do I/O. Much easier to just have both operating in
kernel-land using the normal kernel memory page allocation mechanisms of the
OS in question.

That said, on my 2.4Ghz laptop, I could do reads through LUFS (Linux Userland
FileSystem) at 100MByte/sec, much faster than the hard drive can spin, so
performance is not as big an issue as it was "back in the day". I was using
over 20% of the CPU to do the reads, though, as vs. under 5% of the CPU for
the native kernel-mode filesystem. The main reason for the high CPU usage was
all the data copying between usermode and kernel-land needed. For example,
read() turns into: kernel call, VFS layer (may have cached data), pop back up
to userland, kernel call to device driver, copy result back up to userland,
copy result back down to kernel-land, pop back up to the user with a copy of
the data. Ouch. I can think of ways to speed this up, but none that will make
it as fast as the tightly-integrated kernel-land implementation.
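The copy chain above can be modelled (very loosely) in user space: the sketch below bounces a read through a "daemon" over a socketpair, standing in for the extra user/kernel crossings and copies a userland filesystem adds. All names here are inventions for the example, not LUFS's actual interface:

```python
import os
import socket
import tempfile
import threading

# Toy model of the read() path described above: the request bounces
# through a user-space "FS daemon" over a socketpair, adding extra
# user/kernel crossings and data copies compared with a direct read().

def fs_daemon(conn, path):
    # Stand-in for the userland filesystem server: parse the request,
    # do its own read() on the backing file (another kernel call),
    # and copy the result back down the socket.
    n = int(conn.recv(16).decode())
    with open(path, "rb") as f:
        conn.sendall(f.read(n))
    conn.close()

def proxied_read(path, n):
    a, b = socket.socketpair()
    t = threading.Thread(target=fs_daemon, args=(b, path))
    t.start()
    a.sendall(str(n).encode())   # caller -> kernel -> daemon
    data = a.recv(n)             # daemon -> kernel -> caller
    t.join()
    a.close()
    return data

# Same bytes come back; the difference is purely the extra copies.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, comp.arch")
os.close(fd)
assert proxied_read(path, 5) == b"hello"
os.unlink(path)
```

Each hop in `proxied_read` is a boundary crossing the in-kernel filesystem never pays, which is where the extra CPU time goes.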

> GUI's, no; unless you count the fancy tty screen drivers.

Hmm, the Linux 'fbcon' screen drivers ALMOST count as a native GUI, so I guess
the Linux geeks have achieved 2.5 out of 3 on the scale of atrocities :-).

--
Eric Lee Green mailto:er...@badtux.org
Unix/Linux/Storage Software Engineer needs job --
see http://badtux.org for resume



Chris Hedley

unread,
Jul 2, 2003, 4:38:01 PM7/2/03
to
According to Shmuel (Seymour J.) Metz <spam...@library.lspace.org.invalid>:

> Even MVT had storage protection; Supervisor didn't automatically give
> you key 0.

I had a feeling that my comparison was probably unfair to MVT!

> So moving graphics into the kernel on an IA-32 is even
> worse than what MVT had. A company that would do a thing like that
> would be capable of anything, even allowing users to include code in
> e-mail that the company's software would automatically execute on
> receipt of the e-mail. We all know that no one could be that stupid.

No, it'll never happen. :/ (I could rant on and on, but I think
it's already been done by people who're better at it than me!)

Linus Torvalds

unread,
Jul 2, 2003, 4:46:45 PM7/2/03
to
In article <kscudb.krn1.ln@acer>, Morten Reistad <m...@reistad.priv.no> wrote:
>
>The Linux people have the nfs server still in user mode last I saw.

Nope. That was five years ago. Nobody uses the user-space server for
serious NFS serving any more, even though it _is_ useful for
experimenting with user-space filesystems (ie "ftp filesystem" or
"source control filesystem").

><rant>
>Why do the file systems have to be so tightly integrated in the "ring0"
>core? This is one subsystem that screams for standard callouts and
>"ring1" level.
></rant off>

Because only naive people think you can do it efficiently any other way.

Face it, microkernels and message passing on that level died a long time
ago, and that's a GOOD THING.

Most of the serious processing happens outside the filesystem (ie the
VFS layer keeps track of name caches, stat caches, content caches etc),
and all of those data structures are totally filesystem-independent (in
a well-designed system) and are used heavily by things like memory
management. Think mmap - the content caches are exposed to user space
etc. But that's not the only thing - the name cache is used extensively
to allow people to see where their data comes from (think "pwd", but on
steroids), and none of this is anything that the low-level filesystem
should ever care about.

At the same time, all those (ring0 - core) filesystem data structures
HAVE TO BE MADE AVAILABLE to the low-level filesystem for any kind of
efficient processing. If you think we're going to copy file contents
around, you're just crazy. In other words, the filesystem has to be
able to directly access the name cache, and the content caches. Which in
turn means that it has to be ring0 (core) too.

If you don't care about performance, you can add call-outs and copy-in
and copy-out etc crap. I'm telling you that you would be crazy to do it,
but judging from some of the people in academic OS research, you
wouldn't be alone in your own delusional world of crap.

Sorry to burst your bubble.

Linus

Tom Van Vleck

unread,
Jul 2, 2003, 5:38:30 PM7/2/03
to
Eric Lee Green wrote:

> Morten Reistad wrote:
> > <rant>
> > Why do the file systems have to be so tightly integrated in
> > the "ring0" core? This is one subsystem that screams for
> > standard callouts and"ring1" level.
> > </rant off>
>
> Performance. Device drivers reading straight into filesystem
> buffers is difficult to achieve in userland. You end up having
> to modify your VM to be able to lock pages in memory, then have
> to go through the overhead of managing said locking before you
> do I/O. Much easier to just have both operating in kernel-land
> using the normal kernel memory page allocation mechanisms of
> the OS in question.

Multics had a facility called the I/O interfacer, ioi_.
Its purpose was to allow the user to safely write a device
driver that ran in the user ring of a process, obtaining
page-locked I/O buffers and wiring and unwiring them efficiently.
The various tape DIMs and the printer DIM used ioi_.

One of the major efficiencies of this scheme was, again, that we
could avoid making multiple extra copies of the data. This saved
us complicated alloc, free, and synchronization operations and
the related memory pressure on every record.

It worked great. As I remember, when the printer DIM was changed
to use ioi_, its load on the system decreased by more than half,
and we got to remove a bunch of device specific code from ring 0.

Charlie Gibbs

unread,
Jul 2, 2003, 5:19:31 PM7/2/03
to
In article <bduv4l$p08$1$8300...@news.demon.co.uk>
spa...@ibbotson.demon.co.uk (Peter Ibbotson) writes:

>I learned C on the MG1, I don't remember the GUI as being in kernel,
>but then at the time I'm not sure I'd have spotted the distinction.
>I always liked the idea of a seperate mouse co-processor, and all
>windows having their contents stored in a raster rather than having
>to repaint when they overlapped.

You mean like the Amiga's SMART_REFRESH windows? When I started
writing Windows programs, I was quite disgusted to discover that
the contents of my window would be erased if another window opened
over top of it, and that I was responsible for restoring it.
Oddly enough, my old Amiga 1000, with its paltry 68000, a couple
of support chips, and 512K of RAM, didn't seem to have much trouble
with the overhead that modern-day programmers with a Pentium 4 will
still claim is intolerably high...

--
/~\ cgi...@kltpzyxm.invalid (Charlie Gibbs)
\ / I'm really at ac.dekanfrus if you read it the right way.
X Top-posted messages will probably be ignored. See RFC1855.
/ \ HTML will DEFINITELY be ignored. Join the ASCII ribbon campaign!

Jeff Kenton

unread,
Jul 2, 2003, 7:57:21 PM7/2/03
to

Charlie Gibbs wrote:
...

>
> You mean like the Amiga's SMART_REFRESH windows? When I started
> writing Windows programs, I was quite disgusted to discover that
> the contents of my window would be erased if another window opened
> over top of it, and that I was responsible for restoring it.
> Oddly enough, my old Amiga 1000, with its paltry 68000, a couple
> of support chips, and 512K of RAM, didn't seem to have much trouble
> with the overhead that modern-day programmers with a Pentium 4 will
> still claim is intolerably high...

Agreed, but you're cheating a little here. The support chips included
state-of-the-art graphics processing without loading down the 68000.

jeff (one-time authorized Amiga reseller)


--

-------------------------------------------------------------------------
= Jeff Kenton Consulting and software development =
= http://home.comcast.net/~jeffrey.kenton =
-------------------------------------------------------------------------

Christopher Browne

unread,
Jul 2, 2003, 9:36:31 PM7/2/03
to
After takin a swig o' Arrakan spice grog, Jeff Kenton <Jeffrey...@comcast.net> belched out...:

> Charlie Gibbs wrote:
> ...
> >
>> You mean like the Amiga's SMART_REFRESH windows? When I started
>> writing Windows programs, I was quite disgusted to discover that
>> the contents of my window would be erased if another window opened
>> over top of it, and that I was responsible for restoring it.
>> Oddly enough, my old Amiga 1000, with its paltry 68000, a couple
>> of support chips, and 512K of RAM, didn't seem to have much trouble
>> with the overhead that modern-day programmers with a Pentium 4 will
>> still claim is intolerably high...
>
> Agreed, but you're cheating a little here. The support chips included
> state-of-the-art graphics processing without loading down the 68000.

.. And with some DSP chip that does a gazillion polygons per second
and 128MB of _graphics memory_, this is expected to be troublesome?
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','ntlug.org').
http://cbbrowne.com/info/spreadsheets.html
All bleeding stops...eventually.

Eric Lee Green

unread,
Jul 3, 2003, 12:10:55 AM7/3/03
to
Tom Van Vleck wrote:
> Eric Lee Green wrote:
>> Morten Reistad wrote:
>> > <rant>
>> > Why do the file systems have to be so tightly integrated in
>> > the "ring0" core? This is one subsystem that screams for
>> > standard callouts and"ring1" level.
>> > </rant off>
>>
>> Performance. Device drivers reading straight into filesystem
>> buffers is difficult to achieve in userland. You end up having
>> to modify your VM to be able to lock pages in memory, then have
>> to go through the overhead of managing said locking before you
>> do I/O. Much easier to just have both operating in kernel-land
>> using the normal kernel memory page allocation mechanisms of
>> the OS in question.
>
> Multics had a facility called the I/O interfacer, ioi_.
> Its purpose was to allow the user to safely write a device
> driver that ran in the user ring of a process, obtaining
> page-locked I/O buffers and wiring and unwiring them efficiently.
> The various tape DIMs and the printer DIM used ioi_.

This solves only part of the problem. With filesystems, you not only want to
move data between user and device, you also want to cache it, in a
preferential manner (e.g. directory nodes might get cached preferential to
data nodes, etc.). Multics was somewhat unique in its ability to memory-map
files between different processes (after all, they were just segments), which
used the page cache as a file cache, but memory mapping isn't particularly
efficient as a method for caching, especially with current processors. RMS is
having to re-write major parts of the GNU Hurd to move away from memory
mapping back to traditional I/O for numerous reasons (one reason being that it
imposed major restrictions on the size of devices and files with today's
32-bit processors).
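The segment-style access survives today as shared file mappings; a minimal sketch of the idea (assuming a system where two shared mappings of one file are coherent through the page cache, which holds on POSIX-ish platforms):

```python
import mmap
import os
import tempfile

# Map the same file twice; both mappings share the same page-cache pages,
# loosely analogous to two Multics processes sharing a segment.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello segment world")
os.close(fd)

with open(path, "r+b") as f1, open(path, "r+b") as f2:
    m1 = mmap.mmap(f1.fileno(), 0)
    m2 = mmap.mmap(f2.fileno(), 0)
    m1[0:5] = b"HELLO"              # a store through one mapping...
    snapshot = bytes(m2[0:5])       # ...is visible through the other
    m1.close()
    m2.close()

os.unlink(path)
assert snapshot == b"HELLO"
```

The page cache doubles as the file cache here, which is exactly the property being discussed - along with its cost: on a 32-bit machine the address space caps how much can be mapped at once.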

> It worked great. As I remember, when the printer DIM was changed
> to use ioi_, its load on the system decreased by more than half,
> and we got to remove a bunch of device specific code from ring 0.

The deal with Multics was that it had wonderfully fast I/O, but a horrendously
slow context switch time and processors that were five years obsolete (and
slower than molasses) on the day they were first released. I'm still astounded
that a $6,000,000 Level/68 with three processors couldn't manage much over 100
users without staggering to a halt, in the same era where an equivalent IBM
370 installation could handle 500 users. (I was a user of USL Multics... I
swiftly learned to do my assignments at 3AM in the morning, when there were
only 40 or 50 people logged in... this probably explains my poor GPA during
the Multics era at USL too, since I was falling asleep in class :-). Anyhow,
an amount of copying that would have brought Multics to a crawl would not
cause a modern processor to even flinch. Copying across multiple contexts, not
copying in general, is the major issue, and with the filesystem in kernel-land
there's only one copy across a context boundary (the copy from kernel-land to
user-land or vice-versa).

john jakson

unread,
Jul 3, 2003, 3:04:34 AM7/3/03
to
"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> wrote in message news:<10571607...@saucer.planet.gong>...

OT
For the Occam heads or ex Inmos folks!

In reviewing Occam more thoroughly recently it jumps out to me that
Occam is a weird kind of Hardware Description Language or HDL, the
kind of HDL that only a CSP mathematician or SW type would create with
some basic knowledge of HW but probably none of HW simulation event
wheels. I know Tony Hoare's 1st life was in HW, but the time gap to
CSP circa 68 was maybe too long.

Now HandelC binds Occam semantics with C syntax into a C based HDL
which allows the user to describe the algorithms parallel behaviour
directly rather than assuming some almighty parallelizing compiler try
to find it for them. HandelC can be used by C SW types to design soso
HW esp suitable for FPGAs. C based HDLs are not going to be popular
with HW folks for many reasons I leave alone here.

So HDLs use modules, always running processes, wires, hierarchy and
some C like syntax for functions, assignments etc.

Occam uses procedures, maybe always running processes, channels, !, ?,
hierarchy and some non C like syntax to hang it all together.

Inside the HDL simulator, processes obey different rules: write before
read, not anonymous, static P list (HW doesn't just appear &
disappear).

What would the world think if a small piece of the Verilog language
(or any good C like HDL with world class synthesis) was joined with C
& Occam semantics to allow a single language to support not just seq &
par programming but also support HW style of par coding. The run time
scheduler needed to support Occam channels & messages is not so very
different from the simulation event wheel of any good HDL engine.

The really big payoff is that either ordinary C seq code can be used &
mixed with Occam par code for SW types AND HW types can with extra
effort also code up in HW event driven logic. Given the kind of
compute intensive apps that would be written in Occam as distributed
processes possibly on more than 1 cpu, it is painfully obvious that
some of those problems can just as well be described with HW event
semantics. This opens the door to synthesizing such HW code portions
into accelerated HW engines or leaving it as end user HDL code. The
runtime for such a language could be either a static library, or
perhaps a new implementation of the Transputer with enhanced
scheduler. The HW applications running on this runtime could be truly
regarded as simulations of HW rather than the way we think of running
SW processes.

I could go on, but I have seen no other reasonable attempt to join HW
& SW languages into 1 language without the thing becoming an expensive
monster language available only to ASIC-FPGA guys (SuperLog,
SystemVerilog etc) some time in the distant future. ADA & VHDL might
have been a HW-SW candidate too, but gee they are both monster
languages & not exactly popular choices.

Besides compute intensive apps, I can also think of embedded systems
where HW & SW are usually designed by 2 distinct groups and then
cobbled together with some magic C glue & SDK. These systems have to
be rigidly defined across a hard boundary of physical and soft
entities long before either side can start. This would not have to be
the case with a mixed HW-SW language & runtime since some modules
might be real HW or simulated HW as performance allows and can be
changed by resynthesizing different parts as HW. Communication between
HW & SW uses shared link HW at the interface and channels on the SW
side.

I am working on such a mixed language & compiler, but the question is
how much of Verilog I want to keep; most of it is already a mix of
HW and SW language features that duplicates C (pretty badly,
I'd say). I would like to include only the synthesizable part and let
regular C be used to support it.

Any thoughts?

Regards
JJ

James Cownie

Jul 3, 2003, 4:36:25 AM

But then your original statement is still wrong, since there
is no buffer allocated statically for each channel.

Consider
in ? a
in ? b
the variables a and b are allocated (on the stack), but there is
no store allocated as a buffer for the channel in.

Jan C. Vorbrüggen

Jul 3, 2003, 5:31:29 AM
> This reminds me of field theories vs. gluon exchange..

...and both field theories and ccNUMA have "action at a distance",
while particle exchange and message passing make the delay more
explicit.

Jan

Ian G Batten

Jul 3, 2003, 6:06:19 AM
In article <VtCMa.13$VI....@paloalto-snr1.gtei.net>,

Barry Margolin <barry.m...@level3.com> wrote:
> >The Linux people have the nfs server still in user mode last I saw.
>
> And I frequently hear complaints about how poor Linux's NFS support is.
> Coincidence?

It's in kernel space now, and has been for some time.

ian


Ian G Batten

Jul 3, 2003, 6:21:49 AM
In article <vg58c5l...@corp.supernews.com>,
Pete Fenelon <pe...@fenelon.com> wrote:
> I don't think I've seen an in-kernel GUI on any Unix system since
> Whitechapel MG1s, but I'm sure someone could prove me wrong ;)

Suntools/SunWindows, present up until SunOS3.4?

ian

Pete Fenelon

Jul 3, 2003, 7:25:24 AM

Predated the MG1 by a few years, I think, but definitely had a lot of
the pixrect stuff in-kernel.

Rupert Pigott

Jul 3, 2003, 10:52:17 AM
"john jakson" <johnj...@yahoo.com> wrote in message
news:adb3971c.03070...@posting.google.com...

[SNIP]

> What would the world think if a small piece of the Verilog language
> (or any good C like HDL with world class synthesis) was joined with C
> & Occam semantics to allow a single language to support not just seq &
> par programming but also support HW style of par coding. The run time

This particular (and insignificant) part of the world would think that
you are quite insane, but don't let that stop you! Personally I really
can't see what C would give you over and above OCCAM for that kind of
gig. The two key things that I missed in OCCAM were pointers and being
able to randomly change the type of stuff... The latter I grew to
appreciate as I matured. :)

As for Verilog I can't comment, never touched it, but I suspect that
it would give something to OCCAM. A few of the folks I hung out with
at INMOS *did* use OCCAM for small simulations (can't remember how the
big ones were done), and a couple of the hardware bods semi-seriously
suggested doing a VHDL compiler for Transputers - from what they were
saying it seemed like a very snug fit (as you observed in your post).

Cheers,
Rupert


Rupert Pigott

Jul 3, 2003, 10:54:41 AM
"James Cownie" <jco...@etnus.com> wrote in message
news:dWRMa.1853$gN2.14...@news-text.cableinet.net...

> Rupert Pigott wrote:
> > "James Cownie" <jco...@etnus.com> wrote in message
> > news:lqwMa.866$Wp.61...@news-text.cableinet.net...
> >
> >>Rupert Pigott wrote:
> >>
> >>
> >>>That's not quite true. Allocate a receiving buffer (once)
> >>>for each "channel", not for each "message". In OCCAM this
> >>>was frequently done statically (no run time cost).
> >>>
> >>
> >>I don't think so. In Occam all communication is synchronised,
> >>so there is no need for _any_ receive buffer. Data can _always_
> >>be transferred directly into the user's target variable.
> >>
> >>Similarly on send, data can always be transferred directly
> >>from the user's variable. (Or constant or ?, of course).
> >
> >
> > Buffer == user variable.
> >
> > The confusion is arising because I'm trying to be
> > language agnostic here, just looking at the basics
> > of sending bits across systems and synchronisation. :)
>
> But then your original statement is still wrong, since there
> is no buffer allocated statically for each channel.

Oops. Yes, of course. :)

> Consider
> in ? a
> in ? b
> the variables a and b are allocated (on the stack), but there is
> no store allocated as a buffer for the channel in.

The static thing is complete crap. Wrong braincell fired
I guess. But in essence the chunks of memory that the
channel I/O operates on are determined statically. They're
not dynamically allocated from a heap.

Cheers,
Rupert


Tim Shoppa

Jul 3, 2003, 1:18:51 PM
Tom Van Vleck <th...@multicians.org> wrote in message news:<20030702173830...@multicians.org>...

> Eric Lee Green wrote:
>
> > Morten Reistad wrote:
> > > <rant>
> > > Why do the file systems have to be so tightly integrated in
> > > the "ring0" core? This is one subsystem that screams for
> > > standard callouts and"ring1" level.
> > > </rant off>
> >
> > Performance. Device drivers reading straight into filesystem
> > buffers is difficult to achieve in userland. You end up having
> > to modify your VM to be able to lock pages in memory, then have
> > to go through the overhead of managing said locking before you
> do I/O. Much easier to just have both operating in kernel-land
> > using the normal kernel memory page allocation mechanisms of
> > the OS in question.
>
> Multics had a facility called the I/O interfacer, ioi_.
> Its purpose was to allow the user to safely write a device
> driver that ran in the user ring of a process, obtaining
> page-locked I/O buffers and wiring and unwiring them efficiently.
> The various tape DIMs and the printer DIM used ioi_.
>
> One of the major efficiencies of this scheme was, again, that we
> could avoid making multiple extra copies of the data. This saved
> us complicated alloc, free, and synchronization operations and
> the related memory pressure on every record.

What details of the page-locked I/O buffers did ioi_ handle? In particular,
for DMA devices it must have put the buffer address/length into the peripheral
controller for the user... how much consistency/inconsistency was there
between the various devices that ioi_ handled?

Right now I'm thinking of comparison with IBM mainframe-style channel
I/O (which also could avoid all those extra bounce buffers).

Tim.

Tom Van Vleck

Jul 3, 2003, 2:11:01 PM
sho...@trailing-edge.com (Tim Shoppa) wrote:

> What details of the page-locked I/O buffers did ioi_ handle?
> In particular, for DM devices it must have put the buffer
> address/length into the peripheral controller for the user...
> how much consistency/inconsistency was there between the
> various devices that ioi_ handled?
>
> Right now I'm thinking of comparison with IBM mainframe-style
> channel I/O (which also could avoid all those extra bounce
> buffers).

It has been a long time... As I remember, ioi_ knew how to give
you a wired, contiguous buffer in low enough memory that the I/O
controller could use it, and understood enough about the channel
protocol to keep it wired only when necessary. ioi_ also set up
the channel address base and bounds in the I/O controller
correctly.

As I remember, ioi_ knew very little about the devices or the
peripheral controllers; it just managed access to the buffers.

For more information, there is a project MAC TR:
Clark, D. D., An input-output architecture for virtual memory
computer systems, MAC-TR-117 (Ph.D. thesis), January 1974.

Abstract:
"In many large systems today, input/output is not performed
directly by the user, but is done interpretively by the system
for him, which causes additional overhead and also restricts the
user to whatever algorithms the system has implemented. Many
causes contribute to this involvement of the system in user
input/output, including the need to enforce protection
requirements, the inability to provide adequate response to
control signals from devices, and the difficulty of running
devices in a virtual environment, especially a virtual memory.
The goal of this thesis was the creation of an input/output
system which allows the user the freedom of direct access to the
device, and which allows the user to build input/output control
programs in a simple and understandable manner. This thesis
presents a design for an input/output subsystem architecture
which, in the context of a segmented, paged, time-shared computer
system, allows the user direct access to input/output devices.
This thesis proposes a particular architecture, to be used as an
example of a class of suitable designs, with the intention that
this example serve as a tool in understanding the large class of
suitable designs."

Benny Amorsen

Jul 3, 2003, 3:03:19 PM
>>>>> "PI" == Peter Ibbotson <spa...@ibbotson.demon.co.uk> writes:

PI> I learned C on the MG1, I don't remember the GUI as being in
PI> kernel, but then at the time I'm not sure I'd have spotted the
PI> distinction. I always liked the idea of a separate mouse
PI> co-processor, and all windows having their contents stored in a
PI> raster rather than having to repaint when they overlapped. Are
PI> there technical documents on the web anywhere?

With today's hardware, windows could simply be textures on polygons.
That way the GPU would handle all the repainting. A mouse coprocessor
is probably overkill, 50 interrupts a second is nothing for a modern
CPU and few people need more than 50 frames per second.


/Benny

Peter da Silva

Jul 3, 2003, 6:25:03 PM
In article <3f02eaa8$7$fuzhry+tra$mr2...@news.patriot.net>,

Shmuel (Seymour J.) Metz <spam...@library.lspace.org.invalid> wrote:
> worse than what MVT had. A company that would do a thing like that
> would be capable of anything, even allowing users to include code in
> e-mail that the company's software would automatically execute on
> receipt of the e-mail. We all know that no one could be that stupid.

Bah, even the mythical "GOOD TIMES" virus required that you *open* it!

--
#!/usr/bin/perl
$/="%\n";chomp(@_=<>);print$_[rand$.]

Peter da Silva, just another Perl poseur.

Peter da Silva

Jul 3, 2003, 6:34:29 PM
In article <BjKMa.89516$R73.9817@sccrnsc04>,
Jeff Kenton <Jeffrey...@comcast.net> wrote:

> Charlie Gibbs wrote:
> > You mean like the Amiga's SMART_REFRESH windows? When I started
> > writing Windows programs, I was quite disgusted to discover that
> > the contents of my window would be erased if another window opened
> > over top of it, and that I was responsible for restoring it.
> > Oddly enough, my old Amiga 1000, with its paltry 68000, a couple
> > of support chips, and 512K of RAM, didn't seem to have much trouble
> > with the overhead that modern-day programmers with a Pentium 4 will
> > still claim is intolerably high...

> Agreed, but you're cheating a little here. The support chips included
> state-of-the-art graphics processing without loading down the 68000.

Yep, though that was a fairly short-term advantage: the 68020 often beat
the graphics chips, and on the 68030 in my Amiga 3000 I got a big speedup
from a program that took the graphics chips out of the loop for text
rendering.

So... replace that with a 16 MHz 68030 and 2M of RAM, and you still have
to wonder why a Pentium 4 can't manage it.

In Mac OS X not only is that refresh handled by the OS, but it does it
for window scrolling and pre-renders hidden portions of subwindows in
the background. So the lesson wasn't completely lost.

john jakson

Jul 4, 2003, 12:24:49 AM
"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> wrote in message news:<10572439...@saucer.planet.gong>...

You are probably right about the insanity. maybe I should move to Area
51 or just sell transparent alumin[i]um. We were all pretty crazy back
then, although I had no appreciation for what the architecture people
were really doing until a few years ago. I could see HW thinking all
through it if I squinted enough.

Well I'm halfway into the Verilog compiler anyway, so once the
runtime event wheel is up & running, adding != & ?= synchronized put &
get on chan vars is no big deal and they fit right into the fork &
join substitute for par blocks that Verilog already has. The C syntax
is important as Verilog is supposedly C like as well, only the
expressions are really C'ish.

Can't say I will ever meet any Occam users over here (MA); it's a shame
more HW & SW people don't know that the other guys have got
something similar they could understand.

Inmos mostly used their own HDL (far ahead of Verilog even now in some
ways) until IIRC they replaced it with Verilog & other commercial EDA
tools. I remember well Clive D's work on running Spice & maybe HDL
sims on a small 5-Transputer network, but the Transputer was not quite
right for it. Using Occam directly as an HDL would have been just as
inefficient as C on seq-only cpus; a new event time wheel scheduler
could have done the trick, but that was out of the question.

Regards
JJ

Scott Schwartz

Jul 4, 2003, 1:18:40 AM
Just for the record, there are at least two modern systems, Plan 9 and
Inferno, that use CSP for lots of interesting things. Modern languages
based on CSP include Limbo (from Bell Labs) and Erlang (from
Ericsson). All of these things have been used in commercial products,
not intended as academic research. They strike a practical balance.

Russ Cox wrote a nice essay on the topic:

"Resources about threaded programming in the Bell Labs CSP style"
http://plan9.bell-labs.com/who/rsc/thread/index.html

Nick Maclaren

Jul 4, 2003, 3:21:59 AM