[OFFTOPIC] Very amusing DNS...

Paul Miller

unread,

Jun 17, 1998, 3:00:00 AM6/17/98

to

A couple of days ago, they had a web page up at
http://linus.microsoft.com. Guess what it was -- The default page for
Apache on a RedHat installation!

hmm... I guess microsoft finally decided that windows was too unstable to
run. Or, maybe they just wanted to steal some of the source code!

-Paul

On Tue, 16 Jun 1998, Spirilis wrote:

> Hmm...
>
> <root>:/root# nslookup 131.107.74.11 198.6.1.1
> Server: cache00.ns.uu.net
> Address: 198.6.1.1
>
> Name: linus.microsoft.com
> Address: 131.107.74.11
>
>
> <root>:/root# nslookup linus.microsoft.com 198.6.1.1
> Server: cache00.ns.uu.net
> Address: 198.6.1.1
>
> Non-authoritative answer:
> Name: linus.microsoft.com
> Address: 131.107.74.11
>
> I wonder what MS uses that host for? ;-)
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majo...@vger.rutgers.edu
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.rutgers.edu

Dean Gaudet

unread,

Jun 17, 1998, 3:00:00 AM6/17/98

to

Or, just maybe, they wanted to have a system to do interoperability
testing. It would be very welcome IMNSHO. I'm tired of discovering
protocol flaws between their HTTP clients and Apache for them. There's
any number of valid reasons they would run a linux box, and none of them
have anything to do with windows being too unstable.

Dean

Joel Jaeggli

unread,

Jun 17, 1998, 3:00:00 AM6/17/98

to

We've had this discussion before, if you'll check your archives. long
story short, microsoft tests the behavior of their web clients against
various webservers including linux boxes running apache. This shouldn't
all that surprising, given that apache in its various forms is supposed to
account for forty something percent of all web pages served.

joelja

--------------------------------------------------------------------------
Joel Jaeggli joe...@darkwing.uoregon.edu
Academic User Services con...@gladstone.uoregon.edu
PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of
arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of
the right, 1843.

Shawn Leas

unread,

Jun 17, 1998, 3:00:00 AM6/17/98

to

On Tue, 16 Jun 1998, Dean Gaudet wrote:

> Or, just maybe, they wanted to have a system to do interoperability
> testing. It would be very welcome IMNSHO. I'm tired of discovering
> protocol flaws between their HTTP clients and Apache for them. There's
> any number of valid reasons they would run a linux box, and none of them
> have anything to do with windows being too unstable.

You mean like how they've failed miserably to move hotmail's services over
to NT? They use FreeBSD for the web server, and in runs Apache. The mail
servers are solaris, because of kernel threading I hear.

Case In Point - One for the home team

-Shawn

> Dean

Chris Wedgwood

unread,

Jun 17, 1998, 3:00:00 AM6/17/98

to

On Wed, Jun 17, 1998 at 12:02:25AM -0500, Shawn Leas wrote:
> On Tue, 16 Jun 1998, Dean Gaudet wrote:
>
> > Or, just maybe, they wanted to have a system to do interoperability
> > testing. It would be very welcome IMNSHO. I'm tired of discovering
> > protocol flaws between their HTTP clients and Apache for them. There's
> > any number of valid reasons they would run a linux box, and none of them
> > have anything to do with windows being too unstable.
>
> You mean like how they've failed miserably to move hotmail's services over
> to NT? They use FreeBSD for the web server, and in runs Apache. The mail
> servers are solaris, because of kernel threading I hear.
>
> Case In Point - One for the home team

nope.

Dead _never_ said windows was stable. He just said there were lots of
reasons they might have a linux that aren't necessarily related to
stability, and interoperability is one possibility.

-cw

Shawn Leas

unread,

Jun 17, 1998, 3:00:00 AM6/17/98

to

On Wed, 17 Jun 1998, Chris Wedgwood wrote:

> On Wed, Jun 17, 1998 at 12:02:25AM -0500, Shawn Leas wrote:
> > On Tue, 16 Jun 1998, Dean Gaudet wrote:
> >
> > > Or, just maybe, they wanted to have a system to do interoperability
> > > testing. It would be very welcome IMNSHO. I'm tired of discovering
> > > protocol flaws between their HTTP clients and Apache for them. There's
> > > any number of valid reasons they would run a linux box, and none of them
> > > have anything to do with windows being too unstable.
> >
> > You mean like how they've failed miserably to move hotmail's services over
> > to NT? They use FreeBSD for the web server, and in runs Apache. The mail
> > servers are solaris, because of kernel threading I hear.
> >
> > Case In Point - One for the home team
>
> nope.
>
> Dead _never_ said windows was stable. He just said there were lots of
> reasons they might have a linux that aren't necessarily related to
> stability, and interoperability is one possibility.

I was agreeing. Sorry if I was unclear. Back to testing reiserfs. Wow,
what a difference. I gotta say, I can't wait till it's in the kernel. I
wish it could get in the 2.1 series, but probably not.

-Shawn

Hans Reiser

unread,

Jun 17, 1998, 3:00:00 AM6/17/98

to

You are very kind, Shawn. I have a hope that we will be early 2.3. and an ambition to be done in time to be in 2.3.1, but it
remains to be done....

Hans

Anthony Barbachan

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

-----Original Message-----
From: Paul Miller <pa...@3dillusion.com>
To: Spirilis <spir...@mindmeld.dyn.ml.org>
Cc: linux-...@vger.rutgers.edu <linux-...@vger.rutgers.edu>
Date: Tuesday, June 16, 1998 11:27 PM
Subject: Re: [OFFTOPIC] Very amusing DNS...

>
>A couple of days ago, they had a web page up at
>http://linus.microsoft.com. Guess what it was -- The default page for
>Apache on a RedHat installation!
>

This could mean that they have finally started porting IE 4.01 to Linux as
they have done for Solaris and HPUX. I heard that the IE for UNIX
programmers were all (or at least mostly) Linux guys, they may have
convinced MS to release IE for Linux. Or they might just have been
compiling Apache 1.3.0 with frontpage extensions (and the other bundled
utilities) for Linux. If it is IE, the addition of MS as an application
provider for Linux should be benifitial to us.

>hmm... I guess microsoft finally decided that windows was too unstable to
>run. Or, maybe they just wanted to steal some of the source code!
>
>-Paul
>
>On Tue, 16 Jun 1998, Spirilis wrote:
>
>> Hmm...
>>
>> <root>:/root# nslookup 131.107.74.11 198.6.1.1
>> Server: cache00.ns.uu.net
>> Address: 198.6.1.1
>>
>> Name: linus.microsoft.com
>> Address: 131.107.74.11
>>
>>
>> <root>:/root# nslookup linus.microsoft.com 198.6.1.1
>> Server: cache00.ns.uu.net
>> Address: 198.6.1.1
>>
>> Non-authoritative answer:
>> Name: linus.microsoft.com
>> Address: 131.107.74.11
>>
>> I wonder what MS uses that host for? ;-)
>>
>>
>>

Spirilis

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

> utilities) for Linux. If it is IE, the addition of MS as an application
> provider for Linux should be benifitial to us.

Beneficial?? First off, if MS makes a product for Linux, this means a portion
of the Linux community will be at least partially dependent upon Microsoft
products... this means, in the end, we CAN'T DO without Microsoft! That is what
many other software developers are stuck in... as well as hardware vendors. Of
course, I would boycott any MS product made for Linux... but some people might
not. That's their own choice, but it still helps Microsoft eventually topple
Linux right on its head as it has so many other operating systems. (e.g.
DR-DOS, OS/2...)

Simon Kenyon

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

microsoft already make software for linux
dcom has been ported to linux for quite a while now
and the front page extensions to apache also work on linux

Derrik

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

On Thu, 18 Jun 1998, Anthony Barbachan wrote:

> This could mean that they have finally started porting IE 4.01 to Linux as
> they have done for Solaris and HPUX. I heard that the IE for UNIX
> programmers were all (or at least mostly) Linux guys, they may have
> convinced MS to release IE for Linux. Or they might just have been
> compiling Apache 1.3.0 with frontpage extensions (and the other bundled

> utilities) for Linux. If it is IE, the addition of MS as an application
> provider for Linux should be benifitial to us.

Sorry - I wouldn't want IE for Linux. Also, it's not likely to happen -
I'd guess maybe a 2% chance of it... If you remember Randy Chapman, he's
now working at Microsoft on some of the UNIX things they do (not very
many, mostly IE/Unix and FrontPage extensions and the like).. he's already
stated they have no intention of this. Also, the IE4/Unix "port" was done
using a commercial library package that implements Win32+MFC on top of
Solaris - first it's total bloatware, second, I find it unlikely that
they'd bother to port that to Linux (no money in it for 'em).

P.S. - I'm not against browsers - I might even buy Opera when the Linux
port comes out. I just hate IE.

Derrik Pates
dpa...@kalifornia.com
dpa...@acm.org

Alex Buell

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

nu...@bayside.net wrote:

> And the fact is that both Chapman and Dawson [IE4/solaris developers] > have grown quite comfortable shuttling back and forth between the
> worlds of Windows and UNIX. "It's amazing to me how far UNIX has to go
> today to catch up to NT," says Dawson. "Take, just for one example,
> threading support. UNIX still has benefits, but NT is just a lot more
> full-featured."

OH HAHAHAHA!!! I haven't laughed so much since the time someone fell on
a wall and mangled his private bits. Who are Chapman and Dawson kidding?
HAHAHA!! I can't believe these two are Solaris developers and yet come
out with this tripe?!

--
Cheers,
Alex.

Watch out, the NSA are everywhere. Your computer must be watched!

/\_/\ Legalise cannabis now!
( o.o ) Smoke some cannabis today!
> ^ < Peace, Love, Unity and Respect to all.

Dean Gaudet

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

On Thu, 18 Jun 1998, Alex Buell wrote:

> nu...@bayside.net wrote:
>
> > And the fact is that both Chapman and Dawson [IE4/solaris developers] > have grown quite comfortable shuttling back and forth between the
> > worlds of Windows and UNIX. "It's amazing to me how far UNIX has to go
> > today to catch up to NT," says Dawson. "Take, just for one example,
> > threading support. UNIX still has benefits, but NT is just a lot more
> > full-featured."
>
> OH HAHAHAHA!!! I haven't laughed so much since the time someone fell on
> a wall and mangled his private bits. Who are Chapman and Dawson kidding?
> HAHAHA!! I can't believe these two are Solaris developers and yet come
> out with this tripe?!

Have you worked with threads under NT and worked with threads under, say,
linux? Linux is in the dark ages as far as threads go. There's
linuxthreads, but to debug them you need to patch the kernel. You don't
get core dumps without another kernel patch. gdb doesn't support it all
directly, unless you patch it. None of that has made it into the main
distributions.

Even with the debugging problems solved, linuxthreads are heavier than
solaris pthreads or NT fibers. Both of those use a multiplexed user-level
and kernel-level threading system which results in fewer kernel context
switches. In userland a "context switch" is just a function call. But
we'll see this solved with Netscape's NSPR which was released with mozilla
-- it provides a multiplexed threading model (that particular model isn't
ported to linux yet). There's a paper from sun regarding solaris
pthreads, see
<http://www.arctic.org/~dgaudet/apache/2.0/impl_threads.ps.gz> for a copy
of it. You may also want to visit the JAWS papers at
<http://www.cs.wustl.edu/~jxh/research/research.html> for more discussion
on various threading paradigms.

Have you read my posts regarding file descriptors and other unix semantics
that are "unfortunate" when threading? They're not the end of the world,
but it's really obvious once you start digging into things that much of
unix was designed with a process in mind. For example, on NT there is
absolutely no problem with opening up 10000 files at the same time and
holding onto the file handles. This is exactly what's required to build a
top end webserver to get winning Specweb96 numbers on NT using
TransmitFile. On unix there's no TransmitFile, and instead we end up
using mmap() which has performance problems. Even if we had TransmitFile,
10k file descriptors isn't there. "You have to recompile your kernel for
that." Uh, no thanks, I have a hard enough time getting webserver
reviewers to use the right configuration file, asking them to recompile a
kernel is absolutely out of the question.

Unix multiplexing facilities -- select and poll -- are wake-all
primitives. When something happens, everything waiting is awakened and
immediately starts fighting for something to do. What a waste. They make
a lot of sense for processes though. On NT completion ports provide
wake-one semantics... which are perfect for threads.

NT may not be stable, but there's a lot of nice ideas in there. Don't
just shoo it away saying "pah, that's microsoft's piece of crap". DEC had
their hand in some of the architecture.

Dean

P.S. And now I'll go ask myself why I'm even responding to an advocacy
thread on linux-kernel.

Alex Buell

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

Hi guys,

> P.S. And now I'll go ask myself why I'm even responding to an advocacy
> thread on linux-kernel.

Please don't respond any further wrt my input about Microsoft to the
mailing list - take it to private if you wanna flame me or something.
:o)

--
Cheers,
Alex.

Watch out, the NSA are everywhere. Your computer must be watched!

/\_/\ Legalise cannabis now!
( o.o ) Smoke some cannabis today!
> ^ < Peace, Love, Unity and Respect to all.

-

Rik van Riel

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

On Thu, 18 Jun 1998, Spirilis wrote:

> > utilities) for Linux. If it is IE, the addition of MS as an application
> > provider for Linux should be benifitial to us.
>

> Beneficial?? First off, if MS makes a product for Linux, this means a portion
> of the Linux community will be at least partially dependent upon Microsoft
> products... this means, in the end, we CAN'T DO without Microsoft! That is what
> many other software developers are stuck in... as well as hardware vendors. Of
> course, I would boycott any MS product made for Linux... but some people might
> not. That's their own choice, but it still helps Microsoft eventually topple
> Linux right on its head as it has so many other operating systems. (e.g.
> DR-DOS, OS/2...)

Just look at it this way:

In half a year, we're going to be the _only_ OS that's
running MS Internet Exploder at a decent speed...
Will that mean that MS can't do without _us_ and that
some of the internetting population will switch to
us :-)

MS might just topple itself by showing that there _is_
one platform where IE runs at full speed.

Oh, and OS/2 definately isn't toppled! It still has twice
the number of users Windows NT has, and NT is being widely
touted a succes :)

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.v...@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

lin...@nightshade.ml.org

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

On Thu, 18 Jun 1998, Rik van Riel wrote:

> On Thu, 18 Jun 1998, Spirilis wrote:

[snip]

>
> In half a year, we're going to be the _only_ OS that's
> running MS Internet Exploder at a decent speed...
> Will that mean that MS can't do without _us_ and that
> some of the internetting population will switch to
> us :-)

Nope, they will just fill the innerloops with busycode.. And make IE Linux
slower then all the rest, and it will slow the whole system down while
it's running.

> MS might just topple itself by showing that there _is_
> one platform where IE runs at full speed.

Then they will develop some benchmark thats based on IE and use it show
that Linux is slow.

> Oh, and OS/2 definately isn't toppled! It still has twice
> the number of users Windows NT has, and NT is being widely
> touted a succes :)

Somehow I doubt there are that many people still running OS2.

This would be like determining that people on other planets use Linux
because if you add up all the times all of the Linux kernels been
downloaded, copyed, or put on a distribution cd and add them up you arrive
at a number larger then the population of earth.

> Rik.

Jason Venner

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

M$ directions on how to write production code

1) remember we are in the business of selling upgrades.
Do not ever make your code 'perfect'. Please leave some bugs
in, that can be worked around but are annoying.
Make notes as to where they are and the fix, so we can choose
which to remove in the next version.

2) Inner loops must NOT be optimized. If they are the users will
not need to buy another computer, forcing the sale of another
windows license, and also deliving dividends on our intel
stock.

3) NO-OP's are not sufficient for inner loops.
Use the INNER_LOOP_DELAY macro.
This will allocate system memory for operations
And depending on the system eat some of the disk
bandwidth as well.

Each release we modify some of the parameters in this loop,
so we can claim a performance increase, so we can tune just
how much memory, cpu and disk bandwidth we are waisting at
any given time.

Also if a competitor is gaining ground we can supply a service
release to tweek these.

4) To you kernel designes, remember, if the caller is not a M$
product you are required to run the delays at 5 x the normal
delay. All memory allocations are to actually allocate 8x the
requested memory instead of 4x.
Also, remember that the random factor for pointer/heap
corruption is 1 in 1000 for non MS products and 1 in 10000 for
MS products

Thomas Heide Clausen

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

-----BEGIN PGP SIGNED MESSAGE-----

On 18-Jun-98 lin...@nightshade.ml.org wrote:
> On Thu, 18 Jun 1998, Rik van Riel wrote:
>
>> On Thu, 18 Jun 1998, Spirilis wrote:
> [snip]
>>
>> In half a year, we're going to be the _only_ OS that's
>> running MS Internet Exploder at a decent speed...
>> Will that mean that MS can't do without _us_ and that
>> some of the internetting population will switch to
>> us :-)
>
> Nope, they will just fill the innerloops with busycode.. And
> make IE Linux
> slower then all the rest, and it will slow the whole system
> down while
> it's running.

In the ol' mainframe days people were not patching sources
but the running binaries. I bet that someone could take up those
skills again.........then again: why bother? lynx rulez :-)

- --thomas

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: noconv

iQCVAwUBNYmEvcQLb2bL5bWVAQF0DQP9Gbz5+Ym8839OKCQ9gRxGyoV8fHe44SUY
WcG/vefstP/5UqfI1fLWs27rv+4OQXaEJtHCeODo1Qha+8S+HpUu3p0Xc7zKnCMY
OjOZwYFIaNyylsNhvYGKuCtCvQnenpicKL6nha46exDfnA+SfVzV/HjszPfQ5mCX
DE8QPpnMgoQ=
=u1X0
-----END PGP SIGNATURE-----

Daniel Egger

unread,

Jun 18, 1998, 3:00:00 AM6/18/98

to

On Thu, 18 Jun 1998, Spirilis wrote:

>Beneficial?? First off, if MS makes a product for Linux, this means a portion
>of the Linux community will be at least partially dependent upon Microsoft
>products... this means, in the end, we CAN'T DO without Microsoft!

Hey I wouldn't mind if the browser is small and fast.
But as we're talking about M$: This will be never the case :))

--

Servus,
Daniel

nu...@bayside.net

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

> >
> >A couple of days ago, they had a web page up at
> >http://linus.microsoft.com. Guess what it was -- The default page for
> >Apache on a RedHat installation!
> >
>

> This could mean that they have finally started porting IE 4.01 to Linux as
> they have done for Solaris and HPUX. I heard that the IE for UNIX
> programmers were all (or at least mostly) Linux guys, they may have
> convinced MS to release IE for Linux. Or they might just have been
> compiling Apache 1.3.0 with frontpage extensions (and the other bundled

> utilities) for Linux. If it is IE, the addition of MS as an application
> provider for Linux should be benifitial to us.
>

> >hmm... I guess microsoft finally decided that windows was too unstable to
> >run. Or, maybe they just wanted to steal some of the source code!

oh, you haven't read http://www.microsoft.com/ie/unix/devs.htm yet?

a quick quote from the page:

And the fact is that both Chapman and Dawson [IE4/solaris developers] have
grown quite comfortable shuttling back and forth between the worlds of
Windows and UNIX. "It's amazing to me how far UNIX has to go today to
catch up to NT," says Dawson. "Take, just for one example, threading
support. UNIX still has benefits, but NT is just a lot more
full-featured."

it's good for a laugh, at least :)
_ _ __ __ _ _ _
| / |/ /_ __/ /_____ | Nuke Skyjumper |
| / / // / '_/ -_) | "Master of the Farce" |
|_ /_/|_/\_,_/_/\_\\__/ _|_ nu...@bayside.net _|

David S. Miller

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
From: Dean Gaudet <dgaudet-list...@arctic.org>

[ My commented is not directed to Dean or anyone in particular,
there were just some things I wanted to state in general wrt.
to the issues raised here. ]

Even with the debugging problems solved, linuxthreads are heavier
than solaris pthreads or NT fibers. Both of those use a
multiplexed user-level and kernel-level threading system which
results in fewer kernel context switches. In userland a "context
switch" is just a function call. But we'll see this solved with
Netscape's NSPR which was released with mozilla -- it provides a
multiplexed threading model (that particular model isn't ported to
linux yet).

Making threads under Linux not be multiplexed at the user side was a
conscious design decision. Doing it half in user half in kernel (and
this is the distinction being mentioned when Solaris nomenclature
speaks of kernel bound and non-kernel bound threads) leads to enormous
levels of complexity for fundamental things such a signal handling.

The folks at Solaris spent a lot of time fixing bugs that were solely
getting signals right in their threads implementation. Keeping track
of what the kernel sends to a "kernel bound thread" and making sure
the right "pure user thread" within gets that signal correctly is
tricky buisness. It's complex and hell to get right. (search the
Solaris patch databases for "threads" and "signals" to see that I'm
for real here about how difficult it is to get right)

This is why we do it the way we do it.

For example, on NT there is absolutely no problem with opening up
10000 files at the same time and holding onto the file handles.
This is exactly what's required to build a top end webserver to get
winning Specweb96 numbers on NT using TransmitFile.

Yes, I know this.

On unix there's no TransmitFile, and instead we end up using mmap()
which has performance problems. Even if we had TransmitFile, 10k
file descriptors isn't there.

One thing to keep in mind when people start howling "xxx OS allows
such and such feature and Linux still does not yet, why is it so
limited etc.???" Go do a little research, and find out what the cost
of 10k file descriptors capability under NT is for processes which
don't use nearly that many.

I know, without actually being able to look at how NT does it, it's
hard to say for sure. But I bet low end processes pay a bit of a
price so these high end programs can have the facility.

This is the reason Linux is still upcoming with the feature. We won't
put it in until we come up with an implementation which costs next to
nothing for "normal" programs.

"You have to recompile your kernel for that." Uh, no thanks, I
have a hard enough time getting webserver reviewers to use the
right configuration file, asking them to recompile a kernel is
absolutely out of the question.

I actually don't tell people to do this. Instead I tell them to find
a solution within the current framework, and that what they are after
is in fact in the works. If someone can't make it work in the current
framework, Linux is not for them at least for now. A bigger danger
than losing users or apps for the moment due to missing features, is
to mis-design something and end up paying for it forever, this is the
path other unixs have gone down.

Unix multiplexing facilities -- select and poll -- are wake-all
primitives. When something happens, everything waiting is awakened
and immediately starts fighting for something to do. What a waste.
They make a lot of sense for processes though. On NT completion
ports provide wake-one semantics... which are perfect for threads.

Yes, this does in fact suck. However, the path to go down is not to
expect the way select/poll work to change, rather look at other
existing facilities or invent new ones which solve this problem.
Too much user code exists which depends upon the wake-all semantics,
so the only person to blame is whoever designed the behaviors of these
unix operations to begin with ;-)

Later,
David S. Miller
da...@dm.cobaltmicro.com

Andi Kleen

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

"David S. Miller" <da...@dm.cobaltmicro.com> writes:

> The folks at Solaris spent a lot of time fixing bugs that were solely
> getting signals right in their threads implementation. Keeping track
> of what the kernel sends to a "kernel bound thread" and making sure
> the right "pure user thread" within gets that signal correctly is
> tricky buisness. It's complex and hell to get right. (search the
> Solaris patch databases for "threads" and "signals" to see that I'm
> for real here about how difficult it is to get right)

Linux (LinuxThreads) has is it not really right unfortunately. There is
no way to send a signal to a process consisting of multiple threads and
it to be delivered to the first thread that has it unblocked (as defined
in POSIX) - it will be always delivered to the thread with the pid it was
directed to.

To fix it CLONE_PID would need to be made fully working.

Unfortunately that opens a can of worms - either a new tid is needed (with
new system calls etc. - ugly), or the the upper 16bits of pid space are
reused - but those are already allocated from Beowulf.

-Andi

Edward S. Marshall

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Thu, 18 Jun 1998, Spirilis wrote:
> Beneficial?? First off, if MS makes a product for Linux, this means a portion
> of the Linux community will be at least partially dependent upon Microsoft
> products...

If you've used IE or Outlook Express for Solaris, you'd know how untrue
your statement is; their UNIX software is beyond buggy...it's nearly
unusable. They do Windows well, and Mac isn't -too- bad, but everything
coming out of Redmond for UNIX has been horribly broken so far, and
uni-platform (SPARC/Solaris only).

I wouldn't waste my time even thinking about Microsoft developing software
for Linux until they work on fixing all their other problems with UNIX
development.

--
-------------------. emarshal at logic.net .---------------------------------
Edward S. Marshall `-----------------------' http://www.logic.net/~emarshal/

Linux labyrinth 2.1.106 #9 SMP Sun Jun 14 14:50:43 CDT 1998 i586 unknown
10:25pm up 9 min, 4 users, load average: 1.23, 0.81, 0.40

Nomad the Wanderer

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

After a few weeks of trying to get IE for solaris Working the other SA quit.
I had Netscape installed in about 30 mins.

Thus spake Edward S. Marshall (emar...@logic.net):

---------------------------------------------------------------------------
Robert L. Harris | Educate the Masses,
Senior System Administrator | Don't just help them to
at Great West Life. \_ Remain ignorant.

http://www.orci.com/~nomad

DISCLAIMER:
These are MY OPINIONS ALONE. I speak for no-one else.

FYI:
perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'

Spirilis

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Thu, 18 Jun 1998, David S. Miller wrote:

>
> For example, on NT there is absolutely no problem with opening up
> 10000 files at the same time and holding onto the file handles.
> This is exactly what's required to build a top end webserver to get
> winning Specweb96 numbers on NT using TransmitFile.
>
> Yes, I know this.

Is it not possible to configure Linux to be able to use 10k or greater file
descriptors (in 2.1.xxx) by tweaking /proc/sys/fs/file-max and inode-max?
(shooting down the earlier comment regarding recompiling the kernel to allow 10k
or greater file descriptors...)

Dean Gaudet

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Thu, 18 Jun 1998, David S. Miller wrote:

> Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
> From: Dean Gaudet <dgaudet-list...@arctic.org>
>
> [ My commented is not directed to Dean or anyone in particular,
> there were just some things I wanted to state in general wrt.
> to the issues raised here. ]
>
> Even with the debugging problems solved, linuxthreads are heavier
> than solaris pthreads or NT fibers. Both of those use a
> multiplexed user-level and kernel-level threading system which
> results in fewer kernel context switches. In userland a "context
> switch" is just a function call. But we'll see this solved with
> Netscape's NSPR which was released with mozilla -- it provides a
> multiplexed threading model (that particular model isn't ported to
> linux yet).
>
> Making threads under Linux not be multiplexed at the user side was a
> conscious design decision. Doing it half in user half in kernel (and
> this is the distinction being mentioned when Solaris nomenclature
> speaks of kernel bound and non-kernel bound threads) leads to enormous
> levels of complexity for fundamental things such a signal handling.

Sure. If you need signals that sucks. This makes pthreads really hard to
split up like this, and I can totally see why linuxthreads is the way it
is.

But something like NSPR which requires folks to write in a dialect that is
portable between unix and NT (and still access performance features on
both) doesn't have signals... because asynchronous signalling leads to far
too many race conditions and other crap, it's not even considered good
programming practice these days. I don't miss it at all. NSPR gives me
primitives like PR_Send() which writes data, with a timeout.... which
nails the main thing I would use signals for in posix -- for timeouts.

(For reference NSPR on linux defaults to single process, multiplexed via
poll/select. It can be compiled to use pthreads directly, which also
works on linux. It has a hybrid mode that hasn't been ported to linux
yet.)

> One thing to keep in mind when people start howling "xxx OS allows
> such and such feature and Linux still does not yet, why is it so
> limited etc.???" Go do a little research, and find out what the cost
> of 10k file descriptors capability under NT is for processes which
> don't use nearly that many.
>
> I know, without actually being able to look at how NT does it, it's
> hard to say for sure. But I bet low end processes pay a bit of a
> price so these high end programs can have the facility.

I'm not sure. Did you see my extended file handles proposal? I carefully
avoided O(n) crap, I think it can be done O(1) for everything but process
destruction (where you have to scan the open descriptors). And the stuff
I was proposing is close to what NT provides. But of course it's not
POSIX :)

Briefly, an extended file handle is a global index, all processes get
handles out of this single space. To implement access rights you place an
extra field in each file structure, call it file_access_right. Each
process also has a file_access_right, they have to compare equal for the
handle's use to be permitted. exec() causes a new file_access_right to be
selected. fork() uses the same file_access_right (to set up exec),
clone() uses the same file_access_right.

This is essentially what NT provides. They don't have fork -- when you
create a process you explicitly decide which handles will be passed into
the new process... and they're given new addresses in the new process. To
do that with my scheme you first need to dup an extended fh into a regular
handle. NT does that "behind the scenes".

> Unix multiplexing facilities -- select and poll -- are wake-all
> primitives. When something happens, everything waiting is awakened
> and immediately starts fighting for something to do. What a waste.
> They make a lot of sense for processes though. On NT completion
> ports provide wake-one semantics... which are perfect for threads.
>
> Yes, this does in fact suck. However, the path to go down is not to
> expect the way select/poll work to change, rather look at other
> existing facilities or invent new ones which solve this problem.
> Too much user code exists which depends upon the wake-all semantics,
> so the only person to blame is whoever designed the behaviors of these
> unix operations to begin with ;-)

Right, I've said before that I don't care what the facility looks like, as
long as it provides wake-one :)

Dean

Albert D. Cahalan

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

David S. Miller writes:

>> Unix multiplexing facilities -- select and poll -- are wake-all
>> primitives. When something happens, everything waiting is awakened
>> and immediately starts fighting for something to do. What a waste.
>> They make a lot of sense for processes though. On NT completion
>> ports provide wake-one semantics... which are perfect for threads.
>
> Yes, this does in fact suck. However, the path to go down is not
> to expect the way select/poll work to change, rather look at other
> existing facilities or invent new ones which solve this problem.
> Too much user code exists which depends upon the wake-all semantics,
> so the only person to blame is whoever designed the behaviors of
> these unix operations to begin with ;-)

For select(), a negative fd count could be used to indicate
that only one process (one per event?) should be woken.

Flags are OK. If you allow 24 million file descriptors, you can
still get 8 flag bits.

Chris Wedgwood

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Thu, Jun 18, 1998 at 10:57:35PM -0700, Dean Gaudet wrote:

> Briefly, an extended file handle is a global index, all processes get
> handles out of this single space. To implement access rights you place an
> extra field in each file structure, call it file_access_right. Each
> process also has a file_access_right, they have to compare equal for the
> handle's use to be permitted. exec() causes a new file_access_right to be
> selected. fork() uses the same file_access_right (to set up exec),
> clone() uses the same file_access_right.

This could perhaps be done using the existing semantics where instead of
having one global table, have a two layer approach. So you do a lookup on
descriptor(fd,pid) and then use that as a lookup into a global table.

That hopefully wouldn't be too expensive, although could be excessively
large.

-cw

Miquel van Smoorenburg

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

In article <k27m2eg...@zero.aec.at>, Andi Kleen <a...@muc.de> wrote:
>Linux (LinuxThreads) has is it not really right unfortunately. There is
>no way to send a signal to a process consisting of multiple threads and
>it to be delivered to the first thread that has it unblocked (as defined
>in POSIX) - it will be always delivered to the thread with the pid it was
>directed to.
>
>To fix it CLONE_PID would need to be made fully working.

How about this, I thought this up one night when I could not sleep
so I have no idea if it still makes sense. It's about creating a
"thread group". All threads still have their own PID but share a
TID (Thread group ID) or TGID.

1. When a process calls clone() with CLONE_TID, the calling process becomes
"thread group leader". All it's children, and children of those created
by clone() with CLONE_TID share the thread id, which is the PID of the
first process.

2. All processes still have their own PID.

3. Synchronous signals (generated by the thread execution, e.g. SIGFPE)
are delivered to the thread/pid that raised them.

4. A fatal asynchronous (ie not blocked or caught) signal to any of the threads
causes the signalhandler for that signal to be unblocked, and set to default
for all members of the thread group. The signal is then sent to all threads.

5. An asynchronous signal sent to any thread is handled by the thread or,
if the thread has blocked the signal, by any other thread in the group
that hasn't blocked the signal, but at most to one thread.
Q: what if all threads have blocked the signal? Which one will get
the signal as "pending" ? A: the original thread?

6. Signals to the process group will behave as if sent only to the
thread group leader. Ofcourse (5) still holds.

Ofcourse you also might be able to do something with CLONE_SIGHAND. Perhaps
in combination with the above.

Mike.
--
Miquel van Smoorenburg | Our vision is to speed up time,
miq...@cistron.nl | eventually eliminating it.

Richard Gooch

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

David S. Miller writes:
> Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
> From: Dean Gaudet <dgaudet-list...@arctic.org>

[...]

> Unix multiplexing facilities -- select and poll -- are wake-all
> primitives. When something happens, everything waiting is awakened
> and immediately starts fighting for something to do. What a waste.
> They make a lot of sense for processes though. On NT completion
> ports provide wake-one semantics... which are perfect for threads.
>
> Yes, this does in fact suck. However, the path to go down is not to
> expect the way select/poll work to change, rather look at other
> existing facilities or invent new ones which solve this problem.
> Too much user code exists which depends upon the wake-all semantics,
> so the only person to blame is whoever designed the behaviors of these
> unix operations to begin with ;-)

On the other hand you could say that the UNIX semantics are fine and
are quite scalable, provided you use them sensibly. Some of these
"problems" are due to applications not being properly thought out in
the first place. If for example you have N threads each polling a
chunk of FDs, things can run well, provided you don't have *each*
thread polling *all* FDs. Of course, you want to use poll(2) rather
than select(2), but other than that the point stands.

Regards,

Richard....

Rik van Riel

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Thu, 18 Jun 1998 lin...@nightshade.ml.org wrote:
> On Thu, 18 Jun 1998, Rik van Riel wrote:
>
> > Oh, and OS/2 definately isn't toppled! It still has twice
> > the number of users Windows NT has, and NT is being widely
> > touted a succes :)
>
> Somehow I doubt there are that many people still running OS2.

OS/2 is in wide use in the banking business. Not just
for desktop work, but I've also seen a figure showing
that some 80% of ATM machines is running OS/2...

This makes me very happy because IBM has already fixed
all y2k problems for their own software. This means I
can do my shopping and buy food on jan 1st 2000 (assuming
the shops are open :-)

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.v...@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

Alan Cox

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

> > 10000 files at the same time and holding onto the file handles.
> > This is exactly what's required to build a top end webserver to get
> > winning Specweb96 numbers on NT using TransmitFile.
> >
> > Yes, I know this.
>
> Is it not possible to configure Linux to be able to use 10k or greater file
> descriptors (in 2.1.xxx) by tweaking /proc/sys/fs/file-max and inode-max?
> (shooting down the earlier comment regarding recompiling the kernel to allow 10k
> or greater file descriptors...)

With Bill Hawes patches for handling file arrays it is. For the generic case
its not. Note that you can forget using select() with 10K descriptors
if you ever want to get any work done.

Alan

Alex Belits

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Fri, 19 Jun 1998, Richard Gooch wrote:

> David S. Miller writes:
> > Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
> > From: Dean Gaudet <dgaudet-list...@arctic.org>
> [...]
> > Unix multiplexing facilities -- select and poll -- are wake-all
> > primitives. When something happens, everything waiting is awakened
> > and immediately starts fighting for something to do. What a waste.
> > They make a lot of sense for processes though. On NT completion
> > ports provide wake-one semantics... which are perfect for threads.
> >
> > Yes, this does in fact suck. However, the path to go down is not to
> > expect the way select/poll work to change, rather look at other
> > existing facilities or invent new ones which solve this problem.
> > Too much user code exists which depends upon the wake-all semantics,
> > so the only person to blame is whoever designed the behaviors of these
> > unix operations to begin with ;-)
>
> On the other hand you could say that the UNIX semantics are fine and
> are quite scalable, provided you use them sensibly. Some of these
> "problems" are due to applications not being properly thought out in
> the first place.

#ifdef SARCASM

"Thundering Herd Problem II", with all original cast... ;-) This time it's
not accept(), but poll(), and the whole thing is multithreaded...

#endif

> If for example you have N threads each polling a
> chunk of FDs, things can run well, provided you don't have *each*
> thread polling *all* FDs. Of course, you want to use poll(2) rather
> than select(2), but other than that the point stands.

Can anyone provide a clear explanation, what is the benefit of doing
that in multiple threads vs. having one thread polling everything, if the
response on fd status change takes negligible time for the thread/process
that is polling them (other processes complete the operation while polling
comtinues)? I have a server that uses separate process mostly for polling,
however I'm not sure what poll()/select() scalability problems it may
encounter if used with huge fd number.

--
Alex

David S. Miller

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

Date: Fri, 19 Jun 1998 06:11:10 -0700 (PDT)
From: Alex Belits <abe...@phobos.illtel.denver.co.us>

Can anyone provide a clear explanation, what is the benefit of
doing that in multiple threads vs. having one thread polling
everything, if the response on fd status change takes negligible
time for the thread/process that is polling them (other processes
complete the operation while polling comtinues)? I have a server
that uses separate process mostly for polling, however I'm not sure
what poll()/select() scalability problems it may encounter if used
with huge fd number.

I look at it this way.

If you can divide the total set of fd's logically into seperate
groups, one strictly to a particular thread. Do it this way.
The problem with one thread polling all fd's and passing event
notification to threads via some other mechanism has the problem that
this one thread becomes the bottle neck.

The problem, for one, with web etc. servers is the incoming connection
socket. If you could tell select/poll "hey, when a new conn comes in,
wake up one of us", poof this issue would be solved. However the
defined semantics for these interfaces says to wake everyone polling
on it up.

Later,
David S. Miller
da...@dm.cobaltmicro.com

Richard Gooch

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

Maybe it's because it's late, but I'm missing what you're getting at
here...

> > If for example you have N threads each polling a
> > chunk of FDs, things can run well, provided you don't have *each*
> > thread polling *all* FDs. Of course, you want to use poll(2) rather
> > than select(2), but other than that the point stands.
>

> Can anyone provide a clear explanation, what is the benefit of doing
> that in multiple threads vs. having one thread polling everything, if the
> response on fd status change takes negligible time for the thread/process
> that is polling them (other processes complete the operation while polling
> comtinues)? I have a server that uses separate process mostly for polling,
> however I'm not sure what poll()/select() scalability problems it may
> encounter if used with huge fd number.

Both poll() and select() have scalability problems: it takes of the
order of 2-3 microseconds (Pentium 100) per FD to scan the list of
FDs. With thousands of FDs, mostly inactive, this time can start to
dominate (especially now that the TCP stack rocks). This is wasted
kernel time.

So you can solve this by dividing up your FDs amongst a group of
threads. Thus each thread only has to scan a short list. Of course,
the total number of FDs that has to be scanned is the same, so you may
ask where is the saving? The answer is that, assuming most FDs are
inactive (quite a reasonable assumption as it turns out, when looking
at timescales of a few ticks), chances are that not all your threads
will have to be woken up. And the less threads that have to be woken
up, the less FDs need to be scanned over any (short) period of
time. And there is your saving.

select() doesn't work so well in this scheme, because a thread that is
scanning FDs 9000-10000 still has to do that bit-testing for FDs
0-9000. Hence you are better off with poll(), even though the per-FD
cost of poll is higher. The poll2() syscall I implemented is the way
around this and other problems, but I've left that dormant since
2.1.5x (I'll get back to it for 2.3.x) because there is more work to
be done on the implementation of select() and poll() before the
benefits of poll2() really show. That work I've also left dormant till
2.3.x, since it wasn't adopted around 2.1.5x and I don't have time to
push it just yet. Have a peek at:

ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.1/fastpoll-readme

if you're interested.

Regards,

Richard....

Henning P. Schmiedehausen

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

spir...@mindmeld.dyn.ml.org (Spirilis) writes:

>> utilities) for Linux. If it is IE, the addition of MS as an application
>> provider for Linux should be benifitial to us.

>Beneficial?? First off, if MS makes a product for Linux, this means a portion

>of the Linux community will be at least partially dependent upon Microsoft

>products... this means, in the end, we CAN'T DO without Microsoft! That is what

So what? Noone will force you to use a M$ product on Linux.

I, for myself, would gladly embrace Office 9x for Linux as it would
mean goodbye for NT from my desktop for ever.

Kind regards
Henning

--
Dipl.-Inf. Henning P. Schmiedehausen -- h...@tanstaafl.de
TANSTAAFL! Consulting - Unix, Internet, Security

Hutweide 15 Fon.: 09131 / 50654-0 "There ain't no such
D-91054 Buckenhof Fax.: 09131 / 50654-20 thing as a free Linux"

Richard Jones

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

David S. Miller wrote:
> The problem, for one, with web etc. servers is the incoming connection
> socket. If you could tell select/poll "hey, when a new conn comes in,
> wake up one of us", poof this issue would be solved. However the
> defined semantics for these interfaces says to wake everyone polling
> on it up.

Apache handles this very nicely. It runs a group of processes,
and each *blocks* on accept(2). When a new connection comes in,
the kernel wakes up one, which handles that socket alone, using
blocking I/O (it uses alarm(2) to do timeouts).

This way they avoid the poll/select issue entirely.

[This applies to Apache 1.2, not sure about later versions]

Rich.

--
Richard Jones rjo...@orchestream.com Tel: +44 171 598 7557 Fax: 460 4461
Orchestream Ltd. 125 Old Brompton Rd. London SW7 3RP PGP: www.four11.com
"boredom ... one of the most overrated emotions ... the sky is made
of bubbles ..." Original message content Copyright © 1998

Mike Ford Ditto

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

> > The problem, for one, with web etc. servers is the incoming connection
> > socket. If you could tell select/poll "hey, when a new conn comes in,
> > wake up one of us", poof this issue would be solved. However the
> > defined semantics for these interfaces says to wake everyone polling
> > on it up.
>
> Apache handles this very nicely. It runs a group of processes,
> and each *blocks* on accept(2). When a new connection comes in,
> the kernel wakes up one, which handles that socket alone, using
> blocking I/O (it uses alarm(2) to do timeouts).

This demonstrates the point that select and poll are workarounds for
the lack of threading support in Unix. They aren't needed if you use
a threads facility (or a separate process for each thread you need).

Once you have threads you can stick to the intuitive synchronous model
of system calls, which has always effectively handled waking one of
multiple waiters.

Off topic, I would like to pick a nit:

accept() is a system call. accept(2) is not a system call, it is a
manual page. One doesn't block on accept(2), one *reads* accept(2)
to find out how to use accept().

-=] Ford [=-

"Heaven is exactly like where you (In Real Life: Mike Ditto)
are right now, only much, much better." fo...@omnicron.com
-- Laurie Anderson http://www.omnicron.com/~ford/ford.html

Alan Cox

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

> > the kernel wakes up one, which handles that socket alone, using
> > blocking I/O (it uses alarm(2) to do timeouts).
>
> This demonstrates the point that select and poll are workarounds for
> the lack of threading support in Unix. They aren't needed if you use
> a threads facility (or a separate process for each thread you need).

Actually select and poll are more efficient ways of describing most
multiple source event models without the overhead of threads.

And there are plenty of cases where each one is better. Select is clearly
a better model for inetd for example.

> accept() is a system call. accept(2) is not a system call, it is a
> manual page. One doesn't block on accept(2), one *reads* accept(2)
> to find out how to use accept().

Using accept(2) to indicate you are talking about the system call goes
back to at least my student days read comp.unix.wizards

Alan

Alex Belits

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Fri, 19 Jun 1998, David S. Miller wrote:

> I look at it this way.
>
> If you can divide the total set of fd's logically into seperate
> groups, one strictly to a particular thread. Do it this way.
> The problem with one thread polling all fd's and passing event
> notification to threads via some other mechanism has the problem that
> this one thread becomes the bottle neck.

I realize that every operation, performed indide that process/thread, if
takes any noticeable time, will hold back everything that depends on any
fd status change. But what if the code is optimized to reduce the time in
loop to the absolute minimum possible? Will poll() take more time by
itself (and indeed become a bottleneck) in one thread vs. multiple
poll()'s made at the same time in multiple threads? If the time spent in
the loop is minimal, is there any difference between waking up one of
looping threads, searching through its poll array and performing some
action, and with one thread waking up every time, searching larger array
(IMHO not a significant time compared to time spent by system while
processing those sockets) and then performing the same action, if that
action takes some insignificant time, comparable with time, spent in
buffers handling in the kernel itself? As I understand, with multiple
threads ot not, kernel still needs a time to process file descriptors
and choose thread to wake up even if threads already divided fds among
themselves, so the total amount of fd lists scanning won't change.

> The problem, for one, with web etc. servers is the incoming connection
> socket. If you could tell select/poll "hey, when a new conn comes in,
> wake up one of us", poof this issue would be solved. However the
> defined semantics for these interfaces says to wake everyone polling
> on it up.

This is why I do that in userspace -- one process is always waking up,
connection is placed in its internal queue, its fd is added to the
polling list, and after request is received and parsed asynchronously, fd
is immediately passed to another process through the AF_UNIX socket. While
main process is doing nonblocking I/O on multiple connections, there is no
I/O in the same loop except opening new connections, reading from them and
passing to other processes fds/data of connections that have sent their
requests and expect the response. Kind of userspace "multithreading",
optimized for the particular operation.

Possible problems can be caused either by poll() scalability (it will
take more time than if I did that in multiple threads simultaneously?) or
unexpectedly long time, spent reading data from sockets, or any delays in
fd passing, that I assume, should be followed by a context switch to the
receiving process that won't be unlike wake-one behavior, described by you
and Dean.

--
Alex

Alex Belits

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Fri, 19 Jun 1998, Richard Gooch wrote:

> > > On the other hand you could say that the UNIX semantics are fine and
> > > are quite scalable, provided you use them sensibly. Some of these
> > > "problems" are due to applications not being properly thought out in
> > > the first place.
> >
> > #ifdef SARCASM
> >
> > "Thundering Herd Problem II", with all original cast... ;-) This time it's
> > not accept(), but poll(), and the whole thing is multithreaded...
> >
> > #endif
>
> Maybe it's because it's late, but I'm missing what you're getting at
> here...

Maybe I had to explain it better, I refer to the "thundering herd
problem" (mass-wakeup of processes, waiting on something) that was
discusseded in freebsd-hackers ML about a year ago. Discussion was
along the same lines, just processes were mentioned instead of threads.

Alex Belits

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Fri, 19 Jun 1998, Mike "Ford" Ditto wrote:

> This demonstrates the point that select and poll are workarounds for
> the lack of threading support in Unix. They aren't needed if you use
> a threads facility (or a separate process for each thread you need).

Or threading is a workaround for lack of proper nonblocking I/O handling
in applications. Multithreading and nonblocking I/O are two opposite
concepts when it comes to I/O handling.

> Once you have threads you can stick to the intuitive synchronous model
> of system calls, which has always effectively handled waking one of
> multiple waiters.

Nothing is "intuitive". Everything mostly depends on habit and amount of
trouble, processing model creates for a programmer.

> Off topic, I would like to pick a nit:
>

> accept() is a system call. accept(2) is not a system call, it is a
> manual page. One doesn't block on accept(2), one *reads* accept(2)
> to find out how to use accept().

accept(2) means that system call is mentioned, and not a wrapper over it
that exist in userspace threads implementations.

Chris Wedgwood

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Fri, Jun 19, 1998 at 01:43:03PM +0100, Alan Cox wrote:
>
> With Bill Hawes patches for handling file arrays it is. For the generic
> case its not. Note that you can forget using select() with 10K descriptors

> if you ever want to get any work done.[B

perhaps being a kernel nazi here could help, basically have select(2) return
EPERM or some such (EDONTBEADUMBASS) where more than say 1024 fd's are
specified.

I've been playing with poll and it seems to work adequately and seems to
scale better, but nothing too concrete yet (it compiles and runs, lets ship
it).

Sure, its an ugly hack, but it would break broken code hard, before its
perhaps too late.

-Chris

Alan Cox

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

> perhaps being a kernel nazi here could help, basically have select(2) return
> EPERM or some such (EDONTBEADUMBASS) where more than say 1024 fd's are
> specified.

For select that basically has to occur.

> I've been playing with poll and it seems to work adequately and seems to
> scale better, but nothing too concrete yet (it compiles and runs, lets ship
> it).

If you like playing with poll fix glibc to use poll everywhere not select,
then you will have slain the library file size limit monster

Chris Wedgwood

unread,

Jun 19, 1998, 3:00:00 AM6/19/98

to

On Sat, Jun 20, 1998 at 12:22:53AM +0100, Alan Cox wrote:

> For select that basically has to occur.

looks like is trucates to KFDS_NR:

if (n < 0)
goto out;
if (n > KFDS_NR)
n = KFDS_NR;

> If you like playing with poll fix glibc to use poll everywhere not select,
> then you will have slain the library file size limit monster

Making an authentication server (which I make money from) and twiddling with
libraries (which doesn't pay my bills) are two very different things.

Nonetheless its on my list of things to do one day. But as always, I expect
to die long before all the items on this list are ever completed.

-Chris

Dean Gaudet

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Fri, 19 Jun 1998, Richard Jones wrote:

> David S. Miller wrote:
> > The problem, for one, with web etc. servers is the incoming connection
> > socket. If you could tell select/poll "hey, when a new conn comes in,
> > wake up one of us", poof this issue would be solved. However the
> > defined semantics for these interfaces says to wake everyone polling
> > on it up.
>

> Apache handles this very nicely. It runs a group of processes,
> and each *blocks* on accept(2). When a new connection comes in,

> the kernel wakes up one, which handles that socket alone, using
> blocking I/O (it uses alarm(2) to do timeouts).
>

> This way they avoid the poll/select issue entirely.
>
> [This applies to Apache 1.2, not sure about later versions]

1.2 actually will still use select() and then do an accept() immediately
after -- it doesn't treat the single and multiple socket cases
differently. 1.3 fixes this oversight.

But this all falls apart as soon as you ask apache to handle multiple
sockets (i.e. 80 and 443). See
<http://www.apache.org/docs/misc/perf-tuning.html>, search for "accept
Serialization" for my discussion of why it falls apart... and explanation
of why Apache uses fcntl() to serialize things (or flock() or any number
of other interprocess synchronization primitives).

That's a case where Apache would really like to tell the kernel "here's a
list of listening sockets, accept a connection and return me an fd,
thanks".

1.3 also fixes another starvation condition not documented on that page --
if you always scan your sockets from 0 to N, and your server has a really
busy socket early on, then it is possible to starve the higher numbered
sockets. i.e. if socket 3 is port 80, and socket 4 is port 443, and all
the children/threads/whatever always test 3 for new connections first it's
possible that 4 will be starved. 1.3 cycles through the list... this is
another example of things that can be done a lot more efficiently with
completion ports (or an isomorphic technology).

Dean

Jason Venner

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

What I really want is:

Given this list of things I am interested in, return me
an array of objects for each of the things that is ready for me.
The object containing the relevant thing from the interested thing in
question.

So: I would pass in a list of fd's or fd's & a block of data (for writes)
Some listening sockets, some actual sockets

I would get back an array of structures

the leader would be
the fd that the data is for

In the listening case the next data would be an FD optionally with
remote info

In the read case I would get back
a block of data and a length and a 'more ready now' flag

In the write case I would get back
status and available buffer space.

Ed Welbon

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

As TS Eliot would say, You speak better than you know.

On Thu, 18 Jun 1998, Jason Venner wrote:

> Each release we modify some of the parameters in this loop,
> so we can claim a performance increase, so we can tune just
> how much memory, cpu and disk bandwidth we are waisting at
> any given time.

Dean Gaudet

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Fri, 19 Jun 1998, Richard Gooch wrote:

> On the other hand you could say that the UNIX semantics are fine and
> are quite scalable, provided you use them sensibly. Some of these
> "problems" are due to applications not being properly thought out in

> the first place. If for example you have N threads each polling a

> chunk of FDs, things can run well, provided you don't have *each*
> thread polling *all* FDs. Of course, you want to use poll(2) rather
> than select(2), but other than that the point stands.

You may not be able to exploit the parallism available in the hardware
unless you can "load balance" the descriptors well enough...

Dean

Dean Gaudet

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Fri, 19 Jun 1998, Alex Belits wrote:

> Or threading is a workaround for lack of proper nonblocking I/O handling
> in applications. Multithreading and nonblocking I/O are two opposite
> concepts when it comes to I/O handling.

They're orthogonal concepts, not opposite. non-blocking I/O is one way to
implement a userland thread library; and a way to implement a hybrid
user/kernel library. Is that what you meant?

Like you allude to, using blocking I/O exclusively can actually be
somewhat of a crutch. Remember for each thread, (user, kernel, or hybrid)
you have a stack, plus a structure that describes the thread. Multiply
that by the number of I/O streams in progress and you really start
threatening your L2 cache... once you've got more active data than your L2
cache can hold performance drops off drastically.

Anthony Barbachan

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

-----Original Message-----
From: Spirilis <spir...@mindmeld.dyn.ml.org>
To: Anthony Barbachan <barb...@trill.cis.fordham.edu>
Cc: linux-...@vger.rutgers.edu <linux-...@vger.rutgers.edu>
Date: Thursday, June 18, 1998 9:35 AM
Subject: Re: [OFFTOPIC] Very amusing DNS...

>> utilities) for Linux. If it is IE, the addition of MS as an application
>> provider for Linux should be benifitial to us.
>
>Beneficial?? First off, if MS makes a product for Linux, this means a
portion
>of the Linux community will be at least partially dependent upon Microsoft
>products... this means, in the end, we CAN'T DO without Microsoft! That is
what

>many other software developers are stuck in... as well as hardware vendors.
Of
>course, I would boycott any MS product made for Linux... but some people
might
>not. That's their own choice, but it still helps Microsoft eventually
topple
>Linux right on its head as it has so many other operating systems. (e.g.
>DR-DOS, OS/2...)

Well your at least partially to late. Their version of Apache for use with
FrontPage 98 can already run on Linux. And how exactly are they suppose to
topple Linux by making applications for it? DR-DOS never had much of a
market. OS/2 also never had much of a market and died because it LACKED
applications amoung other things. MacOS shrank because it force one into a
proprietary architechture that cost much more and also lacked the amount of
applications that DOS and Windows had. So I do not see how they are going
to kill Linux by giving us an app written by a brand name company that RUNS
on Linux.

Alex Belits

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Fri, 19 Jun 1998, Dean Gaudet wrote:

> On Fri, 19 Jun 1998, Alex Belits wrote:
>
> > Or threading is a workaround for lack of proper nonblocking I/O handling
> > in applications. Multithreading and nonblocking I/O are two opposite
> > concepts when it comes to I/O handling.
>
> They're orthogonal concepts, not opposite. non-blocking I/O is one way to
> implement a userland thread library; and a way to implement a hybrid
> user/kernel library. Is that what you meant?

I mean that they can be used separately, providing the same
functionality, but their combination is rare, not because it can't be
efficient, but because they represent different styles. Some programmers
feel uncomfortably designing programs where they never can do things in a
"natural" order of actions performed on the same object, so they don't use
nonblocking I/O that can leave things incomplete and require doing
something else at any moment. Others can accept that, but have problems
with seeing multiple copies of themselves existing in one universe,
trying to live independently of each other ;-), so they see
"unnatural order" of nonblocking I/O operations as the lesser evil.
Combination of two are never required to achieve the functionality, and
mostly appear when the OS or libraries have significant bias toward one of
model, and programmer is biased toward another one. Performance
requirements may change this, however I still don't believe in "threads
will make everything faster", unless it has " on NT and Solaris"
immediately following it.

Of course, threads can be implemented through nonblocking I/O, and it's
possible to even implement nonblocking I/O through blocking one and
multithreading, however the need of such tricks is more related to
compatibility requirements than to anything else.

I admit that personally I am biased toward nonblocking I/O.

> Like you allude to, using blocking I/O exclusively can actually be
> somewhat of a crutch. Remember for each thread, (user, kernel, or hybrid)
> you have a stack, plus a structure that describes the thread. Multiply
> that by the number of I/O streams in progress and you really start
> threatening your L2 cache... once you've got more active data than your L2
> cache can hold performance drops off drastically.

I/O streams have some data that describes them in any model, explicitly
or implicitly. Various buffers, associated with every stream, seem for me
to be more of a threat for cache than threads descriptors or structures
that describe connections in nonblocking programs.

--
Alex

Anthony Barbachan

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

----Original Message-----
From: Derrik <dpa...@kalifornia.com>
To: A-nthony Barbachan <barb...@trill.cis.fordham.edu>
Cc: Paul Miller <pa...@3dillusion.com>; Spirilis
<spir...@mindmeld.dyn.ml.org>; linux-...@vger.rutgers.edu
<linux-...@vger.rutgers.edu>
Date: Thursday, June 18, 1998 12:26 PM
Subject: Re: [OFFTOPIC] Very amusing DNS...

>On Thu, 18 Jun 1998, Anthony Barbachan wrote:
>
>> This could mean that they have finally started porting IE 4.01 to Linux
as
>> they have done for Solaris and HPUX. I heard that the IE for UNIX
>> programmers were all (or at least mostly) Linux guys, they may have
>> convinced MS to release IE for Linux. Or they might just have been
>> compiling Apache 1.3.0 with frontpage extensions (and the other bundled

>> utilities) for Linux. If it is IE, the addition of MS as an application
>> provider for Linux should be benifitial to us.
>

>Sorry - I wouldn't want IE for Linux. Also, it's not likely to happen -
>I'd guess maybe a 2% chance of it... If you remember Randy Chapman, he's
>now working at Microsoft on some of the UNIX things they do (not very
>many, mostly IE/Unix and FrontPage extensions and the like).. he's already
>stated they have no intention of this. Also, the IE4/Unix "port" was done
>using a commercial library package that implements Win32+MFC on top of
>Solaris - first it's total bloatware, second, I find it unlikely that

Nowadays its hard to find an app thats not, including netscape.

>they'd bother to port that to Linux (no money in it for 'em).

No money in porting to Solaris or HP-UX either, little money from their
Windows one as well as most people get it free. They just want to be the
standard maker so they can make more money from server software, and OSes,
etc. So a Linux port which has a bigger installed base than Solaris or
HP-UX may make sense to them.

>
>P.S. - I'm not against browsers - I might even buy Opera when the Linux
>port comes out. I just hate IE.
>
>Derrik Pates
>dpa...@kalifornia.com
>dpa...@acm.org

Alex Belits

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Fri, 19 Jun 1998, Anthony Barbachan wrote:

> >Sorry - I wouldn't want IE for Linux. Also, it's not likely to happen -
> >I'd guess maybe a 2% chance of it... If you remember Randy Chapman, he's
> >now working at Microsoft on some of the UNIX things they do (not very
> >many, mostly IE/Unix and FrontPage extensions and the like).. he's already
> >stated they have no intention of this. Also, the IE4/Unix "port" was done
> >using a commercial library package that implements Win32+MFC on top of
> >Solaris - first it's total bloatware, second, I find it unlikely that
>
>
> Nowadays its hard to find an app thats not, including netscape.

Netscape uses separate X frontend (over Motif) and windows one (over
MFC), even though both "foundations" can qualify as bloatware. Threads
support in their portable runtime and inter-module interface in Mozilla 5
seem to be influenced by Windows multithreading and COM, but they don't
include the code of either.

> >they'd bother to port that to Linux (no money in it for 'em).
>
>
> No money in porting to Solaris or HP-UX either, little money from their
> Windows one as well as most people get it free. They just want to be the
> standard maker so they can make more money from server software, and OSes,
> etc.

IE for Unix is a political move (it exists, so it's cross-platform) --
no one expects it to actually be used.

> So a Linux port which has a bigger installed base than Solaris or
> HP-UX may make sense to them.

Except that every Linux box, placed instead of Windows one, is a
potential lost sale for NT, Office, Frontpage and IIS, that are their
bread and butter. And, more important, those boxes, if used as servers,
will continue to support standard HTTP, as opposed to their version of
protocol that will be necessary for the market capture. IE domination
helps promoting nonstandard protocol extensions while widespread use of
non-Microsoft servers promote the standard protocol and partially negate
the effect of IE being widespread on the client side. Getting one more
Linux-only box with IE in NT-only environment creates one more potential
non-Microsoft server, and thus doesn't serve IE's purpose well.

--
Alex

Alex Belits

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Fri, 19 Jun 1998, Anthony Barbachan wrote:

> Well your at least partially to late. Their version of Apache for use with
> FrontPage 98 can already run on Linux. And how exactly are they suppose to
> topple Linux by making applications for it?

By creating the network that depend on NT boxes for their operation. The
same reason for MS support for Samba using NT domain controller but not
Samba _as_ the domain controller (IIRC).

> DR-DOS never had much of a
> market.

s/market/exposure/

It was used, just users didn't see any difference, so they assumed, it's
MS-DOS.

> OS/2 also never had much of a market

For some time it had significantly more market than NT -- just IBM, as
usual, was so bad at marketing, it didn't promote OS/2 well.

> and died because it LACKED
> applications amoung other things.

Lacked native applications. Windows emulation allowed to run a lot of
Windows software.

> MacOS shrank because it force one into a
> proprietary architechture that cost much more and also lacked the amount of
> applications that DOS and Windows had.

MacOS shrank because underlying OS is technically inferior to everything
else except DOS, and user interface didn't give them too much of advantage
when Microsoft caught up on it.

> So I do not see how they are going
> to kill Linux by giving us an app written by a brand name company that RUNS
> on Linux.

Dependence of that company's actions. Look at Photoshop for Irix and
Windows.

George Bonser

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Fri, 19 Jun 1998, Alex Belits wrote:

> > So a Linux port which has a bigger installed base than Solaris or
> > HP-UX may make sense to them.
>
> Except that every Linux box, placed instead of Windows one, is a
> potential lost sale for NT, Office, Frontpage and IIS, that are their
> bread and butter. And, more important, those boxes, if used as servers,
> will continue to support standard HTTP, as opposed to their version of
> protocol that will be necessary for the market capture. IE domination
> helps promoting nonstandard protocol extensions while widespread use of
> non-Microsoft servers promote the standard protocol and partially negate
> the effect of IE being widespread on the client side. Getting one more
> Linux-only box with IE in NT-only environment creates one more potential
> non-Microsoft server, and thus doesn't serve IE's purpose well.
>
> --
> Alex

That is why I think that you are more likely to see Word for Linux or
Office for Linux than IE. Microsoft will actually make some money from
it. As for the OS, it will not impact Win98/NT-Workstation much since
nobody really buys them anyway, the computer just comes with one of them.
Linux would result in fewer NT-Server sales which is why I would not
expect to see Backoffice or Exchange Server ported to Linux.

George Bonser

Microsoft! Which end of the stick do you want today?

Anthony Barbachan

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

-----Original Message-----
From: Richard Gooch <Richar...@atnf.CSIRO.AU>
To: David S. Miller <da...@dm.cobaltmicro.com>
Cc: dgaudet-list...@arctic.org
<dgaudet-list...@arctic.org>; linux-...@vger.rutgers.edu
<linux-...@vger.rutgers.edu>
Date: Friday, June 19, 1998 6:50 AM
Subject: Re: Thread implementations...

>David S. Miller writes:
>> Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
>> From: Dean Gaudet <dgaudet-list...@arctic.org>
>[...]
>> Unix multiplexing facilities -- select and poll -- are wake-all
>> primitives. When something happens, everything waiting is awakened
>> and immediately starts fighting for something to do. What a waste.
>> They make a lot of sense for processes though. On NT completion
>> ports provide wake-one semantics... which are perfect for threads.
>>
>> Yes, this does in fact suck. However, the path to go down is not to
>> expect the way select/poll work to change, rather look at other
>> existing facilities or invent new ones which solve this problem.
>> Too much user code exists which depends upon the wake-all semantics,
>> so the only person to blame is whoever designed the behaviors of these
>> unix operations to begin with ;-)
>

>On the other hand you could say that the UNIX semantics are fine and
>are quite scalable, provided you use them sensibly. Some of these
>"problems" are due to applications not being properly thought out in
>the first place. If for example you have N threads each polling a
>chunk of FDs, things can run well, provided you don't have *each*
>thread polling *all* FDs. Of course, you want to use poll(2) rather
>than select(2), but other than that the point stands.
>

> Regards,
>
> Richard....

The ideal might be select functions with an extra parameter in which we can
pass a function's address which would then be called automatically, with an
active file descriptor as a parameter, when it is ready.

Richard Gooch

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

> The ideal might be select functions with an extra parameter in which we can
> pass a function's address which would then be called automatically, with an
> active file descriptor as a parameter, when it is ready.

What are you trying to solve here? The kernel scan of all FDs to check
for activity is still needed, unless you have something radically
different in mind (aka. AIO). And it is this kernel scan that chews up
lots of time.
There is another problem, and that is that application scan for which
FDs the kernel said are active: currently the application has to scan
all those FDs. There is a way around this too, but there are other
things to sort out first...

Regards,

Richard....

Martin Mares

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

Hello, world!\n

> This demonstrates the point that select and poll are workarounds for
> the lack of threading support in Unix. They aren't needed if you use
> a threads facility (or a separate process for each thread you need).
>

> Once you have threads you can stick to the intuitive synchronous model
> of system calls, which has always effectively handled waking one of
> multiple waiters.

Not as effectively as it would seem at the first sight. If you decide
to use multithreading, you eat lots of memory and (which is probably more
important) lots of L2 cache for thread stacks.

A possible solution could be extending of the SIGIO concept in the
following way:

- SIGIO handler gets an extra parameter containing the FD
in being reported.

- FASYNC gets split to read and write part

Have a nice fortnight
--
Martin `MJ' Mares <m...@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"Uncle Ed's Rule of Thumb: Never use your thumb for a rule."

Richard Gooch

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

Dean Gaudet writes:

>
> On Fri, 19 Jun 1998, Richard Gooch wrote:
>
> > On the other hand you could say that the UNIX semantics are fine and
> > are quite scalable, provided you use them sensibly. Some of these
> > "problems" are due to applications not being properly thought out in
> > the first place. If for example you have N threads each polling a
> > chunk of FDs, things can run well, provided you don't have *each*
> > thread polling *all* FDs. Of course, you want to use poll(2) rather
> > than select(2), but other than that the point stands.
>

> You may not be able to exploit the parallism available in the hardware
> unless you can "load balance" the descriptors well enough...

Use 10 threads. Seems to me that would provide reasonable load
balancing. And increasing that to 100 threads would be even better.
The aim is to ensure that, statistically, most threads will remain
sleeping for several clock ticks.
With a bit of extra work you could even slowly migrate consistently
active FDs to one or a few threads.

Regards,

Richard....

Richard Gooch

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

Alex Belits writes:
> On Fri, 19 Jun 1998, David S. Miller wrote:
>
> > I look at it this way.
> >
> > If you can divide the total set of fd's logically into seperate
> > groups, one strictly to a particular thread. Do it this way.
> > The problem with one thread polling all fd's and passing event
> > notification to threads via some other mechanism has the problem that
> > this one thread becomes the bottle neck.
>
> I realize that every operation, performed indide that process/thread, if
> takes any noticeable time, will hold back everything that depends on any
> fd status change. But what if the code is optimized to reduce the time in
> loop to the absolute minimum possible? Will poll() take more time by
> itself (and indeed become a bottleneck) in one thread vs. multiple
> poll()'s made at the same time in multiple threads? If the time spent in
> the loop is minimal, is there any difference between waking up one of
> looping threads, searching through its poll array and performing some
> action, and with one thread waking up every time, searching larger array
> (IMHO not a significant time compared to time spent by system while
> processing those sockets) and then performing the same action, if that
> action takes some insignificant time, comparable with time, spent in
> buffers handling in the kernel itself? As I understand, with multiple
> threads ot not, kernel still needs a time to process file descriptors
> and choose thread to wake up even if threads already divided fds among
> themselves, so the total amount of fd lists scanning won't change.

Assuming that most FDs are inactive, the time spent scanning a list of
FDs is 2-3 us per FD. So for 1000 FDs, we are looking at milliseconds,
which is quite a bit compared to some simple datagram processing in
userspace. So the time for select(2) or poll(2) of large numbers of
FDs is significant.

Splitting this work across many threads (say 10) reduces the
probability that more than one thread needs to be woken up during any
timeslice, hence far fewer FDs need to be scanned each time (only 100
in this example).

Unfortunately splitting the work amongst many threads is not always
easy. We can improve the speed of select(2) and poll(2) by a factor of
3 by changing the way they are implemented (API remains the same, of
course:-). This will buy us more in the scalability stakes.

> > The problem, for one, with web etc. servers is the incoming connection
> > socket. If you could tell select/poll "hey, when a new conn comes in,
> > wake up one of us", poof this issue would be solved. However the
> > defined semantics for these interfaces says to wake everyone polling
> > on it up.
>

> This is why I do that in userspace -- one process is always waking up,
> connection is placed in its internal queue, its fd is added to the
> polling list, and after request is received and parsed asynchronously, fd
> is immediately passed to another process through the AF_UNIX socket. While
> main process is doing nonblocking I/O on multiple connections, there is no
> I/O in the same loop except opening new connections, reading from them and
> passing to other processes fds/data of connections that have sent their
> requests and expect the response. Kind of userspace "multithreading",
> optimized for the particular operation.

People seem to be very interested in the new connection case. This
doesn't seem all that exiting or interesting to me (just have one
thread blocking in accept(2) and store the new FD in a global
array). To me the problem of processing data on existing connections
is more interesting (and harder to solve: hence more interesting:-).
Is there something deep I'm missing here?

Larry McVoy

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

: Even with the debugging problems solved, linuxthreads are heavier
: than solaris pthreads or NT fibers.

So how about quantifying that a bit and show us some numbers and how they
affect things in real life?

: Unix multiplexing facilities -- select and poll -- are wake-all

: primitives. When something happens, everything waiting is awakened
: and immediately starts fighting for something to do. What a waste.
: They make a lot of sense for processes though. On NT completion
: ports provide wake-one semantics... which are perfect for threads.
:
: Yes, this does in fact suck. However, the path to go down is not to
: expect the way select/poll work to change, rather look at other
: existing facilities or invent new ones which solve this problem.
: Too much user code exists which depends upon the wake-all semantics,

Hmm. SGI changed accept() from wakeup-all to wakeup-one with no problem.

I'd be interested in knowing which programs depend on the race.

Dean Gaudet

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Sat, 20 Jun 1998, Richard Gooch wrote:

> Dean Gaudet writes:
> >
> > On Fri, 19 Jun 1998, Richard Gooch wrote:
> >
> > > On the other hand you could say that the UNIX semantics are fine and
> > > are quite scalable, provided you use them sensibly. Some of these
> > > "problems" are due to applications not being properly thought out in
> > > the first place. If for example you have N threads each polling a
> > > chunk of FDs, things can run well, provided you don't have *each*
> > > thread polling *all* FDs. Of course, you want to use poll(2) rather
> > > than select(2), but other than that the point stands.
> >
> > You may not be able to exploit the parallism available in the hardware
> > unless you can "load balance" the descriptors well enough...
>
> Use 10 threads. Seems to me that would provide reasonable load
> balancing. And increasing that to 100 threads would be even better.

No it wouldn't. 100 kernel-level threads is overkill. Unless your box
can do 100 things at a time there's no benefit from giving the kernel 100
objects to schedule. 10 is a much more reasonable number, and even that
may be too high. You only need as many kernel threads as there is
parallelism to exploit in the hardware. Everything else can, and should,
happen in userland where timeslices can be maximized and context switches
minimized.

> The aim is to ensure that, statistically, most threads will remain
> sleeping for several clock ticks.

What? If I am wasting system memory for a kernel-level thread I'm not
going to go about ensuring that it remains asleep! no way. I'm going to
use each and every time slice to its fullest -- because context switches
have a non-zero cost, it may be small, but it is non-zero.

> With a bit of extra work you could even slowly migrate consistently
> active FDs to one or a few threads.

But migrating them costs you extra CPU time. That's time that strictly
speaking, which does not need to be spent. NT doesn't have to spend this
time when using completion ports (I'm sounding like a broken record).

Look at this another way. If I'm using poll() to implement something,
then I typically have a structure that describes each FD and the state it
is in. I'm always interested in whether that FD is ready for read or
write. When it is ready I'll do some processing, modify the state,
read/write something, and then do nothing with it until it is ready again.

To do this I list for the kernel all the FDs and call poll(). Then the
kernel goes around and polls everything. For many descriptors (i.e. slow
long haul internet clients) this is a complete waste. There are two
approaches I've seen to deal with this:

- don't poll everything as frequently, do complex migration between
different "pools" sorted by how active the FD is. This reduces the number
of times slow sockets are polled. This is a win, but I feel it is far too
complex (read: easy to get wrong).

- let the kernel queue an event when the FD becomes ready. So rather than
calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
basis "when this is ready for read/write queue an event on this pipe, and
could you please hand me back this void * with it? thanks". In this
model when a write() returns EWOULDBLOCK the kernel implicitly sets that
FD up as "waiting for write", similarly for a read(). This means that no
matter what speed the socket is, it won't be polled and no complex
dividing of the FDs into threads needs to be done.

The latter model is a lot like completion ports... but probably far easier
to implement. When the kernel changes an FD in a way that could cause it
to become ready for read or write it checks if it's supposed to queue an
event. If the event queue becomes full the kernel should queue one event
saying "event queue full, you'll have to recover in whatever way you find
suitable... like use poll()".

Dean

Dean Gaudet

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Sun, 21 Jun 1998, Richard Gooch wrote:

> People seem to be very interested in the new connection case. This
> doesn't seem all that exiting or interesting to me (just have one
> thread blocking in accept(2) and store the new FD in a global
> array). To me the problem of processing data on existing connections
> is more interesting (and harder to solve: hence more interesting:-).
> Is there something deep I'm missing here?

The new connection case is actually pretty much the same as all the other
cases, but maybe just easier to explain.

Suppose you do what you suggest. Have a single accept() thread which
plops FDs into a global queue. It also presumably tweaks a condition
variable to awake a waiting processing thread. To start processing a new
socket there are two context switches, one into the accept thread, and one
into a processing thread.

That second switch is a waste. Instead you could mutex protect accept()
and go into it with the processing thread, and release the mutex on the
way out. Then you have only one context switch for each new socket. This,
incidentally, is almost what happens inside the kernel... except the
kernel uses wake-all semantics (freebsd seems to have solved this for
accept... alan and linus say there are difficulties in solving it, so it
hasn't been solved in linux yet). So you can actually drop the mutex.

Back to the single thread/accept queue. There's only a single thread in
accept(), and if the box you're running on has two processors you're not
exploiting the parallelism available. You could do some fun gymnastics at
the user level to put multiple threads waiting on accept() ... but that's
overkill because usually the kernel is the best judge of the parallelism
available. So just putting a collection of threads into accept() and
letting the kernel sort it out solves this.

But does it? Now you have to figure out how many threads you should have
waiting in accept at any one time. (In Apache, this is the joyful
nonsense of deciding the right MinSpareServers and MaxSpareServers
settings to handle load spikes and parallelism and all that fun stuff.)
And your threads waiting in accept are kernel scheduled resources
consuming kernel ram.

If all your program did was call accept() you'd be able to figure this all
out pretty easily. But presumably you do more than that.

accept() is interesting because it is actually an event queue... it's a
queue of new connections arriving on a single socket. The kernel has all
the knowledge it needs to multiplex the socket connections in a way
suitable to the hardware.

But accept() is limiting because it only handles a single listening
socket. If your web server has both port 80 and port 443, you need some
way to accept connections on both. You prefer to run a single web server
to take advantage of shared configuration memory and other resources. Now
you need some method of accepting connections on multiple sockets. You
could just implement two pools of threads, one for each socket. But that
doesn't scale to many many sockets (which some people actually do use, for
better or for worse) ... and now you have to tune min/maxspare parameters
for multiple pools, what a headache.

What you'd really like is a way to say "accept a connection on any of
these sockets" so that you can continue to maintain a single pool of
threads. The single pool is not only easier to configure, it has the
benefits of cache locality. Presumably everything in the pool is
identical -- all the threads are capable of handling the returned socket.
The kernel can use LIFO on the waiting threads because the last-in thread
is most likely to still have data in L1.

But really the same can be said for read/write as well as accept. Suppose
you had a hybrid user/kernel threads package which uses co-operative
pre-emption, i/o points are pre-emption points. When a user-thread does
an i/o the package just notes that the user-thread is blocked. Then it
asks the kernel "give me an FD which is ready for I/O". It determines which
user-thread is waiting for that FD and dispatches that user-thread. In
this model you need as many kernel threads as there is parallelism to
exploit. The user-threads are written in the standard procedural manner,
which is easy to program (rather than the stateful manner of something
like squid where all the state transitions for i/o state are explicit
and in the control of the programmer).

Central to that is the "give me an FD which is ready for I/O" step. This
is where select/poll are traditionally used... but the question is really
a wake-one question, and select/poll are wake-all primitives. The
kernel-threads in this example are all equivalent, any one of them can
switch to any of the user-threads. Can you see how read/write are
pretty similar to accept and how all the problems are related?

Dean Gaudet

unread,

Jun 20, 1998, 3:00:00 AM6/20/98

to

On Sat, 20 Jun 1998, Larry McVoy wrote:

> : Even with the debugging problems solved, linuxthreads are heavier
> : than solaris pthreads or NT fibers.
>
> So how about quantifying that a bit and show us some numbers and how they
> affect things in real life?

As a matter of fact I can quantify this somewhat.

NSPR provides two modes of operation on linux -- one uses pthreads, the
other users a portable userland threads library (the standard
setjmp/longjmp deal although it uses sigsetjmp/siglongjmp, and needs a
little more optimization). I've ported apache 1.3 to NSPR as an
experiment for future versions of apache. I built the non-debugging
versions of the NSPR library, linked my apache-nspr code against it, and
set up a rather crude benchmark.

% dd if=/dev/zero of=htdocs/6k bs=1024 count=6
(the squid folks used to tell me 6k was the average object size on the
net, maybe the number is different these days)

% zb 127.0.0.1 /6k -p 8080 -c 10 -t 10 -k
(this is zeusbench asking for the 6k document, 10 simultaneous clients (it
uses select to multiplex), run for 10 seconds, use keep-alive persistent
http connections)

With pthreads it achieves 811 req/s.
With user threads it achieves 1024.40 req/s.

The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104.

Caveats: While NSPR has been designed extremely well, and the interfaces
don't show any immediate problems with doing underlying optimizations,
it's certainly not top speed yet. This applies in both cases however.
NSPR has a hybrid user/system model that lives on top of pthreads, I
haven't tried it yet (it's not ported to linux according to the docs).

I can do comparisons with the process-model based apache, and I used to
have a native pthreads port of apache... but the latter is out of date now
because I switched my efforts to NSPR in order to have greater portability
(including win32).

Larry does lmbench have a threads component that can benchmark different
threads libraries easily? I have to admit I'm not terribly familiar with
lmbench... but if you've got some benchmarks you'd like me to run I can
try them. Or you can try them -- NSPR comes with mozilla, after
downloading the tarball, "cd mozilla/nsprpub", then do "make BUILD_OPT=1"
to get the user-threads version, and do "make BUILD_OPT=1 USE_PTHREADS=1"
to get the pthreads version.

Nathan Hand

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

<<< No Message Collected >>>

Larry McVoy

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

: This demonstrates the point that select and poll are workarounds for

: the lack of threading support in Unix. They aren't needed if you use
: a threads facility (or a separate process for each thread you need).
:
: Once you have threads you can stick to the intuitive synchronous model
: of system calls, which has always effectively handled waking one of
: multiple waiters.

There are a number of people, usually systems / kernel types, who realize
that multiple threads/processes can have a severe negative effect
on performance, especially when you are trying to make things fit in
a small processor cache. Event driven programming tends to use less
system resources than threaded programming.

Anthony Barbachan

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

-----Original Message-----
From: Richard Gooch <Richar...@atnf.CSIRO.AU>
To: Anthony Barbachan <barb...@trill.cis.fordham.edu>
Cc: David S. Miller <da...@dm.cobaltmicro.com>;
dgaudet-list...@arctic.org <dgaudet-list...@arctic.org>;
linux-...@vger.rutgers.edu <linux-...@vger.rutgers.edu>
Date: Saturday, June 20, 1998 5:03 AM
Subject: Re: Thread implementations...

>Anthony Barbachan writes:
>>
>> -----Original Message-----
>> From: Richard Gooch <Richar...@atnf.CSIRO.AU>
>> To: David S. Miller <da...@dm.cobaltmicro.com>
>> Cc: dgaudet-list...@arctic.org
>> <dgaudet-list...@arctic.org>; linux-...@vger.rutgers.edu
>> <linux-...@vger.rutgers.edu>
>> Date: Friday, June 19, 1998 6:50 AM
>> Subject: Re: Thread implementations...
>>
>>
>> >David S. Miller writes:
>> >> Date: Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
>> >> From: Dean Gaudet <dgaudet-list...@arctic.org>
>> >[...]

>> >> Unix multiplexing facilities -- select and poll -- are wake-all
>> >> primitives. When something happens, everything waiting is awakened
>> >> and immediately starts fighting for something to do. What a waste.
>> >> They make a lot of sense for processes though. On NT completion
>> >> ports provide wake-one semantics... which are perfect for threads.
>> >>
>> >> Yes, this does in fact suck. However, the path to go down is not to
>> >> expect the way select/poll work to change, rather look at other
>> >> existing facilities or invent new ones which solve this problem.
>> >> Too much user code exists which depends upon the wake-all semantics,

>> >> so the only person to blame is whoever designed the behaviors of these
>> >> unix operations to begin with ;-)
>> >

>> >On the other hand you could say that the UNIX semantics are fine and
>> >are quite scalable, provided you use them sensibly. Some of these
>> >"problems" are due to applications not being properly thought out in
>> >the first place. If for example you have N threads each polling a
>> >chunk of FDs, things can run well, provided you don't have *each*
>> >thread polling *all* FDs. Of course, you want to use poll(2) rather
>> >than select(2), but other than that the point stands.
>>

>> The ideal might be select functions with an extra parameter in which we
can
>> pass a function's address which would then be called automatically, with
an
>> active file descriptor as a parameter, when it is ready.
>
>What are you trying to solve here? The kernel scan of all FDs to check
>for activity is still needed, unless you have something radically

Why is the kernel scan needed? Activity on a file handle doesn't just
magically appear. The kernel had to make it active in the first place.

>different in mind (aka. AIO). And it is this kernel scan that chews up
>lots of time.
>There is another problem, and that is that application scan for which
>FDs the kernel said are active: currently the application has to scan
>all those FDs. There is a way around this too, but there are other
>things to sort out first...
>

This is the one I was thinking about.

> Regards,
>
> Richard....

David S. Miller

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Date: Sat, 20 Jun 1998 14:37:36 -0700 (PDT)
From: Dean Gaudet <dgaudet-list...@arctic.org>

With pthreads it achieves 811 req/s.

With user threads it achieves 1024.40 req/s.

The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104.

If you have the opportunity, perform the same benchmark on an
architecture that implements context pids in the TLB. The entire TLB
is for all intents and purposes, flushed entirely of all userland
translations for even thread context switches.

Later,
David S. Miller
da...@dm.cobaltmicro.com

MOLNAR Ingo

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sat, 20 Jun 1998, David S. Miller wrote:

> With pthreads it achieves 811 req/s.
> With user threads it achieves 1024.40 req/s.
>
> The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104.
>
> If you have the opportunity, perform the same benchmark on an
> architecture that implements context pids in the TLB. The entire TLB
> is for all intents and purposes, flushed entirely of all userland
> translations for even thread context switches.

on x86 it is not flushed across thread-thread switches ... and on a PPro,
parts of the TLB are tagged as 'global' (kernel pages obviously), which
keeps the TLB-lossage even across non-shared-VM threads small. (zb->apache
and apache->zb switches in this case).

one thing i noticed about LinuxThreads, the most 'out of balance' basic
pthreads operation in pthread_create(). Does NSPR create a pre-allocated
pool of threads? (or some kind of adaptive pool?) If it's creating threads
heavily (say per-request), then thats bad, at least with the current
LinuxThreads implementation. We have a 1:5 gap between the latency of
clone() and pthread_create() there...

-- mingo

David S. Miller

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Date: Sun, 21 Jun 1998 05:03:29 +0200 (MET DST)
From: MOLNAR Ingo <mi...@valerie.inf.elte.hu>

on x86 it is not flushed across thread-thread switches ... and on a
PPro, parts of the TLB are tagged as 'global' (kernel pages
obviously), which keeps the TLB-lossage even across non-shared-VM
threads small. (zb->apache and apache->zb switches in this case).

I assumed that TSS switches were defined to reload csr3, which by
definition flushes the TLB of user entires.

Later,
David S. Miller
da...@dm.cobaltmicro.com

-

MOLNAR Ingo

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sat, 20 Jun 1998, David S. Miller wrote:

> I assumed that TSS switches were defined to reload csr3, which by
> definition flushes the TLB of user entires.

it does have a 'short-cut' in the microcode, it does not flush the TLB if
cr3(A) == cr3(B) ... ugly :(

-- mingo

MOLNAR Ingo

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sat, 20 Jun 1998, David S. Miller wrote:

> I assumed that TSS switches were defined to reload csr3, which by
> definition flushes the TLB of user entires.
>

> Thats broken, not because it's a silly workaround for the Intel TLB
> mis-design, but rather because it changes behavior from what older
> CPU's did. So if someone optimized things to defer TLB flushes for
> mapping changes, when they knew they would task switch once before
> running the task again, this "microcode optimization" would break the
> behavior such a trick would depend upon.

unless this deferred TLB flush feature gets into 2.1, i plan on making a
new version of the softswitch stuff (that replaces TSS switching) for 2.3,
which should give us more pronounced control over TLB flushes and more ...

David S. Miller

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Date: Sat, 20 Jun 1998 20:12:35 -0700
From: "David S. Miller" <da...@dm.cobaltmicro.com>

I assumed that TSS switches were defined to reload csr3, which by
definition flushes the TLB of user entires.

Thats broken, not because it's a silly workaround for the Intel TLB
mis-design, but rather because it changes behavior from what older
CPU's did. So if someone optimized things to defer TLB flushes for
mapping changes, when they knew they would task switch once before
running the task again, this "microcode optimization" would break the
behavior such a trick would depend upon.

Later,
David S. Miller
da...@dm.cobaltmicro.com

-

Richard Gooch

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

OK, I agree that the accept case is a specific example of checking a
large number of FDs for some activity, be it read, write or new
connection.

> Central to that is the "give me an FD which is ready for I/O" step. This
> is where select/poll are traditionally used... but the question is really
> a wake-one question, and select/poll are wake-all primitives. The
> kernel-threads in this example are all equivalent, any one of them can
> switch to any of the user-threads. Can you see how read/write are
> pretty similar to accept and how all the problems are related?

Yep. I was thinking single-socket, multiple-connections.
Multiple-socket and multiple-connections is the general case. I still
don't agree that we *have* to have event completion ports, though. See
previous message about a simple (IMHO) userspace solution.

Regards,

Richard....

Raul Miller

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Anthony Barbachan <barb...@mail.cis.fordham.edu> wrote:
> Why is the kernel scan needed? Activity on a file handle doesn't just
> magically appear. The kernel had to make it active in the first place.

Either it has to scan the fd list (for things to queue), or it has to
scan the process list (for things to dequeue). Unfortunately a big,
sparse bitvector just isn't all that great of a way of representing a
short fd list.

--
Raul

Richard Gooch

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Dean Gaudet writes:
>
>
> On Sat, 20 Jun 1998, Richard Gooch wrote:
[...]

> > Use 10 threads. Seems to me that would provide reasonable load
> > balancing. And increasing that to 100 threads would be even better.
>
> No it wouldn't. 100 kernel-level threads is overkill. Unless your box
> can do 100 things at a time there's no benefit from giving the kernel 100
> objects to schedule. 10 is a much more reasonable number, and even that
> may be too high. You only need as many kernel threads as there is
> parallelism to exploit in the hardware. Everything else can, and should,
> happen in userland where timeslices can be maximized and context switches
> minimized.
>
> > The aim is to ensure that, statistically, most threads will remain
> > sleeping for several clock ticks.
>
> What? If I am wasting system memory for a kernel-level thread I'm not
> going to go about ensuring that it remains asleep! no way. I'm going to
> use each and every time slice to its fullest -- because context switches
> have a non-zero cost, it may be small, but it is non-zero.

The point is that *most* FDs are inactive. If in every timeslice you
have only 5 active FDs (taken from a uniform random distribution),
then with 10 threads only half of those are woken up. Hence only half
the number of FDs have to be scanned when these threads have processed
the activity. For 1000 FDs, then is a saving of 500 FD scans, which is
1.5 ms. So scanning load has gone from 30% to 15% (10 ms timeslice).
Also note that only 5 threads are woken up (scheduled), the other 5
remain asleep.

Now lets look at 100 threads. With 5 active FDs, you still get at most
5 threads woken up. But now FD scanning after processing activity
drops to a total of 50 FDs. So scanning load (per timeslice!) has
dropped to 150 us. So compared with the 10 thread case, we have saved
1.35 ms of FD scanning time. Compared with the 1 thread case, we have
saved 2.85 ms of scanning time (as always, per 10 ms timeslice). In
other words, only 0.15% scanning load. And still we are only
scheduling 5 threads *this timeslice*!

I don't know why you care so much about context switches: the time
taken for select(2) or poll(2) for many FDs is dominant!

Just how much time do you think scheduling is taking???

> > With a bit of extra work you could even slowly migrate consistently
> > active FDs to one or a few threads.
>
> But migrating them costs you extra CPU time. That's time that strictly
> speaking, which does not need to be spent. NT doesn't have to spend this
> time when using completion ports (I'm sounding like a broken record).

Migration is pretty cheap: it's a matter of swapping some entries in a
table. And migration only happens upon FD activity. Adding a few extra
microseconds for migration is peanuts compared with the time taken to
process a datagram.

> Look at this another way. If I'm using poll() to implement something,
> then I typically have a structure that describes each FD and the state it
> is in. I'm always interested in whether that FD is ready for read or
> write. When it is ready I'll do some processing, modify the state,
> read/write something, and then do nothing with it until it is ready again.

Yep, fine. My conceptual model is that I call a callback for each
active FD. Same thing.

> To do this I list for the kernel all the FDs and call poll(). Then the
> kernel goes around and polls everything. For many descriptors (i.e. slow
> long haul internet clients) this is a complete waste. There are two
> approaches I've seen to deal with this:
>
> - don't poll everything as frequently, do complex migration between
> different "pools" sorted by how active the FD is. This reduces the number
> of times slow sockets are polled. This is a win, but I feel it is far too
> complex (read: easy to get wrong).

It only needs to be done "right" once. In a library. Heck, I might
even modify my own FD management library code to do this just to prove
the point. Write once, use many!
Note that even the "complex" migration is optional: simply dividing up
FDs equally between N threads is a win.
Having migration between a small number of threads is going to be a
*real* win.

> - let the kernel queue an event when the FD becomes ready. So rather than
> calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
> basis "when this is ready for read/write queue an event on this pipe, and
> could you please hand me back this void * with it? thanks". In this
> model when a write() returns EWOULDBLOCK the kernel implicitly sets that
> FD up as "waiting for write", similarly for a read(). This means that no
> matter what speed the socket is, it won't be polled and no complex
> dividing of the FDs into threads needs to be done.

I think this will be more complex to implement than a small userspace
library that uses a handful of threads.

> The latter model is a lot like completion ports... but probably far easier
> to implement. When the kernel changes an FD in a way that could cause it
> to become ready for read or write it checks if it's supposed to queue an
> event. If the event queue becomes full the kernel should queue one event
> saying "event queue full, you'll have to recover in whatever way you find
> suitable... like use poll()".

This involves kernel bloat. It seems to me that there is such a simple
userspace solution, so why bother hacking the kernel?
I'd much rather hack the kernel to speed up select(2) and poll(2) a
few times. This benefits all existing Linux/UNIX programmes.

Regards,

Richard....

Richard Gooch

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Anthony Barbachan writes:
> >> From: Richard Gooch <Richar...@atnf.CSIRO.AU>

> >> >On the other hand you could say that the UNIX semantics are fine and
> >> >are quite scalable, provided you use them sensibly. Some of these
> >> >"problems" are due to applications not being properly thought out in
> >> >the first place. If for example you have N threads each polling a
> >> >chunk of FDs, things can run well, provided you don't have *each*
> >> >thread polling *all* FDs. Of course, you want to use poll(2) rather
> >> >than select(2), but other than that the point stands.
> >>
> >> The ideal might be select functions with an extra parameter in which we
> can
> >> pass a function's address which would then be called automatically, with
> an
> >> active file descriptor as a parameter, when it is ready.
> >
> >What are you trying to solve here? The kernel scan of all FDs to check
> >for activity is still needed, unless you have something radically
>

> Why is the kernel scan needed? Activity on a file handle doesn't just
> magically appear. The kernel had to make it active in the first place.

The kernel scan is in the implementation of select(2) and
poll(2). Read fs/select.c to see how it's done. For each FD in your
array, the kernel has to check for activity. If you specify a timeout
and there is no activity yet, the kernel puts your process to sleep
and when activity does finally happen, this scan is performed *again*.

> >different in mind (aka. AIO). And it is this kernel scan that chews up
> >lots of time.
> >There is another problem, and that is that application scan for which
> >FDs the kernel said are active: currently the application has to scan
> >all those FDs. There is a way around this too, but there are other
> >things to sort out first...
>
> This is the one I was thinking about.

That is easily solved by implementing poll2(2), which I did last year
(not submitted because there is not much point until other things are
optimised). Hopefully sometime during 2.3 I'll have time to get back
to this.

John Kodis

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sat, Jun 20, 1998 at 01:49:50PM -0700, Dean Gaudet wrote:

> - let the kernel queue an event when the FD becomes ready. So rather than
> calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
> basis "when this is ready for read/write queue an event on this pipe, and
> could you please hand me back this void * with it? thanks".

Yow! Shades of VMS! This sounds very much like the VMS Async System
Trap mechanism that allowed you to perform a queued IO operation using
a call something like:

status = sys$qio(
READ_OPCODE, fd, buffer, sizeof(buffer),
<lots of other parameters that I've long since forgotten>,
ast_function, ast_parameter, ...);

The read would get posted, and when complete the ast_function would
get called with the ast_parameter in the context of the process that
posted the QIO. This provided a powerful and easy-to-use method of
dealing with async IO. It's one of the few VMS features that I wish
Unix supported.

-- John Kodis.

Alex Buell

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sun, 21 Jun 1998, John Kodis wrote:

> status = sys$qio(
> READ_OPCODE, fd, buffer, sizeof(buffer),
> <lots of other parameters that I've long since forgotten>,
> ast_function, ast_parameter, ...);
>
> The read would get posted, and when complete the ast_function would
> get called with the ast_parameter in the context of the process that
> posted the QIO. This provided a powerful and easy-to-use method of
> dealing with async IO. It's one of the few VMS features that I wish
> Unix supported.

Yep, this was very cool. I used it for great effect in a MUD game I wrote
on VAX/VMS 5.1 yonks ago at university.

Cheers,
Alex
--
/\_/\ Legalise cannabis now!
( o.o ) Smoke some cannabis today!
> ^ < Peace, Love, Unity and Respect to all.

http://www.tahallah.demon.co.uk

John Summerfield

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

: On Fri, 19 Jun 1998, Dean Gaudet wrote:

: I mean that they can be used separately, providing the same
: functionality, but their combination is rare, not because it can't be
: efficient, but because they represent different styles. Some programmers
: feel uncomfortably designing programs where they never can do things in
a
: "natural" order of actions performed on the same object, so they don't
use
: nonblocking I/O that can leave things incomplete and require doing
: something else at any moment. Others can accept that, but have problems
: with seeing multiple copies of themselves existing in one universe,
: trying to live independently of each other ;-), so they see
: "unnatural order" of nonblocking I/O operations as the lesser evil.
: Combination of two are never required to achieve the functionality, and
: mostly appear when the OS or libraries have significant bias toward one
of
: model, and programmer is biased toward another one. Performance
: requirements may change this, however I still don't believe in "threads
: will make everything faster", unless it has " on NT and Solaris"
: immediately following it.

: Of course, threads can be implemented through nonblocking I/O, and
it's
: possible to even implement nonblocking I/O through blocking one and
: multithreading, however the need of such tricks is more related to
: compatibility requirements than to anything else.

I'm not sure what you mean by threads on Linux, but on OS/2 (with which
I'm more familiar) different threads can be executing simultaneously on
different processors in an SMP environment. To my mind this is one of the
greater benefits of threads. Along with the notion of using a separate
thread to print (for example in wordprocessing software) and maybe
handling some http requests in a web server, and ftp request in an ftp
server.

Cheers
John Summerfield
http://os2.ami.com.au/os2/ for OS/2 support.
Configuration, networking, combined IBM ftpsites index.

Alan Cox

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

> On Sun, 21 Jun 1998, John Kodis wrote:
>
> > status = sys$qio(
> > READ_OPCODE, fd, buffer, sizeof(buffer),

> > get called with the ast_parameter in the context of the process that
> > posted the QIO. This provided a powerful and easy-to-use method of
> > dealing with async IO. It's one of the few VMS features that I wish
> > Unix supported.

This kind of asynchronous I/O is defined in the Posix RT specification
and should all be in glibc 2.1.x at some point (the 2.1.x kernel has
real time signals - which queue and contain one byte of data - the
C library can do the rest with clone() and signal handlers).

> Yep, this was very cool. I used it for great effect in a MUD game I wrote
> on VAX/VMS 5.1 yonks ago at university.

VaxMUD ?

Gerard Roudier

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sun, 21 Jun 1998, John Kodis wrote:

> On Sat, Jun 20, 1998 at 01:49:50PM -0700, Dean Gaudet wrote:
>
> > - let the kernel queue an event when the FD becomes ready. So rather than
> > calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
> > basis "when this is ready for read/write queue an event on this pipe, and
> > could you please hand me back this void * with it? thanks".
>
> Yow! Shades of VMS! This sounds very much like the VMS Async System
> Trap mechanism that allowed you to perform a queued IO operation using
> a call something like:
>

> status = sys$qio(
> READ_OPCODE, fd, buffer, sizeof(buffer),

> <lots of other parameters that I've long since forgotten>,
> ast_function, ast_parameter, ...);
>
> The read would get posted, and when complete the ast_function would

> get called with the ast_parameter in the context of the process that
> posted the QIO. This provided a powerful and easy-to-use method of
> dealing with async IO. It's one of the few VMS features that I wish
> Unix supported.

RSX and friends (IAS, ...) already had such a feature.
With such a mechanism, application programs get IO completion (software)
interrupt as the kernel get completion interrupt from the hardware.
DEC O/Ses have had AST mechanisms for years without offering threads.
Speaking about VMS, you can pass data (or event) using interlocked
queues between AST and process and between processes using shared
memory and so you donnot need to use critical sections for synchonizing
data or event passing. No need to use several threads sharing a process
address space to make things rights.

Using multi-threading into a single process context is, IMO, just
importing into user-land kernel-like problems and providing such
a feature complexifies significantly the involved kernel.
Multi-threading into processes is not the way to go, IMO, especially
if you want to be portable across platforms.

If one really need to use threads, then, one of the following is true,
in my opinion:
- One likes complexity since one is stupid as most programmers.
- One's O/S handles processes as bloat entities.
- One has heared too much O/S 2 lovers.
- One is believing that MicroSoft-BASIC is multi-threaded.

There is probably lots of OS2 multi-threaded programs that can only be
broken on SMP, since I often heared OS2 multi-braindeaded programmers
assuming that threads inside a process are only preempted when
they call a system service.

I have written and maintained lots of application programs under VMS,
UNIX, some have been ported to a dozen of O/S, none of them uses threads.
I donnot envision to use multi-threads in application software and I
donnot want to have to deal with applications that uses this, for the
simple reasons that threads semantics differs too much between operating
systems and that application programs are often large programs that
donnot follow the high level of quality of O/S softwares.

Traditionnal UNIXes used light processes and preferently blocking I/Os.
Signals were preferently for error conditions.
The select() semantic has been a hack that has been very usefull for
implementing event-driven applications using a low number of fds, as
the X Server. Trying to use such a semantic to deal with thousands of
handles can only lead to performance problems. This is trivial.

Regards,
Gerard.

Dean Gaudet

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sun, 21 Jun 1998, MOLNAR Ingo wrote:

> one thing i noticed about LinuxThreads, the most 'out of balance' basic
> pthreads operation in pthread_create(). Does NSPR create a pre-allocated
> pool of threads? (or some kind of adaptive pool?) If it's creating threads
> heavily (say per-request), then thats bad, at least with the current
> LinuxThreads implementation. We have a 1:5 gap between the latency of
> clone() and pthread_create() there...

I prespawn a fixed number of threads. I still have to make it dynamically
size the pool of threads... NSPR itself doesn't do that. (It's one of the
optimizations I'd like to get to.) But it shouldn't affect the numbers I
posted.

Dean

Dean Gaudet

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sun, 21 Jun 1998, Richard Gooch wrote:

> Just how much time do you think scheduling is taking???

I care more about cache pollution. That is a side-effect of
context-switching which isn't entirely obvious from the context-switch
cost itself.

> It only needs to be done "right" once. In a library. Heck, I might
> even modify my own FD management library code to do this just to prove
> the point. Write once, use many!
> Note that even the "complex" migration is optional: simply dividing up
> FDs equally between N threads is a win.
> Having migration between a small number of threads is going to be a
> *real* win.

Right, and if you'll release this in a license other than GPL (i.e. LGPL
or MPL) so that it can be reused in non-GPL code (i.e. NSPR which is NPL),
that would be most excellent. (acronyms rewl).

> This involves kernel bloat. It seems to me that there is such a simple
> userspace solution, so why bother hacking the kernel?

I don't think the userspace solution is as fast as the event queue
solution.

Dean

Dean Gaudet

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sun, 21 Jun 1998, John Summerfield wrote:

> : On Fri, 19 Jun 1998, Dean Gaudet wrote:

Er, I didn't write the following... careful with the attributions.

Dean

Alan Cox

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

> > This involves kernel bloat. It seems to me that there is such a simple
> > userspace solution, so why bother hacking the kernel?
>
> I don't think the userspace solution is as fast as the event queue
> solution.

I think thats pretty obvious. Select() is an event queue mechanism which
does a setup for each select(). Asynchronous I/O has some similar
properties (clone, I/O , signal) but is only per handle. A pure event
queue model does one setup per handle only per handle that matters and
not per event setups. You just get the queue overheads

Richard Gooch

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Dean Gaudet writes:
>
>
> On Sun, 21 Jun 1998, Richard Gooch wrote:
>
> > Just how much time do you think scheduling is taking???
>
> I care more about cache pollution. That is a side-effect of
> context-switching which isn't entirely obvious from the context-switch
> cost itself.

True, but neverthless 3 ms is enough time to suck in 150 kBytes of
data into cache (assuming 50 MBytes/sec bus). And each thread should
be executing *nearly* the same code as the others.

> > It only needs to be done "right" once. In a library. Heck, I might
> > even modify my own FD management library code to do this just to prove
> > the point. Write once, use many!
> > Note that even the "complex" migration is optional: simply dividing up
> > FDs equally between N threads is a win.
> > Having migration between a small number of threads is going to be a
> > *real* win.
>
> Right, and if you'll release this in a license other than GPL (i.e. LGPL
> or MPL) so that it can be reused in non-GPL code (i.e. NSPR which is NPL),
> that would be most excellent. (acronyms rewl).

Err, well, I didn't mention a license, but as it happens my library is
LGPL.

> > This involves kernel bloat. It seems to me that there is such a simple
> > userspace solution, so why bother hacking the kernel?
>
> I don't think the userspace solution is as fast as the event queue
> solution.

Actually, I probably agree. But for me it's not the point: I believe
we can provide a sufficiently fast, scalable and lightweight
implementation in userspace, such that I/O completion ports and other
(kernel-space) features are not required.

Regards,

Richard....

Richard Gooch

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Gerard Roudier writes:
>
> On Sun, 21 Jun 1998, John Kodis wrote:
>
> > On Sat, Jun 20, 1998 at 01:49:50PM -0700, Dean Gaudet wrote:
> >
> > > - let the kernel queue an event when the FD becomes ready. So rather than
> > > calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
> > > basis "when this is ready for read/write queue an event on this pipe, and
> > > could you please hand me back this void * with it? thanks".
> >
> > Yow! Shades of VMS! This sounds very much like the VMS Async System
> > Trap mechanism that allowed you to perform a queued IO operation using
> > a call something like:

[...]

> Using multi-threading into a single process context is, IMO, just
> importing into user-land kernel-like problems and providing such
> a feature complexifies significantly the involved kernel.
> Multi-threading into processes is not the way to go, IMO, especially
> if you want to be portable across platforms.

I'm proposing a userspace abstraction that, on Unix systems, uses
select(2)/poll(2) and a modest number of threads. It could be ported
to another OS which has completion ports, if you cared.

> If one really need to use threads, then, one of the following is true,
> in my opinion:
> - One likes complexity since one is stupid as most programmers.
> - One's O/S handles processes as bloat entities.
> - One has heared too much O/S 2 lovers.
> - One is believing that MicroSoft-BASIC is multi-threaded.

Wow! This is really arrogant!

> There is probably lots of OS2 multi-threaded programs that can only be
> broken on SMP, since I often heared OS2 multi-braindeaded programmers
> assuming that threads inside a process are only preempted when
> they call a system service.

I don't see what this has to do with real threads on a real Unix.

> I have written and maintained lots of application programs under VMS,
> UNIX, some have been ported to a dozen of O/S, none of them uses threads.
> I donnot envision to use multi-threads in application software and I
> donnot want to have to deal with applications that uses this, for the
> simple reasons that threads semantics differs too much between operating
> systems and that application programs are often large programs that
> donnot follow the high level of quality of O/S softwares.

Threads have their uses. Sure, they can be abused. So what?

> Traditionnal UNIXes used light processes and preferently blocking I/Os.
> Signals were preferently for error conditions.
> The select() semantic has been a hack that has been very usefull for
> implementing event-driven applications using a low number of fds, as
> the X Server. Trying to use such a semantic to deal with thousands of
> handles can only lead to performance problems. This is trivial.

A lightweight userspace solution that uses a modest number of threads
is cabable of giving us a fast and scalable mechanism for handling
very large numbers of FDs. And it can do this without changing one
line of kernel code.
Independently, we can optimise the kernel to speed up select(2) and
poll(2) so that both this userspace library as well as other Unix
programmes benefit.

Richard Gooch

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

Alan Cox writes:
> > > This involves kernel bloat. It seems to me that there is such a simple
> > > userspace solution, so why bother hacking the kernel?
> >
> > I don't think the userspace solution is as fast as the event queue
> > solution.
>

> I think thats pretty obvious. Select() is an event queue mechanism which
> does a setup for each select(). Asynchronous I/O has some similar
> properties (clone, I/O , signal) but is only per handle. A pure event
> queue model does one setup per handle only per handle that matters and
> not per event setups. You just get the queue overheads

The point is a good userspace solution should be *fast enough*. I
define "fast enough" to be "such that polling overheads contribute
less than 10% of the application load".

Alex Belits

unread,

Jun 21, 1998, 3:00:00 AM6/21/98

to

On Sun, 21 Jun 1998, John Summerfield wrote:

> I'm not sure what you mean by threads on Linux, but on OS/2 (with which
> I'm more familiar) different threads can be executing simultaneously on
> different processors in an SMP environment.

So are threads on Linux.

> To my mind this is one of the
> greater benefits of threads.

However separate processes on separate processors work even better.

> Along with the notion of using a separate
> thread to print (for example in wordprocessing software) and maybe
> handling some http requests in a web server, and ftp request in an ftp
> server.

Processes were used for that purpose since the invention of their
concept. For tasks that handle data independently and have to receive some
amount of initial data and then work independently or process existing i/o
streams, processes are more efficient, unless their management is poor
and requires large amount of process creation/exiting (early web servers
design, and inetd-based servers). In the situation where processing is
interdependent and requires reading/writing the same varables, mutexes in
threads can "serialize" processing at the extent where it will be
emulating one thread with asynchronous/nonblocking operation, and it will
negate the effect of SMP similar to the infamous low granularity locks
problem in SMP kernels.

--
Alex

Richard Gooch

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

I've written a document that tries to cover the various issues with
I/O events. Check out:
http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html

Regards,

Richard....

Dean Gaudet

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

Note that the poll_ctrl you introduce in
<ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.1/fastpoll-readme>
is almost all the work required for a completion queue. The additional
code required is to add "void *user_data; int completion_fd;" to the event
structure. If the low level code is smart enough to fill in your events
structure it's smart enough to plop a word into a pipe when necessary. So
are you sure it'd be too much bloat to do completion queues? :)

Dean

Richard Gooch

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

Dean Gaudet writes:
> Note that the poll_ctrl you introduce in
>
> <ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.1/fastpoll-readme>

Hey! Someone's already read it:-)

> is almost all the work required for a completion queue. The additional
> code required is to add "void *user_data; int completion_fd;" to the event
> structure. If the low level code is smart enough to fill in your events
> structure it's smart enough to plop a word into a pipe when necessary. So
> are you sure it'd be too much bloat to do completion queues? :)
>

> On Mon, 22 Jun 1998, Richard Gooch wrote:
>
> > I've written a document that tries to cover the various issues with
> > I/O events. Check out:
> > http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html

The new mechanism I introduce optimises an existing POSIX
interface. Also, it is optional: drivers which continue to do things
the old way will still work, they just won't be as fast. With
completion ports all drivers will have to be modified, so it involves
a lot more work.

I do agree that if my fastpoll optimisation is added, then the logical
place to add completion port support is in poll_notify(). I've added a
note in my documentation about that.

BTW: what happens when a FD is closed before the completion event is
read by the application? Protecting against that could be tricky, and
may require more code than simply dropping an int into a pipe.

Pavel Machek

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

Hi!

> > > status = sys$qio(
> > > READ_OPCODE, fd, buffer, sizeof(buffer),

> > > get called with the ast_parameter in the context of the process that
> > > posted the QIO. This provided a powerful and easy-to-use method of
> > > dealing with async IO. It's one of the few VMS features that I wish
> > > Unix supported.
>

> This kind of asynchronous I/O is defined in the Posix RT specification
> and should all be in glibc 2.1.x at some point (the 2.1.x kernel has
> real time signals - which queue and contain one byte of data - the

One _byte_? Would not one long be much, much better? How are you going
to fit filehandle into one byte?

Pavel
--
The best software in life is free (not shareware)! Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

Alan Cox

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

> One _byte_? Would not one long be much, much better? How are you going
> to fit filehandle into one byte?

I think its one byte, and I agree one long might be better, one byte is
a good enough hash into a pending I/O RQ table

Malcolm Beattie

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

Alan Cox writes:
> > One _byte_? Would not one long be much, much better? How are you going
> > to fit filehandle into one byte?
>
> I think its one byte, and I agree one long might be better, one byte is
> a good enough hash into a pending I/O RQ table

It's not a byte: it's an int or pointer. struct siginfo_t contains a
union sigval si_value and union sigval requires at least the members:
int sival_int;
void *sival_ptr;

--Malcolm

--
Malcolm Beattie <mbea...@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services

Andi Kleen

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

al...@lxorguk.ukuu.org.uk (Alan Cox) writes:

> > One _byte_? Would not one long be much, much better? How are you going
> > to fit filehandle into one byte?
>
> I think its one byte, and I agree one long might be better, one byte is
> a good enough hash into a pending I/O RQ table

At least in my 2.1.106 includes sigval_t for i386 is defined as:

typedef union sigval {
int sival_int;
void *sival_ptr;
} sigval_t;

The Single Unix speciification defines it as:

...
The sigval union is defined as:
int sival_int integer signal value
void* sival_ptr pointer signal value

-Andi

Pierre Phaneuf

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

Nomad the Wanderer wrote:

> After a few weeks of trying to get IE for solaris Working the other SA quit.
> I had Netscape installed in about 30 mins.

Including the download, right? ;-)

--
Pierre Phaneuf
Web: http://newsfeed.sx.nec.com/~ynecpip/
finger: yne...@newsfeed.sx.nec.com

Nomad the Wanderer

unread,

Jun 22, 1998, 3:00:00 AM6/22/98

to

Just about.

Just untar/gzip it, run the install program, tell it to
install it in /usr/local/tools/Netscape/SunOS-X (X being 4 or 5)
and then just put a wrapper in place to determin which client is needed
then nfs export the sucker.

Robert

Thus spake Pierre Phaneuf (ppha...@sx.nec.com):

> Nomad the Wanderer wrote:
>
> > After a few weeks of trying to get IE for solaris Working the other SA quit.
> > I had Netscape installed in about 30 mins.
>
> Including the download, right? ;-)
>
> --
> Pierre Phaneuf
> Web: http://newsfeed.sx.nec.com/~ynecpip/
> finger: yne...@newsfeed.sx.nec.com

---------------------------------------------------------------------------
Robert L. Harris | Educate the Masses,
Senior System Administrator | Don't just help them to
at Great West Life. \_ Remain ignorant.

http://www.orci.com/~nomad

DISCLAIMER:
These are MY OPINIONS ALONE. I speak for no-one else.

FYI:
perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'