Direct Linux syscalls

Beth

Dec 17, 2004, 10:20:23 PM

Hi,

I presume I've got the right place to ask for this...kind of "low
level" stuff, though...

Anyway...basically, programming Linux with assembly language using the "int
80h" interface directly, I'm confused by lots of different accounts of how
it works on different webpages all over the 'Net...

Apparently, with the 2.4 kernel (at least, I think it was that version
number...if not, which one was it? ;), the EBP register was added as
another register available for passing parameters to the "int 80h" system
call...

And what's confusing me is, with a 6 parameter call like "mmap", does it
now use the EBP register for the 6th parameter...OR does it still use the
old non-EBP calling style for "backwards compatibility"? Some pages I read
on the 'Net seem to "suggest" one, some the other (and none actually
address the issue directly...hence, asking around here, on the off-chance
someone might know :)...

When EBP was added as another useable register for "INT 80h", was this a
"retroactive" addition (that is, _all_ syscalls now go by the new EBP
interface) or was it only a new thing for post-2.4 syscalls (a "backwards
compatible" thing that older calls stay with the older interface but all
the newer calls have been "updated" with the new style interface)?

In other words, with a post-2.4 kernel, should I be putting the parameters
into registers including EBP for "mmap"...or is it still "stick it in
memory and pass a pointer to it in EBX", despite the changes to add EBP?

If it is the case that older calls use the older interface and newer calls
use the newer interface then the question that stems from that is, of
course, where is it documented with which kernel versions various syscalls
were added? You know, so that I can work out which syscalls need the new
interface and which are still on the old interface...

Although, in fact, any reference which explains which syscalls were added
with which kernel version would be a useful thing anyway (you know, to know
what syscalls are valid for which kernels and such more generally :)...

Just trying to compose some NASM include files and macros which can
"automate" making the system calls, you see...but it's actually rather
difficult to get useful information on this "low level" aspect because so
many people just stick with using the C interface (and, no, I don't need
the lecture on why you should use the C interface for "portability"...what
I need to do is _specific_ to Linux so "portability" is simply NOT an issue
and it's in assembly language for assembly language...so, I do understand
the benefits of the C interface but it's just "not applicable" in this
particular case :)...

Beth :)

Kasper Dupont

Dec 18, 2004, 3:19:06 AM

Beth wrote:
>
> Hi,
>
> I presume I've got the right place to ask for this...kind of "low
> level" stuff, though...
>
> Anyway...basically, programming Linux with assembly language using the "int
> 80h" interface directly, I'm confused by lots of different accounts of how
> it works on different webpages all over the 'Net...

At some point (I think it was in 2.6) a new and faster
way was introduced. You still pass parameters the same
way, but instead of using int 0x80, you do something
else. (I don't know exactly what to do.) But the old
interface still exists for compatibility.

>
> Apparently, with the 2.4 kernel (at least, I think it was that version
> number...if not, which one was it? ;), the EBP register was added as
> another register available for passing parameters to the "int 80h" system
> call...

I know an extra register was added at some point. But
I didn't know which one and when. But EBP sounds likely.

>
> And what's confusing me is, with a 6 parameter call like "mmap", does it
> now use the EBP register for the 6th parameter...OR does it still use the
> old non-EBP calling style for "backwards compatibility"? Some pages I read
> on the 'Net seem to "suggest" one, some the other (and none actually
> address the issue directly...hence, asking around here, on the off-chance
> someone might know :)...

A new mmap call was introduced. So we now have two
different mmap calls with different numbers (90 and
192). The old one is called exactly the same way it
used to be, but the name has been changed. The
renaming does not, however, affect compatibility.

>
> When EBP was added as another useable register for "INT 80h", was this a
> "retroactive" addition (that is, _all_ syscalls now go by the new EBP
> interface) or was it only a new thing for post-2.4 syscalls (a "backwards
> compatible" thing that older calls stay with the older interface but all
> the newer calls have been "updated" with the new style interface)?

Compatibility across the user/kernel interface is a
priority. So no calls get removed for this reason.
In older kernels, a pointer to a struct was used for
those calls needing more arguments than the kernel
supported. The old calls still exist, and better
calls have been added for new libraries to use.

>
> In other words, with a post-2.4 kernel, should I be putting the parameters
> into registers including EBP for "mmap"...or is it still "stick it in
> memory and pass a pointer to it in EBX", despite the changes to add EBP?

You should use the new mmap call, and place
parameters in registers.
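For reference when writing NASM macros, here is the i386 "int $0x80" register convention written out as a small C table. EAX holds the call number, and arguments 1 to 6 go in EBX, ECX, EDX, ESI, EDI and EBP, in that order. The result comes back in EAX. The helper names are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* 32-bit x86 "int $0x80" convention: EAX carries the system call number,
 * arguments 1..6 go in these registers in order, and EAX returns the result. */
static const char *const i386_arg_regs[6] = {
    "ebx", "ecx", "edx", "esi", "edi", "ebp"
};

/* Register that carries argument n (1-based), or NULL if n is out of range. */
static const char *i386_arg_register(int n)
{
    if (n < 1 || n > 6)
        return NULL;
    return i386_arg_regs[n - 1];
}

/* Print the register assignment for mmap2(addr, len, prot, flags, fd, pgoff). */
static void print_mmap2_registers(void)
{
    static const char *const params[6] = {
        "addr", "len", "prot", "flags", "fd", "pgoff"
    };
    printf("eax = 192 (__NR_mmap2 on i386)\n");
    for (int n = 1; n <= 6; n++)
        printf("%s = %s\n", i386_arg_register(n), params[n - 1]);
}
```

With this mapping, mmap2's sixth argument (pgoff) is the one that lands in EBP.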

>
> If it is the case that older calls use the older interface and newer calls
> use the newer interface then the question that stems from that is, of
> course, where is it documented with which kernel versions various syscalls
> were added? You know, so that I can work out which syscalls need the new
> interface and which are still on the old interface...

If you really want to, you can find out in which
kernel version a call was added. But it really
doesn't matter. A quick look in the kernel source
will tell you, what the interface of a call is.
I'll use arch/i386/kernel/sys_i386.c as an example.

asmlinkage int old_mmap(struct mmap_arg_struct *arg)
Obviously arguments are passed with a pointer to a
struct containing all the arguments. And old_ means
this call is deprecated, and there is a new one you
should use when writing new code.
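For contrast, here is the old convention driven from C. All six arguments go into a block in memory, and only the block's address is passed (in EBX). The struct below is a hypothetical mirror of the kernel's struct mmap_arg_struct, with fields in mmap's argument order. The raw call is made only on 32-bit x86, where SYS_mmap is call 90 (old_mmap). Other targets fall back to libc's mmap().

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical mirror of the kernel's struct mmap_arg_struct: old_mmap reads
 * all six arguments from this block in user memory. */
struct old_mmap_args {
    unsigned long addr;
    unsigned long len;
    unsigned long prot;
    unsigned long flags;
    unsigned long fd;
    unsigned long offset;   /* byte offset; must be page aligned */
};

/* Map anonymous memory via the struct-pointer convention on i386,
 * or via libc's mmap() elsewhere. Returns MAP_FAILED on error. */
static void *map_anon_old_style(size_t len)
{
#if defined(__i386__)
    struct old_mmap_args a = {
        0, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
        (unsigned long)-1, 0
    };
    long r = syscall(SYS_mmap, &a);   /* single argument: pointer in EBX */
    return r == -1 ? MAP_FAILED : (void *)(unsigned long)r;
#else
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
#endif
}
```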

asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
This is the new call, and takes 6 arguments in
registers.
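Below is a sketch of the register-based call from C. It assumes glibc's syscall(), which on i386 loads up to six arguments into EBX through EBP, and the SYS_mmap2 constant from <sys/syscall.h>. Unlike mmap, mmap2's last argument is a file offset in 4096-byte units rather than bytes. The names map_with_mmap2 and mmap2_pgoff are hypothetical. On non-i386 builds the sketch simply falls back to mmap().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Convert a byte offset to mmap2's 4096-byte units. Returns 0 on success,
 * -1 if the offset is not a multiple of 4096. */
static int mmap2_pgoff(unsigned long long byte_offset, unsigned long *pgoff)
{
    if (byte_offset % 4096 != 0)
        return -1;
    *pgoff = (unsigned long)(byte_offset / 4096);
    return 0;
}

static void *map_with_mmap2(void *addr, size_t len, int prot, int flags,
                            int fd, unsigned long long byte_offset)
{
    unsigned long pgoff;
    if (mmap2_pgoff(byte_offset, &pgoff) != 0) {
        errno = EINVAL;
        return MAP_FAILED;
    }
#if defined(__i386__) && defined(SYS_mmap2)
    /* Six register arguments: addr..fd in EBX..EDI, pgoff in EBP. */
    long r = syscall(SYS_mmap2, addr, len, prot, flags, fd, pgoff);
    return r == -1 ? MAP_FAILED : (void *)(unsigned long)r;
#else
    /* Fallback: plain mmap() takes the offset in bytes. */
    return mmap(addr, len, prot, flags, fd, (off_t)pgoff * 4096);
#endif
}
```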

>
> Although, in fact, any reference which explains which syscalls were added
> with which kernel version would be a useful thing anyway (you know, to know
> what syscalls are valid for which kernels and such more generally :)...

Most calls are added in the order given by their
numbers. So entry.S will give you a hint.
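A complementary approach skips the kernel-version history and probes at run time. A kernel that does not implement a call number fails it with ENOSYS. The helper below is hypothetical. It really executes the call, so only use it with call numbers and dummy arguments for which that is harmless (getpid, for example, never exit or fork).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns false only if the kernel rejects the call number with ENOSYS.
 * Warning: the call is actually made, with three zero arguments. */
static bool syscall_implemented(long nr)
{
    errno = 0;
    long r = syscall(nr, 0, 0, 0);
    return !(r == -1 && errno == ENOSYS);
}
```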

--
Kasper Dupont

Gary Kato

Dec 18, 2004, 8:56:04 AM

I'm not 100% sure, but one thing to watch for is that ebp seems only to be used
for that 6th parameter on kernels compiled for CPUs of Pentium II or later.
For kernels built for Pentium II or later (those with the SE extension to the
instruction set), system calls will use the "sysenter" instruction instead of
"int 80h".

This is from looking at the 2.6.9 kernel source code.

Tauno Voipio

Dec 18, 2004, 9:56:51 AM

Beth wrote:
> Hi,
>
> I presume I've got the right place to ask for this...kind of "low
> level" stuff, though...
>
> Anyway...basically, programming Linux with assembly language using the "int
> 80h" interface directly, I'm confused by lots of different accounts of how
> it works on different webpages all over the 'Net...
>
> Apparently, with the 2.4 kernel (at least, I think it was that version
> number...if not, which one was it? ;), the EBP register was added as
> another register available for passing parameters to the "int 80h" system
> call...
>

Please do not call the kernel directly: you'll always get the
incompatibility rashes from the version changes.

If the shared libraries are not for you, link the assembly
code with the static version of libc, and use the libc calls.

HTH

Tauno Voipio
tauno voipio (at) iki fi

John Reiser

Dec 18, 2004, 11:51:58 AM

Tauno Voipio wrote:
> Please do not call the kernel directly: you'll always get the
> incompatibility rashes from the version changes.

On the contrary, Linux has a much _better_ record of compatibility
at the kernel<->user interface using "int $0x80" than glibc-2.x
has with itself. Since glibc-2.0, there have been several cases
where a program that was dynamically linked with libc.so.6 has
failed to run _at_all_ on subsequent releases of glibc-2.y,
while in over 6 years I have never seen an incompatibility with
a bare-syscall executable on any subsequent version of Linux.

The fundamental reason why glibc has had compatibility problems
with "itself" is that /lib/ld-linux.so.2 and /lib/libc.so.6
[and libm, and libdl, and libnss*, ...] are not independent.
There is no unchanging ABI between them, they have each meddled
with the other's affairs, and users have suffered from various
bugs and/or "learning experiences" in the development and
maintenance of glibc. The existence and history of the
GLIBC_PRIVATE global symbol version is proof of this problem.
See http://BitWagon.com/rtldi/rtldi.html for a workaround.

The usual reason for seeming version problems between user C code
and the _kernel_ is that the C-language interface that most users
expect has cross-platform dependencies and evolution. The kernel
keeps maintaining 100% backward compatibility with "int $0x80",
and glibc-2 has not always "papered over" the differences adeptly.
Specifically: the layout of struct stat; the number of signals;
the sizes of uid_t, gid_t, and various time* types; all have
"impedance matching" in glibc to implement an illusion of uniform
cross-platform C runtime interface for a POSIX system. Most
user-written C code wants to use this "portable" interface instead
of the actual Linux kernel<->user interface, so executables built
from such code suffer the problems of glibc-2.x compatibility.
Code that is built to the actual Linux kernel<->user interface
just runs and runs without change on any subsequent Linux kernel
(and might not run on *BSD, Solaris, ...).

Beth

Dec 18, 2004, 12:23:22 PM

Tauno Voipio wrote:
> Beth wrote:
> > Hi,
> >
> > I presume I've got the right place to ask for this...kind of "low
> > level" stuff, though...
> >
> > Anyway...basically, programming Linux with assembly language using the "int
> > 80h" interface directly, I'm confused by lots of different accounts of how
> > it works on different webpages all over the 'Net...
> >
> > Apparently, with the 2.4 kernel (at least, I think it was that version
> > number...if not, which one was it? ;), the EBP register was added as
> > another register available for passing parameters to the "int 80h" system
> > call...
>
> Please do not call the kernel directly: you'll always get the
> incompatibility rashes from the version changes.

I think you didn't read down to the bottom of my post, where I stated that
I didn't require the "lecture" about "portability"...whether it's a good
idea or not more generally, in this particular case, I'm specifically
interested in this aspect...I'm studying it...

> If the shared libraries are not for you, link the assembly
> code with the static version of libc, and use the libc calls.

Yes, generally speaking...

But this "advice" and standard "lecture" can't logically apply to everyone,
can it?

For instance, if I was developing my own "libc" then you'd still advise I
call "libc"? And what does that "libc" call, another "libc"? Sorry, at some
point, one of those "libc" libraries has got to brave the "incompatibility
rashes" to actually communicate with the kernel for something to actually
happen...

And, also, of course, if I'm looking through the kernel source codes then
anything to do with this is "verboten" and I must close my eyes whenever
something related to that below "libc" appears in the kernel source code?
Does Linus also close his eyes when he's writing the kernel too? Lest he
see anything non-"libc"...

Or Embedded Linux where there's not enough RAM to support anything but
"bare essentials" (or, indeed, kicking out "libc" and other libraries could
lower the RAM requirements low enough to save on one more RAM
chip...reducing the price, selling more units, making more profits)?

Or use of Linux in (near) real-time applications where the speed of
response is critical? And I've measured it directly..."int 80h" shaved off
half a second from an 8-second run - repeatedly calling syscalls a few
million times - in comparison to an identical program doing the same but
with "libc"...and "sysenter" probably speeds things up even more
substantially...yes, I know...what's half a second? Well, the answer to
that _SOLELY_ depends on what the application needs to do...for a desktop,
already taking many seconds to load up X applications, it's nothing...but
for a real-time system that needs to react quickly, it's an eternity _too
long_, quite possibly...
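For scale: if "a few million" means 2 to 5 million calls, half a second saved works out to roughly 100 to 250 ns per call (0.5 s / 5e6 = 100 ns, 0.5 s / 2e6 = 250 ns). Below is a sketch of a similar measurement with a hypothetical helper. It times getppid() through the libc wrapper and through the generic syscall() entry point. Both paths still live in libc, so this only approximates a hand-written "int $0x80", and results vary with CPU and kernel.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average nanoseconds per call over n calls of getppid(), made through
 * syscall() when use_raw is nonzero, else through the libc wrapper. */
static double ns_per_call(long n, int use_raw)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++) {
        if (use_raw)
            (void)syscall(SYS_getppid);
        else
            (void)getppid();
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)n;
}
```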

I appreciate what you're saying...really...but you can't really
legitimately dive in here and hand out "ivory tower" advice that there is
never any need ever at all in any way to need to know or use anything other
than "libc"...ever...

You don't know what the application is going to be for...you can't really
say what's right or wrong without first knowing about that...you could
have, indeed, just advised me, for example, to "use libc" when perhaps the
application I want to create _IS_ a light version of "libc"...such advice
is logically a little silly...you can't implement a "light libc" by
statically linking a non-light "libc" onto it...logical nonsense...

In this case, I need to know how it works (note, yes, not necessarily use
it, that's true...but _know_ how it works) because it's part of a project
to develop an x86 assembler with the relevant Linux support files
included...even if I never use it, I still need the details of how it works
to make sure that the assembler supports it...yes, it will also support
"libc" and that's probably the best choice more generally...but, in this
case, the objective is to create a functioning tool which supports all the
options and not force programmers using it that anything but "libc" is
"verboten"...that decision is for the programmers using the tool to make,
not its authors...

Beth :)

Grant Edwards

Dec 18, 2004, 1:13:12 PM

On 2004-12-18, Beth <BethS...@hotmail.NOSPICEDHAM.com> wrote:

> For instance, if I was developing my own "libc" then you'd still advise I
> call "libc"? And what does that "libc" call, another "libc"? Sorry, at some
> point, one of those "libc" libraries has got to brave the "incompatibility
> rashes" to actually communicate with the kernel for something to actually
> happen...

Nope, it's turtles all the way down... ;)

--
Grant Edwards                   grante at visi.com
Yow! HUMAN REPLICAS are inserted into VATS of NUTRITIONAL YEAST...

Tauno Voipio

Dec 18, 2004, 4:02:12 PM

Beth wrote:

You should have said that you're trying to develop a replacement libc,
and explained why you cannot use the standard one.

> And, also, of course, if I'm looking through the kernel source codes then
> anything to do with this is "verboten" and I must close my eyes whenever
> something related to that below "libc" appears in the kernel source code?
> Does Linus also close his eyes when he's writing the kernel too? Lest he
> see anything non-"libc"...

Of course, but then you are on your own and you have to
get the syscall sources of the kernel and figure it out
every time the kernel interface is changed.

> Or Embedded Linux where there's not enough RAM to support anything but
> "bare essentials" (or, indeed, kicking out "libc" and other libraries could
> lower the RAM requirements low enough to save on one more RAM
> chip...reducing the price, selling more units, making more profits)?

That does apply to the shared libc. In the static libc you'll get
what you're really using - you will need that much code anyway.

> Or use of Linux in (near) real-time applications where the speed of
> response is critical? And I've measured it directly..."int 80h" shaved off
> half a second from an 8-second run - repeatedly calling syscalls a few
> million times - in comparison to an identical program doing the same but
> with "libc"...and "sysenter" probably speeds things up even more
> substantially...yes, I know...what's half a second? Well, the answer to
> that _SOLELY_ depends on what the application needs to do...for a desktop,
> already taking many seconds to load up X applications, it's nothing...but
> for a real-time system that needs to react quickly, it's an eternity _too
> long_, quite possibly...
>

For *real* real time, Linux is not the way to go. The internal
scheduling mechanism does not service interrupts, and the
thread/process switches that follow them, quickly enough. This follows
from the design decision to make much of the kernel internals non-reentrant.

> I appreciate what you're saying...really...but you can't really
> legitimately dive in here and hand out "ivory tower" advice that there is
> never any need ever at all in any way to need to know or use anything other
> than "libc"...ever...

Did you run with statically linked libc or the shared one?

> You don't know what the application is going to be for...you can't really
> say what's right or wrong without first knowing about that...you could
> have, indeed, just advised me, for example, to "use libc" when perhaps the
> application I want to create _IS_ a light version of "libc"...such advice
> is logically a little silly...you can't implement a "light libc" by
> statically linking a non-light "libc" onto it...logical nonsense...
>

Does your light libc produce less target code than the pieces of
statically linked standard libc? If yes, what's the difference?

> In this case, I need to know how it works (note, yes, not necessarily use
> it, that's true...but _know_ how it works) because it's part of a project
> to develop an x86 assembler with the relevant Linux support files
> included...even if I never use it, I still need the details of how it works
> to make sure that the assembler supports it...yes, it will also support
> "libc" and that's probably the best choice more generally...but, in this
> case, the objective is to create a functioning tool which supports all the
> options and not force programmers using it that anything but "libc" is
> "verboten"...that decision is for the programmers using the tool to make,
> not its authors...

It's not strictly verboten (or förbjudet in Linus' home language,
Swedish), but I tried to tell that you're risking a maintenance
problem. I've had more than my fair share of them during the time
I've been programming (since 1964).

---

What does your assembler offer above the standard tool of Linux, GNU as?

An assembler which does not support what is needed to make Linux
syscalls hardly deserves to be called an assembler - the interface
is very simple. The only thing perhaps not needed in any other piece of
assembly code is the software interrupt (int 0x80).

I agree that on an i80386+ target, some of the libc code could be
made lighter. That, however, leaves out the other processors that
can also run Linux.

Before proceeding with the library project, have a look at
newlib from Red Hat <http://sources.redhat.com/newlib/>.
I'm using it with the GNU tools and ARM targets.

Tauno Voipio
tauno voipio (at) iki fi

> Beth :)
>
>

Gary Kato

Dec 19, 2004, 1:40:42 AM

>kernels compiled for CPUs of Pentium II or later.
>For kernels built for Pentium II

:-P
It doesn't depend on compilation; the determination is made at run time.
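A run-time check like this starts from what the CPU reports. CPUID leaf 1 sets EDX bit 11 (SEP) when SYSENTER/SYSEXIT are available. Intel documents that early Pentium Pro parts (family 6, model below 3, stepping below 3) set that bit without really supporting the instructions, which fits the "Pentium II or later" rule of thumb mentioned above. The sketch below is a hypothetical user-space check using GCC/Clang's <cpuid.h>. It mirrors the CPU test only and is not the kernel's actual code.

```c
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* True if the CPU reports usable SYSENTER/SYSEXIT. */
static bool cpu_has_usable_sysenter(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                        /* leaf 1 not available */
    if (!(edx & (1u << 11)))
        return false;                        /* SEP bit clear */
    unsigned int family = (eax >> 8) & 0xf;
    unsigned int model = (eax >> 4) & 0xf;
    unsigned int stepping = eax & 0xf;
    if (family == 6 || family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;   /* add extended model bits */
    /* Early Pentium Pro: SEP reported but not actually supported. */
    if (family == 6 && model < 3 && stepping < 3)
        return false;
    return true;
#else
    return false;                            /* not an x86 CPU */
#endif
}
```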

Kasper Dupont

Dec 19, 2004, 7:35:24 AM

Gary Kato wrote:
>
> I'm not 100% sure, but one thing to watch for is that ebp seems only to be used
> for that 6th parameter on kernels compiled for CPUs of Pentium II or later.

Are you sure? This sounds a bit strange to me.

> For kernels built for Pentium II or later (those with the SE extension to the
> instruction set), system calls will use the "sysenter" instruction instead of
> "int 80h".

You can use sysenter instead of int 80h, but
int 80h still exists. The reason for using
sysenter is better performance, though I don't
know exactly why there is a performance
difference.

With some kernel versions, you can just call
some high address (0xFFFFF000 I think), where
the kernel will have placed an appropriate
trapping instruction for your configuration.
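A program does not have to hard-code the stub's address. Kernels that provide the page report it in the ELF auxiliary vector: AT_SYSINFO gives the entry point (32-bit x86) and AT_SYSINFO_EHDR gives the page's ELF header. From assembly, the same pairs can be read from the auxv entries that follow envp on the initial stack. On 2.6-era i386 kernels the page was typically mapped at 0xffffe000, and later kernels may place it elsewhere. The sketch below assumes glibc's getauxval(), which arrived in glibc 2.16, well after this thread. The helper names are hypothetical.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Entry point of the kernel-provided syscall stub (AT_SYSINFO),
 * or 0 if the kernel does not advertise one for this process. */
static unsigned long syscall_stub_entry(void)
{
    return getauxval(AT_SYSINFO);
}

/* Base of the kernel-supplied ELF image containing the stub, or 0. */
static unsigned long kernel_vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Sanity check: a real vDSO base starts with the ELF magic bytes. */
static int looks_like_elf(unsigned long base)
{
    return base != 0 && memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```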

--
Kasper Dupont

Kasper Dupont

Dec 19, 2004, 7:51:31 AM

Tauno Voipio wrote:
>
> Please do not call the kernel directly: you'll always get the
> incompatibility rashes from the version changes.

You couldn't be more wrong. The int 0x80 interface
is the most stable interface in the entire system.

It is not portable to other architectures and
operating systems. But between two releases of
Linux nothing changes (except from new system
calls being added).

>
> If the shared libraries are not for you, link the assembly
> code with the static version of libc, and use the libc calls.

How do you create a truly statically linked
executable? Even if your executable doesn't
need /lib/libc.so.6 because that part of libc
is statically linked, libc will still load
other .so files using dlopen.

If I were to take your advice, I would not be
affected by changes in the user/libc interface.
However I would be affected by changes in the
kernel/libc interface, and by internal
interfaces between different parts of libc,
which is even worse.

--
Kasper Dupont

Kasper Dupont

Dec 19, 2004, 7:55:33 AM

Beth wrote:
>
> Or use of Linux in (near) real-time applications where the speed of
> response is critical? And I've measured it directly..."int 80h" shaved off
> half a second from an 8-second run - repeatedly calling syscalls a few
> million times - in comparison to an identical program doing the same but
> with "libc".

I'm surprised so much time is used by libc. Are you
really just using the libc system call wrappers
(documented in man chapter 2)?

Anyway the best way to speed up such an application
is not by trying to speed up the system calls, but
rather reducing the number of system calls.

For example stdio uses buffering to avoid making a
system call for each character being read/written.
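Here is a minimal sketch of that buffering idea, using a hypothetical struct outbuf. Bytes collect in a 4096-byte buffer, and write() is called only when the buffer fills or on an explicit flush. Provided each write() completes in full, writing 10000 single characters costs 3 write() calls (4096 + 4096 + 1808 bytes) instead of 10000.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical output buffer that batches many small writes. */
struct outbuf {
    int fd;
    size_t used;
    unsigned long writes;   /* write() system calls issued so far */
    char data[4096];
};

/* Push everything buffered to the file descriptor. Returns 0 or -1. */
static int outbuf_flush(struct outbuf *b)
{
    size_t off = 0;
    while (off < b->used) {
        ssize_t n = write(b->fd, b->data + off, b->used - off);
        b->writes++;
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    b->used = 0;
    return 0;
}

/* Append one byte, flushing first if the buffer is full. */
static int outbuf_putc(struct outbuf *b, char c)
{
    if (b->used == sizeof b->data && outbuf_flush(b) != 0)
        return -1;
    b->data[b->used++] = c;
    return 0;
}
```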

--
Kasper Dupont

Gary Kato

Dec 19, 2004, 9:26:50 AM

>> I'm not 100% sure, but one thing to watch for is that ebp seems only to be used
>> for that 6th parameter on kernels compiled for CPUs of Pentium II or later.
>
>Are you sure? This sounds a bit strange to me.

I saw a comment about the 6th parameter in code that only seems to be called
when sysenter is used for system calls. I didn't see any such thing when int80
is used.

John Reiser

Dec 19, 2004, 10:53:32 AM

> The int 0x80 interface is the most stable interface in the entire system.

but yesterday I found an incompatibility in Linux 2.6.9 with signal handlers
on x86. When calling rt_sigaction to specify a handler, now you must set
SA_RESTORER (0x04000000) in .sa_flags, and must set .sa_restorer appropriately
(which means &__restore_rt if SA_SIGINFO, else &__restore). The change is
motivated by a defensive security precaution which does not allow executable
code on the stack. If the call to rt_sigaction does not specify SA_RESTORER,
then Linux 2.4.20 will push the appropriate code on the stack, and set the
return address for the handler to point to that code. In 2.6.9 the user must
specify SA_RESTORER with .sa_restorer pointing to the user's own code,
else the signal handler will return to a garbage address [I see 0x420
for SIGALRM and 0x440 for SIGSEGV, but perhaps that is just random.]

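--- /dev/null
+++ ./__restore.S
@@ -0,0 +1,13 @@
+#include <asm/unistd.h>
+
+/* Restorer for handlers installed without SA_SIGINFO: discard the signal
+   number argument the kernel left on the stack, then call sigreturn. */
+__restore: .globl __restore
+ pop %eax
+ movl $__NR_sigreturn,%eax
+ int $0x80
+
+/* Restorer for SA_SIGINFO handlers: call rt_sigreturn. */
+__restore_rt: .globl __restore_rt
+ movl $__NR_rt_sigreturn,%eax
+ int $0x80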
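The contrasting case is a program that goes through libc's sigaction(). As far as I know, glibc's i386 sigaction() itself sets SA_RESTORER and points sa_restorer at its own trampoline, so a plain handler returns normally. The sketch below (helper names are hypothetical) installs a SIGALRM handler that way and checks that control comes back after raise().

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

static void on_alarm(int sig)
{
    got_signal = sig;
}

/* Returns 1 if the handler ran and returned, -1 on setup failure.
 * Without a valid restorer, the handler's return would not land back here. */
static int handler_returns(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    got_signal = 0;
    if (raise(SIGALRM) != 0)
        return -1;
    return got_signal == SIGALRM;
}
```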

Beth

Dec 21, 2004, 2:03:57 PM

Kasper Dupont wrote:
> Gary Kato wrote:
> > I'm not 100% sure, but one thing to watch for is that ebp seems only to be used
> > for that 6th parameter on kernels compiled for CPUs of Pentium II or later.
>
> Are you sure? This sounds a bit strange to me.

Now you meet my problem trying to get answers for this...

> > For kernels built for Pentium II or later (those with the SE extension to the
> > instruction set), system calls will use the "sysenter" instruction instead of
> > "int 80h".
>
> You can use sysenter instead of int 80h, but
> int 80h still exists. The reason for using
> sysenter is better performance, though I don't
> know exactly why there is a performance
> difference.

From an assembly language point of view (this part I do know a little
better :), the "sysenter" has better performance because it's a CPU
instruction deliberately introduced to improve user -> kernel space
transitions...I don't know the exact details but it likely relates to
cutting down on the protection "checks" and hard-wiring dedicated
capabilities exactly to speed it all up...you know, a bit like a video card
draws a polygon faster than the main CPU can do (even when technically the
GPU is as capable a chip as the main CPU), just because it's "hard-wired"
in the hardware itself..."dedicated" circuitry does tend to perform faster
than "general-purpose" circuitry (because it can be "hard-wired" to do the
specific job with "checks" and such by-passed)...

One of the slowest instructions turns out to be "int" (especially when also
doing user -> kernel stuff because "protections", "privilege level checks",
changing from the "user stack" to the "kernel stack" and so on and so forth
goes on...lots of "overhead")...remembering that "int" isn't solely for OS
API but is really a part of the interrupt system for responding to hardware
(in a sense, it's a kind of "hack" to re-use the same interrupt system for
providing OS functions by having the "int" instruction to generate a
software-based interrupt (a "faux IRQ", so to speak ;)...the mechanisms are
really dedicated to hardware IRQs primarily but Intel added "software
interrupts" to re-use its "run-time relocation" stuff for BIOS / OS
functions)...

Well, these days, they can spare some "die space" to add in a more
"dedicated" OS kernel calling mechanism and that's what "sysenter" /
"sysexit" represent...it actually has always been a rather slow method to
go to the OS kernel...you could say "about time they fixed that up",
really...I suppose as OSes changed design from DOS to Linux / Windows, this
became more important (e.g. in DOS, you can do an awful lot without the
"kernel" being involved...indeed, many things DOS doesn't support at all
and you _must_ ignore it and handle it yourself...but with the more modern
multi-user, multi-tasking architectures, this is no longer practically
possible (imagine two tasks multi-tasked alongside each other, both trying
to "direct access" the same piece of hardware at the same time...they'd get
in each other's way and likely crash the system), so everything has to go
via the kernel and drivers...hence, as I say, the performance of the
user -> kernel calling mechanism has become more important as time has gone
on because, these days, you do nearly _everything_ via the kernel and often
are completely prohibited from doing it any other way...in fact, this
change has actually made assembly language only fractionally lower level
than C these days...just like a C program, it's mostly calls to kernel
functions...only the "bits in-between" are different that you can actually
access the CPU registers and instructions directly...no, not to say it's as
simple as C but it's not as complicated as it once was...the OSes
themselves have become "high-level" and changed that situation)...

> With some kernel versions, you can just call
> some high address (0xFFFFF000 I think), where
> the kernel will have placed an appropriate
> trapping instruction for your configuration.

Another way again? Though, "CALL" does perform better than "INT" (because "
INT" is more or less the same as "CALL" (via a "jump table") but with extra
"overhead" for the interrupt system...pushing the processor "flags"
register and a few extra things like that)...

This, in a sense, isn't surprising from the CPU side of things because
using "INT" never was the speediest way to do things (one of the slowest
instructions because it has to do so much)...Microsoft switched from "INT"
in DOS to using "CALL" for Windows (which required more linking / loader
overhead but actually performs better at "run-time"...basically, "INT" does
"relocation" at run-time...instead, they went for a scheme where the linker
/ loader deals with the "relocation" issues (it pre-calculates a "jump
table" during loading :) and then the leaner "CALL" instruction via this
"jump table" could be used...cuts out some of the extra "overhead" attached
to using "INT")...

To be honest, I was always rather surprised that Linux used "INT",
anyway...this is usually considered to be "not recommended" in a
multi-tasking system and not the speediest way to do it (though, it might
be the _easiest_ way to do it...so perhaps Linus was simply thinking of how
to get it done quickly, not what would necessarily be the "leanest and
meanest" performance-wise)...

Intel appears to have also recognised that though their "protections" are
great devices, they can be terribly slow (you know, while typical
instructions take 1 clock cycle - or are even run in parallel with other
instructions to effectively take 0 clock cycles, so to speak - these
"transitions" can take, worst case, _100s_ of clock cycles - quite a
difference, performance-wise, eh? - and aren't particularly "nice" to
things like the cache either...as I say, of the CPU instructions, "INT" is
around the worst there is...and "CALL" is also bad when either of the two
instructions are triggering a "privilege level" change...there's all the
"protection checks" and swapping registers and stacks...the CPU literally
re-arranges nearly everything)...the introduction of "sysenter" / "sysexit"
was to remedy one of the parts of the CPU which wasn't particularly
impressive...I haven't looked at the exact details of these instructions
(they are newer additions and I haven't even learnt about MMX / SSE
properly yet ;) but the basic idea is simply that these are dedicated
"kernel_call" instructions, so to speak...designed specifically to cut out
as much crap as possible getting from user to kernel space...from using C,
the significance of just how bad these instructions are can be missed
(okay, when I say "bad", this is, of course, _relative_...with 3GHz
machines and such, it's still blindingly fast...but _in comparison_ to all the
"optimisations" they've made elsewhere on the CPUs, they've kind of
"forgotten" about this part for too long already ;)...

Beth :)

Gary Kato

Dec 21, 2004, 8:46:08 PM

I think using "Int", a software interrupt, was a pretty standard way of making
system calls before there was such a thing as the 8086. Not many CPUs had a
special instruction like "sysenter". Maybe the VAX did. I'd have to look that
up.

>> With some kernel versions, you can just call
>> some high address (0xFFFFF000 I think), where
>> the kernel will have placed an appropriate
>> trapping instruction for your configuration.

This looks to be true (though I'm not sure of the address). The Int80
and Sysenter instructions actually reside in their own page.

I was looking into all this yesterday (and still not fully understanding
everything). I got thrown off the scent by the "syscallN" macros in unistd.h.
These seem to hardcode an Int80h into the start of the system function. When I
saw them, it was quite confusing. However I couldn't find anything that
actually used them (except one line further down in the same file that used it
to define execve). I commented them out and did a kernel build and sure enough,
there were no complaints. The first time, I had commented out the execve one
and the buld issued warning messages just for that.

So, those macros were just a false trail, except for execve. Now I'm wondering
why execve is the only system call forced to use Int80, even on CPUs with
sysenter. Is there a real issue here or did someone just forget to update it?

Beth

Dec 21, 2004, 11:17:14 PM

Gary Kato wrote:
> I think using "Int", a software interrupt, was a pretty standard way of making
> system calls before there was such a thing as the 8086.

Other way around; Fixed memory maps, fixed OS addresses on fixed ROM chips
with fixed hardware configurations...that's how things used to be...using
"trap" and "int" - what Intel call "software interrupts" and Motorola
prefer to call "traps" - came along later...

Something like the 8-bit Commodore 64 only had one IRQ, shared by
everything...no interrupt or I/O commands, just memory mapped hardware
access...when the 68K and x86 appeared, _that's_ when it became more
"usual" for desktop machines (which used to be called
"(home)microcomputers" ;) to use things like "int" and "trap" for OS
access...

> Not many CPUs had a special instruction like "sysenter".

_NO_ CPUs had special instructions like "sysenter"...not even the x86 range
had it until recently, as it's a relatively new instruction...

> Maybe the VAX did. I'd have to look that up.

Perhaps the VAX...but I doubt it..."sysenter" has been added to address a
pretty _specific_ problem that the x86 family suffers from...

> >> With some kernel versions, you can just call
> >> some high address (0xFFFFF000 I think), where
> >> the kernel will have placed an appropriate
> >> trapping instruction for your configuration.
>
> This looks to be true (though I'm not sure of the address). The actual Int80
> and Sysenter instructions actually reside in their own page.

The "int" and "sysenter" instructions reside in the _CPU itself_...what are
you talking about here? Or do you mean what the instructions reference?

> I was looking into all this yesterday (and still not fully understanding
> everything). I got thrown off the scent by the "syscallN" macros in unistd.h.
> These seem to hardcode an Int80h into the start of the system function. When I
> saw them, it was quite confusing. However I couldn't find anything that
> actually used them (except one line further down in the same file that used it
> to define execve). I commented them out and did a kernel build and sure enough,
> there were no complaints. The first time, I had commented out the execve one
> and the build issued warning messages just for that.
>
> So, those macros were just a false trail, except for execve. Now I'm wondering
> why execve is the only system call forced to use Int80, even on CPUs with
> sysenter. Is there a real issue here or did someone just forget to update it?

No idea...although, I bet the answer is probably "compatibility" with
something for some reason, as that's usually the culprit for strange
looking code in 90% of cases ;)...

Beth ;)

Gary Kato

Dec 22, 2004, 12:33:12 AM

>> I think using "Int", a software interrupt, was a pretty standard way of making
>> system calls before there was such a thing as the 8086.
>
>Other way around; Fixed memory maps, fixed OS addresses on fixed ROM chips
>with fixed hardware configurations...that's how things used to be...using
>"trap" and "int" - what Intel call "software interrupts" and Motorola
>prefer to call "traps" - came along later...

No. CP/M-80 used the RST instruction for system calls. I'm pretty sure the IBM
360 and DEC-10 also used similar instructions for OS calls. Early mini and
mainframes and micros used fixed memory addresses when things were simpler. The
problem is that when the OS changed, you had to recompile everything (or patch
the OS so the memory addresses were fixed). Then people get smart and figure
out that there's a better way.

>The "int" and "sysenter" instructions reside in the _CPU itself_...what are
>you talking about here? Or do you mean what the instructions reference?
>

What I'm saying is that Linux allocates a page that just has an Int 80
instruction in it and a page with just a Sysenter instruction in it. I believe
these pages are what were being referred to as being able to make a system call
at some high address. (Maybe)
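For what it's worth, this page is what 2.6 kernels expose as the vDSO (virtual dynamic shared object): a small ELF image the kernel maps into every process, which on 32-bit x86 exports a `__kernel_vsyscall` entry that uses `sysenter` where the CPU supports it and `int 80h` otherwise. On early 2.6 i386 kernels it sat at the fixed address 0xFFFFE000 (it shows up as `linux-gate.so.1` in `ldd` output); newer kernels may place it elsewhere, so a program should ask for the address rather than hard-code it. A minimal C sketch, assuming Linux with glibc 2.16 or later for `getauxval()`; the helper names are made up for illustration:

```c
#include <elf.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Address where the kernel mapped the vDSO into this process,
   or 0 if the kernel did not supply one (no AT_SYSINFO_EHDR entry). */
static unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* The vDSO is a complete ELF shared object, so it starts with the
   four-byte ELF magic: 0x7F 'E' 'L' 'F'. */
static int vdso_looks_like_elf(void)
{
    const unsigned char *p = (const unsigned char *)vdso_base();
    if (p == NULL)
        return 0;
    return p[0] == ELFMAG0 && p[1] == ELFMAG1 &&
           p[2] == ELFMAG2 && p[3] == ELFMAG3;
}
```

The address comes from the auxiliary vector the kernel passes to every new process, which is why nothing needs to be hard-coded.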

>No idea...although, I bet the answer is probably "compatibility" with
>something for some reason, as that's usually the culprit for strange
>looking code in 90% of cases ;)...

Yes, I'm kind of curious about what that compatibility problem would be.

Beth

Dec 22, 2004, 2:31:29 AM

Tauno Voipio wrote:
> Beth wrote:
> > For instance, if I was developing my own "libc" then you'd still advise I
> > call "libc"? And what does that "libc" call, another "libc"? Sorry, at some
> > point, one of those "libc" libraries has got to brave the "incompatibility
> > rashes" to actually communicate with the kernel for something to actually
> > happen...
>
> You'd say that you're trying to develop a replacement libc
> and tell why you cannot use the standard one.

Not necessarily; I might consider it none of your business...

And there are a number of "light libc" variants out there, if you go
look...generally used, I would assume to get "bare minimum" libc
functionality for small size and possibly better performance when all the
"bells and whistles" aren't needed...indeed, much of "libc" is often
written in C itself (except for any bits which might necessitate using some
small amount of assembly language...such as using "int 80h" or
"sysenter")...C compilers are generally good but they aren't completely
perfect in producing code (indeed, the term is "optimising compiler", NOT
"optimal compiler" with good reason)...a "light libc" could improve on
this - in terms of reducing the code size and perhaps improving the
performance a little - by cleaning up any redundancies the compiler may
have produced...

> > And, also, of course, if I'm looking through the kernel source codes then
> > anything to do with this is "verboten" and I must close my eyes whenever
> > something related to that below "libc" appears in the kernel source code?
> > Does Linus also close his eyes when he's writing the kernel too? Lest he
> > see anything non-"libc"...
>
> Of course, but then you are on your own and you have to
> get the syscall sources of the kernel and figure it out
> every time the kernel interface is changed.

"Backwards compatibility"; The interface is added to but previous
interfaces must surely remain the same or, simply, older programs would
cease to function...

> > Or Embedded Linux where there's not enough RAM to support anything but
> > "bare essentials" (or, indeed, kicking out "libc" and other libraries could
> > lower the RAM requirements low enough to save on one more RAM
> > chip...reducing the price, selling more units, making more profits)?
>
> That does apply to the shared libc.

No, it doesn't...sharing a library means that one copy can be re-used by
many processes...it doesn't make the shared library smaller on disk (or, in
this case, in RAM or on ROM)...on the contrary, shared libraries require
extra "relocation" information for linking...they require loader code that
knows how to interpret this and perform the "relocation" and run-time
linking...

And, anyway, I was talking about _NO_ "libc" whatsoever...no version of
"libc" - static, shared, standard, light, etc. - could be smaller than "no
libc"...that would be slightly illogical...

> In the static libc you'll get what you're really using - you
> will need that much code anyway.

How do you know I need that much code?

For instance, as we're talking "libc", let's consider the "format string"
for "printf"...it has code for parsing the format string, dealing with
variable length arguments based on that string, floating-point formats,
hexadecimal, width, precision, exponential notation, string concatenation,
etc., etc....that's one powerful - but, by extension, _COMPLEX_ - function
we've got there...

And the selective linking of a static library is not fine-grained below the
individual "modules" added to the library...hence, if you use "printf",
even if only to print "Hello, world!" and not take advantage of width or
precision or hexadecimal or floating-point or exponential notation, the
whole lot is linked in, anyway...because it can't cut a function into
pieces, it includes the whole thing (and anything "printf" references is
implicitly included too)...

On the other hand, with assembly language:

----------------------------------------------

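; build (32-bit): nasm -f elf32 hello.asm && ld -m elf_i386 hello.o -o hello
section .data
String db "Hello, world!", 0Ah ; 13 characters + newline = 14 bytes

section .text
global _start
_start:
mov eax, 4 ; sys_write (32-bit x86 syscall number)
xor ebx, ebx ; ebx = 0...
inc ebx ; ...then 1 = stdout
mov ecx, String ; address of string
mov edx, 14 ; length
int 80h
mov eax, 1 ; sys_exit (these last 9 bytes aren't in the count below)
xor ebx, ebx ; exit status 0
int 80h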

----------------------------------------------

...we've got a grand total of 34 bytes, 14 of which are the data to be
printed itself...you know of any "libc" implementation that can compete
with 20 bytes (and, to be honest, you could probably optimise this further,
if you wanted to be fussy)?
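For comparison, the same request can be made from C without `printf`, going through glibc's `syscall()` wrapper. This is a sketch rather than a true "no libc" program, since `syscall()` itself lives in libc. Note that `SYS_write` is 4 on 32-bit x86 (matching the `mov eax, 4` above) but 1 on x86-64, which is one reason to take the number from `<sys/syscall.h>` instead of hard-coding it. The name `hello_raw()` is illustrative:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* 13 characters plus a newline: 14 bytes, the same data as the
   assembly fragment (the trailing NUL is not written). */
static const char msg[] = "Hello, world!\n";

/* One system call, no printf: syscall number, fd 1 (stdout), buffer,
   byte count. glibc's syscall() returns the kernel's result, or -1
   with errno set on failure. */
static long hello_raw(void)
{
    return syscall(SYS_write, 1, msg, sizeof msg - 1);
}
```

Calling `hello_raw()` prints the greeting and returns the number of bytes written.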

You might say "yes, but this is not a direct comparison...printf can do so
much more than just write simple text like this, with floating-point and
exponential and hexadecimal and so forth"...

But the point here is that this program _DOESN'T_ use such things, it only
needs to print simple strings...maybe an integer or two as well (the code
for which can be condensed into a similar insignificantly small amount of
bytes :)...but no need for hexadecimal or floating-point or exponential
notation or any "width" or "precision"...including "printf" would get you
all these things, whether you used them or not...
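The "integer or two" case really is small. A minimal C sketch of an unsigned-to-decimal conversion that could sit in front of a single raw write instead of pulling in `printf`; the name `utoa_dec()` is hypothetical:

```c
#include <stddef.h>

/* Write the decimal digits of v into buf, NUL-terminate it and return
   the number of digits. buf must hold at least 21 bytes (enough for a
   64-bit unsigned long plus the terminator). */
static size_t utoa_dec(unsigned long v, char *buf)
{
    char tmp[20];   /* digits are produced least significant first */
    size_t n = 0, i;

    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    for (i = 0; i < n; i++)       /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

No format parsing, widths, precision or floating point are involved: just a divide-by-ten loop and a reversal.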

This is the essential difference...you're talking about trying to take
things out...suppress "libc" including this or that...there's redundancy
and you're trying "tricks" to reduce it...on the other hand, what I'm
talking about is starting with NOTHING and then only adding in what you
_STRICTLY_ need...avoiding redundancy is child's play with this reverse
paradigm...I mean, you're not going to, oops, I slipped with the keyboard
and "accidentally" typed in 3KB of needless functionality that the program
doesn't need...

And this essential difference means that however "optimising" your compiler
gets, it can always be beaten by an enterprising human being who knows what
they are doing...yes, I said "always" and I'm not backing down from
that...it's inherent and it's logically inevitable...

The problem with assembly language comes down to _one_ thing alone: it
takes longer to code...yes, writing your own "just what I need" version of printf
is not going to be as simple as "#include <stdio.h>" and just using the
"printf" in "libc"...

But, on the other hand, you only need do it _once_...amazing, really, how
people advocate "code re-use" and "libraries" with C coding and then seem
to completely forget about it for assembly coding...write your little
"value to hexadecimal ASCII" routine, write your little "floating-point"
routine and then place them into your _OWN_ "library" of useful
routines...then you only need write it once and can then call the library
routine any time you like...if you think of a way to improve it, then, fix
up the library routine and, just like C, re-compile with the new library of
improved routines...
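A sketch of the kind of "value to hexadecimal ASCII" routine meant here, written in C for readability (the name `u32_to_hex()` is made up); in assembly the loop body is just a mask, a table lookup and a 4-bit shift:

```c
#include <stdint.h>

/* Convert a 32-bit value to exactly eight uppercase hex digits plus a
   terminating NUL, so buf must hold 9 bytes. Digits are filled from the
   right, one 4-bit nibble at a time. */
static void u32_to_hex(uint32_t v, char buf[9])
{
    static const char digits[] = "0123456789ABCDEF";

    for (int i = 7; i >= 0; i--) {
        buf[i] = digits[v & 0xF];   /* lowest nibble -> one hex digit */
        v >>= 4;
    }
    buf[8] = '\0';
}
```

Once it sits in a personal library, every later program just calls it.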

Use of libraries - static or shared - is completely programming language
neutral, folks...why it's a clever idea for C is exactly the same reasoning
why it's clever for assembly language...it's NOT assembly language's fault
if people's brains suddenly turn to mush when programming assembly language
and they somehow "forget" about _GENERAL_ good programming practice of
code re-use, libraries and so forth...

All "libc" is, in fact, is someone else's "library" of useful routines
already written...

Indeed, take away "libc" and C's "portability" is a dubious claim...C minus
its "standard library" is _ONLY_ "CPU portable"...it gains "OS portable"
from the use of a library of "standard" routines (defined to be exactly the
same regardless of OS...the "printf" function is still the same form on
Windows or on Linux...the code inside it, mind you, is OS specific but the
"interface" to the function is "standardised" to be identical on all
OSes)...

And, in this regard, check out "HLA" by Randall Hyde (and its accompanying
"Art of Assembly" textbook :)...it's a "high-level assembler"...meaning
that it's an assembly language at its core but is supplemented by most of
the typical "creature comforts" from high-level languages...this includes a
C-style "standard library", which - along with HLA itself - has been ported
to Linux as well as Windows...if one is careful to use just the "standard
library" then you can re-compile an _assembly program_ on Linux or Windows
_without modifying the source code at all_ (being an assembly language,
it's not "CPU portable" - only does x86 code - but the standard library
provides "OS portable"...Linux and Windows are supported already and its
author has indicated enthusiasm to port it all to QNX, BeOS and so forth
too, time permitting)...indeed, this HLA compiler offers OOP support (not
available in C), has a syntax more akin to Ada or Pascal, nested procedures
(not available in C), iterators (not available in C), exception handling
support (used throughout the standard library...not available in C, though
it does appear in C++ :) and has a whole host of built-in macros for "if",
"while", "for", etc. that covers more than the handful of "flow control"
keywords offered by C...

Of course, this won't always be useful in all circumstances...if you need
portability to an iMac...well, they don't use x86 CPUs so this won't be
useful...BUT, if it meets your requirements, this HLA system offers support
from the lowest of the CPU's actual machine instructions up to support that
exceeds C...and you can select at which level you want to take things (so,
unimportant functions can be written in its more "high-level" mode...any
time-critical procedures can be written in assembly language :)...

Basically, you know how C compilers offer "inline ASM"? This HLA system is
kind of working in reverse in being an assembler that offers a degree of
"inline HLL"...of course, if you need more "high-level" than "low-level", C
still makes sense...but if the "bias" is in reverse, this HLA package could
better suit the requirements...

> > Or use of Linux in (near) real-time applications where the speed of
> > response is critical? And I've measured it directly..."int 80h" shaved off
> > half a second from an 8 second long run - repeatedly calling syscalls a few
> > million times - in comparison to an identical program doing the same but
> > with "libc"...and "sysenter" probably speeds things up even more
> > substantially...yes, I know...what's half a second? Well, the answer to
> > that _SOLELY_ depends on what the application needs to do...for a desktop,
> > already taking many seconds to load up X applications, it's nothing...but
> > for a real-time system that needs to react quickly, it's an eternity _too
> > long_, quite possibly...
>
> For *real* real time, Linux is not the way to go. The internal
> scheduling mechanism does not honor the interrupts and succeeding
> thread/process switching fast enough. This is a design decision
> to make much of the kernel internals non-reentrant.

Agreed; Linux is not meant to be "real-time" in its design...but then that
was the reason why I inserted the "(near)" above...but it could still be
"good enough", depending on what exactly it's attempting to be "near
real-time" with...you can also "tweak" the scheduling to a degree...it may
not be true "real-time" but it might still be sufficient...all depends what
the application specifically is...

Remembering that you might prefer Linux for its other assets..."free
software", "open source", etc....good real-time systems tend to be
_commercial_ good real-time systems...if you can "tweak" Linux to be "good
enough" to do the job, then you might just be able to knock off a few more
pennies from the manufacturing costs...reduce the price of the product a
few dollars...if the product was only a few dollars to begin with, then you
could just have made the product much more "affordable"...you sell more
units, you make more profits...

In an embedded system, you've got a specific configuration, you can afford
to be more "specific" in your coding...do we really need the software in a
Barbie doll to be "portable" to a VAX or a washing machine? If you take
advantage of these things, then you might be able to halve the RAM
requirements (so you're buying one less RAM chip or a cheaper smaller
capacity RAM chip...and that drops the manufacturing costs)...if you can
code it to perform well, then it's possible that you might not need an 8086
chip in the Barbie and a lesser 6502 or Z80 could equally do the job
because instead of swamping the problem with "hardware" (which _costs_ as a
solution), you swamp it with "software smarts"..."maximise" the usage of
what you've got (which can be "lesser" exactly because you're maximising
the usage of it)...

For example, it might not quite compare to the PC but there are "demos" out
there for the 1MHz 8-bit Commodore 64 which do simple "bump
mapping"...yeah, you know, "bump mapping"...the "remarkable" thing that
makes DOOM 3 such "hot property"...

Or, if you've got Gates' toy installed somewhere, you could try out the
following demos:

http://www.theproduct.de/

Which do quite a lot from a 64KB file that makes you re-think that Sir
Gates might have actually had a point when he said 640KB was more than
enough for anybody ;)...

> > I appreciate what you're saying...really...but you can't really
> > legitimately dive in here and hand out "ivory tower" advice that there is
> > never any need ever at all in any way to need to know or use anything other
> > than "libc"...ever...
>
> Did you run with statically linked libc or the shared one?

Neither; The "light libc" was just an example of something I might have
been doing...I'm not actually doing that, though...

Depending on requirements, I might not use any "libc" at all in some
circumstances...static other times...shared other times again...

Perhaps you're not getting my point here...what I'm exactly saying here is
that the _APPLICATION_ defines what's right, needed and required...it's a
"case by case" thing...there is no "generic" one-size-fits-all advice that
could apply to all situations...you might want to say "using libc is
usually a good idea" and I'd tend to agree, by and large, for most
non-trivial programs...but you can't say "always use libc"...it may or may
not be appropriate...the _APPLICATION_ defines what is and isn't the
correct thing to do, not some "generic theory" from some academic's
"textbook" being dictated _BLINDLY_ from an "ivory tower"...until you know
what the application is and what it requires, it's not possible for anyone
to be qualified to say what should or shouldn't be used...

I mean, this would be like saying "use a spanner" to any question a builder
might pose to you: "what should I use to bang in this nail?" / "use a
spanner"..."I need to plaster this wall, what would you recommend?" / "use
a spanner"..."I need to saw this piece of wood in two" / "use a
spanner"...realise that this same "knee jerk" one-size-fits-all advice of
"use libc" is just as silly and illogical a response...I'm not saying
"always" use something or "never" use something...I've used "libc" many a
time...I'd happily recommend its use where that makes sense...but _BLINDLY_
insisting on it hell or high water is just illogical and a touch silly,
really...

> > You don't know what the application is going to be for...you can't really
> > say what's right or wrong without first knowing about that...you could
> > have, indeed, just advised me, for example, to "use libc" when perhaps the
> > application I want to create _IS_ a light version of "libc"...such advice
> > is logically a little silly...you can't implement a "light libc" by
> > statically linking a non-light "libc" onto it...logical nonsense...
>
> Does your light libc produce less target code than the pieces of
> statically linked standard libc? If yes, what's the difference?

I have not actually coded a "light libc" in reality...it was just an
"illustrative example" in this case...but, trust me, if I were to do so,
then I'd happily take a bet that I could improve on it, assuming certain
assumptions could be made (such as "only needs to work on Linux on an x86
system" and so forth)...and, also, of course, being given a reasonable time
in which to do it (at least comparable to how long the standard "libc" has
had in its development)...

The "difference" is remarkably simple, really...I'd blitz the redundancy
with the one asset a compiler doesn't have: _UNDERSTANDING_ of what the
code is meant to do...and, with the "assumptions" of a particular platform,
to eradicate the redundancies introduced by "portability" concerns, where
not applicable (to code "specifically")...

And, note, I actually could not fail to equal the standard "libc", if I
don't better it...how can I be so confident? Simple, check out the "-S"
switch on a C compiler (it stops after generating assembly language)...I could do the following: Compile the standard
"libc" to assembly language, review that code for places where it could be
improved, make those changes, verify it is improved...if not, "roll back"
and look elsewhere for improvement...continue process until I've made as
many improvements as I can...

The compiler doesn't stand a chance in this contest, you see, because I can
_cheat_...the compiler is just a dumb program, in the end, so it cannot do
the same in return...I can _look_ at what the compiler produces, analyse it
against what is _strictly_ required...if the compiler has produced any
"redundancy" (and, trust me, they do...oh yes, they do...check a dictionary
if you don't understand that "optimising compiler" does NOT mean "optimal
compiler" ;), I blitz it out of there...and there are a number of
improvements possible _below_ the C level, which cannot even be expressed
in C, let alone implemented...

For example, many a compiler has "proprietary extensions" which exceed the
C standards...an interesting one is provision for some "_fastcall" calling
convention which passes function parameters in registers rather than via
the stack...as noted, if we can assume a _specific_ platform, then
"portability" isn't a concern...we can then make use of these "proprietary
extensions"...revise the header files to declare the functions "_fastcall"
rather than standard C convention (where recursion is not a concern...the C
standard convention applies this to all functions regardless...practices a
"one size fits all" generic solution...we can improve by simply tailoring
it to be _specific_ to each function, _specific_ to a platform, _specific_
to the fuller capabilities offered by the compiler)...
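On Linux/x86 with GCC, the closest equivalent of that `_fastcall` extension is the `regparm` function attribute (GCC also offers a `fastcall` attribute that mimics Microsoft's convention). A minimal sketch, assuming GCC or a compatible compiler; the `FASTCALL` macro and `add3()` are illustrative names:

```c
/* On 32-bit x86, regparm(3) passes up to three integer arguments in
   EAX, EDX and ECX instead of on the stack. Elsewhere (e.g. x86-64,
   whose ABI already passes arguments in registers) it expands to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define FASTCALL __attribute__((regparm(3)))
#else
#define FASTCALL
#endif

/* The "revised header" declaration: callers see the register convention. */
FASTCALL int add3(int a, int b, int c);

FASTCALL int add3(int a, int b, int c)
{
    return a + b + c;
}
```

Because caller and callee must agree on where the arguments are, the attribute has to appear on the declaration in the header, not just on the definition.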

For a fuller discussion on this topic, here's a set of essays on the
subject which highlight some of the kinds of things to consider:

http://webster.cs.ucr.edu/Articles/GreatDebate/index.html

> > In this case, I need to know how it works (note, yes, not necessarily use
> > it, that's true...but _know_ how it works) because it's part of a project
> > to develop an x86 assembler with the relevant Linux support files
> > included...even if I never use it, I still need the details of how it works
> > to make sure that the assembler supports it...yes, it will also support
> > "libc" and that's probably the best choice more generally...but, in this
> > case, the objective is to create a functioning tool which supports all the
> > options and not force programmers using it that anything but "libc" is
> > "verboten"...that decision is for the programmers using the tool to make,
> > not its authors...
>
> It's not strictly verboten (or förbjudet in Linus' home language,
> Swedish), but I tried to tell that you're risking a maintenance
> problem. I've had more than my fair share of them during the time
> I've been programming (since 1964).

Oh dear...we're "pulling rank" now, are we? I wasn't even born in 1964...

Nevertheless, perhaps I am risking a maintenance problem...but since when
did anything come for free in this life? I am risking a potential horrible
car crash every time I get into a car...I am risking being hit by that
proverbial bus every time I step into the road...

So, what do I do? Buy a lot of cotton-wool and wrap myself up in it? Oh
dear, the nearest store where I can buy food requires crossing a road or
two...and never using any transport (not just cars but trains, planes,
ships, etc.) ever again?

Let me ask this...no need to answer it publicly but at least answer it
honestly to yourself...since 1964, did all of your maintenance issues
instantly disappear the second you started using "libc"? This may be worse,
granted...but by how much? Let's say Linus completely alters the system
call interface every time the second digit of the version number changes
(2.4 -> 2.5 -> 2.6 and so on :)...this doesn't remotely actually happen, of
course (indeed, it is _just as stable_ an interface as any other)...but
let's assume it does...I have to change my "syscall" macro (anticipating
the "maintenance problem", you see, I have deliberately separated it off for
easier maintenance :) each time...this doesn't happen but were it to
happen, am I willing to pay that cost for the benefits accrued?
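The "syscall" macro itself isn't shown in the thread. A minimal sketch of the same idea in C, where one function is the only code that knows how a system call is made; the names `sys3()` and `my_write()` are hypothetical, and glibc's `syscall()` stands in for the `int 80h` / `sysenter` sequence so the sketch runs on any Linux architecture:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The ONLY function that knows how a system call is actually made.
   If the mechanism ever changed, this is the one place to edit. */
static long sys3(long nr, long a1, long a2, long a3)
{
    long r = syscall(nr, a1, a2, a3);
    return r == -1 ? -errno : r;   /* report failure kernel-style, as -errno */
}

/* Callers use thin wrappers like this and never touch the mechanism. */
static long my_write(int fd, const void *buf, size_t len)
{
    return sys3(SYS_write, fd, (long)buf, (long)len);
}
```

Returning `-errno` mirrors what the kernel itself leaves in EAX for a failed `int 80h` call, so error checks look the same whichever way the call is made underneath.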

Maybe...maybe not...but, again, there is no "one size fits all"
answer...your "libc" doesn't eradicate "maintenance nightmares"...ooh, far
from it...

Being a Linux guy, you might never have visited Windows Update...but
Microsoft's little "toy" has a website for updating it with "patches" and
such...this site could be legitimately renamed "the Microsoft buffer
overrun patch download site"...nearly every week almost, Microsoft are
issuing a "buffer overrun exploit" warning and a "patch" for it...indeed,
they recently had the glorious nightmare of their GDI+ having a "buffer
overrun exploit" in the _JPEG decompression_ routines...yeah, you could
take over a machine simply by web browsing...perhaps Microsoft's most
glorious cock-up because it's not just that this routine is monumentally
screwed up...it's that GDI+ is a "redistributable"...hence, there are
copies of GDI+ all over the place - on software CDs, the DLL file repeated
in numerous ZIP files all over the hard drive, etc. - and though good
software should be checking "version numbers", quite a few don't really do
the job as well as they should...hence, you could apply the "patch",
install some more programs then discover you need to "patch" it once
more...install some more programs...oh, crap...it needs to be "patched" yet
again! Okay, most programs use "InstallShield" and that standard
installation program has the sense to respect "version numbers" so not
every program you install is necessarily going to do it...maybe a
rarity...but, still, it's a possibility...a possibility that should never
have happened...

Because, of course, "buffer overrun exploit" code is a completely
_preventable_ problem...put limits on all your buffers and be careful what
you stick onto the stack and you can make sure you're not stung by this...

But the chief cause for all these "buffer overrun exploits" can be traced
to your precious "libc"...because in "libc" are a number of routines which
keep going until they hit a zero terminator (and nothing else stops
them)...all it takes is a programmer not aware or being lazy to feed some
kind of "external input" - a file, something off the internet, etc. - into
one of these "dangerous" routines and you're well on the way to getting a
"buffer overrun exploit" of your own in your code...

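Namely, "strcpy" (instead, use "strncpy" and set "n" to make it stop
within the bounds of your "buffer"...though note that "strncpy" does not add
the zero terminator when the source is too long, so set the last byte
yourself)..."strcat"...basically, all the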
routines that also have "n" counterparts...Ritchie went for zero
terminators because they were easy...not so much thought went into what
this meant for "buffer overruns"...there are some distinctly "dangerous"
routines happily sitting in "libc" that really should be ejected (but they
won't...good old "backwards compatibility" forcing us to live with our
mistakes 3 decades or so after we make them ;)...
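Two details are easy to get wrong with the "n" versions: `strncpy` stops after n bytes but leaves the result unterminated if the source was that long or longer, and `strncat`'s n counts the characters to append (it then always writes a terminator), not the total buffer size. A minimal C sketch of bounded wrappers; `copy_bounded()` and `append_bounded()` are illustrative names, not libc functions:

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity dstsize bytes) without overrunning it.
   strncpy does NOT terminate on truncation, so the last byte is set by
   hand. Returns 1 if the whole string fit, 0 if it was truncated. */
static int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 0;
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
    return strlen(src) < dstsize;
}

/* Append src to the zero-terminated string already in dst. strncat's
   limit is characters to append, so the space already used (plus one
   byte for the terminator) is subtracted from the capacity. */
static void append_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 < dstsize)
        strncat(dst, src, dstsize - used - 1);
}
```

With an 8-byte buffer, copying "overflowing" yields "overflo" and reports truncation instead of writing past the end.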

The amount of instances of "buffer overrun" - often directly attributable
to "libc" routines - has ended up as one of the most serious and persistent
"maintenance problems" there currently is...Windows is more "buffer
overrun" than it can be called an OS...I recently spotted a "bug report" to
the NASM developers list...someone had found a "buffer overrun" or two in
their C / "libc" source code...Zlib - used by most OSes, causing a minor
panic when discovered that didn't just include Windows but Linux and OS X
and such too - had a few nice examples in it...

And not only is this "maintenance nightmare" usually the result of calling
"dangerous" libc routines without thinking (directly _attributable_ to the
use of "libc"...with a strong likelihood that NOT using "libc" would have
resulted in the problem never happening), it's a _SERIOUS_ maintenance
problem...you can't easily excuse a typical "buffer overrun" as something
you can get around to fixing in a few versions' time...when found, there
usually results a minor "scramble" to get it corrected with a
"patch"...because a "buffer overrun" has the potential to allow an outside
intruder to compromise the entire program and even compromise the entire
system (if you're running as "root" when it hits then, boy, you could find
yourself in deep elephant dung up to your neck) and to do so with basically
_no obstacles_ in their way to doing so but feeding deliberately overly
large input with a nice little malicious design to overwrite the stack (the
saved return address included) with anything they desire and then they can sit back and wait for the CPU
to hit the "RET" instruction at the end of the procedure, gaining complete
control and execution from that point onwards..."libc" makes this problem
far too easily done and its consequences can be considered a very serious
"maintenance nightmare"...because, typically, you can't wait to fix
it...you need that "patch" quickly before some internet virus wipes out
every one of your users' machines...

And, though it could be considered highly "typical" of Microsoft and their
coding policies and practices, the amount of "buffer overrun" problems in
Windows is immensely worrying...they are issuing "patches" on a regular
basis...it's almost as if _every_ buffer in that several MB monster has an
"exploit" waiting to be discovered...you'd think they'd learn but, no, this
JPEG / GDI+ problem is quite recent...GDI+ was introduced with XP...they
aren't learning their lessons whatsoever...and unless you're a bit more
cautious with your "libc" recommendations, you could be accused of much the
same...

Tip: Put together a "variant" of your "libc" and leave out "strcpy",
"strcat" (retain "strncpy" and "strncat" instead :), then
re-compile...check out those "unresolved external" errors and see if you
had a "buffer overrun" or two that you missed...and then use this new
library and never use the original ever again...that'll force you to always
have to supply "n" for each and every buffer, reminding you not to
accidentally forget to limit those buffers...
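GCC (and Clang) can enforce this tip without building a separate library: `#pragma GCC poison` turns any later use of the named identifiers into a compile-time error. A minimal sketch; `greet()` is an illustrative function, and the system headers are included before the pragma so their own declarations are not affected:

```c
#include <stddef.h>
#include <string.h>

/* From here on, any use of these identifiers is a compile-time error. */
#pragma GCC poison strcpy strcat

/* Build "Hello, <name>" in dst (capacity n) using only bounded routines. */
static void greet(char *dst, size_t n, const char *name)
{
    if (n == 0)
        return;
    /* strcpy(dst, "Hello, ");   <- uncommenting this now fails to compile */
    strncpy(dst, "Hello, ", n - 1);
    dst[n - 1] = '\0';                        /* strncpy may not terminate */
    strncat(dst, name, n - strlen(dst) - 1);  /* room left, minus the NUL */
}
```

Unlike a trimmed-down library, the pragma catches the call in the very file where it is written.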

Maintenance problems are solved by _FORETHOUGHT_ and _UNDERSTANDING_, not a
library nor a programming language...the use of libraries has "cons" as
well as "pros", you know...bugs are caused by laziness, complacency, bad
logic, etc....things a compiler can't guarantee that it can catch and help
you with (logical errors are usually completely beyond the compiler's
ability to detect)...

And what I'm saying to you is, simply, _education_ is the key..._know_
where the problems come from, _understand_ how to implement - not only
use - "portability" features, _find out_ what that "hidden" extra code the
compiler adds actually is, etc., etc....

Put it this way...if a Frenchman walked up to you and said "French is a
superior language...people who speak French never make any mistakes", then
you'd rightly consider them a touch "blinkered"...well, why should always
using a particular programming language be any better than always using a
particular natural language?

Our problems do not lie in languages or libraries...nor in the syntactical
errors that compilers are really only equipped to catch...they lie in the
semantics and the logic and the habits of the programmers themselves...

By all means, use "libc"...but recognise that when they made out Java was
"the Second Coming", it was mostly hype...or when they made a big fuss of
the fashion of C++ OOP, it wasn't the panacea the hype promised...".NET" is
more disaster than anything else but you'll never hear that coming out of
Microsoft "PR", even if they themselves know it to be true...indeed, when
the TV commercials suggest that using a certain brand of shampoo will have
supermodels falling at your feet, be wise to be skeptical...use your "libc"
but _KNOW_ what you're actually getting yourself into...

Beth :)

Beth

Dec 22, 2004, 2:31:32 AM

Gary Kato wrote:
> Early mini and mainframes and micros used fixed memory addresses when
> things were simpler. The problem is that when the OS changed, you had to
> recompile everything (or patch the OS so the memory addresses were fixed).
> Then people get smart and figure out that there's a better way.

Yeah...and that's what I was saying...special OS calling instructions (or
using the interrupt mechanism to deal with this "relocation" issue) came
along _later_...the early stuff simply worked to fixed configurations and
was hard-wired that way (then again, the situation was different...the
hardware was typically fixed and the OS on a ROM chip, not in a disk
file...you didn't "recompile", you bought the next "model" up in the range
:)...

> >The "int" and "sysenter" instructions reside in the _CPU itself_...what are
> >you talking about here? Or do you mean what the instructions reference?
> >
>
> What I'm saying is that Linux allocates a page that just has an Int 80
> instruction in it and a page with just a Sysenter instruction in it. I believe
> these pages are what were being referred to as being able to make a system call
> at some high address. (Maybe)

Hmmm, maybe...yeah, I'm curious as to what that's all about too...can't
really think as to why Linux would want or need that...oh, I'm sure there
is a reason...but, like you, I'm at a complete loss to work out what it
could be...

> >No idea...although, I bet the answer is probably "compatibility" with
> >something for some reason, as that's usually the culprit for strange
> >looking code in 90% of cases ;)...
>
> Yes, I'm kind of curious about what that compatibility problem would be.

Well, I can't help you there...this seems as "mysterious" to me, as it
probably does to you...indeed, if you do find out, feel free to let me
know, if only to satisfy my curiosity ;)...

Beth :)

Kasper Dupont

Dec 22, 2004, 2:40:38 AM

Gary Kato wrote:
>
> This looks to be true (though I'm not sure of the address). The actual Int80
> and Sysenter instructions actually reside in their own page.

The reason for doing it that way is that glibc
can just call this address without having to know
whether int 80h or sysenter is to be used on the
particular machine.
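On current kernels that page survives as the vDSO, a small ELF image the kernel maps into every process (on i386 its `__kernel_vsyscall` entry is the stub that picks the trap instruction). A sketch assuming Linux and glibc 2.16 or later for `getauxval`: the auxiliary-vector entry `AT_SYSINFO_EHDR` carries the image's base address. The helper names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: base address of the vDSO the kernel mapped
   into this process, or 0 if none was reported. */
uintptr_t vdso_base(void)
{
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* The mapping is an ordinary ELF image, so it starts with the ELF magic. */
int vdso_looks_like_elf(void)
{
    uintptr_t base = vdso_base();
    if (base == 0)
        return 0;                       /* no vDSO reported */
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```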

>
> I was looking into all this yesterday (and still not fully understanding
> everything). I got thrown off the scent by the "syscallN" macros in unistd.h.
> These seem to hardcode an Int80h into the start of the system function. When I
> saw them, it was quite confusing. However I couldn't find anything that
> actually used them (except one line further down in the same file that used it
> to define execve). I commented them out and did a kernel build and sure enough,
> there were no complaints. The first time, I had commented out the execve one
> and the build issued warning messages just for that.
>
> So, those macros were just a false trail, except for execve. Now I'm wondering
> why execve is the only system call forced to use Int80, even on CPUs with
> sysenter. Is there a real issue here or did someone just forget to update it?

You will find that these macros also exist in
glibc, and that is the version used by user code.
So there are two copies, and maybe someone forgot
to update the version in the kernel. But the
glibc one is the one that really matters.
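From user code, glibc also offers a generic `syscall()` function for making a call by number, which leaves the choice of entry instruction to glibc. A minimal sketch assuming Linux and glibc (`_GNU_SOURCE` exposes the declaration); the wrapper name `raw_write` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: issue write(2) by number through glibc's
   syscall(). On failure it returns -1 and sets errno, just like
   the ordinary write() wrapper. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```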

Kernel code shouldn't make system calls. They are
used only in a few rare cases, like kernel_thread
calling fork, and execve being used to start init,
modprobe, and hotplug scripts. The execve call
isn't much of a problem though; I'm sure the
system call overhead is negligible compared to an
execve call.

--
Kasper Dupont

Kasper Dupont

Dec 22, 2004, 2:44:41 AM

Beth wrote:
>
> the mechanisms are
> really dedicated to hardware IRQs primarily but Intel added "software
> interrupts" to re-use its "run-time relocation" stuff for BIOS / OS
> functions)...

OK, that explains why the instruction isn't really optimized.

> > With some kernel versions, you can just call
> > some high address (0xFFFFF000 I think), where
> > the kernel will have placed an appropriate
> > trapping instruction for your configuration.
>
> Another way again?

Actually not. It just calls some code that will
use the right one of the two possibilities. The
code at the called address also executes as
user mode code.

>
> To be honest, I was always rather surprised that Linux used "INT",
> anyway...this is usually considered to be "not recommended" in a
> multi-tasking system and not the speediest way to do it (though, it might
> be the _easiest_ way to do it...so perhaps Linus was simply thinking of how
> to get it done quickly, not what would necessarily be the "leanest and
> meanest" performance-wise)...

What other options were there when Linux was
originally designed for the 386?

--
Kasper Dupont

Lawrence D'Oliveiro

Dec 22, 2004, 3:40:08 AM

In article <xs_xd.542$657...@newsfe1-win.ntli.net>,
"Beth" <BethS...@hotmail.NOSPICEDHAM.com> wrote:

>From an assembly language point of view (this part I do know a little
>better :), the "sysenter" has better performance because it's a CPU
>instruction deliberately introduced to improve user -> kernel space
>transitions...

I doubt that it would save very much. The problem of communication
between userland and the kernel isn't exactly a new one. I think
experience has shown that the more elaborate mechanisms simply aren't
worth using. For instance, the 286 introduced "call gates", which were
supposed to be direct entry points to privileged functions, where the
hardware could take care of the work of dispatching to the correct
function, and I think even some of the argument passing and validation.
And that feature is almost certainly still present in today's Pentium
processors, but apart from OS/2 I don't think any operating system was
ever able to find a use for it. The hoary old "int" mechanism turned out
to be more efficient and more flexible.
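To make the call-gate idea concrete: on the 386 a 32-bit call gate is an 8-byte descriptor holding the target code selector, the 32-bit entry offset, a privilege level, and a count of doublewords the CPU copies from the caller's stack to the more privileged stack. A sketch of the encoding in C, following the layout in Intel's manuals; `make_call_gate` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 386 (32-bit) call gate descriptor.
   offset   - entry point within the target code segment
   selector - code segment selector of the privileged routine
   params   - dwords copied from the caller's stack (0..31)
   dpl      - least privileged ring allowed to call through the gate */
uint64_t make_call_gate(uint32_t offset, uint16_t selector,
                        unsigned params, unsigned dpl)
{
    assert(params < 32 && dpl < 4);

    uint64_t lo = ((uint64_t)selector << 16) | (offset & 0xFFFFu);
    uint64_t hi = (offset & 0xFFFF0000u)      /* offset bits 31..16 */
                | (1u << 15)                  /* P: present */
                | ((uint64_t)dpl << 13)       /* DPL */
                | (0xCu << 8)                 /* type 1100b, S = 0 */
                | params;                     /* parameter count */
    return (hi << 32) | lo;
}
```

A `CALL` through a selector that names such a gate switches rings without the caller doing anything different from an ordinary far call.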

Lawrence D'Oliveiro

Dec 22, 2004, 3:51:46 AM

In article <20041221204608...@mb-m04.aol.com>,
gary...@aol.com (Gary Kato) wrote:

>I think using "Int", a software interrupt, was a pretty standard way of making
>system calls before there was such a thing as the 8086. Not many CPUs had a
>special instruction like "sysenter". Maybe the VAX did. I'd have to look that
>up.

The VAX had 4 instructions specifically for invoking "privileged"
functions: CHMK, CHME, CHMS and CHMU (change mode to kernel, executive,
supervisor and user respectively). Of course user mode was not
"privileged"; the instruction was provided for completeness, and
user-mode code could install a handler for it if it wanted. Each
instruction took a signed 16-bit argument, though the meaning of this
value was up to the exception handler. VMS used it as an identification
of the particular function that was being invoked.

Kernel mode was the most privileged mode. It was the only mode allowed
to run at a nonzero interrupt priority level. Exec mode was used for
less-privileged, but still trusted, services, mainly RMS (the layer
responsible for imposing record structures on block i/o devices, and
also for doing most of the parsing of filename paths). Supervisor mode
was used by the CLI. Unlike UNIX/Linux, user-mode programs mostly ran in
the same process context as the CLI. After the user program exited, the
system executed a "rundown" procedure which basically deleted all the
user-mode context (including closing all files opened in user mode),
which ended in control being returned to the CLI.

Jerry Peters

Dec 22, 2004, 12:42:57 PM

Gary Kato <gary...@aol.com> wrote:
> >> I think using "Int", a software interrupt, was a pretty standard way of
> >> making system calls before there was such a thing as the 8086.
> >
> >Other way around; Fixed memory maps, fixed OS addresses on fixed ROM chips
> >with fixed hardware configurations...that's how things used to be...using
> >"trap" and "int" - what Intel call "software interrupts" and Motorola
> >prefer to call "traps" - came along later...
>
> No. CP/M-80 used the RST instruction for system calls. I'm pretty sure the IBM
> 360 and DEC-10 also used similar instructions for OS calls. Early mini and
> mainframes and micros used fixed memory addresses when things were simpler. The
> problem is that when the OS changed, you had to recompile everything (or patch
> the OS so the memory addresses were fixed). Then people get smart and figure
> out that there's a better way.
>

IBM uses SVC, supervisor call. It's a 2 byte instruction with the
second byte conveying the routine number to call. There's also a newer
mechanism called PC, program call, that at least OS/390 and z/OS are
starting to use.

Jerry
----snip-----

Tauno Voipio

Dec 22, 2004, 2:44:47 PM

Beth wrote:

> Tauno Voipio wrote:
>
>>You'd say that you're trying to develop a replacement libc
>>and tell why you cannot use the standard one.
>
>
> Not necessarily; I might consider it none of your business...
>

>>In the static libc you'll get what you're really using - you
>>will need that much code anyway.
>
>
> How do you know I need that much code?

Properly called, the static libc takes just those functions you really
need. There is a minimum overhead for running even a null statement
under Linux - you cannot escape that.

Here you're comparing apples to orange juice - it is not fair to
compare formatted level 2 buffered I/O to direct syscalls.

To compare, I coded your example in C:

/* wrt.c - greeting */

#include <unistd.h>

#define msg "Hello, world\n"

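int main(void)
{
    /* sizeof on a string literal counts the terminating NUL, so this
       writes 14 bytes (the push $0xe below), NUL included;
       sizeof(msg) - 1 would write just the 13 visible characters. */
    write(STDOUT_FILENO, msg, sizeof(msg));
}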

/* Translated and statically linked, system entry/exit
deleted.

080481c0 <main>:
80481c0: 55 push %ebp
80481c1: 89 e5 mov %esp,%ebp
80481c3: 83 ec 08 sub $0x8,%esp
80481c6: 83 c4 fc add $0xfffffffc,%esp

80481c9: 6a 0e push $0xe ; count
80481cb: 68 68 b6 08 08 push $0x808b668 ; -> string
80481d0: 6a 01 push $0x1 ; handle
80481d2: e8 59 3f 00 00 call 804c130 <__libc_write>

80481d7: c9 leave
80481d8: c3 ret

----

0804c130 <__libc_write>:
804c130: 53 push %ebx
804c131: 8b 54 24 10 mov 0x10(%esp,1),%edx
804c135: 8b 4c 24 0c mov 0xc(%esp,1),%ecx
804c139: 8b 5c 24 08 mov 0x8(%esp,1),%ebx
804c13d: b8 04 00 00 00 mov $0x4,%eax
804c142: cd 80 int $0x80
804c144: 5b pop %ebx
804c145: 3d 01 f0 ff ff cmp $0xfffff001,%eax
804c14a: 0f 83 10 54 00 00 jae 8051560 <__syscall_error>
804c150: c3 ret
*/

This is not very far from your example. Admittedly, there is one level
of parameter transfer, but it is peanuts compared to what happens to the
syscall inside the kernel.

Besides, it does test for the syscall return value, but your example
ignores the return value - a definitely non-recommended practice.
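The `cmp $0xfffff001,%eax` / `jae` pair in the listing is that test: the raw kernel return is a negated errno in the range -4095..-1, which viewed as an unsigned 32-bit value is at least 0xfffff001. A sketch of the same check in C; `decode_syscall_result` is a hypothetical name, and it mirrors the idea of glibc's `__syscall_error` rather than copying it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: turn a raw kernel return value into the libc
   convention. Values in [-4095, -1] mean failure: store the positive
   error number in errno and return -1. Anything else is a result. */
long decode_syscall_result(long raw)
{
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```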

This is pretty much the same reasoning that led to the creation of
PL/360 for the IBM System/360s in the late 1960s. However, it
never really caught on. (And this is *not* PL/1 which is a totally
different software beast).

>>For *real* real time, Linux is not the way to go. The internal
>>scheduling mechanism does not honor the interrupts and succeeding
>>thread/process switching fast enough. This is a design decision
>>to make much of the kernel internals non-reentrant.
>
> Agreed; Linux is not meant to be "real-time" in its design...but then that
> was the reason why I inserted the "(near)" above...but it could still be
> "good enough", depending on what exactly it's attempting to be "near
> real-time" with...you can also "tweak" the scheduling to a degree...it may
> not be true "real-time" but it might still be sufficient...all depends what
> the application specifically is...

That does not close C out of real-time programming. I've written
real-time software since the early 1970s, and for the last 15 years nearly
entirely in C, except for things that have to be coded in assembler.

After transferring 40 kbytes of real-time C code from 80x86 to ARM
in one week successfully, I'd not frown on the portability issue. The
time included re-coding the real-time kernel in assembler.

> Which do quite a lot from a 64KB file that makes you re-think that Sir
> Gates might have actually had a point when he said 640KB was more than
> enough for anybody ;)...

I do not need to be reminded of the circumstances when memory is
a scarcity.

Try squeezing a BASIC compiler into a computer with
just 4 kilowords of word-addressed storage. I did.

> I have not actually coded a "light libc" in reality...it was just an
> "illustrative example" in this case...but, trust me, if I were to do so,
> then I'd happily take a bet that I could improve on it, assuming certain
> assumptions could be made (such as "only needs to work on Linux on an x86
> system" and so forth)...and, also, of course, being given a reasonable time
> in which to do it (at least comparable to how long the standard "libc" has
> had in its development)...

Aha. But you just told me that it is the case.

> The "difference" is remarkably simple, really...I'd blitz the redundancy
> with the one asset a compiler doesn't have: _UNDERSTANDING_ of what the
> code is meant to do...and, with the "assumptions" of a particular platform,
> to eradicate the redundancies introduced by "portability" concerns, where
> not applicable (to code "specifically")...

Maybe a short trip to compiler theory would be in order.

As far as I see, it is pretty difficult and error-prone to
out-smart a good optimising compiler, especially on a RISC platform.

> The compiler doesn't stand a chance in this contest, you see, because I can
> _cheat_...the compiler is just a dumb program, in the end, so it cannot do
> the same in return...I can _look_ at what the compiler produces, analyse it
> against what is _strictly_ required...if the compiler has produced any
> "redundancy" (and, trust me, they do...oh yes, they do...check a dictionary
> if you don't understand that "optimising compiler" does NOT mean "optimal
> compiler" ;), I blitz it out of there...and there are a number of
> improvements possible _below_ the C level, which cannot even be expressed
> in C, let alone implemented...
>
> For example, many a compiler has "proprietary extensions" which exceed the
> C standards...an interesting one is provision for some "_fastcall" calling
> convention which passes function parameters in registers rather than via
> the stack...as noted, if we can assume a _specific_ platform, then
> "portability" isn't a concern...we can then make use of these "proprietary
> extensions"...revise the header files to declare the functions "_fastcall"
> rather than standard C convention (where recursion is not a concern...the C
> standard convention applies this to all functions regardless...practices a
> "one size fits all" generic solution...we can improve by simply tailoring
> it to be _specific_ to each function, _specific_ to a platform, _specific_
> to the fuller capabilities offered by the compiler)...

For embedded code, IMHO, the way to go is to write the code in C,
check the generated assembly code, and change the way the thing
is written if the result is not satisfactory.

My experience is that modern compilers and the correct selection of
optimisation produce good code with far fewer errors and much less lost
time than the equivalent in assembler code. Besides, an assembly
language program of any useful size is a behemoth to maintain
after years - and it is something that HAS to be done for any
professional program.

> Let me ask this...no need to answer it publicly but at least answer it
> honestly to yourself...since 1964, did all of your maintenance issues
> instantly disappear the second you started using "libc"? This may be worse,
> granted...but by how much? Let's say Linus completely alters the system
> call interface every time the second digit of the version number changes
> (2.4 -> 2.5 -> 2.6 and so on :)...this doesn't remotely actually happen, of
> course (indeed, it is _just as stable_ an interface as any other)...but
> let's assume it does...I have to change my "syscall" macro (anticipating
> the "maintenance problem", you see, I have deliberately separated it off for
> easier maintenance :) each time...this doesn't happen but were it to
> happen, am I willing to pay that cost for the benefits accrued?

Well - no, but there are ways to make the monsters smaller,
beforehand. Experience is a good tool there.

You're getting off topic here.

libc is a multi-layer thing. At least I was talking of the
lowest layer offering direct kernel interface. All your examples
are for code on the upper layers - even with libc, you do not
need to use them if you do not like them.

> Namely, "strcpy" (instead, use "strncpy" and set "n" to make it terminate
> within the bounds of your "buffer")..."strcat"...basically, all the
> routines that also have "n" counterparts...Ritchie went for zero
> terminators because they were easy...not so much thought went into what
> this meant for "buffer overruns"...there are some distinctly "dangerous"
> routines happily sitting in "libc" that really should be ejected (but they
> won't...good old "backwards compatibility" forcing us to live with our
> mistakes 3 decades or so after we make them ;)...

If you coded according to the instructions, your network
code would use the number-limited routines.

For reference, see e.g. the W. Richard Stevens' book on
Unix network programming.

Happy coding, and do not invite more trouble than you
think you can handle.

Beth

Dec 22, 2004, 7:43:46 PM

Kasper Dupont wrote:
> Gary Kato wrote:
> > This looks to be true (though I'm not sure of the address). The actual
> > Int80 and Sysenter instructions actually reside in their own page.
>
> The reason for doing it that way is, that glibc
> can just call this address without having to know
> if int 80h or sysenter is to be used on the
> particular architecture.

Ah, yes, very clever...then glibc can produce the same code whichever is in
operation because it's "deferred", so to speak, to whatever is in this
page...

Although, for those asking what one could do to improve on "libc" (the
other aspect of the thread that has evolved), surely this jumps out as
immediately one potential candidate? If, as I said, we can take some things
for granted...

Determine "by other means" whether the architecture uses "int 80h" or
"sysenter" and then use the instructions directly, not indirectly via this
page (which is using up at least 4KB for a one- or two-byte instruction, as
well as the performance hit of constantly having to make the calls _via_
this page all the time)...
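One such "other means" is asking the CPU: CPUID leaf 1 reports SYSENTER/SYSEXIT support in EDX bit 11 (the SEP flag), and Intel's manual adds that parts with a processor signature below 0x633 (early Pentium Pro) set the bit without supporting the instructions. A sketch assuming GCC or Clang on x86 (`<cpuid.h>`). It only says what the CPU can do, not which entry path the running kernel uses, and 64-bit Linux programs use the separate `syscall` instruction anyway.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Returns 1 if the CPU advertises usable SYSENTER/SYSEXIT, else 0. */
int cpu_has_sysenter(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                       /* CPUID leaf 1 unavailable */
    if (!(edx & (1u << 11)))
        return 0;                       /* SEP flag clear */
    /* Intel's documented check: signatures below 0x633 (early
       Pentium Pro) report SEP but lack the instructions. */
    if ((eax & 0x0FFF3FFFu) < 0x633u)
        return 0;
    return 1;
}
#else
int cpu_has_sysenter(void) { return 0; }  /* not an x86 CPU */
#endif
```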

Indeed, here's a good use for "shared libraries"...you could create two
versions that are otherwise identical, except that one uses "int 80h"
throughout, the other uses "sysenter"...when a program starts up, it can
determine whether the system is using "int 80h" or "sysenter" and load in
the appropriate shared library...as the libraries are otherwise identical
and have the same "interface" to the application, then both libraries can
be used identically (indeed, if you like, once the shared library is
loaded, it can use the functions "transparently" to whether the system is
"int 80h" or "sysenter")...

A standard "libc" couldn't make these kinds of assumptions because it must
remain true to the "standards" (and nowhere in any standard does it talk of
variant shared library versions or any x86-specifics like "sysenter" but
then they can't because the standards must apply equally to all systems
regardless)...but, depending on your situation, you might be able to make
such mild, minor "deviations" from the standards here and there, which
could make useful differences...

Someone asked how assembly language could make a difference to improve
"libc"...well, the answer is, in a sense, "attitude"...no, cutting off some
"call overhead" in one place doesn't make a massive difference (though, in
some situations, you'd be surprised just how NOT insignificant some
"overhead" actually is)...neither does using libraries in a certain
way...neither does changing an algorithm...the point is that _ALL OF THESE
TOGETHER_ (and a lot more besides) will make a significant
difference...it's the attitude of "good enough isn't"...

Beth :)

Beth

Dec 22, 2004, 9:14:44 PM

Kasper Dupont wrote:
> Beth wrote:
> > the mechanisms are
> > really dedicated to hardware IRQs primarily but Intel added "software
> > interrupts" to re-use its "run-time relocation" stuff for BIOS / OS
> > functions)...
>
> OK, that explains why the instruction isn't really optimized.

Yeah, with the x86 architecture, I don't think Intel were seeing much of a
future in it at the design stage because it suffers from a number of quite
horrible "hacks"...the most infamous being the "real-mode addressing" used
before protected mode was added...addresses are calculated as: Address =
(Segment * 16) + Offset...which leads to the rather horrible situation that
0000:0400h, 0040:0000h and 0030:0100h all point to the _SAME_ memory
address, despite being numerically dissimilar...
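The formula is easy to check by hand or in code. A small C sketch of real-mode translation; the function name is hypothetical, and the 1 MB wrap reflects the 8086's 20 address lines (later CPUs only wrap when the A20 line is disabled).

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
   The 8086 had 20 address lines, so the sum wraps at 1 MB. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

All three example pairs land on linear address 0x400: 0x0000 * 16 + 0x400, 0x0040 * 16 + 0, and 0x0030 * 16 + 0x100.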

There's no way a self-respecting engineer would have introduced this "hack"
if they had actually known at the time that this chip would have a
_30 year_ lifespan...and become the architecture used in 90+% of machines
world-wide...but they didn't know, of course, so one cannot blame them;
still, the original architectural design is hardly ideal...

Plus, the x86 started out as a simple 16-bit microprocessor...no MMU, no
"protections"...these were added with "protected mode" but, well, this was
a later addition...the point being that until you have such "protected
mode" operations added to the CPU, it would have no such thing as "user"
and "supervisor" modes to begin with, anyway, for an instruction like
"sysenter" to even make sense...

So, yeah, as noted, it's not really an "OS calling mechanism" as such, but
more of a "piggyback" on the interrupt system (needed for hardware IRQs and
such)...the interrupt system already had to deal with "run-time relocation"
(by not calling an address directly but calling it _indirectly_ through a
table of addresses...an "Interrupt Vector Table" (IVT)...later renamed
"Interrupt Descriptor Table" (IDT) to distinguish its protected mode
equivalent, as addressing works completely differently in protected mode
and required a different format table)...and then CPU exceptions also
re-used this same system (in a sense, the CPU exception being more or less
a form of "hardware IRQ" sent from the CPU to itself...an "internal" IRQ
;)...so, it made sense to also add on the "INT" instruction to allow
software to also generate "interrupts" on request and its mechanisms could
be "re-used" to also serve for BIOS / OS functions (which could benefit
from being indexed via a "table of addresses" so that newer versions of the
OS could relocate its routines elsewhere, just change the addresses in the
table but programs calling via the table don't need to be recompiled)...
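In real mode that table is simply the first 1 KB of memory: 256 four-byte entries, each holding a handler's offset followed by its segment (little-endian), so vector n sits at linear address 4 * n (INT 21h at 0x84, INT 80h at 0x200). A sketch that reads and rewrites entries in a simulated memory image; the image and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read vector n from a real-mode IVT image (first 1024 bytes of memory)
   and return the handler's linear address (ignoring the 1 MB wrap). */
uint32_t ivt_handler(const uint8_t mem[1024], uint8_t n)
{
    const uint8_t *e = mem + 4u * n;              /* entry address */
    uint16_t offset  = (uint16_t)(e[0] | (e[1] << 8));
    uint16_t segment = (uint16_t)(e[2] | (e[3] << 8));
    return ((uint32_t)segment << 4) + offset;
}

/* What an OS does to "relocate" a service: rewrite the table entry.
   Callers keep executing INT n and never notice. */
void ivt_set(uint8_t mem[1024], uint8_t n, uint16_t segment, uint16_t offset)
{
    uint8_t *e = mem + 4u * n;
    e[0] = offset & 0xFF;  e[1] = offset >> 8;
    e[2] = segment & 0xFF; e[3] = segment >> 8;
}
```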

> > > With some kernel versions, you can just call
> > > some high address (0xFFFFF000 I think), where
> > > the kernel will have placed an appropriate
> > > trapping instruction for your configuration.
> >
> > Another way again?
>
> Actually not. It just calls some code that will
> use the right one of the two possibilities. The
> code on the called address is also executing as
> user mode code.

Yeah, from the description given elsewhere, I think I get the idea...the
application just calls the page and then it's either filled with "int 80h"
or "sysenter", depending on what the system is configured to use...thus,
it's "transparent" to the application which one is in use...kind of clever
but also kind of wasteful...

> > To be honest, I was always rather surprised that Linux used "INT",
> > anyway...this is usually considered to be "not recommended" in a
> > multi-tasking system and not the speediest way to do it (though, it might
> > be the _easiest_ way to do it...so perhaps Linus was simply thinking of
> > how to get it done quickly, not what would necessarily be the "leanest and
> > meanest" performance-wise)...
>
> What other options were there when Linux was originally designed for the
> 386?

There are a few options available, in fact...but the most sensible to my
mind would have been a plain, simple "CALL" instruction...the protected
mode architecture (which actually came in with the '286 originally but the
'386 made it 32-bit and, well, actually useful :) also allows a change of
"privilege levels" and such with a "call gate", as well as through
interrupts...

Indeed, the most obvious alternative option is much the same way that
shared libraries work...a program calls the OS indirectly through a
"table", which can be constructed by the OS loader...

As a simple example, when a process loads, a "jump table" of addresses of
all the syscalls could be provided to the application...say, just as an
example, the EAX register holds the address of the start of this "jump
table"...the application can then save this address away in a variable and
use instructions of the form "CALL [ TableAddress + (syscallnumber * 4)]"
('386 addressing modes are powerful enough that this operation is a single
machine instruction) to index into that "jump table" and use it to call the
OS system functions...the OS loader itself constructs the table - so it can
fill out the table with the addresses dynamically and it can happily change
addresses from version to version - and the application is simply compiled
to make calls via the table...
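A C model of that scheme, with an array of function pointers standing in for `CALL [ TableAddress + (syscallnumber * 4)]`. Everything here is hypothetical (the call numbers, the routine names, the "loader" that fills the table); it only illustrates why callers need not be recompiled when the table's contents change.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*sysfn)(long a, long b, long c);

/* Hypothetical OS routines; os_add_v2 stands in for a relocated copy. */
static long os_add(long a, long b, long c)    { return a + b + c; }
static long os_add_v2(long a, long b, long c) { return c + b + a; }
static long os_mul(long a, long b, long c)    { return a * b * c; }

enum { CALL_ADD = 0, CALL_MUL = 1, NCALLS = 2 };

/* The "loader" fills in the table; a newer OS may store other addresses. */
static sysfn table[NCALLS];

void loader_init(int newer_os)
{
    table[CALL_ADD] = newer_os ? os_add_v2 : os_add;
    table[CALL_MUL] = os_mul;
}

/* The program only knows call numbers: this is CALL [table + n*4]. */
long os_call(int n, long a, long b, long c)
{
    assert(n >= 0 && n < NCALLS && table[n] != NULL);
    return table[n](a, b, c);
}
```

The same caller code works unchanged after `loader_init(1)` swaps in a different routine.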

The other point about using "CALL" is that it can be set up to make a
simple call to other user mode code or it can be set up with a "call gate"
in order to trigger a user -> kernel transition...and this would be
"transparent" to the calling application (the difference lies in the MMU
tables, not in the actual instructions an application uses)...

This also tends to make sense from a "micro-kernel" architecture
point-of-view (you still use the exact same "indirect CALL" instruction,
whether you're calling other user mode code or actually calling into kernel
space...thus, the system functions need not all be in kernel space...you
could even conceivably move a "monolithic" kernel slowly towards a
"micro-kernel" - reduce the kernel but still provide all the same functions
that were available when "monolithic" with user mode equivalents - and do
it quite "transparently")...

As well as even making much more sense from the UNIX / C side of things in
that these "CALL" instructions could directly be C convention calls...as
Linux itself is written in C, the process is often quite bizarre from a
"big picture" point of view...call into "libc", loads stack into registers,
calls "int 80h", "int 80h" takes parameters and puts them on stack in order
to call internal C functions...it's moving the parameters around all over
the place and making calls to calls and such...quite a bit of "overhead"
attached (and, no, it's not that this "overhead" is unacceptably slow or
large but it's just not really necessary...why do something that there's no
actual need to be doing all the time?)...
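The kernel-side half of that shuffle can be modelled in a few lines. On i386 Linux the call number arrives in EAX and up to six arguments in EBX, ECX, EDX, ESI, EDI and EBP, in that order, with the result returned in EAX. The struct, table and handlers below are a simplified, hypothetical model, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified snapshot of the registers at the moment of "int 80h". */
struct regs { long eax, ebx, ecx, edx, esi, edi, ebp; };

typedef long (*syscall_fn)(long, long, long, long, long, long);

/* Hypothetical handlers. */
static long sys_first(long a, long b, long c, long d, long e, long f)
{
    (void)b; (void)c; (void)d; (void)e; (void)f;
    return a;
}
static long sys_sum6(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;
}

static const syscall_fn call_table[] = { sys_first, sys_sum6 };
#define NR_CALLS (sizeof call_table / sizeof call_table[0])

/* Move the registers into C arguments and store the result in EAX;
   unknown numbers yield -ENOSYS, as on Linux. */
void dispatch(struct regs *r)
{
    if (r->eax < 0 || (size_t)r->eax >= NR_CALLS) {
        r->eax = -ENOSYS;
        return;
    }
    r->eax = call_table[r->eax](r->ebx, r->ecx, r->edx,
                                r->esi, r->edi, r->ebp);
}
```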

The '386 supported an "indirect CALL" (with a "call gate", if you needed to
switch privilege levels during that call to jump into "kernel
space"...quite "transparent" to the caller that this is happening too) and
could have used a "jump table" kind of call from the beginning...indeed,
being NOT greatly dissimilar from the mechanism used for "shared libraries"
(the difference being that the OS loader constructs the "jump table"
dynamically, according to entries in the executable header about what
libraries and functions need to be loaded and "imported" into this "jump
table"), this could also have been made more "generic" and re-used for that
too (that, so to speak, every process automatically has the kernel loaded
"as if" it's a shared library when it starts)...

This system was certainly possible at the time because Microsoft were
already using it for Windows (and though Windows has a great many faults,
this is one place where they appear to have got it right...well, they're
using the right _mechanism_, anyway...BUT, as typical Microsoft, it's
rather wasted and spoilt with other strange "stdcall" conventions and an
insistence on 500 system APIs when 3 would do the job just as well)...

Indeed, I've had to look into this for a project and, ideally, I'd use a CALL via a
"jump table" with parameters in registers (where "libc" can provide C
convention "parameters via the stack" wrapper functionality around this for
"portability"...but where performance is more important than "portability",
an application can ignore this and use "parameters in registers"
directly)...this gives reasonably efficient performance while not actually
compromising flexibility, "portability" concerns (indeed, if you're not
fussed to support any lower-level "parameters in registers" interface, you
could make these calls actual C convention calls directly and then they are
called exactly like "libc" functions), "transparency" for "transitions"
between user and kernel space and so on and so forth...this method would
generally meet most requirements simultaneously...as close to a "one size
fits all" solution as is possible...it actually would be arguably _simpler_
and has potential for "re-use" for shared libraries too (the OS loader does
already create such a "jump table" for shared libraries...merely "allocate"
the first entries in the table to the system calls, "as if" these system
calls were from an implied "kernel shared library"...because these are
"implied", there's no need to list them in an executable header or
anything...to account for "future expansion", perhaps it's best to have one
"jump table" per "shared library" so that if the number of system calls
goes up, this doesn't "clash" with anything)...

Linus, though, didn't have "shared libraries" to begin with...and, well,
probably didn't much care about "performance" to begin with, rather than
just getting it up and running...now with Intel introducing "sysenter", the
question might be academic, as the dedicated instruction no doubt performs
better than any other means and should now be preferred...but, for the
original '386, there certainly were other options..."int" isn't the only
way...I think, simply, "int" was the most obvious and simplest way when
Linus first started (also familiar from DOS doing it this way too) and once
he decided on that way, you have to keep with it on "compatibility"
grounds...not that he did anything "wrong", so to speak, but there were
other possibilities available which could have performed slightly better
(and even given the kernel itself a "libc" style calling mechanism right at
the "core", feeding directly into the C functions of the kernel itself, as
the OS was written mostly with C itself :)...

Beth :)

Beth

Dec 22, 2004, 11:15:00 PM

Lawrence D'Oliveiro wrote:

> Beth wrote:
> >From an assembly language point of view (this part I do know a little
> >better :), the "sysenter" has better performance because it's a CPU
> >instruction deliberately introduced to improve user -> kernel space
> >transitions...
>
> I doubt that it would save very much.

Perhaps, but it should save _something_ because Intel introduced this
instruction _specifically_ to improve this situation...and once the CPU has
the facilities built into it, it seems rather odd not to use them (mind
you, you could do as Mr. Gates did and not bother to make proper use of the
'386 facilities - introduced in 1986 / 87 - until nearly a decade later
with Windows 95, making a big "hype" of its "32-bit capabilities", when
they _COULD_ have used them nearly a decade earlier...indeed, they _DID_
use them for OS/2 and at least moved towards it with NT...but the 3.x / 9x
range of Windows carried on acting as if the '386 hadn't been invented for
nearly a decade)...

I appreciate what you're saying but this kind of argument seems odd to
me...it's not really how much it saves, it's the fact that it _is_ saving
something...if it's there and relatively easily used, then even if it's
only a small saving, why not take it? If you go into a supermarket and
there's a small "discount" on a product you want, then do you take it to
the checkout but insist on paying the non-discounted price because "it's
not a large saving"? "10% off!" / "No, I don't want it because 10% isn't a
large enough saving...I'll only consider taking your discounts when they
reach 25% off...I insist on paying the full non-discounted price until you
offer at least 25% off"...maybe I'm missing something but this seems like
strange logic to me...if the offer is there and there's nothing
prohibitively difficult about taking advantage of it, then why not take it?

And, regardless, Linus clearly disagrees with you because he _has_
accounted for "sysenter" (would actually be rather ungrateful when, so I
hear in the "gossip", Intel _were_ thinking of OSes like Linux in
introducing it :)...

> The problem of communication
> between userland and the kernel isn't exactly a new one.

Of course not...it's not that the problem is "new" in general but more that
Intel haven't really done a great deal about it over the years (performance
never was the strong point in the designs, as the instructions - when
switching from userland to kernel - are an order of magnitude slower than anything
else...of course, part of that is because it's an inherently complicated
operation...but, really, Intel have only turned their attentions to
_optimising_ it a bit more with "sysenter"...they tend to "concentrate" on
one area at a time and "multi-media extensions" got the attention
first...then they looked to see if they could improve this some and
introduced "sysenter")...

> I think experience has shown that the more elaborate mechanisms simply
> aren't worth using.

Well, "more elaborate" suggests CISC-like instructions and RISC has
proved itself to be generally the wiser choice (indeed, even with
a CISC instruction set, the x86 is now RISC-like at its "core"...just
maintains the older CISC style for "compatibility"...it just breaks down
the bigger CISC instructions like "ENTER", "MOVS" into smaller instructions
internally...and you get equally good performance from using "ENTER" as from
using the simpler "MOV" and "PUSH" instructions instead to do the same job
"manually" - "ENTER"'s advantage being that it takes fewer bytes, not that
it performs better - or a broken down "MOVS", where you instead manually
use the simpler "MOV" instructions, typically now performs just as well -
if not better in some circumstances - as using "MOVS"...indeed, they should
only be preferred for "code size", not better performance because they
don't perform noticeably better at all on the modern CPUs and you can be
more "flexible" using simpler instructions...the reason being that, in
fact, what the CPU now does when it sees the CISC-like instructions of
"MOVS" or "ENTER" is to internally convert these into the smaller
operations and then execute those, anyway...just a "shorthand", not that
the CPU actually performs anything differently...they've all been made
somewhat RISC internally but the instruction set, of course, has to be
maintained for "compatibility"...so the solution was to "transparently"
convert the CISC instructions into a sequence of RISC-like instructions
internally and then perform those...despite what's often said about RISC
not being good for assembly language coding, this change has been
beneficial...as has the trend for "parallelism" and "out of order
execution" because now the CPU deals with "instruction scheduling"
itself...so this has benefitted ASM coders because you don't have to be
quite so "pedantic" while still getting good performance :)...

> For instance, the 286 introduced "call gates", which were
> supposed to be direct entry points to privileged functions, where the
> hardware could take care of the work of dispatching to the correct
> function, and I think even some of the argument passing and validation..
> And that feature is almost certainly still present in today's Pentium
> processors, but apart from OS/2 I don't think any operating system was
> ever able to find a use for it. The hoary old "int" mechanism turned out
> to be more efficient and more flexible.

Actually, not true...the "hoary old int mechanism" isn't more efficient and
isn't more flexible ("flexibility" is about the same, the efficiency is
slightly "below par" but not by a great deal to worry too much about it)...

OS/2 had the _sense_ to use it...why other OSes neglect it is
questionable...it's NOT because it's "more efficient" to do so...the CPU,
in fact, performs much the same actual actions for "call gate" or "int
gate" - remembering, in the end, that a CPU just shifts bits and bytes
around - but the "int" has a little extra "overhead" (to do with the
interrupt system)...

DOS never used it because it existed before such things were possible
(stuck in 16-bit-land)...

Windows originally didn't use it because it wasn't available (as Windows
did start out as nothing more than a "shell" program sitting atop DOS) and
Microsoft have basically had to stick with this for "backwards
compatibility" (for instance, the Win16 and Win32 interfaces were not
changed a great deal, even though the memory model / multi-tasking
environment was changed drastically...a "remnant" of this which you can see
in Win32 code is that applications are passed a "HINSTANCE" ("handle to
instance") and many APIs - "RegisterClass", "CreateWindow", etc. - ask
for this parameter...yet, in fact, when the memory model changed, this
parameter isn't strictly needed...in Win16, the purpose of the "HINSTANCE"
was to identify multiple instances _in the same address space_...as of
Win32, every process is given its own _separate_ address space so the need
for the parameter is now redundant...but it is still passed to an
application and API still request it as a parameter because the API
interface was not changed from Win16 to Win32 - Microsoft, indeed, have
what they call a "source level compatibility guarantee" (that even if they
change what's "underneath", they guarantee that they will always retain
"source level compatibility"...that you can simply re-compile Win16
applications with a 32-bit compiler and all valid Win16 code should still
work perfectly well in Win32) - and it has been retained for _that_ reason
and no other)...the reason it was never changed in Windows is, for sure,
all to do with "backwards compatibility" issues only...

Why Linux chose this path, as I was speculating, I don't really know...it
has always been a good option all along...here's the '486 timings, to show
it's always been fractionally faster - from Intel's own documentation:

CALL:
m16:32 (gate, same privilege) - 35
m16:32 (gate, more priv, no parm) - 69

INT:
immed8 (prot. mode, same priv.) - 44
immed8 (prot. mode, more priv.) - 71

Of course, there's very little in this, as we can see...the "int" isn't a
disaster...indeed, the timings are very much similar and this is to be
expected because "INT" and "CALL" really do very similar operations...an
"INT", in effect, is more or less an atomic "PUSHFD; CALL [ IDT + (index *
sizeof(IDT_ENTRY)) ]" sequence in what it actually does in
practice...hence, these timings are to be expected, really, because the
"INT" is really "indirect CALL with some small overhead attached"...so, the
timings coming out as more or less equal but "INT" being a few clock cycles
slower is to be expected...
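The "CALL [ IDT + (index * sizeof(IDT_ENTRY)) ]" description can be made a little more concrete. In 32-bit protected mode each IDT gate descriptor is 8 bytes, so vector 0x80 sits 0x400 bytes past the IDT base. The C sketch below models only that descriptor lookup, using Intel's 32-bit gate layout. It does not model the privilege checks, the stack switch through the TSS, or the pushing of EFLAGS, CS and EIP that a real `INT` performs. The struct and helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of a 32-bit protected-mode IDT gate descriptor (8 bytes). */
struct idt_gate32 {
    uint16_t offset_low;  /* handler offset, bits 0..15 */
    uint16_t selector;    /* code segment selector of the handler */
    uint8_t  reserved;    /* zero for interrupt and trap gates */
    uint8_t  type_attr;   /* P (bit 7), DPL (bits 6..5), type (bits 3..0):
                             0xE = 32-bit interrupt gate, 0xF = 32-bit trap gate */
    uint16_t offset_high; /* handler offset, bits 16..31 */
};

_Static_assert(sizeof(struct idt_gate32) == 8, "IDT gates are 8 bytes");

/* Byte offset of a vector's descriptor from the IDT base: index * 8. */
static size_t idt_entry_offset(uint8_t vector)
{
    return (size_t)vector * sizeof(struct idt_gate32);
}

/* Reassemble the 32-bit handler offset that the CPU jumps to. */
static uint32_t idt_handler_offset(const struct idt_gate32 *g)
{
    return ((uint32_t)g->offset_high << 16) | g->offset_low;
}

/* Descriptor privilege level: a gate used by "int 0x80" from user mode
   needs DPL 3, otherwise the CPU raises a general-protection fault. */
static unsigned idt_gate_dpl(const struct idt_gate32 *g)
{
    return (g->type_attr >> 5) & 3u;
}
```

For example, a gate with `type_attr` 0xEE is a present 32-bit interrupt gate with DPL 3.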

It would be interesting to see what Intel's timings for these are
today...but since they went "out of order execution" and "parallelism" in
their designs, they've ceased to bother putting the timings into their
manuals (understandably so, though, because the way the "parallelism"
works, the timing of an instruction isn't actually independent of the
instructions before and after them anymore...as it executes instructions
simultaneously, one instruction can be "consumed" when run next to a longer
running instruction, leading to a "timing" of zero clock cycles...not that
it really could be zero cycles but that it's "effectively so", due to the
way "parallelism" operates...hence, this all screws up having precise
"timing" figures and Intel have stopped supplying them in their
documentation...because, to be fair, they would be rather misleading)...

I've not checked out the exact details of this "sysenter", I do
confess...but if it's true to its purpose in being a dedicated instruction
to specifically improve this kind of operation (and does so in a usefully
significant way), then this is all a "moot point", anyway, because the new
instruction - if it lives up to its claim (not having specifically looked
into it, I'm _presuming_ it's better because that's the reason for its
introduction but I don't actually know that this is the case...but, until I
look into it, I'll presume Intel aren't blatantly lying or anything ;) -
should be preferred from now on ("backwards compatibility" permitting, as
always, of course :)...

And, admittedly, by these timings, the difference is so slender, then you
could reasonably say "who cares?" and I wouldn't really dispute that...

Except to point out, in a more general context, that every cycle really
does count...because one issue that often comes up in discussions like this
is that I present these timings and, yeah, sure, it's only 3 or 4 clock
cycles or whatever and everyone says "oh, come on, that's so small, who on
Earth cares?"...but the point is that this is a "per use" difference...

What I mean is, it's not the individual clock cycles that count but the
_ACCUMULATED_ count that matters...the point is not that you lose 3 or 4
clock cycles but that, if you call 1000 system calls, you're losing 3000 to
4000 clock cycles...if you make a million calls every second, it's 3 million to 4
million cycles per second, which is equivalent to knocking 3-4 MHz off your clockspeed, in a
relative kind of way...again, this may not seem like much but it's the
_accumulation_ which is the real problem here...because you're not just
calling the system calls, are you? You're calling "libc" to call your
system calls for "portability"...so, double it because "libc" is also
adding on its "overhead" to each call too...now you're losing 6-8MHz (that
was the typical overall clockspeed of some of the first PCs ;)...and then
you're using a library, which uses "libc" itself, which calls into the
system calls...triple it...9-12MHz...and so on and so forth...
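The accumulation argument comes down to simple arithmetic. The sketch below assumes, as the post does, that every layer adds the same fixed overhead to each call. Real layers differ, so treat the numbers as illustrative only. Converting the loss into "MHz" only makes sense for a rate of calls per second, not for a total count of calls.

```c
/* Cycles lost per second when each call pays `extra_cycles` of overhead
   at each of `layers` levels (kernel entry, libc wrapper, a library on
   top, ...). Equal overhead per layer is a simplifying assumption. */
static double lost_cycles_per_second(double calls_per_second,
                                     double extra_cycles, int layers)
{
    return calls_per_second * extra_cycles * layers;
}

/* The same loss as a fraction of a CPU clocked at `clock_hz`. */
static double lost_fraction(double calls_per_second, double extra_cycles,
                            int layers, double clock_hz)
{
    return lost_cycles_per_second(calls_per_second, extra_cycles, layers)
           / clock_hz;
}
```

For example, a million calls per second at 4 extra cycles through three layers costs 12 million cycles per second. That is about 0.4% of a 3 GHz CPU: small for each call, but paid on every call.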

The problem is NOT that any of these individual pieces of "overhead" are
that greatly significant, it's that you're _ACCUMULATING_ them throughout
the system...

And the ultimate proof that it's _SIGNIFICANT_ is just to look at Windows
or X running...how many minutes do you have to wait for Outlook Express to
load (which I think should officially be awarded the "slowest loading
program" prize above all others)? How many seconds between hitting a button
on your panel in X before the window appears? And we _KNOW_ it's not the
hardware because, well, we know 3GHz is hardly poor CPU speed...indeed, we
_KNOW_ this is the case because on exactly the same machine we run DOOM 3
or Half-life 2, doing an awful lot more (in fact, trying out DOOM 3 on a
RAM-starved machine - technically, it was below the "minimum requirements"
on the RAM side of things (those "minimum requirements" always lie to an
extent ;) - you can _SEE_ the difference a good coder makes...because it
loads things "per room", in fact, exactly so that delays only happen when
moving from room to room...it actually does some impressive "memory
conservation" while running that you can begin to appreciate when you run
it on a machine that really doesn't have sufficient RAM...but the game is
still mostly playable (only a problem moving from room to room...runs
seamlessly whilst in that room) because the coders - Carmack and iD - had
the attitude to such "attention to detail", as to code the engine to
"conserve memory" throughout...the "attitude" really is the ultimate
difference between software that sucks and software that actually makes
people stop and say "wow!"...sometimes, when you think "it doesn't matter",
you very well could be shooting yourself in the foot...oh, it totally
_DOES_ matter...if a game were to delay too often, it becomes unplayable
and people won't hesitate (as users are very unforgiving) to drop it in the
recycle bin...when you have a range of software that you could use, which
one do you plump for in the end? The software that doesn't make you wait an
eternity, the software that's _really_ showing "attention to detail" when
it comes to "user friendly" and seamless operations...with a media player,
for example, the one that doesn't consume 90% of your RAM and, thus, forces
a lot of disk thrashing every time you change applications is the one to
prefer because many people like to play MP3s _while doing other
things_...with multi-tasking, programs NO LONGER have the run of the
machine and should really be coded with the notion of being run
simultaneously with other programs - RAM monsters - at its heart...

Don't follow Microsoft's false philosophy..._user time_ is more important
than your developer time...and failing to realise this could lead to
shooting yourself in the foot...because, simply, if you create an
application that takes 7 minutes to load, is disk thrashing and delaying
all over the place...while someone else is creating the same kind of
application but it operates smoothly and seamlessly (and doesn't fall apart
when run alongside 27 other applications...remembering lots of people have
their "instant messenger", "firewall" and a number of other utilities
_constantly running_ and don't want to be forced to shut them down to "free
up RAM and CPU" for other programs - indeed, with some kind of "firewall",
they _won't_ shut it down because of security - just to accommodate a "RAM
guzzler"...

Simply, the user will choose the other application...Microsoft get away
with these kinds of policies, NOT because they are good policies but
because they have _monopoly_...note that users who are clued up enough to
realise that Internet Explorer, Windows Media Player, Outlook and such
aren't the only options - those who realise that an "escape route" exists -
tend to flee the Microsoft ship as fast as their legs (well, modem
connection ;) will carry them...Microsoft don't particularly care because
"clueless" describes 90% of their target audience, who actually buy into
the "hype" that Microsoft "innovated" spreadsheets (rather than stealing it
contemptuously), GUIs (rather than stealing it contemptuously) and that
"32-bit" in 1995 really was some "new thing" (rather than being the end of
nearly a decade of NOT properly using the 32-bit CPU they'd paid their
hard-earned money to buy...like buying a Harley Davidson and then having
"training wheels" added on and a restriction not to exceed 10mph)...

In monopoly, the monopolistic provider decides to sell bruised, horrible
apples for $5000 each...they have monopoly over all fruit - no-one else you
can turn to - what can you do? If you want fruit, then you've got to pay
$5000 for a poor quality product...with no other choices...well, as the
saying goes: "beggars can't be choosers"...

This is what I find most disappointing with Linux because it's an "escape"
from such crap and nonsense (for those who know it exists and how to
install it and use it :)...but, then, on closer inspection, you question if
it really actually is an "escape" when many developers talk about things
with similar Microsoft attitudes and practices...does the product matter?
Not in comparison to "developer time"...more important to produce quick and
dirty crap that no-one likes or wants as fast as possible than to take the
time to make it significantly _USEFUL_...

Users don't actually buy technology, you know...they buy solutions and
applications, which might just happen to necessitate a technology or two to
achieve the end result...in the entire industry, the only people who seem
to truly understand this are Apple (and a few games companies, at least
within the realm of what they do in "entertainment" terms)...

Beth :)

Lawrence D'Oliveiro

Dec 23, 2004, 12:54:31 AM

In article <8Dryd.24$Rz...@newsfe2-win.ntli.net>,
"Beth" <BethS...@hotmail.NOSPICEDHAM.com> wrote:

>Lawrence D'Oliveiro wrote:
>> Beth wrote:
>> >From an assembly language point of view (this part I do know a little
>> >better :), the "sysenter" has better performance because it's a CPU
>> >instruction deliberately introduced to improve user -> kernel space
>> >transitions...
>>
>> I doubt that it would save very much.
>
>Perhaps, but it should save _something_ because Intel introduced this
>instruction _specifically_ to improve this situation...

Lots of things have been introduced by lots of vendors over the years to
try to improve all kinds of situations. Particularly with machine
instructions, we have often found that "less is better"--which was why
we had the whole RISC movement.

>...and once the CPU has
>the facilities built into it, it seems rather odd not to use them...

There's lots of rubbish accumulated over the years in the x86
architecture that nobody in their right mind would use.

>> For instance, the 286 introduced "call gates", which were
>> supposed to be direct entry points to privileged functions, where the
>> hardware could take care of the work of dispatching to the correct
>> function, and I think even some of the argument passing and validation..
>> And that feature is almost certainly still present in today's Pentium
>> processors, but apart from OS/2 I don't think any operating system was
>> ever able to find a use for it. The hoary old "int" mechanism turned out
>> to be more efficient and more flexible.
>
>Actually, not true...the "hoary old int mechanism" isn't more efficient and
>isn't more flexible ("flexibility" is about the same, the efficiency is
>slightly "below par" but not by a great deal to worry too much about it)...
>
>OS/2 had the _sense_ to use it...why other OSes neglect it is
>questionable...

OS/2 seemed to be basically an exercise in trying to make use of all the
protection features of the 80286 processor ... about five years too late.

Kasper Dupont

Dec 23, 2004, 3:31:48 AM

"Lawrence D'Oliveiro" wrote:
>
> There's lots of rubbish accumulated over the years in the x86
> architecture that nobody in their right mind would use.

Good point. Actually in the AMD64 architecture the
first steps have been taken to remove segmentation
and virtual 86 mode.

>
> OS/2 seemed to be basically an exercise in trying to make use of all the
> protection features of the 80286 processor

That is a way to put it. It is the only OS I know
which was really designed for 80286. Most other PC
OSes are designed either for 8086 (possibly with
extra memory through some API) or for 80386. And
BTW the 386 has a few quirks caused by having to
be compatible with 286 protected mode.

> ... about five years too late.

Why do you say it was five years too late? I mean
was the 286 even five years old when OS/2 was
released?

--
Kasper Dupont

Kasper Dupont

Dec 23, 2004, 6:04:41 AM

Beth wrote:
>
> Determine "by other means" whether the architecture uses "int 80h" or
> "sysenter" and then use the instructions directly,

To some extent that is possible.

> not indirectly via this
> page (which is using up at least 4KB for a one or two bytes instruction,

I wouldn't worry too much about 4KB systemwide for something
that is used this much. We could probably find good uses for
the rest of this page.

> as
> well as the performance hit of constantly having to make the calls _via_
> this page all the time)...

But this is followed by 21 assembler instructions to be
executed in kernel mode before the actual c implementation
of the system call takes over. And another 11 assembler
instructions on the return path. So what you are talking
about is just a possibility to eliminate one out of 32
instructions.

>
> Indeed, here's a good use for "shared libraries"...you could create two
> versions that are otherwise identical, except that one uses "int 80h"
> throughout, the other uses "sysenter"...when a program starts up, it can
> determine whether the system is using "int 80h" or "sysenter" and load in
> the appropriate shared library...

This could even be done at install time. Some distributions
already have different versions of glibc for i386 and i686.

> as the libraries are otherwise identical
> and have the same "interface" to the application, then both libraries can
> be used identically (indeed, if you like, once the shared library is
> loaded, it can use the functions "transparently" to whether the system is
> "int 80h" or "sysenter")...

Sure, applications don't need to know the difference.

>
> A standard "libc" couldn't make these kinds of assumptions because it must
> remain true to the "standards"

No problem, it could be specified by compile time options.

> neither does changing an algorith...

You mean algorithm? Changing an algorithm can make a
major difference to performance. A sloppy implementation
of a good algorithm can easily be a hundred times faster
than a bad algorithm with a lot of microoptimizations.

> the point is that _ALL OF THESE
> TOGETHER_ (and a lot more besides) will make a significant
> difference...it's the attitude of "good enough isn't"...

Sure, each improvement counts. But before spending time
improving on something, you have to figure out where it
will make the most difference.

If your program spends 90% of the time in user mode, and
only 10% of the time on system calls, then optimizing on
system call performance is not going to make a significant
difference.
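This point is usually framed as Amdahl's law. The name comes from outside the thread. If a fraction p of run time is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). Below is a minimal C version. It can be applied to the 90%/10% split above and to the earlier estimate of removing one of 32 kernel-path instructions. That second case assumes, simplistically, that all 32 instructions cost the same.

```c
/* Amdahl's law: overall speedup when a fraction `p` (0..1) of run time
   is made `s` times faster and the remaining (1 - p) is unchanged. */
static double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.1, halving system call cost (s = 2) gives about a 5% overall gain. Even infinitely fast system calls give only about 11% (1 / 0.9). Trimming one instruction in 32 (s = 32/31) gives about 0.3%.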

It is not unusual to see programs use twice as many system
calls for a specific task as necessary. Still, they may
perform well, since the system calls are not
really their bottleneck.

If some bad program uses way too many system calls, then it
may matter if you reduce the overhead. But you can also
speed it up by reducing the number of system calls made,
which will improve performance a lot more than you could
ever have hoped for by merely reducing the overhead, and
at the same time it makes your previous work an utter
waste of time.
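Reducing the number of calls is easy to demonstrate with write(2) on a POSIX system. In this sketch, the helper names are hypothetical, output goes to /dev/null, and error handling is minimal. Writing a string one byte at a time issues one system call per byte. Collecting the bytes in a user-space buffer first issues one call per 4 KB.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

static size_t write_calls; /* number of write(2) invocations made */

/* Write all of buf, counting every system call (handles short writes). */
static void counted_write(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        write_calls++;
        if (n <= 0)
            return; /* error handling omitted in this sketch */
        buf += n;
        len -= (size_t)n;
    }
}

/* One system call per byte. */
static void emit_unbuffered(int fd, const char *text)
{
    for (size_t i = 0; text[i] != '\0'; i++)
        counted_write(fd, &text[i], 1);
}

/* Gather bytes in user space and issue one write per full buffer. */
static void emit_buffered(int fd, const char *text)
{
    char buf[4096];
    size_t used = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        buf[used++] = text[i];
        if (used == sizeof buf) {
            counted_write(fd, buf, used);
            used = 0;
        }
    }
    if (used > 0)
        counted_write(fd, buf, used);
}

/* A harmless sink for the demonstration. */
static int open_sink(void)
{
    return open("/dev/null", O_WRONLY);
}
```

For "hello, world" the first version makes 12 calls and the second makes 1. stdio's FILE buffering performs the same batching automatically.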

--
Kasper Dupont

Tauno Voipio

Dec 23, 2004, 8:14:04 AM

Kasper Dupont wrote:
>
>
> Why do you say it was five years too late? I mean
> was the 286 even five years old when OS/2 was
> released?
>

The 80286 architecture and the chips were introduced in
1980, a year before the PC.

The protected mode included a reservation for the 32
bit architecture.

---

I don't think there is much to be streamlined from
the user/kernel transition, regardless of which instruction
is used for it. The far call using a gate was difficult
to code with flat mode compilers. In fact, both sysenter
and int 0x80 will end up using a gate (call, trap, interrupt).

Most of the contortions with privilege checks and
segmentation changes have to be made anyway, if
the kernel is to be kept in privileged mode and
the users out of it.

Gary Kato

Dec 23, 2004, 9:32:54 AM

>The 80286 architecture and the chips were introduced in
>1980, a year before the PC.

Intel's Museum website says 1982, a year after the PC.

Lawrence D'Oliveiro

Dec 23, 2004, 2:53:20 PM

In article <wwzyd.183$DM...@read3.inet.fi>,
Tauno Voipio <tauno....@iki.INVALID.fi> wrote:

>The [286] protected mode included a reservation for the 32
>bit architecture.

No it didn't. It had no forward planning for 32-bit mode at all. You're
thinking of the Motorola 68000, as used in the original Apple Macintosh,
which was designed as a cut-down 32-bit processor from the beginning.
When Motorola brought out its first 32-bit processor, the 68020, it was
really just a matter of filling in the gaps in the original 68000
design. In the Intel world, on the other hand, the introduction of the
80386 necessitated adding entirely new addressing modes and registers
and everything.

Compare the software transitions from 16-bit to 32-bit mode in the
DOS/Windows world versus the Mac world. In the former, it was a
long-drawn-out process that took several years and a lot of
compatibility infrastructure to make it work. In the latter, it happened
essentially without anybody noticing.

Tauno Voipio

Dec 23, 2004, 4:23:15 PM

Maybe.

I still have the handouts of Intel's International
Invitational Technical Seminar -80, in Brussels,
Belgium.

They presented a revolutionary controller,
the 8051 also.

I had an opportunity to chat with the designers
there, so I got some first-hand information there.

Tauno Voipio

Dec 23, 2004, 4:49:35 PM

Beth wrote:
>
> Yeah, with the x86 architecture, I don't think Intel were seeing much of a
> future in it at the design stage because it suffers from a number of quite
> horrible "hacks"...the most infamous being the "real-mode addressing" used
> before protected mode was added...addresses are calculated as: Address =
> (Segment * 16) + Offset...which leads to the rather horrible situation that
> 0000:0400h, 0040:0000h and 0030:0100h all point to the _SAME_ memory
> address, despite being numerically dis-similar...
>
> There's no way a self-respecting engineer would prefer this "hack"
> introduced, if they actually knew at the time that this chip would have a
> _30 year_ lifespan...and become the architecture used in 90+% of machines
> world-wide...but they didn't know, of course, so one cannot blame them but
> the odd thing about the original architectural designs is hardly ideal...
>
> Plus, the x86 started out as a simple 16-bit microprocessor...no MMU, no
> "protections"...these were added with "protected mode" but, well, this was
> a later addition...the point being that until you have such "protected
> mode" operations added to the CPU, it would have no such thing as "user"
> and "supervisor" modes to begin with, anyway, for an instruction like
> "sysenter" to even make sense...

It seems that you do not understand the philosophy
behind the 80x86 family architecture.

Please remember that the processors were designed well
before the IBM PC. The first processor affected by the PC
and all the dirty programming with it, was 80386.

The main ideas of both segmented and paged addressing
were well developed in the mainframes in the 1960's and
1970's. The PC systems are not original here in any way.

The original idea of 8086 was to produce a processor with segmented
memory management to give the ultimate hardware protection
for critical real-time multitask systems. The segmentation
of 80286 was mostly designed before the 8086 was published.

The only intention of real mode was to create a simple
bootstrap mode with more than 64 kilobytes of address range.
This can be seen from the 80286 property that the only way
to return to real mode was a hardware reset. There is an ugly
hardware/software kludge in the original PC/AT to provide
a way to return from protected mode. The kludge includes
writing a reset cause to the clock chip CMOS memory and
using the keyboard controller to reset the main processor.
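The real-mode rule quoted earlier in this message, Address = (Segment * 16) + Offset, fits in a few lines of C. The sketch checks the three aliases for address 0x400 given in that quote. The 20-bit mask models the original 8086, where addresses past 1 MB wrap around. Later CPUs behave differently once the A20 line is enabled. The function names are illustrative.

```c
#include <stdint.h>

/* Real-mode address: segment * 16 + offset (up to 0x10FFEF). */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* On the original 8086 only 20 address lines exist, so the result
   wraps at 1 MB. */
static uint32_t real_mode_address_8086(uint16_t segment, uint16_t offset)
{
    return real_mode_address(segment, offset) & 0xFFFFFu;
}
```

0000:0400h, 0040:0000h and 0030:0100h all yield 0x400. FFFF:0010h yields 0x100000, which the 8086 wraps to 0.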

----

The segmented addressing scheme is a very effective
protection mechanism, but it is not a good one for
creating virtual memory. Virtual memory is the realm
of paged addressing.
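For contrast with segmentation, here is how the 80386 splits a 32-bit linear address for paging. This assumes classic 4 KB pages without PAE: 10 bits select a page-directory entry, 10 bits select a page-table entry, and 12 bits select the byte within the page. The struct and function names are illustrative.

```c
#include <stdint.h>

/* The three fields of a 32-bit linear address under 386 paging. */
struct linear_parts {
    uint32_t dir;    /* bits 31..22: page-directory index (0..1023) */
    uint32_t table;  /* bits 21..12: page-table index (0..1023) */
    uint32_t offset; /* bits 11..0: byte offset within the 4 KB page */
};

static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir    = linear >> 22;
    p.table  = (linear >> 12) & 0x3FFu;
    p.offset = linear & 0xFFFu;
    return p;
}
```

For example, 0xC0101234 splits into directory entry 0x300, table entry 0x101 and offset 0x234.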

Proper use of segmentation would need extensions to the
binary utilities (assembler, linker, ...). It would
create a better protected environment than the
minimal segmentation used in the current desktop operating
systems.

> So, yeah, as noted, it's not really an "OS calling mechanism" as such, but
> more of a "piggyback" on the interrupt system (needed for hardware IRQs and
> such)...the interrupt system already had to deal with "run-time relocation"
> (by not calling an address directly but calling it _indirectly_ through a
> table of addresses...an "Interrupt Vector Table" (IVT)...later renamed
> "Interrupt Descriptor Table" (IDT) to distinguish its protected mode
> equivalent, as addressing works completely differently in protected mode
> and required a different format table)...and then CPU exceptions also
> re-used this same system (in a sense, the CPU exception being more or less
> a form of "hardware IRQ" sent from the CPU to itself...an "internal" IRQ
> ;)...so, it made sense to also add on the "INT" instruction to allow
> software to also generate "interrupts" on request and its mechanisms could
> be "re-used" to also serve for BIOS / OS functions (which could benefit
> from being indexed via a "table of addresses" so that newer versions of the
> OS could relocate its routines elsewhere, just change the addresses in the
> table but programs calling via the table don't need to be recompiled)...

Would you be more happy, if the interrupts, traps and
other exceptions were called something else than interrupts?

The protection mechanism has to go through the same
operations at each exception, although hardware interrupt
identification requires the extra interrupt acknowledge cycle
to load the associated exception number from the interrupt
controller chip.

Using the software interrupts with instruction-fed exception
numbers actually simplifies the processor without an associated
run-time penalty.

> As a simple example, when a process loads, a "jump table" of address to all
> the syscalls could be provided to the application...say, just as an
> example, the EAX register holds the address of the start of this "jump
> table"...the application can then save this address away in a variable and
> use instructions of the form "CALL [ TableAddress + (syscallnumber * 4)]"
> ('386 addressing modes are powerful enough that this operation is a single
> machine instruction) to index into that "jump table" and use it to call the
> OS system functions...the OS loader itself constructs the table - so it can
> fill out the table with the addresses dynamically and it can happily change
> addresses from version to version - and the application is simply compiled
> to make calls via the table...

You seem not to understand the protection part of the processor and
kernel design.

> The other point about using "CALL" is that it can be set up to make a
> simple call to other user mode code or it can be set up with a "call gate"
> in order to trigger a user -> kernel transition...and this would be
> "transparent" to the calling application (the difference lies in the MMU
> tables, not in the actual instructions an application uses)...

Get a good text on the Intel 80386+ architecture and
read the memory management and exception parts.

> This also tends to make sense from a "micro-kernel" architecture
> point-of-view (you still use the exact same "indirect CALL" instruction,
> whether you're calling other user mode code or actually calling into kernel
> space...thus, the system functions need not all be in kernel space...you
> could even conceivably move a "monolithic" kernel slowly towards a
> "micro-kernel" - reduce the kernel but still provide all the same functions
> that were available when "monolithic" with user mode equivalents - and do
> it quite "transparently")...

For a micro-kernel, have a look at GNU/Hurd. An easier starting point
is Prof. Tanenbaum's Minix.

> As well as even making much more sense from the UNIX / C side of things in
> that these "CALL" instructions could directly be C convention calls...as
> Linux itself is written in C, the process is often quite bizarre from a
> "big picture" point of view...call into "libc", loads stack into registers,
> calls "int 80h", "int 80h" takes parameters and puts them on stack in order
> to call internal C functions...it's moving the parameters around all over
> the place and making calls to calls and such...quite a bit of "overhead"
> attached (and, no, it's not that this "overhead" is unacceptably slow or
> large but it's just not really necessary...why do something that there's no
> actual need to be doing all the time?)...

That would necessitate the segmentation extensions
in the programming toolchain. The ring-crossing calls
are of necessity far calls with both segment and offset.
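The libc-to-kernel path Beth describes can be observed from C on a Linux system. This sketch assumes glibc. glibc's generic `syscall()` function takes a number from `<sys/syscall.h>` plus arguments and enters the kernel. `getpid()` is the ordinary wrapper for the same service. The two helper names are made up for the example.

```c
#define _GNU_SOURCE          /* keep syscall() declared even under strict -std= flags */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>          /* syscall(), getpid() */

/* Generic route: glibc's syscall() loads the number and arguments into
   the registers the kernel ABI expects, then enters the kernel. */
long pid_via_generic_syscall(void) {
    return syscall(SYS_getpid);
}

/* Ordinary route: the dedicated libc wrapper for the same service. */
long pid_via_wrapper(void) {
    return (long)getpid();
}
```

On i386 the int 80h convention puts the syscall number in EAX and up to six arguments in EBX, ECX, EDX, ESI, EDI and EBP. Letting `syscall()` do that loading avoids hand-written register setup.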

> The '386 supported an "indirect CALL" (with a "call gate", if you needed to
> switch privilege levels during that call to jump into "kernel
> space"...quite "transparent" to the caller that this is happening too) and
> could have used a "jump table" kind of call from the beginning...indeed,
> being NOT greatly dissimilar from the mechanism used for "shared libraries"
> (the difference being that the OS loader constructs the "jump table"
> dynamically, according to entries in the executable header about what
> libraries and functions need to be loaded and "imported" into this "jump
> table"), this could also have been made more "generic" and re-used for that
> too (that, so to speak, every process automatically has the kernel loaded
> "as if" it's a shared library when it starts)...

The point with gates is to provide a hardware mechanism
to limit the callers and entry points across a ring
crossing - see the hardware manuals.
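What "limiting the callers and entry points" means can be shown with a simplified C model. It covers a far CALL through a 386 call gate to a nonconforming code segment, following the rules in Intel's 386 documentation:

- The caller's privilege (the numerically larger of CPL and RPL) must not exceed the gate's DPL.
- The target segment must be at least as privileged as the caller.
- The entry point comes from the gate, not from the instruction.

Present bits, descriptor types and the stack switch are left out. The struct layouts are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical layout; not the real 8-byte gate format. */
struct call_gate {
    uint8_t  dpl;          /* least privileged level allowed to use the gate */
    uint16_t target_sel;   /* code segment selector the gate leads to */
    uint32_t target_off;   /* fixed entry point inside that segment */
    uint8_t  target_dpl;   /* DPL of the target (nonconforming) code segment */
};

struct far_ptr {
    uint16_t sel;
    uint32_t off;
};

/* Privilege levels: 0 = most privileged (kernel), 3 = least (user).
   Returns true and fills *dest when the call is allowed. */
bool call_through_gate(const struct call_gate *g, uint8_t cpl, uint8_t rpl,
                       uint32_t offset_in_instruction, struct far_ptr *dest) {
    assert(cpl <= 3 && rpl <= 3);
    (void)offset_in_instruction;      /* ignored: the gate alone picks the entry */

    uint8_t effective = cpl > rpl ? cpl : rpl;
    if (effective > g->dpl)           /* caller may not use this gate */
        return false;
    if (g->target_dpl > cpl)          /* CALL cannot move to less privileged code */
        return false;

    dest->sel = g->target_sel;
    dest->off = g->target_off;
    return true;
}
```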

Kasper Dupont

Dec 23, 2004, 8:06:50 PM

"Lawrence D'Oliveiro" wrote:
>
> In article <wwzyd.183$DM...@read3.inet.fi>,
> Tauno Voipio <tauno....@iki.INVALID.fi> wrote:
>
> >The [286] protected mode included a reservation for the 32
> >bit architecture.
>
> No it didn't.

Looking at the bit layout of the 386 segment descriptors,
it is obvious that it is not the result of planning.
Rather, they put in the extra bits wherever they could find
space for them. And since there wasn't enough space left
for a 32 bit segment size, only 21 bits are used for
segment size: a 20 bit unsigned integer, which is
then multiplied by 1 or 4096 depending on the granularity bit.

Reservation for a 32 bit architecture? Hah?

I even find it hard to believe anybody had considered
the 286 design when designing the 8086. Most notably the
infamous A20 line is a sure sign of missing forward
compatibility.

> It had no forward planning for 32-bit mode at all. You're
> thinking of the Motorola 68000, as used in the original Apple Macintosh,
> which was designed as a cut-down 32-bit processor from the beginning.

Seen from a software viewpoint, it was not cut-down in
any way. It had 32 bit registers and instructions that
were designed to work with 8, 16, and 32 bits from the
very beginning.

The actual hardware was at least to some extent 16 bit.
The data bus was only 16 bits wide, and a 32 bit memory
access had to be performed in two operations. That was
something handled by microcode. The software
never saw the dirty details.

--
Kasper Dupont
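Kasper's point about the 68000 can be sketched in C. A toy memory of 16-bit words stands in for the 16-bit data bus. One 32-bit read is assembled from two word accesses, and since the 68000 is big-endian, the word at the lower address is the high half. The struct and function names are hypothetical, and real bus timing is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy memory: word i holds the bytes at addresses 2i and 2i+1. */
struct bus16 {
    const uint16_t *words;
    unsigned cycles;         /* number of 16-bit bus transfers so far */
};

/* One 16-bit bus transfer. */
static uint16_t bus_read_word(struct bus16 *b, uint32_t addr) {
    assert(addr % 2 == 0);   /* odd word/long addresses raise an address error on the 68000 */
    b->cycles++;
    return b->words[addr / 2];
}

/* What a longword read looks like to software: one 32-bit value.
   Underneath it is two 16-bit transfers. */
uint32_t read_long(struct bus16 *b, uint32_t addr) {
    uint32_t hi = bus_read_word(b, addr);      /* lower address = high word */
    uint32_t lo = bus_read_word(b, addr + 2);
    return (hi << 16) | lo;
}
```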

Tauno Voipio

Dec 25, 2004, 12:30:47 PM

Kasper Dupont wrote:
>
>
> Looking at the bit layout of the 386 segment descriptors,
> it is obvious that it is not the result of planning.
> Rather, they put in the extra bits wherever they could find
> space for them. And since there wasn't enough space left
> for a 32 bit segment size, only 21 bits are used for
> segment size: a 20 bit unsigned integer, which is
> then multiplied by 1 or 4096 depending on the granularity bit.
>
> Reservation for a 32 bit architecture? Hah?

At lease the 286 designers told me that the extra bits
are for a 32 bit extension. Probably the ide they had
was different then the 80386 architecture due to the
fact thet the PC was not born yet.

> I even find it hard to believe anybody had considered
> the 286 design when designing the 8086. Most notably the
> infamous A20 line is a sure sign of missing forward
> compatibility.
>

You're barking up the wrong tree here: A20 gate was an
invention of the PC/AT designers to accommodate badly
coded (against the instructions of IBM) PC software
using the address wrap-around at 1 Mbyte.

>>It had no forward planning for 32-bit mode at all. You're
>>thinking of the Motorola 68000, as used in the original Apple Macintosh,
>>which was designed as a cut-down 32-bit processor from the beginning.

I did not - I coded both for the 80x86's and m68k's at
the same time, on embedded systems, neither the PC nor
the Mac.

Tauno Voipio

Dec 25, 2004, 3:10:06 PM

Tauno Voipio wrote:

> At lease the 286 designers told me that the extra bits
> are for a 32 bit extension. Probably the ide they had
> was different then the 80386 architecture due to the
> fact thet the PC was not born yet.

Sorry for the misprints, please read:

At least the 286 designers told me that the extra bits
are for a 32 bit extension. Probably the idea they had
was different than the 80386 architecture due to the
fact that the PC was not born yet.

Kasper Dupont

Dec 25, 2004, 3:40:18 PM

Tauno Voipio wrote:
>
> At lease the 286 designers told me that the extra bits
> are for a 32 bit extension. Probably the ide they had
> was different then the 80386 architecture due to the
> fact thet the PC was not born yet.

Maybe they were reserved for extensions. But obviously
not 32 bit extensions. There simply weren't enough free
bits for 32 bits of anything.
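The bit-budget argument can be checked against the 386 descriptor format. The 286 used bytes 0 to 5 of its 8-byte descriptor and required bytes 6 and 7 to be zero. The sketch below decodes a 386 descriptor and shows what the 386 put into those two bytes:

- 4 more limit bits.
- The flag bits, including the granularity bit G.
- The top 8 base bits.

With G set, the 20-bit limit counts 4 KiB units, so the byte limit is limit * 4096 + 4095. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* d[] is the 8-byte descriptor in memory order. Bytes 6 and 7 are the
   ones the 286 required to be zero; the 386 additions live there. */

/* Base: 24 bits in bytes 2-4 (as on the 286) plus 8 more in byte 7. */
uint32_t descriptor_base(const uint8_t d[8]) {
    return (uint32_t)d[2]
         | (uint32_t)d[3] << 8
         | (uint32_t)d[4] << 16
         | (uint32_t)d[7] << 24;
}

/* Highest valid offset. The raw limit is 16 bits in bytes 0-1 plus 4 more
   in the low nibble of byte 6, i.e. 20 bits. Bit 7 of byte 6 is G: when
   set, the limit counts 4 KiB units and the low 12 bits become all ones. */
uint32_t descriptor_limit(const uint8_t d[8]) {
    uint32_t raw = (uint32_t)d[0]
                 | (uint32_t)d[1] << 8
                 | (uint32_t)(d[6] & 0x0F) << 16;
    if (d[6] & 0x80)
        return (raw << 12) | 0xFFFu;
    return raw;
}
```

A flat 4 GiB code segment is the usual test case. The bytes FF FF 00 00 00 9A CF 00 decode to base 0 and limit 0xFFFFFFFF.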

>
> You're barking up the wrong tree here: A20 gate was an
> invention of the PC/AT designers to accommodate badly
> coded (against the instructions of IBM) PC software
> using the address wrap-around at 1 Mbyte.

According to some specifications I have seen,
the A20 gate is a part of the CPU.

>
> I did not - I coded both for the 80x86's and m68k's at
> the same time, on embedded systems, neither the PC nor
> the Mac.

Did anybody actually use x86 for embedded systems
at that time?

--
Kasper Dupont

Tauno Voipio

Dec 26, 2004, 4:46:07 AM

Kasper Dupont wrote:
>
>>You're barking up the wrong tree here: A20 gate was an
>>invention of the PC/AT designers to accommodate badly
>>coded (against the instructions of IBM) PC software
>>using the address wrap-around at 1 Mbyte.
>
>
> According to some specifications I have seen,
> the A20 gate is a part of the CPU.
>

No - the Intel documents have no such thing. If you're
still interested, I can dig up the PC/AT schematics
from the attic and look at the IC number for the gate.
It is a pure kludge by IBM forced by the lousy coding
practices on the original PC.

>>I did not - I coded both for the 80x86's and m68k's at
>>the same time, on embedded systems, neither the PC nor
>>the Mac.
>
>
> Did anybody actually use x86 for embedded systems
> at that time?
>

Yes - plenty. There are some tens of thousands of
units based on 8086, 8088 and different flavours
of 80186 with my code inside still in service.

The 8086 family was an improvement over the previous
Intel designs, the 8080 and 8085 (and over the Zilog Z80).

Tauno Voipio

Dec 27, 2004, 4:46:57 AM

Tauno Voipio wrote:
> Kasper Dupont wrote:
>
>>
>>> You're barking up the wrong tree here: A20 gate was an
>>> invention of the PC/AT designers to accommodate badly
>>> coded (against the instructions of IBM) PC software
>>> using the address wrap-around at 1 Mbyte.
>>
>> According to some specifications I have seen,
>> the A20 gate is a part of the CPU.
>>
> No - the Intel documents have no such thing. If you're
> still interested, I can dig up the PC/AT schematics
> from the attic and look at the IC number for the gate.
> It is a pure kludge by IBM forced by the lousy coding
> practices on the original PC.

The A20 gate is the IC U129 (74F257 multiplexor) on both type 1
and type 2 PC/AT motherboards. On newer constructions, the gate
is a part of the support chipset. (Reference: IBM Technical Reference,
Personal Computer AT, IBM doc 6139362, September 1985).
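The wrap-around that the A20 gate emulates is easy to reproduce in arithmetic. In real mode the address is segment * 16 + offset, which can reach 0x10FFEF. The 8086 has only 20 address lines, so bit 20 is dropped. In this minimal sketch, `a20_enabled == false` models the AT's gate holding A20 low. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-mode address: segment * 16 + offset. Maximum is FFFF:FFFF = 0x10FFEF. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off) {
    return ((uint32_t)seg << 4) + off;
}

/* a20_enabled == false: A20 forced to 0, reproducing the 8086 wrap at 1 MB.
   a20_enabled == true: the 286's A20 output is passed through unchanged. */
uint32_t physical_address(uint16_t seg, uint16_t off, bool a20_enabled) {
    uint32_t lin = real_mode_linear(seg, off);
    assert(lin <= 0x10FFEFu);
    return a20_enabled ? lin : (lin & ~(uint32_t)0x100000);
}
```

For example, FFFF:0010 gives 0x00000 with the gate closed and 0x100000 with it open.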

Gary Kato

Dec 27, 2004, 9:21:52 PM

>Did anybody actually use x86 for embedded systems
>at that time?

I worked on a system that used an Intel 86/10 board (8086) in 1980 or so. It
was used to connect a RAMTEK raster color graphics display to an IBM mainframe
I/O Channel. Almost all graphics on IBM mainframes back then used vector graphics
units. I learned to hate that segment:offset architecture while working on
that one. We used CP/M-86 and assembly language.
