A patch for linux 2.1.127

Steven N. Hirsch

Nov 8, 1998

On Sun, 8 Nov 1998, H.J. Lu wrote:

> Hi,
>
> Here is a patch for Linux 2.1.127. Now I am running Linux 2.1.127
> compiled with egcs 1.1.1 prerelease using -O6.
>

HJ,

You will note that the schedule() changes completely broke lockd, and that
your linux-2.1.1xx.diff patch no longer results in a working RPC system.

Tried to E-Mail you, but your mailer bounces my mail.

Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Bernd Schmidt

Nov 8, 1998

On Sun, 8 Nov 1998, Linus Torvalds wrote:

> On Sun, 8 Nov 1998, H.J. Lu wrote:
>

> > For
> >
> > : "r" (5 * 1000020/HZ)
> >
> > It puts 5 * 1000020/HZ in eax.
>
> Which is very obviously buggy, as the asm also has a clobber of "eax".
>
> This is not one of the cases where the documentation is unclear - it's
> very clear on this issue.. It is very very obviously a compiler bug, and
> anybody saying anything else is just misguided.
[...]
> However, even then I'd wonder what ELSE is broken. I don't see any other
> real fix than fixing egcs (at least making it raise an error when it can't
> do register allocation, rather than generating incorrect code).

The most recent egcs snapshot has a fix for this and other known bugs with
asm statements (as well as fixes for -mregparm). Can we now please fix
those cases in the kernel as well where it uses asm statements that are
really incorrect?

Bernd

Richard Henderson

Nov 9, 1998

On Sun, Nov 08, 1998 at 11:28:29AM -0800, H.J. Lu wrote:
> I don't know whom to blame. I have to say both. As you can see,
> calibrate_tsc is just asm code written in the asm statement.

What are you smoking? Of course it is a compiler problem.

The problem is the sort of general small register class losage that
has plagued us for some time, that is, silent failures when we run
out of reload registers. Which is thankfully solved now by Bernd's
recent herculean efforts.

Unfortunately, those patches are way too massive to go back to the
1.1 branch. For that I don't really know what to do.

I will note that as a workaround, there's no particular reason we
need to force this second value into a register at all, since it will
just be copied to %edx later.

r~

--- arch/i386/kernel/time.c.orig Sun Nov 8 21:35:18 1998
+++ arch/i386/kernel/time.c Sun Nov 8 21:36:44 1998
@@ -565,10 +565,9 @@
"movl %1, %%edx\n\t"
"divl %%ecx\n\t" /* eax= 2^32 / (1 * TSC counts per microsecond) */
/* Return eax for the use of fast_gettimeoffset */
- "movl %%eax, %0\n\t"
- : "=r" (retval)
- : "r" (5 * 1000020/HZ)
- : /* we clobber: */ "ax", "bx", "cx", "dx", "cc", "memory");
+ : "=&a" (retval)
+ : "g" (5 * 1000020/HZ)
+ : /* we clobber: */ "bx", "cx", "dx", "cc", "memory");
return retval;
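
The constraint change in the patch above can be tried outside the kernel. What follows is a minimal sketch, not the kernel's code: `scaled_reciprocal` is a hypothetical helper that performs the same `divl` step, but names every register the instruction touches as an operand instead of listing it as a clobber, so the compiler never has to allocate a register that the asm also claims to destroy. The non-x86 branch is an assumption added only so the sketch compiles elsewhere.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns (hi * 2^32) / divisor, like the divl step
 * in calibrate_tsc. The caller must ensure hi < divisor, otherwise the
 * quotient does not fit in 32 bits and divl faults.
 */
static uint32_t scaled_reciprocal(uint32_t hi, uint32_t divisor)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t quotient, remainder;

    /* divl divides edx:eax by its operand; eax = quotient, edx = remainder. */
    __asm__ ("divl %4"
             : "=a" (quotient), "=d" (remainder)
             : "0" (0u), "1" (hi), "rm" (divisor)
             : "cc");
    (void)remainder;
    return quotient;
#else
    /* Portable fallback (assumption, not part of the original patch). */
    return (uint32_t)(((uint64_t)hi << 32) / divisor);
#endif
}
```

Because eax and edx appear as outputs tied to the inputs, nothing is left in the clobber list except the condition codes.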

David S. Miller

Nov 9, 1998

Date: Sun, 8 Nov 1998 23:21:39 -0800 (PST)
From: Linus Torvalds <torv...@transmeta.com>

How about something simple like:

if (SMALL_REGISTER_SET)
never_inline_functions_unless_the_user_asked_for_it();

which means that even with -O6 you would not inline functions
unless they were marked inline.

I think I'd rather tell people "don't compile the kernel with -O6"
than turn off -finline-functions for -O6 by default on any machine.

Later,
David S. Miller
da...@dm.cobaltmicro.com

Andi Kleen

Nov 9, 1998

In article <m0zcaVi...@ocean.lucon.org>, h...@lucon.org (H.J. Lu) writes:
> Hi,
> Here is a patch for Linux 2.1.127. Now I am running Linux 2.1.127
> compiled with egcs 1.1.1 prerelease using -O6.

It is not a good idea to compile the kernel with -O6. -O6 includes
-finline-functions, and automatic inlining messes up a lot of fast
paths (mainly because egcs is not able to do register life splitting
yet)

e.g. this is a common idiom in the kernel:

static int __do_something(void)
{
        /* do something slow that needs lots of registers */
}

static inline int do_something(void)
{
        if (something needed)
                __do_something();
}

In case the slow part is only seldom needed this generates very good
code because in the common path there is no function call and the calling
function has the complete register set for its own use (important on x86).
Now when you use -O6 the compiler will inline the slow path and significantly
increase the number of variables that need to fit into the same register set,
which gives much slower code. It also has bad effects on the L1
cache. Nearly all functions where it pays to inline them are already
explicitly marked inline.

So just use -O2 instead of -O6.
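
A compilable version of the idiom might look like the sketch below. The names `refill_slow`, `take_one` and `slow_calls` are hypothetical, and the `noinline` attribute is an assumption: it comes from later GCC releases and was not available in egcs 1.1, where the only control was the optimization level.

```c
static unsigned long slow_calls;   /* counts trips through the slow path */

/*
 * Slow path, kept out of line. The noinline attribute is a later GCC
 * feature, used here only to make the intent explicit.
 */
static int __attribute__((noinline)) refill_slow(int *budget)
{
    slow_calls++;
    *budget = 8;
    return *budget;
}

/* Fast path: one compare and no call in the common case. */
static inline int take_one(int *budget)
{
    if (*budget == 0)
        refill_slow(budget);
    return --*budget;
}
```

With the slow path out of line, a caller of `take_one` only pays for the compare and decrement until the budget runs out.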

-Andi

Peter T. Breuer

Nov 9, 1998

"A month of sundays ago Linus Torvalds wrote:"
> How about something simple like:
>
> if (SMALL_REGISTER_SET)
> never_inline_functions_unless_the_user_asked_for_it();
>
> which means that even with -O6 you would not inline functions unless they
> were marked inline.

Too much. I rely on gcc (and other compilers) inlining functions that are
only placeholders for other calls to optimize generated code.

I need functions inlined if they're dead-ends, or call other functions
and return the foreign result without further calculation. I.e. they
don't need to save and restore registers because they're not using any.
That's about a 3-4 times speedup win for me (my compiler-compilers ..).
I produce lots of short functions from expression analysis. "a+b+c"
might generate 6 functions, one for each subexpression and one for each
referent.
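
As an illustration only: the message does not spell out the decomposition, so the six functions below are one plausible reading of it, with hypothetical names throughout. Each function either reads a referent or forwards the result of other calls, which is the shape that benefits from automatic inlining.

```c
/* Referents: one accessor per variable (hypothetical names). */
static int val_a, val_b, val_c;

static int ref_a(void) { return val_a; }
static int ref_b(void) { return val_b; }
static int ref_c(void) { return val_c; }

/* Subexpressions: each only combines or forwards other calls. */
static int expr_ab(void)  { return ref_a() + ref_b(); }
static int expr_abc(void) { return expr_ab() + ref_c(); }

/* Pure forwarder: returns the callee's result without further work. */
static int expr_top(void) { return expr_abc(); }
```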

If gcc can't detect the (tail-recursion at worst) case, its dataflow
analysis is fundamentally broken. Given that it is broken, possibly
only for the assembler case (?? - isn't gcc using its
internal representation at this point?), it should at least be able to say
"I'm not sure" and fall back to saving/restoring registers.

> Note that this is not just a workaround for a bug. The fact is, that with

As phrased, it is! It looks more like one should just stop gcc being so
optimistic when its data analysis doesn't warrant it.

> small-register-set, inlining functions is not likely to be all that big of
> a win (and is often a loss due to register allocation pressure) unless the
> function is _really_ small or for some special cases - and in both cases
> hopefully the function is marked inline already by the knowledgeable user.

The knowledgeable user relies on the knowledgeable compiler here. All
studies show that hand-optimizing code is a waste of effort 99.99% of
the time, not least because it's unmaintainable. Experience says so
too. I've regretted every "optimization" I've ever done.

Peter

Andrea Arcangeli

Nov 9, 1998

On Sun, 8 Nov 1998, Linus Torvalds wrote:

>Note that this is not just a workaround for a bug. The fact is, that with

>small-register-set, inlining functions is not likely to be all that big of
>a win (and is often a loss due to register allocation pressure) unless the
>function is _really_ small or for some special cases - and in both cases
>hopefully the function is marked inline already by the knowledgeable user.

And that is without mentioning CPU cache issues. In most cases, inlining
everything causes tons of code to be kicked out of the CPU cache, and this
hurts more than saving some parameters on the stack and `call'ing the
function.

Andrea Arcangeli

Bernd Schmidt

Nov 9, 1998

On Sun, 8 Nov 1998, Linus Torvalds wrote:

> On Sun, 8 Nov 1998, Richard Henderson wrote:
> >
> > Unfortunately, those patches are way too massive to go back to the
> > 1.1 branch. For that I don't really know what to do.
>

> How about something simple like:
>
> if (SMALL_REGISTER_SET)
> never_inline_functions_unless_the_user_asked_for_it();
>
> which means that even with -O6 you would not inline functions unless they
> were marked inline.
>

> Note that this is not just a workaround for a bug.

It's not even a workaround for a bug. Function inlining has nothing to
do with the problem. The following testcase (gcc.dg/clobbers.c in the egcs
testsuite) demonstrates that:

int main ()
{
  int i;
  __asm__ ("movl $1,%0\n\txorl %%eax,%%eax" : "=r" (i) : : "eax");
  if (i != 1)
    abort ();
  __asm__ ("movl $1,%0\n\txorl %%ebx,%%ebx" : "=r" (i) : : "ebx");
  if (i != 1)
    abort ();
  __asm__ ("movl $1,%0\n\txorl %%ecx,%%ecx" : "=r" (i) : : "ecx");
  if (i != 1)
    abort ();
  __asm__ ("movl $1,%0\n\txorl %%edx,%%edx" : "=r" (i) : : "edx");
  if (i != 1)
    abort ();
  __asm__ ("movl $1,%0\n\txorl %%esi,%%esi" : "=r" (i) : : "esi");
  if (i != 1)
    abort ();
  __asm__ ("movl $1,%0\n\txorl %%edi,%%edi" : "=r" (i) : : "edi");
  if (i != 1)
    abort ();
  return 0;
}

It might be possible to construct a workaround for egcs-1.1, but it would most
likely be rather ugly, and it doesn't help people who use gcc-2.7.2 or
egcs-1.0.3. The simplest workaround is to avoid using clobbers on the x86.
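
One way to avoid a clobber is to declare the scratch register as a dummy output instead. The sketch below rewrites the first case of the testcase that way; it shows a common technique and is not taken from the thread, and the function name `load_one_zeroing_eax` is hypothetical. The non-x86 branch is an assumption so that the sketch compiles elsewhere.

```c
/*
 * Same effect as the first asm in the testcase above, but eax is
 * described to the compiler as an early-clobbered dummy output instead
 * of appearing in the clobber list.
 */
static int load_one_zeroing_eax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    int i;
    unsigned int scratch;   /* stands in for the "eax" clobber */

    __asm__ ("movl $1,%0\n\txorl %1,%1"
             : "=r" (i), "=&a" (scratch)
             :
             : "cc");
    (void)scratch;
    return i;
#else
    /* Portable fallback (assumption). */
    return 1;
#endif
}
```

Since two outputs never share a register, the compiler has to place `i` somewhere other than eax, which is what the clobber was meant to guarantee.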

Bernd

kwro...@ce.mediaone.net

Nov 9, 1998

And lo, David S. Miller saith unto me:

>
> Date: Sun, 8 Nov 1998 23:21:39 -0800 (PST)
> From: Linus Torvalds <torv...@transmeta.com>
>

> How about something simple like:
>
> if (SMALL_REGISTER_SET)
> never_inline_functions_unless_the_user_asked_for_it();
>
> which means that even with -O6 you would not inline functions
> unless they were marked inline.
>

> I think I'd rather tell people "don't compile the kernel with -O6"
> than turn off -finline-functions for -O6 by default on any machine.
>

Why not just recommend "-O6 -fno-inline-functions" like we did with the
strength reduction bug? (Insert the right switch to disable the
aggressive inlining of -O6 but not disable explicitly requested
inlining here...and in the compiler code, if it's not there already.)

Maybe even add a "-OK" or "-OL" optimization level for code which has
already been hand-optimized in these ways? Of course, we'd also want
to just use -O6 for code that has not been so heavily optimized vis a
vis inlining on Intel.

Keith

Linus Torvalds

Nov 9, 1998

On Mon, 9 Nov 1998, David S. Miller wrote:
>
> I think I'd rather tell people "don't compile the kernel with -O6"
> than turn off -finline-functions for -O6 by default on any machine.

Umm, I thought egcs turned inlining on much more aggressively - somebody
said it happens with the default kernel compile flags (-O2).

If it only happens for -O6, I certainly agree.

Linus

Richard Henderson

Nov 9, 1998

On Mon, Nov 09, 1998 at 03:39:14PM -0600, kwro...@ce.mediaone.net wrote:
> Why not just recommend "-O6 -fno-inline-functions" like we did with the
> strength reduction bug?

Because that is exactly -O2. Unless you are using pgcc, which has
its own idea about the world, there are three levels of optimization

-O1 stuff
-O2 more stuff, instruction scheduling
-O3 automatic function inlining.

r~

Horst von Brand

Nov 9, 1998

h...@lucon.org (H.J. Lu) said:

[...]

> The problem starts when egcs 1.1.1 inlines calibrate_tsc. For

To inline an __initfunc is madness... is there any way to prevent this?
Other than not using more than -O2, and trusting your friendly kernel
hackers to have selected functions to inline right, that is.
--
Horst von Brand vonb...@sleipnir.valparaiso.cl
Casilla 9G, Viña del Mar, Chile +56 32 672616

Tom Vier

Nov 10, 1998

> It is not a good idea to compile the kernel with -O6. -O6 includes
> -finline-functions, and automatic inlining messes up a lot of fast
> paths (mainly because egcs is not able to do register life splitting
> yet)

i thought -O3 turned on inline functions? is egcs different than gcc,
in that respect?

gcc man page:
       -O3    Optimize yet more. This turns on everything -O2
              does, along with also turning on -finline-functions.

> So just use -O2 instead of -O6.

just curious; i heard that freebsd always defaults to -O3 for their
kernel. is this cuz they code their kernel different, taking
-finline-functions into account, or cuz they just like to throw the
highest opt level on it?

--
Tom Vier - 0x82B007A8
nes...@sekurity.org | goto the Zero Page at:
Tortured Souls Software | http://www.erols.com/thomassr/zero/

Jeffrey A Law

Nov 10, 1998

In message <Pine.LNX.3.95.981109...@penguin.transmeta.com> you write:
>
>
> On Mon, 9 Nov 1998, David S. Miller wrote:
> >
> > I think I'd rather tell people "don't compile the kernel with -O6"
> > than turn off -finline-functions for -O6 by default on any machine.
>
> Umm, I thought egcs turned inlining on much more aggressively - somebody
> said it happens with the default kernel compile flags (-O2).

Who told you that? They were either clueless about gcc or were using a
hacked version of gcc.

> If it only happens for -O6, I certainly agree.

toplev.c::

    if (optimize >= 3)
      {
        flag_inline_functions = 1;
      }
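
To make the consequence concrete, here is a small model of that test. Only the `optimize >= 3` comparison comes from the quoted source; `struct opt_flags` and `flags_for_level` are hypothetical stand-ins for the compiler's option state, not code from toplev.c.

```c
/* Hypothetical stand-in for the compiler's option state. */
struct opt_flags {
    int optimize;
    int flag_inline_functions;
};

/* Applies the quoted test to a given -O level. */
static struct opt_flags flags_for_level(int level)
{
    struct opt_flags f = { level, 0 };

    if (f.optimize >= 3)
        f.flag_inline_functions = 1;
    return f;
}
```

Under this model -O3 and -O6 set the flag identically, and -O2 leaves it off.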

jef

Hans Lermen

Nov 10, 1998

On Mon, 9 Nov 1998, Peter T. Breuer wrote:

> ... All

> studies show that hand-optimizing code is a waste of effort 99.99% of
> the time, not least because it's unmaintainable.

Smells a bit like an excuse for laziness. For sure you have cases where
the programmer 'knows it better' than the compiler and for sure such
cases happen during kernel programming. If you can't tell the compiler
that you are the boss, you get lost.

And a final point: If you know the kernel was written/tested/proven with
-O2 and the Makefiles explicitly set -O2, then using -O6 and complaining
it's not running is just ignoring the fact that the 'author' knows it
better.

Hans
<ler...@fgan.de>

Jamie Lokier

Nov 10, 1998

On Sun, Nov 08, 1998 at 11:21:39PM -0800, Linus Torvalds wrote:
> How about something simple like:
>
> if (SMALL_REGISTER_SET)
> never_inline_functions_unless_the_user_asked_for_it();
>
> which means that even with -O6 you would not inline functions unless they
> were marked inline.

Then just add `-fno-inline-functions' to the makefile instead. That's
distinct from `-fno-inline', which controls attention to explicit
`inline'.

Disclaimer: it *should* work... ;-)

> Note that this is not just a workaround for a bug.

By the sound of it, it's not a workaround at all. It suppresses the bug
in one case, but the bug is still present and may crop up again.

I'd still recommend -fno-inline-functions on grounds that the kernel is
full of inline fast cases calling out-of-line slow cases rarely, and
that's how we want it.

-- Jamie

Andi Kleen

Nov 10, 1998

On Tue, Nov 10, 1998 at 04:03:01AM +0100, Tom Vier wrote:
> > It is not a good idea to compile the kernel with -O6. -O6 includes
> > -finline-functions, and automatic inlining messes up a lot of fast
> > paths (mainly because egcs is not able to do register life splitting
> > yet)
>
> i thought -O3 turned on inline functions? is egcs different than gcc,
> in that respect?

-On,n>3 is identical to -O3.

> just curious; i heard that freebsd always defaults to -O3 for their
> kernel. is this cuz they code their kernel different, taking
> -finline-functions into account, or cuz they just like to throw the
> highest opt level on it?

AFAIK that is wrong. FreeBSD uses -O for better debugging.

-Andi

Andi Kleen

Nov 10, 1998

In article <Pine.LNX.3.95.981109...@penguin.transmeta.com>, Linus Torvalds <torv...@transmeta.com> writes:
> On Mon, 9 Nov 1998, David S. Miller wrote:
>>
>> I think I'd rather tell people "don't compile the kernel with -O6"
>> than turn off -finline-functions for -O6 by default on any machine.

> Umm, I thought egcs turned inlining on much more aggressively - somebody
> said it happens with the default kernel compile flags (-O2).

To clarify:
-O2 in egcs doesn't include -finline-functions.
What it includes is -fgcse, which may have similar effects (using more
registers, leading to problems with the register allocator)

Richard B. Johnson

Nov 10, 1998

On Tue, 10 Nov 1998, Hans Lermen wrote:

> On Mon, 9 Nov 1998, Peter T. Breuer wrote:
>
> > ... All
> > studies show that hand-optimizing code is a waste of effort 99.99% of
> > the time, not least because it's unmaintainable.
>
> Smells a bit like an excuse for laziness. For sure you have cases where
> the programmer 'knows it better' than the compiler and for sure such
> cases happen during kernel programming. If you can't tell the compiler
> that you are the boss, you get lost.
>
> And a final point: If you know the kernel was written/tested/proven with
> -O2 and the Makefiles explicitly set -O2, then using -O6 and complaining
> it's not running is just ignoring the fact that the 'author' knows it
> better.

If instead of using occasional 'inline asm' stuff, specific to a compiler,
some critical functions were written in real assembly with an assembler
that produced code exactly as written, using the language that the
processor designer specified, i.e., Intel for Intel, then there could
be improvements in the execution speed. Further, such functions would
be maintainable because the code produced is always what was written,
not some interpretation made by a compiler.

At Analogic, we write software that interacts with machines. Typically
there is an assembly-language wrapper around each piece of hardware
that makes each hardware assembly seem 'perfect'. This wrapper (you
might call it a driver) handles all of the device-specific aspects of
the hardware. Its interface to upper-level code uses 'C' calling
rules so that the upper levels, which deal with the logic necessary
to make the whole machine function, can be (are) written in the
'C' language.

The result is that capable 'real-time' software engineers can optimize
the hardware interface while maintaining the exact functionality
necessary for the 'system' software engineers who design the upper
levels.

For example, a typical CAT Scanner may have hundreds of special
communications and control boards that must function in a time-
critical manner to maintain system-wide pipe-lining to maximize
throughput, while maintaining safety.

One 'board' might be an A/D converter with a 16-channel MUX. The MUX must
be set, a MUX settle time must occur, the converter is told to convert,
one must wait for the conversion to complete, the converted result is
read, it is filtered, then written as a storage variable.

This all happens "auto-magically" within a timer ISR. When upper-level
code wants voltage-X or current-Y, it makes a function-call to get
it and it's simply read from RAM. The converter is never disturbed.

Further, the necessary "wait for results" shown above never wastes
CPU time because no code ever waits or loops within an ISR. The
required "waits" are just "do-nothing" states maintained within
the ISR where it gets called, but after checking a state-variable,
returns.
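
The non-blocking state machine described above can be sketched as follows. The message argues for writing this in assembly; the C version is only an illustration of the control flow. Everything in it is hypothetical: the device is simulated in `struct adc_dev`, the one-tick settle and conversion times are arbitrary, and the 1/4 filter is just a placeholder.

```c
#include <stdint.h>

#define ADC_CHANNELS 16

enum adc_state { ADC_SET_MUX, ADC_SETTLE, ADC_START, ADC_WAIT_DONE, ADC_READ };

struct adc_dev {                 /* simulated hardware (hypothetical) */
    unsigned mux;
    int busy_ticks;
    uint16_t sample;
};

struct adc_ctx {
    enum adc_state state;
    unsigned channel;
    int32_t filtered[ADC_CHANNELS];
    struct adc_dev dev;
};

/* Called once per timer tick; never loops or waits. */
static void adc_timer_isr(struct adc_ctx *c)
{
    switch (c->state) {
    case ADC_SET_MUX:
        c->dev.mux = c->channel;
        c->state = ADC_SETTLE;
        break;
    case ADC_SETTLE:             /* do-nothing tick: MUX settle time */
        c->state = ADC_START;
        break;
    case ADC_START:              /* tell the converter to convert */
        c->dev.busy_ticks = 1;
        c->dev.sample = (uint16_t)(100 * (c->dev.mux + 1));
        c->state = ADC_WAIT_DONE;
        break;
    case ADC_WAIT_DONE:          /* check and return, never spin */
        if (c->dev.busy_ticks > 0) {
            c->dev.busy_ticks--;
            break;
        }
        c->state = ADC_READ;
        break;
    case ADC_READ:               /* read, filter, store, next channel */
        c->filtered[c->channel] +=
            ((int32_t)c->dev.sample - c->filtered[c->channel]) / 4;
        c->channel = (c->channel + 1) % ADC_CHANNELS;
        c->state = ADC_SET_MUX;
        break;
    }
}

/* Upper-level code only ever reads the stored value from RAM. */
static int32_t adc_get(const struct adc_ctx *c, unsigned ch)
{
    return c->filtered[ch];
}
```

Each "wait" costs one ISR entry that checks the state variable and returns, so no CPU time is spent polling the converter.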

Such state-machines and real-time hardware interfaces can be readily
implemented and optimized in assembly. If you try to do the same thing
in 'C', you can only use 'pseudo-asm' inside a 'C' wrapper. The
results are specific to a 'C' compiler version which makes
maintainability a nightmare.

I have a checksum routine which will checksum a 1500 MTU packet
(or less) in a single pass. It uses a computed jump into the
correct location of an aligned instruction-stream. This is used
in my network communications interface on an Analogic product
that received FAA certification two weeks ago.

There are no secrets on how this is done. Any competent assembly-
language programmer would use this technique as a trade-off of
RAM vs. speed. The GNU pseudo-assem prevents me from porting this
to Linux.
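
The routine itself is not shown in the thread. As a rough C analogue of jumping into the middle of an unrolled instruction stream, the sketch below uses Duff's device, which is a different technique: it keeps a loop around an eight-way unrolled body, where the described routine unrolls fully and has no loop at all. The function name `csum16` and the choice of the 16-bit one's-complement Internet checksum are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over nwords 16-bit words. The switch jumps
 * into the unrolled body at the point that leaves a whole number of
 * eight-word passes. The 32-bit accumulator cannot overflow for packets
 * of 1500 bytes (750 words) or less.
 */
static uint16_t csum16(const uint16_t *w, size_t nwords)
{
    uint32_t sum = 0;
    size_t passes = (nwords + 7) / 8;

    if (nwords != 0) {
        switch (nwords % 8) {
        case 0: do { sum += *w++;
        case 7:      sum += *w++;
        case 6:      sum += *w++;
        case 5:      sum += *w++;
        case 4:      sum += *w++;
        case 3:      sum += *w++;
        case 2:      sum += *w++;
        case 1:      sum += *w++;
                } while (--passes > 0);
        }
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The fully unrolled assembly version trades more code space for the removal of the loop branch, which is the RAM versus speed trade-off mentioned above.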

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.127 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.

David S. Miller

Nov 10, 1998

Date: Tue, 10 Nov 1998 10:01:23 +0100 (MET)
From: Hans Lermen <ler...@elserv.ffm.fgan.de>

On Mon, 9 Nov 1998, Peter T. Breuer wrote:

> ... All
> studies show that hand-optimizing code is a waste of effort 99.99% of
> the time, not least because it's unmaintainable.

I'm so glad I've never bought into this line of reasoning...

Later,
David S. Miller
da...@dm.cobaltmicro.com

Richard B. Johnson

Nov 10, 1998

On Tue, 10 Nov 1998, Alex Buell wrote:

> On Tue, 10 Nov 1998, Richard B. Johnson wrote:
>
> > There are no secrets on how this is done. Any competent assembly-
> > language programmer would use this technique as a trade-off of RAM vs.
> > speed. The GNU pseudo-assem prevents me from porting this to Linux.
>

> Why not use the GNU as? Won't this do what you require, Richard?
>
> Cheers,
> Alex.

Because I'll be damned if I'll let perfectly correct Intel assembly-
language code get mangled to:

xxx.asm: Assembler messages:
xxx.asm:6: Error: operands given don't match any known 386 instruction
xxx.asm:9: Error: Ignoring junk '[bp-10]' after expression
xxx.asm:9: Error: operands given don't match any known 386 instruction

... by an assembler that doesn't know Intel Assembly, but pretends so.

It is theoretically possible to convert correct code to GNU `as` junk,
however, the damn thing doesn't even do MACROs so if I am going to
make:
adc eax, [ebx+1000]
adc eax, [ebx+996]
adc eax, [ebx+992]

... etc.. a thousand times I would certainly
want to use another tool. It also doesn't know how to write
a byte to a memory location, i.e., it doesn't know about the PTR
expression to tell it whether to write a byte, a word, or a longword
to a memory location when you do something like:

mov variable,0

That's why we have BYTE PTR, WORD PTR, DWORD PTR, etc. Otherwise a
zero could (and does) smash adjacent data. It's a very bad tool.

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.127 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.

Marc Lehmann

Nov 10, 1998

On Mon, Nov 09, 1998 at 02:16:52PM -0800, Richard Henderson wrote:
> On Mon, Nov 09, 1998 at 03:39:14PM -0600, kwro...@ce.mediaone.net wrote:
> > Why not just recommend "-O6 -fno-inline-functions" like we did with the
> > strength reduction bug?
>
> Because that is exactly -O2. Unless you are using pgcc, which has
> its own idea about the world, there are three levels of optimization

The main reason for -O6 in pgcc is that I can more easily test for bugs in egcs,
i.e. if pgcc breaks with -O2 -finline-functions it's most certainly a bug in
egcs. There is nothing wrong with -O99 and such..

OTOH, it seems pgcc firmly set the magical -O6 into the minds of the people,
and I can see that this is not the best thing ;(

-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / p...@goof.com |e|
-=====/_/_//_/\_,_/ /_/\_\ --+
The choice of a GNU generation |

Alex Buell

Nov 10, 1998

On Tue, 10 Nov 1998, Richard B. Johnson wrote:

> There are no secrets on how this is done. Any competent assembly-
> language programmer would use this technique as a trade-off of RAM vs.
> speed. The GNU pseudo-assem prevents me from porting this to Linux.

Why not use the GNU as? Won't this do what you require, Richard?

Cheers,
Alex.

---
/\_/\ Legalise cannabis now!
( o.o ) Grow some cannabis today!
> ^ < Peace, Love, Unity and Respect to all.

Check out http://www.tahallah.demon.co.uk
Linux lo-pc3035a 2.1.125 #6 Fri Oct 9 13:53:00 EDT 1998
One Intel Pentium 75+ processor, 66.36 total bogomips, 16M RAM
System library 5.4.44

David S. Miller

Nov 10, 1998

Date: Tue, 10 Nov 1998 10:28:02 -0500 (EST)
From: "Richard B. Johnson" <ro...@chaos.analogic.com>

There are no secrets on how this is done. Any competent assembly-
language programmer would use this technique as a trade-off of RAM
vs. speed. The GNU pseudo-assem prevents me from porting this to
Linux.

? Your application just would not be a good use of the inline asm
feature of gcc; it was not designed for this sort of task. You can
always choose some other method to achieve what you want.

Although, I can't think of one situation where I wanted to do
something incredibly crazy and grotesque in raw assembly on Sparc and
couldn't find a way to do it within a C source file with gcc. For
example I once had a version of the UltraSparc/VIS unrolled memcpy
completely in C using gcc, the arguments could be passed in, it
required 7 versions of the 300 instruction loop, etc. all in one hunk
of code, it had a return value, and it all worked.

I think the bottom line is that gcc inline asms are more powerful in
one aspect than other schemes, and in another aspect they are less
powerful than such schemes. It was a trade off in design when they
were initially implemented. Perhaps it's showing its age now, so
let's work on fixing it.

Later,
David S. Miller
da...@dm.cobaltmicro.com

Sascha Schumann

Nov 10, 1998

On Tue, 10 Nov 1998, Andi Kleen wrote:

> On Tue, Nov 10, 1998 at 04:03:01AM +0100, Tom Vier wrote:
>
> > just curious; i heard that freebsd always defaults to -O3 for their
> > kernel. is this cuz they code their kernel different, taking
> > -finline-functions into account, or cuz they just like to throw the
> > highest opt level on it?
>
> AFAIK that is wrong. FreeBSD uses -O for better debugging.

Not only for the kernel, but for the whole system. There are currently
some people who are considering switching the standard C compiler from gcc
2.7.2.1 to egcs 1.x. It'd be interesting to know if the *BSD kernels
suffer from problems similar to those of the Linux kernel.

Sascha

Richard B. Johnson

Nov 10, 1998

I think that assemblers (as opposed to compilers) should be specifically
written for a platform. We have at this company collectively, about
100 years of Intel Assembly expertise. To force a foreign language
(GNU pseudo-assembly) on long-time assembly-language experts tends
to dampen their interest in becoming involved in helping to streamline
sections of code that are frequently executed. One of our experts
tried to port working video controller code to Linux and gave up
in disgust. This was going to sign-on with a Penguin Logo and show
startup messages in a box.

Another wanted to rewrite a driver for a 100-base Ethernet card in
assembly for Linux. This also failed because there was just too much
work to do converting Intel code to GNU 'stuff'.

Any/all of this stuff could be done with a real assembler. You get
real performance benefits if entire procedures (functions) are written
in the native machine-language. Portability-buffs can use 'C' substitutes
to keep their noses "clean", but those who wanted to save every CPU
cycle for user-mode work, could get their rocks off by making the
fastest (name your machine) kernel in the world.

FYI RISC machines are a real pain to work with in Assembly because
of the "always-larger-size-than-you-wanted" memory access. However,
there are experts available for those machines also. Linux is, after
all, world-wide. It would be nice to have the guy (or gal) who wrote
the latest 64-bit compiler, tweak your kernel with the assembly-language
sequences that make the machine scream.

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.127 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.

Alex Buell

Nov 10, 1998

On Tue, 10 Nov 1998, Richard B. Johnson wrote:

> It is theoretically possible to convert correct code to GNU `as` junk,
> however, the damn thing doesn't even do MACROs so if I am going to
> make:
> adc eax, [ebx+1000]
> adc eax, [ebx+996]
> adc eax, [ebx+992]

How about:

for (i = 1000; i >= 0; i -= 4)
        asm { adc eax, [ebx+i] }

OK, I know that was stupid. There's an Intel assembler that runs on Linux
called nasm-0.97, have you tried it? It accepts Intel opcodes and outputs
in a number of object formats including COFF, ELF and MS's very own object
code. It does link quite well with gcc object files.

> That's why we have BYTE PTR, WORD PTR, DWORD PTR, etc. Otherwise a
> zero could (and does) smash adjacent data. It's a very bad tool.

That's why they have movb, movw, and movl et al.
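
A small sketch of what the size suffixes do, using gcc inline asm from C. The helper names are hypothetical, and the non-x86 branches are an assumption so that the sketch compiles elsewhere. In AT&T syntax the suffix plays the role of BYTE PTR, WORD PTR and DWORD PTR, so a store of zero touches exactly the stated number of bytes.

```c
#include <stdint.h>

/* movb: writes one byte, leaving its neighbours alone. */
static void zero_byte(uint8_t *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movb $0, %0" : "=m" (*p));
#else
    *p = 0;
#endif
}

/* movw: writes a 16-bit word. */
static void zero_word(uint16_t *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movw $0, %0" : "=m" (*p));
#else
    *p = 0;
#endif
}

/* movl: writes a 32-bit longword. */
static void zero_long(uint32_t *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl $0, %0" : "=m" (*p));
#else
    *p = 0;
#endif
}
```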

Cheers,
Alex

--
/\_/\ Legalise cannabis now!
( o.o ) Grow some cannabis today!
> ^ < Peace, Love, Unity and Respect to all.

http://www.tahallah.demon.co.uk - *new* - rewritten for text browser users!

Linux tahallah 2.1.127 #61 SMP Sat Nov 7 18:17:58 EST 1998
Two Intel Pentium Pro processors, 331.78 total bogomips, 48M RAM
System library 2.0.100