TEST: ARM vs Pentium, MIPS, HP-PA


Bedrijfskunde account

Jan 24, 1995, 7:01:36 AM
FROM: Wilco Dijkstra (csg...@wing.rug.nl)

Hi there,

A few weeks ago, a friend of mine wrote an article about the ARM6.
We decided that it was a good idea to test different RISC and
CISC machines to see for ourselves how fast an ARM *REALLY* is.
These are integer tests only, because the ARM doesn't have an FPU
on the chip (yet), and non-maths programs never use floating point,
because FP is always slower than integer calculations.
As the results amazed us, we made them available for anyone:


TEST RESULTS COMPARING ARM TO PENTIUM, MIPS AND HP-PA:
======================================================

Workstation      RiscPC 602     HP9000/720     PC             SGI Indy

Processor        ARM 610        PA RISC 1.1    Pentium        MIPS R4400

Architecture     32 bit         32 bit         32 bit         64 bit

Registers        16             32             4              32

Speed            30 MHz         50 MHz         90 MHz         150 MHz

Transistors      0.36 M         ???            3.1 M          2.3 M

Power            0.5 W @ 5V     ???            6 W @ 3.3V     12 W @ 3.3V

Cache            4 KB I+D       extern         8 KB I/        16 KB I/
                                128 KB I/      8 KB D/        16 KB D/
                                256 KB D       256 KB Ext     1 MB Ext

Busspeed (max)   64 MB/s        +-100 MB/s     +-120 MB/s     600 MB/s
Buswidth         32 bit         64 bit         64 bit         64 bit

OS               RISC-OS 3.5    UNIX           Windows-NT     IRIX

Compiler         Acorn C 4.00   GNU-C ?        GNU-C ?        GNU-C ?

Dhrystones 2.1   40 K           71 K           99 K           185 K
relative         1.00           1.80           2.50           4.60


Test 1: Normal Acorn C code against simple GNU optimisations (+O1)
The numbers mean x.xx times faster than a 30 MHz RiscPC.

                 ARM 610        PA RISC 1.1    Pentium        MIPS R4400

Hanoi relative   1.00           2.02           4.32           6.10

Prime rel        1.00           1.31           4.73           4.43

Sieve rel        1.00           0.36           1.59           1.47

QSort rel        1.00           0.83           2.13           2.94

Average rel      1.00           1.13           3.19           3.74


Test 2: Hand-optimized ARM code against maximum GNU optimisations (+O3)

                 ARM 610        PA RISC 1.1    Pentium        MIPS R4400

Hanoi relative   1.00           0.83           1.14           2.25

Prime rel        1.00           0.76           2.33           2.40

Sieve rel        1.00           0.70           1.60           2.54

QSort rel        1.00           1.80           1.48           3.81

Average rel      1.00           1.02           1.62           2.74


Relative average speed @ 30 MHz for optimized code = efficiency per MHz

                 ARM 610        PA RISC 1.1    Pentium        MIPS R4400

Efficiency rel   1.00           0.61 !         0.54 !         0.55 !
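
For reference, the efficiency row can be reproduced from the Test 2 averages by rescaling each machine to a 30 MHz clock. This is a minimal sketch in C, assuming (as the post does) that speed scales linearly with clock frequency; the function name is illustrative and not taken from the original test suite.

```c
#include <assert.h>

/* Relative speed a machine would show if it were clocked at ref_mhz,
 * under the assumption that speed scales linearly with clock. */
double efficiency_per_mhz(double relative_speed, double clock_mhz,
                          double ref_mhz)
{
    assert(clock_mhz > 0.0 && ref_mhz > 0.0);
    return relative_speed * ref_mhz / clock_mhz;
}
```

With the Test 2 averages, efficiency_per_mhz(1.02, 50.0, 30.0) gives about 0.61, efficiency_per_mhz(1.62, 90.0, 30.0) gives 0.54 and efficiency_per_mhz(2.74, 150.0, 30.0) gives about 0.55.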


All test programs are in C, except for the hand-optimized code. All tests
do something useful; they are examples of real programs. Note that the
routines are programmed for speed, so that smart compilers don't have too
much advantage. These tests test the processor speed, the primary and
secondary cache speed, and of course the compiler.

If you want these tests, write me. All tests are available for anyone
who wants to repeat them or test them on other systems. You can even
get the assembler versions for the Pentium, MIPS and HP and see how
optimal they are. I am VERY interested in the speed of the PowerPC and
the SuperSPARC. So if you happen to use one of these, please contact me.

As the ARM 610 is clocked at the lowest speed, has the smallest cache,
no secondary cache, no 64 bit bus, fewer hardware instructions (DIV!),
fewer registers, no writeback cache, and no full global optimizer in the
compiler, I felt it was *FAIR* to hand-optimize the ARM code.
The other compilers did very good global analysis (up to 2 minutes to
compile 10 lines of C!), while the Acorn compiler generates a procedure
traceback (very slow!), has stack checking, uses callee-saved registers
(the others use caller-saved, which is faster) and compiles quickly
(within 5 seconds). So I did some SIMPLE optimisations (which could be
done by a smarter compiler) like tail-recursion, strength-reduction of
multiply, optimal register allocation, no stack checking, inlining of
the modulo function and loop unrolling.
No special tricks were used, e.g. the algorithms stayed exactly the same,
and the number of iterations for each loop stayed the same, as did the
amount of memory needed. So the same amount of work is done for all systems.

Besides: programming in ARM code is VERY EASY compared to other systems
(e.g. fully optimised Pentium code is slightly slower than less optimised!).
The ARM is designed to be programmed easily in machine language.
I looked through the code of the Pentium, MIPS & PA. Their C compilers
do produce pretty optimal code. The HP even used memory pipelining: while
comparing a word that was just read, the next word was fetched from memory.

Towers of Hanoi: this is a recursive program that calculates the moves
to solve the problem (depth = 20, so one million calls). It merely tests
the speed of recursive calls. As I already mentioned, the Acorn compiler
produces pretty bad code for recursive functions. After optimisation the
routine was 3.8 times faster! Note that the optimized version is the
fastest Hanoi routine if all systems were clocked at the same speed.
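
The original Hanoi source was only available on request, so the following is just a sketch of what such a test might look like in C. It counts moves instead of printing them, and the second function shows the tail-recursion optimisation mentioned earlier: the last recursive call is replaced by a loop iteration. All names here are illustrative assumptions.

```c
#include <assert.h>

/* Recursive Towers of Hanoi for n discs; returns the number of moves. */
unsigned long hanoi(int n, int from, int to, int via)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    unsigned long moves = hanoi(n - 1, from, via, to);
    moves += 1;                       /* move disc n from 'from' to 'to' */
    return moves + hanoi(n - 1, via, to, from);
}

/* Same algorithm with the second recursive call turned into a loop. */
unsigned long hanoi_loop(int n, int from, int to, int via)
{
    assert(n >= 0);
    unsigned long moves = 0;
    while (n > 0) {
        moves += hanoi_loop(n - 1, from, via, to);
        moves += 1;
        /* hanoi(n - 1, via, to, from): swap 'from' and 'via', go round again */
        int tmp = from;
        from = via;
        via = tmp;
        n--;
    }
    return moves;
}
```

For depth 20 both versions report 2^20 - 1 = 1048575 moves, which is the "one million" figure quoted above.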

Prime: this routine calculates all primes up to 100000 by dividing
a suspected prime number by all possible divisors (up to the root of the
prime number of course). It uses one MUL and one MOD per iteration.
After inlining the MOD function and strength reducing the MUL, the
routine became twice as fast. Again, this routine is the fastest if
all systems are clocked at the same speed. This is actually
incredible, because all systems except for the ARM have HARDWARE
division. How is it possible that a SOFTWARE division is faster than
a HARDWARE division? I still can't explain why. Anyone?
(It is not because of the strength-reduction of the multiply, because
the biggest value of its operands is the root of 100000 = 316, so
it would take 10 / 2 + 1 = 6 cycles. The strength-reduced version uses
2 cycles. The modulo takes much more time, like 25 cycles on average.
It uses 2 cycles per bit, e.g. 1024 % 4 would take 16 cycles.)
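
To make the two optimisations concrete, here is a rough C sketch of a trial-division prime counter. It assumes the multiply in the original loop was the d*d <= n bound check (the post does not say), and it uses a generic shift-and-subtract remainder in place of the hand-written ARM modulo routine. Function names are illustrative.

```c
#include <assert.h>

/* Shift-and-subtract remainder, the usual way to divide on a CPU
 * without a divide instruction: one loop pass per quotient bit. */
unsigned soft_mod(unsigned n, unsigned d)
{
    assert(d != 0);
    unsigned shifted = d;
    while (shifted <= n / 2)          /* line the divisor up under n */
        shifted <<= 1;
    while (shifted >= d) {            /* walk back down to d */
        if (n >= shifted)
            n -= shifted;
        shifted >>= 1;
    }
    return n;
}

/* Count primes up to limit by trial division. The square of the divisor
 * is strength-reduced: (d+1)^2 = d^2 + 2d + 1, so it is updated with
 * additions instead of a multiply. */
unsigned count_primes(unsigned limit)
{
    unsigned count = 0;
    for (unsigned n = 2; n <= limit; n++) {
        int is_prime = 1;
        unsigned square = 4;          /* d*d for d = 2 */
        for (unsigned d = 2; square <= n; d++) {
            if (soft_mod(n, d) == 0) {
                is_prime = 0;
                break;
            }
            square += 2 * d + 1;      /* now (d+1)*(d+1) */
        }
        count += is_prime;
    }
    return count;
}
```

The number of passes through the second loop in soft_mod grows with the difference in bit length between the operands, which is the same effect as the "cycles per bit" cost described above.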

Sieve: this is a routine that implements the Sieve of Eratosthenes.
All primes up to 1000000 are calculated. The routine tests mainly
the speed of byte-stores. This routine couldn't be made any faster,
because of the write buffer, and the pretty small 5 million byte-stores
per second for the RiscPC. It shows that the Pentium does have a fast
byte-store to the secondary cache. Probably a 6 cycle store at 90 MHz.
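
A byte-array sieve of the kind described might look like the sketch below. This is not the original test program; the function name and the use of calloc are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Sieve of Eratosthenes over a byte array. Each crossed-out number costs
 * one byte store, which is why the test stresses byte-store speed.
 * Returns the number of primes <= limit, or 0 if allocation fails. */
unsigned long sieve_count(unsigned long limit)
{
    assert(limit >= 2);
    unsigned char *composite = calloc(limit + 1, 1);
    if (composite == NULL)
        return 0;
    unsigned long count = 0;
    for (unsigned long i = 2; i <= limit; i++) {
        if (composite[i])
            continue;
        count++;
        if (i > limit / i)            /* i*i would exceed limit */
            continue;
        for (unsigned long j = i * i; j <= limit; j += i)
            composite[j] = 1;         /* the byte store being measured */
    }
    free(composite);
    return count;
}
```

For a limit of 1000000 the flag array is about 1 MB, so most stores miss a small on-chip cache and land in the secondary cache or main memory.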

QuickSort: a version of the well known sorting algorithm. As 100000
random numbers were sorted, it takes 400 KB of memory, which seemed
to fit well in most secondary caches. Not for the RiscPC, of course.
The hand-optimisations include loop unrolling and tail-recursion, which
resulted in a 40% speed improvement. The Pentium probably uses 9-6-6-6
cycle cache refill. I had expected more from the 64bit secondary
cache. Only the MIPS performs well, with 75 MHz (10 ns) cache and
64 bit 4-2-2-2 cycle bursts at 150 MHz!
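
As an illustration of tail-recursion removal in quicksort, here is a minimal C sketch. The pivot choice and partition scheme are assumptions, since the post does not describe them; only the idea of looping on one partition instead of recursing comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Quicksort on ints, sorting a[lo..hi] inclusive. The call on the larger
 * partition would be a tail call, so it is replaced by a loop; only the
 * smaller partition is handled by a real recursive call. */
void quicksort(int *a, ptrdiff_t lo, ptrdiff_t hi)
{
    assert(a != NULL);
    while (lo < hi) {
        int pivot = a[lo + (hi - lo) / 2];
        ptrdiff_t i = lo, j = hi;
        while (i <= j) {              /* partition around the pivot value */
            while (a[i] < pivot)
                i++;
            while (a[j] > pivot)
                j--;
            if (i <= j) {
                int t = a[i];
                a[i] = a[j];
                a[j] = t;
                i++;
                j--;
            }
        }
        if (j - lo < hi - i) {
            quicksort(a, lo, j);      /* smaller left part: recurse */
            lo = i;                   /* larger right part: loop */
        } else {
            quicksort(a, i, hi);      /* smaller right part: recurse */
            hi = j;                   /* larger left part: loop */
        }
    }
}
```

With 4-byte integers, 100000 elements occupy 400000 bytes, which matches the 400 KB working set quoted above.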

The results indicate that an ARM610 is not that slow. Actually, a simple
30 MHz processor can keep pace with 90 MHz and even 64bit 150 MHz
processors... If all processors were clocked at 30 MHz, the ARM is the
most efficient processor, no matter what type of test is done.
At first we couldn't believe these results, and ran all
tests again and again. We thought the timings of the UNIX systems
were bad, but they weren't. We use the process timings, so only the
time the CPU spends in the test process is counted. All tests were run on
single user machines, so there was almost no process swapping. The
overhead for interrupts is indeed bigger on UNIX machines, but this is
mainly the result of a slow interrupt system. All tests were run at
least five times, and for at least 10 seconds to get accurate results.

A faster 64bit bus and a secondary writeback cache would make the ARM a
lot faster for the Quicksort and Sieve test. A higher clock will make both
Hanoi and Prime faster.
An ARM7 at 55 MHz will be FASTER than a Pentium and will get very
close to the 64bit MIPS!

The Hanoi program should be a hint for Acorn: their compiler can be made
much faster by better register allocation (please avoid loading the same
value multiple times into the same unchanged register), by adding an
option to avoid storing the function arguments unnecessarily, and by
implementing real tail recursion.


My conclusions: although the ARM seems like a toy to the "more MHz = more
speed" boys, it will be a real competitor in the future. With a few really
good enhancements to the ARM (like zero or one cycle load/store/jump and
going asynchronous) it will be one of the fastest processors in the world.


Wilco Dijkstra (csg...@wing.rug.nl - Student IT - ARMPowered)

Having a good programmer and simple hardware is better than
having an average programmer and fast hardware!

Olly Betts

Jan 24, 1995, 8:41:47 AM
In article <bdk.1.7...@rcondw.rug.nl>,

Bedrijfskunde account <b...@RcOndw.Rug.NL> wrote:
>(within 5 seconds). So I did some SIMPLE optimisations (which could be
>done by a smarter compiler) like tail-recursion, strength-reduction of
>multiply, optimal register allocation, no stack checking, inlining of
>the modulo function and loop unrolling.

Acorn C v4 should be able to spot tail-recursion, although you might
need to make it blatant (e.g. return recursed_function(); ). You can
also disable stack checking with #pragma no_check_stack (not #pragma
nocheck_stack as the manual implies).
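
To show what "blatant" means here, the sketch below contrasts a recursive function whose last action is not the call with one where the recursive call is the whole return statement. It is generic C rather than anything specific to Acorn C v4, and the function names are made up for illustration; the #pragma mentioned above is left out because other compilers do not know it.

```c
#include <assert.h>

/* Not a tail call: the multiply happens after the recursive call
 * returns, so every level needs its own stack frame. */
unsigned long fact_plain(unsigned n)
{
    if (n <= 1)
        return 1UL;
    return n * fact_plain(n - 1);
}

/* Blatant tail call: the recursive call is the entire return statement,
 * so a compiler that spots tail recursion can turn it into a jump. */
unsigned long fact_acc(unsigned n, unsigned long acc)
{
    assert(acc != 0);
    if (n <= 1)
        return acc;
    return fact_acc(n - 1, acc * n);
}
```

Calling fact_acc(10, 1) gives the same result as fact_plain(10), but the accumulator form leaves nothing to do after the recursive call.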

Olly
--
Putting the "no" in technology.

Nathan D. Tuck

Jan 25, 1995, 4:54:15 PM
In article <bdk.1.7...@RcOndw.Rug.NL>,
Bedrijfskunde account <b...@RcOndw.Rug.NL> wrote:
>FROM: Wilco Dijkstra (csg...@wing.rug.nl)

>
>Test 2: Handoptimized ARM code against maximum GNU optimisations (+O3)
>
>Hanoi relative 1.00 0.83 1.14 2.25
>
>Prime rel 1.00 0.76 2.33 2.40
>
>Sieve rel 1.00 0.70 1.60 2.54
>
>QSort rel 1.00 1.80 1.48 3.81
>
>Average rel 1.00 1.02 1.62 2.74
>
>
>Relative average speed @ 30 Mhz for optimized code = efficiency per MHz
>
>Efficiency rel 1.00 0.61 ! 0.54 ! 0.55 !
>
Your results are impressive, and one certainly wouldn't expect the
results to be so close. However, efficiency per MHz is a misleading
figure to throw around. It assumes that as the ARM chip goes faster,
the bus and memory increase speed at the same rate. This simply is
not the case.
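
The objection can be illustrated with a toy model in C: run time is split into core time, which shrinks as the clock rises, and memory stall time, which is fixed in nanoseconds. The model and every number fed into it are assumptions for illustration only, not measurements from any of the machines in this thread.

```c
#include <assert.h>

/* Toy model: run time = core time (scales with clock) + memory stall
 * time (fixed in nanoseconds, independent of the CPU clock). */
double run_time_ns(double core_cycles, double clock_mhz, double stall_ns)
{
    assert(clock_mhz > 0.0);
    return core_cycles * 1000.0 / clock_mhz + stall_ns;
}

/* Speed-up obtained by raising the clock from old_mhz to new_mhz. */
double speedup(double core_cycles, double stall_ns,
               double old_mhz, double new_mhz)
{
    return run_time_ns(core_cycles, old_mhz, stall_ns) /
           run_time_ns(core_cycles, new_mhz, stall_ns);
}
```

With 30 million core cycles and half a second of memory stalls, speedup(30e6, 0.5e9, 30.0, 90.0) is 1.8 rather than 3; only with zero stall time does tripling the clock triple the speed.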

> If you want these tests, write me. All tests are available for anyone
>who wants to repeat them or test them on other systems. You can even
>get the assembler versions for the Pentium, MIPS and HP and see how
>optimal they are. I am VERY interested in the speed of the PowerPC and
>the SuperSPARC. So if you happen to use one of these, please contact me.

If you want to see PowerPC and SuperSparc (unless you mean
UltraSparc), I could easily provide you with some benches if you send
your code.

> As the ARM 610 is clocked at the lowest speed, has the smallest cache,
>no secondary cache, no 64 bit bus, less hardware instructions (DIV!),
>less registers, no writeback cache, and no full global optimizer in the
>compiler, I felt it was *FAIR* to handoptimize the ARM code.

Maybe a little bit unfair: any assembly bigot will tell you that he
can write faster code than any compiler. Sounds like we need a GNU
ARM compiler.

> The results indicate that an ARM610 is not that slow. Actually, a simple
>30 MHz processor can keep pace with 90 Mhz and even 64bit 150 MHz
>processors... If all processors were clocked at 30 MHz, the ARM is the
>most efficient processor, no matter what type of test is done.

If all the other processors were clocked at 30 MHz, all their
operations would complete in 1 cycle, and they would be barely hurt at
all by cache misses. The compilers would know this, and the ARM would
most likely NOT be in first. Processor speed and performance are not
linearly related, at least with current technology (and particularly
faster MHz current technology).

> An ARM7 at 55 MHz will be FASTER than a Pentium and will get very
>close to the 64bit MIPS!

If you have an ARM7 running at 55 MHz, I would love to see the
timings. Otherwise I think it is a bit premature to say that.

Nate
nt...@hmc.edu


Michael Williams

Jan 26, 1995, 6:48:44 AM
In article <3g6ha7$t...@jaws.cs.hmc.edu> nt...@muddcs.cs.hmc.edu (Nathan D. Tuck) writes:
>> An ARM7 at 55 MHz will be FASTER than a Pentium and will get very
>>close to the 64bit MIPS!
>
>If you have an ARM7 running at 55 Mhz, I would love to see the
>timings. Otherwise I think it is a bit premature to say that.

Various people were running ARM710s at 55MHz in Acorn Risc PCs at last
year's "Acorn World" show.

A 55MHz ARM would pull in the region of 49 dhrystone MIPS (86-87K
dhrystones). (The 8 KB cache on the ARM710 [vs. 4 KB on ARM610] also
means it will perform better with 'real' applications at this speed
than the ARM610 would.)
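
The two figures are related by the usual Dhrystone convention: the score in Dhrystones per second is divided by 1757, the score of the VAX 11/780 that serves as the 1 MIPS reference. A small C sketch of that conversion (the function and macro names are illustrative):

```c
#include <assert.h>

/* Dhrystones per second scored by the VAX 11/780, the customary
 * "1 MIPS" reference machine for Dhrystone results. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

double dhrystone_mips(double dhrystones_per_sec)
{
    assert(dhrystones_per_sec >= 0.0);
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}
```

dhrystone_mips(86000.0) is about 48.9 and dhrystone_mips(87000.0) about 49.5, consistent with the "49 dhrystone MIPS" quoted for 86-87K dhrystones.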

Mike.
____________________________________________________________________________
\ x / Michael Williams Advanced RISC Machines Limited
|\/|\/\ mwil...@armltd.co.uk Fulbourn Road, Cambridge, CB1 4JN, UK
| |(__)"I might well think that Matti, ARM Ltd. couldn't possibly comment."

Michael Williams

Jan 26, 1995, 6:59:11 AM
In article <19950126....@mipc-08.brunel.ac.uk> ee91...@brunel.ac.uk writes:
>In article <bdk.1.7...@RcOndw.Rug.NL>,

> b...@RcOndw.Rug.NL (Bedrijfskunde account) wrote:
>
>> As the ARM 610 is clocked at the lowest speed, has the smallest cache,
>> no secondary cache, no 64 bit bus, less hardware instructions (DIV!),
>> less registers, no writeback cache
> ~~~~~~~~~~~~~~~~~~
>Yes it does.

Clarification needed here, I think. ARM610 does *not* have a writeback
cache. It has a write-through cache, and a write-buffer, which is not
the same thing. (The write-buffer is sometimes referred to as a write
cache, and this is where the confusion may come from.)

(A writeback cache is a data cache where data is read from and written
to the cache, but not immediately written to main memory. When a cache
line is written to, it is marked as dirty. If the cache replacement
algorithm decides that a dirty line needs to be thrown out in order to
let a new line enter the cache, then the dirty line is written out to
memory. In the ARM's write-through cache, writes go to the cache and, at
the same time, are placed in a write-buffer and written to memory later.
A write-back cache has some overall performance advantages over the
write-through/write-buffer policy, but is more costly to implement.)
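
The difference in memory traffic between the two policies can be sketched with a deliberately tiny model: a cache holding a single line, counting how many words reach main memory. The line size, structure and function names are all assumptions for illustration and do not describe the real ARM610 cache organisation.

```c
#include <assert.h>

#define LINE_WORDS 4              /* words per cache line (illustrative) */

struct toy_cache {
    long tag;                     /* resident line number, -1 = empty */
    int  dirty;                   /* write-back only: line modified? */
    long mem_writes;              /* words written to main memory */
};

void cache_init(struct toy_cache *c)
{
    c->tag = -1;
    c->dirty = 0;
    c->mem_writes = 0;
}

/* Write-through: every store is also sent to memory (via the write buffer). */
void store_write_through(struct toy_cache *c, long addr)
{
    assert(addr >= 0);
    c->mem_writes += 1;
}

/* Write-back: a store only marks the line dirty. Memory is written when
 * a dirty line is evicted to make room for a different line. */
void store_write_back(struct toy_cache *c, long addr)
{
    assert(addr >= 0);
    long line = addr / LINE_WORDS;
    if (c->tag != line) {
        if (c->tag != -1 && c->dirty)
            c->mem_writes += LINE_WORDS;   /* write the old line out */
        c->tag = line;
        c->dirty = 0;
    }
    c->dirty = 1;
}
```

In this model, one hundred stores to the same address cost one hundred memory writes with write-through, but none with write-back until a store to a different line forces the dirty line out.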

Mike. E&OE

Vareck Bostrom

Jan 27, 1995, 12:33:03 AM
Michael Williams (michael....@armltd.co.uk) wrote:
: A 55MHz ARM would pull in the region of 49 dhrystone MIPS (86-87K
: dhrystones). (The 8Kb cache on the ARM710 [vs. 4Kb on ARM610] also
: means it will perform better with 'real' applications at this speed
: than the ARM610 would.)

I like the ARM, but I hate to point out that 86-87 k dhry2.1 is slower than
most new Intel-based PCs nowadays (I get 98300 dhry2.1 on a Pentium/90
running BSDi - and those results seem suspiciously low to me).

Has anyone run actual "real" benchmarks on the ARM? I'm interested specifically
in the "sim" benchmark.

Michael Williams

Jan 27, 1995, 6:15:52 AM
In article <3ga0if$4...@jadzia.CSOS.ORST.EDU> bos...@CSOS.ORST.EDU (Vareck Bostrom) writes:
>Michael Williams (michael....@armltd.co.uk) wrote:
>: A 55MHz ARM would pull in the region of 49 dhrystone MIPS (86-87K
>: dhrystones).
>
>I like the ARM, but I hate to point out that 86-87 k dhry2.1 is slower than
>most new intel based PCs nowadays (I get 98300 dhry2.1 on a Pentium/90
>running BSDi - and those results seem suspiciously low to me).

Well, the ARM is getting ~90% of the speed (in this dubious example) of
a supposedly dual-issue superscalar processor being clocked 60% faster
than it, which probably counts as suspicious. Unless the Pentium isn't
all it's cracked up to be...

>Has anyone run actual "real" benchmarks on the ARM? I'm interested specifically
>in the "sim" benchmark.

If you can point me at it, I might have a look in my spare time. Bear in
mind that I'll be doing this under ARMulator on a SPARC2 (SPARC5 if I'm
lucky...).

Tom Hughes

Jan 29, 1995, 4:54:32 PM
In article <3ga0if$4...@jadzia.CSOS.ORST.EDU> Vareck Bostrom (bos...@CSOS.ORST.EDU) wrote:

> I like the ARM, but I hate to point out that 86-87 k dhry2.1 is slower than
> most new intel based PCs nowadays (I get 98300 dhry2.1 on a Pentium/90
> running BSDi - and those results seem suspiciously low to me).

Sounds suspicious to me too. I seem to remember getting about 130k dhrystones
out of a Pentium 90 at work (running Unixware).

Tom

--
Tom Hughes (t...@compton.demon.co.uk)
5 Lampits, Hoddesdon, Herts EN11 8EH
... There's nothing quite so wonderful as money.

Torben AEgidius Mogensen

Jan 30, 1995, 6:59:04 AM
michael....@armltd.co.uk (Michael Williams) writes:

>Clarification needed here, I think. ARM610 does *not* have a writeback
>cache. It has a write-through cache, and a write-buffer, which is not
>the same thing. (The write-buffer is sometimes referred to as a write
>cache, and this is where the confusion may come from.)

>(A writeback cache is a data cache where data is read from and written
>to the cache, but not written to main memory. When a cache line is
>written to it is marked as dirty. If the cache replacement algorithm
>decides that a dirty line needs to be thrown out in order to let a new
>line enter the cache, then the dirty line is written out to memory. In
>the ARM's write-through cache writes go to the cache and, at the same
>time, are placed in a write-buffer and written to memory later. A
>write-back cache has some overall performance advantages over the
>write-through/write-buffer policy, but is more costly to implement.)

And requires bus-snooping if you want to implement shared-memory
multiprocessing (e.g. a RiscPC with a 2nd processor card).

Torben Mogensen (tor...@diku.dk)

Torben AEgidius Mogensen

Jan 30, 1995, 7:03:27 AM
michael....@armltd.co.uk (Michael Williams) writes:

>In article <3ga0if$4...@jadzia.CSOS.ORST.EDU> bos...@CSOS.ORST.EDU (Vareck Bostrom) writes:
>>Michael Williams (michael....@armltd.co.uk) wrote:
>>: A 55MHz ARM would pull in the region of 49 dhrystone MIPS (86-87K
>>: dhrystones).
>>
>>I like the ARM, but I hate to point out that 86-87 k dhry2.1 is slower than
>>most new intel based PCs nowadays (I get 98300 dhry2.1 on a Pentium/90
>>running BSDi - and those results seem suspiciously low to me).

>Well the ARM is getting ~90% of the speed (in this dubious example) of
>a supposedly dual-issue superscalar processor being clocked 60% faster
>than it, which probably counts as suspicious. Unless the Pentium isn't
>all its cracked up to be...

If the code for the Pentium isn't optimized for Pentium, I'm not
surprised. Anyway, Pentium is more like 1.5 issue instead of
dual-issue, as only simple instructions are supported on the second
functional unit. Similarly, PowerPC 601 is in practice about 1.3
issue, as there is one integer unit and one branch unit, which tends
to get less than 30% saturation. I'm not counting FP units for either
of the processors here.

Torben Mogensen (tor...@diku.dk)

Ian Griffiths

Jan 30, 1995, 7:45:49 AM
Michael Williams (michael....@armltd.co.uk) wrote:

: Well the ARM is getting ~90% of the speed (in this dubious example) of
: a supposedly dual-issue superscalar processor being clocked 60% faster
: than it, which probably counts as suspicious. Unless the Pentium isn't
: all its cracked up to be...

Whatever makes you say that? :-)

Maybe Intel are using 'Clock halving' techniques to provide an upgrade
path for future Pentiums.

Seriously though, I gather that there are serious problems with the
Pentium's dual-issue system where one of the pipelines can stall and not
get recovered for a seriously long time. I don't know the details, but
it is in this area that at least one of Intel's competitors claims to
have a better design and significant speed advantage. Not that they've
started shipping chips yet mind...

--
/----------------------------------------------------------------------------\
| Ian Griffiths - ATM Adapter Development | igri...@madge.com |
| Madge Networks Ltd. | igri...@madge1.demon.co.uk |
>------------------------------------------/ i...@intrac.demon.co.uk |
| Since I'm posting on Usenet, you already \---------------------------------<
| know that I'm opinionated. You should be aware however that these opinions |
| have NOTHING whatsoever to do with Madge Networks Ltd. They're my fault. |
\----------------------------------------------------------------------------/

Ian Griffiths

Jan 30, 1995, 7:49:56 AM
Tom Hughes (t...@compton.demon.co.uk) wrote:
: Vareck Bostrom (bos...@CSOS.ORST.EDU) wrote:

: > I like the ARM, but I hate to point out that 86-87 k dhry2.1 is slower than
: > most new intel based PCs nowadays (I get 98300 dhry2.1 on a Pentium/90
: > running BSDi - and those results seem suspiciously low to me).

: Sounds suspicious to me too. I seem to remember getting about 130k dhrystons
: out of a Pentium 90 at work (running Unixware).

Actually thinking about it, it's going to depend a lot on your compiler.

What compiler was being used on the BSDi system? If it's gcc, then that
may make a big difference. Last time I tried it was fairly easy to
demonstrate that the optimiser on gcc is broken (at least on Intel platforms)
with a simple program about 4 lines long... I think commercial 80x86
compilers have got fairly good these days. Also, if you compile on a
'Pentium aware' compiler, which knows about the instruction issue scheme
and can therefore do intelligent re-ordering, this will again make a big
difference.

Even then, the 98.3k sounds fairly low, and the 130k not a whole lot better
in comparison to the ARM, so my previous post about the Pentium not being
all that good is probably not contradicted by this. :-)

Eric Smith

Jan 30, 1995, 5:07:15 PM
In article <fanf2-28019...@haroeris.trin.cam.ac.uk> fa...@cam.ac.uk (Tony Finch) writes:
> I hardly think it's fair to compare the elephantine Pentium with the
> thrifty ARM. Dhrystones per transistor? :-)

That's perhaps an interesting benchmark for chip designers, but I'm personally
much more interested in Dhrystones (or Specints) per unit cost, or per
Watt.

Eric

Ian Griffiths

Jan 31, 1995, 7:59:18 AM
Torben AEgidius Mogensen (tor...@diku.dk) wrote:
: michael....@armltd.co.uk (Michael Williams) writes:

: >write-back cache has some overall performance advantages over the
: >write-through/write-buffer policy, but is more costly to implement.)

: And requires bus-snooping if you want to implement shared-memory
: multiprocessing (e.g. a RiscPC with a 2nd processor card).

It can also cause problems with things lurking in the cache longer than
you want:

The i860 has a write-back cache, and it took me ages to work out why
a program I'd written to run on an i860 system appeared to be corrupting
the screen in a highly bizarre way. It turned out that the central program
loop, which just updated a cursor on the screen using EORing, was small
enough to fit in the cache, including all the system calls it made, so
the updates to the cursor never completely left the cache...
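
The effect can be mimicked with a toy model in C: one screen word sitting behind a write-back cache, where EOR updates only reach the copy the CPU sees until something flushes it. This is purely illustrative and says nothing about how the i860 card or its graphics library actually worked; all names are invented.

```c
#include <assert.h>
#include <stddef.h>

/* One screen word behind a write-back cache. */
struct cached_word {
    unsigned screen;              /* what the video hardware sees */
    unsigned cached;              /* what the CPU sees */
    int dirty;                    /* cache copy differs from memory? */
};

/* EOR the cursor mask into the word; the store lands in the cache only. */
void eor_cursor(struct cached_word *w, unsigned mask)
{
    assert(w != NULL);
    w->cached ^= mask;
    w->dirty = 1;
}

/* Explicit flush: write the dirty copy back so the display can see it. */
void flush(struct cached_word *w)
{
    assert(w != NULL);
    if (w->dirty) {
        w->screen = w->cached;
        w->dirty = 0;
    }
}
```

After eor_cursor the screen field is unchanged while the cached field holds the new value; only a call to flush (or an eviction, not modelled here) makes the cursor visible.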

Nick Stephen

Feb 1, 1995, 7:58:55 AM
In article <3gnrq0$h...@sunforest.mantis.co.uk>, ol...@mantis.co.uk (Olly Betts) writes:
> In article <D39tE...@madge1.demon.co.uk>,

> Ian Griffiths <igri...@madge1.demon.co.uk> wrote:
> >
> >The i860 has a write-back cache, and it took me ages to work out why
> >a program I'd written to run on an i860 system appeared to be corrupting
> >the screen in a highly bizarre way. It turned out that the central program
> >loop, which just updated a cursor on the screen using EORing was small
> >enough to fit in the cache, including all the system calls it made, so
> >the updates to the cursor never completely left the cache...
>
> Compiler warning: Your code is too efficient. Please add bloat.

Sounds to me like the cache controller wasn't properly configured (normally
the job of the O/S). Since when should video memory or any memory mapped I/O
registers be cached via a write-back cache? It's asking for trouble.

Nick

--
Nick Stephen Email: ste...@gr.osf.org
OSF Research Institute Phone: +33 76 63 48 72
2, Avenue de Vignate Fax: +33 76 51 05 32
38610 Gieres - France

M J Ebourne

Feb 1, 1995, 1:20:47 PM
In article <D37yB...@madge1.demon.co.uk>,

Ian Griffiths <igri...@madge1.demon.co.uk> wrote:
>What compiler was being used on the BSDi system? If it's gcc, then that
>may make a big difference. Last time I tried it was fairly easy to
>demonstrate that the optimiser on gcc is broken (at least on Intel platforms)
>with a simple program about 4 lines long... I think commercial 80x86

Well, according to the Linux FAQ, Intel have got the Pentium optimisations
under NDA and are charging lots of money for them or something. Hence
GCC can't optimise for it. They recommend you don't get a Pentium. :-)

--
Martin Ebourne
Electronics, Southampton University
Email: m...@soton.ac.uk

"Pray that there's intelligent life somewhere out in space,
because there's bugger all down here on earth." - Python

Olly Betts

Feb 1, 1995, 6:37:36 AM
In article <D39tE...@madge1.demon.co.uk>,

Ian Griffiths <igri...@madge1.demon.co.uk> wrote:
>
>The i860 has a write-back cache, and it took me ages to work out why
>a program I'd written to run on an i860 system appeared to be corrupting
>the screen in a highly bizarre way. It turned out that the central program
>loop, which just updated a cursor on the screen using EORing was small
>enough to fit in the cache, including all the system calls it made, so
>the updates to the cursor never completely left the cache...

Compiler warning: Your code is too efficient. Please add bloat.

Olly

Ian Griffiths

Feb 2, 1995, 5:52:29 AM
Nick Stephen (ste...@osf.org) wrote:
: ol...@mantis.co.uk (Olly Betts) writes:
: > Ian Griffiths <igri...@madge1.demon.co.uk> wrote:
: > >
: > > [stuff about cache problems on an i860]
: > Compiler warning: Your code is too efficient. Please add bloat.

: Sounds to me like the cache controller wasn't properly configured (normally
: the job of the O/S). Since when should video memory or any memory mapped I/O
: registers be cached via a write-back cache? It's asking for trouble.

OS? What's a one of them then? The system in question was based around
a PC (which barely has an OS anyway) with an i860 card in an ISA slot.
This card didn't really have an OS. It had a downloader, and call gates
that talked to a program on the PC supplying about 5 standard C I/O
functions, and also gave access to the mouse. Graphics were done via
a library.

This sort of explains why it didn't work. Then again the graphics libraries
could have done it, but since they were written to assume that you were
going to use a double-buffering scheme (which I'd kludged out for speed)
and did a cache flush when you swapped buffers, this was not, in general
a problem, and indeed was an advantage. If you're doing line drawing,
unless you have some rather clever horizontal line optimisations, a
write-back cache will actually speed things up in quite a lot of cases.
