People have been going on about software compatibility with SA for ages
now (considering the number of posts about it it _seems_ like ages!) and
several times people have suggested turning off the cache.
I have two questions about this:
1) How can turning off the cache affect compatibility - doesn't this only
affect the speed at which the code runs?
2) Several people have said that the SA110 with cache off would run
slower than an ARM610. Does the small primary cache on the 610 have such
a great effect, or is this only true for benchmark tests which can 'fit'
inside the cache?
Any takers?
--
Nigel Parker - nc...@cam.ac.uk
'Fraid not. The cache actually alters the programmer's model,
because you can't make any assumptions about being able to run
code in memory you've just written to. (Other than the
assumption that you probably can't.)
> 2) Several people have said that the SA110 with cache off
> would run slower than an ARM610. Does the small primary
> cache on the 610 have such a great effect, or is this only
> true for benchmark tests which can 'fit' inside the cache?
Well it has some effect otherwise they wouldn't put one on. If
you think about it, turning off the cache on the StrongARM means
that it is limited entirely by the speed of the bus. The ARM610
with the cache enabled is not - it is limited merely by its own
performance potential. Without the cache, the ARM610 would also
be bus-limited. So if you compare a processor which is running
faster than it would if it were limited by the bus with a
similar processor which is being limited by the bus, it should
be no surprise that the former runs faster.
It will affect most code. Otherwise a RiscPC would only be
slightly more than twice as fast as an unexpanded A440.
--
Ian Griffiths
> People have been going on about software compatibility with SA for ages
> now (considering the number of posts about it it _seems_ like ages!) and
> several times people have suggested turning off the cache.
>
> I have two questions about this:
>
> 1) How can turning off the cache affect compatibility - doesn't this only
> affect the speed at which the code runs?
Nope, because the problem is that code that modifies itself may only
modify the copy in the cache and not the one in main memory, so the
processor can end up executing stale or corrupted code. If you turn off
the cache, then main memory has to be updated on every write.
> 2) Several people have said that the SA110 with cache off would run
> slower than an ARM610. Does the small primary cache on the 610 have such
> a great effect, or is this only true for benchmark tests which can 'fit'
> inside the cache?
With no cache, then any ARM chip will run with N and S cycles that are
based on the speed of the main memory (since this is the definition of
these cycles). If the chip is cached, these cycles are sped up to the
speed of the cache memory, and N and S cycles then take the same time,
since there is no such thing as a non-sequential cache access.
But I digress... The point is that with no cache, the ARM's speed (any
ARM) is based heavily on the speed of the main memory. With a cache, the
speed is based on the speed of the cache RAM: effectively the same
thing, but the cache RAM is so much faster.
I have an A440 with an ARM3. With the cache off it runs at almost
exactly ARM2 speed. This is because even though the ARM3 is clocked at
25MHz, the memory is going at 8MHz, and this is the limiting factor.
Adding a 4k cache to the ARM3 gave a 300% speed increase. That's the
difference a (small) cache makes. The SA has 2 16k caches ...
The SA with cache off would run faster than a 610 with cache off, but
almost certainly slower than a 610, or even an ARM3 perhaps, with cache
on.
--
101 easy ways to say 'No'
I'd love to but ...
87: I'm having my baby shoes bronzed.
At a recent BARUG meeting the SA was demonstrated. One of our number tested
the SA against a 710 using a BASIC program which was a timed 1 to 1000000
FOR/NEXT loop. The results were as follows:
        Cache on    Cache off
710     1.47s       11.5s   (from memory)
SA      0.21s       17.5s   (from memory)
I presume the SA being slower with cache off has something to do with the
memory accesses being non-optimised.
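For anyone who wants the ratios rather than the raw times, here is a quick Python sketch. It is only arithmetic on the figures quoted above (which were recalled from memory, so treat the results as rough). The dictionary layout and function names are made up for illustration.

```python
# Loop timings in seconds, as quoted above (recalled "from memory").
timings = {
    "710": {"cache_on": 1.47, "cache_off": 11.5},
    "SA": {"cache_on": 0.21, "cache_off": 17.5},
}


def slowdown_without_cache(cpu):
    """How many times longer the loop takes once the cache is off."""
    t = timings[cpu]
    return t["cache_off"] / t["cache_on"]


def sa_speed_relative_to_710(mode):
    """Greater than 1 means the SA finished the loop sooner than the 710."""
    return timings["710"][mode] / timings["SA"][mode]


for cpu in timings:
    print(f"{cpu}: {slowdown_without_cache(cpu):.1f}x slower with cache off")
for mode in ("cache_on", "cache_off"):
    print(f"SA relative to 710, {mode}: {sa_speed_relative_to_710(mode):.2f}x")
```

On these numbers the 710 loses a bit under a factor of 8 with its cache off, while the SA loses a factor of over 80, which is why it ends up behind the 710 in the cache-off column.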
--
Ray
Bradford on Avon, Wiltshire
> I have two questions about this:
>
> 1) How can turning off the cache affect compatibility - doesn't this only
> affect the speed at which the code runs?
No. The reason that SA-110 will not run self modifying code stems from the
fact that it has a separate instruction cache (IIRC). If you modify code,
the new version will not make its way into the cache (?) and hence the
proggy dies a death. By disabling the cache the problem is avoided. You
could force an I-cache flush after each code modification, I suppose, but
wouldn't that be defeating the object?
> 2) Several people have said that the SA110 with cache off would run
> slower than an ARM610.
Yes. That is True(ish).
> Does the small primary cache on the 610 have such a great effect,
Don't forget it is running twice as fast as the main memory. If you want to
get an idea of how fast SA-110 will be without the cache turned on, turn the
610's cache off. This is not very accurate due to the SA-110's enhanced ALU
and pipeline, but will give you an idea.
> or is this only true for benchmark tests which can 'fit' inside the cache?
Hell no. Turn your cache off NOW and you'll notice a HUGE difference! RISC OS
thrives with the cache turned on; this is in part the reason why SA-110 may
not need L2 cache. Running RiscBSD is a different kettle of mnemonics,
though.
> Any takers?
Me!
(#$1A4) (sig LZW compressed) :-) :-)
Pat.
--
===============================================================================
Patrick Herborn, | p...@sn2.ee.umist.ac.uk | Preferred !!!!!
Electronic Engineer [?] | p...@e04.she.man.ac.uk | My RiscPC ! :-)
UMIST, Manchester | h...@fs2.ee.umist.ac.uk | PC Server :-(
UK | p...@nanobyte.demon.co.uk | Ye olde Demon Acct
===============================================================================
Immensely inspiring tagline follows....
... Oh, I couldn't afford a whole new brain.
> The SA with cache off would run faster than a 610 with cache off, but
> almost certainly slower than a 610, or even an ARM3 perhaps, with cache
> on.
I'm not sure why you say an SA with cache off would run faster than a 610
with cache off. I'm assuming you mean with the same memory clock rate (as
on the SA upgrade). In fact, I could well imagine the SA with cache off
being *SLOWER* than the 610 with cache off. As has been pointed out to me,
uncached LDMs on the SA do not use burst memory access so will take around
twice as long as on the 610.
Fortunately this is a rather academic argument - who would buy an SA and not
enable the cache?
--
Richard.
>1) How can turning off the cache affect compatibility - doesn't this only
>affect the speed at which the code runs?
I believe the problem is with code that alters itself while it is being
executed. Because the StrongARM has separate data and instruction
caches it sort of loses track of things if you try to alter an
instruction which is residing in the instruction cache. If you turn off
the cache the processor is forced to write directly to memory and to
read its instructions straight from memory, and therefore will work.
>2) Several people have said that the SA110 with cache off would run
>slower than an ARM610. Does the small primary cache on the 610 have such
>a great effect, or is this only true for benchmark tests which can 'fit'
>inside the cache?
ChangeFSI on an ARM600 in a RiscPC consistently performs three times
faster with the 8k cache switched on - a typical 'real life' benchmark.
A lot of people do not appreciate how little time a computer actually
spends accessing memory or how little code is actually executed most of
the time.
With the exception of texture-mapping, sprite plotting and other
applications that require high-speed repeated duplication of an
unchanging large quantity of memory, what seems like a tiny cache is
perfectly adequate to get the processor into the better half of its
maximum possible speed.
Well that's the ranting over. To answer your question, switching off a
processor cache reduces it to the speed of the computer's memory - 16MHz
in the case of the RiscPC. So a StrongARM with the cache switched off
would always run at under 16MIPS - the ARM600 will usually exceed
25MIPS I believe.
--
Cheers now,
John
All opinions expressed are my own and are thus most likely entirely wrong.
This is true. However, if the processor is running 15 times faster than
the memory bus then even a tiny cache miss rate will have a huge effect,
so as soon as you have code which doesn't fit in the L1 cache you'll feel
it.
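To put a rough number on that, here is a small Python sketch of the usual "hit time plus miss rate times miss penalty" estimate. The assumptions are mine: a hit costs one core cycle, and a miss costs 15 core cycles, taken from the 15:1 clock ratio above. It ignores line fills and everything else a real memory system does.

```python
def average_cycles_per_access(miss_rate, miss_penalty=15, hit_cycles=1):
    """Average core cycles per memory access for a given cache miss rate.

    miss_penalty=15 is an assumption based on the core running 15 times
    faster than the memory bus; hit_cycles=1 assumes a single-cycle cache.
    """
    return hit_cycles + miss_rate * miss_penalty


for miss_rate in (0.0, 0.02, 0.05, 0.10, 1.0):
    cycles = average_cycles_per_access(miss_rate)
    print(f"miss rate {miss_rate:4.0%}: {cycles:5.2f} cycles per access")
```

Under these assumptions a 2% miss rate already costs about 30% in access time, and a 10% miss rate makes the average access two and a half times slower than a pure cache hit.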
--
e----><----p | Stephen Burke | Internet: bu...@desy.de
H H 1 | Gruppe FH1T (Lancaster) | DECnet: vxdesy::burke (13313::burke)
H H 11 | DESY, Notkestrasse 85 | JANET: bu...@v2.rl.ac.uk
HHHHH 1 | 22603 Hamburg, Germany | Phone: +49 40 8998 4564
H H 1 | "It is also a good rule not to put too much confidence in
H H 11111 | experimental results until they have been confirmed by theory"
> I believe the problem is with code that alters itself while it is being
> executed. Because the StrongARM has separate data and instruction
> caches it sort of loses track of things if you try to alter an
> instruction which is residing in the instruction cache. If you turn off
> the cache the processor is forced to write directly to memory and to
> read its instructions straight from memory, and therefore will work.
There are in fact three points of issue:
1) The processor core will always read instructions from the instruction
cache, if present, regardless of whether the memory they have come from has
been updated since they were read. Don't forget that a cache line is 8
words, so if you've executed an instruction anywhere within a block of eight
words all will be cached, even if some of them don't contain valid
instructions at all.
2) The data cache is write-back. Data is not written back to main memory
until a cache line is replaced (there are two dirty bits per cache line,
each indicating whether any of four words have been updated since read into
the cache). If code is modified but the modifications are still held in the
data cache when the processor comes to execute the new code, the old
contents of the memory locations will be read into the instruction cache
instead.
3) StrongARM does not check for data held in the write buffer when doing
instruction cache line fills, so you may be unlucky and find that code just
written before trying to execute it (even to memory that isn't cached at
all) will still be in the write buffer and hence, again, the old data will
be read into the instruction cache.
To be absolutely safe where self modification is concerned, you must either:
a) Between modifying the code and executing it, flush the entire
instruction cache (you can't flush bits of it), force a data cache write
back for those areas of memory you've updated, and drain the write buffer
(as the new SWI OS_SynchroniseCodeAreas does), or
b) For code that wasn't written with StrongARM in mind, turn off both
caches and the write buffer, which seriously dents performance.
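The three points and remedy (a) can be acted out with a toy model in Python. This is a sketch only: it works on single words rather than 8-word lines, the class and method names are invented, and routing cleaned data through the write buffer is a modelling assumption. `synchronise_code` is a hypothetical stand-in for the three steps in (a), not the real SWI.

```python
class ToySplitCacheCPU:
    """Word-granular toy model of split caches plus a write buffer."""

    def __init__(self, memory):
        self.memory = dict(memory)  # main memory: address -> word
        self.icache = {}            # instruction cache
        self.dcache = {}            # write-back data cache
        self.dirty = set()          # D-cache addresses not yet written out
        self.write_buffer = []      # (address, word) pairs waiting for memory

    def store(self, addr, word):
        # Point 2: a write-back cache keeps the new word to itself.
        self.dcache[addr] = word
        self.dirty.add(addr)

    def fetch(self, addr):
        # Point 1: an I-cache hit is used whatever has been written since.
        # Points 2 and 3: a miss is filled from main memory only.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def flush_icache(self):
        self.icache.clear()

    def clean_dcache(self):
        # Dirty words leave via the write buffer (modelling assumption).
        for addr in sorted(self.dirty):
            self.write_buffer.append((addr, self.dcache[addr]))
        self.dirty.clear()

    def drain_write_buffer(self):
        for addr, word in self.write_buffer:
            self.memory[addr] = word
        self.write_buffer.clear()

    def synchronise_code(self):
        # Hypothetical stand-in for the three steps listed in (a).
        self.clean_dcache()
        self.drain_write_buffer()
        self.flush_icache()


def demo():
    addr, old, new = 0x8000, "MOV R0,#1", "MOV R0,#2"
    cpu = ToySplitCacheCPU({addr: old})
    seen = [cpu.fetch(addr)]      # old instruction is now in the I-cache
    cpu.store(addr, new)          # the program modifies itself
    seen.append(cpu.fetch(addr))  # point 1: I-cache hit, old code
    cpu.flush_icache()
    seen.append(cpu.fetch(addr))  # point 2: new word still in the D-cache
    cpu.clean_dcache()
    cpu.flush_icache()
    seen.append(cpu.fetch(addr))  # point 3: new word still in write buffer
    cpu.synchronise_code()
    seen.append(cpu.fetch(addr))  # all three steps done: new code at last
    return seen


print(demo())
```

In this model the first four fetches all return the old instruction, and only the fetch after the full clean, drain and flush sequence sees the new one.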
If all this sounds like bad news, it isn't too bad - the only problem
software is likely to be those applications written in ARM code. The only
third party application which I know doesn't work is Artworks, although I'm
sure there will be others. RISC OS itself gets existing compressed AIF
images to run by turning off the caches while the code is decompressed,
removing one major problem. I would guess that self decrypting code will
also be a problem.
> Well that's the ranting over. To answer your question, switching off a
> processor cache reduces it to the speed of the computer's memory - 16MHz
> in the case of the RiscPC. So a StrongARM with the cache switched off
> would always run at under 16MIPS - the ARM600 will usually exceed
> 25MIPS I believe.
In fact the processor will run at much less than 16MIPS with the caches
turned off - by the time you've considered memory stores and loads as well
as instruction fetches. It's interesting that StrongARM appears to run much
slower than ARM610 with caches and buffers switched off - it may be related
to the way StrongARM synchronises with the MClk when performing external
accesses.
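As a back-of-envelope illustration of the "much less than 16MIPS" point, here is a Python sketch. The model is an assumption of mine, not a measurement: every instruction fetch costs one 16MHz bus cycle, and each load or store costs one extra bus cycle on top. The 30% load/store fraction is a made-up example value, and real accesses may well cost more than one cycle each.

```python
def uncached_mips(load_store_fraction, bus_mhz=16.0, cycles_per_data_access=1.0):
    """Rough MIPS estimate when every access goes over the memory bus.

    Assumes one bus cycle per instruction fetch plus cycles_per_data_access
    extra bus cycles for the fraction of instructions that load or store.
    """
    cycles_per_instruction = 1.0 + load_store_fraction * cycles_per_data_access
    return bus_mhz / cycles_per_instruction


for fraction in (0.0, 0.2, 0.3, 0.5):
    print(f"{fraction:.0%} loads/stores: about {uncached_mips(fraction):.1f} MIPS")
```

Even this optimistic model drops to roughly 12 MIPS at a 30% load/store mix, and any extra cost per external access pushes it lower still.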
--
Mark Smith - Surrey, UK ...http://public.logica.com/~smithmd...
This message was posted via my private internet account. It doesn't represent
the position of any individuals or organisations with whom I may be linked.
> It's interesting that StrongARM appears to run much
> slower than ARM610 with caches and buffers switched off - it may be related
> to the way StrongARM synchronises with the MClk when performing external
> accesses.
As I said in my earlier posting, unlike earlier ARM chips SA doesn't use
burst mode memory access for LDMs (and by the looks of it not for STMs
either) for non-cached locations. Burst mode would be about twice as fast
as single access.
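The "about twice as fast" figure can be checked with a little Python. The cycle costs here are assumptions for illustration: a non-sequential (N) cycle is taken to cost 2 memory clock ticks and a sequential (S) cycle 1 tick. A burst transfer of n words is then one N cycle followed by n-1 S cycles, while single accesses pay an N cycle for every word.

```python
N_TICKS = 2  # assumed cost of a non-sequential access, in memory clock ticks
S_TICKS = 1  # assumed cost of a sequential access, in memory clock ticks


def burst_ticks(words):
    """One N cycle to start the burst, then S cycles for the remaining words."""
    return N_TICKS + (words - 1) * S_TICKS


def single_access_ticks(words):
    """Every word is fetched with its own N cycle."""
    return words * N_TICKS


def burst_advantage(words):
    """How many times faster the burst transfer is than single accesses."""
    return single_access_ticks(words) / burst_ticks(words)


for words in (1, 2, 4, 8, 16):
    print(f"LDM of {words:2d} words: burst is {burst_advantage(words):.2f}x faster")
```

With these costs the advantage is nothing for a single word and approaches 2x as the register list grows, which fits the "around twice as long" estimate for larger LDMs.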
--
Richard.
> Nigel Parker <nc...@cam.ac.uk> wrote:
> >1) How can turning off the cache affect compatibility - doesn't this only
> >affect the speed at which the code runs?
> I believe the problem is with code that alters itself while it is being
> executed. [...] If you turn off the cache the processor is forced to write
> directly to memory and to read its instructions straight from memory, and
> therefore will work.
It's also important that the caches are flushed. Switching them off only
prevents new data from being read in: instructions may still come from the
instruction cache, and the data cache is still kept up to date wrt STR and
STM instructions.
--
+---------------------------------------------------------------------------+
| Darren Salt - arc...@spuddy.mew.co.uk - ToonArmy - darre...@unn.ac.uk |
| Acorn A3010, Spectrum +3, BBC Master - ARM powered - A Windoze free zone! |
+----01268-515441-for-free-email-&-Usenet------We'll-win-it-next-season!----+
1987: Acorn releases the world's first 32-bit RISC-computer :-p Apple!