> "SG" <s...@erols.com> writes:
>
> >You first have to understand what's going on here, and obviously
> >you don't. Let me explain it once more:
>
> Actually, let me explain. When it comes to credentials with respect to
> UAE-JIT (and Amithlon), I suspect there is little doubt who knows better
> what's going on.
>
> But if anybody hasn't made up their mind yet, I invite you to compare
> Steve's emotionally charged verbiage with the actual hard facts I will
> intersperse....
The hard fact is, you "forgot" to tell your customers that
UAE-JIT is even slower than UAE interpretive(!) until a simple
loop is executed identically five times, at which point it stops
dead(!) to recompile the software into sloppy x86 code in the
hope that the task will last long enough to overcome all that
water under the bridge once the loop is invoked again, if it's
invoked again at all, and hasn't already finished long ago.
It took me dragging that admission out of you; otherwise no one
would know it to this day.
> >Interpretive emulation (UAE non-JIT) is very, very slow. It
> >uses pre-compiled translation that ultimately converts 68K
> >instructions into x86 executable instructions step by
> >intricate step, every time it encounters one (all the time when
> >running AOS, obviously).
>
> It requires a fairly constant average of about 70 host cycles for each
> 68k instruction, over a wide range of 68k software tested. Which means
> that on a nice 1.75GHz machine, you are looking at 25 million 68k instructions
> executed per second.
> For small loops with small data sets, which actually fit into an 060's cache,
> the 060 will come out ahead by a factor of 2-3. However, as soon as either
> the active code size or the active data set goes beyond the size of the 060's
> caches, *it* will slow down quite a bit. The interpretative CPU emulation
> in UAE, however, *already* suffers quite atrocious pipeline stalls for each
> instruction, so it will slow down considerably less due to those extra
> stalls (even assuming that the much larger caches are also getting thrashed).
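The 25-million figure above is simply the host clock divided by the average cost per emulated instruction. A minimal Python sketch of that arithmetic, assuming the ~70 host cycles per 68k instruction quoted above holds as a flat average (the function name `interpreted_mips` is hypothetical, and real per-opcode costs vary):

```python
def interpreted_mips(host_hz: float, cycles_per_insn: float = 70.0) -> float:
    """Millions of emulated 68k instructions per second for a given host clock.

    Assumes a flat average cost per emulated instruction; 70 host cycles is
    the figure quoted in the thread, not a measured constant.
    """
    return host_hz / cycles_per_insn / 1e6


# A 1.75GHz host at ~70 cycles per 68k instruction
print(interpreted_mips(1.75e9))  # 25.0
```

This is only a back-of-the-envelope estimate: it says nothing about cache behaviour, which is the point the rest of the paragraph is making.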
Which of course applies equally to your emulation, since, as we
now finally know after dragging it out of you, your product is
UAE interpretive until an identical loop is executed at least 5
times, at which point UAE-JIT actually stops to compile in the
hope that this loop is a long number-crunching task.
If it's not a long task, you're screwed. Not only are you using a
very slow emulation (straight UAE, minus), you are using it until
a task almost completes standard-UAE-slowly, then stopping to
wait for a recompile, only to find very little of the task
remaining anyway.
Since AmigaOS is so fast in general, the vast, vast majority of
the time this is what's going to happen. I wonder if people
realized JIT was slower than UAE interpretive until it found a
simple untouched loop that's already executed five times, when
they bought it from you?
> >Adding a JIT compiler dramatically slows this process down,
> >essentially bringing the entire emulation to a dead stop
> >while 68K code is dynamically compiled on the fly (Just In
> >Time) into hasty x86 native code. Once compiled, the sloppy x86
> >code generally runs faster than UAE-interpretive can run it with
> >its every-time pre-compiled instruction translation processes.
> >Until, of course, the code being iterated changes or completes
> >its task, then it's back to square one.
>
> Except for the use of "dramatically", "dead", "hasty", "generally", "sloppy"
> and so on, that is basically correct.
>
> Once it has been determined that a block (a basic code fragment starting
> anywhere, and ending on the next possible control flow instruction) has been
> executed 5 times (using the interpretative emulation), the JIT compiler
> guesses that that particular block is likely to be executed several more
> times.
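The counting scheme described above can be pictured as a per-block execution counter that hands a block to the compiler once it reaches a threshold. This is a minimal Python illustration, not UAE-JIT's actual code: the threshold of 5 comes from the post, while the class and method names (`BlockProfiler`, `execute`) and the stand-in "compile" step are hypothetical. The post does not spell out exactly when the count is checked; here a block is interpreted five times and compiled on its sixth execution.

```python
from collections import defaultdict

JIT_THRESHOLD = 5  # interpreted executions before compiling (figure from the post)


class BlockProfiler:
    """Toy model of threshold-triggered JIT compilation, keyed by block address.

    Illustrative only: not UAE-JIT's actual data structures or policy.
    """

    def __init__(self, threshold: int = JIT_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(int)  # block start address -> times interpreted
        self.compiled = set()           # addresses that already have host code

    def execute(self, addr: int) -> str:
        """Run one block and report the path taken."""
        if addr in self.compiled:
            return "native"             # reuse the previously translated code
        if self.counts[addr] >= self.threshold:
            self.compiled.add(addr)     # stand-in for translating 68k -> x86
            return "compile"            # (the new code would then run)
        self.counts[addr] += 1
        return "interpret"


profiler = BlockProfiler()
print([profiler.execute(0x1000) for _ in range(8)])
# ['interpret', 'interpret', 'interpret', 'interpret', 'interpret', 'compile', 'native', 'native']
```

The counter is per block, not per loop: any block reached often enough, loop body or not, crosses the threshold.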
That would be a stone-dumb assumption, but have no fear, it's not
true. The assumption is: after 5x through identical code,
sustained and untouched, the loop must continue to occur
millions of times, if not billions, or even gazillions of
times. That's why you wait for 5, and not 2, 3, or 4, utterly
identical passes. JIT after 2 iterations would slow the already
slow interpretive emulation to a crawl.
It goes something like this:
1 time through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 2 times through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly)...
<repeat zillions of times to no avail>
Then... ...red alert, red alert, man your battlestations ....we
might have something here....
1 time through some code (sloooooowly), 2 times through some code
(sloooooowly), 3 time through some code (sloooooowly), 4 times
through some code (sloooooowly), 5 times through some code
(sloooooowly)...
Ahhhhhh! Throw the switch! Oh my God, JIT might help if this
keeps going. QUICK!! No one is likely to be looking while we
are doing something so boring... HIT THE BRAKES!!!
Screeeeeeeeeecccccccccccccccccccchhhhhhhhhhhhhhh!
BANG!
Chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug chug chug chug chug chug chug chug chug
chug chug chug chug chug <times 18,000 per simple instruction,
if enough cache space is available>
Ok, go go go! Let the 25% speed x86 turtle out of the
starting gate, "Run baby, run!"
Oh my God, the task is still running! Run, you deformed little
x86 mutant, run!! It's still not over, holy moly, we might
actually wi... Oh darn, the task would've finished already even
at slow UAE speed.
Nevermind.
...1 time through some code (sloooooowly), 1 time through some
code (sloooooowly), 1 time through some code (sloooooowly), 1
time through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 1 time through some code
(sloooooowly), 1 time through some code (sloooooowly), 1 time
through some code (sloooooowly), 2 times through some code
(sloooooowly), 1 time through some code (sloooooowly)
And so on...
> In order to make things faster on those subsequent executions,
> the 68k code is compiled into x86 code.
> That code, while generated in a very straightforward manner (due to the
> time constraints),
"dramatically", "dead", "hasty", "generally", "sloppy"
> is actually quite optimized in a number of ways --- so
> much so that some artificial benchmarks are actually pretty much optimized
> away.
Oh, that sounds encouraging.
> >Since JIT compiling is initially so *incredibly* slow compared to
> >interpretive instruction translation,
>
> Let's inject some numbers here, shall we? I have Amithlon at my fingertips,
> so I measured it on my Duron-800. The "*incredibly* slow" JIT compilation
> takes a fairly stable average of 4000 host cycles per 68k instruction.
You mean, just to generate the dirty x86 code that has no hope of
achieving the host chip's actual native potential. Then, it has
to begin running it. Or did you "forget" that code actually has
to run, not just compile? My, my, aren't we forgetful today?
Even the absolute simplest, and more or less indefinite, loops
like RC5 and OGR only run at about 25% the x86 native processor's
demonstrated speed. That's -after- the massive initial
overhead penalty, the penalty above that you hoped to dupe people
into believing was all that was required to actually execute the
task, had long ago taken its toll.
> Combined with the average length of a block (4-5 instructions), it's
> 18,000 host cycles per compiled block.
>
> Compared to the 70 host cycles per instruction for interpretative emulation,
> the ratio is about 60.
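The per-block figure and the ratio follow from the numbers above. A minimal Python sketch of the arithmetic; the 4.5-instruction block length is an assumed midpoint of the quoted 4-5, and the 800MHz default matches the 800,000,000 host cycles per second used later in the same post. Computed directly, the ratio comes out near 57, which the post rounds to "about 60":

```python
COMPILE_CYCLES_PER_INSN = 4000  # JIT translation cost per 68k instruction (from the post)
INTERP_CYCLES_PER_INSN = 70     # interpretive cost per 68k instruction (from the post)
AVG_BLOCK_LEN = 4.5             # assumed midpoint of the quoted 4-5 instructions per block


def compile_cycles_per_block(block_len: float = AVG_BLOCK_LEN) -> float:
    """Host cycles spent translating one block of the given length."""
    return COMPILE_CYCLES_PER_INSN * block_len


def compile_to_interp_ratio() -> float:
    """Compile cost relative to one interpreted pass; block length cancels out."""
    return COMPILE_CYCLES_PER_INSN / INTERP_CYCLES_PER_INSN


def cycles_to_microseconds(cycles: float, host_hz: float = 800e6) -> float:
    """Wall-clock time for a cycle count on a host of the given clock rate."""
    return cycles * 1e6 / host_hz


print(compile_cycles_per_block())           # 18000.0
print(round(compile_to_interp_ratio(), 1))  # 57.1
print(cycles_to_microseconds(18000))        # 22.5
```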
Ok, so UAE-JIT is, roughly, 60x slower than UAE interpretive
(probably more like 100x slower, since you are again
"forgetting" about the supervisor overhead that makes these
decisions) the first 6 times through a simple number-crunching
loop.
> >it isn't used at all, even within UAE-JIT until, by trial and error,
> >the emulator determines there might be a chance of some long term benefit.
>
> "Trial and error" of course referring to simply keeping a count of how often
> each particular block of code has been executed.
Oh really, it "chooses" what code is to be run? Or can it only
look for iterative loops in whatever code it's fed by the
operator? Sometimes finding an opportunity, sometimes not.
And rarely winning the recompile gamble even when it sees the 5
identical iterations.
You do have one thing going for you, though, a huge thing: UAE
Interpretive is so slow. That's a big deal, as you know. If
you were competing against a 604e-speed chip instead of an 020-
or 030-speed emulated chip, it would be very difficult to ever
justify the lengthy recompile. This isn't God-awful-slow
Windows; beginning-to-end AOS responses aren't commonly measured
in seconds. They are measured in microseconds.
Using AOS, going even slower for tasks shorter than a few seconds is
a very bad tradeoff.
> >Once it determines that code is being simply iteratively looped
>
> Nope, once it determines that a particular block of code has been used
> at least 5 times...
>
> >it halts the pre-compiled interpretive process dead, and takes a long
> >timeout
>
> ..it spends an average 18,000 host cycles (at 800,000,000 host cycles per
> second, that's 22.5 microseconds)...
Per instruction. Before it runs anything. Runs anything in
native code so sloppy it demonstrates 25% native speed, best
case. Assuming the gamble pays off and the loop didn't finish
long ago.
And completely devastating (better description: completely
disregarding) Exec's required scheduling resolution, putting
multitasking on hold, killing the very round-robin pyramid scheme
that allows AOS to run so beautifully in the first place on UAE
Interpretive's faster 020-030 power.
> So to make up that opportunity cost, it takes about 62 further executions
> of the JIT-compiled block. Trust me, *most* things which get executed 5 times
> will get executed 67 times, and most will get executed many many more times.
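The break-even point can be reproduced with a simple model: compiling a block costs about 60 interpreted passes' worth of time, and each compiled run then saves the difference between an interpreted pass and a compiled one. A minimal Python sketch under those assumptions; the ~30x speedup fed in below is the figure Graham gives later in the thread, used here purely as an assumed input, and this is not necessarily how the post's own number was derived:

```python
def break_even_runs(compile_ratio: float, speedup: float) -> float:
    """Further executions a compiled block needs to repay its compile cost.

    compile_ratio: compile cost of the block, in units of one interpreted
                   execution of that block (about 60 per the post).
    speedup:       how many times faster the compiled block runs than the
                   interpreted one (an assumed input, not a measured value).
    Each compiled run saves (1 - 1/speedup) of an interpreted run.
    """
    if speedup <= 1:
        raise ValueError("compiling never pays off unless compiled code is faster")
    return compile_ratio / (1.0 - 1.0 / speedup)


print(round(break_even_runs(60, 30), 1))  # 62.1
print(break_even_runs(60, float("inf")))  # 60.0
```

With an unbounded speedup the break-even approaches the raw ratio of 60; any finite speedup pushes it slightly higher, which is roughly where the "about 62 further executions" lands.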
Assuming there is no cost to actually execute the x86 code, it's a
67:1 tradeoff, and that's if there were no supervisor code and kernel
running to drag performance down even farther.
<more of the same snipped>
> >Read the PCTask-JIT Amiga-->Windows emulation docs, for an honest
> >explanation of when JIT compiling might help or hurt...
>
> The explanation of an old x86-to-68k JIT compiler might not be
> the ideal source of information about a recent 68k-to-x86 JIT compiler.
At least he didn't try to pull the wool over all the emulation
users' eyes, like you did. He fessed up without being dragged out
onto the carpet.
Bottomline(s):
UAE-JIT and Amithlon (assuming UAE interpretive did similarly little
in terms of emulation power) are always slower than simple UAE
Interpretive during the huge fraction of the time one isn't doing
long term, sustained, unchanged, non-multitasking, simple number
crunching for millions of loops. Bad AOS tradeoff.
For number crunching tasks that complete within a second or two,
UAE-JIT/Amithlon are anywhere from slightly slower on the long
end of that time range, to massively slower on the short end.
Bad AOS tradeoff.
For the few (if any) times you let the computer sit untouched
and crunch for a few seconds+, UAE-JIT can provide big advantages
over UAE interpretive. Though, at that point, you are using some
other OS kernel at best to control what multitasking might occur,
or a completely bastardized Amiga kernel at worst. Considering
Amigans demand perfect responsiveness even while
numbercrunching, that's also a bad AOS tradeoff.
And remember, the "benefit" we are talking about here is relative
to 020-030 power UAE Interpretive, hardly applicable to an NG
where AmigaPPC native rules the roost.
Let's enter the real world for a second. Giving UAE-JIT/Amithlon
every possible benefit of the doubt: gazillions of simple
unchanging iterations, no general use or multitasking, computer
completely untouched while quietly numbercrunching (IOWs, not the
real world)...
RC5
---
JIT-x86 code @ 266MHz = 133 Kk/sec
68060 @ 66MHz = 210 Kk/sec
604e @ 280MHz = 926 Kk/sec
OGR
---
JIT-x86 code @ 266MHz = 0.230 Mn/sec
68060 @ 66MHz = 0.420 Mn/sec
604e @ 280MHz = 3.654 Mn/sec
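Normalising the figures above by clock speed makes it easier to see what each result says per cycle. A small Python sketch using only the numbers quoted in the post; the dictionary layout is illustrative, and per-MHz rates ignore memory, cache and emulator-configuration differences:

```python
# (clock in MHz, RC5 in Kkeys/s, OGR in Mnodes/s), as quoted in the post above
RESULTS = {
    "JIT-x86 @ 266MHz": (266, 133, 0.230),
    "68060 @ 66MHz": (66, 210, 0.420),
    "604e @ 280MHz": (280, 926, 3.654),
}


def per_mhz(results):
    """Return {name: (RC5 Kkeys/s per MHz, OGR Knodes/s per MHz)}."""
    normalised = {}
    for name, (mhz, rc5_kkeys, ogr_mnodes) in results.items():
        normalised[name] = (rc5_kkeys / mhz, ogr_mnodes * 1000 / mhz)
    return normalised


for name, (rc5, ogr) in per_mhz(RESULTS).items():
    print(f"{name:17} RC5 {rc5:5.2f} Kkeys/s per MHz   OGR {ogr:5.2f} Knodes/s per MHz")
```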
Interestingly, that's using x86 hardware that cost more than my
CS-PPC over 2 years after my CS-PPC was added. And that x86
machine only achieves 000-020 speeds when not-simple-looping...
-Steve
[Bonkers, pure bonkers.]
So, "SG", how do you explain people in the real world being extremely
happy with Amithlon's speed?
Byeeeee.
--
http://www.geocities.com/brettocallaghan
Home of QDNStats V2 - Newsgroup Stats for Agent
> The hard fact is, you "forgot" to tell your customers that
> UAE-JIT is even slower than UAE interpretive(!) until a simple
> loop is executed identically five times, at which point it stops
> dead(!) to recompile the software into sloppy x86 code in the
> hope that the task will last long enough to overcome all that
> water under the bridge once the loop is invoked again, if it's
> invoked again at all, and hasn't already finished long ago.
Erm, Steve - if a basic block is executed 5 times, then statistically it will
be run many many more times. It is a basic point of being a JIT - there is
no "forgot" about it, it is how a JIT works.
1) It isn't a loop. It is a basic block. If you do not know what a basic
block is, then please retake Compilers 101.
2) It compiles into reasonably optimised x86 code, not sloppy x86 code
3) The one-time hit that allows the code to run 30x faster subsequently is
an extremely useful thing to have.
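To make the loop versus basic-block distinction concrete: a block, as Bernie defined it earlier, starts at some address and runs up to and including the next instruction that can change control flow. A toy Python sketch over a list of (address, mnemonic) pairs; the mnemonic set is a small, assumed subset of 68k control-flow instructions, not a complete decoder, and the function name `split_basic_blocks` is hypothetical:

```python
# A small, assumed subset of 68k control-flow mnemonics (not exhaustive).
CONTROL_FLOW = {"BRA", "BSR", "BEQ", "BNE", "DBF", "JMP", "JSR", "RTS", "RTE"}


def split_basic_blocks(instructions):
    """Group (address, mnemonic) pairs into runs that end at a control-flow instruction.

    Simplification: a real compiler would also start a new block at every
    branch target; here blocks are split only at terminating instructions.
    """
    blocks, current = [], []
    for addr, mnemonic in instructions:
        current.append((addr, mnemonic))
        if mnemonic.upper() in CONTROL_FLOW:
            blocks.append(current)   # block ends on the control-flow instruction
            current = []
    if current:                      # trailing instructions with no terminator
        blocks.append(current)
    return blocks


code = [(0x00, "MOVE.L"), (0x02, "ADD.L"), (0x04, "DBF"),
        (0x08, "MOVEQ"), (0x0A, "RTS")]
for block in split_basic_blocks(code):
    print([m for _, m in block])
# ['MOVE.L', 'ADD.L', 'DBF']
# ['MOVEQ', 'RTS']
```

The first block here happens to be a loop body, but the second is just the tail of a routine; a frequently called routine produces hot blocks without any loop at all.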
> Which of course applies equally to your emulation, since, as we
> now finally know after dragging it out of you, your product is
> UAE interpretive until an identical loop is executed at least 5
> times, at which point UAE-JIT actually stops to compile in the
> hope that this loop is a long number-crunching task.
But it is a basic block, not a loop. A small section of code, 5 to 10
instructions long on average, that takes practically no time in the real
world to compile to native code. E.g., any AmigaOS library function will be
accessed more than 5 times. Exec will be accessed more than 5 times. The
vast majority of any application will be accessed more than 5 times.
> If it's not a long task, you're screwed. Not only are you using a
> very slow emulation (straight UAE, minus), you are using it until
> a task almost completes standard-UAE-slowly, then stopping to
> wait for a recompile, only to find very little of the task
> remaining anyway.
How many real-world tasks do you know that only execute bits of the code 5
to 50 times, no less, no more? About 0.001% of actual programs. Get over
it.
> That would be a stone-dumb assumption, but have no fear, it's not
> true. The assumption is: after 5x through identical code,
> sustained and untouched, the loop must continue to occur
> millions of times, if not billions, or even gazillions of
> times.
Bernie gave the maths that it was around 70 times more before you pulled
ahead of the interpretive code.
> 1 time through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 2 times through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly)...
>
> <repeat zillions of times to no avail>
That is the interpretive method.
> if enough cache space is available>
Well, with hundreds of MBs available to use, I don't think that is a problem
for Amithlon.
> Ok, go go go! Let the 25% speed x86 turtle out of the
> starting gate, "Run baby, run!"
30x faster than the interpretive version.
> Oh my God, the task is still running! Run, you deformed little
> x86 mutant, run!! It's still not over, holy moly, we might
> actually wi... Oh darn, the task would've finished already even
> at slow UAE speed.
Only in 0.01% of real-world applications.
> Even the absolute simplest, and more or less indefinite, loops
> like RC5 and OGR only run at about 25% the x86 native processor's
> demonstrated speed. That's -after- the massive initial
> overhead penalty, the penalty above that you hoped to dupe people
> into believing was all that was required to actually execute the
> task, had long ago taken its toll.
That initial overhead means zilch if you run the benchmark for even a
minute. As you will be running it for hours, it means nothing at all.
Interpretive RC5 and OGR would be running at around 2% of native x86 speed.
A speed up to 25% is therefore good, no?
> Ok, so UAE-JIT is, roughly, 60x slower than UAE interpretive
> (probably more like 100x slower, since you are again
> "forgetting" about the supervisor overhead that makes these
> decisions) the first 6 times through a simple number-crunching
> loop.
Basic block.
> Per instruction. Before it runs anything. Runs anything in
> native code so sloppy it demonstrates 25% native speed, best
> case. Assuming the gamble pays off and the loop didn't finish
> long ago.
Basic Block.
It is an entirely reasonable and sensible expectation that if a basic block
is executed 5 times, it will be executed many, many, many more times -
possibly in the millions or billions of times even, for OS calls. So once
in a 100,000 times you compile a basic block and it isn't run many more
times - whoopie do, I think we can live with that for the massive general
increase in performance.
> RC5
> ---
> JIT-x86 code @ 266MHz = 133 Kk/sec
> 68060 @ 66MHz = 210 Kk/sec
> 604e @ 280MHz = 926 Kk/sec
> OGR
> ---
> JIT-x86 code @ 266MHz = 0.230 Mn/sec
> 68060 @ 66MHz = 0.420 Mn/sec
> 604e @ 280MHz = 3.654 Mn/sec
Ah, your favourite cache resident benchmark running on your crippled laptop.
Why haven't you tried running a decent benchmark - try rendering a scene in
Imagine 4 on all 3 platforms and get back to us with the results. Why not
use the 1GHz PIII system that you have?
Graham
Cool... so a modern PC, maybe a decent P4, should
be about the same speed as a 280MHz 604e (since
a 266MHz laptop is super slow).
And Amithlon is faster than UAE, so would zoom....
and with V2.0 coming out, is going to be great.
Nice.
> Cool... so a modern PC, maybe a decent P4, should be about the same
> speed as a 280MHz 604e (since a 266MHz laptop is super slow).
Until OS4, the 280MHz PPC is only used for some games, and some of the
more difficult tasks faced by programs. The rest of the time it's the 68k
that's doing the job....
> And Amithlon is faster than UAE, so would zoom.... and with V2.0
> coming out, is going to be great.
>
> Nice.
Very much so :)
Steve? Hard facts? Hah!
> UAE-JIT is even slower than UAE interpretive(!) until a simple
On your 266MHz piece of crap... see chart below.
>
> Since AmigaOS is so fast in general, the vast, vast majority of
Especially fast on Amithlon.
> > >Adding a JIT compiler dramatically slows this process down,
> > >essentially bringing the entire emulation to a dead stop
Lots of things come to a dead stop on your laptop.
> > >while 68K code is dynamically compiled on the fly (Just In
> > >Time) into hasty x86 native code. Once compiled, the sloppy x86
Nothing hasty on your laptop.
<Comedy Snipped>
>
> Ok, go go go! Let the 25% speed x86 turtle out of the
> starting gate, "Run baby, run!"
>
You should replace your turtle with a rabbit and try the race again.
> Oh my God, the task is still running! Run, you deformed little
Still running after all that time. You said that UAE-JIT crashes after a
few seconds on your trusty (apparently not stable) laptop.
> x86 mutant, run!! It's still not over, holy moly, we might
> actually wi... Oh darn, the task would've finished already even
> at slow UAE speed.
>
> Nevermind.
Steve... this is too much. Have you considered stand-up comedy?
<Snipped a bunch of stuff here>
> OGR
> ---
Amithlon @ 1.5GHz = 3.811 Mn/sec
> JIT-x86 code @ 266MHz = 0.230 Mn/sec
> 68060 @ 66MHz = 0.420 Mn/sec
> 604e @ 280MHz = 3.654 Mn/sec
>
> Interestingly, that's using x86 hardware that cost more than my
> CS-PPC over 2 years after my CS-PPC was added. And that x86
> machine only achieves 000-020 speeds when not-simple-looping...
>
> http://amigapro.com/UAE.html
>
> -Steve
Now that I've managed to get up off the floor, I've zeroed in on your
problem (other than your lying habit). You are running UAE on absolute
crap! Of course your hardware was expensive...it's a laptop. Might I
point out that it was one of the cheapest laptops at the time. If my PC
performed that slowly, I'd drop it in the trash. No one but you gets
performance anywhere near as bad as yours.
[Jun 26 14:50:43 UTC] OGR: Saved 25/8-6-21-7-12-13 (11.30% done)
0.00:00:08.54 - [3,811,227 nodes/s]
Amithlon - AthlonXP 1800+ ($139.00 with mobo, SB128 $11.00, Adaptec SCSI
$15.00, 256MB DDR Ram $50.00, NVidia GeForce 2 $70.00, Case $40.00, HDD
from my miggies) What is that...a little over $300?
--
David S. Lund
Startup-Sequence/DSL Computer Consulting
813A Trailwood Dr., Hurst TX 76053
(817) 282-7276
www.startup-sequence.com
That's really a funny statement, specially considering that it comes
from someone who uses his 1.5MHz processor to do the same things that
one can do with a 240Mhz processor...
--
"Everything is vague to a degree you do not realize till you have tried
to make it precise"
Bertrand Russell
The Philosophy of Logical Atomism
> That's really a funny statement, specially considering that it comes
> from someone who uses his 1.5MHz processor to do the same things that
> one can do with a 240Mhz processor...
Sorry, I meant 280Mhz
> That's really a funny statement, specially considering that it comes
> from someone who uses his 1.5MHz processor to do the same things that
> one can do with a 240Mhz processor...
Yeah, but that isn't the issue. The issue is Steve saying that JIT is slower
than interpreted.
JIT isn't as fast as native of course, it is around 25% of the native x86
speed. So 1.5GHz / 4 is under 400MHz. Not too bad considering that the PPC
is better suited to this kind of application (more registers, etc) than the
x86 from my experience.
Maybe someone will make a native RC5/OGR for Amithlon, then we can see what
it can really do :)
Graham
> That's really a funny statement, specially considering that it comes
> from someone who uses his 1.5MHz processor to do the same things that
> one can do with a 240Mhz processor...
Actually, I'm very interested to hear what system you are using
and why that's so wonderful (or is it?).
In case it's an Amiga system, some explanation can be inserted just below my
text:
<here>
-- thanks!
It's not ME who's using it. I just read the benchmarks people here
posted and draw my conclusions. You should do that too, instead of
jumping to the neck of people who say things you don't like.
David S. Lund said:
> Amithlon @ 1.5GHz = 3.811 Mn/sec
> > JIT-x86 code @ 266MHz = 0.230 Mn/sec
> > 68060 @ 66MHz = 0.420 Mn/sec
> > 604e @ 280MHz = 3.654 Mn/sec
It appears to me that the 1.5GHz machine running Amithlon performs just
a little bit better than the 604e@280MHz, at least at this job, hence my
statement.
> In case it's an Amiga system, some explanation can be inserted just below my
> text:
>
> <here>
Sorry, I put them above. Did that piss you off?
>> That's really a funny statement, specially considering that it comes
>> from someone who uses his 1.5MHz processor to do the same things that
>> one can do with a 240Mhz processor...
> Sorry, I meant 280Mhz
But the hard fact is that no 240/280MHz processor can do things that a
1.5MHz can do.
You should really read the stuff you are gonna answer to, you know? Try
again; next time you might be luckier.
> David S Lund wrote:
> > Now that I've managed to get up off the floor, I've zeroed in on your
> > problem (other than your lying habit). You are running UAE on absolute
> > crap! Of course your hardware was expensive...it's a laptop. Might I
> > point out that it was one of the cheapest laptops at the time. If my PC
> > performed that slowly, I'd drop it in the trash. No one but you gets
> > performance anywhere near as bad as yours.
>
> That's really a funny statement, specially considering that it comes
> from someone who uses his 1.5MHz processor to do the same things that
> one can do with a 240Mhz processor...
>
First of all, most Amiga software does not run on PPC. Secondly, why
not? At today's PC prices, cost is not a factor. Thirdly, I do the same
things faster than Steve does. If it takes a fast PC to do it, that's
fine... they are cheap. The efficiency of AOS is what makes it all
possible. I agree wholeheartedly with Steve that UAE and Amithlon are
useless on a 266MHz PC.
> Fabio Alemagna wrote:
>
> > That's really a funny statement, specially considering that it comes
> > from someone who uses his 1.5MHz processor to do the same things that
> > one can do with a 240Mhz processor...
>
> Sorry, I meant 280Mhz
>
>
>
The CSPPC was an innovative card. If I had lots of money to spend, I'd
buy one for my A3000T and A3000D.
>pet...@machine.homelinux.net.invalid wrote:
>> In comp.sys.amiga.advocacy Fabio Alemagna <fale...@aros.org> wrote:
>>
>>
>>>That's really a funny statement, specially considering that it comes
>>>from someone who uses his 1.5GHz processor to do the same things that
>>>one can do with a 240Mhz processor...
>>
>>
>> Actually, I'm very interested to hear what system you are using
>> and why that's so wonderful (or is it?).
>
>It's not ME who's using it. I just read the benchmarks people here
>posted and draw my conclusions. You should do that too, instead of
>jumping to the neck of people who say things you don't like.
>
>
>David S. Lund said:
>
> > Amithlon @ 1.5GHz = 3.811 Mn/sec
> > > JIT-x86 code @ 266MHz = 0.230 Mn/sec
> > > 68060 @ 66MHz = 0.420 Mn/sec
> > > 604e @ 280MHz = 3.654 Mn/sec
>
>
>It appears to me that the 1.5GHz machine running Amithlon performs just
>a little bit better than the 604e@280MHz, at least at this job, hence my
>statement.
The statement is correct. The thing to note is that the Amithlon
benchmark is achieved with the 68k version of the client, therefore
the *emulated 68k* under Amithlon performs just a little bit better
than the *native* 604e@280MHz client at this job.
This is no reflection on how native x86 code performs under Amithlon.
--
Bill Hoggett
Nor was it meant to be, as the original point was not that one. However,
it's obvious that native x86 code performs like native x86 code even
under Amithlon, unless you use Martin Blom's compiler, which,
although it spares you from dealing with endianness issues,
makes the "native" code slower than hand-written native
code, because byte swapping is always done, even when not
strictly required, and because the code is bigger and thus doesn't
always fit in the processor's cache.
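The endianness point above in brief: the 68k stores multi-byte values big-endian while x86 is little-endian, so native code that reads 68k-format memory has to swap bytes on such accesses. A minimal Python sketch of that swap using the standard `struct` module; it only illustrates the cost being described, not how Martin Blom's compiler actually emits code, and the helper names are hypothetical:

```python
import struct


def read_68k_long(memory: bytes, offset: int) -> int:
    """Read a 32-bit value stored in 68k (big-endian) byte order."""
    return struct.unpack_from(">I", memory, offset)[0]


def read_native_long(memory: bytes, offset: int) -> int:
    """Read the same bytes the way a plain x86 (little-endian) load would."""
    return struct.unpack_from("<I", memory, offset)[0]


def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value (what an x86 BSWAP does)."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


mem = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(read_68k_long(mem, 0)))              # 0x12345678
print(hex(read_native_long(mem, 0)))           # 0x78563412
print(hex(bswap32(read_native_long(mem, 0))))  # 0x12345678
```

A compiler that cannot prove a value never crosses into 68k-visible memory has to emit the swap every time, which is the "always done, even when not strictly required" overhead.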
> Let's enter the real world for a second. Giving UAE-JIT/Amithlon
> every possible benefit of the doubt: gazillions of simple
> unchanging iterations, no general use or multitasking, computer
> completely untouched while quietly numbercrunching (IOWs, not the
> real world)...
>
> RC5
> ---
> JIT-x86 code @ 266MHz = 133 Kk/sec
There is no x86 code.
> 68060 @ 66MHz = 210 Kk/sec
> 604e @ 280MHz = 926 Kk/sec
> OGR
> ---
> JIT-x86 code @ 266MHz = 0.230 Mn/sec
> 68060 @ 66MHz = 0.420 Mn/sec
> 604e @ 280MHz = 3.654 Mn/sec
Here is my Cel640:
RC5 762 Kkeys
OGR 1.1 Mnodes
Don't you think it's a bit odd that I'm barely twice your
MHz, but still 5 times faster?
Regards...
>> OGR
>> ---
>> JIT-x86 code @ 266MHz = 0.230 Mn/sec
>> 68060 @ 66MHz = 0.420 Mn/sec
>> 604e @ 280MHz = 3.654 Mn/sec
>
> Here is my Cel640:
>
> RC5 762 Kkeys
> OGR 1.1 Mnodes
>
> Don't you think it's a bit odd that I'm barely twice your
> MHz, but still 5 times faster?
I should point out that these are WinUAE figures.
Regards...
SG wrote:
>
> <bme...@cs.monash.edu.au> wrote:
>
> > "SG" <s...@erols.com> writes:
> >
> > >You first have to understand what's going on here, and obviously
> > >you don't. Let me explain it once more:
> >
> > Actually, let me explain. When it comes to credentials with respect to
> > UAE-JIT (and Amithlon), I suspect there is little doubt who knows better
> > what's going on.
> >
> > But if anybody hasn't made up their mind yet, I invite you to compare
> > Steve's emotionally charged verbiage with the actual hard facts I will
> > intersperse....
>
> The hard fact is, you "forgot" to tell your customers that
> UAE-JIT is even slower than UAE interpretive(!) until a simple
> loop is executed identically five times, at which point it stops
> dead(!) to recompile the software into sloppy x86 code in the
> hope that the task will last long enough to overcome all that
> water under the bridge once the loop is invoked again, if it's
> invoked again at all, and hasn't already finished long ago.
Wow, it's so funny watching Steve flail about blindly in his hatred
of Amiga developers.
> Since AmigaOS is so fast in general, the vast, vast majority of
> the time this is what's going to happen. I wonder if people
> realized JIT was slower than UAE interpretive until it found a
> simple untouched loop that's already executed five times, when
> they bought it from you?
I wonder how Steve keeps forgetting that Amithlon crushes a real
amiga (060/604e) in many real world tasks ?
> -Steve
Mad Dog
> Don't you think it's a bit odd that I'm barely twice your
> MHz, but still 5 times faster?
You could try setting your memory access to "indirect" (you will then have
to enable "force settings", too). Steve's laptop and/or Windows installation
might be screwed up enough so that Brian's code cannot enable direct
access.
Bernie
--
I'm sure I take rowing too seriously by many peoples standards- but,
they are not rowing people.
Stuyvesant B. Pell
Princeton, 1953
Oh, and it starts -immediately-, Steve, no waiting. :)
Regards...
>On Wed, 26 Jun 2002 13:28:45 GMT, "Peter" wrote:
>> Cool... so a modern PC, maybe a decent P4, should be about the same
>> speed as a 280MHz 604e (since a 266MHz laptop is super slow).
>Until OS4, the 280MHz PPC is only used for some games, and some of the
>more difficult tasks faced by programs. The rest of the time it's the 68k
>that's doing the job....
And in a car with a NOS system you only use the NOS for acceleration,
and the rest of the time you just use the normal engine power?
That's a pretty stupid thing to say, because you only use the NOS when it's
needed, exactly as on the Amiga. You don't NEED to have PPC-native dir and
makedir commands; you want to have PPC-native datatypes, MP3 en/decoders,
MPEG players and such. And... ta-da, WHICH programs are available for PPC?
YES, mainly programs that NEED a lot of power, like datatypes, MP3
en/decoders and MPEG players.
I still think that my Amiga is pretty fast; the only times it feels slow
are when doing heavy tasks, which I actually don't do very often.
> The statement is correct. The thing to note is that the Amithlon
> benchmark is achieved with the 68k version of the client, therefore
> the *emulated 68k* under Amithlon performs just a little bit better
> than the *native* 604e@280MHz client at this job.
>
> This is no reflection on how native x86 code performs under Amithlon.
That's pretty much right, as long as you aren't multitasking
anything, don't care if the Amiga kernel's behavior gets
completely trashed, don't touch the computer for anything else,
don't mind 020 speed running everything else using heinously
slow and thoroughly unstable and incompatible UAE interpretive,
and don't need or care about Amiga compatibility. Then Amithlon
is almost an equal $2000 number-crunching-alone option.
--
-Steve
> SG wrote:
>
> > The hard fact is, you "forgot" to tell your customers that
> > UAE-JIT is even slower than UAE interpretive(!) until a simple
> > loop is executed identically five times, at which point it dead
> > stops(!) to recompile the software into sloppy x86 code in the
> > hope that the task will last long enough to overcome all that
> > water under the bridge once the loop is invoked again, if it's
> > invoked again, or wouldn't have finished long ago.
>
> Erm, Steve - if a basic block is executed 5 times, then statistically it will
> be run many, many more times.
Statistically nonsense, perhaps. Just because a block is
iterated five times doesn't mean it will take seconds to
complete, and thus benefit, even over super slow UAE
interpretive (which we now know is what UAE-JIT/Amithlon is 99%
of the time, only slower).
> 2) It compiles into reasonably optimised x86 code, not sloppy x86 code
Right, that's 25% the speed of similar tasks compiled for the same
x86 chip from the beginning.
> 3) The one-time hit that allows the code to run 30x faster subsequently is
> an extremely useful thing to have.
The x86 chip natively isn't 30x faster than the venerable 060
(faster MHz/MHz than a relatively crude x86 Athlon) running
alone natively. That's before the 75% hit for the heinously
dirty compile.
> > Which of course applies equally to your emulation, since, as we
> > now finally know after dragging it out of you, your product is
> > UAE interpretive until an identical loop is executed at least 5
> > times, at which point UAE-JIT actually stops to compile in the
> > hope that this loop is a long number crunching task.
>
> But it is a basic block, not a loop. A small section of code - 5 to 10
> instructions long on average, that takes practically no time in the real
> world to compile to native code. E.g., any AmigaOS library function will be
> accessed more than 5 times. Exec will be accessed more than 5 times. The
> vast majority of any application will be accessed more than 5 times.
Not identically. That's why there is a 5x rule; if JIT happened
every time code was repeated, the emulation would crawl. The JIT
compilation stage is excruciatingly slow compared to simple,
already very slow UAE interpretive (70:1 slower according to
Meyer, though more like 100:1 when you figure in how much
slower Amithlon is than UAE interpretive due to the required
additional supervisor code).
> > RC5
> > ---
> > JIT-x86 code @ 266MHz = 133 Kk/sec
> > 68060 @ 66MHz = 210 Kk/sec
> > 604e @ 280MHz = 926 Kk/sec
> > OGR
> > ---
> > JIT-x86 code @ 266MHz = 0.230 Mn/sec
> > 68060 @ 66MHz = 0.420 Mn/sec
> > 604e @ 280MHz = 3.654 Mn/sec
>
> Ah, your favourite cache-resident benchmark running on your
> crippled laptop. Why haven't you tried running a decent benchmark?
> Try rendering a scene in Imagine 4 on all 3 platforms and get
> back to us with the results. Why not use the 1GHz PIII system
> that you have?
Because pretty much anything on an Amiga that takes more than a
second or two (the only time Amithlon or UAE-JIT will provide
some small benefit over old UAE Interpretive; all other times
it's slower) is PPC-compiled these days, and Amithlon/UAE can't
run any of it.
--
-Steve
> On Wed, 26 Jun 2002 13:28:45 GMT, "Peter" wrote:
>
> > Cool... so a modern PC, maybe a decent P4, should be about the same
> > speed as a 280MHz 604e (since a 266MHz laptop is super slow).
>
> Until OS4, the 280MHz PPC is only used for some games, and some of the
> more difficult tasks faced by programs. The rest of the time it's the 68k
> that's doing the job....
Almost everything I do is PPC, and has been for a long time
already: GUI gadget and background rendering, image processing,
rendering, all players and viewers, much of the gfx subsystem,
internet image and plugin rendering--shoot, what else could take
a mini fraction of a sec under AOS?
--
-Steve
> "SG" <s...@erols.com> wrote in message news:3D18FD8C.M...@erols.com...
> >
> > Let's enter the real world for a second. Giving UAE-JIT/Amithlon
> > every possible benefit of the doubt: gazillions of simple
> > unchanging iterations, no general use or multitasking, computer
> > completely untouched while quietly numbercrunching (IOWs, not the
> > real world)...
> >
> > RC5
> > ---
> > JIT-x86 code @ 266MHz = 133 Kk/sec
> > 68060 @ 66MHz = 210 Kk/sec
> > 604e @ 280MHz = 926 Kk/sec
> > OGR
> > ---
> > JIT-x86 code @ 266MHz = 0.230 Mn/sec
> > 68060 @ 66MHz = 0.420 Mn/sec
> > 604e @ 280MHz = 3.654 Mn/sec
>
> Cool... so a modern PC, maybe a decent P4, should
> be about the same speed as a 280MHz 604e (since
> a 266MHz laptop is super slow).
Right, $2K+ now, vs. already having had it for 5 years. All for
no speed benefit even in the most ideal cases, horrible
instability, bare-bones 68K-only compatibility, and horrific
slowness, as it's a watered-down UAE interpretive machine nearly
100% of general run time.
> And Amithlon is faster than UAE, so would zoom....
> and with V2.0 coming out, is going to be great.
Actually it's slower than UAE-interpretive until a simple
cacheable loop iterates a few million times without change (that's
to overcome the opportunity cost of halting the emulation while
the compiler chugs, then starts; and not necessarily quickly
either).
--
-Steve
FYI Steve, 70 instructions x five doesn't take "seconds" on an Athlon at
over 1GHz.
70 instructions haven't taken "seconds" to execute on any computer hardware
for quite some time...
-JB
No, the REALLY sad fact is, you failed to RTFM.
>
> It took me to drag that admission out of you, otherwise no one
> would know that to this day.
Anyone could have read it; it's all spelled out in the docs. You did no
such thing; you lie like a rug.
>
> > >Interpretive emulation (UAE non-JIT) is very, very slow. It
> > >uses pre-compiled translation that ultimately converts 68K
> > >instructions into x86 executable instructions step by
> > >intricate step, every time it encounters one (all the time when
> > >running AOS, obviously).
> >
> > It requires a fairly constant average of about 70 host cycles for each
> > 68k instruction, over a wide range of 68k software tested. Which means
> > that on a nice 1.75GHz machine, you are looking at 25 million 68k instructions
> > executed per second.
> > For small loops with small data sets, which actually fit into an 060's cache,
> > the 060 will come out ahead by a factor of 2-3. However, as soon as either
> > the active code size or the active data set goes beyond the size of the 060's
> > caches, *it* will slow down quite a bit. The interpretative CPU emulation
> > in UAE, however, *already* suffers quite atrocious pipeline stalls for each
> > instruction, so it will slow down considerably less due to those extra
> > stalls (even assuming that the much larger caches are also getting thrashed).
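For anyone checking the arithmetic: the throughput figure is simply the host clock divided by the average per-instruction cost. A minimal Python sketch, assuming every emulated instruction costs exactly the quoted ~70-cycle average (real per-instruction costs vary, and cache effects are ignored):

```python
# Back-of-envelope throughput of an interpretive 68k emulator.
# Assumption: every emulated instruction costs the ~70 host-cycle
# average quoted in the thread; real per-instruction costs vary.

INTERP_CYCLES_PER_INSN = 70


def interpreted_mips(host_hz: float,
                     cycles_per_insn: float = INTERP_CYCLES_PER_INSN) -> float:
    """Emulated 68k instructions per second, in millions."""
    return host_hz / cycles_per_insn / 1e6


if __name__ == "__main__":
    for ghz in (0.8, 1.0, 1.75):
        mips = interpreted_mips(ghz * 1e9)
        print(f"{ghz:.2f} GHz host: ~{mips:.1f} million 68k instructions/s")
```

At 1.75GHz this gives 25 million instructions per second, the figure in the post.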
>
> Which of course applies equally to your emulation, since, as we
> now finally know after dragging it out of you, your product is
> UAE interpretive until an identical loop is executed at least 5
> times, at which point UAE-JIT actually stops to compile in the
> hope that this loop is a long number crunching task.
NOTHING like the pipeline stalls of a 68060, it SMOKES you.
>
> If its not a long task, you're screwed. Not only are you using a
> very slow emulation (straight UAE, minus), you are using it until
> a task almost completes standard-UAE-slowly, then stopping to
> wait for a recompile, only to find very little of the task
> remaining anyway.
You are showing your ignorance again, Steve.
>
> Since AmigaOS is so fast in general, the vast, vast majority of
> the time this is what's going to happen. I wonder if people
> realized JIT was slower than UAE interpretive until it found a
> simple untouched loop that's already executed five times, when
> they bought it from you?
Please, don't be a sore loser. We KNEW; unlike YOU, we DO RTFM.
>
> > >Adding a JIT compiler dramatically slows this process down,
> > >essentially bringing the entire emulation to a dead stop
> > >while 68K code is dynamically compiled on the fly (Just In
> > >Time) into hasty x86 native code. Once compiled, the sloppy x86
> > >code generally runs faster than UAE-interpretive can run it with
> > >its every-time pre-compiled instruction translation processes.
> > >Until, of course, the code being iterated changes or completes
> > >its task, then its back to square one.
> >
> > Except for the use of "dramatically", "dead", "hasty", "generally", "sloppy"
> > and so on, that is basically correct.
> >
> > Once it has been determined that a block (a basic code fragment starting
> > anywhere, and ending on the next possible control flow instruction) has been
> > executed 5 times (using the interpretative emulation), the JIT compiler
> > guesses that that particular block is likely to be executed several more
> > times.
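The count-then-compile policy described here can be modelled in a few lines. This is a toy Python sketch, not UAE's or Amithlon's actual code: the class and method names are hypothetical, "compiling" just produces a Python closure, and the only things taken from the post are the per-block execution counter and the threshold of 5. The exact point at which the real emulator triggers compilation may differ.

```python
from collections import defaultdict
from typing import Callable, Dict

JIT_THRESHOLD = 5  # interpretive executions before a block is compiled


class ToyJitDispatcher:
    """Hypothetical model of a count-then-compile block dispatcher."""

    def __init__(self, threshold: int = JIT_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: Dict[int, int] = defaultdict(int)   # block address -> runs
        self.cache: Dict[int, Callable[[], None]] = {}   # compiled blocks
        self.compilations = 0
        self.interpreted_runs = 0
        self.compiled_runs = 0

    def _interpret(self, block_addr: int) -> None:
        # Stand-in for decoding and executing each 68k instruction in turn.
        self.interpreted_runs += 1

    def _compile(self, block_addr: int) -> Callable[[], None]:
        # Stand-in for translating the block into host code (done once).
        self.compilations += 1

        def native_block() -> None:
            self.compiled_runs += 1

        return native_block

    def execute(self, block_addr: int) -> None:
        native = self.cache.get(block_addr)
        if native is not None:          # fast path: already translated
            native()
            return
        self._interpret(block_addr)     # slow path: interpret and count
        self.counts[block_addr] += 1
        if self.counts[block_addr] >= self.threshold:
            self.cache[block_addr] = self._compile(block_addr)


if __name__ == "__main__":
    d = ToyJitDispatcher()
    for _ in range(100):
        d.execute(0x1000)    # a hot block, e.g. a loop body
    d.execute(0x2000)        # a block that runs only once
    print(d.interpreted_runs, d.compiled_runs, d.compilations)
```

With one hot block run 100 times and one cold block run once, this prints `6 95 1`: six interpreted runs, 95 compiled runs, one compilation. A block that never reaches the threshold is never compiled, and a compiled block stays in the cache, so it is never compiled again.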
>
> That would be a stone-dumb assumption, but have no fear, it's not
> true. The assumption is: after 5x through identical code
> sustained and untouched, the loop must continue to occur for
> millions of times, if not billions, or even gazillions of
> times. That's why you wait for 5 and not 2, or 3, or 4, utterly
> identical passes. JIT after 2 iterations would slow the already
> slow interpretive emulation to a crawl.
67, NOT millions, you moron. Your parents will catch you at this one
day and whale you for it.
>
> It goes something like this:
>
> 1 time through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 2 times through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly)...
>
> <repeat zillions of times to no avail>
>
> Then... ...red alert, red alert, man your battlestations ....we
> might have something here....
>
> 1 time through some code (sloooooowly), 2 times through some code
> (sloooooowly), 3 times through some code (sloooooowly), 4 times
> through some code (sloooooowly), 5 times through some code
> (sloooooowly)...
>
> Ahhhhhh! Throw the switch! Oh my God, JIT might help if this
> keeps going. QUICK!! No one is likely to be looking while we
> are doing something so boring... HIT THE BRAKES!!!
>
> Screeeeeeeeeecccccccccccccccccccchhhhhhhhhhhhhhh!
>
> BANG!
>
> Chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug chug chug chug chug chug chug chug chug
> chug chug chug chug chug <times 18,000 per simple instruction,
> if enough cache space is available>
>
> Ok, go go go! Let the 25% speed x86 turtle out of the
> starting gate, "Run baby, run!"
And it yawns big while it SMOKES Steve's outdated hack.
>
> Oh my God, the task is still running! Run, you deformed little
> x86 mutant, run!! Its still not over, holy moly, we might
> actually wi... Oh darn, the task would've finished already even
> at slow UAE speed.
So, is THIS a lie, or is it that you can't keep it running? Which IS the
lie? OH, BOTH.
>
> Nevermind.
>
> ....1 time through some code (sloooooowly), 1 time through some
> code (sloooooowly), 1 time through some code (sloooooowly), 1
> time through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 1 time through some code
> (sloooooowly), 1 time through some code (sloooooowly), 1 time
> through some code (sloooooowly), 2 times through some code
> (sloooooowly), 1 time through some code (sloooooowly)
>
> And so on...
>
> > In order to make things faster on those subsequent executions,
> > the 68k code is compiled into x86 code.
> > That code, while generated in a very straightforward manner (due to the
> > time constraints),
>
> "dramatically", "dead", "hasty", "generally", "sloppy"
>
> > is actually quite optimized in a number of ways --- so
> > much so that some artificial benchmarks are actually pretty much optimized
> > away.
>
> Oh, that sounds encouraging.
>
> > >Since JIT compiling is initially so *incredibly* slow compared to
> > >interpretive instruction translation,
> >
> > Let's inject some numbers here, shall we? I have Amithlon at my fingertips,
> > so I measured it on my Duron-800. The "*incredibly* slow" JIT compilation
> > takes a fairly stable average of 4000 host cycles per 68k instruction.
>
> You mean, just to generate the dirty x86 code that has no hope of
> achieving the host chip's actual native potential. Then, it has
> to begin running it. Or did you "forget" that code actually has
> to run, not just compile? My, my, aren't we forgetful today?
>
> Even the absolute simplest, and more or less indefinite, loops
> like RC5 and OGR only run at about 25% the x86 native processor's
> demonstrated speed. That's -after- the massive initial
> overhead penalty, the penalty above that you hoped to dupe people
> into believing was all that was required to actually execute the
> task, had long ago taken its toll.
>
> > Combined with the average length of a block (4-5 instructions), it's
> > 18,000 host cycles per compiled block.
> >
> > Compared to the 70 host cycles per instruction for interpretative emulation,
> > the ratio is about 60.
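The per-block figure and the ratio follow directly from the quoted averages. A small Python sketch, assuming a block length of 4.5 instructions (the midpoint of the quoted 4-5) and the 800,000,000 cycles-per-second host quoted further down in the post:

```python
# Cost of JIT-compiling one basic block, from the averages in the post.
COMPILE_CYCLES_PER_INSN = 4000   # quoted average for the JIT compiler
INTERP_CYCLES_PER_INSN = 70      # quoted average for the interpreter
AVG_BLOCK_LEN = 4.5              # assumption: midpoint of "4-5 instructions"
HOST_HZ = 800e6                  # assumption: the 800MHz host quoted below


def compile_cycles_per_block(block_len: float = AVG_BLOCK_LEN) -> float:
    return COMPILE_CYCLES_PER_INSN * block_len


def compile_to_interp_ratio() -> float:
    """Compile cost per instruction, in units of one interpreted instruction."""
    return COMPILE_CYCLES_PER_INSN / INTERP_CYCLES_PER_INSN


def compile_time_us(block_len: float = AVG_BLOCK_LEN,
                    host_hz: float = HOST_HZ) -> float:
    return compile_cycles_per_block(block_len) / host_hz * 1e6


if __name__ == "__main__":
    print(f"cycles per block: {compile_cycles_per_block():,.0f}")
    print(f"compile/interpret ratio: {compile_to_interp_ratio():.1f}")
    print(f"compile time per block: {compile_time_us():.1f} us")
```

The ratio comes out at about 57, which the post rounds to 60, and a block costs about 22.5 microseconds to compile on the 800MHz host.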
>
> Ok, so UAE-JIT is, roughly, 60x slower than UAE interpretive
> (probably more like 100x slower, since you are again
> "forgetting" about the supervisor overhead that makes these
> decisions) the first 6 times through a simple number crunching
> loop.
>
> > >it isn't used at all, even within UAE-JIT until, by trial and error,
> > >the emulator determines there might be a chance of some long term benefit.
> >
> > "Trial and error" of course referring to simply keeping a count of how often
> > each particular block of code has been executed.
>
> Oh really, it "chooses" what code is to be run? Or, can it only
> look for iterative loops in whatever code its fed by the
> operator? Sometimes finding an opportunity; sometimes not.
> And rarely winning the recompile gamble even when it sees the 5
> identical iterations.
>
> You do have one thing going for you though, a huge thing, UAE
> Interpretive is so slow. Thats a big deal as you know. If
> you were competing against a 604e speed chip, instead of an 020
> or 030 speed emulated chip, it would be very difficult to ever
> justify the lengthy recompile. This isn't God-awful slow
> Windows, beginning to end AOS responses aren't commonly measured
> in seconds. They are measured in microseconds.
And AOS is AWESOME on Amithlon, this from someone who HAS used fast
Amigas. I got to build 4000Ts with CSPPC cards, VT4000/Flyer, systems
that rocked... then, not now. They're still nice, still capable, still
able to do the job, and do it very well, but the cold hard fact is,
Amithlon is faster. It can't do Toaster/Flyer, but I don't need it;
it can't do AGA, I don't need it; it CAN run promotable apps, I like
that. Even CNet runs damn fast on it, more responsive and faster than I
have EVER seen it. Play the WOF game: forget about seeing the wheel
spin, it's too fast; you select spin and there are the results (you JUST
can tell it did spin, a blink). I use it on a daily basis, you do NOT,
you have NO clue, so you lie.
>
> Using AOS, going even slower for less than few second tasks is
> a very bad tradeoff.
No clue.
>
> > >Once it determines that code is being simply iteratively looped
> >
> > Nope, once it determines that a particular block of code has been used
> > at least 5 times...
> >
> > >it halts the pre-compiled interpretive process dead, and takes a long
> > >timeout
> >
> > ..it spends an average 18,000 host cycles (at 800,000,000 host cycles per
> > second, that's 22.5 microseconds)...
>
> Per instruction. Before it runs anything. Runs anything in
> native code so sloppy it demonstrates 25% native speed, best
> case. Assuming the gamble pays off and the loop didn't finish
> long ago.
At 25%, it smokes you, 2500 MHz/4, you do the math.
>
> And completely devastating (better description: complete
> disregarding) Exec's required scheduling resolution, putting
> multitasking on hold, killing the very round robin pyramid scheme
> that allows AOS to run so beautifully in the first place on UAE
> Interpretive's faster 020-030 power.
You have NO clue, or (more likely) you lie like a rug.
>
> > So to make up that opportunity cost, it takes about 62 further executions
> > of the JIT-compiled block. Trust me, *most* things which get executed 5 times
> > will get executed 67 times, and most will get executed many many more times.
In the time it takes your system to do what, 5 or 10?
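The opportunity-cost argument reduces to one formula: the one-time compile cost divided by the cycles saved on each later execution. A minimal Python sketch; the per-run cost of the compiled block is NOT given in the thread, so `native_cycles` below is a hypothetical parameter, and the 4.5-instruction block length is an assumption. With the native cost set to zero the estimate is 58 runs, and any nonzero cost pushes it up, so the ~62 quoted above is in the same range.

```python
import math

COMPILE_CYCLES = 18_000              # per block, as quoted
INTERP_CYCLES_PER_RUN = 70 * 4.5     # 70 cycles/insn x 4.5-insn block (assumed)


def breakeven_runs(native_cycles: float,
                   compile_cycles: float = COMPILE_CYCLES,
                   interp_cycles: float = INTERP_CYCLES_PER_RUN) -> int:
    """Compiled executions needed before the one-time compile cost is repaid.

    native_cycles is a hypothetical per-run cost of the compiled block;
    the thread does not state it.
    """
    saving = interp_cycles - native_cycles
    if saving <= 0:
        raise ValueError("compiled block must be cheaper than interpretation")
    return math.ceil(compile_cycles / saving)


if __name__ == "__main__":
    for native in (0, 25, 50):
        print(f"native cost {native:>2} cycles/run -> repaid after "
              f"{breakeven_runs(native)} runs")
```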
>
> Assuming there is no cost to actually execute the x86 code, its a
> 67:1 tradeoff--if there was no supervisor code and kernel
> running to drag performance down even farther.
WAHAAAY above your system.
>
> <more of the same snipped>
>
> > >Read the PCTask-JIT Amiga-->Windows emulation docs, for an honest
> > >explanation of when JIT compiling might help or hurt...
> >
> > The explanation of an old x86-to-68k JIT compiler might not be
> > the ideal source of information about a recent 68k-to-x86 JIT compiler
>
> At least he didn't try to pull the wool over all the emulation
> user's eyes, like you did. He fessed up without being drug out
> on to the carpet.
>
> Bottomline(s):
>
> UAE-JIT and Amithlon (assuming UAE interpretive did similarly little
> in terms of emulation power) are always slower than simple UAE
> Interpretive during the huge fraction of the time one isn't doing
> long term, sustained, unchanged, non-multitasking, simple number
> crunching for millions of loops. Bad AOS tradeoff.
Wrong, Amithlon is way faster, in fact, with the JIT off it is still
faster.
>
> For number crunching tasks that complete within a second or two,
> UAE-JIT/Amithlon are anywhere from slightly slower on the long
> end of that time range, to massively slower on the short end.
> Bad AOS tradeoff.
Can you show this to be so? (Hint: I know better.)
>
> For the few (if any) times you let the computer sit untouched
> and crunch for a few seconds+, UAE-JIT can provide big advantages
> over UAE interpretive. Though, at that point, you are using some
> other OS kernel at best to control what multitasking might occur,
> or a completely bastardized Amiga kernel at worst. Considering
> Amigans demand perfect responsiveness even while
> numbercrunching, that's also a bad AOS tradeoff.
And I GET perfect responsiveness, but then, *I* don't have a 266 K6 laptop.
*MY* laptop is an old 333MHz K6-2 and it IS about like a 25MHz
68030, nothing to write home about, but usable (WinUAE JIT); my
desktop is a bit more :)
>
> And remember, the "benefit" we are talking about here is relative
> to 020-030 power UAE Interpretive, hardly applicable to a ng
> where AmigaPPC native rules the roost.
Wrong, 68060/66 smoker.
>
> Let's enter the real world for a second. Giving UAE-JIT/Amithlon
> every possible benefit of the doubt: gazillions of simple
> unchanging iterations, no general use or multitasking, computer
> completely untouched while quietly numbercrunching (IOWs, not the
> real world)...
>
> RC5
> ---
> JIT-x86 code @ 266MHz = 133 Kk/sec
> 68060 @ 66MHz = 210 Kk/sec
> 604e @ 280MHz = 926 Kk/sec
> OGR
> ---
> JIT-x86 code @ 266MHz = 0.230 Mn/sec
> 68060 @ 66MHz = 0.420 Mn/sec
> 604e @ 280MHz = 3.654 Mn/sec
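Dividing each posted result by its clock speed makes the per-MHz comparison in this table explicit. A short Python sketch that uses only the figures above; it says nothing about how each client was built, and per-MHz numbers ignore memory speed and cache size:

```python
# Posted results: (clock in MHz, RC5 in Kkeys/s, OGR in Mnodes/s).
RESULTS = {
    "JIT-x86 @ 266MHz": (266, 133, 0.230),
    "68060 @ 66MHz": (66, 210, 0.420),
    "604e @ 280MHz": (280, 926, 3.654),
}


def per_mhz(rate: float, clock_mhz: float) -> float:
    """Throughput per MHz of clock."""
    return rate / clock_mhz


if __name__ == "__main__":
    print(f"{'machine':<18}{'RC5 Kk/s/MHz':>14}{'OGR Kn/s/MHz':>14}")
    for name, (mhz, rc5, ogr) in RESULTS.items():
        # OGR is converted from Mnodes to Knodes so both columns are readable.
        print(f"{name:<18}{per_mhz(rc5, mhz):>14.2f}"
              f"{per_mhz(ogr * 1000, mhz):>14.2f}")
```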
>
> Interestingly, that's using x86 hardware that cost more than my
> CS-PPC over 2 years after my CS-PPC was added. And that x86
> machine only achieves 000-020 speeds when not-simple-looping...
Now, for the REAL world, up that to even a lowly 1GHz Duron, and you're
toast, but you always go against the LOW end, a 266MHz K6; you're a sad
one.
Interestingly, my hardware that cost WAY less smokes it.
>
> http://amigapro.com/UAE.html
And you have the NERVE to call yourself an Amiga Pro.
>
> -Steve
And you can get the 240MHz for how much? But it does MORE than that
240MHz chip, at a small fraction of the cost, and that same x86
hardware can be used to run a number of other OSs. You?
WHAT 280MHz? Overclocked? Hell, David NEVER overclocks, but *I* do, and he
could as well. The first thing I did with a 1GHz Duron was up it to
1333MHz, yup, 333MHz over; that alone is more MHz than your 280 :P~~ :)
At a SMALLER cost than the 280, too. But *I* have never seen a 280MHz 604e
on a CSPPC on sale; if you are overclocking, we can too.
>
> > In case it's amiga system, some explanation can be inserted just below my
> > text :
> >
> > <here>
>
> Sorry, I put them above. Did that piss you off?
And he asked what you are running, what kind of Amiga; IOW, your
specs?
>
You already posted that it took twice as long as my UAE-JIT
machine (since tossed for unusable instability) to start the
tasks, making it 140x slower than an 060 working alone.
--
-Steve
Takes much less time than that.
> complete, and thus benefit, even over super slow UAE
> interpretive (which we now know is what UAE-JIT/Amithlon is 99%
> of the time, only slower).
Utter bullcrap. Can you find a single Amithlon user who agrees with you?
Have you ever run Amithlon? On what?
> Right, that's 25% the speed of similar tasks compiled for the same
> x86 chip from the beginning.
1.5GHz/4 = 375MHz. Totally acceptable emulated 68k speeds. Kicks your
Amiga's 66MHz 060's butt.
>
> The x86 chip natively isn't 30x faster than the venerable 060
> (faster MHz/MHz than a relatively crude, x86 Athlon) running
> alone natively. That's before the 75% hit for the heinously
> dirty compile.
>
Still don't agree with your numbers... but a 375MHz emulated 040 is
acceptable to me.
> Because pretty much anything on an Amiga that takes more than a
> second or two (the only time Amithlon or UAE-JIT will provide
> some small benefit over old UAE Interpretive; all other times
> it's slower) is PPC-compiled these days, and Amithlon/UAE can't
> run any of it.
Give an example where there is no 68k equivalent.
>
You just don't want to lose. 68k Imagine on Amithlon would kick the snot
out of PPC Imagine on your Amiga.
You don't do much, then. AOS (68k), browsers (68k). You wanna see
slow... wait till AOS4 on your PPC card (suppose you'll turn off JIT. :)
Nonsense. Here's a simple test for you. On the back of your Amithlon
box, what does the second line (white letters) say? It's basic Amithlon
knowledge.
Finally! You tossed the laptop. :)
Only, you're lying like a rug again. Let's see:
the cost was NOT $2000, it was
Case..................$32.00
Mobo/CPU.............$130.00
60GB HDD..............$80.00
Floppy................$15.00
CD-ROM................$40.00
Sound card............$11.00
NIC...................$10.00
512MB RAM.............$90.00
GeForce 2 video.......$30.00
Keyboard..............$15.00
Optical mouse.........$30.00
Amithlon.............$150.00
Total................$633.00
And it is NOT 020 speed (what IS 020 speed, at what clock?). Also, I
run 10 ports on CNet BBS, ACS, AmIRC, DOpus, Microdit 2 and AWeb all at
the same time, AND I add Imagine 4 to that when I run it. You were
saying?
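The itemized total can be checked mechanically. A trivial Python sketch using the prices in the list above (item names lightly normalized):

```python
# Prices in whole US dollars, as listed in the post.
PARTS = {
    "Case": 32, "Mobo/CPU": 130, "60GB HDD": 80, "Floppy": 15,
    "CD-ROM": 40, "Sound card": 11, "NIC": 10, "512MB RAM": 90,
    "GeForce 2 video": 30, "Keyboard": 15, "Optical mouse": 30,
    "Amithlon": 150,
}

total = sum(PARTS.values())

if __name__ == "__main__":
    print(f"Total: ${total}.00")
```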
> --
>
> -Steve
> >It appears to me that the 1.5GHz Machine running Amithlon performs just
> >a little bit better than the 604e@280Mhz, at least at this job, hence my
> >statement.
>
> The statement is correct. The thing to note is that the Amithlon
> benchmark is achieved with the 68k version of the client, therefore
> the *emulated 68k* under Amithlon performs just a little bit better
> than the *native* 604e@280MHz client at this job.
>
> This is no reflection on how native x86 code performs under Amithlon.
[wets pants], sorry, waiting for July 1st is like waiting for Xmas :)
--
Doughnuts are my downfall
>> >> Here is my Cel640:
>> >>
>> >> RC5 762 Kkeys
>> >> OGR 1.1 Mnodes
>> >>
>> >> Don't you think it's a bit odd that I'm barely twice your
>> >> MHz, but still 5 times faster?
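The "barely twice your MHz, but still 5 times faster" claim can be checked against the posted numbers. A Python sketch using only the Cel640 figures here and the 266MHz JIT-x86 figures posted earlier in the thread (133 Kkeys/s RC5, 0.230 Mnodes/s OGR):

```python
def ratio(a: float, b: float) -> float:
    return a / b


clock_ratio = ratio(640, 266)      # Celeron 640MHz vs. 266MHz JIT-x86 laptop
rc5_ratio = ratio(762, 133)        # RC5, Kkeys/s
ogr_ratio = ratio(1.1, 0.230)      # OGR, Mnodes/s

if __name__ == "__main__":
    print(f"clock: {clock_ratio:.2f}x  RC5: {rc5_ratio:.2f}x  OGR: {ogr_ratio:.2f}x")
```

So the clock ratio is about 2.4x while throughput is roughly 4.8x (OGR) to 5.7x (RC5), which is the gap the poster is pointing at.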
>> >
>> > I should point out that these are WinUAE figures.
>>
>> Oh, and it starts -immediately-, Steve, no waiting. :)
>
> You already posted that it took twice as long as my UAE-JIT
> machine (since tossed for unusable instability) to start the
> tasks, making it 140x slower than an 060 working alone.
No that's not true anymore. I don't know if it's the new
WinUAE or distributed client, or if perhaps it was all a
coincidence, but there is no waiting here - it starts
crunching immediately.
Regards...
The ENIAC did 5000 additions, or 357 multiplications, per
second. No? :)
Regards...
> Statistically nonsense, perhaps. Just because a block is
> iterated five times doesn't mean it will take seconds to
> complete,
If your Amiga is taking seconds to complete 5 - 10 native instructions, then
I suggest you buy a new computer.
> and thus benefit, even over super slow UAE
> interpretive (which we now know is what UAE-JIT/Amithlon is 99%
> of the time, only slower).
More like 2% of the time. You don't know shit about compilers, programming
or stuff like that (you have proved it many times before) - code gets run
many times. It will get JIT compiled, stored in the JIT cache, and then
never needs to be JIT compiled again.
> Right, that's 25% the speed of similar tasks compiled for the same
> x86 chip from the beginning.
As I wrote, not bad for recompiling from another incompatible native
instruction set.
> The x86 chip natively isn't 30x faster than the venerable 060
> (faster MHz/MHz than a relatively crude, x86 Athlon) running
> alone natively. That's before the 75% hit for the heinously
> dirty compile.
The Athlon's IPC is 3x higher than your '060's. Ouch! That's gotta hurt.
The Athlon runs at clock speeds 25x faster than your '060.
The Athlon is therefore 75x faster than your '060 when running native code.
You lose.
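The estimate above multiplies an IPC ratio by a clock ratio. A minimal Python sketch of that simple multiplicative model; the 3x IPC and 25x clock figures are the poster's estimates, and the optional efficiency factor is a hypothetical knob, shown here with the 25%-of-native figure claimed earlier in the thread:

```python
def relative_speed(ipc_ratio: float, clock_ratio: float,
                   efficiency: float = 1.0) -> float:
    """Simple multiplicative model: speed ~ IPC x clock x code efficiency.

    Real speedups depend heavily on the workload, memory system and caches.
    """
    return ipc_ratio * clock_ratio * efficiency


if __name__ == "__main__":
    print(relative_speed(3, 25))          # native x86 code
    print(relative_speed(3, 25, 0.25))    # JIT output at the claimed 25%
```

Under these assumptions native code comes out 75x faster, and even at the claimed 25% efficiency the JIT output would still be about 19x faster.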
>> The vast majority of any application will be accessed more than 5 times.
>
> Not identically.
Huh? The code is the same. Self-modifying code has not worked on the Amiga
since processors higher than the 68010 were used, because it messes with
the on-chip cache.
> That's why there is a 5x rule; if JIT happened
> every time code was repeated, the emulation would crawl.
JIT only happens once. Haven't you worked this out yet?
> The JIT
> compilation stage is excruciatingly slow compared to simple,
> already very slow UAE interpretive (70:1 slower according to
> Meyer, though more like 100:1 when you figure in how much
> slower Amithlon is than UAE interpretive due to the required
> additional supervisor code).
So slow it takes under 70 repeats of that basic block before you are ahead
of the interpretive UAE, and if a basic block is repeated 5 times, then
statistically, in the vast majority (>99%) of cases, it will most likely be
repeated hundreds/thousands/millions of times more.
> Because pretty much anything on an Amiga that takes more than a
> second or two (the only time Amithlon or UAE-JIT will provide
> some small benefit over old UAE Interpretive;
More like 0.001 seconds before the JIT beats the Interpretive.
> all other times
> it's slower) is PPC-compiled these days, and Amithlon/UAE can't
> run any of it.
Usually there are 68k versions and PPC versions. You can run the PPC
versions, and we will run the 68k versions. As I said, Imagine 4. Render
the Camaro at 1024x768 and post the time it took.
Graham
> And it is NOT 020 speed (what IS 020 speed, at what clock?)
Even old non-JIT WinUAE ran at about 040 speeds on my Cel450.
More precisely, it "felt" a tad slower than my A4K/040/CV64,
and benched anywhere from a 25MHz 030 to a 40MHz 040 in
all the apps I used and tested.
Regards...
>Nonsense. Here's a simple test for you. On the back of your Amithlon
>box, what does the second line (white letters) say? It's basic Amithlon
>knowledge.
David, Steve does *not* have an Amithlon box. That's the crux of the
whole story. He's making it all up on the basis of his own misguided
"understanding" of how the theory works, mixed with some lame
experience of running WinUAE on a very old and hopelessly broken
laptop.
--
Bill Hoggett
>[wets pants], sorry, waiting for July 1st is like waiting for Xmas :)
Hey! Cut it out. I didn't even hint at anything that time... :)
--
Bill Hoggett
Sorry, it was your mention of x86 native; I just started to think about what
this NG will be like when benchmarks are dragged into the fray once x86
native versions of apps are available :)
Couldn't you just release it today? :) July 1st is a long time from now.
What'll we do this weekend to pass the time?
I figured that much. Seems like a stupid way to test something. So Steve
really has no qualification at all to even enter into the discussion,
except maybe for pure entertainment value.
Ford Mustang Cobra Rs are slow... I saw one stopped at a gas station
once.
> The x86 chip natively isn't 30x faster than the venerable 060
> (faster MHz/MHz than a relatively crude, x86 Athlon) running
> alone natively. That's before the 75% hit for the heinously
> dirty compile.
BWA-HAHAHAHAHAHAHAHA... Oh shit! I just shot Pepsi out my nose!!!!
Best (and most painful) laugh I've had all day!
Even IF your statement were even /remotely/ correct, how in hell can you
POSSIBLY make such a MHz-to-MHz comparison? Last time I checked, the 68060
topped off at 66Mhz, whereas the Athlon is up to - what, 2100MHz or so?
Don't bother replying. I really can't afford to shoot more soda at my
laptop via my nose anymore.
--
Brian Team *AMIGA*
Actually, the Athlon XP 2100+ runs at 1733 MHz.
http://www6.tomshardware.com/cpu/02q2/020506/p4b-04.html
//Gunnar
> Actually, the Athlon XP 2100+ runs at 1733 MHz.
Yeah, I know; but it's good enough for this argument.
--
Brian Team *AMIGA*
Damn straight. Camaro SS or Pontiac T/A WS6 will own one every time
:-)
The highest-clocked Athlon XP is 1.8GHz
I can't wait to see what Steve has to say about AMD's Opteron processors,
since they are showing signs of being a good 50% faster per clock than the
AthlonXP is......
-JB
Sure about that?
http://www.amdmb.com/article-display.php?ArticleID=176&PageID=2
"AMD is estimating that the Opteron processor will be
20-25% faster than the Athlon processor. 20% of this
difference will come from the on-chip low latency memory
controller and the remaining 5% will come from core
changes."
Regards...
> http://www.amdmb.com/article-display.php?ArticleID=176&PageID=2
>
> "AMD is estimating that the Opteron processor will be
> 20-25% faster than the Athlon processor. 20% of this
> difference will come from the on-chip low latency memory
> controller and the remaining 5% will come from core
> changes."
Linux people are getting a further 20% performance increase by going from
32-bit to 64-bit code as well; the extra registers in 64-bit mode are
coming in handy here. In 64-bit mode I expect the IPC to be 35-45% higher
than the standard Athlon. The Hammer should be running at 2GHz+ as well.
The extra registers would probably enable Amithlon to become even more
efficient as well.
Graham
And available in multiprocessor configurations of 8 CPUs!!
> > >[wets pants], sorry, waiting for July 1st is like waiting for Xmas :)
> >
> > Hey! Cut it out. I didn't even hint at anything that time... :)
> Couldn't you just release it today? :) July 1st is a long time from now.
> What'll we do this weekend to pass the time?
We could go to Bernies and camp in his driveway :)
Too bad GM has announced end-of-line for both :'(
My goodness, what WILL they think of next? A whole EIGHT CPUs in a
machine?
Blimey!
Puts the 106-CPU SunFire 15000 to shame!
:-P
(Sorry, couldn't resist :) )
--
In the beginning, there was nothing.
And God said "Let there be light" and there was light.
There was still nothing, but you could see it better.
Think that might be a long trip? Would we make it there in time?
<Homer> mmm.... N-U-M-A.. mmmmm... </Homer>
I think a dual Clawhammer rig will do me just fine though :)
-JB
> >
> > And available in multiprocessor configurations of 8 CPUs!!
>
> My goodness, what WILL they think of next? A whole EIGHT CPUs in a
> machine?
>
> Puts the 106-CPU SunFire 15000 to shame!
Yes, it does. Ten or even five of those 8-CPU Opteron boxes will outperform
that Sun easily, and for less money.
> > Ford Mustang Cobra Rs are slow... I saw one stopped at a gas station
> > once.
>
> Damn straight. Camaro SS or Pontiac T/A WS6 will own one every time
The Swedish Koenigsegg will beat 'em all. :)
http://www.koenigsegg.se/index.asp
STEVE!!! we need a RIDE!!!
Whoooops... :) well, at these speeds, 20% is a pretty big gain in
performance, no?
>
>
> Regards...
>
Oh gawd, this is going to be SOOO OT! But my feathers be a-ruffled.
You DON'T want to start me on those yahaaay-too-heavy hunks of iron
called Camaro or Firebird. Note the R after that Cobra name: Z28, SS,
TA, BAH! It will eat your lunch any time (blow your doors off). This is
one bad mofo, 300 made for 2002, very fast. Chevy sux :P~~~
;-)
>
> :-)
>
>
Fine with me, they suck anyway.
People don't buy Sun hardware _just_ for the raw performance.
(If raw performance were the only deciding factor, everywhere would be using
IBM POWER4's for the big tasks)
-JB
> > Because pretty much anything on an Amiga that takes more than a
> > second or two (the only time Amithlon or UAE-JIT will provide
> > some small benefit over old UAE Interpretive; all other times
> > it's slower) is PPC compiled these days, and Amithlon/UAE can't
> > run any of it.
>
> FYI Steve, 70 instructions x five doesn't take "seconds" on an Athlon
> at > 1GHz
That's 70:1, the inefficiency of simple UAE interpretive. More
like 100:1, since again Meyer "forgot" about the JIT supervisor
code sucking CPU all the time, which makes both UAE-JIT and
Amithlon slower than simple UAE interpretive before 5 unchanged
loops occur and become candidates for a recompile with the (his
number) 70:1 inefficiency ratio. That's before anything is even
run.
--
-Steve
> "SG" <s...@erols.com> wrote in news:3D1B9832.M...@erols.com:
> > Bjørnar Bolsøy <bbo...@hotmail.nospam> wrote:
>
> >> >> Here is my Cel640:
> >> >>
> >> >> RC5 762 Kkeys
> >> >> OGR 1.1 Mnodes
> >> >>
> >> >> Don't you think it's a bit odd that I'm barely twice your
> >> >> MHz, but still 5 times faster?
> >> >
> >> > I should point out that these are WinUAE figures.
> >>
> >> Oh, and it starts -immediately-, Steve, no waiting. :)
> >
> > You already posted that it took twice as long as my UAE-JIT
> > machine (since tossed for unusable instability) to start the
> > tasks, making it 140x slower than an 060 working alone.
>
> No that's not true anymore. I don't know if it's the new
> WinUAE or distributed client, or if perhaps it was all a
> coincidence, but there is no waiting here - it starts
> crunching immediately.
It's still a little over 70x slower than a CS-PPC 060 here using
the same client and prefs configs exactly (which is still 2x
faster than your initial claim).
All your claims magically get orders of magnitude faster once
you realize the performance with which you were once amazed is
actually quite pathetic, have you noticed this too?
--
-Steve
> "Eldor C. Luedtke Jr." <tr...@charter.net> wrote in news:3D1BB731.MD-
> 1.4.4...@charter.net:
>
> > And it is NOT 020 speed (what IS 020 speed, at what clock?)
>
> Even old non-JIT WinUAE ran at roughly 040 speed on my Cel450.
> More precisely, it "felt" a tad slower than my A4K/040/CV64,
> and benched anywhere from a 25MHz 030 to a 40MHz 040 in
> all the apps I used and tested.
The posted benchmarks show 020 speed.
--
-Steve
> SG wrote:
>
> > Statistically nonsense, perhaps. Just because a block is
> > iterated five times doesn't mean it will take seconds to
> > complete,
>
> If your Amiga is taking seconds to complete 5 - 10 native instructions, then
> I suggest you buy a new computer.
UAE-JIT takes 50-100x longer either way.
> > and thus benefit, even over super slow UAE
> > interpretive (which we now know is what UAE-JIT/Amithlon is 99%
> > of the time, only slower).
>
> More like 2% of the time. You don't know shit about compilers, programming
> or stuff like that (you have proved it many times before) - code gets run
> many times. It will get JIT compiled, stored in the JIT cache, and then
> never needs to be JIT compiled again.
Until it gets dumped from the cache, but who cares anyway unless
you never do anything new.
> > Right, that's 25% the speed of similar tasks compiled for the same
> > x86 chip from the beginning.
>
> As I wrote, not bad for recompiling from another incompatible native
> instruction set.
Unfortunately, it removes any advantage x86 might provide over
even a 5 year old CS-PPC, in the cases where JIT might have
otherwise helped.
> > The JIT
> > compilation stage is excruciatingly slow compared to simple,
> > already very slow UAE interpretive (70:1 slower according to
> > Meyer, though more like 100:1 when you figure in how much
> > slower Amithlon is than UAE interpretive due to the required
> > additional supervisor code).
>
> So slow it takes under 70 repeats of that basic block before you are ahead
> of the interpretive UAE,
It hasn't even started running yet. And that's versus complete
speed-joke UAE.
And remember, AOS can't see or time the Linux-driven compilation
time; it's like all that time never happened. Intentionally
fudged speed marks in the few cases JIT might actually help.
> > all other times
> > it's slower) is PPC compiled these days, and Amithlon/UAE can't
> > run any of it.
>
> Usually there are 68k versions and PPC versions. You can run the PPC
> versions, and we will run the 68k versions. As I said, Imagine 4. Render
> the camaro at 1024x768 and post the time it took.
Imagine 4 isn't Imagine 6 features-wise.
And why not use an Imagine2 x86 and go 5x faster than Amithlon?
It makes no sense. Clearly 604e speed (amazingly fast at 10x 060
speed, in general) is so magical because the 060 is virtually
untasked, allowing the multitasking Amiga experience to remain
millions of times faster than WinXP is, even unloaded on a cutting
edge $2000 machine (AOS resulting in 1-3 microsecond reaction
times for any/all UI interaction regardless of CPU load).
Any OS/machine can idly render something, even MacOS.
--
-Steve
Didn't mean for it to get off topic....
Ford Mustang Cobra Rs are slow... I saw one stopped at a gas station
once = Amithlon is slow because I ran UAE-JIT on a 266MHz laptop.
Really...he could have us there in no time. :)
> "James Boswell" <jamesb...@btopenworld.com> wrote in
> news:afgkud$eh3$1...@helle.btinternet.com:
>
> >> Because pretty much anything on an Amiga that takes more than a
> >> second or two (the only time Amithlon or UAE-JIT will provide
> >> some small benefit over old UAE Interpretive; all other times
> >> it's slower) is PPC compiled these days, and Amithlon/UAE can't
> >> run any of it.
> >
> > FYI steve, 70 instructions x five doesn't take "seconds" on an Athlon
> > at > 1Ghz
> >
> > 70 instructions haven't taken "seconds" to execute on any computer
> > hardware for quite some time...
>
> The ENIAC did 5000 additions, or 357 multiplications, per
> second. No? :)
18,000 instructions per 1 68K instruction, i.e. 70:1 worse than
simple UAE. That's before anything is actually executed, just
while running under Linux in the case of JIT x86 code generation
(not execution) mentioned above. According to Meyer.
--
-Steve
> On 27 Jun 2002 23:19:36 -0600, "David S Lund" <dsl...@charter.net>
> wrote:
>
> >Nonsense. Here's a simple test for you. On the back of your Amithlon
> >box, what does the second line (white letters) say? It's basic Amithlon
> >knowledge.
>
> David, Steve does *not* have an Amithlon box. That's the crux of the
> whole story. He's making it all up on the basis of his own misguided
> "understanding" of how the theory works, mixed with some lame
> experience of running WinUAE on a very old and hopelessly broken
> laptop.
WinUAE-JIT is horrifically slow, in general. Amithlon/UAE-JIT
are actually slower than even old UAE interpretive (assuming
equally dysfunctional Amiga compatibility) most of the time.
--
-Steve
I'm sure that explains why WinUAE-Jit on an Athlon 1GHz boots like this:
flash, flash, complete desktop in about 5 seconds. What kind of system
did you do all of your testing on?
The same thing happened to you...that's why you have a CSPPC. What are
you gonna get next?
Faster than an A4000/40 is horrifically slow?
http://217.35.13.164/winuae/sysinfo1600mhz-nojit.png
-JB
> Bill Hoggett <bill_h...@lineone.net> wrote:
>
> > On 27 Jun 2002 23:19:36 -0600, "David S Lund" <dsl...@charter.net>
> > wrote:
> >
> > >Nonsense. Here's a simple test for you. On the back of your Amithlon
> > >box, what does the second line (white letters) say? It's basic Amithlon
> > >knowledge.
> >
> > David, Steve does *not* have an Amithlon box. That's the crux of the
> > whole story. He's making it all up on the basis of his own misguided
> > "understanding" of how the theory works, mixed with some lame
> > experience of running WinUAE on a very old and hopelessly broken
> > laptop.
>
> WinUAE-JIT is horrifically slow, in general. Amithlon/UAE-JIT
Only on your test system.
> are actually slower than even old UAE interpretive (assuming
You have not tested it.
> equally dysfunctional Amiga compatibility) most of the time.
But you've never run Amithlon, correct?
ROTFLMAO!!!!!!
Yeah, right.
I just love it when people spout such nonsense without having a clue
what they are talking about!
Hint - think very carefully for a moment about the sort of applications
which require a 106-way machine with 288GB of physical RAM and you'll
begin to realise just how wrong your statement really is.
>> > You already posted that it took twice as long as my UAE-JIT
>> > machine (since tossed for unusable instability) to start the
>> > tasks, making it 140x slower than an 060 working alone.
>>
>> No that's not true anymore. I don't know if it's the new
>> WinUAE or distributed client, or if perhaps it was all a
>> coincidence, but there is no waiting here - it starts
>> crunching immediately.
>
> Its still a little over 70x slower than a CS-PPC 060 here using
> the same client and prefs configs exactly (which is still 2x
> faster than your initial claim).
Actually it was my setup which was twice faster than yours,
and I have no idea where you get that 70x figure from,
you claimed 30x last time:
<snip>
From: SG (s...@erols.com)
Subject: Re: UAE-JIT vs CS-PPC update posted
Date: 2002-01-27 21:40:10 PST
Bjørnar Bolsøy <bbo...@hotmail.nospam> wrote:
> "SG" <s...@erols.com> wrote in news:3C53CC95.M...@erols.com:
> > how long does rc5 take to render the 7 lines of CLI
> > text before the crunching starts using the latest client?
>
> Hm.. You mean from when I write "rc5" and hit return? Just
> about three seconds before it starts crunching.
That's about 30 times slower than actual; it should be virtually
instant. It takes my UAE-JIT 7 secs, so our systems are right
on par for MHz.
</snip>
> All your claims magically get orders
Look at the above and tell me everything is in order.
> of magnitude faster once
> you realize the performance with which you were once amazed is
> actually quite pathetic, have you noticed this too?
That stuff doesn't work on me, Steve, though I could tell you
that it would be nice if you didn't always back away from
participating in our reproducible tests. Or better, if
you created some yourself.
Regards...
You mean you want to go on record claiming 18,000/70 = 1 ?
:^)
Regards...
>> Erm, Steve - if a basic block is executed 5 times, then
>> statistically it will be run many many more times.
>Statistically nonsense, perhaps. Just because a block is
>iterated five times doesn't mean it will take seconds to
>complete,
Steve, apart from the "seconds to complete" nonsense --- do you have
*any* data to suggest that you have the slightest clue about the
distribution of number-of-executions of blocks?
I can't provide any hard numbers here, because it is almost 2 years ago
that I collected and examined that data --- but I *have* made those
measurements; I see no indication whatsoever that you'd even know how
to make them.
So, when I state "a block that is executed 5 times is likely to be executed
many more times", I know what I am talking about. Please also note
the use of the verb "executed", as opposed to "iterated".
>Not identically. That's why there is a 5x rule; if JIT happened
>every time code was repeated, the emulation would crawl.
Nope, if JIT happened the very first time a bit of code is executed (as
might be the case in Petunia --- no mention is made of a fallback
interpretative emulation in the project description), then you end
up translating a lot of startup code which really never gets used
again.
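The "interpret first, translate once a block has run 5 times" policy can be sketched in C as below. This is a minimal illustration under stated assumptions: the threshold of 5 is the figure used in this thread, and Block, interpret_block(), translate_block() and run_translated() are hypothetical stand-ins, not UAE or Amithlon code. It shows why one-off startup code never pays the translation cost while a block that keeps running does.

```c
#include <stdbool.h>

/* Hypothetical sketch of a "translate after N executions" dispatcher.
   JIT_THRESHOLD mirrors the 5x rule discussed in this thread; the Block
   fields and helper functions are illustrative, not UAE's actual code. */
#define JIT_THRESHOLD 5

typedef struct {
    unsigned exec_count;   /* interpreted executions seen so far */
    bool     compiled;     /* translated to host code yet? */
    unsigned interp_runs;  /* executions handled by the interpreter */
    unsigned native_runs;  /* executions handled by translated code */
} Block;

/* Stand-ins: a real emulator would decode 68k instructions here,
   emit x86 code there, and jump into the emitted code. */
static void interpret_block(Block *b) { b->interp_runs++; }
static void translate_block(Block *b) { b->compiled = true; }
static void run_translated(Block *b)  { b->native_runs++; }

/* Run one execution of a block. Cold blocks (e.g. one-off startup code)
   stay in the interpreter and never pay the translation cost; a block
   is translated once, right after its JIT_THRESHOLD-th interpreted run. */
void execute_block(Block *b)
{
    if (b->compiled) {
        run_translated(b);
        return;
    }
    interpret_block(b);
    if (++b->exec_count >= JIT_THRESHOLD)
        translate_block(b);
}
```

Calling execute_block() 100 times on a fresh Block leaves 5 interpreted runs and 95 translated runs; a block run only 3 times is never translated at all.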
Bernie
--
Any fool can make a rule, and any fool will mind it
Henry David Thoreau
American writer, 1817-62
>> The ENIAC did 5000 additions, or 357 multiplications, per
>> second. No? :)
>18,000 instructions per 1 68K instruction.
18,000 *host cycles* to compile the average 68k *block* (i.e. 4-5
instructions).
>i.e. 70:1 worse than simple UAE.
i.e. 57 times more time taken to compile the average 68k instruction into
x86 code than to execute it using the interpretative emulation.
Of course, compilation happens only *once*, whereas interpretative
emulation needs 70 cycles *each time* the instruction is executed.
>Thats before anything is actually executed, just
>while running under Linux in the case of JIT x86 code generation
>(not execution) mentioned above. According to Meyer.
Once compiled, the instruction will execute *much* faster in JIT-compiled
code than in interpretative mode. Say 14 times faster. So each time through,
you make up 1.625% of the compilation cost.
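The arithmetic above can be checked directly. The sketch assumes a block of 4.5 instructions (the midpoint of "4-5") and takes the 70-cycle interpreter cost and the "say 14 times faster" JIT figure from the posts; these are the thread's stated averages, not new measurements, and the function names are illustrative.

```c
/* Averages quoted in this thread, treated here as assumptions:
   ~18,000 host cycles to compile a 68k block of 4-5 instructions,
   70 host cycles per interpreted 68k instruction, and JIT-compiled
   code running roughly 14 times faster than the interpreter. */
#define COMPILE_CYCLES_PER_BLOCK 18000.0
#define INSNS_PER_BLOCK          4.5   /* midpoint of "4-5": an assumption */
#define INTERP_CYCLES            70.0
#define JIT_SPEEDUP              14.0

/* One-off compile cost attributed to a single 68k instruction. */
double compile_cycles_per_insn(void)
{
    return COMPILE_CYCLES_PER_BLOCK / INSNS_PER_BLOCK;    /* 4000 */
}

/* That compile cost as a multiple of one interpreted execution. */
double compile_vs_interp(void)
{
    return compile_cycles_per_insn() / INTERP_CYCLES;     /* about 57 */
}

/* Share of the compile cost won back on each later execution that runs
   the compiled code instead of the interpreter. */
double recovered_per_run(void)
{
    double jit_cycles = INTERP_CYCLES / JIT_SPEEDUP;       /* 5 */
    return (INTERP_CYCLES - jit_cycles) / compile_cycles_per_insn();
}
```

With those inputs the compile cost comes to 4000 cycles per instruction, about 57 interpreted executions' worth, and each compiled execution saves 65 cycles, i.e. 1.625% of that cost.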
Bernie
--
I never vote. It only encourages them
Elderly American lady quoted by comedian Jack Parr
>> So slow it takes under 70 repeats of that basic block before you are ahead
>> of the interpretive UAE,
>It hasn't even started running yet. And thats versus complete
>speed-joke UAE.
OK, once again for you, Steve. Please pay attention this time:
Execution times:
* average instruction using non-JIT emulation: 70 host cycles
* average instruction using JIT-compiled code: 5 host cycles[1]
* JIT-compiling average instruction: 4000 host cycles
Let n be the number of times an instruction is executed. Then
Not Using JIT:
Total time: n*70 host cycles
Using JIT, n<5:
Total time: n*70 host cycles
Using JIT, n>=5:
Time spent in interpretative emulation: 70*5= 350 host cycles
Time spent JIT-compiling: 4000 host cycles
Time spent running JIT-compiled code: (n-5)*5 host cycles
=========
Total time 4325+n*5 host cycles
Thus, the break-even point is reached when
4325+n*5 = n*70, i.e.
n=4325/65 -> n=66.54
[1] This is almost certainly high, but it is very hard to measure it without
the act of measurement completely screwing up the result.
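The break-even derivation above translates directly into code. The constants are the stated averages (the 5-cycle JIT figure is, per the footnote, probably high); the function names are illustrative. Searching for the first n where the JIT total is strictly smaller gives 67, the integer just above n=66.54.

```c
/* Cost model from the post above, in host cycles (averages, not
   measurements): interpreted instruction 70, JIT-compiled instruction 5,
   compiling 4000, and translation after 5 interpreted executions. */
enum { INTERP = 70, JITTED = 5, COMPILE = 4000, THRESHOLD = 5 };

/* Total host cycles for n executions without a JIT. */
long cycles_interp(long n)
{
    return n * INTERP;
}

/* Total host cycles for n executions under the JIT model. */
long cycles_jit(long n)
{
    if (n < THRESHOLD)
        return n * INTERP;                 /* never reached the threshold */
    /* 5 interpreted runs + one compile + (n-5) compiled runs = 4325 + 5n */
    return THRESHOLD * INTERP + COMPILE + (n - THRESHOLD) * JITTED;
}

/* Smallest n for which the JIT path is strictly cheaper. */
long break_even(void)
{
    long n = 1;
    while (cycles_jit(n) >= cycles_interp(n))
        n++;
    return n;
}
```

At n=66 the interpreter still wins (4620 vs 4655 cycles); at n=67 the JIT path is ahead (4660 vs 4690), and the gap grows by 65 cycles per further execution.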
>And remember, AOS can't see or time the Linux driven compilation
>time, its like all that time never happened.
You mean the processor cycle counter is stopped during compilation? Because,
you know, all time in Amithlon is based on the processor cycle counter...
How is the processor cycle counter stopped?
Bernie
--
Computers are useless.
They can only give you answers.
Pablo Picasso
>Almost everything I do is PPC, and has been for a long time
>already:
> * GUI gadget and background rendering,
Hmmm --- background and gadget rendering. Let's see, that would be something
like (simplified):
for (row=top;row<=bottom;row++)
    for (col=left;col<=right;col++) {
        pixval=decode(background);
        draw_to_screen(row,col,pixval);
    }
>Image processing;
Applying a generalized (2*s+1)x(2*s+1) mask to an image:
for (row=top;row<=bottom;row++)
    for (col=left;col<=right;col++) {
        pixval=0;
        for (dx=-s;dx<=s;dx++)
            for (dy=-s;dy<=s;dy++)
                if (valid(row+dy,col+dx))
                    pixval+=source[row+dy][col+dx];
        dest[row][col]=pixval;
    }
>rendering,
Raytracing an image:
for (row=top;row<=bottom;row++)
    for (col=left;col<=right;col++) {
        pixval=follow_ray(camera_x,camera_y,camera_z,
                          dir_x(row,col), dir_y(row,col), dir_z(row,col));
        rendered[row][col]=pixval;
    }
>all players and viewers,
Playing an MP3:
for (block=0;block<NumBlocks;block++) {
    read_block(file,buffer1);
    decode_huffman(buffer1,buffer2);
    decode_audio(buffer2,buffer3);
    queue_buffer(buffer3);
}
with decode_audio(in,out) looking something like this:
{
    [...]
    for (i=0;i<SizeBuffer;i++)
        out[i]=decode_sample(in,i);
}
Playing an animation:
for (frame=0;frame<NumFrames;frame++) {
    read_framedata(file,buffer1);
    decode_huffman(buffer1,buffer2);
    decode_frame(buffer2,buffer3);
    for (row=top;row<=bottom;row++)
        for (col=left;col<=right;col++)
            draw_pixel(row,col,buffer3[row][col]);
}
>much of the gfx subsystem,
What part of the gfx subsystem is PPC on your machine?
>internet image and plugin rendering
Didn't we cover image decoding already? See "background" and "gadget"
rendering.
Have a look at the above stylized code --- and note what all the examples
have in common.
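What the stylized loops share is that their innermost body runs once per pixel, sample or mask tap. The sketch below counts those executions and compares them with the roughly 67-run break-even point from the 70/5/4000-cycle cost model given earlier in the thread. The image sizes, mask radius and buffer sizes passed in are example inputs, not figures from the post, and the function names are hypothetical.

```c
/* Break-even run count from the 70/5/4000-cycle model in this thread. */
#define BREAK_EVEN_RUNS 67

/* Background/gadget rendering: one decode+draw per pixel. */
long background_runs(long width, long height)
{
    return width * height;
}

/* (2*s+1)x(2*s+1) mask: the valid()/accumulate step runs once per tap
   for every pixel. */
long mask_runs(long width, long height, long s)
{
    long side = 2 * s + 1;
    return width * height * side * side;
}

/* Audio playback: decode_sample() runs once per sample in every block. */
long audio_runs(long num_blocks, long samples_per_block)
{
    return num_blocks * samples_per_block;
}

/* Does a loop body run often enough for JIT translation to pay off? */
int pays_off(long runs)
{
    return runs >= BREAK_EVEN_RUNS;
}
```

For a 640x480 image, the background loop body runs 307,200 times and a 3x3 mask (s=1) 2,764,800 times, both far past 67.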
Bernie
--
Everyone is quick to blame the alien
Aeschylus
Greek tragedian, 525-456BC
>We could go to Bernies and camp in his driveway :)
The landlord has threatened to kick the other tenants out for parking their
(rather nice :) cars there. I doubt he'd appreciate a campout :)
Bernie "And it's the middle of winter, with hailstorms!" Meyer
--
Wherever you have an efficient government you have a dictatorship
Harry S. Truman
US President 1945-53
Quake3? right? :)
Hell, I can't even figure out how to max out my 1 CPU and RAM, life is
grand :)
Bet it would sure play Everquest *REAL* good! =:)
--
Brian Team *AMIGA*
*grumble*
:-)
> Hell, I can't even figure out how to max out my 1 CPU and RAM, life is
> grand :)
Heh!
I can still remember being blown away by the speed increase when I
added an 020 to my A500!
Not to mention upgrading my A4k to an 060.
Ahhh, those were the days!
Most benchmarks show Amithlon whipping ass all over a real amiga
(060/604e).
Thanks to Bjornar for doing those tests.
> -Steve
Mad Dog
You have a very selective memory. Let's take a look at some
of the tests i posted, and let's keep in mind that you were,
infact, part of these discussions.
http://groups.google.com/groups?q=XPKCrunch++author:bbolsoey%40rl.telia.no.nospam&hl=en&lr=&ie=UTF-8&selm=Xns9071687D7bbolsoey%40195.70.164.134&rnum=1
<snip>
From: Bjørnar Bolsøy (bbol...@rl.telia.no.nospam)
Subject: Re: Amiga Inc's plan to kill the Amiga
Newsgroups: comp.sys.amiga.advocacy
Date: 2001-03-27 00:12:05 PST
<snip>
Some of this was covered in several posts by myself, Bernd
and others last fall. A brief recap (snipping and cutting
because some of the original posts are not on Google yet):
XPKCrunch:
A4000/060/50 5 second
UAE-P3/733 10.4 seconds
LHATest:
Amiga060/50 0.21 seconds
UAE-P3/733 0.50 seconds
IBrowse boom.html:
Amiga060/66 - 7 seconds (or was it 21?)
Cel460 - 31 seconds
MPEGa with no skipping:
Amiga040/40 High quality, stereo, 22Khz (AmigaAmp)
Amiga040/25 High quality, mono, 22Khz
Amiga030/50 Low quality, mono, 11Khz
Athlon800 High quality, stereo, 22Khz
Cel450 High quality, mono, 11Khz
Benoit fractal:
Amiga060/66 - 2.6 seconds
Athlon800 - 7.6 seconds
Cel450 - 16 seconds
RC5:
Amiga040/25 - 21kkeys
Amiga030/50 - 14kkeys
Cel450 - 9kkeys
Here's a pretty extensive comparison on Doom/Quake I did
back in October. The purpose was really to show Steve that
the presence of a gfx card in itself has little or no
real impact on performance (Steve later discarded all this
based on his own figures, which show a different picture than
-all- these Amigas, but he didn't post his config, client info or
benchmark setup when I asked him, which invalidates any claims
anyway).
ADoom1.2, 320x200
Cel875/WinUAE
P96 16.3 fps
AGA 13.8 fps
ECS/EHB 11.7 fps
A1200/030-50
AGA 9.72 fps (ADoom1.1)
A1200/040-25
AGA 8.16 fps
Cgx/CV64 7.46 fps
A1200/040-50
AGA 19.6 fps (ADoom0.8)
A1200/060-50
AGA 24.62 fps (ADoom0.9)
A4000
PPC060-50/PIV 33.28 fps
CS060-50/CV64 27.78 fps
Now watch this, 640x480:
Cel875/UAE/AGA 3.0 fps
Cel875/UAE/P96 4.5 fps
A4000/PPC060-50/PIV 8.85 fps (ADoom1.1)
A4000/PPC604-200/PIV 12.49 fps (ADoomPPC1.3)
A1200/PPC603-200
AGA, 608x456 7.83 fps (ADoomPPC1.3)
CGX/CV64 4.86 fps (ADoomPPC1.3)
Looks like I'm just beating the 200MHz PPC/1200/CV64
in hires gfx card mode though obviously it has me beat
running AGA.
Bottom line: I'm sure there are apps which will perform
worse than these given the right (or wrong) circumstances,
but that really defeats any generalization on the subject
-- there is simply no doubt that people are running a lot
of stuff at Amiga040 speeds, including myself.
Stuff that most people use, not curiosity apps or pointless
benchmark utilities.
Regards...
</snip>
Regards...
> 18,000 instructions per 1 68K instruction. i.e. 70:1 worse than
> simple UAE. That's before anything is actually executed, just
> while running under Linux in the case of JIT x86 code generation
> (not execution) mentioned above. According to Meyer.
The selective snipping comes into play again.
That is 18,000 instructions done once to turn that 68k instruction into an
x86 instruction or two, and then thereafter it runs the compiled code.
That is the whole point of JIT's you dumbwit n00b, a small hit at the
beginning that you barely notice that gets you vastly faster results at the
end of it.
If Amithlon was as you say it is, a 1.8GHz Athlon would execute 68k code at
the speed of a 100kHz 68k. Clearly it does not, as people are very happy
and reporting extremely fast execution times, so does that not make you
think for a second that you are completely wrong? Totally and utterly
wrong?
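The reductio above is a single division: if every 68k instruction cost 18,000 host cycles on every execution (Steve's reading), a 1.8GHz host would manage only 100,000 instructions per second. The sketch also plugs in the 70-cycle interpreter and 5-cycle JIT averages quoted earlier in the thread; it treats those averages as assumptions and ignores memory stalls and I/O.

```c
/* Host clock from the post above. The per-instruction costs passed in are
   averages quoted elsewhere in the thread, treated as assumptions. */
#define HOST_HZ 1.8e9

/* 68k instructions per second if every one of them cost
   cycles_per_insn host cycles each time it executed. */
double insns_per_second(double cycles_per_insn)
{
    return HOST_HZ / cycles_per_insn;
}
```

That gives about 25.7 million instructions per second for the interpreter (70 cycles) and 360 million for compiled code (5 cycles), versus 100,000 under the compile-on-every-execution reading (18,000 cycles).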
Graham
> UAE-JIT takes 50-100x longer eitherway.
Ah, SG changes his argument once he loses it.
> Until it gets dumped from the cache, but who cares anyway unless
> you never do anything new.
When the cache can be in the 100's of MB, there will be very few times that
it is dumped from the cache.
> Unfortunately, it removes any advantage x86 might provide over
> even a 5 year old CS-PPC, in the cases where JIT might have
> otherwise helped.
When you have a 1.8GHz x86 processor put up against a 66MHz '060 I think we
can see what the winner will be even given YOUR claim that the efficiency
is only 25% on the x86. And given the vastly superior I/O on current x86
hardware (i.e., not your broken misconfigured piece-of-junk laptop) there
is only one clear winner, and all users of Amithlon (i.e., NOT YOU) agree
that winner is Amithlon by a clear mile.
So why do you continue to argue? You are wrong, accept it and move on.
>> So slow it takes under 70 repeats of that basic block before you are
>> ahead of the interpretive UAE,
>
> It hasn't even started running yet. And thats versus complete
> speed-joke UAE.
Actually it (the basic block in question) will have run 5 times at
interpretive speeds first, and then comes a delay so small that you won't
notice it whilst it is JIT compiled, and then a further 70 repeats of that
basic block (which will happen 99.99% of the time for basic blocks that are
run 5 times anyway) make the code run faster overall than the interpretive
version. 200 repeats and the interpretive version is left in the dust. 1
million repeats (still likely in 99% of the cases) and the JIT version has
lapped the interpretive version about 500000 times it is that much faster.
> And remember, AOS can't see or time the Linux driven compilation
> time, its like all that time never happened. Intentionally
> fudged speed marks in the few cases JIT might actually help.
I fail to see how a benchmark like rendering a scene in Imagine or
performing effects in ImageFX or whatever are "fudged speed marks"... care
to explain?
> Imagine 4 isn't Imagine 6 features-wise.
Yah, so? We aren't arguing features, we are arguing rendering times.
> And why not use a Imagine2 x86 and go 5x faster than Amithlon?
Ha, okay. I will simply use a PC and get my work done even faster than with
Amithlon, which in itself lets me get my work done many many times faster
than a 66MHz '060. You are the master of arguments.
Graham
They're almost all 020 speed. The only SysSpeed benchmark that
isn't 020 speed is CPU, but only the 1% of the time UAE
interpretive isn't in use, and even then the PPC/060 combo is
every bit as fast, and can actually multitask at full speed.
I really can't imagine an Amiga user switching to a $2000 Linux
machine for one untouchable task at a time being slower than 5
year old CS-PPC speed 1% of the time, and 100s of times slower
the rest of the time--with very poor compatibility at that.
--
-Steve
> "SG" <s...@erols.com> writes:
> >Not identically. That why there is a 5x rule, if JIT happened
> >everythime code was repeated, the emulation would crawl.
>
> Nope, if JIT happened the very first time a bit of code is executed (as
> might be the case in Petunia --- no mention is made of a fallback
> interpretative emulation in the project description), then you end
> up translating a lot of startup code which really never gets used
> again.
That's what I just said: it would crawl. 70:1 slower than
super slow interpretive UAE before anything even starts
executing, and for nothing 99% of the time.
--
-Steve
> "SG" <s...@erols.com> wrote in news:3D1CEB25.M...@erols.com:
> >> The ENIAC did 5000 additions, or 357 multiplications, per
> >> second. No? :)
> >
> > 18,000 instructions per 1 68K instruction. i.e. 70:1 worse than
> > simple UAE.
>
> You mean you want to go on record claiming 18,000/70 = 1 ?
>
> :^)
Glad to see the smiley to acknowledge you had no idea.
--
-Steve
Sounds kind of stupid, doesn't it? It is because you are quite wrong.
Please list the name of an Amithlon user that agrees with your statement
here. Surely you could find just one, especially if it performs like you
say.