On 26.09.1996 jxsa...@cc.Helsinki.FI (Jyrki O Saarinen)
wrote in Message "Different c2p routines tested":
> I tested some fast c2p routines. PAL: Low-res screen, 256 colours. This
> is important.. My CPU is 68040/40 and A4000. Here are my results:
> Name                    Count   Mid. time (ms)
> ----------------------------------------------
> Ludde_CPU2_BLIT2:          66   10.90
> Kalms_CPU3_BLIT1:          66   10.90
> Kalms_CPU4_BLIT0:          66   10.97
> Peter_CPU4_BLIT0_020:      66   16.33
> Peter_CPU4_BLIT0_040:      66   11.88
> Copy_MOVEM_MOVEM:          66   10.92
> Copy_MOVEM_MOVE:           66   10.91
> Copy_MOVE_MOVE:            66   10.90
> Copy_PLUSPLUS:             66   10.90
I know that this discussion is rather old, and I don't know if anybody else
posted these results here before (as my news-server is not very reliable),
but here are the IMHO interesting results for a 68020 with 4MB FAST-RAM:
Name                    Count   Mid. time (ms)
----------------------------------------------
Ludde_CPU2_BLIT2:          66   27.17
Kalms_CPU3_BLIT1:          66   27.27
Kalms_CPU4_BLIT0:          66   46.98
Peter_CPU4_BLIT0_020:      66   56.77
Peter_CPU4_BLIT0_040:      66   81.79
Copy_MOVEM_MOVEM:          66   11.42
Copy_MOVEM_MOVE:           66   11.33
Copy_MOVE_MOVE:            66   11.40
Copy_PLUSPLUS:             66   10.57
Sorry if a '020 user disturbed all those 040/060 discussions ;)
but some recent demos (like The Gate) showed that fast 1x1 routines are
possible on such machines, so it would be nice if some coders would still
try to support lower configuration than 030/50!
Tobias
___
// / gr...@crazy.inka.de / AMIGA - you can't beat the feeling /
___ // / /
\\ // / If mankind is the pride of creation, god must have /
\//_/ used an early beta-version of Windows'95 to create it /
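
To put the two tables in perspective, here is a rough sketch that converts
a time per pass into throughput and into PAL frames. It assumes each pass
handles one full 320x256 screen at 8 bits per pixel (81920 bytes; the
screen mode is only confirmed later in the thread) and a PAL frame of about
20 ms. The helper names are made up for illustration.

```c
#include <assert.h>

/* Assumption: one pass moves one 320x256 screen, 1 byte per pixel. */
enum { SCREEN_BYTES = 320 * 256 };

/* Bytes per second moved when one full screen takes `ms` milliseconds. */
double bytes_per_second(double ms)
{
    assert(ms > 0.0);
    return SCREEN_BYTES / (ms / 1000.0);
}

/* How many 20 ms PAL frames (50 Hz) one pass occupies. */
double pal_frames(double ms)
{
    assert(ms > 0.0);
    return ms / 20.0;
}
```

Under those assumptions, 10.90 ms per pass works out to roughly 7.5 million
bytes per second, and 27.17 ms is about 1.36 PAL frames.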
: I know that this discussion is rather old, and I don't know if anybody else
: posted these results here before (as my news-server is not very reliable),
: but here are the IMHO interesting results for a 68020 with 4MB FAST-RAM:
Ok, great. I don't recall that anyone posted 020/14 results.
: Name Count Mid. time (ms)
: -------------------------------------------
: Ludde_CPU2_BLIT2: 66 27.17
: Kalms_CPU3_BLIT1: 66 27.27
Hmm. It seems that CPU2BLIT2 is obsolete. I don't know if it can be
optimized, but CPU3BLIT1 really seems to be the best all-round routine,
it even works with Blizzard B1260.
: Kalms_CPU4_BLIT0: 66 46.98
: Peter_CPU4_BLIT0_020: 66 56.77
: Peter_CPU4_BLIT0_040: 66 81.79
Interesting. Again CPU-only by Mikael Kalms seems to be much faster than
either of the routines by Peter McGavin, but still on B1260 things are very
strange; c2p_020_FastRam2.s was the fastest.
For a CPU-only solution the routine by Mikael Kalms seems best.
Although why use CPU-only since with CPU3BLIT1 blitting takes only a
little over two frames in 320x256, so even if we render the chunky buffer
fast, CPU passes take about a frame anyway. I hope everyone gets the
point..
Btw, did you use PAL: Low-res screen while testing?
: Copy_MOVEM_MOVEM: 66 11.42
: Copy_MOVEM_MOVE: 66 11.33
: Copy_MOVE_MOVE: 66 11.40
: Copy_PLUSPLUS: 66 10.57
Because of these..
: Sorry if a '020 user disturbed all those 040/060 discussions ;)
: but some recent demos (like The Gate) showed that fast 1x1 routines are
: possible on such machines, so it would be nice if some coders would still
: try to support lower configuration than 030/50!
Sure. Just smaller screen size or something.
--
http://www.helsinki.fi/~jxsaarin/ - University of Helsinki, Computer Science
Jyrki O Saarinen (jxsa...@cc.Helsinki.FI) wrote:
: Gr...@CRAZY.inka.de wrote:
:
: : I know that this discussion is rather old, and I don't know if anybody else
: : posted these results here before (as my news-server is not very reliable),
: : but here are the IMHO interesting results for a 68020 with 4MB FAST-RAM:
:
: Ok, great. I don't recall that anyone posted 020/14 results.
:
: : Name Count Mid. time (ms)
: : -------------------------------------------
: : Ludde_CPU2_BLIT2: 66 27.17
: : Kalms_CPU3_BLIT1: 66 27.27
:
: Hmm. It seems that CPU2BLIT2 is obsolete. I don't know if it can be
: optimized, but CPU3BLIT1 really seems to be the best all-round routine,
: it even works with Blizzard B1260.
:
: : Kalms_CPU4_BLIT0: 66 46.98
: : Peter_CPU4_BLIT0_020: 66 56.77
: : Peter_CPU4_BLIT0_040: 66 81.79
:
: Interesting. Again CPU-only by Mikael Kalms seems to be much faster than
: either of the routines by Peter McGavin, but still on B1260 things are very
: strange; c2p_020_FastRam2.s was the fastest.
Well, yes and no! If you read the "optimizing 680x0 asm" (or something)
article I found on Motorola's homepage, you will discover (like I did),
that code optimized for the 020 will run great on 060 as well. 040
is another story...
- Lasse (Vip/Equinox)
: : Interesting. Again CPU-only by Mikael Kalms seems to be much faster than
: : either of the routines by Peter McGavin, but still on B1260 things are very
: : strange; c2p_020_FastRam2.s was the fastest.
: Well, yes and no! If you read the "optimizing 680x0 asm" (or something)
: article I found on Motorola's homepage, you will discover (like I did),
: that code optimized for the 020 will run great on 060 as well. 040
: is another story...
That is crap.
The reason why some routines behave strangely on B1260 is the card, not
060. Both CPU-only routines by Mikael Kalms and Peter McGavin are free
on other 060 cards and most 040 cards.
Actually, the only routines that are needed are CPU3BLIT1 and CPU5BLIT0.
The program just has to test which one results in a higher frame rate.
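
A minimal sketch of that "test which one is faster" idea, in portable C
rather than Amiga code. The function-pointer type and both helper names are
hypothetical. clock() is only a stand-in: it sees CPU time, so on real
hardware with a blitter-assisted routine one would rather time whole
displayed frames (for example by counting vertical blanks).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*c2p_fn)(void);   /* hypothetical signature of a c2p routine */

/* Average CPU time in milliseconds over `runs` calls, using clock(). */
double time_routine_ms(c2p_fn fn, int runs)
{
    assert(fn != NULL && runs > 0);
    clock_t start = clock();
    for (int i = 0; i < runs; i++)
        fn();
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / runs;
}

/* Index of the smallest time, i.e. the routine with the highest frame rate. */
size_t pick_fastest(const double *ms, size_t count)
{
    assert(ms != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++)
        if (ms[i] < ms[best])
            best = i;
    return best;
}
```

Fed with the 020 figures from the table above (27.27 ms for CPU3BLIT1,
46.98 ms for the CPU-only routine), pick_fastest() would choose the first
entry; on a machine where the CPU-only routine wins it would choose the
second.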
: So Kalms_CPU5_Blit0 is bad code for 060. For the most part it doesn't pair,
: but regardless of that Peter's C2P_020 still performs C2P in two stages,
: writing the mangled chunky back to fast mem and reading it back. Besides,
: on other 060 cards the Kalms one is faster.
It shouldn't matter how "bad" code pairs if those routines are free on a
slower 040 ..
Hmm. What routines should be used to have a minimum amount of them and
have support for all configs..
Jyrki O Saarinen (jxsa...@cc.Helsinki.FI) wrote:
: Lasse Staff Jensen <las...@idb.hist.no> wrote:
:
: : : Interesting. Again CPU-only by Mikael Kalms seems to be much faster than
: : : either of the routines by Peter McGavin, but still on B1260 things are very
: : : strange; c2p_020_FastRam2.s was the fastest.
:
: : Well, yes and no! If you read the "optimizing 680x0 asm" (or something)
: : article I found on Motorola's homepage, you will discover (like I did),
: : that code optimized for the 020 will run great on 060 as well. 040
: : is another story...
:
: That is crap.
Shit, I should have expressed it differently! I wrote yes and no,
because 1) Yes: B1260 is indeed strange 2) No: Code (in general) will
run faster with a 020 optimized routine than a 060 one.
It still doesn't explain the B1260, but at least we are one step closer...
- Lasse (Vip/Equinox)
: Well, yes and no! If you read the "optimizing 680x0 asm" (or something)
: article I found on Motorola's homepage, you will discover (like I did),
: that code optimized for the 020 will run great on 060 as well. 040
: is another story...
So Kalms_CPU5_Blit0 is bad code for 060. For the most part it doesn't pair,
but regardless of that Peter's C2P_020 still performs C2P in two stages,
writing the mangled chunky back to fast mem and reading it back. Besides,
on other 060 cards the Kalms one is faster.
--
D.
well, I guess the "new" merge would make it faster on 020-14 ?
this plus the goodie of having a moduloed last c2p pass would
be the advantage. I read quite a few "partial c2p" requests.
either cpu4blit1 or cpu2blit2 (the one blitter pass can be changed
to allow the other one to write linearly and through a part of the screen).
> optimized, but CPU3BLIT1 really seems to be the best all-round routine,
> it even works with Blizzard B1260.
>
> : Kalms_CPU4_BLIT0: 66 46.98
> : Peter_CPU4_BLIT0_020: 66 56.77
> : Peter_CPU4_BLIT0_040: 66 81.79
>
> Interesting. Again CPU-only by Mikael Kalms seems to be much faster than
> either of the routines by Peter McGavin, but still on B1260 things are very
> strange; c2p_020_FastRam2.s was the fastest.
>
> For a CPU-only solution the routine by Mikael Kalms seems best.
changing it to do partial c2p should be no problem for cpuonly c2p.
> Although why use CPU-only since with CPU3BLIT1 blitting takes only a
> little over two frames in 320x256, so even if we render the chunky buffer
> fast, CPU passes take about a frame anyway. I hope everyone gets the
> point..
it blits quite exactly 2 frames. however if cpu writes, it should drop
to about 2.3 frames. cpu can only write half speed however...
bus conflict! ideal are fx taking 3 frames. if you halve the render part,
you will notice it won't approach the 25fps just like expected. well...
on the 020-14, bus conflict is minor. doing 1 cpupass it drops to
half copyspeed even without dma, so almost no difference if there
is blitter dma. that's why 020-14 rules. forget the last shit
statement please :D
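
The "2 frames versus 2.3 frames" argument can be put into numbers. This is
only a sketch, assuming the display is updated on a vertical blank of a
50 Hz PAL screen, so any fraction of a frame costs a whole one (the thread
does not state how the buffers are swapped). The function names are
hypothetical.

```c
#include <assert.h>

/* Vertical blanks consumed per displayed update when the swap can only
   happen on a vertical blank: 2.0 stays 2, 2.3 becomes 3. */
int vblanks_per_update(double frames_of_work)
{
    assert(frames_of_work > 0.0);
    int whole = (int)frames_of_work;
    return (frames_of_work > (double)whole) ? whole + 1 : whole;
}

/* Resulting update rate on a 50 Hz PAL display. */
double updates_per_second(double frames_of_work)
{
    return 50.0 / vblanks_per_update(frames_of_work);
}
```

With these assumptions, exactly 2 frames of work gives 25 updates per
second, while 2.3 frames already falls to 50/3, about 16.7. That is why an
effect that takes 3 frames anyway loses nothing, and why halving the render
part does not bring the rate back to 25 as long as the total stays above 2
frames.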
oh, and I'd love to see the gate on 020-14.
I did not get from prev post if it runs or if it got
a totally lame senseless "if <030" tester.
the gate showed fastest 1x1 tmap cubes I ever saw (saw it
on a 030-28).
> : Sorry if a '020 user disturbed all those 040/060 discussions ;)
well, there are a lot more 020 than 040.
this sounds like "back to vanilla" ?
no. just don't mess too much with resources.
don't load the whole demo at once.
> : but some recent demos (like The Gate) showed that fast 1x1 routines are
> : possible on such machines, so it would be nice if some coders would still
> : try to support lower configuration than 030/50!
>
> Sure. Just smaller screen size or something.
well, not really. the one user might want 50hz 2x2 on 040, the other
would like it 1x1 even if it's 2fps on 020.
I saw a PC demo that was resolution configurable.
funny, amigademos would need it much more, much more different cpus
in use.
>
> --
> http://www.helsinki.fi/~jxsaarin/ - University of Helsinki, Computer Science
>
-------------------------------------------------------------------------
fisc...@Informatik.TU-Muenchen.DE (Juergen "Rally" Fischer) =:)
http://www.informatik.tu-muenchen.de/~fischerj/
^- dont forget this char
I read this, and it was interesting, although not that useful. Do you know if there's
some table of cycles used for all common instructions in some of the other files?
And what is a .pdf file, and how do you extract/read/print/whatever them? I have a
feeling they are supposed to be printed, but I don't know how :(
/Johan Rönnblom, Team Amiga
jxsaarin%cc.Helsinki.FI@INTERNET wrote :
> That is crap.
>
> The reason why some routines behave strangely on B1260 is the card, not
> 060. Both CPU-only routines by Mikael Kalms and Peter McGavin are free
> on other 060 cards and most 040 cards.
>
> Actually, the only routines that are needed are CPU3BLIT1 and CPU5BLIT0.
> The program just has to test which one results in a higher frame rate.
Well, sadly both behave strangely under my rtgmaster.library :((( Maybe some c2p
specialist could have a look at the code ? It is on Aminet
(gfx/board/rtgmaster.lha, the Kalms c2p are newc2p.asm and cpu5.asm (i was
told not to use the same filenames, as i modified the code)). Try it out with
the "flame" demo in the demos directory of rtgmaster... somehow the display
behaves strangely... :(
Steffen Haeuser
I read about those requests too and I converted the CPU only solution by
Mikael Kalms to have both plane and chunky modulos, so if anybody wants
it just mail me.
--
D.
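
For readers wondering what "plane and chunky modulos" buy you, here is a
deliberately slow reference conversion in plain C. It is not the Kalms
routine, just a sketch of the same interface idea: a modulo is the number
of bytes skipped at the end of each row, so a sub-rectangle of a larger
chunky buffer can be converted into a sub-rectangle of a larger screen.
The function name and parameter layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a width x height block of 8-bit chunky pixels to 8 bitplanes.
   chunky_modulo / plane_modulo: bytes skipped after each source / plane row.
   The leftmost pixel of each group of 8 lands in bit 7 of the plane byte. */
void c2p_modulo(const uint8_t *chunky, size_t chunky_modulo,
                uint8_t *planes[8], size_t plane_modulo,
                size_t width, size_t height)
{
    assert(width % 8 == 0);
    size_t row_bytes = width / 8;

    for (size_t y = 0; y < height; y++) {
        const uint8_t *src = chunky + y * (width + chunky_modulo);
        size_t dst_row = y * (row_bytes + plane_modulo);

        for (size_t xb = 0; xb < row_bytes; xb++) {
            for (int p = 0; p < 8; p++) {
                uint8_t out = 0;
                /* Gather bit p of 8 consecutive pixels into one byte. */
                for (int i = 0; i < 8; i++)
                    out = (uint8_t)((out << 1) | ((src[xb * 8 + i] >> p) & 1));
                planes[p][dst_row + xb] = out;
            }
        }
    }
}
```

With both modulos set to 0 this degenerates to an ordinary full-screen
conversion; non-zero modulos give the "partial c2p" that was asked for.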
: I read this, and it was interesting, although not that useful. Do you know if there's
: some table of cycles used for all common instructions in some of the other files?
: And what is a .pdf file, and how do you extract/read/print/whatever them? I have a
: feeling they are supposed to be printed, but I don't know how :(
.pdf is an Adobe Acrobat format file. You can view it or print it with
Ghostscript, Xpdf etc which are available from aminet.
--
D.
On 23.10.1996 jxsa...@cc.Helsinki.FI (Jyrki O Saarinen)
wrote in Message "Different c2p routines tested (again)":
> Btw, did you use PAL: Low-res screen while testing?
Sure - 320x256x8! When I first ran the test-program on my 640x256x3
workbench the results were a lot better :)
> : Sorry if a '020 user disturbed all those 040/060 discussions ;)
> : but some recent demos (like The Gate) showed that fast 1x1 routines are
> : possible on such machines, so it would be nice if some coders would still
> : try to support lower configuration than 030/50!
> Sure. Just smaller screen size or something.
I'd prefer if a 2x2 would be used on lower configuration (or that the user
can choose the c2p to use), but please don't do it like in THE GATE's
"low-mode", as the screen is rendered the same size as in normal mode,
and then only half of it is displayed - this doesn't look very good!
Could you perhaps make it available somewhere, either on Aminet, on the WWW,
or by posting it here?
Regards from Torgeir Lilleskog
: > Btw, did you use PAL: Low-res screen while testing?
: Sure - 320x256x8! When I first ran the test-program on my 640x256x3
: workbench the results were a lot better :)
Ok. I just had to ask..
: I'd prefer if a 2x2 would be used on lower configuration (or that the user
: can choose the c2p to use), but please don't do it like in THE GATE's
: "low-mode", as the screen is rendered the same size as in normal mode,
: and then only half of it is displayed - this doesn't look very good!
Actually, using larger pixels or a smaller screen size does not affect
the frame rate linearly. An example of such a situation is a scene with
many small polygons.
: I read this, and it was interesting, although not that useful. Do you know if there's
: some table of cycles used for all common instructions in some of the other files?
: And what is a .pdf file, and how do you extract/read/print/whatever them? I have a
: feeling they are supposed to be printed, but I don't know how :(
PDF is the Portable Document Format.
It's a kind of mix between postscript and TeX, imho.
AFAIK, only Adobe Acrobat reads this, but that's available for *ALL*
computer formats (ie PC, and Mac :-( suxx!)
: /Johan Rönnblom, Team Amiga
--
.-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-.
| _ __ _________ ____ http://www.abdn.ac.uk/~u13sac u13...@abdn.ac.uk |
| | |/\ \_/(__ / __) __ \ My opinions are not that of Aberdeen University |
| | (\ // /| _)| / or AUCC, thankfully. NEW!: ky...@hotmail.com |
| |_|\_\|_|/____)___)_|\_\ Adamant: telnet://130.83.9.19:4711 |100% Amiga, |
| Stuart Caie, mere undergraduate at Aberdeen University | always. |
`-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-'
: Well, sadly both behave strangely under my rtgmaster.library :((( Maybe some c2p
: specialist could have a look at the code ? It is on Aminet
: (gfx/board/rtgmaster.lha, the Kalms c2p are newc2p.asm and cpu5.asm (i was
: told not to use the same filenames, as i modified the code)). Try it out with
: the "flame" demo in the demos directory of rtgmaster... somehow the display
: behaves strangely... :(
As I told you in the mail already, the reason lies in the fact that the
double buffering in rtgmaster (at least in aga driver) doesn't work.
Just see those raster splits in the landscape demo. The reason why the
blitter assisting one "bugs" more seriously is because it still has some
planes left to blit when the particular buffer is brought to front. I
checked the newc2p.asm and there isn't even a mechanism which could
signal maintask when all the blitter passes have been completed.
--
D.
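
The missing "blitter passes finished" mechanism could look roughly like
this. It is a sketch of the bookkeeping only, with hypothetical names, and
is not taken from newc2p.asm or rtgmaster: on a real Amiga the decrement
would happen in the blitter-finished interrupt, and the main task would
more likely sleep on an exec signal than poll a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer state for one half of a double-buffer pair. */
struct c2p_buffer {
    volatile int blits_pending;   /* blitter passes still to finish */
};

/* Called once the CPU passes are done and the blitter passes are queued. */
void c2p_start_blits(struct c2p_buffer *buf, int passes)
{
    assert(buf != NULL && passes >= 0);
    buf->blits_pending = passes;
}

/* Would be called from the blitter-finished interrupt, once per pass. */
void c2p_blit_done(struct c2p_buffer *buf)
{
    assert(buf != NULL && buf->blits_pending > 0);
    buf->blits_pending--;
}

/* The main task may bring the buffer to the front only when this is true. */
bool c2p_ready_to_show(const struct c2p_buffer *buf)
{
    assert(buf != NULL);
    return buf->blits_pending == 0;
}
```

The point is simply that the swap has to be gated on the last blitter pass,
not on the end of the CPU passes; otherwise the buffer is shown with some
planes still unconverted, which matches the glitches described above.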
Hi, u13sac , on 30-Oct-96 09:55:48 you scribbled....
>PDF is the Portable Document Format.
>It's a kind of mix between postscript and TeX, imho.
>AFAIK, only Adobe Acrobat reads this, but that's available for *ALL*
>computer formats (ie PC, and Mac :-( suxx!)
there's a port of xPDF on aminet, and ghostscript also reads them AFAIR
Mike
--
---------------------------------------------------------------------------
Mike Redrobe - Mi...@Redrobe.demon.co.uk MikeRR on #Amiga
http://www.Redrobe.demon.co.uk
---------------------------------------------------------------------------