: Hi everyone!
: I have several questions about c2p-converters
: - what are the fastest ones (ideas, theory, sources) ?
depends on the cpu. on 040, cpu only. other cpus mix cpu & blitter.
what does "fast" mean? :) no, not a silly question. there are 2 possibilities
for non-040 c2p:
a) the c2p procedure has to finish as soon as possible
b) the c2p may take a while since it is done by the blitter; meanwhile the cpu
already does other things (this is for games).
for b) there's blitterscreen:
BLITTERSCREEN sources: http://www.informatik.tu-muenchen.de/~fischerj/
;)
you won't need to switch back to 6 planes using that ;)
: - what the hell is a scrambled (maybe spelled wrong) buffer ?
the buffer still consists of chunky bytes, but rearranged so that the
c2p routine can convert faster. only good for special fx,
not for games.
: (I have heard this sometimes in this group)
: and what are its advantages/disadvantages ?
: - are there any very fast c2p which use a comparebuffer ?
yes, AFAIK Peter McGavin has the most experience with this.
AFAIK the routines get very fast when the source to convert doesn't
change much.
: - ... anything to this topic
: I have never written a c2p myself, but I converted and modified
: converters I found on aminet. None of them uses a fixed plsize and
: most are for 8 bitplanes.
: Are there really fast 6 bitplane c2p-converters ?
: My current approaches are (50MHz 030, OS-conform (except screens of
: gfxcards), 320x256)
aah, 320x256. forget blitterscreen ;) but if you do the last conversion pass
with the blitter, you'd still get more speed!
: 8bpls : 45ms
: 6bpls : 43ms
the reason 6 planes is not much faster is that you still parse through an
8-bit chunky source. but when it comes to blitter assistance, the
difference might be bigger (don't know, just a feeling).
: 4bpls : 30ms
: These c2p-converter(s) use LONG-writes to CHIPmem.
I wonder why an 040/25 is said to do c2p at 4MB/sec while an 030-50 is a lot slower?
: Thanks.
: Sven Steiniger
------------------------------------------------------------------------
fisc...@Informatik.TU-Muenchen.DE (Juergen "Rally" Fischer) =:)
<sb>Patrick Hanevold - Virtual Reality developer
<sb>patrick....@login.eunet.no
<sb>Amiga and official Be developer
2x2x256 just as fast as it takes to copy the chunky-buffer to chip. :)
That's 160x128 bytes.
Can't get faster!
SS> I have several questions about c2p-converters
SS> - what are the fastest ones (ideas, theory, sources) ?
Depends a lot on what machine you're using, and also on the type of images
used... (e.g. if there's not much difference between frames, a smart routine
usually wins)
SS> - what the hell is a scrambled (maybe spelled wrong) buffer ?
The pixel bytes aren't stored sequentially but in an order that means less
work for the c2p routine, e.g. instead of 1,2,3,4,5,6,7,... you might use
1,5,9,13,2,6,10,14,3,...
This is only really useful for a vertical raycasting engine though, i.e. one
table lookup per column.
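A minimal C sketch of the index mapping described above, assuming the scramble repeats every 16 pixels (the post only shows the start of the sequence, so the block size is an assumption). `scramble16` and `scrambled_column_offset` are hypothetical helper names. Because this particular order is a 4x4 transpose, the same function gives both the buffer position of pixel x and the pixel stored at buffer position p.

```c
#include <stddef.h>

/* Hypothetical helper: position inside a 16-pixel scrambled block where
   screen pixel x (0..15) is stored. Buffer position p then holds pixel
   scramble16(p), giving the order 1,5,9,13,2,6,10,14,3,... (1-based).
   The mapping is a 4x4 transpose, so it is its own inverse. */
static size_t scramble16(size_t x)
{
    return (x % 4) * 4 + x / 4;
}

/* Offset of screen column x within one row of the scrambled buffer.
   A vertical (raycasting) renderer can precompute this once per column,
   which gives the "one table lookup per column". */
static size_t scrambled_column_offset(size_t x)
{
    return (x / 16) * 16 + scramble16(x % 16);
}
```

A column renderer would fill a table `col_offset[x] = scrambled_column_offset(x)` for x = 0..319 once, then write every texel of column x at `row_start + col_offset[x]`, so the c2p routine never has to reorder the bytes itself.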
SS> - are there any very fast c2p which use a comparebuffer ?
Yep. I've tried a few methods involving this technique. The last idea I had
was to detect which planes needed to be re-written, as opposed to simply
testing for no change in a group of pixels.
It works fairly well. The benefit is that as long as large regions only use a
small range of colours, most of the time only a few planes actually need
re-writing. The problem is doing this calculation fast enough to make it
worthwhile.
SS> My actual approachs are (50Mhz 030, OS-conform (except screens of
SS> gfxcards), 320x256)
SS> 8bpls : 45ms
SS> 4bpls : 30ms
The 4-plane one should be much faster, IMO.
Ben... :)
I want to add that I can get 20ms for 4bpl, 5-pass (non-scrambled), if
I use a 512K table. A bit silly, but I get roughly 4.1 Mpixel per
second: 320x210 pixels converted per frame (NTSC).
Stephan
PH> We are going to release one in a couple of days.
PH> 2x2x256 just as fast as it takes to copy the chunky-buffer to chip. :)
PH> That's 160x128 bytes.
PH> Can't get faster!
In principle you can actually...
Quite a lot of the planar data may not need re-writing in the first place.
Ben... :)
Maybe not the correct routine? I think that an 030/50
really is able to do c2p for free, maybe with the 4th pass done by
the blitter.
-- _
a Stellar programmer _ //
"Amiga - back for the future" \X/
Oh, word swapping is the 5th pass. What is your opinion?
I think an 030/50 should be able to do c2p for free, perhaps
three passes with the CPU and the last one with the blitter.
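For readers new to the pass structure, here is a minimal C sketch of a 5-pass merge c2p for 32 pixels x 8 bitplanes, using the swap word/byte/nybble/bitpair/bit order posted in this thread. Which longwords get paired in each pass, and therefore which register ends up holding which plane, is this sketch's own choice; real 68k routines order and pair the passes differently (e.g. with the word swap last). `merge`, `c2p_pass` and `c2p_32px` are hypothetical names.

```c
#include <stdint.h>

/* One "swap" step: exchanges bit x+shift of *hi with bit x of *lo for every
   position x whose bit log2(shift) is 0 (those positions are set in mask). */
static void merge(uint32_t *hi, uint32_t *lo, int shift, uint32_t mask)
{
    uint32_t t = ((*hi >> shift) ^ *lo) & mask;
    *lo ^= t;
    *hi ^= t << shift;
}

/* Apply one pass to every longword pair (i, i | m) with bit m clear in i. */
static void c2p_pass(uint32_t d[8], int m, int shift, uint32_t mask)
{
    for (int i = 0; i < 8; i++)
        if (!(i & m))
            merge(&d[i | m], &d[i], shift, mask);
}

/* Convert 32 chunky bytes into 8 plane longwords: planes[k] bit 31 holds
   bit k of chunky[0], bit 0 holds bit k of chunky[31]. */
static void c2p_32px(const uint8_t chunky[32], uint32_t planes[8])
{
    uint32_t d[8];
    for (int i = 0; i < 8; i++)            /* big-endian longword reads */
        d[i] = (uint32_t)chunky[4 * i] << 24 | (uint32_t)chunky[4 * i + 1] << 16
             | (uint32_t)chunky[4 * i + 2] << 8 | chunky[4 * i + 3];

    c2p_pass(d, 4, 16, 0x0000FFFFu);       /* swap word    */
    c2p_pass(d, 2,  8, 0x00FF00FFu);       /* swap byte    */
    c2p_pass(d, 1,  4, 0x0F0F0F0Fu);       /* swap nybble  */
    c2p_pass(d, 4,  2, 0x33333333u);       /* swap bitpair */
    c2p_pass(d, 2,  1, 0x55555555u);       /* swap bit     */

    /* With this pairing, plane k ends up in register which[k]. */
    static const int which[8] = {7, 5, 3, 1, 6, 4, 2, 0};
    for (int k = 0; k < 8; k++)
        planes[k] = d[which[k]];
}
```

On a 68020/030 each `merge` is roughly seven register instructions (copy, shift, eor, and, eor, shift, eor), which is why the thread counts instructions per pass when judging how many passes fit between chip writes.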
> Aminet is not a good image of where c2p is today.
Well, there is the only 040 routine, by Peter McGavin
(or was it James McCoull? I wonder where he has vanished).
> I'm personally waiting for some new C2P code to be posted soon
> that is supposed to be much faster than what I wrote so far.
Who is doing this..? ;)
PH>>> 2x2x256 just as fast as it takes to copy the chunky-buffer to chip. :)
PH>>> Can't get faster!
>> In principle you can actually...
>> Quite a lot of the planar data may not need re-writing in the first
>> place.
PH> re-writing? What the heck are you talking about?
Assuming that you still have a screenful of planar data from the last
conversion, it's very likely that a lot of the new planar data you write will be
identical to the old copy and thus doesn't need to be overwritten with the same
data!! (Depending on the type of graphics used and the amount of change per
frame...)
The end result is a c2p routine which takes a variable amount of time to do
the conversion, but may well be faster.
e.g.: Assume that the first four pixels are colours 12,13,220,114. On the next
pass let's say they are now 15,12,223,116.
If you eor the old set with the new set (giving 3,1,3,6) and re-order the data
into planar, you'll find that only planes 0, 1 & 2 have actually changed... So
you can save 5 chip writes and a fair bit of manipulation, but at the expense of
reading twice as much data from (hopefully) fast memory.
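The eor test can be sketched in C as below (a minimal sketch; `changed_planes` and `skipped_planes` are hypothetical names). OR-ing the XOR of every old/new pixel pair gives a bitmask whose set bits are the planes that must be rewritten. For the four example pixels the XORs are 3, 1, 3 and 6, so the mask is 0x07: planes 0-2 change and the other five can be left alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit k of the result is set if bitplane k differs anywhere in the group,
   i.e. plane k has to be rewritten for these pixels. */
static uint8_t changed_planes(const uint8_t *old_px, const uint8_t *new_px,
                              size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(old_px[i] ^ new_px[i]);   /* XOR marks changed bits */
    return diff;
}

/* Number of planes that can be skipped for this group (8-plane screen). */
static int skipped_planes(uint8_t diff)
{
    int skipped = 0;
    for (int k = 0; k < 8; k++)
        if (!(diff & (1u << k)))
            skipped++;
    return skipped;
}
```

In a real routine the mask would be computed per group of 32 pixels (one longword per plane), and the cost is exactly the extra reads mentioned above.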
Ben... :)
PH>> 2x2x256 just as fast as it takes to copy the chunky-buffer to chip. :)
PH>> That's 160x128 bytes.
PH>> Can't get faster!
>In principle you can actually...
>Quite a lot of the planar data may not need re-writing in the first place.
re-writing? What the heck are you talking about?
The ideal situation is to do as many c2p passes with the CPU as its
speed allows, but in such a way that the CPU part of the
chunky2planar is as fast as normal copying from fast RAM to
chip RAM. That way the c2p passes are "free", hidden
between the chip writes.
The remaining passes are done with the blitter using QBlit().
From the 040 on, the blitter is not needed, since an
040 is able to do all c2p passes completely free!
> - what the hell is a scrambled (maybe spelled wrong) buffer ?
> (I have heard this sometimes in this group)
> and what are its advantages/disadvantages ?
Pass 1 only rearranges bytes, so it can be skipped by writing vertical
columns in non-linear order. Only good for vertical drawing, like Wolf3D.
> Are there really fast 6 bitplane c2p-converters ?
> My current approaches are (50MHz 030, OS-conform (except screens of
> gfxcards), 320x256)
> 8bpls : 45ms
> 6bpls : 43ms
> 4bpls : 30ms
> These c2p-converter(s) use LONG-writes to CHIPmem.
Which routine is this? I guess an 030/50 should be able
to do c2p for free --> a 320x256x8 screen should take about 20ms.
This could perhaps be achieved by doing three passes
with the CPU and the last one with the blitter.
: > I personally use the 5-pass method:
: > swap word
: > swap byte
: > swap nybble
: > swap bitpair
: > swap bit
: Oh, word swapping is the 5th pass. What is your opinion?
: I think an 030/50 should be able to do c2p for free, perhaps
: three passes with the CPU and the last one with the blitter.
You can do 13 instructions between chip writes on a 50MHz 030, right?
That's 52 instructions for 4 writes. Almost, yeah (4-bit).
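The "13 instructions between chip writes" figure can be roughly reproduced with back-of-envelope arithmetic. This sketch assumes the ~20 ms fast-to-chip copy time for a 320x256x8 screen quoted elsewhere in this thread, longword writes, and an average of about 4 cycles per instruction on the 030 (that last figure is an assumption, not a measurement). `cycles_per_chip_write` is a hypothetical helper.

```c
/* CPU cycles available between consecutive chip-RAM writes, assuming the
   copy is limited purely by chip-RAM write speed. */
static double cycles_per_chip_write(double cpu_hz, double copy_seconds,
                                    double bytes, double bytes_per_write)
{
    double writes = bytes / bytes_per_write;
    return cpu_hz * copy_seconds / writes;
}
```

`cycles_per_chip_write(50e6, 0.020, 320.0 * 256.0, 4.0)` gives about 48.8 cycles per longword write; at ~4 cycles per instruction that is about 12 instructions, in line with the 13 quoted above.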
Actually my table method uses 50 instructions to do the 5 passes, including
load/store, and 32 without the last pass, but on an 030 you
can't do a fast-RAM read during a chip write.
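basically:
        move.w  (a0)+,d0         ; two chunky bytes
        move.l  (a1,d0.w*4),d1   ; first table lookup
        move.w  (a0)+,d0         ; next two chunky bytes
        add.l   (a2,d0.w*4),d1   ; second table, combined with add
        .. 3 more times (does pass 1,2,4)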
pass 16 + pass 8 is 30 instructions... easy to make it overlap with
the 4 chip-RAM writes. (You write the 4 previously converted
longwords, which means you have to unroll the loop and use 2 sets of
registers.) But there are enough registers and the whole thing fits in the
cache. So the blitter is not needed...
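A minimal C sketch of the table idea for 4 bitplanes, under stated assumptions: one 16-bit word read from the chunky buffer holds two pixels (left pixel in the high byte, only its low 4 bits used), and one 64K-entry longword table gives that pair's bits for all 4 planes at once. The post uses two tables (a1, a2) combined with add; this sketch uses one table plus a shift per pair instead. `pair_tab`, `build_pair_table` and `c2p4_group` are hypothetical names. A 65536-entry longword table is 256K, so two of them would be 512K, which would match the table size Stephan mentions earlier, though his posts do not say that is how his table is laid out.

```c
#include <stdint.h>

/* Entry for word w = (left << 8) | right: plane byte k gets bit k of the
   left pixel at bit 7 and bit k of the right pixel at bit 6. */
static uint32_t pair_tab[65536];

static void build_pair_table(void)
{
    for (uint32_t w = 0; w < 65536; w++) {
        uint32_t left = (w >> 8) & 0x0F, right = w & 0x0F, e = 0;
        for (int k = 0; k < 4; k++) {
            e |= ((left >> k) & 1u) << (8 * k + 7);
            e |= ((right >> k) & 1u) << (8 * k + 6);
        }
        pair_tab[w] = e;
    }
}

/* Convert 8 chunky bytes into 4 plane bytes with four table lookups.
   Entries only use bits 7 and 6 of each byte, so shifting right by 2*j
   keeps every bit inside its own plane byte. */
static void c2p4_group(const uint8_t chunky[8], uint8_t planes[4])
{
    uint32_t acc = 0;
    for (int j = 0; j < 4; j++) {
        uint32_t w = (uint32_t)chunky[2 * j] << 8 | chunky[2 * j + 1];
        acc |= pair_tab[w] >> (2 * j);
    }
    for (int k = 0; k < 4; k++)
        planes[k] = (uint8_t)(acc >> (8 * k));
}
```

Precomputing one already-shifted table per pair position would remove the shifts at the cost of more memory, which is the kind of trade-off the 512K figure suggests.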
Stephan
: PH>> We are going to release one in a couple of days.
: PH>> 2x2x256 just as fast as it takes to copy the chunky-buffer to chip. :)
: PH>> That's 160x128 bytes.
: PH>> Can't get faster!
: >In principle you can actually...
: >Quite a lot of the planar data may not need re-writing in the first place.
depends on the cpu: on an 020-14, not writing is slower than a plain copy.
btw, on 040, has anyone ever thought of not writing 0's to the planes and
clearing them with the blitter ?
: re-writing? What the heck are you talking about?
: Which routine is this? I guess an 030/50 should be able
: to do c2p for free --> a 320x256x8 screen should take about 20ms.
how fast does the 040 version do it? is the one from aminet free on 040?
: This could be perhaps achieved by doing three passes
: with CPU and the last one with the blitter.
if the new A1200 is maybe only an 030-40 or something, the blitter may have to
take over 1.5 passes ;)
: We are going to release one in a couple of days.
: 2x2x256 just as fast as it takes to copy the chunky-buffer to chip. :)
nice, but which cpu and which framerate (for the case of blitter assistance) ?
: That's 160x128 bytes.
: Can't get faster!
It does two passes almost free even on a 28MHz 020/030. Completely
free on a 40MHz 030. The blitting time was quite small: if
I remember correctly, the whole time was 17ms, and the CPU
part is ~5ms, so ~12ms blitting.
c2p_040_FastRam4.s 4543 ----rwed 29-Aug-93 14:53:40
This routine is from Aminet, if I recall correctly from a
package called FASTC2P.LZH or something. It does chunky2planar
"free". Funny how the c2p_020_FastRam2.s is about 50% slower
on the 040 than the 040 routine..
> if maybe new A1200 is only 030-40 or something, maybe blitter has to
> overtake 1.5 passes ;)
Could you see an 030/50 doing three passes free? It should
be possible since Ludde & Guys made a routine that does
two passes _almost_ free on a 28MHz 020/030.
PH>>> 2x2x256 just as fast as it takes to copy the chunky-buffer to chip. :)
PH>>> That's 160x128 bytes.
PH>>> Can't get faster!
>>In principle you can actually...
>>Quite a lot of the planar data may not need re-writing in the first place.
>re-writing? What the heck are you talking about?
Compare buffers. The routine detects differences and only updates changed areas.
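A minimal C sketch of the compare-buffer idea: the previous frame's chunky data is kept, and only groups of pixels that differ are converted and copied into the compare buffer. The 32-pixel group size, the callback type and all names here are assumptions for illustration, and the frame size is assumed to be a multiple of the group size. As noted earlier in the thread, on slow CPUs such as an 020/14 the comparison itself can cost more than a plain copy, so this only pays off when little changes between frames.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define C2P_GROUP 32   /* pixels per compared group (assumption) */

/* Hypothetical per-group converter: gets the chunky group and its index. */
typedef void (*c2p_group_fn)(const uint8_t *chunky, size_t group_index);

/* Converts only the groups of `frame` that differ from `prev`, updating
   `prev` as it goes. Returns the number of groups converted. `convert` may
   be NULL to just count. `pixels` must be a multiple of C2P_GROUP. */
static size_t c2p_changed_groups(const uint8_t *frame, uint8_t *prev,
                                 size_t pixels, c2p_group_fn convert)
{
    size_t converted = 0;
    for (size_t g = 0; g < pixels / C2P_GROUP; g++) {
        const uint8_t *src = frame + g * C2P_GROUP;
        uint8_t *cmp = prev + g * C2P_GROUP;
        if (memcmp(src, cmp, C2P_GROUP) != 0) {
            if (convert)
                convert(src, g);
            memcpy(cmp, src, C2P_GROUP);
            converted++;
        }
    }
    return converted;
}
```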
--
Christopher Dyken (chri...@sn.no | http://www.sn.no/~christod)
LoungeBar Development
patrick....@login.eunet.no (Patrick Hanevold) writes:
>>for b) there's blitterscreen:
>>BLITTERSCREEN sources: http://www.informatik.tu-muenchen.de/~fischerj/
>>;)
>SLOOOOOOOOOOOW!!! :)
>In a couple of days, you'll get ours. :)
Are you using HAM mode? You can get excellent speed only updating the lower
four bitplanes and interlacing lores screens, alternating odd/even chunky
pixels.
jer...@wimsey.com
------------------
"Let he who has no sword sell his cloak and buy one." - Jesus, Luke 22:36
> What routine this is? I guess 030/50 should be able
> to get c2p free --> 320x256x8 screen should be about 20ms.
> This could be perhaps achieved by doing three passes
> with CPU and the last one with the blitter.
20 ms? Using a scrambled chunky buffer I get 320x200x8 in 25.7 ms
on an 030@50 (slow :-( ). Well, I keep multitasking enabled, using TaskPri
127, so interrupts probably trash the cache... But 20 ms for
320x256... Could you give some hints?
TIA,
--
---------------------------- --------------------------------------------
| Jorge Acereda | Dream the same thing everynight |
| ii...@rossegat.uji.es | I see our freedom in my sight |
| Intel Outside | No locked doors, no windows barred |
| Amiga Rules | No things to make my brain seem scarred |
---------------------------- --------------------------------------------
: : Oh, word swapping is the 5th pass. What is your opinion?
: : I think an 030/50 should be able to do c2p for free, perhaps
: : three passes with the CPU and the last one with the blitter.
After twiddling I got the 4-bit version to reach 18ms at 320x256.
On my 25MHz 030 this equals half the bandwidth speed.
Because it uses tables it will never be able to c2p for 'free'.
But for a 25MHz 030, 18ms is pretty acceptable if you have some
use for the unused blitter. (Because the first 3 passes are done with
a lookup table and the other 2 passes overlap with the chip-RAM writes,
the blitter has nothing to do to help.)
Stephan
>> : We are going to release one in a couple of days.
>> : 2x2x256 just as fast as it takes to copy the chunky-buffer to chip.
>>
>> nice, but which cpu and which framerate (for the case of blitter
>> assistance) ?
>It does two passes almost free even on a 28MHz 020/030. Completely
>free on a 40MHz 030. The blitting time was quite small if
>I remember correctly, the whole time was 17ms, and the CPU
>part is ~5ms, so ~12ms blitting.
Why not just measure it in scanlines? Why ms?
__
\\\__ Thore B. Karlsen % t...@sn.no % C64-C128D-A1200-A2000C
\XX/ Wowbagger/AFL&SSN % -c0d3r- % A1230/50MHz-2C4F/340MB
Acts.9.5: .. And the Lord said, I am Jesus whom thou persecutest: it is
hard for thee to kick against the pricks.
Yep, 16 colours just isn't enough... What do you think, could
it be possible to do three passes free on an 030/50?
On the other hand, two passes free and more blitting
should be OK too, because at 320x256 the 50MHz 68030
surely is not able to render a complex texture-mapped
scene faster than 25fps.
I was just speculating. Copying a 320x256x8 data area from
fast to chip ram takes about 20ms. Now, if three c2p passes
could be done "free" on an 030/50 and the last pass with
the blitter, it would be great.
I don't think your system-friendliness slows it down..
Do you use the blitter?
naaah, what's cooler than 020-14 c2p for free ;) I see that for realtime-ish
fx it's better to accept the cpu spending 10ms on one pass for a 2x2 screen.
I've already got the routine, wait for bltscr_v1.3 :)
|> >In a couple of days, you'll get ours. :)
oh, yes I'm interested!
|>
|> Are you using HAM mode? You can get excellent speed only updating the lower
|> four bitplanes and interlacing lores screens, alternating odd/even chunky
|> pixels.
cool, have you done a demo with your idea ?
if the cpu is quick enough, and the renderer is dirty enough to get conversion
for free ;) then you should even be able to copy a 1x1 screen in 1/2 frame! =:o
but maybe you'll have to wait for PPC to get it for free ;)
would be especially interesting for all A3000 users! (quick cpu but no AGA)
|>
5ms is the time for copying a 2x2 frame.
10ms is the time for one blitter pass.
The values seem to make sense, but isn't it rather 7ms for the cpu doing
passes not completely free, and 10ms for one blitter pass ? ;)
BTW how fast is the 2-pass routine on 020 ?
My one-pass routine needs 10ms for a 2x2 screen, so 2 times more than
copying; I hope doing 2 passes is ~15ms and not ~20ms...
|>
|> c2p_040_FastRam4.s 4543 ----rwed 29-Aug-93 14:53:40
|>
|> This routine is from Aminet, if I recall correctly from a
|> package called FASTC2P.LZH or something. It does chunky2planar
|> "free". Funny how the c2p_020_FastRam2.s is about 50% slower
|> on the 040 than the 040 routine..
|>
|> > if maybe new A1200 is only 030-40 or something, maybe blitter has to
|> > overtake 1.5 passes ;)
|>
|> Could you see an 030/50 doing three passes free? It should
If it can do this, and you use 3 pass c2p, then it does c2p for free.
|> be possible since Ludde & Guys made a routine that does
|> two passes _almost_ free on a 28MHz 020/030.
So the remaining blitter work (one pass) takes only 2 frames (for fullscreen 1x1!).
basically 40 instructions per pass, which equals 96 cycles*3/8 = 36...
can you fit 36 cycles in between chip-RAM writes on a 50MHz 030?
Stephan
Well, something around those figures.
> The values seem to make sense, but isn't it rather 7ms for cpu doing
> passes not completely free, and 10ms for one blitter pass ? ;)
Nope, starting from a 40MHz 68030 it does those two passes free.
Almost on a 28MHz Amiga, and in a real-life situation (8bpl lores)
I could probably claim that even a 28MHz 020/030 does those
two passes free.
> BTW how much is the 2pass routine on 020 ?
I have not tested it.. the old one was 20ms (7ms on 030/50),
I think the new 2-pass routine could be about 10-15ms on an
020/14. I have to test it..
> My one pass routine needs 10ms for a 2x2 screen, so 2times more than
> copying, I hope doing 2 passes is ~15ms and not ~20ms...
Yep, Ludde's one-pass routine also took 10ms on an 020/14.
How about doing the last pass, too..? ;) I calculated
about 40ms for blitting when converting a 320x128 screen,
two passes with the CPU. This is nice, too, since a 28MHz machine does
the CPU part free, and the "large" blitting time is not a problem
since the CPU probably is not able to render faster than
25fps.
>|> Are you using HAM mode? You can get excellent speed only updating the lower
>|> four bitplanes and interlacing lores screens, alternating odd/even chunky
>|> pixels.
>
>cool, have you done a demo with your idea ?
>
>if the cpu is quick enough, and the renderer is dirty enough to get
>conversion for free ;) then you should even be able to copy a 1x1
>screen in 1/2 frame! =:o but maybe you got to wait for PPC to get it
>for free ;)
>would be especially interesting for all A3000 users! (quick cpu but no
>AGA)
I tried this using SuperHires HAM6 (soz A3000 users). On my 030-50F it
does a 12-bit truecolour 1x1 screen (256*256 [1024*256]) at about 12.5
fps, and the processor is currently idle for >= 3/4 of the time (no
rendering, just F>C conversion), as the blitter is doing the bulk of the
work, so there is plenty of time to render the screen with the
processor. The only problems I can see with it are that colours
have to be scrambled to get this speed (this could be a pain when
shading etc.) and it requires s**tloads of memory. It does look good
though. If only I had time to do something with it ;)
Doug.
Because I have a routine that measures the time that
a subroutine takes in milliseconds.
> This routine is from Aminet, if I recall correctly from a
> package called FASTC2P.LZH or something. It does chunky2planar
> "free". Funny how the c2p_020_FastRam2.s is about 50% slower
> on the 040 than the 040 routine..
Yep, but check the chip writes in the 020 version. Those writes
could be pipelined a lot more.
how does it look? tried viewing a pic of a typical doom scene on it? :)
|> work and so there is plenty of time to render the screen with the
|> processor. The only problems that I can see with it is that colours
|> have to be scrambled to get this speed (this could be a pain when
scrambled _colors_ ? you mean words ?
|> shading Etc.) and it requires s**tloads of memory. It does look good
|> though. If only I had time to do something with it ;)
|>
|> Doug.
: Why not just measure it in scanlines? Why ms?
An even better 'standard' measure: pixel second :)
Stephan
he gave the numbers in ms.
after having counted the scanlines the effect needs,
(by measuring the height of it on the TV! :D) you calculate
time= (measured_height/height_of_wbscreen)*(256/312.5)*20ms
for a wb screen of 256 pixel height.
;)
Rallys demo-coding formulas: ;)
----------------------------
nr_rasterlines_fx = (measured_height/measured_height_screen)*nr_lines_screen
time = (nr_rasterlines_fx/nr_rasterlines_tv) * frametime_tv
with nr_rasterlines_tv = 312.5 and frametime = 20ms on PAL.
note that (frametime_tv/nr_rasterlines_tv) is almost the same on PAL and NTSC,
the reason is they have almost the same horizontal frequency, a rasterline being
64us on PAL (20ms/312.5).
So we can approximate
time = (nr_rasterlines_fx*64us)
     = (measured_height/measured_height_screen)*nr_lines_screen*64us
=;)
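Rally's formulas as a small C sketch (helper names are hypothetical). A PAL field is 20 ms for 312.5 lines, i.e. 64 us per line; for NTSC the sketch assumes 262.5 lines per 1/60 s field, which comes out at about 63.5 us, so one per-line figure serves both.

```c
/* Duration of one raster line in microseconds. */
static double line_time_us(double lines_per_field, double field_ms)
{
    return field_ms * 1000.0 / lines_per_field;
}

/* Time used by an effect that covers measured_height out of
   measured_screen_height on the monitor, for a screen of screen_lines. */
static double fx_time_ms(double measured_height, double measured_screen_height,
                         double screen_lines, double line_us)
{
    double fx_lines = measured_height / measured_screen_height * screen_lines;
    return fx_lines * line_us / 1000.0;
}
```

An effect covering half of a 256-line screen takes 128 lines x 64 us = 8.192 ms, the same result as (5/10)*(256/312.5)*20ms from the first formula.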
what do you think about my formulas ? imho much more interesting than
the crap at school ;)
|>
uhm well. I thought we were talking about more demoish (realtimish) fx here.
: >|> Are you using HAM mode? You can get excellent speed only updating the lower
: >|> four bitplanes and interlacing lores screens, alternating odd/even chunky
: >|> pixels.
: >
: >cool, have you done a demo with your idea ?
: >
: >if the cpu is quick enough, and the renderer is dirty enough to get
: >conversion for free ;) then you should even be able to copy a 1x1
: >screen in 1/2 frame! =:o but maybe you got to wait for PPC to get it
: >for free ;)
: >would be especially interesting for all A3000 users! (quick cpu but no
: >AGA)
: I tried this using SuperHires Ham-6 (Soz A3000 users). On my 030-50F it
: does a 12-Bit truecolour 1X1 screen (256*256 [1024*256]) in about 12.5
: fps, and the processor is currently idle for >= 3/4 of the time (no
: rendering just F>C conversion), as the blitter is doing the bulk of the
: work and so there is plenty of time to render the screen with the
: processor. The only problems that I can see with it is that colours
: have to be scrambled to get this speed (this could be a pain when
: shading Etc.) and it requires s**tloads of memory. It does look good
: though. If only I had time to do something with it ;)
: Doug.
Very interesting indeed. :)
Anyone up for a test of this thesis?
Cheers...
what about a very dirty 1x1 mode:
R0G1B2R4G5B6
;) only one component of each pixel is taken over, not the one needed most
but just a random one (well, R, G and B cyclically).
Does anyone know how pictures look after this kind of conversion ?
|> and do a two pass (4- and 8-bit) C2P on the lower four bitplanes. The upper
|> two bitplanes in HAM mode are set to display RGBBRGBB... If you use
|> movep.w to output your chunky words, you don't need to do the 8-bit pass,
|> if you do the 4-bit with blitter.
|> This gives you a 16-bit 2x2/2x1 truecolour mode on all Amigas. Also you need to
|> setup the copper to swap screens every 1/60th or 1/50th of a second, and shift
|> the screen display right two pixels to fill in the odd/even pixels.
|> You can do the same with an interlaced screen, but the non-interlaced
|> flicker version looks cleaner (tried a dpaint anim using the same framerate
|> and it looks better than interlaced).
you mean the pixels get flicker-mixed ? theoretically you'd need a mask, but
maybe this looks better.
|>
|> And if you have the luxury of an AGA machine, you can set up a hires HAM
|> screen (640x100 or 640x200) and get a 160x100 or 160x200 display without
|> the need to flicker screens back and forth.
my picture viewer uses a render method that just looks at which component
most needs updating (the biggest difference to the new wannabe-24bit value).
looks good for photos, and is quite quick (judging from the speed of picture
viewers that include a palette for rendering).
With tables and a PPC, maybe a way for 24bit games ?
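A minimal C sketch of that greedy choice, for HAM6 only and ignoring the palette (a real renderer would also consider the 16 base colours). In HAM6 the two control bits (bitplanes 5 and 6) select: 00 = palette colour, 01 = modify blue, 10 = modify red, 11 = modify green, with the low 4 bits carrying the value. `rgb4` and `ham6_pick` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint8_t r, g, b; } rgb4;   /* 4 bits per gun */

/* Pick the HAM6 pixel value that modifies the gun furthest from the target,
   and update *shown to what the display will actually show afterwards. */
static uint8_t ham6_pick(rgb4 *shown, rgb4 target)
{
    int dr = abs(target.r - shown->r);
    int dg = abs(target.g - shown->g);
    int db = abs(target.b - shown->b);

    if (dr >= dg && dr >= db) {             /* 10: modify red */
        shown->r = target.r;
        return (uint8_t)(0x20 | target.r);
    }
    if (dg >= db) {                         /* 11: modify green */
        shown->g = target.g;
        return (uint8_t)(0x30 | target.g);
    }
    shown->b = target.b;                    /* 01: modify blue */
    return (uint8_t)(0x10 | target.b);
}
```

Because each pixel can fix only one gun, a sharp colour edge takes up to three pixels to settle.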
|>
|> >if the cpu is quick enough, and the renderer is dirty enough to get conversion
|> >for free ;) then you should even be able to copy a 1x1 screen in 1/2 frame! =:o
|>
|> >but maybe you got to wait for PPC to get it for free ;)
|>
|> >would be especially interesting for all A3000 users! (quick cpu but no AGA)
|>
Isn't 25fps enough for you for a 3D scene with thousands of polygons?
>: Why not just measure it in scanlines? Why ms?
> An even better measure 'standart' : pixel second :)
Well, mostly when you talk about C2P conversion, it is for realtime
use, meaning a decent framerate. So it makes sense to measure it in
something frame-relative, like pixels/frame, scanlines or something
else.
>|> >> : We are going to release one in a couple of days.
>|> >> : 2x2x256 just as fast as it takes to copy the chunky-buffer to c
>|> >>
>|> >> nice, but which cpu and which framerate (for the case of blitter
>|> >> assistance) ?
>|>
>|> >It does two passes almost free even on a 28MHz 020/030. Completely
>|> >free on a 40MHz 030. The blitting time was quite small if
>|> >I remember correctly, the whole time was 17ms, and the CPU
>|> >part is ~5ms, so ~12ms blitting.
>|>
>|> Why not just measure it in scanlines? Why ms?
>he gave the numbers in ms.
Yes, indeed he did!
>after having counted the scanlines the effect needs,
>(by measuring the height of it on the TV! :D) you calculate
>time= (measured_height/height_of_wbscreen)*(256/312.5)*20ms
>for a wb screen of 256 pixel height.
>;)
Hey, I KNOW how to convert it.. :) But scanlines mean a lot more to me
than ms, it's not obvious from those numbers how much of my frame(s) is
wasted doing C2P. You don't do C2P only once per second!
>Rallys demo-coding formulas: ;)
>----------------------------
[...]
>what do you think about my formulas ? imho much more interesting than
>the crap at school ;)
Isn't everything more interesting than school though.. :)
Framerate (fps) = 1000ms/(time in ms)
> > 20 ms? Using scrambled chunky buffer I get 320x200x8 in 25.7
> > on 030@50 (slow :-( ). Well, I keep multitasking enabled using a 127
> > TaskPri, so interrupts should suck the cache... But 20 ms. for
> > 320x256... Could you give some hints?
> I was just speculating. Copying a 320x256x8 data area from
> fast to chip ram takes about 20ms. Now, if three c2p passes
> could be done "free" on 030/50 and the last pass with
> the blitter, it would be great.
Using a scrambled buffer, I assume...
> I dont think your systemfriendliness slows down..
> Do you use the blitter?
No blitter. 320x200 with the CPU only takes 25.7 ms: the first two
passes take 5.something ms, and the other two plus the writes to chip RAM the other 20 ms.
(030@50 times) It's pretty slow. I think three totally free passes are not
possible on 030, but the times should not be more than 22 or 23 ms (?).
So, how long would the last pass take with the blitter? Will we
see full screen 25 fps doom on 030?
Greets,
Jorge Acereda (ii...@rossegat.uji.es)
: >: Why not just measure it in scanlines? Why ms?
: > An even better 'standard' measure: pixel second :)
: Well, mostly when you talk about C2P conversion, it is for realtime
: use, meaning a decent framerate. So it makes sense to measure it in
: something frame-relative, like pixels/frame, scanlines or something
: else.
Most numbers I see are in pixels per second... From PC card spec sheets to SGI
tech references; in these 2 examples they apply to realtime operation.
Having a ms report of converting a 320x256 screen is totally weird.
A scanline is the same thing... you need to define your scanline timing.
A pixel and a second are well defined... What is more logical?
1) I can c2p 4.4 Mpixel per second on a 25MHz 030
or
2) I can c2p a 320x256 screen in 18.5 ms ...
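The two ways of stating speed convert directly into each other. A small C sketch (hypothetical helper names), including the framerate formula posted earlier in the thread:

```c
/* Conversion throughput in megapixels per second. */
static double mpix_per_s(unsigned width, unsigned height, double ms)
{
    return (double)width * height / (ms * 1000.0);
}

/* Frames per second if one conversion took `ms` and nothing else ran. */
static double fps_from_ms(double ms)
{
    return 1000.0 / ms;
}
```

`mpix_per_s(320, 256, 18.5)` is about 4.43, so statements 1) and 2) describe the same routine; `fps_from_ms(18.5)` is about 54, i.e. just inside one 20 ms PAL frame.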
Stephan
> |> Are you using HAM mode? You can get excellent speed only updating the lower
> |> four bitplanes and interlacing lores screens, alternating odd/even chunky
> |> pixels.
> cool, have you done a demo with your idea ?
I remember a demo with a BIG morphing texturemapped torus (from Sonik ?).
This seems to be using HAM8. It has some glitches, and shows horizontal
lines on screen from time to time.
Any idea how they are doing the c2p?
Greets,
Jorge Acereda (ii...@rossegat.uji.es)
>> : Why not just measure it in scanlines? Why ms?
>> An even better 'standard' measure: pixel second :)
TBK> Well, mostly when you talk about C2P conversion, it is for
TBK> realtime use, meaning a decent framerate. So it makes sense to
TBK> measure it in something frame-relative, like pixels/frame,
TBK> scanlines or something else.
I personally prefer ms, as 'frame speed' is not always the same, which could
lead to wrong conclusions.
Grtz John
-----------------------------------------------------------------------
John.H...@grafix.xs4all.nl TextDemo/FastView/Etc... development
-----------------------------------------------------------------------
: > uhm well. I thought we were talking about more demoish (realtimish) fx
: > here.
: Isn't 25fps enough for you for a 3D scene with thousands of polygons?
I wasn't thinking of a particular fps.
well, if the blitter can keep up doing normal c2p then it's ok
(which depends not only on fps but on cpu & number of passes);
otherwise removing 1 pass can help: with the cpu doing 2 passes and the
blitter 1 instead of 2, you gain a factor of 2 then...
: : >: Why not just measure it in scanlines? Why ms?
: : > An even better 'standard' measure: pixel second :)
you mean pixel/second I guess.
: having a ms report of converting a 320x256 screen is totally weird.
noooo.
: scanline is the same thing... you need to define your scanline timing.
a scanline is about 64us on amiga (i.e. anyone talking about scanlines means
a PAL/NTSC one).
: A pixel and a second are well defined... What is more logical?
: 1) I can c2p 4.4 mpixel per second on a 25mhz 030
: or
: 2) I can c2p a 320x256 screen in 18.5 ms ...
logical? both are information. and the latter one tells me
that it'll do it within one frame.
: Stephan
Look at a TV picture; after all, that's how the screen is made up, of tiny
pixels of RGB.
How exactly is a HAM screen generated, i.e. what's the largest screen width?
I only ask because I gather that you can change 1 colour component at a time, so in
effect you have overlapping RGB planes, each 3 pixels wide. You could then say that a
1280-pixel-wide HAM8 screen is equivalent to a 427-pixel-wide 24-bit chunky
display. If you need a more accurate representation of a picture, leave the
green image alone and adjust the red and blue images to suit.
> Why not just measure it in scanlines? Why ms?
20 ms = 1 frame (PAL).
IMHO it's better to measure in ms instead of scanlines... Measuring
scanlines is harder in OS-friendly code.
Greets,
>naaah, whats cooler than 020-14 c2p for free ;) I see that for realtimish
>fx it's better to accept cpu doing 10ms for doing one pass for a 2x2 screen.
>I already got the routine, wait for bltscr_v1.3 :)
>|> >In a couple of days, you'll get ours. :)
>oh, yes I'm interested!
Hehe.. Must write docs first. The source is ready.
>|>
>|> Are you using HAM mode? You can get excellent speed only updating the
>|> lower four bitplanes and interlacing lores screens, alternating odd/even
>|> chunky pixels.
I haven't replied to this. Sorry to whoever wrote it.
We are using 256 colors 2x2.
Have plans to test 65536 colors. NO HAM! :)
>cool, have you done a demo with your idea ?
>if the cpu is quick enough, and the renderer is dirty enough to get
>conversion for free ;) then you should even be able to copy a 1x1 screen in
>1/2 frame! =:o
Sorry, it's 2x2. Will try it at 1x1. It will be fast. Fast as it can get.
>but maybe you got to wait for PPC to get it for free ;)
>would be especially interesting for all A3000 users! (quick cpu but no AGA)
Hehe.. :)
Patrick Hanevold - Virtual Reality developer
patrick....@login.eunet.no
Amiga and official Be developer
|> see full screen 25 fps doom on 030?
well, if the copying to VRAM (assuming 2 free passes) takes a frame,
there's not much left.
25fps is rather the rating for a well-cached 486-66 (no, rather a 486-80).
but maybe a 030-50 with a good memory interface doing 320x160 can manage 3 frames:
about 1/2 frame for the copy, 1.5 frames for raw mapping (assuming a 20-cycle
inner loop; shading?), so 1 frame is left for the polygon 1st and 2nd outer loops,
rotation, z....
well, an A1200+ with an 030-40 and 4-frame doom, aaaah :)
|>
|> Greets,
|> Jorge Acereda (ii...@rossegat.uji.es)
: > Why not just measure it in scanlines? Why ms?
: 20 ms = 1 frame.
: IMHO it's better to measure in ms instead of scanlines... Measuring
: scanlines is harder in OS-friendly code.
pixel/second makes even more sense... 20ms means nothing to me.
20ms to convert 320x256 pixels: that is what you need to type.
Or just say it: 4mpix/s
Stephan
well, it's a bit different there I guess.
|> How exactly is a HAM screen generated, i.e. what's the largest screen width?
|> I only ask because I gather that you can change 1 colour at a time, in effect
|> you have overlapping RGB planes each 3 pixels wide. You could then say that a
yes!
|> 1280 pixel wide HAM8 screen is equivalent to a 427 pixel wide 24 bit chunky
|> display. If you need a more accurate representation of a picture, leave the
|> green image alone and adjust the red and blue images to suit.
theoretically it's 3x1.
I tried it. Looking at a photo at 640x512 via composite, you don't notice
much. Looking at it in LORES you can see a bit of blockiness. If the colors do
not change too quickly, the blockiness is less than with 2x1, as it's smoothed.
it could maybe be the only fast way to do 24bit animations (in case 24bit
games become common; it's quite difficult for the cpu to do shading on 24bit
values, so 8bit will stay for a while).
and maybe it's the best way to do 256 color games on a fast ECS machine.
and maybe the conversion time is not that much compared to rendering a
doom scene on an A500 ;)
No. Three passes with the CPU and the last one with the blitter.
> No blitter. 320x200 with the CPU only takes 25.7 ms, the first two
> passes taking 5.something ms and the other two + writes to chip ram the other 20
> ms (030@50 times). It's pretty slow. I think three totally free passes
> are not possible on 030, but the times should not be more than 22 or 23 ms
Should be, because two passes are free on a 28MHz 020/030.
> So, how long would the last pass with the blitter take? Will we
> see full screen 25 fps doom on 030?
Hmm. Probably about 40ms..
no timer is more accurate than showing the timing via the raster display.
do you know any method with less overhead than the 8 cycles for a write
to a color register?
|>
|> pixel/second makes even more sense... 20ms means nothing to me.
well, pixel/second is not interesting; who wants to watch an animation
at 1fps?
|> 20ms to convert 320x256 pixels: that is what you need to type.
|> Or just say it: 4mpix/s
ok, the latter is less chars, true :)
|>
|> Stephan
: In article <4dkbml$e...@maureen.teleport.com>, ssc...@teleport.com (Stephan Schaem) writes:
: |> Jorge Acereda Macia (ii...@rossegat.uji.es) wrote:
: |> : Thore Bjerklund Karlsen (t...@sn.no) wrote:
: |>
: |> : > Why not just measure it in scanlines? Why ms?
: |>
: |> : 20 ms = 1 frame.
: |>
: |> : IMHO it's better to measure in ms instead of scanlines... Measuring
: |> : scanlines is harder in OS-friendly code.
: no timer is more accurate than showing the timing via the raster display.
: do you know any method with less overhead than the 8 cycles for a write
: to a color register?
I use the color register myself... then do iteration*videorate = mpixel
: |>
: |> pixel/second makes even more sense... 20ms means nothing to me.
: well, pixel/second is not interesting; who wants to watch an animation
: at 1fps?
Who wants to look at a 320x256 display when I'm interested in 256x200?
mpixel is easy to adapt to your needs. And you won't come across
a claim like "I can c2p in 8ms"... and then realize it was a 160x200
2x2 screen he was talking about :)
: |> 20ms to convert 320x256 pixels: that is what you need to type.
: |> Or just say it: 4mpix/s
: ok, the latter is less chars, true :)
I wanted to point out the resolution... it's totally non-standard.
Stephan
Then we should talk about X NxN mpix/sec
I agree X ms for 320x256 is not a good nomenclature. Many people
use 256x256 or 320x200 in their code.
> |> : IMHO it's better to measure in ms instead of scanlines... Measuring
> |> : scanlines is harder in OS-friendly code.
> no timer is more accurate than showing the timing via the raster display.
> do you know any method with less overhead than the 8 cycles for a write
> to a color register?
Yeah, but when testing multitasking code this can be a bit harder.
ReadEClock() is fast. A scanline is a long time. ms are more accurate
in this sense if you only measure in whole rasterlines.
And what's that of writing to colorregisters? Are you counting scanlines
directly in the monitor??? %-)
well, it makes a difference whether you use it for coding or for telling others.
pix/sec is best for telling others the speed (you should mention if
you use fewer than 8 planes ;)
>> after having counted the scanlines the effect needs (by measuring the
>> height of it on the TV! :D) you calculate
>> time= (measured_height/height_of_wbscreen)*(256/312.5)*20ms
JAM> Why don't you use ReadEClock()?
Works great, in fact, when used properly (and when you time sufficient loops
and take some other stuff into account) you can create a routine which is
accurate enough to tell you that instructions like 'Move.l d0,d0' take 2.00
cycles on a 68030 (so it is accurate to about 1/100th of a CPU cycle -- I guess
that beats counting rasterlines)
hehe and what about estimating the ratio of outer vs inner loop of a
polyengine? I let the outer loop appear yellow and the inner red.
Then I could see how the ratios change when the polywidth
(a horizontal line of it) increases. Outer loop the same, but increased
storage time, so the display became more red at the bottom.
Show me one clock that can do this without making the
outer loop 2 times slower ;) I bet it won't include realtime
visualisation of the timings ;)
------------------------------------------------------------------------
>hehe and what about estimating the ratio outer vs inner loop of a
>polyengine ? I let outer loop appear yellow and inner red.
What about actually _measuring_ it ? :)
>Show me one clock that can perform this without making the
>outer loop 2 times slower ;)
The trick is to measure with varying loop counts. You only take
the time for whole outer loop (or many runs through this loop).
>I bet it won't include realtime
>visualisation of the timings ;)
This is true. But then you might be interested in exact results
instead of a visualization.
--
Michael van Elst
Internet: mle...@serpens.rhein.de
"A potential Snark may lurk in every tree."
|>> JAM> Why don't you use ReadEClock()?
|>> Works great, in fact, when used properly (and when you time sufficient
|>> loops and take some other stuff into account) you can create a routine
|>> which is accurate enough to tell you that instructions like 'Move.l
|>> d0,d0' take 2.00 cycles on a 68030 (so it is accurate to about 1/100th
|>> of a CPU cycle -- I guess that beats counting rasterlines)
JrF> hehe and what about estimating the ratio outer vs inner loop of a
JrF> polyengine ? I let outer loop appear yellow and inner red.
That would be simple. Just time the outerloop + innerloop, and then time the
outerloop alone (i.e. with a non-existent innerloop). Do some math and you
should get an acceptable answer.
Grtz John
> |> Yeah, but when testing multitasking code this can be a bit harder.
> |> ReadEClock() is fast. A scanline is a lot of time. ms are more accurate
> mhm, in which lib is this? OS3.0? Do I need to open something before
> using it?
Yeah, OS3.0. You need to open lowlevel.library, but if you need to
use it on <3.0 systems, you can write your replacement using timer.device.
>Yeah, OS3.0. You need to open lowlevel.library,
In fact you need OS3.1 to have lowlevel.library and it contains just a similar
function called ElapsedTime.
>but if you need to
>use it on <3.0 systems, you can write your replacement using timer.device.
Actually you can just call ReadEClock() from the timer.device library.
> >Yeah, OS3.0. You need to open lowlevel.library,
> In fact you need OS3.1 to have lowlevel.library and it contains just a similar
> function called ElapsedTime.
!?!?!? Really? Well, I have an A1200 with OS3.0 and it came with lowlevel.library
(I bought it used). I haven't reinstalled the OS though.
> >but if you need to
> >use it on <3.0 systems, you can write your replacement using timer.device.
> Actually you can just call ReadEClock() from the timer.device library.
Hmmm... True. Seems like I confused both functions :-)