Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Drawing Arcs

60 views
Skip to first unread message

Jacob Wall

unread,
Jul 26, 2008, 9:19:52 PM7/26/08
to
Hello all,

I recently became a little obsessed with trying to speed up the
drawing of arcs on the HP 50g because the ARC command is somewhat
slow. I searched previous posts and found the question had been
brought up before but I couldn't find any real answers or attempts by
others to tackle the issue. So after searching the internet for ideas
I cobbled together a SysRPL program that is faster than the ARC
command but I still feel it could be optimized, which is why I'm
posting it here to see if anyone might be interested in pointing out
ways to optimize it, both for speed and size.

A few notes:
- the program assumes DEG mode set
- the program uses the midpoint circle algorithm (aka Bresenham's
circle algorithm)
- Inputs (5 reals) are as follows:
- 5: X0 **pixel x coordinate for arc centre (ex: 65)
- 4: Y0 **pixel y coordinate for arc centre (ex: 40)
- 3: B1 **bearing from arc centre to beginning of arc (ex:
45, bearings are clockwise from North)
- 2: B2 **bearing from arc centre to end of arc (ex: 90,
bearings are clockwise from North)
- 1: y **Radius
- the program is set up for demonstration purpose, therefore the use
of DOCLLCD, PIXON, and WaitForKey
- program size is ~850 bytes as shown
- to draw a full circle at X0=65, Y0=40, B1=0, B2=360, y=30 takes ~2.0
seconds
- to draw same circle with ARC command takes ~4.2 seconds


::
CK5NOLASTWD
DOCLLCD
TURNMENUOFF
%1
OVER
%-
%0
DUP
{
LAM X0
LAM Y0
LAM B1
LAM B2
LAM y
LAM d
LAM x
LAM A
}
BIND
BEGIN
LAM d
%0<
ITE
::
LAM d
LAM x
%2
%*
%3
%+
%+
'
LAM d
STO
;
::
LAM d
LAM x
%2
%*
LAM y
%2
%*
%-
%5
%+
%+
'
LAM d
STO
LAM y
%1-
'
LAM y
STO
;
LAM x
LAM y
%REC>%POL
%360
%MOD
'
LAM A
STO
DROP
% 90.
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%+
LAM Y0
LAM y
%-
COERCE2
PIXON
;
LAM A
LAM B1
%>=
LAM A
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%+
LAM Y0
LAM x
%-
COERCE2
PIXON
;
%180
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%+
LAM Y0
LAM x
%+
COERCE2
PIXON
;
% 90.
LAM A
%+
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%+
LAM Y0
LAM y
%+
COERCE2
PIXON
;
% 270.
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%-
LAM Y0
LAM y
%+
COERCE2
PIXON
;
%180
LAM A
%+
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%-
LAM Y0
LAM x
%+
COERCE2
PIXON
;
%360
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%-
LAM Y0
LAM x
%-
COERCE2
PIXON
;
% 270.
LAM A
%+
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%-
LAM Y0
LAM y
%-
COERCE2
PIXON
;
LAM x
%1+
'
LAM x
STO
LAM x
LAM y
%>
UNTIL
SetDAsTemp
WaitForKey
2DROP
ABND
;
@

Any suggestions would be appreciated.

Jacob

Andreas Möller

unread,
Jul 27, 2008, 8:07:17 AM7/27/08
to
Hello,

use NULLLAMs instead of named LAMs. Accessing them is a lot faster as
their position in memory is caluclated wheras named LAMs are searched.
As a matter of fact you can use any fixed address for binding a LAM, I
usually use TRUE for this because it is on an even address (IIRC) but
this should not matter on a 50G where the SATURN is emulated.
To access NULLLAMs above the current temporary enviroment you have to
add one for the protection word.
You can used DEFINE to make your code more readable with NULLLAMs .

Also according to the timings of EMU48 'case' is faster than 'ITE'.

HTH,
Andreas
http://www.software49g.gmxhome.de

Raymond Del Tondo

unread,
Jul 27, 2008, 9:35:35 AM7/27/08
to
Hello,

for REALLY FAST arc and circle drawing routines,
you might want to take a look at Mark Power's web page:

http://www.btinternet.com/~mark.power/hp48.htm

or more exactly :
http://www.btinternet.com/~mark.power/hp48/glib07.zip

The binary was made for the real HP-48 series,
but since the sources are included (HP/JAZZ syntax) ,
you will be able to port them to the 50g.


HTH

Raymond

"Jacob Wall" <j8w...@hotmail.com> schrieb im Newsbeitrag
news:57ba5eeb-fd0c-46f9...@w1g2000prk.googlegroups.com...


> Hello all,
>
> I recently became a little obsessed with trying to speed up the
> drawing of arcs on the HP 50g because the ARC command is somewhat
> slow. I searched previous posts and found the question had been
> brought up before but I couldn't find any real answers or attempts by
> others to tackle the issue. So after searching the internet for ideas
> I cobbled together a SysRPL program that is faster than the ARC
> command but I still feel it could be optimized, which is why I'm
> posting it here to see if anyone might be interested in pointing out
> ways to optimize it, both for speed and size.
>

> [..]

Jacob Wall

unread,
Jul 27, 2008, 12:52:59 PM7/27/08
to
Andreas, thanks for the NULLLAMs tip, I was not aware of the
advantages of using NULLLAMs, and in fact it makes quite a difference,
the circle that took ~2.0 seconds to draw only takes ~1.26 seconds to
draw after the changes. I did not notice any real gains when
substituting 'case' for 'ITE'

Raymond, I took a look at Mark Power's web page and looked at the
documentation for his graphics library. I only see a Circle command,
no Arc command. Also I unfortunately know nothing of machine code,
but thanks to the comments in the source file I can tell the same
midpoint circle algorithm is used. When I remove the tests from my
program that decide whether or not the pixel is within the arc or not,
the circle that now takes ~1.26 seconds to draw, only takes ~0.6
seconds. I'm sure that would be (much?) faster in machine code, but
it is beyond my knowledge.

So if anyone is interested in creating a machine code version of my
program I'd be quite interested, seeing as how I don't know how to do
it.

Now, for short arcs (<45 degrees of arc) the ARC command is still
faster than my program, however at 180 degrees of arc, ARC takes more
than double the time, and the full circle more than triple the time.
If an arc of any degree could be drawn in under 0.5 seconds at say the
30 pixel radius......

Jacob

Raymond Del Tondo

unread,
Jul 27, 2008, 7:13:39 PM7/27/08
to
Hi,

right, I just saw Mark's Graphics Library doesn't include an ARC function.

However the the CIRCLE command ist very fast, between less
than 100ms (0.1 sec!) for small circles and about 350ms for bigger ones,
like 75 pixel radius (150 pix diameter;-)

Please note that these times were measured on a real HP-48SX,
with 2 (TWO) MHz CPU clock!

On the doorstop series (49,50) these routines may be even faster.

I didn't dig into the sources so far,
maybe someone with more free time could do that,
to add an ARC like cmd?

BTW:
At least in the real HP-48 series calcs the PIXON cmd is the braking part,
even the internal versions of PIXON/PIXOFF/PIXTOG are very slow,
so these were the essentials to be replaced, which was done in Mark's lib.

Raymond

"Jacob Wall" <j8w...@hotmail.com> schrieb im Newsbeitrag

news:b0be004b-5d43-4599...@w39g2000prb.googlegroups.com...

cyrille de brebisson

unread,
Jul 27, 2008, 9:40:01 PM7/27/08
to
hello,

why don't you try to do it in ASM? it will be faster... amd nore fun to
code...

cyrille

"Jacob Wall" <j8w...@hotmail.com> wrote in message
news:57ba5eeb-fd0c-46f9...@w1g2000prk.googlegroups.com...


> Hello all,
>
> I recently became a little obsessed with trying to speed up the
> drawing of arcs on the HP 50g because the ARC command is somewhat
> slow. I searched previous posts and found the question had been
> brought up before but I couldn't find any real answers or attempts by
> others to tackle the issue. So after searching the internet for ideas
> I cobbled together a SysRPL program that is faster than the ARC
> command but I still feel it could be optimized, which is why I'm
> posting it here to see if anyone might be interested in pointing out
> ways to optimize it, both for speed and size.
>

> A few notes:=

Jacob Wall

unread,
Jul 28, 2008, 1:02:04 AM7/28/08
to
Raymond, agreed that the CIRCLE command is extremely fast from Mark's
library, I tried it with Emu48 emulating a 48gx, with "Authentic
Calculator Speed" set (although when comparing side by side with Emu48
emulating the 50g and a real 50g, there are discrepancies, although
not too major) Having said that I am not sure that the PIXON in the
50g is the culprit, I think its the fact that for each loop there are
24 tests (%>=, %<=, AND, each appearing 8 times) to decide whether the
pixel of each octant is within the arc, and furthermore COERCE2 also
appears 8 times each loop, if there was such a thing as negative
BINTS...... I got rid of the %REC>%POL and substituted with %ANGLE
which yielded a slight improvement. (~1.14 seconds for the 30 pixel
radius circle, ~0.86 seconds for half circle, ~0.72 seconds for
quarter) Actually now looking at those numbers, maybe PIXON is a
factor, but regardless, even bare minimum the program takes ~0.57
seconds, with zero degrees of arc.

Cyrille, having just recently got into SysRPL and ASM looking like a
different animal altogether, I think I'll hold off just a little on
trying my hand at it, although I suspect that in order to get the
desired speed, ASM will need to be employed.

Jacob

Claudio Lapilli

unread,
Jul 29, 2008, 8:12:30 PM7/29/08
to
On Jul 28, 1:02 am, Jacob Wall <j8w1...@hotmail.com> wrote:
<...>

> Having said that I am not sure that the PIXON in the
> 50g is the culprit, I think its the fact that for each loop there are

<...>


That's easy to test: simply replace PIXON with the proper number of
DROPs and run the same circle with and without PIXON. The difference
is exactly the overhead introduced by PIXON.

Claudio

Jacob Wall

unread,
Jul 29, 2008, 8:44:53 PM7/29/08
to

Yes so easy in fact, I didn't think of it. After replacing PIXON with
2DROP, the 30 pixel radius circle took ~1.02 seconds, which is ~0.11
seconds faster than my best performance with PIXON, not very
significant. Thanks Claudio, that clears the PIXON question.

Jacob

cyrille de brebisson

unread,
Jul 30, 2008, 11:06:46 AM7/30/08
to
hello,

most of the time is probably spent in the memory allocation used whenever a
new object is created. If you can simplify your program so that there are
less objects created, it will run faster.

you can also try to make sure that only small integers are used as they do
not need to be created as they are already present in ROM.

cyrille

"Jacob Wall" <j8w...@hotmail.com> wrote in message

news:7a89d114-f069-4732...@p10g2000prf.googlegroups.com...

Claudio Lapilli

unread,
Jul 30, 2008, 12:48:08 PM7/30/08
to
I have a few suggestions to optimize your routine:

Do not use angles inside the loop. I think it will be a lot faster if
you determine the start and end pixel count for each octant outside
the loop. By pixel count I mean the number of iterations in the
Bressenham loop, which will be pixels in the X direction for some
octants and in the Y direction for others.
The start and end points will therefore be integers, which will allow
you to completely remove real arithmetic inside the loop. Using only
bints will be a lot faster.
So you run the full octant iteration, and only call PIXON for pixels
between the start and end points for each quadrant, removing from the
loop all real comparisons, the conversion from real to polar and the
MOD operation, these last two being very expensive.

Claudio

Jacob Wall

unread,
Aug 1, 2008, 2:02:54 AM8/1/08
to
Hello,

Thanks to all of you who offered suggestions and insight, I've gained
more knowledge about the workings of these machines as a result.

Claudio, another great tip to calculate pixel ranges for each octant
beforehand. This certainly improves performance, especially when the
octant in question is not part of the desired arc, which tells me that
the addition/subtraction of x & y to center X & Y to determine the
pixel to be turned on is where I can still improve it, which as
Cyrille pointed out would be quicker if integers were used. I'll look
further into this in the days to come and will post the new and
improved version of the program once it's complete if anyone is
interested.

As it stands the 30 pixel radius arc times are:
full circle: ~0.88 seconds
half circle: ~0.61 seconds
quarter circle: ~0.48 seconds
eighth circle: ~0.43 seconds
minimum runtime using 1 degree of arc is ~0.36 seconds

Thanks again,

Jacob

Jacob Wall

unread,
Aug 2, 2008, 8:40:17 PM8/2/08
to
Hello,

Just to follow up, not too long ago I wished for:


"If an arc of any degree could be drawn in under 0.5 seconds at say
the
30 pixel radius...... "

Well that is now a reality. Here are the times with the new and
improved program: (30 pixel radius)
full circle: ~0.38 seconds
half circle: ~0.27 seconds
quarter circle: ~0.22 seconds
eight circle: ~0.21 seconds
minimum runtime using 0 degree of arc is ~0.17 seconds

The only thing now is that the program is 1180 Bytes, or about 330
Bytes bigger than the original program.

Below is the new program source, still the same 5 inputs as before.

::
CK5NOLASTWD
DOCLLCD
TURNMENUOFF
5ROLL
5ROLL
COERCE2
5UNROLL
5UNROLL
DUP
COERCE
%1
3PICK
%-
BINT0
DUP
5ROLL
DUP
%*
%2
%/
%SQRT
COERCE
BINT2
{}N
BINT8
NDUPN
DROP
NULLLAM
BINT15
NDUPN
DOBIND
11GETLAM
UNCOERCE
12GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
12GETLAM
COERCE
11GETLAM
UNCOERCE
13GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
13GETLAM
COERCE
BINT3
BINT1
DO
BINT9
BINT1
DO
DUP
BINT45
INDEX@
#*
#1+
#<
IT
::
DROP
INDEX@
::
DUP
BINT1
#=case
TrueTrue
DUP
BINT2
#=case
FalseFalse
DUP
BINT3
#=case
TrueFalse
DUP
BINT4
#=case
FALSETRUE
DUP
BINT5
#=case
TrueTrue
DUP
BINT6
#=case
FalseFalse
DUP
BINT7
#=case
TrueFalse
DUP
BINT8
#=case
FALSETRUE
;
ITE
::
4ROLL
DROP
;
::
ROT
DROP
;
ITE
::
SWAP
JINDEX@
INDEX@
GETLAM
PUTLIST
INDEX@
PUTLAM
;
::
SWAP
BINT3
JINDEX@
#-
INDEX@
GETLAM
PUTLIST
INDEX@
PUTLAM
;
DUP
BINT14
JINDEX@
#-
PUTLAM
ISTOPSTO
;
LOOP
LOOP
13GETLAM
12GETLAM
#>
BINT9
BINT1
DO
DUP
INDEX@
12GETLAM
#>
INDEX@
13GETLAM
#<
ROT
ITE
AND
OR
IT
::
BINT0
INDEX@
PUTLAM
;
LOOP
DROP
BEGIN
::
10GETLAM
%0<
case
::
10GETLAM
9GETLAM
UNCOERCE


%2
%*
%3
%+
%+

10PUTLAM
;
::
10GETLAM
9GETLAM
UNCOERCE
%2
%*
11GETLAM
UNCOERCE


%2
%*
%-
%5
%+
%+

10PUTLAM
11GETLAM
#1-
11PUTLAM
;
;
1GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#+
14GETLAM
11GETLAM
#-
PIXON
;
;
DROP
2GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#+
14GETLAM
9GETLAM
#-
PIXON
;
;
DROP
3GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#+
14GETLAM
9GETLAM
#+
PIXON
;
;
DROP
4GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#+
14GETLAM
11GETLAM
#+
PIXON
;
;
DROP
5GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#-
14GETLAM
11GETLAM
#+
PIXON
;
;
DROP
6GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#-
14GETLAM
9GETLAM
#+
PIXON
;
;
DROP
7GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#-
14GETLAM
9GETLAM
#-
PIXON
;
;
DROP
8GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#-
14GETLAM
11GETLAM
#-
PIXON
;
;
DROP
9GETLAM
#1+
DUP
9PUTLAM
11GETLAM
#>
UNTIL
ABND
SetDAsTemp
WaitForKey
2DROP
;
@

Jacob

Raymond Del Tondo

unread,
Aug 3, 2008, 5:53:42 AM8/3/08
to
Hello,

now that's an improvement.

However on a real HP-48SX your program takes ~2.9 secs for the full circle,
and ~1.9 secs on an HP-48GX ;-)

Raymond


"Jacob Wall" <j8w...@hotmail.com> schrieb im Newsbeitrag

news:809075ce-44fa-4194...@a8g2000prf.googlegroups.com...


> Hello,
>
> Just to follow up, not too long ago I wished for:
> "If an arc of any degree could be drawn in under 0.5 seconds at say
> the
> 30 pixel radius...... "
>
> Well that is now a reality. Here are the times with the new and
> improved program: (30 pixel radius)
> full circle: ~0.38 seconds
> half circle: ~0.27 seconds
> quarter circle: ~0.22 seconds
> eight circle: ~0.21 seconds
> minimum runtime using 0 degree of arc is ~0.17 seconds
>
> The only thing now is that the program is 1180 Bytes, or about 330
> Bytes bigger than the original program.
>
> Below is the new program source, still the same 5 inputs as before.
>

> [..]
>
> Jacob


Claudio Lapilli

unread,
Aug 3, 2008, 10:20:22 AM8/3/08
to
On Aug 2, 8:40 pm, Jacob Wall <j8w1...@hotmail.com> wrote:
> Hello,
>
> Just to follow up, not too long ago I wished for:
> "If an arc of any degree could be drawn in under 0.5 seconds at say
> the
> 30 pixel radius...... "
>
> Well that is now a reality.  Here are the times with the new and
> improved program: (30 pixel radius)
> full circle: ~0.38 seconds


Cool!
Cutting execution time by half always feels good, doesn't it?... :-)

I have one more idea, that could make it slower or faster (who knows
until you test it). The overhead of running the Bresenham loop is not
too much, and your program seems to have a lot of overhead by trying
to manage all octants at the same time (using lists, and many case's
and ITE's).
Now here's another way of doing it, which could potentially be faster
due to less object handling:

1) Calculate all initial parameters for the loop just like you are
doing now, but save them on a LAM for later use.
2) Run one Bresenham loop for each octant (reusing the saved
parameters).

The advantage is obvious for arcs that don't involve all octants: you
don't need to run the loop, so it will execute faster in that case.
Another advantage is that for any partial octants, you can stop the
loop as soon as you reach the stopping point.
The worst case would be the full circle, in this case you would be
running 8 independent complete loops. In this case, it may run slower
than your current routine, but not too much I think, since speed loss
may be compensated somewhat because you'll be using less LAM's, no
lists, etc.

And you could leave both algorithms, and choose which one to use at
run time based on the start and end angles, using your current routine
for arcs that involve several octants (you could measure the timings
on both routines to determine what's the best "switching" angle of
aperture).


It all depends if you want to keep breaking records or you are
satisfied with what you got... :-)

Regards,
Claudio

Jacob Wall

unread,
Aug 3, 2008, 10:30:36 PM8/3/08
to
Claudio, yeah I'm curious to see what difference it would make to run
one Bresenham loop for each octant, and time permitting I will look
into it.

> It all depends if you want to keep breaking records or you are
> satisfied with what you got... :-)

For now I am satisfied, but as Raymond pointed out, the program still
takes a little too long if run on the 48. Appreciate you taking the
time to run the program on the 48 Raymond. That really makes Mark
Power's machine code graphics library that much more impressive. As a
new to SysRPL programmer, I'm just happy that I can draw arcs and
circles now and not have to wait and wonder if my calculator decided
to go on strike :-)

Jacob

Yann

unread,
Aug 5, 2008, 11:29:41 AM8/5/08
to
Jacob, your proposed exercice is very interesting,
Since you took good care and had great results with SysRPL, would you
mind if i try to have a look at ASM side, which i'm currently
learning ?

I only own an HP48SX myself, but it is probably possible to create a
code which would be compatible with both HP48S and your HP50

Regards

Yann

unread,
Aug 5, 2008, 11:33:09 AM8/5/08
to

cyrille de brebisson

unread,
Aug 5, 2008, 5:19:35 PM8/5/08
to
hello,

you can use that for a base.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Circle asm toolbox %
% %
% input: D0: @ grob %
% Aa: Centre x %
% Ba: Centre y %
% Ca: Rayon %
% the circle must be in the -2047, 2047 square %
% Grob can not be bigger than 2048 pixels %
% %
% 5 different calls: %
% aCircleW Draw a White line %
% aCircleG1 Draw a light gray line %
% aCircleG2 Draw a dark gray line %
% aCircleB Draw a Black line %
% aCircleXor Invert a line %
% %
% Uses: Carry, P, RSTK2 %
% D0 (Last pix Plane1) %
% D1 (Last pix plane2) %
% R3a: +/- Line Width %
% R4a: Plane length %
% Cs: Pixel Mask %
% As: Undefine %
% Bs: Undefine %
% IRAMBUFF (50 nibbles) %
% %
% All functions work either in W&B or gray scale %
% All functions are using standard bresenham algo %
% D0, D1 are pointing on the 2 bit plans. %
% During the process, %
% Am: 4*X Error %
% Bm: 4*Y Error %
% Cm: Error %
% Dm: 2* Radius %
% Ax: XCurrent %
% Bx: XMax (Grob Width) %
% Cx: YCurrent %
% Dx: YMax (Grob Height) %
% D0: Pointer on bit plan 1 %
% D1: Pointer on bit plan 2 %
% Cs: Pixel mask %
% R3a: Line Width or - Line Width %
% R4a: Plan length %
% IRAMBUFF: Pixon routine %
% %
% The Bresenham algo is use to draw the circle. %
% The original Bresenham use to compute the circle %
% Only on 1/8 of the circle and draw the other part%
% using symetrie. Doing a full pixon is quite slow %
% with a saturn, so, we are using 8 times %
% bresenham to draw suxcecivly the 8 octants of %
% the circle. Am, Bm and Cm are use to store the %
% values needed by bresenam, Ax, Bx, Cx and Dx are %
% used to store the current pixel coordinates %
% and the grob dimention to perform the cliping. %
% D0, D1 and Cs are use to point on the current %
% pixel. %
% %
% The algo we are using is: %
% X=2*Rayon-1 %
% Y=1 %
% cx=cx+rayon %
% cy=cy %
% Error = 0 %
% pixon(Cx, Cy) %
% while Y<X do %
% Begin %
% Error=Error+Y %
% Y=Y+2 %
% cy=cy+1 %
% If error>=X then %
% Begin %
% error= error-X %
% X=X-2 %
% cx=cx-1 %
% End %
% pixon(Cx, Cy) %
% End; %
% while X<>0 do %
% Begin %
% Error=Error-X %
% X=X-2 %
% cx=cx-1 %
% If error<0 then %
% Begin %
% error= error+Y %
% Y=Y+2 %
% cy=cy+1 %
% End %
% pixon(Cx, Cy) %
% End; %
% %
% The same routine is used all the time, but %
% a different Pixon routine is copied in IRAMBUFF %
% for each type of line, and this routine is used %
% to affect the pixels. the _Prepx routine are used%
% to copy the pixon routine in RAM. Note that to %
% decrease teh amount of used register by this %
% routine, the code is charged in Cm using LC with %
% P=3. in order to have a readable code, the LC %
% mnemonic is not used, bu is replaced by the hex %
% opcode: 3x where x is the number of nibbles to %
% load in C-1. at the end of the routine, C[34] are%
% loaded with 00. This means the this part of the %
% Ca register is lost %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*aCircleB
GOSUB _PrepB
GOTO _Circle

*aCircleW
GOSUB _PrepW
GOTO _Circle

*aCircleG1
GOSUB _PrepG1
GOTO _Circle

*aCircleG2
GOSUB _PrepG2
GOTO _Circle

*aCircleXor
GOSUB _PrepXor

*_Circle
A+C.A % Compute cx (cx+Rayon)
C=0.M C+C.A CSL.A CSL.A CSL.W D=C.M % Dm=Rayon*2
CSL.M CSL.M C=A.A A=C.W % Am=Rayon*2=X
GOSUBL aGrey?
D0-10 C=DAT0.A D=C.X % Read Max Y
D0+5 C=DAT0.A BCEX.A RSTK=C % Read Max X, Save cy
D0+5
GOSUBL ComputePixel % Compute pixel mask and D0 and D1
C=RSTK % Restore cy
ASR.M ASR.M % X=Am=Rayon
C=0.M % Y=Cm=0
B=0.M % Er=0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
?A=0.M RTY % Quick exit on Radius=0
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
P=14
% Octant 1 From a=0 to - 45
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C+1.X % Y+2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 2. From a=-45 to -90
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C+1.X % er+Y, Y+2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A
}
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 3 From a=-90 to -135
% Doing the same thing again, but inverting Y and X
C=D.M A=C.M B=0.M C=0.M % ReInitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A-1.X % Y+2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 4. From a=-135 to -180
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A-1.X % er+Y, Y+2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2
}
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 5 From a=-180 to -225
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C-1.X % Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 6. From a=-225 to -270
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C-1.X % er+Y, Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A
}
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 7 From a=-270 to -315
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A+1.X % Y+2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 8. From a=-315 to -360
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A+1.X % er+Y, Y+2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2
}
?A#0.P EXIT % End of octan?
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
P=0 RTN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Return informations on a graphic object %
% %
% Input: D0: @ grob %
% P=0, HEX %
% Output: ST=0/1 fGray %
% 0: Grob W&B %
% 1: Grob Gray %
% R4a: Plane Length in nibbles %
% R3a: Line Width %
% D0: @ first pixel of the screen %
% %
% uses: RSTK1, Ca, Carry, mp %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*aGrey?
R3=A.A A=B.A R4=A.A % Save Aa in R3 and Ba in R4
D0+15 C=DAT0.A C+7.A CSRB.A CSRB.A CBIT=0.0 % Compute line width
D0-5 A=DAT0.A % read Grob height
B=0.A A+A.A % Some special multiplication
?CBIT=0.1 { B+A.A } A+A.A % I know that
?CBIT=0.2 { B+A.A } A+A.A % 1: C is between 0 and
2048/2=1024
?CBIT=0.3 { B+A.A } A+A.A % 2: C is even
?CBIT=0.4 { B+A.A } A+A.A % So I explode the loop,
?CBIT=0.5 { B+A.A } A+A.A % testing only bit 1 to 9
?CBIT=0.6 { B+A.A } A+A.A % of C
?CBIT=0.7 { B+A.A } A+A.A % an other great thing is
?CBIT=0.8 { B+A.A } A+A.A % that C is not change during
this
?CBIT=0.9 { B+A.A } A+A.A % multiplication
?CBIT=0.10 { B+A.A } A+A.A
?CBIT=0.11 { B+A.A } A+A.A
?CBIT=0.12 { B+A.A } A+A.A
?CBIT=0.13 { B+A.A } A+A.A
?CBIT=0.14 { B+A.A } A+A.A
?CBIT=0.15 { B+A.A }
D0-5 A=DAT0.A A-15.A D0+15 % read grob size (only bitplans)
D0 point on the first pixel
ST=0.=fGray ?A=B.A { ST=1.=fGray } % if plane length = grob plane
size, we are in W&B
A=B.A AR4EX.A B=A.A % Save Bitplane size, restore B
CR3EX.A A=C.A % Save Line Width, restore Aa
RTN

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Compute Address of first pixel to draw %
% Input: R3a: Line Width ( Even, < 512 ) %
% R4a: Plane Width %
% St=x fGray %
% Aa: X %
% Ca: Y %
% D0: @ First pixel %
% P=0, HEX %
% %
% return: D0: @ Pixel, D1: @ Pixel plan 2 %
% Cs: Pixel Mask %
% P=0, HEX %
% %
% Uses: Cw, D0, D1, Carry, P, RSTK1 %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*ComputePixel
AD0EX ABEX.A % We must now compute the position of the first
pixel
AR3EX.A C+C.A % D0=Aa Ba=D0 Aa=Ba
?ABIT=0.1 { B+C.A } C+C.A % We Want: D0=@Pix+Y*Width+X/4, D1=D0+PlanSize if
fGray
?ABIT=0.2 { B+C.A } C+C.A % Step 1: Compute B=B+Y*Width ( Width even <=
512 )
?ABIT=0.3 { B+C.A } C+C.A
?ABIT=0.4 { B+C.A } C+C.A
?ABIT=0.5 { B+C.A } C+C.A
?ABIT=0.6 { B+C.A } C+C.A
?ABIT=0.7 { B+C.A } C+C.A
?ABIT=0.8 { B+C.A } C+C.A
?ABIT=0.9 { B+C.A }
AR3EX.A % restore Aa=Ba
CD0EX D0=C % Ca=X
P=C.0 CSRB.A CSRB.A B+C.A % Compute pixel address
C=R4.A C+B.A ?ST=1.=fGray { C=B.A } D1=C % Point on pixel on plan 2 if grey
else, stay on same plan
ABEX.A AD0EX % Restore Ba=Ba, Aa=Aa, D0 point on pixel on plan
1
LC 1248124812481248 P=0 % Compute pixel mask
RTN

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% For many of the graphical primitives, %
% The same routine is used all the time, but %
% a different Pixon routine is copied in IRAMBUFF %
% The _Prepx routine are used %
% to copy the pixon routine in RAM. Note that to %
% decrease teh amount of used register by this %
% routine, the code is charged in Cms using LC with%
% P=3. in order to have a readable code, the LC %
% mnemonic is not used, but is replaced by the hex %
% opcode: 3x where x is the number of nibbles to %
% load in C-1. at the end of the routine, C[34] is %
% loaded with 00. This means that this part of the %
% Ca register is lost %
% All prep routines use Cms, D1 and carry. %
% at the end of the routine, Ca=Ca&00fff %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*_PrepB
D1=(5)=IRAMBUFF P=3
$3B A=DAT0.S A!C.S DAT0=A.S DAT1=C.M D1+12
$38 ?ST=0.=fGray RTY A=DAT1.S DAT1=C.M D1+9
$39 A!C.S DAT1=A.S RTN DAT1=C.M
LC 00 P=0 RTN

*_PrepW
D1=(5)=IRAMBUFF P=3
$3A A=DAT0.S A!C.S A-C.S DAT1=C.M D1+11
$38 DAT0=A.S ?ST=0.=fGray RTY DAT1=C.M D1+9
$3A A=DAT1.S A!C.S A-C.S DAT1=C.M D1+11
$35 DAT1=A.S RTN DAT1=C.M
LC 00 P=0 RTN

*_PrepG1
D1=(5)=IRAMBUFF P=3
$3B A=DAT0.S A!C.S DAT0=A.S DAT1=C.M D1+12
$38 ?ST=0.=fGray RTY A=DAT1.S DAT1=C.M D1+9
$3C A!C.S A-C.S DAT1=A.S RTN DAT1=C.M D1+12 DAT1=C.S
LC 00 P=0 RTN

*_PrepG2
D1=(5)=IRAMBUFF P=3
$3A A=DAT0.S A!C.S A-C.S DAT1=C.M D1+11
$38 DAT0=A.S ?ST=0.=fGray RTY DAT1=C.M D1+9
$3B A=DAT1.S A!C.S DAT1=A.S DAT1=C.M D1+12
$31 RTN DAT1=C.M
LC 00 P=0 RTN

*_PrepXor
D1=(5)=IRAMBUFF P=3
$3A A=DAT0.S B=A.S A!C.S DAT1=C.M D1+11
$3A B&C.S A-B.S DAT0=A.S DAT1=C.M D1+11
$34 ?ST=0.=fGray RTY DAT1=C.M D1+5
$3A A=DAT1.S B=A.S A!C.S DAT1=C.M D1+11
$3C B&C.S A-B.S DAT1=A.S RTN DAT1=C.M D1+12 DAT1=C.S
LC 00 P=0 RTN


"Yann" <kdo...@gmail.com> wrote in message
news:e7958dff-4119-4c78...@m73g2000hsh.googlegroups.com...

Jacob Wall

unread,
Aug 5, 2008, 9:00:04 PM8/5/08
to

Hello Yann,

Yes I would be very interested in seeing what might be possible with
ASM and look forward to seeing your progress.

Jacob

Yann

unread,
Aug 6, 2008, 9:38:44 AM8/6/08
to
Thanks for the code, Cyrille

It seems we had a similar idea, as i started too with a simple Full
Circle algorithm,
in order to provide some "training ground" for arc
(Circle are simpler, there is no need to check bearing, just draw the
pixel).

Hence i already had an SASM code ready by the time i found yours.
Anyway, the goal being to learn, it's always good to be able to
compare.
It seems your code is using a different syntax than SASM, probably
MASD.
I read here and then that MASD syntax is more powerfull than SASM,
but for the time being, i'm not even able to compile such a piece of
code using Debug4x.
Could you give me a hint at how to do that ?

Anyway, for those interested, here are some results for the current
Full Circle algorithm, using an HP48SX :
FastCircle on HP48X :
30 pixels radius (65,30,30) ===> 0.24s
60 pixels radius (30,-10,60) ==> 0.31s
200pixels radios (60,215,200) => 0.75s

This is a first try, so it most probably can be optimized more.
But it was very usefull in helping to understand how to write ASM
Code, and especilly how to initialise graphic area, write pixels, and
so on.

This implementation allows center outside of screen (including
negative numbers), and radius up to 1000 pixels.

The ASM code can draw circles on any Grob of any size.
However, the herebelow proposed binary has a SysRPL header, which
checks arguments, and provide the Graphic Grob as an input to ASM.
Therefore, the circle is drawn in the Graphic Area, visible using
GRAPH command (or LEFT arrow).
A nice side effect is that it should support both 64 & 80 lines
systems without any problem.
And if Graphic Area is "larger than screen", it will make use of its
full size.

Binaries for both HP48 & HP49/50 are available here :
http://www.xooimage.com/app/upload.php
FastCircle V1.0 is 411 Bytes long.

SASM Source Code can also be downloaded here :
http://img22.xooimage.com/files/c/a/2/fastcircle-v1.0-sourcecode-56ad87.zip
Code is in english, but comment are written in french, though....

Cyrille, i tried to compare both source code, but still lack some
experience to properly understand MASD.
It seems your code is much more ambitious, as it can provide GreyScale
output.
You're welcomed for any comment.

OK, now for the more complex arc algorithm...

Regards

cyrille de brebisson

unread,
Aug 6, 2008, 3:39:36 PM8/6/08
to
hello,

if you want to use that source, you probalby want to put a CODEM
ENDCODE
around the assembly part so that it gets compiled by the right compiler in
debug4x...

read the documentation for masd in the advanced user manual for the HP49/50
available online for more info on how it works...

but basicaly, it is simple:
you can have as many instructions per line as you want.
you can separate a feild from an instruction by a '.'
in the case where an instruction is R1=R1operationR2, you can ommit the R1=
part
you can create blocks using { } and use instructions to get in and out of
these blocks such as:
EXIT (Goto end of block), EXITC (exit Carry), EXITNC (exit if no carry), UP
(goto begining of block), UPC, UPNC
and a block opening just after a test means that if the test is true, the
blobk is skipped as in: ?A=B.A { A+A.A } is compiled into
?A=B A
SKIPYES .end
A=A+A A
.end

note, you can exit or go up more than one block at a time using UPn/EXITn
where n is a number as in:
LA 100
{
LC 80
{
?A=C.B UP2
C-1.A UPNC
}
A-1.X UPNC
}

Labels are declared with a leading '*', but since you have these nices
blocks, you rarely need them.
regards, cyrille

"Yann" <kdo...@gmail.com> wrote in message

news:24520f6e-d38f-44ea...@s50g2000hsb.googlegroups.com...

Yann

unread,
Aug 7, 2008, 9:32:12 AM8/7/08
to
Hello Jacob

Here is a first try at an ASM Arc Algorithm :
http://img21.xooimage.com/files/8/6/0/fast-approximative-arc-v1.0-570f4c.zip

I've called it Fast Approximative Arc, because it can be wrong by one
pixel.
As i was not smart enough to properly understand how you did to check
if a pixel must be drawn,
i tried to invent an "homebrew" algorithm, based on circumference
length
(The most famous : Circ = 2 x Pi x R).
Well, on paper, it looks pretty and simple, but it comes with problems
of its own, especially estimating distance from a trail of pixel and
correcting cumulated error from the "real" circle.

As a consequence, this version is not directly comparable with yours.
I would appreciate if you could give me a hand at understanding how
you test [pixel within arc], this way it would be possible to create a
code which use the same algorithm, remaining differences only coming
from programming language.

Well, nonetheless, i'm providing binaries for this version, because it
can still usefull for comparison in the future.
Input is as follows : X0, Y0, Radius (in pixel), StartBearing,
EndBearing (in degree, with 0° being North).
As approximation error increases with Radius, this version limits R to
60 max.

Here are some performance results, from an HP48SX (i expect later
model to perform better) :
30 pixels radius
Full : 0.50s
Half : 0.29s
Quarter : 0.18s
Eight : 0.12s
1° : 0.07s (Note : 0° means "Full Circle", so cannot be tested)

Yann

unread,
Aug 7, 2008, 10:24:27 AM8/7/08
to
Thanks Cyrille for these explanations,
this helped me a lot to understand MASD code, well just a little for
now.

I tried to compile your source, but Debug4x complains of 2 missing
entries :
=IRAMBUFF : which i could find over Internet ( =IRAMBUFF EQU #800F5 )
but is not part of "HP standard" nor "Carsten stable" entry list,
which makes me worry if this mnemonic is valid throughout the entire
HP Calculator range.
=fGray : which i could'nt find anywhere, seems related to GrayScale
plotting ?

I guess this sample code is part of a much bigger system, where it is
supposed to be integrated.

Anyway, it provided some very usefull information on how to optimize
ASM Code, and here are my findings :

1) First visible difference is in the algorithm itself.
I'm using X²+Y²<=R²+Err, while Toolbox only cares of Delta between
each iteration.
This second choice seems more efficient, so i changed my code to
reflect it,
however it produced virtually no difference.
The reason for this is that both methods are using efficient
additions, and any advantage for the second algorithm is probably
dwarfed by other parts of the code, which are much more costly .

Hence i started to do some cycle count to find out, especially on the
inner loop
And results are pretty interesting.

2) Most costly single operation is the multiplication to find the
correct GROB nibble. 360 cycles per pixel !

Solution : instead of relying on #MUL call, your code propose to
explode iterations directly within the code.
This will most certainly bring sizable performance benefits (difficult
to do worse than 360 cycles..), so i will mimic this idea.

3) I tend to use Scratch Registers (R0 to R4) like store/load areas.
Because i need much more variables that there are Scratch registers, i
store several of them per register, using CSLW5, ASLW5, etc. calls to
switch between variables.
Bad idea, each of these calls costs 124 cycles. Quite sizable. i
wasn't expecting that.
There were 6 such calls within the inner loop. By reducing them to 4,
i can already bench some sensible benefits.
I believe this is the current main performance bottleneck, as there
are calls like these everywhere in the code.

Solution : Your code use a much clever field separation, using X & M.
As a consequence, values are limited to -2048/+2047, but this looks
enough for such an algorithm.
What is even more striking, is that you keep most datas into Working
registers (A,B,C,D), seldomly using ScratchRegisters for storing. Now
that's a performance. And i guess this makes some terrible results.
I tend myself to clean Working registers between each part of the
code, only relying on Scratch Register to properly keep datas.
I'm not sure i will be able to go as far. But as a minimum,
eliminating CSLW5-like calls should be a goal, and the X/M field
separation seems a very clever idea to achieve that. I will try to
implement it later, because it will require some significant code
refactoring ( well, nearly rewriting everything...)

4) I had one w->W call within the inner loop (88 cycles), which has
been eliminated.
It immediately produced measurable benefits.

So i guess, one of the conclusions : counting cycles pay off.

5) Coding differences
5.1 ) I tend to re-use as much as possible "long" code sequences,
using GOSUB/RTN and different initialisation variables.
Toolbox code, on the other hand, is very straightforward, even not
relying on circle symetry.
As a consequence, Toolbox is likely to produce longer code, but with a
performance advantage.
For the time being, I think i will keep this part as it is, as i like
the memory footprint advantage.
5.2 ) Toolbox makes little use to external calls, such as POP#, #MUL,
and so on.
On the other hand, i was heavily using them up to now.
As a consequence, Toolbox code is probably longer, but faster, and
even much better, controlable.
For example, POP# fills A.A with Stack's system integer, but also
destroy C & D in the process.
#MUL gives the result in B, but destroy source data within A & C.
I think this is a pretty significant advantage, especially within
inner loops, so i will follow your example.

Well, that's all for now,
Thanks for the tips, this was very interesting to learn,
i will try to make use of them, and produce a better version over the
next few days.

Regards

Jacob Wall

unread,
Aug 7, 2008, 9:09:00 PM8/7/08
to
On Aug 7, 6:32 am, Yann <kdo4...@gmail.com> wrote:
> Hello Jacob
>
> Here is a first try at an ASM Arc Algorithm :http://img21.xooimage.com/files/8/6/0/fast-approximative-arc-v1.0-570...

Hello Yann,

I'll repost the program I came up with and put some explanatory notes
in along with the code to try and explain it. This is something I am
not very good at, but should get in the habit of doing regardless. So
here goes from the top:

***// This is preparing the input for the program and ends up being:
LAM 1 - LAM 8 are all lists of {MIN MAX} pixel in x (or y) direction
and by default I declare all 8 octants as being MIN= BINT0 and MAX=
(square root of half the radius squared), meaning the full octant
would be drawn if not otherwise interfered
LAM 9 = x to be used in bresenham loop, intially BINT0
LAM 10 = d to be used in bresenham loop, intially 1-radius
LAM 11 = y=radius
LAM 12 = bearing to end of arc, clockwise from North (as a surveyor
this makes more sense to me)
LAM 13 = bearing to beginning of arc
LAM 14 = Y0, y pixel coordinate of arc center
LAM 15 = X0, x pixel coordinate of arc center
//***

11GETLAM
UNCOERCE
12GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
12GETLAM
COERCE

***//The above calculates the delta pixel in x and y directions from
center to the end of the arc by using the polar to rectangular %POL>
%REC and leaves in the stack:
3: deltay (Absolute) from center of arc to pixel that defines the end
of the arc as a BINT
2: deltax, same as above
1: bearing to end of arc
NOTE: because my program was designed to be intuitive for myself to
handle compass bearings, the x and y values will be switched later on,
conveniently if you swap x and y rectangular values you can use %REC>
%POL and get compass bearings, but that is really another topic
//***


11GETLAM
UNCOERCE
13GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
13GETLAM
COERCE

***//This does the same as before except for the beginning of the arc
so now we have 6 values in the stack:
6: deltay (Absolute) from center of arc to pixel that defines the end
of the arc as a BINT
5: deltax, same as above BINT
4: bearing to end of arc BINT
3: deltay (Absolute) from center of arc to pixel that defines the
beginning of the arc as a BINT
2: deltax, same as above BINT
1: bearing to beginning of arc BINT
//***

***//Ok, there's a number of things going on the 2 loops going on, the
"outer" loop first takes the bearing to first the beginning of arc in
the first pass, (then bearing to end of arc in second pass) and then
passes it to the "inner" loop to figure out which octant that bearing
falls in, once a match is made, 2 flags are generated that enable the
decisions whether for the corresponding arc, the bresenham loop is
increasing the x or the y values during each iteration and also if the
iterations proceed clockwise or counterclockwise, for example the
first octant if looking at it from north to 45 degrees clockwise uses
the x value as its going through the iterations and the direction is
clockwise, the second octant increases the y value and the iteration
goes counterclockwise. So basically the end product here is that for
the 2 octants, (or 1 octant) these two loops put the correct {MIN MAX}
values in the LAM1-LAM8 lists that will tell the bresenham loop
whether or not to draw the pixel during the iteration.
//***

13GETLAM
12GETLAM
#>
BINT9
BINT1
DO
DUP
INDEX@
12GETLAM
#>
INDEX@
13GETLAM
#<
ROT
ITE
AND
OR
IT
::
BINT0
INDEX@
PUTLAM
;
LOOP
DROP

***//This loop stores BINT0 into any octant LAMs that are not
contained in the arc, (which is not a list, and later a test is
performed to check if the corresponding LAM is a list or not
//***

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
9GETLAM
#+
14GETLAM
11GETLAM
#-
PIXON
;
;
DROP
2GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
11GETLAM
#+
14GETLAM
9GETLAM
#-
PIXON
;
;
DROP
3GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
11GETLAM
#+
14GETLAM
9GETLAM
#+
PIXON
;
;
DROP
4GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
9GETLAM
#+
14GETLAM
11GETLAM
#+
PIXON
;
;
DROP
5GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
9GETLAM
#-
14GETLAM
11GETLAM
#+
PIXON
;
;
DROP
6GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
11GETLAM
#-
14GETLAM
9GETLAM
#+
PIXON
;
;
DROP
7GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
11GETLAM
#-
14GETLAM
9GETLAM
#-
PIXON
;
;
DROP
8GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP

9GETLAM
#< NOT
SWAP
9GETLAM
#> NOT


AND
IT
::
15GETLAM
9GETLAM
#-
14GETLAM
11GETLAM
#-
PIXON
;
;
DROP
9GETLAM
#1+
DUP
9PUTLAM
11GETLAM
#>
UNTIL

***//So that is the bresenham loop, and because the x and y values are
swapped (for surveyor piece of mind) you will find that even though
the algorithm is the same its simply set up differently to draw the
pixel I want it to. But for each octant it test if the pixel in
question during the iteration falls within the range that is {MIN MAX}
for that octant.
//***

ABND
SetDAsTemp
WaitForKey
2DROP
;
@

I hope this helps at least give the concept of what I was trying to
do. There are a number of different ways to attack this problem (and
Claudio seems to at least one more idea I still haven't had the time
to try out).

My program I should also modify to allow GRAD mode set, and a few
other tests to make it more universal...

Jacob

P.S. Thanks for the link to your binaries, Most likely a stupid
question, but when I run the program (with 5 reals in the stack, same
as the ones I was using or not?) I see only a blank screen, I did the
PICT RCL and saw only a blank grob, What am I doing wrong?

cyrille de brebisson

unread,
Aug 8, 2008, 9:08:22 AM8/8/08
to
hello,

> =fGray : which i could'nt find anywhere, seems related to GrayScale
plotting ?

fGray is a flag, you can replace it by any of the status flags you want...

>2) Most costly single operation is the multiplication to find the
>correct GROB nibble. 360 cycles per pixel !

That is why my code does NOT do a multiplication for the 'PIXON', but
instead moves a pointer and a mask in the register. this saves a LOT of
time!

try to time the following code on a calculator:
"SAVE
A=0.A R0=A.A
{
D0=(5)$806D5 A=DAT0.A D0=A
LC 00015 A=C.A B=C.A C=R0.A
GOSBVL 16642

A=R0.A A+2.A R0=A.A
LC 00030 ?A<C.A UP
}
LOADRPL
@"

>3) I tend to use Scratch Registers (R0 to R4) like store/load areas.
>Because i need much more variables that there are Scratch registers, i
>store several of them per register, using CSLW5, ASLW5, etc. calls to
>switch between variables.
>Bad idea, each of these calls costs 124 cycles. Quite sizable. i
>wasn't expecting that.
>There were 6 such calls within the inner loop. By reducing them to 4,
>i can already bench some sensible benefits.
>I believe this is the current main performance bottleneck, as there
>are calls like these everywhere in the code.

Yes, very bad idea. In addition, as you can see from my code, there is only
a need for 1 scratch register use in the inner loop.
what I would do if I was you is modify the code that I gave you to calculate
the X/Y coordinate for the first and last points. find which octans they are
in and calculate how many pixels to 'skip' store that data in D1 (which is
not used if you do not use the gray stuff), and go from there...

>So i guess, one of the conclusions : counting cycles pay off.

actually, it does not much. Once you know enough about optimization, you
usually don't have to count cycles anymore... you intuitively do the right
thing.

the whole point of optimization is: MAKE ALL INSTRUCTION DO STUFF THAT
ADVANCE THE PURPOSE.
a memory store or load, a register move does NOT advance the purpose and
should be eliminated (appart if the store is the desired effect of the
algo)... so, minimize all that and you will go faster...

>For the time being, I think i will keep this part as it is, as i like the
>memory footprint advantage.

there is plently of memory, don't care for it... just expand your code as
needed... especially if it allows to remove stuff from critical loops...

5.2 ) Toolbox makes little use to external calls, such as POP#, #MUL, and so
on.

yep, they never do exactly what you want, therefore forcing you to move
stuff aroudn in register, clutering precious scrtach registers and therefore
slowing you down...

regards, cyrille


cyrille de brebisson

unread,
Aug 8, 2008, 11:39:57 AM8/8/08
to
hello,

here is the code in ASM for an arc.

this is a source from a debug4x project, so it should compile readely if you
create a project and add this file in (make the project a library)...

the angles are given in a counter trigo notation (90° is south), and must be
multiplied by 8.9360 and some change before sending to the ASM function.
I will let you write the SysRPL wraper around, but basicaly, it should take:
6: Grob
5: X
4: Y
3: R
2: A1
1: A2
, mutliply the angles by the constant and coerce them (assumes that the
angles are 0<=Angle<=360)
and then call the circle function.

regards, cyrille


RPL
( C:\Documents and Settings\cydeb\Desktop\circle\circ.S, part of the cr.HPP
project, created by <> on 8/8/2008 )

INCLUDE cr.h

xNAME CC
::

CODEM

SAVE
LC 00000 R2=C.A
{
LC 0010C R0=C.A
C+C.A R1=C.A
D0=(5)$806D5 A=DAT0.A D0=A
LA 00045 LC 00020 B=C.A C=R2.A
GOSUB _Circle

A=R2.A A+2.A R2=A.A


LC 00030 ?A<C.A UP
}
LOADRPL

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


% Circle asm toolbox %
% %
% input: D0: @ grob %
% Aa: Centre x %
% Ba: Centre y %
% Ca: Rayon %

% R0a: 360-AngleStart*8.93608577019 90°->804%
% R1a: 360-AngleEnd*8.93608577019 %


% the circle must be in the -2047, 2047 square %
% Grob can not be bigger than 2048 pixels %
% %

% Uses: Carry, P, RSTK2 %
% D0 (Last pix Plane1) %
% D1 (Last pix plane2) %
% R3a: +/- Line Width %

% Cs: Pixel Mask %
% As: Undefine %
% Bs: Undefine %
% IRAMBUFF (50 nibbles) %
% %
% All functions work either in W&B or gray scale %
% All functions are using standard bresenham algo %
% D0, D1 are pointing on the 2 bit plans. %
% During the process, %
% Am: 4*X Error %
% Bm: 4*Y Error %
% Cm: Error %
% Dm: 2* Radius %
% Ax: XCurrent %
% Bx: XMax (Grob Width) %
% Cx: YCurrent %
% Dx: YMax (Grob Height) %
% D0: Pointer on bit plan 1 %
% D1: Pointer on bit plan 2 %
% Cs: Pixel mask %
% R3a: Line Width or - Line Width %

*_Circle
ST=0.0 % by default, start with pixel not drawn...
AR1EX.A CR0EX.A ?C<A.A { ST=1.0 ACEX.A } A-C.A AR0EX.A %Ca: start angle, Aa:
radius, R0: A2-A1, R1: Cx
% D= C*A/512 -> D= Angle*8.9*R/512 -> A*2*pi*R/360 -> number of pixels from
0° to Angle
D=0.A
?ABIT=0.0 { D+C.A } C+C.A
?ABIT=0.1 { D+C.A } C+C.A
?ABIT=0.2 { D+C.A } C+C.A
?ABIT=0.3 { D+C.A } C+C.A
?ABIT=0.4 { D+C.A } C+C.A
?ABIT=0.5 { D+C.A } C+C.A
?ABIT=0.6 { D+C.A } C+C.A
?ABIT=0.7 { D+C.A } DSRB.A
?ABIT=0.8 { D+C.A } DSRB.A
?ABIT=0.9 { D+C.A } DSRB.A
?ABIT=0.10 { D+C.A } DSRB.A
?ABIT=0.11 { D+C.A } DSRB.A
DSR.A
C=D.A D1=C C=R0.A % D1: number of pixel to first change Ca: A2-A1
D=0.A
?ABIT=0.0 { D+C.A } C+C.A
?ABIT=0.1 { D+C.A } C+C.A
?ABIT=0.2 { D+C.A } C+C.A
?ABIT=0.3 { D+C.A } C+C.A
?ABIT=0.4 { D+C.A } C+C.A
?ABIT=0.5 { D+C.A } C+C.A
?ABIT=0.6 { D+C.A } C+C.A
?ABIT=0.7 { D+C.A } DSRB.A
?ABIT=0.8 { D+C.A } DSRB.A
?ABIT=0.9 { D+C.A } DSRB.A
?ABIT=0.10 { D+C.A } DSRB.A
?ABIT=0.11 { D+C.A } DSRB.A
DSR.A
C=D.A ACEX.A AR1EX.A % R1: number of pixel from A1 to A2, Ca: radius, Aa:
Center X

A+C.A % Compute cx (cx+Rayon)
C=0.M C+C.A CSL.A CSL.A CSL.W D=C.M % Dm=Rayon*2
CSL.M CSL.M C=A.A A=C.W % Am=Rayon*2=X

GOSUBL _aGrey?


D0-10 C=DAT0.A D=C.X % Read Max Y
D0+5 C=DAT0.A BCEX.A RSTK=C % Read Max X, Save cy
D0+5

GOSUBL _ComputePixel % Compute pixel mask and D0 and D1


C=RSTK % Restore cy
ASR.M ASR.M % X=Am=Rayon
C=0.M % Y=Cm=0
B=0.M % Er=0

ST=0.0 % No Pixon for the moment
{ ?ST=0.0 EXIT ?A>=B.X EXIT ?C>=D.X EXIT A=DAT0.S A!C.S DAT0=A.S } % Pixon

( With cliping )
?A=0.M RTY % Quick exit on Radius=0
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
P=14
% Octant 1 From a=0 to - 45
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C+1.X % Y+2, cy+1

CR3EX.W % cy+1 part2
AD0EX A+C.A AD0EX
CR3EX.W

D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1

CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2
} % Else, error<0
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}


% Octant 2. From a=-45 to -90
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1

CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2
D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter


?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C+1.X % er+Y, Y+2, cy+1

CR3EX.W % cy+1 part2
AD0EX A+C.A AD0EX
CR3EX.W
}
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}


% Octant 3 From a=-90 to -135
% Doing the same thing again, but inverting Y and X
C=D.M A=C.M B=0.M C=0.M % ReInitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A-1.X % Y+2, cx-1

CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2

D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter


?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1

CR3EX.W % cy-1 part2
AD0EX A-C.A AD0EX
CR3EX.W
} % Else, error<0
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}


% Octant 4. From a=-135 to -180
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1

CR3EX.W % cy-1 part2
AD0EX A-C.A AD0EX
CR3EX.W
D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter


?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A-1.X % er+Y, Y+2, cx-1

CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2
}
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}


% Octant 5 From a=-180 to -225
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C-1.X % Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX

CR3EX.A

D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter


?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1

C+C.S SKNC { C+1.S D0+1 } % cx+1 part2
} % Else, error<0
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}


% Octant 6. From a=-225 to -270
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1

C+C.S SKNC { C+1.S D0+1 } % cx+1 part2
D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter


?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C-1.X % er+Y, Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX

CR3EX.A
}
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}


% Octant 7 From a=-270 to -315
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A+1.X % Y+2, cx+1

C+C.S SKNC { C+1.S D0+1 } % cx+1 part2

D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter


?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX

CR3EX.A
} % Else, error<0

?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}


% Octant 8. From a=-315 to -360
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX

CR3EX.A
D1-1 SKNC { CD1EX CR1EX.A CD1EX ?ST=0.0 { } ST=1.0 EXITC ST=0.0 } % count
pixels, turn pixelON Flag on/off as needed, update counter


?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A+1.X % er+Y, Y+2, cx+1

C+C.S SKNC { C+1.S D0+1 } % cx+1 part2


}
?A#0.P EXIT % End of octan?

?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
P=0 RTN

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Return informations on a graphic object %
% %
% Input: D0: @ grob %
% P=0, HEX %
% Output:

% R3a: Line Width %
% D0: @ first pixel of the screen %
% %
% uses: RSTK1, Ca, Carry, mp %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

*_aGrey?
R3=C.A % Save Ca in R3


D0+15 C=DAT0.A C+7.A CSRB.A CSRB.A CBIT=0.0 % Compute line width

D0+5 % read grob size (only bitplans)

D0 point on the first pixel

CR3EX.A % Save Line Width
RTN

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Compute Address of first pixel to draw %
% Input: R3a: Line Width ( Even, < 512 ) %

% Aa: X %
% Ca: Y %
% D0: @ First pixel %
% P=0, HEX %
% %
% return: D0: @ Pixel, D1: @ Pixel plan 2 %
% Cs: Pixel Mask %
% P=0, HEX %
% %
% Uses: Cw, D0, D1, Carry, P, RSTK1 %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

*_ComputePixel


AD0EX ABEX.A % We must now compute the position of the first
pixel
AR3EX.A C+C.A % D0=Aa Ba=D0 Aa=Ba
?ABIT=0.1 { B+C.A } C+C.A % We Want: D0=@Pix+Y*Width+X/4, D1=D0+PlanSize if
fGray
?ABIT=0.2 { B+C.A } C+C.A % Step 1: Compute B=B+Y*Width ( Width even <=
512 )
?ABIT=0.3 { B+C.A } C+C.A
?ABIT=0.4 { B+C.A } C+C.A
?ABIT=0.5 { B+C.A } C+C.A
?ABIT=0.6 { B+C.A } C+C.A
?ABIT=0.7 { B+C.A } C+C.A
?ABIT=0.8 { B+C.A } C+C.A
?ABIT=0.9 { B+C.A }
AR3EX.A % restore Aa=Ba
CD0EX D0=C % Ca=X
P=C.0 CSRB.A CSRB.A B+C.A % Compute pixel address

ABEX.A AD0EX % Restore Ba=Ba, Aa=Aa, D0 point on pixel on plan
1
LC 1248124812481248 P=0 % Compute pixel mask
RTN

ENDCODE

;


"Yann" <kdo...@gmail.com> wrote in message

news:adfde48e-c901-4980...@w7g2000hsa.googlegroups.com...

cyrille de brebisson

unread,
Aug 8, 2008, 11:52:43 AM8/8/08
to
hello,

BTW, can you time test it as I do not have a 50g here to test it...

cyrille
"cyrille de brebisson" <cyr...@hp.com> wrote in message
news:g7hpcf$shu$1...@usenet01.boi.hp.com...

Yann

unread,
Aug 8, 2008, 9:41:36 PM8/8/08
to
Thanks very much for these informations

Jacob,
i believe your explanations are quite clear. It should prove enough to
start again coding.
I just need to find a way to calculate end/begin x/y without relying
on RPL calls, should it be possible & efficient.

Ah, and regarding this first implementation of "approximative arc",
yes, there is a slight difference in the order of parameters compared
your version :
5 : X0
4 : Y0
3 : Radius
2 : Beginning Arc (°)
1 : End Arc (°)


I've spent the last few days at improving the Circle algorithm,
because this is still a root source for Arc.
And the results are encouraging.

After getting rid of the previous "shift" methodology, i end up with
the following benchmarked results :
FastCircle V2
HP48SX
30 pixels (65,32,30) ==> 92ms
60 pixels (30,-10,60) ==> 98ms
200 pixels (60,215,200)==> 195ms


At this stage, FastCircle seems reasonably well optimised, and does
even compare favorably to Mark Power's Graph Library (http://
www.btinternet.com/~mark.power/hp48.htm) and any other assembler code
from HpCalc.org i could test. I've borrowed some ideas from Mark
Power's code by the end of development, which did not change
performance much (even slowed slightly by 1-2%), but help decrease the
code size.
As a consequence, FastCircle V2 is still 410 Bytes, although the
SysRPL header has increased quite a bit, due to a new feature : the
possibility to draw into any externally defined Grob (optional).

Input :
4 : Grob (Optional, if none provided, will draw into Graphic area)
3 : X0
2 : Y0
1 : Radius
Notes :
(0,0) is upper left.
X0 & Y0 can be negative.
All values are limited to 1000 (Grob Width/Height, X, Y & Radius)

You can download Binaries & Source code here :
http://phantasie.tonempire.net/utilitaires-f7/fastcircle-t58.htm


Cyrille, thank you for all these new inputs,
i will spend some time to digest everything, and try to make use of it
for the next step.
Some parts are still mysterious to me, especially the one where you
explain using a mask instead of a multiplication. well, that may be
because i'm tired after a pretty long trip today...

There are 2 directions which i believe can be investigated to improve
performances a bit more.

1) For drawn pixels, i'm using a hand-made multiplication, which is
faster than standard #MUL call?
For a standard 131 pixels width screen, it costs 163 cycles, instead
of 360 as before.
Quite an improvement, but it is still the most costly operation for a
drawn pixel.

Well, i find it quite a waste to do so many tests for each pixel,
whereas i know since initialisation one of the values, nibble-width.
For example, should i hard-code a fixed 131 pixels, which is 22h
nibbles, i would rather program :
"
C=C+C A
B=B+C A
CSL A
B=B+C A
"
That would be much much faster.

So, instead of running the full "generic" multiplication code, i've
ended up with the idea to actually produce the multiplication code at
initialisation. A kind of "tiny little brother" to modern concept such
as dynamic compilation.

I wonder if this can be considered on HP48, and what would be best way
to achieve this, if this is usefull.
The first idea was to replace the current space for multiplication by
a "writable" area, but that means self-modifying code, and does not
sounds good.
Another possibility would be to create a temporary object, in which
the multiplication code will be created, and link to it. Sure, GOSBVL/
RTN or PC manipulations are not cheap, but well, it should end up with
some interesting gain.
I currently wonder it that's actually an interesting idea.
I should also look a bit more into your mask methodology.

2) For partial circles (out of area sections), especially for large
ones,
the discarding algorithm, although very much improved, still run on a
per-pixel basis.
I guess it *could* be faster to discard whole sections if it can be
easily calculated, at initialisation.
This is still to be investigated at this stage,
I have a simple idea in mind currently, based on squares,
but this is bound to increase code size, so i wonder if this is
actually worth it.


Please feel free to comment on any of these lines Cyrille. I think i
still have a lot to learn from your last posts, so will spend some
time tomorrow to dig in a bit more.

Regards

cyrille de brebisson

unread,
Aug 8, 2008, 10:49:03 PM8/8/08
to
hello,

>Some parts are still mysterious to me, especially the one where you
>explain using a mask instead of a multiplication. well, that may be
>because i'm tired after a pretty long trip today...

ok, here is the trick.
let us say that you KNOW that your first pixel to print is the pixel 2 at
address X.
Load X in D1 and load 4 in Ds (because 4 is 0100 or, pixel number 2 set...)

now, the breasenham algo only works by UP/DOWN or LEFT/RIGHT moves...

if you want to move UP, just increate D1 by the length of line (assuming
line lenght is in R0: AD1EX CR0EX.A A+C.A AD1EX CR0EX.A ) and you are done!
you have the address of the pixel one above in D1 and it's mask in Ds. So,
to do a pixon, just do C=DAT1.S C!D.S DAT1=C.S
if you want to move left: DSRB.S ?D#0.S { D1-1 D+8.S } ie: shift the mask 1
bit to the right, if there is still a bit in Ds, all is ok, else it means
that you need to work on pixel 3 (2^3=8) at the previous address...

if you do that, then you only have to calculate the address and mask for 1
pixel (the first one)...


>1) For drawn pixels, i'm using a hand-made multiplication, which is
>faster than standard #MUL call?
>For a standard 131 pixels width screen, it costs 163 cycles, instead
>of 360 as before.
>Quite an improvement, but it is still the most costly operation for a
>drawn pixel.

heuu, if you are working on a standard size screen, you can calculate the
address/mask for a pixel like this:
%Aa: Y Ca:X R0:@screen
P=C.0 D0=C C=R0.A % save X in P and D0. C=@Screen
A+A.A C+A.A ASL.A C+A.A % C=C+A*2+A*2*16 -> C=C+A*34: ie: C=@line where
pixel is
AD0EX ASRB.A ASRB.A A+C.A D0=A % D0=@ of the pixel
LC 1248124812481248 P=0 % Load in Cs the mask of the pixel to turn on...
that should be around 120 cycles or so and does all the calculations needed
to turn your pixel on, it only modifies A, C, P and D0...

but anyhow, this is not a problem as, as stated above, you realy only need
to do this once at the begining...


>2) For partial circles (out of area sections), especially for large
>ones,
>the discarding algorithm, although very much improved, still run on a
>per-pixel basis.
>I guess it *could* be faster to discard whole sections if it can be
>easily calculated, at initialisation.
>This is still to be investigated at this stage,
>I have a simple idea in mind currently, based on squares,
>but this is bound to increase code size, so i wonder if this is
>actually worth it.

I think that the discard should be done by quadran... just work on the
quadran where there are pixels that needs turning on... this is the best
because easy to calculate the startup error for a quadran and easy to
calcualte the X/Y coordinates of the pixel at the begining of the quadran...

Just stop bothering about size. Size does mater, in some cases, but not in
this one... there is plenty of space on the calculator so don't sweat it....

regards, cyrille


Jacob Wall

unread,
Aug 9, 2008, 2:54:30 PM8/9/08
to
On Aug 7, 6:32 am, Yann <kdo4...@gmail.com> wrote:
> Hello Jacob
>
> Here is a first try at an ASM Arc Algorithm :http://img21.xooimage.com/files/8/6/0/fast-approximative-arc-v1.0-570...

Hello Yann,

I finally have some results from running your program on my 50g, here
are the times for the 30 pixel radius circle:
Full: 0.08 seconds
Half: 0.05 seconds
Quarter: 0.04 seconds
Eighth: 0.03 seconds
1 degree: 0.02 seconds

That is incredible!!!! The arc simply appears instantly.

Cyrille, still working on trying to get your arc program running on my
calculator, for some reason no matter what the input when I call xCC I
momentarily see a "ray" or "beam" of approximately 1 octant (octant 1
as described in your program comments) basically from the center of
the screen out to the extent (or close to) of the screen and the stack
is unchanged. This happens even when I provide no input at all. Any
idea as to what I'm doing wrong? I get the same results on Emu48.

Jacob

Yann

unread,
Aug 9, 2008, 8:07:00 PM8/9/08
to
> now, the breasenham algo only works by UP/DOWN or LEFT/RIGHT moves...
> if you want to move UP, just increate D1 by the length of line (assuming
> line lenght is in R0: AD1EX CR0EX.A A+C.A AD1EX CR0EX.A ) and you are done!

OK, now i understand why you are not using circle symetry :
you need the previous point and its nibble address to determine the
next one.

That's indeed very clever, good performances, low complexity, quite a
dream.
So be sure that i will make use of it for the next development phase.


I've been using the last few hours at testing a previously mentionned
idea : dynamically generate the code for multiplication during
initialisation, and the results are pretty interesting.
Actually, the funny thing is that now the multiplication code is so
fast (28 cycles for a normal 131 pixels screen) that it costs less
than the jump statements around it (48 cycles).
I couldn't find anything faster (& simpler) than :
A=D0 [16] % this in fact is ==> AD0EX D0=A
APCEX [16]
(... here is the dynamically created code ending with : )
APCEX [16] or PC=A [16]

Well anyway, this was interesting to try, and maybe such a concept
could be reused later.
New performance numbers follows :

FastCircle V2.5 on HP48SX (422 Bytes)
30 pixels (65,32,30) ==> 76ms
60 pixels (30,-10,60) ==> 88ms
200 pixels (60,215,200)==> 184ms

At this level of performance, i was wondering if the header was
becoming a bottleneck,
so i released FastCircle PureCode, for programmers, in order to find
out, and here are the results :

FastCircle PureCode v2.5 on HP48SX (299.5 Bytes)
30 pixels (65,32,30) ==> 56ms
60 pixels (30,-10,60) ==> 66ms
200 pixels (60,215,200)==> 161ms

So it seems that the header does actually cost a fixed 20ms time,
which is not unexpected.

You can find the new binaries for all HP platforms at this address :
http://phantasie.tonempire.net/utilitaires-f7/fastcircle-t58.htm

I believe this is the last release for full circle drawing, as it
seems to be a good enough base to work on the real objective of this
thread, Arcs drawing.


> here is the code in ASM for an arc.
> this is a source from a debug4x project, so it should compile readely if you
> create a project and add this file in (make the project a library)...

Thanks Cyrille. The sources codes you provided do indeed compile fine,
both for Arcs & Circles.
But for some reason, i'm unable to get any "drawn pixel" out of it.
Note that it does not crash either.

I've followed instructions to provide inputs directly in registers as
requested


% input: D0: @ grob %
% Aa: Centre x %
% Ba: Centre y %
% Ca: Rayon %
% R0a: 360-AngleStart*8.93608577019 90°->804%
% R1a: 360-AngleEnd*8.93608577019 %

but to no avail, neither for Arc nor for Circle.

Furthermore, it seems the header you provided with the second code is
intentionnally wiping out input data with values of its own.
Therefore, i was expecting a fixed output from running the code, but
got a blank screen.

It could be this is because i'm testing on HP48 S/G, and the results
might be different on HP49 ?


> I finally have some results from running your program on my 50g

Thanks Jacob for taking the time to test it on your calc. Indeed, this
is encouraging.
I believe i will get rid of the "one pixel approximation" in the next
version, thanks to your explanations on algorithm and thanks to
Cyrille too.

Regards

cyrille de brebisson

unread,
Aug 11, 2008, 8:52:26 AM8/11/08
to
hello,

>Thanks Cyrille. The sources codes you provided do indeed compile fine,
>both for Arcs & Circles.
>But for some reason, i'm unable to get any "drawn pixel" out of it.
>Note that it does not crash either.

>I've followed instructions to provide inputs directly in registers as
>requested
% input: D0: @ grob %
% Aa: Centre x %
% Ba: Centre y %
% Ca: Rayon %
% R0a: 360-AngleStart*8.93608577019 90°->804%
% R1a: 360-AngleEnd*8.93608577019 %

>but to no avail, neither for Arc nor for Circle.

does D0 point on the prolog of the GROB object or the data part of the GROB?
if you get it to point on the GROB prolog, it should work....

regards, cyrille


Yann

unread,
Aug 11, 2008, 11:17:44 AM8/11/08
to
Hello Cyrille

> does D0 point on the prolog of the GROB object or the data part of the GROB?
> if you get it to point on the GROB prolog, it should work....
>

Yes, it is the prolog, straight from the stack :

C=DAT1.A
D0=C


Regards

cyrille de brebisson

unread,
Aug 11, 2008, 2:40:04 PM8/11/08
to
hello,

that is strange.

anyhow, here is an updated version with some optimizations (skip of first
quadran if possible (I thouls do the same for the other quadrant, but I do
not have that much more time to spend on it, so if someone wants to spend
the time....

Right now, using emu 48 for timing, it look like it can draw a 23 pixel
diameter circle in around 8ms....

if you need help in trying to implement the quadran skip for the Q2, Q3, Q4,
just send em an email... what needs to be done is implement the skip code
and remove an extra EXIT in the quatran to skip code at the begiing of the
circle routine...

cyrille

RPL
( C:\Documents and Settings\cydeb\Desktop\circle\circ.S, part of the cr.HPP
project, created by <> on 8/8/2008 )

INCLUDE cr.h

xNAME CC
::
CODEM

SAVE A=DAT1.A A+10.A R4=A.A
A=0.A R2=A.A
{
D0=(5)$806D5 A=DAT0.A D0=A D1=A D1+20
% erase screen
A=0.W LC 87 { DAT1=A.16 D1+16 C-1.B UPNC }
% angles 1 and 2

D0=(5)$806D5 A=DAT0.A D0=A
A=0.A R0=A.A LC 00B69 R1=C.A
% X, Y, radius
LA 00045 LC 00020 B=C.A LC 17
GOSUB _Circle P=0

D0=(5)$806D5 A=DAT0.A D0=A
A=R2.A R0=A.A LC 0010C A+C.A LC 00B69 { A-C.A UPNC } A+C.A R1=A.A
% LA 00Af4 R0=A.A LC 0010C A+C.A LC 00B69 { A-C.A UPNC } A+C.A R1=A.A
% X, Y, radius
LA 00045 LC 00020 B=C.A LC 15
GOSUB _Circle P=0
% LC 01000 { C-1.A UPNC }
A=R2.A A+16.A R2=A.A LC 00B69 ?A>=C.A EXIT UP
}
LOADRPL

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Circle asm toolbox %
% %

% input: D0: @ grob %
% Aa: Centre x %
% Ba: Centre y %
% Ca: Rayon %

% R0a: 360-AngleStart*8.4 %
% R1a: 360-AngleEnd*8.4 %


% the circle must be in the -2047, 2047 square %
% Grob can not be bigger than 2048 pixels %
% %
% Uses: Carry, P, RSTK2 %
% D0 (Last pix Plane1) %
% D1 (Last pix plane2) %
% R3a: +/- Line Width %
% Cs: Pixel Mask %
% As: Undefine %
% Bs: Undefine %

% All functions are using standard bresenham algo %

% D0 is pointing on the bit plan. %


% During the process, %
% Am: 4*X Error %
% Bm: 4*Y Error %
% Cm: Error %
% Dm: 2* Radius %
% Ax: XCurrent %
% Bx: XMax (Grob Width) %
% Cx: YCurrent %
% Dx: YMax (Grob Height) %
% D0: Pointer on bit plan 1 %

% D1: Nb Pixel until we turn pixon ON/OFF %


% Cs: Pixel mask %

% ST0: currently not drawing pixel (or drawing) %
% Ds: nuber of pixon/pixof switch before bail out..%
% Bs: bit feilf for quadrant to skip %
% R1a: Nb pixel for next series of Pixon/OFF %


% R3a: Line Width or - Line Width %
% %
% The Bresenham algo is use to draw the circle. %
% The original Bresenham use to compute the circle %
% Only on 1/8 of the circle and draw the other part%
% using symetrie. Doing a full pixon is quite slow %
% with a saturn, so, we are using 8 times %
% bresenham to draw suxcecivly the 8 octants of %
% the circle. Am, Bm and Cm are use to store the %
% values needed by bresenam, Ax, Bx, Cx and Dx are %
% used to store the current pixel coordinates %
% and the grob dimention to perform the cliping. %

% D0 and Cs are use to point on the current %

?C=0.A RTY % Fast exit if radius = 0

ST=0.1 ST=0.2 ST=0.0 % by default, start with pixel not drawn... the ST1 and
2 are here for protection so the add do not change other flags
D=0.S D+1.S
B=0.S
AR1EX.A CR0EX.A ?C<A.A { ST=1.0 ACEX.A D+1.S } A-C.A AR0EX.A %Ca: start

angle, Aa: radius, R0: A2-A1, R1: Cx
% D= C*A/512 -> D= Angle*8.9*R/512 -> A*2*pi*R/360 -> number of pixels from
0° to Angle

ST=0.3
?C#0.A
{
D-1.S CSTEX C+1.A CSTEX
ST=1.3
} SKELSE {
% build a bitfeild for every quadran that needs to be skipped before we
start painting
?ST=0.0 { { D=C.A LC 002D5 D-C.A EXITC B+1.S D-C.A EXIT EXITC B+2.S D-C.A
EXITC B+4.S D-C.A EXITC B+8.S } C+D.A }


D=0.A
?ABIT=0.0 { D+C.A } C+C.A
?ABIT=0.1 { D+C.A } C+C.A
?ABIT=0.2 { D+C.A } C+C.A
?ABIT=0.3 { D+C.A } C+C.A
?ABIT=0.4 { D+C.A } C+C.A
?ABIT=0.5 { D+C.A } C+C.A
?ABIT=0.6 { D+C.A } C+C.A
?ABIT=0.7 { D+C.A } DSRB.A
?ABIT=0.8 { D+C.A } DSRB.A
?ABIT=0.9 { D+C.A } DSRB.A
?ABIT=0.10 { D+C.A } DSRB.A
?ABIT=0.11 { D+C.A } DSRB.A
DSR.A

C=D.A D1=C % D1: number of pixel to first change Ca: A2-A1
}
C=R0.A D=0.A


?ABIT=0.0 { D+C.A } C+C.A
?ABIT=0.1 { D+C.A } C+C.A
?ABIT=0.2 { D+C.A } C+C.A
?ABIT=0.3 { D+C.A } C+C.A
?ABIT=0.4 { D+C.A } C+C.A
?ABIT=0.5 { D+C.A } C+C.A
?ABIT=0.6 { D+C.A } C+C.A
?ABIT=0.7 { D+C.A } DSRB.A
?ABIT=0.8 { D+C.A } DSRB.A
?ABIT=0.9 { D+C.A } DSRB.A
?ABIT=0.10 { D+C.A } DSRB.A
?ABIT=0.11 { D+C.A } DSRB.A
DSR.A

C=D.A ?ST=0.3 { D1=C } ACEX.A AR1EX.A % R1: number of pixel from A1 to A2,

Ca: radius, Aa: Center X

A+C.A % Compute cx (cx+Rayon)

C=0.M C+C.A CSL.A CSL.A CSL.W % Cm=Rayon*2

D0+10 C=DAT0.X C=D.S D=C.W % Dx: MaxY Dm: Rayon*2 Ds: nb pixel
color changes...
D0+5 C=DAT0.A RSTK=C C+7.A CSRB.A CSRB.A CBIT=0.0 R3=C.A % Compute line
width
D0+5

% Aa: cx, Ba: cy, Ca: line lenght, D0: @screen D0=D0+B*C+A/4. Do not modify
Aa and Ba

% We must now compute the position of the first pixel

AD0EX ABEX.A % D0: cx Ba: @Screen, Aa: cy Ca: line width
C+C.A


?ABIT=0.1 { B+C.A } C+C.A % We Want: D0=@Pix+Y*Width+X/4,

?ABIT=0.2 { B+C.A } C+C.A % Step 1: Compute B=B+Y*Width ( Width even <=
512 )
?ABIT=0.3 { B+C.A } C+C.A
?ABIT=0.4 { B+C.A } C+C.A
?ABIT=0.5 { B+C.A } C+C.A
?ABIT=0.6 { B+C.A } C+C.A
?ABIT=0.7 { B+C.A } C+C.A
?ABIT=0.8 { B+C.A } C+C.A
?ABIT=0.9 { B+C.A }

% D0: cx Ba: @Pixel, Aa: cy


CD0EX D0=C % Ca=X
P=C.0 CSRB.A CSRB.A B+C.A % Compute pixel address
ABEX.A AD0EX % Restore Ba=Ba, Aa=Aa, D0 point on pixel on plan
1
LC 1248124812481248 P=0 % Compute pixel mask

C=RSTK BCEX.A % Bx: max X Cx: current y

{ ?ST=0.0 EXIT ?A>=B.X EXIT ?C>=D.X EXIT A=DAT0.S A!C.S DAT0=A.S } % Pixon
( With cliping )

% First quadrant. do we need to bypass this quadrant or not?
SB=0 BSRB.S ?SB=0 { } SKNC { GOTO .Do_Bypass_Q1 }
% test for bypass of this quadran..
% ok, quadrant needs bypassing, so X=X-Radius, Y=Y+Radius.
DSRC DSRC DSRC DSRB.A CDEX.A % Ca: radius
D+C.X A-C.X % Update current X and Y
% Also need to move the pixel pointer... and this without modifying ANY
registers!
% need to do D0+R3*Radius+Radius/4 and also deal with the mask...
AD0EX ABEX.A AR3EX.A % Ba: @ pixel, Aa: line width, Ca: radius
B=B+A*C. note, A<512 and even
C+C.A
?ABIT=0.1 { B+C.A } C+C.A P=C.0
CSR.A B-C.A CSL.A C=P.0 % remove radius /4
?ABIT=0.2 { B+C.A } C+C.A


?ABIT=0.3 { B+C.A } C+C.A
?ABIT=0.4 { B+C.A } C+C.A
?ABIT=0.5 { B+C.A } C+C.A
?ABIT=0.6 { B+C.A } C+C.A
?ABIT=0.7 { B+C.A } C+C.A
?ABIT=0.8 { B+C.A }

C=P.0 { C-4.B EXITC CSRB.S ?C#0.S UP B-1.A C+8.S UPNC }
CSR.A CSR.A % restore C (radius)
AR3EX.A ABEX.A AD0EX % restore updated D0, Aa and Ba
CDEX.A D+D.A DSLC DSLC DSLC P=14 % restore Current Y, max Y and radius
SKIP {
*.Do_Bypass_Q1
C=D.M A=C.M % X=Am=Rayon


C=0.M % Y=Cm=0
B=0.M % Er=0

A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
P=14
% Octant 1 From a=0 to - 45
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C+1.X % Y+2, cy+1
CR3EX.W % cy+1 part2
AD0EX A+C.A AD0EX
CR3EX.W

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1
CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2
} % Else, error<0
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
% Octant 2. From a=-45 to -90
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1
CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C+1.X % er+Y, Y+2, cy+1
CR3EX.W % cy+1 part2
AD0EX A+C.A AD0EX
CR3EX.W
}
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
}
% Octant 3 From a=-90 to -135
% Doing the same thing again, but inverting Y and X
C=D.M A=C.M B=0.M C=0.M % ReInitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A-1.X % Y+2, cx-1
CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1
CR3EX.W % cy-1 part2
AD0EX A-C.A AD0EX
CR3EX.W
} % Else, error<0
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
% Octant 4. From a=-135 to -180
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1
CR3EX.W % cy-1 part2
AD0EX A-C.A AD0EX
CR3EX.W

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A-1.X % er+Y, Y+2, cx-1
CSRB.S ?C#0.S { C+8.S D0-1 } % cx-1 part2
}
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
% Octant 5 From a=-180 to -225
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C-1.X % Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
CR3EX.A

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1
C+C.S SKNC { C+1.S D0+1 } % cx+1 part2
} % Else, error<0
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
% Octant 6. From a=-225 to -270
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1
C+C.S SKNC { C+1.S D0+1 } % cx+1 part2

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C-1.X % er+Y, Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
CR3EX.A
}
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
% Octant 7 From a=-270 to -315
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or
Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A+1.X % Y+2, cx+1
C+C.S SKNC { C+1.S D0+1 } % cx+1 part2

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
CR3EX.A
} % Else, error<0
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
% Octant 8. From a=-315 to -360
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
CR3EX.A

D1-1 SKNC { CD1EX CR1EX.A CD1EX D-1.S RTNC CSTEX C+1.A CSTEX } % count

pixels, turn pixelON Flag on/off as needed, update counter

?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A+1.X % er+Y, Y+2, cx+1
C+C.S SKNC { C+1.S D0+1 } % cx+1 part2
}
?A#0.P EXIT % End of octan?
?ST=0.0 UP ?A>=B.X UP ?C>=D.X UP A=DAT0.S A!C.S DAT0=A.S UPNC % Pixon (
With cliping )
}
P=0 RTN

ENDCODE

;

"Yann" <kdo...@gmail.com> wrote in message

news:ede8ccab-33f2-4bce...@m73g2000hsh.googlegroups.com...

Yann

unread,
Aug 11, 2008, 5:45:07 PM8/11/08
to
Thanks Cyrille, i will look into it

Yann

unread,
Aug 11, 2008, 6:52:25 PM8/11/08
to
OK, Fine, i've got it working on a HP48GX.

It seems to produce an "animation" with a small arc traveling into a
larger full-circle.
I expect this is fully intended by the header you produced.

HP48SX on the other hand does not display anything (but there is some
activity for sure)
I noticed that you selected a fixed address for output, $806D5, which
seems to be =ADISP (->Stack grob).
Maybe GX and SX do not have the same behavior with it.
For example, this could be because the screen is not "refreshed" while
the ASM code is running.

OK, so for now i cannot time your code, but i guess i just need to
rewrite the header and that should be ok.
With such a speed announced, i believe there is plenty to learn from
reading your code.

Regards

cyrille de brebisson

unread,
Aug 11, 2008, 7:38:15 PM8/11/08
to
hello,

> HP48SX on the other hand does not display anything (but there is some
> activity for sure)
> I noticed that you selected a fixed address for output, $806D5, which
> seems to be =ADISP (->Stack grob).
> Maybe GX and SX do not have the same behavior with it.
> For example, this could be because the screen is not "refreshed" while
> the ASM code is running.

On the SX, the address of the display is not the same than the GX.. so, on
the SX, you need to change $806D5 to some other address to get the demo
working...

if all you want to do is create an ARC program that takes a gROB as input,
it will be compatible for SX/GX...
if running on an apple, you can probably replace some of the code
(multiplications) by some optimized multiplications instructions...

> With such a speed announced, i believe there is plenty to learn from
> reading your code.

:-) well, I do have some experience developing Saturn code... so that
helps...

regards, cyrille


John H Meyers

unread,
Aug 11, 2008, 8:45:25 PM8/11/08
to
On Mon, 11 Aug 2008 17:52:25 -0500, Yann wrote:

> OK, Fine, i've got it working on a HP48GX.

> HP48SX on the other hand does not display anything...


> a fixed address for output, $806D5, which seems to be =ADISP (->Stack grob).
> Maybe GX and SX do not have the same behavior with it.

Mika's "ent-srt.doc" (an old bible :) says:

70000 - 707D9 RAM pointer & status area [S/SX]

Pointer Name Description
70551 VDISP2 Menu display grob
70556 ADISP Text display grob
7055B VDISP Current display grob
70560* PDISP Pict grob???
70565 GDISP Graphic display grob
7056A* TEMPBOT Bottom of tempob area (?)
7056F TEMPTOP Top of tempob area
70574* RETTOP Top of rpl return stack
FREE RAM
70579 DSKTOP Top of stack
7057E* DSKBOT Bottom of stack
[*] not "supported"

Whereas in G/GX: [non-rom starts at 80000 rather than 70000]

806D5 ADISP
806DA VDISP
806DF PDISP
806E4 GDISP
806E9 TEMPBOT
806EE TEMPTOP
806F3 RETTOP
806F8 DSKTOP
806FD DSKBOT

To get the same document:

http://www.hpcalc.org/search.php?query=ent_srt

[r->] [OFF]

Yann

unread,
Aug 12, 2008, 2:37:44 PM8/12/08
to
Hello

I rewrote the header of your code, Cyrille, in order to have it using
the same convention as Jacob (Bearing 0° is North, Clockwise) and
output in Graphical display, for both HP48SX & GX.

You can download both SourceCode & Binary at :
http://phantasie.tonempire.net/utilitaires-f7/fast-approximative-arc-t59.htm#61

No problem with speed, it is hand down the fastest code around, with
output in the less-than-40ms zone for 48GX, including SysRPL header
which costs about 11ms alone.

However, there are still some issues with this version :

1) Accuracy is not correct. I've displayed some screenshots in the
link to illustrate.
I think this is a direct consequence of the selected methodology
(pixel counting).
I had the same issue with my first code some time ago, with equivalent
accuracy consequences.

2) The first quarter is sometimes not drawn. Probably a bug into the
discarding code.
I tried to remove it in order to release a usable binary, but
unfortunately it produced additionnal side effects, such as a sudden
90° output shift depending on input values (try Start Bearing = -1° &
+1°).

The bug can probably be corrected, but the accuracy may require a
different methodology, potentially resulting non-trivial code changes.

As a personal sidenote, i must add that your code provided me with the
perfect opportunity to start learning MASD. So thanks for your
contribution.

Veli-Pekka Nousiainen

unread,
Aug 12, 2008, 5:27:17 PM8/12/08
to
just to read this posting gives me
as a log time hp user (1st my very own was HP-21)
a very warm feeling

thank you Cyrille and U 2, Yann
VPN

"Yann" <kdo...@gmail.com> wrote in message

news:94754a55-b72a-4776...@m45g2000hsb.googlegroups.com...

username localhost

unread,
Aug 12, 2008, 9:52:28 PM8/12/08
to
On Aug 12, 5:27 pm, "Veli-Pekka Nousiainen"

<velipekka.nousiai...@saunalahti.fi> wrote:
> just to read this posting gives me
> as a log time hp user (1st my very own was HP-21)
> a very warm feeling
>
> thank you Cyrille and U 2, Yann
> VPN

You are a *logarithmic* time HP calulator user?
Hmm.........

Veli-Pekka Nousiainen

unread,
Aug 13, 2008, 1:28:37 AM8/13/08
to
at least if you look back
then timescale seems to be compressed
:-D
"username localhost" <username....@gmail.com> wrote in message
news:c02d4b4e-29e4-4dc0...@x35g2000hsb.googlegroups.com...

Yann

unread,
Aug 14, 2008, 10:08:50 PM8/14/08
to
At last,
here is a (long awaited) usable version for Fast Arc, which seems
reasonably accurate to be of any use.

The binaries for HP48, 49 and 50 can be downloaded at this website :
http://fastarc.webhop.org/

Input :
6 : Grob (<-- Optional, if none provided, will draw into Graphic Area)


5 : X0
4 : Y0
3 : Radius

2 : Start Bearing (°)
1 : End Bearing (°)

Being based on the code sample by Cyrille de Brebisson, it is quite
fast, although after completing the fullcircle code, the focus was no
longer speed, but rather arc accuracy, and getting something out of
the door before any real life issues prevent me from working on it.
Version 1 seems now completed. This version have the following change
compared to previous releases :

1) Accurary :
Granted, the algorithm is still at its core based on an approximation,
but this time, approximation should be less than one pixel, meaning in
fact full accuracy as far as display is concerned.
I've tried to trick the program in a number of way, and could no
longer find any blatant loophole with this version.
Some samples screenshots are provided in the link for illustration

2) Breisenham's cache
Well, i wanted to bring something new, so here it is.
I added a simple but effective cache algorithm, which allows to run
breisenham only once, reusing output for all octants. It significantly
improves performances and makes the code shorter (a simple read-cache
replace the optimized breisenham algo within each octant).

3) Expanded Boundaries
Due to improvements in register allocation, the input limits are now
fairly large, up to 250 000.
It is also possible to define X0 and Y0 (the center of the circle) as
negative values.


Thanks to Cyrille's "nibble traveler" methodology, which avoids
calculating memory pixel location, performance are good.
As a funny side effect, now the program spends more time in the header
than drawing the arc. This was tested with the "PureCode" version,
which removes the header, and it resulted in 31ms being spent on
header alone (on GX), which means twice more time than is necessary to
draw a 30pixels half-circle on the same platform.
Of course, performances on newer platform should be even better.

There are still some room for improvement though, especially as i
really did not optimized the "arc specific" part of the code. This is
partly because my focus was much more on getting something accurate
and fastly available.

Some ideas which could make it in a future release if need be :
- Sector discarding : this avoids to "loop" for each not-drawed pixel,
improving performances for circles with parts out of screen. Of
course, this cost some bytes to program.
- Dynamic compilation : the program would create the code which draws
Arc, using inputs to optimized tests and loops. This will improve
performance by a good margin, and potentially reduce code size too. It
is complex though, and make maintenance more difficult. So this is
rather a "last move" to do, when everything else is stable.

Well, i guess the next move could be on functionality, instead of ever
more performances.
For example, how to allow a user to "replace" the embedded "Arc"
function with this one ?
Would a "forced save" with "ARC" global name in the HOME directory be
enough for this ?
Eventually, i can also add some more features to the header to make it
compatible to HP syntax on top of the one already documented in this
thread. Well, if that is usefull...

Jacob Wall

unread,
Aug 14, 2008, 11:30:06 PM8/14/08
to

Hello Yann,

This is terrific!!! I've done some tests and everything looks great.
Here are my results using a 50g with the same input values you were
using for your 48 tests (Fastarc_v1.hp49&50.hp):

30px, Quarter Circle: 25ms
30px, Half Circle: 26ms
30px, Full Circle: 30ms
60px, Full Circle: 33ms
150px, 3/4 Circle: 41ms

Much faster than my SysRPL program :-), congratulations on your
incredible success.

Jacob

TW

unread,
Aug 14, 2008, 11:53:21 PM8/14/08
to
> 30px, Quarter Circle: 25ms
> 30px, Half Circle: 26ms
> 30px, Full Circle: 30ms
> 60px, Full Circle: 33ms
> 150px, 3/4 Circle: 41ms
>
> Much faster than my SysRPL program :-), congratulations on your
> incredible success.

Indeed that it is very good. Now the next step is to do it in ARM ASM
or in HPGCC. I have an HPGCC routine around here somewhere that does
something like 500-1K arcs a second and it was not very good code.

TW

username localhost

unread,
Aug 15, 2008, 2:08:34 AM8/15/08
to
On Aug 14, 10:08 pm, Yann <kdo4...@gmail.com> wrote:

> Thanks to Cyrille's "nibble traveler" methodology, which avoids
> calculating memory pixel location, performance are good.
> As a funny side effect, now the program spends more time in the header
> than drawing the arc. This was tested with the "PureCode" version,
> which removes the header, and it resulted in 31ms being spent on
> header alone (on GX), which means twice more time than is necessary to
> draw a 30pixels half-circle on the same platform.
> Of course, performances on newer platform should be even better.
>

TEVAL gives timings of 32-48 ms for me for your test cases on the 50g.

(These were full version tests, not PureCode tests)

A few larger scale testcases:
<<50000 50000 20000 0 180 fastarc>> 3.16 seconds
<<50000 50000 20000 0 360 fastarc>> 4.34 seconds
<<50000 50000 20000 180 360 fastarc>> 2.04 seconds

(the timing discrepancy between the first and last of those is
surprisingly large, considering both are half arcs of the same radius
and center point)

For all reasonable sized circles the results seem instant.

Yann

unread,
Aug 18, 2008, 6:32:58 PM8/18/08
to
Thanks for the feedback, i'm glad if it pleases you and can be of any
use.

Just some answers to latest comments :

- Discrepancy : there is a side-effect to the "nibble traveller"
methodology : it avoids calculating the memory position for each
pixel, but it needs a start point. This start point has been hardcoded
at position 90°. Therefore, if you start from position 0°, you have a
worse performance penalty than starting from 180°.
This could be improved, with a bit of logic, to start immediately to
the right octant, something i did for the older code.
But then, this would result in quite a bit more logic, creating havoc
in the carefully designed register allocation, increasing complexity &
code size, therefore i started to wonder if it was really necessary to
be "even more faster" at such a cost.
In the end, i decided to keep code length & complexity as low as
possible. But this, of course, can be changed if someone find it
necessary.

Note that, in your example, an even better solution would be to
discard undrawn quarters (or octants). Because these huge arcs are
"out of screen", not even drawn, this would result in an almost
immediate discarding. Here also, adding this bit of logic is not
hugely complex, but is bound to increase code size and create
complexity. Well, anyway, if there is a need for this, then why not,
this can be done.

- HPGCC (TW) : it is my long-term intention to move to ARM ASM some
day; For the time being, i'm merely learning ASM as an entertaining
exercise, and i find it more attractive to develop for my "old"
hardware student's calc, an HP48SX. The reasoning is : if it is fast
and usable enough for 48SX, then newer model will only find it even
better. For sure, there is a limit, but i'm not yet good enough to
cross through it.

As a sidenote, note that Circle code is quite simpler to do than Arc.
As an example, i've got a quick version of Fastcircle, derived from
Cyrille de Brebisson example with a Breisenham cache, which only needs
30ms to draw a full circle on HP48SX, including header. This is nearly
three times faster than Arc, because it is so much simpler to process.


Well, I guess the next step is to submit the FastArc program software
to HPCalc.org, so that anyone interested can get it in the future.

Regards

0 new messages