Drawing Arcs

Jacob Wall

Jul 26, 2008, 9:19:52 PM

Hello all,

I recently became a little obsessed with trying to speed up the
drawing of arcs on the HP 50g because the ARC command is somewhat
slow. I searched previous posts and found the question had been
brought up before but I couldn't find any real answers or attempts by
others to tackle the issue. So after searching the internet for ideas
I cobbled together a SysRPL program that is faster than the ARC
command but I still feel it could be optimized, which is why I'm
posting it here to see if anyone might be interested in pointing out
ways to optimize it, both for speed and size.

A few notes:
- the program assumes DEG mode set
- the program uses the midpoint circle algorithm (aka Bresenham's
circle algorithm)
- Inputs (5 reals) are as follows:
- 5: X0 **pixel x coordinate for arc centre (ex: 65)
- 4: Y0 **pixel y coordinate for arc centre (ex: 40)
- 3: B1 **bearing from arc centre to beginning of arc (ex:
45, bearings are clockwise from North)
- 2: B2 **bearing from arc centre to end of arc (ex: 90,
bearings are clockwise from North)
- 1: y **Radius
- the program is set up for demonstration purpose, therefore the use
of DOCLLCD, PIXON, and WaitForKey
- program size is ~850 bytes as shown
- to draw a full circle at X0=65, Y0=40, B1=0, B2=360, y=30 takes ~2.0
seconds
- to draw same circle with ARC command takes ~4.2 seconds
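
For readers who do not read SysRPL, here is a minimal Python sketch of the textbook midpoint circle algorithm named in the notes above. The function name `midpoint_circle` and the use of a set are illustrative choices, not part of the posted program, and the posted listing appears to update the decision variable before plotting rather than after, so the two may differ by a pixel here and there.

```python
def midpoint_circle(cx, cy, r):
    """Return the set of pixels on a circle of integer radius r centred at (cx, cy)."""
    pixels = set()
    x, y = 0, r
    d = 1 - r  # decision variable: its sign says whether to keep y or step it inwards
    while x <= y:
        # One computed point (x, y) yields eight pixels by symmetry.
        for dx, dy in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            pixels.add((cx + dx, cy + dy))
        if d < 0:
            d += 2 * x + 3          # keep y
        else:
            d += 2 * (x - y) + 5    # step y inwards
            y -= 1
        x += 1
    return pixels
```

Only integer additions and comparisons are needed inside the loop, which is what makes the method attractive on a slow machine.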

::
CK5NOLASTWD
DOCLLCD
TURNMENUOFF
%1
OVER
%-
%0
DUP
{
LAM X0
LAM Y0
LAM B1
LAM B2
LAM y
LAM d
LAM x
LAM A
}
BIND
BEGIN
LAM d
%0<
ITE
::
LAM d
LAM x
%2
%*
%3
%+
%+
'
LAM d
STO
;
::
LAM d
LAM x
%2
%*
LAM y
%2
%*
%-
%5
%+
%+
'
LAM d
STO
LAM y
%1-
'
LAM y
STO
;
LAM x
LAM y
%REC>%POL
%360
%MOD
'
LAM A
STO
DROP
% 90.
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%+
LAM Y0
LAM y
%-
COERCE2
PIXON
;
LAM A
LAM B1
%>=
LAM A
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%+
LAM Y0
LAM x
%-
COERCE2
PIXON
;
%180
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%+
LAM Y0
LAM x
%+
COERCE2
PIXON
;
% 90.
LAM A
%+
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%+
LAM Y0
LAM y
%+
COERCE2
PIXON
;
% 270.
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%-
LAM Y0
LAM y
%+
COERCE2
PIXON
;
%180
LAM A
%+
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%-
LAM Y0
LAM x
%+
COERCE2
PIXON
;
%360
LAM A
%-
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM y
%-
LAM Y0
LAM x
%-
COERCE2
PIXON
;
% 270.
LAM A
%+
DUP
LAM B1
%>=
SWAP
LAM B2
%<=
AND
IT
::
LAM X0
LAM x
%-
LAM Y0
LAM y
%-
COERCE2
PIXON
;
LAM x
%1+
'
LAM x
STO
LAM x
LAM y
%>
UNTIL
SetDAsTemp
WaitForKey
2DROP
ABND
;
@
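
The per-pixel test in the listing above can be pictured with a short Python sketch. It assumes screen coordinates (x to the right, y downwards) and bearings clockwise from North, as in the notes; `bearing_of` and `in_arc` are hypothetical helper names, and the sketch assumes B1 <= B2.

```python
import math

def bearing_of(dx, dy):
    """Bearing in degrees, clockwise from north, of a pixel offset from the centre.

    dx grows to the right and dy grows downwards, as on the screen.
    """
    return math.degrees(math.atan2(dx, -dy)) % 360.0

def in_arc(dx, dy, b1, b2):
    """True when the offset's bearing lies between b1 and b2 (assumes b1 <= b2)."""
    return b1 <= bearing_of(dx, dy) <= b2

# Offsets roughly on a 30 pixel circle: N, NE, E, S, W.
offsets = [(0, -30), (21, -21), (30, 0), (0, 30), (-30, 0)]
kept = [p for p in offsets if in_arc(p[0], p[1], 40, 100)]
```

With an arc from bearing 40 to 100, only the north-east and east offsets are kept. Doing a polar conversion and two real comparisons for each of the eight mirrored pixels is the cost that later replies in the thread try to remove.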

Any suggestions would be appreciated.

Jacob

Andreas Möller

Jul 27, 2008, 8:07:17 AM

Hello,

Use NULLLAMs instead of named LAMs. Accessing them is a lot faster because
their position in memory is calculated, whereas named LAMs are searched.
As a matter of fact you can use any fixed address for binding a LAM. I
usually use TRUE for this because it is on an even address (IIRC), but
this should not matter on a 50G, where the Saturn is emulated.
To access NULLLAMs above the current temporary environment you have to
add one for the protection word.
You can use DEFINE to make your code more readable with NULLLAMs.
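
As a rough analogy only, the difference between searched and calculated access can be modelled in Python. This is a toy model, not a description of how the calculator lays out temporary environments: the list of slots, `get_named` and `get_numbered` are all hypothetical, and the numbering (1 for the last value bound) is an assumption.

```python
# Toy model: a temporary environment as a list of (name, value) slots.
env = [("X0", 65), ("Y0", 40), ("B1", 0), ("B2", 360), ("y", 30)]

def get_named(env, name):
    """Named access: scan the slots until the name matches."""
    for slot_name, value in env:
        if slot_name == name:
            return value
    raise KeyError(name)

def get_numbered(env, n):
    """Numbered access: slot n is located by arithmetic, with no name comparisons.

    Slots are counted from 1, starting at the last value bound (an assumption).
    """
    return env[len(env) - n][1]
```

The named lookup does work proportional to the number of variables on every access, while the numbered lookup is a single index computation.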

Also according to the timings of EMU48 'case' is faster than 'ITE'.

HTH,
Andreas
http://www.software49g.gmxhome.de

Raymond Del Tondo

Jul 27, 2008, 9:35:35 AM

Hello,

for REALLY FAST arc and circle drawing routines,
you might want to take a look at Mark Power's web page:

http://www.btinternet.com/~mark.power/hp48.htm

or more exactly :
http://www.btinternet.com/~mark.power/hp48/glib07.zip

The binary was made for the real HP-48 series,
but since the sources are included (HP/JAZZ syntax) ,
you will be able to port them to the 50g.

HTH

Raymond

Jacob Wall

Jul 27, 2008, 12:52:59 PM

Andreas, thanks for the NULLLAMs tip. I was not aware of the
advantages of using NULLLAMs, and in fact it makes quite a difference:
the circle that took ~2.0 seconds to draw only takes ~1.26 seconds to
draw after the changes. I did not notice any real gains when
substituting 'case' for 'ITE'.

Raymond, I took a look at Mark Power's web page and looked at the
documentation for his graphics library. I only see a Circle command,
no Arc command. Also I unfortunately know nothing of machine code,
but thanks to the comments in the source file I can tell the same
midpoint circle algorithm is used. When I remove the tests from my
program that decide whether or not the pixel is within the arc,
the circle that now takes ~1.26 seconds to draw, only takes ~0.6
seconds. I'm sure that would be (much?) faster in machine code, but
it is beyond my knowledge.

So if anyone is interested in creating a machine code version of my
program I'd be quite interested, seeing as how I don't know how to do
it.

Now, for short arcs (<45 degrees of arc) the ARC command is still
faster than my program, however at 180 degrees of arc, ARC takes more
than double the time, and the full circle more than triple the time.
If an arc of any degree could be drawn in under 0.5 seconds at say the
30 pixel radius......

Jacob

Raymond Del Tondo

Jul 27, 2008, 7:13:39 PM

Hi,

right, I just saw Mark's Graphics Library doesn't include an ARC function.

However, the CIRCLE command is very fast, between less
than 100ms (0.1 sec!) for small circles and about 350ms for bigger ones,
like 75 pixel radius (150 pix diameter;-)

Please note that these times were measured on a real HP-48SX,
with 2 (TWO) MHz CPU clock!

On the doorstop series (49,50) these routines may be even faster.

I didn't dig into the sources so far,
maybe someone with more free time could do that,
to add an ARC like cmd?

BTW:
At least in the real HP-48 series calcs the PIXON cmd is the braking part,
even the internal versions of PIXON/PIXOFF/PIXTOG are very slow,
so these were the essentials to be replaced, which was done in Mark's lib.

Raymond

cyrille de brebisson

Jul 27, 2008, 9:40:01 PM

hello,

why don't you try to do it in ASM? It will be faster... and more fun to
code...

cyrille

Jacob Wall

Jul 28, 2008, 1:02:04 AM

Raymond, agreed that the CIRCLE command is extremely fast from Mark's
library, I tried it with Emu48 emulating a 48gx, with "Authentic
Calculator Speed" set (although when comparing side by side with Emu48
emulating the 50g and a real 50g, there are discrepancies, although
not too major). Having said that, I am not sure that the PIXON in the
50g is the culprit; I think it's the fact that for each loop there are
24 tests (%>=, %<=, AND, each appearing 8 times) to decide whether the
pixel of each octant is within the arc, and furthermore COERCE2 also
appears 8 times each loop, if there was such a thing as negative
BINTS...... I got rid of the %REC>%POL and substituted with %ANGLE
which yielded a slight improvement. (~1.14 seconds for the 30 pixel
radius circle, ~0.86 seconds for half circle, ~0.72 seconds for
quarter) Actually now looking at those numbers, maybe PIXON is a
factor, but regardless, even bare minimum the program takes ~0.57
seconds, with zero degrees of arc.

Cyrille, having just recently got into SysRPL and ASM looking like a
different animal altogether, I think I'll hold off just a little on
trying my hand at it, although I suspect that in order to get the
desired speed, ASM will need to be employed.

Jacob

Claudio Lapilli

Jul 29, 2008, 8:12:30 PM

On Jul 28, 1:02 am, Jacob Wall <j8w1...@hotmail.com> wrote:
<...>

> Having said that I am not sure that the PIXON in the
> 50g is the culprit, I think its the fact that for each loop there are

<...>

That's easy to test: simply replace PIXON with the proper number of
DROPs and run the same circle with and without PIXON. The difference
is exactly the overhead introduced by PIXON.
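
The same measurement idea, sketched in Python under the assumption that a do-nothing function stands in for the DROPs: `plot_pixel` and `discard` are hypothetical stand-ins, not calculator commands, and the loop body is deliberately identical in both runs.

```python
import time

screen = set()

def plot_pixel(x, y):
    """Stand-in for PIXON: records the pixel."""
    screen.add((x, y))

def discard(x, y):
    """Stand-in for the DROPs: consumes the two arguments and does nothing."""

def time_loop(plot, n=10000):
    """Run a fixed loop that hands each coordinate pair to `plot`; return seconds."""
    start = time.perf_counter()
    for i in range(n):
        plot(i & 127, i & 63)
    return time.perf_counter() - start

with_plot = time_loop(plot_pixel)
without_plot = time_loop(discard)
overhead = with_plot - without_plot  # time attributable to the plotting call alone
```

Because everything except the plotting call is held constant, the difference isolates that call's cost. On a real machine the run should be repeated a few times, since single timings are noisy.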

Claudio

Jacob Wall

Jul 29, 2008, 8:44:53 PM

Yes so easy in fact, I didn't think of it. After replacing PIXON with
2DROP, the 30 pixel radius circle took ~1.02 seconds, which is ~0.11
seconds faster than my best performance with PIXON, not very
significant. Thanks Claudio, that clears the PIXON question.

Jacob

cyrille de brebisson

Jul 30, 2008, 11:06:46 AM

hello,

most of the time is probably spent in the memory allocation used whenever a
new object is created. If you can simplify your program so that
fewer objects are created, it will run faster.

you can also try to make sure that only small integers are used as they do
not need to be created as they are already present in ROM.

cyrille

Claudio Lapilli

Jul 30, 2008, 12:48:08 PM

I have a few suggestions to optimize your routine:

Do not use angles inside the loop. I think it will be a lot faster if
you determine the start and end pixel count for each octant outside
the loop. By pixel count I mean the number of iterations in the
Bresenham loop, which will be pixels in the X direction for some
octants and in the Y direction for others.
The start and end points will therefore be integers, which will allow
you to completely remove real arithmetic inside the loop. Using only
bints will be a lot faster.
So you run the full octant iteration, and only call PIXON for pixels
between the start and end points for each octant, removing from the
loop all real comparisons, the conversion from real to polar and the
MOD operation, these last two being very expensive.
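
One way to picture this precomputation is the Python sketch below. It assumes bearings clockwise from North with 0 <= b1 <= b2 <= 360, and it assumes the loop counter in each octant runs from 0 at the axis to about r/sqrt(2) at the diagonal; the function name `octant_ranges` and the rounding choice are illustrative, not taken from any posted program.

```python
import math

def octant_ranges(r, b1, b2):
    """For each of the 8 octants, return None or (lo, hi): the loop counts to plot.

    Octant k covers bearings 45*k .. 45*(k+1), clockwise from north.
    Assumes 0 <= b1 <= b2 <= 360.
    """
    ranges = []
    for k in range(8):
        start, end = 45 * k, 45 * (k + 1)
        lo_b, hi_b = max(b1, start), min(b2, end)
        if lo_b >= hi_b:
            ranges.append(None)      # the arc does not reach this octant
            continue
        if k % 2 == 0:
            # Loop count grows with bearing: measure from the octant's start.
            t_lo, t_hi = lo_b - start, hi_b - start
        else:
            # Loop count shrinks with bearing: measure back from the octant's end.
            t_lo, t_hi = end - hi_b, end - lo_b
        lo = round(r * math.sin(math.radians(t_lo)))
        hi = round(r * math.sin(math.radians(t_hi)))
        ranges.append((lo, hi))
    return ranges
```

All trigonometry happens once, before the loop. Inside the loop, deciding whether to plot in octant k is then just the integer test `lo <= x <= hi`.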

Claudio

Jacob Wall

Aug 1, 2008, 2:02:54 AM

Hello,

Thanks to all of you who offered suggestions and insight, I've gained
more knowledge about the workings of these machines as a result.

Claudio, another great tip to calculate pixel ranges for each octant
beforehand. This certainly improves performance, especially when the
octant in question is not part of the desired arc, which tells me that
the addition/subtraction of x & y to center X & Y to determine the
pixel to be turned on is where I can still improve it, which as
Cyrille pointed out would be quicker if integers were used. I'll look
further into this in the days to come and will post the new and
improved version of the program once it's complete if anyone is
interested.

As it stands the 30 pixel radius arc times are:
full circle: ~0.88 seconds
half circle: ~0.61 seconds
quarter circle: ~0.48 seconds
eighth circle: ~0.43 seconds
minimum runtime using 1 degree of arc is ~0.36 seconds

Thanks again,

Jacob

Jacob Wall

Aug 2, 2008, 8:40:17 PM

Hello,

Just to follow up, not too long ago I wished for:

"If an arc of any degree could be drawn in under 0.5 seconds at say
the 30 pixel radius......"

Well that is now a reality. Here are the times with the new and
improved program: (30 pixel radius)
full circle: ~0.38 seconds
half circle: ~0.27 seconds
quarter circle: ~0.22 seconds
eighth circle: ~0.21 seconds
minimum runtime using 0 degree of arc is ~0.17 seconds

The only thing now is that the program is 1180 Bytes, or about 330
Bytes bigger than the original program.

Below is the new program source, still the same 5 inputs as before.

::
CK5NOLASTWD
DOCLLCD
TURNMENUOFF
5ROLL
5ROLL
COERCE2
5UNROLL
5UNROLL
DUP
COERCE
%1
3PICK
%-
BINT0
DUP
5ROLL
DUP
%*
%2
%/
%SQRT
COERCE
BINT2
{}N
BINT8
NDUPN
DROP
NULLLAM
BINT15
NDUPN
DOBIND
11GETLAM
UNCOERCE
12GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
12GETLAM
COERCE
11GETLAM
UNCOERCE
13GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
13GETLAM
COERCE
BINT3
BINT1
DO
BINT9
BINT1
DO
DUP
BINT45
INDEX@
#*
#1+
#<
IT
::
DROP
INDEX@
::
DUP
BINT1
#=case
TrueTrue
DUP
BINT2
#=case
FalseFalse
DUP
BINT3
#=case
TrueFalse
DUP
BINT4
#=case
FALSETRUE
DUP
BINT5
#=case
TrueTrue
DUP
BINT6
#=case
FalseFalse
DUP
BINT7
#=case
TrueFalse
DUP
BINT8
#=case
FALSETRUE
;
ITE
::
4ROLL
DROP
;
::
ROT
DROP
;
ITE
::
SWAP
JINDEX@
INDEX@
GETLAM
PUTLIST
INDEX@
PUTLAM
;
::
SWAP
BINT3
JINDEX@
#-
INDEX@
GETLAM
PUTLIST
INDEX@
PUTLAM
;
DUP
BINT14
JINDEX@
#-
PUTLAM
ISTOPSTO
;
LOOP
LOOP
13GETLAM
12GETLAM
#>
BINT9
BINT1
DO
DUP
INDEX@
12GETLAM
#>
INDEX@
13GETLAM
#<
ROT
ITE
AND
OR
IT
::
BINT0
INDEX@
PUTLAM
;
LOOP
DROP
BEGIN
::
10GETLAM
%0<
case
::
10GETLAM
9GETLAM
UNCOERCE

%2
%*
%3
%+
%+

10PUTLAM
;
::
10GETLAM
9GETLAM
UNCOERCE
%2
%*
11GETLAM
UNCOERCE

%2
%*
%-
%5
%+
%+

10PUTLAM
11GETLAM
#1-
11PUTLAM
;
;
1GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#+
14GETLAM
11GETLAM
#-
PIXON
;
;
DROP
2GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#+
14GETLAM
9GETLAM
#-
PIXON
;
;
DROP
3GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#+
14GETLAM
9GETLAM
#+
PIXON
;
;
DROP
4GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#+
14GETLAM
11GETLAM
#+
PIXON
;
;
DROP
5GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#-
14GETLAM
11GETLAM
#+
PIXON
;
;
DROP
6GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#-
14GETLAM
9GETLAM
#+
PIXON
;
;
DROP
7GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
11GETLAM
#-
14GETLAM
9GETLAM
#-
PIXON
;
;
DROP
8GETLAM
DUPTYPELIST?
ITE
::
INCOMPDROP
DUP
9GETLAM
#>
SWAP
9GETLAM
#=
OR
SWAP
DUP
9GETLAM
#<
SWAP
#0=
OR
AND
IT
::
15GETLAM
9GETLAM
#-
14GETLAM
11GETLAM
#-
PIXON
;
;
DROP
9GETLAM
#1+
DUP
9PUTLAM
11GETLAM
#>
UNTIL
ABND
SetDAsTemp
WaitForKey
2DROP
;
@

Jacob

Raymond Del Tondo

Aug 3, 2008, 5:53:42 AM

Hello,

now that's an improvement.

However on a real HP-48SX your program takes ~2.9 secs for the full circle,
and ~1.9 secs on an HP-48GX ;-)

Raymond

Claudio Lapilli

Aug 3, 2008, 10:20:22 AM

On Aug 2, 8:40 pm, Jacob Wall <j8w1...@hotmail.com> wrote:
> Hello,
>
> Just to follow up, not too long ago I wished for:
> "If an arc of any degree could be drawn in under 0.5 seconds at say
> the
> 30 pixel radius...... "
>
> Well that is now a reality. Here are the times with the new and
> improved program: (30 pixel radius)
> full circle: ~0.38 seconds

Cool!
Cutting execution time by half always feels good, doesn't it?... :-)

I have one more idea, that could make it slower or faster (who knows
until you test it). The overhead of running the Bresenham loop is not
too much, and your program seems to have a lot of overhead by trying
to manage all octants at the same time (using lists, and many case's
and ITE's).
Now here's another way of doing it, which could potentially be faster
due to less object handling:

1) Calculate all initial parameters for the loop just like you are
doing now, but save them on a LAM for later use.
2) Run one Bresenham loop for each octant (reusing the saved
parameters).

The advantage is obvious for arcs that don't involve all octants: you
don't need to run the loop, so it will execute faster in that case.
Another advantage is that for any partial octants, you can stop the
loop as soon as you reach the stopping point.
The worst case would be the full circle, in this case you would be
running 8 independent complete loops. In this case, it may run slower
than your current routine, but not too much I think, since speed loss
may be compensated somewhat because you'll be using less LAM's, no
lists, etc.
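
A hedged Python sketch of the one-loop-per-octant idea follows. It assumes each octant's plotting range has already been reduced to a pair of loop counts (lo, hi), or None when the arc misses the octant; `OCTANT_MAP`, `draw_octant` and `draw_arc` are hypothetical names, and the octant order follows the eight cases in the first listing of this thread.

```python
# Pixel offset for loop point (x, y) in octant k: bearings clockwise from north,
# screen y growing downwards.
OCTANT_MAP = (
    lambda x, y: (x, -y), lambda x, y: (y, -x),
    lambda x, y: (y, x), lambda x, y: (x, y),
    lambda x, y: (-x, y), lambda x, y: (-y, x),
    lambda x, y: (-y, -x), lambda x, y: (-x, -y),
)

def draw_octant(cx, cy, r, k, lo, hi):
    """Pixels of octant k whose loop count x satisfies lo <= x <= hi."""
    pixels = []
    x, y, d = 0, r, 1 - r
    while x <= y and x <= hi:       # stop as soon as the end point is passed
        if x >= lo:
            dx, dy = OCTANT_MAP[k](x, y)
            pixels.append((cx + dx, cy + dy))
        if d < 0:
            d += 2 * x + 3
        else:
            d += 2 * (x - y) + 5
            y -= 1
        x += 1
    return pixels

def draw_arc(cx, cy, r, ranges):
    """ranges[k] is None (skip octant k entirely) or a (lo, hi) pair."""
    pixels = []
    for k, rng in enumerate(ranges):
        if rng is not None:
            pixels.extend(draw_octant(cx, cy, r, k, *rng))
    return pixels
```

The loop still has to start at x = 0, because the decision variable depends on every earlier step, but it can stop early. Skipped octants cost nothing, whereas a full circle pays for eight complete passes.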

And you could leave both algorithms, and choose which one to use at
run time based on the start and end angles, using your current routine
for arcs that involve several octants (you could measure the timings
on both routines to determine what's the best "switching" angle of
aperture).

It all depends if you want to keep breaking records or you are
satisfied with what you got... :-)

Regards,
Claudio

Jacob Wall

Aug 3, 2008, 10:30:36 PM

Claudio, yeah I'm curious to see what difference it would make to run
one Bresenham loop for each octant, and time permitting I will look
into it.

> It all depends if you want to keep breaking records or you are
> satisfied with what you got... :-)

For now I am satisfied, but as Raymond pointed out, the program still
takes a little too long if run on the 48. Appreciate you taking the
time to run the program on the 48 Raymond. That really makes Mark
Power's machine code graphics library that much more impressive. As a
new to SysRPL programmer, I'm just happy that I can draw arcs and
circles now and not have to wait and wonder if my calculator decided
to go on strike :-)

Jacob

Yann

Aug 5, 2008, 11:29:41 AM

Jacob, your proposed exercise is very interesting.
Since you took good care and had great results with SysRPL, would you
mind if I try to have a look at the ASM side, which I'm currently
learning?

I only own an HP48SX myself, but it is probably possible to create
code which would be compatible with both the HP48S and your HP50.

Regards

Yann

Aug 5, 2008, 11:33:09 AM

cyrille de brebisson

Aug 5, 2008, 5:19:35 PM

hello,

you can use that for a base.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Circle asm toolbox %
% %
% input: D0: @ grob %
% Aa: Centre x %
% Ba: Centre y %
% Ca: Rayon %
% the circle must be in the -2047, 2047 square %
% Grob can not be bigger than 2048 pixels %
% %
% 5 different calls: %
% aCircleW Draw a White line %
% aCircleG1 Draw a light gray line %
% aCircleG2 Draw a dark gray line %
% aCircleB Draw a Black line %
% aCircleXor Invert a line %
% %
% Uses: Carry, P, RSTK2 %
% D0 (Last pix Plane1) %
% D1 (Last pix plane2) %
% R3a: +/- Line Width %
% R4a: Plane length %
% Cs: Pixel Mask %
% As: Undefine %
% Bs: Undefine %
% IRAMBUFF (50 nibbles) %
% %
% All functions work either in W&B or gray scale %
% All functions are using standard bresenham algo %
% D0, D1 are pointing on the 2 bit plans. %
% During the process, %
% Am: 4*X Error %
% Bm: 4*Y Error %
% Cm: Error %
% Dm: 2* Radius %
% Ax: XCurrent %
% Bx: XMax (Grob Width) %
% Cx: YCurrent %
% Dx: YMax (Grob Height) %
% D0: Pointer on bit plan 1 %
% D1: Pointer on bit plan 2 %
% Cs: Pixel mask %
% R3a: Line Width or - Line Width %
% R4a: Plan length %
% IRAMBUFF: Pixon routine %
% %
% The Bresenham algo is used to draw the circle. %
% The original Bresenham computes the circle %
% only on 1/8 of the circle and draws the other parts%
% using symmetry. Doing a full pixon is quite slow %
% with a Saturn, so we are using Bresenham 8 times %
% to draw successively the 8 octants of %
% the circle. Am, Bm and Cm are used to store the %
% values needed by Bresenham; Ax, Bx, Cx and Dx are %
% used to store the current pixel coordinates %
% and the grob dimensions to perform the clipping. %
% D0, D1 and Cs are used to point to the current %
% pixel. %
% %
% The algo we are using is: %
% X=2*Rayon-1 %
% Y=1 %
% cx=cx+rayon %
% cy=cy %
% Error = 0 %
% pixon(Cx, Cy) %
% while Y<X do %
% Begin %
% Error=Error+Y %
% Y=Y+2 %
% cy=cy+1 %
% If error>=X then %
% Begin %
% error= error-X %
% X=X-2 %
% cx=cx-1 %
% End %
% pixon(Cx, Cy) %
% End; %
% while X<>0 do %
% Begin %
% Error=Error-X %
% X=X-2 %
% cx=cx-1 %
% If error<0 then %
% Begin %
% error= error+Y %
% Y=Y+2 %
% cy=cy+1 %
% End %
% pixon(Cx, Cy) %
% End; %
% %
% The same routine is used all the time, but %
% a different Pixon routine is copied in IRAMBUFF %
% for each type of line, and this routine is used %
% to affect the pixels. The _Prepx routines are used%
% to copy the pixon routine in RAM. Note that to %
% decrease the number of registers used by this %
% routine, the code is loaded in Cm using LC with %
% P=3. In order to have readable code, the LC %
% mnemonic is not used, but is replaced by the hex %
% opcode: 3x where x is the number of nibbles to %
% load in C-1. At the end of the routine, C[34] is %
% loaded with 00. This means that this part of the %
% Ca register is lost %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*aCircleB
GOSUB _PrepB
GOTO _Circle

*aCircleW
GOSUB _PrepW
GOTO _Circle

*aCircleG1
GOSUB _PrepG1
GOTO _Circle

*aCircleG2
GOSUB _PrepG2
GOTO _Circle

*aCircleXor
GOSUB _PrepXor

*_Circle
A+C.A % Compute cx (cx+Rayon)
C=0.M C+C.A CSL.A CSL.A CSL.W D=C.M % Dm=Rayon*2
CSL.M CSL.M C=A.A A=C.W % Am=Rayon*2=X
GOSUBL aGrey?
D0-10 C=DAT0.A D=C.X % Read Max Y
D0+5 C=DAT0.A BCEX.A RSTK=C % Read Max X, Save cy
D0+5
GOSUBL ComputePixel % Compute pixel mask and D0 and D1
C=RSTK % Restore cy
ASR.M ASR.M % X=Am=Rayon
C=0.M % Y=Cm=0
B=0.M % Er=0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
?A=0.M RTY % Quick exit on Radius=0
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or Y+1, we already add the 1 to X and Y
P=14
% Octant 1 From a=0 to - 45
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C+1.X % Y+2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 2. From a=-45 to -90
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A-1.X % X-2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C+1.X % er+Y, Y+2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A
}
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 3 From a=-90 to -135
% Doing the same thing again, but inverting Y and X
C=D.M A=C.M B=0.M C=0.M % ReInitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A-1.X % Y+2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 4. From a=-135 to -180
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M C-1.X % X-2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A-1.X % er+Y, Y+2, cx-1
CSRB.S ?C#0.S { C+8.S D1-1 D0-1 } % cx-1 part2
}
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 5 From a=-180 to -225
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M C-1.X % Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 6. From a=-225 to -270
{
?A#0.P EXIT % End of octan?
B-A.M % er - X
A-2.M A+1.X % X-2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M C-1.X % er+Y, Y+2, cy-1
CR3EX.A % cy-1 part2
AD0EX A-C.A AD0EX
?ST=0.=fGray { AD1EX A-C.A AD1EX }
CR3EX.A
}
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 7 From a=-270 to -315
% Doing the same thing again, but inverting Y and X again and differantly
C=D.M A=C.M B=0.M C=0.M % Reinitializing values
A-1.M C+1.M % Because the bresenham algo need to add -(X-1) or Y+1, we already add the 1 to X and Y
{
?A<=C.M EXIT % End of octant? X<Y
B+C.M % er+Y
C+2.M A+1.X % Y+2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2

?B<A.M % if error >= X
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A
} % Else, error<0
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
% Octant 8. From a=-315 to -360
{
B-A.M % er - X
A-2.M C+1.X % X-2, cy+1
CR3EX.A % cy+1 part2
AD0EX A+C.A AD0EX
?ST=0.=fGray { AD1EX A+C.A AD1EX }
CR3EX.A
?B=0.P % if B>0 then do nothing
{ % else,
B+C.M C+2.M A+1.X % er+Y, Y+2, cx+1
C+C.S SKNC { C+1.S D1+1 D0+1 } % cx+1 part2
}
?A#0.P EXIT % End of octan?
{ ?A>=B.X EXIT ?C>=D.X EXIT GOSBVL =IRAMBUFF } % Pixon ( With cliping )
UP
}
P=0 RTN
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Return informations on a graphic object %
% %
% Input: D0: @ grob %
% P=0, HEX %
% Output: ST=0/1 fGray %
% 0: Grob W&B %
% 1: Grob Gray %
% R4a: Plane Length in nibbles %
% R3a: Line Width %
% D0: @ first pixel of the screen %
% %
% uses: RSTK1, Ca, Carry, mp %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*aGrey?
R3=A.A A=B.A R4=A.A % Save Aa in R3 and Ba in R4
D0+15 C=DAT0.A C+7.A CSRB.A CSRB.A CBIT=0.0 % Compute line width
D0-5 A=DAT0.A % read Grob height
B=0.A A+A.A % Some special multiplication
?CBIT=0.1 { B+A.A } A+A.A % I know that
?CBIT=0.2 { B+A.A } A+A.A % 1: C is between 0 and 2048/2=1024
?CBIT=0.3 { B+A.A } A+A.A % 2: C is even
?CBIT=0.4 { B+A.A } A+A.A % So I explode the loop,
?CBIT=0.5 { B+A.A } A+A.A % testing only bit 1 to 9
?CBIT=0.6 { B+A.A } A+A.A % of C
?CBIT=0.7 { B+A.A } A+A.A % an other great thing is
?CBIT=0.8 { B+A.A } A+A.A % that C is not change during this
?CBIT=0.9 { B+A.A } A+A.A % multiplication
?CBIT=0.10 { B+A.A } A+A.A
?CBIT=0.11 { B+A.A } A+A.A
?CBIT=0.12 { B+A.A } A+A.A
?CBIT=0.13 { B+A.A } A+A.A
?CBIT=0.14 { B+A.A } A+A.A
?CBIT=0.15 { B+A.A }
D0-5 A=DAT0.A A-15.A D0+15 % read grob size (only bitplans) D0 point on the first pixel
ST=0.=fGray ?A=B.A { ST=1.=fGray } % if plane length = grob plane size, we are in W&B
A=B.A AR4EX.A B=A.A % Save Bitplane size, restore B
CR3EX.A A=C.A % Save Line Width, restore Aa
RTN

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Compute Address of first pixel to draw %
% Input: R3a: Line Width ( Even, < 512 ) %
% R4a: Plane Width %
% St=x fGray %
% Aa: X %
% Ca: Y %
% D0: @ First pixel %
% P=0, HEX %
% %
% return: D0: @ Pixel, D1: @ Pixel plan 2 %
% Cs: Pixel Mask %
% P=0, HEX %
% %
% Uses: Cw, D0, D1, Carry, P, RSTK1 %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*ComputePixel
AD0EX ABEX.A % We must now compute the position of the first pixel
AR3EX.A C+C.A % D0=Aa Ba=D0 Aa=Ba
?ABIT=0.1 { B+C.A } C+C.A % We Want: D0=@Pix+Y*Width+X/4, D1=D0+PlanSize if fGray
?ABIT=0.2 { B+C.A } C+C.A % Step 1: Compute B=B+Y*Width ( Width even <= 512 )
?ABIT=0.3 { B+C.A } C+C.A
?ABIT=0.4 { B+C.A } C+C.A
?ABIT=0.5 { B+C.A } C+C.A
?ABIT=0.6 { B+C.A } C+C.A
?ABIT=0.7 { B+C.A } C+C.A
?ABIT=0.8 { B+C.A } C+C.A
?ABIT=0.9 { B+C.A }
AR3EX.A % restore Aa=Ba
CD0EX D0=C % Ca=X
P=C.0 CSRB.A CSRB.A B+C.A % Compute pixel address
C=R4.A C+B.A ?ST=1.=fGray { C=B.A } D1=C % Point on pixel on plan 2 if grey else, stay on same plan
ABEX.A AD0EX % Restore Ba=Ba, Aa=Aa, D0 point on pixel on plan 1
LC 1248124812481248 P=0 % Compute pixel mask
RTN

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% For many of the graphical primitives, %
% The same routine is used all the time, but %
% a different Pixon routine is copied in IRAMBUFF %
% The _Prepx routines are used %
% to copy the pixon routine in RAM. Note that to %
% decrease the number of registers used by this %
% routine, the code is loaded in Cms using LC with%
% P=3. In order to have readable code, the LC %
% mnemonic is not used, but is replaced by the hex %
% opcode: 3x where x is the number of nibbles to %
% load in C-1. at the end of the routine, C[34] is %
% loaded with 00. This means that this part of the %
% Ca register is lost %
% All prep routines use Cms, D1 and carry. %
% at the end of the routine, Ca=Ca&00fff %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*_PrepB
D1=(5)=IRAMBUFF P=3
$3B A=DAT0.S A!C.S DAT0=A.S DAT1=C.M D1+12
$38 ?ST=0.=fGray RTY A=DAT1.S DAT1=C.M D1+9
$39 A!C.S DAT1=A.S RTN DAT1=C.M
LC 00 P=0 RTN

*_PrepW
D1=(5)=IRAMBUFF P=3
$3A A=DAT0.S A!C.S A-C.S DAT1=C.M D1+11
$38 DAT0=A.S ?ST=0.=fGray RTY DAT1=C.M D1+9
$3A A=DAT1.S A!C.S A-C.S DAT1=C.M D1+11
$35 DAT1=A.S RTN DAT1=C.M
LC 00 P=0 RTN

*_PrepG1
D1=(5)=IRAMBUFF P=3
$3B A=DAT0.S A!C.S DAT0=A.S DAT1=C.M D1+12
$38 ?ST=0.=fGray RTY A=DAT1.S DAT1=C.M D1+9
$3C A!C.S A-C.S DAT1=A.S RTN DAT1=C.M D1+12 DAT1=C.S
LC 00 P=0 RTN

*_PrepG2
D1=(5)=IRAMBUFF P=3
$3A A=DAT0.S A!C.S A-C.S DAT1=C.M D1+11
$38 DAT0=A.S ?ST=0.=fGray RTY DAT1=C.M D1+9
$3B A=DAT1.S A!C.S DAT1=A.S DAT1=C.M D1+12
$31 RTN DAT1=C.M
LC 00 P=0 RTN

*_PrepXor
D1=(5)=IRAMBUFF P=3
$3A A=DAT0.S B=A.S A!C.S DAT1=C.M D1+11
$3A B&C.S A-B.S DAT0=A.S DAT1=C.M D1+11
$34 ?ST=0.=fGray RTY DAT1=C.M D1+5
$3A A=DAT1.S B=A.S A!C.S DAT1=C.M D1+11
$3C B&C.S A-B.S DAT1=A.S RTN DAT1=C.M D1+12 DAT1=C.S
LC 00 P=0 RTN
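
The unrolled `?CBIT=0.n { B+A.A } A+A.A` and `?ABIT=0.n { B+C.A } C+C.A` sequences in the listing above look like a shift-and-add multiplication, where the braces appear to run when the tested bit is not zero. A minimal Python sketch of that technique, with a loop in place of the unrolled code, is given here. The name `shift_add_multiply` is hypothetical, and the listing itself skips bit 0 because its multiplier is known to be even.

```python
def shift_add_multiply(a, c, bits=16):
    """Multiply a by c using only bit tests, additions and doublings."""
    product = 0
    for n in range(bits):
        if (c >> n) & 1:    # bit n of c is set: add the current multiple of a
            product += a
        a += a              # a now holds the original a times 2**(n + 1)
    return product
```

For example, `shift_add_multiply(64, 34)` gives the offset of row 64 in a bitmap 34 units wide without any multiply instruction, which matters on a CPU that has none.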

Jacob Wall

Aug 5, 2008, 9:00:04 PM

Hello Yann,

Yes I would be very interested in seeing what might be possible with
ASM and look forward to seeing your progress.

Jacob

Yann

Aug 6, 2008, 9:38:44 AM

Thanks for the code, Cyrille.

It seems we had a similar idea, as I too started with a simple full
circle algorithm
in order to provide some "training ground" for arcs
(circles are simpler: there is no need to check the bearing, just draw the
pixel).

Hence I already had SASM code ready by the time I found yours.
Anyway, the goal being to learn, it's always good to be able to
compare.
It seems your code is using a different syntax than SASM, probably
MASD.
I read here and there that MASD syntax is more powerful than SASM,
but for the time being, I'm not even able to compile such a piece of
code using Debug4x.
Could you give me a hint on how to do that?
Anyway, for those interested, here are some results for the current
full-circle algorithm, using an HP48SX:
FastCircle on HP48SX:
30 pixels radius (65,30,30) ===> 0.24s
60 pixels radius (30,-10,60) ==> 0.31s
200 pixels radius (60,215,200) => 0.75s

This is a first try, so it can most probably be optimized further.
But it was very useful in helping me understand how to write ASM
code, and especially how to initialise the graphic area, write pixels,
and so on.

This implementation allows a center outside the screen (including
negative coordinates), and a radius of up to 1000 pixels.

The ASM code can draw circles on any grob of any size.
However, the binary offered below has a SysRPL header, which checks
the arguments and provides the graphic grob as input to the ASM code.
Therefore, the circle is drawn in the graphic area, visible using the
GRAPH command (or the LEFT arrow).
A nice side effect is that it should support both 64- and 80-line
systems without any problem.
And if the graphic area is "larger than the screen", it will make use
of its full size.
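
For readers who do not read Saturn ASM, here is a minimal Python sketch of the behaviour described above. It is not the FastCircle source: the `grob` (a list of rows of 0/1 values), `draw_circle` and `plot` are hypothetical names, and the midpoint update is the one from the SysRPL program at the top of the thread. Clipping every pixel against the grob size is what lets the center sit off-screen.

```python
def draw_circle(grob, cx, cy, r):
    """Set the pixels of a circle in grob, a list of rows of 0/1 values."""
    height = len(grob)
    width = len(grob[0]) if height else 0

    def plot(px, py):
        # Clip: pixels that fall outside the grob are silently dropped.
        if 0 <= px < width and 0 <= py < height:
            grob[py][px] = 1

    x, y, d = 0, r, 1 - r
    while x <= y:
        # One computed point gives eight pixels by symmetry.
        for dx, dy in ((x, y), (y, x)):
            plot(cx + dx, cy + dy)
            plot(cx - dx, cy + dy)
            plot(cx + dx, cy - dy)
            plot(cx - dx, cy - dy)
        if d < 0:
            d += 2 * x + 3
        else:
            d += 2 * (x - y) + 5
            y -= 1
        x += 1


screen = [[0] * 131 for _ in range(64)]   # assumed 131x64 graphic area
draw_circle(screen, 65, 30, 30)           # fully inside the grob
draw_circle(screen, 30, -10, 60)          # center above the top edge
```

Because the size is read from the grob itself, the same routine works unchanged on a larger graphic area.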

Binaries for both HP48 & HP49/50 are available here:
http://www.xooimage.com/app/upload.php
FastCircle V1.0 is 411 bytes long.

The SASM source code can also be downloaded here:
http://img22.xooimage.com/files/c/a/2/fastcircle-v1.0-sourcecode-56ad87.zip
The code is in English, but the comments are written in French,
though...

Cyrille, I tried to compare both source codes, but I still lack the
experience to properly understand MASD.
Your code seems much more ambitious, as it can provide greyscale
output.
Any comments are welcome.

OK, now for the more complex arc algorithm...

Regards

cyrille de brebisson

Aug 6, 2008, 3:39:36 PM

Hello,

If you want to use that source, you probably want to put a CODEM
ENDCODE
around the assembly part so that it gets compiled by the right compiler in
Debug4x...

Read the documentation for MASD in the advanced user manual for the HP49/50,
available online, for more info on how it works...

But basically, it is simple:
You can have as many instructions per line as you want.
You can separate a field from an instruction by a '.'.
Where an instruction has the form R1=R1operationR2, you can omit the R1=
part.
You can create blocks using { } and use instructions to get in and out of
these blocks, such as:
EXIT (go to end of block), EXITC (exit if carry), EXITNC (exit if no carry), UP
(go to beginning of block), UPC, UPNC
A block opening just after a test means that if the test is true, the
block is skipped, as in: ?A=B.A { A+A.A } which is compiled into
?A=B A
SKIPYES .end
A=A+A A
.end

Note: you can exit or go up more than one block at a time using UPn/EXITn,
where n is a number, as in:
LA 100
{
LC 80
{
?A=C.B UP2
C-1.A UPNC
}
A-1.X UPNC
}

Labels are declared with a leading '*', but since you have these nice
blocks, you rarely need them.
regards, cyrille

"Yann" <kdo...@gmail.com> wrote in message

news:24520f6e-d38f-44ea...@s50g2000hsb.googlegroups.com...

Yann

Aug 7, 2008, 9:32:12 AM

Hello Jacob,

Here is a first try at an ASM arc algorithm:
http://img21.xooimage.com/files/8/6/0/fast-approximative-arc-v1.0-570f4c.zip

I've called it Fast Approximative Arc, because it can be off by one
pixel.
As I was not smart enough to properly understand how you check whether
a pixel must be drawn,
I tried to invent a "homebrew" algorithm, based on circumference
length
(the famous Circ = 2 x Pi x R).
Well, on paper it looks pretty and simple, but it comes with problems
of its own, especially estimating the distance along a trail of pixels
and correcting the accumulated error relative to the "real" circle.
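
To make the circumference idea concrete, here is a rough Python sketch. It is only a guess at the general approach, not the code in the zip above: `arc_length` and `trail_length` are hypothetical helpers, and measuring a pixel trail as 1 per axial step and sqrt(2) per diagonal step is an assumption.

```python
import math


def arc_length(radius, start_bearing, end_bearing):
    """Length in pixels of the clockwise arc between two bearings (degrees)."""
    sweep = (end_bearing - start_bearing) % 360
    if sweep == 0:
        sweep = 360  # a 0-degree sweep is taken to mean a full circle
    return 2 * math.pi * radius * sweep / 360


def trail_length(pixels):
    """Length of a pixel trail: 1 per axial step, sqrt(2) per diagonal step."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pixels, pixels[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total
```

With helpers like these, drawing would start once the trail walked from North reaches the length of the arc up to the start bearing, and stop at the length up to the end bearing. The trail length of the drawn pixels will generally not match the true arc length exactly, which is the accumulated error mentioned above.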

As a consequence, this version is not directly comparable with yours.
I would appreciate it if you could give me a hand understanding how
you test [pixel within arc]; that way it would be possible to create
code which uses the same algorithm, the remaining differences coming
only from the programming language.

Well, nonetheless, I'm providing binaries for this version, because it
can still be useful for comparison in the future.
Input is as follows: X0, Y0, Radius (in pixels), StartBearing,
EndBearing (in degrees, with 0° being North).
As the approximation error increases with the radius, this version
limits R to 60 max.

Here are some performance results, from an HP48SX (I expect later
models to perform better):
30 pixels radius
Full : 0.50s
Half : 0.29s
Quarter : 0.18s
Eighth : 0.12s
1° : 0.07s (Note: a 0° arc means "Full Circle", so it cannot be tested)

Yann

Aug 7, 2008, 10:24:27 AM

Thanks Cyrille for these explanations;
they helped me a lot in understanding MASD code, well, just a little
for now.

I tried to compile your source, but Debug4x complains about 2 missing
entries:
=IRAMBUFF : which I could find on the Internet ( =IRAMBUFF EQU #800F5 )
but is not part of the "HP standard" or "Carsten stable" entry lists,
which makes me worry about whether this mnemonic is valid across the
entire HP calculator range.
=fGray : which I couldn't find anywhere; it seems related to greyscale
plotting?

I guess this sample code is part of a much bigger system, into which
it is supposed to be integrated.

Anyway, it provided some very useful information on how to optimize
ASM code, and here are my findings:

1) The first visible difference is in the algorithm itself.
I'm using X²+Y²<=R²+Err, while Toolbox only tracks the delta between
successive iterations.
This second choice seems more efficient, so I changed my code to
reflect it;
however, it made virtually no difference.
The reason is that both methods use efficient additions, and any
advantage of the second algorithm is probably dwarfed by other parts
of the code, which are much more costly.
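
The two approaches can be put side by side in a short Python sketch. This is an illustration only: the exact form of the direct test used in either ASM program is not shown in the thread, so `octant_direct` below uses the midpoint form of the circle equation as an assumption, and both function names are hypothetical.

```python
def octant_direct(r):
    """One octant, re-evaluating the circle equation at every step."""
    points, x, y = [], 0, r
    while x <= y:
        points.append((x, y))
        # Integer form of "midpoint (x+1, y-1/2) lies outside or on the
        # circle": (x+1)^2 + (y-1/2)^2 >= r^2.
        if (x + 1) ** 2 + y * y - y >= r * r:
            y -= 1
        x += 1
    return points


def octant_incremental(r):
    """Same octant, tracking only the change in the decision value d."""
    points, x, y, d = [], 0, r, 1 - r
    while x <= y:
        points.append((x, y))
        if d < 0:
            d += 2 * x + 3
        else:
            d += 2 * (x - y) + 5
            y -= 1
        x += 1
    return points
```

Both functions visit the same pixels, because d always equals (x+1)² + y² - y - r². The incremental version replaces the squarings with a couple of additions per step, which matches the observation that the gain is small once the squares themselves are already computed by additions.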

Hence I started counting cycles to find out, especially in the inner
loop.
And the results are pretty interesting.

2) The most costly single operation is the multiplication to find the
correct GROB nibble: 360 cycles per pixel!

Solution: instead of relying on a #MUL call, your code proposes to
expand the iterations directly within the code.
This will most certainly bring sizable performance benefits (it is
difficult to do worse than 360 cycles...), so I will mimic this idea.
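
The excerpt does not show how the posted code avoids the multiplication, so the following Python sketch is only one possible reading: unrolling the shift-and-add steps for a known row width. The constant `ROW_NIBBLES = 34` (a 131-pixel row padded to 17 bytes), the bit order in `pixel_mask` and all function names are assumptions for illustration.

```python
ROW_NIBBLES = 34  # assumed: a 131-pixel-wide row padded to 17 bytes


def nibble_offset_mul(x, y):
    """Reference version: one general multiplication per pixel."""
    return y * ROW_NIBBLES + x // 4


def nibble_offset_shift(x, y):
    """Same offset with the multiply unrolled: 34*y = 32*y + 2*y."""
    return (y << 5) + (y << 1) + (x >> 2)


def pixel_mask(x):
    """Bit within the nibble, assuming the leftmost pixel is the low bit."""
    return 1 << (x & 3)
```

A fixed unrolling like this only works when the row width is known in advance. For a grob of arbitrary width, the same saving could come from adding or subtracting the row width each time y changes by one, instead of recomputing the offset from scratch.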

3) I tend to use the scratch registers (R0 to R4) as store/load areas.
Because I need many more variables than there are scratch registers, I
store several of them per register, using CSLW5, ASLW5, etc. calls to
switch between variables.
Bad idea: each of these calls costs 124 cycles. Quite sizable; I
wasn't expecting that.
There were 6 such calls within the inner loop. By reducing them to 4,
I can already measure some noticeable benefits.
I believe this is the current main performance bottleneck, as there
are calls like these everywhere in the code.

Solution: your code uses a much cleverer field separation, using X & M.
As a consequence, values are limited to -2048/+2047, but this looks
sufficient for such an algorithm.
What is even more striking is that you keep most data in the working
registers (A,B,C,D), seldom using scratch registers for storage. Now
that's a feat. And I guess it produces some impressive results.
I myself tend to clear the working registers between each part of the
code, relying only on the scratch registers to keep data safely.
I'm not sure I will be able to go as far. But as a minimum,
eliminating CSLW5-like calls should be a goal, and the X/M field
separation seems a very clever way to achieve that. I will try to
implement it later, because it will require some significant code
refactoring (well, nearly rewriting everything...).
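
As a rough model of the X/M separation, the Python sketch below treats a 64-bit register as an integer, with the X field in nibbles 0-2 and the M field in nibbles 3-14. The helper names are hypothetical, and what the posted code actually keeps in each field is not shown in the thread. A 12-bit two's-complement X field is where the -2048/+2047 limit comes from.

```python
X_MASK = 0xFFF                      # X field: nibbles 0-2 (12 bits)
M_SHIFT = 12                        # M field: nibbles 3-14 (48 bits)
M_MASK = ((1 << 48) - 1) << M_SHIFT


def set_x(reg, value):
    """Store a signed 12-bit value in the X field, leaving M untouched."""
    if not -2048 <= value <= 2047:
        raise ValueError("X field holds -2048..+2047 only")
    return (reg & ~X_MASK) | (value & X_MASK)


def get_x(reg):
    """Read the X field back as a signed value."""
    raw = reg & X_MASK
    return raw - 4096 if raw >= 2048 else raw


def set_m(reg, value):
    """Store an unsigned value in the M field, leaving X untouched."""
    return (reg & ~M_MASK) | ((value << M_SHIFT) & M_MASK)


def get_m(reg):
    """Read the M field back as an unsigned value."""
    return (reg & M_MASK) >> M_SHIFT
```

On the real CPU the field is selected by the instruction itself, so two variables can share one register with no shifting at all, which is the saving over the CSLW5-style calls.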

4) I had one w->W call within the inner loop (88 cycles), which has
been eliminated.
This immediately produced measurable benefits.

So I guess one of the conclusions is: counting cycles pays off.

5) Coding differences
5.1) I tend to re-use "long" code sequences as much as possible,
using GOSUB/RTN and different initialisation variables.
Toolbox code, on the other hand, is very straightforward, not even
relying on circle symmetry.
As a consequence, Toolbox is likely to produce longer code, but with a
performance advantage.
For the time being, I think I will keep this part as it is, as I like
the memory footprint advantage.
5.2) Toolbox makes little use of external calls, such as POP#, #MUL,
and so on.
I, on the other hand, have been using them heavily up to now.
As a consequence, Toolbox code is probably longer, but faster and,
even better, more controllable.
For example, POP# fills A.A with the system integer from the stack,
but also destroys C & D in the process.
#MUL gives the result in B, but destroys the source data in A & C.
I think this is a pretty significant advantage, especially within
inner loops, so I will follow your example.

Well, that's all for now.
Thanks for the tips, this was very interesting to learn.
I will try to make use of them and produce a better version over the
next few days.

Regards

Jacob Wall

Aug 7, 2008, 9:09:00 PM

On Aug 7, 6:32 am, Yann <kdo4...@gmail.com> wrote:
> Hello Jacob
>

> Here is a first try at an ASM Arc Algorithm :http://img21.xooimage.com/files/8/6/0/fast-approximative-arc-v1.0-570...

Hello Yann,

I'll repost the program I came up with and put some explanatory notes
in along with the code to try and explain it. This is something I am
not very good at, but should get in the habit of doing regardless. So
here goes from the top:

***// This prepares the input for the program, which ends up as:
LAM 1 - LAM 8 are all lists of {MIN MAX} pixel in x (or y) direction,
and by default I declare all 8 octants as MIN= BINT0 and MAX=
(square root of half the radius squared), meaning the full octant
would be drawn unless otherwise overridden
LAM 9 = x to be used in Bresenham loop, initially BINT0
LAM 10 = d to be used in Bresenham loop, initially 1-radius
LAM 11 = y=radius
LAM 12 = bearing to end of arc, clockwise from North (as a surveyor
this makes more sense to me)
LAM 13 = bearing to beginning of arc
LAM 14 = Y0, y pixel coordinate of arc center
LAM 15 = X0, x pixel coordinate of arc center
//***

11GETLAM
UNCOERCE
12GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
12GETLAM
COERCE

***//The above calculates the delta pixel in x and y directions from
center to the end of the arc by using the polar to rectangular
%POL>%REC and leaves in the stack:
3: deltay (Absolute) from center of arc to pixel that defines the end
of the arc as a BINT
2: deltax, same as above
1: bearing to end of arc
NOTE: because my program was designed to be intuitive for me when
handling compass bearings, the x and y values will be switched later
on; conveniently, if you swap the x and y rectangular values you can
use %REC>%POL and get compass bearings, but that is really another
topic
//***
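
The swap trick in the note above can be checked with a few lines of Python. This is a sketch outside the SysRPL program; the function names are hypothetical, with `math.atan2` standing in for %REC>%POL. Passing the offsets in swapped order makes the angle run clockwise from North instead of counterclockwise from East.

```python
import math


def offsets_from_bearing(radius, bearing_deg):
    """(dx, dy) from the center, dx east and dy north, for a compass bearing."""
    b = math.radians(bearing_deg)
    return radius * math.sin(b), radius * math.cos(b)


def bearing_from_offsets(dx, dy):
    """Compass bearing in degrees, clockwise from North."""
    # atan2 with its arguments swapped measures from the y axis, clockwise.
    return math.degrees(math.atan2(dx, dy)) % 360
```

Note that dy is positive towards North here, while pixel rows grow downwards, so the pixel row is Y0 minus dy.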

11GETLAM
UNCOERCE
13GETLAM
%POL>%REC
%ABS
SWAP
%ABS
SWAP
COERCE2
13GETLAM
COERCE

***//This does the same as before except for the beginning of the arc
so now we have 6 values in the stack:
6: deltay (Absolute) from center of arc to pixel that defines the end
of the arc as a BINT
5: deltax, same as above BINT
4: bearing to end of arc BINT
3: deltay (Absolute) from center of arc to pixel that defines the
beginning of the arc as a BINT
2: deltax, same as above BINT
1: bearing to beginning of arc BINT
//***

***//OK, there are a number of things going on in the 2 loops here. The
"outer" loop takes the bearing to the beginning of the arc in the
first pass (then the bearing to the end of the arc in the second pass)
and passes it to the "inner" loop to figure out which octant that
bearing falls in. Once a match is made, 2 flags are generated that
determine, for the corresponding arc, whether the Bresenham loop is
increasing the x or the y value during each iteration, and also
whether the iterations proceed clockwise or counterclockwise. For
example, the first octant (looking from north to 45 degrees clockwise)
uses the x value as it goes through the iterations and the direction
is clockwise; the second octant increases the y value and the
iteration goes counterclockwise. So basically the end product is that,
for the 2 octants (or 1 octant) involved, these two loops put the
correct {MIN MAX} values in the LAM1-LAM8 lists, which tell the
Bresenham loop whether or not to draw the pixel during the iteration.
//***

13GETLAM
12GETLAM
#>
BINT9
BINT1
DO
DUP
INDEX@
12GETLAM
#>
INDEX@
13GETLAM
#<
ROT
ITE
AND
OR
IT
::
BINT0
INDEX@
PUTLAM
;
LOOP
DROP

***//This loop stores BINT0 into any octant LAMs that are not
contained in the arc (BINT0 is not a list, and later a test is
performed to check whether the corresponding LAM is a list or not)
//***
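
The test inside that DO loop can be restated in Python as follows. This is a sketch that mirrors the stack logic above, assuming LAM 13 and LAM 12 hold the start and end octant numbers (1 to 8) at this point, which the excerpt does not show; the function and variable names are hypothetical.

```python
def octants_outside_arc(start_octant, end_octant):
    """Octant numbers (1..8) that the loop would overwrite with BINT0."""
    wraps = start_octant > end_octant        # 13GETLAM 12GETLAM #>
    outside = []
    for i in range(1, 9):                    # BINT9 BINT1 DO ... LOOP
        after_end = i > end_octant           # INDEX@ 12GETLAM #>
        before_start = i < start_octant      # INDEX@ 13GETLAM #<
        if wraps:
            excluded = after_end and before_start   # ITE picks AND
        else:
            excluded = after_end or before_start    # ITE picks OR
        if excluded:
            outside.append(i)
    return outside
```

For example, an arc from octant 7 to octant 2 wraps through North, so only octants 3 to 6 are blanked. An arc that starts and ends in the same octant is treated as non-wrapping by this test.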