
6502 Delay Loops using Applesoft


Allen Bong

Nov 25, 2010, 6:08:36 PM
Hi,

Has anyone written a general delay-loop calculator in
Applesoft, to calculate the constants needed to generate the
required time delays in 6502 code?
For example:

ZP1 EQU $XX ;ANY UNUSED ZP ADDR
ZP2 EQU $XX ;ANY UNUSED ZP ADDR

DELAY PHA
LDA K1
STA ZP1
LOOP1 LDA K2
STA ZP2
LOOP2 DEC ZP2
BNE LOOP2
DEC ZP1
BNE LOOP1
PLA
RTS

I have searched through Google and never found one, though there are
some written in Java for other CPUs and MCUs. I ask because I need
them from time to time for small projects, and if someone has already
written one, I wouldn't have to reinvent the wheel.

Thank you.

Allen

Michael J. Mahon

Nov 25, 2010, 7:26:58 PM

I've done several projects using such techniques, but each is customized
to its particular usage.

When I get a chance, I'll provide more detail.

-michael - NadaNet 3.1: http://home.comcast.net/~mjmahon

Michael J. Mahon

Nov 26, 2010, 1:16:35 AM

The sound synthesis routines I use generate cycle-accurate pulse widths.
The assembly language code for them is generated by an Applesoft program
that schedules the "work" instructions around the precisely timed events
using delay padding instructions where necessary.

I have not created any "general purpose" code for generating delays,
since every situation is different and usually requires adapting the
code to the specific situation.

I began a few years ago to write an article on the general topic of
cycle-accurate computing, considering several different cases that
arise in practice. For several reasons, I never completed this article,
but I'm happy to make my draft available to you. (I've emailed the
Word document to you.)

The most general technique for generating arbitrary cycle delays is
the use of nested loops, as you suggest. To obtain greater resolution,
I generally use index register loops (5 cycles per inner loop). With
nested loops and arbitrary initialization of X and Y, it is possible to
create compact delay routines with 5-cycle resolution and a range of up
to over 328000 cycles. The total delay can then be easily padded to
single-cycle accuracy by adding a few extra instructions outside the
loop.

If a run-time variable precise delay is needed, a combination of
computed loop constants and branching to a variable target can
achieve any desired delay at the cost of adding the computation
code outside the timed operation(s).

If you already have a loop structure in mind, then algebra will
allow you to find the equation for the initial constants, as well
as reveal the resolution available. Each loop will have its unique
timing equation, requiring its own constant computation.

In the example you give, you re-initialize ZP2 for each iteration
of LOOP1. This is not necessary, and it will increase the range
of delays if you only initialize it once and let it cycle from 255
for all subsequent iterations.

The resolution of your loop is 8 cycles.
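
As a rough check of that 8-cycle figure, here is a small Python sketch (Python rather than Applesoft, purely for illustration) that counts cycles for the DELAY routine in the original post. It assumes K1 and K2 are loaded as immediate values (2 cycles each), that no branch crosses a page boundary, and it leaves out the caller's JSR. The function names are made up for this example.

```python
def count(k):
    """A loop counter of 0 means 256 passes on the 6502."""
    return 256 if k == 0 else k

def delay_cycles(k1, k2):
    """Closed form for the DELAY routine: 17 + K1 * (8 * K2 + 12)."""
    return 17 + count(k1) * (8 * count(k2) + 12)

def delay_cycles_stepped(k1, k2):
    """Same count, added up one instruction at a time."""
    cycles = 3 + 2 + 3              # PHA, LDA #K1, STA ZP1
    zp1 = count(k1)
    while True:
        cycles += 2 + 3             # LDA #K2, STA ZP2
        zp2 = count(k2)
        while True:
            zp2 -= 1
            cycles += 5             # DEC ZP2
            if zp2 == 0:
                cycles += 2         # BNE LOOP2 not taken
                break
            cycles += 3             # BNE LOOP2 taken
        zp1 -= 1
        cycles += 5                 # DEC ZP1
        if zp1 == 0:
            cycles += 2             # BNE LOOP1 not taken
            break
        cycles += 3                 # BNE LOOP1 taken
    return cycles + 4 + 6           # PLA, RTS

# Raising K2 by one adds 8 cycles per outer pass.
print(delay_cycles(1, 2) - delay_cycles(1, 1))
```

Under these assumptions each extra inner pass costs DEC (5) plus a taken BNE (3), which is where the 8-cycle step comes from.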

For my usual loop:

ldy #ycnt ; (2 cycles)
ldx #xcnt ; (2 cycles)
delay dex ; (2 cycles)
bne delay ; (3 cycles in loop, 2 cycles at end)
dey ; (2 cycles)
bne delay ; (3 cycles in loop, 2 cycles at end)

Notice that only the duration of the first X loop is set by "xcnt",
since X is not reloaded after the first pass through the loop. This
is actually an advantage, since all Y loop iterations after the first
will incur the maximum delay in the X loop, and the first iteration
can be used to "trim" the total delay with a resolution of 5 cycles.

When this nested loop is executed, for all but the final Y iteration,
the DEY and BNE will add another 5 cycles to the loop execution time,
but on the final iteration, they will only add 4 cycles. As a result,
the execution time for this nested delay loop is 2 + 2 + (5 * xcnt) - 1
+ (ycnt-1) * (5 * 256 - 1 + 5) + 4, or, after simplification:

1284 * (ycnt - 1) + 5 * xcnt + 7

where "xcnt" and "ycnt" can range from 1 to 256 (which is represented by 0).

This is the kind of analysis which can be applied to any such nested
loop scheme, and the equation can then be solved for ycnt and xcnt
(in that order).
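
One way that solving step might look is sketched below in Python (chosen only for illustration; the same arithmetic ports directly to Applesoft). It takes the timing equation above as given and assumes no branch crosses a page boundary. The names `solve`, `loop_cycles`, `operand`, and the `pad` value (extra cycles to add outside the loop) are inventions for this example, not part of any existing tool.

```python
LOOP_MIN = 12        # xcnt = ycnt = 1
LOOP_MAX = 328707    # xcnt = ycnt = 256

def loop_cycles(xcnt, ycnt):
    """Timing equation from the post; counts run from 1 to 256."""
    return 1284 * (ycnt - 1) + 5 * xcnt + 7

def solve(target):
    """Return (xcnt, ycnt, pad) with loop_cycles(xcnt, ycnt) + pad == target.

    pad is the number of extra cycles to add outside the loop. A pad of
    exactly 1 cycle cannot be built from 6502 instructions, so one loop
    iteration is traded for 5 more pad cycles in that case.
    """
    if not LOOP_MIN <= target <= LOOP_MAX:
        raise ValueError("target outside the range of this loop")
    rest = target - 7
    ycnt = (rest - 5) // 1284 + 1          # solve for ycnt first
    rest -= 1284 * (ycnt - 1)              # 5..1288 cycles left for the X loop
    xcnt = min(rest // 5, 256)             # then xcnt
    pad = rest - 5 * xcnt
    if pad == 1:
        if xcnt > 1:
            xcnt, pad = xcnt - 1, 6
        elif ycnt > 1:
            xcnt, ycnt, pad = 256, ycnt - 1, 10
        else:
            raise ValueError("13 cycles cannot be reached with this loop")
    return xcnt, ycnt, pad

def operand(count):
    """Byte to assemble for a count; 256 is encoded as 0."""
    return count & 0xFF

print(solve(10205))   # (242, 8, 0)
```

For a target of 10205 cycles the sketch returns xcnt = 242, ycnt = 8 and no padding, since 1284 * 7 + 5 * 242 + 7 = 10205.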

-michael

NadaNet and AppleCrate II: parallel computing for Apple II computers!
Home page: http://home.comcast.net/~mjmahon

"The wastebasket is our most important design
tool--and it's seriously underused."

Allen Bong

Nov 26, 2010, 2:15:30 AM

Thank you very much, Michael, for providing such useful information.
I will read through your post and document carefully.

I will reply here if I have any more questions.
Best regards,

Allen

Scott Alfter

Dec 2, 2010, 12:12:56 PM
In article <02b1803a-1471-40cb...@o23g2000prh.googlegroups.com>,

Allen Bong <allenb...@gmail.com> wrote:
>Has anyone written a general delay loop calculating program in
>Applesoft, to calculate the constants required to generate the
>required time delays in 6502 codes?

The handful of times I've needed timing-critical delays (in an audio player
way back in the day, and more recently in some hardware bit-banging code),
it was easy enough to do them manually. The 6502 datasheet tells you how
many cycles each instruction needs to execute:

http://archive.6502.org/datasheets/rockwell_r650x_r651x.pdf
(go to page 10; it's the number in the lower right corner for each
instruction)

The Apple II runs at approximately 1 MHz (it's actually a smidge faster than
that), so one cycle takes about one microsecond. The shortest run time for
any instruction is two cycles, or about two microseconds. For short delays
(small multiples of 2 us), a block of NOPs will do. For longer delays, the
math to calculate how long a loop will take is fairly simple. Consider
this:

LDY #10
]1 DEY
BNE ]1

The first two instructions take 2 cycles each. The branch takes 3 cycles if
the branch is taken to the same page, 4 if the branch is taken to a
different page, or 2 if the branch isn't taken. Assuming that the code is
all on the same page (which it will be most of the time, but you'd want to
verify where the assembler decides to place your code), LDY executes once,
DEY executes 10 times, and BNE is taken nine times and not taken once.

2 LDY
10*2 DEY
9*3 BNE taken
+ 2 BNE not taken
=====
51 cycles total

For nested loops, you'd want to calculate the runtime for the inner loop and
plug that into your calculation for the outer loop.

I suppose if I had a regular need for calculating delay loops, I might've
knocked together an app to do that. As shown above, though, they're simple
enough to put together by hand as you need them.

_/_
/ v \ Scott Alfter (remove the obvious to send mail)
(IIGS( http://alfter.us/ Top-posting!
\_^_/ rm -rf /bin/laden >What's the most annoying thing on Usenet?



Michael J. Mahon

Dec 4, 2010, 3:12:49 AM
Scott Alfter wrote:
> In article <02b1803a-1471-40cb...@o23g2000prh.googlegroups.com>,
> Allen Bong <allenb...@gmail.com> wrote:
>
>>Has anyone written a general delay loop calculating program in
>>Applesoft, to calculate the constants required to generate the
>>required time delays in 6502 codes?
>
>
> The handful of times I've needed timing-critical delays (in an audio player
> way back in the day, and more recently in some hardware bit-banging code),

Scott, I'd like to acknowledge you for your SoftDAC series, and
in particular for your original (mini-assembler) 3-bit version.

I downloaded it from Applelink Personal Edition (now AOL) in 1990, and
it started me on my various sound players/synthesizers, culminating in
DAC522 in 1993.

It was my amazement at hearing my Apple say "Insert Disk" quite
clearly that got me hooked on software DACs and what can be done
with them.

Thank you!

> it was easy enough to do them manually. The 6502 datasheet tells you how
> many cycles each instruction needs to execute:
>
> http://archive.6502.org/datasheets/rockwell_r650x_r651x.pdf
> (go to page 10; it's the number in the lower right corner for each
> instruction)

There are two pages (one for the 6502 and one for the 65C02) in Jim
Sather's _Understanding the Apple //e_ that provide even greater
detail on instruction timing, detailing the bus activity during each
cycle of each op. This can be very useful in trimming down to one
cycle, since the actual time of an access is often the critical event.
(The .doc I sent Allen contains those two pages as reference material.)

> The Apple II runs at approximately 1 MHz (it's actually a smidge faster than
> that), so one cycle takes about one microsecond. The shortest run time for
> any instruction is two cycles, or about two microseconds. For short delays
> (small multiples of 2 us), a block of NOPs will do. For longer delays, the
> math to calculate how long a loop will take is fairly simple.

I put together a short table of compact time delay instruction sequences
(depending on which register, if any, is available at the moment).

And for precise timing, Sather computes the actual average clock
frequency of Apple II models precisely. For most purposes, the rate
of 1.0205MHz is close enough.

When jitter is important, the 140ns "long cycle" at the end of each
scan line may need to be considered. Hopefully, it is not a problem,
since the only way to control it is to synchronize with the video
generator, which creates its own timing issues.

-michael

NadaNet 3.1 for Apple II parallel computing!

Allen Bong

Dec 4, 2010, 8:05:33 AM
On Dec 3, 1:12 am, sc...@alfter.DIESPAMMERSDIE.us (Scott Alfter)
wrote:

Hi Scott,

Actually, I was thinking of a more general-purpose form of
delay-loop calculator, not just for use on the Apple II. For example,
you might design a 6502 SBC running at 1.5MHz or 2MHz using a 6502A,
6502B, or 65C02 (CMOS).

Something like the ones for the PIC would be most helpful. Just enter
the oscillator frequency used by the 6502 and how much delay time you
want in us, ms, or s, then press "calculate", and a subroutine is
produced. Very cool, isn't it?

Like the ones here:

http://www.golovchenko.org/cgi-bin/delay

http://www.biltronix.com/picloops.html

I don't expect to have that level of convenience, but at least one written
in Applesoft would be highly appreciated. If no one has done that, I
might try to roll one out myself in Applesoft.
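
The first step of such a calculator, turning a clock frequency and a delay time into a cycle count, might look like the Python sketch below (Python only for illustration; it is one line of arithmetic in Applesoft as well). The default clock of 1,020,500 Hz is the average Apple II rate quoted elsewhere in this thread; the function name and the unit strings are assumptions made for this example.

```python
def delay_to_cycles(delay, unit="us", clock_hz=1_020_500):
    """Convert a delay to a whole number of CPU cycles (rounded to nearest)."""
    scale = {"us": 1e-6, "ms": 1e-3, "s": 1.0}[unit]
    return round(delay * scale * clock_hz)

print(delay_to_cycles(10, "ms"))             # Apple II: 10205 cycles
print(delay_to_cycles(1, "ms", 2_000_000))   # 2MHz SBC: 2000 cycles
```

The resulting cycle count is what the loop constants then have to be solved for; the conversion itself does not depend on which loop structure is used.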

Allen

Allen Bong

Dec 4, 2010, 8:28:59 AM
On Dec 4, 4:12 pm, "Michael J. Mahon" <mjma...@aol.com> wrote:

Michael,

Sorry that I didn't reply to you here sooner, as I was tied up with some
projects that I was finishing off. I read your doc with great
interest, and I think there is a typo in the table on page 5 of your text.


Delay Padding Code

Cycles  Bytes  Code sequence        Alternate code sequence
   1      -    (not possible)
   2      1    nop
   3      2    lda zp               sta zptrash
   4      2    nop; nop
   5      3    nop; lda zp          nop; sta zptrash
   6      3    nop; nop; nop
   7      2    php; plp
   8      4    nop; nop; nop; nop
   9      3    php; plp; nop
  10      4    php; plp; lda zp     php; plp; sta zptrash
  11      4    php; plp; nop; nop
  12      3    jsr rtsloc

For delays 3, 5, and 10, instead of "lda zptrash", you typed "sta zptrash".
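
The table can be checked mechanically. The Python sketch below (Python only for illustration) stores each row as a string and adds up standard 6502 cycle counts: 2 for nop, 3 for a zero-page lda or sta, 3 for php, 4 for plp, and 12 for a jsr to a bare rts (6 + 6). The dictionary and function names are made up for this example.

```python
# Cycle costs assumed for the instructions used in the table
# (zero-page addressing for lda/sta; "jsr rtsloc" includes the RTS it lands on).
CYCLES = {"nop": 2, "lda zp": 3, "sta zptrash": 3,
          "php": 3, "plp": 4, "jsr rtsloc": 12}

PADDING = {
    2: "nop",
    3: "lda zp",
    4: "nop; nop",
    5: "nop; lda zp",
    6: "nop; nop; nop",
    7: "php; plp",
    8: "nop; nop; nop; nop",
    9: "php; plp; nop",
    10: "php; plp; lda zp",
    11: "php; plp; nop; nop",
    12: "jsr rtsloc",
}

def sequence_cycles(seq):
    """Total cycles for a ';'-separated instruction sequence."""
    return sum(CYCLES[op.strip()] for op in seq.split(";"))

def with_sta(seq):
    """Alternate form from the table: same timing, using sta zptrash."""
    return seq.replace("lda zp", "sta zptrash")

for cycles, seq in PADDING.items():
    assert sequence_cycles(seq) == cycles
print(with_sta(PADDING[10]))   # php; plp; sta zptrash
```

Because lda zp and sta zptrash both take 3 cycles, swapping one for the other never changes a row's total.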

You used two examples to explain how to generate the shorter delays for
producing a 4kHz audio tone and the longer delays for producing a 300Hz
tone. Both are very well explained, but I have difficulty
understanding how the number of cycles is calculated in the formula
for the delay loop below:


ldy #ycnt ; (2 cycles)
ldx #xcnt ; (2 cycles)
delay dex ; (2 cycles)
bne delay ; (3 cycles in loop, 2 cycles at end)
dey ; (2 cycles)
bne delay ; (3 cycles in loop, 2 cycles at end)

is 2 + 2 + (5 * xcnt) - 1 + (ycnt-1) * (5 * 256 - 1 + 5) + 4

I have no problem with 2+2+(5*xcnt)-1, but I have some difficulty
understanding (ycnt-1)*(5*256-1+5)+4. Where do the -1, +5, and +4
come from? Would you explain a little bit more?

As for the servo delay, I haven't come to that yet.

Thank you very much.

Allen


Michael J. Mahon

Dec 4, 2010, 5:01:48 PM

Actually, that's not a typo. ;-)

A reason for using "sta zptrash" instead of "lda zp" is to preserve
the content of the A register and the flags (at the cost of destroying
a zero page location: zptrash). This is often a good tradeoff.

> You used two examples to explain how to generate the shorter delays for
> producing a 4kHz audio tone and the longer delays for producing a 300Hz
> tone. Both are very well explained, but I have difficulty
> understanding how the number of cycles is calculated in the formula
> for the delay loop below:
>
>
> ldy #ycnt ; (2 cycles)
> ldx #xcnt ; (2 cycles)
> delay dex ; (2 cycles)
> bne delay ; (3 cycles in loop, 2 cycles at end)
> dey ; (2 cycles)
> bne delay ; (3 cycles in loop, 2 cycles at end)
>
> is 2 + 2 + (5 * xcnt) - 1 + (ycnt-1) * (5 * 256 - 1 + 5) + 4
>
> I have no problem with 2+2+(5*xcnt)-1, but I have some difficulty
> understanding (ycnt-1)*(5*256-1+5)+4. Where do the -1, +5, and +4
> come from? Would you explain a little bit more?

OK.

The initial "2 + 2 + (5 * xcnt) - 1" is the delay of the initial load
instructions plus the delay of the first execution of the X loop, with
xcnt as the number of iterations. The "-1" at the end compensates for
the fact that the final execution of the bne with X = 0 only takes 2
cycles, not 3.

The next part of the equation deals with any Y iterations. If ycnt = 1,
then the "dey; bne" doesn't branch, and the only additional delay is the
4 cycles taken by those instructions (that's what the "+ 4" at the end
is about). If ycnt is not equal to one, then there will be ycnt-1
iterations of the X loop with X=0 (256 iterations) at "5 * 256 - 1"
cycles per iteration, plus an additional 5 cycles for the dey and the
*taken* bne (thus, "+ 5").
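
To see each of those terms appear, the loop can be stepped through one instruction at a time. The Python sketch below (Python only for illustration) does that and compares the result with the formula. It assumes no branch crosses a page boundary and takes counts from 1 to 256; the function names are made up for this example.

```python
def stepped_cycles(xcnt, ycnt):
    """Add up cycles instruction by instruction; counts run from 1 to 256."""
    cycles = 2 + 2                   # ldy #ycnt, ldx #xcnt
    x, y = xcnt, ycnt
    while True:
        while True:
            x -= 1
            cycles += 2              # dex
            if x == 0:
                cycles += 2          # bne delay not taken (the "- 1")
                break
            cycles += 3              # bne delay taken
        y -= 1
        cycles += 2                  # dey
        if y == 0:
            cycles += 2              # bne delay not taken (dey + bne = "+ 4")
            break
        cycles += 3                  # bne delay taken (dey + bne = "+ 5")
        x = 256                      # X is 0 here, so the next X loop runs 256 times
    return cycles

def formula_cycles(xcnt, ycnt):
    """The timing expression exactly as written in the post."""
    return 2 + 2 + (5 * xcnt) - 1 + (ycnt - 1) * (5 * 256 - 1 + 5) + 4

for xcnt, ycnt in [(1, 1), (3, 2), (256, 1), (10, 5)]:
    assert stepped_cycles(xcnt, ycnt) == formula_cycles(xcnt, ycnt)
print(stepped_cycles(1, 1))   # 12, the shortest delay this loop can produce
```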

> As for the servo delay, I haven't come to that yet.
>
> Thank you very much.

You're very welcome.

-michael

NadaNet 3.1 for Apple II parallel computing!

Home page: http://home.comcast.net/~mjmahon/

Michael J. Mahon

Dec 4, 2010, 5:35:42 PM

This would be an interesting and instructive exercise, but I predict
that it will seldom be "useful", even to you. This is one of those
cases where real "generality" is only achieved by implementing a
choice among a host of specialized approaches.

There are two great design pitfalls: too much generality and too
much specialization. ;-) And the sweet spot is different for every
different application and context.

I've written several macros to generate <n cycles> or <n milliseconds>
of delay, but, like Scott, I find that a general purpose delay generator
is seldom useful. In NadaNet, I find application for two delay macros
that are more specialized and useful in the context of NadaNet, but
particular situations often arise that offer (or require) custom
timing approaches.

For example, there is often "work" to be done which occupies some part
of the delay needed, and the cycles required for the work must be
computed accurately and subtracted from the total delay requirement
before executing the padding routine. The process of computing this
"work delay" and the effort of making it require the same time in spite
of any conditional branches involved, is typically much more effort than
writing the padding routine, so very little effort is saved. (I've
often wished for an assembler that kept the current cycle count, just
like it keeps the current program counter, but it should be a *dynamic*
cycle count, and there's the rub ;-).

Further, it is common that certain registers or flags are important at
the point that the delay must occur, and a "standard" method will often
conflict with that requirement.

If you work with several situations requiring sometimes a few, sometimes
several, and sometimes many cycles of delay, you will find that while a
few "templates" are useful, no single approach or even a small set of
approaches is optimal.

Allen Bong

Dec 5, 2010, 4:01:19 PM

Oops! Sorry for thinking that that was a typo.

After your explanation, the calculation of the delay loops is becoming
clearer to me. This will help in my future projects whenever a
delay is needed.

Thanks for your time again, Michael.

Allen

Allen Bong

Dec 5, 2010, 4:18:05 PM

> For example, there is often "work" to be done which occupies some part


> of the delay needed, and the cycles required for the work must be
> computed accurately and subtracted from the total delay requirement
> before executing the padding routine. The process of computing this
> "work delay" and the effort of making it require the same time in spite
> of any conditional branches involved, is typically much more effort than
> writing the padding routine, so very little effort is saved. (I've
> often wished for an assembler that kept the current cycle count, just
> like it keeps the current program counter, but it should be a *dynamic*
> cycle count, and there's the rub ;-).
>

Yes, I have met that situation too. Recently I was writing my Morse
code beeper program and found that I needed a 700Hz tone generator for
92.3ms. So I calculated the delay required to click the speaker
700 times per second, with "LDA $C030" embedded in the loop. Then for the
outer loop I calculated the number of passes of the 700Hz loop needed
to fill 92.3ms.

So my outer loop count constant = 92.3mS/(time taken to click speaker
at 700Hz). Can that be considered as doing the job (clicking speaker)
inside delay loops?

Though it may be a waste of effort writing the delay loops, I guess
I'd still be tempted to write one - OK, just as an exercise. Projects
make my brain more active and my heart cheering when it works.

Cheers,

Allen


Michael J. Mahon

Dec 5, 2010, 7:50:36 PM

To generate a 700Hz signal, you'll need to toggle the speaker 1400
times per second.

Each reference to $C030 switches its state from low to high or high to
low, so two toggles are necessary to generate a complete rectangular
wave cycle.

> So my outer loop count constant = 92.3mS/(time taken to click speaker
> at 700Hz). Can that be considered as doing the job (clicking speaker)
> inside delay loops?

That's the right way to go about it. The problem is hierarchical:
first, generate one cycle of a 700Hz tone, then repeat it as many
times as necessary to create the length you need.
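
The arithmetic for that two-level plan can be sketched in Python (used only for illustration). It assumes the 1.0205MHz average clock rate quoted earlier in this thread and rounds to whole cycles; the function name is made up for this example.

```python
CLOCK_HZ = 1_020_500   # average Apple II clock rate quoted in this thread

def tone_plan(freq_hz, duration_s, clock_hz=CLOCK_HZ):
    """Return (cycles between speaker toggles, number of toggles)."""
    half_period = round(clock_hz / (2 * freq_hz))   # two toggles per wave cycle
    toggles = round(duration_s * freq_hz * 2)
    return half_period, toggles

print(tone_plan(700, 0.0923))   # (729, 129)
```

The 729 cycles between toggles must include the speaker access itself and the loop overhead, so the delay loop inside has to be shortened by that amount.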

> Though it may be a waste of effort writing the delay loops, I guess
> I'd still be tempted to write one - OK, just as an exercise. Projects
> make my brain more active and my heart cheering when it works.

That's what I meant by it being an interesting and instructive exercise.

Good luck...and don't hesitate to ask if questions arise.
