PEP001.PA: Or, how I managed to spend 3 days solving FizzBuzz

Warren Young

unread,

Dec 1, 2016, 10:24:02 AM12/1/16

to PiDP-8

I've just written up a long but — I hope — amusing saga about a PDP-8 assembly language problem I just solved:

https://tangentsoft.com/pidp8i/wiki?name=PEP001.PA

I'm not sure what the moral of the story is. I suppose that the only important thing is that it make you think about such things, and come to your own conclusions.

David Mercado

unread,

Dec 1, 2016, 2:55:16 PM12/1/16

to PiDP-8

That is a real eye opener into some the issues involved in writing assembly for the PDP-8 and how they can be solved. Good work!

As for me, I've always wanted to write small programs in FORTRAN on a DEC minicomputer, a much easier task. In college I wrote programs for a numerical methods class on an IBM mainframe, and wished I had my own PDP computer. Now I sort'a do!

Warren Young

unread,

Dec 1, 2016, 6:01:36 PM12/1/16

to PiDP-8

On Thursday, December 1, 2016 at 12:55:16 PM UTC-7, David Mercado wrote:

That is a real eye opener into some the issues involved in writing assembly for the PDP-8 and how they can be solved. Good work!

Thanks!

As for me, I've always wanted to write small programs in FORTRAN on a DEC minicomputer, a much easier task.

I've considered re-doing this challenge in FORTRAN, if only to see how much difference there is in the size of the assembled binary. If I took the time to pack my assembly implementation down tight, I think I could get it all into a single page of core.

Maybe you will beat me to it. I'll even spot you one core page: do it in under 2 full pages of core, and I'll be impressed.

Rick Murphy

unread,

Dec 4, 2016, 9:08:17 PM12/4/16

to Warren Young, PiDP-8

Doing this in FORTRAN would probably take up less than a "page" as long as you only include the generated FPP instructions.

(I say "page" as the FPP code doesn't have pages and uses 15 bit addresses). But let's use the 128-word definition of a page.

If you include the runtime and FPP emulator, then that's going to take up at least 8k.

For assembler, attached is a version that fits on a single page. There's a whole location free. :)

Let me be VERY clear: the original code is an impressive job for someone just getting started on PDP-8 assembler. I've been doing this for what seems like a lifetime (it has been!) and I did this - another day of fizzbuzz lost. :)

Getting down to one page:

There's several JMPs around data that aren't necessary. Move the data so those aren't required.

Nit: You define a "TCA" Pseudo-op. That's not needed, there's already a "CIA" (complement and increment AC) that does the same thing. Threw me for a minute.

Repeated use of "TSF; JMP .-1; TLS; CLA" - so I made it a routine.

TAD (3) - not needed. Replace with CLA CLL CML RAL IAC. Yeah, really. :)

Several cases of SNA followed by CLR which can be replaced with SNA CLA so the extra CLA is avoided.

Incrementing a variable in memory - don't use "CLA; TAD thing; IAC; DCA thing" - either do "CLA IAC; TAD thing; DCA thing; or, use "ISZ thing" when the skip isn't going to happen.

There's at least one other tweak that would bum another word or two. But perhaps this is getting to the brick wall. Quoting those who came before me:

/ELEKMAN'S THEOREM:  GIVEN 129 WORDS OF PDP-8 CODE, THE 129 WORDS CAN
/BE OPTOMIZED TO 128 WORDS.
/COROLLARY:  THE TIME REQUIRED TO APPLY ELEKMAN'S THEOREM INCREASES
/EXPONENTIALLY WITH THE NUMBER OF APPLICATIONS OF THE THEOREM TO A GIVEN PAGE.

(sic. *Optimized.)

Oh, yeah - use of page zero literals would help, but that would be cheating.

-Rick

--
You received this message because you are subscribed to the Google Groups "PiDP-8" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pidp-8+unsubscribe@googlegroups.com.
To post to this group, send email to pid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pidp-8/b785717c-a82a-4117-b2d4-b3510f93567a%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--

Rick Murphy, CISSP-ISSAP, K1MU/4, Annandale VA USA

EU.PA

Warren Young

unread,

Dec 5, 2016, 12:41:43 AM12/5/16

to pid...@googlegroups.com, tange...@gmail.com

On Sunday, December 4, 2016 at 7:08:17 PM UTC-7, Rick Murphy wrote:

Doing this in FORTRAN would probably take up less than a "page" as long as you only include the generated FPP instructions.

That's plausible, since it would effectively move my DECPRT and PRINTS routines (and your TYPE routine) into the FORTRAN runtime, giving the compiler considerable space to generate sloppy code.

If you include the runtime and FPP emulator, then that's going to take up at least 8k.

I'm not going to count the FORTRAN runtime against the solution, and I'm willing to assume the existence of the FPP.

I've just written an OS/8 BASIC implementation that apparently uses the FPP, as I didn't need the "subtotal" trick. It's still ridiculously long at 11 lines of code, including 3 GOTOs, all necessary! I can see what old Dijkstra was yelling about now.

For assembler, attached is a version that fits on a single page. There's a whole location free. :)

According to palbart, you're over by 1 location. It complains:

current page literal capacity exceeded at Loc = 00374

I was able to get it to fit by removing the space at the end of the "ANSWER: " constant.

Note that a version of palbart is now built as a dependency of the PiDP-8/I software, so we can all be on the same page. I did this after noticing first that Debian is still shipping a six-year-old version of palbart, and second that not all OSes have it in their package repository.

I've checked in a variant of your code. Maybe you'd like to take another look at it and try to buy me back my space character? :)

Alternately, an optimization I'd already considered is moving to "stripped ASCII" representation. It would complicate the implementation of PRINTS, but the tradeoff might work out.

Let me be VERY clear: the original code is an impressive job for someone just getting started on PDP-8 assembler.

Thank you. It isn't my first assembly language program, but it might be my single longest one. I've rarely had cause to do more than read assembly language code.

As for your own offering, you should be both proud and ashamed to have perpetrated it. Especially the ENDG hack. Proud and ashamed, I tell you. :)

There's several JMPs around data that aren't necessary.

C++ has turned me into a fan of variable locality. Still, it does seem kind of silly to insist on that when we don't actually have block scoping or a hardware stack.

Nit: You define a "TCA" Pseudo-op. That's not needed, there's already a "CIA" (complement and increment AC) that does the same thing. Threw me for a minute.

I was trying to parallel the symbology of TAD.

But also:

Yesterday upon the stair

I met a man who wasn't there

He wasn't there again today

I think he's from the CIA

It makes using that instruction...distracting. TDM TLA.

Repeated use of "TSF; JMP .-1; TLS; CLA" - so I made it a routine.

I'm a bit surprised it made a difference. It exchanges four instructions for a 6-word routine, plus the JMS at each call site. I guess there were enough of these to net out negative in instructions.

Also, I was hoping to leave DECPRT as-stock, but oh well.

TAD (3) - not needed. Replace with CLA CLL CML RAL IAC. Yeah, really. :)

Yech. :)

I've moved that into a pseudo-op and documented it, since you used it twice.

Several cases of SNA followed by CLR which can be replaced with SNA CLA so the extra CLA is avoided.

Yes, I need to get better about recognizing when microcoded instructions can be merged.

Incrementing a variable in memory - don't use "CLA; TAD thing; IAC; DCA thing" - either do "CLA IAC; TAD thing; DCA thing; or, use "ISZ thing" when the skip isn't going to happen.

Good point, thanks.

/ELEKMAN'S THEOREM: GIVEN 129 WORDS OF PDP-8 CODE, THE 129 WORDS CAN

/BE OPTOMIZED TO 128 WORDS.
/COROLLARY:  THE TIME REQUIRED TO APPLY ELEKMAN'S THEOREM INCREASES
/EXPONENTIALLY WITH THE NUMBER OF APPLICATIONS OF THE THEOREM TO A GIVEN PAGE.

Truth. Simple truth.

Oh, yeah - use of page zero literals would help, but that would be cheating.

Is that safe under OS/8? I don't mind using the autoinc registers, since any program must not expect them to be untouched beyond the bounds of a subroutine, but OS/8 stores nothing down there itself?

clasystems

unread,

Mar 12, 2017, 6:42:47 AM3/12/17

to PiDP-8, tange...@gmail.com

I see that you found the obscure reference to Lennie Elekman and Jack Burness!

On Sunday, December 4, 2016 at 9:08:17 PM UTC-5, Rick Murphy wrote:

Doing this in FORTRAN would probably take up less than a "page" as long as you only include the generated FPP instructions.
(I say "page" as the FPP code doesn't have pages and uses 15 bit addresses). But let's use the 128-word definition of a page.

If you include the runtime and FPP emulator, then that's going to take up at least 8k.

For assembler, attached is a version that fits on a single page. There's a whole location free. :)
Let me be VERY clear: the original code is an impressive job for someone just getting started on PDP-8 assembler. I've been doing this for what seems like a lifetime (it has been!) and I did this - another day of fizzbuzz lost. :)

Getting down to one page:
There's several JMPs around data that aren't necessary. Move the data so those aren't required.
Nit: You define a "TCA" Pseudo-op. That's not needed, there's already a "CIA" (complement and increment AC) that does the same thing. Threw me for a minute.

Repeated use of "TSF; JMP .-1; TLS; CLA" - so I made it a routine.

TAD (3) - not needed. Replace with CLA CLL CML RAL IAC. Yeah, really. :)

Several cases of SNA followed by CLR which can be replaced with SNA CLA so the extra CLA is avoided.

Incrementing a variable in memory - don't use "CLA; TAD thing; IAC; DCA thing" - either do "CLA IAC; TAD thing; DCA thing; or, use "ISZ thing" when the skip isn't going to happen.

There's at least one other tweak that would bum another word or two. But perhaps this is getting to the brick wall. Quoting those who came before me:

/ELEKMAN'S THEOREM:  GIVEN 129 WORDS OF PDP-8 CODE, THE 129 WORDS CAN
/BE OPTOMIZED TO 128 WORDS.
/COROLLARY:  THE TIME REQUIRED TO APPLY ELEKMAN'S THEOREM INCREASES
/EXPONENTIALLY WITH THE NUMBER OF APPLICATIONS OF THE THEOREM TO A GIVEN PAGE.

(sic. *Optimized.)

More original form: ANY PDP-8 program can always be optimized to occupy even less program and data space. That's closer to Lennie's Language.

Jack Burness' corrolary is the amount of time it is likely to take is a direct function of the total number of words already bummed out and can increase indefinitely without limit.

That said, the P?S/8 system device handler is by far the best possible example of this. All of these people and more have worked on it, I think a total of about a dozen people [including me] but there are "quantum effects" as well. When any of the people worked on it, eventually they ran out of time to consider removing any additional words.

Then, still others looked at it and successfully removed even more, perhaps only some bits in a word in case that might have mattered, and then eventually much of the bit optimizations became relevant along with other brainstorming proving Elekman is correct. [Note: Richard Lary and Lenny Elekman wrote the first version of this code for the R-L Monitor System. It's now amazingly more tight and adds concepts that could not have been accomplished before in some cases synergistically.

In any case, once some others worked on the code, bring the code back to one of the other people opened up new opportunities to make it even smaller, and that did include some of those care/don't care things that are esssentially bit optimizations that add up to something.

And the elapsed time for the entire gamut of the example is nearly two decades!

Yes, despite all of this,:

I found a bug in the code that has persisted for many years! It is highly unlikely to occur, but I can give examples of how to screw it up that can be documented.

fortunately, I was able to fix the bug merely by moving one set of instructions down one to move into the newly-created space one instruction, and in the process it is deleted from its former location, for a net no-change in length.

And the current status is one minor theoretical bug that only affects poorly-written interrupt-driven programs in theory, a few additional potentially useful bit combinations currently not used, but could be pressed into service, and one actual free word! Total length is still exactly one page but now supports many more kernel-related features to pass to programs running under the system.

And the very last one: I found an amazing SEVEN BITS in one word I had overlooked! They are all now allocated to features to be used in the next release.

cjl [Do NOT ever count out Elekman''s Theorem; however it is possible that I may not outlast Burness' corollary this iteration].

clasystems

unread,

Mar 12, 2017, 7:00:38 AM3/12/17

to PiDP-8, tange...@gmail.com

See below:

OS/8 lords over 07600-07777 in every field, and at least fields 0 and 1 must exist. It does NOT give a hoot about anything else, but of you enable interrupts, you MIGHT get into trouble if you are not careful. OS/;8 does not have a clean patching mechanism to avoid problems, so some have gotten into trouble, but if you leave with all the flags clear, you only have to worry if you call one of the handlers that doesn't quite play by the rules. The TD8E handler is a sore-point because it has real-time considerations for example.

In P?S/8, I handle that using the SKON instruction because I know the TD8E must be on a PDP-8/E or higher and that's always thus available.

The handler is entered and interrupt status is tested with SKON. Then depending either an IOF or an ION is built and placed in-line as the handler exits. Thus, you of necessity momentarily lose the performance of the interrupts, but when the handler exits, you are back where you were.

I expect future releases of the P?S/8 TD8E handler to do a slightly better job and allow some PARTIAL overlap during the less critical phases of the code. Worst case, some expected performance improvement fails to mateirlaize and you didn't gain [but you certainly don't lose!].

This is because the specs for P?S/8 system device handlers that cannot fit in one page [in 07600-07777] was back then slavishly following OS/8 over the cliff so to speak. It was their foolish design choice that plagues the rest of the system since, but P?S/8 was redesigned out of the necessity to handle still other devices thus allowing more code space, but the TD8E handler was never rewritten to take advantage of that huge amount of space [SEVEN more pages!].

In P?S/8, you make it fit if you can in one page, and if you cannot you need another memory field, same as OS/8, except that means 8K instead of 4K, not 12K instead of 8K, and instead of only getting 124 more words [BATCH steals the last four of the page] you get an entire 1K of the top of the HIGHEST field. This is dynamically calculated as a function of your system memory. Only if coincidentally the machine is 8K does this look more like OS/8 with the support in field 1. If you have 32K, it goes in field 7 opening up EVERY LOCATION of extended memory to all programs except you can't have any of the highest field. There are so many useful things you can do with a larger contiguous memory space, auto-indexed algorithms that are fouled by OS/8, etc. Also, OS/8 BASIC and Fortran IV are crying out for this memory model. [You should hear the incredibly ugly kludges they do instead!]. When they are ported to the P?S/8 SHELL, the kludges will be eliminated entirely.

Also, 12K machines will load more code in those programs than in 8K. Due to their respective 15-bit addressing requirements, that just doesn't happen.

I explained my model to them and they agreed it is superior, but they were literally too lazy to do it over again!

cjl

Reply all

Reply to author

Forward