cjl
Ok, I will defer reposting what may yet appear; anyone else attempting
to post to alt.sys.pdp8 on the fore-runner thread of this one?
cjl
Again, anytime.
I have made mention a few times of the "slurp" loader, and I think it
deserves its own post. Loading a program using almost no space is
certainly a similar topic to writing code that is much smaller than
"tiny" by "modern" standards.
The slurp loader was invented by Richard Lary, and was first applied
to the R-L monitor as the only way to load user binary programs into
memory, created by assembling [with PAL III] source files into binary
files. Later, the mechanism was internally applied within Poly Basic,
but doesn't appear externally as files. P?S/8 implements a superset
of the slurp loader that supports extended memory. [In fact, P?S/8
can load binary files created on the R-L monitor system directly. The
R-L directory format is a subset of one of the P?S/8 file structures;
we wrote a conversion program for the text files, however the binary
files just move across as is. P?S/8 can only process them once they
are moved, because the R-L directory is not totally compatible with
P?S/8 programs such as the binary loader; copy utilities can easily
move the files, etc.]
On most systems, you cannot directly load binary files into memory.
The reasons should be fairly obvious: Since the only memory truly
"off limits" is where the handler is resident [07600-07777], how can
you have the facility to load the rest of memory without having a
utility to occupy some of the memory where you are trying to load?
[Newer machines just evade the problem; they just can't load into large
areas of memory; still larger areas of memory are always presumed to
be available, so there is no such thing as "too big". In any case, on
any system with a "sacred" resident area, the goal is to be able to
load into all non-"sacred" memory without restrictions.]
Various PDP-8 systems have done some ingenious work to attempt to
solve this problem:
In CPS-4K, a system that runs on a 4K LINC-8 with a pair of LINCtapes,
the binary output of assembly cannot be directly loaded into memory.
Instead, there is a "build" command that builds up an image of memory
on a dedicated scratch area on one of the tapes. [CPS-4K is one of
the few systems that requires a pair of tapes. P?S/8, R-L, and even
OS/8 only require one.]
Once that image is created, it can be loaded directly into memory pre-
located. Thus, in theory you can merely ask the system handler to
load in say 00000-07577 in one operation. [CPS-4K itself is not quite
capable of doing that! Quirks and limitations of the LINC-8 are
another topic; suffice to say that in other systems, this would have
been successful, but in CPS-4K, they had to resort to an ugly kludge
just to accomplish that!]
While this is a lot better with two tapes than what would be rather
tedious with one tape, this still takes quite a lot of time. [Note:
TU56 runs tapes marginally faster than TU55, but the LINC-8 is
noticeably slower. LINCtape transfers occur every 160 microseconds
per 12-bit word instead of the usual 133 microseconds of DECtape.]
OS/8 uses a somewhat better notion: Part of memory is "protected"
from direct loading, and instead a virtual image is created, but only
for that portion of memory where the binary loader [ABSLDR] and
associated buffers and/or handler space is allocated. All other
memory is loaded directly. Portions of the relevant file are read
into a read-buffer, and the binary information is unpacked serially,
then either directly loaded into unprotected memory, or into the
appropriate section of a cached virtual buffer. After all of the
files are loaded "somewhere" the entire virtual buffer is itself read
into memory by the system handler over what a moment ago was the
loader program itself. [Note: As an implementation restriction, OS/8
throws away all attempts to load into 07600-07777.]
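
The split between direct and deferred loading described above can be sketched as a toy model in Python. This is not ABSLDR's code: the protected range PROT_LO..PROT_HI is a made-up placeholder, and the function and variable names are hypothetical.

```python
# Toy model of a "virtual" binary loader for one 4K field.
# PROT_LO..PROT_HI is a hypothetical protected range, NOT ABSLDR's real layout.
PROT_LO, PROT_HI = 0o0000, 0o1777   # assumed home of the loader and its buffers
RESIDENT_LO = 0o7600                # resident handler page, never loaded

def virtual_load(pairs, memory):
    """pairs: iterable of (address, word). Returns the words that were discarded."""
    virtual = {}        # stands in for the scratch image kept on the device
    discarded = []
    for addr, word in pairs:
        if addr >= RESIDENT_LO:
            discarded.append((addr, word))   # OS/8 behaviour: thrown away
        elif PROT_LO <= addr <= PROT_HI:
            virtual[addr] = word             # deferred: the loader lives here
        else:
            memory[addr] = word              # unprotected: loaded directly
    # Final step: the virtual image is read in over the loader itself.
    for addr, word in virtual.items():
        memory[addr] = word
    return discarded
```

A loader that keeps the 07600-07777 words instead of discarding them would return them from a side table rather than dropping them.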
P?S/8 also has a similar loader available [but this is NOT a slurp
loader; it is OPTIONAL on DECtapes; this must be introduced first
before we get to the actual slurp loader]. OS/8 is obligated to read
in multiples of two pages at a time; it also has to write out double
pages as well, as this is part of the minimum requirement of OS/8.
Thus, inherently, P?S/8, having one-page read and one-page write
capability, could theoretically create a two pages smaller protected
area, assuming the code to unpack the files takes about the same
amount of code [a reasonable assumption, however the format of P?S/8
is inherently 12-bit, while OS/8 is actually 8-bit paper-tape frames
packed in the usual "3 for 2" OS/8 format also used for ASCII files.
Thus, the P?S/8 code running the loader is slightly smaller and also
slightly faster; this can be noticeable, albeit always mostly
negligible on very fast disks, but somewhat significant on such as
DECtapes].
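
To make the "3 for 2" packing concrete, here is a minimal Python sketch of the layout as I understand the usual OS/8 convention: the first two bytes sit in the low 8 bits of two consecutive words, and the third byte is split across the two high nibbles. The function names are hypothetical.

```python
def pack_3_for_2(b1, b2, b3):
    """Pack three 8-bit bytes into two 12-bit words."""
    w1 = ((b3 & 0xF0) << 4) | (b1 & 0xFF)   # high nibble of b3 + all of b1
    w2 = ((b3 & 0x0F) << 8) | (b2 & 0xFF)   # low nibble of b3 + all of b2
    return w1, w2

def unpack_3_for_2(w1, w2):
    """Recover the three bytes from two packed 12-bit words."""
    b1 = w1 & 0xFF
    b2 = w2 & 0xFF
    b3 = ((w1 >> 4) & 0xF0) | ((w2 >> 8) & 0x0F)
    return b1, b2, b3
```

A loader reading this format has to reassemble every third frame from two words before it can even start interpreting paper-tape frames, which is the extra work a native 12-bit format avoids.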
However, by various trial and error techniques, it was determined that
OS/8 actually should have used LARGER buffers! [And it is possible to
write a replacement loader for OS/8 to accomplish this.] Thus, the
final version of the P?S/8 equivalent uses four page input and four
page output buffers to manage the virtual loading. While it is true
that now there is more memory to virtualize, the overhead of having to
deal with more memory is more than offset by the faster techniques to
load memory at all. Additionally, the larger cache buffer size makes
managing the virtual memory buffer more efficient. [Note: The P?S/8
virtual loader as described here maintains an image of attempts to
load into 07600-07777; these are not directly loaded, but there is a
documented way to obtain these additional words, as many as 128.
Applying one-off kludges, it is possible to load in certain PDP-8
system diagnostics OS/8 is incapable of, simply because OS/8 throws
these words away! Some of these are specifically for the PDP-8/e, so
this is not merely a problem of some older pre-OS/8-existence
programs. No system can do this job more easily, because these
programs were meant for use with paper tapes and are incompatible
with all O/Ses that run from storage devices; at least the P?S/8
virtual loader gives you a
possible solution, albeit kludgy, often requiring a manual bootstrap
after the fact, but at least the program is loaded into memory at
all!]
Thus, when the P?S/8 virtual loader is compared to the OS/8 loader,
there is a small edge to P?S/8 for most typical loading operations,
and both are far faster than CPS-4K's method of virtualizing all of
field 0.
However, all of this pales by comparison to the slurp loader. The
seemingly impossible is done:
Files are loaded without a read buffer, and better still, with the
tape write protected! The loading process involves virtually no tape
shoe-shining, since just about all tape motion is continuous rolling,
especially fast if the files are actually loaded contiguously on the
tapes. Files may abut in terms of physical blocks without so much as
a tape turnaround occurring. Depending on the user binary
requirement, the tape may not even have to do as much as a rewind.
So, how did he do it?
Anyone who has ever studied the PDP-11 FILEX program [written by
certain people with knowledge of the slurp loader; this legendary
technique made the rounds back then within DEC] may be aware of the
way the -11 controller gets 18-bit data. In essence, there is a flag
that goes up at some multiple of 33.3 microseconds [on the PDP-8,
each 12-bit word transfers in 133.3 microseconds, so I assume you get
18 bits in something like 200 microseconds]. Each time the flag is raised
in that mode, you can grab the two bits that were not transferred via
16-bit DMA; there is a reverse analogy for writing. Thus, an -11 can
read/write -10 and -15, etc. DECtapes.
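
The 200 microsecond estimate can be checked with a little arithmetic, assuming each tape line carries 3 data bits so that a 12-bit word spans 4 lines. The names below are hypothetical; only the 133.3 microsecond figure comes from the text.

```python
LINE_US = 133.3 / 4     # one 3-bit tape line; a 12-bit word spans 4 lines

def word_time_us(bits):
    """Time for one word of the given width, assuming 3 data bits per line."""
    lines = bits // 3
    return lines * LINE_US
```

Under that assumption word_time_us(18) comes out at six line times, which is where "something like 200 microseconds" comes from.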
On the PDP-8, there is no such thing as a word flag, but there is a
way to make the equivalent:
The DECtape is a three-cycle data-break device. What this exactly
means is as follows: Location 07754 is the word-count address; you
place there the two's complement of the number of words you want
transferred. If less than an entire block, you get what you asked
for; if larger than one block, the transfer is continued into the next
block as necessary, etc.
Location 07755 is the current-address register. You place there the
transfer address-1 of where you want the data in memory. As a
transfer occurs, the word-count register increments, the current
address register increments, and the word is transferred to the newly
updated current address indicated by the current address register. If
the word count went to zero, the transfer stops right then and there,
else keeps going for another word. You are allowed to merely wait for
it all to happen until the word count overflows, and of course that's
what all system handlers do. However, this is the slurp loader!
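
The three-cycle sequence above can be modelled in a few lines of Python. This is a sketch for field 0 only; it ignores block boundaries, timing, and the extended-memory field bits, and the function names are hypothetical.

```python
WC, CA = 0o7754, 0o7755     # word count and current address locations

def data_break(memory, word):
    """One three-cycle transfer. Returns True if the transfer continues."""
    memory[WC] = (memory[WC] + 1) & 0o7777   # cycle 1: increment word count
    memory[CA] = (memory[CA] + 1) & 0o7777   # cycle 2: increment current address
    memory[memory[CA]] = word & 0o7777       # cycle 3: store at the updated address
    return memory[WC] != 0                   # stops when the count reaches zero

def read_words(memory, tape, dest, count):
    """What a conventional handler does: set both registers, then just wait."""
    memory[WC] = (-count) & 0o7777           # two's complement of the count
    memory[CA] = (dest - 1) & 0o7777         # transfer address minus one
    for word in tape:
        if not data_break(memory, word):
            break
```

For a 128-word block the word count starts out as 7600 octal, and the transfer ends on the increment that takes it from 7777 to 0000.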
So, what the slurp loader does is to put 0000 into the word count
location. It places the address-1 of a one-word internal buffer
within the slurp loader [meaning somewhere within 07600-07777] and
waits for the word-count location [07754] to change from 0000 to 0001:
        TAD     (BUFFER-1)      /GET ADDRESS
        DCA     7755            /SETUP FOR TRANSFER
        DCA     7754            /CLEAR WORD COUNT REGISTER; COULD TRANSFER
                                /4096 WORDS IF WE LET IT
        TAD     7754            /GET CURRENT WORD COUNT
        SNA CLA                 /DID IT CHANGE?
        JMP     .-2             /NO, KEEP WAITING

/AFTER ABOUT 133 MICROSECONDS, THE LATEST TAPE WORD IS NOW IN

BUFFER, .-.
When the word count changes from 0000 to 0001, it is now true that the
one-word buffer pointed to by the updated current address register now
contains the just read in data. In the original version of the
loader, the current address register actually was also used as the
data buffer for the transfer! Thus, first the current address
register was set to 7754 [one less than itself]. When the word-count
went from 0000 to 0001, it was now true that the latest data word was
actually in 07755.
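
The polling trick can be simulated in Python as well. In this sketch the "hardware" step runs sequentially inside the loop instead of concurrently with the CPU, so it only shows the register bookkeeping, not the timing; slurp_words and handle are hypothetical names.

```python
WC, CA = 0o7754, 0o7755     # word count and current address locations

def data_break(memory, word):
    """One three-cycle transfer performed by the tape control."""
    memory[WC] = (memory[WC] + 1) & 0o7777
    memory[CA] = (memory[CA] + 1) & 0o7777
    memory[memory[CA]] = word & 0o7777

def slurp_words(memory, tape, buffer_addr, handle):
    """Take tape words one at a time through a one-word buffer."""
    for word in tape:
        memory[CA] = (buffer_addr - 1) & 0o7777   # TAD (BUFFER-1) / DCA 7755
        memory[WC] = 0                            # DCA 7754
        data_break(memory, word)                  # happens while the CPU polls
        assert memory[WC] == 1                    # the SNA CLA / JMP .-2 loop exits
        handle(memory[buffer_addr])               # deal with the word, then re-arm
```

Calling it with buffer_addr set to 0o7755 reproduces the original version: the current address register starts at 7754, is incremented to 7755, and is then overwritten by the data word itself.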
The latest word is dealt with in the loading sense. Then the two
registers are reset to the same values as the last word time. Thus,
the transfers run forever, or at least until the logical end-of-file
whenever that occurs [there is a sentinel condition] and then the
overall process is repeated for the next file [if any]. The search
routine of the slurp loader already understands "dead reckoning" so
that the next file argument will always correctly set the tape search
to forward as necessary even if the next block to transfer was
actually the next physical block on the tape, etc.
The slurp format itself breaks down into a series of 7 word groups.
There are 4 possibilities for any one word:
1) This is a data word; load it and increment the loading address.
2) This is a loading address; set for next time
3) This is a CDF instruction. Execute it to change the DF to the
designated one for extended memory support.
4) This is the end of this file.
Thus, you need two bits to express the destiny of any one of six data
words. That gives you a 12-bit control word to know what to do with
the next six data words; the tape format repeats the 7-word groups in
every block. There are 18 such groups per physical block; after every
18 iterations, move on to the next block, including searching for it
[it knows that in this situation, search forward and the block is
immediately found; repeat the process for this latest block, etc.]
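
A decoder for the 7-word groups might look like the Python sketch below. The text does not give the actual bit assignments, so everything specific here is an assumption: the control word comes first in each group, its 2-bit fields are read from the most significant pair down, and the four codes are numbered arbitrarily. The one hard fact used is that CDF n assembles as 62n1 octal.

```python
DATA, ORIGIN, CDF, END = 0, 1, 2, 3     # hypothetical 2-bit codes

def make_control(codes):
    """Build a 12-bit control word from six 2-bit codes (assumed ordering)."""
    control = 0
    for code in codes:
        control = (control << 2) | code
    return control

def load_groups(words, memory):
    """memory: dict keyed by (field, address). Returns True if END was seen."""
    field, addr = 0, 0
    for g in range(0, len(words) - 6, 7):
        control = words[g]
        for i in range(6):
            code = (control >> (10 - 2 * i)) & 3
            word = words[g + 1 + i]
            if code == DATA:
                memory[(field, addr)] = word      # load and step the address
                addr = (addr + 1) & 0o7777
            elif code == ORIGIN:
                addr = word                       # new loading address
            elif code == CDF:
                field = (word >> 3) & 7           # field number from 62n1
            else:
                return True                       # logical end of file
    return False
```

In the real loader each word is of course handled as it arrives from the tape, one at a time, rather than from a list held in memory.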
Thus, assuming no sizeable use of anything but data words [which is
typical], the efficiency of a binary file is just shy of 6/7 the
efficiency of a memory image file for the same data. This method is
what is used within Poly Basic; the R-L implementation does not
support the CDF Field setting [Note: Richard Lary never had a more-
than-4K machine to test it on!]; P?S/8 does the full set including
processing the field settings.
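
The "just shy of 6/7" figure follows from the block arithmetic. A short Python check, assuming 128-word blocks and that the two words left over after 18 groups go unused:

```python
from fractions import Fraction

BLOCK_WORDS = 128       # words per tape block
GROUP_WORDS = 7         # one control word plus six data words
DATA_PER_GROUP = 6

groups_per_block = BLOCK_WORDS // GROUP_WORDS                      # 18
data_per_block = groups_per_block * DATA_PER_GROUP                 # 108
unused_per_block = BLOCK_WORDS - groups_per_block * GROUP_WORDS    # 2
efficiency = Fraction(data_per_block, BLOCK_WORDS)                 # 27/32
```

So a block carries 108 data words out of 128, or 27/32, which is a little under 6/7 because of the two leftover words per block.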
Note that this format is data-equivalent to paper-tape binary, but
quite a bit more efficient because it is 12-bit oriented, while paper-
tape inherently takes 8 bits of data to get you 6 bits of loading into
memory. This is part of why the slurp format loads faster, even with
the virtual loader. In any case, you wouldn't want such an unwieldy
format for a slurp loader! [Note: P?S/8 has numerous BIN program
options to convert to/from paper-tape binaries for all user binary
files. All situations are supported to handle RIM and BIN format, low
and high-speed punch and reader, etc. and an automatic way to make
files loadable by the RIM and the BIN loader from the very same paper-
tape. [It's an old trick back from the paper-tape days; you have an
extraneous extra frame at the end of the tape; it gets ignored by RIM
however becomes the tape checksum if BIN is checking it. If
converting such a paper-tape to P?S/8 binary, that end frame gets
ignored, as it doesn't actually contribute any loading data. Along
the way, all superfluous origin settings are removed, thus any remnant
of RIM format can be filtered out if desired; by specifying RIM format
punching, all the necessary overhead can be put back in! In short,
all bases are covered.]]
Thus, what is placed into 07600 is the slurp loader code itself, a
fully compatible bootstrap at 07600, and the list of passed files
starting at 7757 to 7777 at most. The list is terminated with a
sentinel 0000 or it just fits into the space without a sentinel, the
technique described elsewhere that is standard throughout this family
of operating systems.
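
The list convention can be sketched in Python as follows. What each list entry actually encodes is described elsewhere and is treated as an opaque 12-bit value here; passed_files is a hypothetical name.

```python
LIST_START, LIST_END = 0o7757, 0o7777   # room for 17 entries at most

def passed_files(memory):
    """Return the list entries, stopping at a 0000 sentinel if there is one."""
    files = []
    for addr in range(LIST_START, LIST_END + 1):
        if memory[addr] == 0:
            break                       # sentinel: the list ended early
        files.append(memory[addr])
    return files                        # a full list simply runs out of room
```

The "fits exactly, so no sentinel" case saves a word precisely when space is tightest.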
Now, an astute reader may notice something is missing: Just as in the
original R-L monitor, you have a program in memory, a write-
protectable system, a restart from 07600 bootstrap capability, etc.
But no way to read or write the system device, since the system
handler is gone! [And yes, some early programs had to binary load in
their own copies of a DECtape handler; the R-L limitation applied to
user programs as well as system programs!]
Here is where what might be called "Slurp II" comes in. Borrowing
from the techniques already learned from Richard Lary's original slurp
loader, I took the trick a step further:
After all of the user code is loaded as dictated by the command-line
the user typed, suppose a "magic" additional file were passed to the
slurp loader for "loading" but this is more of a "friendly virus" than
a conventional file:
At the precise point where the user's passed files are all loaded in
completely, we are now to process the contents of the "magic" file.
At this point, several conditions have changed:
1) We need to maintain only a small few pieces of information, namely
the contents of 07756 which is the defined starting address meant to
be started with a JMP I 7756 instruction. [Note: If no starting
address was expressly given, the slurp loader is prepared to use a
"safe" address of an internal HLT [7402] instruction that differs from
that of its internal error handler. In this case, the clear AC that
results is further proof this is not an error, but rather the
innocuous consequences of the fact the user didn't supply a starting
address, etc.] Loader options also specified an extended memory
field, thus there is a CIF CDF to some field, followed by a JMP I 7756
to start the user's program up. If this set of instructions is
maintained fairly close to 07756, the set of the protected items can
be fairly small. Otherwise, the CIF CDF instruction might be able to
be copied elsewhere; the JMP I 7756 instruction is static
information. Location 07756 itself must be saved or else changed to
some other JMP I instruction using some other starting address.
Fortunately, this desperation move was never implemented, but it was
contemplated!
2) Most notably, all the words from 07757-07777 are now obsolete.
Thus, the "magic" file can specify loading of new instructions there!
[And, being extremely selective, certain portions of the slurp loader
have also just become obsolete, such as the portion devoted to
parsing the file parameter list; at this moment in time there is no
more need to handle the just finished list, thus the now *former* code
is available as program or data space to the "magic" file.]
3) When the new words are all loaded into 07757-07777, a one-word
patch is applied to the slurp loader to make it startup the code now
loaded into 07757-07777. The CIF CDF instruction can be moved if
necessary, and thus much of the slurp loader code space associated
with tape searching and related events can also be benignly usurped.
[Note: This last technique has been used elsewhere; the released
binary of Lenny Elekman's 4K paper-tape BASIC patches the paper-tape
BIN loader to make the program self-start after the paper-tape is done
reading [without errors]. Of course, it presupposes exactly which
version of the BIN loader it's being loaded by!]
4) To make it even more interesting [this wasn't necessary for
DECtape, but for other devices, this additional trick was necessary to
get even more code in], it is possible to cheat the file format and
place data words after the indicated [in the slurp 7-word group format
sense] logical end-of-file. These essentially represent a short
sequence of core-image words that can allow additional loading to deal
with other consequences, if any; this was used in the case of
supporting the RX01.
5) Thus, the slurp loader is now gone, and in its place is a piece of
code with a new mission: Reload the system bootstrap block [which is
the precise place where the system handler lives] into 07600-07753.
[Note: There is additional information in the block, but it pertains
to how P?S/8's keyboard monitor is loaded. We don't need and more
importantly don't want any locations further down in memory to be
disturbed; this is where the code that is running is loaded.] The
word count and current address are set accordingly to create the short
transfer of most, but not all of the block 0000 contents.
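
For the DECtape case, the register values for that short transfer can be worked out as below. This is a sketch of the arithmetic only, using the word count and current address conventions described earlier; setup_transfer is a hypothetical name.

```python
def setup_transfer(first, last):
    """Return (word count, current address) for loading first..last inclusive."""
    count = last - first + 1
    return (-count) & 0o7777, (first - 1) & 0o7777

wc, ca = setup_transfer(0o7600, 0o7753)   # 108 words (154 octal)
```

That gives a word count of 7624 octal and a current address of 7577 octal, so the transfer stops just short of the 07754 and 07755 locations that are controlling it.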
[Programmer's note: The source code for the "magic" file makes heavy
usage of the RELOC pseudo-op. At times, multiple origin settings have
to be in your head to ensure it's assembling what you really mean!
Each device handler includes its own slurp loader subsection to be
called up by the generic BIN program, and also its own "magic"
"file" [internally referred to as the "/I" file] and thus, despite all
of this, P?S/8 is still totally device-independent. When the system
is initially created, a dummy set is used to setup basic definitions.
Independently, TC01/08 DECtape or LINC-8 LINCtape, PDP-12 LINCtape,
DF32, TD8E, etc. handler files are loaded over the generic image to
create a system tailored to the specific hardware; this is
conceptually no different from OS/8, just more things to customize,
etc.]
Once this is done, the new contents of 07600-07777 are:
1) The system handler; the seemingly impossible has been accomplished.
2) The handler's compatible bootstrap at 07600.
3) The preloaded contents of the call to the system handler at 07632
that will save 00000-07777 into blocks 0020-0057, and then restart at
07600 to do a normal P?S/8 keyboard reboot with no further want or
need of writing on the tape. Thus, if the user specifies to start at
07632, memory is saved and the system comes up afterwards to a console
prompt. ODT, BSAVE, START or other commands can be used for further
work, if desired. However, there is no obligation to start at 07632;
it remains available however, and the user is welcome to execute the
code at 07632 to get the contents of memory preserved for further work
after the next bootup, etc. [Note: This is conceptually no different
from OS/8 from a user's standpoint; there is an address to get memory
saved and then boot, or there is an address to toss memory and then
boot. Of course each system has a different set of addresses to
accomplish this, but there is no way or even reason to maintain
arbitrarily any extraneous compatibility with a system that a) came
later, and b) requires a minimum of 8K just to boot up; on a 4K
machine, you have to make it work, not be "compatible"! We are only
talking about having to remember two address values per system here!]
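
The block range quoted for the save area is consistent with 128-word blocks, as this small Python check shows. The simple linear mapping from address to block is my assumption for illustration, and block_for is a hypothetical name.

```python
BLOCK_WORDS = 128
FIRST_BLOCK = 0o0020    # first block of the field 0 save area

def block_for(address):
    """Block that would hold a field 0 address, assuming a linear layout."""
    return FIRST_BLOCK + address // BLOCK_WORDS
```

4096 words at 128 per block is 32 blocks (40 octal), which is exactly the span from 0020 through 0057.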
As a convenience to the user, the following switches exist within the
P?S/8 BIN [the binary loader program] system program:
/V Don't use the slurp loader; use the virtual loader; the system
must be write-enabled. The system handler is used to get the binary
and virtual buffer into memory, thus it is available before, during
and after. Although somewhat faster than the OS/8 analog, still very
much slower than the slurp loader, if available. Note: In certain
hardware configurations, the slurp loader isn't available. Generally,
the slurp loader can only be implemented on devices such as the
DECmate RX50, RD51 hard disk controller, RX01/02/03, LINC-8 LINCtape,
TD8E which are all program-transfer [non-DMA] devices or three-cycle
devices such as the DF32 and the TC01/08 DECtape. [The RF08 cannot
have a slurp loader because it transfers too quickly and the code
cannot keep up!] Single-cycle DMA devices such as RK disks cannot use
this because the concept doesn't apply.
By default /V is off, unless there is no slurp loader, in which case
it's ignored and is acted on as if set on by default.
/I passes the "magic" "file" to the slurp loader; ignored if /V is set
or implied where /I must be ignored [no slurp loader]. Note: Since
the "magic" "file" is passed as a file argument, should /I apply,
there is a limitation placed on the maximum number of user-stated
files: it goes down by one to a maximum of 16. [From the slurp loader's point of
view, this is still a maximum of 17 files; the last one is always the
"magic" "file".]
Due to the tiny and clever nature of the "magic" "file" it is in fact
contained wholly in the 128 words of tape block 0060 of the system
device. [You really can't expect too many words to be needed to usurp
a piece of code that is itself less than 128 words long!] As in the
description of the % and $ files [which are at 0020 and 0040
respectively], the "magic" "file" conforms to the rounded-up to
multiples of 8 requirements of P?S/8 itself.
As a convenience to the user, there is an alias to the BIN program
called GET. It's the same program except that it forces /I and
forces /0, prevents /1 through /7, and forces =7632. /V can be used if
desired [or is forced on certain hardware as described above.]
Thus, to load in a binary user program for use with ODT, etc.:
.GET BFILE /H
.
This loads in HLT [7402] in all of memory first, then loads the binary
files over that, then does the /I "magic" stuff, all of field 0 is
saved by forcing a start at 07632, which then reboots the keyboard
monitor to the next command prompt. [Note: As explained elsewhere, the
familiar TOPS-10 or OS/8 "." prompt is a popular user preference.
Many other users prefer the prompt {the current unit number expressed
as 0 1 2 3 4 5 6 7 as required} followed by ">" which was in turn
ripped off from P?S/8 by the later systems CP/M-80, CP/M-86 and MS-DOS
with the only difference being logical drives there are expressed as
letters, not numbers.]
Note: This command should not be confused with the vaguely analogous
OS/8 command. That command is used to load already-preloaded core
image files that were already SAVEd after first being binary loaded by
ABSLDR. This applies to user binary created by the assembler, and not
yet saved in an image format. The only utility that has been applied
to this point is BIN which creates saveable binary [or its alias GET
which is just a convenience with regard to setting certain loading
options]. Additionally, the OS/8 GET has no such concept as
preloading memory before the files' contents with either 0000 or 7402,
while these are standard options of BIN/GET. The user can save any
and all memory with the BSAVE command to create binary files from the
current contents of memory; all of field 0 is saved in blocks
0020-0057, which will be used as needed for the data that will be
saved; all of extended memory is totally physical as P?S/8 itself
doesn't use any of it. This includes locations x7600-x7777 for any
existent extended memory field x. [Contrast that with OS/8's blanket
restriction on the use of x7600-x7777 in most situations for
potentially every field!]
As I posted elsewhere, there was an in-DEC "shootout" where P?S/8 on a
pair of DECtapes was able to load the binary in far faster than OS/8
running on an RK05/RK8E. It really wasn't a fair fight; DEC's
embarrassment was predictable because they were told in advance
repeatedly, but just didn't believe what they were up against until
they experienced it directly.
A loader that operates in sort-of "zero locations used" mode is as
small as you can make a loading program!
cjl [your one-word-long program is one word too long]
If anyone has been following the various threads on alt.sys.pdp8 and
alt.folklore.computers cross-posted:
There seems to be a problem at the moment. Posts are being swallowed
up and just plain lost. All are invited to look directly in
alt.sys.pdp8
cjl [I hope this one gets posted!]
Looks like it might be a receiving issue rather than a transmitting issue -
I can see four posts from you today in alt.folklore.computers on Usenet
--
Chris Burrows
CFB Software
Armaide v2.0: ARM Oberon-07 Development System
http://www.cfbsoftware.com/armaide
I believe those are newer posts. But it just swallowed three posts,
one small but significant, the other two are[were] tests.
cjl
The news propagation of this newsgroup has been kind of odd for a long
time. I have set up leafnode talking to forteinc, xs4all and
tele2/swipnet, plus the local provider, GET. These are sufficiently
diverse so news rarely gets lost.
But some, very low volume, news takes detours, sometimes rather long
ones. The posts that Eugene Miya and Rick Aldersson send usually
take a day and a half to reach me; but I often get the replies to
those postings within minutes of the original posting.
I run polls round-robin with the local providers represented twice,
so it goes forteinc, tele2, get, xs4all, tele2, get over a two hour
interval. The vast bulk of news is here within an hour of the original
posting.
The late news always comes in from tele2/swipnet, and I know they
run some "fill feeds" themselves. So, it does indeed seem like there
is a foulup somewhere along the news path, but not totally broken if
you set up enough redundancy.
-- mrr
When something like that happens look out for people posting in
html or attaching binaries (anything other than plain text). Many
systems just reject them entirely.
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
> cjl wrote:
> > cjl <clasyst...@gmail.com> wrote:
> >
> >> Test post. I have had three posts swallowed by the previous topic.
> >> This one is not posted to a.f.c. in case that's the problem.
> >
> > If anyone has been following the various threads on alt.sys.pdp8 and
> > alt.folklore.computers cross-posted:
> >
> > There seems to be a problem at the moment. Posts are being
> > swallowed up and just plain lost. All are invited to look directly
> > in alt.sys.pdp8
> >
> > cjl [I hope this one gets posted!]
>
> When something like that happens look out for people posting in
> html or attaching binaries (anything other than plain text). Many
> systems just reject them entirely.
As they should. Damn kids.
-- Patrick