DCA vs DEP

dpi

Nov 20, 2009, 7:05:30 PM

One of the key decisions made by the architects of the 8 was the
Deposit Clear Accumulator instruction instead of a simple deposit
where the accumulator is not cleared. It has never been clear to me
that this was a wise decision because an extremely common operation is
a DCA followed immediately by a TAD to get the data back into the AC.
I have often wondered whether anyone ever analyzed common code to see
which would have been better, or whether it is clear that DCA is the
best choice, resulting in shorter code sequences in general.

Doug Ingraham
Rapid City, SD
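
For readers less familiar with the idiom in question, here is a minimal Python sketch of the two instructions involved. The `Pdp8` class and its method names are invented for illustration; it models only a 12-bit AC, the link bit and a dictionary standing in for memory. `dep` is the hypothetical non-clearing deposit asked about above, not a real PDP-8 instruction.

```python
class Pdp8:
    """Toy model: 12-bit accumulator, 1-bit link, word-addressed memory."""

    def __init__(self):
        self.ac = 0
        self.link = 0
        self.mem = {}

    def tad(self, addr):
        # TAD: two's complement add of memory to AC; a carry out of
        # the top bit complements the link.
        total = self.ac + self.mem.get(addr, 0)
        if total > 0o7777:
            self.link ^= 1
        self.ac = total & 0o7777

    def dca(self, addr):
        # DCA: deposit AC in memory, then clear AC.
        self.mem[addr] = self.ac
        self.ac = 0

    def dep(self, addr):
        # Hypothetical DEP: deposit AC in memory and leave AC alone.
        self.mem[addr] = self.ac


cpu = Pdp8()
cpu.ac = 0o1234
cpu.dca(0o200)        # AC is now 0, location 0200 holds 1234
cpu.tad(0o200)        # 0 + 1234: the TAD acts as a load
kept_with_dca = cpu.ac

cpu2 = Pdp8()
cpu2.ac = 0o1234
cpu2.dep(0o200)       # one instruction, AC still holds 1234
kept_with_dep = cpu2.ac
```

Both paths end with 1234 in the AC and in memory; the difference is one word of code and one instruction time.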

glen herrmannsfeldt

Nov 20, 2009, 7:38:47 PM

There is in one of Knuth's "The Art of Computer Programming"
a discussion of a two instruction computer. If you are restricted
in the number of opcodes, shorter isn't always better.

-- glen

radi...@gmail.com

Nov 21, 2009, 3:18:15 AM

I always thought of that as a direct manifestation of the core
memory "destructive read" operation... i.e., DCA has no write refresh
cycle after the read and is therefore a faster operation. In the next
instruction you can write it back manually (TAD) if you like with no
performance penalty. If you want the AC cleared you win.

I know that systems played tricks with read/modify/write cycles (like
INC) to delete the write refresh cycles of the read wherever possible
because this doubles memory throughput.

I've never looked at the schematics enough to verify that though...

Rob Doyle

glen herrmannsfeldt

Nov 21, 2009, 12:14:06 PM

radi...@gmail.com <radi...@gmail.com> wrote:
> glen herrmannsfeldt wrote:
>> dpi <doug.i...@gmail.com> wrote:
>>> One of the key decisions made by the architects of the 8 was the
>>> Deposit Clear Accumulator instruction instead of a simple deposit
>>> where the accumulator is not cleared.

(snip)

>> There is in one of Knuth's "The Art of Computer Programming"
>> a discussion of a two instruction computer. If you are restricted
>> in the number of opcodes, shorter isn't always better.

> I always thought of that as a direct manifestation of the core
> memory "destructive read" operation... i.e., DCA has no write refresh
> cycle after the read and is therefore a faster operation. In the next
> instruction you can write it back manually (TAD) if you like with no
> performance penalty. If you want the AC cleared you win.

(snip)

Are there any spare opcodes on the PDP-8 that could have been used
for separate store and clear operations? Maybe it is both
reasons together.

-- glen

invalid

Nov 21, 2009, 12:22:49 PM

"dpi" <doug.i...@gmail.com> wrote in message
news:6983ade9-8056-4f12...@d10g2000yqh.googlegroups.com...

It's because, with so few major instruction groupings (only 8), there
wasn't scope for a LOAD instruction as well as an ADD (TAD).

Thinking was that the number of times that you'd want a second copy of
a datum was less than the number of times that you'd be about to load
in a new value, so DCA was better than STO (Your DEP) which you'd
then have to follow up with a CLA.

(Reminiscing from my 1972 Uni course which used the PDP8 as the design
reference for both the software and also the hardware parts of the degree
course.)
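
The trade-off described above can be put in numbers with a small sketch. It assumes only two situations after a store: the program either wants the stored value kept in the AC, or wants a different value loaded. `STO` is the hypothetical non-clearing deposit, and the word counts follow from TAD being the only way to fetch from memory; the function and table names are made up for this example.

```python
# Words of code per store under each instruction set. STO is the
# hypothetical non-clearing deposit; TAD stays the only way to fetch.
WORDS = {
    "DCA": {"keep": 2, "replace": 2},   # DCA X; TAD X  /  DCA X; TAD Y
    "STO": {"keep": 1, "replace": 3},   # STO X         /  STO X; CLA; TAD Y
}


def average_words(isa, keep_fraction):
    """Average words per store when keep_fraction of stores want AC kept."""
    cost = WORDS[isa]
    return keep_fraction * cost["keep"] + (1 - keep_fraction) * cost["replace"]


for keep_fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(keep_fraction,
          average_words("DCA", keep_fraction),
          average_words("STO", keep_fraction))
```

Under these assumptions the two designs break even when half of all stores want the value kept, and DCA gives shorter code whenever keeping is the less common case.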

dpi

Nov 22, 2009, 2:27:09 AM

On Nov 21, 1:18 am, "radioe...@gmail.com" <radioe...@gmail.com> wrote:
> I always thought of that as a direct manifestation of the core
> memory "destructive read" operation... i.e., DCA has no write refresh
> cycle after the read and is therefore a faster operation. In the next
> instruction you can write it back manually (TAD) if you like with no
> performance penalty. If you want the AC cleared you win.

This was an intentional design decision. The AC does not have to be
cleared.

Basically this is how all instructions on the 8 occur.

/* this is the fetch portion which is common to all instructions */
MA <- PC              /* memory address of the instruction to be fetched */
PC <- PC + 1          /* this updates the PC to point at the next instruction */
MB <- MEM[MA]         /* this is the instruction fetch from memory */
MEM[MA] <- MB         /* this is the write back of the instruction, since it was lost in the fetch */

/* What happens next depends on the instruction. For all memory
   reference instructions: */
MA[5-11] <- MB[5-11]            /* copy the address bits */
if (MB[4]==0) MA[0-4] <- 0      /* page zero reference zeros upper bits */

/* This is how indirect references work */
if (MB[3]==1) {
    MB <- MEM[MA]               /* defer cycle, also called indirect addressing */
    if (MA[8]==1 && MA[0-7]==0)
        MB <- MB+1              /* auto increment on indirect reference through 010-017 */
    MEM[MA] <- MB               /* write back the destroyed memory location */
    MA <- MB
}

/* At this point the MA has the address that the instruction will
   operate on */
/* For the AND: */
MB <- MEM[MA]
MEM[MA] <- MB
AC <- AC & MB

/* For a TAD: */
MB <- MEM[MA]
MEM[MA] <- MB
AC <- AC + MB
LINK <- set appropriately

/* For the ISZ: */
MB <- MEM[MA]
MB <- MB+1
MEM[MA] <- MB
if (MB==0) PC <- PC+1

/* For the DCA this is what happens: */
MB <- AC
AC <- 0
MEM[MA] <- MB

/* For a JMS: */
MB <- PC
MEM[MA] <- MB
PC <- MA
PC <- PC+1

/* For a JMP: */
PC <- MA
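
The register transfers above can be turned into a small runnable model. What follows is only a sketch in Python: the function names `effective_address` and `step` and the dictionary holding the machine state are invented here, IOT and operate instructions are left out, and the separate read and write-back halves of each core cycle are collapsed into ordinary list accesses. The link is assumed to be complemented on a carry out of the AC.

```python
MASK = 0o7777


def effective_address(mem, pc, word):
    """Return the operand address for a memory reference instruction.

    pc is the address the instruction was fetched from.
    """
    offset = word & 0o177                 # bits 5-11
    if word & 0o200:                      # bit 4 set: current page
        ma = (pc & 0o7600) | offset
    else:                                 # bit 4 clear: page zero
        ma = offset
    if word & 0o400:                      # bit 3 set: defer cycle
        if 0o10 <= ma <= 0o17:            # auto-index locations
            mem[ma] = (mem[ma] + 1) & MASK
        ma = mem[ma]
    return ma


def step(state):
    """Execute one instruction; state is a dict with pc, ac, link, mem."""
    mem = state["mem"]
    addr = state["pc"]
    word = mem[addr]
    state["pc"] = (addr + 1) & MASK
    opcode = word >> 9
    if opcode > 5:
        raise NotImplementedError("IOT and operate instructions omitted")
    ma = effective_address(mem, addr, word)
    if opcode == 0:                       # AND
        state["ac"] &= mem[ma]
    elif opcode == 1:                     # TAD
        total = state["ac"] + mem[ma]
        if total > MASK:
            state["link"] ^= 1
        state["ac"] = total & MASK
    elif opcode == 2:                     # ISZ
        mem[ma] = (mem[ma] + 1) & MASK
        if mem[ma] == 0:
            state["pc"] = (state["pc"] + 1) & MASK
    elif opcode == 3:                     # DCA
        mem[ma] = state["ac"]
        state["ac"] = 0
    elif opcode == 4:                     # JMS
        mem[ma] = state["pc"]             # return address
        state["pc"] = (ma + 1) & MASK
    else:                                 # JMP
        state["pc"] = ma


mem = [0] * 4096
mem[0o200] = 0o1205    # TAD 0205 (current page, direct)
mem[0o201] = 0o3206    # DCA 0206
mem[0o202] = 0o1206    # TAD 0206
mem[0o205] = 0o0042
state = {"pc": 0o200, "ac": 0, "link": 0, "mem": mem}
for _ in range(3):
    step(state)
```

After the three steps the AC and location 0206 both hold 0042, which is the DCA-then-TAD sequence this thread started with.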

All memory reference instructions take 2 cycles except for JMP, which
takes 1. The defer adds an additional cycle (or is it 2?). It looks
as if something like the JMS should take longer than a simple
instruction like DCA, but it doesn't, because some of the operations
listed take place in parallel. For example, the clear takes place at
the same time as the MB is written to memory. In instructions like
AND and TAD the write back takes place while the AND or ADD is being
performed, so it doesn't cost any extra time.
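
As a rough way to compare sequence timings, the cycle counts just given can be tabulated. This Python sketch takes the defer cost to be one extra cycle and the cycle time to be 1.5 microseconds, the figure usually quoted for the original PDP-8; both are assumptions brought in from outside this post, and the function names are made up.

```python
CYCLE_US = 1.5  # assumed memory cycle time of the original PDP-8


def cycles(mnemonic, indirect=False):
    """Memory cycles for one memory reference instruction."""
    if mnemonic not in ("AND", "TAD", "ISZ", "DCA", "JMS", "JMP"):
        raise ValueError("not a memory reference instruction")
    count = 1 if mnemonic == "JMP" else 2   # fetch, plus execute unless JMP
    if indirect:
        count += 1                          # assumed: one defer cycle
    return count


def sequence_time_us(program):
    """Total time for a list of (mnemonic, indirect) pairs."""
    return sum(cycles(m, i) for m, i in program) * CYCLE_US


store_and_reload = [("DCA", False), ("TAD", False)]
print(sequence_time_us(store_and_reload))
```

With these numbers the DCA-then-TAD pair costs four memory cycles, where a non-clearing deposit alone would cost two.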

> I know that systems played tricks with read/modify/write cycles (like
> INC) to delete the write refresh cycles of the read wherever possible
> because this doubles memory throughput.

ISZ does and the auto increment defer case also does but nothing else
in the 8 does.

It cost almost nothing to add the clear to the Deposit operation. But
I still wonder if this was better overall than doing an explicit clear
before a TAD to get a Load operation.

The 8 instruction set is wonderfully simple!

invalid

Nov 22, 2009, 5:13:18 AM

"dpi" <doug.i...@gmail.com> wrote in message

news:66280c76-946e-43df...@b2g2000yqi.googlegroups.com...

> /* this is the fetch portion which is common to all instructions */
> MA <- PC              /* memory address of the instruction to be fetched */
> PC <- PC + 1          /* this updates the PC to point at the next instruction */
> MB <- MEM[MA]         /* this is the instruction fetch from memory */
> MEM[MA] <- MB         /* this is the write back of the instruction, since it was lost in the fetch */

etc.

You've reminded us all of an important historical point (Justifying
the additional cross-post added above.)

In the days of core stores, reading was destructive and so a single
memory read cycle was actually read followed by write-back. This was why
the autoincrement and ISZ write-backs did not increase the
number of memory cycles needed to execute their associated
instructions because the write-back of an existing cycle was used
for the purpose, but it did mean that the Instruction Decoder part of
the CPU had to be tied in to the Memory Controller.

Wouldn't be true of today's semiconductor memories, where extra
cycles would be needed to achieve the same effect; something to
be borne in mind when attempting simulations.
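
For anyone writing such a simulation, the point can be made concrete with a toy model of a core store in Python. The class and method names are invented for this sketch and the cycle counter is only bookkeeping; it shows that a read, a write and a read-modify-write each occupy one read-then-write memory cycle.

```python
class CoreMemory:
    """Toy core store: reading a word erases it, so each memory cycle
    is a read half followed by a write half."""

    def __init__(self, size=4096):
        self.words = [0] * size
        self.cycles = 0

    def _read_half(self, addr):
        value = self.words[addr]
        self.words[addr] = 0              # destructive read
        return value

    def _write_half(self, addr, value):
        self.words[addr] = value & 0o7777

    def read(self, addr):
        """Ordinary read: the write half restores what was read."""
        self.cycles += 1
        value = self._read_half(addr)
        self._write_half(addr, value)
        return value

    def write(self, addr, value):
        """Ordinary write: the read half just clears the old contents."""
        self.cycles += 1
        self._read_half(addr)
        self._write_half(addr, value)

    def read_modify_write(self, addr, modify):
        """One cycle: the write half stores a modified value instead."""
        self.cycles += 1
        value = modify(self._read_half(addr)) & 0o7777
        self._write_half(addr, value)
        return value


core = CoreMemory()
core.write(0o100, 0o7777)
result = core.read_modify_write(0o100, lambda w: w + 1)   # the ISZ case
```

A simulator that issues a separate read and write to semiconductor memory for the increment would count two accesses where this model counts one.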

dpi

Nov 24, 2009, 7:02:21 PM

On Nov 22, 3:13 am, "invalid" <inva...@invalid.invalid> wrote:
> In the days of core stores, reading was destructive and so a single
> memory read cycle was actually read followed by write-back. This was why
> the autoincrement and ISZ write-backs did not increase the
> number of memory cycles needed to execute their associated
> instructions because the write-back of an existing cycle was used
> for the purpose, but it did mean that the Instruction Decoder part of
> the CPU had to be tied in to the Memory Controller.
>
> Wouldn't be true of today's semiconductor memories, where extra
> cycles would be needed to achieve the same effect; something to
> be borne in mind when attempting simulations.

Actually dynamic ram is destructive read and always has been. This is
because a bit of data is stored as a charge in a pretty leaky
capacitor. A static ram element requires at least two transistors and
a few resistors to make a limited flipflop and they draw power all the
time to maintain their state. The leaky capacitor is a lot smaller in
size so takes a lot less space on the chip allowing lots more of them
in the same space. It also uses power only during refresh and the
actual read or write operation. For DRAM the read operation
discharges the capacitor so that the data must be rewritten or it is
lost. These capacitors also leak out with time so a refresh cycle
within a certain amount of time is necessary. Early implementations
of DRAM all allowed for read/modify/write cycles similar to core
memory. I haven't looked at the way any of the modern memory
subsystems are implemented but it is unlikely that modern designers
put stuff like that in because none of the mass market machines would
use it. But internally there is a write back after every read and
occasionally there is a refresh cycle which will refresh a whole row
of bits in the chips.
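
A toy Python model of a single cell may help show why both the write back and the periodic refresh are needed. The leak rate, threshold and time steps below are invented for illustration and do not come from any data sheet.

```python
class DramCell:
    """Toy one-capacitor cell. Numbers are illustrative only."""

    FULL = 1.0
    THRESHOLD = 0.5       # sense amplifier decision level
    LEAK_PER_MS = 0.9     # fraction of charge surviving each millisecond

    def __init__(self, bit):
        self.charge = self.FULL if bit else 0.0

    def wait(self, ms):
        self.charge *= self.LEAK_PER_MS ** ms

    def read(self):
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.charge = 0.0                          # sensing drains the capacitor
        self.charge = self.FULL if bit else 0.0    # internal write back
        return bit

    refresh = read    # a refresh is a read whose result is ignored


refreshed = DramCell(1)
neglected = DramCell(1)
for _ in range(5):
    refreshed.wait(4)
    refreshed.refresh()
    neglected.wait(4)
```

With these made-up numbers the cell that is refreshed every 4 ms still reads as a 1 after 20 ms, while the neglected one has leaked below the threshold and reads as a 0.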

glen herrmannsfeldt

Nov 24, 2009, 11:27:48 PM

In alt.sys.pdp8 dpi <doug.i...@gmail.com> wrote:

(snip on core memory, destructive read, and read modify write cycles)

> Actually dynamic ram is destructive read and always has been. This is
> because a bit of data is stored as a charge in a pretty leaky
> capacitor. A static ram element requires at least two transistors and
> a few resistors to make a limited flipflop and they draw power all the
> time to maintain their state.

Well, CMOS SRAM cells mostly only draw power when changing state.
That is why they can be used as battery backed long term storage.

> The leaky capacitor is a lot smaller in
> size so takes a lot less space on the chip allowing lots more of them
> in the same space. It also uses power only during refresh and the
> actual read or write operation. For DRAM the read operation
> discharges the capacitor so that the data must be rewritten or it is
> lost. These capacitors also leak out with time so a refresh cycle
> within a certain amount of time is necessary. Early implementations
> of DRAM all allowed for read/modify/write cycles similar to core
> memory. I haven't looked at the way any of the modern memory
> subsystems are implemented but it is unlikely that modern designers
> put stuff like that in because none of the mass market machines would
> use it. But internally there is a write back after every read and
> occasionally there is a refresh cycle which will refresh a whole row
> of bits in the chips.

A read or refresh cycle reads a whole row out (or is it column,
I forget), and then writes it back. The cycles are now fast enough
that the gain from waiting for the write back wouldn't be very big.
Actually, with cache in between, the DRAM likely doesn't see a
read-modify-write cycle, anyway.

-- glen

invalid

Nov 25, 2009, 3:57:01 AM

"dpi" <doug.i...@gmail.com> wrote in message

news:53b11ba1-4710-4507...@p35g2000yqh.googlegroups.com...

I remember the refresh, but not what you say is destructive read.

There used to be a joke amongst amateur computer designers,
Q. What is the difference between static RAM and dynamic RAM?
A. Static works and dynamic doesn't

James Dow Allen

Nov 25, 2009, 1:39:34 PM

On Nov 25, 7:02 am, dpi <doug.ingra...@gmail.com> wrote:
> Actually dynamic ram is destructive read and always has been.

Yes. The write-back is transparent, I think, above
the chip level, so many are unaware of it. Writing back is
(needs to be?) done immediately after the read so there may
not be time to use it in a read-modify-write scenario.
An entire row needs to be written during this write-back,
not just the cell in the column that is eventually selected.

> This is because a bit of data is stored as a charge in
> a pretty leaky capacitor.

Almost 30 years ago, the charge in such a cell was about
200,000 electrons, IIRC. In a conversation with Thomas L.
Palfi, a key inventor of semiconductor memory,
I expressed surprise that this tiny charge could be detected
reliably. His answer? "You can detect a single electron
if you're smart enough!" What's the charge in a single
cell of one of today's dense DRAM's?

> A static ram element requires at least two transistors and
> a few resistors to make a limited flipflop and they draw
> power all the time to maintain their state.

In the 1970's, IIRC, static ram cells used a total of 4 or 5
transistors (you need gating in addition to the flip-flop itself).
There was a 3-transistor "pseudo-static" cell used
in what was then advertised as the World's Fastest NMOS RAM.

As glen points out, the magic of CMOS reduces power
consumption.

James Dow Allen

glen herrmannsfeldt

Nov 25, 2009, 6:28:48 PM

In alt.sys.pdp8 James Dow Allen <jdall...@yahoo.com> wrote:

> On Nov 25, 7:02 am, dpi <doug.ingra...@gmail.com> wrote:
>> Actually dynamic ram is destructive read and always has been.

> Yes. The write-back is transparent, I think, above
> the chip level, so many are unaware of it. Writing back is
> (needs to be?) done immediately after the read so there may
> not be time to use it in a read-modify-write scenario.
> An entire row needs to be written during this write-back,
> not just the cell in the column that is eventually selected.

Somewhere I might still have the data sheet from some older DRAM
in the days when that might have been considered.

(snip)

> In the 1970's, IIRC, static ram cells used a total of 4 or 5
> transistors (you need gating in addition to the flip-flop itself).
> There was a 3-transistor "pseudo-static" cell used
> in what was then advertised as the World's Fastest NMOS RAM.

The two transistor FF requires pull-up resistors. CMOS would
need four. Then one or two for the gating.

-- glen

Charles Richmond

Nov 25, 2009, 8:31:20 PM

James Dow Allen wrote:
> On Nov 25, 7:02 am, dpi <doug.ingra...@gmail.com> wrote:
>> Actually dynamic ram is destructive read and always has been.
>
> Yes. The write-back is transparent, I think, above
> the chip level, so many are unaware of it. Writing back is
> (needs to be?) done immediately after the read so there may
> not be time to use it in a read-modify-write scenario.
> An entire row needs to be written during this write-back,
> not just the cell in the column that is eventually selected.
>

When you "hit each row" of the dynamic RAM chip, isn't that called
"refresh"???

--
+----------------------------------------+
| Charles and Francis Richmond |
| |
| plano dot net at aquaporin4 dot com |
+----------------------------------------+

dpi

Nov 26, 2009, 12:14:13 AM

On Nov 25, 6:31 pm, Charles Richmond <friz...@tx.rr.com> wrote:
> James Dow Allen wrote:
> > On Nov 25, 7:02 am, dpi <doug.ingra...@gmail.com> wrote:
> >> Actually dynamic ram is destructive read and always has been.
>
> > Yes. The write-back is transparent, I think, above
> > the chip level, so many are unaware of it. Writing back is
> > (needs to be?) done immediately after the read so there may
> > not be time to use it in a read-modify-write scenario.
> > An entire row needs to be written during this write-back,
> > not just the cell in the column that is eventually selected.
>
> When you "hit each row" of the dynamic RAM chip, isn't that called
> "refresh"???

Any read or write would refresh the row. A write cycle is a read into
the row buffer followed by a modify of the cell you were interested in
writing followed by the write back of the row. Reads are read of a
row into the row buffer followed by a write back. I was thinking the
Intel 1103, which was a wretched chip to work with, had a read/modify/
write cycle but I didn't find a data sheet in a quick web search. I
am not sure I have a databook that old anymore.

I remember looking at one machine that didn't have any refresh
hardware, it was done automagically when an interrupt occurred. They
would always execute enough instructions in the interrupt routine to
hit the first 128 bytes of memory which would refresh all the dram
because all chips were selected for read, it was just the one they
wanted to see that was gated onto the data bus so all cells got
refreshed on every timer interrupt. I remember thinking this was
clever but it consumed more power and of course slowed down
execution. Woe be you if you turned off the interrupts for too long.
I can't remember what this machine was.
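
The row-buffer behaviour and the interrupt trick can be sketched together in Python. This is a toy model under stated assumptions: 128 rows selected by the low seven address bits, a retention limit of 2 ms, and a timer interrupt every 1.5 ms. None of these figures come from the machine described, and all the names are made up.

```python
ROWS = 128                 # assumed: 7 row-address bits per chip
REFRESH_LIMIT_MS = 2.0     # assumed retention time, for illustration only


class DramArray:
    def __init__(self, cols=128):
        self.cells = [[0] * cols for _ in range(ROWS)]
        self.last_refresh = [0.0] * ROWS
        self.cols = cols

    def _open_row(self, row, now):
        # Reading the row into the buffer destroys it in the array.
        if now - self.last_refresh[row] > REFRESH_LIMIT_MS:
            raise RuntimeError(f"row {row} decayed before it was refreshed")
        buffer = self.cells[row]
        self.cells[row] = [0] * self.cols
        return buffer

    def _close_row(self, row, buffer, now):
        # Writing the buffer back restores (refreshes) the whole row.
        self.cells[row] = buffer
        self.last_refresh[row] = now

    def read(self, addr, now):
        row, col = addr % ROWS, addr // ROWS
        buffer = self._open_row(row, now)
        self._close_row(row, buffer, now)
        return buffer[col]

    def write(self, addr, bit, now):
        row, col = addr % ROWS, addr // ROWS
        buffer = self._open_row(row, now)
        buffer[col] = bit                      # modify one cell in the buffer
        self._close_row(row, buffer, now)


def timer_interrupt(dram, now):
    """Touch the first 128 addresses, which hits every row once."""
    for addr in range(ROWS):
        dram.read(addr, now)


dram = DramArray()
dram.write(300, 1, now=0.0)
for tick in range(1, 6):
    timer_interrupt(dram, now=tick * 1.5)
value = dram.read(300, now=8.0)
```

If the calls to `timer_interrupt` are removed, the final read raises the decay error, which is the "interrupts off for too long" failure described above.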

This is pretty far off topic now but still interesting.

Quadibloc

Nov 26, 2009, 12:27:55 AM

On Nov 25, 11:39 am, James Dow Allen <jdallen2...@yahoo.com> wrote:

> In the 1970's, IIRC, static ram cells used a total of 4 or 5
> transistors (you need gating in addition to the flip-flop itself).
> There was a 3-transistor "pseudo-static" cell used
> in what was then advertised as the World's Fastest NMOS RAM.

Even today, 4 or 6 transistors are used per bit in most on-chip cache
memories.

John Savard

James Dow Allen

Nov 26, 2009, 2:46:50 AM

On Nov 26, 8:31 am, Charles Richmond <friz...@tx.rr.com> wrote:
> James Dow Allen wrote:

Caveat: My involvement with computer memories was in the
mid-1970's specifically. Details might now be *completely*
different, for all I know.

> > Yes. The write-back is transparent, I think, above
> > the chip level, so many are unaware of it. Writing back is
> > (needs to be?) done immediately after the read so there may
> > not be time to use it in a read-modify-write scenario.
> > An entire row needs to be written during this write-back,
> > not just the cell in the column that is eventually selected.
>
> When you "hit each row" of the dynamic RAM chip, isn't that called
> "refresh"???

At the chip level, refresh is same as read. You refresh
one row at a time. The reason refresh cycles consumed much
more power than reads was that *every* memory chip was
selected during refresh. (In those days, 1 Megabyte of memory
might comprise 32 sets of 72 chips each. I guess memories
are denser these days. :-)
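
The chip count in that parenthesis can be worked through in a few lines of Python. The split of each 72-chip set into 64 data chips and 8 check-bit chips is an assumption made for this sketch, as is the reading of "1 Megabyte" as data capacity only; neither is stated in the post.

```python
MEGABYTE_BITS = 1024 * 1024 * 8
SETS = 32
CHIPS_PER_SET = 72
DATA_CHIPS_PER_SET = 64    # assumption: the other 8 chips hold check bits

total_chips = SETS * CHIPS_PER_SET
bits_per_chip = MEGABYTE_BITS // (SETS * DATA_CHIPS_PER_SET)
print(total_chips, bits_per_chip)
```

Under those assumptions the megabyte takes 2304 chips of 4096 bits each, all of which would be selected together during a refresh.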

More interesting than the refresh itself was how to prevent
the processor, designed for static memories, from accessing
memory during the refresh. For a *very* long-winded account
of an interesting bug associated with refresh on
IBM's 370/158 (AP or MP), see
http://fabpedigree.com/james/bug22.htm

James Dow Allen

invalid

Nov 26, 2009, 5:36:55 AM

"James Dow Allen" <jdall...@yahoo.com> wrote in message
news:ac930484-9743-4ec2...@y32g2000prd.googlegroups.com...

> More interesting than the refresh itself was how to prevent
> the processor, designed for static memories, from accessing
> memory during the refresh. For a *very* long-winded account
> of an interesting bug associated with refresh on
> IBM's 370/158 (AP or MP), see
> http://fabpedigree.com/james/bug22.htm

ISTR that the Z80 did this by having a memory refresh cycle,
immediately after instruction fetch, during the decode time.

jmfbahciv

Nov 26, 2009, 9:01:40 AM

James Dow Allen wrote:
> On Nov 25, 7:02 am, dpi <doug.ingra...@gmail.com> wrote:
>> Actually dynamic ram is destructive read and always has been.
>
> Yes. The write-back is transparent, I think, above
> the chip level, so many are unaware of it. Writing back is
> (needs to be?) done immediately after the read so there may
> not be time to use it in a read-modify-write scenario.

<snip>

In some cases, this (no r-m-w) is not a feature ;-).

/BAH

glen herrmannsfeldt

Dec 7, 2009, 5:58:05 PM

In alt.sys.pdp8 dpi <doug.i...@gmail.com> wrote:

(snip regarding read-modify-write cycles)

> Any read or write would refresh the row. A write cycle is a read into
> the row buffer followed by a modify of the cell you were interested in
> writing followed by the write back of the row. Reads are read of a
> row into the row buffer followed by a write back. I was thinking the
> Intel 1103, which was a wretched chip to work with, had a read/modify/
> write cycle but I didn't find a data sheet in a quick web search. I
> am not sure I have a databook that old anymore.

It seems that the MK4027, a popular 4K bit DRAM from not so long
ago, also has a read-modify-write timing diagram.

Somewhat easier to find than the 1103 data sheet, though I
think I had one of those not so long ago.

-- glen

Steve Gibson

Dec 29, 2009, 3:15:04 PM

I'm a career assembly-language programmer who recently
rediscovered the PDP-8, the first machine language I ever learned.
During the past year I've managed to collect a few machines and I
look forward to fully restoring those beauties in years to come.

I also recently built a couple of Bob Armstrong's lovely single-
chip (the PDP-8 clone by Intersil/Harris) single-board systems
with the blinking lights front panel ... then wrote a few front
panel "toys" to give the front panels something to do.

http://www.grc.com/pdp-8/showandtell-sbc.htm
http://www.grc.com/pdp-8/deepthought-sbc.htm
http://www.grc.com/pdp-8/lightsout-sbc.htm

So I enjoyed the DCA vs DEP discussion thread after having just
re-learned the PDP-8 ... and I think that the reason for the
choice of DCA over DEP was two-fold:

First, given a 3-bit opcode forcing virtually as sparse an
instruction set as imaginable, there wasn't room for both a LOAD
from memory and an ADD from memory. (As 'invalid' also mentioned
in this thread.) Given that ADD could perform a LOAD by
preceding it with a CLA, but LOAD couldn't perform an ADD, it
was clear that ADD (or TAD) won the "how to get data from
memory" instruction competition.
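
To show how little room three opcode bits leave, here is a short Python sketch that decodes a 12-bit word into its major fields. The eight mnemonics are the standard PDP-8 ones; the `decode` function and the dictionary layout it returns are made up for this example.

```python
OPCODES = ["AND", "TAD", "ISZ", "DCA", "JMS", "JMP", "IOT", "OPR"]


def decode(word):
    """Split a 12-bit PDP-8 word into its major fields."""
    mnemonic = OPCODES[(word >> 9) & 0o7]
    if mnemonic in ("IOT", "OPR"):
        return {"op": mnemonic, "bits": word & 0o777}
    return {
        "op": mnemonic,
        "indirect": bool(word & 0o400),
        "current_page": bool(word & 0o200),
        "offset": word & 0o177,
    }


# CLA (7200) followed by a TAD gives the effect of a LOAD.
load_idiom = [decode(0o7200), decode(0o1205)]
```

Six of the eight slots go to memory reference instructions, so a separate LOAD or a non-clearing deposit would have had to displace one of them.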

But I think the situation is less clear on the "saving something
to memory" direction. If deposit-to-memory did not also clear
the accumulator, and you wanted to follow a store with a simple
load, then you'd need a CLA before the TAD. The CLA, not being
an MRI instruction, is one cycle faster, but back in those days
core words were very expensive, so you'd want to avoid that if
possible. And, as has been mentioned, if the store *did* clear
the accumulator, and that's *not* what you wanted, you'd need to
immediately reload it, thus consuming another precious memory
word.

Having recently read through the early history of DEC and of the
genesis of the PDP-8 in particular, I think the answer, and the
reason for the designers' final choice, was the target
market for the PDP-8: it was originally intended more as a
process control machine than a general math-oriented data
processor. So the designers felt that the PDP-8 would be
shuttling data from one place to another with TAD/DCA/TAD/DCA
sequences more than wanting to hold onto data that had just been
stored.

In my own recent PDP-8 coding, I found that the cleared
accumulator was what I wanted more often than not. Though a scan
through my code (see the second two links above if you're
curious) does show some paired DCA/TAD's where I'm reloading
what I just stored, they are in the minority.
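
A scan of this kind is easy to automate. The Python sketch below counts how often a DCA is immediately followed by a TAD of the same location; the input format, a list of (mnemonic, operand) pairs, is hypothetical, and the sample listing is invented rather than taken from any real program. It ignores indirect addressing and anything that is not a direct neighbour.

```python
def count_store_patterns(listing):
    """Classify each DCA by what follows it.

    listing is a list of (mnemonic, operand) tuples in program order.
    """
    counts = {"reload_same": 0, "other": 0}
    for current, following in zip(listing, listing[1:] + [(None, None)]):
        if current[0] != "DCA":
            continue
        if following == ("TAD", current[1]):
            counts["reload_same"] += 1
        else:
            counts["other"] += 1
    return counts


sample = [
    ("TAD", "SRC"), ("DCA", "DST"),                        # move a word
    ("TAD", "COUNT"), ("DCA", "TEMP"), ("TAD", "TEMP"),    # store and reload
    ("DCA", "SAVE"),
]
print(count_store_patterns(sample))
```

Run over a real assembly listing, the ratio of the two counts would give a first estimate of how often the clear helps and how often it costs an extra word.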

Anyway... just my 2 cents worth. It's great that this group
exists for keeping the PDP-8 spirit alive! :)

--
________________________________________________________________
Steve. Working on: GRC's DNS Benchmark utility documentation:
http://www.grc.com/dns/benchmark.htm