Based on what I can gather (crude estimate):
7030 was ~ 10x faster than 704 (based on earlier benchmarks);
7030 got ~ 2k on Dhrystone;
I saw a video that claimed ~ 4.8k for a 386SX-25;
BJX2 currently gets ~ 69k (at 50 MHz);
Emulator gets ~ 154k at 100MHz (*1);
My desktop PC gets ~ 11M-45M.
Back-propagating, one can hand-wave the 704 as being ~ 200 Dhrystone,
for sake of argument.
The 11M to 45M spread is mostly down to MSVC vs GCC differences (Clang
gives ~ 41M). This is on a Ryzen 7 2700X running at 3.7 GHz.
GCC's score drops considerably with '-Os' (to ~ 9M, slightly below the
MSVC score), and '-O0' is slower still (~ 6M). Clang results mostly
track GCC's, just slightly lower. GCC's settings seem to fall into two
groups: "-O2"/"-O3"="fast"; "-O1"/"-Os"="less fast".
The relative speed differences between MSVC settings seem somewhat
smaller (except '/Od', which is also ~ 6M).
Though, this is for '/Os' and '/O1'; trying to test with '/O2' gives
broken output (the "should be" lines don't match up, then the benchmark
prints nonsense numbers and crashes). It is possible '/O2' might be
faster if it worked correctly in this case (and/or it is making some
misguided attempt to shove things into SSE instructions or similar).
Not sure what is going on here exactly; usually GCC vs MSVC comparisons
are a lot closer than this (typically within 2x), and the effect of
optimization settings isn't usually so drastic. Both are targeting
x86-64, with GCC and Clang running under WSL.
But, in any case, a conservative estimate here would be 55000x to
225000x vs the 704...
If one were measuring floating-point, SIMD, ..., the difference would
likely be much bigger (and my PC also has a lot more cores, ...).
*1:
I would measure JIT'ed speed for BJX2, except my JIT is currently
broken, and my emulator isn't particularly fast due to its efforts to
be cycle-accurate (so I can't really crank the clock and still hit
real-time unless I go and fix these parts; otherwise it is painfully
slow).
Doing a "what if" (extrapolating from sub-real-time emulation of BJX2):
1.0 GHz would give ~ 1.6M;
3.7 GHz would give ~ 5.9M.
Scores would be ~ 1.9M and ~ 7.0M if one models a 2-way set-associative
L1 (rather than a direct-mapped L1).
Or, 2.0M and 7.2M if one assumes a 100% hit rate for the L1 caches.
Note that the emulator assumes a stalling pipeline with fairly strict
interlock handling (interlock penalties are the main source of lost
clock cycles, after cache misses).
Values seem to scale fairly linearly relative to clock speed.
Though, I still suspect BGBCC is pretty buggy here, and that the scores
could be a bit better.
And recently I have been off trying to track down a bug where the order
in which type promotions are performed seems to change the results of
arithmetic expressions, which should not happen. I have also seen some
stuff that looks "rather suspicious" regarding register allocation and
write-back: in particular, disabling some optimizations (ones that skip
needless register spills) leads to the compiled programs crashing, which
implies "something has gone wrong". It would appear as if something is
amiss with how temporaries are being assigned or used, but some attempts
to add "sanity tests" haven't turned up any hits (such as the compiler's
stack machine encountering "free" temporaries that were still pushed
onto the stack, ...).
I guess it is an open question how well the ISA could do "if my compiler
weren't buggy crap".
...