Based on what I can gather (crude estimate):
7030 was ~ 10x faster than 704 (based on earlier benchmarks);
7030 got ~ 2k on Dhrystone;
I saw a video that claimed ~ 4.8k for a 386SX-25;
BJX2 currently gets ~ 69k (at 50 MHz);
Emulator gets ~ 154k at 100MHz (*1);
My desktop PC gets ~ 11M-45M.
Back-propagating, one can hand-wave the 704 as being ~ 200 Dhrystone,
for sake of argument.
The 11M to 45M spread is mostly down to MSVC vs GCC differences (Clang
gives ~ 41M). This is on a Ryzen 7 2700X running at 3.7 GHz.
GCC's score drops considerably with '-Os' (to ~ 9M, slightly below the
MSVC score), and '-O0' is slower still (~ 6M). Clang results mostly
track GCC's, just slightly lower. GCC's settings seem to fall into two
groups: "-O2"/"-O3"="fast"; "-O1"/"-Os"="less fast".
The relative speed differences between MSVC settings seem somewhat
smaller (except '/Od', which is also ~ 6M).
Though, this is for '/Os' and '/O1'; trying to test with '/O2' gives
broken output (the "should be" lines don't match up, then the benchmark
prints nonsense numbers and crashes). It is possible '/O2' might be
faster if it worked correctly in this case (and/or it is making some
misguided attempt to shove things into SSE instructions or similar).
Not sure what is going on here exactly; usually GCC vs MSVC comparisons
are a lot closer than this (typically within 2x), and the effect of
optimization settings isn't usually so drastic. Both are targeting
x86-64, with GCC and Clang running under WSL.
But, in any case, a conservative estimate here would be 55000x to
225000x vs the 704...
If one were measuring floating-point, SIMD, ..., the difference would
likely be much bigger (and my PC also has a lot more cores, ...).
*1:
I would measure JIT'ed speed for BJX2, except my JIT is currently
broken, and my emulator isn't particularly fast due to its efforts to
be cycle-accurate (so I can't really crank the clock and still hit
real-time unless I go and fix these parts; otherwise it is painfully
slow).
Doing a "what if" (extrapolating from sub-real-time emulation of BJX2):
1.0 GHz would give ~ 1.6M;
3.7 GHz would give ~ 5.9M.
Scores would be ~ 1.9M and ~ 7.0M if one models a 2-way set-associative
L1 (rather than a direct-mapped L1).
Or, 2.0M and 7.2M if one assumes a 100% hit rate for the L1 caches.
Note that the emulator assumes a stalling pipeline with fairly strict
interlock handling (interlock penalties are the main source of lost
clock cycles, after cache misses).
Values seem to scale fairly linearly relative to clock speed.
Though, I still suspect BGBCC is pretty buggy here, and that the scores
could be a bit better.
And recently I have been off trying to track down a bug where the order
in which type promotions are performed seems to change the results of
arithmetic expressions, which should not happen. I have also seen some
stuff that looks "rather suspicious" regarding register allocation and
write-back: in particular, disabling some optimizations (ones that skip
needless register spills) leads to the compiled programs crashing, which
implies "something has gone wrong". It would appear as if something is
amiss with how temporaries are being assigned or used, but some attempts
to add "sanity tests" haven't turned up any hits (such as the compiler's
stack machine encountering "free" temporaries that were still pushed
onto the stack, ...).
I guess it is an open question how well the ISA could do "if my compiler
weren't buggy crap".
...