
Let's merge pi5 and ils2 but not cycle realistic into trunk right now.


William Cattey

Nov 10, 2024, 12:42:36 PM
to PiDP-8
Sorry to have fallen silent for so long.  I came back from vacation back in August with a hurt shoulder that just kept getting worse. It ate my brain.  I'm finally better enough to feel able to work on side projects.

The results of the benchmarking show that the pi5 update, and the ils2 updates do not significantly impact performance.  They add real value, so I believe they should be merged into the trunk as-is.

The cycle realistic code cuts performance in half.  Last month there was discussion about it.  It seemed to my eye that consensus was that upstream would not accept such an update because of the combination of performance impact and divergence from the primary goal of SIMH that is to de-emphasize cycle accuracy.  Let's keep that conversation going.

Meanwhile, I'm going to pull the pi5 and ils2 stuff into trunk.  We can use this thread for discussion about that merge.

-Bill

Randy Merkel

Nov 10, 2024, 1:01:13 PM
to PiDP-8
Will there be a build time option for cycle realistic?

Mike Katz

Nov 10, 2024, 1:27:21 PM
to Randy Merkel, PiDP-8
Bill,

I'm sorry for your shoulder.  I am familiar with that all too well.

Is there a problem merging cycle realistic into the main with either a compile time option or, preferably, a run time command line option (both OS and SIMH command lines)?

If cycle realistic is a separate branch the average newbie will be unlikely to know it exists.
If it is a compile time option, it can become part of the configuration options, which requires a little attention but can be fairly easily done by the newbie.  However, this requires the user to have 2 different binaries if they want to run both.
If it is a runtime option, this may increase the size of the binary, but it is the easiest to turn on and off.  This is my preference, as long as cycle realistic is off by default.

One final comment: even with cycle realistic on, running on the slowest Pi, simulation is more than twice as fast as the approximately 0.5 MIPS rating of the actual PDP-8.  Does this meet the intention of SIMH?

Thanks for listening,

               Mike
--
You received this message because you are subscribed to the Google Groups "PiDP-8" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pidp-8+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/pidp-8/2fe934f2-ff3b-4ea2-ac8d-af06482c5a91n%40googlegroups.com.

William Cattey

Nov 10, 2024, 1:50:25 PM
to Mike Katz, Randy Merkel, Heinz-Bernd Eggenstein, PiDP-8
Hi Randy, Hi Mike, Hi HB,

Making the cycle-realistic branch a compile-time option is possible, but it is additional integration work beyond what's currently checked in.  I agree that ordinary people won't know what to do with a branch.

Funny story, while I've been offline, Oscar crafted a HowTo page that directed pi5 users to build the pi5 branch.  My current efforts are to eliminate that additional work.

I remember from the discussion in another thread there was back-and-forth about how having both current and cycle-realistic made the code base bigger. I think we didn't come to consensus on that discussion.  My sense is that we should not worry right now about bloat.  As we continue to evolve this software there will be times when the code base gets bigger to support emerging ideas, and then shrinks as we're able to get more of our work adopted upstream, and as we figure out how to refactor functionality.

HB has been the primary author of the recent cycle-realistic updates.  He knows the code best, and I think he's the primary stakeholder in what happens next with it.

I think we should engage SIMH upstream to explain our situation, that we have this popular kit with blinky-lights, and we have code that fully supports "single step", but that it has half the performance of the existing code base. Ask them how serious the "half the performance" issue is.

I will take the action item to make that outreach.

Meanwhile, HB, would you like to do the "double the current code base and make cycle-realistic a compile time option" work?  Maybe we could call it "single-step support" (so as to be more attractive to upstream SIMH)?

-Bill


Mike Katz

Nov 10, 2024, 1:59:54 PM
to William Cattey, Randy Merkel, Heinz-Bernd Eggenstein, PiDP-8
Bill, HB, Randy,

If we can make "Single Step Support" a compile time option, how much more difficult would it be to make it a runtime option?

I agree with Bill that code bloat should not be a worry for now.

Thank you everyone for your time on making these improvements.

            Mike

William Cattey

Nov 10, 2024, 2:03:46 PM
to Mike Katz, Randy Merkel, Heinz-Bernd Eggenstein, PiDP-8
I've merged the ils2 and the pi5 into trunk.

Question for others:  This seems to be the first time I've tried "single step".  Is it normal for it to do **nothing** in code lines other than "cycle accurate"?

-Bill

Steve Tockey

Nov 11, 2024, 2:55:05 PM
to PiDP-8
 All,
Let me quickly jump in here and say it's more than just "making the Sing Step work". It actually also makes the Sing Inst switch work realistically. On the stock PiDP-8/I SIMH, the Sing Step and Sing Inst switches are used to mount external (e.g., USB) media. That functionality is retained in Cycle Realistic; it just uses a different switch configuration to do so.

I might suggest that at least initially the simpler option is to make it compile-time. The code differences between Cycle Realistic and the stock version are sprinkled around about 6-8 different files. It should be fairly easy for a build script to just select one set vs. the other when compiling & building. Merging all of those changes into a single set of source code files will, IMHO, get pretty ugly.

I think that Mike makes a good suggestion in compiling and building both versions and then simply copying the desired executable into where it's needed to run the PiDP-8/i. That could be implemented in two very trivial scripts that do the copy, hiding the gory details from the PiDP-8/I owner.


My proverbial two cents,

-- steve

Bill Cattey

Nov 11, 2024, 3:32:47 PM
to Steve Tockey, PiDP-8
Hi Steve,

What you propose seems like a good way forward.  Your message reminds me that, although HB did the most recent code pull, it was actually your code that did the cycle-realistic implementation. Apologies for mis-attributing.

I did a quick scan of the diffs from when HB did the merge on 2024 June 15.  Looks like the following files were touched:

src/SIMH/PDP8/pdp8_cpu.c -- the lion's share of changes.
src/pidp8i/gpio_common.h, and  src/pidp8i/gpio_common.c -- Enabling the Sing Step switch functionality.
src/pidp8i/gpio_ils.c -- Sing Step display functionality.
src/pidp8i/pidp8i.h -- Instruction flow.
src/pidp8i/main.c -- Major state display and handling the Sing Step switch.

Although I've not made a careful read of the code, and am willing to be told I'm wrong, it seems to me that updating everything under src/pidp8i to support SING STEP makes sense.  There could be compile-time (or even run-time) conditionals to expect the SIMH CPU gives us our major states.

Would we then have two versions of pdp8_cpu.c, or two sub-modules utilized by pdp8_cpu.c depending on whether or not we compiled with "ENABLE_SING_STEP"?

Would it be possible to stub out the major state functionality so that it could be perceived by the pidp8 modules as either "always locked in a known state when the SING_STEP cpu stuff is disabled" or "ticking through major states when SING_STEP is present and enabled"?
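
A minimal sketch of how that stubbing could look, assuming the panel code reads the major state through a single accessor and the build decides whether that accessor reports a live value or a constant. `ENABLE_SING_STEP` is the name floated above; `major_state_t`, `cpu_set_major_state()` and `panel_major_state()` are hypothetical names for illustration, not identifiers from the PiDP-8/I tree, and only three of the major states are listed.

```c
/* Hypothetical sketch: none of these names come from the PiDP-8/I sources. */
typedef enum { MS_FETCH, MS_DEFER, MS_EXECUTE } major_state_t;

#ifdef ENABLE_SING_STEP
/* Cycle-realistic build: the CPU loop updates this as it ticks through states. */
static major_state_t cpu_major_state = MS_FETCH;

static void cpu_set_major_state(major_state_t s) { cpu_major_state = s; }
static major_state_t panel_major_state(void) { return cpu_major_state; }
#else
/* Stock build: the CPU never reports a state, so the panel sees a fixed one. */
static void cpu_set_major_state(major_state_t s) { (void)s; }
static major_state_t panel_major_state(void) { return MS_FETCH; }
#endif
```

With this shape the code under src/pidp8i would call `panel_major_state()` unconditionally, and only the CPU side would differ between the two builds.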

-Bill


Steve Tockey

Nov 11, 2024, 4:03:56 PM
to PiDP-8

Bill,
No worries, HB does deserve a lot of credit for pushing things forward.

I will need to check to be sure, but I do think there might be a few more source files affected. As I said, between 6 and 8 comes to mind when I was doing my own benchmarking work earlier. For each of those files, we could have two versions, e.g., pdp8_cpu.c and pdp8_cpu_cr.c. Just compile everything (although we would also need to tweak the #include statements in the *_cr.c files to include the *_cr.h versions of the header files). Then build using the stock *.o set into the stock executable, and build the other set into a separate executable. As I said, either copying the stock executable or the cycle realistic executable (possibly changing its name along the way) would be the final step and could be changed later by trivial auxiliary scripts if the PiDP-8/I owner wanted to switch from one version to the other.

That seems to be the path of least resistance to me.


-- steve

Bill Cattey

Nov 11, 2024, 4:11:18 PM
to Steve Tockey, PiDP-8
I've opened an issue at opensimh.org asking how willing they'd be to accept a pull request for Major State emulation of the PDP8 at: https://github.com/open-simh/simh/issues/434


Mike Katz

Nov 11, 2024, 4:44:10 PM
to Steve Tockey, PiDP-8
All,

As I think this through further: at compile time for the PiDP-8/I, create two binaries and have the pidp8i script take an option for Single Step or not.  It could also be pidp8i and pidp8iss for the script names.  That offers the user the greatest flexibility with the least amount of effort.  Any newbie who just runs pidp8i will get what they expect, and if they wonder "How do I get single step to work correctly?" that information can be in the README file.

Steve Tockey

Nov 14, 2024, 4:48:08 PM
to PiDP-8
All,
FYI, I did look at my local version of the Cycle Realistic code and it spans these seven source code files in .../pidp8i/src/pidp8i:
- pidp8i.h
- gpio.common.h
- gpio-common.c
- gpio-ils.c
- main.c

It also needs this one file in .../pidp8i/src/SIMH/PDP8
- pdp8_cpu.c

If HB somehow refactored things to reduce the number of affected files, fine, but as far as I know Cycle Realistic requires changes in a total of 8 files.

I'm still running a fairly old version of PiDP-8/i so there may have been changes, but what I saw was that the build process results in the executable file .../pidp8i/bin/pidp8i-sim

It should be trivial to have renamed versions of those eight files sitting side-by-side in those directories, e.g., in .../pidp8i/src/pidp8i:
- pidp8i.h and pidp8i-cr.h
- gpio.common.h and gpio.common-cr.h
- gpio-common.c and gpio-common-cr.c
- gpio-ils.c and gpio-ils-cr.c
- main.c and main-cr.c

as well as in .../pidp8i/src/SIMH/PDP8
- pdp8_cpu.c and pdp8_cpu-cr.c

Some #include statements in the *-cr.c files will need to be edited to refer to the proper *-cr.h files but that's minor.

The Cycle Realistic version could be built into the executable file .../pidp8i/bin/pidp8i-sim-cr

The default install script could just continue to copy .../pidp8i/bin/pidp8i-sim into /opt/pidp8i/(wherever it goes, sorry I forgot to look)/pidp8i-sim as it already does.

We could then have two simple auxiliary scripts:
- one that copies ../pidp8i/bin/pidp8i-sim-cr into /opt/pidp8i/(wherever)/pidp8i-sim to switch to Cycle Realistic, and 
- one that copies ../pidp8i/bin/pidp8i-sim into /opt/pidp8i/(wherever)/pidp8i-sim to switch back

This seems to me to be the most painless way of supporting both versions.


Thoughts anyone? 

-- steve



Heinz-Bernd Eggenstein

Nov 14, 2024, 5:45:24 PM
to PiDP-8
Sorry for being silent on this, I was (am)  busy with a lot of other stuff.

Puuuuh....honestly I'm not a fan of maintaining both variants. I had hoped for cycle-realistic/major mode support to be the "new normal", and hopefully upstream SIMH adopting it as well but if not, just keeping it anyway. The SIMH PDP8 ( -8E actually...) cpu code seems to be quite stable and complete (other than not supporting major states....) anyway, so either way I do not see a big risk in diverging from upstream SIMH in that respect. Certainly not ideal but manageable. If we keep two code variants, that would also *not* make it easier to sync with upstream SIMH, rather the opposite. And I'm pretty sure we want to keep cycle-realistic in the codebase now it is out of the toothpaste tube.... I somehow do not see any advantages in keeping the old, not cycle-realistic code alongside the new C-R code in the release version.

Just my 2 €-cents 
HB

Steve Tockey

Nov 14, 2024, 7:30:42 PM
to PiDP-8
HB,
Regarding two codebases and re-synching with upstream, you are certainly right: it's not good general practice to have nearly identical code side by side. On the other hand, as long as the relevant files are named so obviously similarly, whenever there is a *-cr version of a file it needs to incorporate the same upstream changes as the version of the file without the -cr. A bit more work, yes. But it should be manageable. Is there really that much upstream change anyway? Is it happening but we just aren't seeing it inside of our PiDP-8/I bubble?

You do bring up a very interesting point: would a significant majority of the PiDP-8/I community be willing to take the approximate 50% performance hit of Cycle Realistic and just standardize on it? If there were only a few that weren't willing, could it be their responsibility to maintain their own non-cycle realistic version for themselves?

-- steve

Mike Katz

Nov 14, 2024, 10:52:00 PM
to Steve Tockey, PiDP-8

If the Single Step performance hit is too great they can turn off the Single Step code either by command line option or alternate binary.


Steve Tockey

Nov 15, 2024, 12:12:48 AM
to PiDP-8

Mike,
In theory, yes. But practically, HB is correctly raising a concern about the people who maintain the code base (Bill, HB, several others possibly including even you and me). It undeniably puts an extra burden on the code base maintainers that would not be there if there were only one code base to maintain. I was assuming that by default the non-Cycle Realistic version would be the preferred choice if there could be only one. On the other hand, I interpret HB's question as asking whether the Cycle Realistic version could be the only "official" PiDP-8/I code base. If anyone felt they didn't want to deal with the lower performance of the Cycle Realistic version then they could retain the code for the non-Cycle Realistic version on their own. It would not be something maintained as part of the supported code base going forward.

Personally, I think it's an interesting proposal.


-- steve

Mike Katz

Nov 15, 2024, 1:34:10 PM
to Steve Tockey, PiDP-8
Steve,

I would prefer to have two different scripts to start the executable

pidp8i and pidp8i-ss rather than using scripts to copy and rename files and the one pidp8i script to start it.

This way there is no confusion as to which executable is being run.

Though, I would much prefer a command line option that could be part of the pidp8i shell command or a command/variable set inside simh itself to enable and disable the Single Step code.  I understand the complexity of doing this.

I might be willing to undertake this task if my time allows in the near future.  I'm sure there would be code duplication as I would merge the code and do a simple if ( singlestep ) / else around the two different codes.

I might refactor the code as a second or third pass if time allows.

           Mike

Bill Cattey

Nov 15, 2024, 2:11:37 PM
to PiDP-8, Steve Tockey, Mike Katz
This discussion may be moot.  Have people seen Mark Pizzolato's reply to the issue I opened at opensimh.org asking how willing they'd be to accept a pull request for Major State emulation of the PDP8 at: https://github.com/open-simh/simh/issues/434 ?

Mark Pizzolato wrote on 11/11/24 4:16 PM:

I will be glad to take your changes and include it in the https://github.com/simh/simh repo. In all likelihood, the results could then also be migrated to the https://github.com/open-simh/simh repo.



There is a path to getting cycle-accurate behavior into PDP8 SIMH as the default, if we believe that Open SIMH will accept a pull request once it's been integrated into Mark Pizzolato's fork.

However, since the current tangentsoft trunk is sync'ed to OpenSIMH not Mark's fork, there may be some work in producing a pull request suitable for his code line.

-Bill


Mike Katz

Nov 15, 2024, 2:21:02 PM
to Bill Cattey, PiDP-8, Steve Tockey
Bill,

I think the question is: do we push the SS version by itself and not the non-SS version, or do we push a version that will build both as separate executables, or do we push a version that uses the command line to enable/disable SS?

Guys, please correct me if I misunderstand.

      Mike

Bill Cattey

Nov 15, 2024, 5:38:10 PM
to Mike Katz, PiDP-8, Steve Tockey
Hi Mike,

Yes. That's the question. I believe that the complication is that the inner loop of instruction execution changes in a fundamental way.  So the way you turn it on or off is to basically switch between two fundamentally incompatible interpreter loops.

-Bill


Steve Tockey

Nov 17, 2024, 2:08:40 PM
to PiDP-8

Bill, Mike, and all,
I am attaching a pair of pseudo-code-ish extracts of the two "fundamentally incompatible interpreter loops" (that's a very accurate description, Bill). I hope this makes obvious just how fundamentally incompatible they are.

I started with both versions of pdp8_cpu.c, deleted all of the identical code outside of the instruction loop, then summarized the bottom-level functionality as comments, keeping the relevant C control structures. This will illustrate that while the core functionality of the instruction loop code is identical except for the Major State emulation in the Cycle Realistic version, how all of that code is organized is completely different. I cannot see any reasonable way for anyone to make both structures live peacefully side by side in one code base. As a FYI, the trunk version of the loop is 1184 non-blank lines of source code and the Cycle Realistic loop is 979 non-blank lines. We are talking about a sizable chunk of code for both versions.

As well, I think when it comes down to it, the PiDP-8/I specific code that handles the front panel also behaves differently enough that trying to have that code live side by side would be a major challenge. There are some pretty fundamental structural differences in that code as well that we haven't even talked about.

To have one single code base that incorporates both behaviors cleanly will require rebuilding the entire emulator code base from the ground up. I seriously doubt anyone is in a mood for that. I am certainly not.

To support those two behavioral differences, we are pretty much forced into maintaining two different code bases. To be able to support only one code base, realistically a choice needs to be made about which behavior will be supported. And on that note, if the choice is to keep the non-Cycle Realistic version it won't be a major issue to me. I like the Cycle Realistic behavior enough that I'm happy maintaining my own local version.


-- steve
CR loop pcode.txt
Original loop pcode.txt

Mike Katz

Nov 17, 2024, 8:11:46 PM
to Steve Tockey, PiDP-8
Steve,

I see two options, one for compile time and one for run time.

For compile time:

in each of the affected files:

#ifndef PiDP8ISS
#include "<MainLineTrunkCode>"
#else
#include "<PiDP8ISingleStepCode>"
#endif

or, for run time:

if ( PiDP8_SS_Enabled )
{
    NewStylePDP8SingleStepEmulator( );
}
else
{
    OldStylePDP8Emulator( );
}

By putting all of the Single Step code in similarly named files but postfixed with SS in the file name and function names either solution will work.

The ifdef solution does have a smaller executable size but requires 2 executables.
The if solution allows for runtime changing but creates a larger executable.

You could also use some very clever text substitution #define macro to have a single call in the code call either PDP8_Emulation() or PiDP8SS_Emulation() but that kind of code gets quite obscure.  This could work for both options.

Another option for the 2 executables would be to do the file switching in the make file based on a make parameter passed in.  If all of the main function names are the same then the make file could select which files to build and link it.  This will not work for the single executable option.

I am very opposed to creating a branch off of the main simh as we would have to rebase every time the trunk changed.

I just don't see the complication here.

Another idea just occurred to me.  What if we support the PiDP-8/I code as a separate processor type from the standard PDP-8 code?  It would just be another processor with separate code, just like the PDP-11 vs PDP-8.

How does SIMH handle all of the different versions of the PDP-11 from the 11/03 up to the 11/94?  Many of them have different instructions, memory spaces, maximum memory, mmus, etc.  Why can't we duplicate that method for the PiDP-8/I?

Vincent Slyngstad

Nov 18, 2024, 7:03:22 AM
to PiDP-8
Without taking the time to study the problem in detail, I want to ask whether the slowdown in the CR code is actually due to additional simulation work, or rather (I suspect) because the FP display is being updated every major cycle, rather than (effectively) every fetch. Since many instructions typically take an average of roughly 2 cycles, it seems that if the time is mostly updating the visibility of the state, that would account for the slowness. Intuitively, it seems to me that the actual simulation work is similar in both simulators.

That suggests an optimization in which the display is only updated every fetch, unless the SS switch is actually asserted.

Mike Katz

Nov 18, 2024, 1:01:26 PM
to Vincent Slyngstad, PiDP-8
If Vince is correct, why don't we modify the code to only do the cycle accurate display when single stepping, where the time difference will be unnoticeable.

When not in single step mode, the only time the display needs to be updated per cycle is when the CPU is halted for some reason.  Then the display can be updated appropriately (halt, single step, single instruction).

Vincent Slyngstad

Nov 18, 2024, 2:53:31 PM
to PiDP-8
Naively, if we call the existing CR code "major_state()", I think I'm suggesting something like:
do {
  major_state();
} while (!SS && next_state != FETCH);

which (hopefully) would approximate the non-CR code when SS is not asserted. What I don't know is if that is actually reasonable in the SIMH context.
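
The same loop written out as compilable C against a toy machine, as a sketch only. It assumes a simplified three-state cycle (FETCH, DEFER, EXECUTE on every instruction), which is not how real instructions behave, and `step()` and `cycles_run` are hypothetical names; `major_state()` here is a stand-in, not the existing CR code.

```c
#include <stdbool.h>

typedef enum { FETCH, DEFER, EXECUTE } state_t;

/* Toy stand-in for the cycle-realistic step: FETCH -> DEFER -> EXECUTE -> FETCH. */
static state_t next_state = FETCH;
static unsigned cycles_run = 0;

static void major_state(void)
{
    next_state = (next_state == EXECUTE) ? FETCH : (state_t)(next_state + 1);
    cycles_run++;
}

/* Run one "step": a single major state if ss is set, else up to the next FETCH. */
static void step(bool ss)
{
    do {
        major_state();
    } while (!ss && next_state != FETCH);
}
```

Calling `step(false)` consumes a whole instruction's worth of major states before returning, while `step(true)` returns after exactly one, so the caller's per-step overhead is only paid per instruction when SS is not asserted.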

Steve Tockey

Nov 18, 2024, 2:56:39 PM
to Mike Katz, Vincent Slyngstad, PiDP-8

Mike,
I don't believe that Vince is correct. I will defer to HB as he is much more familiar with the mechanics of driving the front panel lights, but my understanding is that the lights are NOT updated on each emulated machine step or instruction. In non-ILS, the front panel LEDs are only updated after some large number of instructions or cycles (1000?). In ILS, light on-off status is factored into corresponding "brightness" level status variables that are managed by a separate thread. In other words, the simulator "turning a light on" increases that brightness value following an upwards ramp rate while "turning a light off" decreases the brightness value following a slower downwards ramp rate. A separate thread flashes the actual LEDs at a duty cycle proportional to that light's currently stored "brightness" value.

The bottom line is that updating the front panel LEDs is intentionally as low overhead a job as it can be made to be. And, that overhead is mostly independent of instruction level vs. step-level emulation as far as I can tell.
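
To make the ILS description above concrete, here is a rough sketch of the two halves: the simulator side nudging a brightness value up quickly or down slowly, and the display side turning that value into a duty cycle. Every name and constant here is hypothetical, chosen for illustration; the real ILS code may differ in every detail.

```c
/* Hypothetical constants and names; not taken from the PiDP-8/I ILS sources. */
#define BRIGHT_MAX 255
#define RAMP_UP     16   /* faster rise, per update */
#define RAMP_DOWN    4   /* slower decay, per update */

/* Simulator side: move a lamp's brightness toward on or off, clamped to range. */
static int ramp_brightness(int level, int lamp_on)
{
    level += lamp_on ? RAMP_UP : -RAMP_DOWN;
    if (level > BRIGHT_MAX) level = BRIGHT_MAX;
    if (level < 0) level = 0;
    return level;
}

/* Display-thread side: is the LED lit during this slot (0..BRIGHT_MAX-1)
 * of the refresh period? The lit fraction is level / BRIGHT_MAX. */
static int led_lit(int level, int slot)
{
    return slot < level;
}
```

The point of the split is that the simulator only ever does a cheap add-and-clamp, and the LED timing lives entirely in the other thread.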

Now, when it really comes down to it, the lines of code that actually emulate any given PDP-8 instruction are quite simple. Having already decided that you want to emulate an indirectly addressed, current page AND instruction it only takes 2 lines of C code in SIMH to make happen. The total actual begin-to-end C code for a typical PDP-8 instruction is probably on the order of 10 lines in most cases. And, FWIW, it's clear to me now that pdp8_cpu.c was written with high performance in mind. There's a significant amount of repeat code that, if "clean code" were a driving criterion, would have been factored out but it would have slowed down the top-end emulation speed.

For Cycle Realistic, there are two things going on to slow it down. One thing is that emulating on a Major State basis does increase loop overhead and there is more switch-case decision code to get to the point where you know what specific PDP-8 instruction is being emulated. The other thing is that some SIMH and PiDP-8/i infrastructure code that was being run only once per instruction now has to be run once per machine cycle. But it is infrastructure code that has to be run regardless, it's not something that can be ignored.

Your suggestion of running purely in instruction-level mode until single stepping with the switch is interesting but, I believe, unworkable in practice. There are a number of reasons, but they would include at least:
-- the core simulator code (pdp8_cpu.c) becomes massively complex because it has to do both instruction-level, and cycle-level, emulation and you have to be able to switch correctly between them
-- if you are emulating at an instruction level, pressing the Sing Step switch would never stop the machine except at the end of an instruction. On a real -8, Sing Step can stop execution at the end of any Major State. That part of the emulation becomes unrealistic.


-- steve


Mike Katz

Nov 18, 2024, 3:40:32 PM
to Steve Tockey, Vincent Slyngstad, PiDP-8
Steve,

I don't know what level of optimization is set in the build.

One common form of optimization is to table drive switch/case statements if they are sequential.  This reduces the switch/case interpretation to a single table driven function call.  A switch/case based on opcode could be easily optimized, by the compiler, to be table driven.

So, some of that complexity might be optimized out by the compiler, depending on the optimization setting.

Thinking about the PDP-8 instruction set, the architecture lends itself to hand optimization.

Here is a quick pseudo code of how I would envision merging the Single Step state code with the main stream code.  I may be way off in this code, it is written from the seat of my pants without any research.  I hope I got the states correct.

if SS
    Set the state to fetch
endif
Get the next instruction
Parse opcode into Opcode
If the Opcode < 6
    if the instruction is direct page
        set Offset to 0
    else
        Set Offset to the current page
    endif
    Add Offset to the lower 7 bits of the instruction and store in Address
    if the instruction is indirect
        if SS
            Set the state to defer
        endif
        Replace Address with value at Address
    endif
else
    Set Address to the lower 9 bits of the instruction
endif
if SS
    Set the state to execute
endif
ExecuteInstruction( Opcode, Address )

   
What happens in ExecuteInstruction() does not affect the state at all, I believe.
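
For reference, the address calculation in the pseudo code above can be sketched in C using the standard PDP-8 memory-reference format (3-bit opcode, indirect bit, page bit, 7-bit offset). This is an illustration only: `effective_address()` and `used_defer` are hypothetical names, and auto-indexing through locations 010-017 and extended memory fields are deliberately left out.

```c
#include <stdint.h>

#define IR_INDIRECT 0400u   /* bit 3: indirect addressing */
#define IR_PAGE     0200u   /* bit 4: 0 = page zero, 1 = current page */
#define IR_OFFSET   0177u   /* bits 5-11: 7-bit offset within the page */

/* Effective address for a memory-reference instruction (opcodes 0-5).
 * pc is the address the instruction was fetched from; mem holds 4K words.
 * *used_defer is set to 1 when an extra (DEFER) memory cycle was needed. */
static uint16_t effective_address(uint16_t ir, uint16_t pc,
                                  const uint16_t mem[4096], int *used_defer)
{
    uint16_t addr = ir & IR_OFFSET;
    if (ir & IR_PAGE)
        addr |= pc & 07600u;          /* same page as the instruction itself */
    *used_defer = (ir & IR_INDIRECT) != 0;
    if (*used_defer)
        addr = mem[addr] & 07777u;    /* indirect: fetch the real address */
    return addr;
}
```

The `used_defer` flag is where a state-aware version would report the DEFER state; an instruction-level version could simply ignore it.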


Does this make any sense or am I way off base here?

      Mike

Vincent Slyngstad

Nov 18, 2024, 3:43:13 PM
to PiDP-8
I don't think my suggestion with the "do-while" loop suffers from either your complexity issue or your suggestion of unrealistic stopping behavior.

OTOH, if it doesn't save any significant overhead either, it wouldn't confer any advantage.

Does the ILS ramping code execute in the inner simulation loop? If so, then that would affect CR vs non-CR performance.

Warren Young

Nov 18, 2024, 4:27:00 PM
to PiDP-8
Some thoughts from a former maintainer who won't be doing any of the work on this and who therefore should probably be ignored outright… 🤓

The single most important decision I think y'all have to make is whether you're still supporting Pi 2 and older. I think there's a reasonable way to say, "No, you need a Pi 3 or newer for this release," if only because you can't buy Pi 2 boards in quantity any more, and the existing Pi 2 based systems work as well as they can be expected to within their constraints. In that world, you have plenty of CPU power to burn here. The last benchmark I did, the Pi 3 builds were 24x faster than a real PDP-8/i, albeit without taking the timing of different instruction classes into account.

Even so, there's no point burning CPU cycles to no good end.

The first thing you have to jettison is any idea of putting an "if (single_stepping)" test into the CPU decoding core. This executes literally millions of times a second on the production platform, and if it's branching on a volatile value, the branch predictor won't optimize that out for you. It will hurt benchmarking.

Therefore, some type of compile-time option will be superior, if only because you can do it in terms of a statically-compiled "#if PIDP8I_EMULATE_SINGLE_STEPPING" test.
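
A small sketch of what that buys in the inner loop, assuming the macro defaults to off when the build does not define it. The counters and `run()` are hypothetical stand-ins for the real per-instruction and per-state work; the point is only that the stock build contains no test at all, rather than a test that is usually false.

```c
#ifndef PIDP8I_EMULATE_SINGLE_STEPPING
#define PIDP8I_EMULATE_SINGLE_STEPPING 0   /* default: feature compiled out */
#endif

/* Hypothetical stand-ins for the real per-instruction and per-state work. */
static unsigned long instructions, state_updates;

static void run(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
#if PIDP8I_EMULATE_SINGLE_STEPPING
        state_updates++;     /* only present in the single-stepping build */
#endif
        instructions++;
    }
}
```

Building with `-DPIDP8I_EMULATE_SINGLE_STEPPING=1` would produce the second binary from the same source file.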

(In case someone's getting a case of the clevers and is thinking of using a function pointer to select between the cases instead of a boolean flag, I predict that will give even worse performance. It's why there's so much PiDP-8/i specific code in the CPU instruction decoding loop in the first place. In my time as primary maintainer, I did try to hide more of that behind extracted functions, and it materially hurt performance.)

Something worth considering is that if you make this a Pi 3+ only feature, you save the need to make Pi 2 NLS builds of the software, which then gives you a huge chunk of time back for building other versions. Pi 3 ILS + SS, Pi 3 ILS + no-SS, new GPIO versions of same, etc. Alas, you don't get as much time back as you'd hope because slow as the Pi 2 is, the dominant element in the time it takes to produce a release is the time it takes to dd images to and from SD cards; the actual software build time comes in second. Last time I did a release with the current spare image set, I recall it taking about 8 hours of part-time-attention work, the kind where you have to be present at the computer to shuffle media or give commands more often than is convenient for doing something else in the foreground.

On the question of running single-stepping mode only while actually single-stepping, I thought part of the value of this feature was that it made the LED lights flash more accurately owing to partial instruction decoding now occurring, which before had to be faked.

On the question of whether to rebase on SIMH v4 or continue with OpenSIMH, I never saw that the switch did us much good, so switching back won't hurt. Still, this should be the last oscillation; however y'all decide, this should clinch it going forward.

Mike Katz

Nov 18, 2024, 6:25:00 PM
to Warren Young, PiDP-8
Warren,

Your comments are welcomed.

If you used some global function pointers and only changed them on command then it would not take additional cycles.

void (*pfExecuteCPU)(void);

In the code for simh shell parser

if ( YES == boSS )
{
    pfExecuteCPU = PiDP8_Execute;
}
else
{
    pfExecuteCPU = PDP8_Execute;
}



Then the call to pfExecuteCPU() would not take any additional cycles.

That is a little bit of a kludge and obfuscates the actual function being called but would not have any additional runtime overhead.

Warren Young

Nov 18, 2024, 7:07:15 PM
to Mike Katz, PiDP-8
On Mon, Nov 18, 2024 at 16:24 Mike Katz <justme...@gmail.com> wrote:

if ( YES == boSS )
{
    pfExecuteCPU = PiDP8_Execute;
}
else
{
    pfExecuteCPU = PDP8_Execute;
}



Then the call to pfExecuteCPU() would not take any additional cycles.

The alternative I’m advocating for is inlining the contents of (*PiDP8_Execute)() within the instruction decoding loop. 

The fastest option is to ensure that one function fits into the CPU cache, ideally L1 alone. Yes, it violates the rules of “too-long functions” especially that horror-show case statement, but this is the inner loop; every indirection or out-of-cache lookup you add here runs millions of times a second; each extra nanosecond burnt per iteration becomes multiple milliseconds.

Inline it all.

Warren Young

Nov 18, 2024, 7:13:37 PM
to Mike Katz, PiDP-8
On Mon, Nov 18, 2024 at 17:06 Warren Young <tange...@gmail.com> wrote:
each extra nanosecond burnt per iteration becomes multiple milliseconds

Thought of a better way to put it: every ns burnt is a 1% slowdown. 

(Fermi estimate, 1:10 million ratio: 

Mike Katz

Nov 18, 2024, 8:25:57 PM
to Warren Young, PiDP-8
Warren,

What are the odds, in a multi-tasking Linux system with dozens (and maybe more) of tasks running at the same time, that an interrupt or task switch won't invalidate the particular cache buffer you are using?  This would cause at worst a cache flush and at best just a cache miss.  Either one will slow down the execution.

A single level 1 cache line on the Pi3, 4 and 5 is 64 bytes.  I doubt you are going to fit an entire function into a single cache line.  The odds of getting an entire function into enough cache lines so that the entire function is in cache in a multi-tasking, interrupt driven operating system is pretty slim.  Yes, the Pi 5 has 256 lines per CPU but how many task are

My Pi 5 with the latest OS has about 295 tasks with the GUI loaded and a terminal window open.  Without the GUI there are about 198 tasks running.  This being said, I don't see how any function would stay in the cache on a normal Linux system for any significant amount of time, even with tasks being spread over all of the cores.

Warren Young

Nov 18, 2024, 8:29:56 PM
to Mike Katz, PiDP-8
Benchmark it and see. Don’t guess. Find out.

Mike Katz

Nov 18, 2024, 8:59:09 PM
to Warren Young, PiDP-8
I'm sorry, benchmark what?

Warren Young

Nov 18, 2024, 9:20:31 PM
to Mike Katz, PiDP-8
On Mon, Nov 18, 2024 at 18:59 Mike Katz <justme...@gmail.com> wrote:
I'm sorry, benchmark what?

There are at least four sensible configurations from a software engineering standpoint:

1. Inline everything called at the MIPS rate, as I advocate. Whether that’s a forked version of the simulator or one with two ifdef’d paths, one selected at compile time, the point is that this one is your fastest-possible baseline, with all choices reduced to a single operating set, compiled statically into the binary.

2. Inline everything, but selected at runtime by C control flow constructs hit on each pass through the instruction interpretation loop. (Or however often the display update loop runs under this “new ILS” I’m hearing about.)

3. Extract the choices to separate modules, hidden behind function APIs. This is the most software engineery option, but, I predict, measurably the slowest, and not by a little.

3a. Do it like Vince suggested, assigning the function pointer conditionally outside the main loop, but call through that indirection inline.

3b. Call the selected mode’s API directly, inline, paying the cost of the control flow construct on each iteration, saving the cost of a function pointer indirection and the ugliness of C’s FP syntax.

Paltry as an ARM cache is compared to a Xeon or EPYC processor, I predict these choices will all have a measurable effect. The advantage of having tried them all is that you will then know how much abstraction you’re willing to pay for.

Mike Katz

Nov 18, 2024, 11:39:58 PM
to Warren Young, PiDP-8
Warren,

If we, as suggested, forget about anything before the Pi 3, the slowest emulation (Pi 5 Single Step/Incandescent Lamp Simulation) is approximately 9 times faster than an actual PDP-8.  Running on a Pi 5 it's almost 40 times faster.  Yes, that is about 7.5 times slower than trunk on the same Pi 5.

Does it really matter if we are 9 times or 8 times or 7 times faster than a PDP-8, as long as we are at least as fast as a real PDP-8?  Many people throttle back SIMH in order to have a more realistic feel to the emulation.  People using the PiDP-8/I are probably even more likely to throttle back.

With this in mind, I think our discussion on emulation speed is moot.

If you want the fastest emulation possible run Trunk on a Pi 5.  If you want the most realistic emulation possible, you will run the SS/ILS code throttled back to 1:1.

That being said, we already have a compile time option for ILS/No ILS. And that makes sense to me.  Having another compile time option for PiDP-8/ILS/SS support works for me.  It is not my preferred method but having different executable names (PDP8, PiDP8I, PiDP8ISS) is a quite workable solution.

The Software Architect in me would like to see a single executable with runtime options to choose.  However, this complicates the code (without re-architecting/re-factoring) to the point that I think that 3 executables is a workable and viable compromise.  This would give the user of SIMH for the PDP-8 the choice of what to run.

This can be done in many different ways.  I will leave the mechanics of how to do this up to another thread.

The main goal would be to push this in such a way that a single get/clone would get everything necessary to compile all three versions.

    Mike

Warren Young

Nov 18, 2024, 11:47:04 PM
to Mike Katz, PiDP-8
I think if you care about simulation speed above all else, you run SIMH on a proper desktop machine, not one of these wee little Pi boards.

But yes, there should be a build/runtime option combo that runs at least as fast as a real 8/I with the PiDP-8/I hardware on a Pi 3. That was a good baseline for the ILS back in my day, and it ought to be good now.

I’ve always thought putting a Pi 4 or 5 behind a PiDP-8/I panel was silly, but now that we’ve got something useful to spend the cycles on, go, go, go!

Vincent Slyngstad

Nov 19, 2024, 10:05:53 AM
to PiDP-8
I finally managed to convince the fossil repository to show me the relevant version of the relevant source file. Of course, I have no idea how I did that, and no idea how to build with the thing.

I'm pretty sure that very minor modification of the CR code can be performant at non-CR emulation with a simple run-time switch (ie, "set cpu no cr"), if that is desirable. And with the SS switch still operable.

William Cattey

Nov 19, 2024, 10:28:42 AM
to Vincent Slyngstad, PiDP-8
Hi Vince,

That's an exciting possibility!  Let's see what code is inspired here!

If you want help getting the build tree working for you, send me a private email and I'll work you through whatever is hanging you up.

-Bill

Vincent Slyngstad

Nov 20, 2024, 12:51:06 AM
to PiDP-8
repository:   /home/vrs/pidp8/museum/pidp8i.fossil
local-root:   /home/vrs/pidp8/src/pidp8i/trunk/
config-db:    /home/vrs/.fossil
checkout:     ee19934afdd4c9274232910478739006346b472e 2024-08-08 21:13:50 UTC
parent:       9de1597bf49fe548641fa5323418060e8ea4d06a 2024-08-03 17:17:33 UTC
tags:         pi5-ils2-bworm-cyclerealistic
comment:      Fix mis-calling "False" as "false" and fix error return in
              check_exists (even though nobody uses it as of yet.) Merge in
              from trunk. (user: poetnerd)
EDITED     src/SIMH/PDP8/pdp8_cpu.c

Attached is the context diff produced by "fossil diff", and the changed file. Basically, a state bit is added in pdp8_cpu.c which allows cycle realism to be turned off (or back on again), just as EAE support can be disabled and re-enabled.

Then, a "do {} while" is wrapped around the inner state code. In cycle realistic mode, all major states execute the update overhead. If CR is disabled, then only FETCH_state does updates, unless PIDP8I is defined, in which case swSingStep can also be asserted to single-cycle.

The one other (cosmetic) change is made to move a closing brace. As it was, there was a closing brace in each side of an #ifdef-#else, which meant the editor could not reliably find matching braces in the code.

It is also possible to add function calls to the cpu_mod[] initializer, which would be called when NOCR is set or unset. This would speed the while clause slightly by not requiring a masking operation in the test, thus slightly enhancing the innermost loop. I haven't done that yet, as redundantly storing the flag is probably not necessary.

What I'm realizing is that I don't have the necessary PiDP configuration(s) to run the benchmarks and quantify what difference this makes.

Vince
pdp8_cpu.c
diffs

Vincent Slyngstad

Nov 24, 2024, 8:24:52 AM
to PiDP-8
Well, I did some benchmarking, and the results so far are a 20-30% speedup in non-CR mode. Unfortunately, the baseline CR code is about 100% (2X) slower than the non-CR code (not, say, 25%).

At this point, I think Warren is right, and the CR code is missing one of the main speed hacks: the over-decoding of the opcode as if it were 5 bits. This allows the condition tests for direct/indirect and paging, and about half the OPR mode selection, to be embedded in the first switch (effectively, of FETCH). I think I'll try that next. (A bunch of code for MRI instructions is duplicated and kinda ugly, so I understand the desire to clean it up.)

Vince