On Wed, 29 May 2019, Warren Young wrote:
> On Wednesday, May 29, 2019 at 11:04:17 AM UTC-6, Stephen Casner wrote:
> > On Wed, 29 May 2019, Warren Young wrote:
> >
> > > Why are threads being used in the first place?
> >
> > One motivation is to take advantage of multiple cores for
> > increased performance.
>
> ... If the display updating process is single-threaded, then surely
> one whole CPU core is enough for it?
I was simply speaking generally: if some task requires more oomph
than one core can provide, then splitting it into multiple threads
lets it use multiple cores.
> > In the "IBM 1620 Jr." project for the Computer History
> > Museum everything runs in one Pi
>
> What generation of Pi?
3B
> > we need two threads because the machine cycle-level simulation
> > runs at a 20 microsecond cycle time and consumes most of one core
>
> The PiDP-8/I simulator takes ~6.5% of one Pi 3B core (1.2 GHz) when
> throttled to run at 333 kIPS, roughly approximating the speed of a
> real PDP-8/I.
>
> By "cycle-level" I assume you mean some kind of sub-instruction level,
> comparable to what the PDP-8 architecture calls "steps," which SIMH doesn't
> try to emulate. If all else were equal, that means the 1620 simulator has
> more to do, but a quick scan of the machine's Wikipedia page suggests that
> it's roughly 10 times slower than a mainstream PDP-8, so the two advantages
> ought to mostly cancel out.
The 1620 panel contains approximately 192 lights, many of which
reflect the state of internal logic gates in the machine. In order to
update those lights accurately, the simulation basically needs to
implement the same logic as in the flow sequence diagrams that specify
the operation of the computer.
> > the display update runs at a 10 millisecond cycle time on a
> > separate core.
>
> I suspect that if you just ran it every 20000 iterations of the
> instruction decoding loop, on the same thread, you wouldn't notice
> the overhead, especially if the simulator is throttled down to match
> the original machine's IPS rate.
No, because the display update takes 2 milliseconds to execute, so
running it inline would distort the timing of the cycle in which the
update occurred. The display update uses I2C to feed 12 LED driver
chips, each of which drives 16 LEDs at variable intensity using
separate PWM for each LED. It might be possible to divide the display
update work into little pieces done within each 20 microsecond
machine cycle, but it is much easier to maintain timing accuracy with
separate threads running on separate dedicated cores.
The simulator thread does throttle the execution rate by waiting
until the end of the 20 microsecond interval after it finishes the
work of a cycle.
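For concreteness, a minimal sketch of that kind of throttled cycle
loop (a sketch only, not the actual 1620 Jr. code;
do_one_machine_cycle is a hypothetical stand-in, and it assumes POSIX
clock_nanosleep):

    #include <time.h>

    #define CYCLE_NS 20000L              /* 20 microsecond machine cycle */

    void do_one_machine_cycle(void);     /* hypothetical: one cycle's work */

    void simulator_thread(void)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_MONOTONIC, &deadline);
        for (;;) {
            do_one_machine_cycle();
            /* advance the absolute deadline by one cycle period */
            deadline.tv_nsec += CYCLE_NS;
            if (deadline.tv_nsec >= 1000000000L) {
                deadline.tv_nsec -= 1000000000L;
                deadline.tv_sec++;
            }
            /* wait out the rest of the 20 us interval */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                            &deadline, NULL);
        }
    }

Sleeping to an absolute deadline keeps the long-term rate exact even
when the per-cycle work time varies; at a 20 microsecond period a
busy-wait on clock_gettime() may track the deadline more tightly than
nanosleep, given scheduler wakeup latency, which is one more reason
to dedicate a core to the thread.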
> > we designed a
> > lock-free memory-based communication between the two threads so no
> > mutex is necessary and there are no data races.
>
> I did that for the PiDP-8/I as well.
>
> It's the same basic technique as double-buffered bitmapped graphics. There
> are two "display" buffers, one that the CPU decoding loop writes to, and
> one that the display update code reads from. About 100 times a second, it
> zeroes the read-from display and swaps it with the write-to display.
That is similar to the technique we use with two buffers. The
simulation accumulates counts of the cycles during which each light
should be ON, as well as a total count of the cycles. The display
intensity is the ratio of those two counts. The simulation works on
the currently active buffer, but which buffer is active is determined
only by the display thread. The display thread clears the counts as
it consumes the inactive buffer, then swaps. It does not matter if
there is some variation in the number of cycles that a buffer is
active.
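A sketch of the simulator side of that accumulation (the names and
layout are mine for illustration, not the actual 1620 Jr. code):

    #include <stdatomic.h>

    #define NLIGHTS 192                  /* approximate panel light count */

    struct light_buf {
        unsigned on_count[NLIGHTS];      /* cycles each light was lit */
        unsigned total_cycles;           /* cycles accumulated in buffer */
    };

    static struct light_buf bufs[2];
    static atomic_int active;            /* buffer index; written only by
                                            the display thread */

    /* called by the simulator thread once per 20 us machine cycle,
       with light_state[i] = 0 or 1 for each panel light */
    void accumulate_lights(const unsigned char light_state[])
    {
        struct light_buf *b = &bufs[atomic_load(&active)];
        for (int i = 0; i < NLIGHTS; i++)
            b->on_count[i] += light_state[i];
        b->total_cycles++;
        /* intensity of light i = on_count[i] / total_cycles */
    }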
> Technically there's a race condition here in that the display could be
> updating at the time of the swap, giving the equivalent of "tearing" in
> bitmapped graphics displays, ...
We easily avoid that problem by having the display update thread delay
at least 20 microseconds (by doing other work) between the time it
switches which buffer is active and when it begins consuming the
data. That gives time for the simulation cycle to complete its update
of the buffer that was active at the time the cycle started.
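The matching display-thread side might look like this, reusing the
declarations from the sketch above (set_led_pwm is a hypothetical
stand-in for the real I2C writes to the driver chips, and 4095
assumes a 12-bit PWM driver):

    void do_other_display_work(void);   /* hypothetical; takes well over
                                           20 us of non-buffer work */
    void set_led_pwm(int led, unsigned duty);  /* hypothetical I2C write */

    /* run by the display thread every 10 ms */
    void display_update(void)
    {
        int old = atomic_load(&active);
        atomic_store(&active, old ^ 1);  /* simulator now fills the other */

        /* >= 20 us of other work: any machine cycle in flight at the
           swap has finished writing bufs[old] before we read it */
        do_other_display_work();

        struct light_buf *b = &bufs[old];
        for (int i = 0; i < NLIGHTS; i++) {
            unsigned duty = b->total_cycles
                          ? b->on_count[i] * 4095u / b->total_cycles : 0;
            set_led_pwm(i, duty);
            b->on_count[i] = 0;          /* clear as we consume */
        }
        b->total_cycles = 0;
    }

The grace period is what lets the counters stay plain (non-atomic)
integers: only one thread ever touches a given buffer at a time.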
> > of each light is performed within the simulation and then the display
> > update queries that state to get the appropriate intensity value for
> > each update.
>
> Yes, that's what my PiDP-8/I incandescent lamp simulator does.
The 1620 also used incandescent bulbs, so we are managing the change
in intensity in a manner similar to what you implemented.
> For the PiDP-11, instead of multiple levels, I think you just need a
> threshold: was this LED mostly on during the last update time? Something
> like "set_count > instructions_executed / 2".
I'm guessing that would still show too much flicker. That's the
problem with the existing PiDP-11 software that motivates me to try a
different implementation. I also think I could make it do something
more realistic for the micro-ADR display (not truly accurate, but
more representative).
> > That query could be over IPC rather than directly
> > through memory as we did for the 1620
>
> Have you looked at SIMH's front panel API? It does that.
To drive the address and data lights more realistically I would like
to accumulate counts of the lights being on or off in each memory
cycle of an instruction. I don't think the front panel API can
provide that.
-- Steve