
Hardware Architecture and Thread Synchronization


Rick C. Hodgin
Aug 29, 2016, 2:44:15 PM
One of the aspects of LibSF 386-x40 is a new concept I've created called
"Love Threading":

https://github.com/RickCHodgin/libsf/blob/master/li386/oppie/oppie-6.png

Simply put: there are two threads (main and love). Code can be written
using instructions which request servicing from the other thread
(either the main thread or the love thread can initiate). If the other
thread is not engaged in something like a task switch, or servicing an
interrupt, etc., it will forgo its current workload and enter into the
requesting thread's workload, providing "instant on" SMP for the
immediate workload; once that workload is exhausted, the called thread
returns to its own workload.

It's called "love threading" because it's what two people who love each
other would always do for each other: sacrifice their own thing to help
out. The husband's in the garage and calls out to the wife, "Honey, can
you give me a hand?" It's probably not the first thing she'd like to
do, but she calls out, "Yes. On my way," and heads out to help with
whatever the husband's doing. And the same the other way around.

Love Threading works in this way. The ISA has special instructions
which allow parallel execution through a single code stream, with each
thread processing its segment. Once a segment is completed, the thread
moves to the next (if any), and once all are exhausted, it returns to
its original workload. However, when the love thread is busy, the main
thread can execute both parts serially, and the ISA provides support
for this case.

In addition, I have created the ability to have special tiny
heterogeneous cores which operate on their own ISA. They are kicked
off by a branch instruction, and cease processing at a type of HLT
instruction encoded as the last thing they execute.

To facilitate coordination between these multiple threads, I have
introduced thread synchronization registers which provide wait-write
and wait-read operations.

These will not write their value to the register until the other
thread has read the previous value, nor read a value until the other
thread has written a new one.

These do not require OS intervention, and there are setup mechanisms
to allow for a watchdog timeout should one or more threads stall.

-----
These features will allow single-thread performance boosts, without OS
support where possible, for brief stretches of SMP code encoded
directly into a single instruction stream, as well as specialized
processing kicked off with simple branch-launch instructions, again
without OS support.

I was wondering if anyone knows of architectures which offer something
similar? I would like to study their successes and failures in these
areas.

Best regards,
Rick C. Hodgin

Rick C. Hodgin
Sep 1, 2016, 9:06:05 AM
Love Atomicity

In light of the recent "Revival of pessimistic locking" thread, I have
come to realize that the Love Threading model I've created for LibSF
386-x40 could be extended to create a Love Atomicity (LA) model which
guarantees certain code blocks in the system are run atomically.

LA would provide ISA support for global atomicity in executing code.
It would not just be a lock on data at memory locations; it would
provide facilities to suspend execution on the current core and
dispatch the code block to a designated core, where it is executed
atomically.

If another request is currently being executed atomically, the new
request would enter a queue and run in its turn. The originating core
would then either enter a wait state until the code block completes
and the go-ahead signal is received back, or it would fail immediately
because of the wait queue, in which case it can spin as needed or go
on to perform other tasks.

In assembly syntax, it would be like this:

; A parameter is provided indicating the JMP target at
; which to begin waiting for the go-ahead signal:
lsync offset continue

; The code here is that which will execute atomically on
; the designated LA CPU

continue:
; No parameter is provided, indicating this is the signal
; of the end of the code block for the designated core,
; and the target of the wait for the originating core
lsync
; After this instruction, flags, registers, parameters
; can be examined to determine the state of the request.

Upon entry into the designated core, FS and GS would be replaced
by those set up by the OS for this purpose, immediately pointing
to locations for semaphore data and coordination of multiple tasks,
multiple threads, etc. And upon exit from the designated core
through the stand-alone lsync instruction, the state is restored
to what it was before.

In this way, every core can guarantee that its request is always
run atomically, and it need not be a single memory access, but can
be an entire block of code.

The LA core will sacrifice its own workload (or a portion of its own
workload, see the last paragraph*) to fill the requests as they come
in. Since this core is known to be set aside for this purpose, the
scheduler gives it less weight for critical work than the other
cores.

By creating this Love Atomicity ability, no OS protocol is required
apart from initial setup and ongoing support for the specific needs
of the various calls. It provides a way to perform global syncing on
complex code and tests, including the ability for the designated core
to call other functions in the originating core's instruction stream:
it basically inherits the code state and can perform its work as the
originating thread would've been able to, save for the altered FS
and GS registers.

* I am also considering the ability to have the originating and
designated cores swap each other's workloads, so that the bulk of the
compute is not lost during the atomic code run, but only a small part
of it. The protocol can simply guarantee that the code block will
always be executed atomically, on a known core, with the known FS and
GS environment.

Rick C. Hodgin
Sep 1, 2016, 9:08:16 AM
On Thursday, September 1, 2016 at 9:06:05 AM UTC-4, Rick C. Hodgin wrote:
> Love Atomicity
>
> In light of the recent "Revival of pessimistic locking" thread...

Link here:

https://groups.google.com/d/topic/comp.arch/Aj4cwf8tY3A/discussion

Chris M. Thomasson
Sep 3, 2016, 7:55:32 PM
On 8/29/2016 11:44 AM, Rick C. Hodgin wrote:
> One of the aspects of LibSF 386-x40 is a new concept I've created called
> "Love Threading":
>
> https://github.com/RickCHodgin/libsf/blob/master/li386/oppie/oppie-6.png
>
> Simply put: Two threads (main and love). They can be encoded to use
> instructions which attempt to request servicing from the other thread
> (either main thread or love thread can initiate). If the other thread
> is not engaged in something like a task switch, or is not servicing an
> interrupt, etc., then it will forego its current workload and enter in
> to the requesting thread's workload and provide an "instant on" SMP on
> the immediate workload, which, once exhausted, then sends the called
> thread back to its own workload.

[...]

This kind of sounds like work-requesting. It's a term I used a while
back for an experimental scheduler:

https://groups.google.com/d/topic/comp.programming.threads/YBnjd-Sqc-w/discussion

When a thread has no work, it requests more work from other threads.
My friend has described the technique and its caveats in more detail
here:

http://www.1024cores.net/home/scalable-architecture/task-scheduling-strategies

Love threading sounds a bit similar in nature? Am I totally wrong on
this, Rick?

Thanks.

Rick C. Hodgin
Sep 3, 2016, 8:10:49 PM
It's similar. It isn't done only when the thread has no work; rather,
the thread has entered a block of code that would benefit from
additional hardware resources. It requests that the other core
abandon its current workload, come into its instruction stream, and
help get its workload completed more quickly.

It's a way to give a single thread a performance boost in those areas
where it could benefit from one. And in the worst-case scenario, when
the other core is unavailable, the requesting thread simply executes
both portions in sequence: its own portion first, then the other
thread's, iterating back and forth until the workload is exhausted.

I've never heard of anything like it before. I don't know if it
exists, but I wouldn't be surprised to learn that it does. I call it
Love Threading because the other thread willingly sacrifices its own
progress to help out. And, of course, it could request the same of
the other thread whenever it has code where an injection of quick SMP
would be beneficial.