On Fri, Mar 18, 2022 at 7:22 PM Linas Vepstas <linasv...@gmail.com> wrote:
>
> Nala,
>
> On Fri, Mar 18, 2022 at 12:52 PM Nala Ginrut <nalag...@gmail.com> wrote:
>>
>> Please also consider Scheme for embedded systems.
>> Pthreads are widely used in RTOSes and could serve as the low-level implementation of the API,
>
>
> After a quick skim, srfi-18 appears to be almost exactly a one-to-one mapping into the pthreads API.
It was also influenced in part by Java and the Windows API, I think,
but Marc Feeley will definitely know better.
> The raw thread-specific-set! and thread-specific in srfi-18 might be "foundational" but are unusable without additional work: at a minimum, the specific entries need to be alists or hash tables or something -- and, as mentioned earlier, the guile concept of "fluids" seems (to me) to be the superior abstraction, the abstraction that is needed.
Please add your thoughts about it to the SRFI 226 mailing list. As far
as Guile fluids are concerned, from a quick look at them, aren't they
mostly what parameter objects are in R7RS?
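To make the comparison concrete, here is a minimal R7RS sketch of
parameter objects; whether Guile fluids behave exactly like this, in
particular with respect to how values are inherited by newly created
threads, is something I haven't checked:

  ;; A parameter object holds a value that can be rebound for a dynamic extent.
  (define current-log-level (make-parameter 'info))

  (define (log-message msg)
    ;; Reads the dynamically current value of the parameter.
    (display (list (current-log-level) msg))
    (newline))

  (log-message "starting up")            ; prints (info starting up)

  ;; parameterize rebinds the value for the dynamic extent of its body,
  ;; much like with-fluids does for a Guile fluid.
  (parameterize ((current-log-level 'debug))
    (log-message "inside parameterize")) ; prints (debug inside parameterize)

  (log-message "back outside")           ; prints (info back outside)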
>
>>
>> and green threads are not a good choice for compact embedded systems.
>
>
> My memory of green threads is very dim; as I recall, they were ugly hacks to work around the lack of thread support in the base OS, but otherwise offered zero advantages and zillions of disadvantages. I can't imagine why anyone would want green threads in this day and age, but perhaps I am criminally ignorant on the topic.
For SRFI 18/226, it doesn't matter whether green threads or native
threads underlie the implementation. There can even be a mixture,
such as mapping N Scheme threads onto one OS thread.
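As a small illustration, a program written against SRFI 18 looks the
same regardless of whether the threads are green, native, or
multiplexed; only the achievable parallelism differs. A minimal
sketch, using only operations from SRFI 18:

  ;; The same code runs unchanged on green or native threads.
  (define (worker n)
    (lambda ()
      ;; thread-yield! is only a hint; a cooperative (green) scheduler may
      ;; rely on it, while a native implementation is free to ignore it.
      (thread-yield!)
      (* n n)))

  (define threads
    (map (lambda (n) (thread-start! (make-thread (worker n))))
         '(1 2 3 4)))

  ;; thread-join! returns the value produced by each thread's thunk.
  (display (map thread-join! threads))  ; prints (1 4 9 16)
  (newline)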
>>> Actual threading performance depends strongly on proprietary (undocumented) parts of the CPU implementation. For example, locks are commonly implemented on cache lines, either in L1 or L2 or L3. Older AMD CPUs seem to have only one lock for every 6 CPUs (I think that means the lock hardware is in the L3 cache? I dunno), and so it is very easy to stall with locked cache-line contention. The very newest AMD CPUs seem to have 1 lock per CPU (so I guess they moved the lock hardware to the L1 cache??) and so are more easily parallelized under heavy lock workloads. Old PowerPCs had one lock per L1 cache, if I recall correctly. So servers work better than consumer hardware.
>>>
>>> To be clear: mutexes per se are not the problem; atomic ops are. For example, in C++, the reference counts on shared pointers use atomic ops, so if your C++ code uses lots of shared pointers, you will be pounding the heck out of the CPU lock hardware, and all of the CPUs are going to be snooping on the bus, checkpointing, invalidating and rolling back like crazy, hitting a very hard brick wall on some CPU designs. I have no clue how much srfi-18 or fibers depend on atomic ops, but these are real issues that hurt real-world parallelizability. Avoid splurging on atomic ops.
If you use std::shared_ptr, std::move in many places is your friend,
or you are probably doing it wrong. :)
>>> As to hash tables: lock-free hash tables are problematic. Facebook has the open-source "folly" C/C++ implementation of lockless hash tables. Intel has one too, but the documentation for the Intel code is... well, I could not figure out what Intel was doing. There's some cutting-edge research coming out of Israel on this, but I did not see any usable open-source implementations.
>>>
>>> In my application, the lock-less hash tables offered only minor gains; my bottleneck was in the atomics/shared-pointers. YMMV.
Unfortunately, we don't have all the freedom of C/C++. While it is
acceptable for a C program to crash when a hash table is modified
concurrently, most Scheme systems are expected to handle such errors
gracefully and not crash. We may need a few good ideas to minimize
the amount of locking needed.
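The baseline, of course, is to serialize every access with a SRFI 18
mutex, which is safe but makes the locking cost very visible. A rough
sketch, assuming a SRFI 69-style hash-table API (hash-table-set!,
hash-table-ref/default) on the host system:

  ;; Coarse-grained locking: correct, but every access pays for the mutex.
  (define table-mutex (make-mutex 'table))
  (define table (make-hash-table))      ; SRFI 69 style (assumed)

  (define (table-set! key val)
    (mutex-lock! table-mutex)
    (hash-table-set! table key val)
    (mutex-unlock! table-mutex))

  (define (table-ref key default)
    (mutex-lock! table-mutex)
    (let ((val (hash-table-ref/default table key default)))
      (mutex-unlock! table-mutex)
      val))

A production version would also have to unlock on non-local exits
(dynamic-wind or an exception handler); the interesting question is
how much of this locking can be avoided without giving up the "no
crashes" guarantee.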