Shared memory

Marcel Hendrix

Dec 29, 2022, 6:29:33 PM
I want to experiment with shared memory between iForth instantiations
running on a multi-core CPU. On Windows, it is possible to share a
memory-mapped file between programs. When a non-existing file name is
given, the system call used defaults to an arbitrary memory buffer, exactly
what is needed.

First experiments are successful: I am able to pass text from one iForth
to another with literally only a single line of code. However, after hours of
debugging, it turns out that the sharing is only possible when both iForth
instances are run as an Administrator, which is somewhat understandable,
but a nuisance.

The MS example 'C' code ignores the problem, suggesting that
default security measures do not prevent the idea from working.
Does anybody know how to get around this problem (or lessen the OS
default security level a notch)?

-marcel
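
One possible explanation for the Administrator requirement, offered as an assumption and not as a diagnosis: the thread does not show the object name that iForth passes to the system call. Microsoft's named shared memory example uses a name with the "Global\" prefix, and its documentation notes that creating an object in the global namespace requires the SeCreateGlobalPrivilege privilege, which ordinary non-elevated processes normally lack. A name with the "Local\" prefix stays in the session namespace and should not need it. The C sketch below shows the difference. The base name "iforth_shm" and both helper functions are hypothetical, and the Windows part only compiles on Windows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a kernel object name such as "Local\iforth_shm".
   Returns 0 on success, -1 if the buffer is too small. */
int mapping_name(char *buf, size_t len, const char *base, int global)
{
    assert(buf != NULL && base != NULL);
    int n = snprintf(buf, len, "%s\\%s", global ? "Global" : "Local", base);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

#ifdef _WIN32
#include <windows.h>

/* Create or open a mapping backed by the system paging file and return a
   view of it. Passing INVALID_HANDLE_VALUE instead of a file handle is what
   selects the paging file as backing store. */
void *open_shared(const char *name, DWORD size, HANDLE *out_handle)
{
    HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                  0, size, name);
    if (h == NULL)
        return NULL;
    void *view = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, size);
    if (view == NULL) {
        CloseHandle(h);
        return NULL;
    }
    *out_handle = h;
    return view;
}
#endif
```

Two processes in the same login session that call open_shared with the same "Local\..." name would get views of the same memory. Whether this removes the need for elevation in iForth's case is untested here.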

Hans Bezemer

Dec 29, 2022, 7:48:15 PM
Maybe play with umask() before opening up shm?
Like: myMask = umask(0); /* open shm */ umask(myMask);

Hans Bezemer
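
Hans's umask() idea applies to POSIX systems, where the permission bits of a newly created file or shared-memory object are masked by the process umask; it does not carry over to the Windows mapping call Marcel is using. A minimal C sketch of the pattern, assuming a file-backed mapping (shm_open takes the same kind of mode argument). The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or open) a file of `size` bytes with permission bits `mode`,
   bypassing the process umask, and map it shared.
   Returns the mapping, or MAP_FAILED on error. */
void *create_shared_file(const char *path, size_t size, mode_t mode)
{
    assert(path != NULL && size > 0);
    mode_t old_mask = umask(0);           /* lift the mask ... */
    int fd = open(path, O_RDWR | O_CREAT, mode);
    umask(old_mask);                      /* ... and restore it right away */
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping stays valid */
    return p;
}
```

With mode 0666 the resulting file is readable and writable by other users regardless of the caller's umask, which is the effect the suggestion is after.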

Anton Ertl

Dec 30, 2022, 4:53:01 AM
Marcel Hendrix <m...@iae.nl> writes:
>First experiments are successful, I am able to pass text from one iForth
>to another with literally only a single line of code.

Note that, if you want to communicate between the processes by writing
to shared memory in one process and reading in the other, modern CPUs
tend to have quite nonintuitive behaviour, and require the programmer
to jump through some hoops for reliable operation. IA-32 and AMD64
are somewhat better in that respect than, e.g., ARM, but even they
have non-intuitive behaviour.

My suggestion is to encapsulate the workarounds for this behaviour in
libraries for shared-memory communication (whether between processes
or between threads of the same process). Bernd Paysan has quite a bit
of practical experience with threads and shared memory, and has added
some libraries of this kind to Gforth.

>The MS example 'C' code ignores the problem, suggesting that
>default security measures do not prevent the idea from working.

And, have you tried it? Does it work as non-administrator? If it
does, what's the difference from what you have tried?

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net
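
The "hoops" Anton mentions are mostly about ordering: the reader must not see the "message is ready" flag before it can see the message itself. A minimal C11 sketch of one common way to encapsulate this, a one-slot mailbox with a release store on the writer side and an acquire load on the reader side. The mailbox layout and function names are hypothetical, it assumes a single writer and a single reader, and it assumes atomic_int is lock-free on the target so that it also works between processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Layout of a mailbox placed at the start of the shared region. */
typedef struct {
    atomic_int ready;        /* 0 = empty, 1 = text is valid */
    char text[120];
} mailbox;

/* Writer: fill in the payload first, then publish it. The release store
   keeps the payload writes from being reordered after the flag write. */
int mailbox_send(mailbox *mb, const char *msg)
{
    assert(mb != NULL && msg != NULL);
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) != 0)
        return 0;                         /* previous message not taken yet */
    size_t n = strlen(msg);
    if (n >= sizeof mb->text)
        n = sizeof mb->text - 1;          /* truncate overlong messages */
    memcpy(mb->text, msg, n);
    mb->text[n] = '\0';
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
    return 1;
}

/* Reader: the acquire load pairs with the release store above, so once
   `ready` reads as 1 the payload written before it is visible too. */
int mailbox_receive(mailbox *mb, char *out, size_t len)
{
    assert(mb != NULL && out != NULL && len > 0);
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) != 1)
        return 0;                         /* nothing there yet */
    size_t n = strlen(mb->text);
    if (n >= len)
        n = len - 1;
    memcpy(out, mb->text, n);
    out[n] = '\0';
    atomic_store_explicit(&mb->ready, 0, memory_order_release);
    return 1;
}
```

Polling mailbox_receive in a loop is cheap compared with an OS semaphore but burns a core while waiting; which trade-off is right depends on the workload.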

Marcel Hendrix

Dec 30, 2022, 8:24:20 AM
On Friday, December 30, 2022 at 10:53:01 AM UTC+1, Anton Ertl wrote:
[..]
> Note that, if you want to communicate between the processes by writing
> to shared memory in one process and reading in the other, modern CPUs
> tend to have quite nonintuitive behaviour, and require the programmer
> to jump through some hoops for reliable operation. IA-32 and AMD64
> are somewhat better in that respect than, e.g., ARM, but even they
> have non-intuitive behaviour.

(iForth does not yet support ARM.) Your warning is appreciated, because
I thought that I was done already (apart from setting up a semaphore).

> >The MS example 'C' code ignores the problem, suggesting that
> >default security measures do not prevent the idea from working.
> And, have you tried it? Does it work as non-administrator? If it
> does, what's the difference from what you have tried?

There are two steps to it. First iForth.exe must be started under an
Administrator account. That cost me quite a bit of time, but I found
several one-click solutions for it. Unfortunately, high privilege
programs are checked by UAC and require further acknowledgement
before they can run. It is incredibly complex to auto-skip that without
editing the Registry. For now I'll live with UAC until shared memory
proves useful.

> And, have you tried it? Does it work as non-administrator? If it
> does, what's the difference from what you have tried?

I guess that you ask if I compiled the original example.
No, I did not. It was only a rough sketch. I may try that later.

-marcel

Anton Ertl

Dec 30, 2022, 12:20:20 PM
Marcel Hendrix <m...@iae.nl> writes:
>On Friday, December 30, 2022 at 10:53:01 AM UTC+1, Anton Ertl wrote:
>[..]
>> Note that, if you want to communicate between the processes by writing
>> to shared memory in one process and reading in the other, modern CPUs
>> tend to have quite nonintuitive behaviour, and require the programmer
>> to jump through some hoops for reliable operation. IA-32 and AMD64
>> are somewhat better in that respect than, e.g., ARM, but even they
>> have non-intuitive behaviour.
>
>(iForth does not yet support ARM.) Your warning is appreciated, because
>I thought that I was done already (apart from setting up a semaphore).

I expect that the semaphore code (from the OS, right?) contains the
necessary operations such that when you write, then V the semaphore in
one process, and P for the semaphore in the other process, and then
read the shared memory in the other process, things will work as
expected. But such semaphore operations tend to be quite expensive.
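
The write / V / P / read sequence can be sketched in C as follows. This is an illustration under assumptions, not iForth's implementation: it uses fork and System V semaphores on a POSIX system instead of the Windows primitives, and the function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Argument for semctl(SETVAL). It has the layout of the conventional
   `union semun`, which the caller has to define on Linux. */
union sem_arg { int val; struct semid_ds *buf; unsigned short *array; };

static int sem_change(int semid, short delta)
{
    struct sembuf op = { 0, delta, 0 };   /* semaphore 0, blocking */
    return semop(semid, &op, 1);
}

/* The child writes `value` into shared memory and does V; the parent does
   P and only then reads. Returns what the parent saw, or -1 on error. */
int handoff(int value)
{
    assert(value >= 0);                   /* -1 is reserved for errors */
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    int semid = semget(IPC_PRIVATE, 1, 0600);
    if (semid < 0) {
        munmap(cell, sizeof *cell);
        return -1;
    }
    union sem_arg zero = { .val = 0 };
    semctl(semid, 0, SETVAL, zero);       /* start with nothing available */
    *cell = -1;

    int seen = -1;
    pid_t pid = fork();
    if (pid == 0) {
        *cell = value;                    /* write ... */
        sem_change(semid, +1);            /* ... then V */
        _exit(0);
    }
    if (pid > 0) {
        if (sem_change(semid, -1) == 0)   /* P blocks until the V happened */
            seen = *cell;                 /* now the read is safe */
        waitpid(pid, NULL, 0);
    }
    semctl(semid, 0, IPC_RMID);
    munmap(cell, sizeof *cell);
    return seen;
}
```

Each V and P here is a system call, which is the expense Anton refers to.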

minf...@arcor.de

Dec 30, 2022, 1:14:55 PM

minf...@arcor.de

Dec 30, 2022, 1:18:16 PM
p.s. it's the document page, not the page in the epdf online viewer

none albert

Dec 30, 2022, 3:24:26 PM
In article <73c2da86-b581-4519...@googlegroups.com>,
Marcel Hendrix <m...@iae.nl> wrote:
>I want to experiment with shared memory between iForth instantiations
>running on a multi-core CPU. On Windows, it is possible to share a memory-mapped file between programs. When a non-existing file name is given, the
>used system call defaults to an arbitrary memory buffer, exactly what is
>needed.

I have had success going in the other direction: starting Forth and then
forking the process. Naturally the dictionary space is shared (cutting
waste), as is a piece of common space (Gbytes, if need be).
Each Forth has its own private dictionary space to add definitions
to, so it is fully functional.
It is based on cooperation. Each Forth is supposed not to mess with
the others' stacks and other private parts.
This works on Linux (although I have discovered a defect in the 64-bit forking
that I've worked around).
The same compatible (!) system works on Windows 32; there is no need to align
Windows and Linux on a common API, which would be hard to come by.
That is the advantage of relying on Forth itself.

Thanks to the abysmal documentation of the Windows APIs I have
not managed to run it on Windows 64. Mind you, it is supposed
to work the same way as on Windows 32. The answer that you get
is that you should use the C++ compiler, not the API.
(Same with Linux: "you should use the shared libraries, not the
system calls." Only C++/C compiler writers have the right to
use system calls.)

>First experiments are successful, I am able to pass text from one iForth
>to another with literally only a single line of code. However, after hours of
>debugging, it proves that the sharing is only possible when both iForth
>instances are run as an Administrator, which is somewhat understandable,
>but a nuisance.

Being root should have nothing to do with it. You are in for a
nasty ride.

>The MS example 'C' code ignores the problem, suggesting that
>default security measures do not prevent the idea from working.
>Does anybody know how to get around this problem (or lessen the OS
>default security level a notch)?

I had a practical motivation to implement this multitasking:
the parallel Meissel/Hedgehog-inspired ideas for counting
primes. It worked.

What programs do you have in mind to accommodate with this extension?

>
>-marcel

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge.
Don't sell the hide of the bear until you shot it.
Better one bird in the hand than ten in the air.
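
The distinction Albert describes, private space per process plus one common region, can be sketched in C on a POSIX system. This is an assumption-laden illustration, not his Forth code: after fork, ordinary memory is copy-on-write (each process ends up with its own copy once it writes), while a MAP_SHARED region stays common. All names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct { long shared_seen; long private_seen; } fork_result;

/* Stand-in for a cell in each process's private dictionary space. */
static long private_cell = 1;

/* Fork a child that writes to both kinds of memory, then report what the
   parent sees afterwards. Both fields are -1 on error. */
fork_result fork_demo(void)
{
    assert(private_cell == 1);            /* the parent's copy is never touched */
    fork_result r = { -1, -1 };
    long *common = mmap(NULL, sizeof *common, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (common == MAP_FAILED)
        return r;
    *common = 1;

    pid_t pid = fork();
    if (pid == 0) {
        *common = 2;                      /* visible to the parent */
        private_cell = 2;                 /* changes the child's copy only */
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);            /* wait for the child before reading */
        r.shared_seen = *common;
        r.private_seen = private_cell;
    }
    munmap(common, sizeof *common);
    return r;
}
```

Here the parent sees the child's write to the common region but not the write to the private cell.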

Marcel Hendrix

Jan 7, 2023, 1:54:38 PM
I think I got it. Shared memory is implemented.

A minor annoyance is that iForth now has to be in the Administrator
group to run on Windows 11. This means UAC kicks in when the
program starts. I know how to fix it, but it is not on my priority list.

Getting it to work was not so difficult after all, but once applied for
iSPICE I found an unexpected twist. When iSPICE is ordered to run
a parallel job, the commandline could not contain certain parameters,
because these were not transferred from the controlling core to the
slaves. Here, when #|cpus ( the number of cores allotted to the job )
is set at 8 on the controller,
iSPICE> 1 TO #|cpus RUN-PAR
ran the slaves with #|cpus at 8, not 1. Apparently RUN-PAR is started
before the commandline is fully evaluated.

Below are some results. I took a simple SPICE simulation file with
3 nested .STEP loops for a total of 24 tasks.
Run on LTspice, this takes 363 seconds. Under the same conditions,
it was run under iSPICE with #|cpus set between 1 and 32.

iSPICE> .TICKER-INFO
AMD Ryzen 7 5800X 8-Core Processor

The best result is about 45 times faster than LTspice.
The optimum is 12 cores, with a strange outlier at #|cpus = 10.
An iSPICE task needs about 2 GBytes of memory (here).
The base memory use was 6 GBytes when I ran the test, so with
12 cores the job ran out of memory (I have only 32 Gbytes here).
Maybe with 10 cores Windows started making decisions
with regard to swap space or working set.

During the test I kept an eye on clock frequency and memory use.
There was no throttling (5.6 GHz throughout), and maximum
memory use was about 31 Gbytes. No disk activity detectable (or
not shown by Windows :--)

The 8 extra hyperthreads are not very useful for this kind of work.
Once the 8 real threads are active, the simulation time does not
really decrease further. Maybe I should stick in more RAM to
make sure about that, or run it on a workstation with more/fewer
cores.

-marcel

\ LTspiceXVII vs 17.1.5
\ Total elapsed time: 363.431 seconds.

iSPICE> 1 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 49.638 seconds elapsed. ok
iSPICE> 2 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 25.352 seconds elapsed. ok
iSPICE> 4 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 13.489 seconds elapsed. ok
iSPICE> 8 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 8.618 seconds elapsed. ok
iSPICE> 10 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 11.051 seconds elapsed. ok
iSPICE> 12 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 7.569 seconds elapsed. ok
iSPICE> 14 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 7.822 seconds elapsed. ok
iSPICE> 16 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 8.255 seconds elapsed. ok
iSPICE> 20 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 9.459 seconds elapsed. ok
iSPICE> 24 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 8.441 seconds elapsed. ok
iSPICE> 28 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 10.799 seconds elapsed. ok
iSPICE> 32 TO #|cpus ok | RUN-PAR
Job `step\step_partest.cir` finished, 12.280 seconds elapsed. ok

\ About 363/8 = 45x faster than Analog Devices' LTspice.
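
The timings above can be turned into speedup and scaling figures with a few lines of C. The numbers are copied from the log; the function names are hypothetical. Taken literally, the fastest run (7.569 s at #|cpus = 12) is roughly 48 times faster than LTspice and the 8-process run roughly 42 times, so "45x" is a fair round figure. A single iSPICE process is already about 7.3 times faster than LTspice on this job.

```c
#include <assert.h>
#include <stdio.h>

/* How many times faster a run of `t` seconds is than a reference run. */
double speedup(double t_ref, double t)
{
    assert(t > 0.0);
    return t_ref / t;
}

/* Fraction of ideal linear scaling achieved with n processes. */
double efficiency(double t_one, double t_n, int n)
{
    assert(t_n > 0.0 && n > 0);
    return t_one / (t_n * n);
}

/* Print a small table for the timings quoted in the post. */
void print_scaling(void)
{
    static const int cpus[] = { 1, 2, 4, 8, 10, 12, 14, 16, 20, 24, 28, 32 };
    static const double secs[] = { 49.638, 25.352, 13.489, 8.618, 11.051,
                                   7.569, 7.822, 8.255, 9.459, 8.441,
                                   10.799, 12.280 };
    const double ltspice = 363.431;       /* LTspice total elapsed time */
    size_t n = sizeof cpus / sizeof cpus[0];

    printf("cpus  seconds  vs LTspice  efficiency\n");
    for (size_t i = 0; i < n; i++)
        printf("%4d  %7.3f  %9.1fx  %9.0f%%\n", cpus[i], secs[i],
               speedup(ltspice, secs[i]),
               100.0 * efficiency(secs[0], secs[i], cpus[i]));
}
```

The efficiency column makes the point about hyperthreads visible: scaling is close to linear up to 4 processes, drops to about 72% at 8, and falls off quickly beyond that.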

Marcel Hendrix

Jan 13, 2023, 4:14:51 PM
On Saturday, January 7, 2023 at 7:54:38 PM UTC+1, Marcel Hendrix wrote:
> I think I got it. Shared memory is implemented.

Now without bugs. ( https://ibb.co/Qd7Xw3g )

Marcel Hendrix

Jan 13, 2023, 4:45:59 PM
Some details:

iSPICE schematic ( https://ibb.co/MsfXGmw )

~~~
-- iForth netlist (automatically converted from SPICE netlist)
-- d:\dfwforth\examples\SPICE\ispice\circuits\net_lts\powerplane
-- powerIII.cir processed by iSPICE on 14:12:52, January 12, 2023
CIRCUIT
5 N: ina out p s p2
3 B: i_V1 i_Vs i_B1

FCONST: k1 = 0.9999
FCONST: L11=15mH
FCONST: L22=15mH r=10
FCONST: con_1=r
FCONST: con_0=r

EXPR: ex_0 -V(p)*I(V1)-V(s)*I(Vs)

ina GND i_V1 PULSE: V1 ( -20 20 0 10n 10n 0.5ms 1ms )
out GND con_0 RESS R2
ina p con_1 RESS R1
p GND s GND CI XU1 L11={L11} L22={L22} K={k1}
s out i_Vs 0e VSOURCE Vs
p2 GND i_B1 ex_0 BVXT B1
END

NO-JOB-STORE
FALSE TO fastaccess?
.TRAN 0 1s {1s-2ms} 0.1u
SIMULATE
WRITES

~~~

-- iForth cmd file
CLEAR-TASK-DATA
.STEP param k1 0.99 1 0.0005
SUBMIT

~~~

-- original SPICE netlist
* D:\dfwforth\examples\SPICE\ispice\circuits\net_lts\powerplane\powerIII.asc
V1 ina 0 PULSE(-20 20 0 10n 10n 0.5ms 1ms)
R2 out 0 {r}
R1 ina p {r}
XU1 p 0 s 0 CI L11={L11} L22={L22} K={k1}
Vs s out 0
B1 p2 0 V=-V(p)*I(V1)-V(s)*I(Vs)
.param k1 = 0.9999
.param L11=15mH
.param L22=15mH r=10
.option reltol=0.1m
.tran 0 1s {1s-2ms} 0.1u
.step param k1 0.99 1 0.0005
.meas FORTH p2 @AVG pleak2
.lib NGSPICE\CI.sub
* LTspice total elapsed time: 527.32 seconds.
.backanno
.end
~~~


-marcel

Marcel Hendrix

Jan 21, 2023, 8:29:16 AM
On Saturday, January 7, 2023 at 7:54:38 PM UTC+1, Marcel Hendrix wrote:
> I think I got it. Shared memory is implemented.

With further testing I noticed another hidden Windows 'feature.'
When running iForth as an Administrator, drag and drop to the iForth
console and from/to my editor and File manager sometimes did not work.
I suspected a bug in iForth, but digging around uncovered that this is a
well-known Windows feature: a higher privileged process (here
iForth) is prevented from accepting drag-and-drop from a lower
privileged one (here File manager). Ok, but there is a nasty twist here:
when iForth starts my editor with the S" xx" SYSTEM command, 'xx'
apparently becomes higher privileged too, and as a consequence,
drag-and-drop does not work anymore for 'xx' (the started editor).
This is somewhat unexpected and certainly a nuisance.

-marcel

Marcel Hendrix

Jan 21, 2023, 8:54:46 AM
On Saturday, January 7, 2023 at 7:54:38 PM UTC+1, Marcel Hendrix wrote:
> I think I got it. Shared memory is implemented.

And now I want more :--)

It would be really great if the shared memory trick (which uses the system
page file) worked across the network. Admittedly it is only cosmetic, because
for my current purpose I could also use a shared file with a Filemap view
(mmap the file in the iForth virtual address space). In case of a file I have to
rewrite my array accesses to file operations, which is a drag. Neither Windows
nor Linux appear to directly support shared memory between networked
computers.

Is there a Forth library with RDMA (a transparent protocol built into many network
adapters)? If it existed I could buy a refurbished HP840 workstation and *really*
get going (such workstations have 44 cores/88 threads and cost a mere
2000 Euros, 15 - 20k new, refurbished RDMA nic's are 20 Euros...).

-marcel

Anton Ertl

Jan 21, 2023, 10:38:11 AM
Marcel Hendrix <m...@iae.nl> writes:
>Neither Windows
>nor Linux appear to directly support shared memory between networked
>computers.

If you can live with its performance characteristics (and probably
lack of coherence), how about mmapping an NFS-mounted file (other
distributed file systems may be better for that purpose, though).

Otherwise, I think there are good reasons for that lack of support.
The latency is long, and coherence is a problem. RDMA may solve the
coherence problem and reduce the latency, but it's still long.
Therefore people tend to use message passing rather than shared memory
across the network. Interestingly, in the Safe Forth concept I
suggested avoiding shared memory and communicating between threads (or
processes) with messages, even on the same machine (where shared
memory is easy and may be cheap).

>Is there a Forth library with RDMA (a transparent protocol built into many network
>adapters)?

Not that I have heard of, but if you want one, you are in a good
position to work on one.

>If it existed I could buy a refurbished HP840 workstation and *really*
>get going (such workstations have 44 cores/88 threads and cost a mere
>2000 Euros, 15 - 20k new, refurbished RDMA nic's are 20 Euros...).

Makes you wonder what's wrong with them:-)
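
Anton's suggestion of mmapping a file on a shared file system keeps ordinary array accesses, so no rewrite to explicit file operations is needed. A minimal C sketch for a POSIX system, with hypothetical function names and an array of doubles as an assumed data layout. It says nothing about when another machine sees the data: as noted above, coherence over NFS is not guaranteed, and msync only pushes local changes out to the file.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `count` doubles stored in `path` (created if missing) so that they
   can be used like an ordinary array. Returns NULL on failure. */
double *map_results(const char *path, size_t count)
{
    assert(path != NULL && count > 0);
    size_t bytes = count * sizeof(double);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping stays valid */
    return p == MAP_FAILED ? NULL : (double *)p;
}

/* Push local changes out to the file and drop the mapping.
   Returns 0 on success, -1 if the sync failed. */
int unmap_results(double *a, size_t count)
{
    assert(a != NULL);
    size_t bytes = count * sizeof(double);
    int rc = msync(a, bytes, MS_SYNC);
    munmap(a, bytes);
    return rc;
}
```

A worker would call map_results, store its results with plain indexing such as a[i] = x, and call unmap_results when done; the controlling machine maps the same path afterwards to read them.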

Paul Rubin

Jan 21, 2023, 11:16:57 AM
Marcel Hendrix <m...@iae.nl> writes:
> Is there a Forth library with RDMA (a transparent protocol built into
> many network adapters)? If it existed I could buy a refurbished HP840
> workstation and *really* get going (such workstations have 44 cores/88
> threads and cost a mere 2000 Euros, 15 - 20k new, refurbished RDMA
> nic's are 20 Euros...).

Unless you had a bunch of those workstations networked together, why
would you need RDMA, assuming your Forth program is running on the
workstation?

I see one here for 1000 USD, with 44 cores and 128GB ram:

https://www.ebay.com/itm/175576911219

That is really impressive. Anton asks what is wrong with them.
Obviously they are old and power hungry, but less so than it seems:

https://www.intel.com/content/www/us/en/products/sku/91317/intel-xeon-processor-e52699-v4-55m-cache-2-20-ghz/specifications.html

They use 14nm lithography and have 2.2GHz base frequency, which is not
all that fast. They were introduced in 2016. This 44 core system is
almost definitely slower than a 32 core Threadripper, but might beat a
16 core Ryzen. On the other hand those will cost more up front,
especially with the memory figured in. If you are running the
workstation 24/7 then the newer hardware will probably pay for itself in
power savings quickly, but if you only run it part of the time it might
be ok.

Now I feel a little bit interested but don't have an actual use for such
a box. Spinning up some Hetzner cloud servers for an occasional compute
task is pretty cheap.

Maybe you could implement MPI (does anyone still use that?) for your
Spice stuff.

Marcel Hendrix

Jan 21, 2023, 11:27:47 AM
On Saturday, January 21, 2023 at 4:38:11 PM UTC+1, Anton Ertl wrote:
> Marcel Hendrix <m...@iae.nl> writes:
[..]
> >If it existed I could buy a refurbished HP840 workstation and *really*
> >get going (such workstations have 44 cores/88 threads and cost a mere
> >2000 Euros, 15 - 20k new, refurbished RDMA nic's are 20 Euros...).
> Makes you wonder what's wrong with them:-)

They come with a 3 year warranty, but I have no idea who dares buy
that stuff for their business, and how these resellers (there are many)
can prosper? I'll find out :--)

-marcel

Marcel Hendrix

Jan 21, 2023, 11:51:18 AM
On Saturday, January 21, 2023 at 5:16:57 PM UTC+1, Paul Rubin wrote:
> Marcel Hendrix <m...@iae.nl> writes:
[..]
> Unless you had a bunch of those workstations networked together, why
> would you need RDMA, assuming your Forth program is running on the
> workstation?

I will put the workstation(s) in the attic, where I can't hear and feel them.
My desktop pc dispatches and controls the runs and displays the results.

> This 44 core system is almost definitely slower than a 32 core
> Threadripper, but might beat a 16 core Ryzen.

That costs 7,500 Euros around here, or 4 refurbished HP boxes...

It will be more fun than tweaking a game PC with liquid metal
and nitrogen for 1% higher frame rates.

-marcel

Marcel Hendrix

Jan 21, 2023, 11:59:34 AM
On Saturday, January 21, 2023 at 4:38:11 PM UTC+1, Anton Ertl wrote:
> Marcel Hendrix <m...@iae.nl> writes:
[..]
> If you can live with its performance characteristics (and probably
> lack of coherence), how about mmapping an NFS-mounted file (other
> distributed file systems may be better for that purpose, though).

Hmm, given the very limited functionality I need, this might be
perfectly adequate.

-marcel