Denis,
Thank you very much for this information. This is very instructive!
To check how memory bound the AMGCL Poisson example code is,
I ran the same executable as 1, 2, 3, and 4 simultaneous instances (I just sent them to the background and waited for them to finish).
I checked that the Wall time differences among the "identical" jobs were very small, so all simultaneous jobs ran in pretty much the same Wall time.
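
For anyone who wants to repeat this kind of test, here is a minimal sketch of the procedure in Python rather than shell background jobs. The function names `timed_run` and `run_simultaneous` are my own, and the command in the `__main__` part is only a placeholder, not the actual solver executable.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd, env=None):
    """Run one instance of cmd and return its wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def run_simultaneous(cmd, n_jobs, env=None):
    """Launch n_jobs copies of cmd at once; return each copy's wall time."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(timed_run, cmd, env) for _ in range(n_jobs)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Placeholder command (hypothetical); replace with the real executable.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    for n in (1, 2, 3, 4):
        times = run_simultaneous(cmd, n)
        print(n, "jobs:", " ".join(f"{t:.3f}s" for t in times))
```

If the thread count is controlled through the environment (for example `OMP_NUM_THREADS`, which is my assumption about how it would be set), a copy of the current environment with that variable changed can be passed as `env`.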
Processor Intel(R) Core(TM) i7-3940XM CPU @ 3.00GHz, 3190 Mhz, 4 Core(s), 8 Logical Processor(s)
32 GB of DRAM (low RAM use, not paging)
With OMP#1 the Wall time is ~ the same up to 3 simultaneous jobs,
and with OMP#2 the Wall time is ~ the same up to 2 simultaneous jobs.
Also, for OMP#2 the Wall time ~ doubles when going from 1 to 4 simultaneous jobs.
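
To put a number on that last observation, here is a small sketch of the arithmetic. The function names are my own, and the times are expressed in units of the single-job Wall time. Note that this only measures how well throughput scales with simultaneous jobs; it is not by itself a measure of the memory-bound fraction.

```python
def aggregate_speedup(n_jobs, t_single, t_multi):
    """Throughput with n_jobs simultaneous jobs relative to one job alone."""
    return n_jobs * t_single / t_multi


def scaling_efficiency(n_jobs, t_single, t_multi):
    """Fraction of ideal scaling achieved (1.0 means no slowdown)."""
    return aggregate_speedup(n_jobs, t_single, t_multi) / n_jobs


# OMP#2 case: 4 simultaneous jobs take ~2x the Wall time of a single job.
print(aggregate_speedup(4, 1.0, 2.0))   # 2.0
print(scaling_efficiency(4, 1.0, 2.0))  # 0.5
```

So four simultaneous jobs deliver only about twice the throughput of one, i.e. about 50% of ideal scaling.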
Can we say from this that for this job size the code is 50% memory bound or something along these lines?
And I wonder if it is possible to estimate the Wall time reduction from running this code on a workstation with 4, 6, or 8 memory channels?
Can we use Linux's perf or any other profiler to figure this out?
Thanks for your advice!
Cheers