I hope the following helps someone come up with an explanation.
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ elapsed time: 5.072778114 seconds (263769072 bytes allocated)
~$ julia -e "@time svd(randn(2000, 2000))"
elapsed time: 5.029946219 seconds (263769072 bytes allocated)
~$ export OPENBLAS_NUM_THREADS=1
~$ julia -e "@time svd(randn(2000, 2000))"
elapsed time: 7.385015759 seconds (263769072 bytes allocated)
~$ julia -e "@time svd(randn(2000, 2000))"
elapsed time: 7.42231304 seconds (263769072 bytes allocated)
~$
# start 2 independent julia processes concurrently
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ elapsed time: 9.448170203 seconds (263769072 bytes allocated)
elapsed time: 9.465813686 seconds (263769072 bytes allocated)
# start 3 independent julia processes concurrently
~$
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ elapsed time: 11.785043673 seconds (263769072 bytes allocated)
elapsed time: 11.797240153 seconds (263769072 bytes allocated)
elapsed time: 11.818624479 seconds (263769072 bytes allocated)
# start 4 independent julia processes concurrently
~$
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ julia -e "@time svd(randn(2000, 2000))" &
~$ elapsed time: 14.709108466 seconds (263769072 bytes allocated)
elapsed time: 14.734458378 seconds (263769072 bytes allocated)
elapsed time: 14.768934198 seconds (263769072 bytes allocated)
elapsed time: 14.80225463 seconds (263769072 bytes allocated)
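As can be seen, the time taken degrades steadily as more concurrent but independent julia processes time their svd calls: with OPENBLAS_NUM_THREADS=1, it goes from roughly 7.4 s for one process to about 9.5 s for two, 11.8 s for three and 14.7 s for four.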
FWIW, I also tried the above with taskset, i.e. setting CPU affinity, but it made no difference.
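
In case anyone wants to repeat the concurrent runs without the output of the background processes getting interleaved, here is a rough sketch of a helper. It is not what I ran above; run_concurrent is a made-up name, and the idea of sending each process's output to its own temporary log file is just an assumption about what makes the timings easier to read. It should work in plain sh as well as bash.

```shell
# Hypothetical helper (not part of the original runs): start N copies of a
# command in the background, wait for all of them, then print each copy's
# output on its own line(s) so concurrent writes do not get mixed together.
# Written for plain sh, so the variables are not local to the function.
run_concurrent() {
    n=$1
    shift
    tmpdir=$(mktemp -d) || return 1

    # Launch the N background processes, one log file per process.
    pids=""
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" > "$tmpdir/run_$i.log" 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done

    # Wait for every process; remember whether any of them failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done

    # Print the collected output, labelled by process number.
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'process %s: ' "$i"
        cat "$tmpdir/run_$i.log"
        i=$((i + 1))
    done

    rm -rf "$tmpdir"
    return "$status"
}
```

After `export OPENBLAS_NUM_THREADS=1`, the four-process case would then be something like `run_concurrent 4 julia -e "@time svd(randn(2000, 2000))"`. Note that writing to log files adds a little I/O of its own, though that should be negligible next to a multi-second svd.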