I have the same problem.
I use OpenBLAS in a multithreaded application. The single-threaded build of OpenBLAS compiled without USE_LOCKING=1 simply does not work. A build with USE_THREADS=0 and USE_LOCKING=1 works, but it is slow and slows the application down when I use more than one thread. The only build that works for me is one compiled with
USE_OPENMP=1 USE_THREADS=0, but then I need to set
export OMP_NUM_THREADS=1
or run the application with
OMP_NUM_THREADS=1 ./application
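
To make the difference between those two forms concrete, here is a small sketch. It does not touch OpenBLAS at all: the `sh -c` child process is only a stand-in for `./application`, and `show_threads` is a hypothetical helper name used for illustration. It only shows which processes see the variable in each case.

```shell
#!/bin/sh
# Hypothetical helper: starts a child process (a stand-in for ./application)
# that prints the OMP_NUM_THREADS value it inherited.
show_threads() {
    sh -c 'echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"'
}

# Start from a clean state so the demonstration is reproducible.
unset OMP_NUM_THREADS

# Per-command prefix: only this one child process sees the variable.
OMP_NUM_THREADS=1 sh -c 'echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"'
# prints OMP_NUM_THREADS=1

# The calling shell was not changed, so the next child sees nothing.
show_threads
# prints OMP_NUM_THREADS=unset

# export: every child process started afterwards inherits the value.
export OMP_NUM_THREADS=1
show_threads
# prints OMP_NUM_THREADS=1
```

Either way the value is already in the environment when the process starts, which is the case that works for me.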
I have tried to change the number of threads dynamically from inside the application,
but it does not work. It seems that the threads are allocated at startup and there is no way
to change that behavior afterwards. I also tried patching the code to force OMP_NUM_THREADS=1
internally, but that does not work either. Has anyone found a way to solve this?