threads: best way to maximise hardware utilisation

John Logsdon

May 11, 2018, 4:59:07 AM

to torch7

I have an AMD Threadripper box with 16 cores, 32 threads, 32GB memory, 1 TB striped SSD - no GPU as yet :).

The OS is Ubuntu 16.04 LTS.

I am running a job that uses acceptance sampling to find suitable starting values for a subsequent Simplex optimisation.

I may want to try 10,000 draws of 10 parameters to see what the result is and select those cases which are best.

I want to ensure that all hardware threads are as fully utilised as possible.

At the moment, the code kicks off OK and as long as I set the number of threads at about 30, all hardware threads are used with load factors approaching 30.

But a short while into processing, the load drops to only a few threads (typically 2 or 3), and htop confirms that most hardware threads are idle.

This occurs whatever size of thread pool is used. I usually use 30 but have also tried 100, relying on the Linux scheduler to handle the processes.

My question is: where should I add a pool:synchronize() (or even pool:terminate())?

Should it be after the whole loop below, or after every 30 addjob calls within the loop?

So my code is something like this:

print("Initiating sampling")
local Pars = {}
for i = 1, sampleAll do
   Pars[i] = {}
   for j = 1, optLen do
      Pars[i][j] = QinY[j] + (QaxY[j] - QinY[j]) * math.random() -- uniform on QinY:QaxY
   end
   pool:addjob(
      function()
         return {{Pars[i], Pass(Pars[i], memFile, P)}} -- return as table that is called by endcallback function
      end,
      function(dummy)
         local rest = unpack(dummy)
         table.insert(rest, randStack[i])
         table.insert(Accepted, rest) -- appends the core random numbers as Accepted[4]
      end
   )
end
print("\nAll random jobs submitted ...")

where the function Pass reads data from a string.dump'd file on /tmpfs to avoid upvalues etc.
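For comparison, here is a minimal sketch of the submit-then-wait pattern as I understand it from the threads package: the end callbacks run in the main thread, and a single pool:synchronize() after the submission loop waits for all queued jobs and runs their remaining end callbacks. The pool size, job count and the squared "score" are placeholders I made up for illustration; they stand in for sampleAll, the parameter draw and Pass, and this is not my real code.

```lua
-- Sketch only: assumes the torch 'threads' package is installed.
local threads = require 'threads'

local nthread = 4   -- hypothetical pool size
local njob = 20     -- hypothetical number of draws

local pool = threads.Threads(nthread)

local accepted = {}
for i = 1, njob do
   local draw = i * 0.5   -- placeholder for one parameter draw
   pool:addjob(
      function()
         -- runs in a worker thread; 'draw' is passed in as an upvalue
         return draw, draw * draw   -- placeholder for the Pass(...) result
      end,
      function(d, score)
         -- end callback: runs in the main thread with the job's return values
         table.insert(accepted, {d, score, i})
      end
   )
end

-- Wait once, after all jobs have been queued, so that every job finishes
-- and every end callback has been executed before the results are used.
pool:synchronize()
print(#accepted .. " results collected")

-- Shut the workers down only when no further jobs will be added.
pool:terminate()
```

In this sketch pool:terminate() comes last because the pool is not reused; if more jobs were to follow, only the synchronize call would be needed at this point.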

TIA

John Logsdon

May 12, 2018, 5:43:41 AM

to torch7

I seem to have solved this. At the top of the program I need:

require 'torch'
torch.setnumthreads(32)
print("Torch threads = "..torch.getnumthreads().." environment threads = "..(os.getenv("OMP_NUM_THREADS") or "unset")) -- getenv returns nil when the variable is not set

and ensure that OMP_NUM_THREADS=1 is set in the environment before the program is run. torch.getnumthreads() did return 32, but OMP_NUM_THREADS was empty. :)