threads best way to maximise hardware utilisation


John Logsdon

May 11, 2018, 4:59:07 AM
to torch7
I have an AMD Threadripper box with 16 cores, 32 threads, 32GB memory, 1 TB striped SSD - no GPU as yet :).

The OS is Ubuntu 16.04 LTS.

I am running a job that uses acceptance sampling to find suitable starting values for a subsequent Simplex optimisation.

I may want to try 10,000 draws of 10 parameters to see what the result is and select those cases which are best.

I want to ensure that all hardware threads are as fully utilised as possible.

At the moment, the code kicks off OK and as long as I set the number of threads at about 30, all hardware threads are used with load factors approaching 30.

But a short while into processing, the load drops to only a few threads (typically 2 or 3), and htop confirms that most hardware threads are idle.

This happens whatever number of pool threads I use. I usually use 30, but I have also tried 100 and relied on the Linux scheduler to handle the processes.

My question is: where should I add a pool:synchronize() (or even pool:terminate())?

Should it be after the whole loop below, or after every 30 addjob calls within the loop?

So my code is something like this:
print("Initiating sampling")
local Pars = {}
for i = 1, sampleAll do
   Pars[i] = {}
   for j = 1, optLen do
      Pars[i][j] = QinY[j] + (QaxY[j] - QinY[j]) * math.random() -- uniform on QinY:QaxY
   end
   pool:addjob(
      function()
         return {{Pars[i], Pass(Pars[i], memFile, P)}} -- returned as a table that is passed to the endcallback function
      end,
      function(dummy)
         local rest = unpack(dummy)
         table.insert(rest, randStack[i])
         table.insert(Accepted, rest) -- appends the core random numbers as Accepted[4]
         return
      end
   )
end
print("\nAll random jobs submitted ...")


where the function Pass reads data from a string.dump'd file on /tmpfs to avoid upvalues etc.

TIA

John Logsdon

May 12, 2018, 5:43:41 AM
to torch7
I seem to have solved this. At the top of the program I need:

require 'torch'
torch.setnumthreads(32)
print("Torch threads = "..torch.getnumthreads().." environment threads = "..tostring(os.getenv("OMP_NUM_THREADS"))) -- tostring: os.getenv returns nil when the variable is unset


and to ensure that OMP_NUM_THREADS=1 is set before it is run. torch.getnumthreads() did return 32, but OMP_NUM_THREADS was empty. :)

This leaves the chip using all hardware cores/threads OK.
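
On the original question of where pool:synchronize() belongs: going by the documentation of the torch threads package, a single call after the whole submission loop is the usual pattern. addjob only blocks while the job queue is full, and synchronize() then waits for the remaining jobs and runs their end callbacks on the main thread. Below is a minimal sketch under that assumption, not the original program. nThreads, nSamples, nPars, the bounds lo/hi and the sum-of-squares objective are made-up stand-ins for the real values and for Pass(), and the sketch falls back to a plain serial loop if the threads package cannot be loaded.

```lua
-- Sketch only: every name and value here is hypothetical, and the toy
-- objective stands in for Pass() from the original post.
local ok, threads = pcall(require, 'threads')

local nThreads, nSamples, nPars = 4, 100, 10
local lo, hi = 0, 1          -- same bounds for every parameter, for brevity
local accepted = {}

-- End callback: runs on the main thread once a job has finished.
local function collect(result)
   table.insert(accepted, result)
end

local pool
if ok then
   pool = threads.Threads(nThreads)
end

for i = 1, nSamples do
   local pars = {}           -- local, so each job captures only its own draw
   for j = 1, nPars do
      pars[j] = lo + (hi - lo) * math.random()
   end
   local function job()
      local s = 0            -- toy objective: sum of squares
      for j = 1, #pars do s = s + pars[j] * pars[j] end
      return {pars = pars, score = s}
   end
   if pool then
      pool:addjob(job, collect)   -- blocks only while the job queue is full
   else
      collect(job())              -- serial fallback when 'threads' is missing
   end
end

if pool then
   pool:synchronize()   -- once, after the loop: wait for all jobs and run pending end callbacks
   pool:terminate()     -- only when no further jobs will be submitted
end

-- Rank the results so the best candidates (lowest score) come first.
table.sort(accepted, function(a, b) return a.score < b.score end)
print(#accepted .. " results collected")
```

Capturing a per-iteration local (pars) rather than the whole Pars table means each job serialises only its own draw. Calling synchronize() every 30 jobs would also work, but it makes all workers wait for the slowest job in each batch.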