parallel python run - caffe old version

mau

Jul 8, 2017, 3:12:02 PM
to Caffe Users
Hi all.
I'm using Caffe 0.15.13 on our department server, which has 4 GPUs, for my master's thesis.

I'm trying to run Caffe in parallel from my Python code. For each call I only need the glog output; more specifically, I only need the last train/test error.

The problem is that I can't read the glog output. This is my code:

import os
import subprocess
from multiprocessing.pool import ThreadPool

import numpy as np
import parse_log  # presumably Caffe's tools/extra/parse_log.py

def encapsulate_bash_call(cmd, log):
    # Run the command through bash; subprocess.call returns only the exit code.
    p = subprocess.call(['bash', '-c', cmd])
    # out, _ = p.communicate()
    # subprocess.check_output(['bash','-c',cmd])

    # print 'out', out[:20]
    # with open(log, 'w') as lw:
    #     lw.write(out)

# path_log is defined elsewhere in the script; num_gpus is a list of GPU ids.
def run_k_fold_parallel(solver_config_paths, log_path=path_log, num_gpus=[1,2]):
    logs = []
    train = 0
    val = 0
    psnr = 0
    num_solv = len(solver_config_paths)
    results = []
    # Process the solvers in batches of at most one job per GPU.
    while len(solver_config_paths) > 0:
        batch_size = min(len(num_gpus), len(solver_config_paths))
        pool = ThreadPool(processes=batch_size)
        for i in range(batch_size):
            s_path = solver_config_paths.pop()
            log = log_path + os.path.basename(os.path.normpath(s_path)).replace('.prototxt', '.log')
            bashCommand = 'caffe train --solver=' + s_path + ' --gpu=' + str(num_gpus[i])
            results.append(pool.apply_async(encapsulate_bash_call, [bashCommand, log]))
            logs.append(log)
        pool.close()
        pool.join()

    # Average the final train/validation loss and the PSNR over all folds.
    while len(logs) > 0:
        tr, va = parse_log.parse_log(logs.pop())
        psnr += 10 * np.log10((255**2) / (va[-1]['loss']))
        train += (tr[-1]['loss'])
        val += (va[-1]['loss'])

    return train/num_solv, val/num_solv, psnr/num_solv


I tried subprocess.call, check_output, Popen, and several other approaches.
I also tried setting GLOG_log_dir, or simply reading the log from my /tmp folder, but with the code above no glog log file exists.
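For reference, here is a minimal sketch of one way to get a child process's log output into a file from Python. It assumes the caffe tool writes its glog messages to stderr (which I believe is its usual behaviour, but it is worth checking for this version). The helper name run_and_log and the file name demo_stderr.log are hypothetical, and the demo uses a small Python child process as a stand-in for 'caffe train', so it runs without Caffe installed.

```python
import subprocess
import sys


def run_and_log(cmd_args, log_path):
    """Run cmd_args, writing both stdout and stderr to log_path.

    Returns the exit code of the child process.
    """
    with open(log_path, 'w') as log_file:
        # stderr=subprocess.STDOUT merges stderr into the same file as stdout,
        # so messages written to stderr are not lost.
        return subprocess.call(cmd_args, stdout=log_file,
                               stderr=subprocess.STDOUT)


if __name__ == '__main__':
    # Stand-in for 'caffe train ...': a child that writes only to stderr.
    demo_cmd = [sys.executable, '-c',
                "import sys; sys.stderr.write('demo line on stderr\\n')"]
    exit_code = run_and_log(demo_cmd, 'demo_stderr.log')
    print(exit_code)
```

With the real command, cmd_args would be the argument list for caffe (for example built from s_path and the GPU id in the code above), and the resulting file could then be passed to parse_log.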

Any suggestions?

Thanks

mau

Jul 8, 2017, 3:16:21 PM
to Caffe Users
I also tried concatenating
caffe train --solver=.. --gpu=1 2&>1 | tee out1.log & caffe train --solver=.. --gpu=2 2&>1 | tee out2.log & 
without success.
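One thing to note about the command above: in bash the usual way to send stderr to stdout is 2>&1. The form 2&>1 is parsed as an extra argument "2" followed by &>1, which redirects all output to a file named 1, so nothing reaches tee. As an alternative to shell redirection, here is a rough sketch of the same idea in Python: start all processes at once, give each its own log file, then wait for them. The function name run_all_parallel and the log file names are hypothetical, and the demo children are Python stand-ins for the caffe commands.

```python
import subprocess
import sys


def run_all_parallel(jobs):
    """Start every (cmd_args, log_path) job at once and wait for all of them.

    Each job's stdout and stderr go to its own log file.
    Returns the exit codes in the same order as jobs.
    """
    running = []
    for cmd_args, log_path in jobs:
        log_file = open(log_path, 'w')
        # Popen returns immediately, so the jobs run concurrently.
        proc = subprocess.Popen(cmd_args, stdout=log_file,
                                stderr=subprocess.STDOUT)
        running.append((proc, log_file))

    exit_codes = []
    for proc, log_file in running:
        exit_codes.append(proc.wait())
        log_file.close()
    return exit_codes


if __name__ == '__main__':
    # Stand-ins for the two 'caffe train ... --gpu=N' commands.
    def demo_child(tag):
        return [sys.executable, '-c',
                "import sys; sys.stderr.write('job %s done\\n')" % tag]

    codes = run_all_parallel([(demo_child('a'), 'demo_a.log'),
                              (demo_child('b'), 'demo_b.log')])
    print(codes)
```

This avoids the shell entirely, so there is no redirection syntax to get wrong, and each process can be pinned to a GPU through its own argument list.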