parallel python run - caffe old version

mau

Jul 8, 2017, 3:12:02 PM

to Caffe Users

Hi all.
I'm using Caffe 0.15.13 on our department server, which has 4 GPUs, for my master's thesis.

I'm trying to run Caffe in parallel from my Python code. For each call I only need the GLOG output; specifically, I only need the last train/test error.

The problem is that I can't read the GLOG output. This is my code:

def encapsulate_bash_call(cmd,log):
	p = subprocess.call(['bash', '-c', cmd])
	# out, _ = p.communicate()
	# subprocess.check_output(['bash','-c',cmd])

	# print 'out', out[:20]
	# with open(log, 'w') as lw:
	# 	lw.write(out)
	

def run_k_fold_parallel( solver_config_paths, log_path=path_log, num_gpus=[1,2] ):
	
	logs   	= []
	train 	= 0
	val 	= 0
	psnr 	= 0
	num_solv= len(solver_config_paths)
	results=[]
	while len(solver_config_paths)>0:
		pool = ThreadPool(processes=min(len(num_gpus), len(solver_config_paths))) 
		for i in range(min(len(num_gpus), len(solver_config_paths))):
			s_path = solver_config_paths.pop()
			log = log_path+os.path.basename(os.path.normpath(s_path)).replace('.prototxt','.log')
			bashCommand = 'caffe train --solver='+s_path+' --gpu='+str(num_gpus[i])
			results.append( pool.apply_async(encapsulate_bash_call, [bashCommand, log]) )
			logs.append(log)
		pool.close()
		pool.join()

	while len(logs)>0:
		tr, va = parse_log.parse_log(logs.pop())
		psnr  += 10 * np.log10((255**2)/(va[-1]['loss']))
		train += (tr[-1]['loss'])
		val   += (va[-1]['loss'])

	return train/num_solv, val/num_solv, psnr/num_solv

I tried subprocess.call, check_output, Popen, and many other approaches.
I also tried setting GLOG_log_dir, and simply reading the log from my /tmp folder, but with the code above the GLOG log file doesn't exist.

Any suggestions?

Thanks
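
One possible approach, sketched here under assumptions rather than tested against Caffe 0.15.13: the caffe command-line tool normally echoes its GLOG messages to stderr, so handing subprocess an open file for both stdout and stderr should leave the full log at the path that parse_log later reads. The helper name run_and_log is hypothetical, and the command is passed as an argument list instead of going through bash -c.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd (a list of arguments) and write its stdout and stderr to log_path.

    Returns the exit code of the command.
    """
    with open(log_path, 'w') as log_file:
        # stderr=subprocess.STDOUT sends stderr to the same file as stdout,
        # which is where the GLOG lines are assumed to end up.
        return subprocess.call(cmd, stdout=log_file, stderr=subprocess.STDOUT)


if __name__ == '__main__':
    # Stand-in command that writes to stderr; a real call would pass the
    # caffe command from the post as a list of arguments.
    code = run_and_log(
        [sys.executable, '-c', 'import sys; sys.stderr.write("hello\\n")'],
        'demo.log')
    print(code)
```

With the code from the post, the call would look like run_and_log(['caffe', 'train', '--solver=' + s_path, '--gpu=' + str(num_gpus[i])], log). Because subprocess.call blocks until the command exits, this still fits inside the ThreadPool workers.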

mau

Jul 8, 2017, 3:16:21 PM

to Caffe Users

I also tried concatenating the commands on one line:
caffe train --solver=.. --gpu=1 2&>1 | tee out1.log & caffe train --solver=.. --gpu=2 2&>1 | tee out2.log &
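
A note on the command above: in bash the usual way to merge stderr into stdout is 2>&1. As far as I know, 2&>1 is parsed as a literal argument 2 followed by a redirect of both streams into a file named 1, which would leave tee with nothing to read. The same "start everything, one log per run" idea can also be sketched in Python without a shell. This is only an illustration: run_all is a hypothetical helper and the commands are passed as argument lists.

```python
import subprocess


def run_all(commands_and_logs):
    """Start every command at once, each with its own log file, then wait.

    commands_and_logs is a list of (argument_list, log_path) pairs.
    Returns the exit codes in the same order.
    """
    running = []
    for cmd, log_path in commands_and_logs:
        log_file = open(log_path, 'w')
        # Popen returns immediately, so all commands run concurrently.
        proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
        running.append((proc, log_file))

    return_codes = []
    for proc, log_file in running:
        return_codes.append(proc.wait())  # block until this command exits
        log_file.close()
    return return_codes
```

For the two runs above, each pair would hold the caffe argument list for one GPU and the log path for that run, for example out1.log and out2.log.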