$ cat ~/.theanorc
[global]
device = cuda
floatX = float32
[gpuarray]
preallocate = 1
[cuda]
root = /usr/local/cuda-8.0
$ ipython
Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
Type "copyright", "credits" or "license" for more information.
IPython 5.3.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import theano
Using cuDNN version 5105 on context None
Preallocating 11576/12186 Mb (0.950000) on cuda
Mapped name None to device cuda: TITAN X (Pascal) (0000:01:00.0)
In [2]:
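Note that preallocate = 1 in the config shows up in the log as a fraction of 0.950000, not 1.0. Here is a rough sketch of how the setting seems to be interpreted. It is inferred from that log line and from my reading of the Theano config documentation. The function name preallocated_mb is hypothetical, and the rules in the comments are assumptions, not Theano's actual code.

```python
def preallocated_mb(preallocate, total_mb):
    """Approximate number of MB preallocated for a gpuarray.preallocate value.

    Assumed rules (not taken from Theano's source):
      value < 0       -> allocation cache disabled, nothing preallocated
      value == 0      -> nothing preallocated
      0 < value <= 1  -> fraction of total GPU memory, capped at 0.95
                         (the cap is inferred from the log line above)
      value > 1       -> interpreted as a size in MB
    """
    if preallocate <= 0:
        return 0
    if preallocate <= 1:
        return int(min(preallocate, 0.95) * total_mb)
    return int(preallocate)


# With the 12186 MB reported for the TITAN X above:
# preallocated_mb(1, 12186) gives 11576, matching "Preallocating 11576/12186 Mb".
```

So a single process with preallocate = 1 already claims nearly the whole card.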
Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: Out of memory
Apply node that caused the error: GpuFromHost<None>(obsfeat_B_Df)
Toposort index: 4
Inputs types: [TensorType(float32, matrix)]
Inputs shapes: [(1, 4)]
Inputs strides: [(16, 4)]
Inputs values: [array([[ 0.04058, 0.00428, 0.03311, -0.02898]], dtype=float32)]
Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[]<gpuarray>(GpuFromHost<None>.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))))))}}[]<gpuarray>.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Closing remaining open files:trpo_logs/CartPole-v0...done
Max traj len: 200
Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: initialization error
Closing remaining open files:trpo_logs/CartPole-v0...done
Thanks, Pascal.
I tried setting preallocate (in the [gpuarray] section) to 0.01 and to 0.1. The run with 0.1, for instance, starts like this:
$ python scripts/run_rl_mj.py --env_name CartPole-v0 --log trpo_logs/CartPole-v0
Using cuDNN version 5105 on context None
Preallocating 1218/12186 Mb (0.100000) on cuda
Mapped name None to device cuda: TITAN X (Pascal) (0000:01:00.0)
But it still ends with the same out-of-memory error:
Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 283, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 425, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: Out of memory
Apply node that caused the error: GpuFromHost<None>(obsfeat_B_Df)
Toposort index: 1
Inputs types: [TensorType(float32, matrix)]
Inputs shapes: [(1, 4)]
Inputs strides: [(16, 4)]
Inputs values: [array([[ 0.02563, -0.03082, 0.01663, -0.00558]], dtype=float32)]
Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[(0, 0)]<gpuarray>(GpuFromHost<None>.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))))))}}[]<gpuarray>.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
With preallocate = -1, the run starts like this:
$ python scripts/run_rl_mj.py --env_name CartPole-v0 --log trpo_logs/CartPole-v0
Using cuDNN version 5105 on context None
Disabling allocation cache on cuda
The error is similar but not identical: this time it is an initialization error, raised at the same point in the code:
Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 283, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 425, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: initialization error
Apply node that caused the error: GpuFromHost<None>(obsfeat_B_Df)
Toposort index: 1
Inputs types: [TensorType(float32, matrix)]
Inputs shapes: [(1, 4)]
Inputs strides: [(16, 4)]
Inputs values: [array([[ 0.01357, -0.02611, 0.0341 , 0.0162 ]], dtype=float32)]
Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[(0, 0)]<gpuarray>(GpuFromHost<None>.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))))))}}[]<gpuarray>.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Yes, the code does appear to use multiprocessing. I will try to find out how to make the multiprocessing work with the GPU, or perhaps just disable it.
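One option I may try is to keep the pool but make the worker processes run Theano on the CPU, so that only the parent touches the GPU. Below is a minimal sketch of that idea, not the actual policyopt code. The helper names (cpu_theano_flags, init_cpu_worker, make_cpu_pool) are hypothetical. It also assumes that the pool is created before anything in the parent imports theano, and that the workers import theano themselves afterwards. That may well require restructuring the script.

```python
import multiprocessing
import os


def cpu_theano_flags(existing=""):
    """Return a THEANO_FLAGS string with any device=... entry replaced by device=cpu."""
    kept = [flag for flag in existing.split(",")
            if flag and not flag.strip().startswith("device=")]
    return ",".join(kept + ["device=cpu"])


def init_cpu_worker():
    """Pool initializer: runs once in each worker process.

    THEANO_FLAGS is read when theano is imported, so this only has an effect
    if the worker has not imported (or inherited) theano yet.
    """
    os.environ["THEANO_FLAGS"] = cpu_theano_flags(os.environ.get("THEANO_FLAGS", ""))


def make_cpu_pool(n_workers):
    """Create a worker pool whose processes are configured for device=cpu.

    Assumption: call this BEFORE the parent imports theano, so that forked
    workers do not inherit an already-initialized GPU context.
    """
    return multiprocessing.Pool(processes=n_workers, initializer=init_cpu_worker)
```

The flags set in THEANO_FLAGS take precedence over ~/.theanorc, so the device = cuda line in the config file would not need to change for the parent process.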