Hi folks,
I have run into the following problem. There are two machines, both running Scientific Linux 6.6: one with CUDA 6.5 and a Tesla C2075 GPU, the other with CUDA 7.0 and a GeForce GTX 750 GPU. During the installation, the very same error message that I reported earlier appeared:
> At the "make check" stage of compilation, I get the following error message:
> > ...
> > [ 80%] Building CXX object test/unit/CMakeFiles/test_messenger.dir/test_messenger.cc.o
> > /home/dmytro/Downloads/hoomd-blue/test/unit/test_messenger.cc: In member function ‘void Messenger_file::test_method()’:
> > /home/dmytro/Downloads/hoomd-blue/test/unit/test_messenger.cc:172: error: ‘unique_path’ was not declared in this scope
> > make[3]: *** [test/unit/CMakeFiles/test_messenger.dir/test_messenger.cc.o] Error 1
> > make[2]: *** [test/unit/CMakeFiles/test_messenger.dir/all] Error 2
> > make[1]: *** [CMakeFiles/check.dir/rule] Error 2
> > make: *** [check] Error 2
However, on the CUDA 6.5/Tesla C2075 machine HOOMD runs fine without any noticeable errors, while on the CUDA 7.0/GeForce GTX 750 machine the LJ example (and other randomly chosen examples, too) regularly crashes with the following error message:
...
HOOMD-blue is running on the following GPU(s):
[0] GeForce GTX 750 4 SM_5.0 @ 1.14 GHz, 2047 MiB DRAM, DIS
lj.py:005 | init.create_random(N=2000, phi_p=0.01, name='A')
notice(2): Group "all" created containing 2000 particles
lj.py:007 | lj = pair.lj(r_cut=3.0)
lj.py:008 | lj.pair_coeff.set('A', 'A', epsilon=1.0, sigma=1.0)
lj.py:010 | all = group.all();
lj.py:011 | integrate.mode_standard(dt=0.005)
lj.py:012 | integrate.nvt(group=all, T=1.2, tau=0.5)
lj.py:014 | run(10e3)
notice(2): -- Neighborlist exclusion statistics -- :
notice(2): Particles with 0 exclusions : 2000
notice(2): Neighbors excluded by diameter (slj) : no
notice(2): Neighbors excluded when in the same body: no
** starting run **
**ERROR**: Particle with unique tag 1446 is no longer in the simulation box.
**ERROR**: Cartesian coordinates:
**ERROR**: x: 15436.9 y: -24372.6 z: 27529.9
**ERROR**: Fractional coordinates:
**ERROR**: f.x: 328.004 f.y: -516.581 f.z: 584.567
**ERROR**: Local box lo: (-23.5675, -23.5675, -23.5675)
**ERROR**: hi: (23.5675, 23.5675, 23.5675)
Traceback (most recent call last):
File "lj.py", line 14, in <module>
run(10e3)
File "/home/dmytro/bin/hoomd/bin/../lib/hoomd/python-module/hoomd_script/__init__.py", line 268, in run
globals.system.run(int(tsteps), callback_period, callback, limit_hours, int(limit_multiple));
RuntimeError: std::exception
...
In most cases it crashes like this; only the particle tag differs. If I decrease the integration time step and/or the number of particles in the box, it runs without errors for longer, but a long enough run eventually crashes anyway. What is weird is that periodic boundary conditions are assumed by default, so there should be no meaningful way for a particle to leave the simulation box. Any suggestions why this happens? If I force it to run on the CPU (hoomd lj.py --mode=cpu), everything works fine every time; the problem appears only when the GPU is used. Various tests of the GPU have not revealed any errors, and in all other regards the GPU seems to work flawlessly.
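
For reference, the fractional coordinates in the error output look consistent with f = (x - lo) / (hi - lo), which would put the particle several hundred box lengths outside the box. The sketch below is only a plain-Python check of that arithmetic, assuming that definition (it is inferred from the numbers in the log, not taken from the HOOMD source). The function names `fractional` and `wrap` are hypothetical helpers, not HOOMD API.

```python
import math


def fractional(x, lo, hi):
    """Map a Cartesian coordinate to box-fractional units (0..1 inside the box).

    Assumed definition, inferred from the error output above.
    """
    return (x - lo) / (hi - lo)


def wrap(x, lo, hi):
    """Fold a coordinate back into [lo, hi) by removing whole box lengths."""
    length = hi - lo
    return x - length * math.floor((x - lo) / length)


# Box bounds and particle position copied from the error message.
lo, hi = -23.5675, 23.5675
position = {"x": 15436.9, "y": -24372.6, "z": 27529.9}

for axis, value in position.items():
    f = fractional(value, lo, hi)
    wrapped = wrap(value, lo, hi)
    print(f"{axis}: fractional = {f:.3f}, wrapped into box = {wrapped:.4f}")
```

A full periodic wrap like `wrap` would always bring the coordinate back into the box, so the reported position means either that the wrapping did not remove that many box lengths or that the coordinates had already diverged before it was applied.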