Hi Alexandru!
Sounds nice! Did I understand you correctly that you want to run the whole Simulation.step() method on the GPU and keep the particle list as well as the grid in VRAM?
Here is a quick review of the steps a particle takes during one iteration:
1) ParticleMover.push()
*) This method gets the particle list, takes care of the parallelization and executes the Push ParticleAction on each particle. The action is defined in the private Push class inside ParticleMover.
*) Push.execute(): first the particle position is stored in the prevX and prevY variables of the particle. This is needed for the charge-conserving CIC interpolation algorithm. Then Solver.step() is called, which advances the particle position by one time step. Finally the boundary check is performed. (A sketch of this per-particle flow follows after this list.)
*) In the ParticleBoundaries.applyOnParticleCenter() method, which is implemented in the SimpleParticleBoundaries class, first the region (left, right etc.) is determined and then the appropriate ParticleBoundary is called. That means that boundaries can be different on different sides of the simulation.
*) In the periodic case the particle boundary simply adds the appropriate value to the x, y, prevX and prevY variables of the particle.
*) The hardwall boundary first calls the Solver.complete() method, then performs the reflection and then calls the Solver.prepare() method. In these solver methods the velocity is shifted by half a time step.
Note: in the distributed version there can also be internode boundaries. This is documented on Jan's blog:
http://karolovbrat-gsoc2012.blogspot.sk/ I guess one would split the work among multiple GPUs in a similar way.
2) Interpolator.interpolateToGrid()
*) This method is similar to ParticleMover.push(): it handles the parallelization and then calls InterpolatorAlgorithm.interpolateToGrid() on each particle. The call is dispatched either to the Cloud-in-Cell algorithm or to the charge-conserving CIC. They interpolate the velocity of the particle to the grid according to its position relative to the grid cells. This is currently the slowest part of the simulation, because the setters on the grid cells need to be synchronized methods (see the deposition sketch below).
3) Interpolator.interpolateToParticles()
*) After the fields on the grid have been updated, they are interpolated back to the particles. This works similarly to the step above, but the interpolateToParticle() method is only implemented in the Cloud-in-Cell algorithm; the charge-conserving CIC uses the same implementation because it is actually just an extension of the Cloud-in-Cell algorithm.
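To make step 1 more concrete, here is a minimal sketch of the per-particle push flow. The Particle and Solver types are simplified stand-ins, the PushSketch class and its fields are made up for illustration, and only the periodic boundary is shown; the real pixi classes differ in detail.

// Illustrative sketch of the per-particle push (step 1 above).
// Particle, Solver and PushSketch are simplified stand-ins, not the real pixi classes.
class Particle {
    double x, y;          // current position
    double prevX, prevY;  // previous position, needed by the charge-conserving CIC
    double vx, vy;
    // force, charge, mass etc. omitted
}

interface Solver {
    void step(Particle p, double dt);     // advance the particle by one time step
    void prepare(Particle p, double dt);  // shift the velocity back by half a time step
    void complete(Particle p, double dt); // shift the velocity forward by half a time step
}

class PushSketch {
    private final Solver solver;
    private final double dt;
    private final double width, height;   // simulation box, periodic in both directions

    PushSketch(Solver solver, double dt, double width, double height) {
        this.solver = solver;
        this.dt = dt;
        this.width = width;
        this.height = height;
    }

    /** What Push.execute() does for a single particle. */
    void execute(Particle p) {
        // 1) remember the old position for the charge-conserving CIC
        p.prevX = p.x;
        p.prevY = p.y;
        // 2) advance the particle by one time step
        solver.step(p, dt);
        // 3) apply the boundary; here only the periodic case, which adds the same
        //    offset to x, y, prevX and prevY
        double shiftX = offset(p.x, width);
        double shiftY = offset(p.y, height);
        p.x += shiftX;  p.prevX += shiftX;
        p.y += shiftY;  p.prevY += shiftY;
    }

    private static double offset(double coordinate, double size) {
        if (coordinate < 0)     return size;
        if (coordinate >= size) return -size;
        return 0;
    }
}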
That's it. This is where most of the simulation happens. The field update is very quick and is done in the SimpleSolver algorithm.
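To show why the synchronized setters hurt in step 2, here is a rough sketch of a Cloud-in-Cell style deposition. GridSketch and its addJx()/addJy() methods are made-up stand-ins, not the real grid API; the point is only that several threads can write to the same cell, so the write has to be synchronized. It reuses the Particle sketch from above.

// Rough sketch of a Cloud-in-Cell style deposition (step 2 above).
// GridSketch and its add* methods are illustrative stand-ins for the real grid setters.
class GridSketch {
    private final double[][] jx, jy;          // current density components
    private final double cellWidth, cellHeight;

    GridSketch(int cellsX, int cellsY, double cellWidth, double cellHeight) {
        this.jx = new double[cellsX][cellsY];
        this.jy = new double[cellsX][cellsY];
        this.cellWidth = cellWidth;
        this.cellHeight = cellHeight;
    }

    // Several threads can deposit into the same cell, hence the synchronization;
    // this contention is what makes step 2 the slowest part at the moment.
    synchronized void addJx(int i, int j, double value) {
        jx[wrap(i, jx.length)][wrap(j, jx[0].length)] += value;
    }
    synchronized void addJy(int i, int j, double value) {
        jy[wrap(i, jy.length)][wrap(j, jy[0].length)] += value;
    }

    /** Deposit the velocity of one particle onto the four surrounding cells (CIC weights). */
    void interpolateToGrid(Particle p) {
        int i = (int) Math.floor(p.x / cellWidth);
        int j = (int) Math.floor(p.y / cellHeight);
        double fx = p.x / cellWidth - i;   // fractional position inside the cell
        double fy = p.y / cellHeight - j;

        addJx(i, j,         p.vx * (1 - fx) * (1 - fy));
        addJx(i + 1, j,     p.vx * fx       * (1 - fy));
        addJx(i, j + 1,     p.vx * (1 - fx) * fy);
        addJx(i + 1, j + 1, p.vx * fx       * fy);

        addJy(i, j,         p.vy * (1 - fx) * (1 - fy));
        addJy(i + 1, j,     p.vy * fx       * (1 - fy));
        addJy(i, j + 1,     p.vy * (1 - fx) * fy);
        addJy(i + 1, j + 1, p.vy * fx       * fy);
    }

    // periodic index wrapping, so the sketch also works at the box edges
    private static int wrap(int index, int size) {
        return ((index % size) + size) % size;
    }
}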
In principle we can use all combinations of algorithms. In practice we will probably stick to SimpleSolver, ChargeConservingCIC and Boris. At the moment the parallelization is hidden away in the iterator classes. The physicists only need to implement the algorithm that should be performed on a single particle or grid cell. Do you think this could be preserved with OpenCL? Then you might not need to deal with the inner workings of the algorithms...
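What I mean by "hidden away in the iterator classes" is roughly the pattern below. The interface and class names here are simplified, so the real pixi interfaces may differ in detail: the physicist writes a per-particle action and the iterator decides how to run it over the particle list. An OpenCL iterator would be a third implementation that uploads the particles and launches a kernel instead.

import java.util.List;

// Simplified version of the "action + iterator" pattern; not the exact pixi interfaces.
interface ParticleAction {
    void execute(Particle particle);   // work on a single particle, no parallel code here
}

interface ParticleIterator {
    void execute(List<Particle> particles, ParticleAction action);
}

// Sequential iterator: a trivial loop.
class SequentialParticleIterator implements ParticleIterator {
    public void execute(List<Particle> particles, ParticleAction action) {
        for (Particle p : particles) {
            action.execute(p);
        }
    }
}

// Multithreaded iterator: the same action, only the execution strategy changes.
class ParallelParticleIterator implements ParticleIterator {
    public void execute(List<Particle> particles, ParticleAction action) {
        particles.parallelStream().forEach(action::execute);
    }
}

If something like this pattern can be kept, the OpenCL work would mostly be about translating the per-particle actions into kernels, not about the surrounding bookkeeping.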
The problem with the algorithms is that they store a lot of data in the particle that is not needed all the time. The charge-conserving CIC for example stores the prevX and prevY variables. One could perform the push and the interpolation to the grid in one step and make push store the data for the interpolator temporarily. That way we would only need to store 2 * (number of threads) doubles for some time instead of 2 * (number of particles) all the time. Maybe this temp data would even fit in some cache and would not need to be transferred... :-)
The same argument could be applied to the prevForce variables that are stored by the solver for the complete() method.
The problem is to let push know what kind of data it should store, because this depends on the interpolation algorithm and the solver algorithm. I am just mentioning it in case memory and bandwidth become a problem. A lot of optimization can be done here! One just needs to find a good way to store some temporary data in the right places (see the sketch below).
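A very rough sketch of what I mean, with made-up names and reusing the stand-in types from above: the previous position lives in a per-thread temporary instead of in every particle, and the deposition gets it passed in directly.

// Rough sketch of fusing push and deposition (PushAndDepositSketch is a made-up name).
// prevX/prevY live only in per-thread temporaries, not in every particle.
class PushAndDepositSketch implements ParticleAction {
    private final Solver solver;
    private final GridSketch grid;
    private final double dt;

    // one pair of doubles per thread instead of one pair per particle
    private final ThreadLocal<double[]> prevPosition =
            ThreadLocal.withInitial(() -> new double[2]);

    PushAndDepositSketch(Solver solver, GridSketch grid, double dt) {
        this.solver = solver;
        this.grid = grid;
        this.dt = dt;
    }

    public void execute(Particle p) {
        double[] prev = prevPosition.get();
        prev[0] = p.x;                    // temporary prevX
        prev[1] = p.y;                    // temporary prevY
        solver.step(p, dt);
        // the interpolator gets the previous position passed in
        // instead of reading p.prevX / p.prevY
        depositWithPrevious(p, prev[0], prev[1]);
    }

    private void depositWithPrevious(Particle p, double prevX, double prevY) {
        // charge-conserving deposition using (prevX, prevY) -> (p.x, p.y);
        // details omitted, this only shows where the temporary data would flow
    }
}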
Cheers
Kirill