Thanks for sharing your thoughts. Indeed, grid decomposition (GD) and particle decomposition (PD) can both be useful approaches. Maybe even a mixture of the two could be used for the best performance (for example, use GD across several CPUs and PD within a single CPU).
As mentioned, on a single CPU, GD could have a significant advantage if particle collisions are enabled.
Another possibility I want to throw in is not only dynamical load balancing, but also dynamical grid size. There could be one region with fewer particles that needs less resolution (e.g. an 8x8 grid) while a neighboring region has many particles and could use higher resolution (e.g. a 32x32 grid for the same physical size). Connecting the two regions could be as "simple" as devising the right boundary conditions that interpolate from the 8 boundary cells on one side to the 32 cells on the other side, and back. So the 8x8 region doesn't know what happens on the other side. All it sees is a boundary that behaves like just another 8x8 neighbor, while the boundary transparently interpolates to the 32x32 neighboring grid.
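To make this concrete, here is a minimal sketch of what such an interpolating boundary could look like, assuming simple piecewise-constant averaging and refinement (the names `restrict_row` and `prolong_row` are hypothetical, not from our code base):

```python
import numpy as np

def restrict_row(fine_row: np.ndarray, factor: int) -> np.ndarray:
    """Average blocks of `factor` fine cells into one coarse ghost cell."""
    return fine_row.reshape(-1, factor).mean(axis=1)

def prolong_row(coarse_row: np.ndarray, factor: int) -> np.ndarray:
    """Copy each coarse cell into `factor` fine ghost cells."""
    return np.repeat(coarse_row, factor)

# The 8x8 grid sees an 8-cell ghost row, the 32x32 grid a 32-cell one;
# the boundary translates between the two resolutions in both directions.
fine_edge = np.linspace(0.0, 1.0, 32)          # boundary row of the 32x32 grid
ghost_for_coarse = restrict_row(fine_edge, 4)  # 8 values for the 8x8 grid

coarse_edge = np.arange(8.0)                   # boundary row of the 8x8 grid
ghost_for_fine = prolong_row(coarse_edge, 4)   # 32 values for the 32x32 grid
```

Higher-order interpolation could later replace the bodies of these two functions; the grids themselves would never need to change.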
* It would be good if the code structure allowed the physicist to implement the physics equations without needing to know whether GD, PD, or a mixture of both is used. The implementation of more advanced algorithms should also be easy.
* Dynamical load balancing would be great, but should again be fully transparent to the simulation. The physics simulation should not need to care about how it is parallelized.
* If grids of different sizes are considered, the number of neighbors could vary even more: a large 32x32 grid of low activity could be connected to two 16x16 grids with high activity on its left side. The boundary condition would connect the upper half of the 32-cell ghost row with one 16x16 grid, and the lower half with the other 16x16 grid (see the sketch after this list).
* In each direction the boundary could be some physical boundary (hardwall, thermal, ...) or some "topological" boundary (periodic boundary, neighboring boundary, boundary to a different CPU, ...). Grids and boundaries could be moved for dynamical load balancing.
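As a sketch of how such boundaries might be represented (all names here, `BoundaryKind`, `Boundary`, and the segment tuples, are hypothetical illustrations rather than existing code):

```python
from enum import Enum, auto

class BoundaryKind(Enum):
    HARDWALL = auto()   # physical: particles reflect off the wall
    THERMAL = auto()    # physical: thermal wall
    PERIODIC = auto()   # topological: wraps around to the opposite edge
    NEIGHBOR = auto()   # topological: adjacent grid in the same memory
    REMOTE = auto()     # topological: grid owned by a different CPU

class Boundary:
    """One edge of a grid.  A 32-cell edge may be covered by several
    segments, e.g. two 16-cell NEIGHBOR segments pointing at two
    different 16x16 grids."""
    def __init__(self, segments):
        # segments: list of (kind, length_in_cells, target) tuples
        # that together cover the whole edge
        self.segments = segments

# Left edge of a 32x32 grid connected to two 16x16 high-activity grids:
left = Boundary([(BoundaryKind.NEIGHBOR, 16, "grid_A"),
                 (BoundaryKind.NEIGHBOR, 16, "grid_B")])
```

Splitting an edge into segments would keep each grid ignorant of how many neighbors sit behind it, which is exactly the transparency asked for above.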
Considering all these different possibilities (none of which we will implement right away, but they are worth keeping in mind as possible extensions), I suggest the following strategy: ...
Further Strategy
The design of good boundary and grid abstractions should now be of the highest importance. I also think that we (more I than we :)) should start with the GD scenario across several CPUs, as it is the hardest part and would require the most design changes to the current code base.
To simplify things, I suggest we start with the following restriction on the simulation grid: the grid can only have dimensions of 2^n (e.g. 2x8, 16x16, 64x128). This way we can easily create equal subgrids and do not have to come up with complicated algorithms for splitting the grid. Of course, in the end one node might then compute several subgrids, depending on the number of nodes available.
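A minimal sketch of what such a split could look like, assuming both dimensions are powers of two (`split_power_of_two_grid` is a hypothetical helper, not part of the current code):

```python
def split_power_of_two_grid(nx: int, ny: int, n_nodes: int):
    """Split an nx x ny grid (both powers of two) into equal subgrids by
    repeatedly halving the longer dimension until there are at least
    n_nodes pieces.  Returns the subgrid shape and the number of pieces."""
    sx, sy = nx, ny
    pieces = 1
    while pieces < n_nodes:
        if sx >= sy:
            sx //= 2
        else:
            sy //= 2
        pieces *= 2
    return (sx, sy), pieces

# e.g. a 64x128 grid on 6 nodes yields eight 32x32 subgrids:
print(split_power_of_two_grid(64, 128, 6))   # ((32, 32), 8)
```

With 6 nodes and 8 subgrids, two of the nodes would compute two subgrids each, matching the remark above that one node might compute several subgrids.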
That's an acceptable restriction, although I don't really see its necessity at this stage. OK, you probably want to avoid one node calculating a 2x5 subgrid while another calculates a 3x5 subgrid, so that one node doesn't have to wait for the other all the time. But apart from the wasted idle time, it shouldn't be a conceptual problem to allow for arbitrary sizes, should it?
Before we plunge into the GD problem, we should first fix all the bugs mentioned in the other discussion thread / pull request, so that we have a good single-threaded reference simulation. We could tag that and publish it as version 0.4. Jan, could you have another close look at the refactored code and see if you can help Kirill get all tests running? Don't hesitate to ask here if there is something you don't understand. If Kirill is happy with the changes, we can push out version 0.4 and start working on 0.5 - code name "GD" :-)
Let me know if you think I'm drifting too far, and whether we should rather concentrate on a small, well-defined problem first and maybe generalize later :-)
The starting point of the idea was that we already have two kinds of "things" that we need to parallelize: grids and particles.
They share some features:
- there is some region of interest, and neighboring regions.
- you want to abstract the region boundaries so that a region and its neighbors behave like a virtual larger region, which is what the physics classes get to see.
- regions need to be connected and data shared (either in shared memory or across CPUs)
- access to regions has to be synchronized, so that the simulation stays in sync and there are no race conditions
So there may be some classes and methods that are common to both "things", grids and particles. If there are, one could generalize this common subset from two "things" to many "things".
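For example, the common subset might look something like this (a rough sketch; `Region`, `exchange_boundaries`, and `synchronize` are hypothetical names, not proposals for the final class layout):

```python
from abc import ABC, abstractmethod

class Region(ABC):
    """Common interface for anything decomposed into a region of interest
    plus neighbors: grids and particle sets alike."""

    @abstractmethod
    def exchange_boundaries(self) -> None:
        """Share boundary data with neighboring regions
        (in shared memory or across CPUs)."""

    @abstractmethod
    def synchronize(self) -> None:
        """Wait until all neighbors have reached the same simulation step,
        so there are no race conditions."""

def advance(regions: list[Region]) -> None:
    # The physics code only sees abstract regions; it never learns whether
    # they hold grid cells or particles, or how they are parallelized.
    for r in regions:
        r.exchange_boundaries()
    for r in regions:
        r.synchronize()
```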
Maybe this thought can be useful when designing the class layout, but I agree it is not important right now, and if there really is no common subset, then just drop this idea.