Dear Basilisk users,
Over the past month, I've been working with Prof. Popinet to develop a more efficient method for assigning cells to each processor in a parallel MPI simulation on a quad/octree grid. Currently, Basilisk tries to assign an equal number of cells to each processor, without considering that the computational load may be concentrated in particular areas of the domain. For example, this can happen when much of the computation takes place at the interface of a VOF field or at the boundaries of the domain. This is the problem I set out to solve.
I've attached the preliminary patch to this conversation; it will hopefully be merged into src later. If you'd like, you can try it out and send me any feedback you may have. Below you can find a description of the changes.
The main idea is to evenly redistribute a total "load" rather than the number of cells. This information is held in the (const) scalar field balance_weights, which by default is unity. If you do not specify anything, the balancing algorithm automatically defaults to the current behaviour, i.e., each processor gets the same number of cells. To activate the modified version, you have two options:
1. You specify the balance_weights field yourself by allocating the scalar field, something like:
    event init (i = 0) {
      balance_weights = new scalar;
    }
Then you can assign any value that is proportional to what you think your computational effort is. It can come from your intuition or from something you measure yourself, for example:
    foreach()
      balance_weights[] = f[]*20. + 1.;
The only constraint is that balance_weights[] >= 0. everywhere, with at least one cell having a nonzero value. Values do not have to be integers and the field does not need to be normalized. If the total weight is 0, the balancer simply does nothing and leaves the current distribution unchanged.
2. You enable automatic detection of the weights through the compilation flag CFLAGS+=-DLB_AUTO=1. This makes Basilisk try to minimize the communication/synchronization time spent exchanging information between processors. Note that this approach is based on the timing of Basilisk's MPI wrappers (like mpi_all_reduce). Any MPI communication you do outside these wrappers is not counted, so heavy use of "raw" MPI calls in your own code may make the estimate less accurate.
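To illustrate the main idea of balancing a total load rather than a number of cells, here is a rough sketch in plain C. It is not the code from the patch: the function name partition_by_weight and the array first are hypothetical, and the sketch simply assumes that cells are already ordered along a space-filling curve and that each process receives one contiguous chunk of that ordering.

```c
#include <assert.h>

/* Hypothetical helper (not part of Basilisk or of the patch).
   Splits n cells, assumed ordered along a space-filling curve, into np
   contiguous chunks of roughly equal total weight.
   On return, rank p owns cells first[p] .. first[p+1] - 1, so the
   array first must hold np + 1 entries.
   Returns 0 and leaves first untouched if the total weight is zero,
   mirroring the "do nothing" behaviour described above; returns 1 otherwise. */
int partition_by_weight (const double * w, int n, int np, int * first)
{
  assert (n >= 0 && np > 0);

  double total = 0.;
  for (int i = 0; i < n; i++) {
    assert (w[i] >= 0.);        /* weights must be non-negative */
    total += w[i];
  }
  if (total <= 0.)
    return 0;                   /* nothing to balance */

  double cum = 0.;              /* weight of cells 0 .. i - 1 */
  int p = 0;
  first[0] = 0;
  for (int i = 0; i < n; i++) {
    /* open a new chunk each time the running sum reaches the next
       multiple of the target load total/np */
    while (p + 1 < np && cum >= (p + 1)*total/np)
      first[++p] = i;
    cum += w[i];
  }
  while (p + 1 < np)            /* any remaining ranks get empty chunks */
    first[++p] = n;
  first[np] = n;
  return 1;
}
```

With eight cells weighted as f[]*20. + 1. for f = {0, 0, 0, 0, 1, 1, 0, 0}, i.e. {1, 1, 1, 1, 21, 21, 1, 1}, and two processes, this split gives loads of 25 and 23 (five and three cells), whereas splitting by cell count would give loads of 4 and 44. With unit weights it falls back to an equal number of cells per process.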
The patch also includes two new test cases (based on src/test/rotate.c) as examples: balance-rotate.c uses option 1 and balance-rotate-auto.c uses option 2.
Note that activating this should have NO IMPACT on the results of your simulations; it should only (hopefully) reduce your computational time.
Again, I appreciate any feedback (or wishes) you may have.
Thanks,
Riccardo Caraccio