foreach(serial) giving different results when run in parallel and serial

Andrew Pollett

unread,

May 3, 2025, 9:01:56 AM5/3/25

to basilisk-fr

Hello all,

I'm trying to populate an array of structs with the coordinates and scalar field values for only grid cells that lie along the axes.

Here's the approach I'm using:

Counting phase: I first use a foreach loop with a reduction to count how many grid cells lie along each axis. This part runs in parallel:

foreach(reduction(+:yCount) reduction(+:xCount)) {
if (x < smallestGridSize() && x >= -smallestGridSize()) yCount++;
if (y < smallestGridSize() && y >= -smallestGridSize()) xCount++;
}

smallestGridSize() is a macro function that gives the size of smallest possible value of x, using the level of refinement and the domain size.

Allocation and population: I allocate arrays (xValues and yValues) of the appropriate size, and then use another foreach loop to populate them. Since writing to shared resources in parallel would cause issues, I use foreach(serial) for this second loop:

twoElemStruct yValues[yCount];
twoElemStruct xValues[xCount];
xIndex = 0;
yIndex = 0;
foreach(serial) {
if (y < smallestGridSize() && y >= -smallestGridSize()) {
xValues[xIndex] = (twoElemStruct){.coord = x, .f = f[]};
xIndex++;
}
if (x < smallestGridSize() && x >= -smallestGridSize()) {
yValues[yIndex] = (twoElemStruct){.coord = y, .f = f[]};
yIndex++;
}
}

twoElemStruct is a struct that holds two float values (coord and f).

The problem: I get inconsistent results between parallel and serial runs of this second loop — even when everything else stays the same. That suggests something unexpected is happening with the foreach(serial) iterator.

My questions:

Does foreach(serial) guarantee that only one process executes the loop?
Is it possible that the second serial loop begins before all parallel processes have completed in the first loop?
Should I use any MPI_Barrier() to ensure all the processes finish before starting the serial part?
More generally, how does Basilisk handle process synchronization in this context when using MPI?

Any insights on using foreach(serial) properly with MPI would be greatly appreciated.

Thanks in advance,

Andrew

Andrew Pollett

unread,

May 6, 2025, 2:37:15 AM5/6/25

to basilisk-fr

Hi all,

I have made a simple code (attached) that reproduces the issue I'm experiencing. In serial I compile with:

qcc -O2 -Wall foreachTest.c -o serial.out -lm

In parallel, I compile with:

CC99='mpicc -std=c99' qcc -O2 -Wall -D_MPI=1 foreachTest.c -o parallel.out -lm

And run with:

mpirun -np 4 ./parallel.out

Andrew

foreachTest.c

Andrés Castillo-Castellanos

unread,

May 6, 2025, 2:54:55 AM5/6/25

to basilisk-fr

Hi Andrew,

Will have to check to be sure. But it seems that your desired output depends on the order in which we loop over the domain.

I think there is also a misunderstading about what foreach(serial) does. Loops are independent, and the xValues, yValues in each PID will have different data because it has looped a different subdomain.

Maybe calculate_offsets() in http://basilisk.fr/sandbox/acastillo/output_fields/vtkhdf/output_vtkhdf_helpers.h can be adapted to what you are looking for.

Cheers,

Andrés Castillo-Castellanos

unread,

May 6, 2025, 3:00:54 AM5/6/25

to basilisk-fr

Also, you should probably try using foreach_region() instead (http://basilisk.fr/Basilisk%20C#foreach_region)

Stephane Popinet

unread,

May 6, 2025, 7:51:13 AM5/6/25

to basil...@googlegroups.com

Hi Andrew,

> * Does foreach(serial) guarantee that only one process executes
> the loop?

Not "process", "thread" i.e. "serial" guarantees that only one thread
executes the loop (typically when using OpenMP). MPI processes are
independent from one another, and so each MPI process will execute the
loop in parallel (whether "serial" is specified or not).

> * Is it possible that the second serial loop begins before all

> parallel processes have completed in the first loop?

If by "first loop" you mean the loop containing the reduction()
operations, then no, the second loop will only be executed on each
process after the results from the first loop have been "reduced".

> * Should I use any MPI_Barrier() to ensure all the processes

> finish before starting the serial part?

No, see previous answer.

> * More generally, how does Basilisk handle process synchronization

> in this context when using MPI?

Synchronisation is done by reduction() (as above) and/or by boundary
conditions. For example:

scalar a[], b[];
foreach()
a[] = x*y;

// synchronisation is necessary here since boundary conditions
// must be applied on a[] before computing b[]

foreach()
b[] = (a[1] - a[-1])/Delta;

> Any insights on using foreach(serial) properly with MPI would be
> greatly appreciated.

As mentioned above, foreach(serial) is irrelevant for MPI.

hope this helps,

Stephane