As a complement (I realized I should have put this in the first email), here is one of the exceptions thrown by the attached MWE from my first mail:
An error occurred in line <738> of file
</home/gpoy/.local/share/deal-ii-candi/tmp/unpack/deal.II-v9.6.1/include/deal.II/base/partitioner.templates.h>
in function
void
dealii::Utilities::MPI::Partitioner::import_from_ghosted_array_finish(dealii::VectorOperation::values,
const dealii::ArrayView<const ElementType,
MemorySpaceType>&, const
dealii::ArrayView<ElementType, MemorySpace>&, const
dealii::ArrayView<ElementType, MemorySpace>&,
std::vector<ompi_request_t*>&) const [with Number =
double; MemorySpaceType = dealii::MemorySpace::Host]
The violated condition was:
*read_position == Number() ||
internal::get_abs(locally_owned_array[j] - *read_position) <=
internal::get_abs(locally_owned_array[j] + *read_position) *
100000. * std::numeric_limits<typename
numbers::NumberTraits< Number>::real_type>::epsilon()
Additional information:
Called compress(VectorOperation::insert), but the element
received
from a remote processor, value 0.7071067811865476, does not
match with
the value 0 on the owner processor 0
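To make the violated condition concrete, here is a minimal stand-alone sketch of the same comparison. It is not the deal.II source: the function name insert_values_match and its two parameters are hypothetical, with `owned` standing in for locally_owned_array[j] and `received` for *read_position, and Number fixed to double.

```cpp
#include <cmath>
#include <limits>

// Stand-alone restatement of the condition quoted in the assertion.
// Hypothetical helper, not part of deal.II:
//   owned    ~ locally_owned_array[j] (value on the owner process)
//   received ~ *read_position         (value sent by the remote process)
bool insert_values_match(const double owned, const double received)
{
  const double eps = std::numeric_limits<double>::epsilon();

  // A received zero is always accepted; otherwise the two values must
  // agree up to a relative tolerance of 1e5 * machine epsilon.
  return received == 0.0 ||
         std::abs(owned - received) <=
           std::abs(owned + received) * 100000. * eps;
}
```

With the values from the message above, insert_values_match(0.0, 0.7071067811865476) evaluates to false, since the received value is nonzero and the difference is far above the tolerance. Note that the check is not symmetric: a zero received from the remote process is accepted whatever the owner holds.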
I also realized that, in my message below, I misidentified the
vector that fails to compress when there are ghost hanging nodes:
I think it is the output vector "u2" in
ExtrapolateImplementation<dim, spacedim,
OutVector>::extrapolate_parallel.
Best
Guilhem
Hi everyone,
I have continued trying to fix the issue below on my own. First, I discovered that the extrapolate function is thoroughly tested inside the testsuite, so I went there to see how it is used in order to fix my MWE, but I still get the exception. This led me to conclude that triangulations with certain refinement patterns trigger the compress exception, while others do not, including the triangulation that is created in the make_tria function of the file "tests/mpi/fe_tools_extrapolate_common.h" of the deal.II repository. To convince you of this, I have attached a patch that changes the refinement pattern in the aforementioned make_tria function. With the patch applied to the v9.6.1 repository, rerunning the tests "mpi/fe_tools_extrapolate_03***" throws the same exception mentioned in my previous mails for three MPI ranks.
To summarize, I think there is a real bug involved, since one can trigger the exception just by changing the triangulation used in the "mpi/fe_tools_extrapolate*" tests, without touching the interpolation and extrapolation code. The problem is that I still don't understand why some triangulations work and others don't. Clearly, the triggering condition I stated in my previous emails is imperfect: ghost hanging nodes seem to be a sufficient but not necessary condition to throw the exception. For example, the distributed triangulation below on the left triggers the exception, while the one on the right doesn't:


I will continue to investigate by diving into the internal
functions of extrapolate, but I would welcome some help on this
since I am not really familiar with the p4est code.
Best regards,
Guilhem
Dear all,
I think I have pinpointed the problem, which comes from lines
724-733 in deal.II/fe/fe_tools_extrapolate.templates.h: when a
cell which has a more refined ghost neighbour owns a DoF that
lives at the interface between this cell and this refined
neighbour, the associated DoF value will never be set and
therefore stays at 0. However, on the neighbouring process, the
corresponding ghost DoF value will be set, which explains the
compress("insert") exception thrown later due to the inconsistency
between a nonzero ghost value and a zero "owned" value. So
really, the triggering condition for my bug is not that there
are ghost hanging nodes, but that one of the entries in
dofs_on_refined_neighbors is associated with a locally owned DoF
on a given process.
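The mechanism described above can be illustrated with a toy model. This is only a sketch under strong assumptions, not the deal.II implementation: ToyProcess, toy_extrapolate and insert_conflict are hypothetical names, MPI is replaced by two plain structs, and the member on_refined_neighbors is merely a stand-in for the role I believe dofs_on_refined_neighbors plays.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy model, not deal.II code: each "process" holds its own copy of the
// DoF values (all initialised to 0 by the caller).
struct ToyProcess
{
  std::vector<double>   values;               // local view of the DoF values
  std::set<std::size_t> owned;                // indices owned by this process
  std::set<std::size_t> on_refined_neighbors; // DoFs deferred to a finer neighbour
};

// Write `value` into every touched DoF, except those that the process
// skips because they sit on the interface with a more refined neighbour.
void toy_extrapolate(ToyProcess                     &p,
                     const std::vector<std::size_t> &touched,
                     const double                    value)
{
  for (const std::size_t i : touched)
    if (p.on_refined_neighbors.count(i) == 0)
      p.values[i] = value;
}

// True if DoF i is owned by `owner` but its value there differs from the
// ghost copy held by `ghost_holder`, i.e. the situation that the
// compress(insert) check would reject.
bool insert_conflict(const ToyProcess &owner,
                     const ToyProcess &ghost_holder,
                     const std::size_t i)
{
  return owner.owned.count(i) == 1 &&
         owner.values[i] != ghost_holder.values[i];
}
```

For example, take a process p0 that owns DoFs 0 and 1 and lists DoF 1 in on_refined_neighbors, and a process p1 that owns DoF 2 and sees DoF 1 as a ghost. After both call toy_extrapolate with the same nonzero value, p0 still holds 0 for DoF 1 while p1 holds the nonzero ghost value, so insert_conflict(p0, p1, 1) is true. If on_refined_neighbors is empty on p0, the two copies agree.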
I guess the idea behind these lines is that we should favour extrapolation on the side that has more information, i.e. the side with the refined cell. But for this to work, I think one needs to communicate data between processes whenever the relevant DoFs on the refined cell are owned by a different process than the one that owns this refined cell. Or maybe I completely misunderstood the thinking behind this dofs_on_refined_neighbors map?
Best regards,
Guilhem