Segfault when calling gather_evaluate() or read_dof_values_plain()

45 views
Skip to first unread message

Maxi Miller

unread,
Jan 13, 2020, 10:41:32 AM1/13/20
to deal.II User Group
I tried to implement a RK4-solver similar to the upcoming example 67 using the matrix-free technique, and compare it to other solvers I wrote earlier. For this purpose I chose the classical heat-equation for testing. When running the attached example, though, I get a segfault when calling either gather_evaluate() or read_dof_values_plain(). By replacing the code from read_dof_values_plain() with the code from the include file, I could narrow the problem down to (presumably) the operation read_write_operation(). When commenting this function out, the code continues until evaluate(), when not, it segfaults.
Therefore, I assume that I have some bugs in my code, but am not experienced enough in writing matrix-free code for being able to debug it myself. What kind of approach could I do now, or is there simply a glaring bug which I did not see yet?
Thanks!
main.cpp

Wolfgang Bangerth

unread,
Jan 13, 2020, 3:41:54 PM1/13/20
to dea...@googlegroups.com
On 1/13/20 8:41 AM, 'Maxi Miller' via deal.II User Group wrote:
> Therefore, I assume that I have some bugs in my code, but am not experienced
> enough in writing matrix-free code for being able to debug it myself. What
> kind of approach could I do now, or is there simply a glaring bug which I did
> not see yet?

I haven't even looked at the code yet, but for segfaults there is a relatively
simple cure: Run your code in a debugger. A segmentation fault is a problem
where some piece of code tries to access data at an address (typically through
a pointer) that is not readable or writable. The general solution to finding
out what the issue is is to run the code in a debugger -- the debugger will
then stop at the place where the problem happens, and you can inspect all of
the values of the pointers and variables that are live at that location. Once
you see what variable is at fault, the problem is either already obvious, or
you can start to think about how a pointer got the value it has and why.

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Maxi Miller

unread,
Jan 14, 2020, 4:49:16 AM1/14/20
to deal.II User Group
I could narrow it down to the function
 
 // same as above for has_partitioners_are_compatible == true
 
template <
   
typename VectorType,
   
typename std::enable_if<has_partitioners_are_compatible<VectorType>::value,
                           
VectorType>::type * = nullptr>
 
inline void
  check_vector_compatibility
(
   
const VectorType &                            vec,
   
const internal::MatrixFreeFunctions::DoFInfo &dof_info)
 
{
   
(void)vec;
   
(void)dof_info;
   
Assert(vec.partitioners_are_compatible(*dof_info.vector_partitioner),
           
ExcMessage(
             
"The parallel layout of the given vector is not "
             
"compatible with the parallel partitioning in MatrixFree. "
             
"Use MatrixFree::initialize_dof_vector to get a "
             
"compatible vector."));
 
}

located in vector_access_internal.h. Apparently I have a segfault in the function
   
 template <typename Number, typename MemorySpaceType>
   
inline bool
   
Vector<Number, MemorySpaceType>::partitioners_are_compatible(
     
const Utilities::MPI::Partitioner &part) const
   
{
     
return partitioner->is_compatible(part);
   
}

located in la_parallel_vector.templates.h, and thereby directly stopping the program without triggering the assert()-function. The direct gdb output for this function is
#3  0x00007ffff00c0c5c in dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>::partitioners_are_compatible (this=0x0, part=...) at ~Downloads/git-files/dealii/include/deal.II/lac/la_parallel_vector.templates.h:2021

I am not sure what that means. Could it point to a nullpointer somewhere (after this=0x0)? Though, when putting a breakpoint to the first function and printing the involved variables there, I get
(gdb) print dof_info.vector_partitioner
$2
= std::shared_ptr<const dealii::Utilities::MPI::Partitioner> (use count 3, weak count 0) = {get() = 0x7a60c0}
(gdb) print vec.partitioner
$3
= std::shared_ptr<const dealii::Utilities::MPI::Partitioner> (use count 3, weak count 0) = {get() = 0x7a60c0}

i.e. no null ptr. Could the problem be there, or somewhere else?
Thanks!

Martin Kronbichler

unread,
Jan 14, 2020, 6:43:40 AM1/14/20
to dea...@googlegroups.com

Dear Maxi,

It looks like you are using a scalar finite element (FE_Q<dim>), but you set FEEvaluation to operate on a vectorial field, i.e.,

        FEEvaluation<dim, degree, n_points_1d, dim, Number> phi(data);

Notice the 4th template parameter "dim", which should be one. I agree it is unfortunate that we do not provide a better error message, so I opened an issue for it, https://github.com/dealii/dealii/issues/9312

If you switch the parameter to 1, it should work. However, I want to point out that you use a continuous element with CellwiseInverseMassMatrix - this will not give you a valid matrix inverse. You need to either use a diagonal mass matrix (see step-48) or use the cellwise inverse as a preconditioner (when combined with suitable scaling) and solve the mass matrix iteratively.

Best,
Martin

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/d55e53b2-a0c8-4cc0-ab84-7521f8b23d81%40googlegroups.com.

Maxi Miller

unread,
Jan 14, 2020, 8:57:10 AM1/14/20
to deal.II User Group
For approximating the inverse mass matrix I followed step-48 with some modifications:
In the reinit-subroutine I added
        data.initialize_dof_vector(inv_mass_matrix);
        FEEvaluation<dim, degree, n_points_1d, 1, Number> fe_eval(data);
        const unsigned int           n_q_points = fe_eval.n_q_points;
        for (unsigned int cell = 0; cell < data.n_cell_batches(); ++cell)
        {
            fe_eval.reinit(cell);
            for (unsigned int q = 0; q < n_q_points; ++q)
                fe_eval.submit_value(make_vectorized_array(1.), q);
            fe_eval.integrate(true, false);
            fe_eval.distribute_local_to_global(inv_mass_matrix);
        }
        inv_mass_matrix.compress(VectorOperation::add);
        for (unsigned int k = 0; k < inv_mass_matrix.local_size(); ++k)
            if (inv_mass_matrix.local_element(k) > 1e-15)
                inv_mass_matrix.local_element(k) =
                        1. / inv_mass_matrix.local_element(k);
            else
                inv_mass_matrix.local_element(k) = 1;

and for applying the matrix I wrote
    template <int dim, int degree, int n_points_1d>
    void LaplaceOperator<dim, degree, n_points_1d>::local_apply_inverse_mass_matrix(
            const MatrixFree<dim, Number> &                   data,
            LinearAlgebra::distributed::Vector<Number> &      dst,
            const LinearAlgebra::distributed::Vector<Number> &src,
            const std::pair<unsigned int, unsigned int> &     cell_range) const
    {
        dst.reinit(src);
        dst = src;
        dst.scale(inv_mass_matrix);
    }
so that I could reuse the cell_loops from the original code. Still, the results indicate that something goes wrong. Should I handle the inverse matrix somehow different?
Thanks!
To unsubscribe from this group and stop receiving emails from it, send an email to dea...@googlegroups.com.
main.cpp
Reply all
Reply to author
Forward
0 new messages