Is a call to compress() required after scale()?

vachan potluri
Nov 24, 2019, 8:35:33 AM
to deal.II User Group
Hello,

I am facing a weird problem. At one point in the code, I call PETScWrappers::VectorBase::scale() on a few distributed vectors. Subsequently, I use the assignment operator to copy these vectors into their ghosted versions for parallel communication (a minimal sketch of this pattern is given below the questions). When I launch the code with 2 or 4 processes, it works fine, but with 3 processes the code halts after the scaling operations and before the first assignment. I am limited to 4 processes.
  1. Is a compress() required after scale()? If so, with which VectorOperation as the argument?
  2. Why does this behaviour occur only when 3 processes are launched? Has anyone experienced this before?
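
For illustration, here is a minimal sketch of the pattern in question. All names (owned, relevant, v, mu, v_ghosted) are illustrative and not taken from the actual code; owned and relevant are the locally owned and locally relevant IndexSets.

// sketch only: illustrative names, assuming IndexSets 'owned' and 'relevant' already exist
PETScWrappers::MPI::Vector v, mu, v_ghosted;
v.reinit(owned, MPI_COMM_WORLD);                    // non-ghosted vector
mu.reinit(owned, MPI_COMM_WORLD);                   // non-ghosted scaling factors
v_ghosted.reinit(owned, relevant, MPI_COMM_WORLD);  // ghosted vector

// ... fill v and mu on locally owned entries, followed by one compress() per vector ...

v.scale(mu);     // entry-wise v[i] *= mu[i]
v_ghosted = v;   // copy into the ghosted version to update ghost values
                 // (the step before which the 3-process run halts)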
Thanks

Daniel Arndt
Nov 24, 2019, 10:12:37 AM
to dea...@googlegroups.com
Vachan,

No, there should be no need to call PETScWrappers::VectorBase::compress() after PETScWrappers::VectorBase::scale() (otherwise that would clearly be a bug in deal.II), since
scale() calls VecPointwiseMult, which is already collective. Does adding compress() in fact change anything for you?
Otherwise, it would be good if you could share a minimal example showing the problem.
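
(For reference, here is a rough sketch of the internals referred to above, assuming the usual PETScWrappers member names ('vector' being the wrapped Vec, 'factors' the argument); it is not a verbatim copy of the deal.II source. The point is that scale() reduces to a single collective PETSc call, so all ranks must reach it, but no follow-up compress() is needed.)

// rough sketch of what PETScWrappers::VectorBase::scale(factors) boils down to:
// VecPointwiseMult(w, x, y) computes w[i] = x[i]*y[i] and is collective
// over the vector's communicator.
const PetscErrorCode ierr = VecPointwiseMult(vector, factors, vector);
AssertThrow(ierr == 0, ExcPETScError(ierr));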

Best,
Daniel


vachan potluri
Nov 24, 2019, 10:48:08 PM
to deal.II User Group
Thank you for the reply, Dr. Arndt. Let me give some more detail about the output with and without compress().

The code snippet concerned is the following. For ease of understanding: 'g' stands for global, 'crk' for current RK stage (for RK-type time stepping), 'avar' for auxiliary variables (three shear stresses and two heat fluxes), 'mu' and 'k' are viscosity and conductivity, 'gh_<vector>' is the ghosted version of <vector>, and 'avar_vec' is an array of the auxiliary variable enums.

        // dof-wise multiplication with mu and k
        pcout << "Scaling stresses and heat fluxes\n";
        gcrk_avars[avar::txx].scale(gcrk_mu);
        pcout << "Scaled txx\n";
        gcrk_avars[avar::txy].scale(gcrk_mu);
        pcout << "Scaled txy\n";
        gcrk_avars[avar::tyy].scale(gcrk_mu);
        pcout << "Scaled tyy\n";
        gcrk_avars[avar::qx].scale(gcrk_k);
        pcout << "Scaled qx\n";
        gcrk_avars[avar::qy].scale(gcrk_k);
        pcout << "Scaled qy\n";

        // communicate
        for(avar var: avar_vec){
                pcout << "Compressing " << avar_names[var] << "\n";
                gcrk_avars[var].compress(VectorOperation::insert);
                pcout << "Communicating " << avar_names[var] << "\n";
                gh_gcrk_avars[var] = gcrk_avars[var];
        }
        pcout << "Communicated stresses and heat fluxes\n";

The output without compress() (with the first two lines in the last for loop commented out) was as follows:

Scaling stresses and heat fluxes
Scaled txx
Scaled txy
Scaled tyy
Scaled qx
Scaled qy
Communicating tau_xx
Communicating tau_xy

And then it does not proceed. The output with compress() is as follows:

Scaling stresses and heat fluxes
Scaled txx
Scaled txy
Scaled tyy
Scaled qx
Scaled qy
Compressing tau_xx
Communicating tau_xx
Compressing tau_xy

In both cases, I have verified that execution finishes (doesn't halt) when launched with 2 or 4 processes. I will post a minimal example if I can reproduce this behaviour.

Thanks

vachan potluri
Nov 25, 2019, 1:23:40 AM
to deal.II User Group
I was able to reproduce this behaviour with the following code (attached as main.cc); the CMakeLists file is also attached. The code hangs after printing 'Scaled variable 0'.

Let me mention that I have used a different algorithm to obtain locally relevant dofs, rather than directly using the function from DoFTools. My algorithm is as follows:

Loop over owned interior cells
    Loop over faces
        If the neighbor cell is a ghost cell:
            Add all of the neighbor's dofs on this face to the relevant dofs

With this algorithm, the relevant dofs are not all of the ghost cell's dofs, but only those lying on a subdomain interface. This is implemented in lines 49-69 of the file main.cc. I have verified that this algorithm works correctly on a small mesh, so I don't presume it is wrong.
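
(For comparison, the standard approach mentioned above would be a single call to DoFTools; a sketch is given below. Note that it marks all dofs of ghost cells as relevant, not only those on subdomain interfaces.)

// standard alternative: all dofs of ghost cells become relevant
IndexSet locally_relevant_dofs;
DoFTools::extract_locally_relevant_dofs(dof_handler, locally_relevant_dofs);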

#include <deal.II/dofs/dof_handler.h>
#include <deal.II/dofs/dof_accessor.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/fe/mapping_q1.h>
#include <deal.II/fe/fe_dgq.h>
#include <deal.II/fe/fe_face.h>
#include <deal.II/grid/grid_generator.h>

#include <deal.II/base/conditional_ostream.h>
#include <deal.II/base/utilities.h>
#include <deal.II/base/index_set.h>
#include <deal.II/distributed/tria.h>
#include <deal.II/lac/generic_linear_algebra.h>

#include <array>
#include <vector>

/**
 * See README file for details
 */
using namespace dealii;
namespace LA
{
        using namespace ::LinearAlgebraPETSc;
}

int main(int argc, char **argv)
{
        Utilities::MPI::MPI_InitFinalize mpi_initialization(argc, argv, 1);

        parallel::distributed::Triangulation<2> triang(MPI_COMM_WORLD);
        GridGenerator::hyper_cube(triang);
        triang.refine_global(5);
        const MappingQ1<2> mapping;
        const uint degree = 1;
        FE_DGQ<2> fe(degree);
        FE_FaceQ<2> fe_face(degree);
        const std::array<uint, GeometryInfo<2>::faces_per_cell> face_first_dof{0, degree, 0, (degree+1)*degree};
        const std::array<uint, GeometryInfo<2>::faces_per_cell> face_dof_increment{degree+1, degree+1, 1, 1};
        
        DoFHandler<2> dof_handler(triang);
        dof_handler.distribute_dofs(fe);
        IndexSet locally_owned_dofs = dof_handler.locally_owned_dofs();
        IndexSet locally_relevant_dofs;
        
        locally_relevant_dofs = locally_owned_dofs; // initialise with owned dofs
        uint face_id, face_id_neighbor, i; // face ids wrt owner and neighbor
        std::vector<uint> dof_ids_neighbor(fe.dofs_per_cell);
        for(auto &cell: dof_handler.active_cell_iterators()){
                if(!(cell->is_locally_owned())) continue;
                for(face_id=0; face_id<GeometryInfo<2>::faces_per_cell; face_id++){
                        if(cell->face(face_id)->at_boundary()) continue;
                        if(cell->neighbor(face_id)->is_ghost()){
                                // current face lies at subdomain interface
                                // add dofs on this face (wrt neighbor) to locally relevant dofs
                                cell->neighbor(face_id)->get_dof_indices(dof_ids_neighbor);

                                face_id_neighbor = cell->neighbor_of_neighbor(face_id);
                                for(i=0; i<fe_face.dofs_per_face; i++){
                                        locally_relevant_dofs.add_index(
                                                dof_ids_neighbor[
                                                face_first_dof[face_id_neighbor] +
                                                i*face_dof_increment[face_id_neighbor]
                                                ]
                                        );
                                } // loop over face dofs
                        }
                } // loop over internal faces
        } // loop over locally owned cells
        
        ConditionalOStream pcout(std::cout, (Utilities::MPI::this_mpi_process(MPI_COMM_WORLD)==0));
        
        pcout << "Yosh\n";
        
        std::array<LA::MPI::Vector, 4> vecs;
        std::array<LA::MPI::Vector, 4> gh_vecs;
        LA::MPI::Vector scaler;
        for(uint var=0; var<4; var++){
                vecs[var].reinit(locally_owned_dofs, MPI_COMM_WORLD);
                gh_vecs[var].reinit(locally_owned_dofs, locally_relevant_dofs, MPI_COMM_WORLD);
        }
        scaler.reinit(locally_owned_dofs, MPI_COMM_WORLD);
        
        for(uint i: locally_owned_dofs){
                scaler[i] = 1.0*i;
        }
        std::vector<uint> dof_ids(fe.dofs_per_cell);
        
        // setting ops
        for(auto &cell: dof_handler.active_cell_iterators()){
                if(!(cell->is_locally_owned())) continue;
                
                cell->get_dof_indices(dof_ids);
                
                pcout << "\tCell " << cell->index() << "\n";
                pcout << "\t\tSetting\n";
                for(uint var=0; var<4; var++){
                        for(uint i: dof_ids){
                                vecs[var][i] = 1.0*i;
                        }
                        vecs[var].compress(VectorOperation::insert);
                }
                
                // addition ops
                pcout << "\t\tAdding\n";
                for(uint var=0; var<4; var++){
                        for(uint i: dof_ids){
                                vecs[var][i] += 1.0*i;
                        }
                        vecs[var].compress(VectorOperation::add);
                }
                
                // more ops
                pcout << "\t\tMore additions\n";
                for(uint var=0; var<4; var++){
                        for(uint i: dof_ids){
                                vecs[var][i] += -5.0*i;
                        }
                        vecs[var].compress(VectorOperation::add);
                }
        } // loop over owned cells
        // scaling and communicating
        pcout << "Scaling and communicating\n";
        for(uint var=0; var<4; var++){
                vecs[var].scale(scaler);
                pcout << "Scaled variable " << var << "\n";
                gh_vecs[var] = vecs[var];
                pcout << "Communicated variable " << var << "\n";
        }
        pcout << "Completed all\n";
        
        return 0;
}
Attachments: CMakeLists.txt, main.cc

Matthias Maier
Nov 25, 2019, 10:21:30 AM
to dea...@googlegroups.com

On Mon, Nov 25, 2019, at 00:23 CST, vachan potluri <vachanpo...@gmail.com> wrote:

> I was able to reproduce this behaviour with the following code (also
> attached); the CMakeLists file is also attached. The code hangs after
> printing 'Scaled variable 0'.

The problem is the communication pattern:

 90     for(auto &cell: dof_handler.active_cell_iterators()){
 91             if(!(cell->is_locally_owned())) continue;
 92
 93             cell->get_dof_indices(dof_ids);
 97             for(uint var=0; var<4; var++){
101                     vecs[var].compress(VectorOperation::insert);
102             }
106             for(uint var=0; var<4; var++){
110                     vecs[var].compress(VectorOperation::add);
111             }
115             for(uint var=0; var<4; var++){
119                     vecs[var].compress(VectorOperation::add);
120             }
121     } // loop over owned cells

The call to compress() is a collective operation that requires all MPI
ranks to participate. But here you are calling it inside a loop over the
active cells of a distributed triangulation, so each rank issues one set of
compress() calls per locally owned cell. If the ranks do not own exactly the
same number of cells (which is what happens when the cells cannot be divided
evenly among three processes), they execute different numbers of collective
calls and the program will hang...

All your MPI communication should look like this:

for (auto &cell: dof_handler.active_cell_iterators()) {
  // do something
} /* end of for loop */

[...].compress(...) // communicate
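
Applied to the reproducer above, one way to restructure along these lines would be the following sketch (showing only the first, 'setting' block; the two addition blocks would each get their own full cell loop followed by their own compress() calls):

// setting ops: purely local writes inside the cell loop
for(auto &cell: dof_handler.active_cell_iterators()){
        if(!(cell->is_locally_owned())) continue;
        cell->get_dof_indices(dof_ids);
        for(uint var=0; var<4; var++){
                for(uint i: dof_ids){
                        vecs[var][i] = 1.0*i;
                }
        }
} // loop over owned cells

// one collective compress() per vector, reached by every rank exactly once
for(uint var=0; var<4; var++){
        vecs[var].compress(VectorOperation::insert);
}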

Best,
Matthias