Access specific element within a distributed triangulation

Seyed Ali Mohseni

unread,
Feb 24, 2017, 5:12:59 AM2/24/17
to deal.II User Group
Hi,

Is it possible to access a specific element within a parallel distributed triangulation, for instance by means of the following procedure:

typename DoFHandler<dim>::active_cell_iterator cell = dof_handler.begin_active(),
                                               endc = dof_handler.end();

for (; cell != endc; ++cell)
{
  if (cell->is_locally_owned())
  {
    std::cout << "CELL NUMBER 4 VERTEX 1: " << cell(4)->vertex(0) << std::endl;
  }
}

The above won't compile.

How can I access, for example, element number 4 in order to obtain the coordinates of point 1 within that element?

Kind regards,
S. A. Mohseni

Jean-Paul Pelteret

unread,
Feb 24, 2017, 8:01:35 AM2/24/17
to deal.II User Group
Dear Seyed,

There are no cell "numbers" in deal.II, only CellIds that give each cell a unique identifier. So it's unclear to me exactly what you're trying to achieve here.

Are you after the fourth cell iterator for the entire triangulation, or the fourth locally owned cell? Either way, you need to increment your cell iterator the required number of times, noting that, depending on how you do this, you may not end up with a locally owned cell. You can advance iterators by a given number of steps using std::advance, and you may wish to consider using FilteredIterators with the IteratorFilters::LocallyOwnedCell predicate to ensure that you always end up with a cell that is valid on the given process.
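
To make that concrete, here is a rough sketch (the "4" and the variable names are only illustrative, and dof_handler and dim are assumed to be in scope as in your snippet):

#include <deal.II/grid/filtered_iterator.h>

// Advance to the "fourth" active cell of this process's view of the mesh.
// Note that there is no globally agreed-upon fourth cell in a
// parallel::distributed::Triangulation; this only shows the mechanics.
typename DoFHandler<dim>::active_cell_iterator cell = dof_handler.begin_active(),
                                               endc = dof_handler.end();
for (unsigned int i = 0; i < 4 && cell != endc; ++i)
  ++cell; // std::advance(cell, 4) does the same if you know you stay in range

if (cell != endc && cell->is_locally_owned())
  std::cout << "Vertex 0 of this cell: " << cell->vertex(0) << std::endl;

// Alternatively, restrict the iteration to locally owned cells from the start;
// incrementing this iterator automatically skips cells owned by other processes:
FilteredIterator<typename DoFHandler<dim>::active_cell_iterator>
  owned_cell(IteratorFilters::LocallyOwnedCell(), dof_handler.begin_active());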

I hope that this helps,
Jean-Paul

Timo Heister

unread,
Feb 24, 2017, 8:29:53 AM2/24/17
to dea...@googlegroups.com

Seyed Ali Mohseni

unread,
Feb 24, 2017, 9:45:16 AM2/24/17
to deal.II User Group

Thanks for your suggestions.


Generally, in FEM you number elements like in the below picture: 


I understand, of course, that the cells (elements) in deal.II are distributed in parallel, but it should be possible to identify the geometric position and ID of the elements and their corresponding nodes.


For instance the following works:


typename parallel::distributed::Triangulation<dim>::active_cell_iterator
  cell = triangulation.begin_active(),
  endc = triangulation.end();
for (; cell != endc; ++cell)
{
  std::cout << "CELL NUMBER: " << cell->id() << std::endl;
}



Of course the locally owned check is missing here, but is it correct to access the cells like this?


I am a bit confused since it is a rather easy task which should not be such a big "deal" in deal.II.


Best regards,

S. A. Mohseni



Jean-Paul Pelteret

unread,
Feb 24, 2017, 10:11:43 AM2/24/17
to dea...@googlegroups.com
Hi Seyed,

Nope, like I said, there's no such thing as a cell number in deal.II (at least in the sense that a cell knows its own "id" number). The cell->id() function returns a CellId object but it does not provide an index/counter in the sense that you think it does. One possible reason for this is that (refined) deal.II triangulations are stored in levels, and the numbering would have to be rebuilt each time you change the mesh. This becomes tricky for distributed triangulations where the actual mesh stored on each processor is different. Also, does one want to count inactive cells or not? Sometimes yes, sometimes no.

So if, for example, I want to store the material ID for each cell in a vector, I would have to do it like this:
Vector<double> material_id;
material_id.reinit(triangulation.n_active_cells());

unsigned int c = 0;
typename DoFHandler<dim>::active_cell_iterator
  cell = dof_handler.begin_active(),
  endc = dof_handler.end();
for (; cell != endc; ++cell, ++c)
{
  material_id(c) = static_cast<int>(cell->material_id());
}


Notice that I create and increment the cell counter "c". Notice as well that there is no presupposition about the order in which deal.II traverses its cells.

So depending on what you're trying to achieve, you probably want to implement a variation of the above. Picking out the cell that you want to query is probably best achieved using a geometric query or by pre-marking it (e.g. by setting its user_index).
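
To illustrate the pre-marking idea, here is a sketch (the marker value 42, the point_of_interest variable and the criterion for picking the cell are purely hypothetical):

// Mark the cell of interest once, e.g. right after mesh generation/refinement.
// User indices live in the triangulation, so they are visible through both
// triangulation and DoFHandler cell iterators.
Point<dim> point_of_interest; // fill with the coordinates you care about
for (const auto &cell : triangulation.active_cell_iterators())
  if (cell->is_locally_owned() && cell->point_inside(point_of_interest))
    cell->set_user_index(42); // 42 is an arbitrary, purely illustrative marker

// Later, pick the marked cell out again on the process that owns it:
for (const auto &cell : dof_handler.active_cell_iterators())
  if (cell->is_locally_owned() && cell->user_index() == 42)
    std::cout << "Vertex 0: " << cell->vertex(0) << std::endl;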

Best,
Jean-Paul

Wolfgang Bangerth

unread,
Feb 24, 2017, 10:51:11 AM2/24/17
to dea...@googlegroups.com

Seyed,

> Generally, in FEM you number elements like in the below picture:
>
> <https://lh3.googleusercontent.com/-0n-HlF1EsRs/WLBEmN34aaI/AAAAAAAAAGU/GlK9R9OqDJ8FSTLKGXeaI85eU6pdV3MkwCLcB/s1600/System.png>

No, that is patently wrong. First, you are only considering rectangular
domains subdivided into rectangular cells. Second, you think that
because there is a "simple" numbering, that is what codes "should do".
Both of these assumptions are wrong. How, for example, should a
code number these cells:

https://raw.githubusercontent.com/dealii/dealii/master/doc/doxygen/images/distributed_mesh_1.png

https://raw.githubusercontent.com/dealii/dealii/master/doc/doxygen/images/hyper_shell_96_cut.png

Starting a sentence with "Generally, in FEM you number elements..." is
generally wrong.


> I am a bit confused since it is a rather easy task which should not be
> such a big "deal" in deal.II.

Because it's *not* easy. It's only easy if you consider simple situations
like yours. But if you want to do complicated things, they generally
turn out to be complicated.

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Seyed Ali Mohseni

unread,
Feb 26, 2017, 9:35:58 AM2/26/17
to deal.II User Group
Thank you everyone. I understand your suggestions. What I have learned now about deal.II is that each cell is owned by a specific processor, which means I cannot access the information within a locally owned cell. That's why I have to loop over all cells and work with distributed data without using locally owned entities.

Maybe I should describe my aim more precisely. I would like to find the maximum value within an MPI vector such as locally_relevant_solution in step-40. Hence, I have to loop over all cells and all nodes to find the geometric position and value of the maximum displacement. How is this possible in a parallel distributed environment?

My approach using local_dof_indices is somehow impossible. It only works for 1 core according to the below code:

std::vector<double> displacement_norms, loaded_nodes, loaded_elements;
double max_displacement;
int max_loaded_node, max_loaded_element;

typename DoFHandler<dim>::active_cell_iterator cell = dof_handler.begin_active(), endc = dof_handler.end();
std::vector<unsigned int> local_dof_indices(fe.dofs_per_cell);
unsigned int cell_number = 0;

for (; cell != endc; ++cell, ++cell_number)
{
  pcout << "CELL ID: " << cell_number << std::endl; // this works surprisingly.

  cell->get_dof_indices(local_dof_indices); // Output or accessing this won't work I assume.

  for (unsigned int i = 0, j = 1; i < fe.dofs_per_cell; i += 2, j += 2)
  {
    double displacement_x = locally_relevant_solution(local_dof_indices[i]);
    double displacement_y = locally_relevant_solution(local_dof_indices[j]);

    double displacement_norm = sqrt(displacement_x * displacement_x + displacement_y * displacement_y);

    loaded_nodes.push_back(cell->vertex_index(v));
    loaded_elements.push_back(cell_number);
    displacement_norms.push_back(displacement_norm);
  }
}

max_displacement = *max_element(displacement_norms.begin(), displacement_norms.end());

double max_index = distance(displacement_norms.begin(), max_element(displacement_norms.begin(), displacement_norms.end()));

// Node and element number of maximal displacements
max_loaded_node = loaded_nodes[max_index];
max_loaded_element = loaded_elements[max_index];


Hopefully someone has an idea to help me out. Thank you :)

Jean-Paul Pelteret

unread,
Feb 26, 2017, 10:25:52 AM2/26/17
to deal.II User Group
Hi Seyed,

If you want the infinity norm of a deal.II vector, then there's already a function for that. You can find the equivalent function for the specific MPI vector class that you're using in its documentation.
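
For example, assuming a step-40-like setup where completely_distributed_solution is the non-ghosted solution vector (the name is simply the one step-40 uses; norms are typically taken on the fully distributed vector rather than on a ghosted one like locally_relevant_solution):

// Sketch: the infinity norm of the distributed solution vector, i.e. the
// largest absolute value over all degrees of freedom on all processes.
const double max_abs_dof_value = completely_distributed_solution.linfty_norm();
pcout << "max |u_i| = " << max_abs_dof_value << std::endl;

// Note: this is the maximum over individual solution components, not the
// maximum of sqrt(u_x^2 + u_y^2) at a node; for the latter you still need a
// cell loop like yours, followed by a reduction across processes.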

Regards,
Jean-Paul

Wolfgang Bangerth

unread,
Feb 26, 2017, 10:47:18 AM2/26/17
to dea...@googlegroups.com
On 02/26/2017 07:35 AM, 'Seyed Ali Mohseni' via deal.II User Group wrote:
> Thank you everyone. I understand your suggestions. What I have learned now
> about deal.II is that each cell is owned by a specific processor, which means
> I cannot access the information within a locally owned cell. That's why I
> have to loop over all cells and work with distributed data without using
> locally owned entities.

Seyed -- no, that's exactly the opposite of what you can do. "Locally owned"
means precisely that this is the kind of data the current processor has access
to. You cannot access data that is *not locally owned* (with the exception of
ghost elements).


> My approach using local_dof_indices is somehow impossible. It only works for 1
> core according to the below code:
>
> |
> std::vector<double> displacement_norms, loaded_nodes, loaded_elements;
> double max_displacement;
> int max_loaded_node, max_loaded_element;
>
> typename DoFHandler<dim>::active_cell_iterator cell =
> dof_handler.begin_active(), endc = dof_handler.end();
> std::vector<unsigned int> local_dof_indices(fe.dofs_per_cell);
> unsigned int cell_number = 0;
>
> for (; cell != endc; ++cell, ++cell_number)
> {
> pcout << "CELL ID: " << cell_number << std::endl; // this works
> surprisingly.
>
> cell->get_dof_indices(local_dof_indices); // Output or accessing this
> won't work I assume.

It totally should. But it only will if cell->is_locally_owned()==true. You do
not have this information for cells that are not locally owned.

>
> for (unsigned int i = 0, j = 1; i < fe.dofs_per_cell; i += 2, j += 2)
> {
> double displacement_x = locally_relevant_solution(local_dof_indices[i]);
> double displacement_y = locally_relevant_solution(local_dof_indices[j]);
>
> double displacement_norm = sqrt(displacement_x * displacement_x +
> displacement_y * displacement_y);
>
> loaded_nodes.push_back(cell->vertex_index(v));
> loaded_elements.push_back(cell_number);
> displacement_norms.push_back(displacement_norm);
> }
> }
>
> max_displacement = *max_element(displacement_norms.begin(),
> displacement_norms.end());
>
> double max_index = distance(displacement_norms.begin(),
> max_element(displacement_norms.begin(), displacement_norms.end()));
>
> // Node and element number of maximal displacements
> max_loaded_node = loaded_nodes[max_index];
> max_loaded_element = loaded_elements[max_index];
> |

At this point, with the
if (cell->is_locally_owned())
statement added to the code above, every processor has now computed the
maximal displacement among all nodes it knows about. You then need to find the
maximum among all processors, for example using Utilities::MPI::max().
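
In code, that last step could look like this (a sketch; max_displacement is the per-process maximum from your loop, and mpi_communicator is the communicator your triangulation was built with):

#include <deal.II/base/mpi.h>

// Every process passes in its local maximum and gets back the global one.
const double global_max_displacement =
  Utilities::MPI::max(max_displacement, mpi_communicator);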

Seyed Ali Mohseni

unread,
Feb 26, 2017, 11:38:59 AM2/26/17
to deal.II User Group
Seyed -- no, that's exactly the opposite of what you can do. "Locally owned" 
means precisely that this is the kind of data the current processor has access 
to. You cannot access data that is *not locally owned* (with the exception of 
ghost elements). 

@ Prof. Bangerth: Wow, I totally wrote something other than what I meant. Of course, exactly as you said, I only know the information about cells I own.

@ Jean-Paul: Thank you. I will try that out.

Seyed Ali Mohseni

unread,
Mar 2, 2017, 5:04:21 AM3/2/17
to deal.II User Group
Dear Prof. Bangerth and Jean-Paul,

What I understood so far is that the Utilities::MPI::max() function computes the maximum across all processors for each entry of my vector. This means that if I have a vector of 20 entries, each entry has 8 values distributed over 8 cores, for instance. As a result, I get 20 maximum values again, but this time they are maximal and processor independent.

Hence, my question now is how am I able to check where the maximum value comes from? Is it possible to output the processor number of the maximum value for each row?
My aim is to find the node where the maximum value exists. This is somehow cumbersome in parallel mode since I have to store information about the current CPU core, cell ID and node number. Then somehow use my previously shown distance function in C++ and check the position of the maximum value with regard to the geometry structure I stored.

Is there a more elegant way in deal.II to solve my problem than what I suggested?

Thank you for your kind assistance so far :)
And hopefully you will overlook my rather silly and questionable way of approaching things.

Kind regards,
Seyed Ali

Daniel Arndt

unread,
Mar 2, 2017, 6:19:40 AM3/2/17
to deal.II User Group
Seyed,


What I understood so far is that the Utilities::MPI::max() function computes the maximum across all processors for each entry of my vector. This means that if I have a vector of 20 entries, each entry has 8 values distributed over 8 cores, for instance. As a result, I get 20 maximum values again, but this time they are maximal and processor independent.
Correct.
 
Hence, my question now is how am I able to check where the maximum value comes from? Is it possible to output the processor number of the maximum value for each row?
If you look at the implementation of Utilities::MPI::max() [2], you see that this is essentially a wrapper around MPI_Allreduce called in [3].
What you want is currently not implemented in deal.II, but MPI_MAXLOC does exactly this. Example 3 in [1] shows how to use it (a short sketch follows below the list), i.e.
 - copy your std::vector into an array of structs, each containing a value and the MPI process number
 - call MPI_Allreduce(in_struct, out_struct, vector.size(), MPI_DOUBLE_INT, MPI_MAXLOC, mpi_communicator);
 - for each entry of your vector, the maximum value and the rank of the first process holding that value are stored in out_struct
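
A minimal sketch of these three steps (displacement_norms and mpi_communicator are placeholders for whatever you already have, and every process is assumed to pass a vector of the same length):

#include <deal.II/base/mpi.h>
#include <mpi.h>
#include <vector>

// MPI_DOUBLE_INT corresponds to a struct of one double followed by one int,
// so the layout below has to match exactly.
struct ValueRank
{
  double value;
  int    rank;
};

// Step 1: copy the local values and attach this process's rank.
std::vector<ValueRank> in(displacement_norms.size()), out(displacement_norms.size());
const int my_rank = static_cast<int>(Utilities::MPI::this_mpi_process(mpi_communicator));
for (unsigned int i = 0; i < displacement_norms.size(); ++i)
  {
    in[i].value = displacement_norms[i];
    in[i].rank  = my_rank;
  }

// Step 2: element-wise maximum, keeping the rank that owns each maximum.
MPI_Allreduce(in.data(), out.data(), static_cast<int>(in.size()),
              MPI_DOUBLE_INT, MPI_MAXLOC, mpi_communicator);

// Step 3: out[i].value is the global maximum of entry i,
//         out[i].rank  is the lowest rank that holds that value.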

My aim is to find the node where the maximum value exists. This is somehow cumbersome in parallel mode since I have to store information about the current CPU core, cell ID and node number. Then somehow use my previously shown distance function in C++ and check the position of the maximum value with regard to the geometry structure I stored.
If you save the cells/positions containing the maximum values for each process before comparing these across processes, you should be fine with this approach.
 
Best,
Daniel

Seyed Ali Mohseni

unread,
Mar 2, 2017, 11:51:32 AM3/2/17
to deal.II User Group
Dear Daniel,

Thank you. Does your approach also apply to MPICH? Since I am using MPICH, I wonder, if this command can be used there also.
This struct variable, is there an example within deal.II steps? Or do you have an example in C++ I can learn from. 

To be honest I am not yet very familiar with MPI itself, barely understanding deal.II parallelism ;)

I have also another idea: How about I find the maximum value regardless of the position and processor core. Then just loop again over all cells and compare the maximum value with the existing value within each cell. Would this be a very inefficient approach or is looping over cells usually computationally cheap in deal.II.

Kind regards,
Seyed Ali 

Daniel Arndt

unread,
Mar 2, 2017, 1:39:48 PM3/2/17
to deal.II User Group
Seyed,


Thank you. Does your approach also apply to MPICH? Since I am using MPICH, I wonder, if this command can be used there also.
This struct variable, is there an example within deal.II steps? Or do you have an example in C++ I can learn from. 
Since MPICH is (at least) MPI-1.1 conforming, you should be able to use that approach. MPI supports Fortran and C. Therefore, the example I referenced before is probably quite close to what you can get within C++.
 
To be honest I am not yet very familiar with MPI itself, barely understanding deal.II parallelism ;)
It's certainly hard to understand MPI in deal.II without knowing how MPI in principle works ;-).

I have also another idea: How about I find the maximum value regardless of the position and processor core. Then just loop again over all cells and compare the maximum value with the existing value within each cell. Would this be a very inefficient approach or is looping over cells usually computationally cheap in deal.II.
What you can do instead is to first find the maximum value on each process, then compute the maximum of these values across all processes, and finally compare the local maximum with the global maximum. There is no reason to loop through all cells again.
This should not be much more expensive than using MPI_MAXLOC directly.
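
As a sketch (local_max and max_location stand for whatever each process found in its own cell loop):

// Reduce the per-process maxima to a single global maximum.
const double global_max = Utilities::MPI::max(local_max, mpi_communicator);

// Only the process(es) whose local maximum equals the global one report the
// location. The comparison is exact because the reduction returns one of the
// inputs unchanged; a tie simply means several processes report.
if (local_max == global_max)
  std::cout << "Rank " << Utilities::MPI::this_mpi_process(mpi_communicator)
            << " has the maximum at " << max_location << std::endl;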

Best,
Daniel

Seyed Ali Mohseni

unread,
Mar 3, 2017, 11:14:58 AM3/3/17
to deal.II User Group
Dear Daniel,

Thanks a lot. I understand now.
Example 3 is exactly what I need; I will try the MPI_MAXLOC approach.

But is it necessary to #include <mpi.h> ?
Since deal.II won't recognize functions such as MPI_Comm_rank(MPI_COMM_WORLD, &myrank) for instance or MPI_Reduce.
Is it implemented in Utilities::MPI::something ? 

Kind regards,
Seyed Ali

Daniel Arndt

unread,
Mar 5, 2017, 7:38:20 AM3/5/17
to deal.II User Group
Seyed,
 
But is it necessary to #include <mpi.h> ?
Since deal.II won't recognize functions such as MPI_Comm_rank(MPI_COMM_WORLD, &myrank) for instance or MPI_Reduce.
Is it implemented in Utilities::MPI::something ? 
If you are using a parallel Triangulation, you automatically include "<mpi.h>". You can be sure that the compiler will tell you if you need to include that file explicitly.

Best,
Daniel

Seyed Ali Mohseni

unread,
Mar 7, 2017, 7:31:07 AM3/7/17
to deal.II User Group
Dear Daniel,

Thanks a lot. I tried the MPI_MAXLOC approach and it works.
Unfortunately, though, there is a slight problem I haven't been able to fully understand yet.

int disp_size = triangulation.n_vertices();

struct
{
  double value;
  int rank;
} in[disp_size], out[disp_size];

typename DoFHandler<dim>::active_cell_iterator cell = dof_handler.begin_active(), endc = dof_handler.end();
std::vector<unsigned int> local_dof_indices(fe.dofs_per_cell);
unsigned int cell_number = 0;

for (; cell != endc; ++cell, ++cell_number)
{
  if ( cell->is_locally_owned() )
  {
    cell->get_dof_indices(local_dof_indices);

    for (unsigned int i = 0, j = 1; i < fe.dofs_per_cell; i += 2, j += 2)
    {
      unsigned int v = i / dim;

      double displacement_x = locally_relevant_solution(local_dof_indices[i]);
      double displacement_y = locally_relevant_solution(local_dof_indices[j]);

      double displacement_norm = sqrt(displacement_x * displacement_x + displacement_y * displacement_y);

      in[cell->vertex_index(v)].value = displacement_norm;
      in[cell->vertex_index(v)].rank = Utilities::MPI::this_mpi_process(mpi_com);
    }
  }
}

MPI_Reduce(in, out, disp_size, MPI_DOUBLE_INT, MPI_MAXLOC, 0, mpi_com);

std::vector<double> maximum_disp_norms(disp_size);
std::vector<int> disp_ranks(disp_size), disp_nodes(disp_size);
double max_displacement;
int max_index, max_rank, max_node_id;

if ( Utilities::MPI::this_mpi_process(mpi_com) == 0 )
{
  for (int i = 0; i < disp_size; ++i)
  {
    maximum_disp_norms[i] = out[i].value;
    disp_ranks[i] = out[i].rank;
    disp_nodes[i] = i;

    if ( out[i].value > 1e10 )
    {
      maximum_disp_norms[i] = 0;
      disp_ranks[i] = 0;
    }
  }

  // Maximum norm of displacements
  max_displacement = *max_element(maximum_disp_norms.begin(), maximum_disp_norms.end());
  max_index = distance(maximum_disp_norms.begin(), max_element(maximum_disp_norms.begin(), maximum_disp_norms.end()));
  max_rank = disp_ranks[max_index];
  max_node_id = disp_nodes[max_index];
}

// Maximum displacement position
cell = dof_handler.begin_active(), endc = dof_handler.end();
cell_number = 0;

for (; cell != endc; ++cell, ++cell_number)
{
  if ( cell->is_locally_owned() )
  {
    // This is just testwise
    if ( Utilities::MPI::this_mpi_process(mpi_com) == max_rank )
    {
      std::cout << Utilities::MPI::this_mpi_process(mpi_com) << std::endl;
    }

    for (int v = 0; v < GeometryInfo<dim>::vertices_per_cell; ++v)
    {
      if ( cell->vertex_index(v) == max_node_id )
      {
        // std::cout << "VERTEX ID: " << cell->vertex_index(v) << " WITH COORDINATES: " << cell->vertex(v) << std::endl;
      }
    }
  }
}

Now the funny part is, if I set max_rank manually, such as max_rank = 3 for instance, it works and for the currently owned rank I receive output in the terminal. Another thing is that MPI_Reduce creates some big values somewhere, which is why I fixed it by erasing values greater than 1e10. Finding the maximum value is placed inside the if ( Utilities::MPI::this_mpi_process(mpi_com) == 0 ) part, so I can check whether it depends on the processor I compute it for. Surprisingly, this works only in the first step; afterwards, for higher loading steps, I receive no output. If I use MPI_Reduce for the 3rd processor, such as MPI_Reduce(in, out, disp_size, MPI_DOUBLE_INT, MPI_MAXLOC, 3, mpi_com);, and then check for max_rank again, it works. I am a bit confused...

What could be the error here?


Kind regards,
Seyed Ali


Wolfgang Bangerth

unread,
Mar 7, 2017, 8:42:39 AM3/7/17
to dea...@googlegroups.com
On 03/07/2017 05:31 AM, 'Seyed Ali Mohseni' via deal.II User Group wrote:
> Now the funny part is, if I set max_rank manually, such as max_rank = 3 for
> instance, it works and for the currently owned rank I receive output in the
> terminal. Another thing is that MPI_Reduce creates some big values somewhere,
> which is why I fixed it by erasing values greater than 1e10.

Seyed -- you need to learn to debug these things. Just ignoring values greater
than 1e10 means that you don't understand why these values are there -- but
then how can you be sure that the *other* values are correct?

You need to learn strategies to figure these things out. Run the program in a
debugger. Print the values that you send to MPI_Reduce and compare what that
function returns with what you *expect* it to return, etc. You cannot write
software that does what it is supposed to do if you don't understand what it
computes. Learning strategies to debug software is the only way you can learn
to write good software.

Best
WB

Seyed Ali Mohseni

unread,
Mar 7, 2017, 9:33:31 AM3/7/17
to deal.II User Group
Dear Prof. Bangerth,

MPI_Reduce computes exactly what I want. I checked the values and they are correct. The value higher than 1e10 is probably a MPI bug while trying to find the maximum value. 
But this problem has nothing to do with my max_rank problem.

I thought maybe you are MPI experts and could give me a hint.

For instance this MPI_MAXLOC idea was a great suggestion from Daniel. I also wonder why you haven't included it in deal.II yet. 

Debugging MPI problems is quite cumbersome, you know that yourself. Setting up such a debugger environment is described in one of your video tutorials, but it takes more time I assume than solving this small problem.
I agree, if I want to dive deeply into software design I should improve my MPI skills, but I am merely trying to use tools you offer me in deal.II.
As an inventor of deal.II, I think it should be in your interest to help your users find a way solving their problems in your software.

Until now, I haven't yet felt the benefits of deal.II since the programming effort is so high it doesn't make up for the computational speed you would gain, "if" you are able to achieve your goals.
Hence, telling your interested PhD student to go and "learn debugging" is a bit disappointing.

Kind regards,
Seyed Ali 

Seyed Ali Mohseni

unread,
Mar 7, 2017, 9:46:39 AM3/7/17
to deal.II User Group
I think I figured it out ;)
After thinking about your suggestion, Prof. Bangerth, it came to my mind that the result of my max_rank variable is stored on a specific processor due to MPI_Reduce.
That means I cannot access these data without currently being on the corresponding processor rank.
For instance, I store my results on rank 0; if I want to check the max_rank data on processor rank 3, it is empty.
That is why I cannot get any output, because when rank 3 owns everything, output works fine.

Now my question is: how can I store the data on all processors?
Or how am I able to at least store my max_rank variable on all processors?

BR,
Seyed Ali 

Wolfgang Bangerth

unread,
Mar 7, 2017, 10:02:44 AM3/7/17
to dea...@googlegroups.com
On 03/07/2017 07:46 AM, 'Seyed Ali Mohseni' via deal.II User Group wrote:
>
> Now my question is: how can I store the data on all processors?
> Or how am I able to at least store my max_rank variable on all processors?

The function you're looking for is called MPI_Allreduce.
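
In your code above that is essentially a one-line change (a sketch, reusing your in/out arrays and mpi_com):

// Unlike MPI_Reduce, there is no root argument: every process receives the
// element-wise maxima and the owning ranks, so max_rank is known everywhere.
MPI_Allreduce(in, out, disp_size, MPI_DOUBLE_INT, MPI_MAXLOC, mpi_com);
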
Best
W.

Wolfgang Bangerth

unread,
Mar 7, 2017, 10:39:06 AM3/7/17
to dea...@googlegroups.com

Seyed,

> MPI_Reduce computes exactly what I want. I checked the values and they are
> correct. The value higher than 1e10 is probably a MPI bug while trying to
> find the maximum value. But this problem has nothing to do with my max_rank
> problem.
>
> I thought maybe you are MPI experts and could give me a hint.
>
> For instance this MPI_MAXLOC idea was a great suggestion from Daniel. I
> also wonder why you haven't included it in deal.II yet.
>
> Debugging MPI problems is quite cumbersome, you know that yourself. Setting
> up such a debugger environment is described in one of your video tutorials,
> but it takes more time I assume than solving this small problem. I agree,
> if I want to dive deeply into software design I should improve my MPI
> skills, but I am merely trying to use tools you offer me in deal.II. As an
> inventor of deal.II, I think it should be in your interest to help your
> users find a way solving their problems in your software.

Let me call you out here: It is, without a doubt, in our interest to help our
users. And we do every day: We have on the order of maybe a thousand questions
per year on the mailing list, and a group of half a dozen of us answer the
vast majority of those. I am certain that over the past 17 years, I have
written more than 10,000 emails to the mailing lists, and many others have
written a large number as well. To claim that we do not try to be helpful is
factually not correct.

That said, there is a limit to what we can do on a mailing list. We can not,
for example, debug every user's programs if they do not work. We have too few
volunteers for this, and it is also not something our employers pay us for.
You have received significant help on this mailing list already in the past
few weeks, but I don't think you can expect that we continue to debug your
programs indefinitely. So what we try to do -- and what I tried to do in my
previous mail -- is to teach *skills* so that users learn to debug their own
programs.

That is what I tried to do in my last email. It may not have been worded
particularly well, and I apologize if it came over as rude, but I will try
again. Let me state again what I said:

> Seyed -- you need to learn to debug these things. Just ignoring values
> greater than 1e10 means that you don't understand why these values are
> there -- but then how can you be sure that the *other* values are correct?
>
> You need to learn strategies to figure these things out. Run the program in
> a debugger. Print the values that you send to MPI_Reduce and compare what
> that function returns with what you *expect* it to return, etc. You cannot
> write software that does what it is supposed to do if you don't understand
> what it computes. Learning strategies to debug software is the only way you
> can learn to write good software.

What I mean here is this:

1/ If your program gives you values of 1e10, and you don't understand where
they are coming from, then you cannot be sure that the rest is correct. You
can, of course, ignore everything that is bigger than 1e9, but how do you know
that that is the correct threshold? You claim that this is a bug in MPI --
something I am 99.99% sure is wrong -- but assuming that that is the case,
what do you do if MPI gives you wrong values that are 1e8? What if they are
1e2 or 3.141 and in the middle of all of the values that are correct? How will
you filter out the wrong values?

The point I'm making is that you are in the *wrong mindset* if you think you
can ignore values that are wrong and that you don't understand where they are
from. You can say it's rude of me to point that out, but I have been
programming since 1987 and I have learned a few things since then. One of them
is that if there are wrong values, you *need to find out where they come
from*. So what I was trying to say in that statement above is to *teach you a
skill*, namely to not ignore wrong values but to follow their trail and find
out what is happening.

2/ I think I actually was being specific in how to debug things, and didn't
just say "you need to debug". I did say, for example, that you ought to print
out the things you send to MPI_Reduce, and print the things you get back, and
make sure these are all correct, on all processors. You can do this in a
debugger, or just via printf. My choice for these things is to use a debugger
(there is a video lecture on how to do this with MPI). Of course it takes a
bit of time to set it up once, but that is again a skill that you will need
one way or the other at one point as your programs become more complicated.

Seyed Ali Mohseni

unread,
Mar 7, 2017, 10:55:47 AM3/7/17
to deal.II User Group
Dear all,

With MPI_Allreduce it works like a charm. Thank you very much everyone.

Especially Prof. Bangerth and Daniel.

Daniel Arndt

unread,
Mar 7, 2017, 11:54:40 AM3/7/17
to deal.II User Group
Seyed,

after just a quick glance over your approach, there seem to be some more issues you could easily stumble over:
- n_vertices() only gives you the number of vertices for one process, not the global one. In particular, you can't rely on the fact that this is the same for all processes. Furthermore, you (probably) don't initialize all the values of your struct. This might be the reason for the large numbers you are observing.
- You seem to rely on a specific numbering of the degrees of freedom in FEValues. In particular, you are assuming that you have only nodal values at the cell vertices. This can only be true for Q1 elements. Furthermore, you rely on the fact that after a local dof for the x-component, a local dof for the y-component is stored. This is an unsafe assumption. The proper way of doing this is to use an appropriate Quadrature object that is initialized with support points at the points you are interested in. Have a look at the FAQ [1,2] (a sketch of this approach follows below).
- You seem to first calculate the maximum for each vertex across all the processes and after that the maximum over all vertices. It would probably be easier to first find the maximum value and location on each process and only then compare this one value across all the processes. This would probably also avoid your problems with large numbers.
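
A sketch of that quadrature-based approach (variable names are hypothetical, and a vector-valued Lagrange element is assumed so that unit support points are available):

// Evaluate the solution at the support points of the element on each locally
// owned cell, instead of assuming a particular ordering of the local dofs.
Quadrature<dim> support_quadrature(fe.get_unit_support_points());
FEValues<dim>   fe_values(fe, support_quadrature,
                          update_values | update_quadrature_points);

std::vector<Vector<double>> solution_values(support_quadrature.size(),
                                            Vector<double>(dim));

for (const auto &cell : dof_handler.active_cell_iterators())
  if (cell->is_locally_owned())
    {
      fe_values.reinit(cell);
      fe_values.get_function_values(locally_relevant_solution, solution_values);

      for (unsigned int q = 0; q < support_quadrature.size(); ++q)
        {
          // solution_values[q] holds all vector components at this point,
          // fe_values.quadrature_point(q) gives its mapped position.
          const double norm = solution_values[q].l2_norm();
          // ... keep track of the maximum norm and its position ...
        }
    }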

Otherwise, I can only agree with Wolfgang: learning how to debug is crucial if you ever want to build working software.

Best,
Daniel

[1] https://github.com/dealii/dealii/wiki/Frequently-Asked-Questions#how-do-i-access-values-of-discontinuous-elements-at-vertices
[2] https://github.com/dealii/dealii/wiki/Frequently-Asked-Questions#how-to-get-the-mapped-position-of-support-points-of-my-element

Seyed Ali Mohseni

unread,
Mar 8, 2017, 4:48:32 AM3/8/17
to deal.II User Group
@Prof. Bangerth: I will try to learn how to debug parallel code and also to follow your words of advice. Please do not apologize, since I don't think you were ever rude. I mean, after 10,000 emails I would definitely not be as calm as you. Thank you. :)

@Daniel: Thumbs up. From your words I realize there is still a lot to learn about deal.II. To be honest, I always try to solve things in a simple way first and then extend them to the general case, such as 3D, DG elements, etc. Thank you for all the help.

Extra question: Can we store variables or copy them to all processors? I am filling a variable in a locally owned cell, currently on rank 3. My other ranks, especially the root rank 0, have no clue about the values that are set. Hence, there has to be a possibility to copy data to other processors or let them know about the content?

Kind regards,
Seyed Ali
 

Jean-Paul Pelteret

unread,
Mar 8, 2017, 5:11:05 AM3/8/17
to deal.II User Group
Dear Seyed,
 
Extra question: Can we store variables or copy them to all processors? I am filling a variable in a locally owned cell, currently on rank 3. My other ranks, especially the root rank 0, have no clue about the values that are set. Hence, there has to be a possibility to copy data to other processors or let them know about the content?
 
This falls into the category of standard, but very much nontrivial, MPI questions that are unrelated to deal.II itself. How you would implement the point-to-point communication of data depends very much on, amongst other things, the type of data that is being transferred, and how/to which processes the information is being broadcast and received. 

So I would highly recommend that you take a look at the documentation for the MPI library that you are using and perhaps also go through a couple of tutorials that focus on MPI and its communication models itself. Otherwise, there are examples in the source code and tests that you can use to better understand how you might implement whatever it is that you're trying to do.

Regards,
Jean-Paul

Seyed Ali Mohseni

unread,
Mar 8, 2017, 6:51:53 AM3/8/17
to deal.II User Group
@Daniel:

- n_vertices() only gives you the number of vertices for one process, not the global one. In particular, you can't rely on the fact that this is the same for all processes. Furthermore, you (probably) don't initialize all the values of your struct. This might be the reason for the large numbers you are observing.

Exactly. I fixed the large numbers by setting all entries to 0 initially. I had that in mind as well, but assumed that everything is already initialized the moment we assign the size.

Here is the solution, for whoever may be interested.

int disp_size = triangulation.n_vertices();

struct
{
  double value = 0;
  int rank = 0;
} in[disp_size], out[disp_size];

Surprisingly, n_vertices() gives me exactly the total number of vertices. I checked it. Is that not always the case? And what would be the command to obtain the global one?

@ Jean-Paul: Ah, ok. I assumed there were special MPI_Send-like functions implemented in deal.II that I could use, such as Utilities::MPI::max() for instance.

BR,
Seyed Ali

Seyed Ali Mohseni

unread,
Mar 8, 2017, 9:51:50 AM3/8/17
to deal.II User Group
Dear all,

I was able to solve the MPI variable problem. 

For users who may run into the same issue:

if ( this_mpi_process == max_rank )
{
  for (unsigned int i = 0; i < n_mpi_processes; ++i)
  {
    if ( i != max_rank )
      MPI_Send(&position, 1, MPI_DOUBLE, i, 0, mpi_com);
  }
}
else
{
  MPI_Recv(&position, 1, MPI_DOUBLE, max_rank, 0, mpi_com, MPI_STATUS_IGNORE);
  printf("Process %d received number from process %d\n", this_mpi_process, max_rank);
}

A double variable called position (just as an example here; for x, y, z you would pass 3 variables) can be defined earlier and then has to be filled on the rank equal to max_rank.

Finally, I can proceed. The code was tested with 1 to 16 ranks and works perfectly. Every time, the maximum value and position remain correct and unchanged, but the maximum rank (max_rank) changes depending on the number of cores.

Thank you a lot. 
The community here is extraordinary. ;)

Best regards,
S. A. Mohseni

Bruno Turcksin

unread,
Mar 8, 2017, 10:46:23 AM3/8/17
to deal.II User Group
Seyed,

you probably want to use MPI_Bcast instead of your code (see http://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/)

Best,

Bruno

Seyed Ali Mohseni

unread,
Mar 8, 2017, 2:09:31 PM3/8/17
to deal.II User Group
Dear Bruno,

Thanks. That's even better and even in just a single line of code. ;)

MPI_Bcast(&position, 1, MPI_DOUBLE, max_rank, mpi_com);

Best,
Seyed Ali