MPI, synchronize processes


Uclus Heis

17.02.2022, 04:01:32
to deal.II User Group
Dear all, 

I am copying a distributed vector using the following function:

Vector<Number> &Vector<Number>::operator=(const PETScWrappers::VectorBase &v)

In the documentation it is written:
"Note that due to the communication model used in MPI, this operation can only succeed if all processes do it at the same time when v is a distributed vector: It is not possible for only one process to obtain a copy of a parallel vector while the other jobs do something else."

I would like to ask how I can synchronize all the processes so that they call this assignment operator at the same time. Is there a barrier function or similar?

Thank you

Bruno Turcksin

17.02.2022, 08:57:50
to deal.II User Group
Hello,

You don't need to synchronize anything: the function has an MPI_Barrier. That is the reason for the comment. If one processor calls the function but the others don't, the code will hang.

Best,

Bruno
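
For illustration, a minimal sketch of the collective copy described above (the name v is a placeholder for the distributed vector; header paths assume a recent deal.II release):

#include <deal.II/lac/petsc_vector.h>
#include <deal.II/lac/vector.h>

using namespace dealii;

// 'v' stands for a PETScWrappers::MPI::Vector shared across all ranks.
void copy_to_each_rank(const PETScWrappers::MPI::Vector &v)
{
  // Every rank must execute this copy. It is a collective operation over v's
  // communicator, so no extra MPI_Barrier is needed; if one rank skips it,
  // the other ranks hang inside the copy.
  Vector<double> local_copy(v.size());
  local_copy = v;
}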

Uclus Heis

17.02.2022, 11:22:42
to deal.II User Group
Dear Bruno, 

I still had problems: I first copy the array and then store it in a matrix for different frequencies. The result I got when using a few processes was different from the result with a single process. I added the following code and now it works; is it right?

{
  static Utilities::MPI::CollectiveMutex mutex;
  Utilities::MPI::CollectiveMutex::ScopedLock lock(mutex, mpi_communicator);

  tmparray.operator=(locally_relevant_solution);
  for (unsigned int j = 0; j < dof_handler.n_dofs(); ++j)
    sol_matrix(freq_iter, j) = tmparray(j);
}

Thank you

Wolfgang Bangerth

17.02.2022, 13:02:07
to dea...@googlegroups.com
On 2/17/22 09:22, Uclus Heis wrote:
>
> I still had problems: I first copy the array and then store it in a matrix
> for different frequencies. The result I got when using a few processes was
> different from the result with a single process. I added the following code
> and now it works; is it right?

It copies a vector into a row of a matrix. Whether that's what you want is a
different question, so we can't tell you whether it's "right" :-)

You can simplify this by saying
tmparray = locally_relevant_solution;

Best
W.


--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/
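
Putting Wolfgang's simplification into the snippet above, the block could be reduced to the following sketch (the names tmparray, sol_matrix, freq_iter, and dof_handler are taken from that snippet; whether the CollectiveMutex is needed at all depends on the rest of the program):

{
  // The plain assignment calls the same operator= and is itself collective.
  tmparray = locally_relevant_solution;

  for (unsigned int j = 0; j < dof_handler.n_dofs(); ++j)
    sol_matrix(freq_iter, j) = tmparray(j);
}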

Uclus Heis

19.08.2022, 05:25:20
to deal.II User Group
Dear all, 

After some time I have come back to this problem. I would kindly ask for some guidance to see if I can understand and solve the issue.
I am using a parallel::distributed::Triangulation with MPI. I call the function solve() in a loop over different frequencies and want to export the solution of the whole domain for each frequency.
The code looks like:

for (int i = f0; i < fend; ++i)
  {
    ....
    solve(); // solve one frequency

    testvec = locally_relevant_solution; // extract the solution

    // DataOut
    DataOut<dim> data_out;
    data_out.attach_dof_handler(dof_handler);

    std::string   f_output("sol_" + std::to_string(i) + ".txt");
    std::ofstream outloop(f_output);
    testvec.print(outloop, 9, true, false); // save a txt file with the solution for one frequency
  }

Is the way of extracting and exporting the solution with testvec = locally_relevant_solution bad practice? I am saving the locally relevant solution from many different processes in one single file for a given frequency. I am afraid that there is no synchronization between the processes and that the results will be saved without following the right order of the DoFs (which I need). Is this statement correct?
In that case, what would be a better way to export my domain for each frequency?

Another issue that I found is that this approach dramatically increases the computational time of the run() function. For a particular case, solving the domain takes 1 h without exporting it, while it takes 8 h when the previous piece of code is added to export the domain. Is this because the print function is slow, or is there some synchronization going on when calling testvec = locally_relevant_solution?

I would really appreciate it if you could clarify this and guide me to solve the issue.
Thank you very much
Regards

Wolfgang Bangerth

19.08.2022, 12:57:05
to dea...@googlegroups.com
On 8/19/22 03:25, Uclus Heis wrote:
>
> Is the way of extracting and exporting the solution with
> testvec = locally_relevant_solution bad practice? I am saving the
> locally relevant solution from many different processes in one single file
> for a given frequency. I am afraid that there is no synchronization between
> the processes and that the results will be saved without following the right
> order of the DoFs (which I need). Is this statement correct?

Assuming that testvec is a vector that has all elements stored on the current
process, then the assignment
testvec = locally_relevant_solution;
synchronizes among all processes.

That said, from your code, it looks like all processes are opening the same
file and writing to it. Nothing good will come of this. There is of course
also the issue that importing all vector elements to one process cannot scale
to large numbers of processes.


> Another issue that I found is that this approach dramatically increases the
> computational time of the run() function. For a particular case, solving the
> domain takes 1 h without exporting it, while it takes 8 h when the previous
> piece of code is added to export the domain. Is this because the print
> function is slow, or is there some synchronization going on when calling
> testvec = locally_relevant_solution?

You cannot tell which part of a code is expensive unless you actually time
it. Take a look at the TimerOutput class used in step-40, for example, and at
how you can use it to time individual code blocks.
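
For reference, a minimal sketch of the step-40 timing pattern (requires deal.II/base/timer.h; pcout, outloop, and the section names are placeholders taken from or added to the snippets above):

TimerOutput computing_timer(mpi_communicator, pcout,
                            TimerOutput::summary, TimerOutput::wall_times);

{
  TimerOutput::Scope t(computing_timer, "solve");
  solve(); // solve one frequency
}
{
  TimerOutput::Scope t(computing_timer, "output");
  testvec = locally_relevant_solution;
  testvec.print(outloop, 9, true, false);
}
// The summary printed when computing_timer goes out of scope shows which
// block actually dominates the run time.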

Uclus Heis

19.08.2022, 16:25:50
to deal.II User Group
Dear Wolfgang, 

Thank you very much for your answer. Regarding what you mentioned:

"That said, from your code, it looks like all processes are opening the same
file and writing to it. Nothing good will come of this. There is of course
also the issue that importing all vector elements to one process cannot scale
to large numbers of processes."

What would you suggest for exporting the whole domain to a text file when running many processes?
A possible solution that I can think of is to export, for each frequency (loop iteration), one file per process. In addition, I would need to export (print) the locally_owned_dofs (IndexSet) to reconstruct the whole-domain solution in an external environment. How could I solve the issue of importing all vector elements to one process?

Thank you
Regards
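
One way to write out the owned index set per process, next to a per-process solution file, could look like this sketch (the file name and the use of IndexSet::write() are assumptions, not something discussed in the thread):

const unsigned int rank =
  Utilities::MPI::this_mpi_process(mpi_communicator);

// One index file per process; IndexSet::read() can parse this format again.
std::ofstream index_out("owned_dofs_rank" + std::to_string(rank) + ".txt");
locally_owned_dofs.write(index_out);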

Wolfgang Bangerth

19.08.2022, 17:21:57
to dea...@googlegroups.com
On 8/19/22 14:25, Uclus Heis wrote:
>
> "/That said, from your code, it looks like all processes are opening the same/
> /file and writing to it. Nothing good will come of this. There is of course
> also the issue that importing all vector elements to one process cannot scale
> to large numbers of processes."/
> /
> /
> What would you suggest to export in a text file the whole domain when running
> many processes ?
> A possible solution that I can think is to export for each frequency (loop
> iteration) a file per process. In addition, I would need to export (print) the
> locally_owned_dofs (IndexSet) to construct in an external environment the
> whole domain solution. How could I solve the issue of //importing all vector
> elements to one process ?

When you combine things into one file, you will always end up with a very
large file if you are running on 1000 processes. Where and how you do the
combining is secondary; the size of the resulting file is the same either way.
So: if the file is of manageable size, you can do the combining in a deal.II
program as you are already doing right now. If the file is no longer
manageable, it doesn't matter whether you try to combine it in a deal.II-based
program or later on; it is not manageable one way or the other.

Uclus Heis

21.08.2022, 06:29:23
to deal.II User Group
Dear Wolfgang, 

Thank you for the clarifications.

I am now trying to export one file per process (and per frequency) to avoid the issue that I had (previously mentioned). However, what I get is a vector with all the DoFs instead of only the locally owned DoFs.

My solver function is 
PETScWrappers::MPI::Vector completely_distributed_solution(locally_owned_dofs,
                                                           mpi_communicator);
SolverControl cn;
PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
solver.solve(system_matrix, completely_distributed_solution, system_rhs);
constraints.distribute(completely_distributed_solution);
locally_relevant_solution = completely_distributed_solution;

while the exporting is the same as mentioned before, except that the label of the corresponding process is added to each file name:
testvec = locally_relevant_solution;
testvec.print(outloop, 9, true, false);

It is clear that the problem I have now is that I am exporting the completely_distributed_solution, and that is not what I want.
Could you please tell me how to obtain the locally owned part of the solution? I cannot find a way to obtain it.

Thank you
Regards

Wolfgang Bangerth

22.08.2022, 11:51:54
to dea...@googlegroups.com
On 8/21/22 04:29, Uclus Heis wrote:
>
> testvec.print(outloop, 9, true, false);
>
> It is clear that the problem I have now is that I am exporting the
> completely_distributed_solution, and that is not what I want.
> Could you please tell me how to obtain the locally owned part of the
> solution? I cannot find a way to obtain it.

I don't know what data type you use for testvec, but it seems like this
vector is not aware of the partitioning and as a consequence it just
outputs everything it knows. You need to write the loop yourself, as in
something along the lines of
  for (auto i : locally_owned_dofs)
    outloop << testvec(i);
or similar.
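
A slightly more complete sketch of that loop, placed inside the frequency loop from the earlier message (the file naming scheme and the formatting flags are assumptions; needs <fstream> and <iomanip>):

const unsigned int rank =
  Utilities::MPI::this_mpi_process(mpi_communicator);

// One file per frequency and per process, containing only the locally owned
// entries together with their global DoF indices.
std::ofstream outloop("sol_" + std::to_string(i) + "_rank" +
                      std::to_string(rank) + ".txt");
outloop << std::scientific << std::setprecision(9);
for (const auto j : locally_owned_dofs)
  outloop << j << ' ' << testvec(j) << '\n';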

Uclus Heis

22.08.2022, 11:55:48
to dea...@googlegroups.com
Dear Wolfgang,

Thank you very much for the suggestion. 
Would it also be a possible solution to export my testvec as it is right now (which contains the global solution), but instead of exporting from all the processes, call the print function on only one process?

Thank you


Wolfgang Bangerth

22.08.2022, 12:06:13
to dea...@googlegroups.com
On 8/22/22 09:55, Uclus Heis wrote:
> Would it also be a possible solution to export my testvec as it is right
> now (which contains the global solution), but instead of exporting from
> all the processes, call the print function on only one process?

Yes. But that runs again into the same issue mentioned before: If you
have a large number of processes (say, 1000), then you have one process
doing a lot of work (1000x as much as necessary) and 999 doing nothing.
This is bound to take a long time.
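
For completeness, the single-writer variant discussed here might look like the following sketch (assuming, as in the earlier snippets, that testvec holds the full solution on every rank; the assignment itself must still be executed by all ranks because it is collective):

testvec = locally_relevant_solution; // collective: every rank participates

// Only rank 0 writes the file; the remaining ranks skip the I/O.
if (Utilities::MPI::this_mpi_process(mpi_communicator) == 0)
  {
    std::ofstream outloop("sol_" + std::to_string(i) + ".txt");
    testvec.print(outloop, 9, true, false);
  }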