milonga running mpirun - observations


Vitor Vasconcelos

Jun 13, 2018, 1:58:38 PM
to was...@seamplex.com
Hi all,

I have nothing substantial to share, but I do have some interesting
observations about milonga running under mpirun. (Note: I'm only playing
with diffusion/volumes since Germán told me it is the simplest
formulation/method combination to study. Yes, I'm using milonga as study
material to learn FEM, FVM, numerical methods and PETSc all at once while
thinking about how to parallelize it.)

Since milonga uses PETSc and SLEPc, it is kind of "naturally"
parallelized. The funny part is that when you run milonga under mpirun
you get, as expected, more or less* the same problem run on several
processors. However, the eigenvalue calculation performed by SLEPc only
sees half of the matrix when run on two processors. This makes you get
the value of keff divided by the number of processes.
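
One thing I still want to check is whether each rank really owns a slice of
one global matrix or whether each one ends up building its own smaller
problem (which would explain the divided keff). This is a minimal sketch of
the check I have in mind, assuming the matrix is already assembled; the
function and variable names are mine, not milonga's:

#include <petscmat.h>

/* compare the global size of the matrix with the size of the local block
   owned by each rank; if the "global" size shrinks when more ranks are
   added, each process is building its own smaller problem instead of a
   slice of the same one */
PetscErrorCode check_matrix_sizes(Mat R)
{
  PetscInt    M, N, m, n;
  PetscMPIInt rank;

  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MatGetSize(R, &M, &N);        /* global rows/columns, same on every rank */
  MatGetLocalSize(R, &m, &n);   /* rows/columns owned by this rank */
  PetscPrintf(PETSC_COMM_SELF, "[%d] global %dx%d, local %dx%d\n",
              rank, (int)M, (int)N, (int)m, (int)n);
  return 0;
}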

I'm (slowly, really really slowly) trying to figure out how everything
works together: milonga on top of wasora, and the use of PETSc matrices
and SLEPc, to try to come up with a parallelization strategy. But I have
to say it has been interesting digging into the core of milonga.

I've been making some minor changes to milonga, first of all just to
make my tiny example run without errors on more than one processor
(PETSc compiled in debug mode).

I have to say that right now I'm playing with MPI/PETSc. I don't even
know whether using OpenMP for some parts and relying on PETSc's MPI
functions for others would be a viable solution. For now, I'll be glad if
I manage to really split wasora-milonga from the matrix build/solve into
sequential and parallel parts.

Hope to come up in a reasonable time with answers to these comments
I'm sharing with you.

So far I have nothing, but using GPL code to learn is really cool.
Thanks again, Germán.

Regards (and a great World Cup for those into football),

* "More or less" because petsc matrices can be automatically
distributed among processors, so the output is the same but
calculations - as long as I could see - are not.

** For the curious, I'm attaching two VTK files: one for a simple
example run in sequential mode and a second one run with mpirun -np 2.
Of course, the second one is generated with errors, although it can be
opened with ParaView.
1p.tar.gz
2p.tar.gz

Jeremy Theler

Jun 13, 2018, 3:16:35 PM
to was...@seamplex.com


On Wed, 2018-06-13 at 14:58 -0300, Vitor Vasconcelos wrote:
> Hi all,
>
> I have nothing substantial to share, but I do have some interesting
> observations about milonga running under mpirun. (Note: I'm only playing
> with diffusion/volumes since Germán told me it is the simplest
> formulation/method combination to study. Yes, I'm using milonga as study
> material to learn FEM, FVM, numerical methods and PETSc all at once while
> thinking about how to parallelize it.)

Note that it is not trivial to "parallelize" SLEPc. Try first analyzing this (I have not done it):
http://slepc.upv.es/handson/handson6.html
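
Just so we are on the same page, the eigensolve itself boils down to
something like the following sketch. This is not milonga's actual code:
R and F stand for the removal and fission matrices and are assumed to be
already built and assembled on PETSC_COMM_WORLD.

#include <slepceps.h>

/* sketch: solve F x = keff * R x with SLEPc's EPS object; SLEPc works on
   whatever communicator the matrices live on, so the parallelism comes
   from how R and F were distributed in the first place */
PetscErrorCode solve_keff(Mat R, Mat F, PetscReal *keff)
{
  EPS         eps;
  PetscScalar kr, ki;

  EPSCreate(PETSC_COMM_WORLD, &eps);
  EPSSetOperators(eps, F, R);          /* A = F, B = R  =>  F x = lambda R x */
  EPSSetProblemType(eps, EPS_GNHEP);   /* generalized, non-hermitian */
  EPSSetFromOptions(eps);              /* honor -eps_* command-line options */
  EPSSolve(eps);
  EPSGetEigenpair(eps, 0, &kr, &ki, NULL, NULL);  /* index 0 = first converged pair */
  *keff = PetscRealPart(kr);
  EPSDestroy(&eps);
  return 0;
}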

You can subscribe to PETSc mailing list and ask them there. You will get an answer right away (there is a lot of traffic though).

>
> Since milonga uses PETSc and SLEPc, it is kind of "naturally"
> parallelized. The funny part is that when you run milonga under mpirun
> you get, as expected, more or less* the same problem run on several
> processors. However, the eigenvalue calculation performed by SLEPc only
> sees half of the matrix when run on two processors. This makes you get
> the value of keff divided by the number of processes.

This should not be happening.

>
> I'm (slowly, really really slowly) trying to figure out how everything
> works together: milonga on top of wasora, and the use of PETSc matrices
> and SLEPc, to try to come up with a parallelization strategy. But I have
> to say it has been interesting digging into the core of milonga.

Fire any question you have!

>
> I've been making some minor changes to milonga, first of all just to
> make my tiny example run without errors on more than one processor
> (PETSc compiled in debug mode).

Can we see that repository?

>
> I have to say that right now I'm playing with MPI/PETSc. I don't even
> know whether using OpenMP for some parts and relying on PETSc's MPI
> functions for others would be a viable solution. For now, I'll be glad if
> I manage to really split wasora-milonga from the matrix build/solve into
> sequential and parallel parts.

I think you can use OpenMP (without MPI first) on the loop that builds
the matrices in diffusion volumes. That should not be so hard.
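
Roughly something like this (only a sketch: n_cells and the
cell_coefficient() helper are made up, and since I am not sure MatSetValue
is thread-safe the insertion is serialized inside a critical section):

#include <petscmat.h>
#include <omp.h>

extern PetscScalar cell_coefficient(PetscInt i);   /* hypothetical helper */

/* sketch of an OpenMP-threaded assembly loop for diffusion/volumes;
   the expensive per-cell work runs in parallel, the PETSc insertion does not */
void assemble_diffusion(Mat R, PetscInt n_cells)
{
  PetscInt i;

  #pragma omp parallel for
  for (i = 0; i < n_cells; i++) {
    PetscScalar value = cell_coefficient(i);   /* geometry + cross sections */

    #pragma omp critical
    {
      MatSetValue(R, i, i, value, ADD_VALUES); /* serialized, just in case */
    }
  }

  MatAssemblyBegin(R, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(R, MAT_FINAL_ASSEMBLY);
}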

>
> Hope to come up in a reasonable time with answers to these comments
> I'm sharing with you.
>
> So far I have nothing, but using GPL code to learn is really cool.
> Thanks again, Germán.

This is why I released the code under the GPL, but there are some people that still do not get the point.

>
> Regards (and a great World Cup for those into football),

You know, these two companies that are (somehow) related to all this stuff have offices in
Nizhny Novgorod, which is one of the places where Argentina is playing:

https://cadexchanger.com/
https://www.opencascade.com/

Small world, isn't it?

>
> * "More or less" because petsc matrices can be automatically
> distributed among processors, so the output is the same but
> calculations - as long as I could see - are not.

?

>
> ** For the curious, I'm attaching two VTK files: one for a simple
> example run in sequential mode and a second one run with mpirun -np 2.
> Of course, the second one is generated with errors, although it can be
> opened with ParaView.
>
I would imagine that any kind of output should be performed by only one process, say number zero.
It should gather all the information it needs from all the other ones and dump the VTK.
At least that is what the documentation says:

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscPrintf.html

PETSc has a "viewer" for VTK formats. This should be investigated too:

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Viewer/PetscViewerVTKOpen.html
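
Something along these lines, I guess (an untested sketch; as far as I
understand, the Vec has to come from a DM such as a DMDA or DMPlex so the
viewer knows about the mesh, which I do not think is how milonga keeps the
fluxes today):

#include <petscviewer.h>
#include <petscvec.h>

/* sketch: let PETSc gather and write the distributed vector as VTK;
   phi is assumed to be a global Vec obtained from a DM */
PetscErrorCode write_flux_vtk(Vec phi, const char *filename)
{
  PetscViewer viewer;

  PetscViewerVTKOpen(PETSC_COMM_WORLD, filename, FILE_MODE_WRITE, &viewer);
  VecView(phi, viewer);          /* PETSc handles the parallel gathering */
  PetscViewerDestroy(&viewer);
  return 0;
}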

Another option is to use other formats, which allow for "parallel" schemes.
Keep in mind the word "para" in "paraview":

https://www.paraview.org/Wiki/ParaView:FAQ#What_file_formats_does_ParaView_support.3F


--
jeremy
www.seamplex.com

Vitor Vasconcelos

Jun 14, 2018, 2:03:19 PM
to was...@seamplex.com
> Note that it is not trivial to "parallelize" SLEPc. Try first analyzing this (I have not done it):
> http://slepc.upv.es/handson/handson6.html

I'm not there yet. For now, I'm just checking how to parallelize
the matrix-building workload (or make sure it already works correctly).

Anyway, I was thinking about a really simple approach:
1) make sure the matrices are built in a parallel way.
2) try to solve the linear system (I still need to change my test
case to have a call to milonga_solver_linear_petsc) in a parallel way.
3) work on milonga_solve_eigen_slepc to make it parallel - if possible.

I have to say, the more I dig in, the more complicated the
parallelization looks, which is more or less what I expected.
I'm humble about this: even if an amazingly parallelized version is not
possible, I'm OK if I manage to make milonga faster at, for example,
building large matrices.

Oh, thanks for the hands-on. I'll devote some time to it.

> You can subscribe to PETSc mailing list and ask them there. You will get an answer right away (there is a lot of traffic though).

Thanks. I'll take a look.

> This should not be happening.

Yes, but it is. I'll try to figure out why it is happening and
see if I can fix it.

> Fire any question you have!

Oh, be sure I will. I'm working on them.

>> I've been making some minor changes to milonga, first of all just to
>> make my tiny example run without errors on more than one processor
>> (PETSc compiled in debug mode).
>
> Can we see that repository?

Sure! https://bitbucket.org/vitorvas/par-dev/src/par-dev/

(You'll be disappointed: when I say "minor" I really mean it...)

> I think you can use OpenMP (without MPI first) on the loop that builds
> the matrices in diffusion volumes. That should not be so hard.

There are calls to PETSc's MatSetValue which *may* be a problem,
but I agree this could be an interesting test.

> This is why I released the code under the GPL, but there are some people that still do not get the point.

Probably they never will.

> You know, these two companies that are (somehow) related to all this stuff have offices in
> Nizhny Novgorod, which is one of the places where Argentina is playing:
>
> https://cadexchanger.com/
> https://www.opencascade.com/
>
> Small world, isn't it?

It seems you should be there to check what your competitors are doing
while you watch Argentina win the tournament. ;-)

>> * "More or less" because petsc matrices can be automatically
>> distributed among processors, so the output is the same but
>> calculations - as long as I could see - are not.
>
> ?

Sorry, my text was confusing.

I wanted to say that when you use mpirun, the PETSc matrices are
preallocated with MatMPIAIJSetPreallocation, so the data is somehow
distributed among the processors. I just could not verify how the
matrices are assembled and at which point each processor gets the whole
set of data. I need to study this further.
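
To make sure I got the idea right, this is roughly what I understand
should happen, written as a sketch with my own names and made-up
non-zero estimates (so not milonga's actual code):

#include <petscmat.h>

/* sketch: each rank preallocates and fills only the rows it owns;
   N is the global size, the non-zeros per row are just a guess */
PetscErrorCode build_distributed_matrix(Mat *R, PetscInt N)
{
  PetscInt first, last, i;

  MatCreate(PETSC_COMM_WORLD, R);
  MatSetSizes(*R, PETSC_DECIDE, PETSC_DECIDE, N, N);  /* PETSc splits the rows */
  MatSetType(*R, MATMPIAIJ);
  MatMPIAIJSetPreallocation(*R, 7, NULL, 3, NULL);    /* guessed diagonal/off-diagonal nnz */

  MatGetOwnershipRange(*R, &first, &last);            /* rows [first, last) live on this rank */
  for (i = first; i < last; i++) {
    MatSetValue(*R, i, i, 1.0, INSERT_VALUES);        /* placeholder entries */
  }

  /* entries set for rows owned by other ranks (if any) travel to their
     owners during assembly; only after this is the matrix usable */
  MatAssemblyBegin(*R, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(*R, MAT_FINAL_ASSEMBLY);
  return 0;
}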

> I would imagine that any kind of output should be performed by only one process, say number zero.

I agree. Only the master does the I/O, for example.

> It should gather all the information it needs from all the other ones and dump the VTK.
> At least that is what the documentation says:
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscPrintf.html
>
> PETSc has a "viewer" for VTK formats. This should be investigated too:
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Viewer/PetscViewerVTKOpen.html
>
> Another option is to use other formats, which allow for "parallel" schemes.
> Keep in mind the word "para" in "paraview":

I'm not planning to spend time on data writing at this moment, but this
matter really needs a closer look.

> https://www.paraview.org/Wiki/ParaView:FAQ#What_file_formats_does_ParaView_support.3F

Thanks for this pointer too.
I'll keep studying milonga (time allowing).

Regards,

Vitor

> --
> jeremy
> www.seamplex.com

German Theler

Jun 14, 2018, 2:46:48 PM
to was...@seamplex.com
>
> I'm not there yet. For now, I'm just checking how to parallelize
> the matrix-building workload (or make sure it already works correctly).
>
> Anyway, I was thinking about a really simple approach:
> 1) make sure the matrices are built in a parallel way.

This step alone would be enough, as you said.

> 2) try to solve the linear system (I still need to change my test
> case to have a call to milonga_solver_linear_petsc) in a parallel way.

Great idea! Do not define nuSigmaF but define a source with S.
This will give a linear system instead of an eigenproblem.
I have added an example for this case:

https://bitbucket.org/seamplex/milonga/commits/86b2fa65093bbd4b19020c45007c469a31e972bb
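
The parallel solve itself is then a plain KSP call, more or less like this
sketch (R being the already assembled matrix and S the independent source
vector; the names are just for illustration):

#include <petscksp.h>

/* sketch: solve R phi = S in parallel with a Krylov method */
PetscErrorCode solve_linear_source(Mat R, Vec S, Vec phi)
{
  KSP ksp;

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, R, R);   /* same matrix as operator and preconditioner */
  KSPSetFromOptions(ksp);       /* choose solver/preconditioner with -ksp_* and -pc_* */
  KSPSolve(ksp, S, phi);
  KSPDestroy(&ksp);
  return 0;
}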


> 3) work on milonga_solve_eigen_slepc to make it parallel - if possible.

I think what can be parallelized are the linear systems involved in finding the eigenvalue.
Anyway, this is really needed to be able to scale milonga to millions of unknowns.
We can ask the developer of SLEPc; I know him personally and he is pretty open to questions.

>
>
>> This should not be happening.
>
> Yes, but it is. I'll try to figure out why it is happening and
> see if I can fix it.

Start with the linear example and compare the PETSc examples to what milonga does.

>
>
>> I think you can use OpenMP (without MPI first) on the loop that builds
>> the matrices in diffusion volumes. That should not be so hard.
>
> There are calls to PETSc's MatSetValue which *may* be a problem,
> but I agree this could be an interesting test.

Good point. But the people at PETSc probably made it thread-safe.

> It seems you should be there to check what your competitors are doing
> while you watch Argentina win the tournament. ;-)

Well, they are more providers/partners than competitors. And I would not trade a World Cup for getting back to
the First Division :-/

>
> I wanted to say that when you use mpirun, the PETSc matrices are
> preallocated with MatMPIAIJSetPreallocation, so the data is somehow
> distributed among the processors. I just could not verify how the
> matrices are assembled and at which point each processor gets the whole
> set of data. I need to study this further.

Not sure if it helps, but here is the point where they are supposed to be assembled:

https://bitbucket.org/seamplex/milonga/src/06ba147f53d756457fba96a5ec67466f4f484a22/src/milonga.c#lines-291

