parallelizing solutions to the heat/diffusion equation in 2D


fjpo...@gmail.com

Feb 6, 2015, 1:35:52 PM
to mpi...@googlegroups.com
Hello,

We are working to develop efficient code to solve Partial Differential Equations (PDEs) using mpi4py.  I am attaching an example we have put together that we believe solves the problem accurately.

We have done some simple tests on a couple of machines and are finding some interesting behaviour that we don't understand.

On my Ubuntu machine (2x8 cores), the serial run on a 2048x2048 grid uses the following command and takes 87 seconds:

python heat_2d_stepping_mpi.py 11 11 0 0 0
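
The parallel cases are launched with mpiexec, roughly as follows (the script arguments shown are just those of the serial run and may differ in practice):

mpiexec -n 4 python heat_2d_stepping_mpi.py 11 11 0 0 0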

When we try it in parallel we get the following results:

np = 2     time = 37 secs

np = 4     time = 21 secs

np = 8     time = 14 secs

The np = 8 case seems very efficient, but not superlinearly so. The other cases, however, run a lot faster and more efficiently than we would expect.

I am very happy to see that mpi4py can be so efficient, but we would very much like to understand how things scale for this problem, to better prepare ourselves for studying more complicated ones.

Also, we ran each of these cases at least twice and the timings did not change much.

Any insights would be greatly appreciated.

Best regards,

Francis


heat_2d_stepping_mpi.py

Aron Ahmadia

Feb 6, 2015, 2:12:09 PM
to mpi...@googlegroups.com
Hi Francis,

It's not clear from your question what you need help with.  There are a number of quite robust open source tools for solving partial differential equations in parallel from Python (petsc4py, also authored by Lisandro, comes to mind).  If you are simply trying to solve PDEs efficiently and work from Python, you may want to start there.

On the other hand, if you're trying to understand why your code doesn't run as efficiently on 8 cores, I'd suggest you try profiling it with Python's built-in profiling tools.  You should get an understanding of what parts of your code aren't speeding up as you increase the number of cores.
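
For example, a minimal per-rank profiling sketch might look something like this (the file names and where you place the calls are just an illustration, not taken from your script):

import cProfile
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
prof = cProfile.Profile()
prof.enable()
# ... run the time-stepping loop here ...
prof.disable()
prof.dump_stats('heat2d.rank%03d.prof' % rank)   # inspect later with pstats

Comparing the per-rank dumps between the serial and np=8 runs should show whether the time is going into the stencil update or into communication.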

A


fjpo...@gmail.com

Feb 6, 2015, 8:58:18 PM
to mpi...@googlegroups.com, ar...@ahmadia.net
Hello Aron,

Thanks for your reply and sorry that I was unclear.

Yes, I have used petsc4py and even slepc4py and plan to continue using them.

My question boils down to the fact that on 2 cores my tests run at 125% efficiency, and I am trying to figure out why that is.  I know that petsc4py and slepc4py both build on mpi4py, and I am trying to better understand how things work in mpi4py and what we can/should expect in terms of scalability.
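
(To be concrete, by efficiency I mean T_serial / (p * T_p); for example, the two-core timing in my first message gives 87 / (2 x 37), i.e. roughly 118%, already well above 100%.)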

Francis 

Lisandro Dalcin

Feb 7, 2015, 3:35:00 AM
to mpi4py, Aron Ahmadia
On 7 February 2015 at 04:58, <fjpo...@gmail.com> wrote:
> Hello Aron,
>
> Thanks for your reply and sorry that I was unclear.
>
> Yes, I have used petsc4py and even slepc4py and plan to continue using them.
>
> My question boils down to the fact that on 2 cores my tests run at 125%
> efficiency, and I am trying to figure out why that is. I know that petsc4py
> and slepc4py both build on mpi4py, and I am trying to better understand how
> things work in mpi4py and what we can/should expect in terms of scalability.
>

There are a couple of details that make your code inefficient:

* You are using a 1D grid decomposition (i.e., by strips); a 2D
partitioning is better (see the Cartesian-communicator sketch after this
list). However, I'm not sure you will notice much difference for np=8.

* You are communicating ghost nodes with slow, pickle-based send()/recv()
calls. You should use methods like Send()/Recv() to communicate NumPy
arrays (see the Sendrecv sketch after this list); see the comments here:
http://mpi4py.readthedocs.org/en/latest/tutorial.html
Also, take a look at this mpi4py example, which is quite similar to your code:
https://bitbucket.org/mpi4py/mpi4py/src/master/demo/mpi-ref-v1/ex-2.32.py
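
As a rough illustration of the first point, a 2D decomposition can be built on a Cartesian communicator. This is only a sketch; none of the variable names come from your script:

from mpi4py import MPI

comm = MPI.COMM_WORLD
# factor the number of processes into a 2D process grid, e.g. 8 -> [4, 2]
dims = MPI.Compute_dims(comm.Get_size(), 2)
cart = comm.Create_cart(dims, periods=[False, False], reorder=True)
# neighbour ranks for ghost exchange (MPI.PROC_NULL at the domain boundary)
north, south = cart.Shift(0, 1)
west, east = cart.Shift(1, 1)
px, py = cart.Get_coords(cart.Get_rank())   # this rank's block indices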
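
And for the second point, here is a minimal sketch of a buffer-based ghost-row exchange for the strip decomposition you already have (the array names and grid size are made up for the example):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx = 2048
nloc = nx // size                  # interior rows per rank (assumes nx % size == 0)
u = np.zeros((nloc + 2, nx))       # local strip plus one ghost row above and below

up   = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Uppercase Sendrecv works directly on the NumPy buffers, no pickling involved.
# Rows of a C-ordered array are contiguous, so they can be passed as-is.
comm.Sendrecv(u[1, :],  dest=up,   recvbuf=u[-1, :], source=down)  # fill bottom ghost row
comm.Sendrecv(u[-2, :], dest=down, recvbuf=u[0, :],  source=up)    # fill top ghost row

With PROC_NULL neighbours the calls degenerate to no-ops at the physical boundaries, so no special-casing is needed there.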

Finally, you should really use petsc4py's DMDA object to handle the
grid partitioning and ghost communication, and also use Numba to
quickly speed up the finite-difference Python kernel to near C/Fortran
speed. Of course, you can also use Cython or Fortran+f2py to speed up
your kernel, but Numba is IMHO the future for these tasks.
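
To give a flavour of the Numba route, a jitted 5-point stencil kernel looks roughly like this (the function and its arguments are my own sketch, not taken from your script):

from numba import njit

@njit(cache=True)
def step(u, unew, alpha):
    # one explicit finite-difference update of the interior points
    ny, nx = u.shape
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            unew[j, i] = u[j, i] + alpha * (u[j, i-1] + u[j, i+1] +
                                            u[j-1, i] + u[j+1, i] - 4.0 * u[j, i])

The first call pays the JIT compilation cost; subsequent time steps run at roughly C speed.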


--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459

Ahmad Ridwan

Apr 17, 2018, 12:15:34 AM
to mpi4py
Hi Francis,
I am having trouble getting any speedup in execution time with n nodes; I haven't been able to reproduce what you did. How can this be done in a simpler program? I am working with just 2 nodes running Ubuntu.

Thanks
