On 31 May 2012 18:19, thomas hisch <t.hi...@gmail.com> wrote:
> Hello,
> where can I find parallel (MPI) versions of (some) of the petsc4py/slepc4py
> examples found in the respective source tarballs?
Generally speaking, parallelism with PETSc reduces to filling in
matrices and vectors in parallel, with each processor setting its own
Mat/Vec values. To do this effectively, you first need some sort
of partitioning that assigns degrees of freedom to processors. For
example, the following demo solves a nonlinear problem matrix-free with
petsc4py; the partitioning is managed with a DA object (structured
grid):
http://code.google.com/p/petsc4py/source/browse/demo/bratu3d/bratu3d.py
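For an unstructured case without a DA, "each processor sets its own values" can be sketched roughly as below. This is only an illustration, not one of the shipped demos: the 1D Laplacian stencil and the helper names `laplacian_row` and `assemble_laplacian` are assumptions made up for the example. petsc4py is imported inside the function, so the stencil helper can be tried without a PETSc installation.

```python
def laplacian_row(i, n):
    """Column indices and values of row i of the n x n stencil [-1, 2, -1]."""
    cols, vals = [], []
    if i > 0:
        cols.append(i - 1)
        vals.append(-1.0)
    cols.append(i)
    vals.append(2.0)
    if i < n - 1:
        cols.append(i + 1)
        vals.append(-1.0)
    return cols, vals


def assemble_laplacian(n):
    """Each rank fills only the rows it owns, then all ranks assemble."""
    from petsc4py import PETSc  # lazy import: needs a petsc4py installation

    A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
    A.setType('aij')
    A.setSizes([n, n])   # global sizes only; local sizes are left to PETSc
    A.setUp()

    rstart, rend = A.getOwnershipRange()  # rows owned by this rank
    for i in range(rstart, rend):
        cols, vals = laplacian_row(i, n)
        A.setValues([i], cols, vals)
    A.assemble()         # collective: every rank must call it
    return A
```

Run under MPI (for example "mpiexec -n 4 python script.py", with a call to assemble_laplacian added), every rank executes the same code but loops over a different row range.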
All the examples in the slepc4py tarball are parallel (though really
simple); you just have to run them under MPI, e.g. "mpiexec -n 5 python ex2.py".
Could you provide some additional background on what you are looking for?
-- Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169
On Friday, June 1, 2012 5:04:07 PM UTC+2, Lisandro Dalcin wrote:
> All the examples in slepc4py tarball are parallel (though really
> simple), you just have to run "mpiexec -n 5 python ex2.py".
Thx, mpiexec/mpirun in the slepc4py demo dir seems to work fine. Yesterday I only tested the poisson2d example in petsc4py with "mpirun -n 4 python poisson2d.py", which crashed due to an indexing error. I grepped the demo dirs of both slepc4py and petsc4py for "mpi4py" and "DECIDE" (I'm familiar with the C/C++ PETSc/SLEPc API) but didn't find a match, so I assumed that none of the demos were parallelized.
> Could you provide some additional background on what you are looking for?
I want to solve Hermitian and non-Hermitian Schrödinger- and Helmholtz-type problems using slepc4py. I have already written code for solving the Helmholtz problem in C++ using SLEPc. In that code I rely on PETSC_DECIDE to partition my matrices for parallel use. How is this partitioning done in petsc4py?
> On Friday, June 1, 2012 5:04:07 PM UTC+2, Lisandro Dalcin wrote:
> Thx, mpiexec/mpirun in the slepc4py demo dir seems to work fine. Yesterday I
> only tested the poisson2d example in petsc4py with mpirun -n 4 python
> poisson2d.py which crashed due to an indexing error. I grepped the demo dirs
> of both slepc4py and petsc4py for "mpi4py", "DECIDE" (I'm familiar with the
> C/C++ PETSc/SLEPc API) but didn't find a match. Therefore I expected that
> all the demos are not parallelized.
That's right, not all the demos in petsc4py are parallel.
>> Could you provide some additional background on what you are looking for?
> I want to solve Hermitian and non-Hermitian Schrödinger and Helmholtz type
> problems using slepc4py. I have already created the code for solving the
> Helmholtz problem in C++ using SLEPc. In this code I rely on PETSC_DECIDE
> for partitioning my matrices for parallel usage. How is this partitioning done
> in petsc4py?
Suppose M and N are the global row and column sizes; then you just do:
from petsc4py import PETSc
A = PETSc.Mat().create()
A.setType('aij')
A.setSizes([M,N])
A.setPreallocationNNZ([diag_nz, offdiag_nz]) # optional
A.setUp()
and you are done; petsc4py will pass DECIDE to PETSc for the local sizes.
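To get a feeling for what DECIDE ends up doing, here is a small pure-Python sketch. It assumes PETSc's default split, which as far as I know gives every rank N // size rows and one extra row to each of the first N % size ranks. The function name `split_ownership` is made up for this illustration; in real code the authoritative answer is whatever A.getOwnershipRange() returns.

```python
def split_ownership(N, size, rank):
    """Return the (start, end) row range that `rank` would own.

    Assumption: contiguous blocks of N // size rows, with the first
    N % size ranks receiving one extra row each.
    """
    base, extra = divmod(N, size)
    n_local = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, start + n_local


# Example: 10 rows over 4 processes
ranges = [split_ownership(10, 4, r) for r in range(4)]
print(ranges)
```

The ranges are contiguous and cover all rows exactly once, which is why a loop over range(rstart, rend) on each rank touches every row once.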
BTW, if you already have fast C++ code to fill your matrix in
parallel, reusing it would be a good idea, as Python will be rather
slow for this. Look at demos/wrap-{cython,swig} to see how this can be
done.
-- Lisandro Dalcin
On Friday, June 1, 2012 9:15:56 PM UTC+2, Lisandro Dalcin wrote:
> On 1 June 2012 15:37, thomas hisch <t.hi...@gmail.com> wrote:
> That's right, not all the demos in petsc4py are parallel.
> Suppose M and N are the global row and column sizes, then you just do:
> from petsc4py import PETSc
> A = PETSc.Mat().create()
> A.setType('aij')
> A.setSizes([M,N])
> A.setPreallocationNNZ([diag_nz, offdiag_nz]) # optional
> A.setUp()
> and you are done, petsc4py will pass DECIDE to PETSc.
> BTW, if you already have fast C++ code to fill your matrix in
> parallel, reusing it would be a good idea, as Python will be rather
> slow for this. Look at demos/wrap-{cython,swig} to see how this can be
> done.
For the current project I do not have any C++ code that assembles the matrices. I would like to avoid C++ and implement the whole project in Python/Cython. Doing the analysis with the IPython parallel framework seems to work pretty well, except that whenever there is an error in my MPI commands, all the IPython engines/kernels need to be restarted.
The main part of the code is now ready and I would like to optimize it a bit. A large number of eigenvalue problems (EVPs), ~10^4, need to be solved. Solving one EVP, for which a few eigenvalues and eigenvectors are computed, takes less than a second (~0.3 s). Each eigensystem has a different diagonal of the operator/matrix A. At the moment I do all the computation in pure Python and loop over the local portion of the diagonal of A on each processor to update A. On the web I found that there is a PETSc function that updates the diagonal of a matrix A with a given PETSc vector b, MatDiagonalSet(A,b,ADD_VALUES): http://lists.mcs.anl.gov/pipermail/petsc-users/2011-April/008607.html I haven't tried this function yet; maybe it speeds up the assembly of the matrices a bit? Does using Cython for the assembly of block-diagonal matrices result in any performance gain?
Regards and thx for this great piece of software! Thomas
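For reference, petsc4py exposes MatDiagonalSet as Mat.setDiagonal, so the per-problem update could be sketched as follows. This is a hedged illustration only: `local_diagonal` is a hypothetical stand-in for however the diagonal of problem k is computed, `set_problem_diagonal` is a made-up helper name, and the sketch assumes A is already assembled with its diagonal entries present in the nonzero pattern.

```python
def local_diagonal(k, rstart, rend):
    """Hypothetical diagonal entries of problem k for rows rstart..rend-1."""
    return [0.5 * k * (i + 1) for i in range(rstart, rend)]


def set_problem_diagonal(A, d, k):
    """Overwrite the diagonal of the assembled matrix A for problem k.

    d is a work vector with the same row layout as A (for example one
    obtained from A.createVecLeft() in recent petsc4py releases).
    """
    from petsc4py import PETSc  # lazy import: needs a petsc4py installation

    rstart, rend = d.getOwnershipRange()   # locally owned entries of d
    rows = list(range(rstart, rend))
    d.setValues(rows, local_diagonal(k, rstart, rend))
    d.assemble()
    # Wraps MatDiagonalSet: INSERT_VALUES replaces the old diagonal,
    # ADD_VALUES would add d to it instead.
    A.setDiagonal(d, addv=PETSc.InsertMode.INSERT_VALUES)
```

Note that ADD_VALUES, as in the linked post, accumulates on top of the previous diagonal, so when each problem has a completely new diagonal INSERT_VALUES is probably what is wanted.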
Yes, your code will be a bit faster using a Cythonized loop or updating
with a numpy/Vec array, but I would be surprised if this is the bottleneck.
If your problems are very small, you are probably better off distributing
different problems to each process and having each process solve its own
problems independently, rather than decomposing the matrices.
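A minimal sketch of that idea, assuming a simple round-robin assignment: each rank takes every size-th problem and builds its matrices on PETSc.COMM_SELF so that they are purely local. The names `problems_for_rank`, `solve_my_share` and the callback `solve_one` are hypothetical and only serve the illustration.

```python
def problems_for_rank(n_problems, size, rank):
    """Indices of the problems handled by `rank` (round-robin)."""
    return list(range(rank, n_problems, size))


def solve_my_share(n_problems, solve_one):
    """Solve this rank's problems; solve_one(k, comm) is user-supplied.

    solve_one is expected to create its Mat/EPS objects on the given
    communicator, here COMM_SELF, so no matrix is split across ranks.
    """
    from petsc4py import PETSc  # lazy import: needs a petsc4py installation

    world = PETSc.COMM_WORLD
    mine = problems_for_rank(n_problems, world.getSize(), world.getRank())
    return {k: solve_one(k, PETSc.COMM_SELF) for k in mine}
```

With ~10^4 problems of ~0.3 s each, this kind of distribution needs no communication during the solves; results only have to be gathered at the end.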
> Does using cython for the assembling of block diagonal matrices result in
> any 'performance gain' ?
Yes, but profile your code first to find out where you are spending your
time!
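One way to do that with only the standard library is cProfile. The sketch below profiles a stand-in function; `solve_all` is a placeholder for the real loop over eigenproblems, not code from this thread.

```python
import cProfile
import io
import pstats


def solve_all(n_problems):
    """Stand-in workload; replace with the real loop over eigenproblems."""
    total = 0.0
    for _ in range(n_problems):
        total += sum(i * i for i in range(200))
    return total


profiler = cProfile.Profile()
profiler.enable()
solve_all(500)
profiler.disable()

# Collect the five most expensive entries by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Under mpiexec every rank prints its own report, so it may be easier to profile a single-process run first. If the assembly loop barely shows up next to the eigensolver, Cythonizing it will not buy much.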