candi petsc does not work with MPI-deal.ii


Weixiong Zheng

Jul 15, 2015, 1:51:07 AM
to dea...@googlegroups.com
Dear all,
 
After struggling for four days to install the parallel components of deal.II as an Ubuntu rookie (see the previous discussion), I settled on a compromise: I used candi to build deal.II + Trilinos, and built deal.II + PETSc myself from source. The reason for building PETSc without candi is that with -np > 1, the error shown at the end of this post appeared.

This time I didn't install libumfpack, but installed libsuitesparse-dev directly. Recall that last time neither candi's Trilinos nor my self-compiled Trilinos worked; since candi's Trilinos works well this time, I would assume the "undefined reference" errors were caused by libumfpack.

Yet I haven't had any luck with candi's PETSc, while the self-compiled PETSc works well with deal.II under MPI. I didn't change anything in candi.cfg except PROC.

Though I now have a working PETSc (self-compiled) and Trilinos (candi), it would still be interesting to know what is going on with candi's PETSc (though I would assume this could be my own mistake).

Thanks in advance,
Weixiong



 
 
 ERROR: Uncaught exception in MPI_InitFinalize on proc 1. Skipping MPI_Finalize() to avoid a deadlock.


----------------------------------------------------
Exception on processing:

--------------------------------------------------------
An error occurred in line <70> of file </home/weixiong/apps/candi/unpack/deal.II-v8.2.1/source/lac/sparsity_tools.cc> in function
    void dealii::SparsityTools::partition(const dealii::SparsityPattern&, unsigned int, std::vector<unsigned int>&)
The violated condition was:
    false
The name and call sequence of the exception was:
    ExcMETISNotInstalled()
Additional Information:
(none)
--------------------------------------------------------

Aborting!
----------------------------------------------------
ERROR: Uncaught exception in MPI_InitFinalize on proc 0. Skipping MPI_Finalize() to avoid a deadlock.


----------------------------------------------------
Exception on processing:

--------------------------------------------------------
An error occurred in line <70> of file </home/weixiong/apps/candi/unpack/deal.II-v8.2.1/source/lac/sparsity_tools.cc> in function
    void dealii::SparsityTools::partition(const dealii::SparsityPattern&, unsigned int, std::vector<unsigned int>&)
The violated condition was:
    false
The name and call sequence of the exception was:
    ExcMETISNotInstalled()
Additional Information:
(none)
--------------------------------------------------------

Aborting!
----------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 13739 on
node Berserker exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

Weixiong Zheng

Jul 15, 2015, 1:58:13 AM
to dea...@googlegroups.com
Sorry for the ambiguity; let me clarify:
Those error messages came from using candi's PETSc with mpirun in step-18 (step-17 has the same problem).

mpirun -np 1 ./step-18 was fine, but mpirun -np x ./step-18 (x >= 2) gave me the errors.

Thanks,
Weixiong

On Wednesday, July 15, 2015 at 12:51:07 AM UTC-5, Weixiong Zheng wrote:

Denis Davydov

Jul 15, 2015, 6:09:25 AM
to dea...@googlegroups.com
Hi Weixiong,

without any intention of hijacking the thread away from candi,
you could also give Linuxbrew (a fork of Homebrew) a try.
Homebrew's formulae are written so as to avoid picking up
anything outside of Homebrew, so you should not have any conflicts with whatever is already installed on your Linux system.
I use the same scripts/formulae to install the full deal.II suite on both my Mac (OS X Yosemite) and Ubuntu 14.04 LTS
without any issues (although I last used Ubuntu a couple of months ago).

Roughly, what you do is:
1) install Linuxbrew on your machine: https://github.com/Homebrew/linuxbrew
2) tap (add the extra repository with scientific software) homebrew/science: brew tap homebrew/science
3) install deal.II and extras:
brew install trilinos --without-python --with-openblas --without-scotch
brew install dealii --without-opencascade --without-muparser --with-openblas

This will also pull in openmpi, openblas, petsc, slepc, superlu_dist, parpack, hypre, metis, parmetis, hdf5, netcdf, fftw, p4est and suite-sparse.

Those "--without" options are only relevant on Linux, as not all of the software compiles there yet.
If you run into problems, let me know; I will try to recompile the whole thing on my Ubuntu machine
and see if I can help.

Regards,
Denis.

Bruno Turcksin

Jul 15, 2015, 10:25:50 AM
to dea...@googlegroups.com
Weixiong,

the error that you get is normal. By default, candi turns METIS off, but step-17 and step-18 require METIS to run in parallel. PETSc is installed correctly; that's why everything works fine when you use one processor. If you want to use METIS, just turn it on in candi.cfg.
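For reference, the change amounts to enabling the metis package in candi.cfg before re-running candi. A rough sketch of what to look for (the exact variable names and package syntax vary between candi versions, so treat this as illustrative and check the comments in your own candi.cfg):

```shell
# In candi.cfg: find the optional-packages section and enable metis
# (in many candi versions this is a commented-out PACKAGES line).
PACKAGES="${PACKAGES} once:metis"
# Then re-run candi so deal.II is reconfigured and rebuilt with METIS support.
```

After the rebuild, deal.II's ExcMETISNotInstalled exception should no longer be triggered by SparsityTools::partition.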

Best,

Bruno

Weixiong Zheng

Jul 15, 2015, 12:56:14 PM
to dea...@googlegroups.com
Denis,

Oh, that's really good to know. It should ease my installation next time. I will try it on my laptop once I finish installing Ubuntu on it.

Thanks,
Weixiong

On Wednesday, July 15, 2015 at 5:09:25 AM UTC-5, Denis Davydov wrote:

Chenchen Liu

Sep 30, 2016, 4:07:44 PM
to deal.II User Group
Hi Bruno,

It seems that I have run into a similar problem. My question is: how do I turn METIS on? Do I need to write something in the PBS job file, or should I contact the administrator in charge of the cluster? Thank you very much!

Best,
Chenchen

On Wednesday, July 15, 2015 at 10:25:50 AM UTC-4, Bruno Turcksin wrote:

Bruno Turcksin

Sep 30, 2016, 4:23:20 PM
to dea...@googlegroups.com
Chenchen,

2016-09-30 16:07 GMT-04:00 Chenchen Liu <ccliu...@gmail.com>:
> It seems that I met the similar problem. My question is how to turn METIS
> on? Need I write something in the pbs job file or contact with the guy who
> is in charge of the cluster? Thank you very much!
> Please refer to my question
> https://groups.google.com/forum/#!topic/dealii/116cFgS6EiE
No, your problem is totally different. METIS is correctly installed on
your system. I think the problem is that METIS is asking for too much
memory (see the "Memory allocation failed" error message). METIS is a
serial program, so when you use multiple nodes, it is very likely that
METIS will need more memory than what is available on one node. You can
try using a very coarse mesh on multiple nodes, but I would advise you
to use p4est instead of METIS. You won't have this kind of problem
with p4est.
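To make the p4est suggestion concrete: in deal.II, switching from METIS to p4est essentially means using parallel::distributed::Triangulation instead of partitioning a serial mesh. A minimal sketch (this assumes deal.II was configured with p4est support; names follow the deal.II 8.x API and this fragment only builds against an installed deal.II):

```cpp
#include <deal.II/base/mpi.h>
#include <deal.II/distributed/tria.h>
#include <deal.II/grid/grid_generator.h>

int main(int argc, char *argv[])
{
  using namespace dealii;
  // Initialize MPI (and PETSc, if deal.II was configured with it).
  Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv, 1);

  // A fully distributed triangulation: p4est partitions the mesh
  // across MPI ranks, so there is no serial METIS call and no single
  // node ever has to hold the whole partitioning problem in memory.
  parallel::distributed::Triangulation<2> triangulation(MPI_COMM_WORLD);
  GridGenerator::hyper_cube(triangulation);
  triangulation.refine_global(5);
}
```

step-40 in the deal.II tutorials shows this pattern in a full solver, which is why it scales to many nodes where the METIS-based step-17/step-18 setup runs out of memory.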

Best,

Bruno