Problem with MUMPS Solver

RAJAT ARORA

Feb 13, 2017, 2:04:44 AM
to deal.II User Group
Hello all,

I am using deal.II to solve a 3D solid mechanics problem along with PETSc and p4est. The problem has 9 DoFs per node.
I run my code on Stampede.

This error may or may not be directly related to an issue with deal.II (maybe the sparsity pattern?) or its usage. But given the diverse community here, I was wondering if anyone has encountered an error of this type
and might be able to help me solve the issue. If the deal.II developers/authors feel that the post should be removed, feel free to let me know. :)

The issue is that I get an error (mentioned below, INFO(1) = -9, INFO(2) = 188672) when I run the code on more than 2 nodes (~40 processors). There is no error when running the code on 2 nodes or fewer. I looked in the MUMPS manual to see what the error INFO(1) = -9 means. Also, the time at which the error happens seems random.

For INFO(1) = -9, the documentation says: "The main internal real/complex workarray S too small. If INFO(2) is positive, then the number of entries that are missing in S at the moment when the error is raised is available in INFO(2). If INFO(2) is negative, then its absolute value should be multiplied by 1 million. If an error -9 occurs, the user should increase the value of ICNTL(14) before calling the factorization (JOB=2) again, except if ICNTL(23) or LWK_USER are provided, in which case ICNTL(23) or LWK_USER should be increased."

I haven't changed any internal variables like ICNTL(23) or ICNTL(14); I am using the default values when calling MUMPS from PETSc. (Honestly, I don't understand what these variables mean either.)

I wanted to confirm whether such an error can occur just because I increase the number of processors, since there is no error when using up to 32 processors.

If it is indeed happening just because of the increased number of processors, is increasing the value of ICNTL(14) still the solution?

Also, can someone please tell me where I can learn how to change these internal variables?

Thanks a lot for the help. 

##################### ERROR ##############################
[61]PETSC ERROR: Error in external library
[61]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=188672

[61]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[61]PETSC ERROR: Petsc Release Version 3.6.4, Apr, 12, 2016
[61]PETSC ERROR: ./fdm on a arch-linux2-c-debug named r542.pvt.bridges.psc.edu on Sat Feb 11 13:35:23 2017
[61]PETSC ERROR: Configure options --with-x=0 --download-hypre=1 --download-parmetis --download-metis --download-scalapack --download-mumps=http://ftp.mcs.anl.gov/pub/petsc/externalpackages/MUMPS_5.0.1-p1.tar.gz
[61]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1172 in /Code-Libraries/petsc3.6.4/petsc-3.6.4_gcc_openmpi/src/mat/impls/aij/mpi/mumps/mumps.c
[61]PETSC ERROR: #2 MatLUFactorNumeric() line 2958 in /Code-Libraries/petsc3.6.4/petsc-3.6.4_gcc_openmpi/src/mat/interface/matrix.c
[61]PETSC ERROR: #3 PCSetUp_LU() line 152 in /Code-Libraries/petsc3.6.4/petsc-3.6.4_gcc_openmpi/src/ksp/pc/impls/factor/lu/lu.c
[61]PETSC ERROR: #4 PCSetUp() line 983 in /Code-Libraries/petsc3.6.4/petsc-3.6.4_gcc_openmpi/src/ksp/pc/interface/precon.c
[61]PETSC ERROR: #5 KSPSetUp() line 332 in /Code-Libraries/petsc3.6.4/petsc-3.6.4_gcc_openmpi/src/ksp/ksp/interface/itfunc.c
[61]PETSC ERROR: #6 KSPSolve() line 547 in /Code-Libraries/petsc3.6.4/petsc-3.6.4_gcc_openmpi/src/ksp/ksp/interface/itfunc.c

Timo Heister

Feb 13, 2017, 12:59:02 PM
to dea...@googlegroups.com
Rajat,

> The issue is that I get an error (mentioned below, INFO(1) = -9, INFO(2) =
> 188672) when I run the code on more than 2 nodes (~40 processors). There
> is no error when running the code on 2 nodes or fewer. I looked in the MUMPS
> manual to see what the error INFO(1) = -9 means. Also, the time at which the
> error happens seems random.

I haven't seen this error before. I can see several potential reasons for this:
1. You are doing something that causes a large number of entries per
row (very large number of couplings, maybe some weird constraints?)
2. We have a bug in deal.II, possibly related to not communicating things
correctly to PETSc, or within PETSc in how it hands things to MUMPS.
This would be most likely to happen if a single processor owns no DoFs.
You can experiment with a tiny test problem and many processors and
see if that triggers it.
3. Your matrix is not invertible but you only trigger a problem with
MUMPS in certain cases.
4. You actually need to change the settings for MUMPS. It looks like
this is easy to do:
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERMUMPS.html
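
For example (untested, and assuming you end up in MUMPS through a PCLU
preconditioner, either via deal.II's SparseDirectMUMPS or via -pc_type lu),
you should be able to give MUMPS more workspace from the command line with
the runtime option documented on that page:

  mpirun -np 40 ./fdm -mat_mumps_icntl_14 50

ICNTL(14) is the percentage of extra workspace MUMPS allocates on top of its
own estimate (the default is 20), so increase it in steps until the -9 error
goes away.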

--
Timo Heister
http://www.math.clemson.edu/~heister/

RAJAT ARORA

Feb 14, 2017, 6:23:56 PM
to deal.II User Group
Hello Timo,

Thanks for your detailed reply giving many possible reasons.

There are no constraints on the equations. It is a first-order time-dependent problem, so there are just initial conditions that are taken care of separately.
The code uses 3D brick elements with 9 DoFs per node. I just use the dof_handler.distribute_dofs() function to number the DoFs. Can this result in weird couplings?


The matrix is non-singular at all points in time; I am pretty sure about this. I have tested the code rigorously for different parameters, and I am almost sure that this is not a special case. Also, the number of locally_owned_dofs on each processor was between 800 and 1100.

Well, I might try changing the settings in MUMPS, but I need to figure out what value I should set ICNTL(14) to. I am taking a look at the manual.
The page you pointed to will help.

Also, does deal.II support SuperLU_DIST? I know PETSc does, but searching for SuperLU + deal.II
doesn't give me any useful results.

Thanks a lot again for your help. I really appreciate it.

Timo Heister

Feb 15, 2017, 12:46:48 PM
to dea...@googlegroups.com
> There are no constraints on the equations. It is a first-order time-dependent
> problem, so there are just initial conditions that are taken care of
> separately.

and boundary conditions I would assume? Otherwise your system is not invertible.

> The code uses 3D brick elements with 9 DoFs per node. I just use the
> dof_handler.distribute_dofs() function to number the DoFs. Can this result
> in weird couplings?

It depends on how many entries you have per row in your sparsity
pattern (even if the values are zero). You should check that to be
sure.
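
If it helps, here is a rough sketch of such a check (assuming you build a
DynamicSparsityPattern called dsp and have the locally_owned_dofs IndexSet
from the DoFHandler; adjust the names to whatever you actually use):

  // needs <deal.II/lac/dynamic_sparsity_pattern.h>, <algorithm>, <iostream>
  unsigned int max_entries = 0;
  for (unsigned int i = 0; i < locally_owned_dofs.n_elements(); ++i)
    {
      // global row index owned by this rank
      const types::global_dof_index row = locally_owned_dofs.nth_index_in_set(i);
      max_entries = std::max(max_entries,
                             static_cast<unsigned int>(dsp.row_length(row)));
    }
  std::cout << "max entries per row on this rank: " << max_entries << std::endl;

For trilinear brick elements with 9 components I would expect something on
the order of 27*9 = 243 entries per row; numbers far above that would point
to unintended couplings.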

> Also, does deal.II support SuperLU_DIST? I know PETSc does, but searching
> for SuperLU + deal.II doesn't give me any useful results.

No, but you should be able to easily create a class for this, or it might
even be possible to just use PreconditionLU and set
"-pc_factor_mat_solver_package superlu" on the command line.

Toon Weyens

Nov 30, 2018, 11:17:59 AM
to deal.II User Group
Did you manage to solve it in the end? I have a similar problem...

Thanks!