Martin Meyer
Mar 23, 2011, 11:44:58 AM
to matrixprogramming
Hi there,
right now I'm trying to parallelize a program that includes solving
a large linear system of equations. I chose MUMPS (on 4
processors) to do the job. Now I have two main problems:
(1) The solution process is much slower than the sequential version
of the program written previously, which was of course a
combination of do-loops.
(2) The solution process fails due to a DivideByZeroException, I
believe at the stage where ParMETIS calculates the symbolic
factorization.
I'm new to MUMPS, so it would be nice if someone with more experience
could give a hint about what is going wrong.
I'm using FORTRAN, and my non-default parameters for MUMPS are:
mumps_par%NRHS=1
mumps_par%ICNTL(9)=1 !solve Ax=b
mumps_par%ICNTL(10)=50 !iterative refinement steps
mumps_par%ICNTL(13)=0 !use scalapack for factorization
!Setting ICNTL(13) to a non-zero value will help with the
!correct detection of null pivots but degrade performance.
mumps_par%ICNTL(23)=5 !supposed to be bigger than infog(26) in the
!parallel version, which had a value of 1
mumps_par%ICNTL(28)=2 !use parallel analysis phase
mumps_par%ICNTL(29)=2 !use parmetis
mumps_par%ICNTL(4)=4 !print more information
As the comment says, ICNTL(13) can degrade performance, but because of
that DivideByZeroException, I set it to 0, thinking it might
help.
My output log is:
DMUMPS 4.9.2
L D L^T Solver for general symmetric matrices
Type of parallelism: Host not working
****** ANALYSIS STEP ********
Using ParMETIS for parallel ordering.
Structual symmetry is:100%
WARNING: Largest root node of size 19
not selected for parallel execution
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 0
-- (3) Storage of factors (REAL, estimated) = 21498
-- (4) Storage of factors (INT , estimated) = 14284
-- (5) Maximum frontal size (estimated) = 20
-- (6) Number of nodes in the tree = 375
-- (32) Type of analysis effectively used = 2
-- (7) Ordering option effectively used = 2
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 7
Percentage of memory relaxation (effective) = 20
Number of level 2 nodes = 0
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 2.606D+05
** Rank of proc needing largest memory in IC facto : 0
** Estimated corresponding MBYTES for IC facto : 1
** Estimated avg. MBYTES per work. proc at facto (IC) : 1
** TOTAL space in MBYTES for IC factorization : 4
** Rank of proc needing largest memory for OOC facto : 0
** Estimated corresponding MBYTES for OOC facto : 1
** Estimated avg. MBYTES per work. proc at facto (OOC) : 1
** TOTAL space in MBYTES for OOC factorization : 4
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 3
OUT-OF-CORE OPTION (ICNTL(22)) = 0
REAL SPACE FOR FACTORS = 21498
INTEGER SPACE FOR FACTORS = 14284
MAXIMUM FRONTAL SIZE (ESTIMATED) = 20
NUMBER OF NODES IN THE TREE = 375
Convergence error after scaling for ONE-NORM (option 7/8) = 0.25D-01
Maximum effective relaxed size of S = 575014
Average effective relaxed size of S = 572911
GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.0006
** Memory relaxation parameter ( ICNTL(14) ) : 20
** Rank of processor needing largest memory in facto : 0
** Space in MBYTES used by this processor for facto : 1
** Avg. Space in MBYTES per working proc during facto : 1
ELAPSED TIME FOR FACTORIZATION = 0.0015
Maximum effective space used in S (KEEP8(67) = 7646
Average effective space used in S (KEEP8(67) = 7338
** EFF Min: Rank of processor needing largest memory : 0
** EFF Min: Space in MBYTES used by this processor : 1
** EFF Min: Avg. Space in MBYTES per working proc : 1
GLOBAL STATISTICS
RINFOG(2) OPERATIONS DURING NODE ASSEMBLY = 1.937D+04
------(3) OPERATIONS DURING NODE ELIMINATION = 2.606D+05
INFOG (9) REAL SPACE FOR FACTORS = 21498
INFOG(10) INTEGER SPACE FOR FACTORS = 14284
INFOG(11) MAXIMUM FRONT SIZE = 20
INFOG(29) NUMBER OF ENTRIES IN FACTORS = 18727
INFOG(12) NB OF NEGATIVE PIVOTS = 0
INFOG(12) NUMBER OF DELAYED PIVOTS = 0
NUMBER OF 2x2 PIVOTS in type 1 nodes = 0
NUMBER OF 2x2 PIVOTS in type 2 nodes = 0
INFOG(14) NUMBER OF MEMORY COMPRESS = 0
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 50
--- (11) = 0
--- (20) = 0
--- (21) = 0
BEGIN ITERATIVE REFINEMENT
MAXIMUM NUMBER OF STEPS = 50
STATISTICS AFTER ITERATIVE REFINEMENT
NUMBER OF STEPS OF ITERATIVE REFINEMENTS 0
** Rank of processor needing largest memory in solve : 1
** Space in MBYTES used by this processor for solve : 4
** Avg. Space in MBYTES per working proc during solve : 4
LEAVING SOLVER WITH: INFOG(1) ............ = 0
INFOG(2) ............ = 0
The DivideByZeroException is thrown in the analysis phase; the output
log ends after "Using ParMETIS for parallel ordering. Structual
symmetry is:100%". I don't know if this is a ParMETIS error, an
error in the input data or parameters, or something in MUMPS itself.
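To narrow down which stage fails, the three phases could be called one at a time instead of in a single combined call, checking INFOG(1) after each. The following is only a minimal sketch: the subroutine name run_mumps_phases is hypothetical (not part of MUMPS), and it assumes that mumps_par has already been initialised with JOB=-1 and that the matrix, right-hand side and ICNTL values above are set. It only reports errors that MUMPS itself returns; a floating-point exception raised inside ParMETIS would still abort the run, but the last phase printed in the log would then show where.

```fortran
      SUBROUTINE run_mumps_phases(mumps_par)
      ! Hypothetical helper, not part of MUMPS: runs analysis,
      ! factorization and solve as separate calls.
      ! Assumes mumps_par was initialised with JOB=-1 and that the
      ! matrix, right-hand side and ICNTL values are already set.
      IMPLICIT NONE
      INCLUDE 'dmumps_struc.h'
      TYPE (DMUMPS_STRUC) mumps_par
      INTEGER phase
      DO phase = 1, 3
        ! JOB=1: analysis, JOB=2: factorization, JOB=3: solve
        mumps_par%JOB = phase
        CALL DMUMPS(mumps_par)
        ! A negative INFOG(1) means MUMPS reported an error
        IF (mumps_par%INFOG(1) .LT. 0) THEN
          IF (mumps_par%MYID .EQ. 0) THEN
            WRITE(*,*) 'MUMPS failed in JOB =', phase
            WRITE(*,*) 'INFOG(1) =', mumps_par%INFOG(1)
            WRITE(*,*) 'INFOG(2) =', mumps_par%INFOG(2)
          END IF
          RETURN
        END IF
      END DO
      END SUBROUTINE run_mumps_phases
```

It would be called on all processes after the parameter assignments, e.g. CALL run_mumps_phases(mumps_par).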
Thanks in advance for taking a look at this!
Martin