Time for Creating subspace vector V


clappertown

Feb 11, 2011, 10:45:04 PM
to mor4ansys
I have been using LU decomposition in MATLAB to generate subspace
vectors with the Arnoldi process. In my experience, for a system matrix
of order ~300K (n x n, n ~ 300k), the LU decomposition takes about
260 s. Would a math library in C get it done much faster?


Evgenii Rudnyi

Feb 12, 2011, 1:53:35 AM
to mor4...@googlegroups.com

Hard to say, as it depends on the matrix. It also depends on the
reordering - use METIS. Please note that if the matrix is, for example,
symmetric, you need an L L^T decomposition (or L D L^T). Still, the
factorization time seems too long; it is comparable with what I had on
a 450 MHz Sun Ultra-80 eight years ago - see Table 1 in

http://modelreduction.com/doc/papers/rudnyi04PARA.pdf

I would expect that on modern hardware the times in that table would
be reduced by a factor of 6 to 8.

You will find information about different solvers at

http://MatrixProgramming.com

I would recommend MUMPS.

clappertown

Feb 12, 2011, 2:47:38 AM
to mor4ansys
Dear Evgenii,

Thanks a lot for the quick response. Could I get a copy of the
mor4ansys code? I would like to see how fast it is compared to the
pure MATLAB code and then decide whether I need to improve the speed
by switching to C code...

BTW, I am doing thermal simulations and extract both the conductance
(stiffness) and heat capacity (damping) matrices from the full files
by using the static and modal analysis options in ANTYPE and issuing
the HBMAT command twice. That way, I don't need to extract the capacity
matrix from the huge .emat file.

Best regards,

sagar

Feb 12, 2011, 3:13:46 AM
to mor4...@googlegroups.com
Hi, I want to do piping analysis. Is it possible in ANSYS?

--
 Thanks & Regards
  Sagar

Evgenii Rudnyi

Feb 12, 2011, 5:05:19 AM
to mor4...@googlegroups.com
MOR for ANSYS is a commercial product. Yet to try solvers you will find
sample code here

http://matrixprogramming.com/files/code/benchmark/

For example, use run_mumps.cpp to run MUMPS.

As for thermal problems, MOR for ANSYS works extremely well there.
See, for example, the latest review

Effective Electrothermal Simulation for Battery Pack and Power
Electronics in HEV/EV
http://modelreduction.com/doc/papers/rudnyi10gsvf.pdf

More at

http://modelreduction.com/Applications/Thermal.html


clappertown

Mar 20, 2011, 4:06:04 PM
to mor4ansys
Hi Evgenii,

Thanks for the information. I compiled MUMPS 4.9.2 on Windows 7
(64-bit) using g95 and Visual C++ 2010 Express, as you suggested.
However, I still have problems even getting the sample code to work.

I compiled everything and put libdmumps.lib, libmumps_common.lib,
libmetis.lib and libf95.lib into the /lib folder and libmpiseq.lib
into the /libseq folder. I also downloaded the BLAS library, compiled
it with g95, and put the resulting libblas.lib in the /lib directory.

I succeeded in compiling the c_example.c in examples directory to get
c_example.obj using:
cl -MD -EHsc -c -I..\libseq -I..\include c_example.c

But when I tried to link c_example.obj with those libraries using:
cl c_example.obj libmumps_common.lib libmetis.lib libdmumps.lib libf95.lib libgcc.lib libblas.lib libmpiseq.lib -link -LIBPATH:..\libseq -LIBPATH:..\lib /NODEFAULTLIB:libcmt.lib

I have the following link errors:
libdmumps.lib(dmumps_part3.o) : error LNK2019: unresolved external
symbol _DCOPY referenced in function _DMUMPS_507
libdmumps.lib(dmumps_part6.o) : error LNK2001: unresolved external
symbol _DCOPY
libdmumps.lib(dmumps_part5.o) : error LNK2001: unresolved external
symbol _DCOPY
libdmumps.lib(dmumps_ooc_buffer.o) : error LNK2001: unresolved
external symbol _DCOPY
libdmumps.lib(dmumps_part1.o) : error LNK2019: unresolved external
symbol _DSWAP referenced in function _DMUMPS_310
libdmumps.lib(dmumps_part6.o) : error LNK2001: unresolved external
symbol _DSWAP
libdmumps.lib(dmumps_part1.o) : error LNK2019: unresolved external
symbol _DTRSM referenced in function _DMUMPS_310
libdmumps.lib(dmumps_part6.o) : error LNK2001: unresolved external
symbol _DTRSM
libdmumps.lib(dmumps_part4.o) : error LNK2001: unresolved external
symbol _DTRSM
libdmumps.lib(dmumps_part8.o) : error LNK2001: unresolved external
symbol _DTRSM
libdmumps.lib(dmumps_part1.o) : error LNK2019: unresolved external
symbol _DGEMM referenced in function _DMUMPS_310
libdmumps.lib(dmumps_part6.o) : error LNK2001: unresolved external
symbol _DGEMM
libdmumps.lib(dmumps_part4.o) : error LNK2001: unresolved external
symbol _DGEMM
libdmumps.lib(dmumps_part8.o) : error LNK2001: unresolved external
symbol _DGEMM
libdmumps.lib(dmumps_part6.o) : error LNK2019: unresolved external
symbol _DSYR referenced in function _DMUMPS_XSYR
libdmumps.lib(dmumps_part6.o) : error LNK2019: unresolved external
symbol _IDAMAX referenced in function _DMUMPS_XSYR
libdmumps.lib(dmumps_part6.o) : error LNK2019: unresolved external
symbol _DGEMV referenced in function _DMUMPS_XSYR
libdmumps.lib(dmumps_part8.o) : error LNK2001: unresolved external
symbol _DGEMV
libdmumps.lib(dmumps_part6.o) : error LNK2019: unresolved external
symbol _DSCAL referenced in function _DMUMPS_XSYR
libdmumps.lib(dmumps_part6.o) : error LNK2019: unresolved external
symbol _DGER referenced in function _DMUMPS_XSYR
libdmumps.lib(dmumps_part4.o) : error LNK2001: unresolved external
symbol _DGER
libdmumps.lib(dmumps_part4.o) : error LNK2019: unresolved external
symbol _DAXPY referenced in function _DMUMPS_240
libdmumps.lib(dmumps_part8.o) : error LNK2019: unresolved external
symbol _DTRSV referenced in function _DMUMPS_286
c_example.exe : fatal error LNK1120: 11 unresolved externals

It seems the linker can't find the BLAS functions used in this Fortran
code. For instance, take the error below:
libdmumps.lib(dmumps_part3.o) : error LNK2019: unresolved external
symbol _DCOPY referenced in function _DMUMPS_507
dmumps_part3.f does call DCOPY, and dcopy.f in the BLAS source
directory defines that same DCOPY function.
The compiled blas.lib does include dcopy.o when I use lib /list to
list its contents.

My understanding is that dmumps_part3.o should pick up DCOPY from
blas.lib, but apparently it does not...



Evgenii Rudnyi

Mar 20, 2011, 4:50:27 PM
to mor4...@googlegroups.com
Presumably, when you compiled BLAS, the names were made lowercase.
You can use nm to check what happened

http://matrixprogramming.com/2011/03/using-nm-to-troubleshoot-linking-problems

Just run nm, for example

nm dcopy.o

and then you see what name is defined there. Please note that the
reference BLAS from NETLIB is slow. In order to have good performance,
you must use an optimized BLAS

http://matrixprogramming.com/2010/08/blas-basic-linear-algebra-system

Here, however, it would be good to find out what names the optimized
BLAS library uses. It is also possible to make MUMPS use any BLAS
names by editing some headers.



clappertown

Mar 20, 2011, 6:29:59 PM
to mor4ansys
Thanks a lot. When I run nm on dcopy.o it shows _dcopy_, while
dmumps_part3.o calls _DCOPY.

I added -fno-underscoring -fcase-upper to the g95 options when I
recompiled the reference BLAS library; these are the same options used
to build libdmumps.lib. Now everything links correctly, since both
libraries use the same upper-case function names.
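
The mismatch comes down to two compiler conventions: whether a Fortran external name is upper- or lower-cased, and whether an underscore is appended. Below is a minimal C sketch of that mapping. `fortran_symbol` is a hypothetical helper written only for illustration (it is not part of g95, MUMPS, or BLAS), and the leading underscore is assumed to be the one that 32-bit Windows toolchains prepend, as seen in the linker errors.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build the linker symbol a Fortran compiler would
 * emit for a routine such as DCOPY.
 *   upper      - nonzero: upper-case the name (models g95 -fcase-upper)
 *   underscore - nonzero: append '_' (zero models g95 -fno-underscoring)
 * A leading '_' is always added (assumption: 32-bit Windows decoration).
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int fortran_symbol(const char *name, int upper, int underscore,
                   char *out, size_t out_size)
{
    size_t len, need, i;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    need = 1 + len + (underscore ? 1 : 0) + 1; /* '_' name ['_'] NUL */
    if (need > out_size)
        return -1;

    out[0] = '_';
    for (i = 0; i < len; ++i) {
        unsigned char c = (unsigned char)name[i];
        out[1 + i] = (char)(upper ? toupper(c) : tolower(c));
    }
    if (underscore)
        out[1 + len] = '_';
    out[need - 1] = '\0';
    return 0;
}
```

With upper = 0 and underscore = 1, the name dcopy becomes _dcopy_ (what nm showed for the first BLAS build); with upper = 1 and underscore = 0 it becomes _DCOPY (what libdmumps.lib asks for).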

I will try the optimized BLAS libraries you recommended and report the
results later. So far I have tried TAUCS for MOR, but it seems quite a
bit slower than the UMFPACK version of LU in MATLAB.

Thanks,

Yizhang







Evgenii Rudnyi

Mar 21, 2011, 2:14:50 PM
to mor4...@googlegroups.com
On 20.03.2011 23:29 clappertown said the following:

> Thanks a lot. When I run nm on dcopy.o it shows _dcopy_, while
> dmumps_part3.o calls _DCOPY.
>
> I added -fno-underscoring -fcase-upper to the g95 options when I
> recompiled the reference BLAS library; these are the same options
> used to build libdmumps.lib. Now everything links correctly, since
> both libraries use the same upper-case function names.
>
> I will try the optimized BLAS libraries you recommended and report
> the results later. So far I have tried TAUCS for MOR, but it seems
> quite a bit slower than the UMFPACK version of LU in MATLAB.

When you try TAUCS or MUMPS, please use METIS for reordering. You will
find a comparison for different reordering schemes here

http://matrixprogramming.com/2008/05/metis

An optimized BLAS is also essential for good performance. See for example

http://matrixprogramming.com/2008/01/matrixmultiply

clappertown

Apr 2, 2011, 2:41:46 AM
to mor4ansys
Hi Evgenii,

I just tried MUMPS with GotoBLAS, and it gave correct solutions of
Ax = b for small matrices. However, when I tried a matrix A of
dimension 314910x314910 (4220697 nonzeros), MUMPS reported the
following errors:
On return from DMUMPS, INFOG(1)= -13
On return from DMUMPS, INFOG(2)= 246022294
The help file says "An error occurred in a Fortran ALLOCATE
statement" for INFOG(1) = -13. I used g95 and cl to compile the
program, which is a modification of the attached c_example.c.

Below is the detailed MUMPS output (the top part is matrix file
information that I print for debugging purposes).

MUMPS Vision#: 4.9.2
ptrcrd 314911, indcrd 4220697, valcrd 4220697,rhscrd 314910
Matrix type RSA
314910 rows, 314910 cols, 4220697 nonzeros
sym = 1
Time to load matrix is 5.07

DMUMPS 4.9.2
L D L^T Solver for general symmetric matrices
Type of parallelism: Working host

****** ANALYSIS STEP ********

Resetting candidate strategy to 0 because NSLAVES=1


****** Preprocessing of original matrix

Scaling will be computed during analysis
Compute maximum matching (Maximum Transversal): 5
... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE
... Column permutation not used
Density: NBdense, Average, Median = 0 25 26
Ordering based on METIS

Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 179331615
-- (3) Storage of factors (REAL, estimated) = 210021068
-- (4) Storage of factors (INT , estimated) = 5718074
-- (5) Maximum frontal size (estimated) = 5549
-- (6) Number of nodes in the tree = 21942
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 5
Percentage of memory relaxation (effective) = 10
Number of level 2 nodes = 0
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 3.727D+11
** Rank of proc needing largest memory in IC facto : 0
** Estimated corresponding MBYTES for IC facto : 2059
** Estimated avg. MBYTES per work. proc at facto (IC) : 2059
** TOTAL space in MBYTES for IC factorization : 2059
** Rank of proc needing largest memory for OOC facto : 0
** Estimated corresponding MBYTES for OOC facto : 493
** Estimated avg. MBYTES per work. proc at facto (OOC) : 493
** TOTAL space in MBYTES for OOC factorization : 493

****** FACTORIZATION STEP ********


GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 1
OUT-OF-CORE OPTION (ICNTL(22)) = 0
REAL SPACE FOR FACTORS = 210021068
INTEGER SPACE FOR FACTORS = 5718074
MAXIMUM FRONTAL SIZE (ESTIMATED) = 5549
NUMBER OF NODES IN THE TREE = 21942
Maximum effective relaxed size of S = 246022294
Average effective relaxed size of S = 246022294
** ERROR RETURN ** FROM DMUMPS INFO(1)=-13
** INFO(2)= 246022294
On return from DMUMPS, INFOG(1)= -13
On return from DMUMPS, INFOG(2)= 246022294

Evgenii Rudnyi

Apr 2, 2011, 3:25:11 AM
to mor4...@googlegroups.com
Are you running this as a 32-bit build? If so, you have reached the
limit: you need to allocate about 2 GB of contiguous memory, which is
impossible in a 32-bit process. You can try out-of-core, though; it is
a bit slower but should work.
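
The 2 GB figure can be checked against the log: the value reported in INFOG(2) equals the "relaxed size of S", 246022294 entries. A small sketch of the arithmetic follows, assuming each entry of S is an 8-byte double (the D in DMUMPS); `workarray_mib` is a hypothetical helper name, not a MUMPS routine.

```c
#include <assert.h>
#include <stdint.h>

/* Size in whole MiB of a work array with `entries` elements of
 * `bytes_per_entry` bytes each. Hypothetical helper for illustration;
 * 64-bit arithmetic keeps the product from overflowing. */
uint64_t workarray_mib(uint64_t entries, uint64_t bytes_per_entry)
{
    assert(bytes_per_entry > 0);
    return (entries * bytes_per_entry) / (1024u * 1024u);
}
```

Under that assumption, workarray_mib(246022294, 8) gives 1877 MiB, requested as a single block. A 32-bit Windows process usually has about 2 GiB of user address space in total, so a contiguous block of that size is very unlikely to be available.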


clappertown

Apr 2, 2011, 4:14:55 AM
to mor4ansys
I am using g95 and cl under 64-bit Windows 7, but the reason might be
that Cygwin and g95 are both 32-bit.
Anyway, I set id.ICNTL(22)=1 to use OOC, but the code returns the
following error:

Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 179331615
-- (3) Storage of factors (REAL, estimated) = 210021068
-- (4) Storage of factors (INT , estimated) = 5718074
-- (5) Maximum frontal size (estimated) = 5549
-- (6) Number of nodes in the tree = 21942
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 5
Percentage of memory relaxation (effective) = 20
Number of level 2 nodes = 0
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 3.727D+11
** Rank of proc needing largest memory in IC facto : 0
** Estimated corresponding MBYTES for IC facto : 2242
** Estimated avg. MBYTES per work. proc at facto (IC) : 2242
** TOTAL space in MBYTES for IC factorization : 2242
** Rank of proc needing largest memory for OOC facto : 0
** Estimated corresponding MBYTES for OOC facto : 535
** Estimated avg. MBYTES per work. proc at facto (OOC) : 535
** TOTAL space in MBYTES for OOC factorization : 535

****** FACTORIZATION STEP ********

GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 1

GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 1
OUT-OF-CORE OPTION (ICNTL(22)) = 1
REAL SPACE FOR FACTORS = 210021068
INTEGER SPACE FOR FACTORS = 5718074
MAXIMUM FRONTAL SIZE (ESTIMATED) = 5549
NUMBER OF NODES IN THE TREE = 21942
Maximum effective relaxed size of S = 53312357
Average effective relaxed size of S = 53312357
0 : PB in MUMPS_LOW_LEVEL_INIT_OOC_C
0 : Problem while opening OOC file
** ERROR RETURN ** FROM DMUMPS INFO(1)=-90
** INFO(2)= 0
On return from DMUMPS, INFOG(1)= -90
On return from DMUMPS, INFOG(2)= 0
CNTL(1)=6
** ERROR RETURN ** FROM DMUMPS INFO(1)= -3
** INFO(2)= 3
On return from DMUMPS, INFOG(1)= -3
On return from DMUMPS, INFOG(2)= 3

Evgenii Rudnyi

Apr 2, 2011, 4:53:14 AM
to mor4...@googlegroups.com
on 02.04.2011 10:14 clappertown said the following:

> I am using g95 and cl under 64-bit Windows 7, but the reason might
> be that Cygwin and g95 are both 32-bit.

Correct, this is the reason. Whether a free 64-bit Fortran compiler
for Windows exists is a good question; I have no idea.

> Anyway, I set
> id.ICNTL(22)=1 to use ooc but the code returns the following error:

That is not enough. You also need to set a directory; search for
MUMPSoutofcore in

http://matrixprogramming.com/files/code/benchmark/solvers.cpp

and you will see what else I have done. I think this could also be
done through an environment variable; it should be in the MUMPS docs.
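
A hedged sketch of the environment-variable route: the MUMPS users' guide describes MUMPS_OOC_TMPDIR and MUMPS_OOC_PREFIX as fallbacks for the ooc_tmpdir and ooc_prefix fields. Treat those names as an assumption to verify against the guide for your MUMPS version. `set_mumps_ooc_env` is a hypothetical helper, not a MUMPS function.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L /* exposes setenv() in strict C modes */
#endif
#include <assert.h>
#include <stdlib.h>

/* Set the out-of-core directory and file prefix through the environment.
 * Assumption: MUMPS consults these variables only when the corresponding
 * structure fields are left empty. Call before the factorization (JOB=2).
 * Returns 0 on success, -1 on failure. */
int set_mumps_ooc_env(const char *tmpdir, const char *prefix)
{
    assert(tmpdir != NULL && prefix != NULL);
#ifdef _WIN32
    if (_putenv_s("MUMPS_OOC_TMPDIR", tmpdir) != 0) return -1;
    if (_putenv_s("MUMPS_OOC_PREFIX", prefix) != 0) return -1;
#else
    if (setenv("MUMPS_OOC_TMPDIR", tmpdir, 1) != 0) return -1;
    if (setenv("MUMPS_OOC_PREFIX", prefix, 1) != 0) return -1;
#endif
    return 0;
}
```

The directory must already exist and be writable; the helper only sets the variables and does not create anything on disk.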

clappertown

Apr 2, 2011, 1:11:22 PM
to mor4ansys
Thanks. I don't know whether the gfortran in MinGW-w64 is actually
64-bit or not...

I will try the other OOC settings in MUMPS and report the results.

Yizhang

Evgenii Rudnyi

Apr 2, 2011, 2:09:48 PM
to mor4...@googlegroups.com
on 02.04.2011 19:11 clappertown said the following:

> Thanks. I don't know whether the gfortran in MinGW-w64 is actually
> 64-bit or not...

It looks like it is. This package is now available directly under
Cygwin. It would be nice to try it.

clappertown

Apr 2, 2011, 4:11:27 PM
to mor4ansys
After setting id.ooc_tmpdir, the code runs OK and is not that slow.
For comparison, MUMPS with OOC takes 64 s to factorize this matrix of
order 314910, while the built-in UMFPACK LU function in 64-bit MATLAB
takes ~260 s.

BTW, is there any way to save the MUMPS data after factorization
(JOB=2), so that later only JOB=3 is needed for an arbitrary input
vector b?

Since MUMPS has no matrix operation functions and I don't want to do
extensive coding in C, the simple way to do MOR would be to first call
JOB=2 to factorize A in MUMPS, and then call JOB=3 to get x for a
given b and pass it to MATLAB. Once the vector v_i has been generated
in MATLAB, M*v_i can be passed to MUMPS (JOB=3) again to get the
corresponding x, from which v_(i+1) is obtained in MATLAB. I couldn't
get the MATLAB interface of MUMPS working with the mex compiler, since
my MUMPS build is 32-bit while MATLAB is 64-bit...

Evgenii Rudnyi

Apr 2, 2011, 4:39:27 PM
to mor4...@googlegroups.com
on 02.04.2011 22:11 clappertown said the following:

> After setting id.ooc_tmpdir, the code runs OK and is not that slow.
> For comparison, MUMPS with OOC takes 64 s to factorize this matrix
> of order 314910, while the built-in UMFPACK LU function in 64-bit
> MATLAB takes ~260 s.
>
> BTW, is there any way to save the MUMPS data after factorization
> (JOB=2), so that later only JOB=3 is needed for an arbitrary input
> vector b?

It could be done in one job, you will find my implementation in

http://matrixprogramming.com/files/code/benchmark/

Look at solver*. You will also find matrix-vector multiplication
there. The code does not work as is, though; at present it is just
meant to give an idea.

clappertown

Apr 3, 2011, 3:57:17 AM
to mor4ansys
The example is very helpful. I have finally put everything in C, so
that the program reads the Harwell-Boeing matrix files written by the
ANSYS HBMAT command and produces the subspace matrices. Overall it
seems ~4 times faster than doing the LU decomposition in MATLAB. On
the other hand, TAUCS is considerably slower than MATLAB...

Evgenii Rudnyi

Apr 4, 2011, 2:37:51 PM
to mor4...@googlegroups.com
On 03.04.2011 09:57 clappertown said the following:

> The example is very helpful. I have finally put everything in C, so
> that the program reads the Harwell-Boeing matrix files written by
> the ANSYS HBMAT command and produces the subspace matrices. Overall
> it seems ~4 times faster than doing the LU decomposition in MATLAB.
> On the other hand, TAUCS is considerably slower than MATLAB...

Small correction: LU in TAUCS is indeed slow, but TAUCS handles
positive definite matrices pretty well.

clappertown

Apr 4, 2011, 8:42:54 PM
to mor4ansys
I just found that MUMPS crashes randomly during the factorization
stage (id.job=2), e.g., 2 out of 5 times for the same input A and b.
MUMPS still yields the same correct solution x whenever it gets
through factorization to the solve step. I have already set
ICNTL(1:4) to -1 to suppress all output.

Evgenii Rudnyi

Apr 5, 2011, 1:13:56 PM
to mor4...@googlegroups.com
On 05.04.2011 02:42 clappertown said the following:

> I just found that MUMPS crashes randomly during the factorization
> stage (id.job=2), e.g., 2 out of 5 times for the same input A and b.
> MUMPS still yields the same correct solution x whenever it gets
> through factorization to the solve step. I have already set
> ICNTL(1:4) to -1 to suppress all output.
>

Turn the output on and watch it when the crash happens. It may help
you understand the reason.

clappertown

Apr 7, 2011, 6:11:17 PM
to mor4ansys
It always crashes after printing " ** Avg. Space in MBYTES per working
proc during facto : XXX" during the factorization step, after passing
the analysis step successfully. Below is the detailed output before
the crash; sometimes Windows just pops up a dialog saying "Program has
stopped working, windows is checking the solutions...". If I run the
program 10 times on the same matrix input files, it crashes about 30%
of the time and yields correct results the rest of the time...




Scaling will be computed during analysis
Compute maximum matching (Maximum Transversal): 5
... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE
... Column permutation not used
Density: NBdense, Average, Median = 0 24 26
Ordering based on METIS

Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 39222639
-- (3) Storage of factors (REAL, estimated) = 46139251
-- (4) Storage of factors (INT , estimated) = 1735694
-- (5) Maximum frontal size (estimated) = 2666
-- (6) Number of nodes in the tree = 7470
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 7
Percentage of memory relaxation (effective) = 90
Number of level 2 nodes = 0
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 4.223D+10
** Rank of proc needing largest memory in IC facto : 0
** Estimated corresponding MBYTES for IC facto : 788
** Estimated avg. MBYTES per work. proc at facto (IC) : 788
** TOTAL space in MBYTES for IC factorization : 788
** Rank of proc needing largest memory for OOC facto : 0
** Estimated corresponding MBYTES for OOC facto : 209
** Estimated avg. MBYTES per work. proc at facto (OOC) : 209
** TOTAL space in MBYTES for OOC factorization : 209
done, time elapsed: 1.841s
Call MUMPS to run Factorization processes...
****** FACTORIZATION STEP ********


GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 1
OUT-OF-CORE OPTION (ICNTL(22)) = 1
REAL SPACE FOR FACTORS = 46139251
INTEGER SPACE FOR FACTORS = 1735694
MAXIMUM FRONTAL SIZE (ESTIMATED) = 2666
NUMBER OF NODES IN THE TREE = 7470
Maximum effective relaxed size of S = 19927292
Average effective relaxed size of S = 19927292
GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.0620
** Memory relaxation parameter ( ICNTL(14) ) : 90
** Rank of processor needing largest memory in facto : 0
** Space in MBYTES used by this processor for facto : 209
** Avg. Space in MBYTES per working proc during facto : 209

clappertown

Apr 8, 2011, 2:41:03 AM
to mor4ansys
The problem came from the GotoBLAS version (v1.19) I used - I guess
there is something wrong with its memory access. Everything runs OK
with the reference BLAS or ATLAS, but significantly slower! I just
found the newer GotoBLAS 1.26, compiled it, and now MUMPS works
properly.

Evgenii Rudnyi

Sep 6, 2012, 4:12:27 AM
to mor4...@googlegroups.com
On 02.09.2012 19:45 田雪峰 said the following:
> Hi,
>
> I don't know if you can still see this reply.
>
> I want to use MUMPS to solve a matrix.
>
> I compiled MUMPS, the reference BLAS and the METIS library, and used
> them to solve a positive definite matrix.
>
> The matrix order is only 100,000, and I used the OOC mode.
>
> However, unlike your case, where a matrix of order 300,000 takes
> only 64 s,
>
> mine took about 1000 s to do the factorization.
>
> Could you tell me what is wrong with my program?
>
> I read your discussion, and I think I did everything right.
>
> Regard,
>
> Chris

Have you used an optimized BLAS? If you use the BLAS from Netlib, it
is slow, as its goal was just to demonstrate a possible implementation.
Hence, if by the reference BLAS you mean the BLAS from Netlib, this
could be the reason.

Otherwise, it depends heavily on the matrix.

By the way, MUMPS does not use METIS by default. You have to set a
parameter to make MUMPS use METIS.

Evgenii
