Out of core dense matrix solver

Olumide

Mar 25, 2009, 12:16:27 PM
to matrixprogramming
For the fun of it, I thought I'd make this post read like an ad in the
personals section. Here goes:

Researcher in mid-thirties seeks an out-of-core dense matrix solver
for simulation and experimentation. Open source preferred but not
necessary. Must be available on the internet.

:-)

Mark Hoemmen

Mar 25, 2009, 1:40:55 PM
to matrixpr...@googlegroups.com

Ha! ;-) Well, ScaLAPACK has an out-of-core parallel version. There may
be other solvers too. It's rather unusual to require a dense
out-of-core LU factorization (that's usually a sign you should be using
a sparse solver instead) but they do exist.

mfh

Olumide

Mar 25, 2009, 2:02:18 PM
to matrixprogramming
> Ha! ;-)  Well, ScaLAPACK has an out-of-core parallel version.  There may
> be other solvers too.  It's rather unusual to require a dense
> out-of-core LU factorization (that's usually a sign you should be using
> a sparse solver instead) but they do exist.

Thanks Mark. I have tried a sparse-system approach to the (surface
reconstruction) problem I'm trying to solve, but it produces ripples
on the output surface, which is especially undesirable for computing
differential-geometry properties. So I've decided to give the global
method, which produces a dense matrix, a shot, but only if I can find
an appropriate out-of-core solver. As I do not have more than one PC
readily at my disposal, a parallel solver is of little use to me at
the moment.

I'll give ScaLAPACK a closer look.

Mark Hoemmen

Mar 25, 2009, 2:12:22 PM
to matrixpr...@googlegroups.com

As far as I know, all of the existing dense out-of-core solvers are
parallel. ScaLAPACK may not play nicely if you only give it one
processor but you can trick it into thinking that there are two processors.

If I understand your problem correctly, the dense matrix you want to
solve should have a natural structure which is amenable to solution by
more clever factorization techniques, such as hierarchical matrices.

mfh

Olumide

Mar 25, 2009, 3:03:23 PM
to matrixprogramming
> As far as I know, all of the existing dense out-of-core solvers are
> parallel.  ScaLAPACK may not play nicely if you only give it one
> processor but you can trick it into thinking that there are two processors.

Thanks. I'll look into/ask about this. I have just registered on a
LAPACK/ScaLAPACK forum (http://icl.cs.utk.edu/lapack-forum).

> If I understand your problem correctly, the dense matrix you want to
> solve should have a natural structure which is amenable to solution by
> more clever factorization techniques, such as hierarchical matrices.

Indeed the matrix has structure. It has four blocks and looks like
this:

[ A T ]
[ T' 0 ]

T' is the transpose of a non-square matrix T, while A is a square and
symmetric matrix.

Do you know of any clever techniques for solving such a system?

Mark Hoemmen

Mar 25, 2009, 3:50:15 PM
to matrixpr...@googlegroups.com
Olumide wrote:
>> As far as I know, all of the existing dense out-of-core solvers are
>> parallel. ScaLAPACK may not play nicely if you only give it one
>> processor but you can trick it into thinking that there are two processors.
>
> Thanks. I'll look into/ask about this. I have just registered on a
> LAPACK/ScaLAPACK forum (http://icl.cs.utk.edu/lapack-forum)
>
>> If I understand your problem correctly, the dense matrix you want to
>> solve should have a natural structure which is amenable to solution by
>> more clever factorization techniques, such as hierarchical matrices.
>
> Indeed the matrix has structure. It has four blocks and looks like
> this:

I meant something rather more mathematical than that -- I had boundary
value problems in mind, but it looks like you are solving some kind of
constrained optimization problem instead. Oh well.

mfh

Evgenii Rudnyi

Mar 26, 2009, 3:24:14 PM
to matrixpr...@googlegroups.com
Olumide wrote:

>> As far as I know, all of the existing dense out-of-core solvers are
...

> Indeed the matrix has structure. It has four blocks and looks like
> this:
>
> [ A T ]
> [ T' 0 ]

With some manipulation (a Schur complement), writing x = [x1 x2]:

A x1 + T x2 = b1
T'x1 + 0 x2 = b2

Solving the first equation for x1 = A^-1 (b1 - T x2) and substituting
into the second gives

T'A^-1 (b1 - T x2) = b2

from which one can obtain

T'A^-1 T x2 = T'A^-1 b1 - b2

Though it is hard to say whether this is better; it depends on the
typical dimensions of A and T. What are they?
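For illustration, here is a minimal C++ sketch of this Schur-complement
solve using LAPACK's dgesv. It is only a sketch under assumptions:
column-major storage, a Fortran LAPACK linked in, a compiler that appends
an underscore to Fortran symbol names, and the function and variable
names (solve_saddle_point, ZW, etc.) are made up for the example.

// Sketch: solve [A T; T' 0] [x1; x2] = [b1; b2] via the Schur complement
// S = T' A^-1 T. A is n x n (passed by value because dgesv overwrites it),
// T is n x m, b1 has length n, b2 has length m. Column-major storage.
#include <vector>

extern "C" void dgesv_(const int* n, const int* nrhs, double* a, const int* lda,
                       int* ipiv, double* b, const int* ldb, int* info);

bool solve_saddle_point(int n, int m, std::vector<double> A,
                        const std::vector<double>& T,
                        const std::vector<double>& b1,
                        const std::vector<double>& b2,
                        std::vector<double>& x1, std::vector<double>& x2)
{
    // Step 1: one factorization of A solves A*[Z w] = [T b1].
    int nrhs = m + 1, info = 0;
    std::vector<double> ZW(T);                  // first m columns: T
    ZW.insert(ZW.end(), b1.begin(), b1.end());  // last column: b1
    std::vector<int> ipiv(n);
    dgesv_(&n, &nrhs, A.data(), &n, ipiv.data(), ZW.data(), &n, &info);
    if (info != 0) return false;
    const double* Z = ZW.data();            // Z = A^-1 T, n x m
    const double* w = ZW.data() + n * m;    // w = A^-1 b1, length n

    // Step 2: form S = T'Z (m x m) and rhs = T'w - b2 (length m).
    std::vector<double> S(m * m), rhs(m);
    for (int j = 0; j < m; ++j) {
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int k = 0; k < n; ++k) s += T[k + i * n] * Z[k + j * n];
            S[i + j * m] = s;
        }
        double r = 0.0;
        for (int k = 0; k < n; ++k) r += T[k + j * n] * w[k];
        rhs[j] = r - b2[j];
    }

    // Step 3: solve the small m x m system S x2 = rhs.
    int one = 1;
    std::vector<int> ipiv2(m);
    dgesv_(&m, &one, S.data(), &m, ipiv2.data(), rhs.data(), &m, &info);
    if (info != 0) return false;
    x2 = rhs;

    // Step 4: back-substitute x1 = A^-1 (b1 - T x2) = w - Z x2.
    x1.assign(n, 0.0);
    for (int k = 0; k < n; ++k) {
        double v = w[k];
        for (int j = 0; j < m; ++j) v -= Z[k + j * n] * x2[j];
        x1[k] = v;
    }
    return true;
}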

Olumide

Mar 26, 2009, 10:46:20 PM
to matrixprogramming
> With some manipulation (a Schur complement), writing x = [x1 x2]:
> ...
> from which one can obtain
>
> T'A^-1 T x2 = T'A^-1 b1 - b2
>
> Though it is hard to say whether this is better; it depends on the
> typical dimensions of A and T. What are they?

For an n-point problem, A would be of dimension (n x n), while T would
be of dimension (n x 3), so that the 0 is in fact a (3 x 3) matrix,
where n would be on average 500.

BTW, I found one prototype version of ScaLAPACK ( http://www.netlib.org/scalapack/prototype/
) that has out-of-core features. Do you know how I would go about
compiling it into a C/C++-callable library? I'm not entirely sure
LAPACK itself is designed to work as an out-of-core solver.

Mark Hoemmen

Mar 26, 2009, 11:00:20 PM
to matrixpr...@googlegroups.com
Olumide wrote:
>> With some manipulation (a Schur complement), writing x = [x1 x2]:
>> ...
>> from which one can obtain
>>
>> T'A^-1 T x2 = T'A^-1 b1 - b2
>>
>> Though it is hard to say whether this is better; it depends on the
>> typical dimensions of A and T. What are they?
>
> For an n-point problem, A would be of dimension (n x n), while T would
> be of dimension (n x 3), so that the 0 is in fact a (3 x 3) matrix,
> where n would be on average 500.

500 x 500 should still fit easily in core; why would you need an
out-of-core solver in that case? Are you running with exceptionally
tight memory requirements?

> BTW, I found one prototype version of ScaLAPACK ( http://www.netlib.org/scalapack/prototype/
> ) that has out-of-core features. Do you know how I would go about
> compiling it into a C/C++-callable library? I'm not entirely sure
> LAPACK itself is designed to work as an out-of-core solver.

That's ok; ScaLAPACK handles the out-of-core part. It's "prototype"
because I don't think it ever got integrated into the formal release of
ScaLAPACK. www.netlib.org/scalapack has plenty of build instructions,
which you should read carefully. I've never built out-of-core ScaLAPACK
so I can't help you there. You might want to poke around for some other
libraries besides that one.

mfh

Olumide

Mar 27, 2009, 4:40:23 AM
to matrixprogramming
> 500 x 500 should still fit easily in core; why would you need an
> out-of-core solver in that case?  Are you running with exceptionally
> tight memory requirements?

Actually, 500 x 500 is a very modest estimate that I'm trying to
achieve by trimming my data set, albeit at the expense of fidelity.
If I had my way, I'd be dealing with 50,000 x 50,000 matrices (sigh).

A few weeks ago, I tried to solve an 1800 x 1800 system with the GNU
GSL LU solver ( see http://www.gnu.org/software/gsl/manual/html_node/LU-Decomposition.html
) on my modest laptop. After a whole night of endless paging, I
terminated the solver. That alerted me to the need for an out-of-core
solver.

> That's ok; ScaLAPACK handles the out-of-core part.  It's "prototype"
> because I don't think it ever got integrated into the formal release of
> ScaLAPACK.  www.netlib.org/scalapack has plenty of build instructions,
> which you should read carefully.  I've never built out-of-core ScaLAPACK
> so I can't help you there.  You might want to poke around for some other
> libraries besides that one.

Which OOC solvers would you recommend?

Olumide

Mar 27, 2009, 4:45:50 AM
to matrixprogramming
BTW, I'm aware of the fast multipole method. Unfortunately, the
implementation that I'm aware of costs in excess of $3000. Also,
implementing this solver is anything but trivial.

Mark Hoemmen

Mar 27, 2009, 2:22:56 PM
to matrixpr...@googlegroups.com
Olumide wrote:
>> That's ok; ScaLAPACK handles the out-of-core part. It's "prototype"
>> because I don't think it ever got integrated into the formal release of
>> ScaLAPACK. www.netlib.org/scalapack has plenty of build instructions,
>> which you should read carefully. I've never built out-of-core ScaLAPACK
>> so I can't help you there. You might want to poke around for some other
>> libraries besides that one.
>
> Which OOC solvers would you recommend?

I'm more familiar with OOC QR solvers, as that was a subject of research
interest to me last year. I haven't investigated dense OOC LU solvers.

If you'd like, you can list whatever OOC LU solvers you find here, and
I'll ask around and see if they are any good. You might also like to
try the LAPACK forum for questions about OOC ScaLAPACK LU.

Best,
mfh

Mark Hoemmen

Mar 27, 2009, 2:27:16 PM
to matrixpr...@googlegroups.com

That's odd; I see a lot of free (as in beer) software out there when I
search online. Do you have an unusual application with specific
requirements?

mfh

Olumide

Mar 27, 2009, 5:31:13 PM
to matrixprogramming
No, I don't. It's just the multipole method/algorithm for which there
does not appear to be any free software. The only (commercial) version
I've found is marketed by a New Zealand company called Farfield
Technology: http://www.farfieldtechnology.com/products/toolbox/index.html

Do you know about the multipole method?

Olumide

Mar 27, 2009, 5:37:52 PM
to matrixprogramming
> > Which OOC solvers would you recommend?
>
> I'm more familiar with OOC QR solvers, as that was a subject of research
> interest to me last year.  I haven't investigated dense OOC LU solvers.

I have no reason to prefer LU over QR. I know so little about either
technique that I'm willing to try any OOC QR solver I can get.

> If you'd like, you can list whatever OOC LU solvers you find here, and
> I'll ask around and see if they are any good.  You might also like to
> try the LAPACK forum for questions about OOC ScaLAPACK LU.

I have registered at the LAPACK/ScaLAPACK forum, but it's got very
little traffic (and recently a mind-boggling amount of spam). The
problem with the prototype ScaLAPACK OOC solver
(http://www.netlib.org/scalapack/prototype/) is that I need to compile
it into a C-callable library for Windows.

BTW, I've recently had my attention turned toward the FLAME library:
http://www.cs.utexas.edu/users/flame/ , but I've yet to get to grips
with what it actually does, i.e. does it describe a solver or write
pseudo-code for one? How much programming would I have to do in order
to make use of a solver derived with FLAME, etc.?

Evgenii Rudnyi

Mar 28, 2009, 12:09:33 PM
to matrixpr...@googlegroups.com
Olumide wrote:

>> With some manipulation (a Schur complement), writing x = [x1 x2]:
>> ...
>> from which one can obtain
>>
>> T'A^-1 T x2 = T'A^-1 b1 - b2
>>
>> Though it is hard to say whether this is better; it depends on the
>> typical dimensions of A and T. What are they?
>
> For an n-point problem, A would be of dimension (n x n), while T would
> be of dimension (n x 3), so that the 0 is in fact a (3 x 3) matrix,
> where n would be on average 500.

I am not sure that it makes sense to eliminate T as above; it might
make sense only if the numerical properties of A are much better than
those of the whole matrix.

I have made a test for dgesv on my new notebook (64-bit, 4 GB of RAM,
MKL that uses 2 processors)

http://matrixprogramming.com/LAPACK/

It produces:

$ ./dgesv 1000
matrices allocated and initialised 0.078
dgesv is over for 0.14
info is 0
check is 1.1176e-010

$ ./dgesv 5000
matrices allocated and initialised 0.859
dgesv is over for 4.578
info is 0
check is 7.09765e-009

$ ./dgesv 10000
matrices allocated and initialised 3.218
dgesv is over for 34.875
info is 0
check is 4.79399e-008

$ ./dgesv 12000
matrices allocated and initialised 4.656
dgesv is over for 59.438
info is 0
check is 5.20303e-007

$ ./dgesv 13000
matrices allocated and initialised 5.39
dgesv is over for 75.313
info is 0
check is 4.43987e-007

$ ./dgesv 14000
matrices allocated and initialised 6.234
dgesv is over for 93.703
info is 0
check is 5.59363e-007

The memory usage is about 1.5 GB for the 10 K problem and 3 GB for the
14 K problem. With 15000 it starts thrashing.

> BTW, I found one prototype version of ScaLAPACK ( http://www.netlib.org/scalapack/prototype/
> ) that has out-of-core features. Do you know how I would go about
> compiling it into a C/C++-callable library? I'm not entirely sure
> LAPACK itself is designed to work as an out-of-core solver.

Note that the time above is the fastest. With out-of-core it will be
only slower. Also note that time grows as N^3 and memory requirement as
N^2. So, the chipest solution is just to buy a computer with 16 Gb of
RAM, then you could afford about 30-40 K matrices with about 800 s for a
run. With out-of-core it will take forever.
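
(As a rough check of these scalings: from the 34.875 s measured above
at N = 10000, an N = 30000 solve should take about 27 x 35 s, i.e. on
the order of 900-1000 s, and the matrix alone needs 8 x 30000^2 bytes,
about 7.2 GB.)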

Olumide

Mar 29, 2009, 6:55:58 PM
to matrixprogramming
> Note that the time above is the fastest. With out-of-core it will be
> only slower. Also note that time grows as N^3 and memory requirement as
> N^2. So, the ch[ea]pest solution is just to buy a computer with 16 Gb of
> RAM, then you could afford about 30-40 K matrices with about 800 s for a
> run. With out-of-core it will take forever.

Very practical advice, thanks.

Evgenii Rudnyi

Mar 30, 2009, 1:44:47 PM
to matrixpr...@googlegroups.com
Olumide wrote:

It was an interesting misprint indeed. Probably its hidden meaning was
that by buying more memory you will not only save your time but also
help the memory producers survive.

Olumide

Mar 30, 2009, 11:42:11 PM
to matrixprogramming
> I have made a test for dgesv on my new notebook (64-bit, 4 GB of RAM,
> MKL that uses 2 processors)
>
> http://matrixprogramming.com/LAPACK/

BTW, is dgesv.cpp clapack code or lapack++?

I'm still trying to figure out how to learn lapack.

Olumide

Mar 30, 2009, 11:59:32 PM
to matrixprogramming
Can you give any advice on optimizing BLAS? Is there such a thing as a
pre-optimized BLAS, or does BLAS have to be optimized for my system?

Also, there is ATLAS and there is goto-BLAS. Which would you
recommend?

Thanks.

Mark Hoemmen

Mar 31, 2009, 10:46:16 AM
to matrixpr...@googlegroups.com
On Mon, Mar 30, 2009 at 20:59, Olumide <50...@web.de> wrote:
> Can you give any advice on optimizing BLAS? Is there such a thing as a
> pre-optimized BLAS, or does BLAS have to be optimized for my system?

Um, both? You install an optimized BLAS library. At installation
time, it may perform some optimizations for your particular machine.
Some BLAS libraries (Intel's MKL, for example) come with different
binaries for different machines already pre-built. Others (Goto's
BLAS, for example) try to guess what kind of machine you have, and
then compile that version (it comes as source code). Others (such as
ATLAS) don't try as hard to guess your machine type, but instead do
automatic search over a space of different algorithms at install time.
(In fact, modern versions of ATLAS also come with previously tuned
versions of the BLAS for common machines, in order to reduce the
cost of the search process.)

> Also, there is ATLAS and there is goto-BLAS. Which would you
> recommend?

Goto's BLAS is very good, but it is strictly for non-commercial use.
If you plan to make money from your work, then you should either use
ATLAS or purchase a commercial license for another BLAS library. If
you are doing academic work, then Goto's BLAS is perhaps the best you
can find (at least it was the last time I checked). However, "best"
nowadays means something like under 10% better performance, so you
might prefer to use a library that's already installed on your
machine, especially if you are running on HPC hardware. I wouldn't
use "libblas" on a Linux machine, though, unless you know it's good;
it's often just a build of the reference implementation of the BLAS
and is dreadfully slow.

mfh

Olumide

Mar 31, 2009, 2:06:55 PM
to matrixprogramming
On 31 Mar, 15:46, Mark Hoemmen <mark.hoem...@gmail.com> wrote:
> On Mon, Mar 30, 2009 at 20:59, Olumide <50...@web.de> wrote:
> > Can you give any advice on optimizing BLAS? Is there such a thing as a
> > pre-optimized BLAS, or does BLAS have to be optimized for my system?
>
> Um, both?  You install an optimized BLAS library.  At installation
> time, it may perform some optimizations for your particular machine.
> Some BLAS libraries (Intel's MKL, for example) come with different
> binaries for different machines already pre-built.  
> ...
> Goto's BLAS is very good, but it is strictly for non-commercial use.

Thanks Mark. My work is non-commercial, i.e. university research.

The problem I find is that most projects' build systems are targeted
at Linux and UNIX machines, not Visual Studio. Some support Cygwin, but
I'm not sure if a library compiled with Cygwin can be used with Visual
Studio.

Also, when you say optimizations for a particular machine, do you
mean processor type, or the combination of processor type, installed
memory, etc.? If the former is the case, I'd gladly use Intel's MKL.
I'm sure there's an Intel BLAS lib file somewhere on my file system.

Evgenii Rudnyi

Mar 31, 2009, 2:30:25 PM
to matrixpr...@googlegroups.com
Olumide wrote:

>> I have made a test for dgesv on my new notebook (64-bit, 4 GB of RAM,
>> MKL that uses 2 processors)
>>
>> http://matrixprogramming.com/LAPACK/
>
> BTW, is dgesv.cpp clapack code or lapack++?

No, it uses the Fortran interface directly. In my view, it is the
simplest. Both clapack and lapack++ are wrappers around the Fortran
version, so they are an extra level to learn.

One problem with the Fortran interface is that different Fortran
compilers write function names differently, but usually one can
influence this with compiler flags. So the trick is just to figure out
what your Fortran compiler does. A few words about it are at

http://matrixprogramming.com/Tools/UsingFortranFromC++.html
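
For example, one way to declare dgesv so that it can be switched
between the two common manglings looks like this. (This is a sketch:
the preprocessor guard FORTRAN_UPPERCASE_NO_UNDERSCORE is a made-up
name; which branch you need depends on your Fortran compiler and its
flags.)

// Hypothetical shim for calling Fortran LAPACK from C++; the actual
// symbol name depends on how your Fortran compiler mangles names.
#ifdef FORTRAN_UPPERCASE_NO_UNDERSCORE  // e.g. Intel Fortran on Windows (default)
  #define LAPACK_DGESV DGESV
#else                                   // e.g. g77/gfortran: lowercase + '_'
  #define LAPACK_DGESV dgesv_
#endif

extern "C" void LAPACK_DGESV(const int* n, const int* nrhs, double* a,
                             const int* lda, int* ipiv, double* b,
                             const int* ldb, int* info);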

> I'm still trying to figure out how to learn lapack.

In my view, the LAPACK Users' Guide is pretty straightforward:

http://www.netlib.org/lapack/lug/

If we speak about the solution of a linear system, it is here:

http://www.netlib.org/lapack/lug/node26.html

When you find an appropriate function, its documentation is in the code,
for example

http://www.netlib.org/lapack/double/dgesv.f

Evgenii Rudnyi

Mar 31, 2009, 2:37:41 PM
to matrixpr...@googlegroups.com
Olumide wrote:
...

> The problem I find is that most projects' build systems are targeted
> at Linux and UNIX machines, not Visual Studio. Some support Cygwin, but
> I'm not sure if a library compiled with Cygwin can be used with Visual
> Studio.

Yes, provided you compile with -mno-cygwin. You will find an example of
how to use UMFPACK libraries compiled with gcc under Cygwin from Visual
Studio at

http://matrixprogramming.com/UMFPACK/#Using

Alternatively, one has to edit the makefiles a bit in order to use cl
with GNU Make - cl flags are not completely compatible with Unix
compilers. Some examples of what should be done are on my website.

> Also, when you say optimizations for a particular machine, do you
> mean processor type, or the combination of processor type, installed
> memory, etc.? If the former is the case, I'd gladly use Intel's MKL.
> I'm sure there's an Intel BLAS lib file somewhere on my file system.

MKL is not free. Yet, it is quite good. A free alternative (not speaking
about ATLAS and GotoBLAS) is AMD's ACML; they also claim that the
library chooses the optimized code at run time.

Mark Hoemmen

Mar 31, 2009, 3:46:22 PM
to matrixpr...@googlegroups.com
Olumide wrote:
> The problem I find is that most projects' build systems are targeted
> at Linux and UNIX machines, not Visual Studio. Some support Cygwin, but
> I'm not sure if a library compiled with Cygwin can be used with Visual
> Studio.

I personally consider compiling with Microsoft's compiler a waste of
your time if you are an academic, but you may have a particular
workflow that necessitates it. You can also use Intel's C(++) compiler
with Visual Studio, which would make linking to Fortran libraries much
easier (assuming they are built with Intel's Fortran compiler). Both
compilers are available free-as-in-beer for noncommercial use.

> Also, when you say optimizations for a particular machine, do you
> mean processor type, or the combination of processor type, installed
> memory, etc.?

Most of this information is available online, so I won't explain. Just
use whatever optimized BLAS you manage to download and install; it
shouldn't make much difference unless you have sophisticated
requirements. If you want to learn more, read the websites of the
various BLAS implementations.

> If the former is the case, I'd gladly use Intel's MKL.
> I'm sure there's an Intel BLAS lib file somewhere on my file system.

It might be there if you have Matlab installed. Otherwise you will have
to download a copy. It's also free-as-in-beer for noncommercial use.

mfh

Olumide

Mar 31, 2009, 6:16:01 PM
to matrixprogramming
> > BTW, is dgesv.cpp clapack code or lapack++?
>
> No, it uses the Fortran interface directly. In my view, it is the
> simplest. Both clapack and lapack++ are wrappers to the Fortran version,
> so it is extra level to learn.
> ...
> http://matrixprogramming.com/Tools/UsingFortranFromC++.html

Thanks for putting up that page. It's very informative. However,
although I'm not done reading the page, I've already started to
wonder whether Microsoft's compiler cl.exe can handle such
compilations. Unfortunately, the version of the Intel Fortran compiler
that I have will expire in a few days. In my experience, very few (if
any) of Intel's compiler suites and tools are free for the Windows
environment.

Olumide

Mar 31, 2009, 6:52:30 PM
to matrixprogramming
> I personally consider compiling with Microsoft's compiler a waste of
> your time if you are an academic, but you may have a particular work
> flow that necessitates it.  You can also use Intel's C(++) compiler with
> Visual Studio, which would make linking to Fortran libraries much easier
> (assuming they are built with Intel's Fortran compiler). Both compilers
> are available free-as-in-beer for noncommercial use.

I'm working on Windows, and as I guessed, the Intel C++ compiler is
not free :( . If only I'd started out with Linux instead of Windows --
my life would be easier. It's too late to migrate all my work to Linux.
I've got an October deadline for my research :( .

> > Also, when you say optimizations for a particular machine, do you
> > mean processor type, or the combination of processor type, installed
> > memory, etc.?
>
> Most of this information is available online, so I won't explain.

Most places I've looked appear to gloss over this one aspect. Can you
please confirm if the optimization is for the following:

(a.) processor type, OR
(b.) processor type and installed memory combo

Thanks.

Mark Hoemmen

Mar 31, 2009, 7:12:15 PM
to matrixpr...@googlegroups.com
On Tue, Mar 31, 2009 at 15:52, Olumide <50...@web.de> wrote:
>> I personally consider compiling with Microsoft's compiler a waste of
>> your time if you are an academic, but you may have a particular work
>> flow that necessitates it.  You can also use Intel's C(++) compiler with
>> Visual Studio, which would make linking to Fortran libraries much easier
>> (assuming they are built with Intel's Fortran compiler). Both compilers
>> are available free-as-in-beer for noncommercial use.
>
> I'm working on Windows, and as I guessed, the Intel C++ compiler is
> not free :( . If only I'd started out with Linux instead of Windows --
> my life would be easier. It's too late to migrate all my work to Linux.
> I've got an October deadline for my research :( .

Cygwin is your friend!

>> > Also, when you say optimizations for a particular machine, do you
>> > mean processor type, or the combination of processor type, installed
>> > memory, etc.?
>>
>> Most of this information is available online, so I won't explain.
>
> Most places I've looked appear to gloss over this one aspect. Can you
> please confirm if the optimization is for the following:
>
> (a.) processor type, OR
> (b.) processor type and installed memory combo

If you're installing a previously built library, it should be
optimized for the processor(s) and generally also for their caches.
If you install ATLAS and let it do its automatic search on your machine
(this can take several hours), its optimization process will take your
entire machine into account. Usually BLAS 3 optimizations don't have
much to do with main memory (DRAM), though, so you shouldn't worry
about it. BLAS 1 and 2 operations are generally bound in performance
by main memory bandwidth, unless the operands fit in cache. Your
machine won't have enough memory bandwidth to keep the processor busy
anyway, no matter what kind of DRAM you have installed, unless it's
some weird superconducting stuff you got from a little green space
alien, or you're running on an embedded processor that's very slow.
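
To see why, count flops against bytes: a dot product (BLAS 1) performs
2n flops while streaming about 16n bytes (two vectors of doubles), so
with, say, 5 GB/s of memory bandwidth you cannot exceed roughly 0.6
GFlop/s no matter how fast the processor is. DGEMM (BLAS 3) performs
2n^3 flops on only 24n^2 bytes of matrix data, so for large n the
arithmetic dominates and a good implementation can run near the
processor's peak.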

I know I shouldn't be impatient! Evgenii is much nicer than I am
about patiently explaining things to inexperienced people.

mfh

Alejandro A. Ortiz Bernardin

Mar 31, 2009, 8:43:20 PM
to matrixpr...@googlegroups.com

>> I personally consider compiling with Microsoft's compiler a waste of
>> your time if you are an academic, but you may have a particular work
>> flow that necessitates it. You can also use Intel's C(++) compiler with
>> Visual Studio, which would make linking to Fortran libraries much easier
>> (assuming they are built with Intel's Fortran compiler). Both compilers
>> are available free-as-in-beer for noncommercial use.

Why do you think that? I find MVS much easier than even Cygwin.
Also, with the Mono project (http://www.mono-project.com/Main_Page), any
.NET application that you have built in MVS can be built on Unix, Mac,
etc. with little effort. I use both GNU under Linux and MVS, and I
prefer MVS ... but Linux rather than Windows.

Alejandro A. Ortiz Bernardin

Mar 31, 2009, 8:49:43 PM
to matrixpr...@googlegroups.com

>> I'm working on Windows, and as I guessed, the Intel C++ compiler is
>> not free :( .

You can purchase MVS 2008 Professional Edition for $130 for academic
purposes. It is not a limited edition; it is the full professional
edition with all capabilities. Also, you can use the Express Edition
for free, and I think that will be all that you need.

>> If only I'd started out with Linux instead of Windows --
>> my life would be easier. Its too late to migrate all my work to Linux.
>> I've got an October deadline for my research :( .

I don't think it would be too difficult to migrate to Linux ... as I
said in another e-mail, I usually do all my work in MVS, and when I
want to build a Linux application, I just use the GNU compiler and all
works fine. For your reference, I consider myself a very basic user of
Linux, so it would be easy for you to try Linux as well.

Olumide

Mar 31, 2009, 10:34:37 PM
to matrixprogramming
> I don't think it would be too difficult to migrate to Linux ...

Erm ... I'm talking about 4½ years of code, libraries, and apps. I'm in
too deep to migrate, especially with the looming deadline. I'm
literally counting the days like a prisoner.

Evgenii Rudnyi

Apr 1, 2009, 4:25:10 PM
to matrixpr...@googlegroups.com

There is good integration between MS VC and Intel Fortran. It is just
that Intel Fortran on Windows by default converts Fortran function
names to uppercase and does not add an underscore. So, in the LAPACK
example you will see that the Fortran function was written this way. I
have compiled that example with cl.

Mark Hoemmen

Apr 2, 2009, 10:18:18 AM
to matrixpr...@googlegroups.com
Alejandro A. Ortiz Bernardin wrote:
>>> I personally consider compiling with Microsoft's compiler a waste of
>>> your time if you are an academic, but you may have a particular work
>>> flow that necessitates it. You can also use Intel's C(++) compiler with
>>> Visual Studio, which would make linking to Fortran libraries much easier
>>> (assuming they are built with Intel's Fortran compiler). Both compilers
>>> are available free-as-in-beer for noncommercial use.
>
> Why do you think that? I find MVS too much easier than using even cygwin.

Maybe it's just a matter of habit for me, but I have to develop a lot of
cross-platform HPC code, and it's much more trouble for me to port from
the MS toolchain than it is to assume a generic POSIX toolchain and port
to Cygwin from there. The latter approach lets me both develop and
deploy anywhere, and I can use whatever development toolchain I like.

I also got burnt many times by Visual Studio 6's poor support for C++
templates -- even though that was a long time ago, I feel like I can't
trust their compiler (not only did a lot of template code not compile,
but it also generated incorrect code once).

> Also, with the Mono project (http://www.mono-project.com/Main_Page), any
> .NET application that you have built in MVS can be built on Unix, Mac,
> etc. with little effort. I use both GNU under Linux and MVS, and I
> prefer MVS ... but Linux rather than Windows.

That's something I'll have to check out. I've heard many good things
about .NET.

mfh

Olumide

Apr 3, 2009, 4:58:50 PM
to matrixprogramming
I've tried to compile ATLAS, but the documentation quickly got out of
hand. I even sat down with a colleague and tried to work through it,
but we were unable to get past the step "Tell config what compilers
and flags to use" on the following page: http://math-atlas.sourceforge.net/errata.html#WinComp
. We were unable to figure out where "-C if ifort", "-C ic cl", and
"-Si nocygwin 1" go. We tried adding them to the config script, but
that didn't work.
So, because Intel's MKL isn't free, and I've got an Intel processor
(which rules out AMD's ACML), I tried compiling Goto BLAS instead, and
it seemed more straightforward, i.e. running make in the source
directory produced the lib file libgoto_northwood-r1.26.a and a soft
link libgoto.a to it. I assume I'm supposed to rename both *.a files
to *.lib. Also, running make in the test directory produced the files:

CBLAT2.SUMM DBLAT2.SUMM SBLAT2.SUMM ZBLAT2.SUMM
CBLAT3.SUMM DBLAT3.SUMM SBLAT3.SUMM ZBLAT3.SUMM

The fact that Goto BLAS knows that my processor is a Northwood makes me
think it's successfully produced the optimized BLAS I need. Is this the
case?

Olumide

Apr 3, 2009, 5:09:43 PM
to matrixprogramming
> The fact that Goto BLAS knows that my processor is a Northwood makes me
> think it's successfully produced the optimized BLAS I need. Is this the
> case?

Changed "and" in subject to "an".

BTW, from what I've read online, Goto BLAS outperforms most BLAS
implementations.

Mark Hoemmen

Apr 3, 2009, 6:08:01 PM
to matrixpr...@googlegroups.com
Olumide wrote:
>> The fact that Goto BLAS knows that my processor is a Northwood makes me
>> think it's successfully produced the optimized BLAS I need. Is this the
>> case?

Benchmark DGEMM on large matrices (say 1000 x 1000) and see whether it
performs close to the peak flop rate of the machine!
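
A rough sketch of such a benchmark, assuming your BLAS exports the
usual Fortran-style dgemm_ symbol:

// Rough DGEMM benchmark: time C = A*B for n = 1000 and report GFlop/s.
// The hidden Fortran string-length arguments are omitted here, which
// works with the common calling conventions for single-character
// string arguments.
#include <chrono>
#include <cstdio>
#include <vector>

extern "C" void dgemm_(const char* transa, const char* transb,
                       const int* m, const int* n, const int* k,
                       const double* alpha, const double* a, const int* lda,
                       const double* b, const int* ldb,
                       const double* beta, double* c, const int* ldc);

int main() {
    const int n = 1000;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    const double alpha = 1.0, beta = 0.0;

    auto t0 = std::chrono::steady_clock::now();
    dgemm_("N", "N", &n, &n, &n, &alpha, A.data(), &n, B.data(), &n,
           &beta, C.data(), &n);
    auto t1 = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(t1 - t0).count();
    // A matrix-matrix multiply costs about 2*n^3 floating-point operations.
    std::printf("%g s, %.2f GFlop/s\n", secs, 2.0 * n * n * n / secs / 1e9);
    return 0;
}

Compare the printed rate against your machine's theoretical peak
(flops per cycle x clock rate x number of cores).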

> BTW, from what I've read online, Goto BLAS outperforms most BLAS
> implementations.

It's also very easy to install. In fact I just did it a few days ago --
it takes maybe 1 min to read the installation guide and a couple minutes
just to convince yourself that the default choices made by the
installation script are sensible. I haven't tried installing on Windows
but I imagine with Cygwin it would be equally easy.

mfh

Evgenii Rudnyi

Apr 4, 2009, 3:26:38 AM
to matrixpr...@googlegroups.com
Olumide wrote:
...

> So, because Intel's MKL isn't free, and I've got an Intel processor
> (which rules out AMD's ACML)

AMD's ACML works on Intel processors as well. Yet, I do not know how
effective it is in that case. Well, in the case of DGEMM it was slightly
better than ATLAS 3.6 on some Intel processor, so it should not be that
bad.

Evgenii Rudnyi

Apr 4, 2009, 1:32:56 PM
to matrixpr...@googlegroups.com
Evgenii Rudnyi wrote:

I have tried dgesv with AMD's ACML 64-bit on my notebook (Intel Core 2
Duo T9600):

http://matrixprogramming.com/LAPACK/

It turned out that in this case ACML is a bit slower than MKL for bigger
matrices -- by about 50%. Yet, as I said, it works and is free.

Olumide

Apr 4, 2009, 2:38:44 PM
to matrixprogramming

> I have tried dgesv with AMD's ACML 64-bit on my notebook (Intel Core 2
> Duo T9600):
>
> http://matrixprogramming.com/LAPACK/
>
> It turned out that in this case ACML is a bit slower than MKL for bigger
> matrices -- by about 50%. Yet, as I said, it works and is free.

If only for the sake of completeness, have you considered testing with
Goto BLAS and the default (unoptimized) BLAS?

Evgenii Rudnyi

Apr 4, 2009, 3:05:14 PM
to matrixpr...@googlegroups.com
Olumide wrote:

I have LAPACK and BLAS compiled as-is from Netlib - so I was able to do
this quickly. Now, for one processor (the reference BLAS is not
multithreaded) and a matrix of dimension 5000: LAPACK + reference BLAS
48 s, MKL 8 s and ACML 14 s. So, there is indeed a visible difference
between unoptimized and optimized BLAS. For a matrix of 1000 the times
were: LAPACK + reference BLAS 0.4 s, MKL 0.09 s and ACML 0.14 s.

I do plan to try Goto BLAS and ATLAS (I have not compiled the latter
for a long time, so it is time to try the newest version), but probably
only in a couple of weeks.

Olumide

Apr 4, 2009, 6:47:06 PM
to matrixprogramming
> I have LAPACK and BLAS compiled as-is from Netlib - so I was able to do
> this quickly. Now, for one processor (the reference BLAS is not
> multithreaded) and a matrix of dimension 5000: LAPACK + reference BLAS
> 48 s, MKL 8 s and ACML 14 s. So, there is indeed a visible difference
> between unoptimized and optimized BLAS. For a matrix of 1000 the times
> were: LAPACK + reference BLAS 0.4 s, MKL 0.09 s and ACML 0.14 s.

I like those numbers :-) .

Even the performance of the reference BLAS seems encouraging at 48
secs for a 5000 by 5000 system! ... Perhaps I ought to conduct my own
tests.

> I do plan to try Goto BLAS and ATLAS (I have not compiled the latter
> for a long time, so it is time to try the newest version), but probably
> only in a couple of weeks.

It's trivial to compile. Even a n00b like me found it terribly easy.
Almost no changes are required to the Makefile.rule file.

Evgenii Rudnyi

Apr 5, 2009, 3:11:55 AM
to matrixpr...@googlegroups.com
Olumide wrote:

>> I have LAPACK and BLAS compiled as-is from Netlib - so I was able to do
>> this quickly. Now, for one processor (the reference BLAS is not
>> multithreaded) and a matrix of dimension 5000: LAPACK + reference BLAS
>> 48 s, MKL 8 s and ACML 14 s. So, there is indeed a visible difference
>> between unoptimized and optimized BLAS. For a matrix of 1000 the times
>> were: LAPACK + reference BLAS 0.4 s, MKL 0.09 s and ACML 0.14 s.
>
> I like those numbers :-) .
>
> Even the performance of the reference BLAS seems encouraging at 48
> secs for a 5000 by 5000 system! ... Perhaps I ought to conduct my own
> tests.

Sure, this is the best way to go - just run it. Two more notes.

First, the comparison of optimized BLAS to reference BLAS depends
heavily on the memory clock. The numbers above are for a good computer.
You may want to look at Table 1 for matrix multiply to compare with,
let us say, home computers:

http://matrixprogramming.com/MatrixMultiply/#Conclusion

There the difference could be even bigger.

Second, LAPACK by default uses block algorithms. This means that even
with the reference BLAS, the performance is much better than that of a
naive LU implementation. One can compare dgetrf (the BLAS-3 blocked LU)
with dgetf2 (the BLAS-2 unblocked LU) in LAPACK with the same reference
BLAS. I have such numbers for an older computer and a 2000 x 2000
matrix:

dgetf2 13.6 s
dgetrf 4.2 s

One can see a huge speedup even before using an optimized BLAS.
Conversely, if one runs dgetf2 (or an LU implementation that is not
block-based), then an optimized BLAS will hardly help.
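
If you want to reproduce this comparison, a small sketch could look as
follows (again assuming Fortran-style symbols with trailing underscores;
dgetf2 is an internal LAPACK routine but is callable in the same way):

// Sketch: compare LAPACK's blocked LU (dgetrf, BLAS 3) with the
// unblocked dgetf2 (BLAS 2) on the same size of random matrix.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>

extern "C" {
void dgetrf_(const int* m, const int* n, double* a, const int* lda,
             int* ipiv, int* info);
void dgetf2_(const int* m, const int* n, double* a, const int* lda,
             int* ipiv, int* info);
}

static double time_lu(bool blocked, int n) {
    std::vector<double> A(static_cast<std::size_t>(n) * n);
    for (std::size_t i = 0; i < A.size(); ++i)
        A[i] = std::rand() / (RAND_MAX + 1.0);  // random, well-scaled entries
    std::vector<int> ipiv(n);
    int info = 0;
    auto t0 = std::chrono::steady_clock::now();
    if (blocked) dgetrf_(&n, &n, A.data(), &n, ipiv.data(), &info);
    else         dgetf2_(&n, &n, A.data(), &n, ipiv.data(), &info);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const int n = 2000;
    std::printf("dgetf2 (BLAS 2): %.1f s\n", time_lu(false, n));
    std::printf("dgetrf (BLAS 3): %.1f s\n", time_lu(true, n));
    return 0;
}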
