I am getting unexpectedly fast results from MATLAB's matrix multiplication, and I am wondering whether it is doing something besides straight multiplication, such as using Strassen's algorithm.
I am benchmarking the matrix multiply at over 30 GFLOPS on my CPU, even though I am running only one thread (using -singleCompThread at startup) and my CPU clock is 2.25 GHz. Since I am using single precision, SSE instructions allow 4 multiplies or adds per instruction, which gives 9 GFLOPS. Most CPUs can execute multiple instructions per cycle; however, I know from my own tests, and from consulting the Intel optimization manual
http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf (Tables C-10 and C-10a), that my CPU can issue at most 2 floating-point instructions per cycle. This puts the theoretical maximum at 18 GFLOPS, yet MATLAB seems to give much higher performance. How can this be?
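Spelled out, the bound above is just a product of three factors (a sketch using the figures from this question; the 2-instructions-per-cycle number is my reading of the Intel tables, so adjust the constants for your own CPU):

    % Back-of-envelope single-thread peak for this machine
    clockHz = 2.25e9;      % CPU clock
    simdLanes = 4;         % single-precision floats per 128-bit SSE register
    fpInstrPerCycle = 2;   % e.g. one SSE multiply + one SSE add per cycle
    peakGFlops = clockHz * simdLanes * fpInstrPerCycle / 1e9   % = 18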
Below is the code I tested with:
sz = 2.^(0:12);
for s = 1:numel(sz)
    A = rand(sz(s), sz(s), 'single');
    tic;
    A'*A;
    t = toc;
    fprintf('sz=%d, time=%3.3gms, GFlops=%3.3g\n', sz(s), t*1000, 2*size(A,1)^3/1e9/t)
end
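A single tic/toc measurement can be noisy, and the first run in particular pays one-off warm-up costs. If your MATLAB version has timeit (which calls the function several times and returns a median), a steadier measurement of one size would be something like this sketch:

    % More robust timing of a single size using timeit
    n = 1024;
    A = rand(n, n, 'single');
    t = timeit(@() A'*A);   % median over several runs, warm-up excluded
    fprintf('sz=%d, time=%3.3gms, GFlops=%3.3g\n', n, t*1000, 2*n^3/1e9/t)

The shape of the results is the same either way, so the numbers below are from the tic/toc loop as written.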
OUTPUT
sz=1, time=0.0457ms, GFlops=4.37e-005
sz=2, time=0.0199ms, GFlops=0.000803
sz=4, time=0.00634ms, GFlops=0.0202
sz=8, time=0.0086ms, GFlops=0.119
sz=16, time=0.0086ms, GFlops=0.952
sz=32, time=0.029ms, GFlops=2.26
sz=64, time=0.0503ms, GFlops=10.4
sz=128, time=0.268ms, GFlops=15.7
sz=256, time=1.61ms, GFlops=20.9
sz=512, time=10.8ms, GFlops=24.8
sz=1024, time=76.4ms, GFlops=28.1
sz=2048, time=572ms, GFlops= 30
sz=4096, time=4.36e+003ms, GFlops=31.5