Runtime performance for 64x64 and 96x96 matrices

Stephen J. Gaffigan

Dec 2, 2011, 11:43:54 PM

to efficient-java-mat...@googlegroups.com

Hello,

I've been enjoying the runtime performance of EJML, which in an application I've been working on results in a speedup of 3-4 times compared to Jama. I've been looking at CPU timing results with varying square matrix sizes. While I have run JMatBench on my local system for more complete and accurate runtime performance results, I'm also running these few extra tests to evaluate the overhead of an interface I'm using.

I'm seeing a consistent spike in CPU time for both addition and multiplication with EJML at matrix sizes of 64x64 and 96x96. I also see a smaller spike in Jama at 96x96. The spike persists even when taking the mean CPU time over 1000 operations. Does anyone know why this occurs? An example code snippet is below and an image is attached, though I'm not sure the image will make it through.

DescriptiveStatistics stats = new DescriptiveStatistics(); // commons-math
for (int j=0; j<reps+WARM_UP; j++) { // reps=1000, WARM_UP=100
   DenseMatrix64F matrix = new DenseMatrix64F(n,n);
   CommonOps.set(matrix,generator.nextDouble());
   DenseMatrix64F matrix2 = new DenseMatrix64F(n,n);
   CommonOps.set(matrix2,generator.nextDouble());

   long tm0 = plogger.getCurrentTime(); // Returns CPU time through ThreadMXBean
   DenseMatrix64F ret = new DenseMatrix64F(matrix.getNumRows(),matrix.getNumCols());
   CommonOps.add(matrix,matrix2,ret);
   long tm1 = plogger.getCurrentTime();
   if (j>=WARM_UP) {
      stats.addValue((tm1-tm0)/1.e6);
   }
}
return stats.getMean();

Thanks for the fast library.

Steve

plus.png

Peter Abeles

Dec 5, 2011, 8:09:32 AM

to efficient-java-mat...@googlegroups.com

The most likely culprit is the CPU's cache. You tend to see strange
non-linear behavior when the working set crosses a cache-size boundary.
The reason it affected Jama less than EJML is that Jama has more
overhead, so the cache misses got washed out a bit.
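
As a rough check of the cache explanation, the arithmetic can be sketched as follows. An n x n matrix of doubles takes n*n*8 bytes of element data, and the add in the snippet touches three such matrices. The 32 KiB and 256 KiB cache sizes are assumptions typical of desktop CPUs of that period, not measurements of the poster's machine, and the class and method names are hypothetical.

```java
// Hypothetical helper: estimates how much element data a matrix add touches.
final class WorkingSet {
    static final int BYTES_PER_DOUBLE = 8;

    /** Bytes occupied by `matrices` dense n x n matrices of doubles (element data only). */
    static long workingSetBytes(int n, int matrices) {
        return (long) n * n * BYTES_PER_DOUBLE * matrices;
    }

    public static void main(String[] args) {
        // Assumed cache sizes; check the actual CPU before drawing conclusions.
        long l1Bytes = 32L * 1024;
        long l2Bytes = 256L * 1024;
        for (int n : new int[] {32, 48, 64, 96, 128}) {
            long one = workingSetBytes(n, 1);
            long three = workingSetBytes(n, 3); // two inputs plus the result
            System.out.printf("n=%d one=%d KiB three=%d KiB fitsL1=%b fitsL2=%b%n",
                    n, one / 1024, three / 1024, three <= l1Bytes, three <= l2Bytes);
        }
    }
}
```

Under these assumptions a single 64x64 matrix is exactly 32 KiB, the size of the assumed L1 data cache, and three 96x96 matrices total 216 KiB, close to the assumed L2 size. Power-of-two row lengths such as 64 can also cause cache-set conflicts, which this sketch does not model.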

CommonOps.add() is a very simple function with little overhead and close
to optimal. It might become a bit faster if function calls were
removed, but I have a vague memory of doing that and finding it made
little difference.
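
To illustrate why there is little left to optimize in an element-wise add, here is a minimal sketch of such a loop. It is not EJML's source. It assumes the matrices are stored as flat row-major double[] arrays, and the class and method names are hypothetical.

```java
// Hypothetical sketch of an element-wise add over flat arrays; not EJML code.
final class FlatAdd {
    /** Computes c[i] = a[i] + b[i] for every element; all arrays must have equal length. */
    static void add(double[] a, double[] b, double[] c) {
        if (a.length != b.length || a.length != c.length) {
            throw new IllegalArgumentException("array lengths must match");
        }
        // One pass, one load from each input and one store per element.
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int n = 64;
        double[] a = new double[n * n];
        double[] b = new double[n * n];
        double[] c = new double[n * n]; // allocated once, outside any timed region
        java.util.Arrays.fill(a, 1.5);
        java.util.Arrays.fill(b, 2.0);
        add(a, b, c);
        System.out.println(c[0]);
    }
}
```

With a loop this simple the running time is dominated by memory traffic, which is consistent with cache effects showing up clearly in the measurements.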