10x speed increase after first few multiplications, but first few calls to MatrixMatrixMult.mult_small are slow?


Russell Butler

Dec 6, 2016, 3:05:30 PM
to efficient-java-matrix-library-discuss
Hey all, I'm working on a physics engine for an Android game right now. I want the highest possible fps from the physics engine, so I switched from OpenGL's matrix multiplication function to EJML to transform the vertices on my objects' collision frames. (These are not the same as the OpenGL shader vertices, so I'm not computing them on the GPU. They're basically points along the outline of every object in my game that are used for collision detection, and they have to be updated by matrix multiplication every frame.)

Anyway, I was running some benchmarks to see how much of a speedup I could get using EJML, and I noticed that the first few matrix multiply calls (all I'm using right now is MatrixMatrixMult.mult_small) are quite slow, though still 5x faster than my multiplyMM OpenGL function. On later calls to the same multiplication function I get an almost 10x speedup. Here is an example of running the same multiplication 10 times in a loop (time is in ms, for a 4x4 matrix and a 100000x4 matrix):

time = 23
time = 4
time = 4
time = 3
time = 3
time = 3
time = 3
time = 3
time = 3
time = 3

The problem is, when I put this MatrixMatrixMult.mult_small call into my game loop, I only get the performance of the first iteration (time = 23 ms), presumably due to CPU caching...

So I was wondering whether it's possible to somehow get the faster multiplications on the first iteration. Or is this simply not possible, and I should try to break up my multiplications in some other way to take advantage of caching? It would be nice to have the faster times (3 ms), because that would let me run almost 10x as many colliding objects in my engine...

Here is the code I'm using to benchmark:

import org.ejml.data.DenseMatrix64F;
import org.ejml.alg.dense.mult.* ; 

public class MatrixTest {

    public static void main(String[] args){
        DenseMatrix64F A = new DenseMatrix64F(4,4,true,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1) ;
        int length = 100000 ;
        DenseMatrix64F B = new DenseMatrix64F(length,4) ;
        int dataCount = 0 ;
        double[] data = new double[length*4] ;
        for(int i=0;i<length;i++)
            for(int j=0;j<4;j++){
                data[dataCount] = Math.random() ;
                dataCount++ ;
            }
        B.setData(data) ;
        for(int i=0;i<10;i++){
            DenseMatrix64F C = new DenseMatrix64F(length,4) ;       
            long startTime = System.nanoTime();
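            // EJML puts the output last: mult_small(a, b, c) computes c = a*b.
            // So this is C (length x 4) = B (length x 4) * A (4 x 4); the original
            // argument order (C,A,B) overwrote B with the all-zero product C*A.
            MatrixMatrixMult.mult_small(B,A,C) ;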
            long endTime = System.nanoTime();
            long duration = (endTime - startTime);
            System.out.println("time = " + duration/1000000) ;
            //double[] d = new double[10000000] ;
            //for(int j=0;j<d.length;j++)
            //    d[j] +=i+j ;
            //System.out.println(d[100000]) ;
        }
        /*
        A.print();
        System.out.println();
        A.print("%e");
        System.out.println();
        A.print("%10.2f");
        */
       
    }
}

Peter A

Dec 7, 2016, 4:22:39 PM
to efficient-java-mat...@googlegroups.com
That might actually be the HotSpot JIT optimizer in action. One issue with microbenchmarks is that they can exaggerate how much of a speed boost you get in practice. I think HotSpot has a limited budget for optimizing, so unless a piece of code is called a whole bunch of times it isn't optimized, and it might only have enough memory for a limited number of optimizations. That's just from observing how it works; I'm not really sure what's going on.
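
One way to check whether warm-up explains the numbers is to run the kernel a few dozen times before starting the clock. The following is only a minimal sketch in plain Java, not EJML code: the class WarmupTiming, its methods, and the warm-up count of 50 are all made up for illustration, and the flat row-major arrays stand in for the matrices from the benchmark.

```java
// Hypothetical stand-alone sketch; none of these names come from EJML.
final class WarmupTiming {

    /** out = points * m, where points is n x 4 and m is 4 x 4, both row-major. */
    static void transform(double[] points, double[] m, double[] out) {
        int n = points.length / 4;
        for (int i = 0; i < n; i++) {
            int row = i * 4;
            for (int j = 0; j < 4; j++) {
                double sum = 0.0;
                for (int k = 0; k < 4; k++) {
                    sum += points[row + k] * m[k * 4 + j];
                }
                out[row + j] = sum;
            }
        }
    }

    /** Runs `warmup` untimed calls, then returns the mean time in ns of `runs` timed calls. */
    static long meanNanosAfterWarmup(double[] points, double[] m, double[] out,
                                     int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            transform(points, m, out);   // gives the JIT a chance to compile the loop
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            transform(points, m, out);
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        int n = 100000;
        double[] points = new double[n * 4];
        for (int i = 0; i < points.length; i++) {
            points[i] = Math.random();
        }
        double[] identity = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
        double[] out = new double[n * 4];   // allocated once, outside the timed region

        // The first call is timed cold; the second measurement follows 50 untimed calls.
        System.out.println("cold: " + meanNanosAfterWarmup(points, identity, out, 0, 1) / 1e6 + " ms");
        System.out.println("warm: " + meanNanosAfterWarmup(points, identity, out, 50, 10) / 1e6 + " ms");
    }
}
```

If the cold and warm numbers differ a lot on the desktop, warm-up is a likely cause there. Android does not run HotSpot, so the same test would need to be repeated on the device before drawing conclusions.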

General advice:
1) Never call new inside your highly optimized loops.  Pre-allocate all memory if possible.
2) Have you looked at fixed-size matrices?  They can run much faster.
3) Are there any good profiling tools that run on Android?  That might give you a better idea of what's slowing down your code.
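
To illustrate point 1, here is a rough sketch of a collision frame that allocates its output buffer once and overwrites it every frame. It is plain Java rather than EJML, and the class CollisionFrame and its update method are hypothetical names chosen for this example; points are assumed to be stored as n x 4 row-major values and multiplied on the right by a 4x4 row-major matrix.

```java
// Hypothetical sketch of "pre-allocate outside the loop"; not part of EJML.
final class CollisionFrame {
    private final double[] local;  // n x 4 points in object space, row-major
    private final double[] world;  // same shape, allocated once and overwritten each frame

    CollisionFrame(double[] localPoints) {
        if (localPoints.length % 4 != 0) {
            throw new IllegalArgumentException("expected n x 4 values");
        }
        this.local = localPoints.clone();
        this.world = new double[localPoints.length];
    }

    /** Overwrites the world-space buffer with local * m (m is 4 x 4, row-major) and returns it. */
    double[] update(double[] m) {
        for (int r = 0; r < local.length; r += 4) {
            double x = local[r], y = local[r + 1], z = local[r + 2], w = local[r + 3];
            world[r]     = x * m[0] + y * m[4] + z * m[8]  + w * m[12];
            world[r + 1] = x * m[1] + y * m[5] + z * m[9]  + w * m[13];
            world[r + 2] = x * m[2] + y * m[6] + z * m[10] + w * m[14];
            world[r + 3] = x * m[3] + y * m[7] + z * m[11] + w * m[15];
        }
        return world;  // same array every call, so the game loop creates no garbage here
    }
}
```

In the posted benchmark, the equivalent change would be to create C once before the loop instead of calling new DenseMatrix64F(length,4) on every iteration.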


Russell Butler

Dec 8, 2016, 7:42:56 PM
to efficient-java-matrix-library-discuss
Thanks for the tips... On further testing I realized that EJML isn't actually any faster than the simple OpenGL matrix multiplication call, maybe because the matrices are so small.

What do you mean by fixed-size matrices? A matrix that doesn't change size from frame to frame?
Thanks,
Russell

Russell Butler

Dec 8, 2016, 8:13:42 PM
to efficient-java-matrix-library-discuss
Ah, never mind, I found the fixed-size matrix... It speeds up the multiplication by around 20%, which is not bad. I was hoping for more of an increase, but it's still better than before.
thanks,
Russell
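
To make "fixed-size matrix" concrete for anyone reading later: the dimensions are known at compile time, so each element can be a plain field and the multiply can be written out with no loops or index arithmetic. The sketch below is a hypothetical Mat4 class written for illustration under that assumption. It is not EJML's own fixed-size type and should not be read as its API.

```java
// Hypothetical 4x4 fixed-size matrix; field and method names are made up for this sketch.
final class Mat4 {
    double a11, a12, a13, a14,
           a21, a22, a23, a24,
           a31, a32, a33, a34,
           a41, a42, a43, a44;

    static Mat4 identity() {
        Mat4 m = new Mat4();
        m.a11 = 1; m.a22 = 1; m.a33 = 1; m.a44 = 1;
        return m;
    }

    /** c = a * b, fully unrolled. c must not be the same object as a or b. */
    static void mult(Mat4 a, Mat4 b, Mat4 c) {
        c.a11 = a.a11*b.a11 + a.a12*b.a21 + a.a13*b.a31 + a.a14*b.a41;
        c.a12 = a.a11*b.a12 + a.a12*b.a22 + a.a13*b.a32 + a.a14*b.a42;
        c.a13 = a.a11*b.a13 + a.a12*b.a23 + a.a13*b.a33 + a.a14*b.a43;
        c.a14 = a.a11*b.a14 + a.a12*b.a24 + a.a13*b.a34 + a.a14*b.a44;

        c.a21 = a.a21*b.a11 + a.a22*b.a21 + a.a23*b.a31 + a.a24*b.a41;
        c.a22 = a.a21*b.a12 + a.a22*b.a22 + a.a23*b.a32 + a.a24*b.a42;
        c.a23 = a.a21*b.a13 + a.a22*b.a23 + a.a23*b.a33 + a.a24*b.a43;
        c.a24 = a.a21*b.a14 + a.a22*b.a24 + a.a23*b.a34 + a.a24*b.a44;

        c.a31 = a.a31*b.a11 + a.a32*b.a21 + a.a33*b.a31 + a.a34*b.a41;
        c.a32 = a.a31*b.a12 + a.a32*b.a22 + a.a33*b.a32 + a.a34*b.a42;
        c.a33 = a.a31*b.a13 + a.a32*b.a23 + a.a33*b.a33 + a.a34*b.a43;
        c.a34 = a.a31*b.a14 + a.a32*b.a24 + a.a33*b.a34 + a.a34*b.a44;

        c.a41 = a.a41*b.a11 + a.a42*b.a21 + a.a43*b.a31 + a.a44*b.a41;
        c.a42 = a.a41*b.a12 + a.a42*b.a22 + a.a43*b.a32 + a.a44*b.a42;
        c.a43 = a.a41*b.a13 + a.a42*b.a23 + a.a43*b.a33 + a.a44*b.a43;
        c.a44 = a.a41*b.a14 + a.a42*b.a24 + a.a43*b.a34 + a.a44*b.a44;
    }
}
```

Because nothing depends on a runtime row or column count, the code is already in the shape an optimizer would aim for, which may be why it helps more on a runtime with a weaker JIT.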

Russell Butler

Dec 8, 2016, 8:45:28 PM
to efficient-java-matrix-library-discuss
OK, so I lied: when I run it on my PC in Eclipse I get only a 20% speed increase using the fixed matrix size, but when I run it on my Android device in Android Studio, for some reason the performance increase is almost 100%... Strange, but good for me I guess!

Peter A

Dec 8, 2016, 8:57:48 PM
to efficient-java-mat...@googlegroups.com
I think the JIT optimization on embedded devices isn't as good as it is on the desktop.  If you write code which is "already" optimized it tends to run faster.  Is it faster than OpenGL on Android now?

- Peter

Russell Butler

Dec 14, 2016, 1:44:22 PM
to efficient-java-matrix-library-discuss
Yeah, it is about 2x faster using fixed-size matrices than OpenGL's matrix multiply function (on my Samsung; on my laptop it's only 20% faster). Thanks again,
Russell