Android armv8 build matrix lib latency

andrew tabarez

May 16, 2024, 12:51:09 PM
to kaldi-help
Hello,
I have been compiling a local fork of Kaldi for Android, specifically the armv8 (64-bit) architecture. I have had a working armv7 (32-bit) build for a while now, and offline speech-to-text latency with that build was basically real time.

However, in my 64-bit build the nnet3 computations take orders of magnitude longer and I cannot figure out why. I have set up a test scenario to exercise what I suspect is the cause: the matrix libraries. I compile Kaldi against OpenBLAS with CLAPACK.

Running the speed tests defined in matrix/matrix-lib-speed-test.cc, I get the following results for the 64-bit and 32-bit builds. The largest gaps are in UnitTestSvdSpeed, UnitTestAddMatMatSpeed, and UnitTestSplitRadixRealFftSpeed, though in all but one case (UnitTestAddColSumMatSpeed, double) the 32-bit build is quicker.

I am running these tests on a Samsung Note 10 with an 8-core AArch64 CPU.

Could someone give me any tips on what to try next? I have compiled the Kaldi dependencies with various optimization flags, with no improvement.

Thanks in advance!

64bit
UnitTestRealFftSpeed,float,512,0.106906,seconds
UnitTestSplitRadixRealFftSpeed,float,512,0.104571,seconds
UnitTestSvdSpeed,float,4,25.0509,seconds
UnitTestAddMatMatSpeed,float,2,2.04577,seconds
UnitTestAddRowSumMatSpeed,float,5,0.290059,seconds
UnitTestAddColSumMatSpeed,float,5,0.261519,seconds
UnitTestAddVecToRowsSpeed,float,5,0.219138,seconds
UnitTestAddVecToColsSpeed,float,5,0.209412,seconds
UnitTestRealFftSpeed,double,512,0.114652,seconds
UnitTestSplitRadixRealFftSpeed,double,512,0.075006,seconds
UnitTestSvdSpeed,double,4,13.3691,seconds
UnitTestAddMatMatSpeed,double,2,2.63372,seconds
UnitTestAddRowSumMatSpeed,double,5,0.185159,seconds
UnitTestAddColSumMatSpeed,double,5,0.15138,seconds
UnitTestAddVecToRowsSpeed,double,5,0.170119,seconds
UnitTestAddVecToColsSpeed,double,5,0.170321,seconds

32bit
UnitTestRealFftSpeed,float,512,0.0578439,seconds
UnitTestSplitRadixRealFftSpeed,float,512,0.030853,seconds
UnitTestSvdSpeed,float,4,5.679,seconds
UnitTestAddMatMatSpeed,float,2,0.747806,seconds
UnitTestAddRowSumMatSpeed,float,5,0.173356,seconds
UnitTestAddColSumMatSpeed,float,5,0.145231,seconds
UnitTestAddVecToRowsSpeed,float,5,0.148734,seconds
UnitTestAddVecToColsSpeed,float,5,0.144894,seconds
UnitTestRealFftSpeed,double,512,0.0558102,seconds
UnitTestSplitRadixRealFftSpeed,double,512,0.0309949,seconds
UnitTestSvdSpeed,double,4,7.71063,seconds
UnitTestAddMatMatSpeed,double,2,0.835502,seconds
UnitTestAddRowSumMatSpeed,double,5,0.150827,seconds
UnitTestAddColSumMatSpeed,double,5,0.155713,seconds
UnitTestAddVecToRowsSpeed,double,5,0.148279,seconds
UnitTestAddVecToColsSpeed,double,5,0.147105,seconds
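
To make the gaps easier to see, here is a minimal Python sketch that computes the 64-bit/32-bit time ratio for each test. The timings are copied verbatim from the two listings above; the helper names (`parse`, `slowdowns`) are my own and are not part of Kaldi, and the third column is carried along without being interpreted.

```python
import csv
import io

# Timings copied verbatim from the 64-bit listing above.
RUN_64BIT = """\
UnitTestRealFftSpeed,float,512,0.106906,seconds
UnitTestSplitRadixRealFftSpeed,float,512,0.104571,seconds
UnitTestSvdSpeed,float,4,25.0509,seconds
UnitTestAddMatMatSpeed,float,2,2.04577,seconds
UnitTestAddRowSumMatSpeed,float,5,0.290059,seconds
UnitTestAddColSumMatSpeed,float,5,0.261519,seconds
UnitTestAddVecToRowsSpeed,float,5,0.219138,seconds
UnitTestAddVecToColsSpeed,float,5,0.209412,seconds
UnitTestRealFftSpeed,double,512,0.114652,seconds
UnitTestSplitRadixRealFftSpeed,double,512,0.075006,seconds
UnitTestSvdSpeed,double,4,13.3691,seconds
UnitTestAddMatMatSpeed,double,2,2.63372,seconds
UnitTestAddRowSumMatSpeed,double,5,0.185159,seconds
UnitTestAddColSumMatSpeed,double,5,0.15138,seconds
UnitTestAddVecToRowsSpeed,double,5,0.170119,seconds
UnitTestAddVecToColsSpeed,double,5,0.170321,seconds
"""

# Timings copied verbatim from the 32-bit listing above.
RUN_32BIT = """\
UnitTestRealFftSpeed,float,512,0.0578439,seconds
UnitTestSplitRadixRealFftSpeed,float,512,0.030853,seconds
UnitTestSvdSpeed,float,4,5.679,seconds
UnitTestAddMatMatSpeed,float,2,0.747806,seconds
UnitTestAddRowSumMatSpeed,float,5,0.173356,seconds
UnitTestAddColSumMatSpeed,float,5,0.145231,seconds
UnitTestAddVecToRowsSpeed,float,5,0.148734,seconds
UnitTestAddVecToColsSpeed,float,5,0.144894,seconds
UnitTestRealFftSpeed,double,512,0.0558102,seconds
UnitTestSplitRadixRealFftSpeed,double,512,0.0309949,seconds
UnitTestSvdSpeed,double,4,7.71063,seconds
UnitTestAddMatMatSpeed,double,2,0.835502,seconds
UnitTestAddRowSumMatSpeed,double,5,0.150827,seconds
UnitTestAddColSumMatSpeed,double,5,0.155713,seconds
UnitTestAddVecToRowsSpeed,double,5,0.148279,seconds
UnitTestAddVecToColsSpeed,double,5,0.147105,seconds
"""


def parse(text):
    """Map (test name, precision) -> seconds, skipping blank rows."""
    rows = csv.reader(io.StringIO(text))
    return {(r[0], r[1]): float(r[3]) for r in rows if r}


def slowdowns(slow, fast):
    """Return (ratio, name, precision) tuples, largest ratio first."""
    common = (key for key in slow if key in fast)
    return sorted(((slow[k] / fast[k], *k) for k in common), reverse=True)


if __name__ == "__main__":
    for ratio, name, precision in slowdowns(parse(RUN_64BIT), parse(RUN_32BIT)):
        print(f"{name:32s} {precision:6s} {ratio:5.2f}x")
```

A ratio above 1 means the 64-bit build was slower on that test; sorting by ratio puts the worst offenders at the top.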

andrew tabarez

May 16, 2024, 1:36:28 PM
to kaldi-help
OK, after running some more tests, this might be CPU or memory throttling. I run various components in one process across many threads using https://github.com/raytheonbbn/Godec. When I run ONLY these unit tests, with no other threads computing data, the results are more in line with those of the 32-bit build, and some are even faster. However, profiling the application in Android Studio suggests it does not use much more memory than the 32-bit build: roughly 300-400 MB (the phone has 8 GB of RAM).

UnitTestRealFftSpeed,float,512,0.059176,seconds
UnitTestSplitRadixRealFftSpeed,float,512,0.0241799,seconds
UnitTestSvdSpeed,float,4,4.08242,seconds
UnitTestAddMatMatSpeed,float,2,0.920794,seconds
UnitTestAddRowSumMatSpeed,float,5,0.169997,seconds
UnitTestAddColSumMatSpeed,float,5,0.155305,seconds
UnitTestAddVecToRowsSpeed,float,5,0.154547,seconds
UnitTestAddVecToColsSpeed,float,5,0.187884,seconds
UnitTestRealFftSpeed,double,512,0.101193,seconds
UnitTestSplitRadixRealFftSpeed,double,512,0.058409,seconds
UnitTestSvdSpeed,double,4,2.31942,seconds
UnitTestAddMatMatSpeed,double,2,2.43215,seconds
UnitTestAddRowSumMatSpeed,double,5,0.163369,seconds
UnitTestAddColSumMatSpeed,double,5,0.166243,seconds
UnitTestAddVecToRowsSpeed,double,5,0.160645,seconds
UnitTestAddVecToColsSpeed,double,5,0.179735,seconds
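
One way to narrow down the contention hypothesis is to compare wall-clock time with the CPU time actually consumed by the measuring thread. The following is only a rough sketch in Python, not what the Kaldi speed tests do: `wall_and_cpu` and `busy` are hypothetical names, `busy` is a stand-in workload, and it assumes Python 3.7+ on a platform where `time.thread_time()` is available. On the device itself the same idea would have to be applied in native code around the real benchmark.

```python
import time


def wall_and_cpu(work):
    """Run work() once; return (wall seconds, CPU seconds used by this thread)."""
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()
    work()
    cpu_end = time.thread_time()
    wall_end = time.perf_counter()
    return wall_end - wall_start, cpu_end - cpu_start


def busy():
    # Hypothetical stand-in workload; replace with the code being measured.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    wall, cpu = wall_and_cpu(busy)
    print(f"wall={wall:.4f}s cpu={cpu:.4f}s")
```

If wall time grows much more than CPU time when the other threads are active, the measuring thread is mostly waiting to be scheduled. If both grow together, the code itself is running slower, which would be consistent with a lower clock frequency or with contention for cache and memory bandwidth.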
