Hello,
I have been compiling a local fork of Kaldi to run on Android, specifically for the armv8 (64bit) architecture. I have had a working implementation for armv7 (32bit) for a while now. Latency of offline speech-to-text for the 32bit build was basically realtime.
However, in my 64bit build the nnet3 computations are taking orders of magnitude longer and I cannot figure out why. I have set up a test scenario to check what I think is the issue: the matrix libraries. I am using OpenBLAS with CLAPACK when compiling Kaldi.
Running the various unit-tests defined in matrix/matrix-lib-speed-test.cc, I get the following results for 32bit and 64bit. The largest gaps are in UnitTestSvdSpeed, UnitTestAddMatMatSpeed, and the FFT tests, and 32bit is quicker in every case but one (UnitTestAddColSumMatSpeed for double).
I am running these tests on a Samsung Note 10, which has an 8-core AArch64 CPU.
Could someone give me any tips on what to try next? I have compiled the various Kaldi dependencies with a range of different optimization flags, with no improvement.
Thanks in advance!
64bit
UnitTestRealFftSpeed,float,512,0.106906,seconds
UnitTestSplitRadixRealFftSpeed,float,512,0.104571,seconds
UnitTestSvdSpeed,float,4,25.0509,seconds
UnitTestAddMatMatSpeed,float,2,2.04577,seconds
UnitTestAddRowSumMatSpeed,float,5,0.290059,seconds
UnitTestAddColSumMatSpeed,float,5,0.261519,seconds
UnitTestAddVecToRowsSpeed,float,5,0.219138,seconds
UnitTestAddVecToColsSpeed,float,5,0.209412,seconds
UnitTestRealFftSpeed,double,512,0.114652,seconds
UnitTestSplitRadixRealFftSpeed,double,512,0.075006,seconds
UnitTestSvdSpeed,double,4,13.3691,seconds
UnitTestAddMatMatSpeed,double,2,2.63372,seconds
UnitTestAddRowSumMatSpeed,double,5,0.185159,seconds
UnitTestAddColSumMatSpeed,double,5,0.15138,seconds
UnitTestAddVecToRowsSpeed,double,5,0.170119,seconds
UnitTestAddVecToColsSpeed,double,5,0.170321,seconds
32bit
UnitTestRealFftSpeed,float,512,0.0578439,seconds
UnitTestSplitRadixRealFftSpeed,float,512,0.030853,seconds
UnitTestSvdSpeed,float,4,5.679,seconds
UnitTestAddMatMatSpeed,float,2,0.747806,seconds
UnitTestAddRowSumMatSpeed,float,5,0.173356,seconds
UnitTestAddColSumMatSpeed,float,5,0.145231,seconds
UnitTestAddVecToRowsSpeed,float,5,0.148734,seconds
UnitTestAddVecToColsSpeed,float,5,0.144894,seconds
UnitTestRealFftSpeed,double,512,0.0558102,seconds
UnitTestSplitRadixRealFftSpeed,double,512,0.0309949,seconds
UnitTestSvdSpeed,double,4,7.71063,seconds
UnitTestAddMatMatSpeed,double,2,0.835502,seconds
UnitTestAddRowSumMatSpeed,double,5,0.150827,seconds
UnitTestAddColSumMatSpeed,double,5,0.155713,seconds
UnitTestAddVecToRowsSpeed,double,5,0.148279,seconds
UnitTestAddVecToColsSpeed,double,5,0.147105,seconds
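
To make the gap easier to read, the slowdown per test can be computed as the 64bit time divided by the 32bit time. Below is a minimal Python sketch of that arithmetic, with the timings copied from the two listings above. The names TIMINGS and slowdowns are only for illustration and are not part of Kaldi.

```python
# Timings in seconds, copied from the listings above.
# Key: (test name, type); value: (64bit time, 32bit time).
TIMINGS = {
    ("UnitTestRealFftSpeed", "float"): (0.106906, 0.0578439),
    ("UnitTestSplitRadixRealFftSpeed", "float"): (0.104571, 0.030853),
    ("UnitTestSvdSpeed", "float"): (25.0509, 5.679),
    ("UnitTestAddMatMatSpeed", "float"): (2.04577, 0.747806),
    ("UnitTestAddRowSumMatSpeed", "float"): (0.290059, 0.173356),
    ("UnitTestAddColSumMatSpeed", "float"): (0.261519, 0.145231),
    ("UnitTestAddVecToRowsSpeed", "float"): (0.219138, 0.148734),
    ("UnitTestAddVecToColsSpeed", "float"): (0.209412, 0.144894),
    ("UnitTestRealFftSpeed", "double"): (0.114652, 0.0558102),
    ("UnitTestSplitRadixRealFftSpeed", "double"): (0.075006, 0.0309949),
    ("UnitTestSvdSpeed", "double"): (13.3691, 7.71063),
    ("UnitTestAddMatMatSpeed", "double"): (2.63372, 0.835502),
    ("UnitTestAddRowSumMatSpeed", "double"): (0.185159, 0.150827),
    ("UnitTestAddColSumMatSpeed", "double"): (0.15138, 0.155713),
    ("UnitTestAddVecToRowsSpeed", "double"): (0.170119, 0.148279),
    ("UnitTestAddVecToColsSpeed", "double"): (0.170321, 0.147105),
}


def slowdowns(timings):
    """Return (ratio, test, type) tuples, largest ratio first.

    ratio = 64bit time / 32bit time, so a value above 1 means
    the 64bit build was slower for that test.
    """
    rows = [(t64 / t32, test, dtype)
            for (test, dtype), (t64, t32) in timings.items()]
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for ratio, test, dtype in slowdowns(TIMINGS):
        print(f"{test:<32}{dtype:<8}{ratio:5.2f}x")
```

Sorted this way, the float SVD test comes out on top at roughly 4.4x slower on 64bit, and the double UnitTestAddColSumMatSpeed test is the only one with a ratio below 1.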