At the previous SIG meeting we discussed math-related optimizations for Cortex-M, where CMSIS-DSP is a good option. As promised, here are frontend optimizations
for the micro speech example. Check out these lines
for a hint on what's done. This significantly reduces the frontend cycle count.