I'm investigating ways to leverage BLIS GEMM to implement efficient convolution for Neural Network inference. At its core, the goal is the same as in GEMM: pushing as many multiplications as possible through the CPU.
A typical case is the 2D convolution.
Example:
* input is a 400x300 image with 3 channels (RGB) (400x300x3)
* the filter is a 5x5 square, working on the 3 input channels and extracting 128 features (5x5x3x128)
* the result is 400x300x128 (disregarding boundary issues). Each "pixel" is now 128 float values deep.
Number of multiplications: 400x300x3x5x5x128.
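For concreteness, the naive approach is six nested loops: over output pixels, output features, and filter taps. A minimal sketch with tiny hypothetical sizes standing in for the 400x300x3 image and 5x5x3x128 filter:

```python
import numpy as np

# Naive direct convolution: six nested loops (output y, x, feature,
# then filter dy, dx, channel). Sizes here are made up small stand-ins.
H, W, C = 8, 6, 3      # input height, width, channels
KH, KW, F = 5, 5, 4    # filter height, width, output features

rng = np.random.default_rng(0)
image = rng.standard_normal((H, W, C))
filt = rng.standard_normal((KH, KW, C, F))

# "Valid" convolution: the output shrinks by the filter size minus one.
OH, OW = H - KH + 1, W - KW + 1
out = np.zeros((OH, OW, F))
muls = 0
for y in range(OH):
    for x in range(OW):
        for f in range(F):
            acc = 0.0
            for dy in range(KH):
                for dx in range(KW):
                    for c in range(C):
                        acc += image[y + dy, x + dx, c] * filt[dy, dx, c, f]
                        muls += 1
            out[y, x, f] = acc

# Multiplication count matches the formula above: OH*OW * KH*KW*C * F.
assert muls == OH * OW * KH * KW * C * F
```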
One easy way to improve on the naive approach is to generate a "megamatrix" A and use GEMM:
* A will be 400x300 high and 5x5x3 wide, one row per pixel in the result. This matrix is highly redundant: each input value appears 5x5 times (once for each output pixel it contributes to).
* B is just the filter rearranged: 128 columns, one per output feature, with 5x5x3 rows.
* GEMM computes all values in a single pass, one stride manipulation away from the required result. Each row is a "pixel" with 128 columns, one per feature.
Despite the size of the "megamatrix" A, this is a significant improvement over the naive 6-nested-loops approach.
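A minimal sketch of that im2col-style scheme, again with small made-up sizes and "valid" boundaries only:

```python
import numpy as np

# im2col: build the redundant "megamatrix" A (one row per output pixel,
# one column per filter tap) and reduce the convolution to one GEMM.
H, W, C = 8, 6, 3      # input image
KH, KW, F = 5, 5, 4    # filter
OH, OW = H - KH + 1, W - KW + 1

rng = np.random.default_rng(0)
image = rng.standard_normal((H, W, C))
filt = rng.standard_normal((KH, KW, C, F))

# A: (OH*OW) x (KH*KW*C). Each input value is copied up to KH*KW times.
A = np.empty((OH * OW, KH * KW * C))
for y in range(OH):
    for x in range(OW):
        A[y * OW + x] = image[y:y + KH, x:x + KW, :].ravel()

# B: the filter rearranged to (KH*KW*C) x F -- one column per feature.
B = filt.reshape(KH * KW * C, F)

# One GEMM, then a reshape to recover the (OH, OW, F) result.
out = (A @ B).reshape(OH, OW, F)
```

Note the column ordering of A's rows (dy, dx, c) must match the row ordering of B, which `reshape` gives for free here.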
I'm looking into smarter ways though:
* the filter is usually constant, so B should be packed into panel form once and for all
* creating the megamatrix, knowing that GEMM will gut it out to rewrite it in panels, is an even bigger waste.
So I'm considering doing the panelisation on my side: asking BLIS for MR and NR, building panels of the right size, and then calling the kernel myself. Another option is to call GEMM, betting that everything will degenerate nicely since I'm feeding it panels that are already the right size...
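To illustrate what packing B would look like, here is a simplified sketch of an NR-wide column-panel layout of the kind a BLIS-style microkernel streams through. The NR value is made up; the real one has to be queried from BLIS for the target architecture, and the exact packed layout should be checked against the BLIS docs:

```python
import numpy as np

# Pack the (constant) filter matrix B into NR-wide column panels.
# Within a panel, element (k, jr) sits at offset k*NR + jr, so the
# kernel reads NR-element micro-rows contiguously as it walks k.
NR = 4                  # hypothetical register blocking for B
K, N = 75, 10           # B is (KH*KW*C) x F = 75 x 128 in the example

rng = np.random.default_rng(0)
B = rng.standard_normal((K, N))

num_panels = (N + NR - 1) // NR
Bp = np.zeros((num_panels, K * NR))
for j in range(num_panels):
    cols = B[:, j * NR:(j + 1) * NR]     # may be ragged at the edge
    panel = np.zeros((K, NR))
    panel[:, :cols.shape[1]] = cols      # zero-pad the last panel
    Bp[j] = panel.ravel()                # row-major: jr varies fastest
```

The same idea applies to A with MR-high row panels, which is where skipping the megamatrix entirely and packing straight from the image would pay off.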
What do you people think ?
--
Of course, based on your filter sizes and precision requirements, FFT or Winograd is going to require fewer FLOPs...
Devin Matthews