Hello,
This is to let this group know that most gemmlowp users should be using ruy instead:
TensorFlow Lite switched from gemmlowp to ruy on ARM around 1.5 years ago.
There isn't a lot of documentation, but there are example programs here.
Ruy's strengths over gemmlowp include:
- higher performance.
- support for ARMv8.2+ optional dot-product instructions.
- ruy handles runtime CPU feature detection and dispatch.
- both float and quantized paths are supported in a single package.
- very general support for quantized data types: LHS and RHS can freely mix and match int8 and uint8, and destination can be int8, uint8, int16, int32.
- storage order is now a runtime (not template) parameter.
- output features such as bias-addition and clamp ("ReLU" etc) are also runtime controlled now.
- despite all of the above, code size is very small (about 50k of code for TFLite's full set of instantiations).
- quantization scheme: completely general support for per-channel quantization (per-row or per-column), plus more recent refinements (the sketch after this list exercises the quantized path).
- CMake and Bazel build systems are fully supported.
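To make the above concrete, here is a rough sketch of a quantized multiplication with ruy, modeled on its example programs. The exact names (ruy::Mul, ruy::Matrix, ruy::MulParams, MakeSimpleLayout and the setters below) are my reading of the current headers, so please check them against the examples in the repository:

#include <cstdint>

#include "ruy/ruy.h"

// Sketch: int8 x int8 -> int8 on 2x2 matrices, with zero points, per-channel
// fixed-point multipliers, bias addition and clamping, all set up at runtime.
void QuantizedMulSketch(ruy::Context* context) {
  const std::int8_t lhs_data[] = {1, 2, 3, 4};
  const std::int8_t rhs_data[] = {1, 2, 3, 4};
  const std::int32_t bias_data[] = {1, 2};
  const std::int32_t multiplier_data[] = {3 << 28, 5 << 28};  // per-row multipliers
  const int exponent_data[] = {1, -2};
  std::int8_t dst_data[4];

  ruy::Matrix<std::int8_t> lhs;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kRowMajor, lhs.mutable_layout());
  lhs.set_data(lhs_data);
  lhs.set_zero_point(1);

  ruy::Matrix<std::int8_t> rhs;
  // Storage order is a runtime value, not a template parameter.
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, rhs.mutable_layout());
  rhs.set_data(rhs_data);
  rhs.set_zero_point(2);

  ruy::Matrix<std::int8_t> dst;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, dst.mutable_layout());
  dst.set_data(dst_data);
  dst.set_zero_point(3);

  // Accumulate in int32, write int8; bias, clamp and the quantized
  // multipliers are all runtime-controlled parameters.
  ruy::MulParams<std::int32_t, std::int8_t> mul_params;
  mul_params.set_bias(bias_data);
  mul_params.set_multiplier_fixedpoint_perchannel(multiplier_data);
  mul_params.set_multiplier_exponent_perchannel(exponent_data);
  mul_params.set_clamp_min(3);  // clamping at the dst zero point acts as a ReLU
  mul_params.set_clamp_max(127);

  ruy::Mul(lhs, rhs, mul_params, context, &dst);
}

Everything in the sketch (storage order, zero points, bias, clamp, per-channel multipliers) is a plain runtime value, which is the point of the items above.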
Things that gemmlowp does that ruy does not:
- MIPS MSA support.
- the gemmlowp/fixedpoint library still does not have a replacement.
While ruy is the most direct successor to gemmlowp as a standalone matrix multiplication library, for neural network inference purposes consider using XNNPACK instead. The main reason to use ruy over XNNPACK is if you need the greater generality of the matrix multiplications that ruy supports, including multiplying two runtime-variable matrices (XNNPACK is centered more on the typical NN case where one of the matrices is constant weights).
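To illustrate that runtime-variable case, here is a minimal float sketch under the same API assumptions as above; m, k and n stand for dimensions known only at runtime:

#include "ruy/ruy.h"

// Sketch: multiply two float matrices whose shapes and contents are only
// known at runtime -- neither side needs to be constant weights.
void FloatMulSketch(ruy::Context* context,
                    const float* lhs_data, const float* rhs_data,
                    float* dst_data, int m, int k, int n) {
  ruy::Matrix<float> lhs;
  ruy::MakeSimpleLayout(m, k, ruy::Order::kRowMajor, lhs.mutable_layout());
  lhs.set_data(lhs_data);

  ruy::Matrix<float> rhs;
  ruy::MakeSimpleLayout(k, n, ruy::Order::kColMajor, rhs.mutable_layout());
  rhs.set_data(rhs_data);

  ruy::Matrix<float> dst;
  ruy::MakeSimpleLayout(m, n, ruy::Order::kColMajor, dst.mutable_layout());
  dst.set_data(dst_data);

  // No quantization parameters needed for float; bias/clamp could still be set.
  ruy::MulParams<float, float> mul_params;
  ruy::Mul(lhs, rhs, mul_params, context, &dst);
}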
Cheers,
Benoit