gemmlowp performance advice

Lili LI

Jun 11, 2020, 1:57:17 AM

to gemmlowp

I compared gemmlowp against Eigen; the timings are shown in the attachment.

As the matrices get smaller, gemmlowp's advantage over Eigen shrinks. Is this expected?
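
Because raw times grow roughly with M·N·K, one way to compare the two libraries across sizes is to normalize each measurement to operations per second. A minimal sketch, assuming the usual count of 2·M·N·K operations (one multiply and one add per term of each output's K-term dot product); the function names are illustrative and not part of gemmlowp or Eigen:

```cpp
// Operation count for an (M x K) * (K x N) GEMM: each of the M * N outputs
// is a K-term dot product, i.e. K multiplies and K adds.
double GemmOps(int M, int N, int K) {
  return 2.0 * static_cast<double>(M) * static_cast<double>(N) *
         static_cast<double>(K);
}

// Throughput in giga-operations per second for one measured call.
double Gops(int M, int N, int K, double seconds) {
  return GemmOps(M, N, K) / seconds / 1e9;
}
```

Applying Gops to each row of the timing table gives a per-size throughput for both libraries; a throughput that falls as the sizes shrink suggests per-call costs that are not amortized over enough arithmetic.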

My CMake settings are:
add_definitions(-DGEMMLOWP_NEON_64)
add_definitions(-DGEMMLOWP_DOTPROD_KERNEL)
add_definitions("-march=armv8.2-a+dotprod")
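
Whether `-march=armv8.2-a+dotprod` actually reached the compiler for the file that includes gemmlowp can be checked with the Arm C Language Extensions feature macros. A small sketch, assuming a GCC or Clang toolchain that defines `__ARM_FEATURE_DOTPROD` when the dot-product extension is enabled and `__aarch64__` for 64-bit Arm targets; the constant names are hypothetical:

```cpp
// True when the compiler was told the target supports the Armv8.2
// dot-product instructions (ACLE feature macro).
constexpr bool kDotProdEnabled =
#if defined(__ARM_FEATURE_DOTPROD)
    true;
#else
    false;
#endif

// True when compiling for 64-bit Arm (AArch64).
constexpr bool kAarch64 =
#if defined(__aarch64__)
    true;
#else
    false;
#endif
```

Adding `static_assert(kDotProdEnabled, "dotprod not enabled");` next to the gemmlowp include turns a missing flag into a build error. This only confirms the compiler flags; it does not show which kernel gemmlowp selected.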

Platform: Qualcomm SM8250 (Snapdragon 865).

My code is:

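#include <cstdint>
#include <tuple>

#include "public/gemmlowp.h"  // path relative to the gemmlowp source root

// Computes out = (mat1 - offset1) * (mat2 - offset2) with int32 accumulators.
// mat1 is M x K row-major; mat2 is K x N row-major, or K x N column-major
// (i.e. the N x K transpose stored row-major) when transpose is true.
template<bool transpose>
void IntegerGemm(const uint8_t *mat1, int M, int N, int K, const uint8_t *mat2,
                 int offset1, int offset2, int32_t *out) {
  using gemmlowp::MatrixMap;
  using gemmlowp::GemmContext;
  using gemmlowp::GemmWithOutputPipeline;
  using gemmlowp::MapOrder;
  using gemmlowp::DefaultL8R8BitDepthParams;
  // Left-hand side (M x K) and right-hand side (K x N).
  MatrixMap<const uint8_t, MapOrder::RowMajor>
      lhs(mat1, M, K);
  MatrixMap<const uint8_t, !transpose ? MapOrder::RowMajor : MapOrder::ColMajor>
      rhs(mat2, K, N);
  MatrixMap<int32_t, MapOrder::RowMajor> result(out, M, N);
  // An empty output pipeline returns the raw int32 accumulators.
  const std::tuple<> empty_pipeline = {};
  // A fresh GemmContext is created on every call, so any scratch buffers it
  // allocates are not reused across calls; for small matrices this per-call
  // cost may be a noticeable share of the total time.
  GemmContext context;
  int max_num_threads = 1;
  context.set_max_num_threads(max_num_threads);
  // gemmlowp adds lhs_offset/rhs_offset to every entry, so the zero points
  // are passed negated.
  GemmWithOutputPipeline<uint8_t, int32_t, DefaultL8R8BitDepthParams>(
      &context, lhs, rhs, &result, -offset1, -offset2, empty_pipeline);
}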
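
Before comparing timings, it can help to confirm that both libraries produce the same int32 results. A minimal plain-C++ reference sketch (the name `ReferenceIntegerGemm` is hypothetical), assuming gemmlowp's convention that the `lhs_offset`/`rhs_offset` arguments are added to every uint8 entry, so passing `-offset1` and `-offset2` subtracts the zero points:

```cpp
#include <cstdint>

// Naive triple loop computing the same thing as IntegerGemm:
//   out(i, j) = sum_k (mat1(i, k) - offset1) * (mat2(k, j) - offset2)
// mat1: M x K row-major. mat2: K x N row-major, or, when transpose is true,
// K x N column-major (the N x K transpose stored row-major). out: M x N row-major.
template <bool transpose>
void ReferenceIntegerGemm(const uint8_t* mat1, int M, int N, int K,
                          const uint8_t* mat2, int offset1, int offset2,
                          int32_t* out) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      int32_t acc = 0;
      for (int k = 0; k < K; ++k) {
        const int32_t a = static_cast<int32_t>(mat1[i * K + k]) - offset1;
        // Element (k, j) of the right-hand side in either storage order.
        const uint8_t b_raw = transpose ? mat2[j * K + k] : mat2[k * N + j];
        const int32_t b = static_cast<int32_t>(b_raw) - offset2;
        acc += a * b;
      }
      out[i * N + j] = acc;
    }
  }
}
```

Running both on small random inputs and comparing element-wise catches layout mistakes (for example a wrong `transpose` setting) that would otherwise make the benchmark compare different computations.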