Kernel sizes

Robin Verschueren

Nov 19, 2020, 7:39:46 AM
to blis-discuss
Dear all,

I was reading the BLIS documentation (which is very good by the way) but I couldn't find an explanation about how the MR and NR are chosen. For example, MR=4/6/8, NR=12/8/6 in the dgemm microkernel for Haswell. I guess they are chosen based on the values for throughput/latency/cache properties etc. for a particular architecture, but I couldn't figure out how exactly.

Can somebody please enlighten me? Thanks.

Best regards,
Robin

Minh Quan HO

Nov 19, 2020, 9:12:07 AM
to blis-discuss
Hi Robin,

You have already answered part of your question. Yes, MR and NR are chosen based on hardware properties such as the register file, arithmetic capacity and L1 memory latency.

How do you do that? It is the responsibility of the developer to study the architecture they want to port BLIS to, to find out how large MR and NR should be, and to work out how to get >= 90% of peak out of the cores. It is not an easy task.

Hope it helps,
Quan

Robin Verschueren

Nov 19, 2020, 9:22:29 AM
to Minh Quan HO, blis-discuss
Thanks Quan. Could you (or someone else) walk me through the Haswell example?

Let's take the 4x12 example. I guess the '4' has to do with the vector length of AVX2 in double precision? But why the '12'?

Minh Quan HO

Nov 19, 2020, 9:29:51 AM
to Robin Verschueren, blis-discuss
Sorry, I'm not familiar with the Haswell architecture. But I'm sure someone
in this group can give you a satisfying answer.

Best,
Quan

Robert van de Geijn 2

Nov 19, 2020, 9:32:50 AM
to blis-d...@googlegroups.com
You will want to read the paper identified as "BLIS4" at the end of

https://github.com/flame/blis

Tze Meng Low

Nov 19, 2020, 12:06:32 PM
to Robert van de Geijn 2, blis-d...@googlegroups.com
The paper provides analytical models that explain how the different parameters used in BLIS can be mapped to hardware features on CPU architectures.

The micro-tile of C (MR x NR) must be large enough to avoid stalls in the computational pipelines, yet small enough to fit within the available architectural registers. Either MR or NR is set to a multiple of the SIMD length.

For the Haswell machine, the minimum size is 40 elements for double precision, and all three configurations meet this minimum (each has MR x NR = 48). They differ in how the microkernel is computed, e.g. in the choice of instructions.
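
To make these numbers concrete, here is a minimal sketch in Python. It assumes the lower bound takes the form MR x NR >= N_vec x L_vfma x N_vfma (elements per SIMD register, times FMA latency, times FMA instructions issued per cycle), and it assumes the following Haswell values, none of which are stated in this thread: 4 doubles per 256-bit register, a 5-cycle FMA latency, 2 FMAs per cycle, and 16 vector registers. The names `min_tile_elements` and `check_tile` are made up for illustration; please check the paper for the exact model.

```python
import math

# Assumed Haswell parameters (illustrative, not taken from the thread).
N_VEC = 4    # doubles held in one 256-bit AVX2 register
L_VFMA = 5   # latency of a vector FMA, in cycles
N_VFMA = 2   # vector FMA instructions that can issue per cycle
N_REG = 16   # architectural vector (YMM) registers


def min_tile_elements(n_vec=N_VEC, l_vfma=L_VFMA, n_vfma=N_VFMA):
    """Smallest MR*NR that keeps the FMA pipelines full (assumed model)."""
    return n_vec * l_vfma * n_vfma


def check_tile(mr, nr, n_vec=N_VEC, n_reg=N_REG):
    """Check one candidate micro-tile against the constraints above."""
    elements = mr * nr
    # Registers needed just to hold the micro-tile of C.
    c_regs = math.ceil(elements / n_vec)
    return {
        "elements": elements,
        "meets_minimum": elements >= min_tile_elements(n_vec),
        # Either MR or NR should be a multiple of the SIMD length.
        "simd_multiple": mr % n_vec == 0 or nr % n_vec == 0,
        "c_registers": c_regs,
        # Simplification: require at least one register left for A and B.
        "fits_registers": c_regs < n_reg,
    }


if __name__ == "__main__":
    print("minimum MR*NR:", min_tile_elements())
    for mr, nr in [(4, 12), (6, 8), (8, 6)]:
        print(f"{mr}x{nr}:", check_tile(mr, nr))
```

Under these assumptions the bound works out to 4 x 5 x 2 = 40, and each of the three tiles holds 48 elements in 12 registers, which leaves 4 registers for the elements of A and B.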

Best regards,
Tze Meng

Robin Verschueren

Nov 23, 2020, 11:26:53 AM
to Tze Meng Low, Robert van de Geijn 2, blis-discuss
Now it's clear!

Thank you both for the reference and the valuable work.

Best regards,
Robin
