BLIS auxiliary blocksize meaning and matrix size thresholds question

Igor Kozachenko

Jul 17, 2024, 5:58:02 PM
to blis-devel

QUESTION 1. I am confused about the interpretation of the auxiliary blocksizes.

 

I am reading the following digression in the blis/docs/ConfigurationHowTo.md.

 

Digression: Auxiliary blocksize values for cache blocksizes are interpreted as the maximum cache blocksizes. The maximum cache blocksizes are a convenient and portable way of smoothing performance of the level-3 operations when computing with a matrix operand that is just slightly larger than a multiple of the preferred cache blocksize in that dimension. In these "edge cases," iterations run with highly sub-optimal blocking. We can address this problem by merging the "edge case" iteration with the second-to-last iteration, such that the cache blocksizes are slightly larger --rather than significantly smaller -- than optimal. The maximum cache blocksizes allow the developer to specify the maximum size of this merged iteration; if the edge case causes the merged iteration to exceed this maximum, then the edge case is not merged and instead it is computed upon in separate (final) iteration.

 

From the description, it follows that the auxiliary block size is the maximum allowed sum of the optimal block size and the edge block size for the last (edge) iteration to be merged with the second-to-last iteration, rather than run as a separate iteration. Thus, the maximum block size, i.e. the auxiliary block size, is ALWAYS larger than the optimal block size.

 

According to this logic, if KC_optimal = 512 and K = 1026, then the optimal iteration size is 512 and the edge iteration size is 2. Merging the edge iteration with the second-to-last iteration would give an iteration of size 514. If we set the maximum block size to 512, the merge should not happen, and a separate edge iteration of size 2 runs. If we set the maximum block size to 640, the edge iteration is merged with the second-to-last iteration, resulting in a merged iteration of size 514.
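The reading above can be sketched in a few lines of C. This is only an illustration of the rule as the digression describes it, not code taken from BLIS: the names next_block_size and partition_dim are hypothetical, and the clamp for a maximum smaller than the preferred size is an assumption added so the sketch stays well defined in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function): size of the next block when
 * `remaining` elements are left in a dimension, given the preferred
 * blocksize b_alg and the maximum (auxiliary) blocksize b_max. */
static size_t next_block_size(size_t remaining, size_t b_alg, size_t b_max)
{
    assert(b_alg > 0);

    /* Everything that is left fits within the maximum: take it all at once.
     * This is the "merged" final iteration. */
    if (remaining <= b_max)
        return remaining;

    /* Otherwise take a preferred-size block. The clamp is an assumption that
     * only matters when b_max < b_alg. */
    return remaining < b_alg ? remaining : b_alg;
}

/* Partition a dimension of length dim into blocks. Writes the block sizes to
 * out (capacity cap) and returns the number of blocks; returns 0 if out is
 * too small (and also when dim == 0). */
static size_t partition_dim(size_t dim, size_t b_alg, size_t b_max,
                            size_t *out, size_t cap)
{
    size_t n = 0;
    while (dim > 0) {
        if (n == cap)
            return 0;
        size_t b = next_block_size(dim, b_alg, b_max);
        out[n++] = b;
        dim -= b;
    }
    return n;
}
```

Under this sketch, K = 1026 with a preferred size of 512 gives blocks 512, 512, 2 when the maximum is 512, and blocks 512, 514 when the maximum is 640. A maximum of 320 also gives 512, 512, 2, since a merged iteration can never be smaller than the preferred size.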

 

In contrast, looking at blis/config/zen4/bli_cntx_init_zen4.c:

 


 37 /*
 38  * List of default block sizes for zen4.
 39  * Converted it to macro as this list is used at multiple places in this file.
 40  */
 41
 42 #define BLI_CNTX_DEFAULT_BLKSZ_LIST_GENOA(blkszs) \
 43         /*                                           s      d      c      z */  \
 44         bli_blksz_init_easy( &blkszs[ BLIS_MR ],    32,    32,     3,    12 );  \
 45         bli_blksz_init_easy( &blkszs[ BLIS_NR ],    12,     6,     8,     4 );  \
 46         bli_blksz_init_easy( &blkszs[ BLIS_MC ],   512,   128,   144,    60 );  \
 47         bli_blksz_init     ( &blkszs[ BLIS_KC ],   480,   512,   256,   512,    \
 48                                                    480,   320,   256,   160 );  \
 49         bli_blksz_init_easy( &blkszs[ BLIS_NC ],  6144,  4002,  4080,  2004 );  \
 50                                                                                 \
 51         bli_blksz_init_easy( &blkszs[ BLIS_AF ],     5,     5,    -1,    -1 );  \
 52         bli_blksz_init_easy( &blkszs[ BLIS_DF ],     8,     8,    -1,    -1 );  \
 53
 54
 55 #define BLI_CNTX_DEFAULT_BLKSZ_LIST_BERGAMO(blkszs) \
 56         /*                                           s      d      c      z */  \
 57         bli_blksz_init_easy( &blkszs[ BLIS_MR ],    32,    32,     3,    12 );  \
 58         bli_blksz_init_easy( &blkszs[ BLIS_NR ],    12,     6,     8,     4 );  \
 59         bli_blksz_init_easy( &blkszs[ BLIS_MC ],   512,    64,   144,    60 );  \
 60         bli_blksz_init     ( &blkszs[ BLIS_KC ],   480,   512,   256,   512,    \
 61                                                    480,   320,   256,   160 );  \
 62         bli_blksz_init_easy( &blkszs[ BLIS_NC ],  6144,  3600,  4080,  2004 );  \
 63                                                                                 \
 64         bli_blksz_init_easy( &blkszs[ BLIS_AF ],     5,     5,    -1,    -1 );  \
 65         bli_blksz_init_easy( &blkszs[ BLIS_DF ],     8,     8,    -1,    -1 );  \

 

One can infer from lines 47 and 48 (for double precision, where the auxiliary value 320 is smaller than the default value 512) that the auxiliary KC block size most likely means “the maximum size of the last edge block to be merged with the previous optimal-sized block into a single iteration”. Under that reading, an edge block larger than the maximum block size, i.e. the auxiliary block size, would not be merged with the previous iteration and would get its own iteration.

 

Could you please clarify which definition of the auxiliary blocksize is correct?

 

 

QUESTION 2. Descriptions of the SUP thresholds.

 

 

Looking at blis/config/zen4/bli_cntx_init_zen4.c:


        // Initialize sup thresholds with architecture-appropriate values.
        //                                           s      d      c      z
        bli_blksz_init_easy( &thresh[ BLIS_MT ],   682,  1000,   380,   110 );
        bli_blksz_init_easy( &thresh[ BLIS_NT ],   512,  1000,   256,   128 );
        bli_blksz_init_easy( &thresh[ BLIS_KT ],   240,   220,   220,   110 );

        // Initialize level-3 sup blocksize objects with architecture-specific
        // values.
        //                                           s      d      c      z
        bli_blksz_init     ( &blkszs[ BLIS_MR ],     6,    24,     3,    12,
                                                     6,     9,     3,    12 );
        bli_blksz_init_easy( &blkszs[ BLIS_NR ],    64,     8,     8,     4 );
        bli_blksz_init_easy( &blkszs[ BLIS_MC ],   192,   144,    72,    48 );
        bli_blksz_init_easy( &blkszs[ BLIS_KC ],   512,   480,   128,    64 );
        bli_blksz_init_easy( &blkszs[ BLIS_NC ],  8064,  4080,  2040,  1020 );

 

 

Where can I read more about the SUP thresholds, their meaning, and their function? I suppose SUP stands for small unpacked matrices.

 

I also see thresholds in blis/config/zen4/bli_family_zen4.h:

 


#define BLIS_ENABLE_SMALL_MATRIX
#define BLIS_ENABLE_SMALL_MATRIX_TRSM

// This will select the threshold below which small matrix code will be called.
#define BLIS_SMALL_MATRIX_THRES        700
#define BLIS_SMALL_M_RECT_MATRIX_THRES 160
#define BLIS_SMALL_K_RECT_MATRIX_THRES 128

#define BLIS_SMALL_MATRIX_A_THRES_M_SYRK 96
#define BLIS_SMALL_MATRIX_A_THRES_N_SYRK 128

 


How do these thresholds relate to each other?


 

Thank you,

Matthews, Devin

Jul 17, 2024, 6:49:25 PM
to Igor Kozachenko, blis-devel

Question 1: This file must come from AMD’s version of BLIS, since Zen4 support hasn’t been merged into vanilla BLIS yet. It looks like the maximum block size is being set “incorrectly”; that is, because 320 < 512, an edge case can never actually be merged. I’ll pass this on to our colleagues at AMD.

 

Question 2: The AMD version of BLIS has two separate mechanisms for dealing with small matrix multiplications. The “SUP” mechanism also exists in vanilla BLIS and uses the thresholds set in bli_cntx_init_zen4. The other “small matrix” mechanism is AMD-specific and uses a different set of thresholds. From https://github.com/amd/blis/blob/7c564c74e103249b52636e6cfc5a93ba8c2b0406/frame/compat/bla_gemm_amd.c it looks like a) those macros aren’t actually used and the thresholds are hardcoded instead and b) this check only happens when dgemm is called and not bli_gemm/bli_dgemm.
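As a rough illustration of how a threshold triple such as MT/NT/KT can gate a small-matrix code path, here is a minimal C sketch. The decision rule it uses (take the small path when any one of m, n, k is below its threshold) is an assumption made for this example and is not confirmed anywhere in this thread; AMD's fork may apply a different test. The type sup_thresh_t and the function small_path_selected are hypothetical names, not BLIS identifiers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for one datatype's thresholds (not a BLIS type). */
typedef struct {
    size_t mt; /* threshold on m */
    size_t nt; /* threshold on n */
    size_t kt; /* threshold on k */
} sup_thresh_t;

/* ASSUMPTION: the small-matrix path is chosen when ANY dimension falls below
 * its threshold. This rule is illustrative only. */
static bool small_path_selected(size_t m, size_t n, size_t k, sup_thresh_t t)
{
    return m < t.mt || n < t.nt || k < t.kt;
}
```

With the double-precision values quoted in the question (MT = 1000, NT = 1000, KT = 220), this rule would select the small path for m = n = 2000, k = 100, and the conventional path for m = n = k = 2000.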

 

Devin Matthews

 
