The paper provides analytical models that explain how the different parameters used in BLIS can be mapped to hardware features on CPU architectures.
The micro-tile of C (MR x NR) must be large enough to avoid stalls in the computational pipelines, yet small enough to fit within the available architectural registers. Either MR or NR is set to a multiple of the SIMD vector length.
For the Haswell machine, the minimum micro-tile size (MR x NR) is 40 elements for double precision, and all three configurations meet this minimum. They differ in how the microkernel is computed, e.g. in the choice of instructions.
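The figure of 40 can be reproduced with a small sketch, assuming the lower bound takes the form MR x NR >= N_vec x L_fma x N_fma (elements per SIMD register, FMA latency in cycles, FMA instructions issued per cycle). The Haswell values below (4 doubles per 256-bit register, latency 5, 2 FMAs per cycle, 16 vector registers) are commonly cited figures and are assumptions here, not values taken from these notes. The `Core` class, the function names, and the candidate tile shapes are hypothetical and are not the three configurations mentioned above. The register check is deliberately crude: it only requires that the accumulators for C leave at least one register free for operands of A and B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    n_vec: int   # elements per SIMD register (assumed: 4 doubles in 256 bits)
    l_fma: int   # FMA latency in cycles (assumed)
    n_fma: int   # FMA instructions issued per cycle (assumed)
    n_regs: int  # architectural vector registers (assumed)


def min_tile_elems(core: Core) -> int:
    """Smallest MR*NR that keeps every FMA pipeline busy (assumed bound)."""
    return core.n_vec * core.l_fma * core.n_fma


def check_tile(core: Core, mr: int, nr: int) -> dict:
    """Check one candidate micro-tile against the three constraints."""
    # Either MR or NR must be a multiple of the SIMD vector length.
    vectorized = mr % core.n_vec == 0 or nr % core.n_vec == 0
    # Vector registers needed just to hold the MR x NR accumulators of C.
    acc_regs = (mr * nr) // core.n_vec
    return {
        "hides_latency": mr * nr >= min_tile_elems(core),
        "vectorized": vectorized,
        # Crude model: leave at least one register for A/B operands.
        "fits_registers": vectorized and acc_regs < core.n_regs,
    }


haswell = Core(n_vec=4, l_fma=5, n_fma=2, n_regs=16)

if __name__ == "__main__":
    print(min_tile_elems(haswell))  # 40
    # Hypothetical tile shapes, chosen only to exercise each constraint.
    for mr, nr in [(6, 8), (4, 4), (8, 8)]:
        print((mr, nr), check_tile(haswell, mr, nr))
```

Under these assumptions a 6 x 8 tile passes all three checks, a 4 x 4 tile is too small to hide the FMA latency, and an 8 x 8 tile uses all 16 registers for accumulators and leaves none for operands.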