In magmablas/sgemm_vbatched.cpp, there are three other interfaces besides magmablas_sgemm_vbatched. The one with almost no overhead and no extra memory copies is
magmablas_sgemm_vbatched_max_nocheck(
magma_trans_t transA, magma_trans_t transB,
magma_int_t* m, magma_int_t* n, magma_int_t* k,
float alpha,
float const * const * dA_array, magma_int_t* ldda,
float const * const * dB_array, magma_int_t* lddb,
float beta,
float **dC_array, magma_int_t* lddc,
magma_int_t batchCount,
magma_int_t max_m, magma_int_t max_n, magma_int_t max_k,
magma_queue_t queue );
As you can see, there are three extra parameters that hold the maximum values of (m, n, k) across the batch. You don't have to pass the exact maximums; upper bounds are fine, but the tighter the better. Note that this routine does not perform any error checks, so you have to be sure about the dimensions you pass.
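If you already have the per-matrix sizes on the host, one way to get max_m, max_n, and max_k is sketched below. This is only an illustration: compute_max_dims, MaxDims, and max_of are hypothetical helper names (not part of MAGMA), and magma_int_t is aliased to int here just so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in so this sketch builds without MAGMA; in real code the type
// comes from the MAGMA headers.
using magma_int_t = int;

// Hypothetical holder for the three extra arguments.
struct MaxDims { magma_int_t m, n, k; };

// Largest entry of a host-side size array (0 for an empty batch).
static magma_int_t max_of(const std::vector<magma_int_t>& v) {
    return v.empty() ? 0 : *std::max_element(v.begin(), v.end());
}

// Hypothetical helper: exact maxima of (m, n, k) across the batch,
// computed from host copies of the size arrays.
MaxDims compute_max_dims(const std::vector<magma_int_t>& h_m,
                         const std::vector<magma_int_t>& h_n,
                         const std::vector<magma_int_t>& h_k) {
    assert(h_m.size() == h_n.size() && h_n.size() == h_k.size());
    return { max_of(h_m), max_of(h_n), max_of(h_k) };
}

// Intended use (not compiled here; d_m, d_n, d_k, dA_array, ... are
// assumed to be your own device-side arrays):
//   MaxDims mx = compute_max_dims(h_m, h_n, h_k);
//   magmablas_sgemm_vbatched_max_nocheck(
//       transA, transB, d_m, d_n, d_k, alpha,
//       dA_array, d_ldda, dB_array, d_lddb, beta,
//       dC_array, d_lddc, batchCount,
//       mx.m, mx.n, mx.k, queue );
```

Since the routine does no checking, a helper like this is also a convenient place to validate the sizes yourself before making the call.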
Ahmad