Hi Richard,
Unlike on CPUs, where you typically set the number of threads for the whole program by hand, most GPU software tries to
maximize occupancy (i.e. hardware utilization) automatically. This needs careful tuning for each individual kernel: the optimal
block size can differ from one kernel to the next, because each kernel uses a different amount of registers, shared memory,
etc.
So yes, nvBowtie tries to select the best grid configuration for each of its many kernels.
(Incidentally, on CPUs this is not necessary because their few "fat" cores are overprovisioned with hardware resources, but this
obviously comes at a huge cost in energy efficiency.)
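
To make the per-kernel tuning concrete, here is a minimal Python sketch of a simplified occupancy model. It is only an
illustration, not nvBowtie's actual code: the per-multiprocessor limits in SMLimits are hypothetical round numbers rather than
those of a specific GPU, the names (SMLimits, KernelUsage, best_block_size) are made up for this example, and the model ignores
register and shared memory allocation granularity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-multiprocessor limits (hypothetical values, not a specific GPU)."""
    max_threads: int = 2048        # resident threads per multiprocessor
    max_blocks: int = 32           # resident blocks per multiprocessor
    registers: int = 65536         # 32-bit registers per multiprocessor
    shared_mem_bytes: int = 49152  # shared memory per multiprocessor


@dataclass(frozen=True)
class KernelUsage:
    """Resources one kernel needs (illustrative numbers only)."""
    regs_per_thread: int
    smem_per_block: int  # bytes of shared memory per block


def resident_blocks(kernel, block_size, sm):
    """Blocks that fit on one multiprocessor: the tightest limit wins."""
    limits = [
        sm.max_blocks,
        sm.max_threads // block_size,
        sm.registers // (kernel.regs_per_thread * block_size),
    ]
    if kernel.smem_per_block > 0:
        limits.append(sm.shared_mem_bytes // kernel.smem_per_block)
    return min(limits)


def occupancy(kernel, block_size, sm):
    """Fraction of the multiprocessor's thread slots that are occupied."""
    return resident_blocks(kernel, block_size, sm) * block_size / sm.max_threads


def best_block_size(kernel, sm=SMLimits(), candidates=(64, 128, 256, 512, 1024)):
    """Candidate with the highest modeled occupancy (smallest one on ties)."""
    return max(candidates, key=lambda b: occupancy(kernel, b, sm))


if __name__ == "__main__":
    smem_heavy = KernelUsage(regs_per_thread=32, smem_per_block=8192)
    reg_heavy = KernelUsage(regs_per_thread=40, smem_per_block=0)
    print(best_block_size(smem_heavy))  # 512
    print(best_block_size(reg_heavy))   # 64
```

With these assumed limits the shared-memory-heavy kernel does best with 512 threads per block and the register-heavy one with
64, so a single block size shared by all kernels would leave one of them under-occupied. In real CUDA code the runtime can
answer this kind of question for a compiled kernel via cudaOccupancyMaxPotentialBlockSize.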
best,
-jacopo