Hardware executes invocations in fixed-width groups called "subgroups" (warps on NVIDIA, wavefronts on AMD). The subgroup size is the number of SIMD lanes that execute together in the SIMT model. It depends on the vendor and architecture, but I'm told it's typically 32 or 64. There are details that make things more complicated than that, but that's a first approximation.
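As a rough illustration of what "fixed-width" means, here is a minimal Python sketch that splits one workgroup's invocations into subgroups. It assumes invocations are packed linearly by local index, which is a common layout but not something the graphics APIs guarantee in general. The helper name split_into_subgroups is hypothetical, and lanes with no invocation are shown as None.

```python
from typing import List, Optional


def split_into_subgroups(workgroup_size: int, subgroup_size: int) -> List[List[Optional[int]]]:
    """Split one workgroup's local invocation indices into fixed-width subgroups.

    Hypothetical helper for illustration only. Assumes invocations are packed
    linearly by local index; a lane with no invocation is returned as None.
    """
    if workgroup_size <= 0 or subgroup_size <= 0:
        raise ValueError("sizes must be positive")
    num_subgroups = -(-workgroup_size // subgroup_size)  # ceiling division
    subgroups = []
    for s in range(num_subgroups):
        lanes = []
        for lane in range(subgroup_size):
            index = s * subgroup_size + lane  # local invocation index for this lane
            lanes.append(index if index < workgroup_size else None)
        subgroups.append(lanes)
    return subgroups


if __name__ == "__main__":
    # The same 128-invocation workgroup maps to a different number of subgroups
    # depending on the hardware's subgroup size.
    for sg_size in (32, 64):
        groups = split_into_subgroups(128, sg_size)
        print(f"subgroup size {sg_size}: {len(groups)} subgroups of {sg_size} lanes")
```

With a workgroup of 48 and a subgroup size of 32, for example, this model gives two subgroups, and the last 16 lanes of the second one are empty.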
I'm not an expert in this, so take what I say with a grain of salt (and someone can correct me if I'm wrong). Anecdotally, if your workgroup is smaller than the subgroup size on a given piece of hardware, drivers usually just leave the remaining lanes empty instead of packing multiple workgroups into one subgroup. On the other hand, if the workgroup (say 64) is larger than a subgroup (say 32), it gets split into several subgroups that are scheduled separately rather than as one wide SIMD operation. Conceptually, the hardware runs the first 32 invocations up to the first control barrier (such as barrier() in GLSL), then switches to the other 32, and so on until it's done; in practice it may also interleave those subgroups or run them on different SIMD units at the same time.
Hence, for reasonable performance, we want workgroups that are a multiple of, and at least as large as, the largest common subgroup size (64), without being so large that they defeat parallelization. The simple choice of 64 works well overall, although tuning the workgroup size for specific hardware and shader logic can improve performance further.
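The arithmetic behind choosing 64 can be sketched in Python as below. This is a simplified model, not a description of how any particular driver works: it assumes, as in the anecdote above, that each workgroup gets its own subgroups with leftover lanes left empty, and that the subgroup sizes of interest are 32 and 64. The names lane_utilization, dispatch_count, and COMMON_SUBGROUP_SIZES are hypothetical.

```python
COMMON_SUBGROUP_SIZES = (32, 64)  # assumption: the "typically 32 or 64" sizes


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive ints."""
    return -(-a // b)


def lane_utilization(workgroup_size: int, subgroup_size: int) -> float:
    """Fraction of SIMD lanes doing useful work for one workgroup.

    Simplified model: the workgroup is split into ceil(wg / sg) subgroups,
    and lanes past the end of the workgroup stay empty (no packing of
    several workgroups into one subgroup).
    """
    subgroups = ceil_div(workgroup_size, subgroup_size)
    return workgroup_size / (subgroups * subgroup_size)


def dispatch_count(num_items: int, workgroup_size: int) -> int:
    """Workgroups needed so every item gets one invocation (rounded up)."""
    return ceil_div(num_items, workgroup_size)


if __name__ == "__main__":
    for wg in (16, 32, 48, 64, 128):
        row = ", ".join(
            f"subgroup {sg}: {lane_utilization(wg, sg):.0%}" for sg in COMMON_SUBGROUP_SIZES
        )
        print(f"workgroup {wg:>3} -> {row}")
    # 1000 items at 64 per workgroup: 16 workgroups, the last one partly idle
    print("workgroups for 1000 items:", dispatch_count(1000, 64))
```

Under this model, 64 is the smallest workgroup size that keeps every lane busy for both subgroup sizes, while 16 or 32 leave half or more of a 64-wide subgroup idle. dispatch_count shows the usual rounding up when the item count isn't a multiple of the workgroup size, in which case the shader needs to bounds-check its invocation index.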