Jakin,
I recommend that you update your plugins for v3. v2 is in maintenance mode now (only bugfix releases). Once 3.0.0 is complete, we will stop maintaining v2.
I don't have a single-precision build to run the benchmarks with, nor do I support such a build configuration. For 3.x, we plan to implement a mixed-precision mode that can give most of the performance benefits of single precision without loss of accuracy. Here are the double-precision performance numbers on V100:
lj-liquid: Hours to complete 10e6 steps: 0.6721325771882073
triblock-copolymer: Hours to complete 10e6 steps: 0.9222339257454073
microsphere: Hours to complete 10e6 steps: 14.320493993917205
quasicrystal: Hours to complete 10e6 steps: 1.691228061921023
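To compare these with throughput figures, the hours can be converted to time steps per second. The sketch below is only illustrative: it assumes the label "10e6 steps" is meant literally (10e6 = 1e7 steps), and the helper name `steps_per_second` is made up for this example, not part of HOOMD.

```python
# Hours to complete 10e6 steps, double precision on V100 (values from the list above).
HOURS_PER_RUN = {
    "lj-liquid": 0.6721325771882073,
    "triblock-copolymer": 0.9222339257454073,
    "microsphere": 14.320493993917205,
    "quasicrystal": 1.691228061921023,
}

# Assumption: "10e6" is read literally, i.e. 1e7 steps.
STEPS = 10e6


def steps_per_second(hours, steps=STEPS):
    """Convert 'hours to complete `steps` steps' into steps per second."""
    return steps / (hours * 3600.0)


for name, hours in HOURS_PER_RUN.items():
    print(f"{name}: {steps_per_second(hours):.0f} steps/s")
```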
These times are approximately 2x your single-precision numbers on the RTX 3090, which is consistent with the fact that the V100 and the RTX 3090 have approximately the same memory bandwidth and doubles are 2x the size of floats. HOOMD performance is entirely limited by the available memory bandwidth and the effectiveness of the on-chip cache.
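The bandwidth argument can be checked with a back-of-the-envelope estimate. This is a rough sketch, not a performance model: it assumes run time scales with bytes moved divided by memory bandwidth, ignores cache effects, and uses the published peak bandwidths of about 900 GB/s for the V100 and about 936 GB/s for the RTX 3090 (these figures are not from the benchmarks above).

```python
# Assumed peak memory bandwidths in GB/s (vendor specifications, not measured here).
V100_BANDWIDTH = 900.0
RTX3090_BANDWIDTH = 936.0

BYTES_PER_DOUBLE = 8
BYTES_PER_FLOAT = 4


def expected_time_ratio(bytes_a, bandwidth_a, bytes_b, bandwidth_b):
    """Ratio of run time A to run time B for a purely bandwidth-bound code.

    Time is modeled as (bytes moved) / (bandwidth), so the ratio is
    (bytes_a / bandwidth_a) / (bytes_b / bandwidth_b).
    """
    return (bytes_a / bandwidth_a) / (bytes_b / bandwidth_b)


# Double precision on V100 versus single precision on RTX 3090.
ratio = expected_time_ratio(
    BYTES_PER_DOUBLE, V100_BANDWIDTH, BYTES_PER_FLOAT, RTX3090_BANDWIDTH
)
print(f"Expected V100 double / RTX 3090 single time ratio: {ratio:.2f}")
```

Under these assumptions the estimate comes out slightly above 2, in line with the roughly 2x difference observed.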
Note that the V100 and the RTX 3090 are a generation apart. The datacenter counterpart of the RTX 3090 is the A100, but I don't have access to any A100s for testing.