For a single mesh (a big constraint), you do not get much speedup beyond 4 OpenMP threads (1 thread per physical core); the speedup maxes out at a factor of 2. Most CPUs have many more cores than that these days, but the extra cores will not help a single-mesh case. So a small number of cores with a fast clock rate and fast internal memory access is what you want to target.
But you should be aware that you will get much better performance by running the cores as MPI processes and breaking the domain up into multiple meshes. In that case, if you have a CPU with, say, 16 cores, then as long as you have enough RAM you can run a case with 16 meshes (one per MPI process), each with at least 1000 cells, and get much closer to linear speedup, meaning close to 16 times faster than a single mesh with one OpenMP thread.
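
To make the comparison concrete, here is a rough Python sketch of the two rules of thumb above. The function name `plan_run` and the constants are my own, made up for illustration. The numbers are upper bounds taken from the figures quoted in this post, not measurements, and the sketch does not check RAM.

```python
# Rules of thumb from the post above (assumed values, not benchmarks).
OPENMP_USEFUL_THREADS = 4    # little gain beyond 4 OpenMP threads on one mesh
OPENMP_MAX_SPEEDUP = 2.0     # single-mesh OpenMP speedup tops out near 2x
MIN_CELLS_PER_MESH = 1000    # each mesh should hold at least this many cells


def plan_run(n_cores, total_cells):
    """Return best-case speedup bounds for the two ways of using n_cores.

    Both speedups are relative to a single mesh with one OpenMP thread.
    """
    # Single mesh + OpenMP: threads beyond 4 are not expected to help.
    omp_threads = min(n_cores, OPENMP_USEFUL_THREADS)

    # Multiple meshes + MPI: one mesh per core, but never split the domain
    # so finely that a mesh drops below the minimum cell count.
    n_meshes = max(1, min(n_cores, total_cells // MIN_CELLS_PER_MESH))

    return {
        "openmp_threads": omp_threads,
        "openmp_speedup_cap": OPENMP_MAX_SPEEDUP,
        "mpi_meshes": n_meshes,
        # Linear speedup is the ideal; real runs land somewhat below it.
        "mpi_speedup_upper_bound": float(n_meshes),
    }


if __name__ == "__main__":
    print(plan_run(n_cores=16, total_cells=1_000_000))
    print(plan_run(n_cores=16, total_cells=8_000))
```

For the 16-core case with a large domain, the sketch suggests 16 meshes with a best case of 16x, against a cap of 2x for the single-mesh OpenMP route. For a small domain of 8000 cells, the minimum mesh size limits you to 8 meshes.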