Vertex coordinates are stored in .surf.gii files, but they are specific to each individual. Group averages of these coordinates are a poor representation of cortical location and folding in a typical subject, even with nonlinear volume registration.
Fundamentally, volume registration aligns many parts of the human cortex poorly, largely because folding patterns vary so much between individuals that matching them would require absurd distortions to "flatten and re-fold" one subject onto another. Surface-based methods get around this problem by establishing cross-subject correspondence without anatomical deformation fields. Areal feature registration and the CIFTI file format improve on this: the registration aligns subjects based on features that identify functional areas, and CIFTI combines the surface data with subcortical structures into a single file for easier full-brain data handling. See our paper for details about the improvements over volume-based analysis:
However, because surface methods largely ignore anatomical coordinates when establishing correspondence, the group average surfaces are smoother than individual surfaces, particularly in areas of high folding variability, and so are useful mostly for display or approximation. You should instead use the MMP areas with each individual's surface files (.surf.gii), usually midthickness, to find where in the volume that piece of cortex is in that subject. In essence, for the cortex, every human subject has their own version of MNI space that is substantially more accurate to that subject's data than any groupwise version of MNI space.
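As a rough illustration of looking up an area on an individual's surface, here is a minimal Python sketch. It assumes you have already loaded the subject's midthickness vertex coordinates (in mm) and a per-vertex list of area label keys; the function name, the toy coordinates, and the label key 7 are all made up for the example and are not from our data or tools. Because cortex is folded, the centroid of an area can fall outside the gray matter, so the sketch also reports the area vertex closest to the centroid.

```python
import math

def area_location(coords, labels, area_key):
    """Return (centroid, nearest_vertex_index) for one area on one subject's surface.

    coords:   list of (x, y, z) vertex coordinates in mm (e.g. midthickness)
    labels:   list of integer label keys, one per vertex
    area_key: the label key of the area of interest (hypothetical value here)
    """
    idx = [i for i, lab in enumerate(labels) if lab == area_key]
    if not idx:
        raise ValueError(f"no vertices carry label {area_key}")
    n = len(idx)
    # Mean position of the area's vertices in this subject's own space.
    centroid = tuple(sum(coords[i][k] for i in idx) / n for k in range(3))
    # The centroid may lie off the cortical sheet; pick a real vertex near it.
    nearest = min(idx, key=lambda i: math.dist(coords[i], centroid))
    return centroid, nearest

# Toy data standing in for one subject's surface (not real coordinates).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (10.0, 10.0, 10.0)]
labels = [7, 7, 7, 0]

centroid, nearest = area_location(coords, labels, 7)
print("centroid:", centroid)
print("nearest vertex:", nearest, coords[nearest])
```

Run per subject, this gives a different coordinate for the same area in each individual, which is the point: the area label is the common reference, not the coordinate.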
For TMS or other spatially-targeted interventions, it would make more sense to work in the subject's rigid-aligned space (the T1w/ folder in our data), rather than one that has been distorted to look more like the MNI template. However, I understand that some of the software used for this has assumptions built in. If you know the details of how the software translates between MNI and subject space, you could in theory calculate accurately where the target should be in subject space, and then put it through the reverse transform before giving it to the software, to achieve the correct positioning.
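To make the "reverse transform" idea concrete, here is a small Python sketch under a strong assumption: that the software maps MNI coordinates into subject space with a single known 4x4 affine. The matrix values and the name `mni_to_subject` are invented for illustration; real software may use a nonlinear warp or a different convention, in which case you would need the inverse of whatever it actually applies.

```python
def apply_affine(m, p):
    """Apply a 4x4 affine (nested lists) to a 3D point."""
    return tuple(sum(m[i][k] * p[k] for k in range(3)) + m[i][3] for i in range(3))

def invert_affine(m):
    """Invert a 4x4 affine of the form [[R, t], [0, 1]]."""
    r = [row[:3] for row in m[:3]]
    t = [row[3] for row in m[:3]]
    det = (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
           - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
           + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))
    if abs(det) < 1e-12:
        raise ValueError("affine is not invertible")
    # Inverse of the 3x3 part via the adjugate (cyclic-index cofactors).
    inv = [[(r[(j + 1) % 3][(i + 1) % 3] * r[(j + 2) % 3][(i + 2) % 3]
             - r[(j + 1) % 3][(i + 2) % 3] * r[(j + 2) % 3][(i + 1) % 3]) / det
            for j in range(3)] for i in range(3)]
    # Inverse translation is -R^-1 t.
    inv_t = [-sum(inv[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [inv[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# ASSUMPTION: the affine the software uses to go from MNI to subject space.
mni_to_subject = [[2.0, 0.0, 0.0, 10.0],
                  [0.0, 2.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, -5.0],
                  [0.0, 0.0, 0.0, 1.0]]

# Target found in the subject's own space (e.g. from their midthickness surface).
subject_target = (12.0, 4.0, -1.0)

# Coordinate to hand to the software so that it lands on subject_target.
mni_input = apply_affine(invert_affine(mni_to_subject), subject_target)
print("give the software:", mni_input)
print("software lands at:", apply_affine(mni_to_subject, mni_input))
```

The coordinate you hand to the software is then not a meaningful group MNI coordinate; it is just whatever input makes the software's own transform land on the subject-specific target.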
On the other hand, TMS has a pretty large effect zone as I understand it, so "ballpark coordinates" may usually hit the area of interest anyway.
Tim