Philipp,
I've done some convergence studies using a prior version of VSPAERO (before the 3.16.1 release) and have found the following.
For simple wings in isolation, you aren't getting any additional useful information by setting the Num_W tessellation beyond 61, or maybe 81 if you really think that you need it based on your LE/TE clustering.
As with any numerical solver, you want the cells of your lattice or grid to be relatively close to an aspect ratio of 1. This implies that your spanwise spacing, Num_U, should be set according to your span and clustering intentions. The SectTess_U parameter is limited to 100, so if (and it's a big IF) you decide that you need more than 100 spanwise sections, you will have to split the wing into multiple sections, but I don't recommend this.
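As a rough illustration of the aspect-ratio point above, here is a minimal Python sketch that estimates a spanwise point count for a given chordwise tessellation. It assumes uniform spacing (no clustering), a constant chord, and that Num_W wraps around the airfoil so each surface gets about (Num_W - 1)/2 chordwise panels. The function name and the `target_aspect` argument are made up for this example and are not part of OpenVSP.

```python
SECT_TESS_U_LIMIT = 100  # per-section limit quoted in the post above


def spanwise_points_for_aspect(semi_span, chord, num_w, target_aspect=1.0):
    """Estimate spanwise points so each cell is close to target_aspect.

    target_aspect is (spanwise cell width) / (chordwise cell length).
    Assumes uniform spacing and a constant chord; both are simplifications.
    """
    chordwise_panels = (num_w - 1) // 2  # assumed: Num_W wraps upper and lower surface
    spanwise_panels = round(semi_span * chordwise_panels / (chord * target_aspect))
    return max(2, spanwise_panels + 1)  # points = panels + 1


points = spanwise_points_for_aspect(semi_span=5.0, chord=1.0, num_w=61)
if points > SECT_TESS_U_LIMIT:
    print(f"{points} points exceeds the per-section limit; relax the aspect ratio")
```

With these sample numbers a strict aspect ratio of 1 asks for more points than one section allows, which is why "relatively close" to 1 is usually the practical target.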
More relevant to your issue is that you want to put a high cell concentration where the gradients are strongest, e.g., at wingtips, behind propellers, and in wake interactions. This is true for both panel mode and VLM mode.
Another thing you might try is changing the base units of your model. I've found that scaling everything to inches gives steadier results, which leads me to think that there is a fixed decimal truncation somewhere in the normalization of values in the code. You can get a 10^-2 residual difference this way. Remember that OpenVSP operates in a relatively unitless fashion (length normalized), so you'll want to change the velocity and other flow inputs to match the units that you are using. I'm not sure if the Reynolds number in VSPAERO is length normalized or not. I've always assumed that the value of ReCref implies a Reynolds number based on the reference chord.
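To keep the inputs consistent when changing base units, something like the following Python sketch could help. The dictionary keys are hypothetical labels for this example, not VSPAERO parameter names, and the sketch only covers quantities whose scaling is unambiguous: lengths scale with the factor, areas with its square, and a velocity in length per second with the factor. How ReCref should be treated is left open here, as it is in the paragraph above.

```python
FT_TO_IN = 12.0  # feet to inches


def rescale_inputs(inputs, factor):
    """Rescale length-based inputs by a length conversion factor.

    `inputs` uses hypothetical keys: span, chord, area, velocity.
    """
    return {
        "span": inputs["span"] * factor,          # length
        "chord": inputs["chord"] * factor,        # length
        "area": inputs["area"] * factor ** 2,     # length squared
        "velocity": inputs["velocity"] * factor,  # length per second
    }


in_feet = {"span": 30.0, "chord": 5.0, "area": 150.0, "velocity": 200.0}
in_inches = rescale_inputs(in_feet, FT_TO_IN)
print(in_inches)
```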
Assuming that you are trying to find the best "wall time" for a large batch operation, here is something else to consider. Check the actual specs of your machine, whether it's a PC or a server, and find out the number of cores per processor (right click Computer and Google the processor). Most machines today use hyper-threading, so you have additional "virtual cores" available. For example, a dual-core Intel i5 has 2 physical cores and 2 virtual cores for a total of 4. Num CPU will use up to this number of threads but no more. Here's where it gets fun. Hyper-threading actually slows down VSPAERO past a certain threshold. The calculations are so fast that the act of passing information between virtual and physical cores takes longer than the solution. If you run on a server with 40 cores (20 physical + 20 virtual), you'll find that trying to run on 20 cores will take MUCH longer than just 10. On a laptop, it doesn't matter much because you're only threading between two cores, so who cares. Just set Num CPU to 4 and let it chug. But on a server, say with two 10-core physical processors, you want to set Num CPU to 10. This keeps most of the calculation on one physical processor and runs very quickly: about 3 minutes compared to 10-15 minutes with Num CPU at 20.
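The server rule of thumb above (stay within the physical cores of one processor) can be written as a small Python sketch. The function name is made up for this example, and the defaults assume two hardware threads per physical core. You have to supply the socket count yourself; `os.cpu_count()` only reports logical CPUs.

```python
def physical_cores_per_socket(logical_cpus, threads_per_core=2, sockets=1):
    """Return the physical core count of a single processor.

    logical_cpus is what the OS reports (e.g. from os.cpu_count()).
    threads_per_core and sockets are assumptions you must check
    against the actual machine specs.
    """
    physical = logical_cpus // threads_per_core
    return max(1, physical // sockets)


# Server from the post: 40 logical CPUs across two 10-core processors.
print(physical_cores_per_socket(40, threads_per_core=2, sockets=2))
```

For the small laptop case the same function would return 2, but as noted above the difference there is small enough that using all 4 threads is fine.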
Rob is absolutely right about the number of wake iterations. For simple wings in isolation, you only really need two, or three for a sanity check if you like. For propeller interaction or complex geometries/combinations, you'll use 5 to 7 but should never really need more than 10. What you should be checking is the History file and the convergence plot of the log(residual). You'll see that it quickly drops to around 10^-4 or 10^-5. Past that is just noise for the methods used. If you watch the GMRES iterations in the solver window, it shouldn't take long to get a 10^-1 drop in residual. Look at Reduction and Maximum. VSPAERO considers a wake iteration converged when it sees a Reduction of -1 (think 10^N; it's trying to reduce the residual to 1/10 of its value), and Maximum tracks the overall residual reduction (you'll see something like -3 or -4).
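If you pull residual values out of a run by hand, a minimal Python sketch like this shows the two checks described above: how many orders of magnitude the residual has dropped, and when it first reaches the 10^-4 level. The function names are made up for this example, and it works on a plain list of residuals. It does not read the History file, whose layout is not assumed here.

```python
import math


def log10_drop(residuals):
    """log10 of (last residual / first residual); -1 means a tenfold reduction."""
    return math.log10(residuals[-1] / residuals[0])


def first_index_at_or_below(residuals, threshold=1e-4):
    """Index of the first residual at or below threshold, or None if never reached."""
    for i, r in enumerate(residuals):
        if r <= threshold:
            return i
    return None


sample = [1.0, 0.1, 1e-3, 5e-5, 4e-5]  # made-up values for illustration
print(log10_drop(sample), first_index_at_or_below(sample))
```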
Some final words of advice. Check your actual airfoil points and look for highly clustered points near the LE or TE. This can cause issues in the solution. Drop the airfoil into XFOIL and respline the points into something more reasonable (140-200 points total) and you should be okay. The types of convergence that you can get with STAR-CCM+, FUN3D, OVERFLOW, etc. are way beyond the scope of this level of analysis. It's overkill. If you're looking at a GA or commercial a/c, how much of a difference is 0.001 in CL? On a small 10k lb vehicle, it's about 5 lbs. On a larger 100k lb a/c, it would be 50 lbs. That's about a 0.05% difference in lift. Even if you have residual differences at the 10^-2 level, it's a very small physical difference.
A lot of this is probably beyond the scope of what you need to know, but this seemed like an easy enough way to put the information out there before I formalize everything in an online training program. Hope this helps.
Cheers.