Hi Josquin,
Wow that's quite the fun research project! And quite the neat scientific background.
Times like this I wish I had free time to build a different version of STITCH that I think would work better on these kinds of situations. But alas, I am overwhelmed with teaching and marking etc. But I think there are ways you can tweak STITCH that are likely relevant to your situation.
K=40 (or even K=20) should hopefully still be OK. What you might need to tweak are as follows. So you might need to set nGen much higher than you would anticipate, because at K=400 it might be true that everyone would have an ancestor ~3-5 generations ago, at K=20 or K=40, the time until your sample set had 20 or 40 common ancestors might be a lot higher (like 10,000). You should also take a look at expRate, minRate and maxRate the recombination rate bounds (related to nGen) (set to defaults appropriate for mice and human of 0.5 cm / Mb with bounds at 100 and 0.1, but again, might need different for your situation). If you turn on plot_shuffle_haplotype_attempts those plots are usually pretty useful to get a sense of how recombination is affecting things (if you generate one of those plots, I'm happy to take a look and walk you through it). Going from randomly initiated ancestral haplotypes to good ancestral haplotypes requires some powerful heuristics, which often get better with tweaking of parameters like these. Also, I've seen good results before dropping shuffle_bin_radius a lot. This is default 5000, and again, is set up for recombination hotspots like you would see in mice and humans, where recombination is clustered into a small number of hotspots representing a population with only a few active PRDM9 alleles, while your trees, likely do not have recombination hotspots, or there are so many of them as to make "hotspots" almost irrelevant. shuffle_bin_radius controls how much smoothing to perform here. With many many more SNPs you can identify them more easily than in mice / humans, so you don't need to smooth things (in fact, over-smoothing is a problem).
But yeah, happy to take a look at some plots with you, to try and tweak STITCH to work as well as possible here. Either email back to the list or just my gmail. This can be helpful, this tweaking. For example, on work with Frank Chan and colleagues, now on bioRxiv
https://www.biorxiv.org/content/10.1101/2020.05.25.113688v2, for butterflies, with very high Ne and recombination rate, we saw substantial improvements in accuracy changing the default parameters (in that case to shuffle_bin_radius = 100, maxDifferenceBetweenReads = 1e3 (reduce impact of each read, as mapping artefacts are high like with your assembly), shuffleHaplotypeIterations <- c(5, 10, 15, 20, 30, 40) (more of these as useful), niterations = 60 (to accomodate more shuffleHaplotypeIterations, K = 10 (appropriate here), nGen = 100000 (wild population).
Best wishes,
Robbie