Hi Ryan,
I have spent a lot of time optimizing some 2D models (>1,000 runs), using multiple rounds of 3-, 2- and 1-fold perturbations to try to explore the whole parameter space properly.
I was originally using a set of ~160,000 linked loci (no LD or physical pruning), with the JAFS built from 11 individuals from 2 populations using the full dataset with no projection (Group1-Group2_allSNPs_noprojection.png).
Once I had obtained my final model, the residuals looked good, with no clear structure (Group1-Group2_asym_mig_allSNPs_noprojection.pdf), but I had trouble getting parameter confidence intervals with the GIM (most intervals ranged from 0 to infinity despite running >1,000 bootstraps and trying several eps values).
Therefore, I decided to switch to a set of unlinked loci so that I could use the FIM. Based on rates of LD decay (LD_decay_allgroups.png), I chose to remove SNPs within 5 kb of each other to obtain such a dataset. Because this greatly reduced the number of SNPs, to 25,000, the resulting JAFS was quite sparse (Group1-Group2_unlinked_5kb_noprojection.png), and I therefore chose to project down to 9 individuals per population (Group1-Group2_unlinked_5kb_projection_18-18.png).
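For concreteness, distance-based thinning of this kind can be sketched as below. This is an illustrative sketch only, not necessarily the exact procedure used for this dataset: the function name `thin_snps`, the `(chrom, pos)` input format, and the greedy "keep the first SNP, then skip everything closer than the threshold" rule are all assumptions.

```python
from collections import defaultdict


def thin_snps(snps, min_dist=5000):
    """Greedy physical thinning (hypothetical helper).

    snps: iterable of (chrom, pos) tuples, pos in base pairs.
    A SNP is kept only if it lies at least min_dist bp beyond the
    previously kept SNP on the same chromosome.
    """
    # Group positions by chromosome so distances are never compared
    # across chromosomes.
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    kept = []
    for chrom in sorted(by_chrom):
        last_kept = None
        for pos in sorted(by_chrom[chrom]):
            if last_kept is None or pos - last_kept >= min_dist:
                kept.append((chrom, pos))
                last_kept = pos
    return kept


# Toy input, made up purely for illustration.
example = [
    ("chr1", 100), ("chr1", 3000), ("chr1", 5200), ("chr1", 9000),
    ("chr2", 50), ("chr2", 4000),
]
thinned = thin_snps(example, min_dist=5000)
```

Note that a greedy rule like this depends on where the scan starts, so a different starting SNP (or a random choice within each window) would retain a different subset of similar size.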
Now, after repeating the optimization steps and picking the best model, the residuals are highly structured, which is indicative of a spurious fit (Group1-Group2_20mis_unlinked_5kb_asym_mig_all4.pdf). I am not sure what the reason is. I have tried different thresholds for the physical pruning (e.g. 1 kb or 20 kb), but the problem remains. For comparison, I also projected down my initial set of linked SNPs and used the optimized parameters to visualize the residuals; this introduced a similar bias (Group1-Group2_20mis_asym_allSNPs_projected.pdf), although I haven't re-optimized the parameters for the projected spectrum in that case.
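To make explicit what projecting down does to the counts, here is a minimal sketch of the standard hypergeometric projection for a single population, in plain Python. It is an illustration under stated assumptions, not dadi's implementation: the function name `project_1d` and the list-based spectrum are hypothetical, and for a joint spectrum the same weights would be applied along each population axis in turn.

```python
from math import comb


def project_1d(sfs, m):
    """Project a 1D frequency spectrum down to m chromosomes (hypothetical helper).

    sfs: list of length n + 1, where sfs[i] is the number of SNPs whose
         derived allele is seen on i of the n sampled chromosomes.
    Returns a list of length m + 1 of expected counts after subsampling
    m of the n chromosomes without replacement.
    """
    n = len(sfs) - 1
    if m > n:
        raise ValueError("can only project down to a smaller sample size")

    projected = [0.0] * (m + 1)
    n_subsamples = comb(n, m)
    for i, count in enumerate(sfs):
        if count == 0:
            continue
        # A SNP at count i in n lands at count j in m with hypergeometric
        # probability C(i, j) * C(n - i, m - j) / C(n, m).
        for j in range(max(0, m - (n - i)), min(i, m) + 1):
            weight = comb(i, j) * comb(n - i, m - j) / n_subsamples
            projected[j] += count * weight
    return projected


# Toy spectrum for n = 4 chromosomes, made up purely for illustration.
toy_sfs = [0, 6, 3, 1, 0]
toy_projected = project_1d(toy_sfs, 2)
```

The total count is preserved, but part of it moves into the monomorphic bins (index 0 and index m), so the projected spectrum contains fewer segregating sites than the original.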
I should say that the model (divergence with asymmetrical migration) was picked based on LLR tests against 4 other models (including divergence without migration and a no-divergence model), where it was consistently better across runs.
Do you think the pruning/projection procedure is removing informative SNPs and causing this bias? Do you have suggestions for a better way to proceed?
Thanks a lot for your help,
Kind regards,
Hugo