Model interpretation and high theta values

Matthew DeSaix

Jan 6, 2021, 10:39:40 AM

to dadi-user

Hi Ryan and dadi-user group,

Thank you for all your direct help with my other questions so far, and for having all the detailed threads to peruse.

I have fit three 1D models (one to three epochs) for an alpine specialist species. I was interested in its historical population change and expected declines since the LGM. For each model, I ran 100 iterations of the dadi_pipeline (100 optimization rounds per iteration; Portik et al. 2017) and checked the results for convergence. Only the one-epoch model appeared to converge to a global optimum (albeit with a very poor fit). The best two- and three-epoch runs did not necessarily converge on the same parameter values, though their parameters followed similar patterns: the best two-epoch runs all decreased in population size, and the best three-epoch runs increased and then decreased. While the two-epoch model had the highest likelihood, its theta values were extremely high, and the calculated T and Nu were orders of magnitude higher than I expected. L was already estimated at ~30,000,000, so even if that were an underestimate it would not produce the expected T and Nu. I did realize, though, that this species is part of a species complex with historically high levels of gene flow, so maybe that is driving the very high theta values. One aspect of this data set whose effect on dadi I'm unsure about is that males and females show high genetic differentiation (e.g. PCA picks up on sex differentiation over the very weak population structure), so I also ran dadi on just the males.
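For readers following the theta discussion, here is a minimal sketch of the usual conversion from scaled estimates to physical units, assuming the standard relationship theta = 4 * N_ref * mu * L, with T measured in units of 2 * N_ref generations. The function names, mutation rate, generation time, theta, Nu, and T below are all hypothetical placeholders; only L (~30,000,000) comes from the post.

```python
def theta_to_nref(theta, mu, L):
    """Reference effective size from theta = 4 * N_ref * mu * L."""
    return theta / (4.0 * mu * L)


def to_physical_units(theta, nu, T, mu, L, gen_time):
    """Convert scaled parameters to individuals and years.

    nu is a population size relative to N_ref; T is in units of
    2 * N_ref generations; gen_time is years per generation.
    """
    n_ref = theta_to_nref(theta, mu, L)
    return {
        "N_ref": n_ref,
        "N_current": nu * n_ref,
        "T_years": 2.0 * n_ref * T * gen_time,
    }


# Hypothetical inputs for illustration only (not values from the thread,
# apart from L).
result = to_physical_units(
    theta=12000.0, nu=0.1, T=0.05, mu=1e-8, L=30_000_000, gen_time=2.0
)
print(result)
```

Because N_ref scales linearly with theta, a theta inflated by some factor inflates N_ref, the absolute population sizes, and the times in years by that same factor.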

I hope that explanation sets up my questions well. I have attached my results, compiled from markdown (models, residuals, LL, thetas, parameters, L). My specific questions are:

1) Is it reasonable that a large species complex with gene flow would produce extremely high theta values when sampling just one species that is pretty small (abundance estimates ~95,000, effective size via SNeP ~1000)?

2) Even if a model isn't converging on a best set of parameters, if numerous runs after 100 optimizations follow the same pattern (e.g. Nu2 decreases from Nu1), is that reasonable support for the model, if not for the specific set of parameters?

Thanks so much, I really appreciate the help.

Matt

dadi-summary-rofi-short.html

Ryan Gutenkunst

Jan 12, 2021, 2:33:00 PM

to dadi-user

Hello Matt,

Sorry for the slow reply…

Thanks for sending along the plots, etc. Based on those, the three epoch model isn’t adding anything to the fit compared to the two epoch model, and in fact you’re not even succeeding in optimizing it. (The two epoch model is nested within the three epoch model, so the three epoch model should always give at least as good a likelihood.)
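The nesting argument can be turned into a quick sanity check on optimization output. This is only a sketch: the function name, tolerance, and log-likelihood values are hypothetical and are not taken from the attached results.

```python
def nested_fit_is_consistent(ll_simple, ll_complex, tol=1e-3):
    """Return True if the more complex model fits at least as well.

    When the simple model is nested within the complex one, the complex
    model's maximum log-likelihood cannot be lower than the simple
    model's. A lower value means the optimizer did not find the optimum.
    """
    return ll_complex >= ll_simple - tol


# Hypothetical best log-likelihoods from each set of optimization runs.
ll_two_epoch = -1520.4
ll_three_epoch = -1533.9

ok = nested_fit_is_consistent(ll_two_epoch, ll_three_epoch)
if not ok:
    print("Three-epoch optimization has not reached the two-epoch likelihood.")
```

One common remedy when this check fails is to restart the complex model from parameter values that reproduce the simpler model's best fit.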

In looking at the residual plots, I notice that the singletons and doubletons are very poorly fit by all the models. I question whether there is some issue with your SNP calling so that you’re missing rare variants (particularly singletons). I suspect the extreme parameter values you’re seeing are dadi contorting itself to fit some bias in your low-frequency variants. One way to check this is to run fits with the singletons masked. I bet you’ll get very different results if you ignore that bin in the FS.
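To illustrate why masking matters, here is a toy sketch in plain Python rather than dadi. It assumes an unfolded spectrum under the standard neutral model, where the expected count in bin i is proportional to 1/i, and estimates the scaling (theta) as the ratio of observed to expected counts over the unmasked bins, which is the maximum-likelihood scaling under a Poisson model. The data are synthetic and the function names are hypothetical. In dadi itself a Spectrum is a NumPy masked array, so the corresponding step would be setting the mask on the singleton entry before fitting; that is not shown here.

```python
def neutral_sfs_shape(n):
    """Expected unfolded SFS shape under neutrality: bin i ~ 1/i, i = 1..n-1."""
    return {i: 1.0 / i for i in range(1, n)}


def theta_scaling(observed, expected, masked=()):
    """Poisson maximum-likelihood scaling over the bins that are not masked."""
    keep = [i for i in expected if i not in masked]
    return sum(observed[i] for i in keep) / sum(expected[i] for i in keep)


n = 10
expected = neutral_sfs_shape(n)

# Synthetic data: true theta of 1000, but half of the singletons are missing.
observed = {i: 1000.0 * expected[i] for i in expected}
observed[1] *= 0.5

theta_all_bins = theta_scaling(observed, expected)
theta_masked = theta_scaling(observed, expected, masked=(1,))
print(theta_all_bins, theta_masked)
```

With the singleton bin included, the estimate is pulled well below the value recovered from the remaining bins. In a real fit, the demographic parameters can shift in the same way to absorb a biased bin.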

In answer to your questions:
1) If gene flow were large enough, you might be picking up signal for the entire complex, so yes it could lead to higher Ne.
2) I would be cautious about interpreting models that aren’t converging. That’s often anecdotally a sign that the model simply doesn’t fit the data well.
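One rough way to make "converging" concrete is to require that the top few runs agree in both likelihood and parameter values. The sketch below is a heuristic under stated assumptions: the function name, the choice of the top three runs, and both tolerances are arbitrary illustrations, and the run data are hypothetical.

```python
def has_converged(runs, top_n=3, ll_tol=0.05, param_rel_tol=0.05):
    """Heuristic convergence check over independent optimization runs.

    runs: list of (log_likelihood, params) tuples, params being a tuple
    of floats. Returns True if the top_n runs by likelihood are within
    ll_tol log-likelihood units of the best run and every parameter is
    within param_rel_tol (relative) of the best run's value.
    """
    best = sorted(runs, key=lambda r: r[0], reverse=True)[:top_n]
    if len(best) < top_n:
        return False
    best_ll, best_params = best[0]
    for ll, params in best[1:]:
        if best_ll - ll > ll_tol:
            return False
        for p_best, p in zip(best_params, params):
            if abs(p - p_best) > param_rel_tol * abs(p_best):
                return False
    return True


# Hypothetical runs: (log-likelihood, (nu, T)).
agreeing = [
    (-100.00, (0.100, 0.500)),
    (-100.01, (0.101, 0.505)),
    (-100.03, (0.099, 0.490)),
    (-140.00, (2.000, 0.010)),
]
# Similar likelihoods but very different parameters: a flat or ridged surface.
disagreeing = [
    (-100.00, (0.10, 0.50)),
    (-100.02, (0.02, 3.00)),
    (-100.04, (0.30, 0.05)),
]
print(has_converged(agreeing), has_converged(disagreeing))
```

The second case is the situation described in question 2: runs that share a qualitative pattern or a similar likelihood without agreeing on parameter values do not pass this kind of check.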

I’m also confused to hear males and females are showing up as genetically differentiated. Unless you’re including the sex chromosomes in your analysis (you shouldn’t), I can’t see how that would be possible. Mating is the ultimate in gene flow! Maybe there’s some biology I’m missing here, but that’s a big red flag to me.

Best,
Ryan


Matthew DeSaix

Jan 12, 2021, 6:51:30 PM

to dadi...@googlegroups.com

Thanks for your response, Ryan; that is all very helpful.

Regarding the singletons, I filtered the VCF on quality, depth, and proportion of individuals that had the variant. There should be no MAF filter that would have distorted the low-frequency bins. Could small, potentially inbreeding populations account for the deficiency of low-frequency variants? It seems, though, that the model would account for that with a sharp drop in population size. I found long runs of homozygosity that would indicate recent inbreeding. Anecdotally, I've talked with at least two other people who have found sex differentiation on PCAs in weakly structured species. Looking at genome-wide Fst, I do have peaks of differentiation that are not on sex chromosomes (I've double-checked the chromosome mapping results to make sure this wasn't due to poor mapping of scaffolds). I'm not sure if or how all of this ties together, but I'm still trying to figure out what's going on with the sex differentiation.

Thanks for your suggestions and I will check out the masking option.

Cheers,

matt


Ryan Gutenkunst

Jan 14, 2021, 5:30:29 PM

to dadi-user

Hi Matt,

Inbreeding would lead to a dearth of singletons, so incorporating that into your model may make a big difference. Recent inbreeding is not equivalent to a decrease in population size, since inbreeding affects heterozygosity directly.
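A toy calculation shows the direction of this effect. It assumes genotype frequencies at a biallelic site follow the standard inbreeding-coefficient form (heterozygotes at 2pq(1 - F), homozygotes at p^2 + Fpq and q^2 + Fpq) and that sampled diploids are independent. This is a simplified sketch, not dadi's inbreeding model, and the function name and numbers are hypothetical.

```python
def singleton_probability(p, n_diploids, F):
    """Probability that a variant at frequency p appears exactly once
    among the 2 * n_diploids sampled chromosomes.

    Exactly one copy requires exactly one heterozygote, with every other
    sampled individual homozygous for the other allele.
    """
    q = 1.0 - p
    het = 2.0 * p * q * (1.0 - F)      # heterozygote frequency
    hom_other = q * q + F * p * q      # homozygote for the other allele
    return n_diploids * het * hom_other ** (n_diploids - 1)


p, n = 0.01, 10
outbred = singleton_probability(p, n, F=0.0)
inbred = singleton_probability(p, n, F=0.3)
print(outbred, inbred)
```

For a rare variant, raising F moves copies out of heterozygotes and into homozygotes, so the variant is less likely to be seen exactly once and more likely to be seen in pairs. A size change alone reshapes the frequency spectrum but leaves genotype proportions at Hardy-Weinberg expectations, so it does not produce this pairing.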

Best,
Ryan

