demographic modeling of high polyploid populations


Juan Manuel Gorospe Ballesteros

Aug 7, 2023, 6:21:43 AM
to dadi-user

Dear all, 


I am studying the role of gene flow in the speciation of two sister species. Polyploidy has been suggested for both species, and chromosome counts suggest they are decaploid (although this is uncertain, as the sister genus is not known). I am working with a non-model group, so I have no reference genome, and I used de novo assembly of RADseq loci to generate a VCF file. I have several population pairs, each comprising two populations (one from each species) with 10 individuals per population.


For my demographic analyses I followed the D. Portik pipeline to evaluate the fit of ‘no migration’, ‘ancient migration’ and ‘secondary contact’ models. For one of the population pairs, I used a diploidized dataset and also tried the analysis accounting for polyploidy.

For the polyploid dataset, I used the polyRAD pipeline to call SNPs assuming my species are autodecaploid (based on phylogenetic and genetic structure analyses that do not suggest the existence of two subgenomes). Then I applied quality filters to each dataset (diploidized and decaploid) for minimum allele frequency, missing data and depth of coverage. I ran the ‘standard’ dadi approach on each dataset, but accounted for the larger sample size of the decaploid dataset.

The diploidized dataset contains about 11800 SNPs, and the maximum number of segregating sites is about 10500. The polyploid dataset contains about 5600 SNPs; projecting down to 50 (in each population) gives 1703 segregating sites (1706 without projection). I chose to project down to 50 to save computational resources, since the number of segregating sites barely changes.

Based on AIC, both the diploidized dataset and the decaploid dataset projected down to 50 support the ‘symmetric migration model’.
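
For readers following along, here is a minimal sketch of the kind of AIC comparison described above, using the standard formula AIC = 2k - 2 ln(L). The log-likelihoods and parameter counts are hypothetical placeholders chosen for illustration; they are not results from this dataset, and the model labels are only stand-ins for the models being compared.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of free parameters) per model.
# These numbers are made up for illustration only.
models = {
    "no_migration": (-1250.0, 3),
    "ancient_migration": (-1235.0, 5),
    "secondary_contact": (-1232.0, 5),
    "symmetric_migration": (-1220.0, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)

# Delta-AIC relative to the best model; smaller is better.
delta = {name: score - scores[best] for name, score in scores.items()}

for name in sorted(delta, key=delta.get):
    print(f"{name:22s} AIC={scores[name]:8.1f}  dAIC={delta[name]:6.1f}")
```

The model with the lowest AIC is preferred, and the delta-AIC values indicate how much weaker the support for the other models is.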


With this in mind, I would like to raise some concerns about my analyses:

 

I am unsure whether using the ‘standard’ dadi approach for an autopolyploid dataset is fine, or whether it violates some assumptions of the method.


Also, since the SNP calling already involves some uncertainty, I would like to know whether my SFS for the polyploid dataset looks correct. It is quite different from the diploidized SFS and from what I have seen from other people and in publications.

I enclose three SFS plots: the decaploid SFS projected down to 50, the decaploid SFS without projection, and the diploidized SFS.

Would you have any opinion on how to interpret the SFS? Would you have any recommendations regarding projecting down or not in the case of the decaploid dataset?


Any advice is greatly appreciated. Thank you


Juan Manuel


decaploid SFS plot without any projection 


decaploid SFS plot projecting down to 50 


diploidized SFS plot

Ryan Gutenkunst

Aug 11, 2023, 7:38:25 PM
to dadi-user
Hello Juan,

Interesting system!

1) The dadi inference approach should be valid for autopolyploids. The one exception is in interpreting the parameter values. Essentially, each of your decaploid individuals contains 5 “diploid individuals”. So when you convert parameters from genetic units into physical units, you’ll need to take that into account. (For example, the default calculation of theta = 4*Nref*mu*L assumes diploidy. So if you use this formula to calculate Nref=5000 “diploid individuals”, your actual estimated number of individuals would be 1000.)

2) I don’t see anything obviously problematic with the polyploid spectra. You could try projecting them down to the same sizes as your diploidized SFS, just to ensure they are qualitatively similar to the diploidized SFS.
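
To illustrate the unit conversion in point 1, here is a minimal sketch. It generalizes the diploid formula to theta = 2*ploidy*N*mu*L, which is an assumption that follows from treating each decaploid individual as 5 "diploid individuals" with all chromosome copies exchangeable. The function name and the values of mu and L are hypothetical placeholders, not values from this study.

```python
def individuals_from_theta(theta, mu, L, ploidy=2):
    """Number of individuals implied by theta = 2 * ploidy * N * mu * L.

    For ploidy=2 this reduces to the diploid formula theta = 4 * N * mu * L.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return theta / (2 * ploidy * mu * L)


# Hypothetical inputs, for illustration only.
mu = 1e-8   # assumed per-site, per-generation mutation rate
L = 1e6     # assumed effective sequence length in sites

# A theta that the diploid formula would translate to Nref = 5000.
theta = 4 * 5000 * mu * L

n_as_diploid = individuals_from_theta(theta, mu, L, ploidy=2)
n_as_decaploid = individuals_from_theta(theta, mu, L, ploidy=10)

print(f"diploid interpretation:   {n_as_diploid:.0f} individuals")
print(f"decaploid interpretation: {n_as_decaploid:.0f} individuals")
```

With these inputs the diploid reading is five times the decaploid one, which is the 5000 versus 1000 relationship in the example above.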

Do be careful in interpreting results from a standard set of models like the Portik pipeline. If there are population size dynamics you’re not modeling, that can bias inferences of migration: https://academic.oup.com/mbe/article/38/7/2967/6149129 .

Best,
Ryan

Juan Manuel Gorospe

Aug 23, 2023, 9:33:12 AM
to dadi-user
Hello Ryan,

Thank you very much for your reply and your advice.

I will keep in mind your suggestions for the interpretation of parameters. Also, I am trying now to project down my decaploid SFS to the same sizes as my diploidized SFS.
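
For context, here is a sketch of what projecting a joint SFS down to a smaller sample size involves. In dadi itself this is done with the `project` method of a `Spectrum` object (for example `fs.project([20, 20])`). The pure-Python functions below are hypothetical helpers that illustrate the underlying hypergeometric subsampling; they are not dadi's implementation. The toy spectrum is made up, with 100 chromosome copies per population (10 decaploid individuals) projected to 20 (10 diploidized individuals).

```python
from math import comb


def project_1d(sfs, m):
    """Project a 1D SFS from n = len(sfs) - 1 chromosomes down to m."""
    n = len(sfs) - 1
    if not 0 < m <= n:
        raise ValueError("target size must be between 1 and the original size")
    out = [0.0] * (m + 1)
    for i, count in enumerate(sfs):
        if count == 0:
            continue
        # Hypergeometric probability of seeing j derived alleles in a
        # subsample of m chromosomes, given i derived alleles among n.
        for j in range(max(0, m - (n - i)), min(i, m) + 1):
            out[j] += count * comb(i, j) * comb(n - i, m - j) / comb(n, m)
    return out


def project_2d(sfs, m1, m2):
    """Project a joint SFS (sfs[i][j]: i in pop 1, j in pop 2) to (m1, m2)."""
    # First project the population-2 axis, one row at a time.
    rows = [project_1d(row, m2) for row in sfs]
    # Then project the population-1 axis, one column at a time.
    cols = [project_1d([r[j] for r in rows], m1) for j in range(m2 + 1)]
    return [[cols[j][i] for j in range(m2 + 1)] for i in range(m1 + 1)]


# Toy joint SFS with 100 chromosome copies per population (made-up counts).
full = [[0.0] * 101 for _ in range(101)]
full[1][0] = 10.0    # singletons private to population 1
full[50][50] = 5.0   # sites at intermediate frequency in both populations

small = project_2d(full, 20, 20)
total = sum(map(sum, small))
still_segregating = total - small[0][0] - small[20][20]
print(f"total sites: {total:.2f}, still segregating: {still_segregating:.2f}")
```

Projection preserves the total number of sites, but some of them land in the corner entries (allele absent from, or fixed in, the subsample), which is why the count of segregating sites drops after projecting down.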

Thanks for the heads-up about population size dynamics. I was not considering models with population size changes because my study group comprises plants from the alpine zone of tropical mountains, and it is hypothesised that during the Pleistocene (when the two species I focus on likely diverged) population sizes fluctuated repeatedly, driven by climatic oscillations. Given this, I thought that accounting for a population size change with just two epochs would be unrealistic, and that a model reflecting the hypothesis would be extremely complex. Do you have any opinion on this?

Best,
Juan Manuel

Ryan Gutenkunst

Aug 23, 2023, 6:06:35 PM
to dadi-user
Hello Juan,

In general, any demographic history model will be a crude approximation to the complex dynamics the species experienced. But even a simple two-epoch model will be a closer approximation than simply ignoring size changes, so it is likely to improve the quality of your other inferences.

Best,
Ryan
