Dear Ryan and dadi-users:
I'm new to dadi. I am working with a diploid crop species (lettuce). The populations are different horticultural types, for example crisphead, butterhead, or romaine. I'm trying to fit the 1D models to each population before moving on to the 2D models. The SNPs were called from RNA-seq data, and only silent SNPs (intergenic, intronic, and synonymous) were used in the dadi analysis. Because gene expression varies among samples, many SNP calls are missing, so I project down the sample size and use fs.S() to find the projection that maximizes the number of segregating SNPs.
For example, the butterhead population includes 27 individuals. I projected down to several sample sizes (the first attachment lists the number of segregating SNPs for each projected sample size) and then plotted the FS for each. With a sample size of 54 (no projection), the FS shows a zigzag pattern, but with a sample size of 20 (which maximizes S in this case), the FS looks normal.
To compare results across sample sizes, I fit the five 1D models implemented in dadi using sample sizes of 20, 30, 40, and 54 (see the second attachment for the inferred parameters). With a sample size of 20, four models (growth, bottlegrowth, two_epoch, and three_epoch) have almost the same likelihood but very different parameter values. As the sample size increases, the likelihood differences among these four models become larger, but they are still not significant. According to the literature, butterhead-type plants appeared about five hundred years ago, so the population must have gone through a bottleneck. But the results do not support the bottleneck model (the parameter values inferred by the bottlegrowth model hit their bounds), and I cannot determine which model is best. I would be grateful if you could answer a few questions:
(1) Should I project down the sample size?
(2) If yes, which sample size should I use? As you can see in the first attachment, the number of segregating SNPs does not differ significantly for sample sizes from 10 to 40. Will projecting down to a much smaller sample size (e.g. 20, when the population has 27 individuals) affect dadi's performance?
(3) How should I choose the best model? Is there something wrong with my data?
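To make the projection trade-off in questions (1) and (2) concrete, here is a minimal sketch in plain Python of how projecting a site with missing calls down to a smaller sample size works, and why the number of segregating sites S can peak at an intermediate size. It is not dadi's own code: as far as I understand, dadi's projection is based on the same hypergeometric subsampling, but the function names and the toy site list below are hypothetical and only for illustration.

```python
from math import comb

def project_site(n_called, derived, n_proj):
    """Hypergeometric projection of one site: probability of seeing k derived
    alleles (k = 0..n_proj) when n_proj chromosomes are drawn without
    replacement from n_called chromosomes, `derived` of which are derived."""
    total = comb(n_called, n_proj)
    return [
        comb(derived, k) * comb(n_called - derived, n_proj - k) / total
        for k in range(n_proj + 1)
    ]

def projected_sfs(sites, n_proj):
    """Sum the projections of all sites that have at least n_proj called
    chromosomes; sites with fewer calls cannot be projected and are dropped."""
    sfs = [0.0] * (n_proj + 1)
    for n_called, derived in sites:
        if n_called < n_proj:
            continue
        for k, p in enumerate(project_site(n_called, derived, n_proj)):
            sfs[k] += p
    return sfs

def segregating(sfs):
    """Expected number of segregating sites: every entry except the two
    fixed classes (0 and n derived alleles)."""
    return sum(sfs[1:-1])

# Hypothetical toy data: (called chromosomes, derived-allele count) per SNP.
toy_sites = [(54, 1), (54, 27), (40, 3), (30, 10), (20, 5), (20, 1)]
for n in (10, 20, 30, 40, 54):
    print(n, round(segregating(projected_sfs(toy_sites, n)), 3))
```

Larger projections drop every site with too few calls, while smaller projections keep more sites but lose rare variants that are likely to be sampled out, which is the tension behind choosing a projection size.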
Any help would be greatly appreciated.
Thanks!
Lei zhang
--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To post to this group, send email to dadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/dadi-user.
For more options, visit https://groups.google.com/d/optout.
<butterhead.max_S.log><project_20-30-40-54.parameters.txt><butterhead_20_combined.pdf><butterhead_30_combined.pdf><butterhead_40_combined.pdf><butterhead_54_combined.pdf>
Hi Ryan,
Thanks a lot for your answer. The generation time of lettuce is one year. The lettuce accessions we sequenced are inbred lines, so most loci are homozygous. As you suggested, I took only one allele from each accession and then plotted the FS for every horticultural type using the non-projected data. None of them shows a zigzag pattern, but all except the serriola population (a wild relative of cultivated lettuce) show fluctuations in the middle of the x axis. If I project down to a smaller sample size, the fluctuations disappear. But as you said, it is still there.
I want to know whether these fluctuations are normal. Some of the accessions within a population are closely related. I calculated the pairwise genetic similarity among the accessions from the SNP data, and some of them are almost identical at the genotype level; for example, there are only 1292 SNPs between accessions A and B (the RNA-seq data cover 30 Mb). Is this why the FS shows fluctuations? Should I remove samples that are highly similar to others?
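For reference, the "one allele per accession" step described above can be sketched roughly as follows. This is only an illustration under several assumptions that are not from the thread: genotypes are VCF-style strings such as "1/1", allele 1 is treated as the derived allele, the first allele is the one kept (so a rare heterozygous call contributes its first allele), and sites with any missing call are skipped. The function names are hypothetical.

```python
def haploidize(genotype):
    """Return one allele (0 or 1) from a diploid call such as '1/1',
    or None when the call is missing ('./.')."""
    first = genotype.replace("|", "/").split("/")[0]
    return None if first == "." else int(first)

def haploid_sfs(genotype_rows):
    """Build a non-projected SFS from one allele per accession.
    genotype_rows holds one list per SNP with one diploid call per accession.
    Sites with any missing call are skipped (an assumption of this sketch)."""
    n = len(genotype_rows[0])
    sfs = [0] * (n + 1)
    for row in genotype_rows:
        alleles = [haploidize(g) for g in row]
        if None in alleles:
            continue
        sfs[sum(alleles)] += 1
    return sfs

# Hypothetical calls for three accessions at four SNPs.
rows = [
    ["0/0", "1/1", "0/0"],
    ["1/1", "1/1", "0/0"],
    ["./.", "1/1", "0/0"],
    ["0/1", "0/0", "0/0"],
]
print(haploid_sfs(rows))
```

With 27 inbred accessions this gives a spectrum with sample size 27 instead of 54, so each accession contributes a single chromosome.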
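As a rough illustration of the similarity check described above, the sketch below counts pairwise differences between accessions (one allele each) and flags pairs that differ at no more than a chosen fraction of their jointly called SNP sites. The threshold, the function names, and the toy data are all hypothetical, and the sketch only identifies candidate pairs; whether to drop them is the open question. For scale, 1292 differences over about 30 Mb is roughly 4.3e-5 differences per base, but the fraction used in the code is per SNP site, not per base.

```python
from itertools import combinations

def pairwise_differences(haplotypes):
    """haplotypes maps accession name -> list of alleles (0, 1, or None if
    missing). Returns {(a, b): (n_differences, n_jointly_called_sites)}."""
    result = {}
    for a, b in combinations(sorted(haplotypes), 2):
        diffs = called = 0
        for x, y in zip(haplotypes[a], haplotypes[b]):
            if x is None or y is None:
                continue  # compare only sites called in both accessions
            called += 1
            diffs += x != y
        result[(a, b)] = (diffs, called)
    return result

def near_duplicates(haplotypes, max_fraction):
    """Pairs whose fraction of differing sites is at most max_fraction."""
    return [
        pair
        for pair, (diffs, called) in pairwise_differences(haplotypes).items()
        if called > 0 and diffs / called <= max_fraction
    ]

# Hypothetical toy data: A and B are identical wherever both are called.
toy = {
    "A": [0, 1, 1, None],
    "B": [0, 1, 1, 0],
    "C": [1, 0, 0, 1],
}
print(near_duplicates(toy, max_fraction=0.1))
```

Re-plotting the FS with one accession kept from each flagged pair would be one way to see whether the mid-spectrum fluctuations depend on those samples.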
Many thanks in advance
Lei zhang
<all_fs_with_non-projected data.pdf><all_fs_with_projected data.pdf>