Constructing EBSP with GBS SNPs (DArTseq)


Monica Fahey

Apr 22, 2022, 1:28:23 AM
to beast-users

Hi BEAST users,

I've got DArTseq markers generated by GBS for 3 panmictic populations. DArTseq is not ideal for EBSP, as each locus is only 60-100 bp long, usually with one SNP per locus (but up to 4).


Depending on how the data quality is filtered, the datasets are:

Pop 1 - 139 samples, 5301 SNPs (17 samples with 50-86% missing data)

Pop 2 - 22 samples, 2709 SNPs (2 samples with 62-79% missing data)

Pop 3 - 10 samples, 1360 SNPs (2 samples with 62-65% missing data)

 

After filtering markers for LD, the datasets would be reduced by about half to a third, depending on the linkage threshold employed.

 

A few questions:

1. Is there enough information for Pop 3? Pops 2 and 3 could be pooled as they are closely related, but this would violate the assumption of panmixia.

2. How is the coalescent affected by missing data? Some of these samples have a lot of missing data. Should I remove samples with >50% missing data in order to retain more loci, or is it better to keep more samples and remove loci with a low call rate?

3. I've only found one paper that used DArTseq for EBSP, and the authors concatenated the SNPs for each sample into one alignment. Am I correct in assuming that this would seriously distort estimation of coalescent times? I was planning to load each SNP/locus as a separate alignment.

4. Considering how short the loci are, would including the whole sequence really improve estimation of coalescent times, or can I just include the SNPs?
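
To make question 2 concrete, here is a toy sketch of the two filtering options I'm weighing. It is plain Python; the helper names, the thresholds and the tiny genotype matrix are all made up for illustration and are not part of any DArTseq or BEAST tooling.

```python
# Toy illustration only: hypothetical helpers and invented data.
# Rows are samples, columns are loci, None marks a missing call.

def missing_fraction(calls):
    """Fraction of missing (None) entries in a list of calls."""
    return sum(c is None for c in calls) / len(calls)

def drop_samples(matrix, max_missing):
    """Keep samples (rows) whose missing fraction is <= max_missing."""
    return [row for row in matrix if missing_fraction(row) <= max_missing]

def drop_loci(matrix, min_call_rate):
    """Keep loci (columns) whose call rate is >= min_call_rate."""
    if not matrix:
        return []
    keep = [
        j for j in range(len(matrix[0]))
        if 1 - missing_fraction([row[j] for row in matrix]) >= min_call_rate
    ]
    return [[row[j] for j in keep] for row in matrix]

def shape(matrix):
    """Return (number of samples, number of loci)."""
    return (len(matrix), len(matrix[0]) if matrix else 0)

toy = [
    [0, 1, 0, 2, 1, 0],
    [0, 1, 1, 2, 1, 0],
    [1, 1, 0, 2, 0, 0],
    [None, None, 0, None, None, 1],  # poor-quality sample, 4 of 6 calls missing
]

# Option A: remove samples with >50% missing data, then keep fully called loci.
option_a = drop_loci(drop_samples(toy, 0.5), 1.0)

# Option B: keep every sample and keep only fully called loci.
option_b = drop_loci(toy, 1.0)

print("A (samples, loci):", shape(option_a))
print("B (samples, loci):", shape(option_b))
```

In this toy case, dropping the one poor sample retains all 6 loci for 3 samples, whereas keeping every sample leaves only 2 fully called loci for 4 samples. My real datasets show the same kind of trade-off, which is why I'm unsure which side matters more for the coalescent.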

 

Any feedback or suggestions would be greatly appreciated!

 Cheers

Monica
