Dear Peter,
I am interested in running Migrate to estimate migration rates between two fish populations. I was hoping to get some advice on what you think the best approach would be to generate input files from unphased WGS SNP data.
The first (and best) option you show in your tutorial (https://peterbeerli.com/tutorial/bbcamp/tutorial.html) is to use the reference genome to reconstruct sequences. But without phased data, I cannot accurately reconstruct the haplotypes. I looked into phasing the data using only short-read data with WhatsHap, but that did not produce anything I could use: genomic intervals were not phased consistently across individuals.
The other option is to use --linkedsnps. For most demographic analyses I use a data set that consists of 70,000 (assumed) independently segregating SNPs. This is a thinned data set in which each site is in HWE, is at least 5,000 bp from the next site (based on average linkage decay), and shows no correlation above r = 0.2 with any other site within a 200 kbp window. Am I right to assume that these SNPs are probably not the best to use with --linkedsnps, since they are independent and therefore should not be linked?
My 70,000 SNPs were sampled from a high-density SNP dataset (~5 million). I was planning to exclude any SNPs within 20 kbp of any coding region, remove any SNPs that are not in HWE, and then look for genomic regions of 5,000 base pairs with a high density of assumed neutral SNPs. I’ll select the genomic regions with the highest number of SNPs, as they will contain the most information. In that case, I’d create a single sequence of all selected SNPs and indicate the separate loci in my input file as you suggest here (https://groups.google.com/g/migrate-support/c/j1PjoX0ICMI/m/UJQVTihVBgAJ). Would you agree that this is an accurate way to generate a dataset from unphased WGS data, or am I overcomplicating things?
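To make the window selection concrete, here is a minimal Python sketch of the idea. It assumes the SNPs have already passed the HWE filter and are available as (chromosome, position) pairs, and that coding regions are given as (start, end) intervals per chromosome. The function names, the non-overlapping binning, and the toy coordinates are all illustrative assumptions, not part of Migrate or of any existing pipeline.

```python
from collections import Counter

WINDOW_BP = 5_000    # window length mentioned in the text
BUFFER_BP = 20_000   # minimum distance to any coding region, from the text


def far_from_coding(pos, coding, buffer_bp=BUFFER_BP):
    """True if pos lies more than buffer_bp away from every (start, end) interval."""
    return all(pos < start - buffer_bp or pos > end + buffer_bp
               for start, end in coding)


def densest_windows(snps, coding_by_chrom, n_windows, window_bp=WINDOW_BP):
    """Return the n_windows (chrom, window_start) bins with the most retained SNPs.

    snps: iterable of (chrom, pos) pairs, assumed to have passed the HWE filter.
    coding_by_chrom: dict mapping chrom -> list of (start, end) coding intervals.
    Windows are fixed, non-overlapping bins (a simplifying assumption).
    """
    counts = Counter()
    for chrom, pos in snps:
        if far_from_coding(pos, coding_by_chrom.get(chrom, [])):
            window_start = (pos // window_bp) * window_bp
            counts[(chrom, window_start)] += 1
    return counts.most_common(n_windows)


# Toy example with made-up coordinates.
snps = [("chr1", 100), ("chr1", 900), ("chr1", 4_200), ("chr1", 7_500),
        ("chr1", 60_000), ("chr2", 12_000), ("chr2", 12_400)]
coding = {"chr1": [(50_000, 55_000)]}
print(densest_windows(snps, coding, n_windows=2))
# -> [(('chr1', 0), 3), (('chr2', 10000), 2)]
```

Each selected window would then become one locus in the input file. A sliding window would find slightly denser regions than fixed bins, at the cost of having to resolve overlaps between selected windows.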
Any advice would be very welcome :D
A small additional question: would you recommend filtering out loci with a minor allele frequency below 0.05 or 0.01? I’m not sure how well Migrate responds to rare variants.
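For reference, this is a small Python sketch of how such a frequency filter could be applied to unphased genotypes. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2) with None for missing calls. The function names are hypothetical, and the two thresholds are simply the ones asked about above, not a recommendation.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from alt-allele counts (0, 1, 2); None marks missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no usable calls at this site
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid individual
    return min(alt_freq, 1 - alt_freq)


def passes_maf(genotypes, threshold):
    """True if the site has called genotypes and its MAF is at least threshold."""
    maf = minor_allele_frequency(genotypes)
    return maf is not None and maf >= threshold


# Toy site: 50 individuals, two heterozygotes, so the MAF is 2/100 = 0.02.
site = [1, 1] + [0] * 48
print(passes_maf(site, 0.01), passes_maf(site, 0.05))
# -> True False
```

A site like this one is kept under the 0.01 threshold but dropped under 0.05, which is the kind of difference the choice between the two cutoffs comes down to.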
Many thanks,
Tom