Best approach for generating migrate input from unphased WGS SNP data

tom.o...@gmail.com

Apr 2, 2024, 11:25:37 AM
to migrate-support

Dear Peter,

I would like to run Migrate to estimate migration rates between two fish populations, and I was hoping for your advice on the best way to generate input files from unphased WGS SNP data.

The first (and best) option you show in your tutorial (https://peterbeerli.com/tutorial/bbcamp/tutorial.html) is to use the reference genome to reconstruct sequences. However, without phased data I cannot accurately reconstruct the haplotypes. I tried phasing with WhatsHap using only short-read data, but the result was not usable: genomic intervals were not phased consistently across individuals.

The other option is to use --linkedsnps. For most demographic analyses I use a data set of 70,000 SNPs that are assumed to segregate independently. This is a thinned data set in which each site is in HWE, lies at least 5,000 bp from the next site (based on average linkage decay), and shows no correlation above r = 0.2 with any other site within a 200 kbp window.
Am I right to assume that these SNPs are probably not the best ones to use with this option, since they were selected to be independent rather than linked?
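
For reference, the distance part of the thinning described above can be sketched as follows. This is only a minimal illustration in Python, assuming SNP positions for a single chromosome are available as integers; the function name `thin_by_distance` is hypothetical, and the HWE and r = 0.2 filters are not included.

```python
def thin_by_distance(positions, min_gap=5_000):
    """Greedily keep positions that are at least min_gap bp from the last kept one.

    positions: iterable of integer SNP coordinates on ONE chromosome (assumption).
    Returns the kept positions in ascending order.
    """
    kept = []
    for pos in sorted(positions):
        # Keep the first SNP, then any SNP far enough from the previously kept SNP.
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept


if __name__ == "__main__":
    example = [100, 2_000, 5_100, 5_200, 12_000]
    print(thin_by_distance(example))
```

The greedy pass depends on where it starts, so a different first SNP can give a slightly different thinned set.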

My 70,000 SNPs were sampled from a high-density SNP dataset (~5 million SNPs). I was planning to exclude any SNPs within 20 kbp of a coding region, remove any SNPs that are not in HWE, and then look for genomic regions of 5,000 bp with a high density of assumed-neutral SNPs. I would select the regions with the highest number of SNPs, as they will contain the most information. I would then create a single sequence of all selected SNPs and indicate the separate loci in my input file, as you suggest here (https://groups.google.com/g/migrate-support/c/j1PjoX0ICMI/m/UJQVTihVBgAJ).
Would you agree that this is a reasonable way to generate a dataset from unphased WGS data, or am I overcomplicating things?
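
To make the region selection concrete, here is a rough Python sketch of that step. It assumes SNP positions on one chromosome and coding regions given as (start, end) pairs, uses fixed non-overlapping 5,000 bp windows (a simplification; sliding windows are an alternative), and leaves out the HWE filter. The names `far_from_coding` and `densest_windows` are hypothetical, and nothing here writes Migrate input.

```python
from collections import Counter


def far_from_coding(pos, coding_intervals, buffer_bp=20_000):
    """True if pos lies more than buffer_bp away from every (start, end) coding interval."""
    return all(
        pos < start - buffer_bp or pos > end + buffer_bp
        for start, end in coding_intervals
    )


def densest_windows(positions, coding_intervals, window=5_000, top_n=100):
    """Count retained SNPs per fixed window and return the top_n densest windows.

    Returns a list of (window_start, snp_count) pairs, highest count first.
    """
    counts = Counter(
        (pos // window) * window          # start coordinate of the window holding pos
        for pos in positions
        if far_from_coding(pos, coding_intervals)
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    snps = [100, 200, 300, 5_100, 30_000, 60_000, 60_010]
    coding = [(28_000, 32_000)]           # made-up interval, for illustration only
    print(densest_windows(snps, coding, top_n=2))
```

The SNPs inside each selected window would then be grouped into one locus per window.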

Any advice would be very welcome :D

One small additional question: would you recommend filtering loci with a minor allele frequency (MAF) threshold of 0.05 or 0.01? I’m not sure how well Migrate handles rare variants.
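
For clarity, this is what I mean by the MAF filter, as a small Python sketch. It assumes unphased diploid genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; that coding and the function names are assumptions for illustration, and the sketch says nothing about which threshold suits Migrate.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency at one biallelic site.

    genotypes: alt-allele counts per diploid individual (0, 1 or 2), None if missing.
    Returns 0.0 when no individual has a called genotype.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))   # two alleles per diploid individual
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf(genotypes, threshold=0.05):
    """True if the site's minor allele frequency is at least the threshold."""
    return minor_allele_frequency(genotypes) >= threshold


if __name__ == "__main__":
    site = [0] * 19 + [1]                        # one heterozygote among 20 individuals
    print(minor_allele_frequency(site), passes_maf(site, 0.05), passes_maf(site, 0.01))
```

Phasing is not needed for this step, since allele counts can be read directly from unphased genotypes.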

Many thanks,

Tom
