How to filter SNPs for BEAST2


Yu Sugihara

Jul 13, 2018, 2:24:40 PM
to beast-users
Hi, everyone.

My name is Yu Sugihara.
I have a question about filtering SNPs.

I read the SNAPP FAQ on the website.
It says: "Can I use SNAPP with diploid data?; Yes, under the assumption that sites and markers are unlinked."
So I would like to filter out SNPs that are close to each other on the same chromosome.
What is the best way to filter linked SNPs?
Or can I load my original (that is, unfiltered) nexus file into BEAUti?

Thank you.

Brandon Stark

Nov 23, 2018, 7:16:30 PM
to beast-users
Yes, you can load the original nexus file into BEAUti.

Santiago Sánchez

Nov 24, 2018, 10:12:43 AM
to beast...@googlegroups.com
Hi Yu,

There are a few ways to filter SNPs. Probably the easiest is with vcftools. For this you would need your data in VCF format, then use the "--thin" argument and select a distance of your choice. Ideally you would want an LD decay analysis to tell you at roughly what distance most SNPs become unlinked. This is going to be difficult with RAD-tag type data, in particular with de novo loci. In that case, you could simply select one SNP per locus and assume that the loci are far enough apart from each other.
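A minimal sketch of the thinning step, assuming a VCF sorted by chromosome and position. The file names, the toy records, and the 1000 bp distance are placeholders, not recommendations; the awk part only approximates what "--thin" does and is there so the logic is visible.

```shell
# Toy VCF for illustration only (hypothetical positions); use your own file instead.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\n'
  printf 'chr1\t500\t.\tC\tT\t.\tPASS\t.\n'
  printf 'chr1\t1200\t.\tG\tA\t.\tPASS\t.\n'
  printf 'chr1\t1300\t.\tT\tC\t.\tPASS\t.\n'
  printf 'chr2\t50\t.\tA\tC\t.\tPASS\t.\n'
  printf 'chr2\t60\t.\tG\tT\t.\tPASS\t.\n'
} > toy.vcf

# With vcftools installed, thinning is a single command
# (1000 bp is an assumed distance, pick yours from an LD decay analysis):
#   vcftools --vcf toy.vcf --thin 1000 --recode --out toy.thinned

# Plain-awk sketch of the same idea: keep a SNP only if it is on a new
# chromosome or at least min_dist bp past the last SNP that was kept.
min_dist=1000
awk -F'\t' -v d="$min_dist" '
  /^#/ { print; next }                      # header lines pass through
  $1 != chrom || $2 - last >= d { print; chrom = $1; last = $2 }
' toy.vcf > toy.thinned.vcf
```

On the toy data this keeps chr1:100, chr1:1200 and chr2:50, since every other site is within 1000 bp of the previously kept one.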

I hope this helps,
Santiago

Alex Krohn

Oct 8, 2021, 1:16:58 PM
to beast-users
Hi Santiago,

I'm running into this same problem. I have a VCF of sites from RAD data that I want to input into SNAPP. (Seems like it's a common format for SNPs and should be easy!) Do you have suggestions as to how I might filter these SNPs to one SNP per locus?

Thanks,

Alex

Santiago Sánchez

Oct 8, 2021, 6:11:19 PM
to beast...@googlegroups.com
Hi Alex,

I don't know of a specific tool. But you could probably come up with a bash/unix solution without much hassle (or any other programming language for that matter).

For example, just using your VCF file you could use a bash pipeline like this:

#############
# get the vcf header
grep "^#" my_vcf_file.vcf > head
# sample one SNP per locus randomly
# shuf randomly rearranges lines in a file/stream
# parallel executes a command in parallel by passing a list of arguments (-k keeps the input order)
# the trailing \t (needs GNU grep -P) stops locus "1" from also matching "10", "11", ...
grep -v "^#" my_vcf_file.vcf | cut -f1 | uniq | parallel -k 'grep -P "^{}\t" my_vcf_file.vcf | shuf | head -n 1' > var
# combine header and variants
cat head var > my_vcf_file.one_snp.vcf
# delete temporary files
rm head var
#############

This assumes that you have these unix tools installed and available.
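If GNU parallel is not available, the same random pick can be sketched with shuf and awk alone. This is only an illustration under assumptions: the toy file and locus names are made up, and column 1 is taken to be the RAD locus ID, as in a de novo assembly.

```shell
# Toy de novo RAD-style VCF (hypothetical locus names in column 1).
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'locus_1\t12\t.\tA\tG\t.\tPASS\t.\n'
  printf 'locus_1\t40\t.\tC\tT\t.\tPASS\t.\n'
  printf 'locus_10\t7\t.\tG\tA\t.\tPASS\t.\n'
  printf 'locus_10\t33\t.\tT\tC\t.\tPASS\t.\n'
  printf 'locus_2\t21\t.\tA\tC\t.\tPASS\t.\n'
} > rad_toy.vcf

# Header lines pass through unchanged.
grep '^#' rad_toy.vcf > rad_toy.one_snp.vcf

# Shuffle the variant lines, keep the first line seen for each locus
# (an exact match on column 1, so locus_1 and locus_10 stay separate),
# then put the loci back in a stable (lexical) order.
grep -v '^#' rad_toy.vcf | shuf | awk -F'\t' '!seen[$1]++' | sort -k1,1 >> rad_toy.one_snp.vcf
```

Because the lines are shuffled before deduplication, each SNP within a locus has the same chance of being the one kept, and the file is read once rather than once per locus.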

Hope this helps,
Santiago




Santiago Sánchez

Oct 8, 2021, 6:14:01 PM
to beast...@googlegroups.com
I almost forgot I already had a tool written specifically for this: https://github.com/santiagosnchez/sing_snp_vcf

Alex Krohn

Oct 12, 2021, 1:27:38 PM
to beast-users
Hi Santiago,

Thanks for the suggestion. I ended up finding this script to subset one SNP per RAD locus (choosing the one with the highest coverage), then used vcf2phylip to convert to nexus.
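A rough illustration of picking by coverage (this is not the script referenced above). It assumes that column 1 is the RAD locus ID and that each record carries a site-level DP= tag in the INFO column; the toy file and its values are made up.

```shell
# Toy VCF with hypothetical loci and DP values in INFO (column 8).
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'locus_1\t12\t.\tA\tG\t.\tPASS\tDP=30\n'
  printf 'locus_1\t40\t.\tC\tT\t.\tPASS\tDP=55\n'
  printf 'locus_2\t21\t.\tA\tC\t.\tPASS\tNS=4;DP=18\n'
  printf 'locus_2\t60\t.\tG\tT\t.\tPASS\tNS=4;DP=9\n'
} > cov_toy.vcf

awk -F'\t' '
  /^#/ { print; next }                      # header lines pass through
  {
    # Pull the numeric value of DP= out of the INFO column (0 if absent).
    dp = 0
    n = split($8, tags, ";")
    for (i = 1; i <= n; i++)
      if (tags[i] ~ /^DP=/) dp = substr(tags[i], 4) + 0
    # Remember the best line per locus and the order loci first appear in;
    # on ties the earlier SNP is kept.
    if (!($1 in best)) { order[++k] = $1; best[$1] = dp; line[$1] = $0 }
    else if (dp > best[$1]) { best[$1] = dp; line[$1] = $0 }
  }
  END { for (j = 1; j <= k; j++) print line[order[j]] }
' cov_toy.vcf > cov_toy.one_snp.vcf
```

On the toy data this keeps locus_1:40 (DP=55) and locus_2:21 (DP=18).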

-Alex

Santiago Sánchez

Oct 12, 2021, 2:54:58 PM
to beast...@googlegroups.com
Hi Alex,

Cool! Thanks for sharing that.

Just note that the script only takes the first SNP of each RAD-tag locus; it does not select one randomly within the locus. So if you want to introduce less systematic error, you are better off with the random option ;-)

Santiago
