vcfR users: read this


Zhian Kamvar

Jan 31, 2020, 12:13:45 PM
to poppr
Hello poppr users,

A few weeks ago, I promised to post a notice about a slight gotcha that you may encounter with data imported from vcfR [1].

If you use vcfR2genind(), you need to set the option return.alleles = TRUE (available since vcfR version 1.9.0).

Poppr accommodates polyploid genind objects in both poppr.amova() and locus_table() by interpreting zero ("0") alleles as NULL (missing) alleles. By default, vcfR exports genotypes as VCF allele codes, where 0 is the code for a valid allele (the reference allele), which can be a single nucleotide or an insertion/deletion. When these zero-coded alleles are passed to poppr.amova() and locus_table(), they are treated as missing, which gives incorrect results. To prevent this, use return.alleles = TRUE when importing genind objects from vcfR.
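A minimal sketch of the recommended conversion, assuming vcfR (>= 1.9.0) and adegenet are installed. It uses the small `vcfR_test` data set bundled with vcfR, which contains a multi-allelic SNP and an indel, and compares the allele names produced with and without `return.alleles`:

```r
library(vcfR)
library(adegenet)

# Small example VCF bundled with vcfR
data(vcfR_test)

# return.alleles was added in vcfR 1.9.0
stopifnot(packageVersion("vcfR") >= "1.9.0")

# Default: allele names are VCF codes, so the reference allele is named "0"
gid_codes <- vcfR2genind(vcfR_test)

# Recommended for poppr: allele names are the actual allelic states
gid_alleles <- vcfR2genind(vcfR_test, return.alleles = TRUE)

# Compare the allele names stored for each locus
gid_codes@all.names
gid_alleles@all.names
```

The `gid_alleles` object can then go to locus_table() or poppr.amova() without the reference allele being mistaken for a null allele.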

Note: if you have strictly bi-allelic SNPs, then using vcfR2genlight() would be a much better option.
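For the bi-allelic case, a sketch assuming vcfR and adegenet are installed. vcfR2genlight() only keeps bi-allelic loci (it warns and omits the others), so this version filters them explicitly with is.biallelic() before converting:

```r
library(vcfR)
library(adegenet)

data(vcfR_test)

# Logical vector flagging bi-allelic variants
biallelic <- is.biallelic(vcfR_test)

# Keep only those variants (rows) before converting to genlight
gl <- vcfR2genlight(vcfR_test[biallelic, ])
gl
```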

I will be happy to clarify any confusion on this thread.

Best,
Zhian


Brian Knaus

Jan 31, 2020, 2:33:13 PM
to Zhian Kamvar, poppr
Hi Zhian,

According to the VCF specification, 0 is the reference allele and alternate alleles are encoded as 1, 2, ..., n. Also according to the VCF specification, missing data are encoded with a period ("."). When data are read into vcfR, I try to convert the periods to NA when it's easy. Would a solution be to have vcfR2genind() add 1 to all non-NA alleles? Should it convert NA to NULL?
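To illustrate the encoding described above, here is a sketch of a hypothetical helper, `decode_gt()` (not part of vcfR, and not how vcfR implements return.alleles), that maps the codes in one GT string to allele states using REF and ALT. It shows that code 0 is a real allele, so it cannot be read as a null:

```r
# Hypothetical helper, not part of vcfR: translate one VCF GT string
# (e.g. "0/1", "1|2", "./.") into allele states using REF and ALT.
decode_gt <- function(gt, ref, alt) {
  # Code 0 is REF; ALT is a comma-separated list coded 1..n
  states <- c(ref, strsplit(alt, ",", fixed = TRUE)[[1]])
  codes <- strsplit(gt, "[|/]")[[1]]
  vapply(codes, function(code) {
    # "." marks a missing allele; otherwise shift the code by one for R indexing
    if (code == ".") NA_character_ else states[as.integer(code) + 1L]
  }, character(1), USE.NAMES = FALSE)
}

decode_gt("0/2", ref = "GTC", alt = "G,GTCT")  # c("GTC", "GTCT")
decode_gt("1|0", ref = "G", alt = "A")         # c("A", "G")
decode_gt("./.", ref = "T", alt = "A")         # c(NA, NA)
```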

Thanks!
Brian


Zhian Kamvar

Jan 31, 2020, 7:44:43 PM
to Brian Knaus, poppr
(N.B. Brian Knaus is the maintainer of the vcfR package and a supporter of poppr)

Hi Brian,

I think it's perfectly fine for vcfR to return the alleles according to the VCF specification. It's a perfectly cromulent way of representing allelic states compactly. I don't think it's worthwhile to shift the encoding by one to satisfy an idiosyncrasy of poppr, which is why I recommend that users set the return.alleles = TRUE argument. I also see a couple of reasons why it's a good idea for users to return alleles when converting from vcfR to genind:

1. You retain the allelic states, which is important for analyses such as DAPC, where you can investigate the different discriminant axes and see which alleles are most 'influential'. With encoded alleles, you would have to go back to the original VCF and look up the values.
2. It's immediately clear whether the alleles are single nucleotides or indels.
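Both points can be sketched with the same `vcfR_test` data, assuming vcfR and adegenet are installed. With coded alleles, the states have to be looked up again with getREF() and getALT(); with return.alleles = TRUE they travel with the genind object, and a simple length check flags indel loci. That check treats any multi-base allele as an indel, a simplification that would also flag multi-nucleotide variants:

```r
library(vcfR)
library(adegenet)

data(vcfR_test)

# Coded alleles: the states live only in the REF/ALT columns of the VCF
data.frame(REF = getREF(vcfR_test), ALT = getALT(vcfR_test))

# Returned alleles: the states are stored in the genind object itself
gid <- vcfR2genind(vcfR_test, return.alleles = TRUE)

# Flag loci where any observed allele spans more than one base
# (simplification: multi-nucleotide variants would be flagged too)
indel_loci <- vapply(gid@all.names, function(a) any(nchar(a) > 1), logical(1))
indel_loci
```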

So, to answer your question: no, I don't think you need to do anything extra. You added the return.alleles argument in version 1.9.0, which allows users to avoid the original issue. Moreover, the recommended practices are documented in the help page for vcfR2genind (see help('vcfR2genind')).

Best,
Zhian

Brian Knaus

Feb 1, 2020, 8:39:43 PM
to Zhian Kamvar, poppr
Thanks for the clarification, Zhian! Hope you had fun at rstudio::conf! I'm jealous. Safe travels!
Brian