vcfR2genlight object

Noor Abdelsamad

Jul 22, 2020, 3:56:43 PM
to poppr
Hi, I'm new to poppr and to R in general, and I'm doing a population analysis for a haploid pathogen. I'm trying to convert my VCF file to a genlight object, but I get this warning:

Warning message:
In vcfR2genlight(popdata) : Found 1701 loci with more than two alleles. Objects of class genlight only support loci with two alleles. 1701 loci will be omitted from the genlight object.

Here is the content of the object:
/// GENLIGHT OBJECT /////////

 // 960 genotypes,  1,373 binary SNPs, size: 6.1 Mb
 1302200 (98.8 %) missing data

 // Basic content
   @gen: list of 960 SNPbin

 // Optional content
   @ind.names:  960 individual labels
   @loc.names:  1373 locus labels
   @chromosome: factor storing chromosomes of the SNPs
   @position: integer storing positions of the SNPs
   @other: a list containing: elements without names

My questions are:
1) Can I still use this genlight object in the analysis?
2) For haploid genomes, what is the best object to be used?

Thanks for the help
Noor

Brian Knaus

Jul 22, 2020, 6:14:23 PM
to poppr
Hi Noor,

The rule of thumb I'm familiar with is that warnings can be ignored if you understand them. Here you're getting a warning because your data include loci that have more than two alleles (across your entire sample). The genlight object was not designed to handle these, so they are omitted. I generally ignore this warning because a large proportion of my loci typically have only two alleles, so the omission has a minor effect, if any, on the analysis. You appear to have more loci with more than two alleles (1,701) than with exactly two (1,373), which I would consider unusual. That may be because I tend to work on populations of a single species. Are you analysing multiple species?

Also, you appear to have 98.8% missing data. I think you should look into that, because you might not have much to analyze.
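
For anyone who wants to check these two points outside of R, here is a minimal Python sketch. It is not how vcfR2genlight works internally. It assumes that a locus's allele count is the REF allele plus each comma-separated ALT entry, and that the missing-data percentage in the printout is taken over genotypes × SNPs (an assumption, but one that reproduces the reported 98.8%). The three VCF lines and the function names are made up for illustration.

```python
# Hypothetical tab-separated VCF data lines (not taken from the poster's file).
# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3
VCF_LINES = [
    "chr1\t101\t.\tA\tG\t.\tPASS\t.\tGT\t0\t1\t.",
    "chr1\t250\t.\tC\tT,G\t.\tPASS\t.\tGT\t1\t2\t.",
    "chr2\t77\t.\tG\tA,C,T\t.\tPASS\t.\tGT\t.\t3\t0",
]


def allele_count(line):
    """Count alleles at a locus: REF plus each comma-separated ALT entry."""
    alt = line.split("\t")[4]
    if alt == ".":  # no alternate allele recorded
        return 1
    return 1 + len(alt.split(","))


def split_by_allele_count(lines):
    """Separate biallelic loci from loci with more than two alleles."""
    biallelic = [ln for ln in lines if allele_count(ln) == 2]
    multiallelic = [ln for ln in lines if allele_count(ln) > 2]
    return biallelic, multiallelic


biallelic, multiallelic = split_by_allele_count(VCF_LINES)
print(len(biallelic), "biallelic;", len(multiallelic), "with more than two alleles")

# Arithmetic check on the numbers in the genlight printout above.
genotypes, snps, missing = 960, 1373, 1302200
total_calls = genotypes * snps            # all sample-by-SNP cells
called = total_calls - missing            # cells that hold an actual call
missing_pct = 100 * missing / total_calls
print(f"{missing_pct:.1f}% missing, {called} calls, "
      f"about {called / genotypes:.1f} called SNPs per sample")
```

Under those assumptions, only 15,880 of 1,318,080 cells hold a call, which is roughly 16 to 17 called SNPs per sample on average.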

I do not feel that there is any "best" object. Different packages written by different authors at different points in time who were working on data in different formats created different objects. I'd call it more of a "different strokes for different folks" matter. We've tried to create conversion tools to help with this. But if you want to work in poppr or adegenet, you'll probably need an object defined in the package. If you want to use ape you'll probably need an object defined in ape.

Good luck!
Brian

--
Brian J. Knaus, Ph.D.
he/him/his
Corvallis, Oregon, USA
brianknaus.com

Brian Knaus

Jul 22, 2020, 7:59:41 PM
to poppr
Hi Noor,

I think that if you consult the VCF specification you will find that periods (".") denote missing data, not homozygous positions. And if you're working with a haploid organism, you should not have homozygous and heterozygous positions at all: each sample should have a single allele called at each variable position. It sounds to me like you need to figure out whether this is a valid VCF file and whether the genotypes have been correctly called as haploid.
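
To make the distinction concrete, here is a rough Python sketch of how GT values read under the VCF specification: alleles are separated by "/" or "|", a "." stands for a missing allele, and a haploid call is a single allele index with no separator. The function names and the sample values are hypothetical, and the sketch only looks at the GT subfield (the first colon-separated entry of a sample column).

```python
import re
from collections import Counter


def classify_gt(sample_field):
    """Classify one sample column by the GT subfield it starts with."""
    gt = sample_field.split(":")[0]      # GT is the first subfield
    alleles = re.split(r"[/|]", gt)      # "/" = unphased, "|" = phased
    if all(a == "." for a in alleles):
        return "missing"                 # "." or "./." is missing, not homozygous
    if len(alleles) == 1:
        return "haploid call"
    if "." in alleles:
        return "partially missing"
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"


def summarise(sample_fields):
    """Tally the genotype classes seen across a set of sample columns."""
    return Counter(classify_gt(f) for f in sample_fields)


# Hypothetical sample columns, mixing haploid and diploid-style calls.
example = [".", "0", "1", "./.", "0/0", "0/1", "1|1", "0:35"]
print(summarise(example))
```

If a file that is supposed to be haploid produces any "homozygous" or "heterozygous" entries in a tally like this, the genotypes were probably called as diploid.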

Good luck!
Brian

---------- Forwarded message ---------
From: Noor Abdelsamad <noorabd...@gmail.com>
Date: Wed, Jul 22, 2020 at 3:31 PM
Subject: Re: [poppr] vcfR2genlight object
To: Brian Knaus <brian...@gmail.com>


Hi Brian,

Thanks for your reply. I'm actually analyzing one species (Botrytis), and we are using a group of 80 markers. The VCF file I have turned out to contain lots of periods (.), which are supposed to be homozygous alleles. My concern is whether the genlight object treats these dots as missing data.

Thanks,
Noor
