bitwise distance

165 views
Skip to first unread message

NV

unread,
Mar 10, 2017, 5:28:59 PM3/10/17
to poppr
Hi Zhian,

I am using poppr to analyse GBS data for a haploid fungus. Since GBS results usually have lots of sequencing errors, I included multiple replicates of the same DNA samples in my genotyping. The idea was to calculate the distance among these replicates, which would then give me a threshold for mlg.filtering of the population.

Interestingly, after stringent filtering of the GBS dataset (only a total of 1,631 SNPs), when I calculate the bitwise distance, the distance between the replicate DNA samples seems to be zero. However, when I produce mlg.id for the data, these replicate samples do not fall into the same MLG and some are assigned to different MLGs, so I suppose they do not have identical genotypes? However, even when I set percent = FALSE and missing_match = FALSE, I still get "zero" count of differences among these isolates that belong to different MLGs.

Could you tell me what may have gone wrong? 

Thanks,
Niloofar

Zhian Kamvar

unread,
Mar 10, 2017, 5:49:38 PM3/10/17
to NV, poppr
Hi Niloofar

However, when I produce mlg.id for the data, these replicate samples do not fall into the same MLG and some are assigned to different MLGs, so I suppose they do not have identical genotypes? 

Since you're using bitwise.dist(), you must be using a snpclone object. The default behavior when creating a snpclone object is to assign each individual as a unique MLG*. 

To identify MLGs in your data set determined by bitwise.dist, you can use mlg.filter and set the cutoff to a number just bigger than 0 so that all MLGs with zero distance between them will come together:

mlg.filter(myData) <- .Machine$double.eps ^ 0.5 

Hope that helps!
Zhian

*the reason was to limit the potentially expensive operation of running mlg.filter, but I believe I can add this as an optional default in the next version.

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln



--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/1298bf48-9fde-4908-a6fc-4728baaa24c7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Niloofar V

unread,
Mar 10, 2017, 6:05:30 PM3/10/17
to poppr, vagh...@gmail.com
Hi Zhian,

Thanks for your quick reply.

Actually, I have my SNP data as genclone. 

Now my question is: 

1) Would that cause a problem (to have my data as genclone)?

I had a bit of an adventure getting my data into poppr. The vcf file I have (provided by a bioinformatician that did the snp-calling) has the data as diploid. I guess this is the way their pipeline works. So I did the following to change my data to haploid. 

library(vcfR)

noflag.maf1.dp3 <- read.vcfR("path to file.vcf")

noflag.maf1.dp3_genind <- vcfR2genind(noflag.maf1.dp3)

X <- genind2df(noflag.maf1.dp3_genind)

snps <- df2genind(X, ploidy = 1, ncode=1)

noflag.maf1.dp3 <- poppr::as.genclone(snps)

I cannot figure out a way to convert df to snpclone.


2) when I look at the mlg-id, actually there are some samples that fall into the same mlg.id, so I guess genclone does not assign each individual to a unique MLG?

Cheers,
Niloofar

Zhian Kamvar

unread,
Mar 10, 2017, 6:22:21 PM3/10/17
to Niloofar V, poppr
Hi Niloofar,


1) Would that cause a problem (to have my data as genclone)?

This could be caused by the presence of missing data. When calculating the original naive MLGs, poppr includes missing data in this calculation, so two genotypes that are only different by one having a missing locus will be considered different genotypes. The solution, however, is the same. Use mlg.filter (see the documentation for mlg.filter for an example).

2) when I look at the mlg-id, actually there are some samples that fall into the same mlg.id, so I guess genclone does not assign each individual to a unique MLG?

This is correct. Genclone objects will group individuals with identical (including missing) genotypes into MLGs by default. 

Additionally, you can get your data into a snpclone object via poppr::as.snpclone(vcfR::vcfR2genlight(x)), but since you were able to fit it in a genclone object, there's no need :)

Cheers,
Zhian

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln



Niloofar V

unread,
Mar 10, 2017, 6:32:53 PM3/10/17
to poppr, vagh...@gmail.com
Hi Zhian,

Thanks for your help! All makes sense now. I will use mlg.filter as you described.

I have tried converting the vcfr object to snpclone through poppr::as.snpclone(vcfR::vcfR2genlight(x)), and it works fine, but it will read it as diploid. Is there a way to change a diploid homozygous data set (either vcfR or genind or genlight) to haploid snp.clone?

Thanks,
Niloofar

Zhian Kamvar

unread,
Mar 10, 2017, 6:43:39 PM3/10/17
to Niloofar V, poppr
Hi Niloofar,

I wouldn't worry too much about converting it to snpclone. The advantage for that format is largely in memory usage and some speed in exchange for limited functionality. If you have your genclone, you have all the functionality available in poppr.

If you are really interested in having it as a snpclone object, you may be able to do the following:

Covert it to a matrix with as.matrix()
replace all the 2s with 1s 
recreate the object with as.snpclone(new("genlight", mat, <any other data related to your snpclone>))

Best,
Zhian

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln



Brian Knaus

unread,
Mar 10, 2017, 7:08:58 PM3/10/17
to Zhian Kamvar, Niloofar V, poppr
Hi Niloofar,

I think I may have some content of relevance to your questions.

Converting haploid VCF data to diploid:
https://knausb.github.io/vcfR_documentation/dip_to_hap.html

Quality control of GBS data:
https://knausb.github.io/vcfR_documentation/gbs_class.html

Conversion of VCF to genlight and snpclone:
https://knausb.github.io/vcfR_documentation/export_genlight_snpclone.html

I find that quality control of GBS data can be very important prior to downstream analysis in packages such as poppr.

Good luck!
Brian


To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/0d085c06-c1f8-452d-95ed-efbaed1f0c57%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

Niloofar V

unread,
Mar 11, 2017, 10:53:33 AM3/11/17
to poppr, zka...@gmail.com, vagh...@gmail.com
Hi Brian,

Thanks a lot for the info. These are really helpful links I had not come across before!

Cheers,
Niloofar

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages