GenAlEx binary polyploid data in poppr?


Michelle Sneck

Jan 18, 2017, 10:24:38 PM
to poppr
Hi there,

I recently used GenAlEx to calculate a crude measure of genetic distance (the total number of loci that differ between two individuals) using the attached data (poppr_test_4_exp). These data were generated from 10 SSR markers developed for a tetraploid grass. Given ambiguous dosage, and ambiguous reporting from those who developed the markers, I reduced all alleles per locus to 1 = present or 0 = absent (-1 if the data were missing due to poor amplification).
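The crude distance described above can be sketched as a simple mismatch count. This is only an illustration, not GenAlEx's actual implementation: the function name is hypothetical, and skipping any column where either individual is coded -1 is an assumption about how missing data should be handled.

```python
MISSING = -1  # code used in the post for failed amplification


def count_differences(a, b):
    """Count columns where two 0/1 profiles differ.

    Assumption: a column is skipped when either individual is missing.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of columns")
    return sum(
        1
        for x, y in zip(a, b)
        if x != MISSING and y != MISSING and x != y
    )


# Two made-up individuals scored at six columns
ind1 = [1, 0, 1, 1, -1, 0]
ind2 = [1, 1, 0, 1, 0, 0]
distance = count_differences(ind1, ind2)
print(distance)
```

With these made-up profiles, the second and third columns differ and the fifth is skipped as missing.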

I've scoured the online Poppr documentation, and it appears that Poppr does accept dominant (0 = absent, 1 = present) data, but I haven't found examples of it. I've tried to mirror the format of the "Pinf" data set (poppr_test_3), but I'm not convinced that this is the correct approach, and I cannot figure out what the "2's" denote within these data. Given the complexity of calculating population genetic metrics from polyploid data, I am proceeding with extreme caution and want to make sure I am formatting the data appropriately. Ultimately, my aim is to calculate LD and other descriptive measures that Poppr offers.

Any prompt help in this matter would be greatly appreciated!

All my best,

-M
poppr_test_4_exp.csv
poppr_test_3.csv

Zhian Kamvar

Jan 18, 2017, 10:56:12 PM
to Michelle Sneck, poppr
Hi,

The data you present show 33 loci, but you mention ten. I suspect that each "locus" in your file actually represents an allele, which means you have replicated the way the data would be represented in a genind/genclone object if you imported it via read.genalex() and then used recode_polyploids().

The rest of my response assumes the above.

There IS a presence/absence data set called Aeut that represents 56 AFLP loci, but each column represents a single locus. If you want to format your data for poppr, you should follow the instructions here: http://grunwaldlab.github.io/Population_Genetics_in_R/Data_Preparation.html#polyploids. If you record every observed allele, you will end up with the same matrix you constructed in GenAlEx. Moreover, if you do it this way, you'll be able to use the algorithms that handle polyploids in Bruvo's distance (which can be used as a band-sharing distance, as demonstrated in Metzger et al. 2015: http://dx.doi.org/10.1016/j.cell.2015.02.042).

Additionally, there is a package called polysat, with statistics designed specifically for polyploids, that might be worth considering because it assumes allele copy number is ambiguous.

I hope that helps,

Zhian

P.S.
I am currently traveling and may be unresponsive for a few days if you have any followup questions.

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/93ae52f7-5d0d-4f73-bbd1-78688fa4f529%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michelle Sneck

Jan 19, 2017, 12:16:28 PM
to poppr, michelle...@gmail.com
Hi Zhian,

Thanks so much for your help!

You are correct, I have 33 alleles across 10 loci.

I already attempted to replicate the polyploid formatting you suggested, but using a binary approach (poppr_test_4). This was not accepted by poppr, so I figured that the binary formatting was problematic when applied to polyploids. As you can see, I added 0's to fill in the proper number of columns for a tetraploid.

When loading these data into poppr I got the following error message:

"Something went wrong. Please ensure that your data is formatted correctly:
 Is the ploidy flag set to the maximum ploidy of your data?
 If you have geographic coordinates: did you set the flag?
 If you have regional data: did you set the flag?
 Otherwise, the problem may lie within the data structure itself."

So my question is: how can I format these data properly?

PS I have used polysat, thank you for the suggestion :) 

Best,

M
poppr_test_4.csv

Zhian Kamvar

Jan 22, 2017, 8:50:36 AM
to Michelle Sneck, poppr
Hi Michelle,

You should not use a binary approach to format your data in this situation. It obscures the allele sizes and does not properly delimit the loci. 

I'm attaching a representation of what you should have in your data along with the script used to convert your original file to it. Since I didn't know the specific alleles, I simply replaced them with the numbers 1-4; you can replace them with the true allele sizes. Note that I'm assuming each cell represents an allele count with the column name being the locus.
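The kind of conversion described above can be sketched as follows. This is not the attached poppr_test_4_corrected.R script, only a minimal illustration under stated assumptions: the column names, the `locus_columns` mapping, and the function name are hypothetical; alleles are labelled by their position within each locus, as a stand-in for true allele sizes; and a locus with any -1 call is written as all zeros, on the assumption that 0 marks both padding and missing data.

```python
def binary_to_genotype(row, locus_columns, ploidy=4):
    """Convert one individual's 0/1/-1 allele calls to per-locus genotypes.

    row: dict mapping column name -> call (1 present, 0 absent, -1 missing)
    locus_columns: dict mapping locus name -> ordered list of its columns
    Returns a dict mapping locus name -> list of `ploidy` allele labels,
    padded with 0 where fewer alleles were observed.
    """
    genotype = {}
    for locus, columns in locus_columns.items():
        calls = [row[col] for col in columns]
        if -1 in calls:
            alleles = []  # assumption: treat the whole locus as missing
        else:
            # Label each observed allele by its position within the locus
            alleles = [i for i, call in enumerate(calls, start=1) if call == 1]
        if len(alleles) > ploidy:
            raise ValueError(f"{locus}: more alleles than ploidy {ploidy}")
        genotype[locus] = alleles + [0] * (ploidy - len(alleles))
    return genotype


# Hypothetical column layout: three allele columns for one locus, two for another
locus_columns = {"locusA": ["A_1", "A_2", "A_3"], "locusB": ["B_1", "B_2"]}
row = {"A_1": 1, "A_2": 0, "A_3": 1, "B_1": -1, "B_2": -1}
genotype = binary_to_genotype(row, locus_columns)
print(genotype)
```

Where the padding zeros sit within a locus is also an assumption here; the data preparation instructions linked earlier in the thread are the place to check the layout poppr expects.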

The reason for this is the way adegenet/poppr transform data. Normally, we code data in a format where each cell in our data sheet represents the state of a genotype or allele. This is great for recording and storing data because it's easy to understand and easy to extend (all you have to do is add new rows), but it is not so great for computational methods like DAPC, which need each cell to represent the count of a specific allele.

Because formatting the data for computation is a lot less intuitive and a lot more tedious (if you add new data, you must add new columns for each new allele), adegenet/poppr helps users by automatically transforming the data when it's read in. 
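To illustrate the two layouts, here is a minimal sketch of turning recorded genotypes for one locus into an allele-count table. It is not adegenet's or poppr's code. The function name is hypothetical, the "locus.allele" column labels are meant to mirror how genind objects name their columns, and treating 0 purely as padding is a simplification (real missing data would need separate handling).

```python
def allele_counts(genotypes, locus):
    """Build an allele-count table for a single locus.

    genotypes: one list of allele labels per individual (0 = padding)
    Returns (column names, matrix), with one row per individual and
    one column per distinct allele observed at the locus.
    """
    alleles = sorted({a for g in genotypes for a in g if a != 0})
    columns = [f"{locus}.{a}" for a in alleles]
    matrix = [[g.count(a) for a in alleles] for g in genotypes]
    return columns, matrix


# Three made-up tetraploid individuals at one locus
recorded = [[1, 3, 0, 0], [3, 3, 0, 0], [2, 0, 0, 0]]
columns, matrix = allele_counts(recorded, "locusA")
print(columns)
print(matrix)
```

Adding an individual that carries a previously unseen allele adds a row in the recorded layout but a whole new column in the count layout, which is the tedium the paragraph above refers to.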

Hope that helps,
Zhian



poppr_test_4_corrected.R
poppr_test_4_corrected.csv