poppr beginner part 2


Inna Smith

Jan 9, 2017, 12:51:52 AM
to poppr


I am new to R, and I am very excited to have found this package for my data analysis. I saw a 2015 post by KM Tsui called "poppr beginner" asking for assistance. I have read that post, and I am attempting to convert a data.frame of allele data to a genind object. I tried two different files. The first, "testing.csv", is an incomplete file for which I was able to put together a working script as follows:

ALO<- read.csv("C:/Users/innap/Documents/R files/testing.csv", head = TRUE, sep = ",")
alos<- df2genind(ALO[, -c(1, 2)],ncode = 3, ind.names = ALO[[1]], pop = ALO[[2]], ploidy = 2)
poppr(alos)
| Idaho 
| Oregon 
| Total 
     Pop  N MLG eMLG       SE     H G lambda   E.5   Hexp  Ia rbarD File
1  Idaho 31   1    1 0.00e+00 0.000 1    0.0   NaN 0.0000 NaN   NaN alos
2 Oregon 33   1    1 0.00e+00 0.000 1    0.0   NaN 0.0000 NaN   NaN alos
3  Total 64   2    2 1.73e-08 0.693 2    0.5 0.999 0.0387   1     1 alos

After about 8 hours of trying different commands, something worked, so that's good, but I am not sure the values are being read in correctly because Ia and rbarD show NaN (probably because it's a very short initial file with only two pops?). I set ncode = 3 because my alleles are marked as A/A in the file, and I don't know exactly what the -c(1, 2) part does. I found this format in a PDF, "Reading Genetic Data Files Into R with adegenet and pegas".

I also made another csv file (test2.csv) modeled after the diploid data on this page: http://grunwaldlab.github.io/Population_Genetics_in_R/Data_Preparation.html. I attempted to run it with the same code and of course it did not work. I would really like to use the second format (test2.csv), as the website suggests, but I was not able to find how to build a genind object with df2genind from data in that format, because when the data are read in, the column for the second allele is read as a separate locus. Any ideas on a command to merge them into one locus with two alleles? Is it worth putting in the metadata rows at the top, and how do I read them in without messing up the headers?

Also, can you please recommend which format (testing.csv or test2.csv) would be most user friendly for running standard measures of diversity, Ia and rbarD? I am working with allozymes and a primarily selfing diploid plant. The columns I need to include are: Individual, Range, State, Population, and 26 loci (to be added later). If anyone can share an example file, that would be great! I hope I am at least on the right track. Thank you!!

Inna P. Smith
test2.csv
testing.csv

Brian Knaus

Jan 10, 2017, 11:27:41 AM
to Inna Smith, poppr
Hi Inna,

I think your first example is fine. I like to add a few steps in order to help me validate that the objects I've created are what I think they should be. I've added to your example.

library(poppr)
ALO <- read.csv("testing.csv", header = TRUE, sep = ",")
# Check that the file was read as expected.
head(ALO)
nrow(ALO)
ncol(ALO)

alos <- df2genind(ALO[, -c(1, 2)], ncode = 3, ind.names = ALO[[1]], pop = ALO[[2]], ploidy = 2)
alos
# Return to data.frame to validate that things are handled well.
head(genind2df(alos, sep = "/"))
poppr(alos)
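
On the -c(1, 2) in the call above: a negative index drops columns rather than selecting them, so ALO[, -c(1, 2)] is everything except the first two columns (individual and population), leaving just the genotypes. A minimal sketch on a made-up data frame; the column names and genotypes are invented for illustration and are not taken from testing.csv:

```r
# Toy data; names and genotypes are hypothetical.
toy <- data.frame(
  Ind  = c("i1", "i2", "i3"),
  Pop  = c("Idaho", "Idaho", "Oregon"),
  Loc1 = c("A/A", "A/A", "B/B"),
  Loc2 = c("C/C", "C/D", "C/C"),
  stringsAsFactors = FALSE
)

# Negative indices remove columns 1 and 2, leaving only the loci.
geno <- toy[, -c(1, 2)]
colnames(geno)

# [[1]] and [[2]] pull single columns out as plain vectors,
# which is what ind.names and pop expect.
toy[[1]]
toy[[2]]
```
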

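I think the Ia and rbarD reported are reasonable. Your output reports only one multilocus genotype (MLG) in each population, which means that, based on your marker system, all individuals within a population are genetically identical. With no variation among individuals there is nothing for the index of association to measure, so Ia and rbarD come out as NaN.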
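
One way to check the genotype counts directly from the data frame, without any package, is to paste each individual's genotypes into a single string and count the distinct strings per population. This is only a rough sketch on invented data, and not necessarily how poppr counts MLGs internally (it does nothing special for missing data, for instance):

```r
# Toy data; values are hypothetical.
toy <- data.frame(
  Ind  = c("i1", "i2", "i3", "i4"),
  Pop  = c("Idaho", "Idaho", "Oregon", "Oregon"),
  Loc1 = c("A/A", "A/A", "B/B", "B/B"),
  Loc2 = c("C/C", "C/C", "C/C", "C/C"),
  stringsAsFactors = FALSE
)

# One string per individual, built from all locus columns.
mlg_string <- do.call(paste, c(toy[, -c(1, 2)], sep = "|"))

# Distinct multilocus genotypes within each population, and overall.
mlg_per_pop <- tapply(mlg_string, toy$Pop, function(x) length(unique(x)))
mlg_total   <- length(unique(mlg_string))
mlg_per_pop
mlg_total
```

In this toy case each population has a single genotype string while the pooled data have two, the same pattern as the MLG column in your table.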

You reported that your second example does not work, but you did not provide an example of what you tried or what error your attempts resulted in. This could help us help you. When I try I get the following.

ALO <- read.csv("test2.csv", header = TRUE, sep = ",")

alos <- df2genind(ALO[, -c(1, 2)], ncode = 3, ind.names = ALO[[1]], pop = ALO[[2]], ploidy = 2)

Error in .local(.Object, ...) :
  more than one '.' in column names; please name column as [LOCUS].[ALLELE]
In addition: Warning message:
In df2genind(ALO[, -c(1, 2)], ncode = 3, ind.names = ALO[[1]], pop = ALO[[2]],  :
  character '.' detected in names of loci; replacing with '_'

The error is thrown because your column names (marker names) include periods. When we replace them with another character I think things work. Your data is also formatted differently in the second example so you'll need to handle it differently. I've added some steps to paste each allele together with a delimiter so that we have genotypes as in the first example.

ALO <- read.csv("test2.csv", header = TRUE, sep = ",")
# The period is a regex metacharacter, so it needs to be escaped.
# gsub() replaces every period in a name, not just the first.
colnames(ALO) <- gsub("\\.", "_", colnames(ALO))
# Paste together alleles with delimiter.
allele1 <- as.matrix(ALO[, seq(3, 53, by = 2)])
allele2 <- as.matrix(ALO[, seq(4, 54, by = 2)])
gt <- as.data.frame(matrix(paste(allele1, allele2, sep = "/"), ncol = 26))
colnames(gt) <- colnames(ALO[, seq(3, 53, by = 2)])
ALO <- cbind(ALO[, 1:2], gt)
# Now proceed with example.
head(ALO)

alos <- df2genind(ALO[, -c(1, 2)], ncode = 3, ind.names = ALO[[1]], pop = ALO[[2]], ploidy = 2)
alos
/// GENIND OBJECT /////////

 // 84 individuals; 52 loci; 60 alleles; size: 48.9 Kb

 // Basic content
   @tab:  84 x 60 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 1-2)
   @loc.fac: locus factor for the 60 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: df2genind(X = ALO[, -c(1, 2)], ncode = 3, ind.names = ALO[[1]],
    pop = ALO[[2]], ploidy = 2)

 // Optional content
   @pop: population of each individual (group size range: 20-33)

We appear to have a valid genind object.
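
If the number of loci changes (you mentioned more will be added later), the column positions can be computed rather than typed in. Here is a sketch of the same paste step on a made-up two-locus data frame. The column names imitate what read.csv tends to produce for a blank second header, but they are my assumption and not the actual contents of test2.csv. The optional last step uses sep = "/" in df2genind, which is my choice for this sketch rather than the ncode = 3 call used above, and it only runs if adegenet is installed:

```r
# Toy two-column-per-locus data; names and alleles are hypothetical.
toy <- data.frame(
  Ind   = c("i1", "i2", "i3"),
  Pop   = c("Idaho", "Idaho", "Oregon"),
  Loc.1 = c("A", "A", "B"),
  X     = c("A", "B", "B"),
  Loc.2 = c("C", "C", "D"),
  X.1   = c("C", "D", "D"),
  stringsAsFactors = FALSE
)

n_meta <- 2                                   # leading non-genotype columns
first  <- seq(n_meta + 1, ncol(toy), by = 2)  # first allele of each locus
second <- first + 1                           # second allele of each locus
stopifnot(max(second) == ncol(toy))           # allele columns must come in pairs

# Paste the allele pairs element-wise; mapply returns one column per locus.
gt <- as.data.frame(
  mapply(function(a, b) paste(toy[[a]], toy[[b]], sep = "/"), first, second),
  stringsAsFactors = FALSE
)
# fixed = TRUE treats "." literally; gsub replaces every period in a name.
colnames(gt) <- gsub(".", "_", colnames(toy)[first], fixed = TRUE)

merged <- cbind(toy[, seq_len(n_meta)], gt)
merged

# Optional: build the genind object if adegenet is available.
if (requireNamespace("adegenet", quietly = TRUE)) {
  gen <- adegenet::df2genind(merged[, -seq_len(n_meta)], sep = "/", ploidy = 2,
                             ind.names = merged[[1]], pop = merged[[2]])
  print(adegenet::nLoc(gen))
}
```

The stopifnot() line is there to catch a file with an odd number of allele columns before anything is pasted together.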

Good luck!
Brian



--
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/