Re: [poppr] Am I the only one who can't figure this out: Calculating Allele frequencies for a panel of STRs

154 views
Skip to first unread message

Brian Knaus

unread,
Oct 26, 2017, 11:50:17 AM10/26/17
to JVPlanz, poppr
Hi John,

Both of your messages were received by the group and your message on GitHub was received. Once would have been enough :)

I'm not sure what your goal or issue is here. When I try the example I see the following.

library(poppr)
data("Pinf")
(pal <- private_alleles(Pinf, locus ~ Country, count.alleles = FALSE))
         Pi02 D13 Pi33 Pi4B Pi16 G11 Pi56 Pi70 Pi89
Colombia    0   0    0    0    0   0    0    0    0
Ecuador     0   0    0    0    0   1    0    0    0
Mexico      5  12    1    0    3   7    1    0    9
Peru        1   2    0    1    0   4    0    1    0
sweep(pal, 2, nAll(Pinf)[colnames(pal)], FUN = "/")
               Pi02        D13      Pi33  Pi4B      Pi16        G11 Pi56 Pi70 Pi89
Colombia 0.00000000 0.00000000 0.0000000 0.000 0.0000000 0.00000000 0.00 0.00 0.00
Ecuador  0.00000000 0.00000000 0.0000000 0.000 0.0000000 0.04545455 0.00 0.00 0.00
Mexico   0.45454545 0.46153846 0.3333333 0.000 0.4285714 0.31818182 0.25 0.00 0.75
Peru     0.09090909 0.07692308 0.0000000 0.125 0.0000000 0.18181818 0.00 0.25 0.00


Everything seems to work. However, I'm concerned that this may not be what you're after. These are 'private alleles' and you state that you want a table of allele frequencies. Recall that poppr builds on adegenet, so many adegenet functions will work on poppr objects. Is what you're looking for the following?

makefreq(Pinf)

 Finding allelic frequencies from a genpop object...

...done.

        Pi02.000 Pi02.150 Pi02.152 Pi02.154 Pi02.156 Pi02.158 Pi02.160 Pi02.162 Pi02.164 Pi02.166 Pi02.170 D13.000 D13.104
PiCO01      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO02      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO03      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO04      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO05      0.50     0.00     0.00     0.00     0.00     0.00     0.25     0.25     0.00     0.00     0.00    0.50     0.0

(Actually a big table that I'm not going to reproduce here.)

And you  can always take a brute force approach using this.

as.data.frame(Pinf)

Does that get you closer to your goal?
Brian


On Tue, Oct 24, 2017 at 7:53 AM, JVPlanz <jpt...@gmail.com> wrote:
Hia ll
Been working on this the last couple days, first, Poppr is great! love it want to integrate it into courses I teach, but I'm having problems getting something fundamental figured out... I have a  population database with 200 individuals typed at 21 short tandem repeat loci... created and loaded the csv with no issues and ran some of the summary sts examples...worked fine... I need to first create an allele frequency table for each of the loci ...this has eluded me! ran the test scripts: 
## Get raw number of private alleles per locus.
## (pal <- private_alleles(Pinf, locus ~ Country, count.alleles = FALSE))
# Get percentages.
## sweep(pal, 2, nAll(Pinf)[colnames(pal)], FUN = "/")
changed False to true and got an output that had the allele counts...the sweep did not do anything...
 Would like to have a table of  allele frequencies

allele

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/fc15c9b5-cf3e-49c2-b143-d7952503c551%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

JVPlanz

unread,
Oct 26, 2017, 2:14:51 PM10/26/17
to poppr
Hi Brian
Thanks for the reply...sorry I posted it so many times...I couldn't tell from my end that my post actually posted! HAven't used Google groups before.
I'm working with a dataset with 22 loci (microsatellites) with anywhere from 5 - 30 alleles present in a locus.  I was able to generate a table that gives me a count of allele observation across the alleles at each locus The allele designator was put after a "." in the locus name across the columns  when I ran the private alleles scirpt using count.alleles=TRUE  giving me the number of each allele detected across the dataset (i'm only working with 1 population at the moment...until I get things fleshed out and running)  ...looks like this:
 
Locus1.8  Locus1.9  Locus1.10 ... Locus2.19  Locus2.20  Locus2.21   Locus3.42  ...
171           132          23                  41               116              97                 63            ...  

These counts are correct...was able to confirm them in a spreadsheet.  What I would like to generate is a table that has the allele numbers in the first column, the loci across the columns and the intersecting cells with the relative frequencies of each allele...something like this

Allele/Locus    Locus1     Locus2   Locus3   ....
6                           -               -             -
7                        0.375          -             -
8                        0.287          -             -
9                        0.017          -             -
...
19                          -            0.090       -
20                          -            0.327       -
21                          -            0.368       -
...
41                          -                -        0.127  
42                          -                -        0.333
...

I hope this makes sense!
Basically need the relative frequency of each allele at each locus for the dataset.  
Sorry for my level of ineptness in describing this in more computational manner....i'm not very savvy in the script writing arena ...but am trying to get the hang of it! !

Thanks in advance for you help and tolerance!
John 

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/fc15c9b5-cf3e-49c2-b143-d7952503c551%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brian Knaus

unread,
Oct 26, 2017, 4:16:56 PM10/26/17
to JVPlanz, poppr
I took a look around poppr and adegenet but didn't seem to find what you're looking for. Perhaps Zhian can correct me when he becomes available. You could try something like the following.

library(poppr)
data("Pinf")
x <- tab(Pinf)
# Assume everything is one population
x <- colSums(x, na.rm = TRUE)
# Create a factor of loci names
loci <- as.factor(unlist(lapply(strsplit(names(x), split=".", fixed=TRUE), function(x){x[1]})))
# Create a list where each locus is an element
x <- split(x, loci)
# Convert to percentage
lapply(x, function(x){x/sum(x)})

I'm not used to have loci that share all the same allele names (see the example), so I'm not sure where to go here. If you want to ignore their names we could load this into a matrix or data.frame in the order they appear. Are we getting closer?

Brian


To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

JVPlanz

unread,
Oct 27, 2017, 9:35:29 AM10/27/17
to poppr
HI Brian
I tried out your suggestion on the example and then ran it on my own datafile and it worked...it shows up in the console and the allele frequencies match what I have...gave me roughly the same output that I got from  what I cobbled together below:

library("poppr")
## create data file
ATHOSPara <- read.genalex("~/Paraguay/ATHOSPara.csv")
ATHOSPara
## create object that contains allele counts by locus
(athospal <- private_alleles(ATHOSPara, count.alleles = TRUE))
#
## this gives Obs & EXp Heterozygosity
data(ATHOSPara)
ATHOSPara
summary(ATHOSPara)
##
#
##
## make a genpop object to calc allele frequencies
##
Z <- genind2genpop(ATHOSPara)
tempATH3 <- makefreq(Z,missing="0")
tempATH3
##
## yes this works

 This is what gave me the value table ( I guess that is what its called...shows up under Values in  Environment  windowin R studio) that I referred to in my previous post.

From what I can tell, to get the table I need i need to accomplish the following tasks:

#  need to differentiate/split column names :  locus.allele -> locus   allele
#  count each locus once and set name as column headers
#  adjust allele names so that microvariants designated  with "." instead of _ (eg D10S1237 allele 19_3 --> 19.3 )
#  make list of all allele names and order numerically
#  stack alleles in numerical order as row legend
#  place corresponding frequency value  in cell of congruence where allele name and locus name meet
###
should give a table that looks like this:

Allele               Locus1     Locus2   Locus3   ....
6                           -               -             -
7                        0.375          -             -
8                        0.287          -             -
9                        0.017          -             -
...
19                          -            0.090       -
19.3                          -         0.327       -
20                          -            0.368       -
...
41                          -                -        0.127  
42                          -                -        0.333
...

PROBLEM IS...I have no earthly idea how to do this.

Again thanks so much for your patience with me... old dog learning new tricks.

Any suggestion to convert what i got from either my attempt or your approach would be greatly appreciated.
Thanks
John

Brian Knaus

unread,
Oct 27, 2017, 2:34:18 PM10/27/17
to JVPlanz, poppr
Hi John,

I think we should stick to the Pinf dataset for this conversation. Part of the purpose of this forum is to document our attempts so that it not only helps you but it may provide help for someone in the future who may have a similar issue. Because the Pinf dataset is available with poppr it means that others will have access to it so it will provide a full example.

Based on the criteria you've provided, I think I got you almost all the way there. Lets repeat it with some changes.

library(poppr)
data("Pinf")
x <- tab(Pinf)
# Create some 'microvariants'
mv <- c(15, 23, 26, 36, 43, 45, 55, 105)
colnames(x)[mv] <- paste(colnames(x)[mv], "_2", sep="")

# Assume everything is one population
x <- colSums(x, na.rm = TRUE)
# Create a factor of loci names
loci <- as.factor(unlist(lapply(strsplit(names(x), split=".", fixed=TRUE), function(x){x[1]})))
# Create a list where each locus is an element
x <- split(x, loci)
# Convert to percentage
x <- lapply(x, function(x){x/sum(x)})
# Rename the alleles so they will be numeric as in the post
x <- lapply(x, function(x){names(x) <- unlist(lapply(strsplit(names(x), split='.', fixed=TRUE), function(x){x[2]})); x})
# Handle 'microvariants'
x <- lapply(x, function(x){names(x) <- sub(pattern="_", replacement=".", names(x),fixed=TRUE); x})
# Initialize our final data.frame
myAlleles <- unique(unlist(lapply(x, names)))
myAlleles <- sort(myAlleles)
myTable <- data.frame(matrix(nrow=length(myAlleles), ncol=length(x)))
rownames(myTable) <- myAlleles
colnames(myTable) <- names(x)

# Population the data.frame
for(i in 1:length(x)){
  myTable[ names(x[[i]]),i] <- x[[i]]
}
head(myTable)

              D13       G11     Pi02 Pi04 Pi16 Pi33      Pi4B     Pi56      Pi63 Pi70      Pi89
000   0.481481481 0.4651163 0.497093  0.5  0.5  0.5 0.4651163 0.497093 0.4617647  0.5 0.4970588
104   0.006172840        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
108   0.009259259        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
110.2 0.015432099        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
112   0.030864198        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
118   0.021604938        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA

Why don't you give that a try and see if we've got it yet.
Brian

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

JVPlanz

unread,
Oct 27, 2017, 6:38:45 PM10/27/17
to Brian Knaus, poppr
Will do Brian. I’ll give it a whirl wen I get back in my office. Will keep you posted. Thanks. Have a great weekend
John

Zhian Kamvar

unread,
Oct 29, 2017, 8:59:20 AM10/29/17
to JVPlanz, Brian Knaus, poppr
Hi John,

If you want a table of alleles x loci with the entries being allele frequencies, then private_alleles function would not necessarily be the way you want to go since it adds the population dimension to this. 

If you set your population factor to NULL (e.g. pop(x) <- NULL), or only have one population, you can derive an allele frequency table using genind2genpop() followed by tab(freq = TRUE). After that, you can use tidyverse functionality to manipulate these data into a table (see below).

library("poppr")
library("tidyverse")
data("Pram")
# Selecting one population
JHC <- Pram[pop = "JHallCr_OR", drop = TRUE]
as.data.frame(t(tab(genind2genpop(JHC), freq = TRUE))) %>%
  rownames_to_column("loc") %>% 
  rename(freq = JHallCr_OR) %>% 
  separate(loc, c("locus", "allele"), sep = "[.]") %>%
  spread(key = locus, value = freq)
#> 
#>  Converting data from a genind to a genpop object... 
#> 
#> ...done.
#>    allele   Pr9C3A1   PrMS39A1    PrMS43A1  PrMS45A1   PrMS6A1
#> 1     130        NA 0.50000000          NA        NA        NA
#> 2     165        NA         NA          NA        NA 0.5614754
#> 3     166        NA         NA          NA 0.5040984        NA
#> 4     168        NA         NA          NA        NA 0.4385246
#> 5     186        NA         NA          NA 0.4959016        NA
#> 6     216 0.5614754         NA          NA        NA        NA
#> 7     226 0.4385246         NA          NA        NA        NA
#> 8     246        NA 0.01024590          NA        NA        NA
#> 9     250        NA 0.47745902          NA        NA        NA
#> 10    254        NA 0.01229508          NA        NA        NA
#> 11    364        NA         NA 0.002049180        NA        NA
#> 12    372        NA         NA 0.073770492        NA        NA
#> 13    376        NA         NA 0.387295082        NA        NA
#> 14    380        NA         NA 0.030737705        NA        NA
#> 15    384        NA         NA 0.010245902        NA        NA
#> 16    465        NA         NA 0.002049180        NA        NA
#> 17    477        NA         NA 0.006147541        NA        NA
#> 18    481        NA         NA 0.020491803        NA        NA
#> 19    485        NA         NA 0.411885246        NA        NA
#> 20    489        NA         NA 0.043032787        NA        NA
#> 21    493        NA         NA 0.008196721        NA        NA
#> 22    505        NA         NA 0.004098361        NA        NA

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln
ORCID: 0000-0003-1458-7108




Brian Knaus

unread,
Oct 30, 2017, 12:22:33 PM10/30/17
to Zhian Kamvar, JVPlanz, poppr
+1 for a tidy answer. Thanks Zhian! Those of us who are not as familiar with the tidyverse may find value in avoiding the use of the magritter pipe (%>%). This will allow us to examine the intermediate steps to get a better idea of what each individual function accomplishes. We can rewrite Zhian's example as follows.


library("poppr")
library("tidyverse")
data("Pram")
# Selecting one population
JHC <- Pram[pop = "JHallCr_OR", drop = TRUE]
JHC <- as.data.frame(t(tab(genind2genpop(JHC), freq = TRUE)))
JHC <- rownames_to_column(JHC, var = "loc")
JHC <- rename(JHC, freq = JHallCr_OR)
JHC <- separate(JHC, loc, c("locus", "allele"), sep = "[.]")
JHC <- spread(JHC, key = locus, value = freq)

Perhaps not as efficient as Zhian's example. But I found this helpful for learning the functions.
Thanks!
Brian
 

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

JVPlanz

unread,
Oct 30, 2017, 1:12:37 PM10/30/17
to Brian Knaus, Zhian Kamvar, poppr
Thanks Zhian. I’ll give that a shot when I get back to the office on Wednesday and let you know how it works out... good tip Brian
Thanks
John

For more options, visit https://groups.google.com/d/optout.



-- 
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

-- 
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.

JVPlanz

unread,
Dec 5, 2017, 12:33:08 PM12/5/17
to poppr
Hi Zhian
Finally can get back to this was out of pocket for a month. I tried the script you sent but I get

> library("poppr")
> library("tidyverse")
Error in library("tidyverse") : there is no package called ‘tidyverse’
> data("Pram")
...
any suggestions?

John

Zhian Kamvar

unread,
Dec 5, 2017, 12:35:36 PM12/5/17
to JVPlanz, poppr
Hi John,

You should install the package called "tidyverse":

install.packages("tidyverse")

In general, when you see an error, try copying and pasting that error into google before you ask package maintainers for help.

Thanks,
Zhian

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln
ORCID: 0000-0003-1458-7108




signature.asc

JVPlanz

unread,
Dec 5, 2017, 5:02:05 PM12/5/17
to poppr
Thanks Brian and Zhian
got your script to work...now i'll get it working with my dataset..
Thanks
John

For more options, visit https://groups.google.com/d/optout.



-- 
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

-- 
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages