Re: [poppr] Am I the only one who can't figure this out: Calculating Allele frequencies for a panel of STRs

Brian Knaus

unread,

Oct 26, 2017, 11:50:17 AM10/26/17

to JVPlanz, poppr

Hi John,

Both of your messages were received by the group and your message on GitHub was received. Once would have been enough :)

I'm not sure what your goal or issue is here. When I try the example I see the following.

library(poppr)
data("Pinf")
(pal <- private_alleles(Pinf, locus ~ Country, count.alleles = FALSE))
         Pi02 D13 Pi33 Pi4B Pi16 G11 Pi56 Pi70 Pi89
Colombia    0   0    0    0    0   0    0    0    0
Ecuador     0   0    0    0    0   1    0    0    0
Mexico      5 12    1    0    3   7    1    0    9
Peru        1   2    0    1    0   4    0    1    0
sweep(pal, 2, nAll(Pinf)[colnames(pal)], FUN = "/")
               Pi02        D13      Pi33 Pi4B      Pi16        G11 Pi56 Pi70 Pi89
Colombia 0.00000000 0.00000000 0.0000000 0.000 0.0000000 0.00000000 0.00 0.00 0.00
Ecuador 0.00000000 0.00000000 0.0000000 0.000 0.0000000 0.04545455 0.00 0.00 0.00
Mexico   0.45454545 0.46153846 0.3333333 0.000 0.4285714 0.31818182 0.25 0.00 0.75
Peru     0.09090909 0.07692308 0.0000000 0.125 0.0000000 0.18181818 0.00 0.25 0.00

Everything seems to work. However, I'm concerned that this may not be what you're after. These are 'private alleles' and you state that you want a table of allele frequencies. Recall that poppr builds on adegenet, so many adegenet functions will work on poppr objects. Is what you're looking for the following?

makefreq(Pinf)

Finding allelic frequencies from a genpop object...

...done.

        Pi02.000 Pi02.150 Pi02.152 Pi02.154 Pi02.156 Pi02.158 Pi02.160 Pi02.162 Pi02.164 Pi02.166 Pi02.170 D13.000 D13.104
PiCO01      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO02      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO03      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO04      0.50     0.00     0.00     0.00     0.00     0.00     0.00     0.50     0.00     0.00     0.00    0.50     0.0
PiCO05      0.50     0.00     0.00     0.00     0.00     0.00     0.25     0.25     0.00     0.00     0.00    0.50     0.0

(Actually a big table that I'm not going to reproduce here.)

And you can always take a brute force approach using this.

as.data.frame(Pinf)

Does that get you closer to your goal?

Brian

On Tue, Oct 24, 2017 at 7:53 AM, JVPlanz <jpt...@gmail.com> wrote:

Hia ll
Been working on this the last couple days, first, Poppr is great! love it want to integrate it into courses I teach, but I'm having problems getting something fundamental figured out... I have a population database with 200 individuals typed at 21 short tandem repeat loci... created and loaded the csv with no issues and ran some of the summary sts examples...worked fine... I need to first create an allele frequency table for each of the loci ...this has eluded me! ran the test scripts:
## Get raw number of private alleles per locus.
## (pal <- private_alleles(Pinf, locus ~ Country, count.alleles = FALSE))
# Get percentages.
## sweep(pal, 2, nAll(Pinf)[colnames(pal)], FUN = "/")
changed False to true and got an output that had the allele counts...the sweep did not do anything...
Would like to have a table of allele frequencies

allele

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/fc15c9b5-cf3e-49c2-b143-d7952503c551%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Brian J. Knaus, Ph.D.

Corvallis, Oregon, USA

brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

JVPlanz

unread,

Oct 26, 2017, 2:14:51 PM10/26/17

to poppr

Hi Brian

Thanks for the reply...sorry I posted it so many times...I couldn't tell from my end that my post actually posted! HAven't used Google groups before.

I'm working with a dataset with 22 loci (microsatellites) with anywhere from 5 - 30 alleles present in a locus. I was able to generate a table that gives me a count of allele observation across the alleles at each locus The allele designator was put after a "." in the locus name across the columns when I ran the private alleles scirpt using count.alleles=TRUE giving me the number of each allele detected across the dataset (i'm only working with 1 population at the moment...until I get things fleshed out and running) ...looks like this:

Locus1.8 Locus1.9 Locus1.10 ... Locus2.19 Locus2.20 Locus2.21 Locus3.42 ...

171 132 23 41 116 97 63 ...

These counts are correct...was able to confirm them in a spreadsheet. What I would like to generate is a table that has the allele numbers in the first column, the loci across the columns and the intersecting cells with the relative frequencies of each allele...something like this

Allele/Locus Locus1 Locus2 Locus3 ....
6 - - -

7 0.375 - -

8 0.287 - -

9 0.017 - -

...

19 - 0.090 -

20 - 0.327 -

21 - 0.368 -

...

41 - - 0.127

42 - - 0.333

...

I hope this makes sense!

Basically need the relative frequency of each allele at each locus for the dataset.

Sorry for my level of ineptness in describing this in more computational manner....i'm not very savvy in the script writing arena ...but am trying to get the hang of it! !

Thanks in advance for you help and tolerance!

John

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/fc15c9b5-cf3e-49c2-b143-d7952503c551%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brian Knaus

unread,

Oct 26, 2017, 4:16:56 PM10/26/17

to JVPlanz, poppr

I took a look around poppr and adegenet but didn't seem to find what you're looking for. Perhaps Zhian can correct me when he becomes available. You could try something like the following.

library(poppr)
data("Pinf")
x <- tab(Pinf)
# Assume everything is one population
x <- colSums(x, na.rm = TRUE)
# Create a factor of loci names
loci <- as.factor(unlist(lapply(strsplit(names(x), split=".", fixed=TRUE), function(x){x[1]})))
# Create a list where each locus is an element
x <- split(x, loci)
# Convert to percentage
lapply(x, function(x){x/sum(x)})

I'm not used to have loci that share all the same allele names (see the example), so I'm not sure where to go here. If you want to ignore their names we could load this into a matrix or data.frame in the order they appear. Are we getting closer?

Brian

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/83397c27-3854-430b-b449-56afd79cdb74%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

JVPlanz

unread,

Oct 27, 2017, 9:35:29 AM10/27/17

to poppr

HI Brian

I tried out your suggestion on the example and then ran it on my own datafile and it worked...it shows up in the console and the allele frequencies match what I have...gave me roughly the same output that I got from what I cobbled together below:

library("poppr")

## create data file

ATHOSPara <- read.genalex("~/Paraguay/ATHOSPara.csv")

ATHOSPara

## create object that contains allele counts by locus

(athospal <- private_alleles(ATHOSPara, count.alleles = TRUE))

#

## this gives Obs & EXp Heterozygosity

data(ATHOSPara)

ATHOSPara

summary(ATHOSPara)

##

#

##

## make a genpop object to calc allele frequencies

##

Z <- genind2genpop(ATHOSPara)

tempATH3 <- makefreq(Z,missing="0")

tempATH3

##

## yes this works

This is what gave me the value table ( I guess that is what its called...shows up under Values in Environment windowin R studio) that I referred to in my previous post.

From what I can tell, to get the table I need i need to accomplish the following tasks:

# need to differentiate/split column names : locus.allele -> locus allele

# count each locus once and set name as column headers

# adjust allele names so that microvariants designated with "." instead of _ (eg D10S1237 allele 19_3 --> 19.3 )

# make list of all allele names and order numerically

# stack alleles in numerical order as row legend

# place corresponding frequency value in cell of congruence where allele name and locus name meet

###

should give a table that looks like this:

Allele Locus1 Locus2 Locus3 ....
6 - - -

7 0.375 - -

8 0.287 - -

9 0.017 - -

...

19 - 0.090 -

19.3 - 0.327 -

20 - 0.368 -

...

41 - - 0.127

42 - - 0.333

...

PROBLEM IS...I have no earthly idea how to do this.

Again thanks so much for your patience with me... old dog learning new tricks.

Any suggestion to convert what i got from either my attempt or your approach would be greatly appreciated.

Thanks

John

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/83397c27-3854-430b-b449-56afd79cdb74%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Brian Knaus

unread,

Oct 27, 2017, 2:34:18 PM10/27/17

to JVPlanz, poppr

Hi John,

I think we should stick to the Pinf dataset for this conversation. Part of the purpose of this forum is to document our attempts so that it not only helps you but it may provide help for someone in the future who may have a similar issue. Because the Pinf dataset is available with poppr it means that others will have access to it so it will provide a full example.

Based on the criteria you've provided, I think I got you almost all the way there. Lets repeat it with some changes.

library(poppr)
data("Pinf")
x <- tab(Pinf)

# Create some 'microvariants'
mv <- c(15, 23, 26, 36, 43, 45, 55, 105)
colnames(x)[mv] <- paste(colnames(x)[mv], "_2", sep="")

# Assume everything is one population
x <- colSums(x, na.rm = TRUE)
# Create a factor of loci names
loci <- as.factor(unlist(lapply(strsplit(names(x), split=".", fixed=TRUE), function(x){x[1]})))
# Create a list where each locus is an element
x <- split(x, loci)
# Convert to percentage

x <- lapply(x, function(x){x/sum(x)})
# Rename the alleles so they will be numeric as in the post
x <- lapply(x, function(x){names(x) <- unlist(lapply(strsplit(names(x), split='.', fixed=TRUE), function(x){x[2]})); x})
# Handle 'microvariants'
x <- lapply(x, function(x){names(x) <- sub(pattern="_", replacement=".", names(x),fixed=TRUE); x})
# Initialize our final data.frame
myAlleles <- unique(unlist(lapply(x, names)))
myAlleles <- sort(myAlleles)
myTable <- data.frame(matrix(nrow=length(myAlleles), ncol=length(x)))
rownames(myTable) <- myAlleles
colnames(myTable) <- names(x)

# Population the data.frame
for(i in 1:length(x)){
myTable[ names(x[[i]]),i] <- x[[i]]
}
head(myTable)

              D13       G11     Pi02 Pi04 Pi16 Pi33      Pi4B     Pi56      Pi63 Pi70      Pi89
000   0.481481481 0.4651163 0.497093 0.5 0.5 0.5 0.4651163 0.497093 0.4617647 0.5 0.4970588
104   0.006172840        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
108   0.009259259        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
110.2 0.015432099        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
112   0.030864198        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA
118   0.021604938        NA       NA   NA   NA   NA        NA       NA        NA   NA        NA

Why don't you give that a try and see if we've got it yet.

Brian

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/387e32eb-7a78-46e6-99a8-70636a2713f5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

JVPlanz

unread,

Oct 27, 2017, 6:38:45 PM10/27/17

to Brian Knaus, poppr

Will do Brian. I’ll give it a whirl wen I get back in my office. Will keep you posted. Thanks. Have a great weekend
John

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/387e32eb-7a78-46e6-99a8-70636a2713f5%40googlegroups.com.

Zhian Kamvar

unread,

Oct 29, 2017, 8:59:20 AM10/29/17

to JVPlanz, Brian Knaus, poppr

Hi John,

If you want a table of alleles x loci with the entries being allele frequencies, then private_alleles function would not necessarily be the way you want to go since it adds the population dimension to this.

If you set your population factor to NULL (e.g. pop(x) <- NULL), or only have one population, you can derive an allele frequency table using genind2genpop() followed by tab(freq = TRUE). After that, you can use tidyverse functionality to manipulate these data into a table (see below).

library("poppr")

library("tidyverse")

data("Pram")

# Selecting one population

JHC <- Pram[pop = "JHallCr_OR", drop = TRUE]

as.data.frame(t(tab(genind2genpop(JHC), freq = TRUE))) %>%

rownames_to_column("loc") %>%

rename(freq = JHallCr_OR) %>%

separate(loc, c("locus", "allele"), sep = "[.]") %>%

spread(key = locus, value = freq)

#>

#> Converting data from a genind to a genpop object...

#>

#> ...done.

#> allele Pr9C3A1 PrMS39A1 PrMS43A1 PrMS45A1 PrMS6A1

#> 1 130 NA 0.50000000 NA NA NA

#> 2 165 NA NA NA NA 0.5614754

#> 3 166 NA NA NA 0.5040984 NA

#> 4 168 NA NA NA NA 0.4385246

#> 5 186 NA NA NA 0.4959016 NA

#> 6 216 0.5614754 NA NA NA NA

#> 7 226 0.4385246 NA NA NA NA

#> 8 246 NA 0.01024590 NA NA NA

#> 9 250 NA 0.47745902 NA NA NA

#> 10 254 NA 0.01229508 NA NA NA

#> 11 364 NA NA 0.002049180 NA NA

#> 12 372 NA NA 0.073770492 NA NA

#> 13 376 NA NA 0.387295082 NA NA

#> 14 380 NA NA 0.030737705 NA NA

#> 15 384 NA NA 0.010245902 NA NA

#> 16 465 NA NA 0.002049180 NA NA

#> 17 477 NA NA 0.006147541 NA NA

#> 18 481 NA NA 0.020491803 NA NA

#> 19 485 NA NA 0.411885246 NA NA

#> 20 489 NA NA 0.043032787 NA NA

#> 21 493 NA NA 0.008196721 NA NA

#> 22 505 NA NA 0.004098361 NA NA

-----

Zhian N. Kamvar, Ph. D.

Postdoctoral Researcher (Everhart Lab)

Department of Plant Pathology

University of Nebraska-Lincoln

ORCID: 0000-0003-1458-7108

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/CAHfpL%2BXdhwyysf2c-BrDQDY1BeeZLrNjj2SEUhJaVN-6qn6BWg%40mail.gmail.com.

Brian Knaus

unread,

Oct 30, 2017, 12:22:33 PM10/30/17

to Zhian Kamvar, JVPlanz, poppr

+1 for a tidy answer. Thanks Zhian! Those of us who are not as familiar with the tidyverse may find value in avoiding the use of the magritter pipe (%>%). This will allow us to examine the intermediate steps to get a better idea of what each individual function accomplishes. We can rewrite Zhian's example as follows.

library("poppr")
library("tidyverse")
data("Pram")
# Selecting one population
JHC <- Pram[pop = "JHallCr_OR", drop = TRUE]

JHC <- as.data.frame(t(tab(genind2genpop(JHC), freq = TRUE)))
JHC <- rownames_to_column(JHC, var = "loc")
JHC <- rename(JHC, freq = JHallCr_OR)
JHC <- separate(JHC, loc, c("locus", "allele"), sep = "[.]")
JHC <- spread(JHC, key = locus, value = freq)

Perhaps not as efficient as Zhian's example. But I found this helpful for learning the functions.

Thanks!

Brian

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/387e32eb-7a78-46e6-99a8-70636a2713f5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

--
You received this message because you are subscribed to the Google Groups "poppr" group.

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+unsubscribe@googlegroups.com.

To post to this group, send email to po...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/CAHfpL%2BXdhwyysf2c-BrDQDY1BeeZLrNjj2SEUhJaVN-6qn6BWg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

JVPlanz

unread,

Oct 30, 2017, 1:12:37 PM10/30/17

to Brian Knaus, Zhian Kamvar, poppr

Thanks Zhian. I’ll give that a shot when I get back to the office on Wednesday and let you know how it works out... good tip Brian
Thanks
John

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/387e32eb-7a78-46e6-99a8-70636a2713f5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

--
You received this message because you are subscribed to the Google Groups "poppr" group.

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/CAHfpL%2BXdhwyysf2c-BrDQDY1BeeZLrNjj2SEUhJaVN-6qn6BWg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

JVPlanz

unread,

Dec 5, 2017, 12:33:08 PM12/5/17

to poppr

Hi Zhian

Finally can get back to this was out of pocket for a month. I tried the script you sent but I get

> library("poppr")

> library("tidyverse")

Error in library("tidyverse") : there is no package called ‘tidyverse’

> data("Pram")

...

any suggestions?

John

Zhian Kamvar

unread,

Dec 5, 2017, 12:35:36 PM12/5/17

to JVPlanz, poppr

Hi John,

You should install the package called "tidyverse":

install.packages("tidyverse")

In general, when you see an error, try copying and pasting that error into google before you ask package maintainers for help.

Thanks,

Zhian

-----

Zhian N. Kamvar, Ph. D.

Postdoctoral Researcher (Everhart Lab)

Department of Plant Pathology

University of Nebraska-Lincoln

ORCID: 0000-0003-1458-7108

signature.asc

JVPlanz

unread,

Dec 5, 2017, 5:02:05 PM12/5/17

to poppr

Thanks Brian and Zhian

got your script to work...now i'll get it working with my dataset..

Thanks

John

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/387e32eb-7a78-46e6-99a8-70636a2713f5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
Brian J. Knaus, Ph.D.
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

--
You received this message because you are subscribed to the Google Groups "poppr" group.

To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/CAHfpL%2BXdhwyysf2c-BrDQDY1BeeZLrNjj2SEUhJaVN-6qn6BWg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply all

Reply to author

Forward