gl.report.pa

398 views
Skip to first unread message

Katrin Hohwieler

unread,
Jun 28, 2021, 1:54:31 AM6/28/21
to dartR
Hello, 

potentially a pretty basic question, but I've just been pottering around with private alleles and used gl.report.pa to get some estimates. I kind of get the idea to compare populations pairwise, but I always thought that, per definition, Private alleles are alleles that are found only in a single population among a broader collection of populations. This would deliver one value per population, when comparing e.g. 5 populations. Is there any way to manipulate the code to compare one pop against the other 4 and repeat that for each pop (I hope that makes sense)?
Further, looking at the output, I get huge values for 'priv1', 'priv2' and therefore 'totalpriv'. I do compare populations on a somewhat larger scale but not massive. E.g. I used ~2000 loci on ~10 indiviuals per pop. Comparing pop1 and pop 2, which are geographically very close to each other, priv1 = 325 and priv2=323m with totalpriv = 648, and mdf of 0.165. Could you help me to put this result in words? I'm not sure I interpret the colums correctly. Does this mean pop1 compared to pop2 has 325 private alleles?

Thanks alread in advance, 

best wishes, 
Katrin

Renee Catullo

unread,
Jun 28, 2021, 2:04:24 AM6/28/21
to da...@googlegroups.com
Hi Katrin,

If you have all five populations in the data, you should get a whole matrix of comparisons out. If you want to compare p1 to the other four populations as a whole, then you would need to just reassign those four populations to a single p2 population and do it 5 times.

Your example means there are 325 private alleles in p1 that are not in p2, and 323 private alleles in p2 that are not in p1, for a total number of 648 private alleles between the two populations. 

Hope that helps.

Renee

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/d9cc171e-9167-4abb-8352-bff79d25ec31n%40googlegroups.com.

Katrin Hohwieler

unread,
Jun 28, 2021, 2:55:13 AM6/28/21
to dartR
Thanks Renee! I'll try your suggestion. 

Bernd.Gruber

unread,
Jun 28, 2021, 3:04:37 AM6/28/21
to da...@googlegroups.com

@Katrin

 

Renee explained it nicely.

 

pas <- list

 

The shortcut could be copy your genlight object with the 5 pops and recode 1 against 2-5, e.g. via

 

for (i in 1:5)

{

gl2  <- gl

 

pop(gl2) <- factor(ifelse(pop(gl2)==popNames(gl)[i],popNames(gl)[i], “rest”))

 

print(pas[[i]] <- gl.report.pa(gl2))

 

flush.console() #just so you see the output in a loop.

 

}

 

 

 

pas is then a list with all the five comparisons…

 

I have not tested this, so not sure if it works.

 

Cheers, Bernd

Katrin Hohwieler

unread,
Jun 28, 2021, 3:06:14 AM6/28/21
to dartR
Thanks a lot, Bernd. I'll give it a go and will let you know how I went!

Peter Kriesner

unread,
Jun 28, 2021, 5:53:29 AM6/28/21
to da...@googlegroups.com
Does the PopGenReport package (mk.allele.dist flag) provide this info on private alleles for each pop'n compared to all the others rather than pairwise comparisons? That appears to be the output I got when I ran it.

Peter





Peter Unmack

unread,
Jun 28, 2021, 5:22:07 PM6/28/21
to da...@googlegroups.com
I do this a bit differently as I have no clue how to code in R.

In excel I set up my metadata file with each comparison I want to make
for private alleles retaining the pop name and changing the rest to the
word rest (so that there are only two pops). I'd call each column pop1,
pop2 ... pop5.

Once the data are read in and filtered I use this command to designate
which column should be used as the pop field.

pop(gl) <- gl@other$ind.metrics$pop1

Repeat for each column and copy and paste the output into a text editor.

p1 p2 pop1 pop2 N1 N2 fixed priv1 priv2 totalpriv mdf
1 1 2 NW90 rest 8 77 7 45 1430 1475 0.216
1 1 2 SW70 rest 10 75 1 34 1391 1425 0.174
1 1 2 E504 rest 7 78 0 2 1540 1542 0.186

Cheers
Peter

On 28/06/2021 4:04 pm, Renee Catullo wrote:
> Hi Katrin,
> If you have all five populations in the data, you should get a whole
matrix of comparisons out. If you want to compare p1 to the other four
populations as a whole, then you would need to just reassign those four
populations to a single p2 population and do it 5 times.
> Your example means there are 325 private alleles in p1 that are not in
p2, and 323 private alleles in p2 that are not in p1, for a total number

> of 648 private alleles between the two populations.
> Hope that helps.
> Renee
>> On 28 Jun 2021, at 1:54 pm, Katrin Hohwieler
>> <mailto:dartr+un...@googlegroups.com>.
>> <https://groups.google.com/d/msgid/dartr/d9cc171e-9167-4abb-8352-bff79d25ec31n%40googlegroups.com?utm_medium=email&utm_source=footer>.
> --
> You received this message because you are subscribed to the Google
Groups "dartR" group.
> To unsubscribe from this group and stop receiving emails from it, send
an email to dartr+un...@googlegroups.com
> <mailto:dartr+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dartr/2C5DB6E0-7DBA-4ECC-B86E-1E36B6AA0399%40gmail.com

> <https://groups.google.com/d/msgid/dartr/2C5DB6E0-7DBA-4ECC-B86E-1E36B6AA0399%40gmail.com?utm_medium=email&utm_source=footer>.



Bernd.Gruber

unread,
Jun 28, 2021, 8:39:14 PM6/28/21
to da...@googlegroups.com

Hi,

 

Because we are at it. The beautiful function gl.report.pa reports also mdf (mean absolute difference in allele frequency across all loci), which I just report because I thought it could be a good measure of difference. Anyone has read a publication about this? Would be interested if there is something about that published.

 

Cheers, Bernd

 

 

 

==============================================================================

Dr Bernd Gruber                                              )/_         

                                                         _.--..---"-,--c_    

Professor Ecological Modelling                      \|..'           ._O__)_     

Tel: (02) 6206 3804                         ,=.    _.+   _ \..--( /          

Fax: (02) 6201 2328                           \\.-''_.-' \ (     \_          

Institute for Applied Ecology                  `'''       `\__   /\          

Faculty of Science and Technology                          ')                

University of Canberra   ACT 2601 AUSTRALIA

Email: bernd....@canberra.edu.au

WWW: bernd-gruber

 

Australian Government Higher Education Provider Number CRICOS #00212K 

NOTICE & DISCLAIMER: This email and any files transmitted with it may contain
confidential or copyright material and are for the attention of the addressee
only. If you have received this email in error please notify us by email
reply and delete it from your system. The University of Canberra accepts
no liability for any damage caused by any virus transmitted by this email.

==============================================================================

To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/4ac270912eac8d01e557...@unmack.net.

Katrin Hohwieler

unread,
Jun 28, 2021, 8:53:05 PM6/28/21
to dartR
Hi everyone, 

 I ran the script that Bernd sent through yesterday and it worked, except it needs amending pas<- list to pas <- list()

One weird thing happens, and I don't think it matters at all, but it still confused me at first so I thought I'd share. The code seems to sort pop1 and pop2 for each run alphabetical, so that the name of pop1 is A and pop2 is B. When I have populations named "S", it would compare "rest" as pop1 with "S" as pop2, insead of what I really want - comapring "S" with "rest". Just for consistency, I renamed "rest" to "zest". 

nPop(gl)

pas<- list()
for (i in 1:6)
  
{
  
  gl2  <- gl
  
  
  
  pop(gl2) <- factor(ifelse(pop(gl2)==popNames(gl)[i],popNames(gl)[i], 'zest'))
  
  
  
  print(prival[[i]] <- gl.report.pa(gl2))
  
  
  
  flush.console() #just so you see the output in a loop.
  
  
  
}
pas


As for the mdf I actually had a quick look around yesterday, because I thought it sounds like an interesting measure, but couldn't find anything on it. I'll do a little research and reply here, should I find anything. 

Thanks again for all the help, 
Katrin

Bernd.Gruber

unread,
Jun 28, 2021, 9:14:50 PM6/28/21
to da...@googlegroups.com

Hi Katrin,

 

Yes the popNames uses alphabetical order of pops I think, so good catch (maybe zzest is saver…)

 

p.s. in your script you used prival for the list name, but defined pas.

 

 

Cheers, Bernd

 

 

 

==============================================================================

Dr Bernd Gruber                                              )/_         

                                                         _.--..---"-,--c_    

Professor Ecological Modelling                      \|..'           ._O__)_     

Tel: (02) 6206 3804                         ,=.    _.+   _ \..--( /          

Fax: (02) 6201 2328                           \\.-''_.-' \ (     \_          

Institute for Applied Ecology                  `'''       `\__   /\          

Faculty of Science and Technology                          ')                

University of Canberra   ACT 2601 AUSTRALIA

Email: bernd....@canberra.edu.au

WWW: bernd-gruber

 

Australian Government Higher Education Provider Number CRICOS #00212K 

NOTICE & DISCLAIMER: This email and any files transmitted with it may contain
confidential or copyright material and are for the attention of the addressee
only. If you have received this email in error please notify us by email
reply and delete it from your system. The University of Canberra accepts
no liability for any damage caused by any virus transmitted by this email.

==============================================================================

 

From: da...@googlegroups.com <da...@googlegroups.com> On Behalf Of Katrin Hohwieler
Sent: Tuesday, 29 June 2021 10:53
To: dartR <da...@googlegroups.com>
Subject: Re: [dartR] gl.report.pa

 

Hi everyone, 

Katrin Hohwieler

unread,
Jun 28, 2021, 10:18:29 PM6/28/21
to da...@googlegroups.com
Whoopsie, yes, I wanted to make it nice for the group and then that didn't work out, did it. Here, again:

nPop(gl)

pas<- list()
for (i in 1:6)
  
{
  
  gl2  <- gl
  
  
  
  pop(gl2) <- factor(ifelse(pop(gl2)==popNames(gl)[i],popNames(gl)[i], 'zest'))
  
  
  
  print(pas[[i]] <- gl.report.pa(gl2))
  
  
  
  flush.console() #just so you see the output in a loop.
  
  
  
}
pas

thanks!

You received this message because you are subscribed to a topic in the Google Groups "dartR" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dartr/TRgadI6Me8Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/ME3PR01MB6497C8C3B06E22412ED3B38BD4029%40ME3PR01MB6497.ausprd01.prod.outlook.com.

Gabriella Scatà

unread,
Oct 12, 2022, 3:57:53 AM10/12/22
to dartR
Hi everyone,
sorry to ask but I am still confused about the output of the "gl.report.pa()" function.

In my output dataframe, I have the following columns:
"p1"        "p2"        "pop1"      "pop2"      "N1"        "N2"        "fixed"     "priv1"     "priv2"     "totalpriv"   "AFD"

p1 & p2 = represent the ordered number of the populations being compared (so p1=1 p2=3 means 1st pop with 3rd pop in order) - correct?
pop1 & pop2 = the name of the pops being compared
N1 & N2 = the number of individuals in each of the 2 pops
fixed = the number of fixed genetic differences between the 2 populations --> so would this correspond to private alleles which are fixed in each population? So it would correspond to loci that are monomorphic in each population 
             but not necessarily private (so for example: pop1-Loc1= aa, pop2-Loc1 = bb, pop3-Loc1 = aa) --> correct?
             BUT in the dartR manual it says: "the number of loci with fixed differences (same for both populations) in pop1 (compared to pop2) and vice versa. Same for private alleles..."
             BUT there is only 1 column for "fixed genetic differences" --> so does it mean the value reported is the fixed genetic differences in pop1 compared to pop2 - BUT then not viceversa? Unless as you mention, the number of fixed
             genetic differences is the same if you compare pop1 with pop2 or viceversa pop2 with pop1.
             In my case all fixed differences are = 0 - is it normal?
priv1 = number of private alleles in pop1 compared to pop2
priv2 = number of private alleles in pop2 compared to pop1

Then I have a question on the relationship between the Sankey Diagram & the dataframe output:
In my dataframe output, if I look at pop3 compared to pop12 for example, I have:
p1  p2    pop1  pop2  fixed  priv1  priv2
3     12    A        B          0       795    296

So I read this information as pop A having 795 private alleles with pop B, and pop B having 296 private alleles with pop A.
BUT, when I look at the Sankey diagram, I have:
pop B (on the left) ====> 795 private alleles ====> pop A (on the right)        

Which I interpret as the opposite to what the dataframe output says, that is: pop B (and not pop A) has 795 private alleles compared to pop A.

So, what is the correct interpretation (or direction of reading these outputs)?

Thanks a lot for any clarification you can provide!
Best,
Gabriella

Jose Luis Mijangos

unread,
Oct 13, 2022, 2:13:07 AM10/13/22
to dartR
Hi Gabriella,

Q1 correct
Q2 A fixed difference at a locus occurs when two populations share no alleles, eg. all individuals are "aa" in pop 1 and all individuals are "bb" in pop 2.
Q3  fixed differences arise when two populations have been separated (ie no gene flow) for several (eg hundreds) of generations. That's why fixed difference are useful for species delimitation. You can have a look at the tutorial below for a full analysis using fixed differences.
Q4 Yes, your observation is correct, the direction of the flows (private alleles) was inverted, thank you for report it. I have fixed this error. As usual, to use the updated version of this function you should install the developing version of dartR by typing in the R console:
> library(dartR)
> gl.install.vanilla.dartR(flavour = "dev")

Cheers,
Luis

Gabriella Scatà

unread,
Oct 13, 2022, 2:49:59 AM10/13/22
to da...@googlegroups.com
Hi Luis,
thank you for your always quick and detailed reply...and thank you so much for the new tutorial! I am going through it...very helpful!

Q1 ok, thanks
Q2 But then fixed differences become private alleles, or not? So when gl.report.pa reports different numbers for fixed and for private alleles i get confused...I thought it was becoming clear to me if fixed alleles are alleles fixed in 1 population but they can be present in the other population in different frequency with another allele, while private alleles are found only in 1 population/spp and never in the other...and don't necessarily need to be fixed...no? So fixed genetic alleles which are not shared by the 2 populations/spp (so Spp1: aa, Spp2 = bb) are both fixed and private?

Q4. Thanks! that makes much more sense now! So, just to understand...which was the correct information? the one given by the report table or by the Sankey diagram?

Thanks again a lot!
Best,
Gabriella

You received this message because you are subscribed to a topic in the Google Groups "dartR" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dartr/TRgadI6Me8Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/133a3fbd-9104-4243-a380-33c430f0f110n%40googlegroups.com.

Gabriella Scatà

unread,
Oct 13, 2022, 3:22:49 AM10/13/22
to da...@googlegroups.com
Sorry, I just ran the developer version of the gl.report.pa, and I have an additional question:
What are the values reported "Chao1 Chao2"?

Thanks again!
Gabriella

Jose Luis Mijangos

unread,
Oct 14, 2022, 4:35:29 PM10/14/22
to dartR
Hi Gabriella,

You can say that fixed differences are also private alleles, but not all private alleles are fixed differences, consider the following examples:
Example 1:
pop1: ind 1 genotype aa; ind 2 genotype aa: ind 3 genotype aa
pop2: ind 1 genotype AA; ind 2 genotype AA: ind 3 genotype Aa
In this example pop2 has one private allele with respect to pop1, pop1 has no private alleles with respect to pop2
Example 2:
pop1: ind 1 genotype aa; ind 2 genotype aa: ind 3 genotype aa
pop2: ind 1 genotype AA; ind 2 genotype AA: ind 3 genotype AA
In this example pop2 has one private allele with respect to pop1, pop1 has one private alleles with respect to pop2 and there is one fixed difference between pop1 and pop2.

The correct information for private alleles was reported in the table.

The following information was added to the function documentation:

The function also reports an estimation of the lower bound of the number of undetected private alleles using the Good-Turing frequency formula, originally developed for cryptography, which estimates in an ecological context the true frequencies of rare species in a single assemblage based on an incomplete sample of individuals. The approach is described in Chao et al. (2017). For this function, the equation 2c is used. This estimate is reported in the output table as Chao1 and Chao2.

  • Chao, Anne, et al. "Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on Good‐Turing theory." Ecology 98.11 (2017): 2914-2929.


Cheers,
Luis

Gabriella Scatà

unread,
Oct 17, 2022, 7:37:14 PM10/17/22
to dartR
Thank you so much Luis!
That makes so much more sense!
And thank you for the paper.
Best,
Gabriella

Robin van Velzen

unread,
Feb 11, 2024, 10:57:17 AM2/11/24
to dartR
Dear all, 

I am running gl.report.pa and have some difficulties understanding (and, consequently, interpreting) the Sankey graph. The documentation explains it like this:
"In this function a Sankey Diagram is used to visualize patterns of private alleles between populations. This diagram allows to display flows (private alleles) between nodes (populations). Their links are represented with arcs that have a width proportional to the importance of the flow (number of private alleles)." 

I know that Sankey graphs are generally used to depict flow. However, I do not see how patterns of private alleles can be considered flow as I would think private alleles rather indicate an absence of (gene) flow. 

Am I missing something? 

Jose Luis Mijangos

unread,
Feb 12, 2024, 6:20:34 PM2/12/24
to dartR
Hi Robin,

Yes, you are right; the visualisation is counterintuitive. However, I think the visualisation has value as long as the description of the figure is clear and available to the reader.

Alternatively, you could present to the reader the table that is generated from the function. 

Cheers,
Luis 

Oumar Boro

unread,
Mar 8, 2024, 12:13:18 PM3/8/24
to dartR
Hi everyone,
I went through this discussion and it was very helpful.
I've been able to generate both, the table and graph.
However, I'm still looking at a function if included in DArtR, that I can use to display or extract only those private alleles.
In my case, I want to observe the exact private alleles to confirm my hypothesis.

Regards,

Boro

Jose Luis Mijangos

unread,
Mar 8, 2024, 12:41:27 PM3/8/24
to dartR
Hi Boro,

See code below.

Cheers,
Luis 

library(dartR)
#read function documentation
?gl.report.pa
#private alleles/fixed differences report
res <- gl.report.pa(platypus.gl, loc_names = TRUE)
#print results
res  

Oumar BORO

unread,
Mar 8, 2024, 12:54:57 PM3/8/24
to dartR
Reply all
Reply to author
Forward
0 new messages