adding fields to individual metadata to a filtered dataset

94 views
Skip to first unread message

Gabriella Scatà

unread,
Oct 11, 2022, 2:52:58 AM10/11/22
to dartR
Hi,
I would like to know if it's possible to add a field to the individual metadata of an already filtered dataset.

I need to create a new field in the individual metadata and assign specific values to specific groups of individuals (using a list of individual names for the assignment).

I am assuming it is possible, but just wanted to check.
Thanks
Best,
Gabriella

Jose Luis Mijangos

unread,
Oct 12, 2022, 6:20:45 PM10/12/22
to dartR
Hi Gabriella,

Check out the code below.

Cheers,
Luis

library(dartR)
library(data.table)
# test dataset
test <- platypus.gl
# filtering missing data
test <- gl.filter.callrate(test,threshold = 1)
# assigning individuals to population using lapply
assign_res <- lapply(1:nInd(test),function(x){
  ind_name <- indNames(test)[x]
  # this is the assignment function
  tmp <- utils.assignment(test,unknown=ind_name,verbose=0)
  return(data.frame(id=ind_name,assign_pop=tmp[1,1]))
})

# joining list of dataframes by row
assign_res_2 <- rbindlist(assign_res)

# shuffling rows just to exemplify how to order a dataframe in the next step
assign_res_3 <- assign_res_2[order(sample(nrow(assign_res_2))),]

# ordering a dataframe based on individual names of the genlight object
assign_res_4 <- assign_res_3[match(indNames(test), assign_res_3$id),]

# adding to ind.metrics the new pop assignment
test$other$ind.metrics$pop_assignment <- assign_res_4$assign_pop

Gabriella Scatà

unread,
Oct 13, 2022, 2:01:13 AM10/13/22
to dartR
Hi Luis,
thank you so much for the code.
I am having a hard time following the various steps, but I will try it out and try to understand.

What I tried so far is the following (and I would like to know if it would be correct anyways to proceed this way or if it messes up the genlight object...):
  1. I created 2 character vectors with the list of individuals I want to reassign to each group -->
    es. Spp1 = c("ind1", "ind3", "ind8", "ind23") 
          Spp2 = c("ind2", "ind4", ind12")
    class(Spp1) # character

  2. Then, I created a new column/field (called "Genotype") in the individual metrics using mutate(), case_when() and  %in% as follows:
    test_gl$other$ind.metrics = test.gl$other$ind.metrics %>%
                      mutate(Genotype = case_when(id %in% Spp1 ~ "PURE_Spp1",
                                                  id %in% Spp2 ~ "Spp2",
                                                  TRUE ~ "Spp3"))

    I then converted the new field into a factor:
    test_gl$other$ind.metrics$Genotype = as.factor(test_gl$other$ind.metrics$Genotype)

    This piece of code seems to work...so I do get the correct individuals assigned to the correct Species group, but...when I then use the functions gl.report.pa & gl.filter.pa I have some issues...so I am not sure if the way I manipulated the data is ok! So -->

  3. When using gl.report.pa():
    a. First I converted the pop of test_gl into Genotype
    pop(test_gl) =  test_gl$other$ind.metrics$Genotype

    b. Report pa
    test_gl_REPORT_PA = gl.report.pa(
    test_gl,
    x2= NULL,
    method = "pairwise",
    plot.out = TRUE,
    font_plot = 14,
    map.interactive = FALSE,
    save2tmp = FALSE,
    verbose = 5)

    And I seem to get a good output from this report (although see my question in "gl.report.pa()" post --> https://groups.google.com/g/dartr/c/TRgadI6Me8Y)

  4. BUT, when I then try to use the Filter function -->
    test_gl_FILTER_PA = gl.filter.pa(
    test_gl,
    pop1=pop(test_gl)[1], pop2=pop(test_gl) [2] , invers = FALSE, verbose = 5))

    I noticed that :
    pop(test_gl)[1] # = "Spp3"
    pop(test_gl)[2] # = "Spp3" (instead of Spp1 or Spp2 ??)

    While in your testset.gl example (provided in the explanation of the gl.filter.pa function in the dartR manual):
    pop(testset.gl)[1] # = "EmmacMDBForb" --> so pop1
    pop(testset.gl)[1] # = "EmmacMaclGeor" --> pop2

    Did I do something wrong in manipulating my genlight object such that I dont get the different populations when calling pop(test_gl)[1] OR [2]??
            And could I maybe set the field pop1 = test_gl$other$ind.metrics$Genotype == "Spp1" ?

I would really appreciate some help!
Thanks a lot!
Best,
Gabriella

Jose Luis Mijangos

unread,
Oct 14, 2022, 6:58:54 PM10/14/22
to dartR
Hi Gabriella,

Here are some points that might be causing the problem.

1. For assignment is better to use "<-" instead of "=", see link below to know why:


2. In your code you are using two different variables for your dataset "test_gl" and "test.gl".

3. you get the population name of the first individual when you type:

> pop(test_gl)[1]

4. to get the name of the first population you should type:

> popNames(test_gl)[1]

Cheers,
Luis

Breno Hamdan

unread,
Oct 15, 2022, 9:04:23 PM10/15/22
to da...@googlegroups.com

Hi guys,


Is any code capable of creating from DARTseq a  sequence alignment file containing sequence alignments in the PHYLIP format, with one AlleleID alignment following the other, all in one file (arranged one after another)?

 

Example of three SNPs below:


AlleleID_100040979

Litoria_ewingii_PA22 TGCAGACACCTCTGTGAGAATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_ewingii_PA02    TGCAGACACCTCTGTGAGAATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_ewingii_PA03    TGCAGACACCTCTGTGAGAATGGAATGGAGAGCTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA07    TGCAGACACCTCTGTGAGAATGGAATGGATAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA21        TGCAGACACCTCTGTGAGAATGGAATCGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA22   TGCAGACACCTCTGTGAGAATGGAATGGAGAAATCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA


AlleleID_100040980

Litoria_ewingii_PA22    TGCAGACACCTCTGTGAGTATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_ewingii_PA02    TGCAGACACCTCTGTGAGACTGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_ewingii_PA03    TGCAGACACCTCTGTGAGAACGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA07    TGCAGACACCTCTGTGAGAATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA21        TGCAGACACCTCTATGAGAATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA22   TGCAGACACCTCTGTGAGTATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

 

AlleleID_100040981

Litoria_ewingii_PA22    TGCAGACACCTTTGTGAGTATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_ewingii_PA02    TGCAGACACCTTGTGAGACTGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_ewingii_PA03    TGCAGACACCTTTGTGAGAACGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA07    TGCAGACCCCTCTGTGAGAATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA21        TGCTGACACCTCTATGAGAATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA

Litoria_paraewingii_PA22   TGCAGACAGCTCTGTGAGTATGGAATGGAGAACTCTGGGAAGGAAATAGCTGATTTACCAACACCCTGA


Cheers
Breno Hamdan
Laboratório de Coleções Biológicas e Biodiversidade
Diretoria Científica
Instituto Vital Brazil

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/35c8e329-fbc0-4d4b-8536-9aca863abd89n%40googlegroups.com.

Gabriella Scatà

unread,
Oct 17, 2022, 7:35:57 PM10/17/22
to dartR
Thank you so much Luis!
I think I managed to fix the problem!
Yes finally I realized that  pop(test_gl)[1] was giving the population name of the first individual. Thank you for the other code for the name of the first population.
Best,
Gabriella

Breno Hamdan

unread,
Mar 10, 2023, 6:57:30 AM3/10/23
to da...@googlegroups.com
Hi Luis,

How are you doing?
I ran gl2bpp(glp6), and this time several SNPs sequences were retrieved with an R in the middle of the sequence, which precludes running the BPP analysis. Example below.

image.png

Could you help me understand why and how to solve it?
Cheers
Breno Hamdan


Breno Hamdan

unread,
Mar 10, 2023, 7:27:37 AM3/10/23
to da...@googlegroups.com
Luis  gl2fasta generates the same "r" in the sequences

image.png
Breno Hamdan

Jose Luis Mijangos

unread,
Mar 14, 2023, 12:25:11 AM3/14/23
to dartR
Hi Breno,

gl2bpp and gl2fasta functions replace heterozygous positions by IUPAC ambiguity codes. "R" is one of the letters that are used as an ambiguity code. You can read more about these ambiguity codes here:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2865858/

You should use the parameter method = 2, as below, if you don't want to use ambiguity codes.

library(dartR)
t1 <- platypus.gl
t1 <- gl.filter.callrate(t1,threshold = 1)
t1 <- gl.filter.monomorphs(t1)
t1 <- gl.subsample.loci(t1,n=25)
gl2bpp(x = t1,method = 2)

Cheers,
Luis 

Reply all
Reply to author
Forward
0 new messages