Linkage disequilibrium (LD) functions & error with gl.filter.ld

68 visningar
Hoppa till det första olästa meddelandet

Gabriella Scatà

oläst,
2 maj 2023 23:04:342023-05-02
till dartR
Hi everyone,
I am trying for the first time to assess LD in my dataset.

I would like to have some clarity on each of the available functions to assess LD in the dartR package, as I am a bit confused about what each one does.
In addition, when I tried gl.filter.ld it gives me an error saying that it does not find the genlight object "x" - even though it's there!

Below my questions & the code I used for gl.filter.ld + error message:


Q1. What is the difference between gl.report.ld & gl.report.ld.map?
  • gl.report.ld
    = calculation of pairwise LD across all loci between subpopulations
    Q2 --> this means that pairwise LD measures are calculated for locus a & locus b within a pooled population, so without separating the r2 value of the pair a_b in subpop1 from that in in subpop2?
    SO a single r2 value is reported for the pair a_b for only one pooled population as a whole?
    (i.e. this would be equivalent to the approach used to assess HWE when pooling all subpopulations together?)

  • gl.report.ld.map
    calculates pairwise linkage disequilibrium (LD) by population using the function ld (package snpStats)
    Q3--> this means that pairwise LD measures are calculated for locus a & locus b within pop1 AND for locus a & locus b within pop2, separately?
    AND then an r2 value is reported for the pair a_b for each population?
    (i.e. this would correspond to the approach used for HWE - calculating HWE proportions in each subpopulation, separately)

    Outcome values:
    Q4 --> what is the locus_a/b.stat_keep? How is it calculated?
    Q5 --> how is the distance calculated if I am using a SNPs dataset with no chromosome information (non-model species, with no genomic reference map)?

  • gl.filter.ld
    This function uses the statistic set in the parameter stat_keep from function gl.report.ld.map to choose the SNP to keep when two SNPs are in LD. When a SNP is selected to be filtered out in each pairwise comparison, the function stores its name in a list. In subsequent pairwise comparisons, if the SNP is already in the list, the other SNP will be kept.

    Q6 --> does this function only accept the output from gl.report.ld.map or also from gl.report.ld?
    If the 2 LD report functions give different outcome, could you EITHER filter loci which have an r2 >0.2 for example in the pooled population as a whole (output of gl.report.ld?) OR in a minimum of half of the subpopulations (output of gl.report.ld.map?) ?

    I just want to understand better the different uses of these functions.

  • gl.ld.haplotype & gl.ld.distance  
    Q7 --> these functions are useful only if you have your SNPs mapped into a reference genome, correct?
    (as they report in different ways LD measures vs physical genetic distance on a chromosome)

Q8. Error when using gl.filter.ld:
Code:
<
packageVersion("dartR") # ‘2.7.2’ (but I tried with dev@dartR 2.9.4 & i get the same error)
nPop(My_dataset) # 8

My_dataset_REPORT_LD_byPOP = gl.report.ld.map(
    My_dataset,
    ld_max_pairwise = NULL,
    maf = 0.05,
    ld_stat = "R.squared",
    ind.limit = 10,
    stat_keep = "AvgPIC",
    ld_threshold_pops = 0.2,
    plot.out = TRUE,
    plot_theme = NULL,
    save2tmp = FALSE,
    verbose = 5)

str(My_dataset_REPORT_LD_byPOP)
'data.frame':      47442810 obs. of  11 variables:
$ pop              : chr  "GOC_1" "GOC_1" "GOC_1" "GOC_1" ...
$ chr              : chr  "1" "1" "1" "1" ...
$ pos_loc_a        : num  1 1 1 1 1 1 1 1 1 1 ...
$ pos_loc_b        : num  2 3 4 5 6 7 10 12 13 14 ...
$ ld_stat          : num  0.138866 0.013779 0.000215 0.041834 0.023374 ...
$ distance         : num  1 2 3 4 5 6 9 11 12 13 ...
$ locus_a.snp.name : chr  "100080330-68-C/T" "100080330-68-C/T" "100080330-68-C/T" "100080330-68-C/T" ...
$ locus_a.stat_keep: num  0.388 0.388 0.388 0.388 0.388 ...
$ locus_b.snp.name : chr  "100085431-61-C/G" "100092936-51-T/C" "100094167-33-T/C" "100094793-41-T/C" ...
$ locus_b.stat_keep: num  0.315 0.468 0.407 0.438 0.388 ...
$ locus_a_b        : chr  "100080330-68-C/T_100085431-61-C/G" "100080330-68-C/T_100092936-51-T/C" "100080330-68-C/T_100094167-33-T/C" "100080330-68-C/T_100094793-41-T/C" ...


My_dataset_REPORT_LD_byPOP_LD_halfPOPs_0.2r2 =
gl.filter.ld(
     My_dataset,
     My_dataset_REPORT_LD_byPOP,
     threshold = 0.2,
     pop.limit = ceiling(nPop(x)/2),
     verbose = 5)

          Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a            method for function 'nPop': object 'x' not found

nPop(My_dataset) # 8

# so the "x" genlight dataset is there...
>

Could you help me understand what is the problem?
I hope I did not overlook anything this time!

Thanks a lot as usual!
Best,
Gabriella

Svara alla
Svara författaren
Vidarebefordra
0 nya meddelanden