Linkage disequilibrium (LD) functions & error with gl.filter.ld

68 visningar

Hoppa till det första olästa meddelandet

Gabriella Scatà

oläst,

2 maj 2023 23:04:342023-05-02

till dartR

Hi everyone,

I am trying for the first time to assess LD in my dataset.

I would like to have some clarity on each of the available functions to assess LD in the dartR package, as I am a bit confused about what each one does.

In addition, when I tried gl.filter.ld it gives me an error saying that it does not find the genlight object "x" - even though it's there!

Below my questions & the code I used for gl.filter.ld + error message:

Q1. What is the difference between gl.report.ld & gl.report.ld.map?

gl.report.ld
= calculation of pairwise LD across all loci between subpopulations
Q2 --> this means that pairwise LD measures are calculated for locus a & locus b within a pooled population, so without separating the r2 value of the pair a_b in subpop1 from that in in subpop2?
SO a single r2 value is reported for the pair a_b for only one pooled population as a whole?
(i.e. this would be equivalent to the approach used to assess HWE when pooling all subpopulations together?)
gl.report.ld.map
calculates pairwise linkage disequilibrium (LD) by population using the function ld (package snpStats)
Q3--> this means that pairwise LD measures are calculated for locus a & locus b within pop1 AND for locus a & locus b within pop2, separately?
AND then an r2 value is reported for the pair a_b for each population?
(i.e. this would correspond to the approach used for HWE - calculating HWE proportions in each subpopulation, separately)

Outcome values:
Q4 --> what is the locus_a/b.stat_keep? How is it calculated?
Q5 --> how is the distance calculated if I am using a SNPs dataset with no chromosome information (non-model species, with no genomic reference map)?
gl.filter.ld
This function uses the statistic set in the parameter stat_keep from function gl.report.ld.map to choose the SNP to keep when two SNPs are in LD. When a SNP is selected to be filtered out in each pairwise comparison, the function stores its name in a list. In subsequent pairwise comparisons, if the SNP is already in the list, the other SNP will be kept.

Q6 --> does this function only accept the output from gl.report.ld.map or also from gl.report.ld?
If the 2 LD report functions give different outcome, could you EITHER filter loci which have an r2 >0.2 for example in the pooled population as a whole (output of gl.report.ld?) OR in a minimum of half of the subpopulations (output of gl.report.ld.map?) ?

I just want to understand better the different uses of these functions.

gl.ld.haplotype & gl.ld.distance
Q7 --> these functions are useful only if you have your SNPs mapped into a reference genome, correct?
(as they report in different ways LD measures vs physical genetic distance on a chromosome)

Q8. Error when using gl.filter.ld:

Code:

packageVersion("dartR") # ‘2.7.2’ (but I tried with dev@dartR 2.9.4 & i get the same error)
nPop(My_dataset) # 8

My_dataset_REPORT_LD_byPOP = gl.report.ld.map(

My_dataset,
ld_max_pairwise = NULL,
maf = 0.05,
ld_stat = "R.squared",
ind.limit = 10,
stat_keep = "AvgPIC",
ld_threshold_pops = 0.2,
plot.out = TRUE,
plot_theme = NULL,
save2tmp = FALSE,
verbose = 5)

str(My_dataset_REPORT_LD_byPOP)
'data.frame':      47442810 obs. of 11 variables:
$ pop              : chr "GOC_1" "GOC_1" "GOC_1" "GOC_1" ...
$ chr              : chr "1" "1" "1" "1" ...
$ pos_loc_a        : num 1 1 1 1 1 1 1 1 1 1 ...
$ pos_loc_b        : num 2 3 4 5 6 7 10 12 13 14 ...
$ ld_stat          : num 0.138866 0.013779 0.000215 0.041834 0.023374 ...
$ distance         : num 1 2 3 4 5 6 9 11 12 13 ...
$ locus_a.snp.name : chr "100080330-68-C/T" "100080330-68-C/T" "100080330-68-C/T" "100080330-68-C/T" ...
$ locus_a.stat_keep: num 0.388 0.388 0.388 0.388 0.388 ...
$ locus_b.snp.name : chr "100085431-61-C/G" "100092936-51-T/C" "100094167-33-T/C" "100094793-41-T/C" ...
$ locus_b.stat_keep: num 0.315 0.468 0.407 0.438 0.388 ...
$ locus_a_b        : chr "100080330-68-C/T_100085431-61-C/G" "100080330-68-C/T_100092936-51-T/C" "100080330-68-C/T_100094167-33-T/C" "100080330-68-C/T_100094793-41-T/C" ...

My_dataset_REPORT_LD_byPOP_LD_halfPOPs_0.2r2 =
gl.filter.ld(
My_dataset,
My_dataset_REPORT_LD_byPOP,
threshold = 0.2,
pop.limit = ceiling(nPop(x)/2),
verbose = 5)

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'nPop': object 'x' not found

nPop(My_dataset) # 8

# so the "x" genlight dataset is there...

Could you help me understand what is the problem?

I hope I did not overlook anything this time!

Thanks a lot as usual!

Best,

Gabriella

Svara alla

Svara författaren

Vidarebefordra

0 nya meddelanden