Hi everyone,
I am trying for the first time to assess LD in my dataset.
I would like to have some clarity on each of the available functions to assess LD in the dartR package, as I am a bit confused about what each one does.
In addition, when I tried gl.filter.ld it gives me an error saying that it does not find the genlight object "x" - even though it's there!
Below my questions & the code I used for gl.filter.ld + error message:
Q1. What is the difference between gl.report.ld & gl.report.ld.map?
- gl.report.ld
= calculation of pairwise LD across all loci between subpopulations
Q2 --> this means that pairwise LD measures are calculated for locus a & locus b within a pooled population, so without separating the r2 value of the pair a_b in subpop1 from that in in subpop2?
SO a single r2 value is reported for the pair a_b for only one pooled population as a whole?
(i.e. this would be equivalent to the approach used to assess HWE when pooling all subpopulations together?)
- gl.report.ld.map
calculates pairwise linkage disequilibrium (LD) by population using the function ld (package snpStats)
Q3--> this means that pairwise LD measures are calculated for locus a & locus b within pop1 AND for locus a & locus b within pop2, separately?
AND then an r2 value is reported for the pair a_b for each population?
(i.e. this would correspond to the approach used for HWE - calculating HWE proportions in each subpopulation, separately)
Outcome values:
Q4 --> what is the locus_a/b.stat_keep? How is it calculated?
Q5 --> how is the distance calculated if I am using a SNPs dataset with no chromosome information (non-model species, with no genomic reference map)?
- gl.filter.ld
This function uses the statistic set in the parameter stat_keep from function gl.report.ld.map to choose the SNP to keep when two SNPs are in LD. When a SNP is selected to be filtered out in each pairwise comparison, the function stores its name in a list. In subsequent pairwise comparisons, if the SNP is already in the list, the other SNP will be kept.
Q6 --> does this function only accept the output from gl.report.ld.map or also from gl.report.ld?
If the 2 LD report functions give different outcome, could you EITHER filter loci which have an r2 >0.2 for example in the pooled population as a whole (output of gl.report.ld?) OR in a minimum of half of the subpopulations (output of gl.report.ld.map?) ?
I just want to understand better the different uses of these functions.
- gl.ld.haplotype & gl.ld.distance
Q7 --> these functions are useful only if you have your SNPs mapped into a reference genome, correct?
(as they report in different ways LD measures vs physical genetic distance on a chromosome)
Q8. Error when using gl.filter.ld:
Code:
<
packageVersion("dartR") # ‘2.7.2’ (but I tried with dev@dartR 2.9.4 & i get the same error)
nPop(My_dataset) # 8
My_dataset_REPORT_LD_byPOP = gl.report.ld.map(
My_dataset,
ld_max_pairwise = NULL,
maf = 0.05,
ld_stat = "R.squared",
ind.limit = 10,
stat_keep = "AvgPIC",
ld_threshold_pops = 0.2,
plot.out = TRUE,
plot_theme = NULL,
save2tmp = FALSE,
verbose = 5)
str(My_dataset_REPORT_LD_byPOP)
'data.frame': 47442810 obs. of 11 variables:
$ pop : chr "GOC_1" "GOC_1"
"GOC_1" "GOC_1" ...
$ chr : chr "1" "1" "1"
"1" ...
$ pos_loc_a : num
1 1 1 1 1 1 1 1 1 1 ...
$ pos_loc_b : num
2 3 4 5 6 7 10 12 13 14 ...
$ ld_stat : num
0.138866 0.013779 0.000215 0.041834 0.023374 ...
$ distance : num
1 2 3 4 5 6 9 11 12 13 ...
$ locus_a.snp.name : chr "100080330-68-C/T" "100080330-68-C/T"
"100080330-68-C/T" "100080330-68-C/T" ...
$ locus_a.stat_keep: num 0.388 0.388 0.388 0.388 0.388 ...
$ locus_b.snp.name : chr "100085431-61-C/G"
"100092936-51-T/C" "100094167-33-T/C"
"100094793-41-T/C" ...
$ locus_b.stat_keep: num 0.315 0.468 0.407 0.438 0.388 ...
$ locus_a_b : chr
"100080330-68-C/T_100085431-61-C/G"
"100080330-68-C/T_100092936-51-T/C"
"100080330-68-C/T_100094167-33-T/C"
"100080330-68-C/T_100094793-41-T/C" ...
My_dataset_REPORT_LD_byPOP_LD_halfPOPs_0.2r2
=
gl.filter.ld(
My_dataset,
My_dataset_REPORT_LD_byPOP,
threshold =
0.2,
pop.limit =
ceiling(nPop(x)/2),
verbose = 5)
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function
'nPop': object 'x' not found
nPop(My_dataset) # 8
# so the "x" genlight dataset is there...
>
Could you help me understand what is the problem?
I hope I did not overlook anything this time!
Thanks a lot as usual!
Best,
Gabriella