Dear Karl,
I hope this finds you well. Thanks so much for the previous responses.
I've been trying to do QTL mapping in a DO population (n = ~550), which was genotyped using GigaMUGA (markers = ~ 110,000). I've come across some questions regarding how the genotype/allele probability matrix works in scan1() and would love to have some help with my understanding and further application.
After a few attempts on QTL mapping, I noticed this function called clean_genoprob(), which cleans up genotype/allele probabilities by setting extremely small values to 0. I understand that this function was initially developed to help with the problem of unstable estimates of genotype effects in scan1coef() and fit1() when there's a largely absent genotype. I imagined this would not affect the results much and may give me a less noisy scan result. So, I then started to apply this function with the later analysis before scan1().
Surprisingly, as you may see in the attached screenshots, with the same cross2 file (same covariates and phenotype), cleaning the allele probability greatly altered the haplotype-based QTL results. My understanding is that scan1() fits the linear mixed model using allele probability to obtain RSS values, which are then used to calculate LOD scores. Thus, these small probabilities will also affect LOD scores. But it's surprising how much it has changed.
So, I looked into some of the source codes for scan1() and noticed that scan1() subsets the probabilities, dropping columns with all 0s. This may be the case in our data; many of the markers have one of their allele dosages cleaned to all 0s.
May I ask how function may affect the result with the allele additive model? Do you happen to have any thoughts on using this clean_genoprob() function before running scan1()?
Here is the code I used:
cross2 <- readRDS("cross2_clean.rds")
geno.prob <- calc_genoprob(cross2, error_prob=0.002, map_function="c-f")
allele.prob <- genoprob_to_alleleprob(geno.prob)
allele.prob.clean <- clean_genoprob(allele.prob)
kinship <- calc_kinship(allele.prob, "loco")
kinship.clean <- calc_kinship(allele.prob.clean, "loco")
out <- scan1(allele.prob, phenotype, kinship = kinship, addcovar = covar, Xcovar=sex)
out.clean <- scan1(allele.prob.clean, phenotype, kinship = kinship.clean, addcovar = covar, Xcovar=sex)
plot_scan1(out, cross2$pmap, altcol="green4", bgcolor = "white", altbgcolor = "gray90", ylim = c(0,14))
plot_scan1(out.clean, cross2$pmap, altcol="green4", bgcolor = "white", altbgcolor = "gray90", ylim = c(0,14))
Thank you so much for your help!
Cheers,
Billy
That does seem high. We’ve lost some alleles at some locations in the DO. In a set of ~350 DO mice from G46 – G48, I get ~840 markers at which one founder has a frequency of < 1e-4 in these mice.
If I plot it, I get this. 1 means at least one allele was < 1e-4 at that marker:
What generation are your mice from? It does seem odd that you’d have half of the genome showing zero frequency for a founder.
Dan
--
You received this message because you are subscribed to the Google Groups "R/qtl2 discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
rqtl2-disc+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/rqtl2-disc/c83f5789-548c-4c80-a411-7141a0e366acn%40googlegroups.com.
I don’t think so. The situation should be better in earlier generations. I tried the counting myself both before and after cleaning and I’m not seeing the same thing that you’re seeing. Maybe I’m making a mistake, but I’m not seeing it. The plots below show the markers at which the overall founder frequency goes below 0.01.
setwd('~/projects/superovulation/data/')
library(qtl2)
# Read in genoprobs.
probs = readRDS('sodo_alleleprobs_grcm39_137K.rds')
# Count the number of low allele frequency columns.
count_zero = function(pr) {
# Sum up each founder at each marker.
zero = lapply(probs, colSums)
# Tally whether the sum is less than the threshold.
zero = lapply(zero, '<', 0.01)
# Sum up the number of low founders at each marker.
zero = sapply(zero, colSums)
unlist(zero)
} # count_zero()
zero = count_zero(probs)
# Clean the genoprobs object by setting low probs == 0.
probs2 = clean_genoprob(object = probs,
value_threshold = 1e-06,
column_threshold = 0.01)
zero2 = count_zero(probs2)
layout(matrix(1:2, ncol = 1))
plot(zero, pch = 16, main = 'Unclean')
plot(zero2, pch = 16, main = 'Cleaned')
Dan
From: rqtl2...@googlegroups.com <rqtl2...@googlegroups.com>
On Behalf Of Billy Jiang
Sent: Monday, April 1, 2024 6:12 PM
To: R/qtl2 discussion <rqtl2...@googlegroups.com>
Subject: [SOCIAL NETWORK] Re: [SOCIAL NETWORK] [Rqtl2-disc] Re: cleaning genptype/allele probability prior to scan1
Hi Dan,
Thanks a lot for your response and for sharing your experience.
The mice are from generations 27 - 37:
> table(cross2[["cross_info"]])
27 28 29 30 31 32 33 34 35 36 37
34 20 34 32 34 26 33 112 33 108 79
Would this be the reason for the loss?
Thanks much,
Billy
On Monday 1 April 2024 at 11:32:08 pm UTC+11 Dan Gatti wrote:
That does seem high. We’ve lost some alleles at some locations in the DO. In a set of ~350 DO mice from G46 – G48, I get ~840 markers at which one founder has a frequency of < 1e-4 in these mice.
If I plot it, I get this. 1 means at least one allele was < 1e-4 at that marker:
To view this discussion on the web visit https://groups.google.com/d/msgid/rqtl2-disc/10957e1d-49ff-437e-aa3a-d314169bc50en%40googlegroups.com.