Can I use already existed probability data into qtl2

126 views
Skip to first unread message

Bora Kim

unread,
Feb 4, 2022, 8:34:41 AM2/4/22
to R/qtl2 discussion

Please find the attachment from here: https://github.com/rqtl/qtl2/issues/206


Hello, thank you very much for developing this nice package. I am very new to QTL mapping and R/qtl2, and I would greatly appreciate to get any help.


To begin with, I would like to do QTL analysis with the data from tomato RIL population (two parents, F8 selfing).
I've got mapfile (Genetic_map_MM_PI_Sterken_etal.csv) from the paper "Plasticity of maternal environment dependent expression-QTLs, 2020 (https://www.biorxiv.org/content/10.1101/2021.03.29.437558v2.full)" that contains physical map (Mbp) and the estimation of being one of parents, Moneymaker or Pimpinellifolium, base on the SNPs on each locus.

  • To make it workable in R/qtl2, all the numbers < 0.5 were converted to A and >=0.5 to B (genotypes.csv).
  • Physical map was separated from the original map file (Physical_map.csv).
  • Before I conduct full analysis, I tried QTL mapping with small set of phenotype (phenotypes_median_test.csv) that is tomato shoot weight to see if everything is going well .

Then I wrote control file as below.

control_test <- file.path("~/RIL_data_analysis", "control_test.yaml") write_control_file(control_test, crosstype="riself", geno_file="genotypes.csv", gmap_file="physical_map.csv", pmap_file="physical_map.csv", pheno_file="phenotypes_median_test.csv", geno_codes=c(A=1L, B=2L), alleles=c("A", "B"), na.strings=c("-", "NA")) test <- read_cross2("control_test.yaml")
  1. My first question here is, is it correct to choose crosstype="rilself"? I am not very confident about my decision.

Then, I added pseudomarkers to my physical map and calculated the genotype probability.

map <- insert_pseudomarkers(map=test$pmap, step=10000) pr <- calc_genoprob(cross=test, map=map, error_prob=0.001)

As a result, I found that SNP marker genotype probability is either 0.999 or 0.001 for AA and BB, respectively. However, all pseudomarkers probability became 0.5.

> pr$ch01[1:10,,"snp5"] AA BB T_MM 0.999 0.001 T_251 0.001 0.999 T_205 0.999 0.001 T_206 0.999 0.001 T_207 0.999 0.001 T_208 0.001 0.999 T_209 0.999 0.001 T_210 0.999 0.001 T_211 0.999 0.001 T_212 0.999 0.001 > pr$ch01[1:10,,"cch01.loc50000"] AA BB T_MM 0.5 0.5 T_251 0.5 0.5 T_205 0.5 0.5 T_206 0.5 0.5 T_207 0.5 0.5 T_208 0.5 0.5 T_209 0.5 0.5 T_210 0.5 0.5 T_211 0.5 0.5 T_212 0.5 0.5

I understood this as the SNP marker probability is roughly 1 or 0. So, pseduomarkers in between markers inevitably 0.5.
So at the end, I could run all following steps such as scan_1, but of course, without LOD scores at all pseudomarkers. So, it was basically the same as an analysis without adding pseudomarkers.

  1. Therefore, here's my second question: The original map file (Genetic_map_MM_PI_Sterken_etal.csv) already contains probability scores, and is there way to assign it as qtl2 object and use it directly to scan_1?

Thank you in advance for your time.

Karl Broman

unread,
Feb 4, 2022, 9:20:10 AM2/4/22
to R/qtl2 discussion
Regarding the first question, "rilself" is for two-way RIL by selfing, which seems to be appropriate for these data.

The problem you're running into, with the pseudomarkers all having 0.5, is that you're using the physical map in basepairs in place of a genetic map in cM. It may be fine to use the physical map as an approximation of the genetic map, but I would convert it to Mbp.

test$gmap <- lapply(test$gmap, "/", 1e6)
map <- insert_pseudomarkers(map=test$gmap, step=10000) pr <- calc_genoprob(cross=test, map=map, error_prob=0.001)

Regarding your second question, you could in principle use your own genotype probabilities, but we don't have any facilities to assist with that.

The data structure is a list (with each component being a chromosome) of 3-dimensional arrays,
individuals x genotypes x locations

karl

Bora Kim

unread,
Feb 4, 2022, 12:07:27 PM2/4/22
to R/qtl2 discussion
Thank you very much for the fast answer. I thought my physical map is in Mbp, but indeed it is in basepairs.
I tried your suggestion that changes the unit bp to Mbp and inserted pseudomarkers. 

>map2 <- insert_pseudomarkers(map=test2$gmap, step=0.1)
> map2$ch01[1:20]
        snp1 cch01.loc0.1 cch01.loc0.2         snp2 cch01.loc0.3         snp3         snp4         snp5         snp6         snp7
   0.0100000    0.1100000    0.2100000    0.2341200    0.3100000    0.4099604    0.4140064    0.4180976    0.4355072    0.4398925
        snp8         snp9        snp10        snp11        snp12        snp13        snp14        snp15        snp16        snp17
   0.4962492    0.5009257    0.5055631    0.5101766    0.5147904    0.5193125    0.5760517    0.5809494    0.5858481    0.5907328 

It did not result in many psedomarkers in between markers, and so I tried step=0.01, however, I've got error message as: 

Error in insert_pseudomarkers_onechr(map[[i]], step = step, off_end = off_end,  :
  tol should be << step

Q. Can I know what the "tol" means in this message? Is there way to add more pseudomarkers in my map?

Anyway, changing units (bp to Mbp) resulted in different probability for both SNPs and pseudomarkers

> pr2$ch01[1:10,,"snp5"]  #one of marker
               AA          BB
T_MM  1.00000e+00 2.85497e-11
T_251 2.85497e-11 1.00000e+00
T_205 1.00000e+00 2.85497e-11
T_206 1.00000e+00 2.85497e-11
T_207 1.00000e+00 2.85497e-11
T_208 2.85497e-11 1.00000e+00
T_209 1.00000e+00 2.85497e-11
T_210 1.00000e+00 2.85497e-11
T_211 1.00000e+00 2.85497e-11
T_212 1.00000e+00 2.85497e-11

> pr2$ch01[1:10,,"cch01.loc0.3"] #one of pseudomarker
                AA           BB
T_MM  9.999970e-01 3.039735e-06
T_251 3.039735e-06 9.999970e-01
T_205 9.999970e-01 3.039735e-06
T_206 9.999970e-01 3.039735e-06
T_207 9.999970e-01 3.039735e-06
T_208 3.039735e-06 9.999970e-01
T_209 9.999970e-01 3.039735e-06
T_210 9.999970e-01 3.039735e-06
T_211 9.999970e-01 3.039735e-06
T_212 9.999970e-01 3.039735e-06


However, after reading your answers, I realized that physical map and genetic map have very different meaning. I feel it is wrong to calculate probability based on physical map. 
I feel bit lost, but thinking two options that 1) Calculate probability without psedomarkers and continue 'scan_1', otherwise, 2) construct 3d array data using original mapfile as you suggested and use it directly to 'scan_1'.
Second option maybe demanding yet more precise. What do you think? otherwise, can I get some advice on how to perform QTL analysis using R/qlt2 when only physical map is available?

Thank you in advance for your time.



Karl Broman

unread,
Feb 4, 2022, 1:30:29 PM2/4/22
to R/qtl2 discussion
The initial SNPs on that chromosome are all closely spaced, but you'll find that there are some big gaps where pseudomarkers got added. Compare the number of positions in the different maps:

sapply(test2$gmap, length)
sapply(map2, length)

To understand the tol argument to insert_pseudomarkers(), see the help page:

tol Tolerance for determining whether a pseudomarker would duplicate a marker position.

We're adding a grid of pseudomarkers to the map, but if a pseudomarker to be added is within tol of a marker that is already there, we just leave it out.

If you don't have a genetic map, you can either interpolate a map from some external source, or you can estimate it with the available data. See the function est_map().

karl



Bora Kim

unread,
Feb 7, 2022, 12:20:59 PM2/7/22
to R/qtl2 discussion
Dear Karl, 

Thank you for your advice and please find my attached R markdown. 
I thought it would be better to show you whole process rather than some random codes without context. 
I have tried est_map() function and it worked well : ) thank you very much, but I do still have couple of minor questions. 

Q1. I am wondering, which level of density (how many markers) do you call it high density? In my markdown, you will find that I compared the number of markers before and after inserting pseudomarkers. But if my original markers are dense enough, I won't add psedomarkers as eventually I need to test over 600 phenotypes.
Q2. If I interpret the result correctly, there is a significant LOD peak for phenotype 'fresh_shoot_g' at position 44.1000 cM (in range between 43.05117 and 46.12834) in chromosome 5. Then, I could find marker placed at 44.1000 cM is one of pseudomarker "cch05.loc44.1" Can I consider this marker as the loci that might be involved in fresh_shoot_g? or do I still need to consider other markers that are in the interval range (43.05117 to 46.12834)?
Q3. So as you know, I have a physical map. If I want to find the physical position of the pseudomarker "cch05.loc44.1" (that has significant LOD peak in above question), how can I do it? I found code something like "max(out, pmap)" in somewhere, but when I tried it, I could only get physical position of already existed SNP marker, but not pseudomarker. I think it is because pseudomarkers were inserted into genetic map and so can't link to physical position, but I thought perhaps you know some magic trick. 

Thank you for your help and have a good day :  ) 

Bora
test.pdf

Karl Broman

unread,
Feb 7, 2022, 11:51:53 PM2/7/22
to R/qtl2 discussion
1. "high density" depends on the type of cross and the sample size. Adding pseudomarkers let's you investigate positions between markers, but they slow computations.

2. You're not going to be able to resolve a QTL to a single marker or pseudomarker. It's always going to be a range.

3. You can use the function interp_map() to interpolate the pseudomarker positions to the physical map.

gmap <- insert_pseudomarkers(mycross$gmap, step=1)
pmap <- interp_map(gmap, mycross$gmap, mycross$pmap)

karl

Bora Kim

unread,
Feb 14, 2022, 8:05:52 AM2/14/22
to R/qtl2 discussion
'interp_map' works perfectly : ) Thank you Karl for all your guidance. Have a good day! 
Reply all
Reply to author
Forward
0 new messages