Conversion from HAPPY (R1) to R/qtl2

Dor Fried

Aug 15, 2024, 10:47:37 AM
to R/qtl2 discussion

Dear R/qtl2 Google Group, 

 

In the past, my lab used R1 to perform QTL analysis with Collaborative Cross (CC) mice, and we now want to upgrade to R/qtl2. We are using mouse lines that are not available in the R2 analysis but are present in the HAPPY condensed data, so we want to convert the R1 mouse data to the R2 SNP convention.

The input files are from HAPPY. For each chromosome there are 3 files: 

  1. Alleles file – contains the founders' alleles for each marker and the cM position of each marker

  2. Map file – contains the marker order, bp position and chromosome

  3. Data file – contains the line name and a pair of TCGA letters for each marker

 

 

From these data we need to create the following input files for R/qtl2:

  1. cc_genoXX.csv

  2. MMnGM_foundergenoXX.csv

  3. MMnGM_gmapXX.csv

  4. MMnGM_pmapXX.csv

  5. cc_covar.csv

  6. cc_crossinfo.csv

 

For 1) and 2) we need to convert from the TCGA convention to the AB convention for each marker. Where can we find the data needed for this transformation?

 

As far as we can tell, we have enough data to create 3) and 4).

 

For 5) we need to find the mitochondria, Ychr and n_founders values for each line. Is this data required? If it is, what does the n_founders column stand for?

 

For 6) we need the eight-mating-event history. Is this data required?

 

Thanks in advance 

Dor Fried 

Tel Aviv University  

Karl Broman

Aug 15, 2024, 10:58:10 AM
to R/qtl2 discussion
- The coding of TCGA to AB just needs to be consistent between the founders and the CC lines, and so if you have founders and CC genotypes, you pick one allele to be A and one allele to be B.
I generally use functions in the qtl2convert package:  find_unique_geno() to identify the two alleles in the founder lines, and then encode_geno() to convert the genotypes in both the founders and the CC lines.

- mitochondria, Ychr, and n_founders are not required. Several of the available CC lines were not generated with a standard founder and actually derive from a cross with just 6 or 7 founders; that is what n_founders indicates.

- We do need the eight-mating-event history, specifically for reconstruction of the X chromosome, which in the cross AxBxCxDxExFxGxH should have just alleles A,B,C,E,F, with C at higher frequency.
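
The X-chromosome pattern in the last point can be checked with a small calculation. The Python sketch below is only an illustration, not R/qtl2 code. It assumes the funnel is read as ((A×B)×(C×D))×((E×F)×(G×H)) with the female listed first in each mating, followed by brother-sister mating; the names `average`, `mate`, and `x_contributions` are hypothetical helpers.

```python
from fractions import Fraction


def average(p, q):
    """Equal-weight mixture of two founder-proportion dicts."""
    keys = set(p) | set(q)
    return {k: (p.get(k, Fraction(0)) + q.get(k, Fraction(0))) / 2 for k in keys}


def mate(dam_egg, sire_x):
    """Return (expected X transmitted by a daughter, expected X of a son).

    dam_egg: expected founder proportions on an X transmitted by the dam.
    sire_x:  expected founder proportions on the sire's single X.
    """
    daughter_egg = average(dam_egg, sire_x)  # a daughter carries one X from each parent
    son_x = dam_egg                          # a son gets his X from the dam only
    return daughter_egg, son_x


def x_contributions(funnel="ABCDEFGH"):
    """Expected founder shares on the X of the final inbred line (assumed funnel)."""
    a, b, c, d, e, f, g, h = ({s: Fraction(1)} for s in funnel)
    ab_f, _ = mate(a, b)            # A female x B male  -> AB female
    _, cd_m = mate(c, d)            # C female x D male  -> CD male
    ef_f, _ = mate(e, f)            # E female x F male  -> EF female
    _, gh_m = mate(g, h)            # G female x H male  -> GH male
    abcd_f, _ = mate(ab_f, cd_m)    # AB female x CD male -> ABCD female
    _, efgh_m = mate(ef_f, gh_m)    # EF female x GH male -> EFGH male
    dau, son = mate(abcd_f, efgh_m)  # first sibling pair that starts the inbreeding
    # Under sib mating (f' = (f + m) / 2, m' = f) the quantity 2f/3 + m/3 is
    # unchanged, so it equals the expected share once the line is fixed.
    keys = sorted(set(dau) | set(son))
    return {k: Fraction(2, 3) * dau.get(k, 0) + Fraction(1, 3) * son.get(k, 0)
            for k in keys}


for founder, share in x_contributions().items():
    print(founder, share)
```

Under these assumptions the sketch gives 1/3 for C and 1/6 each for A, B, E, and F, while D, G, and H contribute nothing to the X. A different funnel order changes which founders appear, which is why the per-line mating history matters.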

karl

Dor Fried

Aug 28, 2024, 6:00:15 AM
to R/qtl2 discussion

Dear Dr. Broman, 

I appreciate your prompt and detailed answer. 

I performed the modification of the code as you suggested. 

I still have two questions. 

  • In our dataset, approximately 9% of the mice have an additional allele. In some cases, we encounter more than two alleles for the same SNP. For example, for rs13478072, we see: 5' - CCGTGGAT[AT/GC]CGGTACT - 3'. In such instances, which allele should we assign to the line? Is there a standard practice for handling these cases in R/qtl2? An example from our dataset is attached: the first table shows the data for each line, with the markers that have different alleles marked, and the second table shows the genotypes of the founders. (Because the format does not allow tables, they are attached here as images and an Excel file.)

 

 

 

Appreciate your help, 

Dor 

founders_snp_exp.png
snp_exp.xlsx
snp_exp.png

Karl Broman

Aug 28, 2024, 7:04:02 AM
to R/qtl2 discussion
Regarding the alternate allele, I would put a missing value. 
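
As a rough illustration of that rule combined with the consistent A/B coding discussed earlier in the thread, here is a minimal Python sketch. It is not the qtl2convert implementation: `allele_pair` and `encode` are hypothetical helpers, the marker names `m1` and `m2` and all genotype calls are made-up example data, and the output codes ("-", "A", "H", "B") are an assumption chosen to resemble the usual R/qtl2 coding.

```python
MISSING = "-"


def allele_pair(founder_calls, na=("N", "H", "-", "")):
    """Return the two alleles seen among the founders, or None if not exactly two."""
    alleles = sorted({g for g in founder_calls if g not in na})
    return tuple(alleles) if len(alleles) == 2 else None


def encode(call, pair):
    """Encode a one- or two-letter call (e.g. "T" or "TC") as A, H, B or missing."""
    if pair is None or call is None:
        return MISSING
    a, _b = pair
    letters = call * 2 if len(call) == 1 else call
    if len(letters) != 2 or any(x not in pair for x in letters):
        return MISSING  # no-call, or an allele not seen in the founders
    if letters[0] != letters[1]:
        return "H"
    return "A" if letters[0] == a else "B"


# Hypothetical example data: 8 founders and 4 CC lines at two markers.
founders = {
    "m1": ["T", "T", "C", "C", "T", "C", "T", "T"],
    "m2": ["G", "G", "G", "A", "A", "G", "G", "G"],
}
cc = {
    "m1": ["TT", "CC", "TC", "GG"],  # "GG" is a third allele at m1
    "m2": ["GG", "AA", "NN", "AG"],  # "NN" is a no-call
}

# The same per-marker allele pair is used for founders and CC lines.
codes = {m: allele_pair(calls) for m, calls in founders.items()}
founders_ab = {m: [encode(g, codes[m]) for g in calls] for m, calls in founders.items()}
cc_ab = {m: [encode(g, codes[m]) for g in calls] for m, calls in cc.items()}

print(codes)
print(cc_ab)
```

The key design point is that the allele pair is derived once per marker from the founders and then reused for the CC lines, so any CC call containing an allele outside that pair falls through to the missing code.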

Regarding the mating history, I’m not sure of a source for this information. I would ask the lab that provided the lines, or email the group at the University of North Carolina.

karl