R/qtl2 Analysis with CC Mice and "Strain Mean" Phenotypes

Christopher Panaretos

Nov 17, 2023, 1:54:25 AM
to R/qtl2 discussion

Dear R/qtl2 Google Group,


I am using R/qtl2 to perform a “risib8”-type QTL analysis with Collaborative Cross (CC) mice, and I am brand new to this type of analysis. Below I describe my understanding of how to perform such an analysis and pose a number of questions to the group. Please correct me anywhere I have made a mistake.


Background: I am learning R/qtl2 with a test data set that includes 14 CC mouse strains and a quantitative phenotype averaged over all mice of each strain. My R/qtl2 analysis therefore uses the so-called “strain means” paradigm.


After reading through the R/qtl2 user guide, I believe there are a total of 7 required .csv files for this analysis, which I will list here:


=================


File 1 - Genotypes.csv


For a strain means analysis, the rows of this file will be the 14 CC Strain IDs, not individual animal IDs. The columns are the genome Marker IDs. The body will contain [A,B,-] allele information.


Q1a -> What kind of markers are the CC mice markers? MUGA, MegaMUGA, GigaMUGA, or some other?


Q1b -> The website https://github.com/rqtl/qtl2data/tree/main/CC lists the CC mouse Marker information for each chromosome, and appears to use a [A,B,-] allele nomenclature. What is this nomenclature?


Q1c -> The GitHub website named in Q1b has Marker information per chromosome, but not per CC strain. I assume I will need to download these chromosome files, extract the marker information for my 14 CC mouse strains, and manually reorganize it per CC strain?


=================


File 2 - FounderGenotypes.csv


All CC mouse strains derive from the same eight inbred founder strains. So the rows of this file will be the founder strain IDs, and the columns are Marker IDs. The body contains [A,B,-] allele information.


Q2 -> Is this file required?


=================


File 3 - GeneticMap.csv


The first column lists all Marker IDs present in the CC mice. The second column lists the chromosome number of each Marker ID. The third column lists the centiMorgan position of each Marker ID.


=================


File 4 - Phenotype.csv


The rows of this file are the 14 CC Strain IDs. The columns are the quantitative phenotypes. The body contains the quantitative phenotype values.
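
A minimal sketch of how such a strain-means table could be built from per-animal measurements, using base R only. The data frame `animals`, its column names, the phenotype values, and the output file name are all made up for illustration; the strain IDs follow the format used in the qtl2data CC files.

```r
# Hypothetical per-animal data: one row per mouse. Values are invented.
animals <- data.frame(
  strain = c("CC001/Unc", "CC001/Unc", "CC002/Unc", "CC002/Unc", "CC003/Unc"),
  pheno  = c(10.0, 12.0, 11.0, 13.0, 8.5)
)

# Average the phenotype within each strain; tapply() returns a named array.
means <- tapply(animals$pheno, animals$strain, mean)

# One row per strain, with the strain ID in the first column.
strain_means <- data.frame(id = names(means), pheno = as.vector(means))

# Write to a temporary directory (the file name is an assumption).
write.csv(strain_means, file.path(tempdir(), "Phenotype.csv"),
          row.names = FALSE, quote = FALSE)
```

The same `means` object is already a named vector keyed by strain ID, so it can also be used directly in R without going through a file.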


=================


File 5 - Covariates.csv


For a QTL analysis using individual animals, this file would contain covariate information for each animal, like Sex or Age. However, since I am using the “strain means” paradigm, I don’t think there are any covariates per strain.


Q5 -> Is this file required?


=================


File 6 - PhenotypeCovariates.csv


I don’t know what a phenotype covariate is. I am using a single, quantitative phenotype response, and I don’t believe it is possible for these values to have covariates. Also, what is the difference between a Covariate (File 5) and a Phenotype Covariate (File 6)?


Q6 -> Is this file required?


=================


File 7 - CrossInfo.csv


Since I am using the “strain means” paradigm, the rows of this file are the Strain IDs. I am not sure what the columns should be. After reading the R/qtl2 guide, I believe there should be eight columns, one column per mating event. These eight mating events took place in a particular order, listed from left to right in this .csv file, and each mating event lists one of the eight founder mice (?).


Q7 -> For each CC strain, where can I find its eight-mating-event history?


=================


Again, please tell me if I am misunderstanding any of the QTL concepts, as they are new to me. I look forward to reading your responses, and thanks for taking the time to read through this long post!


Cheers, Chris



Dan Gatti

Nov 17, 2023, 6:59:14 AM
to rqtl2...@googlegroups.com

Responses interspersed below:

 

File 1 - Genotypes.csv

 

For a strain means analysis, the rows of this file will be the 14 CC Strain IDs, not individual animal IDs. The columns are the genome Marker IDs. The body will contain [A,B,-] allele information.

 

Q1a -> What kind of markers are the CC mice markers? MUGA, MegaMUGA, GigaMUGA, or some other?

 

It looks like they’re on the GigaMUGA. The CC data that you link below has 110,054 markers, which is a subset of the 143,259 markers on the GigaMUGA.

 

Q1b -> The website https://github.com/rqtl/qtl2data/tree/main/CC lists the CC mouse Marker information for each chromosome, and appears to use a [A,B,-] allele nomenclature. What is this nomenclature?

 

The A/B/- coding replaces the A/C/G/T/N nucleotide calls with two allele codes per marker (A and B), with - for a missing call. This is all that is needed for haplotype reconstruction.
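
To make the recoding concrete, here is a small base R sketch of the idea. It is not the procedure used to build the qtl2data files: the `calls` matrix, the `alleles` table, and the `encode_ab()` helper are all hypothetical. Which nucleotide is labelled A and which B is assumed to be arbitrary, as long as the same labelling is applied to the founders and the CC strains.

```r
# Hypothetical nucleotide calls: rows are strains, columns are markers.
# matrix() fills by column, so m1 = A, G, N and m2 = C, C, T.
calls <- matrix(c("A", "G", "N",
                  "C", "C", "T"),
                nrow = 3,
                dimnames = list(c("s1", "s2", "s3"), c("m1", "m2")))

# Hypothetical table naming the two alleles at each marker.
alleles <- data.frame(marker = c("m1", "m2"),
                      A = c("A", "C"),
                      B = c("G", "T"))

# Recode each call as "A" or "B"; anything else (e.g. "N") becomes "-".
encode_ab <- function(calls, alleles) {
  out <- matrix("-", nrow(calls), ncol(calls), dimnames = dimnames(calls))
  for (j in seq_len(ncol(calls))) {
    a <- alleles[match(colnames(calls)[j], alleles$marker), ]
    out[calls[, j] == a$A, j] <- "A"
    out[calls[, j] == a$B, j] <- "B"
  }
  out
}

coded <- encode_ab(calls, alleles)
```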

 

Q1c -> The GitHub website named in Q1b has Marker information per chromosome, but not per CC strain. I assume I will need to download these chromosome files, extract the marker information for my 14 CC mouse strains, and manually reorganize it per CC strain?

 

The marker positions are the same for every strain; only the genotypes differ. So you will have one physical and one genetic map file per chromosome, shared by all strains.

 

=================

 

File 2 - FounderGenotypes.csv

 

All CC mouse strains derive from the same eight inbred founder strains. So the rows of this file will be the founder strain IDs, and the columns are Marker IDs. The body contains [A,B,-] allele information.

 

Q2 -> Is this file required?

 

Yes.

=================

 

File 3 - GeneticMap.csv

 

The first column lists all Marker IDs present in the CC mice. The second column lists the chromosome number of each Marker ID. The third column lists the centiMorgan position of each Marker ID.

 

=================

 

File 4 - Phenotype.csv

 

The rows of this file are the 14 CC Strain IDs. The columns are the quantitative phenotypes. The body contains the quantitative phenotype values.

 

=================

 

File 5 - Covariates.csv

 

For a QTL analysis using individual animals, this file would contain covariate information for each animal, like Sex or Age. However, since I am using the “strain means” paradigm, I don’t think there are any covariates per strain.

 

Q5 -> Is this file required?

 

I’m not sure. But I would include it, since a covariate file is provided with the CC data on the qtl2data website.

 

=================

 

File 6 - PhenotypeCovariates.csv

 

I don’t know what a phenotype covariate is. I am using a single, quantitative phenotype response, and I don’t believe it is possible for these values to have covariates. Also, what is the difference between a Covariate (File 5) and a Phenotype Covariate (File 6)?

 

Q6 -> Is this file required?

 

Did you do your experiment in batches? That would be a covariate. And if you have different sexes, then that would also be a covariate.

 

=================

 

File 7 - CrossInfo.csv

 

Since I am using the “strain means” paradigm, the rows of this file are the Strain IDs. I am not sure what the columns should be. After reading the R/qtl2 guide, I believe there should be eight columns, one column per mating event. These eight mating events took place in a particular order, listed from left to right in this .csv file, and each mating event lists one of the eight founder mice (?).

 

Q7 -> For each CC strain, where can I find its eight-mating-event history?

 

 

=================

 

Again, please tell me if I am misunderstanding any of the QTL concepts, as they are new to me. I look forward to reading your responses, and thanks for taking the time to read through this long post!

 

Cheers, Chris

 

Karl Broman

Nov 17, 2023, 5:14:43 PM
to R/qtl2 discussion

Hi, Chris,

You may not need to focus on the details in these files. You can load these CC data files directly into R using the following:

# Load R/qtl2 and read the CC data directly from the qtl2data repository
library(qtl2)
file <- paste0("https://raw.githubusercontent.com/rqtl/", "qtl2data/main/CC/cc.zip")
cc <- read_cross2(file)

If you create a vector containing your quantitative phenotype data, with the vector names set to the CC line names (in the same format as used in this cc object), then you can proceed directly to QTL mapping.

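# Strain-mean phenotypes, named by CC line ID
my_pheno <- c("CC001/Unc"=10.95, "CC002/Unc"=11.86, "CC003/Unc"=8.58, "CC004/TauUnc"=11.35, "CC005/TauUnc"=5.41,
              "CC006/TauUnc"=12.95, "CC007/Unc"=11.72, "CC008/GeniUnc"=10.72, "CC009/Unc"=8.61, "CC010/GeniUnc"=8.91)
pr <- calc_genoprob(cc)     # genotype probabilities for each CC line
out <- scan1(pr, my_pheno)  # genome scan; lines are matched by name
plot(out, cc$pmap)          # LOD curves against the physical map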
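
Since the scan relies on the phenotype names matching the line IDs in the cc object, it may help to check the names before mapping. The sketch below is one way that could look; `unmatched_ids()` and `scan_strain_means()` are hypothetical helpers, not part of R/qtl2, and the LOD threshold of 3 is only a placeholder rather than a significance threshold. The qtl2 functions it calls (`ind_ids`, `calc_genoprob`, `scan1`, `find_peaks`) are used with their basic arguments.

```r
# Names in the phenotype vector that have no match among the cross IDs.
unmatched_ids <- function(pheno, ids) {
  setdiff(names(pheno), ids)
}

# Hypothetical wrapper: keep only the phenotyped lines, compute genotype
# probabilities, run the genome scan, and list peaks above a threshold.
scan_strain_means <- function(cross, pheno, lod_threshold = 3) {
  bad <- unmatched_ids(pheno, qtl2::ind_ids(cross))
  if (length(bad) > 0) {
    stop("No genotypes for: ", paste(bad, collapse = ", "))
  }
  cross <- subset(cross, ind = names(pheno))  # drop lines without phenotypes
  pr <- qtl2::calc_genoprob(cross)
  out <- qtl2::scan1(pr, pheno)
  qtl2::find_peaks(out, cross$pmap, threshold = lod_threshold)
}

# Usage, with `cc` and `my_pheno` as defined in the message above:
# peaks <- scan_strain_means(cc, my_pheno)
```

Subsetting the cross first means genotype probabilities are computed only for the lines that were phenotyped.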

karl