scan1; max_batch

Flerpan

Nov 30, 2020, 8:48:20 AM

to R/qtl2 discussion

Hi Karl

I'm running a QTL scan (the scan1 function) on a large dataset (1 million phenotypes) and saw that I can set a fixed number of phenotypes to be run at the same time (max_batch). What value would you recommend here, or can you give me a short description of what the function is doing so I can decide for myself? I'm running on 32 cores with 12 GB of RAM each.

Also, while I was trying to build the cross object, the read_cross2 function kept crashing for me. I managed to load the big dataset anyway by reading in the phenotype file separately; see below:

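library(qtl2)
# Read the cross from the control file; the phenotypes are read separately below
cross <- read_cross2("cross_setup.yaml")
# Read the large phenotype file on its own and attach it to the cross
cross$pheno <- read_pheno("phenotypes.csv")
# Save the assembled cross so it can be reloaded later
save(cross, file = "mycross.Rdata")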

Perhaps this could be of use to others.

Cheers,

Andrey

Karl Broman

Nov 30, 2020, 12:30:26 PM

to R/qtl2 discussion

As it says in the help file for scan1(), "max_batch indicates the maximum number of phenotypes to run together; default is unlimited."

If you are using a kinship matrix, and so a linear mixed model, the phenotypes will be considered one at a time. But if you are not using a kinship matrix, the phenotypes are considered as a batch when fitting the model. Experience suggests that working with no more than, say, 500 at a time can actually be faster than working with them all at once, but this may depend on the computer hardware and the particular data, so the parameter is there if you want to use it.
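
As a rough illustration of passing max_batch, here is a minimal sketch. It assumes the small "iron" example cross bundled with qtl2 as a stand-in for the real data, and the error_prob and cores values are placeholders. The value 500 is only the figure mentioned above, not a tuned recommendation.

```r
library(qtl2)

# Example cross shipped with qtl2; it stands in for the real dataset (assumption).
iron <- read_cross2(system.file("extdata", "iron.zip", package = "qtl2"))

# Genotype probabilities at the markers; error_prob is a placeholder value.
pr <- calc_genoprob(iron, error_prob = 0.002)

# No kinship matrix is supplied, so phenotypes are fit in batches
# of at most max_batch columns at a time.
out <- scan1(pr, iron$pheno, max_batch = 500, cores = 1)

# The result has one row per position and one column per phenotype.
dim(out)
```

Timing the same call with a few different max_batch values on a subset of the phenotypes is probably the most reliable way to choose a value for a given machine.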

With 1 million phenotypes, I think the bigger problem you'll have is just storing and working with the results. If the genotype data are n x m and the phenotypes are n x k, the output of scan1 will be m x k, which may be much, much bigger than the data themselves. You may need to break the phenotypes into batches manually.
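
To give a sense of scale: with a hypothetical m = 10,000 positions and k = 1,000,000 phenotypes, an m x k matrix of 8-byte doubles takes about 80 GB. One way to break the phenotypes into batches manually is sketched below. The helper names (batch_columns, scan_in_batches), the output file naming, and the batch size are all assumptions made for illustration; they are not part of qtl2.

```r
# Split phenotype column indices into consecutive batches of at most batch_size.
batch_columns <- function(n_pheno, batch_size) {
  idx <- seq_len(n_pheno)
  split(idx, ceiling(idx / batch_size))
}

# Hypothetical helper: scan one batch of phenotypes at a time and write each
# result to its own .rds file, so the full m x k matrix is never held in memory.
# Extra arguments (e.g. addcovar, cores) are passed through to qtl2::scan1().
scan_in_batches <- function(genoprobs, pheno, batch_size, out_dir, ...) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  batches <- batch_columns(ncol(pheno), batch_size)
  files <- character(length(batches))
  for (i in seq_along(batches)) {
    # drop = FALSE keeps a matrix even when a batch has a single phenotype
    out <- qtl2::scan1(genoprobs, pheno[, batches[[i]], drop = FALSE], ...)
    files[i] <- file.path(out_dir, sprintf("scan1_batch_%05d.rds", i))
    saveRDS(out, files[i])
  }
  files  # paths of the saved per-batch results
}
```

A call might look like scan_in_batches(pr, cross$pheno, batch_size = 10000, out_dir = "scan1_out", cores = 32), where pr is the output of calc_genoprob(). Each saved file can then be read back with readRDS() and summarized (for example, keeping only the peaks) before moving on to the next one.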