Re: coding effect

68 views

Skip to first unread message

Christopher Viot

unread,

Sep 15, 2009, 4:01:23 AM9/15/09

to rqtl...@googlegroups.com

Dear Karl,

Your Rqtl software seemed to me fantastic from the beginning, as i'd been able to scan a colleague's micro-array data for 25,000 genes on a N=~100 RI population against a 800 mk map, a few months ago. Other times i could work on the RF data from this map, an enormous matrix, and so on.

Many thanks to you, including R programmers, for all these softwares. The whole of it is very interesting and usefull.

Now the question of homogeneity with other results obtained from WinQTLCart was raised ; so i compared and communicate some results :

1- comparison Rqtl / WinQtlC : single-marker analysis = SMA, interval mapping = IM ; graphs showing all chr linearly linked, with cM cumulated

I used "em" instead of "hk" and error.prob=0, on a RI population

- strict global identity at level SMA

- strict identity with IM too, the apparent shift is not actual chromosome by chr ; it results from the fact that WQTLC truncates the chromosome before the last mk, what makes that some cM can be lacking, and this increases cumulating along the genome ; contrarywise, total genome length is exact with Rqtl

- for threshold calculations Rqtl was better, while WQTLC, somewhat slower, fails frequently when distributions are irregular (strange discontinuity of the series of permutation LOD values when results can be obtained, but i'm incompetent to appreciate)

1- comp Rqtl sma // WinQtlCart sma

2- comp Rqtl im // WinQtlCart im

2- some of a problem now, with cross type and coding ; when our colleague was looking at validating his results on A/B/H/C/D coding, he compared with ours on A/B (reducing D to A, C to B, H to NA, more or less) on identical phenotypic series and this failed severely ;

searching for an explanation :

- after "read.cross", cross type was changed from "f2" to "riself" ; nevertheless, genotypes considered as valid according to cross type are apparently, if i interpret correctly :

AA AB BB D C

f2 1 2 3 4 5

riself 1 2

Seemingly, this makes that when you change from "f2" to "riself", BB data coded as "3" are considered as NA because they are neither "1" nor "2", while if there were AB data, coded as "2", they now are considered as BB

Solution :

- in csvr format, if you add in read.cross : " ,genotypes=c("A","B","H","C","D") ", that is BB data is coded as AB, then changing from "f2" to "riself", you have the correct genotypes and the qtl scan on the riself population gives expected results (H, C and D data had previously been transformed as explained above)

- else, one can change to BC then to riself

- in gary format one must change 2s to 1s in the geno file

Hoping my comments are adequate and of some utility,

Best regards,

Dr. Christopher Viot
UMR DAP
Département BIOS "Systèmes biologiques"
Centre de Coopération Internationale en Recherche
Agronomique pour le Développement (CIRAD)
Avenue Agropolis - TA B-10 / 02 (Bât. 03, Bur. 103)
F-34398 Montpellier Cedex 5, France
Tel.: +33 467 614 471, fax: +33 467 615 605, courriel : christop...@cirad.fr
http://umr-dap.cirad.fr - http://tropgenedb.cirad.fr

Karl Broman

unread,

Sep 15, 2009, 10:47:17 AM9/15/09

to R/qtl discussion, Christopher Viot

Dear Chris,

Thanks for your kind words, and for the results comparing R/qtl to
WinQTLCart.

Regarding the issues with cross type and coding in RIL by selfing:

1. The data should be read in as if they were a backcross, but with
the 2nd homozygote genotype being treated as a heterozygote. Any
actual heterozygotes need to be treated as missing in R/qtl. If the
data file has genotypes coded AA/AB/BB, use read.cross, for a "csv"
file, like this:

mycross <- read.cross("csv", "", "mydata.csv", genotypes=c("AA","BB"),
na.strings=c("-","NA","AB"))

2. I recommend against the "gary" format. I don't think even Gary
Churchill (from which it comes) uses it any more.

3. If you read in the data and then change the cross type to
"riself" (or "risib"), run summary.cross, which we rely on for checks
of data integrity. It should issue a warning message about invalid
genotype codes.

Your comments are helpful. I think what I should do is add functions
convert2riself() and convert2risib(), to make the conversion from what
gets read in to riself/risib format. If the data had been imported as
an f2, these functions could convert the genotype codes appropriate,
and just issue a warning if there were heterozygote genotypes that got
removed.

Kind regards,
karl
-----------
Karl Broman
kbr...@gmail.com
http://www.biostat.wisc.edu/~kbroman

> <clip_image001.gif>

> 2- comp Rqtl im // WinQtlCart im

> <clip_image002.gif>

Reply all

Reply to author

Forward

0 new messages