Scan one problem with non-numeric phenotype error


Lisa Ranford-Cartwright

Apr 4, 2013, 2:20:29 PM
to rqtl...@googlegroups.com

Dear Karl and group

I am trying to run scanone on a dataset with 3 phenotypes, and I am getting an error:

> malpheno <- scanone(aartest, pheno.col=2)
Error in checkcovar(cross, pheno.col, addcovar, intcovar, perm.strata,  :
  Following phenotypes are not numeric: Column 1

I get the same error when I specify any of the “real” phenotype columns with pheno.col (1, 2 or 3; the 4th column is the ID and this is not numeric).

I have been trying to work out why, with no success. I have run scanone before on different (but similar) phenotype data, with the same genotype data, without any problems, although in this case the phenotypes came in an Excel file from a colleague. I checked the Excel file and the cells contain just the numbers, nothing else (e.g., no calculations). I’m struggling to find out what the problem might be, so I’m hoping someone can help.

I remembered a similar problem coming up in the discussion group before and looked it up. Karl had suggested a check to see whether the phenotypes were numeric; I ran it and got this output:

> which(!sapply(aartest$pheno, is.numeric))
Neal_Com Neal_Mal Neal_Gam       ID
       1        2        3        4

I think this means that the phenotype columns are not being read as numbers (is that correct?). I am not sure why this would be.
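The check above reports which columns are non-numeric, but not which cells are to blame. A minimal base-R sketch for finding them follows; find_non_numeric() is a hypothetical helper, not an R/qtl function, and the toy column is made up to stand in for one phenotype column (with real data it could be called as find_non_numeric(aartest$pheno$Neal_Mal)).

```r
# Hypothetical helper (not part of R/qtl): return the distinct entries of a
# column that are present (not NA) but cannot be parsed as numbers.
find_non_numeric <- function(x) {
  x <- as.character(x)                     # factor -> its text labels
  num <- suppressWarnings(as.numeric(x))   # unparseable entries become NA
  bad <- is.na(num) & !is.na(x)            # NA after conversion, not before
  unique(x[bad])
}

# Toy column: the "blank" cell for one individual actually holds a space.
col <- factor(c("2.69", "10", "1.46", " ", "1.4"))
find_non_numeric(col)
#> [1] " "
```

Entries such as " " or "" usually point to cells that look empty in the spreadsheet but are read as text.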

Here is the first line of the phenotype data:

> pull.pheno(aartest)
  Neal_Com Neal_Mal Neal_Gam   ID
1     2.69     0.27     2.82 3BB5

Some of these phenotypes are decimals rather than whole numbers, although I can’t see that being the problem, as the examples in the help pages and in Karl’s wonderful book include decimals.

Many thanks for any ideas or suggestions.

Lisa

Karl Broman

Apr 5, 2013, 9:27:46 AM
to rqtl-disc@googlegroups.com discussion, Lisa Ranford-Cartwright
In your data file, you should include some missing value code (like "NA" or "-") in cells with missing data, and then indicate the missing value code through the na.strings argument to read.cross().

My guess is that you have spaces or some other invisible text in such cells, and so the whole column is being interpreted as non-numeric.
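The suggestion above can be illustrated with base R alone. This sketch uses read.csv(), which has an na.strings argument like read.cross(), on a hypothetical four-row table in which a missing cell is coded "-"; it shows the general mechanism only, not what read.cross() does internally. With real data the code would go in read.cross()'s own na.strings argument, e.g. read.cross("csv", file = "mydata.csv", na.strings = c("-", "NA")), where the file name is a placeholder.

```r
# Hypothetical phenotype table; individual X11 has no Neal_Mal value,
# recorded with the missing-value code "-".
txt <- "Neal_Mal,ID
0.27,3D7
1,HB3
-,X11
0.14,X12"

# Without declaring "-" as missing, the whole column is read as text.
p1 <- read.csv(text = txt, na.strings = "NA", stringsAsFactors = FALSE)
is.numeric(p1$Neal_Mal)
#> [1] FALSE

# Declaring the code via na.strings gives a numeric column with NA for X11.
p2 <- read.csv(text = txt, na.strings = c("-", "NA"), stringsAsFactors = FALSE)
is.numeric(p2$Neal_Mal)
#> [1] TRUE
p2$Neal_Mal   # values 0.27, 1, NA, 0.14
```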

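Removing the individuals with missing data won't fix the problem, because subset() only drops individuals and leaves the phenotype column stored as non-numeric data; it only helps if you do that to the file and then re-load the data.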
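If, once the data have been re-read (or converted) so that missing cells are NA, you still want to drop individuals lacking the traits of interest, one option is to build a logical vector with base R's complete.cases(). This is a sketch on a hypothetical stand-in for aartest$pheno built from the example rows quoted in this thread; the subset() line assumes that R/qtl's subset() accepts a logical ind vector and is left as a comment because it needs the real cross object. Note that scanone() normally omits individuals whose value for the analysed phenotype is missing, so this step may not be needed at all.

```r
# Hypothetical stand-in for aartest$pheno, already numeric (missing = NA).
pheno <- data.frame(
  Neal_Comb = c(2.69, 10, 1.46, NA, 1.4),
  Neal_Mal  = c(0.27, 1, 0.08, NA, 0.14),
  Neal_Gam  = c(2.82, 1, 1.46, NA, 0.96),
  ID        = c("3D7", "HB3", "X10", "X11", "X12"),
  stringsAsFactors = FALSE
)

# TRUE for individuals with no missing value in the three traits of interest.
keep <- complete.cases(pheno[, c("Neal_Comb", "Neal_Mal", "Neal_Gam")])
pheno$ID[!keep]
#> [1] "X11"

# With the real cross object (assumption: subset() takes a logical ind):
# keep <- complete.cases(aartest$pheno[, c("Neal_Comb", "Neal_Mal", "Neal_Gam")])
# aartest <- subset(aartest, ind = keep)
```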

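A quick fix would be to just force them to be numeric (skipping the ID column, which would otherwise be turned into NAs):

# convert every phenotype except ID: factor/text -> character -> number
for(i in which(colnames(aartest$pheno) != "ID"))
  aartest$pheno[,i] <- as.numeric( as.character( aartest$pheno[,i] ) )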
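The quick fix goes through as.character() before as.numeric(). That step matters if a column is stored as a factor: as.numeric() applied directly to a factor returns its internal level codes, not the measured values. A small base-R illustration on a hypothetical column:

```r
# Hypothetical phenotype column read in as a factor because of one empty cell.
f <- factor(c("2.69", "10", "1.46", "", "1.4"))

# Wrong: as.numeric() on a factor returns integer level codes (1 to 5 here,
# in an order that depends on how the levels sort), not the measurements.
as.numeric(f)

# Right: convert the labels to text first, then to numbers; "" becomes NA.
as.numeric(as.character(f))   # values 2.69, 10, 1.46, NA, 1.4
```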

karl






On Apr 5, 2013, at 4:21 AM, Lisa Ranford-Cartwright <Lisa.Ranfor...@glasgow.ac.uk> wrote:

> Dear Karl
>
> Many thanks as ever for your prompt offers of assistance.
>
> I had another play with the data last night and I think the problem was
> missing phenotypes in a few of the progeny individuals. My original file
> had about 10 phenotypes in it, so I could use the same genetic map for
> all, but there was not a complete set of phenotype data - so some
> individuals would have no data for some phenotypes.
>
> I thought that missing phenotype data could be accommodated/ignored, but
> either there are too many, or the "blanks" have something in them that
> is non-numeric. pull.pheno does not show anything for these individuals,
> as shown in the clipped example below (here individual X11 has no
> phenotype data for the 3 phenotypes I was interested in):
>
> pull.pheno(aartest)
>   Neal_Comb Neal_Mal Neal_Gam  ID
> 1      2.69     0.27     2.82 3D7
> 2        10        1        1 HB3
> 3      1.46     0.08     1.46 X10
> 4                             X11
> 5       1.4     0.14     0.96 X12
>
> I had tried to remove these individuals previously by making a subset of
> the main data that excluded these individuals (for example, to remove
> the 18th individual I used
> aartest<-subset(aartest, ind=-18)), but maybe this isn't the right way to
> do it. This reduced the number of individuals, but scanone still gave
> the same error. Note that I re-ran calc.genoprob after removing the
> individuals before running scanone, but perhaps I needed to do something
> else.
>
> To try to figure out the problem, I made a smaller test file from the
> original Excel file with fewer markers (for speed; my final genetic map
> is very dense, with ~4000 SNP markers over 14 chromosomes, as the
> organism has a high recombination rate and I have a lot of independent
> recombinant progeny from a single cross). I re-ran the analysis and got
> the same error (this file is attached and called aartest). I then
> removed the progeny with missing phenotypes entirely from the Excel
> file, re-read it into R/qtl as a new file called test, re-ran the
> analysis, and scanone ran with no problem. Hurrah!
>
> So as a temporary and ugly fix I can go back to my original data and
> remove those individuals with missing phenotypes, and then I think it
> should work. For future work, it would be useful to know whether I need
> to remove all individuals with missing phenotype data, and the best way
> to do this (i.e., other than by editing the Excel file)!
>
> In case you want to check the files, I have attached the two smaller
> files as you suggested - one that works and one that does not. They are
> small so things run very fast.
>
> aartest is the bc file with the missing phenotypes removed which runs
> fine (19 individuals).
> test is the equivalent bc file with the missing phenotype data (23
> individuals).
>
> I can send the "big" file if you want to see the whole data set, but the
> smaller files give the same errors but more quickly.
>
> Many thanks again for any suggestions.
>
> Kind regards
>
> Lisa
>
>
> -----Original Message-----
> From: Karl Broman [mailto:kbr...@gmail.com]
> Sent: 04 April 2013 21:00
> To: Lisa Ranford-Cartwright
> Subject: Re: [Rqtl-disc] Scan one problem with non-numeric phenotype error
>
> Could you send me the data?
>
> Use
>
> save(aartest, file="aartest.RData")
>
> and send me the .RData file and I'll figure out what's going on.
>
> karl
>
>
> On Apr 4, 2013, at 1:20 PM, Lisa Ranford-Cartwright
>> Some of these phenotypes are decimals rather than whole numbers -
>> although I can't see that is the problem as examples given in the help
>> pages/ Karl's wonderful book include decimals.
>>
>> Many thanks for any ideas or suggestions.
>>
>> Lisa
>>
>
> <aartest.RData><test.RData>
