Error message due to chr names: scan() expected 'a real', got

gerald...@gmail.com

Aug 27, 2013, 10:13:51 AM
to methylkit_...@googlegroups.com
Hi all,
We have started using methylKit to analyze a WGBS HiSeq run and are running into some problems. We get the following error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got 'LGE64.4486'
Calls: read.bismark ... read -> .local -> .readTableFast -> read.table -> scan


The command we used:

objs=read.bismark("bismarkSortedOutput.sam", "Accl", assembly="gg4", save.folder="/work/methylKit",save.context="CpG",read.context="CpG",nolap=TRUE,mincov=10,minqual=20,phred64=FALSE)

The file /work/methylKit/Accl_CpG.txt was created and seems complete (all chromosomes are present in it). methylKit seems to fail while loading the methylRawFile object from that file.
Here is an extract of the file where the failure occurs:
21.6800885      21      6800885 F       123     54.47   45.53
21.6800667      21      6800667 F       11      54.55   45.45
LGE64.4486      LGE64   4486    F       15      33.33   66.67

I modified Accl_CpG.txt to prefix the first and second columns with "chr" (on every line except the first, header line), and with that change I can load the methylRawFile object with:
objs=read("Accl_CpG_modified.txt", "AcclModified", assembly="gg4")
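The same prefixing step can be scripted instead of done by hand. Below is a minimal Python sketch, assuming the file is tab-separated with a single header line, as in the extract above. The helper names `add_chr_prefix` and `prefix_file` are hypothetical (not part of methylKit); values that already start with "chr" are left alone.

```python
def add_chr_prefix(lines):
    """Prefix the first two columns (chrBase, chr) of each data line with "chr".

    Hypothetical helper; assumes tab-separated lines and a header on line 1.
    """
    out = []
    for i, line in enumerate(lines):
        if i == 0 or not line.strip():
            out.append(line)  # keep the header and blank lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        for col in (0, 1):  # chrBase and chr columns
            if not fields[col].startswith("chr"):
                fields[col] = "chr" + fields[col]
        out.append("\t".join(fields) + "\n")
    return out


def prefix_file(src, dst):
    """Read src, prefix the chromosome columns, and write the result to dst."""
    with open(src) as fin:
        lines = fin.readlines()
    with open(dst, "w") as fout:
        fout.writelines(add_chr_prefix(lines))
```

For example, prefix_file("Accl_CpG.txt", "Accl_CpG_modified.txt") would produce a file of the same shape as the one loaded with read() above.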

I downloaded the methylKit source, but I'm not an R expert at all. It was quite difficult for me to find where and how scan() is called (I created an Eclipse R project, imported the methylKit source, and searched all project files for "scan" with Ctrl+H, with no result).
It seems possible to force the column types used by scan(), but I don't know how to do this.
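Forcing column types means declaring each column's type up front instead of letting the reader guess it; in R, read.table() does this through its colClasses argument, but here the read.table() call sits inside methylKit (the traceback goes through .readTableFast). The idea is sketched below in Python, assuming the seven-column layout of the extract (the column names in `SCHEMA` are assumed for illustration). chrBase and chr are always kept as strings, so "21.6800885" and "LGE64.4486" are read the same way.

```python
from typing import Dict, List, Tuple

# Assumed column layout with an explicit type per column; chrBase and chr
# are always parsed as strings, never guessed from their content.
SCHEMA: List[Tuple[str, type]] = [
    ("chrBase", str),
    ("chr", str),
    ("base", int),
    ("strand", str),
    ("coverage", int),
    ("freqC", float),
    ("freqT", float),
]


def parse_line(line: str) -> Dict[str, object]:
    """Parse one tab-separated data line using the fixed schema."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


def read_with_schema(path: str) -> List[Dict[str, object]]:
    """Read the whole file, skipping the header line and blank lines."""
    with open(path) as fh:
        next(fh, None)  # skip the header line
        return [parse_line(line) for line in fh if line.strip()]
```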

Could you please help me fix this?

Another question: is there a script somewhere that generates the txt file (the methylKit input) from Bismark's SAM output?

Thank you for your help

Gérald


Altuna Akalin

Aug 27, 2013, 11:13:53 AM
I think the problem is that the type of the first column is guessed from the first couple hundred rows. Those rows look numeric (e.g. "21.6800885"), so the column is expected to be numeric, and reading fails when R later encounters a non-numeric value such as "LGE64.4486". This will hopefully be fixed in the next version, where the redundant first column is removed from the data structures.

As you discovered, adding the "chr" prefix resolves this problem.
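The failure can be reproduced in miniature. The Python sketch below is not methylKit's code; it only imitates the general strategy of guessing a column's type from the first n rows (n = 2 here instead of a few hundred) and then applying that type to every row. The helper names `guess_type` and `read_column` are hypothetical.

```python
from typing import List


def guess_type(values: List[str]) -> type:
    """Return float if every sampled value parses as a number, else str."""
    try:
        for v in values:
            float(v)
        return float
    except ValueError:
        return str


def read_column(values: List[str], n_sample: int) -> List[object]:
    """Guess the column type from the first n_sample values, then convert all.

    Raises ValueError when a later value does not fit the guessed type,
    analogous to scan() complaining that it expected 'a real'.
    """
    col_type = guess_type(values[:n_sample])
    out = []
    for i, v in enumerate(values):
        try:
            out.append(col_type(v))
        except ValueError:
            raise ValueError(f"row {i}: expected {col_type.__name__}, got {v!r}") from None
    return out


chr_base = ["21.6800885", "21.6800667", "LGE64.4486"]

try:
    read_column(chr_base, n_sample=2)
except ValueError as err:
    print(err)  # row 2: expected float, got 'LGE64.4486'

# With the "chr" prefix the sampled values are no longer numeric, so the
# column is read as text and every row converts.
print(read_column(["chr" + v for v in chr_base], n_sample=2))
```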

The script that parses Bismark SAM files is a Perl script located in the "exec" folder of the installed package, so you can run it on the SAM files separately and then add the "chr" prefix with awk or another one-liner. The script has some light documentation. You should be able to run it as follows:


perl ~/R/methylKit/exec/methCall.pl --read1 sorted.sam.txt --type paired_sam --nolap --CpG test.CpG.txt

The ~/R/ folder is the installation directory for my R packages; replace it with wherever your packages are installed.


Best,
Altuna