Error reading in bins


Sarah Carl

May 6, 2014, 7:01:47 AM
to mosaics_u...@googlegroups.com
Hello,

I've been successfully using MOSAiCS until this week, when I upgraded my R to version 3.1.0 and Bioconductor to version 2.14. Now when I call readBins() on data that I had previously processed using constructBins(), I get the following error:

Error in mapScore[, 2] * rounding : 
  non-numeric argument to binary operator
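This is the message R gives when a number is multiplied by a character vector. A minimal reproduction, assuming the score column ended up as character after import. The toy table and the value of `rounding` are made up for illustration and are not taken from the mosaics code:

```r
# Toy stand-in for an imported mappability table whose score column
# (V3) was read as character instead of numeric.
mapScore <- data.frame(V1 = c("chr2", "chr2"), V2 = c(0, 50), V3 = c("1.0", "1.0"),
                       stringsAsFactors = FALSE)
rounding <- 100  # hypothetical scaling factor, for illustration only

# Arithmetic on the character column fails with the same error message.
msg <- tryCatch(mapScore[, 3] * rounding, error = function(e) conditionMessage(e))
print(msg)

# Converting the column to numeric first avoids the error.
mapScore[, 3] <- as.numeric(mapScore[, 3])
print(mapScore[, 3] * rounding)
```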

I constructed the mappability data using the instructions here: http://www.stat.wisc.edu/~keles/Software/mosaics/
I then combined the data for each chromosome into one file (same for the GC and N data).

Has anyone else come across this error, and do you have any idea what might be causing it? Could it be an issue with chromosome names? I'm using the Drosophila pseudoobscura genome, so I have somewhat non-standard names like chr4_group1e. However, it seems strange that it was working fine before and now gives me this error.

Cheers,
Sarah


Dongjun Chung

May 6, 2014, 2:42:00 PM
to mosaics_u...@googlegroups.com
Hi Sarah,

Could you please confirm a few things, to figure out the sources of problems?

1) Which version of the mosaics package were you using before you upgraded to Bioconductor 2.14?
2) Do the chromosome names match across all the files?
3) Could you provide the command lines you are using?
4) Could you provide the first few lines of the files you are using?
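Question 2 can be checked directly in R by comparing the names in the first column of each table. A sketch with small made-up tables standing in for the four input files; the helper `missingChrNames` is hypothetical and not part of mosaics:

```r
# Hypothetical helper: for each table, list chromosome names (first column)
# that occur in some other table but not in this one.
missingChrNames <- function(tables) {
  allChr <- unique(unlist(lapply(tables, function(x) as.character(x[[1]]))))
  lapply(tables, function(x) setdiff(allChr, unique(as.character(x[[1]]))))
}

# Toy stand-ins for the ChIP bin, mappability, GC, and N tables.
chip     <- data.frame(V1 = c("chr2", "chrXL_group1e"), V2 = c(0, 0), V3 = c(3, 5))
mapScore <- data.frame(V1 = c("chr2", "chrXL_group1e"), V2 = c(0, 0), V3 = c(1, 1))
gcScore  <- data.frame(V1 = "chr2", V2 = 0, V3 = 0.4)
nNuc     <- data.frame(V1 = c("chr2", "chrXL_group1e"), V2 = c(0, 0), V3 = c(0, 0))

res <- missingChrNames(list(chip = chip, M = mapScore, GC = gcScore, N = nNuc))
print(res)
```

With the real files, each toy table would be replaced by a `read.table()` call on the corresponding file. An empty result for every table means the names agree, whatever their order.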

Thanks,
Dongjun


--
You received this message because you are subscribed to the Google Groups "MOSAiCS User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mosaics_user_gr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sarah Carl

May 6, 2014, 3:01:02 PM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Thanks for the quick reply!

In answer to your questions, I believe that I was using the most recent version of mosaics, even though I had an older version of R. Unfortunately, I can't double-check, because I had to reinstall it after my R got updated.

The chromosome names do coincide among all the files. However, they aren't in the same order in all the files - would that cause a problem?

Here are the command lines I'm using. I first built the bins from bed files (I've tried using the bin data that I made before and making new bin data; constructBins() works, but I still get an error after calling readBins()):

constructBins(infile="../Density/Stage_9_1.ext300.bed", fileFormat="bed", outfileLoc="./",
    byChr=FALSE, useChrfile=FALSE, chrfile=NULL, PET=FALSE, fragLen=300, binSize=50)

binData <- readBins(type=c("chip", "M", "GC", "N"),
    fileName=c("./Stage_9_1.ext300.bed_fragL300_bin50.txt", "./dp3_map_fragL300_bin50.txt",
               "./dp3_GC_fragL300_bin50.txt", "./dp3_N_fragL300_bin50.txt"))


And the first few lines of each file:

The bin file:
chrXL_group1e   0       3
chrXL_group1e   50      11
chrXL_group1e   100     13
chrXL_group1e   150     14
chrXL_group1e   200     14
chrXL_group1e   250     17

The mappability file:
chr2    0       1.0
chr2    50      1.0
chr2    100     1.0
chr2    150     1.0
chr2    200     1.0
chr2    250     1.0
chr2    300     1.0
chr2    350     1.0

The GC file:
chr2    0       0.402186259822477
chr2    50      0.403171352784456
chr2    100     0.389682596093851
chr2    150     0.376311591634198
chr2    200     0.373197297181536
chr2    250     0.368923948400469
chr2    300     0.354858096828047
chr2    350     0.337262103505843

And the N file:
chr2    0       0
chr2    50      0
chr2    100     0
chr2    150     0
chr2    200     0
chr2    250     0
chr2    300     0
chr2    350     0


Thanks for your help!
Sarah




Dongjun Chung

May 8, 2014, 4:56:58 PM
to mosaics_u...@googlegroups.com
Hi Sarah,


> The chromosome names do coincide among all the files. However, they aren't in the same order in all the files - would that cause a problem?

This should be OK; the order of chromosomes should not matter.

> Here are the command lines I'm using. I first built the bins from bed files (I've tried using the bin data that I made before and making new bin data; constructBins() works, but I still get an error after calling readBins()):

Both the command lines and the input files look OK. One more thing to check is whether these files are imported into R correctly. The mosaics package reads these files into R with the following lines. Could you run them and confirm that each object has three columns and that its contents look OK?

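# File names as passed to readBins() in your earlier message
chipFileName <- "./Stage_9_1.ext300.bed_fragL300_bin50.txt"
mapScoreFileName <- "./dp3_map_fragL300_bin50.txt"
gcScoreFileName <- "./dp3_GC_fragL300_bin50.txt"
nNucFileName <- "./dp3_N_fragL300_bin50.txt"

chip <- read.table( chipFileName, header=FALSE, sep='\t',
    colClasses=c("character","numeric","numeric"), comment.char="",
    stringsAsFactors=FALSE, check.names=FALSE )
mapScore <- read.table( mapScoreFileName, header=FALSE,
    stringsAsFactors=FALSE, check.names=FALSE, comment.char="" )
gcScore <- read.table( gcScoreFileName, header=FALSE,
    stringsAsFactors=FALSE, check.names=FALSE, comment.char="" )
nNuc <- read.table( nNucFileName, header=FALSE,
    stringsAsFactors=FALSE, check.names=FALSE, comment.char="" )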
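`ncol()` and `sapply(x, class)` report the column count and column types asked about here. A self-contained sketch that writes three mappability-style lines (two from the excerpt Sarah posted and the scientific-notation entry she reports later in the thread) to a temporary file and imports them with the same `read.table()` arguments as the mappability call:

```r
# Write a few tab-separated mappability lines to a temporary file.
mapScoreFileName <- tempfile(fileext = ".txt")
writeLines(c("chr2\t0\t1.0",
             "chr2\t50\t1.0",
             "chrXR_group9\t6570700\t2.0E-4"), mapScoreFileName)

# Import with the same arguments used for the mappability file.
mapScore <- read.table(mapScoreFileName, header = FALSE,
    stringsAsFactors = FALSE, check.names = FALSE, comment.char = "")

# Expected: three columns; V1 character, V2 and V3 numeric types
# (V2 is typically detected as integer, V3 as double).
print(ncol(mapScore))
print(sapply(mapScore, class))
```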

Thanks,
Dongjun

Sarah Carl

May 12, 2014, 4:15:15 AM
to mosaics_u...@googlegroups.com
Hi Dongjun,

I think I've found the source of the problem - when I run the commands you give me and look at the resulting objects, they each have 3 columns, but for mapScore, column 3 is stored as character data, rather than numeric:

> summary(mapScore)
      V1                  V2                V3           
 Length:2821490     Min.   :       0   Length:2821490    
 Class :character   1st Qu.: 2428912   Class :character  
 Mode  :character   Median : 5642675   Mode  :character  
                    Mean   : 7696876                     
                    3rd Qu.:10724800                     
                    Max.   :30711150  

> head(mapScore)
    V1  V2  V3
1 chr2   0 1.0
2 chr2  50 1.0
3 chr2 100 1.0
4 chr2 150 1.0
5 chr2 200 1.0
6 chr2 250 1.0

 For each of the other objects, both columns 2 and 3 are numeric. I don't know if this is due to a problem with my file or the way that R is reading it in (it seems strange that the problem has just come up, when it was working before with the same file). Do you have any suggestions as to how to fix it?

Thanks again for your help!
Sarah

Sarah Carl

May 12, 2014, 6:01:36 AM
to mosaics_u...@googlegroups.com
Dear Dongjun,

I've discovered that there are some entries in my mappability file where very low scores are recorded using scientific notation, such as:

chrXR_group9    6570700 2.0E-4

I think this is what's causing the error, as R doesn't recognize the scores as numeric. I made the mappability file following the directions on your website (using the PeakSeq code first). I don't know if there's a way to force R to read those values as numeric, or if I need to change the mappability file.

Cheers,
Sarah

Sarah Carl

May 12, 2014, 6:25:28 AM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Sorry for sending so many e-mails! I'm gradually figuring it out...

If I run this command, the import seems to work:

# Declare the column types explicitly so the score column is read as numeric
mapScore <- read.table( "./dp3_map_fragL300_bin50.txt", header=FALSE,
  stringsAsFactors=FALSE, check.names=FALSE, comment.char="",
  colClasses=c("character", "integer", "numeric") )

But then I can't figure out how to make a BinData object from the data. I tried running readBins() and specifying the data frame mapScore instead of a path to a file, but I got an error saying that the lengths of 'type' and 'fileName' do not match. Is there a way to import the bin data directly from a data frame instead of reading in files with readBins()?

Cheers,
Sarah

Dongjun Chung

May 12, 2014, 10:57:08 AM
to mosaics_u...@googlegroups.com
Hi Sarah,

It is great to hear that you are figuring out the source of the problem.

It seems clear that the third column of the mappability score file is causing the problem.
However, I would still like to look into the source of the problem in more detail,
because R usually recognizes scientific notation in the files it imports.
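As an illustration of that point, base R's `as.numeric()` parses a string such as `"2.0E-4"` without trouble, so a useful diagnostic is to list the entries of the imported third column that do not convert. A sketch using a short made-up vector in place of `mapScore$V3`; the `"n/a"` entry is purely hypothetical, included only to show what a flagged value looks like:

```r
# Made-up stand-in for the character-typed third column of mapScore.
scores <- c("1.0", "1.0", "2.0E-4", "n/a")

# Scientific notation is parsed as a number.
print(as.numeric("2.0E-4"))

# Entries that fail conversion become NA; report their positions and values.
converted <- suppressWarnings(as.numeric(scores))
bad <- which(is.na(converted) & !is.na(scores))
print(bad)
print(scores[bad])
```

On a real file, `scores` would be `mapScore$V3` from the import above; an empty `bad` would mean every entry is numeric and the column was typed as character for some other reason.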

One way to check this is to change "2.0E-4" to "0.0002" in your mappability score file
and see whether mosaics can read it with readBins().
Could you please try this and let me know whether this solves the problem?
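The substitution suggested here could be done by hand in a text editor or scripted. A minimal sketch, assuming tab-separated lines with the score in the third field; the function name `toFixedNotation` and the output file name in the usage note are hypothetical:

```r
# Hypothetical helper: rewrite third-field scores written in scientific
# notation (e.g. "2.0E-4") in fixed notation (e.g. "0.0002").
toFixedNotation <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  vapply(fields, function(f) {
    if (length(f) >= 3 && grepl("[eE]", f[3])) {
      f[3] <- format(as.numeric(f[3]), scientific = FALSE)
    }
    paste(f, collapse = "\t")
  }, character(1))
}

fixed <- toFixedNotation(c("chr2\t0\t1.0", "chrXR_group9\t6570700\t2.0E-4"))
print(fixed)
```

Applied to the whole file, this would be `writeLines(toFixedNotation(readLines("./dp3_map_fragL300_bin50.txt")), "./dp3_map_fixed.txt")`, after which the new file can be passed to readBins() in place of the original.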

Thanks,
Dongjun

Sarah Carl

May 15, 2014, 4:27:57 AM
to mosaics_u...@googlegroups.com
Hi Dongjun,

I was able to get readBins() to work! What I did was to read in my original mappability file using:


# Declare the column types explicitly so the score column is read as numeric
mapScore <- read.table( "./dp3_map_fragL300_bin50.txt", header=FALSE,
  stringsAsFactors=FALSE, check.names=FALSE, comment.char="",
  colClasses=c("character", "integer", "numeric") )

And then I wrote the resulting data frame back into a bed file using:

write.table(mapScore, file="test_mapScore.bed", sep="\t", row.names=FALSE, col.names=FALSE, quote=FALSE)
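This workaround can be wrapped in a small function and checked on a temporary file. A sketch; `rewriteScoreFile` and the temporary file names are hypothetical. One possible reason the original file failed, not confirmed in this thread: R 3.1.0 changed `type.convert()`, which `read.table()` uses to guess column types, so that numeric-looking values that cannot be converted to double without losing accuracy were returned as character (or factor); R 3.1.1 restored the earlier behaviour by default and added a `numerals` argument. If some scores in the original file carried more digits than a double can hold, that change alone would make the whole column character, which would fit the problem appearing right after the upgrade to R 3.1.0.

```r
# Sketch of the workaround: read the file with explicit column classes,
# then write it back out as plain tab-separated text.
rewriteScoreFile <- function(infile, outfile) {
  x <- read.table(infile, header = FALSE, stringsAsFactors = FALSE,
                  check.names = FALSE, comment.char = "",
                  colClasses = c("character", "integer", "numeric"))
  write.table(x, file = outfile, sep = "\t", row.names = FALSE,
              col.names = FALSE, quote = FALSE)
  invisible(outfile)
}

# Round trip on a small temporary file.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("chr2\t0\t1.0", "chrXR_group9\t6570700\t2.0E-4"), infile)
rewriteScoreFile(infile, outfile)

# Re-import with default type detection, as in the mappability
# read.table() call quoted earlier in the thread.
check <- read.table(outfile, header = FALSE, stringsAsFactors = FALSE)
print(sapply(check, class))
```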

Finally I used readBins() with this new mappability file, and the import worked fine. I still don't know why R didn't recognize the scientific notation originally - maybe it's a problem with my specific R installation.

Thanks for your help!
Sarah



Dongjun Chung

May 15, 2014, 12:51:52 PM
to mosaics_u...@googlegroups.com
Hi Sarah,

It is great to hear that your problem is fixed!
Now, please enjoy our MOSAiCS software! :)

Thanks,
Dongjun
