Is rnb.execute.import() has a bug in reading bismarkCytosine formart bed file? coordinates seem to be shifted by 1

24 views
Skip to first unread message

bb cao

unread,
May 14, 2020, 10:45:54 PM5/14/20
to Epigenomics forum

Hi,

      I keep getting errors with importing bismarkCytosine formart bed file. When I check the log, I found all sites have been removed.  So I check the source code, and then I found the function rnb.execute.import() line248 in the file of loading.R has a puzzling parameter, which the function "read.bed.files()" was called with  a parameter 'pos.coord.shift=1L,' and an undeclared parameter 'coord.shift = 0L'.  

      We know the coordinates of Bismark's coverage2cytosine format file are 1-based, so I think the function rnb.execute.import() use the wrong parameter 'pos.coord.shift=1L' , which it should be set 'pos.coord.shift=0L' ,  When I call read.bed.files() independently  use parameter 'pos.coord.shift=0L,', it can read and create an object. So, is it a bug? Please help me to check

 

Any help will be highly appreciated.

 

Thank you.

 

I use RnBeads 2.40 , The following information may be useful!

1.Below is the parameter pos.coord.shift of function read.bed.files() declaration

      param pos.coord.shift   The frame shift between the the CpG annotation (1-based) and the coordinates in the loaded BEDs. If BEDs have 0-based coordinates, \code{pos.coord.shift=1} (default)

 

2.Below is function rnb.execute.import() line248

                 else if (rnb.getOption("import.bed.style")=="bismarkCytosine"){

                          result <- read.bed.files(base.dir=data.source[[1]], sample.sheet=data.source[[2]], file.names.col=filename.column,

                                           verbose=verbose,

                                           pos.coord.shift=1L,                     # I think it should be 0 for (1-based) bed coordinates

                                           skip.lines=0,

                                           chr.col=1L,

                                           start.col=2L,

                                           end.col=NA,

                                           c.col=4L,

                                           t.col=5L,

                                           strand.col=3L,

                                           mean.meth.col=NA,

                                           coverage.col=NA,

                                           coord.shift = 0L,                                  # I think it should be remove

                                           nrows=nrows)

 

3. this is part of my wrong log:

2020-03-17 17:47:30     5.2  STATUS                                 STARTED Loading Data From BED Files
2020-03-17 17:47:31     5.2    INFO                                     Reading BED file: .//data//bed/1322P.cytosine_report_chr1.bed
2020-03-17 17:47:44     7.6  STATUS                                     Read 1 BED files
2020-03-17 17:47:45     4.0  STATUS                                     Matched chromosomes and strands to annotation
2020-03-17 17:47:45     4.0  STATUS                                     Checked for the presence of sites and coverage
2020-03-17 17:47:45     4.2  STATUS                                     Initialized meth/covg matrices
opening ff C:/Users/cby/AppData/Local/Temp/Rtmp0yYQkI/ffb71c4df4595.ff
2020-03-17 17:47:45     4.6  STATUS                                     Combined a data matrix with 4568940 sites and 1 samples
2020-03-17 17:47:45     4.6  STATUS                                     Processed all BED files
2020-03-17 17:47:45     4.6  STATUS                                     STARTED Creating RnBiseqSet object
2020-03-17 17:47:50     5.2 WARNING                                         All sites have been removed, returning NULL
2020-03-17 17:47:50     5.2  STATUS                                     COMPLETED Creating RnBiseqSet object
2020-03-17 17:47:50     5.2  STATUS                                 COMPLETED Loading Data From BED Files
2020-03-17 17:47:50     5.2  STATUS                                 Loaded data from .//data//bed
 

4. Call read.bed.files() function standalone:

> data.source <- c( bedDir , sampleSheet, "filename_bed")

> data.source <- as.list(data.source)

> result <- read.bed.files(base.dir=data.source[[1]], sample.sheet=data.source[[2]], file.names.col=data.source[[3L]],    verbose=TRUE,            

+                          pos.coord.shift=0L,

+                          skip.lines=0,               

+                          chr.col=1L,

+                          start.col=2L,

+                          end.col=NA,

+                          c.col=4L,

+                          t.col=5L,

+                          strand.col=3L,

+                          mean.meth.col=NA,

+                          coverage.col=NA,

+                          nrows=-1L,

+                          usebigff=TRUE)

Reading BED file: .//data//bed/1322P.cytosine_report_chr1.bed

Read 1 BED files

Matched chromosomes and strands to annotation

Checked for the presence of sites and coverage

Initialized meth/covg matrices

opening ff C:/Users/cby/AppData/Local/Temp/Rtmp0yYQkI/ffb71ca465dd6.ff

Combined a data matrix with 4568940 sites and 1 samples

Processed all BED files

Matched 2284470 of 4568940 methylation sites to the annotation

Checking site coverage

Removed 207795 of 2284470 methylation sites because they were not covered in any sample

Creating methylation matrix

Creating coverage matrix

Creating object

Summarizing strand methylation

Summarizing tiling methylation

Summarizing genes methylation

Summarizing promoters methylation

Summarizing cpgislands methylation

bb cao

unread,
May 18, 2020, 9:33:53 PM5/18/20
to Epigenomics forum
I found a guy had the same issule, So can it be sure it's a bug?
https://github.com/epigen/RnBeads/issues/7#issuecomment-630251187

在 2020年5月15日星期五 UTC+8上午10:45:54,bb cao写道:
Reply all
Reply to author
Forward
0 new messages