RnBeads bed import error

Peter Hill

Feb 10, 2015, 10:57:04 AM
to epigenom...@googlegroups.com
Hi,

I keep getting the same error message when trying to import bismark.cov files with RnBeads. I quite liked what I saw in the Nature Methods paper on RnBeads, so this is frustrating!!

>library(RnBeads)
>library(RnBeads.mm9)
>RnBeadsOptions <- rnb.options(assembly="mm9",import.bed.style="bismarkCov", import.table.separator="\t",filtering.snp="yes",filtering.missing.value.quantile=0.6, filtering.greedycut=FALSE, filtering.coverage.threshold=5, distribution.subsample=1000000, filtering.high.coverage.outliers=TRUE, filtering.low.coverage.masking=TRUE,normalization.method="none", normalization.background.method="none", normalization.plot.shifts=TRUE, filtering.context.removal=c("CC","CAG","CAH","CTG","CTH","Other"), filtering.sex.chromosomes.removal=TRUE)
>data.dir <- "/Users/pwh08/Documents/BisSamples"
>samples <- c("A1.bismark.cov","A2.bismark.cov","A3.bismark.cov","A4.bismark.cov","B1.bismark.cov","B2.bismark.cov","B3.bismark.cov","B41.bismark.cov")
>regions <- c("tiling","genes","promoters","cpgislands")
>result <- read.bed.files(base.dir=data.dir, file.names=samples, region.types=regions, assembly="mm9")
Read 8 BED files
Combined a data matrix with 8032550 sites and 8 samples
Processed all BED files
Removed 8032550 sites with unknown chromosomes
Warning: All sites have been removed, returning NULL
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
4: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
5: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
6: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
7: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
8: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8

I have looked at the bismark.cov files and they are in the expected format. I also ran rnb.options() to make sure everything related to importing bed files was as expected:

$import.table.separator
[1] "\t"

$import.bed.style
[1] "bismarkCov"

$import.bed.columns
     chr    start      end   strand     meth coverage        c        t
       1        2       NA       NA       NA       NA        5        6

$import.bed.frame.shift
[1] 1

$import.bed.test
[1] TRUE

$import.bed.test.only
[1] FALSE

Thoughts as to what could be going wrong? Does it have to do with RnBeads.mm9 package?

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils
 [8] datasets  methods   base

other attached packages:
 [1] RnBeads.mm9_0.99.0   RnBeads_0.99.18      plyr_1.8.1
 [4] methylumi_2.12.0     minfi_1.12.0         bumphunter_1.6.0
 [7] locfit_1.5-9.1       iterators_1.0.7      foreach_1.4.2
[10] Biostrings_2.34.1    XVector_0.6.0        lattice_0.20-29
[13] reshape2_1.4         scales_0.2.4         Biobase_2.26.0
[16] illuminaio_0.8.0     matrixStats_0.13.0   limma_3.22.4
[19] gridExtra_0.9.1      gplots_2.16.0        ggplot2_1.0.0
[22] fields_7.1           maps_2.3-9           spam_1.0-1
[25] ff_2.2-13            bit_1.1-12           cluster_1.15.3
[28] RColorBrewer_1.1-2   MASS_7.3-37          GenomicRanges_1.18.4
[31] GenomeInfoDb_1.2.4   IRanges_2.0.0        S4Vectors_0.4.0
[34] BiocGenerics_0.12.1

loaded via a namespace (and not attached):
 [1] annotate_1.44.0       AnnotationDbi_1.28.1  base64_1.1
 [4] beanplot_1.2          bitops_1.0-6          caTools_1.17.1
 [7] codetools_0.2-9       colorspace_1.2-4      DBI_0.3.1
[10] digest_0.6.4          doRNG_1.6             gdata_2.13.3
[13] genefilter_1.48.1     gtable_0.1.2          gtools_3.4.1
[16] KernSmooth_2.23-13    mclust_4.4            multtest_2.22.0
[19] munsell_0.4.2         nlme_3.1-119          nor1mix_1.2-0
[22] pkgmaker_0.22         preprocessCore_1.28.0 proto_0.3-10
[25] quadprog_1.5-5        Rcpp_0.11.2           registry_0.2
[28] reshape_0.8.5         R.methodsS3_1.6.1     rngtools_1.2.4
[31] RSQLite_1.0.0         siggenes_1.40.0       splines_3.1.0
[34] stringr_0.6.2         survival_2.37-7       tools_3.1.0
[37] XML_3.98-1.1          xtable_1.7-4          zlibbioc_1.12.0
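
One way to sanity-check the input independently of RnBeads is to count the fields per line and to compare the chromosome names against those the annotation is expected to use. The sketch below uses a made-up three-line coverage file; the six-column layout follows the Bismark coverage format, and the `chr`-prefixed mm9 chromosome names are an assumption about what the annotation package expects.

```r
# Hypothetical miniature coverage file, written to a temporary path.
cov_lines <- c(
  "chr1\t3000827\t3000827\t100\t5\t0",
  "chr1\t3001007\t3001007\t80\t4\t1",
  "MT\t120\t120\t0\t0\t7"
)
cov_file <- tempfile(fileext = ".bismark.cov")
writeLines(cov_lines, cov_file)

# Number of tab-separated fields on each line of the file.
n_fields <- count.fields(cov_file, sep = "\t")
print(table(n_fields))

# Chromosome names in the file versus the names assumed for the annotation.
cov <- read.delim(cov_file, header = FALSE, stringsAsFactors = FALSE)
expected_chroms <- paste0("chr", c(1:19, "X", "Y"))  # assumption: mm9-style names
unknown_chroms <- setdiff(unique(cov[[1]]), expected_chroms)
print(unknown_chroms)
```

Any name listed in `unknown_chroms` is a candidate for the "unknown chromosomes" message; whether that is the cause in a given case still has to be confirmed against the real files.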

Fabian

Feb 10, 2015, 12:13:01 PM
to epigenom...@googlegroups.com
Hi,
this probably has to do with the fact that you use the read.bed.files() function, which overall does not take into account the options you previously specified. The function you are looking for is probably rnb.execute.import().
After specifying the options using rnb.options(), this function takes care of loading your data with these options instead of requiring a lot of parameters.

Best,
Fabian
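
The pattern described here, setting the options first and then calling the import function, might look like the following sketch. It assumes RnBeads and the mm9 annotation package are installed. The option values are taken from the original post, the sample annotation file name is hypothetical, and the `data.type` value is the one used in later posts of this thread, so it may differ between RnBeads versions.

```r
# Import-related options from the original post, collected in one list.
import_opts <- list(
  assembly               = "mm9",
  import.bed.style       = "bismarkCov",
  import.table.separator = "\t"
)

data.dir <- "/Users/pwh08/Documents/BisSamples"
# Hypothetical sample sheet that lists the .cov file names.
sample.annotation <- file.path(data.dir, "sample_annotation.csv")

# Run only when RnBeads and the input files are really available.
if (requireNamespace("RnBeads", quietly = TRUE) && file.exists(sample.annotation)) {
  library(RnBeads)
  do.call(rnb.options, import_opts)   # options are set first ...
  rnb.set <- rnb.execute.import(      # ... and are then used by the import
    data.source = c(data.dir, sample.annotation),
    data.type   = "bed.dir"
  )
}
```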

rsheib

Aug 1, 2017, 5:05:25 AM
to Epigenomics forum
Dear Peter,
Did you solve this issue? I have the same problem. I have gone through all the related issues, but I still get the warnings with rnb.run.import. Could you please let me know?

Kind regards,
Rahel

Pavlo Lutsik

Aug 1, 2017, 5:14:21 AM
to Epigenomics forum
Dear Rahel,

Could you share an excerpt of your BED files with us for debugging purposes? Several thousand lines would suffice.

Best regards,

Pavlo

rsheib

Aug 1, 2017, 9:35:37 AM
to Epigenomics forum
Dear Pavlo,

Many thanks for your prompt reply. I have EPP BED files. Here are my commands and the first lines of one BED file:
>RnBeadsOptions <- rnb.options(assembly="hg38",import.bed.style="EPP", import.table.separator="\t")
>rnb.options(region.types=c(rnb.getOption("region.types"),"promoters","genes","exns","tiling","cpgislands","sites"))
>bed.dir<-file.path(data.dir, "bed_biseq")
>sample.annotation <- file.path(data.dir, "sample_annotation.txt")
>analysis.dir <- file.path(data.dir, "analysis")
>report.dir <- file.path(analysis.dir, "reports")
>rnb.initialize.reports(report.dir)
>data.source <- c(bed.dir, sample.annotation)
>logger.start(fname=NA)
>logger.isinitialized()
>result <-rnb.run.import(dir.reports=report.dir, data.source=data.source, data.type="bs.bed.dir")
and the BED file:
chr1 10496 10498 '47/47' 1000 +
chr1 10524 10526 '47/47' 1000 +
chr1 10541 10543 '47/47' 1000 +
chr1 10588 10590 '1/1' 1000 +
chr1 10608 10610 '1/1' 1000 +
chr1 10616 10618 '1/1' 1000 +
chr1 10619 10621 '0/1' 0 +
chr1 10640 10642 '6/10' 600 +
chr1 10643 10645 '13/15' 867 +
chr1 10649 10651 '13/15' 867 +
chr1 10659 10661 '13/15' 867 +
chr1 10661 10663 '13/14' 929 +
chr1 10664 10666 '15/15' 1000 +
chr1 10666 10668 '15/15' 1000 +
chr1 10669 10671 '12/20' 600 +
chr1 10672 10674 '14/19' 737 +
chr1 10678 10680 '18/20' 900 +
chr1 10688 10690 '21/22' 955 +
chr1 10690 10692 '18/19' 947 +
chr1 10693 10695 '22/22' 1000 +
chr1 10695 10697 '17/22' 773 +
chr1 10698 10700 '23/27' 852 +
chr1 10701 10703 '21/22' 955 +
chr1 10707 10709 '22/23' 957 +
chr1 10717 10719 '23/24' 958 +
chr1 10719 10721 '20/20' 1000 +
chr1 10722 10724 '24/24' 1000 +
chr1 10724 10726 '16/24' 667 +
chr1 10727 10729 '22/27' 815 +
chr1 10730 10732 '23/23' 1000 +
chr1 10736 10738 '23/23' 1000 +
chr1 10746 10748 '22/23' 957 +
chr1 10748 10750 '19/20' 950 +
chr1 10751 10753 '23/23' 1000 +
chr1 10753 10755 '19/23' 826 +
chr1 10756 10758 '20/21' 952 +
chr1 10759 10761 '12/12' 1000 +
chr1 10765 10767 '12/12' 1000 +
chr1 10775 10777 '12/12' 1000 +
chr1 10777 10779 '10/10' 1000 +
chr1 10780 10782 '12/12' 1000 +
chr1 10782 10784 '12/12' 1000 +
chr1 10785 10787 '7/9' 778 +
chr1 11460 11462 '0/13' 0 -
chr1 11479 11481 '0/14' 0 -
chr1 11762 11764 '9/9' 1000 +
chr1 11780 11782 '9/9' 1000 +
chr1 11791 11793 '5/5' 1000 +
chr1 11945 11947 '8/20' 400 -
chr1 11959 11961 '14/27' 519 +
chr1 14887 14889 '10/10' 1000 -
chr1 15954 15956 '58/58' 1000 +
chr1 16943 16945 '9/9' 1000 -
chr1 16961 16963 '9/9' 1000 -
chr1 16973 16975 '9/9' 1000 -
chr1 26793 26795 '14/14' 1000 +
chr1 26797 26799 '14/14' 1000 +
chr1 26802 26804 '13/14' 929 +
chr1 26809 26811 '14/14' 1000 +
chr1 26836 26838 '13/14' 929 +
chr1 28764 28766 '0/2' 0 +
chr1 28773 28775 '0/2' 0 +
chr1 28792 28794 '0/2' 0 +
chr1 28807 28809 '0/2' 0 +
chr1 28851 28853 '6/17' 353 +
chr1 28862 28864 '5/17' 294 +
chr1 28864 28866 '0/17' 0 +
chr1 28890 28892 '0/16' 0 +
chr1 28907 28909 '0/14' 0 +
chr1 28911 28913 '0/2' 0 +
chr1 28924 28926 '0/2' 0 +
chr1 28933 28935 '0/2' 0 +
chr1 28935 28937 '0/2' 0 +
chr1 28937 28939 '0/2' 0 +
chr1 29001 29003 '0/5' 0 -
chr1 29010 29012 '0/7' 0 +
chr1 29020 29022 '0/10' 0 +
chr1 29052 29054 '0/12' 0 +
chr1 29062 29064 '2/4' 500 +
chr1 29073 29075 '0/17' 0 +
chr1 29075 29077 '0/19' 0 +
chr1 29095 29097 '0/17' 0 -
chr1 29100 29102 '0/15' 0 -
chr1 29111 29113 '0/11' 0 -
chr1 29118 29120 '0/16' 0 -
chr1 29164 29166 '0/15' 0 +
chr1 29171 29173 '0/15' 0 +
chr1 29175 29177 '0/15' 0 +
chr1 29177 29179 '0/15' 0 +
chr1 29193 29195 '0/14' 0 +
chr1 29204 29206 '0/15' 0 +
chr1 29214 29216 '0/12' 0 +
chr1 29220 29222 '0/12' 0 +
chr1 29229 29231 '0/9' 0 -
chr1 29233 29235 '0/8' 0 -
chr1 29252 29254 '0/14' 0 +
chr1 29266 29268 '0/5' 0 +
chr1 29271 29273 '0/5' 0 +
chr1 29284 29286 '0/5' 0 +
chr1 29297 29299 '0/5' 0 +
chr1 29299 29301 '0/5' 0 +
chr1 29305 29307 '0/9' 0 +
chr1 29309 29311 '0/13' 0 +
chr1 29323 29325 '0/14' 0 +
chr1 29328 29330 '0/15' 0 +
chr1 29336 29338 '0/13' 0 +
chr1 29346 29348 '0/4' 0 +
chr1 29349 29351 '0/4' 0 +
chr1 29352 29354 '0/4' 0 +
chr1 29358 29360 '0/4' 0 +
chr1 29483 29485 '0/2' 0 +
chr1 29488 29490 '0/2' 0 +
chr1 29499 29501 '0/2' 0 +
chr1 29719 29721 '0/5' 0 -
chr1 29723 29725 '0/5' 0 -
chr1 29730 29732 '0/5' 0 -
chr1 29735 29737 '0/5' 0 -
chr1 39675 39677 '0/1' 0 -
chr1 39702 39704 '1/1' 1000 -
chr1 51603 51605 '0/3' 0 -
chr1 51631 51633 '2/3' 667 -
chr1 51636 51638 '2/3' 667 -
chr1 51640 51642 '2/3' 667 -
chr1 51647 51649 '1/3' 333 -
chr1 51661 51663 '3/3' 1000 -
chr1 51681 51683 '0/2' 0 -
chr1 51692 51694 '0/2' 0 -
chr1 51721 51723 '1/3' 333 +
chr1 51725 51727 '1/1' 1000 +
chr1 51727 51729 '1/1' 1000 +
chr1 51733 51735 '1/1' 1000 +
chr1 51737 51739 '1/1' 1000 +
chr1 51743 51745 '1/1' 1000 +
chr1 51765 51767 '1/1' 1000 +
chr1 51770 51772 '0/1' 0 +
chr1 64494 64496 '5/7' 714 -
chr1 91028 91030 '2/2' 1000 -
chr1 91058 91060 '2/2' 1000 -
chr1 103063 103065 '1/3' 333 +
chr1 103110 103112 '4/4' 1000 -
chr1 121358 121360 '10/10' 1000 +
chr1 121366 121368 '10/10' 1000 +
chr1 121410 121412 '10/10' 1000 +
chr1 121414 121416 '9/10' 900 +
chr1 121486 121488 '1/1' 1000 +

Many thanks for your help,
Kind Regards,
Rahel
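
For readers unfamiliar with this layout, the sketch below parses a few rows copied from the excerpt above using base R only. It assumes the quoted fourth column is methylated reads over total reads and the fifth column is the methylation level in per mille; that reading is inferred from the numbers in the excerpt, not from RnBeads documentation. The rows are joined with tabs here, since the forum rendering may not preserve the original separators.

```r
# Four rows from the posted excerpt, tab-separated for this illustration.
epp_lines <- c(
  "chr1\t10496\t10498\t'47/47'\t1000\t+",
  "chr1\t10640\t10642\t'6/10'\t600\t+",
  "chr1\t10643\t10645\t'13/15'\t867\t+",
  "chr1\t11945\t11947\t'8/20'\t400\t-"
)

epp <- read.delim(
  text = epp_lines, header = FALSE, quote = "", stringsAsFactors = FALSE,
  col.names  = c("chr", "start", "end", "ratio", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "integer", "character")
)

# Split "'13/15'" into the two read counts (assumed: methylated / total).
counts <- strsplit(gsub("'", "", epp$ratio), "/", fixed = TRUE)
epp$meth_reads  <- as.integer(vapply(counts, `[`, character(1), 1))
epp$total_reads <- as.integer(vapply(counts, `[`, character(1), 2))

# Recompute the per-mille score and compare it with the fifth column.
epp$recomputed_score <- round(1000 * epp$meth_reads / epp$total_reads)
print(epp[, c("ratio", "score", "recomputed_score")])
```

If the recomputed score matches the fifth column for every row, the file is at least internally consistent with this six-column reading.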

rsheib

Aug 16, 2017, 8:24:22 AM
to Epigenomics forum
Dear Pavlo,

Did you have a chance to look into this error from RnBeads? I still get it, and I have tried every option I could think of to get rid of it. In addition, I now get the error shown below. Do you think there is something wrong with the BED files?
> rnb.options(
+ identifiers.column                = "filename_bed",
+ import.bed.style                  = "EPP",
+ assembly                          = "hg38",
+ filtering.low.coverage.masking    = TRUE,
+ filtering.greedycut               = FALSE,
+ filtering.missing.value.quantile  = 0.5,
+ filtering.high.coverage.outliers  = TRUE,
+ normalization.plot.shifts  =TRUE,
+ export.to.ewasher  = TRUE,
+ filtering.deviation.threshold  = 0.005,
+ qc.coverage.plots   = TRUE,
+ qc.coverage.histograms   = TRUE,
+ qc.coverage.violins    = TRUE,
+         exploratory.clustering.heatmaps.pdf= TRUE)
> result <- read.bed.files(base.dir=data.dir, region.types=regions, assembly="hg38", import.bed.style= "EPP")
2017-08-16 14:08:51     7.1  STATUS                                                                         STARTED Loading Data From BED Files
Error in read.table.ffdf(FUN = "read.delim", ...) : 
  unkown arguments: import.bed.style
> result <- read.bed.files(base.dir=data.dir, region.types=regions, assembly="hg38")
2017-08-16 14:08:57     7.1  STATUS                                                                             STARTED Loading Data From BED Files
2017-08-16 14:12:23     6.6  STATUS                                                                                 Read 44 BED files
opening ff /tmp/RtmpF1IqPr/ffdf79713ed1fc87.ff
2017-08-16 14:13:50     7.0  STATUS                                                                                 Combined a data matrix with 8368422 sites and 44 samples
2017-08-16 14:13:50     7.0  STATUS                                                                                 Processed all BED files
2017-08-16 14:13:50     7.0  STATUS                                                                                 STARTED Creating RnBiseqSet object
2017-08-16 14:14:10     6.9  STATUS                                                                                     Matched 4269226 of 8368422 methylation sites to the annotation
2017-08-16 14:14:10     6.9  STATUS                                                                                     Checking site coverage
2017-08-16 14:14:13     7.6  STATUS                                                                                     Removed 4269226 of 4269226 methylation sites because they were not covered in any sample
2017-08-16 14:14:31     7.2  STATUS                                                                                     Creating methylation matrix
2017-08-16 14:14:59     7.1  STATUS                                                                                     Creating coverage matrix
2017-08-16 14:15:25     7.1   ERROR                                                                                     The following samples have no valid methylation values: 6 16 25 31 1 19 26 29 37 7 9 12 33 22 4 5 13 23 36 44 20 24 39 40 41 42 2 3 11 14 18 17 21 27 30 32 38 43 8 10 15 28 35 34
Error in logger.error(txt) : 
  The following samples have no valid methylation values: 6 16 25 31 1 19 26 29 37 7 9 12 33 22 4 5 13 23 36 44 20 24 39 40 41 42 2 3 11 14 18 17 21 27 30 32 38 43 8 10 15 28 35 34
In addition: There were 44 warnings (use warnings() to see them)

Many thanks in advance,
Rahel

Edahi Gonzalez Avalos

Sep 25, 2017, 11:36:18 PM
to Epigenomics forum
I am also getting the same error:

All sites have been removed, returning NULL

Has this been solved? I can provide data, or logs, or whatever needed to solve it.

Edahi Gonzalez Avalos

Sep 26, 2017, 12:06:36 AM
to epigenom...@googlegroups.com
I went ahead and attached the test datasets below (the 10,000 rows that RnBeads supposedly reads).
Please let me know if you get to solve the error:


Annotation.txt
error.test.tar.bz2
03.README_ProveAPoint.R

Pavlo Lutsik

Sep 26, 2017, 2:43:30 AM
to Epigenomics forum
Sorry for the delays, guys. 

Thanks Edahi for preparing the test data, I will look into the issue this week.

Best regards,

Pavlo

Pavlo Lutsik

Oct 5, 2017, 8:11:15 AM
to Epigenomics forum
Hi,

I have committed a few fixes to the "bismarkCytosine" style into a separate branch on GitHub. You can install it using devtools:

devtools::install_github("epigen/RnBeads", ref="bsseq-loading-fix")

(will be a part of the upcoming release).

For Edahi's example I managed to read the files in using the following snippet (mind the wrong BED file names in the Annotation.txt):

rnb.options(assembly = "mm9")
rnb.options(import.bed.style = "bismarkCytosine")
rnb.options(import.table.separator="\t")

rnb.set<-rnb.execute.import(
    data.source="data/bedDemo/bismarkCytosineIssue/",
    data.type="bed.dir",
    dry.run=TRUE, verbose=TRUE)


Best regards,

Pavlo

rsheib

Nov 6, 2017, 6:52:23 AM
to Epigenomics forum
Dear Pavlo,

I ran it again with the fixed version, but it still gives me the same warning. My input BED files are in "EPP" format. What is the effect of this warning? Am I losing part of my data?

Kind Regards,
Rahel

Edahi Gonzalez Avalos

Nov 9, 2017, 3:18:25 PM
to epigenom...@googlegroups.com
On my part, the problem got solved. Thank you very much


Edahi Gonzalez Avalos

Nov 13, 2017, 1:07:10 PM
to epigenom...@googlegroups.com
Never mind, I still get errors from the program, but at much later steps. Attached are the log file, the Annotation.txt, the data, and the README (you may need to modify it) showing how I run it. Please note that some of the errors I had occurred because the program at some step uses the library "DBI", which either is never requested to be installed or was not installed successfully.

Anyhow, should you require the whole (or a bigger) dataset to reproduce the errors at the end, we can arrange the data transfer at your earliest convenience, or I can post 30,000 rows instead of 10,000.

Thank you for your efforts.




RnBeads.log.txt
README.R
Annotation.txt
error.test.tar.bz2

Fabian

Nov 14, 2017, 3:04:52 AM
to Epigenomics forum
Hi,
this is something that we sometimes see when you run out of memory due to the size of the dataset. Are you running this in parallel? If so, you could try to reduce the number of parallel jobs.
Hope that helps.

Best regards,
Fabian
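
Reducing the number of parallel jobs could look roughly like the sketch below. It assumes RnBeads is installed and that the installed version provides `parallel.setup()`, `parallel.isEnabled()` and `parallel.teardown()`; the cap of two workers is an arbitrary illustration, not a recommended value.

```r
# Pick a small, conservative number of workers (the cap of 2 is arbitrary).
available <- parallel::detectCores()
if (is.na(available)) available <- 1L
n_jobs <- max(1L, min(2L, available - 1L))

# Only touch RnBeads when it is actually installed.
if (requireNamespace("RnBeads", quietly = TRUE)) {
  library(RnBeads)
  logger.start(fname = NA)                  # log to the console, as in the posts above
  if (n_jobs > 1L) parallel.setup(n_jobs)   # assumption: available in this version
  # ... run the import or the analysis here ...
  if (parallel.isEnabled()) parallel.teardown()
}
```

With `n_jobs` equal to 1 the sketch skips the parallel setup entirely, which is the most memory-friendly setting to try first.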

rsheib

Nov 14, 2017, 4:10:40 AM
to Epigenomics forum
Dear all,

Can anyone explain what this warning means when I import "EPP" BED files into RnBeads?

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
4: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8

Many thanks, Rahel

Pavlo Lutsik

Nov 14, 2017, 5:03:37 AM
to Epigenomics forum
Hi,

I could just load a selection of EPP-formatted files which look identical to yours using the latest build (no warnings whatsoever).

Best regards,

Pavlo

rsheib

Nov 14, 2017, 5:12:45 AM
to Epigenomics forum
Dear Pavlo,

I have attached two test examples of my BED files here. Could you please check the run with them? I have repeated it many times but always get the same error, and I do not know what its effect is or whether I should stop before continuing with further analysis.
Here is the command as well:
rnb.options(
identifiers.column                = "BSF_name",
import.bed.style                  = "EPP",
assembly                          = "hg38",
filtering.low.coverage.masking    = TRUE,
filtering.greedycut               = FALSE,
filtering.missing.value.quantile  = 0.5,
filtering.high.coverage.outliers  = TRUE,
differential.comparison.columns   = c("sample_Group"),
differential.comparison.columns.all.pairwise = c("sample_Group"),
differential.report.sites         = TRUE,
differential.site.test.method     = "limma",
export.to.ewasher                 = TRUE,
filtering.deviation.threshold     = 0.005,
qc.snp.boxplot                    = TRUE,
qc.coverage.plots                 = TRUE,
qc.coverage.histograms            = TRUE,
qc.coverage.violins               = TRUE,
import.gender.prediction          = TRUE,
exploratory.clustering.heatmaps.pdf = TRUE,
region.types=c(rnb.getOption("region.types"),"promoters","genes","exns","tiling","cpgislands","sites")
)
#######
dataDir <- file.path(getwd(), "data")
resultDir <- file.path(getwd(), "results_lima")
datasetDir <- file.path(dataDir, "rnbeads")
bed.dir <- file.path(datasetDir, "bed_biseq")
sampleSheet <- file.path(datasetDir, "sample_annotation.csv")
reportDir <- file.path(resultDir, "report_RRBS")

###
data.source <- c(bed.dir, sampleSheet)
result <- rnb.run.import(dir.reports=reportDir,
                         data.source=data.source,
                         data.type="bs.bed.dir")


Many thanks in advance,
test1.epp.bed
test2.epp.bed

Marta Kołoszyc

Jan 16, 2018, 11:45:24 AM
to Epigenomics forum

Hi Rahel,

Have you found a solution? Would you care to share it? I came across precisely the same problem, with my bed file having the same formatting and so on. 

thanks
Marta

rsheib

Jan 17, 2018, 7:11:59 AM
to Epigenomics forum
Dear Marta,

I could not find a solution, so I gave up on doing the analysis with RnBeads.

Kind Regards,
Rahel

Pavlo Lutsik

Jan 17, 2018, 9:18:09 AM
to Epigenomics forum
Hi,

To restate what I posted above in this thread, RnBeads is able to read EPP-formatted files without any major problems. I am thus not sure which solution is needed here. The mentioned warnings are harmless, and will be fixed in one of the upcoming releases.

Best regards,

Pavlo

Marta Kołoszyc

Jan 18, 2018, 7:52:00 AM
to Epigenomics forum
Hi Pavlo,
In my case I get :

> data.dir <- "/home/mart/Barley_pilot_RnBeads/Rtry"
> bed.dir <- file.path(data.dir, "bed")
> sample.annotation <- file.path(data.dir, "sample.txt")
> analysis.dir <- "/home/mart/Barley_pilot_RnBeads/Rtry/analysis"
> dir.reports <- file.path(analysis.dir, "reports")
> rnb.initialize.reports(dir.reports)
> rnb.options(identifiers.column="filename_bed",import.bed.style = 'EPP',import.table.separator="\t")
> data.source <- c(bed.dir, sample.annotation)
> result<-rnb.execute.import(data.source, data.type = "bed.dir",dry.run=TRUE, verbose=TRUE)
No column with file names specified: will try to find one
Potential file names found in column 2 of the supplied annotation table
Read 2 BED files
Combined a data matrix with 13228 sites and 2 samples
Processed all BED files
Removed 13147 sites with unknown chromosomes
Warning: All sites have been removed, returning NULL
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8

In the end, the files are not imported. I have tried to follow the advice you gave for the problems described earlier, but I am clearly still doing something gravely wrong.

I have attached a random subset of my BED files and the annotation txt file in case you would like to have a look.

Oh, and I work with a plant genome.

best
Marta
1_AseI.bed
2_AseI.bed
samples.txt

Pavlo Lutsik

Jan 18, 2018, 9:08:23 AM
to Epigenomics forum
Dear Marta, 

Your last line explains it. RnBeads supports only reference genome-based analysis, for which a suitable annotation package exists (e.g. RnBeads.hg19 or RnBeads.mm10). To be able to load data from barley you need to assemble such a package yourself. We have a brief explanation on how to achieve this in one of our FAQ items (Which genome assemblies does RnBeads support? Can I include a new one?). Let us know whether you managed to proceed.

Best regards,

Pavlo
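
A quick way to see whether an assembly is known to RnBeads before attempting an import is sketched below. It assumes the installed RnBeads version exports `rnb.get.assemblies()`; the fallback vector and the label "barley" are placeholders for illustration only.

```r
# Assemblies reported by RnBeads, if the package is installed.
supported <- if (requireNamespace("RnBeads", quietly = TRUE)) {
  RnBeads::rnb.get.assemblies()
} else {
  c("hg19", "mm9", "mm10")  # placeholder list, used only without RnBeads
}

# Hypothetical helper: TRUE when the requested assembly is in the list.
has_annotation <- function(assembly, known = supported) {
  assembly %in% known
}

print(has_annotation("barley"))  # placeholder label, not a real assembly id
```

When the check returns FALSE, an annotation package for that genome has to be assembled first, as described in the FAQ item mentioned above.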

Marta Kołoszyc

Jan 25, 2018, 10:30:23 AM
to Epigenomics forum
Hi Pavlo,
So (with help) I have managed to install the genome (I actually used a script by Bjorn Wouters) and the analysis was going quite OK, until this happened:

> # Main analysis function of RnBeads.
> rnb.run.analysis(dir.reports=report.dir, data.source=data.source,
+                  data.type="bs.bed.dir")
2018-01-25 15:46:02     0.9  STATUS STARTED RnBeads Pipeline
2018-01-25 15:46:02     0.9    INFO     Initialized report index and saved to index.html
2018-01-25 15:46:03     0.9  STATUS     STARTED Loading Data
2018-01-25 15:46:03     0.9    INFO         Number of cores: 1
2018-01-25 15:46:03     0.9    INFO         Loading data of type "bs.bed.dir"
2018-01-25 15:46:03     0.9  STATUS         STARTED Performing loading test
2018-01-25 15:46:03     0.9    INFO             The first 10000 rows will be read from each data file
2018-01-25 15:46:03     0.9    INFO             No column with file names specified: will try to find one
2018-01-25 15:46:03     0.9  STATUS             STARTED Loading Data From BED Files
2018-01-25 15:46:03     0.9  STATUS                 STARTED Automatically parsing the provided sample annotation file
2018-01-25 15:46:03     0.9  STATUS                     Potential file names found in column 2 of the supplied annotation table
2018-01-25 15:46:03     0.9  STATUS                 COMPLETED Automatically parsing the provided sample annotation file
2018-01-25 15:46:04     1.0  STATUS                 Read 2 BED files
2018-01-25 15:46:04     1.0  STATUS                 Combined a data matrix with 13267 sites and 3 samples
2018-01-25 15:46:04     1.0  STATUS                 Processed all BED files
2018-01-25 15:48:49     1.0  STATUS                 Matched 13267 of 13267 methylation sites to the annotation
2018-01-25 15:48:49     1.0  STATUS                 STARTED Creating RnBiseqSet object
2018-01-25 15:48:49     1.0  STATUS                     Summarizing strand methylation
2018-01-25 15:49:19     1.1  STATUS                 COMPLETED Creating RnBiseqSet object
2018-01-25 15:49:19     1.1  STATUS             COMPLETED Loading Data From BED Files
2018-01-25 15:49:19     1.1  STATUS             STARTED Checking the loaded object
2018-01-25 15:49:19     1.1    INFO                 Checking the supplied RnBiseqSet object
2018-01-25 15:49:19     1.1    INFO                 The object contains information for 13267 methylation sites
2018-01-25 15:49:19     1.1    INFO                 The object contains information for 2 samples
2018-01-25 15:49:19     1.1    INFO                 The object contains 6534 missing methylation values
2018-01-25 15:49:19     1.1    INFO                 Methylation values are within the expected range
2018-01-25 15:49:19     1.1    INFO                 The object contains coverage information
2018-01-25 15:49:19     1.1    INFO                 Coverage values are within the expected range
2018-01-25 15:49:19     1.1    INFO                 The object loaded during the loading test is valid
2018-01-25 15:49:19     1.1  STATUS             COMPLETED Checking the loaded object
2018-01-25 15:49:19     1.1  STATUS         COMPLETED Performing loading test
2018-01-25 15:49:20     1.1    INFO         No column with file names specified: will try to find one
2018-01-25 15:49:20     1.1  STATUS         STARTED Loading Data From BED Files
2018-01-25 15:49:20     1.1  STATUS             STARTED Automatically parsing the provided sample annotation file
2018-01-25 15:49:20     1.1  STATUS                 Potential file names found in column 2 of the supplied annotation table
2018-01-25 15:49:20     1.1  STATUS             COMPLETED Automatically parsing the provided sample annotation file
2018-01-25 15:49:21     1.1  STATUS             Read 2 BED files
2018-01-25 15:49:21     1.1  STATUS             Combined a data matrix with 35157 sites and 3 samples
2018-01-25 15:49:21     1.1  STATUS             Processed all BED files
2018-01-25 15:57:21     1.1  STATUS             Matched 35157 of 35157 methylation sites to the annotation
2018-01-25 15:57:22     1.1  STATUS             STARTED Creating RnBiseqSet object
2018-01-25 15:57:22     1.1  STATUS                 Summarizing strand methylation
2018-01-25 15:59:14     1.1  STATUS             COMPLETED Creating RnBiseqSet object
2018-01-25 15:59:14     1.1  STATUS         COMPLETED Loading Data From BED Files
2018-01-25 15:59:14     1.1  STATUS         Loaded data from /home/mart/RnBeads-master/RnBeads/assemblies/hv1//bs.bed/
2018-01-25 15:59:15     1.1  STATUS         Added data loading section to the report
2018-01-25 15:59:15     1.1  STATUS         Loaded 2 samples and 35157 sites
2018-01-25 15:59:15     1.1    INFO         Output object is of type RnBiseqSet
2018-01-25 15:59:15     1.1  STATUS     COMPLETED Loading Data
2018-01-25 15:59:15     1.1    INFO     Initialized report index and saved to index.html
2018-01-25 15:59:15     1.1  STATUS     STARTED Quality Control
2018-01-25 15:59:15     1.1    INFO         Number of cores: 1
2018-01-25 15:59:15     1.1  STATUS         STARTED Preparing Quality Control Information
2018-01-25 15:59:15     1.1  STATUS         COMPLETED Preparing Quality Control Information
2018-01-25 15:59:15     1.1  STATUS         STARTED Quality Control Section
2018-01-25 15:59:28     1.1  STATUS             Added sequencing coverage histograms
2018-01-25 15:59:32     1.1  STATUS             Added sequencing coverage violin plots
2018-01-25 15:59:34     1.1  STATUS         COMPLETED Quality Control Section
2018-01-25 15:59:34     1.1  STATUS     COMPLETED Quality Control
2018-01-25 15:59:34     1.1    INFO     Initialized report index and saved to index.html
2018-01-25 15:59:34     1.1  STATUS     STARTED Preprocessing
2018-01-25 15:59:34     1.1    INFO         Number of cores: 1
2018-01-25 15:59:34     1.1 WARNING         Skipped normalization module for sequencing data.
2018-01-25 15:59:34     1.1  STATUS         STARTED Filtering Procedures
2018-01-25 16:01:34     1.1  STATUS             STARTED Removal of High Coverage (Outlier) Sites
2018-01-25 16:01:34     1.1  STATUS                 Removed 7 high coverage outlier sites
2018-01-25 16:01:34     1.1  STATUS                 Saved removed sites to /home/mart/RnBeads-master/RnBeads/assemblies/hv1/analysis/25_01_2018_15:45/preprocessing_data/removed_sites_high_coverage.csv
2018-01-25 16:01:34     1.1  STATUS                 Added a corresponding section to the report
2018-01-25 16:01:34     1.1  STATUS             COMPLETED Removal of High Coverage (Outlier) Sites
2018-01-25 16:03:18     1.1  STATUS             Retained 2 samples and 35150 sites
2018-01-25 16:03:18     1.1  STATUS         COMPLETED Filtering Procedures
2018-01-25 16:03:18     1.1  STATUS         STARTED Summary of Filtering Procedures
2018-01-25 16:03:18     1.1  STATUS         COMPLETED Summary of Filtering Procedures
2018-01-25 16:03:18     1.1  STATUS     COMPLETED Preprocessing
2018-01-25 16:03:19     1.1    INFO     Initialized report index and saved to index.html
2018-01-25 16:03:19     1.1  STATUS     STARTED Tracks and Tables
2018-01-25 16:03:19     1.1    INFO         Number of cores: 1
2018-01-25 16:03:19     1.1  STATUS         STARTED Generating Tracks and Tables
2018-01-25 16:03:19     1.1  STATUS             STARTED Exporting sites
2018-01-25 16:03:19     1.1  STATUS                 STARTED Creating Track Hub -- bigBed
2018-01-25 16:03:19     1.1  STATUS                     STARTED Conversion to BED
2018-01-25 16:03:19     1.1  STATUS                         Converting to GRangesList
2018-01-25 16:05:01     1.1  STATUS                         Exporting sample 1_AseI
2018-01-25 16:05:01     1.1  STATUS                         Exporting sample 2_AseI
2018-01-25 16:05:15     1.1  STATUS                     COMPLETED Conversion to BED
2018-01-25 16:05:15     1.1  STATUS                     STARTED Creating Track Hub
2018-01-25 16:05:17     1.1  STATUS                     COMPLETED Creating Track Hub
2018-01-25 16:05:17     1.1  STATUS                 COMPLETED Creating Track Hub -- bigBed
2018-01-25 16:05:17     1.1  STATUS                 STARTED Creating UCSC Track Hub -- bigWig
2018-01-25 16:05:17     1.1  STATUS                     STARTED Conversion to bedGraph
2018-01-25 16:07:12     1.1  STATUS                     COMPLETED Conversion to bedGraph
2018-01-25 16:07:12     1.1  STATUS                     STARTED Creating Track Hub
2018-01-25 16:07:13     1.1  STATUS                     COMPLETED Creating Track Hub
2018-01-25 16:07:13     1.1  STATUS                 COMPLETED Creating UCSC Track Hub -- bigWig
2018-01-25 16:07:13     1.1  STATUS             COMPLETED Exporting sites
2018-01-25 16:07:13     1.1  STATUS         COMPLETED Generating Tracks and Tables
2018-01-25 16:07:13     1.1  STATUS         STARTED Writing export report
2018-01-25 16:07:13     1.1  STATUS         COMPLETED Writing export report
2018-01-25 16:07:13     1.1  STATUS     COMPLETED Tracks and Tables
2018-01-25 16:07:13     1.1    INFO     Initialized report index and saved to index.html
2018-01-25 16:07:13     1.1  STATUS     STARTED Exploratory Analysis
2018-01-25 16:07:13     1.1    INFO         Number of cores: 1
2018-01-25 16:07:13     1.1  STATUS         Designed color mappings for probe type and CGI status
2018-01-25 16:07:13     1.1 WARNING         The following 1 region types will not be included in the analysis:
2018-01-25 16:07:13     1.1 WARNING         The following 1 region types will not be included in the analysis:
2018-01-25 16:07:13     1.1  STATUS         STARTED Dimension Reduction Techniques
2018-01-25 16:07:13     1.1 WARNING             Skipped due to too few samples
2018-01-25 16:07:13     1.1  STATUS         COMPLETED Dimension Reduction Techniques
2018-01-25 16:07:13     1.1 WARNING         The following 1 region types will not be included in the analysis:
2018-01-25 16:08:55     1.1 WARNING         The following 1 region types will not be included in the analysis:
2018-01-25 16:08:55     1.1  STATUS         STARTED Methylation Value Distributions - Sample Groups
2018-01-25 16:08:55     1.1    INFO             processing beta_density_samples_1
2018-01-25 16:08:55     1.1    INFO             Density estimation ( all samples--sites ): Groupwise retained observations after missing value removal: all:58615/70300
2018-01-25 16:08:57     1.2  STATUS         COMPLETED Methylation Value Distributions - Sample Groups
2018-01-25 16:08:57     1.2  STATUS         STARTED Methylation Value Distributions - Site Categories
2018-01-25 16:08:57     1.2    INFO             Site categories are non-categorical. --> skipped
2018-01-25 16:08:57     1.2  STATUS         COMPLETED Methylation Value Distributions - Site Categories
2018-01-25 16:08:57     1.2  STATUS         STARTED Sample Clustering
2018-01-25 16:08:57     1.2  STATUS             STARTED Agglomerative Hierarchical Clustering
2018-01-25 16:08:57     1.2  STATUS                 Skipped clustering on sites : too few samples
2018-01-25 16:08:57     1.2 WARNING                 The following 1 region types will not be included in the analysis:
2018-01-25 16:08:57     1.2  STATUS             COMPLETED Agglomerative Hierarchical Clustering


pass1 - making usageList (6309 chroms): 14 millis
pass2 - checking and writing primary data (31253 records, 6 fields): 444 millis
index write: 3 millis
pass3 - writeReducedOnceReturnReducedTwice: 30 millis
further reductions: 11 millis
pass1 - making usageList (5443 chroms): 12 millis
pass2 - checking and writing primary data (27362 records, 6 fields): 411 millis
index write: 2 millis
pass3 - writeReducedOnceReturnReducedTwice: 27 millis
further reductions: 10 millis
Error in sum(clust.failed) : invalid 'type' (list) of argument
Calls: rnb.run.analysis ... rnb.run.exploratory -> rnb.step.clustering.internal
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
4: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 6 != length(data) = 8
5: Removed 703 rows containing non-finite values (stat_ydensity).
Execution halted

(This is the same input as in my comment above.)

Any idea what might be causing this error?

Thanks,
Marta
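
Note on the error text: `invalid 'type' (list) of argument` is the message base R gives when `sum()` receives a list instead of an atomic vector. The sketch below reproduces that message in plain R. It is not taken from the RnBeads source. The names `results` and `safe.failed` are hypothetical, and the idea that `clust.failed` ends up as a list because some per-region results are empty (the log reports only 2 retained samples and skipped clustering) is an assumption, not a confirmed diagnosis.

```r
# Hypothetical per-region "clustering failed?" flags; one entry is empty,
# as might happen if a step was skipped (assumption, not RnBeads code).
results <- list(sites = TRUE, genes = logical(0))

# sapply() cannot simplify results of unequal length, so it returns a list.
clust.failed <- sapply(results, function(x) x)
print(is.list(clust.failed))

# sum() on a list fails; capture the message instead of halting.
msg <- tryCatch(sum(clust.failed), error = function(e) conditionMessage(e))
print(msg)

# A type-stable alternative: vapply() always yields a logical vector here,
# treating anything that is not a single TRUE as "not failed".
safe.failed <- vapply(results, isTRUE, logical(1))
n.failed <- sum(safe.failed)
print(n.failed)
```

If the cause is along these lines, running with more than two samples, or disabling the clustering step in the exploratory module, may sidestep the error. Both suggestions are guesses to try rather than a verified fix.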