I'm troubleshooting some problems I've been having with intersect and merge. I've been using a terminal in Ubuntu as well as R, but ideally I'd like to accomplish all of this in R. While tracing back, I realized that my bed files fail is.valid.region, but I can't tell why. I converted each one to bed using bedtools, straight from the bam file generated by STAR alignment. Every entry in the "chr" column has the "chr" prefix, yet the check fails even on the first 6 lines:
is.valid.region(head(bam2bed))
VALIDATE REGIONS
* Checking input type... PASS
Input seems to be in bed format but chr/start/end column names are missing
* Check if index is a string... PASS
* Check index pattern... FAIL
Use check.chr = FALSE if no 'chr' prefix
[1] "c(\"chr1\", \"chr1\", \"chr1\", \"chr1\", \"chr1\", \"chr1\"):c(24, 119, 206, 264, 1023, 1094)-c(100, 194, 282, 340, 1099, 1170)"
* Check for missing values... FAIL
chr start end
1 c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1") NA NA
* Check for larger start position... PASS.
* Check if zero based... PASS
[1] FALSE
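A minimal base-R sketch of one thing that may be worth ruling out. The index string printed above looks like whole columns pasted into a single string. That can happen when a one-column table, rather than a plain vector, is handed to paste. The sketch assumes (this is not confirmed by the output) that bam2bed is a tibble, a data.table, or a data frame without chr/start/end column names. `bed`, `normalize_bed`, and the V1/V2/V3 names are hypothetical stand-ins, and the values are copied from the output above.

```r
# Hypothetical stand-in for head(bam2bed), using the values shown in the output.
bed <- data.frame(
  V1 = rep("chr1", 6),
  V2 = c(24, 119, 206, 264, 1023, 1094),
  V3 = c(100, 194, 282, 340, 1099, 1170),
  stringsAsFactors = FALSE
)

# Pasting one-column tables instead of vectors collapses each column into one
# deparsed string, which resembles the index printed by the failing check.
collapsed <- paste0(bed[1], ":", bed[2], "-", bed[3])
print(collapsed)

# Hypothetical helper (not part of any package): coerce to a plain data.frame
# with chr/start/end names, a character chr column, and integer coordinates.
normalize_bed <- function(x) {
  x <- as.data.frame(x, stringsAsFactors = FALSE)
  names(x)[1:3] <- c("chr", "start", "end")
  x$chr   <- as.character(x$chr)
  x$start <- as.integer(x$start)
  x$end   <- as.integer(x$end)
  x
}

bed_df <- normalize_bed(bed)

# Pasting plain vectors gives one index string per region.
index <- paste0(bed_df$chr, ":", bed_df$start, "-", bed_df$end)
print(class(bed_df))
print(index)
```

If class(bam2bed) reports anything besides "data.frame", passing the normalized copy to is.valid.region would show whether the object type is what trips the index pattern check.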
If I ignore all this and continue with the "invalid" files, I can't use merge or intersect, even with small test files. All the advice I've seen online about segfaults is about memory size, but even with 100GB on my local cluster, intersecting the two strands of the small 100-line file gives a segfault. No error codes or anything, just a segfault. I've attached one of the test files here.
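A rough base-R sketch of basic checks on the per-strand inputs before any merge or intersect call. It assumes the test file has the six-column layout that bedtools bamtobed writes (chr, start, end, name, score, strand). The `bed6` data frame, its read names and scores, and the `is_sorted` and `count_bad` helpers are all hypothetical. Passing these checks would not explain the segfault; it only rules out unsorted or malformed regions as a contributor.

```r
# Hypothetical stand-in for the attached test file (BED6 layout assumed).
bed6 <- data.frame(
  chr    = rep("chr1", 6),
  start  = c(24L, 119L, 206L, 264L, 1023L, 1094L),
  end    = c(100L, 194L, 282L, 340L, 1099L, 1170L),
  name   = paste0("read", 1:6),
  score  = rep(255L, 6),
  strand = c("+", "-", "+", "-", "+", "-"),
  stringsAsFactors = FALSE
)

# One data frame per strand.
by_strand <- split(bed6, bed6$strand)

# Hypothetical helper: TRUE if rows are already ordered by chr, start, end.
is_sorted <- function(x) {
  identical(order(x$chr, x$start, x$end), seq_len(nrow(x)))
}

# Hypothetical helper: number of rows with missing or inverted coordinates.
count_bad <- function(x) {
  sum(is.na(x$start) | is.na(x$end) | x$start >= x$end)
}

sorted_ok <- vapply(by_strand, is_sorted, logical(1))
bad_rows  <- vapply(by_strand, count_bad, numeric(1))
n_rows    <- vapply(by_strand, nrow, integer(1))

print(data.frame(strand = names(by_strand), rows = n_rows,
                 sorted = sorted_ok, bad = bad_rows))
```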