Bamtobed file not passing is.region.valid (and then segfaults on merge or intersect)

Liana Engie

unread,

Apr 10, 2020, 1:09:38 PM4/10/20

to bedtools-discuss

Hello all,

I'm troubleshooting some problems I was having with intersect and merge; I've been using a terminal in Ubuntu as well as R. Ideally I will be trying to accomplish all of this in R. While tracing back, I realized that the bed files I have are failing is.valid.region but I can't tell why. I converted it to bed using bedtools, straight from the bam file generated by STAR alignment. The entire "chr" column uses "chr" prefixes, even if I test it on the first 6 lines it fails:

is.valid.region(head(bam2bed))

VALIDATE REGIONS

* Checking input type... PASS

Input seems to be in bed format but chr/start/end column names are missing

* Check if index is a string... PASS

* Check index pattern... FAIL

Use check.chr = FALSE if no 'chr' prefix

[1] "c(\"chr1\", \"chr1\", \"chr1\", \"chr1\", \"chr1\", \"chr1\"):c(24, 119, 206, 264, 1023, 1094)-c(100, 194, 282, 340, 1099, 1170)"

* Check for missing values... FAIL

chr start end

1 c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1") NA NA

* Check for larger start position... PASS.

* Check if zero based... PASS

[1] FALSE

If I ignore all this and try to continue with the "invalid" files, I can't use merge or intersect even with small test files. All the advice online I've seen about segfaults are about memory size, but even using 100GB on my local cluster and trying to do an intersection between the two strands of the small 100 line file, I get a segfault. No error codes or anything, just a segfault. I've attached one of the test files here.

Thanks everyone, any advice would be appreciated.

Liana

firsthundred.bed

Aaron Quinlan

unread,

Apr 12, 2020, 10:44:52 AM4/12/20

to Liana Engie, bedtools...@googlegroups.com

Hi Liana,

Could you show the first few lines of the file that represents your “bamtobed” object in R? I am not familiar with the is.valid.region() function, and thus don’t have much insight.

Apologies,

Aaron

--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bedtools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bedtools-discuss/430f6b4e-dc5e-4d30-a822-bc12789bb5ba%40googlegroups.com.

Liana Engie

unread,

Apr 12, 2020, 5:40:40 PM4/12/20

to Aaron Quinlan, bedtools...@googlegroups.com

Hello all,

Here's what my bam file looks like:

> bam[1:20,]
GAlignmentPairs object with 20 pairs, strandMode=1, and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
[1] chr1 + : 25-100 -- 120-194
[2] chr1 + : 207-282 -- 265-340
[3] chr1 + : 1024-1099 -- 1095-1170
[4] chr1 + : 1482-1556 -- 1497-1572
[5] chr1 + : 2536-2611 -- 2538-2613
... ... ... ... ... ... ...
[16] chr1 + : 4095-4169 -- 4184-4257
[17] chr1 + : 4181-4256 -- 4225-4300
[18] chr1 + : 4681-4755 -- 4997-5071
[19] chr1 + : 4687-4758 -- 4727-4802
[20] chr1 + : 4751-4826 -- 4864-4939
-------
seqinfo: 1061 sequences from an unspecified genome

I've converted it to a bed file manually (constructing a dataframe) or with bedtools in the console.

> bam2bed <- fread("bedtools_converted_bam2bed.bed",data.table=TRUE)
> head(bam2bed)
chr start end name score strand
1: chr1 24 100 NS500449:173:HH7L3BGXX:3:21504:14989:20106/1 255 +
2: chr1 119 194 NS500449:173:HH7L3BGXX:3:21504:14989:20106/2 255 -
3: chr1 206 282 NS500449:173:HH7L3BGXX:2:21210:13252:12427/1 255 +
4: chr1 264 340 NS500449:173:HH7L3BGXX:2:21210:13252:12427/2 255 -
5: chr1 1023 1099 NS500449:173:HH7L3BGXX:1:12210:20287:2447/1 255 +
6: chr1 1094 1170 NS500449:173:HH7L3BGXX:1:12210:20287:2447/2 255 -

This has issues with is.valid.region(), which I started to pay more attention to after the intersect began giving me problems. I had also tried simplifying it to a 6 column or 4 column bed and removing all non "chr" prefixed lines. The files are sorted and I should have plenty of memory allocated from my computer cluster.

V1 V2 V3 V4 V5 V6
1: chr 1 24 194 0 0 +
2: chr 1 206 340 0 0 +
3: chr 1 1023 1170 0 0 +
4: chr 1 1481 1572 0 0 +
5: chr 1 2535 3075 0 0 +
6: chr 1 3162 3441 0 0 +

V1 V2 V3 V4
1451 chr1 11873 14409 NR_046018
1448 chr1 14361 29370 NR_024540
1443 chr1 17368 17436 NR_107063
1444 chr1 17368 17436 NR_128720
1446 chr1 17368 17436 NR_107062
1447 chr1 17368 17436 NR_106918

If I take even this 6 line bed and intersect it with another 6 line bed, I get a segfault. Either of these:

bedtools intersect -b 100linetest.bed 100linetest.bed -S -sorted > testresult.bed

bedtools intersect -b onlychr1.bed -S -sorted > onlychr1int.bed

Results in a segfault

# Segmentation fault (core dumped)

Liana

Milind Chauhan

unread,

Jan 31, 2023, 1:49:40 PM1/31/23

to bedtools-discuss

Hi,

I am facing the same problem using bedtools. Did you find the solution to this?

Reply all

Reply to author

Forward