bedtools intersect illegal characters


Schorn, Andrea

Apr 10, 2020, 1:09:38 PM
to bedtools...@googlegroups.com
Hi there,

I have this funky problem with bedtools intersect. My BED files are not in standard format (they have parentheses in columns 10 and/or 12), but some of them run fine while others that look exactly the same to me don't. In the attached examples, test_3000.bed runs fine but test_4000.bed does not. I tried bedtools v2.28.0 and v2.29.2. I understand it has to do with bedtools exiting on illegal characters and non-canonical BED formats, but why does it seem to run sometimes, and how can I make it tolerant always? It would be incredibly useful if only the first 6 columns had to be canonical/proper BED.

Thank you,
Andrea




(ngs) aschorn@bnbcompute54:~/data/WolframPGC2/CCA_tRF$ intersectBed -f 1.0 -wb -sorted -a test_3000.bed -b test_4000.bed
***** ERROR: illegal character '(' found in integer conversion of string "(1)". Exiting...

(ngs) aschorn@bnbcompute54:~/data/WolframPGC2/CCA_tRF$ intersectBed -f 1.0 -wb -sorted -a test_3000.bed -b test_3000.bed
chr16 60197616 60197652 tRNA:tRNA-Thr-ACA 230 - 21.6 0.0 0.0 (38) 38 2 chr16 60197616 60197652 tRNA:tRNA-Thr-ACA 230 - 21.6 0.0 0.0 (38) 38 2
chr16 62745065 62745121 tRNA:tRNA-Ala-GCY_ 292 + 21.1 3.4 0.0 6 64 (11) chr16 62745065 62745121 tRNA:tRNA-Ala-GCY_ 292 + 21.1 3.4 0.0 6 64 (11)
chr16 64779646 64779703 tRNA:tRNA-Ala-GCY_ 235 - 27.6 1.7 0.0 (14) 61 3 chr16 64779646 64779703 tRNA:tRNA-Ala-GCY_ 235 - 27.6 1.7 0.0 (14) 61 3
chr16 66274008 66274073 tRNA:tRNA-Ala-GCY_ 229 - 34.9 0.0 4.5 (1) 74 12 chr16 66274008 66274073 tRNA:tRNA-Ala-GCY_ 229 - 34.9 0.0 4.5 (1) 74 12
chr16 66821414 66821483 tRNA:tRNA-Ala-GCY_ 232 + 26.5 2.9 2.9 4 73 (2) chr16 66821414 66821483 tRNA:tRNA-Ala-GCY_ 232 + 26.5 2.9 2.9 4 73 (2)
chr16 66858343 66858379 tRNA:tRNA-Met_ 308 - 2.7 0.0 0.0 (38) 38 2 chr16 66858343 66858379 tRNA:tRNA-Met_ 308 - 2.7 0.0 0.0 (38) 38 2
chr16 70961173 70961230 tRNA:tRNA-Ala-GCA 252 + 29.3 0.0 0.0 4 61 (14) chr16 70961173 70961230 tRNA:tRNA-Ala-GCA 252 + 29.3 0.0 0.0 4 61 (14)
chr16 73081341 73081406 tRNA:tRNA-Ala-GCY_ 307 + 17.5 3.1 4.5 1 65 (10) chr16 73081341 73081406 tRNA:tRNA-Ala-GCY_ 307 + 17.5 3.1 4.5 1 65 (10)
chr16 74443218 74443284 tRNA:tRNA-Ala-GCY_ 242 + 35.8 0.0 0.0 1 67 (8) chr16 74443218 74443284 tRNA:tRNA-Ala-GCY_ 242 + 35.8 0.0 0.0 1 67 (8)
chr16 75434179 75434255 tRNA:tRNA-Ile-ATT 587 + 7.8 0.0 0.0 1 77 (0) chr16 75434179 75434255 tRNA:tRNA-Ile-ATT 587 + 7.8 0.0 0.0 1 77 (0)

(ngs) aschorn@bnbcompute54:~/data/WolframPGC2/CCA_tRF$ intersectBed -f 1.0 -wb -sorted -a test_4000.bed -b test_4000.bed
***** ERROR: illegal character '(' found in integer conversion of string "(1)". Exiting...

test_3000.bed
test_4000.bed

John Urban

Apr 10, 2020, 3:05:48 PM
to bedtools...@googlegroups.com
I see what you mean though... the other file has similar things that should raise an error. Tap out. I agree. An option to just check the first 3 or 6 columns would be useful -- assume all others are strings.


John Urban

Apr 10, 2020, 3:05:48 PM
to bedtools...@googlegroups.com
Well, the error is certainly coming from file test_4000.bed on line 6, column 10: value = (1). It would appear that bedtools is expecting integers in that column, based on the other values there, and "(1)" cannot be converted. If the parentheses have a meaning to you, why not add an extra column of 0s and 1s recording whether the parentheses should be added back later?
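
One way to sketch that idea, assuming the files are tab-delimited and that the parentheses only ever wrap the whole value in column 10 or column 12. The function name flag_parens and the output file name are made up for illustration; the sketch appends two flag columns (one for column 10, one for column 12) rather than a single one, so the result has 14 columns.

```shell
# Hypothetical helper: strip parentheses from columns 10 and 12 and
# record in two appended columns (1 = had parentheses, 0 = did not)
# whether they need to be restored later.
flag_parens() {
  awk 'BEGIN { FS = OFS = "\t" }
  NF >= 12 {
    f10 = ($10 ~ /^\(.*\)$/) ? 1 : 0   # was column 10 wrapped in ()?
    f12 = ($12 ~ /^\(.*\)$/) ? 1 : 0   # was column 12 wrapped in ()?
    gsub(/[()]/, "", $10)
    gsub(/[()]/, "", $12)
    print $0, f10, f12
    next
  }
  { print }                            # shorter lines pass through unchanged
  ' "$@"
}
```

Usage would look something like: flag_parens test_4000.bed > test_4000.flagged.bed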


--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bedtools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bedtools-discuss/A73B4949-D5F9-4050-8D66-E7EBC78330FF%40cshl.edu.

Aaron Quinlan

Apr 12, 2020, 10:51:35 AM
to John Urban, bedtools...@googlegroups.com
This is a special case. Your test_3000.bed happens to have exactly 12 columns, so bedtools is trying to parse the file into a proper BED12 format, where the 10th column is meant to be an integer reflecting the number of blocks in the BED interval. The parentheses violate this expectation.
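
To see which lines carry a non-integer value in column 10, something like the following could be used. This is only a rough diagnostic, assuming tab-delimited input; the function name find_nonint_col10 is hypothetical, and the check is a plain digits-only test, not a reproduction of what bedtools does internally.

```shell
# Hypothetical helper: for 12-column lines, print the line number and
# the column-10 value whenever that value is not a plain integer.
find_nonint_col10() {
  awk 'BEGIN { FS = "\t" }
  NF == 12 && $10 !~ /^[0-9]+$/ { print FNR "\t" $10 }
  ' "$@"
}
```

For example: find_nonint_col10 test_4000.bed | head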


I think your best bet is to remove the parentheses. For example:

sed 's/[)(]//g' test_3000.bed > test_3000.noparens.bed
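
The sed command above removes parentheses everywhere in the file. If they should be kept in other columns (for example in a name field), a narrower variant might look like the sketch below. It assumes tab-delimited input and touches only columns 10 and 12; the function name strip_parens_cols and the output file name are made up for illustration.

```shell
# Hypothetical helper: remove parentheses from columns 10 and 12 only,
# leaving every other column as it is.
strip_parens_cols() {
  awk 'BEGIN { FS = OFS = "\t" }
  NF >= 12 {
    gsub(/[()]/, "", $10)
    gsub(/[()]/, "", $12)
  }
  { print }
  ' "$@"
}
```

Usage would be along the lines of: strip_parens_cols test_4000.bed > test_4000.noparens.bed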

Best,
Aaron


