Aaron,
Sorry it took a while to reply, it took a while for me to work through
this topic and get time to respond. Many thanks for your promptness,
in fact I did try the -bed12 option the day of your response. The
trouble was that bed4 (or any bed <12) doesn't have aggregated
features, so I basically had to write a script to aggregate multiple
rows of bed4 together into singular rows in bed12 ... ultimately
describing a pseudo-feature which was the total length of the aligned
segments and the gaps between them. It worked, but feels like a
kludge. Obviously it's better to start with something already having
the aggregated feature information, in my case I tend to use PSL since
I have a smooth funnel from PSL to sequence-SAM, to sequence-BAM. And
I like saying BAM!
But this workflow is still good, and needed, in cases where I've used
BEDtools to merge two files together, then aggregate a new set of
feature segments. I don't know how one could automate that process,
since the aggregation step seems situation-dependent. I could
*envision* a way, but that's another topic. :-) Anyway, I wrote a
script which does bed4ToBed12, it just needs to know how to join
things.
The other issue I encountered was logistical wrt my downstream use in
GBrowse 2. It may be similar for other genome browsers, which is why
I mention it. If my newly-aggregated bed12 feature is too huge,
performance is effectively lost when using BAM. I suspect it
supercedes the point of using BAM files to have features 500 kb long.
The only suggestion I can think of is similar to the maxIntron flags
used in other toolsets, since most of these cases have just one
ginormous gap. Then again, it could just be a GBrowse 2 issue.
If I may ask, what do you do with multi-segment features? I love
BEDtools, so I find myself encoding information into column 4 "name"
field to keep track of exons and multi-segment features, like
"exon_1", "exon_2", etc.
I appreciate your proactiveness!
Best,
James