coverageBed and spliced reads

157 views
Skip to first unread message

Felix Schlesinger

unread,
Jun 23, 2010, 11:25:07 AM6/23/10
to bedtools-discuss
Hi,

I have a question about using bedtools to compute overlap between
genomic features and RNAseq reads.
The mapped reads are either in BAM (denoting splice junctions in the
CIGAR string) and as BED12 files (using blocks for the read
fragments). The documentation of coverageBED does not mention spliced
alignments and I am wondering how they are handled.
Specifically are the 'introns' of spliced reads considered when
considering overlap and coverage?

Also is there a way to only consider reads that fall fully within the
features to contribute coverage? IntersectBed seems close (with -f 1)
but it counts the number of BED features that overlap each BAM read,
not the other way around, when using option -c.

Thank you
Felix Schlesinger

Felix Schlesinger

unread,
Jun 23, 2010, 11:28:04 AM6/23/10
to bedtools-discuss
Just saw the older post about this topic. (I had searched for
'spliced' not for 'gapped')

Sorry
Felix

Aaron Quinlan

unread,
Jun 23, 2010, 11:35:07 AM6/23/10
to bedtools...@googlegroups.com
No problem, Felix. This is in the works, but the release is taking longer than I expected.

> Also is there a way to only consider reads that fall fully within the
> features to contribute coverage? IntersectBed seems close (with -f 1)
> but it counts the number of BED features that overlap each BAM read,
> not the other way around, when using option -

If I understand correctly, you could combine intersectBed and coverageBed as follows:

intersectBed -abam reads.bam -b target.bed -f 1.0 -bed | coverageBed -a stdin -b target.bed

This will only pass to coverageBed those alignments that are 100% within the targets. Consequently, the coverage will be computed on the targets solely from said reads. Is this what you wanted?

Best,
Aaron

Felix Schlesinger

unread,
Jun 23, 2010, 12:15:02 PM6/23/10
to bedtools-discuss
On Jun 23, 11:35 am, Aaron Quinlan <aaronquin...@gmail.com> wrote:
> > Also is there a way to only consider reads that fall fully within the
> > features to contribute coverage? IntersectBed seems close (with -f 1)
> > but it counts the number of BED features that overlap each BAM read,
> > not the other way around, when using option -
>
> If I understand correctly, you could combine intersectBed and coverageBed as follows:
> intersectBed -abam reads.bam -b target.bed -f 1.0 -bed | coverageBed -a stdin -b target.bed

Almost, but not quite I think. If some features in the bed file
overlap, the filter step will include reads that fall fully within one
feature and the coverageBed will count them also for the feature they
only partially overlap.

Felix

Aaron Quinlan

unread,
Jun 23, 2010, 1:23:40 PM6/23/10
to bedtools...@googlegroups.com
Ah, I see the issue now. Unfortunately, it is current very painful for me to change coverageBed such that it will count a read if and only if it overlaps with solely one feature. I tried to think of ways to "hack" this with intersectBed, but could not think of one. I apologize.

I believe that the htseq package (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html#count) will handle this by setting the appropriate "intersection-strict" value.

I will think more about this and consider modifying the tools to address this.

I assume you need this to avoid double-counting for a specific enrichment statistic???

Aaron

Felix Schlesinger

unread,
Jun 23, 2010, 2:38:44 PM6/23/10
to bedtools-discuss
Hi,

To compute expression from RNAseq reads I want to count the number of
fragments (i.e. 'blocks' in split alignments) that fall within each
exon. A fragment that crosses over the boundary of an exon does not
really belong to that exon. For some situations this difference can be
important.

Thanks for your work on this great set of tools. Looking forward to
the next release.

Bye
Felix
Reply all
Reply to author
Forward
0 new messages