featureCounts: using --largestOverlap


Kira Mourao

Mar 23, 2016, 1:47:02 PM
to Subread
Hi,

I'm trying to use featureCounts with the --largestOverlap option, but I'm not seeing the difference I was expecting, so I think I may be using it incorrectly. I'm using featureCounts 1.5.0-p1 and calling:

featureCounts -s 2 -p -f -t exon --largestOverlap -a mygtffile.gtf -o counts.txt fwd.bam rev.bam


The manual says "If specified, reads (or fragments) will be assigned to the target that has the largest number of overlapping bases." so I was expecting that a read overlapping more than one exon would be assigned to the exon it overlaps most. My data has reads overlapping multiple exons, but while my counts are higher with --largestOverlap, a lot of the overlapping reads are not counted.


What is the intended behaviour? Do I need to set additional options to use it?


Thanks

Kira

Wei Shi

Mar 23, 2016, 6:37:19 PM
to Subread
Hi Kira,

I cannot see anything you did wrong. If a read has the same number of overlapping bases with two exons, it will still not be counted. I think that is the reason why you still got a lot of unassigned overlapping reads.
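To make the tie rule concrete, here is a minimal Python sketch of the behaviour described above. It is not the featureCounts source: the function names, the interval representation (1-based, inclusive) and the exon names are assumptions made for illustration only.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two 1-based, inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def assign_largest_overlap(read, exons):
    """Return the single exon with the largest overlap.

    Returns None when the read overlaps nothing, or when two or more
    exons tie for the largest overlap (the read stays unassigned).
    """
    scores = {name: overlap(read[0], read[1], start, end)
              for name, (start, end) in exons.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return None
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical exons; exonA_dup has the same coordinates as exonA,
# as happens when the same exon is listed under several transcripts.
exons = {"exonA": (100, 200), "exonB": (180, 300), "exonA_dup": (100, 200)}

tied = assign_largest_overlap((150, 190), exons)    # exonA and exonA_dup tie
unique = assign_largest_overlap((190, 260), exons)  # exonB overlaps most
print(tied, unique)
```

In this sketch the first read is left unassigned because two exons share the largest overlap, while the second read goes to the one exon it overlaps most.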

You may take a look at the 'counts.txt.summary' file to see how many reads were unassigned due to ambiguity.
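A small Python sketch for pulling that number out of the summary file follows. It assumes the summary is a tab-delimited table with a `Status` header row, one column per BAM file, and a row named `Unassigned_Ambiguity`; the helper name `read_summary` and the numbers in the example are made up for illustration.

```python
import csv
import io


def read_summary(handle):
    """Parse a featureCounts-style summary into {status: {sample: count}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)        # e.g. ["Status", "fwd.bam", "rev.bam"]
    samples = header[1:]
    table = {}
    for row in reader:
        if not row:              # skip blank lines
            continue
        table[row[0]] = dict(zip(samples, (int(v) for v in row[1:])))
    return table


# Made-up example content, not real output.
example = (
    "Status\tfwd.bam\trev.bam\n"
    "Assigned\t900\t850\n"
    "Unassigned_Ambiguity\t100\t150\n"
)
summary = read_summary(io.StringIO(example))
ambiguous = summary["Unassigned_Ambiguity"]
print(ambiguous)
```

For a real run, the same function could be called with `open("counts.txt.summary")` in place of the in-memory example.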

Cheers,
Wei

Kira Mourao

Mar 29, 2016, 3:13:18 AM
to Subread
Hi Wei

Thanks, that does explain what I'm seeing: reads with the same number of overlapping bases were not being counted, which in particular means that exons with the same start and end positions (as in multiple transcripts) always have 0 counts. 

In the scenario I'm working with, this behaviour is not very helpful, and at the moment I've modified my local version of the featureCounts code to assign such reads randomly to one of the exons which is overlapped. While this is an approximation, it's better than 0, as it means I can decide afterwards what to do about reads in overlapping exons, whereas if the reads are not counted I can't do anything.
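The random tie-break described above could look roughly like the following Python sketch. This is not the actual patch (which modifies the featureCounts source code); the function names and exon names are hypothetical, and intervals are assumed to be 1-based and inclusive.

```python
import random


def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two 1-based, inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def assign_with_random_ties(read, exons, rng):
    """Assign a read to exactly one exon with the largest overlap.

    Ties are broken by picking one of the tied exons at random, so a
    read is never counted more than once. Returns None if the read
    overlaps no exon at all.
    """
    scores = {name: overlap(read[0], read[1], start, end)
              for name, (start, end) in exons.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return None
    winners = sorted(name for name, score in scores.items() if score == best)
    return rng.choice(winners)


# Hypothetical exons: exonA and exonA_dup share start and end positions.
exons = {"exonA": (100, 200), "exonA_dup": (100, 200), "exonB": (180, 300)}
rng = random.Random(0)  # fixed seed so the assignment is reproducible

counts = {name: 0 for name in exons}
reads = [(150, 190)] * 100  # 100 identical reads that tie between the duplicates
for read in reads:
    counts[assign_with_random_ties(read, exons, rng)] += 1
print(counts)
```

Because every read lands on exactly one exon, the counts of the two identical exons add up to the number of tied reads, which is what makes summing them afterwards meaningful.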

Is it possible to make it an option of --largestOverlap to also assign reads which overlap with the same number of bases in this way?

Cheers
Kira

Wei Shi

Mar 29, 2016, 11:25:18 PM
to Subread
Hi Kira,

Would the '-O' option be helpful for what you are trying to do? With this option, a read will be assigned to all its overlapping features and you can decide what to do with those reads overlapping multiple exons afterwards as well.

Cheers,
Wei

Kira Mourao

Mar 30, 2016, 7:39:40 AM
to Subread
Hi Wei

No, not really - what I'm doing means I know each read has only been assigned to one exon, so, for example, I can sum across all the exons which overlap exactly to get a count. With -O I don't know how many times each read has been assigned, and so I can't factor out the multiple counts. AFAIK I would have to use -R and go through the assignment of each read one by one, which rather negates the point of using featureCounts - it's faster and simpler for me to assign those reads to one exon only.
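As an illustration of the summing step, here is a Python sketch that adds up feature-level counts for exons with identical coordinates. It assumes a tab-delimited counts file produced with -f, with a leading comment line starting with `#` and the columns `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length` followed by one column per BAM file; the function name and example rows are made up.

```python
import csv
import io
from collections import defaultdict

FIXED_COLUMNS = {"Geneid", "Chr", "Start", "End", "Strand", "Length"}


def sum_identical_exons(handle):
    """Sum counts over rows sharing the same (Chr, Start, End, Strand)."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    totals = defaultdict(lambda: defaultdict(int))
    for row in reader:
        key = (row["Chr"], row["Start"], row["End"], row["Strand"])
        for column, value in row.items():
            if column not in FIXED_COLUMNS:   # remaining columns are samples
                totals[key][column] += int(value)
    return {key: dict(per_sample) for key, per_sample in totals.items()}


# Made-up example: the first two rows are the same exon listed twice.
example = (
    "# comment line written at the top of the counts file\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tfwd.bam\n"
    "geneX\tchr1\t100\t200\t+\t101\t7\n"
    "geneX\tchr1\t100\t200\t+\t101\t5\n"
    "geneX\tchr1\t180\t300\t+\t121\t9\n"
)
totals = sum_identical_exons(io.StringIO(example))
print(totals)
```

The sum is only a valid read count if each read was assigned to a single exon; with -O the same read may contribute to several rows, so the total would overcount.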

Cheers
Kira 