featureCounts multi-map and multi-overlap modes

Camille Daniels

Oct 12, 2014, 8:40:24 AM
to sub...@googlegroups.com
Hello,

I've run featureCounts (subread-1.4.5-p1-Linux-x86_64) on a BAM file containing paired-end DNA reads mapped to an annotated bacterial genome.

Total fragments: 832927

Running

./featureCounts -T 6 -p -B -M -R -t CDS -g ID -a bact.gff -o s1_CDS_multimap.txt s1_sortcoord.bam

produces

Assigned    714609
Unassigned_Ambiguity    97850
Unassigned_MultiMapping    0
Unassigned_NoFeatures    20468
Unassigned_Unmapped    0
Unassigned_MappingQuality    0
Unassigned_FragementLength    0
Unassigned_Chimera    0
Unassigned_Secondary    0
Unassigned_Nonjunction    0
Unassigned_Duplicate    0


and the sum of counts in the output txt file is 714609.
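
For anyone wanting to reproduce that sum, here is a minimal Python sketch. It assumes the usual featureCounts table layout: a comment line starting with '#', a header line starting with "Geneid", and the counts for a single BAM file in the seventh tab-separated column. The function name sum_counts and the two example rows are made up for illustration and are not taken from the real output file.

```python
import io

def sum_counts(handle, count_column=6):
    """Sum the count column of a featureCounts-style table.

    Assumes tab-separated rows with counts in column index 6
    (Geneid, Chr, Start, End, Strand, Length, then one sample).
    """
    total = 0
    for line in handle:
        # Skip blank lines, the '#' comment line and the header line.
        if not line.strip() or line.startswith("#") or line.startswith("Geneid"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += int(fields[count_column])
    return total

# Made-up two-row table standing in for s1_CDS_multimap.txt.
example = io.StringIO(
    "# Program:featureCounts (illustrative header only)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1_sortcoord.bam\n"
    "cds1\tchr\t1\t900\t+\t900\t12\n"
    "cds2\tchr\t950\t1500\t-\t551\t30\n"
)
example_total = sum_counts(example)
print(example_total)
```

On a real file, the same function could be called as sum_counts(open("s1_CDS_multimap.txt")).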

When I add the -O option to the command above to also count multi-overlapping reads, the sum of counts in the output file no longer matches the total assigned (see below).
Is this correct? Are the counts inflated by reads that overlap more than one meta-feature?

Assigned    812459
Unassigned_Ambiguity    0
Unassigned_MultiMapping    0
Unassigned_NoFeatures    20468
Unassigned_Unmapped    0
Unassigned_MappingQuality    0
Unassigned_FragementLength    0
Unassigned_Chimera    0
Unassigned_Secondary    0
Unassigned_Nonjunction    0
Unassigned_Duplicate    0


Sum of counts in txt file: 1,098,102

Wei Shi

Oct 12, 2014, 9:29:21 PM
to sub...@googlegroups.com
Dear Camille Daniels,

Yes, this is expected. With the '-O' option, each read is assigned to every feature it overlaps, so the total number of counts will be equal to or greater than the number of assigned reads. In your case, with '-O' turned on, 812459 reads were successfully assigned, but the total number of counts (1,098,102) is greater because quite a few reads were counted more than once.
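
A toy model may make the difference concrete. The Python sketch below is not featureCounts code; it only mimics the assignment rule described above on five made-up fragments. The function name count_fragments, the gene names and the fragments are all hypothetical.

```python
from collections import Counter

def count_fragments(overlaps, allow_multi_overlap):
    """Toy model of the assignment rule described in this thread.

    overlaps: one list of overlapped feature IDs per fragment.
    allow_multi_overlap: True mimics running with -O.
    """
    counts = Counter()   # per-feature counts (the output table)
    summary = Counter()  # per-fragment status (the summary file)
    for features in overlaps:
        unique = set(features)
        if not unique:
            summary["Unassigned_NoFeatures"] += 1
        elif len(unique) == 1 or allow_multi_overlap:
            # The fragment is "Assigned" once, but it adds one
            # count to every feature it overlaps.
            summary["Assigned"] += 1
            for feature in unique:
                counts[feature] += 1
        else:
            summary["Unassigned_Ambiguity"] += 1
    return counts, summary

# Five made-up fragments: two unambiguous, two ambiguous, one with no feature.
toy = [["geneA"], ["geneA", "geneB"], ["geneB"], [], ["geneA", "geneB", "geneC"]]

for multi_overlap in (False, True):
    counts, summary = count_fragments(toy, multi_overlap)
    print(multi_overlap, dict(summary), "sum of counts =", sum(counts.values()))
```

Without multi-overlap the sum of counts equals the number of assigned fragments (2 and 2). With it, 4 fragments are assigned but the counts sum to 7, because the two ambiguous fragments contribute 2 and 3 counts respectively.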

In your first run of featureCounts summarization, where '-O' was not used, 97850 reads were found to overlap more than one gene, and these reads were not assigned due to assignment ambiguity. 714609 reads were successfully assigned, each to exactly one gene. These reads plus the 97850 unassigned reads make up the 812459 reads (714609 + 97850 = 812459) that were assigned in your second run, where '-O' was turned on.
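
The numbers posted in this thread can be cross-checked with a few lines of Python. This is only a bookkeeping sketch using the figures quoted above; the variable names are arbitrary, and the per-fragment average assumes that the 714609 unambiguous fragments still contribute exactly one count each when '-O' is on.

```python
total_fragments = 832927

# Summary values from the run without -O.
run1 = {"Assigned": 714609, "Unassigned_Ambiguity": 97850, "Unassigned_NoFeatures": 20468}
# Summary values from the run with -O.
run2 = {"Assigned": 812459, "Unassigned_Ambiguity": 0, "Unassigned_NoFeatures": 20468}
count_sum_with_O = 1098102

# Every fragment falls into exactly one summary category in each run.
assert sum(run1.values()) == total_fragments
assert sum(run2.values()) == total_fragments

# The formerly ambiguous fragments become assigned once -O is on.
assert run1["Assigned"] + run1["Unassigned_Ambiguity"] == run2["Assigned"]

# Counts beyond one-per-assigned-fragment come from multi-overlap fragments.
extra_counts = count_sum_with_O - run2["Assigned"]

# Counts contributed by the 97850 multi-overlap fragments, and their mean.
multi_overlap_counts = count_sum_with_O - run1["Assigned"]
mean_features = multi_overlap_counts / run1["Unassigned_Ambiguity"]

print("extra counts:", extra_counts)
print("mean features per multi-overlap fragment: %.2f" % mean_features)
```

Under that assumption, each multi-overlap fragment was counted for roughly four features on average, which accounts for the gap between 812459 assigned reads and 1,098,102 counts.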

Hope this makes it clear.

Best regards,

Wei

Camille Daniels

Oct 13, 2014, 9:41:21 AM
to sub...@googlegroups.com
Wei,

Yes, that helps!

Thank you!

Camille