Please describe the featureCounts summary

Kamil Slowikowski

Dec 18, 2014, 10:41:01 AM
to sub...@googlegroups.com, Maria Gutierrez-Arcelus, Harm-Jan Westra

Could I ask you to please describe each row in the featureCounts summary, or correct me if my understanding is incorrect?

The user’s guide does not explain it, so I’m trying to interpret what you’ve described in the paper. However, some terms such as “nonjunction” are not mentioned in the paper.

For example:

Status                      SRR445718.bam
Assigned                    17983705
Unassigned_Ambiguity        1796487
Unassigned_MultiMapping     0
Unassigned_NoFeatures       2755404
Unassigned_Unmapped         8387602
Unassigned_MappingQuality   2020467
Unassigned_FragementLength  0
Unassigned_Chimera          0
Unassigned_Secondary        0
Unassigned_Nonjunction      0
Unassigned_Duplicate        0
  • Assigned:
    • The read (or fragment) was assigned to a gene feature in the annotation file provided with option -a
  • Ambiguity:
    • Section 5.3 of the paper. The fragment might originate from gene A or gene B, and it is not clear which gene it originated from.
  • MultiMapping:
    • The fragment maps to multiple different positions.
    • Will a read with multiple alignments be assigned or unassigned if I use the --primary option?
  • NoFeatures:
    • The fragment mapped to a region that is not annotated in the annotation file.
  • Unmapped:
    • The fragment is not mapped to the reference at all.
  • MappingQuality:
    • The fragment’s mapping quality is below the threshold I set with option -Q.
  • FragementLength:
    • The insert size between the two read mates is outside the range set with options -d (minimum) and -D (maximum).
  • Chimera:
    • ??? I’m guessing that the fragment’s mates are mapped to different chromosomes.
  • Secondary:
    • ???
  • Nonjunction:
    • ???
  • Duplicate:
    • The fragment is duplicated in the data, so it was not assigned.
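To turn a summary like the one above into per-category percentages (for example, to see what share of reads is Assigned versus Unassigned_NoFeatures), a short Python sketch can parse the table. It assumes the layout shown here: a `Status` header line, then one category per line with one whitespace-separated count column per BAM file. It also assumes every read or fragment falls into exactly one category, so a column's sum is the total processed. `parse_summary` and `category_fractions` are hypothetical helper names, not part of Subread.

```python
# Sketch: parse a featureCounts summary table and compute category shares.
# Assumptions (not taken from the Subread docs): the first line is a header
# ("Status" plus one column per BAM file), every other line is
# "<category> <count> ...", fields are whitespace-separated, and sample
# names contain no spaces. Helper names are hypothetical.

EXAMPLE = """\
Status  SRR445718.bam
Assigned        17983705
Unassigned_Ambiguity    1796487
Unassigned_MultiMapping 0
Unassigned_NoFeatures   2755404
Unassigned_Unmapped     8387602
Unassigned_MappingQuality       2020467
Unassigned_FragementLength      0
Unassigned_Chimera      0
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_Duplicate    0
"""


def parse_summary(text):
    """Return {sample_name: {category: count}} from summary text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    samples = header[1:]  # header[0] is the "Status" label
    table = {sample: {} for sample in samples}
    for row in body:
        category, counts = row[0], row[1:]
        for sample, count in zip(samples, counts):
            table[sample][category] = int(count)
    return table


def category_fractions(counts):
    """Share of each category, assuming every read lands in exactly one."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    for sample, counts in parse_summary(EXAMPLE).items():
        for category, share in category_fractions(counts).items():
            print(f"{sample}\t{category}\t{share:.1%}")
```

Parsing on whitespace rather than strictly on tabs tolerates tables that were pasted into an email, as above, at the cost of not supporting sample names that contain spaces.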

Wei Shi

Jan 14, 2015, 5:31:58 PM
to sub...@googlegroups.com, maria...@gmail.com, westra....@outlook.com
Apologies for my late reply. Your explanations are mostly correct. Below are my answers to your questions:

- Multi-mapping

When --primary is specified, the -M option will be ignored, meaning that a primary alignment will always be counted regardless of whether the read is multi-mapped.

- Chimera

Your explanation is correct.

- Secondary

Not assigned because the alignment was marked as a secondary alignment in the FLAG field. The '--primary' option needs to be turned on to disable counting of such alignments.

- Nonjunction

Not assigned because the read is an exon-spanning read. This is related to the option '--countSplitAlignmentsOnly'.

Hope this helps.

Wei
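Taken together, the multi-mapping and secondary-alignment rules in the reply above can be read as a small decision function. The Python sketch below is a simplified model under stated assumptions, not featureCounts source code: `multimapping_status` is a hypothetical name, `flag` is the SAM FLAG field (bit 0x4 = unmapped, bit 0x100 = secondary alignment), `nh` is the value of the NH tag, and the two booleans stand in for --primary and -M. Feature overlap and the other filters (-Q, chimera, fragment length) are deliberately left out, so "Eligible" only means the alignment passes these particular checks.

```python
# Simplified model of the --primary / -M rules described in the reply above.
# Hypothetical helper, not featureCounts source code. Assumptions:
#   flag: SAM FLAG field (0x4 = unmapped, 0x100 = secondary alignment)
#   nh: value of the NH tag (number of reported alignments for the read)
#   primary_only: stands in for --primary; count_multi: stands in for -M
# Overlap with features and all other filters are ignored here.

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100


def multimapping_status(flag, nh, primary_only, count_multi):
    """Return the summary category for one alignment, or "Eligible"."""
    if flag & FLAG_UNMAPPED:
        return "Unassigned_Unmapped"
    if primary_only:
        # With --primary, -M is ignored: secondary alignments are dropped,
        # and the primary alignment is kept even if the read is multi-mapped.
        if flag & FLAG_SECONDARY:
            return "Unassigned_Secondary"
        return "Eligible"
    # Without --primary, multi-mapped reads are kept only when -M is given.
    if nh > 1 and not count_multi:
        return "Unassigned_MultiMapping"
    return "Eligible"
```

For example, a secondary alignment of a read with NH:i:3 would be "Unassigned_Secondary" with --primary, "Unassigned_MultiMapping" with neither option, and "Eligible" with -M alone.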

John Blischak

Feb 2, 2015, 1:14:01 PM
to sub...@googlegroups.com, maria...@gmail.com, westra....@outlook.com
Hi Wei,

Is there a plan to incorporate these explanations into the documentation? I recently helped my labmate start using featureCounts, and when she asked for a description of the summary categories, I realized that this thread was the best thing I could send her. featureCounts is such a great program; I wouldn't want anyone to avoid it because they cannot understand the results. I am willing to help add this to the documentation if you can point me to where the code is hosted (e.g. GitHub, Bitbucket, etc.).

Thanks,

John

Kamil Slowikowski

Feb 2, 2015, 1:43:13 PM
to sub...@googlegroups.com

Hi John,

I asked Wei about contributing. This was his reply:

I’m not sure if it is a good idea to allow other people to make contributions to our package at the moment, since the package includes quite a few programs and it has a complex structure. We might move the code repository to, for example, GitHub in the future, but at this stage we would like to keep it to ourselves to ensure a smooth development of the programs (especially new programs and algorithms).

I believe that source code for scientific software — regardless of complexity — should be stored in a permanent public repository that encourages contributions from the community.

I am also willing to help implement additional features or write more documentation. GitHub is an appropriate solution for managing contributions from the community.

Wei, I encourage you to look at the way other complex packages with multiple programs are organized on GitHub:

You might consider creating a separate GitHub repo with the R package for Subread.

Kamil

John Blischak

Feb 26, 2015, 2:13:02 PM
to sub...@googlegroups.com
Hi Kamil and Wei,


On Mon, Feb 2, 2015 at 12:43 PM, Kamil Slowikowski wrote:
> I asked Wei about contributing. This was his reply:
>
> I’m not sure if it is a good idea to allow other people to make
> contributions to our package at the moment, since the package includes quite
> a few programs and it has a complex structure. We might move the code
> repository to, for example, GitHub in the future, but at this stage we would
> like to keep it to ourselves to ensure a smooth development of the programs
> (especially new programs and algorithms).

Putting the code on GitHub will not hurt the development. Git is a
distributed version control system. Thus no one can make changes to
the main codebase without the approval of the owner of the code
repository. See the link below for how development with distributed
version control systems, e.g. Git, differs from development with
centralized systems like Subversion:

http://git-scm.com/book/en/v2/Getting-Started-About-Version-Control


> I believe that source code for scientific software — regardless of
> complexity — should be stored in a permanent public repository that
> encourages contributions from the community.

100% agree.


> You might consider creating a separate GitHub repo with the R package for
> Subread.

Bioconductor has support for this. You can allow others to help you
develop the software using Git/GitHub, and you can continue to develop
the software as you have been using Subversion.

http://bioconductor.org/developers/how-to/git-svn/

Subread is a great tool that many users find very useful. Wei, please
consider letting us help you improve it.

Best,

John

Wei Shi

Apr 2, 2015, 1:36:20 AM
to sub...@googlegroups.com, maria...@gmail.com, westra....@outlook.com
Hi John,

Just to let you know, I have added more documentation for featureCounts in the latest release of the Subread package (1.4.6-p2).

Best wishes,
Wei

Marcelo Laia

Apr 21, 2020, 3:09:58 PM
to Subread
Hi!

I'm having trouble understanding the featureCounts summary (the stat slot) and found this thread.

In the Rsubread/Subread Users Guide (Rsubread v2.0.0/Subread v2.0.0, 21 October 2019), downloaded from the Bioconductor web page, I found the following in section 6.2.9 (Program output), pages 36-37:

Unassigned Unmapped: unmapped reads cannot be assigned.
Unassigned NoFeatures: alignments that do not overlap any feature.

I'd like to know the difference between these two outputs.

In Kamil's message, the descriptions are somewhat different:

Unassigned Unmapped: The fragment is not mapped to the reference at all.
Unassigned NoFeatures: The fragment mapped to a region that is not annotated in the annotation file.
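The difference between the two definitions comes down to the order of the checks: an unmapped read never reaches the annotation, whereas a mapped read only becomes Unassigned_NoFeatures after it has been compared against the annotated features and overlapped none of them. The Python sketch below illustrates this for single-end reads under simplifying assumptions: 1-based inclusive coordinates, an overlap of at least one base, counting at gene level without -O (so a read overlapping two different genes is Unassigned_Ambiguity), and strand ignored. `Feature` and `overlap_status` are hypothetical names, not part of Subread.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One annotated feature (e.g. an exon) belonging to a gene."""
    gene_id: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def overlap_status(mapped, chrom, start, end, features):
    """Simplified, hypothetical model of how one read ends up in a category.

    Assumptions: single-end read, strand ignored, >= 1 base overlap,
    gene-level counting without -O.
    """
    if not mapped:
        # Never compared with the annotation at all.
        return "Unassigned_Unmapped"
    genes = {
        f.gene_id
        for f in features
        if f.chrom == chrom and f.start <= end and start <= f.end
    }
    if not genes:
        # Mapped, but lies entirely outside every annotated feature.
        return "Unassigned_NoFeatures"
    if len(genes) > 1:
        # Could have come from more than one gene.
        return "Unassigned_Ambiguity"
    return "Assigned"


if __name__ == "__main__":
    features = [
        Feature("geneA", "chr1", 100, 200),
        Feature("geneB", "chr1", 180, 300),
    ]
    print(overlap_status(False, "", 0, 0, features))
    print(overlap_status(True, "chr1", 10, 50, features))
    print(overlap_status(True, "chr1", 110, 150, features))
    print(overlap_status(True, "chr1", 185, 195, features))
```

In this model, a large Unassigned_NoFeatures share means many reads aligned to the genome but fell outside the features in the annotation (for example in introns or intergenic regions), which is a different situation from reads that failed to align.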

What is the explanation for these two categories? I need to explain the difference in a short talk. Is there a drawing or schematic slide that shows the differences?

In my case, about 50% of all reads are Unassigned NoFeatures. What could I do in the downstream analysis?

Thank you very much!

Marcelo

Marcelo Laia

Apr 21, 2020, 5:12:38 PM
to Subread
Dear All!

I found this figure (https://www.mathworks.com/help/bioinfo/ref/featurecount_overlapmethod.png) on this site https://www.mathworks.com/help/bioinfo/ref/featurecount.html.
  
It helped me a lot.
  
Thank you!
  
Marcelo