featureCounts with Unassigned_Ambiguity and Unassigned_MultiMapping, and differential expression analysis

setar...@gmail.com

Feb 10, 2017, 10:16:13 AM
to Subread
Hi everybody,

I used STAR to map reads against the genome and then ran featureCounts on the sorted BAM file. According to the output, only 30% of the reads were assigned, and the summary also reports:

Unassigned_Ambiguity 10548803
Unassigned_MultiMapping 4103060

I know that with "-M" and "-O" featureCounts will count the reads above, but I don't know how accurate the resulting counts are. I'm going to do differential expression (DE) analysis at both the transcript and the gene level with edgeR, using the counts generated by featureCounts. I think that without -M and -O we may lose a lot of data, which matters for DE analysis, where the read count reflects the expression level. Could you please advise me on this issue?


Thanks in advance

Wei Shi

Feb 21, 2017, 5:44:18 PM
to Subread
Apologies for my slow response.

Is your 30% assignment rate from assigning reads to transcripts or from assigning reads to genes? That percentage would be low if you were assigning reads to genes in the mouse or human genome.

Your counting results will be less accurate if you include reads that are multi-mapping or overlap with more than one feature. So I normally wouldn't recommend including such reads in your counting. However, you may consider counting such reads in a fractional manner (see '-f' option).
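
As a rough illustration of what counting "in a fractional manner" means, the sketch below splits each read's single count across its reported alignments and across the features each alignment overlaps. It is a toy model, not featureCounts code: the function name `fractional_counts` and the input layout are invented for illustration. As far as the featureCounts help text goes, fractional counting is switched on with `--fraction` together with `-M` and/or `-O`, whereas `-f` selects feature-level summarization, so it is worth checking `featureCounts -h` for the installed version.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(alignments):
    """Toy model of fractional counting (not the featureCounts implementation).

    `alignments` holds one tuple per reported alignment:
    (number of alignments reported for that read, features the alignment overlaps).
    Each alignment contributes 1 / (n_alignments * n_features) to every feature
    it overlaps, so one read never adds more than 1 in total.
    """
    counts = defaultdict(Fraction)
    for n_alignments, features in alignments:
        if not features:
            continue  # this alignment overlaps no feature
        share = Fraction(1, n_alignments * len(features))
        for feature in features:
            counts[feature] += share
    return dict(counts)


# Hypothetical reads: identifiers and layout are made up for the example.
alignments = [
    (1, ["tx1"]),         # uniquely mapped, overlaps one feature
    (1, ["tx1", "tx2"]),  # uniquely mapped, overlaps two features
    (2, ["tx2"]),         # first alignment of a read that maps twice
    (2, ["tx3"]),         # second alignment of the same read
]
for feature, count in sorted(fractional_counts(alignments).items()):
    print(feature, float(count))
```

The resulting counts are no longer integers, which matters for downstream tools that expect raw integer counts.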

Hope this helps.

Wei

setar...@gmail.com

Feb 24, 2017, 5:44:47 AM
to Subread
Hi Wei,

Thank you for your response. You asked whether my 30% assignment rate comes from assigning reads to transcripts or to genes. It comes from assigning reads to transcripts in a plant genome (Hordeum vulgare). Could you please tell me whether 30% is acceptable, so that I can go ahead with the study?

I used the following command:
./featureCounts map1Aligned.sortedByCoord.out.bam -T 4 -a annot.gtf -t transcript -g transcript_id -o counts_1.txt -R
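
featureCounts writes a tab-delimited summary next to the count table (for the command above that would be `counts_1.txt.summary`). A minimal sketch for turning that summary into percentages, assuming the usual layout of a `Status` header row followed by one row per category; the helper name `assignment_rates` and the numbers in the example are hypothetical:

```python
import csv
import io


def assignment_rates(summary_text):
    """Return {status: percent of all reads} for the first sample column."""
    rows = csv.reader(io.StringIO(summary_text), delimiter="\t")
    next(rows)  # skip the header line: "Status<TAB><BAM file name>"
    counts = {row[0]: int(row[1]) for row in rows if row}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: 100.0 * n / total for status, n in counts.items()}


# Made-up numbers, only to show the call; they are not from this data set.
example = (
    "Status\tsample.bam\n"
    "Assigned\t30\n"
    "Unassigned_Ambiguity\t50\n"
    "Unassigned_MultiMapping\t20\n"
)
rates = assignment_rates(example)
print(f"{rates['Assigned']:.1f}% assigned")
```

With a real run, the text would come from the file instead, for example `assignment_rates(open("counts_1.txt.summary").read())`.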

Regarding your suggestion to use "-f": the manual says it is for read summarization at the feature level (e.g. exon level), so could you please let me know whether I should use it at the transcript level, too?


Thanks in advance

Wei Shi

Feb 26, 2017, 5:22:47 PM
to Subread
In the human and mouse genomes, transcripts from the same gene often overlap each other substantially. When you assign reads to these transcripts, it is hard to determine which transcript a read originates from if the read overlaps more than one transcript. With the default settings of featureCounts, such reads are not counted, and this may result in a low percentage of reads being assigned.
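
The effect described above can be mimicked with a toy model: a read that overlaps two isoforms of one gene is ambiguous when transcripts are the counting unit, but not when the isoforms are grouped by gene. The sketch below is not featureCounts itself; `overlaps`, `assign`, the coordinates, and the identifiers are invented for illustration, and it ignores strand, exon structure, and multi-mapping.

```python
def overlaps(read, feature):
    """True if two closed intervals (start, end) share at least one base."""
    return read[0] <= feature[1] and feature[0] <= read[1]


def assign(read, transcripts, unit):
    """Assign one read to a counting unit, or report why it is unassigned.

    transcripts: list of dicts with keys gene_id, transcript_id, start, end.
    unit: "transcript_id" or "gene_id", the attribute used to group features.
    """
    hits = {t[unit] for t in transcripts if overlaps(read, (t["start"], t["end"]))}
    if not hits:
        return "Unassigned_NoFeatures"
    if len(hits) > 1:
        return "Unassigned_Ambiguity"
    return hits.pop()


# Two invented isoforms of one gene that share most of their span.
transcripts = [
    {"gene_id": "geneA", "transcript_id": "geneA.1", "start": 100, "end": 900},
    {"gene_id": "geneA", "transcript_id": "geneA.2", "start": 300, "end": 1200},
]
read = (400, 500)
print(assign(read, transcripts, "transcript_id"))  # Unassigned_Ambiguity
print(assign(read, transcripts, "gene_id"))        # geneA
```

If most genes in an annotation carry several overlapping isoforms, a large Unassigned_Ambiguity count at the transcript level is the expected outcome of this rule.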

I do not know if this is the case for the plant genome you are studying. Do transcripts from the same gene overlap each other substantially? And why do you want to count reads for transcripts instead of for genes?

Best

Wei