Hello,
I am new to RNAseq analysis and this is my first time using featureCounts. I have some sequences (PE100 from Illumina HiSeq4000), prepared with the TruSeq stranded mRNA kit and want to do differential expression analysis. I aligned my reads to the genome using TopHat2 (consistently around 75% alignment to genome), and am using the resulting accepted_hits.bam as my input into featureCounts.
This is my command for featurecounts (subread-1.5.0):
featureCounts -T 8 \
-p -F GTF -t exon -g Parent -s 2 \
-a Csinensis_154_gene.gtf \
-G Csinensis_154.fa \
-o NC1-8_featurecounts_output.txt \
accepted_hits.bam
My output shows very low successfully assigned fragments- around 20%. I also have changed -B and -M settings, but neither provide much improvement. -s 2 is correct over -s 1. Example output:
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| P accepted_hits_NC1-8_R1.bam ||
|| ||
|| Output file : NC1-8_featurecounts_output.txt ||
|| Summary : NC1-8_featurecounts_output.txt.summary ||
|| Annotation : /home/ubuntu/workspace/rnaseq/phyto_genome/v ... ||
|| ||
|| Threads : 8 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Strand specific : reversely stranded ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| ||
|| Chimeric reads : counted ||
|| Both ends mapped : not required ||
|| ||
//================================= Running ==================================\\
|| ||
|| Load annotation file /home/ubuntu/workspace/rnaseq/phyto_genome/versio ... ||
|| Features : 259657 ||
|| Meta-features : 34329 ||
|| Chromosomes/contigs : 1750 ||
|| ||
|| Loading FASTA contigs : /home/ubuntu/workspace/rnaseq/phyto_genome/ver ... ||
|| 12574 contigs were loaded ||
|| ||
|| Process BAM file accepted_hits_NC1-8_R1.bam... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| ||
|| WARNING: reads from the same pair were found not adjacent to each ||
|| other in the input (due to read sorting by location or ||
|| reporting of multi-mapping read pairs). ||
|| ||
|| Read re-ordering is performed. ||
|| ||
|| Total fragments : 34002467 ||
|| Successfully assigned fragments : 7239403 (21.3%) ||
|| Running time : 0.89 minutes ||
|| ||
|| Read assignment finished. ||
|| ||
It seems like this is due to many ambiguous reads:
Status accepted_hits_NC1-8_R1.bam
Assigned 7239403
Unassigned_Ambiguity 18514271
Unassigned_MultiMapping 576445
Unassigned_NoFeatures 7672348
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragmentLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
Adding the -O to allow ambiguous reads increases the number of assigned fragments, (75%), but I know this is not recommended for RNAseq data. --largestOverlap doesn't improve the number much (in absence of -O).
The header of my gtf and bam look like they match. I did not modify the annotation file, but did obtain this gtf by converting the original gff3 by using gffread from Cufflinks:
head Csinensis_154_gene.gtf
# gffread Csinensis_154_gene.gff3 -o Csinensis_154_gene.gtf
##gff-version 3
scaffold00001 phytozome8_0 mRNA 6848 8905 . - . ID=PAC:18137364;geneID=orange1.1g009555m.g;gene_name=orange1.1g009555m.g
scaffold00001 phytozome8_0 exon 6848 7034 . - . Parent=PAC:18137364
scaffold00001 phytozome8_0 exon 7149 7550 . - . Parent=PAC:18137364
head accepted_hits_NC1-8_R1.sam
J00113:145:HF3W3BBXX:2:1107:14641:48878 97 scaffold00001 6838 50 100M = 7011 387 CTGGTAGAGAGAAATGAACAGTTATATTAAACAGCTCAGCTCACGTGATATTTCTAATTAGTAGGACGATCGCCTTCTCGGCTTCTCCAATCCTAATAGG AAFFFJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1 XS:A:-
I'm really new to RNAseq analysis, so I am unsure if there is something that I am missing, because it seems like there are so few fragments successfully assigned. I would greatly appreciate any advice or suggestions you have for troubleshooting.
Thanks!
Liz