Hello
I am using featureCounts on Illumina TruSeq RNA-seq human transcriptome data aligned with STAR. I have used it successfully with this type of library in the past, but in a recent data set I'm analysing very few fragments are assigned. The featureCounts "successfully assigned" rate is consistently low (~27-30%), even though STAR reports 80% or more uniquely mapped reads for the same samples.
When I look at the output, the gene names are listed correctly and have counts, but genes that I know should have high counts have low ones.
Alignments were made with STAR against the GENCODE primary assembly (hg38). My featureCounts command is below; I am using version 2.0.1.
Thanks for any advice.
featureCounts -T 2 -p -t exon -F GTF -g gene_name -M -s 2 -o $outfile \
-G /gencode.v29.transcripts.fa \
-a /gencode.v29.annotation.gtf \
Load annotation file gencode.v29.annotation.gtf ...
|| Features : 1262773
|| Meta-features : 57133
|| Chromosomes/contigs : 25
|| Load FASTA contigs from gencode.v29.transcripts.fa...
|| 206693 contigs were loaded
|| Process BAM file.Aligned.sortedByCoord.out.bam...
|| Strand specific : reversely stranded
|| Paired-end reads are included.
|| Total alignments : 54471316
|| Successfully assigned alignments : 14959257 (27.5%)
|| Running time : 3.74 minutes
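To see where the unassigned alignments go, the per-category table that featureCounts normally writes next to the -o output (the same name with .summary appended) can be turned into percentages. This is only a minimal sketch: it assumes the usual tab-separated layout with a "Status" header row and a single BAM column, and the counts in the example are made-up placeholders, not values from this run.

```python
import csv
import io


def summarize(summary_text):
    """Return {status: (count, percent)} for the first BAM column
    of a featureCounts .summary table passed in as text."""
    rows = [r for r in csv.reader(io.StringIO(summary_text), delimiter="\t") if r]
    # Skip the header row ("Status", <BAM path>) and keep status -> count.
    counts = {row[0]: int(row[1]) for row in rows[1:]}
    total = sum(counts.values())
    return {
        status: (n, 100.0 * n / total if total else 0.0)
        for status, n in counts.items()
    }


# Hypothetical placeholder table (NOT real data from this run).
example = (
    "Status\tsample.bam\n"
    "Assigned\t30\n"
    "Unassigned_NoFeatures\t50\n"
    "Unassigned_Ambiguity\t20\n"
)

breakdown = summarize(example)
# Print the categories from largest to smallest.
for status, (n, pct) in sorted(breakdown.items(), key=lambda kv: -kv[1][0]):
    print(f"{status:<28}{n:>12}{pct:>8.1f}%")
```

For a real run, the text would be read with something like open(outfile + ".summary").read(), where outfile is whatever was passed to -o. The largest Unassigned_* category usually shows whether the loss is due to missing features, ambiguity, or something else.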
And from the STAR log file:
Number of input reads | 32190521
Average input read length | 200
UNIQUE READS:
Uniquely mapped reads number | 25999712
Uniquely mapped reads % | 80.77%
Average mapped length | 197.49
Number of splices: Total | 8011271
Number of splices: Annotated (sjdb) | 7865690
Number of splices: GT/AG | 7930811
Number of splices: GC/AG | 64006
Number of splices: AT/AC | 5161
Number of splices: Non-canonical | 11293
Mismatch rate per base, % | 0.40%
Deletion rate per base | 0.02%
Deletion average length | 1.66
Insertion rate per base | 0.02%
Insertion average length | 1.42
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 5313410
% of reads mapped to multiple loci | 16.51%
Number of reads mapped to too many loci | 58914
% of reads mapped to too many loci | 0.18%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 2.49%
% of reads unmapped: other | 0.05%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
So if I compare the number of assigned reads from featureCounts with the number of uniquely mapped reads from STAR, the percentage successfully assigned becomes 57% rather than the 27.5% reported by featureCounts. This is still lower than expected unless I have an extremely high number of multi-mapping reads, although I used the -M flag, so I thought they would be counted.
Thanks for any comments,
Sophia
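The two percentages can be reproduced from the logs above with a few lines of Python. This is a rough sketch rather than a diagnosis: the last figure rests on the assumption that every alignment in the BAM file that does not come from a uniquely mapped read comes from a multi-mapped read, and that featureCounts and STAR are counting the same unit (read pairs), which may not hold.

```python
# Numbers copied from the featureCounts and STAR logs in this post.
assigned = 14_959_257          # featureCounts: successfully assigned alignments
total_alignments = 54_471_316  # featureCounts: total alignments
unique_reads = 25_999_712      # STAR: uniquely mapped reads
multi_reads = 5_313_410        # STAR: reads mapped to multiple loci

# Assigned fraction against each of the two denominators.
pct_of_total = 100 * assigned / total_alignments
pct_of_unique = 100 * assigned / unique_reads

# ASSUMPTION: alignments not accounted for by uniquely mapped reads
# all belong to multi-mapped reads (one alignment per reported locus).
extra_alignments = total_alignments - unique_reads
alignments_per_multi_read = extra_alignments / multi_reads

print(f"assigned / total alignments : {pct_of_total:.1f}%")
print(f"assigned / uniquely mapped  : {pct_of_unique:.1f}%")
print(f"alignments per multi-mapped read (under the assumption): "
      f"{alignments_per_multi_read:.2f}")
```

If the assumption holds, multi-mapped reads contribute several alignments each, which inflates the featureCounts denominator relative to the STAR read count and would explain part of the gap between the two percentages.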