Successfully assigned fragments is low

hisoph

unread,

Jun 17, 2020, 12:47:50 PM6/17/20

to Subread

Hello

I am using featureCounts for Illumina truseq RNAseq Human transcriptome data aligned with STAR. I have used it in the past with this type of libraries successfully with no issues, but in a recent data set I’m analysing I have very low assigned fragments. FeatureCounts successfully assigned fragments is consistently low ~ 27 – 30% but STAR uniquely mapped reads is always 80% or above for the sample.

When i look at the output, the gene names are listed correctly, with counts there, but for genes i know should have high counts, they are low.

Alignments are made in STAR using gencode primary assembly HG38 . My featureCounts script is below. I am using version 2.0.1

Thanks for any advice.

featureCounts -T 2 -p -t exon -F GTF -g gene_name -M -s 2 -o $outfile \

-G /gencode.v29.transcripts.fa \

-a /gencode.v29.annotation.gtf \

Yang LIAO

unread,

Jun 17, 2020, 7:47:43 PM6/17/20

to Subread

The command is pretty basic so the low counts don't look like to be caused by featureCounts.

You may verify the counting results using samtools, namely you can only output reads in a specific region (one of the gene that you expect to have high counts but not had it) using "samtools view". If there were indeed very few reads from samtools then it confirms that no many reads were mapped to that gene.

hisoph

unread,

Apr 14, 2021, 8:33:43 PM4/14/21

to Subread

Hi Yang, I'm returning to this issue that has occurred again. Do you mean check the bam in igv i didn't unerstand what you meant? I wasn't told if the libraries were stranded or not, but the unstranded and revstranded colums in the reads.per.gene.out.tab file suggest its one of those. N_noFeature is lowest for the column 2. Adding up the reads total counts across all the rows then column 4 is the greatest. Rev Stranded is the most likely.

An example of the counts for one bam file, but typical for all of them is :

Load annotation file gencode.v29.annotation.gtf ...

|| Features : 1262773

|| Meta-features : 57133

|| Chromosomes/contigs : 25

|| Load FASTA contigs from gencode.v29.transcripts.fa...

|| 206693 contigs were loaded

|| Process BAM file.Aligned.sortedByCoord.out.bam...

|| Strand specific : reversely stranded

|| Paired-end reads are included.

|| Total alignments : 54471316

|| Successfully assigned alignments : 14959257 (27.5%)

|| Running time : 3.74 minutes

And from Star Log file

Number of input reads | 32190521

Average input read length | 200

UNIQUE READS:

Uniquely mapped reads number | 25999712

Uniquely mapped reads % | 80.77%

Average mapped length | 197.49

Number of splices: Total | 8011271

Number of splices: Annotated (sjdb) | 7865690

Number of splices: GT/AG | 7930811

Number of splices: GC/AG | 64006

Number of splices: AT/AC | 5161

Number of splices: Non-canonical | 11293

Mismatch rate per base, % | 0.40%

Deletion rate per base | 0.02%

Deletion average length | 1.66

Insertion rate per base | 0.02%

Insertion average length | 1.42

MULTI-MAPPING READS:

Number of reads mapped to multiple loci | 5313410

% of reads mapped to multiple loci | 16.51%

Number of reads mapped to too many loci | 58914

% of reads mapped to too many loci | 0.18%

UNMAPPED READS:

% of reads unmapped: too many mismatches | 0.00%

% of reads unmapped: too short | 2.49%

% of reads unmapped: other | 0.05%

CHIMERIC READS:

Number of chimeric reads | 0

% of chimeric reads | 0.00%

So if I use the number of assigned reads from featureCounts against the number of uniquely mapped reads from STAR then the percentage successfully assigned becomes 57% rather than 27.5% reported by featureCounts, this is still lover than expected unless i have extremely high multimapping reads, although i used the flag -M so i thought they would be counted. Thanks for any comments, Sophia

Yang LIAO

unread,

Apr 15, 2021, 3:40:04 AM4/15/21

to Subread

Glad to hear that the assigned rate is higher. 57% isn't a very low rate especially when you're doing strand-specific sequencing.

When I said that you can use Samtools to check the mapping results, I meant that you can run samtools like

$ samtools view my_reads.bam chr12:3000000-3200000

(my_reads.bam must be sorted by read coordinates, and an index must be built for the BAM file using samtools index)

This command will only give you the reads that are mapped to the specified region. This is faster than loading the BAM file in IGV or IGB and the results are easier to inteperate.

Reply all

Reply to author

Forward