Low successfully assigned fragments

1,139 views

Skip to first unread message

Elizabeth

unread,

Sep 1, 2016, 7:06:44 PM9/1/16

to Subread

Hello,

I am new to RNAseq analysis and this is my first time using featureCounts. I have some sequences (PE100 from Illumina HiSeq4000), prepared with the TruSeq stranded mRNA kit and want to do differential expression analysis. I aligned my reads to the genome using TopHat2 (consistently around 75% alignment to genome), and am using the resulting accepted_hits.bam as my input into featureCounts.

This is my command for featurecounts (subread-1.5.0):

featureCounts -T 8 \

-p -F GTF -t exon -g Parent -s 2 \

-a Csinensis_154_gene.gtf \

-G Csinensis_154.fa \

-o NC1-8_featurecounts_output.txt \

accepted_hits.bam

My output shows very low successfully assigned fragments- around 20%. I also have changed -B and -M settings, but neither provide much improvement. -s 2 is correct over -s 1. Example output:

//========================== featureCounts setting ===========================\\

|| ||

|| Input files : 1 BAM file ||

|| P accepted_hits_NC1-8_R1.bam ||

|| ||

|| Output file : NC1-8_featurecounts_output.txt ||

|| Summary : NC1-8_featurecounts_output.txt.summary ||

|| Annotation : /home/ubuntu/workspace/rnaseq/phyto_genome/v ... ||

|| ||

|| Threads : 8 ||

|| Level : meta-feature level ||

|| Paired-end : yes ||

|| Strand specific : reversely stranded ||

|| Multimapping reads : not counted ||

|| Multi-overlapping reads : not counted ||

|| ||

|| Chimeric reads : counted ||

|| Both ends mapped : not required ||

|| ||

\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\

|| ||

|| Load annotation file /home/ubuntu/workspace/rnaseq/phyto_genome/versio ... ||

|| Features : 259657 ||

|| Meta-features : 34329 ||

|| Chromosomes/contigs : 1750 ||

|| ||

|| Loading FASTA contigs : /home/ubuntu/workspace/rnaseq/phyto_genome/ver ... ||

|| 12574 contigs were loaded ||

|| ||

|| Process BAM file accepted_hits_NC1-8_R1.bam... ||

|| Paired-end reads are included. ||

|| Assign fragments (read pairs) to features... ||

|| ||

|| WARNING: reads from the same pair were found not adjacent to each ||

|| other in the input (due to read sorting by location or ||

|| reporting of multi-mapping read pairs). ||

|| ||

|| Read re-ordering is performed. ||

|| ||

|| Total fragments : 34002467 ||

|| Successfully assigned fragments : 7239403 (21.3%) ||

|| Running time : 0.89 minutes ||

|| ||

|| Read assignment finished. ||

|| ||

\\===================== http://subread.sourceforge.net/ ======================//

It seems like this is due to many ambiguous reads:

Status accepted_hits_NC1-8_R1.bam

Assigned 7239403

Unassigned_Ambiguity 18514271

Unassigned_MultiMapping 576445

Unassigned_NoFeatures 7672348

Unassigned_Unmapped 0

Unassigned_MappingQuality 0

Unassigned_FragmentLength 0

Unassigned_Chimera 0

Unassigned_Secondary 0

Unassigned_Nonjunction 0

Unassigned_Duplicate 0

Adding the -O to allow ambiguous reads increases the number of assigned fragments, (75%), but I know this is not recommended for RNAseq data. --largestOverlap doesn't improve the number much (in absence of -O).

The header of my gtf and bam look like they match. I did not modify the annotation file, but did obtain this gtf by converting the original gff3 by using gffread from Cufflinks:

head Csinensis_154_gene.gtf

# gffread Csinensis_154_gene.gff3 -o Csinensis_154_gene.gtf

##gff-version 3

scaffold00001 phytozome8_0 mRNA 6848 8905 . - . ID=PAC:18137364;geneID=orange1.1g009555m.g;gene_name=orange1.1g009555m.g

scaffold00001 phytozome8_0 exon 6848 7034 . - . Parent=PAC:18137364

scaffold00001 phytozome8_0 exon 7149 7550 . - . Parent=PAC:18137364

head accepted_hits_NC1-8_R1.sam

J00113:145:HF3W3BBXX:2:1107:14641:48878 97 scaffold00001 6838 50 100M = 7011 387 CTGGTAGAGAGAAATGAACAGTTATATTAAACAGCTCAGCTCACGTGATATTTCTAATTAGTAGGACGATCGCCTTCTCGGCTTCTCCAATCCTAATAGG AAFFFJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1 XS:A:-

I'm really new to RNAseq analysis, so I am unsure if there is something that I am missing, because it seems like there are so few fragments successfully assigned. I would greatly appreciate any advice or suggestions you have for troubleshooting.

Thanks!

Liz

Wei Shi

unread,

Sep 2, 2016, 2:16:57 AM9/2/16

to Subread

Hi Liz,

Firstly you do not need to specify '-G' since this option is only useful when '-J' is specified. However this doesn't affect your assignment percentage.

Having a large number of reads in category "Unassigned_Ambiguity" suggests that in your annotation many genes overlap with each other. You may try to use a different annotation to see if assignment rate improves.

Cheers,

Wei

Elizabeth

unread,

Sep 2, 2016, 1:09:09 PM9/2/16

to Subread

I will try this.

Thanks for your suggestion!

Reply all

Reply to author

Forward

0 new messages