RNA-seq strand info

Jason

unread,

Aug 10, 2016, 12:37:26 AM8/10/16

to Subread

Dear Dr. Shi,

I'm a new user of featureCounts for RNA-seq data analysis. I had some data generated with Illumina TruSeq stranded mRNA LT kit (http://support.illumina.com/sequencing/sequencing_kits/truseq_stranded_mrna_lt_sample_prep_kit.html) and my goal is to do differential expression analysis.

I have a specific question about strand option in featureCounts.

-s

It has three possible values: 0 (unstranded), 1 (stranded) and 2 (reversely stranded)

I'm wondering in the case of TruSeq, which value should I use? 1 or 2? I understand TruSeq read 1 map to the anti-sense strand. But is 1 for sense and 2 for anti-sense strand?

Also, I'd like to hear about your thoughts on using --primary -M and --fraction, and −−ignoreDup option. I use hisat2 to align reads and with the default command line, it allows up to 5 alignments per read. If I use --primary, that means that 5 alignments are counted for one read. Would this be ideal? I read in many places saying it's best to discard multi-mapping reads. But would using both -M --fraction be better than discarding them all? Would it be better not to use −−ignoreDup option since it's reasonable that RNA-seq data have many duplicated reads?

If I don't specify --primary, -M --fraction at all, does it mean that only non-multimapping primary reads will be counted?

Thank you.

Jason

Wei Shi

unread,

Aug 10, 2016, 1:05:46 AM8/10/16

to Subread

Hi Jason,

From your description I think you should set '-s 2'. However you might experiment with '-s 1'. Incorrect setting of this argument should result in very few reads being counted (therefore you can know it is an incorrect setting).

For counting RNA-seq reads, I strongly recommend counting uniquely mapped reads only since this will give you the most accurate counting results. You should count primary alignments only as well (do not count secondary alignments), for the purpose of getting as accurate counts as possible. No reads should be counted more than once.

I don't think you need to ignore duplicated reads, ie. keep reads that map to the same location.

Hope this helps.

Cheers,
Wei

Jason

unread,

Aug 10, 2016, 1:50:17 AM8/10/16

to Subread

Hi Wei,

Thanks for your quick reply! I'd take your suggestion. Is there a way to specify options to only count uniquely mapped primary reads from featureCounts command line? Or do I have to filter out and only keep uniquely mapped primary reads before running featureCounts?

Thanks.

Jason

Wei Shi

unread,

Aug 10, 2016, 2:10:36 AM8/10/16

to Subread

By default, featureCounts does not count multi-mapping reads. By specifying '--primaryOnly', featureCounts will count primary alignments only and ignore secondary alignments.

You might also take a look at aligner parameters to see if you can ask it to output uniquely mapped reads only.

Cheers,

Wei

Jason

unread,

Aug 10, 2016, 10:15:11 AM8/10/16

to Subread

Sorry I'm a bit confused.

The description for --primary option says (Subread v1.5.0-p1/Rsubread v1.20.3, 1 Feb 2016):

If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in the Flag field of SAM/BAM files. All primary alignments in a dataset will be counted no matter they are from multi- mapping reads or not (ie. ‘-M’ is ignored).

So it suggests that if --primary option is used, multi-mapping reads will also be counted, which is not what I want. Does that mean that I have to only give featureCounts primary reads through filtering or aligner options and not use --primary option in featureCounts?

Thanks.

Jason

Wei Shi

unread,

Aug 10, 2016, 7:47:30 PM8/10/16

to Subread

Sorry I did not make myself very clear.

You are correct that multi-mapping reads might be counted when the --primary option is on. So you will have to remove multi-mapping reads from the mapping results before you feed them to featureCounts. After you have done so, you can then use --primary option to count primary alignments only.

You may try the Subread or Subjunc aligner included in Subread package, which can be easily used to produce unique alignments (-u option).

Best regards,

Wei

Jason

unread,

Aug 10, 2016, 9:24:54 PM8/10/16

to Subread

No problem.

After trying it out with my data, I think actually I don't have to remove multi-mapping reads if the NH:i field is set in the output of an alignment results.

Could you comment on if my understanding is correct?

I think:

1. When --primary option is not set, featureCounts use NH:i field to figure out if a read maps to one location or multiple locations. That's why you mentioned "By default, featureCounts does not count multi-mapping reads.". So by default only NH:i:1 alignments are counted.

2. When --primary option is set, featureCounts use flag value of an alignment to figure out if it's primary alignment or not. For SE mapping, only alignments with flag 0 or 16 are counted. And these alignments based on flag value won't distinguish between mapping to one location or multiple locations.

Is that how featureCounts works?

Thanks.

Jason

Wei Shi

unread,

Aug 11, 2016, 9:38:46 PM8/11/16

to Subread

Yes your understanding is correct.

Jason

unread,

Aug 11, 2016, 9:58:26 PM8/11/16

to Subread

Great!

I have a few other questions. Could you comment on how PE reads primary alignments were determined? By FLAG of 99/147 or 83/163 or other ways?

Also does featureCounts rely on XS:i: field to determine the strand? I have only used hisat2 so far, not sure about other aligners, do they all generated XS:i: field to indicate strand info if the RNA-seq protocol was stranded one?

Thanks.

Jason

Wei Shi

unread,

Aug 16, 2016, 9:41:27 PM8/16/16

to Subread

When counting primary alignments for read pairs, featureCounts will only count those alignments that have both reads mapped as primary alignments. It uses the bit 0x100 in FLAG field to test if an alignment is a primary alignment or not.

featureCounts uses FLAG to find the strand info. 'XS:i' is not part of the SAM/BAM specification.

Regards,

Wei

Jason

unread,

Aug 17, 2016, 12:19:32 AM8/17/16

to Subread

those all make sense now to me. Thanks for the feedback.

Reply all

Reply to author

Forward