I am confused or qualimap rnaseq uses the wrong strand

Rory Kirchner

unread,

May 22, 2019, 11:11:09 AM5/22/19

to QualiMap

Hi everyone!

Thanks for the nice tool. I think that qualimap rnaseq, when run in stranded mode, uses the wrong strand.

I have a dataset that is firststrand, which means that R1 should be on the reverse strand:

http://onetipperday.sterding.com/2012/07/how-to-tell-which-library-type-to-use.html

I run hisat2 with this mode turned on, by setting -rna_strandness RF, which is the correct mode for fr-firstrand, from the hisat2 manual:

Specify strand-specific information: the default is unstranded.
For single-end reads, use F or R. 'F' means a read corresponds to a transcript. 'R' means a read corresponds to the reverse complemented counterpart of a transcript. For paired-end reads, use either FR or RF.
With this option being used, every read alignment will have an XS attribute tag: '+' means a read belongs to a transcript on '+' strand of genome. '-' means a read belongs to a transcript on '-' strand of genome.

This results in alignment rates of 95%, so it is correct: 157763312 + 0 mapped (95.62% : N/A)

Qualimap rnaseq has this description for the strandedness option:

reverse-strand
For single-end reads, the read and the feature must have the opposite strand. For paired-end reads, the first read of pair must be mapped to the opposite strand of the feature, while the second read of the pair must be on the same strand as the feature.

If I run qualimap on these alignments with -p strand-specific-reverse set, only about 3% of the reads overlap the exons in the GTF. If I run it with -p strand-specific-forward set, it overlaps about around 70% of them.

Do I have the direction wrong, or does qualimap?

Thanks!

Best,

Rory

Konstantin Okonechnikov

unread,

May 26, 2019, 5:48:45 AM5/26/19

to qual...@googlegroups.com

Hi!

Thanks for your opinion about Qualimap! Are you sure about the protocol assignment? Actually the specificity should be addressed accordingly, the tool was tested on quite common reverse-specific and typically strand-specific-reverse should cover up to 95% of reads. Did you try running without strand specificity as by default? There will be the estimation value ("Strand specificity estimation (fwd/rev)" ) which allows to set the correct settings since it will show the measured proportions directly.

Best regards,

Konstantin

--
You received this message because you are subscribed to the Google Groups "QualiMap" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qualimap+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qualimap/c8f31ee0-74cf-4b6c-b3a3-493f849bd222%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rory Kirchner

unread,

May 29, 2019, 10:58:51 AM5/29/19

to QualiMap

Hi Konstantin,

Thanks! I am 95% certain I am confused, but don't understand why I'm confused.

Salmon also thinks these reads are in the RF (firststrand) orientation:

{

"read_files": "[ /dev/fd/63, /dev/fd/62]",

"expected_format": "ISR",

"compatible_fragment_ratio": 1.0,

"num_compatible_fragments": 2473144,

"num_assigned_fragments": 2473144,

"num_frags_with_consistent_mappings": 1678556,

"num_frags_with_inconsistent_or_orphan_mappings": 794588,

"MSF": 0,

"OSF": 0,

"ISF": 0,

"MSR": 0,

"OSR": 0,

"ISR": 1678556,

"SF": 348831,

"SR": 445757,

"MU": 0,

"OU": 0,

"IU": 0,

"U": 0

}

The Salmon manual has ISR corresponding to firststrand: https://salmon.readthedocs.io/en/latest/library_type.html

So these are definitely fr-firstrand or RF reads.

I did run unstranded, sorry for not including it! Here is the blurb for that on a subset of the reads:

RNA-Seq QC report

-----------------------------------

>>>>>>> Input

bam file = align/small.bam

gff file = /projects/ngs/reference/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf

counting algorithm = proportional

protocol = non-strand-specific

>>>>>>> Reads alignment

reads aligned (left/right) = 710,176 / 690,655

read pairs aligned = 673,232

total alignments = 1,748,874

secondary alignments = 348,043

non-unique alignments = 0

aligned to genes = 781,605

ambiguous alignments = 88,959

no feature assigned = 448,909

not aligned = 106,419

SSP estimation (fwd/rev) = 0.97 / 0.03

So it's calling these fwd reads, but I aligned with hisat2 like this:

/projects/ngs/local/software/bb5CD48373/tools/bin/../../anaconda/envs/python2/bin/hisat2-align-s --wrapper basic-0 --new-summary -x /projects/ngs/reference/genomes/Hsapiens/hg38/hisat2/hg38 -p 16 --phred33 --rna-strandness RF --rg-id PrCa_09_PROSTATE_N_1_1 --rg PL:illumina --rg PU:PrCa_09_PROSTATE_N_1_1 --rg SM:PrCa_09_PROSTATE_N_1_1 --known-splicesite-infile /projects/ngs/reference/genomes/Hsapiens/hg38/rnaseq/ref-transcripts-splicesites.txt --novel-splicesite-outfile /projects/ngs/oncology/dev/Dev_1663_RNASeq_Bakeoff_2019/bcbio/work/align/PrCa_09_PROSTATE_N_1_1/PrCa_09_PROSTATE_N_1_1-novelsplicesites.bed -1 /tmp/29067.inpipe1 -2 /tmp/29067.inpipe2

the --rna-strandedness RF flag is set there, on my slimmed down file here are the alignments:

1855293 + 0 in total (QC-passed reads + QC-failed reads)

348043 + 0 secondary

0 + 0 supplementary

571978 + 0 duplicates

1748874 + 0 mapped (94.26% : N/A)

1507250 + 0 paired in sequencing

753625 + 0 read1

753625 + 0 read2

1346464 + 0 properly paired (89.33% : N/A)

1356310 + 0 with itself and mate mapped

44521 + 0 singletons (2.95% : N/A)

1698 + 0 with mate mapped to a different chr

1214 + 0 with mate mapped to a different chr (mapQ>=5)

Do you spot where I am not understanding?

Best,

Rory

To unsubscribe from this group and stop receiving emails from it, send an email to qual...@googlegroups.com.

Konstantin Okonechnikov

unread,

May 31, 2019, 7:41:01 AM5/31/19

to qual...@googlegroups.com

Hi Rory,

my only hypothesis that there might be some adjustment of the BAM format by hisat2 that Qualimap does not handle properly. Does it modify somehow the reads orientation in the SAM records?

Also, did you try to run Qualimap on Salmon BAM result? Does it report the same SSP?

Actually, most of the tests with Qualimap RNA-seq were done on STAR results, therefore there could be some effects from the alignment tool.

Best regards,

Konstantin

To unsubscribe from this group and stop receiving emails from it, send an email to qualimap+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qualimap/77823fb6-8714-4625-8579-68f36c2150d7%40googlegroups.com.

Reply all

Reply to author

Forward