I am confused or qualimap rnaseq uses the wrong strand

461 views
Skip to first unread message

Rory Kirchner

unread,
May 22, 2019, 11:11:09 AM5/22/19
to QualiMap
Hi everyone!

Thanks for the nice tool. I think that qualimap rnaseq, when run in stranded mode, uses the wrong strand.

I have a dataset that is firststrand, which means that R1 should be on the reverse strand:





I run hisat2 with this mode turned on, by setting -rna_strandness RF, which is the correct mode for fr-firstrand, from the hisat2 manual:

Specify strand-specific information: the default is unstranded.
For single-end reads, use F or R. 'F' means a read corresponds to a transcript. 'R' means a read corresponds to the reverse complemented counterpart of a transcript. For paired-end reads, use either FR or RF.
With this option being used, every read alignment will have an XS attribute tag: '+' means a read belongs to a transcript on '+' strand of genome. '-' means a read belongs to a transcript on '-' strand of genome.

This results in alignment rates of 95%, so it is correct: 157763312 + 0 mapped (95.62% : N/A)

Qualimap rnaseq has this description for the strandedness option:

reverse-strand
For single-end reads, the read and the feature must have the opposite strand. For paired-end reads, the first read of pair must be mapped to the opposite strand of the feature, while the second read of the pair must be on the same strand as the feature.

If I run qualimap on these alignments with -p strand-specific-reverse set, only about 3% of the reads overlap the exons in the GTF. If I run it with -p strand-specific-forward set, it overlaps about around 70% of them.

Do I have the direction wrong, or does qualimap?

Thanks!

Best,

Rory

Konstantin Okonechnikov

unread,
May 26, 2019, 5:48:45 AM5/26/19
to qual...@googlegroups.com
Hi! 

Thanks for your opinion about Qualimap! Are you sure about the protocol assignment? Actually the specificity should be addressed accordingly, the tool was tested on quite common reverse-specific and typically strand-specific-reverse should cover up to 95% of reads. Did you try running without strand specificity as by default? There will be the estimation value ("Strand specificity estimation (fwd/rev)" ) which allows to set the correct settings since it will show the measured proportions directly.  

Best regards,
   Konstantin
 

--
You received this message because you are subscribed to the Google Groups "QualiMap" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qualimap+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qualimap/c8f31ee0-74cf-4b6c-b3a3-493f849bd222%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rory Kirchner

unread,
May 29, 2019, 10:58:51 AM5/29/19
to QualiMap
Hi Konstantin,

Thanks! I am 95% certain I am confused, but don't understand why I'm confused.

Salmon also thinks these reads are in the RF (firststrand) orientation:

{
    "read_files": "[ /dev/fd/63, /dev/fd/62]",
    "expected_format": "ISR",
    "compatible_fragment_ratio": 1.0,
    "num_compatible_fragments": 2473144,
    "num_assigned_fragments": 2473144,
    "num_frags_with_consistent_mappings": 1678556,
    "num_frags_with_inconsistent_or_orphan_mappings": 794588,
    "MSF": 0,
    "OSF": 0,
    "ISF": 0,
    "MSR": 0,
    "OSR": 0,
    "ISR": 1678556,
    "SF": 348831,
    "SR": 445757,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}

The Salmon manual has ISR corresponding to firststrand: https://salmon.readthedocs.io/en/latest/library_type.html

So these are definitely fr-firstrand or RF reads. 

I did run unstranded, sorry for not including it! Here is the blurb for that on a subset of the reads:

RNA-Seq QC report
-----------------------------------

>>>>>>> Input

    bam file = align/small.bam
    gff file = /projects/ngs/reference/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf
    counting algorithm = proportional
    protocol = non-strand-specific


>>>>>>> Reads alignment

    reads aligned (left/right) = 710,176 / 690,655
    read pairs aligned  = 673,232
    total alignments = 1,748,874
    secondary alignments = 348,043
    non-unique alignments = 0
    aligned to genes  = 781,605
    ambiguous alignments = 88,959
    no feature assigned = 448,909
    not aligned = 106,419
    SSP estimation (fwd/rev) = 0.97 / 0.03

So it's calling these fwd reads, but I aligned with hisat2 like this:

/projects/ngs/local/software/bb5CD48373/tools/bin/../../anaconda/envs/python2/bin/hisat2-align-s --wrapper basic-0 --new-summary -x /projects/ngs/reference/genomes/Hsapiens/hg38/hisat2/hg38 -p 16 --phred33 --rna-strandness RF --rg-id PrCa_09_PROSTATE_N_1_1 --rg PL:illumina --rg PU:PrCa_09_PROSTATE_N_1_1 --rg SM:PrCa_09_PROSTATE_N_1_1 --known-splicesite-infile /projects/ngs/reference/genomes/Hsapiens/hg38/rnaseq/ref-transcripts-splicesites.txt --novel-splicesite-outfile /projects/ngs/oncology/dev/Dev_1663_RNASeq_Bakeoff_2019/bcbio/work/align/PrCa_09_PROSTATE_N_1_1/PrCa_09_PROSTATE_N_1_1-novelsplicesites.bed -1 /tmp/29067.inpipe1 -2 /tmp/29067.inpipe2

the --rna-strandedness RF flag is set there, on my slimmed down file here are the alignments:

1855293 + 0 in total (QC-passed reads + QC-failed reads)
348043 + 0 secondary
0 + 0 supplementary
571978 + 0 duplicates
1748874 + 0 mapped (94.26% : N/A)
1507250 + 0 paired in sequencing
753625 + 0 read1
753625 + 0 read2
1346464 + 0 properly paired (89.33% : N/A)
1356310 + 0 with itself and mate mapped
44521 + 0 singletons (2.95% : N/A)
1698 + 0 with mate mapped to a different chr
1214 + 0 with mate mapped to a different chr (mapQ>=5)

Do you spot where I am not understanding? 

Best,

Rory
To unsubscribe from this group and stop receiving emails from it, send an email to qual...@googlegroups.com.

Konstantin Okonechnikov

unread,
May 31, 2019, 7:41:01 AM5/31/19
to qual...@googlegroups.com
Hi Rory,

my only hypothesis that there might be some adjustment of the BAM format by hisat2 that Qualimap does not handle properly.  Does it modify somehow the reads orientation in the SAM records? 

Also, did you try to run Qualimap on Salmon BAM result? Does it report the same SSP? 

Actually, most of the tests with Qualimap RNA-seq were done on STAR results, therefore there could be some effects  from the alignment tool. 

Best regards,
  Konstantin



To unsubscribe from this group and stop receiving emails from it, send an email to qualimap+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qualimap/77823fb6-8714-4625-8579-68f36c2150d7%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages