-pe option not working in Qualimap rnaseq

239 views
Skip to first unread message

Antonin Thiébaut

unread,
Mar 6, 2021, 12:16:00 PM3/6/21
to QualiMap
Hi,

I’ve been trying to run qualimap rnaseq on paired-end files with mixed results. I tried it on several versions (2.2.2a, 2.2.2b, 2.2.2c and 2.2.2d installed via conda ; I also tried the latest build (31-08-20) available on bitbucket) and with several input files.

Here’s an example of the command I used :
qualimap rnaseq -bam X_sorted.bam -gtf X.gtf -a uniquely-mapped-reads --sequencing-protocol non-strand-specific -pe --java-mem-size=32000M

The annotation files are directly downloaded from the NCBI.
The bam files are obtained using a classic worklow : fastq are downloaded from the SRA, trimmed for adapters and bad bases, mapped using Hisat2 ; the sam file produced by Hisat2 is then converted into bam, the multi-mapping reads are removed and the bam file is finally sorted (by location, not by name) and indexed (all the last steps are performed using samtools). I tried viewing these files and I can clearly see pairs of reads.

And here’s an example of qualimap’s log :

Display: 
Java memory size is set to 32000M
Launching application...

QualiMap v.2.2.2-dev
Built on 2019-11-11 14:05

Selected tool: rnaseq
Initializing regions from X.gtf...

Initialized 100000 regions...

Initialized 364586 regions it total

Starting constructing transcripts for RNA-seq stats...
Finished constructing transcripts

Starting BAM file analysis

Sorting BAM file by name...

Read 10000000 records.
Finished reading inputs, merging and writing to output now.
Sorting by name finished.

WARNING! The following chromosomes from reads are not found in annotations:
[I removed the list in order to avoid a long post]

Counted 0 read pairs, 0 single reads
The number of skipped alignments having only one read in fragment: 28825354

Processed 28825354 reads in total

Cleanup of temporary files

BAM file analysis finished
Creating plots
Writing HTML report...
HTML report created successfully

I’m not bothered by the missing chromosomes because some contigs/scaffolds are not annotated, so it’s not a problem. However, as you can see in bold, qualimap finds reads but doesn’t count them as pairs or single reads, it just skips them.

I’ve also tried the same command without the -pe option and it worked fine.

Do you have any idea about what could be causing that ?

Thanks in advance,
Best,

Antonin THIÉBAUT

Konstantin Okonechnikov

unread,
Apr 26, 2021, 12:43:49 PM4/26/21
to qual...@googlegroups.com
HI!

Sorry, somehow missed this email. If it's still helpful, here are some of my comments: probably some issue here's with SAM records - how the read pair is assigned by Hisat.  If there is a BAM file example (or subsample) available I could take a look on it.

Best regards,
   Konstantin

--
You received this message because you are subscribed to the Google Groups "QualiMap" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qualimap+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qualimap/b0119538-9bc6-4a9f-92ad-5a392715e94an%40googlegroups.com.

Antonin Thiébaut

unread,
Apr 27, 2021, 6:56:50 AM4/27/21
to QualiMap
Hello !

No worries, it happens ! I appreciate your answer.
So the story gets a bit deeper : this "bug" doesn't always happen. I've tested another set of paired-end files and the Qualimap rnaseq command written above worked like a charm ! This is surprising, because all samples are processed the same way, with the same Hisat2 commands (I'm using a snakemake workflow). Which could mean that the problem doesn't come from Hisat2 but from the samples themselves ?

In any case, if you want to have a look, here's an example of problematic bam files : https://apollo.vital-it.ch/trackvis/qualimap_test/ (I uploaded the full bam and a subsample of it ; both display the problem).

Best,

Antonin

Konstantin Okonechnikov

unread,
Apr 28, 2021, 1:12:35 PM4/28/21
to qual...@googlegroups.com
That's interesting, maybe could also be the effect of target reference dataset/annotation. The sample is from drosophila ananassae, right? I found this: https://www.ncbi.nlm.nih.gov/sra/SRR6161785 

 What annotation GTF file was used? Was it something specific? 

Best regards,
    Konstantin

Antonin Thiébaut

unread,
Apr 29, 2021, 5:07:00 AM4/29/21
to QualiMap
Yes, it's from Drosophila ananassae, you have the right sample ! I also thought the annotation could be the cause, which is why :
- I tested samples from another species (D. kikkawai) but from the same dataset than D. ananassae, and it showed the same problem.
- I also tested D. melanogaster samples from an entirely different dataset and it showed the same problem (but I've managed to get the command working on other D. melanogaster samples with the same annotations !).

In every case, I'm using gtf annotation files from the RefSeq database (ie the annotation files were all produced by the NCBI annotation pipeline, except D. melanogaster, which is maintained by FlyBase).

Best regards,
Antonin

Konstantin Okonechnikov

unread,
Apr 30, 2021, 3:18:35 AM4/30/21
to qual...@googlegroups.com
Hi Anotonin,

thanks for explanation! But unfortunately I could not find the GTF file,  even simply googling the name of references from the BAM file did not help. Could you perhaps share the link to it? 

Best regards,
   Konstantin

Antonin Thiébaut

unread,
Apr 30, 2021, 4:05:52 AM4/30/21
to QualiMap
Hi Konstantin,

The original gtf files are there :

But, some coordinates in these files were broken (I guess this happens with automatic annotation pipeline !), so I fixed them using several commands including AGAT. The fixed files can be found there : https://apollo.vital-it.ch/trackvis/qualimap_test/
I use only fixed/standardized gtf files with qualimap.

Two interesting notes about the gtf fix :
1. With the original gtf files, qualimap sometimes stops early because of this error "java.lang.IllegalArgumentException: start must be less than or equal to end!". I interpreted that as a coordinates problem, which is why I implemented gtf repair. With the fixed files, there is no more error and qualimap runs until completion.
2. Without -pe option, I always get qualimap rnaseq reports using the fixed gtf files, but as mentioned earlier, with -pe option, sometimes it works, sometimes it doesn't.

Hope it helps !

Best,
Antonin

Konstantin Okonechnikov

unread,
Apr 30, 2021, 7:51:15 AM4/30/21
to qual...@googlegroups.com
Thanks for sharing the annotation! 

The issue was due to presence of read suffix 1/2 in the alignment names. e.g. an example is like this:   SRR6161785.10000565/1 and SRR6161785.10000565/2 instead of SRR6161785.10000565 for both records  from the same read. Qualimap assumes that name should remain the same for a pair of reads, this is also stated in SAM format AFAIK. 

Simply removing these suffixes solved the issue. Here's link to fixed SAM subsample and report of Qualimap from it:  

Maybe this was possible to control somehow from the hisat2 pipeline settings? Not sure where such suffixes could come from. 

Best regards,
   Konstantin


Antonin Thiébaut

unread,
Apr 30, 2021, 2:21:49 PM4/30/21
to QualiMap
You're welcome!

Thank you for solving this ! As another control, I've checked several of the samples, and indeed, all those who don't run with Qualimap have read suffixes... It also explains why Qualimap worked on these samples without the paired-end option.

I've downloaded the report to have a look, so you can delete it and save space :)

So it seems Hisat2 isn't responsible for the suffixes, because they're already present in the fastq files at the start. Hisat2 just keeps them because it doesn't change the read names. Anyway, now that I know what's causing the problem, it'll be much easier to deal with it !

Thank you for your help, much appreciated !

Best regards,
Antonin
Reply all
Reply to author
Forward
0 new messages