TopHat: likely bug in "tophat_reports" with "--no-discordant"


Alex Williams

Aug 19, 2013, 4:27:51 PM
to tuxedo-to...@googlegroups.com
Hi!

I encountered an issue with aligning large PE100 (paired end) files. Everything would go smoothly until the "tophat_reports" section, which would terminate with "[ERROR]" and no additional information.

This only occurs on certain files, and they all appear to be quite large, so I don't have a good way to reproduce the problem on a small dataset, unfortunately.

Because this happened in both Tophat 2.0.8b and Tophat 2.0.9, I think it is actually a bug—I tried it with several different input files, all of which exhibited the same behavior. A co-worker of mine also tested it on his own system, and had the same problems.

 I tracked the issue down to "--no-discordant" -- if that option is specified, then Tophat will (on certain inputs!) fail in the tophat_reports step. Removing that option allows Tophat to finish with no problems.

 I wish I had more specific information about this issue, but if you happen to run into an "[ERROR]" in tophat_reports with paired-end data, you might try running the sequences again without "--no-discordant" and see if that works.
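For anyone hitting this, here is a minimal sketch of the workaround. The `tophat` invocation, index name, and read paths below are hypothetical placeholders, not my actual run; the point is only that the rerun command is identical except for the dropped flag:

```shell
# Hypothetical failing command (index/read paths are placeholders):
CMD='tophat -p 8 --no-discordant -o out_dir genome_index reads_1.fastq reads_2.fastq'

# The workaround is simply the same command without --no-discordant:
FIXED=$(printf '%s' "$CMD" | sed 's/ --no-discordant//')
printf '%s\n' "$FIXED"
# tophat -p 8 -o out_dir genome_index reads_1.fastq reads_2.fastq
```

Note that without the flag TopHat will also report discordant pairs in accepted_hits.bam, so you may want to filter those downstream if you relied on that option.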

 Alex

Alex Williams

Sep 2, 2013, 2:53:15 PM
to tuxedo-to...@googlegroups.com
Looks like someone else is having this same problem as well: http://seqanswers.com/forums/showthread.php?t=24205

(Bringing the total number of independent cases that I know about up to 3.)

I am pretty sure this is in fact a bug in Tophat that needs to be addressed—something is wrong with the way "--no-discordant" works under certain scenarios with reasonably large input FASTQ files (2-3 GB for each half of the pair, when compressed).

Alex Williams

Sep 2, 2013, 2:54:47 PM
to tuxedo-to...@googlegroups.com
(To clarify, the post I am referring to on "seqanswers" is actually NOT one of the ones on the first page... it's the top one on page 2, here: http://seqanswers.com/forums/showthread.php?t=24205&page=2 )

zhech...@gmail.com

Dec 1, 2013, 8:25:53 AM
to tuxedo-to...@googlegroups.com
Hi!

I have encountered the same problem as you. But if I use a small test dataset (200,000 reads) with the same command, no ERROR occurs. I had used the "--no-discordant" parameter.

Have you solved this problem?

Zhe

A2z

Dec 26, 2013, 10:34:52 PM
to tuxedo-to...@googlegroups.com
I can confirm that I too see the issue with '--no-discordant'. The TopHat version was 2.0.10 (Bowtie 2.1.0). I notice the issue with both uncompressed FASTQ and gz-compressed FASTQ files, for many samples, with read counts ranging from 10 to 100 million. I had both paired and unpaired data in my input. The accepted_hits.bam files get generated when '--no-discordant' is not specified:

tophat -p 16 -r -50 --mate-std-dev=40 -g 1 --read-mismatches 3 --read-edit-dist 3 --no-novel-juncs --transcriptome-index=$g -o ${s} $i ${s}_trimmed_paired_1.fastq,${s}_trimmed_unpaired_1.fastq ${s}_trimmed_paired_2.fastq,${s}_trimmed_unpaired_2.fastq

Jake Freimer

Jan 8, 2014, 12:37:42 AM
to tuxedo-to...@googlegroups.com
I'm having this same problem. Tophat runs fine if I don't use the --no-discordant option and if I use a much smaller dataset. Has anyone found a solution?

JXW659

Jan 8, 2014, 9:18:07 PM
to tuxedo-to...@googlegroups.com
So, funny story... I got the same error without the --no-discordant option in TopHat 2.0.3. The run log has:

Error running /home/jxw659/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir Con2/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --max-mismatches 2 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p16 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations hg19/genes.gtf --gtf-juncs Con2/tmp/genes.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header Con2/tmp/genome_genome.bwt.samheader.sam --samtools=/home/jxw659/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 hg19/genome.fa Con2/junctions.bed Con2/insertions.bed Con2/deletions.bed Con2/fusions.out Con2/tmp/accepted_hits Con2/tmp/left_kept_reads.m2g_um.candidates_and_unspl.bam Con2/tmp/left_kept_reads.bam Con2/tmp/right_kept_reads.m2g_um.candidates_and_unspl.bam Con2/tmp/right_kept_reads.bam

Loaded 311244 junctions
I hypothesize that the error occurs in the call to samtools, as it seemed to fail midway through writing the accepted_hits.bam files.
I ran it on another server, where someone a bit more Unix-inclined had installed samtools, tophat, and bowtie2, and it ran all the way through. I seriously CANNOT understand this.

Zach Russ

Mar 11, 2014, 3:31:36 PM
to tuxedo-to...@googlegroups.com
I also got the same error when I ran it on ~200M paired-end reads. The full command was:

/Genomics/Software/tophat-2.0.11.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 20 --max-report-intron 1000 --min-isoform-fraction 0.15 --output-dir /Fast/IT1D/Mtruncatula_198/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 1000 --min-coverage-intron 50 --max-coverage-intron 1000 --min-segment-intron 20 --max-segment-intron 1000 --read-mismatches 4 --read-gap-length 2 --read-edit-dist 4 --read-realign-edit-dist 5 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations /Genomics/References/Mtrunc_v9/Mtruncatula_198_knownBT2.gff --gtf-juncs /Fast/IT1D/Mtruncatula_198/tmp/Mtruncatula_198_knownBT2.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header /Fast/IT1D/Mtruncatula_198/tmp/Mtruncatula_198_genome.bwt.samheader.sam --samtools=/Genomics/Software/samtools-0.1.19/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /Genomics/References/Mtrunc_v9/Mtruncatula_198.fa /Fast/IT1D/Mtruncatula_198/junctions.bed /Fast/IT1D/Mtruncatula_198/insertions.bed /Fast/IT1D/Mtruncatula_198/deletions.bed /Fast/IT1D/Mtruncatula_198/fusions.out /Fast/IT1D/Mtruncatula_198/tmp/accepted_hits /Fast/IT1D/Mtruncatula_198/tmp/left_kept_reads.m2g.bam,/Fast/IT1D/Mtruncatula_198/tmp/left_kept_reads.m2g_um.mapped.bam,/Fast/IT1D/Mtruncatula_198/tmp/left_kept_reads.m2g_um.candidates /Fast/IT1D/Mtruncatula_198/tmp/left_kept_reads.bam /Fast/IT1D/Mtruncatula_198/tmp/right_kept_reads.m2g.bam,/Fast/IT1D/Mtruncatula_198/tmp/right_kept_reads.m2g_um.mapped.bam,/Fast/IT1D/Mtruncatula_198/tmp/right_kept_reads.m2g_um.candidates /Fast/IT1D/Mtruncatula_198/tmp/right_kept_reads.bam

It gets as far as writing unmapped_right_4.bam (and all the other accepted and unmapped hits before it) and then dies. The error is "floating point exception".


Zach Russ

Mar 11, 2014, 3:34:42 PM
to tuxedo-to...@googlegroups.com
Also, the command I ran was:

tophat --num-threads=8 --min-intron-length=20 --max-intron-length=1000 -G /${REFDIR}/${REF}_gene_exons.gff3 --no-discordant --read-mismatches=4 --read-edit-dist=4 --no-mixed --transcriptome-index=/${REFDIR}/${REF}_knownBT2 --output-dir /${OUTPUTDIR}/${PLANT}/${REF} /${REFDIR}/${REF} /${DATADIR}/${PLANT}_1.fasta,/${DATADIR}/${PLANT}-STFC.fasta /${DATADIR}/${PLANT}_2.fasta

Claudia Armenise

Apr 2, 2014, 9:14:50 AM
to tuxedo-to...@googlegroups.com
Hi,

I am also encountering the same issue for some single-end RNA-seq libraries, with both Tophat 2.0.9 and Tophat 2.0.11, without using the "--no-discordant" option. This is the command I run:

tophat -o my_out_dir --bowtie1 --no-coverage --fusion-multireads 1 --fusion-multipairs 1 --max-intron-length 100000 --fusion-min-dist 10000 --fusion-anchor-length 10  --no-convert-bam -p 12 bowtie1_index/my_index my_fastq_file.fastq

The issue is not fixed by switching to Bowtie 2 indexes. Has anyone found a patch?

Thank you very much

lmolla

Jul 30, 2014, 3:20:10 PM
to tuxedo-to...@googlegroups.com
I am working with 50 bp PE reads and I also get the same error when I use the --no-discordant option. Was anyone able to fix this?
Thanks

[2014-07-30 18:52:16] Reporting output tracks

[FAILED]

Error running /home/ubuntu/software/tophat-2.0.12.Linux_x86_64/tophat_reports ********************

Loaded 217630 junctions

Claudia Armenise

Jul 31, 2014, 2:56:21 AM
to tuxedo-to...@googlegroups.com
Our CIO Karl Forner worked on it and proposes a patch that works for us: https://groups.google.com/forum/#!topic/tuxedo-tools-users/0pcEVAOCBaw
Please let us know if it solves your issue

Best regards,
Claudia Armenise
Quartz Bio

Fan Li

Aug 4, 2014, 7:10:55 PM
to tuxedo-to...@googlegroups.com
Hi Claudia,

I'm still getting the tophat_reports error on Tophat v2.0.12 even with your proposed patch. Turning off multithreading didn't help for me, and running the standalone tophat_reports command gave a bit more information about a floating point exception.


Andrew Oler

Oct 4, 2014, 11:58:39 AM
to tuxedo-to...@googlegroups.com
I had this same problem with TopHat v2.0.13 using the --no-discordant option on some large files. Rather than redo the whole tophat run without the --no-discordant option, I just ran the last step (tophat_reports) with tophat --resume using TopHat version 2.0.8b, and it went to completion. Note that Bowtie 2.2.0 appears not to be compatible with TopHat 2.0.8b, so I had to downgrade to Bowtie 2.1.0 on my PATH to avoid an error.
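A sketch of that resume workaround. The install directories and the output-directory name below are assumptions for illustration; adjust them to wherever your downgraded TopHat 2.0.8b and Bowtie 2.1.0 actually live:

```shell
# Put the downgraded TopHat 2.0.8b and Bowtie 2.1.0 first on PATH
# (these /opt paths are hypothetical; use your own install locations):
export PATH=/opt/tophat-2.0.8b.Linux_x86_64:/opt/bowtie2-2.1.0:$PATH

# Sanity-check which directory wins the PATH lookup:
printf '%s\n' "$PATH" | cut -d: -f1
# /opt/tophat-2.0.8b.Linux_x86_64

# Then rerun only the failed step from the existing output directory
# (not executed here; my_tophat_out is the original -o directory):
# tophat --resume my_tophat_out
```

Since --resume picks up from the intermediate files already in the output directory, only the final tophat_reports stage is redone, which saves the hours of alignment work.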

As a side note, based on my anecdotal experience with this bug, I think it was introduced in 2.0.9 as I don't think I've ever had this problem in 2.0.8b or earlier. 