I am using TopHat 2 (version 2.0.10) with Bowtie 2 to map Illumina 2x101 b non-stranded paired-end RNA sequencing data to the reference human genome/transcriptome.
In one scenario, the fastq files for the left and right reads (paired) are used as input for TopHat. The BAM files that are generated can be read by RSeQC's infer_experiment.
But in another scenario, I first process the fastq files through Trimmomatic (1.3.2) to remove adapter/contaminant and poor-quality sub-sequences from the reads, thus generating four fastq files (paired and unpaired reads for each mate) from the two input fastq files (see the software's website for more). These four files are then used as input for TopHat2. The BAM files that are generated are usable with some other software but cannot be read by RSeQC's infer_experiment (see output below).
Is it that RSeQC has issues with BAM files that are generated with TopHat using both unpaired and paired data in the same TopHat run? (The ability to use both paired and unpaired data was introduced in TopHat version 2.0.10.)
RSeQC: version 2.3.7 on 64-bit Linux with Python 2.7.3
Command: /software/RSeQC-2.3.7/usr/global/python-2.7.3/bin/infer_experiment.py -r /ref/hg19_UCSC_RefSeqGenes.bed -i /bam/test.bam
Terminal/stdout output:
Reading reference gene model /ref/hg19_UCSC_RefSeqGenes.bed ... Done
Loading SAM/BAM file ... Finished
Total 0 usable reads were sampled
Unknown Data type
TopHat command: tophat -p 16 -g 1 -r 50 -G=... -o test ... test_trimmed_paired_1.fastq.gz,test_trimmed_unpaired_1.fastq.gz test_trimmed_paired_2.fastq.gz,test_trimmed_unpaired_2.fastq.gz
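To confirm that the BAM file really does mix paired and unpaired records, the SAM FLAG field can be tallied. Below is a minimal Python sketch, assuming the alignments are dumped as SAM text (for example with samtools view). The function names and the three sample records are made up for illustration and are not taken from my data. It relies only on the standard SAM flag bits (0x1 paired, 0x40 first in pair, 0x80 second in pair, 0x100 secondary, 0x800 supplementary). Whether such a mixture is what trips up infer_experiment is exactly what I am asking, so this only checks the premise.

```python
from collections import Counter

def classify_flag(flag):
    """Classify a SAM FLAG value as unpaired, read 1, or read 2."""
    if not flag & 0x1:          # 0x1: template has multiple segments (paired)
        return "unpaired"
    if flag & 0x40:             # 0x40: first segment in the template
        return "paired_read1"
    if flag & 0x80:             # 0x80: last segment in the template
        return "paired_read2"
    return "paired_unknown"

def count_read_types(sam_lines):
    """Count primary alignment records by pairing status."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue            # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:        # skip secondary (0x100) and supplementary (0x800)
            continue
        counts[classify_flag(flag)] += 1
    return counts

# Synthetic example records (illustrative only, not real output).
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t99\tchr1\t100\t50\t101M\t=\t300\t301\t*\t*",
    "r1\t147\tchr1\t300\t50\t101M\t=\t100\t-301\t*\t*",
    "r2\t16\tchr1\t500\t50\t80M\t*\t0\t0\t*\t*",
]
for read_type, n in sorted(count_read_types(example).items()):
    print(read_type, n)
```

For a real file, the same function can be given sys.stdin while piping in the output of samtools view /bam/test.bam.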