RSeQC 2.3.7 infer_experiment failure

A2z

unread,

Nov 30, 2013, 8:31:04 PM11/30/13

to rseqc-...@googlegroups.com

I am using TopHat 2 (version 2.0.10) with Bowtie 2 to map Illumina 2x101 b non-stranded paired-end RNA sequencing data to the reference human genome/transcriptome.

In one scenario, the fastq files for the left and read reads (paired) are used as input for TopHat. The BAM files that are generated get read by RSeQC's infer_experiment.

But in another scenario, I first process the fastq files through trimmomatic (1.3.2) software to remove adapter/contaminant and poor quality sub-sequences from the reads, thus generating four fastq files from the two input fastq files (see the software's web-site for more). The output of this is then used for TopHat2. The BAM files that are generated are usable with some other softwares but do not get read by RSeQC's infer_experiment (see output below).

Is it that RSeQC has issues with BAM files that are generated with TopHat using both unpaired and paired data in the same TopHat run? (The ability to use both paired and unpaired data has been introduced in TopHat version 2.0.10.)

RSeQC: version 2.3.7 on 64-bit Linux with Python 2.7.3

Command: /software/RSeQC-2.3.7/usr/global/python-2.7.3/bin/infer_experiment.py -r /ref/hg19_UCSC_RefSeqGenes.bed -i /bam/test.bam

Terminal/stdout output:
Reading reference gene model /ref/hg19_UCSC_RefSeqGenes.bed ... Done
Loading SAM/BAM file ... Finished
Total 0 usable reads were sampled
Unknown Data type

TopHat command: tophat -p 16 -g 1 -r 50 -G=... -o test ... test_trimmed_paired_1.fastq.gz,test_trimmed_unpaired_1.fastq.gz test_trimmed_paired_2.fastq.gz,test_trimmed_unpaired_2.fastq.gz

Liguo Wang

unread,

Dec 1, 2013, 12:02:03 AM12/1/13

to rseqc-...@googlegroups.com

You are right. RSeQC can NOT tell the strand relationships (eg. read_1 mapped to '-' strand indicates a gene located on '+' strand) from BAM file that is a mixture of pair-end sequencing and single-end sequencing.

For a single RNA-seq experiment, it could be either pair-end sequencing or single-end sequencing, no other possibility.

We believe it is not a good practice to mix pair-end and single-end sequencing together and run 'infer_experiment.py', because the strand relationship might be conflicting.

Thanks

Liguo

--
You received this message because you are subscribed to the Google Groups "rseqc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rseqc-discus...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

A2z

unread,

Dec 1, 2013, 1:32:28 PM12/1/13

to rseqc-...@googlegroups.com

Thanks for the clarification. I hope RSeQC does get the feature to handle such BAM files in the future. It's a very nice and useful software.

A2z

unread,

Dec 1, 2013, 10:33:40 PM12/1/13

to rseqc-...@googlegroups.com

I want to point out that though infer_experiment.py failed, some other RSeQC utilities seem to work. I tested bam_stat.py and inner_distance.py.

On Sunday, December 1, 2013 12:02:03 AM UTC-5, Liguo Wang wrote:

Reply all

Reply to author

Forward