Failed to work with UCSC GTF

52 views
Skip to first unread message

Vang Le Quy

unread,
Apr 23, 2015, 4:01:21 AM4/23/15
to qual...@googlegroups.com
Hellow Qualimap developers,

Since Rat genome rn6 assembly does not have ENSEMBL release yet, I am trying to make qualimap work with RN6 release from UCSC. I followed instructions here to create a GTF file: http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format

And I used the following command to fetch the GTF file:

./genePredToGtf -utr -honorCdsStat rn6 refGene rn6.refGene2.gtf

head rn6.refGene2.gtf
chr1    refGene exon    396700  396905  .       +       .       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "1"; exon_id "NM_001099460.1"; gene_name "Vom2r3";
chr1    refGene CDS     396700  396905  .       +       0       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "1"; exon_id "NM_001099460.1"; gene_name "Vom2r3";
chr1    refGene exon    399545  399827  .       +       .       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "2"; exon_id "NM_001099460.2"; gene_name "Vom2r3";
chr1    refGene CDS     399545  399827  .       +       1       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "2"; exon_id "NM_001099460.2"; gene_name "Vom2r3";
chr1    refGene exon    400256  401059  .       +       .       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "3"; exon_id "NM_001099460.3"; gene_name "Vom2r3";
chr1    refGene CDS     400256  401059  .       +       0       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "3"; exon_id "NM_001099460.3"; gene_name "Vom2r3";
chr1    refGene exon    401912  402136  .       +       .       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "4"; exon_id "NM_001099460.4"; gene_name "Vom2r3";
chr1    refGene CDS     401912  402136  .       +       0       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "4"; exon_id "NM_001099460.4"; gene_name "Vom2r3";
chr1    refGene exon    407504  407627  .       +       .       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "5"; exon_id "NM_001099460.5"; gene_name "Vom2r3";
chr1    refGene CDS     407504  407627  .       +       0       gene_id "Vom2r3"; transcript_id "NM_001099460"; exon_number "5"; exon_id "NM_001099460.5"; gene_name "Vom2r3";


And Qualimap command:

qualimap rnaseq --java-mem-size=16G -bam ./RNAseq.bam -gtf rn6.refGene2.gtf 

I got the following error, and could not find the problematic lines. I tried running 
 gawk '{ if ($4 > $5) print $0 }' rn6.refGene2.gtf >ucsc.badlines.gtf 
This gave emtpy ucsc.badlines.gtf  file.
=================ERROR============================
Java memory size is set to 16G
Launching application...

QualiMap v.2.0
Built on 2014-08-28 17:03

Selected tool: rnaseq
Thu Apr 23 09:32:40 CEST 2015           WARNING output folder already exists
Initializing regions from rn6.refGene2.gtf...

Initialized 100000 regions...
Initialized 200000 regions...
Initialized 300000 regions...
Initialized 400000 regions...
Failed to run rnaseq

java.lang.IllegalArgumentException: start must be less than or equal to end!
        at net.sf.picard.util.Interval.<init>(Interval.java:74)
        at net.sf.picard.util.Interval.<init>(Interval.java:52)
        at org.bioinfo.ngs.qc.qualimap.common.TranscriptDataHandler.createHelperMaps(TranscriptDataHandler.java:674)
        at org.bioinfo.ngs.qc.qualimap.common.TranscriptDataHandler.constructTranscriptsMap(TranscriptDataHandler.java:174)
        at org.bioinfo.ngs.qc.qualimap.process.ComputeCountsTask.loadRegionsFromGTF(ComputeCountsTask.java:565)
        at org.bioinfo.ngs.qc.qualimap.process.ComputeCountsTask.initRegions(ComputeCountsTask.java:474)
        at org.bioinfo.ngs.qc.qualimap.process.ComputeCountsTask.run(ComputeCountsTask.java:383)
        at org.bioinfo.ngs.qc.qualimap.process.RNASeqQCAnalysis.run(RNASeqQCAnalysis.java:67)
        at org.bioinfo.ngs.qc.qualimap.main.RnaSeqQcTool.execute(RnaSeqQcTool.java:135)
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartTool.run(NgsSmartTool.java:187)
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartMain.main(NgsSmartMain.java:103)
Thu Apr 23 09:32:46 CEST 2015           WARNING Cleanup output dir

Konstantin Okonechnikov

unread,
Apr 24, 2015, 6:17:42 AM4/24/15
to qual...@googlegroups.com
Hi!

Thanks for report!

Could you please share the GTF file and a small subsample from BAM file (using drive.google.com or dropbox)?

Or provide the link to Pred file, I ll try to convert it to GTF, maybe the problem with this conversion.
--
  Konstantin

 


--
You received this message because you are subscribed to the Google Groups "QualiMap" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qualimap+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vang Le Quy

unread,
Apr 27, 2015, 3:53:26 AM4/27/15
to qual...@googlegroups.com, k.okone...@gmail.com
hi Konstantin,
You can download GTF and BAM files from here: https://drive.google.com/folderview?id=0B7pviyx9Rsf1fnNucFZsZllFTXRrRlVqQnBJc0MwUXU5RW9BZEx3a1A2R0VGZ21EeU9tb1U&usp=sharing. I followed exactly the procedure I mentioned in the previous post to create the GTF files. 
The problem occurs when qualimap is parsing the GTF file, so it happens to any BAM file when I use the GTF file.

Thanks
Vang

Konstantin Okonechnikov

unread,
Apr 27, 2015, 9:44:52 AM4/27/15
to qual...@googlegroups.com
Dear Vang,

try using version v2.1, it includes several fixes for annotation file processing:

I tried running analysis with this latest version on the data you shared and it worked fine. I've attached both results.

Please, run novel version on full datasets and let me know if everything is fine.

--
  Konstantin


subsample_rnaseq_qc_rn6.refFlat.zip
subsample_rnaseq_qc_rna6.refGene2.zip

Vang Le Quy

unread,
Apr 28, 2015, 7:14:17 AM4/28/15
to qual...@googlegroups.com, k.okone...@gmail.com
Indeed! It solved the problem. Thank you very much for your help, Konstantin.

~ Vang
Reply all
Reply to author
Forward
0 new messages