rsem-prepare-reference using UCSC annotation fails with exons from different orientations

Margaret Starostik

Jul 3, 2019, 1:56:58 PM
to RSEM Users
I am getting the following error upon building RSEM references using UCSC annotation:

"According to the GTF file given, transcript NM_001026033 has exons from different orientations! "rsem-extract-reference-transcripts RSEM/ucsc_ce10 0 ucsc_ce10_compiled.gtf None 1 knownIsoforms.txt ucsc_ce10_compiled.fa" failed! Please check if you provide correct parameters/options for the pipeline!"

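The error message describes a transcript whose exon records do not all carry the same strand. A minimal sketch for listing every such transcript in a GTF follows. It assumes the file is tab-delimited, with the strand in column 7 and a quoted transcript_id in column 9. The function name mixed_strand_transcripts is hypothetical and is not part of RSEM.

```python
import re
from collections import defaultdict

# Matches the quoted transcript_id attribute in GTF column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def mixed_strand_transcripts(gtf_lines):
    """Return {transcript_id: set of strands} for transcripts whose
    exon records appear on more than one strand."""
    strands = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        # Assumption: 9 tab-separated columns; only exon records are checked.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            strands[match.group(1)].add(fields[6])
    return {tid: s for tid, s in strands.items() if len(s) > 1}
```

Calling mixed_strand_transcripts(open("ucsc_ce10_compiled.gtf")) would be expected to include NM_001026033 in its result if the GTF really does place that ID on both strands.
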
What I did:
(1) Downloaded GTF from UCSC Table Browser

Assembly: Oct.2010 (WS220/ce10) 
Group: Genes and Gene Predictions; Track: RefSeq Genes
Table: refGene
Output format: GTF

The first few lines of the file:
chrI ce10_refGene stop_codon 8378299 8378301 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene CDS 8378302 8378421 0.000000 - 0 gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene exon 8378299 8378421 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene CDS 8379131 8379233 0.000000 - 1 gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene exon 8379131 8379233 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene CDS 8379700 8379809 0.000000 - 0 gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene exon 8379700 8379809 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene CDS 8380324 8380439 0.000000 - 2 gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene exon 8380324 8380439 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066"; 
chrI ce10_refGene CDS 8380905 8380979 0.000000 - 2 gene_id "NM_182066"; transcript_id "NM_182066"; 

(2) Created knownIsoforms.txt from the GTF above since it's not available to download for C. elegans 
 
The first few lines of the file:
NM_182066 NM_182066
NM_059873 NM_059873
NM_001129046 NM_001129046
NR_070240 NR_070240
NR_052806 NR_052806 
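
For reference, a mapping file like the one above can be derived from the GTF roughly as sketched below. This is an illustration under assumptions, not the exact procedure used: it assumes a tab-delimited GTF with quoted gene_id and transcript_id attributes in column 9, and it writes one tab-separated "gene_id transcript_id" pair per line, the layout the RSEM documentation gives for --transcript-to-gene-map. The function names are hypothetical.

```python
import re

# Match the quoted gene_id and transcript_id attributes in GTF column 9.
GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gene_transcript_pairs(gtf_lines):
    """Collect unique (gene_id, transcript_id) pairs in order of first appearance."""
    seen = set()
    pairs = []
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip comments, blank lines, and malformed records
        gene = GENE_ID.search(fields[8])
        transcript = TRANSCRIPT_ID.search(fields[8])
        if gene and transcript:
            pair = (gene.group(1), transcript.group(1))
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs


def format_map(pairs):
    """Render pairs as tab-separated lines: gene_id, then transcript_id."""
    return "".join(f"{gene}\t{transcript}\n" for gene, transcript in pairs)
```

Writing format_map(gene_transcript_pairs(open("ucsc_ce10_compiled.gtf"))) to knownIsoforms.txt would produce the file. Because the refGene GTF output sets gene_id equal to transcript_id, every transcript ends up as its own gene in this map.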

(3) Ran RSEM as follows: 
rsem-prepare-reference --gtf ucsc_ce10_compiled.gtf \
      --transcript-to-gene-map knownIsoforms.txt \
      chrI.fa,chrII.fa,chrIII.fa,chrIV.fa,chrV.fa,chrM.fa,chrX.fa \
      RSEM/ucsc_ce10
     
I have used RSEM extensively in the past with human Ensembl annotation and had no problems, but this is my first time using the UCSC annotation for C. elegans.

Thanks,
Margaret
