I get the following error when building an RSEM reference from UCSC annotation:
"According to the GTF file given, transcript NM_001026033 has exons from different orientations! "rsem-extract-reference-transcripts RSEM/ucsc_ce10 0 ucsc_ce10_compiled.gtf None 1 knownIsoforms.txt ucsc_ce10_compiled.fa" failed! Please check if you provide correct parameters/options for the pipeline!"
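To see how many transcripts are affected, a check along these lines could be run on the GTF. This is a minimal Python sketch, not part of the workflow described below. It lists every transcript_id whose exon records fall on more than one strand or chromosome, which is the condition the error reports for NM_001026033. It assumes standard 9-column GTF lines. One possible cause, which is an assumption and not confirmed here, is that refGene places the same accession at more than one genomic locus, and the Table Browser GTF uses that accession as both gene_id and transcript_id, so exons from different loci end up under one transcript.

```python
import re
import sys
from collections import defaultdict

# Pull the transcript_id value out of the GTF attribute column.
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def find_inconsistent_transcripts(lines):
    """Return {transcript_id: {(chrom, strand), ...}} for every transcript
    whose exon records are not all on a single chromosome and strand."""
    loci = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        # Split into the 8 fixed columns plus the attribute column; this works
        # for tab-separated files and for space-separated pasted text.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records matter for this check
        match = TID_RE.search(fields[8])
        if match:
            loci[match.group(1)].add((fields[0], fields[6]))
    # Keep only transcripts seen on more than one (chromosome, strand) pair.
    return {tid: locs for tid, locs in loci.items() if len(locs) > 1}


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as gtf:
        for tid, locs in sorted(find_inconsistent_transcripts(gtf).items()):
            print(tid, *sorted(locs), sep="\t")
```

Saved under a hypothetical name such as find_mixed_strand.py, it would be run as `python find_mixed_strand.py ucsc_ce10_compiled.gtf`. Note that it does not flag an accession placed twice on the same chromosome and strand.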
What I did:
(1) Downloaded GTF from UCSC Table Browser
Assembly: Oct. 2010 (WS220/ce10)
Group: Genes and Gene Predictions; Track: RefSeq Genes; Table: refGene
Output format: GTF
The first few lines of the file:
chrI ce10_refGene stop_codon 8378299 8378301 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene CDS 8378302 8378421 0.000000 - 0 gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene exon 8378299 8378421 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene CDS 8379131 8379233 0.000000 - 1 gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene exon 8379131 8379233 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene CDS 8379700 8379809 0.000000 - 0 gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene exon 8379700 8379809 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene CDS 8380324 8380439 0.000000 - 2 gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene exon 8380324 8380439 0.000000 - . gene_id "NM_182066"; transcript_id "NM_182066";
chrI ce10_refGene CDS 8380905 8380979 0.000000 - 2 gene_id "NM_182066"; transcript_id "NM_182066";
(2) Created knownIsoforms.txt from the GTF above, since it is not available for download for C. elegans
The first few lines of the file:
NM_182066 NM_182066
NM_059873 NM_059873
NM_001129046 NM_001129046
NR_070240 NR_070240
NR_052806 NR_052806
(3) Ran RSEM as follows:
rsem-prepare-reference --gtf ucsc_ce10_compiled.gtf \
--transcript-to-gene-map knownIsoforms.txt \
chrI.fa,chrII.fa,chrIII.fa,chrIV.fa,chrV.fa,chrM.fa,chrX.fa \
RSEM/ucsc_ce10
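Step (2) does not say exactly how knownIsoforms.txt was generated. One way to derive it from the GTF is sketched below in Python; this is an illustration, not the procedure actually used. It assumes the tab-separated "gene_id transcript_id" line format that RSEM's --transcript-to-gene-map option describes, and it keeps unique pairs in order of first appearance. Because the UCSC refGene GTF sets gene_id to the transcript accession (as in the excerpt above), every transcript becomes its own gene, which matches the file shown in step (2).

```python
import re
import sys

# Pull gene_id and transcript_id values out of the GTF attribute column.
GENE_RE = re.compile(r'gene_id "([^"]+)"')
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def gene_transcript_pairs(lines):
    """Return unique (gene_id, transcript_id) pairs in order of first appearance."""
    seen = set()
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split(None, 8)  # 8 fixed columns + attribute column
        if len(fields) < 9:
            continue
        gene = GENE_RE.search(fields[8])
        tx = TID_RE.search(fields[8])
        if gene and tx:
            pair = (gene.group(1), tx.group(1))
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs


def write_known_isoforms(pairs, out):
    """Write one tab-separated 'gene_id<TAB>transcript_id' line per pair."""
    for gene_id, transcript_id in pairs:
        out.write(f"{gene_id}\t{transcript_id}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as gtf, open(sys.argv[2], "w") as out:
        write_known_isoforms(gene_transcript_pairs(gtf), out)
```

The two arguments are the input GTF and the output mapping file, e.g. ucsc_ce10_compiled.gtf and knownIsoforms.txt.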
I have used RSEM extensively in the past with human Ensembl annotation without any problems, but this is the first time I am using UCSC annotation for C. elegans.
Thanks,
Margaret