STAR index generation using gff annotation file


Scott Cinel

Feb 2, 2016, 10:00:46 AM
to rna-star
Hi all, 

I am trying to generate a genome index for the Spodoptera frugiperda genome using the latest assembly and annotation files available from NCBI. The only annotations available are in GFF format, which is giving me trouble when I try to create the index.

I've tried using --sjdbGTFtagExonParentTranscript Parent, but this still fails, and the error output states that the GTF file is not in the correct format. I've looked through the GFF file for the correct identifier but am a bit lost.

Attached are the error output, log, and the code I am using to call STAR. Thanks for any help you can give!

Here is my code:

STAR --runThreadN $PBS_NUM_PPN \
     --runMode genomeGenerate \
     --genomeDir /home/a-m/cinel1/starIndex \
     --genomeFastaFiles /home/a-m/cinel1/star-runs/star-references/SfrugiperdaGenome.fna \
     --sjdbGTFfile /home/a-m/cinel1/star-runs/star-references/SfrugiperdaAnnotation.gff \
     --sjdbGTFtagExonParentTranscript Parent \
     --sjdbOverhang 99



And here are a few lines from the gff file:

##sequence-region JQCY02000001.1 1 236665
JQCY02000001.1 Genbank region 1 236665 . + . ID=id0;Dbxref=taxon:7108;cell-line=Sf21;gbkey=Src;mol_type=genomic DNA;tissue-type=ovarian
##sequence-region JQCY02000002.1 1 268131
JQCY02000002.1 Genbank region 1 268131 . + . ID=id1;Dbxref=taxon:7108;cell-line=Sf21;gbkey=Src;mol_type=genomic DNA;tissue-type=ovarian
##sequence-region JQCY02000003.1 1 641448
JQCY02000003.1 Genbank region 1 641448 . + . ID=id2;Dbxref=taxon:7108;cell-line=Sf21;gbkey=Src;mol_type=genomic DNA;tissue-type=ovarian
##sequence-region JQCY02000004.1 1 301894
JQCY02000004.1 Genbank region 1 301894 . + . ID=id3;Dbxref=taxon:7108;cell-line=Sf21;gbkey=Src;mol_type=genomic DNA;tissue-type=ovarian
##sequence-region JQCY02000005.1 1 153123
JQCY02000005.1 Genbank region 1 153123 . + . ID=id4;Dbxref=taxon:7108;cell-line=Sf21;gbkey=Src;mol_type=genomic DNA;tissue-type=ovarian
##sequence-region JQCY02000006.1 1 164006
JQCY02000006.1 Genbank region 1 164006 . + . ID=id5;Dbxref=taxon:7108;cell-line=Sf21;gbkey=Src;mol_type=genomic DNA;tissue-type=ovarian
Log.out
starindexgenerate.o1835781

Alexander Dobin

Feb 5, 2016, 3:10:33 PM
to rna-star
Hi Scott,

is this the gff file you are using:

It contains no exons, just regions, and they all start at position 1. So this file does not seem to describe genes.
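One way to confirm this kind of diagnosis is to tally the feature types in column 3 of the annotation. The following is a minimal sketch, not part of STAR; the function name `count_feature_types` is hypothetical, the sample lines are abbreviated from the excerpt above, and it assumes the real file is tab-separated as GFF requires.

```python
from collections import Counter

def count_feature_types(gff_lines):
    """Count the values in column 3 (feature type) of GFF records."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and directives/comments such as ##sequence-region.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        counts[fields[2]] += 1
    return counts

# Two records abbreviated from the posted excerpt (attributes shortened).
sample = [
    "##sequence-region JQCY02000001.1 1 236665",
    "JQCY02000001.1\tGenbank\tregion\t1\t236665\t.\t+\t.\tID=id0;Dbxref=taxon:7108",
    "##sequence-region JQCY02000002.1 1 268131",
    "JQCY02000002.1\tGenbank\tregion\t1\t268131\t.\t+\t.\tID=id1;Dbxref=taxon:7108",
]
counts = count_feature_types(sample)
print(dict(counts))
print("has exons:", counts["exon"] > 0)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `count_feature_types(open("SfrugiperdaAnnotation.gff"))`. If no `exon` records are counted, there are no exons from which splice junctions could be extracted.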

Cheers
Alex

Luise Zühl

Mar 9, 2018, 3:45:54 PM
to rna-star
Hi Alex,

I have a similar problem generating a genome index with my .gff3 annotation file and don't know how to solve it. Would you be so kind as to have a look at the command, please?
Do I need to use another annotation format? Or is something wrong with the options I chose?

Thanks a lot already in advance :-)

The command I ran was basically as follows:
STAR --runMode genomeGenerate --genomeDir ./STARIndex --genomeChrBinNbits 16 --limitGenomeGenerateRAM 30000000000 --genomeFastaFiles ./Bstricta_278_v1.fa --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfile ./Bstricta_278_v1.2.gene_exon.gff3 --runThreadN 8


and the error:
Mar 02 16:59:23 ..... processing annotations GTF

FATAL error, could not open file sjdbGTFfile=./Bstricta_278_v1.2.gene_exon.gff3

Mar 02 16:59:23 ...... FATAL ERROR, exiting
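The message says the file could not be opened, so before changing formats it may be worth ruling out a path or permission problem, since `./` is resolved relative to the directory STAR is launched from. A minimal sketch of such a check follows; `check_readable` is a hypothetical helper, not a STAR feature.

```python
import os

def check_readable(path):
    """Return a short diagnosis of why a path might fail to open."""
    absolute = os.path.abspath(path)  # resolve relative to the current directory
    if not os.path.exists(absolute):
        return f"missing: {absolute}"
    if not os.path.isfile(absolute):
        return f"not a regular file: {absolute}"
    if not os.access(absolute, os.R_OK):
        return f"not readable: {absolute}"
    return f"ok: {absolute}"

# Run this from the same working directory used for the STAR command.
print(check_readable("./Bstricta_278_v1.2.gene_exon.gff3"))
```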


here are some lines from the .gff3 annotation file:

Scaffold13129   phytozomev10    CDS     1618453 1618512 .       -       0       ID=Bostr.13129s0352.1.v1.2.CDS.17;Parent=Bostr.13129s0352.1.v1.2;pacid=30672932
Scaffold13129   phytozomev10    exon    1617880 1618364 .       -       .       ID=Bostr.13129s0352.1.v1.2.exon.18;Parent=Bostr.13129s0352.1.v1.2;pacid=30672932
Scaffold13129   phytozomev10    three_prime_UTR 1617880 1618247 .       -       .       ID=Bostr.13129s0352.1.v1.2.three_prime_UTR.1;Parent=Bostr.13129s0352.1.v1.2;pacid=30672932
Scaffold13129   phytozomev10    CDS     1618248 1618364 .       -       0       ID=Bostr.13129s0352.1.v1.2.CDS.18;Parent=Bostr.13129s0352.1.v1.2;pacid=30672932
Scaffold13129   phytozomev10    mRNA    1617880 1624574 .       -       .       ID=Bostr.13129s0352.2.v1.2;Name=Bostr.13129s0352.2;pacid=30672933;longest=0;Parent=Bostr.13129s0352.v1.2
Scaffold13129   phytozomev10    exon    1624377 1624574 .       -       .       ID=Bostr.13129s0352.2.v1.2.exon.1;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
Scaffold13129   phytozomev10    CDS     1624377 1624417 .       -       0       ID=Bostr.13129s0352.2.v1.2.CDS.1;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
Scaffold13129   phytozomev10    five_prime_UTR  1624418 1624574 .       -       .       ID=Bostr.13129s0352.2.v1.2.five_prime_UTR.1;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
Scaffold13129   phytozomev10    exon    1623530 1624016 .       -       .       ID=Bostr.13129s0352.2.v1.2.exon.2;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
Scaffold13129   phytozomev10    CDS     1623530 1624016 .       -       1       ID=Bostr.13129s0352.2.v1.2.CDS.2;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
Scaffold13129   phytozomev10    exon    1623223 1623314 .       -       .       ID=Bostr.13129s0352.2.v1.2.exon.3;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
Scaffold13129   phytozomev10    CDS     1623223 1623314 .       -       0       ID=Bostr.13129s0352.2.v1.2.CDS.3;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
Scaffold13129   phytozomev10    exon    1622755 1622919 .       -       .       ID=Bostr.13129s0352.2.v1.2.exon.4;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933
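With --sjdbGTFtagExonParentTranscript Parent, STAR is told to take each exon's transcript from the Parent attribute in column 9. The sketch below shows how that attribute can be pulled out of exon records to check that it is present. The helper names are hypothetical, the sample lines are copied from the excerpt above, and tab separation is assumed.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs

def exon_parents(gff_lines, parent_tag="Parent"):
    """Return the parent_tag value (or None) for every exon record."""
    parents = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        parents.append(parse_gff3_attributes(fields[8]).get(parent_tag))
    return parents

sample = [
    "Scaffold13129\tphytozomev10\tCDS\t1618453\t1618512\t.\t-\t0\t"
    "ID=Bostr.13129s0352.1.v1.2.CDS.17;Parent=Bostr.13129s0352.1.v1.2;pacid=30672932",
    "Scaffold13129\tphytozomev10\texon\t1617880\t1618364\t.\t-\t.\t"
    "ID=Bostr.13129s0352.1.v1.2.exon.18;Parent=Bostr.13129s0352.1.v1.2;pacid=30672932",
    "Scaffold13129\tphytozomev10\texon\t1624377\t1624574\t.\t-\t.\t"
    "ID=Bostr.13129s0352.2.v1.2.exon.1;Parent=Bostr.13129s0352.2.v1.2;pacid=30672933",
]
parents = exon_parents(sample)
print(parents)
```

Any `None` in the returned list would mark an exon record without the expected tag.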


Alexander Dobin

Mar 9, 2018, 4:05:58 PM
to rna-star
Hi Luise,

please try converting the GFF to GTF. You can use gffread from the Cufflinks package:
$ gffread my.gff -T -o my.gtf

Cheers
Alex
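After a conversion like this, it can help to check that the resulting GTF has exon records carrying a transcript_id attribute, since those are what STAR looks for by default (`exon` in column 3, `transcript_id` as the parent-transcript tag). The following is a minimal sketch; the function name is hypothetical and the sample records are made up for illustration, not real gffread output.

```python
import re

# GTF attributes look like: transcript_id "X"; gene_id "Y";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def exons_missing_transcript_id(gtf_lines):
    """Return (exon count, 1-based line numbers of exons lacking transcript_id)."""
    total = 0
    missing = []
    for number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        total += 1
        if not TRANSCRIPT_ID.search(fields[8]):
            missing.append(number)
    return total, missing

# Hypothetical GTF records for illustration only.
sample = [
    'chrA\tsrc\texon\t100\t200\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";',
    'chrA\tsrc\tCDS\t120\t200\t.\t+\t0\ttranscript_id "t1"; gene_id "g1";',
    'chrA\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1";',
]
total, missing = exons_missing_transcript_id(sample)
print("exon records:", total, "missing transcript_id at lines:", missing)
```

On a real file, pass an open handle such as `open("my.gtf")`. If the exon count is zero, the index would have no annotated junctions to insert.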