Generating genome index with gff annotation file


Konika Chawla

Aug 17, 2015, 11:04:23 AM
to rna-star
Hi,
I am not clear on what value to give --sjdbGTFtagExonParentTranscript when I use a GFF annotation file to generate the genome index.

My command looks like this:
STAR  --runMode genomeGenerate --genomeDir /local/data/genomes/gff/star --genomeFastaFiles GCF_000001635.23_GRCm38.p3_genomic.fna --sjdbGTFfile GCF_000001635.23_GRCm38.p3_genomic.gff

Thanks in advance.


Konika Chawla

Aug 18, 2015, 7:18:46 AM
to rna-star
I think this will work:
STAR  --runMode genomeGenerate --genomeDir /local/data/genomes/gff/star --genomeFastaFiles GCF_000001635.23_GRCm38.p3_genomic.fna --sjdbGTFfile GCF_000001635.23_GRCm38.p3_genomic.gff --sjdbGTFtagExonParentTranscript gene --sjdbOverhang 80

Alexander Dobin

Aug 19, 2015, 6:32:20 PM
to rna-star
Hi Konika,

Typically, for GFF3 files you need to use --sjdbGTFtagExonParentTranscript Parent.

If this does not work, please post a few "exon" lines of your GFF file.
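Putting that suggestion into Konika's command gives something like the following. It is a sketch: the paths are copied from this thread, `--sjdbOverhang 80` is taken from Konika's second message (the STAR manual recommends read length minus 1), and the `command -v` guard is only there so the snippet does nothing harmful where STAR is not installed.

```shell
# Collect STAR's arguments as positional parameters (POSIX sh has no arrays),
# one option and its value per line.
set -- \
  --runMode genomeGenerate \
  --genomeDir /local/data/genomes/gff/star \
  --genomeFastaFiles GCF_000001635.23_GRCm38.p3_genomic.fna \
  --sjdbGTFfile GCF_000001635.23_GRCm38.p3_genomic.gff \
  --sjdbGTFtagExonParentTranscript Parent \
  --sjdbOverhang 80

# GFF3 exon lines point to their transcript through the Parent= attribute,
# so "Parent" is the tag used to group exons into transcripts.
if command -v STAR >/dev/null 2>&1; then
  STAR "$@"
else
  echo "STAR not found on PATH; would run: STAR $*" >&2
fi
```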

Cheers
Alex

Konika Chawla

Aug 25, 2015, 8:13:07 AM
to rna-star
Here are a few lines from the GFF file:
NC_000067.6     Gnomon  exon    3670552 3671742 .       -       .       ID=id1;Parent=rna0;Dbxref=GeneID:497097,Genbank:XM_006495550.2,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4%2C transcript variant X1;transcript_id=XM_006495550.2
NC_000067.6     Gnomon  exon    3421702 3421901 .       -       .       ID=id2;Parent=rna0;Dbxref=GeneID:497097,Genbank:XM_006495550.2,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4%2C transcript variant X1;transcript_id=XM_006495550.2
NC_000067.6     Gnomon  exon    3213439 3216968 .       -       .       ID=id3;Parent=rna0;Dbxref=GeneID:497097,Genbank:XM_006495550.2,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4%2C transcript variant X1;transcript_id=XM_006495550.2
NC_000067.6     Gnomon  exon    3199731 3207317 .       -       .       ID=id4;Parent=rna0;Dbxref=GeneID:497097,Genbank:XM_006495550.2,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4%2C transcript variant X1;transcript_id=XM_006495550.2
NC_000067.6     BestRefSeq      exon    3670552 3671498 .       -       .       ID=id5;Parent=rna1;Dbxref=GeneID:497097,Genbank:NM_001011874.1,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4;transcript_id=NM_001011874.1
NC_000067.6     BestRefSeq      exon    3421702 3421901 .       -       .       ID=id6;Parent=rna1;Dbxref=GeneID:497097,Genbank:NM_001011874.1,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4;transcript_id=NM_001011874.1
NC_000067.6     BestRefSeq      exon    3214482 3216968 .       -       .       ID=id7;Parent=rna1;Dbxref=GeneID:497097,Genbank:NM_001011874.1,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4;transcript_id=NM_001011874.1
NC_000067.6     Gnomon  exon    3670552 3671742 .       -       .       ID=id8;Parent=rna2;Dbxref=GeneID:497097,Genbank:XM_011238395.1,MGI:MGI:3528744;gbkey=mRNA;gene=Xkr4;product=X Kell blood group precursor related family member 4%2C transcript variant X3;transcript_id=XM_011238395.1
 

Alexander Dobin

Aug 25, 2015, 6:01:36 PM
to rna-star
Hi Konika,

--sjdbGTFtagExonParentTranscript Parent should work for this file.
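To see why `Parent` fits this file, the sketch below counts exons per `Parent=` value. `exons_per_parent` is a hypothetical helper; it assumes a tab-separated GFF3 file, as the format requires (the excerpt above is shown space-aligned), and a single value in each `Parent=` attribute.

```shell
# Hypothetical helper: count exon lines per value of the GFF3 Parent= attribute.
# Usage: exons_per_parent annotation.gff
exons_per_parent() {
  awk -F'\t' '
    $3 == "exon" {
      n = split($9, attrs, ";")          # column 9 holds tag=value pairs
      for (i = 1; i <= n; i++)
        if (index(attrs[i], "Parent=") == 1)
          count[substr(attrs[i], 8)]++   # text after "Parent=" (7 chars)
    }
    END { for (p in count) print p "\t" count[p] }
  ' "$1" | sort
}
```

On the eight exon lines posted above, this gives one group per transcript: `rna0` with 4 exons, `rna1` with 3 and `rna2` with 1 (only its first exon is in the excerpt).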

Cheers
Alex

Konika Chawla

Aug 27, 2015, 5:01:57 AM
to rna-star
Hi,
I am interested in the name of the gene the exon belongs to, rather than the parent transcript given as Parent=rna0. I hope it is OK to use
--sjdbGTFtagExonParentTranscript gene

Thanks
Konika



Alexander Dobin

Aug 28, 2015, 6:35:16 PM
to rna-star
Hi Konika,

Most likely this will not work. If you have more than one isoform per gene, the exons from all isoforms will be lumped together into one "transcript".
This creates strange situations, e.g. overlapping exons, and the results will be unpredictable.
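The overlap problem can be made visible with a small check. The sketch below (the function name is hypothetical, and it assumes a tab-separated GFF3 file) groups exon lines by a chosen attribute tag and prints every exon that starts at or before the furthest end already seen in its group. Grouped by `Parent`, Konika's excerpt produces no output; grouped by `gene`, all eight Xkr4 exons fall into one group and four of them are reported as duplicated or overlapping exons of different isoforms.

```shell
# Hypothetical helper: group exon lines by the attribute named in $1
# (e.g. gene or Parent) in the tab-separated GFF3 file $2, then print each
# exon that starts at or before the furthest end seen so far in its group.
overlapping_exons_by_tag() {
  awk -F'\t' -v tag="$1" '
    $3 == "exon" {
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (index(attrs[i], tag "=") == 1)
          print substr(attrs[i], length(tag) + 2) "\t" $4 "\t" $5
    }
  ' "$2" |
  sort -k1,1 -k2,2n |
  awk -F'\t' '
    $1 == grp && $2 + 0 <= maxend { print }       # overlaps an earlier exon
    $1 != grp { grp = $1; maxend = $3 + 0; next } # first exon of a new group
    $3 + 0 > maxend { maxend = $3 + 0 }           # extend the group reach
  '
}
```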

Cheers
Alex

Roxy J

Jan 5, 2016, 11:25:52 AM
to rna-star
Hi Alex,
Are you saying that she should or should not use the --sjdbGTFtagExonParentTranscript gene option? I am getting a high percentage of reads mapped to too many loci with my data, and I am starting to think this is because I did use --sjdbGTFtagExonParentTranscript gene with my GFF3 file when creating the genome index. Could you clarify?

Thanks,

RJ

Roxy J

Jan 5, 2016, 11:58:24 AM
to rna-star
Sorry, I meant to say "I did NOT use the --sjdbGTFtagExonParentTranscript option".

Alexander Dobin

Jan 5, 2016, 5:58:52 PM
to rna-star
Hi RJ,

From the Log.final.out in your previous e-mail, you got 0 annotated junctions, which means that the annotations did not work at all.
For a "standard" GFF3 file, you need to use --sjdbGTFtagExonParentTranscript Parent.
Please check that the exonic lines (3rd field = exon) of your GFF3 file contain a "Parent=..." attribute (or post a few lines).
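One way to run that check from the shell is sketched below; `count_exons_without_parent` is a hypothetical name. It assumes the file is tab-separated, as GFF3 requires, and prints the number of exon lines followed by how many of them lack a `Parent=` attribute. If the first number is 0, the third column never equals `exon` (for example because the tabs were lost), which is worth looking into as well.

```shell
# Hypothetical helper: print "<exon lines>\t<exon lines without Parent=>"
# for the tab-separated GFF3 file given as $1.
count_exons_without_parent() {
  awk -F'\t' '
    $3 == "exon" {
      total++
      if ($9 !~ /(^|;)Parent=/) missing++   # Parent= must start an attribute
    }
    END { printf "%d\t%d\n", total, missing }
  ' "$1"
}
```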

Cheers
Alex

negin Valizadegan

Oct 23, 2018, 9:56:34 AM
to rna-star
Hi Alex,

I am also using GFF files, but the Parent option does not work for me. I get the following error:

EXITING: FATAL INPUT ERROR: empty value for parameter "sjdbGTFtagExonParentTranscript Parent" in input "Command-Line-Initial"

SOLUTION: use non-empty value for this parameter

Alexander Dobin

Oct 23, 2018, 12:37:54 PM
to rna-star
Hi,

Please check your command line; I think this is the problem. If you still get an error, please send me the Log.out file.

Cheers
Alex

negin Valizadegan

Oct 25, 2018, 3:09:57 PM
to rna-star
What do you mean by checking my command line?

Alexander Dobin

Oct 25, 2018, 4:09:25 PM
to rna-star
I think the space in "sjdbGTFtagExonParentTranscript Parent" is not really a space, since STAR treats the whole string as the parameter name.
This can happen if you copy-pasted it. Please re-type it.
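If re-typing is inconvenient, the command can be scanned for such characters. The sketch below assumes the command has been saved to a file, and `find_non_ascii` is a hypothetical name. It prints, with line numbers, every line containing a byte that is neither printable ASCII nor ordinary whitespace, which would catch, for instance, a non-breaking space (U+00A0, bytes C2 A0 in UTF-8) picked up from a web page or e-mail.

```shell
# Hypothetical helper: list lines (with numbers) in the given files that contain
# bytes other than printable ASCII or ASCII whitespace. The C locale makes grep
# treat each byte separately, so UTF-8 sequences such as C2 A0 are matched.
find_non_ascii() {
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}
```

For example, `find_non_ascii star_cmd.sh` (file name hypothetical); no output means no suspicious bytes were found.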