how to get gene symbols instead of gene_id / transcript_id?


julian

Aug 7, 2017, 12:54:04 PM
to rna-star
Hi,
My ReadsPerGene.out.tab files feature Ensembl IDs (e.g., ENSMUSG00000102693.1), while I'd like them to include gene symbols or gene names instead.
I'm using a GENCODE GTF (where both gene_id and transcript_id are Ensembl IDs). I feel dumb, but I don't see how to instruct it to take the gene_name (12th field in the GTF) instead. I have tried STAR genomeGenerate with either default parameters or with --sjdbGTFtagExonParentTranscript gene_name (no luck: the final output still has Ensembl IDs). I'd appreciate any clue.
Thanks!
J.

Alexander Dobin

Aug 7, 2017, 6:10:08 PM
to rna-star
Hi Julian,

If all the "exon" lines have the gene_name tag, and every gene has a unique gene_name, you can use --sjdbGTFtagExonParentGene gene_name.
However, the GTF specification only guarantees this for the gene_id tag.
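
One way to test those two conditions before rebuilding the genome index is a quick pass over the GTF. This is only a sketch: the function name check_gene_names and the attribute regex are illustrative assumptions, not part of STAR, and the regex assumes the usual key "value"; attribute layout found in GENCODE files.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
# (assumed layout of the 9th GTF column; unquoted values such as `level 2` are skipped)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def check_gene_names(gtf_lines):
    """Return (n_exons_without_gene_name, {gene_name: [gene_ids]} for shared names)."""
    missing = 0
    ids_per_name = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records matter for this check
        attrs = dict(ATTR_RE.findall(fields[8]))
        name = attrs.get("gene_name")
        if name is None:
            missing += 1
        else:
            ids_per_name[name].add(attrs.get("gene_id", ""))
    # A name attached to more than one gene_id is not unique
    shared = {n: sorted(ids) for n, ids in ids_per_name.items() if len(ids) > 1}
    return missing, shared

# Example use (the path is hypothetical):
# with open("annotation.gtf") as gtf:
#     missing, shared = check_gene_names(gtf)
```

If the first value is zero and the second is empty, both conditions hold for that file; otherwise counts keyed by gene_name would merge or drop some genes.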

It might be safer to write a short script that extracts gene_id/gene_name pairs from the GTF file and then adds the gene_name to the ReadsPerGene.out.tab file.
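
Such a script could look roughly like the following. It is a minimal sketch, not a tested tool: the function names gene_id_to_name and annotate_counts are made up for illustration, the attribute regex assumes the usual key "value"; layout, and the gene name is inserted as a second column so the original IDs are kept. Rows whose ID has no name in the GTF (including the N_unmapped-style summary rows at the top of ReadsPerGene.out.tab) simply repeat the ID.

```python
import re

# Matches attribute pairs of the form: key "value" (assumed GTF attribute layout)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def gene_id_to_name(gtf_lines):
    """Build a {gene_id: gene_name} dict from GTF lines."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            # Keep the first name seen for each gene_id
            mapping.setdefault(attrs["gene_id"], attrs["gene_name"])
    return mapping

def annotate_counts(count_lines, mapping):
    """Yield count rows with the gene name inserted after the ID column."""
    for line in count_lines:
        fields = line.rstrip("\n").split("\t")
        name = mapping.get(fields[0], fields[0])  # fall back to the ID itself
        yield "\t".join([fields[0], name] + fields[1:])

# Example use (all paths are hypothetical):
# with open("annotation.gtf") as gtf:
#     mapping = gene_id_to_name(gtf)
# with open("ReadsPerGene.out.tab") as counts, open("ReadsPerGene.named.tab", "w") as out:
#     for row in annotate_counts(counts, mapping):
#         out.write(row + "\n")
```

Keeping the Ensembl ID alongside the name avoids ambiguity when two genes share a symbol.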

Cheers
Alex

julian

Aug 8, 2017, 9:08:48 AM
to rna-star