Custom GTF file

Haritz Irizar

Feb 29, 2016, 4:03:31 PM

to rna-star

Hi!!

My name is Haritz Irizar and I am a postdoctoral fellow at the Icahn School of Medicine at Mount Sinai, New York.

We have generated a custom GTF file for a set of microbial genomes that we would like to use as a reference for the STAR alignment. The idea is to map our mouse reads both to the mouse reference genome (already indexed for alignment) and to the microbial genomes (using the FASTA file that contains them and the custom GTF annotation file we create on the fly).

However, every time we try, the execution halts at the "processing GTF" step and we cannot figure out what is wrong with the GTF file we are creating.

This is using STAR version 2.5.1b.

Could you please have a look at the GTF file (attached)?

What's the minimum information necessary in a GTF file for STAR to accept it?

Thanks a lot,

Haritz

SuperGenome2.gtf

Alexander Dobin

Feb 29, 2016, 5:08:45 PM

to rna-star

Hi Haritz,

if I read it correctly, you

(i) combined many bacterial genomes into one large super-sequence

(ii) represented each bacterial sequence as an exon in the super-genome

(iii) gave all sequences of the same species (plasmids etc) the same gene_id and transcript_id

This looks like a nice hack to make STAR count reads per species.

Off the bat, I can think of only one thing that can cause a problem in GTF processing: all the "junctions" in your transcripts will have gaps of length 0, since all the exons within a transcript are adjacent to each other.
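To check a GTF for this situation before running STAR, something like the following could be used. It is only a diagnostic sketch, not STAR's own logic: the function name `find_zero_length_gaps` is made up for illustration, and it assumes tab-separated GTF lines with 1-based inclusive coordinates and a `transcript_id "..."` attribute in the ninth column.

```python
import re
from collections import defaultdict

# Assumes attributes are written as: transcript_id "some_id";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_zero_length_gaps(gtf_lines):
    """Return (transcript_id, seqname, prev_end, next_start) for every pair of
    exons in the same transcript that touch with no gap between them."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features matter here
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (match.group(1), fields[0])  # group by transcript and sequence
        exons[key].append((int(fields[3]), int(fields[4])))

    hits = []
    for (transcript_id, seqname), spans in exons.items():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # With 1-based inclusive coordinates the gap is next_start - prev_end - 1.
            if next_start - prev_end - 1 == 0:
                hits.append((transcript_id, seqname, prev_end, next_start))
    return hits
```

Called on the lines of the GTF (for example `find_zero_length_gaps(open("SuperGenome2.gtf"))`), an empty result means no transcript contains two directly adjacent exons.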

To avoid this, you need to assign a different transcript_id to each of your exons.
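A minimal sketch of that fix, under the same assumptions about the file layout (tab-separated columns, a `transcript_id "..."` attribute in the ninth column). The function name `make_transcript_ids_unique` and the `_1`, `_2`, ... suffix scheme are illustrative choices only; `gene_id` is left untouched so that exons of one species still share a gene.

```python
import re

# Assumes attributes are written as: transcript_id "some_id";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def make_transcript_ids_unique(gtf_lines):
    """Return the GTF lines with a numeric suffix appended to the
    transcript_id of every exon line, so no two exons share a transcript."""
    counts = {}
    out = []
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            out.append(line)  # pass everything except exon lines through
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            out.append(line)
            continue
        old_id = match.group(1)
        counts[old_id] = counts.get(old_id, 0) + 1
        new_id = f"{old_id}_{counts[old_id]}"
        # A function replacement avoids backslash handling in the new ID.
        fields[8] = TRANSCRIPT_ID.sub(
            lambda m: f'transcript_id "{new_id}"', fields[8], count=1
        )
        out.append("\t".join(fields) + "\n")
    return out
```

The returned lines can be written to a new file with `writelines` and that file passed to STAR in place of the original annotation.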

Please try it out on a small subset of bacteria (say 10), and if it does not work, please send me the GTF and FASTA files so that I can re-create the problem.

Cheers

Alex

Haritz Irizar

Mar 1, 2016, 12:51:58 PM

to rna-star

Hi Alex,

As you suggested, I assigned a different transcript_id to each exon/microbe and it did the trick, so thanks for your quick and accurate response!!

Thanks for STAR itself, too. What a wonderful tool for those of us working with RNA-seq data.