Providing a GTF at runtime that is different from the GTF used to build the index

Cole Wunderlich

Jun 16, 2020, 6:29:50 PM

to rna-star

Hello All,

I currently have several different GTFs and I would like to test how they affect STAR alignment. I know that GTFs can be supplied at run-time/on the fly, but I am not sure if the GTF used to build the index has any effect on this.

For example, let's say I wanted to test how GTF-A affects alignment but have a STAR index that was built using GTF-B. If I supply GTF-A at run-time, will I get the same results as if I had built the index with GTF-A, or will the fact that the index was built with GTF-B affect the results (meaning I need to rebuild with GTF-A to get the proper results)?

My other question is: what is the most efficient way to go about using multiple GTFs? Is it faster to:

a) Build the index using no GTF, then provide each GTF at runtime.

b) Build the index with one GTF, then provide the others at runtime.

c) Build an individual index for each GTF.

If STAR is just rebuilding the index under the hood every time a GTF is provided on the fly, then I guess these questions are largely moot since it would be just as fast to build a bunch of new indices as it would be to supply things at run-time (a=c).

Alexander Dobin

Jun 24, 2020, 7:48:30 PM

to rna-star

Hi Cole,

If you built the genome with GTF-B and supplied GTF-A at the mapping step:

1. The splice junctions from GTF-B will be added to the splice junctions from GTF-A. This will affect the alignments, i.e. the output in Aligned.out.sam/bam and the junctions in SJ.out.tab.

These results will be the same as those from a genome built with GTF-A+B from the start.

2. The genes and transcripts will be taken from GTF-B only, so the ReadsPerGene counts and Aligned.toTranscriptome.bam will be affected.

These results will be different from those from a genome built with GTF-A+B.
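
For reference, the setup discussed above (index built with GTF-B, GTF-A supplied at the mapping step) could be scripted roughly as follows. This is only a sketch and not part of the original reply: the helper functions, file names, directory names, thread count and overhang value are all hypothetical placeholders. The STAR options used (`--runMode genomeGenerate`, `--genomeDir`, `--genomeFastaFiles`, `--sjdbGTFfile`, `--sjdbOverhang`, `--readFilesIn`, `--outFileNamePrefix`, `--quantMode`) are standard ones, but check them against the manual for your STAR version. The script only builds and prints the command lines; it does not run STAR.

```python
import shlex


def star_index_cmd(genome_dir, fasta, gtf=None, overhang=100, threads=8):
    """Build a STAR genomeGenerate command (hypothetical helper)."""
    cmd = ["STAR", "--runMode", "genomeGenerate",
           "--runThreadN", str(threads),
           "--genomeDir", genome_dir,
           "--genomeFastaFiles", fasta]
    if gtf is not None:
        # Annotation baked into the index (GTF-B in the thread's example).
        cmd += ["--sjdbGTFfile", gtf, "--sjdbOverhang", str(overhang)]
    return cmd


def star_map_cmd(genome_dir, reads, prefix, gtf=None, overhang=100,
                 threads=8, quant=False):
    """Build a STAR mapping command (hypothetical helper)."""
    cmd = ["STAR", "--runThreadN", str(threads),
           "--genomeDir", genome_dir,
           "--readFilesIn", *reads,
           "--outFileNamePrefix", prefix]
    if gtf is not None:
        # Annotation supplied on the fly (GTF-A in the thread's example).
        cmd += ["--sjdbGTFfile", gtf, "--sjdbOverhang", str(overhang)]
    if quant:
        # Produces ReadsPerGene.out.tab and Aligned.toTranscriptome.out.bam.
        cmd += ["--quantMode", "GeneCounts", "TranscriptomeSAM"]
    return cmd


# Placeholder paths; substitute your own files.
index_b = star_index_cmd("index_gtfB", "genome.fa", gtf="GTF-B.gtf")
map_a = star_map_cmd("index_gtfB", ["reads_1.fastq", "reads_2.fastq"],
                     "gtfA_on_indexB_", gtf="GTF-A.gtf", quant=True)

print(shlex.join(index_b))
print(shlex.join(map_a))
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files. Comparing SJ.out.tab from this run with one from an index built on a combined GTF-A+B annotation would be one way to check point 1 on your own data.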

Cheers

Alex
