Running STAR for alignment

SRB

Sep 8, 2014, 11:06:31 PM
to rna-...@googlegroups.com
Hi All
I am trying to use STAR instead of TopHat for alignment, but I am a little confused about how to use it. I have a reference FASTA file and also a GTF file.
So I think I should skip "generate genome" and "generate genome with annotation" (steps 4 and 5 in the manual), right?
Should I directly use the following, and put my reference file under the "/path/to/Genome" directory?

STAR --genomeDir /path/to/GenomeDir --readFilesIn /path/to/read1 [/path/to/read2] --runThreadN <n> --<inputParameterName> <input parameter value(s)> 


Alexander Dobin

Sep 9, 2014, 11:48:38 PM
to rna-...@googlegroups.com
Hi SRB,

You need to generate the genome first in the /path/to/genome/dir/ directory:
STAR --runMode genomeGenerate --genomeDir /path/to/genome/dir/ --genomeFastaFiles g1.fa g2.fa ... --sjdbGTFfile annot.gtf --sjdbOverhang 100 --runThreadN 5
Then you run the mapping, pointing to the same genome directory:
STAR --genomeDir /path/to/genome/dir/ --readFilesIn /path/to/read1 [/path/to/read2] ...all other parameters...

Cheers
Alex
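
A minimal dry-run sketch of the two steps above, assuming placeholder paths and file names (genome.fa, annot.gtf, and the read paths are illustrative, not real files). It only prints the commands, so it can be inspected without STAR installed:

```shell
#!/bin/sh
# Dry-run sketch: prints the two STAR commands instead of running them.
# All paths and file names below are placeholders (assumptions).
GENOME_DIR="/path/to/genome/dir"
FASTA="genome.fa"
GTF="annot.gtf"
READ1="/path/to/read1"
READ2="/path/to/read2"
THREADS=5

# Step 1: generate the genome once, with the annotation included.
generate_cmd="STAR --runMode genomeGenerate --genomeDir $GENOME_DIR --genomeFastaFiles $FASTA --sjdbGTFfile $GTF --sjdbOverhang 100 --runThreadN $THREADS"

# Step 2: map reads against the same genome directory.
map_cmd="STAR --genomeDir $GENOME_DIR --readFilesIn $READ1 $READ2 --runThreadN $THREADS"

echo "$generate_cmd"
echo "$map_cmd"
```

To run the steps for real, paste the printed commands into a terminal after replacing the placeholders with actual paths.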

Giovanni Marques de Castro

Sep 25, 2014, 2:35:25 PM
to rna-...@googlegroups.com
If I didn't use the GTF during genome generation, can I still use it during alignment, or will it have no effect?
That is, if I have a newer GTF/GFF, will I need to index the genome again?

Alexander Dobin

Sep 30, 2014, 11:33:53 AM
to rna-...@googlegroups.com
Hi Giovanni,

At the moment, a GTF can only be used at the genome generation step, and you need to re-generate the genome for each GTF or GFF file.
I will release a patch shortly that will allow adding annotations to the genome on the fly, i.e. without re-generating the genome.

Cheers
Alex

Mike

Dec 15, 2014, 1:47:25 PM
to
Hi, I'm wondering about using the --sjdbGTFfile, --sjdbGTFchrPrefix, and --sjdbOverhang options at the genome generation step vs. the alignment step. Is it correct that if I generated the genome with these options, I don't need to include them during alignment? What happens if I include them at both steps, and what if the values differ between the two (e.g. with different overhang lengths, is the one given at genome generation or the one given at alignment used)?
 

Alexander Dobin

Dec 17, 2014, 10:41:32 PM
to rna-...@googlegroups.com
Hi Mike,

Presently, these parameters are only used at the genome generation step. If you also specify them at the alignment step, they will be ignored. The only sjdb parameter that is used at the alignment step is --sjdbScore (the bonus score for annotated junctions, 2 by default).

Cheers
Alex


Rahil Sethi

Nov 16, 2015, 2:28:14 PM
to rna-star
Hi Alex,

What you said about specifying the GTF file and sjdbOverhang clearly agrees with your manual:
file:///Users/ras143/Documents/Rahil_Sethi/STARmanual_2.3.0.1.pdf, in step 4, "Generating genomes with annotations": "The annotations have to be supplied at the genome/suffix array generation step. The annotations can be supplied in the form of splice junctions’ loci or GTF (or GFF3) file."

However, when I look at the paper you published, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4631051/pdf/nihms722197.pdf, it gives this pipeline for analysis of unstranded reads with Cufflinks (page 16, Alternate Protocol 9):
~/star/code/STAR-STAR_2.4.0k/bin/Linux_x86_64/STAR \
  --runThreadN 12 --genomeDir ~/star/genome/ \
  --sjdbGTFfile ~/star/Homo_sapiens.GRCh38.79.gtf --sjdbOverhang 100 \
  --readFilesIn ~/star/ENCFF001RFH.fastq.gz ~/star/ENCFF001RFG.fastq.gz --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate Unsorted \
  --outSAMstrandField intronMotif

This is in disagreement with what you said. In the above example you illustrated --sjdbGTFfile and --sjdbOverhang while running mapping. Was this just to illustrate that these options need to be mentioned during genome generation?
I'm using the current version, STAR_2.5.0a.

I just wanted to be clear since I got confused when I looked at the above paper. 

Also, I'm not clear on when to use --quantMode TranscriptomeSAM.

I will keep searching your forum for an answer to that last question, but the questions above relate to this discussion.

Alexander Dobin

Nov 17, 2015, 5:39:06 PM
to rna-star
Hi Rahil,

It seems you are using an old manual. Please check the latest version: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
Since 2.4.1a, it's also possible to include the GTF file at the mapping step.

--quantMode TranscriptomeSAM has to be specified at the mapping step only. With this option, a GTF file has to be specified at least once, at either the genome generation or the mapping stage.
If you specify two different GTF files at the two stages, only the one from the mapping stage will be used.

Cheers
Alex
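
A minimal dry-run sketch of the behaviour described above, assuming STAR 2.4.1a or later and placeholder paths. The helper effective_gtf and the GTF_AT_* variables are hypothetical names used only for illustration; they are not part of STAR. The script prints the mapping command rather than running it:

```shell
#!/bin/sh
# Placeholders (assumptions): adjust to your own files.
GENOME_DIR="/path/to/genome/dir"
GTF_AT_GENERATION=""          # empty: genome was generated without a GTF
GTF_AT_MAPPING="annot.gtf"    # GTF supplied at the mapping stage

# Hypothetical helper: reports which GTF takes effect.
# Per the reply above, the mapping-stage GTF wins when both are given.
effective_gtf() {
    if [ -n "$2" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Mapping command with the annotation supplied at the mapping stage.
map_cmd="STAR --genomeDir $GENOME_DIR --readFilesIn /path/to/read1 /path/to/read2 --sjdbGTFfile $GTF_AT_MAPPING --sjdbOverhang 100 --quantMode TranscriptomeSAM"

echo "Annotation in effect: $(effective_gtf "$GTF_AT_GENERATION" "$GTF_AT_MAPPING")"
echo "$map_cmd"
```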

Morris Chair

Sep 17, 2019, 10:49:02 AM
to rna-star
Hello Alexander, 
I used STAR to count reads with the option "--quantMode TranscriptomeSAM GeneCounts", but I didn't use the annotation GTF file either for indexing the genome or for counting the reads. Do I need to be concerned about the results?

Thank you 

Alexander Dobin

Sep 17, 2019, 11:37:46 AM
to rna-star
Hi Morris,

You have to supply a GTF file either at the genome generation step or at the mapping step.
It will not work otherwise: STAR would not know where the transcripts and genes are.

Cheers
Alex
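
The rule above can be turned into a small pre-flight check in a wrapper script. This is only a sketch: check_quant_mode is a hypothetical helper, and it relies on you recording which GTF (if any) was used at each stage; it does not inspect the STAR genome directory.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of STAR.
# $1 = GTF used at genome generation, $2 = GTF given at mapping (either may be empty).
check_quant_mode() {
    if [ -n "$1" ] || [ -n "$2" ]; then
        echo "ok: annotation available for --quantMode"
    else
        echo "error: --quantMode needs a GTF at genome generation or at mapping"
        return 1
    fi
}

# Example calls with placeholder file names.
check_quant_mode "annot.gtf" ""
check_quant_mode "" "" || echo "refusing to run STAR with --quantMode"
```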