how to start STAR


anisha honey

Dec 4, 2013, 6:14:58 AM
to rna-...@googlegroups.com
Hey,
I am a new user and want to start using STAR. I have my reads and the reference genome I am supposed to align them to. I tried the path given in the manual; can anyone please explain how to specify the paths? STAR is already installed on my server.

Thanks in advance

Daniel Fernandez

Dec 4, 2013, 10:45:43 AM
to rna-...@googlegroups.com
Hi Anisha, are you familiar with running software from the command line? Which server are you using?

STAR is run from the command line, i.e., you'd type the STAR commands from the manual, press Enter, and that would start the mapping. Then wait, and after an hour or so you'd have the alignment files with the reads mapped to the genome.

Alexander Dobin

Dec 7, 2013, 11:23:45 AM
to rna-...@googlegroups.com
Hi Anisha,

to expand a little on Daniel's suggestions, here are example STAR commands. 
First, you need to "generate genome" for your species. Make a new directory at /path/to/genome/dir/, and then run:

STAR --genomeDir /path/to/genome/dir/ --runMode genomeGenerate --genomeFastaFiles  /path/to/genome1.fa  /path/to/genome2.fa …  --sjdbGTFfile /path/to/annotation.gtf --sjdbOverhang 100 --runThreadN 8

Once this is done, you can start mapping your files. Make a new work directory for each STAR run, cd to it, and then run:
STAR --genomeDir /path/to/genome/dir/ --runThreadN 8 --readFilesIn /path/to/Read1.fastq /path/to/Read2.fastq
STAR will write all the output files to the directory from which you run it, although you can change this with --outFileNamePrefix <string-file-path-name-prefix>.
If your files are gzipped, you need to add the "--readFilesCommand zcat" option.

Cheers
Alex
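
The two steps in the post above can be collected into a small shell sketch. It only prints the STAR commands instead of running them, so it can be tried without STAR installed. The function names genome_generate_cmd and map_cmd, the run1/ prefix and all file paths are placeholders made up for illustration; the flags are the ones quoted in the post.

```shell
#!/usr/bin/env bash
# Sketch: build (and print) the two STAR commands described above.
# Function names and paths are illustrative placeholders, not part of STAR.

# Step 1: genome generation (done once per genome/annotation).
genome_generate_cmd() {
    local genome_dir=$1 fasta=$2 gtf=$3 threads=${4:-8}
    echo "STAR --runMode genomeGenerate --genomeDir $genome_dir" \
         "--genomeFastaFiles $fasta --sjdbGTFfile $gtf" \
         "--sjdbOverhang 100 --runThreadN $threads"
}

# Step 2: mapping. The second read file is optional (single-end runs).
# Adds --readFilesCommand zcat when the first read file ends in .gz,
# and a per-run --outFileNamePrefix so runs do not overwrite each other.
map_cmd() {
    local genome_dir=$1 prefix=$2 read1=$3 read2=${4:-}
    local cmd="STAR --genomeDir $genome_dir --runThreadN 8 --readFilesIn $read1"
    [ -n "$read2" ] && cmd="$cmd $read2"
    case $read1 in
        *.gz) cmd="$cmd --readFilesCommand zcat" ;;
    esac
    echo "$cmd --outFileNamePrefix $prefix"
}

genome_generate_cmd /path/to/genome/dir/ /path/to/genome1.fa /path/to/annotation.gtf
map_cmd /path/to/genome/dir/ run1/ /path/to/Read1.fastq.gz /path/to/Read2.fastq.gz
```

If a directory-style prefix such as run1/ is used, create the directory first (for example with mkdir -p run1), since STAR may not create it for you.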

Quan Gu

Feb 25, 2014, 6:33:03 AM
to rna-...@googlegroups.com
Hi Alex,

I am confused about what you mean by "generate genome".

Q1: If I need to generate the reference genome, why does your command have two genome files, i.e. genome1.fa and genome2.fa?
When we run Tophat, as you know, the command line looks like:

$ tophat -G genes.gtf -o Tophat_out genome.* Read_1.fq Read_2.fq

(where genes.gtf and genome.* are the reference genome with bowtie index that I downloaded from Ensembl, not two reference genomes genome1 and genome2)

Q2: Is STAR only suitable for the Illumina platform or for paired-end reads? I ask because your example command line uses "sample_1.fq" and "sample_2.fq".

Q3: If I don't need to generate the genome, how do I run the command?

My current folder is like this

$ls myrna

genomepath Reads_1.fq Reads_2.fq 

where genomepath is a folder containing genome.fa, genes.gtf and the bowtie index, which I downloaded from iGenomes/Illumina.

If I run
$ STAR --genomeDir genomepath --runThreadN --readFilesIn Read_1.fq Read_2.fq

it gives me the error:

EXITING because of FATAL ERROR: could not open genome file genomepath/genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomDir is correct and the files are present, and have user read permsissions

Does that mean that I must use a reference genome from the STAR homepage (ftp://ftp2.cshl.edu/gingeraslab/tracks/STARrelease/STARgenomes/) or generate the genome myself?

Thank you for your kind attention!

Cheers

Quan

Alexander Dobin

Feb 25, 2014, 4:17:37 PM
to rna-...@googlegroups.com
Hi Quan,

Q2: STAR can be run on single-end reads from any platform. For paired-end reads, only the Illumina format is supported, i.e. the two ends are expected to be sequenced from opposite strands.

Q1,Q3: Running STAR consists of two stages:
1. Generate the genome files.
$ STAR --runMode genomeGenerate --genomeDir genomepath --genomeFastaFiles  genomepath/genome.fa  --sjdbGTFfile genomepath/genes.gtf --sjdbOverhang 100 --runThreadN 8
This should take 1-2 hours and will write the STAR genome files into the genomepath directory.
2. Mapping the reads - your command should work fine:
$ STAR --genomeDir genomepath --runThreadN 8 --readFilesIn Read_1.fq Read_2.fq
Note that STAR will write a number of files into the "current" directory, and I strongly recommend running each STAR job from a fresh directory, or using a unique --outFileNamePrefix for each run.

Cheers
Alex
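
The FATAL ERROR quoted earlier in the thread says STAR could not open genomepath/genomeParameters.txt, which is one of the files written by the genomeGenerate step; a FASTA, GTF and bowtie index alone are not enough. A minimal pre-flight check, assuming only what the error message itself shows (that STAR looks for genomeParameters.txt inside the --genomeDir directory). The function name check_star_genome is made up for this sketch, and the mapping command is only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: verify that a directory looks like a generated STAR genome
# before starting a mapping run. check_star_genome is a hypothetical helper.
check_star_genome() {
    local genome_dir=$1
    if [ -r "$genome_dir/genomeParameters.txt" ]; then
        echo "ok: $genome_dir looks like a generated STAR genome"
    else
        echo "missing: $genome_dir/genomeParameters.txt (run --runMode genomeGenerate first)" >&2
        return 1
    fi
}

# Example: only start mapping when the check passes.
if check_star_genome genomepath; then
    echo "would run: STAR --genomeDir genomepath --runThreadN 8 --readFilesIn Read_1.fq Read_2.fq"
fi
```

The -r test also covers the second half of the error text, since it fails when the file exists but the current user cannot read it.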

BTS

Mar 26, 2014, 5:34:51 PM
to rna-...@googlegroups.com
Hi Alex,
I have run STAR many times with no problem. I recently tried to update to 2.3.1z. That build failed on my OS X machine, possibly because of the g++ version. So I reverted to the 2.3.0e executable (redownloaded), and it now throws the same error that Quan reported, including the odd spelling of --genomDir.
Any help would be appreciated.
BTS

Alexander Dobin

Mar 28, 2014, 4:00:40 PM
to rna-...@googlegroups.com
Hi BTS,

2.3.1z has experimental support for BAM output and requires zlib libraries. If it does not compile, please try to compile 2.3.1y.
The problem with --genomeDir is weird. What is the output of `ls -l  /Users/Will/Desktop/BioinfomaticsTools/Genomes/Mouse` ?

Cheers
Alex

Quan Gu

Apr 1, 2014, 5:24:36 AM
to Alexander Dobin, rna-...@googlegroups.com
Hi, I have a naive question.
After I get the output file Aligned.out.sam, I want to convert it to a BAM file.
Could I use this command?
samtools view -bS Aligned.out.sam > Aligned.bam





Alexander Dobin

Apr 1, 2014, 11:15:40 AM
to rna-...@googlegroups.com, Alexander Dobin
Hi Quan,

Yes, that's correct. If you want the .bam file sorted by coordinate, you can pipe into samtools sort like this:
samtools view -bS Aligned.out.sam | samtools sort -m30G -@6 - Aligned

Cheers
Alex
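
The sort command above uses the older samtools form, where the last argument is an output prefix (giving Aligned.bam). Newer samtools releases (1.3 and later, as far as the samtools documentation indicates) expect the output file via -o instead and can read SAM input directly. Note also that -m is a per-thread limit, so it is multiplied by the -@ thread count. A sketch for a newer samtools, printing the command rather than running it; the function name sort_cmd, the output file name and the 2G default are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch for newer samtools (assumed >= 1.3): coordinate-sort STAR's SAM output.
# sort_cmd is a hypothetical helper; it prints the command instead of running it.
sort_cmd() {
    local in_sam=$1 out_bam=$2 threads=${3:-6} mem_per_thread=${4:-2G}
    echo samtools sort -@ "$threads" -m "$mem_per_thread" -o "$out_bam" "$in_sam"
}

sort_cmd Aligned.out.sam Aligned.sorted.bam
```

To run the sort for real, paste the printed command into the shell, adjusting the thread count and per-thread memory to the machine.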