STAR 2-pass alignment (2.5.0c)


bassu

Jul 13, 2016, 9:17:32 AM
to rna-star
Hi,

I want to run a 2-pass alignment against the human genome (primary assembly FASTA). I have multiple samples, and I'm confused by the manual and other sites about the steps involved.

This is what I understand:

1) Create a STAR index from the reference FASTA file

2) Align each sample individually to that index and obtain its SJ.out.tab


STAR --genomeDir Index/ --readFilesIn Sample1/R1.fastq Sample1/R2.fastq --outFileNamePrefix Sample1/

:
:
:
STAR --genomeDir Index/ --readFilesIn SampleN/R1.fastq SampleN/R2.fastq --outFileNamePrefix SampleN/


3) Create a new index from the same reference FASTA and the SJ.out.tab files from all samples

STAR  --runMode genomeGenerate --genomeDir NewIndex/  --genomeFastaFiles Reference.fa --sjdbFileChrStartEnd  Sample1/SJ.out.tab  ... SampleN/SJ.out.tab



4) Remap each sample to the new index

STAR --genomeDir NewIndex/ --readFilesIn Sample1/R1.fastq Sample1/R2.fastq --outFileNamePrefix Sample1

:
:
:
STAR --genomeDir NewIndex/ --readFilesIn SampleN/R1.fastq SampleN/R2.fastq --outFileNamePrefix SampleN


Please let me know whether the steps above are correct.
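
For readers following along, the four steps can be sketched as one shell function that loops over sample directories. This is only an illustration under assumptions, not a tested pipeline: the `pass1_` and `pass2_` prefixes, the `two_pass` and `run` helper names, and the `DRY_RUN` switch are all made up here, and sample directory names are assumed to contain no spaces. By default it only prints the commands.

```shell
#!/usr/bin/env bash
set -eu

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# two_pass INDEX NEW_INDEX REFERENCE SAMPLE_DIR...
# Each SAMPLE_DIR is assumed to hold R1.fastq and R2.fastq.
two_pass() {
  local index="$1" new_index="$2" reference="$3" s sj_files=""
  shift 3
  [ "$#" -ge 1 ] || { echo "two_pass: need at least one sample directory" >&2; return 1; }

  # Step 2: first pass, one alignment per sample against the original index.
  # --outFileNamePrefix is a plain string prefix, so "$s/pass1_" yields
  # "$s/pass1_SJ.out.tab".
  for s in "$@"; do
    run STAR --genomeDir "$index" \
      --readFilesIn "$s/R1.fastq" "$s/R2.fastq" \
      --outFileNamePrefix "$s/pass1_"
    sj_files="$sj_files $s/pass1_SJ.out.tab"
  done

  # Step 3: new index from the same FASTA plus the junctions of all samples.
  # $sj_files is left unquoted on purpose so it splits into one file per word.
  run mkdir -p "$new_index"
  run STAR --runMode genomeGenerate --genomeDir "$new_index" \
    --genomeFastaFiles "$reference" \
    --sjdbFileChrStartEnd $sj_files

  # Step 4: second pass, remap every sample against the new index.
  for s in "$@"; do
    run STAR --genomeDir "$new_index" \
      --readFilesIn "$s/R1.fastq" "$s/R2.fastq" \
      --outFileNamePrefix "$s/pass2_"
  done
}
```

Calling `two_pass Index/ NewIndex/ Reference.fa Sample1 Sample2` prints the commands for review; separate prefixes keep the second pass from overwriting the first-pass SJ.out.tab files.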

Alexander Dobin

Jul 13, 2016, 10:14:19 AM
to rna-star
Hi @bassu,

Your steps are correct. You may want to add an annotation (GTF) file in both steps 1 and 3.
Also, you may want to filter the junctions in SJ.out.tab - for instance, you can remove chrM junctions and poorly supported non-canonical junctions.
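
A rough sketch of such a filter, for illustration only: in SJ.out.tab, column 1 is the chromosome, column 5 the intron motif (0 means non-canonical) and column 7 the number of uniquely mapping reads crossing the junction. The `filter_sj` name, the default threshold of 3 reads, and treating both `chrM` and `MT` as the mitochondrial name are assumptions to adjust for your data.

```shell
#!/usr/bin/env bash
set -eu

# filter_sj [MIN_UNIQUE]
# Reads SJ.out.tab lines on stdin and writes the kept junctions to stdout.
# MIN_UNIQUE is a hypothetical cutoff: non-canonical junctions with fewer
# uniquely mapping reads than this are dropped (default 3).
filter_sj() {
  local min_unique="${1:-3}"
  awk -v min="$min_unique" 'BEGIN { FS = OFS = "\t" }
    $1 == "chrM" || $1 == "MT" { next }   # drop mitochondrial junctions
    $5 == 0 && $7 < min        { next }   # drop weakly supported non-canonical junctions
    { print }'
}
```

For example, `filter_sj 3 < Sample1/SJ.out.tab > Sample1/SJ.filtered.tab` would write a filtered copy (the output file name is made up here) that can then be passed to `--sjdbFileChrStartEnd` in step 3.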

Cheers
Alex

bassu

Jul 14, 2016, 6:12:30 AM
to rna-star
In the GATK method, they say not to provide a known annotation (https://www.broadinstitute.org/gatk/guide/article?id=3891).

Does this affect the results in any way? What would differ between providing a known GTF and not providing one?

Alexander Dobin

Jul 14, 2016, 11:26:26 AM
to rna-star
Hi @bassu,

The annotations are generally helpful, though they may not be necessary for SNP calling, since they only affect the mapping of low-abundance splice junctions.
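
To make the two setups concrete, index generation with and without annotations differs only in the `--sjdbGTFfile` and `--sjdbOverhang` options. The helper below is a sketch that just prints the command; the `genome_generate_cmd` name, the `genes.gtf` file name, and the default read length of 101 are assumptions, and `--sjdbOverhang` is set to read length minus 1 as is conventional.

```shell
#!/usr/bin/env bash
set -eu

# genome_generate_cmd INDEX_DIR REFERENCE [GTF [READ_LENGTH]]
# Prints (does not run) a STAR index-generation command.
# With no GTF argument, the annotation options are left out.
genome_generate_cmd() {
  local index="$1" reference="$2" gtf="${3:-}" read_len="${4:-101}"
  local cmd="STAR --runMode genomeGenerate --genomeDir $index --genomeFastaFiles $reference"
  if [ -n "$gtf" ]; then
    # Annotated junctions are inserted into the index at generation time.
    cmd="$cmd --sjdbGTFfile $gtf --sjdbOverhang $((read_len - 1))"
  fi
  printf '%s\n' "$cmd"
}
```

`genome_generate_cmd Index/ Reference.fa` gives the annotation-free form, while `genome_generate_cmd Index/ Reference.fa genes.gtf 76` adds the GTF for 76-base reads.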

Cheers
Alex

Yifang Tan

Aug 26, 2016, 12:27:49 PM
to rna-...@googlegroups.com