STAR aligner mapping to transcriptome


Alexander Czerny

May 27, 2013, 4:19:44 AM
to rna-...@googlegroups.com
Hi people,

I am new to the forum, but I have been reading it for a long time and it has helped me a lot so far. I have three questions that I hope you can help me answer:

#1 When generating the human genome index, I want to annotate it as follows.
Facts so far:
- I have 50 bp single-end reads
- Ensembl whole_genome.fa
- for annotation I use gencode.v16.annotation.gtf, which I found suggested in this forum

Command:
STAR --runMode genomeGenerate --genomeDir "path/Referenzgenom_hg19_STAR/Ensembl" --genomeFastaFiles $hg19_ensembl \
 --runThreadN 50 --sjdbFileChrStartEnd $gtf_human --sjdbOverhang 49 --genomeChrBinNbits 16

But when I map my data against it, I get no annotated splice junctions, and no splice-junction annotation file was created in the genome directory. Is the --sjdbOverhang value correct?

#2 How exactly can I map my data against the transcriptome first, and then map the leftover reads against the genome? How do you do this?

So far I have an indexed transcriptome from Ensembl and the genome from above.

Which leads me to question #3:

I mapped against my transcriptome, but I also found splice junctions reported in Log.final.out. I thought this shouldn't happen when mapping against a transcriptome, since all exons are already "stitched" together. How should I interpret this?

I hope you can help me, and thanks in advance.

Alex.

Alexander Dobin

May 29, 2013, 12:03:11 PM
to rna-...@googlegroups.com
Hi Alex,

1. When you generate the genome, you need to supply the annotation with --sjdbGTFfile <annotation.gtf> at the genome generation step (--sjdbFileChrStartEnd expects a tab-separated list of junction coordinates, not a GTF). Also, please check that the chromosome names are the same in gencode.gtf and the ENSEMBL whole_genome.fa file. Gencode uses "chr" in chromosome names, while ENSEMBL does not, so you have to be careful.

2. Once genome generation with the --sjdb* options works, STAR will map to the transcriptome and the genome simultaneously and select the best alignment. You do not need to "map to the transcriptome first, then map to the genome".

3. If you really want to map to the transcriptome, you would need to generate a genome from the sequences of all annotated transcripts, with each transcript as a separate "chromosome". Note that alignments in the .sam file will then be given in "transcriptome" coordinates. Spliced reads within the transcriptome, which are mostly non-canonical, will correspond to unannotated (novel) junctions.
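A quick way to catch the naming mismatch from point 1 before building the index is to compare the sequence names in both files. The sketch below is a hypothetical helper (the script name check_names.py and its functions are illustrative, not part of STAR). It assumes the FASTA sequence name is the first word after ">" (as in Ensembl headers like ">1 dna:chromosome ...") and that the GTF sequence name is column 1.

```python
import sys


def fasta_names(lines):
    """Sequence names from FASTA headers: the first word after '>'."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def gtf_names(lines):
    """Sequence names from GTF column 1, skipping comment and blank lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def compare(fasta, gtf):
    """Return (GTF names absent from the FASTA, subset that match once 'chr' is removed)."""
    missing = sorted(gtf - fasta)
    fixable = [name for name in missing if name.startswith("chr") and name[3:] in fasta]
    return missing, fixable


# Hypothetical usage: python check_names.py whole_genome.fa gencode.v16.annotation.gtf
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gtf:
        missing, fixable = compare(fasta_names(fa), gtf_names(gtf))
    print(f"{len(missing)} GTF sequence names are absent from the FASTA: {missing}")
    print(f"{len(fixable)} of them would match after removing the 'chr' prefix")
```

Stripping "chr" does not cover every case: GENCODE names the mitochondrial sequence chrM, whereas Ensembl uses MT, so that name is reported as missing but not fixable by the prefix rule alone.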

Cheers
Alex

Alexander Czerny

Jun 10, 2013, 6:41:57 AM
to rna-...@googlegroups.com
Hi Alex,

Thanks for your help, it works fine now.

Greetings, Alex.

broder...@googlemail.com

May 16, 2016, 4:33:28 AM
to rna-star
Hi Alex,

3. If you really want to map to the transcriptome, you would need to generate a genome from the sequences of all annotated transcripts, with each transcript as a separate "chromosome". Note that alignments in the .sam file will then be given in "transcriptome" coordinates. Spliced reads within the transcriptome, which are mostly non-canonical, will correspond to unannotated (novel) junctions.

I would like to map to the transcriptome as you describe in point 3. I understand that it is generally better to map against the genome so as not to force reads onto transcripts. However, in this case we have few reads and lose some of them to the rest of the genome. So we want to increase sensitivity, knowing that this also decreases specificity.
I have already generated the genome index, with each transcript as a separate "chromosome". Now I am struggling to get a GTF file whose coordinates correspond to the SAM file. As a starting point, I have the following files:


There seems to be no GTF file from ENSEMBL that is specifically for the transcriptome. So I guess I would have to use the genome GTF as a starting point.

Do you have an idea how I can go about this most effectively?

Cheers
Broder

Alexander Dobin

May 16, 2016, 5:22:45 PM
to rna-star
Hi Broder,

when mapping to the transcriptome, you do not need a GTF file, since all the information about the transcripts is already contained in the sequences of the transcriptome file.

Cheers
Alex


A R

Mar 31, 2021, 1:33:10 PM
to rna-star
Hello Alex,
I have a similar problem. I am trying to map shrimp mRNA transcripts to a (rather small) transcriptome FASTA file without using a GTF/GFF file. However, I get the following error when using the flag "--quantMode TranscriptomeSAM".
----------
Transcriptome.cpp:14:Transcriptome: exiting because of *INPUT FILE* error: could not open input file ./STAR_genome_index//geneInfo.tab
Solution: check that the file exists and you have read permission for this file
          SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step
-----------
I do have all permissions set. It works when I remove the --quantMode TranscriptomeSAM flag, but I really wanted to get counts for our project. Can you let me know if there is a way to use the flag without a GTF file?
Many thanks,
Anna Rawles

Alexander Dobin

Mar 31, 2021, 5:37:16 PM
to rna-star
Hi Anna,

if you are mapping to the transcriptome, then Aligned.out.bam already contains alignments to the transcripts, so you do not need the --quantMode TranscriptomeSAM option (which cannot be used without a GTF).
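Because every transcript is its own reference sequence in this setup, rough per-transcript counts can be read off the reference-name (RNAME) column of the alignments. A minimal sketch, assuming the BAM has first been converted to SAM text (for example with `samtools view Aligned.out.bam > aligned.sam`); the script name count_transcripts.py and its function are hypothetical. It counts only mapped, primary alignments and, for paired-end data, only the first mate, so each read or fragment is counted at most once.

```python
import sys
from collections import Counter

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
SECOND_MATE = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
SKIP = UNMAPPED | SECOND_MATE | SECONDARY | SUPPLEMENTARY


def count_per_transcript(sam_lines):
    """Count primary alignments per reference name (one reference = one transcript)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        # Drop unmapped, secondary and supplementary records, and count
        # paired-end fragments once via the first (or only) mate.
        if flag & SKIP:
            continue
        counts[rname] += 1
    return counts


# Hypothetical usage: python count_transcripts.py aligned.sam
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as sam:
        for transcript, n in sorted(count_per_transcript(sam).items()):
            print(f"{transcript}\t{n}")
```

Reads that map equally well to several isoforms are credited to whichever alignment STAR flagged as primary, so this is a simple tally rather than isoform-level quantification.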

Cheers
Alex

A R

Apr 13, 2021, 10:20:37 AM
to rna-star
Hi Alex,
I apologize for my late response, and thank you for the informative reply!
Sincerely,
Anna