Build an index containing only transcripts


Po

Jan 11, 2016, 11:16:52 AM
to rna-star
Dear all,

I wonder if it is possible to build a STAR index composed only of transcripts, based on the Gencode GTF and the GRCh38 FASTA file (with no non-transcript sequence in the index)? I tried to build an index from the transcript-only FASTA file of Gencode v23, but it didn't work (it failed with an error shortly after starting).

Thanks,
Po   

Kirill Tsyganov

Jan 11, 2016, 4:44:24 PM
to Po, rna-star
Hi Po,

If all you want is to align your RNA-seq data to the transcriptome rather than the genome, then yes, you can do that with STAR. I have done it before and it worked fine for me. The only thing that confuses me is your mention of a GTF file. I don't think you need a GTF file when aligning to the transcriptome: once a read is mapped, you know which transcript it came from, i.e. you know the gene.

When you build an index with STAR, ensure that your FASTA holds transcript sequences, i.e. the headers shouldn't be chromosome names such as "> Chr1"; instead, each sequence should be named after its transcript, e.g. "> CTNNB1".

I'm not sure if that's part of your question, but just to make it clear: STAR cannot use a GTF file to make a transcriptome from a genome FASTA file on the fly.

If you are still struggling, post the `head` of your transcriptome file. Also post the error you get when generating the index; that will help.

Cheers, 

Kirill

--
You received this message because you are subscribed to the Google Groups "rna-star" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rna-star+u...@googlegroups.com.
Visit this group at https://groups.google.com/group/rna-star.

Alexander Dobin

Jan 11, 2016, 5:43:26 PM
to rna-star, pchi...@gmail.com
Hi Po, Kirill,

Kirill's suggestions are right to the point.
If you do not have a transcriptome FASTA file, you can generate one from the genome FASTA and GTF (I use rsem-prepare-reference from the RSEM package).

Cheers
Alex
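
To make the idea of "transcriptome FASTA from genome FASTA and GTF" concrete, here is a minimal Python sketch of what such a conversion does conceptually. It is not RSEM's implementation and not a replacement for rsem-prepare-reference: it assumes the genome fits in memory as a dict of strings, reads only `exon` lines, and takes the transcript name from the `transcript_id` attribute. The function name `transcripts_from_gtf` is made up for illustration.

```python
import re
from collections import defaultdict

# Translation table for complementing DNA (used for minus-strand transcripts).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_from_gtf(genome, gtf_lines):
    """Build transcript sequences by joining exon sequences.

    genome    -- dict mapping chromosome name to its sequence (assumption:
                 the whole genome is already loaded in memory)
    gtf_lines -- iterable of GTF text lines
    Returns a dict mapping transcript_id to the spliced sequence.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute sequence
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
        exons[match.group(1)].append((chrom, start, end, strand))

    transcripts = {}
    for transcript_id, parts in exons.items():
        parts.sort(key=lambda exon: exon[1])  # genomic order by start
        seq = "".join(genome[chrom][start - 1:end]
                      for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            # Minus-strand transcripts are the reverse complement.
            seq = seq.translate(_COMPLEMENT)[::-1]
        transcripts[transcript_id] = seq
    return transcripts
```

Writing each entry of the returned dict as `>transcript_id` followed by its sequence gives a FASTA of the kind Kirill describes. For real data the RSEM tool remains the safer route.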

Po

Jan 11, 2016, 10:42:37 PM
to rna-star, pchi...@gmail.com
Hi Kirill and Alex,

Thank you for the responses. 

I did use the FASTA file of transcripts to build the index. The program stopped soon after starting and showed the following error message:

Jan 12 11:33:41 ..... Started STAR run
Jan 12 11:33:41 ... Starting to generate Genome files
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

My input in the terminal:
STAR  --runMode genomeGenerate --runThreadN 2  --genomeSAsparseD 2 --genomeDir '/po/GRCh38_Gencode23_transcripts_STAR_INDEX'    --genomeFastaFiles '/po/gencode.v23.transcripts.fa'

The fasta file looks like this:
>ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-002|DDX11L1|1657|processed_transcript|
GTTAACTTGCCGTCAGCCTTTT.................AAGCACACTGTTGGTTTCTG
>ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|
GTGTCTGACTTCCAGCAA...........GAAAACAGGGGAATCCCGAA

Cheers,
Po
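
For anyone checking their own file the same way, a small Python sketch that counts the records in a FASTA file and previews the reference names. As far as I know, STAR keeps the header text up to the first space as the reference name, so the pipe-delimited Gencode headers above become very long names; treat that as an assumption and check chrName.txt in your index directory. The helper names (`reference_name`, `short_transcript_id`, `summarize_fasta`) are made up for this example.

```python
def reference_name(header_line):
    """Return the text after '>' up to the first whitespace."""
    text = header_line.lstrip(">").strip()
    return text.split()[0] if text else ""


def short_transcript_id(header_line):
    """Return only the first '|'-separated field of a Gencode-style header."""
    return reference_name(header_line).split("|")[0]


def summarize_fasta(lines, preview=3):
    """Return (number of records, first few reference names)."""
    names = [reference_name(line) for line in lines if line.startswith(">")]
    return len(names), names[:preview]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_fasta.py transcripts.fa
    with open(sys.argv[1]) as handle:
        count, first_names = summarize_fasta(handle)
    print(count, "sequences; first names:", first_names)
```

The record count is the number that matters for the memory problem discussed in the next reply: the more references, the more padding STAR adds.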


Alexander Dobin

Jan 12, 2016, 4:34:35 PM
to rna-star
Hi Po,

As this file contains many references, you need to decrease --genomeChrBinNbits (default 18) to reduce the required RAM. I would recommend --genomeChrBinNbits 12.

Cheers
Alex
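
Some background on why this helps: each reference occupies a whole number of bins of 2^genomeChrBinNbits bases, so with the default of 18 every transcript is padded to at least 262,144 bases. The STAR manual, as far as I recall, suggests scaling the parameter as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). Below is a minimal Python sketch of that rule. The function names, the read length of 100, and rounding to the nearest integer are my assumptions, not something stated in this thread.

```python
import math


def min_padded_bases(n_refs, nbits):
    """Lower bound on padded genome size: every reference uses >= 1 bin."""
    return n_refs * 2 ** nbits


def suggest_chr_bin_nbits(total_bases, n_refs, read_length=100):
    """Apply min(18, log2(max(total_bases / n_refs, read_length))).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if n_refs <= 0:
        raise ValueError("n_refs must be positive")
    per_reference = max(total_bases / n_refs, read_length)
    return min(18, round(math.log2(per_reference)))


def star_index_command(genome_dir, fasta_path, nbits, threads=2):
    """Build the genomeGenerate argument list (paths are placeholders)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta_path,
        "--genomeChrBinNbits", str(nbits),
    ]


if __name__ == "__main__":
    # 100,000 references at the default 18 bits vs. the recommended 12 bits.
    print(min_padded_bases(100_000, 18))  # 26214400000
    print(min_padded_bases(100_000, 12))  # 409600000
```

The returned list can be passed to `subprocess.run`. The value the formula gives depends on the average transcript length in your file, so treat it as a starting point next to the value recommended above.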

Po

Jan 15, 2016, 7:53:18 PM
to rna-star
Hi Alex,

It worked!  Thank you.

One more question:
If I would like to have the outputs of unmapped and mapped reads in separate FASTA files, is it possible with STAR? 

Cheers,
Po



Alexander Dobin

Jan 19, 2016, 3:32:48 PM
to rna-star
Hi Po,

You can write unmapped reads to FASTQ/FASTA files using the --outReadsUnmapped Fastx option.
The mapped reads go into the SAM/BAM files; if you need them as FASTA, you can use Picard or bamtools to extract them.

Cheers
Alex
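
If installing Picard or bamtools is not an option, the same extraction can be sketched in plain Python for an uncompressed SAM file. This is a stand-in for those tools, not how they work internally. It relies on the SAM FLAG bits (0x4 unmapped, 0x10 reverse strand, 0x100 secondary, 0x800 supplementary, 0x40/0x80 first/second mate); the function names and the `/1`, `/2` name suffixes are choices made for this example.

```python
# Translation table for complementing DNA.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mapped_reads_as_fasta(sam_lines):
    """Yield one FASTA record per mapped primary alignment.

    sam_lines -- iterable of lines from an uncompressed SAM file
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # Skip unmapped reads, and secondary/supplementary alignments so
        # that each read is written only once.
        if flag & 0x4 or flag & 0x100 or flag & 0x800 or seq == "*":
            continue
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented;
            # restore the sequence as it was read.
            seq = reverse_complement(seq)
        if flag & 0x40:
            name += "/1"
        elif flag & 0x80:
            name += "/2"
        yield f">{name}\n{seq}"
```

A BAM file would first need converting to SAM text (for example with `samtools view -h`), since this sketch does not read the binary format.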

Po

Jan 19, 2016, 9:16:25 PM
to rna-star
A million thanks, Alex!
