Using STAR for RNA-seq in hg19


Carlos Guzman

Jul 11, 2016, 12:51:27 PM
to rna-star
Hey guys,

I'm kind of new to using STAR for RNA-seq. Essentially, what I'm trying to do is visualize my RNA-seq data by mapping the reads and then converting the resulting BAM file into a bigWig file.

I'm having a bit of trouble deciding which GTF file and FASTA file I should be using (I usually have trouble with this). Since I am working with the hg19 genome, I figured I could use the Gencode files. However, there are several FASTA and GTF files, and I'm not entirely sure which ones to use.

Would someone mind quickly letting me know which GTF and FASTA files I should be using, and why (if it's not too much trouble)?

On my first run I used this GTF (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz) and this FASTA (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz), but a lot of my reads mapped to rRNA, which made converting to bigWig with deepTools almost impossible (I was running out of RAM even on a 128 GB machine for a small 25 GB file).

All help is appreciated!

Alexander Dobin

Jul 12, 2016, 5:38:53 PM
to rna-star
Hi Carlos,

The problem with the FASTA file that you used is that it contains patches and haplotypes, which you generally do not want to use for RNA-seq mapping.
I would recommend this FASTA file for hg19/GRCh37:
It contains chromosomes and unlocalized scaffolds, but no patches or haplotypes.
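
If you already have a FASTA that includes patches and haplotypes, one option is to filter it by sequence name. The following is only a minimal sketch: the helper `filter_fasta` and the `PRIMARY` name set are hypothetical, and the actual sequence names for patches, haplotypes and unlocalized scaffolds depend on the FASTA you downloaded, so check the header lines of your file before relying on any rule like this.

```python
from typing import Callable, Iterable, Iterator


def filter_fasta(lines: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Yield only the FASTA records whose sequence name passes `keep`.

    The sequence name is taken as the first whitespace-delimited token
    after '>' on each header line.
    """
    writing = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            writing = keep(name)  # decide once per record
        if writing:
            yield line


# Hypothetical rule: keep the primary chromosomes only. Extend it to also
# keep unlocalized scaffolds, whose names depend on the FASTA you use.
PRIMARY = {f"chr{c}" for c in [*range(1, 23), "X", "Y", "M"]}

# Tiny made-up input, just to show the behaviour.
example = [
    ">chr1 demo\n", "ACGT\n",
    ">some_patch demo\n", "TTTT\n",
    ">chrM demo\n", "GGCC\n",
]
filtered = list(filter_fasta(example, lambda name: name in PRIMARY))
```

For a real genome you would pass an open file handle instead of `example` and write the yielded lines to a new file, so that the whole FASTA is never held in memory.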
I would recommend the latter, though I have not tried it myself yet. 
Or, even better, switch to the GRCh38 genome assembly and annotations:

Note that STAR can also generate wiggle files from the sorted BAM file.

Cheers
Alex

Carlos Guzman

Jul 12, 2016, 8:40:27 PM
to rna-star
Thanks for the super quick response and help Alex!