Using STAR to get the "raw count" from a file aligned against transcripts


Osama Hamzah

Nov 20, 2016, 6:58:35 PM
to rna-star
I already used STAR to align the reads and output alignments against the transcripts with the --quantMode TranscriptomeSAM option. I am now using the following command to get the "raw count":

STAR --runMode inputAlignmentsFromBAM --runThreadN 24 --inputBAMfile SRR1164787Aligned.toTranscriptome.out.bam --outWigType bedGraph --outWigStrand Unstranded --outWigNorm None

I am now getting two output files: 'Signal.Unique.str1.out.bg' and 'Signal.UniqueMultiple.str1.out.bg'.

head Signal.Unique.str1.out.bg

NM_014704       73      75      3
NM_014704       75      123     4
NM_014704       123     125     1
NM_014704       134     160     3
NM_014704       160     184     4
NM_014704       184     189     1
NM_014704       189     210     2
NM_014704       210     215     1
NM_014704       215     230     2
NM_014704       230     239     4

head Signal.UniqueMultiple.str1.out.bg

NM_014704       73      75      3
NM_014704       75      123     4
NM_014704       123     125     1
NM_014704       134     160     3
NM_014704       160     184     4
NM_014704       184     189     1
NM_014704       189     210     2
NM_014704       210     215     1
NM_014704       215     230     2
NM_014704       230     239     4

I know the first field is the transcript name, but what do the other fields mean? What I am looking for is the total number of reads for each transcript.

Alexander Dobin

Nov 21, 2016, 11:20:04 AM
to rna-star
Hi Osama,

--outWigType bedGraph generates "bedGraph" files (https://genome.ucsc.edu/goldenpath/help/bedgraph.html), which contain the "signal" (number of reads) at each base in the genome. These files are useful for visualizing the data in genome browsers.
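
For reference, each bedGraph line has four fields: the sequence name (here the transcript ID), a 0-based start, an exclusive end, and the signal value over that interval. The Python sketch below is an illustration and not part of the original reply; the function name summarize_bedgraph is hypothetical, and the input is the first three lines of the output posted above.

```python
from collections import defaultdict


def summarize_bedgraph(lines):
    """Return {name: (covered_bases, summed_signal)} for bedGraph lines.

    Hypothetical helper: it adds up interval widths and width * value,
    which describes per-base coverage, not the number of reads.
    """
    covered = defaultdict(int)
    signal = defaultdict(float)
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        name, start, end, value = line.split()[:4]
        width = int(end) - int(start)  # start is 0-based, end is exclusive
        covered[name] += width
        signal[name] += width * float(value)
    return {name: (covered[name], signal[name]) for name in covered}


example = """\
NM_014704       73      75      3
NM_014704       75      123     4
NM_014704       123     125     1
""".splitlines()

summary = summarize_bedgraph(example)
print(summary)
```

For these three lines the result is 52 covered bases and a summed signal of 200. Neither number is a read count, because each read contributes to every base it overlaps.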
You can get the read counts per gene (i.e. aggregated over all transcripts of each gene) using --quantMode GeneCounts. This has to be done at the mapping stage, and it can be done simultaneously with TranscriptomeSAM, i.e. --quantMode TranscriptomeSAM GeneCounts.
If you want quantification of the transcripts, you would need to use RSEM on the Aligned.toTranscriptome.out.bam file generated by STAR with the --quantMode TranscriptomeSAM option.
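
As a rough illustration of the RSEM route, the sketch below reads the expected_count column from an RSEM *.isoforms.results table, which is tab-separated with a header line. The function name read_expected_counts and the two example rows are made up for illustration; they are not data from this thread.

```python
import csv
import io


def read_expected_counts(handle):
    """Map transcript_id -> expected_count from an RSEM isoforms.results table.

    Hypothetical helper; expected_count values may be fractional.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["transcript_id"]: float(row["expected_count"]) for row in reader}


# Made-up rows that follow the isoforms.results column layout.
example = io.StringIO(
    "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
    "tx_A\tgene_1\t1500\t1350.00\t12.50\t3.10\t2.80\t60.00\n"
    "tx_B\tgene_1\t900\t750.00\t7.50\t2.07\t1.87\t40.00\n"
)

counts = read_expected_counts(example)
print(counts)
```

With a real file, the same function could be called on an open file handle instead of the in-memory example.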

Cheers
Alex

Osama Hamzah

Nov 21, 2016, 11:27:07 AM
to rna-star
I already used RSEM to get the transcript counts, but the RSEM output is given as 'expected_count', which is a normalized output. Unfortunately, this is not accepted as input by DESeq, which requires the raw count of reads per transcript; that is why I was trying STAR for counting.

Any suggestions?

Appreciate the support.

Alexander Dobin

Nov 21, 2016, 11:35:50 AM
to rna-star
DESeq actually needs the counts per gene (not per transcript), so the --quantMode GeneCounts option will work.
You will need to re-map your data with this option. You can use --outSAMtype None to avoid generating the large SAM/BAM files again.
In the future you can add this option to your standard run.
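
With --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file with four tab-separated columns: the gene ID, the counts for unstranded data, the counts when the first read strand matches the RNA, and the counts when the second read strand matches the RNA. The first four rows are summary lines whose names start with N_. A hedged Python sketch for pulling one count column into a dictionary follows; the function name read_gene_counts, the gene IDs, and the numbers are hypothetical.

```python
def read_gene_counts(lines, column=1):
    """Map gene ID -> integer count from ReadsPerGene.out.tab lines.

    column: 1 = unstranded, 2 = first read strand matches the RNA,
            3 = second read strand matches the RNA.
    Hypothetical helper; summary rows (N_unmapped, ...) are skipped.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[column])
    return counts


# Made-up content in the ReadsPerGene.out.tab layout.
example = [
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t50\t50\t50",
    "N_noFeature\t20\t300\t25",
    "N_ambiguous\t5\t1\t2",
    "GENE_A\t40\t2\t38",
    "GENE_B\t0\t0\t0",
]

unstranded = read_gene_counts(example, column=1)
reverse_stranded = read_gene_counts(example, column=3)
print(unstranded, reverse_stranded)
```

Which column to use depends on the library protocol; the integer counts from the chosen column are the kind of per-gene input discussed above.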

Osama Hamzah

Nov 21, 2016, 11:38:05 AM
to rna-star
Great, thanks a lot for the help.