STAR aligner bam output processing for downstream tools like htseq-count

669 views
Skip to first unread message

Komal Rathi

unread,
Oct 9, 2015, 11:00:35 AM10/9/15
to rna-star
Hi everyone,

I posted this question on seqanswers but realized I am bound to get more/quick answers here. 

I am a new user of STAR Aligner. I have been using GSNAP until now, but my new group likes STAR better. I am very excited about using it and have a couple of questions too. We will be using paired end fastq files as input. I saw there are couple of really cool things with STAR:

1. Output coordinate-sorted/unsorted BAM files using --outSAMtype
2. Output read counts using --quantMode 

Question 1: Does the --quantMode perform equally well to htseq-count? Are the read-counts identical in both cases? 

When I was using GSNAP, after I obtained the output BAM file, I used to perform the following steps before using htseq-count:
1. samtools fixmate to fill in mate information
2. bamtools filter to keep only "reads in proper pair" and CIGAR string should indicate atleast one match i.e. 'M'

Question 2: So do I have to do these steps after I obtain the bam output from STAR? 

Question 3: My command line for htseq-count is as follows with the GSNAP output: 
samtools view -f 0x0002 sample.bam | htseq-count - $GTF > sample.counts

Question 3: Do I have to use '-f 0x0002' (i.e. reads mapped in proper pair) while using htseq-count on STAR output bam file as well? 

In general, after using STAR aligner, obtaining the bam file, what processing I need to do to get counts of uniquely mapped reads from htseq-count?

Thanks a lot!

Alexander Dobin

unread,
Oct 9, 2015, 2:43:32 PM10/9/15
to rna-star
Hi Komal,

--quantMode GeneCounts should produce exactly the same counts as htseq-count with 3 columns corresponding to the 3 --stranded options in htseq (all other options default, in particular --mode=union).

You do not need to use fixmate in STAR output. With default parameters, STAR will only output paired alignments, so no need for that filtering either.
Uniquely mapped reads are recognized by htseq,  so you can simply use htseq-count Aligned.out.sam $GTF > sample.counts

Cheers
Alex
Reply all
Reply to author
Forward
0 new messages