STAR - htseq-count : sam unsorted

Nico R

Mar 13, 2013, 4:33:51 AM
to rna-...@googlegroups.com
Hi,

I have a problem using htseq-count to extract the read count per gene. I get this warning for every read in my SAM file:

Warning: Read HWI-ST1172:65:C0RN7ACXX:8:1314:14367:98257 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)


I don't understand why, because I sorted Aligned.out.sam before running htseq-count:

samtools sort Aligned.out.sam Aligned.sorted

htseq-count Aligned.sorted.sam gene.gtf > read_count.txt


Thank you for your help,

N.

Nicolas Stransky

Mar 13, 2013, 7:01:54 AM
to Nico R, rna-star

It should be sorted by name. (-n)

Nicolas
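
To illustrate why the sort order matters: the warning suggests htseq-count expects the two mates of a pair on adjacent lines, which is what sorting by read name (`samtools sort -n`) produces, whereas the default coordinate sort can place other reads between them. A minimal Python sketch with invented records (the read names, positions, and the helper `mates_adjacent` are hypothetical and not part of samtools or HTSeq):

```python
# Toy alignment records: (read_name, chromosome, position). All values are invented.
records = [
    ("readA", "chr1", 100),  # mate 1 of readA
    ("readB", "chr1", 150),  # mate 1 of readB
    ("readA", "chr1", 300),  # mate 2 of readA
    ("readB", "chr1", 350),  # mate 2 of readB
]

def mates_adjacent(recs):
    """Return True if every consecutive pair of records shares one read name."""
    return all(recs[i][0] == recs[i + 1][0] for i in range(0, len(recs) - 1, 2))

# Coordinate order, as a plain `samtools sort` would give.
by_coordinate = sorted(records, key=lambda r: (r[1], r[2]))
# Name order, as `samtools sort -n` would give.
by_name = sorted(records, key=lambda r: r[0])

print(mates_adjacent(by_coordinate))
print(mates_adjacent(by_name))
```

With these records the coordinate-sorted list interleaves readA and readB, so the first check fails, while the name-sorted list keeps each pair together.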

Scott Smith

Mar 13, 2013, 5:36:19 AM
to Nico R, rna-...@googlegroups.com
Does your BAM file have paired-end unaligned reads for which the query name (first column) still has a /1 or /2 at the end?

They won't be identified as ends of the same pair when one of the pair aligns and the other does not. (There should be no /1 or /2 in the first column if you run samtools view.)

I just fought through this issue with tophat + htseq-count. I was wondering if we would have it with rna-star too. It required a tophat patch (which I think we got in 2.0.7?).

I got around it by adding -F 0x0004 when running "samtools view" to cut out the unaligned reads entirely, though having correct BAMs would be preferable.

I'll be curious to hear whether this is really your situation...

Scott
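
The `-F 0x0004` workaround relies on bit 0x4 of the SAM FLAG field, which marks a read as unmapped; `samtools view -F` skips records that have any of the given bits set. A rough Python equivalent on toy SAM lines, roughly what `samtools view -h -F 0x4` would keep (the records and the function name `drop_unmapped` are made up for illustration):

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped

def drop_unmapped(sam_lines):
    """Keep header lines and alignment lines whose FLAG lacks the 0x4 bit."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):         # header lines pass through unchanged
            kept.append(line)
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if not flag & UNMAPPED:
            kept.append(line)
    return kept

# Invented records, truncated to the first four SAM columns for brevity.
toy = [
    "@HD\tVN:1.0",
    "read1\t73\tchr1\t100",   # 73 = paired + mate unmapped + first in pair (read is mapped)
    "read1\t133\tchr1\t100",  # 133 = paired + unmapped + second in pair (dropped)
]
print(len(drop_unmapped(toy)))
```

Note that this drops the unmapped mate but leaves its mapped partner in place, still flagged as paired.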

Nico R

Mar 13, 2013, 10:59:42 AM
to rna-...@googlegroups.com, Nico R
Thanks, it's working with sort -n.

thanks

Alexander Dobin

Mar 13, 2013, 11:27:07 AM
to rna-...@googlegroups.com, Nico R
Hi Scott,

STAR does not include /1, /2 in the read IDs, so it should not be a problem. Also, by default STAR only outputs correctly paired alignments.

Cheers
Alex

Scott Smith

Mar 13, 2013, 12:47:52 PM
to Alexander Dobin, rna-...@googlegroups.com, Nico R, ssm...@genome.wustl.edu


On Mar 13, 2013, at 10:27 AM, Alexander Dobin <ado...@gmail.com> wrote:

> Hi Scott,
>
> STAR does not include /1, /2 in the read IDs, so it should not be a problem.

Ah good.

> Also, by default STAR only outputs correctly paired alignments.

Will STAR output discordant read pairs, as would be useful for gene fusion detection in cancer tumors?

Looking forward to trying it out...

archana bhardwaj

Apr 5, 2014, 5:00:03 AM
to rna-...@googlegroups.com
Hello everyone
I am working on a paired-end RNA-seq dataset. I need to count the number of reads mapped to specific gene attributes. What exact parameters should I use when running htseq-count on paired-end RNA-seq data? The command I am using is:

htseq-count -s no in.sam  reference.gtf   > count.txt

I am unsure whether I should choose -s yes or -s no.

Waiting for reply




Nico

Apr 7, 2014, 6:25:20 AM
to rna-...@googlegroups.com
Are your data strand-specific? If yes, you should use -s yes or -s reverse, depending on your library type. If unstranded, use -s no.

Alexander Dobin

unread,
Apr 8, 2014, 11:57:08 AM4/8/14
to rna-...@googlegroups.com
To complement Nico's answer:
unstranded data (like unstranded Illumina TruSeq): -s no
stranded, second mate's strand agrees with the RNA (like stranded Illumina TruSeq, or classic dUTP protocol): -s reverse
stranded, first mate's strand agrees with the RNA: -s yes
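
The three cases above can be written down as a small lookup. This is only a sketch: the dictionary keys and the function `htseq_strand_option` are hypothetical labels, and only the `-s` values come from the post.

```python
# Which mate's strand agrees with the RNA -> htseq-count -s value (per the post above).
STRAND_OPTION = {
    "unstranded": "no",
    "second_mate_matches_rna": "reverse",  # e.g. dUTP-based stranded libraries
    "first_mate_matches_rna": "yes",
}

def htseq_strand_option(library_type):
    """Return the -s value for a library type, or raise on an unknown label."""
    try:
        return STRAND_OPTION[library_type]
    except KeyError:
        raise ValueError(f"unknown library type: {library_type!r}") from None

# Print the command line one would run for a dUTP-style stranded library.
print("htseq-count -s", htseq_strand_option("second_mate_matches_rna"), "in.sam reference.gtf")
```

Raising on an unknown label rather than falling back to a default avoids silently counting with the wrong strand setting.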

Mp

May 14, 2015, 9:31:15 AM
to rna-...@googlegroups.com
So for TruSeq total stranded I need to use fr.stranded? (http://www.illumina.com/documents/products/technotes/RNASeqAnalysisTopHat.pdf)

Alexander Dobin

May 15, 2015, 4:56:36 PM
to rna-...@googlegroups.com, maurizio...@gmail.com
Hi @Mp,

I'm not sure which software fr.stranded refers to.

Cheers
Alex

Mp

May 18, 2015, 4:57:53 AM
to rna-...@googlegroups.com
I mean for htseq-count.

My question follows on from the first question in this thread. After alignment with STAR I need to count reads per gene using htseq-count. Which is the right parameter for total stranded RNA?

Alexander Dobin

May 19, 2015, 5:34:03 PM
to rna-...@googlegroups.com, maurizio...@gmail.com
Hi @Mp,

for stranded Illumina TruSeq data (1st read is on the opposite strand of the RNA molecule) you would need to use the "-s reverse" option.
The other strand orientation uses the "-s yes" option.

Cheers
Alex
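
One way to picture the difference between the two options is to ask which strand the RNA is taken to come from, given the strand the first read aligned to. A minimal sketch, assuming the semantics described in this thread (the function `rna_strand` is hypothetical and is not HTSeq's API):

```python
def rna_strand(first_read_strand, stranded):
    """Infer the RNA strand from the first read's alignment strand.

    stranded is the htseq-count -s value: "yes", "reverse" or "no".
    Returns "+", "-", or None when the library carries no strand information.
    """
    if first_read_strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if stranded == "no":
        return None
    if stranded == "yes":
        return first_read_strand  # first read agrees with the RNA
    if stranded == "reverse":
        return "-" if first_read_strand == "+" else "+"  # first read is opposite
    raise ValueError(f"unknown -s value: {stranded!r}")

print(rna_strand("+", "reverse"))
```

For a stranded TruSeq library, a first read aligned to the plus strand therefore points to RNA from the minus strand.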