different results if input BAM is sorted

299 views
Skip to first unread message

Matthew

unread,
May 15, 2015, 4:15:03 PM5/15/15
to rsem-...@googlegroups.com
Hello,

I align my reads using an external aligner (not bowtie(2) or BWA). The output is a typical, unsorted BAM file.  Call it  algmts.bam

The results from RSEM (and EBSeq) are completely different (and conflicting)  if I run rsem-calculate-expression  in the following two ways:
  1. rsem-calculate-expression  [options]  --bam  algmts.bam  [etc]     (i.e. give RSEM an unsorted BAM)
  2. sort algmts.bam via samtools to get  algmts.sorted.bam  , then call:   rsem-calculate-expression  [options]  --bam algmts.sorted.bam  [etc]

In case 1 (i.e. unsorted BAM) , the algorithm converges in a "normal fashion", i.e. from the RSEM diagnostic output, i can see the EM converging at a modest speed (say 1k to 4.5k iterations, depending on the input file).

In case 2 (i.e. sorted BAM), the algorithm converges immediately - 0 change in parameter values after step 1.



Note:  there are no "read groups" in this data.  And it is all single-ended reads of the same length.

Thank you!

Matthew

unread,
May 15, 2015, 4:16:57 PM5/15/15
to rsem-...@googlegroups.com
Also:   i am using rsem 1.2.21

Bo Li

unread,
May 15, 2015, 4:26:37 PM5/15/15
to rsem-...@googlegroups.com
Hi Matthew,

That's understandable. If you sort the BAM file using samtools, the
reads are sorted by the transcript_id/aligned position. Then if a read
aligns to different locations, its alignments will be *** separate ***.
However, RSEM requires the alignments of a same read to be grouped
together (see the second paragraph of Usage/II. Calculating Expression
Values/Using an alternative aligner of the RSEM README file). If you
provide RSEM a sorted BAM, RSEM will assume every read is uniquely
aligned and converge very quickly. But the results are wrong.

So my suggestion is, do not sort your BAM file unless you can guarantee
that the alignments of a same read are grouped together after the
sorting.

Hope it helps,
Bo

On 2015-05-15 13:16, Matthew wrote:
> Also: i am using rsem 1.2.21
>
> On Friday, May 15, 2015 at 4:15:03 PM UTC-4, Matthew wrote:
>
>> Hello,
>>
>> I align my reads using an external aligner (not bowtie(2) or BWA).
>> The output is a typical, _unsorted_ BAM file. Call it _algmts.bam_
>>
>> The results from RSEM (and EBSeq) are completely different (and
>> conflicting) if I run rsem-calculate-expression in the following two
>> ways:
>>
>> * rsem-calculate-expression [options] --bam _algmts.bam _[etc]
>> (i.e. give RSEM an unsorted BAM)
>> * sort _algmts.bam_ via samtools to get _algmts.sorted.bam_ , then
>> call: rsem-calculate-expression [options] --bam _algmts.sorted.bam_
>> [etc]
>>
>> In case 1 (i.e. UNSORTED BAM) , the algorithm converges in a "normal
>> fashion", i.e. from the RSEM diagnostic output, i can see the EM
>> converging at a modest speed (say 1k to 4.5k iterations, depending
>> on the input file).
>>
>> In case 2 (i.e. sorted BAM), the algorithm converges immediately - 0
>> change in parameter values after step 1.
>>
>> Note: there are no "read groups" in this data. And it is all
>> single-ended reads of the same length.
>>
>> Thank you!
>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [1]
> ---
> You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [2].
>
>
> Links:
> ------
> [1] http://deweylab.biostat.wisc.edu/rsem/
> [2] http://groups.google.com/group/rsem-users

Matthew

unread,
May 15, 2015, 5:23:30 PM5/15/15
to rsem-...@googlegroups.com
Excellent!  Thank you Bo!  That makes perfect sense.

I have read that part of the rsem-calculate-expression usage/README probably a dozen times.
I had previously misinterpreted "...all alignments of the same read group together" to mean  "read group"  (i.e. the @RG line of a sam/bam header), rather than "all alignments of the same read should be adjacent in the bam file".

Bo Li

unread,
May 15, 2015, 7:24:31 PM5/15/15
to rsem-...@googlegroups.com
Hi Matthew,

Good point. I'll change the README file to avoid potential confusions.

Thanks,
Bo
>>> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [1] [1]
>>> ---
>>> You received this message because you are subscribed to the Google
>>
>>> Groups "RSEM Users" group.
>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to rsem-users+...@googlegroups.com.
>>> To post to this group, send email to rsem-...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/rsem-users [2]
>> [2].
>>>
>>>
>>> Links:
>>> ------
>>> [1] http://deweylab.biostat.wisc.edu/rsem/ [1]
>>> [2] http://groups.google.com/group/rsem-users [2]
>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]
> ---
> You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [2].
>
>
> Links:
> ------
> [1]
> http://www.google.com/url?q75http%3A%2F%2Fdeweylab.biostat.wisc.edu%2Frsem%2F46sa75D46sntz75146usg75AFQjCNEU-wRL_aNCO7ziCWa-Y12BbbySXw
> [2] http://groups.google.com/group/rsem-users
> [3] http://deweylab.biostat.wisc.edu/rsem/
Reply all
Reply to author
Forward
0 new messages