bowtie/bowtie2 options in the align_and_estimate_abundance.pl script

469 views
Skip to first unread message

Peter Bain

unread,
Mar 16, 2015, 8:47:38 PM3/16/15
to trinityrn...@googlegroups.com
Hi Brian/all,

I have a couple of questions about options used for bowtie/bowtie2, I wonder if anyone on the list can provide some insight.

1. For both bowtie and bowtie2, the default behaviour is to report only the best alignment for each read (or read pair). This can be overridden using -k <int> to specify the maximum number of alignments to report, or -a to report all alignments found regardless of score (provided they meet other specified criteria). Given that for transcriptome assemblies there are usually a number of putative transcript variants reported for any given gene, I would assume that in many cases there would be more than one 'best' alignment with identical scores. I guess my question is: when aligning to a list of transcripts (rather than genomic sequence) does bowtie/bowtie2 report only the best alignment per transcript, or the best alignment per transcriptome? Surely for abundance estimation we wouldn't want to restrict alignments to the single 'best' one for the whole transcriptome, which would leave tie the decision about the locus of the best alignment to the random seed.
 
2. In the align_and_estimate_abundance.pl script, bowtie2 is set to the default behaviour with respect to the above reporting options, i.e. it will report only the best alignment per read (pair). If you choose the bowtie aligner, the options include both '--all' and '--best', which according to the bowtie manual allows multiple hits to be reported and sorts the results from best to worst. Why would the options be so different? Am I missing something in the script?

Until now I have restricted my expression analyses to non-redundant ORFs, so the above issues haven't really been a problem. I am in the middle of aligning reads using bowtie2 independently of the trinity scripts, with the -a option, and got curious about how this is handled in the trinity pipeline. Any thought or info would be much appreciated.

Thanks,

Peter

Tiago Hori

unread,
Mar 16, 2015, 8:59:40 PM3/16/15
to Peter Bain, trinityrn...@googlegroups.com
Peter,

With bowtie and RSEM, which is into the trinity pipeline you report all best because RSEM uses an expectation maximization model to attempt to sort which of those alignments is the "real" one. I am not sure why bowtie2 default to reporting the best alignment, so I will let Brian and other comment on that.

As a side note, in pair end and/or stranded mode, RSEM will only consider proper oriented paired mappings.

T.

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

Brian Haas

unread,
Mar 16, 2015, 9:24:04 PM3/16/15
to Tiago Hori, Peter Bain, trinityrn...@googlegroups.com
It looks like the bowtie2 defaults are actually missing an important parameter: -k

In RSEM, this is set to -k 200

I'll update the code for the next release.

For Bowtie-1, it includes --strata, which ensures that only best matching tied alignments are reported.

thanks for pointing this out!

~brian

--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Peter Bain

unread,
Mar 16, 2015, 9:44:46 PM3/16/15
to trinityrn...@googlegroups.com, tiag...@me.com, peter....@gmail.com
Thanks for the reply Brian, glad I could contribute in some small way. If I was to use eXpress for the abundance estimation would -k 200 be appropriate also?

Peter


On Tuesday, March 17, 2015 at 11:54:04 AM UTC+10:30, Brian Haas wrote:
It looks like the bowtie2 defaults are actually missing an important parameter: -k

In RSEM, this is set to -k 200

I'll update the code for the next release.

For Bowtie-1, it includes --strata, which ensures that only best matching tied alignments are reported.

thanks for pointing this out!

~brian

On Mon, Mar 16, 2015 at 8:59 PM, Tiago Hori <tiag...@me.com> wrote:
Peter,

With bowtie and RSEM, which is into the trinity pipeline you report all best because RSEM uses an expectation maximization model to attempt to sort which of those alignments is the "real" one. I am not sure why bowtie2 default to reporting the best alignment, so I will let Brian and other comment on that.

As a side note, in pair end and/or stranded mode, RSEM will only consider proper oriented paired mappings.

T.

Sent from my iPhone

On Mar 16, 2015, at 9:47 PM, Peter Bain <peter....@gmail.com> wrote:

Hi Brian/all,

I have a couple of questions about options used for bowtie/bowtie2, I wonder if anyone on the list can provide some insight.

1. For both bowtie and bowtie2, the default behaviour is to report only the best alignment for each read (or read pair). This can be overridden using -k <int> to specify the maximum number of alignments to report, or -a to report all alignments found regardless of score (provided they meet other specified criteria). Given that for transcriptome assemblies there are usually a number of putative transcript variants reported for any given gene, I would assume that in many cases there would be more than one 'best' alignment with identical scores. I guess my question is: when aligning to a list of transcripts (rather than genomic sequence) does bowtie/bowtie2 report only the best alignment per transcript, or the best alignment per transcriptome? Surely for abundance estimation we wouldn't want to restrict alignments to the single 'best' one for the whole transcriptome, which would leave tie the decision about the locus of the best alignment to the random seed.
 
2. In the align_and_estimate_abundance.pl script, bowtie2 is set to the default behaviour with respect to the above reporting options, i.e. it will report only the best alignment per read (pair). If you choose the bowtie aligner, the options include both '--all' and '--best', which according to the bowtie manual allows multiple hits to be reported and sorts the results from best to worst. Why would the options be so different? Am I missing something in the script?

Until now I have restricted my expression analyses to non-redundant ORFs, so the above issues haven't really been a problem. I am in the middle of aligning reads using bowtie2 independently of the trinity scripts, with the -a option, and got curious about how this is handled in the trinity pipeline. Any thought or info would be much appreciated.

Thanks,

Peter

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.

To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.

To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

Brian Haas

unread,
Mar 16, 2015, 10:01:34 PM3/16/15
to Peter Bain, trinityrn...@googlegroups.com, Tiago Hori
Hi Peter,

From looking at the eXpress documentation, it appears the recommended settings are:

  # recommended eXpress params: http://bio.math.berkeley.edu/eXpress/faq.html                                                                                        

    # -a -X 600 --rdg 6,5 --rfg 6,5 --score-min L,-.6,-.4 --no-discordant --no-mixed                                                                                   

  

I thought I had gotten my settings from an earlier example in the documentation, but I can't find it now.

Note, from looking at the RSEM code, it's defaults would be:

--dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1 -I 1 -X 1000 --no-mixed --no-discordant -k 200

I suppose the proper thing to do would be to use their recommended settings. 

I'll put some more thought into this, in addition to some more testing, and update our defaults for the next release.

best,

~brian

Peter Bain

unread,
Mar 16, 2015, 10:14:10 PM3/16/15
to trinityrn...@googlegroups.com, peter....@gmail.com, tiag...@me.com
Thank you Brian, that's exactly what I was looking for.

(note to self: read the docs for ALL steps of the process :) )

Brian Haas

unread,
Mar 16, 2015, 10:20:04 PM3/16/15
to Peter Bain, trinityrn...@googlegroups.com, Tiago Hori
well.... this 'wrapper', although well-intended, somewhat lacks proper execution.  I'll put some more thought into it and rectify it soon,

best,

~brian

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.

To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages