Re: Bowtie 2 alignment to Trinity reference

2,144 views
Skip to first unread message

Brian Haas

unread,
Feb 21, 2012, 9:29:20 AM2/21/12
to Eshita Sharma, trinityrnaseq-users, rsem-...@googlegroups.com
Hi Eshita, CCing rsem-users@google,

I don't think RSEM is capable of handling gapped alignments yet, in
which case it wouldn't be compatible with the bowtie2 alignments.
Once RSEM does handle gaps, I think bowtie2 will be the appropriate
utility to leverage here.

best,

-brian


On Tue, Feb 21, 2012 at 9:21 AM, Eshita Sharma
<eshita...@tuebingen.mpg.de> wrote:
> Hi,
>
> I have used trinity for de novo assembly of a complex vertebrate
> transcriptome where we dont have a genome. I  tried a couple of aligners
> with the trinity assembled reference and found bowtie 2 to give me highest
> alignments. Is there a way to use the Bowtie 2 sam output for direct hit
> count as opposed to bowtie based RSEM quantification which does not support
> gapped alignment?
>
> Best
> Eshita
>
>
> Eshita Sharma
> PhD Student
> Department 6
> Max Planck Institute for Developmental Biology
> Phone : +49 7071 601 1406
>
> eshita...@tuebingen.mpg.de
>
>
>
>
>

--
--
Brian J. Haas
Manager, Genome Annotation and Analysis Research and Development
The Broad Institute
http://broad.mit.edu/~bhaas

Brian Haas

unread,
Feb 21, 2012, 9:48:40 AM2/21/12
to Eshita Sharma, trinityrnaseq-users, rsem-...@googlegroups.com
OK, but you really need to run RSEM in order to get the more accurate
abundance estimation. In Trinity, you'll have multiple best-hits to
isoforms and artifacts, and it's only through RSEM that you'll get a
better estimate of which transcript assembly is most relevant.

best,

-b

On Tue, Feb 21, 2012 at 9:41 AM, Eshita Sharma
<eshita...@tuebingen.mpg.de> wrote:
> Hi Brian,
>
> Thanks for the input.
> Will stick to the direct read count of best hits till further development in
> RSEM then.
>
> Best
> Eshita

Colin Dewey

unread,
Feb 21, 2012, 3:32:46 PM2/21/12
to Eshita Sharma, rsem-...@googlegroups.com, trinityrnaseq-users
Hi Eshita,

You are correct that RSEM does not currently handle gapped alignments. This is a feature that we hope to add in the near future. However, I recommend that people use caution when allowing gapped alignments, as it is much easier to produce incorrect alignments when gaps are allowed. So you will certainly increase the number of alignments by allowing gaps, but many of those may be false positives. I suggest thinking about what types indels you are trying to handle here: are they (1) indels introduced by the sequencing process or (2) indels that are the result of your reference transcript set not exactly matching the transcript set of your sample? From what I've heard (1) is not a major concern if you are using Illumina sequencing. (2) is certainly a valid concern, but if you are assembling a transcript set from your sample reads with Trinity, then your reference transcript set should match the sample transcript set very well. I'd be interested to hear if you had other concerns that I have not considered here.

Best,
Colin

archana bhardwaj

unread,
Aug 9, 2013, 2:28:19 AM8/9/13
to rsem-...@googlegroups.com, Eshita Sharma, trinityrnaseq-users
Hello everyone 

I am new to NGS . I have paired end sequence data for differential gene expression analysis. First i did the prepossessing of my data such as adapter , ambiguous character removal and quality filtration . Now i want to assemble my data in form of contigs. I run trininty.pl to get the assembled reads in form of  contigs. There are so many output files . What is the exact contig files??? Inchworm contigs (InchwormK25.fasta or bundled.fasta ???  How can i get the read count of each contig ??? 

waiting for reply

liz wright

unread,
Jan 20, 2014, 5:02:51 PM1/20/14
to rsem-...@googlegroups.com, Eshita Sharma, trinityrnaseq-users
Hi all, I have a similar situation to Eshita in so far as I get significantly better mapping results with Bowtie2 and also BWA-mem than earlier versions of either program.  I am using cufflinks currently for transcript abundance estimation but would really really like to run RSEM as well.  Any idea when RSEM version that handles gapped alignments will be released?  I have illumina data and Trinity assembly.... I think the quality of my raw reads is good (I QC'd and trimmed them as well) so I am not hugely worried about indels.  For the record, my assembly is de novo, so I am mapping reads back to my assembly.

Thanks, Liz Wright

bli

unread,
Jan 20, 2014, 5:08:09 PM1/20/14
to rsem-...@googlegroups.com
Hi Liz,

Why do you think that bowtie2 will have significantly better mapping
results than bowtie for RSEM's usage?

In RSEM, we only let the aligner align the first 25bp with at most 2
mismatches and ignore the quality scores. Then it is RSEM's duty to
assess the quality of each alignment. In this setting, I do not think
bowtie and bowtie2 will be different much.

Best,
Bo
>>>> http://broad.mit.edu/~bhaas [1]
>>>
>>>
>>> Eshita Sharma
>>> PhD Student
>>> Department 6
>>> Max Planck Institute for Developmental Biology
>>> Phone : +49 7071 601 1406
>>>
>>> eshita...@tuebingen.mpg.de
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> --
>> Brian J. Haas
>> Manager, Genome Annotation and Analysis Research and Development
>> The Broad Institute
>> http://broad.mit.edu/~bhaas [1]
>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [2]
> ---
> You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [3].
>
>
> Links:
> ------
> [1]
> http://www.google.com/url?q75http%3A%2F%2Fbroad.mit.edu%2F~bhaas46sa75D46sntz75146usg75AFQjCNEFPojrqaBZPNrANlLSiNue28I3EA
> [2] http://deweylab.biostat.wisc.edu/rsem/
> [3] http://groups.google.com/group/rsem-users

liz wright

unread,
Jan 20, 2014, 8:42:39 PM1/20/14
to rsem-...@googlegroups.com

Hi Bo, I don't know but the alignment stats do vary significantly for bowtie (PE read mapping ~45%) and bowtie2 (PE read mapping ~71%). This is similar for BWA vs BWA mem.

--- You received this message because you are subscribed to a topic in the Google Groups "RSEM Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rsem-users+unsubscribe@googlegroups.com.

To post to this group, send email to rsem-...@googlegroups.com.

bli

unread,
Jan 20, 2014, 10:39:06 PM1/20/14
to rsem-...@googlegroups.com
Hi Liz,

I see. You mean mapping paired-end reads. That is possible.

In fact, you can use bowtie2 with RSEM, provided that you disable
bowtie2's indel and local alignments. Here is the bowtie2 parameters I
recommend to set:

### Bowtie 2

# Build indices

bowtie2-build reference.fa reference_name

# Align reads
# We assume single-end data

bowtie2 --very-sensitive \
# preset options
--dpad 0 --gbar read_length \
# alignment options
--mp 1,1 --np 1 --score-min L,0,-0.1 \
# scoring options
# -I 0 -X 1000 --no-mixed --no-discordant \
# paired-end options
-k 201 \
# reporting options
-p 8 \
# performance options
-x reference-name -U reads.fq -S output.sam
# main arguments

comments: Alignment options make sure that there is no gapped
alignments. In scoring options, '--mp 1,1' means mismatch penalty is 1,
'--np 1' means the penalty of aligning a 'N' base is 1, and '--score
-min L,0,-0.1' is a function for the threshold to report an alignment,
it is f(x) = -0.1x, where x is the read length. The scoring option
basically ignore quality scores and '--score-min' set the maximum ratio
of mismatches allowed. Paired-end options are for paired-end reads only
(thus commented). '--no-mixed' and '--no-discordant' make sure that RSEM
can parse the resulting SAM file.

The '-k 201' reporting option tells bowtie2 to report at most 201
alignments per reads. Since bowtie2 does not '-m' option as it has in
bowtie1. What I can do here is set '-k 201' and then recommend users to
write their own scripts to ** filter ** out any reads having more than
200 alignments.

I have tested the above commends. The resulting SAM file can be parsed
by RSEM correctly. Please note that you still use
'rsem-prepare-reference' to build RSEM indices (enable '--no-bowtie' to
avoid generating bowtie indices).

If '--very-sensitive' is still not sensitive, you can adjust '-D 20 -R 3
-N 0 -L 20 -i S,1,0.50' accordingly. For example, change -N 0 to -N 1.

Best,
Bo
>> Phone : +49 7071 601 1406 [1]
>>
>> eshita...@tuebingen.mpg.de
>>
>> --
>> --
>> Brian J. Haas
>> Manager, Genome Annotation and Analysis Research and Development
>> The Broad Institute
>> http://broad.mit.edu/~bhaas [2] [1]
>
> Eshita Sharma
> PhD Student
> Department 6
> Max Planck Institute for Developmental Biology
> Phone : +49 7071 601 1406 [1]
>
> eshita...@tuebingen.mpg.de
>
> --
> --
> Brian J. Haas
> Manager, Genome Annotation and Analysis Research and Development
> The Broad Institute
> http://broad.mit.edu/~bhaas [2] [1]
>
>  --
>  RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3] [2]
>  ---
>  You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
>  To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
>  To post to this group, send email to rsem-...@googlegroups.com.
>  Visit this group at http://groups.google.com/group/rsem-users [4]
> [5]
> [2] http://deweylab.biostat.wisc.edu/rsem/ [3]
> [3] http://groups.google.com/group/rsem-users [4]
>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]
> --- You received this message because you are subscribed to a topic
> in the Google Groups "RSEM Users" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe
> [6].
> To unsubscribe from this group and all its topics, send an email to
> rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [4].
>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]
> ---
> You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [4].
>
>
> Links:
> ------
> [1] tel:%2B49%207071%20601%201406
> [2] http://broad.mit.edu/~bhaas
> [3] http://deweylab.biostat.wisc.edu/rsem/
> [4] http://groups.google.com/group/rsem-users
> [5]
> http://www.google.com/url?q75http%3A%2F%2Fbroad.mit.edu%2F~bhaas46sa75D46sntz75146usg75AFQjCNEFPojrqaBZPNrANlLSiNue28I3EA
> [6]
> https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe

liz wright

unread,
Jan 21, 2014, 9:20:01 AM1/21/14
to rsem-...@googlegroups.com

That's for this detailed guide. I did previously use local alignment flag for mapping but otherwise followed rsem manual for preparing non bowtie assembly.  Can try this though i may lose mapping efficiency without local parameter. Liz


  To post to this group, send email to rsem-...@googlegroups.com.
  Visit this group at http://groups.google.com/group/rsem-users [4]
[3].

 Links:
 ------
 [1]

http://www.google.com/url?q75http%3A%2F%2Fbroad.mit.edu%2F~bhaas46sa75D46sntz75146usg75AFQjCNEFPojrqaBZPNrANlLSiNue28I3EA
[5]
 [2] http://deweylab.biostat.wisc.edu/rsem/ [3]
 [3] http://groups.google.com/group/rsem-users [4]

 --
 RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]
 --- You received this message because you are subscribed to a topic
in the Google Groups "RSEM Users" group.
 To unsubscribe from this topic, visit
https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe
[6].
 To unsubscribe from this group and all its topics, send an email to

 To post to this group, send email to rsem-...@googlegroups.com.
 Visit this group at http://groups.google.com/group/rsem-users [4].

 --
 RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]
 ---
 You received this message because you are subscribed to the Google
Groups "RSEM Users" group.
 To unsubscribe from this group and stop receiving emails from it,
--- You received this message because you are subscribed to a topic in the Google Groups "RSEM Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rsem-users+unsubscribe@googlegroups.com.

To post to this group, send email to rsem-...@googlegroups.com.

liz wright

unread,
Jan 21, 2014, 11:10:57 AM1/21/14
to rsem-...@googlegroups.com
I meant to say *thanks* for this detailed guide Bo.  Any idea if RSEM will be some day modified to accept gapped alignments?  Liz

liz wright

unread,
Jan 21, 2014, 4:03:39 PM1/21/14
to Lassance, Jean-Marc, <rsem-users@googlegroups.com>, Eshita Sharma, trinityrnaseq-users
Hello all -- Jean-Marc thanks for the input but here's the thing: I don't see the reason to keep using Bowtie or BWA instead of the newer version(s), which are giving me better results, unless someone can give me a compelling reason that this better mapping is somehow flawed or wishful thinking.

In the meantime I have done read trimming, though not with trim galore.  I use seqtk, and based on fastQC report either hard trim my reads from 3' end or use the seqtk algorithm (based off of phred scores) to trim.

Best, Liz


On Tue, Jan 21, 2014 at 3:01 PM, Lassance, Jean-Marc <lass...@fas.harvard.edu> wrote:
Hi all, 

In my experience, the overall mapping rates with Bowtie(1) increased after I started using Trimgalore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) to pre-process paired-end data. Especially there is one option (trim1) that seems to contribute a lot to the increase of mapping efficiency. 



-t/--trim1              Trims 1 bp off every read from its 3' end. This may be needed for FastQ files that
                        are to be aligned as paired-end data with Bowtie. This is because Bowtie (1) regards
                        alignments like this:

                          R1 --------------------------->     or this:    ----------------------->  R1
                          R2 <---------------------------                       <-----------------  R2

                        as invalid (whenever a start/end coordinate is contained within the other read).


Best, 

JM

----------------------------------------------------------
Jean-Marc Lassance, PhD

Harvard University
Department of Organismic and Evolutionary Biology
Department of Molecular and Cellular Biology
Museum of Comparative Zoology

26, Oxford Street
Cambridge MA 02138
USA

lass...@fas.harvard.edu






------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today.
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk_______________________________________________
Trinityrnaseq-users mailing list
Trinityrn...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/trinityrnaseq-users


Bob Freeman

unread,
Jan 21, 2014, 4:21:05 PM1/21/14
to liz wright, Lassance, Jean-Marc, trinityrnaseq-users, Eshita Sharma, <rsem-users@googlegroups.com>
A 2013 paper by Hatem et al. is worth the read, comparing the merits of Bowtie vs Bowtie2. I think the former outperformed the latter in most cases.

Hatem et al. BMC Bioinformatics 2013, 14:184

Cheers,
Bob
-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA  02115

@DevBioInfoGuy

Filemaker 10 Certified Developer

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace




liz wright

unread,
Jan 21, 2014, 4:41:17 PM1/21/14
to Bob Freeman, Lassance, Jean-Marc, trinityrnaseq-users, Eshita Sharma, <rsem-users@googlegroups.com>
Thanks for reference.  Interestingly, I have significantly higher mapping percentage using Bowtie2 vs Bowtie and BWA-mem (not reported on in paper) vs. BWA.  I should probably mention that I would expect a certain percentage of my transcripts to have high polymorphism, i.e. those coding for toxins (I am working on the venom duct transcriptome of a marine snail) as the toxins are numerous (as many as 200 transcripts) and under strong positive selection.  In theory increasing the number of mismatches parameter could address this but it has never really improved my mapping (surprising, and contrary to what is stated in this paper).  For whatever reason, allowing local or gapped alignment seems to make the difference.

Shawn Driscoll

unread,
Feb 5, 2014, 2:54:56 AM2/5/14
to rsem-...@googlegroups.com, Eshita Sharma, trinityrnaseq-users
Indeed I have lately been realizing that sometimes the "best aligner" is case specific.  However if you're using RSEM you have to throw out all of your gapped/soft clipped alignments anyways.  In my experience STAR and bowtie2 (and probably bowtie1) are going to produce better alignments for these EM algorithms than BWA. I haven't directly compared the performance of bowtie1 vs 2 for RSEM. I have compared bowtie2 and STAR for use with eXpress and they perform nearly identically with the right settings. It was a little tricky to get the right parameters for STAR.

liz wright

unread,
Feb 5, 2014, 4:10:43 AM2/5/14
to <rsem-users@googlegroups.com>, Eshita Sharma, trinityrnaseq-users
Ultimately I have decided to use my Bowtie2 and BWA-mem read mapping results and go with both Cufflinks and eXpress for FPKM analysis.  I fiddled somewhat endlessly with Bowtie parameters to see if I could get better mapping, and the only thing that generated a very modest improvement was adjusting the frag length parameter (the default is strangely low).  I should have FPKM results shortly for comparison to each other across contigs of interest, and it will be very interesting to see what kind of consistency is obtained.  I would have liked to use RSEM as an additional tool for analysis, but after repeated attempts, it does not seem to be panning out.  Liz


--
RSEM website: http://deweylab.biostat.wisc.edu/rsem/
---
You received this message because you are subscribed to a topic in the Google Groups "RSEM Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rsem-users+...@googlegroups.com.

Bo Li

unread,
Feb 5, 2014, 9:43:04 PM2/5/14
to rsem-...@googlegroups.com
Hi Liz,

What makes you decide not to use RSEM? If you do not care about gapped alignments in fact RSEM with compatible with bowtie 2 aligner. I have sent the right parameters of bowtie2 for RSEM. If you need, I can send them to you again.

Best,
Bo



You received this message because you are subscribed to the Google Groups "RSEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+...@googlegroups.com.

Bo Li

unread,
Feb 5, 2014, 9:45:15 PM2/5/14
to rsem-...@googlegroups.com
Hi Liz,

To me, it seems that bowtie2 is better at finding paired-end read alignments than bowtie 1. Perhaps handling gapped alignments is not a big deal for your case.

Best,
Bo

Dinesh Heisnam

unread,
Feb 6, 2014, 10:21:55 AM2/6/14
to rsem-...@googlegroups.com

hi brian

i am new to trinity. Recently i used it to assemble transcripts from six different libraries pooled together (Same sample with different treatment). Although the size of the pooled reads is a bit large 180 GB for R1&2, it has been completed successfully without any error.  When i tried to quantify it with RSEM, except for 2 library all are giving 85-90 read alignment. For the 2 libraries alignment is merely 4% and 12%. I run the alignment once again and it is giving the same result. 
I even check the read quality and it is good.

kindly suggest what went wrong here.

liz wright

unread,
Feb 6, 2014, 11:00:54 AM2/6/14
to <rsem-users@googlegroups.com>
Hi Bo, 

Originally I used the local parameter for Bowtie2 alignment and as I recall this is not going to work with RSEM.  I can redo the alignment one more time using Bowtie2 and the parameters you suggest, but I actually do think gapped alignments allow the read mappers to return a higher percentage of mapped reads in my case.  I am by interested by, but do not fully understand the relationship of indels and gapped alignments in read mapping outcomes.  In an effort to get bowtie to perform better I tweaked the n/l and then also v parameters but that did pretty much nothing to improve the alignment.  In general I think I have low indel quotient in my reads, as they are illumina hi seq and it was a good run.  Other items of interest here are that I used seqtk to trim my reads (using default phred algorithm) and also assembled with trinity normalized data, though I am mapping back to the assembly wiith the full set of reads (~289 mil) to get an idea of expression levels. Since it's a lot of reads, mapping over and over again gets to be a bit time consuming but I may give it one more shot, since it's pretty interesting to see what happens and I think relevant to the quality of the assembly.  Still, it would be nice if RSEM could handle gapped alignments... perhaps that is in the works, or not, if you have some philosophical or mathematical reason against it.   

Cheers, Liz

Bo Li

unread,
Feb 6, 2014, 2:24:15 PM2/6/14
to rsem-...@googlegroups.com
Hi Liz,

Thanks for deciding to try it one more time. One thing you should notice is that a higher mapping rate does not necessarily mean a better estimation result. In particular, I'm not aware of any quantification tools will take advantage of local alignments (In a generative model, you need to assign all non-alignable bases as mismatches, which makes the local alignment actually a low quality alignment). 

For RSEM's application, if you have single-end reads, I do not think bowtie2 will have any advantage. However, if you plan to align paired-end reads to your assembly, bowtie2 seems better than bowtie. Alternatively, you can try just align the first mates of your reads using bowtie, which will give you a much higher mapping rate than aligning paired-end reads. 

Currently, RSEM does not support indel alignments just because I do not have time to implement it... Hopefully, we can get this feature implemented soon.

Best,
Bo

Shawn Driscoll

unread,
Feb 6, 2014, 5:46:16 PM2/6/14
to rsem-...@googlegroups.com
Liz I was wondering if you also tried increasing the -e option in bowtie with -n (not -v). With that set high enough the aligner literally will return hits where only the seed aligned (-l length). Id think it would return tons of alignments at that point.

Also a comment on gaps. I know for a fact that bwa and bowtie2 will return gapped alignments instead of proper alignments in some cases. I've seen this using simulated data that has no indels and only single base errors. Ideally you should try getting all alignments while disallowing gaps and then allow gaps on the unaligned reads (using a multi stage alignment process).

liz wright

unread,
Feb 7, 2014, 1:26:53 PM2/7/14
to <rsem-users@googlegroups.com>
Thanks Shawn but I am definitely done with bowtie for this assembly.  I think allowing alignments based on seed length alone is not a genuine alignment and may defeat the purpose.  I am going to try the parameters outlined by Bo for bowtie2 and will report back!  Liz


On Thu, Feb 6, 2014 at 5:46 PM, Shawn Driscoll <lispw...@gmail.com> wrote:
Liz I was wondering if you also tried increasing the -e option in bowtie with -n (not -v). With that set high enough the aligner literally will return hits where only the seed aligned (-l length). Id think it would return tons of alignments at that point.

Also a comment on gaps. I know for a fact that bwa and bowtie2 will return gapped alignments instead of proper alignments in some cases. I've seen this using simulated data that has no indels and only single base errors. Ideally you should try getting all alignments while disallowing gaps and then allow gaps on the unaligned reads (using a multi stage alignment process).

Shawn Driscoll

unread,
Feb 7, 2014, 8:10:53 PM2/7/14
to rsem-...@googlegroups.com
Bo-

I thought I'd mention in here that I've been testing these bowtie2 parameters in my benchmarkings and I've found the RSEM outputs from bowtie2 alignments to be slightly better than from the default bowtie1 alignments (with PE100 reads).  This is also true of eXpress though eXpress is still not producing results as good as RSEM in my tests.  I guess I'm not surprised as bowtie2 seems to consistently perform better at finding correct alignments (in simulations) compared to every other aligner I've tested so far.

Thanks for the

Bo Li

unread,
Feb 7, 2014, 10:45:03 PM2/7/14
to rsem-...@googlegroups.com
Hi Shawn,

Thanks for sharing this information to me. Perhaps we should document
the suggested bowtie2 parameters in RSEM in the future. By the way, have
you filtered out reads that aligns to more than 200 positions? I guess
that if you just keep them there, RSEM's performance should be still the
same. But I haven't tested it.

Thanks,
Bo
>>>> http://broad.mit.edu/~bhaas [1] [2] [1]
>>>
>>> Eshita Sharma
>>> PhD Student
>>> Department 6
>>> Max Planck Institute for Developmental Biology
>>> Phone : +49 7071 601 1406 [1]
>>>
>>> eshita...@tuebingen.mpg.de
>>>
>>> --
>>> --
>>> Brian J. Haas
>>> Manager, Genome Annotation and Analysis Research and Development
>>> The Broad Institute
>>> http://broad.mit.edu/~bhaas [1] [2] [1]
>>>
>>> --
>>> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [2] [3] [2]
>>> ---
>>> You received this message because you are subscribed to the Google
>>
>>> Groups "RSEM Users" group.
>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to rsem-users+...@googlegroups.com.
>>> To post to this group, send email to rsem-...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/rsem-users [3]
>> [4]
>>> [5]
>>> [2] http://deweylab.biostat.wisc.edu/rsem/ [2] [3]
>>> [3] http://groups.google.com/group/rsem-users [3] [4]
>>>
>>> --
>>> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [2] [3]
>>> --- You received this message because you are subscribed to a
>> topic
>>> in the Google Groups "RSEM Users" group.
>>> To unsubscribe from this topic, visit
>>>
>> https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe
>> [5]
>>> [6].
>>> To unsubscribe from this group and all its topics, send an email
>> to
>>> rsem-users+...@googlegroups.com.
>>> To post to this group, send email to rsem-...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/rsem-users [3]
>> [4].
>>>
>>> --
>>> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [2] [3]
>>> ---
>>> You received this message because you are subscribed to the Google
>>
>>> Groups "RSEM Users" group.
>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to rsem-users+...@googlegroups.com.
>>> To post to this group, send email to rsem-...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/rsem-users [3]
>> [4].
>>>
>>>
>>> Links:
>>> ------
>>> [1] tel:%2B49%207071%20601%201406
>>> [2] http://broad.mit.edu/~bhaas [1]
>>> [3] http://deweylab.biostat.wisc.edu/rsem/ [2]
>>> [4] http://groups.google.com/group/rsem-users [3]
>>> [5]
>>>
>>
> http://www.google.com/url?q75http%3A%2F%2Fbroad.mit.edu%2F~bhaas46sa75D46sntz75146usg75AFQjCNEFPojrqaBZPNrANlLSiNue28I3EA
>> [4]
>>> [6]
>>>
>> https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe
>> [5]
>
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [6]
> ---
> You received this message because you are subscribed to the Google
> Groups "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users [3].
> [2]
> http://www.google.com/url?q75http%3A%2F%2Fdeweylab.biostat.wisc.edu%2Frsem%2F46sa75D46sntz75146usg75AFQjCNEU-wRL_aNCO7ziCWa-Y12BbbySXw
> [3] http://groups.google.com/group/rsem-users
> [4]
> http://www.google.com/url?q75http%3A%2F%2Fbroad.mit.edu%2F~bhaas46sa75D46sntz75146usg75AFQjCNEFPojrqaBZPNrANlLSiNue28I3EA
> [5]
> https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe
> [6] http://deweylab.biostat.wisc.edu/rsem/

Shawn Driscoll

unread,
Feb 8, 2014, 4:13:51 PM2/8/14
to rsem-...@googlegroups.com
Bo,

I did not filter those out but when aligning real data reads that align more than even 100 times seem to be very rare. I guess the impact of allowing those reads through will vary depending on what species and gene models are being used. Its probably a bigger problem with short SE reads than PE100.

I've done these transcriptome alignment simulations where I make reads from random transcripts and then align allowing 200 or maybe it was 500 alignments. Then I'd check the alignments to see how many of the true alignments were recovered within the mass of multi alignments. Between star, bowtie2, Gem and bwa it was consistently bowtie2 that came out on top. So if by random chance bowtie2 is returning the correct alignment for reads that can align more than 200 times, which are rare, then there is probably not much to worry about, right?

Bo Li

unread,
Feb 8, 2014, 4:39:08 PM2/8/14
to rsem-...@googlegroups.com
Hi Shawn,

Yes, you're right. Even if bowtie2 does not return the correct
alignment, RSEM has a built-in noise transcript, which might grasp most
of read weight from the false alignments. So maybe filtering or not is
not a big deal.

P.S., it is interesting to hear that bowtie2 performs best for
paired-end reads among the aligners you have compared. Thanks for
sharing this with me!

Best,
Bo

Brian Haas

unread,
Feb 8, 2014, 7:50:34 PM2/8/14
to rsem-...@googlegroups.com
Hi Bo,

I'm sure you're already considering this, but it would be useful to have a --bowtie2 option (or similar effect) to run RSEM with your recommended bowtie2 parameters.

best regards,

~brian



--- You received this message because you are subscribed to the Google Groups "RSEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+unsubscribe@googlegroups.com.

To post to this group, send email to rsem-...@googlegroups.com.



--
--
Brian J. Haas

liz wright

unread,
Feb 9, 2014, 12:51:13 PM2/9/14
to <rsem-users@googlegroups.com>
While we are on the subject, I think there is a strong argument to be made of supporting BWA-mem input to RSEM.  You all my find this graph of interest, comparing FPKM correlation from my Trinity assembly of marine snail venom duct tissue.  http://imageshack.com/a/img36/2040/mzjx.jpg


You received this message because you are subscribed to a topic in the Google Groups "RSEM Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe.

To unsubscribe from this group and all its topics, send an email to rsem-users+...@googlegroups.com.

Shawn Driscoll

unread,
Feb 10, 2014, 7:37:07 PM2/10/14
to rsem-...@googlegroups.com
So that looks like bowtie2 and bwa "mem" end up producing similar results with eXpress.  I've seen that as well comparing STAR and bowtie2 with eXpress - the outputs are extremely similar.  eXpress performs slightly worse with bowtie1 and so far as I have seen it does even worse with bwa "backtrack" at least at the isoform level.

Out of curiosity - what alignment options did you use for MEM to use the alignments with eXpress?  Are you using single-end reads?
To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+...@googlegroups.com.

To post to this group, send email to rsem-...@googlegroups.com.
Visit this group at http://groups.google.com/group/rsem-users.



--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas

 

liz wright

unread,
Feb 10, 2014, 7:49:14 PM2/10/14
to <rsem-users@googlegroups.com>
Hi Shawn,  all my alignments using paired end reads.  I may have mentioned this before but these are 2x100 illumina hi seq, 289 mil reads from one lane.  I did my assembly on Trinity normalized data (massive reduction of the dataset to about ~10 mil reads).  One of these days I may be tempted to perform the assembly on the full set of reads as I have now acquired access to a server with 1TB of RAM.  In the meantime I am doing the mapping with the full set of reads to get an idea of expression levels of the toxins in the venom duct.  I am getting pretty consistent fpkms for the putative toxin transcripts, which is my main interest.  I'm not really intending to base a paper around read mapping but one does get kind of sucked in! 

liz wright

unread,
Feb 10, 2014, 8:47:19 PM2/10/14
to <rsem-users@googlegroups.com>

Also Bo allow me to say that i have not yet had the chance to try your suggested bowtie 2 parameters for use with rsem yet but it's definitely on my list. Will report back once I have it.

liz wright

unread,
Feb 14, 2014, 1:27:27 PM2/14/14
to <rsem-users@googlegroups.com>
Hi all, I just wanted to report back on my mapping results using the BT2 suggested parameters with use for RSEM, with results as follows:

~/Bowtie2_BLi_param$ bowtie2 --very-sensitive --dpad 0 --gbar 100 --mp 1,1 --np 1 --score-min L,0,-0.1 -I 0 -X 1000 --no-mixed --no-discordant -k 201 -p 8 -x Trinity -1 ~/s1_pe_05tr.fq -2 ~/s2_pe_05tr.fq -S Trinanil_BT2_bli.sam
140071556 reads; of these:
  140071556 (100.00%) were paired; of these:
    90265955 (64.44%) aligned concordantly 0 times
    28566026 (20.39%) aligned concordantly exactly 1 time
    21239575 (15.16%) aligned concordantly >1 times
35.56% overall alignment rate

As you can see this is not good -- in fact it is as poor or worse than regular Bowtie.  When I add the --local parameter to BT2 mapping I get an increase of reads concordantly aligned to ~70%

When I use BWA-mem and default parameters I also get paired end read mapping of slighly > 70%

Best, Liz


  To post to this group, send email to rsem-...@googlegroups.com.
  Visit this group at http://groups.google.com/group/rsem-users [4] [5]
 [2] http://deweylab.biostat.wisc.edu/rsem/ [3]
 [3] http://groups.google.com/group/rsem-users [4]

 --
 RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]

 --- You received this message because you are subscribed to a topic
in the Google Groups "RSEM Users" group.
 To unsubscribe from this topic, visit
https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe
[6].

 To unsubscribe from this group and all its topics, send an email to

 To post to this group, send email to rsem-...@googlegroups.com.
 Visit this group at http://groups.google.com/group/rsem-users [4].

 --
 RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]

 ---
 You received this message because you are subscribed to the Google
Groups "RSEM Users" group.
 To unsubscribe from this group and stop receiving emails from it,

 To post to this group, send email to rsem-...@googlegroups.com.
--- You received this message because you are subscribed to a topic in the Google Groups "RSEM Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rsem-users+unsubscribe@googlegroups.com.

To post to this group, send email to rsem-...@googlegroups.com.

Colin Dewey

unread,
Feb 14, 2014, 2:37:51 PM2/14/14
to rsem-...@googlegroups.com
Hi Liz,

The fact that you get such a big increase in % alignment when moving to local alignment suggests that there may be something peculiar about your reads.  For example, perhaps a large number of your reads have adapter (or other foreign) sequences at either end.  You may want to take a look at those reads that don't align with a global alignment but do align with a local alignment; the question I would ask here is: what parts of the reads are generally not aligning in the local alignments?

Of course, one can always achieve higher % alignment by only requiring a local alignment, allowing indels, or further increasing the number of mismatches allowed.  So these results are not necessarily surprising.  But the large difference in the results for global vs. local alignment might suggest that there is something systematically odd about the read sequences.

Colin

To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+...@googlegroups.com.

liz wright

unread,
Feb 14, 2014, 2:48:50 PM2/14/14
to <rsem-users@googlegroups.com>
Thanks Colin. I used trimmomatic to get rid of adapter sequence so hopefully this is not a problem but from what I understand this can be an imperfect process.  With respect to Bowtie, I have altered n/l and alternatively v parameters in a number of incarnations with very small gains to aligment, though I have not tampered that much with -l (seed length).  One thing I sort of wonder about is I did my assembly on digitally normalized reads (using Trinity in silico normalization) but now I am mapping with the full set of reads.  This may not be the best way to go about things.  As I have recently gained access to a server with 1TB of mem I am currently running the full set of 289 mil reads using 40 CPU so hopefully this will be done soon.  I can also do what you suggest and parse out which reads are aligning under what parameters.

Liz


To unsubscribe from this group and all its topics, send an email to rsem-users+...@googlegroups.com.

Colin Dewey

unread,
Feb 14, 2014, 2:54:08 PM2/14/14
to rsem-...@googlegroups.com
Hi Liz,

I would likely be informative to do an alignment with only those reads that remained after the digital normalization.  Would that be possible?

Colin

liz wright

unread,
Feb 14, 2014, 2:57:11 PM2/14/14
to <rsem-users@googlegroups.com>
Yes I have done that... it's buried somewhere deeply in the past.  The alignment numbers were higher though not outstanding (around 45% as I recall but without any tweaking at all).  I can either find or recreate.  Will let you know.

Shawn Driscoll

unread,
Feb 14, 2014, 3:22:49 PM2/14/14
to rsem-...@googlegroups.com
It sounds to me like Trinity had a hard time coming up with consensus sequences from your reads.   If your data that you built the assembly with has such a difficult time aligning back to it that tells me Trinity had to put a lot of "not so similar" sequences together and make some hard decisions on consensus bases. 

Liz one more thing that might be interesting and possibly insightful to try is to align with bowtie2 --local but also specify --gbar 1000 to disable gaps.  This makes it so bowtie2 does 1/2 of a local alignment (ungapped but allows soft-clipping). 

I'd be more comfortable using the results of that alignment than one that's clipped and gapped though my thinking is probably biased towards aligning to a known annotation. In your case it's likely the aligner has to clip and gap the reads to compensate for what Trinity assembled. So the root of this issue, in my thinking, is the Trinity assembly and that of course could circle back around to your reads - but as long as the experiment went well and your base quals are good I'd bet the sequencing data is fine. 

You know, now that I'm thinking about it, Trinity is going to suffer at assembling low-expressed features which may include alternative isoforms of genes and other copies of genes. These elements may be present in your reads but fail to build out into transcripts in Trinity.  It would make sense for parts of reads that belong to isoforms or copies of genes that were not assembled to align to the features that were assembled but need some clipping.  I'd also expect that Tinitiy doesn't nail the full extents of the 3' and 5' ends of transcripts so you're likely losing data there when you align - data that would align when soft-clipping is allowed (bwa mem and bowtie2 --local do this).

One thing I'd warn of is when you use these "local" alignment modes you're giving the aligner a lot of flexibility in producing false alignments.  For example I've been interested in benchmarking the parsing of mixed-species RNA-Seq data so as a test I might take some reads from a Human sample and align them to a Rat sample.  With an aligner like bwa mem I can sometimes get > 80% of the human data to align to the rat but if I disable gaps that percentage drops severely.  Once I'm down to no gaps and no clipping (by the way mem can't do this so I use bowtie2 at this point) the mapping percentage is often only about 1% or less meaning the reads aligning are those that are mapping to genes shared between the species which is expected.  So while my example is not really related to what you're doing the point is you should be a little suspicious of those "local" alignments especially when non-local alignments produce a significantly smaller percentage of successful mappings.

It might be interesting to do a couple additional subsets of your reads and re-build Trinity assemblies.  This would at least clue you into whether or not you're initial set is really representative of the data as a whole and how much variance there may be in your reads.  It might also be interesting to force Trinity to use longer 'mers'.  Maybe that would cut down on its tendency to generate sequences that are so divergent from your data.
 send an email to rsem-users+...@googlegroups.com.
  To post to this group, send email to rsem-...@googlegroups.com.
  Visit this group at http://groups.google.com/group/rsem-users [4] [5]
 [2] http://deweylab.biostat.wisc.edu/rsem/ [3]
 [3] http://groups.google.com/group/rsem-users [4]

 --
 RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]

 --- You received this message because you are subscribed to a topic
in the Google Groups "RSEM Users" group.
 To unsubscribe from this topic, visit
https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe
[6].

 To unsubscribe from this group and all its topics, send an email to
rsem-users+...@googlegroups.com.
 To post to this group, send email to rsem-...@googlegroups.com.
 Visit this group at http://groups.google.com/group/rsem-users [4].

 --
 RSEM website: http://deweylab.biostat.wisc.edu/rsem/ [3]

 ---
 You received this message because you are subscribed to the Google
Groups "RSEM Users" group.
 To unsubscribe from this group and stop receiving emails from it,
send an email to rsem-users+...@googlegroups.com.
 To post to this group, send email to rsem-...@googlegroups.com.

--
RSEM website: http://deweylab.biostat.wisc.edu/rsem/
--- You received this message because you are subscribed to a topic in the Google Groups "RSEM Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rsem-users/l9vzo4HGSeQ/unsubscribe.
Message has been deleted
Message has been deleted

Shawn Driscoll

unread,
Feb 14, 2014, 5:27:05 PM2/14/14
to rsem-...@googlegroups.com
Liz,

It may even be useful to cut your /1 reads in 1/2 and align a SE 50 version of the reads.  You can do that with bowtie2's -3 option to trim from the 3' end.


On Fri, Feb 14, 2014 at 1:24 PM, liz wright <lzwr...@gmail.com> wrote:
yes, I hope this was not a really dumb mistake on my part when it comes to read mapping.  I have 215399 contigs for the whole assembly (min contig size 60), 75503 contigs with cutoff 300 bp and 46930 contigs with cutoff 500 bp.


On Fri, Feb 14, 2014 at 4:04 PM, liz wright <lzwr...@gmail.com> wrote:
One other thought.... when I do my assembly I change the min-contig size to 60bp.  I know this is very short, but this is venom gland tissue and some of the peptides are themselves very short... mature alpha toxins that target nicotinic receptors can be 20AA / 60 bp in length (though the genetic precursor which includes at least a signal sequence is typically at least 100 bp and generally quite a bit longer, though still peptidic).  In any event I suppose it would be rather hard for 100 bp paired end reads to align effectively to short contigs.  Any insight regarding what is the shortest contig length that will allow paired reads to align?  I believe my frag insert size is ~350 bp (from bioanalyzer trace) so can't imagine PE reads aligning effectively to anything shorter than 500 (give or take) bp contigs?  This also may make an argument, if I want to see what is going on with shorter contigs, for aligning just the /1 files.



--
bf8126ddd9d91a5812a21c4b15b6f22c

liz wright

unread,
Feb 14, 2014, 6:35:29 PM2/14/14
to <rsem-users@googlegroups.com>
I'll try it all.  but one of these days I have to quit read mapping and write this damn paper ;)

liz wright

unread,
Feb 14, 2014, 6:47:49 PM2/14/14
to <rsem-users@googlegroups.com>
I've quality checked, adapter trimmed, read trimmed,, assembled with multiple assemblers, read mapped with multiple read mappers, assessed fpkms, blasted against every database known to man, assigned gene ontology terms, kegg terms, graphed whatever I can, have done everything but run it through the laundry machine.  perhaps it's time to move on :)

Shawn Driscoll

unread,
Feb 14, 2014, 7:31:59 PM2/14/14
to rsem-...@googlegroups.com
Yeah.  I think you get a rise out of us on this subject because many of us like to tinker with this kind of thing independently of actual application to a project. I'm a bioinformatician for a lab so I tinker with this kind of thing for researchers so they don't have to and I don't have to write papers - just methods sections.  My gut feeling is that if the reads you used to make assemblies in Trinity don't align very well to those assemblies then you need a better assembly.  Perhaps that 1TB server is your answer in that regard.  The other possibility is there's lots of noise in your data and your assembly is fine.  Plus when you're talking about 30% of your data - how many reads is that? If you're successfully aligning over 10 or 20 million reads per sample to your assembly (and finding expression of a significant percentage of sequences that reads can align to based on the sequence lengths) then you're already getting robust expression information - enough to produce reliable DE results.

liz wright

unread,
Feb 14, 2014, 8:18:26 PM2/14/14
to <rsem-users@googlegroups.com>
well it's a pretty fascinating thing to tinker with and I do want to have confidence that I have a decent assembly rather than just throwing something out there and saying "this is it!" Since I am working with 289 mil reads, quite in excess of what is really required for your average transcriptome assembly, 30% is, well, you do the math, but plenty really.  and if I actually do the mapping with the digitally normalized reads I used for the assembly the the percentage is higher. also, the good news here is I have identified a respectable number of putative toxins through blasting alone, not counting some incipient results from a in house regex based program that looks for canonical toxin features.  I also have significant level of blast hits from more generalized GO investigations against NR etc.  All that said, I will be interested to see how Trinity assembly of full set of reads performs.

Shawn Driscoll

unread,
Feb 14, 2014, 8:38:36 PM2/14/14
to rsem-...@googlegroups.com
Yeah, I think you're gonna be alright. :)

Srijani Sridhar

unread,
Jun 26, 2014, 10:52:41 AM6/26/14
to rsem-...@googlegroups.com
Hey Brian,

I was wondering if I could make RSEM use bowtie2 instead of bowtie and here's what I did:
I simply renamed all my bowtie2 files to look like bowtie files.
This seems to work.
Please tell me if you think this is a crazy thing to do.

Regards,
Srijani
To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+...@googlegroups.com.

To post to this group, send email to rsem-...@googlegroups.com.
Visit this group at http://groups.google.com/group/rsem-users.

Brian Haas

unread,
Jun 26, 2014, 11:23:44 AM6/26/14
to rsem-...@googlegroups.com
Hi Srijani,

When running it with bowtie2, the parameters are much different, so it will involve more than just changing the name of the program.

Best thing is to run the latest rsem directly with the bowtie2 option.  If you want to run the Trinity wrapper around it, there's corresponding bowtie2 and rsem parameters to use.

best,

~brian

Reply all
Reply to author
Forward
0 new messages