strand specificity for paired-end read alignments


Oleg Moskvin

Mar 19, 2013, 11:55:07 AM
to rsem-...@googlegroups.com
Hi Bo and Colin,

I've noticed that when the --strand-specific and --paired-end options are used with rsem-calculate-expression, the command RSEM sends internally to Bowtie contains the --norc Bowtie option.

According to the manual, that is exactly how it is supposed to work. And the purpose of the --strand-specific option, according to the manual, is to handle cases where "the RNA-Seq protocol used to generate the reads is strand specific", so analyzing strand-specific paired-end raw data with RSEM implies using the combination of those options, right?

However, using --norc with paired-end data (in the typical case where the two reads of a pair come from opposite strands) results in a dramatic drop in read mappability (6.8-fold in my case, from 49.9% to 7.3%). My understanding is that the option is applied to both reads in each pair, which conflicts with the common (for Illumina GA) arrangement in which the first read maps to the forward strand and the second to the reverse strand. I've found that using the --fr Bowtie option for this kind of data (instead of --norc) restores read mappability.

What do you think about introducing special handling of paired-endedness and strand specificity in rsem-calculate-expression?

This observation was made with RSEM-1.2.3.

Thanks,

Oleg

Colin Dewey

Mar 19, 2013, 12:55:58 PM
to rsem-...@googlegroups.com
Hi Oleg,

Apologies: the interface for specifying these cases is perhaps not as clean and straightforward as it should be. Currently, --strand-specific means that the first mate (or the single-end read) must map to the *forward* strand. If you wish to use data for which the first mate must map to the *reverse* strand, you can specify this with the option:

--forward-prob 0

This modifies the orientation parameter of the RSEM model and forces the use of the --nofw option to Bowtie.  If you specify the --forward-prob option, you should not also give the --strand-specific option.  --strand-specific is equivalent to --forward-prob 1.
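The behavior described above can be summarized as a small mapping from the orientation parameter to a Bowtie strand flag. This is a minimal sketch assuming only what is stated in this message (--forward-prob 1, i.e. --strand-specific, restricts alignments with --norc; --forward-prob 0 uses --nofw). The function name `bowtie_strand_flag` is hypothetical, and treating every other value as "no strand restriction" is an assumption, not RSEM's actual code.

```python
from typing import Optional


def bowtie_strand_flag(forward_prob: float) -> Optional[str]:
    """Hypothetical helper: map RSEM's --forward-prob to a Bowtie strand flag.

    Assumption based on this thread: 1.0 (same as --strand-specific) means
    the first mate must align to the forward strand, so alignments to the
    reverse-complement strand are disabled with --norc; 0.0 means the first
    mate must align to the reverse strand, so --nofw is used. Any other
    value is assumed to leave both strands enabled.
    """
    if not 0.0 <= forward_prob <= 1.0:
        raise ValueError("forward_prob must be a probability in [0, 1]")
    if forward_prob == 1.0:
        return "--norc"
    if forward_prob == 0.0:
        return "--nofw"
    return None  # assumed: non-strand-specific, no flag added


print(bowtie_strand_flag(1.0))  # --norc
print(bowtie_strand_flag(0.0))  # --nofw
```

This also shows why --strand-specific and --forward-prob should not be combined: both set the same single parameter.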

Best,
Colin

--
You received this message because you are subscribed to the Google Groups "RSEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Oleg Moskvin

Mar 20, 2013, 9:05:42 AM
to rsem-...@googlegroups.com
Hi Colin,

Thank you for the crystal clear explanation! 

Best,

Oleg

Brian Haas

Mar 20, 2013, 12:04:38 PM
to rsem-...@googlegroups.com
Hi Colin,

I'm wondering if you'd consider adopting the Trinity --SS_lib_type (R, F, RF, FR) designations for strand specificity... I'm trying to convince the Tuxedo group to do this as well.

--forward-prob 0 would correspond to our RF designation for paired-end reads, which is what corresponds to the dUTP approach to strand-specific library preparation.

best,

-brian

--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas

 

Colin Dewey

Mar 20, 2013, 2:38:55 PM
to rsem-...@googlegroups.com
Hi Brian,

We'll definitely consider this for the next minor release of RSEM.  And it would be good to agree on a standard with the Tuxedo group as well; keep us posted on that.  Currently, the --rf option to Bowtie is quite different from Trinity's RF library type.

Also, any suggestions on the options we use for this would be welcome.  Currently, we have the --paired-end and --strand-specific options, and an easy addition would be a new --library-type option that specifies the orientation for the strand-specific protocols (replacing the --forward-prob option).  It would be annoying to have to specify all three options at once though.  We could make it so that specifying --library-type automatically implies --strand-specific (since it does) and also implies --paired-end if FR or RF are specified.  In that case, we could remove the --strand-specific option and be left with the following possible configurations:

(no options): single-end, non-strand-specific data
--paired-end: paired-end, non-strand-specific data
--library-type F: single-end, forward-strand-specific
--library-type R: single-end, reverse-strand-specific
--library-type FR: paired-end, mate1-forward-strand, mate2-reverse-strand
--library-type RF: paired-end, mate2-forward-strand, mate1-reverse-strand
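The proposed configurations above can be written as a lookup table. This is a hypothetical sketch of the proposal, not an existing RSEM option: `LIBRARY_TYPES`, `Protocol` and `resolve` are illustrative names, and the forward-prob values follow from the earlier notes in this thread that --strand-specific equals --forward-prob 1 (first mate or single read on the forward strand) and that RF corresponds to --forward-prob 0. The 0.5 value for non-strand-specific data is an assumption.

```python
from typing import NamedTuple, Optional


class Protocol(NamedTuple):
    paired_end: bool
    # Probability that mate 1 (or the single read) maps to the forward strand.
    forward_prob: float


# Hypothetical encoding of the proposed --library-type values.
LIBRARY_TYPES = {
    "F": Protocol(paired_end=False, forward_prob=1.0),
    "R": Protocol(paired_end=False, forward_prob=0.0),
    "FR": Protocol(paired_end=True, forward_prob=1.0),
    "RF": Protocol(paired_end=True, forward_prob=0.0),
}


def resolve(library_type: Optional[str], paired_end: bool = False) -> Protocol:
    """Return the protocol implied by the proposed options.

    Without --library-type the data are non-strand-specific and the
    paired_end flag decides the layout (0.5 is an assumed "either strand"
    value). With --library-type, the type alone implies strand specificity
    and, for FR/RF, paired-end data, so paired_end is not consulted.
    """
    if library_type is None:
        return Protocol(paired_end=paired_end, forward_prob=0.5)
    return LIBRARY_TYPES[library_type]


print(resolve("RF"))  # Protocol(paired_end=True, forward_prob=0.0)
```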

In all cases, we require (as we do now) that the 5' end of a read is always at one end of the cDNA fragment.

It would be even more convenient if we could further merge the --paired-end and --library-type options into a single option, but then I am at a loss as to how to specify a paired-end, non-strand-specific protocol.

Best,
Colin

Brian Haas

Mar 20, 2013, 2:43:18 PM
to rsem-...@googlegroups.com
Hi Colin,

Your suggested scheme:

(no options): single-end, non-strand-specific data
--paired-end: paired-end, non-strand-specific data
--library-type F: single-end, forward-strand-specific
--library-type R: single-end, reverse-strand-specific
--library-type FR: paired-end, mate1-forward-strand, mate2-reverse-strand
--library-type RF: paired-end, mate2-forward-strand, mate1-reverse-strand

would be almost perfect, in my opinion.

I'd just suggest keeping --library-type and --paired-end separate.

In this case, --paired-end just indicates that... it's PE data.

If it's strand-specific, folks would be required to set the --library-type parameter.

Then, I'd just check that the --library-type setting matches the --paired-end setting, so that users can't, for example, specify 'F' for --library-type with --paired-end invoked. That's basically what we do in the Trinity wrapper.
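The consistency check suggested above could look roughly like this. It is a sketch only: `check_options`, `SINGLE_END_TYPES` and `PAIRED_END_TYPES` are hypothetical names, and the option names follow the proposal in this thread rather than any existing RSEM or Trinity code.

```python
from typing import Optional

# Hypothetical: which proposed library types are valid for each read layout.
SINGLE_END_TYPES = {"F", "R"}
PAIRED_END_TYPES = {"FR", "RF"}


def check_options(paired_end: bool, library_type: Optional[str]) -> None:
    """Raise ValueError if --library-type conflicts with --paired-end."""
    if library_type is None:
        return  # non-strand-specific data: either layout is allowed
    allowed = PAIRED_END_TYPES if paired_end else SINGLE_END_TYPES
    if library_type not in allowed:
        layout = "paired-end" if paired_end else "single-end"
        raise ValueError(
            f"--library-type {library_type} is not valid for {layout} data; "
            f"expected one of {sorted(allowed)}"
        )


check_options(paired_end=True, library_type="RF")  # passes silently
try:
    check_options(paired_end=True, library_type="F")  # the conflict described above
except ValueError as err:
    print(err)
```

Keeping the two options separate, as suggested, means a mismatch is reported as an error instead of silently overriding one option with the other.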

best,

-brian


Colin Dewey

Mar 20, 2013, 2:49:42 PM
to rsem-...@googlegroups.com
Hi Brian,

Just to clarify then, you are suggesting that only the following would be valid option sets:

(no options): single-end, non-strand-specific data
--paired-end: paired-end, non-strand-specific data
--library-type F: single-end, forward-strand-specific
--library-type R: single-end, reverse-strand-specific
--paired-end --library-type FR: paired-end, mate1-forward-strand, mate2-reverse-strand
--paired-end --library-type RF: paired-end, mate2-forward-strand, mate1-reverse-strand

So even though --library-type XX implies paired-end data, we would require that the --paired-end option be used as well just to double-check that the user knows what he/she is doing?

Thanks,
Colin

Brian Haas

Mar 20, 2013, 3:07:42 PM
to rsem-...@googlegroups.com
Yes, that's right. I'd just check to make sure there's no conflict, in the case where someone has strand-specific data and their library type specification didn't match up.

I prefer to think about these two options separately, even if the latter one contains information that could infer the former.

best,

-brian

Oleg Moskvin

unread,
Mar 21, 2013, 11:24:54 AM3/21/13
to rsem-...@googlegroups.com
Hi Brian, Colin,

I see danger here.

As Colin mentioned, RF and rf are two different things. We are actually dealing with two types of phenomena: 1) the sequencing-technology-dependent relative orientation of the reads in a pair and 2) the library-protocol-dependent strandedness of the first read. They need to be addressed separately.

Real-world example with Bowtie alignments of one library:

With no strand specificity and the default RSEM options for Bowtie (-n 2 -e 99999999 -l 25 -I 1 -X 1000 -m 200):
Bowtie --fr: 49.9% mapped
Bowtie --rf: 0.7% mapped
Bowtie --ff: 0.1% mapped

This tells us that the relative orientation of the reads is fr, as expected for Illumina reads. It has nothing to do with the particular strand specificity of the library.
Indeed, if we map the same library with --forward-prob 1, only 7.3% of reads are mapped, and with --forward-prob 0 we have 42.8% mapped (marginally less than with the non-strand-specific setting).

So, we have a) a directional library where the first read represents the reverse strand of the mRNA and b) a sequencing technology where the reads are in fr relative orientation.
I am new to all of this and just jumping in, so my two-day-old message (the original message of the thread) was in effect mixing up rf and RF; sorry about that.
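The reasoning above can be expressed as a rough heuristic: pick the Bowtie mate orientation with the highest mapping rate, then compare mapping rates under --forward-prob 1 and --forward-prob 0 to decide the strandedness of the first read. The sketch below uses only the percentages reported in this message; the function names and the 2-fold margin for calling a library stranded are illustrative assumptions, not an RSEM feature.

```python
from typing import Dict


def best_orientation(rates: Dict[str, float]) -> str:
    """Return the mate orientation ('fr', 'rf', 'ff') with the highest mapping rate."""
    return max(rates, key=lambda orient: rates[orient])


def infer_forward_prob(rate_fwd: float, rate_rev: float, margin: float = 2.0) -> float:
    """Guess --forward-prob from mapping rates (%) under --forward-prob 1 and 0.

    Assumed rule: if one setting maps more than `margin` times as many reads
    as the other, call the library stranded in that direction; otherwise
    treat it as non-strand-specific (0.5).
    """
    if rate_fwd > margin * rate_rev:
        return 1.0
    if rate_rev > margin * rate_fwd:
        return 0.0
    return 0.5


# Mapping rates reported above (percent of reads mapped).
orientation_rates = {"fr": 49.9, "rf": 0.7, "ff": 0.1}
print(best_orientation(orientation_rates))               # fr
print(infer_forward_prob(rate_fwd=7.3, rate_rev=42.8))   # 0.0
```

Both answers agree with the conclusion drawn here: fr relative orientation, with the first read on the reverse strand of the mRNA.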

Since we have two separate phenomena, it may be a good idea to address them separately in the RSEM options.

Having only two options for paired-end data:

"
--paired-end --library-type FR: paired-end, mate1-forward-strand, mate2-reverse-strand
--paired-end --library-type RF: paired-end, mate2-forward-strand, mate1-reverse-strand
"
actually locks us into Illumina paired-end libraries. For a brief illustration, see the message from 08-06-2010, 02:09 PM in http://seqanswers.com/forums/showthread.php?t=6317 .

How about leaving --forward-prob in place, because it is a nice case of one option representing one physical phenomenon (library strandedness, which also covers both SE and PE protocols), and introducing an extra option responsible for the relative orientation of the reads to cover other protocols/platforms?

So, the option you mentioned, for example:
--paired-end --library-type RF

would be

--paired-end --forward-prob 0 --orient fr for Illumina paired-end
--paired-end --forward-prob 0 --orient rf for Illumina mate pair
--paired-end --forward-prob 0 --orient ff for SOLiD
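Under this proposal, the expected transcript strand of each mate follows from the two options independently. A minimal sketch, assuming `--orient` is the hypothetical option proposed here, that only the strand-specific values 0 and 1 of --forward-prob are handled, and that (as with Bowtie's --fr/--rf/--ff) fr and rf place the mates on opposite strands while ff places them on the same strand. Upstream/downstream placement is deliberately not modelled.

```python
from typing import Tuple

FLIP = {"+": "-", "-": "+"}


def expected_strands(forward_prob: float, orient: str) -> Tuple[str, str]:
    """Expected transcript strand ('+' or '-') of (mate 1, mate 2).

    --forward-prob fixes the strand of mate 1; the hypothetical --orient
    option fixes the strand of mate 2 relative to mate 1.
    """
    if forward_prob == 1.0:
        mate1 = "+"
    elif forward_prob == 0.0:
        mate1 = "-"
    else:
        raise ValueError("only strand-specific forward_prob values (0 or 1) are handled")
    if orient in ("fr", "rf"):
        return mate1, FLIP[mate1]  # mates on opposite strands
    if orient == "ff":
        return mate1, mate1  # mates on the same strand
    raise ValueError(f"unknown orientation: {orient!r}")


for orient in ("fr", "rf", "ff"):
    print(orient, expected_strands(0.0, orient))
```

Note that fr and rf yield the same strand pair; they differ only in whether the mates point toward or away from each other, which is why the orientation still has to be passed to the aligner separately from the strandedness.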

What do you think?

Thanks!

Oleg

Brian Haas

Mar 21, 2013, 12:22:43 PM
to rsem-...@googlegroups.com
I see... I was entirely ignoring that some technologies can generate PE reads where the reads don't necessarily point towards each other.  ugh...

more later.

SinghA

Mar 13, 2014, 4:24:03 AM
to rsem-...@googlegroups.com
Hi Oleg,

How did you specify the Bowtie --fr option in RSEM, or did you run Bowtie independently? I believe I may be having the same problem as you.
Amrit

Bo Li

Mar 21, 2014, 3:32:21 PM
to rsem-...@googlegroups.com
Hi SinghA,

In RSEM, we always assume the first mate reads from the 5' end of a
fragment and the second mate reads from the 3' end.
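This convention can be illustrated by computing where each mate falls on a fragment. A minimal sketch, assuming 0-based half-open reference coordinates and a fixed read length; `mate_positions` is a hypothetical helper and ignores details such as clipping or adapters.

```python
from typing import Tuple

Interval = Tuple[int, int, str]  # (start, end, strand), 0-based half-open


def mate_positions(frag_start: int, frag_end: int, frag_strand: str,
                   read_len: int) -> Tuple[Interval, Interval]:
    """Place mate 1 at the fragment's 5' end and mate 2 at its 3' end.

    For a '+' fragment the 5' end is at frag_start; for a '-' fragment it
    is at frag_end. Mate 1 reads along the fragment's strand; mate 2 reads
    inward from the 3' end, i.e. on the opposite strand.
    """
    if frag_end - frag_start < read_len:
        raise ValueError("fragment shorter than read length")
    left = (frag_start, frag_start + read_len)
    right = (frag_end - read_len, frag_end)
    if frag_strand == "+":
        return (*left, "+"), (*right, "-")
    if frag_strand == "-":
        return (*right, "-"), (*left, "+")
    raise ValueError(f"unknown strand: {frag_strand!r}")


print(mate_positions(100, 400, "+", 50))  # ((100, 150, '+'), (350, 400, '-'))
```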

Best,
Bo