Biological replicates?

624 views
Skip to first unread message

Bob Freeman

unread,
Jan 30, 2013, 3:23:02 PM1/30/13
to rsem-...@googlegroups.com
Sorry to be so naive on this, but I have two groups of samples that each have three biological replicates. For each group, how do I instruct RSEM to handle the replicates? Are replicates the comma-separated list of files specified after --paired-end or --single-end? Or should I run RSEM on each replicate individually, then use another program to analyze each output file together?

Again, my apologies, but I haven't seen anything in the online docs or man pages that I should explicitly state on the command line.

Tx,
Bob

Brian Haas

unread,
Jan 30, 2013, 3:28:01 PM1/30/13
to rsem-...@googlegroups.com, trinityrn...@lists.sf.net
Hi Bob,

I think you're supposed to get abundance estimates for each replicate separately.   Assuming you're doing this with Trinity data, I've got the attached script which will run RSEM for you by remapping the Trinity parameters to the RSEM parameters.  RSEM does all the work - alignments through abundance estimation. All the script does is remap the parameters.  Drop it in the TRINITY_HOME/util/RSEM_util/   folder.

I'm cross-posting to the Trinity list here too.

(this is probably all bad etiquette, so my apologies to the rsem folks in advance).

-b



--
You received this message because you are subscribed to the Google Groups "RSEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rsem-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas

 
run_RSEM_align_n_estimate.pl

Freeman, Robert M.

unread,
Jan 30, 2013, 3:31:32 PM1/30/13
to Brian Haas, rsem-...@googlegroups.com, trinityrn...@lists.sf.net
(chuckle) Thanks, Brian. You've probably guess -- I'm trying out the RSEM scripts directly since alignReads is going to be deprecated... stepping out of my comfort zone.

Yes, doing this on new Trinity data from a collaboration. I'll give this a try!

Cheers,
Bob

<run_RSEM_align_n_estimate.pl>------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_jan_______________________________________________
Trinityrnaseq-users mailing list
Trinityrn...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/trinityrnaseq-users

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA  02115

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace





Colin Dewey

unread,
Jan 30, 2013, 3:31:51 PM1/30/13
to rsem-...@googlegroups.com
Hi Bob,

You should run RSEM on each replicate separately and then use a differential expression package such as EBSeq (which is bundled with RSEM), DESeq, or edgeR on the output from all replicates.  This allows the differential expression methods to estimate the biological variance between replicates of the same condition, which then allows them to make more accurate predictions of the genes/transcripts that are differentially expressed.

Best,
Colin

Brian Haas

unread,
Jan 30, 2013, 3:39:18 PM1/30/13
to rsem-...@googlegroups.com, trinityrn...@lists.sf.net
The script I attached will be the new way to do it as per the upcoming Trinity release (couple weeks), and does not use the alignReads.pl script. You'll notice it's many times faster, and uses RSEM directly (which is what I should have done in the first place).

I haven't experimented with EBSeq yet, but I'm looking forward to supporting it as well.   The latest DE-supported analyses in Trinity include edgeR and DESeq, but I'll note that edgeR has *always* been very good to me, and DESeq almost always has caused me trouble (very obvious false negatives). So, pay special attention to your pairwise MA-plots to ensure you're not missing anything that should obviously be called as highly significant.  If you have many many bio replicates, DESeq might work just fine, but otherwise, examine the results closely (MA-plots, particularly).  Loving edgeR!:)  especially w/ bio replicates.

Brian Haas

unread,
Jan 30, 2013, 3:51:28 PM1/30/13
to rsem-...@googlegroups.com, trinityrn...@lists.sf.net
In case it's helpful, I'm collecting read count matrices from various publicly available data sets here:


so we can experiment with the various DE analysis methods and see how they compare.  The above data sets were culled from the edgeR manual, and are already formatted for easy integration with R.    

If anyone has any favorites to add, please let me know.   

(my last bad-etiquette double-post for the day).

Thanks!

-b

Ning Leng

unread,
Jan 30, 2013, 4:00:30 PM1/30/13
to rsem-...@googlegroups.com, trinityrn...@lists.sf.net
Hi Brian,

In our simulations we found that edgeR is sensitive to outliers. (e.g.
genes with expression 1, 2, 1 in condition 1 and 1, 1, 1000 in
condition2)
This kind of genes still look like significant DE in MA plots. But
actually they are EE with outlier.
On the other hand, in my understanding, current version of DESeq tried
to identify subset of genes with potential outlier before testing,
then perform more strict test on them.
Therefore, DESeq might miss the ones that are DE with outlier. And
some "good" genes might be misclassified as the group with potential
outlier.
I'm wondering that did you get a chance to look at the expressions of
the potential FN's in DESeq?

Thanks!
Ning
Ning Leng
University of Wisconsin Madison
Department of Statistics
4720 Medical Sciences Center
1300 University Avenue
Madison, Wisconsin 53706

Brian Haas

unread,
Jan 30, 2013, 4:25:50 PM1/30/13
to rsem-...@googlegroups.com, trinityrn...@lists.sf.net
Thanks, Ning!  Very good point!  I'll definitely look into it.  I need to figure out the best way to leverage DESeq, with replicates or without, and this has been an absolute struggle.

Attached is a run of DESeq on the 'classic' kidney/liver data w/o replicates, run like below.   Check out that black point in the upper right.  I would like to color it red. :)

library(DESeq)

data = read.table("/Users/bhaas/SVN/trinityrnaseq/misc_tests/test_DiffExpression/published_studies/Marioni_liver_vs_kidney/kidney_vs_liver.sum_tech_reps.matrix", header=T, row.names=1)
col_ordering = c(1,2)
rnaseqMatrix = data[,col_ordering]
rnaseqMatrix = round(rnaseqMatrix)
rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=10,]
conditions = factor(c(rep("kidney_sums", 1), rep("liver_sums", 1)))

exp_study = newCountDataSet(rnaseqMatrix, conditions)
exp_study = estimateSizeFactors(exp_study)
exp_study = estimateDispersions(exp_study, method="blind", sharingMode="fit-only")

res = nbinomTest(exp_study, "kidney_sums", "liver_sums")

 


Screen Shot 2013-01-30 at 4.22.13 PM.png

Bob Freeman

unread,
Jan 30, 2013, 5:45:27 PM1/30/13
to rsem-...@googlegroups.com
Thanks for confirming this, Colin.

I'd like to ask you a question about multi-mapped read assignment, but I'll ask on a separate thread.

(now it seems I have other messages to catch up on!)

Thanks much,
Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
51 Downer Avenue, #2
Dorchester, MA  02125

"Well, I'm thankful that I found a pharmacologist who is as dumb as a box of hair."

Freeman, Robert M.

unread,
Feb 7, 2013, 9:42:08 AM2/7/13
to Brian Haas, trinityrn...@lists.sf.net, rsem-...@googlegroups.com
Just wanted to let you know that this work perfectly in my hands.

Thanks!
B

On Jan 30, 2013, at 3:28 PM, Brian Haas wrote:

<run_RSEM_align_n_estimate.pl>

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA  02115

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."

Brian Haas

unread,
Feb 7, 2013, 9:45:41 AM2/7/13
to Freeman, Robert M., trinityrn...@lists.sf.net, rsem-...@googlegroups.com
Wonderful!   It'll look slightly different in the upcoming release:  more direct interface to DESeq, and auto-generation of MA and Volcano plots in pdf format.  Plus, heatmap and sample correlation plots in pdf format.

-b

Freeman, Robert M.

unread,
Feb 7, 2013, 9:46:51 AM2/7/13
to Brian Haas, trinityrn...@lists.sf.net, rsem-...@googlegroups.com
Nice! Those will be great additions indeed.

-b

Brian Haas

unread,
Feb 7, 2013, 9:50:23 AM2/7/13
to Freeman, Robert M., trinityrn...@lists.sf.net, rsem-...@googlegroups.com
Oh... I was thinking of the run_DE_analysis.pl.   The RSEM wrapper probably hasn't changed (much) from the latest one you've used.   I do like how it's now a direct interface to RSEM.    But, alignReads is unfortunately not going away to the extent that I had wanted it to.  We'll still be using it for read-based evaluation of assembly 'completeness'.

RSEM is really great! :)    (for the RSEM list).

-b

Kenya Bioinfo

unread,
Apr 2, 2013, 12:11:30 AM4/2/13
to rsem-...@googlegroups.com

b...@cs.wisc.edu

unread,
Apr 2, 2013, 6:47:59 AM4/2/13
to rsem-...@googlegroups.com
Hi Kenya,

If your purpose is to detect differentially expressed genes, you should
run RSEM on each replicate individually and then use a DE tool like EBSeq
or edgeR, DESeq.

Best,
Bo
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/
> ---
> You received this message because you are subscribed to the Google Groups
> "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users?hl=en-US.
>
>
>


Clayton K Collings

unread,
Apr 2, 2013, 4:57:35 PM4/2/13
to rsem-...@googlegroups.com
Don't forget to round your output values from RSEM for each gene
to the nearest integer before using DESeq or EdgeR.

Frédéric Burdet

unread,
Feb 10, 2015, 10:49:55 AM2/10/15
to rsem-...@googlegroups.com
Hello Colin,

Is there a way to use the confidence intervals by transcript generated by RSEM in EBSeq? Or does EBSeq not take them into account?

Thanks in advance,

Best,

Fred

Ning Leng

unread,
Feb 10, 2015, 11:08:36 AM2/10/15
to rsem-...@googlegroups.com
Hi Fred,
The current version of EBSeq doesn't account for CI from RSEM. Please
let me know if you have any further questions.
Best,
Ning
> --
> RSEM website: http://deweylab.biostat.wisc.edu/rsem/
> ---
> You received this message because you are subscribed to the Google Groups
> "RSEM Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to rsem-users+...@googlegroups.com.
> To post to this group, send email to rsem-...@googlegroups.com.
> Visit this group at http://groups.google.com/group/rsem-users.



--
Ning Leng, Ph.D
Computational Biologist / Biostatistician
Regenerative Biology Laboratory
Morgridge Institute for Research
Reply all
Reply to author
Forward
0 new messages