Samples file matrix input for voom DE analysis in Trinity

Rob W.

Sep 21, 2022, 3:11:37 PM
to trinityrnaseq-users
Hi Brian, 

Thanks for your continued and prompt assistance for all of us Trinity users.
I am trying to use voom for pairwise DE analysis. I have been following the wiki's instructions for properly formatting a tab-delimited --samples file, listing the condition (tab) replicate name:

cond_A   cond_A_rep1
cond_A   cond_A_rep2
cond_A   cond_A_rep3
cond_B   cond_B_rep1
cond_B   cond_B_rep2
cond_B   cond_B_rep3
cond_C   cond_C_rep1
cond_C   cond_C_rep2
cond_C   cond_C_rep3
cond_D   cond_D_rep1
cond_D   cond_D_rep2
cond_D   cond_D_rep3

My dataset has 4 conditions, and each condition has 3 biological replicates, for a total of 12 samples. I would like to create pairwise DE matrices by replicate, e.g. cond_C_rep2 vs. cond_A_rep1, etc.

When I run voom with the run_DE_analysis.pl script with the --method voom option, I get pairwise matrices by condition, not replicate. For example, I have outputs such as:

salmon.isoform.counts.matrix.cond_C_vs_cond_D.voom.count_matrix

It seems voom only wants to compare conditions, not replicates; however, when I run edgeR with the same --samples file, it yields a matrix for each pairwise combination, which is what I want. 

I am aware of the --contrasts option, but shouldn't voom automatically read my samples file if it's formatted properly? When I run the command with my samples file above and no --contrasts option, it correctly detects each replicate (as each rep is a header in my salmon isoform counts matrix input file) then lists the contrasts it will perform: conditions, not replicates. 
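To make the two readings of "pairwise" concrete, here is a minimal Python sketch. It is only an illustration and not the logic inside run_DE_analysis.pl: it groups the replicate names from the samples file above by condition, then counts the comparisons at each level. The names `SAMPLES_TEXT` and `parse_samples` are made up for this example.

```python
from collections import defaultdict
from itertools import combinations

# Samples file contents as listed above: condition <tab> replicate name.
SAMPLES_TEXT = "\n".join(
    f"cond_{c}\tcond_{c}_rep{r}" for c in "ABCD" for r in (1, 2, 3)
)

def parse_samples(text):
    """Group replicate names under their condition label (hypothetical helper)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        condition, replicate = line.split("\t")[:2]
        groups[condition].append(replicate)
    return dict(groups)

groups = parse_samples(SAMPLES_TEXT)

# Comparisons between conditions: 4 choose 2.
condition_pairs = list(combinations(sorted(groups), 2))

# Comparisons between individual replicates: 12 choose 2.
all_replicates = [rep for reps in groups.values() for rep in reps]
replicate_pairs = list(combinations(all_replicates, 2))

print(len(condition_pairs))  # 6
print(len(replicate_pairs))  # 66
```

With this samples file there are 6 condition-level comparisons, each backed by 3 replicates per side, versus 66 replicate-level comparisons with a single sample on each side.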

Please let me know if there's something I'm not understanding. Thank you!
Rob

Mark Chapman

Sep 21, 2022, 3:42:44 PM
to Rob W., trinityrnaseq-users
Hi Rob,
Can you just call them condA thru condL for the purposes of the all-by-all analysis? I.e. each one is a separate condition? 
Cheers, Mark 

Rob W.

Sep 21, 2022, 4:36:40 PM
to trinityrnaseq-users
Hi Mark,
Clever suggestion, but after trying that I received this error:

"Error, need multiple biological replicates for each sample in order to run voom at /storage/work/<user>/miniconda3/envs/bioinfo/bin/run_DE_analysis.pl line 578."

I received this error after running each replicate as its own condition (i.e., CondA - CondL). I also received it after running the samples file from my original post with a --contrasts file listing each pairwise combination of replicates (12 choose 2, 66 total pairs). After these failures, I wonder: what does voom consider conditions, and what does it consider replicates? I listed each combination as rep1 (tab) rep2, but perhaps the contrasts file only pertains to conditions. However, this goes back to my original obstacle: I don't want to compare conditions pairwise. Originally, voom was yielding matrices like this:

                          sampleA     sampleB      logFC              logCPM            PValue                FDR
TRINITY_DN16058_c0_g1_i4  0h_control  48h_control  -4.69554230791119  6.61979859251297  9.05506881654054e-08  0.00877481443666861
TRINITY_DN16058_c0_g1_i2  0h_control  48h_control  -4.48963223880393  10.4876650765091  9.61506912894022e-07  0.0208158278509136
TRINITY_DN6054_c3_g1_i3   0h_control  48h_control  -3.07217995384019  7.38191817735216  8.83919030348254e-07  0.0208158278509136


where "0h_control" and "48h_control" are two of my four conditions. 0h_control and 48h_control are not samples, but the headers sampleA and sampleB suggest otherwise. Is voom pooling all the expression across the three replicates per condition (here "sample") and comparing the pooled expression data? In the read count matrices from this same job, the matrices show read counts by replicate in the two conditions that were paired, for example:

                          DE8_0  DE9_0  DE12_0  DE8_48  DE9_48  DE12_48
TRINITY_DN2117_c0_g1_i27  61     33     50      71      0       74
TRINITY_DN41339_c1_g2_i4  87     69     117     75      50      93
TRINITY_DN1207_c1_g1_i10  23     0      326     108     327     480

Here, each "DE#_#" is a replicate, and the three reps from a single treatment are listed next to each other; one treatment group is red, the other is blue. Again, I can't get information about DE isoforms between individual replicates from this.

This is the crux of my problem: why might this Trinity script be detecting my replicates properly from the isoforms matrix yet pairing the conditions? Is this the normal way? If so, why does edgeR not do this?

Robert

Brian Haas

Sep 21, 2022, 8:29:22 PM
to Rob W., trinityrnaseq-users
Hi,

Yeah, the DE analysis needs to have bio replicates for each condition, otherwise it can't run the statistics as it won't know about biological variation. 
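The simplest way to see why a single sample per group is not enough: a sample variance cannot be computed from one observation. The Python sketch below is only an illustration of that point and does not reproduce the voom/limma model. It uses the counts for TRINITY_DN2117_c0_g1_i27 in the three 0h replicates from the matrix quoted earlier in the thread.

```python
import statistics

# Counts for TRINITY_DN2117_c0_g1_i27 in DE8_0, DE9_0, DE12_0 (from the thread).
three_reps = [61, 33, 50]

# With three replicates the spread within the condition can be estimated.
within_condition_variance = statistics.variance(three_reps)
print(within_condition_variance)

# With one replicate per "condition" there is nothing to estimate it from.
one_rep = [61]
try:
    statistics.variance(one_rep)
    single_rep_ok = True
except statistics.StatisticsError as err:
    single_rep_ok = False
    print("cannot estimate variance:", err)
```

Without an estimate of the variation within a condition, there is no baseline against which to judge whether a difference between two samples is larger than expected by chance.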

The condition-based pairwise comparisons are the way to go.
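If only some of the condition pairs are of interest, the --contrasts option mentioned earlier in the thread can limit the comparisons. Below is a minimal Python sketch for building such a list. It assumes the contrasts file is tab-delimited with one pair of condition names per line; that format, and the choice of cond_A as the baseline, are assumptions made for illustration, so check the run_DE_analysis.pl usage message before relying on them.

```python
# Condition labels from the samples file in the original post.
conditions = ["cond_A", "cond_B", "cond_C", "cond_D"]

# Hypothetical choice: compare every other condition against cond_A.
reference = "cond_A"

# One line per comparison: condition <tab> condition (assumed format).
contrast_lines = [
    f"{other}\t{reference}" for other in conditions if other != reference
]
contrasts_text = "\n".join(contrast_lines) + "\n"

print(contrasts_text, end="")
```

The printed text can be saved to a file and passed via --contrasts. Every name in it is a condition label, so each side of a comparison still has its three replicates.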


best,

~b



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 