TMM normalized expression matrix

Chris Brauer

Aug 3, 2017, 10:57:21 PM

to trinityrnaseq-users

Hi Guys,

I am running a few analyses comparing my "raw" Trinity assembly with an assembly generated with an extra clustering step using Corset. I am trying to perform a TMM normalization on my Corset counts matrix equivalent to that performed by Trinity's abundance_estimates_to_matrix.pl. I had thought that run_TMM_normalization_write_FPKM_matrix.pl would do what I wanted, but when I tested it using my Trinity counts, the normalized expression values were only weakly correlated with the values generated by abundance_estimates_to_matrix.pl. What is the difference between the normalization methods used by abundance_estimates_to_matrix.pl and run_TMM_normalization_write_FPKM_matrix.pl? If the methods are not similar, is there a script to perform the TMM normalization on a raw counts matrix?

Thanks
Chris Brauer

Brian Haas

Aug 4, 2017, 8:47:52 AM

to Chris Brauer, trinityrnaseq-users

Hi Chris,

The abundance_estimates_to_matrix.pl script actually uses the 'util/support_scripts/run_TMM_scale_matrix.pl' script to perform TMM normalization on the TPM expression matrix. Here, the TPM values are extracted directly from the results of the estimation tool (i.e., RSEM, kallisto, or salmon).
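
For readers who want to see what TMM scaling of a TPM matrix amounts to, here is a minimal Python sketch. It is a simplified, unweighted illustration of the TMM idea and not the code in run_TMM_scale_matrix.pl; edgeR's full TMM also applies precision weights and chooses the reference sample automatically. The function names (tmm_factor, tmm_scale), the use of the first column as the reference, and the toy values are assumptions made for this example.

```python
import math

def tmm_factor(sample, ref, trim_m=0.3, trim_a=0.05):
    """Simplified, unweighted TMM factor of `sample` relative to `ref`.

    Assumption: a plain trimmed mean of log-ratios, without the
    precision weights that edgeR's implementation uses.
    """
    n_s, n_r = sum(sample), sum(ref)
    pairs = []  # (M, A) for features expressed in both samples
    for s, r in zip(sample, ref):
        if s > 0 and r > 0:
            ps, pr = s / n_s, r / n_r
            pairs.append((math.log2(ps / pr), 0.5 * math.log2(ps * pr)))
    n = len(pairs)
    k_m, k_a = int(n * trim_m), int(n * trim_a)
    # Trim the extremes of M (log-ratio) and of A (mean log-abundance).
    order_m = sorted(range(n), key=lambda i: pairs[i][0])
    order_a = sorted(range(n), key=lambda i: pairs[i][1])
    keep = set(order_m[k_m:n - k_m]) & set(order_a[k_a:n - k_a])
    if not keep:
        return 1.0  # nothing left after trimming: leave the sample unscaled
    mean_m = sum(pairs[i][0] for i in keep) / len(keep)
    return 2 ** mean_m

def tmm_scale(columns, ref_index=0):
    """Scale each sample column by its TMM factor.

    `columns` is a list of per-sample value lists (hypothetical layout).
    Using column `ref_index` as the reference is an assumption.
    """
    ref = columns[ref_index]
    factors = [tmm_factor(col, ref) for col in columns]
    # Rescale so the factors multiply to 1 (geometric mean of 1).
    gmean = math.exp(sum(math.log(f) for f in factors) / len(factors))
    factors = [f / gmean for f in factors]
    scaled = [[v / f for v in col] for col, f in zip(columns, factors)]
    return factors, scaled

# Toy TPM columns: sample_b has one dominant transcript that deflates
# the TPM of the nine otherwise unchanged transcripts.
sample_a = [100000.0] * 10
sample_b = [50000.0] * 9 + [550000.0]
factors, scaled = tmm_scale([sample_a, sample_b])
```

In the toy data the nine unchanged transcripts end up with the same scaled value in both samples, which is the composition bias TMM is meant to remove.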

The run_TMM_normalization_write_FPKM_matrix.pl script is what we used a while ago; it performs TMM normalization on the count data, then recomputes FPKM values based on the TMM-adjusted counts.
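
As a rough illustration of the count-based route, the sketch below recomputes FPKM from raw counts using an effective library size (library size times TMM factor). It assumes the standard FPKM definition and the edgeR-style effective library size; it is not taken from the Perl script itself. The function name, the `lengths` input (transcript lengths in bases), and the example numbers are hypothetical.

```python
def fpkm_from_counts(counts, lengths, norm_factor=1.0):
    """FPKM for one sample from raw counts and a TMM factor.

    Assumptions: `lengths` are transcript lengths in bases, and the
    effective library size is sum(counts) * norm_factor.
    """
    eff_lib_size = sum(counts) * norm_factor
    return [
        c * 1e9 / (length * eff_lib_size)  # fragments per kb per million
        for c, length in zip(counts, lengths)
    ]

# Hypothetical sample: two transcripts, the second twice as long
# and with twice as many fragments.
counts = [100, 200]
lengths = [1000, 2000]
unscaled = fpkm_from_counts(counts, lengths)
adjusted = fpkm_from_counts(counts, lengths, norm_factor=2.0)
```

A TMM factor above 1 enlarges the effective library size and therefore lowers every FPKM value for that sample by the same proportion.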

Since expression quant tools now regularly output FPKM and TPM values directly, it made more sense to use those values rather than to recompute them.

The values from either should be highly correlated even though they're distinct.
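
One way to check this on your own data is to compare the two matrices column by column. The sketch below computes a Pearson correlation on log2(x + 1) values and a Spearman rank correlation, since a Pearson correlation on the raw scale tends to be dominated by a handful of very highly expressed transcripts. The function names and the toy vectors are assumptions for illustration; in practice the inputs would be the same sample's column from each normalized matrix, in the same transcript order.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks, 1-based; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def compare(a, b):
    """Correlate two expression vectors on the log scale and by rank."""
    log_a = [math.log2(v + 1) for v in a]
    log_b = [math.log2(v + 1) for v in b]
    return {
        "pearson_log2": pearson(log_a, log_b),
        "spearman": pearson(ranks(a), ranks(b)),
    }

# Toy vectors standing in for one sample's values from each method.
method_1 = [0.0, 1.0, 10.0, 100.0, 1000.0]
method_2 = [0.0, 2.0, 20.0, 200.0, 2000.0]
result = compare(method_1, method_2)
```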

I see the value of using Corset to do the transcript clustering into 'gene' groupings, but I wouldn't necessarily endorse it for doing any quantification - I'd stick with kallisto, salmon, or rsem for that.

I'm happy to take a look at any data to help clarify things further.

best,

~brian

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

Chris Brauer

Aug 7, 2017, 10:27:33 PM

to trinityrnaseq-users, pygmy...@gmail.com

Thanks Brian!

The values from each method are highly correlated at low expression but gradually become less so for higher values, which, I guess, makes some sense.

"I see the value of using Corset to do the transcript clustering into 'gene' groupings, but I wouldn't necessarily endorse it for doing any quantification - I'd stick with kallisto, salmon, or rsem for that."

I agree with you here; we are just exploring the difference between Trinity and Corset 'genes' in response to a reviewer.

Thanks
Chris
