Centered log2 (fpkm+1)

Suejin Park

May 6, 2015, 8:44:32 PM
to trinityrn...@googlegroups.com
Hi everyone,
I have some questions about cluster plots.
After running 'define_clusters_by_cutting_tree.pl', I get cluster plots. My question is about the Y-axis: why is it labeled 'centered log2(fpkm+1)'? I understand the values are log2-transformed, but I don't understand why they are centered or why it is (fpkm+1).

My questions are as follows:
1) Centered?
According to the Trinity manual, each transcript's expression values are centered by the median value. The median of what? How is it calculated?

2) fpkm+1
Why not use fpkm alone? Why is 1 added to fpkm?

3) In the cluster plots, there are gray lines and a blue line. I think the gray lines show each transcript's pattern, but what is the blue line? Is it the average or the median?

Thank you,

Suejin

Brian Haas

May 7, 2015, 3:31:23 AM
to Suejin Park, trinityrn...@googlegroups.com
Hi,

Responses below:

On Thu, May 7, 2015 at 10:44 AM, Suejin Park <sjpar...@gmail.com> wrote:
1) Centered?
According to the Trinity manual, each transcript's expression values are centered by the median value. The median of what? How is it calculated?

Each gene's set of expression values (in log2(fpkm+1)) is centered by subtracting the median. This lets genes be analyzed according to their relative expression across the different samples, rather than taking into account the relative intensity within each sample.
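
To make the centering step concrete, here is a minimal Python sketch of the operation as described in this reply. It is an illustration only, not Trinity's own code; the FPKM values and the helper name `median_center` are made up for the example.

```python
import math
import statistics

def median_center(fpkm_values):
    """Log2(x+1)-transform one gene's values, then subtract that gene's median."""
    logged = [math.log2(v + 1) for v in fpkm_values]
    med = statistics.median(logged)
    return [x - med for x in logged]

# Hypothetical FPKM values for one gene across four samples.
gene_fpkm = [0.0, 3.0, 7.0, 15.0]

# log2(fpkm+1) gives [0, 2, 3, 4]; the median of those is 2.5.
centered = median_center(gene_fpkm)
print(centered)  # [-2.5, -0.5, 0.5, 1.5]
```

After centering, the gene's own median sits at 0, so positive values mark samples where the gene is expressed above its median and negative values mark samples below it.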
 

2) fpkm+1
Why not use fpkm alone? Why is 1 added to fpkm?

This is because some fpkm values are zero, and log(0) is undefined. Adding 1 first avoids that, and log2(fpkm+1) is a rather common transformation.
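
A short Python sketch of the point about zeros, for illustration only (the helper name `log2_plus_one` is hypothetical). In Python, `math.log2(0)` raises an error; R would return `-Inf` instead, which is equally unusable for plotting or clustering.

```python
import math

def log2_plus_one(fpkm):
    """Return log2(fpkm + 1); defined for every fpkm >= 0."""
    return math.log2(fpkm + 1)

try:
    math.log2(0)  # raises ValueError: the log of zero is undefined
    zero_is_loggable = True
except ValueError:
    zero_is_loggable = False

print(zero_is_loggable)     # False
print(log2_plus_one(0))     # 0.0
print(log2_plus_one(1023))  # 10.0
```

With the added 1, an unexpressed transcript (fpkm of 0) maps to exactly 0 on the log scale, and larger values behave almost like plain log2.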
 

3) In the cluster plots, there are gray lines and a blue line. I think the gray lines show each transcript's pattern, but what is the blue line? Is it the average or the median?


The blue line is the average across all genes in that cluster.
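
As a rough illustration of that average, the sketch below computes a per-sample mean over the genes of one cluster. It is not the plotting code Trinity uses; the matrix values and the function name `cluster_mean_profile` are assumptions for the example.

```python
from statistics import mean

def cluster_mean_profile(cluster):
    """Average the genes' centered values sample by sample."""
    # zip(*cluster) regroups the rows (genes) into columns (samples).
    return [mean(sample_values) for sample_values in zip(*cluster)]

# Hypothetical centered log2(fpkm+1) values: 3 genes (rows) x 4 samples (columns).
cluster = [
    [-1.0, 0.0, 0.5, 1.5],
    [-2.0, -0.5, 0.5, 2.0],
    [-1.5, 0.5, 0.5, 1.0],
]

blue_line = cluster_mean_profile(cluster)
print(blue_line)  # [-1.5, 0.0, 0.5, 1.5]
```

Each row would correspond to one gray line in the plot, and `blue_line` to the single summary line drawn over them.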

 

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Mauricio Losilla

Jul 21, 2018, 2:21:48 PM
to trinityrnaseq-users

I was just looking at this and have two questions/observations (these apply to Trinity v2.6.6):

1) Brian's reply and Trinity's manual say expression values are median-centered, but I am under the impression that they are actually mean-centered. The R script diffExpr.P0.001_C2.matrix.R reads:

______
# Centering rows
data = t(scale(t(data), scale=F))
______

And the R documentation for scale {base} says:
____
default: center = TRUE
If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns
____


If Trinity is indeed performing mean-centering, which is better: mean or median?
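
For comparison, here is a small Python sketch of row-wise mean-centering next to median-centering. It uses made-up numbers and a hypothetical helper `center_rows`; it is not the Trinity script. Because the R call transposes the matrix before `scale` and transposes it back afterwards, the columns being centered are the original rows, so each gene is centered on its own mean.

```python
from statistics import mean, median

def center_rows(matrix, stat):
    """Subtract stat(row) from every value in each row (one row per gene)."""
    centered = []
    for row in matrix:
        center = stat(row)
        centered.append([x - center for x in row])
    return centered

# Hypothetical log2-scale values: 2 genes (rows) x 5 samples (columns).
# The second gene has one outlying sample.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 1.0, 1.0, 1.0, 11.0],
]

mean_centered = center_rows(data, mean)      # analogous to t(scale(t(data), scale=F))
median_centered = center_rows(data, median)

print(mean_centered[1])    # [-2.0, -2.0, -2.0, -2.0, 8.0]
print(median_centered[1])  # [0.0, 0.0, 0.0, 0.0, 10.0]
```

For the symmetric first row the two approaches agree. For the second row, the mean is pulled toward the outlying sample while the median is not. Which behavior is preferable depends on the data, and this sketch does not settle that.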


2) Expression values. Trinity's documentation, output plots, and filenames say that the pre-transformed expression values are fpkm, but I think they are TMM values. This is a minor thing, of course.


Cheers,
Mau





