After reading some posts here and on the blog, I still have not found
an answer to some questions about expression thresholds.
In one of the files available on the FTP site, I found averaged,
normalized values from the microarrays. Assuming a gene A with just one
probe set available, how do I determine, for example, in how many
tissues gene A is "detected"?
This is an important issue, because by doing so for all the
represented genes, one can get the distribution of how many genes are
expressed per tissue and in how many tissues each gene is expressed.
An anonymous user posted a somewhat similar question about thresholds,
but I got no conclusive answer from that thread.
Thanks in advance.
Unfortunately, there is no conclusive and universal answer to this question. Every probe set has its own background signal and hybridization properties, so global thresholds for "expression" aren't very useful/reliable. Our solution is to just present the quantitative data as the chip reports them, and we leave it up to the user to assign thresholds based on their own view and their tolerance for false positives and negatives.
Hope that helps...
Cheers,
-andrew
________________________________________
From: Thiago M. Venancio [thiago....@gmail.com]
Sent: Friday, March 05, 2010 1:20 PM
To: BioGPS
Subject: gene expression dataset
Thanks for the reply.
Sorry if I am being insistent on this matter. In the PNAS 2004 paper
describing the data set, there are some results regarding the "number
of genes expressed per tissue" and "number of genes expressed in all
tissues". The second can be used, for example, as a proxy to find
housekeeping genes. How did you get these results without setting an
expression threshold to determine whether the gene was detected in the
first place?
Regards,
Thiago
On Mar 10, 3:05 pm, Andrew Su <A...@gnf.org> wrote:
> Hello,
>
> Unfortunately, there is no conclusive and universal answer to this question. Every probe set has its own background signal and hybridization properties, so global thresholds for "expression" aren't very useful/reliable. Our solution is to just present the quantitative data as the chip reports them, and we leave it up to the user to assign thresholds based on their own view and their tolerance for false positives and negatives.
You're right, in our global analysis we did use exactly the type of threshold you're talking about. And I think that was valid for the type of general conclusions we were trying to draw. But I wouldn't want to extrapolate much beyond that to the point of making strong assertions that any given gene was expressed or not expressed in any given tissue. So I think the bottom line is still that you should choose your threshold (and decide whether a global threshold is appropriate) based on your particular application.
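To make the bookkeeping concrete, here is a minimal Python sketch of how such counts could be computed once a threshold has been chosen. It is not the analysis from the paper: the probe set names, tissue names, signal values, and the cutoff of 200 are all invented for illustration, and the helper functions are hypothetical. The threshold can be passed either as a single global number or as a dict with a separate cutoff per probe set.

```python
# Toy expression matrix: probe set -> {tissue: averaged, normalized signal}.
# All names and numbers here are invented for illustration, not real data.
expression = {
    "probe_A": {"liver": 850.0, "brain": 120.0, "heart": 300.0},
    "probe_B": {"liver": 40.0, "brain": 35.0, "heart": 60.0},
    "probe_C": {"liver": 500.0, "brain": 700.0, "heart": 450.0},
}


def detected_calls(expression, threshold):
    """Return probe set -> set of tissues whose signal is >= the cutoff.

    `threshold` is either one number (global cutoff) or a dict that gives
    a separate cutoff for each probe set.
    """
    calls = {}
    for probe, signals in expression.items():
        cutoff = threshold[probe] if isinstance(threshold, dict) else threshold
        calls[probe] = {t for t, v in signals.items() if v >= cutoff}
    return calls


def tissues_per_gene(calls):
    """Number of tissues in which each probe set is called detected."""
    return {probe: len(tissues) for probe, tissues in calls.items()}


def genes_per_tissue(calls):
    """Number of probe sets called detected in each tissue.

    Tissues with no detected probe set are simply absent from the result.
    """
    counts = {}
    for tissues in calls.values():
        for tissue in tissues:
            counts[tissue] = counts.get(tissue, 0) + 1
    return counts


def detected_everywhere(calls, expression):
    """Probe sets detected in every tissue they were measured in."""
    return sorted(p for p, ts in calls.items() if len(ts) == len(expression[p]))


GLOBAL_CUTOFF = 200.0  # arbitrary example value, not a recommendation
calls = detected_calls(expression, GLOBAL_CUTOFF)
print(tissues_per_gene(calls))
print(genes_per_tissue(calls))
print(detected_everywhere(calls, expression))
```

Rerunning this with a few different cutoffs (or with per-probe-set cutoffs) is a quick way to see how sensitive the "expressed in all tissues" list is to the choice of threshold.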
Cheers,
-andrew