The question asks us how to implement invariant normalization, but don't
we need PRD thresholds to decide whether to keep a data point in
the invariant set or discard it? The thresholds are not given in the
homework, however. According to the methods section of Wing Wong's paper,
there are different thresholds for a small PRD depending on the rank of the
average intensity of a point.
Additionally, when the homework says "at least 50 points for each
segment", does that mean there are 50 arrays/experiments for each
segment, i.e. the full microarray is 1000x50 for 1000 genes and 50
experiments?
I would also appreciate clarification on the previous questions.
Thanks,
John
> The question asks us how to implement invariant normalization but don't we need PRD thresholds to determine whether to keep the data point in the invariant set or to discard it? The thresholds are not given in the homework however. According to Wing Wong paper's methods section, there are different thresholds for the small PRD depending on the rank of the average intensity of a point.
You need to determine the thresholds. 1.3 is asking you to describe the method for finding the PRDs. We do not give you the thresholds, but we give you the parameters to find them, as your question below addresses.
> Additionally, when the homework says "at least 50 points for each segment"
This means 50 genes for each segment of 1000 genes.
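To make the "50 genes for each segment of 1000 genes" reading concrete, here is a minimal Python sketch. It is only one possible interpretation, not the official solution: the function name `select_per_segment` is made up for illustration, and it assumes the genes are ordered by some key (for example the Array1 value) and that a rank difference per gene has already been computed.

```python
import numpy as np

def select_per_segment(order_key, rank_diff, segment_size=1000, per_segment=50):
    """Return indices of the genes kept as the invariant set.

    order_key : value used to order the genes (assumption: e.g. Array1 intensity)
    rank_diff : rank difference of each gene between the two arrays
    """
    order_key = np.asarray(order_key)
    rank_diff = np.asarray(rank_diff)
    # Order the genes by the key, then walk through them in blocks of segment_size.
    order = np.argsort(order_key, kind="stable")
    keep = []
    for start in range(0, len(order), segment_size):
        seg = order[start:start + segment_size]
        # Within the segment, keep the per_segment genes with the smallest rank difference.
        # The last segment may be shorter than segment_size; it keeps at most per_segment genes.
        best = seg[np.argsort(rank_diff[seg], kind="stable")[:per_segment]]
        keep.extend(best.tolist())
    return np.array(sorted(keep))
```

With the homework parameters this would be called as `select_per_segment(array1, rank_diff, 1000, 50)`, where `array1` and `rank_diff` are placeholder names for your own data.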
For the actual data, I ranked the first and second columns, then picked
the first 1000 genes from each ranked column. The two sets have
125 genes in common, from which I chose the 50 with the smallest PRDs.
What if one segment of 1000 genes has fewer than 50 genes in common?
Lu
Hmm, I think your definition may be off. If you search for 'prd' in the paper, you should find that it is 'the absolute rank difference...', that is, the difference between the ranks of the same gene in the two arrays.
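As a rough illustration of that definition, the rank difference for every gene can be computed along these lines. This is a sketch under assumptions: the name `abs_rank_diff` is hypothetical, ranks are 0-based, and ties are broken by position. If the paper's full definition scales the difference (for example by the number of genes), apply that scaling to the result.

```python
import numpy as np

def abs_rank_diff(array1, array2):
    """Absolute difference between each gene's rank in array1 and its rank in array2."""
    array1 = np.asarray(array1)
    array2 = np.asarray(array2)
    # argsort of argsort gives each element's 0-based rank within its own array.
    rank1 = np.argsort(np.argsort(array1, kind="stable"), kind="stable")
    rank2 = np.argsort(np.argsort(array2, kind="stable"), kind="stable")
    # Same index = same gene, so this compares the two ranks of one gene.
    return np.abs(rank1 - rank2)
```

Note that both arrays are ranked over all genes, and the comparison is made gene by gene, not by intersecting the top-1000 lists.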
On Tue, Mar 15, 2011 at 2:11 PM, Aparna Kumar <apar...@andrew.cmu.edu> wrote:
> Take away line from office hours discussion - plotting for 1.3 -
> You should create a plot of the values Array2 vs the values of Array1.
> Array1 will be the sorted gene values that were used to compute the prds.
PRDs are computed from the sorted gene values in Array1? But in the
paper and the HW, PRDs are found from the rankings in both Array1 and
Array2, if I understood correctly...
> Array2 should be in the corresponding label order as Array1. You calculate
> a regression of the invariant genes for every segment. Make sure your curve
> is continuous and monotonically non-decreasing.
So we need 2 figures here, one for Array1 vs. Array2 and another for
the segmented curve, right?
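One way to get a curve that is both continuous and monotonically non-decreasing is sketched below. This is an assumption-laden stand-in, not necessarily the regression intended in office hours: it uses one knot per segment (the median Array1 and median Array2 value of that segment's invariant genes), forces the knot heights to be non-decreasing, and joins the knots with straight lines. The names `fit_monotone_curve` and `apply_curve` are made up for illustration.

```python
import numpy as np

def fit_monotone_curve(x_inv, y_inv, n_segments):
    """Return knots (knot_x, knot_y) of a piecewise-linear, non-decreasing curve.

    x_inv, y_inv : Array1 and Array2 values of the invariant genes
    n_segments   : number of segments (assumed <= number of invariant genes)
    """
    x_inv = np.asarray(x_inv, dtype=float)
    y_inv = np.asarray(y_inv, dtype=float)
    # Sort the invariant genes by their Array1 value, keeping Array2 in the same gene order.
    order = np.argsort(x_inv, kind="stable")
    xs, ys = x_inv[order], y_inv[order]
    # One knot per segment: the median x and median y of the genes in that segment.
    knot_x = np.array([np.median(c) for c in np.array_split(xs, n_segments)])
    knot_y = np.array([np.median(c) for c in np.array_split(ys, n_segments)])
    # Running maximum makes the knot heights non-decreasing.
    knot_y = np.maximum.accumulate(knot_y)
    return knot_x, knot_y

def apply_curve(x, knot_x, knot_y):
    # Linear interpolation between knots is continuous; outside the knot range
    # np.interp holds the end values constant, so the curve stays non-decreasing.
    return np.interp(x, knot_x, knot_y)
```

The fitted curve can then be drawn on top of the Array2 vs. Array1 scatter, so a single figure may be enough if both are shown together.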