Re: HW4, Question 1


John Kang

Mar 9, 2011, 12:55:32 AM
to computational-ge...@googlegroups.com, aparnak...@gmail.com
Could we have clarification on Problem 1, part 3 (dChip normalization
method)?

The question asks us how to implement invariant normalization, but don't
we need PRD thresholds to decide whether to keep a data point in the
invariant set or discard it? The thresholds are not given in the
homework, however. According to the methods section of the Wing Wong
paper, there are different thresholds for a small PRD depending on the
rank of a point's average intensity.

Additionally, when the homework says "at least 50 points for each
segment", does that mean there are 50 arrays/experiments for each
segment, i.e. the full microarray is 1000x50 for 1000 genes and 50
experiments?

I would also appreciate clarification on the previous questions.

Thanks,
John

John Kang

Mar 11, 2011, 6:24:28 PM
to computational-ge...@googlegroups.com
Thanks for clarifying the previous questions, Aparna.

The one thing I'm still not clear on is how needing at least 50 points for each segment determines the thresholds. The thresholds in the Wing Wong paper came without a derivation and seemed arbitrary. It says in the paper: "we chose these parameters empirically to make selected points in the invariant set thin enough to naturally determine a normalization relation."


On 3/9/2011 3:33 PM, Aparna Kumar wrote:
The question asks us how to implement invariant normalization but don't we need PRD thresholds to determine whether to keep the data point in the invariant set or to discard it?  The thresholds are not given in the homework however.  According to Wing Wong paper's methods section, there are different thresholds for the small PRD depending on the rank of the average intensity of a point.

You need to determine the thresholds. Question 1.3 is asking you to describe the method for finding the PRDs. We do not give you the thresholds, but we give you the parameters to find them, as your question below addresses.

 
Additionally, when the homework says "at least 50 points for each segment"

This means 50 genes for each segment of 1000 genes.

 

 

John Kang

Mar 11, 2011, 6:28:06 PM
to computational-ge...@googlegroups.com
I just had a moment of inspiration: the 50 points refer to the last 50 points remaining in the rank-invariant set. I guess the details of how we get to those 50 points are up to us?

Aparna Kumar

Mar 14, 2011, 11:54:16 AM
to computational-ge...@googlegroups.com
"... to construct the piecewise linear normalization curve"

Think about 50 points for every 1000 genes.  

Aparna Kumar

Mar 14, 2011, 7:14:42 PM
to computational-ge...@googlegroups.com
Hi All, a few people told me they were still confused about this question.  Let me restate:

We are telling you to construct a piecewise curve. Each piece covers about 1000 genes, but it might end up covering more if the previous PRD is greater than the current one. You can determine the PRDs by calculating, for each gene, the absolute difference between its ranks in the two arrays, and then look at segments of 1000 genes. In each segment, rank the genes by PRD, draw the threshold at the 50th-smallest PRD, and use those 50 genes as the invariant set for that segment.
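A minimal pure-Python sketch of the selection step described above. It assumes the PRDs are computed once over the whole arrays and that the segments are consecutive blocks of genes in Array1's sorted order (Array1 acting as the baseline); the names `ranks` and `invariant_genes` and the parameters `segment_size` and `keep` are hypothetical, not part of the assignment.

```python
def ranks(values):
    """0-based rank of each entry (0 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def invariant_genes(array1, array2, segment_size=1000, keep=50):
    """Gene indices of the `keep` smallest-PRD genes in each segment.

    PRD = |rank in array1 - rank in array2|, computed once for all genes.
    Segments are consecutive blocks of `segment_size` genes taken in
    array1's sorted order (assumption: array1 is the baseline).
    """
    r1, r2 = ranks(array1), ranks(array2)
    prd = [abs(a - b) for a, b in zip(r1, r2)]
    order = sorted(range(len(array1)), key=lambda i: array1[i])
    selected = []
    for start in range(0, len(order), segment_size):
        segment = order[start:start + segment_size]
        # The keep-th smallest PRD acts as this segment's threshold.
        best = sorted(segment, key=lambda i: prd[i])[:keep]
        # Return each segment's invariant genes in array1 order.
        selected.append(sorted(best, key=lambda i: array1[i]))
    return selected
```

Because every gene has a PRD, each full-size segment always yields `keep` genes; only a shorter final segment can yield fewer.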

Does this clear things up?  

Aparna

Lu Xie

Mar 14, 2011, 9:23:03 PM
to computational-ge...@googlegroups.com, Aparna Kumar
Is it possible that, in one segment of 1000 genes, we cannot find 50 PRDs?

For the actual data, I ranked the first and second columns, then picked
the first 1000 genes from each ranked column. The two sets have 125
genes in common, from which I chose the 50 with the smallest distance.
What if a segment of 1000 genes has fewer than 50 genes in common?

Lu

Aparna Kumar

Mar 14, 2011, 9:56:21 PM
to Lu Xie, computational-ge...@googlegroups.com
Oh, and to clarify further: calculate the PRDs for the entire array, then look at the segments.



On Mon, Mar 14, 2011 at 9:52 PM, Aparna Kumar <aparnak...@gmail.com> wrote:
Hmm, I think your definition may be off. If you search for 'prd' in the paper, you should find that it is 'the absolute rank difference...', that is, the difference between the ranks of the same gene in the two arrays.
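Written out, the definition quoted above gives one PRD per gene g, where rank_1(g) and rank_2(g) are its ranks in Array1 and Array2. The four-gene values below are made up purely for illustration: Array1 = (5, 9, 2, 7) and Array2 = (40, 35, 10, 50), ranked from the smallest value (rank 1).

```latex
\mathrm{PRD}_g = \left|\,\mathrm{rank}_1(g) - \mathrm{rank}_2(g)\,\right|

\begin{aligned}
\mathrm{rank}_1 &= (2,\ 4,\ 1,\ 3) \\
\mathrm{rank}_2 &= (3,\ 2,\ 1,\ 4) \\
\mathrm{PRD}    &= (1,\ 2,\ 0,\ 1)
\end{aligned}
```

Gene 3 has the smallest PRD, so it is the most rank-invariant of the four. Every gene gets its own PRD, which is not the same as intersecting the top-ranked genes of the two columns.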


Aparna Kumar

Mar 15, 2011, 2:11:00 PM
to computational-ge...@googlegroups.com
Takeaway from the office hours discussion (plotting for 1.3):

You should create a plot of the Array2 values against the Array1 values. Array1 will be the sorted gene values that were used to compute the PRDs. Array2 should be in the same gene (label) order as Array1. Fit a regression to the invariant genes in every segment. Make sure your curve is continuous and monotonically non-decreasing.
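One possible way to get a continuous, monotonically non-decreasing piecewise linear curve, offered as a sketch and not necessarily the method the course expects: fit an ordinary least-squares line to each segment's invariant genes (x = Array1 value, y = Array2 value), evaluate that line at the segment's smallest and largest invariant x to get two knots, force the knot heights to be non-decreasing with a running maximum, and interpolate linearly between knots. The helper names `fit_line`, `build_curve` and `evaluate`, and the flat extrapolation beyond the outermost knots, are assumptions.

```python
from bisect import bisect_right


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # a single distinct x gives a flat line
    return my - b * mx, b


def build_curve(segments):
    """Knots (kx, ky) of a continuous, non-decreasing piecewise linear curve.

    `segments` is a list of non-empty (xs, ys) invariant-gene lists,
    ordered by increasing x (Array1 value).
    """
    kx, ky = [], []
    for xs, ys in segments:
        a, b = fit_line(xs, ys)
        for x in (min(xs), max(xs)):
            if kx and x <= kx[-1]:
                continue  # keep knot x values strictly increasing
            y = a + b * x
            kx.append(x)
            # Running maximum keeps the curve monotonically non-decreasing.
            ky.append(max(y, ky[-1]) if ky else y)
    return kx, ky


def evaluate(kx, ky, x):
    """Map an Array1 value onto Array2's scale by linear interpolation."""
    if x <= kx[0]:
        return ky[0]  # assumption: flat beyond the first knot
    if x >= kx[-1]:
        return ky[-1]  # assumption: flat beyond the last knot
    j = bisect_right(kx, x)  # kx[j - 1] <= x < kx[j]
    t = (x - kx[j - 1]) / (kx[j] - kx[j - 1])
    return ky[j - 1] + t * (ky[j] - ky[j - 1])
```

Joining the end of one segment's line to the start of the next is what makes the curve continuous across segment boundaries; a segment whose fitted slope is negative is simply flattened by the running maximum.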

Lu Xie

Mar 15, 2011, 3:34:30 PM
to computational-ge...@googlegroups.com, Aparna Kumar
Maybe next time one TA could design the HW and another TA could work
out the solution; then they would know how vague the questions are and
where the obstacles might be :P

On Tue, Mar 15, 2011 at 2:11 PM, Aparna Kumar <apar...@andrew.cmu.edu> wrote:
> Take away line from office hours discussion - plotting for 1.3 -
> You should create a plot of the values Array2 vs the values of Array1.
>  Array1 will be the sorted gene values that were used to compute the prds.

PRDs are computed from the sorted gene values in Array1? But in the
paper and the HW, PRDs are found from the rankings in both Array 1 and
Array 2, if I understand correctly...

>  Array2 should be in the corresponding label order as Array1.  You calculate
> a regression of the invariant genes for every segment.  Make sure your curve
> is continuous and monotonically non-decreasing.

So we need 2 figures here, one for Array1 vs. Array2 and another for
the segmented curve, right?

Aparna Kumar

Mar 15, 2011, 4:05:13 PM
to computational-ge...@googlegroups.com
Lu, if you don't like the assignment, let the professors know. It was written a long time ago and has not been modified.

 
> Take away line from office hours discussion - plotting for 1.3 -
> You should create a plot of the values Array2 vs the values of Array1.
>  Array1 will be the sorted gene values that were used to compute the prds.

PRDs are computed from the sorted gene values in Array1? But in the
paper and the HW, PRDs are found from the rankings in both Array 1 and
Array 2, if I understand correctly...

Yes, PRDs are computed from two arrays. You can treat one as the baseline and normalize (regress) the other array against it. When you plot, keep the order of the one sorted array and reorder the other array to match it.

>  Array1 will be the sorted gene values that were used to compute the prds.

This meant that Array1 is in the order we used (for calculating the PRDs with respect to Array2), while Array2 now has to be put in Array1's order. If you try to plot, you will see why this has to be the case: otherwise each point would pair values from different genes, and your plot would not make sense.
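A small sketch of the reordering described above, assuming both arrays are plain lists indexed by the same gene labels; the function name `sort_pair_by_baseline` is hypothetical. One permutation (the one that sorts Array1) is applied to both arrays, so each plotted point still belongs to a single gene.

```python
def sort_pair_by_baseline(array1, array2):
    """Sort genes by their array1 value and put array2 in that same gene order."""
    order = sorted(range(len(array1)), key=lambda i: array1[i])
    return [array1[i] for i in order], [array2[i] for i in order]


# Each (x, y) pair is still one gene's (Array1, Array2) values.
x, y = sort_pair_by_baseline([3.0, 1.0, 2.0], [30, 10, 20])
```

Sorting Array2 on its own instead would pair the i-th smallest Array2 value with the i-th smallest Array1 value, regardless of which genes those values came from.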

 
>  Array2 should be in the corresponding label order as Array1.  You calculate
> a regression of the invariant genes for every segment.  Make sure your curve
> is continuous and monotonically non-decreasing.

So we need 2 figures here, one for Array1 vs. Array2 and another for
the segmented curve, right?

One figure. You normalize Array1 against Array2 and plot the piecewise curve fitted to these points on the same graph.
