Fwd: Question 1 clarification


John Kang

Mar 6, 2011, 10:34:00 PM
to computational-ge...@googlegroups.com
I had a few questions about the first problem on the homework.

What does "plot the distributions of normalized array 1 (X) vs.
normalized array 2(Y) following each normalization" mean? What is array
1 and 2?

Does "rank-based normalization method" refer to quantile normalization
from the lecture slides?

On slide 27 of Lecture 14, I wasn't sure what we are supposed to do
after sorting and finding the row means of the array...it doesn't seem
like the "normalized" column has sufficient information to recreate the
distributions seen in slide 28 for example, since the information has
been sorted already?

I'm at Biophysical Society and my hotel wifi is extremely bad (probably
too many people using it) so my apologies for not being able to look up
the complete quantile normalization method myself.

Thanks,
John

Jose Juan Tapia

Mar 7, 2011, 1:09:10 AM
to computational-ge...@googlegroups.com
I guess I can answer the first one....

Each of the columns in the input file represents a different array. It should be something like plotting column one vs. column two.

--
José Juan Tapia Valenzuela
Carnegie Mellon -- University of Pittsburgh
Ph.D. Program in Computational Biology

John Kang

Mar 7, 2011, 3:43:26 PM
to computational-ge...@googlegroups.com
Thanks, Jose, that makes sense.

I'm assuming that "rank-based" refers to quantile normalization, but that raises the question of how to plot array 1 vs. array 2 after quantile normalization, since it seems that all the arrays get converted to the same averaged, sorted array. Plotting array 1 vs. array 2 would then just be a straight line.

I'm going by the quantile normalization example in the lecture 14 and by Wikipedia:

http://en.wikipedia.org/wiki/Quantile_normalization
To quantile normalize two or more distributions to each other, without a reference distribution, sort as before, then set to the average (usually, arithmetical mean) of the distributions. So the highest value in all cases becomes the mean of the highest values, the second highest value becomes the mean of the second highest values, and so on.

-John

Han Lai

Mar 10, 2011, 7:38:32 PM
to computational-ge...@googlegroups.com
So the rank-based plot of array 1 vs. array 2 will just be a straight line? And the plot for another method won't?

Kind Regards,

Aparna Kumar

Mar 11, 2011, 2:56:55 PM
to computational-ge...@googlegroups.com, John Kang
Hi all, apologies for the delay. I should get these emails in real time from now on, and I'll be sure to respond to the group.
 
What does "plot the distributions of normalized array 1 (X) vs. normalized array 2(Y) following each normalization" mean? What is array 1 and 2?

Yup, Jose has it right.  Also, if you read 1.1 it says "The rows correspond to genes and the columns correspond to the array experiments.  The rows correspond to genes and the columns correspond to the array experiments."   :P


 
Does "rank-based normalization method" refer to quantile normalization from the lecture slides?

Yes.
  
On slide 27 of Lecture 14, I wasn't sure what we are supposed to do after sorting and finding the row means of the array...it doesn't seem like the "normalized" column has sufficient information to recreate the distributions seen in slide 28 for example, since the information has been sorted already?


OK, so you sort each array and find the row means. If you plot the sorted means vs. the sorted means, then yes, you would get a straight line. However, what you need to do now is go back to the original, unsorted arrays and replace each value with the mean we found for that value's rank. So now think about when you might get a straight line. It is possible, but only in a special case... (This should answer Han's question too.)
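
The steps above (sort, average across arrays at each rank, then put the averages back in the original gene order) can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the course's reference solution: the function name `quantile_normalize`, the toy values in `array1` and `array2`, and the tie handling (ties keep their original order) are all assumptions made for the example.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of values per array)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of genes")

    # Step 1: sort each array independently.
    sorted_cols = [sorted(col) for col in columns]

    # Step 2: average across arrays at each rank (the row means of the sorted table).
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]

    # Step 3: write the rank means back in each array's ORIGINAL gene order.
    normalized = []
    for col in columns:
        # Gene indices ordered from smallest to largest value.
        # Assumption: ties are broken by original position (sorted() is stable).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two arrays measured over the same four genes.
array1 = [5.0, 2.0, 3.0, 4.0]
array2 = [4.0, 1.0, 4.5, 2.0]
norm1, norm2 = quantile_normalize([array1, array2])
```

Here `norm1` and `norm2` end up containing the same set of values, but each in its own array's gene order, so the pairs `(norm1[i], norm2[i])` are what a scatter plot of one normalized array vs. the other would show.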

Hope this helps,
Aparna

Aparna Kumar

Mar 11, 2011, 3:20:15 PM
to computational-ge...@googlegroups.com
..it doesn't seem like the "normalized" column has sufficient information to recreate the distributions seen in slide 28 for example, 

You do not need to recreate the normalized densities on slide 28 - instead create a scatter plot of one normalized array vs the other.  