Fwd: Question about 1.3,4,5


Murat Can Cobanoglu

Mar 15, 2011, 7:15:16 PM
to computational-ge...@googlegroups.com


Begin forwarded message:

From: Aparna Kumar <aparnak...@gmail.com>
Date: March 15, 2011 7:11:18 PM EDT
To: Murat Can Cobanoglu <mu...@pitt.edu>
Subject: Re: Question

I think you are on the right track... Think about how you could fit a line and make it piecewise continuous. Write down the equation for 1.4 first and force the second line to connect to the first line by considering the largest x value where the first line is defined. Also would you mind posting this to googlegroups? I think others would benefit.
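
One way to read the hint above, as a minimal sketch rather than the assignment's official solution: fit the first segment by ordinary least squares, then force every later line through the end point of the line before it, so only its slope is free. The function name `fit_continuous_segments` and the input layout (one `(x, y)` pair of invariant-gene arrays per segment, ordered by increasing x) are assumptions made for illustration.

```python
import numpy as np

def fit_continuous_segments(segments):
    """Fit one line per segment so that consecutive lines meet end to end.

    segments: list of (x, y) arrays holding the invariant genes of each
    segment, assumed to be ordered by increasing x.
    Returns a list of (x_start, x_end, slope, intercept) tuples.
    """
    lines = []
    x0 = y0 = None  # join point inherited from the previous segment
    for x, y in segments:
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        if x0 is None:
            # First segment: unconstrained least-squares line.
            slope, intercept = np.polyfit(x, y, 1)
            x_start = x.min()
        else:
            # Later segments: the line must pass through (x0, y0),
            # so only the slope is fitted (least squares through a point).
            dx = x - x0
            slope = np.sum(dx * (y - y0)) / np.sum(dx * dx)
            intercept = y0 - slope * x0
            x_start = x0
        x_end = x.max()  # largest x where this line is defined
        lines.append((x_start, x_end, slope, intercept))
        x0, y0 = x_end, slope * x_end + intercept
    return lines
```

Each line after the first gives up its free intercept, which is exactly what removes the gaps between segment ends.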

Sent from my iPhone

On Mar 15, 2011, at 6:16 PM, Murat Can Cobanoglu <mu...@pitt.edu> wrote:

I would fit a line on the 50 invariant genes in each segment. When you fit a line to 50 points, it automatically has a specified intercept. 

This is what I understood from mails and the paper. Am I supposed to do something else? 

On Mar 15, 2011, at 3:28 PM, Aparna Kumar wrote:

Well, how did you decide on the intercepts of each line segment?



On Tue, Mar 15, 2011 at 3:07 PM, Murat Can Cobanoglu <mu...@pitt.edu> wrote:
Aparna

After calculating line fits to the invariant sets of every segment, the ends of the line segments will not connect. How do we overcome this problem?

-MCC





Sent from my iPhone


Murat Can Cobanoglu, MSc
PhD Student
CMU-Pitt Computational Biology PhD Program






Lu Xie

Mar 15, 2011, 8:45:15 PM
to computational-ge...@googlegroups.com, Murat Can Cobanoglu
For Q1.5, "List the PRDs", I guess it means the 66 thresholds, not
66*50 PRDs, right?

Lu

Murat Can Cobanoglu

Mar 15, 2011, 10:11:24 PM
to computational-ge...@googlegroups.com
I don't think I correctly understood what I should list. For every segment, I created a scatter plot of the genes that I chose for the invariant set. Consequently, I have 66 such plots. Is this correct?

Will Foran

Mar 15, 2011, 11:37:03 PM
to computational-ge...@googlegroups.com
I think we should only have one graph: array1 as x, array2 as y, normalized to the curve created using the 66 PRD thresholds, like (c) in the figure of the paper (and lecture notes).

How to construct the curve though, I don't know.

Lai Han

Mar 16, 2011, 1:26:43 AM
to Computational Genomics 2011Spring
I am still confused. First we get the abs rank difference/65000, then
sort. Then we get 65 segments of 1000 genes. If we select 65 PRDs for each
segment to select 50 genes, then would they all be different? More
correctly, decreasing? Then how are the PRDs monotonically
nondecreasing?


Aparna Kumar

Mar 16, 2011, 2:25:41 AM
to computational-ge...@googlegroups.com
Yes, the 66 thresholds.

Aparna Kumar

Mar 16, 2011, 2:27:04 AM
to computational-ge...@googlegroups.com
> I don't think I correctly understood what I should list. For every segment, I created a scatter plot of the genes that I chose for the invariant set. Consequently, I have 66 such plots. Is this correct?

Plot them on the same graph.  Then you will have one plot.

Aparna Kumar

Mar 16, 2011, 2:33:48 AM
to computational-ge...@googlegroups.com
> How to construct the curve though, I don't know.

The curve (piecewise line segments) can be calculated by regressing one set of invariant genes (from one array) onto the set of invariant genes from the other array. The invariant genes are the genes whose PRDs are less than your thresholds.

Now read Murat's question about how to make the curve continuous... 
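
A minimal sketch of the per-segment regression described above, before any continuity constraint is applied. The function name `fit_invariant_line` and its arguments are hypothetical, and using `<=` rather than strict `<` is an assumption, chosen so that a gene sitting exactly at the threshold counts as invariant.

```python
import numpy as np

def fit_invariant_line(x, y, prd, threshold):
    """Regress array2 (y) on array1 (x) using only the invariant genes.

    x, y: intensities of the genes in one segment on array1 and array2.
    prd: the PRD of each of those genes.
    threshold: the PRD cutoff chosen for this segment.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: genes at or below the cutoff form the invariant set.
    invariant = np.asarray(prd, dtype=float) <= threshold
    slope, intercept = np.polyfit(x[invariant], y[invariant], 1)
    return slope, intercept, invariant
```

Fitting each segment independently like this is what leaves the line ends disconnected.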

Aparna Kumar

Mar 16, 2011, 2:53:31 AM
to computational-ge...@googlegroups.com
> I am still confused. first we get the abs rank difference/65000, then
> sort. then get 65 segments of 1000 genes,

1. Find the absolute rank differences (PRDs) for the full list of 65k genes -> a list of 65k PRDs.
2. Break the PRD list into sets of 1000 genes -> 66 sets.
3. Sort each set of 1000 genes (notice you are sorting after you break the PRD list into segments. This was Lu's question yesterday) -> 66 sorted sets.
4. Find thresholds in each of the 66 sets.
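
Steps 1 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the assignment's reference code: the genes are taken to be already in the order in which they should be split into segments, the rank difference is divided by the number of genes (following the "abs rank difference/65000" wording earlier in the thread), and each threshold is taken to be the 50th smallest PRD in its segment. The name `prd_thresholds` and its parameters are hypothetical.

```python
import numpy as np

def prd_thresholds(array1, array2, segment_size=1000, min_invariant=50):
    """Return (prd, thresholds) for two arrays of gene intensities."""
    array1 = np.asarray(array1)
    array2 = np.asarray(array2)
    n = len(array1)
    # Step 1: rank every gene within each array, then take the
    # absolute rank difference scaled by the number of genes.
    rank1 = np.argsort(np.argsort(array1))
    rank2 = np.argsort(np.argsort(array2))
    prd = np.abs(rank1 - rank2) / n
    thresholds = []
    # Step 2: break the PRD list into consecutive sets of segment_size.
    for start in range(0, n, segment_size):
        # Step 3: sort within the segment, after the split.
        segment = np.sort(prd[start:start + segment_size])
        # Step 4 (assumed rule): the threshold is the PRD of the
        # min_invariant-th gene, so that many genes fall at or below it.
        k = min(min_invariant, len(segment))
        thresholds.append(segment[k - 1])
    return prd, np.array(thresholds)
```

With about 65,500 genes and segments of 1000 this yields 66 thresholds, the last one coming from a shorter segment.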

 
> if we select 65 PRDs for each segment to select 50 genes. then would they all be different? more
> correctly, decreasing?

(after 4) Not necessarily at this stage. If the PRDs are not mon.non.dec. then you can force them to be mon.non.dec. The directions say each segment must be calculated from at least 50 genes (at least 50 must be invariant), but that means you could have more than 50 in your invariant sets.

If one thresh (b) is now less than the previous one (a), this messes up the mon.non.dec. requirement, right? If you increase the threshold of that segment (b) so that it possibly includes more genes, you are still following the 'at least 50 invariant genes' requirement, and you could also possibly have a mon.non.dec. line. What would that threshold have to be increased to? (Hint: the new threshold has to equal the previous threshold.)

 Now your array of prds has probably gone from 66 unique values to maybe 10ish.  
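
The adjustment described above (raise any threshold that falls below the previous one until it equals it) amounts to a running maximum over the threshold list. A minimal sketch, with `force_nondecreasing` as a hypothetical helper name and made-up example values:

```python
import numpy as np

def force_nondecreasing(thresholds):
    """Raise each threshold to at least the value of the one before it."""
    # Running maximum: a threshold below its predecessor is lifted to match it.
    return np.maximum.accumulate(np.asarray(thresholds, dtype=float))

# Illustrative values only: the 2nd and 4th thresholds dip below their predecessors.
raised = force_nondecreasing([0.02, 0.01, 0.03, 0.025])
print(raised.tolist())  # [0.02, 0.02, 0.03, 0.03]
```

Because thresholds are only ever raised, every segment keeps at least its original invariant genes, and runs of equal values are why the number of unique thresholds drops.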

 
> Then how are the PRDs monotonically nondecreasing?

They should be after the steps above.

Aparna Kumar

Mar 16, 2011, 2:58:18 AM
to computational-ge...@googlegroups.com
Sorry, there were some typos in the last response.  Please read this one:

(after 4) Not necessarily at this stage. If the prds are not mon.non.dec. then you can force them to be mon.non.dec. The directions say each segment must be calculated from at least 50 genes (at least 50 must be invariant), but that means you could have more than 50 in your invariant sets.
