Meaning of 'unique true positive', and 'counting the top ranked cluster when there is more than one cluster assigned to a gene'


clin...@umn.edu

Jan 28, 2018, 6:19:00 PM
to corset-project

Hello Davidson,

I am reading your 2014 Corset algorithm paper (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0410-6). I am having trouble understanding Fig. 4 and Fig. 5 because I do not understand the meaning of 'unique true positive'.
According to page 6 of the paper, 'a unique true positive refers to only counting the top ranked cluster when there is more than one cluster assigned to a gene'.
Could you kindly explain what 'counting the top ranked cluster when there is more than one cluster assigned to a gene' means? How are the clusters ranked? In my understanding, one cluster should correspond to one gene, so how can more than one cluster be assigned to a gene?
 
Thank you very much.

Best wishes,

Chen

Nadia Davidson

Jan 29, 2018, 5:14:41 PM
to corset-project
Hi Chen,

Because transcript assembly is not perfect, clustering is also not perfect. It's possible, for example, that a large low-coverage region in the middle of a gene is not assembled, meaning that the transcripts of the gene are split into two or more fragments, e.g. one from the 5' end to the low-coverage region and another from the low-coverage region to the 3' end. Usually the clustering algorithm has no way of knowing that these two fragments came from the same gene, and it will assign them to different clusters. Let's call these clusters #1 and #2 as an example. Then, after performing differential expression analysis with something like edgeR, we can get a p-value ranked list like:

cluster  p-value
#1        0.0001
#3        0.001
#2        0.1
#4        0.999
..etc..

For the purposes of "unique true positives" we build the ROC curves without #2, because the gene it represents is already counted in #1 and #1 has a lower p-value. The logic is that in a real analysis we look at the top-ranked genes first. If any part of a gene comes up as significantly different, then that gene could be interesting to follow up on for its biology.
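
As a rough illustration of the filtering described above (not the code used for the paper), here is a minimal Python sketch. The cluster-to-gene mapping and the gene names (geneA, geneB, geneC) are hypothetical, made up only so that #1 and #2 belong to the same gene, and the function name unique_top_ranked is likewise just for this example.

```python
# p-value ranked list from the example above: (cluster, p-value).
ranked = [("#1", 0.0001), ("#3", 0.001), ("#2", 0.1), ("#4", 0.999)]

# Hypothetical truth mapping (an assumption for illustration):
# clusters #1 and #2 are two fragments of the same gene.
cluster_to_gene = {"#1": "geneA", "#2": "geneA", "#3": "geneB", "#4": "geneC"}


def unique_top_ranked(ranked, cluster_to_gene):
    """Keep only the best-ranked (lowest p-value) cluster for each gene."""
    seen_genes = set()
    kept = []
    # Walk the list from most to least significant.
    for cluster, p_value in sorted(ranked, key=lambda row: row[1]):
        gene = cluster_to_gene[cluster]
        if gene in seen_genes:
            # A better-ranked cluster already represents this gene, so skip it.
            continue
        seen_genes.add(gene)
        kept.append((cluster, p_value))
    return kept


unique = unique_top_ranked(ranked, cluster_to_gene)
print(unique)  # [('#1', 0.0001), ('#3', 0.001), ('#4', 0.999)]
```

Here #2 is dropped because geneA was already counted through #1, so each gene contributes at most one entry to the list from which the ROC curve would be built.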

Hope this helps and happy to clarify further if needed.

Cheers,
Nadia.


clin...@umn.edu

Jan 30, 2018, 10:56:10 AM
to corset-project

Hi Nadia,

Thank you very much.

Chen