inconsistent results


Petros Kolovos

May 21, 2012, 8:40:09 AM
to seqMINER
Dear All,

I have a ChIP-seq dataset and I am comparing it against another ChIP-seq dataset (the reference). Every time I do the clustering (Kmeans raw) with 10 clusters, the final picture is different from the previous one.

Why is this happening?

thank you in advance

best regards
Petros

Petros Kolovos

May 23, 2012, 11:04:23 AM
to seqMINER
Dear All,

Here you can see some examples of my problem.

All three pictures were created from the same dataset with 10 clusters, WITHOUT changing any parameter.

I understand that the problem is that the program produces different clusters every time.

Isn't there a way to define how the clusters are formed? For example, one cluster for peaks that are >1000 bp from the TSS, another for peaks that are <1000 bp from the TSS, and so on.

How can this be done?

Thank you in advance.

I will wait for your answer.

Yours sincerely
Petros Kolovos

2012/5/21 Petros Kolovos <p.ko...@gmail.com>
1.png
2.png
3.png
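The grouping Petros describes is a fixed, rule-based split rather than clustering: each peak is assigned by its distance to the nearest TSS, so there is no random step and every rerun gives the same groups. A minimal sketch, assuming peak centers and TSS coordinates are integer positions on a single chromosome; the function names and coordinates are hypothetical, and counting peaks exactly 1000 bp away as proximal is an arbitrary choice.

```python
import bisect


def distance_to_nearest_tss(center, sorted_tss):
    """Absolute distance (bp) from a peak center to the closest TSS.

    `sorted_tss` must be non-empty, sorted, and on the peak's chromosome.
    """
    i = bisect.bisect_left(sorted_tss, center)
    # The closest TSS is either just before or at/after the insertion point.
    neighbours = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(center - t) for t in neighbours)


def group_by_tss_distance(peak_centers, tss_positions, cutoff=1000):
    """Assign each peak to 'proximal' (<= cutoff bp) or 'distal' (> cutoff bp)."""
    sorted_tss = sorted(tss_positions)
    groups = {"proximal": [], "distal": []}
    for center in peak_centers:
        d = distance_to_nearest_tss(center, sorted_tss)
        groups["proximal" if d <= cutoff else "distal"].append(center)
    return groups


tss = [5_000, 50_000, 120_000]
peaks = [5_300, 48_200, 70_000, 119_500, 200_000]
print(group_by_tss_distance(peaks, tss))
# {'proximal': [5300, 119500], 'distal': [48200, 70000, 200000]}
```

For a genome-wide run the peaks and TSSs would need to be grouped by chromosome first; the thread does not say whether seqMINER can take such predefined groups as input.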

Tao

May 28, 2012, 10:30:28 AM
to seqMINER
Hi Petros,

I'm afraid that you've misunderstood the k-means clustering method.

Here is a reference that should answer your questions:
http://en.wikipedia.org/wiki/K-means_clustering

best,
Tao
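Tao's point can be made concrete. Standard (Lloyd's) k-means starts from randomly chosen centroids and then only refines them locally, so different starting points can converge to different, equally stable partitions of the same data. The sketch below is a generic pure-Python illustration, not seqMINER's own implementation (which this thread does not show); the function names and toy data are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, initial_centroids, max_iter=100):
    """Lloyd's k-means iterations from the given starting centroids.

    Deterministic: the same points and starting centroids always give
    the same (labels, centroids, wcss).
    """
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the lowest cluster index).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


def kmeans(points, k, seed=None):
    """k-means with k starting centroids drawn at random from the data."""
    rng = random.Random(seed)
    return lloyd(points, rng.sample(points, k))


# Toy 1-D "profiles": three well-separated groups, clustered with k = 2.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
labels_a, _, wcss_a = lloyd(points, [(0.0,), (1.0,)])
labels_b, _, wcss_b = lloyd(points, [(20.0,), (21.0,)])
print(labels_a, wcss_a)  # [0, 0, 1, 1, 1, 1] 101.5
print(labels_b, wcss_b)  # [0, 0, 0, 0, 1, 1] 101.5
```

Both runs are stable and have the same within-cluster sum of squares, yet the middle group ends up with the upper group in one run and with the lower group in the other: the same kind of run-to-run difference Petros describes. Which outcome `kmeans` reaches depends only on which starting centroids its random draw picks.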


Petros Kolovos

May 28, 2012, 2:49:31 PM
to seqm...@googlegroups.com
Dear Tao,

Thank you very much. I understand what k-means means.

So, if I understand correctly, whatever normalization I use, if I repeat the analysis I will always get different results?

Does that mean you have to repeat your analysis many times in order to get a picture that fits your expectations?

Isn't it scary that you get a different picture each time you run the analysis with the same input?

Isn't there a way to define how the clusters are made? As I said in my previous email, for example, one cluster could contain the peaks at a specific distance from the TSS (e.g. greater than 1000 bp from the TSS) and another the peaks less than 1000 bp from the TSS.

So if I want to see the distribution of my peaks relative to the TSS, is it logical to repeat the clustering many times until I get the best picture? How can I decide what the best number of clusters is?

Thank you in advance

I would appreciate an answer as soon as possible, as I really need to make some pictures with seqMINER.

Best regards
Petros
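The thread leaves the choice of the number of clusters open. One common heuristic, not something seqMINER or this thread prescribes, is the "elbow" plot: run k-means for several values of k, keep the lowest within-cluster sum of squares (WCSS) over a few seeded restarts, and look for the k after which WCSS stops dropping sharply. Keeping the lowest-WCSS restart also gives an objective way to pick among repeated runs instead of choosing the picture that looks best. A minimal 1-D sketch; the function names and data are hypothetical, and real ChIP-seq profiles would be vectors rather than single numbers.

```python
import random


def kmeans_wcss(values, k, seed, n_iter=50):
    """Within-cluster sum of squares (WCSS) after a 1-D Lloyd's k-means run.

    Starting centroids are k data values drawn with `seed`; the algorithm
    then alternates assignment and mean-update steps.
    """
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Move each centroid to its cluster mean (empty clusters stay put).
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # Each value contributes its squared distance to the closest centroid.
    return sum(min((x - m) ** 2 for m in centroids) for x in values)


def elbow_curve(values, k_values, restarts=10):
    """Best (lowest) WCSS over several seeded restarts, for each k."""
    return {k: min(kmeans_wcss(values, k, seed) for seed in range(restarts))
            for k in k_values}


values = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]  # hypothetical 1-D summary per peak
for k, wcss in elbow_curve(values, range(1, 7)).items():
    print(k, wcss)
```

WCSS tends to shrink as k grows (it reaches 0 once every peak has its own cluster), so the smallest value is not the answer; the bend in the curve is only a guide.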



2012/5/28 Tao <ytsq...@gmail.com>

Gema Sanz

May 11, 2013, 12:17:34 PM
to seqm...@googlegroups.com
Hi Petros, if you use a fixed k-means seed, you will always get the same result (Tools/Options/General/"run kmeans with a given seed"; I use 5).
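Gema's tip works because the random choice of starting centroids is the source of the run-to-run variation: with a fixed seed the random number generator makes the same choices every time, and the remaining steps of a plain Lloyd's k-means are deterministic. A minimal sketch using Python's `random` module, not seqMINER's code; the function name and toy profiles are hypothetical. A given seed value only reproduces results within the same program and version, so seed 5 here is not expected to match seqMINER's seed-5 clustering.

```python
import random


def initial_centroids(profiles, k, seed=None):
    """Draw k starting centroids from the data, as random-init k-means does.

    seed=None seeds the generator from the OS/clock, so calls can differ;
    a fixed integer seed returns the same centroids on every call.
    """
    rng = random.Random(seed)
    return rng.sample(profiles, k)


# Hypothetical toy profiles (one tuple of binned signal values per peak).
profiles = [(float(i), float(i % 4), float(i % 7)) for i in range(30)]

run1 = initial_centroids(profiles, 10, seed=5)
run2 = initial_centroids(profiles, 10, seed=5)
print(run1 == run2)  # True: same seed, same starting centroids
```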