---------- Forwarded message ---------
From: 赵路 <luzha...@gmail.com>
Date: Thu, Dec 13, 2018 at 10:43 PM
Subject: Re: Clustering output depends on order of barcodes
To: Michael Manhart <msma...@gmail.com>
Hi Michael,
Thanks for reporting this behavior. Based on your description and the current implementation, I'm not sure what the real cause of this small fuzziness is in your case. It could come from several sources. For example, the underlying data structure APIs may not guarantee pointer or value stability (if the code relies on either, that dependence should be found and removed). Also, the greedy clustering algorithm in each bucket is not invariant to sequence order, mostly for the reason you pointed out: some barcodes are equidistant to multiple clusters.

To be frank, I'm not surprised by this behavior. Even classic clustering algorithms such as k-means can give different results from different initial states. In your experiment, the difference in the number of clusters is very small. Could you share the size distribution of the clusters that show up in only one setting? I suspect most of them are very small.
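To illustrate the equidistant-barcode effect, here is a toy sketch of a generic greedy clustering step. It is a hypothetical algorithm, not the actual implementation: it assumes Hamming distance, a threshold `max_dist` of 1, and that ties go to the earliest-created cluster. The function names `hamming` and `greedy_cluster` are made up for this example.

```python
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(barcodes: List[str], max_dist: int = 1) -> Dict[str, List[str]]:
    """Hypothetical greedy clustering: each barcode joins the nearest existing
    center within max_dist; otherwise it seeds a new cluster as its center.
    Ties go to the earliest-created cluster, so the result depends on order."""
    clusters: Dict[str, List[str]] = {}  # center -> members; dicts keep insertion order
    for bc in barcodes:
        best_center: Optional[str] = None
        best_dist = max_dist + 1
        for center in clusters:
            d = hamming(bc, center)
            if d < best_dist:  # strict '<' keeps the first-seen center on ties
                best_center, best_dist = center, d
        if best_center is None:
            clusters[bc] = [bc]  # nothing close enough: start a new cluster
        else:
            clusters[best_center].append(bc)
    return clusters


# The same three barcodes in three different input orders.
for order in (["AAAA", "AATT", "AAAT"],
              ["AATT", "AAAA", "AAAT"],
              ["AAAT", "AAAA", "AATT"]):
    print(order, "->", greedy_cluster(order))
```

AAAT is one mismatch away from both AAAA and AATT. In the first two orders it lands in whichever of those clusters was created first, giving two clusters with different memberships. In the third order AAAT becomes a center and absorbs both others, giving a single cluster.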
Best,
Lu