fuzzy c means VS multiple inputs and nan values in Python


xiaoye tong

Feb 8, 2016, 9:20:53 AM
to scikit-fuzzy
Hello there,
I want to do an unsupervised classification of a multi-layer spatial raster image into 2 classes using fuzzy c-means clustering.
My first question is how to cluster multiple inputs in scikit-fuzzy. For example, I defined one input as a 1D array named "xpts" and another input named "ypts". I then stacked them into alldata using "alldata = np.vstack((xpts, ypts))" and tried "cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(alldata, 2, 2, error=0.005, maxiter=1000, init=None)" to get "u", which should tell me how likely each value is to belong to each cluster. Is this the right way to do it?
It did not work: I got a membership of 0.5 for one cluster for all values.

Furthermore, I also need to mask out some areas by giving them NaN values. I noticed that fuzz.cluster.cmeans does not seem to handle NaN values, but it works when I convert the NaN values to 0. However, I am not sure whether the zeros would influence the clustering.


Looking forward to your reply!
Best,
Xiaoye

JDWarner

Feb 10, 2016, 8:42:02 PM
to scikit-fuzzy
The usage you describe sounds correct. You may want to experiment with more than two clusters, though, since `u` will be a square array with only two. This can confuse a new user because it's unclear which axis is the fuzzy membership and which is the fuzzy cluster. Transpose these and the results look wrong. I suspect this is what is going on, but without more info to test against I can't be sure. If that doesn't solve your issue, feel free to open an issue on GitHub about this with a minimal example!

Regarding masking points with NaNs, I suggest simply omitting them from your data before you create `alldata`. If you have a 2D image `test_image` with multiple layers that has dimensions [X, Y, layers], and you know your masked points (a binary array `mask` with desired points set to True, with dimensions [X, Y]), simply index `test_image[mask[..., np.newaxis].repeat(layers, axis=-1)]`. Boolean indexing returns a flat 1D array containing only the desired data, with the `layers` values of each kept point adjacent to one another, so reshape it (for example with `.reshape(-1, layers).T`) to get the [layers, points] layout for `alldata`. This eliminates the need for NaN masking.

Hope that helps,
Josh