Hi all,
Can anyone give me some suggestions? I am very confused about fitting a Gaussian mixture model (GMM) in a high-dimensional space.
Suppose there is a data set of 1,000 points in a 128-dimensional space, and we train a Gaussian mixture model with 300 components. The dimension is not very high compared with the number of data points, right? But when we apply the EM algorithm to estimate the parameters, some Gaussian densities tend to blow up toward infinity because the determinants of their covariance matrices become very small, while others always approach zero because their determinants are too large (even though k-means is used first for initialization).
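To make the numbers in this setup concrete, here is a rough sketch (my own illustration, not taken from either paper) that counts the free parameters of a GMM under different covariance structures. The function name `gmm_free_parameters` and the labels "full", "diag" and "spherical" are assumptions chosen just for this example.

```python
def gmm_free_parameters(n_components, n_dims, covariance_type="full"):
    """Count the free parameters of a Gaussian mixture model."""
    weights = n_components - 1        # mixing weights must sum to 1
    means = n_components * n_dims     # one mean vector per component
    if covariance_type == "full":
        # symmetric d x d matrix per component: d * (d + 1) / 2 entries
        covariances = n_components * n_dims * (n_dims + 1) // 2
    elif covariance_type == "diag":
        # one variance per dimension per component
        covariances = n_components * n_dims
    elif covariance_type == "spherical":
        # a single variance per component
        covariances = n_components
    else:
        raise ValueError(f"unknown covariance_type: {covariance_type!r}")
    return weights + means + covariances


# Numbers from the setup described above.
n_points, n_dims, n_components = 1000, 128, 300

print("scalar values in the data set:", n_points * n_dims)
print("average points per component:", n_points / n_components)
for cov_type in ("full", "diag", "spherical"):
    n_params = gmm_free_parameters(n_components, n_dims, cov_type)
    print(f"{cov_type:>9} covariance: {n_params} free parameters")
```

If this count is right, there are only about 3 points per component on average, and with full covariance matrices the model has far more free parameters than the 128,000 scalar values in the data set, which may be related to the degenerate determinants I am seeing.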
I have found some papers which confirm that a GMM can be estimated under such conditions, e.g.
"Florent Perronnin. Adapted Vocabularies for Generic Visual Categorization"
"Bouveyron C, Girard S, Schmid C. High Dimensional Data Clustering."
So is there anything I have misunderstood or overlooked? Any help would be appreciated.
Best,
Stridence