Covariance matrix is not positive definite. Try specifying a regularization constant in the fitting

rfvt...@gmail.com

Jul 27, 2013, 2:57:35 PM
to accor...@googlegroups.com
Hello,

I am trying to make a multivariate normal distribution.

double[] mean = Accord.Statistics.Tools.Mean(mixture);

double[,] covariance = Accord.Statistics.Tools.Covariance(mixture, mean);

var mvn = new MultivariateNormalDistribution(mean,covariance);

However, when I execute the code it says "Covariance matrix is not positive definite". The data that I have been provided with is supposed to follow a Gaussian distribution.

I don't see how I can change the covariance matrix, as there is nothing I can do about the data that has been provided to me.

Any suggestions would be very welcome.

I have also attached the .xls file so you can check whether something is wrong with my data or my code.

Thank you.

test.xls

César

Jul 27, 2013, 3:43:36 PM
to accor...@googlegroups.com
Hi there,

Unfortunately, the sample you got does indeed have a sample covariance matrix that is not positive definite. This can happen for a variety of reasons, but there are some ways to try to sidestep the issue.

Instead of specifying the distribution directly, I would suggest estimating the distribution from the data using the static Estimate method. This way you can provide a regularization constant to keep the covariance matrix positive definite:

// Create a new Excel reader to read the spreadsheet
ExcelReader reader = new ExcelReader(@"C:\test.xls", hasHeaders: false);

// Read the "Data" worksheet
DataTable table = reader.GetWorksheet("Data");

// Convert the data table to a jagged matrix
double[][] observations = table.ToArray();

// Estimate a new Multivariate Normal Distribution from the observations
var dist = MultivariateNormalDistribution.Estimate(observations, new NormalOptions()
{
    Regularization = 1e-10 // this value will be added to the diagonal until it becomes positive-definite
});

This will likely succeed, but might lead to a slightly biased distribution. By the way, can you share a few more details about the data so we can try to figure out why its covariance matrix is not positive definite? Sometimes this is due to a small number of samples; sampling a bit more from the data source would eventually make the sample covariance matrix positive definite.
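
As a rough illustration of what adding a constant to the diagonal does, here is a standalone sketch that does not use Accord.NET at all. It is not the library's actual implementation; the class and helper names (RegularizationSketch, IsPositiveDefinite, Regularize) and the 2x2 example matrix are made up for this example. It tests positive definiteness by attempting a Cholesky factorization, first on a degenerate covariance matrix and then on the same matrix with 1e-10 added to its diagonal:

```csharp
using System;

static class RegularizationSketch
{
    // Attempts a Cholesky factorization A = L * L^T. For a symmetric matrix,
    // it succeeds only if A is (numerically) positive definite.
    public static bool IsPositiveDefinite(double[,] a)
    {
        int n = a.GetLength(0);
        var l = new double[n, n];
        for (int j = 0; j < n; j++)
        {
            double d = a[j, j];
            for (int k = 0; k < j; k++)
                d -= l[j, k] * l[j, k];
            if (d <= 0.0)
                return false; // non-positive pivot: not positive definite
            l[j, j] = Math.Sqrt(d);
            for (int i = j + 1; i < n; i++)
            {
                double s = a[i, j];
                for (int k = 0; k < j; k++)
                    s -= l[i, k] * l[j, k];
                l[i, j] = s / l[j, j];
            }
        }
        return true;
    }

    // Returns a copy of the matrix with lambda added to every diagonal entry.
    public static double[,] Regularize(double[,] a, double lambda)
    {
        var r = (double[,])a.Clone();
        for (int i = 0; i < r.GetLength(0); i++)
            r[i, i] += lambda;
        return r;
    }

    static void Main()
    {
        // Hypothetical covariance of two perfectly correlated features
        // (the second is exactly twice the first), so the matrix is singular.
        double[,] cov = { { 1.0, 2.0 }, { 2.0, 4.0 } };

        Console.WriteLine(IsPositiveDefinite(cov));                   // False
        Console.WriteLine(IsPositiveDefinite(Regularize(cov, 1e-10))); // True
    }
}
```

Perfectly (or almost perfectly) correlated columns like the ones in this toy matrix are one common way to end up with a covariance matrix that is not positive definite, even when there are plenty of samples.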

Hope it helps!

Best regards,
Cesar

rfvt...@gmail.com

Jul 27, 2013, 7:12:00 PM
to accor...@googlegroups.com
Hello,

Thanks for the prompt reply. This data was extracted using image processing techniques. Twenty-one characteristics of each image were measured (hence the 21 columns), and using these characteristics each row is to be classified as belonging to one of four unique classes.

I am trying to use a GMM to classify the data into these 4 classes. Someone else has already tagged each row with a class. The previous spreadsheet contained all the data that was tagged with a single class. My objective is to build a Multivariate Normal Distribution for each of these four classes, combine them into a GMM, and then use real data to automate the classification. I don't know if it's enough, but each class contains around 150-250 training samples.

I can post the full spreadsheet if required.

And once again, I am really thankful for your help and interest.

Regards.

César

Jul 27, 2013, 7:21:08 PM
to accor...@googlegroups.com
Hi there,

If that is the case, then you just need to specify that regularization constant to make the matrices eventually positive definite. It is not that uncommon to use this feature when dealing with Gaussian Mixture Models. The Mixture classes, the MultivariateNormalDistribution classes and the GaussianMixtureModel all include a way to feed this regularization to the actual estimation routines.

By the way, you do not need to create the distributions manually. If you wish, you can use the GaussianMixtureModel class to do the heavy lifting for you. If you need more directions, you may find it useful to take a look at the GMM sample application for a working example of how to use this class.

If you need further help please ask!


Best regards,
Cesar

rfvt...@gmail.com

Jul 28, 2013, 8:10:02 AM
to accor...@googlegroups.com
Hello,

I was able to combine the various normal distributions into a GMM and then use it for classification.

I have one last question though. As the covariance matrix of my data was not positive definite, I used the regularization constant that you provided in your previous post. Even though I am getting the correct answer more than 90% of the time when I compare the data classified by the GMM with the reference classification, I would still like to know what kind of side effects I can expect from this regularization constant being applied to the distributions.

Regards.
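
On the side-effects question, a short worked derivation may help. It assumes the regularization simply adds a constant \lambda to the diagonal of the sample covariance matrix \Sigma, which is what the code comment earlier in the thread describes; the library's exact procedure is not shown here, and if the constant is added more than once, \lambda stands for the total amount added. The symbols \sigma_i^2 and v_i (eigenvalues and eigenvectors of \Sigma) and d (the number of columns) are notation introduced only for this sketch:

```latex
\begin{align*}
\Sigma_\lambda &= \Sigma + \lambda I \\
\Sigma v_i = \sigma_i^2 v_i \;&\Longrightarrow\; \Sigma_\lambda v_i = (\sigma_i^2 + \lambda)\, v_i \\
\det \Sigma_\lambda &= \prod_{i=1}^{d} (\sigma_i^2 + \lambda) \\
(x - \mu)^{\top} \Sigma_\lambda^{-1} (x - \mu) &= \sum_{i=1}^{d} \frac{\bigl(v_i^{\top}(x - \mu)\bigr)^2}{\sigma_i^2 + \lambda}
\end{align*}
```

Under that assumption the mean and the principal directions are unchanged, and the variance along every direction grows by \lambda. With \lambda = 1e-10, directions whose variance is much larger than that are practically untouched. Directions with (near) zero variance end up with a variance of about \lambda, so the fitted density is extremely narrow along them: a new sample that deviates even slightly in such a direction receives a very low likelihood under that class.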
