Multiclass SVM for MNIST digits classification


Jorge Morales Vidal

Jul 6, 2014, 7:04:36 PM7/6/14
to accor...@googlegroups.com
Hi everyone,
I've been using Accord.NET for the last couple of weeks and it works really well with two classifiers (Naive Bayes and k-Nearest Neighbors) on the MNIST handwritten digits database (you may know it). However, I've been trying to use the Multiclass Support Vector Machine classifier to no avail so far.

I've prepared the 60 thousand records with PCA, so the data has been reduced to its 95 principal components. Naive Bayes and kNN work nicely with this data, but the MSVM doesn't. The computer freezes when running the learning algorithm with Sequential Minimal Optimization, and Task Manager reports 7.9 GB of RAM in use before the screen freezes. I have configured the cross-validation method to use 10 folds, and I've tried different sizes with the same result: freezing.

I know something needs to be configured in CrossValidation, SMO or the MSVM (I have tried several parameters for the Gaussian and Polynomial kernels, and for the complexity parameter as well), but I have looked at the documentation and I don't know what to change. My code is as follows:

//numeroDimensiones: integer value, 95 as the number of principal components
//numeroClases: integer value, 10, as the number of classes in the database [0-9]
public override void CrossValidation(double[][] inputs, int[] outputs, out CrossValidationStatistics trainingErrors, out CrossValidationStatistics validationErrors)
{
    var crossvalidation = new CrossValidation<MulticlassSupportVectorMachine>(size: inputs.Length, folds: 10);
    crossvalidation.Fitting = (index, indicesTrain, indicesValidation) =>
    {
        double trainingError = 0, validationError = 0;

        // Let's now grab the training data:
        double[][] trainingInputs = inputs.Submatrix(indicesTrain);
        int[] trainingOutputs = outputs.Submatrix(indicesTrain);

        // And now the validation data:
        double[][] validationInputs = inputs.Submatrix(indicesValidation);
        int[] validationOutputs = outputs.Submatrix(indicesValidation);

        // Create a new kernel
        IKernel kernel = Gaussian.Estimate(trainingInputs);

        // Complexity
        //var complexity = SequentialMinimalOptimization.EstimateComplexity(kernel, trainingInputs);

        // Create a new Multi-class Support Vector Machine
        var model = new MulticlassSupportVectorMachine(numeroDimensiones, kernel, numeroClases);

        // Create the Multi-class learning algorithm for the machine
        var teacher = new MulticlassSupportVectorLearning(model, trainingInputs, trainingOutputs);

        // Configure the learning algorithm to use SMO to train the
        // underlying SVMs in each of the binary class subproblems.
        teacher.Algorithm = (svm, classInputs, classOutputs, i, j) =>
            new SequentialMinimalOptimization(svm, classInputs, classOutputs);

        // Run the learning algorithm
        trainingError = teacher.Run();

        // Get the validation errors
        validationError = teacher.ComputeError(validationInputs, validationOutputs);

        // Return a new information structure containing the model and the errors achieved.
        return new CrossValidationValues<MulticlassSupportVectorMachine>(model, trainingError, validationError);
    };

    // Compute the cross-validation
    var result = crossvalidation.Compute();
    result.Save(SVMResultsFile);

    // Finally, access the measured performance.
    result.Training.Tag = "Training results";
    result.Validation.Tag = "Validation results";
    trainingErrors = result.Training;
    validationErrors = result.Validation;

    // Keep the model with the smallest validation error
    // (note: compare against the validation minimum, not the training minimum).
    var minIndex = result.Validation.Values.Find(v => v == result.Validation.Values.Min()).FirstOrDefault();
    var minModelValues = result.Models[minIndex];
    this.model = minModelValues.Model;
}


If you have any recommendation or suggestion, I'd really appreciate it.

Thank you,
Jorge Morales


César

Jul 7, 2014, 4:41:54 AM7/7/14
to accor...@googlegroups.com
Hi Jorge!

First, thanks for the interest in the framework, I hope it can be helpful to you!

One of the first things I would suggest changing is the cache size for the SMO algorithm. SMO uses a cache internally to trade memory for CPU time. The problem is that, if too much memory is used, the CPU may spend more time swapping memory pages than doing any real work. For this reason, you can set the CacheSize property of the SMO algorithm to a smaller value (please see the second question in the FAQ). Also, I see that you are already following the two recommendations in the FAQ for the Complexity and Sigma parameters, which should help.
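To see why the cache matters at MNIST scale, here is a rough back-of-the-envelope estimate (my own arithmetic, not something from the framework): a full kernel matrix over one 10-fold training split of the 60,000 samples would need

```csharp
using System;

class KernelCacheEstimate
{
    static void Main()
    {
        // 60,000 MNIST samples with 10-fold CV leaves ~54,000 training
        // vectors per fold. A *full* kernel cache would hold one
        // double-precision entry per pair of training vectors.
        long n = 54000;
        long bytes = n * n * sizeof(double);

        Console.WriteLine(bytes);              // 23328000000 bytes...
        Console.WriteLine(bytes / (1L << 30)); // ...about 21 GiB
    }
}
```

That is far beyond the ~8 GB observed before the freeze, which is why bounding CacheSize (so only a subset of kernel rows is cached at a time) makes the difference. SMO caches rows on demand rather than the whole matrix at once, but an unbounded cache still grows toward this ceiling.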

Now, just to get a first model working, I would suggest starting with a Linear kernel, and also setting the Tolerance parameter to a higher value, such as 0.9. This will make the algorithm stop sooner, though it may of course not find the best solution. This way you can get an idea of how much time it would take to learn your problem and whether SMO is feasible in this case. Afterwards you can decrease the Tolerance back to a normal value (even 0.2 might still work and give good results).

After you finish experimenting with the Linear kernel, try a 2nd-degree Polynomial first, and only then move to the Gaussian. The Gaussian is by far the most costly of the three. Hope it helps!
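For reference, the advice above translates to something like the following configuration of the SMO delegate from Jorge's code (a sketch only: 0.9 and 500 are starting-point values from this thread, not tuned constants):

```csharp
// Start simple: a Linear kernel is far cheaper than Gaussian or Polynomial.
IKernel kernel = new Linear();

// Configure SMO with a loose tolerance (stops earlier, rougher solution)
// and a bounded kernel cache to keep memory usage under control.
teacher.Algorithm = (svm, classInputs, classOutputs, i, j) =>
    new SequentialMinimalOptimization(svm, classInputs, classOutputs)
    {
        Tolerance = 0.9, // lower back toward ~0.2 once a first model trains
        CacheSize = 500  // a smaller cache trades CPU time for memory
    };
```

Once this finishes in a reasonable time, tighten Tolerance, and only then move on to the costlier kernels.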

Best regards,
Cesar

Jorge Morales Vidal

Jul 7, 2014, 10:50:48 AM7/7/14
to accor...@googlegroups.com
Hi César!
I have followed your guidelines and it works! Awesome, I almost cried after seeing it finish with the results. The predictions are really good, and I hope I have time to test the MNIST data with the other kernel functions you suggest. Thanks César, that was amazing!

Accord.NET is a really great framework. It is helpful as I need to compare several algorithms (I'm currently focusing on the DeepBeliefNetwork implementation; it's running on an old AMD Turion 64 X2 laptop and I hope it finishes soon, but that could be a topic for another post).

Thank you!

Jorge Morales

carlos....@gmail.com

Jul 9, 2014, 12:52:55 PM7/9/14
to accor...@googlegroups.com
Hi!
First of all, thank you César for this outstanding work on Accord (I have been following it for the past 2 months and I am completely astonished).

Now, to the point:
I used this Cross Validation example (thank you, Jorge) and I came across an error I can neither fix nor understand :X

I am getting an error that says "Additional information: Training algorithm needs at least one training vector."
The obvious suspect would be my "input" vector being empty, despite not being null. But it is not empty!
Looking at the CrossValidation method as Jorge posted it, when I run "teacher.Algorithm ..." the input it receives comes from "trainingInputs", which I do not control (right?).
Could it be that, with a small amount of data (my original input vector only has 40 observations), the cross-validation can't divide the training and validation sets properly?
This is my code:

public CrossValidation(double[][] inputs, int[] outputs, out CrossValidationStatistics trainingErrors, out CrossValidationStatistics validationErrors, string path)
{
    var crossvalidation = new CrossValidation<MulticlassSupportVectorMachine>(size: inputs.Length, folds: 10);
    crossvalidation.Fitting = delegate(int index, int[] indicesTrain, int[] indicesValidation)
    {
        double trainingError = 0, validationError = 0;

        // Let's now grab the training data:
        double[][] trainingInputs = inputs.Submatrix(indicesTrain);
        int[] trainingOutputs = outputs.Submatrix(indicesTrain);

        // And now the validation data:
        double[][] validationInputs = inputs.Submatrix(indicesValidation);
        int[] validationOutputs = outputs.Submatrix(indicesValidation);

        // Create a new kernel
        //IKernel kernel = Gaussian.Estimate(trainingInputs);
        IKernel kernel = new Linear();

        // Complexity
        //var complexity = SequentialMinimalOptimization.EstimateComplexity(kernel, trainingInputs);

        // Create a new Multi-class Support Vector Machine
        // (the number of classes should come from the outputs, not the inputs)
        var model = new MulticlassSupportVectorMachine(trainingInputs[0].Length, kernel, outputs.Distinct().Length);

        // Create the Multi-class learning algorithm for the machine
        var teacher = new MulticlassSupportVectorLearning(model, trainingInputs, trainingOutputs);

        // Configure the learning algorithm to use SMO to train the
        // underlying SVMs in each of the binary class subproblems.
        teacher.Algorithm = (svm, classInputs, classOutputs, i, j) =>
            new SequentialMinimalOptimization(svm, classInputs, classOutputs)
            {
                Tolerance = 0.1
            };

        // Run the learning algorithm
        trainingError = teacher.Run();

        // Get the validation errors
        validationError = teacher.ComputeError(validationInputs, validationOutputs);

        // Return a new information structure containing the model and the errors achieved.
        return new CrossValidationValues<MulticlassSupportVectorMachine>(model, trainingError, validationError);
    };

    // Compute the cross-validation
    var result = crossvalidation.Compute();
    result.Save(path);

    // Finally, access the measured performance.
    result.Training.Tag = "Training results";
    result.Validation.Tag = "Validation results";
    trainingErrors = result.Training;
    validationErrors = result.Validation;

    // Keep the model with the smallest validation error
    // (note: compare against the validation minimum, not the training minimum).
    var minIndex = result.Validation.Values.Find(v => v == result.Validation.Values.Min()).FirstOrDefault();
    var minModelValues = result.Models[minIndex];
    this.classifier = minModelValues.Model;
}

PS: I'm having a really hard time joining the group (if I even can), or setting my name properly; I can't even format this response with code and such. I'm Carlos Sotelo.

carlos....@gmail.com

Jul 9, 2014, 1:10:12 PM7/9/14
to accor...@googlegroups.com
Something I realized might be causing the trouble...
Some of the classes only have one sample. I remember reading that this is problematic for some kernels like the Gaussian; could this be the issue here?

César

Jul 9, 2014, 1:32:54 PM7/9/14
to accor...@googlegroups.com
Hi Carlos!

Thanks for the message, I hope the framework can be useful to you! Indeed, the problem is exactly what you mentioned. The CrossValidation technique doesn't take the labels of the samples into account; it just randomly splits the data into different sets. The problem is that some of the classes might end up with only one sample, and when this happens, some machine learning methods fall apart (such as the multiclass, one-vs-one support vector machines being used here).

In this case, I am still thinking about what could be done... I could try to add another method to select the partitions better, ensuring that at least n samples from each class stay in each partition. However, this would also require that there are enough samples from each class to actually create those partitions in the first place. I suppose, in this case, that it would be better to do a smaller validation, such as 5-fold or even 3-fold, and see if it helps.
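The failure mode can be seen without Accord at all. The sketch below (plain C#; the setup mirrors Carlos's 40 observations, with one class deliberately given a single sample) shows that whenever the singleton's fold is held out for validation, the training partition contains zero vectors of that class, which is exactly what the SMO error message complains about:

```csharp
using System;
using System.Linq;

class SingletonClassDemo
{
    static void Main()
    {
        // 40 samples; class 0 has exactly ONE sample (index 0), the rest
        // are spread over classes 1..9.
        int[] labels = new int[40];
        var rnd = new Random(42);
        for (int i = 1; i < labels.Length; i++)
            labels[i] = 1 + rnd.Next(9);

        // Label-agnostic 10-fold assignment, as a plain cross-validation
        // split would do (a random permutation of balanced fold ids).
        int[] fold = Enumerable.Range(0, labels.Length)
                               .Select(i => i % 10)
                               .OrderBy(_ => rnd.Next())
                               .ToArray();

        // Consider the round where the singleton's fold is the validation
        // set: how many class-0 samples remain available for training?
        int singletonFold = fold[0];
        int class0InTraining = Enumerable.Range(0, labels.Length)
            .Count(i => labels[i] == 0 && fold[i] != singletonFold);

        // Zero: every "0 vs j" binary subproblem is left with no class-0
        // training vector, no matter how the random split falls.
        Console.WriteLine(class0InTraining);
    }
}
```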

I am wondering if I can add some feature to do this in an automated way. Thanks for exposing the problem, I hope we find a solution!

Best regards,
Cesar

César

Jul 9, 2014, 3:08:59 PM7/9/14
to accor...@googlegroups.com
Hi Carlos!

I *think* you should be able to use the following code to generate properly balanced partitions for your cross-validation. However, please note that it will only work if you have enough samples from each class to fill every fold evenly. It should probably work if you decrease the number of folds, as I suggested earlier:

public static int[] RandomGroups(int[] labels, int classes, int groups)
{
    int size = labels.Length;

    // One bucket of (index, label) pairs per class
    var buckets = new List<Tuple<int, int>>[classes];
    for (int i = 0; i < buckets.Length; i++)
        buckets[i] = new List<Tuple<int, int>>();

    for (int i = 0; i < labels.Length; i++)
        buckets[labels[i]].Add(Tuple.Create(i, labels[i]));

    for (int i = 0; i < buckets.Length; i++)
        Accord.Statistics.Tools.Shuffle(buckets[i]);

    var partitions = new List<Tuple<int, int>>[groups];
    for (int i = 0; i < partitions.Length; i++)
        partitions[i] = new List<Tuple<int, int>>();

    // We are going to take samples from the buckets and assign them to
    // groups. For this, we will be following the buckets in order,
    // such that new samples are drawn equally from each bucket.

    bool allEmpty = true;
    int bucketIndex = 0;
    int partitionIndex = 0;

    do
    {
        for (int i = 0; i < partitions.Length; i++)
        {
            allEmpty = true;

            var currentPartition = partitions[partitionIndex];
            partitionIndex = (partitionIndex + 1) % partitions.Length;

            for (int j = 0; j < buckets.Length; j++)
            {
                var currentBucket = buckets[bucketIndex];
                bucketIndex = (bucketIndex + 1) % buckets.Length;

                if (currentBucket.Count == 0)
                    continue;

                allEmpty = false;

                var next = currentBucket[currentBucket.Count - 1];
                currentBucket.RemoveAt(currentBucket.Count - 1);
                currentPartition.Add(next);
            }
        }

    } while (!allEmpty);

    for (int i = 0; i < partitions.Length; i++)
        Accord.Statistics.Tools.Shuffle(partitions[i]);

    // Map each sample index to the fold it was assigned to
    int[] splittings = new int[labels.Length];
    for (int i = 0; i < partitions.Length; i++)
        foreach (var index in partitions[i])
            splittings[index.Item1] = i;

    return splittings;
}

To use it, please class this RandomGroups method, passing the labels of your samples, the number of different classes, and the number of folds. The method should generate an int[] vector representing the cross-validation folds for the data. Afterwards, pass this int[] array into one of the cross-validation constructors; there is a constructor that accepts an int[] array and the number of folds you selected.
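In code, the intended usage looks something like the sketch below (hedged: the exact constructor overload may differ between framework versions, and `labels` stands for your own class vector):

```csharp
// Build balanced fold indices from the class labels (e.g. 3 folds,
// as suggested above for small datasets):
int[] folds = RandomGroups(labels, classes: 10, groups: 3);

// Hand the precomputed assignment to the cross-validation constructor
// that accepts an int[] of fold indices plus the number of folds:
var crossvalidation = new CrossValidation<MulticlassSupportVectorMachine>(folds, 3);
```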

César

Jul 9, 2014, 3:15:51 PM7/9/14
to accor...@googlegroups.com
I mean, please "call", not "class" the method.

Best regards,
Cesar

carlos....@gmail.com

Jul 10, 2014, 5:51:52 AM7/10/14
to accor...@googlegroups.com
Hi again ;)
I suppose we have a huge time gap between us :P. Nevertheless, thank you a lot for these fast, clear and tremendously helpful answers! :D

Thank you also for confirming some of my doubts and even answering other questions I had. In fact, my problem for now is quite simple: get more data ;)
I was simply messing around with the cross validation to start seeing some actual numbers, and kind of forgot that the amount of data could be an issue (as we just verified). At least this way we conclude the obvious: I need more samples.

I also thought it would be best to briefly describe my problem.
I am trying to recognize hand movements (as in sign language) from the actual positions of both hands. At first I was using a Multiclass SVM and a simple HMM to classify the movements. I was using 6 dimensions (3 for each hand) and realized this could be an issue, so I was thinking of using a machine for each hand... Is this feasible? I mean, I can obviously classify each hand separately, but I can't see how I could correlate the results afterwards. I ask because, in some cases, using the 6 dimensions was costly for the teaching process...

But for now, my main problem is to fix something I thought was correct, which has nothing to do with machine learning: the depth data from my hands needs to be normalized, so I can compare movements done 1 m from the sensor with movements done at 2 m or 3 m.

So I'll keep working on these issues, and I'll let you know how the answers you provided work out :D

Lastly, a minor fix: I couldn't compile your function (RandomGroups) because of this line...

for (int i = 0; i < partitions.Length; i++)
Accord.Statistics.Tools.Shuffle(partitions[i]);

adding the .ToArray() method fixed it ;) :
Accord.Statistics.Tools.Shuffle(partitions[i].ToArray());

PS: how can I format my message like you do? Putting the code like that, with bold, italic and other stuff? I can't find a way to do that :(

César

Jul 10, 2014, 6:12:07 AM7/10/14
to accor...@googlegroups.com
If you wish, may I suggest taking a look at some of the papers I wrote about sign recognition? I worked on roughly the same problem as you before, using SVMs and HMMs on depth images for sign language recognition. My processing routine would first classify an image window around the hand into a set of 46 possible hand shapes. With this, I would obtain an integer representing the current shape of the user's hand for each frame. Afterwards, I would take this integer, join it with the location of the hand and some other information to create a feature vector similar to <hand_shape, hand_dx, hand_dy, hand_angle, face_angle>, and send this to a Hidden Markov Model created with an independent joint distribution of a GeneralDiscreteDistribution (for the hand shape) and NormalDistributions for the other features.

The paper is available below, and if you would like to cite it as well, please feel welcome (if you are really thankful for the framework, this would be the best reward I could get ;-).
  1. Souza, C. R., Pizzolato, E. B.; Sign Language Recognition with Support Vector Machines and Hidden Conditional Random Fields: Going from Fingerspelling to Natural Articulated Words. Machine Learning and Data Mining in Pattern Recognition, Lecture Notes in Computer Science Volume 7988, 2013, pp 84-98. DOI: 10.1007/978-3-642-39712-7_7 [Manuscript]

Unfortunately it is not possible to edit posts in this group; I suppose this is a (very inconvenient) Google Groups limitation... As for joining the group, I suppose you must have a Google account to do so. Code can be colored if you click the { } icon in the upper right corner of the message editor. Hope it helps!

Best regards,
Cesar




carlos....@gmail.com

Jul 10, 2014, 6:34:46 AM7/10/14
to accor...@googlegroups.com
I just feel like the worst researcher for not finding this precise work before!! My problem is exactly the same... and the funny thing is, I took almost exactly the same approach. The thing I am/was lacking was precisely how to connect the hand posture with the hand movement and then feed that to a classifier!! I am not sure, but I think you are familiar with Rui Almeida's thesis from 2011 (Portuguese Sign Language Recognition via Computer Vision and Depth Sensor)... He was the one who pointed me in your direction (so to speak).

Just by reading the title of your article I realized those words describe precisely my thesis: "Going from Fingerspelling to Natural Articulated Words".
Almeida did exactly the fingerspelling part for Portuguese Sign Language, and I'm continuing his work to achieve the natural articulated words (with the hand postures merged with the hand movements).

I just feel I should've talked to you like 2 months ago

PS: I wasn't able to find your contact info, so as not to extend our conversation here, since it is becoming a little off-topic...

César

Jul 10, 2014, 8:44:23 AM7/10/14
to accor...@googlegroups.com
Do you by any chance speak Portuguese? I have a complete master's thesis on this subject, but unfortunately it is still available only in Brazilian Portuguese:

If not, we can share lots of other stuff as well! My personal email address is available at the top of any of the framework's source code files.

César

Jul 10, 2014, 8:52:23 AM7/10/14
to accor...@googlegroups.com
And I forgot to say: yes, I had seen Almeida's thesis before! It is funny because I created the framework more or less to work on the sign recognition problem, and I suppose he was able to use my framework even before me. It was an interesting work, I must say!

In any case, early on I also wrote some papers on fingerspelling recognition, in case you are interested:
  1. Souza, C. R., Pizzolato, E. B., Anjo, M. S.; Fingerspelling Recognition with Support Vector Machines and Hidden Conditional Random Fields. Advances in Artificial Intelligence – IBERAMIA 2012. Lecture Notes in Computer Science Volume 7637, 2012, pp 561-570. DOI: 10.1007/978-3-642-34654-5_57 [Manuscript] [Slideshow]

Best regards,
Cesar