Streaming incremental learner?

Rick Rauschke

Apr 22, 2014, 3:32:34 AM

to accor...@googlegroups.com

Greetings, I'm really new to Accord.NET and have searched around, but I can't find a way to stop my multiclass SVM learner from crashing somewhere between 10K and 25K records:

...System.OutOfMemoryException: Array dimensions exceeded supported range.

Ideally, I would like to train over the full set of 125K records. I've done all the preprocessing I can (feature reduction and normalization), so I'm running out of options for using my full training set. I know there are alternatives such as sampling a subset of the larger set, but I'd really like to use all of the available training data.

Does the Accord framework support any kind of streaming or incremental learning for very large datasets? I read that Weka, for instance, has this capability in several of its learning approaches, but I'm not sure whether I'm overlooking something in Accord.NET.

Any help or direction greatly appreciated.

Regards,

Rick

César

Apr 23, 2014, 4:32:50 AM

to accor...@googlegroups.com

Hi Rick!

Unfortunately, the framework doesn't currently support online learning for SVMs. However, it might be possible to work around the issue by setting the cache property of the sequential minimal optimization (SMO) learner, as detailed here. If you decrease the CacheSize value, fewer support vectors will be stored in memory during optimization. Be aware that setting it too low may also slow things down considerably.
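To get a feel for what CacheSize values might be worth trying, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions the thread does not confirm: that the cache holds up to CacheSize rows of an n-by-n matrix of 8-byte doubles, that by default it holds all n rows, and that the runtime caps a single array at about 2 GB. It also ignores the fact that a multiclass learner may train each sub-problem on only part of the data. The function names are made up for illustration and are not part of Accord.NET.

```python
# Rough memory estimate for a kernel cache (illustrative assumptions only).
BYTES_PER_DOUBLE = 8
ASSUMED_ARRAY_LIMIT = 2 * 1024**3  # assumed ~2 GB cap on a single array


def cache_bytes(n_records, cache_rows=None):
    """Bytes needed to cache `cache_rows` rows of an n-by-n matrix of doubles.

    With cache_rows=None the whole matrix is assumed to be cached.
    """
    rows = n_records if cache_rows is None else min(cache_rows, n_records)
    return rows * n_records * BYTES_PER_DOUBLE


def max_rows_within(limit_bytes, n_records):
    """Largest number of cached rows that stays within `limit_bytes`."""
    return min(n_records, limit_bytes // (n_records * BYTES_PER_DOUBLE))


if __name__ == "__main__":
    for n in (10_000, 25_000, 125_000):
        full = cache_bytes(n)
        fits = "fits" if full <= ASSUMED_ARRAY_LIMIT else "exceeds the limit"
        rows = max_rows_within(ASSUMED_ARRAY_LIMIT, n)
        print(f"{n:>7} records: full cache = {full / 1e9:.1f} GB ({fits}); "
              f"rows that fit in the assumed limit: {rows}")
```

Under these assumptions a full cache stays within the limit at 10K records but not at 25K, which would be consistent with the crash range reported above, and a cache of a couple of thousand rows would be a starting point to experiment with at 125K records.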

Please let me know if it helps!

Best regards,

Cesar

Rick Rauschke

Apr 23, 2014, 10:45:21 AM

to accor...@googlegroups.com

Greetings Cesar, thanks for the information. I will test different cache settings for the SMO learner as you suggest. I really like this framework, and I'm confident that even if I do have to reduce the training set, I will still be able to develop a very accurate representation of the full dataset.