Scaling of chain CRF


nandal.su...@gmail.com

Jun 7, 2016, 11:07:04 AM
to pystruct
Hello all,

Problem: I always run out of memory when training a ChainCRF. I am using a dataset of 60K sentences, each with an average length of 15-20 words, and each word is represented by a 300-dimensional vector.

Computer config.: I have 8 GB of RAM.

Question: Is it possible to scale the implementation to such a big dataset? I do not want to use PCA for dimensionality reduction.

Currently I am collecting my streaming data and passing it to ChainCRF in the required format.
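
For reference, here is a minimal sketch of the format I pass to ChainCRF, together with a rough estimate of the raw feature size. The counts come from the numbers above; float64 storage and 5 label states are assumptions, and the learner needs extra working memory on top of the raw features:

import numpy as np

n_sentences, avg_len, n_features = 60000, 18, 300

# X: a list with one (n_words, n_features) float array per sentence
# Y: a list with one (n_words,) integer label array per sentence
X = [np.random.randn(avg_len, n_features) for _ in range(3)]    # toy sentences
Y = [np.random.randint(0, 5, size=avg_len) for _ in range(3)]   # toy labels, 5 states assumed

# Rough size of the raw features alone, at 8 bytes per float64 value:
approx_gb = n_sentences * avg_len * n_features * 8 / 1e9
print("approx. features in RAM: %.1f GB" % approx_gb)           # about 2.6 GB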

Thanks & Regards
Surender

Roshan Santhosh

Jun 11, 2016, 1:31:41 PM
to pystruct
What is the size of your entire dataset? If it exceeds 8 GB, you will have to split it up into smaller files.

A possible solution would be to train your model in batches: import the first input file, train on it, and extract the model weights. Then initialize a new model with the weights from the previous one, train it on the next input file, and so on for all the input files.

A potential problem with this approach is that the overall model can overfit to the last input file. You can overcome this by training for multiple passes, choosing the input files in a different order on each pass. Afterwards you can either average the weights of all the models or ensemble their predictions.
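
Here is a minimal sketch of that batching idea, assuming the data has already been split into per-batch .npz files; the file names, the number of label states, and the choice of FrankWolfeSSVM as the learner are placeholders. It trains one model per batch and then averages the weight vectors, which only relies on the learner exposing its learned weights as ssvm.w:

import numpy as np
from pystruct.models import ChainCRF
from pystruct.learners import FrankWolfeSSVM

N_STATES = 10      # number of output labels -- set to your tag set size (assumption)
N_FEATURES = 300   # 300-dimensional word vectors, as in the question
batch_files = ["batch_0.npz", "batch_1.npz", "batch_2.npz"]   # hypothetical file names

weights = []
for path in batch_files:
    data = np.load(path, allow_pickle=True)
    X = list(data["X"])   # list of (n_words, 300) feature arrays
    Y = list(data["Y"])   # list of (n_words,) integer label arrays
    crf = ChainCRF(n_states=N_STATES, n_features=N_FEATURES)
    ssvm = FrankWolfeSSVM(model=crf, C=0.1, max_iter=50)
    ssvm.fit(X, Y)
    weights.append(ssvm.w)    # learned weight vector for this batch

# Combine the per-batch models by averaging their weight vectors,
# then predict with the averaged weights (no further training needed):
final_crf = ChainCRF(n_states=N_STATES, n_features=N_FEATURES)
final_ssvm = FrankWolfeSSVM(model=final_crf, C=0.1)
final_ssvm.w = np.mean(weights, axis=0)
# y_pred = final_ssvm.predict(X_test)

If you would rather warm-start each batch from the previous weights instead of averaging, you could copy the previous ssvm.w into the new learner before fitting, but check whether your pystruct version's fit supports a warm_start flag first.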

Andy

Jun 20, 2016, 10:02:03 PM
to pyst...@googlegroups.com
Hi Surender.
Sorry for the late reply.
This should totally work. How big is your input in RAM?
And do you get an out-of-memory error? If so, where?
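
To get that number, something like this should do; it is just a rough check that sums the sizes of the per-sentence feature arrays and ignores the labels and the learner's own working memory:

def input_gb(X):
    # X: the list of per-sentence (n_words, n_features) numpy arrays you pass to fit
    return sum(x.nbytes for x in X) / 1e9

# print("input features: %.2f GB" % input_gb(X))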

Andy
