OutOfMemoryError with Mallet CRF classifier


tilak kumar

Nov 9, 2015, 2:36:36 PM
to cleartk-users
Hi,

The classifier frequently fails with an OutOfMemoryError. A heap dump shows that 42% of the heap is char[] and 15% is String.
java.lang.OutOfMemoryError: Java heap space
    at cc.mallet.types.IndexedSparseVector.setIndex2Location(IndexedSparseVector.java:109)
    at cc.mallet.types.IndexedSparseVector.dotProduct(IndexedSparseVector.java:157)
    at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1856)
    at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1835)
    at cc.mallet.fst.CRF$State.transitionIterator(CRF.java:1776)
    at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:252)
    at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:197)
    at cc.mallet.fst.MaxLatticeDefault$Factory.newMaxLattice(MaxLatticeDefault.java:494)
    at cc.mallet.fst.MaxLatticeFactory.newMaxLattice(MaxLatticeFactory.java:11)
    at cc.mallet.fst.Transducer.transduce(Transducer.java:124)
    at org.cleartk.ml.mallet.MalletCrfStringOutcomeClassifier.classify(MalletCrfStringOutcomeClassifier.java:90)


The model is created with MalletCrfStringOutcomeDataWriter:
AnalysisEngineFactory.createEngineDescription(
        DataChunkAnnotator.class,
        CleartkSequenceAnnotator.PARAM_IS_TRAINING, true,
        DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY, options.getModelsDirectory(),
        DefaultSequenceDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME, MalletCrfStringOutcomeDataWriter.class)

The annotator code looks as follows.
if (this.isTraining()) {
  // Training: derive the gold chunk labels and write them with the features.
  List<DataAnnotation> namedEntityMentions = JCasUtil.selectCovered(jCas, DataAnnotation.class, sentence);
  List<String> outcomes = this.chunking.createOutcomes(jCas, tokens, namedEntityMentions);
  this.dataWriter.write(Instances.toInstances(outcomes, featureLists));
} else {
  // Classification: predict labels and turn them back into chunk annotations.
  List<String> outcomes = this.classifier.classify(featureLists);
  this.chunking.createChunks(jCas, tokens, outcomes);
}

Please suggest.


Thanks
Tilak

Steven Bethard

Nov 9, 2015, 2:39:32 PM
to cleartk-users
Mallet can require a lot of memory. Consider increasing your Java heap space, e.g.,
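Whatever heap setting is chosen, it helps to confirm that the JVM running the pipeline actually received it. A minimal sketch, assuming the limit is passed with the standard `-Xmx` option; the `HeapCheck` class is hypothetical and not part of ClearTK or Mallet:

```java
// Hypothetical helper (not part of ClearTK or Mallet): prints the heap
// limits that the running JVM actually received.
final class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum heap the JVM will attempt to use, in MB (governed by -Xmx). */
    public static long maxHeapMegabytes() {
        return toMegabytes(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB):          " + maxHeapMegabytes());
        System.out.println("committed heap (MB):    " + toMegabytes(rt.totalMemory()));
        System.out.println("free in committed (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```

Running it as `java -Xmx4g HeapCheck` should report a maximum close to 4096 MB; a much smaller number suggests the option is not reaching the process that hosts the pipeline.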


Steve


tilak kumar

Nov 9, 2015, 11:32:01 PM
to cleartk-users
Thanks Steven.
We have a UIMA pipeline that invokes 5 model jars (based on Mallet CRF) of around 30 MB each. -Xms is set to 2G and -Xmx is set to 4G.
Every fourth time we invoke the pipeline, it runs into an OutOfMemoryError. Are there any guidelines or benchmarks for setting the heap space?

It looks like Mallet CRF has memory leak issues: https://code.google.com/p/cleartk/issues/detail?id=408
The out-of-memory error is reproducible even after applying the patch.
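
One way to tell a leak from a heap that is simply too small is to record used heap after every pipeline invocation and see whether it climbs each time. A rough sketch using only the JDK's `MemoryMXBean`; the `HeapTrend` class and the commented-out `runPipeline()` call are hypothetical placeholders, not ClearTK or Mallet API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: records used heap after each pipeline run so that
// steady growth across runs becomes visible.
final class HeapTrend {
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private final List<Long> usedAfterRun = new ArrayList<>();

    /** Call once after each pipeline invocation; returns used heap in bytes. */
    public long record() {
        System.gc(); // only a hint to the JVM, so readings are approximate
        long used = memory.getHeapMemoryUsage().getUsed();
        usedAfterRun.add(used);
        return used;
    }

    /** Returns a copy of the samples recorded so far. */
    public List<Long> samples() {
        return new ArrayList<>(usedAfterRun);
    }

    /** True if there are at least two samples and each exceeds the previous one. */
    public static boolean strictlyIncreasing(List<Long> samples) {
        for (int i = 1; i < samples.size(); i++) {
            if (samples.get(i) <= samples.get(i - 1)) {
                return false;
            }
        }
        return samples.size() > 1;
    }

    public static void main(String[] args) {
        HeapTrend trend = new HeapTrend();
        for (int run = 1; run <= 4; run++) {
            // runPipeline(); // placeholder for the real UIMA pipeline call
            long usedMb = trend.record() / (1024L * 1024L);
            System.out.println("run " + run + ": used heap = " + usedMb + " MB");
        }
        System.out.println("grows on every run: " + strictlyIncreasing(trend.samples()));
    }
}
```

Used heap that rises on every run and never falls back points toward something being retained between invocations, whereas a flat baseline with a large peak points toward the heap limit itself.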

Thanks
Tilak

Steven Bethard

Nov 11, 2015, 4:44:05 PM
to cleart...@googlegroups.com
I see. So the error is in a long-running classification process. Can you tell if your alphabet size is growing, like it was in the unpatched version?
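
To illustrate what a growing alphabet means: a feature alphabet maps feature strings to integer indices, and if lookups keep adding unseen features at classification time, the map grows with every new document. The sketch below is a stdlib-only stand-in, not Mallet's actual `Alphabet` class; the `FeatureIndex` name and its methods are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a feature alphabet; NOT Mallet's Alphabet class.
final class FeatureIndex {
    private final Map<String, Integer> indices = new HashMap<>();
    private boolean growthStopped = false;

    /**
     * Returns the index of a feature. Unseen features are added while growth
     * is allowed; once growth is stopped they yield -1 and are not stored.
     */
    public int lookup(String feature) {
        Integer existing = indices.get(feature);
        if (existing != null) {
            return existing;
        }
        if (growthStopped) {
            return -1;
        }
        int next = indices.size();
        indices.put(feature, next);
        return next;
    }

    /** Freezes the index so later lookups can no longer add entries. */
    public void stopGrowth() {
        growthStopped = true;
    }

    public int size() {
        return indices.size();
    }

    public static void main(String[] args) {
        FeatureIndex index = new FeatureIndex();
        index.lookup("word=the"); // features seen during training
        index.lookup("word=cat");
        index.stopGrowth();       // freeze before classification
        for (int i = 0; i < 1000; i++) {
            index.lookup("word=unseen-" + i); // unseen features at classification time
        }
        System.out.println("size after classification: " + index.size());
    }
}
```

Logging the equivalent size on the real model before and after a batch of classifications would show whether it stays constant; if it keeps rising, the alphabet is a likely source of the retained strings.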

If that doesn't seem to be the problem, you might ask on the Mallet list. We have a fairly thin wrapper over Mallet, so they might be able to suggest what else in Mallet might cause an OutOfMemoryError.

Steve



tilak kumar

Nov 16, 2015, 3:52:10 AM
to cleartk-users
Thanks Steve. Can you please point me to the Mallet list?
I could not find its Google group. Mail to majo...@cs.umass.edu or mal...@cs.umass.edu fails.

Steven Bethard

Nov 16, 2015, 11:08:05 AM
to cleart...@googlegroups.com
I think those are the right addresses to email, but they're also failing for me. In the meantime, you could try:

Adding an issue on the Mallet tracker on Github:

Posting a "mallet" question on StackOverflow:

That said, it looks like the Mallet developers are about as bad at responding to these things as we ClearTK developers are. =)

Steve


tilak kumar

Nov 19, 2015, 12:58:35 PM
to cleartk-users