WordNet Lemmatizer?

Chandra Shekhar

Feb 12, 2016, 5:34:33 AM
to Factorie
What are the basic functions for lemmatizing a string using
import cc.factorie.app.nlp.lemma._ ?
I can't figure it out. For example, "Studied" should be lemmatized to "study".

Emma Strubell

Feb 12, 2016, 10:43:34 AM
to Factorie
Hi Chandra,

Not sure what your particular issue might have been (I suspect it was a lack of necessary pre-processing: tokenization and POS tagging). Our lemmatizer is designed to run on entire documents rather than one token at a time. It can be run on a single token, but it requires a part-of-speech tag. Our part-of-speech tagger is likely to perform somewhat poorly if given just a single token at a time, because many of its features are based on a token's context in its sentence, and that context would be missing.
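The following toy, self-contained sketch illustrates why the tag matters. It follows the spirit of WordNet's "morphy" procedure. It first checks an exception list. Failing that, it strips POS-specific suffixes and keeps the first candidate found in the dictionary. This is a simplification and not Factorie's WordNetLemmatizer. `ToyMorphy`, its tiny lexicon, and its exception entries are hypothetical stand-ins for the real WordNet dictionary files. The mapping from Penn Treebank tags to coarse categories is also an assumption.

```scala
// Toy WordNet-style ("morphy") lemmatizer: a hypothetical sketch, not Factorie's code.
// The lexicon and exception entries are tiny made-up stand-ins for the WordNet dictionary files.
object ToyMorphy {
  // Collapse a Penn Treebank tag to a coarse category (None = leave the word alone).
  def coarse(pennTag: String): Option[String] =
    if (pennTag.startsWith("NN")) Some("noun")
    else if (pennTag.startsWith("VB")) Some("verb")
    else if (pennTag.startsWith("JJ")) Some("adj")
    else None

  // Suffix-detachment rules (suffix -> replacement), one list per category.
  val rules: Map[String, Seq[(String, String)]] = Map(
    "noun" -> Seq("s" -> "", "ses" -> "s", "xes" -> "x", "zes" -> "z",
                  "ches" -> "ch", "shes" -> "sh", "men" -> "man", "ies" -> "y"),
    "verb" -> Seq("s" -> "", "ies" -> "y", "es" -> "e", "es" -> "",
                  "ed" -> "e", "ed" -> "", "ing" -> "e", "ing" -> ""),
    "adj"  -> Seq("er" -> "", "est" -> "", "er" -> "e", "est" -> "e")
  )

  // Hypothetical mini-lexicon of valid base forms (stands in for WordNet's index files).
  val lexicon: Map[String, Set[String]] = Map(
    "noun" -> Set("study", "box", "child"),
    "verb" -> Set("study", "make", "walk"),
    "adj"  -> Set("large", "fast")
  )

  // Hypothetical exception entries for forms the suffix rules cannot reach
  // (e.g. "studied": stripping "ed" gives "studie" or "studi", neither a base form).
  val exceptions: Map[String, Map[String, String]] = Map(
    "noun" -> Map("children" -> "child"),
    "verb" -> Map("studied" -> "study"),
    "adj"  -> Map.empty[String, String]
  )

  def lemma(word: String, pennTag: String): String = {
    val w = word.toLowerCase
    coarse(pennTag) match {
      case None => w // e.g. determiners: returned unchanged (lowercased)
      case Some(pos) =>
        val lex = lexicon(pos)
        exceptions(pos).get(w)                  // 1. irregular form?
          .orElse(Some(w).filter(lex.contains)) // 2. already a base form?
          .orElse(rules(pos).collectFirst {     // 3. first rule giving a known base form
            case (suffix, repl) if w.endsWith(suffix) &&
                lex.contains(w.dropRight(suffix.length) + repl) =>
              w.dropRight(suffix.length) + repl
          })
          .getOrElse(w)                         // 4. give up: return the surface form
    }
  }
}
```

With this sketch, `ToyMorphy.lemma("Studied", "VBD")` gives "study" and `ToyMorphy.lemma("studies", "NNS")` also gives "study". `ToyMorphy.lemma("studied", "JJ")` comes back unchanged, because only the verb table knows that form. The same word can lemmatize differently depending on its tag. Real morphy also returns every valid candidate rather than just the first.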

The following code works for me, and provides examples of running on both a single token and a sentence:  

Oh, one other thing is that you'll need to depend on our all-models jar of NLP models (in addition to the base Factorie jar) in order to run NLP pipeline components.

Let us know if you have remaining issues/questions!

Emma

--
Factorie Discuss group.
To post, email: dis...@factorie.cs.umass.edu
To unsubscribe, email: discuss+u...@factorie.cs.umass.edu

Chandra Shekhar

Feb 13, 2016, 10:35:17 AM
to dis...@factorie.cs.umass.edu
Hi Emma,
Thanks for the help.
I had already tried with POS tags, but got stuck on the WordNet directory. The code you shared gives me errors with DeterministicNormalizingTokenizer and DeterministicSentenceSegmenter; I'm unable to resolve them.



Emma Strubell

Feb 13, 2016, 10:48:05 AM
to dis...@factorie.cs.umass.edu
Are you using Scala or Java? How are you resolving dependencies? Are you sure that your project depends on the all-models jar? (If you tell me how you are doing dependency management I can give you more info on how to include that dependency). What errors are you getting?

Chandra Shekhar

Feb 13, 2016, 10:52:32 AM
to dis...@factorie.cs.umass.edu
I'm building this project with SBT in Scala.
My dependencies in build.sbt look like this:
"org.scalanlp" %% "epic-parser-en-span" % "2015.1.25",
"org.scalanlp" %% "epic-pos-en" % "2015.1.25",
"org.scalanlp" %% "epic-ner-en-conll" % "2015.1.25",
"cc.factorie" % "factorie" % "1.0",
"com.jasonbaldridge" % "chalk" % "1.1.0"

And these are the errors I get with your code:
Error:(4, 8) object DeterministicNormalizingTokenizer is not a member of package cc.factorie.app.nlp.segment
import cc.factorie.app.nlp.segment.{DeterministicSentenceSegmenter, DeterministicNormalizingTokenizer}
       ^
Error:(12, 23) not found: value DeterministicNormalizingTokenizer
    val annotators = Seq(DeterministicNormalizingTokenizer, DeterministicSentenceSegmenter, OntonotesForwardPosTagger, WordNetLemmatizer)
                         ^
I had been trying the WordNet lemmatizer directly, but it requires a WordNet directory as an argument, which I haven't been able to resolve.
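A note on the WordNet directory. A WordNet installation's dictionary directory is usually named `dict`. It holds the lexicon files, including index files such as `index.verb` and the exception lists `noun.exc`, `verb.exc`, `adj.exc` and `adv.exc`. A WordNet-based lemmatizer that takes a directory argument presumably reads files like these. We have not checked exactly which files Factorie 1.0's loader expects.

The sketch below shows what an exception list contains. Each non-empty line holds an inflected form followed by one or more base forms, and the sketch parses that format into a map. `ExcParser` is a hypothetical helper, and the sample lines mentioned below are made up.

```scala
import scala.io.Source

// Sketch: load a WordNet-style exception list into a lookup table.
// Each non-empty line is "inflected_form base_form [base_form ...]", space-separated.
// Taking a Source lets you pass either a file, e.g. Source.fromFile(<wordnet-dir> + "/verb.exc")
// (a hypothetical path), or an in-memory string for testing.
object ExcParser {
  def parse(src: Source): Map[String, List[String]] =
    try {
      src.getLines()
        .map(_.trim)
        .filter(_.nonEmpty)                 // skip blank lines
        .map(_.split("\\s+").toList)        // split on runs of whitespace
        .collect { case inflected :: bases if bases.nonEmpty => inflected -> bases }
        .toMap                              // forces the iterator before close()
    } finally src.close()
}
```

Consider the two made-up lines `went go` and `axes ax axis`. Parsing them yields a map from "went" to `List(go)` and from "axes" to `List(ax, axis)`.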

Emma Strubell

Feb 13, 2016, 11:00:24 AM
to dis...@factorie.cs.umass.edu
Aha. My code is based on the snapshot version of Factorie. For my code to work you need to add the dependency:
"cc.factorie.app.nlp" % "all-models" % "1.2-SNAPSHOT"

and replace the Factorie dependency with:
"cc.factorie" %% "factorie" % "1.2-SNAPSHOT"

To use the latest release of Factorie (which is 1.1, not 1.0), your dependencies should be:
"cc.factorie.app.nlp" % "all-models" % "1.0.0"
"cc.factorie" %% "factorie" % "1.1"

(depending on how your Scala version is set in build.sbt, the versioning might not quite work, but this should be a start)
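The versioning caveat comes from sbt cross-versioning. With `%%`, sbt appends the project's Scala binary version to the artifact name, while `%` uses the name exactly as written. The Factorie coordinates above use `%%` and the all-models coordinates use `%`.

Below is a minimal sketch of the naming rule. It assumes stable (non-milestone) Scala versions. `CrossName` is a hypothetical helper, not part of sbt.

```scala
// Sketch of the naming convention behind sbt's %% operator (not sbt's own code):
// "org" %% "name" % "v" asks the resolver for the artifact "name_<binary version>".
object CrossName {
  // Scala 2.x: binary version is major.minor ("2.11.7" -> "2.11"); Scala 3: just "3".
  // Milestone/RC versions are ignored here for simplicity.
  def binaryVersion(scalaVersion: String): String =
    if (scalaVersion.startsWith("3.")) "3"
    else scalaVersion.split('.').take(2).mkString(".")

  // %% appends the suffix; a plain % would use `name` verbatim.
  def artifactName(name: String, scalaVersion: String): String =
    s"${name}_${binaryVersion(scalaVersion)}"
}
```

For example, with `scalaVersion := "2.11.7"`, the dependency `"cc.factorie" %% "factorie" % "1.1"` is requested as `factorie_2.11`. By contrast, `"cc.factorie.app.nlp" % "all-models" % "1.0.0"` is requested under the plain name `all-models`.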

Chandra Shekhar

Feb 15, 2016, 3:06:25 AM
to dis...@factorie.cs.umass.edu
This is what my build.sbt looks like:

name := "newepic"

libraryDependencies ++= Seq(
  "org.scalanlp" %% "epic-parser-en-span" % "2015.1.25",
  "org.scalanlp" %% "epic-pos-en" % "2015.1.25",
  "org.scalanlp" %% "epic-ner-en-conll" % "2015.1.25",
  //"cc.factorie" %% "factorie" % "1.1",
  "cc.factorie" %% "factorie" % "1.2-SNAPSHOT",
  "cc.factorie.app.nlp" % "all-models" % "1.2-SNAPSHOT"
  //"cc.factorie.app.nlp" % "all-models" % "1.0.0"
)

resolvers ++= Seq(
  "ScalaNLP Maven2" at "http://repo.scalanlp.org/repo",
  "Scala Tools Snapshots" at "http://scala-tools.org/repo-snapshots/",
  "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/",
  //"IESL Release" at "http://dev-iesl.cs.umass.edu/nexus/content/groups/public"
  "IESL Release" at "https://dev-iesl.cs.umass.edu/nexus/content/groups/public-snapshots/cc/factorie/factorie/"
)

scalaVersion := "2.11.7"

But I'm still unable to resolve these errors:

Error:(5, 8) object DeterministicNormalizingTokenizer is not a member of package cc.factorie.app.nlp.segment
import cc.factorie.app.nlp.segment.{DeterministicSentenceSegmenter, DeterministicNormalizingTokenizer}
       ^
Error:(14, 24) not found: value DeterministicNormalizingTokenizer
        val annotators = Seq(DeterministicNormalizingTokenizer, DeterministicSentenceSegmenter, OntonotesForwardPosTagger, WordNetLemmatizer)
                             ^

Chandra Shekhar

Feb 15, 2016, 6:39:28 AM
to dis...@factorie.cs.umass.edu
Thanks a lot, Emma.
I understood the problem and resolved the issue.
Thanks a ton :-)


Emma Strubell

Feb 15, 2016, 7:07:26 AM
to dis...@factorie.cs.umass.edu

Great, glad I could help :)

Emma
