Simon Hafner
Mar 3, 2015, 9:19:38 PM
to scalanlp
The problem: How to pass the input types to an annotator. Possible solutions:
a) Don't have different input types at all. Force everything to be
Sentence, Token, etc.
Drawback: No more object hierarchies.
b) Let the compiler figure out the types. That's what I did originally.
It makes for the best API:
  val sentenceSegmenter = epic.preprocess.MLSentenceSegmenter.bundled().get
  val tokenizer = epic.preprocess.TreebankTokenizer
  val slabs = documents.map(Slab(_))
    .map(sentenceSegmenter(_))
    .map(tokenizer(_))
Drawback: ~20-30s of compile time per pipeline. Unacceptable.
c) Pass the types as arguments to the constructor.
  val sentenceSegmenter = epic.preprocess.MLSentenceSegmenter.bundled().get
  val tokenizer = epic.preprocess.TreebankTokenizer.slab[Sentence]
  val parser =
    epic.models.ParserSelector.loadParser("en").get.slab[Sentence, ContentToken]
  val slabs = documents.map(Slab(_))
    .map(sentenceSegmenter(_))
    .map(tokenizer(_))
    .map(parser(_))
Drawback: Doesn't work for the parser, because its apply for slabs is
provided via an implicit class. It would need to be done explicitly,
which shouldn't be too big of a problem.
d) Pass the types via the apply method.
  val sentenceSegmenter = epic.preprocess.MLSentenceSegmenter.bundled().get
  val tokenizer = epic.preprocess.TreebankTokenizer
  val parser = epic.models.ParserSelector.loadParser("en").get
  val slabs = documents.map(Slab(_))
    .map(sentenceSegmenter(_))
    .map(tokenizer[Sentence](_))
    .map(parser[Sentence, ContentToken](_))
Drawback: Doesn't really work, because the type information is passed
around at class level. It would require a major rewrite of the slab code.
e) Your solution.
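
For comparison, here is a minimal self-contained sketch of how c) and d) differ at the call site. It is only an illustration under simplifying assumptions: Annotator, WhitespaceTokenizer, PerCallTokenizer, Splitter and the text-extractor argument are made-up names, Sentence and Token are plain stand-ins, and none of this is the actual epic or slab API. It also says nothing about how much of the slab code would have to change for d).

```scala
// Hypothetical stand-ins for annotation types; not the real epic classes.
final case class Sentence(text: String)
final case class Token(text: String)

// Hypothetical annotator shape: In is the annotation type it reads,
// Out is the annotation type it produces.
trait Annotator[In, Out] {
  def apply(in: Seq[In]): Seq[Out]
}

// Shared helper so both variants tokenize identically.
object Splitter {
  def split(s: String): Seq[Token] =
    s.split("\\s+").toSeq.filter(_.nonEmpty).map(t => Token(t))
}

// Option c): the input type is bound once, when the annotator is built.
object WhitespaceTokenizer {
  def slab[I](text: I => String): Annotator[I, Token] =
    new Annotator[I, Token] {
      def apply(in: Seq[I]): Seq[Token] = in.flatMap(i => Splitter.split(text(i)))
    }
}

// Option d): the input type is supplied on every call to apply.
object PerCallTokenizer {
  def apply[I](in: Seq[I])(text: I => String): Seq[Token] =
    in.flatMap(i => Splitter.split(text(i)))
}

object Demo {
  val sentences: Seq[Sentence] = Seq(Sentence("Hello world"), Sentence("Types are hard"))

  // c) type argument given at construction; the call site stays plain.
  val tokenizerC: Annotator[Sentence, Token] = WhitespaceTokenizer.slab[Sentence](_.text)
  val viaC: Seq[Token] = tokenizerC(sentences)

  // d) type argument given at the call site instead.
  val viaD: Seq[Token] = PerCallTokenizer[Sentence](sentences)(_.text)
}
```

In this toy version both variants give the same tokens. The only difference is where the type argument is written: once per annotator in c), once per pipeline step in d).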