clojure-opennlp

Jim foo.bar

Feb 11, 2012, 9:20:07 AM

to Clojure

Hi everyone,

I was just wondering whether anyone has used the clojure-opennlp
wrapper for multi-word named entity recognition (NER). I am using it
to train a drug finder from my private corpus. The command line tool of
Apache OpenNLP behaves correctly, but when I use the API I only get
single-word entities recognised! I've opened a thread on the official
mailing list because I initially thought there was a genuine problem
with OpenNLP, but since the command line tool does exactly what I want,
I'm starting to think that the fault is not in OpenNLP but either in my
code or in the Clojure wrapper...

I've followed both the official tutorials and the wrapper
documentation, so I am doing everything as instructed.
I know the name finder expects tokenized sentences, and I am indeed
passing tokenized sentences like this:

(defn find-names-model [text]
  (map #(drug-find (tokenize %))
       (get-sentences text)))

It is very strange: I am getting back "Folic" but not "Folic
acid", even though I am using the exact same model I used with the
command line tool...

Any help will be greatly appreciated...
Regards,
Jim

Nicolas Buduroi

Feb 12, 2012, 3:42:36 PM

to Clojure

Just for the record, it seems this issue has been fixed today:

https://github.com/dakrone/clojure-opennlp/commit/887add29a1fbc3b4aac7d12f5cbc52c43c6a7dcd

Try out the new 0.1.8 version.
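
For readers hitting the same symptom, here is a minimal sketch of how a multi-word entity can be rebuilt from a name finder's output. It does not reproduce the linked commit or the wrapper's internals. It only assumes that the finder returns spans over the token sequence as [start end] pairs with an exclusive end index, which is how OpenNLP's Span is defined. The helpers span->name and spans->names and the sample tokens are hypothetical, made up for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of clojure-opennlp.
;; A span is a [start end] pair of token indices, end exclusive.
(defn span->name
  "Joins every token covered by the span into a single string."
  [tokens [start end]]
  (str/join " " (subvec (vec tokens) start end)))

;; Hypothetical helper: converts all spans for one tokenized sentence.
(defn spans->names
  [tokens spans]
  (map #(span->name tokens %) spans))

;; Sample tokenized sentence (made up for illustration).
(def tokens ["Folic" "acid" "is" "a" "B" "vitamin" "."])

;; Reading only the token at the span's start index drops the second word:
(nth tokens 0)                 ; => "Folic"

;; Joining the whole span keeps the full entity:
(spans->names tokens [[0 2]])  ; => ("Folic acid")
```

If the API path returns only the first word of an entity the model clearly knows, checking whether the code reads the full span or just its start index is a reasonable first step.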

Lee Hinman

Feb 11, 2012, 12:47:34 PM

to clo...@googlegroups.com

I have inquired on the OpenNLP mailing list about a way to train a tokenizer not to automatically split on spaces. If I hear back with a way to do it, I will add it to clojure-opennlp.

- Lee