(doto (UIMAFramework/newDefaultResourceManager) (.setExtensionClassPath (.getContextClassLoader (Thread/currentThread)) "" true))
Hi Jim,

I was looking at your motivation for the hotel_nlp project. It looks like U-Compare wasn't what you were looking for, but maybe DKPro Core [1] is. It's a library/collection of interoperable UIMA components that you can mix and match as you desire. We wrap many state-of-the-art tools. The components have been made to look very uniform and are all based on the same type system. DKPro Core was developed from a researcher background. Making all these tools mixable in a convenient way is part of our daily business ;)

Cheers,

-- Richard

[1] http://code.google.com/p/dkpro-core-asl/
AnalysisEngine tagger = createAnalysisEngine("mypackage.MyTagger");
A UIMA XML descriptor is often not stand-alone. When it is loaded, it must be able to resolve any descriptors it imports, which may cause additional problems.
I'd probably ignore the XML descriptor and use the normal AnalysisEngineFactory.createPrimitive(). The downside is that you have to specify *all* mandatory parameters in the call, because uimaFIT doesn't know the default values. Further, if the component uses external resources, you should have a look here [1].
createPrimitive() is my fallback for this particular component...I'd really prefer to go down the official route for the sake of clarity and since the descriptors exist let's use them... :) btw, from the documentation I understand that there are no external dependencies unless you load the aggregate engine which needs to pull in the WhitespaceTokenizer. But again, I'd expect that it will find it...you know better of course ;)
Placing stuff at the root of the classpath is not a good idea anyway: there is too much potential for conflicts and for issues when using classpath scanning.
example code-snippets follow:
(def config (to-array ["NGRAM_SIZE" n
                       "ModelFile" "/home/sorted/clooJWorkspace/hotel-nlp/resources/pretrained_models/BrownModel.dat"]))
(def tagger (AnalysisEngineFactory/createAnalysisEngine "HmmTaggerAggregate" config)) ;; using the official xml descriptor gives warnings
(def jc (doto (JCasFactory/createJCas)
          (.setDocumentText "My name is Jim and I like pizzas a lot !")))
(.process tagger jc) ;; despite the warnings the code does reach this point before the SEVERE exception...
[1] Apr 29, 2013 5:27:38 PM org.apache.uima.analysis_engine.impl.AnalysisEngineDescription_impl checkForInvalidParameterOverrides
WARNING: The aggregate text analysis engine "HmmTaggerTAE" has declared the parameter NGRAM_SIZE, but has not declared any overrides.This usage is deprecated.
Apr 29, 2013 5:27:38 PM org.apache.uima.analysis_engine.impl.AnalysisEngineDescription_impl checkForInvalidParameterOverrides
WARNING: The aggregate text analysis engine "HmmTaggerTAE" has declared the parameter ModelFile, but has not declared any overrides.This usage is deprecated.
Apr 29, 2013 5:27:38 PM WhitespaceTokenizer initialize
INFO: "Whitespace tokenizer successfully initialized"
The used model is:/home/sorted/clooJWorkspace/hotel-nlp/resources/pretrained_models/BrownModel.dat
Apr 29, 2013 5:27:42 PM WhitespaceTokenizer typeSystemInit
INFO: "Whitespace tokenizer typesystem initialized"
Apr 29, 2013 5:27:42 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer starts processing"
Apr 29, 2013 5:27:42 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(407)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)
I'm using all the official techniques/guides/components and still I cannot get them to work. The HmmTaggerAggregate is advertised as a ready-to-use UIMA component (because it falls back to the WhitespaceTokenizer if it doesn't find any Token annotations). I've been trying for 4 days now just to tag some text (as the uimaFIT website demonstrates).

Can it be that I'm that stupid? Where is all the interoperability and smooth integration of components?

To top all that, I cannot find a single example of the HMMTagger being used in a real project. I've spent endless hours looking; maybe that would give a clue as to how to instantiate and use it. I've read its documentation probably more than 30 times.

Can anyone shed some light, please? This is getting terribly frustrating... :(
Again, I am truly thankful for your time.
Jim
Right, I've found Jens's repo. It's here: https://github.com/jenshaase/uimaclj/blob/master/java/uimaclj/core/CljAnnotator.java
From what I can see, he's using a single function, which is why the code is much shorter and cleaner. However, I do like the fact that he's passing the actual function (if I've understood correctly) rather than its class name. I'm starting to think I can do the same thing as in my code, but with Jens's approach (passing 3 external resources). Do you think that would work?
Jim