UIMA and Clojure


Jens Haase

Apr 18, 2013, 11:33:51 AM
to uimafi...@googlegroups.com
Hi,

after Jim had some questions on how to use Clojure and UIMA together, we set up a very simple prototype of how to do it.

The idea is that we have a special Clojure annotator that gets a namespace and a function name as parameters. In the `process`
method it then resolves the Clojure function and calls it with the JCas and the UimaContext.

You can find the complete project here: https://github.com/jenshaase/uimaclj

We would like to get some feedback.

Cheers,
Jens

Richard Eckart de Castilho

Apr 18, 2013, 12:09:05 PM
to uimafi...@googlegroups.com
Very nice :)

I'm not familiar with Clojure at all, so my feedback is primarily questions and may be a bit naive.

"namespace-parameter" is a "package" name and "function-parameter" is the name of a Clojure function?

Does it make any sense to want to implement an analysis engine in Clojure? (I think that's what Jim tried before)

I think it would be nice to be able to do something like this:

;; How to run the pipeline.
;; The CljAnnotator gets two parameters: the namespace and the
;; function name you want to call.
(defn -main []
  (let [jcas (JCasFactory/createJCas)
        ae (AnalysisEngineFactory/createPrimitive
             CljAnnotator
             (to-array
               [CljAnnotator/PARAM_FN
                (defn my-annotator-fn [uima-context jcas]
                  ;; Do your annotator work here. You can use the uima-context
                  ;; to access configuration parameters or external resources.
                  (println "hello world"))]))]
    (.process ae jcas)
    nil))

Do you have any idea what would be necessary to do that? What is the result value of "defn"?

Cheers,

-- Richard

Jens Haase

Apr 18, 2013, 1:05:40 PM
to uimafi...@googlegroups.com
Hi Richard,

if I understand correctly, you want to replace the namespace and function-name strings. This
can be done with a Clojure macro. I haven't done this yet, but we could replace the main
function with something like this:

(defmacro create-primitive [f & params]
  ;; This macro needs an implementation.
  ;; Wrap the AnalysisEngineFactory.createPrimitive functions
  )

(defn -main []
  (let [jcas (JCasFactory/createJCas)
        ae (create-primitive my-annotator-fn :param-key1 "param-value1" :param-key2 "param-value2")]
    (.process ae jcas)
    nil))

Cheers,
Jens

Richard Eckart de Castilho

Apr 19, 2013, 1:26:01 AM
to uimafi...@googlegroups.com
Hi Jens,

hm, yes,

1) I wanted to replace the two name strings and my idea was to pass a "closure" instead (no idea if that is the correct term in Clojure). E.g. in Groovy, a "closure" is realized as a Java Class implementing a particular interface. I can instantiate that class and then call the "call" method and pass the function arguments (consider this pseudocode):

public static final String PARAM_CLOSURE_CLASS = "closureClass";
@ConfigurationParameter(name = PARAM_CLOSURE_CLASS)
Class<Closure> closureClass;

closureClass.newInstance().call(getUimaContext(), aJCas);

2) In DKPro Lab's Groovy support, I slightly improved on the above by implementing a uimaFIT ResourceProvider (ClosureResourceProvider [1]) which allows me to avoid handling the class instantiation in the component.

public static final String PARAM_CLOSURE = "closure";
@ExternalResource(key = PARAM_CLOSURE)
Closure closure;

closure.call(getUimaContext(), aJCas);

If the result of a "defn" in Clojure is also internally represented as a Java class implementing a particular interface, this approach should also work in Clojure (apart from possible classloading issues).

3) Your approach with the macro also appears attractive syntax-wise. But since it hooks in at the level of the factory methods, you'd basically have to implement one macro per uimaFIT factory method. That seems like a lot of overhead.

I suppose the following two are the least invasive and require just documentation:

0) passing the two name strings
1) passing a function handle (if possible)

If one considers providing some minimal language support library (e.g. as DKPro Lab does for Groovy) then

2) using a ResourceProvider, e.g. accepting name strings or function handle, is a good approach. It encapsulates the function-handling logic in a reusable place outside the component.

I think 3) and 2) can be combined to have a nice, clean, native syntax, but still maintain the "adapter" functionality between Clojure and Java/uimaFIT/UIMA in a single, easily maintainable place.

Cheers,

-- Richard

[1] http://code.google.com/p/dkpro-lab/source/browse/de.tudarmstadt.ukp.dkpro.lab/de.tudarmstadt.ukp.dkpro.lab.groovy/src/main/groovy/de/tudarmstadt/ukp/dkpro/lab/groovy/uima/ClosureResourceProvider.java

Richard Eckart de Castilho

Apr 19, 2013, 1:29:09 AM
to uimafi...@googlegroups.com
In all this discussion, I realize that this Groovy ClosureResourceProvider
should actually be part of uimaFIT. It'd probably be a good idea to open a
uimafit-groovy module and make it available there.

-- Richard

Jens Haase

Apr 19, 2013, 3:48:00 AM
to uimafi...@googlegroups.com
Hi Richard,

in Clojure "closures" are called functions. You can create a named function with (defn my-name [args] body) or an anonymous function with (fn [args] body). Ideally you would use the second form to pass the function into the resource. However, I do not think that is possible in Clojure. As in Groovy, functions are classes that implement an interface, here IFn. But compilation in Clojure is very dynamic (you will never have a .class file on your disk). I think functions are just class instances in memory (but I'm not very sure). I have never seen a way to recreate a function with the "new" keyword.

Moving the instantiation to a resource provider seems to be a good option. I will try this.

By the way, I added the create-primitive function: https://github.com/jenshaase/uimaclj/blob/master/src/uimaclj/core/test.clj#L22

Cheers,
Jens

Richard Eckart de Castilho

Apr 19, 2013, 4:54:29 AM
to uimafi...@googlegroups.com
> in Clojure "closures" are called function. You can create a named function with (defn my-name [args] body) or an anonymous function (fn [args] body). In the optimal case you would use the second case to pass it into the resource. However, I do not think that is possible in clojure. As in Groovy functions are classes that implement the IFn interface. But the compilation in clojure is very dynamic (you will never have a .class file on your disk). I think function are just class instances in memory (but I'm not very sure). I never saw a way to recreate a function with the "new" keyword.

Whether the class resides on disk or in memory shouldn't make any difference at runtime. It just needs to be reachable from the currently used ClassLoader.

Frameworks like Spring or AspectJ also rely a lot on classes that are dynamically generated or dynamically modified and only exist in their final form in memory.

The "new" keyword could of course not be used to load such a class in a ResourceProvider, but Class.newInstance() can be used.
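To make this concrete, here is a minimal plain-Java sketch, not tied to UIMA or Clojure. The names `ReflectiveFnDemo` and `HelloFn` are made up for illustration, and `BiFunction<String, String, String>` merely stands in for a function interface such as IFn. It uses `getDeclaredConstructor().newInstance()`, which, like `Class.newInstance()`, needs a no-argument constructor.

```java
import java.util.function.BiFunction;

public class ReflectiveFnDemo {

    /** Hypothetical stand-in for a function class; note the implicit no-arg constructor. */
    public static class HelloFn implements BiFunction<String, String, String> {
        @Override
        public String apply(String context, String document) {
            return "hello from " + context + ": " + document;
        }
    }

    /**
     * Loads the named class through the given ClassLoader and instantiates it
     * reflectively. It does not matter whether the class bytes came from disk
     * or were generated in memory, as long as the loader can find them.
     */
    @SuppressWarnings("unchecked")
    static BiFunction<String, String, String> instantiate(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, true, loader);
            return (BiFunction<String, String, String>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        BiFunction<String, String, String> fn =
                instantiate(HelloFn.class.getName(), ReflectiveFnDemo.class.getClassLoader());
        System.out.println(fn.apply("ctx", "doc"));
    }
}
```

The only requirements are that the loader can resolve the class name and that the class has a constructor without arguments.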

-- Richard

Jens Haase

Apr 19, 2013, 5:13:08 AM
to uimafi...@googlegroups.com
Hi,

short question. How can I pass a parameter of type Class<IFn>?

Got this exception:
Failed to convert property value of type 'java.lang.String' to required type 'java.lang.Class' for property

I am trying what you mentioned in point 1) of one of your last mails.

Cheers,
Jens

Richard Eckart de Castilho

Apr 19, 2013, 5:21:41 AM
to uimafi...@googlegroups.com
Hi,

> short question. How can I pass a parameter of type Class<IFn>?
>
> Got this exception:
> Failed to convert property value of type 'java.lang.String' to required type 'java.lang.Class' for property
>
> I try what you mentioned in point 1) in one of your last mails.


that depends on a couple of factors:

a) "Class" is supported for readers and analysis engines
b) "Class" is (should be) supported for external resources that are SharedResourceObjects (see [1])
c) only "String" is supported for external resources that are based on Resource_ImplBase

This should be the state in uimaFIT 1.4.0. The ClosureResourceProvider implementation I gave as an example is a case of c). For the example I gave in 1), I had a) in mind, but it should also work with a b) variant.

Cheers,

-- Richard

[1] https://groups.google.com/d/msg/uimafit-users/Q9GWBZajff8/QO3mw0nqviIJ

Jens Haase

Apr 19, 2013, 6:37:52 AM
to uimafi...@googlegroups.com
Hi Richard,

I did it:

https://github.com/jenshaase/uimaclj/blob/master/src/uimaclj/core/test.clj#L31

Looks really nice now. Thanks.

Cheers,
Jens
--
Regards
Jens Haase
http://about.me/jenshaase

Richard Eckart de Castilho

Apr 19, 2013, 6:53:26 AM
to uimafi...@googlegroups.com
Hi Jens,

nice indeed!

Btw., you can avoid the call to "bindResource" and just pass in an external resource description like a regular parameter:

ExternalResourceDescription fnDesc = ExternalResourceFactory.createExternalResourceDescription(
ClojureResourceProvider.class, …);

createPrimitiveDescription(CljAnnotator.class, PARAM_CLOJURE_FN, fnDesc);


Let's go a little advanced now ;)

Can I "let" another variable in -main, e.g. a=1, and access it in the anonymous "fn", e.g. "println a"? If that works, I'd personally probably do that instead of passing in parameters via the UimaContext (via :param1, etc.).

-- Richard

Jens Haase

Apr 19, 2013, 7:14:31 AM
to uimafi...@googlegroups.com
Hi,

setting a = 1 in the let statement did not work. It threw an exception
because the function class now has a constructor with one argument. I
don't see an option to extract the local context and pass it to the
constructor.

But if you define a global value (in clojure with (def name value)) it
will work.
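The effect can be imitated in plain Java. This is only a sketch under assumptions: `ReadsGlobal` and `CapturesLocal` are hypothetical classes standing in for the classes a compiler might generate for a function that reads a global and for one that closes over a local. Reflective instantiation without arguments works for the first and fails for the second.

```java
import java.util.function.Supplier;

public class CapturedStateDemo {

    /** A "global" value, loosely analogous to a Clojure (def a 1). */
    static int globalA = 1;

    /** Captures nothing: the class keeps a no-argument constructor. */
    public static class ReadsGlobal implements Supplier<String> {
        @Override
        public String get() {
            return "a = " + globalA;
        }
    }

    /** Closes over a local value: it must be handed in through the constructor. */
    public static class CapturesLocal implements Supplier<String> {
        private final int a;

        public CapturesLocal(int a) {
            this.a = a;
        }

        @Override
        public String get() {
            return "a = " + a;
        }
    }

    /** Tries the reflective no-argument instantiation a generic provider would use. */
    static String tryInstantiate(Class<? extends Supplier<String>> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance().get();
        } catch (NoSuchMethodException e) {
            return "no nullary constructor";
        } catch (ReflectiveOperationException e) {
            return "instantiation failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(ReadsGlobal.class));   // a = 1
        System.out.println(tryInstantiate(CapturesLocal.class)); // no nullary constructor
    }
}
```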

Cheers,
Jens



Jim

Apr 19, 2013, 12:01:24 PM
to uimafi...@googlegroups.com
Hi again,

I quickly skimmed through this thread as it turned out I hadn't properly applied to be a member of the list...I thought simply posting something would suffice...anyway I'm stupid - I missed some really good discussion...

OK, as I said I only skimmed through very quickly so forgive me if I've misunderstood something. From what I can see Jens has been sort of re-inventing the wheel...The big goal of mine was to completely disconnect  the class that does the actual work from its UIMA proxy. Apart from the interop that requires only 2 things:

1) being able to instantiate components with parameters -> we can via annotations + uimafit
2) being able to somehow pass functions that will read from & write to the JCas. This is rather inconvenient and non-idiomatic for Clojure but there is no other way. .process is a stateful call after all...

that is it! of course I'll have to agree that being able to pass small anonymous functions is indeed attractive :)

Now, the higher level API that builds on top of these abilities I already have. It's here:
https://github.com/jimpil/hotel-nlp/blob/master/src/hotel_nlp/externals/uima.clj

In its current state (haven't changed anything since yesterday) the consumer can, with a single function call, turn his class into a UIMA-compatible one. The odd thing is that the fn expects a var and a string...you see that very rarely, but I'm prepared to live with it in case I find nothing better...I'll have to look closely to see what Jens has done, but at first sight I see 2 Java classes, which I guess can't be good! :)

 
also, I have to point out again that apart from the actual annotator you need another 2 fns passed in...a fn to extract the appropriate 'tokens' from the JCas and one to write the annotations back...otherwise we're back to step 1 - there is no real disconnect between UIMA and the object that will do the actual work...

Jim

ps: of course the namespace I posted is far from complete/final...but you can get a taste of what is happening... the spine of producing components is 'produce' and doesn't require uimafit. 'uima-compatible' however does use uimafit to produce a primitive analysis description which is passed to 'produced' and finally wrapped by UIMAProxy. 

Jim

Apr 19, 2013, 12:15:44 PM
to uimafi...@googlegroups.com
@Jens,
ClojureResourceProvider.java

again, the classloading issue has been resolved quite nicely already...there is no need for hacks. Richard pointed out a while back that UIMAFramework can take an extensionClassPath where you can also specify a parent Classloader...so, as long as you fire up UIMA with the following resource-manager, that java class is useless, no?

(doto (UIMAFramework/newDefaultResourceManager)
  (.setExtensionClassPath (. (Thread/currentThread) getContextClassLoader) "" true))

Jim

ps: hmmm... perhaps now that UIMA has dynamic-classloader as its parent we can ask instances of the functions directly from UIMA, thus bypassing 'ns-resolve' & 'symbol'? 

Jim

Apr 20, 2013, 7:25:43 AM
to uimafi...@googlegroups.com
I refactored my code to use the 'Class/forName(...).newInstance()' approach instead of the 'ns-resolve/symbol' approach, as Jens did...The good thing about it is that the 3 functions do not need to be in the same namespace - they can be anywhere. The downside is that this will only work for functions - NOT for defrecords/deftypes, as they don't have a nullary constructor...the workaround is to wrap the defrecords/deftypes in a function before passing them in, which in turn requires that they implement IFn. I'm currently weighing the pros and cons of each approach...
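In Java terms, the wrapping workaround might look like the sketch below. All names here (`RegexTokenizer`, `WhitespaceTokenizerFn`, `load`) are hypothetical. A type that can only be constructed with arguments is hidden behind a small class that has a no-argument constructor, so it can be created by class name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class NullaryWrapperDemo {

    /** Like a defrecord with a field: it can only be built with an argument. */
    public static class RegexTokenizer {
        private final String delimiterRegex;

        public RegexTokenizer(String delimiterRegex) {
            this.delimiterRegex = delimiterRegex;
        }

        public List<String> tokenize(String sentence) {
            return Arrays.asList(sentence.trim().split(delimiterRegex));
        }
    }

    /** Wrapper with a no-argument constructor that fixes the configuration itself. */
    public static class WhitespaceTokenizerFn implements Function<String, List<String>> {
        private final RegexTokenizer delegate = new RegexTokenizer("\\s+");

        @Override
        public List<String> apply(String sentence) {
            return delegate.tokenize(sentence);
        }
    }

    /** Instantiates a function class by name; this requires a nullary constructor. */
    @SuppressWarnings("unchecked")
    static Function<String, List<String>> load(String className) {
        try {
            Class<?> cls = Class.forName(className, true, NullaryWrapperDemo.class.getClassLoader());
            return (Function<String, List<String>>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Function<String, List<String>> fn = load(WhitespaceTokenizerFn.class.getName());
        System.out.println(fn.apply("UIMA meets Clojure"));
    }
}
```

The price of the wrapper is that the configuration (here the delimiter) is baked into the wrapper class instead of being passed in from outside.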

Jim 

Richard Eckart de Castilho

Apr 24, 2013, 12:04:12 PM
to uimafi...@googlegroups.com
Hi Jim,

again, on anything I say, please take into account that I'm not a user of Clojure and may be naive or ignorant on some things.

On 19.04.2013 at 18:01, Jim <jimpi...@gmail.com> wrote:
> Now, the higher level API that builds on top of these abilities I already have. It's here:
> https://github.com/jimpil/hotel-nlp/blob/master/src/hotel_nlp/externals/uima.clj
>
> In its current state (haven't changed anything from yesterday) the consumer can with a single functional call turn his class into uima-compatible. The odd thing is that the fn expects a var and a string...very rarely you see that but I'm prepared to live with that in case I find nothing better…I'll have to look close to see Jens has done but at first sight I see 2 Java classes, which I guess can't be good! :)

I suppose if it helps with integration, Java classes ain't bad. I wonder if Clojure is actually written in Clojure or fully in Java ;)

> also, I have to point out again that apart from the actual annotator you need another 2 fns passed in…a fn to extract the appropriate 'tokens' from the JCas and one to write the annotations back...otherwise we're back to step 1 - there is no real disconnect between UIMA and the object that will do the actual work...

Why is it essential that feature structures are extracted and then written back to the CAS? Why not treat the function that you pass in as a transformation function? Imagine the CAS is a list of feature structures. Then you have a function f : CAS -> CAS which takes a CAS, iterates over its contents and creates/modifies/deletes feature structures as necessary. It returns the same CAS object (now updated), which can then be passed into the next function if necessary.

Cheers,

-- Richard

Jim - FooBar();

Apr 24, 2013, 12:32:20 PM
to uimafi...@googlegroups.com
On 24/04/13 17:04, Richard Eckart de Castilho wrote:
> Hi Jim,

Hi Richard,

your email was somewhat a coincidence as I was thinking about emailing
you today in order to ask you if you're still interested in that
uima+clojure tutorial! :)
I'm at a point where I can do both things (converting arbitrary classes
to UIMA components *and* using UIMA components from Clojure).

> I suppose if it helps integrating, Java classes ain't bad. I wonder if Clojure is actually written in Clojure or in fully Java
well, I'd really like to minimize the Java code I'm shipping with my
library...Most of Clojure is written in Clojure (excluding the immutable
data-structures which are the foundation of the language)...

> Why is it essential that features structures are extracted and then written back to the CAS? Why not take the function that you pass in as a transformation function. Imagine the CAS is a list of feature structures. Then you have a function f : CAS -> CAS which takes a CAS, iterates over its contents and creates/modifies/deletes feature structures as necessary. It returns the same CAS object (now updated) which can then be passed into the next function if necessary.

yes, you could bundle up all the functionality (extracting the input, doing
the work, writing the annotations) in a single function, but that is not
acceptable when you have ready-made components that know nothing about
UIMA. That was the problem with Jens's code as well...it will only work
for functions, but you might want to use the same stuff with a defrecord
(an actual custom class) or a deftype...both defrecord & deftype do not
have a nullary constructor whereas functions do! So the whole point of
splitting up the functionality in 3 pieces is that it seriously decomplects
the situation...for example, suppose you have a tokenizer that accepts a
sentence-string and returns an ArrayList with the token-strings.
This tokenizer will not know how to extract the text from the JCas or how
to write back the annotation indexes. In fact it doesn't even produce
indexes - you'll have to reconstruct them after it runs! So how do you use
this in UIMA without modifying it? It can't be done...of course if
you're writing an annotator from scratch you've got all the freedom you
need, but again, if you bundle up everything, that component is no longer
a generic component but rather a UIMA component...see what I mean?


As I said I am sort of ready to write up that UIMA+CLOJURE tutorial (If
there is still interest of course)...In addition, I can demonstrate how
to go from Clojure -> UIMA (converting components into UIMA friendly)
*and* from UIMA -> Clojure (using UIMA components from Clojure). It
won't be a short article as there are several things that need to be
addressed...I've worked/wrapped several large Java libraries in the past
and I can confidently say that UIMA caused the most problems...In fact
if it wasn't for uimafit I'd have given up long ago!


Jim




Richard Eckart de Castilho

Apr 24, 2013, 1:13:38 PM
to uimafi...@googlegroups.com
Hey ;)

> your email was somewhat a coincidence as I was thinking about emailing you today in order to ask you if you're still interested in that uima+clojure tutorial! :)
> I 'm at a point where I can do both things (converting arbitrary classes to UIMA components *and* using UIMA components from Clojure).

Nice :)

>> Why is it essential that features structures are extracted and then written back to the CAS? Why not take the function that you pass in as a transformation function. Imagine the CAS is a list of feature structures. Then you have a function f : CAS -> CAS which takes a CAS, iterates over its contents and creates/modifies/deletes feature structures as necessary. It returns the same CAS object (now updated) which can then be passed into the next function if necessary.
>
> yes, you could bundle up all the functionality (extracting-input, doing the work, writing the annotations) in a single function but that is not acceptable when you have ready-made components that know nothing about UIMA. That was the problem with Jen's code as well...it will only work for functions, but you might want to use the same stuff with a defrecord (an actual custom class) or a deftype...both defrecord & deftype do not have a nullary constructor whereas functions do! So the whole point of splitting up the functionality in 3 pieces seriously decomplects the situation...for example suppose you have a tokenizer that accepts a sentence-string and returns back an arraylist with the token-strings. This tokenizer will not know how extract the text from the JCas or how to write back the annotation indexes. In fact it doesn't even produce indexes - you'll have to reconstruct them after it runs! So how to use this in UIMA without modifying it? It can't be done…of course if you're writing an annotator from scratch you've got all the freedom you need but again if you bundled up everything, that component is no longer a generic component but rather an UIMA component...see what I mean?

Well, yes and no. I suppose I get your point about decomposing into mainly three functions:

- CAS -> input
- input -> output
- output -> CAS

Still, when you connect these functions, the result is representable as a single CAS -> CAS function. So if I can inject a CAS -> CAS function in some way, it can be implemented internally as the 3 stages. It is probably also possible to use a local deftype/defrecord or even access ones that have been defined elsewhere using the ClassLoader stuff we were discussing.
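As a rough illustration of how the three stages collapse into one CAS -> CAS function, here is a plain-Java sketch. `ToyCas` is a made-up stand-in, not the UIMA CAS, and `compose`, `tokenize` and `annotate` are hypothetical names. The tokenizer itself knows nothing about the CAS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.UnaryOperator;

public class ThreeStageDemo {

    /** Hypothetical stand-in for a CAS: document text plus recorded token strings. */
    public static class ToyCas {
        public final String text;
        public final List<String> tokens = new ArrayList<>();

        public ToyCas(String text) {
            this.text = text;
        }
    }

    /** Glues CAS -> input, input -> output and output -> CAS into one CAS -> CAS step. */
    static <I, O> UnaryOperator<ToyCas> compose(Function<ToyCas, I> extract,
                                                Function<I, O> work,
                                                BiConsumer<ToyCas, O> writeBack) {
        return cas -> {
            O output = work.apply(extract.apply(cas));
            writeBack.accept(cas, output);
            return cas; // the same object, now updated
        };
    }

    /** A component that knows nothing about the CAS. */
    static List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.trim().split("\\s+"));
    }

    /** Builds the composed annotator and runs it on a fresh ToyCas. */
    static ToyCas annotate(String text) {
        UnaryOperator<ToyCas> annotator = ThreeStageDemo.<String, List<String>>compose(
                cas -> cas.text,                             // CAS -> input
                ThreeStageDemo::tokenize,                    // input -> output
                (cas, tokens) -> cas.tokens.addAll(tokens)); // output -> CAS
        return annotator.apply(new ToyCas(text));
    }

    public static void main(String[] args) {
        System.out.println(annotate("a b c").tokens);
    }
}
```

From the outside only the composed CAS -> CAS function is visible; whether it was assembled from three pieces or written in one go is an internal detail.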

I gather there's also the other thing where you want to do the

- input -> CAS
- CAS -> CAS (UIMA process)
- CAS -> output

In that case, I suppose nothing has to be injected into the UIMA component.

Or maybe I'm understanding only half or nothing of what you said ;) As I said, I'm quite naive when it comes to functional programming.

I had a look at your uima.clj file. I can even understand some things and give some hints:

(def jc (org.uimafit.factory.JCasFactory/createJCas
          (org.uimafit.factory.TypeSystemDescriptionFactory/createTypeSystemDescription)))

You should be able to reduce that to

(def jc (org.uimafit.factory.JCasFactory/createJCas))

You could change your inject-annotation function to use CasUtil.getAnnotationType(CAS, Class) internally. That way you should be able to pass Annotation.class in instead of "uima.tcas.Annotation".

What does "alt-implementation" do?

> As I said I am sort of ready to write up that UIMA+CLOJURE tutorial (If there is still interest of course)...In addition, I can demonstrate how to go from Clojure -> UIMA (converting components into UIMA friendly) *and* from UIMA -> Clojure (using UIMA components from Clojure). It won't be a short article as there are several things that need to be addressed...I've worked/wrapped several large Java libraries in the past and I can confidently say that UIMA caused the most problems...In fact if it wasn't for uimafit I'd have given up long ago!

I think it would be great. As I said, I'm still working on the uimaFIT 2.0.0 release and docs are in the works there, however, mainly on the core functionality at the moment. E.g. I don't think I'll include any mention of Groovy stuff for 2.0.0. We could put it up on the uimaFIT Google Code wiki, in the Apache UIMA wiki [1] or wherever you find a better place. In any case, I'll put a link on the Google Code page.

Cheers,

-- Richard

[1] https://cwiki.apache.org/UIMA/uimafit.html

Jim - FooBar();

Apr 24, 2013, 2:49:49 PM
to uimafi...@googlegroups.com
Hey Richard,

On 24/04/13 18:13, Richard Eckart de Castilho wrote:
> Still, when connecting these function, it ends up in a being representable CAS -> CAS function. So when I can inject a CAS -> CAS function in some way, it's possible to implement that internally into the 3 stages. Probably it is also possible to use local deftype/defrecord or even access ones that have been defined outside somewhere using the Classloader stuff we were discussing.

aaa I think I see what you mean...if I'm understanding correctly, you'd
prefer passing a single function that perhaps wraps the
defrecord/deftype and does more or less the same but all bundled
up...hmm...that is certainly a valid concern but I guess you can already
do that :) all I need to do is to null-check the 'CAS -> input' & output
-> CAS fns in my UIMAProxy.java and then the consumer can pass nils for
those and only leave the actual component which as you said will take
care of everything...have I understood correctly?
If yes, I actually like it and if you notice my new UIMAProxy.java
you'll see that you can now safely pass nulls for all but the central fn
which is the annotator function itself...

> I gather there's also the other thing where you want to do the
>
> - input -> CAS
> - CAS -> CAS (UIMA process)
> - CAS -> output
>
> In that case, I suppose nothing has to be injected into the UIMA component.

hmmm...this is the point where I'm not an expert :) Could you explain
this a bit more? What case is this? Is this when you're running a
pipeline of many annotators? I've clearly not thought about this...


> Or maybe I'm understanding only half or nothing of what you said As I said, quite naive when it comes to functional programming
no worries, as I said I'm quite naive when it comes to UIMA...I'm slowly
getting to grips with it...perhaps that is the reason I don't like it so
much...all this XML/reflection makes me uncomfortable...;)

> I had a look at your uima.clj file. I can even understand some things and give some hints:
>
> (def jc (org.uimafit.factory.JCasFactory/createJCas
> (org.uimafit.factory.TypeSystemDescriptionFactory/createTypeSystemDescription)))
>
> You should be able to reduce that to
>
> (def jc (org.uimafit.factory.JCasFactory/createJCas))

thanks! I committed it :)

> You could change your inject-annotation function to use CasUtil.getAnnotationType(CAS, Class) internally. That way you should be able to pass Annotation.class in instead of "uima.tcas.Annotation".

awesome! I will do that... :)
> What does "alt-implementation" do?

UIMA lets you specify an alternative implementation instead of the
default one...essentially that means you can write your own
UIMAFramework.java and use that...It's not terribly important, in fact
most of the stuff in that namespace is redundant...I was just spitting
out code as I was going through the docs...

> I think it would be great. As I said, I'm still working on the uimaFIT 2.0.0 release and docs are in the works there, however, mainly on the core functionality at the moment. E.g. I don't think I'll include any mention of Groovy stuff for 2.0.0. We could put it up on the uimaFIT Google Code wiki, in the Apache UIMA wiki [1] or wherever you find a better place. In any case, I'll put at a link on the Google Code page.

Ok cool...I'll do it soon...The title will probably be something like
"From Clojure to UIMAFIT to UIMA and back"... ;)

thanks again :)

Jim

Richard Eckart de Castilho

Apr 24, 2013, 3:16:15 PM
to uimafi...@googlegroups.com
Hi,

> On 24/04/13 18:13, Richard Eckart de Castilho wrote:
>> Still, when connecting these function, it ends up in a being representable CAS -> CAS function. So when I can inject a CAS -> CAS function in some way, it's possible to implement that internally into the 3 stages. Probably it is also possible to use local deftype/defrecord or even access ones that have been defined outside somewhere using the Classloader stuff we were discussing.
>
> aaa I think I see what you mean...if I'm understanding correctly, you'd prefer passing a single function that perhaps wraps the defrecord/deftype and does more or less the same but all bundled up...hmm...that is certainly a valid concern but I guess you can already do that :) all I need to do is to null-check the 'CAS -> input' & output -> CAS fns in my UIMAProxy.java and then the consumer can pass nils for those and only leave the actual component which as you said will take care of everything...have I understood correctly?
> If yes, I actually like it and if you notice my new UIMAProxy.java you'll see that you can now safely pass nulls for all but the central fn which is the annotator function itself…

I didn't look at the proxy (I'm terribly lazy to look up stuff if there's no link ;) but it sounds like we're on the same level here.

>> I gather there's also the other thing where you want to do the
>>
>> - input -> CAS
>> - CAS -> CAS (UIMA process)
>> - CAS -> output
>>
>> In that case, I suppose nothing has to be injected into the UIMA component.
>
> hmmm...this is the point where I'm not an expert :) Could you explain this a bit more? What case is this? Is this when you're running a pipeline of many annotators? I've clearly not thought about this…

Running multiple annotators is actually a no-brainer in UIMA because it supports aggregates. You can just wrap up a set of annotators as a new annotator and then we're back in the original situation of running just one.

No, what I was thinking is that you actually turn an UIMA component into a function. E.g. you pass a list of sentences into the function, which represents e.g. an UIMA tokenizer + named entity detector, and get back a list of named entities (or something more useful).
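The idea of hiding a component behind an ordinary function could be sketched in plain Java as follows. `ToyCas`, `asFunction` and `detectCapitalised` are made-up names, and the "named entity detector" is a toy that simply picks capitalised words; a real version would create a CAS and call an analysis engine's process method instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ComponentAsFunctionDemo {

    /** Hypothetical stand-in for a CAS; not the UIMA class. */
    public static class ToyCas {
        public final String text;
        public final List<String> entities = new ArrayList<>();

        public ToyCas(String text) {
            this.text = text;
        }
    }

    /** Wraps a CAS-mutating component so callers only see sentences in, results out. */
    static Function<List<String>, List<String>> asFunction(Consumer<ToyCas> component) {
        return sentences -> {
            List<String> results = new ArrayList<>();
            for (String sentence : sentences) {
                ToyCas cas = new ToyCas(sentence); // input -> CAS
                component.accept(cas);             // CAS -> CAS (the "process" call)
                results.addAll(cas.entities);      // CAS -> output
            }
            return results;
        };
    }

    /** Toy "named entity detector": marks every capitalised word as an entity. */
    static void detectCapitalised(ToyCas cas) {
        for (String word : cas.text.split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                cas.entities.add(word);
            }
        }
    }

    public static void main(String[] args) {
        Function<List<String>, List<String>> ner =
                asFunction(ComponentAsFunctionDemo::detectCapitalised);
        List<String> sentences = new ArrayList<>();
        sentences.add("we met Jens in Darmstadt");
        sentences.add("nothing here");
        System.out.println(ner.apply(sentences));
    }
}
```

Nothing is injected into the component here; all the adapter logic lives in the wrapper around it.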

>> Or maybe I'm understanding only half or nothing of what you said As I said, quite naive when it comes to functional programming
> no worries, as I said I'm quite naive when it comes to UIMA...I'm slowly getting to grips with it...perhaps that is the reason I don't like it so much...all this XML/reflection makes me uncomfortable…;)

Hehe. XML is… borderline but the reflection is what's saving us here.

>> What does "alt-implementation" do?
>
> UIMA lets you specify an alternative implementation instead of the default one...essentially that means you can write your own UIMAFramework.java and use that...It's not terribly important, in fact most of the stuff in that namespace are reduntant…I was just spitting out code as I was going through the docs...

Hardly used I think. I only once descended into the depths of the framework composition, when writing the experimental Spring support in uimaFIT. I used some evil reflection hack (yeah… not the white reflection used elsewhere in uimaFIT, but the dark black one) to patch in Spring support. Maybe I should have used that instead. I guess that happens when reading code instead of documentation…

>> I think it would be great. As I said, I'm still working on the uimaFIT 2.0.0 release and docs are in the works there, however, mainly on the core functionality at the moment. E.g. I don't think I'll include any mention of Groovy stuff for 2.0.0. We could put it up on the uimaFIT Google Code wiki, in the Apache UIMA wiki [1] or wherever you find a better place. In any case, I'll put at a link on the Google Code page.
>
> Ok cool...I'll do it soon...The title will probably be something like "From Clojure to UIMAFIT to UIMA and back"... ;)

Looking forward to it :)

Cheers,

-- Richard

Jim - FooBar();

Apr 24, 2013, 3:44:13 PM
to uimafi...@googlegroups.com
On 24/04/13 20:16, Richard Eckart de Castilho wrote:
> I didn't look at the proxy (I'm terribly lazy to look up stuff if there's no link;) but it sounds like we're on the same level here.

I'm so sorry I should have included the link:

https://github.com/jimpil/hotel-nlp/tree/master/src/hotel_nlp/externals

in this folder you'll find the UIMAProxy.java and the uima.clj
namespace...I wouldn't spend too much time on uima.clj because, as I said, it
has a lot of redundant code...if you're so inclined, look at the very
last bit (the code block wrapped in 'comment'), which shows an example
usage with the tokenizer I mentioned earlier. If you get that far you
can safely skip the first 2 lines, which basically load and use the namespace...

> No, what I was thinking is that you actually turn an UIMA component into a function. E.g. you pass a list of sentences into the function, which represents e.g. an UIMA tokenizer + named entity detector, and get back a list of named entities (or something more useful).

well that was my plan for today but I got carried away...now that I've
got the infrastructure I can try to turn the HMM-POS-TAGGER of UIMA into
a hotel_nlp component or at least a function. However, since it expects
a JCas *with* Sentence + Token annotations I first needed a way to
inject the annotations in...now that this is out of the way I can focus
on the rest ;)

Jim

Jim - FooBar();

Apr 25, 2013, 4:48:04 PM4/25/13
to uimafi...@googlegroups.com
On 25/04/13 06:54, Richard Eckart de Castilho wrote:
Hi Jim,

I was looking at your motivation for the hotel_nlp project. It looks like U-Compare wasn't what you were looking for, but maybe DKPro Core [1] is. It's a library/collection of interoperable UIMA components that you can mix and match as you desire. We wrap many state-of-the-art tools. The components have been made to look very uniform and are all based on the same type system.

DKPro Core was developed from a researcher background. Making all these tools mixable in a convenient way is part of our daily business ;)

Cheers,

-- Richard

[1] http://code.google.com/p/dkpro-core-asl/ 

This is great info! thanks a lot... :)

btw, I've got another problem...for the purposes of the uima+clojure tutorial I'm working with the UIMA HMMTagger. Now, this is a proper UIMA component so it comes with XML descriptors and everything. On the uimafit website it shows this:
AnalysisEngine tagger = createAnalysisEngine("mypackage.MyTagger");

This will work only if the XML is in the same path as the class file. From the little that I've seen, not many components actually do that...usually the XML is at the root directory in the jar. What do I do in that case? I tried 'createAnalysisEngineFromPath' but it literally tries my file-system and not the project classpath! What is the easiest way of instantiating a proper UIMA component from uimafit?

again, thanks a lot :)

Jim




Richard Eckart de Castilho

Apr 26, 2013, 4:17:45 AM4/26/13
to uimafi...@googlegroups.com
An UIMA XML-descriptor is often not stand-alone. When it's loaded it must be able to resolve any imported descriptors, which may cause additional problems.

I'd probably ignore the XML descriptor and use the normal AnalysisEngineFactory.createPrimitive(). The downside is, that you have to specify *all* mandatory parameters in the call, because uimaFIT doesn't know the default values. Further, if the component uses external resources, you should have a look here [1].

Placing stuff at the root of the classpath is not a good idea anyway: too much potential for conflicts and issues when using classpath scanning [2].

Cheers,

-- Richard

[1] https://code.google.com/p/uimafit/wiki/ExternalResources#Regular_UIMA_components
[2] http://static.springsource.org/spring/docs/3.0.x/reference/resources.html (Section 4.7.2.3)

Jim - FooBar();

Apr 26, 2013, 1:32:31 PM4/26/13
to uimafi...@googlegroups.com
On 26/04/13 09:17, Richard Eckart de Castilho wrote:
An UIMA XML-descriptor is often not stand-alone. When it's loaded it must be able to resolve any imported descriptors, which may cause additional problems.

This makes sense...However, this particular component comes prepackaged as an UIMA component  - in other words, this particular lib doesn't make sense outside of UIMA. Hence, I'd expect it to be packaged in a fully self-contained and compatible manner...that is, if I load 1 descriptor which needs other ones it should know where to find them without me doing anything. Much like the relationship between Java classes...am I being too optimistic?


I'd probably ignore the XML descriptor and use the normal AnalysisEngineFactory.createPrimitive(). The downside is, that you have to specify *all* mandatory parameters in the call, because uimaFIT doesn't know the default values. Further, if the component uses external resources, you should have a look here [1].

createPrimitive() is my fallback for this particular component...I'd really prefer to go down the official route for the sake of clarity and since the descriptors exist let's use them... :) 
btw, from the documentation I understand that there are no external dependencies unless you load the aggregate engine which needs to pull in the WhitespaceTokenizer. But again, I'd expect that it will find it...you know better of course  ;) 

Placing stuff at the root of the classpath is not a good idea anyway: too much potential for conflicts and issues when using classpath scanning

    tell me about it...still though there must be a way to get hold of
    it... I tried using getResourceAsStream etc but I get nil! :(

Jim

Jim - FooBar();

Apr 26, 2013, 1:45:56 PM4/26/13
to uimafi...@googlegroups.com
I just fetched the WhitespaceTokenizer from maven, opened up the jar and what do I see? The xml descriptor of the component is again at the root of the jar! I don't think this is a coincidence...What is the official way of instantiating ready-made components? I seem unable to read that XML no matter what I try...I guess I could try moving it in the same package as  the class manually but that is not really a viable solution...

Jim

Richard Eckart de Castilho

Apr 27, 2013, 3:53:39 AM4/27/13
to uimafi...@googlegroups.com
If the descriptor is at the root of the JAR, something like

AnalysisEngine tagger = createAnalysisEngine("NameOfDescriptor");

should work (no .xml and no package name of the component class).
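
A rough sketch of why the bare name works, assuming (as I understand UIMA's import-by-name convention) that a dotted descriptor name is turned into a classpath path by replacing dots with slashes and appending .xml. The class and method names below are made up for illustration and are not part of UIMA or uimaFIT; the temp directory merely stands in for a jar that has the descriptor at its root.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DescriptorNameLookup {

    /** Maps a dotted descriptor name to a classpath resource path (assumed convention). */
    static String toResourcePath(String descriptorName) {
        return descriptorName.replace('.', '/') + ".xml";
    }

    /** Returns the URL of the descriptor as seen by the given class loader, or null if absent. */
    static URL locate(ClassLoader loader, String descriptorName) {
        return loader.getResource(toResourcePath(descriptorName));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a jar: a directory with the descriptor at its root.
        Path root = Files.createTempDirectory("fake-jar");
        Files.write(root.resolve("HmmTaggerAggregate.xml"),
                "<analysisEngineDescription/>".getBytes(StandardCharsets.UTF_8));

        try (URLClassLoader loader = new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
            // Bare name: resolves to "HmmTaggerAggregate.xml" at the root.
            System.out.println(locate(loader, "HmmTaggerAggregate") != null);
            // Package-qualified name: resolves to "mypackage/HmmTaggerAggregate.xml", which is not there.
            System.out.println(locate(loader, "mypackage.HmmTaggerAggregate") != null);
        }
    }
}
```

Under that assumption, adding the component's package name would send the lookup into a sub-directory where the descriptor does not live, which would match what Jim saw.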

-- Richard

Jim - FooBar();

Apr 28, 2013, 6:18:48 AM4/28/13
to uimafi...@googlegroups.com
thanks a lot Richard...it worked :)

Jim

Jim - FooBar();

Apr 29, 2013, 12:49:49 PM4/29/13
to uimafi...@googlegroups.com
Hi again,

Richard, first of all I ought to say that I appreciate your help so far enormously...I really hate to break your b***s, but in my defense it's only because my b***s are shattered by now! There are problems on top of problems on top of problems...if you don't mind, let me explain:

  1. Your suggestion to read the xml without the .xml  suffix worked like a charm so that's good. However, even with the official descriptor I get 'deprecated-usage' warnings (see below).
  2. It is impossible to read the actual models the HmmTagger needs from the jar. IMPOSSIBLE! The only way is to extract them on to your file-system and use them from there.
  3. Even if I do extract the models on my disk, my code reaches the point where .process is called but then a NPE is thrown and a truly massive stacktrace accompanies it! Presumably this is related to the warnings earlier... :(

example code-snippets follow:

(def config (to-array ["NGRAM_SIZE" n
                       "ModelFile" "/home/sorted/clooJWorkspace/hotel-nlp/resources/pretrained_models/BrownModel.dat"]))

(def tagger (AnalysisEngineFactory/createAnalysisEngine "HmmTaggerAggregate" config)) ;; using the official xml descriptor gives warnings

(def jc (doto (JCasFactory/createJCas)
          (.setDocumentText "My name is Jim and I like pizzas a lot !")))

(.process tagger jc) ;; despite the warnings the code does reach this point before the SEVERE exception...


[1] Apr 29, 2013 5:27:38 PM org.apache.uima.analysis_engine.impl.AnalysisEngineDescription_impl checkForInvalidParameterOverrides
WARNING: The aggregate text analysis engine "HmmTaggerTAE" has declared the parameter NGRAM_SIZE, but has not declared any overrides.This usage is deprecated.
Apr 29, 2013 5:27:38 PM org.apache.uima.analysis_engine.impl.AnalysisEngineDescription_impl checkForInvalidParameterOverrides
WARNING: The aggregate text analysis engine "HmmTaggerTAE" has declared the parameter ModelFile, but has not declared any overrides.This usage is deprecated.
Apr 29, 2013 5:27:38 PM WhitespaceTokenizer initialize
INFO: "Whitespace tokenizer successfully initialized"
The used model is:/home/sorted/clooJWorkspace/hotel-nlp/resources/pretrained_models/BrownModel.dat
Apr 29, 2013 5:27:42 PM WhitespaceTokenizer typeSystemInit
INFO: "Whitespace tokenizer typesystem initialized"
Apr 29, 2013 5:27:42 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer starts processing"
Apr 29, 2013 5:27:42 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(407)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.   
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)

I'm using all the official techniques/guides/components and still I cannot get them to work...The HmmTaggerAggregate is advertised as a ready-to-use uima-component (because it falls back to the WhitespaceTokenizer if it doesn't find any Token annotations). I've been trying for 4 days now just to tag some text (as the uimafit website demonstrates). Can it be that I'm that stupid? Where is all the interoperability and smooth integration of components?

To top all that, I cannot find a single example of using the HMMTagger in a real project. I've spent endless hours looking...maybe that would give a clue as to how to instantiate it and use it...its documentation I've read probably more than 30 times...

Can anyone shed some light please? This is getting terribly frustrating... :(
again, I am truly thankful for your time

Jim

Richard Eckart de Castilho

Apr 29, 2013, 1:11:12 PM4/29/13
to uimafi...@googlegroups.com
Hi Jim,

well, I have to admit, I'm really the wrong person to ask about the HMMTagger in particular. You'd better ask that on the UIMA users list. I suppose the tagger may simply be relatively unmaintained.

This is maybe not the answer you wanted, but I don't want to go digging down into the official HMMTagger now to try and fix it.

I could offer you the DKPro Core components as an alternative… we've got the OpenNLP tagger, TreeTagger, Stanford POS Tagger, the ClearNLP POS Tagger, Learning-based-Java POS Tagger… there's actually quite smooth interoperability and integration. Also, I can help much better with problems ;) I doubt the HMMTagger is better than any of the previously mentioned ones, so we didn't bother to integrate it into DKPro Core. We actually could, just for the sake of completeness. Btw. DKPro Core does support loading models from the classpath. We even provide them as Maven artifacts!

-- Richard

Richard Eckart de Castilho

Apr 29, 2013, 1:16:08 PM4/29/13
to uimafi...@googlegroups.com
On 29.04.2013 at 18:49, "Jim - FooBar();" <jimpi...@gmail.com> wrote:

> Apr 29, 2013 5:27:42 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(407)
> SEVERE: Exception occurred
> org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
> at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)

Btw., this stack trace appears to be cropped. At least in uima 2.4.0, this line reads

throw new AnalysisEngineProcessException(
AnalysisEngineProcessException.ANNOTATOR_EXCEPTION, null, e);

so I'd expect there must be some chained exception that you didn't tell us.
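
For what it's worth, when a trace is too long to read, the chained exception can be dug out programmatically. A minimal sketch using only java.lang.Throwable; the class name RootCause and the exception messages are made up for illustration and do not come from UIMA.

```java
class RootCause {

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical wrapping, mimicking a framework exception around an NPE.
        Exception wrapped = new RuntimeException("Annotator processing failed.",
                new NullPointerException("hypothetical inner failure"));
        // Print only the innermost exception and its top frame instead of the full trace.
        Throwable root = of(wrapped);
        System.out.println(root);
        if (root.getStackTrace().length > 0) {
            System.out.println("  at " + root.getStackTrace()[0]);
        }
    }
}
```

From Clojure the same loop is just repeated calls to .getCause on the caught exception.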

-- Richard

Jim - FooBar();

Apr 29, 2013, 1:18:42 PM4/29/13
to uimafi...@googlegroups.com
I see...well, the only reason I'm doing this is to provide a complete
UIMA->Clojure & Clojure->UIMA tutorial to you and potentially some new
capabilities to my little library - not for actual real work...For
serious work you've seriously tempted me to try out DKPro...

anyway the HMMTagger and the WhiteSpace tokenizer are still version
2.3.1...I guess I chose the wrong components to do the demonstration...

Your other email just came...

> Btw., this stack trace appears to be cropped. At least in uima 2.4.0, this line reads
>
> throw new AnalysisEngineProcessException(
> AnalysisEngineProcessException.ANNOTATOR_EXCEPTION, null, e);
>
> so I'd expect there must be some chained exception that you didn't tell us.

as I said it's a truly massive stacktrace - completely unmanageable....do
you want me to paste it in an empty email?

Jim



Richard Eckart de Castilho

Apr 29, 2013, 1:20:06 PM4/29/13
to uimafi...@googlegroups.com
@Stacktrace: You know http://pastebin.com?

-- Richard

Jim - FooBar();

Apr 29, 2013, 1:20:23 PM4/29/13
to uimafi...@googlegroups.com
On 29/04/13 18:18, Jim - FooBar(); wrote:
> as I said it's a truly massive stacktrace - completely
> unmanageable....do you want me to paste it in an empty email?
>
> Jim

here it is:
----------------------------------------------------------------------------
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
at hotel_nlp.externals.uima$uima_hmm_postag.doInvoke(NO_SOURCE_FILE:13)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at hotel_nlp.externals.uima$eval3726.invoke(NO_SOURCE_FILE:1)
at clojure.lang.Compiler.eval(Compiler.java:6619)
at clojure.lang.Compiler.eval(Compiler.java:6582)
at clojure.core$eval.invoke(core.clj:2852)
at clojure.main$repl$read_eval_print__6588$fn__6591.invoke(main.clj:259)
at clojure.main$repl$read_eval_print__6588.invoke(main.clj:259)
at clojure.main$repl$fn__6597.invoke(main.clj:277)
at clojure.main$repl.doInvoke(main.clj:277)
at clojure.lang.RestFn.invoke(RestFn.java:1096)
at clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__589.invoke(interruptible_eval.clj:56)
at clojure.lang.AFn.applyToHelper(AFn.java:159)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:617)
at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1788)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke(interruptible_eval.clj:41)
at clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__630$fn__633.invoke(interruptible_eval.clj:171)
at clojure.core$comp$fn__4154.invoke(core.clj:2330)
at clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__623.invoke(interruptible_eval.clj:138)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
at org.apache.uima.cas.impl.CASImpl.createFS(CASImpl.java:474)
at org.apache.uima.cas.impl.CASImpl.createAnnotation(CASImpl.java:3916)
at org.apache.uima.annotator.WhitespaceTokenizer.createAnnotation(WhitespaceTokenizer.java:230)
at org.apache.uima.annotator.WhitespaceTokenizer.process(WhitespaceTokenizer.java:143)
at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
... 32 more

Apr 29, 2013 5:27:42 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(275)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
at hotel_nlp.externals.uima$uima_hmm_postag.doInvoke(NO_SOURCE_FILE:13)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at hotel_nlp.externals.uima$eval3726.invoke(NO_SOURCE_FILE:1)
at clojure.lang.Compiler.eval(Compiler.java:6619)
at clojure.lang.Compiler.eval(Compiler.java:6582)
at clojure.core$eval.invoke(core.clj:2852)
at clojure.main$repl$read_eval_print__6588$fn__6591.invoke(main.clj:259)
at clojure.main$repl$read_eval_print__6588.invoke(main.clj:259)
at clojure.main$repl$fn__6597.invoke(main.clj:277)
at clojure.main$repl.doInvoke(main.clj:277)
at clojure.lang.RestFn.invoke(RestFn.java:1096)
at clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__589.invoke(interruptible_eval.clj:56)
at clojure.lang.AFn.applyToHelper(AFn.java:159)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:617)
at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1788)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke(interruptible_eval.clj:41)
at clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__630$fn__633.invoke(interruptible_eval.clj:171)
at clojure.core$comp$fn__4154.invoke(core.clj:2330)
at clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__623.invoke(interruptible_eval.clj:138)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
at org.apache.uima.cas.impl.CASImpl.createFS(CASImpl.java:474)
at org.apache.uima.cas.impl.CASImpl.createAnnotation(CASImpl.java:3916)
at org.apache.uima.annotator.WhitespaceTokenizer.createAnnotation(WhitespaceTokenizer.java:230)
at org.apache.uima.annotator.WhitespaceTokenizer.process(WhitespaceTokenizer.java:143)
at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
... 32 more

NullPointerException org.apache.uima.cas.impl.CASImpl.createFS (CASImpl.java:474)
----------------------------------------------------------------------------------------------------------------------------------------

Jim - FooBar();

Apr 29, 2013, 1:21:41 PM4/29/13
to uimafi...@googlegroups.com
aaaa yes of course!!! How could I forget? I should chill....hehe :)

Jim

Richard Eckart de Castilho

Apr 29, 2013, 1:31:56 PM4/29/13
to uimafi...@googlegroups.com
Hi,

ok, so this is the relevant section:

On 29.04.2013 at 19:20, "Jim - FooBar();" <jimpi...@gmail.com> wrote:

> Caused by: java.lang.NullPointerException
> at org.apache.uima.cas.impl.CASImpl.createFS(CASImpl.java:474)
> at org.apache.uima.cas.impl.CASImpl.createAnnotation(CASImpl.java:3916)
> at org.apache.uima.annotator.WhitespaceTokenizer.createAnnotation(WhitespaceTokenizer.java:230)
> at org.apache.uima.annotator.WhitespaceTokenizer.process(WhitespaceTokenizer.java:143)

The problem is, that the TokenAnnotation and SentenceAnnotations are not in the type system.

public static final String TOKEN_ANNOTATION_NAME = "org.apache.uima.TokenAnnotation";
public static final String SENTENCE_ANNOTATION_NAME = "org.apache.uima.SentenceAnnotation";

Those types are defined inline in the descriptor of the WhitespaceTokenizer. This is invisible to
uimaFIT, and that may be the problem. You are probably creating the CAS via uimaFIT's
TypeSystemDescriptionFactory.createTypeSystemDescription(), which does not see those types.
I suppose at some point you create an AnalysisEngine from the HMMTagger aggregate description.
Try creating a CAS from that AnalysisEngine; it should also contain the types defined within
the analysis engine descriptors.

-- Richard

Jim - FooBar();

Apr 29, 2013, 2:11:18 PM4/29/13
to uimafi...@googlegroups.com
On 29/04/13 18:31, Richard Eckart de Castilho wrote:
> Those types are defined inline in the descriptor of the WhitespaceTokenizer. This is invisible to
> uimaFIT, and that may be the problem. It's probably a problem of you creating the CAS using
> uimaFIT TSDF.createTypeSystemDescription(). I suppose at some point you create an AnalysisEngine
> from the HMMTagger aggregate description. Try creating a CAS from that AnalysisEngine, that should
> also contain the types defined within the analysis engine descriptors.

I don't know what to say... You are so right!!!! It turns out that if
one has a descriptor it's better to go through the
UIMAFramework.produceBLAHBLAH() road instead of uimafit...suddenly all
warnings disappeared and the engine ran just fine! YAY! :)

I am speechless Richard...I do promise not to pester you from now on but
after demonstrating the extent of your UIMA knowledge I cannot exactly
make any hard guarantees!
You can expect that article/tutorial within the week... :)


THANKS A MILLION... ;)

Jim

ps: in the unlikely case you're in the Greater Manchester area (UK) I'm
buying beer ... :):):)

Jim

May 1, 2013, 10:04:15 AM5/1/13
to uimafi...@googlegroups.com
Hi Richard,

just wanted to let you know that I've started putting together the
UIMA+Clojure demo/tutorial...:) Actually, more than half is ready...I'm
using markdown format - is that ok for your purposes?

thanks,
Jim


Richard Eckart de Castilho

May 1, 2013, 11:54:46 AM5/1/13
to uimafi...@googlegroups.com
Hi Jim,

If the format is human readable/interpretable that's fine. Looking forward to it.

-- Richard

Jim - FooBar();

May 2, 2013, 10:54:25 AM5/2/13
to uimafi...@googlegroups.com
Hey Richard,

you can download the UIMA+CLojure demo/tutorial from here:

https://dl.dropboxusercontent.com/u/45723414/demo.mdown

I suggest you copy the contents of the file and paste them here:
http://markdownlivepreview.com/

Awaiting feedback... :)

Jim

Jim - FooBar();

May 8, 2013, 12:21:31 PM5/8/13
to uimafi...@googlegroups.com

Right, I've found Jens's repo - it's here: https://github.com/jenshaase/uimaclj/blob/master/java/uimaclj/core/CljAnnotator.java

From what I can see he's using a single function, that's why the code is much shorter+cleaner...However, I do like the fact that he's passing the actual function (if I've understood correctly) rather than its class-name. I'm starting to think I can do the same thing found in my code but with Jens's approach (passing 3 external-resources)...Do you think that would work?

Jim

Ok, scrap that...He's doing much more than I originally thought. I think I understand but I don't see any substantial benefit to be honest...I've bookmarked his repo but I think I'll stick with my version for now...I could, however, link to his repo from the demo as an alternative proxying solution.


Jim

Richard Eckart de Castilho

May 8, 2013, 1:40:59 PM5/8/13
to uimafi...@googlegroups.com
On 08.05.2013 at 11:35, Jim <jimpi...@gmail.com> wrote:

> On 08/05/13 09:29, Richard Eckart de Castilho wrote:
>> Hi Jim,
>>
>> I have read the text and do have some comments.
>
> Hi again,
> comments are very much welcome... :)
>
>> "UIMA relies heavily on the java.lang.reflect" - that's not true. uimaFIT does, UIMA doesn't very much. What do you mean?
>
> It's my understanding that UIMA uses Class.forName() to instantiate components. This is a reflective operation, isn't it? What I really mean is that UIMA does not accept object instances but rather Class names…This caused me several problems and I just wanted to make it clear from the start...

Ah, well. Yes, classes are instantiated using reflection. But compared to the stuff that uimaFIT does, this is really harmless ;) (Inspection of fields, injecting values even into private fields, the uimaFIT Maven plugin in uimaFIT 2.0.0 even parses the source code to extract JavaDoc!)

>> there are several references to external stuff like the "2 sides of uimaFIT", those should be links
>>
>> --
>>
>> link to definition of "proxy" for the non-Clojure-guru (like me)
>
> you're right, I'll add them... :)
>
>> UIMAProxy - no need to keep a reference to the context in initialize(), just call getUIMAContext or getContext later.
> right, I see...It was Jens who prototyped this class and so I thought it was necessary as he knows more about UIMA than me...I'll remove the extra arg...
>
>> Have you thought about combining the resource-based approach Jens used for injecting the Clojure functions into the UIMAProxy? Too complicated, too much magic?
>
> I've not had a look at it to be honest...The current approach I find clear and understandable and so I pursued it...Is Jens's resource-based approach ready and fully functional? Do we have a link?
>
>> Here you just redefine "JCasUtil/selectCovered" as "select-annotations", right? Are there "static imports" in Clojure so that you could just write "selectCovered" and save the redefinition?
>>
>> (defn select-annotations [^JCas jc ^Class klass start end]
>> (JCasUtil/selectCovered jc klass start end))
> No, there are no static imports in Clojure...That is why many of my functions are essentially hiding the java interop and nothing more...However, since I'm providing most of the type hints there is very little to no overhead from the extra redefinition. Moreover, whenever you see a 'definline' it means that the function body will be inlined with the code that calls it at compile time so the redefinition essentially disappears! The binding is still there, but the call to it has vanished...
>
>> Rename "inject-annotation" to "createAnnotation" to stay in-line with the UIMA naming scheme?
>
> how about "create-annotation" to stay in-line with Clojure's naming conventions as well? camelCase is not considered eye-friendly to most Clojurians…

Sure, sounds good.

>> UIMAProxy/PARAM_EXTFN (-> jcas-input-extractor class .getName)
>>
>> uimaFIT 1.4.0 should support passing a "class" parameter (no need to get the class name as string). It should automagically convert it to the class name string (via Spring magic).
>
> really? How is that possible? I thought only strings can be passed to annotated fields. Are you saying I can declare UIMAProxy/PARAM_EXTFN to be a field of type Class thus allowing me to write
>
> UIMAProxy/PARAM_EXTFN (class jcas-input-extractor) ?

No, many things can be passed to annotated fields. In principle, anything works that has a constructor taking a single string argument and a toString method whose result can sensibly fill that argument. A good example is File:

File a = new File("lala");
File b = new File(a.toString());
a.equals(b); // true

Then there is support for all kinds of collections and for special classes, e.g. "Class". This is mostly handled by Spring and so-called PropertyEditor classes (which are actually part of the good old Java Bean specs). uimaFIT brings some customized PropertyEditor as well.

So it should be no problem that you pass a File or a Class instance as an argument to the factory method. The field annotated with @ConfigurationParameter in your component need not be of the same type. E.g. you can pass a File to the factory method when the annotated field is actually a string, or vice versa. I think you cannot pass the String "1" and fill an int field with that, but it may even work.
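
To make the string round trip concrete, here is a minimal sketch of the idea only. It is not how Spring or uimaFIT actually implement the conversion (they go through PropertyEditor classes); the class StringRoundTrip and its convert method are made up for illustration.

```java
import java.io.File;

class StringRoundTrip {

    /**
     * Turns the value into a String via toString() and rebuilds it as targetType
     * using that type's public single-String constructor.
     */
    static <T> T convert(Object value, Class<T> targetType) {
        try {
            return targetType.getConstructor(String.class).newInstance(value.toString());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "No usable String constructor on " + targetType.getName(), e);
        }
    }

    public static void main(String[] args) {
        File a = new File("lala");

        // File -> String -> File: the copy is equal to the original.
        File b = convert(a, File.class);
        System.out.println(a.equals(b));

        // File -> String: what would happen when the target field is a String.
        String path = convert(a, String.class);
        System.out.println(path);
    }
}
```

Any type without a public String constructor ends up in the IllegalArgumentException branch here, which is roughly why special cases like Class or collections need dedicated editors.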

>> Towards the end, in the UIMA -> Clojure part, I got quite lost. I believe I follow the general idea, but I'm not just up to speed with Clojure.
>
> Is there something in particular that confused you? something that I can make clearer perhaps?

I think I got distracted by the automatic detection if the WhitespaceTokenizer should run or not. That's probably because I don't work much with descriptors. Either, I would just create an aggregate in code containing a tokenizer and a tagger (if I really needed that), or I would just tokenize in Clojure and then run the tagger. In both cases, I know what I'm doing and do not need this auto-detection.

>> How shall we do it? You get commit access to the uimaFIT wiki? I place the tutorial in the uimaFIT wiki? You place the tutorial in markdown with your code on GitHub and we link to it? I'd love to open a "language zoo" with helper classes and examples for different languages in Apache uimaFIT, but I don't have the resources right now to invest time there. Would you consider licensing the tutorial and the associated source code under any of the Category A licenses listed here?: http://www.apache.org/legal/3party.html
>
> Hmmm...I've not thought about this! I'd say the simplest option would be for me to create a project on github and then you can link to it from the uimaFIT wiki. Alternatively, if you want to pursue your "language zoo" project in the future, the BSD license is fine by me... :)

Alright. Drop it on GitHub, attach a license and I'll set up a link.

-- Richard

Jim - FooBar();

May 9, 2013, 10:08:46 AM5/9/13
to uimafi...@googlegroups.com
Hi Richard,


> UIMAProxy/PARAM_EXTFN (-> jcas-input-extractor class .getName)
>
> uimaFIT 1.4.0 should support passing a "class" parameter (no need to get the class name as string). It should automagically convert it to the class name string (via Spring magic).
you were right...it works!

but the biggest benefit is not less-typing but the fact that now
UIMAProxy.java needs to know nothing about Clojure's dynamic classloader
because we've eliminated the Class.forName() invocation...cool stuff :)

Jim

ps: I'll create the repo with the new code tonight or tomorrow :)

Richard Eckart de Castilho

May 9, 2013, 10:42:55 AM5/9/13
to uimafi...@googlegroups.com
>> UIMAProxy/PARAM_EXTFN (-> jcas-input-extractor class .getName)
>>
>> uimaFIT 1.4.0 should support passing a "class" parameter (no need to get the class name as string). It should automagically convert it to the class name string (via Spring magic).
> you were right...it works!
>
> but the biggest benefit is not less-typing but the fact that now UIMAProxy.java needs to know nothing about Clojure's dynamic classloader because we've eliminated the Class.forName() invocation...cool stuff :)

Ehm… then there's something unexpected here. The one who would now need to know about the dynamic class loader is the Spring framework. To pass in a Class parameter to the UIMAProxy, it is implicitly converted from Class to String and then again from String to Class. At least in the second step, access to the dynamic class loader is necessary. If Spring has it for some reason, that's great. But I wonder why Class.forName() in UIMAProxy then was a problem just using the default class loader.
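To make the Class -> String -> Class round trip concrete, here is a minimal stand-alone sketch in plain Java. It is not taken from uimaFIT, Spring or UIMAProxy; the class name ClassNameRoundTrip and its helper methods are hypothetical and exist only for illustration. It assumes nothing beyond the JDK and shows that the second step (String -> Class) depends entirely on which class loader is asked:

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ClassNameRoundTrip {

    /** Step 1: Class -> String, as when a Class value is stored as a string parameter. */
    static String toName(Class<?> type) {
        return type.getName();
    }

    /** Step 2: String -> Class. Succeeds only if the given loader can see the class. */
    static boolean canResolve(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** A loader whose parent is the bootstrap loader only, so application classes are invisible to it. */
    static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) {
        String name = toName(ClassNameRoundTrip.class);
        // Asking the loader that defined the class: the name resolves.
        System.out.println(canResolve(name, ClassNameRoundTrip.class.getClassLoader()));
        // Asking a loader that has never seen the class: the same name does not resolve.
        System.out.println(canResolve(name, isolatedLoader()));
    }
}
```

The same reasoning would apply to a class that only Clojure's dynamic class loader knows about: whoever turns the name back into a Class has to ask that loader, or one that delegates to it.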

-- Richard

Jim - FooBar();

May 9, 2013, 11:08:31 AM5/9/13
to uimafi...@googlegroups.com
On 09/05/13 15:42, Richard Eckart de Castilho wrote:
>>> UIMAProxy/PARAM_EXTFN (-> jcas-input-extractor class .getName)
>>>
>>> uimaFIT 1.4.0 should support passing a "class" parameter (no need to get the class name as string). It should automagically convert it to the class name string (via Spring magic).
>> you were right...it works!
>>
>> but the biggest benefit is not less-typing but the fact that now UIMAProxy.java needs to know nothing about Clojure's dynamic classloader because we've eliminated the Class.forName() invocation...cool stuff :)
> Ehm… then there's something unexpected here. The one who would now need to know about the dynamic class loader is the Spring framework. To pass in a Class parameter to the UIMAProxy, it is implicitly converted from Class to String and then again from String to Class. At least in the second step, access to the dynamic class loader is necessary. If Spring has it for some reason, that's great. But I wonder why Class.forName() in UIMAProxy then was a problem just using the default class loader.
>
> -- Richard
>

ooo right....I think I've got the explanation! My 'produce' function has
a default argument - the resource-manager. Unless the caller
specifies its own, a new default resource-manager will be created like so:

(doto (UIMAFramework/newDefaultResourceManager)
  (.setExtensionClassPath dynamic-classloader "" true))

This basically means that the dynamic-classloader is already
'registered' with UIMA (and therefore with Spring?). In other words, I'm
suspecting that the call I used to have like this:

(IFn)Class.forName(strextfn, true, dcl).newInstance();

was actually redundant as UIMA already knows about 'dcl'...I don't see
how Spring can know about Clojure's specific Classloader.

does this make sense?

Jim

Richard Eckart de Castilho

May 9, 2013, 11:44:26 AM5/9/13
to uimafi...@googlegroups.com
Am 09.05.2013 um 17:08 schrieb "Jim - FooBar();" <jimpi...@gmail.com>:

> On 09/05/13 15:42, Richard Eckart de Castilho wrote:
>>>> UIMAProxy/PARAM_EXTFN (-> jcas-input-extractor class .getName)
>>>>
>>>> uimaFIT 1.4.0 should support passing a "class" parameter (no need to get the class name as string). It should automagically convert it to the class name string (via Spring magic).
>>> you were right...it works!
>>>
>>> but the biggest benefit is not less-typing but the fact that now UIMAProxy.java needs to know nothing about Clojure's dynamic classloader because we've eliminated the Class.forName() invocation...cool stuff :)
>> Ehm… then there's something unexpected here. The one who would now need to know about the dynamic class loader is the Spring framework. To pass in a Class parameter to the UIMAProxy, it is implicitly converted from Class to String and then again from String to Class. At least in the second step, access to the dynamic class loader is necessary. If Spring has it for some reason, that's great. But I wonder why Class.forName() in UIMAProxy then was a problem just using the default class loader.
>>
>> -- Richard
>>
>
> ooo right....I think I've got the explanation! My 'produce' function has a default argument - the resource-manager. Unless the caller specifies its own, a new default resource-manager will be created like so:
>
> (doto (UIMAFramework/newDefaultResourceManager)
> (.setExtensionClassPath dynamic-classloader "" true))
>
> This basically means that the dynamic-classloader is already 'registered' with UIMA (and therefore with Spring?). In other words, I'm suspecting that the call I used to have like this:
>
> (IFn)Class.forName(strextfn, true, dcl).newInstance();
>
> was actually redundant as UIMA already knows about 'dcl'...I don't see how Spring can know about Clojure's specific Classloader.
>
> does this make sense?


Afaik, nothing special sets up a link between the PropertyEditors that Spring uses and the UIMA ResourceManager. It looks like Spring's ClassEditor is falling back to the same classloader you are using to acquire the Clojure dynamic class loader -- the one from the thread context. So apparently, by chance (or by design?), Spring uses the right one.

See also: org.springframework.beans.propertyeditors.ClassEditor.setAsText(String)
See also: org.springframework.util.ClassUtils.getDefaultClassLoader()
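
For readers without the Spring sources at hand, the fallback described here can be imitated in a few lines of plain Java. This is a sketch under assumptions, not Spring's actual code: the class DefaultLoaderSketch is hypothetical, and the order shown (thread context loader first, then the loader of the helper class itself, then the system loader) is my understanding of what ClassUtils.getDefaultClassLoader() does. The empty URLClassLoader merely stands in for Clojure's dynamic class loader:

```java
import java.net.URL;
import java.net.URLClassLoader;

final class DefaultLoaderSketch {

    /** Pick a loader with the thread context loader taking priority over all others. */
    static ClassLoader defaultClassLoader() {
        ClassLoader loader = null;
        try {
            loader = Thread.currentThread().getContextClassLoader();
        } catch (SecurityException e) {
            // No access to the thread context loader; fall through to the next candidate.
        }
        if (loader == null) {
            // Second choice: the loader that loaded this helper class.
            loader = DefaultLoaderSketch.class.getClassLoader();
        }
        if (loader == null) {
            // Last resort: the system (application) class loader.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader;
    }

    public static void main(String[] args) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        // Stand-in for a dynamic class loader installed on the current thread (assumption).
        ClassLoader dynamic = new URLClassLoader(new URL[0], original);
        try {
            current.setContextClassLoader(dynamic);
            // The lookup now yields the loader that was installed on the thread.
            System.out.println(defaultClassLoader() == dynamic);
        } finally {
            current.setContextClassLoader(original);
        }
    }
}
```

If that reading is right, any code that resolves class names through such a fallback picks up whatever loader is installed on the calling thread, which would explain why the conversion works without UIMAProxy touching the dynamic class loader explicitly.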

-- Richard

Jim - FooBar();

May 9, 2013, 11:47:11 AM5/9/13
to uimafi...@googlegroups.com
On 09/05/13 16:44, Richard Eckart de Castilho wrote:
> Am 09.05.2013 um 17:08 schrieb "Jim - FooBar();" <jimpi...@gmail.com>:
>
>> On 09/05/13 15:42, Richard Eckart de Castilho wrote:
>>>>> UIMAProxy/PARAM_EXTFN (-> jcas-input-extractor class .getName)
>>>>>
>>>>> uimaFIT 1.4.0 should support passing a "class" parameter (no need to get the class name as string). It should automagically convert it to the class name string (via Spring magic).
>>>> you were right...it works!
>>>>
>>>> but the biggest benefit is not less-typing but the fact that now UIMAProxy.java needs to know nothing about Clojure's dynamic classloader because we've eliminated the Class.forName() invocation...cool stuff :)
>>> Ehm… then there's something unexpected here. The one who would now need to know about the dynamic class loader is the Spring framework. To pass in a Class parameter to the UIMAProxy, it is implicitly converted from Class to String and then again from String to Class. At least in the second step, access to the dynamic class loader is necessary. If Spring has it for some reason, that's great. But I wonder why Class.forName() in UIMAProxy then was a problem just using the default class loader.
>>>
>>> -- Richard
>>>
>> ooo right....I think I've got the explanation! My 'produce' function has a default argument - the resource-manager. Unless the caller specifies its own, a new default resource-manager will be created like so:
>>
>> (doto (UIMAFramework/newDefaultResourceManager)
>> (.setExtensionClassPath dynamic-classloader "" true))
>>
>> This basically means that the dynamic-classloader is already 'registered' with UIMA (and therefore with Spring?). In other words, I'm suspecting that the call I used to have like this:
>>
>> (IFn)Class.forName(strextfn, true, dcl).newInstance();
>>
>> was actually redundant as UIMA already knows about 'dcl'...I don't see how Spring can know about Clojure's specific Classloader.
>>
>> does this make sense?
>
> Afaik, nothing special sets up a link between the PropertyEditors that Spring uses and the UIMA ResourceManager. It looks like Spring's ClassEditor is falling back to the same classloader you are using to acquire the Clojure dynamic class loader -- the one from the thread context. So apparently, by chance (or by design?), Spring uses the right one.
>
> See also: org.springframework.beans.propertyeditors.ClassEditor.setAsText(String)
> See also: org.springframework.util.ClassUtils.getDefaultClassLoader()
>
> -- Richard
>

aaaaa ok! I'll make sure to document that somewhere... thanks again for
your helpful insights :)

Jim

Jim - FooBar();

May 10, 2013, 11:32:08 AM5/10/13
to uimafi...@googlegroups.com
Hi Richard,

as promised, the GitHub repo is up and I've incorporated all but one of your suggestions. Basically, I left "inject-annotation!" as is because it actually does 2 things: it creates the annotation and also writes it to the CAS. Therefore, simply "create-annotation" doesn't do it justice....semantically, I think 'inject' makes more sense there...

I've also attached the 4-clause BSD Licence and linked it at the bottom of the readme.md...

https://github.com/jimpil/clojuima

let me know if you spot any problems :)

Jim

Richard Eckart de Castilho

May 11, 2013, 4:25:54 AM5/11/13
to uimafi...@googlegroups.com
Am 10.05.2013 um 17:32 schrieb Jim - FooBar(); <jimpi...@gmail.com>:

> Hi Richard,
>
> as promised, the GitHub repo is up and I've incorporated all but one of your suggestions. Basically, I left "inject-annotation!" as is because it actually does 2 things: it creates the annotation and also writes it to the CAS. Therefore, simply "create-annotation" doesn't do it justice....semantically, I think 'inject' makes more sense there...
>
> https://github.com/jimpil/clojuima

Excellent :) I've linked this from two places now:

https://code.google.com/p/uimafit/wiki/Documentation?tm=6 (Language zoo)

and from the main project page under "Blogs" (even though it's not a blog strictly speaking)

https://code.google.com/p/uimafit/

Feel free to look around on the project page and suggest a better place. I was also considering placing a link under "Who is using uimaFIT?" or opening a new section, but I didn't come up with a good title yet. So the link may still move around.

> I've also attached the 4-clause BSD Licence and linked it at the bottom of the readme.md…

At some point I would love to include this with Apache uimaFIT. By chance I just checked the license guidelines because of another piece of code using the BSD license [1]. Apparently, Apache doesn't allow including code that's under the 4-clause BSD license [2]. The advertising clause (clause 3) is the problem. While this is not imminent, I just wanted to point out that I wouldn't be able to include your work under the current license.

Cheers,

-- Richard

[1] https://issues.apache.org/jira/browse/UIMA-2471
[2] http://www.apache.org/legal/resolved.html

Jim - FooBar();

May 12, 2013, 11:16:12 AM5/12/13
to uimafi...@googlegroups.com
good stuff! :)

btw, I switched to the 3-clause BSD license... :) Thanks for pointing
that out. You should be ok now...

cheers,

Jim



On 11/05/13 09:25, Richard Eckart de Castilho wrote:
> Am 10.05.2013 um 17:32 schrieb Jim - FooBar(); <jimpi...@gmail.com>:
>
>> Hi Richard,
>>
>> as promised, the GitHub repo is up and I've incorporated all but one of your suggestions. Basically, I left "inject-annotation!" as is because it actually does 2 things: it creates the annotation and also writes it to the CAS. Therefore, simply "create-annotation" doesn't do it justice....semantically, I think 'inject' makes more sense there...
>>
>> https://github.com/jimpil/clojuima
> Excellent :) I've linked this from two places now:
>
> https://code.google.com/p/uimafit/wiki/Documentation?tm=6 (Language zoo)
>
> and from the main project page under "Blogs" (even though it's not a blog strictly speaking)
>
> https://code.google.com/p/uimafit/
>
> Feel free to look around on the project page and suggest a better place. I was also considering placing a link under "Who is using uimaFIT?" or opening a new section, but I didn't come up with a good title yet. So the link may still move around.
>
>> I've also attached the 4-clause BSD Licence and linked it at the bottom of the readme.md…