uimafit annotator and external resource file (xml)


mco...@mitre.org

Dec 19, 2012, 1:47:58 PM
to uimafi...@googlegroups.com
Hello,

I have a uimafit-based annotator (extends from JCasAnnotator_ImplBase) and it needs to pull in an XML configuration file to configure the annotator (its own XML format, not a UIMA descriptor).  I want to pull this in as an external resource.

In particular, I'm looking to get a file URI.

Question #1:
I want to declare a field on my annotator class and annotate it with @ExternalResource.  Can this field just be a URI, or do I need to define my own type that extends SharedResourceObject?

Question #2:
When I create the AnalysisEngineDescription object and pass in the configuration parameters and external resource settings (this is sort of related to #1), can I pass in a URI directly or do I need to construct an ExternalResourceDescription object?

Question #3 [this is my main question]:
When I'm building my annotator pipeline and configuring the annotator, what is the proper way in uimafit to lookup the URI of my annotator's xml configuration file?  Do I just call this.getClass().getClassLoader().getResource("com/mycompany/myproject/myannotator.xml")?  I want to do this the right uimafit way!


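I've been using the following two resources and they've been very helpful!  The first is the wiki page on "External Resources".  The second is an example class for external resources.

http://code.google.com/p/uimafit/wiki/ExternalResources

http://code.google.com/p/uimafit/source/browse/trunk/uimaFIT-examples/src/main/java/org/uimafit/examples/resource/ExternalResourceExample.java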

My problem is very similar to the examples.  I guess the main question is what's the proper way to lookup the resource on the classpath?

Other than use the normal java classloader mechanism, my other thoughts were:
  • Call UIMAContextFactory.createUimaContext().getResourceURI() [my problem with this is that if I pass in a key, that just brings me back to the external resource definition that I'm working from...so it's a bit of a circle; if I specify a URI string, it will look it up on the data path and classpath, which is what I want, but it says this is deprecated behavior and it will stop working in a future version.]
  • In the uimafit type descriptor documentation they talk about using Spring-style classpath: and classpath*: filenames.  Can I use this style of filename with the uimafit methods?  Or maybe I should use Spring directly?
  • Or, again, should I just go with the plain old java classloader code?

Thanks!
Matt

Richard Eckart de Castilho

Dec 19, 2012, 3:46:27 PM
to uimafi...@googlegroups.com
Hello Matt,

To make a long story short: I'd use whichever method you are most familiar with; for you that's probably the good old classloader code paired with a simple @ConfigurationParameter that tells you the location within the classpath.

Unless… your main goal is to load the data from the URI only once and share it amongst multiple components or instances of the same component in your pipeline. Then you'd want to use the @ExternalResource paired with whatever you are most familiar with.

On 19.12.2012 at 19:47, "mco...@mitre.org" <mco...@mitre.org> wrote:

> I have a uimafit-based annotator (extends from JCasAnnotator_ImplBase) and it needs to pull in an XML configuration file to configure the annotator (its own XML format, not a UIMA descriptor). I want to pull this in as an external resource.
>
> In particular, I'm looking to get a file URI.
>
> Question #1:
> I want to declare a field on my annotator class and annotate it with @ExternalResource. Can this field just be a URI, or do I need to define my own type that extends SharedResourceObject?

You can annotate URI fields with @ConfigurationParameter because the URI class has a constructor which takes a single String argument. @ExternalResource is meant to be used with subclasses of SharedResourceObject or other special cases.

The main use case for external resources is that a single instance of a resource is shared amongst multiple UIMA components (be it different components or different instances of the same component), so the resource is kept in memory or otherwise provisioned only once (e.g. downloaded from a remote URL once per pipeline instead of per component).

> Question #2:
> When I create the AnalysisEngineDescription object and pass in the configuration parameters and external resource settings (this is sort of related to #1), can I pass in a URI directly or do I need to construct an ExternalResourceDescription object?

@ExternalResource: you need to pass in an ExternalResourceDescription.

@ConfigurationParameter: you can pass in a URI because it has a toString() method whose output can be passed to the String constructor of URI. uimaFIT, or rather the underlying Spring, takes care of the conversion URI -> String -> URI.
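
The URI -> String -> URI round trip described above can be illustrated in plain Java. This is only a sketch of the principle using `java.net.URI`; it is not uimaFIT's or Spring's actual conversion code, and the class name `UriRoundTrip` and the file location are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriRoundTrip {

    /** Converts a URI to its string form and back via the single-String constructor. */
    static URI roundTrip(URI original) {
        String asText = original.toString();      // URI -> String
        try {
            return new URI(asText);               // String -> URI
        } catch (URISyntaxException e) {
            // Should not happen for text produced by URI.toString().
            throw new IllegalStateException("Not a valid URI: " + asText, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical location, used only for this illustration.
        URI original = URI.create("file:/tmp/myannotator.xml");
        URI restored = roundTrip(original);
        System.out.println(original + " -> " + restored);
    }
}
```

Because the string form carries all the information, a parameter value can be handed over as text and rebuilt on the receiving side.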

> Question #3 [this is my main question]:
> When I'm building my annotator pipeline and configuring the annotator, what is the proper way in uimafit to lookup the URI of my annotator's xml configuration file? Do I just call this.getClass().getClassLoader().getResource("com/mycompany/myproject/myannotator.xml")? I want to do this the right uimafit way!

uimaFIT doesn't provide mechanisms for this scenario. You can use the plain old classloader or Spring (which uimaFIT uses internally), or write your own convenience methods like the ResourceUtils in DKPro Core [1].
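
A minimal sketch of the plain classloader route, using only the JDK. The helper name `ClasspathLookup.find` is hypothetical, and the resource path is the one from the question, so it resolves only if that file is actually on the classpath.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Optional;

final class ClasspathLookup {

    /** Returns the URI of a classpath resource, or empty if it cannot be found. */
    static Optional<URI> find(Class<?> anchor, String path) {
        ClassLoader loader = anchor.getClassLoader();
        // Classes loaded by the bootstrap loader report null; fall back to the system loader.
        URL url = (loader != null) ? loader.getResource(path)
                                   : ClassLoader.getSystemResource(path);
        if (url == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Resource URL is not a valid URI: " + url, e);
        }
    }

    public static void main(String[] args) {
        // ClassLoader paths are written without a leading slash.
        Optional<URI> uri = find(ClasspathLookup.class,
                "com/mycompany/myproject/myannotator.xml");
        System.out.println(uri.map(URI::toString).orElse("not found on classpath"));
    }
}
```

Returning an `Optional` makes the "resource is missing" case explicit, since `getResource` signals it with `null` rather than an exception.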

> I've been using the following two resources and they've been very helpful! The first is the wiki page on "External Resources". The second is an example class for external resources.
>
> http://code.google.com/p/uimafit/wiki/ExternalResources
>
> http://code.google.com/p/uimafit/source/browse/trunk/uimaFIT-examples/src/main/java/org/uimafit/examples/resource/ExternalResourceExample.java
>
> My problem is very similar to the examples. I guess the main question is what's the proper way to lookup the resource on the classpath?
>
> Other than use the normal java classloader mechanism, my other thoughts were:
> • Call UIMAContextFactory.createUimaContext().getResourceURI() [my problem with this is that if I pass in a key, that just brings me back to the external resource definition that I'm working from…so it's a bit of a circle; if I specify a URI string, it will look it up on the data path and classpath, which is what I want, but it says this is deprecated behavior and it will stop working in a future version.]

I have no idea why it is deprecated; possibly because loading stuff from the classpath is notoriously inconvenient when running in an OSGi environment. The deprecation of this feature prompted us to implement ResourceUtils.resolveLocation() and use it throughout DKPro Core, so that we don't have to worry about this. uimaFIT doesn't cover this aspect.

> • In the uimafit type descriptor documentation they talk about using Spring-style classpath: and classpath*: filenames. Can I use this style of filename with the uimafit methods? Or maybe I should use Spring directly?

uimaFIT supports this only in its special configuration files in META-INF/org.uimafit. "classpath:" is implemented in DKPro's ResourceUtils.resolveLocation() or if you want to use Spring, you may want to consider the PathMatchingResourcePatternResolver, which is what uimaFIT uses internally.

> • Or, again, should I just go with the plain old java classloader code?

Up to you. Since you already have Spring on the classpath via uimaFIT, you might want to give that a try. You can also get inspiration from DKPro's ResourceUtils or even use it.

Personally, I'd use an external resource if I desperately needed to save memory or if my main interest wasn't to load data from a URI but rather to access data via a particular API. I could then inject different implementations of that API as needed using the external resource mechanism. For example, you could define a "Dictionary" interface and implement a "FileDictionary" and a "DatabaseDictionary". The two implementations could expect different parameters, e.g. a path for the FileDictionary, or a JDBC connection string, table name, credentials, etc. for the DatabaseDictionary.
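
The "one API, several implementations" idea can be sketched in plain Java. This is only an illustration under assumptions: constructor injection stands in for the external resource mechanism, `InMemoryDictionary` is a hypothetical stand-in for a DatabaseDictionary so the sketch runs without a database, and `DictionaryClient` is a made-up consumer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class DictionaryDemo {

    /** The API the consuming component programs against. */
    interface Dictionary {
        boolean contains(String word);
    }

    /** File-backed implementation: one entry per line in a UTF-8 text file. */
    static final class FileDictionary implements Dictionary {
        private final Set<String> entries;

        FileDictionary(Path file) {
            try {
                this.entries = new HashSet<>(Files.readAllLines(file, StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot read dictionary file " + file, e);
            }
        }

        @Override
        public boolean contains(String word) {
            return entries.contains(word);
        }
    }

    /** Hypothetical in-memory stand-in for a database-backed implementation. */
    static final class InMemoryDictionary implements Dictionary {
        private final Set<String> entries;

        InMemoryDictionary(Collection<String> words) {
            this.entries = new HashSet<>(words);
        }

        @Override
        public boolean contains(String word) {
            return entries.contains(word);
        }
    }

    /** Hypothetical consumer: it sees only the interface, never the implementation. */
    static final class DictionaryClient {
        private final Dictionary dictionary;

        DictionaryClient(Dictionary dictionary) {
            this.dictionary = dictionary;
        }

        boolean isKnown(String word) {
            return dictionary.contains(word);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary dictionary file so the example is self-contained.
        Path file = Files.createTempFile("dictionary", ".txt");
        Files.write(file, Arrays.asList("annotator", "resource"), StandardCharsets.UTF_8);

        DictionaryClient fromFile = new DictionaryClient(new FileDictionary(file));
        DictionaryClient fromMemory =
                new DictionaryClient(new InMemoryDictionary(Arrays.asList("pipeline")));

        System.out.println(fromFile.isKnown("annotator"));
        System.out.println(fromMemory.isKnown("annotator"));
        Files.delete(file);
    }
}
```

The consumer code stays the same whichever implementation is supplied; only the wiring changes.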

Otherwise, I'd stick with @ConfigurationParameter on a String field and pass that field into ResourceUtils.resolveLocation() (if I expect a single URL) or PathMatchingResourcePatternResolver (if I want to address multiple locations via a wildcard).
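
For the single-location case, the resolution step might look roughly like the following. This is a hypothetical, JDK-only stand-in and not the DKPro ResourceUtils implementation: the class name `LocationResolver`, its `resolve` method, and the rule "anything without a classpath: prefix is a file path" are all assumptions made for the sketch.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;

final class LocationResolver {

    private static final String CLASSPATH_PREFIX = "classpath:";

    /** Resolves a location string to a URL; throws IllegalArgumentException on failure. */
    static URL resolve(String location) {
        if (location.startsWith(CLASSPATH_PREFIX)) {
            String path = location.substring(CLASSPATH_PREFIX.length());
            if (path.startsWith("/")) {
                path = path.substring(1); // ClassLoader paths have no leading slash
            }
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            URL url = (loader != null) ? loader.getResource(path)
                                       : ClassLoader.getSystemResource(path);
            if (url == null) {
                throw new IllegalArgumentException("Not found on classpath: " + path);
            }
            return url;
        }
        try {
            // Assumption of this sketch: everything else is a file system path.
            return Paths.get(location).toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + location, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical relative path; the file does not need to exist to build the URL.
        System.out.println(resolve("conf/myannotator.xml"));
    }
}
```

An annotator would keep the location as a String parameter and call such a helper during initialization, so the same parameter can point at either a classpath entry or a file.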

Cheers,

-- Richard

[1] http://dkpro-core-asl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-asl/tags/latest-release/apidocs/de/tudarmstadt/ukp/dkpro/core/api/resources/ResourceUtils.html

Coarr, Matt

Dec 19, 2012, 4:30:33 PM
to uimafi...@googlegroups.com
Thanks Richard!  That's just what I needed to know!

It sounds like @ConfigurationParameter is the way to go for this project!

(On another project I have some large models, so that might be a better fit for external resources.)

Thanks for the clarifications and suggestions!  Solid advice!

Matt