YTEX cTAKES 3.1.1 ready

vijay garla

Jan 3, 2014, 10:21:52 PM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org
Hello All,

I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.  Most of the YTEX functionality has been ported and integrated with cTAKES, and I've tested with MySQL and MS SQL Server (Oracle tests pending).

Most of the changes were made in new projects - very little existing cTAKES code has been modified.  The only non-trivial changes are in /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api - here I modified CharacterOffsetToLineTokenConverterCtakesImpl & SingleDocumentProcessorCtakes to deal with newlines within sentences correctly.  Can somebody take a look at the changes in the ytex branch?
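To make the newline issue concrete, here is a minimal sketch of mapping a character offset to a line number and token index when a sentence spans more than one line. It is not the code from CharacterOffsetToLineTokenConverterCtakesImpl; the class name `OffsetToLineToken`, the whitespace tokenization, and the 1-based line / 0-based token convention are all assumptions made for illustration.

```java
/** Hypothetical helper for illustration; not the cTAKES converter class. */
final class OffsetToLineToken {

    /** Assumed convention: 1-based line number, 0-based token index on that line. */
    static final class LineToken {
        final int line;
        final int token;

        LineToken(int line, int token) {
            this.line = line;
            this.token = token;
        }

        @Override
        public String toString() {
            return line + ":" + token;
        }
    }

    /** Maps a character offset to the line and whitespace-delimited token containing it. */
    static LineToken convert(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int line = 1;
        int token = -1;          // index of the most recent token on the current line
        boolean inToken = false; // true while scanning non-whitespace characters
        for (int i = 0; i < offset; i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                // A newline starts a new line and resets the token count,
                // even if the sentence itself continues.
                line++;
                token = -1;
                inToken = false;
            } else if (Character.isWhitespace(c)) {
                inToken = false;
            } else if (!inToken) {
                inToken = true;
                token++;
            }
        }
        // If the offset itself starts a new token, count it.
        if (!Character.isWhitespace(text.charAt(offset)) && !inToken) {
            token++;
        }
        return new LineToken(line, Math.max(token, 0));
    }

    public static void main(String[] args) {
        // One sentence with a newline in the middle of it.
        String text = "Patient denies chest\npain and fever.";
        System.out.println(convert(text, 15)); // offset of "chest"
        System.out.println(convert(text, 21)); // offset of "pain", on the second line
    }
}
```

The point of the sketch is only that the token index restarts on every line, so a sentence containing a newline yields positions on two different lines.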

I believe that the branch https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready to be merged into ctakes trunk, but would like other users to test it as well.  Questions:

* How can I distribute the ctakes binary distribution to ytex users before the merge? Can we make the branch build available somewhere?  The binary distribution is too large to host on the ytex google code site (max 200 MB)
* Non-ASF libraries - I have segregated these out into their own zip file that can be distributed via sourceforge.  As a stopgap, I can upload this to the ytex google code site, but would prefer to upload to sourceforge.
* UMLS Derivatives - Ditto for these - would like to move to sourceforge.
* Documentation - How can I update the confluence docs?  I would migrate the documentation from the google code website.

Here are the installation instructions (putting the wagon in front of the horse ...)


Best,

VJ


Bhaskar B

Jan 8, 2014, 7:41:44 AM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org
Hi Vijay,

Thank you for this update.  In order to evaluate the YTEX port into cTAKES, I wanted to do the following (goals):

(a) Compile the ctakes/branches/ytex to create apache-ctakes-3.1.2-SNAPSHOT.
(b) Validate that the just compiled binary works by running either the AggregatePlaintextProcessor.xml or AggregatePlaintextUMLSProcessor.xml pipelines (basically following instructions at https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+User+Install+Guide).
(c) Follow instructions at https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1 to explore YTEX specific features, e.g. pipelines, writing annotations to database, etc.

So I took the following steps:

1) Followed the instructions at https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+Developer+Install+Guide and https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide to successfully (i) configure Eclipse (Juno) environment for building cTAKES, and (ii) pull the source code from the SVN repository branch: https://svn.apache.org/repos/asf/ctakes/branches/ytex.

2) In Eclipse, right clicked on the top-most (or root level) pom.xml, selected Run As -> Maven build, typed in "compile" as the goal, and hit Run.  This successfully compiled all the projects.

3) In Eclipse, repeated step (2) but selected Run As -> Maven install to create a distribution.  However, this is where I started to encounter a few problems.  I was eventually able to get Maven install to complete and create the binaries (i.e. in ctakes-distribution/target/), but only by manually doing the following:

3.1) in pom.xml of ctakes-ytex-uima: excluded all tests
3.2) in pom.xml of ctakes-core: excluded 2 tests
3.3) in ctakes-ytex: modified scripts/build-classpath.xml and scripts/build-setup.xml to hardcode path to ANT library
3.4) in ctakes-dependency-parser: excluded 1 unit test

4) After this step, I extracted apache-ctakes-3.1.2-SNAPSHOT-bin.zip and attempted to verify it (i.e. step (b) above).  However, when I attempted to load AggregatePlaintextProcessor, I got the exception below.

While I continue to look to resolve this, any tips/hints that you could provide to get this build functional would be highly appreciated.  I think I may be missing one or more key steps/configuration.  My workstation is Windows 7 and I use Eclipse Juno.

-------------------------
java.lang.Error: Unresolved compilation problems:
        The import org.apache.ctakes.core.fsm.machine cannot be resolved
        The import org.apache.ctakes.core.fsm.machine cannot be resolved
        The import org.apache.ctakes.core.fsm.machine cannot be resolved
        The import org.apache.ctakes.core.fsm.machine cannot be resolved
        The import org.apache.ctakes.core.fsm.machine cannot be resolved
        The import org.apache.ctakes.core.fsm.machine cannot be resolved
        The import org.apache.ctakes.core.fsm.machine cannot be resolved
        The import org.apache.ctakes.core.fsm.output cannot be resolved
        The import org.apache.ctakes.core.fsm.output cannot be resolved
        The import org.apache.ctakes.core.fsm.output cannot be resolved
        The import org.apache.ctakes.core.fsm.output cannot be resolved
        The import org.apache.ctakes.core.fsm.output cannot be resolved
        The import org.apache.ctakes.core.fsm.output cannot be resolved
        The import org.apache.ctakes.core.fsm.output cannot be resolved
        The import org.apache.ctakes.core.fsm.token.BaseToken cannot be resolved
        The import org.apache.ctakes.core.fsm.token.EolToken cannot be resolved
        DateFSM cannot be resolved to a type
        TimeFSM cannot be resolved to a type
        FractionFSM cannot be resolved to a type
        RomanNumeralFSM cannot be resolved to a type
        RangeFSM cannot be resolved to a type
        MeasurementFSM cannot be resolved to a type
        PersonTitleFSM cannot be resolved to a type
        DateFSM cannot be resolved to a type
        DateFSM cannot be resolved to a type
        TimeFSM cannot be resolved to a type
        TimeFSM cannot be resolved to a type
        FractionFSM cannot be resolved to a type
        FractionFSM cannot be resolved to a type
        RomanNumeralFSM cannot be resolved to a type
        RomanNumeralFSM cannot be resolved to a type
        RangeFSM cannot be resolved to a type
        RangeFSM cannot be resolved to a type
        MeasurementFSM cannot be resolved to a type
        MeasurementFSM cannot be resolved to a type
        PersonTitleFSM cannot be resolved to a type
        PersonTitleFSM cannot be resolved to a type
        BaseToken cannot be resolved to a type
        BaseToken cannot be resolved to a type
        BaseToken cannot be resolved to a type
        The method adaptToBaseToken(BaseToken) from the type ContextDependentTokenizerAnnotator refers to the missing type BaseToken
        EolToken cannot be resolved to a type
        BaseToken cannot be resolved to a type
        DateToken cannot be resolved to a type
        DateFSM cannot be resolved to a type
        DateToken cannot be resolved to a type
        DateToken cannot be resolved to a type
        TimeToken cannot be resolved to a type
        TimeFSM cannot be resolved to a type
        TimeToken cannot be resolved to a type
        TimeToken cannot be resolved to a type
        RomanNumeralToken cannot be resolved to a type
        RomanNumeralFSM cannot be resolved to a type
        RomanNumeralToken cannot be resolved to a type
        RomanNumeralToken cannot be resolved to a type
        FractionToken cannot be resolved to a type
        FractionFSM cannot be resolved to a type
        FractionToken cannot be resolved to a type
        FractionToken cannot be resolved to a type
        RangeToken cannot be resolved to a type
        RangeFSM cannot be resolved to a type
        RangeToken cannot be resolved to a type
        RangeToken cannot be resolved to a type
        MeasurementToken cannot be resolved to a type
        MeasurementFSM cannot be resolved to a type
        MeasurementToken cannot be resolved to a type
        MeasurementToken cannot be resolved to a type
        PersonTitleToken cannot be resolved to a type
        PersonTitleFSM cannot be resolved to a type
        PersonTitleToken cannot be resolved to a type
        PersonTitleToken cannot be resolved to a type
        BaseToken cannot be resolved to a type

        at org.apache.ctakes.contexttokenizer.ae.ContextDependentTokenizerAnnotator.<init>(ContextDependentTokenizerAnnotator.java:45)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at java.lang.Class.newInstance0(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:227)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
        at org.apache.uima.tools.cvd.MainFrame.setupAE(MainFrame.java:1484)
        at org.apache.uima.tools.cvd.MainFrame.loadAEDescriptor(MainFrame.java:477)
        at org.apache.uima.tools.cvd.control.AnnotatorOpenEventHandler.actionPerformed(AnnotatorOpenEventHandler.java:52)
        at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
        at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
        at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
        at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
        at javax.swing.AbstractButton.doClick(Unknown Source)
        at javax.swing.plaf.basic.BasicMenuItemUI.doClick(Unknown Source)
        at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(Unknown Source)
        at java.awt.Component.processMouseEvent(Unknown Source)
        at javax.swing.JComponent.processMouseEvent(Unknown Source)
        at java.awt.Component.processEvent(Unknown Source)
        at java.awt.Container.processEvent(Unknown Source)
        at java.awt.Component.dispatchEventImpl(Unknown Source)
        at java.awt.Container.dispatchEventImpl(Unknown Source)
        at java.awt.Component.dispatchEvent(Unknown Source)
        at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
        at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
        at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
        at java.awt.Container.dispatchEventImpl(Unknown Source)
        at java.awt.Window.dispatchEventImpl(Unknown Source)
        at java.awt.Component.dispatchEvent(Unknown Source)
        at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
        at java.awt.EventQueue.access$200(Unknown Source)
        at java.awt.EventQueue$3.run(Unknown Source)
        at java.awt.EventQueue$3.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
        at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
        at java.awt.EventQueue$4.run(Unknown Source)
        at java.awt.EventQueue$4.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
        at java.awt.EventQueue.dispatchEvent(Unknown Source)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.run(Unknown Source)

vijay garla

Jan 8, 2014, 8:52:11 AM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org
Hi Bhaskar,

Thanks for working on this!  

I am not sure what is going wrong, but can you try doing a clean checkout and a "mvn clean install" from the command line?  When I do this, all tests in all projects pass, no changes necessary.  I have never had the eclipse maven m2e plugin work flawlessly (congratulations to those of you who do); if the command line mvn clean install works, then I need to figure out why the build doesn't work from eclipse.  I am using the 64-bit eclipse kepler, jdk 1.7, and maven 3.1.0; for me none of the projects with jcasgen plugins compile from eclipse.

Regarding the class not found exceptions: these classes are in CTAKES_HOME/lib/ctakes-core-3.1.2-SNAPSHOT.jar - can you make sure the classes are there?  If not, something went wrong in the build of ctakes-core (again, please verify that this works when you run maven from the command line).
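One way to do that check is sketched below: a small program that counts the .class entries under org/apache/ctakes/core/fsm/ (the package prefix taken from the error messages above) in whatever jar you point it at. The class name `JarPackageCheck` is made up for this example and is not part of cTAKES; it only uses the standard java.util.jar API.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical diagnostic helper; not part of cTAKES or YTEX. */
final class JarPackageCheck {

    /** Counts .class entries whose path starts with the given package directory. */
    static int countClasses(String jarPath, String packageDir) throws IOException {
        int count = 0;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(packageDir) && name.endsWith(".class")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarPackageCheck <path-to-ctakes-core-jar>");
            return;
        }
        // Package directory derived from the unresolved imports in the stack trace.
        String packageDir = "org/apache/ctakes/core/fsm/";
        int found = countClasses(args[0], packageDir);
        System.out.println(found + " classes under " + packageDir + " in " + args[0]);
    }
}
```

If the count is zero for the ctakes-core jar in CTAKES_HOME/lib, the jar was built without those classes and the problem is in the build rather than in the classpath.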

I run this batch script which does a checkout, install, ytex setup, and runs the ytex CPE in a single go:

@REM c:\java\setenv.bat - puts java, maven, and svn in the PATH
@REM c:\temp\ctakes-build - where ctakes gets checked out 
@REM c:\temp - where I downloaded the ctakes resources, ytex resources & lib files
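@REM c:\java\apache-ctakes-3.1.2-SNAPSHOT - where ctakes gets installed

call c:\java\setenv.bat
cd C:\temp\ctakes-build
rmdir /s /q ctakes
@rem check out the ytex branch into .\ctakes
svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
cd ctakes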
@rem need to unset ctakes home
set CTAKES_HOME=
call mvn clean install
cd c:\java
rmdir /s /q apache-ctakes-3.1.2-SNAPSHOT
jar xf C:\temp\ctakes-build\ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip
cd apache-ctakes-3.1.2-SNAPSHOT
jar xf c:\temp\ctakes-resources-3.1.0.zip
jar xf c:\temp\ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
jar xf c:\temp\ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
@rem stop here if you don't need to do a ytex setup
@rem adjust this to match your environment-  to use a different DB, copy a different ytex.properties file
copy resources\org\apache\ctakes\ytex\ytex.properties.mssql.example resources\org\apache\ctakes\ytex\ytex.properties
cd bin\ctakes-ytex\scripts
call ..\..\ant.bat -f build-setup.xml all > setup.out 2>&1
cd ..\..\..
call bin\setenv.bat
java -cp "%CLASSPATH%"  -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xms512M -Xmx2g org.apache.ctakes.ytex.tools.RunCPE  desc\ctakes-ytex-uima\desc\cpe\fracture_demo.xml

Best,

Vijay



--
You received this message because you are subscribed to the Google Groups "ytex-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ytex-users+...@googlegroups.com.
To post to this group, send email to ytex-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ytex-users/1bef20e1-0e47-489a-b565-30947bee987b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

vlad.va...@gmail.com

Jan 30, 2014, 4:56:00 PM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org
Hi VJ--

this is great!! Thanks for all the hard work on it!

We're starting to look into the new install. For now we're trying the binaries out.

There were these questions about the proper install steps:

1. Do we first install ytex-0.8
2. Then install the new cTakes 3.1.1 instance and also apply the SNAPSHOT lib and resources zips
3. Work our way to install the UMLS ontologies in the db

It is not entirely clear from the new document (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
whether ytex-0.8 still needs to be installed, or whether YTEX has been entirely merged into cTakes.

If the latter is correct, there are missing parts, e.g. in the UMLS install steps that are linked from the new ctakes 3.1.1 document.

Thanks,
vlad

vijay garla

Jan 30, 2014, 5:17:43 PM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org
Hi Vlad,

All of ytex has been moved into ctakes, it is currently in a branch (https://svn.apache.org/repos/asf/ctakes/branches/ytex).  You don't have to install ytex-0.8 - instead you will have to build and install from the ytex branch to create your own distribution.  Steps 2 & 3 are correct.

Although it is a pain, if you have the jdk, maven, and svn, you can easily build your own distro:
* open a command prompt
* make sure jdk, maven, and svn are in your path
* cd to some directory where you want to check stuff out (I like c:\temp)
* run the following commands
rmdir /s /q ctakes
svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
cd ctakes
mvn clean install -DskipTests

And you will have the ctakes (with ytex) distro in ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip

What is the process for getting the ytex branch merged into trunk?  As I mentioned, there are very few changes to other ctakes classes/types - this should be completely complementary and not affect any existing ctakes functionality.

-vj


vlad.va...@gmail.com

Feb 5, 2014, 4:44:24 PM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org
Hi VJ-

So, with some trial and error we were able to build the distribution and now have the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive.

Here's what's unclear.

1. Is this now the only (combined) thing that you need for ctakes 3.1.1 + YTEX?
The install document, which most probably is outdated, talks about installing cTakes 3.1.1 first and then applying 2 (downloadable) SNAPSHOT archives, lib and resources.
This is a point of confusion.

2. The directions to import UMLS subset are then outdated as well. Maybe one should use the old version (ctakes 2.5 and ytex 0.8) to
import the RRF files for the UMLS subset and then just use the resulting db. Thoughts?

Thanks,
Vlad Valtchinov
Brigham Rad

vijay garla

Feb 5, 2014, 5:33:15 PM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org
Hi Vlad,

sorry that the instructions aren't clear.

re 1) What I am trying to say is install apache-ctakes-3.2.0-snapshot as usual (this is unchanged from 3.1.1).  After that you still have to apply the lib and resources (these are things that cannot be distributed via apache).

re 2) Yes, I need to update those docs.  Hopefully I will get to that at some point.  However, I assume you already have a UMLS DB (and also assume SQL Server).  If you can't/don't want to use your existing UMLS DB, please tell me.  Then I'll prioritize updating the doc on importing the UMLS tables (the scripts are there).

best,

VJ


vijay garla

Feb 5, 2014, 9:29:07 PM
to ytex-...@googlegroups.com, ctake...@incubator.apache.org, vlad.va...@gmail.com
Hi Vlad,

I updated the UMLS install guide; see https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1

I would prefer to add the docs in the ctakes confluence, but as far as I can tell, I don't have write access there - can somebody give me write privileges on the ctakes confluence site?

There was a bug in the umls install; copy https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml over the corresponding file in your ctakes-3.1.2 install (CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set.  The import is currently running on the UMLS 2013AA (I assume this will complete without issues as long as the umls schema hasn't changed from 2012).

What trial and error did you have to go through to build the distro?

-vj

vijay garla

Feb 6, 2014, 10:30:23 AM
to d...@ctakes.apache.org, ytex-...@googlegroups.com, ctake...@incubator.apache.org, vlad.va...@gmail.com
I believe it is worth migrating to trunk.

Note that the sentence detector is also complementary - the existing ctakes sentence detector is unchanged - users can choose which sentence detector to use.  There are changes to assertion & dependency parsing to support sentences with newlines within them, and that works with both sentence detectors.

I believe cTAKES absolutely has to support sentences with newlines within them - I have yet to run across clinical text from a real EMR where newlines represent the end of a sentence - the changes to assertion & dependency parsing will have to be done at some point.

-vj


On Thu, Feb 6, 2014 at 10:19 AM, Chen, Pei <Pei....@childrens.harvard.edu> wrote:
VJ,
Aside from the changes to the existing cTAKES code (sentence detector, etc.) [which we could leave out if it's still being debated],
Do you think it's worth migrating the ytex code to trunk at this point?  As you mentioned earlier, it's largely complementary.
[I was just thinking of saving effort to maintain the separate branch and for simplicity for dev...]

--Pei

vijay garla

Feb 6, 2014, 1:05:08 PM
to d...@ctakes.apache.org, ytex-...@googlegroups.com, ctake...@incubator.apache.org, vlad.va...@gmail.com
The cTAKES sentence detector is not changed in the YTEX branch.  The YTEX branch has an *additional* sentence detector that does not automatically split sentences on newlines - users can use this if they like.
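To illustrate the difference between the two behaviors, here is a toy sketch. It is not the cTAKES or the YTEX sentence detector (both are more involved than this); the class `NewlineSentenceDemo` and its punctuation rule are assumptions made up for the example. With `splitOnNewline` set to true every newline ends a sentence; with false a newline is treated as ordinary whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; neither the cTAKES nor the YTEX sentence detector. */
final class NewlineSentenceDemo {

    /**
     * Splits after '.', '!' or '?' when followed by whitespace or end of text.
     * If splitOnNewline is true, every newline also ends the current sentence;
     * otherwise a newline is replaced by a space and the sentence continues.
     */
    static List<String> split(String text, boolean splitOnNewline) {
        List<String> sentences = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean newline = (c == '\n');
            if (newline && !splitOnNewline) {
                current.append(' ');
                continue;
            }
            if (!newline) {
                current.append(c);
            }
            boolean endPunct = (c == '.' || c == '!' || c == '?')
                    && (i + 1 == text.length() || Character.isWhitespace(text.charAt(i + 1)));
            if (newline || endPunct) {
                addIfNotBlank(sentences, current);
            }
        }
        addIfNotBlank(sentences, current);
        return sentences;
    }

    /** Adds the trimmed buffer as a sentence (if non-empty) and clears the buffer. */
    private static void addIfNotBlank(List<String> sentences, StringBuilder current) {
        String s = current.toString().trim();
        if (!s.isEmpty()) {
            sentences.add(s);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String note = "Patient reports chest\npain on exertion. No fever.";
        System.out.println(split(note, true));  // newline ends a sentence
        System.out.println(split(note, false)); // newline is just whitespace
    }
}
```

On the sample note the first mode cuts "chest" away from "pain on exertion", while the second keeps them together; for notes where line ends really are sentence ends without punctuation, the second mode would instead glue separate statements together, which is why leaving the choice to the user seems reasonable.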

-vj


On Thu, Feb 6, 2014 at 1:01 PM, Finan, Sean <Sean....@childrens.harvard.edu> wrote:
Hi Vijay,


>  I have yet to run across clinical text from a real EMR where newlines represent the end of a sentence

Since James pointed out this possibility a couple weeks ago, I have kept my eyes open.  The problem is pretty ubiquitous in a corpus that I'm working with right now.  I just opened the first note and gave it a count ... 95 lines total, 9 are sentence/phrase (lacking punctuation) endings.  This is not including lists, which comprise about half of the note.
One possible conjoinment was "Will consider [...] biopsy\nGiven [...]".  Depending upon how cTakes deals with it, the meaning could change drastically.


> I believe cTAKES absolutely has to support sentences with newlines within them

Yes, cTakes should do so, but I hope that you aren't suggesting that it only support such a structure.

Where is that easy button?