multi-threading in data writers


Miller, Timothy

Nov 11, 2014, 1:38:39 PM
to cleart...@googlegroups.com
I am using UIMA-AS to run multiple pipelines on cTAKES. At the end of
each pipeline I have some data writers that want to write examples. From
what I can tell by looking at the LibLinear data writers, they are both
going to try to open a print writer for the same filename, with
possible weird behavior and/or exceptions to follow. Is my understanding
correct? Is there a good way within the ClearTK API around this issue?
Tim

Steven Bethard

Nov 11, 2014, 6:51:10 PM
to cleart...@googlegroups.com
The file name is fixed, but the directory is a parameter to
DirectoryDataWriterFactory. Is there a reason you can't provide
different directories for the different pipelines?

Steve
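
A minimal sketch of the "different directory per pipeline" idea, assuming the pipelines are configured from Java code rather than from fixed XML descriptors. The class and method names (PipelineOutputDirs, newPipelineDir) are hypothetical and not part of ClearTK; the helper only creates a unique sub-directory per pipeline instance, whose path would then be supplied as the directory parameter of DirectoryDataWriterFactory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PipelineOutputDirs {

    /**
     * Hypothetical helper (not a ClearTK API): creates and returns a fresh,
     * uniquely named sub-directory of baseDir for one pipeline instance.
     */
    public static Path newPipelineDir(Path baseDir) throws IOException {
        // Make sure the shared parent exists; safe if several pipelines call this.
        Files.createDirectories(baseDir);
        // createTempDirectory picks a name that does not exist yet, so two
        // pipelines never end up writing the fixed file name into the same directory.
        return Files.createTempDirectory(baseDir, "pipeline-");
    }
}
```

Each pipeline would call newPipelineDir once at start-up and pass the returned path to its data writer configuration. With static descriptor files this does not apply directly, and the directory still has to be set per descriptor.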

Miller, Timothy

Nov 12, 2014, 11:14:40 AM
to cleart...@googlegroups.com
Yeah, that's probably doable. The downside is that, since I'm using
descriptor files, it requires editing an XML file between starting up
pipelines, but that's manageable. If the goal is to use all the data
points to train a model, is there a built-in way to concatenate training
examples and reconcile encoders and outcome lookups?
Tim

Steven Bethard

Nov 12, 2014, 1:42:26 PM
to cleart...@googlegroups.com
There's no builtin way. But maybe you could write your own DataWriter
along the lines of
org.cleartk.ml.feature.transform.InstanceDataWriter.java, which just
serializes the Instance objects directly, and then after you merge all
your instances, read in your merged Instances with something like
org.cleartk.ml.feature.transform.InstanceStream and then run the
appropriate DataWriter for your machine learning library, along the
lines of:

https://code.google.com/p/cleartk/source/browse/cleartk-examples/src/main/java/org/cleartk/examples/documentclassification/advanced/DocumentClassificationEvaluation.java#251

?

Steve
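
The serialize-then-merge approach described above can be sketched in plain Java. This is only an illustration under stated assumptions: Example is a hypothetical stand-in for a serialized training instance, and the file layout (a count followed by the objects) is invented for this sketch; it is not the format used by InstanceDataWriter or InstanceStream. The point is that each pipeline writes its own file, and the merged list is produced afterwards so that a single DataWriter can see every instance and build its encoders and outcome lookups once.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class InstanceMerge {

    /** Hypothetical stand-in for a training instance; not a ClearTK class. */
    public static class Example implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String outcome;
        public final List<String> features;

        public Example(String outcome, List<String> features) {
            this.outcome = outcome;
            this.features = new ArrayList<>(features);
        }
    }

    /** Written by one pipeline: a count, then each example, in its own file. */
    public static void writeAll(Path file, List<Example> examples) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeInt(examples.size());
            for (Example e : examples) {
                out.writeObject(e);
            }
        }
    }

    /** Reads back one pipeline's file in the order it was written. */
    public static List<Example> readAll(Path file) throws IOException, ClassNotFoundException {
        List<Example> result = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                result.add((Example) in.readObject());
            }
        }
        return result;
    }

    /** Merges per-pipeline files by reading each one separately and appending. */
    public static List<Example> merge(List<Path> files) throws IOException, ClassNotFoundException {
        List<Example> merged = new ArrayList<>();
        for (Path file : files) {
            merged.addAll(readAll(file));
        }
        return merged;
    }
}
```

The files are read one at a time rather than concatenated byte-wise, because every ObjectOutputStream writes its own stream header and a naive concatenation would not read back cleanly. The merged list would then be handed to whichever DataWriter matches the machine learning library in use.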