Reader with multiple Threads

eisior...@gmail.com

May 11, 2013, 2:13:42 PM
to uimafi...@googlegroups.com
Hello,

I have a Reader that reads all entries of a MongoDB and creates a CAS document for each entry. The DB contains 3 million documents. I start a uimaFIT process and set maxThreadCount to 10. When I realized that the Reader had produced more CAS documents than the database returns, I stopped the process. Could it be that there are multiple reader objects that each read all entries of the database? Do I have to set some parameters within the reader for a multi-threaded environment to prevent this?

Thanks Andreas


Richard Eckart de Castilho

May 12, 2013, 5:57:58 AM
to uimafi...@googlegroups.com
Hi,

I wonder how you managed to run a reader multi-threaded. What did you use?

Readers are normally not meant to be run on multiple threads. If you wanted to have a reader run in multiple threads, you'd have to somehow coordinate all instances; otherwise you'd see exactly what you are seeing: each reader reads everything.
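
To make the coordination point concrete, here is a minimal plain-Java sketch. It is a model only and does not use any UIMA or uimaFIT API: the class name ReaderCoordinationSketch, the run method, and the shared cursor are all hypothetical. Each "reader" is a task that counts the documents it produces, either from its own private cursor or from a cursor shared by all readers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not UIMA API: a "reader" turns entries into documents.
final class ReaderCoordinationSketch {

    // Runs readerCount readers over entryCount entries and returns the
    // total number of documents produced by all readers together.
    static long run(int entryCount, int readerCount, boolean coordinated) {
        AtomicInteger sharedCursor = new AtomicInteger(0); // used only when coordinated
        AtomicLong produced = new AtomicLong(0);
        ExecutorService pool = Executors.newFixedThreadPool(readerCount);
        for (int r = 0; r < readerCount; r++) {
            pool.submit(() -> {
                if (coordinated) {
                    // Every reader claims the next unread index from the shared
                    // cursor, so no entry is claimed twice.
                    while (sharedCursor.getAndIncrement() < entryCount) {
                        produced.incrementAndGet();
                    }
                } else {
                    // Every reader walks the whole collection with a private cursor.
                    for (int i = 0; i < entryCount; i++) {
                        produced.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for readers", e);
        }
        return produced.get();
    }

    public static void main(String[] args) {
        System.out.println("uncoordinated: " + run(1_000, 10, false));
        System.out.println("coordinated:   " + run(1_000, 10, true));
    }
}
```

With 1,000 entries and 10 readers, the uncoordinated run produces 10,000 documents and the coordinated run produces 1,000. A real reader would need an equivalent shared cursor or partitioning scheme over the database, which this sketch does not attempt.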

-- Richard

eisior...@gmail.com

May 12, 2013, 8:22:50 AM
to uimafi...@googlegroups.com
Hi,

Basically, I just initialize the reader and pass it to the multi-threaded CpeBuilder.

    CpeBuilder builder = new CpeBuilder();
    builder.setReader(reader);
    builder.setMaxProcessingUnitThreatCount(threatCount);
    StatusCallbackListenerImpl status = new StatusCallbackListenerImpl();

    AnalysisEngineDescription ae = AnalysisEngineFactory.createAggregateDescription(
            ClearToken, ClearDict, Token, tagger, mapper, filter,
            Token_NER_TAGGER, unify, solr, epolMongoDBTokenUpdater);

    ae.getAnalysisEngineMetaData().setCapabilities(capabilities);
    builder.setAnalysisEngine(ae);

    CollectionProcessingEngine engine = builder.createCpe(status);
    engine.process();

Richard Eckart de Castilho

May 12, 2013, 8:30:58 AM
to uimafi...@googlegroups.com
Hi,

The UIMA CPE runs the reader in a single thread, then scales out all AEs until it reaches the first AE in the pipeline that has set multipleDeploymentAllowed = false (that is, UIMA CAS Consumers or uimaFIT (J)CasConsumer_ImplBase components, which are technically AEs).

See also: http://uima.apache.org/d/uimaj-2.4.0/references.html#ugr.ref.xml.cpe_descriptor.overview

In that case, you shouldn't see more data than you have in your DB.
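
The threading model described above can be pictured with a small plain-Java sketch. It is an illustration under assumptions, not the CPE implementation and not UIMA API: the class SingleReaderPipelineSketch, the process method, and the queue-based hand-off are hypothetical, and the final single-instance consumer stage is not modeled. One reader thread emits each entry once, and several worker threads stand in for the scaled-out AEs.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not the UIMA CPE: one reader thread, many workers.
final class SingleReaderPipelineSketch {

    private static final int END = -1; // end-of-collection marker, one per worker

    // Reads entryCount entries on a single thread, processes them on
    // workerCount threads, and returns how many entries were processed.
    static long process(int entryCount, int workerCount) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(64);
        AtomicLong processed = new AtomicLong(0);

        // The single reader: every entry id is put on the queue exactly once.
        Thread reader = new Thread(() -> {
            try {
                for (int id = 0; id < entryCount; id++) {
                    queue.put(id);
                }
                for (int w = 0; w < workerCount; w++) {
                    queue.put(END); // tell each worker to stop
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The workers: each takes entries from the queue until it sees END.
        Thread[] workers = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++) {
            workers[w] = new Thread(() -> {
                try {
                    while (queue.take() != END) {
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[w].start();
        }
        reader.start();

        try {
            reader.join();
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for pipeline", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("processed: " + process(1_000, 10));
    }
}
```

In this model the processed count equals the number of entries regardless of the worker count, because only the processing is parallel and the reading is not.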

-- Richard