Resource Sharing


Tim Feuerbach

Dec 7, 2014, 8:49:41 AM
to dkpro-c...@googlegroups.com
Hello,

I'm trying to get the experimental resource sharing feature of version 1.7.0, as described in issue 250, to work, since I have a 250 MB model that I don't want to instantiate once per thread. I tried out the Stanford Parser example as follows (using uimafit-cpe):

String prop = "dkpro.core.resourceprovider.sharable." + StanfordParser.class.getName();
System.setProperty(prop, "true");

CollectionReaderDescription in = CollectionReaderFactory.createReaderDescription(TextReader.class,
        TextReader.PARAM_SOURCE_LOCATION, "data/*.txt", TextReader.PARAM_LANGUAGE, "en");

AnalysisEngineDescription seg = AnalysisEngineFactory.createEngineDescription(StanfordSegmenter.class);

AnalysisEngineDescription parse = AnalysisEngineFactory.createEngineDescription(StanfordParser.class);

AnalysisEngineDescription out = AnalysisEngineFactory.createEngineDescription(TextWriter.class,
        TextWriter.PARAM_TARGET_LOCATION, "processed");

AggregateBuilder aggregateBuilder = new AggregateBuilder();
aggregateBuilder.add(seg);
aggregateBuilder.add(parse);
aggregateBuilder.add(out);

CpeBuilder cpeBuilder = new CpeBuilder();
cpeBuilder.setReader(in);
cpeBuilder.setAnalysisEngine(aggregateBuilder.createAggregateDescription());
cpeBuilder.setMaxProcessingUnitThreadCount(4);

cpeBuilder.createCpe(new StatusCallbackListener() {
    ...
}).process();

According to the source code of ResourceObjectProvider, the log should print "Used resource from cache" when a resource is reused. However, I only get multiple entries of "INFORMATION: Producing resource took 8592ms".

Any ideas?


Richard Eckart de Castilho

Dec 7, 2014, 10:56:07 AM
to dkpro-c...@googlegroups.com
Hi Tim,

well, this is still a very experimental feature ;)

The original use case was to share the model between multiple instances of a parser in the same pipeline, but using a single thread - that works.

In your multi-threaded environment, the sharing doesn't kick in, because the threads start loading the model in parallel.
Say you have 2 threads.

time.1) thread 1 looks if a cached model is present
time.1) thread 2 looks if a cached model is present
time.2) thread 1 starts loading the model because it did not find a cached version
time.2) thread 2 starts loading the model because it did not find a cached version
time.3) thread 1 completes loading the model and adds it to the cache
time.3) thread 2 completes loading the model and adds it to the cache
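
The timeline above can be reproduced with a small stand-alone sketch. This is not DKPro code: the class RacyCacheDemo, the cache map and the "model" key are hypothetical names, and a CyclicBarrier is used only to force every thread to finish its cache check (time.1) before any thread stores a model (time.3).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the check-then-load race; not the DKPro implementation.
public class RacyCacheDemo {
    static final Map<String, Object> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger LOADS = new AtomicInteger();

    // Runs the unsynchronized "check cache, then load" sequence on several
    // threads and returns how often the (fake) model was loaded.
    static int run(int threads) {
        CACHE.clear();
        LOADS.set(0);
        CyclicBarrier allChecked = new CyclicBarrier(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // time.1: look for a cached model
                boolean miss = !CACHE.containsKey("model");
                try {
                    // Wait until every thread has done its lookup.
                    allChecked.await();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
                if (miss) {
                    // time.2: start loading because nothing was cached
                    LOADS.incrementAndGet();
                    // time.3: finish loading and add the model to the cache
                    CACHE.put("model", new Object());
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println("model loads: " + run(2));
    }
}
```

Because no thread writes to the cache before all lookups are done, every thread sees a miss and loads its own copy, so the number of loads equals the number of threads.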

In order for this to work as expected in a multi-threaded environment, thread 1
would have to set a lock on the model until it completes loading it. Thread 2
would then (hopefully) see that lock and wait until the lock is released. Then
it would check the cache and find the already loaded model.
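
A minimal sketch of that locking idea, assuming a single lock shared by all threads. The names LockedCacheDemo, LOCK and getOrLoad are made up for illustration and this is not how DKPro implements (or plans to implement) it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a lock around "check cache, then load".
public class LockedCacheDemo {
    // One lock for all threads and all callers.
    private static final Object LOCK = new Object();
    private static final Map<String, Object> CACHE = new HashMap<>();
    static final AtomicInteger LOADS = new AtomicInteger();

    static Object getOrLoad(String key) {
        synchronized (LOCK) {
            // The check happens while holding the lock, so a second thread
            // waits here until the first one has stored the model.
            Object model = CACHE.get(key);
            if (model == null) {
                LOADS.incrementAndGet();
                model = new Object(); // stands in for the expensive model load
                CACHE.put(key, model);
            }
            return model;
        }
    }

    // Requests the same key from several threads and returns the load count.
    static int run(int threads) {
        synchronized (LOCK) {
            CACHE.clear();
        }
        LOADS.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> getOrLoad("model"));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println("model loads: " + run(4));
    }
}
```

The trade-off of such a coarse lock is that all lookups are serialized, even for different keys. A per-key lock would avoid that at the cost of more bookkeeping.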

This goes beyond the original use-case and isn't implemented yet.

Cheers,

-- Richard

Tim Feuerbach

Dec 7, 2014, 4:00:48 PM
to dkpro-c...@googlegroups.com
Thanks for your reply! I can now see the error in my thinking. I saw the "synchronized" on loadResource(), but forgot that it only locks per object, not per class.
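
For anyone finding this thread later, here is a small sketch of that difference, assuming each thread works with its own object. SyncScopeDemo and its methods are hypothetical names, not DKPro code. A two-party barrier inside the guarded method trips only if both threads are inside at the same time.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: instance-level vs. class-level "synchronized".
public class SyncScopeDemo {
    static CyclicBarrier meet;
    static AtomicInteger met;

    // Succeeds only if a second thread reaches the barrier within 500 ms.
    static void rendezvous() {
        try {
            meet.await(500, TimeUnit.MILLISECONDS);
            met.incrementAndGet();
        } catch (Exception e) {
            // Timeout or broken barrier: the other thread was locked out.
        }
    }

    // Locks on "this": two different objects do not block each other.
    synchronized void perInstance() {
        rendezvous();
    }

    // Locks on SyncScopeDemo.class: only one thread at a time, JVM-wide.
    static synchronized void perClass() {
        rendezvous();
    }

    // Returns true if both tasks were inside their method simultaneously.
    static boolean overlaps(Runnable a, Runnable b) {
        meet = new CyclicBarrier(2);
        met = new AtomicInteger();
        Thread t1 = new Thread(a);
        Thread t2 = new Thread(b);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return met.get() == 2;
    }

    static boolean instanceMethodsOverlap() {
        SyncScopeDemo x = new SyncScopeDemo();
        SyncScopeDemo y = new SyncScopeDemo();
        return overlaps(x::perInstance, y::perInstance);
    }

    static boolean staticMethodsOverlap() {
        return overlaps(SyncScopeDemo::perClass, SyncScopeDemo::perClass);
    }
}
```

With two separate objects the instance-synchronized methods run concurrently, whereas the static synchronized method lets only one thread in at a time.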