JiangConrathComparator for Wiktionary takes very long to run

Raihan Ul Islam

May 12, 2014, 4:50:08 AM
to dkpro-simil...@googlegroups.com
Dear all,

I am running into a problem using JiangConrathComparator with Wiktionary: the program has been running for days. I have already added -Xmx5G to the VM arguments.

I would be very grateful for any advice.

The code I am running is below.

        // Tokenize the two example sentences on whitespace.
        String[] tokens1 = "This is a short example text.".split(" ");
        String[] tokens2 = "A short example text could look like that.".split(" ");

        List<String> tokens1List = Arrays.asList(tokens1);
        List<String> tokens2List = Arrays.asList(tokens2);

        // Load the English Wiktionary resource configured in resources.xml.
        LexicalSemanticResource wiktionary = ResourceFactory.getInstance().get("wiktionary", "en");
        wiktionary.setIsCaseSensitive(false);

        // Constructing the comparator triggers the entity graph creation seen in the log.
        LexSemResourceComparator comparator = new JiangConrathComparator(wiktionary, wiktionary.getRoot());
        assertEquals("JiangConrathComparator", comparator.getName());
        MCS06AggregateComparator m = new MCS06AggregateComparator(comparator, new File(pathToIDFFile));
        System.out.println(m.getSimilarity(tokens1List, tokens1List));
        System.out.println(m.getSimilarity(tokens1List, tokens2List));

The log output is below.

2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 INFO  FileSystemXmlApplicationContext:503 - Refreshing org.springframework.context.support.FileSystemXmlApplicationContext@3382f8ae: startup date [Thu May 08 16:03:49 CEST 2014]; root of context hierarchy
2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 INFO  XmlBeanDefinitionReader:315 - Loading XML bean definitions from URL [file:/D:/DropBox/TuDarmstadt/MUGC/DKPRO_HOME/de.tudarmstadt.ukp.dkpro.lexsemresource.core.ResourceFactory/resources.xml]
2014-05-08 16:03:49 DEBUG DefaultDocumentLoader:72 - Using JAXP provider [org.apache.xerces.jaxp.DocumentBuilderFactoryImpl]
2014-05-08 16:03:49 DEBUG PluggableSchemaResolver:140 - Loading schema mappings from [META-INF/spring.schemas]
2014-05-08 16:03:49 DEBUG PluggableSchemaResolver:146 - Loaded schema mappings: {http://www.springframework.org/schema/util/spring-util.xsd=org/springframework/beans/factory/xml/spring-util-3.1.xsd, http://www.springframework.org/schema/beans/spring-beans-3.1.xsd=org/springframework/beans/factory/xml/spring-beans-3.1.xsd, http://www.springframework.org/schema/task/spring-task.xsd=org/springframework/scheduling/config/spring-task-3.1.xsd, http://www.springframework.org/schema/cache/spring-cache.xsd=org/springframework/cache/config/spring-cache-3.1.xsd, http://www.springframework.org/schema/aop/spring-aop-3.0.xsd=org/springframework/aop/config/spring-aop-3.0.xsd, http://www.springframework.org/schema/task/spring-task-3.1.xsd=org/springframework/scheduling/config/spring-task-3.1.xsd, http://www.springframework.org/schema/aop/spring-aop-2.0.xsd=org/springframework/aop/config/spring-aop-2.0.xsd, http://www.springframework.org/schema/tool/spring-tool-2.5.xsd=org/springframework/beans/factory/xml/spring-tool-2.5.xsd, http://gate.ac.uk/ns/spring.xsd=gate/util/spring/xml/gate-spring.xsd, http://www.springframework.org/schema/beans/spring-beans.xsd=org/springframework/beans/factory/xml/spring-beans-3.1.xsd, http://www.springframework.org/schema/jee/spring-jee-2.5.xsd=org/springframework/ejb/config/spring-jee-2.5.xsd, http://www.springframework.org/schema/tool/spring-tool-3.1.xsd=org/springframework/beans/factory/xml/spring-tool-3.1.xsd, http://www.springframework.org/schema/jee/spring-jee-3.1.xsd=org/springframework/ejb/config/spring-jee-3.1.xsd, http://www.springframework.org/schema/aop/spring-aop.xsd=org/springframework/aop/config/spring-aop-3.1.xsd, http://www.springframework.org/schema/beans/spring-beans-2.0.xsd=org/springframework/beans/factory/xml/spring-beans-2.0.xsd, http://www.springframework.org/schema/beans/spring-beans-3.0.xsd=org/springframework/beans/factory/xml/spring-beans-3.0.xsd, 
http://www.springframework.org/schema/task/spring-task-3.0.xsd=org/springframework/scheduling/config/spring-task-3.0.xsd, http://www.springframework.org/schema/tx/spring-tx-2.5.xsd=org/springframework/transaction/config/spring-tx-2.5.xsd, http://www.springframework.org/schema/context/spring-context-2.5.xsd=org/springframework/context/config/spring-context-2.5.xsd, http://www.springframework.org/schema/tool/spring-tool-3.0.xsd=org/springframework/beans/factory/xml/spring-tool-3.0.xsd, http://www.springframework.org/schema/util/spring-util-2.5.xsd=org/springframework/beans/factory/xml/spring-util-2.5.xsd, http://www.springframework.org/schema/tool/spring-tool-2.0.xsd=org/springframework/beans/factory/xml/spring-tool-2.0.xsd, http://www.springframework.org/schema/tx/spring-tx.xsd=org/springframework/transaction/config/spring-tx-3.1.xsd, http://www.springframework.org/schema/lang/spring-lang.xsd=org/springframework/scripting/config/spring-lang-3.1.xsd, http://www.springframework.org/schema/lang/spring-lang-2.5.xsd=org/springframework/scripting/config/spring-lang-2.5.xsd, http://www.springframework.org/schema/jee/spring-jee-3.0.xsd=org/springframework/ejb/config/spring-jee-3.0.xsd, http://www.springframework.org/schema/jee/spring-jee-2.0.xsd=org/springframework/ejb/config/spring-jee-2.0.xsd, http://www.springframework.org/schema/tx/spring-tx-3.1.xsd=org/springframework/transaction/config/spring-tx-3.1.xsd, http://www.springframework.org/schema/context/spring-context-3.1.xsd=org/springframework/context/config/spring-context-3.1.xsd, http://www.springframework.org/schema/util/spring-util-3.1.xsd=org/springframework/beans/factory/xml/spring-util-3.1.xsd, http://www.springframework.org/schema/lang/spring-lang-3.1.xsd=org/springframework/scripting/config/spring-lang-3.1.xsd, http://www.springframework.org/schema/cache/spring-cache-3.1.xsd=org/springframework/cache/config/spring-cache-3.1.xsd, 
http://www.springframework.org/schema/context/spring-context.xsd=org/springframework/context/config/spring-context-3.1.xsd, http://www.springframework.org/schema/jee/spring-jee.xsd=org/springframework/ejb/config/spring-jee-3.1.xsd, http://www.springframework.org/schema/aop/spring-aop-2.5.xsd=org/springframework/aop/config/spring-aop-2.5.xsd, http://www.springframework.org/schema/tx/spring-tx-2.0.xsd=org/springframework/transaction/config/spring-tx-2.0.xsd, http://www.springframework.org/schema/aop/spring-aop-3.1.xsd=org/springframework/aop/config/spring-aop-3.1.xsd, http://www.springframework.org/schema/tx/spring-tx-3.0.xsd=org/springframework/transaction/config/spring-tx-3.0.xsd, http://www.springframework.org/schema/context/spring-context-3.0.xsd=org/springframework/context/config/spring-context-3.0.xsd, http://www.springframework.org/schema/tool/spring-tool.xsd=org/springframework/beans/factory/xml/spring-tool-3.1.xsd, http://www.springframework.org/schema/util/spring-util-3.0.xsd=org/springframework/beans/factory/xml/spring-util-3.0.xsd, http://www.springframework.org/schema/util/spring-util-2.0.xsd=org/springframework/beans/factory/xml/spring-util-2.0.xsd, http://www.springframework.org/schema/lang/spring-lang-3.0.xsd=org/springframework/scripting/config/spring-lang-3.0.xsd, http://www.springframework.org/schema/lang/spring-lang-2.0.xsd=org/springframework/scripting/config/spring-lang-2.0.xsd, http://www.springframework.org/schema/beans/spring-beans-2.5.xsd=org/springframework/beans/factory/xml/spring-beans-2.5.xsd}
2014-05-08 16:03:49 DEBUG PluggableSchemaResolver:118 - Found XML schema [http://www.springframework.org/schema/beans/spring-beans-2.5.xsd] in classpath: org/springframework/beans/factory/xml/spring-beans-2.5.xsd
2014-05-08 16:03:49 DEBUG DefaultBeanDefinitionDocumentReader:108 - Loading bean definitions
2014-05-08 16:03:49 DEBUG BeanDefinitionParserDelegate:497 - Neither XML 'id' nor 'name' specified - using generated bean name [org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0]
2014-05-08 16:03:49 DEBUG XmlBeanDefinitionReader:216 - Loaded 3 bean definitions from location pattern [file:/D:/DropBox/TuDarmstadt/MUGC/DKPRO_HOME/de.tudarmstadt.ukp.dkpro.lexsemresource.core.ResourceFactory/resources.xml]
2014-05-08 16:03:49 DEBUG FileSystemXmlApplicationContext:533 - Bean factory for org.springframework.context.support.FileSystemXmlApplicationContext@3382f8ae: org.springframework.beans.factory.support.DefaultListableBeanFactory@40dd3977: defining beans [org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0,wordnet-en,wiktionary-en]; root of factory hierarchy
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:217 - Creating shared instance of singleton bean 'org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0'
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:430 - Creating instance of bean 'org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0'
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:504 - Eagerly caching bean 'org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0' to allow for resolving potential circular references
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:458 - Finished creating instance of bean 'org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0'
2014-05-08 16:03:49 DEBUG FileSystemXmlApplicationContext:800 - Unable to locate MessageSource with name 'messageSource': using default [org.springframework.context.support.DelegatingMessageSource@1b0a7baf]
2014-05-08 16:03:49 DEBUG FileSystemXmlApplicationContext:824 - Unable to locate ApplicationEventMulticaster with name 'applicationEventMulticaster': using default [org.springframework.context.event.SimpleApplicationEventMulticaster@59fc684e]
2014-05-08 16:03:49 INFO  DefaultListableBeanFactory:581 - Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@40dd3977: defining beans [org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0,wordnet-en,wiktionary-en]; root of factory hierarchy
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:245 - Returning cached instance of singleton bean 'org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0'
2014-05-08 16:03:49 DEBUG FileSystemXmlApplicationContext:851 - Unable to locate LifecycleProcessor with name 'lifecycleProcessor': using default [org.springframework.context.support.DefaultLifecycleProcessor@54709809]
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:245 - Returning cached instance of singleton bean 'lifecycleProcessor'
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:217 - Creating shared instance of singleton bean 'wiktionary-en'
2014-05-08 16:03:49 DEBUG DefaultListableBeanFactory:430 - Creating instance of bean 'wiktionary-en'
2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 DEBUG StandardEnvironment:114 - Initializing new StandardEnvironment
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemProperties] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:104 - Adding [systemEnvironment] PropertySource with lowest search precedence
2014-05-08 16:03:49 DEBUG StandardEnvironment:120 - Initialized StandardEnvironment with PropertySources [systemProperties,systemEnvironment]
2014-05-08 16:03:49 DEBUG BeanUtils:442 - No property editor [de.tudarmstadt.ukp.wiktionary.api.LanguageEditor] found for type de.tudarmstadt.ukp.wiktionary.api.Language according to 'Editor' suffix convention
2014-05-08 16:03:50 INFO  WiktionaryResource:91 - Setting PreferedEntryLanguage to ENGLISH
2014-05-08 16:03:50 INFO  WiktionaryResource:95 - Setting PreferedWordLanguage to ENGLISH
2014-05-08 16:03:50 DEBUG DefaultListableBeanFactory:504 - Eagerly caching bean 'wiktionary-en' to allow for resolving potential circular references
2014-05-08 16:03:50 DEBUG DefaultListableBeanFactory:458 - Finished creating instance of bean 'wiktionary-en'
2014-05-08 16:07:35 INFO  EntityGraphJGraphT:132 - Creating entity graph.
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 1 of 1857013 (1%  ETA 06:42:21.156  RUN 00:00:00.13   AVG 13  LAST 13)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 2 of 1857013 (1%  ETA 04:23:04.594  RUN 00:00:00.17   AVG 9  LAST 4)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 3 of 1857013 (1%  ETA 03:46:58.73   RUN 00:00:00.22   AVG 7  LAST 5)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 4 of 1857013 (1%  ETA 03:13:26.306  RUN 00:00:00.25   AVG 6  LAST 3)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 5 of 1857013 (1%  ETA 02:47:07.843  RUN 00:00:00.27   AVG 5  LAST 2)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 6 of 1857013 (1%  ETA 02:34:45.35   RUN 00:00:00.30   AVG 5  LAST 3)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 7 of 1857013 (1%  ETA 02:21:29.170  RUN 00:00:00.32   AVG 5  LAST 2)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 8 of 1857013 (1%  ETA 02:15:24.397  RUN 00:00:00.35   AVG 4  LAST 3)
2014-05-08 16:07:35 DEBUG EntityGraphJGraphT:199 - 9 of 1857013 (1%  ETA 02:41:37.688  RUN 00:00:00.47   AVG 5  LAST 12)
2014-05-08 16:07:36 DEBUG EntityGraphJGraphT:199 - 10 of 1857013 (1%  ETA 02:50:13.517  RUN 00:00:00.55   AVG 6  LAST 8)
2014-05-08 16:07:36 DEBUG EntityGraphJGraphT:199 - 11 of 1857013 (1%  ETA 02:37:33.828  RUN 00:00:00.56   AVG 5  LAST 1)
2014-05-08 16:07:36 DEBUG EntityGraphJGraphT:199 - 12 of 1857013 (1%  ETA 02:29:35.505  RUN 00:00:00.58   AVG 5  LAST 2)
2014-05-08 16:07:36 DEBUG EntityGraphJGraphT:199 - 13 of 1857013 (1%  ETA 02:22:50.769  RUN 00:00:00.60   AVG 5  LAST 2)
......................

2014-05-12 09:51:48 DEBUG EntityGraphJGraphT:199 - 1671933 of 1857013 (91%  ETA 09:56:01.288  RUN 89:44:12.76   AVG 193  LAST 30199)
2014-05-12 09:52:16 DEBUG EntityGraphJGraphT:199 - 1671934 of 1857013 (91%  ETA 09:56:04.269  RUN 89:44:40.943  AVG 193  LAST 28867)
2014-05-12 09:52:45 DEBUG EntityGraphJGraphT:199 - 1671935 of 1857013 (91%  ETA 09:56:07.222  RUN 89:45:09.554  AVG 193  LAST 28611)

Torsten Zesch

May 12, 2014, 5:17:00 AM
to DKPro Similarity Users
The logging output that you are seeing comes from building a graph
representation of Wiktionary, which speeds up computation afterwards.
It always takes some time to create the graph, but several days is
clearly longer than expected.
However, from what I see it should be less than one day until it reaches 100%.
Maybe you should wait at least that long.
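
As a rough cross-check of that estimate, here is a minimal sketch. It assumes the ETA in the progress line is simply the number of remaining entities multiplied by the running average in the AVG column (in milliseconds). The class and method names are made up for illustration and are not part of DKPro; the numbers are copied from the log line of 2014-05-12 09:51:48.

```java
/** Recomputes the remaining time from the fields of one progress log line. */
final class EtaEstimate {

    /** Remaining time in milliseconds if every remaining entity costs msPerEntity. */
    static long remainingMillis(long done, long total, double msPerEntity) {
        return Math.round((total - done) * msPerEntity);
    }

    public static void main(String[] args) {
        long done = 1671933L;   // entities processed so far
        long total = 1857013L;  // entities in the resource

        // AVG column: average over the whole run, dominated by the fast first 89%.
        long avgBased = remainingMillis(done, total, 193.0);
        // LAST column: cost of the most recent entity only.
        long lastBased = remainingMillis(done, total, 30199.0);

        System.out.printf("ETA from AVG:  %.1f hours%n", avgBased / 3600000.0);
        System.out.printf("ETA from LAST: %.1f days%n", lastBased / 86400000.0);
    }
}
```

The AVG-based figure comes out at just under ten hours, in line with the logged ETA. If the recent per-entity time of about 30 seconds persisted instead, the remaining 185080 entities would take roughly two months, so the logged ETA is only as good as the assumption that the average still applies.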

-Torsten

Raihan Ul Islam

May 14, 2014, 9:18:48 AM
to dkpro-simil...@googlegroups.com
Dear all,

I have been running the same code for two days on a high-end server (64 cores, 250 GB RAM). It has been showing 90% for a full day. From 0% to 89% the program runs very fast; from 90% onwards it has taken almost a day.

It would be very helpful if anybody could give some advice on how to resolve this.

Thanks
Raihan

Richard Eckart de Castilho

May 14, 2014, 9:32:26 AM
to Raihan Ul Islam, dkpro-simil...@googlegroups.com
Hello Raihan,

Does the process still consume CPU power (check e.g. using the "top" command)?

You could try using the "jstack" to produce a thread dump - maybe that tells us where it got stuck.

-- Richard

Raihan Ul Islam

May 23, 2014, 4:18:40 AM
to dkpro-simil...@googlegroups.com
Dear all,

I have been running the program for four days, and it is still running. Does anyone have the generated graph files for Wiktionary, like the files generated for WordNet? Would it be possible to share them?

Thanks
Raihan

Raihan Ul Islam

Jun 4, 2014, 7:00:31 AM
to dkpro-simil...@googlegroups.com
Hi Richard,

Please find the thread dump below.

C:\Program Files\Java\jdk1.7.0_51\bin>jstack 8460
2014-06-04 12:56:24
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode):

"Service Thread" daemon prio=6 tid=0x0000000052e75800 nid=0x27d4 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x0000000052e70000 nid=0x2148 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x0000000052e6e800 nid=0x1f60 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x0000000052e6d800 nid=0x25c8 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x0000000052e6d000 nid=0x2040 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x0000000052e57000 nid=0x1d08 in Object.wait() [0x0000000058edf000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000008cb83f68> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x000000008cb83f68> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x0000000054f87800 nid=0x1dcc in Object.wait() [0x0000000058ddf000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000008cb83f98> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x000000008cb83f98> (a java.lang.ref.Reference$Lock)

"main" prio=6 tid=0x00000000030b0800 nid=0x1910 runnable [0x00000000030ae000]
   java.lang.Thread.State: RUNNABLE
        at de.tudarmstadt.ukp.wiktionary.api.Wiktionary.filter(Wiktionary.java:356)
        at de.tudarmstadt.ukp.wiktionary.api.Wiktionary.getWordEntries(Wiktionary.java:274)
        at de.tudarmstadt.ukp.dkpro.lexsemresource.wiktionary.util.WiktionaryUtils.entityToWords(WiktionaryUtils.java:78)
        at de.tudarmstadt.ukp.dkpro.lexsemresource.wiktionary.WiktionaryResource.getParents(WiktionaryResource.java:184)
        at de.tudarmstadt.ukp.dkpro.lexsemresource.graph.EntityGraphJGraphT.createGraph(EntityGraphJGraphT.java:185)
        at de.tudarmstadt.ukp.dkpro.lexsemresource.graph.EntityGraphJGraphT.getEntityGraphJGraphT(EntityGraphJGraphT.java:133)
        at de.tudarmstadt.ukp.dkpro.lexsemresource.graph.EntityGraphManager.getEntityGraph(EntityGraphManager.java:52)
        at dkpro.similarity.algorithms.lsr.path.PathBasedComparator.initialize(PathBasedComparator.java:94)
        at dkpro.similarity.algorithms.lsr.path.PathBasedComparator.<init>(PathBasedComparator.java:82)
        at dkpro.similarity.algorithms.lsr.path.JiangConrathComparator.<init>(JiangConrathComparator.java:59)
        at de.tudarmstadt.tk.mugc.prototype.smilarityMatrices.LexicalAlgorithms.main(LexicalAlgorithms.java:91)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)

"VM Thread" prio=10 tid=0x0000000054f82800 nid=0x1f40 runnable

"GC task thread#0 (ParallelGC)" prio=6 tid=0x00000000030c6800 nid=0x22a4 runnable

"GC task thread#1 (ParallelGC)" prio=6 tid=0x00000000030c8000 nid=0x2b8 runnable

"GC task thread#2 (ParallelGC)" prio=6 tid=0x00000000030c9800 nid=0x1ab8 runnable

"GC task thread#3 (ParallelGC)" prio=6 tid=0x00000000030cc000 nid=0x1d90 runnable

"GC task thread#4 (ParallelGC)" prio=6 tid=0x00000000030ce000 nid=0x21d4 runnable

"GC task thread#5 (ParallelGC)" prio=6 tid=0x00000000030cf800 nid=0x207c runnable

"GC task thread#6 (ParallelGC)" prio=6 tid=0x00000000030d4000 nid=0x2d4 runnable

"GC task thread#7 (ParallelGC)" prio=6 tid=0x00000000030d5000 nid=0x1030 runnable

"GC task thread#8 (ParallelGC)" prio=6 tid=0x00000000030d8000 nid=0x1eb0 runnable

"GC task thread#9 (ParallelGC)" prio=6 tid=0x00000000030d9000 nid=0x26b0 runnable

"GC task thread#10 (ParallelGC)" prio=6 tid=0x00000000030d9800 nid=0x2188 runnable

"GC task thread#11 (ParallelGC)" prio=6 tid=0x00000000030dc800 nid=0xd34 runnable

"GC task thread#12 (ParallelGC)" prio=6 tid=0x00000000030e1800 nid=0x1d54 runnable

"GC task thread#13 (ParallelGC)" prio=6 tid=0x00000000030e2000 nid=0x269c runnable

"GC task thread#14 (ParallelGC)" prio=6 tid=0x00000000030e3000 nid=0x268c runnable

"GC task thread#15 (ParallelGC)" prio=6 tid=0x00000000030e3800 nid=0x308 runnable

"GC task thread#16 (ParallelGC)" prio=6 tid=0x00000000030e8800 nid=0x213c runnable

"GC task thread#17 (ParallelGC)" prio=6 tid=0x00000000030eb800 nid=0x27a4 runnable

"GC task thread#18 (ParallelGC)" prio=6 tid=0x00000000030ea800 nid=0x18d0 runnable

"GC task thread#19 (ParallelGC)" prio=6 tid=0x00000000030e9800 nid=0x1b28 runnable

"GC task thread#20 (ParallelGC)" prio=6 tid=0x00000000030eb000 nid=0x181c runnable

"GC task thread#21 (ParallelGC)" prio=6 tid=0x00000000030ec000 nid=0x261c runnable

"GC task thread#22 (ParallelGC)" prio=6 tid=0x00000000030e9000 nid=0x25e8 runnable

"GC task thread#23 (ParallelGC)" prio=6 tid=0x00000000030ed800 nid=0x1b14 runnable

"GC task thread#24 (ParallelGC)" prio=6 tid=0x00000000030f0000 nid=0x2648 runnable

"GC task thread#25 (ParallelGC)" prio=6 tid=0x00000000030ee000 nid=0x24bc runnable

"GC task thread#26 (ParallelGC)" prio=6 tid=0x00000000030ed000 nid=0x1ed8 runnable

"GC task thread#27 (ParallelGC)" prio=6 tid=0x00000000030ee800 nid=0x270c runnable

"GC task thread#28 (ParallelGC)" prio=6 tid=0x00000000030ef800 nid=0x1404 runnable

"GC task thread#29 (ParallelGC)" prio=6 tid=0x0000000003104000 nid=0x1e2c runnable

"GC task thread#30 (ParallelGC)" prio=6 tid=0x0000000003105000 nid=0x2330 runnable

"GC task thread#31 (ParallelGC)" prio=6 tid=0x0000000003105800 nid=0x2730 runnable

"GC task thread#32 (ParallelGC)" prio=6 tid=0x0000000003102800 nid=0x22b4 runnable

"GC task thread#33 (ParallelGC)" prio=6 tid=0x0000000003106000 nid=0x1e5c runnable

"GC task thread#34 (ParallelGC)" prio=6 tid=0x0000000003103800 nid=0x2410 runnable

"GC task thread#35 (ParallelGC)" prio=6 tid=0x0000000003103000 nid=0x194c runnable

"GC task thread#36 (ParallelGC)" prio=6 tid=0x0000000003108000 nid=0x1ecc runnable

"GC task thread#37 (ParallelGC)" prio=6 tid=0x0000000003107800 nid=0x166c runnable

"GC task thread#38 (ParallelGC)" prio=6 tid=0x000000000310a000 nid=0x2678 runnable

"GC task thread#39 (ParallelGC)" prio=6 tid=0x0000000003109000 nid=0x270 runnable

"GC task thread#40 (ParallelGC)" prio=6 tid=0x0000000003106800 nid=0x25e0 runnable

"GC task thread#41 (ParallelGC)" prio=6 tid=0x000000000310b000 nid=0x2778 runnable

"GC task thread#42 (ParallelGC)" prio=6 tid=0x000000000310a800 nid=0x25dc runnable

"VM Periodic Task Thread" prio=10 tid=0x0000000052e8b000 nid=0x1dd8 waiting on condition

JNI global references: 180


C:\Program Files\Java\jdk1.7.0_51\bin>


The log output is below as well:
2014-06-04 12:51:35 DEBUG EntityGraphJGraphT:199 - 1645249 of 1857013 (89%  ETA 00:06:13.850  RUN 00:48:24.539  AVG 2  LAST 0)
2014-06-04 12:51:35 DEBUG EntityGraphJGraphT:199 - 1645250 of 1857013 (89%  ETA 00:06:13.848  RUN 00:48:24.539  AVG 2  LAST 0)
2014-06-04 12:51:35 DEBUG EntityGraphJGraphT:199 - 1645251 of 1857013 (89%  ETA 00:06:13.846  RUN 00:48:24.539  AVG 2  LAST 0)
2014-06-04 12:52:46 DEBUG EntityGraphJGraphT:199 - 1645252 of 1857013 (89%  ETA 00:06:22.979  RUN 00:49:35.508  AVG 2  LAST 70969)
2014-06-04 12:53:48 DEBUG EntityGraphJGraphT:199 - 1645253 of 1857013 (89%  ETA 00:06:30.922  RUN 00:50:37.242  AVG 2  LAST 61734)
2014-06-04 12:54:51 DEBUG EntityGraphJGraphT:199 - 1645254 of 1857013 (89%  ETA 00:06:39.61   RUN 00:51:40.493  AVG 2  LAST 63251)
2014-06-04 12:55:57 DEBUG EntityGraphJGraphT:199 - 1645255 of 1857013 (89%  ETA 00:06:47.486  RUN 00:52:45.961  AVG 2  LAST 65468)
2014-06-04 12:56:31 DEBUG EntityGraphJGraphT:199 - 1645256 of 1857013 (89%  ETA 00:06:51.892  RUN 00:53:20.212  AVG 2  LAST 34251)
2014-06-04 12:57:05 DEBUG EntityGraphJGraphT:199 - 1645257 of 1857013 (89%  ETA 00:06:56.304  RUN 00:53:54.509  AVG 2  LAST 34297)
2014-06-04 12:57:40 DEBUG EntityGraphJGraphT:199 - 1645258 of 1857013 (89%  ETA 00:07:00.804  RUN 00:54:29.493  AVG 2  LAST 34984)
2014-06-04 12:58:13 DEBUG EntityGraphJGraphT:199 - 1645259 of 1857013 (89%  ETA 00:07:04.973  RUN 00:55:01.899  AVG 2  LAST 32406)
2014-06-04 12:58:48 DEBUG EntityGraphJGraphT:199 - 1645260 of 1857013 (89%  ETA 00:07:09.590  RUN 00:55:37.790  AVG 2  LAST 35891)

Any idea how to make it faster?

Thanks
Raihan

Richard Eckart de Castilho

Jun 18, 2014, 12:18:30 PM
to Raihan Ul Islam, dkpro-simil...@googlegroups.com
Hi all,

I can reproduce the problem. After a while, the progress simply slows to a snail's pace.

This is the last output before it starts getting slow:

DEBUG EntityGraphJGraphT:199 - 1641784 of 1857013 (89% ETA 00:01:56.763 RUN 00:14:50.675 AVG 1 LAST 1)

Here is my diagnosis and suspicion:

- The process gets stuck in one of the Wiktionary.filter() methods, which strips non-matching WordEntries returned from a BDB query.
- This removal is slow when the result set from the BDB is large. So why is it large?
- The method WiktionaryEdition.getPagesForWord(String, boolean) uses WiktionaryPage.normalizeTitle(word) to normalize the word for which pages are to be fetched from the BDB. Normally, there should only be a few Wiktionary pages for each normalized word.
- However, the normalization is done by simply replacing all non-ASCII characters. This leads to a problem.
- A query for a word that contains only non-ASCII characters is reduced to the empty string "".
- A lot of page titles were reduced to the empty string as well when the BDB database was created from Wiktionary.
- As a result, the query for any non-ASCII word takes ages.
- I assume the process slows down towards the end because the words being iterated over are sorted in such a way that those starting with non-ASCII characters come towards the end.
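
The collapse described in the list above can be illustrated with a small sketch. Since the JWKTL version in question is closed-source, stripNonAscii() below is only an assumed model of what WiktionaryPage.normalizeTitle() does (dropping every non-ASCII character); the class name and the sample titles are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Shows how titles made only of non-ASCII characters end up under one key. */
final class NormalizationCollision {

    // Assumed model of the normalization: drop every non-ASCII character.
    static String stripNonAscii(String title) {
        return title.replaceAll("[^\\x00-\\x7F]", "");
    }

    /** Groups titles by their normalized form, like an index keyed on it would. */
    static Map<String, List<String>> bucketByNormalizedTitle(List<String> titles) {
        Map<String, List<String>> buckets = new TreeMap<String, List<String>>();
        for (String title : titles) {
            String key = stripNonAscii(title);
            List<String> bucket = buckets.get(key);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                buckets.put(key, bucket);
            }
            bucket.add(title);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only: a Japanese, a Russian and a Chinese title.
        List<String> titles = Arrays.asList(
                "house", "\u65e5\u672c\u8a9e", "\u0441\u043b\u043e\u0432\u043e", "\u6c34");
        for (Map.Entry<String, List<String>> e : bucketByNormalizedTitle(titles).entrySet()) {
            System.out.println("key \"" + e.getKey() + "\" -> " + e.getValue().size() + " title(s)");
        }
    }
}
```

In this toy input three of the four titles share the empty key. On the real database the same effect would put a very large number of pages into that one bucket, and every lookup that normalizes to "" would have to filter all of them.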

It might be possible to fix this in DKPro LSR in the WiktionaryResource getParents(), getChildren(), and getEntity()/getEntities() methods by simply ignoring all words that normalize to the empty string.
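
A minimal sketch of such a guard, under stated assumptions: PageLookup is a hypothetical stand-in for the expensive BDB-backed call, and stripNonAscii() is an assumed model of the normalization, not the actual JWKTL code. A real fix would have to live inside WiktionaryResource and use the library's own normalization.

```java
import java.util.Collections;
import java.util.List;

/** Skips the expensive lookup for words whose normalized form is empty. */
final class EmptyNormalizationGuard {

    /** Hypothetical stand-in for the BDB-backed page lookup. */
    interface PageLookup {
        List<String> pagesFor(String word);
    }

    // Assumed model of the normalization: drop every non-ASCII character.
    static String stripNonAscii(String word) {
        return word.replaceAll("[^\\x00-\\x7F]", "");
    }

    /** Returns an empty result without calling the lookup if the word normalizes to "". */
    static List<String> guardedPagesFor(String word, PageLookup lookup) {
        if (stripNonAscii(word).isEmpty()) {
            return Collections.emptyList();
        }
        return lookup.pagesFor(word);
    }

    public static void main(String[] args) {
        PageLookup lookup = new PageLookup() {
            @Override
            public List<String> pagesFor(String word) {
                System.out.println("querying database for: " + word);
                return Collections.singletonList(word);
            }
        };
        System.out.println(guardedPagesFor("house", lookup).size());   // lookup is called
        System.out.println(guardedPagesFor("\u6c34", lookup).size());  // lookup is skipped
    }
}
```

The trade-off is that words consisting only of non-ASCII characters would get no parents or children and so would be isolated in the entity graph. For an English-only setup that is probably acceptable, but it is a design choice rather than a free win.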

Anybody care to try that?

Since the version of JWKTL used in DKPro Similarity is still closed-source, it is not possible to fix it there. Newer versions of JWKTL or alternative builds of the Wiktionary BDB databases may not have the problem either.

Cheers,

-- Richard

P.S.: The method to get the number of entities in a lexical resource is also terribly slow for Wiktionary... I had to copy the BDB database into a RAM disk in order to be able to even get past the entity counting...

Torsten Zesch

Jun 19, 2014, 6:26:45 PM
to Richard Eckart de Castilho, Raihan Ul Islam, DKPro Similarity Users
I agree, this should be fixed in LSR.
However, I will probably be very slow to fix that.
Someone interested in taking over maintenance of DKPro LSR?

-Torsten

