Configuration step freezes


Andrea Apicella

Mar 16, 2015, 3:16:01 PM
to wiki...@googlegroups.com
Hi all,
I'm trying to install the full English language edition to evaluate similarity between Wikipedia categories.
I started the process from the GUI yesterday and it went well until a short while ago; since then the log window has stayed stuck at:

Mar 16, 2015 7:34:00 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893100000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:01 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893200000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:01 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893300000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:01 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893400000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:02 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893500482, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:03 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893600000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:03 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893700000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:04 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893800000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:04 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 893900000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:05 PM org.wikibrain.utils.ParallelForEach$4 run
INFO: processing iterable 894000000
Mar 16, 2015 7:34:05 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 894000000, found 541011312 interesting and 335216479 new
Mar 16, 2015 7:34:05 PM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 894100000, found 541011312 interesting and 335216479 new
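In the excerpt above, the processed-link count keeps rising (by exactly one million, from 893100000 to 894100000) while the "interesting" (541011312) and "new" (335216479) counters never change, which suggests none of those links were being counted. Below is a hedged sketch for checking a saved log for that pattern. The class and method names are hypothetical, and the regular expression assumes the exact message format shown in this log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: detects SqlLinksLoader progress lines whose counters stopped growing. */
class LinkProgressCheck {
    // Assumes the exact progress message format shown in the log excerpt.
    private static final Pattern PROGRESS = Pattern.compile(
            "Processed link (\\d+), found (\\d+) interesting and (\\d+) new");

    /** Returns one {processed, interesting, new} triple per matching log line. */
    static List<long[]> parse(List<String> lines) {
        List<long[]> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PROGRESS.matcher(line);
            if (m.find()) {
                out.add(new long[] {
                        Long.parseLong(m.group(1)),
                        Long.parseLong(m.group(2)),
                        Long.parseLong(m.group(3))});
            }
        }
        return out;
    }

    /** True if links kept being read while neither counter changed. */
    static boolean countersStalled(List<long[]> samples) {
        if (samples.size() < 2) {
            return false;
        }
        long[] first = samples.get(0);
        long[] last = samples.get(samples.size() - 1);
        return last[0] > first[0] && last[1] == first[1] && last[2] == first[2];
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "INFO: Processed link 893100000, found 541011312 interesting and 335216479 new",
                "INFO: processing iterable 894000000",
                "INFO: Processed link 894100000, found 541011312 interesting and 335216479 new");
        List<long[]> samples = parse(log);
        System.out.println(samples.size() + " samples, stalled=" + countersStalled(samples));
    }
}
```

A flat counter by itself does not prove a hang; it only shows that the last stretch of links contributed nothing new.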

It is now 8:15 PM and it is still stuck at that point, with the java process using only 0.7 to 2.0% of CPU (and 37.0% of memory) instead of about 390% as it did until recently (I have a 4-core CPU with 12 GB of RAM). Right now my directory holds 98.8 GB of the estimated 152 GB required. What could have happened? Is it normal for it to hang like this? Can I stop the process and restart it from this point, or will stopping it lose all the data computed so far and force me to restart the whole process from the beginning?
If so, how can I avoid this situation happening again?

This is the configuration I launched from the GUI:
java memory: 10 GB
language: en
data source: H2
selected phases:
basic data
lucene
phrases
concepts
wikidata
semantic relatedness

This is the initial diagnostic output:

* ALL DIAGNOSTIC TESTS SUCCEEDED! **
*************************************

Rough estimate of download size: 25620,0 MBs
    This may be an over-estimate if some files have already been downloaded.
    Time on dial-up (50kbs): 85400,0 minutes
    Time on Broadband (1Mbs): 4270,0 minutes
    Time on Broadband (10Mbs): 427,0 minutes
    Time on Broadband (100Mbs): 42,7 minutes
    stage download will download about 22080,0 about MBs
    stage concepts will download about 660,0 about MBs
    stage wikidata will download about 2880,0 about MBs
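The download figures can be checked by hand: the three per-stage sizes add up to the total (22080 + 660 + 2880 = 25620 MB), and every listed time matches minutes = MB / (6 * Mbps), i.e. treating a link of r Mbps as moving r/10 MB per second. That 10-bits-per-byte factor is inferred from the numbers above, not taken from WikiBrain's documentation. A small sketch reproducing the table under that assumption:

```java
/** Reproduces the diagnostic's download-time table, assuming 10 bits per byte on the wire. */
class DownloadEstimate {
    /** Minutes to fetch sizeMb megabytes at a nominal rateMbps megabits per second. */
    static double minutes(double sizeMb, double rateMbps) {
        // Assumption inferred from the diagnostic: r Mbps moves r / 10 MB per second.
        double mbPerSecond = rateMbps / 10.0;
        return sizeMb / mbPerSecond / 60.0;
    }

    public static void main(String[] args) {
        double total = 22080.0 + 660.0 + 2880.0; // download + concepts + wikidata stages
        System.out.printf("total: %.1f MB%n", total);
        for (double rate : new double[] {0.05, 1.0, 10.0, 100.0}) {
            System.out.printf("%6.2f Mbps: %.1f minutes%n", rate, minutes(total, rate));
        }
    }
}
```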

Completion time estimate: 1792,3 minutes (NOT including download time)
    stage fetchlinks: 0,0 minutes
    stage download: 0,0 minutes
    stage dumploader: 137,9 minutes
    stage redirects: 7,1 minutes
    stage wikitext: 1004,4 minutes
    stage lucene: 370,6 minutes
    stage phrases: 77,6 minutes
    stage concepts: 41,7 minutes
    stage wikidata: 129,2 minutes
    stage sr: 23,8 minutes

Disk space is okay. (need 152,780 GBs, have 172,938 GBs)
    Warning: Available disk space may be INACCURATE if you have multiple drives.
    stage fetchlinks: 1,2 MBs
    stage download: 22080,0 MBs
    stage dumploader: 31542,9 MBs
    stage redirects: 1577,1 MBs
    stage wikitext: 45000,0 MBs
    stage lucene: 39428,6 MBs
    stage phrases: 9000,0 MBs
    stage concepts: 1577,1 MBs
    stage wikidata: 6000,0 MBs
    stage sr: 240,0 MBs
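The completion-time and disk totals are likewise sums of the per-stage rows. The minutes add up to 1792.3 (about 30 hours, most of it the 1004.4-minute wikitext stage), and the disk rows add up to 156446.9 MB, which gives the reported 152,780 GB only if one GB is taken as 1024 MB. That unit is an inference from the numbers, not documented behavior. A sketch of the check:

```java
/** Checks that the diagnostic totals are the sums of the per-stage figures. */
class StageTotals {
    static double sum(double... values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Per-stage minutes, in the order listed (fetchlinks ... sr).
        double minutes = sum(0.0, 0.0, 137.9, 7.1, 1004.4, 370.6, 77.6, 41.7, 129.2, 23.8);
        // Per-stage disk usage in MB, same order.
        double diskMb = sum(1.2, 22080.0, 31542.9, 1577.1, 45000.0, 39428.6,
                9000.0, 1577.1, 6000.0, 240.0);
        // Assumption: 1024 MB per GB, which reproduces the reported 152,780.
        double diskGb = diskMb / 1024.0;
        System.out.printf("time: %.1f min (%.1f h)%n", minutes, minutes / 60.0);
        System.out.printf("disk: %.1f MB = %.3f GB%n", diskMb, diskGb);
    }
}
```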

Amount of memory allocated for the JVM is okay
    memory required: 8,0GB
    memory allocated: 9,5GB
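The diagnostic reports 9.5 GB allocated although 10 GB was set in the GUI. The maximum heap a JVM reports through `Runtime.maxMemory()` is often somewhat below the `-Xmx` value; whether WikiBrain's check uses exactly that call is an assumption. A minimal sketch for seeing what your own JVM reports (the class name is hypothetical; run it with the same `-Xmx` you give WikiBrain, e.g. `java -Xmx10g JvmMemory`):

```java
/** Prints the maximum heap the running JVM reports and compares it with the 8 GB requirement. */
class JvmMemory {
    static double toGb(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory(); // may be less than -Xmx
        double requiredGb = 8.0; // "memory required" from the diagnostic above
        System.out.printf("max heap: %.1f GB%n", toGb(max));
        System.out.println(toGb(max) >= requiredGb ? "meets the estimate" : "raise -Xmx");
    }
}
```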

Connection to database succeeded. Active configuration:
    username: "sa"
    partitions: "default"
    password: ""
    connectionsPerPartition: 2
    url: "jdbc:h2:./db/h2;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0;MAX_OPERATION_MEMORY=100000000"
    driver: "org.h2.Driver"
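The H2 URL carries its tuning options after the path, separated by semicolons. As I understand the H2 documentation for the 1.x releases, `LOG=0` turns off the transaction log, `UNDO_LOG=0` turns off the undo log, `LOCK_MODE=0` disables locking, `CACHE_SIZE` is in KB (65536 KB = 64 MB), and `MAX_OPERATION_MEMORY` is in bytes; together they favor bulk-load speed over safety. The sketch below only splits the URL into those settings for inspection; the class name is hypothetical and it does not open a connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Splits an H2 JDBC URL of the form "jdbc:h2:<path>;KEY=VALUE;..." into its settings. */
class H2UrlSettings {
    static Map<String, String> settings(String url) {
        Map<String, String> out = new LinkedHashMap<>();
        String[] parts = url.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is "jdbc:h2:<path>"
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                out.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String url = "jdbc:h2:./db/h2;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0;"
                + "MAX_OPERATION_MEMORY=100000000";
        settings(url).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```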


Thank you,
best regards!

Andrea Apicella

Mar 17, 2015, 12:11:46 PM
to wiki...@googlegroups.com
UPDATE: I also tried on another machine (an iMac) and got the same behavior:


INFO: Processed link 893900000, found 541011312 interesting and 335216479 new
Mar 17, 2015 8:40:28 AM org.wikibrain.utils.ParallelForEach$4 run
INFO: processing iterable 894000000
Mar 17, 2015 8:40:28 AM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 894000000, found 541011312 interesting and 335216479 new
Mar 17, 2015 8:40:28 AM org.wikibrain.loader.SqlLinksLoader processOneLink
INFO: Processed link 894100000, found 541011312 interesting and 335216479 new

The freeze point seems to be the same; I think it's a bug (or a problem in the dump file). I have opened an issue in the bug tracker.

Shilad Sen

Mar 17, 2015, 5:00:35 PM
to Andrea Apicella, wiki...@googlegroups.com
Andrea,

This is indeed odd. I haven't encountered these problems with English Wikipedia, but I have also never tried using H2 for EN. Do you have access to a PostgreSQL database? If so, could you see whether that helps?

-Shilad




--
Shilad W. Sen
Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
ss...@macalester.edu

Andrea Apicella

Mar 17, 2015, 5:24:36 PM
to wiki...@googlegroups.com, and....@gmail.com
Thanks for your reply!
I'll install a PostgreSQL database and try with it. For now, I'm trying to install an older dump. I modified the source code of the getDumps() method at this point:

        for (int i = availableDates.size() - 2; i > -1; i--) {

instead of

        for (int i = availableDates.size() - 1; i > -1; i--) {


so that it downloads the 2015/02/05 dump (complete) instead of the 2015/03/04 dump (still in progress). I know it's a very ugly way to download a different version, sorry about that :)
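Starting the loop at `size() - 2` always skips the newest dump, even after that dump has finished. A less brittle idea is to walk the dates newest-first and take the first one known to be complete. The sketch below assumes, as the descending loop suggests, that `availableDates` is sorted oldest to newest and holds `yyyyMMdd` strings; `completedDates` is a hypothetical input (for example, collected from the dump's status page), and this is not WikiBrain's actual `getDumps()`.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper: picks the newest dump date that is known to be complete. */
class DumpChooser {
    /** Returns the newest entry of availableDates (sorted ascending) found in completedDates, or null. */
    static String newestCompleted(List<String> availableDates, Set<String> completedDates) {
        for (int i = availableDates.size() - 1; i > -1; i--) {
            String date = availableDates.get(i);
            if (completedDates.contains(date)) {
                return date;
            }
        }
        return null; // no completed dump available
    }

    public static void main(String[] args) {
        List<String> available = List.of("20150112", "20150205", "20150304");
        Set<String> completed = Set.of("20150112", "20150205"); // 20150304 still in progress
        System.out.println(newestCompleted(available, completed));
    }
}
```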
If I still have this problem, I'll try with PostgreSQL. In that case, since I have a slow connection, can I use a dump I have already downloaded instead of downloading it again? I wasted a whole day just downloading :(
Anyway, I'll keep you updated on this tricky behavior!
Thanks again
best regards!

Andrea Apicella

Mar 17, 2015, 9:02:57 PM
to wiki...@googlegroups.com, and....@gmail.com
UPDATE:
the system crashed with the old dump as well.
Anyway, it seems that if I keep the "download" directory, it doesn't download the files again.
Now I'm trying with a PostgreSQL database.

I'll keep you updated.

PS: If the process breaks at some point, how can I resume it without losing the computations already done? Can I skip stages such as download?
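Keeping the "download" directory appears to avoid a second download. The thread does not say how WikiBrain decides this, so the sketch below only shows one plausible rule: skip a file that already exists with the expected size. The class, method, and file names are hypothetical. A stricter check would compare against the MD5 checksums that Wikimedia publishes alongside each dump.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical skip-if-present check for previously downloaded dump files. */
class DownloadCache {
    /** True if target is an existing regular file of exactly expectedBytes bytes. */
    static boolean alreadyDownloaded(Path target, long expectedBytes) throws IOException {
        return Files.isRegularFile(target) && Files.size(target) == expectedBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("download");
        Path file = dir.resolve("example-dump.xml.bz2"); // hypothetical file name
        Files.write(file, new byte[] {1, 2, 3});
        System.out.println(alreadyDownloaded(file, 3)); // true: present with the right size
        System.out.println(alreadyDownloaded(dir.resolve("missing.xml.bz2"), 3)); // false
    }
}
```

Checking the size as well as existence guards against reusing a file left half-written by an interrupted download.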


On Tuesday, March 17, 2015 22:00:35 UTC+1, Shilad Sen wrote:

Andrea Apicella

Mar 18, 2015, 8:31:45 PM
to wiki...@googlegroups.com, and....@gmail.com
I tried with a PostgreSQL database and got the same bad result as with H2, as you can see in the image. It still freezes after 894100000 processed links, and I don't understand why.
Since I need semantic relatedness, can I build it from the database created so far?
Thanks, and sorry for all this.
[Attached image: nada.jpeg]

Andrea Apicella

Mar 20, 2015, 12:53:38 PM
to wiki...@googlegroups.com, and....@gmail.com
OK, I didn't stop the process when it seemed frozen, and with PostgreSQL it moved on to the next step after about 3 hours of "apparent death" (I'd suggest adding a waiting message at that point :) ) and finished successfully.
Thinking that my only problem was time, I retried with H2, but in that case it stayed frozen at the same point even after about 48 hours.
So the problem really does seem to be the H2 database.
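Following the suggestion of a waiting message: one general way to keep a long, silent step from looking dead is a periodic heartbeat printed while the work runs. The sketch below is not WikiBrain code, the thread does not identify which internal step was silent for three hours, and the names are hypothetical; it just wraps any task with a `ScheduledExecutorService` that reports elapsed time.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical wrapper that prints a "still working" line while a long task runs. */
class Heartbeat {
    static <T> T runWithHeartbeat(String label, long periodMillis, Supplier<T> work) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        // Report elapsed time every periodMillis until the work finishes.
        timer.scheduleAtFixedRate(() -> {
            long secs = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start);
            System.out.println(label + ": still working (" + secs + " s elapsed)");
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return work.get();
        } finally {
            timer.shutdownNow(); // stop the heartbeat so the JVM can exit
        }
    }

    public static void main(String[] args) {
        String result = runWithHeartbeat("finishing link load", 200, () -> {
            try {
                Thread.sleep(700); // stands in for the long, silent step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        });
        System.out.println(result);
    }
}
```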
Thanks for the help,
regards