Trouble importing data


dan....@gmail.com

Jun 10, 2016, 6:57:04 PM
to wikibrain
Hi! Toby Li alerted me to the existence of WikiBrain, so I was trying it out to learn things about neighborhoods within cities. I opened the GUI loader and just hit "run", and I'm getting this:
Exception in thread "main" org.wikibrain.core.WikiBrainException: No dumps for simple found before 20160610
...
LOADING FAILED!

I mean, it kind of makes sense - I never told it where to get any data. Is there a place where I should have done that? How does it know where to look? And have you seen this issue?
Below is the whole output, for your perusing pleasure.

Thanks much! Sounds like a great project, I'm looking forward to trying it.
Dan


running:
org.wikibrain.Loader org.wikibrain.Loader -l simple -s fetchlinks -s download -s dumploader -s redirects -s wikitext -s lucene -s phrases -s sr -c customized.conf


15:50:38.936 [main] INFO org.wikibrain.core.cmd.Env - Configured default logging at the Info Level
15:50:38.937 [main] INFO org.wikibrain.core.cmd.Env - To customize log4j2 set the 'log4j.configurationFile' system property or set EnvBuilder.setReconfigureLogging to false.
15:50:42.458 [main] INFO org.wikibrain.conf.Configurator - configurator installed 75 providers for 38 classes
15:50:42.462 [main] INFO org.wikibrain.core.cmd.Env - using override configuration files [customized.conf]
15:50:42.464 [main] INFO org.wikibrain.core.cmd.Env - using baseDir /Users/dantasse/src/WikibrainDemo/.
15:50:42.465 [main] INFO org.wikibrain.core.cmd.Env - using max vm heapsize of 3641MB
15:50:42.477 [main] INFO org.wikibrain.core.cmd.Env - using languages (SIMPLE)
15:50:42.478 [main] INFO org.wikibrain.core.cmd.Env - using maxThreads 4
15:50:42.478 [main] INFO org.wikibrain.core.cmd.Env - using tmpDir ./.tmp
15:50:42.670 [main] WARN org.wikibrain.core.dao.sql.WpDataSource - Raised connections per partition to 3
15:50:43.495 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Beginning dry run
15:50:44.687 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Ended dry run
*************************************
** ALL DIAGNOSTIC TESTS SUCCEEDED! **
*************************************

Rough estimate of download size: 480.0 MBs
This may be an over-estimate if some files have already been downloaded.
Time on dial-up (50kbs): 1600.0 minutes
Time on Broadband (1Mbs): 80.0 minutes
Time on Broadband (10Mbs): 8.0 minutes
Time on Broadband (100Mbs): 0.8 minutes
stage download will download about 480.0 about MBs

Completion time estimate: 20.6 minutes (NOT including download time)
stage fetchlinks: 0.0 minutes
stage download: 0.0 minutes
stage dumploader: 2.5 minutes
stage redirects: 0.1 minutes
stage wikitext: 9.9 minutes
stage lucene: 6.9 minutes
stage phrases: 0.7 minutes
stage sr: 0.4 minutes

Disk space is okay. (need 2.796 GBs, have 205.795 GBs)
Warning: Available disk space may be INACCURATE if you have multiple drives.
stage fetchlinks: 1.2 MBs
stage download: 480.0 MBs
stage dumploader: 685.7 MBs
stage redirects: 34.3 MBs
stage wikitext: 525.0 MBs
stage lucene: 857.1 MBs
stage phrases: 105.0 MBs
stage sr: 175.0 MBs

Amount of memory allocated for the JVM is okay
memory required: 3.0GB
memory allocated: 3.8GB

Connection to database succeeded. Active configuration:
username: "sa"
partitions: "default"
password: ""
connectionsPerPartition: 2
url: "jdbc:h2:./db/h2;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0;MAX_OPERATION_MEMORY=100000000"
driver: "org.h2.Driver"

Beginning import process in 20 seconds...
Beginning import process in 15 seconds...
Beginning import process in 10 seconds...
Beginning import process in 5 seconds...
15:51:04.706 [main] WARN org.wikibrain.core.dao.sql.WpDataSource - Failed to close connection:
org.h2.jdbc.JdbcSQLException: Database is already closed (to disable automatic closing at VM shutdown, add ";DB_CLOSE_ON_EXIT=FALSE" to the db URL) [90121-174]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:332) ~[wikibrain-withdeps-0.7.4.jar:?]
at org.h2.message.DbException.get(DbException.java:172) ~[wikibrain-withdeps-0.7.4.jar:?]
at org.h2.message.DbException.get(DbException.java:149) ~[wikibrain-withdeps-0.7.4.jar:?]
at org.h2.message.DbException.get(DbException.java:138) ~[wikibrain-withdeps-0.7.4.jar:?]
at org.h2.jdbc.JdbcConnection.checkClosed(JdbcConnection.java:1413) ~[wikibrain-withdeps-0.7.4.jar:?]
at org.h2.jdbc.JdbcConnection.checkClosed(JdbcConnection.java:1388) ~[wikibrain-withdeps-0.7.4.jar:?]
at org.h2.jdbc.JdbcConnection.getAutoCommit(JdbcConnection.java:428) ~[wikibrain-withdeps-0.7.4.jar:?]
at com.jolbox.bonecp.ConnectionHandle.<init>(ConnectionHandle.java:255) ~[wikibrain-withdeps-0.7.4.jar:?]
at com.jolbox.bonecp.ConnectionHandle.recreateConnectionHandle(ConnectionHandle.java:281) ~[wikibrain-withdeps-0.7.4.jar:?]
at com.jolbox.bonecp.ConnectionHandle.close(ConnectionHandle.java:512) ~[wikibrain-withdeps-0.7.4.jar:?]
at org.wikibrain.core.dao.sql.WpDataSource.closeQuietly(WpDataSource.java:179) [wikibrain-withdeps-0.7.4.jar:?]
at org.wikibrain.core.dao.sql.WpDataSource.close(WpDataSource.java:260) [wikibrain-withdeps-0.7.4.jar:?]
at org.wikibrain.conf.Configurator.close(Configurator.java:411) [wikibrain-withdeps-0.7.4.jar:?]
at org.wikibrain.core.cmd.Env.close(Env.java:187) [wikibrain-withdeps-0.7.4.jar:?]
at org.wikibrain.Loader.run(Loader.java:93) [wikibrain-withdeps-0.7.4.jar:?]
at org.wikibrain.Loader.main(Loader.java:136) [wikibrain-withdeps-0.7.4.jar:?]
15:51:05.713 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Beginning loading
15:51:05.713 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Beginning stage fetchlinks
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
15:51:06.747 [main] INFO org.wikibrain.core.cmd.Env - Configured default logging at the Info Level
15:51:06.748 [main] INFO org.wikibrain.core.cmd.Env - To customize log4j2 set the 'log4j.configurationFile' system property or set EnvBuilder.setReconfigureLogging to false.
15:51:10.302 [main] INFO org.wikibrain.conf.Configurator - configurator installed 75 providers for 38 classes
15:51:10.310 [main] INFO org.wikibrain.core.cmd.Env - using override configuration files [customized.conf]
15:51:10.312 [main] INFO org.wikibrain.core.cmd.Env - using baseDir /Users/dantasse/src/WikibrainDemo/.
15:51:10.313 [main] INFO org.wikibrain.core.cmd.Env - using max vm heapsize of 3641MB
15:51:10.336 [main] INFO org.wikibrain.core.cmd.Env - using languages (SIMPLE)
15:51:10.353 [main] INFO org.wikibrain.core.cmd.Env - using maxThreads 4
15:51:10.370 [main] INFO org.wikibrain.core.cmd.Env - using tmpDir ./.tmp
15:51:10.391 [main] INFO org.wikibrain.download.RequestedLinkGetter - writing download list to ./download/list.tsv
Exception in thread "main" org.wikibrain.core.WikiBrainException: No dumps for simple found before 20160610
at org.wikibrain.download.RequestedLinkGetter.getAllDates(RequestedLinkGetter.java:88)
at org.wikibrain.download.RequestedLinkGetter.getDumps(RequestedLinkGetter.java:113)
at org.wikibrain.download.RequestedLinkGetter.getLangLinks(RequestedLinkGetter.java:139)
at org.wikibrain.download.RequestedLinkGetter.main(RequestedLinkGetter.java:245)
Stage fetchlinks failed with exit code 1


LOADING FAILED!
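
The stack trace above fails in the fetchlinks stage, inside RequestedLinkGetter.getAllDates, before anything is downloaded. A minimal sketch of the general idea behind such a step is below: read a dump index listing, collect the dated directories, and pick the newest one on or before a cutoff date. This is an illustration only, not WikiBrain's code. The function names, the sample HTML, and the "on or before" comparison are all assumptions, and the thread does not say what actually caused the failure.

```python
import re
from datetime import date

# Hypothetical helpers for illustration; NOT WikiBrain's RequestedLinkGetter.
# Assumption: the index page links to dated directories named YYYYMMDD/.
DATE_DIR = re.compile(r'href="(\d{8})/"')


def parse_dump_dates(index_html):
    """Extract the YYYYMMDD directory names from an index page, sorted."""
    return sorted(set(DATE_DIR.findall(index_html)))


def latest_dump_before(dump_dates, cutoff):
    """Return the newest dump date on or before the cutoff (assumed rule)."""
    limit = cutoff.strftime("%Y%m%d")
    # YYYYMMDD strings sort chronologically, so string comparison is enough.
    candidates = [d for d in dump_dates if d <= limit]
    if not candidates:
        raise LookupError(f"No dumps found before {limit}")
    return max(candidates)


# Made-up sample listing; a non-date entry such as "latest/" is ignored.
sample = (
    '<a href="20160501/">20160501/</a>\n'
    '<a href="20160601/">20160601/</a>\n'
    '<a href="latest/">latest/</a>'
)
print(latest_dump_before(parse_dump_dates(sample), date(2016, 6, 10)))
```

In a sketch like this, an empty list of dates (for example, because the listing could not be fetched or its layout no longer matches the pattern) produces a "no dumps found" error regardless of what is in the local configuration.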

onofri...@gmail.com

Jun 13, 2016, 3:41:22 AM
to wikibrain, dan....@gmail.com


Hi, I'm facing the same problem trying to import Wikipedia data with version 0.7.4 of WikiBrain. I've even tried version 0.4.1, but the error persists. I have tried both H2 and PostgreSQL databases.

Shilad Sen

Jun 21, 2016, 11:25:10 AM
to wikibrain, dan....@gmail.com
Thanks for using WikiBrain, and sorry for the delay! I just released a new version that may fix the problem. Would you give it a try?

Dan Tasse

Jun 21, 2016, 1:13:23 PM
to Shilad Sen, wikibrain
Hi Shilad - Great, thanks! Loading seems to have completed successfully now.
Dan