Building ESA indexes

Ann

6 Mar 2017, 05:35:09
to DKPro Similarity Users
Hello all,
I'm trying to create my own index for ESA. Following the instructions, I got an SQL dump of Wikipedia and ran the class "EsaIndexer".

However, some errors occurred while it was running. The program tries to alter my table "Page" by adding new columns. First, it added the column "id" and then reported an error because of the duplicate column name "id". Then I restarted the program, and now there's a similar error with the column "pageId":
ERROR hbm2ddl.SchemaUpdate: HHH000388: Unsuccessful: alter table Page add column pageId integer unique
ERROR hbm2ddl.SchemaUpdate: Duplicate column name 'pageId'.

And the program doesn't stop: it continues running with the next similar error, now for the column "name", and I guess it will go on showing these errors and adding new columns.

Do you have any idea why this is happening and how I can fix it?

Ann

6 Mar 2017, 07:01:25
to DKPro Similarity Users
UPDATE. 

The program added five new columns to my "page" table and eight more tables to my DB, and everything seemed fine, because after restarting the program the errors I described above were gone. However, seven new problems then appeared, all of them like this:
ERROR hbm2ddl.SchemaUpdate: HHH000388: Unsuccessful: alter table category_inlinks add index FK3F4337732A72A718 (id), add constraint FK3F4337732A72A718 foreign key (id) references Category (id)
ERROR hbm2ddl.SchemaUpdate: Can't create table `new_wiki`.`#sql-2c34_7` (errno: 150 "Foreign key constraint is incorrectly formed")

After this I got the message:
INFO hbm2ddl.SchemaUpdate: HHH000232: Schema update complete

followed by a Java NullPointerException: "attempted to lock null".

I don't know what to do next. I guess these errors mean that something is wrong in my database, but I don't know where to look or what I might have done wrong.

Torsten Zesch

6 Mar 2017, 15:01:22
to Ann, DKPro Similarity Users
Hi,

It is a bit hard to say from the description.
Are you sure the database is set up correctly?
Can you query it in general?

Also, please note that you do not necessarily need the database. A plaintext dump of Wikipedia and a suitable reader will also do.
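
To make the "plaintext plus a reader" idea concrete, here is a minimal sketch in plain Java. It is not the DKPro Similarity reader and does not use any DKPro or UIMA API. The class name PlainTextArticles, the one-article-per-.txt-file layout, and the UTF-8 encoding are all assumptions made for illustration. Extracting the text from the Wikipedia XML dump is a separate step that is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, NOT part of DKPro Similarity: it only illustrates that
// a directory of plain-text files can serve as input instead of a database.
class PlainTextArticles {

    // Assumption: one article per ".txt" file, encoded as UTF-8,
    // with the file name (minus the extension) used as the article title.
    static Map<String, String> readArticles(Path dir) throws IOException {
        List<Path> files;
        // Files.list holds an open directory handle, so close it promptly.
        try (Stream<Path> stream = Files.list(dir)) {
            files = stream.collect(Collectors.toList());
        }

        // TreeMap keeps the titles sorted, which makes runs reproducible.
        Map<String, String> articles = new TreeMap<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            if (Files.isRegularFile(file) && name.endsWith(".txt")) {
                String title = name.substring(0, name.length() - ".txt".length());
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                articles.put(title, text);
            }
        }
        return articles;
    }
}
```

With this layout, the input location is just a directory path, for example PlainTextArticles.readArticles(Paths.get("articles")), rather than a host, user name, database name, and password.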

-Torsten


--
You received this message because you are subscribed to the Google Groups "DKPro Similarity Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dkpro-similarity-users+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Torsten Zesch

6 Mar 2017, 16:09:39
to Ann, DKPro Similarity Users
If you are not familiar with UIMA pipelines and/or how to extract a text version out of Wikipedia, it will be quite difficult to build your own indexes.

Do you know that there is a prebuilt Wikipedia index?

-Torsten

2017-03-06 22:03 GMT+01:00 Ann <kruko...@gmail.com>:
Hi,
The queries seem just fine, at least I can perform them. But if the problem is in the database and it won't work, I should probably try some other way. Could you please explain what I should do if I want to use a plaintext dump? First of all, the Wikipedia dumps site offers only XML or SQL files, so, as I understand it, to get plaintext I will have to extract it from the XML dump using some special tool? Secondly, what reader can I use? I am sorry for such dull questions, I'm just new to all this. Finally, I thought that I needed to create a DB because in the EsaIndexer class there's a part where you have to specify your DB parameters (host, username, DB name, and password):
CollectionReader reader = createReader...
If I don't use a database, then how should I specify where my files are?
