I have run into a problem importing a Wikipedia XML dump file into MySQL. I am currently using mwdumper, which is basically a JAR file that converts XML dumps into SQL files. Running it on the dump printed the following and then stopped with an exception:
14 pages (10,566/sec), 1.000 revs (754,717/sec)
26 pages (13,098/sec), 2.000 revs (1.007,557/sec)
32 pages (11,228/sec), 3.000 revs (1.052,632/sec)
44 pages (12,032/sec), 4.000 revs (1.093,793/sec)
52 pages (12,929/sec), 5.000 revs (1.243,163/sec)
56 pages (12,009/sec), 6.000 revs (1.286,725/sec)
...
Exception in thread "main" java.lang.IllegalArgumentException: Invalid contributor
    at org.mediawiki.importer.XmlDumpReader.closeContributor(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
I then tried to import the generated SQL file into a MySQL database, but I ended up getting this error:
ERROR 1146 (42S02) at line 46: Table 'wiki_test.text' doesn't exist
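ERROR 1146 is MySQL's "no such table" error, so the SQL file writes to a table that the wiki_test database does not have. One way to see which tables the file expects is to scan it for INSERT statements and compare the names with what SHOW TABLES returns. Below is a minimal sketch, assuming the file uses plain `INSERT INTO name` statements. The function names and the sample lines are made up for illustration and are not taken from mwdumper.

```python
import re
from typing import Iterable, Set

# Matches the table name in statements such as: INSERT INTO text (...) VALUES ...
# Backticks around the name are optional. The pattern is an assumption about
# what the generated file looks like, not something from mwdumper's documentation.
INSERT_RE = re.compile(r"^\s*INSERT\s+(?:IGNORE\s+)?INTO\s+`?(\w+)`?", re.IGNORECASE)


def tables_in_sql(lines: Iterable[str]) -> Set[str]:
    """Return the set of table names that the INSERT statements write to."""
    found = set()
    for line in lines:
        match = INSERT_RE.match(line)
        if match:
            found.add(match.group(1))
    return found


def missing_tables(needed: Set[str], existing: Iterable[str]) -> Set[str]:
    """Return the tables the SQL file needs that the database does not have."""
    return needed - set(existing)


# Made-up sample lines standing in for the generated SQL file.
sample_sql = [
    "BEGIN;",
    "INSERT INTO text (old_id) VALUES (1);",
    "INSERT INTO `page` (page_id) VALUES (1);",
    "INSERT INTO revision (rev_id) VALUES (1);",
    "COMMIT;",
]

needed = tables_in_sql(sample_sql)
# Pretend SHOW TABLES returned nothing, as in an empty database.
print(sorted(missing_tables(needed, [])))
```

For a real file, an open file handle can be passed to `tables_in_sql` in place of the sample list. If tables such as text, page, or revision come back as missing, the likely explanation is that the MediaWiki schema has not been created in wiki_test yet, which would point away from collation or encoding.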
My question is: is this related to the default database collation and character encoding, or is it a problem with the mwdumper version? I have checked that both the XML and the schema have the same encoding (UTF-8). Also, how have you imported a Wikipedia XML dump file into MySQL before? Is this the best way to achieve it?
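To help tell a data problem from a tool problem, a sketch like the following could list revisions whose contributor element carries neither a username nor an ip. This rests on an assumption: I do not know exactly what mwdumper treats as an "Invalid contributor", so the check below is only a guess at the kind of element that might trigger the exception. The function names and the sample XML are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}contributor' -> 'contributor'."""
    return tag.rsplit("}", 1)[-1]


def suspicious_contributors(source):
    """Yield (page_title, revision_id) for contributors with no username and no ip.

    Assumption: such elements are what the importer rejects. This is a guess,
    not documented mwdumper behaviour.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id = None
            flagged = False
            for child in elem:  # direct children of <revision> only
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = child.text
                elif child_name == "contributor":
                    kinds = {local_name(c.tag) for c in child}
                    if not ({"username", "ip"} & kinds):
                        flagged = True
            if flagged:
                yield title, rev_id
        elif name == "page":
            elem.clear()  # drop the page's children to keep memory use down


# Made-up sample in the general shape of a MediaWiki export.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision>
      <id>1</id>
      <contributor><username>Alice</username><id>7</id></contributor>
    </revision>
    <revision>
      <id>2</id>
      <contributor deleted="deleted" />
    </revision>
  </page>
</mediawiki>"""

print(list(suspicious_contributors(io.BytesIO(SAMPLE))))
```

`ET.iterparse` also accepts a file name, so the path to the real dump can be passed in place of the in-memory sample. If the scan finds such revisions near the point where the progress output stops, the exception is more likely down to how this mwdumper version handles them than to the database encoding.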