NPE when attempting to build DB files using TimeMachine on simple_english


Rüdiger Gleim

Sep 18, 2017, 1:24:53 AM
to jwpl-users
Hello,
I am experimenting with JWPL TimeMachine. I managed to set up a current version of Simple English Wikipedia using DataMachine, but fail to do so with TimeMachine. I am trying to get an annual snapshot of the Simple English Wikipedia by setting the "each" parameter to 365. The config and the log file are attached to this post. In addition, I get the following exception on the console:

Exception in thread "xml2sql" java.lang.NullPointerException
    at de.tudarmstadt.ukp.wikipedia.wikimachine.util.Redirects.isRedirect(Redirects.java:76)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.PageWriter.updatePage(PageWriter.java:89)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.PageWriter.writeEndPage(PageWriter.java:49)
    at de.tudarmstadt.ukp.wikipedia.mwdumper.importer.PageFilter.writeEndPage(PageFilter.java:62)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.closePage(AbstractXmlDumpReader.java:500)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.endElement(AbstractXmlDumpReader.java:362)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1783)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2970)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.readDump(AbstractXmlDumpReader.java:205)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.XMLDumpTableInputStreamThread.run(XMLDumpTableInputStreamThread.java:90)

Did I miss anything obvious?

A few general questions regarding TimeMachine:
- I assume that for each snapshot or "time slice" it fetches the revision of each page that was current at that specific point in time. Is that correct?
- Will the parser automatically use the versions of the templates that were current at that point in time?
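
To make the first question concrete, here is a minimal sketch of the selection rule I have in mind. It is not JWPL code: the class and method names are hypothetical, and it assumes that "each" is an interval in days and that the active revision is simply the latest one at or before the snapshot time.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from JWPL.
public class SnapshotSketch {

    /** Snapshot times from 'from' up to and including 'to', one every 'eachDays' days. */
    static List<Instant> snapshotTimes(Instant from, Instant to, int eachDays) {
        if (eachDays <= 0) {
            throw new IllegalArgumentException("eachDays must be positive");
        }
        List<Instant> times = new ArrayList<>();
        for (Instant t = from; !t.isAfter(to); t = t.plus(eachDays, ChronoUnit.DAYS)) {
            times.add(t);
        }
        return times;
    }

    /**
     * Revision id that was current at 'snapshot': the latest revision whose
     * timestamp is at or before the snapshot, or null if the page did not exist yet.
     */
    static Long activeRevision(TreeMap<Instant, Long> revisions, Instant snapshot) {
        Map.Entry<Instant, Long> entry = revisions.floorEntry(snapshot);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up revision history of a single page: timestamp -> revision id.
        TreeMap<Instant, Long> revisions = new TreeMap<>();
        revisions.put(Instant.parse("2015-03-01T00:00:00Z"), 100L);
        revisions.put(Instant.parse("2016-06-01T00:00:00Z"), 200L);

        Instant from = Instant.parse("2015-01-01T00:00:00Z");
        Instant to = Instant.parse("2017-01-01T00:00:00Z");
        for (Instant t : snapshotTimes(from, to, 365)) {
            System.out.println(t + " -> revision " + activeRevision(revisions, t));
        }
    }
}
```

Note that with a fixed 365-day step the snapshot dates drift by one day after each leap year, so "annual" is only approximate under this assumption.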

Thank you for any help or suggestions you could provide.

Best,

Rüdiger
timemachine.xml
20170916_150137.txt

Torsten Zesch

Sep 18, 2017, 1:52:05 AM
to jwpl-users
Hi Rüdiger.

Yes. It works as described.
It will not handle templates.

Did you try a different dump, or downloading the same dump again?

Torsten
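
One way to check whether a re-download is needed is to compare the dump file's checksum with the value published alongside the dump (Wikimedia dump directories include an md5sums file). The following is a minimal sketch using only the Java standard library; the class name and command-line arguments are hypothetical and not part of JWPL. A matching checksum only rules out a corrupted download, not a problem in the dump content or in TimeMachine itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of JWPL.
public class DumpChecksum {

    /** Reads the stream to its end and returns the MD5 digest as lowercase hex. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Usage: java DumpChecksum <path-to-dump-file> <expected-md5-hex> */
    public static void main(String[] args) throws IOException {
        Path dump = Paths.get(args[0]);
        String expected = args[1];
        try (InputStream in = Files.newInputStream(dump)) {
            String actual = md5Hex(in);
            System.out.println(actual.equalsIgnoreCase(expected)
                    ? "checksum matches"
                    : "checksum MISMATCH: " + actual);
        }
    }
}
```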