When using the DataMachine to transform a Wikipedia dump, I executed the following command in my terminal:
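(The jar version, category names, and paths below are placeholders; the general form follows the JWPL DataMachine documentation:)

java -Xmx4g -jar de.tudarmstadt.ukp.wikipedia.datamachine-<version>-jar-with-dependencies.jar <language> <main-category> <disambiguation-category> <path-to-dump-directory>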
A successful run of the DataMachine should give me 3 .bin files and an output folder containing 11 .txt files.
After about one and a half hours of execution, the DataMachine created a separate .txt log file containing the following message:
"Date/Time","Total Memory","Free Memory","Message"
"2020.05.09 16:47:15","257425408","245268928","parse input dumps..."
"2020.05.09 16:47:15","257425408","245268928","Discussions are unavailable"
"2020.05.09 18:25:07","209190912","141272304","org.xml.sax.SAXParseException; lineNumber: 57156821; columnNumber: 399; JAXP00010004: Die akkumulierte Größe von Entitys ist "50.000.001" und überschreitet den Grenzwert "50.000.000", der von "FEATURE_SECURE_PROCESSING" festgelegt wurde.
de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.readDump(AbstractXmlDumpReader.java:209)
de.tudarmstadt.ukp.wikipedia.datamachine.dump.xml.XML2Binary.<init>(XML2Binary.java:47)
de.tudarmstadt.ukp.wikipedia.datamachine.domain.DataMachineGenerator.processInputDump(DataMachineGenerator.java:70)
de.tudarmstadt.ukp.wikipedia.datamachine.domain.DataMachineGenerator.start(DataMachineGenerator.java:64)
de.tudarmstadt.ukp.wikipedia.datamachine.domain.JWPLDataMachine.main(JWPLDataMachine.java:64)"
The output folder remains empty while the DataMachine keeps running. I assume this message means that something was interrupted. To me it looks like a lack of available memory, but that is just a guess; on the other hand, I already assigned additional memory using the -Xmx4g flag. Can somebody explain what the problem actually is and how the DataMachine can be run successfully when it occurs?
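For what it's worth, searching for the error code suggests (I have not verified this) that the "50,000,000" in the log matches the default of the JDK's jdk.xml.totalEntitySizeLimit JAXP property, so this may be a secure-processing limit of the XML parser rather than a heap problem. If so, would passing that property on the command line, e.g.

java -Xmx4g -Djdk.xml.totalEntitySizeLimit=0 -jar de.tudarmstadt.ukp.wikipedia.datamachine-<version>-jar-with-dependencies.jar <language> <main-category> <disambiguation-category> <path-to-dump-directory>

(where 0 supposedly means no limit) be the right way to work around it?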