Hello,
I really need your help.
I am trying to get this to work (Windows 10 x64):
The problem is that the guides are not available.
I installed Hadoop with this tutorial:
Other tutorials are more or less the same.
OK, that all works.
Back to Obtaining-wikipedia-data:
I am at this command:
hadoop jar wikipedia-miner-hadoop.jar org.wikipedia.miner.extraction.DumpExtractor input/DUMP_FILE input/LANG_FILE LANGUAGE_CODE input/SENTENCE_MODEL output
In my case it looks like this:
C:\Users\Brues\Desktop\Workspace\wikipedia-miner-1.2.0\build\jar>hadoop jar wikipedia-miner-hadoop.jar org.wikipedia.miner.extraction.DumpExtractor /input/enwiki-20180501-pages-articles.xml /input/languages.xml en /input/en-sent.bin /output
My problem is that the job hangs at:
ACCEPTED: waiting for AM container to be allocated, launched and register with RM
I have tried many suggestions found on Google. Nothing worked.
Some information:
core-site.xml
---------------------------
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9001</value>
</property>
</configuration>
mapred-site.xml
-------------------------
<configuration>
<property>
<value>yarn</value>
</property>
</configuration>
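The property in mapred-site.xml above has no <name> element. The key that normally takes the value yarn in this file is mapreduce.framework.name. Assuming that is the one meant here (an assumption; the name may simply have been lost when pasting), the complete file would look roughly like this:

```xml
<configuration>
  <property>
    <!-- Assumed key: the standard property that selects YARN as the MapReduce runtime -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```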
hdfs-site.xml
------------------------
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>D:\hadoop-2.7.6\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>D:\hadoop-2.7.6\data\datanode</value>
</property>
</configuration>
yarn-site.xml
----------------------
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
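For comparison, yarn-default.xml in Hadoop 2.x lists the shuffle handler class under the key yarn.nodemanager.aux-services.mapreduce_shuffle.class, which is spelled differently from the key in yarn-site.xml above. A sketch of the file with that spelling follows; it has not been verified on this setup and is not necessarily related to the hang:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- Key spelled as in yarn-default.xml: "aux-services" and "mapreduce_shuffle" -->
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```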