Hi Everyone,
I have installed HDFS on a 5-node cluster, and it is listening on port 9000.
I wrote a Python application, and inside it I am trying to read data from that HDFS with
logFile = "hdfs://hostname:9000/user/input2"
When I run it, it throws an exception like
py4j.protocol.Py4JJavaError: An error occurred while calling o14.collect.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://hostname:9000/user/input2
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:52)
at spark.RDD.partitions(RDD.scala:168)
Then, when I remove the port number from the Python code, like this
logFile = "hdfs://hostname/user/input2"
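
For reference, my understanding is that the URI Spark uses has to match the default filesystem configured for the Hadoop client in core-site.xml; for a NameNode on port 9000 I would expect an entry like the sketch below (hostname is a placeholder, and the property name is from the Hadoop 1.x-era config I am running):

```xml
<!-- core-site.xml: illustrative sketch, not my actual file -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hostname:9000</value>
</property>
```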
Spark tries to connect to port 8020 and throws an exception like
13/03/24 18:33:05 INFO ipc.Client: Retrying connect to server: SrvT2C2Master/
10.100.8.55:8020. Already tried 8 time(s).
13/03/24 18:33:06 INFO ipc.Client: Retrying connect to server: SrvT2C2Master/
10.100.8.55:8020. Already tried 9 time(s).
Traceback (most recent call last):
.
.
.
: java.net.ConnectException: Call to SrvT2C2Master/
10.100.8.55:8020 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
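
In case it helps: my understanding is that when the port is omitted from an hdfs:// URI, the Hadoop client falls back to its default NameNode IPC port, 8020, which would explain the connection attempts to 10.100.8.55:8020 above. A small sketch in plain Python, just to illustrate the fallback I mean (`namenode_port` is my own illustrative helper, not Hadoop code):

```python
from urllib.parse import urlparse

# Hadoop's default NameNode IPC port, used when the URI carries no port.
HADOOP_DEFAULT_PORT = 8020

def namenode_port(uri):
    """Return the explicit port from an hdfs:// URI, else the default."""
    parsed = urlparse(uri)
    return parsed.port if parsed.port is not None else HADOOP_DEFAULT_PORT

print(namenode_port("hdfs://hostname:9000/user/input2"))  # 9000
print(namenode_port("hdfs://hostname/user/input2"))       # 8020
```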
Any suggestions, please?
BR,
Aslan