HDFS Spark Integration Error


Aslan Bekirov

Mar 24, 2013, 12:40:42 PM3/24/13
to spark...@googlegroups.com

Hi Everyone,

I have installed HDFS on a 5-node cluster, and it is operating on port 9000.

I wrote a Python application, and inside it I am trying to read data from that HDFS with the code

     logFile = "hdfs://hostname:9000/user/input2"

When I run it, it throws an exception like

py4j.protocol.Py4JJavaError: An error occurred while calling o14.collect.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://hostname:9000/user/input2
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:52)
    at spark.RDD.partitions(RDD.scala:168)

Then, when I remove the port number from the Python code, like this

      logFile = "hdfs://hostname/user/input2"


Spark tries to connect to port 8020 and throws an exception like

13/03/24 18:33:05 INFO ipc.Client: Retrying connect to server: SrvT2C2Master/10.100.8.55:8020. Already tried 8 time(s).
13/03/24 18:33:06 INFO ipc.Client: Retrying connect to server: SrvT2C2Master/10.100.8.55:8020. Already tried 9 time(s).
Traceback (most recent call last):
.
.
.
: java.net.ConnectException: Call to SrvT2C2Master/10.100.8.55:8020 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
    at org.apache.hadoop.ipc.Client.call(Client.java:1075)
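For what it's worth, the fallback behavior can be illustrated in plain Python: when the `hdfs://` URI omits a port, the Hadoop client falls back to the default NameNode RPC port, 8020. The helper below is a hypothetical sketch of that rule, not a Hadoop API:

```python
from urllib.parse import urlparse

# Hadoop's default NameNode RPC port, used when the URI omits one.
DEFAULT_NAMENODE_PORT = 8020

def namenode_port(uri):
    """Return the port an HDFS client would contact for this URI."""
    parsed = urlparse(uri)
    return parsed.port if parsed.port is not None else DEFAULT_NAMENODE_PORT

print(namenode_port("hdfs://hostname:9000/user/input2"))  # 9000 (explicit)
print(namenode_port("hdfs://hostname/user/input2"))       # 8020 (default)
```

So with the NameNode actually listening on 9000, the port must stay in the URI (or be set as the cluster's `fs.default.name`), otherwise the client retries 10.100.8.55:8020 and gets "Connection refused", as in the log above.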
   
Any suggestions, please?

BR,
Aslan

Aslan Bekirov

Mar 24, 2013, 1:18:09 PM3/24/13
to spark...@googlegroups.com
I solved it myself.

The reason was corruption of the input2 file.

BR,
Aslan