Problem while reading files from HDFS using Python


Shalini Ravishankar

Jan 25, 2015, 3:16:09 PM
to aureliu...@googlegroups.com
Hello everyone,

I am trying to read (open) and write files in HDFS from inside a Python script, but it is not working. Can someone tell me what is wrong here?

Code (full): sample.py
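
    #!/usr/bin/python
    from subprocess import Popen, PIPE

    print "Before Loop"

    # Stream sample.txt out of HDFS
    cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
                stdout=PIPE)
    # Write whatever arrives on stdin to modifiedfile.txt in HDFS
    put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
                stdin=PIPE)
    # Append "Blah" to each line (after its newline), echo it,
    # and forward it to the -put process
    for line in cat.stdout:
        line += "Blah"
        print line
        put.stdin.write(line)

    # Close the pipes and wait for both hadoop processes to exit
    cat.stdout.close()
    cat.wait()
    put.stdin.close()
    put.wait()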
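For comparison, a Python 3 sketch of the same read-modify-write that checks both exit codes, so a failing hadoop fs -cat raises an error instead of silently producing zero lines. It also appends the suffix before each line's newline rather than after it. hdfs_cmd and rewrite_lines are hypothetical helper names, and the hadoop client is assumed to be on the PATH:

```python
import subprocess


def hdfs_cmd(*args, hadoop="hadoop"):
    """Build a 'hadoop fs' command line, e.g. hdfs_cmd("-cat", "sample.txt")."""
    return [hadoop, "fs", *args]


def rewrite_lines(cat_cmd, put_cmd, suffix="Blah"):
    """Stream lines from cat_cmd, append suffix to each, and feed them to put_cmd.

    stderr is inherited (not captured), so Hadoop's own error messages are
    not lost. Returns the number of lines written; raises RuntimeError if
    either command exits with a non-zero status.
    """
    cat = subprocess.Popen(cat_cmd, stdout=subprocess.PIPE, text=True)
    put = subprocess.Popen(put_cmd, stdin=subprocess.PIPE, text=True)
    count = 0
    for line in cat.stdout:
        # Put the suffix before the newline rather than after it.
        put.stdin.write(line.rstrip("\n") + suffix + "\n")
        count += 1
    cat.stdout.close()
    put.stdin.close()  # signals end of input to put_cmd
    cat_status = cat.wait()
    put_status = put.wait()
    if cat_status != 0:
        raise RuntimeError(f"{cat_cmd} exited with status {cat_status}")
    if put_status != 0:
        raise RuntimeError(f"{put_cmd} exited with status {put_status}")
    return count
```

A call would look like rewrite_lines(hdfs_cmd("-cat", "sample.txt"), hdfs_cmd("-put", "-", "modifiedfile.txt")). Because stderr is not captured, Hadoop's error messages still reach the terminal or, inside a streaming task, the task's stderr log.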

When I run:

    hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead

It runs without errors, but I can't find the file it is supposed to create in HDFS (modifiedfile.txt).
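A relative HDFS path such as ./modifiedfile.txt is resolved against the home directory (/user/<name>) of whichever user runs the command, and inside a map task that is not necessarily the user who submitted the job, so the file may have been written somewhere else. One way to check a specific absolute path from Python is hadoop fs -test -e, which exits with status 0 when the path exists. A minimal sketch; hdfs_exists is a hypothetical helper, and the hadoop client is assumed to be on the PATH:

```python
import subprocess


def hdfs_exists(path, hadoop="hadoop"):
    """Return True if 'hadoop fs -test -e path' exits with status 0.

    'hadoop' is the client executable to call (assumed to be on PATH).
    """
    result = subprocess.run([hadoop, "fs", "-test", "-e", path])
    return result.returncode == 0
```

For example, hdfs_exists("/user/<name>/modifiedfile.txt"), with <name> replaced by each user the task might plausibly run as.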

And when I run:

    hadoop fs -getmerge ./fileRead/ file.txt

file.txt contains:

    Before Loop
    Before Loop
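Hadoop streaming treats everything the mapper writes to stdout as job output, so the print "Before Loop" line lands in fileRead once for every map task that ran the script; seeing it twice suggests the job ran two map tasks over sample.txt. Diagnostics are easier to tell apart from results if they go to stderr, which ends up in the task logs instead. A minimal Python 3 sketch; log is a hypothetical helper name:

```python
import sys


def log(*args):
    """Write a diagnostic message to stderr instead of stdout.

    In a streaming task, stdout becomes the job output, while stderr goes
    to the task's log, so debug messages no longer mix with the results.
    """
    print(*args, file=sys.stderr, flush=True)
```

With this, log("Before Loop") would show up in the task's stderr log rather than in file.txt.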

Can someone please tell me what I am doing wrong here? I don't think it is reading from sample.txt at all.
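In a streaming job the -input file is already delivered line by line to the mapper's stdin, and whatever the mapper writes to stdout is collected in the -output directory, so the script does not need to call hadoop fs -cat or -put itself. A minimal Python 3 sketch of a mapper that performs the same edit that way; add_suffix and run_mapper are hypothetical names:

```python
import sys


def add_suffix(lines, suffix="Blah"):
    """Yield each input line with its newline stripped and suffix appended."""
    for line in lines:
        yield line.rstrip("\n") + suffix


def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming mapper: read records from stdin, write results to stdout.

    Hadoop streaming feeds the lines of the -input file(s) to the mapper's
    stdin and collects whatever it writes to stdout into -output.
    """
    for out in add_suffix(stdin):
        stdout.write(out + "\n")
```

Saved as sample.py with a shebang such as #!/usr/bin/env python3 and a final if __name__ == "__main__": run_mapper() block, it could be submitted with the same hadoop jar command as above, and the edited lines would then appear in fileRead instead of in a separate modifiedfile.txt. This assumes a Python 3 interpreter is installed on every worker node.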

I would really appreciate the help.


--
Thanks & Regards,
Shalini Ravishankar.

Daniel Kuppitz

Jan 26, 2015, 4:44:00 PM
to aureliu...@googlegroups.com
Hi Shalini,

I'm not sure why you're asking this here, since your question isn't Titan-related at all; it's purely Python + Hadoop.

I suspect not many of the Titan users are familiar with Python and its Hadoop libraries.

Cheers,
Daniel