Problem reading files from HDFS using Python


Shalini Ravishankar

Jan 25, 2015, 3:20:12 PM
to montrea...@googlegroups.com
Hello Everyone,

I am trying to read (open) and write files in HDFS from inside a Python script, but I am getting an error. Can someone tell me what is wrong here?

Code (full): sample.py
    
    #!/usr/bin/python
    

    from subprocess import Popen, PIPE
    
    print "Before Loop"
    
    cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
                stdout=PIPE)
    put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
                stdin=PIPE)
    for line in cat.stdout:
        line += "Blah"
        print line
        put.stdin.write(line)
    
    cat.stdout.close()
    cat.wait()
    put.stdin.close()
    put.wait()

When I execute:

    hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead

It runs without errors, but I can't find the file it is supposed to create in HDFS (modifiedfile.txt).

And when I execute:

    hadoop fs -getmerge ./fileRead/ file.txt

Inside file.txt, I got:

    Before Loop
    Before Loop

Can someone please tell me what I am doing wrong here? I don't think it is reading from sample.txt.

I would really appreciate the help.


--
Thanks & Regards,
Shalini Ravishankar.
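
For context on the command above: Hadoop Streaming passes each input split to the mapper on standard input and collects whatever the mapper writes to standard output as the job output. That would explain why the "Before Loop" line printed by sample.py ends up in file.txt. A minimal sketch of a mapper that follows this convention is shown here; the names transform and run_mapper are hypothetical and are not part of the original script. It also strips the newline before appending, since "line += "Blah"" would place "Blah" after the line's trailing newline.

```python
#!/usr/bin/env python
import sys


def transform(line):
    """Append a marker to one input line, keeping the newline at the end."""
    return line.rstrip("\n") + "Blah\n"


def run_mapper(instream, outstream):
    """Copy every line from instream to outstream through transform()."""
    for line in instream:
        outstream.write(transform(line))


# In the real mapper script, the last line would be (assumption: the job
# output directory, not a separate HDFS file, receives the result):
# run_mapper(sys.stdin, sys.stdout)
```

With this shape the modified lines would land in the -output directory (fileRead in the command above) rather than in a separately written file.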

Jordi Gutiérrez Hermoso

Jan 26, 2015, 10:19:34 AM
to montrea...@googlegroups.com
This seems like more of a Hadoop question than a pure Python
question. Can you pose your question in a Hadoop forum? You are likely
to find more people there who understand your problem.

However, I'm going to try to be helpful: are you sure that your
Popen.wait() calls aren't deadlocking and Hadoop isn't killing the
process? I'm also not sure about the order in which you're closing
the standard streams relative to the wait() calls.

- Jordi G. H.
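
The ordering concern above can be sketched as follows. This is only an illustration under stated assumptions: the helper name pipe_lines is hypothetical, and the commands are passed in as arguments so that the same function works with the hadoop fs -cat and hadoop fs -put commands from the original script. The point is to close the writer's stdin before waiting on it, and to return both exit codes so that a failing command does not go unnoticed.

```python
import subprocess


def pipe_lines(read_cmd, write_cmd, transform):
    """Stream lines from read_cmd's stdout through transform() into
    write_cmd's stdin, and return both exit codes."""
    reader = subprocess.Popen(read_cmd, stdout=subprocess.PIPE)
    writer = subprocess.Popen(write_cmd, stdin=subprocess.PIPE)
    try:
        for line in reader.stdout:  # lines arrive as bytes
            writer.stdin.write(transform(line))
    finally:
        # Close the writer's stdin first so it sees end-of-file;
        # waiting on it while stdin is still open would block.
        writer.stdin.close()
        reader.stdout.close()
    return reader.wait(), writer.wait()


# Hypothetical usage with the commands from the original script:
# codes = pipe_lines(
#     ["hadoop", "fs", "-cat", "./sample.txt"],
#     ["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
#     lambda line: line.rstrip(b"\r\n") + b"Blah\n",
# )
# A non-zero value in codes means one of the commands failed.
```

Checking the exit codes (and the task's stderr log) is one way to find out whether the hadoop commands ran at all inside the streaming task.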


Eric Parent

Jan 26, 2015, 1:27:57 PM
to montrea...@googlegroups.com
Hi,

I've dealt with HDF5 files from Python in the past. The format has a few quirks and bumps but, overall, it worked fine.

I'm afraid this has nothing to do with HDF5 files, whatsoever. It looks to me like it's a question for the Hadoop community.

Sorry if I am of no help further than that...

Eric


--
You received this message because you are subscribed to the Google Groups "Montréal-Python" group.
To unsubscribe from this group and stop receiving emails from it, send an email to montrealpytho...@googlegroups.com.
To post to this group, send an email to montrea...@googlegroups.com.
Visit this group at http://groups.google.com/group/montrealpython.
For more options, visit https://groups.google.com/d/optout.



--
Eric

“If you’re not prepared to be wrong, you’ll never come up with anything original.”
-- Sir Ken Robinson (TED: How schools kill creativity)

Julia Evans

Jan 26, 2015, 1:51:09 PM
to montrea...@googlegroups.com
You could try to use snakebite, a Python HDFS client (and library) https://github.com/spotify/snakebite

Ann-Julie Rhéaume

Jan 26, 2015, 1:58:04 PM
to montrea...@googlegroups.com
You could try to use snakebite, a Python HDFS client (and library) https://github.com/spotify/snakebite

... or Pydoop

aj

Eric Parent

Jan 26, 2015, 1:59:26 PM
to montrea...@googlegroups.com
My bad... I thought you meant HDF5 (HDF-five), which is a binary file format for large amounts of scientific (numerical) data.

I should clean my glasses more often...

Eric

Jonathan Doyle

Jan 26, 2015, 2:08:09 PM
to montrea...@googlegroups.com
I thought the same and have followed this conversation, confused.