how to save pandas dataframe as csv in hdfs


nisvinps

May 16, 2017, 10:31:54 AM
to PyData
I am able to save a pandas DataFrame as CSV on the local filesystem using df.to_csv, but when I use the same call to write to HDFS, I get an error. Please suggest a method to save a pandas DataFrame as CSV in HDFS. Thank you.

dartdog

May 16, 2017, 10:34:19 AM
to PyData
You will have to give us more to go on. What type of system are you on? What versions? And most important, what error messages? Also, show your code.

Lee Kangrok

Feb 3, 2018, 8:27:31 AM
to PyData
Hi,

Actually, df.to_csv() cannot write to an HDFS directory directly. So I save a pandas DataFrame into HDFS in two steps: df.to_csv() followed by the hdfs command.
First, save the pandas DataFrame to the local filesystem, as below:
import pandas as pd
df_app = pd.DataFrame(...)  # build your DataFrame here
# index=False keeps the row index out of the CSV file
df_app.to_csv("./application.txt", index=False)
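A self-contained version of this first step might look like the sketch below. The sample data is hypothetical (it stands in for the elided DataFrame contents), and writing into a fresh temporary directory instead of the current directory is an assumption of this sketch, not part of the original advice. Reading the file back is just a quick check that the CSV was written without the index column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for your own DataFrame.
df_app = pd.DataFrame({"app_id": [1, 2, 3], "name": ["alpha", "beta", "gamma"]})

# Write into a fresh temporary directory rather than the current directory.
tmp_dir = tempfile.mkdtemp()
local_path = os.path.join(tmp_dir, "application.txt")
df_app.to_csv(local_path, index=False)

# Read the file back to confirm only the data columns were written.
round_trip = pd.read_csv(local_path)
```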

Second, copy the local file into the HDFS directory, as below:
 
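import subprocess
# Pass the command as a list: without shell=True, a single string is treated as the program name on POSIX.
subprocess.call(["hdfs", "dfs", "-copyFromLocal", "./application.txt", "/user/hdfs/"])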
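The two steps can be wrapped in one helper, sketched below under a few assumptions: `save_df_to_hdfs` and its parameters are hypothetical names, the `hdfs` client is assumed to be on the PATH, and the local copy lives in a temporary directory that is removed afterwards. The `runner` parameter only exists so the HDFS call can be replaced in a test; by default it is `subprocess.run` with `check=True`, which raises an error if the copy fails instead of silently returning a non-zero exit code.

```python
import os
import subprocess
import tempfile

import pandas as pd


def save_df_to_hdfs(df, hdfs_dir, filename="application.txt", runner=subprocess.run):
    """Write df to a temporary local CSV, then copy it into hdfs_dir.

    Hypothetical helper for illustration; runner is injectable for testing.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, filename)
        # Step 1: pandas writes the CSV to the local filesystem.
        df.to_csv(local_path, index=False)
        # Step 2: copy it into HDFS; the list form avoids going through a shell.
        cmd = ["hdfs", "dfs", "-copyFromLocal", local_path, hdfs_dir]
        # check=True raises CalledProcessError if the hdfs command fails.
        runner(cmd, check=True)
    # The temporary directory (and local CSV) is deleted once the block exits.
    return cmd
```

Usage would be `save_df_to_hdfs(df_app, "/user/hdfs/")`. Note that `-copyFromLocal` refuses to overwrite an existing destination file by default.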

Before applying my method, make sure you really need a pandas DataFrame, because this sequential two-step process (write locally, then copy) is time-consuming.
I prefer to use a Spark DataFrame instead of a pandas DataFrame.
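If the intermediate local file is the main cost, one possible variation (not the method described above) is to stream the CSV text straight into `hdfs dfs -put`, which reads from standard input when its source argument is `-`. The sketch assumes the `hdfs` client is on the PATH and that `hdfs_path` is a destination file path rather than a directory; `put_df_via_stdin` and `runner` are hypothetical names. The whole CSV is held in memory, so this only suits DataFrames that fit comfortably in RAM.

```python
import subprocess

import pandas as pd


def put_df_via_stdin(df, hdfs_path, runner=subprocess.run):
    """Stream df as CSV into `hdfs dfs -put -` without a local file.

    Hypothetical helper for illustration; runner is injectable for testing.
    """
    # With no path argument, to_csv returns the CSV text as a string.
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    # "-" as the source tells `hdfs dfs -put` to read from standard input.
    cmd = ["hdfs", "dfs", "-put", "-", hdfs_path]
    # check=True raises CalledProcessError if the upload fails.
    runner(cmd, input=csv_bytes, check=True)
    return cmd
```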


Regards,
Leo Lee 