mic...@gogotech.hk
May 27, 2015, 3:24:57 AM
to luigi...@googlegroups.com
Hi,
I am new to Luigi and I work a lot with pandas dataframes. In the past I wrote my own mini framework that did inserts and updates using the pandas read and write SQL functions, and it all worked very smoothly. I did not need to worry about creating a table if it did not exist, or about specifying column definitions, etc.
Since I joined a new company, I would like to start using Luigi and to leverage outputs and targets.
1. How do you work with pandas dataframes? Should I use my own insert/update functions outside the output definition, or is there some way to read and write SQL automatically?
Currently I am doing something like this:
    def run(self):
        df = pd.read_sql_query('select country,date,requests,orders from agg_country_date limit 10', con=self.engine)
        with self.output().open('w') as out_file:
            print(df.to_csv(sep='\t', header=False, index=False), file=out_file)
This seems super hacky. Basically, I am reading data from a PostgreSQL table into a dataframe and then printing it to a local file. In the next job I read this file and insert it into the db using luigi.postgres.CopyToTable.
1a. Can I just export the csv directly from the dataframe using df.to_csv(), then read this csv and export it to SQL using df.to_sql()?
I can do this easily outside the Luigi framework, but then I would need to mock the outputs and would probably not get the benefits of Luigi.
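To make 1a concrete, here is a rough sketch of the round trip I have in mind, outside Luigi. It is only an illustration under a few assumptions: an in-memory SQLite database stands in for PostgreSQL, an io.StringIO buffer stands in for the file object returned by self.output().open('w'), and the helper names dataframe_to_tsv and tsv_to_table, the sample rows, and the table agg_country_date_copy are made up for the example.

```python
import io
import sqlite3

import pandas as pd

COLUMNS = ["country", "date", "requests", "orders"]


def dataframe_to_tsv(df, out_file):
    """Write df as headerless tab-separated text to an open file object."""
    df.to_csv(out_file, sep="\t", header=False, index=False)


def tsv_to_table(in_file, table, con):
    """Read the headerless TSV back and append it to a SQL table."""
    df = pd.read_csv(in_file, sep="\t", header=None, names=COLUMNS)
    # to_sql creates the table if it does not exist yet.
    df.to_sql(table, con, if_exists="append", index=False)
    return df


# Assumption: in-memory SQLite replaces the PostgreSQL engine.
con = sqlite3.connect(":memory:")
source = pd.DataFrame(
    [["HK", "2015-05-26", 120, 7], ["PL", "2015-05-26", 80, 3]],
    columns=COLUMNS,
)
source.to_sql("agg_country_date", con, index=False)

# Step 1: query into a dataframe, as in run() above.
df = pd.read_sql_query(
    "select country, date, requests, orders from agg_country_date limit 10",
    con,
)

# Step 2: write the dataframe straight to the (stand-in) output file.
buffer = io.StringIO()
dataframe_to_tsv(df, buffer)

# Step 3: in the "next job", read the file and load it into the db.
buffer.seek(0)
loaded = tsv_to_table(buffer, "agg_country_date_copy", con)
row_count = con.execute(
    "select count(*) from agg_country_date_copy"
).fetchone()[0]
```

Passing the open file object to df.to_csv() avoids the print() call and the extra trailing newline it adds. Whether this is the recommended way to combine dataframes with Luigi targets is exactly what I am unsure about.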
Best regards,
Michal