faster way to import csv

Sebastian Demian

Aug 25, 2013, 6:15:28 PM
to web...@googlegroups.com
Hi guys, I am working on an application using the web2py framework, and I need to import a CSV file that is 700MB in size. Is there a faster way to import it into the database? I have tried the database administration tool, but it takes forever.

PS: I am a newbie.

Thank you.

Alan Etkin

Aug 25, 2013, 6:28:11 PM
to web...@googlegroups.com
> Hi guys, I am working on an application using the web2py framework, and I need to import a CSV file that is 700MB in size. Is there a faster way to import it into the database? I have tried the database administration tool, but it takes forever.

Have you tried the web2py shell?

]$ python web2py.py -S <app name> -M

>>> with open('<csv file path>') as csvfile:
...     db.<table>.import_from_csv_file(csvfile)
>>> db.commit()  # the shell does not commit automatically

Joe Barnhart

Aug 26, 2013, 5:19:02 AM
to web...@googlegroups.com
In general, processing almost a gig of CSV data is going to take a measurable amount of time. It's simply unavoidable.

I process similar-size CSV files, and I have found the Scheduler to be the ideal solution. I don't actually care that much how long it takes -- I just want my website to remain responsive while it loads. This is exactly why the scheduler exists: to fork something long-running off to another process so your site is unaffected. And it truly is another process -- not a thread. If you have a quad-core, you can support a whale of a lot of processing using the scheduler.

I built in a little "messaging" table so when my scheduler is done it sends a message to me and I can see the results.  It's a marvelous piece of work and one of the best features of web2py.
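
A minimal sketch of that setup, assuming a model file such as models/scheduler.py -- the table, task, and field names here are placeholders, not my actual code:

    # models/scheduler.py (sketch; 'mytable', the message table, and the
    # CSV path are hypothetical)
    from gluon.scheduler import Scheduler

    db.define_table('import_message',
        Field('body', 'text'),
        Field('created_on', 'datetime', default=request.now))

    def import_big_csv(path):
        # runs in a separate scheduler worker process, not in a web
        # request, so the site stays responsive while the file loads
        with open(path) as csvfile:
            db.mytable.import_from_csv_file(csvfile)
        db.import_message.insert(body='finished importing %s' % path)
        db.commit()

    scheduler = Scheduler(db)

Queue it with scheduler.queue_task(import_big_csv, pvars=dict(path='/path/to/big.csv'), timeout=3600) and start a worker with python web2py.py -K <app name>.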

-- Joe

Michele Comitini

Aug 26, 2013, 3:53:09 PM
to web...@googlegroups.com
Yes, there is a faster alternative. Most databases have a way to import CSV directly. PostgreSQL, MySQL, and SQLite can all import CSV with differing degrees of complication, but I can say it is pretty easy and much faster.
The nice thing about web2py is that it is independent of the database, and this comes at a price in terms of speed.
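
For example, with MySQL you can run LOAD DATA INFILE, either from the mysql client or through the DAL's executesql -- a sketch with a placeholder table and file path, assuming the server can read the file (otherwise use LOAD DATA LOCAL INFILE):

    # '/tmp/big.csv' and 'mytable' are placeholders; the CSV column
    # order must match the table definition
    db.executesql("""
        LOAD DATA INFILE '/tmp/big.csv'
        INTO TABLE mytable
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES;
    """)
    db.commit()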

Sebastian Demian

Aug 26, 2013, 4:58:00 PM
to web...@googlegroups.com
I tried importing the CSV file using LOAD DATA INFILE on a test MySQL DB, and it loaded in less than a minute, which is great.

Having said that, what do you guys think would be the best approach? Should I have web2py connect to the remote database and query it, or should I use the built-in db + scheduler to import the CSV?
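
For reference, pointing web2py at a remote MySQL database is just a DAL connection string -- the credentials below are placeholders, and migrate=False is an assumption so web2py does not try to alter tables it did not create:

    # placeholder credentials and database name
    db = DAL('mysql://user:password@dbhost/mydb', migrate=False)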


guruyaya

Aug 27, 2013, 12:53:46 AM
to web...@googlegroups.com
Michele, I think you're missing something. I'm not sure why loading CSV data is not implemented using LOAD DATA INFILE, but the difference between DB engines is not a problem on this issue. You could just implement the loading function one way when you use MySQL, and another way when you use GAE. Maybe there's another reason not to do that, but it has nothing to do with compatibility.
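
Something like this hypothetical dispatch, keyed on the DAL's _dbname attribute (the table and path are placeholders):

    def load_csv(db, table, path):
        # fast native path on MySQL; fall back to the portable DAL
        # import elsewhere (e.g. on GAE, which has no LOAD DATA INFILE)
        if db._dbname == 'mysql':
            db.executesql("LOAD DATA LOCAL INFILE '%s' INTO TABLE %s "
                          "FIELDS TERMINATED BY ',' IGNORE 1 LINES;"
                          % (path, table))
        else:
            with open(path) as csvfile:
                db[table].import_from_csv_file(csvfile)
        db.commit()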

Michele Comitini

Aug 27, 2013, 4:27:32 AM
to web...@googlegroups.com
Guruyaya,

Sorry, could you rephrase? I am not sure I understand what I am missing. I was suggesting that Sebastian use database features to import a large chunk of CSV data, bypassing web2py, to avoid the speed problems he was experiencing.

mic

