Memory loss


jaco...@sas.upenn.edu

Apr 15, 2016, 12:43:31 PM
to sqldf
I am trying to use read.csv.sql on Linux (Ubuntu 14.04). My raw csv file is larger than RAM, but I do extensive SQL cleaning within the function, and the fully cleaned result is much smaller than RAM (reduced from 20+ GB to ~30 MB). Nonetheless, I still get memory errors when I run the command on my machine.

My co-author, running OS X, can run the same command with no problem, even though our machines have the same amount of RAM.

For reference, here is the command I am trying to run:

read.csv.sql(dataSetName,
             sql = "select SYMBOL, DATE, min(TIME) as TIME, PRICE
                    from file
                    where SUBSTR(TIME, 5, 1) in ('0','1','2','3','4','5','6','7','8','9')
                    group by SYMBOL, DATE, SUBSTR(TIME, 2, 4)",
             dbname = 'sqldb')

Gabor Grothendieck

Apr 15, 2016, 12:53:23 PM
to jaco...@sas.upenn.edu, sqldf
Perhaps you could try one of the other back ends that sqldf supports.
read.csv.sql is not supported for them, but they have their own
facilities for reading files. For example, the H2 database has the
CSVREAD SQL function.
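
If it helps, here is an untested sketch of what that might look like.
It assumes the RH2 package is installed (loading it makes sqldf use the
H2 back end), that 'dataSet.csv' is a placeholder for your file, and
that PRICE is wrapped in min() because H2, unlike SQLite, requires
every selected column to be either aggregated or listed in the
group by:

library(RH2)    # loading RH2 switches sqldf to the H2 back end
library(sqldf)

# CSVREAD makes H2 read the file itself, so the raw data never
# passes through R; only the aggregated result comes back.
sqldf("select SYMBOL, DATE, min(TIME) as TIME, min(PRICE) as PRICE
       from CSVREAD('dataSet.csv')
       where SUBSTR(TIME, 5, 1) in ('0','1','2','3','4','5','6','7','8','9')
       group by SYMBOL, DATE, SUBSTR(TIME, 2, 4)")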

If you are successful you might want to report back on what you did.
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com