Good question. Assuming there is enough memory, fread will certainly be much faster than drRead.csv. You can convert the resulting in-memory data.table to a local disk ddf as follows.
# supposing you want arbitrary chunks as an initial ddf:
chunk_size <- 100000
n <- nrow(large_dt)
large_dt_ddf_conn <- localDiskConn("__path__")
# loop over chunks and save each as key-value pair to connection
for(ii in seq_len(ceiling(n / chunk_size))) {
  # cap the end index at n so the last chunk does not pick up NA rows past the end
  idx <- ((ii - 1) * chunk_size + 1):min(ii * chunk_size, n)
  addData(large_dt_ddf_conn, kvPairs(kvPair(ii, large_dt[idx,])))
}
# point to this connection as a ddf
large_dt_ddf <- ddf(large_dt_ddf_conn)
This loops over your data set in chunks of chunk_size rows and saves each chunk as a key-value pair to a local disk connection, with the chunk number as the key. If you'd rather save subsets according to some prescribed division of your data.table, you can modify the for loop to iterate over the levels of a factor column instead, using each level as the key.
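For the factor-level variant, here is a minimal sketch of how the loop might look. The example data, the column name group, and the temporary output path are all made up for illustration, and autoYes = TRUE is only there so localDiskConn does not stop to ask before creating the directory. Substitute your own data.table, column, and path.

```r
library(data.table)
library(datadr)

# hypothetical stand-in for your data; "group" is an assumed factor column
large_dt <- data.table(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  x = 1:12
)

# hypothetical output location; autoYes = TRUE skips the "create directory?" prompt
by_group_conn <- localDiskConn(
  file.path(tempdir(), "large_dt_by_group"),
  autoYes = TRUE
)

# one key-value pair per factor level: key = level, value = the matching rows
for (lev in levels(large_dt$group)) {
  addData(by_group_conn, kvPairs(kvPair(lev, large_dt[group == lev])))
}

# point to this connection as a ddf
large_dt_by_group <- ddf(by_group_conn)
```

If a single level is too large to handle comfortably as one subset, you could combine the two approaches and chunk within each level, using a key such as paste(lev, ii, sep = "_"). datadr also provides divide(), which may be the more natural route for conditioning-variable divisions once you already have a ddf.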
Ryan