I would like to use `dplyr` and `RMySQL` to work with my large data. There is no issues with `dplyr` code. The problem (I think) is about exporting data out of `MySQL` to `R`. My connection is dropped every time even I am using `n=Inf` in `collect`. Theoretically, my data should have more than 50K rows, but I can only get around 15K back. Any suggestions are appreciated.
### Approach 1
library(dplyr)
library(RMySQL)
# Connect to a database and select a table
my_db <- src_mysql(dbname='aermod_1', host = "localhost", user = "root", password = "")
my_tbl <- tbl(my_db, "db_table")
out_summary_station_raw <- select(my_tbl, -c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw)
### Approach 2: using `Pool`
library(pool)
library(RMySQL)
library(dplyr)
pool <- dbPool(
drv = RMySQL::MySQL(),
dbname = "aermod_1",
host = "localhost",
username = "root",
password = ""
)
out_summary_station_raw <- src_pool(pool) %>% tbl("aermod_final") %>% select(-c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw, n = Inf)
###Warning message (both approaches):
Warning messages:
1: In dbFetch(res, n) : error while fetching rows
2: Only first 15,549 results retrieved. Use n = Inf to retrieve all.