Connection lost using collect in dplyr

22 views
Skip to first unread message

tao hong

unread,
Jun 12, 2017, 12:37:04 AM6/12/17
to manipulatr
I would like to use `dplyr` and `RMySQL` to work with my large data. There is no issues with `dplyr` code. The problem (I think) is about exporting data out of `MySQL` to `R`. My connection is dropped every time even I am using `n=Inf` in `collect`. Theoretically, my data should have more than 50K rows, but I can only get around 15K back. Any suggestions are appreciated.


### Approach 1
    library(dplyr)
    library(RMySQL)

    # Connect to a database and select a table 
    my_db <- src_mysql(dbname='aermod_1', host = "localhost", user = "root", password = "")   
    my_tbl <- tbl(my_db, "db_table") 
    out_summary_station_raw <- select(my_tbl, -c(X, Y, AVERAGE_CONC))
    out_station_mean_local <- collect(out_summary_station_raw)



### Approach 2: using `Pool`

    library(pool)
    library(RMySQL)
    library(dplyr)
    
    pool <- dbPool(
      drv = RMySQL::MySQL(),
      dbname = "aermod_1",
      host = "localhost",
      username = "root",
      password = ""
    )
    
    out_summary_station_raw <- src_pool(pool) %>% tbl("aermod_final") %>% select(-c(X, Y, AVERAGE_CONC))
    
    out_station_mean_local <- collect(out_summary_station_raw, n = Inf)

###Warning message (both approaches):

    Warning messages:
    1: In dbFetch(res, n) : error while fetching rows
    2: Only first 15,549 results retrieved. Use n = Inf to retrieve all. 
Reply all
Reply to author
Forward
0 new messages