dplyr writing / appending to database

2,012 views
Skip to first unread message

Georg Heiler

unread,
Feb 23, 2015, 6:09:34 AM2/23/15
to manip...@googlegroups.com
Hi,

I am fairly new to dplyr and have one question:
Is there a way to write / append data to an existing postgres table?
I tried the following:

conn <- src_postgres(dbname = 'mydb', host= 'localhost', port= 5432, user = 'myuser', password = 'mypass')

db_insert_into
(conn, "INSERT INTO mytable" VALUES ('test', 22))
df
= data.frame('foobar', 1234)
copy_to
(con, data, name = 'mytable')
dbWriteTable
(conn, value = df, name = "testing", append = TRUE)

Either it does not work at all, or it tells me that the table already exists (copy_to).


Is there a way to write / append data to an existing postgres table from R (maybe using a different tool if dplyr does'nt support this)?

Thank you very much for help.

Regards,
Georg

Hadley Wickham

unread,
Feb 23, 2015, 7:40:20 AM2/23/15
to Georg Heiler, manip...@googlegroups.com
dbWriteTable is the correct tool. What happens when you use it?

Hadley
--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.
To post to this group, send email to manip...@googlegroups.com.
Visit this group at http://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.


--
http://had.co.nz/

Georg Heiler

unread,
Feb 23, 2015, 8:44:21 AM2/23/15
to manip...@googlegroups.com, georg.k...@gmail.com
Well:

dfgg = data.frame('fofffof', 3433)
dbWriteTable
(con, value = dfgg, name = "mytable", append = T)
Error in postgresqlgetResult(new.con) :
  RS
-DBI driver: (could not Retrieve the result : ERROR:  extra data after last expected column
CONTEXT
:  COPY stupidarimatest, line 1: "1      fofffof 3433"
)

whereas mytable has the following fields:
1)TEXT sometext
2)INT index



Am Montag, 23. Februar 2015 13:40:20 UTC+1 schrieb Hadley Wickham:
dbWriteTable is the correct tool. What happens when you use it?

Hadley

On Monday, February 23, 2015, Georg Heiler <georg.k...@gmail.com> wrote:
Hi,

I am fairly new to dplyr and have one question:
Is there a way to write / append data to an existing postgres table?
I tried the following:

conn <- src_postgres(dbname = 'mydb', host= 'localhost', port= 5432, user = 'myuser', password = 'mypass')

db_insert_into
(conn, "INSERT INTO mytable" VALUES ('test', 22))
df
= data.frame('foobar', 1234)
copy_to
(con, data, name = 'mytable')
dbWriteTable
(conn, value = df, name = "testing", append = TRUE)

Either it does not work at all, or it tells me that the table already exists (copy_to).


Is there a way to write / append data to an existing postgres table from R (maybe using a different tool if dplyr does'nt support this)?

Thank you very much for help.

Regards,
Georg

--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+unsubscribe@googlegroups.com.

To post to this group, send email to manip...@googlegroups.com.
Visit this group at http://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.

Hadley Wickham

unread,
Feb 23, 2015, 9:07:01 AM2/23/15
to Georg Heiler, manip...@googlegroups.com
Can you try RPostgres from my github?
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.

To post to this group, send email to manip...@googlegroups.com.
Visit this group at http://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.


--
http://had.co.nz/

Georg Heiler

unread,
Feb 23, 2015, 9:14:14 AM2/23/15
to manip...@googlegroups.com, georg.k...@gmail.com
Sure.
I get the following error whilst installation of devtools::install_github("RcppCore/Rcpp"):

During startup - Warning message:
Setting LC_CTYPE failed, using "C" 
ERROR: dependencies 'httr', 'RCurl' are not available for package 'devtools'
* removing '/home/vagrant/R/x86_64-pc-linux-gnu-library/3.1/devtools'

The downloaded source packages are in
        '/tmp/RtmpeA830k/downloaded_packages'
Warning messages:
1: In install.packages("devtools") :
  installation of package 'RCurl' had non-zero exit status
2: In install.packages("devtools") :
  installation of package 'httr' had non-zero exit status
3: In install.packages("devtools") :
  installation of package 'devtools' had non-zero exit status

But shouldn't it be possible to use dbWriteTable?
What was wrong in my approach for using dbWriteTable?

Georg Heiler

unread,
Feb 23, 2015, 11:03:15 AM2/23/15
to manip...@googlegroups.com, georg.k...@gmail.com
https://code.google.com/p/rpostgresql/issues/detail?id=59

row.names = f


seems to solve my problem

Alex Fine

unread,
May 15, 2016, 10:08:15 AM5/15/16
to manipulatr
Hey everyone,

Picking up on this thread.

I have a very large PostgreSQL table (~15 million rows) that I'm de-duplicating using dplyr (which works great).  However, I now need to write the de-duped version back to the PG data base.  I tried this:

dbWriteTable(con, c("schema_name", "table_name"), value = deduped_tbl, n=-1)

I added the "n=-1" because otherwise it only writes the first 100K rows.  This just doesn't work though, I think because of the size of the tbl object it's trying to write.  (When I omit 'n=-1' it is able to write 100K rows to the table, which is what makes me think the size is the problem).

There has to be a better way of doing this, but I just can't find it.  For example, if there were a way to refer to the tbl in SQL, I could just do a CREATE TABLE or something, but it's not clear that that's possible.

Any ideas?    

Jeff Norville

unread,
May 15, 2016, 6:41:26 PM5/15/16
to manipulatr
HI Alex,

Sounds like a new thread, but I'll try a brief reply here -- since frankly I can't think of a reason not to just do it within the DB ... something like this (in psql): 
    SELECT DISTINCT(*)
    INTO tbl_deduped
    FROM tbl_with_probs;

Otherwise, if you stick with the dplyr shuffle try n=Inf instead of n=-1.

Good luck!
Reply all
Reply to author
Forward
0 new messages