inserting bytea data

2,236 views
Skip to first unread message

Peter K

unread,
Jan 28, 2014, 11:05:24 PM1/28/14
to rpostgr...@googlegroups.com
Dear RPostgresql developers,
 I was wondering if insertion of bytea binary data is somehow supported. 
 Here's an example:

 # CREATE TABLE test (name varchar(30) NOT NULL, val bytea, PRIMARY KEY (name)); 

 sample.object <- list("one","two");
  ser <- serialize(sample.object,NULL,ascii=F);
  str(ser)
 
>  iq <- sprintf("INSERT INTO test values('%s',X'%s'","name1",ser); # doesn't work - ser is a vector

# ok, let's paste all the characters together
> iq <- sprintf("INSERT INTO test values('%s',X'%s');","name1",paste(as.character(ser),collapse=""))
> dbSendQuery(con, iq)
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "val" is of type bytea but expression is of type bit
LINE 1: INSERT INTO test values('name1',X'580a0000000200020f03000203...
                                        ^
HINT:  You will need to rewrite or cast the expression.


>  iq <- sprintf("INSERT INTO test values('%s',E'%s');","name1",paste(as.character(ser),collapse=""))
# this one works, but reading the record back, I get a value that I don't know how to interpret - certainly not the same character string


Making a data.frame like that (with a val column being a list of "raw" vector values) throws a seg fault upon dbWriteTable.

Any hints would be greatly appreciated. 
My overall goal here is to store a very large set of small-medium serialized R objects in a database table, optimizing retrieval time as much as possible.

Thanks in advance,
-peter.

NISHIYAMA Tomoaki

unread,
Jan 28, 2014, 11:45:56 PM1/28/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki
Dear peter,

Thank you to bring this up.
So far as I know there is no support for bytea, and I did not have a use case that require
binary data access from R. But the serialize example you have shown makes sense.

The right course to do it would, first implement adapter functions to the PQescapeByteaConn,
PQescapeBytea, and PQunescapeBytea at C level.

None the less if you are to construct the SQL yourself,
please refer to http://www.postgresql.org/docs/9.3/static/datatype-binary.html

If your server version is 9.0 or newer they support hex format like
E'\\xDEADBEEF'
which is only a bit different from X'580a0000000200020f03000203...
and thus you would be able to construct it. (change X'%s' with E'\\x%s')

To work with 8.X series you need octal conversion.
http://www.postgresql.org/docs/8.0/static/datatype-binary.html

You will need to unescape the return value as well, which is not a trivial task
without library support, though.

> > iq <- sprintf("INSERT INTO test values('%s',E'%s');","name1",paste(as.character(ser),collapse=""))
> # this one works, but reading the record back, I get a value that I don't know how to interpret - certainly not the same character string


To my expectation this should insert the string
"580a0000000200030000000203000000001300000002000000100000000100040009000000036f6e650000001000000001000400090000000374776f"
and you know how to interpret, two characters per each byte.
The apparent drawback is that the data takes double disk space and you still need reverse conversion.
package pack might do the job, which I don't have tried. (pack/unpack is a common name used in other
languages like Perl, Python, and Ruby, etc.)

> Making a data.frame like that (with a val column being a list of "raw" vector values) throws a seg fault upon dbWriteTable.

Since any seg fault is considered as a bug of the library please report it to the issue
tracking system at
http://code.google.com/p/rpostgresql/issues/list
with minimal example if possible. Thank you.

Sincerely yours,
--
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan
> --
> You received this message because you are subscribed to the Google Groups "RPostgreSQL Development and Discussion List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to rpostgresql-d...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.



NISHIYAMA Tomoaki

unread,
Jan 29, 2014, 8:26:06 AM1/29/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki
Hi,

I have committed r257 just now, which hopefully give you a minimal support to deal with bytea datatype.

dbGetQuery(con,"CREATE TABLE byteatable (name text NOT NULL, val bytea, PRIMARY KEY (name))")
sample.object <- list("one","two");
ser <- serialize(sample.object,NULL,ascii=F);
postgresqlEscapeBytea(con, ser)
iq <- sprintf("INSERT INTO byteatable values('%s',E'%s');","name1", postgresqlEscapeBytea(con, ser))
dbGetQuery(con, iq)
rows<-dbGetQuery(con, "SELECT * from byteatable")
ser2<-postgresqlUnescapeBytea(rows[[2]])

You will be able to insert and retrieve the binary data. The example shows that
the retrieved raw data ser2 is identical to the initial serialized data ser.

> rows<-dbGetQuery(con, "SELECT * from byteatable")
> ser2<-postgresqlUnescapeBytea(rows[[2]])
> dbGetQuery(con, "drop table byteatable")
NULL
> dbGetQuery(con,"CREATE TABLE byteatable (name text NOT NULL, val bytea, PRIMARY KEY (name))")
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "byteatable_pkey" for table "byteatable"
NULL
> sample.object <- list("one","two");
> ser <- serialize(sample.object,NULL,ascii=F);
> postgresqlEscapeBytea(con, ser)
[1] "X\\\\012\\\\000\\\\000\\\\000\\\\002\\\\000\\\\003\\\\000\\\\000\\\\000\\\\002\\\\003\\\\000\\\\000\\\\000\\\\000\\\\023\\\\000\\\\000\\\\000\\\\002\\\\000\\\\000\\\\000\\\\020\\\\000\\\\000\\\\000\\\\001\\\\000\\\\004\\\\000\\\\011\\\\000\\\\000\\\\000\\\\003one\\\\000\\\\000\\\\000\\\\020\\\\000\\\\000\\\\000\\\\001\\\\000\\\\004\\\\000\\\\011\\\\000\\\\000\\\\000\\\\003two"
> iq <- sprintf("INSERT INTO byteatable values('%s',E'%s');","name1", postgresqlEscapeBytea(con, ser))
> dbGetQuery(con, iq)
NULL
> rows<-dbGetQuery(con, "SELECT * from byteatable")
> ser2<-postgresqlUnescapeBytea(rows[[2]])
> ser2
[1] 58 0a 00 00 00 02 00 03 00 00 00 02 03 00 00 00 00 13 00 00 00 02 00 00 00
[26] 10 00 00 00 01 00 04 00 09 00 00 00 03 6f 6e 65 00 00 00 10 00 00 00 01 00
[51] 04 00 09 00 00 00 03 74 77 6f
> ser
[1] 58 0a 00 00 00 02 00 03 00 00 00 02 03 00 00 00 00 13 00 00 00 02 00 00 00
[26] 10 00 00 00 01 00 04 00 09 00 00 00 03 6f 6e 65 00 00 00 10 00 00 00 01 00
[51] 04 00 09 00 00 00 03 74 77 6f

--
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan


Peter K

unread,
Jan 29, 2014, 4:35:37 PM1/29/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki
Dear Tomoaki,
 Thank you for such a quick response. I've installed the latest patch, and while the test block completes without raising errors, the resulting value (ser2) is not the same as the one being inserted:

>   dbGetQuery(con,"CREATE TABLE byteatable (name text NOT NULL, val bytea, PRIMARY KEY (name))")
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "byteatable_pkey" for table "byteatable"
NULL
>   sample.object <- list("one","two"); 
>   ser <- serialize(sample.object,NULL,ascii=F); 
>   postgresqlEscapeBytea(con, ser)
[1] "\\x580a0000000200020f03000203000000001300000002000000100000000100040009000000036f6e650000001000000001000400090000000374776f"
>   iq <- sprintf("INSERT INTO byteatable values('%s',E'%s');","name1", postgresqlEscapeBytea(con, ser)) 
>   dbGetQuery(con, iq)
NULL
>   rows<-dbGetQuery(con, "SELECT * from byteatable")
>   ser2<-postgresqlUnescapeBytea(rows[[2]])
>   ser2
  [1] 58 30 61 30 30 30 30 30 30 30 32 30 30 30 32 30 66 30 33 30 30 30 32 30 33
 [26] 30 30 30 30 30 30 30 30 31 33 30 30 30 30 30 30 30 32 30 30 30 30 30 30 31
 [51] 30 30 30 30 30 30 30 30 31 30 30 30 34 30 30 30 39 30 30 30 30 30 30 30 33
 [76] 36 66 36 65 36 35 30 30 30 30 30 30 31 30 30 30 30 30 30 30 30 31 30 30 30
[101] 34 30 30 30 39 30 30 30 30 30 30 30 33 37 34 37 37 36 66
>   ser
 [1] 58 0a 00 00 00 02 00 02 0f 03 00 02 03 00 00 00 00 13 00 00 00 02 00 00 00
[26] 10 00 00 00 01 00 04 00 09 00 00 00 03 6f 6e 65 00 00 00 10 00 00 00 01 00
[51] 04 00 09 00 00 00 03 74 77 6f


I haven't experimented with this yet, but perhaps this has something to do with the Postgres version (I have 9.1 installed). 

That return doesn't look like a hex value to me .. it's the same prior to unescape:

>   rows[[2]]
[1] "\\x5830613030303030303032303030323066303330303032303330303030303030303133303030303030303230303030303031303030303030303031303030343030303930303030303030333666366536353030303030303130303030303030303130303034303030393030303030303033373437373666"

Interestingly, if you insert something else, that does generate hex values (not sure how to interpret them though)

>   iq <- sprintf("INSERT INTO byteatable values('%s',E'\\x%s');","name1", "something")
>   dbGetQuery(con, iq)
NULL
>   rows<-dbGetQuery(con, "SELECT * from byteatable")
>   rows[[2]]
[1] "\\x78736f6d657468696e67"
>   ser2<-postgresqlUnescapeBytea(rows[[2]])
>   ser2
 [1] 78 73 6f 6d 65 74 68 69 6e 67

-peter.

Ian Gow

unread,
Jan 29, 2014, 4:38:44 PM1/29/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki
I think Joe Conway has some examples of pulling R binary data into PostgreSQL as bytea using PL/R. So there may be useful ideas in the PL/R code.

Links:

NISHIYAMA Tomoaki

unread,
Jan 30, 2014, 6:42:18 AM1/30/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki
Hi, 

It appears that the server returned in 'hex' format, which is not
correctly handled with PQunescapeBytea().
So, I inserted a dispatch based on the first two characters.
If the string starts with '\x' that will treated as hex format and decoded as such.
commited as r258

   dbGetQuery(con,"CREATE TABLE byteatable (name text NOT NULL, val bytea, PRIMARY KEY (name))")
   sample.object <- list("one","two");
   ser <- serialize(sample.object,NULL,ascii=F);
   postgresqlEscapeBytea(con, ser)
   iq <- sprintf("INSERT INTO byteatable values('%s','%s');","name1", postgresqlEscapeBytea(con, ser))

   dbGetQuery(con, iq)
   rows<-dbGetQuery(con, "SELECT * from byteatable")
   ser2<-postgresqlUnescapeBytea(rows[[2]])

The test was done with PostgreSQL 9.3.2 this time. 
(previous was with 8.4 and the different number of backslash
mandates to remove E prefix before the single quote opening.)

There is a caution on standard_conforming_strings 
If this is on per default of 9.X, then the back slash in the string is treated as is in 'single quotes'
but in E'escaped string' they are treated as escape prefix.

>    dbGetQuery(con,"CREATE TABLE byteatable (name text NOT NULL, val bytea, PRIMARY KEY (name))")
NULL
>    sample.object <- list("one","two");
>    ser <- serialize(sample.object,NULL,ascii=F);
>    postgresqlEscapeBytea(con, ser)
[1] "X\\012\\000\\000\\000\\002\\000\\003\\000\\000\\000\\002\\003\\000\\000\\000\\000\\023\\000\\000\\000\\002\\000\\000\\000\\020\\000\\000\\000\\001\\000\\004\\000\\011\\000\\000\\000\\003one\\000\\000\\000\\020\\000\\000\\000\\001\\000\\004\\000\\011\\000\\000\\000\\003two"
>    iq <- sprintf("INSERT INTO byteatable values('%s','%s');","name1", postgresqlEscapeBytea(con, ser))
>    dbGetQuery(con, iq)
NULL
>    rows<-dbGetQuery(con, "SELECT * from byteatable")
>    ser2<-postgresqlUnescapeBytea(rows[[2]])
> ser2
 [1] 58 0a 00 00 00 02 00 03 00 00 00 02 03 00 00 00 00 13 00 00 00 02 00 00 00
[26] 10 00 00 00 01 00 04 00 09 00 00 00 03 6f 6e 65 00 00 00 10 00 00 00 01 00
[51] 04 00 09 00 00 00 03 74 77 6f
> ser
 [1] 58 0a 00 00 00 02 00 03 00 00 00 02 03 00 00 00 00 13 00 00 00 02 00 00 00
[26] 10 00 00 00 01 00 04 00 09 00 00 00 03 6f 6e 65 00 00 00 10 00 00 00 01 00
[51] 04 00 09 00 00 00 03 74 77 6f
> rows[[2]]
[1] "\\x580a0000000200030000000203000000001300000002000000100000000100040009000000036f6e650000001000000001000400090000000374776f"

-- 

Tomoaki NISHIYAMA


Advanced Science Research Center,

Kanazawa University,

13-1 Takara-machi, 

Kanazawa, 920-0934, Japan



Peter K

unread,
Jan 30, 2014, 12:26:10 PM1/30/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki
Thank you, Tomoaki! All works now.

Just out curiosity ... since raw-character-raw conversions must result in quite a bit of overhead. I don't have much experience with binary blob fields in DBs, but is this a require procedure, or is there a way to just copy bytes directly without character conversion?

Again, thank you for helping me find a solution so quickly!
Best,
-peter.

NISHIYAMA Tomoaki

unread,
Jan 31, 2014, 10:12:23 AM1/31/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki, Peter K
Hi,

Just out curiosity ... since raw-character-raw conversions must result in quite a bit of overhead.
I don't have much experience with binary blob fields in DBs,
but is this a require procedure, or is there a way to
just copy bytes directly without character conversion?

According to the documentation, we can send binary data as parameters using PQexecParams() or PQexecPrepared(),
and for these we can select on the field basis whether to make it text or binary format.
For retrieval, we can suggest to use binary format, but not specify which field be text and which be binary.
Thus, we have to be sure that we can handle correctly every data that can be sent, not only bytea,
but also integer and floating point data types. These data are machine dependent and R supports even
solaris. So, it is not trivial to perform binary data retrieval.

The most basic function PQexec(), which is the basis of dbSendQuery() and dbGetQuery() 
can send only a text string and cannot specify return type.

Anyhow, with recent development of CPU, the bottle neck is usually not the copy/conversion
on the end computer, but the network and disk I/O. Thus, I am not concerned on the
overhead by conversion to/from text representation.

The dump/restore utility also makes everything in text format.


I think binary transfer would merit most for the double precision floating point number,

where it is difficult to transfer the exact number with text format. However, the general
recommendation is first make text work. It is easier to debug and usually there is no
effective performance penalty.


Kind regards,

Tomoaki

-- 

Tomoaki NISHIYAMA


Advanced Science Research Center,

Kanazawa University,

13-1 Takara-machi, 

Kanazawa, 920-0934, Japan


Peter K

unread,
Jan 31, 2014, 3:05:48 PM1/31/14
to rpostgr...@googlegroups.com, NISHIYAMA Tomoaki, Peter K
Thank you for the detailed explanation!
-peter.
Reply all
Reply to author
Forward
0 new messages