native bytea support for binary data

kerby2000

Nov 1, 2010, 11:03:42 AM
to ocpgdb
Hi Andrew,

It's me again.
As you saw from my previous example, I want to store ~300MB in the
database.
Normally this is binary data (bytea type).
I get the feeling that when I store bytea data it is converted into a
string, rather than using the postgres library's special functionality
(http://archives.postgresql.org/pgsql-interfaces/2005-06/msg00019.php).

I followed your code and saw that it performs the following (when
calling the execute method of the db cursor):
step 1: normalize the arguments
step 2: pack the arguments according to their types (and in our case,
it is pack_bytea that is of interest)
step 3: convert the python arguments into postgres C arguments in the
following code:

static PyObject *
connection_execute(PyPgConnection *self, PyObject *args)
...
        } else if (PyTuple_Check(param)) {
            if (_param_tuple(param, &paramTypes[n], &paramValues[n],
                             &paramLengths[n]) < 0) {
                Py_DECREF(param);
                goto error;
            }
            paramFormats[n] = 1;
        } else {
...

There are two options for C-like arguments in your code: one is a
plain string and the other is more versatile, where the type, value and
length are retrieved from a tuple.
I read in the postgres documentation that in order to use the
bytea-specific functionality, type "17" must be passed, so I figure that
the packing (pack_bytea) should create an appropriate tuple, but at the
moment it seems to return only a string (unlike other packing methods in
that code, which create a tuple from a single typed object).

How can I write bytea data into the database without an extra conversion
to string?

Kind regards,

Sergey

Andrew McNamara

Nov 3, 2010, 4:56:36 AM
to ocp...@googlegroups.com
>As you saw from my previous example, I want to store ~300MB in the
>database.
>And normally this is binary data (bytea type).

The bytea/Binary class in ocpgdb is a thin wrapper around the python
str type. Its purpose is to allow you to signal to the lower level
functions that you want the string passed as a type 17 "bytea".

>I got the feeling that when I store bytea data it is converted into
>string, rather than using the postgres library special functionality

>I followed your code and saw that it performs the following (when
>calling the execute method of the db cursor):
>step 1: normalize the arguments
>step 2: pack the arguments according to their types (and in our case,
>it is pack_bytea that is of interest)

All arguments to queries are converted to a (oid, data) tuple (the to_db
functions) based on the python type. Oid is an integer representing the
Postgres type, and data is a python string containing binary data.

The to_db function for the bytea type (pack_bytea, as you note) simply
returns (pgoid.bytea, value). Essentially, the only difference between
this and a plain string is that the plain string tuple is (pgoid.text,
value). Communication with the server is the same in both cases; the
server just sees a different type oid (which may affect SQL casting).
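
To make the shape of that conversion concrete, here is a minimal sketch in
modern Python. It is an illustration only, not the ocpgdb source: the
Bytea class, the TO_DB registry and the to_db() helper are hypothetical
names, and bytes stands in for the Python 2 str type. The OID values (17
for bytea, 25 for text) are the standard PostgreSQL ones.

```python
# Illustrative sketch only; not the actual ocpgdb implementation.
BYTEA_OID = 17  # PostgreSQL type OID for bytea
TEXT_OID = 25   # PostgreSQL type OID for text


class Bytea(bytes):
    """Hypothetical marker type: raw bytes to be sent as bytea."""


def pack_bytea(value):
    # The payload is passed through untouched; only the OID marks it as bytea.
    return BYTEA_OID, bytes(value)


def pack_text(value):
    # Text is also sent as raw bytes, just tagged with the text OID.
    return TEXT_OID, value.encode("utf-8")


# Hypothetical registry mapping Python types to their packing functions.
TO_DB = {Bytea: pack_bytea, str: pack_text}


def to_db(value):
    """Return an (oid, data) tuple for a single query parameter."""
    try:
        pack = TO_DB[type(value)]
    except KeyError:
        raise TypeError("no to_db function for %r" % type(value))
    return pack(value)
```

Both branches produce the same kind of tuple, so nothing about the bytea
path involves an extra encoding or escaping step in this model.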

Unlike most other Python Postgres adapters, ocpgdb uses the newer version
3 protocol, and passes all data as binary, so there's really no special
processing needed for bytea vs text.

>step 3: convert the python arguments into postgres C arguments in the
>following code:
>
>static PyObject *
>connection_execute(PyPgConnection *self, PyObject *args)
>...
> } else if (PyTuple_Check(param)) {
> if (_param_tuple(param, &paramTypes[n], &paramValues[n],
> &paramLengths[n]) < 0) {
> Py_DECREF(param);
> goto error;
> }
> paramFormats[n] = 1;
> } else {
>...
>There are two options for C like arguments in your code: one is a
>plain string and another is more versatile, where the type, value and
>length are retrieved from a tuple.
>I read in the postgres documentation that in order to enjoy the bytea
>specific functionality, a type "17" must be passed, so I figure that
>the packing (pack_bytea) should create an appropriate tuple, but now
>it only returns a string (unlike other packing methods in that code
>that create a tuple from a single type object).

I'm pretty sure the _param_tuple() version is being used for the
ocpgdb.bytea type (the simpler string parameter handling does not do any
special character quoting, so if it were being used, you wouldn't be able
to pass '\0' to Postgres, which you can).
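
The '\0' argument can be modelled roughly as follows. This is a Python
sketch of the two branches, not the C code itself: as_libpq_param() is a
hypothetical name, and the behaviour of the plain-string branch (text
format, type left for the server to infer) is an assumption, since that
part of connection_execute is elided above. What it relies on is that
libpq treats text-format parameters as NUL-terminated C strings, while
binary-format parameters carry an explicit length.

```python
# Illustrative model only; the real logic lives in the C extension.
def as_libpq_param(param):
    """Describe how one parameter would be handed to libpq."""
    if isinstance(param, tuple):
        # (oid, data) tuple: binary format with an explicit length,
        # so embedded NUL bytes survive.
        oid, data = param
        return {"type": oid, "value": data, "length": len(data), "format": 1}
    # Assumed plain-string path: text format. libpq reads the value as a
    # C string, so anything after the first NUL byte is lost.
    data = param.encode("utf-8") if isinstance(param, str) else param
    value = data.split(b"\0", 1)[0]
    return {"type": 0, "value": value, "length": 0, "format": 0}
```

With a (17, data) tuple the full payload, NUL bytes included, reaches the
server; in the text-format model the same payload would be cut short.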

In todb.py:

set_to_db(bytea, pgtype.pack_bytea)

In pgtype.py:

def pack_bytea(value):
    return pgoid.bytea, value

This returns a 2-tuple, and pgoid.bytea is 17, as we expect.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
