
Re: Making DBD::Pg to return Postgres arrays as Perl arrays


Adam Sjøgren

May 7, 2015, 7:45:03 AM
to dbix-...@lists.scsys.co.uk, dbd...@perl.org
I wrote:

> When I insert new rows where the field has a Perl-array as the value,
> DBIx::Class+DBD::Pg automatically stores them in the database, as hoped
> - nice!

> But when I read the field again, I get the textual "Postgres-encoded"
> representation back (i.e. a string like '{"(a,b,c)","(d,e,f)"}').

After looking more closely at this:

Simple arrays work out of the box: with text[] I can read and write, and
values are automatically converted from Perl arrays when writing to the
database, and back to Perl arrays when reading.

The problem I have is only when my datatype is a (custom) tuple.

I.e. in Postgres I have:

CREATE TYPE entry AS (tag TEXT, created TIMESTAMP, notes TEXT, initials TEXT);
ALTER TABLE vehicle ADD COLUMN entrylist entry ARRAY;

I would like to be able to do:

$schema->resultset('Vehicle')->create({ entrylist => [
    [ 'a', '2015-05-07', 'info', 'adsj' ],
    [ 'b', '2015-05-07', 'more', 'adsj' ],
] });

and I was hoping that there was a way to configure DBIx::Class/DBD::Pg
for that to work (it doesn't; the list-ref of list-refs gets sent to the
database as '{{"a",...},{"b",...}}' when it should be '{"(a,...)","(b,...)"}',
and Postgres complains).

... and I was hoping that when reading from the database, I could
automatically get a list-ref of list-refs back.

Maybe I was hoping for too much - I just wanted to check that I wasn't
missing something obvious, before hacking together a parser for
Postgres' format.
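
A first stab at such a parser might look something like this (just a
sketch, not existing DBD::Pg functionality; it assumes field values
contain no embedded commas, quotes or backslashes, which a real parser
would have to handle):

use strict;
use warnings;

# Turn the textual form of an array of composites, e.g.
# '{"(a,b,c)","(d,e,f)"}', into a list-ref of list-refs.
sub composite_array_to_perl {
    my ($text) = @_;
    my @tuples;
    # Each array element is a double-quoted row literal "(...)".
    while ($text =~ /"\(([^)]*)\)"/g) {
        push @tuples, [ split /,/, $1, -1 ];
    }
    return \@tuples;
}

my $rows = composite_array_to_perl(
    '{"(a,2015-05-07,info,adsj)","(b,2015-05-07,more,adsj)"}');
# $rows->[0] is now [ 'a', '2015-05-07', 'info', 'adsj' ]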

Best regards,

Adam

Adding a Cc to the DBD::Pg mailing list, as it now looks like this is
perhaps not just a matter of me not configuring DBIx::Class properly.

--
Adam Sjøgren
ad...@novozymes.com

Alvar Freude

May 7, 2015, 8:45:02 AM
to dbix-...@lists.scsys.co.uk, dbd...@perl.org
Hi,

> On 07.05.2015 at 13:30, Adam Sjøgren <ad...@novozymes.com> wrote:
>
> The problem I have is only when my datatype is a (custom) tuple.

Same here; I have a small sub which decodes such strings and creates an array of hashrefs from them. It can be used to convert a Pg array containing a composite type into an array ref of hashrefs.


Maybe this could be included in DBD::Pg? But it needs Text::CSV_XS for the decoding. If there is interest, I can post the sub.
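
The shape of it is roughly this (a simplified sketch, not the actual
sub; escaped quotes and embedded delimiters inside field values are not
handled here):

use strict;
use warnings;
use Text::CSV_XS;

# Decode the text form of an array of composites into an array ref of
# hash refs, given the field names of the composite type.
sub pg_composite_array_to_hashrefs {
    my ($text, @fields) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1 });

    (my $inner = $text) =~ s/^\{|\}$//g;        # strip the outer { }
    $csv->parse($inner) or die "cannot parse array: " . $csv->error_input;

    my @result;
    for my $element ($csv->fields) {            # each element is '(v1,v2,...)'
        (my $row = $element) =~ s/^\(|\)$//g;   # strip the ( )
        $csv->parse($row) or die "cannot parse row: " . $csv->error_input;
        my %h;
        @h{@fields} = $csv->fields;
        push @result, \%h;
    }
    return \@result;
}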


Ciao
Alvar


Tim Bunce

May 7, 2015, 4:00:02 PM
to dbix-...@lists.scsys.co.uk, dbd...@perl.org, chris....@gmail.com, dbi-...@perl.org

On Thu, May 07, 2015 at 01:30:45PM +0200, Adam Sjøgren wrote:
> I wrote:
>
> > When I insert new rows where the field has a Perl-array as the value,
> > DBIx::Class+DBD::Pg automatically stores them in the database, as hoped
> > - nice!
>
> > But when I read the field again, I get the textual "Postgres-encoded"
> > representation back (i.e. a string like '{"(a,b,c)","(d,e,f)"}').
>
> After looking more closely on this:
>
> Simple arrays work out of the box, i.e. text[] I can read/write and
> values are automatically converted from Perl arrays when writing to the
> database, and converted back to Perl arrays when reading.
>
> The problem I have is only when my datatype is a (custom) tuple.
>
> I.e. in Postgres I have:
>
> CREATE TYPE entry AS (tag TEXT, created TIMESTAMP, notes TEXT, initials TEXT);
> ALTER TABLE vehicle ADD COLUMN entrylist entry ARRAY;
>
> I would like to be able to do:
>
> $schema->resultset('Vehicle')->create({ entrylist => [
>     [ 'a', '2015-05-07', 'info', 'adsj' ],
>     [ 'b', '2015-05-07', 'more', 'adsj' ],
> ] });

Take a look at https://metacpan.org/pod/PGObject::Type::Composite

Tim.

Adam Sjøgren

May 8, 2015, 8:30:02 AM
to Tim Bunce, dbix-...@lists.scsys.co.uk, dbd...@perl.org, chris....@gmail.com, dbi-...@perl.org
Tim writes:

>> $schema->resultset('Vehicle')->create({ entrylist => [
>>     [ 'a', '2015-05-07', 'info', 'adsj' ],
>>     [ 'b', '2015-05-07', 'more', 'adsj' ],
>> ] });

> Take a look at https://metacpan.org/pod/PGObject::Type::Composite

Yeah, Chris pointed me to that as well.

I just thought that since DBD::Pg handles "simple" arrays, handling
arrays of tuples wouldn't be much of a further step.

On the write side, the above is turned into {{a,...},{b,...}}, so the
only problem is that the inner {...},{...} elements should be quoted
"(...)" row literals instead.
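
Something like this would probably do for that (a rough sketch; values
containing quotes, backslashes, commas or parentheses would need proper
escaping):

# Turn a list-ref of list-refs into the composite-array literal that
# Postgres accepts, with each row as a quoted "(...)" element.
sub perl_to_pg_composite_array {
    my ($tuples) = @_;
    my @rows = map { '"(' . join(',', @$_) . ')"' } @$tuples;
    return '{' . join(',', @rows) . '}';
}

# perl_to_pg_composite_array([ [ 'a', '2015-05-07', 'info', 'adsj' ] ])
# returns '{"(a,2015-05-07,info,adsj)"}'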

On the read side, there might be more complications.

I haven't looked into DBD::Pg, I was just curious if I was overlooking
existing functionality.


Thanks!

Adam

--
 "it will turn into pointer equality or something        Adam Sjøgren
  ghastly like that"                                as...@koldfront.dk

Greg Sabino Mullane

May 11, 2015, 2:30:01 PM
to dbix-...@lists.scsys.co.uk, dbd...@perl.org, dbi-...@perl.org
On Thu, May 07, 2015 at 10:04:47PM +0200, Adam Sjøgren wrote:
> I just thought that when DBD::Pg handles "simple" arrays, handling
> arrays of tuples wouldn't be much further a step.

The problem isn't so much *how* to do it as *when*. It cannot be done
when we are reading in a resultset and come across an unknown type. It
cannot be done at startup as the overhead of scanning the system tables
for custom types would be too heavy. So it would need to be an invoked
method. Then it would somehow need to be tied into the current type system.
Right now, DBD::Pg uses the Postgres source code to generate a static
list of core data types, and knows which ones represent arrays. There is
no easy way to shoehorn new entries in there after the fact, so it's not
a trivial undertaking. It may be easier/more feasible to set an attribute
and tell DBD::Pg directly that column X needs to be handled as an array,
but that gets messy in other ways.

If anyone wants to discuss the design further (and I do want to get this
functionality written sometime soon), let's talk on the dbd-pg list only.

--
Greg Sabino Mullane gr...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

Tim Bunce

May 12, 2015, 11:45:02 AM
to Greg Sabino Mullane, dbd...@perl.org
I think I'd like to see something along these lines:

$dbh->{pg_type_mumble} = { $pg_type_id => $pg_type_info, ... };

Where $pg_type_info contains whatever information DBD::Pg needs to handle
that data type.

Then it's no longer DBD::Pg's responsibility to get that information
on-demand. The application would have the responsibility to provide that
info for the types it needs to handle. I'd expect one or more modules to
be developed to gather that info. The info could be cached by the app,
if appropriate, to avoid refetching on each db connection.

Obviously we'd need to discuss what form that information might take.
The slowest but most flexible option would be to simply provide hooks:

$dbh->{pg_type_mumble} = {
    12345 => {
        perl_to_pg => sub { ... },
        pg_to_perl => sub { ... },
    },
    ...
};

which would be a handy fallback anyway.

Some other ideas:

12345 => 54321, # treat type 12345 like type 54321

pg_to_perl => $PG_TYPE_TREAT_AS_ARRAY, # DBD::Pg exported constant


I wouldn't expect this to do much as it would be implicit:

$sth->bind_col(1, \$foo, { pg_type => $pg_type_id });

but this might be handy to trigger appropriate serialization to postgres:

$sth->bind_param(1, $foo, { pg_type => $pg_type_id });
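
For Adam's case the two pieces might fit together roughly like this
(purely illustrative: the attribute name is still a placeholder,
$entry_array_oid would have to be looked up first, and the two helpers
are conversion subs of the kind sketched earlier in the thread):

$dbh->{pg_type_mumble} = {
    $entry_array_oid => {                  # oid of the entry[] type
        perl_to_pg => \&perl_to_pg_composite_array,
        pg_to_perl => \&composite_array_to_perl,
    },
};

# and then, for inserts:
$sth->bind_param(1, $tuples, { pg_type => $entry_array_oid });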

Tim.

p.s. I'm not really familiar with this part of DBD::Pg or the type
mechanism in libpq etc. so I may be talking nonsense here :)

Greg Sabino Mullane

May 13, 2015, 2:45:02 PM
to Tim Bunce, dbd...@perl.org
On Tue, May 12, 2015 at 04:26:08PM +0100, Tim Bunce wrote:
> $dbh->{pg_type_mumble} = { $pg_type_id => $pg_type_info, ... };
>
> Where $pg_type_info contains whatever information DBD::Pg needs to handle
> that data type.
>
> Then it's no longer DBD::Pg's responsibility to get that information
> on-demand. The application would have the responsibility to provide that
> info for the types it needs to handle. I'd expect one or more modules to
> be developed to gather that info. The info could be cached by the app,
> if appropriate, to avoid refetching on each db connection.

Sounds reasonable.

> Obviously we'd need to discuss what form that information might take.
> The slowest but most flexible option would be to simply provide hooks:
>
> $dbh->{pg_type_mumble} = {
>     12345 => {
>         perl_to_pg => sub { ... },
>         pg_to_perl => sub { ... },
>     },
>     ...
> };
>
> which would be a handy fallback anyway.
>
> Some other ideas:
>
> 12345 => 54321, # treat type 12345 like type 54321
>
> pg_to_perl => $PG_TYPE_TREAT_AS_ARRAY, # DBD::Pg exported constant

Having the user know the numeric ids of the types (both theirs and the
system ones) seems unreasonable - so we could support ids, but also allow
data type names where we can.

I think being able to handle arrays would solve 90% of the current
problems, so some simple shortcut as you mention above would be
good, maybe even something like:

$dbh->pg_type_is_array('foobar');

> I wouldn't expect this to do much as it would be implicit:
>
> $sth->bind_col(1, \$foo, { pg_type => $pg_type_id });
>
> but this might be handy to trigger appropriate serialization to postgres:
>
> $sth->bind_param(1, $foo, { pg_type => $pg_type_id });

Yeah, that could be another approach, but I like the first (dbh) version
a little better.

> p.s. I'm not really familar with this part of DBD::Pg or the type
> mechanism in libpq etc so I may be talking nonsense here :)

libpq pretty much stays out of the way except to return us a numeric
type for each returned column. We map this back to type information
stored in some structs (see types.c). We look at this when slinging
the data back and forth to see if we are dealing with an array. Presumably
we could someday internalize a hash-like mapping for things like hstore
and json (and even direct mapping to JSON someday).

I still like the idea of auto-discovery by DBD::Pg, maybe just a simple command
that scans all of the custom types and discovers which are arrays, perhaps
to supplement the above. As a one-time, on-demand command per session,
it shouldn't be too expensive.
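
For the array part, a single catalog query along these lines would
probably be enough (just a sketch):

# List non-core array types and the element type each one wraps.
my $sth = $dbh->prepare(q{
    SELECT t.oid     AS array_oid,
           t.typname AS array_name,
           e.oid     AS elem_oid,
           e.typname AS elem_name
      FROM pg_catalog.pg_type t
      JOIN pg_catalog.pg_type e ON e.oid = t.typelem
     WHERE t.typcategory = 'A'
       AND t.oid >= 16384  -- skip built-in types
});
$sth->execute;
my %custom_array_type;
while (my $row = $sth->fetchrow_hashref) {
    $custom_array_type{ $row->{array_oid} } = $row;
}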

Tim Bunce

May 18, 2015, 7:30:03 AM
to Greg Sabino Mullane, Tim Bunce, dbd...@perl.org
On Wed, May 13, 2015 at 02:33:02PM -0400, Greg Sabino Mullane wrote:
> On Tue, May 12, 2015 at 04:26:08PM +0100, Tim Bunce wrote:
>
> > Obviously we'd need to discuss what form that information might take.
> > The slowest but most flexible option would be to simply provide hooks:
> >
> > $dbh->{pg_type_mumble} = {
> >     12345 => {
> >         perl_to_pg => sub { ... },
> >         pg_to_perl => sub { ... },
> >     },
> >     ...
> > };
> >
> > which would be a handy fallback anyway.
> >
> > Some other ideas:
> >
> > 12345 => 54321, # treat type 12345 like type 54321
> >
> > pg_to_perl => $PG_TYPE_TREAT_AS_ARRAY, # DBD::Pg exported constant
>
> Having the user know the numeric ids of the types (both theirs and the
> system ones) seems unreasonable - so we could support it, but also allow
> data type names when we can).

Of course, so long as the names can be reliably and quickly converted to
the underlying real integer values. But that requires a db query, so
the API ought to allow the mapping to be supplied so it doesn't have to
be recalculated for each connection to the same db. Part of my point is
that that kind of information could be off-loaded from DBD::Pg into a
supporting module, possibly supplied with DBD::Pg.

$dbh->{pg_type_name_map} = { 'foobar' => 98765, ... };
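
For instance, a small helper could build that map once per database and
let the app cache it (a sketch; pg_type_name_map is only a placeholder
name, and the lookup leans on the regtype cast to resolve names,
including array notation, to oids):

my %name_map;
for my $name ('entry', 'entry[]') {
    my ($oid) = $dbh->selectrow_array('SELECT ?::regtype::oid', undef, $name);
    $name_map{$name} = $oid;
}
$dbh->{pg_type_name_map} = \%name_map;    # hypothetical attribute, as above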

> I think being able to handle arrays would solve 90% of the current
> problems, so some simple shortcut as you mention above would be
> good, maybe even something like:
>
> $dbh->pg_type_is_array('foobar');

which could be generalized to:

$dbh->pg_type_treatment('foobar', $numeric_id_of_array_type);

which could be further generalized to:

$dbh->pg_type_treatment('foobar', 'array');

which could be further generalized to optionally take a hash ref:

$dbh->pg_type_treatment('foobar', { hash of type handling attributes });

and then we're almost back to the hash proposal.

A db query would be needed to map 'foobar' to the corresponding id.
That query couldn't be cached, unlike my attribute API proposal above.
(You could allow the typename to be treated as an integer id if it looks like
one, which would avoid the db query for each connection, but that's a bit
hackish and pushes complexity onto the app.)


> > I wouldn't expect this to do much as it would be implicit:
> >
> > $sth->bind_col(1, \$foo, { pg_type => $pg_type_id });
> >
> > but this might be handy to trigger appropriate serialization to postgres:
> >
> > $sth->bind_param(1, $foo, { pg_type => $pg_type_id });
>
> Yeah, that could be another approach, but I like the first (dbh) version
> a little better.

They are related. After the app has told DBD::Pg how to handle the type:

$dbh->{pg_type_mumble} = { 9876 => ...how-to-handle-that-type... };

then it can specify that a specific value is of that type:

$sth->bind_param(1, $foo, { pg_type => 9876 });

> > p.s. I'm not really familar with this part of DBD::Pg or the type
> > mechanism in libpq etc so I may be talking nonsense here :)
>
> libpq pretty much stays out of the way except to return us a numeric
> type for each returned column. We map this back to type information
> stored in some structs (see types.c). We look at this when slinging
> the data back and forth to see if we are dealing with an array. Presumably
> we could someday internalize a hash-like mapping for things like hstore
> and json (and even direct mapping to JSON someday).
>
> I still like the idea of auto-discovery by DBD::Pg, maybe just a simple command
> that scans all of the custom types and discovers which are arrays. Perhaps
> to supplement the above. As a one-time per session and on-demand command,
> it shouldn't be too expensive.

Arrays are just one use-case, albeit the most common one. I'm sure many
people would like JSON, for example, to "just work" using their
preferred JSON module, and some would like Geometry types to "just work"
using eg https://metacpan.org/pod/distribution/Geo-JSON/README.pod

Whatever API is added ought to enable support for these use-cases, even
if they're not supported initially. Though allowing at least callback
hooks for serialization and deserialization opens the door for other
people to add support for specific types, like geometry, themselves.
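
For example, against the hypothetical hook API above, JSON support
could be as little as this (114 being the oid of the core json type):

use JSON::PP qw(encode_json decode_json);

$dbh->{pg_type_mumble} = {
    114 => {
        perl_to_pg => sub { encode_json($_[0]) },
        pg_to_perl => sub { decode_json($_[0]) },
    },
};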

Tim.