
Interest in serializing composite types?


Chris Travers

Aug 24, 2014, 11:07:03 AM
to DBD PG
Hi everyone;

I am nearing completion on the logic to serialize composite types into the textual representation of tuples for LedgerSMB, and I am wondering if there is additional interest from others who use DBD::Pg.  If there is, we would certainly release the logic and interfaces on CPAN.  So I figured I would ask.

For those who would be interested, I guess I have a few questions:

1.  Do you need Moose or Moo support?
2.  Do you need plain old hashref support?
3.  Do you need bytea support?

If this is unwelcome, please ignore it, but this seemed like a good place to ask since it is PostgreSQL/Perl-specific and many of you might have thought about using composite types in this way.

In case it isn't clear, what I am talking about is this:

given a type:

CREATE TYPE foo AS (
   bar text,
   baz text
);

and a hashref {bar => 'foo', baz => 'this, or else that'}

it should produce (foo,"this, or else that")
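
To make that concrete, here is a minimal sketch of such a serializer. It
assumes the column order is already known (in practice that comes from a
catalog lookup) and follows the composite-type text syntax for quoting:

use strict;
use warnings;

# Serialize a hashref into a composite-type literal, given column order.
sub serialize_composite {
    my ($columns, $href) = @_;
    my @parts;
    for my $col (@$columns) {
        my $v = $href->{$col};
        if (!defined $v) {
            push @parts, '';                   # NULL is an empty element
        }
        else {
            my $needs_quoting = $v eq '' || $v =~ /[",()\\\s]/;
            $v =~ s/(["\\])/\\$1/g;            # escape quotes and backslashes
            $v = qq{"$v"} if $needs_quoting;
            push @parts, $v;
        }
    }
    return '(' . join(',', @parts) . ')';
}

print serialize_composite([qw(bar baz)],
    { bar => 'foo', baz => 'this, or else that' }), "\n";
# prints: (foo,"this, or else that")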

--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Tim Bunce

Aug 24, 2014, 6:26:16 PM
to Chris Travers, DBD PG
On Sun, Aug 24, 2014 at 08:07:03AM -0700, Chris Travers wrote:
> Hi everyone;
> I am nearing completion on the logic to serialize composite types into textual representation of tuples
> for LedgerSMB and I am wondering if there is additional interest from others who use DBD::Pg.

> In case it isn't clear what I am talking about is:
> given a type:
> CREATE TYPE foo AS (
> bar text,
> baz text
> );
> and a hashref {bar => 'foo', baz => 'this, or else that'}
> it should produce (foo,"this, or else that")

I'm not following you, Chris.
What's the "it" here and where does it fit into the data flow?

Could you post some example code, ideally showing values of type "foo"
in use in a table and showing both SELECT and bind/INSERT use-cases?

Tim.

Darren Duncan

Aug 24, 2014, 7:38:07 PM
to Chris Travers, DBD PG
Chris, your idea seems good to me in principle; that is, when the Postgres value
is of some composite type, we support it being automatically exposed as a
composite value in Perl.

By default, I would recommend keeping it simple.

As I recall, DBD::Pg already supports Postgres arrays as Perl arrays, and
similarly I would recommend supporting composite type values like you mention,
which are structurally tuples, as Perl hash refs for the members.

For a general form for named composite types, I suggest representing a foo like
this: "['foo',{bar=>'what',baz=>'ev'}]"; it's the simplest form that keeps all
the information. The 'foo' part could also be generalized to however
fully-qualified entity names are represented, e.g. ['public','foo'].

-- Darren Duncan

Chris Travers

Aug 24, 2014, 7:49:57 PM
to Tim Bunce, DBD PG
On Sun, Aug 24, 2014 at 3:26 PM, Tim Bunce <Tim....@pobox.com> wrote:
On Sun, Aug 24, 2014 at 08:07:03AM -0700, Chris Travers wrote:
>    Hi everyone;
>    I am nearing completion on the logic to serialize composite types into textual representation of tuples
>    for LedgerSMB and I am wondering if there is additional interest from others who use DBD::Pg.

>    In case it isn't clear what I am talking about is:
>    given a type:
>    CREATE TYPE foo AS (
>        bar text,
>        baz text
>    );
>    and a hashref {bar => 'foo', baz => 'this, or else that'}
>    it should produce (foo,"this, or else that")

> I'm not following you, Chris.
> What's the "it" here and where does it fit into the data flow?

It would be the serializer.

What I am looking at specifically for my use case is that I can have an object which is tied to a composite type's structure in the db.  This could then be passed into stored procedures.  Here's a basic example:

CREATE TYPE journal_entry_iface AS (
   id int,
   source text,
   description text,
   post_date date,
   approved bool,
   lines journal_line[]
);

where journal_line is another composite type (here probably a table)

Then I can have a function like:

CREATE OR REPLACE FUNCTION save(self journal_entry_iface)
RETURNS BOOL
LANGUAGE SQL AS
$$
    INSERT INTO journal_entry(source, description, post_date, approved)
    VALUES (self.source, self.description, self.post_date, self.approved);
    INSERT INTO journal_line
    SELECT * FROM unnest(self.lines);
    SELECT TRUE; 
$$;
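
A hypothetical call site from Perl would then just bind the serialized tuple
as text and cast it on the way in (the literal below is purely illustrative):

my $sth = $dbh->prepare('SELECT save(?::journal_entry_iface)');
$sth->execute('(1,gj,"Office supplies",2014-08-24,t,{})');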

> Could you post some example code, ideally showing values of type "foo"
> in use in a table and showing both SELECT and bind/INSERT use-cases?

However, the above could be done for straight inserts too, since composite types can be used as column types in PostgreSQL.

So you could have:

CREATE TYPE currency_amount AS (
    currency char(3),
    amount numeric
);
CREATE FUNCTION text(currency_amount) RETURNS text LANGUAGE SQL AS
$$ SELECT $1.amount::text || ' ' || $1.currency; $$;

You could then serialize or deserialize currency amount from your application but call text() around it to display it nicely in reports.  This might be handy for multicurrency applications (though there are of course other ways to do that).
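
To sketch the SELECT and bind/INSERT use-cases with that type (the products
table here is hypothetical):

$dbh->do(q{CREATE TABLE products (name text, price currency_amount)});

# INSERT: bind the composite as its text form and cast
$dbh->do(q{INSERT INTO products VALUES (?, ?::currency_amount)},
         undef, 'Widget', '(USD,19.99)');

# SELECT: either fetch the raw tuple text, or format it server-side
my ($pretty) = $dbh->selectrow_array(
    q{SELECT text(price) FROM products WHERE name = ?}, undef, 'Widget');
# $pretty is "19.99 USD"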

> Tim.

Chris Travers

Aug 24, 2014, 8:33:04 PM
to Darren Duncan, DBD PG
On Sun, Aug 24, 2014 at 4:38 PM, Darren Duncan <dar...@darrenduncan.net> wrote:
> Chris, your idea seems good to me in principle; that is, when the Postgres value is of some composite type, we support it being automatically exposed as a composite value in Perl.

> By default, I would recommend keeping it simple.

> As I recall, DBD::Pg already supports Postgres arrays as Perl arrays, and similarly I would recommend supporting composite type values like you mention, which are structurally tuples, as Perl hash refs for the members.

> For a general form for named composite types, I suggest representing a foo like this: "['foo',{bar=>'what',baz=>'ev'}]"; it's the simplest form that keeps all the information. The 'foo' part could also be generalized to however fully-qualified entity names are represented, e.g. ['public','foo'].

I would love to see this in DBD::Pg.  However there are some complexities here not present in arrays.  Arrays are at least in theory self-contained.  We don't have to ask the db how to parse them.  Tuples are not (we need to do a catalog lookup to find the order of elements).

Do we want that to be memoized?  My inclination is that the developer should be in control there, but that would add some complexity and possibly a (core) dependency.  The dependency doesn't strike me as a problem but the design decisions do.

So I am thinking this way:

First draft will probably be in my PGObject framework.  Once this is working and stable, I will look at porting the core logic to DBD::Pg.  I won't promise proper Moose handling of things like lazy attributes in this version, but it should be able to handle hashrefs.

Does that sound like a reasonable plan?

Best Wishes,
Chris Travers




Darren Duncan

Aug 24, 2014, 9:32:25 PM
to Chris Travers, DBD PG
On 2014-08-24, 5:33 PM, Chris Travers wrote:
> On Sun, Aug 24, 2014 at 4:38 PM, Darren Duncan <dar...@darrenduncan.net> wrote:
> As I recall, DBD::Pg already supports Postgres arrays as Perl arrays, and
> similarly I would recommend supporting composite type values like you
> mention, which are structurally tuples, as Perl hash refs for the members.
>
> For a general form for named composite types, I suggest representing a foo
> like this: "['foo',{bar=>'what',baz=>'ev'}]"; it's the simplest form that
> keeps all the information; the 'foo' part could also be generalized to
> however fully-qualified entity names are represented, e.g. ['public','foo'].
>
> I would love to see this in DBD::Pg. However there are some complexities here
> not present in arrays. Arrays are at least in theory self-contained. We don't
> have to ask the db how to parse them. Tuples are not (we need to do a catalog
> lookup to find the order of elements).
>
> Do we want that to be memoized? My inclination is maybe the developer should be
> in control there but that would also add some complexity there and possibly a
> (core) dependency. The dependency doesn't strike me as a problem but the design
> decisions do.

I think composite types should be treated in exactly the same manner as tuples
in regular queries. Whenever we look up information to parse the outside-most
tuples of query results, do inner tuples/types then too. If it isn't possible
to do that at prepare() time, then do it at execute() time. Do not repeat for
each row fetched from the result of an execute(), assuming data is homogeneous.
This is the simplest way to know that the type definitions we're working with
are still valid for the results.

We do not want to add any non-core dependencies. All that should be required to
use DBD::Pg is Perl itself, DBI, and DBD::Pg. Any extra dependencies should be
optional and user-defined, eg the user of DBD::Pg registers some sort of handler
against DBI or DBD::Pg to eg override behavior to memoize for speed, but DBD::Pg
has no knowledge of this other than to make the relevant parts possible to
override using said generic API.
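
To make the shape of that concrete, something purely hypothetical like this
(the attribute and helper names are invented; nothing like this exists in
DBD::Pg today):

my %type_cache;
$dbh->{pg_type_info_hook} = sub {      # hypothetical attribute name
    my ($dbh, $type_name) = @_;
    return $type_cache{$type_name}
        //= default_type_lookup($dbh, $type_name);   # invented helper
};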

> So I am thinking this way:
>
> First draft will probably be in my PGObject framework. Once this is working and
> stable, I will look at porting the core logic to DBD::Pg.

Sounds like a plan. Test things out externally and only port something simple
to DBD::Pg if there's sufficient agreement and lack of opposition, because for
example there might be several ways to design something and we'd want to
pick what is most flexible or sufficiently simple. Analogous to how the Perl
core has gradually added minimal support for various object or role systems but
most details have been kept out of the core.

> I won't promise
> proper Moose handling of things like lazy attributes in this version, but it
> should be able to handle hashrefs.
>
> Does that sound like a reasonable plan?

I don't know what lazy attributes have to do with this, unless you're talking
about only deserializing a composite-typed value if it is actually accessed.
Either way, DBI/DBD::Pg itself should know nothing about Moose or any other
non-core object systems and just represent things with what tools the core
provides; your framework can do that as you see fit though.

I don't know if it does this (please tell me), but you know what would be an
extremely useful feature in Postgres itself? A simple, fast way to ask the
database whether any schema changes (anything DDL does) have happened since it
was last asked. This could take the form of a LISTEN/NOTIFY on
some system-defined channel. If we assume in practice that schema changes are
infrequent, then we can safely memoize any schema knowledge we have and know it
is still valid until we get such a message saying there was a change, and then
we flush the cache and check what we need to know again.

The aforementioned would be widely useful and relatively simple to provide, so
if it doesn't exist I think I'll formally propose it to hackers.

Maybe it can be connected to the in-development feature about DDL triggers, or
be implemented in terms of such, eg a DDL trigger fires off a NOTIFY. But if it
was provided by the system automatically, users wouldn't have to explicitly define
triggers / alter the schema in order to have the feature.

-- Darren Duncan

Darren Duncan

Aug 24, 2014, 10:03:39 PM
to Chris Travers, DBD PG
On 2014-08-24, 6:32 PM, Darren Duncan wrote:
> I don't know if it does this (please tell me), but you know what would be an
> extremely useful feature in Postgres itself? A simple, fast way to ask the
> database whether any schema changes (anything DDL does) have happened since it
> was last asked. This could take the form of a LISTEN/NOTIFY on
> some system-defined channel. If we assume in practice that schema changes are
> infrequent, then we can safely memoize any schema knowledge we have and know it
> is still valid until we get such a message saying there was a change, and then
> we flush the cache and check what we need to know again.
>
> The aforementioned would be widely useful and relatively simple to provide, so
> if it doesn't exist I think I'll formally propose it to hackers.
>
> Maybe it can be connected to the in-development feature about DDL triggers, or
> be implemented in terms of such, eg a DDL trigger fires off a NOTIFY. But if it
> was provided by the system automatically, users wouldn't have to explicitly define
> triggers / alter the schema in order to have the feature.

Replying to myself, I looked it up.

Starting with Postgres 9.3, which has been production-ready for 11 months now,
it looks like most of what I proposed is possible.

I'm not sure if I'm interpreting things right, but perhaps it is as simple as
doing a LISTEN on "ddl_command_start", unless that's not the same kind of event.

Otherwise, here's longer-hand for what I meant:

CREATE OR REPLACE FUNCTION report_schema_did_change()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
BEGIN
NOTIFY event_schema_did_change;
END;
$$;

CREATE EVENT TRIGGER trg_schema_did_change ON ddl_command_start
EXECUTE PROCEDURE report_schema_did_change();

... in which case you LISTEN on "event_schema_did_change".
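
On the Perl side, DBD::Pg's existing notification support can pick that up; a
rough sketch, where flush_schema_cache() stands in for whatever the
application does:

$dbh->do('LISTEN event_schema_did_change');

while (1) {
    # pg_notifies returns [name, pid, payload] or undef
    while (my $notify = $dbh->pg_notifies) {
        my ($name, $pid) = @$notify;
        flush_schema_cache() if $name eq 'event_schema_did_change';
    }
    sleep 1;   # a select() on the connection socket would avoid polling
}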

See http://www.postgresql.org/docs/9.3/static/event-triggers.html et al.

Unfortunately that doesn't trigger for some kinds of DDL, eg on whole schemas,
but it does trigger for most individual kinds of schema objects.

Chris, so my point is that probably the best way to handle memoization is to
take advantage of said feature of Postgres 9.3+, so one can get more
performance in DBD::Pg or the tools using it.

-- Darren Duncan

Darren Duncan

Aug 24, 2014, 10:23:04 PM
to Chris Travers, DBD PG
What you proposed here sounds reasonable to me. -- Darren Duncan

On 2014-08-24, 7:15 PM, Chris Travers wrote:
> I think composite types should be treated in exactly the same manner as
> tuples in regular queries. Whenever we look up information to parse the
> outside-most tuples of query results, do inner tuples/types then too.
>
>
> Sure. There is of course a question of responsibility here. For OO stuff, I
> don't mind saying "it's the object's responsibility to serialize" but we can't
> put that in DBD::Pg. That is, however, the assumption in PGObject.
>
> Because it will make the lazy attribute problems clearer, I will describe the
> PGObject framework here.
>
> PGObject.pm is essentially glue which ties in objects to the db via common
> defined interfaces (essentially duck typing). Then the actual object
> implementations (currently limited to PGObject::Simple for blessed hashrefs and
> PGObject::Simple::Role for /Moo(se)?$/) then provide mapping services between
> stored procedures and application methods. Because of the common interfaces,
> object-responsible serialization means one can have a custom datetime handler
> which serializes and deserializes appropriately (for simple types).
>
> If it isn't possible to do that at prepare() time, then do it at
> execute() time. Do not repeat for each row fetched from the result of an
> execute(), assuming data is homogeneous. This is the simplest way to know
> that the type definitions we're working with are still valid for the results.
>
>
> So basically if we see a hashref, we need to know what type it is. This is
> going to be outside SQL. We can do this with stored procs because we can look
> up expected input types. I don't see how you can do something on an insert or
> update statement though without the application specifying type. I don't see
> how this could be done outside a bind_param call. Nested tuples at that point
> cease to be a major problem.
>
>
> We do not want to add any non-core dependencies. All that should be
> required to use DBD::Pg is Perl itself, DBI, and DBD::Pg.
>
>
> Memoize is core and has been since at least 5.8.9 according to what I have been
> able to find. This being said, memoization is dangerous when it comes to db
> stuff so it should definitely be optional if used at all.
>
> Of course, with a public API, there's no reason one can't memoize a wrapper
> function.
>
> Any extra dependencies should be optional and user-defined, eg the user
> of DBD::Pg registers some sort of handler against DBI or DBD::Pg to eg
> override behavior to memoize for speed, but DBD::Pg has no knowledge of this
> other than to make the relevant parts possible to override using said
> generic API.
>
>
> Right.
>
>
>
>
>
> I won't promise
> proper Moose handling of things like lazy attributes in this version, but it
> should be able to handle hashrefs.
>
> Does that sound like a reasonable plan?
>
>
> I don't know what lazy attributes have to do with this, unless you're
> talking about only deserializing a composite-typed value if it is actually
> accessed.
>
>
> No.  The problem is you have a Moose object with a lazy attribute.  When you want
> to serialize it, that attribute may not have been initialized yet. There are a
> number of solutions to this problem:
>
> 1. Make it the object's responsibility to initialize lazy attributes before
> calling db methods. That is what we did in LedgerSMB 1.4, but we found it error
> prone.
>
> 2. Go through accessors if available. This is what I recently did in
> PGObject::Simple::Role and that makes things work for other object systems which
> don't map to simple hashrefs.
>
> I think the obvious way to do this would be to have DBD::Pg *only* work on
> hashrefs, but make the catalog lookups into a public API. This would allow
> other implementations to handle serialization and just hand in a string.
>
> This would keep DBD::Pg's responsibility here to a minimum.
>
> Either way, DBI/DBD::Pg itself should know nothing about Moose or any other
> non-core object systems and just represent things with what tools the core
> provides; your framework can do that as you see fit though.
>
>
> I am actually thinking that a well-designed public API handling only hashrefs
> would be the answer here. This way you could ask for a type definition, and
> generate a hashref with appropriate information. This could then be passed in
> again using a public API so no catalog needs to be hit.
>
>
> I don't know if it does this (please tell me), but you know what would be an
> extremely useful feature in Postgres itself? A simple, fast way to ask the
> database whether any schema changes (anything DDL does) have happened since
> it was last asked. This could take the form of a
> LISTEN/NOTIFY on some system-defined channel.
>
>
> http://www.postgresql.org/docs/9.3/static/sql-createeventtrigger.html is the way
> to do it. That's outside the scope of DBD::Pg, though with a public API, an
> application could listen and ask to requery.
>
> If we assume in practice that schema changes are infrequent, then we can
> safely memoize any schema knowledge we have and know it is still valid until
> we get such a message saying there was a change, and then we flush the cache
> and check what we need to know again.
>
>
> Right. What I do with PGObject.pm's stored proc lookups is make memoization
> optional (because I don't know how an app developer wants to handle db schema
> upgrades), but where it is memoized, we flush the cache when we get an error
> about bad arguments and we provide a public API to flush the cache. This allows
> developers to decide how to handle these changes with a bit of a failsafe if
> they forget.
>
> I think memoization should be optional. There are a lot of traps with it that
> are way outside DBD::Pg's scope both on the db side and on the app development
> side (oops, we used shift on the return value).
>
> -- Darren Duncan

Chris Travers

Aug 24, 2014, 10:15:46 PM
to Darren Duncan, DBD PG


> I think composite types should be treated in exactly the same manner as tuples in regular queries.  Whenever we look up information to parse the outside-most tuples of query results, do inner tuples/types then too.

Sure.  There is of course a question of responsibility here.  For OO stuff, I don't mind saying "it's the object's responsibility to serialize" but we can't put that in DBD::Pg.  That is, however, the assumption in PGObject.

Because it will make the lazy attribute problems clearer, I will describe the PGObject framework here.

PGObject.pm is essentially glue which ties in objects to the db via common defined interfaces (essentially duck typing).  The actual object implementations (currently limited to PGObject::Simple for blessed hashrefs and PGObject::Simple::Role for /Moo(se)?$/) then provide mapping services between stored procedures and application methods.  Because of the common interfaces, object-responsible serialization means one can have a custom datetime handler which serializes and deserializes appropriately (for simple types).
 
> If it isn't possible to do that at prepare() time, then do it at execute() time.  Do not repeat for each row fetched from the result of an execute(), assuming data is homogeneous.  This is the simplest way to know that the type definitions we're working with are still valid for the results.

So basically if we see a hashref, we need to know what type it is.  This is going to be outside SQL.  We can do this with stored procs because we can look up expected input types.  I don't see how you can do something on an insert or update statement though without the application specifying the type.  I don't see how this could be done outside a bind_param call.  Nested tuples at that point cease to be a major problem.

> We do not want to add any non-core dependencies.  All that should be required to use DBD::Pg is Perl itself, DBI, and DBD::Pg.
Memoize is core and has been since at least 5.8.9 according to what I have been able to find. This being said, memoization is dangerous when it comes to db stuff so it should definitely be optional if used at all.

Of course, with a public API, there's no reason one can't memoize a wrapper function.
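
For example, a sketch along these lines, where lookup_composite_columns()
stands in for whatever public API emerges:

use Memoize;   # in core

sub lookup_composite_columns {
    my ($dbh, $type_name) = @_;
    # ... catalog query via the public API would go here ...
}

# cache on the type name only, not the handle
memoize('lookup_composite_columns', NORMALIZER => sub { $_[1] });

# and when the schema changes, or on a bad-arguments error:
# Memoize::flush_cache('lookup_composite_columns');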
 
> Any extra dependencies should be optional and user-defined, eg the user of DBD::Pg registers some sort of handler against DBI or DBD::Pg to eg override behavior to memoize for speed, but DBD::Pg has no knowledge of this other than to make the relevant parts possible to override using said generic API.

Right. 




>> I won't promise
>> proper Moose handling of things like lazy attributes in this version, but it
>> should be able to handle hashrefs.
>>
>> Does that sound like a reasonable plan?

> I don't know what lazy attributes have to do with this, unless you're talking about only deserializing a composite-typed value if it is actually accessed.

No.  The problem is you have a Moose object with a lazy attribute.  When you want to serialize it, that attribute may not have been initialized yet.  There are a number of solutions to this problem:

1.  Make it the object's responsibility to initialize lazy attributes before calling db methods.  That is what we did in LedgerSMB 1.4, but we found it error prone.

2.  Go through accessors if available.  This is what I recently did in PGObject::Simple::Role and that makes things work for other object systems which don't map to simple hashrefs.

I think the obvious way to do this would be to have DBD::Pg *only* work on hashrefs, but make the catalog lookups into a public API.  This would allow other implementations to handle serialization and just hand in a string.

This would keep DBD::Pg's responsibility here to a minimum.
 
> Either way, DBI/DBD::Pg itself should know nothing about Moose or any other non-core object systems and just represent things with what tools the core provides; your framework can do that as you see fit though.
 
I am actually thinking that a well-designed public API handling only hashrefs would be the answer here.  This way you could ask for a type definition, and generate a hashref with appropriate information.  This could then be passed in again using a public API so no catalog needs to be hit.

> I don't know if it does this (please tell me), but you know what would be an extremely useful feature in Postgres itself?  A simple, fast way to ask the database whether any schema changes (anything DDL does) have happened since it was last asked.  This could take the form of a LISTEN/NOTIFY on some system-defined channel.
http://www.postgresql.org/docs/9.3/static/sql-createeventtrigger.html is the way to do it.  That's outside the scope of DBD::Pg, though with a public API, an application could listen and ask to requery.
 
> If we assume in practice that schema changes are infrequent, then we can safely memoize any schema knowledge we have and know it is still valid until we get such a message saying there was a change, and then we flush the cache and check what we need to know again.

Right.  What I do with PGObject.pm's stored proc lookups is make memoization optional (because I don't know how an app developer wants to handle db schema upgrades), but where it is memoized, we flush the cache when we get an error about bad arguments, and we provide a public API to flush the cache.  This allows developers to decide how to handle these changes with a bit of a failsafe if they forget.

I think memoization should be optional.  There are a lot of traps with it that are way outside DBD::Pg's scope both on the db side and on the app development side (oops, we used shift on the return value).

> -- Darren Duncan

Greg Sabino Mullane

Aug 25, 2014, 1:52:21 PM
to Chris Travers, Darren Duncan, DBD PG
On Sun, Aug 24, 2014 at 05:33:04PM -0700, Chris Travers wrote:
> I would love to see this in DBD::Pg. However there are some complexities
> here not present in arrays. Arrays are at least in theory self-contained.
> We don't have to ask the db how to parse them. Tuples are not (we need to
> do a catalog lookup to find the order of elements).

The biggest problem is that any composite types are going to be user-defined
types, and thus DBD::Pg cannot know about them in advance (as it does for
arrays). This problem has reared its head already. We can query the system
catalogs to gather information about data types, but the key point is when?
On connection? When we encounter an unknown type? (tricky, as we are already
inside of an active transaction). Perhaps allow the user to call some method
that tells us to grab the information? I'm thinking some combination of
the former and latter will work: allows a connection attribute that says
"grab information about all user types right after you connect" as well as a
method that does the same thing (or allows some wildcards, etc.). Once that
is in place, it should be relatively easy to roll multiple columns into a
hashref .. I think. Haven't actually dug into that part of the code yet :)
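
The lookup itself is straightforward; one plausible shape for it, ignoring
schema qualification:

# column names and order for a named composite type
my $cols = $dbh->selectall_arrayref(q{
    SELECT a.attname, format_type(a.atttypid, a.atttypmod)
      FROM pg_type t
      JOIN pg_attribute a ON a.attrelid = t.typrelid
     WHERE t.typname = ?
       AND a.attnum > 0
       AND NOT a.attisdropped
     ORDER BY a.attnum
}, undef, 'currency_amount');
# e.g. [ ['currency', 'character(3)'], ['amount', 'numeric'] ]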

--
Greg Sabino Mullane gr...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

David E. Wheeler

Aug 25, 2014, 1:53:49 PM
to Greg Sabino Mullane, Chris Travers, Darren Duncan, DBD PG
On Aug 25, 2014, at 10:52 AM, Greg Sabino Mullane <gr...@endpoint.com> wrote:

> The biggest problem is that any composite types are going to be user-defined
> types, and thus DBD::Pg cannot know about them in advance (as it does for
> arrays). This problem has reared its head already. We can query the system
> catalogs to gather information about data types, but the key point is when?
> On connection? When we encounter an unknown type? (tricky, as we are already
> inside of an active transaction). Perhaps allow the user to call some method
> that tells us to grab the information? I'm thinking some combination of
> the former and latter will work: allows a connection attribute that says
> "grab information about all user types right after you connect" as well as a
> method that does the same thing (or allows some wildcards, etc.). Once that
> is in place, it should be relatively easy to roll multiple columns into a
> hashref .. I think. Haven't actually dug into that part of the code yet :)

OTOH, would be awesome to have an attribute to serialize and deserialize JSON and JSONB values. :-)

D


Andrew Dunstan

Aug 25, 2014, 2:32:15 PM
to David E. Wheeler, Greg Sabino Mullane, Chris Travers, Darren Duncan, DBD PG
Well, of course, json and jsonb are already serializations. Even jsonb
in binary mode is sent to the client as a json string (with a version
number in case we ever decide to send/receive a different binary format).

And turning a composite into json is pretty simple - in 9.4 it's just
to_json(recordval). json_populate_record(null::recordtype, jsonval) pretty much
does the reverse. On the Perl side there are well-known modules for
moving between hashes and json objects. Maybe with some sort of Tie
mechanism you could make that almost transparent?
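
Roughly like this, say, against a hypothetical foo_table whose row type we
round-trip:

use JSON::PP ();   # in core since 5.14; JSON::XS is the fast alternative

# out: composite row -> json -> Perl hashref
my ($json) = $dbh->selectrow_array(
    q{SELECT to_json(f) FROM foo_table f LIMIT 1});
my $row = JSON::PP->new->decode($json);

# back in: Perl hashref -> json -> composite row
my $sth = $dbh->prepare(
    q{SELECT * FROM json_populate_record(null::foo_table, ?::json)});
$sth->execute(JSON::PP->new->encode($row));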

cheers

andrew


David E. Wheeler

Aug 25, 2014, 2:36:03 PM
to Andrew Dunstan, Greg Sabino Mullane, Chris Travers, Darren Duncan, DBD PG
On Aug 25, 2014, at 11:32 AM, Andrew Dunstan <and...@dunslane.net> wrote:

> Well, of course, json and jsonb are already serializations. Even jsonb in binary mode is sent to the client as a json string (with a version number in case we ever decide to send/receive a different binary format).
>
> And turning a composite into json is pretty simple - in 9.4 it's just to_json(recordval). json_populate_record(null::recordtype, jsonval) pretty much does the reverse. On the Perl side there are well-known modules for moving between hashes and json objects. Maybe with some sort of Tie mechanism you could make that almost transparent?

No, I meant convert /JSONB?/ values to and from hashes or arrays.

D


Andrew Dunstan

Aug 25, 2014, 2:58:08 PM
to David E. Wheeler, Greg Sabino Mullane, Chris Travers, Darren Duncan, DBD PG
I'm not following.

In the hands of a client jsonb is still just a string, more or less. And
of course json is just a string anyway. We never send the jsonb on-disk
format to a client, even in binary mode.

cheers

andrew

David E. Wheeler

Aug 25, 2014, 2:59:22 PM
to Andrew Dunstan, Greg Sabino Mullane, Chris Travers, Darren Duncan, DBD PG
On Aug 25, 2014, at 11:58 AM, Andrew Dunstan <and...@dunslane.net> wrote:

> In the hands of a client jsonb is still just a string, more or less. And of course json is just a string anyway. We never send the jsonb on-disk format to a client, even in binary mode.

I mean for DBD::Pg to convert between /JSONB/ strings and Perl hashes and/or arrays.

David


David E. Wheeler

Aug 25, 2014, 3:15:29 PM
to Andrew Dunstan, Greg Sabino Mullane, Chris Travers, Darren Duncan, DBD PG
On Aug 25, 2014, at 12:11 PM, Andrew Dunstan <and...@dunslane.net> wrote:

> Well, there's no difference from DBD::Pg's POV between json and jsonb. And as I mentioned before, there are numerous modules for converting between json and native perl structures.

I know. What I'm suggesting is a DBD::Pg switch that auto-converts between /JSONB?/ values and hashes and arrays (assuming an appropriate JSON parser is installed).

Best,

David


Andrew Dunstan

Aug 25, 2014, 3:11:56 PM
to David E. Wheeler, Greg Sabino Mullane, Chris Travers, Darren Duncan, DBD PG
Well, there's no difference from DBD::Pg's POV between json and jsonb.
And as I mentioned before, there are numerous modules for converting
between json and native perl structures.

cheers

andrew

David E. Wheeler

Aug 25, 2014, 6:43:17 PM
to Greg Sabino Mullane, Andrew Dunstan, Chris Travers, Darren Duncan, DBD PG
On Aug 25, 2014, at 3:41 PM, Greg Sabino Mullane <gr...@endpoint.com> wrote:

> That's already the plan. Since we know the json type oid already, we
> will add a flag such that when on, json can be slung back and forth
> seamlessly as arrays are now. Arrays are already fully automatic and
> out the barn door, but perhaps a more generic interface such as:
>
> $dbh->pg_automatic_convert('json');
>
> In which 'json' can be replaced by other things down the road. Or even
> allow a second arg to a user coderef/object that knows how to convert
> certain things.

You would need two code refs for back and forth. Would be useful for hstore, too.

D


Greg Sabino Mullane

Aug 25, 2014, 6:41:59 PM
to David E. Wheeler, Andrew Dunstan, Chris Travers, Darren Duncan, DBD PG
On Mon, Aug 25, 2014 at 12:15:29PM -0700, David E. Wheeler wrote:
> I know. What I'm suggesting is a DBD::Pg switch that auto-converts between
> /JSONB?/ values and hashes and arrays (assuming an appropriate JSON
> parser is installed).

That's already the plan. Since we know the json type oid already, we
will add a flag such that when on, json can be slung back and forth
seamlessly as arrays are now. Arrays are already fully automatic and
out the barn door, but perhaps a more generic interface such as:

$dbh->pg_automatic_convert('json');

In which 'json' can be replaced by other things down the road. Or even
allow a second arg: a user coderef/object that knows how to convert
certain things.


Chris Travers

Aug 25, 2014, 7:33:52 PM
to David E. Wheeler, Greg Sabino Mullane, Andrew Dunstan, Darren Duncan, DBD PG
Since the discussion has broadened to serialization and deserialization, here are a couple of things I have done in PGObject which might also be generally useful (I would be happy to try to port them to DBD::Pg if folks are interested):

1.  A standard interface for serialization.  If an object has a to_db() method, that gets called for serialization.  This allows programmers to specify how a type serializes.  You can also pass in containerized numbers or the like and specify how they get serialized.

2.  A registry and standard interface for deserialization.  You can register a class to take incoming types and pass them through a from_db() method.  This allows for simple wrappers that turn numeric fields into Math::BigFloat objects for example.  And the application gets to specify which types are important for conversion.  We actually support multiple named registries so that different contexts can have different type mappings.

This provides a more general approach to serialization and deserialization and one which pushes responsibility elsewhere but keeps things manageable.
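
As a sketch of what that duck-typed interface looks like from a class's point
of view (the class and its naive tuple parsing are invented for illustration):

package CurrencyAmount;
use strict;
use warnings;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# called when the object is passed to the database
sub to_db {
    my $self = shift;
    return sprintf '(%s,%s)', $self->{currency}, $self->{amount};
}

# registered for the composite type; receives the tuple's text form
sub from_db {
    my ($class, $text) = @_;
    my ($currency, $amount) = $text =~ /^\(([^,]*),([^)]*)\)$/
        or die "cannot parse composite literal: $text";
    return $class->new(currency => $currency, amount => $amount);
}

1;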

Chris Travers

Aug 25, 2014, 7:48:48 PM
to Greg Sabino Mullane, Darren Duncan, DBD PG
I think for memory reasons, having a special call would be the way to go.  It may be helpful later to make that call automatically when a bind_param() is called with a user-defined type, or when a tuple is encountered.

But I guess this leads to a long-range question: do we want to be making all kinds of decisions regarding serialization and deserialization in DBD::Pg?  If not, do we want to have standard interfaces to allow folks to specify their own serialization and deserialization (perhaps with some utilities for accessing common catalog information)?

I guess what I am wondering is how far we should scope this in DBD::Pg and how much we should try to hand off responsibility elsewhere.

Best Wishes,
Chris Travers


Darren Duncan

Aug 25, 2014, 8:33:06 PM
to Chris Travers, DBD PG
On 2014-08-24, 5:33 PM, Chris Travers wrote:
> Arrays are at least in theory self-contained. We don't
> have to ask the db how to parse them. Tuples are not (we need to do a catalog
> lookup to find the order of elements).

So here's a question ...

DBI has for a long time given the option for tuples to be returned either as
arrays or as hashes.

If the ability to choose representation is preserved in the improvements you're
making, then could the extra db lookups be avoided if, say, the user chose to
get their tuples as arrays instead? The information about field names is lost,
but then the user chose not to ask for them anyway, as they are identifying the fields
by ordinal position instead.

DBI defaults to returning tuples from plain SELECTs as arrays, and you have to
ask it if you want hashes; does doing the latter mean a lookup already that the
former doesn't have?

-- Darren Duncan

Chris Travers

Aug 25, 2014, 8:42:50 PM
to Darren Duncan, DBD PG
So suppose I have an array of tuples, each of which has an array of tuples as a member.  Tracking the proper handling of that from the developer's perspective may be difficult.  I would rather get text strings where I can check whether the value starts with '(' or '{'.
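
i.e. something as simple as:

# crude dispatch on the first character of the text form
sub classify_pg_value {
    my ($text) = @_;
    return 'composite' if $text =~ /^\(/;
    return 'array'     if $text =~ /^\{/;
    return 'scalar';
}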

Best wishes,
Chris Travers



Rolf Schaufelberger

Aug 26, 2014, 6:09:57 AM
to dbd...@perl.org, Chris Travers
Hello Chris,


Am 24.08.2014 um 17:07 schrieb Chris Travers <chris....@gmail.com>:

> Hi everyone;
>
> I am nearing completion on the logic to serialize composite types into textual representation of tuples for LedgerSMB and I am wondering if there is additional interest from others who use DBD::Pg. If there is we would certainly release the logic and interfaces on CPAN. So I figured I would ask.
>
> For those who would be interested, I guess I have a few questions:
>
> 1. Do you need Moose or Moo support?
> 2. Do you need plain old hashref support?
> 3. Do you need bytea support?
>
> If this is unwelcome please ignore but this seemed like a good place to ask since it is PostgreSQL/Perl-specific and many of you might have thought about using composite types in this way.
>
> In case it isn't clear what I am talking about is:
>
> given a type:
>
> CREATE TYPE foo AS (
> bar text,
> baz text
> );
>
> and a hashref {bar => 'foo', baz => 'this, or else that'}
>
> it should produce (foo,"this, or else that")

I've done something similar a while ago, but as a part of DBIx::Class, and I didn't generalize it more than I needed for my $work.
For each type in my database I had a corresponding (Moose) class where every 'column' was an attribute, and in my BUILD method I parsed the string that came from the database.

Then I wrote an InflateColumn::PgType plugin overriding the register_column method, so I could write:

__PACKAGE__->load_components(qw/InflateColumn::PgType Core/);

__PACKAGE__->table('pdf_format_param');

__PACKAGE__->add_column( bg_tpl => { pg_type => 'PdfElement' });


and later access 'columns' of PdfElement like

my $bg_start = $self->sp_pdf_param->bg_tpl->start;

Worked very well for me.
At the time I developed this, I was also thinking about making it more general; however, types can be nested and, as Greg said, there is the question of when the mapping should take place.
So I decided to tell DBIx::Class which columns are custom types and which classes to map them to.


Rolf Schaufelberger






Nicholas Clark

Aug 26, 2014, 7:28:09 AM
to DBD PG
On Sun, Aug 24, 2014 at 07:15:46PM -0700, Chris Travers wrote:

> Memoize is core and has been since at least 5.8.9 according to what I have
> been able to find. This being said, memoization is dangerous when it comes
> to db stuff so it should definitely be optional if used at all.

First stable release with Memoize was 5.8.0

[please don't assume that I've read the rest of the thread :-)]

Nicholas Clark