SQL::Abstract::Pg

sri

Jan 28, 2018, 5:23:54 PM
to Mojolicious
Just wanted to give you a quick heads-up about a new module we've just
added to the Mojo::Pg distribution. So far the SQL generation features
we've been able to provide with CRUD methods like ->select have been
limited to the current features of SQL::Abstract.

That's about to change with the introduction of SQL::Abstract::Pg (and
similar SQL::Abstract subclasses that will follow for other databases,
such as SQL::Abstract::SQLite).

There are already quite a few new features, including joins and upserts.
Take a look! And if you have any ideas for more features we could add,
please let us know.
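
For a quick taste, here's roughly what a join and an upsert can look like
through the CRUD methods (a sketch with made-up tables and data; see the
SQL::Abstract::Pg documentation for the exact option formats):

use Mojo::Pg;
my $pg = Mojo::Pg->new('postgresql://postgres@/test');

# Join: "select users.name, posts.title from users
#        join posts on (posts.user_id = users.id)"
my $rows = $pg->db->select(['users', ['posts', user_id => 'id']],
  ['users.name', 'posts.title'])->hashes;

# Upsert: "insert into users (name) values ('sri')
#          on conflict (name) do update set name = 'sri'"
$pg->db->insert('users', {name => 'sri'},
  {on_conflict => [name => {name => 'sri'}]});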

--
sebastian

Stefan Adams

Jan 31, 2018, 10:42:13 PM
to mojolicious
I spoke with you about this briefly on Twitter, but just figured I'd put it out here, too.  What about multiple record inserts on a single call?  Twitter thread.


Sebastian Riedel

Feb 1, 2018, 7:42:10 AM
to mojol...@googlegroups.com
> I spoke with you about this briefly on Twitter, but just figured I'd put it
> out here, too. What about multiple record inserts on a single call?
> Twitter thread.

Yes, it's possible, not sure about how useful it would actually be though.

--
Sebastian Riedel
http://mojolicio.us
http://github.com/kraih
http://twitter.com/kraih

Stefan Adams

Feb 2, 2018, 11:17:01 PM
to mojolicious
On Thu, Feb 1, 2018 at 6:42 AM, Sebastian Riedel <kra...@gmail.com> wrote:
> > I spoke with you about this briefly on Twitter, but just figured I'd put it
> > out here, too.  What about multiple record inserts on a single call?
> > Twitter thread.
>
> Yes, it's possible, not sure about how useful it would actually be though.

This makes sense!  I'm able to do this below, which is all that I really need:

  my $i = $pg->db->dbh->prepare(
    'insert into b (first, last, birthday, age, phone) values (?, ?, ?, ?, ?)');
  my $results = $pg->db->select('a', [qw/first last birthday age phone/],
    undef, {limit => $limit, offset => 1});
  while ( my $next = $results->array ) {
    $i->execute(@$next);
  }

FWIW, I ran a quick benchmark on three different algorithms for inserting bulk data into a database, and the method above was the fastest by far. Is there a still better/faster algorithm for inserting bulk data?

Dan Book

Feb 3, 2018, 12:46:13 AM
to mojol...@googlegroups.com
A bulk insert would look more like:

  my $rows = $pg->db->select(...)->arrays;
  $pg->db->query('insert into b (first, last, birthday, age, phone) values '
    . join(',', ('(?,?,?,?,?)') x @$rows), map @$_, @$rows);

Or with Postgres you could probably use arrays rather than constructing the query with join and x, but I wouldn't know how off the top of my head.
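
Something like this might work with unnest(), though it's an untested sketch; it relies on DBD::Pg binding Perl arrayrefs as Postgres arrays, uses just two of the columns, and the ::text[] casts would need adjusting to the real column types:

  # Untested sketch: multi-row insert via unnest() and array bind parameters
  my $rows  = $pg->db->select('a', [qw/first last/])->arrays;
  my @first = map { $_->[0] } @$rows;
  my @last  = map { $_->[1] } @$rows;
  $pg->db->query(
    'insert into b (first, last) select * from unnest(?::text[], ?::text[])',
    \@first, \@last);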

-Dan

Abel Abraham Camarillo Ojeda

Feb 3, 2018, 1:40:55 AM
to mojol...@googlegroups.com
On Fri, Feb 2, 2018 at 10:16 PM, Stefan Adams <s103...@gmail.com> wrote:
> On Thu, Feb 1, 2018 at 6:42 AM, Sebastian Riedel <kra...@gmail.com> wrote:
>> Yes, it's possible, not sure about how useful it would actually be though.
>
> This makes sense!  I'm able to do this below, which is all that I really need:
>
>   my $i = $pg->db->dbh->prepare(
>     'insert into b (first, last, birthday, age, phone) values (?, ?, ?, ?, ?)');
>   my $results = $pg->db->select('a', [qw/first last birthday age phone/],
>     undef, {limit => $limit, offset => 1});
>   while ( my $next = $results->array ) {
>     $i->execute(@$next);
>   }

In Postgres, when inserting from one table to another, you should be able
to do:

  insert into b (first, last, birthday, age, phone)
    select first, last, birthday, age, phone from a;

and save the whole round trip of the data.
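
With Mojo::Pg that should be a single query call, for example:

  $pg->db->query(
    'insert into b (first, last, birthday, age, phone)
     select first, last, birthday, age, phone from a');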

> FWIW, I ran a quick benchmark on three different algorithms for inserting
> bulk data into a database, and the method above was the fastest by far. Is
> there a still better/faster algorithm for inserting bulk data?

Илья Рассадин

Feb 3, 2018, 7:20:27 AM
to Mojolicious
I think the use case is not just about moving bulk data from one table to another.

For example, we could buffer Minion jobs via enqueue and insert them into minion_jobs all at once at the end of a request.

That feature is also very valuable for high-load projects if you want to store some kind of logs in the database.

So bulk inserts are definitely a good thing, and I use them a lot in my daily job.

PS: By the way, PostgreSQL also supports bulk updates... Maybe that should be on the feature list too.
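
For reference, the usual Postgres bulk-update pattern joins the target table against a VALUES list, roughly like this (an untested sketch with made-up rows; the ::text casts help Postgres infer the placeholder types):

  # Untested sketch: bulk update via a join against a VALUES list
  $pg->db->query(
    q{update b set phone = v.phone
      from (values (?::text, ?::text), (?::text, ?::text)) as v (first, phone)
      where b.first = v.first},
    'Ann', '555-0001', 'Bob', '555-0002');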



On Saturday, February 3, 2018 at 9:40:55 AM UTC+3, Abel Abraham Camarillo Ojeda wrote:

sri

Feb 3, 2018, 8:24:47 AM
to Mojolicious
> For example, we could buffer Minion jobs via enqueue and insert them into
> minion_jobs all at once at the end of a request.
>
> That feature is also very valuable for high-load projects if you want to
> store some kind of logs in the database.
>
> So bulk inserts are definitely a good thing, and I use them a lot in my
> daily job.

Those seem like reasonable use cases. If someone finds a good way to implement bulk inserts, I'll add the feature.

--
sebastian 

Stefan Adams

Feb 4, 2018, 2:16:55 AM
to mojolicious
Transactions provide insane bulk insert performance:

> It is of note here that each insert is a transaction. What this means is
> Postgres is doing some extra coordination to make sure the transaction is
> completed before returning. On every single write this takes some overhead.
> Instead of single row transactions, if we wrap all of our inserts in a
> transaction, we'll see some nice performance gains. -- CitusData Blog

Using transactions, my benchmark reduces the insert time for 150,000 records from about 70s to about 1.4s:

# while ->hash; insert
100.194 wallclock secs (38.81 usr +  2.96 sys = 41.77 CPU) @  0.02/s (n=1)
# prepare; while ->array; execute
71.047 wallclock secs ( 4.56 usr +  1.63 sys =  6.19 CPU) @  0.16/s (n=1)
# begin tx; prepare; while ->array; execute; commit
1.37854 wallclock secs ( 0.26 usr +  0.18 sys =  0.44 CPU) @  2.27/s (n=1)

I'm not sure if or how Mojo::Pg could provide this behavior, or if, as Sebastian said at the beginning, it's really all that useful for Mojo::Pg.

Here's one way to accomplish a high-performing transaction-based bulk insert:

my $results = $pg->db->select('a', [qw/first last birthday age phone/], undef, {limit => $limit, offset => 1});
$pg->db->bulk_insert('b', $results->hashes->to_array);

And in Mojo::Pg::Database, add a new bulk_insert method:

# $records is an array of hashes
sub bulk_insert {
  my ($self, $table, $records, $options) = @_;
  my ($stmt) = $self->pg->abstract->insert($table, $records->[0], $options);
  eval {
    my $tx = $self->pg->db->begin;
    my $i = $self->pg->db->dbh->prepare($stmt);
    while ( my $next = shift @$records ) {
      $i->execute(@$next{sort keys %$next});
    }
    $tx->commit;
  };
  return $@ if $@;
}



Stefan Adams

Feb 4, 2018, 2:45:30 AM
to mojolicious
On Sun, Feb 4, 2018 at 1:16 AM, Stefan Adams <s103...@gmail.com> wrote:
> I'm not sure if or how Mojo::Pg could provide this behavior, or if, as
> Sebastian said at the beginning, it's really all that useful for Mojo::Pg.

My use case is that I'm building a web service cache.  We use a SaaS ERP-type database that provides a SOAP API that's super slow.  The API documentation even recommends caching the data in a local database for better performance, so I'm doing exactly that.  An hourly cronjob kicks off a Mojo app command that syncs the SaaS database to a local database.  There's nothing I can do to speed up the 500-records-at-a-time retrieval, but if there's anything I can do to speed up the inserts so I can move on to the next batch of records, that'd be great!  Actually, there is one thing the SaaS database provides that helps speed up retrieval: fetching only records that have changed since a provided last_modified date.

A bulk_upsert would be great, too.  I haven't tested this, but it seems like it would work?

sub bulk_upsert {
  my ($self, $table, $records, $options) = @_;
  my ($stmt) = $self->pg->abstract->insert($table, $records->[0],
    {on_conflict => [$table, $records->[0]], %$options});
  eval {
    my $tx = $self->pg->db->begin;
    my $i = $self->pg->db->dbh->prepare($stmt);
    while ( my $next = shift @$records ) {
      $i->execute(@$next{sort keys %$next}, @$next{sort keys %$next});
    }
    $tx->commit;
  };
  return $@ if $@;
}

sri

unread,
Feb 4, 2018, 5:04:55 AM2/4/18
to Mojolicious
> sub bulk_insert {
>   my ($self, $table, $records, $options) = @_;
>   my ($stmt) = $self->pg->abstract->insert($table, $records->[0], $options);
>   eval {
>     my $tx = $self->pg->db->begin;
>     my $i = $self->pg->db->dbh->prepare($stmt);
>     while ( my $next = shift @$records ) {
>       $i->execute(@$next{sort keys %$next});
>     }
>     $tx->commit;
>   };
>   return $@ if $@;
> }

There's a lot wrong with this. You should hold on to $db objects; there's a pool of
connections in Mojo::Pg. Manual prepare/execute is pointless, since Mojo::Pg::Database
has a transparent statement handle cache; once you hold on to the $db object you get
caching for free without having to do anything. Just pass the same data structure to
->insert or the same SQL to ->query and you're done.

--
sebastian

Stefan Adams

Feb 4, 2018, 9:28:05 AM
to mojolicious
On Sun, Feb 4, 2018 at 4:04 AM, sri <kra...@gmail.com> wrote:
> There's a lot wrong with this. You should hold on to $db objects; there's a
> pool of connections in Mojo::Pg. Manual prepare/execute is pointless, since
> Mojo::Pg::Database has a transparent statement handle cache; once you hold
> on to the $db object you get caching for free without having to do anything.
> Just pass the same data structure to ->insert or the same SQL to ->query and
> you're done.

Thanks for the feedback!  Is this a correct implementation of what you're saying?

my $db = $pg->db;
my $results = $db->select('a', [qw/first last birthday age phone/], undef, {limit => $limit, offset => 1});
my $tx = $db->begin;
while ( my $next = $results->hash ) {
  $db->insert('b', $next);
}
$tx->commit;

sri

Feb 4, 2018, 9:39:34 AM
to Mojolicious
> Thanks for the feedback!  Is this a correct implementation of what you're saying?
>
> my $db = $pg->db;
> my $results = $db->select('a', [qw/first last birthday age phone/], undef, {limit => $limit, offset => 1});
> my $tx = $db->begin;
> while ( my $next = $results->hash ) {
>   $db->insert('b', $next);
> }
> $tx->commit;

Almost. You do not want more than one active statement handle per connection. The
while loop is pointless there anyway, because DBD::Pg does not support cursors, so
all results will be sent to DBD::Pg as soon as you call $results->hash for the first
time. Better to use $results->hashes->each, which releases the statement handle and
frees up the connection for inserts.

--
sebastian

Stefan Adams

Feb 5, 2018, 12:12:51 AM
to mojolicious
On Sun, Feb 4, 2018 at 8:39 AM, sri <kra...@gmail.com> wrote:
> Almost. You do not want more than one active statement handle per connection.
> The while loop is pointless there anyway, because DBD::Pg does not support
> cursors, so all results will be sent to DBD::Pg as soon as you call
> $results->hash for the first time. Better to use $results->hashes->each,
> which releases the statement handle and frees up the connection for inserts.

This helps a lot, thank you!  Am I following your suggestion correctly with this:

my $db = $pg->db;
my $results = $db->select('a', [qw/first last birthday age phone/], undef, {limit => $limit, offset => 1});
my $tx = $db->begin;
$results->hashes->each(sub { $db->insert('b', $_) });
$tx->commit;

Franz A Pertl

Jun 12, 2018, 1:50:41 PM
to Mojolicious
Sebastian,
Thanks for all the fabulous code!

Just a question on SQL::Abstract::Pg.  The following code

use 5.016;
use Mojo::Pg;
use SQL::Abstract::Pg;

my $abstract = SQL::Abstract::Pg->new;
say $Mojo::Pg::VERSION;
say $abstract->select(['foo', ['bar', 'foo.id' => 'bar.foo_id']]);

outputs:
4.08
SELECT * FROM foo JOIN bar ON (bar.foo.id = foo.bar.foo_id)

but I was expecting
4.08
SELECT * FROM foo JOIN bar ON (foo.id = bar.foo_id)


Is there anything in particular I'm doing wrong?

Dan Book

Jun 12, 2018, 1:57:45 PM
to mojol...@googlegroups.com
Make sure the name_sep option is set on your SQL::Abstract::Pg object so it can split qualified names like foo.id on the separator. See https://metacpan.org/source/SRI/Mojo-Pg-4.08/lib/Mojo/Pg.pm#L14-18 for how Mojo::Pg initializes it.

-Dan

--
You received this message because you are subscribed to the Google Groups "Mojolicious" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mojolicious...@googlegroups.com.

Franz A Pertl

Jun 12, 2018, 3:36:27 PM
to Mojolicious
Thanks! Works perfectly when properly initialized with the separator ;)

my $abstract = SQL::Abstract::Pg->new(
    array_datatypes => 1,
    name_sep        => '.',
    quote_char      => '"');
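
With those options set, the join example from before should come out as intended (identifiers quoted, because of quote_char), along the lines of:

say $abstract->select(['foo', ['bar', 'foo.id' => 'bar.foo_id']]);
# SELECT * FROM "foo" JOIN "bar" ON ("foo"."id" = "bar"."foo_id")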

