I wonder if there is any work being done or thoughts being shared
regarding prepared SQL statements.
Currently I am in heavy need of tuning SQL INSERTS that insert large
(3MB) images into blobs.
I am using rails 2.3.5 with postgres adapter and pg driver.
Any information would be appreciated.
Jarl
Ivan Nastyukhin
diei...@me.com
> --
> You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
> To post to this group, send email to rubyonra...@googlegroups.com.
> To unsubscribe from this group, send email to rubyonrails-co...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en.
>
Looking at the title, it seems he is suggesting the use of Prepared
Statements in ActiveRecord, which seems pretty reasonable, even if
storing images on database is not...
Rodrigo.
Here's some info on prepared statements in Postgres:
http://developer.postgresql.org/pgdocs/postgres/sql-prepare.html
I haven't used them often, but Sequel has them, and I recall on one
occasion being able to shave about 15-20% off the run time of a job
that had to do a few hundred thousand large inserts; this falls under
the category of a "single session is being used to execute a large
number of similar statements" that the doc describes.
While sometimes quite useful, I'm not sure where this would really fit
into AR. AR is almost exclusively an ORM by design, while Sequel is a
database manipulation library that also happens to provide an ORM.
Sequel's ORM doesn't automatically use them last time I checked.
It would be interesting to see the overall performance impact if, for
example, the typical "SELECT * FROM table WHERE id = ? LIMIT 1" that
AR generates were prepared rather than just executed directly. I would
guess that Postgres already has sufficient logic to cache query plans
for something like this even without using prepared statements, but
I'm not sure.
Regards,
Norman
> It would be interesting to see the overall performance impact if, for
> example, the typical "SELECT * FROM table WHERE id = ? LIMIT 1" that
> AR generates were prepared rather than just executed directly. I would
> guess that Postgres already has sufficient logic to cache query plans
> for something like this even without using prepared statements, but
> I'm not sure.
Ok, I was curious so I took a look into it.
Unless there's some logical flaw in my benchmark code, it looks like
prepared statements run about 50% faster overall for these types of
queries.
"Gist has been deleted"
Konstantin
> On Thu, Jul 8, 2010 at 12:24, Konstantin Haase <k.h...@finn.de>
> wrote:
>>
>> On Jul 8, 2010, at 17:15 , Norman Clarke wrote:
>>
>>> On Thu, Jul 8, 2010 at 11:14, Norman Clarke <nor...@njclarke.com>
>>> wrote:
>>>
>>>> It would be interesting to see the overall performance impact if,
>>>> for
>>>> example, the typical "SELECT * FROM table WHERE id = ? LIMIT 1"
>>>> that
>>>> AR generates were prepared rather than just executed directly. I
>>>> would
>>>> guess that Postgres already has sufficient logic to cache query
>>>> plans
>>>> for something like this even without using prepared statements,
>>>> but
>>>> I'm not sure.
>>>
>>> Ok, I was curious so I took a look into it.
>>>
>>> http://gist.github.com/468126
>>>
>>> Unless there's some logical flaw in my benchmark code, it looks like
>>> prepared statements run about 50% faster overall for these types of
>>> queries.
$ ruby -e 'puts((17.639 - 11.099) / 17.639)'
0.37076931798854806
That's 37% faster for prepared (or direct is 59% slower)
-Rob
Thanks, that looks like the kind of logical flaw I was talking about. :-)
There's a logical flaw in your benchmark code. The situation is a
little more nuanced
http://www.mysqlperformanceblog.com/2006/08/02/mysql-prepared-statements/
And for postgresql it can actually be significantly worse when using
prepared statements:
http://wiki.postgresql.org/wiki/FAQ#Why_is_my_query_much_slower_when_run_as_a_prepared_query.3F
However having said that, it's definitely something that could be
investigated as a feature in arel, then something we used with auto
generated methods like find_by_code_and_status. This has come up a
bunch of times but it never really got dragged over the line because
the benefits are marginal for a lot of use cases so people lose the
motivation to do the hard & fiddly work of finishing up.
--
Cheers
Koz
> Dear all great developers.
>
> I wonder if there is any work being done or thoughts being shared
> regarding prepared SQL statements.
OK. Let that be it (thanks Rodrigo Rosenfeld Rosas). I am interested
in any comments on work/thoughs regarding prepared statements. This
was not intended as a support discussion on whether or not images in
DB is a good choice or not.
If I had a patch that used exploited prepared statements (I don't
currently have), would the rails community be interested?
Jarl
Thanks for comments.
Norman Clarke <nor...@njclarke.com> writes:
> On Thu, Jul 8, 2010 at 07:58, Rodrigo Rosenfeld Rosas
> <rr.r...@gmail.com> wrote:
>> Looking at the title, it seems he is suggesting the use of Prepared
>> Statements in ActiveRecord, which seems pretty reasonable, even if storing
>> images on database is not...
>
> Here's some info on prepared statements in Postgres:
>
> http://developer.postgresql.org/pgdocs/postgres/sql-prepare.html
Yes Postgres has it, almost every (maybe not SQLite) SQL databases on
earth has the feature. Even the (ruby) pg driver for postgres supports
the feature.
My point is the Rails adapters (at least not postgres) does not
exploit it. Even the design of AR (ActiveRecord::Base) does not make
it easy to make a new Rails DB adapter that exploits the feature. It
requires som hacking on ActiveRecord(::Base) it self.
I am pretty sure that putting binary data in the database makes a huge
performace difference when comparing prepared statements and direct
exec() calls. The major difference is
AR has to quote, that is, escape all the byte. Meaning handing over 4
times as many bytes (each non-ascii is represented by a backslash and
the octal representation). After that it has to be passed to the DB
engine (through the ruby driver) and the parsed by the DB parser.
If prepared statements where used, the binary data (in memory of the
Rails app) could be passed as a binary type to the DB driver (using pg
for PostgreSQL).
I am pretty confident that could improve performance. I'll get back
if/when I have some measurements.
> Ok, I was curious so I took a look into it.
>
> http://gist.github.com/468116
Seems like this is the wrong link.
Jarl
Yes, I guess, but I can't speak for the entire Rails community :)
But I don't think prepared statements would be used by default on
ActiveRecord, but this should be explicited unless setting some option
for enabling its use in all cases.
Usually, it is better not to use prepared statements, unless some
statement will be batch executed where we can have some performance
gains, but that will not be always true. There is also support for the
situation that you described regarding binary data.
One of the bad things of using prepared statements if for debugging
queries. Under development environment, we can enable config to see the
SQL generated queries already with all conversions performed by Rails.
When using prepared statements, we would see an output like '["... where
a=?", true]' instead of "... where a=1" or "... where a='t'", etc. So it
is harder to guess how the driver is converting the objects to SQL
values. Specially if you want to run the entire generated SQL in some
tool...
In Sequel, it is possible to call a "sql" method to see the generated
SQL from a Sequel object. I missed this in ActiveRecord for a long time.
I guess this is possible for ActiveRecord in Rails 3 since it is using
Arel which takes a similar approach to Sequel, but I'm not sure... Do
someone know?
Rodrigo.
Yes (sort of). You can call #to_sql on an ActiveRecord::Relation to get the SQL it would most likely generate -- however, that breaks down when eager loading is involved. An excerpt from an article I wrote a while back (http://metautonomo.us/2010/05/11/activerecord-relation-vs-arel/):
If eager loading is a factor in your query, either due to explicit eager_loading values, orincludes values that are also used in your where conditions, then you’ll notice that the value returned by to_sql doesn’t reflect what actually gets run on the server:
-------
ruby-head > Article.includes(:comments).where(:comments => {:body => 'hey'}).to_sql
=> "SELECT \"articles\".* FROM \"articles\"
WHERE (\"comments\".\"body\" = 'hey')"
ruby-head > Article.includes(:comments).where(:comments => {:body => 'hey'}).all
# From the development.log
Article Load (0.4ms) SELECT "articles"."id" AS t0_r0, "articles"."title" AS t0_r1,
"articles"."body" AS t0_r2, "articles"."created_at" AS t0_r3, "articles"."updated_at"
AS t0_r4, "articles"."lookup_id" AS t0_r5, "comments"."id" AS t1_r0,
"comments"."article_id" AS t1_r1, "comments"."body" AS t1_r2,
"comments"."created_at" AS t1_r3, "comments"."updated_at" AS t1_r4
FROM "articles" LEFT OUTER JOIN "comments" ON
"comments"."article_id" = "articles"."id" WHERE ("comments"."body" = 'hey')
-------
Of course, given the nature of how eager loading and including works, you’d expect the query to differ when they’re involved. So to_sql is a useful tool to display what’s being generated in your queries — just be aware of its limitations and remember that it’s no substitute for checking the logs when confusion arises.
For instance, this is probably not possible in ActiveRecord:
class SomeTable < ActiveRecord::Base
set_table 'some_scheme.some_table'
end
> If eager loading is a factor in your query, either due to explicit eager_loading values, orincludes values that are also used in your where conditions, then you�ll notice that the value returned by to_sql doesn�t reflect what actually gets run on the server:
>
> ...
>
> Of course, given the nature of how eager loading and including works, you�d expect the query to differ when they�re involved. So to_sql is a useful tool to display what�s being generated in your queries � just be aware of its limitations and remember that it�s no substitute for checking the logs when confusion arises.
>
Is it difficult to add some method to ActiveRecord that displays exactly
the SQL that will be run, or that would be run if eager loading was set?
I mean, maybe some methods like "eager_sql" and "lazy_sql"? I don't know
ActiveRecord's code, but I guess that sometime it will need to generate
the full sql string, so the method should be already there... Is there
any reasons why we can't just see the full sql statement without
actually running the query? Sometimes it would even easy some test/spec
writing or even make some of them to run faster...
Thanks for the article,
Rodrigo.
I don't think it would be terribly difficult, no -- it'd basically be doing arel.to_sql or a partial version of find_with_associations depending on the result of eager_loading? that looks something like this and sticking it in relation.rb (not tested):
def real_sql
if eager_loading?
including = (@eager_load_values + @includes_values).uniq
join_dependency = ActiveRecord::Associations::ClassMethods::JoinDependency.new(@klass, including, nil)
construct_relation_for_association_find(join_dependency).to_sql
else
arel.to_sql
end
end
I think it's just been a matter of there not being a real use case requiring it to be in core. I could throw it into MetaWhere easily enough if enough people thought it'd be useful though. I'm trying to position it as a sort of "AR Query Interface 3.0 on steroids" anyway.
-Ernie
Ernie, maybe if the sql generation part was extracted in a separate
method and called by find_with_associations, this could simplify
understanding the code as well as allowing the generated sql to be used
outside this method. For instance, someone could think it is useful to
log the SQL queries in database for taking statistics and deciding which
queries to invest effort to optimize based on frequency or just to help
understanding what is happening to their relations.
Suppose you have some real slow query (takes up more than a minute to
run) and you're trying to optimize it. Instead of waiting for the query
to finish, if you are only interested in seeing the generated SQL for
some analysis, the possibility to just view the SQL would be great.
Specially if you are going to make several attempts to rewrite the
relations and observe the results in SQL generation...
Well, I'm saying that because I used this feature a lot while working
with Sequel and I thought it would be a good addition to ActiveRecord
and I don't see any disadvantages in exposing this method to ActiveRecord.
If you decide to write this helpful method, I would be extremely glad!
I'm already glad you took the path to integrate Arel in ActiveRecord and
provided a much better interface to AR anyway ;)
Best regards,
Rodrigo.
>
> If you decide to write this helpful method, I would be extremely glad! I'm already glad you took the path to integrate
> Arel in ActiveRecord and provided a much better interface to AR anyway ;)
Just a clarification -- I appreciate the credit, but while I did the MetaWhere plugin, and contributed a tiny patch to Arel and AR here and there, the overwhelming majority of the work you're crediting was done by Emilio Tagua and Pratik Naik. :)
>> I don't think it would be terribly difficult, no -- it'd basically be doing arel.to_sql or a partial version of find_with_associations depending on the result of eager_loading? that looks something like this and sticking it in relation.rb (not tested):
>>
>> def real_sql
>> if eager_loading?
>> including = (@eager_load_values + @includes_values).uniq
>> join_dependency = ActiveRecord::Associations::ClassMethods::JoinDependency.new(@klass, including, nil)
>> construct_relation_for_association_find(join_dependency).to_sql
>> else
>> arel.to_sql
>> end
>> end
>>
>> I think it's just been a matter of there not being a real use case requiring it to be in core. I could throw it into MetaWhere easily enough if enough people thought it'd be useful though. I'm trying to position it as a sort of "AR Query Interface 3.0 on steroids" anyway.
>>
>>
>
> Ernie, maybe if the sql generation part was extracted in a separate method and called by find_with_associations, this could simplify understanding the code as well as allowing the generated sql to be used outside this method. For instance, someone could think it is useful to log the SQL queries in database for taking statistics and deciding which queries to invest effort to optimize based on frequency or just to help understanding what is happening to their relations.
>
> Suppose you have some real slow query (takes up more than a minute to run) and you're trying to optimize it. Instead of waiting for the query to finish, if you are only interested in seeing the generated SQL for some analysis, the possibility to just view the SQL would be great. Specially if you are going to make several attempts to rewrite the relations and observe the results in SQL generation...
>
> Well, I'm saying that because I used this feature a lot while working with Sequel and I thought it would be a good addition to ActiveRecord and I don't see any disadvantages in exposing this method to ActiveRecord.
Just as a quick update: I tried my untested code above, and by some miracle it actually worked, so I just pushed version 0.5.2 of MetaWhere to GitHub and Rubygems.org. The method's been renamed debug_sql, though. real_sql just reminded me too much of PHP's mysql_real_* functions and anything that makes me think of PHP tends to make me rethink it... :)
Happy debugging! :)
-Ernie
Ok, let me understand this better. I've just noted we are talking about
a plugin, MetaWhere and not about Rails core behavior, right? "bundle
show" doesn't list the meta-where plugin as a dependency of Rails...
So, I guess it is currently not possible to view the queries in a fresh
new Rails application with no plugins installed, right? Unless we look
at the log, of course...
I had taken a look at the meta-where plugin README in github. It seems
interesting but I was worried about overriding the operators of symbols.
If another plugin decides to do the same for another reason and with
different behaviors, this could become a mess that would be hard to
debbug... The idea is interesting but I think that Ruby, with all its
powerfulness, still isn't able to define operators to classes depending
on some context. For instance, if the symbol operator is being called
inside some method of specific class or module, than some operators
would have some specific implementation... Maybe this would make Ruby
even more confuse, even though it would get more powerful... :)
Anyway, I don't really mind writing SQL fragments. What I really hate is
trying to understand a really big SQL statements with lots of joins.
Writing the same query using chainable syntax like in ActiveRecord/Arel
or Sequel makes it a lot easier to separate the SQL statement in more
granular logic that is easier to understand.
Anyway, thank you for the debug_sql method :)
Best regards,
Rodrigo.
This should probably migrate off-list, but the symbol operator
overloads are opt-in only at this point and not enabled in a default
install. Hit me up via email or on github for further MW discussion.
However My primary reason for considering prepared statements is not
the usual save-the-parsing performance argument. My main reason is
actuall to save many copying of large binary objects which have been
quoted/escaped to 4 times the size, and the unquote/unescape it again
in the DB engine. This (I believe) could be saved by using prepared
statements, where the DB API allows a more direct handover of binary
data.
Jarl
As I've said on past messages, I agree with you. I think the best
argument for supporting prepared statements is exactly a better support
for binary data handling. I just don't think it should be enabled by
default for all queries...
Best regards,
Rodrigo.
Agree. However the current (2.3.x) design/architecture of ActiveRecord
does not even let the DB adapters exploit prepared statements of their
underlying db-library. I don't know if things are different in 3.x
Jarl