Please comment on this.
Thank you.
I have documented the design at: http://izumi.plan99.net/blog/?p=35
Thanks. :) The use of prepared statements in MySQL/PostgreSQL will
completely replace the argument escaping stuff. Database adapters that
don't support prepared statements (or haven't implemented support yet)
will silently fall back to argument escaping, as is done now.
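To make the fallback concrete, here is a minimal Ruby sketch of the idea; all class and method names here are hypothetical, not from the actual patch. Adapters that support prepared statements hand the values to the driver separately; everything else quietly falls back to escape-and-splice:

```ruby
# Hypothetical sketch of the fallback behaviour described above.
class AbstractAdapter
  def supports_prepared_statements?
    false # e.g. the MySQL/PostgreSQL subclasses would return true
  end

  def exec_query(sql, binds)
    if supports_prepared_statements?
      run_prepared(sql, binds) # values never touch the SQL text
    else
      run_raw(substitute_binds(sql, binds)) # silent fallback: escape + splice
    end
  end

  private

  # Replace each "?" placeholder with the quoted value, as adapters
  # without prepared-statement support must do today.
  def substitute_binds(sql, binds)
    binds.inject(sql.dup) { |s, value| s.sub("?", quote(value)) }
  end

  def quote(value)
    "'" + value.to_s.gsub("'", "''") + "'" # naive SQL string escaping
  end

  def run_prepared(sql, binds)
    [sql, binds] # stand-in: the driver would receive SQL and values separately
  end

  def run_raw(sql)
    sql # stand-in for shipping the statement text to the database
  end
end
```

A subclass that returns true from `supports_prepared_statements?` skips quoting entirely, which is the whole point.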
> Also, did you follow the "ActiveRecord refactoring"
> thread [1]? Did Zach [2][3] really do it?
>
> [1]http://groups.google.com/group/rubyonrails-core/browse_thread/thread/...
> [2]http://www.continuousthinking.com/
> [3]http://groups.google.com/groups/profile?enc_user=WZYiNRUAAADUiWRahsdn...
I didn't follow any of these threads, but I have read the
ActiveRecord::Extensions website in the past.
This prepared statements support is entirely my own work, and is not
related to Zach's work.
> The use of prepared statements in MySQL/PostgreSQL will
> completely replace the argument escaping stuff.
The current implementation immediately removes prepared statements
after using them once, so repeated queries are reparsed every time.
Right now I'm trying to make it actually work and to get the framework
in place.
Optimization can be done later (and I'm fairly confident that it can
be optimized).
In the future I can add caching. The cache will have a limited size,
and items can be removed based on least-recently-used or some other
criteria, similar to how CPU caches work.
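The cache described above can be sketched in a few lines of Ruby (hypothetical names, not the actual implementation; it relies on Ruby >= 1.9 hashes preserving insertion order, so on older Rubies you would track recency in a separate list):

```ruby
# Hypothetical sketch of a bounded, least-recently-used cache of
# prepared statements, keyed by SQL text.
class StatementCache
  def initialize(max_size = 32)
    @max_size = max_size
    @cache = {} # SQL text => prepared statement handle
  end

  # Return the cached handle for +sql+, preparing it via the block on a
  # miss and evicting the least recently used entry when the cache is full.
  def fetch(sql)
    if @cache.key?(sql)
      @cache[sql] = @cache.delete(sql) # re-insert: mark as most recently used
    else
      @cache.delete(@cache.keys.first) if @cache.size >= @max_size # evict LRU
      @cache[sql] = yield # e.g. connection.prepare(sql)
    end
    @cache[sql]
  end

  def size
    @cache.size
  end
end
```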
> The current implementation immediately removes prepared statements
> after using them once
Actually, no. One of the problems with quoting strings manually is
excessive memory usage in Rails. For example, if you're uploading a 1
MB photo to your database, then Rails has to quote a 1 MB string,
which is very slow and results in more than 1 MB memory usage because
of string overheads. Using prepared statements will remove this
problem.
And I'm not sure that using prepared statements (and immediately
deleting them) really makes things slower. Why would it be? After all,
if you're not using prepared statements then the database will have to
reparse each statement every time anyway. This will need more
benchmarking.
The prepared statement caching that I mentioned should be trivial to
implement. It's on my todo list.
It'll be somewhat slower than just throwing a SQL statement at the
database because it adds extra synchronous communication associated
with preparing the statement. In most cases it'll be negligible.
From past experiences, I would expect to see a noticeable, if not
huge, gain from caching prepared statements. At least, with Oracle.
> From past experiences, I would expect to see a noticeable, if not
> huge, gain from caching prepared statements. At least, with Oracle.
It is in my neck of the woods. Let me not mention that evil E-word again :)
> and they would see a noticable gain only when these prepared statements
> were reused for multiple queries.
It's the same with Oracle. A prepared statement is merely a way to
avoid hitting the SQL parser and query optimizer.
--
Alex
That's all well and good, but right now I have my hands full making
it *work* (I have a ton of unit test failures to fix). I promise you I
will take care of performance once this actually works.
Are there any other discussion points left?
> Are there any other discussion points left?
I've done some testing w/ Oracle, and a very significant gain comes from
simply using real bind variables. Caching the prepared statements in the
client doesn't matter much at all, presuming you're going to have
multiple clients. But the use of real bind variables allows Oracle
itself to cache the statements and avoid multiple hard parses.
I noticed this comment in activerecord/test/binary_test.rb:
# Without using prepared statements, it makes no sense to test
# BLOB data with SQL Server, because the length of a statement is
# limited to 8KB.
#
# Without using prepared statements, it makes no sense to test
# BLOB data with DB2 or Firebird, because the length of a statement
# is limited to 32KB.
So it does make sense to add support for prepared statements in Rails.
And I forgot to say this in my last email. Before I added the fallback
code, I implemented a cache for compiled prepared statements. The
cache hit rate was 65%, but it resulted in no noticeable performance
improvement, at least on MySQL.
I know this is tangential, but is there any good reason for storing
images in the database?
I can think of a few, but they're not very good:
- security (but that can be handled with http or other auth).
- logging (you want to send your images through rails so you know
the # of downloads)
- ??
I've always advised people that the filesystem is preferable (it's a
file, keep it in the file system) since you can easily back it up, and
you don't have to hit rails to get the image (sloooow, and uses up an
app listener).
If you ignore the problem of quoting 1mb strings from image uploads,
what other reason is there for prepared statements?
- All data is in one place. That may make backups easier.
- Support for transactions. Suppose that you not only store the image,
but also some metadata. If I first save the image file on disk, and
then save the metadata, but the computer crashes during metadata
saving, then I'm left with a stale file that's not used. If I first
save the metadata and then save the image file, and the computer
crashes during image file saving, then I have a corrupted image file
but intact metadata.
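The atomicity point can be illustrated without any database at all. Here is a toy, entirely hypothetical in-memory store whose transaction keeps both writes or neither:

```ruby
# Toy illustration of the transaction argument above: on failure, the
# partial write is discarded instead of leaving stale or orphaned data.
class FakeStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  def transaction
    snapshot = @rows.dup
    yield
  rescue
    @rows = snapshot # roll back: discard the partial writes
    raise
  end

  def insert(row)
    @rows << row
  end
end
```

On a filesystem, a crash between writing the image and writing the metadata gives exactly the stale-file or corrupted-image outcomes described above; inside a database transaction, the partial write is simply rolled back.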
> If you ignore the problem of quoting 1mb strings from image uploads,
> what other reason is there for prepared statements?
Oracle support. Oracle lives by prepared statements, even if those
prepared statements aren't reused, as someone has pointed out. One can
argue that few Rails developers use Oracle, but I can also argue that
that's because Rails performs badly on Oracle.
Furthermore:
- My code will fall back to using "unprepared statements" if it thinks
that using prepared statements is not necessary, so performance is not
a problem.
- It moves a lot of the statement building code to separate classes.
This makes ActiveRecord::Base smaller, and thus easier to read and to
maintain.
I like the gist of your idea and approach. Thanks for working on it.
I'd start with using bound variables first, though. They're always
beneficial and are a natural stepping stone to prepared statements
which can be added later for those queries which may benefit (many do
not).
Sorry I didn't comment on your earlier postgres patch: it includes a
lot of superfluous style edits and untested changes, so I didn't
review further. I salute your effort but it could use a lot of cleanup
before committing.
Are you attending RailsConf this week?
jeremy
So I guess my question is: Is flushing the statements really necessary
or are you just working under the assumption that the database
(Postgres/MySQL) will blindly allow the statement count to grow to
unreasonable levels?
I'd have to disagree with you there. Rails actually performs very well
with Oracle. A change made a while back to default to "similar"
cursor_sharing resulted in much of the benefit of bind variables.
"performs badly" is definitely too strong.
From my tests w/ Oracle, all the benefit was in the use of (real, aka
database) bind variables. For a while now I've had a working (passes all
tests) implementation of AR that uses bind variables for inserts and
updates, those being by far the easiest to tweak. Given how widely
spread throughout the codebase the selects are, I haven't had the time
to think through a reasonable approach there.
Though I like what was suggested (by you I think) back in Feb, around
the idea of SqlString < String. Has anyone looked into that further?
> Are you attending RailsConf this week?
Perhaps a group of us could get together to dig into this?
Thanks. :) Though I don't exactly understand what you mean - aren't
prepared statements and bound variables the same thing? That is, is
it not the case that one cannot exist without the other?
My patch uses both prepared statements and bound variables, in case
that wasn't clear.
> Sorry I didn't comment on your earlier postgres patch: it includes a
> lot of superfluous style edits and untested changes, so I didn't
> review further. I salute your effort but it could use a lot of cleanup
> before committing.
No need to apologize. And I'd like to add a minor correction: the
PostgreSQL patch isn't mine, I'm merely trying to get attention on
that patch, because it has a lot of fixes that I depend on.
> Are you attending RailsConf this week?
I will not be attending RailsConf.
I have no experience with Oracle, but according to some blogs, such as
this one:
http://uncommentedbytes.blogspot.com/2006/07/major-enterprise-ruby-on-rails-issues.html
Rails performs badly because it does not use prepared
statements. Here is a quote from Greg Luck:
"We have two production applications running on Ruby. And how is it.
Well, despite being perhaps no more than 5% of the functionality of
our applications, Ruby on Rails is the number one consumer of Oracle
CPU and logical gets. Why? Rails does not support prepared statements,
so Oracle has to reparse every time."
And I forgot to mention this last time, but there's one more reason
for using prepared statements/bound variables: binary support in
PostgreSQL. I have a Rails app which uses PostgreSQL, and its
Person.password_hash column was of the type bytea. It contained a
raw hash of the user's password. When a user logged in, the app
performed the following query:
Person.find_by_username_and_password_hash(username,
calculate_hash(password))
This will send a query to PostgreSQL, and this query will contain a
binary string. However, in PostgreSQL, binary strings must be escaped
with escape_bytea() - the regular string escaping function will not
do. Because the binary string isn't escaped properly, the whole
database connection breaks (it throws an exception, something about
"character not allowed in Unicode mode" or something).
One may wonder why the binary string isn't escaped correctly. The
reason is this:
1. It seems impossible in Ruby to determine whether a String contains
binary data without scanning through the entire String (expensive), so
the current PostgreSQL adapter assumes that strings are text strings,
and escapes them as text strings, unless it knows that this string
belongs to a binary column.
2. During ActiveRecord::Base.save, ActiveRecord will know which
attributes belong to which columns, so saving records that contain
binary data is no problem. The string escaper will have knowledge
about which argument belongs to which column.
3. However, things go wrong during ActiveRecord::Base.find. In there,
it is impossible to find out which arguments belong to which table
columns.
By using prepared statements/bound variables, we can eliminate this
entire problem. Strings will not be escaped, but bound directly to
prepared statements.
I'm sorry, I do not understand what you mean by "flushing statements"
and "allow it to grow to unreasonable levels".
This was before Rails 1.2 which introduced the cursor_sharing=similar
session parameter. It's oracle-specific and makes the database rewrite
the queries on the fly to use bind variables instead of literal values.
Btw, even before 1.2 you could have used that parameter (only
database-wide instead of for a single application).
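For reference, the parameter being discussed is set with standard Oracle syntax; the session-level form is what an adapter would issue per connection, and the system-level form is the database-wide alternative mentioned above:

```sql
-- Per-connection, as a database adapter could issue on connect:
ALTER SESSION SET cursor_sharing = SIMILAR;

-- Database-wide alternative (requires ALTER SYSTEM privilege):
ALTER SYSTEM SET cursor_sharing = SIMILAR;
```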
In any case, using real bind variables would represent a further
optimization step, as it'd allow fine tuning of which values to bind
vs. leave as literals in each query.
Luca
--
Web: http://spazidigitali.com - http://thetyper.com
Email mailto://luca.m...@gmail.com
Skype callto://l.mearelli
Agreed. In my tests the _client_ performance impact of using real bind
variables instead of relying on cursor_sharing of similar was basically
non-existent, but there was some (small, but still some) server
improvement. Plus some Oracle query plans worked better (i.e., required
fewer hints) when constants were left as constants.
I wish :)
> > Are you attending RailsConf this week?
>
> Perhaps a group of us could get together to dig into this?
That'd be awesome. I'll be at the hackfest at the Jupiter hotel the
nights of the 17th-19th.
It's at 800 E Burnside St, about a half mile from the convention center.
jeremy
>>>Are you attending RailsConf this week?
>>
>>Perhaps a group of us could get together to dig into this?
>
>
> That'd be awesome. I'll be at the hackfest at the Jupiter hotel the
> nights of the 17th-19th.
> It's at 800 E Burnside St, about a half mile from the convention center.
aahh I won't be at the conf... any way to meet in a virtual space?
perhaps the #railsconf irc channel?
I tested this with MySQL. I can't have an unlimited number of prepared
statements. After a certain threshold, queries will fail.
My MySQLAdapter implementation caches up to 32 prepared statements
(this number can be changed). If the cache is full, the least recently
used one will be removed. Though, according to benchmarks, caching
prepared statements doesn't improve performance at all on MySQL.
> I tested this with MySQL. I can't have an unlimited number of prepared
> statements. After a certain threshold, queries will fail.
> ... according to benchmarks, caching
> prepared statements doesn't improve performance at all on MySQL.
For some reason I'm unable to reproduce that problem now. Perhaps I
was mistaken.
> It is possible that there is a performance boost on raw queries, but that
> it's not visible because the ActiveRecord object instantiation time is an
> order of magnitude higher.
>
> Also, simple queries against a database that has a miniscule amount of
> records don't benefit much from having a cached execution plan. You could
> first benchmark several hundred thousand INSERTs, then UPDATE those records,
> then query those tables with SELECTs with as many conditions and joins as you
> could throw in. Don't forget the database indexes which play an important
> role in laying out the execution plan.
I've searched Google for benchmarks on prepared-vs-unprepared
statements, and to my surprise I couldn't find any. I'll try to set up
a good benchmark.
Yes, I'd be delighted to. :) I hope we can solve this together.
Brian Hartin
I was just wondering how far this development has reached and when I
would be able to use it. Planning a large rails project around Oracle
soon. It's good to know that the existing Rails code works, and that
in the future we'll be able to get much better performance.
Deepak
Good to see someone appreciates my work. :) MySQL support is probably
done. I was still working on PostgreSQL support. There's no Oracle
support though as I don't have access to an Oracle database, but
modifying the Oracle adapter should be fairly trivial.
The work is currently on hold as I'm busy with other things, but I
intend to resume this work in the near future.
> > Very interesting to see you working on this project. I work a lot for
> > Enterprises and it makes a lot of difference when it comes to Oracle
> > with prepared statements and data binding. In case of loops etc. we
> > had found the performance difference to be in multiples.
>
> > I was just wondering how far this development has reached and when I
> > would be able to use it. Planning a large rails project around Oracle
> > soon. It's good to know that the existing Rails code works, and
> > that in the future we'll be able to get much better performance.
>
> Good to see someone appreciates my work. :) MySQL support is probably
> done. I was still working on PostgreSQL support. There's no Oracle
> support though as I don't have access to an Oracle database, but
> modifying the Oracle adapter should be fairly trivial.
> The work is currently on hold as I'm busy with other things, but I
> intend to resume this work in the near future.
Did I hear you live in the Enschede region? :-) I do too and would
like to offer my help to implement the PostgreSQL side of things. Send
me a note if you'd like to meet up sometime and I'd gladly share
drinks and thoughts.
--
Roderick van Domburg
http://www.nedforce.nl