ActiveRecord::Querying#find_by_sql has a confusing interface?

2,017 views
Skip to first unread message

Matias Korhonen

unread,
Jun 25, 2015, 8:38:09 AM6/25/15
to rubyonra...@googlegroups.com
Hi,


We happened to need #find_by_sql today and noticed that it appears to have a somewhat confusing interface/method definition/documentation.

The method definition is:

def find_by_sql(sql, binds = [])
  # ...
end

Which would seem to imply that you should use it like this:

Post.find_by_sql("SELECT * FROM posts WHERE id = ?", [1])

However, that is wrong, you actually need to do (and the examples in the documentation are like this):

Post.find_by_sql(["SELECT * FROM posts WHERE id = ?", 1])

I spent some time reading the documentation and code and can't figure out what the binds argument should be used for.

Maybe some sort of example or explanation should be added to the #find_by_sql documentation? I'd do so myself, but I'm having trouble figuring it out…


— Matias

Jason Fleetwood-Boldt

unread,
Jun 25, 2015, 10:18:20 AM6/25/15
to rubyonra...@googlegroups.com

You are incorrect the documentation is accurate.

You need to have a deeper understanding of ruby, and specifically the magic of the splat (*) operator. 

You do not need to pass all the arguments inside of an array, the correct syntax is:

Post.find_by_sql("SELECT * FROM posts WHERE id = ?", 1)

‘binds’ is just what you think it is, the variables assigned to the replacements noted by the ? characters. If you had two, it would look like so:


Post.find_by_sql("SELECT * FROM posts WHERE id = ? AND foo != ?", 1, 2)

This works because of how the splat operator works under the hood, taking the arity past the first argument and turning it into an array on the receiving side of the called method (find_by_sql)

You should have a deeper understanding of the advanced parts of Ruby (like splats) before moving into Rails.

The documentation is standard according to how Ruby is documented. 

-Jason



--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-co...@googlegroups.com.
To post to this group, send email to rubyonra...@googlegroups.com.
Visit this group at http://groups.google.com/group/rubyonrails-core.
For more options, visit https://groups.google.com/d/optout.

James Coleman

unread,
Jun 25, 2015, 10:22:14 AM6/25/15
to rubyonra...@googlegroups.com
>> You do not need to pass all the arguments inside of an array, the correct syntax is:

>> Post.find_by_sql("SELECT * FROM posts WHERE id = ?", 1)

And yet that's not what the document examples do...

Does the method doc not show the splat operator? Because in the quoted method doc, the * isn't shown, which would lead to confusion even if that's what being used under the hood.

Jason Fleetwood-Boldt

unread,
Jun 25, 2015, 10:30:38 AM6/25/15
to rubyonra...@googlegroups.com
Oh I see what you mean. You’re saying that the docs example show this:

Post.find_by_sql ["SELECT title FROM posts WHERE author = ? AND created > ?", author_id, start_date]
Post.find_by_sql ["SELECT body FROM comments WHERE author = :user_id OR approved_by = :user_id", { :user_id => user_id }]


But again, knowledge of Ruby tells me that the is no programatic difference between calling a method with an array and passing arguments to an array. (Unless this has changed in Ruby, I’m pretty sure ruby basically treats these as the very same thing)

I see that in other parts of the Ruby documentation, the splat is shown in the method title (which is really just an arbitrary thing for us the developers since you never actually write this)


attr_accessible(*attributes)


So in this case I would say that the correct documentation should be:

find_by_sql(sql, *binds = [])

Matias Korhonen

unread,
Jun 25, 2015, 10:41:04 AM6/25/15
to rubyonra...@googlegroups.com
You do not need to pass all the arguments inside of an array, the correct syntax is: 
Post.find_by_sql("SELECT * FROM posts WHERE id = ?", 1)

I wonder if you actually tried that in console? Because I did, with this result:

[1] pry(main)> Post.find_by_sql("SELECT * FROM posts WHERE id = ?", 1)
NoMethodError: undefined method `empty?' for 1:Fixnum
from /Users/matt/.rvm/gems/ruby-2.2.1/gems/activerecord-4.1.10/lib/active_record/connection_adapters/abstract_adapter.rb:389:in `without_prepared_statement?'


You should have a deeper understanding of the advanced parts of Ruby (like splats) before moving into Rails.

Having used Ruby professionally for the past five years or so, I have pretty good understanding of basic Ruby behaviour (like splats), so your snark is unwarranted (and not a good way to make people feel wanted in this community).


So in this case I would say that the correct documentation should be:
find_by_sql(sql, *binds = [])

Except that's not what the method definition actually is. There is no splat in the definition. If there was, the behaviour would be different (if it wasn't for the fact that def find_by_sql(sql, *binds=[]) would be invalid ruby).



James Coleman

unread,
Jun 25, 2015, 11:03:16 AM6/25/15
to rubyonra...@googlegroups.com
@Jason, I don't think you're correct about the way Ruby functions. Splats are not implicit; they must be declared in the method definition. For example:

irb(main):001:0> def t(x, y=[])
irb(main):002:1> puts x
irb(main):003:1> puts y.inspect
irb(main):004:1> end
=> :t
irb(main):005:0> t(1, 2)
1
2
=> nil
irb(main):006:0> t(1,2,3)
ArgumentError: wrong number of arguments (3 for 1..2)
from (irb):1:in `t'
from (irb):6

So clearly if you call a method like that with two arguments, the second will not be coerced into an array like it would if you used splat, and further the method call fails if you pass more than 2 arguments. Hence splat is not implicit.

And the method definition for find_by_sql does not define a splat (see rails/activerecord/lib/active_record/querying.rb) so it cannot possibly function the way you claim it does. Have you tested it out?

I have tested it out, and it fails with more than two arguments to the method call as one would expect based on the method signature:

irb(main):002:0> Client.find_by_sql('select ?, ?', 1, 2)
ArgumentError: wrong number of arguments (3 for 1..2)
from /Users/.../.rbenv/versions/ruby-2.1/lib/ruby/gems/2.1.0/gems/activerecord-4.2.2/lib/active_record/querying.rb:38:in `find_by_sql'
from (irb):2

Nicolás Sanguinetti

unread,
Jun 25, 2015, 11:06:21 AM6/25/15
to rubyonra...@googlegroups.com, rubyonra...@googlegroups.com
> You need to have a deeper understanding of ruby, and specifically the magic of the splat (*) operator.

I think you missed that the method signature is `def find_by_sql(sql, binds = [])`, and not `def find_by_sql(sql, *binds)`. There’s no splat in the method signature.

Before trying to belittle someone, maybe make sure that you’re actually correct :P

Remember MINSWAN, people.

(As for the on-topic conversation, sorry, I have no idea what’s going on with that `binds` argument.)

Cheers,
-foca




James Coleman

unread,
Jun 25, 2015, 11:18:47 AM6/25/15
to rubyonra...@googlegroups.com
Furthermore, the definition of find_by_sql truly is confusing in because when used the way the docs say to call it, the binds variable does not contain the bind variable. For example:

irb(main):010:0> def t(x, y=[])
irb(main):011:1> puts "x: #{x.inspect}"
irb(main):012:1> puts "y: #{y.inspect}"
irb(main):013:1> end
=> :t
irb(main):014:0> t(1,2)
x: 1
y: 2
=> nil
irb(main):016:0> t([1,2])
x: [1, 2]
y: []

T.J. Schuck

unread,
Jun 25, 2015, 11:48:10 AM6/25/15
to rubyonra...@googlegroups.com
The `bind` argument was added as `bind_values` all the way back in 2010, with no change to the documentation: https://github.com/rails/rails/commit/cc468d3ec81d6f1298fca91c0549584b36dafcc6

So the docs are probably just wrong — I’m sure a documentation PR would be quickly merged.

James Coleman

unread,
Jun 25, 2015, 11:54:50 AM6/25/15
to rubyonra...@googlegroups.com
@TJ: No, I don't believe the docs are wrong. If you try the examples they only work as documented, and fail if go by what you'd expect from the method definition instead.

T.J. Schuck

unread,
Jun 25, 2015, 11:56:34 AM6/25/15
to rubyonra...@googlegroups.com
Sorry, I suppose by “wrong” I meant “incomplete”.

There is a `binds` argument, and it seems to be used, but it is not documented.

James Coleman

unread,
Jun 25, 2015, 11:58:11 AM6/25/15
to rubyonra...@googlegroups.com
What is the binds argument used for? It's not immediately clear from the source, and using it as one would expect doesn't work either.

Jason Fleetwood-Boldt

unread,
Jun 25, 2015, 12:26:30 PM6/25/15
to rubyonra...@googlegroups.com
You were right, I apologize. 


Matias Korhonen

unread,
Jun 26, 2015, 3:01:05 AM6/26/15
to rubyonra...@googlegroups.com
There's also no test case (as far as I can find) that would cover passing the binds args to #find_by_sql

Anyone know what it's supposed to be used for?

Al Tenhundfeld

unread,
Jun 26, 2015, 1:05:47 PM6/26/15
to rubyonra...@googlegroups.com
If I'm following the code, it looks like the binds argument is passed through #select_all and eventually down to the adapter's #exec_query, e.g., the MySQL adapter. So, the behavior could be different depending on the adapter and even DB version. That makes sense, because different databases handle prepared statements and bind variables differently.

And it looks like the documentation example of #find_by_sql, passing the array with "bind variables" into the sql arg, works because that sql arg is being passed to #sanitize_sql, which has its own implementation of replacing variables outside of/before the adapter. So, using this syntax isn't really using bind variables, passed to the database. It's more string interpolation and sanitation.

So, following the code, I was able to get a query executing with the Postgres adapter and actually passing the bind variables to Postgres.

Example 1, using binds arg:
Project.find_by_sql('select * from projects where client_id = $1 and region = $2', [[nil, 123], [nil, 'West']])
(NOTE: You can use the column name in place of nil, but I get a warning when I do that.)
Results in this statements in the PG log:
LOG:  execute a3: select * from projects where client_id = $1 and region = $2
DETAIL:  parameters: $1 = '123', $2 = 'West'

Example 2, specifying the variables in the sql arg:
Project.find_by_sql(['select * from projects where client_id = ? and region = ?',123, 'West'])
Results in:
LOG:  execute <unnamed>: select * from projects where client_id = 123 and region = 'West'

So, you can see those are clearly different. In example 1, the statement is being prepared, and the bind variables are passed to Postgres. In example 2, the variables are being replaced by Rails via #sanitize_sql before the statement is passed on to the adapter, which results in an unnamed statement passed to Postgres with no bind variables. (I think, that's how I'm interpreting it at least.)

I couldn't get a true bind variable example working with MySQL, but I don't have a convenient environment using recent versions of the adapter and MySQL server. I think this is related to Pat Shaughnessy's post on prepared statements.

Following that logic, it's not too surprising that standard model methods result in different behavior in PG and MySQL:

Project.find(1) using the Postgres adapter results in:
LOG:  execute a4: SELECT  "projects".* FROM "projects"  WHERE "projects"."id" = $1 LIMIT 1
DETAIL:  parameters: $1 = '1'

Project.find(1) using (my older version of) the MySQL adapter results in the MySQL equivalent of:
LOG:  execute <unnamed>: SELECT  "projects".* FROM "projects"  WHERE "projects"."id" = 1 LIMIT 1

I haven't really kept up with database internals enough to know how much of an improvement prepared statements yield. Years ago, prepared statements could give you a significant performance improvement, because the query parser can easily see two queries are the same, reuse copies of the execution plan, etc. My understanding is databases have gotten better about deriving similarities in queries and reusing plans, cached results, etc.

I don't have an opinion (yet) on what should be done to improve the situation, but I hope that sheds a little light on what's happening with the binds arg and why you might want to use it.

Cheers,
Al




Roque Pinel

unread,
Jun 26, 2015, 2:10:58 PM6/26/15
to rubyonra...@googlegroups.com
I believe the binds are used to improve the statement caching and not suggested to be used directly.

Here is an example of an internal use when `Post.find(1)` executed:

id_query_attribute = ActiveRecord::Relation::QueryAttribute.new(:id, 1, Post.type_for_attribute(:id))
Post.find_by_sql('select * from posts where id = ?', [id_query_attribute])

Al Tenhundfeld

unread,
Jun 26, 2015, 2:19:55 PM6/26/15
to rubyonra...@googlegroups.com
Ah, QueryAttribute was the part I was missing, didn't have time to dig through the code and follow how something like #find was specifying the bind variables (correctly). That makes sense.

Thanks,
Al
Reply all
Reply to author
Forward
0 new messages