A "strict Arel" mode for ActiveRecord to prevent SQL injection vulnerabilities

Christoph Kern

unread,

Nov 22, 2011, 1:11:18 PM11/22/11

to rubyonra...@googlegroups.com, aaron.p...@gmail.com, mic...@koziarski.com, igri...@google.com

Hello rubyonrails-core,

I’ve been looking into possible changes to ActiveRecord / Arel to make it easier to write Rails applications that are free of SQL injection vulnerabilities, and in particular do so in a way that makes it easy for a code reviewer to verify that the app is safe from such bugs.

The concern:
-----------------

With the ActiveRecord API as is, it’s relatively easy to write applications with SQL injection bugs, and those bugs don’t particularly stand out in code reviews. For example,
  Customer.where("name like "%#{params[:id]}%")
is vulnerable, while
  Customer.where("name = ?”, params[:id])
  Customer.where(name => params[:id])
  Customer.where(Customer.arel_table[:name].matches('%#{name_pattern}%')
etc are safe.

While a careful security code review would likely spot this type of bug, there is still a significant risk that it’d be missed, in particular in an agile project with frequent releases that can’t afford to do an in-depth security review between each release.

Proposal:
-------------

I’m wondering if it would be possible to introduce a “strict arel” mode for ActiveRecord which if turned on allows only SQL-safe Arel-style queries (including queries that are internally built via Arel by ActiveRecord, such as hash queries and find_by_xyz queries).

It looks like there’d be two main aspects to an implementation of this mode:

1) A change to Arel to allow the construction of SqlLiterals to be restricted to values that are program constants (i.e., fully determined at application boot time, before the first request is served).

2) A change to ActiveRecord to allow only execution of SQL statements that have been obtained by flattening an Arel AST.

#1 is necessary for two reasons:
- There are a number of code sites in Arel itself where SqlLiterals are constructed from String’s such that it’s impossible to be sure that the String’s value is a program constant (and not derived from user input).

E.g.,
gems/arel-2.2.1/lib/arel/select_manager.rb
  def group *columns
    columns.each do |column|
      # FIXME: backwards compat
      column = Nodes::SqlLiteral.new(column) if String === column
      column = Nodes::SqlLiteral.new(column.to_s) if Symbol === column

- code sites in ActiveRecord where a String is explicitly converted into a SqlLiteral, e.g. to support the Table.where(“foo = ‘#{bar}’”) form of queries:

gems/activerecord-3.1.1/lib/active_record/relation/query_methods.rb
  def collapse_wheres(arel, wheres)
[...]
    (wheres - equalities).each do |where|
      where = Arel.sql(where) if String === where

#2 is necessary to prevent direct execution of SQL statements that are not constructed via Arel, e.g by direct calls to find_by_sql.

Implementation Ideas:
----------------------------

For #1, add a registry to Arel of constant values that may be used as SqlLiterals. If “strict Arel” mode is enabled, and the app is in production mode, the registry is “locked” at the end of application boot (after Rails.initialized? is true), and the SqlLiteral constructor would only allow creation of previously registered SQL literals.

The registry could allow registration of the actual literal values themselves, or be in terms of Symbol-indexed hashes, or both.

Any code inside Arel and ActiveRecord that instantiates SqlLiterals would need to add appropriate calls to register its literals at load time. For example, arel-2.2.1/lib/arel/expressions.rb would call,
  Arel.register_sql('sum_id', ‘max_id’, ….)

In addition, application code itself could register literals at load time, to white-list custom statements, as in

class CustomersController < ApplicationController
Arel.register_sql {:customer_by_id => "name = ?"}
def show
  @customer = Customer.where(:customer_by_id, params[:id])

Note that this effectively restricts the app to using only statements written in terms of bind variables, which is exactly what we want (since the statement has to be registered at load time, it can’t possibly include the value of a request parameter, e.g via string interpolation).

For an initial experiment, see below.

For #2, a possible approach might be to encapsulate strings obtained from flattening Arel in a specific type, e.g. “ArelSqlStatement”; probably in ActiveRecord::ConnectionAdapters::DatabaseStatements.to_sql .
Then (if strict arel mode is enabled), in ActiveRecord::ConnectionAdapters::XyzAdapter.select, enforce that sql === ArelSqlStatement.

Question:
------------

* Is this a terrible idea, or is there some reason this wouldn’t work in practice? Note that it would of course be optional, and developers who enable this would make a conscious decision to restrict themselves to Arel-style queries in their code, in return for higher confidence that their app is free of SQL injection bugs.

* Would you be open to accepting a patch for the above to Arel and ActiveRecord?

Experiments:
-----------------

As an experiment, I tried the following monkey patch (which obviously is missing the registration mechanism; and the whitelisting of Arel/ActiveRecord-internal SQL literals):

$ cat vendor/plugins/strict_arel/lib/strict_arel.rb
# StrictArel

require 'arel'
module Arel
WHITELISTED_RAW_SQL = ["\0", '?', '*'] by
module Nodes
  class SqlLiteral
    alias_method :sqlLiteralInitialize_Do_Not_Call_This_Or_Else, :initialize

    def initialize(raw_sql)
      if WHITELISTED_RAW_SQL.include?(raw_sql)
        sqlLiteralInitialize_Do_Not_Call_This_Or_Else(raw_sql)
      else
        raise "non-whitelisted value for Arel::Nodes::SqlLiteral: #{raw_sql}"
      end
    end
  end

end

This appears to pretty much work as intended:

* Basic (primary key) queries work (which are internally constructed into arel):

ruby-1.9.2-p290 :002 > Customer.first
Customer Load (0.4ms) SELECT `customers`.* FROM `customers` LIMIT 1
=> #<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">

ruby-1.9.2-p290 :021 >   Customer.find(3)
Customer Load (0.2ms) SELECT `customers`.* FROM `customers` WHERE `customers`.`id` = 3 LIMIT 1
=> #<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">

* by_{column} queries work:

ruby-1.9.2-p290 :047 > Customer.find_by_name_and_credit('BooBaz', 'baz')
Customer Load (0.4ms) SELECT `customers`.* FROM `customers` WHERE `customers`.`name` = 'BooBaz' AND `customers`.`credit` = 'baz' LIMIT 1
=> #<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">

* Custom Arel-based queries work:

ruby-1.9.2-p290 :005 > t = Customer.arel_table
=> #<Arel::Table:0x000000026da950 @name="customers", @engine=ActiveRecord::Base, @columns=nil, @aliases=[], @table_alias=nil, @primary_key=nil>

ruby-1.9.2-p290 :022 > Customer.where(t[:name].eq('BooBaz'))
Customer Load (0.3ms) SELECT `customers`.* FROM `customers` WHERE `customers`.`name` = 'BooBaz'
=> [#<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">]
ruby-1.9.2-p290 :024 > Customer.where(t[:name].matches('%Boo%'))
Customer Load (0.3ms) SELECT `customers`.* FROM `customers` WHERE (`customers`.`name` LIKE '%Boo%')
=> [#<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">]

* Hash queries work:
* Projections, order, etc work, but columns need to be given as arel column objects:

ruby-1.9.2-p290 :025 > Customer.select(t[:name]).where(:name => 'BooBaz').order(t[:name]).limit(10)
Customer Load (0.3ms) SELECT `customers`.`name` FROM `customers` WHERE `customers`.`name` = 'BooBaz' ORDER BY `customers`.`name` LIMIT 10
=> [#<Customer name: "BooBaz">]

** However, Queries using string fragments end up calling Arel.sql when the underlying arel is constructed, and are rejected as intended:

ruby-1.9.2-p290 :035 > Customer.where("name = 'evil'")
RuntimeError: non-whitelisted value for Arel::Nodes::SqlLiteral: name = 'evil'
      from /usr/local/google/xtof/s/review/b-5393417-ghost-rails/sample-app/rubymine_sample_native_ruby/vendor/plugins/strict_arel/lib/strict_arel.rb:19:in `initialize'
      from /home/xtof/.rvm/gems/ruby-1.9.2-p290/gems/arel-2.2.1/lib/arel.rb:39:in `new'

Best Regards,
Christoph

Cynthia Kiser

unread,

Nov 27, 2011, 12:29:49 PM11/27/11

to rubyonra...@googlegroups.com

With the ActiveRecord API as is, it’s relatively easy to write applications with SQL injection bugs, and those bugs don’t particularly stand out in code reviews. For example,
Customer.where("name like "%#{params[:id]}%")

While a careful security code review would likely spot this type of bug, there is still a significant risk that it’d be missed, in particular in an agile project with frequent releases that can’t afford to do an in-depth security review between each release.

It is possible to overlook user input in string interpolations - especially if they have been given a temporary variable name, not just params[thing]. But that is one of the things Brakeman is good at detecting already: https://github.com/presidentbeef/brakeman

--
Cynthia Kiser
cynthi...@gmail.com

Ilya Grigorik

unread,

Nov 28, 2011, 3:16:58 PM11/28/11

to rubyonra...@googlegroups.com

Hmm, brakeman looks interesting. Having said that, static analysis is a nice security blanket, but it would still be nice to have an enforceable runtime policy that Christoph is alluding to.

This seems to be related also: https://groups.google.com/forum/#!topic/rubyonrails-core/hvfx2MOLnoU - Christoph, any thoughts?

ig

Anuj Dutta

unread,

Nov 29, 2011, 2:49:31 AM11/29/11

to rubyonra...@googlegroups.com

https://groups.google.com/forum/#!topic/rubyonrails-core/hvfx2MOLnoU - This looks good but there have been instances where some issue or the other vulnerabilities have made their way through SafeBuffer, if I remember correctly. This again could be due to the way the app developer writes the code.

So, Christoph's approach is ofcourse better, quite detailed and well thought-out. I just feel that it expects some work from the app developers which could make it difficult to adopt.

Just my two cents.

Anuj

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-core/-/Xiw70fs5eo0J.

To post to this group, send email to rubyonra...@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-co...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en.

--
Anuj DUTTA

Christoph Kern

unread,

Nov 29, 2011, 12:40:11 PM11/29/11

to rubyonra...@googlegroups.com

Thanks for the comments!

* Re: brakeman. Looks like an interesting and useful too, however it appears that it only can only find relatively specific patterns of bugs, and therefore doesn't provide particularly high confidence that an app is free of SQL injection (or other classes of bugs).

From a brief look, what it appears to be doing is to essentially pattern-match over the parse tree of the app's source code. However, it does not appear to do any static data flow analysis (except for basic in-lining of variables).

While it does flag,

condition = "name like '%#{params[:name]}%'"

@post = Post.where(condition)

it doesn't flag either

if true

condition = "name like '%#{params[:name]}%'"

else

condition = "foo"

end

@post = Post.where(condition)

nor,

@post = Post.where(build_where_clause(params[:name]))

(even if build_where_clause is defined in the same class).

This is not surprising; in a dynamic language like ruby it's fundamentally impossible to statically derive an accurate control- and data-flow graph from source code.

* Re: SafeBuffer:

If I understand correctly, a SafeBuffer is essentially a string that comes with the promise that it's safe to use in a particular context (in the existing case, HTML).

It seems that this is a useful concept that could be used in conjunction with the approach I've proposed, but doesn't replace it. In a way, the ArelSqlStatement type I've proposed (to represent SQL that was obtained by flattening an Arel AST) is essentially a kind of SqlSafeBuffer. I.e., one way of creating a SqlSafeBuffer is to build an Arel AST and flatten it into string. However, I think we would still want to modify Arel itself to ensure that SQL literals are indeed program constants (in order to guarantee that flattened Arel ASTs are indeed safe).

It's certainly worth considering to add additional ways of creating SqlSafeBuffers, e.g. via string interpolation that automatically quotes values.

If there are other ways of constructing SqlSafeBuffers, it might also make sense to allow Arel leaf nodes of kind SqlSafeBuffer (which would not be quoted during flattening).

That said, I would be concerned about how to ensure that SqlSafeBuffers are in fact constructed safely. It seems nothing would prevent a programmer from writing

"name like '%#{param[:name]}%' ".sql_safe

Of course, this _should_ stick out in code reviews, but relying on that would be just as brittle as relying on a code review catching Foo.where("name like '%#{param[:name]}%' ")

On Mon, Nov 28, 2011 at 23:49, Anuj Dutta <dutta...@googlemail.com> wrote:

https://groups.google.com/forum/#!topic/rubyonrails-core/hvfx2MOLnoU - This looks good but there have been instances where some issue or the other vulnerabilities have made their way through SafeBuffer, if I remember correctly. This again could be due to the way the app developer writes the code.

So, Christoph's approach is ofcourse better, quite detailed and well thought-out. I just feel that it expects some work from the app developers which could make it difficult to adopt.

Well, it would essentially require that developers either

- write their queries as Arel. I suppose this really isn't extra work; rather it's a different style of coding (and if I understand correctly, considered the "proper way" of writing queries in Rails 3?)

- if they want to use prepared statements, they'd have to write a separate call to register the statement. This is indeed a little bit of extra typing, and does have somewhat of a negative impact on readability.

The benefit for this cost would be a reasonably high degree of confidence that an app is free of SQL injection bugs. Which are very serious bugs (often resulting in complete compromise of an application's data), and hence many developers should be willing to make that trade off.