Hello rubyonrails-core,
I’ve been looking into possible changes to ActiveRecord / Arel to make it easier to write Rails applications that are free of SQL injection vulnerabilities, and in particular do so in a way that makes it easy for a code reviewer to verify that the app is safe from such bugs.
The concern:
-----------------
With the ActiveRecord API as is, it’s relatively easy to write applications with SQL injection bugs, and those bugs don’t particularly stand out in code reviews. For example,
Customer.where("name like "%#{params[:id]}%")
is vulnerable, while
Customer.where("name = ?”, params[:id])
Customer.where(name => params[:id])
Customer.where(Customer.arel_table[:name].matches('%#{name_pattern}%')
etc are safe.
While a careful security code review would likely spot this type of bug, there is still a significant risk that it’d be missed, in particular in an agile project with frequent releases that can’t afford to do an in-depth security review between each release.
Proposal:
-------------
I’m wondering if it would be possible to introduce a “strict arel” mode for ActiveRecord which if turned on allows only SQL-safe Arel-style queries (including queries that are internally built via Arel by ActiveRecord, such as hash queries and find_by_xyz queries).
It looks like there’d be two main aspects to an implementation of this mode:
1) A change to Arel to allow the construction of SqlLiterals to be restricted to values that are program constants (i.e., fully determined at application boot time, before the first request is served).
2) A change to ActiveRecord to allow only execution of SQL statements that have been obtained by flattening an Arel AST.
#1 is necessary for two reasons:
- There are a number of code sites in Arel itself where SqlLiterals are constructed from String’s such that it’s impossible to be sure that the String’s value is a program constant (and not derived from user input).
E.g.,
gems/arel-2.2.1/lib/arel/select_manager.rb
def group *columns
columns.each do |column|
# FIXME: backwards compat
column = Nodes::SqlLiteral.new(column) if String === column
column = Nodes::SqlLiteral.new(column.to_s) if Symbol === column
- code sites in ActiveRecord where a String is explicitly converted into a SqlLiteral, e.g. to support the Table.where(“foo = ‘#{bar}’”) form of queries:
gems/activerecord-3.1.1/lib/active_record/relation/query_methods.rb
def collapse_wheres(arel, wheres)
[...]
(wheres - equalities).each do |where|
where = Arel.sql(where) if String === where
#2 is necessary to prevent direct execution of SQL statements that are not constructed via Arel, e.g by direct calls to find_by_sql.
Implementation Ideas:
----------------------------
For #1, add a registry to Arel of constant values that may be used as SqlLiterals. If “strict Arel” mode is enabled, and the app is in production mode, the registry is “locked” at the end of application boot (after Rails.initialized? is true), and the SqlLiteral constructor would only allow creation of previously registered SQL literals.
The registry could allow registration of the actual literal values themselves, or be in terms of Symbol-indexed hashes, or both.
Any code inside Arel and ActiveRecord that instantiates SqlLiterals would need to add appropriate calls to register its literals at load time. For example, arel-2.2.1/lib/arel/expressions.rb would call,
Arel.register_sql('sum_id', ‘max_id’, ….)
In addition, application code itself could register literals at load time, to white-list custom statements, as in
class CustomersController < ApplicationController
Arel.register_sql {:customer_by_id => "name = ?"}
def show
@customer = Customer.where(:customer_by_id, params[:id])
Note that this effectively restricts the app to using only statements written in terms of bind variables, which is exactly what we want (since the statement has to be registered at load time, it can’t possibly include the value of a request parameter, e.g via string interpolation).
For an initial experiment, see below.
For #2, a possible approach might be to encapsulate strings obtained from flattening Arel in a specific type, e.g. “ArelSqlStatement”; probably in ActiveRecord::ConnectionAdapters::DatabaseStatements.to_sql .
Then (if strict arel mode is enabled), in ActiveRecord::ConnectionAdapters::XyzAdapter.select, enforce that sql === ArelSqlStatement.
Question:
------------
* Is this a terrible idea, or is there some reason this wouldn’t work in practice? Note that it would of course be optional, and developers who enable this would make a conscious decision to restrict themselves to Arel-style queries in their code, in return for higher confidence that their app is free of SQL injection bugs.
* Would you be open to accepting a patch for the above to Arel and ActiveRecord?
Experiments:
-----------------
As an experiment, I tried the following monkey patch (which obviously is missing the registration mechanism; and the whitelisting of Arel/ActiveRecord-internal SQL literals):
$ cat vendor/plugins/strict_arel/lib/strict_arel.rb
# StrictArel
require 'arel'
module Arel
WHITELISTED_RAW_SQL = ["\0", '?', '*'] by
module Nodes
class SqlLiteral
alias_method :sqlLiteralInitialize_Do_Not_Call_This_Or_Else, :initialize
def initialize(raw_sql)
if WHITELISTED_RAW_SQL.include?(raw_sql)
sqlLiteralInitialize_Do_Not_Call_This_Or_Else(raw_sql)
else
raise "non-whitelisted value for Arel::Nodes::SqlLiteral: #{raw_sql}"
end
end
end
end
This appears to pretty much work as intended:
* Basic (primary key) queries work (which are internally constructed into arel):
ruby-1.9.2-p290 :002 > Customer.first
Customer Load (0.4ms) SELECT `customers`.* FROM `customers` LIMIT 1
=> #<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">
ruby-1.9.2-p290 :021 > Customer.find(3)
Customer Load (0.2ms) SELECT `customers`.* FROM `customers` WHERE `customers`.`id` = 3 LIMIT 1
=> #<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">
* by_{column} queries work:
ruby-1.9.2-p290 :047 > Customer.find_by_name_and_credit('BooBaz', 'baz')
Customer Load (0.4ms) SELECT `customers`.* FROM `customers` WHERE `customers`.`name` = 'BooBaz' AND `customers`.`credit` = 'baz' LIMIT 1
=> #<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">
* Custom Arel-based queries work:
ruby-1.9.2-p290 :005 > t = Customer.arel_table
=> #<Arel::Table:0x000000026da950 @name="customers", @engine=ActiveRecord::Base, @columns=nil, @aliases=[], @table_alias=nil, @primary_key=nil>
ruby-1.9.2-p290 :022 > Customer.where(t[:name].eq('BooBaz'))
Customer Load (0.3ms) SELECT `customers`.* FROM `customers` WHERE `customers`.`name` = 'BooBaz'
=> [#<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">]
ruby-1.9.2-p290 :024 > Customer.where(t[:name].matches('%Boo%'))
Customer Load (0.3ms) SELECT `customers`.* FROM `customers` WHERE (`customers`.`name` LIKE '%Boo%')
=> [#<Customer id: 3, name: "BooBaz", credit: "baz", created_at: "2011-10-26 22:01:36", updated_at: "2011-10-26 22:24:40">]
* Hash queries work:
* Projections, order, etc work, but columns need to be given as arel column objects:
ruby-1.9.2-p290 :025 > Customer.select(t[:name]).where(:name => 'BooBaz').order(t[:name]).limit(10)
Customer Load (0.3ms) SELECT `customers`.`name` FROM `customers` WHERE `customers`.`name` = 'BooBaz' ORDER BY `customers`.`name` LIMIT 10
=> [#<Customer name: "BooBaz">]
** However, Queries using string fragments end up calling Arel.sql when the underlying arel is constructed, and are rejected as intended:
ruby-1.9.2-p290 :035 > Customer.where("name = 'evil'")
RuntimeError: non-whitelisted value for Arel::Nodes::SqlLiteral: name = 'evil'
from /usr/local/google/xtof/s/review/b-5393417-ghost-rails/sample-app/rubymine_sample_native_ruby/vendor/plugins/strict_arel/lib/strict_arel.rb:19:in `initialize'
from /home/xtof/.rvm/gems/ruby-1.9.2-p290/gems/arel-2.2.1/lib/arel.rb:39:in `new'
Best Regards,
Christoph