Deprecate ActiveSupport::SafeBuffer modifications

118 views
Skip to first unread message

Nash Bridges

unread,
Nov 21, 2014, 1:42:30 PM11/21/14
to rubyonra...@googlegroups.com
In this proposal every method which is considered as "unsafe" for `ActiveSupport::SafeBuffer` is going to be deprecated, both bang and non-destructive versions.

At first glance, it seems that this would violate Liskov substitution principle, because  `ActiveSupport::SafeBuffer` is a subclass of `String`. However, let's step back and remember what's the purpose of safe strings.

Safebuffer strings were introduced in Rails 3 in order to distinguish between strings that are already sanitized, that don't need to be sanitized and non-safe strings. Consider this ERB template:

<div>Hello!</div>
<%= params[:evil_input] %>
<%= link_to params[:evil_input], users_path %>

The first string is static therefore it is completely safe and shouldn't be sanitized at all.
The second one is definitely unsafe and should be sanitized.
The third one is made by Rails helper which returns safe strings and shouldn't be sanitized. However, parameter itself is unsafe and should be sanitized by Rails before wrapping it into <a></a>.

When ERB is compiled into a Ruby method, it wraps each dynamic string into a sanitizing method call, which in turn must have a way to distinguish between second and third row. This is when Safebuffers are really used. They respond with +true+ to +html_safe?+, while other strings and objects (except numbers) respond with +false+.

So, this is completely a view concept and even could be considered as a part of ActionView private API, because nobody uses safebuffer to pass data throughout whole application. SafeBuffers were not designed to enhance Ruby String in some way that other parsing/converter libraries could benefit from. As for subclassing String, [here][yehudakatz] explains that this was performance only decision ("Because SafeBuffer inherits from String, Ruby creates this wrapper extremely efficiently (just sharing the internal char * storage).") Should we bother if some methods from String are missing then?

Back to modifications and "unsafe" methods. It turned out that if we would perform, for example, +gsub+ on already sanitized string that was wrapped into a SafeBuffer, we couldn't call it safe anymore. This is because one could revert any changes made by a sanitizer by another series of +gsub("&lt;", "<")+ etc. So the decision was made to return raw string for non-destructive versions of such unsafe methods and mark SafeBuffer as non-safe for destructive (bang) ones.

As for +gsub!+ and its friends. Not only "unsafe SafeBuffer" sounds ridiculous, but this highly complicated the class implementation! Let's look how it could look like:

module ActiveSupport #:nodoc:
  class SafeBuffer < String
    UNSAFE_STRING_METHODS = %w(
      capitalize chomp chop delete downcase gsub lstrip next reverse rstrip
      slice squeeze strip sub succ swapcase tr tr_s upcase
    )

    alias_method :safe_concat, :concat

    class SafeConcatError < StandardError
      def initialize
        super 'Could not concatenate to the buffer because it is not html safe.'
      end
    end

    def initialize_copy(other)
      raise SafeConcatError unless other.html_safe?
      super
    end

    def clone_empty
      self[0, 0]
    end

    def concat(value)
      super(ERB.unwrapped_html_escape(value))
    end
    alias << concat

    def prepend(value)
      super(ERB.unwrapped_html_escape(value))
    end

    def +(other)
      dup.concat(other)
    end

    def %(args)
      case args
      when Hash
        escaped_args = Hash[args.map { |k,arg| [k, ERB.unwrapped_html_escape(arg)] }]
      else
        escaped_args = Array(args).map { |arg| ERB.unwrapped_html_escape(arg) }
      end

      self.class.new(super(escaped_args))
    end

    def html_safe?
      true
    end

    def to_s
      self
    end
    alias_method :html_safe, :to_s

    def to_param
      to_str
    end

    def encode_with(coder)
      coder.represent_scalar nil, to_str
    end

    UNSAFE_STRING_METHODS.each do |unsafe_method|
      if unsafe_method.respond_to?(unsafe_method)
        class_eval <<-EOT, __FILE__, __LINE__ + 1
          def #{unsafe_method}(*args, &block)
            raise NoMethodError
          end

          def #{unsafe_method}!(*args)
            raise NoMethodError
          end
        EOT
      end
    end
  end
end

Now, aint' that nice? [Here][output_safety] you can compare it with the current implementation. By declaring safebuffers *always* safe we make it easier to understand, and we have less code to perform! ([see][tenderlove_rant] for more information about +#safe_concat+). We also always return +self+ for +html_safe+ (there were a [couple][html_safe_pull_1] rejected [approaches][html_safe_pull_2] to enhance that).

One would agree that destructive modifications is true evil, but what's wrong with +gsub+? Well, first of all, it [doesn't work as expected][broken_gsub]. [Here][deprecate_gsub_block] is a proposal to fix that, but we see that this doesn't solve the problem entirely. Moreover, it leads to writing less performant and (possibly) vulnerable code. Consider this example code:

user_input = ERB.unwrapped_html_escape(params[:evil_input]
string_that_is_going_to_view = "<div>#{user_input}</div>".html_safe

This code is 100% safe and efficient. Suddenly, a programmer decides to change something

patched_string = string_that_is_going_to_view.gsub(",", ".")

Now, his "<div"> is escaped by ERB, this is no good. What's the solution? Throw another +html_safe+ !

patched_string = string_that_is_going_to_view.gsub(",", ".").html_safe

Wha'ts wrong with this code? First of all, it creates two SafeBuffer objects while first is disposed immediately. [Here][tenderlove_unwrapped] @tenderlove clearly states that this is wrong approach and every object counts! Next, it can possibly lead to XSS attack. Consider this variant of code above:

patched_string = string_that_is_going_to_view.gsub("[", "<").gsub("]", ">").html_safe

Now, even if +user_input+ was previously sanitized it can be turned into unsafe again.

One could ask "isn't it a user decision to take care of such situations?" I would answer that Rails is opinionated and should force a user to write better code. When I turned deprecation warnings on all unsafe methods and run Rails test suite, I saw *a lot* of deprecations on my screen. There're even specially written tests that checks if SafeBuffer works correctly with +#titleize+ and its family (which obviously modies the passed string). Not to mention that Rails views itself use +gsub!+ as well.

So, if those methods are being deprecated what should programmer do? He/she must realise that any change should be made *before* wrapping a string into Safebuffer. This would be fast and efficient. If it's impossible, then call +to_str+, do a modification and instantiate a new safebuffer. No changes with current approach, except it is explicit now (so a user would feel the pain and maybe decide to rewrite the code). For a library writer (who doesn't know beforehand what kind of data will be passed into a method) I would propose ActiveSupport to have monkeypatched +Kernel.String+ method, so it would return string from a safebuffer, otherwise call +to_s_ as usual.

Reply all
Reply to author
Forward
0 new messages