Adding support for escaping literal query parameters

2,326 views
Skip to first unread message

Zaher Hammoud

unread,
Feb 8, 2013, 4:53:27 PM2/8/13
to mybati...@googlegroups.com
Hello,

We are using mybatis in an enterprise application and it is working for us very well. Our application deals with a large set of data and performance is one of our main concerns. Not all of the application queries running against the database use bind variable. Some queries use literal values to hint the optimizer to select an efficient SQL execution plan.  Mybatis requires us to escape literal values for SQL injection prior to be passed to the mapper. It would be nice if mybatis can do the escaping for the literal values without the client having to do that every time. I am proposing a new feature in mybatis to allow such behavior to happen:

For example:

Interface
Interface MyMapper.java {
     Collection<User> getUserData(@LiteralParam("userName") String userName);
}

MyMapper.xml 
    <select id="getUserData"  resultMap="userDataMap">
          SELECT *
            FROM User_Table t
             WHERE t.user_name = #{userName}
     </select>

Client:
 mapper.getUserData("my'user");

Output SQL:

SELECT *
            FROM User_Table t
             WHERE t.user_name = 'my''user' (single quotes were escaped by mybatis)

In the mapper interface, the parameter that needs to be translated into a literal value will be annotated with a new annotation: @LiteralParam. In the mapper xml, the parameter will be represented by #{} and mybatis will automatically change that to a safe literal value in the query.

Please let me know what you think of this proposed feature.

Thanks,

Zaher

Jeff Butler

unread,
Feb 8, 2013, 10:16:18 PM2/8/13
to mybati...@googlegroups.com
MyBatis already supports literals through the use of ${} instead of #{}. 

As for automatic escaping, I'm not really in favor of this. It is a huge assumption to think that the single quote character should always be escaped. Most of the time this is true, but not always. Commons Lang removed support for SQL escaping for this very reason. 

Jeff Butler
--
You received this message because you are subscribed to the Google Groups "mybatis-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mybatis-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Larry Meadors

unread,
Feb 8, 2013, 10:58:34 PM2/8/13
to mybati...@googlegroups.com
I'm with Jeff - that sort of functionality is a nightmare to support
on every random database. IMO, it should be an application concern.

You could wrap the mapper (it's just an interface) and do the escaping
in a class that wraps the real mybatis mapper.

You could make the behavior exactly like you described, but not
complicate the framework for everyone else.

Larry

Zaher Hammoud

unread,
Feb 11, 2013, 12:54:37 PM2/11/13
to mybati...@googlegroups.com, larry....@gmail.com
Thank you Jeff and Larry for the quick and great feedback. I agree with you that supporting all database versions is going to become a maintenance nightmare. You have also convinced me that this functionality should be externalized for the reasons you have pointed out which are valid in my opinion.  At the same time I am not in favor of introducing a layer on top of  mybatis mappers to ensure that our literal parameter values are safe from SQL injection.  This will add another layer of complexity to our application which I think could be avoided. Another suggestion which I like to get your input on is to introduce a formatter for the string substitution parameter which can be custom implemented by the user. For example :

<select id="getUserData"  resultMap="userDataMap">
          SELECT *
            FROM User_Table t
             WHERE t.user_name = ${userName, formatter=MySingleQuoteEscaper}
     </select>

<select id="getUserData"  resultMap="userDataMap">
          SELECT *
            FROM User_Table t
             WHERE t.login_datetime = ${loginDatetime, formatter=MyDatetimeFormater}
     </select>

I think this is more flexible and can be used for other purposes as well.
 

Jeff Butler

unread,
Feb 11, 2013, 1:03:47 PM2/11/13
to mybati...@googlegroups.com
We recommend that you ALWAYS use bind variables if possible.  That is the true way to guard against SQL injection, and it relieves you of all the escaping/formatting problems also.

We support string substitution - ${} - because there are a limited set of use cases where is can be handy (like specifying a dynamic optimizer hint).  What you've shown below should be changed to use bind variables - #{}.

You might want to read the FAQ to make sure you understand the difference between ${} and #{}.


Jeff Butler


 

--

Zaher Hammoud

unread,
Feb 11, 2013, 10:23:36 PM2/11/13
to mybati...@googlegroups.com
We strictly prefer to use bind variables in all our queries but we run into performance problems in two areas due to bind variable peeking and plan stability:

1)      High skew values that have Frequency histograms on the column

2)      Range-based parameters like date BETWEEN and LIKE clauses.


To address the first issue, we sometimes drop the histogram on a column so that the optimizer is not fooled by the absence of a value in the histogram and therefore treats all distinct values equally: For example, in a table of 100 rows and 10 distinct values, the optimizer will assume that you will get 10 rows back every time. This will give us a single plan with still using bind variables. This situation is good when we truly want & expect the same plan regardless of the parameter.  A good example is a newly onboarded customer who very quickly generates many tens of thousands of rows, but is statistically insignificant compared to a long-standing customer.

 

For the second problem, we use the literal parameters to get different execution plans by forcing the hard parse if the consequence of a poor plan choice is high risk/impact to the system. For example:

 

A table has two columns: Category and Value. Category always has cardinality of 100.“WHERE value LIKE ‘A%’ “ has a cardinality of 1000 but “WHERE value LIKE ‘ABCDE%’ “has a lower cardinality of 10. When running the trailing wildcard query we want different plans based on the query parameter: Sometimes we want to drive by the index on“category” column other times by index on “value” column. With the bind variable, the plan is pinned the first time the query is parsed. In this case we tend to use literal parameters in the query.

 

A similar example would apply with “WHERE date_column BETWEEN ? and ? “when the user-entered date ranges might be a day, a week, a month or a year.  Narrow date range? Use the index on the date column? Wide date range? Use a more selective scan.

 

We know parsing a query is an expensive operation, but we weigh the parsing cost against what the query is trying to do. For high occurrence and repetitive queries, we use bind variables always.

 


We do use hints to diagnose a problem but do not consider them to be good for production code.  Call it a policy of ‘non-interference’ with the optimizer when the pattern of growth in our large data model is not predictable.

 


Zaher

Jeff Butler

unread,
Feb 12, 2013, 9:37:44 AM2/12/13
to mybati...@googlegroups.com
This is a very interesting post.  You've definitely done your research.  Thanks for sending it.

I think we're not very interested in adding framework support for something you can do easily enough in your own code - but I haven't looked to see what the impact would be in MyBatis.  If you can come up with a simple patch for this, then add it to an enhancement ticket and we'll see if the community supports it.

Jeff Butler


--

Frank Martínez

unread,
Feb 12, 2013, 9:49:26 AM2/12/13
to mybati...@googlegroups.com
Hi Guys,

I am pretty sure it can be do easily with our new Velocity Scripting Driver. I will post an example.... give me some minutes ;)

Cheers,

Frank.
--
Frank D. Martínez M.

Frank Martínez

unread,
Feb 12, 2013, 10:30:49 AM2/12/13
to mybati...@googlegroups.com
Ok, using velocity scripting driver:

1. Write your own util (formatter)

public class CustomUtil {
  public String esc(final String s) {
    return s.replace("'", "''");
  }    
}

2. Call it from your mapped statement (xml example)

  <select id="MyStatement" resultType="...">
    SELECT *
    FROM names
    WHERE firstName = '$_parameter.util.esc($_parameter.text)'
  </select>

3. Put your util in context on call

      Map param = new HashMap();
      param.put("text", "Rock 'n Roll");
      param.put("util", new CustomUtil());
      List<XXX> answer = sqlSession.selectList("MyStatement", param);

I hope it can be done better, but it is just a quick and dirty example ;)

Cheers,

Frank.

Zaher Hammoud

unread,
Feb 12, 2013, 1:18:20 PM2/12/13
to mybati...@googlegroups.com
Thank you Jeff and Frank. We are willing to spend the time working on the patch and submitting it for review. Before that we will look at the velocity driver to see if it can solve the problem. It looks promising !

Frank, Do you know if we need to pass the formatter object every time to the query or it can be registered as a global type once and called anytime?

Thanks,

Zaher


Frank Martínez

unread,
Feb 12, 2013, 2:32:15 PM2/12/13
to mybati...@googlegroups.com
I think it can be done with the new <bind> tag using OGNL static method calls and avoid the velocity driver. I will do some experiments and post my results here.


--
You received this message because you are subscribed to the Google Groups "mybatis-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mybatis-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Frank Martínez

unread,
Feb 12, 2013, 2:50:27 PM2/12/13
to mybati...@googlegroups.com
It works like a charm:

1. Write your formatter as an static method in an utility class:

package org.apache.ibatis.submitted.dynsql;

public class CustomUtil {
    public static String esc(final String s) {
        return s.replace("'", "''");
    }    
}

2. Use it in the mapped statement directly using <bind ...>

  <select id="your-statement" resultType="map" parameterType="string">
    <bind name="escapedLiteral" value="@org.apache.ibatis.submitted.dynsql.CustomUtil@esc(_parameter)" />
    SELECT *
    FROM ibtest.names
    WHERE description LIKE '${escapedLiteral}'
    ORDER BY id
  </select>

What do you think?

Zaher Hammoud

unread,
Mar 11, 2013, 10:59:45 AM3/11/13
to mybati...@googlegroups.com
Hi,

Attached is the proposed patch for adding support to escape/format literal parameters in mybatis. The functionality works as follow :

1) Creates a custom formatter java class:
public class TrailingWildCardFormatter implements Formatter {
@Override
public String format(Object val) {
return val == null ? "" :  "'" + val.toString().replaceAll("\'", "\''") + "%" + "'" ;
}
}

2) Register a type alias for the new custom formatter class in the mapper configuration XML file:
 <typeAliases>
    <typeAlias alias="trailingWildCardFormatter" type="com.mypackage.TrailingWildCardFormatter"/>
  </typeAliases>



3) Use it in your mapper XML File:

  <select id="formatterWithSimpleType" resultType="map" parameterType="string">
    SELECT *
    FROM ibtest.names
    WHERE description LIKE ${value, formatter=trailingWildCardFormatter}
    ORDER BY id
  </select>

I think this functionality will give us more flexibility in dealing with the string substitution parameters "${}". 
Please let me know what you think and if I need to create an enhancement issue on (https://github.com/mybatis/mybatis-3/issues)

Thanks,

Zaher
mybatis-3.patch

Jeff Butler

unread,
Mar 11, 2013, 11:03:08 AM3/11/13
to mybati...@googlegroups.com
Frank showed a way to accomplish this same thing with no change to MyBatis.  Did you try that?

Jeff Butler




Thanks,

Zaher

--

Zaher Hammoud

unread,
Mar 11, 2013, 12:42:52 PM3/11/13
to mybati...@googlegroups.com
Jeff,

Frank and I worked on this together through emails and this is what I think of the three options available to us so far:

1) Using <bind> OGNL static method calls
    This option is very verbose in the mapper xml. When the query has multiple literal parameters it will become hard to read. Refactoring will become difficult because of the direct reference to the fully qualified class name.

2) Using mybatis-scripting plugin 
  Frank and I worked together  on an enhancement to allow passing formatter objects to the velocity context. This solved the problem and it is way nicer than option one. 

3) Using the new proposed feature:
    Aliasing and creating formatters that are applied to the String substitution parameters at run time.

Option 2 did solve things well for us but option 3 has the following benefits :

  a - Shops may not want to use a scripting engine
  b - The scripting of choice might not be velocity :  For example we use freemarker
  c - It is consistent with the TypeHandler functionality that is currently built in mybatis


Thanks,

Zaher
Reply all
Reply to author
Forward
0 new messages