Issues with servlet filter to prevent XSS

19 views
Skip to first unread message

David Karr

unread,
Jul 15, 2022, 11:11:14 AM7/15/22
to esapi-project-users
For quite a while now, the platform I support has had a servlet filter intended to prevent XSS attacks.

It looks something like this:
--------------------
private static final List<String> ACCEPTABLE_MEDIA_TYPES = Arrays.asList(MediaType.APPLICATION_JSON,
MediaType.APPLICATION_XML);

@Override
public void doFilter(ServletRequest servletRequest, ServletResponse response, FilterChain chain) throws IOException, ServletException {
    if (ACCEPTABLE_MEDIA_TYPES.contains(servletRequest.getContentType())) {
        FilterRequestWrapper filterRequestWrapper = null;
        if (servletRequest instanceof FilterRequestWrapper) {
            filterRequestWrapper = (FilterRequestWrapper) servletRequest;
        } else {
            filterRequestWrapper = new FilterRequestWrapper((HttpServletRequest) servletRequest);
        }
        this.sanitizeRequestBody(filterRequestWrapper);
        chain.doFilter(filterRequestWrapper, response);
    } else {
        chain.doFilter(servletRequest, response);
    }
}

private void sanitizeRequestBody(FilterRequestWrapper filterRequestWrapper) throws IOException {
     // Set the sanitized request body back into the request.
    filterRequestWrapper.resetInputStream(Encode.forHtmlContent(filterRequestWrapper.getBody()).getBytes(StandardCharsets.UTF_8));
}
-----------------

Where "Encode" is "org.owasp.encoder.Encode" and FilterRequestWrapper extends HttpServletRequestWrapper.

I recently discovered a likely problem with this, particularly in the "sanitizeRequestBody" method. I read that I should be canonicalizing the input before encoding it. As a result, I changed that line to the following:

    filterRequestWrapper.resetInputStream(Encode.forHtmlContent(
ESAPI.encoder().canonicalize(filterRequestWrapper.getBody())).getBytes(StandardCharsets.UTF_8));

David Karr

unread,
Jul 15, 2022, 11:15:38 AM7/15/22
to esapi-project-users
Sigh. Gmail sent that before I was done, but I was almost finished.

Before this change, we had situations where data was "doubly-encoded" (and often many more than that).  The canonicalization step prevents double encoding by removing a single layer of encoding.  What I also discovered is that the canonicalization step will throw an exception if the input is already doubly-encoded.

What I'd like to know is if what we've done here is reasonable and recommended.  The OWASP documentation pages briefly talk about writing XSS filters, but they don't go into much detail.  It seems like many people have to reinvent the wheel.

Kevin W. Wall

unread,
Jul 15, 2022, 11:57:34 AM7/15/22
to David Karr, esapi-project-users
David,
Generally,  the entire approach to defending XSS via a servlet fiter or intercepting proxy is a naive one at best. Matt and I have seen this misguided approach come up so many times (notably on Stack Oveflow), that I think we both consider it a design anti-pattern by now.

To successfully defend against XSS you need to know the context so that you can use the corresponding appropriate encoder. For example, if the context of the user input is rendered within JavaScript, you need to use Encoder.encodeForJavaScript() and not Encoder.encodeForHTML(). However servlet filters generally will never know or be able to figure out the correct context.

There is a brief discussion of this in the Javadoc for Encoder in class overview section at

But if you need more details,  let me know.

-kevin

--
You received this message because you are subscribed to the Google Groups "ESAPI Project Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to esapi-project-u...@owasp.org.
To view this discussion on the web visit https://groups.google.com/a/owasp.org/d/msgid/esapi-project-users/CAA5t8VoCiZnYZDDqivcuXdmgPHcx9XcHLKnwDE2g40Rv%2BKO7%2BQ%40mail.gmail.com.
-kevin

Matt Seil

unread,
Jul 15, 2022, 12:24:42 PM7/15/22
to Kevin W. Wall, David Karr, esapi-project-users

I wanted to chime in my support here for the idea of ServletFilter or an AOP Proxy to prevent XSS as an antipattern. 

TL DR;

Typically people assume naively that a one-size fits all approach can be taken but since virtually every piece of data input into a system has different uses (and encoding payloads) attached to it, you end up having a filter or an interceptor so large that it has to keep intelligence about every data field in your application.  This should set off design warning bells if you're even passively familiar with the MVC design pattern. 

I have seen this done once and it was so bad we had to rip it out. 

1.)  Check the target URL

2.)  Check the target HTTP METHOD

3.)  Look up the form field type to determine the encoding/decoding logic

    a.)  For this step to work correctly the programmer has to analyze the downstream uses of this input (HQL, HTML/Javascript output back to client) and it was the maintenance of this feature that turned out to have rotted as the application had aged over ten years. 

Let's just say my favorite thing to do as a security-minded developer is to delete code. 

It's far easier (and superior) to simply build a coding practice where the developer always asks "what interpreters am I handing off to?" and then codes appropriately in the V part of MVC on output, and on the C-part of MVC for backend transactions.  With most modern SAST tool offerings, we usually have guiderails to keep even beginners on the right path.

David Karr

unread,
Jul 26, 2022, 2:29:17 PM7/26/22
to Matt Seil, Kevin W. Wall, esapi-project-users
Ok, so let's say that my request contains a json string that contains some values that could contain html or html entities, and it also contains some values that have embedded double quotes.

What I see is that "ESAPI.encoder().canonicalize(data)" handles the html pieces well, essentially decoding things like "&amp;" to "&", but it also creates a problem for those values with embedded double quotes.  For instance, it is converting text like this:

    {"validationMessage":[{"text":"500 Internal Server Error: [{\"code\":\"CGORCH00007\",\"message\":\"...\",\"traceId\":\"...\"}]","type":"Account"}]}

To:

    {"validationMessage":[{"text":"500 Internal Server Error: [{"code":"CGORCH00007","message":"...","traceId":"..."}]","type":"Account"}]}

The input was valid json, and the output is not valid.

Is this an indication that we simply can't use this canonicalize() method to do this sort of thing?  I've been given suggestions of simply doing global replacements of known patterns, like "&amp;" to "&" and perhaps some others, instead of using canonicalization.  That will get expensive if we have a long list of replacements, and with long requests.

Matt Seil

unread,
Jul 26, 2022, 3:05:47 PM7/26/22
to David Karr, Kevin W. Wall, esapi-project-users

Comments in-line...

On 7/26/2022 11:29 AM, David Karr wrote:
Ok, so let's say that my request contains a json string that contains some values that could contain html or html entities, and it also contains some values that have embedded double quotes.

What I see is that "ESAPI.encoder().canonicalize(data)" handles the html pieces well, essentially decoding things like "&amp;" to "&", but it also creates a problem for those values with embedded double quotes.  For instance, it is converting text like this:

    {"validationMessage":[{"text":"500 Internal Server Error: [{\"code\":\"CGORCH00007\",\"message\":\"...\",\"traceId\":\"...\"}]","type":"Account"}]}

To:

    {"validationMessage":[{"text":"500 Internal Server Error: [{"code":"CGORCH00007","message":"...","traceId":"..."}]","type":"Account"}]}

The input was valid json, and the output is not valid.

Is this an indication that we simply can't use this canonicalize() method to do this sort of thing?  I've been given suggestions of simply doing global replacements of known patterns, like "&amp;" to "&" and perhaps some others, instead of using canonicalization.  That will get expensive if we have a long list of replacements, and with long requests.

This is correct, and you hit upon the simplest use-case for why a centralized servlet filter style approach is inappropriate. 

The idea is that you would run canonicalize after your JSON objects have been transformed into POJOs--on the individual data pieces, not the entire JSON message itself. 

While this is oriented towards JSON the same thing was true back with XML. 

your code should looks something like:

Encoder enc = ESAPI.encoder();

String canonicalizedCode = enc.canonicalize(message.getCode());

//process to a validation call, or if using esapi validator implementation it will pass to canonicalize for you. 

Matt Seil

unread,
Jul 26, 2022, 3:13:19 PM7/26/22
to David Karr, Kevin W. Wall, esapi-project-users

One more thing... it's important to add that the general idea is that you want to have a logical separation between your communications medium (JSON/XML) and the code in which you're going to be manipulating data.  Errors creep in when you're trying to do operations on unmarshalled JSON/XML. 

Reply all
Reply to author
Forward
0 new messages