Comments in-line.
Hi Everyone, I'm having a tricky time with an XSS on a <textarea> that is displaying XML. The XML is a response from an async request. I know we can't do something like:

    <!-- response is XML -->
    encoded = ESAPI.encoder().encodeForHTML(response);
    <textarea name="RESPONSE_XML">${encoded}</textarea>
You have to; otherwise, how do you defend against
[</textarea><script>alert(1);</script>]?
My question is, is output encoding an option? If so, where is the opportunity to encode the output? If not, what control is available?
No, you're thinking correctly here. Encode for HTML.
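A minimal sketch of what "encode for HTML" does to the hostile payload above, written in plain JDK Java so it runs without ESAPI on the classpath. The hand-rolled `encodeForHtml` helper is a hypothetical stand-in for `ESAPI.encoder().encodeForHTML()`, and `renderTextarea` is likewise invented for illustration; in the real app you'd call ESAPI rather than this helper.

```java
public final class TextareaEncoding {

    // Hypothetical stand-in for ESAPI.encoder().encodeForHTML(): escapes the
    // characters that could end the <textarea> or start new markup.
    static String encodeForHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the textarea markup with the XML response encoded for HTML.
    static String renderTextarea(String xmlResponse) {
        return "<textarea name=\"RESPONSE_XML\">" + encodeForHtml(xmlResponse) + "</textarea>";
    }

    public static void main(String[] args) {
        String hostile = "<a>x</a></textarea><script>alert(1);</script>";
        // Prints the textarea with every '<' and '>' of the payload turned into &lt; / &gt;
        System.out.println(renderTextarea(hostile));
    }
}
```

Because <textarea> content is parsed as escapable raw text, the browser decodes `&lt;` back to `<` for display, so the user still sees the original XML, while the injected `</textarea>` can no longer close the element.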
This can get hairy--I once managed an application that used
TinyMCE which is like embedding an HTML interpreter into the
browser. (Inception-like)
In THAT case we had to use an HTML Sanitizer since the use case
was essentially User-input HTML.
Jeff
--
You received this message because you are subscribed to the Google Groups "ESAPI Project Users" group.
To view this discussion on the web visit https://groups.google.com/a/owasp.org/d/msgid/esapi-project-users/7223f3b4-5a7e-bec4-cda4-5bd82f018f44%40gmail.com.
This might be pedantic for an experienced programmer as yourself
Jeff, but the key for this is in understanding a bit more about
the data flow.
To simplify your life, I would suggest that before submitting to the
server, you do something that will save you a ridiculous amount of
pain:
Base64-encode the entire contents of the <textarea> before
submitting. That way you send the server a single opaque block of
data.
THEN you can decode safely at the server, without needing to worry
about detangling the morass of "THIS part of the input was
HTML-encoded XML." My guess is that you're rightly having a hard
time "decoding" what is essentially double-encoded text, which
might be part of why you think you can't do it... because if
you're using ESAPI for input validation, canonicalize() will
correctly throw an IntrusionException when it sees double-encoded
data.
Be cognizant of the fact that when you "submit" the data, you're
submitting JSON, XML, or HTML, and THAT is an encoding layer all
by itself. If you can eliminate that variable, you get rid of
something that can massively complicate troubleshooting.
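The Base64 round trip suggested above can be sketched with nothing but `java.util.Base64`, assuming the textarea holds UTF-8 text. In practice the encoding half would run in the browser; it's shown in Java here only to keep one language, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class PayloadBase64 {

    // Client side (sketched in Java): Base64 the raw textarea contents before
    // they get wrapped in JSON/XML/HTML for transport.
    static String encodePayload(String textareaContents) {
        byte[] utf8 = textareaContents.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Server side: after the transport format is unmarshalled, recover the exact
    // original text. Throws IllegalArgumentException if the input is not valid Base64.
    static String decodePayload(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<order id=\"7\">caf\u00e9 &amp; cr\u00e8me</order>";
        String wire = encodePayload(xml);
        System.out.println(wire);                           // only A-Z, a-z, 0-9, '+', '/', '='
        System.out.println(decodePayload(wire).equals(xml)); // true
    }
}
```

The standard Base64 alphabet contains none of `<`, `>`, `&` or quotes, so the payload passes through the JSON/XML/HTML layer untouched. If the field travels as `application/x-www-form-urlencoded`, though, a raw `+` is read back as a space, so either URL-encode it or switch to `Base64.getUrlEncoder()`/`getUrlDecoder()`.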
The algorithm is like this and assumes you program the full stack:
1.) Base64-encode the payload before submission. The front end
then marshals it as JSON/XML/HTML.
2.) At the server, ensure that the JSON/XML/HTML is
unmarshalled, then base64 decode the textarea data.
3.) Then validate the XML against the schema as Kevin
suggested. IIRC, with a SAX parser you hand it the schema up front
(in JAXP, via SAXParserFactory.setSchema()) and you get validation
for free, as long as your ErrorHandler actually throws on errors.
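Steps 2 and 3 might look roughly like this using only the JDK's JAXP classes, assuming the transport layer has already been unmarshalled and handed you the Base64 field as a String. The XSD, the class name and `decodeAndValidate` are hypothetical. Note the catch in "validation for free": `DefaultHandler` silently ignores validation errors, so the handler below rethrows them. The `disallow-doctype-decl` line is an extra hardening step against XXE, not one of the steps above; it is a Xerces feature name, which the JDK's built-in parser honors.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public final class TextareaXmlCheck {

    // Hypothetical schema standing in for whatever XSD the application really owns.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"qty\" type=\"xs:positiveInteger\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // DefaultHandler ignores error() by default, so rethrow validation failures.
    static final class Strict extends DefaultHandler {
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    // Step 2: Base64-decode the field. Step 3: parse it with schema validation.
    static String decodeAndValidate(String base64Field) throws Exception {
        String xml = new String(Base64.getDecoder().decode(base64Field), StandardCharsets.UTF_8);

        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setSchema(schema); // every parser from this factory validates against the schema
        spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        SAXParser parser = spf.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new Strict());
        return xml; // only reached if the document is well-formed and schema-valid
    }

    public static void main(String[] args) throws Exception {
        Base64.Encoder enc = Base64.getEncoder();
        String good = enc.encodeToString("<order><qty>3</qty></order>".getBytes(StandardCharsets.UTF_8));
        String bad  = enc.encodeToString("<order><qty>-1</qty></order>".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeAndValidate(good));
        try {
            decodeAndValidate(bad);
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The `bad` document is well-formed XML but is rejected because the hypothetical schema declares `qty` as `xs:positiveInteger`; no regex was needed to express that rule.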
PITFALL: Do not read the XML input as one big String and then run
ESAPI input validation against that String. You will have to do a
sort of 'manual canonicalization' first: parse it into a POJO (or
at least a map), then iterate over the elements, calling
canonicalize on each one. If you are not doing schema validation,
you will also have to validate each individual element against its
own regex rules. Yuck. Avoid this. This pitfall is a major one
because it is another shade of "DO NOT USE A SERVLET FILTER TO
CATCH ALL XSS! IT'S MATHEMATICALLY IMPOSSIBLE!"
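For contrast, here is roughly what the 'manual canonicalization' fallback looks like, i.e. the thing to avoid. Everything in it is hypothetical: the `RULES` map, the element names, and the `canonicalize` method, which is an identity stand-in for `ESAPI.encoder().canonicalize()` so the sketch runs without ESAPI configured.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public final class PerElementValidation {

    // Hypothetical per-element rules; every new element type needs its own regex.
    static final Map<String, Pattern> RULES = Map.of(
        "qty", Pattern.compile("[1-9][0-9]{0,3}"),
        "sku", Pattern.compile("[A-Z]{3}-[0-9]{4}"));

    // Identity stand-in for ESAPI.encoder().canonicalize(text), which would
    // also throw on double/mixed encoding. Hypothetical, for illustration only.
    static String canonicalize(String text) { return text; }

    // Parses the XML into a flat element-name -> text map (leaf elements only),
    // canonicalizing and regex-checking each value. Repeated names overwrite.
    static Map<String, String> validateLeaves(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        NodeList all = db.parse(new InputSource(new StringReader(xml))).getElementsByTagName("*");

        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (hasChildElements(el)) continue; // containers carry no data of their own here
            String name = el.getTagName();
            String text = canonicalize(el.getTextContent());
            Pattern rule = RULES.get(name);
            if (rule == null || !rule.matcher(text).matches()) {
                throw new IllegalArgumentException("invalid <" + name + ">: " + text);
            }
            values.put(name, text);
        }
        return values;
    }

    static boolean hasChildElements(Element el) {
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Prints {sku=ABC-1234, qty=3}
        System.out.println(validateLeaves("<order><sku>ABC-1234</sku><qty>3</qty></order>"));
    }
}
```

Even this small version shows the cost: the rules table grows with every element, and a flat name-to-value map can't distinguish two `<qty>` elements in different places (the later one silently overwrites the earlier). A schema expresses all of that declaratively.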
ESAPI is not well suited to specialized use cases like
user-generated XML files. It assumes small inputs like user form
fields. XML, user-generated HTML, and URLs are all complicated
data types with their own BNF grammars, and as such should be
validated using a parser, not a regex. That's why Kevin and I are
telling you to make use of the XML schema.
ON THE WAY BACK TO THE USER: HTML-encode any data destined for the textarea.
Now you've covered both sides.
Your