How to validate and filter rich text?


Aragos

Dec 3, 2007, 10:10:33 AM
to Google Web Toolkit
Hello all,

While working with rich text editors (RTEs) like the GWT RichTextArea,
I've been wondering how best to validate the rich text they
generate.

I'm looking for a solution capable of stripping unwanted tags
(<script>, <iframe>, etc.), removing event handler attributes
(on...), and removing attributes whose values contain "javascript:".
The solution should not be fooled by encoding tricks, mixed case, or
similar attacks that try to prevent recognition of malicious
code. It would also be nice to have validation that runs on both the
server and the client.
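
To make the encoding and case requirement concrete, here is a rough
Java sketch of the kind of check I have in mind for a single attribute
value. The class and method names (AttributeCheck, looksLikeScriptUrl)
are made up for illustration. It only decodes numeric character
references, not named entities, so it is a starting point rather than
a complete defence.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeCheck {
    // Numeric character references such as &#106; or &#x6A; (semicolon optional).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?");

    /** Replaces numeric character references with the characters they stand for. */
    static String decodeNumericRefs(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
            } catch (NumberFormatException e) {
                cp = 0xFFFD; // value too large to parse: use the replacement character
            }
            if (!Character.isValidCodePoint(cp)) {
                cp = 0xFFFD;
            }
            String decoded = new String(Character.toChars(cp));
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns true if the value contains "javascript:" after normalization. */
    static boolean looksLikeScriptUrl(String attributeValue) {
        String decoded = decodeNumericRefs(attributeValue);
        // Drop whitespace and control characters that may be used to split the scheme.
        String compact = decoded.replaceAll("[\\s\\p{Cntrl}]+", "");
        return compact.toLowerCase(Locale.ROOT).contains("javascript:");
    }
}
```

With this, "JaVaScRiPt:alert(1)", "&#106;avascript:alert(1)" and
"java<TAB>script:alert(1)" are all flagged, while an ordinary
"http://example.com/" value is not.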

The first idea that came to mind was to just run a set of regular
expressions on the markup, searching for and stripping everything
unwanted. Since I have no intention of reinventing the wheel, I went
looking for something like this online, but libraries like Jakarta
Commons Validator didn't provide the necessary methods. The closest
I could get was a stripTags() method, which takes care of unwanted
tags but not of event handlers:
http://www.xs4all.nl/~rvanloen/docs/commons/com/calitha/util/StringUtil.html
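
For reference, the regular expression approach would look roughly like
the sketch below. RegexFilter and filter() are hypothetical names, and
the tag list is just the two examples from above. It is deliberately
naive: it makes a single pass, does not decode anything, and its
handler pattern can also match ordinary text that happens to look like
on...=value.

```java
import java.util.regex.Pattern;

public class RegexFilter {
    // <script>...</script> and <iframe>...</iframe>, including their content.
    private static final Pattern PAIRED = Pattern.compile(
            "<(script|iframe)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any leftover opening or closing <script>/<iframe> tag.
    private static final Pattern LONE = Pattern.compile(
            "</?(?:script|iframe)\\b[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // on...=value attributes with double-quoted, single-quoted or bare values.
    private static final Pattern HANDLER = Pattern.compile(
            "\\s+on\\w+\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    /** Strips the unwanted tags and event handler attributes in one pass. */
    static String filter(String html) {
        String out = PAIRED.matcher(html).replaceAll("");
        out = LONE.matcher(out).replaceAll("");
        return HANDLER.matcher(out).replaceAll("");
    }
}
```

This handles the simple cases, but a link such as
<a href="javascript:alert(1)">x</a> passes through unchanged, which is
exactly the gap that worries me.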

While looking into validation for rich text I also found the following
article, which describes validation/filtering through XSLT:
http://coldfusion.sys-con.com/read/206288.htm
Although XSLT would only work on the server side, it seems to provide
a lot of control and should get rid of the encoding issues too.
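
As I understand the XSLT idea, it amounts to a whitelist stylesheet
applied on the server. The sketch below is my own guess at it using
the standard javax.xml.transform API, not the code from the article.
The class name, the list of allowed elements, and the choice to drop
all attributes are assumptions. It also requires the rich text to be
well-formed XML with a single root element, which an RTE does not
guarantee, and it does nothing to harden the XML parser itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltWhitelist {
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Allowed elements are rebuilt by name, without any attributes.
      + "<xsl:template match='p|b|i|u|ul|ol|li|br|div'>"
      + "<xsl:element name='{local-name()}'><xsl:apply-templates/></xsl:element>"
      + "</xsl:template>"
        // These elements are dropped together with their content.
      + "<xsl:template match='script|iframe'/>"
        // Any other element is unwrapped: its children are kept, the tag is not.
      + "<xsl:template match='*'><xsl:apply-templates/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the whitelist; rejects input that is not well-formed XML. */
    static String filter(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

Because the XML parser resolves character references before the
stylesheet sees the document, encoded tag or attribute content is
compared in its decoded form, which is where the "gets rid of the
encoding issues" hope comes from.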

Am I missing the obvious simple solution to this problem? Should I
just write the regular expressions and encode/decode methods? Is XSLT
a viable alternative? What do you think?

Cheers,

Peter Schmitt