http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project
and
http://blog.pengoworks.com/index.cfm/2008/1/3/Using-AntiSamy-to-protect-your-CFM-pages-from-XSS-hacks
The intent of that project is to produce "clean" HTML with no
injection attacks in it, but it should also be possible to simply create
a policy that says "remove all HTML".
(Though I've not used it yet, so I don't know how that would actually work.)
Note that the JS-based script Ryan linked to is just a very
simple/crude regular expression (/<\S[^><]*>/g) which is easy to trip
up, so it's not really recommended for anything user-facing.
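To make "easy to trip up" concrete, here is a small Java sketch (Java because CF's replaceAll runs on Java's regex engine) that applies that pattern to two made-up inputs. The class and method names are hypothetical. The same failures hit any regex of this shape, including the stricter ones below, which is the argument for a real parser.

```java
// Hypothetical demo class: runs the crude tag-stripping pattern on
// inputs chosen to show how a regex misreads HTML.
public class CrudeRegexDemo {
    // /<\S[^><]*>/g written as a Java string literal.
    static final String CRUDE = "<\\S[^><]*>";

    static String crudeStrip(String html) {
        return html.replaceAll(CRUDE, "");
    }

    public static void main(String[] args) {
        // A quoted '>' inside an attribute ends the "tag" early,
        // so attribute debris is left behind: prints ">link
        System.out.println(crudeStrip("<a title=\">\">link</a>"));
        // Plain text that merely looks tag-like is silently deleted: prints "x  w"
        System.out.println(crudeStrip("x <y and z> w"));
    }
}
```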
<cfoutput><textarea id="sdata" cols="40" rows="35">#HtmlEditFormat(
MyCfHtmlData.replaceAll( '<[^<>]++>' , '' ) )#</textarea></cfoutput>
That'll remove all tag-like constructs, and HtmlEditFormat will escape
any stray < that's left over.
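The CFML line above calls java.lang.String.replaceAll, so it can be sketched in plain Java. This is only an approximation: escapeHtml is a hypothetical helper that escapes &, <, > and " and is not ColdFusion's actual HtmlEditFormat implementation.

```java
// Sketch of "strip tag-like runs, then escape what's left".
public class StripAndEscape {
    static String stripTags(String html) {
        // [^<>]++ is possessive: it never gives characters back. Since
        // [^<>] can never match the closing '>', backtracking could not
        // help anyway, so this only makes failed matches fail faster.
        return html.replaceAll("<[^<>]++>", "");
    }

    // Hypothetical stand-in for HtmlEditFormat.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The stray "< 2" is not a complete tag, so it survives stripping
        // and is then escaped: prints "bold &amp; 1 &lt; 2"
        System.out.println(escapeHtml(stripTags("<b>bold</b> & 1 < 2")));
    }
}
```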
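If for some reason you need to do it in JS, use this:
// JS regexes have no possessive quantifiers, so a plain + is used here.
function stripTags(text) { return text.replace( /<[^<>]+>/g , '' ); }
// Set the textarea's value so the result is treated as plain text.
document.getElementById('sdata').value =
stripTags('<cfoutput>#JsStringFormat(MyCfHtmlData)#</cfoutput>');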
But, as I said before, using a regex to remove things that might look
like tags is not the best solution.
The proper answer is to use the AntiSamy project I linked to earlier -
it makes small changes like this trivial.
However, the previous regex can be adapted with a negative lookahead
to exclude P tags, like so:
(?!</?p>)<[^<>]+>
That will leave <p> and </p> alone (only those exact lowercase forms;
<P> or <p class="x"> are still removed) - but we're close to the
limit of what's sensible with regex and HTML; any more complex and
things start getting messy.
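A quick Java check of the lookahead version, using a made-up input (the class and method names are hypothetical). It shows both the intended behaviour and the limits mentioned above: uppercase <P> and a <p> with attributes are stripped, which leaves an unbalanced </p>.

```java
// Hypothetical demo of the "strip everything except <p>/</p>" pattern.
public class KeepParagraphs {
    // (?!</?p>) vetoes a match starting exactly at "<p>" or "</p>";
    // every other tag-like run is removed as before. Case-sensitive.
    static final String KEEP_P = "(?!</?p>)<[^<>]+>";

    static String stripAllButP(String html) {
        return html.replaceAll(KEEP_P, "");
    }

    public static void main(String[] args) {
        // prints "<p>Hi there</p>xz</p>"
        System.out.println(stripAllButP(
            "<p>Hi <b>there</b></p><P>x</P><p class=\"y\">z</p>"));
    }
}
```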
The Jericho HTML Parser does a pretty nice job of either stripping HTML
or formatting HTML as plain text.
Easy as pie to use from CFML, too! Just drop the Jericho jar in the
WEB-INF/lib folder and do something like this:
source = createObject("java","net.htmlparser.jericho.Source");
plainText = source.init("your HTML here").getRenderer().toString();
Might even have a Railo extension for it out at some point. Jericho
does other nifty stuff too.
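Jericho needs its jar on the classpath. As a rough stdlib-only illustration of the same parser-based idea (this is not Jericho's API, and the JDK's javax.swing.text.html.parser is an old HTML 3.2 parser), a parser callback can collect just the text nodes. ParserStrip and extractText are hypothetical names, and whitespace in the output may differ from what Jericho's Renderer produces.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Parser-based stripping with the JDK's built-in HTML parser: tags,
// attributes (including a quoted '>') and comments are consumed by the
// parser, and only character data is kept.
public class ParserStrip {
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags; join runs with a space.
                if (out.length() > 0) out.append(' ');
                out.append(data);
            }
        };
        try {
            // true = ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<a title=\">\">link</a> text"));
    }
}
```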
:Den
plainText = source.init("your HTML here").getRenderer().toString();
Even.