How can I deactivate paste in a rich text edit box?

Seth Russell

Sep 21, 2005, 1:45:25 AM
I'm running Kevin Roth's RTE box and I want to deactivate the ability
to paste inside the box. People sometimes paste outrageous things in
there that might break my site. How can I deactivate the ability to
paste?

see: http://www.kevinroth.com/rte/demo.htm

Thanks for your help
Seth Russell

Lasse Reichstein Nielsen

Sep 21, 2005, 2:42:48 AM
"Seth Russell" <russel...@gmail.com> writes:

> I'm running Kevin Roth's rte box

I don't know what it is, but it probably doesn't work in my browser
anyway ... checking ... well, at least I can write HTML in it.

> and I want to deactivate the ability to paste inside the box. People
> sometimes paste outrageous things in there that might break my site.
> How can I deactivate the ability to paste?

That's probably not the best way to solve the problem. Pasting is
a useful operation, and disabling it will be guaranteed to annoy
some users eventually. Also remember, anything that can be pasted,
can also be written manually, so if someone wants to break your
site, they still can (or if need be, they'll fake an HTTP POST
of the bad content).

If your application has a problem with malformed input, it should
scan for exactly that, on the server, before using the input for
anything else.

That is a general principle in client/server programming on the
internet ... don't trust the client. The responsibility for preventing
site breakage should lie in a place that you can trust, which means
the server.

/L
--
Lasse Reichstein Nielsen - l...@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

Seth Russell

Sep 21, 2005, 8:19:43 AM
>I don't know what it is, but it probably doesn't work in my browser
>anyway ... checking ... well, at least I can write HTML in it.

Not in my version of it, I suppressed the "look at html" check box.
Did the wysiwyg not work in your browser? Which browser is that?

> Also remember, anything that can be pasted,
> can also be written manually,

Not really, you can't write HTML (in my version)

> If your application has a problem with malformed input, it should
> scan for exactly that, on the server, before using the input for
> anything else.

Yes, yes ... care to point me to a routine in PHP that does that?
It needs to
* disallow all scripts
* disallow broken HTML - this is going out on an Atom / RSS feed and
needs to be perfect XHTML

Seth Russell

Seth Russell

Sep 21, 2005, 8:24:16 AM
PS: What I really want it to do is strip all HTML from the pasted
input only. It should function exactly like the box here at
Google Groups. I want just what you get if you select all on a web
page, go to WordPad, and paste.
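
A minimal sketch of that plain-text paste behaviour, assuming a browser that exposes `clipboardData` on the `paste` event (not every browser does). The names `pastePlainText` and `insertText` are hypothetical; `insertText` stands in for whatever routine puts text into the RTE box and is not part of Kevin Roth's script.

```javascript
// Sketch only: intercept a paste and keep just the plain-text flavour
// of the clipboard. `insertText` is a hypothetical callback supplied by
// the page; it receives the text that should go into the editor.
function pastePlainText(event, insertText) {
  var clipboard = event.clipboardData;
  if (!clipboard) {
    return false; // no clipboard access: let the browser paste normally
  }
  // Stop the browser from inserting the formatted (HTML) clipboard content.
  event.preventDefault();
  insertText(clipboard.getData("text/plain"));
  return true;
}
```

It could be attached with something like `editorDocument.addEventListener("paste", function (e) { pastePlainText(e, insert); })`, where `editorDocument` and `insert` are again placeholders. This only tidies up what ordinary users paste; it does nothing against a hand-made form post, so the server-side check is still needed.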

Seth Russell

christoph...@gmail.com

Sep 21, 2005, 10:33:00 AM

Seth Russell wrote:
>
> Not really, you can't write HTML (in my version)

Right. You're sending the user a program - a javascript program - and
saying, "please run this and send me the results." Then when you get
the results, you just assume that they are correct? Why? Because the
user was nice and ran your javascript program?

See, the thing is, a person can create their own little web page with a
form in it that submits to *your* page. Do you understand? The kind
of person that you're worried about, the kind of person who'd cut and
paste HTML, is certainly the kind of person who is technically capable
of this simple task.

You *have* to check the input. You have to. It's not optional. It's
not a nice thing that you'll do later, after you get the rest of the
application working. You have to do it now. Checking the input is
more important than the user interface. It's more important than that
rich-text edit box. Whatever it is that you're developing, it will
NEVER be secure until you check and correct the input.

I'm sorry, but this is web programming 101. It's really something that
you need to understand before you even get started.

> Yes, yes ... care to point me to a routine in php that does that.

When you say, "yes, yes" it kind of sounds like you're blowing the guy
off. He gave you good advice. You need to listen to it. Stop
whatever you're doing and fix the input on the server side.

For starters, you could remove all less-than signs.
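
A minimal sketch of that starting point, shown in JavaScript because that is this newsgroup's language; the real check has to run on the server, where PHP's built-in `htmlspecialchars` covers the same ground. Instead of deleting the less-than signs it escapes them, so the text still displays as typed. `escapeText` is a hypothetical name.

```javascript
// Sketch only: make a string safe to show as plain text inside HTML.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;") // must run first, or the entities added below get re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

This throws away all markup, including the rich text the editor produces, so it only suits fields that are meant to be plain text.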

Seth Russell

Sep 21, 2005, 11:12:17 AM
> You *have* to check the input. You have to. It's not optional. It's
> not a nice thing that you'll do later, after you get the rest of the
> application working. You have to do it now. Checking the input is
> more important than the user interface. It's more important than that
> rich-text edit box. Whatever it is that you're developing, it will
> NEVER be secure until you check and correct the input.

Ok, I got it. I guess I suspected this all along and just needed
somebody with experience to tell me. Thanks.

Sorry if it sounded like I was blowing Nielsen off. I really do need
to find a good sanitizer. The problem is finding a good one and
finding the correct point in the program to execute it. Obviously I
cannot apply the same sanitizing to the output of the RTE box that is
submitted to me as I do to the input from the paste, otherwise I would
lose all the rich text markup.

The problem is that I'm pretty OK with PHP, but JavaScript is a foreign
language that I am just now learning. How can I preprocess the data coming
into the RTE box from the client's clipboard? And where is there a
good checking routine for the final output from the RTE box?

Thanks for your help ...

Seth Russell

Lasse Reichstein Nielsen

Sep 21, 2005, 1:19:49 PM
"Seth Russell" <russel...@gmail.com> writes:

>>I don't know what it is, but it probably doesn't work in my browser
>>anyway ... checking ... well, at least I can write HTML in it.
>
> Not in my version of it, i suppressed the "look at html" check box.
> Did the wysiwyg not work in your browser? Which browser is that?

Opera. It doesn't have formatted text input functionality. I don't
know if any browser except IE and Mozilla-based ones have such a
proprietary feature.

> Yes, yes ... care to point me to a routine in php that does that.
> Needs to
> * disallow all scripts
> * disallow broken html - this is going out on a atom \ Rss feed and
> needs to be perfect XHTML

I'd go the safer way and choose what to allow, not what to deny.
Any text formatting tags should be retained (b, i, u, em, strong,
br, perhaps even p). No attributes should be allowed (no event
handlers or style attributes[1], and the rest doesn't really matter
then). If any of these elements are not closed, it's not a big deal,
but you could count starts and ends and add missing ends.

So in Javascript, I would do something like:
---
// list of allowed tag names
var allowed = ['b','i','u','em','strong','br'];
// RegExp matching the text before a tag, then the tag (or the end of the string)
var tagRE = /([\s\S]*?)(<(\/?)(\w+)\b[^>]*>|$)/g;
// RegExp matching allowed tag names
var validRE = new RegExp("^("+allowed.join("|")+")$");

// remove all non-allowed tags and make sure all allowed tags are closed
function sanitize(html) {
  // stack of open tags
  var open = [];
  // for each tag, replace with ...
  return html.replace(tagRE, function(_, before, tag, end, name) {
    // escape & and < in non-tag text
    before = before.replace(/&/g,"&amp;").replace(/</g,"&lt;");
    var result = [before];
    if (!name) { // end of string: close whatever is still open
      while (open.length > 0) {
        result.push("</", open.pop(), ">");
      }
      return result.join("");
    }
    name = name.toLowerCase();
    if (!validRE.test(name)) { // unallowed tag: drop it
      return before;
    }
    if (name == "br") { // empty element: never goes on the stack
      return end ? before : before + "<br />";
    }
    if (!end) { // allowed start tag, attributes dropped
      open.push(name);
      return before + "<" + name + ">";
    }
    // allowed end tag: drop it if no matching start tag is open
    if (open.lastIndexOf(name) < 0) {
      return before;
    }
    // otherwise close everything opened after the matching start tag
    var top;
    while ((top = open.pop())) {
      result.push("</", top, ">");
      if (top == name) { break; }
    }
    return result.join("");
  });
}
---
I.e., pick out tags and in-between text, escape all "<" and "&" in text,
remove all unallowed tags, remove all attributes from allowed tags,
and close all open tags correctly (remove incorrect closing tags).

While this might not give exactly what an author intended for some
invalid HTML, he really has only himself to blame :)

I have no idea how to convert this to PHP, but a competent PHP'er will
probably know how.
/L

[1] Yes, style attributes can be dangerous too (works in, at least, IE):
<b style="background-image:
url(javascript:document.location.href='http://mysexsite.example.com/')">

christoph...@gmail.com

Sep 21, 2005, 3:32:41 PM
Lasse Reichstein Nielsen wrote:
> So in Javascript, I would do something like:

My question is, what good is javascript in this situation? He still
has to check the input on the server side, before he puts it in his
database. He's got to do it in php.

Lasse Reichstein Nielsen

Sep 21, 2005, 4:20:17 PM
"christoph...@gmail.com" <christoph...@gmail.com> writes:

> Lasse Reichstein Nielsen wrote:
>> So in Javascript, I would do something like:
>
> My question is, what good is javascript in this situation?

It's a functional description of an algorithm in a language that is
on-topic for this newsgroup. It might even be used on the client side
to preview what the final result will be, for non-malicious users.

> He still has to check the input on the server side, before he puts
> it in his database. He's got to do it in php.

Agree completely that it has to be used server side, in whatever
language the server side uses (which could be Javascript, but the
original poster appears to use PHP).

It's easier to translate an existing function into a new language than
to write one from scratch, and with a javascript version (as opposed
to a pseudocode description), you can even test that the translation
gives the same results.
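
One way to run that comparison, as a rough sketch: feed the same inputs to both versions and list where they disagree. `findMismatches` is a hypothetical helper, and `candidate` is assumed to be some wrapper that returns what the translated version produced (for example, a lookup into outputs recorded from the PHP script).

```javascript
// Sketch only: run two implementations over the same inputs and collect
// every input on which their results differ.
function findMismatches(reference, candidate, inputs) {
  var mismatches = [];
  for (var i = 0; i < inputs.length; i++) {
    var expected = reference(inputs[i]);
    var actual = candidate(inputs[i]);
    if (expected !== actual) {
      mismatches.push({ input: inputs[i], expected: expected, actual: actual });
    }
  }
  return mismatches;
}
```

An empty result only says the two versions agree on the inputs tried, so the list should include malformed and hostile markup, not just well-behaved samples.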

/L

christoph...@gmail.com

Sep 21, 2005, 5:11:38 PM
I understand.

Seth Russell

Sep 21, 2005, 5:15:33 PM
> It's easier to translate an existing function into a new language than
> to write one from scratch, and with a javascript version (as opposed
> to a pseudocode description), you can even test that the translation
> gives the same results.

Yes, definitely I can translate from the JavaScript to PHP, thanks for
the code :)

The problem I still have is that I want to send the client a checker
(probably exactly the one you have given me above) but I don't know
where to install it in the JavaScript
(http://fastblogit.com/add/richtext.js) that is running the RTE box
so that it intervenes when the client pastes from their
clipboard. That same routine won't be applied to the output at the
server because then it would eliminate all the nice rich text editing,
right?

Seth Russell

Seth Russell

Sep 21, 2005, 5:26:07 PM
Just so I'm not misunderstood: I know I need to check the XHTML coming
back on the server side and disallow everything that was not allowed from
the client's paste or generated by the RTE box logic itself. So
there are 2 different checks: (1) between the client's clipboard and the
RTE box - for that I can use the routine above almost verbatim - I just
don't know where the intervention point is; and (2) back at the server,
sanitize everything not coming from the gadgets in the RTE box or the
allowable HTML to be pasted.

Hmmm ... does that make sense ?

Seth

Lasse Reichstein Nielsen

Sep 21, 2005, 5:39:37 PM
"Seth Russell" <russel...@gmail.com> writes:

> Just so I'm not misunderstood: I know I need to check on the server side
> the XHTML coming back and disallow everything that was not allowed from
> the client's paste or generated by the RTE box logic itself. So
> there are 2 different checks (1) between the client's clipboard and the
> RTE box -

...

> Hmmm ... does that make sense ?

Somewhat. Why do you need to prevent the user from pasting "bad" HTML?
If the server removes the badness anyway, there is no problem in the end.

You might wait until submission time, and remove extraneous HTML tags
before submitting, but again, a normal user won't need it, because he
only submits "nice" HTML, and the malicious malefactor will disable
the javascript anyway.

The *only* reason to do anything on the client side is to help the
normal user. Since he isn't pasting bad HTML anyway, there is no need
to do anything. Even if he manages to paste bad HTML, the server will
remove it and (hopefully) display his result to him.

Lasse Reichstein Nielsen

Sep 21, 2005, 6:19:46 PM
"Seth Russell" <russel...@gmail.com> writes:

> The problem I still have is that I want to send the client a checker
> (probably exactly the one you have given me above) but I don't know
> where to install it in the javascript

Why? Or, more precisely: What problem are you trying to solve by that?

Remember that in non IE/Mozilla browsers, the textarea will be just
that, a plain HTML textarea.

If you really want to do validation of the field, either do it at
submission time, or in an "onchange" handler on the textarea.

Also remember to check all fields, including both title fields,
since they can also contain malicious HTML.

> (http://fastblogit.com/add/richtext.js) that is running the RTE box
> such that it will intervene between the client's paste of their
> clipboard.

Shouldn't be possible. Pasting is just a way of adding a lot of text
without typing it one character at a time, but

> That same routine won't be applied on the output at the
> server because then it would eliminate all the nice rich text editing,
> right ?

The idea was to remove bad HTML from the input, which means that it
never gets any further.

I have now checked the site, and can see that more formatting is
allowed than what my script would let through. The colors are set
using spans with style attributes, so that should be allowed too ...
and then you can't throw away all attributes, or even all style
attributes, so a more precise filtering is needed.

What you need to remove is then, at least:
Scripts:
* all script elements.
* all intrinsic event handlers (any attribute starting with "on" should do)
* all script urls (any url starting with a protocol not http or ftp,
both in links and image elements, and in style attributes)
* any iframe or object element (could embed another page with scripts).
Malicious HTML:
* any opening or closing comment
* any closing tag not matching an opening tag (throw in a </table> and see :).
* any starting tag not closed (especially those with CDATA content,
i.e., script, style and textarea)

With those gone, I'm fairly sure there is no scripting left, and the
HTML can be contained in its div (adding </table> or </div> could otherwise
mess up the layout).
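
A rough sketch of the per-element and per-attribute checks in that list, to show the shape of the tests rather than a complete filter. All function names are hypothetical. Accepting `https` alongside http and ftp is an added assumption, relative URLs are simply rejected, and `style` values need their own check, which is not shown here.

```javascript
// Sketch only: building blocks for an allow-list style filter.

// Intrinsic event handlers: onclick, onload, onmouseover, ...
function isEventHandlerAttribute(name) {
  return /^on/i.test(name);
}

// Accept a URL only if it starts with one of the allowed protocols.
function isAllowedUrl(url) {
  // Drop whitespace and control characters first, so that a value like
  // "java\tscript:..." is judged as "javascript:...".
  var cleaned = url.replace(/[\s\x00-\x1f]/g, "");
  return /^(https?|ftp):\/\//i.test(cleaned);
}

// Elements that are removed outright, whatever their attributes.
var forbiddenElements = ["script", "iframe", "object"];

function isForbiddenElement(name) {
  return forbiddenElements.indexOf(name.toLowerCase()) !== -1;
}
```

The URL test names the protocols that are allowed instead of the ones that are banned, in line with choosing what to allow rather than what to deny.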

I have done some testing (as anonymous user aaaaej) which messes things
up quite badly (I think I deleted them now, or maybe the Wizzard did :).

Donius

Sep 22, 2005, 9:44:45 AM
> The *only* reason to do anything on the client side is to help the
> normal user. Since he isn't pasting bad HTML anyway, there is no need
> to do anything. Even if he manages to paste bad HTML, the server will
> remove it and (hopefully) display his result to him.

I disagree, here! I run some CMSes for some customers who are, let's
see... not technically savvy. I wish everyone could be on the up and
up, but they aren't always. One thing that they constantly did that
fouled up the system before we instituted some scripting (both server
and client side) was paste content from MS Word. This looks like HTML.
Smells like HTML. But breaks our editor and the resulting webpages
like there's no tomorrow.

So, Mr. Russell, to tell you what we did: on the JS end, we tested with
a regex on pasting and submission for basic valid XHTML (google for
something that will work, I did!), and then did something similar on the
backend of our server before submission.
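
The regex itself isn't given here, so the following is only a guess at what a "basic valid XHTML" test might look like: a hypothetical `isBalanced` function that checks that every start tag is closed in the right order. It assumes attribute values contain no `>` characters and is no substitute for a real XML parser.

```javascript
// Sketch only: true if every start tag has a matching end tag in the right
// order. Self-closed tags such as <br /> are accepted; a bare <br> counts
// as an unclosed start tag, as it would in XHTML.
function isBalanced(html) {
  var tagRE = /<(\/?)([A-Za-z][\w:-]*)\b[^>]*?(\/?)>/g;
  var open = []; // stack of currently open element names
  var match;
  while ((match = tagRE.exec(html)) !== null) {
    var closing = match[1] === "/";
    var name = match[2];
    var selfClosed = match[3] === "/";
    if (selfClosed && !closing) {
      continue; // <br />, <img ... />: nothing to track
    }
    if (closing) {
      // XHTML names are case-sensitive, so compare them exactly.
      if (open.pop() !== name) { return false; }
    } else {
      open.push(name);
    }
  }
  return open.length === 0;
}
```

A check like this can warn the user at paste or submission time; the same test still has to be repeated on the server.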

Hope that helps!

-Brendan
