str.replace() not catching, need a second set of eyes

Jason C

Oct 3, 2012, 5:25:13 AM

I'm working with a contenteditable field. When someone pastes data that includes <span... itxt...>... or <a href...rel="nofollow"...>...</a>, I want the tags to be removed.

Here is the exact code I'm using:

a = a.replace(/<span[^>]*?itxt[^>]*?>([\s\S]*?)<\/span>/gi, '$1');
a = a.replace(/<a href=[^>]*?rel=["']nofollow[^>]*?>([\s\S]*?)<\/a>/gi, '$1');

This works 99.9% of the time, but every once in a while, I'll have something like this come through:

<a href="http://www.example.com/?start=0&h=20121001202337#" rel="nofollow" id="itxthook1" class="itxtnewhook>whatever</a>

Specifically, notice that the class="itxtnewhook doesn't have a closing ", and there's a closing tag without a matching opening tag.

Since this is being modified onPaste, I don't see the text before it's submitted, and I haven't been able to duplicate it myself so I'm guessing that it's related to a specific browser. But can you guys see what I'm missing in my str.replace() script that's letting this slip through?
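
For reference, here is a minimal sketch of how the two replacements behave when run as plain string operations. The wrapper name stripPastedTags and the first sample string are made up for illustration, and the first pattern is assumed to start with <span[^> as the description of the span case suggests. The second sample is the anchor quoted in this post; in a standards-conforming engine it is unwrapped even with the unterminated class attribute.

```javascript
// Hypothetical wrapper around the two replacements from the post.
function stripPastedTags(a) {
  // Unwrap <span ... itxt ...>...</span>, keeping only the inner text.
  a = a.replace(/<span[^>]*?itxt[^>]*?>([\s\S]*?)<\/span>/gi, '$1');
  // Unwrap <a href=... rel="nofollow" ...>...</a>, keeping only the inner text.
  a = a.replace(/<a href=[^>]*?rel=["']nofollow[^>]*?>([\s\S]*?)<\/a>/gi, '$1');
  return a;
}

// Made-up, well-formed sample input.
var sample = 'See <span class="itxtnewhook" id="itxthook0">this</span> and ' +
  '<a href="http://www.example.com/" rel="nofollow">that</a>.';

// The anchor quoted in the post, with the unterminated class attribute.
var malformed = '<a href="http://www.example.com/?start=0&h=20121001202337#" ' +
  'rel="nofollow" id="itxthook1" class="itxtnewhook>whatever</a>';

var cleaned = stripPastedTags(sample);
console.log(cleaned);                    // -> See this and that.
console.log(stripPastedTags(malformed)); // -> whatever
```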

Evertjan.

Oct 3, 2012, 10:51:33 AM

Some ancient browsers, including IE7,
do not have nongreedy regex parsing.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)

Jason C

Oct 3, 2012, 3:36:32 PM

On Wednesday, October 3, 2012 10:51:34 AM UTC-4, Evertjan. wrote:

> Some ancient browsers, including IE7,
> do not have nongreedy regex parsing.

Ouch! I have a significant percentage of my traffic using IE7 (a little over 22,000 visitors, which is 9.77%), and even a few using IE6.

Is there any way to test for that, other than checking for each non-compatible browser one-at-a-time? Even if it's just to ignore the str.search() so that the Perl processing script can catch it.
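
One way to test for the feature rather than the browser would be to run a tiny lazy match and look at the result. This is only a sketch: the function name supportsLazyQuantifiers is made up, and it assumes that an engine without lazy quantifiers either throws on the pattern or falls back to a greedy match.

```javascript
// Hypothetical feature test: does this engine honor the lazy quantifier "+?" ?
function supportsLazyQuantifiers() {
  try {
    // Build the pattern with new RegExp() so that an engine rejecting the
    // syntax throws here, inside the try, instead of at script parse time.
    var m = new RegExp('a+?').exec('aaa');
    // A lazy match stops after one "a"; a greedy match would return "aaa".
    return m !== null && m[0] === 'a';
  } catch (e) {
    return false;
  }
}

// Usage: skip the client-side cleanup when the test fails, so the
// server-side script is the one that handles the pasted markup.
var canCleanInBrowser = supportsLazyQuantifiers();
console.log(canCleanInBrowser);
```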

Danny

Oct 3, 2012, 4:36:53 PM

I see nothing wrong with your regex. I tested it in a mock page using JS as well as in Kodos (a Python regex app), and it does behave non-greedily. I didn't test in IE7, though; I don't have it, just IE9. The tests do show that when you have one or more nested elements, the regex picks up only the outer one, which leads me to think the cause is just some malformed markup being pasted, since it's only a string before any parsing anyway. I can't say whether it's an old browser's regex engine, but I'd do some logging when it happens: log the inputs, saving 'var a' before any .replace() takes place. As for which browser is doing the parsing, you can get that from JS with navigator.userAgent.

You can just do some quick logging for a short period, say five hours or so, to see which inputs are causing it.
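
A rough sketch of that kind of logging is below. The names pasteLog, logPasteInput and cleanPaste are made up, the log is kept in memory only (a real page would presumably send the entries to the server), and only the nofollow replacement is shown.

```javascript
// Hypothetical in-memory log of raw paste inputs.
var pasteLog = [];

function logPasteInput(raw) {
  pasteLog.push({
    // The raw pasted string, captured before any .replace() runs.
    input: raw,
    // navigator exists in browsers; fall back to 'unknown' elsewhere.
    userAgent: (typeof navigator !== 'undefined' && navigator.userAgent) || 'unknown',
    time: new Date().getTime()
  });
}

// Hypothetical paste handler body: log first, then clean.
function cleanPaste(a) {
  logPasteInput(a);
  a = a.replace(/<a href=[^>]*?rel=["']nofollow[^>]*?>([\s\S]*?)<\/a>/gi, '$1');
  return a;
}

var result = cleanPaste('<a href="x" rel="nofollow">t</a>');
console.log(result);            // -> t
console.log(pasteLog[0].input); // -> <a href="x" rel="nofollow">t</a>
```

Comparing the logged input with what ends up being submitted should show whether the odd markup is already in the paste or is produced by the replace.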

Evertjan.

Oct 4, 2012, 3:47:40 AM

Jason C wrote on 03 Oct 2012 in comp.lang.javascript:

> On Wednesday, October 3, 2012 10:51:34 AM UTC-4, Evertjan. wrote:
>
>> Some ancient browsers, including IE7,
>> do not have nongreedy regex parsing.
>
> Ouch! I have a significant percentage of my traffic using IE7 (a
> little over 22,000 visitors, which is 9.77%), and even a few using
> IE6.
>
> Is there any way to test for that, other than checking for each
> non-compatible browser one-at-a-time? Even if it's just to ignore the
> str.search()

Yes,
we all used to make regex without the "non greedy" part
in those old days.

/<.+?>/

or sometimes:

/<[\s\S]+?>/

used to be:

/<[^>]+>/
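
A small sketch of why the negated class can stand in for the lazy form when matching a single tag. The sample string is made up; the plain greedy dot is included only for contrast.

```javascript
var sample = 'x <b>bold</b> y'; // made-up input

var lazy = sample.match(/<.+?>/g);      // lazy: stops at the first ">"
var negated = sample.match(/<[^>]+>/g); // greedy, but cannot run past a ">"
var greedy = sample.match(/<.+>/g);     // greedy dot: runs to the last ">"

console.log(lazy.join(' | '));    // -> <b> | </b>
console.log(negated.join(' | ')); // -> <b> | </b>
console.log(greedy.join(' | '));  // -> <b>bold</b>
```

One difference to keep in mind: [^>] also matches a newline inside a tag, while . does not.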

> so that the Perl processing script can catch it.

Neither cast ye your pearls before swine.

Jason C

Oct 5, 2012, 4:42:39 AM

On Thursday, October 4, 2012 3:47:41 AM UTC-4, Evertjan. wrote:

> Yes,
> we all used to make regex without the "non greedy" part
> in those old days.
>
> /<.+?>/
>
> or sometimes:
>
> /<[\s\S]+?>/
>
> used to be:
>
> /<[^>]+>/

I think that you guys were right; the problem seems to be fixed by more or less just removing the ?. This seems to be working correctly now on all of the browsers I could test with:

a = a.replace(/<span[^>]*itxt[^>]*>([\s\S]*?)<\/span>/gi, '$1');
a = a.replace(/<a [^>]*href=[^>]*rel=["']nofollow[^>]*>([\s\S]*?)<\/a>/gi, '$1');

For future readers, I'm using [\s\S] because in JS, . doesn't match newline characters such as \n. [\s\S] matches any character, newlines included.
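
A quick sketch of the difference, using a made-up string with a line break inside the span:

```javascript
var text = '<span class="itxt">line one\nline two</span>'; // made-up input

// "." cannot cross the "\n", so the whole pattern fails to match.
var withDot = /<span[^>]*>(.*?)<\/span>/.exec(text);

// [\s\S] matches any character, so the content is captured across the break.
var withClass = /<span[^>]*>([\s\S]*?)<\/span>/.exec(text);

console.log(withDot);                                 // -> null
console.log(withClass[1] === 'line one\nline two');   // -> true
```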

(Sorry if this goes through twice. Stupid Google Groups...)

Dr J R Stockton

Oct 6, 2012, 2:10:56 PM

In comp.lang.javascript message <fd11da20-8871-4caf-a83e-409e9c939a88@googlegroups.com>, Fri, 5 Oct 2012 01:42:39, Jason C <jwca...@gmail.com> posted:

>
>For future readers, I'm using [\s\S] because in JS, . doesn't match newline characters such as \n. [\s\S] matches any character, newlines included.
>

Did you consider [^] ??
<http://www.merlyn.demon.co.uk/js-valid.htm#RG> says it works.
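
A short sketch comparing the three forms on a made-up two-line string. Note that [^] is a JavaScript idiom; in Perl-style flavors a ] placed right after [^ is taken as a literal character, so the pattern should not be copied unchanged into a Perl script.

```javascript
var text = 'a\nb'; // made-up input with a newline in the middle

var dot = /^a.b$/.test(text);          // "." does not match "\n"
var spaceClass = /^a[\s\S]b$/.test(text); // whitespace or non-whitespace: anything
var emptyNegated = /^a[^]b$/.test(text);  // "not in the empty set": anything

console.log(dot);          // -> false
console.log(spaceClass);   // -> true
console.log(emptyNegated); // -> true
```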

--
(c) John Stockton, nr London UK. Mail, see homepage. DOS 3.3, 6.20; WinXP, 7.
Web <http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links.
PAS EXE TXT ZIP via <http://www.merlyn.demon.co.uk/programs/00index.htm>
My DOS <http://www.merlyn.demon.co.uk/batfiles.htm> - also batprogs.htm.