
trying to match a "word", having difficulty with a "Lookbehind assertion"


R.Wieser

May 21, 2022, 5:21:00 AM
Hello all,

I'm trying to match a group of characters, delimited by anything that does not
belong in that group.

Case in point: "Couldn't understand a g**d*** word she was saying!"

I would like to match the "g**d***" sequence of letters.

For that I've tried to use a RegEx :

RegExp("(?<=[^a-zA-Z\*])"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')

-or-

RegExp("(?<![a-zA-Z\*])"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')

("escapeRegExp()" escapes the "*" to "\*")

The problem is that neither of the above seems to match anything.

When I use the following

RegExp("\ \b"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')

(no space between the first two slashes. Had to insert it otherwise my
newsclient turns it into a link)

all goes well - except that the partial "d***" is then matched
before "g**d***", throwing everything off.


Question: if not the above, what /am/ I supposed to use as the "look behind
assertion" ?

Regards,
Rudy Wieser


The Natural Philosopher

May 21, 2022, 6:17:36 AM
In the time you have wasted trying to learn enough regexp, failing, and
posting here, you could have written it in C three times over...


--
“Ideas are inherently conservative. They yield not to the attack of
other ideas but to the massive onslaught of circumstance"

- John K Galbraith

JJ

May 21, 2022, 8:13:38 AM
Look-behind is used to match patternA only where it follows (or does not
follow) patternB, without including patternB in the match result.

Your condition is simply to match "g**d***". You do not have a condition
that "g**d***" must follow or must not follow another pattern. So
look-behind is not needed or, perhaps, not applicable.

For matching just "g**d***" as a whole word, we could just use:
(note: `*` is substituted with `#`)

/\bg##d###\b/gi
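
A note on the `\b` suggestion above: `\b` only matches between a word character (`\w`) and a non-word character, and `*` is itself a non-word character. A minimal sketch of what that implies for the sample sentence (the variable names are illustrative, not from the thread):

```javascript
const text = "Couldn't understand a g**d*** word she was saying!";

// Trailing \b: "*" and " " are both non-word characters, so no boundary exists there.
const trailingBoundary = /\bg\*\*d\*\*\*\b/i.test(text);

// Leading \b only: the boundary between " " and "g" exists, so this matches.
const leadingBoundaryOnly = /\bg\*\*d\*\*\*/i.test(text);

// The partial term matches too, because "*" followed by "d" counts as a boundary.
const partialTerm = /\bd\*\*\*/i.test(text);

console.log(trailingBoundary, leadingBoundaryOnly, partialTerm); // false true true
```

This is consistent with the symptom reported in the first post: with a leading `\b`, the partial "d***" inside "g**d***" is matched.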

R.Wieser

May 21, 2022, 10:34:21 AM
JJ,

> Look-behind is used to match patternA which follows or not follows
> patternB, and does not include patternB in the match result.

Yeah, I read that too :

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Assertions

> Your condition is to simply to match "g**d***". You do not have a
> condition
> that, "g**d***" must follow or must not follow other pattern.

Whut ? What do you think my 'RegExp(...)' is about then ?

Maybe a bit more code will help :

'- - - - - - - - - - - - - - - - - -

function escapeRegExp(string) {
  // $& means the whole matched string
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function replaceAll(str, find, replace) {
  return str.replace(new RegExp("(?<=[^a-zA-Z\*])" + escapeRegExp(find) +
    "(?=[^a-zA-Z\*])", 'gi'), replace);
}


newString = replaceAll(oldString, "g**d***", "goddamn");

'- - - - - - - - - - - - - - - - - -

My own contribution to the above are just the "(?....)" parts. The latter
one works, the former doesn't.


But, I think I found why it doesn't work : I just found out that my FF v52
already throws an "invalid regexp group" error on something as simple as
oldString.replace(/(?<=p)a/g, "!");

No idea why though - and thus no idea what to replace it with either.

Regards,
Rudy Wieser
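
For engines without look-behind support (such as the FF v52 mentioned above), one commonly used workaround is to capture the preceding delimiter in a group and put it back in the replacement. A rough sketch, assuming the same "letters and `*`" notion of a word as in the code above; `replaceWord` is a made-up name, not something from the thread:

```javascript
function escapeRegExp(string) {
  // $& means the whole matched string
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replaces `find` only where it is not glued to letters or "*".
function replaceWord(str, find, replacement) {
  const pattern = new RegExp(
    "(^|[^a-zA-Z*])" +      // group 1: start of text, or one delimiter character
    escapeRegExp(find) +
    "(?![a-zA-Z*])",        // negative look-ahead: also succeeds at the end of the text
    "gi"
  );
  // A replacer function avoids "$" in `replacement` being treated specially.
  return str.replace(pattern, (match, before) => before + replacement);
}

const sample = "Couldn't understand a g**d*** word she was saying!";
console.log(replaceWord(sample, "d***", "damn"));       // sample comes back unchanged
console.log(replaceWord(sample, "g**d***", "goddamn")); // "g**d***" becomes "goddamn"
```

Because the look-ahead consumes nothing, two hits separated by a single space are still both found.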


R.Wieser

May 21, 2022, 1:45:58 PM
Stefan,

> |>|"alpha beta g****d*** eps".replace( /.*?\b([\w*]*\*[\w*]).*/, '$1' )
> | |
> |<|"g****d***"

The ultimate idea is to replace the sought-for word. Throwing away
everything that's "not it" isn't a viable approach I'm afraid.

Regards,
Rudy Wieser


R.Wieser

May 22, 2022, 7:42:47 AM
TNP,

> In the time you have wasted trying to learn enough regexp, failing, and
> posting here, you could have written it in C three times over...

Good idea ! Now all you have to do is tell me how I get that C code loaded and
running in a browser's webpage ...

... Idiot.

Regards,
Rudy Wieser


The Natural Philosopher

May 22, 2022, 8:03:29 AM
On 22/05/2022 12:42, R.Wieser wrote:
> TNP,
>
>> In the time you have wasted trying to learn enough regexp, failing, and
>> posting here, you could have written it in C three times over...
>
> Good idea ! Now all you have to tell me how I get that C code loaded and
> running in a browsers webpage ...

Google cgi-bin and AJAX

>
> ... Idiot.
>
Well javascript then. It's similar.

> Regards,
> Rudy Wieser
>
>


--
“But what a weak barrier is truth when it stands in the way of an
hypothesis!”

Mary Wollstonecraft

R.Wieser

May 22, 2022, 11:42:32 AM
TNP,

> Google cgi-bin and AJAX

Good idea ! Now a simple, locally run JS command is translated to
something *way* heavier and complex, needing to communicate with a server
and going over two different frameworks to do the work . Oh wait, three
including the actual search-and-replace program.

I can see it now : the server sends a full webpage to the browser, then the
browser extracts all text parts from it one by one, sends each of them back
to the server and asks it to do a search-and-replace, after which the server
sends that part back and has Ajax replace the involved text part in the
browser. Rinse and repeat for a list of to-be-replaced words. One
webpage, going around at least thrice.

Nahhh, I don't think so. It would be *much* simpler to just send the whole
list of words and then have the server do everything at once and send the
full updated webpage back. Using PHP for the search-and-replace of course.

... if the JS I posted would be coming from the/a server to begin with.
Which it doesn't. IOW, the above doesn't even apply.

Besides, JS itself is a good enough language to create a non-regexp solution
in.

Regards,
Rudy Wieser


The Natural Philosopher

May 23, 2022, 8:57:50 AM
On 22/05/2022 16:42, R.Wieser wrote:
> TNP,
>
>> Google cgi-bin and AJAX
>
> Good idea ! Now a simple, locally run JS command is translated to
> something *way* heavier and complex, needing to communicate with a server
> and going over two different frameworks to do the work . Oh wait, three
> including the actual search-and-replace program.
>
I never said it would be lightweight or ideal. Merely that it was possible.
Neither are regexps.

> I can see it now : the server sends a full webpage to the browser, than the
> browser extracts all textparts from it one by one, sends each of them back
> to the server and asks it to do a search-and-replace, after which the server
> sends that part back and have Ajax replace the involved textpart in the
> browser. Rinse and repeat for a list of to be replaced words. One
> webpage, going around at least trice.
>
> Nahhh, I don't think so. It would be *much* simpler to just send the whole
> list of words and than have the server do everything at once and send the
> full updated webpage back. Using PHP for the search-and-replace ofcourse.
>
> ... if the JS I posted would be coming from the/a server to begin with.
> Which it doesn't. IOW, the above doesn't even apply.
>
> Besides, JS itself is good a enough language to create a non-regexp solution
> in.
>

As I said.

> Regards,
> Rudy Wieser
>
>


--
In a Time of Universal Deceit, Telling the Truth Is a Revolutionary Act.

- George Orwell

Thomas 'PointedEars' Lahn

Jun 8, 2022, 6:15:02 PM
JJ wrote:

> On Sat, 21 May 2022 11:20:22 +0200, R.Wieser wrote:
>> For that I've tried to use a RegEx :
>>
>> RegExp("(?<=[^a-zA-Z\*])"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')

There is a conceptual problem in this line near the beginning and the end…

>> -or-
>>
>> RegExp("(?<![a-zA-Z\*])"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')

…and in this line near the beginning and the end.

The same problem leads to a program logic problem (although not a syntax
error, which is why it is not so easily spotted) in the following line:

> RegExp("\ \b"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')

It is too bad that the OP was posting with an invalid “From” header field
via a troll server, so that is all that I will say about it until at least
the former changes. Then again, they may have found the problem already.
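
The hint above is left unexplained in the thread, so the following is only a guess at two things it could refer to: how backslashes behave inside a string literal passed to `RegExp`, and how a positive assertion with a negated class behaves at the very start or end of the text. A small sketch under that assumption (all names are illustrative):

```javascript
// 1. Escapes in a string literal are processed before RegExp ever sees them.
const starEscapeDropped = ("\*" === "*");                        // the backslash is simply dropped
const doubledBackslash = new RegExp("\\bfoo").test("a foo");     // "\\b" reaches RegExp as \b
const singleBackslash = new RegExp("\bfoo").test("a foo");       // "\b" is a backspace character

// 2. A positive look-ahead with a negated class needs an actual character to look at,
//    so it fails at the end of the text; a negative look-ahead does not.
const positive = /d\*\*\*(?=[^a-zA-Z*])/i.test("I don't give a d***");
const negative = /d\*\*\*(?![a-zA-Z*])/i.test("I don't give a d***");

console.log(starEscapeDropped, doubledBackslash, singleBackslash); // true true false
console.log(positive, negative);                                   // false true
```

The same end-of-text reasoning applies, mirrored, to a positive look-behind at the start of the text.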

--
PointedEars
<https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
Twitter: @PointedEars2
Please do not cc me. /Bitte keine Kopien per E-Mail.

R.Wieser

Jun 9, 2022, 3:16:43 AM
Thomas,

>>> RegExp("(?<=[^a-zA-Z\*])"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')
>
> There is a conceptual problem in this line near the beginning and the end.

You may want to explain that. And no, I'm not going to either guess or try
to read your mind for what you might be meaning.

> The same problem leads to a program logic problem (although not a syntax
> error, which is why it is not so easily spotted) in the following line:

You've now made two claims, neither of which you proved, substantiated or even
explained. How is that in any way helpful ?

You've also not posted anything that might actually help me find any
kind of solution.

tl;dr:
your current post is as helpful as a bit of lorem ipsum.

Regards,
Rudy Wieser


Thomas 'PointedEars' Lahn

Jun 9, 2022, 6:21:11 AM
Thomas 'PointedEars' Lahn wrote:

> JJ wrote:
>> On Sat, 21 May 2022 11:20:22 +0200, R.Wieser wrote:
>
> […]
> The same problem leads to a program logic problem (although not a syntax
> error, which is why it is not so easily spotted) in the following line:
>
>> RegExp("\ \b"+escapeRegExp(find)+"(?=[^a-zA-Z\*])",'gi')

Sorry for the wrong quotation level, it should have been two “>” originally.

Scott Sauyet

Jun 14, 2022, 1:56:54 PM
R.Wieser wrote:
> I'm trying to match group of characters, delimited by anything that does not
> belong in that group.
>
> Case in point: "Couldn't understand a g**d*** word she was saying!"
>
> I would like to match the "g**d***" sequence of letters.

I don't understand your look-behind discussion at all. If all you want
to do is to replace some string in another, it's mostly a
simple problem, with the only significant complexity coming from needing
to escape certain characters for a regular expression.

This should work:

```
const replaceTerm = (term, replacement) => (text) =>
  text .replace (new RegExp (term.replace(/\*/g, "\\*"), "gi"), replacement)

replaceTerm
  ("g**d***", "🔵🟡🔴🟢🔵🟡")
  ("Couldn't understand a g**d*** word she was saying!")
//=> "Couldn't understand a 🔵🟡🔴🟢🔵🟡 word she was saying!"
```

Obviously, you can do whatever sophisticated escape replacement you need
on the term before passing it to the RegExp constructor.

I'm guessing that the point Thomas was trying to be mysterious about is
the lack of a `new` when you created the RegExp.

-- Scott

R.Wieser

Jun 15, 2022, 3:47:30 AM
Scott,

> I don't understand your look-behind discussion at all. If all you
> want to do is to replace some string in another, it's mostly a
> simple problem, with the only significant complexity coming from
> needing to escape certain characters for a regular expression.

Google "clbuttic mistake" and you'll know what the problem with that is.
:-)

> ("g**d***", "????????????")
> ("Couldn't understand a g**d*** word she was saying!")

Now imagine that ("d***", "damn") is another replacement, which might be
executed before the ("g**d***", "goddamn") one. I would be left with
"g**damn". Not good.

Or "p****" -> "pussy" and "p*****" -> "pissed" (a single "*" difference).
If replaced in that order what /should/ result in "pissed" would become
"pussy*" instead. Also not good.

By the way : I got my "check for delimitation" idea from the (built-in) "\b"
check (ask yourself: why does it exist ?). It doesn't quite work for what
I need it for, so I tried to roll my own. And failed. :-(

Regards
Rudy Wieser
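
The two replacement-order examples in the post above can be reproduced with a plain literal replacement that does no delimiter check at all. A minimal sketch; `naiveReplace` and the second sample sentence are made up for illustration:

```javascript
// Hypothetical helper: replaces every literal occurrence, with no delimiter check.
const naiveReplace = (text, term, replacement) => text.split(term).join(replacement);

const first = naiveReplace("a g**d*** word", "d***", "damn");
const second = naiveReplace("he was p***** off", "p****", "pussy");

console.log(first);  // "a g**damn word"
console.log(second); // "he was pussy* off"
```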


Scott Sauyet

Jun 15, 2022, 3:42:47 PM
R.Wieser wrote:
> Scott Sauyet wrote:

>> I don't understand your look-behind discussion at all. If all you
>> want to do is to replace some string in another, it's mostly a
>> simple problem, with the only significant complexity coming from
>> needing to escape certain characters for a regular expression.

> Google "clbuttic mistake" and you'll know what the problem with that is.
> :-)

Oh, I understand that problem. Which is what worries me with ever
trying to do what you seem to be trying to do.

>> ("g**d***", "????????????")
>> ("Couldn't understand a g**d*** word she was saying!")

> Now imagine that ("d***", "damn") is another replacement, which might be
> executed before the ("g**d***", "goddamn") one. I would be left with
> "g**damn". Not good.

Ahh, so this isn't about replacing the regex at all, but how to organize
a collection of search strings. Is that right? Because if that is, then
we simply need to define a partial ordering of the search strings and do
a topological sort of the resulting dependency graph, in order to run the
replacements one-by-one. Does that sound correct?

If that's what you're looking for, then this might define a useful partial
ordering:

```
const useBefore = (s, t) => new RegExp (t .replace (/\*/g, '.')) .test (s)

useBefore ('d***', 'g**d****') //=> false
useBefore ('g**d****', 'd***') //=> true
useBefore ('d**n', 'd***') //=> true
useBefore ('d***', 'd**n') //=> false
useBefore ('abc', 'xyz') //=> false
useBefore ('xyz', 'abc') //=> false
```

Note that the last two examples show that this is only a partial ordering.

If this is what you need, then the next step is simply to apply a
topological sort [1] of the graph generated by that partial order. There
are well-known algorithms for this.
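
As a rough illustration of that next step, here is one simple (quadratic, unoptimised) way to order a small list of search terms with the `useBefore` relation from above. `orderTerms` is a hypothetical name, and the sketch assumes the terms contain no regex metacharacters other than `*`:

```javascript
// Same relation as above: true when s should be applied before t.
const useBefore = (s, t) => new RegExp(t.replace(/\*/g, '.')).test(s);

// Hypothetical helper: repeatedly pick a term that no other remaining term must precede.
function orderTerms(terms) {
  const remaining = [...terms];
  const ordered = [];
  while (remaining.length > 0) {
    const index = remaining.findIndex((candidate) =>
      remaining.every((other) => other === candidate || !useBefore(other, candidate)));
    if (index === -1) {
      throw new Error("Terms depend on each other in a cycle");
    }
    ordered.push(remaining.splice(index, 1)[0]);
  }
  return ordered;
}

console.log(orderTerms(['d***', 'g**d***', 'p****', 'p*****']));
// [ 'g**d***', 'd***', 'p*****', 'p****' ]
```

For long term lists a proper graph-based algorithm from the linked article would be the better choice.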



[1]: https://en.wikipedia.org/wiki/Topological_sorting


-- Scott

R.Wieser

Jun 15, 2022, 4:15:29 PM
Scott,

> Ahh, so this isn't about replacing the regex at all, but how to
> organize a collection of search strings. Is that right?

Nope. It really is about creating a regexp that matches searchstrings as
if they are words.

I know I can sort the searchstrings and solve my problem that way, but that
currently is not my interest. Thank you for that "topological sorting"
link though.

Imagine that I have "g**d***" somewhere in the text, but no replacement for
it yet. But I already have "d***" as a searchstring. In that case that
"g**d***" should stay as it is.

Regards,
Rudy Wieser


Scott Sauyet

Jun 15, 2022, 4:32:39 PM
R.Wieser wrote:

> Imagine that I have "g**d***" somewhere in the text, but no replacement for
> it yet. But I already have "d***" as a searchstring. In that case that
     ^^^        ^^^^^^^
> "g**d***" should stay as it is.

I'm not sure what is meant here by "yet" and "already". Is there some
temporal change at play? Do you have a good example of an input structure,
perhaps a list of search strings and their (possible?) replacements and an
input string together with the expected output?

Or possibly a few such examples?

I'm now getting intrigued, but I still don't really understand the problem.

Also, is this an attempt to solve a real-world problem or is it mostly a way to
noodle with more advanced regexes?

-- Scott

R.Wieser

Jun 16, 2022, 3:19:08 AM
Scott,

> I'm not sure what is meant here by "yet" and "already". Is there some
> temporal change at play?

The searchwords are added (by me) as I encounter them.

> Or possibly a few such examples?

Of what ? I've given several examples (of the basic problem and how your
approach does not quite work), but those are not the ones you seem to need.

> I'm now getting intrigued, but I still don't really understand the
> problem.

Ask yourself : does your approach solve the "this is a clbuttic mistake"
problem ?

'Cause *that* is what I am after - just not for regular words as recognised
by a RegExp (see "\w", "\W", "\b")

> Also, is this an attempt to solve a real-world problem or is mostly a way
> to noodle with more advanced regexes?

Yes ...

https://notalwaysright.com/some-people-shouldnt-be-allowed-to-drive-or-go-out-in-public/258702/

... and yes

Discovering/learning how to solve certain problems is never wasted time to
me.


Also, do notice that I provided the correct solution in my first post.
Only later I found out that the "look behind assertion" method itself isn't
recognised by my FF v52 browsers JS. :-\

Regards,
Rudy Wieser


Scott Sauyet

Jun 16, 2022, 1:33:18 PM
R.Wieser wrote:
> Scott Sauyet wrote:

>> Or possibly a few such examples?

> Of what ? I've given several examples (of the basic problem and how your
> approach does not quite work), but those are not the ones you seem to need.

I have some examples below. I'm *hoping* that they capture at least a
good chunk of your requirements, but it's not clear to me. If there are
gaps, could you supply such an example showing what you expect and how
this differs from that?

>> I'm now getting intrigued, but I still don't really understand the
>> problem.

> Ask yourself : does your approach solve the "this is a clbuttic mistake"
> problem ?
>
> 'Cause *that* is what I am after - just not for regular words as recognised
> by a RegExp (see "\w", "\W", "\b")

I don't think that anything can comfortably reverse the clbuttic
mistake. If you were to try, you'd probably end up with a "bellyasson"
in your navel. What I think you're doing is trying to find a way to
reverse the manual censoring of, say, `g**d***` which is too unlikely to
be intended text, and restore `goddamn`.

I'm trying for something simpler, still using regex. But obviously if
you can't do this with regex, you can still fall back on a custom state
machine implementation. That would be a shame.
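
Such a fallback does not have to be elaborate. A rough non-regex sketch of whole-"word" replacement, assuming a "word" consists of ASCII letters and `*` as in the original post; `isWordChar` and `replaceWholeWord` are illustrative names only:

```javascript
// Assumption: only ASCII letters and "*" count as part of a "word".
function isWordChar(ch) {
  if (ch === undefined) return false; // before the start or past the end of the text
  return ch === "*" || (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z");
}

function replaceWholeWord(text, find, replacement) {
  if (find.length === 0) return text;
  // Assumes lower-casing does not change the string length (true for ASCII text).
  const haystack = text.toLowerCase();
  const needle = find.toLowerCase();
  let result = "";
  let pos = 0;
  for (;;) {
    const hit = haystack.indexOf(needle, pos);
    if (hit === -1) break;
    const end = hit + needle.length;
    if (!isWordChar(text[hit - 1]) && !isWordChar(text[end])) {
      // Delimited on both sides: copy the text before the hit, then the replacement.
      result += text.slice(pos, hit) + replacement;
      pos = end;
    } else {
      // Part of a longer "word": keep it and continue one character further on.
      result += text.slice(pos, hit + 1);
      pos = hit + 1;
    }
  }
  return result + text.slice(pos);
}

console.log(replaceWholeWord("D*** this, d*** that", "d***", "damn")); // "damn this, damn that"
```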


>> Also, is this an attempt to solve a real-world problem or is mostly a way
>> to noodle with more advanced regexes?

> Yes ...
>
> https://notalwaysright.com/some-people-shouldnt-be-allowed-to-drive-or-go-out-in-public/258702/
>
> ... and yes
>
> Discovering/learning how to solve certain problems is never wasted time to
> me.


No, but there are different ways to answer the questions, "How do I
solve this real world problem?" and "How can I best use regex to solve
this problem?"

I don't know whether I should be trying to help you find a way to better
use regex to solve the problem, or looking more broadly at unrelated
techniques. That's why I asked.

> Also, do notice that I provided the correct solution in my first post.
> Only later I found out that the "look behind assertion" method itself isn't
> recognised by my FF v52 browsers JS. :-\

Right, you won't be able to use look-behind in FF before version 78. [1]
But as the current version is 101, and as version 52 was first released
more than five years ago, perhaps an update is in order?
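
Where updating is not an option, support can at least be detected at run time. A small sketch (the function name is hypothetical): the pattern is built with the `RegExp` constructor rather than a regex literal, because an unsupported literal is rejected when the script is parsed, before any `try` block gets a chance to run.

```javascript
// Hypothetical helper: returns true when the engine accepts look-behind syntax.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    // Older engines throw a SyntaxError such as "invalid regexp group".
    return false;
  }
}

console.log(supportsLookbehind());
```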

Finally, here's another attempt, using a combination of a negative
look-behind before the target and a negative look-ahead after it. Does
this capture what you're trying to do?

```
const unclean = (w, r) => (s) =>
  s .replace (new RegExp (
    '\\b(?<!\\*)' + w .replace (/\*/g, '\\*') + '(?![\\w*])',
    'gi'
  ), r);

unclean ("g**d***", "goddamn") ("... a g**d*** word ...")
//=> "... a goddamn word ..."
unclean ("g**d***", "goddamn") ("... a d*** word ...")
//=> "... a d*** word ..."
unclean ("d***", "damn") ("... a g**d*** word ...")
//=> "... a g**d*** word ..."
unclean ("d***", "damn") ("... a d*** word ...")
//=> "... a damn word ..."
unclean ("d***", "damn") ("d*** this, d*** that, d*** everything!")
//=> "damn this, damn that, damn everything!"
unclean ("d***", "damn") ("Frankly, my dear, I don't give a d***!")
//=> "Frankly, my dear, I don't give a damn!"
unclean ("d***", "damn") ("Would I knew the villain! / I would lam-d*** him")
//=> "Would I knew the villain! / I would lam-damn him"
unclean ("d***", "damn") ("Teacher, what does 'landd***e' mean?")
//=> "Teacher, what does 'landd***e' mean?"

-- Scott

[1]: <https://caniuse.com/js-regexp-lookbehind>

R.Wieser

Jun 16, 2022, 3:18:52 PM
Scott,

>> Ask yourself : does your approach solve the "this is a clbuttic mistake"
>> problem ?
...
> I don't think that anything can comfortably reverse the clbuttic
> mistake.

Then start by thinking how *not* to make it. 'Cause /that is all/ my
problem is about.

I've already tried to explain it, but here it goes again :

* * * * * * * * * * * * * *

When I'm replacing all instances of "ass" in a piece of text I DO NOT WANT
to see "classic" being converted into "clbuttic".

* * * * * * * * * * * * * *

Replace "ass" with "d***" and "classic" with "g**d***" and you have my
example of two posts back.

> What I think you're doing is trying to find a way to reverse the manual
> censoring of, say, `g**d***` which is too unlikely to be intended text,
> and
> restore `goddamn`

Manual or not, yes, that is indeed what I'm doing ...

.... And *not at all* what my question is about.

> No, but there is are different ways to answer the questions, "How do I
> solve this real world problem?" and "How can I best use regex to solve
> this problem?"

The problem with the above is that you are second-guessing what I "really"
want to know.

Don't.

> Finally, here's another attempt, using a combination of a negative
> look-behind before the target and a negative look-ahead after it.

Sigh.

Look at the subjectline. What does that tell you ?

Also take a quick peek at the last few lines of my second message.

Regards,
Rudy Wieser


Scott Sauyet

Jun 17, 2022, 12:57:06 PM
R.Wieser wrote:
> Scott Sauyet wrote:

>> I don't think that anything can comfortably reverse the clbuttic
>> mistake.

> Than start with thinking how *not* to make it. 'Cause /that is all/ what my
> problem is about.

Sigh.

The code I've supplied is exactly an attempt to do this. It seems to
work for this case:

const unclean = (w, r) => (s) =>
  s .replace (new RegExp (
    '\\b(?<!\\*)' + w .replace (/\*/g, '\\*') + '(?![\\w*])',
    'gi'
  ), r);

unclean ('ass', 'butt') ('The law is an ass')
//=> "The law is an butt"
unclean ('ass', 'butt') ('Another classic mistake')
//=> "Another classic mistake"


> I've already tried to explain it, but here it goes again :
>
> When I'm replacing all instances of "ass" in a piece of text I DO NOT WANT
> to see "classic" being converted into "clbuttic".


Thank you. That's nice and clear. Look above and see if, as an outsider,
you would infer this from what you've said. I certainly didn't. All your
`g**d***` discussion made it sound like someone had already done some
conversion on plain, possibly obscenity-laced text, and that you were
trying to reverse it.

So are there cases my code doesn't handle?


>> No, but there is are different ways to answer the questions, "How do I
>> solve this real world problem?" and "How can I best use regex to solve
>> this problem?"

> The problem with the above is that you are second-guessing what I "really"
> want to know.
>
> Don't.

An attempt to tease out actual requirements from someone who thinks he's
being clear, but hasn't communicated them well enough for you to
understand, is substantially different from second-guessing what he wants.


>> Finally, here's another attempt, using a combination of a negative
>> look-behind before the target and a negative look-ahead after it.

> Sigh.

Even your updated problem statement above does not tell me what that
sigh is about.


> Look at the subjectline. What does that tell you ?

That you're having some issue with your attempt at a look-behind
solution to the problem of "trying to match a 'word'". Nothing there
tells me whether you want to make negative look-behind work properly
or whether you want to do this with regex in another way or if you're
open to non-regex solutions, or if you just want to b****-and-moan
(:->) about the state of the world.


> Also take a quick peek at the last few lines of my second message.

I believe I already offered a solution to that with:

| Right, you won't be able to use look-behind in FF before version 78.
| But as the current version is 101, and as version 52 was first released
| more than five years ago, perhaps an update is in order?


As far as I can tell, I've written a function that matches your problem
statement, and matched it before you made that statement explicit. If
it doesn't match it, can you show an example that still fails, or give
further clarifications to the statement? If you are saying that it has
to work in five-year old Firefox, does it also have to work in 25-year
old Netscape? (Regex was not added until ES3 -- I know because I had to
implement a partial regex interpreter in ES1/2 JS -- not fun!) If it
must be a regex solution, please make that clear. Conversely, if it
cannot be regex, also make that clear.

My mind-reading days are long over.

-- Scott
