application error code: 12247 when doing a GREP search

ce gm

Dec 14, 2024, 4:43:01 PM

to BBEdit Talk

Hello there,

I am doing a GREP search on a .txt file in BBEdit on my Mac. Here are the find/replace terms:

Find: \b(\w+)+\1\b
Replace: \1

When I input the Find term, it correctly identifies the targets in the preview (highlights them in yellow). Then, when I click Replace All, I get a pop-up with "Application Error Code: 12247" and nothing else.

Anyone know what this means? A cursory Google search was not helpful.

Thanks!

Bruce Van Allen

Dec 14, 2024, 6:07:35 PM

to bbe...@googlegroups.com

Hi,

An example of the text and a description of what you’re trying to accomplish would help.

From your find pattern, I’m guessing you’re trying to find cases where a string is followed by the same string, to be replaced by just one instance of the string.

'\b(\w+)+\1\b' (your original, without the quotes)

Your find pattern’s second plus sign ‘+’ isn’t doing anything, because the first one, which quantifies the ‘\w’, is grabbing every consecutive word/alphanumeric character including any repetitions.

Removing that second '+', the find pattern '\b(\w+)\1\b' (without the quotes) will find a string of word characters followed immediately by the same string, as in 'My sentence is abcabc for defdef.' Using your replacement pattern of '\1', this will become 'My sentence is abc for def.'

Guessing that you're actually looking for duplicated WORDS, if the find pattern has a spacebar space ' ' in place of the second plus sign, then it will find any word followed by a space and then the same exact word, and the replacement will eliminate the duplication.

With find pattern '\b(\w+) \1\b', your replacement pattern makes 'My sentence is abc abc for def def.' into 'My sentence is abc for def.'
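
For anyone who wants to try these two patterns outside BBEdit, here is a minimal Python sketch. It assumes Python's re module treats \b, \w and \1 the same way BBEdit's grep does for plain ASCII text like this; the helper name collapse is made up for illustration.

```python
import re

# Hypothetical helper for illustration: apply a find pattern with the
# replacement \1, as in the Find/Replace terms discussed above.
def collapse(pattern: str, text: str) -> str:
    return re.sub(pattern, r"\1", text)

# Doubled string with nothing in between: abcabc -> abc
joined = collapse(r"\b(\w+)\1\b", "My sentence is abcabc for defdef.")

# Doubled word separated by one space: abc abc -> abc
spaced = collapse(r"\b(\w+) \1\b", "My sentence is abc abc for def def.")

print(joined)
print(spaced)
```

Both calls should print the same cleaned-up sentence. Note that the first pattern only fires when the whole word is one string repeated twice, because of the \b on each side.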

If you want to find a string of word characters that matches an earlier instance of the same string but separated by more than just a space, your pattern may be more complicated.

HTH and please clarify if my guesses are wrong.

— Bruce

_bruce__van_allen__santa_cruz_ca_

> --
> This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
> ---
> You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/bbedit/c9e18d6f-f5c4-467e-9c01-fa4ffbaa5485n%40googlegroups.com.

Tim A

Dec 15, 2024, 10:02:20 AM

to BBEdit Talk

Using BBEdit version 14.6.9, I replicated Bruce's test string with the original search pattern and his proposed corrected one. Both worked fine; neither generated an application error.

GP

Dec 15, 2024, 6:50:55 PM

to BBEdit Talk

First, with BBEdit 15.1.3 (15B62, Apple Silicon), I didn't get any error with ce gm's grep find and replace.

That said, I found that the second + is doing something in the find and replace operation.

For testing text, I used Howard's posted sample records from the "Sorting multiple records in a text file" thread. Using the Pattern Playground with the find '\b(\w+)+\1\b' (without the quotes) and the replace pattern \1, 7 matches were found:

0 -> facilisis
1 -> is
replacement -> is

0 -> Underhill
1 -> l
replacement -> l

0 -> 11
1 -> 1
replacement -> 1

0 -> Afterall
1 -> l
replacement -> l

0 -> 11
1 -> 1
replacement -> 1

0 -> 22
1 -> 2
replacement -> 2

0 -> Afterall
1 -> l
replacement -> l

whereas, with the find '\b(\w+)\1\b' (without the second + and without the quotes) and the same replace pattern, only 3 matches were found:

0 -> 11
1 -> 1
replacement -> 1

0 -> 11
1 -> 1
replacement -> 1

0 -> 22
1 -> 2
replacement -> 2

According to https://regex101.com's explanation, the difference is due to how the capturing group in the (\w+)+ part of the regular expression works: "A repeated capturing group will only capture the last iteration." So, if I'm not mistaken, (\w+)+ works the same as \w*(\w+), and the equivalent find grep is \b\w*(\w+)\1\b . That would match any word containing zero or more word characters, followed by a capturing group of one or more word characters, followed by a single repeat of the captured characters. According to regex101.com's Regex Debugger, there's a whole lot of backtracking going on to find all the matches with the \b\w*(\w+)\1\b grep.
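
The "last iteration" behaviour can be checked with a short Python sketch. This is only an approximation of the Pattern Playground: it assumes Python's re engine resolves (\w+)+ the same way BBEdit's does, and the sample string is made up from the matched words listed above plus one non-matching word, not taken from Howard's actual sample records. The helper name matches is hypothetical.

```python
import re

# Words taken from the match lists above, plus one word with no repeat.
# This is a stand-in, not Howard's actual test file.
sample = "facilisis Underhill 11 Afterall 22 sentence"

def matches(pattern: str, text: str) -> list[tuple[str, str]]:
    """Return (whole match, capture group 1) for each match."""
    return [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]

with_second_plus = matches(r"\b(\w+)+\1\b", sample)
without_second_plus = matches(r"\b(\w+)\1\b", sample)

print(with_second_plus)      # whole words whose ending is doubled
print(without_second_plus)   # only words that are one string repeated twice
```

With the second +, group 1 holds only the final iteration, so a word like facilisis matches as a whole while only "is" is captured. Without it, the word must consist of exactly one string repeated twice.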

Bruce Van Allen

Dec 15, 2024, 7:14:09 PM

to bbe...@googlegroups.com

Thanks for digging into the regex meaning of that second '+' in '\b(\w+)+\1\b'.

As it turned out, the OP needed to find repeated words, not characters, so inserting a spacebar space for the second plus sign totally works for them.

Also, I'm not sure you're suggesting this, but at the end of your comment you're talking about the pattern '\b\w*(\w+)\1\b'. Those first zero or more word characters (\w*) won't be captured and so won't be in the replacement. Is that what you meant?

Best,

— Bruce

GP

Dec 15, 2024, 9:35:02 PM

to BBEdit Talk

Yes, that's the way the original '\b(\w+)+\1\b' works in practice. The first zero or more word characters aren't captured but are included in the match, so they get deleted when the replacement uses just the \1 capture group. Since \w* and (?:\w+)* match the same text in practice, perhaps the expression '\b(?:\w+)*(\w+)\1\b' better explains how only the last iteration of (\w+) in the (\w+)+\1 expression is captured, and how any preceding \w+ matches are discarded as captures and aren't included in the one, final capture group.

Take for example the word facilisis. The regular expression engine ends up finding a leading match on "facil", a group 1 capturing match on "is", and a backreference match to capture group 1 on the second "is". The whole matched word then gets replaced by just the capture group 1 string 'is' (without the quotes). Your guess is as good as mine as to how much string slicing and dicing, capturing, and capture discarding the engine performs before arriving at that match and capture group solution.

That said, I flubbed the copy and paste at the end of that comment discussing backtracking. I intended to use '\b(\w+)+\1\b' for the backtracking comment but instead copied and pasted '\b\w*(\w+)\1\b'. As it turns out, both have a whole lot of backtracking, but '\b\w*(\w+)\1\b' has slightly less backtracking than '\b(\w+)+\1\b' on the example search text I was using.
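
The facilisis breakdown can be reproduced in Python as a rough check. The sketch below assumes Python's re engine backtracks in the same order as BBEdit's for these patterns. It runs the original pattern and the two rewritten forms against the word and splits each match into its three parts.

```python
import re

word = "facilisis"

patterns = [
    r"\b(\w+)+\1\b",         # original find pattern
    r"\b\w*(\w+)\1\b",       # rewritten with \w*
    r"\b(?:\w+)*(\w+)\1\b",  # rewritten with a non-capturing group
]

results = []
for pattern in patterns:
    m = re.search(pattern, word)
    leading = word[m.start(0):m.start(1)]   # matched but not captured
    captured = m.group(1)                   # final iteration of (\w+)
    repeat = word[m.end(1):m.end(0)]        # text matched by \1
    results.append((leading, captured, repeat))
    print(pattern, "->", leading, captured, repeat)
```

All three patterns should report the same split: the uncaptured leading part, the captured group, and the backreference. Python's re does not expose a step count, so this says nothing about how much backtracking each form needs.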

ce gm

Dec 19, 2024, 12:00:00 PM

to BBEdit Talk

Hi all,

Apologies, I didn't realize my correspondence with Bruce didn't get added to the whole thread! My use case for this grep search is that I am transferring text from PDF to .txt to do text mining. In the transfer from PDF to .txt, many of the files ended up with duplicated words. For example:

Here isis anan exampleexample of the way inin whichwhich somesome of the texttext wouldwould be transferred to .txt

I was trying to use a grep search to find the duplicated words and replace them with just a single instance of that word, meaning: take "exampleexample" and change it to "example"

Bruce's solution, '\b(\w+)\1\b', did end up working for me.
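
For anyone who wants to run the same cleanup in a script, here is a minimal Python sketch of that find/replace on the example line above. It assumes Python's re module behaves like BBEdit's grep for this pattern. Python's re has no match limit, so it will not reproduce error 12247.

```python
import re

raw = ("Here isis anan exampleexample of the way inin whichwhich "
       "somesome of the texttext wouldwould be transferred to .txt")

# Replace any word made of one string repeated twice with a single copy.
cleaned = re.sub(r"\b(\w+)\1\b", r"\1", raw)
print(cleaned)

# Caveat: genuine words of the same shape are collapsed too.
false_positive = re.sub(r"\b(\w+)\1\b", r"\1", "a murmur")
print(false_positive)
```

The second call is a reminder that real words such as "murmur", and numbers such as "11", have the same doubled shape, so the results are worth spot-checking before text mining.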

I had also contacted BBEdit's help service, and they said that error code 12247 is a "match limit exceeded" error. They suspected it was due to the number of instances the original grep search was finding.

Thanks, all!
