Need explanation of REGEX code


Howard

Jul 31, 2021, 10:20:10 AM
to BBEdit Talk
Can someone explain how the REGEX code below works? 
(.)(?![^(]*\))

I know it looks for digits, including two-digit numbers in parentheses, and extracts them and inserts a space after each one, but I do not know how the code starting with (? works.

Here is sample data for it:
010000000
(10)1140006x
002200010
00000(11)01x
311200

Here is output:
0 1 0 0 0 0 0 0 0 
(10) 1 1 4 0 0 0 6 x 
0 0 2 2 0 0 0 1 0 
0 0 0 0 0 (11) 0 1 x 
3 1 1 2 0 0 


TJ Luoma

Jul 31, 2021, 12:52:48 PM
to BBEdit MailingList

Try pasting it into https://regexr.com

That's what I usually use to translate regex into English :-)


jj

Jul 31, 2021, 1:14:01 PM
to BBEdit Talk
Hi Howard,

Here is a commented version of your regular expression.

(?x)        (?# Allow whitespace and comments.                              )
(           (?# Opening of capture 1.                                       )
    .       (?# Any single character.                                       )
)           (?# Closing of capture 1.                                       )
(?!         (?# Opening negative lookahead assertion.                       )
    [^(]*   (?# Zero or more characters other than an opening parenthesis.  )
    \)      (?# A closing parenthesis.                                      )
)           (?# Closing negative lookahead assertion.                       )
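For anyone who wants to try this outside BBEdit, here is a minimal sketch of the same pattern in Python, using the re.VERBOSE flag in place of (?x). The replacement string "\1 " (capture 1 plus a space) is an assumption based on the output Howard posted, and the name space_out is only for illustration.

```python
import re

# Same pattern as the commented version above, written with Python's
# re.VERBOSE flag so whitespace and "#" comments inside it are ignored.
PATTERN = re.compile(
    r"""
    (.)          # capture 1: any single character
    (?!          # negative lookahead: reject the match if what follows is
        [^(]*    #   zero or more characters other than an opening parenthesis
        \)       #   and then a closing parenthesis
    )
    """,
    re.VERBOSE,
)


def space_out(line: str) -> str:
    """Put a space after every match (assumed replacement: capture 1 + space)."""
    return PATTERN.sub(r"\1 ", line)


print(space_out("(10)1140006x"))
```

Python's flavor is close to, but not identical to, the one BBEdit uses, so treat this as an approximation of what BBEdit does.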

(?!...) is a negative lookahead assertion.
It basically says: find occurrences of 'this' that are not followed by 'that', without including whatever follows in the match, hence the 'lookahead'.
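A tiny illustration of that idea in Python, with a made-up pattern and input (foo, bar and baz are just placeholders): the lookahead decides whether "foo" matches, but the text it inspects never becomes part of the match.

```python
import re

# Match "foo" only where it is NOT immediately followed by "bar".
text = "foobar foobaz"
spans = [m.span() for m in re.finditer(r"foo(?!bar)", text)]
matched = [text[start:end] for start, end in spans]

print(spans)    # positions of the accepted matches
print(matched)  # the matched text itself: only "foo", never "baz"
```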

For a detailed explanation, see the excellent grep documentation under Help > BBEdit Help > Grep Reference.

So the regular expression says:
"Any single character that is not followed by zero or more characters other than an opening parenthesis and then a closing parenthesis."

The double negation makes it difficult to fathom.

The negative lookahead assertion says: skip any match that is followed by '...)', where '...' is zero or more characters other than '('.
The digits of the first line are not followed by '...)' -> match.
The first character of the second line '(' is followed by '10)' -> skip
The second character of the second line '1' is followed by '0)' -> skip
The third character of the second line '0' is followed by ')' -> skip
The fourth character of the second line ')' is not followed by '...)' -> match
...
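The walk-through above can be reproduced with a short Python sketch that reports, for each character of a line, whether the pattern accepts it. The helper name trace is hypothetical, and Python's regex engine stands in for BBEdit's here.

```python
import re

PATTERN = re.compile(r"(.)(?![^(]*\))")


def trace(line):
    """Return (character, matched?) for every position in the line."""
    matched_positions = {m.start() for m in PATTERN.finditer(line)}
    return [(ch, i in matched_positions) for i, ch in enumerate(line)]


# Second line of the sample data: "(", "1" and "0" are skipped because a ")"
# can be reached without crossing a "("; everything from ")" onward matches.
for ch, matched in trace("(10)1140006x"):
    print(repr(ch), "match" if matched else "skip")
```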

A simpler regular expression that will give the same result (and probably be more performant) is:

(\([^\)]*\)|\d)

"Match any parenthesized block or any single digit outside of a parenthesized block."

HTH

Jean Jourdain

jj

Jul 31, 2021, 1:19:42 PM
to BBEdit Talk
I just noticed there are some non-digits in your input.

In that case, use:

(\([^\)]*\)|.)
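As a quick check, this Python sketch runs the original lookahead pattern and the alternation above over Howard's five sample lines and compares the results. The replacement "\1 " is assumed from the posted output, and agreement on these samples says nothing about input with nested or unbalanced parentheses.

```python
import re

LOOKAHEAD = re.compile(r"(.)(?![^(]*\))")    # original pattern
ALTERNATION = re.compile(r"(\([^\)]*\)|.)")  # simpler alternative

SAMPLES = ["010000000", "(10)1140006x", "002200010", "00000(11)01x", "311200"]

# For each sample line, apply both patterns with the same replacement.
results = [
    (LOOKAHEAD.sub(r"\1 ", line), ALTERNATION.sub(r"\1 ", line))
    for line in SAMPLES
]

for original, alternative in results:
    print(repr(original), "same" if original == alternative else "DIFFERENT")
```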

Howard

Jul 31, 2021, 4:16:33 PM
to BBEdit Talk
Thanks. I just tried it for the first time. It is good to know about.
Howard

Howard

Jul 31, 2021, 4:18:55 PM
to BBEdit Talk
Thanks Jean for the comprehensive explanation. It is very helpful.