Extract number within parentheses using GREP


Howard

Jun 1, 2021, 9:11:43 AM
to BBEdit Talk
I have a set of numbers. Within some of them is at least one number within parentheses. I need to find all the lines containing numbers within parentheses and extract those numbers. I also need to know which line they are extracted from. How can I do that?

Howard

Examples
1001405001
0000(10)0000
001320000001
0(16)5000000
021(10)0000(11)
010101000

Neil Faiman

Jun 1, 2021, 10:15:09 AM
to BBEdit Talk Mailing List
There is likely a simpler way, but here is an easy three step process.

1. Text > Add/Remove Line Numbers… Select “Insert” and check “Include space after number”:

1 1001405001
2 0000(10)0000
3 001320000001
4 0(16)5000000
5 021(10)0000(11)
6 010101000

2. Text > Process Lines Containing… Search string: “\(\d+\)”. Check “Grep” and “Copy to new document”:

2 0000(10)0000
4 0(16)5000000
5 021(10)0000(11)

3. Command-F (or Search > Find…). Search string: “[ )]\d*\(?”. Replacement string: “ ” (a single space). Check “Grep”, then click “Replace All”.

2 10 
4 16 
5 10 11 

In the result, each line contains a line number and one or more parenthesized numbers, separated by spaces.

Explanation of the grep search: You want to keep the initial line number and the parenthesized numbers, and remove everything else, replacing it by spaces. The search string is a description of an “everything else”:

“[ )]”: An “everything else” begins with either a space (which must be the space following the line number, since those are the only spaces in the file) or the closing parenthesis of a parenthesized number.
“\d*”: The run of contiguous digits (possibly empty) following the space or closing parenthesis is an “everything else” number.
“\(?”: The “everything else” number is either at the end of the line or followed by the opening parenthesis of a parenthesized number. If it is followed by an opening parenthesis, that parenthesis is part of the “everything else”, too.
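
For anyone who wants to try the three steps outside BBEdit, here is a minimal Python sketch. The function name `extract_parenthesized` is made up for illustration, and Python's `re` module stands in for BBEdit's Grep engine, so treat it as an approximation of the steps above rather than an exact equivalent.

```python
import re


def extract_parenthesized(text):
    """Mimic the three steps: number the lines, keep the matches, blank out the rest."""
    out = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Step 1: insert the line number followed by a space.
        numbered = f"{number} {line}"
        # Step 2: keep only lines containing a parenthesized number.
        if not re.search(r"\(\d+\)", numbered):
            continue
        # Step 3: replace every "everything else" run with a single space.
        out.append(re.sub(r"[ )]\d*\(?", " ", numbered))
    return out


sample = """1001405001
0000(10)0000
001320000001
0(16)5000000
021(10)0000(11)
010101000"""

for row in extract_parenthesized(sample):
    print(repr(row))  # repr() makes the trailing space visible
```

Each output row keeps its trailing space, just as the Replace All result does.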

Regards,

Neil Faiman

jj

Jun 1, 2021, 11:27:43 AM
to BBEdit Talk

Here is Neil's three-step process as a single text filter:

```sh
#!/usr/bin/env sh

# 1. Prefix each line with its line number.
# 2. Keep only lines containing a '('.
# 3. Output only the digits enclosed in parentheses.
perl -pe '$_ = "$.:$_"' | perl -ne 'print if /\(/' | perl -pe 's/\d*\((\d*)\)\d*/ $1/g'
```
Regards,

Jean Jourdain

Christopher Stone

Jun 1, 2021, 4:59:15 PM
to BBEdit-Talk
On 06/01/2021, at 08:11, Howard <leadwi...@gmail.com> wrote:
> I have a set of numbers. Within some of them is at least one number within parentheses. I need to find all the lines containing numbers within parentheses and extract those numbers. I also need to know which line they are extracted from.


Hey Howard,

Bash text filter:

#!/usr/bin/env bash
nl -n ln | sed -En 's![0-9]*(\([0-9]+\))[0-9]*!\1!gp'

Result:

2     (10)
4     (16)
5     (10)(11)

I've left the parentheses on to make it clear when there is more than one set on a line.


If we're going to resort to Perl we might as well go all in:

#!/usr/bin/env perl -sw
while (<>) {
   if ( m!\(\d+\)! ) {
      s!\d*(\(\d+\))\d*!$1!g;
      print "$.\t" . $_;
   }
}

Result:

2 (10)
4 (16)
5 (10)(11)

Or just to be a bit cheeky:

#!/usr/bin/env perl -sw
while (<>) { if ( m!\(\d+\)! ) { s!\d*(\(\d+\))\d*!$1!g; print "$.\t" . $_; } }

OR

#!/usr/bin/env bash
perl -wlne '{if(m!\(\d+\)!){s!\d*(\(\d+\))\d*!$1!g;print"$.\t".$_;}}'

😎

Don't forget – you can extract text directly from the front document using the Find Dialog and the [Extract] button.

1) Number lines.
2) Extract lines to new doc (with Find Dialog).
3) Find/replace to leave only the desired text.

The same as Neil's suggestion but without using Process Lines Containing.

TMTOWTDI.

I generally prefer using a text filter for this sort of thing, and I have one that opens with a hotkey and runs via another hotkey. So writing a filter and running it usually takes me less time than running through the steps with BBEdit's other tools. (Usually, but not always... :)

--
Best Regards,
Chris

Howard

Jun 1, 2021, 7:59:28 PM
to BBEdit Talk
Thanks everyone for the responses. 

To help me to understand better the Grep part of Neil's solution, can someone provide me with the search pattern and the replace pattern to just find those lines with numbers in parentheses and extract them without any line numbers? I'd like to put that into the Pattern Playground and work with that a bit.

Tom Robinson

Jun 1, 2021, 10:21:49 PM
to bbe...@googlegroups.com
On 2021-06-02, at 11:59, Howard <leadwi...@gmail.com> wrote:
>
> Thanks everyone for the responses.
>
> To help me to understand better the Grep part of Neil's solution, can someone provide me with the search pattern and the replace pattern to just find those lines with numbers in parentheses and extract them without any line numbers? I'd like to put that into the Pattern Playground and work with that a bit.

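Here’s the first pattern (the one from Chris’s sed filter, [0-9]*(\([0-9]+\))[0-9]*) broken into components:

[0-9]   Match a digit, 0–9
*       Repeat the preceding item zero or more times
(       Start a capture group
\(      Match a literal opening parenthesis (the backslash ‘escapes’ it)
[0-9]   Match a digit, 0–9
+       Repeat the preceding item one or more times
\)      Match a literal closing parenthesis
)       End the capture group
[0-9]   Match a digit, 0–9
*       Repeat the preceding item zero or more times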

Note that some of the samples also used \d to search for digits. Technically, \d can also match digits from other scripts (non-ASCII digits), depending on the regex engine and its Unicode settings.
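
As a rough illustration of both points, here is a small Python sketch. It uses Python's `re` module, not BBEdit's engine, so the `\d` behaviour shown is Python's and is only assumed to be indicative of what other engines may do. The name `PATTERN` is just for this example.

```python
import re

# The pattern broken down above, written with re.VERBOSE so each piece carries a comment.
PATTERN = re.compile(r"""
    [0-9]*      # zero or more leading digits
    (           # start capture group 1
      \(        #   literal opening parenthesis
      [0-9]+    #   one or more digits
      \)        #   literal closing parenthesis
    )           # end capture group 1
    [0-9]*      # zero or more trailing digits
""", re.VERBOSE)

print(PATTERN.sub(r"\1", "021(10)0000(11)"))

# In Python 3, \d applied to a str also matches non-ASCII decimal digits; [0-9] does not.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
print(bool(re.fullmatch(r"\d", arabic_three)))
print(bool(re.fullmatch(r"[0-9]", arabic_three)))
```

If only ASCII digits should ever match, `[0-9]` is the more explicit choice.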

Christopher Stone

Jun 2, 2021, 5:37:42 AM
to BBEdit-Talk
On 06/01/2021, at 18:59, Howard <leadwi...@gmail.com> wrote:
> To help me to understand better the Grep part of Neil's solution, can someone provide me with the search pattern and the replace pattern to just find those lines with numbers in parentheses and extract them without any line numbers? I'd like to put that into the Pattern Playground and work with that a bit.


Hey Howard,

Assuming all of your data lines are variations of this format:

1001405001
0000(10)0000
001320000001
0(16)5000000
021(10)0000(11)
010101000

Then it's extremely simple to extract the lines.

Find:

.*\(\d+\).*

.*   ==  any character zero or more.
\(   ==  literal open parenthesis.
\d+  ==  any digit one or more.
\)   ==  literal close parenthesis.
.*   ==  any character zero or more.



If I knew that any line containing even 1 parenthesis was viable for extraction I could be lazy and do:

Find:

.*\(.*

.*   ==  any character zero or more.
\(   ==  literal open parenthesis.
.*   ==  any character zero or more.

OR

You could use Neil's suggestion of Process-Lines-Containing with just 1 literal parenthesis.



Now – to remove the unwanted digits:

Find:

\d*(\(\d+\))\d*

\d*  ==  any digit zero or more.
(    ==  start capture group.
\(   ==  literal parenthesis.
\d+  ==  any digit 1 or more.
\)   ==  literal parenthesis.
)    ==  close capture group.
\d*  ==  any digit zero or more.

Replace:

\1

\1   ==  capture group 1.
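
For experimenting outside the Pattern Playground, the two stages can be sketched in Python. This is only an approximation: Python's `re` module is not BBEdit's Grep engine, and the variable names are invented for the example.

```python
import re

sample = """1001405001
0000(10)0000
001320000001
0(16)5000000
021(10)0000(11)
010101000"""

# Stage 1: keep only lines that contain a parenthesized number.
kept = [line for line in sample.splitlines()
        if re.fullmatch(r".*\(\d+\).*", line)]

# Stage 2: drop the digits outside the parentheses, keeping capture group 1.
result = [re.sub(r"\d*(\(\d+\))\d*", r"\1", line) for line in kept]

print(result)  # ['(10)', '(16)', '(10)(11)']
```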



As I've mentioned BBEdit is always my starting point for building regular expressions, but there are times when a tool like RegEx101.com will give you more information and more explanation.

See your pattern here:


--
Best Regards,
Chris

Howard

Jun 3, 2021, 10:46:58 AM
to BBEdit Talk
Chris, what you wrote is very helpful.

To get the Grep results not enclosed in parentheses, I used the Replace pattern `\2` with this Search pattern:

\d*(\((\d+)\))\d*

but that resulted in these numbers:
10
16
10
11

Given that the last two numbers, 10 and 11, are on the same line, is there a way to modify the Grep expression so that the results appear with the final two numbers on the same line, as shown below, with a space separating them?
10
16
10 11

Howard

jj

Jun 3, 2021, 11:43:53 AM
to BBEdit Talk
Hi Howard, 

You can use this regex in the Find field:

\d*(?:\((\d+)\))?\d*(?:\((\d+)\))?\d*(?:\((\d+)\))?\d*(?:\((\d+)\))?\d*(?:\((\d+)\))?\d*

with this in the Replace field:

\1 \2 \3 \4 \5

It can find from 1 to 5 parenthesized numbers per line. Repeat the (?:\((\d+)\))?\d* pattern and add the corresponding captures to the replacement if you need more.

Here is the two-parentheses case with comments:

(?x)                (?# allow whitespace and comments)
\d*                 (?# zero or more leading digits)
(?:                 (?# non capturing parenthesis)
    \(              (?# literal open parenthesis)
        (           (?# capturing open parenthesis - capture \1)
            \d+     (?# one or more digits)
        )           (?# capturing close parenthesis)
    \)              (?# literal close parenthesis)
)?                  (?# zero or one occurrence)
\d*                 (?# zero or more middle digits)
(?:                 (?# non capturing parenthesis)
    \(              (?# literal open parenthesis)
        (           (?# capturing open parenthesis - capture \2)
            \d+     (?# one or more digits)
        )           (?# capturing close parenthesis)
    \)              (?# literal close parenthesis)
)?                  (?# zero or one occurrence)
\d*                 (?# zero or more trailing digits)

You can copy this commented pattern and paste it "as is" in the Find dialog by right-clicking inside the "Find" field with the <Option-key> pressed and choosing the "Paste and Select" menu item.
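
If a script is an option, a different technique avoids repeating the group for each possible parenthesis: collect every match and join them. The sketch below uses Python's `re.findall` rather than a single find/replace pattern, so it is an alternative to the approach above, not a translation of it. The function name `parenthesized_numbers` is hypothetical.

```python
import re


def parenthesized_numbers(line):
    """Return every number that appears inside parentheses on one line."""
    return re.findall(r"\((\d+)\)", line)


lines = ["0000(10)0000", "0(16)5000000", "021(10)0000(11)"]
for line in lines:
    # Join the captures with a space, however many there are.
    print(" ".join(parenthesized_numbers(line)))
```

This handles any number of parenthesized groups per line without changing the pattern.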

Regards,

Jean Jourdain

Christopher Stone

Jun 7, 2021, 3:54:04 PM
to BBEdit-Talk
On 06/03/2021, at 09:46, Howard <leadwi...@gmail.com> wrote:

> Chris, what you wrote is very helpful.
>
> To get the Grep results not enclosed in parentheses, I used the Replace pattern `\2` with this Search pattern:
>
> \d*(\((\d+)\))\d*
>
> but that resulted in these numbers:
> 10
> 16
> 10
> 11


Hey Howard,

That can't be right.  If you genuinely got the above result you need to show all your steps.

In BBEdit 12 and 13:

Source:

0000(10)0000
0(16)5000000
021(10)0000(11)

Find: `\d*(\((\d+)\))\d*`

Repl: `\2`

Result:

10
16
1011



The above would be better written this way:

Source:

0000(10)0000
0(16)5000000
021(10)0000(11)

Find: `\d*\((\d+)\)\d*`

Repl: `\1`

Result:

10
16
1011



To get the output you want, add a trailing space to the replacement: `\1 `

You'll get:

10<space>
16<space>
10<space>11<space>

Follow that up by removing the trailing horizontal whitespace.

Find: `\h+$`

Repl: nothing
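
The same two passes can be sketched in Python for anyone who wants to check the result outside BBEdit. One assumption to note: Python's `re` module has no `\h`, so `[ \t]` stands in for "horizontal whitespace" here.

```python
import re

source = "0000(10)0000\n0(16)5000000\n021(10)0000(11)\n"

# Pass 1: keep each parenthesized number and append a space (Repl: "\1 ").
spaced = re.sub(r"\d*\((\d+)\)\d*", r"\1 ", source)

# Pass 2: strip trailing horizontal whitespace from every line.
# re.MULTILINE lets $ match at the end of each line, not just the end of the text.
cleaned = re.sub(r"[ \t]+$", "", spaced, flags=re.MULTILINE)

print(cleaned)
```

The space between 10 and 11 on the last line survives because it is not at the end of the line.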



This becomes terribly tedious when you do it all by hand, so I wrote you an example Text Factory.

Install here:

~/Library/Application Support/BBEdit/Text Filters/Text Factories/

You have to run it from the Text > Apply Text Filter menu with the target document FRONTMOST.

Give it a keyboard shortcut for convenience in BBEdit's Menus & Shortcuts prefs.

Text Factories let you do quite a lot without learning how to write scripts.


--
Take Care,
Chris

Howard.textfactory.zip

Christopher Stone

Jun 7, 2021, 4:12:08 PM
to BBEdit-Talk
On 06/07/2021, at 14:53, Christopher Stone <listmei...@gmail.com> wrote:

> This becomes terribly tedious when you do it all by hand, so I wrote you an example Text Factory.
>
> Install here:
>
> ~/Library/Application Support/BBEdit/Text Filters/Text Factories/
>
> You have to run it from the Text > Apply Text Filter menu with the target document FRONTMOST.


Hey Howard,

Oops...  I sent you the wrong Text Factory.  That one works but is less precise than this one.


--
Take Care,
Chris

Howard 02.textfactory.zip