Need Grep Pattern


Kim Mosley

Feb 14, 2024, 6:05:25 PM
to BBEdit Talk
I want to add tabs (or something else that might be better) so that I can have three columns… date, vendor with city, and price.

Can someone help?

Thanks!

01/03/23 CENTRAL MARKET 61 AUSTIN, TX 45.00
01/04/23 H-E-B 425 AUSTIN, TX 74.62
01/09/23 CENTRAL MARKET 61 AUSTIN, TX 43.70
01/10/23 WHOLEFDS LMR 10145 AUSTIN, TX 62.25
01/13/23 SQ *ASAHI IMPORTS Austin, TX 24.46
01/14/23 CENTRAL MARKET 61 AUSTIN, TX 29.22
01/17/23 CENTRAL MARKET 61 AUSTIN, TX 28.25
01/18/23 CENTRAL MARKET 61 AUSTIN, TX 19.34
01/21/23 CENTRAL MARKET 61 AUSTIN, TX 1.83
01/21/23 CENTRAL MARKET 61 AUSTIN, TX 18.34
01/23/23 CENTRAL MARKET 61 AUSTIN, TX 19.85

Jim Straus

Feb 14, 2024, 7:07:07 PM
to bbe...@googlegroups.com
Search for:
^([0-9]+/[0-9]+/[0-9]+) (.*) ([0-9.]+)$
and replace with
\1\t\2\t\3
and Replace all should do it.
-Jim Straus
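
For anyone who wants to check this pattern outside BBEdit, here is a minimal Python sketch of the same search and replace. It assumes Python's `re` module behaves like BBEdit's grep engine for this simple pattern; `to_columns` is a hypothetical helper name, and the sample lines are copied from the original post.

```python
import re

# Jim's pattern: date, then everything up to the last space, then the price.
# re.MULTILINE makes ^ and $ match at every line, as in a BBEdit document.
PATTERN = re.compile(r"^([0-9]+/[0-9]+/[0-9]+) (.*) ([0-9.]+)$", re.MULTILINE)

SAMPLE = """01/03/23 CENTRAL MARKET 61 AUSTIN, TX 45.00
01/04/23 H-E-B 425 AUSTIN, TX 74.62
01/13/23 SQ *ASAHI IMPORTS Austin, TX 24.46"""


def to_columns(text: str) -> str:
    """Put a tab after the date and before the price on every matching line."""
    return PATTERN.sub(r"\1\t\2\t\3", text)


if __name__ == "__main__":
    print(to_columns(SAMPLE))
```

The greedy `(.*)` first swallows the rest of the line and then backs off to the last space, so numbers inside the vendor field (such as the `61` in `CENTRAL MARKET 61`) stay in the middle column.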

--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/235af5bb-10d7-4dad-9df0-fe2db21e94bdn%40googlegroups.com.

Brian Forte

Feb 14, 2024, 7:10:32 PM
to bbe...@googlegroups.com, Kim Mosley
Kim,

On Wed, 14 Feb 2024 14:44:27 -0800 (PST), Kim Mosley wrote:
> I want to add tabs (or something else that might be better) so that
> I can have three columns… date, vendor with city, and price.

> 01/03/23 CENTRAL MARKET 61 AUSTIN, TX 45.00
> 01/04/23 H-E-B 425 AUSTIN, TX 74.62
> 01/09/23 CENTRAL MARKET 61 AUSTIN, TX 43.70
> 01/10/23 WHOLEFDS LMR 10145 AUSTIN, TX 62.25
> 01/13/23 SQ *ASAHI IMPORTS Austin, TX 24.46
> 01/14/23 CENTRAL MARKET 61 AUSTIN, TX 29.22
> 01/17/23 CENTRAL MARKET 61 AUSTIN, TX 28.25
> 01/18/23 CENTRAL MARKET 61 AUSTIN, TX 19.34
> 01/21/23 CENTRAL MARKET 61 AUSTIN, TX 1.83
> 01/21/23 CENTRAL MARKET 61 AUSTIN, TX 18.34
> 01/23/23 CENTRAL MARKET 61 AUSTIN, TX 19.85

Search for
^([0-9]{2}/[0-9]{2}/[0-9]{2})\s{1,}(.*[A-Z]{2,})\s{1,}([0-9]{1,}\.[0-9]{2})

Replace with
\1\t\2\t\3

The *search for* regex makes several assumptions.

1. all entries in the data set begin on a new line.

2. all the dates in the data set are of the form mm/dd/yy (i.e. the
legacy US-centric Gregorian date shorthand).

3. all the vendor addresses end with a comma, one or more spaces, and
then a two-letter state code as per the USPS’s Publication 59,
1963-10.

4. all prices are listed with exactly two digits after the decimal
point.
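
If some lines are left unchanged after Replace All, one way to find out which assumption they break is to list the lines the pattern does not match. The following is only a sketch: it assumes Python's `re` module treats this pattern the same way BBEdit does, and `unmatched_lines` is a hypothetical helper name.

```python
import re

# The search pattern from above, unchanged.
PATTERN = re.compile(
    r"^([0-9]{2}/[0-9]{2}/[0-9]{2})\s{1,}(.*[A-Z]{2,})\s{1,}([0-9]{1,}\.[0-9]{2})"
)


def unmatched_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every non-blank line the pattern misses."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip() and not PATTERN.search(line)
    ]


if __name__ == "__main__":
    # The second and third lines are made-up examples that break
    # assumptions 2 and 4 respectively.
    data = (
        "01/03/23 CENTRAL MARKET 61 AUSTIN, TX 45.00\n"
        "1/3/23 CENTRAL MARKET 61 AUSTIN, TX 45.00\n"
        "01/05/23 H-E-B 425 AUSTIN, TX 45\n"
    )
    for number, line in unmatched_lines(data):
        print(f"line {number} not matched: {line}")
```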

Hope this helps.

Regards,

Brian Forte

--
Brian Forte
<bfo...@adelaide.on.net>

Kim Mosley

Feb 14, 2024, 10:10:09 PM
to bbe...@googlegroups.com
Thanks. It worked for some but not all. It could have been my error.

Kim

eu...@gmx.de

Feb 15, 2024, 10:10:01 AM
to bbe...@googlegroups.com
Hi Kim,

Maybe this search pattern will help:

([\d\/]{8})( )([^\,]+)(\, TX )([\d\.]{1,5})


with replace pattern:

\1\t\3\t\5
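
A quick way to see what this pair does is to run it on one line. The sketch below uses Python's `re` module as a stand-in for BBEdit's grep (an assumption), and `split_line` is a hypothetical helper name. Note two consequences of the pattern as written: the replacement leaves out group 4, so ", TX" is dropped from the vendor column, and lines for any other state are left untouched.

```python
import re

# Search pattern from above: date, space, everything up to the comma,
# the literal ", TX ", then the price.
PATTERN = re.compile(r"([\d\/]{8})( )([^\,]+)(\, TX )([\d\.]{1,5})")


def split_line(line: str) -> str:
    """Apply the replace pattern \\1\\t\\3\\t\\5 to a single line."""
    return PATTERN.sub(r"\1\t\3\t\5", line)


if __name__ == "__main__":
    print(repr(split_line("01/03/23 CENTRAL MARKET 61 AUSTIN, TX 45.00")))
    # A made-up non-Texas line: no match, so it comes back unchanged.
    print(repr(split_line("01/02/23 SOME STORE TULSA, OK 5.00")))
```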


By the way: Try the pattern first with the Pattern Playground (the 6th entry in BBEdit's Search menu). There's also extensive help on grep included.

Cheers, Ulrich


Mike Pasini

Feb 15, 2024, 12:34:00 PM
to BBEdit Talk
Let's describe the pattern you are trying to split. There's more than one way to do that (as the replies to your query show) but by focusing on a general pattern (what must always be true about your original text), we can make a more reliable regexp.

You want to capture everything up to the first space -- ^(.+?)\s -- followed by just about anything (.+?)\s until you see a string of numbers including a period ([\d.]+?)$ at the end of the line.

So try: ^(.+?)\s(.+?)\s([\d.]+?)$

In BBEdit, you can put that in the Find dialog with \1\n\2\n\3\n\n in the replace to make the three groups clear (for tab-separated columns, use \1\t\2\t\3 instead).
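
To see which text lands in each group, the pattern can be tried on a single line. This is a sketch in Python, assuming its `re` module matches the way BBEdit's grep does here; `groups` is a hypothetical helper name.

```python
import re

# Lazy date, lazy vendor, then digits and periods anchored to the line end.
PATTERN = re.compile(r"^(.+?)\s(.+?)\s([\d.]+?)$")


def groups(line: str):
    """Return the three captured groups, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groups() if match else None


if __name__ == "__main__":
    print(groups("01/13/23 SQ *ASAHI IMPORTS Austin, TX 24.46"))
    print(groups("01/03/23 CENTRAL MARKET 61 AUSTIN, TX 45.00"))
```

The `$` anchor is what makes the lazy groups safe: the second group keeps growing until what follows the next space is nothing but digits and periods up to the end of the line, so the `61` in the second example stays with the vendor.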

Ted Burger

Feb 15, 2024, 12:43:47 PM
to bbe...@googlegroups.com
I am not that good with grep, so I would break this down.

Find: /23  (there is a space character at the end)
Replace: /23\t

Then

Find: TX  (there is a space at the end)
Replace: TX\t

Thanks,
Ted
***********************  Ted Burger  ****************************
t...@tobsupport.com      *********     www.tobsupport.com
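
The same two-step idea can be sketched in Python with plain string replacement and no grep at all. `two_step` is a hypothetical helper name, and the sketch assumes what the sample data shows: every date ends in /23, every address ends in TX, and neither "/23 " nor "TX " occurs anywhere else on a line.

```python
def two_step(text: str) -> str:
    """Turn the space after the date and the space after the state into tabs."""
    after_date = text.replace("/23 ", "/23\t")  # step 1: tab after the date
    return after_date.replace("TX ", "TX\t")    # step 2: tab before the price


if __name__ == "__main__":
    print(repr(two_step("01/03/23 CENTRAL MARKET 61 AUSTIN, TX 45.00")))
```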




Kaveh Bazargan

Feb 15, 2024, 4:16:58 PM
to bbe...@googlegroups.com
I think the answer is probably above but here is one more:
Search:
^(\d\d\/\d\d\/\d\d)\s(.+?)\s([0-9\.]+)$
Replace:
\1\t\2\t\3

Saved in Regex101 here.



--
Kaveh Bazargan PhD
Director
Accelerating the Communication of Research
  https://rivervalley.io/gigabyte-wins-the-alpsp-scholarly-publishing-innovation-award-using-river-valleys-publishing-technology/

GP

Feb 15, 2024, 7:34:25 PM
to BBEdit Talk
Since your address parts have varying lengths, a simple grep replacement that puts a single tab between fields won't always line the columns up.

A solution is to use a Perl script text filter. Using Brian Forte's grep search expression in a Perl script for a text filter, I came up with this:

#!/usr/bin/perl -w

# "each" on an array requires Perl 5.12 or later
use v5.12;
use strict;

# change this number to adjust the size of the pseudo-tabs between columns
my $spacesPerTab = 4;
my @dateParts = ();
my @addrParts = ();
my @costParts = ();
my $maxAddrLen = 0;
# for each input line, break it up into parts, and find maximum length of all the address parts
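while (<>) {
    chomp;
    # skip lines that don't match, so captures from an earlier line are never reused
    unless (m!^([0-9]{2}/[0-9]{2}/[0-9]{2})\s{1,}(.*[A-Z]{2,})\s{1,}([0-9]{1,}\.[0-9]{2})!) {
        warn "Skipping unmatched line: $_\n";
        next;
    }
    push @dateParts, $1;
    push @addrParts, $2;
    push @costParts, $3;
    if (length($2) > $maxAddrLen) {
        $maxAddrLen = length($2);
    }
}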
# for every line, pad the address part to maximum read length and then print out space separated parts
while (my ($i, $addr) = each @addrParts) {
    if (length($addr) < $maxAddrLen) {
        $addrParts[$i] = $addr.(" " x ($maxAddrLen - length($addr)))
    }

    print $dateParts[$i], " " x $spacesPerTab, $addrParts[$i], " " x $spacesPerTab, $costParts[$i], "\n";
}

I saved it as "Columnize_date_address_price.pl" in BBEdit's Text Filters support folder. (Although I didn't do so, you can assign a keyboard shortcut to it in the Text Filters palette.)

To use it, select the lines of text you want to tidy up into neat columns. Then on the Text Filters palette, select "Columnize_date_address_price" and click on the "Run" tab button. (You might want to test on a copy first to ensure it is working as desired for any more elaborate cases you may have.)

I would have included the filter's result for your example data, but the forum doesn't display text in a monospaced font, so the price column wouldn't line up in a pasted-in result.

I used spaces to regularize the column separation since tab widths are pretty variable. If you want the result entabbed, BBEdit has a "Convert Spaces to Tabs…" menu item for that.

Lastly, although I haven't done so, you could modify the Perl script to tidy up the price column so the decimal points line up for all the prices.
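
As a rough illustration of that last idea, here is a sketch in Python rather than Perl (a swapped-in language, not a change to the script above). It pads the address column in the same way and also right-justifies the prices; since every price is assumed to have exactly two decimals, right-justifying is enough to line up the decimal points. `columnize` and its `gap` parameter are hypothetical names.

```python
import re

# Same three fields as the Perl script: date, address part, price.
ROW = re.compile(r"^(\d{2}/\d{2}/\d{2})\s+(.*[A-Z]{2,})\s+(\d+\.\d{2})")


def columnize(lines: list[str], gap: int = 4) -> list[str]:
    """Pad addresses to a common width and right-justify the prices."""
    rows = [match.groups() for match in map(ROW.match, lines) if match]
    if not rows:
        return []
    addr_width = max(len(addr) for _, addr, _ in rows)
    price_width = max(len(price) for _, _, price in rows)
    pad = " " * gap
    return [
        f"{date}{pad}{addr:<{addr_width}}{pad}{price:>{price_width}}"
        for date, addr, price in rows
    ]


if __name__ == "__main__":
    sample = [
        "01/21/23 CENTRAL MARKET 61 AUSTIN, TX 1.83",
        "01/10/23 WHOLEFDS LMR 10145 AUSTIN, TX 62.25",
    ]
    print("\n".join(columnize(sample)))
```

Lines that don't match the pattern are left out of the result in this sketch, so it is worth comparing line counts before and after.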
