BBEdit Parsing Help

Peter Kaufman

Sep 26, 2020, 8:22:22 PM
to BBEdit Talk
Folks,

I've been struggling with this problem all day and can't seem to find a solution.  I'm missing some key idea, and I hope one of you can point me to it.

Thanks very much in advance for any help!

Peter

My input is below and the BBEdit Find is:
FIND: ^Last Name:\t(.*)\n^First Name:\t(.*)\n^Middle Name:\t(.*)\n^Card Number:\t(.*)\n^Active Date:\t(.*)\tInactive Date:\t(.*)\tInactive Time:.*\nNormal Rights\n\tAccess Codes\n(\t\t(.*)\n)+\n

REPLACE: \1\t\2\t\3\t\4\t\5\t\6\t\8

The goal is one output line per record, where each record's group of fields ends with \n\n.

Sample Input (the Access Codes could be from 1 to 4 lines spaced the same way):
Last Name:          Smith
First Name:         Ken
Middle Name:        
Card Number:        8522
Active Date:        08/31/2020          Inactive Date:                Inactive Time:      
Normal Rights
          Access Codes
                    JFF Office
                    Master 24/7

Last Name:          Smith
First Name:         Ken
Middle Name:        
Card Number:        8681
Active Date:        08/31/2020          Inactive Date:                Inactive Time:      
Normal Rights
          Access Codes
                    Master 24/7
                    JFF Office
                    Special Office

Last Name:          Smith
First Name:         Deb
Middle Name:        
Card Number:        12293
Active Date:        08/31/2020          Inactive Date:                Inactive Time:      
Normal Rights
          Access Codes
                    Master 24/7
                    Special Door

Last Name:          Smith
First Name:         Diane
Middle Name:        
Card Number:        12221
Active Date:        08/31/2020          Inactive Date:      09/01/2020          Inactive Time:      23:59
Normal Rights
          Access Codes
                    Master 24/7

Bruce Van Allen

Sep 26, 2020, 9:00:57 PM
to bbe...@googlegroups.com
On 9/26/20 at 5:16 PM, pkau...@gmail.com (Peter Kaufman) wrote:

>My input is below and the BBEdit Find is:
>FIND: ^Last Name:\t(.*)\n^First Name:\t(.*)\n^Middle Name:\t(.*)\n^Card

Preliminarily, are you sure that single tabs separate labels
from values?

Your sample didn't come through that way in my email client...

Use BBEdit's Show Invisibles, including spaces, to be sure what
white space your records have.
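For a quick check outside BBEdit, here is a minimal Python sketch. It does roughly what Show Invisibles does, showing tabs as → and spaces as ·. It also reports whether each line's first label is followed by exactly one tab. The function names and the sample string are hypothetical, for illustration only.

```python
import re

def show_invisibles(text: str) -> str:
    """Make tabs and spaces visible, roughly like BBEdit's Show Invisibles."""
    return text.replace("\t", "→").replace(" ", "·")

def single_tab_after_label(text: str) -> list[bool]:
    """For each line, True if its first 'Label:' is followed by exactly one tab."""
    # The lookahead (?!\s) rejects a second tab, or a space, right after the first tab.
    return [re.match(r"[^:\t]+:\t(?!\s)", line) is not None
            for line in text.splitlines()]

# Hypothetical sample: the first line uses a tab, and the second uses spaces
# (as the emailed copy of the sample did).
sample = "Last Name:\tSmith\nFirst Name:" + " " * 9 + "Ken\n"
print(show_invisibles(sample))
print(single_tab_after_label(sample))  # [True, False]
```

A `False` on any line means the FIND pattern's `\t` will not match there.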

After I converted your first two sample records to use single tabs
between labels and values, a single leading tab on the Access Codes
line, and two leading tabs on the lines under Access Codes, the
following pattern matched both records:

^Last Name:\t(.*)\n^First Name:\t(.*)\n^Middle Name:\t(.*)\n^Card Number:\t(.*)\n^Active Date:\t(.*)\tInactive Date:\t(.*)\tInactive Time:\t(.*)\nNormal Rights\n\tAccess Codes\n(\t\t(.*)\n)+\n

Email might be adding some spurious line endings to wrap that.

Some things make this pattern fragile, such as its reliance on
exact tab placement and field order, so I wouldn't recommend it
for much use. But it's fine if it's a one-timer, or if just getting
the darn thing to match is only your first step in refining it.
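If this becomes more than a one-timer, one sturdier alternative is to split the file into records on blank lines and read each line on its own. This is a different technique from the single grep pattern and was not proposed in the thread. Below is a minimal Python sketch. It assumes labels are followed by a tab and Access Code lines begin with two tabs. The names `FIELD`, `COLUMNS`, `record_to_line` and `convert` are hypothetical.

```python
import re

# "Label:<tab>value" pairs; one line may hold several (the Active Date line).
FIELD = re.compile(r"([A-Za-z ]+):\t([^\t]*)")
COLUMNS = ["Last Name", "First Name", "Middle Name", "Card Number",
           "Active Date", "Inactive Date"]

def record_to_line(block: str) -> str:
    """One record -> one tab-separated line: the six fields, then every code."""
    fields: dict[str, str] = {}
    codes: list[str] = []
    for line in block.splitlines():
        if line.startswith("\t\t"):      # a line under "Access Codes"
            codes.append(line.strip())
        else:
            fields.update(FIELD.findall(line))
    return "\t".join([fields.get(c, "") for c in COLUMNS] + codes)

def convert(text: str) -> str:
    """Split on blank lines (assumed record separator) and convert each record."""
    blocks = [b for b in re.split(r"\n[ \t]*\n", text) if b.strip()]
    return "".join(record_to_line(b) + "\n" for b in blocks)
```

This approach does not care how many Access Code lines a record has, so there is no fixed maximum. It also does not require a blank line after the last record.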

HTH

--

- Bruce

_bruce__van_allen__santa_cruz__ca_

Peter Kaufman

Sep 26, 2020, 9:44:46 PM
to bbe...@googlegroups.com
Bruce,

Thanks for looking into it. I actually converted the tabs to spaces to make it look right visually in the email. Sorry for any confusion. There are indeed tabs where the FIND string expects them. Perhaps I should have attached a text file.

I’m finding that only the LAST of the Access Codes is captured, though. None of those before the last one are.

Peter

Bruce Van Allen

Sep 26, 2020, 10:34:03 PM
to bbe...@googlegroups.com
On 9/26/20 at 6:30 PM, pkau...@gmail.com (Peter Kaufman) wrote:

>I’m finding that only the LAST of the Access Codes is saved
>though. None of those before the last one.

Even though your last capture is allowed to repeat because of
the (...)+, it still becomes only one \# capture variable, and
that variable holds just the text from the last repetition.
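Python's `re` module follows the same rule as BBEdit's grep engine here, so it can stand in to show the effect. This is a small sketch. The two code lines come from the first sample record, rebuilt with real tabs.

```python
import re

# Two Access Code lines from the first sample record, with real tabs.
codes = "\t\tJFF Office\n\t\tMaster 24/7\n"

# Same shape as the end of the original FIND: (\t\t(.*)\n)+
m = re.match(r"(\t\t(.*)\n)+", codes)

# The group matched twice, but each pass overwrote the previous capture.
print(m.group(2))  # Master 24/7
```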

The key thing is to write out the "\t\t(.*)\n" sub-pattern once per
possible Access Code and make the repeats optional. You also need
to avoid incidentally capturing the wrong stuff:

(?:\t\t(.*?)\n)?

This is the optional repeat that follows the first match under
Access Codes. What's the maximum number of Access Codes you'd
encounter? Make sure the pattern has that many copies of the
sub-pattern in total (the first one required, the rest optional).
Then include their \# capture variables in the replacement pattern.

The '?:' right after the opening parenthesis of that sub-pattern
makes the outer group non-capturing. Only the inner (.*?) gets
a \# variable.
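Here is a small Python illustration of the difference, since Python's `re` treats `(?:...)` the same way. With a plain group, the whole line becomes an extra capture and shifts the numbering. With `?:`, only the code text is captured. The last lines show what an optional group that finds nothing leaves behind.

```python
import re

line = "\t\tJFF Office\n"

capturing = re.match(r"(\t\t(.*?)\n)", line)        # two groups
non_capturing = re.match(r"(?:\t\t(.*?)\n)", line)  # one group

print(capturing.groups())      # ('\t\tJFF Office\n', 'JFF Office')
print(non_capturing.groups())  # ('JFF Office',)

# An optional (?:...)? that matches nothing leaves its inner group unset.
missing = re.match(r"(?:\t\t(.*?)\n)?END", "END")
print(missing.group(1))        # None
```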

Access Codes\n\t\t(.*?)\n(?:\t\t(.*?)\n)?(?:\t\t(.*?)\n)?(?:\t\t(.*?)\n)?\n

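Substituted for everything from "Access Codes" onward in the earlier full pattern, that matches your first two records. Then use this replacement, in which \8 through \11 hold the Access Codes: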

REPLACE: \1\t\2\t\3\t\4\t\5\t\6\t\8\t\9\t\10\t\11\n
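To try the combined pattern outside BBEdit, here is a minimal Python sketch. It rests on four assumptions:

- The sample records are rebuilt with tabs where the pattern expects them, including a tab after "Inactive Time:".
- `re.M` stands in for BBEdit's default of letting `^` match at every line start.
- It needs Python 3.5 or later, where unmatched groups such as `\10` and `\11` are replaced with empty strings.
- Like the pattern itself, it needs a blank line after every record, including the last.

`FIND`, `REPLACE`, `SAMPLE` and `flatten_records` are names chosen for this example.

```python
import re

# The combined pattern as a single line (email had wrapped it).
FIND = (
    r"^Last Name:\t(.*)\n^First Name:\t(.*)\n^Middle Name:\t(.*)\n"
    r"^Card Number:\t(.*)\n^Active Date:\t(.*)\tInactive Date:\t(.*)"
    r"\tInactive Time:\t(.*)\nNormal Rights\n\tAccess Codes\n"
    r"\t\t(.*?)\n(?:\t\t(.*?)\n)?(?:\t\t(.*?)\n)?(?:\t\t(.*?)\n)?\n"
)
# The REPLACE from the thread; \7 (Inactive Time) is not used.
REPLACE = r"\1\t\2\t\3\t\4\t\5\t\6\t\8\t\9\t\10\t\11\n"

# First two sample records, rebuilt with tabs (an assumption).
SAMPLE = (
    "Last Name:\tSmith\nFirst Name:\tKen\nMiddle Name:\t\nCard Number:\t8522\n"
    "Active Date:\t08/31/2020\tInactive Date:\t\tInactive Time:\t\n"
    "Normal Rights\n\tAccess Codes\n\t\tJFF Office\n\t\tMaster 24/7\n\n"
    "Last Name:\tSmith\nFirst Name:\tKen\nMiddle Name:\t\nCard Number:\t8681\n"
    "Active Date:\t08/31/2020\tInactive Date:\t\tInactive Time:\t\n"
    "Normal Rights\n\tAccess Codes\n\t\tMaster 24/7\n\t\tJFF Office\n"
    "\t\tSpecial Office\n\n"
)

def flatten_records(text: str) -> str:
    """One tab-separated line per record; re.M lets ^ match after each newline."""
    return re.sub(FIND, REPLACE, text, flags=re.M)

print(flatten_records(SAMPLE), end="")
```

Records with fewer than four codes end with empty trailing fields, because the unused `\10` and `\11` become empty strings.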

Again, this could be simplified and made sturdier for heavy use,
and I didn't test it on your remaining sample records.

Kerri Hicks

Sep 26, 2020, 10:37:31 PM
to bbe...@googlegroups.com
It's hard to be sure, not knowing exactly what your whitespace is supposed to be, but try this:

^Last Name:\t(.*)\n^First Name:\t(.*)\n^Middle Name:\t(.*)\n^Card Number:\t(.*)\n^Active Date:\t(.*)\tInactive Date:\t(.*)\tInactive Time:\t(.*)\t\nNormal Rights\n\tAccess Codes\n((?s)(.*?)\n)*

--Kerri

Kerri Hicks

Sep 26, 2020, 10:43:08 PM
to bbe...@googlegroups.com
Whoops, missed that last line break!

^Last Name:\t(.*)\n^First Name:\t(.*)\n^Middle Name:\t(.*)\n^Card Number:\t(.*)\n^Active Date:\t(.*)Inactive Date:\t(.*)Inactive Time:\t(.*)\nNormal Rights\n\tAccess Codes\n((?s)(.*?))\n\n
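This pattern captures all of the Access Code lines as a single block. That block still has its line breaks and leading tabs, so a second step is needed to put the codes on one line. Below is a Python sketch of that two-step idea. It rests on these assumptions:

- Python 3.11+ rejects a global `(?s)` in the middle of a pattern, so the scoped form `(?s:.*?)` is swapped in. This also drops the inner group `\9`, leaving the block in group 8.
- `re.M` stands in for BBEdit's line-start `^`.
- Each `(.*)` before "Inactive ..." also grabs the preceding tab, so the fields are trimmed of tabs afterward.
- The sample is the first record rebuilt with tabs.

`KERRI`, `SAMPLE` and `flatten_with_block` are names chosen for this example.

```python
import re

# The second pattern above, with (?s) rewritten as the scoped (?s:...) form.
KERRI = (
    r"^Last Name:\t(.*)\n^First Name:\t(.*)\n^Middle Name:\t(.*)\n"
    r"^Card Number:\t(.*)\n^Active Date:\t(.*)Inactive Date:\t(.*)"
    r"Inactive Time:\t(.*)\nNormal Rights\n\tAccess Codes\n((?s:.*?))\n\n"
)

def flatten_with_block(text: str) -> list[str]:
    rows = []
    for m in re.finditer(KERRI, text, flags=re.M):
        # Groups 1-6; (.*) before "Inactive ..." keeps a trailing tab, so trim it.
        fields = [g.strip("\t") for g in m.groups()[:6]]
        # Group 8 is the whole Access Codes block: one code per line.
        codes = [line.strip() for line in m.group(8).split("\n")]
        rows.append("\t".join(fields + codes))
    return rows

SAMPLE = (
    "Last Name:\tSmith\nFirst Name:\tKen\nMiddle Name:\t\nCard Number:\t8522\n"
    "Active Date:\t08/31/2020\tInactive Date:\t\tInactive Time:\t\n"
    "Normal Rights\n\tAccess Codes\n\t\tJFF Office\n\t\tMaster 24/7\n\n"
)
print(flatten_with_block(SAMPLE))
```

In BBEdit itself, the equivalent second step would be a follow-up replace on the captured block that turns each line break plus two tabs into a single tab.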

--Kerri

Peter Kaufman

Sep 27, 2020, 6:29:39 PM
to BBEdit Talk
Bruce,

You clearly caught what I was misunderstanding and didn't know about the ?: usage and capture variables.  Thank you!!! 

Kerri, I appreciate your input on this as well!  Thank you.

Peter