Extracting parts of names from full names


anotherhoward

Feb 27, 2020, 4:44:59 PM
to BBEdit Talk
I have a list of names in this format:

Luis Gonzalez\gonzalu01
Eddie Perez\perezed02
B.J. Surhoff\surhob.01
Bobby Bonilla\bonilbo01
Keith Lockhart\lockhke01

I want to extract the last names and separately extract what comes before each last name (which could be just the first name or two initials, as in "B.J.") so that I can later organize them this way:

First       Last
Luis        Gonzalez
Eddie       Perez
B.J.        Surhoff
Bobby       Bonilla
Keith       Lockhart

I do not need the data after the slash.

How can I use GREP (regex) to extract the last names and, separately, the first/middle values?

Kerri Hicks

Feb 27, 2020, 4:53:44 PM
to bbe...@googlegroups.com
How confident are you in your data source that you will always have names in the format of "string with no spaces" "space" "string with no spaces"?

Will you ever have names like:

Jamie Lee Curtis (space in the "first" names)
Onne van der Wal (space in the surname)

or other variants?

The expression will depend on your answer to that question.
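
One way to answer that question before choosing an expression is to flag every line that does not fit the strict "word, space, word, backslash, ID" shape. This is only a rough sketch in Python rather than BBEdit; the `STRICT` pattern and the `irregular_lines` helper are made up for illustration, and the ID on the "Jamie Lee Curtis" line is invented.

```python
import re

# Strict shape: one token without spaces, a single space, one token without
# spaces or backslashes, a backslash, then the ID. (Illustrative pattern.)
STRICT = re.compile(r"\S+ [^\s\\]+\\\S+")

def irregular_lines(lines):
    """Return the lines that do NOT fit the simple 'First Last\\id' shape."""
    return [line for line in lines if not STRICT.fullmatch(line)]

sample = [
    r"Luis Gonzalez\gonzalu01",
    r"B.J. Surhoff\surhob.01",
    r"Jamie Lee Curtis\example01",  # hypothetical ID, for illustration only
]

for line in irregular_lines(sample):
    print("needs a closer look:", line)
```

If nothing is flagged, a simple two-group pattern is probably enough for this particular list.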

--Kerri


Sam Hathaway

Feb 27, 2020, 4:57:11 PM
to 'anotherhoward' via BBEdit Talk

Do the “last names” in your dataset always consist of the final word before the backslash? If so, you can use:

Find: (.*) (\S+)\\.*
Replace: \1\t\2

But eventually you will need to deal with names that don’t fit this pattern and then you will be sad. For example, in the name Saúl Rodriguez Luna, the “last name” is “Rodriguez Luna”.
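
For anyone who wants to try the pattern outside BBEdit, here is a minimal sketch of the same Find/Replace pair using Python's `re` module. It assumes Python's regex flavor behaves the same as BBEdit's for this simple pattern; the names `FIND`, `REPLACE`, and `split_name` are made up for illustration, and the ID on the "Saúl Rodriguez Luna" line is invented.

```python
import re

# The Find pattern from above: the greedy (.*) backtracks to the last space
# before the backslash, so (\S+) captures only the final word of the name.
FIND = re.compile(r"(.*) (\S+)\\.*")
REPLACE = r"\1\t\2"  # group 1, a tab, group 2

def split_name(line: str) -> str:
    """Return 'first<TAB>last' for one 'Name\\id' line."""
    return FIND.sub(REPLACE, line)

print(split_name(r"B.J. Surhoff\surhob.01"))
# Hypothetical ID; shows the multi-word surname problem described above.
print(split_name(r"Saúl Rodriguez Luna\example01"))
```

The second line comes out with "Saúl Rodriguez" in the first column and only "Luna" in the second, which is exactly the failure case mentioned above.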

You might want to read this:
https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Hope this helps.
-sam

Darren Duncan

Feb 28, 2020, 9:15:35 AM
to bbe...@googlegroups.com
What is the business case for this separation? For all practical purposes
keeping the name as a single string is best. A better solution to your problem
may be changing anything that expects parts to expect a single combined name
instead, which would then work for names of any nationality. -- Darren Duncan


ThePorgie

Feb 28, 2020, 10:42:12 AM
to BBEdit Talk
He might be doing a variable data job where in one instance he needs only the first name and in another he needs the whole name. Just off the top of my head, Darren.

John Delacour

Feb 28, 2020, 11:01:01 AM
to bbe...@googlegroups.com


On 27 Feb 2020, at 21:43, 'anotherhoward' via BBEdit Talk <bbe...@googlegroups.com> wrote:

> I have a list of names in this format:
>
> B.J. Surhoff\surhob.01
> Bobby Bonilla\bonilbo01
>
> I want to extract the last names and separately extract what comes before each last name
> (which could be just the first name or two initials as in "B.J.") so that I can later organize them this way:
>
> B.J.        Surhoff
> Bobby       Bonilla
>
> I do not need the data after the slash.
>
> How can I use GREP (regex) to extract the last names and separately the first/middle values?

If you don’t need to script it,

search for: (.+?) ([^ ]+?)\\.+
Replace all with \1 \2
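
As a rough check of how this pattern splits a line, here is a small Python sketch. It assumes Python's `re` module treats the pattern the same way BBEdit does; the names `SEARCH` and `extract` are made up for illustration.

```python
import re

# The search pattern from above: a lazy first group, then the last
# space-free word before the backslash, then the discarded ID.
SEARCH = re.compile(r"(.+?) ([^ ]+?)\\.+")

def extract(line: str):
    """Return (first, last) for a 'Name\\id' line, or None if it does not match."""
    m = SEARCH.fullmatch(line)
    return (m.group(1), m.group(2)) if m else None

for line in [r"B.J. Surhoff\surhob.01", r"Bobby Bonilla\bonilbo01"]:
    parts = extract(line)
    if parts:
        first, last = parts
        print(f"{first} {last}")
```

Lines without a backslash return `None` here instead of being passed through unchanged, which makes odd entries easier to spot.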


JD

John Delacour

Feb 28, 2020, 11:08:34 AM
to bbe...@googlegroups.com


On 28 Feb 2020, at 16:00, I wrote:

> Replace all with \1 \2

…or Extract, of course!

anotherhoward

Feb 28, 2020, 3:30:35 PM
to BBEdit Talk
I would like to thank everyone for their thoughtful comments. Because of them, I realize that the two solutions that would work best for me are Sam's (with a space replacing the "\t" in the replacement pattern) and John's.