Including "Sr." or "Jr." in name where applicable


anotherhoward

Feb 29, 2020, 8:40:53 AM
to BBEdit Talk
In a previous post, "Extracting parts of names from full names," I asked about how to extract the first and last names from a string.

Here is an input sample:

Felix Jose\josefe01
Tony Clark\clarkto02
Matt Williams\willima04
John McDonald\mcdonjo03
Mark Grace\gracema01
Steve Finley\finlest01
B.J. Surhoff\surhob.01
J.T. Snow\snowj.01

When I re-examined the full dataset, I noticed two cases not included previously.
Eric Young Sr.\younger0
Ken Griffey Jr.\griffke02

Based on the helpful feedback I got in my previous post, what I need now is a pattern that handles not only all the original input items but also the two new cases. Further, what I would like extracted is everything up to but not including the backslash.

This attempt (of mine) finds everything in the original dataset, but I was unable to expand it to include either the "Sr." or "Jr.":
^([a-z,A-Z,\.]+) ([a-z,A-Z]+)

1. Is there a way my attempt can be expanded so it includes "Sr." or "Jr." when a name has either of them?
2. What pattern would you write to accomplish the task?

David G Wagner

Feb 29, 2020, 2:42:43 PM
to bbe...@googlegroups.com
You could add:
(\s(jr|sr).){0,1}
within the last set of parentheses, so it would look like:

^([a-z,A-Z,\.]+) ([a-z,A-Z]+(\s(jr|sr).){0,1})

This would also handle those who do not put a period after the Jr or Sr.


Wags ;)
WagsWorld
Hebrews 4:15
Ph(primary) : 408-914-1341
Ph(secondary): 408-761-7391
--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/8a68d4b5-6d57-4c4c-8cd0-1ed5c30f85d5%40googlegroups.com.

anotherhoward

Feb 29, 2020, 3:33:45 PM
to BBEdit Talk
Thanks. How does this part of the expression work: (\s(jr|sr).){0,1}


David G Wagner

Feb 29, 2020, 4:48:13 PM
to bbe...@googlegroups.com
It works but is not quite right. That period there should really be expanded to:
\.{0,1}

So it should read now as:
^([a-z,A-Z,\.]+) ([a-z,A-Z]+(\s(jr|sr)\.{0,1}){0,1})

Where my new part says:

A space followed by either Jr or Sr, with or without a period after it.

The {} takes a low and a high count:

{1,2} says what preceded the { can occur either one or two times

{1,} says what preceded the { can occur 1 or more times

There are probably a number of other ways, but this should now work with jr or sr, with or without a period following...
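
For anyone who wants to check the difference between the first pass and the corrected pattern outside BBEdit, here is a minimal sketch in Python. It assumes Python's re module behaves like BBEdit's grep for these particular patterns, and it uses re.IGNORECASE as a stand-in for a case-insensitive search setting. The line without a period is a made-up variant, not something from the dataset, and split_name is just a helper name for this example.

```python
import re

# Assumption: re.IGNORECASE stands in for a case-insensitive search setting.
FIRST_PASS = re.compile(r"^([a-z,A-Z,\.]+) ([a-z,A-Z]+(\s(jr|sr).){0,1})", re.IGNORECASE)
CORRECTED = re.compile(r"^([a-z,A-Z,\.]+) ([a-z,A-Z]+(\s(jr|sr)\.{0,1}){0,1})", re.IGNORECASE)

def split_name(pattern, line):
    """Return (first, last) from the two outer groups, or None if no match."""
    m = pattern.match(line)
    return (m.group(1), m.group(2)) if m else None

with_period = r"Ken Griffey Jr.\griffke02"
without_period = r"Ken Griffey Jr\griffke02"  # hypothetical variant, not in the dataset
no_suffix = r"Felix Jose\josefe01"

# The corrected pattern stops before the backslash in all three cases.
print(split_name(CORRECTED, with_period))
print(split_name(CORRECTED, without_period))
print(split_name(CORRECTED, no_suffix))

# The bare "." in the first pass matches any character, so when the period
# is missing it consumes the backslash instead.
print(split_name(FIRST_PASS, without_period))
```

The last call is the reason the period needs escaping: an unescaped . happily matches the backslash when there is no period to match.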

Sorry about the incorrect first pass... ;)

Wags ;)

153957

Apr 30, 2020, 8:26:35 AM
to BBEdit Talk
First a reply to previous suggestions:
Since {0,1} means 'this occurs 0 or 1 times', it can be replaced by ?, which means the same. So the following also works:
^([a-z,A-Z,\.]+) ([a-z,A-Z]+(\s(jr|sr)\.?)?)

Also, the commas in the [a-z,A-Z,\.] blocks are not necessary. If you do want to allow a comma you can leave it in, but once is enough: [a-zA-Z,\.]. Without the comma you can just use [a-zA-Z\.]. Moreover, I see that the suggestion for Jr./Sr. is lower case only, so I assume you do not have case sensitivity enabled; in that case [a-z\.] would also suffice for the first names, resulting in this:
^([a-z\.]+) ([a-z]+(\s(jr|sr)\.?)?)
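
A quick way to convince yourself of both points (that ? and {0,1} behave the same, and that the lower-case-only classes depend on case-insensitive matching) is a small Python sketch like the one below. It assumes Python's re module is close enough to BBEdit's grep for these patterns, with re.IGNORECASE standing in for a case-insensitive search; last_names is a helper name made up for the example.

```python
import re

names = [r"Felix Jose\josefe01", r"B.J. Surhoff\surhob.01", r"Eric Young Sr.\younger0"]

braces = r"^([a-z\.]+) ([a-z]+(\s(jr|sr)\.{0,1}){0,1})"
question = r"^([a-z\.]+) ([a-z]+(\s(jr|sr)\.?)?)"

def last_names(pattern, flags=0):
    """Collect group 2 for each sample line, or None where nothing matches."""
    out = []
    for line in names:
        m = re.match(pattern, line, flags)
        out.append(m.group(2) if m else None)
    return out

# {0,1} and ? give identical results on the samples.
same = last_names(braces, re.IGNORECASE) == last_names(question, re.IGNORECASE)

# With case-insensitive matching the lower-case classes are enough...
insensitive = last_names(question, re.IGNORECASE)
# ...but with case-sensitive matching the capital letters stop the match.
sensitive = last_names(question)

print(same, insensitive, sensitive)
```

So if case sensitivity is switched on, keep the A-Z ranges (and write Jr|Sr with capitals) instead of the shortened classes.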

Now for an alternative suggestion to the shown data:
In your data example the full name comes before a \, the first name is probably 'anything before the first space', and the last name is 'everything after that space until the \', so you can simply use:
(.+?) (.+?)\\

The first group will match everything until the first space, the second group everything after that until the first backslash, and both groups must contain at least one character.
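
As a rough illustration of how the two lazy groups split a line, here is a small Python sketch. It assumes Python's re module treats this pattern the same way BBEdit's grep does; parse is a hypothetical helper, and joining the two groups with a space is just one way to get back everything before the backslash.

```python
import re

# Two lazy groups: up to the first space, then up to the first backslash.
LAZY = re.compile(r"(.+?) (.+?)\\")

def parse(line):
    """Return (first, last, full name before the backslash), or None."""
    m = LAZY.match(line)
    if m is None:
        return None
    first, last = m.group(1), m.group(2)
    return first, last, f"{first} {last}"

print(parse(r"J.T. Snow\snowj.01"))
print(parse(r"Eric Young Sr.\younger0"))
print(parse("Felix Jose"))  # no backslash, so nothing matches
```

Note that the suffix ends up in the second group without any special handling, because the group simply runs to the backslash.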