Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

need help with regex

24 views
Skip to first unread message

Harry

unread,
May 30, 2018, 3:26:29 PM5/30/18
to
Hi All

I am trying to extract some info from an Oracle database field which contains text like this below.

Identifier1^Last^First^Middle^^^^^EX_NBR^PERSONNEL~
Identifier2^Last^First^Middle^^^^^COL_NBR^PERSONNEL~
Identifier3^Last^First^Middle^^^^^MSP_NBR^PERSONNEL~
Identifier4^Last^First^Middle^^^^^PHARMANETPHYS^PERSONNEL

All the 4 lines above are actually within 1 line of field.
I just break them into 4 lines for the sake of clarity.

There may be 3 or 4 sub-fields as above separated by the tilde character.

I want to extract
Identifier3^Last^First^Middle
from the sub-field
Identifier3^Last^First^Middle^^^^^MSP_NBR^PERSONNEL~

MSP_NBR sub-field may or may not be in the middle.
It could be on the fist or on the last sub-field.

Anyone could advice the regex expression (to be used by Oracle's regexp_replace() to extract the info please?

TIA

Icarus Sparry

unread,
May 30, 2018, 3:46:41 PM5/30/18
to
As I understand it, you want to change

something~something~data^^something

to
data

where the first two somethings don't have ~ characters in them. I don't
know Oracle's regexp_replace. In sed I would do it in 2 steps for ease of
thinking, first get the third field and then trim it down.

Get the third field.
s/^[^~]*~[^~]*~\([^~]*\)~.*$/\1

translation,
^ match start of line
[^~] a character that is not ~
* change the last thing to 0 or more occurances, i.e. a run of characters
that is not ~
~ match a ~, the first field seperator
[^~]*~ as above, i.e. the second field and it's seperator
\( start capturing the characters we want
[^~]* the third field
\) stop capturing
~ the seperator at the end of the third field
.* matches anything
$ matches end of line

\1 the text captured in the first set of braces

Then to remove everything after the ^^, just

s/^^.*//

It is possible to put this into one replacement, but it is harder.

Harry

unread,
May 30, 2018, 4:20:02 PM5/30/18
to
On Wednesday, May 30, 2018 at 12:46:41 PM UTC-7, Icarus Sparry wrote:

> Get the third field.
> s/^[^~]*~[^~]*~\([^~]*\)~.*$/\1

Yeah, but the MSP_NBR sub-field that I want is not always on the 3rd field.
And the field may usually contain 3 or 4 sub-fields, so there may not be
a tilde character after the 3rd sub-field.

Ed Morton

unread,
May 30, 2018, 4:21:24 PM5/30/18
to
idk what kind of regexp the tool you're using supports with here's an ERE
assuming you want a string that starts with Identifier3:

Identifier3(\^[^\^]*){3}

used in awks match() function to demonstrate:

$ awk 'match($0,/Identifier3(\^[^\^]*){3}/) {print
substr($0,RSTART,RLENGTH)}' file
Identifier3^Last^First^Middle

If that's not what you're looking for then please clarify your requirements.

Regards,

Ed.

Harry

unread,
May 30, 2018, 5:05:38 PM5/30/18
to
OK, if I put the text into a text file and use sed ....
this one works for me if MSP_NBR sub-field is in 1st or last sub-field.

sed 's/^\(.*~\)*\([^~]*\)^^^^^MSP_NBR^PERSONNEL.*$/\2/' xx.txt

On Oracle, this one works for me .....

regexp_replace(table.field, '^(.*~)*([^~]*)\^\^\^\^\^MSP_NBR\^PERSONNEL.*$', '\2')

Thanks

Thomas 'PointedEars' Lahn

unread,
May 31, 2018, 1:43:19 PM5/31/18
to
Icarus Sparry wrote:

> As I understand it, you want to change
>
> something~something~data^^something
>
> to
> data
>
> where the first two somethings don't have ~ characters in them. I don't
> know Oracle's regexp_replace. In sed I would do it in 2 steps for ease of
> thinking, first get the third field and then trim it down. […]

If that is the goal, it is of course much easier to do with awk(1) instead:

awk -F '~' '{ sub(/\^\^.*$/, "", $3); print $3 }'

Whenever you think “fields”, think of awk(1) and perl(1) first.

--
PointedEars
<https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
Twitter: @PointedEars2
Please do not cc me. /Bitte keine Kopien per E-Mail.

Grant Taylor

unread,
May 31, 2018, 1:50:59 PM5/31/18
to
On 05/31/2018 11:43 AM, Thomas 'PointedEars' Lahn wrote:
> Whenever you think “fields”, think of awk(1) and perl(1) first.

Even if "fields" are something other than "part of a line".

If there is something that separates the fields, chances are good that
awk and / or Perl can easily work with it.



--
Grant. . . .
unix || die
0 new messages