Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Regular expressions: splitting with REGEXP_SUBSTR and "null"

2,391 views
Skip to first unread message

citte

unread,
Feb 28, 2008, 11:02:53 AM2/28/08
to
Hi,
I have a problem, I found on internet a way to split strings separated
by pipe | (for example) with regexp, something like that:

SELECT REGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
FROM dual;

which gives the result:
Three

the problem comes when I have one of the words separated by the pipe
is actually a Null, so the string becomes
'One|Two||Four|'

the query

SELECT REGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
FROM dual;

result in:
Four

I needed a Null (or empty string).
How is this possible?

Thank you in advance!

Mark D Powell

unread,
Feb 29, 2008, 12:31:27 PM2/29/08
to

Will this help?

SQL> l
1 select regexp_substr(replace('one|two||four','||','|NULL|'),
2* '[^|]+',1,3) from dual
SQL> /

REGE
----
NULL

Also to return an actual NULL value
SQL> select regexp_substr(replace('one|two||four','||',''),
2 '[^|]+',1,3) from dual;

R
-


HTH -- Mark D Powell --

Doug Miller

unread,
Feb 29, 2008, 1:11:36 PM2/29/08
to
In article <248f35ac-1e8c-424a...@e6g2000prf.googlegroups.com>, Mark D Powell <Mark....@eds.com> wrote:

Doesn't work in a more general case, e.g. attempting to select the fourth
string from 'one|two|||five' -- the desired result is NULL, but this method
returns 'five'. Conversely, trying to retrieve the fifth string should return
'five' but instead returns NULL.


>
>Also to return an actual NULL value
>SQL> select regexp_substr(replace('one|two||four','||',''),
> 2 '[^|]+',1,3) from dual;
>
>R
>-

Also doesn't work for more general cases:
- specifying the second substring should return 'two' but instead returns
'twofour'
- specifying the fourth substring should return 'four' but instead returns a
null string
- specifying the third substring in 'one|two||four|five' should return a null
string, but instead returns 'five'

In general, when seeking the i-th substring, if the omitted substring is in
the n-th position, this method is guaranteed to produce correct results only
when 0 < i < (n - 1), or when i = n AND there are at most n non-null
substrings. If there are multiple omitted substrings, correct results may also
be obtained for other values of i, but only by coincidence.

Peter Nilsson

unread,
Mar 1, 2008, 12:37:03 AM3/1/08
to

Here are some (not necessarily pretty) ways, assuming n > 0...

select
x,
n,
substr(
'|' || x || '|',
instr('|' || x || '|', '|', 1, n) + 1,
instr('|' || x || '|', '|', 1, n + 1)
- instr('|' || x || '|', '|', 1, n) - 1),
regexp_replace(
'|' || x || '|',
'^([^\|]*\|){' || n || ',' || n || '}([^\|]*)|.*',
'\2' ),
regexp_substr(
regexp_substr(x || '|', '[^\|]*\|', 1, n),
'[^\|]*'),
regexp_substr(
'|' || x,
'[^\|]*',
instr('|' || x, '|', 1, n) + 1)
from
(select 'one|two||four|five' x from dual),
(select level n from dual connect by level <= 7)

--
Peter

Mark D Powell

unread,
Mar 1, 2008, 9:21:35 AM3/1/08
to
On Feb 29, 1:11 pm, spamb...@milmac.com (Doug Miller) wrote:
> be obtained for other values of i, but only by coincidence.- Hide quoted text -
>
> - Show quoted text -

Doug, valid point. I was not thinking about general cases only the
posted one and one should always be aware of what will break what you
wrote.

citte

unread,
Apr 22, 2008, 6:31:52 AM4/22/08
to
On 29 Feb, 20:11, spamb...@milmac.com (Doug Miller) wrote:

> In article <248f35ac-1e8c-424a-80cd-1da3edc66...@e6g2000prf.googlegroups.com>, Mark D Powell <Mark.Pow...@eds.com> wrote:
>
>
>
> >On Feb 28, 11:02=A0am, citte <nicola.ame...@gmail.com> wrote:
> >> Hi,
> >> I have a problem, I found on internet a way to split strings separated
> >> by pipe | (for example) with regexp, something like that:
>
> >> SELECTREGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)

> >> FROM dual;
>
> >> which gives the result:
> >> Three
>
> >> the problem comes when I have one of the words separated by the pipe
> >> is actually aNull, so the string becomes

> >> 'One|Two||Four|'
>
> >> the query
>
> >> SELECTREGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)

> >> FROM dual;
>
> >> result in:
> >> Four
>
> >> I needed aNull(or empty string).

> >> How is this possible?
>
> >> Thank you in advance!
>
> >Will this help?
>
> >SQL> l
> > 1 selectregexp_substr(replace('one|two||four','||','|NULL|'),

> > 2* '[^|]+',1,3) from dual
> >SQL> /
>
> >REGE
> >----
> >NULL
>
> Doesn't work in a more general case, e.g. attempting to select the fourth
> string from 'one|two|||five' -- the desired result isNULL, but this method

> returns 'five'. Conversely, trying to retrieve the fifth string should return
> 'five' but instead returnsNULL.
>
>
>
> >Also to return an actualNULLvalue
> >SQL> selectregexp_substr(replace('one|two||four','||',''),

> > 2 '[^|]+',1,3) from dual;
>
> >R
> >-
>
> Also doesn't work for more general cases:
> - specifying the second substring should return 'two' but instead returns
> 'twofour'
> - specifying the fourth substring should return 'four' but instead returns anullstring
> - specifying the third substring in 'one|two||four|five' should return anull
> string, but instead returns 'five'
>
> In general, when seeking the i-th substring, if the omitted substring is in
> the n-th position, this method is guaranteed to produce correct results only
> when 0 < i < (n - 1), or when i = n AND there are at most n non-null
> substrings. If there are multiple omitted substrings, correct results may also
> be obtained for other values of i, but only by coincidence.

yes, you're right. maybe something more compact than that proposed by
Peter could be:

select
replace(
regexp_substr(replace('one|two||four','|','|\'),'[^|]+',1,4)
, '\', ''
) REG_RESULT
from dual;

where '\' should be a character that NEVER appears in "field" strings
(another one could be chosen)
it's an inelegant solution (or really ugly?), you should be really
sure it's never present

I wanted to learn regexp, but I think replace could be replaced
(sorry :) with a regular expression...

thank you!

citte

unread,
Apr 22, 2008, 6:32:14 AM4/22/08
to
On 29 Feb, 20:11, spamb...@milmac.com (Doug Miller) wrote:
> In article <248f35ac-1e8c-424a-80cd-1da3edc66...@e6g2000prf.googlegroups.com>, Mark D Powell <Mark.Pow...@eds.com> wrote:
>
>
>
> >On Feb 28, 11:02=A0am, citte <nicola.ame...@gmail.com> wrote:
> >> Hi,
> >> I have a problem, I found on internet a way to split strings separated
> >> by pipe | (for example) with regexp, something like that:
>
> >> SELECTREGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)

> >> FROM dual;
>
> >> which gives the result:
> >> Three
>
> >> the problem comes when I have one of the words separated by the pipe
> >> is actually aNull, so the string becomes

> >> 'One|Two||Four|'
>
> >> the query
>
> >> SELECTREGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)

> >> FROM dual;
>
> >> result in:
> >> Four
>
> >> I needed aNull(or empty string).

> >> How is this possible?
>
> >> Thank you in advance!
>
> >Will this help?
>
> >SQL> l
> > 1 selectregexp_substr(replace('one|two||four','||','|NULL|'),

> > 2* '[^|]+',1,3) from dual
> >SQL> /
>
> >REGE
> >----
> >NULL
>
> Doesn't work in a more general case, e.g. attempting to select the fourth
> string from 'one|two|||five' -- the desired result isNULL, but this method

> returns 'five'. Conversely, trying to retrieve the fifth string should return
> 'five' but instead returnsNULL.
>
>
>
> >Also to return an actualNULLvalue
> >SQL> selectregexp_substr(replace('one|two||four','||',''),

> > 2 '[^|]+',1,3) from dual;
>
> >R
> >-
>
> Also doesn't work for more general cases:
> - specifying the second substring should return 'two' but instead returns
> 'twofour'
> - specifying the fourth substring should return 'four' but instead returns anullstring
> - specifying the third substring in 'one|two||four|five' should return anull
> string, but instead returns 'five'
>
> In general, when seeking the i-th substring, if the omitted substring is in
> the n-th position, this method is guaranteed to produce correct results only
> when 0 < i < (n - 1), or when i = n AND there are at most n non-null
> substrings. If there are multiple omitted substrings, correct results may also
> be obtained for other values of i, but only by coincidence.

yes, you're right. maybe something more compact than that proposed by

0 new messages