Pulling a number out of an nvarchar field

Brad Simon

unread,

Dec 18, 2008, 12:45:01 PM12/18/08

to

Hi all,

I have an nvarchar field that has data like:
XYZ40AB
WX32A
27 blah

I need to extract the numbers out of this field as such:
XYZ40AB = 40
WX32A = 32
27 blah = 27

What would I do to get those? I am on SQL 2005, and I am using T-SQL,
strictly. It is being done in a trigger, so it has to be failsafe. If there
are no numbers it would return a null.

--
Thanks in advance,
Brad Simon

Richard Mueller [MVP]

unread,

Dec 18, 2008, 1:10:47 PM12/18/08

to

Brad Simon wrote:

>
> I have an nvarchar field that has data like:
> XYZ40AB
> WX32A
> 27 blah
>
> I need to extract the numbers out of this field as such:
> XYZ40AB = 40
> WX32A = 32
> 27 blah = 27
>
> What would I do to get those? I am on SQL 2005, and I am using T-SQL,
> strictly. It is being done in a trigger, so it has to be failsafe. If
> there
> are no numbers it would return a null.
>

What if there is more than one number in the string? For example, what if
the string is "XY40A3"?

--
Richard Mueller
MVP Directory Services
Hilltop Lab - http://www.rlmueller.net
--

Brad Simon

unread,

Dec 18, 2008, 1:26:17 PM12/18/08

to

If there is one, then I only take the first number and use it. example:
XY40A3 = 40

I expect I will have to either create a function or do some click code.
But, I am not extremely verse in T-SQL

--
Thanks in advance,
Brad Simon

Plamen Ratchev

unread,

Dec 18, 2008, 2:09:58 PM12/18/08

to

It may be best to use regular expressions via CLR:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
http://blogs.msdn.com/sqlclr/archive/2005/06/29/regex.aspx

In T-SQL you can do something like this (the multiple Nx CTEs generate a
table with numbers, you can create and use permanent:
http://www.projectdmx.com/tsql/tblnumbers.aspx):

CREATE TABLE Foo (
keycol INT PRIMARY KEY,
datacol NVARCHAR(30));

INSERT INTO Foo VALUES(1, 'XYZ40AB');
INSERT INTO Foo VALUES(2, 'WX32A');
INSERT INTO Foo VALUES(3, '27 blah');
INSERT INTO Foo VALUES(4, 'ABNC');
INSERT INTO Foo VALUES(5, 'XY40A3');
INSERT INTO Foo VALUES(6, 'XYA3');
INSERT INTO Foo VALUES(7, '7');
INSERT INTO Foo VALUES(8, NULL);

WITH N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM N4),
Data AS
(SELECT keycol, datacol,
(SELECT CASE WHEN SUBSTRING(datacol, n, 1) LIKE '[0-9]'
THEN SUBSTRING(datacol, n, 1)
ELSE '-'
END
FROM Nums
WHERE n <= LEN(datacol)
FOR XML PATH('')) + '-' AS numdata
FROM Foo)
SELECT keycol, datacol,
NULLIF(
SUBSTRING(numdata,
PATINDEX('%[0-9]%', numdata),
CHARINDEX('-', numdata,
PATINDEX('%[0-9]%', numdata) + 1) -
PATINDEX('%[0-9]%', numdata)), '') AS number
FROM Data;

/*

Results:

keycol datacol number
----------- ---------- --------
1 XYZ40AB 40
2 WX32A 32
3 27 blah 27
4 ABNC NULL
5 XY40A3 40
6 XYA3 3
7 7 7
8 NULL NULL

*/

--
Plamen Ratchev
http://www.SQLStudio.com

Brad Simon

unread,

Dec 18, 2008, 3:05:00 PM12/18/08

to

I will try this out on Monday. (I am not on site again until then)

Thanks!!!
--
Brad Simon

Hugo Kornelis

unread,

Dec 18, 2008, 7:15:48 PM12/18/08

to

Hi Brad,

Here's an alternative solution, that I find easier to understand than
the one Plamen posted (may be a matter of personal preference):

SELECT keycol, datacol,
CASE WHEN datacol LIKE '%[0-9]%[^0-9]%'
THEN LEFT(STUFF(datacol, 1, PATINDEX ('%[0-9]%', datacol) -
1, ''),
PATINDEX('%[^0-9]%', STUFF(datacol, 1, PATINDEX
('%[0-9]%', datacol) - 1, '')) - 1)
WHEN datacol LIKE '%[0-9]%'
THEN STUFF(datacol, 1, PATINDEX ('%[0-9]%', datacol) - 1,
'')
END AS numdata
FROM Foo;

The CASE handles three cases: the first is where datacol contains at
least one numeric followed by at least one non-numeric. It uses STUFF to
strip the characters before the first numeric, then uses LEFT on the
result of that STUFF function to strip everything from the first
non-numeric.

Since the call to LEFT would result in invalid parameters is there are
no non-numeric characters after the numeric ones, the next case is a
simplified version of the first, only stripping the characters before
the first numeric for strings that have no non-numeric characters after
the numerics.

And the third possibility of the CASE, when the string has no numeric
characters at all, simply returns NULL.

And below is a slightly simplified version of the above one. This one
works around the invalid paramter for the LEFT function by simply
concatenating an X to the result of the STUFF so that we're now sure
that there always will be some non-numeric after the numeric part. I
consider this a form of cheating, so I like the previous version better.
But your mileage may vary (and there's no denying that this one is
shorter).

SELECT keycol, datacol,
CASE WHEN datacol LIKE '%[0-9]%'
THEN LEFT(STUFF(datacol, 1, PATINDEX ('%[0-9]%', datacol) -
1, ''),
PATINDEX('%[^0-9]%', STUFF(datacol, 1, PATINDEX
('%[0-9]%', datacol) - 1, '') + 'X') - 1)
END AS numdata
FROM Foo;

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

Plamen Ratchev

unread,

Dec 18, 2008, 9:47:15 PM12/18/08

to

Hugo is correct, his solution is easier to understand (and for sure
performing better). I was thinking about a different problem. The table
with numbers is not needed here, and no need for the complexity to shred
the string to individual characters and then concatenate back.

Since the last part of the query does most of the work to extract the
first numeric sequence occurrence, that query with small modifications
can be used to accomplish the same:

SELECT keycol, datacol,
NULLIF(
SUBSTRING(datacol,
PATINDEX('%[0-9]%', datacol),

PATINDEX('%[^0-9]%',
STUFF(datacol, 1,

PATINDEX('%[0-9]%', datacol), '') + '-')), '') AS number
FROM Foo;

This ends up similar to the second solution posted by Hugo.

Uri Dimant

unread,

Dec 19, 2008, 11:21:25 AM12/19/08

to

Brad
I learned this from Steve Kass, I modified a little bit Plamen's query
WITH cte

AS

(

SELECT m1.datacol,

(SELECT CASE WHEN SUBSTRING(datacol, n, 1) LIKE '%[0-9]%'

THEN PATINDEX('%'+SUBSTRING(datacol, n, 1)+'%',datacol)

END

FROM numbers

WHERE SUBSTRING(datacol,n,1) LIKE '%[0-9]%'

FOR XML PATH('')) AS numdata

FROM FOO m1

) SELECT datacol,LEFT(substring(datacol,numdata/1000000000%10,1) +

substring(datacol,numdata/100000000%10,1)+

substring(datacol,numdata/10000000%10,1) +

substring(datacol,numdata/1000000%10,1) +

substring(datacol,numdata/100000%10,1) +

substring(datacol,numdata/10000%10,1) +

substring(datacol,numdata/1000%10,1) +

substring(datacol,numdata/100%10,1)+

substring(datacol,numdata/10%10,1)+

substring(datacol,numdata%10,1),2) AS col

FROM cte

"Brad Simon" <bsi...@simondeveloping.com> wrote in message
news:9B8FC6F7-ABC8-4AFD...@microsoft.com...

Hugo Kornelis

unread,

Dec 19, 2008, 6:01:49 PM12/19/08

to

Hi Plamen,

Nice one. I think I like this one even better than my suggestion.

steve dassin

unread,

Dec 21, 2008, 12:18:26 AM12/21/08

to

This is why sql programmers eat by themselves in the IT cafeteria. It's like
watching someone rub sticks together to start a fire. It's going on 2009 in
case sql'ers haven't noticed:) This should be easy do, understand and
maintain. Cobbling together string functions into sql spaghetti is none of
these. And suppose the OP has to get the 2nd, 3rd or the max number? As for
liking one of the sql queries over another it's like only a mother could
love an ugly misbehaving little brat:)
Here's an easy and contemporary way to do this:
http://beyondsql.blogspot.com/2008/12/extracting-numbers-from-string.html

Plamen Ratchev

unread,

Dec 21, 2008, 5:30:28 AM12/21/08

to

The easy and contemporary way ignores the fact that this is NVARCHAR
column and as such can store a lot more characters that those listed in
AlphaTable (http://en.wikipedia.org/wiki/Unicode). :)

IMHO, the easy and contemporary way is to use CLR and regular
expressions as I first posted, rendering the code to something like this:

SELECT dbo.RegexMatch(column, '/\d+/') FROM Table;

--CELKO--

unread,

Dec 23, 2008, 6:14:01 PM12/23/08

to

>> The easy and contemporary way ignores the fact that this is NVARCHAR column and as such can store a lot more characters that those listed in AlphaTable (http://en.wikipedia.org/wiki/Unicode). :) <<

But can I cast NVARCHAR(n) to VARCHAR(n), uppercase it and then use
nested REPLACE() functions to remove the alphas? Of course you have
trouble if the string ins in Chinese ...

Tony Rogerson

unread,

Dec 26, 2008, 3:43:39 AM12/26/08

to

"--CELKO--" <jcel...@earthlink.net> wrote in message
news:c2a5fd00-cf2a-4da8...@w24g2000prd.googlegroups.com...

Still pimping utterly useless solutions that aren't maintainable in the real
world.

Nobody with an ounce of experience, with an ounce of understanding of the
concepts of maintainable code would come up with a solution so insane and
backward.

--ROGERSON--

Brad Simon

unread,

Dec 30, 2008, 10:49:00 AM12/30/08

to

This particular code worked excellent for me. All I needed to do was throw a
"LEFT( <code>, 2)" on it and it worked exactly as I needed.

Thank you everyone for your awesome responses!

--
Brad Simon