Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

CLR UDF to strip out high-low ASCII from a text string?

4 views
Skip to first unread message

M Bourgon

unread,
Oct 7, 2010, 5:23:38 PM10/7/10
to
Howdy.
(apologies if this is a dupe)

I'm trying to figure out how to write a CLR UDF that would take a
string of data, strip out any high/low ASCII (outside 32-126), and
return the cleaned string. I found code from last year that does it
with normal SQL, but a faster solution is needed. It looks like using
regexp might work, since I could do [^\x20-\x7E], but I don't know how
to tell it to return everything in the string except the unwanted
characters.

Any help greatly appreciated.

Erland Sommarskog

unread,
Oct 8, 2010, 6:06:41 PM10/8/10
to
M Bourgon (bou...@gmail.com) writes:
> I'm trying to figure out how to write a CLR UDF that would take a
> string of data, strip out any high/low ASCII (outside 32-126), and
> return the cleaned string. I found code from last year that does it
> with normal SQL, but a faster solution is needed. It looks like using
> regexp might work, since I could do [^\x20-\x7E], but I don't know how
> to tell it to return everything in the string except the unwanted
> characters.

That's probably a bad idea. I live on Rörstrandsgatan, not Rrstrandsgatan.
If you are arbitrarily mutiliating customer names and addresses that could
result in sour relations.

But to do it in the CLR, the best is probably to loop the string and
copy matching characters to the return value. The SQL query would include
a condition on whether the string have any such characters at all, so the
function would only be invoked if there is reason to copy.


--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

M Bourgon

unread,
Oct 8, 2010, 7:32:14 PM10/8/10
to
On Oct 8, 5:06 pm, Erland Sommarskog <esq...@sommarskog.se> wrote:
> That's probably a bad idea. I live on Rörstrandsgatan, not Rrstrandsgatan.
> If you are arbitrarily mutiliating customer names and addresses that could
> result in sour relations.

Totally agree. In this case we're parsing an EDI string which should
only have standard alphanumeric, but due to fat-fingered input may
not. I had actually taken that into account. :)

> But to do it in the CLR, the best is probably to loop the string and
> copy matching characters to the return value. The SQL query would include
> a condition on whether the string have any such characters at all, so the
> function would only be invoked if there is reason to copy.

Okay, that makes sense. I was hoping that I could do something like
sed s/[^\x20-\x7E//g in the CLR, which might be faster than looping
through the string.
Many thanks!

Erland Sommarskog

unread,
Oct 9, 2010, 4:58:42 AM10/9/10
to
M Bourgon (bou...@gmail.com) writes:
> Okay, that makes sense. I was hoping that I could do something like
> sed s/[^\x20-\x7E//g in the CLR, which might be faster than looping
> through the string.

Maybe you can, but this is an SQL Server newsgroup. You may get better
answers in a .Net forum. The key issue from a SQL Server perspective is
that you should only invoke the function if there is a need to, whence
the importance to use a WHERE clause. (Assuming that this is an UPDATE.)

Any implmentation must in the end loop over string and copy at least
the characters after the offending character. Obviously the lower level
you do it on, the faster it will be.

Gert-Jan Strik

unread,
Oct 9, 2010, 5:27:58 AM10/9/10
to

There are many way to do it with "normal SQL", and some method will be
faster than others. You can find several solutions at
http://groups.google.nl/group/microsoft.public.sqlserver.programming/browse_thread/thread/d0d6101048f57e21/f6cb0996ac63ad19?hl=nl&lnk=gst

I think using a CLR UDF would be a good idea too (if you don't mind
running CLR on your server). Maybe other can help you with that.
--
Gert-Jan

0 new messages