I'm trying to figure out how to write a CLR UDF that would take a
string of data, strip out any high/low ASCII (outside 32-126), and
return the cleaned string. I found code from last year that does it
with normal SQL, but a faster solution is needed. It looks like using
regexp might work, since I could do [^\x20-\x7E], but I don't know how
to tell it to return everything in the string except the unwanted
characters.
Any help greatly appreciated.
But to do it in the CLR, the best is probably to loop the string and
copy matching characters to the return value. The SQL query would include
a condition on whether the string have any such characters at all, so the
function would only be invoked if there is reason to copy.
--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Totally agree. In this case we're parsing an EDI string which should
only have standard alphanumeric, but due to fat-fingered input may
not. I had actually taken that into account. :)
> But to do it in the CLR, the best is probably to loop the string and
> copy matching characters to the return value. The SQL query would include
> a condition on whether the string have any such characters at all, so the
> function would only be invoked if there is reason to copy.
Okay, that makes sense. I was hoping that I could do something like
sed s/[^\x20-\x7E//g in the CLR, which might be faster than looping
through the string.
Many thanks!
Maybe you can, but this is an SQL Server newsgroup. You may get better
answers in a .Net forum. The key issue from a SQL Server perspective is
that you should only invoke the function if there is a need to, whence
the importance to use a WHERE clause. (Assuming that this is an UPDATE.)
Any implmentation must in the end loop over string and copy at least
the characters after the offending character. Obviously the lower level
you do it on, the faster it will be.
There are many way to do it with "normal SQL", and some method will be
faster than others. You can find several solutions at
http://groups.google.nl/group/microsoft.public.sqlserver.programming/browse_thread/thread/d0d6101048f57e21/f6cb0996ac63ad19?hl=nl&lnk=gst
I think using a CLR UDF would be a good idea too (if you don't mind
running CLR on your server). Maybe other can help you with that.
--
Gert-Jan