Removing ASCII with code 160 from varchar2 using Oracle RegExp

3,731 views
Skip to first unread message

ratul

unread,
Sep 1, 2008, 9:18:19 AM9/1/08
to Oracle PL/SQL
Hi,

I've found myself in a situation where a particular varchar2 field in
the database contains the extended ascii character of code 160 (which
seems to correspond to some type of space). However when our business
logic code in C++ tries to access it, it replaces that character with
non-printable gibberish which breaks our system. Updating our code to
handle extended ascii characters is not immediately feasible, nor is
the use of stored procedures. I'm therefore looking for a simple
convert or regexp_replace type built-in function that will allow me to
achieve this.

I've already tried something like convert(field, 'US7ASCII',
'WE8ISO8859P1') but that didn't work. So far, regexp_replace seems to
be the most promising method, however I am unable to define a method
to concatenate various character classes.

Consider the string: <Tontine@Balino Fund Ltd/Class D >
The character immediately before the closing bracket is meant to be
ascii 160.

Consider query:
select '<' || 'Tontine@Balino Fund Ltd/Class D ' || '>' orig_string,
'<' || regexp_replace('Tontine@Balino Overseas Fund Ltd/Class
D ', '([^[:alnum:]])', '_') || '>' clean_string
from dual

When running this, all the spaces and @ and / are lost, which isn't
what I want. I only want the penultimate character to be replaced by
an underscore. However I'm unable to find a synax that allows me to
define a concatenated set of permitted characters i.e. alnum and any
other special characters I intend to allow, and then negate that by
means of the initial carot.

Any help would be most appreciated.

Thanks in advance

Ratul

ratul

unread,
Sep 1, 2008, 11:54:23 AM9/1/08
to Oracle PL/SQL
Small clarification...rather than saying 'concatenate character
classes', I really mean 'union of various character classes'.

Lou

unread,
Sep 1, 2008, 1:06:54 PM9/1/08
to Oracle PL/SQL


On Sep 1, 8:54 am, ratul <hola.ra...@gmail.com> wrote:
> Small clarification...rather than saying 'concatenate character
> c


If the unprintable character is considered to be garbage, couldn't you
eliminate it:

Update MyTable
set original_string = rtrim( original_string, chr(160) )
where instr( original_string, chr(160) ) > 0 ;

ratul

unread,
Sep 2, 2008, 4:39:04 AM9/2/08
to Oracle PL/SQL
Thanks Lou,

However your query will only be removing that character from the
end...what if it appears in the middle? Also what of other characters
I wish to ignore? My problem is actually a little more general...I
want all non-printable standard ASCII characters (code > 127) to be
stripped out from a string.

Regards

Ratul

R venkat

unread,
Sep 2, 2008, 7:24:38 AM9/2/08
to Oracle...@googlegroups.com
hi,
 
Plz try this one..
 
replace(replace(your_field,chr(160),' '),' ','_')

ora...@msn.com

unread,
Sep 3, 2008, 11:45:37 AM9/3/08
to Oracle PL/SQL


On Sep 2, 6:24 am, "R venkat" <byven...@gmail.com> wrote:
> hi,
>
> Plz try this one..
>
> replace(replace(your_field,chr(160),' '),' ','_')
>
> On 9/2/08, ratul <hola.ra...@gmail.com> wrote:
>
>
>
>
>
> > Thanks Lou,
>
> > However your query will only be removing that character from the
> > end...what if it appears in the middle? Also what of other characters
> > I wish to ignore? My problem is actually a little more general...I
> > want all non-printable standard ASCII characters (code > 127) to be
> > stripped out from a string.
>
> > Regards
>
> > Ratul- Hide quoted text -
>
> - Show quoted text -


Why are you using REPLACE twice? It's unnecessary and inefficient:

SQL> create table ascii_test(
2 ascii_strng varchar2(40)
3 );

Table created.

SQL>
SQL> insert into ascii_test
2 values('asdfjkl'||chr(160)||'uopitrs');

1 row created.

SQL>
SQL> select ascii_strng
2 from ascii_test;

ASCII_STRNG
----------------------------------------
asdfjkláuopitrs

SQL>
SQL> select replace(ascii_strng, chr(160), '_')
2 from ascii_test;

REPLACE(ASCII_STRNG,CHR(160),'_')
----------------------------------------
asdfjkl_uopitrs

SQL>

One call to REPLACE is sufficient.



David Fitzjarrell

ora...@msn.com

unread,
Sep 3, 2008, 11:55:45 AM9/3/08
to Oracle PL/SQL
This should work, although there may be a more efficient way:

SQL> create table ascii_test(
2 ascii_strng varchar2(40)
3 );

Table created.

SQL>
SQL> insert into ascii_test
2 values('asdfjkl'||chr(160)||'uop'||chr(234)||'itrs');

1 row created.

SQL>
SQL> insert into ascii_test
2 values('asdfjkl'||chr(140)||'uopitrs'||chr(244));

1 row created.

SQL>
SQL> insert into ascii_test
2 values('asdfjkl'||chr(167)||'u'||chr(177)||'opitrs');

1 row created.

SQL>
SQL> insert into ascii_test
2 values('asdfjkl'||chr(128)||'uopitrs');

1 row created.

SQL>
SQL> insert into ascii_test
2 values('asdfjkl'||chr(133)||'uopit'||chr(218)||'rs');

1 row created.

SQL>
SQL> select ascii_strng
2 from ascii_test;

ASCII_STRNG
----------------------------------------
asdfjkláuopΩitrs
asdfjkl┐uopitrs⌠
asdfjklºu▒opitrs
asdfjkl┐uopitrs
asdfjkl┐uopit┌rs

SQL>
SQL> begin
2 for i in 128..255 loop
3 update ascii_test
4 set ascii_strng = replace(ascii_strng, chr(i),
'_')
5 where instr(ascii_strng, chr(i)) > 0;
6 end loop;
7 commit;
8 end;
9 /

PL/SQL procedure successfully completed.

SQL>
SQL> select ascii_strng
2 from ascii_test;

ASCII_STRNG
----------------------------------------
asdfjkl_uop_itrs
asdfjkl_uopitrs_
asdfjkl_u_opitrs
asdfjkl_uopitrs
asdfjkl_uopit_rs

SQL>

You might not want to replace all of the unprintable characters with
the same printable representation, but it does show how you can
replace multiple uprintable characters in a single string.


David Fitzjarrell

Ratul Bhadury

unread,
Sep 3, 2008, 12:35:36 PM9/3/08
to Oracle...@googlegroups.com
Thanks Daniel,

I appreciate the time and effort you put in to your post. Unforunately i won't be able to use a stored procedure, in order to iterate over the extended ascii range. I was hoping it would somehow be possible using regexp_replace instead.

Ratul

Michael Moore

unread,
Sep 4, 2008, 3:20:18 PM9/4/08
to Oracle...@googlegroups.com
SELECT TRANSLATE ('AaBbCc', TRANSLATE ('AaBbCc', 'ABC', ' '), ' ')
  FROM DUAL;

Where 'AaBbCc' is the string with garbage characters and 'ABC' is all the the characters you want to keep. In your case, all the ascii characters.
Mike
Reply all
Reply to author
Forward
0 new messages