find and replace syntax

Yvonne Rosehart

May 1, 2012, 4:22:34 PM

Hi,
I need to replace a range of characters that might appear in someone's
name - such as '(-),: and spaces - ideally I want to remove them and
then concatenate the name so that YVON-NE would appear as YVONNE. Can
anyone provide me with some syntax to do this?

Bruce Weaver

May 1, 2012, 5:11:13 PM

Look up the REPLACE function in the fine manual. For the specific
example you give:

COMPUTE Name = REPLACE(Name, "-", "").

It appears you need a series of such commands. You could stick them in
a LOOP. Something like this:

data list list / Name (a15).
begin data
"YVON-NE"
"YVON'NE"
"YVON(-)NE"
"YVON,NE"
"YVON:NE"
"Y VON NE"
end data.

STRING Name2 Key(a10).
COMPUTE Name2 = Name.
COMPUTE Key = "'(-),: ".
LOOP # = 1 to LENGTH(Key).
- COMPUTE Name2 = REPLACE(Name2,substr(Key,#,1),"").
END LOOP.
EXECUTE.
DELETE VARIABLES Key.
LIST.

OUTPUT:

Name            Name2

YVON-NE         YVONNE
YVON'NE         YVONNE
YVON(-)NE       YVONNE
YVON,NE         YVONNE
YVON:NE         YVONNE
Y VON NE        YVONNE

Number of cases read: 6 Number of cases listed: 6

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Art Kendall

May 1, 2012, 5:14:53 PM

Copy the syntax below into a new instance of SPSS. Run it.
I used "list" twice to show how it works.
Is this what you want?

If you know some pattern occurs in your data you might add more example
cases between begin data & end data.

data list list/name_in(a10).
begin data
Jo-Ann
D'Ann
"jo ann"
" jo ann"
JoAnn
Jo+Ann
Jo2Ann
Jo?An??
Jo/ann
end data.
string name_new(a10).
compute name_new = upcase(name_in).
loop #i = 1 to 10.
* Test the upcased copy, not name_in, so lowercase letters are kept.
if not range(substr(name_new,#i,1),'A','Z') substr(name_new,#i,1) ="".
end loop.
list.
compute name_new = replace(name_new," ","").
list.

Art Kendall
Social Research Consultants

David Marso

May 2, 2012, 2:30:16 AM

to A...@drkendall.org

My version is slightly more complex than Art's but illustrates an additional feature of the RANGE function (eliminating UPCASE and REPLACE in this case), and conditional LOOP termination.
---

PRESERVE.
SET MXLOOPS=100 /*Whatever the length of the STRING if > 40 */.
STRING name_new(a10).
STRING #S (A1).
COMPUTE name_new = name_in.
COMPUTE #i=1.
LOOP.
+ COMPUTE #S=SUBSTR(name_new,#i,1).
+ DO IF NOT RANGE(#S,'A','Z','a','z').
+ COMPUTE SUBSTR(name_new,#i) =SUBSTR(name_new,#i+1).
+ IF (#i GT 1) #i=#i-1.
+ ELSE.
+ COMPUTE #i=#i+1.
+ END IF.
END LOOP IF #i GT LENGTH(RTRIM(name_new)).
RESTORE.
LIST.

*/UNTESTED 'modern UNICODE version' using CHAR family /*.
*?????????????????????????????????????????????????????/*
*JoNoH can you verify???.
LOOP.
+ COMPUTE #S=CHAR.SUBSTR(name_new,#i,1).
+ DO IF NOT RANGE(#S,'A','Z','a','z').
+ COMPUTE CHAR.SUBSTR(name_new,#i) =CHAR.SUBSTR(name_new,#i+CHAR.LENGTH(#s)).
+ IF (#i GT 1) #i=#i-CHAR.LENGTH(#s).
+ ELSE.
+ COMPUTE #i=#i+CHAR.LENGTH(#s).
+ END IF.
END LOOP IF #i GT CHAR.LENGTH(RTRIM(name_new)).
RESTORE.
LIST.

Art Kendall

May 2, 2012, 8:46:14 AM

very elegant!

Art Kendall
Social Research Consultants

Jon Peck

May 2, 2012, 10:35:28 AM

to A...@drkendall.org

A few comments:
MXLOOPS does not apply if the loop is indexed, so one could use an indexed loop with break instead and eliminate the PRESERVE/RESTORE.

CHAR.LENGTH returns the length of the string in characters (not bytes as with the old LENGTH function) after trimming trailing blanks, so RTRIM is unnecessary.
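The character/byte distinction can be illustrated outside SPSS. A minimal Python sketch, assuming UTF-8 encoding (as used in SPSS Unicode mode); the sample name is made up for illustration and is not from the original data:

```python
# Illustration only: character count vs. byte count for a UTF-8 string.
name = "FRAN\u00c7OIS"  # hypothetical sample value containing C-cedilla

n_chars = len(name)                  # number of characters
n_bytes = len(name.encode("utf-8"))  # number of bytes; the cedilla letter needs 2

print(n_chars, n_bytes)
```

A byte-based position therefore drifts away from the character position as soon as the string contains a non-ASCII letter.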

Most important, the range test

NOT RANGE(#S,'A','Z','a','z')

excludes any accented characters and characters such as Japanese, Hebrew, Russian, etc. If that matters here, the best approach would be to use Python programmability to access the Unicode character classification, which provides a general definition of a letter, or to use a Python regular expression. Such functionality can be accessed using the SPSSINC TRANS extension command, which minimizes the amount of Python code required.
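As a rough sketch of the classification idea, the plain Python function below keeps only characters whose Unicode general category is a letter. The name keep_letters is made up for this example, and how the function would be called from SPSS (for instance through SPSSINC TRANS) is not shown here.

```python
import unicodedata


def keep_letters(text):
    """Return text with every character that is not a Unicode letter removed."""
    # General categories beginning with "L" (Lu, Ll, Lt, Lm, Lo) are letters,
    # so accented, Hebrew, Cyrillic, Japanese, etc. letters are all kept.
    return "".join(ch for ch in text if unicodedata.category(ch).startswith("L"))


print(keep_letters("YVON(-)NE"))
```

Unlike the RANGE test on 'A'..'Z' and 'a'..'z', this needs no list of code ranges, at the cost of depending on Python rather than native syntax.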

David Marso

May 2, 2012, 2:37:07 PM

to A...@drkendall.org

> A few comments:
> MXLOOPS does not apply if the loop is indexed, so one could use an indexed loop with break instead and eliminate the PRESERVE/RESTORE.

Of course, but I believe it is poor programming practice to alter the value of a loop index variable within the loop, so I used the non-indexed version, adding SET MXLOOPS as an afterthought and PRESERVE/RESTORE as a second afterthought ;-).

+ IF (#i GT 1) #i=#i-1.
+ ELSE.
+ COMPUTE #i=#i+1.
>

> CHAR.LENGTH returns the length of the string in characters (not bytes as with the old LENGTH function) after trimming trailing blanks, so RTRIM is unnecessary.
>
> Most important, the range test
> NOT RANGE(#S,'A','Z','a','z')

> excludes any accented characters and characters such as Japanese, Hebrew, Russian, etc.
I suppose one might do a little more work and figure out the RANGE of characters desired and work it out as follows (simply add additional ranges).
*-----Even more elegant solution ;-)-----.

PRESERVE.
SET MXLOOPS=100 /*Whatever the length of the STRING if > 40 */.
STRING name_new(a10).

COMPUTE name_new = name_in.
COMPUTE #i=1.
LOOP.

+ COMPUTE #found= NOT(RANGE(NUMBER(SUBSTR(name_new,#i,1),PIB1) ,97,122,65,90)).
+ IF #found SUBSTR(name_new,#i)=SUBSTR(name_new,#i+1).
+ COMPUTE #i=SUM(#i,-1*(#found AND #i GT 1),NOT(#found)).

END LOOP IF #i GT LENGTH(RTRIM(name_new)).
RESTORE.
LIST.

Jon Peck

May 2, 2012, 2:59:38 PM

to A...@drkendall.org

One certainly wouldn't try to modify the loop index inside but rather compute new offsets. As for the range, if it is simple, one could include more conditions, but with over 100,000 characters possible, a general solution would require a different approach.
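One possible reading of "compute new offsets", sketched in Python rather than SPSS syntax: the read index runs over a fixed range and is never altered, while a separate write offset records where the next kept character goes. The function name compact_letters is invented for this example, and str.isalpha is used here as a stand-in for whatever letter test is wanted.

```python
def compact_letters(text):
    """Remove non-letters using a fixed read index and a separate write offset."""
    chars = list(text)
    write = 0                        # offset of the next kept character
    for read in range(len(chars)):   # the loop index itself is never modified
        if chars[read].isalpha():
            chars[write] = chars[read]
            write += 1
    return "".join(chars[:write])


print(compact_letters(" jo ann"))
```

Because the loop bound is fixed in advance, no MXLOOPS-style safety limit is needed in this form.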

Art Kendall

May 3, 2012, 6:50:54 AM

If there are only a few such instances, and this is a one-time
application, and you know how to enter those characters into syntax,
you could write a few REPLACE statements.
I have no idea where those characters are in the collating sequence.
Since not every letter has diacriticals, I doubt that there is a simple
mapping.
So, otherwise, you could follow up on Jon Peck's suggestion and use a
Python approach.
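A minimal sketch of what such a Python approach could look like, assuming the names are available as Unicode strings. The function name strip_accents is made up for this example, and wiring it into SPSS is left out.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining diacritical marks removed."""
    # NFD decomposition splits an accented letter into its base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (general category "Mn"), keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# A-grave, A-acute, A-circumflex, A-diaeresis, C-cedilla
print(strip_accents("\u00c0\u00c1\u00c2\u00c4\u00c7"))
```

This only handles letters that Unicode decomposes into a base letter plus marks; letters such as Ø or ß have no such decomposition and would still need explicit REPLACE-style mappings.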

Art Kendall
Social Research Consultants

On 5/2/2012 10:53 AM, Yvonne Rosehart wrote:
> Hi Art,
>
> Thanks so much - that worked! I have one more question:
>
> I also want to remove accents, circumflexes, etc. from letters - do
> you know how to do this?
>
> e.g., change:
>
> "À" to "A"
>
> "Á" to "A"
>
> "Â" to "A"
>
> "Ä" to "A"
>
> "Ç" to "C"
>
> Thanks,
>
> Yvonne