Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Deleting characters from multiple string variables

160 views
Skip to first unread message

Erin

unread,
Oct 17, 2016, 5:28:39 PM10/17/16
to
Hello,
I have 360 string variables that all look like this:

Variable name: DoR_Tm1_Visit
Responses
Days_OoR22
Days_OoR4
Days_OoR3
Days_OoR2
Days_OoR18
Days_OoR2

(responses go from 1 to 30 and have missing data)

I want to remove the text and just have the number
DoR_Tm1_Visit
22
4
3
2
18
2

I have used this syntax:

string DoR_Tm1_Visit _Num(a20).
COMPUTE DoR_Tm1_Visit_Num = NUMBER(SUBSTR(DoR_Tm1_Visit,9),f2).
EXECUTE .

Which works, but it is clunky and coming up with an error. Also it is taking half a day to run for all the variables and my huge data set


Could do with some help on this one!
thanks

Bruce Weaver

unread,
Oct 17, 2016, 6:45:25 PM10/17/16
to
It was not easy to see in your original post, but notice in the quoted lines above that there is a blank space between Visit and _Num in your STRING command. That is what is causing the error message you are seeing, I suspect.

But if you remove the blank space, you'll just get another error, because your COMPUTE command will then be attempting to assign a numeric value to a string variable. So just get rid of the STRING command, and everything should work fine.

It will work fine for that particular variable, at least. But you said you have 360 variables. Is the to-be-removed letter string the same in all of them? Or can it vary, depending on the variable? If it can vary, you want some kind of looping solution. E.g., here is a simple DO-REPEAT solution for 3 variables where the to-be-removed string is constant within each variable, but varies across variables.

DATA LIST FIXED / V1 1-10 (A) V2 11-17 (A) V3 18-27 (A) .
BEGIN DATA
Days_OoR22 Junk22 Rubbish22
Days_OoR4 Junk4 Rubbish4
Days_OoR3 Junk3 Rubbish3
Days_OoR2 Junk2 Rubbish2
Days_OoR18 Junk18 Rubbish18
Days_OoR2 Junk2 Rubbish2
END DATA.

DO REPEAT V = V1 to V3 /
Junk = "Days_OoR" "Junk" "Rubbish".
- COMPUTE V = REPLACE(V,Junk,"").
END REPEAT.
* Now change the variable type to numeric.
ALTER TYPE V1 to V3 (F2.0).
LIST.

OUTPUT:

V1 V2 V3

22 22 22
4 4 4
3 3 3
2 2 2
18 18 18
2 2 2

HTH.

David Marso

unread,
Oct 17, 2016, 6:49:44 PM10/17/16
to
That code can't possibly work.
You declare DoR_Tm1_Visit _Num as STRING and then hammer at it with the NUMBER function.
Riddle me this:
1. What error message do you receive?

2. The notion 'huge data set' is very subjective.
How many cases do you actually have?

3. Why not run something USEFUL/INFORMATIVE rather than EXECUTE?

4. Are the variables contiguous in the file?

Maybe this?

DO REPEAT strvar=var1 TO var360
/numvar=new1 TO new360.
COMPUTE numvar=NUMBER(REPLACE(strvar,Days_OoR,""),F2).
END REPEAT.



On Monday, October 17, 2016 at 5:28:39 PM UTC-4, Erin wrote:

erin.ps...@gmail.com

unread,
Oct 17, 2016, 9:36:11 PM10/17/16
to
This syntax:
COMPUTE DoR_Tm1_Visit_Num = NUMBER(SUBSTR(DoR_Tm1_Visit,9),f2).
EXECUTE .

Does take out all of the unwanted text but takes a long time and results in this error:

Warning # 635
The string to be converted via the NUMBER function is of zero length. The
result has been set to the system-missing value.
Command line: 227 Current case: 3 Current splitfile group: 1

(the same warning comes up multiple times)

Warning # 92
The limit for MXWARNS warnings in this data pass has been exceeded. Further
warnings have been suppressed. To change the limit use SET MXWARNS.


Tried the syntax:
string DoR_Visit_Tm1 TO DoR_Visit_Tm13 (a20). (not sure if I need to define)

DO REPEAT strvar=DoR_Tm1_Visit TO DoR_Tm13_Visit.
/numvar=DoR_Visit_Tm1 TO DoR_Visit_Tm13.
COMPUTE numvar=NUMBER(REPLACE(strvar,Days_OoR,""),F2).
END REPEAT.
It runs and has no errors if I run the define string bit.
But the data is blank (i.e. not missing, but blank)

Other qns:
2- 200,000 case
3- Open to any suggestions you may have
4- No, but I can re-order

Thank you kindly

erin.ps...@gmail.com

unread,
Oct 17, 2016, 10:34:15 PM10/17/16
to
Thank you, nice spotting.

The to-be-removed letter string varies:

DATA LIST FIXED / V1 1-13 (A) V2 14-27 (A) V3 28-41 (A) .
BEGIN DATA
Days_OoR1 Study_Grp1 Study1
Days_OoR2 Study_Grp2 Study2
Days_OoR3 Study_Grp3 Study3
Days_OoR4 Study_Grp4 Study4
Days_OoR5 Study_Grp5 Study5
Days_OoR6 Study_Grp6 Study6
Days_OoR7 Study_Grp7 Study7
Days_OoR8 Study_Grp8 Study8
Days_OoR9 Study_Grp9 Study9
Days_OoR10 Study_Grp10 Study10
Days_OoR11 Study_Grp11 Study11
Days_OoR12 Study_Grp12 Study12
Days_OoR13 Study_Grp13 Study13
Days_OoR14 Study_Grp14 Study14
Days_OoR15 Study_Grp15 Study15
Days_OoR16 Study_Grp16 Study16
Days_OoR17 Study_Grp17 Study17
Days_OoR18 Study_Grp18 Study18
Days_OoR19 Study_Grp19 Study19
Days_OoR20 Study_Grp20 Study20
Days_OoR21 Study_Grp21 Study21
Days_OoR22 Study_Grp22 Study22
Days_OoR23 Study_Grp23 Study23
Days_OoR24 Study_Grp24 Study24
Days_OoR25 Study_Grp25 Study25
Days_OoR26 Study_Grp26 Study26
Days_OoR27 Study_Grp27 Study27
Days_OoR28 Study_Grp28 Study28
Days_OoR29 Study_Grp29 Study29
Days_OoR30 Study_Grp30 Study30
END DATA.

DO REPEAT V = V1 to V3 /
Junk = "Days_OoR" "Study_Grp" "Study".
COMPUTE V = REPLACE(V,Junk,"").
END REPEAT.

This also runs, but comes back with blanks for all responses for all cases

This is not very familiar to me- so it may be my syntax interpretations...

erin.ps...@gmail.com

unread,
Oct 17, 2016, 11:46:32 PM10/17/16
to
Ok, seems that error is replated to defining the string variable...

So if I run this on its own:
DO REPEAT strvar=DoR_Tm1_Visit TO DoR_Tm13_Visit
/numvar=DoR_Tm1_Visit TO DoR_Tm13_Visit.
COMPUTE numvar=NUMBER(REPLACE(strvar,Days_OoR,""),F2).
END REPEAT.


This error comes up

>Error # 4285 in column 38. Text: Days_OoR
>Incorrect variable name: either the name is more than 64 characters, or it is
>not defined by a previous command.
>Execution of this command stops.









On Tuesday, October 18, 2016 at 9:49:44 AM UTC+11, David Marso wrote:

David Marso

unread,
Oct 18, 2016, 1:26:26 AM10/18/16
to
Oops my bad:
Days_OoR should have quotes.

COMPUTE numvar=NUMBER(REPLACE(strvar,"Days_OoR",""),F2).

Bruce Weaver

unread,
Oct 18, 2016, 8:32:10 AM10/18/16
to
Try this:

DATA LIST FIXED / V1 1-13 (A) V2 14-28 (A) V3 35-42 (A) .
****************************.
ALTER TYPE V1 to V3 (F5.0).
DESCRIPTIVES V1 to V3.
****************************.

Bruce Weaver

unread,
Oct 18, 2016, 5:35:54 PM10/18/16
to
On Monday, October 17, 2016 at 10:34:15 PM UTC-4, erin.ps...@gmail.com wrote:
Given that you have 360 variables, the DO-REPEAT method I suggested earlier is not ideal, because it requires you to list all 360 of the to-be-removed strings. Here's a more general approach that scales up much more easily. It uses a "trick" shown on Andy Wheeler's website:

https://andrewpwheeler.wordpress.com/2013/07/09/quick-spss-tip-cleaning-up-irregular-characters-in-strings/

* Read in some sample data.
DATA LIST FIXED / V1 1-13 (A) V2 14-28 (A) V3 35-42 (A) .
* String variables used in a vector must all
* be formatted to the same length.
ALTER TYPE V1 to V3 (A20).
VECTOR v = V1 to V3.
LOOP # = 1 to 3.
- LOOP #i = 1 to 255.
* ASCII codes for 0 to 9 are 48 to 57.
- IF NOT RANGE(#i,48,57) v(#) = REPLACE(v(#),STRING(#i,PIB),"").
- END LOOP.
END LOOP.
ALTER TYPE V1 to V3 (F5.0).
LIST.

* To scale it up for 360 contiguous variables, replace
* V1 to V3 with V1 to V360, and replace 1 to 3 with 1 to 360.

HTH.

erin.ps...@gmail.com

unread,
Oct 19, 2016, 5:38:14 PM10/19/16
to
Ah yes, ok, i can get it to work with a single variable. But not across multiple.

e.g. this works:
COMPUTE DoR_Tm1_Visit_Num = NUMBER(REPLACE(DoR_Tm1_Visit,"Days_OoR",""),F2).
EXECUTE .

and this runs without error:

DO REPEAT strvar=DoR_Tm1_Visit TO DoR_Tm13_Visit
/numvar=DoR_Visit_Tm1 TO DoR_Visit_Tm13.
COMPUTE numvar=NUMBER(REPLACE(strvar,"Days_OoR",""),F2).
END REPEAT.

But returns missing values for the new variables

erin.ps...@gmail.com

unread,
Oct 19, 2016, 5:42:32 PM10/19/16
to
This is so complex! So this one runs without error too.
But doesn't change the data...which I assume it is meant to do?

Bruce Weaver

unread,
Oct 20, 2016, 8:37:36 AM10/20/16
to
On Wednesday, October 19, 2016 at 5:42:32 PM UTC-4, erin.ps...@gmail.com wrote:
> This is so complex! So this one runs without error too.
> But doesn't change the data...which I assume it is meant to do?

Did you run all of the syntax, or only as far as the END LOOP line? If I stop there, and look at the data editor, the original values are still displayed--but I can also see "Transformations Pending" in the Case Counter area at the bottom of the data editor window. This means that you need an EXECUTE, or some other command that causes a pass through the data. It turns out that the ALTER TYPE command on the next line accomplishes that.


Here is my output from the final LIST command:

V1 V2 V3

1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 11 11
12 12 12
13 13 13
14 14 14
15 15 15
16 16 16
17 17 17
18 18 18
19 19 19
20 20 20
21 21 21
22 22 22
23 23 23
24 24 24
25 25 25
26 26 26
27 27 27
28 28 28
29 29 29
30 30 30

Number of cases read: 30 Number of cases listed: 30


And here is the complete syntax again, just in case there was some issue with parts of it being hidden in the Google Groups interface.
0 new messages