When I want to perform a cluster analysis with nominal variables
in SPSS, I start with optimal scaling using the homals command
(available with the Categories option), save the object scores as new
variables, and run the cluster analysis on these new variables.
Optimal scaling is similar to correspondence analysis, which can
be seen as a method for transforming categorical variables into
quantitative variables.
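
To make the "categorical to quantitative" step concrete, here is a minimal
Python sketch of reciprocal averaging, which is closely related to what
homogeneity analysis does, for one dimension only. It is not the SPSS homals
implementation. The toy data and the name `object_scores` are made up for
illustration, and the sketch assumes at least one variable is not constant.

```python
from collections import defaultdict
from math import sqrt


def standardize(values):
    """Center to mean 0 and rescale to unit (population) variance."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    scale = sqrt(sum(v * v for v in centered) / len(values))
    return [v / scale for v in centered]


def object_scores(rows, iterations=200):
    """First-dimension object scores by reciprocal averaging (illustrative)."""
    n, n_vars = len(rows), len(rows[0])
    # Arbitrary non-constant starting values.
    scores = standardize([float(i) for i in range(n)])
    for _ in range(iterations):
        new = [0.0] * n
        for j in range(n_vars):
            # Category quantification = mean object score within the category.
            members = defaultdict(list)
            for i, row in enumerate(rows):
                members[row[j]].append(scores[i])
            quant = {cat: sum(v) / len(v) for cat, v in members.items()}
            # Object score = average of the quantifications of its categories.
            for i, row in enumerate(rows):
                new[i] += quant[row[j]] / n_vars
        scores = standardize(new)
    return scores


# Hypothetical nominal data: one row per case, one column per variable.
rows = [
    ("red", "small", "yes"),
    ("red", "small", "yes"),
    ("red", "small", "no"),
    ("blue", "large", "no"),
    ("blue", "large", "no"),
    ("blue", "large", "yes"),
]
scores = object_scores(rows)
print(scores)
```

Cases with similar category patterns end up with similar scores, and it is
these numeric scores that would then go into the cluster procedure.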
--
Joseph Saint Pierre
http://www.cict.fr/cict/personnel/stpierre
My strategy at present is to transform all ordinal variables
into dummy variables, because one has a 3-point scale and another
a 10-point scale. The ratio variables I transform to z-standardized
values with mean 0 and standard deviation 1.
Is this an acceptable way?
(I use k-means and want to avoid the variables having different impacts
on the Euclidean distance just because they are coded differently.)
Thanks in advance & regards, Christian
P.S. The trick with homals is new to me, thanks!
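
The coding concern in the post above can be checked with a little arithmetic.
The following Python sketch compares squared Euclidean distances for raw codes
and for dummy codes, and z-standardizes a ratio variable. The helper names and
the ages are invented for illustration, and the z-scores use the sample
standard deviation (n - 1), which is an assumption of this sketch.

```python
from statistics import mean, stdev


def dummy(value, levels):
    """One 0/1 indicator per level of the scale."""
    return [1.0 if value == level else 0.0 for level in levels]


def zscores(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


three_point = [1, 2, 3]
ten_point = list(range(1, 11))

# Raw codes: the widest possible gap depends on the length of the scale.
raw_3 = sq_dist([1], [3])     # (3 - 1)^2
raw_10 = sq_dist([1], [10])   # (10 - 1)^2

# Dummy codes: any two different categories are equally far apart.
dummy_3 = sq_dist(dummy(1, three_point), dummy(3, three_point))
dummy_10 = sq_dist(dummy(1, ten_point), dummy(10, ten_point))
dummy_adjacent = sq_dist(dummy(1, ten_point), dummy(2, ten_point))

# Hypothetical ages, z-standardized.
ages = [23, 35, 47, 59, 71]
age_z = zscores(ages)

print(raw_3, raw_10, dummy_3, dummy_10, dummy_adjacent)
```

Dummy coding does equalize the two scales, but it also discards the order:
ratings 1 and 2 end up exactly as far apart as ratings 1 and 10.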
> And what is the best way, if I have a mixed data set
> with some ordinal variables (ratings) and some ratio variables like age?
I never put ratings (Likert scales) together with biographical
variables such as sex, age, marital status, etc. in a cluster analysis,
a factor analysis, or any other multivariate analysis.
The reason is that clusters (or factors) could then be defined by
associations or correlations between such variables, which are strong
and usually already known.
For example, in a developed country women live longer than men and are
more often widowed. If you put in these three variables (age, sex,
marital status), there are associations between being a woman, old age,
and widowhood which, IMHO, should not be mixed with rating variables.
Usually I put only the rating variables in the multivariate analysis and
then study the links between the factors or clusters and variables like
age using simple statistical techniques.
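
As a rough illustration of that second step, the Python sketch below compares
mean age across clusters after the clustering has been done on the ratings
alone. The cluster labels and ages are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical output of a cluster analysis run on the rating variables only.
cluster_labels = [1, 1, 2, 2, 2, 3]
# Hypothetical ages of the same six respondents, kept out of the clustering.
ages = [25, 31, 47, 52, 45, 68]

# Group the ages by cluster and compute one mean per cluster.
by_cluster = defaultdict(list)
for label, age in zip(cluster_labels, ages):
    by_cluster[label].append(age)

mean_age = {label: mean(values) for label, values in sorted(by_cluster.items())}
print(mean_age)
```

A cross-tabulation of cluster by sex or marital status would play the same
role for the categorical background variables.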
> My strategy at present is to transform all ordinal variables
> into dummy variables, because one has a 3-point scale and another
> a 10-point scale. The ratio variables I transform to z-standardized
> values with mean 0 and standard deviation 1.
When there are two or more different types of ordinal scale, I run
separate analyses for the different sets of variables. In social
science data sets I have often used separate analyses for different
sets of variables even when they were all 4-point scales. I prefer
not to mix different sets in one multivariate analysis: it is very
common for one set of variables to have much higher internal associations
than another, and in that situation the clusters or factors are often
defined only by the variables of the set with the highest associations...
> Is this an acceptable way?
I do not know exactly what an acceptable way is. I consider that
multivariate techniques should not be used systematically.
I very often recommend "The Mismeasure of Man" by Stephen Jay Gould;
this book contains an excellent critique of the uses of factor analysis.
I have written (in French) on my web pages many comments on
the uses of multivariate statistical analysis in the social sciences.
I suggest basic, simple analyses; I consider SPSS a great package
for its simplicity in recoding, calculating, aggregating, etc.
> (I use k-means and want to avoid the variables having different impacts
> on the Euclidean distance just because they are coded differently.)
I am not a specialist in cluster analysis and I still do not understand
how it really works as a mathematical model; the choice of method and
distance is still a kind of magic :-))
First, I do like the notion of not mixing variable types.
Second, I like the notion of doing correspondence analysis
and forgetting all about cluster analysis.
Third, there are special problems and limits for computing
distances with binary variables; read up on those.
On the other hand, here is an answer about *weights*
[this is an answer, not a recommendation] --
You can use and manipulate the old default.
Let the scores determine the weight: use dummy scores
of 0 and 10 for X2 if you want X2 to matter 100 times as much
(in squared distance) as X1, where X1 is scored 0 and 1.
Et cetera, et cetera.
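
The arithmetic behind that factor of 100 can be sketched in a few lines of
Python. The two cases and the function name are hypothetical; the point is
only that multiplying a variable's scores by w multiplies its contribution to
the squared Euclidean distance by w squared.

```python
def sq_contributions(a, b):
    """Per-variable contributions to the squared Euclidean distance."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


# Two hypothetical cases that differ on both dummy variables.
# Column order: (X1 scored 0 or 1, X2 scored 0 or 10).
case_a = (0, 0)
case_b = (1, 10)

x1_part, x2_part = sq_contributions(case_a, case_b)
ratio = x2_part / x1_part

print(x1_part, x2_part, ratio)  # prints: 1 100 100.0
```

By the same reasoning, scoring X2 as 0 and 2 would make it count 4 times as
much as X1.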
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html