Please help me interpret the values in the output


Kamakshaiah

Sep 27, 2011, 2:46:57 AM
to FactoMineR users
Dear friends,

Dear sir/Madam,

I am new to statistics; I do research in ‘Marketing Management’ and
allied areas. For the past month I have been reading about Rcmdr. I
came across your website while studying ‘factor analysis’. The package
‘FactoMineR’ is very interesting...

As I am new to this, the concepts are a little difficult for me to
understand. I have gone through the example on your website
(http://factominer.free.fr/classical-methods/hierarchical-clustering-on-principal-components.html)
and understood almost everything except the points below. It is very
difficult to find somebody here to resolve my doubts (in India, and I
suppose in Asia generally, people prefer SPSS, though I don't know why,
given that it is proprietary), so I decided to write to you good people,
whom I thought the right ones to clarify my doubts.

The following are my doubts (the most common one being how to
interpret values and figures):

Usually, most of the graphs have two dimensions (dim1 and dim2). What
are they, and what do the values in parentheses explain? For example,
in the ‘Factor Map’ I see dim1 (11.37%) and dim2 (9.32%). I only know
that this is something related to variance; my problem is how to
interpret it with respect to the variables or factors in the study.
Regarding ‘Description by variables and/or categories’
(res.hcpc$desc.var$test.chi2, res.hcpc$desc.var$category), I understand
that it has something to do with the variables in the study, but how do
I interpret the values in the following output?

                   p.value df
where         8.465616e-79  4
how           3.144675e-47  4
price         1.862462e-28 10
tearoom       9.624188e-19  2
pub           8.539893e-10  2
friends       6.137618e-08  2
resto         3.537876e-07  2
How           3.616532e-06  6
Tea           1.778330e-03  4
sex           1.789593e-03  2
frequency     1.973274e-03  6
work          3.052988e-03  2
tea.time      3.679599e-03  2
lunch         1.052478e-02  2
dinner        2.234313e-02  2
always        3.600913e-02  2
sugar         3.685785e-02  2
sophisticated 4.077297e-02  2

Here, why are only the first two variables said to characterize most
of the three clusters? Why not ‘price’ and the others? Indeed, how
should the p-value of each variable be interpreted? Why are categories
with a p-value less than 0.02 used? Does it mean that the 19th variable
has a p-value greater than 0.02? (The illustration lists only 18
variables, whereas the analysis was done on 19.)

Regarding ‘Description by principal component’:

$quanti
$quanti$`1`
           v.test Mean in category  Overall mean sd in category Overall sd      p.value
Dim.6    2.647552       0.03433626  3.088219e-17      0.2655618  0.2671712 8.107689e-03
Dim.2   -7.796641      -0.13194656 -3.496615e-17      0.1813156  0.3486355 6.357699e-15
Dim.1  -12.409741      -0.23196088  4.927627e-17      0.2143767  0.3850642 2.314001e-35

$quanti$`2`
           v.test Mean in category  Overall mean sd in category Overall sd      p.value
Dim.2   13.918285       0.81210870 -3.496615e-17      0.2340345  0.3486355 4.905356e-44
Dim.4    4.350620       0.20342610  1.116042e-17      0.3700048  0.2793822 1.357531e-05
Dim.14   2.909073       0.10749165 -3.471868e-17      0.2161509  0.2207818 3.625025e-03
Dim.13   2.341566       0.08930402  2.182264e-17      0.1606616  0.2278809 1.920305e-02
Dim.3    2.208179       0.11087544  1.099358e-18      0.2449710  0.3000159 2.723180e-02
Dim.11  -2.234447      -0.08934293  6.981106e-17      0.2066708  0.2389094 2.545367e-02

$quanti$`3`
           v.test Mean in category  Overall mean sd in category Overall sd      p.value
Dim.1   13.485906       0.45155993  4.927627e-17      0.2516544  0.3850642 1.893256e-41
Dim.6   -2.221728      -0.05161581  3.088219e-17      0.2488566  0.2671712 2.630166e-02
Dim.4   -4.725270      -0.11479621  1.116042e-17      0.2924881  0.2793822 2.298093e-06

attr(,"class")
[1] "catdes" "list "

On what basis do we say that “Individuals in cluster 1 have low
coordinates on axes 1 and 2. Individuals in cluster 2 have high
coordinates on the second axis and individuals who belong to the third
cluster have high coordinates on the first axis. Here, a dimension is
kept only when the v-test is higher than 3.”

Can you also please tell me how to interpret the values in the following?
$para
cluster: 1
      285       152       166       143        71
0.5884476 0.6242123 0.6242123 0.6244176 0.6478185
------------------------------------------------------------
cluster: 2
       31        95        53       182       202
0.6620553 0.7442013 0.7610437 0.7948663 0.8154826
------------------------------------------------------------
cluster: 3
      172        33       233        18        67
0.7380497 0.7407711 0.7503006 0.7572188 0.7701598

$dist
cluster: 1
      82      156      292      197      193
2.009519 1.921977 1.919324 1.908373 1.888461
------------------------------------------------------------
cluster: 2
      94      190      212      168      229
1.775459 1.679182 1.674403 1.663392 1.640513
------------------------------------------------------------
cluster: 3
      66      273      204       22       44
2.072134 1.924819 1.830174 1.820065 1.794969

In the output above, I understood the clusters and their individuals
(numbers 285, 152, etc.), but what about 0.5884476 and the other
values? How should these values be interpreted in the analysis?

Indeed, how should each and every part of the output be interpreted?
For example, in the last output there are clusters, dims, and values,
but no explanation of the values is given...

I MAY BE VERY IGNORANT OF STATISTICS, but, sir, I truly do not
understand how to interpret the values. I understand how to use the
whole technique and obtain the respective values, but when it comes to
interpretation, it is very difficult for me.

PLEASE, I BEG SOMEONE TO CLARIFY MY DOUBTS...



François Husson

Sep 27, 2011, 3:40:44 AM
to FactoMineR users
Dear Kamakshaiah,

I will try to answer all your questions one by one in the text. Note
that the book "Exploratory Multivariate Data Analysis by Examples
Using R" (see http://www.crcpress.com/product/isbn/9781439835807)
gives several examples that show how to interpret the outputs of the
different functions.

On 27 sep, 08:46, Kamakshaiah <kamakshaia...@gmail.com> wrote:
> Usually, most of the graphs have two dimensions (dim1 and dim2). What
> are they, and what do the values in parentheses explain? For example,
> in the ‘Factor Map’ I see dim1 (11.37%) and dim2 (9.32%). I only know
> that this is something related to variance; my problem is how to
> interpret it with respect to the variables or factors in the study.

The dimensions correspond to linear combinations of the variables. The
first dimension is the best linear combination of the variables, in the
sense that it explains the largest percentage of the variance of the
variables. The second dimension is the one that explains the largest
percentage of variance once the first dimension has been taken into
account, the third dimension explains the largest percentage of
variance once the first two dimensions have been taken into account,
and so on. So dim1 (11.37%) means that the first dimension explains
11.37% of the total variance.
These dimensions can be interpreted through their links with the
variables: the correlation coefficient between a dimension and each
variable helps to find an explanation of the dimension.
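A minimal sketch of this idea, assuming a normed PCA on a small invented table of quantitative variables (the tea example in this thread is an MCA on categorical variables, where the dimensions are built differently but the percentages are read the same way). Each eigenvalue of the correlation matrix is the variance carried by one dimension; the percentage printed next to dim1, dim2, ... is that eigenvalue divided by the sum of all eigenvalues; and the correlation between a variable and a dimension is what you read to interpret the dimension. This is plain Python, not FactoMineR code, and the helper names are hypothetical.

```python
import math


def standardize(data):
    """Centre each column and divide it by its population standard deviation."""
    n, p = len(data), len(data[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in data]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def jacobi_eigen(m, max_sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi
    rotations), sorted by decreasing eigenvalue."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the new a[p][q] is zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a @ R (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- R.T @ a (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: v <- v @ R
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[i][k] for i in range(n)] for k in order]
    return values, vectors


def pca_summary(data):
    """Percentage of variance per dimension and variable/dimension
    correlations for a normed PCA (correlation matrix)."""
    z = standardize(data)
    n, p = len(z), len(z[0])
    corr = [[sum(z[i][u] * z[i][w] for i in range(n)) / n for w in range(p)]
            for u in range(p)]
    values, vectors = jacobi_eigen(corr)
    total = sum(values)  # equals p (the trace) for a correlation matrix
    percent = [100.0 * lam / total for lam in values]
    # cor(variable j, dimension k) = sqrt(lambda_k) * v_jk in a normed PCA
    cors = [[math.sqrt(max(lam, 0.0)) * vec[j] for j in range(p)]
            for lam, vec in zip(values, vectors)]
    return percent, cors


# invented data: 5 individuals, 3 quantitative variables
toy = [[2.0, 3.1, 5.0], [3.0, 3.9, 4.1], [4.0, 5.2, 3.2], [5.0, 5.8, 2.9], [6.0, 7.1, 1.5]]
pct, cors = pca_summary(toy)
print([round(x, 2) for x in pct])      # percentage of variance of each dimension
print([round(r, 2) for r in cors[0]])  # correlation of each variable with dim1
```

Here the first dimension carries almost all the variance because the three invented variables are nearly collinear; the signs of the correlations (possibly all flipped, since the sign of a dimension is arbitrary) tell you which variables grow along the dimension and which shrink.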


> Regarding ‘Description by variables and/or categories’
> (res.hcpc$desc.var$test.chi2, res.hcpc$desc.var$category), I understand
> that it has something to do with the variables in the study, but how do
> I interpret the values in the following output?
>
>                    p.value df
> where         8.465616e-79  4
> how           3.144675e-47  4
> price         1.862462e-28 10
> tearoom       9.624188e-19  2
> pub           8.539893e-10  2
> friends       6.137618e-08  2
> resto         3.537876e-07  2
> How           3.616532e-06  6
> Tea           1.778330e-03  4
> sex           1.789593e-03  2
> frequency     1.973274e-03  6
> work          3.052988e-03  2
> tea.time      3.679599e-03  2
> lunch         1.052478e-02  2
> dinner        2.234313e-02  2
> always        3.600913e-02  2
> sugar         3.685785e-02  2
> sophisticated 4.077297e-02  2
>

This output gives the p-value of the chi-square test used to evaluate
the link between each qualitative variable and the qualitative variable
corresponding to the clusters. A p-value less than 5% means that the
qualitative variable is significantly linked to the clusters, i.e. it
helps to explain them. The variables are sorted by increasing p-value,
so ‘where’ and ‘how’ are the variables whose link with the clusters is
the most significant. The variables that have a p-value larger than 5%
are not given (you can modify the default 5% threshold with the
argument proba, e.g. proba=0.20).
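To make the table concrete: each line is a chi-square test of independence between one variable and the cluster variable, and its df is (number of categories - 1) × (number of clusters - 1). With 3 clusters, ‘where’ (df 4) therefore has 3 categories and ‘price’ (df 10) has 6. Below is a minimal Python sketch of a standard Pearson chi-square test on an invented contingency table; it is not FactoMineR's code, and the closed-form p-value helper (a hypothetical name) only handles an even df, which is always the case with 3 clusters.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of degrees of
    freedom: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, acc = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        acc += term
    return math.exp(-half) * acc


def chi2_independence(table):
    """Pearson chi-square test between the row variable (categories of one
    qualitative variable) and the column variable (clusters)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, rt in enumerate(row_tot):
        for j, ct in enumerate(col_tot):
            expected = rt * ct / total  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf_even_df(stat, df)


# invented counts: a yes/no variable (rows) crossed with 3 clusters (columns)
table = [[10, 20, 30],
         [30, 20, 10]]
stat, df, p = chi2_independence(table)
print(stat, df, p)  # stat = 20.0, df = 2, p = exp(-10), about 4.5e-05
```

A variable is then listed only when its p-value is below the threshold (proba, 0.05 by default), which is why every p-value in the table above is below 0.05.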
The $quanti output (description by principal components) gives the
dimensions that characterize each cluster. It means that the
individuals belonging to a cluster have coordinates on these dimensions
whose mean is significantly different from 0, the overall mean of every
dimension (the Overall mean column is 0 up to rounding error).
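On the v-test itself: for one dimension and one cluster, it is the difference between the mean coordinate of the cluster's individuals and the overall mean, divided by the standard deviation that this mean would have if the cluster were a random sample of the same size drawn without replacement. A positive v-test means the cluster has high coordinates on that dimension and a negative one low coordinates, which is how the sentences about clusters 1, 2 and 3 are read off the signs. The sketch below uses this classic formula; I assume it matches what catdes computes, and the p-values in the $quanti output above are consistent with p = 2 × P(Z > |v|). So |v| > 1.96 corresponds to p < 0.05, and keeping only v-tests higher than 3 corresponds to p < about 0.0027. Function names are hypothetical, and overall_sd is assumed to be the population standard deviation.

```python
import math


def v_test(mean_in_cat, overall_mean, overall_sd, n_cat, n_total):
    """(category mean - overall mean) / sd of a mean of n_cat individuals
    drawn without replacement; overall_sd uses the divisor n_total."""
    var_of_mean = (overall_sd ** 2 / n_cat) * (n_total - n_cat) / (n_total - 1)
    return (mean_in_cat - overall_mean) / math.sqrt(var_of_mean)


def v_test_p_value(v):
    """Two-sided p-value from the normal approximation: 2 * P(Z > |v|)."""
    return math.erfc(abs(v) / math.sqrt(2.0))


def v_test_from_values(values, in_cat):
    """v-test of the individuals flagged in in_cat against all individuals."""
    n_total = len(values)
    overall_mean = sum(values) / n_total
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in values) / n_total)
    cat = [x for x, flag in zip(values, in_cat) if flag]
    return v_test(sum(cat) / len(cat), overall_mean, overall_sd, len(cat), n_total)


# v-test printed for Dim.6 in cluster 1 of the $quanti output
print(v_test_p_value(2.647552))  # close to the printed 8.107689e-03
# invented example: the two smallest of six values form the "cluster"
print(v_test_from_values([1, 2, 3, 4, 5, 6], [True, True, False, False, False, False]))
```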

> On what basis do we say that “Individuals in cluster 1 have low
> coordinates on axes 1 and 2. Individuals in cluster 2 have high
> coordinates on the second axis and individuals who belong to the third
> cluster have high coordinates on the first axis. Here, a dimension is
> kept only when the v-test is higher than 3.”
>
> Can you also please tell me how to interpret the values in the following?
> $para
The $para object gives the distance between each individual and the
centre of gravity of the cluster it belongs to. The closest individual
can be seen as a representative of the cluster. The individuals are
sorted from the closest: in cluster 1, individual 285 (distance
0.5884476) is the most representative one.

The $dist object gives the distance between each individual and the
closest centre of gravity among the other clusters. The individuals are
sorted from the farthest, so the first one listed (e.g. individual 82
in cluster 1) is the individual most different from the individuals of
the other clusters.
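A minimal sketch of how the two lists can be computed, assuming each individual is described by its coordinates on the retained dimensions and that the centre of gravity of a cluster is the plain mean of its members. It follows the description above rather than FactoMineR's source; the function names are hypothetical and indices are 0-based.

```python
import math


def centroid(points):
    """Mean point (centre of gravity) of a list of coordinate tuples."""
    dims = len(points[0])
    return [sum(pt[d] for pt in points) / len(points) for d in range(dims)]


def describe_individuals(coords, labels, top=5):
    """Per cluster:
    para: members sorted by increasing distance to their own centre;
    dist: members sorted by decreasing distance to the nearest OTHER centre."""
    clusters = sorted(set(labels))
    centres = {k: centroid([c for c, lab in zip(coords, labels) if lab == k])
               for k in clusters}
    para, dist = {}, {}
    for k in clusters:
        members = [i for i, lab in enumerate(labels) if lab == k]
        own = [(i, math.dist(coords[i], centres[k])) for i in members]
        para[k] = sorted(own, key=lambda t: t[1])[:top]
        other = [(i, min(math.dist(coords[i], centres[m]) for m in clusters if m != k))
                 for i in members]
        dist[k] = sorted(other, key=lambda t: t[1], reverse=True)[:top]
    return para, dist


# invented coordinates of 6 individuals on 2 dimensions, in 2 clusters
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels = [1, 1, 1, 2, 2, 2]
para, dist = describe_individuals(coords, labels, top=3)
print(para[2])  # (index, distance to own centre), closest first
print(dist[2])  # (index, distance to nearest other centre), farthest first
```

In this toy example the most representative member of cluster 2 (index 3) is not the one farthest from cluster 1 (index 4 or 5), which is why $para and $dist can list different individuals.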

kamakshaiah m

Sep 27, 2011, 4:24:55 AM
to factomin...@googlegroups.com
Dear sir,

I am deeply impressed by your immediate response, and very thankful for your answers.

Thanks a lot...
--
-----
Kamakshaiah Musunuru
+251-914238020 (Ethiopia)

Kamakshaiah

Sep 27, 2011, 11:36:40 PM
to FactoMineR users
Dear Friends,

I am better now to some extent. Can we do inferential statistics with
the help of FactoMineR? For example, if I have two groups of variables,
how do I test a hypothesis with the help of FactoMineR? In my study, a
group of variables (say 7) for Windows needs to be compared with the
same group of variables (with different responses) for Linux. Here, I
would like to test the hypothesis "there is no significant difference
between Windows and Linux with respect to the responses", in other
words, that both groups are similar. How should I proceed with
FactoMineR?

I am very thankful for your answer; I am partially relieved of my
doubts. But, sir, in the last illustration (in my previous question),
what is the 'v-test', and what does it convey? Can you please explain
it to me?

Regards,


François Husson

Sep 28, 2011, 3:05:44 AM
to factomin...@googlegroups.com
Most of the methods available in FactoMineR are exploratory multivariate methods, and there is no inference and no test with this kind of method. But concerning your problem, you can use the test of the RV coefficient (using the coeffRV function of FactoMineR). This coefficient measures the link between two groups of variables, just as the R-squared measures the link between two variables, or between one variable and a group of (explanatory) variables.

If you want to explore your data, you can use the MFA function, which allows you to visualise the links and the differences between the two groups of variables (but it will not allow you to make inferences; it is just an exploration of the data).
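A minimal sketch of the RV coefficient mentioned above, for two groups of variables measured on the same individuals (rows), which the coefficient requires; in the Windows/Linux study, both groups would therefore need to be answered by the same respondents. Variables are only centred here (standardize them first if their scales differ). FactoMineR's coeffRV computes its own p-value; the permutation test below is a simpler, swapped-in alternative, and all names are hypothetical.

```python
import math
import random


def centre(matrix):
    """Subtract each column mean (matrix = list of rows)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def cross(a, b):
    """a' b for two matrices with the same rows (individuals)."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def frob2(m):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(x * x for row in m for x in row)


def rv_coefficient(x, y):
    """RV = ||X'Y||^2 / (||X'X|| * ||Y'Y||) on column-centred X and Y.
    0: no linear link between the groups; 1: same configuration of the
    individuals up to an orthogonal transformation and a global scaling."""
    if len(x) != len(y):
        raise ValueError("both groups must describe the same individuals")
    xc, yc = centre(x), centre(y)
    return frob2(cross(xc, yc)) / math.sqrt(frob2(cross(xc, xc)) * frob2(cross(yc, yc)))


def rv_permutation_p_value(x, y, n_perm=999, seed=0):
    """Hypothetical helper: permutation test of 'no link' by shuffling the rows of y."""
    rng = random.Random(seed)
    observed = rv_coefficient(x, y)
    rows = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(rows)
        if rv_coefficient(x, rows) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# invented data: the same 5 individuals described by two groups of variables
group1 = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
group2 = [[1.5, 0.2], [1.0, 0.9], [3.2, 0.4], [2.9, 1.1], [4.8, 0.3]]
print(rv_coefficient(group1, group2))
print(rv_permutation_p_value(group1, group2))
```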

Best
Francois

kamakshaiah m

Oct 15, 2011, 12:42:38 AM
to factomin...@googlegroups.com
Dear Friends,


In the above graph, how should I interpret individuals?
Dim 1 (5.769%): does it mean that dimension one explains 5.769% of the variance? How should I interpret individuals with respect to their position in the graph? What are the numbers on the x-axis (Dim 1): are they values of variance or of correlation? I think they have nothing to do with correlation; if they did, values greater than 1 on the x-axis would make no sense. Do the values on the x- and y-axes represent the respective values of the individuals in the data set? In that case, the graph shows negative values, but I don't have any negative values in my data set.


Please help me: how can I describe the individuals in the above graph with respect to the values on the respective axes? The coordinates of individual 4 are (1, 2.5); what do these values tell us about individual 4?

I studied the articles on http://factominer.free.fr, but this type of description is not provided there.

Regards,

François Husson

Nov 10, 2011, 3:32:00 AM
to factomin...@googlegroups.com
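Yes, it means that 5.769% of the variance is explained by the first dimension. The coordinates on the graph correspond to the value of each individual on the principal component, that is to say on the best linear combination of the variables. Since the variables are centred, each dimension has mean 0 over all individuals, so negative coordinates are expected even when all raw values are positive: they simply indicate individuals below the average on that dimension.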
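A small illustration of this answer, assuming a normed PCA with just two positively correlated variables (invented values, all positive). There the first principal component is (z1 + z2)/√2, where z1 and z2 are the standardized variables, so each individual's coordinate is a weighted sum of centred values: it is negative for individuals below average even though no raw value is negative, and an individual at 1 is above average on that combination. The sign of a component is arbitrary, so software may show the mirror image. Names are hypothetical; this is not FactoMineR code.

```python
import math


def standardize(col):
    """Centre a variable and divide it by its population standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def first_pc_two_vars(x1, x2):
    """Coordinates on Dim 1 of a normed PCA with two variables, plus the
    percentage of variance of Dim 1.

    With standardized variables the correlation matrix is [[1, r], [r, 1]]:
    its eigenvalues are 1 + r and 1 - r and, for r > 0, the first
    eigenvector is (1, 1) / sqrt(2)."""
    z1, z2 = standardize(x1), standardize(x2)
    r = sum(a * b for a, b in zip(z1, z2)) / len(z1)
    if r <= 0:
        raise ValueError("this sketch assumes positively correlated variables")
    coords = [(a + b) / math.sqrt(2.0) for a, b in zip(z1, z2)]
    percent = 100.0 * (1.0 + r) / 2.0  # eigenvalue 1 + r over total variance 2
    return coords, percent


# invented data: all raw values are positive
x1 = [2, 4, 5, 7, 9]
x2 = [1, 3, 6, 6, 8]
coords, percent = first_pc_two_vars(x1, x2)
print([round(c, 2) for c in coords])  # below-average individuals get negative coordinates
print(round(percent, 1))
```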