Factor Analysis


Neeraj Kaushik

Nov 21, 2011, 4:37:57 AM
Dear Friends

I've taken quite a lot of time in replying to Dogra ji's query. There were so many assignments, but certainly that can't be given as an excuse. Anyway, let's start by looking at the conditions for factor analysis:
I've attached a ppt regarding the same.

1. Factor analysis (FA) is used when the variables are interdependent, i.e. we can't be sure which variables are dependent and which are independent.
2. Data MUST be ordinal, i.e. a Likert scale; nominal data can't be used for FA. Further, all Likert-scale questions must be on the same intensity level. For example, in a questionnaire where a few statements run from "strongly agree" to "strongly disagree" and the rest from "highly satisfied" to "highly dissatisfied", FA can't be applied.
3. FA is generally applied to a very large dataset, e.g. 80 statements in a questionnaire on a 5-point scale answered by 400 respondents.
4. FA has 2 purposes:
a) Data reduction
b) Generating a new set of variables, called factors, which are mutually independent
5. Basically, FA looks for the inherent undercurrents in the dataset and brings them together in a factor. It starts by finding the coefficient of correlation among all the variables, then clubs together the variables to which respondents replied at the same level. It works only on the pattern of responses given by the respondents and is not concerned with whether the statements resemble each other or not. For example, in the spiritual field there are various heads from which we can ask questions: meditation, ethical and moral values, religious practices, social norms, etc. FA will find the undercurrent (how much they're alike) in the responses and put them into one factor.
6. FA is of 2 types. I'm discussing only Exploratory FA (EFA) here.
7. There are many parameters for analysing the output of FA:
a) Check the conditions for whether FA can be applied.
A rule of thumb is: No. of respondents = No. of statements x 5.
The KMO value is a better estimate. A KMO value > 0.6 signifies that FA can be used with the present dataset.
b) Method of extraction
There are various methods, but the PCA method is best for beginners.
c) No. of factors to be extracted
There are 3 options:
(i) When we know nothing about how many factors there should be (this is the default method), factors are extracted on the basis of the eigenvalue method, i.e. we retain those factors whose eigenvalue >= 1.
(ii) Scree plot: this is the graphical version of the first method, but it is to be used cautiously.
(iii) If total variance explained is > 60% (the minimum criterion for using FA), then we have the option of deciding how many factors to retain. Generally we retain the minimum number of factors such that their total variance explained is > 60%. This can be done by asking FA to give only 4 or 6 or however many factors we feel fit (there's an option which asks for the number of factors).
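The eigenvalue criterion from option (i) can be sketched in a few lines of Python (using NumPy). The correlation matrix below is entirely made up for illustration: statements 1-3 hang together, as do statements 4-5.

```python
import numpy as np

# Hypothetical correlation matrix for 5 statements (illustrative values only).
R = np.array([
    [1.00, 0.70, 0.60, 0.10, 0.20],
    [0.70, 1.00, 0.65, 0.15, 0.10],
    [0.60, 0.65, 1.00, 0.20, 0.15],
    [0.10, 0.15, 0.20, 1.00, 0.55],
    [0.20, 0.10, 0.15, 0.55, 1.00],
])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, largest first
n_retained = int((eigvals >= 1).sum())           # Kaiser criterion: eigenvalue >= 1
cum_var = np.cumsum(eigvals) / eigvals.sum()     # cumulative share of total variance

print("eigenvalues:", np.round(eigvals, 3))
print("factors retained:", n_retained)
print("cumulative variance explained:", np.round(cum_var, 3))
```

With this two-cluster structure, the first two eigenvalues exceed 1, so the Kaiser rule retains two factors, and together they clear the 60% total-variance criterion.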

FA output:
1. The first table gives the KMO value. It should be > 0.6, and the Sig value should be < 0.05.
2. The 2nd table, of communalities, represents how much of each variable is explained by the factors. The value for every statement should be > 0.4.
3. The 3rd table gives the total variance explained and the number of factors extracted by FA. Total variance explained should be > 60%, i.e. 0.6.
4. The 4th table gives the coefficient of correlation (called the factor loading here) between every factor and every statement. Retain those statements in a factor where the factor loading is > 0.4.
5. The last table gives a matrix which is required in intermediate steps of the calculation and is not of much use to us (as of now).
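To see how Tables 2 and 3 follow from Table 4, here is a small NumPy sketch. The loading matrix is invented purely for illustration, not taken from any real output.

```python
import numpy as np

# Hypothetical Table 4: factor loadings for 4 statements on 2 factors.
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.70],
    [0.20, 0.80],
])

# Table 2: communality of each statement = sum of its squared loadings.
communalities = (loadings ** 2).sum(axis=1)

# Table 3: variance captured by each factor, and the total share explained.
factor_var = (loadings ** 2).sum(axis=0)
total_share = factor_var.sum() / loadings.shape[0]

print("communalities:", np.round(communalities, 3))
print("total variance explained: %.1f%%" % (100 * total_share))
```

Here every communality exceeds 0.4 and the total variance explained is about 61%, just clearing the cut-offs mentioned above.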

Observations (1):
A quick look at Table 3 shows that factor 1 almost always explains the maximum variance; hence, in Table 4, almost all statements will have a high factor loading on factor 1 and very low loadings on the others. So, in order to improve the solution, we rotate it.

Factor Rotation (FR):
1. FR is not always compulsory. If we're getting a good solution, then FR is not required.
2. There are many methods for FR. We'll take the Varimax method.
3. After rotation, the output will have another table, of rotated factor loadings. We then no longer consider the previous table of factor loadings.
4. In the new table the statements will be more evenly distributed among the various factors.
5. FR never improves the total variance explained.
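Point 5 can be demonstrated directly: an orthogonal rotation only redistributes loadings between factors, so the sum of squared loadings cannot change. Below is a minimal varimax sketch (the standard SVD-based Kaiser iteration); the unrotated loading matrix is hypothetical.

```python
import numpy as np

def varimax(L, tol=1e-6, max_iter=100):
    """Varimax-rotate a loadings matrix L (statements x factors)."""
    p, k = L.shape
    rot = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        Lr = L @ rot
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        rot = u @ vt
        if s.sum() < total * (1 + tol):
            break
        total = s.sum()
    return L @ rot

# Hypothetical unrotated loadings: everything loads heavily on factor 1.
L = np.array([
    [0.75, 0.35],
    [0.70, 0.30],
    [0.65, -0.45],
    [0.60, -0.50],
])
Lr = varimax(L)

# Rotation leaves every communality (row sum of squares) intact,
# so the total variance explained cannot improve.
print(np.round((L ** 2).sum(), 6), np.round((Lr ** 2).sum(), 6))
```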


Observations (2):
In the factor analysis output, FA never gives the names of the factors. We have to assign these names on the basis of what's common to the various statements of a given factor.
For saving the factor scores, we can use the Save menu in FA, which employs the regression method of scoring. The factor scores obtained can further be used as variables in techniques like multiple regression, logit regression, hypothesis testing, etc.
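The regression (Thurstone) method behind the Save option can be sketched as follows: the weight matrix is W = R⁻¹L, and the scores are ZW for standardized responses Z. All the data and loadings below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical standardized responses: 50 respondents x 4 statements.
Z = rng.normal(size=(50, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

R = np.corrcoef(Z, rowvar=False)   # correlation matrix of the statements

# Assumed loadings for 2 extracted factors (illustrative numbers).
L = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.70],
    [0.20, 0.80],
])

W = np.linalg.solve(R, L)          # regression weights, W = R^-1 L
scores = Z @ W                     # one score per respondent per factor
print("factor scores shape:", scores.shape)
```

Each respondent ends up with one score per factor, which is exactly what gets appended to the dataset for use in later analyses.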

======================================================
Here I've tried to explain the basics of FA. This technique is very subjective, as there are many decision areas where no standard exists; we only follow conventions.
In case of any doubt, please feel free to call me (after 8 pm only).

Best wishes
Neeraj
+91-9996259725

 

Factor Analysis.ppt

Dr Neeraj Kaushik

Jan 16, 2013, 2:29:19 AM
to dataanalys...@googlegroups.com, spsstraining
Let's talk more about EFA.

When we apply factor analysis, there are 5 menus given by SPSS:
1. Descriptive
2. Extraction
3. Rotation
4. Scores
5. Options

Let's talk about the various options present in the Descriptives menu, which act as "goodness of fit" measures.

KMO & Bartlett's test of sphericity
This is one of the most important points, as it checks 2 basic assumptions of EFA:
(a) There is adequate data for EFA. A KMO value > 0.6 confirms it.
(b) There is sufficient correlation between the various statements (so that we can group them together as factors). When the Sig value of Bartlett's test is less than 0.05, it confirms that there is sufficient correlation.
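For the curious, Bartlett's statistic is computed from the determinant of the correlation matrix: chi-square = -(n - 1 - (2p + 5)/6) * ln|R|, on p(p - 1)/2 degrees of freedom. A quick stdlib-only sketch with made-up numbers:

```python
import math

# Hypothetical inputs: n respondents, p statements, and the determinant of
# their correlation matrix (all values invented for illustration).
n, p = 200, 10
det_R = 0.02

chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det_R)
df = p * (p - 1) // 2

print("chi-square = %.1f on %d df" % (chi2, df))
# A chi-square this large on 45 df corresponds to Sig < 0.05, i.e. there is
# sufficient correlation among the statements for EFA.
```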

At this point, let's talk about the multicollinearity problem. High correlation is not always desirable. In fact, when 2 statements have r > 0.9, one of them must be deleted.
Let's analyse it a little more. What happens when all variables are highly correlated? It creates a problem called singularity, i.e. it makes it difficult to determine the unique contribution of each statement towards a factor. Hence we must address multicollinearity.
 
To check for multicollinearity, we use the Coefficients option in Descriptives and look for ourselves at where r > 0.9.
It's quite difficult to check r values in a 20x20 matrix, but we have mechanisms for that:
copy the correlation matrix, paste it into MS Excel, and use conditional formatting to highlight values > 0.9.
The Significance option is used for checking the significance of the correlation coefficients. We're hardly concerned with this table.

Another way to check for multicollinearity is by ticking Determinant. If the determinant value is < .00001, then there's a problem of multicollinearity.
In such cases, identify where r > 0.8; these statements can be rejected, but wait for now.
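Both checks — scanning for r > 0.9 and inspecting the determinant — can be sketched in Python (NumPy) on made-up data. Here statement 1 is deliberately an almost exact copy of statement 0, creating the multicollinearity problem described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical responses: 100 respondents x 5 statements; statement 1 is a
# near-duplicate of statement 0 (a deliberate multicollinearity problem).
base = rng.normal(size=(100, 4))
dup = base[:, 0] + 0.001 * rng.normal(size=100)
data = np.column_stack([base[:, 0], dup, base[:, 1], base[:, 2], base[:, 3]])

R = np.corrcoef(data, rowvar=False)

# The Excel conditional-formatting step, done programmatically:
p = R.shape[0]
flagged = [(i, j, round(R[i, j], 3))
           for i in range(p) for j in range(i + 1, p)
           if abs(R[i, j]) > 0.9]
print("pairs with |r| > 0.9:", flagged)

# The determinant check: a value below .00001 signals multicollinearity.
det = np.linalg.det(R)
print("determinant of R:", det)
```

Only the near-duplicate pair gets flagged, and the determinant collapses well below the .00001 cut-off, so one of the two offending statements would be a candidate for removal.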

We can't delete a statement just on our own choice. A statistical procedure for that is to remove statements on the basis of the KMO value for each statement.
Just as we compute KMO for the entire dataset, we can compute KMO for each statement too.
The Anti-Image option is for that. The Anti-Image option produces the inverse of the correlation & covariance matrices.
We are concerned with the anti-image correlation matrix. The diagonal elements of this matrix are actually the KMO values for each statement. Please note the off-diagonal values should be zero or very small.
Remove any statement whose diagonal value is < 0.5.
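The diagonal of the anti-image correlation matrix is the per-statement measure of sampling adequacy (MSA). Under the hood it compares squared plain correlations with squared partial correlations, where the partials come from the rescaled inverse of R. A sketch on an invented correlation matrix:

```python
import numpy as np

def msa_per_statement(R):
    """Per-statement KMO/MSA values from a correlation matrix R."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    Q = -Rinv / np.outer(d, d)       # partial correlations (off-diagonal)
    np.fill_diagonal(Q, 0.0)
    R0 = R - np.eye(len(R))          # zero the diagonal of R as well
    r2 = (R0 ** 2).sum(axis=0)       # squared plain correlations
    q2 = (Q ** 2).sum(axis=0)        # squared partial correlations
    return r2 / (r2 + q2)            # near 1 when partials are small

# Illustrative correlation matrix (invented numbers).
R = np.array([
    [1.00, 0.70, 0.60, 0.10, 0.20],
    [0.70, 1.00, 0.65, 0.15, 0.10],
    [0.60, 0.65, 1.00, 0.20, 0.15],
    [0.10, 0.15, 0.20, 1.00, 0.55],
    [0.20, 0.10, 0.15, 0.55, 1.00],
])
msa = msa_per_statement(R)
print("per-statement MSA:", np.round(msa, 3))
# Any statement with MSA < 0.5 would be a candidate for removal.
```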

If you'd like to calculate factor scores on your own, you'll need the inverse of the correlation matrix (I've shown how to calculate factor scores in another post, where I used MS Excel to calculate the inverse of a matrix). SPSS helps you get the inverse of the correlation matrix: tick Inverse for this.

Finally, the option Reproduced.
SPSS will regenerate a correlation matrix on the basis of the factors extracted and will compare it with the original correlation matrix. The differences between the two are called residuals, and ideally the residuals should be zero (or < 0.05).
In a footnote, SPSS reports how many residuals have an absolute value greater than 0.05.
If at least 50% of the residuals are under 0.05, then it's OK; otherwise it signifies some problem.
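A sketch of what SPSS does for the Reproduced table, using an invented correlation matrix and invented loadings:

```python
import numpy as np

# Hypothetical original correlation matrix for 4 statements.
R = np.array([
    [1.00, 0.62, 0.20, 0.22],
    [0.62, 1.00, 0.28, 0.30],
    [0.20, 0.28, 1.00, 0.66],
    [0.22, 0.30, 0.66, 1.00],
])

# Assumed loadings for 2 extracted factors (illustrative numbers).
L = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.70],
    [0.20, 0.80],
])

# Reproduced correlations: R_hat = L L^T, with the diagonal reset to 1.
R_hat = L @ L.T
np.fill_diagonal(R_hat, 1.0)

# Residuals over the unique off-diagonal cells.
resid = (R - R_hat)[np.triu_indices(4, k=1)]
n_large = int((np.abs(resid) > 0.05).sum())
share = n_large / resid.size

print("residuals:", np.round(resid, 4))
print("%d of %d residuals exceed 0.05 (%.0f%%)" % (n_large, resid.size, 100 * share))
```

In this toy case only 1 of 6 residuals exceeds 0.05 (about 17%), comfortably under the 50% threshold, so the two-factor model reproduces the correlations acceptably.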

Happy Learning
Neeraj

priti sharma

Jan 16, 2013, 5:13:50 AM
to dataanalys...@googlegroups.com
Sir,
This reproduced correlation is also used for testing the model in EFA.




Radha garg

Jan 17, 2013, 12:45:08 AM
to dataanalys...@googlegroups.com

Respected sir,

If the Anti-Image option produces the inverse of the correlation and covariance matrices, then why is it different from the table produced with the help of the Inverse option, which also produces the inverse of the correlation matrix?

Regards

Radha


Neeraj Kaushik

Jan 17, 2013, 3:24:20 AM
to dataanalys...@googlegroups.com
Dear Radha

I'm so happy that you scan each and every word and work on it.
That was a typing mistake on my end; actually, the Anti-Image option produces the anti-image of the correlation & covariance matrices.

Please find attached a file explaining a little about the concepts of EFA.

Best wishes
Neeraj
Lecture 4 --- notes on PRINCIPAL COMPONENTS ANALYSIS AND FACTOR ANALYSIS1.pdf

vasudha dhingra

Jan 21, 2013, 2:07:33 AM
to dataanalys...@googlegroups.com
Dear Sir

Kindly clarify my doubt about factor analysis for my data.

I am conducting research on 15 employee development practices in the telecom industry.

One component of the questionnaire is: "Mark the employee development methods used in your organization with the following indicator:"

1 : Once a month
2 : Once in 3 months (Quarterly)
3 : Once in 6 months (Half-yearly)
4 : Once a year (Annually)
5 : Once in 2 years
6 : Once in more than 2 years
7 : Never

There are 15 practices, each with the above-mentioned indicator.

Now, can I conduct factor analysis on this component? I have been told I can't, since the measurement is not on a scale.

Waiting anxiously for your response.

Regards

Vasudha


VARUN ARORA

Jan 21, 2013, 8:01:25 PM
to dataanalys...@googlegroups.com
It is for sure a scaled response.

vasudha dhingra

Jan 21, 2013, 11:10:42 PM
to dataanalys...@googlegroups.com
Thanks, Varun, for your reply, but my question still remains: can factor analysis be conducted?

According to me, it's an ordinal scale.

Thanks

Vasudha

Neeraj Kaushik

Jan 21, 2013, 11:48:43 PM
to dataanalys...@googlegroups.com
Dear Vasudha

It's indeed a tough question, as your data is ordinal. Here we can't say that whatever difference there is between 1 & 2 is the same as between 2 & 3. So you can use descriptive analysis (comparison on the basis of the mean).

Best wishes
Neeraj

VARUN ARORA

Jan 23, 2013, 2:05:21 AM
to dataanalys...@googlegroups.com
All Likert-scale data on which you carry out factor analysis is ordinal data. If this is one of the items, then you could carry out factor analysis. However, if it is the only determinant of an outcome, then Neeraj is right.

Madhusmita Choudhury

Jul 28, 2013, 5:14:03 PM
to dataanalys...@googlegroups.com, spsstraining
Sir, I have queries on the following sections, as per your discussion:

1. Determinant: "If determinant value < .00001 then there's a problem of multi-collinearity. In such cases, identify r > 0.8; these statements can be rejected, but wait for now."

In this table (which I copied from SPSS), the determinant value is = .000.

2. "We can compute KMO for each statement too. The Anti-Image option is for that. The diagonal elements of this matrix are actually the KMO values for each statement. The off-diagonal values should be zero or very small. Remove the statement where the diagonal value < 0.5."

As per this, I have deleted the covariance matrix in the anti-image table & attached the file.
1. Kindly correct me: are the values which I have marked in yellow the individual KMO values?
2. How do I check the off-diagonal values (I wasn't able to understand)? Kindly mark them with another colour & let me know.
3. Please give an example of which variable to remove, as for any one variable the maximum value is negative.

3. "Finally the option Reproduced. SPSS will re-generate a correlation matrix on the basis of the factors extracted and will compare it with the original correlation matrix. The differences between the two are called residuals, and ideally the residuals should be zero (or < 0.05). If 50% of the residuals are under 0.05 then it's OK, else it signifies some problem."

In my Reproduced table, the value is: "There are 83 (43.0%) nonredundant residuals with absolute values greater than 0.05." What inference shall I make out of it?

Kindly answer these queries of mine.
Thanks & Regards
 

sunita- anti image.xlsx