InLatin factor means simply "doer". So in English a factor is an "actor" or element or ingredient in some situation or quantity. Charm can be a factor in someone's success, and lack of exercise can be a factor in producing a poor physique. In math we use factor to mean a number that can be multiplied or divided to produce a given number (for example, 5 and 8 are factors of 40). And in biology a gene may be called a factor, since genes are ingredients in the total organism.
Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors plus "error" terms, hence factor analysis can be thought of as a special case of errors-in-variables models.[1]
A common rationale behind factor analytic methods is that the information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor analysis is commonly used in psychometrics, personality psychology, biology, marketing, product management, operations research, finance, and machine learning. It may help to deal with data sets where there are large numbers of observed variables that are thought to reflect a smaller number of underlying/latent variables. It is one of the most commonly used inter-dependency techniques and is used when the relevant set of variables shows a systematic inter-dependence and the objective is to find out the latent factors that create a commonality.
Suppose a psychologist has the hypothesis that there are two kinds of intelligence, "verbal intelligence" and "mathematical intelligence", neither of which is directly observed.[note 1] Evidence for the hypothesis is sought in the examination scores from each of 10 different academic fields of 1000 students. If each student is chosen randomly from a large population, then each student's 10 scores are random variables. The psychologist's hypothesis may say that for each of the 10 academic fields, the score averaged over the group of all students who share some common pair of values for verbal and mathematical "intelligences" is some constant times their level of verbal intelligence plus another constant times their level of mathematical intelligence, i.e., it is a linear combination of those two "factors". The numbers for a particular subject, by which the two kinds of intelligence are multiplied to obtain the expected score, are posited by the hypothesis to be the same for all intelligence level pairs, and are called "factor loading" for this subject. [clarification needed] For example, the hypothesis may hold that the predicted average student's aptitude in the field of astronomy is
The observable data that go into factor analysis would be 10 scores of each of the 1000 students, a total of 10,000 numbers. The factor loadings and levels of the two kinds of intelligence of each student must be inferred from the data.
Note that, since any rotation of a solution is also a solution, this makes interpreting the factors difficult. See disadvantages below. In this particular example, if we do not know beforehand that the two types of intelligence are uncorrelated, then we cannot interpret the two factors as the two different types of intelligence. Even if they are uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which corresponds to mathematical intelligence without an outside argument.
The values of the loadings L \displaystyle L , the averages μ \displaystyle \mu , and the variances of the "errors" ε \displaystyle \varepsilon must be estimated given the observed data X \displaystyle X and F \displaystyle F (the assumption about the levels of the factors is fixed for a given F \displaystyle F ). The "fundamental theorem" may be derived from the above conditions:
The term on the left is the ( a , b ) \displaystyle (a,b) -term of the correlation matrix (a p p \displaystyle p\times p matrix derived as the product of the p N \displaystyle p\times N matrix of standardized observations with its transpose) of the observed data, and its p \displaystyle p diagonal elements will be 1 \displaystyle 1 s. The second term on the right will be a diagonal matrix with terms less than unity. The first term on the right is the "reduced correlation matrix" and will be equal to the correlation matrix except for its diagonal values which will be less than unity. These diagonal elements of the reduced correlation matrix are called "communalities" (which represent the fraction of the variance in the observed variable that is accounted for by the factors):
This is equivalent to minimizing the off-diagonal components of the error covariance which, in the model equations have expected values of zero. This is to be contrasted with principal component analysis which seeks to minimize the mean square error of all residuals.[3] Before the advent of high-speed computers, considerable effort was devoted to finding approximate solutions to the problem, particularly in estimating the communalities by other means, which then simplifies the problem considerably by yielding a known reduced correlation matrix. This was then used to estimate the factors and the loadings. With the advent of high-speed computers, the minimization problem can be solved iteratively with adequate speed, and the communalities are calculated in the process, rather than being needed beforehand. The MinRes algorithm is particularly suited to this problem, but is hardly the only iterative means of finding a solution.
The goal of factor analysis is to choose the fitting hyperplane such that the reduced correlation matrix reproduces the correlation matrix as nearly as possible, except for the diagonal elements of the correlation matrix which are known to have unit value. In other words, the goal is to reproduce as accurately as possible the cross-correlations in the data. Specifically, for the fitting hyperplane, the mean square error in the off-diagonal components
The term on the right is just the covariance of the errors. In the model, the error covariance is stated to be a diagonal matrix and so the above minimization problem will in fact yield a "best fit" to the model: It will yield a sample estimate of the error covariance which has its off-diagonal components minimized in the mean square sense. It can be seen that since the z ^ a \displaystyle \hat z_a are orthogonal projections of the data vectors, their length will be less than or equal to the length of the projected data vector, which is unity. The square of these lengths are just the diagonal elements of the reduced correlation matrix. These diagonal elements of the reduced correlation matrix are known as "communalities":
Large values of the communalities will indicate that the fitting hyperplane is rather accurately reproducing the correlation matrix. The mean values of the factors must also be constrained to be zero, from which it follows that the mean values of the errors will also be zero.
Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and group items that are part of unified concepts.[4] The researcher makes no a priori assumptions about relationships among factors.[4]
Confirmatory factor analysis (CFA) is a more complex approach that tests the hypothesis that the items are associated with specific factors.[4] CFA uses structural equation modeling to test a measurement model whereby loading on the factors allows for evaluation of relationships between observed variables and unobserved variables.[4] Structural equation modeling approaches can accommodate measurement error and are less restrictive than least-squares estimation.[4] Hypothesized models are tested against actual data, and the analysis would demonstrate loadings of observed variables on the latent variables (factors), as well as the correlation between the latent variables.[4]
Principal component analysis (PCA) is a widely used method for factor extraction, which is the first phase of EFA.[4] Factor weights are computed to extract the maximum possible variance, with successive factoring continuing until there is no further meaningful variance left.[4] The factor model must then be rotated for analysis.[4]
Canonical factor analysis, also called Rao's canonical factoring, is a different method of computing the same model as PCA, which uses the principal axis method. Canonical factor analysis seeks factors that have the highest canonical correlation with the observed variables. Canonical factor analysis is unaffected by arbitrary rescaling of the data.
Common factor analysis, also called principal factor analysis (PFA) or principal axis factoring (PAF), seeks the fewest factors which can account for the common variance (correlation) of a set of variables.
Alpha factoring is based on maximizing the reliability of factors, assuming variables are randomly sampled from a universe of variables. All other methods assume cases to be sampled and variables fixed.
Researchers wish to avoid such subjective or arbitrary criteria for factor retention as "it made sense to me". A number of objective methods have been developed to solve this problem, allowing users to determine an appropriate range of solutions to investigate.[7] However these different methods often disagree with one another as to the number of factors that ought to be retained. For instance, the parallel analysis may suggest 5 factors while Velicer's MAP suggests 6, so the researcher may request both 5 and 6-factor solutions and discuss each in terms of their relation to external data and theory.
3a8082e126