Identifiability in Exploratory Factor Analysis


Mike Lawrence

Nov 20, 2016, 5:28:21 PM
to Stan users mailing list
As a build-up to the full exploratory factor analysis that we talked about earlier (https://groups.google.com/forum/#!category-topic/stan-users/general/KlwFwrsCBZs), I thought I'd code up a version that takes the number of latent factors as data. Below is my attempt, but I fear I've run into identifiability issues, since my posterior samples are multimodal. To test my code, I'm using the same data as in the confirmatory factor analysis version the list helped with earlier (https://groups.google.com/forum/#!category-topic/stan-users/general/vkv_-9qW2IQ), but the solution suggested there (constraining all factor loadings to be positive) won't work, since in EFA each factor can potentially load on all outcomes. Thoughts?

Here's the model:

data{
  # N: number of subjects
  int N ;
  # K: number of outcomes per subject
  int K ;
  # Y: matrix of outcomes
  matrix[N,K] Y ;
  # F: number of latent factors
  int F ;
}
transformed data{
  # zeroesForFacMeans: vector of zeroes for factor means
  vector[F] zeroesForFacMeans ;
  zeroesForFacMeans = rep_vector(0,F) ;
}
parameters{
  # facCor: correlations amongst the F latent factors
  corr_matrix[F] facCor ;
  # subjFacs: matrix of values for each factor associated with each subject
  matrix[N,F] subjFacs ;
  # outcomeMeans: means for each outcome
  vector[K] outcomeMeans ;
  # outcomeSds: SDs for each outcome
  vector<lower=0>[K] outcomeSds ;
  # betas: how each factor loads on to each outcome
  matrix[K,F] betas ;
}
model{
  # priors (assumes data have been scale()ed)
  facCor ~ lkj_corr(1) ;
  outcomeMeans ~ normal(0,1) ;
  outcomeSds ~ weibull(2,1) ;
  to_vector(betas) ~ normal(0,1) ;
  # generative model:
  for(n in 1:N){ # loop over subjects
    subjFacs[n,] ~ multi_normal(zeroesForFacMeans,facCor) ;
    for(k in 1:K){ # loop over outcomes
      Y[n,k] ~ normal( subjFacs[n,] * betas[k,]' + outcomeMeans[k] , outcomeSds[k] ) ;
    }
  }
}



Stephen Martin

Nov 20, 2016, 7:35:20 PM
to Stan users mailing list
Under this formulation, I don't think there is a way to address the multimodality. Honestly, I don't think EFA is generally identified. It extracts factors and labels them in descending order of eigenvalue, then applies a transform to the eigenvectors to produce a simple structure. I'm also willing to bet that if you ran an EFA multiple times, you might get similar loadings but with the latent variables permuted.

One could try imposing prior constraints where, e.g., the loading from 'factor 1' to item 1 puts most of the probability above 0 (centered, say, at 1), whereas the loadings from the other factors to that same item are spiked at 0 with some probability above and below; something like the sketch below. However, I'm guessing this would only identify one factor and its loadings on the items; the other F - 1 factors would probably still swap arbitrarily.
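Written against the betas[K,F] matrix in Mike's model above, the idea would replace the to_vector(betas) ~ normal(0,1) line with something like this (the scales 0.5 and 0.1 are arbitrary placeholders, not values anyone has tested):

  # pin down factor 1 via item 1
  betas[1,1] ~ normal(1, 0.5) ;    # item 1 on factor 1: mass above 0, centered at 1
  for(f in 2:F)
    betas[1,f] ~ normal(0, 0.1) ;  # item 1 on the other factors: spiked at 0
  for(k in 2:K)
    betas[k] ~ normal(0, 1) ;      # remaining loadings stay weakly informative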

I'd be interested in solutions to this too, but I'm guessing there isn't one.

I *have* tried using Stan to estimate the *number* of latent factors for EFA, but it didn't really work. One could fit a mixture of K models to estimate p(k=K|D), but I opted for something more EFA-flavored (roughly sketched below), and it failed:
1) Use multi_normal as the likelihood and estimate Sigma.
2) On each iteration, find the eigenvalues of Sigma.
3) Examine the posterior distributions of the eigenvalues to assess the plausible number of factors to extract.
This didn't work well; the eigenvalue decomposition was too expensive.
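In outline it looked something like this (a reconstruction with placeholder priors, not the exact code):

  data{
    int N ;            # number of observations
    int K ;            # number of outcomes
    vector[K] Y[N] ;   # data, assumed centered
  }
  parameters{
    cov_matrix[K] Sigma ;
  }
  model{
    # flat prior on Sigma here just to keep the sketch short
    Y ~ multi_normal(rep_vector(0,K), Sigma) ;
  }
  generated quantities{
    vector[K] evals ;
    # eigenvalues_sym() returns eigenvalues in ascending order;
    # their posterior spread is what step 3 inspects
    evals = eigenvalues_sym(Sigma) ;
  }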

Maxwell Joseph

Nov 21, 2016, 11:39:54 AM
to Stan users mailing list
You might check out this recent paper by Leung & Drton, who devise priors for the loading matrix that they claim lead to an identified model: http://arxiv.org/pdf/1409.7672

Here's a little simulation and a rough translation to Stan syntax (though some of my priors are different): https://gist.github.com/mbjoseph/952b807bf5aad4a72a9d865f84d67afa

It's a little more complicated than a typical EFA because I have coded it to accommodate and estimate missing observations.
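The missing-value handling follows the usual Stan pattern of treating each missing cell as a parameter; in outline (placeholder names and a placeholder likelihood, not the gist verbatim):

  data{
    int n ;
    int m ;
    int n_miss ;
    int row_miss[n_miss] ;    // row index of each missing cell
    int col_miss[n_miss] ;    // column index of each missing cell
    matrix[n,m] y_obs ;       // observed data; missing cells hold any placeholder
  }
  parameters{
    vector[n_miss] y_miss ;   // each missing value becomes a parameter
  }
  transformed parameters{
    matrix[n,m] y ;
    y = y_obs ;
    for(i in 1:n_miss)
      y[row_miss[i], col_miss[i]] = y_miss[i] ;  // splice imputed values in
  }
  model{
    to_vector(y) ~ normal(0,1) ;  // placeholder; the factor-analysis
                                  // likelihood would go here
  }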

Maybe useful?

Mike Lawrence

Nov 24, 2016, 7:41:54 AM
to stan-...@googlegroups.com
Thanks! Just checking this out now. When you're generating fake data, your procedure for generating the Betas ends up always yielding 0 for Beta[1,2]; is this intentional?


Oh, and instead of:

    m_init <- stan("stan/leung_drton_factor_analysis.stan", 
                   iter = 2, chains = 1, data = stan_d)
    m_fit <- stan(fit = m_init, chains = 3, iter = 2000, 
                  pars = c("Sigma", "sigma_L", "y_miss"), 
                  init_r = .1)

You should do (ignore my linebreaks):
    
    m_mod <- rstan::stan_model("stan/leung_drton_factor_analysis.stan")
    m_fit <- rstan::sampling(
        object = m_mod
        , data = stan_d
        , chains = 3
        , iter = 2000
        , pars = c("Sigma", "sigma_L", "y_miss")
        , init_r = .1 
    )





Bob Carpenter

Nov 25, 2016, 6:32:57 PM
to stan-...@googlegroups.com
Thanks for sharing this. You really want to take this loop:


  for (i in 1:n) {
    y[i, ] ~ multi_normal_cholesky(zeros, L_Sigma);
  }

and replace it with the much faster vectorized form:

y ~ multi_normal_cholesky(zeros, L_Sigma);

And you don't want to do this:


Sigma = diag_matrix(sigma_y) + multiply_lower_tri_self_transpose(L);

as it adds a bunch of unnecessary zeroes in that diag_matrix() call;
instead, you're better off with the loop:

  Sigma = multiply_lower_tri_self_transpose(L);
  for (m in 1:M)
    Sigma[m, m] = Sigma[m, m] + sigma_y[m];

We really need an add_diag(matrix, vector):matrix function that does
this in one line, but we don't have it yet and using diag_matrix is
inefficient.
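In the meantime, a user-defined function can stand in for it (a sketch; this is not a built-in):

  functions{
    // stand-in for the missing add_diag() built-in
    matrix add_diag(matrix A, vector v){
      matrix[rows(A), cols(A)] out ;
      out = A ;
      for (m in 1:rows(A))
        out[m, m] = out[m, m] + v[m] ;
      return out ;
    }
  }

so the covariance construction becomes a one-liner:
Sigma = add_diag(multiply_lower_tri_self_transpose(L), sigma_y);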

- Bob

Maxwell Joseph

Nov 25, 2016, 9:50:50 PM
to Stan users mailing list
Thanks Bob and Mike for your improvements. Updated gist is here: https://gist.github.com/mbjoseph/952b807bf5aad4a72a9d865f84d67afa

Mike, the loading matrix had all zeros in the upper triangular part on purpose, but the model should still work even when the true loading matrix doesn't have that structure.

Bob, I was able to loop over the diagonal of the covariance matrix instead of adding diag_matrix(...), but I don't think I can remove the loop over the multivariate likelihood, since y is a matrix. When I tried to replace this:


  for (i in 1:n) {
    y[i, ] ~ multi_normal_cholesky(zeros, L_Sigma);
  }

with this:

  y ~ multi_normal_cholesky(zeros, L_Sigma);

I get this syntax error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Warning: integer division implicitly rounds to integer. Found int division: (k * (k - 1)) / 2
 Positive values rounded down, negative values rounded up or down in platform-dependent way.
No matches for:

  matrix ~ multi_normal_cholesky(vector, matrix)

Available argument signatures for multi_normal_cholesky:

  vector ~ multi_normal_cholesky(vector, matrix)
  vector ~ multi_normal_cholesky(row vector, matrix)
  row vector ~ multi_normal_cholesky(vector, matrix)
  row vector ~ multi_normal_cholesky(row vector, matrix)
  vector ~ multi_normal_cholesky(vector[], matrix)
  vector ~ multi_normal_cholesky(row vector[], matrix)
  row vector ~ multi_normal_cholesky(vector[], matrix)
  row vector ~ multi_normal_cholesky(row vector[], matrix)
  vector[] ~ multi_normal_cholesky(vector, matrix)
  vector[] ~ multi_normal_cholesky(row vector, matrix)
  row vector[] ~ multi_normal_cholesky(vector, matrix)
  row vector[] ~ multi_normal_cholesky(row vector, matrix)
  vector[] ~ multi_normal_cholesky(vector[], matrix)
  vector[] ~ multi_normal_cholesky(row vector[], matrix)
  row vector[] ~ multi_normal_cholesky(vector[], matrix)
  row vector[] ~ multi_normal_cholesky(row vector[], matrix)

require real scalar return type for probability function.

ERROR at line 80

 79:      // likelihood
 80:      y ~ multi_normal_cholesky(zeros, L_Sigma);
                                                    ^
 81:    }

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  :
  failed to parse Stan model 'leung_drton_factor_analysis' due to the above error.

Mike Lawrence

Nov 26, 2016, 6:27:05 PM
to stan-...@googlegroups.com
You need to define y as:

vector[n] y[m];

(Possibly I have n and m switched; I'm not near a computer to check.)

Maxwell Joseph

Nov 26, 2016, 9:01:20 PM
to Stan users mailing list

Thanks Mike - very cool. I hadn't used the vectorized multivariate normal lpdf before.

You were right, n and m were switched (it is vector[m] y[n]), updated here: https://gist.github.com/mbjoseph/952b807bf5aad4a72a9d865f84d67afa
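For reference, the working combination in isolation (a minimal illustration using the gist's n and m, not the full model):

  data{
    int n ;
    int m ;
    vector[m] y[n] ;   // array of n vectors, each of length m
  }
  parameters{
    cholesky_factor_cov[m] L_Sigma ;
  }
  model{
    vector[m] zeros ;
    zeros = rep_vector(0, m) ;
    // vector[m] y[n] matches the vector[] ~ multi_normal_cholesky(vector, matrix)
    // signature from the list above, so the statement vectorizes with no loop
    y ~ multi_normal_cholesky(zeros, L_Sigma) ;
  }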

Mike Lawrence

Nov 27, 2016, 8:24:48 PM
to stan-...@googlegroups.com
So, to clarify, this model achieves identifiability by enforcing the constraint that the first item loads only on the first factor, the second item loads only on the first two factors, the third item loads only on the first three factors, and so on? This, at least, is the structure I get for the parameter L, which I presume is the loading matrix. Am I correct?


--
Mike Lawrence
Graduate Student
Department of Psychology & Neuroscience
Dalhousie University

~ Certainty is (possibly) folly ~


Maxwell Joseph

Nov 27, 2016, 9:27:19 PM
to Stan users mailing list
That's my understanding - along with the priors for the diagonal elements of the loading matrix (which is the novel contribution of the Leung and Drton paper).
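For anyone landing here later, a generic version of that lower-triangular construction looks like this (the identifiers m, k, L_diag, L_lower are illustrative, and the normal(0,1) priors are placeholders; see the gist and paper for the actual Leung & Drton priors on the diagonal):

  data{
    int m ;   // number of items
    int k ;   // number of factors (k <= m)
  }
  parameters{
    vector<lower=0>[k] L_diag ;                  // positive diagonal loadings
    // count of strictly-lower-triangular entries; the (k*(k - 1))/2 is the
    // integer division the parser warned about above, harmless here since
    // k*(k-1) is always even
    vector[k*(m - k) + (k*(k - 1))/2] L_lower ;
  }
  transformed parameters{
    matrix[m,k] L ;
    {
      int idx ;
      idx = 1 ;
      for (j in 1:k){
        for (i in 1:m){
          if (i < j)
            L[i,j] = 0 ;              // upper triangle fixed at zero
          else if (i == j)
            L[i,j] = L_diag[j] ;      // item j loads positively on factor j
          else {
            L[i,j] = L_lower[idx] ;   // free loadings below the diagonal
            idx = idx + 1 ;
          }
        }
      }
    }
  }
  model{
    L_diag ~ normal(0, 1) ;   // placeholder priors, not the
    L_lower ~ normal(0, 1) ;  // Leung & Drton priors
  }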