LC model with weighted panel data


Jingzhu Shan

Aug 21, 2022, 11:08:25 AM
to Biogeme

Dear Professor, 

I am trying to estimate a latent class model in which the class membership model includes a socio-economic variable. The data file is organized as panel data, and I want to use weights. I remember you said that weights do not work with panel data, so I "flattened" my data file so that all the observations for the same individual are on the same row.
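For readers unfamiliar with the flattening step, here is a minimal sketch of what it means, written in plain Python so that it has no dependencies. The column names `ID`, `WEIGHT`, `DIST_A` and the helper `flatten_panel` are hypothetical and only serve the illustration; the `_{q}` suffix follows the naming used in the script below (`DIST_A_0`, `CHOICE_0`, ...).

```python
from collections import defaultdict


def flatten_panel(rows, id_key, constant_keys=()):
    """Turn long-format rows (one per choice task) into one wide row per individual.

    Hypothetical helper: columns listed in constant_keys are copied once,
    all other columns get a _{q} suffix, where q is the task index (from 0).
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_key]].append(row)

    flat = []
    for individual_id, observations in grouped.items():
        wide = {id_key: individual_id}
        # Individual-level columns (e.g. a weight) do not vary across tasks.
        for key in constant_keys:
            wide[key] = observations[0][key]
        # Task-level columns are repeated once per task.
        for q, row in enumerate(observations):
            for key, value in row.items():
                if key != id_key and key not in constant_keys:
                    wide[f'{key}_{q}'] = value
        flat.append(wide)
    return flat


# Made-up long-format data: two individuals, two choice tasks each.
long_rows = [
    {'ID': 1, 'WEIGHT': 1.2, 'DIST_A': 5, 'CHOICE': 1},
    {'ID': 1, 'WEIGHT': 1.2, 'DIST_A': 8, 'CHOICE': 2},
    {'ID': 2, 'WEIGHT': 0.8, 'DIST_A': 3, 'CHOICE': 3},
    {'ID': 2, 'WEIGHT': 0.8, 'DIST_A': 7, 'CHOICE': 1},
]
flat_rows = flatten_panel(long_rows, 'ID', constant_keys=('WEIGHT',))
for row in flat_rows:
    print(row)
```

After this step each row describes one individual, so a weight column applies to the individual rather than to a single choice task.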

I first tried a latent class model without weights using the flattened data. However, the results are quite different from those obtained with the panel data. (I am sure that the flattened data is organized according to your suggestion, because I used it to estimate a logit model and the results are the same as those obtained with the panel data.) Could you please help me check whether there are any errors in my syntax?

Thanks for your help!

 

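# Imports (assumed: Biogeme 3.2.x module layout)
import pandas as pd
import biogeme.database as db
import biogeme.biogeme as bio
import biogeme.messaging as msg
from biogeme import models
from biogeme.expressions import Beta, Variable, bioMultSum, exp, log

# Read the data
df = pd.read_excel('BW20.xlsx')
database = db.Database('BW20', df)

# Number of observations for each individual.
nbrQuestions = 8

# Expose the columns of the data file (e.g. LNAGE) as Python variables.
globals().update(database.variables)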

 

# Parameters to be estimated. One version for each latent class.
ASC_NONE_1 = Beta('ASC_NONE_class_1', 0, None, None, 0)
B_DIST_1 = Beta('B_DIST_class_1', 0, None, None, 0)
B_NUM_1 = Beta('B_NUM_class_1', 0, None, None, 0)

ASC_NONE_2 = Beta('ASC_NONE_class_2', 0, None, None, 0)
B_DIST_2 = Beta('B_DIST_class_2', 0, None, None, 0)
B_NUM_2 = Beta('B_NUM_class_2', 0, None, None, 0)

 

# Definition of new variables: one per attribute and per question
DIST_A = [Variable(f'DIST_A_{q}') for q in range(nbrQuestions)]
NUM_A = [Variable(f'NUM_A_{q}') for q in range(nbrQuestions)]

DIST_B = [Variable(f'DIST_B_{q}') for q in range(nbrQuestions)]
NUM_B = [Variable(f'NUM_B_{q}') for q in range(nbrQuestions)]

A_SP_F = [Variable(f'A_SP_{q}') for q in range(nbrQuestions)]
B_SP_F = [Variable(f'B_SP_{q}') for q in range(nbrQuestions)]
C_SP_F = [Variable(f'C_SP_{q}') for q in range(nbrQuestions)]
CHOICE = [Variable(f'CHOICE_{q}') for q in range(nbrQuestions)]

 

# Utility functions
# (the post mixes NUM25_A/NUM25_B with NUM_A/NUM_B; the names NUM_A/NUM_B defined earlier are assumed here)
V11 = [
    B_DIST_1 * DIST_A[q] + B_NUM_1 * NUM_A[q]
    for q in range(nbrQuestions)
]
V21 = [
    B_DIST_1 * DIST_B[q] + B_NUM_1 * NUM_B[q]
    for q in range(nbrQuestions)
]
V31 = ASC_NONE_1

V12 = [
    B_DIST_2 * DIST_A[q] + B_NUM_2 * NUM_A[q]
    for q in range(nbrQuestions)
]
V22 = [
    B_DIST_2 * DIST_B[q] + B_NUM_2 * NUM_B[q]
    for q in range(nbrQuestions)
]
V32 = ASC_NONE_2

 

# Associate utility functions and availability conditions with the alternatives
V1 = [
    {1: V11[q], 2: V21[q], 3: V31}
    for q in range(nbrQuestions)
]
V2 = [
    {1: V12[q], 2: V22[q], 3: V32}
    for q in range(nbrQuestions)
]
av = [
    {1: A_SP_F[q], 2: B_SP_F[q], 3: C_SP_F[q]}
    for q in range(nbrQuestions)
]

# Definition of the model. This is the contribution of each observation to the log likelihood function.
prob1 = [
    models.logit(V1[q], av[q], Variable(f'CHOICE_{q}'))
    for q in range(nbrQuestions)
]
prob2 = [
    models.logit(V2[q], av[q], Variable(f'CHOICE_{q}'))
    for q in range(nbrQuestions)
]

 

# Parameters for the class membership model
ASC_vs1 = Beta('ASC_vs1', 0, None, None, 0)
ASC_vs2 = Beta('ASC_vs2', 0, None, None, 1)
B_AGE = Beta('B_AGE', 0, None, None, 0)

# Class membership utilities (the post writes B_ASC_AGE here; B_AGE, defined above, is assumed)
VS1 = ASC_vs1 + B_AGE * LNAGE
VS2 = ASC_vs2

PROB_CLASS1 = exp(VS1) / (exp(VS1) + exp(VS2))
PROB_CLASS2 = exp(VS2) / (exp(VS1) + exp(VS2))

# Class probabilities are mixed separately for each question q
conlike = [
    PROB_CLASS1 * prob1[q] + PROB_CLASS2 * prob2[q]
    for q in range(nbrQuestions)
]
logprob = [
    log(conlike[q])
    for q in range(nbrQuestions)
]

 

biogeme = bio.BIOGEME(database, bioMultSum(logprob))
biogeme.modelName = 'CHOICE_MODEL'

# Define level of verbosity
logger = msg.bioMessage()
# logger.setSilent()
# logger.setWarning()
logger.setGeneral()
# logger.setDetailed()

# Estimate the parameters.
results = biogeme.estimate()
pandasResults = results.getEstimatedParameters()
print(pandasResults)
pandasGeneralStat = results.getGeneralStatistics()
print(pandasGeneralStat)

lc model (flatten data).py

Bierlaire Michel

Aug 22, 2022, 2:46:18 AM
to jingz...@gmail.com, Bierlaire Michel, Biogeme
- The models for the two classes look identical. Because of this symmetry, it is quite difficult for an algorithm to find the maximum likelihood estimates, and it is likely to be trapped in a local optimum.
- The class membership characterizes the individual. So you need to calculate the probability of the trajectory (the full sequence of choices of the individual) conditional on the class, and then calculate the mean across classes, weighted by the class membership probabilities. Here, it seems that you are applying the class membership to each observation.
- I don't see any panel effect to capture serial correlation. So there is no point in considering the panel nature of the data, as you assume the observations to be independent anyway.
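
The second point can be illustrated with a small numerical sketch in plain Python (no Biogeme involved). All utilities, choices and the class probability below are made-up values, and the function names are hypothetical. The sketch compares the log likelihood of one individual when the classes are mixed over the whole trajectory with the value obtained when the classes are mixed separately for each observation, as the posted script appears to do.

```python
import math


def logit_prob(utilities, chosen):
    """Multinomial logit probability of the chosen alternative."""
    denominator = sum(math.exp(v) for v in utilities.values())
    return math.exp(utilities[chosen]) / denominator


def individual_loglike(pi1, p1, p2):
    """Mix the classes over the trajectory: log(pi1 * prod_t p1[t] + pi2 * prod_t p2[t])."""
    return math.log(pi1 * math.prod(p1) + (1.0 - pi1) * math.prod(p2))


def observation_loglike(pi1, p1, p2):
    """Mix the classes per observation: sum_t log(pi1 * p1[t] + pi2 * p2[t])."""
    return sum(math.log(pi1 * a + (1.0 - pi1) * b) for a, b in zip(p1, p2))


# Made-up utilities of one individual for three choice tasks, per class.
V_class1 = [{1: 1.0, 2: 0.0, 3: 0.0}, {1: 0.8, 2: 0.1, 3: 0.0}, {1: 1.2, 2: -0.3, 3: 0.0}]
V_class2 = [{1: -1.0, 2: 0.0, 3: 0.0}, {1: -0.5, 2: 0.4, 3: 0.0}, {1: -0.9, 2: 0.2, 3: 0.0}]
choices = [1, 1, 1]
pi1 = 0.3  # assumed probability of belonging to class 1

# Choice probabilities of the observed choices, conditional on each class.
p1 = [logit_prob(v, c) for v, c in zip(V_class1, choices)]
p2 = [logit_prob(v, c) for v, c in zip(V_class2, choices)]

ll_individual = individual_loglike(pi1, p1, p2)
ll_observation = observation_loglike(pi1, p1, p2)
print('trajectory-level mixing :', ll_individual)
print('observation-level mixing:', ll_observation)
```

The two values differ whenever the classes predict different choice probabilities, so the two formulations are different models. In terms of the posted script, the trajectory-level version would correspond to multiplying `prob1[q]` over all `q` and `prob2[q]` over all `q` before combining them with `PROB_CLASS1` and `PROB_CLASS2`, and taking a single log per row of the flattened data.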

