Dear Biogeme Support Team,
I hope this message finds you well.
I am currently using Biogeme version 3.3.1 and encountering a memory issue when estimating a discrete choice model that requires second derivatives with a large number of draws.
When running the model with 1000 draws, I get the following error:
I have tried the following to address the issue:
- Using the latest Biogeme version.
- Running the script directly from the shell (not from Jupyter).
- Avoiding second derivatives, or using BHHH instead, but the resulting estimates are then unreliable (similar to the example here: https://biogeme.epfl.ch/sphinx/auto_examples/latent/plot_b01_mimic_discrete.html#sphx-glr-auto-examples-latent-plot-b01-mimic-discrete-py); a sketch of what I tried is shown right after this list.
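For reference, this is roughly how I tried to avoid the analytical second derivatives. I am assuming here that the second_derivatives parameter from biogeme.toml can be overridden as a keyword argument of the BIOGEME constructor, in the same way as number_of_draws in the script below; please correct me if that name or mechanism has changed in 3.3.1.
# Assumed keyword override of the second_derivatives parameter
# (proportion of iterations using analytical second derivatives).
biogeme = BIOGEME(
    database,
    loglike,
    number_of_draws=NUMBER_OF_DRAWS,
    second_derivatives=0.0,
)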
I understand that large datasets and many draws may require Linux or macOS for better memory handling, but I would like to know whether there are any recommended settings, XLA backend configurations, or memory-efficient approaches that would allow estimation with second derivatives and a larger number of draws on my current Windows setup.
A minimal reproducible example is included below:
# ----------------------
# Hybrid Choice Model: Walk vs Others
# ----------------------
import numpy as np
import pandas as pd
from biogeme.biogeme import BIOGEME
import biogeme.database as db
from biogeme.expressions import (
    Beta, Variable, Draws, LinearTermTuple, LinearUtility, Numeric,
    MonteCarlo, log, exp, Elem, MultipleProduct, NormalCdf,
)
from biogeme.models import logit
from biogeme.results_processing import get_pandas_estimated_parameters
# ----------------------
# User settings
# ----------------------
DATA_FILE = "C:\\Users\\tariq\\Desktop\\MotionTag Final Data App\\MT_walk.csv"
MODEL_NAME = "hybrid_walk_vs_others"
NUMBER_OF_DRAWS = 1000
CHOICE_VAR = "CHOICE"
INDICATORS = ["ACCW1","ACCW2","ACCW3","ACCW4"]
DISCRETE_VALUES = [1,2,3,4,5]
MISSING_VALUES = [-1,-2,99]
# ----------------------
# Read CSV and prepare database
# ----------------------
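# Load the data and keep only the two alternatives used in this example:
# 1 = others, 2 = walk (see the utility definitions further below).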
df = pd.read_csv(DATA_FILE)
df = df[df[CHOICE_VAR].isin([1,2])]
if "normalized_weight" not in df.columns:
df["normalized_weight"] = 1.0
database = db.Database("data", df)
# ----------------------
# Variables
# ----------------------
Choice = Variable(CHOICE_VAR)
Den_Recretional_Act_Origin = Variable("Den_Recretional_Act_Origin")
# ----------------------
# Latent variable accw
# ----------------------
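# Structural equation: accw = b_Den * Den_Recretional_Act_Origin + sigma * xi,
# where xi is a standard normal error simulated with antithetic MLHS draws.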
b_Den = Beta("struct_accw_Den", 0.0, None, None, 0)
sigma_accw = Beta("struct_accw_sigma", 1.0, None, None, 0)
accw_linear = LinearUtility([LinearTermTuple(b_Den, Den_Recretional_Act_Origin)])
accw = accw_linear + sigma_accw * Draws("struct_accw_error", "NORMAL_MLHS_ANTI")
# ----------------------
# Ordered probit for measurement equations
# ----------------------
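# The ordered probit maps the continuous latent response to the discrete
# indicator categories through a set of increasing thresholds.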
def ordered_probit(continuous_value, scale_parameter, values, thresholds):
    probs = {}
    probs[values[0]] = NormalCdf((thresholds[0] - continuous_value) / scale_parameter)
    for i in range(1, len(values) - 1):
        probs[values[i]] = NormalCdf((thresholds[i] - continuous_value) / scale_parameter) - NormalCdf(
            (thresholds[i - 1] - continuous_value) / scale_parameter
        )
    probs[values[-1]] = 1 - NormalCdf((thresholds[-1] - continuous_value) / scale_parameter)
    return probs
# Example measurement equation
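# Symmetric thresholds around zero, built from strictly positive deltas
# (the exp of the log-parameters keeps d1 and d2 positive).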
log_d1 = Beta("log_delta_1", np.log(0.3), None, None, 0)
log_d2 = Beta("log_delta_2", np.log(0.8), None, None, 0)
d1, d2 = exp(log_d1), exp(log_d2)
thresholds = [-d1 - d2, -d1, d1, d1 + d2]
def measurement_likelihood(latent, indicators):
    factors = []
    for ind in indicators:
        intercept = Beta(f"meas_intercept_{ind}", 0.0, None, None, 0)
        # The loading of the first indicator is fixed to 1 for identification
        loading = Numeric(1.0) if ind == indicators[0] else Beta(f"meas_coeff_{ind}", 0.0, None, None, 0)
        scale = Beta(f"meas_scale_{ind}", 1.0, None, None, 0)
        probs = ordered_probit(intercept + loading * latent, scale, DISCRETE_VALUES, thresholds)
        # Missing values contribute a factor of 1 (i.e. they are ignored)
        for mv in MISSING_VALUES:
            probs[mv] = Numeric(1.0)
        factors.append(Elem(probs, Variable(ind)))
    return MultipleProduct(factors)
meas_like = measurement_likelihood(accw, INDICATORS)
# ----------------------
# Choice utilities
# ----------------------
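# Alternative 1 = others (reference), alternative 2 = walk; the latent
# variable accw enters the walk utility (hybrid choice structure).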
ASC_others = Numeric(0.0)
ASC_walk = Beta("asc_walk",0.0,None,None,0)
beta_accw_walk = Beta("beta_accw_walk",0.0,None,None,0)
beta_Den_walk = Beta("beta_Den_walk",0.0,None,None,0)
v = {
    1: ASC_others,
    2: ASC_walk + beta_accw_walk * accw + beta_Den_walk * Den_Recretional_Act_Origin,
}
# ----------------------
# Log-likelihood
# ----------------------
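# The conditional likelihood multiplies the choice probability by the
# measurement likelihood; MonteCarlo averages it over the draws of the latent
# error, and the log of the simulated likelihood is taken afterwards.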
choice_like = logit(v, None, Choice)
conditional_like = choice_like * meas_like
loglike = log(MonteCarlo(conditional_like))
# ----------------------
# Estimate model
# ----------------------
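# 1000 draws is the setting at which the memory error appears on my machine.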
biogeme = BIOGEME(database, loglike, number_of_draws=NUMBER_OF_DRAWS)
biogeme.model_name = MODEL_NAME
results = biogeme.estimate()
print("Number of estimated parameters:", results.number_of_parameters)
print("Final log-likelihood:", results.final_log_likelihood)
# ----------------------
# Show results
# ----------------------
df_res = get_pandas_estimated_parameters(results)
print(df_res)
Thank you very much for your guidance.
Best regards,