WAIC for HDDM


John Clithero

Feb 16, 2018, 6:06:53 PM
to hddm-...@googlegroups.com
Hi everyone,

I was wondering if anyone out there has been computing the widely applicable information criterion (WAIC) for HDDM models.

I see an R package (loo) and a ported Python version. If anyone could point me to a resource that I might be able to use with HDDM in Python 2 (or 3, although I am running the models in 2), that would be awesome.

Happy to circle back with the list and organize what I find.

Best,
John

Samuel Mathias

Feb 16, 2018, 6:26:38 PM
to hddm-...@googlegroups.com
I’m unaware of any previous attempts to calculate WAIC for DDMs. My experience with WAIC has been restricted to quite simple models in pymc3, and to be honest I have not found it to be extremely informative. For every model I’ve tried beyond the most trivial examples, the WAIC calculation throws warnings about problematic values. It takes forever to calculate, is not (yet) widely used in the literature, and there is an example somewhere in the pymc3 docs showing how it favors the wrong model in a situation where the correct model is obvious (polynomial versus linear regression). I think it shares some properties with LOO-CV, which machine-learning friends of mine have warned me against because it can be highly variable, but that is anecdotal, second-hand evidence ...

Despite arguments that it favors overly complex models, I’ve found DIC to be helpful. Perhaps WAIC is better in many ways, but at some point there is a trade-off between how difficult a statistic is to calculate and how “correct” it is. The extreme example of this is the Bayes factor. From my reading I am convinced that this is the best model-comparison statistic out there, but its calculation is truly a minefield.
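For concreteness, the WAIC computation itself is simple once pointwise log-likelihoods are in hand; the hard part for the DDM is producing those log-likelihoods. A numpy sketch of the standard lppd / p_waic formulas, using a made-up log-likelihood matrix rather than anything DDM-specific:

```python
import numpy as np

# Sketch: log_lik is an (n_samples, n_obs) matrix of pointwise
# log-likelihoods over posterior samples. The values here are synthetic.
rng = np.random.default_rng(0)
log_lik = rng.normal(-1.0, 0.1, size=(2000, 50))

# log pointwise predictive density (logsumexp over samples for stability)
n_samples = log_lik.shape[0]
lppd = np.sum(np.logaddexp.reduce(log_lik, axis=0) - np.log(n_samples))

# effective number of parameters: per-observation variance of the log-lik
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))

waic = -2 * (lppd - p_waic)  # deviance scale, comparable to DIC
print(waic)
```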

Sorry to not be more helpful. Also I am not an expert statistician by any means so if anyone on the mailing list disagrees with me, please let me know.

--
Samuel R. Mathias, Ph.D.
Associate Research Scientist (ARS)
Neurocognition, Neurocomputation and Neurogenetics (n3) Division
Yale University School of Medicine
40 Temple Street, Room 694
New Haven CT 06510

--
You received this message because you are subscribed to the Google Groups "hddm-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hddm-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael J Frank

Feb 18, 2018, 9:53:25 AM
to hddm-...@googlegroups.com
We are currently implementing WAIC, and possibly LOO, in HDDM as part of an effort to expand its capabilities in various ways; we will post to this list as the added functionality becomes available. We will report back when we have some results on DDM model-recovery simulations using WAIC vs DIC.

Michael Spezio

May 27, 2019, 8:32:49 PM
to hddm-users
Does anyone know if adding WAIC to HDDM has happened yet or may happen in the near future? Thanks so much. -- Michael


Michael J Frank

May 28, 2019, 9:03:01 AM
to hddm-...@googlegroups.com
Unfortunately the person who started to implement that had to move on. We still have plans to incorporate WAIC and other methods in HDDM, but it has not yet been done and won't be part of the next release (Mads will send an email about that to the list this week).
 


hcp...@gmail.com

May 24, 2021, 12:16:53 AM
to hddm-users
Hi, Michael and other experts,

I am wondering about the status of incorporating WAIC and other methods in HDDM. Maybe this is naive, but is it possible to convert HDDM's model objects into ArviZ's InferenceData? That way we could use ArviZ's model-comparison methods.

Thomas Wiecki

May 25, 2021, 9:10:16 AM
to hddm-...@googlegroups.com
Yes, it should be pretty straightforward to convert the PyMC trace to an arviz InferenceData and use all their stats. If you figure out the code, please post it here.

hcp...@gmail.com

May 26, 2021, 8:38:40 PM
to hddm-users
Hi, Thomas,

Thanks for your feedback.

I tried to convert the traces from HDDM to an arviz InferenceData, but so far I can only use a few of arviz's plotting functions, not the stats functions. Below is what I've done:

### First, using cavanagh 2011 data and `HDDM` module:

# define a function for parallel processing
def run_m(id):
    import hddm
    print('running model (depends on stim) %i' % id)

    exp_name = 'cavanagh'
    model_tag = 'm'

    #### USE absolute paths in docker.
    dbname = '/home/jovyan/hddm/temp/df_' + exp_name + '_' + model_tag + '_chain_%i.db' % id
    mname  = '/home/jovyan/hddm/temp/df_' + exp_name + '_' + model_tag + '_chain_%i' % id
    fname  = '/opt/conda/lib/python3.7/site-packages/hddm/examples/cavanagh_theta_nn.csv'
    data = hddm.load_csv(fname)

    m = hddm.HDDM(data, depends_on={'v': 'stim'})
    m.find_starting_values()
    m.sample(5000, burn=1000, dbname=dbname, db='pickle')  # necessary to save the trace data
    m.save(mname)

    return m

# run four chains in parallel
import time
from ipyparallel import Client
v = Client()[:]
start_time = time.time()  # start time of the processing
jobs = v.map(run_m, range(4))  # 4 = the number of CPUs
wait_watching_stdout(jobs)  # helper that echoes engine stdout while the jobs run
m_stim_list = jobs.get()

### Then, convert the trace to InferenceData:
import arviz as az
import numpy as np
import pandas as pd
import xarray as xr

df_stim_traces = []
for i in range(4):
    df = m_stim_list[i]
    df_trace = df.get_traces()
    df_trace['chain'] = i
    df_trace['draw'] = np.arange(len(df_trace), dtype=int)
    print('chain', i, df_trace.shape)
    df_stim_traces.append(df_trace)

df_stim_traces = pd.concat(df_stim_traces)
df_stim_traces = df_stim_traces.set_index(["chain", "draw"])

xdata_stim = xr.Dataset.from_dataframe(df_stim_traces)

df_stim = az.InferenceData(posterior=xdata_stim)

df_stim # check the InferenceData
[screenshot: the resulting InferenceData, showing a single posterior group]

# test the `az.plot_trace()` function
az.plot_trace(df_stim, var_names=("^a"), filter_vars='regex', rug=True)
[screenshot: az.plot_trace() output for the parameters matching "^a"]

# test the `az.loo()` function:
[screenshot: az.loo(df_stim) raising a TypeError]
The key error message is "TypeError: log likelihood not found in inference data object", and the same error occurs when I try `az.waic(df_stim)`. I have no clue how to solve this problem; it would be great if you could give me some hints.

I also got an error when I tried to use az.plot_ppc(): "`data` argument must have the group "posterior_predictive" for ppcplot". I think this can be solved by first running the posterior predictive checks (using `hddm.utils.post_pred_gen()`) and adding the data to the InferenceData, but I haven't tried yet.
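For reference, a minimal sketch of the layout az.plot_ppc() expects, using synthetic arrays in place of post_pred_gen() output (the shapes, the "rt" variable name, and the chain/draw counts are all assumptions here, not HDDM's actual output):

```python
import numpy as np
import arviz as az

# arviz expects a "posterior_predictive" group whose arrays have shape
# (chain, draw, observation). post_pred_gen()'s output would need to be
# reshaped into this layout; here we stand in synthetic numbers for it.
rng = np.random.default_rng(1)
obs_rt = rng.normal(0.5, 0.1, size=20)            # 20 observed "RTs"
ppc_rt = rng.normal(0.5, 0.1, size=(4, 100, 20))  # chains x draws x trials

idata = az.from_dict(
    posterior_predictive={"rt": ppc_rt},
    observed_data={"rt": obs_rt},
)
ax = az.plot_ppc(idata)  # no longer complains about a missing group
```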

Best,
Chuan-Peng

Alexander Fengler

May 26, 2021, 11:51:57 PM
to hddm-users
Hi Chuan-Peng,

arviz seems to allow multiple levels of interface: if you supply traces only, you get access to the plots you show; other functions depend on having access to the generative model.


For hddm you would need to write something like a PyMC2 version of this (probably OK to do it via the hddm model object directly).

Haven't looked at it in more detail for now, but hopefully this is a little helpful.

Best,
Alex

hcp...@gmail.com

May 28, 2021, 9:56:36 PM
to hddm-users
Hi, Alex,

Thanks for your help! The webpage is helpful. I also found the documentation of the InferenceData structure very helpful: https://arviz-devs.github.io/arviz/schema/schema.html

As I mentioned in the previous post, I have difficulty converting the hddm model object into an ArviZ InferenceData that works with ArviZ's stats functions, primarily because I have not been able to produce the log_likelihood dataset.

The explanation of log_likelihood is here https://arviz-devs.github.io/arviz/schema/schema.html#log-likelihood, which is "Pointwise log likelihood data. Samples should match with posterior ones and its variables should match observed_data variables. "

I tried to read more about PyMC2 and HDDM; the only thing that seems relevant is HDDM's likelihoods module (http://ski.clps.brown.edu/hddm_docs/manual.html#module-hddm.likelihoods). But these functions are not directly related to the pointwise log_likelihood data needed by ArviZ's InferenceData.

I really wish to convert the HDDM model object to ArviZ's InferenceData so that we can use "modern" Bayesian stats, but obviously more suggestions/tips are needed. I am from a psychology background and my knowledge of PyMC2/Bayesian stats is still quite limited.
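To illustrate the missing piece, az.waic()/az.loo() need a "log_likelihood" group with one log-density value per (chain, draw, trial). For HDDM that value would be the Wiener first-passage-time log-density of each trial's RT under each posterior draw; the sketch below stands in a normal log-density over synthetic data, so everything except the arviz calls is a placeholder:

```python
import numpy as np
import arviz as az

# Hypothetical stand-in: y_obs are "RTs", post_mu are posterior draws of a
# location parameter, sigma is fixed. For HDDM, replace the normal
# log-density with the Wiener first-passage-time log-density per trial.
rng = np.random.default_rng(0)
y_obs = rng.normal(0.5, 0.1, size=20)            # 20 "trials"
post_mu = rng.normal(0.5, 0.05, size=(4, 500))   # 4 chains x 500 draws
sigma = 0.1

# pointwise log-likelihood, shape (chain, draw, trial)
log_lik = (-0.5 * np.log(2 * np.pi * sigma**2)
           - (y_obs[None, None, :] - post_mu[..., None])**2 / (2 * sigma**2))

idata = az.from_dict(
    posterior={"mu": post_mu},
    log_likelihood={"rt": log_lik},
    observed_data={"rt": y_obs},
)
print(az.waic(idata))  # no longer raises "log likelihood not found"
```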

Best,
Chuan-Peng