Posterior Predictive Checks, Error: positional indexers are out-of-bounds


lena Pollerhoff

Mar 9, 2023, 5:25:06 AM
to hddm-users

Dear HDDM users/team, 

 

I am currently trying to run posterior predictive checks for two regression models. We fitted a model for a prosocial decision-making task, but because of a mixed design, we fitted it separately for a sample of younger adults and a sample of older adults.

This means we ran the same model twice: once including only younger adults, and once including only older adults.

 

Now I want to run the PPC, which is no problem for the younger-adults model.

But every time I try to run the PPC for the older-adults model (exact same code, same model, only different data) I get the following error message:

 

ppc_data = hddm.utils.post_pred_gen(m12OA_reg)

Traceback (most recent call last):

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/indexing.py", line 1469, in _get_list_axis

    return self.obj._take_with_is_copy(key, axis=axis)

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py", line 3363, in _take_with_is_copy

    result = self.take(indices=indices, axis=axis)

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py", line 3351, in take

    indices, axis=self._get_block_manager_axis(axis), verify=True

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1449, in take

    indexer = maybe_convert_indices(indexer, n)

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/indexers.py", line 250, in maybe_convert_indices

    raise IndexError("indices are out-of-bounds")

 

IndexError: indices are out-of-bounds

 

 

The above exception was the direct cause of the following exception:

 

Traceback (most recent call last):

 

  File "<ipython-input-6-a454363c6e94>", line 1, in <module>

    ppc_data = hddm.utils.post_pred_gen(m12OA_reg)

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/kabuki/analyze.py", line 328, in post_pred_gen

    for name, data in iter_data:

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/kabuki/analyze.py", line 324, in <genexpr>

    iter_data = ((name, model.data.iloc[obs['node'].value.index]) for name, obs in model.iter_observeds())

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/indexing.py", line 879, in __getitem__

    return self._getitem_axis(maybe_callable, axis=axis)

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/indexing.py", line 1487, in _getitem_axis

    return self._get_list_axis(key, axis=axis)

 

  File "/Users/lena/opt/anaconda3/envs/py36/lib/python3.6/site-packages/pandas/core/indexing.py", line 1472, in _get_list_axis

    raise IndexError("positional indexers are out-of-bounds") from err

 

IndexError: positional indexers are out-of-bounds

 

For your information, I am working on a MacBook, using Anaconda and Spyder (5.0.5), with Python 3.6.13 and HDDM 0.8.0. However, a colleague also tried to run the PPC on a Windows PC (Spyder 3.8, HDDM 0.8.0) and got the same error message.


I’ll send you a Google Drive link including the m12OA_reg traces, to replicate the error message: https://drive.google.com/file/d/1JXBCBsVlfnAz7FqqW69Rvv49I0XKmxuI/view?usp=sharing

 

I would really appreciate your help, as we were not able to solve the problem!

 

Best wishes and thanks in advance!

Lena 

lena Pollerhoff

Apr 3, 2023, 3:56:40 AM
to hddm-users
Hello, could anyone help us with this specific problem?

Best
Lena

Alexander Fengler

Apr 4, 2023, 11:28:23 PM
to hddm-users
Hi Lena, 

I will look into this tomorrow and report back.

Best,
Alex

Alexander Fengler

Apr 5, 2023, 4:47:36 PM
to hddm-users
Hi Lena, 

I tried to open the file you linked to, but it doesn't seem to have the right encoding.
(Clicking the link also triggers a download straight away.)
Could you share the Python scripts and, if possible, some representative data in a folder, and share that folder instead?

Best,
Alex

lena Pollerhoff

Apr 6, 2023, 4:37:45 AM
to hddm-...@googlegroups.com
Dear Alex,

I am sorry for the inconvenience! Could you check whether the current link works for you? There should be four files in the folder: example data (from only three older participants, because of the large file size: 180 trials each), the Python script, and the model files.


Thank you so much for helping us out!
Best
Lena 


Alexis Pérez

May 12, 2023, 1:16:03 PM
to hddm-users
Hi,

I am running into the same problem using the HDDM Docker image. Were you able to find the origin of this problem?

Thanks!

Alex

lena Pollerhoff

Jun 22, 2023, 2:46:03 AM
to hddm-users
Hey,

we finally found a solution to our problem by changing iloc to loc (https://groups.google.com/g/hddm-users/c/Is6AM7eN0fo).
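For anyone wondering why swapping `.iloc` for `.loc` matters here: `.iloc` indexes by integer *position*, while `.loc` indexes by *label*. When a subset of the data keeps its original index labels, passing those labels to `.iloc` runs past the end of the frame. A minimal pandas sketch (the DataFrame below is illustrative, not HDDM data) reproducing the same error:

```python
import pandas as pd

# A small frame whose index holds the original row labels of a larger
# dataset (e.g. one participant's trials), not positions 0..n-1.
data = pd.DataFrame({'rt': [0.61, 0.84, 0.72]}, index=[180, 181, 182])

# .iloc treats the labels as positions -> out of bounds for a 3-row frame
try:
    data.iloc[[180, 181, 182]]
except IndexError as err:
    print(err)  # "positional indexers are out-of-bounds"

# .loc treats them as index labels -> selects the intended rows
print(data.loc[[180, 181, 182]])
```

In the PPC code, `obs['node'].value.index` holds the original row labels of the model data, which is why `.loc` is the right indexer there.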
Below you'll find my code. Hopefully it helps people with similar problems:

# Define the necessary functions (patched copies of kabuki.analyze.post_pred_gen
# and its helpers, with .iloc replaced by .loc)
import numpy as np
import pandas as pd
import pymc as pm


def _parents_to_random_posterior_sample(bottom_node, pos=None):
    """Walks through parents and sets them to pos sample."""
    for i, parent in enumerate(bottom_node.extended_parents):
        if not isinstance(parent, pm.Node):  # Skip non-stochastic nodes
            continue

        if pos is None:
            # Set to a random posterior position
            pos = np.random.randint(0, len(parent.trace()))

        assert len(parent.trace()) > pos, "pos larger than posterior sample size"
        parent.value = parent.trace()[pos]

def _post_pred_generate(bottom_node, samples=500, data=None, append_data=False):
    """Generate posterior predictive data from a single observed node."""
    datasets = []

    # Sample and generate stats
    for sample in range(samples):
        _parents_to_random_posterior_sample(bottom_node)
        # Generate data from bottom node
        sampled_data = bottom_node.random()
        if append_data and data is not None:
            sampled_data = sampled_data.join(data.reset_index(), lsuffix='_sampled')
        datasets.append(sampled_data)

    return datasets


def post_pred_gen(model, groupby=None, samples=500, append_data=False, progress_bar=True):
    """Run posterior predictive check on a model.

    :Arguments:
        model : kabuki.Hierarchical
            Kabuki model over which to compute the ppc on.

    :Optional:
        samples : int
            How many samples to generate for each node.
        groupby : list
            Alternative grouping of the data. If not supplied, uses splitting
            of the model (as provided by depends_on).
        append_data : bool (default=False)
            Whether to append the observed data of each node to the replications.
        progress_bar : bool (default=True)
            Display progress bar.

    :Returns:
        Hierarchical pandas.DataFrame with multiple sampled RT data sets.
        1st level: wfpt node
        2nd level: posterior predictive sample
        3rd level: original data index

    :See also:
        post_pred_stats
    """
    import pymc.progressbar as pbar

    # Progress bar
    if progress_bar:
        n_iter = len(model.get_observeds())
        bar = pbar.progress_bar(n_iter)
        bar_iter = 0
    else:
        print("Sampling...")

    if groupby is None:
        # The original kabuki code used .iloc here, which fails when the data
        # index holds labels rather than positions; .loc fixes that.
        iter_data = ((name, model.data.loc[obs['node'].value.index])
                     for name, obs in model.iter_observeds())
    else:
        iter_data = model.data.groupby(groupby)

    results = {}

    for name, data in iter_data:
        node = model.get_data_nodes(data.index)

        if progress_bar:
            bar_iter += 1
            bar.update(bar_iter)

        if node is None or not hasattr(node, 'random'):
            continue  # Skip non-observed nodes

        # Generate posterior predictive data from the node
        datasets = _post_pred_generate(node, samples=samples, data=data,
                                       append_data=append_data)
        results[name] = pd.concat(datasets, names=['sample'],
                                  keys=list(range(len(datasets))))

    # Concatenate the results
    ppc_data = pd.concat(results, names=['node'])

    return ppc_data

   
# Generate posterior predictive data
ppc_data = post_pred_gen(m12OA_reg)  

# Perform posterior predictive checks
ppc_stats = hddm.utils.post_pred_stats(data, ppc_data)
print(ppc_stats.head())
