The goal of the task is to estimate a few probabilities that we use to derive the two measures. In general, the more trials you have to base a probability on, the better (the larger your denominator, the more resolution you have). So, if you have only 10 possible lures on which to respond "similar," you have a coarse estimate (0.1, 0.2, ...), while if you have 20 you have a finer one (0.05, 0.1, 0.15, ...). The truth of the person's ability, however, isn't changed by how you're estimating it. If the truth is a value of 0.14 and you have 10 trials, you'll come up with 0.1; if you have 20 trials, you'll come up with 0.15 (factoring out basic noise), so you'll be more accurate with more trials. But toss in a touch of noise, run this on 100 people who all have a true value of 0.14, and average the results from both the short and long sets, and you'll end up with estimates of ~0.14 in both (the short set will have something like 60 people reading out at 0.1 and 40 people reading out at 0.2, averaging out to your 0.14).
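If it helps to see that averaging-out in action, here's a minimal sketch of the 100-person example in Python (not my actual scripts; the 0.14 true rate and the 10 vs. 20 lure counts are just the numbers from the example above):

    # Simulate 100 subjects whose true P("similar" | lure) = 0.14,
    # estimated from either 10 or 20 lure trials.
    import numpy as np

    rng = np.random.default_rng(1)
    true_p = 0.14        # true probability of responding "similar" to a lure
    n_subjects = 100

    for n_lures in (10, 20):
        # Each subject's estimate is (# "similar" responses) / (# lures),
        # so it can only fall on a grid with steps of 1/n_lures.
        hits = rng.binomial(n_lures, true_p, size=n_subjects)
        estimates = hits / n_lures
        print(f"{n_lures} lures: grid step = {1/n_lures:.2f}, "
              f"group mean = {estimates.mean():.3f}, SD = {estimates.std(ddof=1):.3f}")

Individual estimates are coarser with 10 lures than with 20, but the group means both land right around 0.14.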
All that is a long way, but hopefully a valuable way, of getting to the idea that adding extra trials won't change the actual LDI number you get (until you get down to very, very small sample sizes where working memory becomes more of an issue). We ran this with 16-64 items per condition in one of the papers. More trials just reduce your quantization error, but they do so at the cost of a longer experiment, and at some point you hit diminishing returns. FWIW, I've done a bit of analysis on that and have some scripts if you really want to simulate / look at the effects. But don't worry about mixing the set sizes -- just pay attention to counterbalancing the actual stimulus sets used.
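As a rough illustration of the diminishing-returns point (again just a sketch, not my actual analysis): the spread of a per-subject estimate shrinks like sqrt(p*(1-p)/n), so each doubling of the trial count buys you less. The 16-64 range below mirrors the item counts mentioned above; 128 is included only to show the trend.

    # How much does adding trials tighten the estimate of a true rate of 0.14?
    import math

    true_p = 0.14
    for n_lures in (16, 32, 64, 128):
        se = math.sqrt(true_p * (1 - true_p) / n_lures)
        print(f"{n_lures:4d} lures -> expected SD of the estimate ~ {se:.3f}")

Going from 16 to 64 lures roughly halves the noise on the estimate; going from 64 to 128 gains you much less, at the cost of a much longer session.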
Craig