Improving fluid intelligence with training on working memory: a meta-analysis (Jaeggi 2014)


XFMQ902SF

Aug 8, 2014, 11:45:23 AM
to brain-t...@googlegroups.com
Alright everyone, the moment of truth: can dual n-back and its ostensible usefulness withstand a meta-analysis...

Working memory (WM), the ability to store and manipulate information for short periods of time, is an important predictor of scholastic aptitude and a critical bottleneck underlying higher-order cognitive processes, including controlled attention and reasoning. Recent interventions targeting WM have suggested plasticity of the WM system by demonstrating improvements in both trained and untrained WM tasks. However, evidence on transfer of improved WM into more general cognitive domains such as fluid intelligence (Gf) has been more equivocal. Therefore, we conducted a meta-analysis focusing on one specific training program, n-back. We searched PubMed and Google Scholar for all n-back training studies with Gf outcome measures, a control group, and healthy participants between 18 and 50 years of age. In total, we included 20 studies in our analyses that met our criteria and found a small but significant positive effect of n-back training on improving Gf. Several factors that moderate this transfer are identified and discussed. We conclude that short-term cognitive training on the order of weeks can result in beneficial effects in important cognitive functions as measured by laboratory tests.


I do not have the background in psychology or meta-analysis (or frankly the IQ) to know whether or not Jaeggi's meta-analysis of DNB is quality work. Hopefully yes, but it's up to you to decide.

cheers

Zaraki

Aug 8, 2014, 1:29:57 PM
to brain-t...@googlegroups.com
She is finally starting to discuss active vs. passive control groups and claims to have tested the effect of control type in this meta-analysis, but I have only read the introduction. Does anyone have a link to the full study?


Gwern Branwen

Aug 8, 2014, 1:46:58 PM
to N-back, Thomas S. Redick
Fulltext: https://pdf.yt/d/VMPWmd0jpDYvZIjm /
https://dl.dropboxusercontent.com/u/85192141/2014-au.pdf

I have been eagerly awaiting this, and have had high hopes for it;
after all, my own meta-analysis on exactly this topic
(http://www.gwern.net/DNB%20meta-analysis) has been public and
available online with all details for over 2 years now, so one would
hope Au et al 2014 could improve significantly with the benefit of
being able to look at it. I haven't looked at each bit and piece
in as much detail as I would like, because the original table of data
(on which any alternative analysis will depend) is not included in the
paper or the supplementary info
(https://dl.dropboxusercontent.com/u/85192141/2014-au-supplementarymaterials.zip);
I've emailed the lead author Jacky Au asking for it to be added to the
supplementary info on Springer.

I am a bit disappointed with it. There's a lot that strikes me as a
bit strange or dubious. Their search strategy fails to turn up every
study (no Polar, for example), and their strategy for the gray
literature could be expected to be biased:

> We searched the PubMed and Google Scholar databases using the following keywords taken separately or in combination: n-back training, WM training, cognitive training, fluid intelligence. Several unpublished dissertations were also found on Google Scholar by incorporating the keyword “dissertation” or “thesis” into one of the above search terms. We also included unpublished work from researchers known to us. Finally, we checked the references of selected papers and searched relevant conference proceedings that were accessible to us in order to ensure that there were not any additional studies we had omitted.

They did manage to get Katz's data, though, which I didn't (Katz
ignored my emails).

Their exclusions are weird; for example, they throw out, citing
"Combined training interventions":

- Jausovec & Jausovec (2012)
- Sprenger et al. (2013)
- Takeuchi et al. (2013)
- Schmiedek, Lovden, & Lindenberger (2010)

...except unless you think the other training interventions would
*decrease* intelligence, all 4 studies should still show positive
results if n-back works. Throw in 'combined' as a moderator, don't
throw out the data entirely. Plus, by throwing out Jausovec &
Sprenger, you're throwing out 2 active studies.

They threw out, citing "Training time too short",

- Vartanian et al. (2013)

Vartanian was another active study, and while it's true that 60
minutes of training is not much, again, that's what the training-time
moderator is for.

And they threw out, for "Incomplete data for effect size calculation":

- Nussbaumer Grabner, Schneider, & Stern (2013)
- Qiu, Qinqin, Liying, & Lifang (2009)

I already contacted both those authors and provided the necessary data
for effect-size calculation on my page. Nussbaumer was active,
incidentally.

Nor do the exclusions stop there; the supplementary info, particularly
13423_2014_699_MOESM2_ESM.pdf, mentions throwing out studies from
various moderator analyses:

- Jaeggi et al 2008
- Chooi
- Jaeggi et al 2010
- Schweizer et al 2011
- Salminen et al 2012
- Schwarb et al 2012
- Thompson et al 2013 (this was, incidentally, one of the studies I
mentioned finding no improvement after post-training factor analysis:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0063614#close
)
- Kundu et al 2013
- Clouter 2013
- Colom et al 2013 (likewise http://www.gwern.net/docs/dnb/2013-colom.pdf)
- Smith et al 2013
- Heinzel et al 2014

> Thirty distinct treatment groups and 24 distinct control groups were identified, leading to 24 group comparisons. Where multiple control groups existed within a study, such as an active and passive control, the active control was chosen, provided that the control intervention did not load on WM or some other process that might itself improve Gf. For example, Stephenson and Halpern (2013) used a spatial span active control task that also tapped WM, and Oelhafen et al. (2013) investigated the use of lure trials in n-back and, therefore, considered an adaptive n-back without lures to be an active control. In both cases, the passive control results were selected.

While not as bad as choosing only passive control groups, this is
still bad: they are throwing out a lot of data by scrapping entire
control groups. What they could've done, which is what I did, was keep
all control groups and split the experimental group across control
groups. This would have given them more comparisons than the 12
actives and 12 passives they left themselves with.
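To make the splitting idea concrete, here is a minimal Python sketch (my own illustration, not code from either meta-analysis; the function name and all numbers are made up): each control arm gets its own comparison against the treatment group's gain scores, with the treatment n divided across comparisons so no subject is counted twice.

```python
# Sketch: keep every control arm of a study by splitting the treatment
# group's sample size across one comparison per control arm, instead of
# discarding all but one control group.

def split_comparisons(tx_gain_mean, tx_gain_sd, tx_n, control_arms):
    """Return one (d, n_treatment_used, n_control) tuple per control arm.

    control_arms: list of (gain_mean, gain_sd, n) tuples, one per arm.
    The treatment sample size is divided evenly across the comparisons.
    """
    n_split = tx_n // len(control_arms)
    comparisons = []
    for c_mean, c_sd, c_n in control_arms:
        # pooled standard deviation of the gain scores
        pooled_sd = (((n_split - 1) * tx_gain_sd**2 + (c_n - 1) * c_sd**2)
                     / (n_split + c_n - 2)) ** 0.5
        d = (tx_gain_mean - c_mean) / pooled_sd
        comparisons.append((d, n_split, c_n))
    return comparisons

# e.g. a 40-person treatment group compared against both an active and a
# passive control arm of 20 each (illustrative numbers only):
print(split_comparisons(0.5, 1.0, 40, [(0.1, 1.0, 20), (0.3, 1.0, 20)]))
```

Each control arm then contributes its own effect size, so "active vs. passive" can be compared within the same study rather than only across studies.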

OK, so what did they find with their limited data?

> Second, studies that used passive controls demonstrated more net transfer (g = 0.44, SE = .10) than those with active controls (g = 0.06, SE = .09), a difference that is also statistically significant at p < .01. However, there was no difference in the performance of either type of control group when compared directly with each other (Ctrl ES; p = .2), but there was significant improvement in the performance of treatment groups (Tx ES; p = .04) within those studies that also use passive controls. Since Tx ES is calculated independently of the control group, the improvements found in these studies are irrespective of the type of control used.

Hedges's _g_ is similar but not identical to the _d_ in my
meta-analysis, so you can't compare numbers directly. But they found
the predicted difference: passive studies find a large effect at 0.44,
while active controls shrink it to 0.06, roughly 7x smaller. Case
closed, then - n-back gains on IQ tests are largely motivational, and
the remaining gain is likely to be hollow (as I discussed yesterday in
https://groups.google.com/d/msg/brain-training/4AJJJjo9jRU/KdacH2fLq2wJ
)?
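(For the record, the difference between the two metrics is just a small-sample bias correction; here is a generic Python sketch of the standard formulas, not either paper's actual code:)

```python
# Cohen's d and Hedges' g for a two-group comparison. g is d scaled by the
# small-sample bias-correction factor J = 1 - 3/(4*df - 1), df = n1+n2-2,
# which is why g and d from different meta-analyses aren't interchangeable.

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = (((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                 / (n1 + n2 - 2)) ** 0.5
    return (mean1 - mean2) / pooled_sd

def hedges_g(d, n1, n2):
    """Correct Cohen's d for its small-sample upward bias."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))

# With the n~20 groups typical of these studies the correction is small
# but not negligible: d = 0.50 becomes g ~ 0.48 for two groups of 10.
```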

Well... in Table 1, which reports 26 p-values for various permutations
of moderators and various ways to slice before-after scores, they
would like to draw your attention to the difference between Ctrl ES
("Ctrl ES was SMD between pre- and posttests of the control groups.")
and Tx ES ("Tx ES was SMD between pre- and posttests of the treatment
groups."). The experimental groups in active and passive studies
improved g=.25 vs g=.54, while the control groups improved g=.08 vs
g=.28; the argument is that if motivational effects could only affect
the control group, then there should be a bigger pretest/posttest gain
for control groups in the passive studies compared to the active
studies, while the experimental groups, who get the same motivational
boost either way, should see similar gains in both kinds of studies.
This seems like a strange argument to me (aren't subjects usually
informed about there being passive controls as part of the informed
consent, where they're told how the whole experiment works?), and it
is badly undermined by their decision to throw away the other control
arms of studies, which would have let them compare active & passive
control groups in the *same* study rather than across heterogeneous
studies. (What's sauce for the goose is sauce for the gander.)

> ...We also ran several multiple regression models (Table 3) to examine possible confounding variables that help explain the greater effect observed in studies with passive controls detailed above. Models 2 and 3 in Table 3 examined the differential effects of controlling for remuneration and international status (with baseline differences covaried out). Both individually appeared to contribute toward the effect observed in passive controls and, together, reduced the ES gain caused by the condition of having a passive control from .31 to .09, rendering the p-value nonsignificant.
>
> ...Passive control groups were mostly enrolled in international studies (10/12), as compared with active control groups (3/12), and these passively controlled studies also remunerated participants significantly less money, on average, for study ($61.75 vs. $220.50), t(14) = 2.23, p = .04, d = 0.87. These factors, individually and together, account for a sizable portion of the passive control effect and reduce its estimated effect to the point of statistical insignificance (Table 3). The potential effects of remuneration are discussed further below. Despite this, there is still a large amount of shared variance between passively controlled and international studies that needs to be teased apart in future research.
>
> ...International studies tend to find more transfer than U.S. studies. There is a substantial body of literature available on the effects of culture on cognition (cf. Muggleton & Banissy, 2014). These effects may contribute to differences not only between international and U.S. participants, but also between methodological practices of researchers.

Here we have to get philosophical. What does it mean to 'control' for
something? It means simply that when you enter the variable into the
regression, it helps predict the variable of interest because it
correlates with it. Controlling doesn't deal with things like
causation. So when they enter a True/False variable for USA and
regress on it, they find it helps predict what effect size the studies
finish with; fair enough, I could believe that. Does that mean setting
an n-back experiment in the USA *causes* a decrease in training
efficacy, and does the shrinking active coefficient mean that the
choice of control group doesn't *cause* spurious gains? Maybe. Or
maybe choosing to use an active control group causes a decrease in
measured effect size, performing a study in the USA is highly
correlated with choosing to use an active control group, and this
collinearity is why the new coefficient for active control groups
shrinks - because now the true causal effect is being split over both
an 'international' proxy variable and the active-control variable.
There are only 13 vs 11 coded studies, after all, and collinear
situations are notorious for estimates being unstable and affected by,
say, excluding a few datapoints (studies).
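A toy simulation makes the statistical point (entirely made-up numbers and my own code, not Au et al.'s data or model): even when the 'international' variable has *no* causal role and is merely correlated with the choice of control group, adding it to a small-sample regression makes the active-control coefficient far less stable.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Ordinary least squares coefficients via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

rng = random.Random(0)
n, reps = 24, 500            # ~24 coded studies, 500 simulated literatures
coef_alone, coef_with_proxy = [], []
for _ in range(reps):
    active = []
    while len(set(active)) < 2:          # avoid a degenerate draw
        active = [rng.randint(0, 1) for _ in range(n)]
    usa = active
    while usa == active:                 # avoid a perfectly collinear draw
        usa = [a if rng.random() < 0.85 else 1 - a for a in active]
    # true model: ONLY an active control reduces the measured effect size
    y = [0.45 - 0.35 * a + rng.gauss(0, 0.15) for a in active]
    coef_alone.append(ols([[1.0, a] for a in active], y)[1])
    coef_with_proxy.append(ols([[1.0, a, u] for a, u in zip(active, usa)], y)[1])

# Both estimators average out near the true -0.35, but the collinear model's
# estimates vary far more from dataset to dataset, so in any one small
# sample the 'active' coefficient can shrink toward zero while the
# 'international' proxy soaks up part of the causal effect.
print(sd(coef_alone), sd(coef_with_proxy))
```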
Looking at my own dataset and the surnames of lead authors (and
keeping in mind their selection strategy of control groups, while I
preserved all control groups possible), it looks like there might be
such a correlation between international studies and use of passive
control groups:

R> dnb[dnb$active==0,]$study
 [1] Jaeggi1.8    Jaeggi1.12   Jaeggi1.17   Jaeggi1.19   Qiu          polar        Jaeggi2.1
 [8] Jaeggi2.2    Stephenson.2 Stephenson.4 Stephenson.6 Chooi.1.2    Chooi.2.2    Zhong.1.05d
[15] Zhong.1.05s  Zhong.1.10d  Zhong.1.10s  Zhong.1.15d  Zhong.1.15s  Zhong.1.20d  Zhong.1.20s
[22] Zhong.2.15s  Zhong.2.19s  Redick.1     Rudebeck     Salminen     Takeuchi     Colom
[29] Heinzel.1    Heinzel.2    Oelhafen     Smith.1      Thompson.1   Stepankova.1 Stepankova.2
[36] Burki        Burki        Pugin
57 Levels: Burki Chooi.1.1 Chooi.1.2 Chooi.2.1 Chooi.2.2 Clouter Colom Heinzel.1 ... Zhong.2.19s
R> dnb[dnb$active==1,]$study
 [1] Jaeggi1.8    Stephenson.1 Stephenson.3 Stephenson.5 Chooi.1.1    Chooi.2.1    Jaeggi3
 [8] Kundu1       Schweizer    Jaušovec     Kundu2       Redick.2     Clouter      Jaeggi.5
[15] Jaeggi.5     Smith.2      Sprenger.1   Sprenger.2   Thompson.2   Vartanian    Savage
[22] Nussbaumer   Burki        Burki

(Offhand, I remember Jaušovec is foreign - except of course, Jaušovec
was excluded for its "combined training interventions" - and I'm not
sure how Jaeggi would have been classified; Stephenson, Chooi, Kundu,
Redick, and Clouter were definitely USA, and I think Smith, Sprenger,
Thompson, and Savage were USA too, but I'm not sure about Schweizer,
Vartanian, Nussbaumer, and Burki.)

> Our work demonstrates the efficacy of several weeks of n-back training in improving performance on measures of Gf. We urge that future studies move beyond attempts to answer the simple question of whether or not there is transfer and, instead, seek to explore the nature and extent of how these improved test scores may reflect “true” improvements in Gf that can translate into practical, real-world settings.

Suffice it to say that I think between the dubiousness of attempts to
explain the passive/active influence, and the likely hollowness of
remaining gains, there's still work to be done.

--
gwern

XFMQ902SF

Aug 11, 2014, 7:55:48 PM
to brain-t...@googlegroups.com
I hope Redick and Engle etc. at Georgia Tech see this meta-analysis and dissect it.

Joshua Hoffer

Aug 13, 2014, 4:16:00 PM
to brain-t...@googlegroups.com, tre...@purdue.edu, gw...@gwern.net
Gwern, you should consider posting your critique of the paper ("Improving fluid intelligence with training on working memory: a meta-analysis") on PubPeer and/or PubMed Commons.

XFMQ902SF

Aug 13, 2014, 4:54:01 PM
to brain-t...@googlegroups.com
So... if I am reading this correctly, the effect size/transfer size for the active-control experiments was almost nonexistent at .06, while for the experiments with passive control groups the transfer size is .44. Just like in gwern's meta-analysis: once you remove the passive-control experiment data, dual n-back is just about useless.

