This is an excellent find.
- more subjects than Jaeggi 2008
- trained some subjects longer than Jaeggi 2008 (20 days vs. 19)
- Chooi approvingly discusses Moody, pg 16-19
- RAPM was administered to all participants *untimed*
- pretty decent variety in the test-battery:
> "The tests administered in the pre- and post-tests include the
Mill-Hill vocabulary test, vocabulary tests (part I and II) from the
Primary Mental Abilities test battery, Word Beginning and Ending test,
Colorado Perceptual Speed Test, Identical Pictures, Finding A’s, Card
Rotation and Paper Folding from the ETS test battery, Shepard-Metzler
Mental Rotation Test (1971) and Raven’s Advanced Progressive Matrices."
- more similarities:
> "Participants in the 2-week (8 days) condition had a 34%
improvement and those in the 5-week (20 days) condition improved by
44%. Participants in the Jaeggi et al. (2008) study displayed similar
trends. From the data that they published, it can be estimated that
participants in the 8-day training condition improved by 34% and those
in the 19-day condition improved by 47% (Jaeggi et al., 2008). The
numbers suggested that participants from the current study and the
original Jaeggi et al. (2008) study showed very similar performance on
the training task."
- not sure how to interpret this:
> "A simple regression with improvement in N-back task as
predictor was conducted using data from participants in the 5-week
training group. This predictor did not significantly contribute to any
variance in gain scores for all the variables in pre- and post-tests."
> "Some gender differences were observed – male participants in
the study achieved better SAT Math scores (Cohen’s d = 0.76). They
also performed better on Mental Rotation (Cohen’s d = 0.60) and Card
Rotation (Cohen’s d = 0.52) tasks. Female participants, on the other
hand, scored significantly higher than their male counterparts on Word
Beginning and Ending tests (Cohen’s d = 0.35). They also did better on
the perceptual speed test Finding A’s at both pre- and post-tests
(Cohen’s d = 0.67 and 0.60 respectively). In this test, participants
went through long lists of words and crossed out words that have the..."
- results; penalties to training?
> "Results from dependent sample t-tests suggested no significant
improvement overall after training. There were significant declines in
performance on some of the verbal fluency and all of the perceptual
speed tests, but this trend is consistent with a previous study
suggesting that the items used in the post-test are potentially more
difficult than the items used in the pre-test. Looking at the results
for participants who trained for 2 weeks, they did not show any
significant decline in performance, which may suggest that they
actually improved on speed after training. These participants also
improved significantly on the Card Rotation test, and they showed some
improvement on the Paper Folding test even though it was not
significant. The same cannot be said of those who trained for 5 weeks.
They showed significant declines in performance on the verbal fluency
and perceptual speed tests except for Finding As. They did show
improvement on Paper Folding and Mental Rotation albeit insignificant.
Interestingly, those who did not train took the post-test 5 weeks
after pre-test improved significantly on Card Rotation."
> "Jaeggi and her colleagues (2008, 2010) argued that the speeded
administration of the transfer tasks was comparable to the non-speeded
administration, and they decided to administer the tests with time
limits to avoid ceiling effects. Results from the current study did
not suggest any ceiling effects for any of the tests administered
(RAPM pre-test mean = 12.5, SD = 2.65; RAPM post-test mean = 12.3, SD
= 2.55; there were 18 items on pre- and post-tests)."
And finally, something I've said many times - that it looks like the
benefits do not include IQ but other things:
> "There are many studies in the literature that suggest positive outcome from working memory training, such as reduced inattentive symptoms in ADHD children (Klingberg et al., 2002; Klingberg et al., 2005), increased memory performance in older adults (Buschkuehl et al., 2008), increased math performance in children with working memory deficits (Holmes et al., 2009), improved short term memory in adolescents with borderline intellectual disability (Van der Molen et al., 2010), reduced cognitive deficits in schizophrenic patients (Wykes et al., 1999), significant reduction in symptoms of cognitive problems in patients with stroke (Westerberg et al., 2007) and improved fatigue symptoms in adults with multiple sclerosis (Vogt et al., 2009). Some of these studies reported no improvement in fluid intelligence (Westerberg et al., 2007; Holmes et al., 2009; Van der Molen et al., 2010) and some reported significant improvement (Klingberg et al., 2002; Klingberg et al., 2005). Anecdotal examples in the current study have been encouraging where some participants in the study claimed that they have improved their focus in general. One participant commented on the post-test questionnaire:
>> 'I feel like sometimes I struggled to keep my attention after 12-15 minutes but have noticed during my voice lessons I can focus more on multiple techniques at once, which takes a lot of focus!'"
The only criticism I could think of for this one was the significant
attrition: ~120 to ~90 subjects. But she checked that the attrited
were not different on any of the metrics she had for them, so I didn't
bother to include it. The gender balance - 60 females (greater
Conscientiousness, eh?) - was also odd, but there's no way an
imbalance of 15 females could possibly lead to the observed null results.
> Of course Jaeggi research
> had many flaws and NOBODY here takes for granted they "raised
> intelligence ten points in 19 days." But I can tell you right now,
> were these researchers serious about measuring intelligence (or
> replicating jaeggi), they would use bomat, or actually any iq test
> that is not 20 years old... and they would use more of them (like we
> are - and there is a clean difference between raven and bomat).
Yes, so just like Jaeggi didn't use a flawed 20 year old IQ test...?
This study redid Jaeggi 2008, with the major flaw removed, with bigger
n, and more tests to boot, and without the questionable post hoc
analysis strategy of Jaeggi 2010. This is better than Jaeggi 2008, so
you can't debunk it without also _a fortiori_ debunking Jaeggi 2008,
and it supports what I've been saying all along, against the strenuous
opposition of posters like Pontus: Moody was right.
Officially? No. Any one study is untrustworthy (as I emphasized a
while back). But there are, what, 2 or 3 studies due out this and next
year which we recently discussed; if they all come back null, then it
is looking pretty bad for the strong IQ claims.
Which still leaves the non-IQ benefits, of course. I really need to
compile them all into a good section for the FAQ, but I keep getting
distracted! Like yesterday and today, by lithium -
That's pretty boring though. The afflicted do not number most of us
here in their ranks, nor will they for decades to come, and we already
know of a great many interventions which help the afflicted. (We've
covered a few here already.) For some such groups, like the elderly,
it seems like any mental workout helps! This is not what made
He had 93 subjects, but only 13 of them were assigned to the group
expected to show gains, and only 22 to either of the two training
groups. The other 80 subjects were assigned to one of his 5 control
groups.
On 2/20/2012 2:29 PM, XFMQ902SF wrote:
> A study by Jaeggi and her colleagues (2008) claimed that they were
> able to improve fluid intelligence (gf) by training working memory.
> Subjects who trained their working memory on a dual n-back task for a
> period of time showed significant improvements in working memory span
> tasks and fluid intelligence tests such as the Raven's Progressive
> Matrices and the Bochumer Matrices Test (BOMAT) after training
> compared to those without training. The current study aimed to
> replicate and extend the original study conducted by Jaeggi et al.
> (2008) in a well-controlled experiment that could explain the cause or
> causes of such transfer if indeed the case. There were a total of 93
> participants who completed the study, and they were randomly assigned
> to one of three groups - a passive control group, active control group
> and experimental group. Half of the participants were randomly
> assigned to the 8-day condition and the other half to the 20-day
> condition. All participants completed a battery of tests at pre- and
> post-tests that consisted of short timed tests, a complex working
> memory span and a matrix reasoning task. Participants in the active
> control group practiced for either 8 days or 20 days on the same task
> as the one used in the experimental group, the dual n-back, but at the
> easiest level to control for Hawthorne effect. Results from the
> current study did not suggest any significant improvement in the
> mental abilities tested, especially fluid intelligence and working
> memory capacity, after training for 8 days or 20 days. This leads to
> the conclusion that increasing one's working memory capacity by
And presented power calculations to justify the smallness.
On 2/20/2012 8:06 PM, Gwern Branwen wrote:
> And presented power calculations to justify the smallness.
In that power analysis, Chooi used an effect size estimate of (Cohen's)
d = 0.98, and calculated a power of 0.8 for his study based on that effect
size. Yes, that's what was found for DNB on the Raven's in Jaeggi 2010,
but it's still really high. What if the actual effect size is lower than
that, like d=0.5? This page says their power would have been about 0.25
(<http://www.stat.ubc.ca/%7Erollin/stats/ssize/n2.html>). For d=0.75,
about 0.48. For d=0.40, about 0.17. For a power of 0.8 with an effect
size of 0.5, they would have needed a sample size of 63 in each group.
With their 6-group design (and still using a paired t-test, which is not
optimal for this design), that would mean 378 subjects total to have an
80% chance of detecting an effect size of 0.5 standard deviations (or ~7
IQ points). And that's without correcting for multiple testing.
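Those power figures can be reproduced with the usual normal approximation to two-sample t-test power; this is just a rough sketch (not the exact method that web page uses), assuming 13 subjects per group, which is the size of Chooi's smallest cell:

```python
from math import ceil, erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sample(d, n_per_group, alpha_z=1.959964):
    """Approximate power of a two-sided, alpha = 0.05, two-sample t-test
    with equal group sizes, via the normal approximation."""
    return phi(d * sqrt(n_per_group / 2) - alpha_z)

def n_for_power(d, alpha_z=1.959964, power_z=0.841621):
    """Per-group n needed for 80% power at alpha = 0.05 (normal approx.)."""
    return ceil(2 * ((alpha_z + power_z) / d) ** 2)

# With ~13 subjects per group:
for d in (0.75, 0.5, 0.4):
    print(f"d = {d}: power ~= {power_two_sample(d, 13):.2f}")
# -> 0.48, 0.25, 0.17, matching the figures above
print("n per group for 80% power at d = 0.5:", n_for_power(0.5))  # -> 63
```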
(Note: that page assumes that the two groups being compared have the
same sample size, whereas Chooi's study had heavily unbalanced groups.
Though the size of the smaller group matters more in practice, the extra
subjects in the larger groups still add some power, which means the
power estimates spat out by that page (and which I reported above) are
underestimates for Chooi's study, so maybe their actual power with
d=0.75 is about 60%, not 48%.)
It's worth mentioning also that 0.98 was the largest effect size seen
for any of the treatments and transfer tasks in Jaeggi 2010 (DNB to
Raven's). Looking at DNB to BOMAT, the effect size was only 0.49; for
SNB to Raven's and BOMAT, it was 0.65 and 0.70, respectively. Also, the
no-contact group had an effect size of 0.09 (Raven's) or 0.26 (BOMAT),
which really should be subtracted from the treatment groups' effect
sizes, since we don't care if people get better, but only if they get
better because of the training. For comparison, Jaeggi 2008 found an
effect size of 0.65 for the treatment group vs. 0.25 for the no-contact
group. In any case, 0.98 is definitely *not* a conservative effect size
estimate, which is what you should be using when performing power
analyses.
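To make the control-subtraction point concrete: the d values below are just the ones quoted above, and simple subtraction is only a rough way to net out the control group's gain (a proper analysis would test the treatment-vs-control contrast directly), but it shows how much the headline numbers shrink:

```python
# Rough net effect sizes: treatment gain minus the no-contact group's gain,
# since we only care about improvement caused by the training itself.
# (d values are the ones quoted above from Jaeggi 2010 and 2008.)
effects = [
    ("Jaeggi 2010, DNB -> Raven's", 0.98, 0.09),
    ("Jaeggi 2010, DNB -> BOMAT",   0.49, 0.26),
    ("Jaeggi 2010, SNB -> Raven's", 0.65, 0.09),
    ("Jaeggi 2010, SNB -> BOMAT",   0.70, 0.26),
    ("Jaeggi 2008, treatment",      0.65, 0.25),
]
for label, treatment_d, control_d in effects:
    print(f"{label}: net d = {treatment_d - control_d:.2f}")
```

Note how the headline 0.98 drops to 0.89, and DNB to BOMAT to just 0.23.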
Actually, Chooi's choice of effect sizes to use for his power analysis
kinda makes it seem like he's trying to gloss over the fact that his
sample size per group was so small. "Yeah, I didn't have many subjects,
but look! Power analysis! Can I have my PhD now?" (I'm not usually one
for ad hominem attacks, but this particular instance seems like the
equivalent of using 1.5 inch margins on a 3 page homework assignment.
Sure, there are instances in which 1.5 inch margins can be appropriate
-- such as when you expect your grader to be running out of space to
write comments -- so it could just be a case of bad judgment, but in
this case it feels pretty slimy. Especially since in most of the paper
he writes about Jaeggi 2008 (which has a smaller effect size), but when
it comes to the power analysis he chooses the biggest effect size from
Jaeggi 2010. Plus there's the fact that Chooi's dissertation is dated
March 2011, so if he did a power analysis using Jaeggi 2010's effect
size, it was clearly a post-hoc power analysis.)
What I get from this study: The effect size for DNB training is probably
less than 0.98. (Of course, that's what I believed anyway before I saw
this.) The effect size could quite reasonably still be as high as 0.75.
Even if the true effect size of DNB training were only 0.3 standard
deviations (~5 IQ points), I think that would still be a pretty big deal
and totally worth my time.
Still haven't read most of Chooi's paper, unfortunately. If only I had
minions to do my engineering work for me, then I could spend more time
on science... sigh.
On 2/20/2012 11:16 PM, Jonathan Toomim wrote:
> Actually, Chooi's choice of effect sizes to use for his power analysis
> kinda makes it seem like he's trying to gloss over the fact that his
> sample size per group was so small....
However, I'm more excited about WM improvement than I am about IQ
improvement, because most of the effects I want in my thinking come
from WM gains. So for people who're trying to boost IQ, maybe they
won't use dual n-back anymore, but I think that WM and executive
function are extremely important for your overall intellectual
capacity, independent of IQ.
So if WM training improves WM, then it's still a great advancement :)
Thanks for posting the study OP!
> You received this message because you are subscribed to the Google Groups
> "Dual N-Back, Brain Training & Intelligence" group.
That's a good point. I didn't realize there was such an effect size
difference between Jaeggi 2010 and 2008 or that the choice made such a
difference. I'll add that to the section as it definitely limits what
Chooi 2011 shows. I'll remember to pay closer attention to effect size
& power in the upcoming studies.
On Tue, Feb 21, 2012 at 5:25 AM, polar <pol...@gmail.com> wrote:
> Gwern do you really think that all the studies which regularly are
> finding iq improvements after cognitive training are flawed or fraud?
The burden of proof is *heavily* on those claiming an IQ intervention
in healthy young adults, or children for that matter too. (Chooi
includes some of the standard examples in a small history of such
failures, like Headstart, but I guess you didn't read it...)
> Of course moody was partially right -
I guess I should be happy with what I can get.
> but do you really think that speed is NOT a part of intelligence?
It may be 'part of intelligence' - like vocab is. I believe I have
made this point a few dozen times now. Correlation, causation...
> Yes, for some people dual n-back doesnt work AT ALL. For others through, it definitely works.
Just like faith healing definitely cures cancer for some people amirite
> Maybe there is some interesting trait or factor thats intervening, whaddya say?
Epicycles, special pleading, etc.
On 2/21/2012 3:06 PM, Glenn Henshaw wrote:
> "I found this guys..."
Then presumably they won't improve very much on n-back, and the
statistics won't be much affected by their simultaneous lack of
improvement on the IQ tests?
So I agree with you that there's more to our thinking skill than our
IQ, but I also think that WM can't be a large contributor to our IQ
Do you believe that our WM is constantly changing?
Do you believe that our performance on IQ tests is constantly changing
to match the change in WM?
(these are questions for a lot of people who think that improving WM
will lead to IQ gains -- I'm a huge fan of WM improvement and have
argued so on this newsgroup, so don't think I'm raising critiques of