### Jaeggi 2011
Jaeggi's work with the University of Michigan is available as a preprint:
Jaeggi, Buschkuehl, Jonides & Shah. 2011. ["Short- and long-term
benefits of cognitive
training"](http://www.pnas.org/content/early/2011/06/03/1103228108.abstract)
([PDF](http://www.pnas.org/content/early/2011/06/03/1103228108.full.pdf))
> "We trained elementary and middle school children by means of a videogame-like working memory task. We found that only children who considerably improved on the training task showed a performance increase on untrained fluid intelligence tasks. This improvement was larger than the improvement of a control group who trained on a knowledge-based task that did not engage working memory; further, this differential pattern remained intact even after a 3-mo hiatus from training. We conclude that cognitive training can be effective and long-lasting, but that there are limiting factors that must be considered to evaluate the effects of this training, one of which is individual differences in training performance. We propose that future research should not investigate whether cognitive training works, but rather should determine what training regimens and what training conditions result in the best transfer effects, investigate the underlying neural and cognitive mechanisms, and finally, investigate for whom cognitive training is most useful."
It is worth noting that the study used Single N-back (visual). Unlike
Jaeggi 2008, "despite the experimental group’s clear training effect,
we observed no significant group × test session interaction on
transfer to the measures of Gf" (so perhaps the training was long
enough for subjects to hit their ceilings). The group which did n-back
could be split, based on final IQ & n-back scores, into 2 groups;
interestingly "Inspection of n-back training performance revealed that
there were no group differences in the first 3 wk of training; thus,
it seems that group differences emerge more clearly over time [first 3
wk: t(30) < 1; P = ns; last week: t(16) = 3.00; P < 0.01] (Fig. 3)." 3
weeks is ~21 days, longer than the 19 days which was the longest
training period in Jaeggi 2008.
It's also worth noting that Jaeggi 2011 seems to avoid Moody's most
cogent criticism, the speeding of the IQ tests; from the paper's
'Materials and Methods' section:
> "We assessed matrix reasoning with two different tasks, the Test of Nonverbal Intelligence (TONI) (23) and Raven’s Standard Progressive Matrices (SPM) (24). Parallel versions were used for the pre, post-, and follow-up test sessions in counterbalanced order. For the TONI, we used the standard procedure (45 items, five practice items; untimed), whereas for the SPM, we used a shortened version (split into odd and even items; 29 items per version; two practice items; timed to 10 min after completion of the practice items. Note that virtually all of the children completed this task within the given timeframe)."
The IQ results were, specifically, the control group averaged
15.33/16.20 (before/after) correct answers on the SPM and 20.87/22.50
on the TONI; the n-back group averaged 15.44/16.94 SPM and 20.41/22.03
TONI. A gain of ~1.5 questions rather than ~1 may not seem like much,
but the split groups look quite different: the 'small training gain'
n-back group actually fell on its second SPM and improved by <0.2
questions on the TONI, while the 'large training gain' group increased
>3 questions on both the SPM and TONI. The difference is less dramatic in
the followup 3 months later: the small group is now 17.43/23.43
(SPM/TONI), and the large group 15.67/24.67. Strangely, in the
followup, the control group has a higher SPM than the large group (but
not the small group), and a higher TONI than either group. (The
control group has higher IQ scores on both TONI & SPM in the followup
than the aggregate n-back group.)
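As a sanity check, the per-group gains implied by the means quoted above can be tabulated directly (a trivial arithmetic sketch; the numbers are just the means reported in this post):

```python
# Mean correct answers (pre, post) as quoted above; tabulate the raw gains.
scores = {
    "control": {"SPM": (15.33, 16.20), "TONI": (20.87, 22.50)},
    "n-back":  {"SPM": (15.44, 16.94), "TONI": (20.41, 22.03)},
}

for group, tests in scores.items():
    for test, (pre, post) in tests.items():
        print(f"{group:8s} {test:4s} gain: {post - pre:+.2f}")
```

Note that the aggregate n-back advantage shows up only on the SPM (+1.50 vs +0.87 for controls); the TONI gains of the two groups are essentially identical (+1.62 vs +1.63).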
Jaeggi 2011 has been discussed in mainstream media. From the _Wall
Street Journal_'s ["Boot Camp for Boosting
IQ"](http://online.wsj.com/article/SB10001424052702304432304576371462612272884.html):
> "...when several dozen elementary- and middle-school kids from the Detroit area used this exercise for 15 minutes a day, many showed significant gains on a widely used intelligence test. Most impressive, perhaps, is that these gains persisted for three months, even though the children had stopped training...these schoolchildren showed gains in fluid intelligence roughly equal to five IQ points after one month of training...There are two important caveats to this research. The first is that not every kid showed such dramatic improvements after training. Initial evidence suggests that children who failed to increase their fluid intelligence found the exercise too difficult or boring and thus didn't fully engage with the training."
From _Discover_'s blogs, ["Can intelligence be boosted by a simple
task? For some…"](http://blogs.discovermagazine.com/notrocketscience/2011/06/13/can-intelligence-be-boosted-by-a-simple-task-for-some/),
come additional details:
> She [Jaeggi] recruited 62 children, aged between seven and ten. While half of them simply learned some basic general knowledge questions, the other half trained with a cheerful computerised n-back task. They saw a stream of images where a target object appeared in one of six locations – say, a frog in a lily pond. They had to press a button if the frog was in the same place as it was two images ago, forcing them to store a continuously updated stream of images in their minds. If the children got better at the task, this gap increased so they had to keep more images in their heads. If they struggled, the gap was shortened.
>
> Before and after the training sessions, all the children did two reasoning tests designed to measure their fluid intelligence. At first, the results looked disappointing. On average, the n-back children didn’t become any better at these tests than their peers who studied the knowledge questions. But according to Jaeggi, that’s because some of them didn’t take to the training. When she divided the children according to how much they improved at the n-back task, she saw that those who showed the most progress also improved in fluid intelligence. The others did not. Best of all, these benefits lasted for 3 months after the training. That’s a first for this type of study, although Jaeggi herself says that the effect is “not robust.” Over this time period, all the children showed improvements in their fluid intelligence, “probably [as] a result of the natural course of development”.
>
> ...Philip Ackerman, who studies learning and brain training at the University of Illinois, says, “I am concerned about the small sample, especially after splitting the groups on the basis of their performance improvements.” He has a point – the group that showed big improvements in the n-back training only included 18 children....Why did some of the children benefit from the training while others did not? Perhaps they were simply uninterested in the task, no matter how colourfully it was dressed up with storks and vampires. In Jaeggi’s earlier study with adults, every volunteer signed up themselves and were “intrinsically motivated to participate and train.” By contrast, the kids in this latest study were signed up by their parents and teachers, and some might only have continued because they were told to do so.
>
> It’s also possible that the changing difficulty of the game was frustrating for some of the children. Jaeggi says, “The children who did not benefit from the training found the working memory intervention too effortful and difficult, were easily frustrated, and became disengaged. This makes sense when you think of physical training – if you don’t try and really run and just walk instead, you won’t improve your cardiovascular fitness.” Indeed, a recent study on IQ testing found that [scores reflect motivation as well as intelligence](http://blogs.discovermagazine.com/notrocketscience/2011/04/26/iq-scores-reflect-motivation-as-well-as-intelligence/).
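The adaptive mechanic described in the excerpt (the gap lengthens when children succeed, shortens when they struggle) amounts to a simple staircase procedure. A minimal sketch with hypothetical thresholds (the paper's exact accuracy criteria aren't quoted here):

```python
# Hypothetical staircase rule for an adaptive n-back task: accuracy
# thresholds of 90%/70% are invented for illustration.
def adjust_n(n, accuracy, raise_at=0.9, lower_at=0.7):
    """Raise the n-back level when the child does well, lower it when
    they struggle, never dropping below 1-back."""
    if accuracy >= raise_at:
        return n + 1
    if accuracy <= lower_at:
        return max(1, n - 1)
    return n

print(adjust_n(2, 0.95))  # good block: move up to 3-back
print(adjust_n(2, 0.50))  # poor block: drop to 1-back
print(adjust_n(2, 0.80))  # middling block: stay at 2-back
```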
--
gwern
http://www.gwern.net
--
You received this message because you are subscribed to the Google Groups "Dual N-Back, Brain Training & Intelligence" group.
To post to this group, send email to brain-t...@googlegroups.com.
To unsubscribe from this group, send email to brain-trainin...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/brain-training?hl=en.
I expected as much, though, so after pondering the strange IQ test
scores and followups and still not finding any convincing explanation,
I asked another group of people to look at it: LessWrong. I posted
links as a Discussion article:
http://lesswrong.com/lw/68k/nback_news_jaeggi_2011_or_is_there_a/
(Notice I didn't describe what misgivings I had and I specifically
asked people to read the paper *first*.)
Of the 18 comments (many more than here), none seemed to regard it as
even weak evidence, which is interesting. I'll quote some of the most
relevant comments since I know otherwise a lot of people won't bother
reading the link.
[Jonathan Graehl](http://lesswrong.com/lw/68k/nback_news_jaeggi_2011_or_is_there_a/4d34)
(who, incidentally, has expertise in probability & statistics;
http://www.isi.edu/~graehl/publications.html &
http://www.isi.edu/~graehl/CV.html) writes:
> My primary objection is: perhaps some of the students in both groups got smarter (these are 8-9 year olds and still developing) for reasons independent of the interventions, which caused them to improve on the n-back training task AND on the other intelligence tests (fluid intelligence, Gf). If you separated the "active control" group into high and low improvers post-hoc just like was done for the n-back group, you might see that the active control "high improvers" are even smarter than the n-back "high improvers". We should expect some 8-9 year olds to improve in intelligence or motivation over the course of a month or two, without any intervention.
>
> Basically, this result sucks, because of the artificial post-hoc division into high- and low- responders to n-back training, needed to show a strong "effect". I'm not certain that the effect is artificial; I'd have to spend a lot of time doing some kind of sampling to show how well the data is explained by my alternative hypothesis.
>
> It's definitely legitimate to look at the whole n-back group vs. the whole active control group. Those results there aren't impressive at all. I just can't give any credit for the post-hoc division because I don't know how to properly penalize it and it's clearly self-serving for Jaeggi. It's borderline deceptive that the graphs don't show the unsplit n-back population.
>
> It's unsurprising (probably offering no evidence against my explanation) that the initial average n-back score for the low improvers is higher than the initial average for the high improvers; this is what you'd expect if you split a set of paired samples drawn from the same distribution with no change at all, for example.
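Graehl's alternative hypothesis is easy to simulate. In the toy model below (all parameters invented), training has no causal effect at all; a single latent factor such as maturation or motivation drives gains on both tasks, yet the post-hoc split still makes the "high improvers" look like they transferred:

```python
import random

random.seed(0)

def simulate(n=32):
    """Null model: no causal effect of training. Each child has a latent
    improvement (maturation, motivation, etc.) that raises BOTH the
    n-back gain and the Gf gain; splitting post hoc on n-back gain then
    manufactures an apparent transfer effect."""
    kids = []
    for _ in range(n):
        latent = random.gauss(0, 1)               # shared improvement factor
        nback_gain = latent + random.gauss(0, 0.5)
        gf_gain = latent + random.gauss(0, 0.5)
        kids.append((nback_gain, gf_gain))
    kids.sort(key=lambda k: k[0])                 # rank by n-back improvement
    low, high = kids[:n // 2], kids[n // 2:]
    mean = lambda xs: sum(xs) / len(xs)
    return mean([g for _, g in low]), mean([g for _, g in high])

low_gf, high_gf = simulate()
print(f"mean Gf gain, low  n-back improvers: {low_gf:+.2f}")
print(f"mean Gf gain, high n-back improvers: {high_gf:+.2f}")
```

Under this null model the "high improvers" reliably show the larger Gf gain, exactly the pattern the paper reports, with no training effect anywhere in the simulation.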
[Douglas Knight](http://lesswrong.com/lw/68k/nback_news_jaeggi_2011_or_is_there_a/4d3s)
in replying to Jonathan notices the same problem I did in the IQ score
section:
> When you say that the aggregate results "aren't impressive," you imply that they are positive, but if I read table 1 correctly, the aggregate results are often negative.
[Unnamed](http://lesswrong.com/lw/68k/nback_news_jaeggi_2011_or_is_there_a/4d3h)
offers what seems like a pretty good summary:
> The result looks pretty weak. They had 62 kids. First, they gave all the kids a fluid intelligence test to measure their baseline fluid intelligence. Then half the kids (32) were given a month of n-back training (which the authors expect to increase their fluid intelligence) while the other half (30) did a control training which was not supposed to influence fluid intelligence. At the end of the month's training all of the kids took another fluid intelligence test to see if they'd improved, and 3 months later they all took a fluid intelligence test once more to see if they'd retained any improvement.
>
> The result that you'd look for with this design, if n-back training improves fluid intelligence, is that the group that did n-back training would show a larger increase in fluid intelligence scores from the baseline test to the test after training. They looked and did not find that result - in fact, it was not even close to significant (F < 1). That's the effect that the study was designed to find, and it wasn't there. So that's not a good sign.
>
> The kids who did n-back training did improve at the n-back task, so the authors decided to look at the data in another way - they divided the 32 kids in that group in half based on how much they had improved on the n-back task, and looked separately at the 16 who improved the most and the 16 who improved the least. The group of 16 high-improvers did improve on the fluid intelligence test, significantly more than the control group, and they retained that improvement on the follow-up test of fluid intelligence. That is the main result that the paper reports, which they interpret as a causal effect of n-back training. The 16 low-improvers did not have a statistically significant difference from the control group on the fluid intelligence test.
>
> But this just isn't that convincing a result, as the study no longer has an experimental design when you're using n-back performance to divide up the kids. If you give kids 2 intelligence tests (one the n-back task, one the fluid intelligence test), and a month later you give them both intelligence tests again, then it's not surprising that the kids who improved the most on one test would tend to also improve the most on the other test. And that's basically all that they found. Their study design involved training the kids on one of those two tests (n-back) during the month-long gap, but there's no particular reason to think that this had a causal effect on their improvement on the other test. There are plenty of variables that could affect intelligence test performance which would affect performance on both tests similarly (amount of neural development, being sick, learning disability, etc.).
>
> If there is a causal benefit of n-back, then it should show up in the effect that they were originally looking for (more fluid intelligence improvement in the group that did n-back training than the control group). Perhaps they'd need a larger sample size (200 kids instead of 62?) to find it if the benefit only happens to some of the kids (as they claim), but if some kids benefit from the training while others get no effect from it then the net effect should be a measurable benefit. I'd want to see that result before I'm persuaded.
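Unnamed's closing point about sample size can be roughed out with the standard normal-approximation formula for a two-sample comparison (the effect sizes here are hypothetical, chosen only to show the dilution):

```python
import math

def n_per_group(d, alpha_z=1.96, power_z=0.84):
    """Approximate per-group n for a two-sample test at two-sided
    alpha = 0.05 and 80% power: n = 2 * (z_a + z_b)^2 / d^2."""
    return math.ceil(2 * (alpha_z + power_z) ** 2 / d ** 2)

print(n_per_group(0.5))    # → 63  (responders-only effect)
print(n_per_group(0.25))   # → 251 (effect diluted over the whole group)
```

On these invented numbers, a responder-only effect of d = 0.5 diluted to d = 0.25 across the whole group pushes the required sample from ~63 to ~251 per group, which makes 62 children look badly underpowered.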
That would be interesting. If there had been transfer worth a damn,
the unspeeded IQ tests would have shown a rise and refuted Moody's
claim that the previous rises were more processing-related than
IQ-related.
As it is, the paper is just a Texas sharpshooter fallacy. No doubt if
the random variation had come out the other way, so that the kids with
lower initial TONI scores and lower n-back levels had the higher
post-training scores, we would see claims like 'the harder the task
for the subject, the more transfer we saw and it disproportionately
benefitted those with deficits'. No matter how one slices it, one can
justify it post hoc...
Which negative study does ad hoc splitting of groups in order to
eliminate an increase?
I don't know if that design would be good enough to advertise
n-backing to the entire world as an investment on par with, say,
iodizing one's salt. Many medical studies which were much larger than
600 subjects have been overturned by later studies. (Seriously, does
no one read my footnotes? My skepticism is on much solider grounds
than anyone here's enthusiasm: http://www.gwern.net/DNB%20FAQ#fn43 )
Well, more studies is usually better (unless there's stuff like
publication bias involved, in which case you can be led arbitrarily
far away from the truth - imagine a drug company running thousands of
studies and only publishing the p=0.001 hits. With every study you
become more confident that the drug works...)
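The drug-company hypothetical can be made concrete with a small simulation (all numbers invented): every study is a true null, only positive "significant" results get published, and the published literature nonetheless shows a consistent effect:

```python
import math
import random
import statistics

random.seed(1)

def run_study(n=50):
    """One trial of a drug with ZERO true effect; returns the observed
    mean difference and a two-sided normal-approximation p-value."""
    drug = [random.gauss(0, 1) for _ in range(n)]
    placebo = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(drug) - statistics.mean(placebo)
    z = diff / math.sqrt(2 / n)            # known sigma = 1
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return diff, p

# Run 1,000 null studies, "publish" only the positive significant ones.
published = [diff for diff, p in (run_study() for _ in range(1000))
             if p < 0.05 and diff > 0]
print(f"published {len(published)}/1000 studies, "
      f"mean published effect = {statistics.mean(published):.2f}")
```

Pooling only the published studies yields a uniformly positive "effect" for a drug that does nothing, and adding more such studies only hardens the error.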
Meta-analysis is very hard though, so I think it's generally better to
have one large study than a bunch of smaller perhaps incomparable
ones.
Functional correlations between the BOLD signal obtained from the working memory task (n-back) and intelligence factors were mostly detected in the right prefrontal and bilateral parietal cortices. All these correlations were negative, indicating that subjects with high intelligence factor scores had less activation during the n-back memory task, in support of the efficiency model of brain function ([Haier, 1993], [Haier et al., 1988] and [Neubauer and Fink, 2009]). Gray et al. (2003) reported that, on more demanding n-back conditions, participants with higher intelligence scores were more accurate and showed greater activity in several frontal and parietal regions. It should be noted that the focus of their analysis was based on an event-related design but their report also included the results using a block design which showed a trend of lower activity with higher scores on a single test of fluid intelligence. Waiter et al. (2009) did not find significant correlations between individual differences in brain activity during an n-back task and intelligence scores. Activation levels are known to fluctuate across working memory loads in an inverted-U shape response (Callicott et al., 1999). The position of the inverted-U can shift depending on the working memory capacity of the group or individual (Callicott et al., 2003). Waiter et al. (2009) used a simple version of the n-back task with only 0- and 2-back levels. The limited levels and range of task difficulties in their study may have created confounds related to shifts in the inverted-U curve. In addition, their study focused on elderly (mid to late 60-years-old) subjects, a population that shows a decrease in working memory capacity and related neurophysiology (Mattay et al., 2006) as well as a wide variability among individuals in the extent, rate and pattern of age-related changes that are exhibited at both neural and behavioral levels (Hedden & Gabrieli, 2004).
For the current study, a greater range of working memory task levels was used on a younger population, which could help avoid inconsistencies due to shifts of the inverted-U. This method does not completely avoid the potential anomalies, as equal sampling of both sides of the inverted-U is not guaranteed, but is less risky than sampling only one level of working memory difficulty. The optimal method would include enough levels and range in the task difficulty to determine the point of peak activation for each subject.
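The sampling confound described above can be illustrated with a toy inverted-U model (the quadratic form and every parameter are invented): if activation peaks near an individual's working memory capacity, sampling a single load level can reverse the apparent group ordering.

```python
# Toy inverted-U: activation is maximal when task load matches an
# individual's working memory capacity, falling off quadratically on
# both sides. The peak height and falloff rate are invented.
def activation(load, capacity, peak=1.0):
    """BOLD-like activation as a function of working memory load."""
    return peak - 0.1 * (load - capacity) ** 2

low_wm, high_wm = 2.0, 4.0   # hypothetical capacities, in n-back levels

# Sampled only at 2-back, the low-capacity subject shows MORE activation:
print(activation(2, low_wm) > activation(2, high_wm))   # → True
# Sampled at 4-back instead, the ordering flips:
print(activation(4, low_wm) > activation(4, high_wm))   # → False
```

This is why sampling only 0- and 2-back, as in Waiter et al. (2009), cannot distinguish a genuine efficiency difference from a shifted curve.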
Our n-back results are in agreement with the single published study aimed at quantifying the neuro-anatomical overlap between the general factor of intelligence (g) and working memory capacity (Colom, Jung, & Haier, 2007). That study showed that a common neuro-anatomic framework for these constructs implicates mainly frontal gray matter regions belonging to the right superior frontal gyrus, the left middle frontal gyrus, and the right inferior parietal lobule. These findings (a) were thought to support the role of a discrete parieto-frontal network, as proposed by the P-FIT model, and (b) were consistent with Cowan's (2005) theory which distinguished a capacity limit (related to parietal regions) and the control of attention (related to frontal areas). It was suggested that capacity limits and attention control relate to the commonality between intelligence and working memory. We also note that we found no correlations between our memory factor and any fMRI activations, possibly because the factor was derived as a broader assessment of memory than the more focused processes required for the n-back task, although this is not determined.