What's wrong with Gray & Atkinson, "Language-tree divergence times support the Anatolian theory of Indo-European origin" Nature 426 (11/27/03): 435-39? From the point of view of the linguist, a great deal. For starters, the article seems to assume what it is apparently trying to prove: that statistical methods developed for biological cladistics have something to tell us about language classification. (At least, that's what the companion essay, "Trees of life and of language," pp. 391-92, seems to claim; it's by David B. Searls, a "bioinformatics" researcher who works at a drug company.)

Secondly, the article is not about language. It's impossible to tell what data the authors used. In the abstract they refer to "a matrix of 87 languages with 2,449 lexical items." At the foot of col. 436a it's "a matrix of 87 Indo-European languages with 2,449 cognate sets coded as discrete binary characters." Under "Methods: Data and coding," col. 438a, it's "The presence or absence of words from each cognate set was coded as '1' or '0', respectively, to produce a binary matrix of 2,449 cognates in 87 languages." Now maybe to a pair of psychologists a "lexical item" is a "cognate set" is a "cognate," but to linguists those are three completely different things. If, as they claim, they used the 200-word Swadesh list, there should have been exactly 200 cognate sets, with exactly 17,400 lexical items (minus a few, since the source of their data, Dyen, Kruskal, and Black, had to leave a few slots blank).

Since DK&B compiled their data set to test lexicostatistical techniques in a way that would be useful to linguists investigating families with no written data from ages past, they used only modern spoken languages (G & A don't explain how their method identified Indo-Iranian but DK&B's method didn't). G & A were unhappy with this, so they "added three extinct IE languages, thought to fit near the base of the tree (Hittite, Tocharian A and Tocharian B).
Word form and cognacy judgements for all three languages were made on the basis of multiple sources to ensure reliability" (438a). No hint is given of what these "multiple sources" were, let alone what the data and "cognacy judgements" were. They claim they dealt with the absence of sufficient data from these three languages by changing "no match" to "uncertain," and that this change had no effect on the result (437a).

They claim that "there is considerable support for Hittite ... as the most appropriate root for IE" (437b). This is absurd, unless they are using "root" in some idiosyncratic way.

The rest of the text of the article concerns computational statistical methodology, which means nothing to me. The prettily colored diagram on p. 437 agrees in large part with standard charts of IE, which is not surprising, since the standard chart of IE seems to have been incorporated into their raw data -- the "Supplementary information" available only at the Nature website (and thus only to subscribers) comprises a list of 14 nodes with "age constraints." These are: Iberian-French, Italic-Romanian, Germanic, Welsh/Breton, Irish/Welsh, Indic, Iranian, Indo-Iranian, Slavic, Balto-Slavic, Greek split, Tocharic, Tocharian A & B, Hittite. Some of the "age constraints" are reasonable, some are absurdly broad. There's no indication of what hyphen vs. slash vs. ampersand means. I don't have the _slightest_ idea what figs. 1b-1e mean; they are all bell curve-shaped bar graphs plotting "age BP" against "Frequency" (it doesn't say frequency of what), and the four of them have different scales on the y-axis.

G & A's conclusion is that their tree somehow supports what used to be Renfrew's suggestion about the spread of agriculture into Europe. G & A provide no hypothesis about why we should suppose that the first farmers of Europe spoke IE languages.
Or why we should suppose that PIE is nearly 9000 years old -- if it were, the earliest attestations of it, all from the mid 2nd millennium BCE, would be far less similar to each other than they are, and it would be virtually impossible to assemble 17,000 -- or even 2500 -- "cognates," "cognate sets," or "lexical items."
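
For anyone who wants to check the Swadesh-list arithmetic in the objection above, here is a minimal sketch. The 87, 200 and 2,449 are the figures quoted from the article; the function and variable names are my own invention, and the calculation assumes one word per language per meaning with no blank slots, which is a simplification.

```python
# Figures quoted in the post above; the names are illustrative only.
LANGUAGES = 87            # languages in the G & A matrix
SWADESH_MEANINGS = 200    # meanings on the 200-word Swadesh list
REPORTED_COUNT = 2449     # the count G & A give for their matrix


def max_lexical_slots(languages: int, meanings: int) -> int:
    """Upper bound on word slots: one per language per meaning.

    Assumes no blank slots and no synonyms, which real data would not satisfy.
    """
    return languages * meanings


slots = max_lexical_slots(LANGUAGES, SWADESH_MEANINGS)
print("word slots (upper bound):", slots)
print("count reported by G & A: ", REPORTED_COUNT)
```

The first figure is the "exactly 17,400" mentioned above, before subtracting the slots DK&B left blank; the second is the number the article attaches, in turn, to "lexical items," "cognate sets," and "cognates."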