Women, men have different writing styles, computer
program finds
By Robert S. Boyd
Knight Ridder Newspapers
WASHINGTON - A new computer program can determine the
sex of an author by detecting subtle differences in the
words men and women prefer to use.
For instance, female writers tend to choose grammatical
terms that apply to personal relationships, such as
"for" and "with," more frequently than men do.
"Women have a more interactive style," said Shlomo
Argamon, a computer scientist at the Illinois Institute
of Technology in Chicago who developed the program.
"They want to create a relationship between the writer
and the reader."
Men, on the other hand, use more numbers, adjectives
and determiners - words such as "the," "this" and
"that" - because they apparently care more than women
do about conveying specific information.
Argamon said the intent of male writers often was to
say: "Here's something I want to tell you about, and
here are some things about it."
Women, he found, write the pronoun "she" more often
than men do, although both sexes use "he" about
equally.
Argamon said it wasn't clear what psychological or
sociological differences between men and women might
explain their different writing styles. "It's a subject
for further research," he said.
Other experts, such as Deborah Tannen, a linguistics
professor at Georgetown University in Washington, have
popularized the idea that men and women have different
communication styles. But Argamon's work is the first
to show such distinctions in writing.
"This is surprising, since, unlike conversation,
writing a book or an article does not involve direct
social interaction," he said.
Argamon claimed his program correctly determined the
sex of the author in 80 percent of the works it
checked. One it missed was A.S. Byatt's best-selling
novel, "Possession." The computer said it was written
by a man; Byatt is a woman. On the other hand, Michael
Frayn's science fiction tale, "A Landing on the Sun,"
was misidentified as the work of a woman.
Argamon's gender program is part of a much broader
technique called "stylometry," which analyzes styles
not only of writing, but also of music, graphics, art
and architecture.
A practical application of stylometry, he said, would
be to identify writers of anonymous communications,
such as the Unabomber, on the basis of their writings.
The Unabomber, whose 17-year terrorism spree ended in
1995, was identified as Theodore Kaczynski only after
his brother, David, compared the 35,000-word manifesto
with Kaczynski's known writings.
Similarly, Donald Foster, a professor of literature at
Vassar College in Poughkeepsie, N.Y., unmasked
political columnist Joe Klein as the anonymous author
of the popular Clinton-era novel "Primary Colors."
Without using a computer, Foster laboriously compared
the style of the book with Klein's other writings.
Boulder, Colo., prosecutors hired Foster in 1998 in an
effort to identify the writer of the ransom note in the
unsolved JonBenet Ramsey murder case. He reportedly
determined that a woman wrote the note, but authorities
refused to confirm that.
For years, scholars have debated whether William
Shakespeare wrote a 17th-century play called "Two Noble
Kinsmen."
"These computer techniques may eventually be able to
provide us with answers to these kinds of questions,"
Argamon said.
To carry out the project, Argamon and his colleagues
analyzed the texts of 566 British books and articles,
both fiction and nonfiction, taken from a huge computer
database known as the British National Corpus.
From that mass of almost 20 million words, a computer
extracted 1,081 distinctive "features," such as
prepositions, pronouns and adjective phrases. It
checked the use of different verb forms such as "go"
and "going." It even counted punctuation marks such
as dashes and exclamation marks.
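For readers curious what counting such features might look like, here is a minimal Python sketch. It is not the researchers' code. The word list is a small illustrative sample taken from the words quoted in this article, and the names FUNCTION_WORDS, PUNCTUATION and extract_features are hypothetical. It simply reports how often each word or punctuation mark appears per 1,000 words of text.

```python
import re
from collections import Counter

# Illustrative sample only: these are words quoted in the article,
# not the study's actual list of 1,081 features.
FUNCTION_WORDS = ["the", "a", "as", "that", "one",
                  "she", "for", "with", "in", "and", "not"]
PUNCTUATION = ["-", "!"]  # dashes and exclamation marks


def extract_features(text):
    """Return each feature's frequency per 1,000 word tokens (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty text
    word_counts = Counter(words)
    features = {w: 1000.0 * word_counts[w] / total for w in FUNCTION_WORDS}
    for mark in PUNCTUATION:
        features[mark] = 1000.0 * text.count(mark) / total
    return features


sample = "She went with the dog. That was the one!"
feats = extract_features(sample)
for name, rate in feats.items():
    if rate > 0:
        print(f"{name!r}: {rate:.1f} per 1,000 words")
```

Rates are normalized by text length so that a long book and a short article can be compared on the same scale. Whether the study normalized its counts this way is an assumption.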
After running repeatedly through these features, the
computer winnowed the list down to 128 significant
contrasts. The results showed that the words favored
most heavily by men were what grammarians call
determinative words such as "the," "a," "as," "that"
and "one." Female writers favored "she" and
relationship words such as "for," "with," "in," "and"
and "not."
When Argamon then tested his program on other texts, it
succeeded 80 percent of the time in identifying the sex
of an anonymous writer.
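The 80 percent figure is a success rate on texts the program had not seen before. A minimal sketch of that arithmetic, using made-up labels rather than any data from the study, might look like this in Python. The function name accuracy and the sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of texts whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one text to score")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up example: five held-out texts, one of them misclassified.
predicted = ["F", "M", "M", "F", "M"]
actual = ["F", "M", "F", "F", "M"]
score = accuracy(predicted, actual)
print(f"{score:.0%} of authors identified correctly")
```

Scoring only on texts that were set aside during development matters because a program can otherwise appear accurate merely by memorizing the examples it was built from.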
Argamon and fellow researchers Moshe Koppel and Anat
Shimoni published a report on their work in the April
edition of the journal Literary and Linguistic
Computing.
"This paper has presented convincing evidence of a
difference in male and female writing styles in modern
English books and articles," Argamon concluded. "Such a
difference is sufficiently pronounced that it can be
exploited for automated text classification with
accuracy of approximately 80 percent (and higher in
some cases)."
To see the list of 566 books and 1,081 "features" used
in this study, go to:
http://shekel.jct.ac.il/~argamon/gender-style