Searching for Double Object Ditransitives

Liam Considine

Dec 21, 2011, 1:58:10 PM

to chibolts

Hey Chibolts Community,

I am working on extracting double object ditransitive occurrences from
the CHILDES corpus.

"John give me the cookie"

I've tried a handful of different searches on the %mor and %gra lines.
I would really appreciate it if people who are familiar with CLAN
syntax could check my searches. I have already made a separate search
for the prepositional dative, so I want this search to exclude those
instances.

Here is my %mor line attempt:
combo +t*CHI +t%mor +sv*^(pro*)^(det*+qn*+pro*)^(n*+pro*) +k +r2 +u *.cha
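To make the positional logic of this %mor pattern concrete, here is a rough Python approximation (not CLAN itself). It assumes a %mor tier is a whitespace-separated list of tokens such as `v|give pro:obj|me det|the n|cookie .`, treats `^` as "immediately followed by" and `+` as "or", and matches each item with shell-style wildcards. The function and constant names are hypothetical.

```python
from fnmatch import fnmatchcase

# Each slot lists wildcard alternatives ("+" in the CLAN pattern);
# consecutive slots must match adjacent tokens ("^" in the CLAN pattern).
MOR_PATTERN = [["v*"], ["pro*"], ["det*", "qn*", "pro*"], ["n*", "pro*"]]

def matches_mor_sequence(mor_tier, pattern=MOR_PATTERN):
    """Return True if some run of adjacent %mor tokens fits every slot in order."""
    tokens = mor_tier.split()
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(any(fnmatchcase(tok, alt) for alt in slot)
               for tok, slot in zip(window, pattern)):
            return True
    return False

print(matches_mor_sequence("v|give pro:obj|me det|the n|cookie ."))    # True
print(matches_mor_sequence("v|give pro:obj|me pro|it ."))              # False
print(matches_mor_sequence("v|give n:prop|Mommy det|the n|ball ."))    # False
```

Under these assumptions the pattern needs four adjacent items with a pronoun in second position, so a three-item double object like "give me it" or one whose first object is a noun falls through, if CLAN reads the pattern the same way.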

My first %gra line form:
combo +t*CHI +t%gra +s"1|0|ROOT^2|1|OBJ^((3|4|DET^4|1|OBJ2)+3|1|OBJ2)" +k +r2 +u *.cha

My best effort %gra line:
combo +t*CHI +t%gra +s"(1|2|SUBJ^2|0|ROOT^3|2|OBJ^((4|2|OBJ2)+(4|5|DET^5|2|OBJ2)))+(1|0|ROOT^2|1|OBJ^((3|4|DET^4|1|OBJ2)+3|1|OBJ2))" +k +r2 +u *.cha
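The %gra searches above spell out every word-position configuration by hand. A different technique, sketched here in Python rather than CLAN, is to read the %gra triples (index|head|relation) and look for any head that has both an OBJ and an OBJ2 dependent, wherever they sit. This assumes the GRASP relation labels OBJ and OBJ2 as used in the searches above; the function name and the sample tiers are hypothetical, and the result can only be as good as the GRASP tagging itself.

```python
from collections import defaultdict

def double_object_heads(gra_tier):
    """Return the indices of heads that take both an OBJ and an OBJ2 dependent."""
    relations = defaultdict(set)
    for item in gra_tier.split():
        _index, head, rel = item.split("|")  # e.g. "5|2|OBJ2"
        relations[int(head)].add(rel)
    return sorted(h for h, rels in relations.items() if {"OBJ", "OBJ2"} <= rels)

# "John give me the cookie ." with a hypothetical parse
print(double_object_heads("1|2|SUBJ 2|0|ROOT 3|2|OBJ 4|5|DET 5|2|OBJ2 6|2|PUNCT"))  # [2]
# "give the cookie to me": no OBJ2, so nothing is returned
print(double_object_heads("1|0|ROOT 2|3|DET 3|1|OBJ 4|1|JCT 5|4|POBJ"))  # []
```

Because it ignores word positions, this check does not need separate alternatives for subject/no subject or determiner/no determiner.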

I've selected the same data files from CHILDES as Anat Ninio does in
the book "Syntactic Development, Its Input and Output." This seems to
be about 75% of all the files available.

With my bigger %gra search I'm getting about 1075 hits. Is this
consistent with the frequency of occurrence others have seen? Does my
syntax have any glaring errors?

Thanks for all the time and energy,
Liam Considine

Brian MacWhinney

Dec 21, 2011, 3:39:35 PM

to ChiBolts, Liam Considine

Dear Liam,
The best way to do this would be to create a test file. That file would include
as much variation in the configuration of double object sentences as you can think of.
You would start by collecting about 60 such sentences by hand and eye from
various corpora. Then perhaps you would imagine some other possible combinations.
Then you would see if your search strings correctly located each occurrence.
If you can first do the work of composing a test file, we could go from there.
Regarding your %mor line attempt, I can easily think of many cases it would miss, such
as sentences with two nouns as objects. In theory the %gra line should be more definitive,
but tagging accuracy for objects there is only about 90%, so the GRASP tagger
will itself miss some cases.
Generally, this is probably going to take repeated work and testing.
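The test-file idea amounts to measuring recall: run a search over hand-collected examples and list what it misses. A minimal Python sketch, assuming the test file has been reduced to %mor tiers and reusing a positional matcher like the %mor search above (both the matcher and the test items are hypothetical illustrations, not CLAN output):

```python
from fnmatch import fnmatchcase

def make_sequence_matcher(slots):
    """Build a matcher for adjacent %mor tokens; each slot lists wildcard alternatives."""
    def match(mor_tier):
        tokens = mor_tier.split()
        for start in range(len(tokens) - len(slots) + 1):
            window = tokens[start:start + len(slots)]
            if all(any(fnmatchcase(tok, alt) for alt in slot)
                   for tok, slot in zip(window, slots)):
                return True
        return False
    return match

def recall_report(test_items, matcher):
    """Return (recall, missed items) for a search over known positive examples."""
    missed = [item for item in test_items if not matcher(item)]
    return 1 - len(missed) / len(test_items), missed

# Hypothetical test items: %mor tiers of hand-collected double-object utterances.
TEST_ITEMS = [
    "v|give pro:obj|me det|the n|cookie .",
    "v|give pro:obj|me pro|it .",
    "v|give n:prop|Mommy det|the n|ball .",
    "v|pass pro:obj|me pro:dem|that .",
]

matcher = make_sequence_matcher([["v*"], ["pro*"], ["det*", "qn*", "pro*"], ["n*", "pro*"]])
recall, missed = recall_report(TEST_ITEMS, matcher)
print(f"recall = {recall:.2f}")
for item in missed:
    print("missed:", item)
```

Each revision of the search can be rerun against the same items, so the repeated testing described above has a fixed yardstick.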

-- Brian MacWhinney

William Snyder

Dec 21, 2011, 11:34:32 PM

to chib...@googlegroups.com, Liam Considine

Dear Liam,

Some years ago, Karin Stromswold and I did a fairly fine-grained analysis of double-object datives in the longitudinal corpora that were available in CHILDES at that time.

[Snyder, William and Karin Stromswold. 1997. The structure and acquisition of English dative constructions. Linguistic Inquiry 28(2): 281-317.]

The type of approach we used is still an option for you.

In current terms, you would use the CLAN program 'freq' (with the +u switch) to get a list of all words used at least once by the child in a given corpus. (It would also be possible to combine corpora and run 'freq' on all the transcripts at once, to obtain a single list of words used at least once by at least one of the children you are studying.)
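To show what the word list in this first step amounts to, here is a rough Python stand-in for `freq` (not CLAN). It assumes plain-text CHAT transcripts whose child lines begin with `*CHI:`, lowercases words, does not strip CHAT special codes, and merges counts across all transcripts passed in, loosely like running `freq +u` over several files. The function name and tokenizer are illustrative assumptions.

```python
import re
from collections import Counter

# Rough tokenizer: runs of letters and apostrophes only.
WORD = re.compile(r"[A-Za-z']+")

def child_word_counts(transcripts, speaker="*CHI:"):
    """Merge word counts from the child's main-tier lines across several transcripts."""
    counts = Counter()
    for text in transcripts:
        for line in text.splitlines():
            if line.startswith(speaker):
                counts.update(w.lower() for w in WORD.findall(line[len(speaker):]))
    return counts

sample = "*CHI:\tgive me the cookie .\n*MOT:\tsay please .\n*CHI:\tgive me it ."
counts = child_word_counts([sample])
print(sorted(counts))  # ['cookie', 'give', 'it', 'me', 'the']
```

The sorted keys are the list you would then hand-code in the next step.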

The next step would be to hand-code that list to identify all words that either can function as a double-object verb in adult English or have a meaning that might tempt a child to use the double-object structure (in error).

The third step would be to enter the words in a text file, and use the CLAN program 'combo' to locate all child utterances that contain at least one of the words. Preferably you'd use the -w2 switch to get two lines of context for each match, so that you could easily identify and discard direct imitations of other speakers.
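The third step can be approximated in Python as well, as a sketch rather than a replacement for `combo`. It assumes a list of main-tier lines only (no %mor or %gra tiers), keeps the two preceding utterances as context, and flags a child utterance as a possible imitation when its words appear as a contiguous run in a preceding utterance by another speaker. That imitation heuristic and all names here are hypothetical.

```python
PUNCTUATION = {".", "?", "!"}

def utterance_words(line):
    """Words of a CHAT main-tier line, without the speaker code or end punctuation."""
    text = line.split(":", 1)[1]
    return [w for w in text.lower().split() if w not in PUNCTUATION]

def contains_run(haystack, needle):
    """True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))

def find_candidates(lines, candidate_words, context=2, speaker="*CHI:"):
    """Collect child utterances containing a candidate word, with preceding context."""
    hits = []
    for i, line in enumerate(lines):
        if not line.startswith(speaker):
            continue
        words = utterance_words(line)
        if not candidate_words.intersection(words):
            continue
        before = lines[max(0, i - context):i]
        imitation = any(contains_run(utterance_words(prev), words)
                        for prev in before if not prev.startswith(speaker))
        hits.append({"context": before, "utterance": line, "possible_imitation": imitation})
    return hits

lines = [
    "*MOT:\tcan you give me the cookie ?",
    "*CHI:\tgive me the cookie .",
    "*MOT:\tsay it again .",
    "*CHI:\tgive me the cookie .",
    "*CHI:\tmore juice .",
]
for hit in find_candidates(lines, {"give", "pass"}):
    print(hit["possible_imitation"], hit["utterance"])
```

Flagged hits still need the hand-coding in the fourth step; the flag only helps sort them.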

The fourth step would be to hand-code the matching utterances to identify the ones that are relevant to your project.

~~~

If you want to be sure to catch all the child's early uses of the double-object dative, including the errors that may have occurred, this strategy would (I think) be a reasonable way to go.

On the other hand, if you need speed more than a high level of accuracy, using the automatic parses in some way (preferably in the way that Brian recommended) could be a better choice for you.

~~~

With best wishes,

William

William Snyder

University of Connecticut

Liam Considine

Jan 4, 2012, 11:24:29 PM

to chibolts

Thanks for the generous contributions!

I've made a test file with two simple lines:
give^me^it
pass^me^that

I can see how finding a set of about 60 candidates would cover a
large share of the occurrences. Will, thanks for the tips about how to
make this process exhaustive and accurate.

I found it especially cool that the test file can contain %mor line
configurations, and that they can combine syntactic categories and
lexical items. This is a much better approach than making one search
with lots of parentheses and "+" (or) alternatives to cover
variations. One only needs to make a list of the searches.

Thanks again for the advice! I'm sure I'll be back around with some
more questions.
-Liam
