Working with codes

bar...@zas.gwz-berlin.de

Jun 19, 2008, 7:43:29 AM

to chib...@googlegroups.com

Dear all,

For a study on anaphora, we are coding referring expressions in
children's narratives and I have some questions concerning the coding
line (we use %cod), as well as subsequent CLAN analyses.

First, is such a %cod line legal?
*CHI: der hund beisst sie in den schwanz.
%cod: der hund|S-DA:N-BL-V1:3-AS-DIR-hundC sie|O-PRO:PP-BL-V3:1-AS-DIS-katzeC den schwanz|NSO-DA:N-UBL-NV-IND-schwanzC

We use the minus symbol - for separating 7 levels of coding of each
referring expression, e.g., syntactic position, lexical realisation,
in/animacy, referent introduction vs. reference maintenance, etc. The
symbol : is used for separating sub-levels within each of the 7
superordinate levels.

Secondly, is there any possibility to link each referring expression
on the *CHI line with its coding on the %cod line? Provisionally, we
opted for typing the referring expression before the coding string,
e.g., 'die katze'.

Thirdly and most importantly, we want to conduct analyses concerning
the cooccurrence of elements within each coding string. For instance,
we want to investigate differences in children's realisation of
referents as a function of referent introduction vs. anaphoric
expression (reference maintenance). For that, we want to find a range
of cooccurrences such as the following:

DA:N and NV and IND
(where DA:N means definite article + noun, NV means referent
introduction, and IND means indirect anaphor)

I have tried COMBO, but either I don't understand the syntax of the
command line, or I am missing some important switch, or, well, I
don't know what.

Two things are very important for us in such searches:
- The search must be limited to each individual coding string and not be
based on the whole %cod line. For instance, when looking for the
cooccurrence of DA:N and DIS, CLAN should not find it in
the example above, since it doesn't occur in any of the 3 coding
strings. That is, for this concrete example, how can we ensure
that CLAN ignores the cooccurrence of DA:N for 'der hund' and
DIS for 'sie'?
- How can we get quantitative results from such searches? I
mean, in addition to the concrete hits shown in the output window,
it would be very important to have the number of cooccurrences found in
each CHAT file, as well as across all CHAT files that were searched.

I apologize if the answers to my questions are obvious or easy to
find in the CLAN manual. I have read the manual very carefully
before sending this query, but I don't seem to be able to find the
needed answers there.

Many, many thanks in advance for any hint.

Kind regards,
Susanna

*****************************************************************
Susanna Bartsch
https://www.zas.gwz-berlin.de/mitarb/homepage/bartsch/
bar...@zas.gwz-berlin.de
Zentrum fuer Allgemeine Sprachwissenschaft (ZAS)
Centre for General Linguistics
Schuetzenstr. 18
10117 Berlin
Germany
Tel. +49 (0)30 20 192 503
Fax +49 (0)30 20 192 402
*****************************************************************

Brian MacWhinney

Jun 20, 2008, 5:58:35 PM

to chib...@googlegroups.com

Dear Susanna,

Sorry about the delay in replying. I have been traveling. Let me try
to answer some of these questions below.

--Brian MacWhinney

On Jun 19, 2008, at 7:43 AM, bar...@zas.gwz-berlin.de wrote:

>
> Dear all,
>
> For a study on anaphora, we are coding referring expressions in
> children's narratives and I have some questions concerning the coding
> line (we use %cod), as well as subsequent CLAN analyses.
>
> First, is such a %cod line legal?
> *CHI: der hund beisst sie in den schwanz.
> %cod: der hund|S-DA:N-BL-V1:3-AS-DIR-hundC sie|O-PRO:PP-BL-V3:1-AS-DIS-katzeC den schwanz|NSO-DA:N-UBL-NV-IND-schwanzC
>

For dependent tier lines like %cod, pretty much everything is
legal, since the programs don't presume any particular structure on
this line. For these lines, the main issue is a practical one:
composing the +s switch when you need to do searching. Just make sure
that you can find the things you want to find by testing out some FREQ
or KWAL commands in advance.

> We use the minus symbol - for separating 7 levels of coding of each
> referring expression, e.g., syntactic position, lexical realisation,
> in/animacy, referent introduction vs. reference maintenance, etc. The
> symbol : is used for separating sub-levels within each of the 7
> superordinated levels.

This is fine. You will have to have search strings like
+s"*-*-*-*-BL-*" and such. Personally, I would find this confusing
and prone to error, but if you are good at asterisk counting, this
will work.
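
As a way of checking the coding outside CLAN, the coding strings can also be split into their parts with a few lines of Python. This is only a sketch and not part of CLAN: the function name parse_cod is hypothetical, and it assumes that every coding string has the form "expression|code-code-...", that the expression may contain spaces, and that the codes themselves contain no spaces. Note that in the example the third string has six hyphen-separated fields rather than seven, so the sketch does not rely on fixed positions.

```python
def parse_cod(cod_text):
    """Split the text of a %cod line into (expression, list of codes) pairs."""
    items, words = [], []
    for token in cod_text.split():
        if "|" in token:
            # The token holding "|" ends the referring expression.
            last_word, codes = token.split("|", 1)
            words.append(last_word)
            items.append((" ".join(words), codes.split("-")))
            words = []
        else:
            # A word of a multi-word expression such as "der hund".
            words.append(token)
    return items


COD = ("der hund|S-DA:N-BL-V1:3-AS-DIR-hundC "
       "sie|O-PRO:PP-BL-V3:1-AS-DIS-katzeC "
       "den schwanz|NSO-DA:N-UBL-NV-IND-schwanzC")

for expression, codes in parse_cod(COD):
    print(expression, len(codes), codes)
```

Printing the number of fields per string makes it easy to spot strings that do not have the expected seven levels before composing any search strings.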

>
>
> Secondly, is there any possibility to link each referring expression
> on the *CHI line with its coding on the %cod line? Provisionally, we
> opted for typing the referring expression before the coding string,
> e.g., 'die katze'.

Ah, herein lies the rub (somewhere in Shakespeare). You are basically
trying to construct something like the %mor line with its 1-to-1 match
to the main line. This is a great idea. However, the CLAN software
is not yet really ready for this. We are currently right in the
middle of implementing strict 1-to-1 matching between the %mor and the
main tier within the XML version of CLAN. Once this is finished then
"match" searches will work with the %mor line. At that point, it
would be relatively easy to extend this to a tier called %mat for a
user-defined matching tier. However, none of this will be ready until
later this year.

>
>
> Thirdly and most importantly, we want to conduct analyses concerning
> the cooccurrence of elements within each coding string. For instance,
> we want to investigate differences in children's realisation of
> referents as a function of referent introduction vs. anaphoric
> expression (reference maintenance). For that, we want to find a range
> of cooccurrences such as the following:
>
> DA:N and NV and IND
> (where DA:N means definite article + noun, NV means referent
> introduction, and IND means indirect anaphor)

I am not sure what you mean by "range" in your phrase "a range of
cooccurrences". However, finding *-DA:N-*^*^*-NV-* should be possible.

> I have tried COMBO, but either I don't understand the syntax of the
> command line, or I am missing some important switch, or, well, I
> don't know what.

You probably just have to play around to learn how to use COMBO.

>
>
> Two things are very important for us in such searches:
> - The search must be limited to each individual coding string and not be
> based on the whole %cod line. For instance, when looking for the
> cooccurrence of DA:N and DIS, CLAN should not find it in
> the example above, since it doesn't occur in any of the 3 coding
> strings. That is, for this concrete example, how can we ensure
> that CLAN ignores the cooccurrence of DA:N for 'der hund' and
> DIS for 'sie'?

That should be easy enough. In COMBO lines, it is the ^ that searches
across word boundaries. Just make sure that your search strings don't
include the ^. So, you want
*-DA:N-*-DIS-*
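
The difference between matching within one coding string and matching across the whole line can be illustrated in Python. This is a sketch of the idea only: it uses Python's fnmatch wildcards, which are not CLAN's own matcher, and the function name matching_strings is hypothetical. Each whitespace-separated token that contains "|" is treated as one coding string.

```python
from fnmatch import fnmatchcase

COD = ("der hund|S-DA:N-BL-V1:3-AS-DIR-hundC "
       "sie|O-PRO:PP-BL-V3:1-AS-DIS-katzeC "
       "den schwanz|NSO-DA:N-UBL-NV-IND-schwanzC")


def matching_strings(cod_text, pattern):
    """Return the coding strings (tokens containing "|") that match the pattern."""
    return [token for token in cod_text.split()
            if "|" in token and fnmatchcase(token, pattern)]


# Token by token: DA:N and DIS never occur in the same coding string.
print(matching_strings(COD, "*-DA:N-*-DIS-*"))   # []

# Token by token: DA:N followed by NV occurs once.
print(matching_strings(COD, "*-DA:N-*-NV-*"))    # ['schwanz|NSO-DA:N-UBL-NV-IND-schwanzC']

# Whole line at once: DA:N of 'der hund' and DIS of 'sie' are wrongly combined.
print(fnmatchcase(COD, "*-DA:N-*-DIS-*"))        # True
```

Two properties of such wildcard patterns are worth keeping in mind, at least as Python applies them: they depend on the order of the codes, and a pattern like "*-DA:N-*-NV-*" does not match when the two codes are directly adjacent ("-DA:N-NV-"), because the single hyphen between them cannot be used twice.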

>
> - How can we get quantitative results from such searches? I
> mean, in addition to the concrete hits shown in the output window,
> it would be very important to have the number of cooccurrences found in
> each CHAT file, as well as across all CHAT files that were searched.
>
> I apologize if the answers for my questions are obvious or easy to be
> found in the CLAN manual. I have read the manual very carefully
> before sending this query, but I don't seem to be able to find the
> needed answers therein.

I don't think you can really learn this stuff by reading the manual.
You just have to devote an hour or two to playing around with COMBO.
Think of it as a Bach theme with variations.
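
For the counts per file and across files, one option outside CLAN is a short Python script. The sketch below is an illustration under stated assumptions, not a CLAN feature: the names cod_tiers, count_hits and count_files are hypothetical, the files are assumed to be UTF-8 text, %cod tiers are assumed to start with "%cod:", and continuation lines are assumed to start with a tab. A hit is one coding string that contains all of the requested codes, in any order.

```python
from pathlib import Path


def cod_tiers(chat_text):
    """Yield the text of each %cod tier, joining tab-indented continuation lines."""
    tier = None
    for line in chat_text.splitlines():
        if tier is not None and line.startswith("\t"):
            tier += " " + line.strip()   # continuation of the current %cod tier
            continue
        if tier is not None:
            yield tier
            tier = None
        if line.startswith("%cod:"):
            tier = line[len("%cod:"):].strip()
    if tier is not None:
        yield tier


def count_hits(chat_text, codes):
    """Count coding strings that contain every code in `codes`."""
    wanted = set(codes)
    hits = 0
    for tier in cod_tiers(chat_text):
        for token in tier.split():
            if "|" in token and wanted <= set(token.split("|", 1)[1].split("-")):
                hits += 1
    return hits


def count_files(paths, codes):
    """Return ({file name: hits}, total hits) for the given CHAT files."""
    per_file = {str(p): count_hits(Path(p).read_text(encoding="utf-8"), codes)
                for p in paths}
    return per_file, sum(per_file.values())


SAMPLE = ("*CHI:\tder hund beisst sie in den schwanz.\n"
          "%cod:\tder hund|S-DA:N-BL-V1:3-AS-DIR-hundC sie|O-PRO:PP-BL-V3:1-AS-DIS-katzeC\n"
          "\tden schwanz|NSO-DA:N-UBL-NV-IND-schwanzC\n")

print(count_hits(SAMPLE, ["DA:N", "NV", "IND"]))  # 1
print(count_hits(SAMPLE, ["DA:N", "DIS"]))        # 0
```

For a folder of transcripts, something like count_files(sorted(Path("corpus").glob("*.cha")), ["DA:N", "NV", "IND"]) would give the per-file counts and the overall total, where "corpus" stands for whatever folder holds the files.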