scoping of a clause delimiter

Monika Bader

May 8, 2019, 7:48:29 AM

to chibolts

Hi,

we are trying to decide on the best way to code for clauses. The manual suggests using a clause delimiter, and we quite like this option (especially the possibility of creating user-defined codes). However, we are somewhat worried about the scoping of the symbol. We understand that for some analyses, such as MLU/MLT based on clauses, this is not a crucial issue, but we believe that some other analyses would need the right scoping (if we are not mistaken): for instance, calculating words per error-free clause (or per any other clause code one uses). In examples such as

The book [that you buyed yesterday] [^c err] has disappeared [^c]

[^c err] would scope over "the book" as well, which we wouldn't necessarily want to include. In some languages these kinds of nested/center-embedded clauses are more common than in others. The manual says that "it is not necessary to mark the scope", but is it possible? Or is there any other way to deal with cases such as these?
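To make the scoping concern concrete, here is a minimal Python sketch. It is not a CLAN command; it assumes, as described above, that each [^c ...] delimiter covers every word since the previous delimiter (or the start of the utterance). The function names are hypothetical, and the informal square brackets around the relative clause are left out of the input string.

```python
import re

# Matches a clause delimiter such as [^c] or [^c err]; the optional label is captured.
CLAUSE_MARK = re.compile(r"\[\^c(?:\s+([^\]]+))?\]")

def split_clauses(utterance):
    """Return (label, words) pairs, one per delimiter.

    ASSUMPTION: a delimiter scopes over all words since the previous
    delimiter or the start of the utterance.
    """
    clauses = []
    start = 0
    for m in CLAUSE_MARK.finditer(utterance):
        words = utterance[start:m.start()].split()
        clauses.append((m.group(1) or "", words))
        start = m.end()
    return clauses

def words_per_clause(clauses, exclude_label="err"):
    """Mean number of words in clauses whose label is not exclude_label."""
    kept = [words for label, words in clauses if label != exclude_label]
    if not kept:
        return 0.0
    return sum(len(words) for words in kept) / len(kept)

example = "The book that you buyed yesterday [^c err] has disappeared [^c]"
clauses = split_clauses(example)
for label, words in clauses:
    print(label or "(none)", words)
print(words_per_clause(clauses))
```

Under that assumption "The book" lands in the err clause, so the error-free main clause is counted as only "has disappeared".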

We appreciate any suggestions!

Best,

Monika

Brian MacWhinney

May 8, 2019, 10:01:54 AM

to ChiBolts

Dear Monika,

I am not familiar with work that calculates MLU based on clauses and I am not sure why one would want to use such a measure. The major point of MLU is to consider the extent to which speakers compose more complex sentences and the act of breaking up sentences into clauses would actually remove the thing that it is trying to measure.

As you say, this system of clause marking is definitely not going to work well for center embedding. You could get the scope of the embedded clause, but then the main clause would be broken up. But perhaps that is interesting in itself.

I am curious why you are using this type of analysis. What exactly are you interested in measuring? It seems to me that, if you have a relatively accurate %gra line for an utterance, then that could be more useful than hand-done clause marking.

--Brian MacWhinney

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/9212b6f1-ad4c-4183-b277-45711661af3c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Monika Bader

May 9, 2019, 8:11:13 AM

to chibolts

Dear Brian,

thanks for responding so quickly! We are working with written texts produced by young foreign language learners and want to code both for clauses and T-units/C-units to get ratios such as clauses per T-unit (a measure for which scoping is not crucial). We are relatively new to CLAN and had a tutorial with Victoria Johansson from Lund University, where they used CLAN for a big project on language development. They used the following strategy to encode T-units and clauses: clauses were placed on separate chat lines, while T-units were separated using @EndTurn. Center-embedded clauses (which there are many of in Swedish) were repeated on a separate subordinate tier, named %ces.

After reading the CHAT/CLAN manuals, we realized that an alternative would be to place T-units on separate chat lines and use [^c] to code for clauses. We have since been grappling with the possible consequences of choosing one strategy over the other, so if you have any thoughts on this, we would be happy to get your advice. We want to share the corpus with our students later for further analyses (and further coding), so we are trying to think carefully about the different options.

So far we are leaning towards the [^c] strategy. Though T-units have been used in the literature to measure development, we are also interested in conducting a more detailed analysis and investigating what kinds of clauses and structures are inside those T-units, to better understand what our learners can do and which clauses and structures they rely on. We are also interested in investigating the extent to which they use what LGSWE calls syntactic nonclausal structures (which seem to be quite common in our data set). We have the impression that a lot of this information can be encoded by modifying [^c].

It seems to us that there might be issues related to scoping if one then wanted to limit the search to features related to subtypes of [^c] (for instance, the number of errors, or certain types of errors, in certain types of clauses, such as relative clauses). It is possible that, as you suggest, these are better handled by relying on the %gra line, though I think we would have to conduct an extensive manual error analysis if we wanted to get a relatively accurate %gra line.
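As a rough illustration of the second strategy (one T-unit per main line, clauses marked with [^c]), the Python sketch below computes clauses per T-unit and tallies clause subtypes. It is only a sketch under stated assumptions: the speaker code *STU, the subtype label "rel", and the function name are all hypothetical, and every main-tier line is taken to be exactly one T-unit.

```python
import re
from collections import Counter

# Matches [^c] or a modified delimiter such as [^c rel]; the label is optional.
CLAUSE_MARK = re.compile(r"\[\^c(?:\s+([^\]]+))?\]")

def clause_stats(lines):
    """Return (clauses per T-unit, Counter of clause labels).

    ASSUMPTION: each main-tier line (starting with '*') is one T-unit.
    Header lines (@...) and dependent tiers (%...) are skipped.
    """
    t_units = 0
    labels = Counter()
    for line in lines:
        if not line.startswith("*"):
            continue
        t_units += 1
        for m in CLAUSE_MARK.finditer(line):
            labels[m.group(1) or "plain"] += 1
    total = sum(labels.values())
    ratio = total / t_units if t_units else 0.0
    return ratio, labels

# Hypothetical transcript fragment; "rel" is an invented user-defined code.
sample = [
    "@Begin",
    "*STU:\tI like the dog [^c] that lives next door [^c rel] .",
    "*STU:\tit is big [^c] .",
    "@End",
]
ratio, labels = clause_stats(sample)
print(ratio)
print(dict(labels))
```

Counts and ratios like these do not depend on scoping. Anything that needs the words inside a particular clause type does, which is where center-embedded clauses cause trouble.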

Best,

Monika

brielle.stark

Jan 10, 2020, 11:46:49 AM

to chibolts

I just asked a similar question, I believe, namely how to count certain clauses. Perhaps we're on the same train of thought.

For instance, we're using the Peer Conflict Resolution task (https://pubs.asha.org/doi/full/10.1044/1058-0360%282007/022%29), and its main outcome measures are (1) mean length of T-unit, (2) clausal density (based on the number of independent, nominal, relative and adverbial clauses) and (3) overall subordinate clause use. It's really quite difficult to train raters to code each clause type within an already heavily coded sample (we're doing typical word-level codes and utterance-level codes as well as correct information unit [e] codes), so I wondered whether there is something in the %gra dependencies that we could extract to get this information without hand-coding. Perhaps this is similar to the answer Monika is looking for?
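One possible way to approach this, sketched in Python under loud assumptions: a %gra tier is read as space-separated index|head|RELATION triples, and relations with clause-level labels are tallied. The default label set below (CSUBJ, COMP, CMOD, CJCT) is my assumption about which GRASP relations mark finite subordinate clauses and should be checked against the current MOR/GRASP documentation. The example tier is written by hand for illustration and is not real parser output.

```python
from collections import Counter

# ASSUMPTION: relation names treated as finite subordinate-clause markers.
# Verify against the GRASP relation inventory before relying on this set.
DEFAULT_CLAUSAL = frozenset({"CSUBJ", "COMP", "CMOD", "CJCT"})

def count_clausal_relations(gra_line, clausal=DEFAULT_CLAUSAL):
    """Count clause-level relations on one %gra tier line."""
    body = gra_line
    if body.startswith("%gra:"):
        body = body[len("%gra:"):]
    counts = Counter()
    for token in body.split():
        parts = token.split("|")
        if len(parts) != 3:
            continue  # skip anything that is not index|head|RELATION
        relation = parts[2]
        if relation in clausal:
            counts[relation] += 1
    return counts

# Hand-written tier for "the book that you bought yesterday disappeared ."
gra = "%gra:\t1|2|DET 2|7|SUBJ 3|5|LINK 4|5|SUBJ 5|2|CMOD 6|5|JCT 7|0|ROOT 8|7|PUNCT"
counts = count_clausal_relations(gra)
print(dict(counts))
```

How well this works depends entirely on the accuracy of the %gra line, and mapping relation labels onto the nominal/relative/adverbial categories would still need to be validated against hand-coded samples.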

Best,

Brie

brielle.stark

Jan 10, 2020, 11:47:36 AM

to chibolts

I should also say that we typically code C-units and not T-units, and we're hoping not to have to code two types of utterances (i.e., not to have to teach raters both T- and C-unit delineation).

Brie
