Dear Brian,
Thanks for responding so quickly! We are working with written texts produced by young foreign language learners and want to code for both clauses and T-units/C-units to get ratios such as clauses per T-unit (a measure for which scoping is not crucial). We are relatively new to CLAN and had a tutorial with Victoria Johansson from Lund University, where CLAN was used for a big project on language development. They used the following strategy to encode T-units and clauses: clauses were placed on separate CHAT lines, while T-units were separated using @EndTurn. Center-embedded clauses (of which there are many in Swedish) were repeated on a separate subordinate tier, named %ces.
After reading the CHAT/CLAN manuals, we realized that an alternative would be to place T-units on separate CHAT lines and use [^c] to code for clauses. We have since been grappling with the possible consequences of choosing one strategy over the other, so if you have any thoughts on this, we would be happy to get your advice. We want to share the corpus with our students later for further analyses (and further coding), so we are trying to think carefully about the different options.
So far we are leaning towards the [^c] strategy. Although T-units have been used in the literature to measure development, we are also interested in a more detailed analysis: investigating what kinds of clauses and structures occur inside those T-units, to better understand what our learners can do and which clauses and structures they rely on. We are also interested in the extent to which they use what LGSWE calls syntactic nonclausal structures (which seem to be quite common in our data set). Our impression is that a lot of this information can be encoded by modifying [^c]. It seems to us, however, that there might be scoping issues if one then wanted to restrict a search to features within particular subtypes of [^c] (for instance, the number of errors, or of certain types of errors, in certain types of clauses, such as relative clauses). It is possible that, as you suggest, these are better handled by relying on the %gra line, though I think we would have to conduct an extensive manual error analysis if we wanted to get a relatively accurate %gra line.
Best,
Monika