In response to work that Johannes Wagner and Lone Laursen are
doing on a gold standard corpus for Danish CA-CHAT transcription, we
have modified three things in CHAT and I am interested in getting
reactions to these changes. Before going into the details, let me
emphasize that Lone and Johannes are trying to avoid breaking up
TCUs. Because non-CA CHAT tends to break up TCUs, it is important to
provide ways of avoiding TCU breakup for CA transcription.
1. Earlier, we had introduced the triple wavy mark as the indicator
of a continuation of a TCU across an interruption from another
speaker. See line 9 at http://talkbank.org/CABank/codes.html. This
is not the mark of basic speaker-internal latching, but rather of TCU
continuation. However, initially we used the same triple wavy symbol
at the end of the first segment of the TCU and the beginning of the
second segment. This is not good for computational analysis, so we
have now added a plus before the triple wavy to mark the second case.
2. It is often the case that a single TCU includes a variety of
phrases that are marked with one of the six final contours given in
rows 3-7 at http://talkbank.org/CABank/codes.html. However, in order
to make it clear to the computer that these marks are not the ends of
the TCU, we are adding a comma after them when they are not TCU final.
3. It is often the case that a speaker will continually mark
acknowledgement, assent, or coparticipation with short forms such as
"ja" "mhmm" or "uhhuh" that punctuate a longer TCU by the other
speaker. Interrupting the ongoing TCU with overlap marking and new
lines for this can be cumbersome. So, we have added a new code to CHAT that
allows for in-line marking of these acknowledgments. The form is
&*SAM=yeah, where SAM is the code name for the speaker; if the
speaker's code were just S, the form would be &*S=yeah. This code is
placed directly after the word with which it overlaps.
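For anyone who wants to handle the in-line acknowledgment code from
item 3 by program, here is a minimal Python sketch. It is only an
illustration under stated assumptions, not a description of how any
existing tool treats the code: the function name, the regular
expression, and the example utterance are all made up, and the sketch
assumes that tokens are separated by whitespace and that speaker codes
are plain letters or digits.

```python
import re

# Assumed shape of the in-line marker: "&*", a speaker code, "=", and the
# acknowledgment form. The character classes are guesses for illustration.
ACK = re.compile(r"&\*(?P<speaker>[A-Za-z0-9]+)=(?P<form>\S+)")


def split_acknowledgments(utterance):
    """Separate in-line acknowledgments from the main speaker's words.

    Returns (words, acks). Each ack is a tuple of
    (speaker code, acknowledgment form, overlapped word), where the
    overlapped word is the token directly before the marker, or None
    if the marker comes first.
    """
    words = []
    acks = []
    for token in utterance.split():
        match = ACK.fullmatch(token)
        if match:
            overlapped = words[-1] if words else None
            acks.append((match.group("speaker"), match.group("form"), overlapped))
        else:
            words.append(token)
    return words, acks


# Made-up utterance, used only to show the call.
words, acks = split_acknowledgments("so then we went &*SAM=yeah to the store .")
print(words)
print(acks)
```

Keeping the acknowledgment paired with the word before it preserves
the overlap information, while the main speaker's TCU can still be
read back as one unbroken string of words.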
I would be interested in comments regarding any of these additions to
CHAT. Many thanks.
-- Brian MacWhinney