Dear ChiBolts,
I am replying here to a message that Darinka posted to info-childes. Because this is a message about the functioning of CLAN programs, it is best discussed on chibolts instead. Here is her message and my answer:
Dear colleagues,
I wanted to match codes on the %mor tier to words on the main line and I used MODREP for it:
MODREP +u +k +r2 +b%mor +c* +t@ID=2.*.*.*.*.* +o"v|*" *.mor
But then I found out that the output of this command does not give the same frequencies for codes as I get from FREQ on the %mor tier. MODREP gave systematically larger frequencies than FREQ.
I had expected them to be exactly the same. Do I misunderstand something?
Darinka Andjelkovic
Laboratory of Experimental Psychology and Institute of Psychology
Faculty of Philosophy
University of Belgrade
Serbia
Dear Darinka,
The difference arises from the fact that the %mor line excludes repetitions and several other things from the main line. The goal of the %mor line is to analyze morphology and syntax, and excluding repetitions helps with automatic part-of-speech tagging. The %mod line, on the other hand, is pegged to the main line in a way that includes all words, including repetitions. Therefore, it will have larger numbers than a FREQ on the %mor line. You can modify what FREQ looks at on the main line by using the +r6 switch to exclude repetitions. You might check that. In general, few people have used the %mod line as the target for FREQ, so we haven't paid much attention to the details of what is there.
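
To make the size of the gap concrete, here is a rough Python sketch of the counting difference. It is not how CLAN does it internally; the utterance, the %mor tier, and the function name main_line_words are made up for illustration, and the pattern only handles the simple [/], [//] and [///] retracing markers.

```python
import re

# Retraced material: either <a group of words> or a single word,
# followed by a retracing marker [/], [//] or [///].
RETRACE = re.compile(r"(?:<[^>]*>|\S+)\s*\[/{1,3}\]")
PUNCTUATION = {".", "?", "!"}


def main_line_words(utterance, include_retraced):
    """Return the words on a main line, with or without retraced material."""
    text = utterance
    if not include_retraced:
        # Drop the retraced words together with their marker.
        text = RETRACE.sub(" ", text)
    # Remove any remaining bracketed markers and the angle brackets themselves.
    text = re.sub(r"\[[^\]]*\]", " ", text)
    text = text.replace("<", " ").replace(">", " ")
    return [tok for tok in text.split() if tok not in PUNCTUATION]


# Hypothetical utterance with a repeated phrase, and a matching %mor tier
# that leaves the repetition out.
main = "<I want> [/] I want a cookie ."
mor = "pro|I v|want det|a n|cookie ."

with_repetitions = main_line_words(main, include_retraced=True)
without_repetitions = main_line_words(main, include_retraced=False)
mor_items = [tok for tok in mor.split() if tok not in PUNCTUATION]

print(len(with_repetitions), len(without_repetitions), len(mor_items))
```

In this toy case, a count that follows every word on the main line sees six words, while the %mor tier has only four items, which is the kind of mismatch described above.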
-- Brian MacWhinney