Hi all,
I'm running some CLAN FREQ searches for my MA thesis and could use advice on how to handle German verbs on the %mor tier. I'm trying to calculate the frequency of transitive verbs in German child-directed speech, but I'm running into problems with separable-prefix verbs.
On the %mor tier, I've noticed that separable-prefix verbs appear with the correct lemma (e.g., anrufen) when they are in the infinitive. But in finite forms where the prefix is separated, the prefix is often tagged as an adverb/particle (e.g., adv|an v|rufen). Because of this, it's hard for me to get accurate lemma counts.
I’ve also come across instances where multiword sequences such as darunter bauen (“build underneath it”) are analyzed as a pseudo-lemma like darunterbauen.
- Is there a way in CLAN/MOR to consistently output the full verb lemma (prefix + stem) without going line by line?
- Is there a recommended process to recombine prefix + stem (but avoid false lemmas like darunterbauen)?
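To make the second question concrete, this is roughly the kind of post-processing I have in mind, as a rough Python sketch outside of CLAN. It is not a CLAN or MOR feature. It assumes a simplified %mor line made of space-separated pos|lemma tokens (like the adv|an v|rufen example above, without inflectional suffixes), and the prefix whitelist and the function names are made up for illustration. It only recombines when the utterance has exactly one verb and exactly one whitelisted prefix, so something like darunter + bauen is left alone.

```python
# Hypothetical whitelist of separable prefixes; a real list would be longer
# and would need checking against the German MOR lexicon.
SEPARABLE_PREFIXES = {"an", "auf", "aus", "ein", "mit", "vor", "zu"}


def parse_mor(mor_line):
    """Split a simplified %mor line into (pos, lemma) pairs.

    Assumes tokens look like 'pos|lemma'; anything without '|' is skipped.
    """
    tokens = []
    for item in mor_line.split():
        pos, sep, lemma = item.partition("|")
        if sep:
            tokens.append((pos, lemma))
    return tokens


def verb_lemmas(mor_line):
    """Return the verb lemmas for one utterance, recombining a separated prefix.

    A prefix is attached only if the line has exactly one verb and exactly
    one adv token from the whitelist; otherwise the verbs are returned as-is.
    """
    tokens = parse_mor(mor_line)
    verbs = [lemma for pos, lemma in tokens if pos == "v"]
    prefixes = [
        lemma for pos, lemma in tokens
        if pos == "adv" and lemma in SEPARABLE_PREFIXES
    ]
    if len(verbs) == 1 and len(prefixes) == 1:
        return [prefixes[0] + verbs[0]]
    return verbs


print(verb_lemmas("adv|an v|rufen"))        # ['anrufen']
print(verb_lemmas("adv|darunter v|bauen"))  # ['bauen']
```

The obvious weakness is that this would also glue together an ordinary adverb that happens to match a prefix, so I would probably also need to check the result against a list of attested verb lemmas.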
As a separate problem, I’ve also noticed that some forms of wissen (like weiß) are being tagged as weißen (“to whiten”) on the %mor tier.
- Is this a known issue in the German MOR grammar?
- Is there a standard fix for this in CLAN?
Thank you so much!
Sincerely,
Lanna