Dear All,
I realize this may not be unintended behaviour, but I still want to describe the following problem, which I ran into to my great surprise:
I search in GRAK (uacorpus.org), using NoSketch Engine (https://parasol.vmguest.uni-jena.de/grac_crystal/#dashboard?corpname=grac15), for the preposition pri followed by an optional adjective and then a noun:
[lemma="при"] [tag="adj.*"]? [tag="noun.*"]
Unexpectedly, I get duplicate results:
That is, sometimes (but not always!) the same example comes up twice, presumably because the word following PRI is ambiguous between adjective and noun, so the example can be read both as PRI ADJ NOUN and as PRI NOUN.
So I have two questions:
a) Is there any way to not get this behaviour?
b) A second, related question: given a query that returns multi-word results, how do I get a frequency distribution over only part of each result, say, the last word? In the example above, I am not interested in the adjectives that come between preposition and noun; I only want the frequency of the last word of the string, i.e., the noun. How do I do this? I know such a function from CQP as used in the Corpus Workbench, but how to do this in the Manatee CQP eludes me.
I would very much appreciate your help,
best wishes to all,
Ruprecht
Hi Ruprecht,
a) Is there any way to not get this behaviour?
In the advanced filtering menu, you can click on Hide sub-hits. The button is under the label Quick filters off towards the right edge of the screen, so you might miss it at first (I did).
You could also tweak your query, e.g. to something like [lemma="при"] [tag="adj.*"]? [tag="noun.*" & tag!="adj.*"]
if you know for a fact that the nouns you actually want should never be ambiguous. But if you’re not sure, the sub-hit filter is safer and more generally applicable.
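To make the idea behind sub-hit filtering concrete, here is a minimal Python sketch. It is only an illustration of the principle, not how NoSketch Engine or Manatee implement it: I assume each hit is a (start, end) pair of token offsets with an exclusive end, and the function name hide_sub_hits and the sample offsets are made up for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive (assumed representation)


def hide_sub_hits(hits: List[Span]) -> List[Span]:
    """Drop every hit that lies entirely inside another, strictly longer hit."""
    kept = []
    for i, (start, end) in enumerate(hits):
        contained = any(
            j != i
            and other_start <= start
            and end <= other_end
            and (other_end - other_start) > (end - start)
            for j, (other_start, other_end) in enumerate(hits)
        )
        if not contained:
            kept.append((start, end))
    return kept


# Hypothetical hits: PRI NOUN at tokens 10-11 and PRI ADJ NOUN at tokens 10-12
# share the same ambiguous second word, so the shorter one is a sub-hit.
hits = [(10, 12), (10, 13), (40, 42)]
print(hide_sub_hits(hits))  # [(10, 13), (40, 42)]
```

Note that this sketch keeps the longer reading (PRI ADJ NOUN) and discards the shorter one, which matters for what ends up as the last word of the match.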
I only want to get the frequency of the last word of the string, i.e., the noun
When creating the frequency distribution, select the Advanced tab and change the dropdown that says KWIC to Last KWIC word. This sets the anchor relative to which you select the position to compute frequencies on. If you leave the anchor itself selected, the result will be a frequency distribution of the last token in each match; if you click on 1 in the left context, it will be a frequency distribution of the second-to-last token in each match, etc.
If you want to pick and choose multiple positions according to which you want to perform the frequency breakdown, use the + button to add more criteria. E.g. both First KWIC word and Last KWIC word at the same time, skipping over optional intervening ones. Doesn’t quite make sense in your specific case, since the first word is always the same lemma, but you get the picture.
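If it helps to see what the Last KWIC word setting amounts to, here is a rough Python sketch of the same computation done by hand, e.g. on an exported concordance. Everything in it is an assumption for illustration: the function name last_token_freq, the offset parameter (0 = last token, 1 = one position to the left of it), and the toy lemma lists, which are invented and not taken from the corpus.

```python
from collections import Counter
from typing import Iterable, List


def last_token_freq(matches: Iterable[List[str]], offset: int = 0) -> Counter:
    """Count the token `offset` positions to the left of the last token of each match.

    Matches too short to have a token at that position are skipped.
    """
    return Counter(
        match[-1 - offset] for match in matches if len(match) > offset
    )


# Invented toy matches for [lemma="при"] [tag="adj.*"]? [tag="noun.*"]
matches = [
    ["при", "нагода"],
    ["при", "добрий", "нагода"],
    ["при", "великий", "бажання"],
]

print(last_token_freq(matches).most_common())
# [('нагода', 2), ('бажання', 1)]
```

With offset=1 the same function counts the second-to-last token instead, which mirrors clicking on 1 in the left context of the anchor.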
Best,
David
Reviewing my spam folder today, I noticed Michal Cukr had replied to Ruprecht mentioning the Hide sub-hits feature before I did — it just unfortunately happened to land in spam for me. Apologies for the duplication!
David