Duplicate hits?

Ruprecht von Waldenfels

Dec 2, 2022, 1:46:10 AM

to NoSketch Engine

Dear All,

I realize this may not be unintended behaviour, but I still want to describe the following problem, which I ran into to my great surprise:

I search in GRAC (uacorpus.org), using NoSketch Engine (https://parasol.vmguest.uni-jena.de/grac_crystal/#dashboard?corpname=grac15), for the preposition pri followed by an optional adjective and then a noun:

[lemma="при"] [tag="adj.*"]? [tag="noun.*"]

Unexpectedly, I get duplicate results:

That is, sometimes (but not always!) the same example comes up twice, evidently because it can be interpreted both as PRI ADJ NOUN and as PRI NOUN: the word following PRI is ambiguous between adjective and noun.

So I have two questions:

a) Is there any way to not get this behaviour?

b) There is a second question related to this work. Given a query that returns multi-word results, how do I get a frequency distribution over part of each result, say, the last word? In the given example, I am not interested in the adjectives that come between the preposition and the noun; I only want the frequency of the last word of the string, i.e., the noun. I know such a function from CQP as used in Corpus Workbench, but how to do this in Manatee eludes me.

I would very much appreciate your help,

best wishes to all,

Ruprecht

Michal Cukr | Sketch Engine Support

Dec 2, 2022, 4:26:23 AM

to no...@sketchengine.co.uk, ruprecht....@gmail.com

Dear Ruprecht,

The search results correspond to your query. You are looking for strings with an optional position ([tag="adj.*"]?), and thus you will get both concordance lines. The first one corresponds to the query: [lemma="при"] [tag="noun.*"]

whereas the second one meets the criteria [lemma="при"] [tag="adj.*"][tag="noun.*"].

To hide the first result, please use the Hide sub-hits option (Sort > Advanced > Hide sub-hits). We recently announced this option on our social media channels: https://twitter.com/SketchEngine/status/1592835745714425859
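For readers who want to see why the optional position yields two hits, here is a minimal Python sketch. It is not Manatee code: the toy corpus, the placeholder words, the helper names (`has_tag`, `find_matches`, `hide_sub_hits`) and the idea that an ambiguous token carries several candidate tags are all assumptions made for illustration, and the word form stands in for the lemma. A sub-hit is taken here to be a hit that lies entirely inside another hit.

```python
import re

# Toy corpus: (word, set of candidate tags). Words and tags are invented;
# an ambiguous token simply carries more than one tag.
TOKENS = [
    ("при", {"prep"}),
    ("X", {"adj", "noun"}),   # ambiguous between adjective and noun
    ("Y", {"noun"}),
    ("при", {"prep"}),
    ("Z", {"noun"}),
    (".", {"punct"}),
]

def has_tag(token, pattern):
    """True if any candidate tag of the token fully matches the regex."""
    return any(re.fullmatch(pattern, tag) for tag in token[1])

def find_matches(tokens):
    """Return (start, end) spans, end exclusive, for
    [lemma="при"] [tag="adj.*"]? [tag="noun.*"],
    trying the variant without and with the optional adjective."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok[0] != "при":
            continue
        # Variant without the adjective: при NOUN
        if i + 1 < len(tokens) and has_tag(tokens[i + 1], r"noun.*"):
            spans.append((i, i + 2))
        # Variant with the adjective: при ADJ NOUN
        if (i + 2 < len(tokens)
                and has_tag(tokens[i + 1], r"adj.*")
                and has_tag(tokens[i + 2], r"noun.*")):
            spans.append((i, i + 3))
    return spans

def hide_sub_hits(spans):
    """Drop every span that lies inside a different span."""
    return [
        s for s in spans
        if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)
    ]

all_hits = find_matches(TOKENS)
print(all_hits)                 # two hits start at the first при
print(hide_sub_hits(all_hits))  # the shorter of the two is gone
```

The second при is followed by an unambiguous noun, so it produces only one hit, which matches the observation that the duplication happens sometimes but not always.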

Best regards,

Michal Cukr

--
Sketch Engine Team
Email: sup...@sketchengine.eu

Web: https://www.sketchengine.eu/guide/

YouTube tutorials: https://youtube.com/c/SketchEngine

Boot Camp Online – a course in mastering Sketch Engine https://www.sketchengine.eu/bootcamp/

Michal Cukr | Sketch Engine Support

Dec 2, 2022, 4:36:44 AM

to no...@sketchengine.co.uk, ruprecht....@gmail.com

Dear Ruprecht,

I am sorry for the confusion. You need to select Filter > Advanced > Hide sub-hits.

(My previous information about the Sort tool was not correct.)

David Lukeš

Dec 2, 2022, 6:45:17 AM

to Ruprecht von Waldenfels, NoSketch Engine

Hi Ruprecht,

> a) Is there any way to not get this behaviour?

In the advanced filtering menu, you can click on Hide sub-hits. The button is under the label Quick filters off towards the right edge of the screen, so you might miss it at first (I did).

You could also tweak your query, e.g. to something like [lemma="при"] [tag="adj.*"]? [tag="noun.*" & tag!="adj.*"] if you know for a fact that the nouns you actually want should never be ambiguous. But if you’re not sure, the sub-hit filter is safer and more generally applicable.
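The trade-off between the two approaches can be sketched in Python. This is a toy model, not Manatee: the corpus, the placeholder words and the helper names are invented, and it assumes that the negated condition tag!="adj.*" rejects any token that has an adjective reading at all.

```python
import re

def has_tag(tags, pattern):
    """True if any candidate tag fully matches the regex."""
    return any(re.fullmatch(pattern, tag) for tag in tags)

def is_noun(tags):
    # Toy reading of [tag="noun.*"]
    return has_tag(tags, r"noun.*")

def is_strict_noun(tags):
    # Toy reading of [tag="noun.*" & tag!="adj.*"] (assumed semantics)
    return has_tag(tags, r"noun.*") and not has_tag(tags, r"adj.*")

def find_matches(tokens, noun_test):
    """Spans (start, end), end exclusive, for при ADJ? NOUN,
    where noun_test decides what counts as the final noun."""
    spans = []
    for i, (word, _) in enumerate(tokens):
        if word != "при":
            continue
        if i + 1 < len(tokens) and noun_test(tokens[i + 1][1]):
            spans.append((i, i + 2))
        if (i + 2 < len(tokens)
                and has_tag(tokens[i + 1][1], r"adj.*")
                and noun_test(tokens[i + 2][1])):
            spans.append((i, i + 3))
    return spans

TOKENS = [
    # при + ambiguous word + noun: source of the duplicate hit
    ("при", {"prep"}), ("X", {"adj", "noun"}), ("Y", {"noun"}),
    # при + ambiguous word + punctuation: the ambiguous word is the noun
    ("при", {"prep"}), ("Z", {"adj", "noun"}), (".", {"punct"}),
]

original = find_matches(TOKENS, is_noun)
tweaked = find_matches(TOKENS, is_strict_noun)
print(original)  # includes the duplicate at the first при
print(tweaked)   # no duplicate, but the second при is no longer found
```

In this toy data the tweaked query removes the duplicate but also drops the hit whose only noun is ambiguous, which is the risk mentioned above.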

> I only want to get the frequency of the last word of the string, i.e., the noun

When creating the frequency distribution, select the Advanced tab and change the dropdown that says KWIC to Last KWIC word. This sets the anchor relative to which you select the position to analyze frequency on. If you leave the anchor itself selected, the result will be a frequency distribution of the last token in each match; if you click on 1 in the left context, it will be a frequency distribution of the second-to-last token in each match, etc.

If you want to pick and choose multiple positions according to which you want to perform the frequency breakdown, use the + button to add more criteria. For example, you can use both First KWIC word and Last KWIC word at the same time, skipping over optional intervening ones. That doesn’t quite make sense in your specific case, since the first word is always the same lemma, but you get the picture.
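As a rough illustration of what these frequency settings compute, here is a small Python sketch. The matches are invented placeholder data and the function names are hypothetical; it only mimics the idea of anchoring on the last or first token of each match and does not use any Sketch Engine API.

```python
from collections import Counter

# Hypothetical matches, each given as the list of word forms in the hit.
MATCHES = [
    ["при", "A", "N1"],
    ["при", "N1"],
    ["при", "B", "N2"],
    ["при", "N1"],
]

def freq_from_last(matches, left_offset=0):
    """Count the token `left_offset` positions to the left of the last
    token of each match (0 = the last token itself)."""
    counts = Counter()
    for match in matches:
        idx = len(match) - 1 - left_offset
        # This sketch only knows the match itself, so a position that
        # falls before the start of the match is skipped.
        if idx >= 0:
            counts[match[idx]] += 1
    return counts

def freq_first_and_last(matches):
    """Count (first token, last token) pairs, ignoring what lies between."""
    return Counter((match[0], match[-1]) for match in matches)

print(freq_from_last(MATCHES))      # nouns only
print(freq_from_last(MATCHES, 1))   # second-to-last tokens
print(freq_first_and_last(MATCHES))
```

Note that with an offset of 1 the two-token matches contribute the preposition itself, which is why anchoring on the last word with no offset is the setting that isolates the noun.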

Best,

David

Ruprecht von Waldenfels

Dec 2, 2022, 6:47:09 AM

to David Lukeš, NoSketch Engine

Thanks, David!

I actually thought of the query tweak myself after posting; the other two tips are extremely helpful.

Many greetings from snowy Jena!

Ruprecht

On 02.12.22 at 12:45, David Lukeš wrote:

David Lukeš

Dec 19, 2022, 8:10:42 AM

to NoSketch Engine, ruprecht....@gmail.com, NoSketch Engine, David Lukeš

Reviewing my spam folder today, I noticed Michal Cukr had replied to Ruprecht mentioning the Hide sub-hits feature before I did — it just unfortunately happened to land in spam for me. Apologies for the duplication!

David
