Restricting KWIC context to a <doc> element


Normunds Grūzītis

May 30, 2023, 11:56:30 AM
to NoSketch Engine
Hello everyone,

In our group, we use NoSke widely for text corpora (https://korpuss.lv/en/), and we are now testing it for speech corpora.

In speech corpora, "documents" can be very short, often just isolated phrases or segments; consider, for instance, the Common Voice corpora.
Is it possible to restrict the context in the concordance view to a single document?

I have attached a screenshot showing that the "previous" and "next" documents are included in the context window by default. Our users find this very confusing.

Best regards,

Normunds
University of Latvia

doc.png

Miloš Jakubíček

May 30, 2023, 1:10:24 PM
to Normunds Grūzītis, NoSketch Engine
Hi Normunds,

In the concordance view, you can switch from KWIC view to sentence view (see https://www.sketchengine.eu/my_keywords/kwic/), provided sentences are marked with the <s> structure in the corpus.
So, either turn each <doc> into an <s> and recompile the corpus, or, from the look of it, you could tweak this in run.cgi by setting senleftctx = '-1:doc' and senrightctx = '1:doc' in the properties of the BonitoCGI class (they default to '-1:s' and '1:s' in conccgi.py). I haven't tried this myself, though, so it is just a quick hunch based on looking into the code for two minutes ;-)
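
For illustration, the run.cgi tweak might look roughly like the sketch below. It is untested against a real installation. The ConcCGI stand-in class is an assumption made only to keep the snippet self-contained; the real BonitoCGI class in run.cgi has other base classes and many more properties, and only the two attribute names and their values come from this thread.

```python
# Stand-in for the base class in conccgi.py (an assumption for this sketch);
# only the two defaults quoted in this thread are reproduced here.
class ConcCGI:
    senleftctx = '-1:s'
    senrightctx = '1:s'


# Sketch of the override in run.cgi. The real BonitoCGI class has other
# base classes and properties, which are left out of this illustration.
class BonitoCGI(ConcCGI):
    senleftctx = '-1:doc'   # intended: left context bounded by the <doc> start
    senrightctx = '1:doc'   # intended: right context bounded by the <doc> end


if __name__ == '__main__':
    cgi = BonitoCGI()
    print(cgi.senleftctx, cgi.senrightctx)
```

In an actual run.cgi, only the two attribute lines would be added to the existing BonitoCGI class body; the rest of the file should stay as shipped.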

Best
Milos


Milos Jakubicek

CEO, Lexical Computing
Brno, CZ | Brighton, UK



Normunds Grūzītis

May 30, 2023, 2:54:38 PM
to Miloš Jakubíček, NoSketch Engine
Thanks, Miloš, it works! Although the KWIC alignment is lost.

We will try the run.cgi setting, which seems an even better solution.

Best,
Normunds

