--
You received this message because you are subscribed to the Google Groups "NoSketch Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to noske+un...@sketchengine.co.uk.
To view this discussion visit https://groups.google.com/a/sketchengine.co.uk/d/msgid/noske/AS1PR10MB53655ADB31DC49F153B70F0D9B282%40AS1PR10MB5365.EURPRD10.PROD.OUTLOOK.COM.
Dear Michal,
thanks for looking into this. As I was the one to prepare the files and mount them on the concordancers, I can try and take it from here:
I think that the registry and vertical files haven't changed since the corpus was compiled, but I admit it can be confusing that the text (i.e. speech) type "subcorpus" can have the value "COVID,War" while the registry file does not declare it as a multivalued attribute. This is on purpose, however: whatever is in the War subcorpus is also in the COVID subcorpus, so the possible values are "Reference", "COVID" and "COVID,War". Treating these as three plain values, rather than as a multivalued attribute, makes it easier to choose the subcorpus one wants.
We come now to the strange part: if we look at the Text type
analysis of this corpus (as Kristina wrote), i.e. at
https://www.clarin.si/ske/#text-type-analysis?corpname=parlamint41_xx_en&wlminfreq=1&wlicase=1&include_nonwords=1&showresults=1&wlnums=frq&wlattr=speech.subcorpus
we get two values only:
So, the value "COVID,War" does not show up, even though it is in the vertical file for the corpus, as you saw in the sample.
Also, the structure frequency shown here is 7,650,267, while the
complete corpus has 8,081,124 speeches, a difference of 430,857.
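
One way to cross-check these numbers independently of the interface would be to count the raw attribute values directly in the vertical file. The following is a minimal sketch, assuming that each speech opens with a tag of the form <speech ... subcorpus="...">; the helper name count_subcorpus_values and the sample lines are made up for illustration and are not taken from the actual corpus.

```python
import re
from collections import Counter

# Assumption: speeches open with a tag such as
#   <speech id="..." subcorpus="COVID,War" ...>
SPEECH_TAG = re.compile(r'^<speech\s[^>]*\bsubcorpus="([^"]*)"')


def count_subcorpus_values(lines):
    """Count raw (unsplit) speech.subcorpus values in vertical-format lines."""
    counts = Counter()
    for line in lines:
        match = SPEECH_TAG.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample in vertical format (one token or tag per line).
sample = [
    '<speech id="s1" subcorpus="Reference">',
    'Hello\tUH\thello',
    '</speech>',
    '<speech id="s2" subcorpus="COVID">',
    '</speech>',
    '<speech id="s3" subcorpus="COVID,War">',
    '</speech>',
]

for value, frequency in sorted(count_subcorpus_values(sample).items()):
    print(value, frequency)
```

On a real vertical file, the function could be fed an open file object instead of the sample list; the sum of the counts should then equal the total number of speeches.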
and click to get the metadata of the first hit, I get
Hi Ondrej,
thanks, looking forward to the next release!
And, just to let you know, I now put
ATTRIBUTE subcorpus {
TYPE "MD_MGD"
MULTISEP "÷"
}
in the registry file, and, indeed, it fixes the problem, cf.
https://www.clarin.si/ske/#text-type-analysis?corpname=parlamint41_xx_en&wlminfreq=1&wlicase=1&include_nonwords=1&showresults=1&wlnums=frq&wlattr=speech.subcorpus
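
To illustrate what a separator choice means for a value such as "COVID,War", here is a small sketch. It does not claim to reproduce what Manatee does internally; it only shows, with a hypothetical helper count_values, that splitting on a comma breaks the value into two parts, while splitting on a character that never occurs in the data (such as "÷") leaves it whole.

```python
from collections import Counter


def count_values(raw_values, separator):
    """Count values after splitting each raw value on the given separator.

    Hypothetical helper for illustration only, not Manatee's own logic.
    """
    counts = Counter()
    for raw in raw_values:
        counts.update(raw.split(separator))
    return counts


# The three values used for speech.subcorpus, one occurrence each.
raw = ["Reference", "COVID", "COVID,War"]

print(count_values(raw, ","))  # "COVID,War" is split into "COVID" and "War"
print(count_values(raw, "÷"))  # "COVID,War" is kept as a single value
```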
All the best,
Tomaž