
compilecorp WARNING: lempos could be created from tag and lemma


Valdis Saulespurens

Jul 4, 2024, 6:15:11 AM
to NoSketch Engine
What is the command to create lempos when compiling a corpus?

Maybe something needs to be changed in the registry config?

As I understand it, lempos is simply lemma + tag.

I am looking at compilecorp instructions:


We have migrated to a newer NoSketch Engine version at https://korpuss.lnb.lv (based on the Docker version from https://github.com/ELTE-DH/NoSketch-Engine-Docker).

The lempos WARNING is the last warning I get when compiling with the following command:

docker exec noske compilecorp --no-ske --recompile-corpus salins

Checking corpus salins:
 * registry file
WARNING: lempos could be created from tag and lemma
 * lexicon queries
 * dynamic attributes
 * word sketches
 * sizes
 * structure sanity

(salins is the name of the registry file that I am attaching)

Best regards,
   Valdis Saulespurens
Researcher, National Library of Latvia





salins.txt

Michal Cukr | Sketch Engine Support

Jul 11, 2024, 1:47:22 AM
to no...@sketchengine.co.uk, valdis.sa...@gmail.com
Dear Valdis,

You understand the lempos attribute correctly. It is a combination of the lemma and an abbreviation of the part of speech. We do not have any built-in tool for computing this attribute.
You can easily create it with a simple Python script in which you map each part of speech to a single-letter abbreviation (for undefined or unspecified parts of speech you can provide the default value "x", which is usually used for e.g. interjections or particles, but that is up to you). Then you use a for loop to reprocess your vertical file and replace the lemma column with the lempos values created by your script. The lemma attribute is finally created as a DYNAMIC attribute from the lempos attribute (see more at 
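
A minimal sketch of the kind of script described above, not an official tool. It assumes a tab-separated vertical file with the columns word, tag, lemma in that order, the common `lemma-x` form for lempos values (a hyphen followed by one letter), and a tagset where the first character of the tag identifies the part of speech. `POS_MAP`, the column positions, and the function names are hypothetical and need adjusting to your own corpus.

```python
import sys

# Hypothetical mapping: first character of the tag -> one-letter POS code.
# Replace it with a mapping that matches your own tagset.
POS_MAP = {"n": "n", "v": "v", "a": "j", "r": "a"}
DEFAULT_POS = "x"  # fallback for undefined or unspecified parts of speech

# Assumed 0-based column positions in the vertical file: word, tag, lemma.
TAG_COL = 1
LEMMA_COL = 2


def to_lempos(lemma, tag):
    """Build a lempos value such as 'house-n' from a lemma and its tag."""
    pos = POS_MAP.get(tag[:1].lower(), DEFAULT_POS)
    return f"{lemma}-{pos}"


def convert_line(line):
    """Return the line with its lemma column replaced by the lempos value."""
    stripped = line.rstrip("\n")
    # Lines without a tab (structure tags such as <doc> or </s>, empty
    # lines) are passed through unchanged.
    if "\t" not in stripped:
        return stripped
    cols = stripped.split("\t")
    # Token lines that lack the tag or lemma column are left untouched.
    if len(cols) <= max(TAG_COL, LEMMA_COL):
        return stripped
    cols[LEMMA_COL] = to_lempos(cols[LEMMA_COL], cols[TAG_COL])
    return "\t".join(cols)


if __name__ == "__main__":
    # Read the vertical file from stdin, write the converted one to stdout.
    for line in sys.stdin:
        print(convert_line(line))
```

It could be run as `python3 add_lempos.py < salins.vert > salins_lempos.vert` (both file names are placeholders). The registry file would then need to declare lempos in place of the stored lemma column before recompiling.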

Otherwise, the warning about the missing lempos can be ignored. It is only a recommendation for better use of the corpus. The presence of the lempos attribute in the corpus allows the corpus creator to add a part-of-speech selector, available on the Advanced tab of the word list, concordance or n-grams feature. This can be implemented by adding the LPOSLIST line to your corpus configuration file; please check our documentation: https://www.sketchengine.eu/documentation/fine-tune-your-corpus/#toggle-id-10

You can then also compile word sketches on the lempos attribute to allow users to select a particular part of speech when generating word sketch results in the interface.

Best regards,

Michal Cukr 


--
Sketch Engine Team
Email: sup...@sketchengine.eu
Boot Camp Online – a course in mastering Sketch Engine https://www.sketchengine.eu/bootcamp/




