Ambiguous annotations

Ruprecht von Waldenfels

Nov 25, 2017, 10:45:07 AM
to NoSketch Engine
Dear all,
in CWB, whose input format is almost the same as that of NoSketch Engine, you can specify lemma sets in the vertical text as follows:

dove |dive|dove| |VPast|N|

I.e., for the word form "dove", the lemmas "dive" and "dove", and the tags VPast and N are given as choices, i.e., sets.
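In CWB notation, a feature set is written with a bar before and after each member (`|dive|dove|`), and a lone `|` denotes the empty set. The Python sketch below shows how such a column breaks down into a set of alternatives. It assumes the columns of the vertical are tab-separated (the example line above shows them with spaces) and that lemma and tag are the second and third columns. The function names are hypothetical and are not part of CWB or Manatee.

```python
def parse_feature_set(value):
    """Split a CWB-style feature set such as '|dive|dove|' into its members.

    A lone '|' (the empty set) yields an empty set; a value without the
    surrounding bars is treated as a single member.
    """
    value = value.strip()
    if value.startswith("|") and value.endswith("|"):
        return {member for member in value[1:-1].split("|") if member}
    return {value} if value else set()


def parse_token_line(line):
    """Split one tab-separated token line into the word form and per-column sets."""
    word, *columns = line.rstrip("\n").split("\t")
    return word, [parse_feature_set(col) for col in columns]


word, (lemmas, tags) = parse_token_line("dove\t|dive|dove|\t|VPast|N|")
print(word, sorted(lemmas), sorted(tags))  # dove ['dive', 'dove'] ['N', 'VPast']
```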

Can you do this in Sketch Engine, too? I just encoded a corpus like this and got some (rather unclear) errors. Could this be the reason? Is such input supported?

Best,
Ruprecht

Ruprecht von Waldenfels

Nov 25, 2017, 10:59:12 AM
to NoSketch Engine
Hi, I think I figured it out. It should probably look like this:
ATTRIBUTE   lemma {
       MULTIVALUE y
       MULTISEP "|"
}
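If the vertical was produced in CWB notation, it may be safer to strip the outer bars before compiling. That way `|dive|dove|` becomes `dive|dove` and only the inner `|` acts as MULTISEP. I have not verified how Manatee treats leading and trailing bars (they might produce empty values), so treat this as a precaution only. The following is a minimal Python sketch. It assumes tab-separated columns with lemma and tag in columns 2 and 3, and it assumes that lines starting with `<` are structure tags. `strip_feature_set`, `convert_line` and `convert_file` are hypothetical helper names.

```python
# Hypothetical preprocessing step: CWB feature sets -> Manatee MULTISEP values.
MULTIVALUE_COLUMNS = (1, 2)  # assumption: lemma and tag are the 2nd and 3rd columns


def strip_feature_set(value):
    """'|dive|dove|' -> 'dive|dove'; '|' (empty set) -> ''; other values unchanged."""
    if value == "|":
        return ""
    if len(value) >= 2 and value.startswith("|") and value.endswith("|"):
        return value[1:-1]
    return value


def convert_line(line, columns=MULTIVALUE_COLUMNS):
    """Rewrite one vertical line; lines starting with '<' (structure tags) pass through."""
    if line.startswith("<"):
        return line
    fields = line.rstrip("\n").split("\t")
    for i in columns:
        if i < len(fields):
            fields[i] = strip_feature_set(fields[i])
    return "\t".join(fields) + "\n"


def convert_file(src_path, dst_path):
    """Stream a whole vertical file through convert_line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(convert_line(line))


print(convert_line("dove\t|dive|dove|\t|VPast|N|\n"), end="")
```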
Best, Ruprecht



Miloš Jakubíček

Nov 25, 2017, 12:02:08 PM
to Ruprecht von Waldenfels, NoSketch Engine
Yes, exactly.

Milos Jakubicek

CEO, Lexical Computing
Brno, CZ | Brighton UK
http://www.sketchengine.co.uk


Ruprecht von Waldenfels

Nov 25, 2017, 3:48:51 PM
to Miloš Jakubíček, NoSketch Engine
Thanks, but for some reason it still didn't work. It said:

parallel@vmd20395:/corpora$ sudo compilecorp --recompile-corpus --no-ske grac /data/UkrRegCorp/fullcorpus.cwb.txt
Manatee version: 2.36.5-open-2.151.5
Reading corpus configuration...
PATH=/corpora/data/grac/
VERTICAL=/data/UkrRegCorp/fullcorpus.cwb.txt
WSDEF=
WSHIST=
SUBCDEF=
WSBASE=
WSATTR=
WSTHES=
WSMINHITS=
WSOLDSCORES=
ALIGNDEF=
ALIGNED=
TERMDEF=
TERMBASE=
ATTRLIST=word,lemma,tag,lc,lemma_lc
DIACHRONIC=
Vertical text will be read from /data/UkrRegCorp/fullcorpus.cwb.txt
Deleting corpus PATH directory...
Compiling corpus...
[20171125-16:57:20] Available memory 7015MB, estimated usage 159MB
[20171125-16:57:22] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:57:23] lexicon (/corpora/data/grac/word) make_lex_srt_file
^C
parallel@vmd20395:/corpora$ sudo compilecorp --recompile-corpus --no-ske grac /data/UkrRegCorp/fullcorpus.cwb.txt
Manatee version: 2.36.5-open-2.151.5
Reading corpus configuration...
PATH=/corpora/data/grac/
VERTICAL=/data/UkrRegCorp/fullcorpus.cwb.txt
WSDEF=
WSHIST=
SUBCDEF=
WSBASE=
WSATTR=
WSTHES=
WSMINHITS=
WSOLDSCORES=
ALIGNDEF=
ALIGNED=
TERMDEF=
TERMBASE=
ATTRLIST=word,lemma,tag,lc,lemma_lc
DIACHRONIC=
Corpus is compiled
Vertical text will be read from /data/UkrRegCorp/fullcorpus.cwb.txt
Deleting corpus PATH directory...
Compiling corpus...
[20171125-16:57:43] Available memory 7015MB, estimated usage 159MB
[20171125-16:57:44] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:57:45] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:57:47] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:57:47] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:57:50] Processed 1000000 lines, 941435 positions.
[20171125-16:57:51] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:57:52] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:57:58] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:57:59] Processed 2000000 lines, 1869916 positions.
[20171125-16:58:00] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:58:09] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:58:11] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:58:15] Processed 3000000 lines, 2804583 positions.
[20171125-16:58:22] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:58:28] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:58:31] Processed 4000000 lines, 3736421 positions.
[20171125-16:58:36] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:58:45] Processed 5000000 lines, 4658867 positions.
[20171125-16:58:46] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:58:49] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:58:56] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:58:58] Processed 6000000 lines, 5603229 positions.
[20171125-16:59:02] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:59:13] Processed 7000000 lines, 6534299 positions.
[20171125-16:59:14] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:59:22] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:59:32] Processed 8000000 lines, 7466812 positions.
[20171125-16:59:38] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-16:59:45] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-16:59:47] Processed 9000000 lines, 8383019 positions.
[20171125-17:00:00] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:00:06] Processed 10000000 lines, 9299651 positions.
[20171125-17:00:19] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-17:00:26] Processed 11000000 lines, 10234199 positions.
[20171125-17:00:37] Processed 12000000 lines, 11170881 positions.
[20171125-17:00:53] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:00:56] Processed 13000000 lines, 12089412 positions.
[20171125-17:01:11] Processed 14000000 lines, 13009533 positions.
[20171125-17:01:13] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-17:01:29] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:01:32] Processed 15000000 lines, 13944968 positions.
[20171125-17:01:46] Processed 16000000 lines, 14888589 positions.
[20171125-17:01:50] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-17:02:03] Processed 17000000 lines, 15827366 positions.
[20171125-17:02:06] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:02:18] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-17:02:20] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:02:25] Processed 18000000 lines, 16740055 positions.
[20171125-17:02:37] Processed 19000000 lines, 17677076 positions.
[20171125-17:02:47] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:02:53] Processed 20000000 lines, 18623208 positions.
[20171125-17:02:56] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-17:03:09] Processed 21000000 lines, 19560477 positions.
[20171125-17:03:21] Processed 22000000 lines, 20492187 positions.
[20171125-17:03:35] Processed 23000000 lines, 21437736 positions.
[20171125-17:03:48] Processed 24000000 lines, 22372023 positions.
[20171125-17:03:54] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:04:03] Processed 25000000 lines, 23303377 positions.
[20171125-17:04:16] Processed 26000000 lines, 24208634 positions.
[20171125-17:04:30] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-17:04:36] Processed 27000000 lines, 25135095 positions.
[20171125-17:04:50] Processed 28000000 lines, 26072812 positions.
[20171125-17:05:03] Processed 29000000 lines, 27001164 positions.
[20171125-17:05:22] Processed 30000000 lines, 27940447 positions.
[20171125-17:05:24] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:05:42] Processed 31000000 lines, 28866398 positions.
[20171125-17:05:56] Processed 32000000 lines, 29756404 positions.
[20171125-17:06:13] Processed 33000000 lines, 30681813 positions.
[20171125-17:06:30] Processed 34000000 lines, 31613975 positions.
[20171125-17:06:46] Processed 35000000 lines, 32545712 positions.
[20171125-17:07:02] Processed 36000000 lines, 33453751 positions.
[20171125-17:07:17] Processed 37000000 lines, 34382006 positions.
[20171125-17:07:31] Processed 38000000 lines, 35311298 positions.
[20171125-17:07:41] Processed 39000000 lines, 36235561 positions.
[20171125-17:07:55] Processed 40000000 lines, 37156531 positions.
[20171125-17:08:06] Processed 41000000 lines, 38081518 positions.
[20171125-17:08:19] Processed 42000000 lines, 39014081 positions.
[20171125-17:08:34] Processed 43000000 lines, 39941683 positions.
[20171125-17:08:48] Processed 44000000 lines, 40878402 positions.
[20171125-17:09:01] Processed 45000000 lines, 41808789 positions.
[20171125-17:09:16] Processed 46000000 lines, 42737082 positions.
[20171125-17:09:32] Processed 47000000 lines, 43657624 positions.
[20171125-17:09:46] Processed 48000000 lines, 44591218 positions.
[20171125-17:09:58] Processed 49000000 lines, 45515243 positions.
[20171125-17:10:14] Processed 50000000 lines, 46444271 positions.
[20171125-17:10:28] Processed 51000000 lines, 47388562 positions.
[20171125-17:10:41] Processed 52000000 lines, 48295519 positions.
[20171125-17:10:54] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:11:02] Processed 53000000 lines, 49227796 positions.
[20171125-17:11:14] Processed 54000000 lines, 50157101 positions.
[20171125-17:11:28] Processed 55000000 lines, 51074839 positions.
[20171125-17:11:44] Processed 56000000 lines, 52000585 positions.
[20171125-17:11:59] Processed 57000000 lines, 52925670 positions.
[20171125-17:12:12] Processed 58000000 lines, 53843933 positions.
[20171125-17:12:22] Processed 59000000 lines, 54749523 positions.
[20171125-17:12:36] Processed 60000000 lines, 55690570 positions.
[20171125-17:12:48] Processed 61000000 lines, 56630502 positions.
[20171125-17:13:03] Processed 62000000 lines, 57555587 positions.
[20171125-17:13:17] Processed 63000000 lines, 58499809 positions.
[20171125-17:13:31] Processed 64000000 lines, 59436555 positions.
[20171125-17:13:45] Processed 65000000 lines, 60375349 positions.
[20171125-17:13:55] Processed 66000000 lines, 61309220 positions.
[20171125-17:14:08] Processed 67000000 lines, 62176394 positions.
[20171125-17:14:21] Processed 68000000 lines, 63056602 positions.
[20171125-17:14:33] Processed 69000000 lines, 63980677 positions.
[20171125-17:14:48] Processed 70000000 lines, 64919163 positions.
[20171125-17:14:59] Processed 71000000 lines, 65863393 positions.
[20171125-17:15:14] Processed 72000000 lines, 66779290 positions.
[20171125-17:15:28] Processed 73000000 lines, 67712714 positions.
[20171125-17:15:38] Processed 74000000 lines, 68658259 positions.
[20171125-17:15:52] Processed 75000000 lines, 69583546 positions.
[20171125-17:16:07] Processed 76000000 lines, 70523415 positions.
[20171125-17:16:25] Processed 77000000 lines, 71468314 positions.
[20171125-17:16:43] Processed 78000000 lines, 72404352 positions.
[20171125-17:16:57] Processed 79000000 lines, 73337098 positions.
[20171125-17:17:09] Processed 80000000 lines, 74266046 positions.
[20171125-17:17:21] Processed 81000000 lines, 75181042 positions.
[20171125-17:17:37] Processed 82000000 lines, 76108612 positions.
[20171125-17:17:49] Processed 83000000 lines, 77049366 positions.
[20171125-17:18:00] Processed 84000000 lines, 77975483 positions.
[20171125-17:18:11] Processed 85000000 lines, 78913531 positions.
[20171125-17:18:23] Processed 86000000 lines, 79838009 positions.
[20171125-17:18:38] Processed 87000000 lines, 80779235 positions.
[20171125-17:18:52] Processed 88000000 lines, 81718340 positions.
[20171125-17:19:08] Processed 89000000 lines, 82645042 positions.
[20171125-17:19:20] Processed 90000000 lines, 83579701 positions.
[20171125-17:19:36] Processed 91000000 lines, 84497938 positions.
[20171125-17:19:53] Processed 92000000 lines, 85423211 positions.
[20171125-17:20:12] Processed 93000000 lines, 86354922 positions.
[20171125-17:20:29] Processed 94000000 lines, 87286711 positions.
[20171125-17:20:47] Processed 95000000 lines, 88211773 positions.
[20171125-17:21:06] Processed 96000000 lines, 89141731 positions.
[20171125-17:21:21] Processed 97000000 lines, 90079443 positions.
[20171125-17:21:36] Processed 98000000 lines, 91014352 positions.
[20171125-17:21:51] Processed 99000000 lines, 91941439 positions.
[20171125-17:22:00] Processed 100000000 lines, 92864670 positions.
[20171125-17:22:13] Processed 101000000 lines, 93797682 positions.
[20171125-17:22:31] Processed 102000000 lines, 94727364 positions.
[20171125-17:22:46] Processed 103000000 lines, 95665447 positions.
[20171125-17:23:05] Processed 104000000 lines, 96606358 positions.
[20171125-17:23:24] Processed 105000000 lines, 97542597 positions.
[20171125-17:23:42] Processed 106000000 lines, 98479431 positions.
[20171125-17:24:00] Processed 107000000 lines, 99422504 positions.
[20171125-17:24:18] Processed 108000000 lines, 100349821 positions.
[20171125-17:24:34] Processed 109000000 lines, 101289313 positions.
[20171125-17:24:48] Processed 110000000 lines, 102227709 positions.
[20171125-17:25:01] Processed 111000000 lines, 103162094 positions.
[20171125-17:25:17] Processed 112000000 lines, 104098809 positions.
[20171125-17:25:29] Processed 113000000 lines, 105031884 positions.
[20171125-17:25:43] Processed 114000000 lines, 105955967 positions.
[20171125-17:25:57] Processed 115000000 lines, 106884247 positions.
[20171125-17:26:10] Reading input file finished.
[20171125-17:26:10] Closing structure font ...
[20171125-17:26:10] warning: structure attribute (font.type) never present
[20171125-17:26:10]             ... finished.
[20171125-17:26:10] Closing structure g ...
[20171125-17:26:10]             ... finished.
[20171125-17:26:10] Closing structure head ...
[20171125-17:26:10] warning: structure attribute (head.type) never present
[20171125-17:26:10]             ... finished.
[20171125-17:26:10] Closing structure meta ...
[20171125-17:26:10] warning: structure attribute (meta.file) never present
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.author) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.born) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.date) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.file) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.genre) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.original) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.region) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.title) make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.translator) make_lex_srt_file
[20171125-17:26:10]             ... finished.
[20171125-17:26:10] Closing structure s ...
[20171125-17:26:10] /data/UkrRegCorp/fullcorpus.cwb.txt:115874154: warning: auto-closing structure (s) opened at line /data/UkrRegCorp/fullcorpus.cwb.txt:115874142:
[20171125-17:26:10]             ... finished.
[20171125-17:26:10] Closing attribute /corpora/data/grac/word ...
[20171125-17:26:10] lexicon (/corpora/data/grac/word) make_lex_srt_file
[20171125-17:26:55] Creating regular expression optimization attribute word
[20171125-17:28:28] lexicon (/corpora/data/grac/word.regex) make_lex_srt_file
[20171125-17:28:28]             ... finished.
[20171125-17:28:28] Closing attribute /corpora/data/grac/lemma ...
[20171125-17:28:28] lexicon (/corpora/data/grac/lemma) make_lex_srt_file
[20171125-17:29:50] Creating regular expression optimization attribute lemma
[20171125-17:32:25] lexicon (/corpora/data/grac/lemma.regex) make_lex_srt_file
[20171125-17:32:26]             ... finished.
[20171125-17:32:26] Closing attribute /corpora/data/grac/tag ...
[20171125-17:32:26] lexicon (/corpora/data/grac/tag) make_lex_srt_file
[20171125-17:33:14] Creating regular expression optimization attribute tag
[20171125-17:33:18] lexicon (/corpora/data/grac/tag.regex) make_lex_srt_file
[20171125-17:33:18]             ... finished.
[20171125-17:33:18] Processed 115874154 lines, 107710157 positions.
[20171125-17:33:18] Creating dynamic attribute lc
[20171125-17:33:19] lexicon (/corpora/data/grac/lc) make_lex_srt_file
[20171125-17:33:24] lexicon (/corpora/data/grac/lc) make_lex_srt_file
[20171125-17:33:30] lexicon (/corpora/data/grac/lc) make_lex_srt_file
[20171125-17:33:38] lexicon (/corpora/data/grac/lc) make_lex_srt_file
[20171125-17:33:51] lexicon (/corpora/data/grac/lc) make_lex_srt_file
[20171125-17:33:58] Creating regular expression optimization attribute lc
[20171125-17:35:24] lexicon (/corpora/data/grac/lc.regex) make_lex_srt_file
[20171125-17:35:25] Creating dynamic attribute lemma_lc
[20171125-17:35:26] lexicon (/corpora/data/grac/lemma_lc) make_lex_srt_file
[20171125-17:35:29] lexicon (/corpora/data/grac/lemma_lc) make_lex_srt_file
[20171125-17:35:35] lexicon (/corpora/data/grac/lemma_lc) make_lex_srt_file
[20171125-17:35:42] lexicon (/corpora/data/grac/lemma_lc) make_lex_srt_file
[20171125-17:35:52] lexicon (/corpora/data/grac/lemma_lc) make_lex_srt_file
[20171125-17:36:08] lexicon (/corpora/data/grac/lemma_lc) make_lex_srt_file
[20171125-17:36:26] lexicon (/corpora/data/grac/lemma_lc) make_lex_srt_file
[20171125-17:36:37] Creating regular expression optimization attribute lemma_lc
[20171125-17:38:24] lexicon (/corpora/data/grac/lemma_lc.regex) make_lex_srt_file
[20171125-17:38:25] Summary of errors encountered in the input file:
[20171125-17:38:25] 1 times: warning type 'closing structure automatically'
[20171125-17:38:25] Only first 100 occurrencies emitted, use -v to emit each occurrence.
Compiling frequencies...
100 %
Compiling arf for attribute word
freq already compiled, skipping.
Compiling docf for attribute word
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for word.
100 %
Compiling aldf for attribute word
100 %
Compiling arf for attribute lemma
freq already compiled, skipping.
Compiling docf for attribute lemma
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for lemma.
100 %
Compiling aldf for attribute lemma
100 %
Compiling arf for attribute tag
freq already compiled, skipping.
Compiling docf for attribute tag
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for tag.
100 %
Compiling aldf for attribute tag
100 %
Compiling arf for attribute lc
freq already compiled, skipping.
Compiling docf for attribute lc
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for lc.
100 %
Compiling aldf for attribute lc
100 %
Compiling arf for attribute lemma_lc
freq already compiled, skipping.
Compiling docf for attribute lemma_lc
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for lemma_lc.
100 %
Compiling aldf for attribute lemma_lc
SUBCDEF path not specified in the configuration file; skipping subcorpora...
Compiling word sketches disabled; skipping...
Compiling longest commonest match disabled; skipping...
Compiling terms disabled; skipping...
Compiling word sketch hashes disabled; skipping...
Compiling thesaurus disabled; skipping...
Compiling histograms disabled; skipping...
No parallel corpora specified in ALIGNED; skipping alignment...
No document structure (doc) found
ls: cannot access '/corpora/data/grac//doc.*.norm': No such file or directory
No parallel corpora specified in ALIGNED; skipping alignment size computations...
Sizes compiled
Compiling bilingual dictionaries disabled; skipping...
Compiling bilingual terminology disabled; skipping...
Compiling trends disabled; skipping...
Warning: WPOSLIST not specified
Warning: lempos could be created from tag and lemma
Warning: Number of columns in vertical does not match number of attributes.
Error: Missing or corrupted lexicon for font.type
Error: Missing or corrupted lexicon for head.type
Error: Invalid size of structure font

Error: Invalid size of structure head

Error: Invalid size of structure g

Checking corpus grac:
 * correct filename format and parseability of the registry file
 * basic information
 * vertical
 * paths
 * lowercase attributes
 * URL stuff
 * subcorpattrs
 * sizes
 * lexicon queries
 * dynamic attributes
 * word sketches
 * structure sanity
<---
ERROR: line 921 - command 'corpcheck $CORPUS' exited with status: 1
   ... Error at ::main::main called at line 934
--->
Writing log to /corpora/data/grac/log/compilecorp_2017-11-25_1657.log
<---
ERROR: line 934 - command 'tee "$TMPLOGFILE"' exited with status: 1
   ... Error at ::main called at line 0
--->


The configuration file was:

MAINTAINER "ruprecht....@gmail.com"
INFO "General Regionally Annotated Corpus of Ukrainian"
NAME "Grac"
PATH "/corpora/data/grac"
ENCODING "UTF-8"
LANGUAGE "Ukrainian"
VERTICAL "/data/UkrRegCorp/fullcorpus.cwb.txt"

INFOHREF "http://parasolcorpus.org/Kyiv"
TAGSETDOC "http://parasolcorpus.org/Kyiv/"

FULLREF "doc.file,doc.n"

ATTRIBUTE   word


ATTRIBUTE   lemma {
       MULTIVALUE y
       MULTISEP "|"
}



ATTRIBUTE   tag {
       MULTIVALUE y
       MULTISEP "|"
}


ATTRIBUTE   lc {
    LABEL    "word (lowercase)"
    DYNAMIC  utf8lowercase
    DYNLIB   internal
    ARG1     "C"
    FUNTYPE  s
    FROMATTR word
    TYPE     index
    TRANSQUERY    yes
}

ATTRIBUTE   lemma_lc {
    LABEL    "lemma (lowercase)"
    DYNAMIC  utf8lowercase
    DYNLIB   internal
    ARG1     "C"
    FUNTYPE  s
    FROMATTR lemma
    TYPE     index
    TRANSQUERY    yes
}

STRUCTURE meta {
    ATTRIBUTE    file
    ATTRIBUTE    author
    ATTRIBUTE    title
    ATTRIBUTE    date
    ATTRIBUTE    original
    ATTRIBUTE    genre
    ATTRIBUTE    region
    ATTRIBUTE    translator
    ATTRIBUTE    born
}

STRUCTURE font {
    ATTRIBUTE type
}

STRUCTURE head {
    ATTRIBUTE type
}

STRUCTURE s

STRUCTURE g {
   DISPLAYTAG 0
   DISPLAYBEGIN "_EMPTY_"
}


Any idea what I am doing wrong?
Thanks! Ruprecht

Miloš Jakubíček

Nov 25, 2017, 5:43:33 PM
to Ruprecht von Waldenfels, NoSketch Engine
Hi Ruprecht,

the errors are unrelated to that:

>Error: Missing or corrupted lexicon for font.type
>Error: Missing or corrupted lexicon for head.type
>Error: Invalid size of structure font
>Error: Invalid size of structure head
>Error: Invalid size of structure g

That just means you have no font, head or g structures, so you should remove them from the configuration file.
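This diagnosis can be checked directly against the vertical file. The Python sketch below counts opening tags per structure name and tallies token lines by their number of tab-separated columns. It uses hypothetical helper names and simplified tag matching. Declared structures with a count of zero are candidates for removal from the configuration. Unexpected column counts may also be worth inspecting in connection with the "Number of columns in vertical does not match number of attributes" warning in the log. The sketch assumes three columns per token (word, lemma, tag), because lc and lemma_lc are dynamic attributes created during compilation rather than read from the vertical.

```python
import re
from collections import Counter

# An opening (or self-closing) structure tag: <s>, <meta author="...">, <g/>
OPEN_TAG = re.compile(r"<([A-Za-z_][\w.-]*)[\s>/]")


def scan_vertical(lines):
    """Count opening tags per structure and token lines per column count."""
    structures = Counter()
    widths = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("</"):
            continue  # skip blank lines and closing tags
        match = OPEN_TAG.match(line)
        if match:
            structures[match.group(1)] += 1
        else:
            widths[len(line.split("\t"))] += 1
    return structures, widths


def report(lines, declared, expected_columns):
    """Return declared structures never seen, and any unexpected column counts."""
    structures, widths = scan_vertical(lines)
    missing = [name for name in declared if structures[name] == 0]
    odd_widths = {w: n for w, n in widths.items() if w != expected_columns}
    return missing, odd_widths


sample = [
    '<meta author="X" title="Y">\n',
    "<s>\n",
    "dove\tdive|dove\tVPast|N\n",
    "</s>\n",
    "</meta>\n",
]
declared = ["meta", "font", "head", "s", "g"]  # STRUCTURE blocks in the registry
print(report(sample, declared, expected_columns=3))  # (['font', 'head', 'g'], {})
# For a real corpus, pass an open file object instead of `sample`.
```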

Best
Milos


Ruprecht von Waldenfels

Nov 26, 2017, 2:20:31 AM
to Miloš Jakubíček, NoSketch Engine
Hi, there is still a problem. Do you understand what is going on? Thank you for your help! Ruprecht
<class 'manatee.AttrNotFound'>
Python 2.7.12: /usr/bin/python
Sun Nov 26 08:18:44 2017

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

 /var/www/bonito/run.cgi in ()
     82         print "</pre>"
     83     else:
=>   84         BonitoCGI(user=username).run_unprotected (selectorname='corpname')
     85 
     86 # vim: ts=4 sw=4 sta et sts=4 si tw=80:
BonitoCGI = <class __main__.BonitoCGI>, user undefined, username = None, ).run_unprotected undefined, selectorname undefined
 /usr/lib/python2.7/dist-packages/bonito/CGIPublisher.py in run_unprotected(self=<__main__.BonitoCGI instance>, path=['first'], selectorname='corpname', outf=<open file '<stdout>', mode 'w'>)
    262                     result = self.error_template % err_msg
    263         self.output_headers()
=>  264         self.output_result (methodname, tmpl, result, outf)
    265 
    266     def process_method (self, methodname, pos_args, named_args):
self = <__main__.BonitoCGI instance>, self.output_result = <bound method BonitoCGI.output_result of <__main__.BonitoCGI instance>>, methodname = 'first_form', tmpl = 'first_form.tmpl', result = {'AttrList': [{'label': u'word', 'n': u'word'}, {'label': u'lemma', 'n': u'lemma'}, {'label': u'tag', 'n': u'tag'}], 'Corplist': [{'id': 'grac', 'name': u'Grac'}, {'id': u'susanne', 'name': u'Susanne'}], 'Globals': [{'name': 'corpname', 'value': u'grac'}, {'name': 'refs', 'value': ''}, {'name': 'iquery', 'value': u'w'}], 'Lposlist': [], 'Q': [], 'StructAttrList': [{'label': u'doc.file', 'n': u'doc.file'}, {'label': u'doc.author', 'n': u'doc.author'}, {'label': u'doc.title', 'n': u'doc.title'}, {'label': u'doc.date', 'n': u'doc.date'}, {'label': u'doc.original', 'n': u'doc.original'}, {'label': u'doc.genre', 'n': u'doc.genre'}, {'label': u'doc.region', 'n': u'doc.region'}, {'label': u'doc.translator', 'n': u'doc.translator'}, {'label': u'doc.born', 'n': u'doc.born'}], 'Wposlist': [], '_bonito_version': 'open-3.99.9', '_version': u'2.36.5-open-2.151.5-open-3.99.9', 'can_wseval': '', ...}, outf = <open file '<stdout>', mode 'w'>
 /usr/lib/python2.7/dist-packages/bonito/CGIPublisher.py in output_result(self=<__main__.BonitoCGI instance>, methodname='first_form', template='first_form.tmpl', result={'AttrList': [{'label': u'word', 'n': u'word'}, {'label': u'lemma', 'n': u'lemma'}, {'label': u'tag', 'n': u'tag'}], 'Corplist': [{'id': 'grac', 'name': u'Grac'}, {'id': u'susanne', 'name': u'Susanne'}], 'Globals': [{'name': 'corpname', 'value': u'grac'}, {'name': 'refs', 'value': ''}, {'name': 'iquery', 'value': u'w'}], 'Lposlist': [], 'Q': [], 'StructAttrList': [{'label': u'doc.file', 'n': u'doc.file'}, {'label': u'doc.author', 'n': u'doc.author'}, {'label': u'doc.title', 'n': u'doc.title'}, {'label': u'doc.date', 'n': u'doc.date'}, {'label': u'doc.original', 'n': u'doc.original'}, {'label': u'doc.genre', 'n': u'doc.genre'}, {'label': u'doc.region', 'n': u'doc.region'}, {'label': u'doc.translator', 'n': u'doc.translator'}, {'label': u'doc.born', 'n': u'doc.born'}], 'Wposlist': [], '_bonito_version': 'open-3.99.9', '_version': u'2.36.5-open-2.151.5-open-3.99.9', 'can_wseval': '', ...}, outf=<open file '<stdout>', mode 'w'>)
    374             self.set_localisation() # in case run_protected has not ran (CA)
    375             self._add_globals (result)
=>  376             self.add_undefined (result, methodname)
    377             result = self.rec_recode(result)
    378             for attr in dir(self): # recoding self
self = <__main__.BonitoCGI instance>, self.add_undefined = <bound method BonitoCGI.add_undefined of <__main__.BonitoCGI instance>>, result = {'AttrList': [{'label': u'word', 'n': u'word'}, {'label': u'lemma', 'n': u'lemma'}, {'label': u'tag', 'n': u'tag'}], 'Corplist': [{'id': 'grac', 'name': u'Grac'}, {'id': u'susanne', 'name': u'Susanne'}], 'Globals': [{'name': 'corpname', 'value': u'grac'}, {'name': 'refs', 'value': ''}, {'name': 'iquery', 'value': u'w'}], 'Lposlist': [], 'Q': [], 'StructAttrList': [{'label': u'doc.file', 'n': u'doc.file'}, {'label': u'doc.author', 'n': u'doc.author'}, {'label': u'doc.title', 'n': u'doc.title'}, {'label': u'doc.date', 'n': u'doc.date'}, {'label': u'doc.original', 'n': u'doc.original'}, {'label': u'doc.genre', 'n': u'doc.genre'}, {'label': u'doc.region', 'n': u'doc.region'}, {'label': u'doc.translator', 'n': u'doc.translator'}, {'label': u'doc.born', 'n': u'doc.born'}], 'Wposlist': [], '_bonito_version': 'open-3.99.9', '_version': u'2.36.5-open-2.151.5-open-3.99.9', 'can_wseval': '', ...}, methodname = 'first_form'
 /usr/lib/python2.7/dist-packages/bonito/conccgi.py in add_undefined(self=<__main__.BonitoCGI instance>, result={'AttrList': [{'label': u'word', 'n': u'word'}, {'label': u'lemma', 'n': u'lemma'}, {'label': u'tag', 'n': u'tag'}], 'Corplist': [{'id': 'grac', 'name': u'Grac'}, {'id': u'susanne', 'name': u'Susanne'}], 'Globals': [{'name': 'corpname', 'value': u'grac'}, {'name': 'refs', 'value': ''}, {'name': 'iquery', 'value': u'w'}], 'Lposlist': [], 'Q': [], 'StructAttrList': [{'label': u'doc.file', 'n': u'doc.file'}, {'label': u'doc.author', 'n': u'doc.author'}, {'label': u'doc.title', 'n': u'doc.title'}, {'label': u'doc.date', 'n': u'doc.date'}, {'label': u'doc.original', 'n': u'doc.original'}, {'label': u'doc.genre', 'n': u'doc.genre'}, {'label': u'doc.region', 'n': u'doc.region'}, {'label': u'doc.translator', 'n': u'doc.translator'}, {'label': u'doc.born', 'n': u'doc.born'}], 'Wposlist': [], '_bonito_version': 'open-3.99.9', '_version': u'2.36.5-open-2.151.5-open-3.99.9', 'can_wseval': '', ...}, methodname='first_form')
    413 
    414         if 'TextTypeSel' in names:
=>  415             result['TextTypeSel'] = self.texttypes_with_norms(ret_nums=False)
    416         if 'auto_pos' in names:
    417             result['auto_pos'] = bool(self._corp().get_conf('WSPOSLIST'))
result = {'AttrList': [{'label': u'word', 'n': u'word'}, {'label': u'lemma', 'n': u'lemma'}, {'label': u'tag', 'n': u'tag'}], 'Corplist': [{'id': 'grac', 'name': u'Grac'}, {'id': u'susanne', 'name': u'Susanne'}], 'Globals': [{'name': 'corpname', 'value': u'grac'}, {'name': 'refs', 'value': ''}, {'name': 'iquery', 'value': u'w'}], 'Lposlist': [], 'Q': [], 'StructAttrList': [{'label': u'doc.file', 'n': u'doc.file'}, {'label': u'doc.author', 'n': u'doc.author'}, {'label': u'doc.title', 'n': u'doc.title'}, {'label': u'doc.date', 'n': u'doc.date'}, {'label': u'doc.original', 'n': u'doc.original'}, {'label': u'doc.genre', 'n': u'doc.genre'}, {'label': u'doc.region', 'n': u'doc.region'}, {'label': u'doc.translator', 'n': u'doc.translator'}, {'label': u'doc.born', 'n': u'doc.born'}], 'Wposlist': [], '_bonito_version': 'open-3.99.9', '_version': u'2.36.5-open-2.151.5-open-3.99.9', 'can_wseval': '', ...}, self = <__main__.BonitoCGI instance>, self.texttypes_with_norms = <bound method BonitoCGI.texttypes_with_norms of <__main__.BonitoCGI instance>>, ret_nums undefined, builtin False = False
 /usr/lib/python2.7/dist-packages/bonito/conccgi.py in texttypes_with_norms(self=<__main__.BonitoCGI instance>, subcorpattrs=u'doc.file,doc.n', list_all=False, ret_nums=False)
   1913                      'Normslist': [], 'Blocks': [],
   1914                    }
=> 1915         tt = corplib.texttype_values(corp, subcorpattrs, list_all, self.hidenone)
   1916         if not ret_nums: return {'Blocks': tt, 'Normslist': []}
   1917         basestructname = subcorpattrs.split('.')[0]
tt undefined, global corplib = <module 'corplib' from '/usr/lib/python2.7/dist-packages/bonito/corplib.py'>, corplib.texttype_values = <function texttype_values>, corp = <manatee.Corpus; proxy of <Swig Object of type 'Corpus *' at 0x7f41a0f6b5a0> >, subcorpattrs = u'doc.file,doc.n', list_all = False, self = <__main__.BonitoCGI instance>, self.hidenone = 1
 /usr/lib/python2.7/dist-packages/bonito/corplib.py in texttype_values(corp=<manatee.Corpus; proxy of <Swig Object of type 'Corpus *' at 0x7f41a0f6b5a0> >, subcorpattrs=u'doc.file,doc.n', list_all=False, hidenone=1)
    416                 n = n[1:]
    417                 slurp = 1
=>  418             attr = corp.get_attr (n)
    419             attrval = { 'name': n,
    420                         'label': corp.get_conf (n+'.LABEL') or n,
attr = <manatee.PosAttr; proxy of <Swig Object of type 'PosAttr *' at 0x7f41a106c0f0> >, corp = <manatee.Corpus; proxy of <Swig Object of type 'Corpus *' at 0x7f41a0f6b5a0> >, corp.get_attr = <bound method Corpus.get_attr of <manatee.Corpus...g Object of type 'Corpus *' at 0x7f41a0f6b5a0> >>, n = u'doc.n'
 /usr/lib/python2.7/dist-packages/manatee.py in get_attr(self=<manatee.Corpus; proxy of <Swig Object of type 'Corpus *' at 0x7f41a0f6b5a0> >, name=u'doc.n', struct_attr=False)
    908 
    909     def get_attr(self, name, struct_attr=False):
=>  910         return _manatee.Corpus_get_attr(self, name, struct_attr)
    911 
    912     def get_struct(self, name):
global _manatee = <module '_manatee' from '/usr/lib/python2.7/dist-packages/_manatee.so'>, _manatee.Corpus_get_attr = <built-in function Corpus_get_attr>, self = <manatee.Corpus; proxy of <Swig Object of type 'Corpus *' at 0x7f41a0f6b5a0> >, name = u'doc.n', struct_attr = False

<class 'manatee.AttrNotFound'>: AttrNotFound (n)
      args = ()
      message = ''
      this = <Swig Object of type 'AttrNotFound *'>
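In the last frames, Bonito calls `corp.get_attr(u'doc.n')` with `subcorpattrs=u'doc.file,doc.n'`. That is the same value as FULLREF in the registry file posted earlier, where the only structure with attributes is `meta` and no attribute `n` is declared. A rough Python consistency check for this situation could look like the sketch below. It is a simplified, line-based reader of the registry format as it appears in this thread, not Manatee's own parser. The function names are hypothetical.

```python
import re


def declared_struct_attrs(registry_text):
    """Collect 'struct.attr' names from STRUCTURE blocks (one keyword per line)."""
    found = set()
    current = None  # name of the STRUCTURE block we are in, if any
    depth = 0       # brace nesting depth
    for raw in registry_text.splitlines():
        line = raw.strip()
        if depth == 0:
            match = re.match(r"STRUCTURE\s+(\w+)", line)
            current = match.group(1) if match else None
        elif line.startswith("}"):
            depth -= 1
            if depth == 0:
                current = None
            continue
        elif depth == 1 and current:
            match = re.match(r"ATTRIBUTE\s+(\w+)", line)
            if match:
                found.add(current + "." + match.group(1))
        if "{" in line:
            depth += 1
    return found


def check_refs(registry_text, key="FULLREF"):
    """Return entries of e.g. FULLREF "a.b,c.d" not declared by any STRUCTURE."""
    pattern = r'^\s*' + re.escape(key) + r'\s+"([^"]*)"'
    match = re.search(pattern, registry_text, re.M)
    if not match:
        return []
    declared = declared_struct_attrs(registry_text)
    refs = [ref.strip() for ref in match.group(1).split(",")]
    return [ref for ref in refs if ref and ref not in declared]


REGISTRY = '''FULLREF "doc.file,doc.n"
ATTRIBUTE   lemma {
       MULTIVALUE y
       MULTISEP "|"
}
STRUCTURE meta {
    ATTRIBUTE    file
    ATTRIBUTE    born
}
STRUCTURE s
'''
print(check_refs(REGISTRY))  # ['doc.file', 'doc.n']
```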

Ruprecht von Waldenfels

Nov 26, 2017, 4:14:12 AM
to Miloš Jakubíček, NoSketch Engine
Hooray! It works!



On 26.11.2017 at 01:42, Miloš Jakubíček wrote:

Ruprecht von Waldenfels

Nov 26, 2017, 4:28:07 AM
to Miloš Jakubíček, NoSketch Engine
Miloš,
It works, it's great.

Is the programming interface well documented? In other words, if we wanted to implement a different interface for Manatee, something like http://www.parasolcorpus.org/Pushkino/login.php, and release it as open source, would that be easily feasible? That is, is the Manatee API well documented, and would you be fine with that rather than seeing it as an intrusion?

So far, just asking!
Ruprecht


 



On 26.11.2017 at 01:42, Miloš Jakubíček wrote:

Miloš Jakubíček

Nov 27, 2017, 6:18:15 PM
to Ruprecht von Waldenfels, NoSketch Engine
Hi Ruprecht,

I would not say it is well documented, but it is at least somewhat documented: see the SWIG interface file api/manatee.i.
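The traceback earlier in this thread shows Bonito using the generated Python bindings directly, through `corp.get_attr(name)`, `corp.get_conf(key)` and the exception `manatee.AttrNotFound`. The sketch below uses only those calls plus the `manatee.Corpus(corpname)` constructor. It assumes the NoSketch Engine Python bindings are installed and the corpus is registered. `probe_attributes` and `probe_corpus` are hypothetical names, not part of the Manatee API.

```python
def probe_attributes(corp, names, not_found=Exception):
    """Split attribute names into (found, missing) by calling corp.get_attr().

    `corp` is anything with a get_attr(name) method, e.g. a manatee.Corpus.
    """
    found, missing = [], []
    for name in names:
        try:
            corp.get_attr(name)
        except not_found:
            missing.append(name)
        else:
            found.append(name)
    return found, missing


def probe_corpus(corpname, extra_names=()):
    """Probe the attributes listed in ATTRLIST plus any extra names."""
    import manatee  # Python bindings generated from api/manatee.i

    corp = manatee.Corpus(corpname)
    names = corp.get_conf("ATTRLIST").split(",") + list(extra_names)
    return probe_attributes(corp, names, not_found=manatee.AttrNotFound)


# Example (requires a NoSketch Engine installation):
# print(probe_corpus("grac", ["doc.file", "doc.n"]))
```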

By all means, you are free to do whatever you want on top of Manatee; just please bear in mind that:

- there are already two open source front-ends, Bonito and Kontext
- a third one is to be released soon (see http://alpha.sketchengine.co.uk)
- while we try to maintain backward compatibility as far as we can, we cannot promise any help with keeping pace with changes in Manatee etc.
- Manatee has had a Go port for a while, which we plan to switch to; at the moment it is fully compatible, but it might not be in the future

Best
Milos


