Thanks, but for some reason it didn't work anyway. It said:
parallel@vmd20395:/corpora$ sudo compilecorp --recompile-corpus --no-ske grac /data/UkrRegCorp/fullcorpus.cwb.txt
Manatee version: 2.36.5-open-2.151.5
Reading corpus configuration...
PATH=/corpora/data/grac/
VERTICAL=/data/UkrRegCorp/fullcorpus.cwb.txt
WSDEF=
WSHIST=
SUBCDEF=
WSBASE=
WSATTR=
WSTHES=
WSMINHITS=
WSOLDSCORES=
ALIGNDEF=
ALIGNED=
TERMDEF=
TERMBASE=
ATTRLIST=word,lemma,tag,lc,lemma_lc
DIACHRONIC=
Vertical text will be read from /data/UkrRegCorp/fullcorpus.cwb.txt
Deleting corpus PATH directory...
Compiling corpus...
[20171125-16:57:20] Available memory 7015MB, estimated usage 159MB
[20171125-16:57:22] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:57:23] lexicon (/corpora/data/grac/word)
make_lex_srt_file
^C
parallel@vmd20395:/corpora$ sudo compilecorp --recompile-corpus --no-ske grac /data/UkrRegCorp/fullcorpus.cwb.txt
Manatee version: 2.36.5-open-2.151.5
Reading corpus configuration...
PATH=/corpora/data/grac/
VERTICAL=/data/UkrRegCorp/fullcorpus.cwb.txt
WSDEF=
WSHIST=
SUBCDEF=
WSBASE=
WSATTR=
WSTHES=
WSMINHITS=
WSOLDSCORES=
ALIGNDEF=
ALIGNED=
TERMDEF=
TERMBASE=
ATTRLIST=word,lemma,tag,lc,lemma_lc
DIACHRONIC=
Corpus is compiled
Vertical text will be read from /data/UkrRegCorp/fullcorpus.cwb.txt
Deleting corpus PATH directory...
Compiling corpus...
[20171125-16:57:43] Available memory 7015MB, estimated usage 159MB
[20171125-16:57:44] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:57:45] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:57:47] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:57:47] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:57:50] Processed 1000000 lines, 941435 positions.
[20171125-16:57:51] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:57:52] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:57:58] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:57:59] Processed 2000000 lines, 1869916 positions.
[20171125-16:58:00] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:58:09] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:58:11] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:58:15] Processed 3000000 lines, 2804583 positions.
[20171125-16:58:22] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:58:28] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:58:31] Processed 4000000 lines, 3736421 positions.
[20171125-16:58:36] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:58:45] Processed 5000000 lines, 4658867 positions.
[20171125-16:58:46] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:58:49] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:58:56] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:58:58] Processed 6000000 lines, 5603229 positions.
[20171125-16:59:02] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:59:13] Processed 7000000 lines, 6534299 positions.
[20171125-16:59:14] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:59:22] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:59:32] Processed 8000000 lines, 7466812 positions.
[20171125-16:59:38] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-16:59:45] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-16:59:47] Processed 9000000 lines, 8383019 positions.
[20171125-17:00:00] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:00:06] Processed 10000000 lines, 9299651 positions.
[20171125-17:00:19] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-17:00:26] Processed 11000000 lines, 10234199 positions.
[20171125-17:00:37] Processed 12000000 lines, 11170881 positions.
[20171125-17:00:53] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:00:56] Processed 13000000 lines, 12089412 positions.
[20171125-17:01:11] Processed 14000000 lines, 13009533 positions.
[20171125-17:01:13] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-17:01:29] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:01:32] Processed 15000000 lines, 13944968 positions.
[20171125-17:01:46] Processed 16000000 lines, 14888589 positions.
[20171125-17:01:50] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-17:02:03] Processed 17000000 lines, 15827366 positions.
[20171125-17:02:06] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:02:18] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-17:02:20] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:02:25] Processed 18000000 lines, 16740055 positions.
[20171125-17:02:37] Processed 19000000 lines, 17677076 positions.
[20171125-17:02:47] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:02:53] Processed 20000000 lines, 18623208 positions.
[20171125-17:02:56] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-17:03:09] Processed 21000000 lines, 19560477 positions.
[20171125-17:03:21] Processed 22000000 lines, 20492187 positions.
[20171125-17:03:35] Processed 23000000 lines, 21437736 positions.
[20171125-17:03:48] Processed 24000000 lines, 22372023 positions.
[20171125-17:03:54] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:04:03] Processed 25000000 lines, 23303377 positions.
[20171125-17:04:16] Processed 26000000 lines, 24208634 positions.
[20171125-17:04:30] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-17:04:36] Processed 27000000 lines, 25135095 positions.
[20171125-17:04:50] Processed 28000000 lines, 26072812 positions.
[20171125-17:05:03] Processed 29000000 lines, 27001164 positions.
[20171125-17:05:22] Processed 30000000 lines, 27940447 positions.
[20171125-17:05:24] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:05:42] Processed 31000000 lines, 28866398 positions.
[20171125-17:05:56] Processed 32000000 lines, 29756404 positions.
[20171125-17:06:13] Processed 33000000 lines, 30681813 positions.
[20171125-17:06:30] Processed 34000000 lines, 31613975 positions.
[20171125-17:06:46] Processed 35000000 lines, 32545712 positions.
[20171125-17:07:02] Processed 36000000 lines, 33453751 positions.
[20171125-17:07:17] Processed 37000000 lines, 34382006 positions.
[20171125-17:07:31] Processed 38000000 lines, 35311298 positions.
[20171125-17:07:41] Processed 39000000 lines, 36235561 positions.
[20171125-17:07:55] Processed 40000000 lines, 37156531 positions.
[20171125-17:08:06] Processed 41000000 lines, 38081518 positions.
[20171125-17:08:19] Processed 42000000 lines, 39014081 positions.
[20171125-17:08:34] Processed 43000000 lines, 39941683 positions.
[20171125-17:08:48] Processed 44000000 lines, 40878402 positions.
[20171125-17:09:01] Processed 45000000 lines, 41808789 positions.
[20171125-17:09:16] Processed 46000000 lines, 42737082 positions.
[20171125-17:09:32] Processed 47000000 lines, 43657624 positions.
[20171125-17:09:46] Processed 48000000 lines, 44591218 positions.
[20171125-17:09:58] Processed 49000000 lines, 45515243 positions.
[20171125-17:10:14] Processed 50000000 lines, 46444271 positions.
[20171125-17:10:28] Processed 51000000 lines, 47388562 positions.
[20171125-17:10:41] Processed 52000000 lines, 48295519 positions.
[20171125-17:10:54] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:11:02] Processed 53000000 lines, 49227796 positions.
[20171125-17:11:14] Processed 54000000 lines, 50157101 positions.
[20171125-17:11:28] Processed 55000000 lines, 51074839 positions.
[20171125-17:11:44] Processed 56000000 lines, 52000585 positions.
[20171125-17:11:59] Processed 57000000 lines, 52925670 positions.
[20171125-17:12:12] Processed 58000000 lines, 53843933 positions.
[20171125-17:12:22] Processed 59000000 lines, 54749523 positions.
[20171125-17:12:36] Processed 60000000 lines, 55690570 positions.
[20171125-17:12:48] Processed 61000000 lines, 56630502 positions.
[20171125-17:13:03] Processed 62000000 lines, 57555587 positions.
[20171125-17:13:17] Processed 63000000 lines, 58499809 positions.
[20171125-17:13:31] Processed 64000000 lines, 59436555 positions.
[20171125-17:13:45] Processed 65000000 lines, 60375349 positions.
[20171125-17:13:55] Processed 66000000 lines, 61309220 positions.
[20171125-17:14:08] Processed 67000000 lines, 62176394 positions.
[20171125-17:14:21] Processed 68000000 lines, 63056602 positions.
[20171125-17:14:33] Processed 69000000 lines, 63980677 positions.
[20171125-17:14:48] Processed 70000000 lines, 64919163 positions.
[20171125-17:14:59] Processed 71000000 lines, 65863393 positions.
[20171125-17:15:14] Processed 72000000 lines, 66779290 positions.
[20171125-17:15:28] Processed 73000000 lines, 67712714 positions.
[20171125-17:15:38] Processed 74000000 lines, 68658259 positions.
[20171125-17:15:52] Processed 75000000 lines, 69583546 positions.
[20171125-17:16:07] Processed 76000000 lines, 70523415 positions.
[20171125-17:16:25] Processed 77000000 lines, 71468314 positions.
[20171125-17:16:43] Processed 78000000 lines, 72404352 positions.
[20171125-17:16:57] Processed 79000000 lines, 73337098 positions.
[20171125-17:17:09] Processed 80000000 lines, 74266046 positions.
[20171125-17:17:21] Processed 81000000 lines, 75181042 positions.
[20171125-17:17:37] Processed 82000000 lines, 76108612 positions.
[20171125-17:17:49] Processed 83000000 lines, 77049366 positions.
[20171125-17:18:00] Processed 84000000 lines, 77975483 positions.
[20171125-17:18:11] Processed 85000000 lines, 78913531 positions.
[20171125-17:18:23] Processed 86000000 lines, 79838009 positions.
[20171125-17:18:38] Processed 87000000 lines, 80779235 positions.
[20171125-17:18:52] Processed 88000000 lines, 81718340 positions.
[20171125-17:19:08] Processed 89000000 lines, 82645042 positions.
[20171125-17:19:20] Processed 90000000 lines, 83579701 positions.
[20171125-17:19:36] Processed 91000000 lines, 84497938 positions.
[20171125-17:19:53] Processed 92000000 lines, 85423211 positions.
[20171125-17:20:12] Processed 93000000 lines, 86354922 positions.
[20171125-17:20:29] Processed 94000000 lines, 87286711 positions.
[20171125-17:20:47] Processed 95000000 lines, 88211773 positions.
[20171125-17:21:06] Processed 96000000 lines, 89141731 positions.
[20171125-17:21:21] Processed 97000000 lines, 90079443 positions.
[20171125-17:21:36] Processed 98000000 lines, 91014352 positions.
[20171125-17:21:51] Processed 99000000 lines, 91941439 positions.
[20171125-17:22:00] Processed 100000000 lines, 92864670 positions.
[20171125-17:22:13] Processed 101000000 lines, 93797682 positions.
[20171125-17:22:31] Processed 102000000 lines, 94727364 positions.
[20171125-17:22:46] Processed 103000000 lines, 95665447 positions.
[20171125-17:23:05] Processed 104000000 lines, 96606358 positions.
[20171125-17:23:24] Processed 105000000 lines, 97542597 positions.
[20171125-17:23:42] Processed 106000000 lines, 98479431 positions.
[20171125-17:24:00] Processed 107000000 lines, 99422504 positions.
[20171125-17:24:18] Processed 108000000 lines, 100349821 positions.
[20171125-17:24:34] Processed 109000000 lines, 101289313 positions.
[20171125-17:24:48] Processed 110000000 lines, 102227709 positions.
[20171125-17:25:01] Processed 111000000 lines, 103162094 positions.
[20171125-17:25:17] Processed 112000000 lines, 104098809 positions.
[20171125-17:25:29] Processed 113000000 lines, 105031884 positions.
[20171125-17:25:43] Processed 114000000 lines, 105955967 positions.
[20171125-17:25:57] Processed 115000000 lines, 106884247 positions.
[20171125-17:26:10] Reading input file finished.
[20171125-17:26:10] Closing structure font ...
[20171125-17:26:10] warning: structure attribute (font.type) never present
[20171125-17:26:10] ... finished.
[20171125-17:26:10] Closing structure g ...
[20171125-17:26:10] ... finished.
[20171125-17:26:10] Closing structure head ...
[20171125-17:26:10] warning: structure attribute (head.type) never present
[20171125-17:26:10] ... finished.
[20171125-17:26:10] Closing structure meta ...
[20171125-17:26:10] warning: structure attribute (meta.file) never present
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.author)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.born)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.date)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.file)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.genre)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.original)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.region)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.title)
make_lex_srt_file
[20171125-17:26:10] lexicon (/corpora/data/grac/meta.translator)
make_lex_srt_file
[20171125-17:26:10] ... finished.
[20171125-17:26:10] Closing structure s ...
[20171125-17:26:10] /data/UkrRegCorp/fullcorpus.cwb.txt:115874154: warning: auto-closing structure (s) opened at line /data/UkrRegCorp/fullcorpus.cwb.txt:115874142:
[20171125-17:26:10] ... finished.
[20171125-17:26:10] Closing attribute /corpora/data/grac/word ...
[20171125-17:26:10] lexicon (/corpora/data/grac/word)
make_lex_srt_file
[20171125-17:26:55] Creating regular expression optimization attribute word
[20171125-17:28:28] lexicon (/corpora/data/grac/word.regex)
make_lex_srt_file
[20171125-17:28:28] ... finished.
[20171125-17:28:28] Closing attribute /corpora/data/grac/lemma ...
[20171125-17:28:28] lexicon (/corpora/data/grac/lemma)
make_lex_srt_file
[20171125-17:29:50] Creating regular expression optimization attribute lemma
[20171125-17:32:25] lexicon (/corpora/data/grac/lemma.regex)
make_lex_srt_file
[20171125-17:32:26] ... finished.
[20171125-17:32:26] Closing attribute /corpora/data/grac/tag ...
[20171125-17:32:26] lexicon (/corpora/data/grac/tag)
make_lex_srt_file
[20171125-17:33:14] Creating regular expression optimization attribute tag
[20171125-17:33:18] lexicon (/corpora/data/grac/tag.regex)
make_lex_srt_file
[20171125-17:33:18] ... finished.
[20171125-17:33:18] Processed 115874154 lines, 107710157 positions.
[20171125-17:33:18] Creating dynamic attribute lc
[20171125-17:33:19] lexicon (/corpora/data/grac/lc)
make_lex_srt_file
[20171125-17:33:24] lexicon (/corpora/data/grac/lc)
make_lex_srt_file
[20171125-17:33:30] lexicon (/corpora/data/grac/lc)
make_lex_srt_file
[20171125-17:33:38] lexicon (/corpora/data/grac/lc)
make_lex_srt_file
[20171125-17:33:51] lexicon (/corpora/data/grac/lc)
make_lex_srt_file
[20171125-17:33:58] Creating regular expression optimization attribute lc
[20171125-17:35:24] lexicon (/corpora/data/grac/lc.regex)
make_lex_srt_file
[20171125-17:35:25] Creating dynamic attribute lemma_lc
[20171125-17:35:26] lexicon (/corpora/data/grac/lemma_lc)
make_lex_srt_file
[20171125-17:35:29] lexicon (/corpora/data/grac/lemma_lc)
make_lex_srt_file
[20171125-17:35:35] lexicon (/corpora/data/grac/lemma_lc)
make_lex_srt_file
[20171125-17:35:42] lexicon (/corpora/data/grac/lemma_lc)
make_lex_srt_file
[20171125-17:35:52] lexicon (/corpora/data/grac/lemma_lc)
make_lex_srt_file
[20171125-17:36:08] lexicon (/corpora/data/grac/lemma_lc)
make_lex_srt_file
[20171125-17:36:26] lexicon (/corpora/data/grac/lemma_lc)
make_lex_srt_file
[20171125-17:36:37] Creating regular expression optimization attribute lemma_lc
[20171125-17:38:24] lexicon (/corpora/data/grac/lemma_lc.regex)
make_lex_srt_file
[20171125-17:38:25] Summary of errors encountered in the input file:
[20171125-17:38:25] 1 times: warning type 'closing structure automatically'
[20171125-17:38:25] Only first 100 occurrencies emitted, use -v to emit each occurrence.
Compiling frequencies...
100 %
Compiling arf for attribute word
freq already compiled, skipping.
Compiling docf for attribute word
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for word.
Compiling aldf for attribute word
100 %
Compiling arf for attribute lemma
freq already compiled, skipping.
Compiling docf for attribute lemma
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for lemma.
Compiling aldf for attribute lemma
100 %
Compiling arf for attribute tag
freq already compiled, skipping.
Compiling docf for attribute tag
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for tag.
Compiling aldf for attribute tag
100 %
Compiling arf for attribute lc
freq already compiled, skipping.
Compiling docf for attribute lc
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for lc.
Compiling aldf for attribute lc
100 %
Compiling arf for attribute lemma_lc
freq already compiled, skipping.
Compiling docf for attribute lemma_lc
No "doc" structure (DOCSTRUCTURE) available. Can't compile document freqs for lemma_lc.
Compiling aldf for attribute lemma_lc
SUBCDEF path not specified in the configuration file; skipping subcorpora...
Compiling word sketches disabled; skipping...
Compiling longest commonest match disabled; skipping...
Compiling terms disabled; skipping...
Compiling word sketch hashes disabled; skipping...
Compiling thesaurus disabled; skipping...
Compiling histograms disabled; skipping...
No parallel corpora specified in ALIGNED; skipping alignment...
No document structure (doc) found
ls: cannot access '/corpora/data/grac//doc.*.norm': No such file or directory
No parallel corpora specified in ALIGNED; skipping alignment size computations...
Sizes compiled
Compiling bilingual dictionaries disabled; skipping...
Compiling bilingual terminology disabled; skipping...
Compiling trends disabled; skipping...
Warning: WPOSLIST not specified
Warning: lempos could be created from tag and lemma
Warning: Number of columns in vertical does not match number of attributes.
Error: Missing or corrupted lexicon for font.type
Error: Missing or corrupted lexicon for head.type
Error: Invalid size of structure font
Error: Invalid size of structure head
Error: Invalid size of structure g
Checking corpus grac:
* correct filename format and parseability of the registry file
* basic information
* vertical
* paths
* lowercase attributes
* URL stuff
* subcorpattrs
* sizes
* lexicon queries
* dynamic attributes
* word sketches
* structure sanity
<---
ERROR: line 921 - command 'corpcheck $CORPUS' exited with status: 1
... Error at ::main::main called at line 934
--->
Writing log to /corpora/data/grac/log/compilecorp_2017-11-25_1657.log
<---
ERROR: line 934 - command 'tee "$TMPLOGFILE"' exited with status: 1
... Error at ::main called at line 0
--->
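
One thing that might narrow down the "Number of columns in vertical does not match number of attributes" warning is to count how many tab-separated columns the token lines really have. The script below is only a rough sketch: it assumes the vertical is tab-separated with one token per line, and it treats every line starting with "<" as a structure tag (so a token that literally begins with "<" would be skipped). The name column_histogram is made up for this example and is not part of Manatee. Since the log reports lc and lemma_lc as dynamic attributes, I would expect three columns (word, lemma, tag) on every token line.

```python
from collections import Counter


def column_histogram(lines, limit=None):
    """Count how many tab-separated columns each token line has.

    Assumption: lines starting with "<" are structure tags and are skipped.
    `limit` caps the number of input lines read (None means read everything).
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if limit is not None and i >= limit:
            break
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue  # empty line or structure tag, not a token
        counts[len(line.split("\t"))] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up sample: one well-formed token line and one short line.
    sample = ["<s>\n", "a\tb\tc\n", "d\te\n", "</s>\n"]
    print(column_histogram(sample))
```

On the real file this could be called with an open file handle, for example open("/data/UkrRegCorp/fullcorpus.cwb.txt", encoding="utf-8"), and a limit such as the first million lines to keep it quick. Any column count other than three in the result would point at the lines behind the warning.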
The insta. file was:
MAINTAINER "ruprecht....@gmail.com"
INFO "General Regionally Annotated Corpus of Ukrainian"
NAME "Grac"
PATH "/corpora/data/grac"
ENCODING "UTF-8"
LANGUAGE "Ukrainian"
VERTICAL "/data/UkrRegCorp/fullcorpus.cwb.txt"
INFOHREF "http://parasolcorpus.org/Kyiv"
TAGSETDOC "http://parasolcorpus.org/Kyiv/"
FULLREF "doc.file,doc.n"
ATTRIBUTE word