sed error compiling corpus

Jean-Francois Burdet

Oct 16, 2022, 7:06:07 AM
to NoSketch Engine
Hi there,
I'm using NoSketch Engine for a project here at the University of Geneva.
I'm getting a weird problem when I try to compile a corpus using a dockerized setup (manatee-open-2.208.tar.gz).
Everything else runs smoothly; I can compile the sample SUSANNE corpus without problems.

The error message is as follows:

[20221016-11:00:32] Creating dynamic attribute ascii
sed: -e expression #1, char 22: strings for `y' command are different lengths
[20221016-11:00:32] lexicon (/corpora/cheulex_de/indexed/ascii) writing FSA...
[20221016-11:00:32] Writing FSA finished
error: error: expected 65257 values, got 0
[20221016-11:00:32] ERROR: failed to create dynamic attribute ascii

How can I debug this?
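One way to start debugging is to check the constraint sed is reporting: the "y" command needs its source and destination strings to have the same length. The sketch below only reproduces the arithmetic and does not call sed. It compares the two strings counted as characters and as UTF-8 bytes, and computes where the closing "/" of the expression falls under each interpretation. The hypothesis it tests, which the log does not confirm, is that sed in the container runs in a byte-oriented locale (the C/POSIX locale that minimal Docker images often fall back to). In that case the 12-byte "ÄÖÜäöü" cannot match the 6-byte "AOUaou", and the closing "/" sits at byte 22. That is the "char 22" in the error message.

```python
# Sketch: why sed's "y" command may report "different lengths".
# Assumption (hypothesis): sed counts bytes (C/POSIX locale) instead of
# characters (UTF-8 locale). This reproduces the arithmetic only.

SRC = "ÄÖÜäöü"
DST = "AOUaou"
EXPR = f"y/{SRC}/{DST}/"


def lengths(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


def closing_delimiter_offset(expr: str, byte_oriented: bool) -> int:
    """1-based position of the last '/' in expr, counted in bytes or characters."""
    if byte_oriented:
        return expr.encode("utf-8").rindex(b"/") + 1
    return expr.rindex("/") + 1


if __name__ == "__main__":
    for label, text in (("source", SRC), ("destination", DST)):
        chars, nbytes = lengths(text)
        print(f"{label}: {chars} characters, {nbytes} bytes")
    print("closing '/' counted as characters:", closing_delimiter_offset(EXPR, False))
    print("closing '/' counted as bytes:", closing_delimiter_offset(EXPR, True))
```

If the byte offset matches the position sed reports, as it does here (22 bytes versus 16 characters), it may be worth checking the container's locale settings (for example with the locale command) before suspecting sed itself.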

Jean-Francois Burdet

Oct 16, 2022, 11:00:58 AM
to NoSketch Engine
Replying to myself: it seems that changing my dynamic attribute definition from

ATTRIBUTE "ascii" {
    DYNAMIC "sed -e 'y/ÄÖÜäöü/AOUaou/' -e 's/ẞ/SS/g' -e 's/ß/ss/g'"
    DYNLIB "pipe"
    DYNTYPE "freq"
    FROMATTR "lc"
}

to

ATTRIBUTE "ascii" {
    DYNAMIC "sed -e 's/ẞ/SS/g' -e 's/ß/ss/g' -e 's/Ä/A/g' -e 's/Ö/O/g' -e 's/Ü/U/g' -e 's/ä/a/g' -e 's/ö/o/g' -e 's/ü/u/g'"
    DYNLIB "pipe"
    DYNTYPE "freq"
    FROMATTR "lc"
}

fixed my problem, so I believe the "y" expression triggered a bug in sed (version 4.8).
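To check what the substitution-based definition should produce before recompiling, the chain of "s/.../.../g" expressions can be emulated outside sed. The sketch below is a hypothetical helper: the names SUBSTITUTIONS and to_ascii are illustrative and are not part of Manatee or NoSketch Engine. It applies the same replacements in the same order as the DYNAMIC pipeline. It assumes the final expression is meant to map lowercase "ü" to "u", and that the input is a value of the "lc" attribute named in FROMATTR.

```python
# Hypothetical emulation of the "ascii" dynamic attribute: applies the same
# ordered global substitutions as the chained sed -e 's/x/y/g' expressions.
# Assumption: the last rule is intended to map lowercase ü to u.

SUBSTITUTIONS = [
    ("ẞ", "SS"),
    ("ß", "ss"),
    ("Ä", "A"),
    ("Ö", "O"),
    ("Ü", "U"),
    ("ä", "a"),
    ("ö", "o"),
    ("ü", "u"),
]


def to_ascii(value: str) -> str:
    """Apply each substitution to every occurrence, in order (like sed's g flag)."""
    for old, new in SUBSTITUTIONS:
        value = value.replace(old, new)
    return value


if __name__ == "__main__":
    for word in ("größe", "über", "straße"):
        print(word, "->", to_ascii(word))
```

Python strings are sequences of code points, so this emulation avoids the byte-versus-character question entirely. No replacement string contains a character that a later rule rewrites, so the rule order only mirrors the sed command line and does not change the result.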