
Bug#1028473: dictionaries-common: problem in russian dict. Word '��� ��' contains illegal characters.


Jason Lee Quinn

Jan 11, 2023, 11:20:04 AM
Package: dictionaries-common
Version: 1.29.3
Severity: minor
X-Debbugs-Cc: jason.lee.q...@gmail.com

Dear Maintainer,

About two weeks ago, on a fresh install of Debian Bookworm
from a daily installer build, I
came across a dictionary error related to the
installation of synaptic. The relevant output is:

--------

Setting up synaptic (0.91.2) ...
Processing triggers for dictionaries-common (1.29.3) ...
ispell-autobuildhash: Processing 'american' dict.
ispell-autobuildhash: Processing 'brasilero' dict.
ispell-autobuildhash: Processing 'british' dict.
ispell-autobuildhash: Processing 'catala' dict.
ispell-autobuildhash: Processing 'danish' dict.
ispell-autobuildhash: Processing 'espa-nol' dict.
ispell-autobuildhash: Processing 'lietuviu' dict.
ispell-autobuildhash: Processing 'ngerman' dict.
ispell-autobuildhash: Processing 'polish' dict.
ispell-autobuildhash: Processing 'portugues' dict.
ispell-autobuildhash: Processing 'russian' dict.

Word '��� ��' contains illegal characters.
ispell-autobuildhash: Processing 'swiss' dict.

--------

It looks to be an error in a dictionary file
but I never selected any language except English so
this is default behavior and as far as I can
tell I do not even have the russian dictionary
installed at all.

My best guess is that this is an issue in
dictionaries-common/dc-deconf-select.pl and/or
other files related to dpkg triggers.

If you'd like more details about this, just tell me
what extra information you need and I will try to
supply it.

Cheers,
Jason

-- System Information:
Debian Release: bookworm/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-6-amd64 (SMP w/24 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dictionaries-common depends on:
ii debconf [debconf-2.0] 1.5.81
ii emacsen-common 3.0.5
ii libtext-iconv-perl 1.7-8

dictionaries-common recommends no packages.

Versions of packages dictionaries-common suggests:
ii aspell 0.60.8-4+b1
ii ispell 3.4.05-1
ii wamerican [wordlist] 2020.12.07-2

-- debconf information:
dictionaries-common/ispell-autobuildhash-message:
* dictionaries-common/default-ispell: american (American English)
* dictionaries-common/default-wordlist: american (American English)
dictionaries-common/selecting_ispell_wordlist_default:
dictionaries-common/invalid_debconf_value:
dictionaries-common/debconf_database_corruption:
dictionaries-common/old_wordlist_link: true

Agustin Martin

Jan 11, 2023, 2:30:03 PM
On Wed, 11 Jan 2023 at 17:15, Jason Lee Quinn
(<jason.lee.q...@gmail.com>) wrote:
Hi,

This is a harmless message printed during hash creation for the russian
ispell dict. Nothing to worry about.

> It looks to be an error in a dictionary file
> but I never selected any language except English so
> this is default behavior and as far as I can
> tell I do not even have the russian dictionary
> installed at all.

By the way, you do have all those ispell dicts installed, although you
may not have explicitly installed them.

Thanks for your contribution to Debian

--
Agustin

Jason Lee Quinn

Jan 14, 2023, 9:40:04 AM
Package: dictionaries-common
Version: 1.29.3
Followup-For: Bug #1028473
X-Debbugs-Cc: jason.lee.q...@gmail.com

Thank you for the reply.

If these dictionaries are installed,
where are they located? I've searched
/usr/lib/ispell and many other places but can only find
american and british dictionaries on my machine.

Also, where does the "contains illegal characters" message
actually come from? Whatever the source of that message,
I'm having trouble tracking it down. A technical explanation
of why the message is harmless would also be interesting
for me. Perhaps the message itself and the logic that produces
it could be improved to provide a nicer user experience.

Agustin Martin

Jan 15, 2023, 3:02:20 PM
Control: reassign -1 irussian
Control: tags -1 + patch pending

On Sat, 14 Jan 2023 at 15:39, Jason Lee Quinn
(<jason.lee.q...@gmail.com>) wrote:
>
> Package: dictionaries-common
> Version: 1.29.3
> Followup-For: Bug #1028473
> X-Debbugs-Cc: jason.lee.q...@gmail.com
>
> Thank you for the reply.
>
> If these dictionaries are installed,
> where are they located? I've searched
> /usr/lib/ispell and many other places but can only find
> american and british dictionaries on my machine.

Hi,

You should find the relevant files under several directories. On one of my boxes:

$ dir /usr/lib/ispell/ /usr/share/ispell/ /var/lib/ispell/ /var/lib/dictionaries-common/ispell/

/usr/lib/ispell/:
american.aff    castellano.hash  english.aff    espanol.hash          spanish.hash
american.hash   default.aff      espa~nol.aff   README.select-ispell
castellano.aff  default.hash     espa~nol.hash  spanish.aff

/usr/share/ispell/:
american.med+.mwl.gz american.mwl.gz english.aff espa~nol.mwl.gz

/var/lib/dictionaries-common/ispell/:
iamerican ispanish

/var/lib/ispell/:
american.compat american.hash american.remove espa~nol.compat espa~nol.hash espa~nol.remove README

If there is no trace of those dicts in your directories, synaptic is
upgrading something else (a virtual machine?).
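
To check which ispell files are actually present, the same directories
could be listed from a script. The following is a minimal Python sketch
and not part of dictionaries-common; the function name
list_ispell_files is hypothetical, and the default directory list is
simply the four paths from the command above.

```python
from pathlib import Path

# Directories taken from the listing above; adjust for your system.
ISPELL_DIRS = [
    "/usr/lib/ispell",
    "/usr/share/ispell",
    "/var/lib/ispell",
    "/var/lib/dictionaries-common/ispell",
]


def list_ispell_files(dirs=ISPELL_DIRS):
    """Return {directory: sorted file names} for each directory that exists."""
    found = {}
    for d in dirs:
        path = Path(d)
        if path.is_dir():  # silently skip directories that are missing
            found[d] = sorted(p.name for p in path.iterdir())
    return found


if __name__ == "__main__":
    for directory, names in list_ispell_files().items():
        print(f"{directory}:")
        for name in names:
            print(f"  {name}")
```

A directory that is missing from the result does not exist on the
machine where the script ran.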

> Also, where does the "contains illegal characters" message
> actually come from? Whatever the source of that message,
> I'm having trouble tracking it down. A technical explanation
> of why the message is harmless would also be interesting
> for me. Perhaps the message itself and the logic that produces
> it could be improved to provide a nicer user experience.

The munched word on line 39 of the original russian ispell dict contains
whitespace, which is allowed only as a word separator, not in the middle
of a word. ispell (and aspell) therefore complains about it and skips
that word; that is the message you saw. It is harmless because the only
result is that this one word is skipped.
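
The kind of check described here can be illustrated with a short Python
sketch. It is not ispell's actual code; it assumes a munched word list
with one entry per line in the form word/FLAGS, and the function name
find_whitespace_entries is hypothetical. Lines are handled as bytes so
that nothing is assumed about the dictionary's encoding.

```python
def find_whitespace_entries(lines):
    """Yield (line_number, entry) for entries whose word part contains whitespace.

    `lines` is an iterable of bytes objects, one dictionary entry per line.
    """
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()             # surrounding whitespace is not part of the word
        if not entry:
            continue                    # ignore blank lines
        word = entry.split(b"/", 1)[0]  # drop affix flags after the first '/'
        if len(word.split()) > 1:       # ASCII whitespace inside the word itself
            yield number, entry


sample = [b"alpha/AB\n", b"two words/C\n", b"beta\n"]
print(list(find_whitespace_entries(sample)))  # [(2, b'two words/C')]
```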

The attached patch should strip that word before the package is built,
thus making the message go away. I am reassigning this bug report to
irussian (I am also an uploader for it) and will upload a package with
this fix unless the maintainer disagrees.
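
The patch itself is not reproduced in this thread, so the following is
only an illustration of the general idea of a build-time filter, written
in Python. The function name strip_whitespace_entries and the word/FLAGS
line format are assumptions, not taken from the irussian package.

```python
def strip_whitespace_entries(lines):
    """Split entries into (kept, dropped) lists.

    An entry is dropped when its word part, the text before the first '/',
    contains internal whitespace. `lines` is an iterable of bytes objects.
    """
    kept, dropped = [], []
    for raw in lines:
        word = raw.strip().split(b"/", 1)[0]
        if len(word.split()) > 1:
            dropped.append(raw)   # would trigger the "illegal characters" message
        else:
            kept.append(raw)      # passed through unchanged
    return kept, dropped
```

Returning the dropped entries as well makes it easy to log exactly what
was removed from the word list during the build.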

Thanks again for your feedback.

--
Agustin
irussian-whitespace.diff