
Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries


Soren Stoutner

Sep 20, 2022, 4:40:03 PM
Package: dictionaries-common
Version: 1.28.18
Severity: wishlist
Tags: l10n

Qt WebEngine has the ability to use Hunspell dictionaries for spell checking with the WebEngine, but for some reason they require that the dictionary files be converted to a special binary format (.bdic). This conversion can be done using qwebengine_convert_dict from the qtwebengine5-dev-tools package. The upstream documentation regarding this is found on Qt's website:

https://doc.qt.io/qt-5/qtwebengine-features.html#spellchecker

Once these dictionaries are available they can be used by any program that includes Qt WebEngine.
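
As a rough illustration of the conversion step, the sketch below only builds the command line for one dictionary and does not run anything. The helper name `bdic_command` and the choice to keep the input file's stem for the output are assumptions made for illustration; the two-argument form (input .dic, output .bdic) follows the upstream Qt documentation linked above, and the matching .aff file is presumably expected to sit next to the .dic file.

```python
from pathlib import Path


def bdic_command(dic_path, out_dir):
    """Return the argv list that would convert one Hunspell .dic to .bdic.

    Nothing is executed here; the result could be passed to
    subprocess.run() in a build script once the tool is installed.
    """
    dic = Path(dic_path)
    # Assumption: the output keeps the input's stem (en_US.dic -> en_US.bdic).
    target = Path(out_dir) / (dic.stem + ".bdic")
    return ["qwebengine_convert_dict", str(dic), str(target)]


if __name__ == "__main__":
    print(" ".join(bdic_command("/usr/share/hunspell/en_US.dic", "debian/tmp")))
```

Keeping the command construction separate from its execution makes it easy to loop over every .dic file in a source package and to log exactly what was run.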

The purpose of this bug report is to create a central location for discussion about the best way to package these dictionaries.

There are two general questions that should be standardized:

1. What should be the policy regarding the binary packages?
2. Where should the dictionaries be placed on the file system?

1. The binary packages can be built without much difficulty from the same source as existing Hunspell dictionaries. For an example of how this can be done, see the following commit by Don Armstrong <d...@debian.org> to the scowl source package, which builds the English Hunspell dictionary binary packages:

https://git.donarmstrong.com/?p=deb_pkgs/scowl.git;a=commitdiff;h=4510f7fed66204384fe8c39fc875e24fd874229b

In this example patch, the compiled Qt WebEngine binary dictionary is shipped as part of the existing Hunspell binary packages (for example, hunspell-en-us). Another option would be to create a separate binary package (for example, qtwebengine-dict-en-us). The argument for including it in the existing binary package is that the compiled Qt WebEngine dictionary is not very large (691.2 KiB for en_US). The argument for splitting it into a separate binary package is that most people who install the Hunspell dictionaries don't intend to use a program that does spell checking inside of a Qt WebEngine, so it would be wasted space on their system.

2. Qt WebEngine searches for these binary dictionary packages in a number of places described in the upstream link above. One of them is in the system-wide QT_INSTALL_PREFIX/qtwebengine_dictionaries. The current QT_INSTALL_PREFIX can be determined by running the following command (assuming qmake is installed):

~$ qmake -query | grep QT_INSTALL_DATA
QT_INSTALL_DATA:/usr/share/qt5
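
For scripting the same lookup, a minimal sketch follows. It parses text in the `KEY:value` form shown above; the function name `dictionaries_dir` is hypothetical, and the choice of QT_INSTALL_DATA as the default key simply mirrors the command above rather than anything confirmed upstream.

```python
def dictionaries_dir(query_output, key="QT_INSTALL_DATA"):
    """Find `key` in `qmake -query` output and append the
    qtwebengine_dictionaries subdirectory; return None if absent."""
    for line in query_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip().rstrip("/") + "/qtwebengine_dictionaries"
    return None


if __name__ == "__main__":
    sample = "QT_INSTALL_PREFIX:/usr\nQT_INSTALL_DATA:/usr/share/qt5\n"
    print(dictionaries_dir(sample))
```

On a system with qmake installed, the input could come from `subprocess.run(["qmake", "-query"], capture_output=True, text=True).stdout`.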

Originally, I had proposed installing the dictionary files directly into /usr/share/qt5/qtwebengine_dictionaries with a symlink from the upcoming /usr/share/qt6/qtwebengine_dictionaries. However, Don Armstrong proposed that they instead be installed in an unversioned directory and then symlinked from all the current versioned Qt directories, which makes it easier to maintain. His patch, linked above, places the .bdic files into /usr/share/hunspell with the original Hunspell files they were compiled from. Rene Engelhard <re...@debian.org> objects to this file location because he feels it should be reserved for files in the canonical Hunspell format. If a different directory is used for the Qt WebEngine .bdic files, I would propose something like /usr/share/qtwebengine-dict.

There is some discussion about this topic that already exists in the bug report for scowl and on the Debian-KDE mailing list.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1017646
https://lists.debian.org/debian-kde/2022/09/msg00011.html

I don't have a particularly strong opinion about either of these two issues, although I do lean slightly towards having separate binary packages and using /usr/share/qtwebengine-dict for the file locations. However, I do think it is important that there is a consensus among all those who maintain the dictionary language packages and that this consensus be documented in a central location.

Rene Engelhard

Sep 21, 2022, 12:50:03 AM
Hi,

[ your HTML mails make quoting hard... ]

Thanks for filing the report.

On Tue, Sep 20, 2022 at 01:31:14PM -0700, Soren Stoutner wrote:
> Another option would be to create a separate binary package (for example, qtwebengine-dict-en-us).

Name makes sense to me, yes.

> The argument for including it in the existing binary package is that the compiled Qt WebEngine dictionary is not very large (691.2 KiB for en_US).

I don't think that is a reason to keep it in hunspell-* per se, so..

> The argument for splitting it into a separate binary package is that most people who install the Hunspell dictionaries don't intend to use a program that does spell checking inside of a Qt WebEngine, so it would be wasted space on their system.

I agree with this one.

> Originally, I had proposed installing the dictionary files directly into /usr/share/qt5/qtwebengine_dictionaries with a symlink from the upcoming /usr/share/qt6/qtwebengine_dictionaries. However Don Armstrong proposed that they instead be installed in an unversioned directory and then symlinked from all the current versioned Qt directories, which makes it easier to maintain.

Yup. Or patch QtWebEngine to (also) look there directly, if the files are supposed to
be compatible between Qt5 and Qt6 (which a symlink assumes),
and install them there directly (as you propose later, in /usr/share/qtwebengine-dict)?

CCing the QtWebEngine Maintainers.

> His patch, linked above, places the .bdic files into /usr/share/hunspell with the original Hunspell files they were compiled from.
> Rene Engelhard <re...@debian.org> objects to this file location because he feels it should be preserved for files in the canonical Hunspell format.

Indeed.

> If a different directory is used for the Qt WebEngine .bdic files, I would propose something like /usr/share/qtwebengine-dict.

Sounds good.

> I don't have a particularly strong opinion about either of these two issues, although I do lean slightly towards having separate binary packages and using /usr/share/qtwebengine-dict for the file locations.

Good.

> However, I do think it is important that there is a consensus among all those who maintain the dictionary language packages and that this consensus be documented in a central location.

Indeed.

Regards,

Rene

Soren Stoutner

Sep 21, 2022, 9:30:04 PM
I filed bugs against each of the Hunspell source packages that produce
dictionaries making each of the maintainers aware of the discussion happening
here and inviting them to participate in the discussion.

Once a packaging consensus is reached, these bug reports will also provide a
useful mechanism for tracking when the packaging of the .bdic dictionaries is
completed for each package.

--
Soren Stoutner
so...@stoutner.com

Mattia Rizzolo

Sep 22, 2022, 3:10:04 AM
On Wed, Sep 21, 2022 at 06:39:16AM +0200, Rene Engelhard wrote:
> On Tue, Sep 20, 2022 at 01:31:14PM -0700, Soren Stoutner wrote:
> > Another option would be to create a separate binary package (for example, qtwebengine-dict-en-us).
>
> Name makes sense to me, yes.
>
> > The argument for including it in the existing binary package is that the compiled Qt WebEngine dictionary is not very large (691.2 KiB for en_US).
>
> I don't think that is a reason to keep it in hunspell-* per se, so..
>
> > The argument for splitting it into a separate binary package is that most people who install the Hunspell dictionaries don't intend to use a program that does spell checking inside of a Qt WebEngine, so it would be wasted space on their system.
>
> I agree with this one.

TBH, I'd argue for keeping them in the same hunspell-$lang binary
packages, without creating another 80ish binary packages for what IMHO
is very little gain. My understanding is that these are tightly coupled
objects, with everything relatively small… I'd just stash everything
together.

> > Originally, I had proposed installing the dictionary files directly into /usr/share/qt5/qtwebengine_dictionaries with a symlink from the upcoming /usr/share/qt6/qtwebengine_dictionaries. However Don Armstrong proposed that they instead be installed in an unversioned directory and then symlinked from all the current versioned Qt directories, which makes it easier to maintain.
>
> Yup. Or patch QtWebEngine to (also) directly look there if they are supposed to
> be compatible between Qt5/Qt6 (which a symlink assumes)
> and directly install it there (as you propose later to /usr/share/qtwebengine-dict)?
>
> CCing the QtWebEngine Maintainers.
>
> > His patch, linked above, places the .bdic files into /usr/share/hunspell with the original Hunspell files they were compiled from.
> > Rene Engelhard <re...@debian.org> objects to this file location because he feels it should be preserved for files in the canonical Hunspell format.
>
> Indeed.
>
> > If a different directory is used for the Qt WebEngine .bdic files, I would propose something like /usr/share/qtwebengine-dict.
>
> Sounds good.
>
> > I don't have a particularly strong opinion about either of these two issues, although I do lean slightly towards having separate binary packages and using /usr/share/qtwebengine-dict for the file locations.
>
> Good.

I also don't think installing the files in /usr/share/hunspell makes
sense, but /usr/share/qtwebengine-dict(s)/ is good to me.
However, I can't help but notice that these bdic files were developed by
the chromium people, and it seems like chrome can make use of them as
well.
As such, if these bdic files have a better name for themselves, I'd also
propose a directory that doesn't name qtwebengine, which potentially is
only one user of those files.

symlinks don't sound fun with potentially that many files, so consider
getting qtwebengine to look in that directory itself.

--
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18 4D18 4B04 3FCD B944 4540
More about me: https://mapreri.org
Launchpad user: https://launchpad.net/~mapreri
Debian QA page: https://qa.debian.org/developer.php?login=mattia

Soren Stoutner

Sep 22, 2022, 3:40:04 AM
Qt WebEngine is indeed built from a modified version of the Chromium source
code and Chromium does appear to use the same .bdic file format. Chromium has
an internal menu that allows for the downloading of additional dictionaries,
which investigation shows are stored under ~/.config/chromium/Dictionaries. It
is unclear to me if Chromium also looks in some system-wide directory for
these dictionaries, but if it doesn’t it may be possible to fix that with a
patch.

This being the case, I think it would be better to use a file location more
like /usr/share/chromium-dict, and, if packaged separately, a package name
like chromium-dict-en-us.

Also, if these dictionaries can be used by Chromium that may be an argument to
just include them in the current Hunspell binary packages as a greater
percentage of existing Hunspell users would find them of value.

I am CCing the Chromium developers to include them in the conversation.

--
Soren Stoutner
so...@stoutner.com

Rene Engelhard

Sep 22, 2022, 4:20:04 AM


Hi,

On 22 September 2022 09:34:24 CEST, Soren Stoutner <so...@stoutner.com> wrote:
>Qt WebEngine is indeed built from a modified version of the Chromium source
>code and Chromium does appear to use the same .bdic file format.

Cool.

> It
>is unclear to me if Chromium also looks in some system-wide directory for
>these dictionaries, but if it doesn’t it may be possible to fix that with a
>patch.

Yup, would be worthwhile.

>This being the case, I think it would be better to use a file location more
>like /usr/share/chromium-dict, and, if packaged separately, a package name
>like chromium-dict-en-us.

And what about other software that uses qtwebengine?

>Also, if these dictionaries can be used by Chromium that may be an argument to
>just include them in the current Hunspell binary packages as a greater
>percentage of existing Hunspell users would find them of value.

I still don't like it but if the majority thinks otherwise - it's democracy.

>I am CCing the Chromium developers to include them in the conversation.

Thanks :)

Regards,

René
--
This message was sent from my Android device with K-9 Mail.

Soren Stoutner

Sep 22, 2022, 5:00:04 AM
On Thursday, September 22, 2022 1:09:04 AM MST Rene Engelhard wrote:
> >This being the case, I think it would be better to use a file location more
> >like /usr/share/chromium-dict, and, if packaged separately, a package name
> >like chromium-dict-en-us.
>
> And what about other qtwebengine using stuff?

I just did some testing, and manually adding a .bdic into the
~/.config/chromium/Dictionaries directory does not make it automatically
appear in the list of available languages in Chromium’s settings. This means
that Chromium doesn’t scan the directory to see what is available; instead,
some config file somewhere must be populated with the dictionaries that
Chromium expects to use.

I noticed that Chromium already has a directory at /usr/share/chromium. If we
follow the same structure as the user directories, probably the best place to
put the dictionaries would be /usr/share/chromium/Dictionaries. Assuming we
can find a way to update whatever config file is used to inform Chromium about
the installed dictionaries, it should be possible to use that location as a
system-wide setting.

Regarding Qt WebEngine, instead of patching it to look in a new location
(which might involve maintenance effort if the internal structure
changes in the future), what if we instead create symlinks from
/usr/share/qt*/qtwebengine_dictionaries to /usr/share/chromium/Dictionaries?
For example, libqt5webenginecore5 and libqt6webenginecore6 might be good
packages to handle these symlinks.

I just did some testing and, if the symlink points to a non-existent
directory, programs built on QtWebEngine that are looking for a dictionary file
don’t mind. They just go about their business without spellchecking enabled.
Meaning that, if no Hunspell dictionary package is installed and the directory
doesn’t exist, creating the symlink won’t cause any problems.
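
To make the symlink idea concrete, here is a minimal sketch that creates the links under a staging directory, whether or not the shared directory exists yet. The function name `link_qt_dirs`, the default version list, and the use of relative link targets are assumptions for illustration, not something settled in this discussion.

```python
import os
import tempfile
from pathlib import Path


def link_qt_dirs(root, shared="usr/share/chromium/Dictionaries",
                 qt_versions=("qt5", "qt6")):
    """Create <root>/usr/share/<qt>/qtwebengine_dictionaries symlinks.

    Each link points at the shared directory through a relative path,
    so it resolves the same way inside a staging root and once installed.
    """
    root = Path(root)
    links = []
    for version in qt_versions:
        link = root / "usr/share" / version / "qtwebengine_dictionaries"
        link.parent.mkdir(parents=True, exist_ok=True)
        target = os.path.relpath(root / shared, link.parent)
        if not link.is_symlink():
            os.symlink(target, link)
        links.append(link)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as staging:
        for link in link_qt_dirs(staging):
            # exists() is False while the shared directory is missing.
            print(link, "->", os.readlink(link), "resolves:", link.exists())
```

A dangling link here corresponds to the case described above, where no dictionary package has created the shared directory yet.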

--
Soren Stoutner
so...@stoutner.com

Andres Salomon

Sep 22, 2022, 12:10:03 PM
FYI, chromium's documentation about this stuff is here:

https://chromium.googlesource.com/chromium/deps/hunspell_dictionaries/+/refs/heads/main/README.chromium

It appears that the dictionary files are versioned, and those versions
are matched in chromium's source code:

https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/spellcheck/common/spellcheck_common.cc

My understanding is that .config/chromium/Dictionaries isn't the source
of dictionaries; when you add a language to chromium, a .bdic file for
that (already installed) language doesn't show up there unless you
first add the language in the "Preferred languages" chromium config
screen, and then go into the spelling config screen and manually select
that language under "Use spell check for".

Given the versioning, I'm not sure how you'd go about using shared
bdic files between chromium and qt. I'm open to ideas, though. Btw,
you can see the versioning happening in my own Dictionaries directory:

-rw-r--r-- 1 dilinger dilinger 442K Apr 25 14:45 en-US-10-1.bdic
-rw-r--r-- 1 dilinger dilinger 437K Oct 15 2018 en-US-8-0.bdic
-rw-r--r-- 1 dilinger dilinger 442K Apr 13 2020 en-US-9-0.bdic
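
The naming pattern visible in that listing can be picked apart mechanically. The sketch below assumes the file names follow `<language>-<major>-<minor>.bdic`, which is inferred only from the three names above; the helper names are hypothetical.

```python
import re

_BDIC_NAME = re.compile(r"^(?P<lang>.+)-(?P<major>\d+)-(?P<minor>\d+)\.bdic$")


def parse_bdic_name(filename):
    """Split 'en-US-10-1.bdic' into ('en-US', (10, 1)); None if no match."""
    match = _BDIC_NAME.match(filename)
    if match is None:
        return None
    return match["lang"], (int(match["major"]), int(match["minor"]))


def newest_per_language(filenames):
    """Map each language to the file name carrying its highest version."""
    best = {}
    for name in filenames:
        parsed = parse_bdic_name(name)
        if parsed is None:
            continue
        lang, version = parsed
        # Versions compare as integer tuples, so (10, 1) beats (9, 0).
        if lang not in best or version > best[lang][0]:
            best[lang] = (version, name)
    return {lang: name for lang, (version, name) in best.items()}


if __name__ == "__main__":
    names = ["en-US-10-1.bdic", "en-US-8-0.bdic", "en-US-9-0.bdic"]
    print(newest_per_language(names))
```

Comparing versions as integer tuples avoids the string-sorting trap in which "9-0" would sort after "10-1".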


Agustin Martin

Sep 22, 2022, 12:30:04 PM
On Tue, 20 Sept 2022 at 22:33, Soren Stoutner
(<so...@stoutner.com>) wrote:
>
> Package: dictionaries-common
> Version: 1.28.18
> Severity: wishlist
> Tags: l10n
>
> Qt WebEngine has the ability to use Hunspell dictionaries for spell checking with the WebEngine, but for some reason they require that the dictionary files be converted to a special binary format (.bdic). This conversion can be done using qwebengine_convert_dict from the qtwebengine5-dev-tools package. The upstream documentation regarding this is found on Qt's website:
>
> https://doc.qt.io/qt-5/qtwebengine-features.html#spellchecker
>
> Once these libraries are available they can be used by any program that includes Qt WebEngine.
>
> The purpose of this bug report is to create a central location for discussion about the best way to package these dictionaries.

Hi, Soren.

Sorry for the delay. For some reason your messages went to the spambox
and I am becoming aware of them just now (as well as about all this
issue). Here goes a quick reply.

First of all, I am curious about the reasons behind this new format,
the problems it deals with and its advantages. I assume they are valid
enough, but they imply yet another spellchecking engine/format. We
currently have good old ispell, aspell and hunspell. vim has its own
spellchecker engine using its own format, with dicts that can be
created from old myspell2 dicts. We did not add vim format dicts (from
aspell dicts sources) since there seems to be some work to make vim
use hunspell directly. And now these bdic dicts.

Some other questions here,

From your info and proposed locations it seems that these dicts are
arch:all, is that true?

Another question is what happens with affix files, which I see are
used at build time: are they used (from their path) at runtime, or is
all the info (dic+aff) bundled into the bdic file? If explicit affix
files are still required at runtime, both bdic and aff files should
probably be in the same dir. Otherwise I am more for a separate
location. In this case, since bdic dicts seem to be more generic than
just a qtwebengine issue and they are indeed created from hunspell
files, I would go for a rather generic name (maybe something like
/usr/share/hunspell-bdic or something without the hunspell name?)

Regarding the binary package that should contain them, I tested with
en_US files and the bdic file is smaller than the .dic file, but not by
much, so size does not seem to be the main reason to go one way or another.
I do not know about other languages. While it is easier to handle
dependencies with separate packages, I admit I do not have a strong
opinion here.

Regards,

--
Agustin

Soren Stoutner

Sep 22, 2022, 2:20:04 PM
On Thursday, September 22, 2022 9:04:53 AM MST Andres Salomon wrote:
> FYI, chromium's documentation about this stuff is here:
>
> https://chromium.googlesource.com/chromium/deps/hunspell_dictionaries/+/refs/heads/main/README.chromium

Thanks for that link.

> It appears that the dictionary files are versioned, and those versions
> are matched in chromium's source code:
>
> https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/spellcheck/common/spellcheck_common.cc

My understanding from reading this documentation is that the versioning of the
dictionaries is to facilitate automatic updating of the dictionary files. If
we could coax Chromium to use a system-wide directory, that would no longer be
an issue because updates would be handled by apt-get.

> My understanding is that .config/chromium/Dictionaries isn't the source
> of dictionaries; when you add a language to chromium, a .bdic file for
> that (already installed) language doesn't show up there unless you
> first add the language in the "Preferred languages" chromium config
> screen, and then go into the spelling config screen and manually select
> that language under "Use spell check for".

My testing indicates that, when you add a new language under the languages
Chromium config screen, it then adds an entry to the spelling config screen. If
you turn that entry on (disabled by default), Chromium downloads the
appropriate .bdic to ~/.config/chromium/Dictionaries from some central
repository hard coded into Chromium's code. This would explain why simply
dropping a .bdic into that directory is not enough to enable it as an option
in Chromium.

Given how this is structured, I am not sure how easy it would be to convince
Chromium to use a system-wide directory for the .bdic storage.

--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Sep 22, 2022, 3:40:04 PM

On Thursday, September 22, 2022 9:20:46 AM MST Agustin Martin wrote:
> First of all, I am curious about the reasons behind this new format,
> the problems it deals with and its advantages. I assume they are valid
> enough, but they imply yet another spellchecking engine/format. We
> currently have good old ispell, aspell and hunspell. vim has its own
> spellchecker engine using its own format, with dicts that can be
> created from old myspell2 dicts. We did not add vim format dicts (from
> aspell dicts sources) since there seems to be some work to make vim
> use hunspell directly. And now these bdic dicts.

The .bdic format is specified by the upstream Chromium project, and is required by anything that is based off of Chromium's code, like Qt WebEngine.  I do not know why they went with a proprietary binary format, but I would assume that if they went to so much trouble to not use the standard Hunspell format there must have been something to make it worthwhile, like some performance improvement.  Perhaps I am giving Google too much credit for having logical reasons instead of making arbitrary decisions.

> From your info and proposed locations seems that these dicts are
> arch:all, is that true?

I have not seen anything to indicate they are not arch:all, although it probably depends on how the binary data is processed. There is a possibility there might be an endianness issue.

> Another question is what happens with affix files, which I see are
> used at build time, are they used (from their path) at runtime or is
> all the info (dic+aff) bundled into the bdic file? If explicit affix
> files are still required at runtime, both bdic and aff files should
> probably be in the same dir. Otherwise I am more for a separate
> location. In this case, since bdic dicts seem to be more generic than
> just a qtwebengine issue and they are indeed created from hunspell
> files I would go for a rather generic name (may be something like
> /usr/share/hunspell-bdic or something without the hunspell name?)

The .bdic binary file contains all the information from the .dic and .aff files, so neither of them is needed by Qt WebEngine. As such, I think a dedicated directory for the .bdic files is best.


My personal motivation for getting these dictionaries into Debian is that I am the developer of Privacy Browser, which is a web browser based on Qt WebEngine.  The PC version is currently in a pre-alpha state.


https://www.stoutner.com/privacy-browser-pc/


When adding spell checking functionality, I realized that these dictionaries were not already packaged.  The little bit of poking around that I did showed that Arch Linux packages them, but I do not know if other distributions do so.


https://archlinux.org/todo/packaging-qtwebengine-dictionaries/


There are a number of existing web browsers in Debian based on Qt WebEngine that could take advantage of the presence of these .bdic dictionaries.  A non-exhaustive list includes:  Konqueror, Falkon, qutebrowser, and angelfish.  If it ends up being feasible for Chromium to also use a system-wide .bdic location, then any Chromium fork would also benefit.


Once Privacy Browser reaches an alpha release, my intention is to maintain a Debian package for it.  I have the option of integrating the .bdics directly into the program's personal data folders, but that seems like a suboptimal approach, because anything else on the system that wanted to use them would have to have their own copy.  When the binary dictionaries are installed in the correct system-wide folder, any Qt WebEngine program can utilize them with a single line of code that specifies which dictionary to use (only one can be active at a time).  Of course, the program would also probably need to establish a GUI where the user can select which dictionary they would like to be active, which GUI involves more than a single line of code.


--
Soren Stoutner
so...@stoutner.com

Andres Salomon

Sep 27, 2022, 11:40:04 PM
The "team" would just be me (wanna join? :), and I had to do some
security uploads today and haven't had the chance to look further into
this. Unfortunately, there are a few other high-priority things I need to
deal with before I can take a look.

On Tue, Sep 27, 2022 at 20:27, Soren Stoutner <so...@stoutner.com>
wrote:
> Does anyone from the Chromium team have any insights into the
> feasibility of Chromium using a system-wide directory for .bdic files?
>
>
>
> --
>
> Soren Stoutner
>
> so...@stoutner.com
>

Soren Stoutner

Sep 27, 2022, 11:40:06 PM

Soren Stoutner

Sep 28, 2022, 12:30:04 AM
On Tuesday, September 27, 2022 8:29:30 PM MST Andres Salomon wrote:
> The "team" would just be me (wanna join? :),

Currently my interests lie elsewhere, but I may reconsider that in the future.

> and I had to do some
> security uploads today and haven't had the chance to look further into
> this. Unfortunately, there's a few other high-priority things I need to
> deal with before I can take a look.

That’s fine. This is a low priority, so it can wait until whenever you have
time to look at it.

--
Soren Stoutner
so...@stoutner.com

Agustin Martin

Oct 5, 2022, 8:20:04 AM
On Thu, 22 Sept 2022 at 21:30, Soren Stoutner
(<so...@stoutner.com>) wrote:
>
> On Thursday, September 22, 2022 9:20:46 AM MST Agustin Martin wrote:
>
> > First of all, I am curious about the reasons behind this new format,
> > the problems it deals with and its advantages. I assume they are valid
> > enough, but they imply yet another spellchecking engine/format. We
> > currently have good old ispell, aspell and hunspell. vim has its own
> > spellchecker engine using its own format, with dicts that can be
> > created from old myspell2 dicts. We did not add vim format dicts (from
> > aspell dicts sources) since there seems to be some work to make vim
> > use hunspell directly. And now these bdic dicts.
>
> The .bdic format is specified by the upstream Chromium project, and is required by anything that is based off of Chromium's code, like Qt WebEngine. I do not know why they went with a proprietary binary format, but I would assume that if they went to so much trouble to not use the standard Hunspell format there must have been something to make it worthwhile, like some performance improvement. Perhaps I am giving Google too much credit for having logical reasons instead of making arbitrary decisions.

Hi, Soren

It's a pity not to have more info about the reasons for this new
format. Even if using it is more efficient in terms of plain
performance, I do not think that is noticeable in stuff like chromium.

Another question is whether that bdic format is expected to change or
whether that is very unlikely.

Thinking about this, I have done some tests with these bdic files
being generated at postinst, like emacs byte-compiled files (although
my tests were cruder), delegating everything to the qtwebengine
packages. bdic generation is not very slow, but IMHO it is not fast
enough to go this way (which would require moving
qwebengine_convert_dict to the Qt WebEngine runtime package and controlling
everything from it).

One noticeable thing is that bdic generation failed for some hunspell
dicts I have installed

++ processing an_ES.aff
[1003/125813.760330:FATAL:aff_reader.cc(305)] Did not find a space in 'y i'.
Trace/breakpoint trap
++ processing ar.aff
[1003/125813.796753:FATAL:aff_reader.cc(123)] We don't support the
IGNORE command yet. This would change how we would insert things in
our lookup table.
++ processing gl_ES.aff
gl_ES.dic_delta not found.
Reading gl_ES.aff
Reading gl_ES.dic
Serializing...
Verifying...
Word does not match!
Index: 2126
Expected: Abū po:antropónimo
is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī
Actual: Abū po:antropónimo
is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battā
ERROR converting, the dictionary does not check out OK.

Regards,

--
Agustin

Soren Stoutner

Oct 5, 2022, 12:50:04 PM
On Wednesday, October 5, 2022 5:07:50 AM MST Agustin Martin wrote:
> One noticeable thing is that bdic generation failed for some hunspell
> dicts I have installed

That’s concerning.

> ++ processing an_ES.aff
> [1003/125813.760330:FATAL:aff_reader.cc(305)] Did not find a space in 'y i'.
> Trace/breakpoint trap

This is caused by line 90 of an_ES.aff:

REP y<tab character>i

All the previous instances of REP in this file have a space between the two
arguments. This is the first one to use a tab. Following line 90 both tabs
and spaces are used.

I don’t know enough about the Hunspell file format to know what is expected.
Is this an example of an incorrectly formatted .aff file or is this an example
of qwebengine_convert_dict not knowing how to read appropriate Hunspell
formatting?

> ++ processing ar.aff
> [1003/125813.796753:FATAL:aff_reader.cc(123)] We don't support the
> IGNORE command yet. This would change how we would insert things in
> our lookup table.

Based on this error message, it seems fairly obvious that
qwebengine_convert_dict does not fully support the Hunspell format. The line
in question is 24142 from ar.aff which reads as follows:

IGNORE

I will file an upstream bug to see if that can be corrected in some way, but I
think I will wait until I have the answers to these other questions to decide
if I should file one bug or three.
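
In the meantime, a pre-flight scan could flag .aff files likely to hit the two failures above before conversion is attempted. This is a heuristic sketch only: the function name `preflight_aff` is hypothetical, and it checks just the two patterns seen in these error messages (an IGNORE directive, and a REP line whose arguments are separated by a tab with no space). It takes no position on whether the tab is valid Hunspell syntax.

```python
def preflight_aff(aff_text):
    """Return (line_number, reason) pairs for constructs matching the
    two conversion errors discussed in this thread."""
    problems = []
    for number, line in enumerate(aff_text.split("\n"), start=1):
        fields = line.split(None, 1)
        if not fields:
            continue
        directive = fields[0]
        rest = fields[1].strip() if len(fields) > 1 else ""
        if directive == "IGNORE":
            problems.append((number, "IGNORE directive"))
        elif directive == "REP" and "\t" in rest and " " not in rest:
            problems.append((number, "REP arguments separated by a tab"))
    return problems


if __name__ == "__main__":
    sample = "SET UTF-8\nREP 2\nREP a b\nREP y\ti\nIGNORE xyz\n"
    for number, reason in preflight_aff(sample):
        print(f"line {number}: {reason}")
```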

> ++ processing gl_ES.aff
> gl_ES.dic_delta not found.
> Reading gl_ES.aff
> Reading gl_ES.dic
> Serializing...
> Verifying...
> Word does not match!
> Index: 2126
> Expected: Abū po:antropónimo
> is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī
> Actual: Abū po:antropónimo
> is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battā
> ERROR converting, the dictionary does not check out OK.

I am not exactly sure what is causing this error, but I would assume that it
is some mismatch between the .aff and the .dic files. The line it appears to be
complaining about is 2095 from gl_ES.dic, which reads as follows:

Abū po:antropónimo
is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī

However, for some reason it is expecting the line to be shorter.

--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Oct 12, 2022, 5:41:32 PM
On Wednesday, October 5, 2022 9:38:09 AM MST Soren Stoutner wrote:
> > ++ processing gl_ES.aff
> > gl_ES.dic_delta not found.
> > Reading gl_ES.aff
> > Reading gl_ES.dic
> > Serializing...
> > Verifying...
> > Word does not match!
> > Index: 2126
> > Expected: Abū po:antropónimo
> > is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī
> > Actual: Abū po:antropónimo
> > is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battā
> > ERROR converting, the dictionary does not check out OK.
> I am not exactly sure what is causing this error, but I would assume that it
> is some mismatch between the .aff and the .dic files. The line it appears
> to be complaining about is 2095 from gl_ES.dic, which reads as follows:
>
> Abū po:antropónimo
> is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī
>
> However, for some reason it is expecting the line to be shorter.

After thinking about this error a little more, it occurred to me that the most
likely explanation is that the .bdic binary format has a maximum field length
that is shorter than the field in the original .dic file. So, when the field
is read into the .bdic it is truncated, and then, at the later verification
step, the stored line no longer matches the original.
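
The truncation hypothesis can be illustrated with a small sketch. The
128-byte limit and the serialize and verify helpers are hypothetical and
chosen only to show the failure mode. Whether the .bdic format really has
such a limit, and what its value is, has not been confirmed here.

```python
# Hypothetical illustration: a fixed-size field truncates a long entry,
# so a later comparison against the original entry fails.
MAX_FIELD_BYTES = 128  # assumed limit, NOT taken from the .bdic format


def serialize(entry: str, limit: int = MAX_FIELD_BYTES) -> bytes:
    """Encode an entry as UTF-8 and cut it off at the assumed field limit."""
    return entry.encode("utf-8")[:limit]


def verify(entry: str, stored: bytes) -> bool:
    """Return True only if the stored bytes decode back to the original entry."""
    return stored.decode("utf-8", errors="ignore") == entry


short_entry = "Abū po:antropónimo"
long_entry = "Abū po:antropónimo is:ngrama_" + "x" * 200  # made-up long field

print("short entry verifies:", verify(short_entry, serialize(short_entry)))
print("long entry verifies:", verify(long_entry, serialize(long_entry)))
```

A short entry survives the round trip, while an entry longer than the assumed
limit comes back shorter than expected, which would match the "Word does not
match!" symptom.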

--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Oct 12, 2022, 6:51:22 PM

Roland Rosenfeld

Oct 14, 2022, 7:00:04 AM
Hi,

let me try to summarize where we stand and what options and open
questions we have.

I see the following options for packaging the bdic files (it seems not all
of them have been mentioned before):

a) Bundle the bdic files in the existing hunspell-<lang> packages.
   - Pro: No new packages needed
   - Con: Roughly doubles the size of ~80 existing packages
b) Create new packages hunspell-bdic-<lang>.
   - Pro: User can install what is needed
   - Con: ~80 new packages necessary
c) Add a mechanism to dictionaries-common, which extends
   update-dictcommon-hunspell to build the bdic file in
   hunspell-<lang>.postinst.
   - Pro: No changes in hunspell-<lang>
   - Pro: No redundancy in the archive
   - Con: Wastes space on the user's disk unless QtWebEngine is used
   - Con: May slow down hunspell-<lang> installation
   - Con: Pulls in qtwebengine5-dev-tools for all hunspell-<lang> packages
d) Add a new package (hunspell-bdic-generator or the like) that
   builds bdic files for all hunspell-<lang> packages if it is
   installed. This requires some dpkg/apt hook to trigger building
   the bdic if a new hunspell-<lang> package is installed or upgraded.
   All packages using bdic files have to depend on
   hunspell-bdic-generator.
   - Pro: No changes in hunspell-<lang>
   - Pro: No redundancy in the archive
   - Pro: Only used/installed when needed
   - Con: Complex hook mechanism

I'm not sure which option I prefer myself, as they all have
disadvantages. I personally prefer b) over a), while c) and d)
could reduce the effort for hunspell-<lang> maintainers (in exchange
for more effort in dictionaries-common).


Apart from this, I see the following open points:
- Is bdic really arch:all or do we have some endianness issue? For options
  c) and d) this is irrelevant.
- Where should the bdic files be placed?
  1) /usr/share/hunspell-bdic
  2) /usr/share/qtwebengine-dict
  3) /usr/share/bdic
  4) /usr/share/hunspell
  5) something else
  (The order reflects my personal preference)
- Is there some command-line client for auto-testing the bdic files?
- How to reuse the bdic files with chromium?
- 3 bugs in qwebengine_convert_dict reported by Soren Stoutner
- Do we target this for bookworm or trixie?

Greetings
Roland

Soren Stoutner

Oct 14, 2022, 1:30:03 PM
On Friday, October 14, 2022 3:54:53 AM MST Roland Rosenfeld wrote:
> - Where should the bdic files be placed?
> 1) /usr/share/hunspell-bdic

I like this option because it would eliminate the need to wait to find out if
Chromium can use the files before deciding where to put them.

On a separate note, I am in the process of filing upstream bug reports with
Google as recommended by Qt. Once I get those filed I will place links to them
in the Qt bug reports. I don’t think we need to wait for these bugs to be
fixed before adding packages to Debian as they only affect three languages, but
if the .bdic files are going to be generated automatically we need some
mechanism to specify that they shouldn’t attempt to be generated for these
three languages until the bugs are fixed.
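
One possible shape for such a skip mechanism is sketched below. The
build_bdics function and its parameters are hypothetical, and the sketch
assumes the converter exits with a non-zero status when it fails. The
converter is passed in as a command prefix so that any of the
qwebengine_convert_dict locations can be used.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, Sequence


def build_bdics(dic_files: Iterable[Path], out_dir: Path,
                converter: Sequence[str],
                skip: frozenset = frozenset()) -> list:
    """Convert each .dic to a .bdic, skipping listed languages and failures."""
    out_dir.mkdir(parents=True, exist_ok=True)
    built = []
    for dic in dic_files:
        if dic.stem in skip:
            continue  # known-bad input: do not even try
        target = out_dir / (dic.stem + ".bdic")
        result = subprocess.run([*converter, str(dic), str(target)])
        if result.returncode == 0 and target.exists():
            built.append(target)
        else:
            # Assumed behaviour: a failed run must not leave a partial file.
            target.unlink(missing_ok=True)
    return built


# Example call (paths and skip names are illustrative only):
# build_bdics(Path("dicts").glob("*.dic"), Path("out"),
#             ["/usr/lib/qt5/bin/qwebengine_convert_dict"],
#             skip=frozenset({"ar", "gl_ES"}))
```

Keeping the skip list as plain data would make it easy to drop a language
from it once the corresponding upstream bug is fixed.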

Alternatively, we could patch the files so that the errors are avoided. In the
case of Aragonese, which has tabs in the .aff, we could change them all to
spaces. I can find no Hunspell file spec that says either tabs or spaces are
required, and I am assuming that Hunspell itself doesn’t have problems with
tabs because there are no nasty bugs filed against the current Aragonese
Debian package, but it should also work fine with spaces as that is what all
the other Hunspell packages appear to use.

The errors in the other two files could be fixed by removing one line from ar.aff
and 31 lines from gl_ES.dic. That makes them slightly less correct, but most
people using the .bdic would probably not notice. If we go this route we need
to make sure we are only editing temporary files used to generate the .bdic, as
those using the standard Hunspell system should not end up with an inferior
experience just to accommodate a custom binary format.
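
As a rough sketch of the "only edit temporary files" idea, the helper below
copies an .aff (and its sibling .dic) into a scratch directory and replaces
tabs with spaces in the copy only. The make_patched_copy name and the file
names in the demo are illustrative assumptions. The files are handled as
bytes because an .aff file is not guaranteed to be UTF-8.

```python
import shutil
import tempfile
from pathlib import Path


def make_patched_copy(aff_path: Path, workdir: Path) -> Path:
    """Copy the .aff and its .dic into workdir, replacing tabs with spaces
    in the copied .aff. The original files are never modified."""
    workdir.mkdir(parents=True, exist_ok=True)
    patched_aff = workdir / aff_path.name
    patched_aff.write_bytes(aff_path.read_bytes().replace(b"\t", b" "))
    dic_path = aff_path.with_suffix(".dic")
    if dic_path.exists():
        shutil.copy2(dic_path, workdir / dic_path.name)
    return patched_aff


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        # Made-up sample content, only to exercise the helper.
        (src / "sample.aff").write_bytes(b"SFX\tA\tY\t1\n")
        (src / "sample.dic").write_bytes(b"1\nword/A\n")
        patched = make_patched_copy(src / "sample.aff", Path(tmp) / "build")
        print(patched.read_bytes())
```

The converter would then be pointed at the scratch copy, while the files
shipped for regular Hunspell users stay untouched.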

--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Oct 14, 2022, 2:01:26 PM

This is Google’s page describing the .bdic format:


https://sites.google.com/a/chromium.org/dev/developers/how-tos/editing-the-spell-checking-dictionaries


It doesn’t directly address the topic of endianness, but it does say the following:


"The .bdic files are always UTF-8 internally, and the convert_dict tool converts things appropriately when it runs."


I must admit that the topic of endianness goes a bit beyond my expertise, but my understanding is that it is primarily an issue for executable files.  As the .bdic is only a data file, and as the data encoded inside it is in UTF-8 as described above, would that mean that it is safe to assume that these are arch:all?


--

Soren Stoutner

so...@stoutner.com


Roland Rosenfeld

Oct 14, 2022, 2:40:04 PM
> It doesn’t directly address the topic of endianness, but it does say
> the following:
>
> "The .bdic files are always UTF-8 internally, and the convert_dict
> tool converts things appropriately when it runs."
>
> I must admit that the topic of endianness goes a bit beyond my
> expertise, but my understanding is that it is primarily an issue for
> executable files. As the .bdic is only a data file, and as the data
> encoded inside it is in UTF-8 as described above, would that mean
> that it is safe to assume that these are arch:all?

I'm also not very familiar with this, so I tried to check. I
noted that s390x is big endian (in contrast to nearly all other
supported architectures), so I logged into an s390x porter box to
build a bdic file and compare it with one built on other
architectures.

When I tried to install qtwebengine5-dev-tools, I noticed that
this package is not available for s390x. Checking
https://tracker.debian.org/pkg/qtwebengine-opensource-src I saw
that the package is limited to amd64, arm64, armhf, i386, mips64el and
mipsel, which means that it is not available on armel, ppc64el and
s390x (or on any of the non-release architectures).

That's a big drawback, since we want to support the hunspell
dictionaries on all architectures.

I checked whether the convert_dict tool (mentioned in the chromium
page "Editing the spell checking dictionaries") may be available in
Debian for other architectures, but I only found
qt6-webengine-dev-tools, which supports an even shorter architecture
list (only amd64, arm64, armhf and i386)...

Greetings
Roland

Andres Salomon

Oct 14, 2022, 2:40:04 PM
FYI:

Chromium includes an embedded copy of the hunspell library, which
they've forked to ignore dic/aff files and instead use bdic files. The
patch and google additions can be found here:

https://sources.debian.org/src/chromium/106.0.5249.119-1/third_party/hunspell/google.patch/

https://sources.debian.org/src/chromium/106.0.5249.119-1/third_party/hunspell/google/

You'll note that bdict.h in the google directory mentions right up
front that offsets in bdict format are little endian, and also
describes endianness of integer fields themselves. That means putting
bdic files into arch:all should be fine, as the tools should convert
them automatically on big-endian platforms (and if they don't, it's a
bug that chromium devs would accept a fix for).
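
To illustrate why a fixed on-disk byte order makes a data file
architecture-independent, here is a generic sketch using Python's struct
module. It is not code from bdict.h, and the value packed is arbitrary.

```python
import struct
import sys

value = 0x12345678  # arbitrary example value

# Explicit little-endian encoding: the same bytes on every host.
little = struct.pack("<I", value)
# Host byte order: differs between little- and big-endian machines.
native = struct.pack("=I", value)

print("host byte order:", sys.byteorder)
print("explicit little-endian bytes:", little.hex())
print("native-order bytes:", native.hex())

# A reader that always decodes with "<" recovers the value on any host.
decoded = struct.unpack("<I", little)[0]
print("decoded matches:", decoded == value)
```

As long as both the writer and the reader use the explicit form, a file
produced on a little-endian build machine can be read on a big-endian one.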

Andres Salomon

Oct 14, 2022, 3:10:04 PM
I don't have a strong opinion about a) vs b).


> Except this I see the following open points:
> - Is bdic really arch:all or do we have some endianness issue? For option
>   c) and d) this is irrelevant.

There shouldn't be endian issues, as I mentioned elsewhere.


> - Where should the bdic files be placed?
> 1) /usr/share/hunspell-bdic
> 2) /usr/share/qtwebengine-dict
> 3) /usr/share/bdic
> 4) /usr/share/hunspell


In my opinion, chromium's (or Qt's, or whoever's) bdic support should
be merged upstream into hunspell, and hunspell should be shipping bdic
files in /usr/share/hunspell alongside the .aff and .dic files. I don't
know how active hunspell upstream is, though, and how amenable they'd
be to patches. I see at least one person created a
hunspell-with-bdic-support fork a decade ago:
https://github.com/sheremetyev/hunspell

That would allow chromium and other hunspell users to link against a
system hunspell when desired, dropping all the bdict versioning stuff
and the custom paths. I'm pretty sure I could get a patch to link
against system hunspell into chromium upstream, provided bdic support
made it into upstream hunspell.

I wouldn't want to see debian carrying bdic patches in its hunspell
package, though; nor would I want to see the security team needing to
deal with a hunspell fork package.


> 5) something else
> (The order mentions my personal preference)
> - Is there some commandline client for auto-testing the bdic files?
> - How to reuse the bdic files with chromium?
> - 3 bugs in qwebengine_convert_dict reported by Soren Stoutner


I just took a peek at qtwebengine-opensource-src-5.15.10+dfsg, and I
see that they're using the exact same hunspell fork from chromium. :(

Mattia Rizzolo

Oct 14, 2022, 4:01:20 PM
On Fri, Oct 14, 2022 at 02:58:17PM -0400, Andres Salomon wrote:
> In my opinion, chromium's (or Qt's, or whoever's) bdic support should be
> merged upstream into hunspell, and hunspell should be shipping bdic files in
> /usr/share/hunspell alongside the .aff and .dic files. I don't know how
> active hunspell upstream is, though, and how amenable they'd be to patches.
> I see at least one person created a hunspell-with-bdic-support fork a
> decade ago: https://github.com/sheremetyev/hunspell
>
> That would allow chromium and other hunspell users to link against a system
> hunspell when desired, dropping all the bdict versioning stuff and the
> custom paths. I'm pretty sure I could get a patch to link against system
> hunspell into chromium upstream, provided bdic support made it into upstream
> hunspell.
>
> I wouldn't want to see debian carrying bdic patches in its hunspell package,
> though; nor would I want to see the security team needing to deal with a
> hunspell fork package.

I think you are the first to mention "integrating" bdic into hunspell
itself, and to mention that effectively the parser is based on hunspell
itself…

From what I know, the hunspell project is basically in maintenance mode,
with nobody actively doing anything to it. Pretty much all changes in
the past years were done by drive-by contributors.

This is to say, I can't deny that somebody proposing MRs upstream might
actually see them merged.

> > 5) something else
> > (The order mentions my personal preference)
> > - Is there some commandline client for auto-testing the bdic files?
> > - How to reuse the bdic files with chromium?
> > - 3 bugs in qwebengine_convert_dict reported by Soren Stoutner
>
>
> I just took a peek at qtwebengine-opensource-src-5.15.10+dfsg, and I see
> that they're using the exact same hunspell fork from chromium. :(

Not surprising since afaik qtwebengine is basically a fork of chromium
itself...

Soren Stoutner

Oct 14, 2022, 5:51:15 PM
On Friday, October 14, 2022 11:58:17 AM MST Andres Salomon wrote:
> That would allow chromium and other hunspell users to link against a
> system hunspell when desired, dropping all the bdict versioning stuff
> and the custom paths. I'm pretty sure I could get a patch to link
> against system hunspell into chromium upstream, provided bdic support
> made it into upstream hunspell.

I think that would be the way to go if both the upstreams would support it.

--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Oct 25, 2022, 2:51:21 PM
While we wait for answers as to whether these dictionaries can be used by the
Chromium package and how they might possibly be integrated with upstream
Hunspell, I would recommend that we move forward with packaging them in
/usr/share/hunspell-bdic. This location provides flexibility for whatever ends
up happening with upstream Hunspell and Chromium.

The question at this point is if they should be generated at package creation
or if they should be generated during install. It appears that the majority
leans towards generating them at package creation. Is there anyone who feels
strongly the other way?

--
Soren Stoutner
so...@stoutner.com

Agustin Martin

Oct 28, 2022, 7:20:03 AM
Hi all,

I am not particularly happy about this (see details below), but it seems
we will have to package all these .bdic files because qtwebengine and
chromium use them. Since some .bdic files may fail to build, I would
prefer them to be generated during package creation, where it is
easier to skip creating them if required. If it is done during package
install, I think everything should be handled from the qtwebengine package.
In this case some fine tuning can be done to improve efficiency (handling
symlinks better, regenerating only when a new version of a dict package is
installed or incompatibilities in the qtwebengine hunspell appear, ...)

Why am I not that happy about these .bdic files? Looking at
https://chromium.googlesource.com/chromium/deps/hunspell_dictionaries/+/refs/heads/main/README.chromium
and https://sites.google.com/a/chromium.org/dev/developers/how-tos/editing-the-spell-checking-dictionaries
the only reasons for this seem to be support for delta files, where
"The .dic_delta files are used to add words which are not there in the
.dic files" and having everything UTF-8. Correct me if I am wrong.

Packaging all possible hunspell dicts in .bdic format will in practice
not be useful to support delta files as originally intended, since
the original hunspell dict will be used. A Debian maintainer could use a
delta file for Debian changes in poorly maintained dicts, but I think
that in this case they had better patch the original .dic file to make
the fix available to all hunspell users.

Another thing I do not like is having three copies of hunspell flying
around: the original hunspell lib and those embedded in chromium and
qtwebengine. This makes it harder to keep everything synced.

I agree that the best approach would be to extend hunspell, but to support
.dic_delta files instead of changing it to use the bdic format. Part of
the code may even be reusable to support something like aspell .multi
files.

Regards,

--
Agustin


Soren Stoutner

Nov 3, 2022, 6:41:12 PM
On Friday, October 28, 2022 4:09:45 AM MST Agustin Martin wrote:
> I am not particularly happy about this (see details below), but seems
> we will have to package all these .bdic files because qtwebengine and
> chromium use them. Since some .bdic may fail to build I would rather
> prefer them to be generated during package creation, where it is
> easier not to create them if required. If done during package install
> I think everything should be handled from qtwebengine package. In this
> case some fine tuning can be done to improve efficiency (handling
> symlinks better, regenerate only when a new version of dict package is
> installed or incompatibilities in qtwebengine hunspell appear, ...)

I agree with you. I am also unhappy that Chromium and QtWebEngine want to use
a specialized file format instead of just using the standard Hunspell files.
However, as much as I don’t like it, I also agree with you that the best thing
Debian can do in the short term is to move forward with the packaging of these
.bdic files while we wait to see if we can make any changes upstream.

Given that nobody else responded to this question, I think there is a
consensus that it is best to create the .bdic files during package creation.

The next question that needs to be answered is if we should create new binary
packages for the .bdic files or if we should ship them as part of the existing
Hunspell language binary packages. The opinions that have been expressed so
far have run the gamut on both sides, but my sense is they lean a little
towards shipping them in the existing Hunspell packages so as to not add 80+
new packages to Debian that only contain a few files each.

Is there anyone who feels strongly that they should not be shipped in the
existing packages?

--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Nov 9, 2022, 1:20:04 PM
I would take the lack of response to indicate that nobody has any strong
objections to packaging the .bdic files inside the existing Hunspell binary
packages.

This means that there is a consensus on the following two items:

1. The .bdic files should be compiled at package creation time instead of at
package install time.
2. The .bdic files should be shipped in the existing Hunspell language binary
packages.

The next question that needs to be decided is where these files should be
placed. There have been a number of locations proposed. I think the majority
of the opinions are that they should not be placed in /usr/share/hunspell. Of
all the proposals that have been suggested, I personally like
/usr/share/hunspell-bdic because this location is usage agnostic. Chromium can
start using them in the future, Qt WebEngine can use them now, and other
programs we haven't considered can use them in the future, all without a need
to change the name of the file location and without the name feeling like it
is limited to certain programs.

This would also require that the Debian Qt/KDE Maintainers add symlinks from
/usr/share/qt5/qtwebengine_dictionaries and
/usr/share/qt6/qtwebengine_dictionaries to /usr/share/hunspell-bdic. They can
do this in whatever package makes the most sense to them, but possibly in
libqt5webengine-data and libqt6webengine6-data.
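
The resulting layout can be sketched as follows. The link_dictionaries
helper is hypothetical and works under an arbitrary root directory so it can
be tried without touching the real /usr/share. In an actual package the
symlinks would more likely be declared through the packaging tools than
created by a script.

```python
import os
import tempfile
from pathlib import Path


def link_dictionaries(root: Path) -> list:
    """Create the shared directory and point both Qt locations at it."""
    shared = root / "usr/share/hunspell-bdic"
    shared.mkdir(parents=True, exist_ok=True)
    links = []
    for qt in ("qt5", "qt6"):
        link = root / "usr/share" / qt / "qtwebengine_dictionaries"
        link.parent.mkdir(parents=True, exist_ok=True)
        if not link.is_symlink():
            # Relative target, so the link stays valid under any root.
            # This raises if a real directory already exists at that path.
            link.symlink_to(os.path.relpath(shared, link.parent))
        links.append(link)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for link in link_dictionaries(Path(tmp)):
            print(link.relative_to(tmp), "->", os.readlink(link))
```

With this layout a single copy of each .bdic file is visible from both Qt
dictionary paths.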

Is there anyone who objects to this file location and symlink approach?

Agustin Martin

Nov 13, 2022, 5:20:04 PM
Hi,

I am for the approach that causes as little annoyance as possible to
the Debian archive, and I think that is using current packages. This
way we do not bother ftpmasters with all these new packages that might
be temporary.

I would personally expect this to be temporary until someone with the
appropriate skills provides a patch to make qtwebengine use system
hunspell in Debian (as has already been done for other libs in Debian
qtwebengine). I looked at the embedded hunspell code, but I am far
from having those skills, so got no result.

Also note that https://github.com/sheremetyev/hunspell seems to be
based on a 10-year-old fork of hunspell. I hope the hunspell code in
chromium and qtwebengine is not 10 years old and that hunspell upstream has
been tracked for updates (at least for security updates). I have done
a quick comparison and they are not exactly the same, and not only
cosmetically, but I did not go further.

It is worth noting that even that 10-year-old code apparently has support for
the IGNORE flag, which is unsupported by the .bdic dicts. Fortunately, it seems
that there are not many dicts using that flag in
libreoffice-dictionaries.

libreoffice-dictionaries-7.4.2$ grep -r IGNORE *
dictionaries/bo/bo.aff:IGNORE ༵༷
dictionaries/ar/ar.aff:IGNORE ـٰ
dictionaries/uk_UA/uk_UA.aff:IGNORE ́
dictionaries/ckb/dictionaries/ckb.aff:IGNORE ـٰ١٢٣٤۴٥۵٦۶٧٨٩٠
dictionaries/hu_HU/hu_HU.aff:IGNORE ()]

Soren Stoutner

Nov 14, 2022, 7:40:04 PM

On Sunday, November 13, 2022 3:13:55 PM MST Agustin Martin wrote:

> It is to note that even that 10 years code apparently has support for
> the IGNORE flag, unsupported by the .bdic dicts. Fortunately, seems
> that there are not many dicts using that flag in
> libreoffice-dictionaries.
>
> libreoffice-dictionaries-7.4.2$ grep -r IGNORE *
> dictionaries/bo/bo.aff:IGNORE ༵༷
> dictionaries/ar/ar.aff:IGNORE ـٰ
> dictionaries/uk_UA/uk_UA.aff:IGNORE ́
> dictionaries/ckb/dictionaries/ckb.aff:IGNORE ـٰ١٢٣٤۴٥۵٦۶٧٨٩٠
> dictionaries/hu_HU/hu_HU.aff:IGNORE ()]


Thanks for catching these additional languages with the IGNORE command in their .aff file.


I added notes to the Tibetan bug report:


https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020479


And to the LibreOffice dictionaries bug report, which ships Hungarian and Ukrainian:


https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020479


The issue with Arabic was previously reported:


https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020481


I was not able to find ckb.aff in the Debian archive.  There is a ckb_IQ.aff which links to kmr_Latn.aff in the hunspell-kmr package, but that file does not include an IGNORE command and doesn’t produce an error when compiled to the .bdic format.  Does anyone know if ckb.aff exists in Debian?


https://packages.debian.org/search?searchon=contents&keywords=ckb.aff&mode=path&suite=unstable&arch=any



--

Soren Stoutner

so...@stoutner.com


Soren Stoutner

Nov 17, 2022, 4:31:31 PM
Based on the lack of opposition, it seems that the following is the consensus
for packaging .bdic files.

1. The .bdic files should be compiled at package creation time.
2. The .bdic files should be included in the existing Hunspell language binary
packages.
3. The .bdic files should be installed in /usr/share/hunspell-bdic.

At this point, the only question left is where this should be documented and
who should write the documentation. I am assuming that
/usr/share/doc/dictionaries-common-dev/dsdt-policy.html is the correct place
for documentation. I am willing to submit a PR if nobody else prefers to do so
instead.

--
Soren Stoutner
so...@stoutner.com

Mattia Rizzolo

Nov 17, 2022, 5:30:04 PM
On Thu, Nov 17, 2022 at 02:25:17PM -0700, Soren Stoutner wrote:
> Based on the lack of opposition, it seems that the following is the consensus
> for packaging .bdic files.

thanks for driving this silent resolution ahah :D

> 1. The .bdic files should be compiled at package creation time.
> 2. The .bdic files should be included in the existing Hunspell language binary
> packages.

well, I can confirm these 2 points match my preference, so…

> 3. The .bdic files should be installed in /usr/share/hunspell-bdic.

… and I don't really have an opinion here.

What I do want to see *before* we actually release a lo-dicts with these
is something that actually reads and makes use of them *first*.

> At this point, the only question left is where this should be documented and
> who should write the documentation. I am assuming that
> /usr/share/doc/dictionaries-common-dev/dsdt-policy.html is the correct place
> for documentation. I am willing to submit a PR if nobody else prefers to do so
> instead.

mh, what documentation are you talking about??

Soren Stoutner

Nov 17, 2022, 5:40:05 PM

On Thursday, November 17, 2022 3:18:17 PM MST Mattia Rizzolo wrote:
> What I do want to see *before* we actually release a lo-dicts with these
> is something that actually reads and make use of them *first*.


Privacy Browser PC uses them.


https://www.stoutner.com/privacy-browser-pc/


I would like to package up an alpha release of Privacy Browser PC.  One of the things I am waiting on is getting spell checking working correctly.  Part of this is that Privacy Browser needs to be able to enumerate the installed dictionaries so that users can select between them.  How it does this depends on where the dictionaries are placed and requires that the Qt packages have the correct symlinks.  So, I considered having these dictionaries packaged in Debian to be an important first step.
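
As a rough illustration of what enumerating the installed dictionaries could involve, the sketch below lists the .bdic files in a directory and returns their base names. It is not Privacy Browser's code. The installed_languages name is made up, and the directory passed in is up to the caller.

```python
from pathlib import Path


def installed_languages(directory: Path) -> list:
    """Return the sorted base names of the .bdic files found in directory."""
    if not directory.is_dir():
        return []  # no dictionaries installed, or the path is missing
    return sorted(p.stem for p in directory.glob("*.bdic"))


if __name__ == "__main__":
    # The location proposed in this thread; it may not exist on a given system.
    print(installed_languages(Path("/usr/share/hunspell-bdic")))
```

Because the lookup is a plain directory listing, it works the same whether the directory is real or a symlink to the shared location.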


Originally I planned to have Privacy Browser packaged by the end of the year, but at the current pace it looks like it is going to take longer than I expected to get all the prerequisites in place (this being one of them) so I am now looking for an early 2023 release (hopefully before the freeze).


> > At this point, the only question left is where this should be documented
> > and who should write the documentation.  I am assuming that
> > /usr/share/doc/dictionaries-common-dev/dsdt-policy.html is the correct
> > place for documentation.  I am willing to submit a PR if nobody else
> > prefers to do so instead.
>
> mh, what documentation are you talking about??


The documentation for any packager of future Hunspell languages so that they know that they need to include compiled .bdic files, where they should be placed, and how to do it.


--

Soren Stoutner

so...@stoutner.com


Lisandro Damián Nicanor Pérez Meyer

Nov 21, 2022, 1:40:05 PM
Hi,

On Wed, 9 Nov 2022 at 15:13, Soren Stoutner <so...@stoutner.com> wrote:
[snip]
> This would also require that the Debian Qt/KDE Maintainers add a symlink from
> /usr/share/qt5/qtwebengine_dictionaries and
> /usr/share/qt6/qtwebengine_dictionaries to /usr/share/hunspell-bdic. They can
> do this in whatever package makes the most sense to them, but possibly in
> libqt5webengine-data and libqt6webengine6-data.

The problem is not a symlink, but building and testing qt[5 6]webkit
with this change. Maybe building is not an issue as we could just
remove the bdic files and add the symlink... but someone who really
understands what's going on under the hood should do a very careful
test to see things are working.

Soren Stoutner

Nov 21, 2022, 3:11:33 PM
No changes are currently needed to Qt WebEngine as it exists in
Debian. It works just fine as long as the dictionaries are in the canonical
location (or that canonical location is a symlink to the actual location).

I have written some descriptions of my testing of this in earlier posts to
this bug report, but if there are any questions I would be happy to provide
more documentation about how I did my testing and how anyone can reproduce it
and verify that the existing Qt WebEngine already has all the .bdic plumbing
built into it.

On Monday, November 21, 2022 11:29:28 AM MST Lisandro Damián Nicanor Pérez
--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Dec 3, 2022, 11:00:04 PM

I created an MR:


https://salsa.debian.org/debian/dictionaries-common/-/merge_requests/5


Please review and make sure I haven’t missed anything or misrepresented the consensus.


Agustin Martin

Dec 6, 2022, 5:40:04 PM
El dom, 4 dic 2022 a las 4:54, Soren Stoutner (<so...@stoutner.com>) escribió:
>
> I created an MR:
>
> https://salsa.debian.org/debian/dictionaries-common/-/merge_requests/5
>
> Please review and make sure I haven’t missed anything or misrepresented the consensus.

Merged.

Will wait some days for possible new comments.

Agustin Martin

Dec 9, 2022, 1:20:04 PM
El mar, 6 dic 2022 a las 23:34, Agustin Martin
(<agustin...@gmail.com>) escribió:
By the way, I have been playing with an old helper
(installdeb-myspell) shipped with dictionaries-common-dev to help with
these bdic files. First cut committed to salsa. Currently
installdeb-myspell will fail if no conversion tool is found.

Soren Stoutner

Dec 9, 2022, 4:20:10 PM
That’s really cool. Thank you for doing that.

On Friday, December 9, 2022 11:09:00 AM MST Agustin Martin wrote:
> By the way, I have been playing with an old helper
> (installdeb-myspell) shipped with dictionaries-common-dev to help with
> these bdic files. First cut committed to salsa. Currently
> installdeb-myspell will fail if no conversion tool is found.


--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Dec 13, 2022, 12:50:05 PM

Agustin,


You are correct that there are currently two copies in Debian, one that comes with the Qt 5 packages and the other that comes with the Qt 6 packages.


Can one of the Debian Qt/KDE maintainers weigh in on the feasibility of either creating a meta package that depends on the most recent package that includes qwebengine_convert_dict or creating an unversioned package that installs qwebengine_convert_dict?  Also, either having qwebengine_convert_dict being installed in an unversioned location or having a symlink that is unversioned?  That would make it easier for Hunspell language packages to build-depend on qwebengine_convert_dict and wouldn’t require reworking all of those packages’ build scripts every time the version of Qt in Debian changes.


Regarding qwebengine_convert_dict expecting the .dic as a file entry, I am not certain I understand what you are referring to.  This is how it builds on my Debian testing system.  The .dic file must be in the same directory as the .aff, but it isn’t specified (or at least doesn’t need to be specified) as a file entry.


$ /usr/lib/qt5/bin/qwebengine_convert_dict en_US.aff en_US.bdic
en_US.dic_delta not found.
Reading en_US.aff
Reading en_US.dic
Serializing...
Verifying...
Writing en_US.bdic
Success. Dictionary converted.


On Tuesday, December 13, 2022 1:52:20 AM MST Agustin Martin (<agma...@debian.org>) wrote:

>> Note that Debian has a different path for qwebengine_convert_dict
>> (/usr/lib/qt5/bin/qwebengine_convert_dict) and that it expects .dic
>> file as entry, Fixed.
>
> Just noticed that it is also in package and path you set, there are
> two possibilities. Expect new dictionaries-common package soon.


--

Soren Stoutner

so...@stoutner.com


Agustin Martin

Dec 13, 2022, 1:20:04 PM
El mar, 13 dic 2022 a las 18:43, Soren Stoutner (<so...@stoutner.com>) escribió:
>
> Can one of the Debian Qt/KDE maintainers weigh in on the feasibility of either creating a meta package that depends on the most recent package that includes qwebengine_convert_dict or creating an unversioned package that installs qwebengine_convert_dict? Also, either having qwebengine_convert_dict being installed in an unversioned location or having a symlink that is unversioned? That would make it easier for Hunspell language packages to build-depend on qwebengine_convert_dict and wouldn’t require reworking all of those packages’ build scripts every time the version of Qt in Debian changes.

I modified installdeb-myspell to look for both, with the qt6 version
preferred. In the policy document, I mentioned that the qt5 version
exists, but discouraged its use as it will disappear sooner. In
theory it could be useful for stable backports, but since the .bdic
built in sid should be usable unchanged in stable, there is no real use
for it.
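
The lookup order described here can be sketched in a few lines. This is only
an illustration, not the code in installdeb-myspell. The find_converter name
is made up, and the two candidate paths are the ones quoted in this thread.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Paths quoted in this thread; the Qt 6 location comes first because it
# is the preferred one.
CANDIDATES = (
    Path("/usr/lib/qt6/libexec/qwebengine_convert_dict"),
    Path("/usr/lib/qt5/bin/qwebengine_convert_dict"),
)


def find_converter(candidates: Iterable[Path] = CANDIDATES) -> Optional[Path]:
    """Return the first candidate that exists and is executable, else None."""
    for path in candidates:
        if path.is_file() and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    tool = find_converter()
    print(tool if tool else "no conversion tool found")
```

A caller that gets None back can then fail the build with a clear message,
in line with the current behaviour of failing when no conversion tool is
found.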

> Regarding qwebengine_convert_dict expecting the .dic as a file entry, I am not certain I understand what you are referring to. This is how it builds on my Debian testing system. The .dic file must be in the same directory as the .aff, but it isn’t specified (or at least doesn’t need to be specified) as a file entry.

$ /usr/lib/qt6/libexec/qwebengine_convert_dict
Usage: qwebengine_convert_dict <dic file> <bdic file>

I just used what the usage note and the associated example show, as that is
supposed to be more "stable". I noticed that qwebengine_convert_dict seems to
accept either of the two (and looks for the other). In theory, a dic file may
have no associated aff file (and be a plain wordlist), but I just
checked that even that requires an empty aff file.

--
Agustin

Soren Stoutner

Dec 13, 2022, 2:05:18 PM
Agustin,

On Tuesday, December 13, 2022 11:14:22 AM MST Agustin Martin wrote:
> I modified installdeb-myspell to look for both, with qt6 version
> preferred. In policy document, I commented about qt5 version
> existence, but discouraging its use as it will disappear sooner. In
> theory it could be useful for stable backports, but since .bdic sid
> version should be usable unchanged in stable there is no real use for
> it.

Your script does make it easier for packages to automatically migrate to the
current qwebengine_convert_dict location on Qt migrations. However, without a
meta package or an unversioned package, each Hunspell dictionary source package
would still need to update its Build-Depends.

This may not be too big a deal as Qt doesn’t switch versions often. However,
it seems to me that the ideal would be to have a stable build-depends package
and even a stable binary execution path for those who either choose not to use
the provided script or who cannot because they need to modify the files first to
work around the current bugs with qwebengine_convert_dict not supporting
certain valid input files.

> $ /usr/lib/qt6/libexec/qwebengine_convert_dict
> Usage: qwebengine_convert_dict <dic file> <bdic file>
>
> Just put what usage note and associated example shows, it is supposed
> to be more "stable". Noticed that qwebengine_convert_dict seems to
> accept any of both (and look for the other). In theory, a dic file may
> have no associated aff file (and be a plain wordlist), but just
> checked that even that requires an empty aff file.

Good catch. For some reason I thought the documentation said to use the .aff
file, but I see now that it was just working around my not using it
idiomatically.


--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Dec 19, 2022, 12:20:04 PM
Could one of the Debian Qt/KDE maintainers please comment on the questions
asked at

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020387#215

> Can one of the Debian Qt/KDE maintainers weigh in on the feasibility of
> either creating a meta package that depends on the most recent package that
> includes qwebengine_convert_dict or creating an unversioned package that
> installs qwebengine_convert_dict? Also, either having qwebengine_convert_dict
> being installed in an unversioned location or having a symlink that is
> unversioned? That would make it easier for Hunspell language packages to
> build-depend on qwebengine_convert_dict and wouldn’t require reworking all of
> those packages’ build scripts every time the version of Qt in Debian changes.

Thanks.

--
Soren Stoutner
so...@stoutner.com

Roland Rosenfeld

Dec 23, 2022, 11:30:04 AM
Hi Agustin!

On Fri, 09 Dec 2022, Agustin Martin wrote:

> By the way, I have been playing with an old helper
> (installdeb-myspell) shipped with dictionaries-common-dev to help with
> these bdic files. First cut committed to salsa. Currently
> installdeb-myspell will fail if no conversion tool is found.

Just one question about this: Why did you add this code to
installdeb-myspell and not to installdeb-hunspell?
It's quite confusing that the .bdic file (used by hunspell) is
generated by installdeb-myspell and not by installdeb-hunspell,
especially since the former uses debian/info-myspell and the latter
uses debian/info-hunspell (so I now need to have both of them...).

Or did I miss something here?

Greetings
Roland

Dmitry Shachnev

Dec 26, 2022, 11:20:10 AM
Hi all!

(And sorry for the late response. debian...@lists.debian.org is a
list for bots, so I didn't get it in my inbox. It's better to use
pkg-kd...@alioth-lists.debian.net or <package>@packages.debian.org.)

On Tue, Dec 13, 2022 at 10:43:06AM -0700, Soren Stoutner wrote:
> Can one of the Debian Qt/KDE maintainers weigh in on the feasibility of
> either creating a meta package that depends on the most recent package
> that includes qwebengine_convert_dict or creating an unversioned package
> that installs qwebengine_convert_dict? Also, either having
> qwebengine_convert_dict being installed in an unversioned location or
> having a symlink that is unversioned? That would make it easier for
> Hunspell language packages to build-depend on qwebengine_convert_dict and
> wouldn’t require reworking all of those packages’ build scripts every time
> the version of Qt in Debian changes.

I think we can do this, but why do you think such a tool should be provided by
Qt WebEngine, not by Chromium itself?

Chromium is the main upstream for the convert_dict tool, while Qt WebEngine is one
of several wrappers around it (e.g. another one is Electron). Also, having it
in Chromium will help to avoid the problem with versions, as there is always
only one version of Chromium.

Source code for convert_dict is present in the Chromium tarball [1], so it
shouldn't be hard to provide a new binary package for it.

(Maybe this was already discussed in the thread, but I did not read every
message, please give me a link if it's the case.)

[1]: https://sources.debian.org/src/chromium/108.0.5359.124-1/chrome/tools/convert_dict/

--
Dmitry Shachnev

Soren Stoutner

Dec 26, 2022, 12:40:05 PM

Dmitry,

It hasn’t been discussed, but I think it would make sense for Chromium to ship the convert_dict tool as it is the upstream for the project. I suppose the reason why the discussion was around how it is shipped in the Qt packages was because that is the only place it is currently shipped in Debian:

https://packages.debian.org/search?searchon=contents&keywords=convert_dict&mode=path&suite=testing&arch=any

Andres, do you have any comments on the feasibility of shipping convert_dict as part of a Chromium package targeted at developers?

Dmitry, can you also comment about adding a symlink from /usr/share/qt5/qtwebengine_dictionaries to /usr/share/hunspell-bdic as part of one of the libqt5webengine packages and from /usr/share/qt6/qtwebengine_dictionaries as part of one of the libqt6webengine packages?

There is some information about where Qt WebEngine searches for these dictionaries at https://doc.qt.io/qt-5/qtwebengine-features.html#spellchecker

--
Soren Stoutner
so...@stoutner.com


Agustin Martin

Dec 26, 2022, 2:00:04 PM
On Fri, 23 Dec 2022 at 17:21, Roland Rosenfeld
(<rol...@debian.org>) wrote:
>
> Hi Agustin!
>
> > By the way, I have been playing with an old helper
> > (installdeb-myspell) shipped with dictionaries-common-dev to help with
> > these bdic files. First cut committed to salsa. Currently
> > installdeb-myspell will fail if no conversion tool is found.
>
> Just one question about this: Why did you add this code to
> installdeb-myspell and not to installdeb-hunspell?
> It's quite confusing that the .bdic file (used by hunspell) is
> generated by installdeb-myspell and not by installdeb-hunspell,
> especially since the former uses debian/info-myspell and the latter
> uses debian/info-hunspell (so I now need to have both of them...).

Hi, Roland,

I agree that this may sound strange, and there is indeed no big reason
behind it. I just did it because it was simpler and more straightforward to
reuse the installdeb-myspell code and the debian/info-myspell file format when
playing with this. Also, debian/info-hunspell is harder to handle from
e.g. lo-dicts, while it should be easier to generate a temporary file
in the debian/info-myspell file format from minimal info.

Regards

--
Agustin

Andres Salomon

Dec 26, 2022, 3:40:04 PM

On Mon, Dec 26 2022 at 10:32:20 AM -0700, Soren Stoutner <so...@stoutner.com> wrote:

> Dmitry
>
> It hasn’t been discussed, but I think it would make sense for Chromium to ship the convert_dict tool as it is the upstream for the project. I suppose the reason why the discussion was around how it is shipped in the Qt packages was because that is the only place it is currently shipped in Debian:
>
> https://packages.debian.org/search?searchon=contents&keywords=convert_dict&mode=path&suite=testing&arch=any
>
> Andres, do you have any comments on the feasibility of shipping convert_dict as part of a Chromium package targeted at developers?

It's definitely feasible*. However, there's the question of whether we want other important packages depending on chromium. https://bugs.debian.org/1004441 shows that it's still an outstanding question whether chromium will even ship in bookworm. I now have Tim helping with packaging, which is wonderful and a huge help (thanks Tim!), but he doesn't have upload privs. If I were to get hit by a bus (or more likely, hit by 😱Responsibilities😱), he'd have to find someone else to sponsor his upload. Without his help, I'm not sure I'd want to commit to the next 3 years of security support. So the question of other packages build-depending on convert_dict from chromium will involve the release team and what we decide to do for bookworm.


 * for my own future reference:  ninja -j$(njobs) -C out/Release  convert_dict;  install ./out/Release/convert_dict

Dmitry Shachnev

Dec 26, 2022, 4:20:04 PM
Hi Andres!

On Mon, Dec 26, 2022 at 03:33:52PM -0500, Andres Salomon wrote:
> It's definitely feasible*. However, there's the question of whether we want
> other important packages depending on chromium.
> https://bugs.debian.org/1004441 shows that it's still an outstanding
> question whether chromium will even ship in bookworm. I now have Tim helping
> with packaging, which is wonderful and a huge help (thanks Tim!), but he
> doesn't have upload privs. If I were to get hit by a bus (or more likely,
> hit by 😱Responsibilities😱), he'd have to find someone else to sponsor his
> upload. Without his help, I'm not sure I'd want to commit to the next 3 years
> of security support. So the question of other packages build-depending on
> convert_dict from chromium will involve the release team and what we decide
> to do for bookworm.

OK, it is a valid reason (although it's a bit weird that we can ship without
Chromium for security maintenance reasons, but at the same time ship with
Qt WebEngine which is usually not getting any security updates at all).

So, if the consensus is that we should ship convert-dict from the Qt side,
I propose to name the package and the executable in a generic way, without
the Qt word (e.g. /usr/bin/convert-dict). This way, if the situation changes
in the future, we will be able to transfer this binary package to the Chromium
team, which is a better home for it.

Also, I am passing the ball to Patrick Franz, who is the Qt 6 maintainer
(unlike me, who mostly works on Qt 5). He may have his own objections, of
course.

If you agree with my naming suggestion, the needed thing will be to add a
symlink /usr/bin/convert-dict -> /usr/lib/qt6/libexec/qwebengine_convert_dict,
and then either rename qt6-webengine-dev-tools to convert-dict or make it
provide a virtual package (the latter won't require passing the NEW queue).

--
Dmitry Shachnev

Timothy Pearson

Dec 26, 2022, 4:50:03 PM
For what it's worth I'm interested in obtaining upload privileges to further assist, but I think I need a sponsor etc. for that process?

Thanks!

Soren Stoutner

Jan 4, 2023, 4:40:05 PM
Dmitry,

I wanted to follow up on the topic of symlinks from
/usr/share/qt5/qtwebengine_dictionaries and
/usr/share/qt6/qtwebengine_dictionaries to /usr/share/hunspell-bdic.

Now that some of the languages are shipping .bdic files, anyone can test how
this works with programs that use Qt WebEngine. For example, install Falkon.
In the Preferences, go to the spell check tab. A list of all available
languages will auto-populate. Enable at least one language and then type
words into the search field in the middle of the screen to verify that spell
checking is working.

However, the above instructions only work if there is a symlink from
/usr/share/qt5/qtwebengine_dictionaries to /usr/share/hunspell-bdic. Users can
manually create this symlink to test this out, but it would be preferable if
one of the webengine packages would add it.
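
For anyone wanting to try the manual test, a minimal shell sketch of creating that symlink follows. The helper name `link_bdic_dir` and its optional root-prefix argument are hypothetical conveniences for trying it in a scratch directory; the two paths are the ones named in this message.

```shell
#!/bin/sh
# Sketch only: link_bdic_dir is an illustrative helper, not part of any package.
# $1 is an optional root prefix so the function can be tried in a scratch
# directory; with no argument it acts on the real file system (needs root).
link_bdic_dir() {
    root="${1:-}"
    target="/usr/share/hunspell-bdic"
    link="$root/usr/share/qt5/qtwebengine_dictionaries"
    mkdir -p "$(dirname "$link")"
    # -n: treat an existing link as a file rather than following it; -f: replace it
    ln -sfn "$target" "$link"
}
```

Run as root with no argument, this should be roughly equivalent to creating the link by hand with `ln -s`.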

Soren

On Monday, December 26, 2022 10:32:20 AM MST Soren Stoutner wrote:
> Dmitry, can you also comment about adding a symlink from
> /usr/share/qt5/qtwebengine_dictionaries to /usr/share/hunspell-bdic as part
> of one of the libqt5webengine packages and from
> /usr/share/qt6/qtwebengine_dictionaries as part of one of the
> libqt6webengine packages?
>
> There is some information about where Qt WebEngine searches for these
> dictionaries at
> https://doc.qt.io/qt-5/qtwebengine-features.html#spellchecker

--
Soren Stoutner
so...@stoutner.com

Dmitry Shachnev

Jan 5, 2023, 6:33:13 AM
Hi Soren!

On Wed, Jan 04, 2023 at 02:28:45PM -0700, Soren Stoutner wrote:
> Dmitry,
>
> I wanted to follow up on the topic of symlinks from
> /usr/share/qt5/qtwebengine_dictionaries and
> /usr/share/qt6/qtwebengine_dictionaries to /usr/share/hunspell-bdic.
>
> Now that some of the languages are shipping .bdic files, anyone can test how
> this works with programs that use Qt WebEngine. For example, install Falkon.
> In the Preferences, go to the spell check tab. A list of all available
> languages will auto-populate. Enable at least one language and then type
> words into the search field in the middle of the screen to verify that spell
> checking is working.
>
> However, the above instructions only work if there is a symlink from
> /usr/share/qt5/qtwebengine_dictionaries to /usr/share/hunspell-bdic. Users can
> manually create this symlink to test this out, but it would be preferable if
> one of the webengine packages would add it.

Sorry, I somehow missed your previous email.

Committed this change now:
https://salsa.debian.org/qt-kde-team/qt/qtwebengine/-/commit/eb6bf5d49fb84bf7

However, if none of the dictionaries is installed, the symlink will be
dangling. Do you think that's okay, or should we find a way to fix this?

--
Dmitry Shachnev

Soren Stoutner

Jan 5, 2023, 5:30:04 PM
What is the Debian policy on this? If a user does not have any Hunspell
dictionaries installed it will result in a dangling symlink. We could have
some essential package create the /usr/share/hunspell-bdic directory, but in
that case /usr/share/hunspell-bdic will exist on systems that don’t intend to
ever install any Hunspell dictionaries, which may be considered by some to be
suboptimal.

From a functionality perspective I don’t think there are any problems with the
dangling symlink. Programs built on Qt WebEngine will simply report that no
dictionaries are available but otherwise will continue to function properly.
This can be verified with the Falkon instructions I posted previously.
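
To make the "dangling" condition concrete, here is a small shell sketch that tests for it. The function name `is_dangling` is hypothetical; the check itself only relies on the standard `-L` and `-e` file tests.

```shell
#!/bin/sh
# Sketch only: is_dangling is an illustrative name.
# Succeeds when the path is a symlink whose target does not exist.
is_dangling() {
    # -L: the path itself is a symlink
    # -e: follows the link and fails if the target is missing
    [ -L "$1" ] && [ ! -e "$1" ]
}
```

For example, `is_dangling /usr/share/qt5/qtwebengine_dictionaries` would be expected to succeed on a system where the link exists but no .bdic dictionaries are installed.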

Once we figure out the dangling symlink question, who do we need to talk to in
order to get a similar symlink created for
/usr/share/qt6/qtwebengine_dictionaries?

On Thursday, January 5, 2023 4:26:19 AM MST Dmitry Shachnev wrote:
> However, if none of the dictionaries is installed, the symlink will be
> dangling. Do you think that's okay, or should we find a way to fix this?
>
> --
> Dmitry Shachnev


--
Soren Stoutner
so...@stoutner.com

Dmitry Shachnev

Jan 6, 2023, 10:10:04 AM
On Thu, Jan 05, 2023 at 03:18:14PM -0700, Soren Stoutner wrote:
> What is the Debian policy on this? If a user does not have any Hunspell
> dictionaries installed it will result in a dangling symlink. We could have
> some essential package create the /usr/share/hunspell-bdic directory, but in
> that case /usr/share/hunspell-bdic will exist on systems that don’t intend to
> ever install any Hunspell dictionaries, which may be considered by some to be
> suboptimal.
>
> From a functionality perspective I don’t think there are any problems with the
> dangling symlink. Programs built on Qt WebEngine will simply report that no
> dictionaries are available but otherwise will continue to function properly.
> This can be verified with the Falkon instructions I posted previously.

Okay. I think it makes sense. I will include this change in my next upload
(5.15.12 release).

> Once we figure out the dangling symlink question, who do we need to talk to in
> order to get a similar symlink created for
> /usr/share/qt6/qtwebengine_dictionaries?

Patrick Franz who is CCed.

--
Dmitry Shachnev

Lisandro Damián Nicanor Pérez Meyer

Jan 9, 2023, 9:50:04 AM
Hi!

On Fri, 6 Jan 2023 at 12:22, Dmitry Shachnev <mit...@debian.org> wrote:
>
> On Thu, Jan 05, 2023 at 03:18:14PM -0700, Soren Stoutner wrote:
> > What is the Debian policy on this? If a user does not have any Hunspell
> > dictionaries installed it will result in a dangling symlink. We could have
> > some essential package create the /usr/share/hunspell-bdic directory, but in
> > that case /usr/share/hunspell-bdic will exist on systems that don’t intend to
> > ever install any Hunspell dictionaries, which may be considered by some to be
> > suboptimal.
> >
> > From a functionality perspective I don’t think there are any problems with the
> > dangling symlink.

I understand that dangling symlinks pose a security issue, or at least a
possible one. But I'm afraid I do not know the details.

I also would need to read the whole thread in order to see if there is
any other option...

Soren Stoutner

Jan 9, 2023, 1:43:10 PM
I can think of some circumstances where a dangling symlink can pose a
security risk (depending on where it is located, where it points to, whether
there are different permissions on who can write to each location, and what
type of information programs read from or write to the link), but I cannot
think of any way this particular symlink could pose a security risk.

--
Soren Stoutner
so...@stoutner.com

Soren Stoutner

Jan 9, 2023, 1:43:11 PM
For the sake of completeness, it was previously discussed that it would be
possible to patch the Qt WebEngine source to directly look for the
dictionaries in /usr/share/hunspell-bdic instead of the default upstream
location. It is unclear how much ongoing maintenance effort that would entail,
but it is a possible solution if the symlink is unacceptable.

Soren Stoutner

Jan 31, 2023, 3:43:25 PM

In discussion with the Qt 5 maintainer, we have found a solution that does not use a symlink, which will be included in the upcoming 5.15.12+dfsg-3 release.

More information can be found at:

https://salsa.debian.org/qt-kde-team/qt/qtwebengine/-/merge_requests/12

On Monday, January 9, 2023 11:37:58 AM MST Soren Stoutner wrote:
> For the sake of completeness, it was previously discussed that it would be
> possible to patch the Qt WebEngine source to directly look for the
> dictionaries in /usr/share/hunspell-bdic instead of the default upstream
> location. It is unclear how much ongoing maintenance effort that would
> entail, but it is a possible solution if the symlink is unacceptable.

--
Soren Stoutner
so...@stoutner.com


Soren Stoutner

Feb 4, 2023, 12:41:50 PM

I have submitted a merge request to the qt6webengine package that implements what has been discussed.

https://salsa.debian.org/qt-kde-team/qt6/qt6-webengine/-/merge_requests/4

Once it is merged, I will prepare a merge request for the documentation in this package that reflects these changes, specifically that Hunspell language packages should build-depend on the `convert-dict` virtual package and use the unversioned /usr/bin/convert_dict to create the .bdic files.

On Tuesday, January 31, 2023 1:25:59 PM MST Soren Stoutner wrote:
> In discussion with the Qt 5 maintainer, we have found a solution that does
> not use a symlink, which will be included in the upcoming 5.15.12+dfsg-3
> release.
>
> More information can be found at:


Rene Engelhard

Feb 4, 2023, 1:00:05 PM

Hi,

On 04.02.23 at 18:30, Soren Stoutner wrote:
> I have submitted a merge request to the qt6webengine package that implements what has been discussed.
>
> https://salsa.debian.org/qt-kde-team/qt6/qt6-webengine/-/merge_requests/4
>
> Once it is merged, I will prepare a merge request for the documentation in this package that reflects these changes, specifically that Hunspell language packages should build-depend on the `convert-dict` virtual package and use the unversioned /usr/bin/convert_dict to create the .bdic files.

Sorry for my absence, and sorry if it's already discussed:

Why convert-dict? Sounds too generic to me, can't we at least do qt-convert-dict or bdic-convert-dict or so?


Regards,


Rene

Soren Stoutner

Feb 4, 2023, 1:20:04 PM
Seeing as how .bdic files are not exclusive to Qt, qt-convert-dict is probably
not the most accurate name, but bdic-convert-dict would make sense. Another
option would be to name it convert-bdic. The Chromium upstream names the tool
convert_dict, but we aren’t beholden to follow their lead.

I have updated the merge request to name the tool /usr/bin/convert-bdic, and
the virtual package to be convert-bdic, but can change it again if there is a
consensus in a different direction.

On Saturday, February 4, 2023 10:51:38 AM MST Rene Engelhard wrote:
> Hi,
>
> Sorry for my absence, and sorry if it's already discussed:
>
> Why convert-dict? Sounds too generic to me, can't we at least do
> qt-convert-dict or bdic-convert-dict or so?
>
>
> Regards,
>
>
> Rene


--
Soren Stoutner
so...@stoutner.com

Rene Engelhard

Feb 4, 2023, 2:30:04 PM
Hi,

On 04.02.23 at 19:14, Soren Stoutner wrote:
> Seeing as how .bdic files are not exclusive to Qt, qt-convert-dict is probably
> not the most accurate name, but bdic-convert-dict would make sense. Another
> option would be to name it convert-bdic. The Chromium upstream names the tool
> convert_dict, but we aren’t beholden to follow their lead.
>
> I have updated the merge request to name the tool /usr/bin/convert-bdic, and
> the virtual package to be convert-bdic, but can change it again if there is a
> consensus in a different direction.


convert-bdic is fine with me,  thanks :)


Regards,


Rene

Agustin Martin

Feb 6, 2023, 5:50:04 PM
Also fine with me, thanks.

One question: will old packages keep shipping the conversion tool under the
old name and location for a while? We would otherwise need to rebuild the
dicts to use the new dependency and avoid FTBFS. Once it is clear that
/usr/bin/convert-bdic is the new path I will add it as the preferred path.

Regards,

--
Agustin

Soren Stoutner

Feb 6, 2023, 6:10:05 PM

Yes, the packages will continue to ship the conversion tools under their current names in perpetuity. Because Qt goes through version transitions, there are often two versions of Qt available in Debian (currently Qt 5 and Qt 6), both of which will ship this tool under a versioned path. The latest version of Qt will provide a virtual package named `convert-bdic` and install a symlink from /usr/bin/convert-bdic to the actual location of the conversion utility that is shipped with the newest version of Qt packaged in Debian.

When this commit is merged and released, the package providing the `convert-bdic` virtual package will be `qt6-webengine-dev-tools` and /usr/bin/convert-bdic will be a symlink to /usr/lib/qt6/libexec/qwebengine_convert_dict. Packages manually calling the Qt 5 version will need to be updated when Qt 5 is removed from Debian (at some future date that is likely still a while off).

There is not currently any difference between the copy of `qwebengine_convert_dict` that is shipped with Qt 5 and the one shipped with Qt 6. Both are the same as the upstream Chromium `convert_dict`, which Google has not changed in a long time.
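
For build scripts that call the converter directly, the preference order described above could be sketched in shell as below. The function name `find_bdic_converter` is hypothetical, and the candidate list in the trailing comment assumes only the two paths mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: find_bdic_converter is an illustrative name.
# Prints the first candidate that exists and is executable;
# returns 1 if none of them is.
find_bdic_converter() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidates in order of preference: the unversioned name proposed in this
# thread, then the versioned Qt 6 location.
# find_bdic_converter /usr/bin/convert-bdic \
#     /usr/lib/qt6/libexec/qwebengine_convert_dict
```

Passing the candidates as arguments keeps the lookup order in one place, so a later Qt transition would only require changing the list.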

Soren Stoutner

so...@stoutner.com


Lisandro Damián Nicanor Pérez Meyer

Feb 14, 2023, 2:32:30 PM
One thing I do not understand is why this is needed on both Qt 5 and Qt 6.
What I understand from the thread is that currently either of them can provide
the dictionaries, so why not keep this under just one source package?



Soren Stoutner

Feb 14, 2023, 5:40:04 PM
Which part do you not understand about this being needed on both Qt 5 and Qt 6?
The part about building the .bdic files or the part about Qt WebEngine using
the .bdic files at runtime?

--
Soren Stoutner
so...@stoutner.com

Lisandro Damian Nicanor Perez Meyer

Feb 15, 2023, 8:30:04 PM
On Tuesday, 14 February 2023 19:28:53 -03 Soren Stoutner wrote:
> Which part do you not understand about this being needed on both Qt 5 and
> Qt 6? The part about building the .bdic files or the part about Qt WebEngine
> using the .bdic files at runtime?

Sorry, wrong question on my side.

I just went through the whole thread. I understand that these bdic files are
needed for packages that use Qt[5 6]webengine. But I do not like the idea of
**other** packages making use of them.

Let me explain why:

- webengine is the most complicated package we handle, it is the very example
of a PITA. It embeds the world, it takes ages to compile, it has weird
errors...

- Hunspell dictionaries should be handled by... hunspell. Yes, I know this was
considered and it's still not possible. But the fact that webengine ships them
is not reason enough to expose them to the world instead of doing the right
thing: handling them there.

- They are not built by default by Qt itself. This is weird... or they do not
want to handle possible build errors. Should we Qt maintainers? No.

- If the patches are taken and at some point webengine upstream decides to
switch to something else, then we the Qt maintainers get the broken pieces.
Insta RC bugs, we get this package stopped from migrating to testing until
solving the issue... a pain.

So no, I'm totally against this change. Dmitry, Patrick: my suggestion is to
revert the patches.

Rene Engelhard

Feb 16, 2023, 12:50:05 AM
Hi,

On 16.02.23 at 02:24, Lisandro Damian Nicanor Perez Meyer wrote:
> - Hunspell dictionaries should be handled by... hunspell. Yes, I know this was
> considered and it's still not possible. But the fact that webengine ships them
> is not reason enough to expose them to the world instead of doing the right
> thing: handling them there.

Then make it use hunspell.

Unpatched and with the same format.

It's not as if hunspell invented a binary format for no gain at all
instead of just differentiating for differentiating.

Same with an internal *patched* hunspell copy.

> - If the patches are taken and at some point webengine upstreams decide to
> switch to something else then we the Qt maintainers get the broken pieces.
> Insta RC bugs, we get this package stopped from migrating to testing until
> solving the issue... a pain.


That is already the case. All packages building .bdic files right now *are*
already using it (even if only via installdeb-myspell, which calls the
binary), e.g.:

root@frodo:/# apt-cache showsrc igerman98
Package: igerman98
Binary: ingerman, iswiss, wngerman, wswiss, rmligs-german,
hunspell-de-at, hunspell-de-ch, hunspell-de-de, aspell-de
Version: 20161207-11
Maintainer: Roland Rosenfeld <rol...@debian.org>
Uploaders: Rene Engelhard <re...@debian.org>
Build-Depends: debhelper-compat (= 13)
Build-Depends-Indep: aspell, busybox, dictionaries-common-dev (>=
1.29.3), hunspell, qt6-webengine-dev-tools, ispell
[...]

> So no, I'm totally against this change. Dmitry, Patrick: my suggestion is to
> revert the patches.

If you revert the virtual package it's still too late. And it
complicates things even more, since people then need to change their
build-dependency (and maybe calls) explicitly, causing more PITA.

The virtual package will help with that, in that this will automagically
happen.


And the whole bdic thingy: it's there, in qtwebengine itself, in
dictionaries-common's policy, in installdeb-myspell --bdic-only etc.

And Soren has a point: Debian should support those .bdic files if
possible, however broken their existence may be.


Regards,


Rene


Rene Engelhard

Feb 16, 2023, 1:10:04 AM
Hi,

On 16.02.23 at 06:40, Rene Engelhard wrote:
> root@frodo:/# apt-cache showsrc igerman98
> Package: igerman98

root@frodo:/# grep-dctrl -FBuild-Depends-Indep qt6-web \
    /var/lib/apt/lists/deb.debian.org_debian_dists_unstable_main_source_Sources \
    -sPackage
Package: eo-spell
Package: espa-nol
Package: igerman98
Package: ispell-fo
root@frodo:/# grep-dctrl -FBuild-Depends qt6-web \
    /var/lib/apt/lists/deb.debian.org_debian_dists_unstable_main_source_Sources \
    -sPackage
Package: dsdo
Package: hunspell-ca
Package: hunspell-eu
Package: hunspell-lv
Package: pyqt6
Package: pyqt6-webengine
Package: qt6-httpserver
Package: qt6-webchannel
Package: qt6-webengine
Package: qt6-webview
Package: ring

Regards,


Rene

Lisandro Damian Nicanor Perez Meyer

Feb 16, 2023, 6:30:04 AM
On Thursday, 16 February 2023 02:40:21 -03 Rene Engelhard wrote:
> Hi,
>
> On 16.02.23 at 02:24, Lisandro Damian Nicanor Perez Meyer wrote:
> > - Hunspell dictionaries should be handled by... hunspell. Yes, I know this
> > was considered and it's still not possible. But the fact that webengine
> > ships them is not reason enough to expose them to the world instead of
> > doing the right thing: handling them there.
>
> Then make it use hunspell.
>
> Unpatched and with the same format.
>
> It's not as if hunspell invented a binary format for no gain at all
> instead of just differentiating for differentiating.
>
> Same with an internal *patched* hunspell copy.

I agree with that, and it is actually one of the many reasons why I was against
uploading qt6-webengine in the first place.

> > - If the patches are taken and at some point webengine upstreams decide to
> > switch to something else then we the Qt maintainers get the broken pieces.
> > Insta RC bugs, we get this package stopped from migrating to testing until
> > solving the issue... a pain.
>
> e.g.
>
> That is already the case. All packages building bdic right now *are*
> already using it (even if only via installdeb-myspell, which calls
> the binary):
>
> root@frodo:/# apt-cache showsrc igerman98
> Package: igerman98
> Binary: ingerman, iswiss, wngerman, wswiss, rmligs-german,
> hunspell-de-at, hunspell-de-ch, hunspell-de-de, aspell-de
> Version: 20161207-11
> Maintainer: Roland Rosenfeld <rol...@debian.org>
> Uploaders: Rene Engelhard <re...@debian.org>
> Build-Depends: debhelper-compat (= 13)
> Build-Depends-Indep: aspell, busybox, dictionaries-common-dev (>=
> 1.29.3), hunspell, qt6-webengine-dev-tools, ispell
> [...]
>
> > So no, I'm totally against this change. Dmitry, Patrick: my suggestion is
> > to revert the patches.
>
> If you revert the virtual package it's still too late. And it
> complicates things even more since then people need to change their
> build-dependency (and maybe calls) explicitly, causing more PITA.

The virtual package for Qt 6 is committed but not uploaded.

See, with the current status, if tomorrow webengine stops providing hunspell
dictionaries then we Qt maintainers have no obligation toward packages using
them: they were never meant to be used by packages not really using Qt 6.

On the other hand going ahead with this means this becomes official API-like
behavior.

> The virtual package will help with that in that this will automagically
> happen.
>
>
> And the whole bdic thingy: it's there, in qtwebengine itself, in
> dictionaries-common's policy, in installdeb-myspell --bdic-only etc.
>
> And Soren has a point: Debian should support those .bdic files if
> possible, however broken their existence may be.

Then fork the hunspell code out of webengine and provide a proper package.
That won't break at the webengine developers' will.

Lisandro Damian Nicanor Perez Meyer

Feb 16, 2023, 6:40:04 AM
By the way: I **do** understand that what you all are proposing is an easy way
out and sounds like it makes sense.

Now I have been around Qt for 10+ years already, and have suffered the source
code of each and every web engine of the day during all this time. I know how
problematic it can be and how, at the end of the day, it is us maintainers who
get the broken pieces when something breaks. Really, it's a pain.

Had this occurred in another Qt submodule I would probably not be so adamant
in avoiding it. But webengine/webkit were always a PITA. And I do not expect
that to change, I'm afraid.

Soren Stoutner

Feb 16, 2023, 11:52:57 AM
Seeing as this is how Qt WebEngine is designed upstream, I think it is
important to support it in Debian. From my personal perspective, the program
I am developing (Privacy Browser) depends on Qt WebEngine and needs spell
checking functionality to be viable in Debian.

I have been working with the Qt 5 and 6 WebEngine code base recently and have
submitted patches both to Debian and upstream. My goal is to make the
WebEngine packages Lintian free, which is going to require a bit of work, but
I am in it for the long haul. I am also willing to become the maintainer of
the WebEngine packages or to co-maintain them with others.

While I agree that the entire design of the .bdic binary dictionaries is
suboptimal, I think that appropriately supporting them in Debian is the best
way forward.

On Thursday, February 16, 2023 4:25:45 AM MST Lisandro Damian Nicanor Perez
--
Soren Stoutner
so...@stoutner.com

Lisandro Damián Nicanor Pérez Meyer

Feb 16, 2023, 1:32:29 PM
On Thursday, 16 February 2023 13:42:42 -03, Soren Stoutner wrote:
> Seeing as this is how Qt WebEngine is designed upstream, I think it is
> important to support it in Debian. From my personal perspective, the
> program I am developing (Privacy Browser) depends on Qt WebEngine and needs
> spell checking functionality to be viable in Debian.
>
> I have been working with the Qt 5 and 6 WebEngine code base recently and
> have submitted patches both to Debian and upstream. My goal is to make the
> WebEngine packages Lintian free, which is going to require a bit of work,
> but I am in it for the long haul. I am also willing to become the
> maintainer of the WebEngine packages or to co-maintain them with others.

I'm totally in for this, but then I need to see proof before continuing to
expose internal stuff to third parties. It's very much the same issue as with
private headers.

I would definitely not mind exposing this if the Qt project compiled the
bdic files as part of their build process *and* that were part of their CI
testing.

> While I agree that the entire design of the .bdic binary dictionaries is
> suboptimal, I think that appropriately supporting them in Debian is the best
> way forward.

Believe me, I try to do the same, but web engines have already made me waste
too much time, so I try to avoid whatever could bring us yet another headache.
A simple error can easily cost a couple of hours.

Soren Stoutner

Feb 16, 2023, 2:00:05 PM
Honestly, the impact on maintaining the Qt WebEngine packages is negligible.
The Debian packages have been shipping the binary dictionary conversion tool
for a long time, which is the biggest piece of the puzzle and has already been
solved. Upstream (both Qt and Chromium) have not modified any of this code for
a long time, and Chromium has stated that it is in maintenance mode, meaning
they aren't currently planning to make any changes going forward, so nothing
should need to change with the packaging of that.

All the Qt packages need to do is to continue to ship the code that they have
been shipping for a long time. The only difference is that now there is a
symlink that makes it easy for language packaging to jump between Qt versions
without needing to update their path to reference the new Qt version, there is
a virtual package, so they don't need to update their build-depends to
reference the new Qt version, and the WebEngines now have an environment
variable set so they know where to look to find the dictionaries that are
shipped by other packages.
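
As a rough sketch of what those three pieces mean from a language package's
side (not the actual packaging code): the snippet below looks for a converter
on the PATH, builds the command line for one dictionary, and reads the
directory the WebEngine would search. The name convert-bdic for the
version-independent symlink and the fallback directory are assumptions made
for illustration; qwebengine_convert_dict and QTWEBENGINE_DICTIONARIES_PATH
are the upstream names.

```python
import os
import shutil
from pathlib import Path

# Assumption: "convert-bdic" stands in for the version-independent symlink;
# "qwebengine_convert_dict" is the upstream tool name.
CANDIDATE_TOOLS = ("convert-bdic", "qwebengine_convert_dict")

# Assumption: placeholder fallback directory, used only for illustration.
HYPOTHETICAL_DEFAULT_DIR = "/usr/share/hunspell-bdic"


def find_converter(candidates=CANDIDATE_TOOLS, path=None):
    """Return the first converter found on the search path, or None."""
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


def conversion_command(converter, dic_file, out_dir):
    """Build the argument list: <converter> <input.dic> <output.bdic>.

    As far as I know, the upstream tool expects the matching .aff file
    to sit next to the .dic file.
    """
    dic = Path(dic_file)
    return [converter, str(dic), str(Path(out_dir) / (dic.stem + ".bdic"))]


def dictionary_dir(env=os.environ, default=HYPOTHETICAL_DEFAULT_DIR):
    """Directory the WebEngine is told to search for .bdic files."""
    return env.get("QTWEBENGINE_DICTIONARIES_PATH", default)
```

A language package would then only need something like
conversion_command(find_converter(), "en_US.dic", dictionary_dir()) and would
not have to mention a Qt version anywhere.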

If this were going to be a large maintenance burden on Qt WebEngine packaging
I could see there being some concern. But the burden on Qt packagers, myself
or others, going forward is very minimal.

On Thursday, February 16, 2023 11:22:01 AM MST Lisandro Damián Nicanor Pérez
--
Soren Stoutner
so...@stoutner.com

Andres Salomon

Feb 16, 2023, 3:30:04 PM
Related to this - we got approval for chromium to ship in bookworm
(#1004441). That doesn't necessarily mean it'll be in future releases
(trixie or whatever), of course, but if it's easier for the dependency
chain, I'm open to discussing having chromium provide it.

I haven't followed all of this very long thread, so it may be
irrelevant at this point. :)

Soren Stoutner

Feb 16, 2023, 3:40:06 PM
It would be fine with me if Chromium provided the virtual package and symlink
used to build the .bdic files. My only concern is that it is important that
these always exist in Stable and Old Stable going forward. Otherwise, it
makes backporting Hunspell language packages more difficult (not impossible,
just more time consuming for the language package maintainers).

On Thursday, February 16, 2023 1:19:45 PM MST Andres Salomon wrote:
> Related to this - we got approval for chromium to ship in bookworm
> (#1004441). That doesn't necessarily mean it'll be in future releases
> (trixie or whatever), of course, but if it's easier for the dependency
> chain, I'm open to discussing having chromium provide it.
>
> I haven't followed all of this very long thread, so it may be
> irrelevant at this point. :)

--
Soren Stoutner
so...@stoutner.com

Rene Engelhard

Feb 17, 2023, 12:40:04 AM

On 16.02.23 at 21:37, Soren Stoutner wrote:
> It would be fine with me if Chromium provided the virtual package and symlink
> used to build the .bdic files. My only concern is that it is important that
> these always exist in Stable and Old Stable going forward.

Yes. Not only for backporting: as soon as e.g. scowl uses it, the
whole thing will be in a key package build-dependency chain
(libreoffice), as hunspell-en-us is used as part of its tests. Probably
even more.

Basically any hunspell dictionary then requires chromium to be there.
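
To make the chain concrete, here is a small sketch that walks a toy
build-dependency graph and collects everything a package transitively needs.
The graph is a simplified illustration of the scenario described above, not
real archive data, and convert-bdic is an assumed name for the virtual
package.

```python
from collections import deque


def build_closure(package, build_deps):
    """Return every package reachable from `package` via build_deps."""
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dep in build_deps.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Toy graph only: mixes source, binary and virtual packages for brevity.
TOY_GRAPH = {
    "libreoffice": ["hunspell-en-us"],   # used in its tests
    "hunspell-en-us": ["scowl"],         # built from the scowl source
    "scowl": ["convert-bdic"],           # assumed virtual package name
    "convert-bdic": ["chromium"],        # if chromium were the provider
}
```

With this toy graph, build_closure("libreoffice", TOY_GRAPH) contains
chromium, which is the problem: a key package would then depend on chromium
being in a releasable state.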


An on-and-off situation ("shipping dictionaries depending on whether
chromium is in a good state or not") is bad.


> Otherwise, it
> makes backporting Hunspell language packages more difficult (not impossible,
> just more time consuming for the language package maintainers).

No hunspell package really needs backporting at all, they are arch all
and can just be installed. :)

And be it with dpkg -i :-)


Regards,


Rene

Soren Stoutner

Mar 6, 2023, 3:31:53 PM

I have created a merge request to update the documentation to reflect the changes in the Qt packaging that have entered unstable with qt6-webengine 6.4.2-final+dfsg-1.


https://salsa.debian.org/debian/dictionaries-common/-/merge_requests/6


https://tracker.debian.org/pkg/qt6-webengine


Soren


--

Soren Stoutner

so...@stoutner.com
