On Thursday, September 22, 2022 9:20:46 AM MST Agustin Martin wrote:
> First of all, I am curious about the reasons behind this new format,
> the problems it deals with and its advantages. I assume they are valid
> enough, but they imply yet another spellchecking engine/format. We
> currently have good old ispell, aspell and hunspell. vim has its own
> spellchecker engine using its own format, with dicts that can be
> created from old myspell2 dicts. We did not add vim format dicts (from
> aspell dicts sources) since there seems to be some work to make vim
> use hunspell directly. And now these bdict dicts.
The .bdic format is specified by the upstream Chromium project, and is required by anything that is based off of Chromium's code, like Qt WebEngine. I do not know why they went with a proprietary binary format, but I would assume that if they went to so much trouble to not use the standard Hunspell format there must have been something to make it worthwhile, like some performance improvement. Perhaps I am giving Google too much credit for having logical reasons instead of making arbitrary decisions.
> From your info and proposed locations it seems that these dicts are
> arch:all; is that true?
I have not seen anything to indicate they are not arch:all, although that probably depends on how the binary data is processed. There might be an endianness issue.
> Another question is what happens with affix files, which I see are
> used at build time; are they used (from their path) at runtime or is
> all the info (dic+aff) bundled into the bdic file? If explicit affix
> files are still required at runtime, both bdic and aff files should
> probably be in the same dir. Otherwise I am more for a separate
> location. In this case, since bdic dicts seem to be more generic than
> just a qtwebengine issue and they are indeed created from hunspell
> files I would go for a rather generic name (maybe something like
> /usr/share/hunspell-bdic or something without the hunspell name?)
The .bdic binary file contains all the information from the .dic and .aff files, so neither of them is needed by Qt WebEngine. As such, I think a dedicated directory for the .bdic files is best.
My personal motivation for getting these dictionaries into Debian is that I am the developer of Privacy Browser, which is a web browser based on Qt WebEngine. The PC version is currently in a pre-alpha state.
https://www.stoutner.com/privacy-browser-pc/
When adding spell checking functionality, I realized that these dictionaries were not already packaged. The little bit of poking around that I did showed that Arch Linux packages them, but I do not know if other distributions do so.
https://archlinux.org/todo/packaging-qtwebengine-dictionaries/
There are a number of existing web browsers in Debian based on Qt WebEngine that could take advantage of the presence of these .bdic dictionaries. A non-exhaustive list includes: Konqueror, Falkon, qutebrowser, and angelfish. If it ends up being feasible for Chromium to also use a system-wide .bdic location, then any Chromium fork would also benefit.
Once Privacy Browser reaches an alpha release, my intention is to maintain a Debian package for it. I have the option of integrating the .bdics directly into the program's personal data folders, but that seems like a suboptimal approach, because anything else on the system that wanted to use them would need its own copy. When the binary dictionaries are installed in the correct system-wide folder, any Qt WebEngine program can use them with a single line of code that specifies which dictionary to use (only one can be active at a time). Of course, the program would probably also need a GUI where the user can select which dictionary should be active, and that takes more than a single line of code.
--
Soren Stoutner
I submitted three upstream bugs.
https://bugreports.qt.io/browse/QTBUG-107599
https://bugreports.qt.io/browse/QTBUG-107600
https://bugreports.qt.io/browse/QTBUG-107601
--
Soren Stoutner
This is Google’s page describing the .bdic format:
It doesn’t directly address the topic of endianness, but it does say the following:
“The .bdic files are always UTF-8 internally, and the convert_dict tool converts things appropriately when it runs.”
I must admit that the topic of endianness goes a bit beyond my expertise, but my understanding is that it is primarily an issue for executable files. As the .bdic is only a data file, and as the data encoded inside it is in UTF-8 as described above, would that mean that it is safe to assume that these are arch:all?
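To make the question concrete, here is a small sketch of what is and is not byte-order dependent. It only illustrates the general issue: the page quoted above does not say whether the .bdic header or trie store any multi-byte integers, and `bytes_of` is a made-up helper name.

```shell
# Print the bytes arriving on stdin as space-separated hex, one byte at a time.
# A byte-wise view like this does not depend on the host architecture.
bytes_of() {
    od -An -tx1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}
```

For example, `printf '\303\251' | bytes_of` (the UTF-8 encoding of "é") gives `c3 a9` on any machine, because UTF-8 is defined as a sequence of single bytes. By contrast, `od -An -tx2` groups the same two bytes into one 16-bit integer in host byte order, so its output differs between little-endian and big-endian systems. The arch:all question therefore comes down to whether the tool writes and reads any such integers in a fixed order.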
--
Soren Stoutner
On Sunday, November 13, 2022 3:13:55 PM MST Agustin Martin wrote:
> It is worth noting that even that 10-year-old code apparently has support for
> the IGNORE flag, unsupported by the .bdic dicts. Fortunately, it seems
> that there are not many dicts using that flag in
> libreoffice-dictionaries.
>
> libreoffice-dictionaries-7.4.2$ grep -r IGNORE *
> dictionaries/bo/bo.aff:IGNORE ༵༷
> dictionaries/ar/ar.aff:IGNORE ـٰ
> dictionaries/uk_UA/uk_UA.aff:IGNORE ́
> dictionaries/ckb/dictionaries/ckb.aff:IGNORE ـٰ١٢٣٤۴٥۵٦۶٧٨٩٠
> dictionaries/hu_HU/hu_HU.aff:IGNORE ()]
Thanks for catching these additional languages with the IGNORE command in their .aff files.
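For anyone who wants to repeat this check against another source tree, here is a minimal sketch. It assumes the directive always starts a line as `IGNORE` followed by whitespace, and `list_ignore_affs` is a made-up helper name, not part of any package.

```shell
# List the .aff files under a directory that declare an IGNORE directive.
# Assumption: the directive begins a line and is followed by whitespace.
list_ignore_affs() {
    dir=${1:-.}
    # find restricts the search to *.aff; grep -l prints only matching file names.
    find "$dir" -type f -name '*.aff' \
        -exec grep -lE '^IGNORE[[:space:]]' {} + | sort
}
```

Running `list_ignore_affs dictionaries` inside an unpacked libreoffice-dictionaries source tree should print one path per affected .aff file.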
I added notes to the Tibetan bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020479
And to the bug report for the LibreOffice dictionaries package, which ships Hungarian and Ukrainian:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020479
The issue with Arabic was previously reported:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020481
I was not able to find ckb.aff in the Debian archive. There is a ckb_IQ.aff which links to kmr_Latn.aff in the hunspell-kmr package, but that file does not include an IGNORE command and doesn’t produce an error when compiled to the .bdic format. Does anyone know if ckb.aff exists in Debian?
--
Soren Stoutner
On Thursday, November 17, 2022 3:18:17 PM MST Mattia Rizzolo wrote:
> What I do want to see *before* we actually release a lo-dicts with these
> is something that actually reads and makes use of them *first*.
Privacy Browser PC uses them.
https://www.stoutner.com/privacy-browser-pc/
I would like to package up an alpha release of Privacy Browser PC. One of the things I am waiting on is getting spell checking working correctly. Part of this is that Privacy Browser needs to be able to enumerate the installed dictionaries so that users can select between them. How it does this depends on where the dictionaries are placed and requires that the Qt packages have the correct symlinks. So I considered getting these dictionaries packaged in Debian to be an important first step.
Originally I planned to have Privacy Browser packaged by the end of the year, but at the current pace it looks like it is going to take longer than I expected to get all the prerequisites in place (this being one of them), so I am now looking at an early 2023 release (hopefully before the freeze).
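As a rough shell sketch of that enumeration (the browser itself would do the equivalent in its own code): it assumes the dictionaries end up in /usr/share/hunspell-bdic, which is only the name floated earlier in this thread, and `list_bdic_languages` is a made-up helper name.

```shell
# Print the language code of every .bdic file in a dictionary directory.
# Assumption: /usr/share/hunspell-bdic is the directory name proposed in this thread.
list_bdic_languages() {
    dir=${1:-/usr/share/hunspell-bdic}
    for f in "$dir"/*.bdic; do
        [ -e "$f" ] || continue      # the glob matched nothing
        basename "$f" .bdic          # en_US.bdic -> en_US
    done
}
```

Each line of output is a name that could be offered to the user in a language selection menu.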
> > At this point, the only question left is where this should be documented
> > and who should write the documentation. I am assuming that
> > /usr/share/doc/dictionaries-common-dev/dsdt-policy.html is the correct
> > place for documentation. I am willing to submit a PR if nobody else
> > prefers to do so instead.
>
> mh, what documentation are you talking about??
Documentation for packagers of future Hunspell languages, so that they know they need to include compiled .bdic files, where those files should be placed, and how to build them.
--
Soren Stoutner
I created an MR:
https://salsa.debian.org/debian/dictionaries-common/-/merge_requests/5
Please review and make sure I haven’t missed anything or misrepresented the consensus.
Agustin,
You are correct that there are currently two copies in Debian, one that comes with the Qt 5 packages and the other that comes with the Qt 6 packages.
Can one of the Debian Qt/KDE maintainers weigh in on the feasibility of either creating a metapackage that depends on the most recent package that includes qwebengine_convert_dict or creating an unversioned package that installs qwebengine_convert_dict? Likewise, could qwebengine_convert_dict either be installed in an unversioned location or have an unversioned symlink pointing to it? That would make it easier for Hunspell language packages to build-depend on qwebengine_convert_dict and wouldn’t require reworking all of those packages’ build scripts every time the version of Qt in Debian changes.
Regarding qwebengine_convert_dict expecting the .dic as a file entry, I am not certain I understand what you are referring to. This is how it builds on my Debian testing system. The .dic file must be in the same directory as the .aff, but it isn’t specified (or at least doesn’t need to be specified) as a file entry.
$ /usr/lib/qt5/bin/qwebengine_convert_dict en_US.aff en_US.bdic
en_US.dic_delta not found.
Reading en_US.aff
Reading en_US.dic
Serializing...
Verifying...
Writing en_US.bdic
Success. Dictionary converted.
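A package build would presumably run the same command once per language. Here is a minimal sketch of such a loop; `convert_all` and its three arguments are my own invention, and the only thing taken from the transcript above is the `converter input.aff output.bdic` calling convention.

```shell
# Convert every .aff/.dic pair in a source directory to a .bdic file.
# Arguments: converter command, source directory, output directory.
convert_all() {
    converter=$1
    src=$2
    out=$3
    mkdir -p "$out" || return 1
    for aff in "$src"/*.aff; do
        [ -e "$aff" ] || continue               # no .aff files at all
        lang=$(basename "$aff" .aff)
        if [ ! -f "$src/$lang.dic" ]; then      # the .dic must sit next to the .aff
            echo "skipping $lang: $lang.dic not found" >&2
            continue
        fi
        "$converter" "$aff" "$out/$lang.bdic" || return 1
    done
}
```

On my system this would be called as `convert_all /usr/lib/qt5/bin/qwebengine_convert_dict . bdic`.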
On Tuesday, December 13, 2022 1:52:20 AM MST Agustin Martin (<agma...@debian.org>) wrote:
>> Note that Debian has a different path for qwebengine_convert_dict
>> (/usr/lib/qt5/bin/qwebengine_convert_dict) and that it expects .dic
>> file as entry, Fixed.
>
>Just noticed that it is also in package and path you set, there are
>two possibilities. Expect new dictionaries-common package soon.
--
Soren Stoutner
Dmitry
It hasn’t been discussed, but I think it would make sense for Chromium to ship the convert_dict tool as it is the upstream for the project. I suppose the reason why the discussion was around how it is shipped in the Qt packages was because that is the only place it is currently shipped in Debian:
Andres, do you have any comments on the feasibility of shipping convert_dict as part of a Chromium package targeted at developers?
Dmitry, can you also comment about adding a symlink from /usr/share/qt5/qtwebengine_dictionaries to /usr/share/hunspell-dict as part of one of the libqt5webengine packages and from /usr/share/qt6/qtwebengine_dictionaries as part of one of the libqt6webengine packages?
There is some information about where Qt WebEngine searches for these dictionaries at https://doc.qt.io/qt-5/qtwebengine-features.html#spellchecker
In discussion with the Qt 5 maintainer, we have found a solution that does not use a symlink, which will be included in the upcoming 5.15.12+dfsg-3 release.
More information can be found at:
https://salsa.debian.org/qt-kde-team/qt/qtwebengine/-/merge_requests/12
On Monday, January 9, 2023 11:37:58 AM MST Soren Stoutner wrote:
> For sake of completeness, it was previously discussed that it would be
> possible to patch the Qt WebEngine source to directly look for the
> dictionaries in /usr/share/hunspell-bdic instead of the default upstream
> location. It is unclear how much ongoing maintenance effort that would
> entail, but it is a possible solution if the symlink is unacceptable.
I have submitted a merge request to the qt6webengine package that implements what has been discussed.
https://salsa.debian.org/qt-kde-team/qt6/qt6-webengine/-/merge_requests/4
Once it is merged, I will prepare a merge request for the documentation in this package that reflects these changes, specifically that Hunspell language packages should build-depend on the `convert-dict` virtual package and use the unversioned /usr/bin/convert_dict to create the .bdic files.
On Tuesday, January 31, 2023 1:25:59 PM MST Soren Stoutner wrote:
> In discussion with the Qt 5 maintainer, we have found a solution that does
> not use a symlink, which will be included in the upcoming 5.15.12+dfsg-3
> release.
>
> More information can be found at:
>
Hi,
Sorry for my absence, and sorry if it's already discussed:
Why convert-dict? Sounds too generic to me, can't we at least do qt-convert-dict or bdic-convert-dict or so?
Regards,
Rene
Yes, the packages will continue to ship the conversion tools under their current names in perpetuity. Because Qt goes through version transitions, there are often two versions of Qt available in Debian (currently Qt 5 and Qt 6), both of which will ship this tool under a versioned path. The latest version of Qt will ship a virtual package named `convert-bdic` that will install a symlink from /usr/bin/convert-bdic to the actual location of the conversion utility that is shipped with the newest version of Qt packaged in Debian.
When this commit is merged and released, the package providing the `convert-bdic` virtual package will be `qt6-webengine-dev-tools` and /usr/bin/convert-bdic will be a symlink to /usr/lib/qt6/libexec/qwebengine_convert_dict. Packages manually calling the Qt 5 version will need to be updated when Qt 5 is removed from Debian (which is likely still some time away).
There is currently no difference between the copies of `qwebengine_convert_dict` shipped with Qt 5 and Qt 6. Both are the same as the upstream Chromium `convert_dict`, which Google has not changed in a long time.
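Since the copies are equivalent, a build script could prefer the unversioned name and fall back to a versioned path. This is only a sketch: `find_converter` is a made-up helper name, and the candidate list is whatever the caller passes in.

```shell
# Print the first candidate that resolves to an executable, trying each in order.
# Candidates may be bare command names (looked up in PATH) or absolute paths.
find_converter() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1    # none of the candidates is available
}
```

For example: `find_converter convert-bdic /usr/lib/qt6/libexec/qwebengine_convert_dict /usr/lib/qt5/bin/qwebengine_convert_dict`.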
Soren Stoutner
I have created a merge request to update the documentation to reflect the changes in the Qt packaging that have entered unstable with qt6-webengine 6.4.2-final+dfsg-1.
https://salsa.debian.org/debian/dictionaries-common/-/merge_requests/6
https://tracker.debian.org/pkg/qt6-webengine
Soren
--
Soren Stoutner