
Re: po statistics


Helge Kreutzmann

Nov 21, 2023, 12:10:05 AM
Hello,
On Mon, Nov 20, 2023 at 01:41:04PM +0100, Laura Arjona Reina wrote:
> El 20/11/23 a las 10:43, Thomas Lange escribió:
> > I'm still not sure which of the languages we need and which are just
> > bugs in packages. Or do we have a bug in the scripts that generate this
> > language list?
> > What about the AA_BB and AA@somestring languages?
> >
> > For e.g. I wonder why we have international/l10n/po/man_DE
> > which links only to this po file:
> > https://i18n.debian.org/material/po/unstable/main/i/i2p/installer/resources/locale-man/i2p_0.9.48-1.1_man_de.po.gz
> >
> > This po file clearly says
> > "Language: de\n"
> > Why is the language then called man_DE and not just "de"? Is this a
> > bug in our scripts?
>
> I think this is a bug, because those po files are named man_ because they
> are translations of the manual pages, not because of the Mandingo language.
>
> "Similar" thing happens with
> https://www.debian.org/international/l10n/po/bos_DE (and all other bos_XX
> links in https://www.debian.org/international/l10n/po/): they are
> translation files of the boswars package, which has two translation
> templates, one named boswars_version_xx.po.gz and another one named
> boswars_version_bos_xx.po.gz, so that "bos" is misdetected as the Bosnian
> language.
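To make the misdetection concrete, here is a minimal Python sketch. It assumes (this is not confirmed by the thread) that the statistics scripts take the language from whatever follows "<package>_<version>_" in the po file name; the helper names lang_from_filename, lang_from_header and effective_lang are hypothetical. Cross-checking against the "Language:" header, as Thomas suggests, recovers "de" for the i2p file.

```python
import re
from typing import Optional


def lang_from_filename(name: str) -> str:
    """Guess the language from '<package>_<version>_<rest>.po[.gz]'.

    Debian package names and versions cannot contain '_', so everything
    after the second underscore is taken as the "language" part.
    """
    base = name.removesuffix(".gz").removesuffix(".po")
    parts = base.split("_", 2)
    return parts[2] if len(parts) == 3 else ""


def lang_from_header(po_text: str) -> Optional[str]:
    """Return the value of the PO header line "Language: xx\\n", if any."""
    match = re.search(r'^"Language:\s*([^\\"]*)\\n"', po_text, re.MULTILINE)
    if match and match.group(1).strip():
        return match.group(1).strip()
    return None


def effective_lang(name: str, po_text: str) -> str:
    """Prefer the declared Language header; fall back to the file name."""
    return lang_from_header(po_text) or lang_from_filename(name)


if __name__ == "__main__":
    name = "i2p_0.9.48-1.1_man_de.po.gz"
    header = 'msgid ""\nmsgstr ""\n"Language: de\\n"\n'
    print(lang_from_filename(name))
    print(effective_lang(name, header))
```

The same filename rule turns boswars_version_bos_xx.po.gz into "bos_xx", which is how "bos" ends up looking like Bosnian.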

Ideally these would be fixed upstream, but the second-best option would be
to filter them, i.e. if it is an invalid combination (as shown), simply
present it under "DE" (in your example).

This, however, would require a list of all valid combinations. I don't
know if such a list exists.
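A rough Python sketch of that filter, under loudly stated assumptions: VALID_COMBINATIONS is a hypothetical, hand-picked sample (no authoritative list is known, as said above), and the fallback follows the "DE" example literally by lowercasing the country part. That happens to give the right language for man_DE and bos_DE, but would not be correct in general where country and language codes differ.

```python
import re

# Hypothetical, hand-picked sample of valid LANG_COUNTRY codes; a complete
# list may not exist and would have to be built or sourced somewhere.
VALID_COMBINATIONS = {"de_AT", "de_CH", "de_DE", "pt_BR", "zh_CN", "zh_TW"}

# Only plain "xx_YY" / "xxx_YY" codes are considered for folding.
CODE_RE = re.compile(r"([a-z]{2,3})_([A-Z]{2})")


def normalize(code: str, valid=VALID_COMBINATIONS) -> str:
    """Fold an invalid xx_YY code into its country part, lowercased."""
    match = CODE_RE.fullmatch(code)
    if match is None or code in valid:
        return code  # plain codes, @variants and valid pairs pass through
    return match.group(2).lower()
```

For example, normalize("man_DE") and normalize("bos_DE") both give "de", while "de_AT" and "ca_ES@valencia" are left untouched.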

Greetings

Helge

--
Dr. Helge Kreutzmann deb...@helgefjell.de
Dipl.-Phys. http://www.helgefjell.de/debian.php
64bit GNU powered gpg signed mail preferred
Help keep free software "libre": http://www.ffii.de/

Thomas Lange

Nov 23, 2023, 7:50:03 AM
Hi all,

thanks for all the feedback. It seems that there's more work to do
than I expected. For now I will not work on fixing the po
statistics, because I want to concentrate on the security pages.
So, if anyone else would like to work on this, feel free to do so.

--
regards Thomas

Holger Wansing

Jan 1, 2024, 5:40:04 PM
Hi,
I have a tested patch here which allows stripping some of the language
entries out of the 'langs' list, so those entries no longer show up at
https://www.debian.org/international/l10n/po/ and the respective html
pages are not built either.

As a first step I have chosen to delete most of the "Unknown language"
entries and some more, which are false positives caused by wrongly
named files/directories in packages.
It works fine here locally.

It works as follows:
When make is executed in the ./english/international/l10n/po directory,
the gen-files.sh script is called, which builds the list of languages
out of the material from https://i18n.debian.org/material/.
This list of languages is written to ./english/international/l10n/data/langs.

I have now created a script named strip-langs.sh, which removes selected
entries from this langs file (the entries to remove are defined in the
script itself).
strip-langs.sh is called after the gen-files.sh run,
and then the html files for all remaining languages are built.

With this, we would have an infrastructure to remove individual language
entries from the langs list one by one.
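The stripping step described above can be sketched in Python as follows. This is not the contents of strip-langs.sh; it assumes the langs file holds one language code per line, and the REMOVE set and function names are hypothetical stand-ins for the list defined inside the script.

```python
from pathlib import Path

# Hypothetical removal list; in strip-langs.sh the entries to remove are
# defined inside the script itself.
REMOVE = {"man_DE", "bos_DE"}


def strip_langs(lines, remove=REMOVE):
    """Keep the langs entries that are not in the removal set, in order."""
    return [line for line in lines if line.strip() not in remove]


def strip_langs_file(path: Path, remove=REMOVE) -> None:
    """Rewrite the langs file in place (assumed: one code per line)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    kept = strip_langs(lines, remove)
    path.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
```

Keeping the removal list as plain data, separate from the generator, is what allows entries to be dropped one at a time without touching gen-files.sh.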

As a further step, we could also remove entries like de_AT, de_CH, de_DE
and only leave 'de', as mentioned by Thomas.
But that's another discussion.


I have filed this as a merge request:
https://salsa.debian.org/webmaster-team/webwml/-/merge_requests/947

Holger


--
Holger Wansing <hwan...@mailbox.org>
PGP-Fingerprint: 496A C6E8 1442 4B34 8508 3529 59F1 87CA 156E B076

Thomas Lange

Jan 2, 2024, 3:30:04 AM
Hi Holger,

this script looks good to me. Thanks a lot for your work on this.
I think you can merge it.

>>>>> On Mon, 1 Jan 2024 23:39:09 +0100, Holger Wansing <hwan...@mailbox.org> said:


> In additional steps, we could also remove entries like de_AT, de_CH, de_DE
> and only leave 'de', as mentioned by Thomas.
> But that's another discussion.
I guess we do not want to remove those languages, but rather merge the
variants into de. But maybe this needs to be done in a different
script, maybe in gen-files.pl.
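The difference between removing and merging could look roughly like this Python sketch. It assumes the per-language data is a mapping from language code to a list of po files; MERGE_BASES and merge_variants are hypothetical names, and only "de" is folded so that genuinely distinct variants such as pt_BR or zh_TW stay separate.

```python
from collections import defaultdict

# Hypothetical: only these base languages get their country variants
# folded in; pt_BR, zh_CN or zh_TW are deliberately left separate.
MERGE_BASES = {"de"}


def merge_variants(files_by_lang, merge_bases=MERGE_BASES):
    """Merge e.g. de_AT, de_CH, de_DE into 'de' instead of dropping them."""
    merged = defaultdict(list)
    for code, files in files_by_lang.items():
        base = code.split("_", 1)[0]
        target = base if base in merge_bases else code
        merged[target].extend(files)
    return {code: sorted(files) for code, files in merged.items()}
```

Unlike stripping, no po file is lost: the files listed under de_AT or de_CH simply end up counted under de.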

--
regards Thomas