svnadmin upgrade output message i18n issue


QXO

May 22, 2013, 10:13:34 PM
to us...@subversion.apache.org
OS: Windows
Encoding: GBK (chcp 936)

The first line of the "svnadmin upgrade" output is mis-encoded (UTF-8 bytes shown as GBK), but the second line is encoded correctly:

宸插彇寰楃増鏈簱閿佸畾銆?璇风◢鍊欙紱鍗囩骇鐗堟湰搴撳彲鑳介渶瑕佷竴娈垫椂闂?..

完成升级。

If I change the console encoding to UTF-8 (chcp 65001), the output is:

Repository lock acquired.
Please wait; upgrading the repository may take some time...

Upgrade completed.


Dongsheng Song

May 23, 2013, 4:00:21 AM
to QXO, us...@subversion.apache.org
I downloaded a binary package from win32svn [1] and confirmed your issue.

I checked the subversion.mo file:

msgunfmt.exe subversion.mo -o subversion.po

It looks OK. Then I replaced the intl3_svn.dll file with the one from gettext
0.18.2, which gave me different output:

C:\var\tmp>svnadmin upgrade test
已取得版本库锁定。
请稍候;升级版本库可能需要一段时间...

?\205?\234?\179?\201?\201?\253?\188?\182?\161?\163

C:\var\tmp>

Then I wrote a simple test program:

/*
 * Build: cl /MD /I. t-intl.c libintl-8.lib
 */
#include <stdio.h>
#include <locale.h>
#include <libintl.h>

#define _(S) gettext(S)

#define PACKAGE_NAME "subversion"

int main(int argc, char **argv)
{
    /* Adopt the user's locale so gettext can pick the language and codeset. */
    setlocale(LC_ALL, "");

    bindtextdomain(PACKAGE_NAME, "../share/locale");
    /* Deliberately left out, so messages arrive in the locale's codeset: */
    /* bind_textdomain_codeset(PACKAGE_NAME, "UTF-8"); */
    textdomain(PACKAGE_NAME);

/* libintl.h may redefine printf; use the C runtime's printf directly. */
#undef printf

    printf(_("Repository lock acquired.\n"
             "Please wait; recovering the repository may take some time...\n"));

    printf(_("\n"
             "Upgrade completed.\n"));

    return 0;
}

C:\var\tmp\svn-win32-1.7.9\bin>cl /nologo /MD /I. t-intl.c libintl-8.lib

C:\var\tmp\svn-win32-1.7.9\bin>t-intl.exe
已取得版本库锁定。
请稍候;修复版本库可能需要一段时间...

完成升级。

So this is a build issue in the binary package, not a Subversion issue.

[1] http://sourceforge.net/projects/win32svn/files/1.7.9/apache24/svn-win32-1.7.9-ap24.zip/download

Philip Martin

May 23, 2013, 4:17:06 AM
to QXO, us...@subversion.apache.org, d...@subversion.apache.org
[bringing in d...@s.a.o]
Those two lines are produced by different code paths. The first line
is produced by repos_notify_handler:

svn_error_clear(svn_stream_printf(feedback_stream, scratch_pool,
                                  _("Repository lock acquired.\n"
                                    "Please wait; upgrading the"
                                    " repository may take some time...\n")));

The second line is produced by:

SVN_ERR(svn_cmdline_printf(pool, _("\nUpgrade completed.\n")));

and svn_cmdline_printf uses svn_cmdline_cstring_from_utf8 to do a UTF8
to native conversion.

So it appears the UTF8 to native conversion is missing from
repos_notify_handler. I think repos_notify_handler should be using
svn_stream_printf_from_utf8 rather than svn_stream_printf.
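
For readers unfamiliar with the missing step, here is a minimal sketch of a UTF-8-to-native conversion using POSIX iconv. It is not Subversion's implementation: the helper name convert_from_utf8 and its buffer-based signature are invented for illustration, and whether a given iconv build recognises a codeset name such as "GBK" is an assumption about the platform.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a Subversion API.
 * Converts the NUL-terminated UTF-8 string `utf8` to `to_codeset` and
 * stores the result, followed by one NUL byte, in `out`.
 * Returns the number of converted bytes, or -1 on any failure
 * (unknown codeset, unconvertible input, or output buffer too small). */
long convert_from_utf8(const char *utf8, const char *to_codeset,
                       char *out, size_t outsize)
{
    assert(utf8 != NULL && to_codeset != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    iconv_t cd = iconv_open(to_codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;                      /* conversion pair not available */

    char *inptr = (char *)utf8;         /* iconv's prototype is not const-correct */
    size_t inleft = strlen(utf8);
    char *outptr = out;
    size_t outleft = outsize - 1;       /* keep room for the terminator */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);  /* flush shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outptr = '\0';
    return (long)(outptr - out);
}
```

On the reporter's console the target would presumably be "GBK" (code page 936). Writing the translated text without such a step is what makes the UTF-8 bytes show up as mojibake.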

--
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Dongsheng Song

May 23, 2013, 9:06:15 AM
to Philip Martin, QXO, us...@subversion.apache.org, d...@subversion.apache.org
No. From the GETTEXT(3) man page:

In both cases, the functions also use the LC_CTYPE locale facet in
order to convert the translated message from the translator's
codeset to the ***current locale's codeset***, unless overridden by a
prior call to the bind_textdomain_codeset function.

So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8
encoded; it is encoded in the ***current locale's codeset***.

--
Regards,
Dongsheng

Philip Martin

May 23, 2013, 9:11:19 AM
to QXO, us...@subversion.apache.org, d...@subversion.apache.org
Philip Martin <philip...@wandisco.com> writes:

> So it appears the UTF8 to native conversion is missing from
> repos_notify_handler. I think repos_notify_handler should be using
> svn_stream_printf_from_utf8 rather than svn_stream_printf.

I've fixed trunk to use svn_cmdline_cstring_from_utf8 and proposed it
for 1.8.

Dongsheng Song

May 23, 2013, 9:16:26 AM
to Philip Martin, QXO, us...@subversion.apache.org, d...@subversion.apache.org
On Thu, May 23, 2013 at 9:11 PM, Philip Martin
<philip...@wandisco.com> wrote:
> Philip Martin <philip...@wandisco.com> writes:
>
>> So it appears the UTF8 to native conversion is missing from
>> repos_notify_handler. I think repos_notify_handler should be using
>> svn_stream_printf_from_utf8 rather than svn_stream_printf.
>
> I've fixed trunk to use svn_cmdline_cstring_from_utf8 and proposed it
> for 1.8.
>

As the GETTEXT(3) man page says, your commit is OK if and only if
HAVE_BIND_TEXTDOMAIN_CODESET is defined.

So you should check HAVE_BIND_TEXTDOMAIN_CODESET when you use
svn_cmdline_cstring_from_utf8.

Philip Martin

May 23, 2013, 9:28:29 AM
to Dongsheng Song, QXO, us...@subversion.apache.org, d...@subversion.apache.org
Are you saying there is a problem with my change? If there is a problem,
doesn't it already apply to all other uses of svn_cmdline_cstring_from_utf8?

Dongsheng Song

May 23, 2013, 9:37:01 AM
to Philip Martin, QXO, us...@subversion.apache.org, d...@subversion.apache.org
On Thu, May 23, 2013 at 9:28 PM, Philip Martin
<philip...@wandisco.com> wrote:
> Dongsheng Song <dongshe...@gmail.com> writes:
>
>> On Thu, May 23, 2013 at 9:11 PM, Philip Martin
>> <philip...@wandisco.com> wrote:
>>> Philip Martin <philip...@wandisco.com> writes:
>>>
>>>> So it appears the UTF8 to native conversion is missing from
>>>> repos_notify_handler. I think repos_notify_handler should be using
>>>> svn_stream_printf_from_utf8 rather than svn_stream_printf.
>>>
>>> I've fixed trunk to use svn_cmdline_cstring_from_utf8 and proposed it
>>> for 1.8.
>>>
>>
>> As GETTEXT(3) man pages said, If and only if
>> defined(HAVE_BIND_TEXTDOMAIN_CODESET),
>> your commit is OK.
>>
>> So you should check HAVE_BIND_TEXTDOMAIN_CODESET when you use
>> svn_cmdline_cstring_from_utf8.
>
> Are you saying there is a problem with my change? If there is a problem
> doesn't already apply to all other uses of svn_cmdline_cstring_from_utf8?
>

I think so. In the subversion/libsvn_subr/nls.c file:

#ifdef HAVE_BIND_TEXTDOMAIN_CODESET
bind_textdomain_codeset(PACKAGE_NAME, "UTF-8");
#endif /* HAVE_BIND_TEXTDOMAIN_CODESET */

bind_textdomain_codeset is only called when HAVE_BIND_TEXTDOMAIN_CODESET
is defined. In that case, you can assume that the string GETTEXT(3) returns
is UTF-8 encoded.
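
The condition described above can also be inspected at run time. This is a minimal sketch, assuming GNU gettext semantics in which bind_textdomain_codeset(domain, NULL) queries the current binding without changing it and returns NULL when nothing has been bound. The helper names are hypothetical, and the exact spelling of codeset names ("UTF-8" here) is an assumption that may differ between platforms.

```c
#include <assert.h>
#include <langinfo.h>
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: name of the codeset in which gettext() is expected
 * to return messages for `domain`. If a codeset was bound explicitly, that
 * one wins; otherwise gettext converts to the current locale's codeset. */
const char *expected_message_codeset(const char *domain)
{
    assert(domain != NULL);
    /* A NULL codeset argument only queries the existing binding. */
    const char *bound = bind_textdomain_codeset(domain, NULL);
    return bound != NULL ? bound : nl_langinfo(CODESET);
}

/* Hypothetical helper: 1 if callers may treat gettext() results for
 * `domain` as UTF-8, 0 otherwise. Compares against "UTF-8" only. */
int messages_are_utf8(const char *domain)
{
    return strcmp(expected_message_codeset(domain), "UTF-8") == 0;
}
```

Code that passes gettext() results to a from-UTF-8 conversion is relying on messages_are_utf8() being true for its domain.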

Philip Martin

May 23, 2013, 10:06:43 AM
to Dongsheng Song, QXO, us...@subversion.apache.org, d...@subversion.apache.org
I still don't understand if you are claiming my change has a problem or
if there is a problem in all uses of svn_cmdline_cstring_from_utf8.

I recall a related thread from last year:

http://svn.haxx.se/dev/archive-2012-08/index.shtml#34
http://mail-archives.apache.org/mod_mbox/subversion-dev/201208.mbox/%3Cop.wilcelggnngjn5@tortoise%3E

I think we assume that the translations are UTF-8.

Is there some code change you think we should make?

Branko Čibej

May 23, 2013, 10:12:22 AM
to us...@subversion.apache.org
We do not "assume" the translations are UTF-8, we require them to be.

http://subversion.apache.org/docs/community-guide/l10n.html#po-mo-requirements

-- Brane

--
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com

Dongsheng Song

May 23, 2013, 10:42:29 AM
to Philip Martin, QXO, us...@subversion.apache.org, d...@subversion.apache.org
On Thu, May 23, 2013 at 10:06 PM, Philip Martin
Even if ALL the translations are UTF-8, GETTEXT(3) still returns the
string encoded in the ***current locale's codeset***.

Here is a snippet from the GETTEXT(3) man page:

In both cases, the functions also use the LC_CTYPE locale facet in
order to convert the translated message from the translator's
codeset to the ***current locale's codeset***, unless overridden by a
prior call to the bind_textdomain_codeset function.

So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8
encoded; it is encoded in the ***current locale's codeset***.

I think the best solution is: DO NOT convert the messages returned by
GETTEXT(3); write them ***AS IS***, since GETTEXT(3) already does the
correct conversion for us.

Erik Huelsmann

May 23, 2013, 11:02:59 AM
to Dongsheng Song, Philip Martin, dev, us...@subversion.apache.org, QXO

sent from my phone

But we call the codeset function to make sure we do not generate output in the current locale encoding.

> I think the best solution is: DO NOTconvert the GETTEXT(3) returned
> messages, write it ***AS IS***, since GETTEXT(3)  already do the
> correct conversion for us.

Well, even though gettext may want us to believe otherwise, this doesn't work for cross-platform applications: e.g. on Windows the locale for output on the console may be different from the locale for other uses. Back when we went with gettext (2004?), we hashed this through pretty thoroughly. I hope that discussion is still available in the archives.

Bye,

Erik.

Dongsheng Song

May 23, 2013, 11:15:43 AM
to Erik Huelsmann, Philip Martin, dev, us...@subversion.apache.org, QXO
As I said in my first email in this thread, gettext 0.18.2 and 0.14.1
behave differently; it seems that gettext 0.14.1 does not do the
correct thing. But do we still need to support this OLD and BUGGY
version?

Philip Martin

May 23, 2013, 11:29:46 AM
to Dongsheng Song, QXO, us...@subversion.apache.org, d...@subversion.apache.org
Dongsheng Song <dongshe...@gmail.com> writes:

> Even ALL the translations are UTF-8, GETTEXT(3) still return the
> string encoded by the ***current locale's codeset***.
>
> Here is sniped from the GETTEXT(3) man pages:
>
> In both cases, the functions also use the LC_CTYPE locale facet in
> order to convert the translated message from the translator's
> codeset to the ***current locale's codeset***, unless overridden by a
> prior call to the bind_textdomain_codeset function.

We do call bind_textdomain_codeset if it is available so we should be
getting UTF8 translations.

> So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8
> coded, it it encoded to the ***current locale's codeset***.
>
> I think the best solution is: DO NOTconvert the GETTEXT(3) returned
> messages, write it ***AS IS***, since GETTEXT(3) already do the
> correct conversion for us.

It's not that simple. We would have to change almost every error:

svn_error_createf(SVN_ERR_BAD_RELATIVE_PATH, NULL, \
_("Path '%s' must be an immediate child of " \
"the directory '%s'"), path, relative_to_dir)

and convert variable like 'path' and 'relative_to_dir' from UTF8 to
native before combining with the native translation.

What would be the gain for all that work? The only problem at present
is a system that doesn't have bind_textdomain_codeset but where gettext
returns the current locale encoding having converted it from the UTF8 in
the file. Are there any such systems? What about the opposite problem:
systems that don't have bind_textdomain_codeset and where gettext
returns UTF8 because that is the encoding in the file. Are there any
systems like that?

Erik Huelsmann

May 23, 2013, 11:38:21 AM
to Dongsheng Song, Philip Martin, dev, us...@subversion.apache.org, QXO


> >
> >> I think the best solution is: DO NOTconvert the GETTEXT(3) returned
> >> messages, write it ***AS IS***, since GETTEXT(3)  already do the
> >> correct conversion for us.
> >
> > Well, even though gettext may want us to believe otherwise, this doesn't
> > work for cross platform applications: e.g. in windows the locale for output
> > on the console may be different from the locale for other uses. Back when we
> > went with gettext (2004?), we've hashed this through pretty thoroughly. I
> > hope that discussion is still available in the archives.
> >
>
> As I said in the first email of this thread, gettext 0.18.2 and 0.14.1
> give me the different behavior, it seems that gettext 0.14.1 do not do
> the correct thing. But do we still need support this OLD and BUGGY
> version ?

That was not my point nor the point we discussed back then. As long as gettext tries to convert its translations to *any* encoding, it's flawed by design, because some systems have multiple active output encodings (e.g. Windows).

Unless this design has changed between 0.14 and 0.18, gettext() is still as broken as it was. Translating or not translating doesn't matter: it'll just be broken on other systems. Too bad the rest of it is actually pretty good.

Bye,

Erik.

Erik Huelsmann

May 23, 2013, 11:55:20 AM
to Dongsheng Song, users, QXO, Philip Martin, dev

Found at least one of the related discussions:

http://svn.haxx.se/dev/archive-2004-05/0078.shtml

bye,

Erik.

Dongsheng Song

May 23, 2013, 12:44:28 PM
to Erik Huelsmann, Philip Martin, dev, us...@subversion.apache.org, QXO
On Thu, May 23, 2013 at 11:38 PM, Erik Huelsmann <ehu...@gmail.com> wrote:
> That was not my point nor the point we discussed back then. As long as
> gettext tries to convert its translations to *any* encoding, it's flawed by
> design, because some systems have multiple active output encodings (e.g.
> Windows).
>

This does not matter. If I open two console windows, one using CP437 and
the other CP936, then svn in the CP437 window generates English (ASCII)
output and svn in the CP936 window generates Chinese (GBK/GB18030) output.

Dongsheng Song

May 23, 2013, 12:52:00 PM
to Philip Martin, QXO, us...@subversion.apache.org, d...@subversion.apache.org
On Thu, May 23, 2013 at 11:29 PM, Philip Martin
<philip...@wandisco.com> wrote:
> Dongsheng Song <dongshe...@gmail.com> writes:
>
>> Even ALL the translations are UTF-8, GETTEXT(3) still return the
>> string encoded by the ***current locale's codeset***.
>>
>> Here is sniped from the GETTEXT(3) man pages:
>>
>> In both cases, the functions also use the LC_CTYPE locale facet in
>> order to convert the translated message from the translator's
>> codeset to the ***current locale's codeset***, unless overridden by a
>> prior call to the bind_textdomain_codeset function.
>
> We do call bind_textdomain_codeset if it is available so we should be
> getting UTF8 translations.
>

On non-autotools systems, e.g. Windows, the user may not define
HAVE_BIND_TEXTDOMAIN_CODESET.

>> So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8
>> coded, it it encoded to the ***current locale's codeset***.
>>
>> I think the best solution is: DO NOTconvert the GETTEXT(3) returned
>> messages, write it ***AS IS***, since GETTEXT(3) already do the
>> correct conversion for us.
>
> It's not that simple. We would have to change almost every error:
>
> svn_error_createf(SVN_ERR_BAD_RELATIVE_PATH, NULL, \
> _("Path '%s' must be an immediate child of " \
> "the directory '%s'"), path, relative_to_dir)
>
> and convert variable like 'path' and 'relative_to_dir' from UTF8 to
> native before combining with the native translation.
>
> What would be the gain for all that work? The only problem at present
> is a system that doesn't have bind_textdomain_codeset but where gettext
> returns the current locale encoding having converted it from the UTF8 in
> the file. Are there any such systems? What about the opposite problem:
> systems that don't have bind_textdomain_codeset and where gettext
> returns UTF8 because that is the encoding in the file. Are there any
> systems like that?
>

Or we should call bind_textdomain_codeset whenever possible, and warn the
user if HAVE_BIND_TEXTDOMAIN_CODESET is not defined:

#ifdef HAVE_BIND_TEXTDOMAIN_CODESET
bind_textdomain_codeset(PACKAGE_NAME, "UTF-8");
#else
fprintf(stderr, "bind_textdomain_codeset not available, or not "
        "configured. Non-UTF8 locales may see garbled output.\n");
#endif /* HAVE_BIND_TEXTDOMAIN_CODESET */

Erik Huelsmann

May 23, 2013, 12:52:27 PM
to Dongsheng Song, Philip Martin, dev, us...@subversion.apache.org, QXO

One application has multiple active code page settings on Windows. Of course if your example was the only option, we would not be having this discussion.

Bye,

Erik.

sent from my phone

Dongsheng Song

May 23, 2013, 12:58:43 PM
to Erik Huelsmann, Philip Martin, dev, us...@subversion.apache.org, QXO
On Fri, May 24, 2013 at 12:52 AM, Erik Huelsmann <ehu...@gmail.com> wrote:
> One application has multiple active code page settings on Windows. Of course
> if your example was the only option, we would not be having this discussion.
>

Very interesting. As I understand it, an application can have only one
active locale per thread. When gettext() is called, the current locale is
unique. Could you give me an example?

Regards,
Dongsheng
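
One possible illustration of the "multiple active code pages" remark follows. The thread itself gives no sample, so reading it this way is an assumption. A Windows process has an ANSI code page, an OEM code page, and a console output code page, and chcp changes only the console's. The sketch below reports them through the Win32 calls GetACP, GetOEMCP and GetConsoleOutputCP; on other systems it falls back to the locale codeset. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

/* Hypothetical helper: writes a one-line summary of the encodings that are
 * in effect for this process into `buf`. Returns snprintf's result. */
int describe_output_encodings(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#ifdef _WIN32
    /* These three values are independent of each other and may differ. */
    return snprintf(buf, size,
                    "ANSI code page: %u, OEM code page: %u, "
                    "console output code page: %u",
                    GetACP(), GetOEMCP(), GetConsoleOutputCP());
#else
    /* No separate console code page here; report the locale's codeset. */
    return snprintf(buf, size, "locale codeset: %s", nl_langinfo(CODESET));
#endif
}
```

On a system like the reporter's, one would expect the ANSI code page to stay at 936 after "chcp 65001" while the console output code page becomes 65001, so text encoded for one would be wrong for the other.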

Philip Martin

May 23, 2013, 1:31:07 PM
to Dongsheng Song, QXO, us...@subversion.apache.org, d...@subversion.apache.org
Dongsheng Song <dongshe...@gmail.com> writes:

>> We do call bind_textdomain_codeset if it is available so we should be
>> getting UTF8 translations.
>>
>
> For non-autotools system, e.g. Windows, user may not define
> HAVE_BIND_TEXTDOMAIN_CODESET.

If you build the software with the wrong settings it may not work
properly. The solution is to build it with the correct settings.
Perhaps you can improve the Windows build.

> Or we should call bind_textdomain_codeset whenever possible, and warn the
> user if HAVE_BIND_TEXTDOMAIN_CODESET is not defined:
>
> #ifdef HAVE_BIND_TEXTDOMAIN_CODESET
> bind_textdomain_codeset(PACKAGE_NAME, "UTF-8");
> #else
> fprintf(stderr, "bind_textdomain_codeset not available, or not "
>         "configured. Non-UTF8 locales may see garbled output.\n");
> #endif /* HAVE_BIND_TEXTDOMAIN_CODESET */

That error would be annoying if it was a false alarm. Perhaps we could
verify the correct behaviour: call gettext and check whether the
returned string is valid UTF8? However, we are not getting reports of
problems so it's probably not worth the effort. The bug that started
this thread is about the exact opposite: gettext was returning UTF8 and
the output code was failing to convert to locale encoding.
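
The verification idea mentioned above (check whether a string returned by gettext is valid UTF-8) could look roughly like the following. This is a sketch with a hypothetical function name, not code from Subversion. It is only a heuristic: pure ASCII passes in any case, and some byte sequences in other encodings can happen to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the NUL-terminated string `s` is
 * well-formed UTF-8, 0 otherwise. Rejects stray continuation bytes,
 * truncated sequences, overlong forms, surrogates and values above
 * U+10FFFF. */
int is_valid_utf8(const char *s)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const unsigned int min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned int cp;
        int extra;                                /* continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return 0;                            /* invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)              /* also catches a premature NUL */
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min_cp[extra]) return 0;                 /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;       /* surrogate */
        if (cp > 0x10FFFF) return 0;                      /* out of range */
    }
    return 1;
}
```

A startup check could call this on one known translated message and fall back to untranslated output if the result is 0. Whether that is worth the effort is exactly the question raised in the message above.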

QXO

Jun 18, 2013, 10:54:03 PM
to Philip Martin, Dongsheng Song, us...@subversion.apache.org, d...@subversion.apache.org
The bug is fixed in svn 1.8.0. Thanks :)

2013/5/24 Philip Martin <philip...@wandisco.com>

QXO

Sep 6, 2013, 2:49:17 AM
to Philip Martin, Dongsheng Song, us...@subversion.apache.org, d...@subversion.apache.org
The bug is back in svn 1.8.3 (r1516576) :(


E:\svnmirror>D:\tools\svn-win32-1.8.3\bin\svnadmin.exe upgrade ZmccPrj
宸插彇寰楃増鏈簱閿佸畾銆?璇风◢鍊欙紱鍗囩骇鐗堟湰搴撳彲鑳介渶瑕佷竴娈垫椂闂?..

完成升级。


E:\svnmirror>D:\tools\svn-win32-1.8.0\bin\svnadmin.exe upgrade ZmccPrj
已取得版本库锁定。
请稍候;升级版本库可能需要一段时间...

完成升级。




2013/6/19 QXO <qxod...@gmail.com>

Philip Martin

Sep 6, 2013, 5:14:15 AM
to QXO, Dongsheng Song, us...@subversion.apache.org, d...@subversion.apache.org
QXO <qxod...@gmail.com> writes:

> The bug found in svn 1.8.3(r1516576) again :(
>
> E:\svnmirror>D:\tools\svn-win32-1.8.3\bin\svnadmin.exe upgrade ZmccPrj
> 宸插彇寰楃増鏈簱閿佸畾銆?璇风◢鍊欙紱鍗囩骇鐗堟湰搴撳彲鑳介渶瑕佷竴娈垫椂闂?..
> 完成升级。
>
> E:\svnmirror>D:\tools\svn-win32-1.8.0\bin\svnadmin.exe upgrade ZmccPrj
> 已取得版本库锁定。
> 请稍候;升级版本库可能需要一段时间...
> 完成升级。
>
> 2013/6/19 QXO <qxod...@gmail.com>
>
>> The bug fixed in svn 1.8.0,Thanks:)

As I recall, you are using a non-UTF8 encoding. There was a bug in 1.8.0
that caused double UTF8-to-native conversion for svnadmin; that was
fixed in 1.8.3. Since you say 1.8.0 worked, your build must be missing
one of the conversions. How did you build Subversion? Was it built with
ENABLE_NLS and HAVE_BIND_TEXTDOMAIN_CODESET?

Is anyone else using Windows in a non-UTF8 setup?

--
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Bert Huijben

Sep 6, 2013, 5:15:21 AM
to QXO, Dongsheng Song, us...@subversion.apache.org, Philip Martin

Which build of the Windows binaries do you use?

The Subversion project doesn't produce binaries, so it can't really support build issues.

Personally, I build the SlikSvn binaries (http://sliksvn.com/en/download), so if you find issues in those I might be able to help you.

Bert

QXO

Sep 6, 2013, 10:01:47 AM
to Bert Huijben, Dongsheng Song, us...@subversion.apache.org, Philip Martin