Bind const L-value reference to function return value

Frederick Gotham

Mar 17, 2020, 4:05:02 AM

I remember 15 years ago I frequently wrote code like this:

extern string Func(void); /* Defined elsewhere */

int main(void)
{
string const &str = Func();
}

That was back when we had the 1998 C++ Standard (i.e. before R-value references), and you could bind a const L-value reference to the return value of a function (in the hope that the compiler would optimise it by prolonging the lifetime of the temporary object).
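A minimal sketch of the lifetime extension described here, assuming a hypothetical `Tracer` type and `make_tracer` function that are not part of the original code. It records when the temporary is created and destroyed, so the log shows that the temporary survives until the reference goes out of scope.

```cpp
#include <string>
#include <vector>

// Hypothetical helper type: records construction, copying and destruction.
struct Tracer
{
    std::vector<std::string>* log;
    explicit Tracer(std::vector<std::string>& l) : log(&l) { log->push_back("ctor"); }
    Tracer(const Tracer& other) : log(other.log) { log->push_back("copy"); }
    ~Tracer() { log->push_back("dtor"); }
};

// Returns by value, so the call expression yields a temporary.
Tracer make_tracer(std::vector<std::string>& log) { return Tracer(log); }

std::vector<std::string> lifetime_extension_demo()
{
    std::vector<std::string> log;
    {
        // The temporary's lifetime is extended to that of the reference.
        const Tracer& t = make_tracer(log);
        (void)t;
        log.push_back("reference in use");
    }   // the temporary is destroyed here, together with the reference
    log.push_back("scope left");
    return log;
}
```

With copy elision (guaranteed for this return statement since C++17), the "dtor" entry should appear after "reference in use" and no "copy" entry should appear.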

Well anyway I had some problematic code yesterday on an embedded Linux 32-Bit ARM device. I was using Boost::Filesystem::directory_iterator to go through a list of all the files in a directory, and I had this code:

namespace bfs = boost::filesystem;

bfs::directory_iterator const end_itr; // Default ctor yields one-past-the-end

for ( bfs::directory_iterator i("/home/frederick"); i != end_itr; ++i )
{
// Skip if a directory (but allow regular file, symbolic link, etc.)
if( bfs::is_directory(i->status()) )
{
continue;
}

string const &filename = i->path().filename().string();

if( false == boost::regex_match(filename, "\d\d\d.jpg") )
{
continue;
}

// File matches
return filename;
}

return string();

If you look at that line in the middle of the loop:

string const &filename = i->path().filename().string();

For some reason this was giving me a garbage filename, and I had to change it to:

string const filename( i->path().filename().string() );

This fixed the problem.

I don't think there should have been a problem with the original code though.

Frederick Gotham

Mar 17, 2020, 4:06:52 AM

On Tuesday, March 17, 2020 at 8:05:02 AM UTC, Frederick Gotham wrote:

> if( false == boost::regex_match(filename, "\d\d\d.jpg") )

That string should actually be "\\d\\d\\d.jpg" but this is unrelated to the original problem.

Jorgen Grahn

Mar 17, 2020, 5:44:26 AM

On Tue, 2020-03-17, Frederick Gotham wrote:
>
> I remember 15 years ago I frequently wrote code like this:
>
> extern string Func(void); /* Defined elsewhere */
>
> int main(void)
> {
> string const &str = Func();
> }
>
> The was back when we had the 1998 C++ Standard, (i.e. before R-value
> references), and you could bind a const L-value reference to the
> return value of a function (in the hope that the compiler would
> optimise it by prolonging the lifetime of the temporary object).

I had to read up on that in TC++PL. Don't think I've ever written
code like that -- is there really a performance benefit to it?

> Well anyway I had some problematic code yesterday on an embedded
> Linux 32-Bit ARM device. I was using
> Boost::Filesystem::directory_iterator to go through a list of all
> the files in a directory, and I had this code:
>
> namespace bfs = boost::filesystem;
>
> bfs::directory_iterator const end_itr; // Default ctor yields one-past-the-end
>
> for ( bfs::directory_iterator i("/home/frederick"); i != end_itr; ++i )
> {
> // Skip if a directory (but allow regular file, symbolic link, etc.)
> if( bfs::is_directory(i->status()) )
> {
> continue;
> }
>
> string const &filename = i->path().filename().string();

I don't know Boost::Filesystem, and I got lost in their documentation,
as usual ...

But if string() returns a /reference/ to a std::string, you have a
completely different scenario: no temporary object gets created, and
you'd have to look elsewhere to figure out the lifetime of the object
behind 'filename'. I.e. in the Boost.Filesystem documentation.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Bonita Montero

Mar 17, 2020, 6:19:12 AM

> If you look at that line in the middle of the loop:
> string const &filename = i->path().filename().string();
> For some reason this was giving me a garbage filename, and I had to change it to:
> string const filename( i->path().filename().string() );

Check this:

#include <iostream>

using namespace std;

struct S
{
S() { cout << "S::S()" << endl; }
~S() { cout << "S::~S()" << endl; }
};

S fS()
{
S s;
return s;
}

int main()
{
cout << "before" << endl;
S &s = fS();
cout << "after" << endl;
}

This outputs this:

before
S::S()
S::~S()
after
S::~S()

So the temporary object is destroyed after the definition of s.
And when s goes out of scope the compiler tries to destroy the
referenced "temporary" again although it doesn't exist anymore.

Frederick Gotham

Mar 17, 2020, 7:07:20 AM

On Tuesday, March 17, 2020 at 10:19:12 AM UTC, Bonita Montero wrote:

> S &s = fS();

>
> So the temporary object is destroyed after the definition of s.
> And when s goes out of scope the compiler tries to destroy the
> referenced "temporary" again although it doesn't exist anymore.

The rules are different if it's a "reference to const".
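The difference can be checked at compile time. The following is a sketch using `std::is_convertible`, which reports whether an rvalue of the first type can initialise the second; the constant names are made up for illustration. In standard C++ an rvalue binds to a reference-to-const or to an rvalue reference, but not to a non-const lvalue reference (some compilers accept the latter as a non-standard extension).

```cpp
#include <string>
#include <type_traits>

// An rvalue std::string, such as a function's return value, can bind to:
constexpr bool binds_to_const_lvalue_ref =
    std::is_convertible<std::string, const std::string&>::value;
constexpr bool binds_to_rvalue_ref =
    std::is_convertible<std::string, std::string&&>::value;
// ... but not to a non-const lvalue reference:
constexpr bool binds_to_nonconst_lvalue_ref =
    std::is_convertible<std::string, std::string&>::value;

static_assert(binds_to_const_lvalue_ref, "rvalue binds to reference-to-const");
static_assert(binds_to_rvalue_ref, "rvalue binds to rvalue reference");
static_assert(!binds_to_nonconst_lvalue_ref, "rvalue does not bind to non-const lvalue reference");
```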

Frederick Gotham

Mar 17, 2020, 7:08:49 AM

On Tuesday, March 17, 2020 at 9:44:26 AM UTC, Jorgen Grahn wrote:

> But if string() returns a /reference/ to a std::string

This would be the most likely explanation.... Now if I could only find the relevant info in the Boost documentation. I might look in the header files but they can be a bit hairy with macros and templates and so forth.
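One way to answer this without reading the headers is to let the compiler report the return type. This is only a sketch: `ReturnsByValue` and `ReturnsByRef` are hypothetical stand-ins for the two possible accessor styles, and `string_returns_reference` is an illustrative helper, not part of any library.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the two possible accessor styles.
struct ReturnsByValue
{
    std::string name;
    std::string string() const { return name; }
};

struct ReturnsByRef
{
    std::string name;
    const std::string& string() const { return name; }
};

// True if calling string() on a const T yields a reference rather than an object.
template <class T>
constexpr bool string_returns_reference()
{
    return std::is_reference<decltype(std::declval<const T&>().string())>::value;
}

static_assert(!string_returns_reference<ReturnsByValue>(), "by-value accessor");
static_assert(string_returns_reference<ReturnsByRef>(), "by-reference accessor");
```

Applying the same helper to the real path type in a `static_assert` should show which of the two cases applies on a given platform.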

Bonita Montero

Mar 17, 2020, 7:55:09 AM

>> S &s = fS();

>> So the temporary object is destroyed after the definition of s.
>> And when s goes out of scope the compiler tries to destroy the
>> referenced "temporary" again although it doesn't exist anymore.

> The rules are different if it's a "reference to const".

You're right, but not only with const:

#include <iostream>

using namespace std;

struct S
{

S( S & ) { cout << "S::S( S & )" << endl; }

S() { cout << "S::S()" << endl; }
~S() { cout << "S::~S()" << endl; }
};

S fS()
{
S s;
return s;
}

int main()
{
cout << "before" << endl;
S &s = fS();
cout << "after" << endl;
}

before
S::S()
S::S( S & )

S::~S()
after
S::~S()

Seems there's no real reference but a copy.

Felix Palmen

Mar 17, 2020, 8:50:10 AM

* Jorgen Grahn <grahn...@snipabacken.se>:

> On Tue, 2020-03-17, Frederick Gotham wrote:
>> you could bind a const L-value reference to the
>> return value of a function (in the hope that the compiler would
>> optimise it by prolonging the lifetime of the temporary object).
>
> I had to read up on that in TC++PL. Don't think I've ever written
> code like that -- is there really a performance benefit to it?

I think this is a red herring. The code is well-defined, but as
automatic lifetime still applies, a copy must be taken for the caller so
it can reference it. This could only be optimized away if copying is
guaranteed to have no side effects. Even then, how would a typical
implementation, using a stack frame for automatic storage of objects,
handle this kind of optimization?

> But if string() returns a /reference/ to a std::string, you have a
> completely different scenario: no temporary object gets created

Which is almost certainly the case here. It makes sense that a path
holds a filename and an accessor just returns a reference.

--
Dipl.-Inform. Felix Palmen <fe...@palmen-it.de> ,.//..........
{web} http://palmen-it.de {jabber} [see email] ,//palmen-it.de
{pgp public key} http://palmen-it.de/pub.txt // """""""""""
{pgp fingerprint} A891 3D55 5F2E 3A74 3965 B997 3EF2 8B0A BC02 DA2A

Paavo Helde

Mar 17, 2020, 10:19:00 AM

I copied your code into a .cpp file and went to the definition of
string() via a single context menu click:

# ifdef BOOST_WINDOWS_API
const std::string string() const { return string(codecvt()); }
// ...
# else // BOOST_POSIX_API
// string_type is std::string, so there is no conversion
const std::string& string() const { return m_pathname; }

It appears the return type is a reference in Linux and an object in
Windows. Most devious!
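A sketch of why the reference version fails with the POSIX branch quoted above, assuming a hypothetical `Name` class and `make_name` function in place of the real path type and `filename()`. The reference binds to a member of a temporary object, so no lifetime extension applies to that object; the log records when it is destroyed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a path-like class whose accessor returns a
// reference to a member.
class Name
{
public:
    Name(std::string s, std::vector<std::string>& log)
        : text_(std::move(s)), log_(&log) {}
    ~Name() { log_->push_back("Name destroyed"); }
    const std::string& string() const { return text_; }
private:
    std::string text_;
    std::vector<std::string>* log_;
};

// Hypothetical stand-in for filename(): returns a new object by value.
Name make_name(std::vector<std::string>& log) { return Name("123.jpg", log); }

std::vector<std::string> dangling_demo()
{
    std::vector<std::string> log;
    {
        // Binds to a member of the temporary Name, not to a temporary
        // std::string, so the Name dies at the end of this statement.
        [[maybe_unused]] const std::string& dangling = make_name(log).string();
        log.push_back("after reference");   // 'dangling' must not be read here
    }
    {
        // Copying into a local object is safe on both platforms.
        const std::string copy = make_name(log).string();
        log.push_back("copy holds " + copy);
    }
    return log;
}
```

In both blocks the temporary is destroyed before the next statement runs; only the copy remains usable afterwards.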

Bonita Montero

Mar 17, 2020, 10:21:47 AM

> #include <iostream>
>
> using namespace std;
>
> struct S
> {
>     S( S & ) { cout << "S::S( S & )" << endl; }
>     S()       { cout << "S::S()"       << endl; }
>     ~S()      { cout << "S::~S()"      << endl; }
> };
>
> S fS()
> {
>     S s;
>     return s;
> }
>
> int main()
> {
>     cout << "before" << endl;
>     S &s = fS();
>     cout << "after" << endl;
> }
>
> before
> S::S()
> S::S( S & )
> S::~S()
> after
> S::~S()
>
> Seems there's no real reference but a copy.

Oh, the above code only works with g++ if I define S as const.

Alf P. Steinbach

Mar 17, 2020, 12:48:08 PM

That's crazy.

Additionally, at least for the standard library's filesystem, the
encoding is system specific, UTF-8 in Linux and Windows ANSI in Windows.

Happily since May last year it's possible to set UTF-8 as the process'
Windows ANSI encoding, but.

Until C++20 you could avoid this silliness by using .u8string() instead.

In C++20 the return type of that was/will be changed to an incompatible one, `std::u8string`.

- Alf

Alf P. Steinbach

Mar 17, 2020, 12:58:31 PM

On 17.03.2020 09:04, Frederick Gotham wrote:
>
> I remember 15 years ago I frequently wrote code like this:
>
> extern string Func(void); /* Defined elsewhere */
>
> int main(void)

Don't do that. It's a C-ism.

> {
> string const &str = Func();

Don't do that.

It has no performance effect and opens the door to the subtle bug of only
keeping a returned reference. The only reasonable use of it that I know
was in Petru Marginean's ScopeGuard. And in C++11 and later there is no
need for that particular trick.
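The "no performance effect" point can be illustrated by counting copy-constructor calls. This is a sketch with a hypothetical `Counted` type; it assumes copy elision, which C++17 guarantees for this kind of return statement and which common compilers also perform in earlier modes.

```cpp
#include <string>

// Hypothetical type that counts how often it is copy-constructed.
struct Counted
{
    static int copies;
    std::string payload;
    Counted() : payload("payload") {}
    Counted(const Counted& other) : payload(other.payload) { ++copies; }
};
int Counted::copies = 0;

Counted make_counted() { return Counted(); }

// Copies made when the result is bound to a reference-to-const.
int copies_when_bound_to_reference()
{
    Counted::copies = 0;
    const Counted& r = make_counted();
    (void)r;
    return Counted::copies;
}

// Copies made when the result initialises an ordinary local object.
int copies_when_stored_by_value()
{
    Counted::copies = 0;
    const Counted v = make_counted();
    (void)v;
    return Counted::copies;
}
```

Under that assumption both functions report the same count, so the reference saves nothing over a plain local object.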

> }
>
> The was back when we had the 1998 C++ Standard, (i.e. before R-value
> references), and you could bind a const L-value reference to the
> return value of a function (in the hope that the compiler would
> optimise it by prolonging the lifetime of the temporary object).
>
> Well anyway I had some problematic code yesterday on an embedded
> Linux 32-Bit ARM device. I was using
> Boost::Filesystem::directory_iterator to go through a list of all the
> files in a directory, and I had this code:
>
> namespace bfs = boost::filesystem;
>
> bfs::directory_iterator const end_itr; // Default ctor yields one-past-the-end
>
> for ( bfs::directory_iterator i("/home/frederick"); i != end_itr; ++i )

I would re-express that with a range based `for`. Depending on the
particulars of the filesystem library and C++ version that may require a
little support class. Here's an example support class: <url:
https://github.com/alf-p-steinbach/cppx-core-language/blob/master/source/cppx-core-language/syntax/collection-util/Span_.hpp>.

> {
> // Skip if a directory (but allow regular file, symbolic link, etc.)
> if( bfs::is_directory(i->status()) )
> {
> continue;
> }
>
> string const &filename = i->path().filename().string();

Again, don't do that.

> if( false == boost::regex_match(filename, "\d\d\d.jpg") )

Better expressed with `not`.

> {
> continue;

As I see it, preferably use `continue` in the same way as early
`return`, only at start of block to get rid of errors or trivial cases.

> }
>
> // File matches
> return filename;
> }
>
> return string();

Consider writing just `return "";`, I think that's more clear.

Or alternatively the general "return default constructed
instance"-idiom, namely `return {};`.

> If you look at that line in the middle of the loop:
>
> string const &filename = i->path().filename().string();
>
> For some reason this was giving me a garbage filename, and I had to change it to:
>
> string const filename( i->path().filename().string() );
>
> This fixed the problem.
>
> I don't think there should have been a problem with the original code though.

Binding to reference function result.

- Alf

Felix Palmen

Mar 17, 2020, 2:08:12 PM

* Alf P. Steinbach <alf.p.stein...@gmail.com>:

> Additionally, at least for the standard library's filesystem, the
> encoding is system specific, UTF-8 in Linux and Windows ANSI in Windows.

Without knowing this particular API: This can't be entirely true. It
seems to be something handling filenames. The thing is: A filename on a
Unix system is just an array of octets, with no encoding information
attached or implied. So it's UTF-8 IFF the file in question was named
while a UTF-8 locale was in effect.

On Windows, it's a different thing: Filenames are 16bit wide characters,
encoded in UTF-16. That seems to be the reason why the Windows
implementation returns a newly created object: It must convert the
native Windows filename to an 8-bit string. This whole thing is a
recurring PITA, as Windows API calls based on `char` (and also e.g. the
Standard C library functions like `fopen()`) indeed use a Windows ANSI
encoding, so information is lost. IMHO the only sane thing to do on
Windows when you need 8bit strings is to enforce using UTF-8, and
convert it to/from UTF-16 whenever needed to use exclusively the
wide-string APIs.
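For illustration, a portable sketch of the UTF-8 to UTF-16 direction of such a conversion. The function name `utf8_to_utf16` is hypothetical; it rejects truncated sequences, bad continuation bytes and surrogate code points, but it does not reject overlong encodings, so it is not a full validator.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Converts UTF-8 to UTF-16, throwing std::invalid_argument on malformed input.
inline std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3Fu);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

On Windows the result could be passed to the wide-string APIs after copying into a `std::wstring`, since `wchar_t` is 16 bits wide there.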

Alf P. Steinbach

Mar 18, 2020, 2:13:27 AM

On 17.03.2020 19:07, Felix Palmen wrote:
> * Alf P. Steinbach <alf.p.stein...@gmail.com>:
>> Additionally, at least for the standard library's filesystem, the
>> encoding is system specific, UTF-8 in Linux and Windows ANSI in Windows.
>
> Without knowing this particular API: This can't be entirely true. It
> seems to be something handling filenames. The thing is: A filename on a
> Unix system ist just an array of octets, with no encoding information
> attached or implied.

The "implied" is incorrect.

Today the implied encoding of a Linux filename is UTF-8.

I guess a lot of tools will have difficulties with filenames that are
invalid as UTF-8, even though technically wrt. the OS API they're valid
filenames. 20 years ago they would be good. In that period that changed.

> So it's UTF-8 IFF the file in question was named
> while an UTF-8 locale was in effect.

That's true as far as it goes, which is a micro-meter or so.

> On Windows, it's a different thing: Filenames are 16bit wide characters,
> encoded in UTF-16. That seems to be the reason why the Windows
> implementation returns a newly created object: It must convert the
> native Windows filename to an 8-bit string.

Right, but only because the library chooses to use UTF-16 internally in
Windows.

IIRC the original filesystem v3 spec had that as a requirement.

With the standardization, instead the iostreams were outfitted with
constructors taking `std::filesystem::path`.

> This whole thing is a
> recurring PITA, as Windows API calls based on `char` (and also e.g. the
> Standard C library functions like `fopen()`) indeed use a Windows ANSI
> encoding, so information is lost.

No, information is only lost when the process Windows ANSI codepage is
not UTF-8.

Support for UTF-8 Windows ANSI process codepages was added in May last year.

> IMHO the only sane thing to do on
> Windows when you need 8bit strings is to enforce using UTF-8,

That's now, in the last few years, become a good idea.

Because now also the system compiler, Visual C++, supports UTF-8 as
execution character set, as well as default source encoding.

Still worth noting that without setting UTF-8 process codepage

* `main` arguments will be ANSI encoded,
* ditto for `char` based environment variables,
* narrow output to wide streams will be treated as ANSI,
* narrow filenames will be assumed to be ANSI,
* all the locale dependent functionality will be ungood.

> and convert it to/from UTF-16 whenever needed to use exclusively the
> wide-string APIs.

That used to be my advice for Windows.

However, if one doesn't have to support Windows versions earlier than
May 2019, then there is no need to do all this conversion.

- Alf

Felix Palmen

Mar 18, 2020, 3:18:13 AM

* Alf P. Steinbach <alf.p.stein...@gmail.com>:

> The "implied" is incorrect.
>
> Today the implied encoding of a Linux filename is UTF-8.

Wrong. The filename is an opaque octet sequence. Linux (as well as other
POSIX systems) starts up in a C locale, where characters with bit 7 set
aren't even defined, and will happily use any other encoding specified
in the locale. So there's nothing implied. "Implied" doesn't mean that a
vast majority of users configures it that way. For a sane program, it's
reasonable to assume that all filenames follow the encoding of the
current locale (cause that's the best thing it can do). It's not
reasonable to just assume UTF-8.
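As a sketch of that approach on POSIX systems, a program can adopt the user's locale and ask which encoding it implies, instead of assuming UTF-8. The function names are hypothetical; `setlocale` and `nl_langinfo(CODESET)` are the standard C and POSIX calls. This does not compile on Windows.

```cpp
#include <clocale>
#include <langinfo.h>   // POSIX only
#include <string>

// Name of the character encoding of the currently active LC_CTYPE locale.
inline std::string current_codeset()
{
    return nl_langinfo(CODESET);
}

// Adopts the locale from the environment (LANG, LC_ALL, ...) and then
// reports the encoding it specifies. A program starts in the "C" locale
// until it calls setlocale itself.
inline std::string user_codeset()
{
    std::setlocale(LC_ALL, "");
    return current_codeset();
}
```

The returned name (for example an ISO-8859-1 or UTF-8 identifier, depending on the system) is what filenames would then be assumed to be encoded in.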

>> On Windows, it's a different thing: Filenames are 16bit wide characters,
>> encoded in UTF-16.
>

> Right, but only because the library chooses to use UTF-16 internally in
> Windows.

No, because Windows uses UTF-16 internally and in all API calls for
anything Unicode. The library just decided to follow this design
decision.

>> This whole thing is a
>> recurring PITA, as Windows API calls based on `char` (and also e.g. the
>> Standard C library functions like `fopen()`) indeed use a Windows ANSI
>> encoding, so information is lost.
>
> No, information is only lost when the process Windows ANSI codepage is
> not UTF-8.

Which is almost always the case. Windows always uses its 8bit single
character codepages like eg CP-1252.

> Support for UTF-8 Windows ANSI process codepages was added in May last year.

Sure, but AFAIK you can't configure it system-wide, and, more
importantly, it can't be made the default for the all-sacred backwards
compatibility (and you can't use it anyways if you want your program to
work with slightly older versions of Windows).

> Still worth noting that without setting UTF-8 process codepage
>
> * `main` arguments will be ANSI encoded,
> * ditto for `char` based environment variables,
> * narrow output to wide streams will be treated as ANSI,
> * narrow filenames will be assumed to be ANSI,
> * all the locale dependent functionality will be ungood.

For all these issues, there are functions in either the Windows API
(like e.g. GetCommandLineW()) or the C runtime (like e.g. _wfopen()). Of
course, this means a bit of conditional compilation is needed for
portability to Windows.
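A sketch of what that conditional compilation might look like for opening a file whose name is held as UTF-8. The helper names `widen_utf8` and `open_utf8` are hypothetical; `MultiByteToWideChar` and `_wfopen` are the Windows API and C runtime functions. Error handling for a failed conversion is omitted for brevity.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

#ifdef _WIN32
// Converts a UTF-8 string to the UTF-16 form expected by wide Windows APIs.
inline std::wstring widen_utf8(const std::string& s)
{
    if (s.empty()) {
        return {};
    }
    const int size = static_cast<int>(s.size());
    // First call: ask how many wide characters are needed.
    const int n = MultiByteToWideChar(CP_UTF8, 0, s.data(), size, nullptr, 0);
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, s.data(), size, &w[0], n);
    return w;
}
#endif

// Opens a file given a UTF-8 path; returns nullptr on failure, like fopen.
inline std::FILE* open_utf8(const std::string& path, const std::string& mode)
{
#ifdef _WIN32
    return _wfopen(widen_utf8(path).c_str(), widen_utf8(mode).c_str());
#else
    // POSIX passes the bytes through to the OS unchanged.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

The rest of the program can then call `open_utf8` everywhere and keep its strings in one 8-bit encoding.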

> However, if one doesn't have to support Windows versions earlier than
> May 2019, then there is no need to do all this conversion.

Sure. And that's a good thing. Let's start talking about this in another
5 years. People still use Windows 7 right now, ignoring all sanity...

Alf P. Steinbach

Mar 18, 2020, 3:31:05 AM

On 18.03.2020 08:16, Felix Palmen wrote:
> * Alf P. Steinbach <alf.p.stein...@gmail.com>:
>> The "implied" is incorrect.
>>
>> Today the implied encoding of a Linux filename is UTF-8.
>
> Wrong.

That's sort of braindead. We've already discussed this. I hope I will
never use an app made by you.

> The filename is an opaque octet sequence.

Yes, at the OS API level it is that.

> Linux (as well as other
> POSIX systems) starts up in a C locale, where characters with bit 7 set
> aren't even defined, and will happily use any other encoding specified
> in the locale. So there's nothing implied. "Implied" doesn't mean that a
> vast majority of users configures it that way. For a sane program, it's
> reasonable to assume that all filenames follow the encoding of the
> current locale (cause that's the best thing it can do). It's not
> reasonable to just assume UTF-8.

Again, braindead.

>>> On Windows, it's a different thing: Filenames are 16bit wide characters,
>>> encoded in UTF-16.
>>
>> Right, but only because the library chooses to use UTF-16 internally in
>> Windows.
>
> No, because Windows uses UTF-16 internally and in all API calls for
> anything Unicode. The library just decided to follow this design
> decision.

The "No" is meaningless, the rest is correct but doesn't support the "no".

Again, braindead.

>>> This whole thing is a
>>> recurring PITA, as Windows API calls based on `char` (and also e.g. the
>>> Standard C library functions like `fopen()`) indeed use a Windows ANSI
>>> encoding, so information is lost.
>>
>> No, information is only lost when the process Windows ANSI codepage is
>> not UTF-8.
>
> Which is almost always the case. Windows always uses its 8bit single
> character codepages like eg CP-1252.

You have just been informed otherwise, in the posting you're responding to.

Sounds like irrational denial to me.

>> Support for UTF-8 Windows ANSI process codepages was added in May last year.
>
> Sure, but AFAIK you can't configure it system-wide,

You can.

You're just making this up as you go, aren't you?

> and, more
> importantly, it can't be made the default for the all-sacred backwards
> compatibility (and you can't use it anyways if you want your program to
> work with slightly older versions of Windows).

I failed to parse that, sorry.

>> Still worth noting that without setting UTF-8 process codepage
>>
>> * `main` arguments will be ANSI encoded,
>> * ditto for `char` based environment variables,
>> * narrow output to wide streams will be treated as ANSI,
>> * narrow filenames will be assumed to be ANSI,
>> * all the locale dependent functionality will be ungood.
>
> For all these issues, there are functions in either the Windows API
> (like e.g. GetCommandLineW()) or the C runtime (like e.g. _wfopen()). Of
> course, this means a bit of conditional compilation is needed for
> portability to Windows.

Yes there are workarounds. And it's worth noting.

>> However, if one doesn't have to support Windows versions earlier than
>> May 2019, then there is no need to do all this conversion.
>
> Sure. And that's a good thing. Let's start talking about this in another
> 5 years. People still use Windows 7 right now, ignoring all sanity...

You shouldn't talk too loudly about the last word you used there.

- Alf

Jorgen Grahn

Mar 18, 2020, 3:35:41 AM

On Wed, 2020-03-18, Felix Palmen wrote:
> * Alf P. Steinbach <alf.p.stein...@gmail.com>:
>> The "implied" is incorrect.
>>
>> Today the implied encoding of a Linux filename is UTF-8.
>
> Wrong. The filename is an opaque octet sequence. Linux (as well as other
> POSIX systems) starts up in a C locale, where characters with bit 7 set
> aren't even defined, and will happily use any other encoding specified
> in the locale. So there's nothing implied. "Implied" doesn't mean that a
> vast majority of users configures it that way. For a sane program, it's
> reasonable to assume that all filenames follow the encoding of the
> current locale (cause that's the best thing it can do). It's not
> reasonable to just assume UTF-8.

I agree, but (IIRC, and e.g.) Debian no longer handles bug reports for
tools which hardcode UTF-8. At the same time they ship non-utf8
locales, so there are some contradictions.

My filenames are in Latin-1 and so is my LC_CTYPE. There are a few
tools which have problems with that. This may originate in some GUI
toolkit misfeature.

Felix Palmen

Mar 18, 2020, 4:40:10 AM

* Alf P. Steinbach <alf.p.stein...@gmail.com>:

[bullshit]

*plonk*

Felix Palmen

Mar 18, 2020, 4:46:13 AM

* Jorgen Grahn <grahn...@snipabacken.se>:

> I agree, but (IIRC, and e.g.) Debian no longer handles bug reports for
> tools which hardcode UTF-8. At the same time they ship non-utf8
> locales, so there are some contradicitons.

I fail to see the contradiction here? Tools hardcoding UTF-8 will fail
in non-utf8 locales, so isn't it just consistent to not support them?

> My filenames are in Latin-1 and so is my LC_CTYPE. There are a few
> tools which have problems with that. This may originate in some GUI
> toolkit misfeature.

I don't see any practical reason any more *not* to use UTF-8 nowadays,
apart from maybe having to deal with some (commercial?) Unix systems
that still don't deal well with it. But that's a personal decision, and
any software blindly assuming UTF-8 while other encodings are still
supported by the OS is just broken.

Jorgen Grahn

Mar 18, 2020, 5:37:08 AM

On Wed, 2020-03-18, Felix Palmen wrote:

> * Jorgen Grahn <grahn...@snipabacken.se>:
>> I agree, but (IIRC, and e.g.) Debian no longer handles bug reports for
>> tools which hardcode UTF-8. At the same time they ship non-utf8
>> locales, so there are some contradicitons.
>
> I fail to see the contradiction here? Tools hardcoding UTF-8 will fail
> in non-utf8 locales, so isn't it just consequent to not support them?

The contradiction would be shipping non-utf-8 locales, but not
handling bugs activated by those locales. But note that I haven't
investigated this in detail. I don't have a reference at hand for "no
longer handles bug reports", and I see the Debian Reference only
/recommends/ UTF-8.

>> My filenames are in Latin-1 and so is my LC_CTYPE. There are a few
>> tools which have problems with that. This may originate in some GUI
>> toolkit misfeature.
>
> I don't see any practical reason any more *not* to use UTF-8 nowadays,
> apart from maybe having to deal with some (commercial?) Unix systems
> that still don't deal well with it.

My reasons are offtopic here, but basically I don't want to go through
and modify my archived data.

> But that's a personal decision, and
> any software blindly assuming UTF-8 while other encodings are still
> supported by the OS is just broken.

James Kuyper

Mar 18, 2020, 10:02:49 AM

On 3/18/20 3:30 AM, Alf P. Steinbach wrote:
> On 18.03.2020 08:16, Felix Palmen wrote:
>> * Alf P. Steinbach <alf.p.stein...@gmail.com>:

...

>>> Support for UTF-8 Windows ANSI process codepages was added in May last year.
>>
>> Sure, but AFAIK you can't configure it system-wide,
>
> You can.

I realize that the two of you are enjoying insulting each other, so I'm
not sure you'd appreciate this interruption. However, this particular
issue is a simple matter of fact: it's either true or false, and can be
verified by anyone with a sufficiently up-to-date version of Windows.
That being the case, a more useful response would have explained
precisely what you need to do to configure it system-wide. Then he would
either have to concede that it is possible to do so, or report that it
doesn't work.
He could also change his mind, and instead of saying it can't be done,
say that there's some disadvantage to doing so. Since the importance of
such a disadvantage is entirely a matter of judgement rather than fact,
that would allow the two of you to go on arguing about it.

James Kuyper

Mar 18, 2020, 10:12:10 AM

On 3/18/20 3:16 AM, Felix Palmen wrote:
> * Alf P. Steinbach <alf.p.stein...@gmail.com>:

...

>> No, information is only lost when the process Windows ANSI codepage is
>> not UTF-8.
>
> Which is almost always the case. Windows always uses its 8bit single
> character codepages like eg CP-1252.

The only things I use Windows for don't require use of anything other
than 7-bit ASCII, so I'm not at all familiar with such issues. But I'm
having trouble figuring out those two sentences. It seems to me that
they address different aspects of the same thing. Therefore, I would
expect either both sentences to use "almost always", or both sentences
to use "always". How can the codepage "almost always" be non-UTF8, but
"always" be 8bit single character codepages?

Felix Palmen

Mar 18, 2020, 10:30:12 AM

* James Kuyper <james...@alumni.caltech.edu>:
> [...] I would

> expect either both sentences to use "almost always", or both sentences
> to use "always". How can the codepage "almost always" be non-UTF8, but
> "always" be 8bit single character codepages?

Sloppy phrasing from my side, sorry. Setting a language in Windows, the
8bit character set is *always* set to something like e.g. CP-1252 (which
is a single-byte character set similar to iso-8859-1). AFAIK, in some
special cases, double-byte encodings are used as well (Chinese etc), but
not UTF-8.

The "almost always" slipped in because there actually *is* a UTF-8
codepage available in newer versions of Windows, which the application
might use. As this is only available in newer versions and isn't default
either, it's useless in practice (which might change in the future of
course). Even if they really added a way to set it system-wide recently,
it will take years until you can safely assume you can use it when
targeting windows.

Until then, in a portable program using 8bit characters, you'll probably
continue to do a lot of conversions when building for Windows.

cda...@gmail.com

Mar 18, 2020, 11:40:35 AM

FIGHT FIGHT