[boost] [filesystem] becoming worse than the standard?

Vinnie Falco via Boost

Aug 14, 2022, 7:34:54 PM

to boost@lists.boost.org List, Vinnie Falco

I see this in the commit log for boost.filesystem:

Deprecated path construction/assignment/appending from container types.
Users are advised to use string types and iterators instead of containers
to construct/assign/append to paths.
In v4, the support for containers is removed.

This can't be right now, can it? Because std::filesystem supports path
construction from anything meeting the requirements of Source which is
kind of range-like. Boost.URL and its downstream libraries HTTP-Proto,
HTTP-IO, Websocket-Proto, and Websocket-IO are depending on the
container-based interface to achieve reasonably decent syntax and
semantics despite the limitations of the path API (which are the fault
of the standard and not Boost).

If std::filesystem::path supports container-based parameters to
assign() and append() then why is boost::filesystem::path removing it?

Thanks

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev via Boost

Aug 15, 2022, 6:37:53 AM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 02:34, Vinnie Falco via Boost wrote:
> I see this in the commit log for boost.filesystem:
>
> Deprecated path construction/assignment/appending from container types.
> Users are advised to use string types and iterators instead of containers
> to construct/assign/append to paths.
> In v4, the support for containers is removed.
>
> This can't be right now, can it? Because std::filesystem supports path
> construction from anything meeting the requirements of Source which is
> kind of range-like.

Paths are supposed to interoperate with strings, and that is what the
change you refer to[1] is aimed at. Assigning or constructing a path
from e.g. std::list<char> doesn't make much sense, and that is being
deprecated. That Boost.Filesystem used to accept this is an unfortunate
legacy misfeature, which is being removed. Note that the support for
string types is *not* being removed.

std::filesystem requirements[2] actually restrict Source arguments to
std string types, iterators (read: pointers) to C-style strings and
arrays of characters (which are interpreted as C-style strings). So
Boost.Filesystem is getting closer to the standard here.
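
To illustrate the Source categories listed above, here is a minimal sketch
against std::filesystem, assuming a C++17 toolchain. The helper name
`sources_agree` is made up for this example, and the commented-out line is
only expected to fail to compile, based on the requirements cited in [2].

```cpp
#include <cassert>
#include <filesystem>
#include <list>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper: builds the same path from each kind of argument
// the Source requirements allow and reports whether they all agree.
inline bool sources_agree()
{
    std::string      s  = "dir/file.txt";
    std::string_view sv = s;
    char const*      cs = s.c_str();          // iterator to a C-style string
    char const       arr[] = "dir/file.txt";  // array, read as a C-style string

    fs::path from_string(s);
    fs::path from_view(sv);
    fs::path from_cstr(cs);
    fs::path from_array(arr);

    // A container that is not a string goes through the iterator pair.
    std::list<char> chars(s.begin(), s.end());
    fs::path from_range(chars.begin(), chars.end());
    // fs::path bad(chars);  // assumption: rejected, std::list<char> is not a Source

    return from_string == from_view && from_view == from_cstr
        && from_cstr == from_array && from_array == from_range;
}
```

Calling `sources_agree()` should return true on a conforming implementation,
since every construction above describes the same character sequence.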

> Boost.URL and its downstream libraries HTTP-Proto,
> HTTP-IO, Websocket-Proto, and Websocket-IO are depending on the
> container-based interface to achieve reasonably decent syntax and
> semantics despite the limitations of the path API (which are the fault
> of the standard and not Boost).

Could you describe why you need to construct/assign/append to paths from
arbitrary containers and why you cannot use string types or iterators?

[1]: https://github.com/boostorg/filesystem/commit/d829a46b3120933e3dbaf9b3c85bf1dc1ec8e77c
[2]: http://eel.is/c++draft/fs.path.req#1

Rainer Deyke via Boost

Aug 15, 2022, 7:33:10 AM

to bo...@lists.boost.org, Rainer Deyke

On 15.08.22 01:34, Vinnie Falco via Boost wrote:
> I see this in the commit log for boost.filesystem:
>
> Deprecated path construction/assignment/appending from container types.
> Users are advised to use string types and iterators instead of containers
> to construct/assign/append to paths.
> In v4, the support for containers is removed.
>
> This can't be right now, can it? Because std::filesystem supports path
> construction from anything meeting the requirements of Source which is
> kind of range-like. Boost.URL and its downstream libraries HTTP-Proto,
> HTTP-IO, Websocket-Proto, and Websocket-IO are depending on the
> container-based interface to achieve reasonably decent syntax and
> semantics despite the limitations of the path API (which are the fault
> of the standard and not Boost).
>
> If std::filesystem::path supports container-based parameters to
> assign() and append() then why is boost::filesystem::path removing it?

I've already given up on Boost.Filesystem due to
https://github.com/boostorg/filesystem/issues/181. std::filesystem
seems to work better across different compilers/platforms, and it's
standard.

--
Rainer Deyke (rai...@eldwood.com)

Vinnie Falco via Boost

Aug 15, 2022, 10:11:09 AM

to bo...@lists.boost.org, Vinnie Falco

On Mon, Aug 15, 2022 at 3:37 AM Andrey Semashev via Boost
<bo...@lists.boost.org> wrote:
> Could you describe why you need to construct/assign/append to paths from
> arbitrary containers and why you cannot use string types or iterators?

Yeah, and from your description it sounds like maybe it is just a user
error on my part or a bug. The container in question is
boost::urls::pct_encoded_view:

<https://github.com/CPPAlliance/url/blob/cba301383791f67c31b3ccd52a401849f7cf8940/include/boost/url/pct_encoded_view.hpp#L75>

This used to work:

    pct_encoded_view v;
    boost::filesystem::path p;

    p.append( v );

What should happen here is that a path segment will be appended to p
formed by applying percent-decoding to the string referenced by 'v'
(the percent-decoding happens in the iterator).

Thanks

Andrey Semashev via Boost

Aug 15, 2022, 10:57:38 AM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 17:10, Vinnie Falco wrote:
> On Mon, Aug 15, 2022 at 3:37 AM Andrey Semashev via Boost
> <bo...@lists.boost.org> wrote:
>> Could you describe why you need to construct/assign/append to paths from
>> arbitrary containers and why you cannot use string types or iterators?
>
> Yeah, and from your description it sounds like maybe it is just a user
> error on my part or a bug. The container in question is
> boost::urls::pct_encoded_view:
>
> <https://github.com/CPPAlliance/url/blob/cba301383791f67c31b3ccd52a401849f7cf8940/include/boost/url/pct_encoded_view.hpp#L75>
>
> This used to work:
>
> pct_encoded_view v;
> boost::filesystem::path p;
>
> p.append( v );
>
> What should happen here is that a path segment will be appended to p
> formed by applying percent-decoding to the string referenced by 'v'
> (the percent-decoding happens in the iterator).

With std::filesystem::path that is what would happen, through
pct_encoded_view::operator std::string(). With boost::filesystem::path,
it will construct the path from a string that is obtained from the
[v.begin(), v.end()) range (which, I presume, will not be
percent-encoded). Since the recent change, it will also give you a
deprecation warning.

The correct code either uses v.to_string() or explicitly uses
v.begin()/v.end(), depending on what you want.
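
The two options can be sketched roughly as follows. This uses
std::filesystem so it builds without Boost, and `pct_decode` is a
hypothetical stand-in written for this example; it is not Boost.URL's
decoder and it assumes well-formed input. With a real decoding view the
iterator pair in option 2 would be the view's own begin()/end().

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical percent-decoder, for illustration only.
inline std::string pct_decode(std::string_view in)
{
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return 0;  // malformed digits are not handled in this sketch
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            out += static_cast<char>(hex(in[i + 1]) * 16 + hex(in[i + 2]));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Option 1: materialize a string first, then append it as a string.
inline fs::path append_via_string(fs::path p, std::string_view encoded)
{
    std::string decoded = pct_decode(encoded);
    p.append(decoded);
    return p;
}

// Option 2: append through an explicit iterator pair.
inline fs::path append_via_iterators(fs::path p, std::string_view encoded)
{
    std::string decoded = pct_decode(encoded);
    p.append(decoded.begin(), decoded.end());
    return p;
}
```

Both helpers are expected to produce the same path; the difference is only
in which overload the caller spells out.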

BTW, duplicate "as a" here:

https://github.com/CPPAlliance/url/blob/cba301383791f67c31b3ccd52a401849f7cf8940/include/boost/url/pct_encoded_view.hpp#L57

Vinnie Falco via Boost

Aug 15, 2022, 11:33:00 AM

to bo...@lists.boost.org, Vinnie Falco

On Mon, Aug 15, 2022 at 7:57 AM Andrey Semashev via Boost <
bo...@lists.boost.org> wrote:

> on what you want.

boost::filesystem::path used to meet the requirements of MutableString, but
now does not:
https://master.url.cpp.al/url/concepts/mutablestring.html

--
Regards,
Vinnie

Follow me on GitHub: https://github.com/vinniefalco

Andrey Semashev via Boost

Aug 15, 2022, 11:42:32 AM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 18:32, Vinnie Falco wrote:
>
> boost::filesystem::path used to meet the requirements of MutableString,
> but now does not:
> https://master.url.cpp.al/url/concepts/mutablestring.html

It does not meet the requirements on Windows, and never did -
path::value_type is wchar_t there.

Also, you don't define the semantics of the operations, in particular,
what is the expected result. Regardless of the platform, path::append
does not merely append characters; it appends a path, which may result
in appending a directory delimiter or completely replacing the path if
the argument is an absolute path. Depending on what exactly you mean by
"Appends the characters", this may or may not qualify.
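
A small sketch of the difference, written against std::filesystem (which
has the same append/concat split) and assuming C++17. The helper names
`appended` and `concatenated` are invented for this example.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Path-aware append: may insert a directory separator, or replace the
// whole path when the argument has a root directory.
inline fs::path appended(fs::path base, std::string const& extra)
{
    return base.append(extra);
}

// Plain concatenation of the underlying characters.
inline fs::path concatenated(fs::path base, std::string const& extra)
{
    return base.concat(extra);
}
```

With these, `appended("foo", "bar")` gives foo/bar, `concatenated("foo", "bar")`
gives foobar, and `appended("foo", "/etc")` gives /etc, so "append" alone does
not pin down the result.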

Andrey Semashev via Boost

Aug 15, 2022, 11:50:24 AM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 18:42, Andrey Semashev wrote:
> On 8/15/22 18:32, Vinnie Falco wrote:
>>
>> boost::filesystem::path used to meet the requirements of MutableString,
>> but now does not:
>> https://master.url.cpp.al/url/concepts/mutablestring.html
>
> It does not meet the requirements on Windows, and never did -
> path::value_type is wchar_t there.
>
> Also, you don't define the semantics of the operations, in particular,
> what is the expected result. Regardless of the platform, path::append
> does not merely append characters; it appends a path, which may result
> in appending a directory delimiter or completely replacing the path if
> the argument is an absolute path. Depending on what exactly you mean by
> "Appends the characters", this may or may not qualify.

I should also note that filesystem::path is not a container either
because path::value_type is different from that of path::iterator.
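
The mismatch can be checked at compile time. The sketch below uses
std::filesystem::path, which has the same property, and assumes C++17; the
aliases and `count_elements` are names chosen for this example.

```cpp
#include <cassert>
#include <filesystem>
#include <iterator>
#include <type_traits>

namespace fs = std::filesystem;

// The character type stored by the path (char on POSIX, wchar_t on Windows).
using path_char = fs::path::value_type;

// The type produced when iterating over a path: a whole path element.
using path_element = std::iterator_traits<fs::path::iterator>::value_type;

static_assert(std::is_same_v<path_element, fs::path>,
              "iterating a path yields paths, not characters");
static_assert(!std::is_same_v<path_char, path_element>,
              "the two value types differ");

// Iteration walks elements such as "a", "b", "c", not characters.
inline int count_elements(fs::path const& p)
{
    return static_cast<int>(std::distance(p.begin(), p.end()));
}
```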

Niall Douglas via Boost

Aug 15, 2022, 12:06:27 PM

to bo...@lists.boost.org, Niall Douglas

On 15/08/2022 16:42, Andrey Semashev via Boost wrote:
> On 8/15/22 18:32, Vinnie Falco wrote:
>>
>> boost::filesystem::path used to meet the requirements of MutableString,
>> but now does not:
>> https://master.url.cpp.al/url/concepts/mutablestring.html
>
> It does not meet the requirements on Windows, and never did -
> path::value_type is wchar_t there.
>
> Also, you don't define the semantics of the operations, in particular,
> what is the expected result. Regardless of the platform, path::append
> does not merely append characters; it appends a path, which may result
> in appending a directory delimiter or completely replacing the path if
> the argument is an absolute path. Depending on what exactly you mean by
> "Appends the characters", this may or may not qualify.

Also "foo/bar" == "foo//bar" for filesystem paths. And several other
comparisons are equal which wouldn't be if a filesystem path were a
basic string.
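
For instance, with std::filesystem (assumed here to behave like
Boost.Filesystem in this respect, since both compare paths element by
element), the two spellings compare equal as paths but not as strings. The
helper names are made up for this sketch.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Element-wise path comparison: repeated separators do not matter.
inline bool same_as_paths(std::string const& a, std::string const& b)
{
    return fs::path(a) == fs::path(b);
}

// Character-wise string comparison: every character matters.
inline bool same_as_strings(std::string const& a, std::string const& b)
{
    return a == b;
}
```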

Niall

Alan de Freitas via Boost

Aug 15, 2022, 1:14:59 PM

to bo...@lists.boost.org, Alan de Freitas

>
> Assigning or constructing a path
> from e.g. std::list<char> doesn't make much sense,

Why not?

--
Alan Freitas
https://github.com/alandefreitas

Vinnie Falco via Boost

Aug 15, 2022, 1:18:22 PM

to bo...@lists.boost.org, Vinnie Falco

On Mon, Aug 15, 2022 at 8:42 AM Andrey Semashev via Boost
<bo...@lists.boost.org> wrote:
> you don't define the semantics of the operations, in particular,
> what is the expected result

Yes. Thank you for noticing this, as it is exactly what I intended. It
is the caller's responsibility to determine if the operation makes
sense. The library doesn't care what the outcome is, only that the
syntactic requirements are met. Which is how I documented it :)

Thanks

Peter Dimov via Boost

Aug 15, 2022, 1:21:05 PM

to bo...@lists.boost.org, Peter Dimov

Vinnie Falco wrote:
> On Mon, Aug 15, 2022 at 8:42 AM Andrey Semashev via Boost
> <bo...@lists.boost.org> wrote:
> > you don't define the semantics of the operations, in particular, what
> > is the expected result
>
> Yes. Thank you for noticing this, as it is exactly what I intended. It is the caller's
> responsibility to determine if the operation makes sense. The library doesn't
> care what the outcome is, only that the syntactic requirements are met. Which
> is how I documented it :)

The semantics are pretty clear when value_type is char, which is what is required.

It's less clear what should happen when the library calls append or assign with
iterators whose value_type is char on something whose value_type is wchar_t,
which is what you seem to be doing.

Vinnie Falco via Boost

Aug 15, 2022, 1:22:16 PM

to bo...@lists.boost.org, Vinnie Falco

On Mon, Aug 15, 2022 at 10:17 AM Vinnie Falco <vinnie...@gmail.com> wrote:
> On Mon, Aug 15, 2022 at 8:42 AM Andrey Semashev via Boost

> > ...

It comes down to this. boost::urls::pct_encoded_view is a ForwardRange
of char. How do we append it as a segment to each of the following 4
things:

1. boost::filesystem::path on Windows
2. boost::filesystem::path not on Windows
3. std::filesystem::path on Windows
4. std::filesystem::path not on Windows

I do not know how to answer this question, other than with annoying
wide string conversions and extra memory allocations.

Andrey Semashev via Boost

Aug 15, 2022, 1:24:40 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 20:14, Alan de Freitas via Boost wrote:
>>
>> Assigning or constructing a path
>> from e.g. std::list<char> doesn't make much sense,
>
> Why not?

Because list is not a string. If you're assigning a list to a path, most
likely you are doing something wrong.

Andrey Semashev via Boost

Aug 15, 2022, 1:41:45 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 20:21, Vinnie Falco wrote:
> On Mon, Aug 15, 2022 at 10:17 AM Vinnie Falco <vinnie...@gmail.com> wrote:
>> On Mon, Aug 15, 2022 at 8:42 AM Andrey Semashev via Boost
>>> ...
>
> It comes down to this. boost::urls::pct_encoded_view is a ForwardRange
> of char. How do we append it as a segment to each of the following 4
> things:
>
> 1. boost::filesystem::path on Windows
> 2. boost::filesystem::path not on Windows
> 3. std::filesystem::path on Windows
> 4. std::filesystem::path not on Windows
>
> I do not know how to answer this question, other than with annoying
> wide string conversions and extra memory allocations.

filesystem::path will handle character code conversion. Regardless of
the platform, you call path::append(). Or path::concat(), if you want
dumb concatenation. The question is only what and how exactly do you
want to append and what do you expect to get in the end.
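
As a rough sketch of "just call path::append()" for a forward range of
char, the following uses std::filesystem and the two-iterator overload, so
the same source should work on any platform. `append_segment` and
`demo_append_segment` are hypothetical names, and std::list<char> stands in
for a non-contiguous range such as a decoding view. On Windows the narrow
characters would be converted using the native narrow encoding, which may
or may not be what is wanted.

```cpp
#include <cassert>
#include <filesystem>
#include <iterator>
#include <list>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: appends any range of char as one path segment by
// forwarding its iterators to path::append.
template<class CharRange>
fs::path& append_segment(fs::path& p, CharRange const& r)
{
    return p.append(std::begin(r), std::end(r));
}

inline fs::path demo_append_segment()
{
    std::string const text = "file.txt";
    std::list<char> segment(text.begin(), text.end());  // not a string type

    fs::path p = "dir";
    append_segment(p, segment);
    return p;
}
```

Note that passing a string literal to `append_segment` would include the
terminating null character in the range, so the helper is meant for real
ranges only.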

I didn't read Boost.URL docs, but it seems strange to me that operations
are defined without an expected outcome. I mean, you can define loose
concepts such as MutableString, but I don't find it helpful as I still
can't tell what the operations on its models are doing.

Andrey Semashev via Boost

Aug 15, 2022, 1:46:26 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 20:20, Peter Dimov via Boost wrote:
> Vinnie Falco wrote:
>> On Mon, Aug 15, 2022 at 8:42 AM Andrey Semashev via Boost
>> <bo...@lists.boost.org> wrote:
>>> you don't define the semantics of the operations, in particular, what
>>> is the expected result
>>
>> Yes. Thank you for noticing this, as it is exactly what I intended. It is the caller's
>> responsibility to determine if the operation makes sense. The library doesn't
>> care what the outcome is, only that the syntactic requirements are met. Which
>> is how I documented it :)
>
> The semantics are pretty clear when value_type is char, which is what is required.
>
> It's less clear what should happen when the library calls append or assign with
> iterators whose value_type is char on something whose value_type is wchar_t,
> which is what you seem to be doing.

That part is defined by filesystem::path - character code conversion
happens. What isn't clear in the code Vinnie posted earlier is what is
supposed to be appended (percent-escaped string or not) and how (path
appending or dumb concatenation). That's the problem with the docs, it
says "appends", but there are too many ways to "append".

Vinnie Falco via Boost

Aug 15, 2022, 1:52:48 PM

to bo...@lists.boost.org, Vinnie Falco

On Mon, Aug 15, 2022 at 10:46 AM Andrey Semashev via Boost
<bo...@lists.boost.org> wrote:
> What isn't clear in the code Vinnie posted earlier is what is
> supposed to be appended (percent-escaped string or not)

When pct_encoded_view is iterated it produces normal individual
characters. That is, any percent-escapes in the referenced character
buffer have the encoding removed.

> and how (path appending or dumb concatenation). That's the problem
> with the docs, it says "appends", but there are too many ways to "append".

The docs are correct in leaving the meaning of "append" unspecified.
It is only necessary to meet the syntactic requirements. It is up to
the user to decide whether or not the behavior of the mutable string's
'append' and/or 'assign' operations are appropriate to achieve the
desired outcome.

Thanks

Peter Dimov via Boost

Aug 15, 2022, 2:28:10 PM

to bo...@lists.boost.org, Peter Dimov

Andrey Semashev wrote:
> > The semantics are pretty clear when value_type is char, which is what is
> required.
> >
> > It's less clear what should happen when the library calls append or
> > assign with iterators whose value_type is char on something whose
> > value_type is wchar_t, which is what you seem to be doing.
>
> That part is defined by filesystem::path - character code conversion happens.

It's defined, but in my opinion, the semantics are wrong for the intended
use case. The interpretation of paths coming from URLs should not depend
on the current server ANSI code page - that's an implementation detail. You
should be able to move your server to a different Windows installation (or
even to a different OS) and the existing URLs should not suddenly change
meaning.

That's why I think that the only sensible interpretation of paths coming from
URLs is as UTF-8, and for this case, the default filesystem conversion isn't
suitable.

The current MutableString requirements disallow this, as they mandate a
matching value_type of char. This requirement, again in my opinion, is
correct, because if relaxed, people will silently get ANSI code page
dependent conversions, often without realizing it, if they don't develop on
Windows.

_If_ this requirement is relaxed, the library should somehow default to
transcoding from UTF-8 to the value_type, instead of using the ANSI
code page (as with fs::path) or a char to value_type cast (as with
e.g. wstring or u16string.)
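
For comparison, std::filesystem offers a way to say "these bytes are UTF-8"
explicitly, independent of the code page. The sketch below is only an
analogue of the idea, not something Boost.URL or Boost.Filesystem does;
`path_from_utf8` is a hypothetical helper, and the preprocessor branch
assumes that `__cpp_char8_t` being defined means std::u8string is available.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: treat `bytes` as UTF-8 no matter what the
// platform's native narrow encoding is.
inline fs::path path_from_utf8(std::string const& bytes)
{
#if defined(__cpp_char8_t)
    // C++20: a char8_t source is always interpreted as UTF-8.
    std::u8string u8(bytes.begin(), bytes.end());
    return fs::path(u8);
#else
    // C++17: u8path applies the same UTF-8 interpretation.
    return fs::u8path(bytes);
#endif
}
```

For plain ASCII input the result is the same as constructing the path
directly; the two only diverge for non-ASCII bytes on platforms whose
narrow encoding is not UTF-8.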

Alan de Freitas via Boost

Aug 15, 2022, 2:55:18 PM

to bo...@lists.boost.org, Alan de Freitas

> It's less clear what should happen when the library calls append or assign
> with iterators whose value_type is char on something whose value_type is
> wchar_t, which is what you seem to be doing.

> std::filesystem requirements[2] actually restrict Source arguments to
> std string types, iterators (read: pointers) to C-style strings and
> arrays of characters (which are interpreted as C-style strings).

Yes. That sounds right. It looks like boost::filesystem is matching the
behavior of std::filesystem now, even if more restrictive
(https://godbolt.org/z/5WYYce6v8).
Boost.URL shouldn't use it and that's for the best, since std::filesystem
doesn't support it either.

Non-contiguous ranges that dereference to char work with both std:: and
boost::filesystem if we use the `append(InputIterator begin, InputIterator
end)` overload.
I think that's the source of confusion here. What C++ says about the
`append(Source const& source)` overload is even more misleading.

(2) and (3) participate in overload resolution only if Source and path are
not the same type, and either:

- Source is a specialization of std::basic_string or
  std::basic_string_view, or
- std::iterator_traits<std::decay_t<Source>>::value_type is valid and
  denotes a possibly const-qualified encoding character type (char,
  char8_t (since C++20), char16_t, char32_t, or wchar_t).

We assumed both overloads should work because of this second condition.

> If you're assigning a list to a path, most
> likely you are doing something wrong.

Yes. `append(InputIterator begin, InputIterator end)` would still allow the
person to do this wrong thing though.

And `append(InputIterator begin, InputIterator end)` doesn't look like it's
always wrong.
Two obvious use cases could be (i) appending paths from resource trees or
(ii) some std::ranges::view::... that transforms the input into the chars
to represent a path segment for that input.
If `append(InputIterator begin, InputIterator end)` is not wrong, it looks
like `append(Source const& source)` would not be less wrong when Source is
just the range holding the iterators for the first overload.

In any case, both are still dangerous. Boost.URL and other libraries
shouldn't count on it.
As Peter mentioned, things like wstring and u16string could be appended,
but the semantics will probably be wrong.
They will convert char by char, without regard to encoding.
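
A short sketch of what char-by-char conversion means for a UTF-8 input,
using only the standard string classes. `widen_elementwise` is a name made
up for this example.

```cpp
#include <cassert>
#include <string>

// Copies each char into a char16_t with no decoding step, which is what
// an element-wise iterator-pair construction does.
inline std::u16string widen_elementwise(std::string const& bytes)
{
    return std::u16string(bytes.begin(), bytes.end());
}
```

For ASCII the result looks right, but the two UTF-8 bytes of U+00E9
("\xC3\xA9") become two separate UTF-16 code units instead of the single
unit 0x00E9 that a real transcoding step would produce.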

Thank you for the explanation.

On Mon, Aug 15, 2022 at 2:24 PM, Andrey Semashev via Boost <
bo...@lists.boost.org> wrote:

--
Alan Freitas
https://github.com/alandefreitas

Vinnie Falco via Boost

Aug 15, 2022, 3:10:29 PM

to bo...@lists.boost.org, Vinnie Falco, Peter Dimov

On Mon, Aug 15, 2022 at 11:28 AM Peter Dimov via Boost
<bo...@lists.boost.org> wrote:
> That's why I think that the only sensible interpretation of paths coming from
> URLs is as UTF-8, and for this case, the default filesystem conversion isn't
> suitable.

Can you please submit a pull request showing me how route.cpp can be
correctly implemented according to these semantics?

Thanks

Peter Dimov via Boost

Aug 15, 2022, 3:35:27 PM

to Vinnie Falco, bo...@lists.boost.org, Peter Dimov

Vinnie Falco wrote:
> On Mon, Aug 15, 2022 at 11:28 AM Peter Dimov via Boost
> <bo...@lists.boost.org> wrote:
> > That's why I think that the only sensible interpretation of paths
> > coming from URLs is as UTF-8, and for this case, the default
> > filesystem conversion isn't suitable.
>
> Can you please submit a pull request showing me how route.cpp can be
> correctly implemented according to these semantics?

The usual approach is to put

    boost::filesystem::path::imbue( std::locale( std::locale::classic(),
        new boost::filesystem::detail::utf8_codecvt_facet ) );

as the first thing in `main`.

fs::path has overloads taking codecvt_type, but I've never used them
so I don't know how they work. Maybe Andrey can help.

Andrey Semashev via Boost

Aug 15, 2022, 4:41:28 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 21:54, Alan de Freitas via Boost wrote:
>>
>> It's less clear what should happen when the library calls append or assign
>> with
>> iterators whose value_type is char on something whose value_type is
>> wchar_t,
>> which is what you seem to be doing.
>
>
>
>> std::filesystem requirements[2] actually restrict Source arguments to
>> std string types, iterators (read: pointers) to C-style strings and
>> arrays of characters (which are interpreted as C-style strings).
>
> Yes. That sounds right. It looks like boost::filesystem is matching the
> behavior of std::filesystem now, even if more restrictive
> (https://godbolt.org/z/5WYYce6v8).

Boost.Filesystem v3 is more permissive, as it still compiles the
std::list case, although with a warning. v4 will fail to compile.

> Boost.URL shouldn't use it and that's for the best, since std::filesystem
> doesn't support it either.
>
> Non-contiguous ranges that dereference to char work with both std:: and
> boost::filesystem if we use the `append(InputIterator begin, InputIterator
> end)` overload.
> I think that's the source of confusion here. What C++ says about the
> `append(Source const& source)` overload is even more misleading.
>
> (2) and (3) participate in overload resolution only if Source and path are
> not the same type, and either:
>
> - Source is a specialization of std::basic_string or
>   std::basic_string_view, or
> - std::iterator_traits<std::decay_t<Source>>::value_type is valid and
>   denotes a possibly const-qualified encoding character type (char,
>   char8_t (since C++20), char16_t, char32_t, or wchar_t).
>
>
> We assumed both overloads should work because of this second condition.

std::list is not an iterator, applying std::iterator_traits to it is not
valid (for one, std::list does not have an iterator_category).
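
This can be checked with a small detection trait. The sketch assumes C++17
or later, where std::iterator_traits has no members for types that are not
iterators; `has_iterator_value_type` is a name invented for this example.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>

// Detects whether std::iterator_traits<T>::value_type names a type.
template<class T, class = void>
struct has_iterator_value_type : std::false_type {};

template<class T>
struct has_iterator_value_type<
    T, std::void_t<typename std::iterator_traits<T>::value_type>>
    : std::true_type {};

// Pointers, decayed character arrays and real iterators qualify.
static_assert(has_iterator_value_type<char const*>::value, "");
static_assert(has_iterator_value_type<std::decay_t<char const[4]>>::value, "");
static_assert(has_iterator_value_type<std::list<char>::iterator>::value, "");

// The container itself does not: it has no iterator_category.
static_assert(!has_iterator_value_type<std::list<char>>::value, "");
```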

>> If you're assigning a list to a path, most
>> likely you are doing something wrong.
>
>
> Yes. `append(InputIterator begin, InputIterator end)` would still allow the
> person to do this wrong thing though.
>
> And `append(InputIterator begin, InputIterator end)` doesn't look like it's
> always wrong.

The signature with two iterators is the established practice for
obtaining elements from a foreign sequence. You have it in every std
container, std::string, etc. In particular, it allows obtaining the
elements from exotic sources, like reading from an
std::istreambuf_iterator. There is no such practice with a
single-argument signature.

> Two obvious use cases could be (i) appending paths from resource trees or
> (ii) some std::ranges::view::... that transforms the input into the chars
> to represent a path segment for that input.
> If `append(InputIterator begin, InputIterator end)` is not wrong, it looks
> like `append(Source const& source)` would not be less wrong when Source is
> just the range holding the iterators for the first overload.
>
> In any case, both are still dangerous. Boost.URL and other libraries
> shouldn't count on it.
> As Peter mentioned, things like wstring and u16string could be appended,
> but the semantics will probably be wrong.
> They will convert char by char, without regard to encoding.

If the user calls a function passing two iterators, he is arguably aware
that he is constructing/assigning/appending elements one-by-one,
performing element-wise conversion, if needed. Again, this is
established practice.

If the user passes a single object to constructor/assignment/append, he
provides the call with additional information on the nature of the input
sequence, and the call is expected to behave according to that
knowledge. For example, the call may not copy anything at all and simply
move the contents or increment a reference counter, or use strlen to
discover the end of the string, or use a locale from the source to
perform character code conversion, and so on. As you can see, the
behavior of such call can be very different depending on the argument type.

Yes, with a range like boost::iterator_range or std::span there's really
nothing fancy going on, and semantically the call would be expected to
behave the same as with a pair of iterators. However, this is still a
special case that has to be supported by the call explicitly, among the
other single-argument signatures. This is relatively novel practice, and
in some cases like the one that started this discussion, it can be
ambiguous as to what the call actually does. In comparison, the
two-iterator signature is rather explicit and clear wherever you see it.

Andrey Semashev via Boost

Aug 15, 2022, 4:57:35 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/15/22 22:35, Peter Dimov via Boost wrote:
> Vinnie Falco wrote:
>> On Mon, Aug 15, 2022 at 11:28 AM Peter Dimov via Boost
>> <bo...@lists.boost.org> wrote:
>>> That's why I think that the only sensible interpretation of paths
>>> coming from URLs is as UTF-8, and for this case, the default
>>> filesystem conversion isn't suitable.
>>
>> Can you please submit a pull request showing me how route.cpp can be
>> correctly implemented according to these semantics?
>
> The usual approach is to put
>
> boost::filesystem::path::imbue( std::locale( std::locale::classic(),
> new boost::filesystem::detail::utf8_codecvt_facet ) );

I'd rather

boost::filesystem::path::imbue(boost::locale::generator()("C.UTF-8"));

Don't recommend users to use stuff from namespace detail. :)

Note though that after that call all narrow strings you pass to
filesystem::path must be UTF-8; you should be careful if you obtain
paths from user input, files, network, etc.

> as the first thing in `main`.
>
> fs::path has overloads taking codecvt_type, but I've never used them
> so I don't know how they work. Maybe Andrey can help.

If you made the call above then that is the only thing you need to do. The
overloads taking codecvt are for when you don't want to set the global
filesystem locale for some reason. This may be useful during the global
destructors stage, for example. By default, the codecvt is taken from
the global filesystem locale.

Alan de Freitas via Boost

Aug 15, 2022, 4:59:34 PM

to bo...@lists.boost.org, Alan de Freitas

>
> std::list is not an iterator, applying std::iterator_traits to it is not
> valid (for one, std::list does not have an iterator_category).
>

Oh... Right. I failed to notice Source is (i) a basic_string, (ii)
basic_string_view, or (iii) an iterator with one of those value types.
We assumed Source would be (i) a basic_string, (ii) basic_string_view, or
(iii) a container with one of those value types.
Great.

> If the user calls a function passing two iterators, he is arguably aware
> that he is constructing/assigning/appending elements one-by-one,
> performing element-wise conversion, if needed. Again, this is
> established practice.
>

Yes. That makes sense.

> In comparison, the two-iterator signature is rather explicit and clear
> wherever you see it.
>

Definitely.

Thank you again for clarifying.

--
Alan Freitas
https://github.com/alandefreitas

Peter Dimov via Boost

Aug 15, 2022, 5:10:19 PM

to bo...@lists.boost.org, Peter Dimov

Andrey Semashev wrote:
> On 8/15/22 22:35, Peter Dimov via Boost wrote:
> > Vinnie Falco wrote:
> >> On Mon, Aug 15, 2022 at 11:28 AM Peter Dimov via Boost
> >> <bo...@lists.boost.org> wrote:
> >>> That's why I think that the only sensible interpretation of paths
> >>> coming from URLs is as UTF-8, and for this case, the default
> >>> filesystem conversion isn't suitable.
> >>
> >> Can you please submit a pull request showing me how route.cpp can be
> >> correctly implemented according to these semantics?
> >
> > The usual approach is to put
> >
> > boost::filesystem::path::imbue( std::locale( std::locale::classic(),
> > new boost::filesystem::detail::utf8_codecvt_facet ) );
>
> I'd rather
>
> boost::filesystem::path::imbue(boost::locale::generator()("C.UTF-8"));
>
> Don't recommend users to use stuff from namespace detail. :)

Sound advice in principle, but that's pretty much the established practice
at this point. Perhaps we should expose a non-detail way to perform the
above, such as providing a function "set_utf8_as_default".

> Note though that after that call all narrow strings you pass to filesystem::path
> must be UTF-8; you should be careful if you obtain paths from user input, files,
> network, etc.

Right. That's why I alluded to the alternative, using the codecvt_type overloads.

Andrey Semashev via Boost

Aug 15, 2022, 5:56:27 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/16/22 00:10, Peter Dimov via Boost wrote:
> Andrey Semashev wrote:
>> On 8/15/22 22:35, Peter Dimov via Boost wrote:
>>> Vinnie Falco wrote:
>>>> On Mon, Aug 15, 2022 at 11:28 AM Peter Dimov via Boost
>>>> <bo...@lists.boost.org> wrote:
>>>>> That's why I think that the only sensible interpretation of paths
>>>>> coming from URLs is as UTF-8, and for this case, the default
>>>>> filesystem conversion isn't suitable.
>>>>
>>>> Can you please submit a pull request showing me how route.cpp can be
>>>> correctly implemented according to these semantics?
>>>
>>> The usual approach is to put
>>>
>>> boost::filesystem::path::imbue( std::locale( std::locale::classic(),
>>> new boost::filesystem::detail::utf8_codecvt_facet ) );
>>
>> I'd rather
>>
>> boost::filesystem::path::imbue(boost::locale::generator()("C.UTF-8"));
>>
>> Don't recommend users to use stuff from namespace detail. :)
>
> Sound advice in principle but that's pretty much the established practice
> at this point. Perhaps we should expose a non-detail way to perform the
> above, such as provide a function "set_utf8_as_default".

Perhaps, move utf8_codecvt_facet to Utility or Core?

>> Note though that after that call all narrow strings you pass to filesystem::path
>> must be UTF-8; you should be careful if you obtain paths from user input, files,
>> network, etc.
>
> Right. That's why I alluded to the alternative, using the codecvt_type overloads.

For completeness, here's a usage example:

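// fs is assumed to be a namespace alias for boost::filesystem.
void append_utf8(std::string const& str, fs::path& p)
{
    // Convert from UTF-8 for this call only; the global path locale is untouched.
    std::locale loc = boost::locale::generator()("C.UTF-8");
    p.append(str, std::use_facet<fs::path::codecvt_type>(loc));
}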

Same with iterators.

Peter Dimov via Boost

Aug 15, 2022, 6:07:46 PM

to bo...@lists.boost.org, Peter Dimov

Andrey Semashev wrote:
> > Sound advice in principle but that's pretty much the established
> > practice at this point. Perhaps we should expose a non-detail way to
> > perform the above, such as provide a function "set_utf8_as_default".
>
> Perhaps, move utf8_codecvt_facet to Utility or Core?

Maybe it's time. And make it all inline?

Andrey Semashev via Boost

Aug 15, 2022, 6:21:14 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/16/22 01:07, Peter Dimov via Boost wrote:
> Andrey Semashev wrote:
>>> Sound advice in principle but that's pretty much the established
>>> practice at this point. Perhaps we should expose a non-detail way to
>>> perform the above, such as provide a function "set_utf8_as_default".
>>
>> Perhaps, move utf8_codecvt_facet to Utility or Core?
>
> Maybe it's time. And make it all inline?

Yes.

Although there's this already:

https://github.com/boostorg/locale/blob/develop/include/boost/locale/utf8_codecvt.hpp

Gavin Lambert via Boost

Aug 15, 2022, 6:46:44 PM

to bo...@lists.boost.org, Gavin Lambert

On 16/08/2022 07:35, Peter Dimov wrote:
> The usual approach is to put
>
> boost::filesystem::path::imbue( std::locale( std::locale::classic(),
> new boost::filesystem::detail::utf8_codecvt_facet ) );
>
> as the first thing in `main`.

As a library, you cannot assume that the application has done this (nor
can you do it yourself). The code has to act correctly regardless.

Peter Dimov via Boost

Aug 15, 2022, 7:31:32 PM

to bo...@lists.boost.org, Peter Dimov

Gavin Lambert wrote:
> On 16/08/2022 07:35, Peter Dimov wrote:
> > The usual approach is to put
> >
> > boost::filesystem::path::imbue( std::locale( std::locale::classic(),
> > new boost::filesystem::detail::utf8_codecvt_facet ) );
> >
> > as the first thing in `main`.
>
> As a library, you cannot assume that the application has done this (nor can you
> do it yourself). The code has to act correctly regardless.

Yes in principle, but the question I'm replying to is this:

> Can you please submit a pull request showing me how route.cpp can be
> correctly implemented according to these semantics?

In this case I can assume things.

route.cpp does create a path from argv[2], but interpreting it as UTF-8 would
probably be a feature. Using the ACP is wrong anyway as argv[2] is likely in the
console code page.

Gavin Lambert via Boost

Aug 15, 2022, 7:46:15 PM

to bo...@lists.boost.org, Gavin Lambert

On 16/08/2022 11:31, Peter Dimov wrote:
> route.cpp does create a path from argv[2], but interpreting it as UTF-8 would
> probably be a feature. Using the ACP is wrong anyway as argv[2] is likely in the
> console code page.

On Windows, the application always receives command line arguments as
wchar_t. If you provide the narrow main() entrypoint method then it is
the application's runtime startup code that will convert args back to
ANSI using the ACP, not the console code page. As such, interpreting
argv as ACP is correct (albeit lossy if there were any Unicode
characters involved -- the only truly correct design is to only use the
wide entrypoint or ignore the entrypoint args and reparse from the
original Unicode string).

Also on Windows, in most cases the console code page is also the ACP,
although that is less true in more recent versions.

Vinnie Falco via Boost

Aug 15, 2022, 7:54:03 PM

to bo...@lists.boost.org, Vinnie Falco

On Mon, Aug 15, 2022 at 4:46 PM Gavin Lambert via Boost
<bo...@lists.boost.org> wrote:
> ...

My experiences with std::filesystem and boost::filesystem have been
nothing but negative. I think that the decision to make the character
type different on Windows was a mistake. The need for locales and
imbuements and global state and... really, it is just giving me a big
headache.

I think there is room for a new library that handles files,
directories, file metadata, and also has some features of nowide
(where is the signature of fopen that accepts a filesystem::path?). It
should be utf-8 only, use Plain Old char (even on Windows), it should
be completely portable, except that it requires that directories are
possible and that the filesystem isn't weird (I don't really care
about compatibility with grandpa's EPROMs that can hold 9-bit flat
files).

Possible names:

Boost.LordOfTheFiles
Boost.HurrDir
Boost.Progra~1
Boost.FlySystem

Thoughts?

Peter Dimov via Boost

Aug 15, 2022, 8:03:16 PM

to bo...@lists.boost.org, Peter Dimov

Gavin Lambert wrote:
> On 16/08/2022 11:31, Peter Dimov wrote:
> > route.cpp does create a path from argv[2], but interpreting it as
> > UTF-8 would probably be a feature. Using the ACP is wrong anyway as
> > argv[2] is likely in the console code page.
>
> On Windows, the application always receives command line arguments as
> wchar_t. If you provide the narrow main() entrypoint method then it is the
> application's runtime startup code that will convert args back to ANSI using the
> ACP, not the console code page. As such, interpreting argv as ACP is correct

You're right, it is. So path::imbue is wrong even for route.cpp.

> (albeit lossy if there were any Unicode characters involved -- the only truly
> correct design is to only use the wide entrypoint or ignore the entrypoint args
> and reparse from the original Unicode string).
>
> Also on Windows, in most cases the console code page is also the ACP,

I'm not sure if these ever match. The console code page is 437 for me, and I
assume for almost everyone else.

Peter Dimov via Boost

Aug 15, 2022, 8:05:07 PM

to bo...@lists.boost.org, Peter Dimov

Vinnie Falco wrote:
> My experiences with std::filesystem and boost::filesystem have been nothing
> but negative. I think that the decision to make the character type different on
> Windows was a mistake. The need for locales and imbuements and global
> state and... really, it is just giving me a big headache.
>
> I think there is room for a new library that handles files, directories, file
> metadata, and also has some features of nowide (where is the signature of
> fopen that accepts a filesystem::path?). It should be utf-8 only, use Plain Old
> char (even on Windows), it should be completely portable, except that it
> requires that directories are possible and that the filesystem isn't weird (I don't
> really care about compatibility with grandpa's EPROMs that can hold 9-bit flat
> files).
>
> Possible names:
>
> Boost.LordOfTheFiles
> Boost.HurrDir
> Boost.Progra~1
> Boost.FlySystem

Boost.LLFIO obviously.

Emil Dotchevski via Boost

Aug 15, 2022, 8:09:25 PM

to Boost, Emil Dotchevski

On Mon, 15.08.2022 at 16:54, Vinnie Falco via Boost <
bo...@lists.boost.org> wrote:

> it should
> be completely portable, except that it requires that directories are
> possible and that the filesystem isn't weird (I don't really care
> about compatibility with grandpa's EPROMs that can hold 9-bit flat
> files).
>

AFAIK on posix it is usual that file names are simply zero terminated byte
sequences, which means that UTF-8 won't be _completely_ portable, and not
only on grandpa's EPROMs.

Gavin Lambert via Boost

Aug 15, 2022, 11:15:52 PM

to bo...@lists.boost.org, Gavin Lambert

On 16/08/2022 11:53, Vinnie Falco wrote:
> My experiences with std::filesystem and boost::filesystem have been
> nothing but negative. I think that the decision to make the character
> type different on Windows was a mistake. The need for locales and
> imbuements and global state and... really, it is just giving me a big
> headache.

Using wchar_t on Windows is actually the least painful option. (And you
don't have to worry about locales and imbuements etc if you never try to
convert to not-wchar_t.)

For correct behaviour, you *must* only use the W variants of the native
API methods, or wchar_t methods of standard library functions.

Inevitably, everything in the standard library that accepts 'char'
params assumes that these are encoded in the ANSI code page, not UTF-8.
This can't be "fixed" or it breaks all the legacy apps.

In practice, this means that unless you can absolutely guarantee that
your paths only contain pure ASCII (and the instant you accept a path or
filename from the user, you lose), it is *never* safe to use any of the
non-wide library methods.

You *can* (and many do) store paths in other libraries and in the
application in 'char'-encoded-as-UTF-8, but then you have to remember
every single time you hit the standard library or direct WinAPI
boundaries to convert your strings to wide before passing them across,
or hilarity will ensue (without even a convenient compiler error).
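
A minimal sketch of that boundary conversion, using only standard C++. The
helper name utf8_to_utf16 is hypothetical and not part of any library
mentioned in this thread; on Windows the resulting code units would be
copied into a wchar_t string before calling a W API. Malformed input is
rejected rather than silently replaced, which is one possible policy among
several.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 and re-encodes it as UTF-16; throws on malformed input.
// Hypothetical helper, written for illustration only.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        // The lead byte determines the sequence length.
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (len > in.size() - i)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, surrogates, and values above U+10FFFF.
        static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points beyond the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

Forgetting to call such a function at one boundary still compiles, which is
exactly the hazard described above.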

Storing paths as wchar_t in the first place avoids both the cost of
converting back and forth and the potential corruption (often overlooked,
unless you regularly test with Unicode paths) from accidentally
forgetting a conversion.

> (where is the signature of fopen that accepts a filesystem::path?)

Why are you using fopen in C++ in the first place?

Filesystem does provide 'path' overloads for fstreams, which you should
have been using instead anyway.

> It should be utf-8 only, use Plain Old char (even on Windows), it should
> be completely portable, except that it requires that directories are
> possible and that the filesystem isn't weird (I don't really care
> about compatibility with grandpa's EPROMs that can hold 9-bit flat
> files).

In theory, the standard library (and other wrapper libraries around the
WinAPI, including Filesystem) could start doing more sane things by
using the C++20 'char8_t'/'u8string' types to disambiguate between UTF-8
encoded paths and legacy idkwtf-'char'-encoded paths. But this will
take a very long time to percolate through the ecosystem, especially as
there are a bunch of people who hate the very idea of it. And it
doesn't solve the conversion performance angle.

(Hopefully, Windows will eventually provide char8_t entrypoints and
APIs, which will make it easier to interoperate with not-Windows.)

Although as Emil has already pointed out, it's valid in not-Windows to
have arbitrary not-UTF-8 byte sequences in paths, so you can get into
trouble in that direction as well.

That's another reason for using wchar_t in Windows and char in
not-Windows: no conversions happen at all (at least where values are
accepted natively from the OS), which has maximal compatibility for
otherwise-invalid byte sequences that nevertheless exist.

Daniela Engert via Boost

Aug 16, 2022, 2:33:32 AM

to bo...@lists.boost.org, Daniela Engert, Gavin Lambert

Amen brother, you speak wisely!

I want to add the following to stay sane on Windows: ensure that *both*
the wide and the narrow execution character encodings are Unicode (i.e.
UTF-16 for wchar_t (that's the default) and UTF-8 for char), build with
_UNICODE defined, and embed an application manifest containing
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>.
This guarantees consistent semantics throughout the *whole* execution of
the program on reasonably recent versions of Windows. And lastly,
represent paths with std/boost filesystem paths and use APIs that know
how to deal with them *correctly*.

Similar advice applies to POSIX systems. UTF-8 everywhere is just a
recommendation there, not a guarantee.

Dani

Alexander Grund via Boost

Aug 16, 2022, 3:01:00 AM

to bo...@lists.boost.org, Alexander Grund

On 15.08.22 at 23:10, Peter Dimov via Boost wrote:

> Andrey Semashev wrote:
>> On 8/15/22 22:35, Peter Dimov via Boost wrote:
>>> The usual approach is to put
>>>
>>> boost::filesystem::path::imbue( std::locale( std::locale::classic(),
>>> new boost::filesystem::detail::utf8_codecvt_facet ) );
>> I'd rather
>>
>> boost::filesystem::path::imbue(boost::locale::generator()("C.UTF-8"));
>>
>> Don't recommend users to use stuff from namespace detail. :)
> Sound advice in principle but that's pretty much the established practice
> at this point. Perhaps we should expose a non-detail way to perform the
> above, such as provide a function "set_utf8_as_default".

Is it? I see `boost::locale::generator().generate("")` being used more
often, e.g. (quick Google search):
http://doc.aldebaran.com/1-14/ref/libqi/boost_filesystem_example_8cpp-example.html

And there is `boost::nowide::nowide_filesystem()` doing pretty much what
you suggest:
https://github.com/boostorg/nowide/blob/master/include/boost/nowide/filesystem.hpp#L20

On 16.08.22 at 00:21, Andrey Semashev via Boost wrote:
> On 8/16/22 01:07, Peter Dimov via Boost wrote:

>> Andrey Semashev wrote:
>>> Perhaps, move utf8_codecvt_facet to Utility or Core?
>> Maybe it's time. And make it all inline?
> Yes.
>
> Although there's this already:
>
> https://github.com/boostorg/locale/blob/develop/include/boost/locale/utf8_codecvt.hpp

And again in Nowide:
https://github.com/boostorg/nowide/blob/master/include/boost/nowide/utf8_codecvt.hpp

On 16.08.22 at 01:53, Vinnie Falco via Boost wrote:
> (where is the signature of fopen that accepts a filesystem::path?).
That's outside of Nowide's purpose: it is meant to accept narrow strings,
UTF-8 encoded where applicable.

Andrey Semashev via Boost

Aug 16, 2022, 3:41:31 AM

to bo...@lists.boost.org, Andrey Semashev

On 8/16/22 03:08, Emil Dotchevski via Boost wrote:
> On Mon, 15.08.2022 at 16:54, Vinnie Falco via Boost <
> bo...@lists.boost.org> wrote:
>
>> it should
>> be completely portable, except that it requires that directories are
>> possible and that the filesystem isn't weird (I don't really care
>> about compatibility with grandpa's EPROMs that can hold 9-bit flat
>> files).
>
> AFAIK on posix it is usual that file names are simply zero terminated byte
> sequences, which means that UTF-8 won't be _completely_ portable, and not
> only on grandpa's EPROMs.

In practice, paths are in UTF-8 on all modern POSIX systems.

Peter Dimov via Boost

Aug 16, 2022, 3:45:04 AM

to bo...@lists.boost.org, Peter Dimov

Gavin Lambert wrote:
> Using wchar_t on Windows is actually the least painful option. (And you don't
> have to worry about locales and imbuements etc if you never try to convert to
> not-wchar_t.)

That's only if your program never runs on anything else. For portable code,
using char and UTF-8 is the least painful option. We have an entire library
in Boost for this purpose, whose documentation does a reasonable job
explaining that.

https://www.boost.org/doc/libs/1_80_0/libs/nowide/doc/html/index.html

Andrey Semashev via Boost

Aug 16, 2022, 3:47:20 AM

to bo...@lists.boost.org, Andrey Semashev

On 8/16/22 02:53, Vinnie Falco via Boost wrote:
>
> (where is the signature of fopen that accepts a filesystem::path?).

I was actually thinking of adding one in Boost.Filesystem but haven't
gotten around to doing it.

In the meantime, you can use boost::filesystem::fstream.

Niall Douglas via Boost

Aug 16, 2022, 11:19:50 AM

to bo...@lists.boost.org, Niall Douglas

On 16/08/2022 00:53, Vinnie Falco via Boost wrote:
> On Mon, Aug 15, 2022 at 4:46 PM Gavin Lambert via Boost
> <bo...@lists.boost.org> wrote:
>> ...
>
> My experiences with std::filesystem and boost::filesystem have been
> nothing but negative. I think that the decision to make the character
> type different on Windows was a mistake. The need for locales and
> imbuements and global state and... really, it is just giving me a big
> headache.
>
> I think there is room for a new library that handles files,
> directories, file metadata, and also has some features of nowide
> (where is the signature of fopen that accepts a filesystem::path?). It
> should be utf-8 only, use Plain Old char (even on Windows), it should
> be completely portable, except that it requires that directories are
> possible and that the filesystem isn't weird (I don't really care
> about compatibility with grandpa's EPROMs that can hold 9-bit flat
> files).

With the proposed std::filesystem::path_view you will be able to use UTF-8
everywhere once it is in the standard. The committee has signed off on the
design; I just need to get it through LWG and it will be in the next IS.

I just need non-work free time to reappear to get back to working on my
standards papers. It'll happen eventually.

Niall

Tom Honermann via Boost

Aug 16, 2022, 11:21:58 AM

to bo...@lists.boost.org, Tom Honermann

On 8/16/22 3:41 AM, Andrey Semashev via Boost wrote:
> On 8/16/22 03:08, Emil Dotchevski via Boost wrote:
>> On Mon, 15.08.2022 at 16:54, Vinnie Falco via Boost <
>> bo...@lists.boost.org> wrote:
>>
>>> it should
>>> be completely portable, except that it requires that directories are
>>> possible and that the filesystem isn't weird (I don't really care
>>> about compatibility with grandpa's EPROMs that can hold 9-bit flat
>>> files).
>> AFAIK on posix it is usual that file names are simply zero terminated byte
>> sequences, which means that UTF-8 won't be _completely_ portable, and not
>> only on grandpa's EPROMs.
> In practice, paths are in UTF-8 on all modern POSIX systems.

Could you please provide a citation for that?

I have experience with people continuing to use Shift-JIS for file names
in recent-ish times.

Tom.

Andrey Semashev via Boost

Aug 16, 2022, 12:21:51 PM

to bo...@lists.boost.org, Andrey Semashev

On 8/16/22 18:15, Tom Honermann wrote:
> On 8/16/22 3:41 AM, Andrey Semashev via Boost wrote:
>> On 8/16/22 03:08, Emil Dotchevski via Boost wrote:
>>> On Mon, 15.08.2022 at 16:54, Vinnie Falco via Boost <
>>> bo...@lists.boost.org> wrote:
>>>
>>>> it should
>>>> be completely portable, except that it requires that directories are
>>>> possible and that the filesystem isn't weird (I don't really care
>>>> about compatibility with grandpa's EPROMs that can hold 9-bit flat
>>>> files).
>>> AFAIK on posix it is usual that file names are simply zero terminated
>>> byte
>>> sequences, which means that UTF-8 won't be _completely_ portable, and
>>> not
>>> only on grandpa's EPROMs.
>> In practice, paths are in UTF-8 on all modern POSIX systems.
>
> Could you please provide a citation for that?
>
> I have experience with people continuing to use Shift-JIS for file names
> in recent-ish times.

That's just my experience. I haven't seen a non-UTF-8 POSIX system in
decades.

Gavin Lambert via Boost

Aug 16, 2022, 6:31:01 PM

to bo...@lists.boost.org, Gavin Lambert

On 16/08/2022 19:44, Peter Dimov wrote:
> Gavin Lambert wrote:
>> Using wchar_t on Windows is actually the least painful option. (And you don't
>> have to worry about locales and imbuements etc if you never try to convert to
>> not-wchar_t.)
>
> That's only if your program never runs on anything else. For portable code,
> using char and UTF-8 is the least painful option. We have an entire library
> in Boost for this purpose, whose documentation does a reasonable job
> explaining that.

Currently, yes. In theory, though, you could adopt a TCHAR-like
approach where you use wchar_t on Windows and char/char8_t on
not-Windows, selected at compile time. This would avoid all conversions
and just use the native character type of the OS, which would be better.
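
As a rough sketch of what such a compile-time selection could look like:
the names native_char, native_string, NATIVE_TEXT and with_extension below
are all hypothetical, made up for illustration, and not provided by the
standard library or by Boost.

```cpp
#include <string>

// Hypothetical TCHAR-like alias: the character type is chosen per platform
// at compile time, so no encoding conversion is ever performed.
#if defined(_WIN32)
using native_char = wchar_t;      // Windows W APIs take wchar_t strings
#define NATIVE_TEXT(s) L##s       // wide literal on Windows
#else
using native_char = char;         // POSIX APIs take plain byte strings
#define NATIVE_TEXT(s) s          // narrow literal elsewhere
#endif

using native_string = std::basic_string<native_char>;

// Written once; compiled against whichever character type the platform uses.
inline native_string with_extension(const native_string& stem,
                                    const native_string& ext)
{
    return stem + NATIVE_TEXT(".") + ext;
}
```

Every literal has to go through the macro and every function has to take
the alias, which is where the recompile-everything cost comes from.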

(Windows has its own version of the invalid characters problem -- it's
legal to have mismatched surrogates in filenames, which work fine as
long as you keep everything in wchar_t UCS-2 and never convert it, but
break if you convert to UTF-8 and back. It's probably less common than
not-UTF-8 non-Windows filenames, though.)
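
To make the mismatched-surrogate case concrete, here is a small sketch in
standard C++ that detects names which cannot survive a strict round trip
through UTF-8. The function name is_well_formed_utf16 is hypothetical, and
char16_t stands in for the 16-bit wchar_t used on Windows.

```cpp
#include <cstddef>
#include <string>

// Returns true if every surrogate code unit in `s` belongs to a
// well-formed high/low pair, i.e. `s` is valid UTF-16.
// Hypothetical helper, written for illustration only.
bool is_well_formed_utf16(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= s.size()) return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i; // skip the low half of the pair
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            // A low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

A name for which this returns false is still a usable file name on Windows,
but it has no valid UTF-8 representation.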

The downside is that you need every single bit of code to either use
this TCHAR type (which in turn means that you need to be able to
recompile everything), or (better) to provide overloads for all possible
underlying types (with the same name, so that the actual code is spelled
the same either way), and some usages may need macros or char_traits
etc. (But then that tends to lead to either code duplication or
over-templating, neither of which is good.)

Ideally, the standard library would have defined such a
platform-specific type alias (notably, not actually a distinct type, so
that existing overloads work), which would have made it easier to build
up libraries around it, or at least encourage writing both overloads.
Or the language would define some kind of compile-time-variant that
permits separate-translation-unit implementation of overloaded types
that have the "same" implementation without header-only templates.
Sadly that hasn't happened yet.

Andrey Semashev via Boost

Aug 29, 2022, 5:58:15 PM

to bo...@lists.boost.org, Andrey Semashev

Actually, I was incorrect on this one, at least with respect to
std::filesystem::path. It only accepts iterators whose value_type
is one of the path character types.
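
A minimal sketch of the iterator-based form with std::filesystem::path,
assuming a C++17 compiler. The helper path_from_list is hypothetical; the
point is only that the container is passed as an iterator pair whose
value_type is char, not as the container itself.

```cpp
#include <filesystem>
#include <list>
#include <string>

// Builds a path from a non-string container by handing the constructor an
// iterator pair. The iterators' value_type (char) is one of the character
// types std::filesystem::path accepts.
std::filesystem::path path_from_list(const std::list<char>& chars)
{
    return std::filesystem::path(chars.begin(), chars.end());
}
```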
