It does not meet the requirements on Windows, and never did -
path::value_type is wchar_t there.
Also, you don't define the semantics of the operations, in particular
what the expected result is. Regardless of the platform, path::append
does not merely append characters; it appends a path, which may result
in inserting a directory separator or in completely replacing the path
if the argument is an absolute path. Depending on what exactly you mean
by "Appends the characters", this may or may not qualify.
I should also note that filesystem::path is not a container either,
because path::value_type differs from the value_type of path::iterator
(which is path itself).
> std::filesystem requirements[2] actually restrict Source arguments to
> std string types, iterators (read: pointers) to C-style strings and
> arrays of characters (which are interpreted as C-style strings).
Yes. That sounds right. It looks like boost::filesystem now matches the
behavior of std::filesystem, even if it is more restrictive
(https://godbolt.org/z/5WYYce6v8).
Boost.URL shouldn't use it, and that's for the best, since std::filesystem
doesn't support it either.
Non-contiguous ranges that dereference to char work with both std:: and
boost::filesystem if we use the `append(InputIterator begin, InputIterator
end)` overload.
I think that's the source of confusion here. What C++ says about the
`append(Source const& source)` overload is even more misleading.
(2) and (3) participate in overload resolution only if Source and path are
not the same type, and either:
- Source is a specialization of std::basic_string or std::basic_string_view, or
- std::iterator_traits<std::decay_t<Source>>::value_type is valid and denotes
  a possibly const-qualified encoding character type (char, char8_t (since
  C++20), char16_t, char32_t, or wchar_t).
We assumed both overloads should work because of this second condition.
> If you're assigning a list to a path, most
> likely you are doing something wrong.
Yes. But `append(InputIterator begin, InputIterator end)` would still let
users do this wrong thing.
And `append(InputIterator begin, InputIterator end)` doesn't look like it's
always wrong.
Two obvious use cases could be (i) appending paths from resource trees or
(ii) some std::ranges::views::... that transforms the input into the chars
representing a path segment for that input.
If `append(InputIterator begin, InputIterator end)` is not wrong, then
`append(Source const& source)` should be no more wrong when Source is
just the range holding the iterators passed to the first overload.
In any case, both are still dangerous. Boost.URL and other libraries
shouldn't count on them.
As Peter mentioned, things like wstring and u16string could be appended,
but the semantics would probably be wrong: they would be converted char
by char, without regard to encoding.
Thank you for the explanation.
On Mon, Aug 15, 2022 at 14:24, Andrey Semashev via Boost <
bo...@lists.boost.org> wrote:
--
Alan Freitas
https://github.com/alandefreitas
> it should
> be completely portable, except that it requires that directories are
> possible and that the filesystem isn't weird (I don't really care
> about compatibility with grandpa's EPROMs that can hold 9-bit flat
> files).
>
AFAIK, on POSIX file names are usually just null-terminated byte
sequences, which means that UTF-8 won't be _completely_ portable, and not
only on grandpa's EPROMs.
Amen, brother, you speak wisely!
I want to add the following to stay sane on Windows: ensure that *both*
the wide and the narrow execution character encodings are Unicode (i.e.
UTF-16 for wchar_t (that's the default) and UTF-8 for char), build with
_UNICODE defined, and link with an application manifest containing
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>.
This guarantees consistent semantics throughout the *whole* execution of
the program on reasonably recent versions of Windows. And lastly,
represent paths with std/boost filesystem paths and use APIs that know
how to deal with them *correctly*.
Similar advice applies to POSIX systems, where UTF-8 everywhere is just a
recommendation, not a guarantee.
Dani
In practice, paths are in UTF-8 on all modern POSIX systems.
Could you please provide a citation for that?
I have experience with people continuing to use Shift-JIS for file names
in recent-ish times.
Tom.
That's just my experience. I haven't seen a non-UTF-8 POSIX system in
decades.