On Saturday, 23 April 2016 12:16:23 PDT, Andrew Tomazos wrote:
> Please find attached a draft 3-page proposal entitled "Proposal of File
> String Literals".
Any thought about what to do if the input file isn't the exact binary data you
want? For example, suppose you need to encode or decode base64.
Can you show this can be done with constexpr expressions?
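(One possible shape of an answer, as a rough sketch that is not part of
the proposal: decode at compile time with a constexpr function, here in
C++17, assuming the proposed F"..." literal yields an ordinary char
array.)

#include <array>
#include <cstddef>

constexpr int b64_value(char c)
{
    // Map one base64 digit to its 6-bit value; -1 for '=' or invalid.
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode groups of four digits into three bytes. N counts the literal's
// trailing NUL; '=' padding is ignored to keep the sketch short.
template <std::size_t N>
constexpr std::array<unsigned char, ((N - 1) / 4) * 3>
b64_decode(const char (&in)[N])
{
    std::array<unsigned char, ((N - 1) / 4) * 3> out{};
    for (std::size_t i = 0, o = 0; i + 3 < N - 1; i += 4, o += 3) {
        const int v = (b64_value(in[i]) << 18) | (b64_value(in[i + 1]) << 12)
                    | (b64_value(in[i + 2]) << 6) | b64_value(in[i + 3]);
        out[o]     = static_cast<unsigned char>((v >> 16) & 0xFF);
        out[o + 1] = static_cast<unsigned char>((v >> 8) & 0xFF);
        out[o + 2] = static_cast<unsigned char>(v & 0xFF);
    }
    return out;
}

static_assert(b64_decode("TWFu")[0] == 'M'); // "TWFu" decodes to "Man"
// With the proposal, b64_decode(F"blob.b64") would work the same way.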
I can see how the proposed feature would be useful in many contexts and
provide a clean solution to what is often handled in a messy way today.
I'm wondering why you have decided to handle a pre-processing action
with a syntax that doesn't look like one. I remember having seen
somebody discuss a similar feature some time ago that looked something
like this:
#inplace char data "datafile.txt"
#include <options> <file-specifier/pp-token>
auto the_string = #include text "somefile.txt";

On 2016-04-23 19:27, Moritz Klammler wrote:
> I'm wondering why you have decided to handle a pre-processing action
> with a syntax that doesn't look like this. I remember having seen
> somebody discuss a similar feature some time ago that looked somehow
> like this.
>
> #inplace char data "datafile.txt"
I was also wondering about that. In particular, I'll note that tools
that need to do non-compiler-assisted dependency scanning may have an
easier time with this format.
> Another question that should be discussed is whether and how to support
> non-text data.
No, that's not a question. That's a hard requirement ;-). I'm not sure
the proposal doesn't cover this, though?
auto binary_data = F"image.png"; // char[]
auto binary_size = sizeof(binary_data) / sizeof(*binary_data);
auto image = parse_png(binary_data, binary_size);
That said... I also wonder if being able to select between text vs.
binary mode is important, especially if importing large text blocks is
really a desired feature? (Most of the use cases I've seen have been for
binary files such as images. Andrew's proposal seems to imply uses for
text.)
On 2016-04-25 13:17, Andrew Tomazos wrote:
> On Mon, Apr 25, 2016 at 5:24 PM, Matthew Woehlke wrote:
>> On 2016-04-23 19:27, Moritz Klammler wrote:
>>> Another question that should be discussed is whether and how to support
>>> non-text data.
>>
>> No, that's not a question. That's a hard requirement ;-). I'm not sure
>> the proposal doesn't cover this, though?
>>
>> auto binary_data = F"image.png"; // char[]
>> auto binary_size = sizeof(binary_data) / sizeof(*binary_data);
>> auto image = parse_png(binary_data, binary_size);
>
> I don't think that is portable, or advisable. In particular source
> files are decoded in an implementation-defined manner during
> translation. Even if your source encoding was the same as the
> execution encoding, the implementation may still reject arbitrary
> binary sequences which are not valid for that encoding.
>
> While we might be able to extend the proposal to make this work, I
> think the better way [...]
In that case, I am Strongly Against your proposal. I probably, on some
occasions, want this feature for text. I *definitely* want it for binary
resources, and much more often than I might want it for text.
I think you are missing a significant and important use case, and, if
you don't account for that case, the feature is just begging to be
misused and abused and subject to confusion and surprise breakage.
On Mon, Apr 25, 2016 at 7:45 PM, Nicol Bolas <jmck...@gmail.com> wrote:
> On Monday, April 25, 2016 at 1:38:05 PM UTC-4, Matthew Woehlke wrote:
>> On 2016-04-25 13:17, Andrew Tomazos wrote:
>>> On Mon, Apr 25, 2016 at 5:24 PM, Matthew Woehlke wrote:
>>>> On 2016-04-23 19:27, Moritz Klammler wrote:
>>>>> Another question that should be discussed is whether and how to
>>>>> support non-text data.
>>>>
>>>> No, that's not a question. That's a hard requirement ;-). I'm not sure
>>>> the proposal doesn't cover this, though?
>>>>
>>>> auto binary_data = F"image.png"; // char[]
>>>> auto binary_size = sizeof(binary_data) / sizeof(*binary_data);
>>>> auto image = parse_png(binary_data, binary_size);
>>>
>>> I don't think that is portable, or advisable. In particular source
>>> files are decoded in an implementation-defined manner during
>>> translation. Even if your source encoding was the same as the
>>> execution encoding, the implementation may still reject arbitrary
>>> binary sequences which are not valid for that encoding. [...]
> If you give people the ability to include files as strings, people are
> going to use it for including binary files. That is guaranteed. So the
> only options are to have it cause subtle breakage or to properly
> support it.

Ok, you've convinced me. We can add a new encoding-prefix "b" for binary.
So it would be:

auto binary_data = bF"image.png";

I'd need to think about the wording; it will probably be
implementation-defined, with a note saying roughly that the data should
undergo no decoding or encoding from source to execution.
On Fri, Apr 29, 2016 at 8:01 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Friday, April 29, 2016 at 1:16:01 PM UTC-4, Andrew Tomazos wrote:
On Fri, Apr 29, 2016 at 6:01 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Thursday, April 28, 2016 at 5:39:09 PM UTC-4, Arthur O'Dwyer wrote:

> I was just thinking before your post that this has shades of the icky
> "mode" behavior of fopen(); i.e., it's up to the programmer (and
> therefore often buggy) whether the file is opened in "b"inary or "t"ext
> mode. What makes for the "often buggy" part is that the
> path-of-least-resistance happens to work perfectly on Unix/Linux/OSX,
> and therefore the vast majority of working programmers never need to
> learn the icky parts.
>
> What happens on a Windows platform when I write
>
> const char data[] = R"(
> )";
>
> ? Does data come out equivalent to "\n" or to "\r\n"? Does it depend on
> the compiler (MSVC versus Clang) or not? I don't have a Windows machine
> to find out for myself, sorry.
According to the standard:
> A source-file new-line in a raw string literal results in a new-line in
> the resulting execution string-literal.
So it would be a `\n`, not a `\r\n`.
Granted, the above quote is not in normative text (presumably because section 2.14.3 makes it more clear). But clearly that is the intent of the specification. So if VS doesn't do that correctly, then it's broken.
And since VS has had raw string literals for a while now, odds are good
someone would have noticed it if they did it wrong.

Not so fast:

> Physical source file characters are mapped, in an implementation-defined
> manner, to the basic source character set (introducing new-line
> characters for end-of-line indicators) if necessary. The set of physical
> source file characters accepted is implementation-defined.

This mapping is commonly known as the "source encoding". As part of the
source file, the contents of raw string literals are input, likewise, in
the source encoding.
Considering that the specification has non-normative examples of raw
string literals and their non-raw equivalents, and those examples
explicitly show that a "source encoding" newline should be equivalent to
"\n", then clearly the writers of the specification believe that the
conversion is not implementation-dependent. So either your interpretation
or their interpretation of the spec is wrong.

The examples you are referring to do not show how the new-line is encoded
in the original physical source file.
They cannot, unless they show the hex dump of the bytes of the physical
source file. The examples just show that the "text" new line in the raw
string literal (after decoding) can be equivalent to an escaped new line
'\n' in an ordinary string literal. I think the motivation of the example
was just to show that raw string literals can contain embedded new lines
(unlike ordinary string literals) - among other things.
Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
In Table 7 it says '\n' maps to NL(LF) and that '\r' maps to CR, and offers no further definition of what NL(LF) and CR are.
I assume NL(LF) is the new line character that is a member of the basic source character set, and that CR is the carriage return that is a member of the basic execution character set. I think these basic source character set new lines are the same "new-line characters" referred to in the "introducing new-line characters for end-of-line indicators" during phase 1 source decoding.
On 2016-04-28 17:39, Arthur O'Dwyer wrote:
> Either way, your proposal should include an example along the lines of
>
> const char example[] = RF"foo(bar.h)foo";
>
> Does this mean "include bar.h", or "include foo(bar.h)foo" — and why?
Certainly the latter; anything else is just overly complicating things
to no benefit.
If you really need a string like "foo" + contents of 'bar.h' + "foo",
use concatenation:
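Something like this, assuming the F literal participates in ordinary
string-literal concatenation:

const char example[] = "foo" F"bar.h" "foo";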
Ah... yes, I interpret the original example as being the same as
`F"foo(bar.h)foo"`, with the `R` serving only to specify binary vs. text
include mode. (As Nicol notes, this may be a good reason to use
something other than `R` for that purpose. Maybe we should use `Ft` and
`Fb` instead, with `F` by itself being a synonym for `Ft`?)
const char *str = "SomeLiteral";
const unsigned char var[] = bF"Filename.bin";
const unsigned char* pVar = var;

const unsigned char* pVar = bF"Filename.bin";

On 2016-05-04 11:22, Nicol Bolas wrote:
> Actually, something just occurred to me about `bF`. Namely, NUL termination.
>
> All of the genuine string literals should be NUL terminated, since that's
> how we expect literals to behave. But `bF` shouldn't be NUL terminated.
> So... what do we do?
I'm not sure that's genuinely a problem¹. Many file formats, even
binary, are likely tolerant of a "stray" NUL at the end, and even if
not, I can't think how you would use such a string without specifying
the length, in which case it would be trivial to subtract 1.
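That is, something like this (a sketch, using the bF spelling floated
above and assuming the array ends up NUL-terminated like other literals):

const unsigned char blob[] = bF"image.png";
constexpr auto blob_size = sizeof(blob) - 1; // drop the trailing NUL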
On Wed, May 4, 2016 at 8:08 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> On Wednesday, May 4, 2016 at 10:23:17 AM UTC-4, Matthew Woehlke wrote:
>> Ah... yes, I interpret the original example as being the same as
>> `F"foo(bar.h)foo"`, with the `R` serving only to specify binary vs. text
>> include mode. (As Nicol notes, this may be a good reason to use
>> something other than `R` for that purpose. Maybe we should use `Ft` and
>> `Fb` instead, with `F` by itself being a synonym for `Ft`?)
> We need more than just `t` and `b` here. We need to be able to use the
> full range of encodings that C++ provides for string literals: narrow,
> wide, UTF-8, UTF-16, and UTF-32.
>
> So `u8F` would mean that the file is encoded in UTF-8, so the generated
> literal should match. I would prefer to avoid `Fu8`, because that makes
> `Fu8"filename.txt"` seem like the `u8` applies to the filename literal
> rather than the generated one.
>
> We need to have: `F` (narrow), `LF` (wide), `u8F` (UTF-8), `uF`
> (UTF-16), `UF` (UTF-32), and `bF` (no translation).

Incorrect (for once ;)). The prefixes u8, u, U, and L don't apply to the
encoding of the *source code*. They apply to the encoding of the
*runtime data*.
On Wednesday, May 4, 2016 at 4:09:34 PM UTC-4, Arthur O'Dwyer wrote:
> On Wed, May 4, 2016 at 8:08 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>> On Wednesday, May 4, 2016 at 10:23:17 AM UTC-4, Matthew Woehlke wrote:
>>> Ah... yes, I interpret the original example as being the same as
>>> `F"foo(bar.h)foo"`, with the `R` serving only to specify binary vs.
>>> text include mode. [...]
>>
>> We need more than just `t` and `b` here. We need to be able to use the
>> full range of encodings that C++ provides for string literals: narrow,
>> wide, UTF-8, UTF-16, and UTF-32. [...]
>>
>> We need to have: `F` (narrow), `LF` (wide), `u8F` (UTF-8), `uF`
>> (UTF-16), `UF` (UTF-32), and `bF` (no translation).
>
> Incorrect (for once ;)). The prefixes u8, u, U, and L don't apply to
> the encoding of the *source code*. They apply to the encoding of the
> *runtime data*.
So you're saying that you will always be limited to the source character set. So... what if the source character set doesn't include all of the characters you want to use? What if you have a Unicode-encoded file you want to include?
In a regular C++ file, you can work around this by escaping characters in string literals. C++11 allows you to do `u8"\u4321"`, and it will convert that Unicode code point into a UTF-8 sequence.
How do I do the same with file input? There seem to be 3 alternatives:
1: Escape sequences in the included file are processed as though they were in a non-raw string literal. That's... bad. I'm pretty sure most people don't want to have escape sequences work that way, especially if they're including text for other languages like scripting languages.
2: No escape sequences are allowed, which means that the character data for inclusions is limited to only the implementation-defined source character set. No Unicode characters, nada.
3: The user has the ability to tell the compiler what character set is being read at the inclusion point.
#3 is what I am referring to with those prefixes.
So we seem to have two semi-orthogonal dimensions of options: the format of the source file and the desired format of the converted string literal. Source files can be:
- Source character set
- Unicode, UTF-8
- Unicode, UTF-16
- Unicode, UTF-32
- Binary, no translation.
Alternatively, we can make the Unicode reading a specialized form of binary loading. So if you say `u8bF`, what you're saying is that no text translation will be done, but the file's data will be assumed to be a UTF-8 string. Similarly, with `ubF`, the file will be read as a UTF-16 string with no text translation.
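Spelled out (illustrative only, these prefixes being this thread's
suggestion rather than accepted syntax):

auto a = u8bF"msg.txt"; // raw bytes, asserted to be UTF-8 text
auto b = ubF"msg.txt";  // raw bytes, asserted to be UTF-16 text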
The only use cases with this that would be missed are:
- Unicode transcoding. Reading a UTF-16 file and re-encoding it as UTF-8. Probably not a compelling use case.
- Text translation with Unicode files. That is, if you have a UTF-8 file that contains platform-specific newlines and you want it translated to platform-neutral ones.
My current design has two input formats. Binary and text. The text encoding format is the usual source encoding for your source files (the overwhelming default these days is UTF-8). If you have a text file in a different encoding, then you can transcode it into source encoding within your source files. It will then undergo the usual source-to-execution transcoding that the bodies of raw string literals undergo.
Beyond these two options, I do not think further built-in support for a specific set of different source text encoding formats is necessary.
I would like to be able to write a constexpr function f such that both f(F"foo") and f(bF"foo") will work and properly get the range of bytes that the two literals present.
Ultimately I would like this to work:

constexpr string_view sv = bF"foo";
On 2016-05-05 11:18, Tom Honermann wrote:
> On 5/5/2016 11:15 AM, Matthew Woehlke wrote:
>> On 2016-05-05 11:01, Tom Honermann wrote:
>>> On 5/5/2016 10:52 AM, Matthew Woehlke wrote:
>>>> Personally, I'd prefer if the source (if text) is required to be in
>>>> Unicode with a BOM...
>>> Please, no. On non-ASCII based systems, forcing Unicode would be a
>>> significant burden to users.
>> Uh, you *do* realize that ASCII is a subset of UTF-8, yes? Requiring
>> input files to be Unicode with BOM (no BOM → UTF-8) would be fully
>> compatible with ASCII-conforming input files.
>>
> I said on *non*-ASCII based systems.
Ah, sorry, misread :-).
But in that case, Andrew's suggestion (just read
it as source encoding, and if you need something else, "too bad"¹)
works. I'm more inclined to that anyway.
(¹ Use compile-time text processing for this case if you simply *must*
have it.)
On Wednesday, May 4, 2016 at 10:35:55 PM UTC-4, Andrew Tomazos wrote:
> My current design has two input formats. Binary and text. The text
> encoding format is the usual source encoding for your source files (the
> overwhelming default these days is UTF-8). If you have a text file in a
> different encoding, then you can transcode it into source encoding
> within your source files. It will then undergo the usual
> source-to-execution transcoding that the bodies of raw string literals
> undergo.
Source encodings are not required to be able to support all of Unicode. And without escape characters and `\U`, I have no way to "transcode it into source encoding" because the source encoding cannot support it. If I have a Unicode-encoded file under your rules, the only thing I can do is load it as binary. And thanks to cross-compiling, the executable environment may have a different endianness than the source environment.
> Beyond these two options, I do not think further built-in support for a
> specific set of different source text encoding formats is necessary.
Internationalization is not optional. Nor is cross-compilation support.
> I would like to be able to write a constexpr function f such that both
> f(F"foo") and f(bF"foo") will work and properly get the range of bytes
> that the two literals present.
... why? Generally speaking, functions that process text and functions
that process binary data are different functions. Unless your binary data
actually is text, but that doesn't really make sense.

> Ultimately I would like this to work:
>
> constexpr string_view sv = bF"foo";
Why? `string_view`, as the name suggests, is for strings. The class you want is called `span<unsigned char>`. That's how we spell "array of arbitrary bytes" in C++.
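In other words, something like this (a sketch; std::span is the C++20
spelling, gsl::span at the time of this thread):

#include <span>

constexpr unsigned char raw[] = {0xDE, 0xAD, 0xBE, 0xEF}; // stand-in for bF"foo"
constexpr std::span<const unsigned char> view{raw};       // bytes, not text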
On Thu, May 5, 2016 at 5:26 PM, Nicol Bolas <jmck...@gmail.com> wrote:
> On Wednesday, May 4, 2016 at 10:35:55 PM UTC-4, Andrew Tomazos wrote:
>> My current design has two input formats. Binary and text. The text
>> encoding format is the usual source encoding for your source files
>> (the overwhelming default these days is UTF-8). [...]
> Source encodings are not required to be able to support all of Unicode.
> And without escape characters and `\U`, I have no way to "transcode it
> into source encoding" because the source encoding cannot support it. If
> I have a Unicode-encoded file under your rules, the only thing I can do
> is load it as binary. And thanks to cross-compiling, the executable
> environment may have a different endianness than the source environment.

It's very rare to be cross-compiling from a build system that is less
powerful than the target system in this fashion.
>> Ultimately I would like this to work:
>>
>> constexpr string_view sv = bF"foo";
>
> Why? `string_view`, as the name suggests, is for strings. The class you
> want is called `span<unsigned char>`. That's how we spell "array of
> arbitrary bytes" in C++.

People use string_view to address ranges of "raw memory", and not just
for text.
That given, supporting that small (or perhaps even nonexistent) group with the DIY constexpr/binary option seems more reasonable than introducing the complexity of multi-source-encoding into the feature.
On Thursday, May 5, 2016 at 12:10:25 PM UTC-4, Andrew Tomazos wrote:
On Thu, May 5, 2016 at 5:49 PM, Nicol Bolas wrote:

It's silly to give up now that we're so close to a functional design. You
pointed out that you can identify Unicode-encoded files by their BOMs. So
instead of the giant number of sources, we actually only have:
- Source character set
- Unicode, in a format as identified by BOMs
- Binary
`F` would mean source character set. `Fb` would mean binary. And `Fu` would mean Unicode, as identified by BOMs. You can still apply the encoding prefixes to the non-binary forms, and indeed you must provide one for `Fu`. For example:
- u8F: Read source character set, convert to UTF-8.
- uFu: Read Unicode text, convert to UTF-16.
And so forth.
`Fu` would specifically mean:
- Platform-specific text translation (new-lines and so forth).
- BOM to identify the source Unicode encoding and endian. Lack of BOM automatically means UTF-8, but the UTF-8 BOM will also mean UTF-8. BOM is stripped out.
- NUL-terminated.
The encoding prefix allows cross-Unicode conversion. So if you have a UTF-16 text file and you want to store it as UTF-8, you use `u8Fu`.
It's a simple and elegant solution to the source encoding problem.
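As code (illustrative spellings only):

auto a = u8Fu"notes.txt"; // BOM-identified Unicode in, stored as UTF-8
auto b = uFu"notes.txt";  // no BOM, so UTF-8 in, stored as UTF-16
auto c = u8F"notes.txt";  // source character set in, stored as UTF-8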
> (¹ Use compile-time text processing for this case if you simply *must*
> have it.)
> The problem with that is that you are assuming that Unicode-encoded
> files represent a minor province of what people do.

That's not the claim. The claim is that the number of people using both
an unusual source encoding and a Unicode execution encoding is very
small. I think that is correct.
What is "an unusual source encoding?" Unicode is not "unusual".
> That given, supporting that small (or perhaps even nonexistent) group
> with the DIY constexpr/binary option seems more reasonable than
> introducing the complexity of multi-source-encoding into the feature.
Given that we have no evidence that the "DIY constexpr/binary option" is even possible (for example, how do you detect the endianness of your execution environment?) I see no reason to ignore such use cases.
Also, what "complexity" are we talking about? I added a single suffix to `F`. That's not exactly complex.
>> [...] the source encoding must be unusual.
>>
>> That given, supporting that small (or perhaps even nonexistent) group
>> with the DIY constexpr/binary option seems more reasonable than
>> introducing the complexity of multi-source-encoding into the feature.
>
> Given that we have no evidence that the "DIY constexpr/binary option"
> is even possible (for example, how do you detect the endianness of your
> execution environment?) I see no reason to ignore such use cases.

It works fine. I've done much more complicated things with constexpr
programming than transcoding some text.
As for "detecting the endianness" of your execution environment, you can either get it from your compiler predefined macros / intrinsics (if available) or specify it explicitly as part of your build configuration (setting for example a macro or constexpr variable). It's not a big deal.
> Also, what "complexity" are we talking about? I added a single suffix
> to `F`. That's not exactly complex.

It increases the number of encoding prefixes quadratically for everyone,
to solve what seems to be a small problem for which there are adequate
simpler solutions.
>> [...] the source encoding must be unusual.
>
> ASCII is not "unusual" for a source character set. The default source
> character set for quite a few compilers does not natively handle
> embedded Unicode characters.
In almost all cases the source encoding is UTF-8, and the text files for input to text file literals will already be in that format.
>> [...] the source encoding must be unusual.
>
> ASCII is not "unusual" for a source character set. The default source
> character set for quite a few compilers does not natively handle
> embedded Unicode characters.

I don't think that is true. Please list a few of those "quite a few"
compilers.
On 5/5/2016 2:19 PM, Andrew Tomazos wrote:
> In almost all cases the source encoding is UTF-8, and the text files
> for input to text file literals will already be in that format.

I keep hearing this echoed on various C++ standard mailing lists, but in
my experience, this just isn't true. I think Clang assumes that the
source encoding is UTF-8, but gcc, the Microsoft compiler, IBM's
compilers, etc. use the current locale at the time of compilation to
determine the source encoding in the absence of a UTF-8 BOM (rare), a
#pragma (rare, though somewhat common for IBM compilers, at least in
header files), or an explicit compiler option (also rare, and only very
recently an option for the Microsoft compiler [1]). It is certainly
common for source files to be limited to ASCII and therefore UTF-8
compatible, but I don't think it is fair to state this is true in
"almost all cases".
>>> [...] the source encoding must be unusual.
>>
>> ASCII is not "unusual" for a source character set. The default source
>> character set for quite a few compilers does not natively handle
>> embedded Unicode characters.
>
> I don't think that is true. Please list a few of those "quite a few"
> compilers.
The most obvious example is IBM's z/OS C++ compiler. But again, by default, gcc and Microsoft use the current locale at the time of compilation to determine the source character set. The current locale may specify a character set that is not ASCII compatible; Shift JIS, for example, encodes a yen symbol (¥) at the code point (0x5c) that is used for backslash (\) in ASCII.
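To illustrate the 0x5C collision (hypothetical snippet): under a
Shift-JIS locale, the byte 0x5C displays as the yen sign, so a line the
author reads as containing a yen sign is lexed by the compiler as
containing a backslash:

const char* price = "100¥n"; // author intends "100 yen" followed by 'n';
                             // the compiler sees "100\n": an escape sequence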
Tom.
[1]: https://blogs.msdn.microsoft.com/vcblog/2016/02/22/new-options-for-managing-character-sets-in-the-microsoft-cc-compiler/
If gcc doesn't consult the locale, then there isn't anything to fall
back from (unrecognized -finput-charset operands are rejected). The
test I supplied demonstrated that gcc doesn't reject ill-formed UTF-8,
so I think it is imprecise to state it uses UTF-8 as the default input
encoding. Per my response to Matthew, it looks like it uses wtutf8.
On Saturday, 7 May 2016 12:23:21 PDT, Tom Honermann wrote:
> If gcc doesn't consult the locale, then there isn't anything to fall
> back from (unrecognized -finput-charset operands are rejected). The
> test I supplied demonstrated that gcc doesn't reject ill-formed UTF-8,
> so I think it is imprecise to state it uses UTF-8 as the default input
> encoding. Per my response to Matthew, it looks like it uses wtutf8.
Your test was invalid because it contained invalid UTF-8 sequences and my
editor destroyed them.
This discussion is also going off-topic. GCC behaviour should be discussed in
a GCC mailing list.
Sending files over the network and sharing them with other people on
other operating systems, with possibly different locale encodings, is not
part of the C++ standard.

I'm not joking. The standard doesn't take that into account. Sharing
files with other people and getting the same compilation requires
stepping outside of the standard and into compiler-specific territory.
On Saturday, 7 May 2016 19:48:09 PDT, Nicol Bolas wrote:
> > Your test was invalid because it contained invalid UTF-8 sequences and my
> > editor destroyed them.
>
> An implementation of UTF-8 must also disallow illegal UTF-8 sequences. Just
> like an implementation of C++ must disallow illegal C++.
Right, it "disallowed" the invalid sequences by destroying them. They were
replaced by the replacement character.
> > This discussion is also going off-topic. GCC behaviour should be
> > discussed in a GCC mailing list.
>
> The point of discussing it *here* is to answer a very important question:
> what source character sets are the defaults for compilers, and how many
> don't default to some form of Unicode? If 20% of compilers (by use) default
> to ASCII or whatever, then literal inclusion of text files would be
> significantly hampered due to the inability of people to use it for Unicode
> strings or Internationalization of any kind.
Given the discussion about trigraphs, I'm guessing only IBM currently still
cares about a non-ASCII encoding of source code.
Maybe if we ask Michael Wong directly for his opinion on this matter, we'll
get somewhere.
> > Sending files over the network and sharing them with other people on
> > other operating systems, with possibly different locale encodings, is
> > not part of the C++ standard.
> >
> > I'm not joking. The standard doesn't take that into account. Sharing
> > files with other people and getting the same compilation requires
> > stepping outside of the standard and into compiler-specific territory.
>
> That's true, the standard doesn't provide any such guarantees.
>
> That doesn't mean that this file inclusion mechanism *shouldn't*. After
> all, the whole point of allowing binary inclusion is to allow people to be
> able to store literal binary data cross-platform, *exactly* as it was in
> the file. With `Fb`, each compiler is required to store the same stream of
> `unsigned char` that would have been read from an untranslated `fopen` or
> `iostream` or whatever.
>
> That's as cross-compiler as it gets.
Well, you're assuming that the target platform has bytes the same size as the
host platform that is compiling the source code.
The C++ standard cannot
guarantee that. That means the only possible solution is "implementation-
defined".
I don't see the point in making binary files any more sharable than source code
itself.
On Monday, 9 May 2016 09:12:55 PDT, Ross Smith wrote:
> On 2016-05-09 06:33, Thiago Macieira wrote:
> > This was a comment on the binary include. Suppose you're building on a
> > regular 8-bit-byte platform, targeting a 9-bit-byte platform. How are
> > you going to represent values 256 to 511 in that byte, in your source
> > file, when you do:
> >
> > Fb"data.bin"
>
> This is kind of tangential to the thread, but honestly I think it's time
> we started giving serious consideration to giving up making the C++
> specification tie itself in knots trying to support platforms where a
> byte is not 8 bits.
Agreed. C++14 made it mandatory to be at least 8 bits.