On Saturday, 23 April 2016 12:16:23 PDT Andrew Tomazos wrote:
> Please find attached a draft 3-page proposal entitled "Proposal of File
> String Literals".
Any thought about what to do if the input file isn't the exact binary data you
want? For example, suppose you need to encode or decode base64.
Can you show this can be done with constexpr expressions?
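As a rough illustration of the question above, here is a minimal sketch of base64 decoding done entirely in constant expressions (C++14 or later). The string literal stands in for the contents a file string literal would supply; the names `b64_value`, `b64_decode` and `Decoded` are hypothetical and not part of the proposal, and the letter arithmetic assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: map one base64 character to its 6-bit value,
// or -1 for padding and anything else. Assumes contiguous letters (ASCII).
constexpr int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Fixed-capacity result; the capacity is an upper bound, `size` is the
// number of decoded bytes actually produced.
template <std::size_t Capacity>
struct Decoded {
    unsigned char bytes[Capacity] = {};
    std::size_t size = 0;
};

// Decode a string literal. N counts the literal's terminating NUL,
// so only the first N - 1 characters are examined.
template <std::size_t N>
constexpr Decoded<N> b64_decode(const char (&text)[N]) {
    Decoded<N> out{};
    unsigned buffer = 0;  // pending bits not yet emitted
    int bits = 0;         // number of valid bits in buffer
    for (std::size_t i = 0; i + 1 < N; ++i) {
        const int v = b64_value(text[i]);
        if (v < 0) continue;  // skip '=' padding and line breaks
        buffer = (buffer << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.bytes[out.size++] =
                static_cast<unsigned char>((buffer >> bits) & 0xFFu);
            buffer &= (1u << bits) - 1u;  // keep only the leftover bits
        }
    }
    return out;
}

// Evaluated at compile time; "TWFu" is the base64 encoding of "Man".
constexpr auto decoded = b64_decode("TWFu");
static_assert(decoded.size == 3, "three bytes expected");
static_assert(decoded.bytes[0] == 'M' && decoded.bytes[1] == 'a' &&
              decoded.bytes[2] == 'n', "decodes to \"Man\"");
```

Whether this scales to large embedded files depends on the implementation's constexpr evaluation limits, which this sketch does not address.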
I can see how the proposed feature would be useful in many contexts and
provide a clean solution to what is often handled in a messy way today.
I'm wondering why you have decided to handle a pre-processing action
with a syntax that doesn't look like this. I remember having seen
somebody discuss a similar feature some time ago that looked somehow
like this.
#inplace char data "datafile.txt"
#include <options> <file-specifier/pp-token>
auto the_string = #include text "somefile.txt";
On 2016-04-23 19:27, Moritz Klammler wrote:
> I'm wondering why you have decided to handle a pre-processing action
> with a syntax that doesn't look like this. I remember having seen
> somebody discuss a similar feature some time ago that looked somehow
> like this.
>
> #inplace char data "datafile.txt"
I was also wondering about that. In particular, I'll note that tools
that need to do non-compiler-assisted dependency scanning may have an
easier time with this format.
> Another question that should be discussed is whether and how to support
> non-text data.
No, that's not a question. That's a hard requirement ;-). I'm not sure
the proposal doesn't cover this, though?
auto binary_data = F"image.png"; // char[]
auto binary_size = sizeof(binary_data) / sizeof(*binary_data);
auto image = parse_png(binary_data, binary_size);
That said... I also wonder if being able to select between text vs.
binary mode is important, especially if importing large text blocks is
really a desired feature? (Most of the use cases I've seen have been for
binary files such as images. Andrew's proposal seems to imply uses for
text.)
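For comparison, a sketch of how binary resources are commonly embedded without the proposed feature: a byte array written out by an external tool (for example in the style of `xxd -i`), with the element count taken via `sizeof` as in the example above. The array name, its contents and `looks_like_png` are hypothetical stand-ins; only the 8-byte PNG signature is taken from the PNG format itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for tool-generated contents of "image.png" (truncated).
const unsigned char image_png[] = {
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,  // PNG signature
    0x00, 0x00, 0x00, 0x0D,                          // length of first chunk
};

// Element count, computed the same way as in the quoted example.
const std::size_t image_png_size = sizeof(image_png) / sizeof(*image_png);

// Hypothetical consumer: checks only the 8-byte PNG signature.
bool looks_like_png(const unsigned char* data, std::size_t size) {
    static const unsigned char signature[] = {
        0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    return size >= sizeof(signature) &&
           std::memcmp(data, signature, sizeof(signature)) == 0;
}
```

Note that the signature itself contains both a CR LF pair and a lone LF, so any text-mode new-line translation applied to such a file would corrupt it, which is one reason the text vs. binary distinction matters here.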
On 2016-04-25 13:17, Andrew Tomazos wrote:
> On Mon, Apr 25, 2016 at 5:24 PM, Matthew Woehlke wrote:
>> On 2016-04-23 19:27, Moritz Klammler wrote:
>>> Another question that should be discussed is whether and how to support
>>> non-text data.
>>
>> No, that's not a question. That's a hard requirement ;-). I'm not sure
>> the proposal doesn't cover this, though?
>>
>> auto binary_data = F"image.png"; // char[]
>> auto binary_size = sizeof(binary_data) / sizeof(*binary_data);
>> auto image = parse_png(binary_data, binary_size);
>
> I don't think that is portable, or advisable. In particular source
> files are decoded in an implementation-defined manner during
> translation. Even if your source encoding was the same as the
> execution encoding, the implementation may still reject arbitrary
> binary sequences which are not valid for that encoding.
>
> While we might be able to extend the proposal to make this work, I
> think the better way [...]
In that case, I am Strongly Against your proposal. I probably, on some
occasions, want this feature for text. I *definitely* want it for binary
resources, and much more often than I might want it for text.
I think you are missing a significant and important use case, and, if
you don't account for that case, the feature is just begging to be
misused and abused and subject to confusion and surprise breakage.
On Mon, Apr 25, 2016 at 7:45 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Monday, April 25, 2016 at 1:38:05 PM UTC-4, Matthew Woehlke wrote:
On 2016-04-25 13:17, Andrew Tomazos wrote:
> On Mon, Apr 25, 2016 at 5:24 PM, Matthew Woehlke wrote:
>> On 2016-04-23 19:27, Moritz Klammler wrote:
>>> Another question that should be discussed is whether and how to support
>>> non-text data.
>>
>> No, that's not a question. That's a hard requirement ;-). I'm not sure
>> the proposal doesn't cover this, though?
>>
>> auto binary_data = F"image.png"; // char[]
>> auto binary_size = sizeof(binary_data) / sizeof(*binary_data);
>> auto image = parse_png(binary_data, binary_size);
>
> I don't think that is portable, or advisable. In particular source
> files are decoded in an implementation-defined manner during
> translation. Even if your source encoding was the same as the
> execution encoding, the implementation may still reject arbitrary
> binary sequences which are not valid for that encoding. [...]
If you give people the ability to include files as strings, people are going to use it for including binary files. That is guaranteed. So the only options are to have it cause subtle breakage or to properly support it.
Ok, you've convinced me.
We can add a new encoding-prefix "b" for binary. So it would be:
auto binary_data = bF"image.png";
I'd need to think about the wording; it will probably be implementation-defined, with a note saying roughly that the data should undergo no decoding or encoding from source to execution.
On Fri, Apr 29, 2016 at 8:01 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Friday, April 29, 2016 at 1:16:01 PM UTC-4, Andrew Tomazos wrote:
On Fri, Apr 29, 2016 at 6:01 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Thursday, April 28, 2016 at 5:39:09 PM UTC-4, Arthur O'Dwyer wrote:
I was just thinking before your post that this has shades of the icky "mode" behavior of fopen(); i.e., it's up to the programmer (and therefore often buggy) whether the file is opened in "b"inary or "t"ext mode. What makes for the "often buggy" part is that the path-of-least-resistance happens to work perfectly on Unix/Linux/OSX, and therefore the vast majority of working programmers never need to learn the icky parts.
What happens on a Windows platform when I write
const char data[] = R"(
)";
? Does data come out equivalent to "\n" or to "\r\n"? Does it depend on the compiler (MSVC versus Clang) or not? I don't have a Windows machine to find out for myself, sorry.
According to the standard:
> A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal.
So it would be a `\n`, not a `\r\n`.
Granted, the above quote is not in normative text (presumably because section 2.14.3 makes it more clear). But clearly that is the intent of the specification. So if VS doesn't do that correctly, then it's broken.
And since VS has had raw string literals for a while now, odds are good someone would have noticed it if they did it wrong.
Not so fast:
"Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined."
This mapping is commonly known as the "source encoding". As a part of the source file, the content of raw string literals is input, likewise, in source encoding.
Considering that the specification has non-normative examples of raw string literals and their non-raw equivalents, and those examples explicitly show that a "source encoding" newline should be equivalent to "\n", then clearly the writers of the specification believe that the conversion is not implementation dependent. So either your interpretation or their interpretation of the spec is wrong.
The examples you are referring to do not show how the new-line is encoded in the original physical source file. They cannot, unless they show the hex dump of the bytes of the physical source file. The examples just show that the "text" new line in the raw string literal (after decoding) can be equivalent to an escaped new line '\n' in an ordinary string literal. I think the motivation of the example was just to show that raw string literals can contain embedded new lines (unlike ordinary string literals), among other things.
Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
In Table 7 it says '\n' maps to NL(LF) and that '\r' maps to CR, and offers no further definition of what NL(LF) and CR are.
I assume NL(LF) is the new line character that is a member of the basic source character set, and that CR is the carriage return that is a member of the basic execution character set. I think these basic source character set new lines are the same "new-line characters" referred to in the "introducing new-line characters for end-of-line indicators" during phase 1 source decoding.
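The question of what a given toolchain actually produces can be checked mechanically. The following is a minimal sketch, not a statement about any particular compiler: it embeds a single source-file line break in a raw string literal and reports whether the result is exactly one '\n'. The names `raw_newline`, `raw_newline_length` and `raw_newline_is_lf` are made up for illustration. The check is deliberately a run-time query rather than a `static_assert`, since the outcome is the very point under discussion and may depend on how the implementation decodes the physical source file.

```cpp
#include <cassert>
#include <cstddef>

// A raw string literal whose only content is one source-file line break.
constexpr char raw_newline[] = R"(
)";

// Number of characters in the literal, excluding the terminating NUL.
constexpr std::size_t raw_newline_length = sizeof(raw_newline) - 1;

// True if the line break came through as a single '\n';
// false if, for example, it came through as "\r\n".
constexpr bool raw_newline_is_lf() {
    return raw_newline_length == 1 && raw_newline[0] == '\n';
}
```

Saving the same file once with LF and once with CR LF line endings and comparing the result of `raw_newline_is_lf()` would show whether the implementation normalizes end-of-line indicators inside raw string literals.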
On 2016-04-28 17:39, Arthur O'Dwyer wrote:
> Either way, your proposal should include an example along the lines of
>
> const char example[] = RF"foo(bar.h)foo";
>
> Does this mean "include bar.h", or "include foo(bar.h)foo" — and why?
Certainly the latter; anything else is just overly complicating things
to no benefit.
If you really need a string like "foo" + contents of 'bar.h' + "foo",
use concatenation:
Ah... yes, I interpret the original example as being the same as
`F"foo(bar.h)foo"`, with the `R` serving only to specify binary vs. text
include mode. (As Nicol notes, this may be a good reason to use
something other than `R` for that purpose. Maybe we should use `Ft` and
`Fb` instead, with `F` by itself being a synonym for `Ft`?)
const char *str = "SomeLiteral";
const unsigned char var[] = bF"Filename.bin";
const unsigned char* pVar = var;
const unsigned char* pVar = bF"Filename.bin";
On 2016-05-04 11:22, Nicol Bolas wrote:
> Actually, something just occurred to me about `bF`. Namely, NUL termination.
>
> All of the genuine string literals should be NUL terminated, since that's
> how we expect literals to behave. But `bF` shouldn't be NUL terminated.
> So... what do we do?
I'm not sure that's genuinely a problem¹. Many file formats, even
binary, are likely tolerant of a "stray" NUL at the end, and even if
not, I can't think how you would use such a string without specifying
the length, in which case it would be trivial to subtract 1.
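A small sketch of the "subtract 1" point, using an ordinary string literal as a stand-in for a NUL-terminated file literal (the `bF` form itself is only proposed, so it cannot be shown here). The names `blob` and `payload_size` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for binary file contents: four payload bytes, one of them zero.
// As with any string literal, the array also gets a terminating NUL.
const char blob[] = "\x01\x00\x02\x03";

// Payload length of a NUL-terminated literal: array extent minus the
// terminator. Unlike strlen, this is not fooled by embedded zero bytes.
template <std::size_t N>
constexpr std::size_t payload_size(const char (&)[N]) {
    return N - 1;
}
```

Here `sizeof(blob)` is 5 and `payload_size(blob)` is 4, whereas `std::strlen(blob)` stops at the embedded zero byte and yields 1, which is why binary data has to be handled with an explicit length in any case.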
On Wed, May 4, 2016 at 8:08 AM, Nicol Bolas <jmck...@gmail.com> wrote:
On Wednesday, May 4, 2016 at 10:23:17 AM UTC-4, Matthew Woehlke wrote:
Ah... yes, I interpret the original example as being the same as
`F"foo(bar.h)foo"`, with the `R` serving only to specify binary vs. text
include mode. (As Nicol notes, this may be a good reason to use
something other than `R` for that purpose. Maybe we should use `Ft` and
`Fb` instead, with `F` by itself being a s