Include external file as a char[]


tdh...@gmail.com

Feb 15, 2015, 1:38:53 PM
to std-pr...@isocpp.org
Hi,

Apologies if this has been discussed before. This is something I've wanted in C++ for ages and it seems trivial, so here goes my (first ever) informal proposal. Let me know what you think!

# Overview

I propose a method for including an external file in a .cpp file, but instead of copy/pasting code as #include does, it will create a char array containing the file contents.

# Motivation

People have wanted something like this for decades for including assets in a binary. Some uses include:

* Images
* OpenGL shaders
* HTML templates

The lack of a standard, built-in, cross-platform way to include these has led to hacks like the XPM image format, and recently this method using GCC assembly: 


Uses of this include both binary files (which might contain null characters) and text files, so the syntax should support both.

# Syntax

Obviously there are a million different possible syntaxes. I propose this:

    #load "shader.vert" main_shader main_shader_len

Which works in exactly the same way as #include (i.e. you can use "" quotes and <> brackets with the same search path), but the contents of the file are loaded as follows:

    const char main_shader[] = <the contents of the file>;
    const int main_shader_len = <the length of the file>;

The length variable is optional.

# Obvious objections

* No way to specify if it is static or not.
* No way to specify alignment (it should have a sane default, e.g. machine word size).
* Should it be char or uint8_t? I'm not sure; I'm not really a C++ guru and I mostly ignore the esoteric differences (whether plain char is signed is implementation-defined, etc.)
* Syntax bikeshedding.

It's not perfect and all-encompassing, but it is very simple and way better than anything we have now.

Cheers,

Tim

sasho648

Feb 15, 2015, 1:58:13 PM
to std-pr...@isocpp.org, tdh...@gmail.com
Sorry to disappoint you, but people these days don't bother with meta-programming and thus don't care for it - you're out of luck. People nowadays would rather copy/paste the whole file manually or use some external scripting language. Otherwise, there is a much more general solution to such problems, which I wrote about ages ago, involving 'constexpr' functions and classes for manipulating internal data structures (creating member names from common string patterns, editing existing classes and namespaces, producing much faster code by determining what gets evaluated at compilation and what gets executed, and so on).

But don't worry - you may not be so unlucky, as this proposal seems simple enough to be considered.

Douglas Boffey

Feb 15, 2015, 4:45:12 PM
to std-pr...@isocpp.org, tdh...@gmail.com
What about using a std::array?

sasho648

Feb 15, 2015, 5:58:36 PM
to std-pr...@isocpp.org, tdh...@gmail.com
What about using more than one sentence to express your thoughts?

David Krauss

Feb 15, 2015, 10:08:36 PM
to std-pr...@isocpp.org

On 2015-02-16, at 2:38 AM, tdh...@gmail.com wrote:

I propose a method for including an external file in a .cpp file, but instead of copy/pasting code as #include does, it will create a char array containing the file contents.

If I were to express my thoughts in just one sentence, I’d say that an iostreams-compatible resource_stream, such that the implementation can transparently store each resource in the main executable or in an associated file, would be more useful than a mechanism guaranteeing memory-resident data.

Arthur O'Dwyer

Feb 16, 2015, 1:25:20 AM
to std-pr...@isocpp.org, tdh...@gmail.com
On Sunday, February 15, 2015 at 10:38:53 AM UTC-8, tdh...@gmail.com wrote:

Apologies if this has been discussed before. This is something I've wanted in C++ for ages and it seems trivial

As you've noticed below ("Obvious objections"), it's not trivial.

 
# Overview

I propose a method for including an external file in a .cpp file, but instead of copy/pasting code as #include does, it will create a char array containing the file contents.

# Motivation

People have wanted something like this for decades ...

The lack of a standard, built-in, cross-platform way to include these has led to hacks like the XPM image format, and recently this method using GCC assembly: 


There's also the standard *nix/BSD utility "xxd".
Seems like the niche is filled. Or, at least, if you want to claim that
(A) XPM
(B) incbin
(C) "xxd -i"
...do NOT completely fill this evolutionary niche, and a completely new approach is necessary... well, extraordinary claims require extraordinary evidence.


    #load "shader.vert" main_shader main_shader_len

... It's not perfect and all-encompassing, but it is very simple and way better than anything we have now.

Your proposal would have to include at least one example of trying to do something the "old" way (presumably involving an xxd step in your Makefile), identify some problems with the old way, and then show how doing it the "new" way solves those problems.

–Arthur

Chris Gary

Feb 16, 2015, 2:16:13 AM
to std-pr...@isocpp.org, tdh...@gmail.com
This ultimately would encourage a weird sort of resource management philosophy that I think might be damaging in the long run.

Speaking from experience, it is a tremendously bad idea to bake any resource into a binary. This includes long string literals meant entirely for human consumption (error messages, etc. Unless you're using them as keys for gettext or similar...).

My #1 rule about any non-C++ resource is: It will change sooner and more frequently than the source code using it.

C++ source files are no place for mesh data, images, or other languages coded in string literals.

However, I'm not against compiler extensions that would permit the embedding and semantic checking of shader/compute languages (we already have _asm).

On most platforms, it is possible to export a symbol and run the image through tooling that "pastes" the data somewhere in a custom section and links the aforementioned symbol (like packing icons and sounds in a DOS image).

Matthew Woehlke

Feb 16, 2015, 10:46:01 AM
to std-pr...@isocpp.org
On 2015-02-15 13:38, tdh...@gmail.com wrote:
> I propose this:
>
> #load "shader.vert" main_shader main_shader_len
>
> Which works in exactly the same way as #include (i.e. you can use "" quotes
> and <> brackets with the same search path), but the contents of the file
> are loaded as follows:
>
> const char main_shader[] = <the contents of the file>;
> const int main_shader_len = <the length of the file>;

constexpr size_t main_shader_len =
    sizeof(main_shader) / sizeof(char);

> The length variable is optional.

Yes... really, *really* optional... as in, there is no case in which you
actually need it, since you can just take the sizeof() the data instead.
(Do note that the type is `size_t` and not `int`!)

> # Obvious objections
>
> * No way to specify if it is static or not.
> * No way to specify alignment (it should have a sane default, e.g. machine
> word size).
> * Should it be char or uint8_t? I'm not sure; I'm not really a C++ guru and
> I mostly ignore the esoteric difference (char is technically only 7 bit,
> etc.)

This could all be solved by having a directive which instead simply
inlines the contents of a file as a string literal:

static uint8_t const main_shader[] =
#foo "shader.vert"

(Note: I don't love "load" for this implementation, but not going to
open the naming bikeshed, hence the placeholder directive name.)

As noted above, if you want the length, you can use sizeof() to derive
it, so it is not necessary that the directive itself actually declare
anything.

Another open question is, how do you handle encoding? (Do you? You
definitely want to handle the case of raw bytes, for e.g. image data,
but what about text? Just require that the application be able to
correctly decode it from the raw bytes? Do you ever want to 'load' a
wide string?)

All that said, I'll point out that this is a non-issue for Qt
applications that can simply use Qt's resources for this sort of business.

--
Matthew

Matthew Woehlke

Feb 16, 2015, 10:55:47 AM
to std-pr...@isocpp.org
On 2015-02-16 02:16, Chris Gary wrote:
> Speaking from experience, it is a tremendously bad idea to bake any
> resource into a binary. This includes long string literals meant entirely
> for human consumption (error messages, etc. Unless you're using them as
> keys for gettext or similar...).
>
> My #1 rule about any non-C++ resource is: It will change sooner and more
> frequently than the source code using it.
>
> C++ source files are no place for mesh data, images, or other languages
> coded in string literals.

Agreed as far as you specified *source files*, and assuming that you
mean ones containing non-resource-related code. (Also translations, as
these are often developed separate from the application itself.) I can
see where such a tool could easily be misused. On the other hand, it
would be extremely convenient for very simple applications where quality
of code is a lesser concern. More importantly however I'm confident you
could build on such a mechanism to provide an effective resource system
*without the need for additional tools* that would not have problems
here, e.g. having a source file that is *just* resources and some
minimal code (maybe just exporting the symbols) to make them available
to other TU's.

> However, I'm not against compiler extensions that would permit the
> embedding and semantic checking of shader/compute languages (we already
> have _asm).

Who said anything about semantic checking?

> On most platforms, it is possible to export a symbol and run the image
> through tooling that "pastes" the data somewhere in a custom section and
> links the aforementioned symbol (like packing icons and sounds in a DOS
> image).

Okay, I'm confused... above you say "it is a tremendously bad idea to
bake any resource into a binary", but here you are suggesting exactly that?

--
Matthew

dgutson .

Feb 16, 2015, 11:11:45 AM
to std-proposals

I find this useless, especially in embedded environments, since there will have to be some processing of the binary data anyway: either before building the application (in which case the use case goes back to the original "problem", since an external tool has to be invoked by the build system), or later, once the application is running, thus consuming time and storing the result in RAM.
If I'm mistaken, please show use cases where a binary file is embedded exactly "as is", without requiring any additional processing.
Maybe the Key question is: what would be the sources of those files to be embedded?

   Daniel.


Matthew Woehlke

Feb 16, 2015, 11:49:49 AM
to std-pr...@isocpp.org
On 2015-02-16 11:11, dgutson . wrote:
> I find this useless, especially in embedded environments, since there will
> have to be some processing of the binary data anyway: either before building
> the application (in which case the use case goes back to the original
> "problem", since an external tool has to be invoked by the build system), or
> later, once the application is running, thus consuming time and storing the
> result in RAM.
> If I'm mistaken, please show use cases where a binary file is embedded
> exactly "as is", without requiring any additional processing.

Huh? I don't understand your confusion. The OP gave several valid examples.

It's true that embedding e.g. compressed image data will require further
processing before the application can practically use the data, but it's
frequently desirable to punt this to execution time as the trade-off
between an insignificant time to decode the image vs. the reduction in
size of the binary is typically beneficial. (How many image resources do
you know that are saved in uncompressed format these days?) No
build-time processing of the image data is needed; one just needs to
take the raw bytes of the image resource and somehow get them into a
buffer which can (at run time) be fed into the appropriate function to
decode the image.

GLSL shaders are another great example as they *must* be compiled and
linked on the system actually running the code, due to potential
differences in graphics cards / drivers.

Have you truly never seen code like:

char const shader[] =
    "#version 330\n"
    "void main()\n"
    /* remainder elided */
    ;

...? I certainly have. It's ugly as all else, but it beats having to
write your own file management routines when you lack a proper resource
management system. A proper resource management system (e.g. Qt's qrc)
is much, *MUCH* better. The OP is proposing similar functionality as a
standard feature of the C++ compiler.

Really, if you don't understand why this would be useful, I would
encourage you to look at real world use of existing resource solutions
e.g. qrc / Qt resources. Then ask yourself if it would be useful to have
such a system provided by the compiler rather than depending on external
tools. I rather think that the answer is "yes".

(I actually can't think where you would need additional processing of
the resource files at build time where such processing would logically
fall to a tool that assists with resource embedding. About the only
example that comes to mind is having an SVG that is pre-rendered to a
raster image. But that's like saying that, because there are cases when
source files are generated, we don't need a C++ compiler.)

> Maybe the Key question is: what would be the sources of those files to be
> embedded?

Files on disk, just like any #include's? Maybe I don't understand the
question...

--
Matthew

Tim Hutt

Feb 16, 2015, 4:26:23 PM
to Arthur O'Dwyer, std-pr...@isocpp.org

Hi,

Thanks everyone for the feedback! I'm going to summarise it here and give responses.

* Use std::array or iostream.

Nice idea. The only reason not to do this is that it makes things more complicated, and I thought this proposal might be useful to the C folk too. std::array is elegant though, and good point about memory residence.

* No need for a size term - just use sizeof() on the array.

That is a nice idea in theory, but arrays decompose into pointers at the drop of a hat, and then you lose your size information, e.g. when passed to functions.

* What's wrong with incbin, using external tools etc?

To be clear, this doesn't let you do anything new, it just makes it easier. There are already tons of things in C++ that are just conveniences so that isn't a real objection.

* Use a proper resource system!

That's overkill for many things, and is often not available.

* Don't see a need for this.

I've given clear examples of where there is definitely a need. The existence of incbin proves it. Incbin has 77 stars and 5 forks in less than two weeks. As Matthew Woehlke mentioned it is easy to find code that would benefit from this.


So based on the feedback it seems like people think this is generally a good idea, but the debate is over how it will work. After considering the options I think Matthew's proposal of

    static uint8_t const main_shader[] = 
    #load "shader.vert"

is onto something. It allows you to specify the format (uint8_t vs char) and linkage of the variable. If it expands to `{0x50, 0x51, ... }` then this can also work with std::vector and std::string, although I can't see a way to make it work with std::array because that requires the size in a template argument.

The syntax is kind of ugly though, especially with the declaration and the directive on different lines and with no semicolon. Maybe the preprocessor is the wrong place for this. I don't think you could do this:

    const uint8_t main_shader[] = #load "shader.vert"
    const std::vector<uint8_t> main_shader = #load "shader.vert"
    const std::string main_shader = #load "shader.vert"

Maybe you could do this

    #load "shader.vert" SHADER_SOURCE_LITERAL
    const std::string main_shader = SHADER_SOURCE_LITERAL;

SHADER_SOURCE_LITERAL would be replaced with {0x50, 0x51, 0x52, ...} (or whatever the file contains).

Can anyone think of a better syntax?

Cheers,

Tim


Matthew Woehlke

Feb 16, 2015, 5:11:00 PM
to std-pr...@isocpp.org
On 2015-02-16 16:26, Tim Hutt wrote:
> * No need for a size term - just use sizeof() on the array.
>
> That is a nice idea in theory, but arrays decompose into pointers at the
> drop of a hat, and then you lose your size information, e.g. when passed to
> functions.

The intent / expectation was to take the sizeof() at the point of
variable declaration :-).

> [various other points]

Eliding the other points, but FWIW I agree with most of them. Resources
are *vital* to many projects. Sure, there are external tools, but that
way lies inconsistency and inconvenience.

> So based on the feedback it seems like people think this is generally a
> good idea, but the debate is over how it will work. After considering the
> options I think Matthew's proposal of
>
> static uint8_t const main_shader[] =
> #load "shader.vert"
>
> is onto something. It allows you to specify the format (uint8_t vs char)
> and linkage of the variable. If it expands to `{0x50, 0x51, ... }` then
> this can also work with std::vector, and std::string, although I can't see
> a way to make it work with std::array because that requires the size in a
> template argument.

Hmm... that's a really great idea, but I'm not sure if it's ideal. The
trouble is that a string literal decomposes into a char*, while an
initializer_list (effectively what you're suggesting) I'm pretty sure
does not. In particular, I could imagine this being limiting with
libraries that can't construct a string type from an initializer_list.
That said, I think both have their uses; maybe the syntax should allow
specifying which is wanted? (And if not... I think I agree that the
initializer_list is more generally useful in modern code.)

I think with make_array, you could use the initializer_list to create a
std::array.

> The syntax is kind of ugly though, especially with them on the different
> lines and with no semicolon.

Actually, you forgot the ';', above :-). (Um... and I forgot it in my
example. Could've sworn I had one, too... it may have been accidentally
lost in editing. Sorry about that.)

Maybe something like __pragma, i.e. a preprocessor directive but NOT a
'#' directive, would be better?

> Maybe you could do this
>
> #load "shader.vert" SHADER_SOURCE_LITERAL
> const std::string main_shader = SHADER_SOURCE_LITERAL;

...or that. I could see that as being similar to '#define' (in which
case I might suggest having the identifier precede the file name... or
not; consistency aside, your order works better from an English grammar
POV).

I like either one. The __pragma-like doesn't introduce a PP symbol, but
that can go either way (namespace pollution vs. ease of reuse).

--
Matthew

Thiago Macieira

Feb 16, 2015, 7:46:29 PM
to std-pr...@isocpp.org
On Monday 16 February 2015 17:10:50 Matthew Woehlke wrote:
> > is onto something. It allows you to specify the format (uint8_t vs char)
> > and linkage of the variable. If it expands to `{0x50, 0x51, ... }` then
> > this can also work with std::vector, and std::string, although I can't see
> > a way to make it work with std::array because that requires the size in a
> > template argument.
>
> Hmm... that's a really great idea, but I'm not sure if it's ideal. The
> trouble is that a string literal decomposes into a char*, while an
> initializer_list (effectively what you're suggesting) I'm pretty sure
> does not. In particular, I could imagine this being limiting with
> libraries that can't construct a string type from an initializer_list.
> That said, I think both have their uses; maybe the syntax should allow
> specifying which is wanted? (And if not... I think I agree that the
> initializer_list is more generally useful in modern code.)

Two questions:

a) why would you store the array into anything except a static const array or
a template type that is equivalent to that? If the template type takes an
array by reference, it doesn't matter if the string was "foo" or
{'f', 'o', 'o', 0}

b) why would the underlying element type be anything other than char?

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

David Krauss

Feb 16, 2015, 8:29:28 PM
to std-pr...@isocpp.org
On 2015-02-17, at 8:46 AM, Thiago Macieira <thi...@macieira.org> wrote:

a) why would you store the array into anything except a static const array or
a template type that is equivalent to that? If the template type takes an
array by reference, it doesn't matter if the string was "foo" or
{'f', 'o', 'o', 0}

For large resources, it’s seldom practical to load them all before main starts. Although virtual memory and shared libraries can help alleviate the pain, the tried-and-true solution is to load resource files at runtime. Embedded systems with little RAM and no VM might have no choice.

Serializing initializer lists is an interesting thought, but I think that problem is orthogonal to the means of storage. You can’t serialize a std::initializer_list of non-serializable objects. However, initializer_list over serializable types could be a useful intermediary for real containers and such. Perhaps some clever runtime could parse a standard text (or compressed binary) format back to initializer_list and mixed-aggregate nodes and call the correct constructors as if they all formed a braced-init-list. The other piece of the puzzle is serialization.

In any case, once the initializer_list has been used, you might as well unload it. With static storage, that means falling back on VM.

Actually, for large resources on small systems, putting all the initializers in RAM is too inefficient. Range iterables would be a better solution, perhaps providing the efficiency of streams (minus the inefficiency of <iostream>) with the convenience of initializer lists.

b) why would the underlying element type be anything other than char?

Some string processing prefers wide characters. Numeric sequences might be stored as binary. I think it’s better to simplify conversion than to change the underlying type, though.

Thiago Macieira

Feb 16, 2015, 10:06:32 PM
to std-pr...@isocpp.org
On Tuesday 17 February 2015 09:29:15 David Krauss wrote:
> > On 2015-02-17, at 8:46 AM, Thiago Macieira <thi...@macieira.org> wrote:
> >
> > a) why would you store the array into anything except a static const array
> > or a template type that is equivalent to that? If the template type takes
> > an array by reference, it doesn't matter if the string was "foo" or
> > {'f', 'o', 'o', 0}
>
> For large resources, it’s seldom practical to load them all before main
> starts. Although virtual memory and shared libraries can help alleviate the
> pain, the tried-and-true solution is to load resource files at runtime.
> Embedded systems with little RAM and no VM might have no choice.

Agreed, which means they are unlikely to use this functionality at all.

On systems with no VM and little RAM, the resource data would be stored in
some type of mass storage that doesn't get loaded with the executable. Because
it's outside the executable, it's outside the compiler's scope and therefore
we're not talking about this feature.

I didn't understand the rest of your comment.

> > b) why would the underlying element type be anything other than char?
>
> Some string processing prefers wide characters. Numeric sequences might be
> stored as binary. I think it’s better to simplify conversion than to change
> the underlying type, though.

The file on disk is composed of bytes. Are we expecting the compiler will
translate from bytes (char) to bigger data types? If so, how to specify the
endian order?

I think that, at most, the compiler would load the byte data and store as-is
in the binary image, with no conversion. If you use #load with wchar_t, you
should make sure that your data file has the right size and endianness.

David Krauss

Feb 16, 2015, 10:46:31 PM
to std-pr...@isocpp.org

> On 2015-02-17, at 11:06 AM, Thiago Macieira <thi...@macieira.org> wrote:
>
> On Tuesday 17 February 2015 09:29:15 David Krauss wrote:
>> the tried-and-true solution is to load resource files at runtime.
>> Embedded systems with little RAM and no VM might have no choice.
>
> Agreed, which means they are unlikely to use this functionality at all.
>
> On systems with no VM and little RAM, the resource data would be stored in
> some type of mass storage that doesn't get loaded with the executable. Because
> it's outside the executable, it's outside the compiler's scope and therefore
> we're not talking about this feature.
>
> I didn't understand the rest of your comment.

These days, most vendors provide build systems that do handle assets. It’s reasonable to specify some system for retrieving named source-file resources as streams, and let vendors decide how to do the baking.

Portable libraries are currently restricted to resources baked into the C++ source code. That’s the fundamental problem getting attacked here. It’s a fallacy to say that a C++ compiler must be restricted to producing one executable file, and therefore no more efficient solution is viable. Even if you accept that, there are still two solutions that allow a dumb compiler: bake-in all the resources, or bake-in none and let the build system copy them to a local subdirectory known to the runtime library. These are already widely implemented on embedded and hosted systems, respectively.

>> Some string processing prefers wide characters. Numeric sequences might be
>> stored as binary. I think it’s better to simplify conversion than to change
>> the underlying type, though.
>
> The file on disk is composed of bytes. Are we expecting the compiler will
> translate from bytes (char) to bigger data types? If so, how to specify the
> endian order?

Endian order doesn’t need to be specified for resource files any more than it does for executable files. They are subordinate to the executable, ideally hidden from the user, and typically not removed or installed except by the executable or its dedicated updater. Of course, a platform can define a binary format. Mac OS has gone through several iterations — its original “Resource Manager” was instrumental to getting inexperienced vendors to fit GUI apps into a minuscule machine. Ideally C++ can provide a portable interface to platform-specific resource formats.

> I think that, at most, the compiler would load the byte data and store as-is
> in the binary image, with no conversion. If you use #load with wchar_t, you
> should make sure that your data file has the right size and endianness.

We can do better than hacking assembler behavior into the preprocessor.

Chris Gary

Feb 16, 2015, 11:38:41 PM
to std-pr...@isocpp.org, mw_t...@users.sourceforge.net
 
Okay, I'm confused... above you say "it is a tremendously bad idea to
bake any resource into a binary", but here you are suggesting exactly that?
 
It's still a bad idea (have to distribute a patch for a change of icon). Just pointing out that it's been done.


Who said anything about semantic checking?

I meant to say "syntax checking", though "semantic" still applies here (e.g. CUDA extensions). More important than "it just compiling" is "it not compiling when it shouldn't work."

*snip* On the other hand, it would be extremely convenient for very simple applications where quality of code is a lesser concern. More importantly however I'm confident you could build on such a mechanism to provide an effective resource system *without the need for additional tools* that would not have problems here, e.g. having a source file that is *just* resources and some minimal code (maybe just exporting the symbols) to make them available to other TU's

Simple applications notwithstanding, this is a toolchain problem. Pursuing standardization of resource management in general would require a more thorough examination of this domain than proposing simple data inclusion -- we are still being bitten by similar shortsightedness with "#include" now several decades after the fact.

Thiago Macieira

Feb 16, 2015, 11:43:23 PM
to std-pr...@isocpp.org
On Tuesday 17 February 2015 11:46:19 David Krauss wrote:
> > The file on disk is composed of bytes. Are we expecting the compiler will
> > translate from bytes (char) to bigger data types? If so, how to specify
> > the endian order?
>
> Endian order doesn’t need to be specified for resource files any more than
> it does for executable files. They are subordinate to the executable,
> ideally hidden from the user, and typically not removed or installed except
> by the executable or its dedicated updater. Of course, a platform can
> define a binary format. Mac OS has gone through several iterations — its
> original “Resource Manager” was instrumental to getting inexperienced
> vendors to fit GUI apps into a minuscule machine. Ideally C++ can provide a
> portable interface to platform-specific resource formats.

The problem is that the file containing binary data may also be intended to be
portable, so it may contain data in a specific endianness. When compiled to a
target with a different endianness, the compiler could be expected to swap
things around.

Which is why I am saying this feature, if adopted, should be restricted to
1-byte entities. That also resolves the question of whether to expand to a
string literal or an initializer list: since they are equivalent for bytes, a
string literal is easier to understand.

Thiago Macieira

Feb 16, 2015, 11:45:33 PM
to std-pr...@isocpp.org
On Monday 16 February 2015 20:38:41 Chris Gary wrote:
> > Okay, I'm confused... above you say "it is a tremendously bad idea to
> > bake any resource into a binary", but here you are suggesting exactly
> > that?
>
>
> It's still a bad idea (have to distribute a patch for a change of icon).
> Just pointing out that it's been done.

Does it make a difference if it's a separate file or the same as the executable
if the only way to get anything changed on the device is to flash it with a new
image?

Please don't generalise: what might be a bad idea for some scenarios may be
perfectly acceptable for others.

Chris Gary

Feb 17, 2015, 12:35:59 AM
to std-pr...@isocpp.org
 Does it make a difference if it's a separate file or the same as the executable
if the only way to get anything changed on the device is to flash it with a new
image?

Please don't generalise: what might be a bad idea for some scenarios may be
perfectly acceptable for others.

You are correct: No single approach or ideology is universally applicable.

I'm not generalizing here. Just try to define "resource" and "access thereto."

I call it a "bad idea" in the sense of mixing concerns. This is fundamentally a toolchain/platform problem.

Attempting to standardize such a thing at this time seems foolhardy, especially through a preprocessor facility.

It would be nice to have a comparison of all resource management tools/philosophies presently in common use and for what platforms they are used.

Everyone can legitimately start arguing about benefits/correctness from there.

Chris Gary

Feb 17, 2015, 12:57:44 AM
to std-pr...@isocpp.org, arthur....@gmail.com, tdh...@gmail.com


On Monday, February 16, 2015 at 2:26:23 PM UTC-7, Tim Hutt wrote:
*snip*
 
Can anyone think of a better syntax?

Cheers,

Tim

How about attributes?


[[from_file("shader.vert")]] // Perhaps use the same lookup rules as #include
extern const char src[];    // Size is computed when sizeof() is applied: "shader.vert" is loaded and measured. Linker deals with the data later...



 

David Krauss

Feb 17, 2015, 1:24:16 AM
to std-pr...@isocpp.org
On 2015–02–17, at 12:43 PM, Thiago Macieira <thi...@macieira.org> wrote:

The problem is that the file containing binary data may also be intended to be
portable, so it may contain data in a specific endianness.

There are source files and there are installable build products. Source files should be portable, and the proposal does concern binary source files (which are usually called “assets” instead). Though installed asset files might be identical copies of source files, they’re not portable.

When I say “resource files” there, I mean something which is already a build product. Looking again at the Mac OS, its original Resource Manager dealt with “resources” which were baked byte-strings stored in the “resource fork” of an executable file and loaded on demand. Such resources were often specified in textual source files but distributed in binary. Today, the formats and interfaces have changed, and OS X makes resources look more like regular files, but the build tools and filesystem still cooperate for the sake of data compression.

When compiled to a
target with a different endianness, the compiler could be expected to swap
things around.

My main suggestion is that C++ should provide a function that returns a named resource as a std::istream. Then the programmer chooses between text and platform-specific endianness, and the platform build system chooses between storage in the executable, in a separate file, in a file aggregating various resources, etc.

For assets in standard formats like PNG, the build system should also be able to intervene so the resource isn’t exactly the same as the source file. This means that it would be overspecification to say that the resource is defined exactly by the file contents. It’s sufficient to say that a string constant identifies a resource, similarly to a header file, and that the istream provides the resource contents.

A secondary suggestion is that C++ should provide a standard class for baked, opaque representation of nested, constant initializer lists. Runtime classes representing resources can initialize from such an object. (Constexpr is a requirement of initialization, but access is not constexpr.) The programmer has no control over the binary format, but the platform build system can choose whether to optimize it, and whether to store it in a separate updatable file despite the possible presence of endian-specific formatting.

For now, let’s stick to the primary suggestion.

Which is why I am saying this feature, if adopted, should be restricted to 1-
byte entities. That also resolves the question of whether to return a
character literal or an initializer list: since they are the same, a character
literal is easier to understand.

I’m not sure what you mean by “return a character literal.”

String literals have size fixed at compile time, but you’re talking about replacing the underlying file. If you mean to return char*, then the assumption of null termination is too restrictive. We need the size of the byte-string, which implies the complexity of initializer_list. But if it’s ever going to be reliably unloaded, it needs to be on the heap, which bumps the requirement up to std::string.

The next step is std::stringstream. I think iostreams is a more viable interface, and it represents less implementation effort, because the platform can forgo all of this. The build system copies the file to a dedicated directory, the runtime library opens it and returns an ifstream. Implementation complete!
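That minimal implementation can be sketched in a few lines. The function name `open_resource` and the `resources/` directory are illustrative assumptions, not a proposed API:

```cpp
#include <fstream>
#include <string>

// Sketch of the minimal implementation described above: the build system
// has copied asset files into a dedicated directory, and the runtime
// library just opens them and hands back a stream. The name and the
// directory are hand-picked assumptions for this sketch.
std::ifstream open_resource(const std::string& name)
{
    const std::string prefix = "resources/";  // platform-chosen location
    return std::ifstream(prefix + name, std::ios::binary);
}
```

A platform that prefers to embed assets in the executable could keep the same signature and return a stream over an in-memory buffer instead; the caller never has to know.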


On 2015–02–17, at 12:45 PM, Thiago Macieira <thi...@macieira.org> wrote:

Does it make a difference if it's a separate file or the same as the executable 
if the only way to get anything changed on the device is to flash it with a new 
image?

The platforms standing to benefit are a vast majority. Even minimalist embedded pseudo-filesystems can provide facilities for partial updates. Anyway, standardization of a feature for updatable programs wouldn’t lock out platforms lacking dedicated updaters.

“Updatable programs.” That has a nice ring to it.

Matheus Izvekov

Feb 17, 2015, 1:48:31 AM
to std-pr...@isocpp.org, arthur....@gmail.com, tdh...@gmail.com
On Tuesday, February 17, 2015 at 3:57:44 AM UTC-2, Chris Gary wrote:
How about attributes?


[[from_file("shader.vert")]] // Perhaps use the same lookup rules as #include
extern const char src[];    // Size is computed when sizeof() is applied: "shader.vert" is loaded and measured. Linker deals with the data later...



Attributes are supposed to be ignorable. Implementations are allowed to completely ignore them, and this should cause no semantic changes in the program.

Chris Gary

Feb 17, 2015, 1:55:16 AM
to std-pr...@isocpp.org, arthur....@gmail.com, tdh...@gmail.com
Yep. So they can also ignore [[dllimport]] (if MS ever does this...) and other things that would otherwise prevent compilation.

Other attributes that break in terrible ways when ignored include all the "support" ever implemented for parsing __declspec() (which includes __declspec(align) and __declspec(property)).

A toolchain that ignores [[from_file("")]] would simply fail to compile it (can't compute array size, can't link).

It follows that source code expecting this feature would fail to compile, as expected.

Thiago Macieira

Feb 17, 2015, 2:08:15 AM
to std-pr...@isocpp.org
On Tuesday 17 February 2015 14:24:03 David Krauss wrote:
> > When compiled to a
> > target with a different endianness, the compiler could be expected to swap
> > things around.
>
> My main suggestion is that C++ should provide function that returns a named
> resource as a std::istream. Then the programmer chooses between text and
> platform-specific endianness, and the platform build system chooses between
> storage in the executable, in a separate file, in a file aggregating
> various resources, etc.

Your suggestion is completely orthogonal to the suggestion from the OP. It
would be a nice feature to have, but very difficult to implement in source files
without #load or a helper code generator (like Qt's rcc).

> > Which is why I am saying this feature, if adopted, should be restricted to
> > 1- byte entities. That also resolves the question of whether to return a
> > character literal or an initializer list: since they are the same, a
> > character literal is easier to understand.
>
> I’m not sure what you mean by “return a character literal.”

I was talking about #load. The product of that should be a very long character
literal, which you can store in an array or manipulate via templates or
constexpr functions.
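What that would look like in practice can be mocked up today by hand-writing the literal. The contents below are hand-written stand-ins for real file bytes, since #load itself does not exist:

```cpp
#include <cstddef>

// Stand-in for what `#load "shader.vert" shader_src` might produce:
// one long character literal. The text is illustrative, not a real file.
constexpr char shader_src[] = "void main() { gl_FragColor = gl_Color; }\n";

// The literal can then be manipulated by constexpr functions at compile
// time, e.g. counting the lines of the loaded text (requires the relaxed
// constexpr rules of C++14):
constexpr std::size_t count_lines(const char* s, std::size_t n)
{
    std::size_t lines = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (s[i] == '\n')
            ++lines;
    return lines;
}

static_assert(count_lines(shader_src, sizeof shader_src - 1) == 1,
              "one line of shader source");
```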

Thiago Macieira

Feb 17, 2015, 2:09:45 AM
to std-pr...@isocpp.org
On Monday 16 February 2015 22:55:16 Chris Gary wrote:
> Yep. So they can also ignore [[dllimport]] (if MS ever does this...) and
> other things that would otherwise prevent compilation.
>
> Other attributes that break in terrible ways when ignored include all the
> "support" ever implemented for parsing __declspec() (which includes
> __declspec(align) and __declspec(property)).

Those are not C++ standard attributes. As extensions, they are out of scope.

All current C++ standard attributes are ignorable if the compiler does not
implement that feature. That's why alignas is a keyword, not an attribute.
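The distinction can be put side by side in a short sketch: alignas changes the meaning of the program, so it is a keyword, while a standard attribute such as [[deprecated]] (C++14) may be dropped without changing behaviour:

```cpp
#include <cstddef>

// alignas is a specifier, not an attribute: ignoring it would change the
// program's meaning (object layout and alignment), so it cannot be ignorable.
struct alignas(16) Vec4 { float v[4]; };
static_assert(alignof(Vec4) == 16, "alignment is a semantic guarantee");

// [[deprecated]] is a standard attribute: a compiler that ignores it merely
// loses a diagnostic on use; the program behaves identically.
[[deprecated("use Vec4 instead")]] typedef float OldVec4[4];
```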

Matheus Izvekov

Feb 17, 2015, 2:09:49 AM
to std-pr...@isocpp.org, arthur....@gmail.com, tdh...@gmail.com
On Tuesday, February 17, 2015 at 4:55:16 AM UTC-2, Chris Gary wrote:
Yep. So they can also ignore [[dllimport]] (if MS ever does this...) and other things that would otherwise prevent compilation.

Other attributes that break in terrible ways when ignored include all the "support" ever implemented for parsing __declspec() (which includes __declspec(align) and __declspec(property)).

A toolchain that ignores [[from_file("")]] would simply fail to compile it (can't compute array size, can't link).

It follows that source code expecting this feature would fail to compile, as expected.

Well, vendor-specific attributes are not bound to this rule anyway; they can do whatever the vendor pleases.
But core attributes are a different thing altogether.

You are right though that ignoring this would probably only cause the program to fail to link eventually.

But anyway, this does not mean such a use for attributes would be acceptable to committee members.

Chris Gary

Feb 17, 2015, 2:19:09 AM
to std-pr...@isocpp.org


On Monday, February 16, 2015 at 11:24:16 PM UTC-7, David Krauss wrote:

*snip*

My main suggestion is that C++ should provide function that returns a named resource as a std::istream. Then the programmer chooses between text and platform-specific endianness, and the platform build system chooses between storage in the executable, in a separate file, in a file aggregating various resources, etc.

*snip*

I'm not sure if it's just me being crazy, though I remember something a while ago about standardizing localization catalogues in the same manner (I can't remember exactly what it was). Basically, it followed what you're suggesting: Let the toolchain decide what a "resource" is and just expose an istream to its raw data.

This would be the easiest thing to suggest to the committee, I think, as it's just another way to construct a stream. I can immediately see how this would work on something like Android (no more need for JNI just to grab a bunch of images)...

Chris Gary

Feb 17, 2015, 2:33:45 AM
to std-pr...@isocpp.org
On Tuesday, February 17, 2015 at 12:09:45 AM UTC-7, Thiago Macieira wrote:
On Monday 16 February 2015 22:55:16 Chris Gary wrote:
*snip*


Those are not C++ standard attributes. As extensions, they are out of scope.

All current C++ standard attributes are ignorable if the compiler does not
implement that feature. That's why alignas is a keyword, not an attribute.

AFAIK: Attributes are simply a standardized syntax for whatever the vendor wanted to do, with the exception of a few reserved names.

Mainly to keep calling convention specifiers and other paraphernalia in a predictable place.

That is, instead of:

GLint (STDCALL_OR_NIL *glDoSomething)(...);

We would have:

#ifdef _WIN32
#define STDCALL_OR_NIL [[msvc::stdcall]] // whatever this might actually look like
#else
#define STDCALL_OR_NIL
#endif

STDCALL_OR_NIL
GLint (*glDoSomething)(...);

A minor change, but a definite improvement in usability: Just put all the random compiler-specific goodies in front of a declaration.

So, I suggested an attribute as a way to implement OP's very-much-toolchain-specific extension.

David Krauss

Feb 17, 2015, 2:52:51 AM
to std-pr...@isocpp.org

> On 2015–02–17, at 3:33 PM, Chris Gary <cgar...@gmail.com> wrote:
>
> AFAIK: Attributes are simply a standardized syntax for whatever the vendor wanted to do, with the exception of a few reserved names.

Now you know this: Attributes are always ignorable.

> Mainly to keep calling convention specifiers and other paraphernalia in a predictable place.

Nope. Although it’s often advisable for vendors to group proprietary keywords together with attributes where possible, they’re not allowed to turn keywords into attributes where they convey necessary meaning.

> A minor change, but a definite improvement in usability: Just put all the random compiler-specific goodies in front of a declaration.

Vendors don’t need the standard to tell them where to add extensions.

David Krauss

Feb 17, 2015, 2:57:37 AM
to std-pr...@isocpp.org
On 2015–02–17, at 3:08 PM, Thiago Macieira <thi...@macieira.org> wrote:

Your suggestion is completely orthogonal to the suggestion from the OP. It
would be a nice feature to have, but very difficult to implement source files
without #load or without a helper code generator (like Qt's rcc).

Different yes, orthogonal or difficult no.

When the programmer writes something like this:

std::string greeting = std::resource( "strings/hello" );

he shouldn’t care, and it should be unspecified, whether the compiler ever looks at the file, or the standard library simply opens it at runtime.

The standard doesn’t ever mention an “executable file” or even a “compiler.” A program simply defines observable behaviors for a computer.

Which is why I am saying this feature, if adopted, should be restricted to
1- byte entities. That also resolves the question of whether to return a
character literal or an initializer list: since they are the same, a
character literal is easier to understand.

I’m not sure what you mean by “return a character literal.”

I was talking about #load. The product of that should be a very long character
literal, which you can store in an array or manipulate via templates or
constexpr functions.

For text manipulation, we already have raw strings, and #load would just be the same as #include plus raw-string bumpers on the file. That leaves compile-time binary byte-streams, which are quite a niche to be designing around.
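The "raw-string bumpers" equivalence is easy to demonstrate. A #load of a two-line text file would behave as if the file's contents appeared wrapped in a raw string literal; the contents below are inlined hand-written stand-ins (with a real file, one would add the R"( and )" bumpers to the file itself and #include it):

```cpp
#include <string>

// What `#load "greeting.txt" greeting` would amount to, per the message
// above: the file's bytes wrapped as one raw string literal.
const char greeting[] = R"(Hello, world
second line)";
```

This works for text; embedded null bytes and delimiter collisions make it awkward for binary data, which is exactly the compile-time byte-stream niche the message refers to.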

Chris Gary

unread,
Feb 17, 2015, 3:38:49 AM2/17/15
to std-pr...@isocpp.org
On Tuesday, February 17, 2015 at 12:52:51 AM UTC-7, David Krauss wrote:

    > Mainly to keep calling convention specifiers and other paraphernalia in a predictable place.

    Nope. Although it’s often advisable for vendors to group proprietary keywords together with attributes where possible, they’re not allowed to turn keywords into attributes where they convey necessary meaning.

Can you clarify that last bit? N2761 seems to suggest them as a replacement for all forms of __attribute__ and __declspec (much ado about how they are equivalent to GCC's __attribute__). Even mentioning alignment in the "yes, do this" bullet list of suggestions.


    > A minor change, but a definite improvement in usability: Just put all the random compiler-specific goodies in front of a declaration.

    Vendors don’t need the standard to tell them where to add extensions.
 
Well, now they have been given very strong advice indeed! Not that they'll listen...

I'm probably just dreaming with that example, though. The calling convention should actually go right next to the pointer, now that I think about it.

What I gathered from the (old) draft I have on hand, they could be used for things like this:


[[glsl::fragment_source]]
const char *frag_src = R"frag(

    void main()
    {
        gl_FragColor = gl_Color;
    }

)frag"
;


The purpose of the attribute would be to run the associated string literal through a GLSL compiler during translation, allowing any errors raised to prevent linking.

Following the guideline bullets: Removing the attribute still leaves a valid raw string literal.

Back to my attribute suggestion:

[[from_file("shader.vert")]]
extern const char str[];

Removing the attribute still leaves a valid declaration. Even though I suggested altering the semantics of the symbol (sizeof() is valid on str where this is supported), this will fail to compile if it is ignored, as desired (the code expects str to point to something, and to be sized statically, as OP desired). This is probably a stark violation of the already vague set of mere suggestions, so I won't pursue it any longer.

Avoiding further digression: I already agree with your suggestion that an opaque named-resource-stream would be the most feasible approach to this problem.

It simply isn't reasonable to further extend the preprocessor in a way that creates a required coupling with the next stage (needing to 'remember' the binding target -- this is what #pragma is for).

Making a preprocessor directive like #load that simply puts "" around its contents would require the directive to sit on its own line:

const char *src =
#load "things.stuff"    // This HAS to be on its own line
;

which looks outright weird.

Other weirdness:

#load_into src "stuff.things"    // Coupling with the next stage: "src" is a pp-token here

const char src[];    // what does my decl actually look like?


This could be rephrased as:

#pragma paste_file(src, "stuff.things")

const char src[];    // I still don't know what my decl should look like!

David Krauss

Feb 17, 2015, 7:25:28 AM
to std-pr...@isocpp.org
On 2015–02–17, at 4:38 PM, Chris Gary <cgar...@gmail.com> wrote:

Can you clarify that last bit? N2761 seems to suggest them as a replacement for all forms of __attribute__ and __declspec (much ado about how they are equivalent to GCC's __attribute__).

That paper predates the decision about ignorability.

Even mentioning alignment in the "yes, do this" bullet list of suggestions.

C++11 added the alignas specifier which syntactically groups with attributes, but is not an attribute.

Well, now they have been given very strong advice indeed! Not that they'll listen…

There has been no official advice about preferring to put extensions with attributes. I only said it’s “advisable” because it means less new syntax for users to learn.

I'm probably just dreaming with that example, though. The calling convention should actually go right next to the pointer, now that I think about it.

The standard way of specifying calling conventions is with extern "string" linkage specifiers. Vendors have mostly ignored this feature, for whatever reason. Personally I’d like to see more of it, though not for something like this proposal.

Just for the sake of argument, though, this would be a valid “resource grabber” extension with no new keywords:

extern "resource" char greeting[] = "strings/hello.txt";

Avoiding further digression: I already agree with your suggestion that an opaque named-resource-stream would be the most feasible approach to this problem.

Cool! Thanks for bringing up the issue, it’s not something I’d have thought of otherwise.

My suggestion almost avoids touching the core language, but I said the standard function should take a compile-time string. So far such a thing does not exist, although a couple proposals were presented and received well at the last ISO meeting.

So, my suggestion leads to a choice between proposing a library function that illegally treats a function parameter as a compile-time constant (as in my previous example), or proposing that resource names must be marked as compile-time constants by some suffix which has yet to be standardized, e.g.

std::string greeting = std::resource( "strings/hello.txt"cs ); // “cs” for “constant string”

One of the proposals was N4236. I can’t find the other one right now, but they’re functionally equivalent for this purpose.

Compile-time strings have different types from ordinary strings, so std::resource could have additional char const * and std::string overloads. That alone might be enough for a proposal, with compile-time functionality left as an extension or subsequent proposal. It might sound trivial, if all it’s going to do in practice is prepend a platform-specific prefix to the path, but that’s still a wheel that shouldn’t get reinvented so often.

#pragma paste_file(src, "stuff.things")

const char src[];    // I still don't know what my decl should look like!

#pragma usually signals a platform-specific extension. C has standard pragmas but C++ so far does not.

The usual way of adding platform-specific keywords is with a leading double underscore:

const char src[] = __paste_file( "stuff.things" );

C likewise standardizes new keywords with a leading underscore and capital letter, but C++ so far prefers to add keywords and “contextual keywords.”

David Krauss

Feb 17, 2015, 8:25:31 AM
to std-pr...@isocpp.org

On 2015–02–17, at 5:43 PM, David Krauss <pot...@gmail.com> wrote:

My suggestion almost avoids touching the core language, but I said the standard function should take a compile-time string.

Eh, no, I hadn’t said that. Brain fart, ignore the rest of this.

Matthew Woehlke

Feb 17, 2015, 11:05:03 AM
to std-pr...@isocpp.org
On 2015-02-17 02:52, David Krauss wrote:
>> On 2015–02–17, at 3:33 PM, Chris Gary wrote:
>>
>> AFAIK: Attributes are simply a standardized syntax for whatever the vendor wanted to do, with the exception of a few reserved names.
>
> Now you know this: Attributes are always ignorable.

Yes... and no. A compiler that ignores attributes can be conforming.
Things may fall apart if you are relying on non-standard attributes to
be parsed by the compiler that provides the same, and it suddenly
changes its mind, but you're already in non-standard territory there.
Code relying on arbitrary compilers to support non-standard attributes
is broken.

IOW, it's reasonable to write code that relies on the compiler
supporting an attribute where said code knows that the compiler supports
said attribute. It would not be reasonable in such case for the compiler
to say that it supports the attribute (if the attribute introduces a
semantic change) and then ignore it. It's also reasonable for a compiler
to ignore all attributes (provided that it doesn't claim to do otherwise).

--
Matthew

Matthew Woehlke

Feb 17, 2015, 11:12:27 AM
to std-pr...@isocpp.org
On 2015-02-16 23:38, Chris Gary wrote:
> Matthew Woehlke wrote:
>> Okay, I'm confused... above you say "it is a tremendously bad idea to
>> bake any resource into a binary", but here you are suggesting exactly
>> that?
>
> It's still a bad idea (have to distribute a patch for a change of icon).
> Just pointing out that it's been done.

And how *else* are you going to distribute a change of icon? You're
going to have to distribute *something* - preferably an automated
something - for *any* sort of change. At that point, I don't see it
making much difference if the changed file(s) are data or binaries. (On
most Linux systems, it makes exactly zero difference, as the "patch" is
a complete rebuild of the package. I'm not entirely up to speed on OS X
bundles, but I think they're in the same boat.)

If you're in some domain where that isn't the case... then don't use the
feature :-). As Thiago said, I don't see that one corner case where this
doesn't work should prevent accepting it when it would be useful for
many other cases.

>> Who said anything about semantic checking?
>
> I meant to say "syntax checking", though "semantic" still applies here
> (e.g. CUDA extensions). More important than "it just compiling" is "it not
> compiling when it shouldn't work."

I meant that literally. *Now* it's been brought up, but when I wrote
that, I don't recall that anyone had suggested this feature do anything
but "paste" file contents; no syntax / semantic checking involved.

That could be nice, but it's not necessary, and certainly not worth
derailing the proposal.

--
Matthew

Matthew Woehlke

Feb 17, 2015, 11:32:34 AM
to std-pr...@isocpp.org
On 2015-02-16 19:46, Thiago Macieira wrote:
> On Monday 16 February 2015 17:10:50 Matthew Woehlke wrote:
>>> is onto something. It allows you to specify the format (uint8_t vs char)
>>> and linkage of the variable. If it expands to `{0x50, 0x51, ... }` then
>>> this can also work with std::vector, and std::string, although I can't see
>>> a way to make it work with std::array because that requires the size in a
>>> template argument.
>>
>> Hmm... that's a really great idea, but I'm not sure if it's ideal. The
>> trouble is that a string literal decomposes into a char*, while an
>> initializer_list (effectively what you're suggesting) I'm pretty sure
>> does not. In particular, I could imagine this being limiting with
>> libraries that can't construct a string type from an initializer_list.
>> That said, I think both have their uses; maybe the syntax should allow
>> specifying which is wanted? (And if not... I think I agree that the
>> initializer_list is more generally useful in modern code.)
>
> Two questions:
>
> a) why would you store the array into anything except a static const array or
> a template type that is equivalent to that? If the template type takes an
> array by reference, it doesn't matter if the string was "foo" or
> {'f', 'o', 'o', 0}

Again, an initializer list does not implicitly convert to a char const*.
I can't pass it to anything that expects a char const*; I have to assign
it to a variable first. I may not want to do that; I can readily
conceive instances where storing it in a variable is just unneeded
overhead, i.e. I could readily just pass it directly to some function.

That's my main concern with the list format.

> b) why would the underlying element type be anything other than char?

Why not uchar? Different APIs expect different representations of "raw
memory bytes". (Sure, you can reinterpret_cast, but why not avoid that
if we can?)

Why you'd ever *want* to use, say, int, I don't know, but I'd be
surprised if the compiler choked on it if the operation is defined as
producing a list of char literals.

As I see it, the most important reasons to let the user decide is to
allow direct use in constructors/factories (e.g. std::string,
std::make_array), to allow specifying storage (e.g. static, const...
there's no reason this couldn't be used to initialize a non-const
array), and so forth. This is also why the list format should be
supported; class constructors are more likely to accept an
initializer_list than to have a templated constructor to accept a
known-size character array literal.
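The two candidate expansion forms contrasted here can be put side by side; the byte values are hand-written stand-ins for file contents:

```cpp
#include <string>
#include <vector>

// String-literal form: the array decays to char const*, so it can be
// handed directly to any C-string interface.
const char as_literal[] = "foo";

// Initializer-list form: convenient for containers and for choosing the
// element type, but a brace list does not convert to char const* -- it
// must first initialize a named object.
const std::vector<unsigned char> as_bytes = {0x66, 0x6f, 0x6f};  // "foo"

// Passing as_literal here compiles; there is no way to pass the brace
// list or the vector directly without a cast or a .data() call.
std::string take_cstring(const char* s) { return s; }
```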

--
Matthew

Matthew Woehlke

Feb 17, 2015, 11:35:10 AM
to std-pr...@isocpp.org
On 2015-02-16 20:29, David Krauss wrote:
>> On 2015–02–17, at 8:46 AM, Thiago Macieira wrote:
>>
>> a) why would you store the array into anything except a static const array or
>> a template type that is equivalent to that? If the template type takes an
>> array by reference, it doesn't matter if the string was "foo" or
>> {'f', 'o', 'o', 0}
>
> For large resources, it’s seldom practical to load them all before
> main starts. Although virtual memory and shared libraries can help
> alleviate the pain, the tried-and-true solution is to load resource
> files at runtime. Embedded systems with little RAM and no VM might
> have no choice.

...and, as Thiago said, such systems aren't going to use this feature.
Or any other resource system that embeds resources into the binary, for
that matter.

I'm curious how many such platforms exist that would even have large
resources used? (Do iOS / Android / etc. not have VM? I would have
thought they do...)

On platforms with VM and read only data segments, I would expect the
resource data would be in a RO segment (same as the rest of the code)
that would have essentially zero overhead versus an external file on
disk. (Possibly less, unless you're mmap'ing the file, as I believe
fread() would *always* be reading into RAM, whereas RO data may not, or
at least can be swapped out without using the swap file, or if there
*is* no swap file.)

> Serializing initializer lists is an interesting thought, but I think
> that problem is orthogonal to the means of storage. You can’t
> serialize a std::initializer_list of non-serializable objects.

We're only talking about [u]char here, though.

--
Matthew

Thiago Macieira

Feb 17, 2015, 3:03:13 PM
to std-pr...@isocpp.org
On Tuesday 17 February 2015 11:31:13 Matthew Woehlke wrote:
> I'm curious how many such platforms exist that would even have large
> resources used? (Do iOS / Android / etc. not have VM? I would have
> thought they do...)
>
> On platforms with VM and read only data segments, I would expect the
> resource data would be in a RO segment (same as the rest of the code)
> that would have essentially zero overhead versus an external file on
> disk. (Possibly less, unless you're mmap'ing the file, as I believe
> fread() would *always* be reading into RAM, whereas RO data may not, or
> at least can be swapped out without using the swap file, or if there
> *is* no swap file.)

On an architecture that supports paging, it's easy for the OS to provide VM
support and memory-mapped files. That's a key target for embedded binary
content: it's loaded as read-only, always-pure, shareable pages. That means
the content can also be demand-loaded and discarded when there's memory
pressure.

The problem is when you're trying to write software for systems without paging
support. That means the entire binary would need to be loaded into main RAM
and stay there, including all resources, whether they are needed or not. That
doesn't sound to me like a very good use of resources. Those kinds of systems
are more of a target for the solution that David proposed: manually load on
demand.

You asked whether they are common -- they still are and are gaining again in
relevance, as we move to the "Internet of Things" world when your toaster and
light switches want to connect to a network.

There's a sub-category of the no-paging systems: the execute-in-place systems
(see https://en.wikipedia.org/wiki/Execute_in_place). I don't have enough
experience with those to provide meaningful conclusions, though.

Thiago Macieira

Feb 17, 2015, 3:04:53 PM
to std-pr...@isocpp.org
On Tuesday 17 February 2015 11:31:19 Matthew Woehlke wrote:
> Why not uchar? Different API's expect different representations of "raw
> memory bytes". (Sure, you can reinterpret_cast, but why not avoid that
> if we can?)

static const unsigned char foo[] = "Hello";

Matthew Woehlke

Feb 17, 2015, 3:20:01 PM
to std-pr...@isocpp.org
On 2015-02-17 15:03, Thiago Macieira wrote:
> The problem is when you're trying to write software for systems without paging
> support. That means the entire binary would need to be loaded into main RAM
> and stay there, including all resources, whether the are needed or not. That
> doesn't sound to me like a very good use of resources. Those kind of systems
> are more of a target for the solution that David proposed: manually load on
> demand.

Right. This isn't a feature you'd use in such cases. (Nor e.g. Qt's
system which AFAIK is roughly equivalent.)

> You asked whether they are common -- they still are and are gaining again in
> relevance, as we move to the "Internet of Things" world when your toaster and
> light switches want to connect to a network.

On the other hand, I am curious what sorts of "resources" you are even
using on such systems? Somehow I'm not seeing much in the way of
graphics on your light switch :-).

--
Matthew

Matthew Woehlke

Feb 17, 2015, 3:25:08 PM
to std-pr...@isocpp.org
On 2015-02-17 15:04, Thiago Macieira wrote:
> On Tuesday 17 February 2015 11:31:19 Matthew Woehlke wrote:
>> Why not uchar? Different API's expect different representations of "raw
>> memory bytes". (Sure, you can reinterpret_cast, but why not avoid that
>> if we can?)
>
> static const unsigned char foo[] = "Hello";

Sorry, I may have misread the question. The "pasted content" or whatever
you want to call it should be such that it can be stored as an array of
'char'. (Or 'uchar', but as you note, that seems to Just Work. Likewise
'int', etc.)

I may be missing the context that caused you to ask the question in the
first place...

--
Matthew

Thiago Macieira

Feb 17, 2015, 4:54:06 PM
to std-pr...@isocpp.org
On Tuesday 17 February 2015 15:19:50 Matthew Woehlke wrote:
> On 2015-02-17 15:03, Thiago Macieira wrote:
> > The problem is when you're trying to write software for systems without
> > paging support. That means the entire binary would need to be loaded into
> > main RAM and stay there, including all resources, whether the are needed
> > or not. That doesn't sound to me like a very good use of resources. Those
> > kind of systems are more of a target for the solution that David
> > proposed: manually load on demand.
>
> Right. This isn't a feature you'd use in such cases. (Nor e.g. Qt's
> system which AFAIK is roughly equivalent.)

QResource also supports reading a resource directory directly from a file,
instead of something registered inside the binary image (see
QResource::registerResource).

But David's description is more similar to QFileSelector.

> > You asked whether they are common -- they still are and are gaining again
> > in relevance, as we move to the "Internet of Things" world when your
> > toaster and light switches want to connect to a network.
>
> On the other hand, I am curious what sorts of "resources" you are even
> using on such systems? Somehow I'm not seeing much in the way of
> graphics on your light switch :-).

Many of those may contain small web servers for configuration and management,
which means they may have icons they wish to send to you. Another scenario I
can think of is the transfer of prepared responses over the communication
channel. If nothing else, they probably have multiple preconfigured encryption
keys that may not be required all the time.

The size of the RAM on such systems that I'm hearing about would scare even my
DOS-using 1992 self...

http://download.intel.com/newsroom/kits/ces/2015/pdfs/Intel_CURIE_Module_Factsheet.pdf
- 384 kB Flash, 80 kB SRAM. At 80 kB, a 2048-bit RSA key + certificate would
occupy around 1% of your memory, so you may not want to keep it past the
initial TLS handshake.

Magnus Fromreide

Feb 17, 2015, 6:17:20 PM
to std-pr...@isocpp.org
On Tue, Feb 17, 2015 at 05:43:58PM +0800, David Krauss wrote:
>
> > On 2015–02–17, at 4:38 PM, Chris Gary <cgar...@gmail.com> wrote:
> >
> > Can you clarify that last bit? N2761 seems to suggest them as a replacement for all forms of __attribute__ and __declspec (much ado about how they are equivalent to GCC's __attribute__).
>
> That paper predates the decision about ignorability.
>
> > Even mentioning alignment in the "yes, do this" bullet list of suggestions.
>
> C++11 added the alignas specifier which syntactically groups with attributes, but is not an attribute.
>
> > Well, now they have been given very strong advice indeed! Not that they'll listen…
>
> There has been no official advice about preferring to put extensions with attributes. I only said it’s “advisable” because it means less new syntax for users to learn.
>
> > I'm probably just dreaming with that example, though. The calling convention should actually go right next to the pointer, now that I think about it.
>
> The standard way of specifying calling conventions is with extern "string" linkage specifiers. Vendors have mostly ignored this feature, for whatever reason. Personally I’d like to see more of it, though not for something like this proposal.
>
> Just for the sake of argument, though, this would be a valid “resource grabber” extension with no new keywords:
>
> extern "resource" char greeting[] = "strings/hello.txt";

This form drops the original form's ability to be used not only in character
arrays but also in argument lists, template argument lists, and as an
expression. (ok, the expression form ain't very useful)

Consider:

char greeting[] = {
#load "strings/hello.txt"
};

otherGreeting<
#load "strings/hello.txt"
> bar;

thirdGreeting(
#load "strings/hello.txt"
);

and finally there is

fourthGreeting,
#load "strings/hello.txt"
; /* The rather useless case... */

Thiago Macieira

Feb 18, 2015, 2:30:10 AM
to std-pr...@isocpp.org
On Wednesday 18 February 2015 00:17:15 Magnus Fromreide wrote:
> char greeting[] = {
> #load "strings/hello.txt"
> }
>
> otherGreeting<
> #load "strings/hello.txt"
>
> > bar;
>
> thirdGreeting(
> #load "strings/hello.txt"
> );
>
> and finally there is

You can replace those with a load into a constexpr char variable and pass that
to the template or function call.

We're not talking about loading a file and interpreting it as C++ source code.
We have #include for that already.

Arthur O'Dwyer

Feb 18, 2015, 11:52:46 AM
to std-pr...@isocpp.org
On Tuesday, February 17, 2015 at 11:30:10 PM UTC-8, Thiago Macieira wrote:
On Wednesday 18 February 2015 00:17:15 Magnus Fromreide wrote:
> char greeting[] = {
> #load "strings/hello.txt"
> }
>
> otherGreeting<
> #load "strings/hello.txt"
>
> > bar;
>
> thirdGreeting(
> #load "strings/hello.txt"
> );
>
> and finally there is

You can replace those with a load into a constexpr char variable and pass that
to the template or function call.

Not with the same semantics, though.
otherGreeting<"abc">
has a very different meaning from
otherGreeting<'a', 'b', 'c'>.

We're not talking about loading a file and interpreting it as C++ source code.
We have #include for that already.

Magnus's suggestion was not to load a file and interpret it as source code; it was to load a file and "interpret" it as a comma-separated sequence of character literals. For example,

int a[]={
#load __FILE__
};

when preprocessed would produce

int a[] = {
   'i', 'n', 't', ' ', 'a', '[', ']', '=', '{', '\n', '#', 'l', 'o', 'a', 'd', ' ', '_', '_', 'F', 'I', 'L', 'E', '_', '_', '\n', '}', ';', '\n'
};

I think Magnus's is a reasonable suggestion... except for the fact that it already exists, in the form of the standard utility program xxd. I don't see any reason to merge the functionalities of xxd and cc into a single program, when they work perfectly fine as separate programs. (Or, if someone in this thread claims that they don't work perfectly fine, I'd like to see that claim made explicitly, and then backed up with at least a bit of anecdotal evidence.)
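For readers unfamiliar with the tool: `xxd -i hello.txt` emits a ready-to-compile C fragment along these lines (reproduced by hand for a hypothetical three-byte file containing "Hi\n", not captured from `xxd` itself):

```cpp
// Hypothetical output of `xxd -i hello.txt` for a file containing the three
// bytes "Hi\n": a byte array plus a length variable, ready to compile as-is.
unsigned char hello_txt[] = { 0x48, 0x69, 0x0a };
unsigned int hello_txt_len = 3;
```

The generated file is then compiled and linked like any other translation unit, which is exactly the xxd + cc split described above.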

–Arthur

Matthew Woehlke

Feb 18, 2015, 12:30:39 PM
to std-pr...@isocpp.org
On 2015-02-18 11:52, Arthur O'Dwyer wrote:
> I think Magnus's is a reasonable suggestion... except for the fact that it
> already exists, in the form of the standard utility program xxd.

$ rpm -qf /usr/bin/xxd
vim-common-7.4.475-2.fc20.x86_64

Really? So in order to use resources in a C++ program I should be
required to install vim? (I'm sure this will surprise no one, and that
emacs fans won't be in the slightest annoyed...)

C:\Program Files (x86)>xxd
'xxd' is not recognized as an internal or external command,
operable program or batch file.

Oh, look, *it's not portable* either. (That was in my MSVC shell, btw.)

I have to disagree with you; this is sufficiently useful to justify
being part of the compiler, which would make it portable and usable with
no build-system dependencies.

--
Matthew

Thiago Macieira

Feb 18, 2015, 12:31:36 PM
to std-pr...@isocpp.org
On Wednesday 18 February 2015 08:52:46 Arthur O'Dwyer wrote:
> > You can replace those with a load into a constexpr char variable and pass
> > that
> > to the template or function call.
>
> Not with the same semantics, though.
> otherGreeting<"abc">
> has a very different meaning from
> otherGreeting<'a', 'b', 'c'>

But it should have the same meaning as
otherGreeting<{'a', 'b', 'c'}>

which is what it seems to me it should be. The #load should produce one token,
not multiple comma-separated ones.

We don't want this:

static const uchar data[] = 'a', 'b', 'c';

> > We're not talking about loading a file and interpreting it as C++ source
> > code.
> > We have #include for that already.
>
> Magnus's suggestion was not to load a file and interpret it as source code;
> it was to load a file and "interpret" it as a comma-separated sequence of
> character literals. For example,
>
> int a[]={
> #load __FILE__
> };
>
> when preprocessed would produce
>
> int a[] = {
> 'i', 'n', 't', ' ', 'a', '[', ']', '=', '{', '\n', '#', 'l', 'o', 'a',
> 'd', ' ', '_', '_', 'F', 'I', 'L', 'E', '_', '_', '\n', '}', ';', '\n'
> };

That's an interesting suggestion, but that's not how I had read it. I was
expecting that the { } would be unnecessary. Hence my reply.

> I think Magnus's is a reasonable suggestion... except for the fact that it
> already exists, in the form of the standard utility program xxd. I don't
> see any reason to merge the functionalities of xxd and cc into a single
> program, when they work perfectly fine as separate programs. (Or, if
> someone in this thread claims that they *don't* work perfectly fine, I'd
> like to see that claim made explicitly, and then backed up with at least a
> bit of ancedotal evidence.)

I also don't think there's anything wrong with code generators.

Cleiton Santoia

Feb 18, 2015, 1:11:24 PM
to std-pr...@isocpp.org

If I have a table in Mysql

CREATE TABLE my_table( cod int(10), nm varchar(30) )

Can I somehow parse the ".frm"

WEIRED_PARSE_MACRO(#load "..\..\..\var\mysql_data\my_table.frm");

and get a structure ? 

struct my_table {
    int cod;
    std::string nm;
};

Thanks

Matthew Woehlke

Feb 18, 2015, 1:22:38 PM
to std-pr...@isocpp.org
On 2015-02-18 13:11, Cleiton Santoia wrote:
> If I have a table in Mysql
>
> CREATE TABLE my_table( cod int(10), nm varchar(30) )
>
> Can I somehow parse the ".frm"
>
> WEIRED_PARSE_MACRO(#load "..\..\..\var\mysql_data\my_table.frm");
>
> and get a structure ?
>
> struct my_table{
> int cod;
> std::string nm;
> };

...only if WEIRD_PARSE_MACRO can go from {'C','R','E' /*...*/
'3','0',')',' ',')',0} to that. That sounds rather more like a job for a
specialized code generator.

--
Matthew

Douglas Boffey

Feb 18, 2015, 1:55:25 PM
to std-pr...@isocpp.org
If I understand the way this thread is going, it would seem to me
better to extend #include to load from a pipe.

Douglas Boffey

Feb 18, 2015, 1:58:43 PM
to std-pr...@isocpp.org
e.g.

#include "tokenise foo.txt|"

where the '|' signifies the pipe.

Dale Weiler

Feb 18, 2015, 2:02:41 PM
to std-pr...@isocpp.org, tdh...@gmail.com
As the author of incbin (I also go by graphitemaster) I feel like this discussion has mostly been derailed from the purpose of what the original proposer, Tim, suggested.
The proposal was not about incorporating a resource system into C++, nor was it about suggesting a library component to incorporate into the standard library. This is
purely a proposal about introducing a method (in the language) to include binary assets as-is into the program.

Assemblers have had this capability since long before C even existed. In fact, C's predecessor, namely B, always supported means to specify storage for
the final kludge as simple statement-directives. Even the first C, which was just B with backwards compound operators and weird function declarations, had it, because
the compiler used to bootstrap C had an inline assembler that could include binary data. We lost it when C++ more or less picked up where C had left off,
when it should've at least looked at where C came from itself. That's enough history.

It irritates me that since the beginning of computing, ways to incorporate binary data have always existed without the need for external preprocessing tools or
build steps, which are non-portable and quite frankly ridiculous. If the committee's view has been "if it can be done with a build tool, do that first before adding something to the language",
then raw string literals don't really follow those rules. I'm not, and never have been, a fan of introducing build complexity to a project just to achieve something.
Judging both by how popular incbin itself has become and by how much bike-shedding has been going on here, I can reasonably argue that there is use for something
like this in the language.

For added build step complexity it can be done reasonably well, sure. But people like myself compile for a variety of platforms, on a variety of compilers, cross compilation
included, and the one thing I've come to accept is to avoid doing any sort of external processing as a build step, because there are just too many permutations of targets that
can go wrong at any time, impossible for any one individual to test and, quite frankly, ridiculous. It's always better to utilize the compiler, since it's the only thing that
has to be consistent due to standardization.

I don't care what the syntax is, but I can assure you that the language shouldn't interpret the data as anything but a raw sequence of bytes. There should be no endianness conversion,
alignment, or other nonsensical things done to the data. The md5sum of a dump of the data should compare equal to the md5sum of the original file once it's 'embedded' in
the kludge.

Matthew Woehlke

Feb 18, 2015, 2:19:58 PM
to std-pr...@isocpp.org
On 2015-02-18 13:55, Douglas Boffey wrote:
> If I understand the way this thread is going, it would seem to me
> better to extend #include to load from a pipe.

No. That *would* be way out of context of the original suggestion. It's
also unnecessary; code generation is clearly the job of build systems.

Let me reiterate: the suggestion *is not about (complex) code
generation*. It is about inserting the contents of a file as code-level
data (i.e. string literals or initializer lists, as opposed to code,
which is what #include does).

(Another way to look at it is that it is about inserting file contents
such that the raw contents wind up on CODE/RODATA without further
translation, as opposed to being processed as code. That is, *less*
translation. Not more. Please don't derail the thread.)

--
Matthew

Matthew Woehlke

Feb 18, 2015, 2:30:06 PM
to std-pr...@isocpp.org
On 2015-02-18 14:02, Dale Weiler wrote:
> The proposal was not about incorporating a resource system into C++,
> nor was it about suggesting a library component to incorporate into
> the standard library. This is purely a proposal about introducing a
> method (in the language) to include binary assets as-is into the
> program.

...the obvious purpose of which is to implement a resource system :-).
True, it is not proposing any form of advanced resource management. Nor
does it need to; that's what identifiers are for.

That said, I think it's reasonable to consider that this is a major
motivating use case and that this won't entirely solve that problem.

Still, I agree with the sentiment that we shouldn't get bogged down in
adding functionality to something that should be inherently quite
simple. That's why I'm not in favor of any of the counter-proposals that
wandered off in this direction.

> For added build step complexity it can be done reasonably well, sure.
> But people like myself compile for a variety of platforms, on a
> variety of compilers, cross compilation included, and the one thing
> I've come to accept is to avoid doing any sort of external processing
> as a build step because there is just too many permutations for
> targets that can go wrong at any time, impossible for any one
> individual to test and quite frankly, ridiculousness. It's always
> better to utilize the compiler, since it's the only thing that has to
> be consistent due to standardization.

Hear, hear!

> I don't care what the syntax is, but I can assure you that the language
> shouldn't interpret the data as anything but a raw sequence of bytes. There
> should be no endianness conversion,

I wouldn't *oppose* endian conversion as such, but it's a nice-to-have.
The no-conversion flavor on the other hand is bare minimum functionality.

> alignment,

Alignment can and should be handled as part of the storage declaration,
i.e. it is orthogonal to the "paste operator" (or whatever you want to
call it). So... yes, the paste operator itself shouldn't know/care.

--
Matthew

Arthur O'Dwyer

Feb 18, 2015, 2:56:27 PM
to std-pr...@isocpp.org
On Wed, Feb 18, 2015 at 11:19 AM, Matthew Woehlke <mw_t...@users.sourceforge.net> wrote:
> On 2015-02-18 13:55, Douglas Boffey wrote:
>> If I understand the way this thread is going, it would seem to me
>> better to extend #include to load from a pipe.

FWIW, that seems reasonable to me (Perl does something like that with its equivalent of fopen) — EXCEPT for the massive security problem. Compiling a C++ program mustn't result in executing arbitrary shell commands!

    #include `rm -rf *`  // ha ha

> No. That *would* be way out of context of the original suggestion. It's
> also unnecessary; code generation is clearly the job of build systems.
>
> Let me reiterate: the suggestion *is not about (complex) code
> generation*. It is about inserting the contents of a file as code-level
> data (i.e. string literals or initializer lists, as opposed to code,
> which is what #include does).

In other words, the suggestion is about taking a binary file, preprocessing it in such a way as to insert singlequote-comma-singlequote between each pair of bytes (with appropriate escaping), and then textually including the result of that preprocessing into the C++ source file currently under translation.

...Or, how else would you define "inserting the contents of a file [as] string literals or initializer lists"?...

So, we need a mechanism for textually including the result of a preprocessing step. Well, C++ already has a mechanism for textual inclusion: namely, #include. So all that's missing is a way to #include the output of some preprocessing step. Obviously you can do it via a temporary file, on any build system in the world; but if you for some reason *MUST* have everything built into cc, then an obvious answer would indeed be to have #include be able to accept the result of a pipe.

Matthew Woehlke

Feb 18, 2015, 3:23:17 PM
to std-pr...@isocpp.org
On 2015-02-18 14:56, Arthur O'Dwyer wrote:
>> On 2015-02-18 13:55, Douglas Boffey wrote:
>>> If I understand the way this thread is going, it would seem to me
>>> better to extend #include to load from a pipe.
>
> FWIW, that seems reasonable to me (Perl does something like that with its
> equivalent of fopen) — *EXCEPT* for the massive security problem. Compiling
> a C++ program mustn't result in executing arbitrary shell commands!
>
> #include `rm -rf *` // ha ha

...but *running* said program is okay? What about every build manager in
existence; are you going to fix those also?

# Makefile
all: foo evulz

foo: foo.cpp
	$(CC) -o foo foo.cpp

evulz:
	rm -rf ~/

> On Wed, Feb 18, 2015 at 11:19 AM, Matthew Woehlke wrote:
>> No. That *would* be way out of context of the original suggestion. It's
>> also unnecessary; code generation is clearly the job of build systems.
>>
>> Let me reiterate: the suggestion *is not about (complex) code
>> generation*. It is about inserting the contents of a file as code-level
>> data (i.e. string literals or initializer lists, as opposed to code,
>> which is what #include does).
>
> In other words, the suggestion is about taking a binary file, preprocessing
> it in such a way as to insert singlequote-comma-singlequote between each
> pair of bytes (with appropriate escaping), and then textually including the
> result of that preprocessing into the C++ source file currently under
> translation.

Actually, the behavior should probably be specified "as if". However,
that sort of misses the point. The point is, if I have some sequence of
bytes in a file, and I __load that file, the *exact same sequence of
bytes* should end up in the compiled object.

(For that matter, if we don't try to specify the feature in terms of the
preprocessor, it's conceivable that the compiler could automagically
decide to use an initializer_list or string literal (i.e. char const*)
depending on the context. Whether or not it's a good idea to actually
specify that behavior, however...)

> So, we need a mechanism for textually including the result of a
> preprocessing step.

Under the "as if" rule, the compiler could simply copy the data to an
appropriate segment and insert the proper reference to the same. This
would also be much more efficient...

> So all that's missing is a way to #include the
> output of some preprocessing step.

That's not missing at all; that's the job of the build manager.

What we *are* missing is a *portable* way to go from binary data on disk
to binary data in code. xxd *is not* that way. IMO this is a
sufficiently common problem that esoteric tools should not be required
to solve it. It should be built into the compiler. (Also IMO, process
invocation should *absolutely not* be built into the compiler.)

--
Matthew

Dale Weiler

Feb 18, 2015, 3:34:34 PM
to std-pr...@isocpp.org, mw_t...@users.sourceforge.net

What we *are* missing is a *portable* way to go from binary data on disk
to binary data in code. xxd *is not* that way. IMO this is a
sufficiently common problem that esoteric tools should not be required
to solve it. It should be built into the compiler. (Also IMO, process
invocation should *absolutely not* be built into the compiler.)

This is precisely the point Tim and I have been trying to make. It's not a matter of how it can be done with existing
tools and added build complexity, it's about giving the job of including the binary data to the compiler so it can construct
the appropriate external references and let the linker deal with the rest (which is the most efficient way of doing this.)

I still think the people who are arguing against this don't understand the actual problem at all because they're confusing
what it means to "include" binary data into the application. The people who advocate that #include is sufficient are not
fully understanding that it isn't. The closest you can do is something like this (which is non-standard preprocessing usage.)

#define STR(X) #X

const char data[] = STR(
#include "file.txt"
);

The standard does not allow preprocessing directives in the context of a macro argument list; GCC and Clang accept
this as an extension, however (the Linux kernel depends on this, IIRC). The obvious problem is that when the contents of file.txt
itself contain a preprocessing directive, it falls short. GLSL shaders, for instance, have C preprocessing tokens as part of the language,
so one cannot do this:

// frag.glsl
#version 330
uniform vec4 outColor;
void main() { outColor = vec4(1, 0, 0, 1); }

// C++ code
#define STR(X) #X
const char data[] = STR(
#include "frag.glsl"
);

This will complain because of the "#version 330" directive.
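A raw string literal sidesteps the directive problem described here, at the cost of having to paste (or generate) the asset into a C++ source file; a minimal sketch with the shader text inlined by hand:

```cpp
#include <cstring>

// The '#' of "#version 330" is inert inside a raw string literal, so this
// compiles where the STR() trick fails. The remaining drawback is that the
// asset must be pasted (or generated) into the C++ source itself.
static const char frag_glsl[] = R"glsl(#version 330
uniform vec4 outColor;
void main() { outColor = vec4(1, 0, 0, 1); }
)glsl";
```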


Bjorn Reese

Feb 18, 2015, 3:40:57 PM
to std-pr...@isocpp.org
On 02/18/2015 09:23 PM, Matthew Woehlke wrote:

> What we *are* missing is a *portable* way to go from binary data on disk
> to binary data in code. xxd *is not* that way. IMO this is a

If you do not mind run-time conversion on some platforms, then something
like the buffer types of Boost.Endian [1] would be a good way to handle
portability of this data.

[1] https://boostorg.github.io/endian/buffers.html

sasho648

Feb 18, 2015, 4:25:04 PM
to std-pr...@isocpp.org
This could be possible if someone decides to work out my idea of meta-programming: 'constexpr' functions and 'variables' that are guaranteed to be evaluated at compile time. I think this is a very important concept, because many modern apps waste a lot (really a lot) of performance doing static initialization at run-time, and it's all left to the compiler to optimize out, which it can't always do. This thread proves it: today, if you want to load a resource file, you must always use external library calls (like 'fstream'), which will surely execute at run-time and thus waste performance. The preprocessor-based solution is weak in this example, as the preprocessor has no idea what a structure is. That's why editing language primitives using the language itself would be a lot more useful.

I say this because today's definition of 'constexpr' doesn't guarantee anything about compile-time execution except in certain contexts, and even then we can't load files at compile time or manipulate types.

I could work out the idea myself, but unfortunately I have more important things to do.

Thiago Macieira

Feb 19, 2015, 12:37:38 AM
to std-pr...@isocpp.org
On Wednesday 18 February 2015 18:55:23 Douglas Boffey wrote:
> If I understand the way this thread is going, it would seem to me
> better to extend #include to load from a pipe.

Pipes are not a cross-platform concept. There are common OSes that don't
support pipes.

In fact, processes aren't a cross-platform concept. It's highly unlikely that
you're running a compiler in an OS that runs everything single-process, but
hey the Java compiler is actually a library in the standard Java distribution,
so why not?

Thiago Macieira

Feb 19, 2015, 12:44:59 AM
to std-pr...@isocpp.org
On Wednesday 18 February 2015 11:02:41 Dale Weiler wrote:
> I don't care what the syntax is, but I can assure you that the language
> shouldn't interpret the data as anything but a raw sequence of bytes. There
> should be no endianness conversion,

What should the compiler do if the input narrow charset is not the same as the
execution narrow charset? Worse yet, what if the size of the byte is different?
(cross-compiling to a platform where CHAR_BIT is different from the host
platform where the file is stored)

Chris Gary

Feb 19, 2015, 6:35:06 AM
to std-pr...@isocpp.org

On Wednesday, February 18, 2015 at 10:44:59 PM UTC-7, Thiago Macieira wrote:
On Wednesday 18 February 2015 11:02:41 Dale Weiler wrote:
> I don't care what the syntax is, but I can assure you that the language
> shouldn't interpret the data as anything but a raw sequence of bytes. There
> should be no endianness conversion,

What should the compiler do if the input narrow charset is not the same as the
execution narrow charset? Worse yet, what if the size of the byte is different?
(cross-compiling to a platform where CHAR_BIT is different from the host
platform where the file is stored)

I think the idea is: Whatever it expands to is treated symbolically (like pasting text).

It could just expand to a sequence of char literals that get inserted into an AST as though they were parsed from source.

Dealing with differences in character sets is the programmer's responsibility; no different than (not) using ASCII above 127 in a source file.

Let the "text" version of the utility assume that it's consuming a blob of nothing but source characters, and surround them with single quotes.

The "binary" version would expand to a sequence of untyped hex literals (like 0xAA, 0xBB, 0xCC, 0xDD).
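As a sketch of that "binary" expansion, a four-byte file would come out as untyped hex literals usable in any aggregate initializer (file contents and names below are made up for illustration):

```cpp
// Hypothetical expansion of a binary include of a 4-byte file: the bytes
// appear as plain hex literals inside an ordinary initializer list.
const unsigned char blob[] = { 0xAA, 0xBB, 0xCC, 0xDD };
const unsigned int blob_len = sizeof(blob);
```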

Chris Gary

Feb 19, 2015, 6:38:50 AM
to std-pr...@isocpp.org
Clarifying further:

Either version could allow specification of an element type (e.g. 'char', 'short', 'int', etc...), and however large that is determines the granularity at which 'char'-sized elements from the source document are consumed.

In the "wider-than-char" case of the "text" version, I suppose the result could be a sequence of UCN literals.

Chris Gary

Feb 19, 2015, 10:40:55 AM
to std-pr...@isocpp.org

Have some examples!

const auto src = static_text<char>("shader.vert");

const auto wtxt = static_text<wchar_t>("dialogue.msg");

const auto datBlob = static_data<uint8_t>("blob.dat");

const auto datBigBlob = static_data<uint32_t>("big_blob.dat");

 

Matthew Woehlke

Feb 19, 2015, 10:55:11 AM
to std-pr...@isocpp.org
On 2015-02-19 00:44, Thiago Macieira wrote:
> On Wednesday 18 February 2015 11:02:41 Dale Weiler wrote:
>> I don't care what the syntax is, but I can assure you that the language
>> shouldn't interpret the data as anything but a raw sequence of bytes. There
>> should be no endianness conversion,
>
> What should the compiler do if the input narrow charset is not the same as the
> execution narrow charset?

Nothing; the user must perform an appropriate conversion at run-time,
same as if they had read the data from a separate file at run-time. The
low level objective is to transport *binary* data (which works for text
also in the trivial case).

We *could* support text conversion, but as previously stated, that's
'nice to have'; we should get the simple case sorted first. (Anyway,
considering how well text conversion is (not) supported in the standard
library anyway, I imagine this would be a bit of a stretch...)

That said... a mode to convert \r\n → \n might not be out of line.
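The conversion itself is trivial; a hand-rolled sketch of what such a text mode might do (this helper is purely illustrative, not part of any proposal):

```cpp
#include <cstddef>
#include <string>

// Collapse Windows \r\n line endings to \n: drop each \r that is
// immediately followed by \n, leave everything else untouched.
std::string normalize_newlines(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\r' && i + 1 < s.size() && s[i + 1] == '\n')
            continue;  // skip the \r; the \n is copied on the next iteration
        out += s[i];
    }
    return out;
}
```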

> Worse yet, what if the size of the byte is different?
> (cross-compiling to a platform where CHAR_BIT is different from the
> host platform where the file is stored)

For this case, I agree with Chris; the 'as if' rule would have to become
'in fact'. (And likely you must store the data in a sufficiently wide
data type.)

--
Matthew

Chris Gary

Feb 19, 2015, 11:11:10 AM
to std-pr...@isocpp.org
Better examples that permit use in initializer lists and better-defined handling of string literals:



// char-flavored dump of "shader.vert", intervening nulls included, stops at first EOF
const char src[] = {static_sequence<char>("shader.vert")};


// Just 0x00, 0x01, 0x02, 0x03, etc... Pass it to a constructor, too!
const uint8_t blob[] = {static_sequence<uint8_t>("blob.dat")};


// 'w','o','r','d','s',' ' ... Including nulls (if any).
const char str_a[] = {static_sequence<char>("words.txt")};


// Stops at first null, or at the first EOF then adds a null instead.
// "words words words" <- null terminated!
const char *str = static_string<char>("words.txt");


Chris Gary

Feb 19, 2015, 11:14:40 AM
to std-pr...@isocpp.org
Meant to add '...' after each 'static_sequence<>', but I think you get the idea:

static_sequence<T>( file-name ) creates a comma-delimited sequence of constants.

So, you could also do this:

template<char ...chars_>
struct charbag{};

fancy_template<
    some_type,
    charbag<static_sequence<char>("words.txt")...>
> boom{};



Matthew Fioravante

Feb 19, 2015, 1:45:29 PM
to std-pr...@isocpp.org
This kind of feature would be really nice to have. While some people may argue it's better not to embed data in the binary, sometimes it really just makes sense to do it that way. Use cases that come to mind include 3d applications with GPU shaders and embedded applications with firmware and other binary data blobs. This feature has been available with assemblers since ancient history; we should have it in C and C++ as well.

I think such a feature would make the most sense being implemented using the preprocessor. Then the implementation can just leverage the cpp include path to search for the files.

Possible syntax:

//Performs \r\n vs \n conversion
const char a[] =
#include_text "some_file.txt"
;

//Import the raw bytes as a literal
const uint8_t b[] =
#include_bin "some_file.bin"
;
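Under that syntax, the #include_text form would be equivalent to spelling the file contents out as an ordinary string literal; for a hypothetical some_file.txt containing two lines, the result would be:

```cpp
// Hand-written equivalent for a file containing "line one\nline two\n".
// Note the array picks up an implicit NUL terminator, as any string
// literal does: 18 bytes of text plus one terminator.
const char a[] = "line one\nline two\n";
static_assert(sizeof(a) == 19, "18 bytes of text plus the terminator");
```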

I'm not so sure endian conversion support is a great idea. Most binary blobs are not just a big array of same-size integral values but contain structure with many different integers, bytes, flags, text, etc. Doing proper endian conversion at that level of complexity requires external tools.


Thiago Macieira

Feb 19, 2015, 3:02:38 PM
to std-pr...@isocpp.org
On Thursday 19 February 2015 10:54:54 Matthew Woehlke wrote:
> Nothing; the user must perform an appropriate conversion at run-time,
> same as if they had read the data from a separate file at run-time. The
> low level objective is to transport *binary* data (which works for text
> also in the trivial case).

char array[] = "a";

Even if my source code is ASCII and thus the letter 'a' is 0x61, it does not
imply that the output will contain 0x61.

Try it by passing -fexec-charset=EBCDIC-US to GCC.

Pavel Kretov

Feb 19, 2015, 3:42:30 PM
to std-pr...@isocpp.org
> This kind of feature would be really nice to have. While some people may
> argue it's better not to embed data in the binary, sometimes it really just
> makes sense to do it that way. Use cases that come to mind include 3D
> applications with GPU shaders and embedded applications with firmware and
> other binary data blobs. This feature has been available in assemblers
> since ancient history; we should have it in C and C++ as well.

I think it is a problem for the build system to prepare such in-code data
blobs using some tool (Qt's qmake and qrc are a prominent example of
such). Anyway, you would have to compile your shader code or firmware
using an external compiler, so you still need an advanced build system of
some kind in order to get a fully automatic build.

> I think such a feature would make the most sense being implemented using
> the processor. Then the implementation can just leverage the cpp include
> path to search for the files.

Of course, reusing an existing concept is the easiest option, but it
doesn't sound sane to put binary blobs into the include path.

> Possible syntax:
>
> //Performs \r\n vs \n conversion
> const char a[] =
> #include_text "some_file.txt"
> ;

What are the exact rules of the conversion on different platforms, and why
is it important to have such a conversion built-in? Moreover, you should
also provide #include_wtext for getting text into a wchar_t[], but what
encoding should the included file have? You'll have to put the whole ICU
library into the C preprocessor to deal with all those different encodings
and Unicode-related features (like BOMs, for example).

> //Import the raw bytes as a literal
> const uint8_t b[] =
> #include_bin "some_file.bin"
> ;

Apart from encoding issues, splitting a statement like that looks a bit ugly.

const uint8_t b[] = __INCLUDE_FILE_CONTENTS("some_file.bin");

The above is a bit nicer, but still looks like an ad-hoc solution for a
not-so-common problem.

To sum up, I think that plain binary inclusion is the only one option
which should be considered if this proposal is ever going to be accepted.

——— Pavel Kretov.

Matthew Woehlke

unread,
Feb 19, 2015, 3:59:38 PM2/19/15
to std-pr...@isocpp.org
On 2015-02-19 15:42, Pavel Kretov wrote:
> I think it is a problem for the build system to prepare such in-code data
> blobs using some tool (Qt's qmake and qrc are a prominent example of
> such). Anyway, you would have to compile your shader code or firmware
> using an external compiler, so you still need an advanced build system of
> some kind in order to get a fully automatic build.

You mean like... https://www.opengl.org/wiki/GLAPI/glShaderSource ?

Shader compilation (at least every time *I've* ever dealt with GLSL)
happens *at run time*. Ergo, your statement is false, unless you define
"external compiler" as "the OpenGL library (that you are using anyway)".

Image resources are similar; decoding happens *at run time* from a raw
blob that is exactly the original image file.

> Of course, reusing an existing concept is the easiest option, but it
> doesn't sound sane to put binary blobs into the include path.

Why not? :-) I'd happily do so if I had this feature.

> What are the exact rules of the conversion on different platforms, and why
> is it important to have such a conversion built-in?

Er... no. The rule is "don't convert". (Once we have that case, we can
think about others.)

There may or may not be a conversion of file contents to something that
the compiler can work with, but that's specific to the compiler, and
presumably the compiler knows how to do it.

> Apart from encoding issues, splitting a statement like that looks a bit ugly.
>
> const uint8_t b[] = __INCLUDE_FILE_CONTENTS("some_file.bin");

Agreed... though I'm less convinced than I was originally that this
should even be in the domain of the preprocessor. I do agree however that it
should be __pragma-like rather than #pragma-like (unless it supports both).

> The above is a bit nicer, but still looks like an ad-hoc solution for a
> not-so-common problem.

I beg to differ; many, if not most, GUI applications I have written or
worked on have resources.

> To sum up, I think that plain binary inclusion is the only one option
> which should be considered if this proposal is ever going to be accepted.

I would start there, yes. If that's accepted, we can always revisit
doing fancier things.

--
Matthew

Ville Voutilainen

unread,
Feb 19, 2015, 4:03:49 PM2/19/15
to std-pr...@isocpp.org
On 19 February 2015 at 22:59, Matthew Woehlke
<mw_t...@users.sourceforge.net> wrote:
> On 2015-02-19 15:42, Pavel Kretov wrote:
>> I think it is a problem for the build system to prepare such in-code data
>> blobs using some tool (Qt's qmake and qrc are a prominent example of
>> such). Anyway, you would have to compile your shader code or firmware
>> using an external compiler, so you still need an advanced build system of
>> some kind in order to get a fully automatic build.
>
> You mean like... https://www.opengl.org/wiki/GLAPI/glShaderSource ?
>
> Shader compilation (at least every time *I've* ever dealt with GLSL)
> happens *at run time*. Ergo, your statement is false, unless you define

You can precompile shaders so that they don't need to be compiled at run time,
so the statement is far from false. The mechanisms for doing that, and using
such precompiled shaders, may be implementation-specific instead of officially
sanctioned by khronos, but that doesn't mean it can't be done.

Matthew Woehlke

unread,
Feb 19, 2015, 4:50:57 PM2/19/15
to std-pr...@isocpp.org
On 2015-02-19 16:03, Ville Voutilainen wrote:
> On 19 February 2015 at 22:59, Matthew Woehlke wrote:
>> On 2015-02-19 15:42, Pavel Kretov wrote:
>>> I think it is a problem for the build system to prepare such in-code data
>>> blobs using some tool (Qt's qmake and qrc are a prominent example of
>>> such). Anyway, you would have to compile your shader code or firmware
>>> using an external compiler, so you still need an advanced build system of
>>> some kind in order to get a fully automatic build.
>>
>> You mean like... https://www.opengl.org/wiki/GLAPI/glShaderSource ?
>>
>> Shader compilation (at least every time *I've* ever dealt with GLSL)
>> happens *at run time*. Ergo, your statement is false [...]
>
> You can precompile shaders so that they don't need to be compiled at run time,

The key word there is "can", as in, not "must".

> so the statement is far from false.

Please pardon the pedantry, but...

I took Pavel's comments as claiming that one of the original rationales
given for the feature never actually occurs. In fact, it *does* occur,
and I personally have "experienced" it.

This feature plus run-time compilation would give us an end-to-end
portable way to load GLSL shaders without nasty means of embedding the
source in C++ (i.e. as string literals). Pavel stated that it would not
because an external GLSL compiler is *required* ("*have to* compile",
"*need* an advanced build system" - emphasis added). Such statements, as
I can attest from personal experience, *are* in fact false.

I would find the proposed feature useful. I would, in fact, very likely
find it useful for the specific case that Pavel claims does not occur.
Since such claims are naturally directed at undermining the chances of
the feature being accepted, and since I would like and use the feature,
I feel it's important to refute such claims.

Related: Neither pre-compilation nor other resource systems are
portable. (Which brings up an interesting point; if you insist on using
non-portable methods to compile shaders, it's perhaps not so surprising
that you'd have the attitude "what's another non-portable method to
marshal the object code from a build system artifact into the running
application?".)

--
Matthew

Ville Voutilainen

unread,
Feb 19, 2015, 4:53:49 PM2/19/15
to std-pr...@isocpp.org
On 19 February 2015 at 23:50, Matthew Woehlke
<mw_t...@users.sourceforge.net> wrote:
> This feature plus run-time compilation would give us an end-to-end
> portable way to load GLSL shaders without nasty means of embedding the
> source in C++ (i.e. as string literals). Pavel stated that it would not
> because an external GLSL compiler is *required* ("*have to* compile",
> "*need* an advanced build system" - emphasis added). Such statements, as
> I can attest from personal experience, *are* in fact false.

Correct, pardon the confusion.

> I would find the proposed feature useful. I would, in fact, very likely
> find it useful for the specific case that Pavel claims does not occur.

Ack. :)

> Related: Neither pre-compilation nor other resource systems are
> portable. (Which brings up an interesting point; if you insist on using
> non-portable methods to compile shaders, it's perhaps not so surprising
> that you'd have the attitude "what's another non-portable method to
> marshal the object code from a build system artifact into the running
> application?".)


Maybe. I personally prefer portable solutions over non-portable ones. :)

Matthew Woehlke

unread,
Feb 19, 2015, 5:03:20 PM2/19/15
to std-pr...@isocpp.org
On 2015-02-19 16:53, Ville Voutilainen wrote:
> On 19 February 2015 at 23:50, Matthew Woehlke wrote:
>> [pedantry]
>
> Correct, pardon the confusion.

Happily pardoned :-).

>> Related: Neither pre-compilation nor other resource systems are
>> portable. (Which brings up an interesting point; if you insist on using
>> non-portable methods to compile shaders, it's perhaps not so surprising
>> that you'd have the attitude "what's another non-portable method to
>> marshal the object code from a build system artifact into the running
>> application?".)
>
> Maybe. I personally prefer portable solutions over non-portable ones. :)

Likewise. I wasn't applying the above to us (apologies if it was implied
otherwise), more "realizing out loud" if you will that it may help
explain why some (other) people don't seem to grasp why this would be
useful.

--
Matthew

Matthew Fioravante

unread,
Feb 19, 2015, 5:42:20 PM2/19/15
to std-pr...@isocpp.org


On Thursday, February 19, 2015 at 3:42:30 PM UTC-5, Pavel Kretov wrote:
> This kind of feature would be really nice to have. While some people may
> argue it's better not to embed data in the binary, sometimes it really just
> makes sense to do it that way. Use cases that come to mind include 3D
> applications with GPU shaders and embedded applications with firmware and
> other binary data blobs. This feature has been available in assemblers
> since ancient history; we should have it in C and C++ as well.

I think it is a problem for the build system to prepare such in-code data
blobs using some tool (Qt's qmake and qrc are a prominent example of
such). Anyway, you would have to compile your shader code or firmware
using an external compiler, so you still need an advanced build system of
some kind in order to get a fully automatic build.

Most of the time you compile shaders at runtime because you don't know what kind of GPU hardware your users will have. For situations where you know the hardware, such as console development, pre-compiling the shaders can make sense, but at that stage it may also make sense to just binary-include those pre-compiled shader files into your application instead of writing filesystem routines to load them.


> I think such a feature would make the most sense being implemented using
> the processor. Then the implementation can just leverage the cpp include
> path to search for the files.

Of course, reusing an existing concept is the easiest option, but it
doesn't sound sane to put binary blobs into the include path.

It seems like the best option to me. If we do not leverage the include path for this, then we have to invent yet another path for binary imports. I don't see any problem with adding the path for your binary imports to the include path in your build system. Anyway, that's what we are doing: including the contents of a file into our translation unit as a binary string literal.
 


Apart from encoding issues, splitting a statement like that looks a bit ugly.

    const uint8_t b[] = __INCLUDE_FILE_CONTENTS("some_file.bin");

The above is a bit nicer, but still looks like an ad-hoc solution for a
not-so-common problem.

That style would be fine too, and probably easier to use.

I still think this is a task for the preprocessor and not the C++ compiler. The preprocessor already has support for #include and include paths. It can simply expand the macro into a string literal and then the C++ compiler is none the wiser. Imported binary blobs can go anywhere you'd directly write a string literal. Technically, once we get to the C++ compiler stage there is no include path anymore because all includes are resolved by the preprocessor. Finally, doing the work in the preprocessor makes the feature automatically available for both C and C++.

Does anyone have a good reason why this feature does not belong in the preprocessor?

One reason I think of is performance. If the binary blob is somewhat large, its textual representation in the translation unit as an array initializer and/or string literal will be larger than the binary blob itself by a factor of 4 or more. Converting the blob to C code text and then back to its original binary form after compilation is wasted work.

Implementations might be able to work around this issue if we allow the output of the preprocessor to be an implementation-specific string literal syntax (e.g. some macro wrapping a native-endian 8-byte size count, followed by the raw binary data right in the generated translation unit), avoiding the round-trip conversion to text and back.


To sum up, I think that plain binary inclusion is the only one option
which should be considered if this proposal is ever going to be accepted.

That's fair enough, just being able to import binary blobs with no extra logic would be enough for me.

One major issue I can think of regarding differentiating text from binary data is null termination. When we import a text blob, we probably would like to have a null terminator added at the end so that the text can be used with legacy C APIs. GPU shaders in OpenGL would require this.

For binary blobs, we probably do not want a null terminator included. That way the array we initialize will have the exact same size as the blob and we can load it into memory, do checksums, etc. without being plagued by sizeof()-1 bugs everywhere.

If only one binary import method is to be proposed, then it must always append a null terminator, giving us C compatibility for text and only sizeof()-1 headaches for binary.

Matthew Woehlke

unread,
Feb 19, 2015, 7:04:53 PM2/19/15
to std-pr...@isocpp.org
On 2015-02-19 17:42, Matthew Fioravante wrote:
> Does anyone have a good reason why this feature does not belong in the
> preprocessor?

You more or less said it; performance. (Also, potentially, ease of
implementation.) Depending on the compiler, it may be much easier to
simply copy the file contents directly from the input file to the
process that's writing the output object file.

As previously stated though I'd be inclined to not legislate this, but
rather specify that the feature behaves "as if" done by the preprocessor
and leave it to the compiler vendors whether or not that's how they
*actually* want to implement it.

> One major issue I can think of regarding differentiating text from binary
> data is null termination. When we import a text blob, we probably
> would like to have a null terminator added at the end so that the text can
> be used with legacy C APIs. GPU shaders in OpenGL would require this.

Hmm... good point. I was sort of assuming the presence of a null
terminator, but you're right that for "pure" binary data (say, image
files) this could be undesirable. That being the case, this may be a
good way to handle line endings also.

IOW, one 'mode' does line ending translation and null terminates, the
other does neither.

> If only one binary import method is to be proposed, then it must always
> append a null terminator giving us C compatibility for text and only
> sizeof()-1 headaches for binary.

Agreed.

--
Matthew

Matthew Fioravante

unread,
Feb 19, 2015, 9:54:07 PM2/19/15
to std-pr...@isocpp.org, mw_t...@users.sourceforge.net


On Thursday, February 19, 2015 at 7:04:53 PM UTC-5, Matthew Woehlke wrote:
On 2015-02-19 17:42, Matthew Fioravante wrote:
> Does anyone have a good reason why this feature does not belong in the
> preprocessor?

You more or less said it; performance. (Also, potentially, ease of
implementation.) Depending on the compiler, it may be much easier to
simply copy the file contents directly from the input file to the
process that's writing the output object file.

As previously stated though I'd be inclined to not legislate this, but
rather specify that the feature behaves "as if" done by the preprocessor
and leave it to the compiler vendors whether or not that's how they
*actually* want to implement it.

Maybe the requirements should be more specific. What happens if I just preprocess a .cpp file with cpp or gcc -E? Should I expect the output file to have expanded the binary file into textual C array syntax? Implementation defined?

Even if the feature specification is technically supposed to be supported by the preprocessor, if you just compile a .cpp file into an object file using your compiler, the implementation is free to optimize the binary inclusion method to bypass the textual transformation and do a direct bit copy from the source file to the resulting object file.

> One major issue I can think of regarding differentiating text from binary
> data is null termination. When we import a text blob, we probably
> would like to have a null terminator added at the end so that the text can
> be used with legacy C APIs. GPU shaders in OpenGL would require this.

Hmm... good point. I was sort of assuming the presence of a null
terminator, but you're right that for "pure" binary data (say, image
files) this could be undesirable.

One problem with null termination is that we need to be told the intended character type of the data (char, char16_t, char32_t, wchar_t) in order to know how many bytes to reserve for the null terminator and the type of the resulting string literal expression generated by the macro.

 
That being the case, this may be a
good way to handle line endings also.

What if the user wants null termination but doesn't want line ending processing?

Maybe line endings, endian swapping, casting to array<T, sizeof(literal) / sizeof(T)>, string_view, string_literal<sizeof(literal)>, etc. can all be done in the library, in particular with constexpr functions manipulating string_literal<N> objects:

template <typename CharT, size_t N>
constexpr auto remove(string_literal<N, CharT> lit, CharT c) {
  auto nc = count(lit.begin(), lit.end(), c);
  string_literal<N - nc, CharT> ret;
  copy_if(lit.begin(), lit.end(), ret.begin(), [c](auto x) { return x != c; });
  return ret;
}

auto norm_text = remove(__INCLUDE_TEXT(char, "some_file.txt"), '\r');



 

IOW, one 'mode' does line ending translation and null terminates, the
other does neither. 

The text routine needs type information to correctly allocate the null.


constexpr auto data = __INCLUDE_BIN("some_file.bin");
constexpr auto text = __INCLUDE_TEXT(char,"some_file.txt");

//decltype(text) == const char [/*sizeof file + sizeof(char)*/]
//decltype(data) == const char [/*sizeof file*/] //or maybe unsigned char

The binary routine could also use type information, which can be useful if you want signed or unsigned bytes:

constexpr auto data = __INCLUDE_BIN(uint8_t,"some_file.bin");

A library extension could take this array and cast it to an array of type T for some basic type like a 3D vector of floats. It could also perform endian conversion. There are a lot of possibilities. You could even write constexpr functions to widen a packed struct you loaded from the file into an array of T with native alignment, all at compile time, with error checking on the file size. All of these possibilities come into play if we just have an __INCLUDE_BIN() macro.

 __INCLUDE_TEXT should expand to a string literal.

// file.txt contains the text "file"

__INCLUDE_TEXT("file.txt");
"file"; // <- Produces a string literal

The reason for this is that now we can paste string literals together, which comes for free from the behavior of string literals.

constexpr auto document = "Common Header\n"
    __INCLUDE_TEXT("section1.txt") "\n"
    __INCLUDE_TEXT("section2.txt") "\n"
    "Common Footer\n";

The preprocessor would load the 2 files and replace the macros with the contents of each file. The C++ compiler is left with 6 string literals which it already knows how to concatenate together.

The only compatible way to implement __INCLUDE_BIN() is to replace the macro with an array initialization.

__INCLUDE_BIN("file.bin"); 
{ 'f', 'i', 'l', 'e' }; //<-Behaves as if file contents substituted using array syntax

This would mean you can't paste together multiple __INCLUDE_BIN() expressions like you can with __INCLUDE_TEXT(). I think the sensible way to allow for that would be to invent a new literal prefix or suffix to generate string literals without a null terminator.

auto x = "\x00\x0A"b "\x0B\x01"b;
//decltype(x) == char[4];
//x == { 0, 0x0A, 0x0B, 1 };

Now __INCLUDE_BIN() could expand to a b suffix literal and we could again leverage string literal concatenation to paste together files and inline binary text.

This proposed feature has a lot of utility for operating systems, embedded programming, and device drivers. For example, a kernel module could use this feature to easily embed a firmware or microcode code image.

Game companies could use this to optimize loading time for crucial game assets.  Being able to process the contents of the files at compile time allows for compile time parsing of the file into C++ data structures.

One could put all of their compile time resources into a separate shared library, minimizing the compile time overhead of using this tool as only the library needs to be rebuilt.

Matthew Fioravante

unread,
Feb 19, 2015, 10:05:11 PM2/19/15
to std-pr...@isocpp.org, mw_t...@users.sourceforge.net
Another alternative could be to just use string literal prefixes and suffixes.

__INCLUDE_FILE(X"file.txt"Y); // file.txt contains the text "file"
X"\x66\x69\x6C\x65"Y; // Prefix and suffix applied to the resulting string literal

With the above-mentioned b literal suffix for a non-null-terminated literal, we don't need a separate __INCLUDE_BIN:

auto fw = __INCLUDE_FILE("firmware.bin"b);
auto fw = "\x01\x00..."b; //<-Macro expands to this



Dale Weiler

unread,
Feb 20, 2015, 2:14:08 AM2/20/15
to std-pr...@isocpp.org
The unfortunate issue with using the preprocessor to construct a string literal is that the standard guarantees only a minimum supported length for string literals, and literals that are too long do in fact trigger diagnostics in literally
every major in-production compiler I've ever used (which is like 20).

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

Chris Gary

unread,
Feb 20, 2015, 8:12:11 AM2/20/15
to std-pr...@isocpp.org, tdh...@gmail.com
Is there anything against a matlab-like "load"?

const char *str = load_static_string<char>("filename.txt");

const int ints[] = {load_static_sequence<int>("ints.dat")...};

Any sanely-written compiler should have FS gunk opaquely isolated to a separate subsystem anyway (basic separation of concerns; this isn't hard stuff), so this is doable from my point of view.

IMO, it's easier to just inject symbols: no additional parsing needed -- faster and type-safe, too.

Lookup rules? Let the tools decide what a "file" is (#include essentially leaves this undefined, anyway). These are just names.

Endianness? Again, let the tools decide. Source files are not makefiles.

No need to add more slow text-based tools to the preprocessor.

Pavel Kretov

unread,
Feb 20, 2015, 9:03:18 AM2/20/15
to std-pr...@isocpp.org
> You mean like... https://www.opengl.org/wiki/GLAPI/glShaderSource ?
>
> Shader compilation (at least every time *I've* ever dealt with GLSL)
> happens *at run time*. Ergo, your statement is false, unless you define
> "external compiler" as "the OpenGL library (that you are using anyway)".

Unfortunately, I have no experience with GLSL (my fault, I think I
should), but Microsoft's DirectX shaders require building by an external
compiler [1] (specifically, fxc), at least they did at the time I was
working at a gamedev company.

[1]:
https://msdn.microsoft.com/ru-ru/library/windows/desktop/bb509633%28v=vs.85%29.aspx

> Image resources are similar; decoding happens *at run time* from a raw
> blob that is exactly the original image file.

If you read my answer again, you'll find out I was talking about
*shaders* and *firmware* blobs. As for images, my statement is obviously
not true. I hope you're not going to blob-include XPM image files? :)

> Er... no. The rule is "don't convert". (Once we have that case, we can
> think about others.)

I think the rule should be "binary-only blobs". Text blob-inclusion may
be way too unportable. (Maybe so is binary, if there is still any
seven-bits-in-a-byte platform currently in use.)

> There may or may not be a conversion of file contents to something that
> the compiler can work with, but that's specific to the compiler, and
> presumably the compiler knows how to do it.

I'd prefer not to have conversion at all than to rely on
compiler-specific behavior.

> Agreed... though I'm less convinced than I was originally that this
> should even be in the domain of the preprocessor. I do agree however that it
> should be __pragma-like rather than #pragma-like (unless it supports both).

Or maybe a part of the template language. I hope C++ will have simple,
powerful and readable compile-time programming one day, which will allow
one to load a blob and arbitrarily process it at compile time with just a
few lines of code. But that would be a different language, ((C++)++)++,
I guess.

> I beg to differ; many, if not most, GUI applications I have written or
> worked on have resources.

Most GUI frameworks offer a resource-building tool, I think, and they do
resource management much better than you would be able to do by hand
with such pragmas. How would you implement a similar feature:

QFile file(QString(":/translation/%0.qm").arg(language));

which extracts from the built-in resources a file whose name is not known
at compile time?

——— Pavel Kretov.

Ville Voutilainen

unread,
Feb 20, 2015, 9:32:01 AM2/20/15
to std-pr...@isocpp.org
On 20 February 2015 at 15:54, Pavel Kretov <firegu...@gmail.com> wrote:
> If you read my answer again, you'll find out I was talking about
> *shaders* and *firmware* blobs. As for images, my statement is obviously
> not true. I hope you're not going to blob-include XPM image files? :)

Uhh... why not? :)

Pavel Kretov

unread,
Feb 20, 2015, 9:38:22 AM2/20/15
to std-pr...@isocpp.org
On 02/20/2015 06:05 AM, Matthew Fioravante wrote:
> Another alternative could be to just use string literal prefixes and
> suffixes.
>
> __INCLUDE_FILE(X"file.txt"Y); //file.txt contains the text "file"
> X"\x66\x69\x6C\x65"Y; //Prefix and suffix applied to the resulting string
> literal

Or another option:

char text_data[] = { __INCLUDE_FILE("file1.txt"), 0 };
uint8_t bin_data[] = {
0x01, 0x02, 0x03,
__INCLUDE_FILE("file1.bin"),
__INCLUDE_FILE("file2.bin"),
};

where __INCLUDE_FILE expands to an array initialization list *without*
brackets (and without a trailing comma). This way both concatenation and
null-termination can be handled with a single inclusion macro.

Moreover, we cannot use string literals for binary inclusion as
their maximum length is limited by the standard (AFAIR).

——— Pavel Kretov.

Matthew Woehlke

unread,
Feb 20, 2015, 11:03:11 AM2/20/15
to std-pr...@isocpp.org
On 2015-02-20 08:54, Pavel Kretov wrote:
>> You mean like... https://www.opengl.org/wiki/GLAPI/glShaderSource ?
>>
>> Shader compilation (at least every time *I've* ever dealt with GLSL)
>> happens *at run time*. Ergo, your statement is false, unless you define
>> "external compiler" as "the OpenGL library (that you are using anyway)".
>
> Unfortunately, I have no experience with GLSL (my fault, I think I
> should),

Bleh, yes... portability! :-D OpenGL is portable, DirectX... much less
so. (Well, except to XBox I guess.)

> I hope you're not going to blob-include XPM image files? :)

Me too :-). (Usually .png's in my experience. Maybe .jpg's. The most
common embedded resources for projects I've worked on are interface
icons and sometimes other interface graphics e.g. splash screens, logos
for 'about' dialogs... Usually these are PNG, both because lossy
compression for something that's as small as 16x16 is bad, and because
the images frequently have non-trivial alpha channels.)

(@Ville, because they are a *textual* format that is much larger than a
directly-usable, uncompressed in-memory format :-). I mean, you *could*,
but eew...)

>> There may or may not be a conversion of file contents to something that
>> the compiler can work with, but that's specific to the compiler, and
>> presumably the compiler knows how to do it.
>
> I'd prefer not to have conversion at all than to rely on
> compiler-specific behavior.

What I meant there was more that the input binary blob should result in
an identical binary blob in the produced object file... which *may*
require the compiler to do some transformations for the sake of its
internal representation, but that should be transparent to the user.
(And that such translation would naturally be compiler-specific, since
compilers' internal data structures are of course not standardized.)

>> I beg to differ; many, if not most, GUI applications I have written or
>> worked on have resources.
>
> Most GUI frameworks offer a resource-building tool, I think, and they do
> resource management much better than you would be able to do by hand
> with such pragmas. How would you implement a similar feature:
>
> QFile file(QString(":/translation/%0.qm").arg(language));
>
> which extracts from the built-in resources a file whose name is not known
> at compile time?

...much as it is implemented right now; have a 'resources.cpp' that uses
__load (or whatever) to load the resource into a char[], and then put a
reference to that into another global structure which contains the
resource names as strings. You would probably still have a code
generator for this, but it would be *much* simpler... and you *could*
readily write a similar resource loader source file by hand.

(Note: Qt's resources actually end up in one VERY LARGE blob. For Qt to
make use of this feature, it would probably be useful to change that to
a proper struct, especially as the current mechanism for encoding the
size would not work trivially. Even with rcc, however, this could be
useful; rcc would convert the .qrc to a resource .cpp with appropriate
load statements. Compiling the .cpp would then depend on the contents of
the resource files, while generating the .cpp would depend only on the
.qrc. This would be an improvement over things currently where
generating the .cpp depends on the resource files themselves. For that
matter, it would become much less important to generate the .cpp from
the .qrc as part of the build... you could do it by hand... conceivably
you could even just hand-write the thing.)

--
Matthew

Matthew Woehlke

unread,
Feb 20, 2015, 11:22:01 AM2/20/15
to std-pr...@isocpp.org
On 2015-02-19 21:54, Matthew Fioravante wrote:
> On Thursday, February 19, 2015 at 7:04:53 PM UTC-5, Matthew Woehlke wrote:
>> On 2015-02-19 17:42, Matthew Fioravante wrote:
>>> One major issue I can think of regarding differentiating text
>>> from binary data is null termination. When we import a text
>>> blob, we probably would like to have a null terminator added at
>>> the end so that the text can be used with legacy C APIs. GPU
>>> shaders in OpenGL would require this.
>>
>> Hmm... good point. I was sort of assuming the presence of a null
>> terminator, but you're right that for "pure" binary data (say, image
>> files) this could be undesirable.
>
> One problem with null termination is that we need to be told the intended
> character type of the data (char, char16_t, char32_t, wchar_t) in order to
> know how many bytes to reserve for the null terminator and the type of the
> resulting string literal expression generated by the macro.

Doesn't this only apply to string literals? If 'pasting' as a list of
char literals ("'h','e','l','l','o',0"), the terminator is just '0', and
the compiler will expand that to fill the element, same as every other
element of the array.
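Hand-expanding what such a pasted list might look like shows the widening in action (the token sequence is a stand-in for a hypothetical __INCLUDE_LIST expansion):

```cpp
#include <cstdint>

// Ordinary char literals: the same token sequence initializes arrays of
// any character type, and the trailing 0 widens along with the rest.
const char narrow[] = {'h', 'e', 'l', 'l', 'o', 0};
const char32_t wide[] = {'h', 'e', 'l', 'l', 'o', 0};

static_assert(sizeof(narrow) == 6, "one byte per element");
static_assert(sizeof(wide) == 6 * sizeof(char32_t), "elements widened");
```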

For string literals, it's probably better for the width prefix to be
part of the expansion... although considering this opens the door to
endian conversion questions, my inclination is to only support char
const* literals, at least as a first pass.

>> That being the case, this may be a
>> good way to handle line endings also.
>
> What if the user wants null termination but doesn't want line ending
> processing?
>
> Maybe line endings, endian swapping, [...]

That'd be fine with me. I could also live without line ending
conversion, just saying that if you want to include text data, the
resource file must already have UNIX line endings. (It helps that I hate
Windows :-) and have strictly limited sympathy for its assorted
obnoxious idiosyncrasies.)

> The text routine needs type information to correctly allocate the null.
>
> constexpr auto data = __INCLUDE_BIN("some_file.bin");

...gives a decltype({0xff,0}), i.e. *a std::initializer_list*.

Now... at this point I'm strongly inclined to this, instead:

// std::initializer_list, not terminated
constexpr auto list = {__INCLUDE_LIST("some_file.bin")};
// char[], not terminated
constexpr char data[] = {__INCLUDE_LIST("some_file.bin")};
// char[], terminated :-)
constexpr char str[] = {__INCLUDE_LIST("some_file.bin"), 0};

(This brings up an interesting point; should we state that
__INCLUDE_LIST of an empty file followed by a ',' will remove the ',' a
la MSVC's variadic macros? It seems desirable... or we could just not
support this case where the file is empty.)
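Those three declarations can be simulated today with a mock macro standing in for what __INCLUDE_LIST would paste, which makes the terminated/unterminated distinction checkable:

```cpp
#include <cstddef>

// Mock of __INCLUDE_LIST("hi.bin") for a two-byte file {0x48, 0x69}.
#define FAKE_INCLUDE_LIST 0x48, 0x69

constexpr char data[] = {FAKE_INCLUDE_LIST};     // not terminated: 2 elements
constexpr char str[]  = {FAKE_INCLUDE_LIST, 0};  // terminated: 3 elements

static_assert(sizeof(data) == 2, "raw bytes only");
static_assert(sizeof(str) == 3, "caller appended the terminator");
```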

> constexpr auto text = __INCLUDE_TEXT(char,"some_file.txt");

...gives a decltype("string literal").

> The binary routine could also use type information which can be useful if
> you want signed or unsigned bytes:
>
> constexpr auto data = __INCLUDE_BIN(uint8_t,"some_file.bin");

No; it should give an initializer_list (or better, token list, as
explained above); the LHS type determines the concrete type. Yes, this
means you can't assign it to 'auto' (unless you *want* the initializer
list), but you *do* want the initializer list to be able to pass it
directly to class constructors.

> __INCLUDE_TEXT should expand to a string literal.
>
> The reason for this is that now we can paste string literals together,
> which comes for free from the behavior of string literals.

Ooh, good point! Bonus! :-)

> The only compatible way to implement __INCLUDE_BIN() is to replace the
> macro with an array initialization.
>
> This would mean you can't paste together multiple __INCLUDE_BIN()
> expressions like you can with __INCLUDE_TEXT().

This would be another reason to have __INCLUDE_LIST instead.

--
Matthew

Matthew Woehlke

unread,
Feb 20, 2015, 11:30:07 AM2/20/15
to std-pr...@isocpp.org
On 2015-02-20 09:38, Pavel Kretov wrote:
> Moreover, we cannot use string literals for binary inclusion as
> their maximum length is limited by the standard (AFAIR).

Really? (Citation needed?)

In the bad old days of early C compilers, some compilers had a limit on
the length of string literals that were accepted. Newer compilers
relaxed this.

I would expect that the standard sets a *lower* bound on the maximum
size of a string literal. I can't think why it would set an *upper*
bound (other than e.g. numeric_limits<size_t>::max()... and you *really*
shouldn't be embedding anything larger than that, or even a fraction
that large for that matter).

--
Matthew

Ville Voutilainen

unread,
Feb 20, 2015, 11:36:17 AM2/20/15
to std-pr...@isocpp.org
On 20 February 2015 at 18:26, Matthew Woehlke
A citation thou desire, a citation ye shall receive:
[implimits]/2: "The bracketed number following each quantity is recommended
as the minimum for that quantity. However, these quantities are only guidelines
and do not determine compliance."

In other words, the standard specifies neither a lower nor an upper
limit; it gives
a _recommendation_ for a "lower bound", and for string literals that
recommendation is
"Characters in a string literal (after concatenation) [65 536]."

Matthew Woehlke

unread,
Feb 20, 2015, 11:42:15 AM2/20/15
to std-pr...@isocpp.org
On 2015-02-20 11:36, Ville Voutilainen wrote:
Thanks, Ville! That matches my recollection from Ye Olde Dayes of C and
is what I would expect; there is *not* a legislated maximum length of a
string literal.

It may be that for this feature, some compilers would be... encouraged
to support longer literals than they do currently. Critically, however,
the standard does *not* mandate that string literals longer than some
length must be rejected. So I don't think there is a problem here.

--
Matthew

Matthew Fioravante

unread,
Feb 20, 2015, 12:12:56 PM2/20/15
to std-pr...@isocpp.org, mw_t...@users.sourceforge.net


On Friday, February 20, 2015 at 11:22:01 AM UTC-5, Matthew Woehlke wrote:
On 2015-02-19 21:54, Matthew Fioravante wrote:
> On Thursday, February 19, 2015 at 7:04:53 PM UTC-5, Matthew Woehlke wrote:
>> On 2015-02-19 17:42, Matthew Fioravante wrote:
>>> One major issue I can think of regarding differentiating text
>>> from binary data which is null termination. When we import a text
>>> blob, we probably would like to have a null terminator added at
>>> the end so that the text can be used with legacy C api's. GPU
>>> shaders in OpenGL would require this.
>>
>> Hmm... good point. I was sort of assuming the presence of a null
>> terminator, but you're right that for "pure" binary data (say, image
>> files) this could be undesirable.
>
> One problem with null termination is that we need to be told the intended
> character type of the data (char, char16_t, char32_t, wchar_t) in order to
> know how many bytes to reserve for the null terminator and the type of the
> resulting string literal expression generated by the macro.

Doesn't this only apply to string literals? If 'pasting' as a list of
char literals ("'h','e','l','l','o',0"), the terminator is just '0', and
the compiler will expand that to fill the element, same as every other
element of the array.

This is not possible to use directly with larger N-byte character types without some logic to construct each character from N initializers. The library could be used to coerce the byte stream into a stream of char32_t, with endian conversion, byte-order-mark search, etc.

The big benefit of leveraging string literals is that you can concatenate string literals in the source code with included text data from a file easily, using an engine that already exists in the standard.
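A hand-expanded stand-in shows what that buys us; pretend the middle literal is what __INCLUDE_TEXT(char, "body.txt") would expand to:

```cpp
// Phase-6 concatenation glues the three literals into one contiguous,
// null-terminated array; no new machinery is required.
constexpr char page[] =
    "header\n"
    "file contents here\n"   // stand-in for the included file's text
    "footer\n";

static_assert(sizeof(page) == sizeof("header\nfile contents here\nfooter\n"),
              "one literal, one terminator");
```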

 

For string literals, it's probably better for the width prefix to be
part of the expansion... although considering this opens the door to
endian conversion questions, my inclination is to only support char
const* literals, at least as a first pass.

>> That being the case, this may be a
>> good way to handle line endings also.
>
> What if the user wants null termination but doesn't want line ending
> processing?
>
> Maybe line endings, endian swapping, [...]

That'd be fine with me. I could also live without line ending
conversion, just saying that if you want to include text data, the
resource file must already have UNIX line endings. (It helps that I hate
Windows :-) and have strictly limited sympathy for its assorted
obnoxious idiosyncrasies.)

Compatibility is a good reason for chopping line endings, but maybe this processing is still better done in the library.
 

> The text routine needs type information to correctly allocate the null.
>
> constexpr auto data = __INCLUDE_BIN("some_file.bin");

...gives a decltype({0xff,0}), i.e. *a std::initializer_list*.

Now... at this point I'm strongly inclined to this, instead:

  // std::initializer_list, not terminated
  constexpr auto list = {__INCLUDE_LIST("some_file.bin")};
  // char[], not terminated
  constexpr char data[] = {__INCLUDE_LIST("some_file.bin")};
  // char[], terminated :-)
  constexpr char str[] = {__INCLUDE_LIST("some_file.bin"), 0};

That's a nice way to handle the optional null terminator. It doesn't work well for string literals, though: you cannot concatenate like this: __INCLUDE_LIST("somefile.txt") U"Unicode footer";

 

(This brings up an interesting point; should we state that
__INCLUDE_LIST of an empty file followed by a ',' will remove the ',' a
la MSVC's variadic macros? It seems desirable... or we could just not
support this case where the file is empty.)

I would treat an empty file like an empty string literal, which means it becomes a no-op. So if we used your approach that means getting rid of the comma. 
 

> constexpr auto text = __INCLUDE_TEXT(char,"some_file.txt");

...gives a decltype("string literal").

> The binary routine could also use type information which can be useful if
> you want signed or unsigned bytes:
>
> constexpr auto data = __INCLUDE_BIN(uint8_t,"some_file.bin");

No; it should give an initializer_list (or better, token list, as
explained above); the LHS type determines the concrete type. Yes, this
means you can't assign it to 'auto' (unless you *want* the initializer
list), but you *do* want the initializer list to be able to pass it
directly to class constructors.

>  __INCLUDE_TEXT should expand to a string literal.
>
> The reason for this is that now we can paste string literals together,
> which comes for free from the behavior of string literals.

Ooh, good point! Bonus! :-)

> The only compatible way to implement __INCLUDE_BIN() is to replace the
> macro with an array initialization.
>
> This would mean you can't paste together multiple __INCLUDE_BIN()
> expressions like you can with __INCLUDE_TEXT().

This would be another reason to have __INCLUDE_LIST instead.

--
Matthew


For binary files, we could just use the library again. This would further simplify the include macro.

__INCLUDE_FILE("some_file.bin");
"\x01\x02\x03\x04...."; //<- expands to this, which is null terminated

__INCLUDE_FILE(u"some_file.bin");
u"\x0102\x0304...."; //<- expands to this, which is null terminated

__INCLUDE_FILE(U"some_file.bin");
U"\x01020304...."; //<- expands to this, which is null terminated

__INCLUDE_FILE(L"some_file.bin");
L"\x0102\x0304...."; //<- expands to this (example assuming a 16-bit wchar_t), which is null terminated

template <typename T, size_t N, size_t... I>
constexpr std::array<T, N-1> binary_impl(const char (&lit)[N], std::index_sequence<I...>) { return {{ static_cast<T>(lit[I])... }}; } // drop the trailing null
template <typename T, size_t N>
constexpr std::array<T, N-1> binary(const char (&lit)[N]) { return binary_impl<T>(lit, std::make_index_sequence<N-1>{}); }

auto fw = binary<uint8_t>(__INCLUDE_FILE("firmware.bin"));


The possibilities this feature opens up for mutating the data with compile-time programming are very interesting.
 

Matthew Woehlke

unread,
Feb 20, 2015, 12:27:58 PM2/20/15
to std-pr...@isocpp.org
On 2015-02-20 12:12, Matthew Fioravante wrote:
> On Friday, February 20, 2015 at 11:22:01 AM UTC-5, Matthew Woehlke wrote:
>> On 2015-02-19 21:54, Matthew Fioravante wrote:
>>> One problem with null termination is that we need to be told the
>>> intended character type of the data (char, char16_t, char32_t,
>>> wchar_t) in order to know how many bytes to reserve for the null
>>> terminator and the type of the resulting string literal
>>> expression generated by the macro.
>>
>> Doesn't this only apply to string literals? If 'pasting' as a list of
>> char literals ("'h','e','l','l','o',0"), the terminator is just '0', and
>> the compiler will expand that to fill the element, same as every other
>> element of the array.
>
> This is not possible to use directly with larger N-byte character types
> without some logic to construct each character from N initializers. The
> library could be used to coerce the byte stream into a stream of char32_t,
> with endian conversion, byte order mark search, etc..

I'm not/less worried about that case. As has been said repeatedly, get
the byte-wise case working first. I'm not even wholly convinced that
list-of-wider-types should be supported *at all* (aside from promotion
from char... not sure what would be a use case for that, but the
language already supports it).

> The big benefit of leveraging string literals is that you can concatenate
> string literals in the source code with included text data from a file
> easily using an already existing engine in the standard.

You can do the same to a char-literal list (assuming a raw list, i.e.
not surrounded by {}'s).

>> (This brings up an interesting point; should we state that
>> __INCLUDE_LIST of an empty file followed by a ',' will remove the ',' a
>> la MSVC's variadic macros? It seems desirable... or we could just not
>> support this case where the file is empty.)
>
> I would treat an empty file like an empty string literal, which means it
> becomes a no-op. So if we used your approach that means getting rid of the
> comma.

Yes, for string literals, it's trivial :-). I would also prefer to
implicitly drop the comma, but I can imagine some people finding that
objectionable. I think it would be okay if this just produces an error
if the input file is empty; how often is that going to happen, anyway?
(It would have to be a case where you don't know beforehand that the
file will be empty... otherwise why are you loading it?)

> For binary files, we could just use the library again. This would further
> simplify the include macro.
>
> __INCLUDE_FILE("some_file.bin");
> "\x01\x02\x03\x04...."; //<- expands to this, which is null terminated
> __INCLUDE_FILE(u"some_file.bin");
> u"\x0102\x0304...."; //<- expands to this, which is null terminated
> __INCLUDE_FILE(U"some_file.bin");
> U"\x01020304...."; //<- expands to this, which is null terminated
> __INCLUDE_FILE(L"some_file.bin");
> L"\x0102\x0304...."; //<- expands to this (example assuming a 16-bit wchar_t), which is null terminated

Above point about getting the simple case right first, what bothers me
about that syntax is that it looks like the file name itself is being
given as a wide string. I would strongly prefer that it be a separate
argument. (In which case I would have some preference for using a type
name rather than a suffix, though I would be okay with either.)

--
Matthew

Matthew Fioravante

unread,
Feb 20, 2015, 12:51:54 PM2/20/15
to std-pr...@isocpp.org, mw_t...@users.sourceforge.net
It's very clumsy writing 'l','i','k','e',' ','t','h','i','s'. It would be nice if you could concatenate the result with a string literal.
 

>> (This brings up an interesting point; should we state that
>> __INCLUDE_LIST of an empty file followed by a ',' will remove the ',' a
>> la MSVC's variadic macros? It seems desirable... or we could just not
>> support this case where the file is empty.)
>
> I would treat an empty file like an empty string literal, which means it
> becomes a no-op. So if we used your approach that means getting rid of the
> comma.

Yes, for string literals, it's trivial :-). I would also prefer to
implicitly drop the comma, but I can imagine some people finding that
objectionable.

When is leaving an extra comma ever a good thing? I would be aggressive and push to have it removed to make this feature more useful.

 
I think it would be okay if this just produces an error
if the input file is empty; how often is that going to happen, anyway?
(It would have to be a case where you don't know beforehand that the
file will be empty... otherwise why are you loading it?)

Opening an empty file seems perfectly legitimate to me. Making it an error would be restrictive for no good reason. Maybe in development you just have a placeholder. Or you are including several files together and one of them is empty for now but will be created later. You can include an empty header file, so why not include an empty resource.


> For binary files, we could just use the library again. This would further
> simplify the include macro.
>
> __INCLUDE_FILE("some_file.bin");
> "\x01\x02\x03\x04...."; //<- expands to this, which is null terminated
> __INCLUDE_FILE(u"some_file.bin");
> u"\x0102\x0304...."; //<- expands to this, which is null terminated
> __INCLUDE_FILE(U"some_file.bin");
> U"\x01020304...."; //<- expands to this, which is null terminated
> __INCLUDE_FILE(L"some_file.bin");
> L"\x0102\x0304...."; //<- expands to this (example assuming a 16-bit wchar_t), which is null terminated

Above point about getting the simple case right first, what bothers me
about that syntax is that it looks like the file name itself is being
given as a wide string. I would strongly prefer that it be a separate
argument. (In which case I would have some preference for using a type
name rather than a suffix, though I would be okay with either.)

The only information we really need is the size of the type. 
If the result macro expands to a string literal, then it needs to be given a prefix to mark the type of the literal. Using the prefix on the filename tells us exactly what the macro expansion is going to look like, even though I agree that it does look like we're specifying the encoding of the filename. These file names would follow the same rules as #include filenames.

The prefix or type could be a separate argument:

__INCLUDE_FILE("some_file.bin", char32_t);
__INCLUDE_FILE(U, "some_file.bin");


If the result macro expands into a list, then type information is not really needed at all, because library components can coerce the initializer list of bytes into whatever type is wanted.
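A sketch of such a library component, assuming the list supplies plain bytes and little-endian packing is what the caller wants (u32le is an illustrative name, not a proposed API):

```cpp
#include <cstdint>

// Library-side coercion: fold four raw bytes, in the order a
// hypothetical __INCLUDE_BIN list would supply them, into a wider type.
// Byte order is an explicit choice of the library, not the include
// mechanism itself.
constexpr std::uint32_t u32le(std::uint8_t a, std::uint8_t b,
                              std::uint8_t c, std::uint8_t d) {
    return std::uint32_t(a)         | (std::uint32_t(b) << 8)
         | (std::uint32_t(c) << 16) | (std::uint32_t(d) << 24);
}

static_assert(u32le(0x04, 0x03, 0x02, 0x01) == 0x01020304, "LE fold");
```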


--
Matthew

Matthew Woehlke

unread,
Feb 20, 2015, 1:34:33 PM2/20/15
to std-pr...@isocpp.org
On 2015-02-20 12:51, Matthew Fioravante wrote:
> On Friday, February 20, 2015 at 12:27:58 PM UTC-5, Matthew Woehlke wrote:
>> On 2015-02-20 12:12, Matthew Fioravante wrote:
>>> On Friday, February 20, 2015 at 11:22:01 AM UTC-5, Matthew Woehlke wrote:
>>>> (This brings up an interesting point; should we state that
>>>> __INCLUDE_LIST of an empty file followed by a ',' will remove the ',' a
>>>> la MSVC's variadic macros? It seems desirable... or we could just not
>>>> support this case where the file is empty.)
>>>
>>> I would treat an empty file like an empty string literal, which means it
>>> becomes a no-op. So if we used your approach that means getting rid of
>>> the comma.
>>
>> Yes, for string literals, it's trivial :-). I would also prefer to
>> implicitly drop the comma, but I can imagine some people finding that
>> objectionable.
>
> When is leaving an extra comma ever a good thing? I would be aggressive and
> push to have it removed to make this feature more useful.

I mean that some people might object to the standard requiring such an
automagical feature. (At least I get that impression; otherwise, why
isn't it standard for ", __VA_ARGS__" to be transformed to empty if
__VA_ARGS__ is empty?)
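As an aside, the comma-swallowing extension alluded to above does exist in GCC and Clang (spelled with '##'; non-standard, but it shows exactly the behavior wanted here for an empty file):

```cpp
// GNU extension: '##' before __VA_ARGS__ deletes the preceding comma
// when no variadic arguments are given.
#define LIST(head, ...) { head, ##__VA_ARGS__ }

int a[] = LIST(0);        // expands to {0}, stray comma removed
int b[] = LIST(0, 1, 2);  // expands to {0, 1, 2}

static_assert(sizeof(a) == 1 * sizeof(int), "no extra element");
static_assert(sizeof(b) == 3 * sizeof(int), "tail preserved");
```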

Ack... and that's another point, *leading* commas need to be handled
also. IOW (given an empty file), all of these need to work:

{0, __include_list("foo")} // {0}
{__include_list("foo"), 0} // {0}
{0, __include_list("foo"), 0} // {0, 0}

I'd prefer that this work, I'm just worried about how well it will be
accepted.

> Opening an empty file seems perfectly legitimate to me. Making it an error
> would be restrictive for no good reason.

To be clear: it's not an error in and of itself, it just breaks in e.g.
the above examples because you wind up with extra commas. Something like
'__include_list("/dev/null")' would always be valid and would always
produce an empty token sequence.

> If the result macro expands into a list, then the type information is not
> really needed at all because we can use library components to coerce the
> initializer list of bytes into a type if needed.

Right; if we support specifying a type, only the string literal flavor
needs that. (I still think we should support both.)

--
Matthew

Thiago Macieira

unread,
Feb 20, 2015, 4:56:29 PM2/20/15
to std-pr...@isocpp.org
On Friday 20 February 2015 11:00:34 Matthew Woehlke wrote:
> (Note: Qt's resources actually end up in one VERY LARGE blob. For Qt to
> make use of this feature, it would probably be useful to change that to
> a proper struct, especially as the current mechanism for encoding the
> size would not work trivially. Even with rcc, however, this could be
> useful; rcc would convert the .qrc to a resource .cpp with appropriate
> load statements. Compiling the .cpp would then depend on the contents of
> the resource files, while generating the .cpp would depend only on the
> .qrc. This would be an improvement over things currently where
> generating the .cpp depends on the resource files themselves. For that
> matter, it would become much less important to generate the .cpp from
> the .qrc as part of the build... you could do it by hand... conceivably
> you could even just hand-write the thing.)

There's a two-pass mode for rcc now that binary-edits the .o file. That works
much better for large resource collections, but it doesn't play well with
compressed objects or LTO.

objcopy is another way of inserting binary data into object files.

Magnus Fromreide

unread,
Feb 20, 2015, 6:06:01 PM2/20/15
to std-pr...@isocpp.org
If __include_list expands to a number of "0xVALUE," elements with a trailing
comma, then one could write { 0, __include_list("foo") },
{ __include_list("foo") 0 } and { 0, __include_list("foo") 0 }, but the
problem is that

{ __include_list("foo") 0 }

is ugly. (remember that a trailing comma is allowed in a brace-initializer)
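The asymmetry is easy to check: the grammar permits a trailing comma in a braced init list, but not a doubled or leading one, so a trailing-comma expansion only composes cleanly at the end:

```cpp
// A trailing comma is valid in a brace initializer and adds no element...
int ok[] = { 1, 2, 3, };
// ...but an empty expansion elsewhere leaves an ill-formed comma:
// int bad1[] = { 1, , 3 };   // error: expected expression
// int bad2[] = { , 1 };      // error: expected expression
static_assert(sizeof(ok) == 3 * sizeof(int), "three elements, no more");
```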

/MF

David Krauss

unread,
Feb 20, 2015, 10:51:04 PM2/20/15
to std-pr...@isocpp.org

On 2015-02-18, at 5:54 AM, Thiago Macieira <thi...@macieira.org> wrote:

QResource also supports reading a resource directory directly from a file, 
instead of something registered inside the binary image (see 
QResource::registerResource).

But David's description is more similar to QFileSelector.

My suggestion is to allow the implementation to decide to store a resource inside the binary or outside. The existence of similar (albeit manually-controlled) functionality in Qt is evidence that it’s useful to refer to a resource without reference to its storage medium.

All this insistence on metaprogramming is bizarre. Many GUI apps pre-scale image resources for different resolutions. As mentioned, some like to validate OpenGL shaders. Is it somehow better to put Herculean effort into achieving these tasks with C++ metaprogramming, instead of using ordinary tools with ordinary toolchains that exist today?

The goal is to allow portable libraries to include resource data. Portable specification of resource processing is not a part of this. Images will be scaled differently for different platforms’ various screen resolutions. Shaders will be validated against various platform-specific extensions and bugs, if at all. A portable library should be able to include raw resource files, and simple named references to them. (There are semi-portable naming conventions, like @4x.png for a high-resolution image. Libraries can use such resource mappings at their own peril, it’s not really the concern of the C++ standard.)

This thread is way off track. It’s a red flag to even be thinking about the notions of text encoding and string termination. Just don’t.

David Krauss

unread,
Feb 20, 2015, 10:57:00 PM2/20/15
to std-pr...@isocpp.org

On 2015-02-21, at 11:50 AM, David Krauss <pot...@gmail.com> wrote:

Is it somehow better to put Herculean effort into achieving these tasks with C++ metaprogramming, instead of using ordinary tools with ordinary toolchains that exist today?

Sorry, I meant Sisyphean.