C++ preprocessor and multi-line string literals

Paavo Helde

Jan 16, 2020, 10:37:31 AM

In C++ there is a preprocessor step which logically comes first and
prepares the preprocessed source consisting of the remaining tokens.
However, the preprocessor is line-based, so it ought to conflict with
multi-line string literals:

#include <iostream>

#if 1

int main() {
    const char* test = R"__(

#else

)__";
    std::cout << test;
}

#endif

Obviously C++ compilers supporting C++11 are able to cope with that and
handle the '#else' line as part of the string literal. My question is
more about how they were able to standardize this behavior? Is there
still a logically separate preprocessor step or not?

By googling I also found some EBNF formal syntax descriptions for C++,
but these are different for the preprocessor and for the actual C++
language. With the presence of multi-line string literals, is it even
possible to provide a self-consistent EBNF formal syntax for the C++
preprocessor?

Öö Tiib

Jan 16, 2020, 12:47:02 PM
[lex.phases]1.1.
| Physical source file characters are mapped, in an implementation-defined
| manner, to the basic source character set (introducing new-line
| characters for end-of-line indicators) if necessary. The set of
| physical source file characters accepted is implementation-defined. Any
| source file character not in the basic source character set (5.3) is
| replaced by the universal-character-name that designates that
| character. An implementation may use any internal encoding, so long
| as an actual extended character encountered in the source file, and
| the same extended character expressed in the source file as a
| universal-character-name (e.g., using the \uXXXX notation), are
| handled equivalently except where this replacement is reverted (5.4)
| in a *raw* *string* *literal*.

So the raw string literal is already a token in the very first translation
step. Preprocessing takes place in step 4. By that time it has been a
string literal for 3 steps already.

Richard

Jan 16, 2020, 3:07:09 PM
[Please do not mail me a copy of your followup]

Paavo Helde <myfir...@osa.pri.ee> spake the secret code
<qvpvvf$h2p$1...@dont-email.me> thusly:

>Obviously C++ compilers supporting C++11 are able to cope with that and
>handle the '#else' line as part of the string literal. My question is
>more about how they were able to standardize this behavior? Is there
>still a logically separate preprocessor step or not?

In N3242, section "2.14.5 String literals" it gives the description for
a raw string literal. String literals are one of the tokens seen by
the preprocessor as described in section "2.7 Tokens".

<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf>

So yes, raw string literals are turned into a single token as seen by
the preprocessor.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Manfred

Jan 16, 2020, 4:49:14 PM
(Using n4820.pdf)
I think the mention of a raw string literal here is a forward reference
(to section 5.4 [lex.pptoken]), and it only applies to the handling of
universal-character-names; it does not define string literals yet.

However, step 3 of [lex.phases] says: "The source file is decomposed
into preprocessing tokens (5.4)".
Preprocessing tokens are defined in [lex.pptoken] (section 5.4 in
n4820), and the list includes "string-literal" (specifically, it is
defined at p3.1).
The following step 4 then executes preprocessing directives.

So yes, string literals are recognized before preprocessing directives
are executed, but one step earlier, not three.
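
[Editorial sketch, not part of the original post.] The ordering described above (tokens are formed in step 3, directives are executed in step 4) can be checked with a small test, assuming a compiler in C++11 mode or later. The names kTest and literal_contains_else are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

#if 1

// The raw string literal opens on the next line and closes four lines later.
// It is formed as a single preprocessing token before directives are
// executed, so the #else line inside it is plain text, not a directive.
const char* const kTest = R"__(

#else

)__";

#endif

// Hypothetical helper: true if the literal still contains the text "#else".
inline bool literal_contains_else() {
    return std::strstr(kTest, "#else") != nullptr;
}
```

Calling literal_contains_else() from main() should return true if the compiler follows this ordering of the translation phases.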

Öö Tiib

Jan 16, 2020, 5:43:46 PM
Are you suggesting what are the raw string literals should be
re-decided in later steps?


Manfred

Jan 16, 2020, 7:25:04 PM
No, "what are the raw string literals" is "decided" in step 3, and not
in step 1.
I suggest you re-read step 1, more carefully.

Öö Tiib

Jan 16, 2020, 7:38:43 PM
Maybe I am confused indeed. Can we compose raw string literals
using \u0052 <- 'R' in step 1 for later steps?

Manfred

Jan 16, 2020, 9:27:22 PM
Step 1 is not about composing raw string literals.
Do you mean \u0052"__(Hello World!)__" ?

This doesn't seem to work; in fact, step 1 talks about conversion of
"extended characters", and R is not an extended character.

[lex.charset] says that "The universal-character-name construct provides
a way to name other characters" than the basic ones.

This might be the reason why gcc complains about universal character not
allowed in the above.
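
[Editorial sketch, not part of the original post.] The clause quoted earlier from [lex.phases] says that the early-phase replacements are reverted inside a raw string literal. A minimal way to observe this, assuming a C++11 or later compiler, is to compare literal sizes. The constant names below are invented for the example.

```cpp
#include <cassert>
#include <cstring>

// Inside a raw string literal, \u0052 stays as the six characters typed;
// it does not become the letter R.
const char kRawUcn[] = R"(\u0052)";
static_assert(sizeof(kRawUcn) == 7, "six characters plus the terminating NUL");

// In a u8 literal, the universal-character-name is translated: U+00E9 takes
// two UTF-8 code units, plus the terminating NUL.
static_assert(sizeof(u8"\u00e9") == 3, "two code units plus NUL");

// Backslash-newline splicing is also reverted inside a raw string literal,
// so the backslash and the newline are both kept...
const char kRawSplice[] = R"(a\
b)";
static_assert(sizeof(kRawSplice) == 5, "a, backslash, newline, b, NUL");

// ...whereas in an ordinary literal the two physical lines are joined.
const char kSpliced[] = "a\
b";
static_assert(sizeof(kSpliced) == 3, "a, b, NUL");
```

If all four static_asserts compile, the compiler treats the raw literals as verbatim text and the other literals as translated text.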

Juha Nieminen

Jan 20, 2020, 8:53:37 AM
Paavo Helde <myfir...@osa.pri.ee> wrote:
> In C++ there is a preprocessor step which logically comes first and
> prepares the preprocessed source consisting of remaining tokens.
> However, preprocessor is line-based, so it ought to conflict with the
> multi-line string literals:
>
> #include <iostream>
>
> #if 1
>
> int main() {
> const char* test = R"__(
>
> #else
>
> )__";
> std::cout << test;
> }
>
> #endif

How about the other way around? Is this valid C++ or not?

//------------------------------
#include <iostream>

#define TEST R"#(line1
line2
line3
)#"

int main()
{
    std::cout << TEST;
}
//------------------------------

Paavo Helde

Jan 20, 2020, 11:31:30 AM
Apparently it is valid, and for the same reasons as my original example
- the multi-line string literal is a single preprocessing token.

My original confusion came from the false impression that the
preprocessor operates on a line basis.

Bonita Montero

Jan 20, 2020, 12:20:01 PM
> How about the other way around? Is this valid C++ or not?
>
> //------------------------------
> #include <iostream>
>
> #define TEST R"#(line1
> line2
> line3
> )#"
>
> int main()
> {
> std::cout << TEST;
> }
> //------------------------------

Doesn't matter if this is valid or not because no one does
such stupid things.

Juha Nieminen

Jan 21, 2020, 3:24:31 AM
I am tempted to use less-than-pleasant words as a response, but I think
I will abstain. Instead, I will present an actual real-life situation where
such a thing is useful.

When programming with something that uses OpenGL, and thus GLSL for shaders,
the shaders are their own programming language inside strings. You could have
these GLSL sources in separate files, or you could have them as string literals
inside your program. The advantage of having them as string literals is that
you don't need to include extraneous files in your project (which may even be
easily modified by somebody), and you can build your strings from parts.

If there's some common GLSL code that could be useful in several shaders, you
can create string literals by concatenating separate ones. One easy way to do
that is to use macros for the string literals containing common code.

So you could have something like:

#define GLSL_COMMON_FUNC R"#(
float func(float) {
... whatever ...
}
)#"

After which you can create a shader using that function like

const char* const myShader = GLSL_COMMON_FUNC
"void main() { etc etc }";

Sure, you could just add the \ symbols at the end of each line in the macro,
but why bother, if you don't have to?
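
[Editorial sketch, not part of the original post.] For toolchains that reject a multi-line raw string literal inside a #define, one possible alternative is to keep the shared GLSL text in a named constant and join the pieces at run time. This swaps compile-time literal concatenation for std::string concatenation, so it is a different technique from the macro above. The GLSL body and the names kGlslCommonFunc and make_shader are placeholders.

```cpp
#include <cassert>
#include <string>

// Shared GLSL code kept as a raw string constant instead of a macro.
// The shader text itself is placeholder content for illustration only.
const char kGlslCommonFunc[] = R"#(
float func(float x) {
    return x * 0.5;
}
)#";

// Hypothetical helper: prepend the shared code to a shader body at run time.
inline std::string make_shader(const std::string& body) {
    return std::string(kGlslCommonFunc) + body;
}
```

A call such as make_shader("void main() { }\n") returns the combined source, and its c_str() can then be handed to whatever shader-loading code the program uses. The cost is a run-time allocation where the macro version has none.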

bol...@nowhere.org

Jan 21, 2020, 11:43:25 AM
Not with Clang you can't:

t.c:3:27: warning: missing terminating '"' character [-Winvalid-pp-token]
#define GLSL_COMMON_FUNC R"#(
^
t.c:7:1: error: expected identifier or '('
)#"
^
t.c:7:3: warning: missing terminating '"' character [-Winvalid-pp-token]
)#"

Is this a gcc extension?

Paavo Helde

Jan 21, 2020, 12:25:20 PM
https://godbolt.org/ shows that this compiles fine with clang 6.0.0 and
later. Clang 5.0.0 indeed fails - it looks like it's time to upgrade,
the current version is 9.0.0.

bol...@nowhere.org

Jan 22, 2020, 4:58:35 AM
They can say it compiles fine all they like, I'm telling you it doesn't:

fenris$ clang -v
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
fenris$ cat t.c
#define GLSL_COMMON_FUNC R"#(
float func(float f) {
return 0.0;
>}
>)#"

int main()
{
return 0;
}
fenris$ cc t.c
t.c:1:27: warning: missing terminating '"' character [-Winvalid-pp-token]
#define GLSL_COMMON_FUNC R"#(
^
t.c:4:1: error: expected expression
>}
^
t.c:4:2: error: expected expression
>}
^
t.c:5:1: error: expected identifier or '('
>)#"
^
t.c:5:4: warning: missing terminating '"' character [-Winvalid-pp-token]
>)#"
^
2 warnings and 3 errors generated.
fenris$ mv t.c t.cc
fenris$ c++ t.cc
t.cc:1:27: warning: missing terminating '"' character [-Winvalid-pp-token]
#define GLSL_COMMON_FUNC R"#(
^
t.cc:4:1: error: expected expression
>}
^
t.cc:4:2: error: expected expression
>}
^
t.cc:5:1: error: expected unqualified-id
>)#"
^
t.cc:5:4: warning: missing terminating '"' character [-Winvalid-pp-token]
>)#"
^
2 warnings and 3 errors generated.

guinne...@gmail.com

Jan 22, 2020, 5:11:36 AM
It does if you use the C++ compiler (you're using the C compiler, above).

C doesn't have raw string literals; C++ does.

bol...@nowhere.org

Jan 22, 2020, 5:25:07 AM
Are you blind? Try looking a bit harder.

Paavo Helde

Jan 22, 2020, 6:28:18 AM
On 22.01.2020 11:58, bol...@nowhere.org wrote:
> On Tue, 21 Jan 2020 19:25:06 +0200
> Paavo Helde <myfir...@osa.pri.ee> wrote:
>> On 21.01.2020 18:43, bol...@nowhere.org wrote:
>>> Not with Clang you can't:
>>>
>>> t.c:3:27: warning: missing terminating '"' character [-Winvalid-pp-token]
>>> #define GLSL_COMMON_FUNC R"#(
>>> ^
>>> t.c:7:1: error: expected identifier or '('
>>> )#"
>>> ^
>>> t.c:7:3: warning: missing terminating '"' character [-Winvalid-pp-token]
>>> )#"
>>>
>>> Is this a gcc extension?
>>>
>>
>> https://godbolt.org/ shows that this compiles fine with clang 6.0.0 and
>> later. Clang 5.0.0 indeed fails - it looks like it's time to upgrade,
>> the current version is 9.0.0.
>
> They can say it compiles fine all they like, I'm telling you it doesn't:
>
> fenris$ clang -v
> Apple LLVM version 10.0.1 (clang-1001.0.46.4)
> Target: x86_64-apple-darwin18.7.0
> Thread model: posix
> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
[...]
> fenris$ c++ t.cc

And how is this 'c++' program related to the 'clang' program you used
for -v?

> t.cc:1:27: warning: missing terminating '"' character [-Winvalid-pp-token]
> #define GLSL_COMMON_FUNC R"#(
> ^
> t.cc:4:1: error: expected expression
>> }
> ^
> t.cc:4:2: error: expected expression
>> }
> ^
> t.cc:5:1: error: expected unqualified-id
>> )#"
> ^
> t.cc:5:4: warning: missing terminating '"' character [-Winvalid-pp-token]
>> )#"
> ^
> 2 warnings and 3 errors generated.

Your example works fine with my clang:

gemma$ clang++ -v
clang version 7.0.1 (tags/RELEASE_701/final 349238)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib64/gcc/x86_64-suse-linux/7
Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/7
Selected GCC installation: /usr/bin/../lib64/gcc/x86_64-suse-linux/7
Candidate multilib: .;@m64
Selected multilib: .;@m64
gemma$ cat test1.cpp
#include <iostream>

#define GLSL_COMMON_FUNC R"#(
float func(float f) {
return 0.0;
}
)#"

int main()
{
std::cout << GLSL_COMMON_FUNC;
}
gemma$ clang++ test1.cpp
gemma$ ./a.out

float func(float f) {
return 0.0;
}

Juha Nieminen

Jan 22, 2020, 7:03:33 AM
bol...@nowhere.org wrote:
> fenris$ cat t.c
> fenris$ cc t.c
> t.c:1:27: warning: missing terminating '"' character [-Winvalid-pp-token]

You are compiling as C, not as C++.

bol...@nowhere.org

Jan 22, 2020, 7:07:14 AM
On Wed, 22 Jan 2020 13:28:08 +0200
Paavo Helde <myfir...@osa.pri.ee> wrote:
>On 22.01.2020 11:58, bol...@nowhere.org wrote:
>> On Tue, 21 Jan 2020 19:25:06 +0200
>> fenris$ clang -v
>> Apple LLVM version 10.0.1 (clang-1001.0.46.4)
>> Target: x86_64-apple-darwin18.7.0
>> Thread model: posix
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>[...]
>> fenris$ c++ t.cc
>
>And how is this 'c++' program related to the 'clang' program you used
>for -v?

fenris$ ls -l /usr/bin/c++
lrwxr-xr-x 1 root wheel 7 27 Sep 2018 /usr/bin/c++ -> clang++

>gemma$ cat test1.cpp
>#include <iostream>
>
>#define GLSL_COMMON_FUNC R"#(
>float func(float f) {
> return 0.0;
>}
>)#"
>
>int main()
>{
> std::cout << GLSL_COMMON_FUNC;
>}
>gemma$ clang++ test1.cpp
>gemma$ ./a.out
>
>float func(float f) {
> return 0.0;
>}

fenris$ c++ test1.cpp
test1.cpp:3:27: warning: missing terminating '"' character [-Winvalid-pp-token]
#define GLSL_COMMON_FUNC R"#(
^
test1.cpp:7:1: error: expected unqualified-id
)#"
^
test1.cpp:7:3: warning: missing terminating '"' character [-Winvalid-pp-token]
)#"
^
2 warnings and 1 error generated.
fenris$

Perhaps Apple's version of Clang is different in some way, but regardless, it
doesn't compile.

Öö Tiib

Jan 22, 2020, 8:33:15 AM
On Wednesday, 22 January 2020 14:07:14 UTC+2, bol...@nowhere.org wrote:
>
> Perhaps Apples version of Clang is different in some way, but regardless, it
> doesn't compile.

Can't reproduce on Apple.

Perhaps you just call the wrong compiler, or call it in C mode,
or are missing -std=c++11 or later on the command line, or made
some other such typical noob error.

C++ has some baby-proof locks on all platforms.
With it, it is too easy to shoot one's leg off or the like,
but babies lose interest if it does not compile or
link or the like.
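
[Editorial sketch, not part of the original post.] One plausible explanation for the failing build, which the thread does not confirm, is that the compiler was running in a pre-C++11 dialect. Upstream Clang changed its default C++ dialect from gnu++98 to gnu++14 in release 6.0, which would match the godbolt observation reported earlier. A guard like the one below, assuming __cplusplus is reported accurately, makes the first diagnostic name that cause. The names kMultiLine and multi_line_length are invented for the example.

```cpp
#include <cassert>
#include <cstring>

// Stop with a readable message when compiled as pre-C++11.
// Caveat: MSVC reports 199711L here unless /Zc:__cplusplus is passed.
#if __cplusplus < 201103L
#error "Raw string literals require C++11 or later (for example -std=c++11)."
#endif

// A raw string literal that spans two source lines.
const char kMultiLine[] = R"#(line1
line2
)#";

// Hypothetical helper: length of the literal without the terminating NUL.
inline std::size_t multi_line_length() {
    return std::strlen(kMultiLine);
}
```

Compiling the same file once with -std=c++98 and once with -std=c++11 shows whether the dialect is the cause: the first run should stop at the #error line, the second should build cleanly.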

bol...@nowhere.org

Jan 22, 2020, 11:03:57 AM
Oh do go fuck yourself you patronising little tit.

Manfred

Jan 22, 2020, 12:59:41 PM
On 1/22/2020 1:07 PM, bol...@nowhere.org wrote:
>> And how is this 'c++' program related to the 'clang' program you used
>> for -v?
> fenris$ ls -l /usr/bin/c++
> lrwxr-xr-x 1 root wheel 7 27 Sep 2018 /usr/bin/c++ -> clang++

You may want to check with
$ which c++

instead of ls, and
$ c++ -v

>
>> gemma$ cat test1.cpp
>> #include <iostream>
>>
>> #define GLSL_COMMON_FUNC R"#(
>> float func(float f) {
>> return 0.0;
>> }
>> )#"
>>
>> int main()
>> {
>> std::cout << GLSL_COMMON_FUNC;
>> }
>> gemma$ clang++ test1.cpp
>> gemma$ ./a.out
>>
>> float func(float f) {
>> return 0.0;
>> }
> fenris$ c++ test1.cpp

why not
$ clang++ test1.cpp


> test1.cpp:3:27: warning: missing terminating '"' character [-Winvalid-pp-token]
> #define GLSL_COMMON_FUNC R"#(
> ^
> test1.cpp:7:1: error: expected unqualified-id
> )#"
> ^
> test1.cpp:7:3: warning: missing terminating '"' character [-Winvalid-pp-token]
> )#"
> ^
> 2 warnings and 1 error generated.
> fenris$

> Perhaps Apples version of Clang is different in some way, but regardless, it
> doesn't compile.

It compiles fine here too with clang 8.0.0, but fails with gcc 9.2.

For some reason your compiler behaves differently than expected. You may
want to find out why (rather than complaining about it).

Öö Tiib

Jan 22, 2020, 1:50:55 PM
You think that vulgar language makes you look more
immature and so fits better with being a rookie? :D

bol...@nowhere.org

Jan 23, 2020, 4:23:09 AM
On Wed, 22 Jan 2020 10:50:41 -0800 (PST)
Öö Tiib <oot...@hot.ee> wrote:
>On Wednesday, 22 January 2020 18:03:57 UTC+2, bol...@nowhere.org wrote:
>> On Wed, 22 Jan 2020 05:33:00 -0800 (PST)
>> Öö Tiib <oot...@hot.ee> wrote:
>> >On Wednesday, 22 January 2020 14:07:14 UTC+2, bol...@nowhere.org wrote:
>> >>
>> >> Perhaps Apples version of Clang is different in some way, but regardless, it
>> >> doesn't compile.
>> >
>> >Can't reproduce on Apple.
>> >
>> >Perhaps you just call wrong compiler or in C mode
>> >or miss -std=c++11 or better from command line or
>> >some other such typical noob error.
>> >
>> >C++ has some baby proof locks on all platforms.
>> >With it is too easy to shoot one's leg off or the like
>> >but babies lose interest if it does not compile or
>> >link or the like.
>>
>> Oh do go fuck yourself you patronising little tit.
>
>You think that vulgar language makes you to look more
>immature and so fits better with being a rookie? :D

Says the man sitting in his baltic igloo who never posts anything useful, just
trolls others.

James Kuyper

Jan 23, 2020, 10:26:02 PM
On 1/22/20 5:11 AM, guinne...@gmail.com wrote:
> On Wednesday, 22 January 2020 09:58:35 UTC, bol...@nowhere.org wrote:
...
>> fenris$ c++ t.cc
>> t.cc:1:27: warning: missing terminating '"' character [-Winvalid-pp-token]
>> #define GLSL_COMMON_FUNC R"#(
>> ^
>> t.cc:4:1: error: expected expression
>>> }
>> ^
>> t.cc:4:2: error: expected expression
>>> }
>> ^
>> t.cc:5:1: error: expected unqualified-id
>>> )#"
>> ^
>> t.cc:5:4: warning: missing terminating '"' character [-Winvalid-pp-token]
>>> )#"
>> ^
>> 2 warnings and 3 errors generated.
>
> It does if you use the C++ compiler (you're using the C compiler, above).
>
> C doesn't have raw string literals; C++ does.

How did you conclude that c++ is the name of a C compiler?