
utf-8 string literal

Sascha Schwarz

Mar 14, 2016, 7:50:14 AM
{ edited to shorten lines to ~70 characters. -mod }

Hello all.

Recently we were discussing whether the following snippet is guaranteed
to compile on all conforming platforms.

int main() {
    // Wikipedia's example from https://en.wikipedia.org/wiki/UTF-8
    constexpr const char euro[] = u8"\u20ac";
    static_assert(
        sizeof euro == 4
        && euro[0] == static_cast<const char>(0b11100010)
        && euro[1] == static_cast<const char>(0b10000010)
        && euro[2] == static_cast<const char>(0b10101100),
        "Not utf-8.");
}

Looking at sections 2.3 (Basic charset) and 2.14.5 (String literals) of
the standard, we _think_ so, but are not sure.
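The three byte values in the static_assert follow from UTF-8's three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx, which covers code points U+0800..U+FFFF. Below is a minimal compile-time sketch of that arithmetic. `encode_utf8_3` and `euro_bytes` are hypothetical names made up for this illustration, and the helper does no input validation. It does not involve u8 literals, so it only confirms which bytes to expect, not what a given compiler actually emits.

```cpp
#include <array>

// Hypothetical helper (illustration only): encodes a code point in the
// three-byte UTF-8 range using the bit pattern 1110xxxx 10xxxxxx 10xxxxxx.
// Precondition (not checked): 0x0800 <= cp <= 0xFFFF and cp is not a
// surrogate (U+D800..U+DFFF).
constexpr std::array<unsigned char, 3> encode_utf8_3(char32_t cp) {
    return {{
        static_cast<unsigned char>(0xE0 | (cp >> 12)),          // bits 15..12
        static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)),  // bits 11..6
        static_cast<unsigned char>(0x80 | (cp & 0x3F))          // bits 5..0
    }};
}

// U+20AC = 0010 0000 1010 1100 in binary:
//   bits 15..12 = 0010   -> 1110 0010 = 0xE2 (0b11100010)
//   bits 11..6  = 000010 -> 1000 0010 = 0x82 (0b10000010)
//   bits 5..0   = 101100 -> 1010 1100 = 0xAC (0b10101100)
constexpr std::array<unsigned char, 3> euro_bytes = encode_utf8_3(U'\u20ac');
static_assert(euro_bytes[0] == 0xE2, "lead byte");
static_assert(euro_bytes[1] == 0x82, "first continuation byte");
static_assert(euro_bytes[2] == 0xAC, "second continuation byte");
```

Code points below U+0800 or above U+FFFF use the two- and four-byte forms instead, which this sketch deliberately leaves out.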

This came up whilst implementing Adobe's Glyph List (AGL) in C++.
See https://github.com/adobe-type-tools/agl-aglfn


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Öö Tiib

Mar 14, 2016, 6:20:14 PM

On Monday, 14 March 2016 13:50:14 UTC+2, Sascha Schwarz wrote:
> Recently we were discussing whether the following snippet is guaranteed
> to compile on all conforming platforms.
>
> int main() {
>     // Wikipedia's example from https://en.wikipedia.org/wiki/UTF-8
>     constexpr const char euro[] = u8"\u20ac";
>     static_assert(
>         sizeof euro == 4
>         && euro[0] == static_cast<const char>(0b11100010)
>         && euro[1] == static_cast<const char>(0b10000010)
>         && euro[2] == static_cast<const char>(0b10101100),
>         "Not utf-8.");
> }
>
> Looking at 2.3 (Basic charset) and 2.14.5 (String literals) we _think_
> so, but are not sure.

Can you elaborate what makes you unsure?

Sascha Schwarz

Mar 15, 2016, 9:50:17 AM

On Monday, 14 March 2016 23:20:14 UTC+1, Öö Tiib wrote:
>
> Can you elaborate what makes you unsure?

It comes down to the difference between "\u20ac" and u8"\u20ac".

My understanding is that, whilst there is no guarantee about the encoding
of the former, the latter is encoded in UTF-8, so the static_assert() holds.
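Under that reading, the check can also be phrased so it does not depend on whether plain char is signed, and so it still compiles in C++20, where u8 literals have element type char8_t instead of char. The variant below is a sketch of that idea, not something proposed in the thread. It binds a reference with `auto&` so the element type is never spelled out, and it casts each element to unsigned char before comparing.

```cpp
// Bind a reference to the literal so the declaration works whether the
// element type is const char (C++11/14/17) or const char8_t (C++20).
// auto& does not decay, so euro has type "reference to array of 4".
constexpr auto& euro = u8"\u20ac";

// Three UTF-8 code units plus the terminating null.
static_assert(sizeof euro == 4, "three code units + null terminator");

// Compare as unsigned char so the result does not depend on the
// signedness of plain char on this platform.
static_assert(static_cast<unsigned char>(euro[0]) == 0xE2, "lead byte");
static_assert(static_cast<unsigned char>(euro[1]) == 0x82, "continuation byte");
static_assert(static_cast<unsigned char>(euro[2]) == 0xAC, "continuation byte");
static_assert(euro[3] == 0, "null terminator");
```

No analogous portable assertion can be written for the ordinary literal "\u20ac", because its bytes depend on the implementation's execution character set, which need not be UTF-8.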