wchar_t aliasing

markus

Sep 23, 2014, 3:23:11 AM
On Windows, wchar_t, char16_t and unsigned short/uint16_t all use the
same underlying type, uint16_t. According to the C++11 standard they are
all distinct types.

Looking at the Firefox source, they are (reinterpret)casting between
char/char16_t/wchar_t.

Looking at the Qt source, depending on the platform they are happily
casting between uint16_t/wchar_t or uint32_t/wchar_t.

From what I can tell, this does violate the aliasing rules. Am I
missing something? Do GCC/Visual C++ make some special guarantees that
this will work?
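
A minimal sketch of the distinction in question, assuming a conforming C++11 compiler that provides uint16_t. The helper same_layout is a hypothetical name used only for illustration. It shows that the types can share size and alignment while remaining different types as far as the aliasing rules are concerned:

```cpp
#include <cstdint>
#include <type_traits>

// The three types are distinct to the type system regardless of how they
// are represented, so these assertions do not depend on the platform
// (assuming a conforming compiler).
static_assert(!std::is_same<wchar_t, char16_t>::value,
              "wchar_t and char16_t are distinct types");
static_assert(!std::is_same<wchar_t, std::uint16_t>::value,
              "wchar_t and uint16_t are distinct types");
static_assert(!std::is_same<char16_t, std::uint16_t>::value,
              "char16_t and uint16_t are distinct types");

// Hypothetical helper: true when two types merely share size and alignment.
// Sharing a layout does not make the types interchangeable for aliasing.
template <typename A, typename B>
constexpr bool same_layout() {
    return sizeof(A) == sizeof(B) && alignof(A) == alignof(B);
}
```

On a typical Windows toolchain same_layout<wchar_t, char16_t>() would be expected to be true, which is exactly why the casts are tempting.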

Öö Tiib

Sep 23, 2014, 3:18:53 PM
On Tuesday, 23 September 2014 10:23:11 UTC+3, markus wrote:
> On Windows, wchar_t, char16_t and unsigned short/uint16_t all use the
> same underlying type, uint16_t. According to the C++11 standard they are
> all distinct types.

"Windows" is a series of operating systems. It does not deal with C++
fundamental types and typedefs. Did you mean some particular version
of some particular C++ compiler targeting some particular version of
Windows?

> Looking at the Firefox source, they are (reinterpret)casting between
> char/char16_t/wchar_t.
>
> Looking at the Qt source, depending on the platform they are happily
> casting between uint16_t/wchar_t or uint32_t/wchar_t.
>
> From what I can tell, this does violate the aliasing rules. Am I
> missing something? Do GCC/Visual C++ make some special guarantees that
> this will work?

Casting between different unsigned integral types is one of the few things
that are fully well defined in the C++ standard. Did you mean that
non-'char' pointers are taken somewhere, type-punned and then used
interchangeably in the Qt and Firefox source code?

Paavo Helde

Sep 23, 2014, 3:22:00 PM
markus <mar...@addr.is.invalid> wrote in news:lvr74b$gmj$1...@solani.org:
This is interesting. g++ 4.6 on Linux issues type-pun warnings between
4-byte wchar_t and uint32_t, but is silent for 2-byte wchar_t (achieved
via the -fshort-wchar flag) and uint16_t. Is there indeed some special
case for 16-bit data, in addition to 8-bit?

Cheers
Paavo

Chris Vine

Sep 23, 2014, 8:47:49 PM
Not as far as I am aware, but the mere act of casting in that way does
not of itself breach the strict aliasing rule in §3.10/10 of the
standard. It depends on what the code does with the cast value.

You can cast between a particular wchar_t value and its underlying
integer type without breaching the strict aliasing rule, and do
whatever you want with the cast value, since no aliasing takes place:
you end up with two independent values. Where you have arrays/strings
of wchar_t, you can again cast the decayed array of wchar_t to a
pointer to the underlying integer type and copy that pointer around
without breaking the strict aliasing rule. What you cannot do in the
last case is dereference that pointer.
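
A rough sketch of the two cases, plus one way to get at the data without dereferencing a punned pointer. The names wide_uint, to_code_unit and copy_as_uint are hypothetical, and the sketch assumes wchar_t is either 16 or 32 bits wide:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Unsigned integer type with the same size as wchar_t on this platform
// (assumption: wchar_t is either 16 or 32 bits wide).
using wide_uint = std::conditional<sizeof(wchar_t) == 2,
                                   std::uint16_t, std::uint32_t>::type;
static_assert(sizeof(wide_uint) == sizeof(wchar_t),
              "unexpected wchar_t size");

// Value conversion: the result is an independent object, so nothing aliases.
inline wide_uint to_code_unit(wchar_t c) {
    return static_cast<wide_uint>(c);
}

// Copies the object representation into a buffer of the integer type.
// std::memcpy works on the underlying bytes, so no wchar_t object is ever
// accessed through a wide_uint lvalue.
inline std::vector<wide_uint> copy_as_uint(const std::wstring& s) {
    std::vector<wide_uint> out(s.size());
    if (!s.empty()) {
        std::memcpy(out.data(), s.data(), s.size() * sizeof(wchar_t));
    }
    return out;
}
```

The copy costs an allocation, which is presumably why real code bases prefer the cast, but it stays within the rules whatever the compiler does with strict aliasing.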

I have rarely seen code which does that. For example, on systems with
32 bit wchar_t, I have commonly used code which reinterpret_casts
std::wstring::c_str() to uint32_t* in order to call a library codeset
conversion function which takes a uint32_t* argument. Of itself that
is OK. Technically speaking there may be a breach of the strict aliasing
rule when the library function dereferences the pointer, but since this
is done in a different compilation unit the compiler knows nothing
about it and has to take it on trust, so everything works. In this
kind of case you can "cheat" strict aliasing just by arranging your
translation units appropriately and using compatible types with the same
size, signedness and alignment. Possibly that is what the code you
have seen does.

Chris

markus

Sep 23, 2014, 10:01:07 PM
On 2014-09-23 21:18, Öö Tiib wrote:
> On Tuesday, 23 September 2014 10:23:11 UTC+3, markus wrote:
>> On Windows, wchar_t, char16_t and unsigned short/uint16_t all use
>> the same underlying type, uint16_t. According to the C++11 standard
>> they are all distinct types.
>
> "Windows" is a series of operating systems. It does not deal with C++
> fundamental types and typedefs.

The Win32/64 API defines wchar_t as UTF-16 and unsigned short as 16-bit int.

> Did you mean some particular version of some particular C++ compiler
> targeting some particular version of Windows?

All of them on Win32/64. AFAIK, everything tries to be compatible with
either Visual C++ or GCC/MinGW, and they do what I mentioned.

>> Looking at the Firefox source, they are (reinterpret)casting
>> between char/char16_t/wchar_t.
>>
>> Looking at the Qt source, depending on the platform they are
>> happily casting between uint16_t/wchar_t or uint32_t/wchar_t.
>>
>> From what I can tell, this does violate the aliasing rules. Am I
>> missing something? Do GCC/Visual C++ make some special guarantees
>> that this will work?
>
> Casting between different unsigned integral types is one of the few
> things that are fully well defined in the C++ standard. Did you mean
> that non-'char' pointers are taken somewhere, type-punned and then
> used interchangeably in the Qt and Firefox source code?

Yes, I'm talking about pointers. E.g. Firefox has a helper for automagic
casting (Char16.h):

class char16ptr_t
{
private:
  const char16_t* mPtr;

public:
  char16ptr_t(const char16_t* aPtr) : mPtr(aPtr) {}
  char16ptr_t(const wchar_t* aPtr) :
    mPtr(reinterpret_cast<const char16_t*>(aPtr))
  {}
  operator const char16_t*() const
  {
    return mPtr;
  }
  operator const wchar_t*() const
  {
    return reinterpret_cast<const wchar_t*>(mPtr);
  }
  operator std::wstring() const
  {
    return std::wstring(static_cast<const wchar_t*>(*this));
  }
  /* Explicit cast operators to allow things like (char16_t*)str. */
  explicit operator char16_t*() const
  {
    return const_cast<char16_t*>(mPtr);
  }
  explicit operator wchar_t*() const
  {
    return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
  }
  explicit operator const char*() const
  {
    return reinterpret_cast<const char*>(mPtr);
  }
  explicit operator const unsigned char*() const
  {
    return reinterpret_cast<const unsigned char*>(mPtr);
  }
  explicit operator unsigned char*() const
  {
    return const_cast<unsigned char*>(
      reinterpret_cast<const unsigned char*>(mPtr));
  }
  /* ... excerpt ends here; the rest of the class is not shown ... */
};

Chris Vine

Sep 24, 2014, 5:45:38 AM
The conversion operators to char*/unsigned char*/const char* do not
break the strict aliasing rule by definition. Apart from the operator
std::wstring method, the others might break strict aliasing, depending
on the circumstances in which the returned pointer is dereferenced.
The conversion operator to std::wstring does appear to break strict
aliasing, unless the pointer held by the char16ptr_t originally pointed
to a wchar_t string which had been cast to char16ptr_t. (You can cast
from a pointer to one type to a pointer to another type, then cast back
to a pointer to the original type and then dereference that.)

Encouraging casting like this seems somewhat undesirable (a programmer
should I think be forced to think about what she is doing with respect
to aliasing when type punning). However, even though the standard
states that wchar_t, char16_t and char32_t are distinct types, it would
not surprise me if compilers in fact permit type punning between
char16_t* and wchar_t* (for 16 bit wchar_t) or char32_t* and wchar_t*
(for 32 bit wchar_t), as well as between them and their underlying
integral types. In practice you will also get away with it if the
pointer casts are to arguments of functions in different compilation
units.
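
To illustrate the round trip and the character-type exemption, here is a small sketch. opaque_text, wrap, unwrap and first_byte are hypothetical names, not part of Firefox or any library. The round trip relies on char16_t having no stricter alignment than wchar_t, which holds on the mainstream platforms discussed here:

```cpp
#include <string>

// Hypothetical stand-in for an API that traffics in char16_t pointers.
// It only stores and returns the pointer; it never dereferences it.
struct opaque_text {
    const char16_t* ptr;
};

inline opaque_text wrap(const wchar_t* s) {
    return opaque_text{reinterpret_cast<const char16_t*>(s)};
}

// Casting back to the original pointer type before dereferencing means the
// characters are only ever accessed as wchar_t, which is their actual type.
inline std::wstring unwrap(opaque_text t) {
    return std::wstring(reinterpret_cast<const wchar_t*>(t.ptr));
}

// Inspecting the bytes of any object through unsigned char is permitted.
inline unsigned char first_byte(const wchar_t* s) {
    return *reinterpret_cast<const unsigned char*>(s);
}
```

Had unwrap instead been given a pointer that really pointed at char16_t data, reading it as wchar_t would be the problematic case described above.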

Chris

Nobody

Sep 27, 2014, 1:08:12 AM
On Tue, 23 Sep 2014 14:21:31 -0500, Paavo Helde wrote:

> This is interesting. g++ 4.6 on Linux issues type-pun warnings between
> 4-byte wchar_t and uint32_t, but is silent for 2-byte wchar_t (achieved
> via the -fshort-wchar flag) and uint16_t. Is there indeed some special
> case for 16-bit data, in addition to 8-bit?

wchar_t defaults to "int" (note: not unsigned); -fshort-wchar changes it
to "short unsigned int".

$ gcc -E -dM -x c /dev/null | fgrep __WCHAR_TYPE__
#define __WCHAR_TYPE__ int
$ gcc -E -dM -x c -fshort-wchar /dev/null | fgrep __WCHAR_TYPE__
#define __WCHAR_TYPE__ short unsigned int
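
The same information can be queried from inside a C++ program without relying on the GCC-specific macro. This is only a sketch, and wchar_bits and wchar_is_signed are hypothetical helper names:

```cpp
#include <climits>
#include <limits>

// Width of wchar_t in bits, as seen by the C++ compiler.
constexpr int wchar_bits() {
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}

// Whether wchar_t behaves as a signed type on this platform.
constexpr bool wchar_is_signed() {
    return std::numeric_limits<wchar_t>::is_signed;
}
```

Given the macro output above, one would expect 32 bits and signed with the default settings on that system, and 16 bits and unsigned under -fshort-wchar. Other targets may differ.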

Paavo Helde

Sep 27, 2014, 3:35:30 AM
Nobody <nob...@nowhere.invalid> wrote in
news:pan.2014.09.27....@nowhere.invalid:
Good to know, but it appears signed/unsigned does not change the pun
warnings, I tried with both signed and unsigned types.

tmp> cat test1.cpp
int main() {
    wchar_t c=0;
    *((TYPE * )&c) = 1;
    return c;
}

tmp> g++ -O2 -fstrict-aliasing -Wstrict-aliasing -DTYPE=int test1.cpp
test1.cpp: In function ‘int main()’:
test1.cpp:3:18: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

tmp> g++ -O2 -fstrict-aliasing -Wstrict-aliasing -DTYPE='unsigned int' test1.cpp
test1.cpp: In function ‘int main()’:
test1.cpp:3:18: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

tmp> g++ -O2 -fstrict-aliasing -Wstrict-aliasing -DTYPE='unsigned short' -fshort-wchar test1.cpp

tmp> g++ -O2 -fstrict-aliasing -Wstrict-aliasing -DTYPE='short' -fshort-wchar test1.cpp

tmp> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.6/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
--enable-languages=c,c++,objc,fortran,obj-c++,java,ada
--enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.6
--enable-ssp --disable-libssp --disable-plugin
--with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux'
--disable-libgcj --disable-libmudflap --with-slibdir=/lib64
--with-system-zlib --enable-__cxa_atexit
--enable-libstdcxx-allocator=new --disable-libstdcxx-pch
--enable-version-specific-runtime-libs --program-suffix=-4.6
--enable-linux-futex --without-system-libunwind --with-arch-32=i586
--with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.6.2 (SUSE Linux)