
Core language basic types support


Alf P. Steinbach

Dec 10, 2015, 8:50:18 AM
Due to its backward compatibility with C, from the 1970s, C++'s set of
built-in types is not ideally suited for modern programming as of 2015.

The C99 "<stdint.h>" header + new C++11 character types (char16_t and
char32_t) somewhat alleviate the problems, but in particular:

• No standard signed size type (Posix has ssize_t).
• No system dependent Unicode character type (like int is for integers).
• No strongly typed character types.

In addition, for example, in practice all computers now use two's
complement representation of signed integers (that can be relied on),
but there is no simple standard-conforming way to cast an unsigned
value to signed of same size, without giving the compiler Carte Blanche
to do anything due to formal UB (and g++ will act on this C.B.).

The enclosed header and unit test is my attempt to deal with these
issues, as part of the core language support I posted about earlier.
This is a work in progress, so in particular the unit testing is just
direct use of Microsoft's framework for Visual Studio. The goal is to
compile and test this also with g++; discussion and/or advice for that
is very welcome, plus of course, discussion and/or advice for the header
itself is very welcome.

Cheers,

- Alf

basic_types.hpp
basic_types_test.cpp

Alf P. Steinbach

Dec 10, 2015, 9:08:50 AM
On 12/10/2015 2:49 PM, Alf P. Steinbach wrote:
>...
> The enclosed header and unit test is my attempt to deal with these
> issues, as part of the core language support I posted about earlier.
> This is a work in progress, so in particular the unit testing is just
> direct use of Microsoft's framework for Visual Studio. The goal is to
> compile and test this also with g++; discussion and/or advice for that
> is very welcome, plus of course, discussion and/or advice for the header
> itself is very welcome.

<file "basic_types.hpp">
#pragma once
// core_language_support/basic_types.hpp
// Copyright © Alf P. Steinbach 2015. Boost Software License 1.0.

#include <p/cppx/tmp/Type_set.hpp> // cppx::Type_set
#include <p/cppx/tmp/sfinae.hpp> // cppx::If_
#include <p/cppx/core_language_support/static_assert.h> // CPPX_STATIC_ASSERT

#include <assert.h> // assert
#include <limits> // std::numeric_limits
#include <limits.h> // CHAR_BIT
#include <stddef.h> // size_t
#include <type_traits> // std::conditional_t

// For a discussion of Syschar see the article "Portable String Literals in C++" by
// Alf P. Steinbach, ACCU Overload journal August 2013; the article is available online
// at <url: http://accu.org/index.php/journals/1842>. Essentially it's the natural
// Unicode character encoding unit for the system at hand, strictly typed as an enum.
// As an enum it's compatible with std::basic_string short buffer optimization.
#ifdef _WIN32
# define CPPX_SYSCHAR_IS_WIDE 1 // Implies UTF-16 encoding.
#else
# define CPPX_SYSCHAR_IS_WIDE 0 // Implies UTF-8 encoding.
#endif

namespace progrock{ namespace cppx{
using std::conditional_t;
using std::is_integral;
using std::make_unsigned_t;
using std::numeric_limits;

// With some compilers/options the UB of overflowing a signed integer is real, or
// at least causes annoying and troublesome sillywarnings, so:
template< class Result, class Arg
, class Enabled_ = If_< is_integral<Result> >
>
auto wrap_to( Arg const v )
-> Result
{
CPPX_STATIC_ASSERT( numeric_limits<Result>::is_modulo );
// Additionally assumes that there are no trap representation bits.
using Unsigned = make_unsigned_t<Result>;
Unsigned const wrapped = static_cast<Unsigned>( v );
return reinterpret_cast<Result const&>( wrapped );
}

using Byte = unsigned char;
int const bits_per_byte = CHAR_BIT;

using Index = ptrdiff_t;
using Size = Index;

using Ascii = enum: char // Strongly typed ASCII.
{
nul = '\0',
bel = '\a', // alarm / bell, ^G, 7
bs = '\b', // backspace, ^H, 8
tab = '\t', // tab, ^I, 9
lf = '\n', // linefeed / newline, ^J, 10
vt = '\v', // vertical tab, ^K, 11
ff = '\f', // formfeed, ^L, 12
cr = '\r', // carriage return, ^M, 13
xon = 17, // device control 1 / xon, ^Q, 17
xoff = 19, // device control 3 / xoff, ^S, 19, "stop"
esc = 27, // escape
space = 32,
del = 127 // delete
};
using Utf8 = enum: Byte {}; // Strongly typed UTF-8.
using Utf16 = enum: char16_t {}; // Strongly typed UTF-16.
using Utf32 = enum: char32_t {}; // Strongly typed UTF-32.

bool constexpr syschar_is_wide = !!CPPX_SYSCHAR_IS_WIDE;
bool constexpr syschar_is_byte = not syschar_is_wide;

using Syschar_base = conditional_t< syschar_is_wide, wchar_t, char >;
using Syschar = enum: Syschar_base {};

inline
auto is_ascii( char const code )
-> bool
{ return Byte( code ) <= Byte( Ascii::del ); }

template< class Iter >
auto is_ascii( Iter const first, Iter const after )
{
for( Iter it = first; it != after; ++it )
{
if( not is_ascii( *it ) )
{
return false;
}
}
return true;
}

// Types that can ordinarily be assumed to represent character encoding values:
using Character_types = Type_set <
// System-dependent encoding:
Syschar, char, wchar_t,
// Fixed encoding:
Ascii, Utf8, Utf16, Utf32, char16_t, char32_t
>;

// Types that can ordinarily be assumed to represent UTF-8 encoded text.
//
// In Windows no built-in C++ type can be assumed to represent UTF-8 encoded
// text, but in Unix-land "char" is usually UTF-8.
using Basic_utf8_types = Type_set<Utf8>;
using Utf8_types = conditional_t<
/* if */ syschar_is_wide,
/* then */ Basic_utf8_types, // Windows.
/* else */ Union_< Basic_utf8_types, char > // Unix-land.
>;

// Types that can ordinarily be assumed to represent UTF-16 encoded text:
using Basic_utf16_types = Type_set< Utf16, char16_t >;
using Utf16_types = conditional_t<
/* if */ syschar_is_wide, // Implies Windows & 16 bits wchar_t
/* then */ Union_< Basic_utf16_types, Type_set< Syschar, wchar_t > >,
/* else */ Basic_utf16_types
>;

// Types that can ordinarily be assumed to represent UTF-32 encoded text:
using Basic_utf32_types = Type_set< Utf32, char32_t >;
using Utf32_types = conditional_t<
/* if */ syschar_is_wide, // Implies Windows & 16 bits wchar_t
/* then */ Basic_utf32_types,
/* else */ Union_< Basic_utf32_types, wchar_t >
>;

}} // namespace progrock::cppx
</file>


<file "basic_types_test.cpp">
#include <p/cppx/core_language_support/basic_types.hpp>
#include <iterator> // std::begin, std::end

#include "ms_unit_test.hpp"

namespace ut = Microsoft::VisualStudio::CppUnitTestFramework;
namespace cppx = progrock::cppx;
using std::begin;
using std::end;

namespace cppx_test
{
TEST_CLASS(basic_types)
{
public:
TEST_METHOD( wrap_to )
{
unsigned const u = unsigned( -12345 );
int const i = cppx::wrap_to<int>( u );

ut::Assert::AreEqual( u, unsigned( i ), L"Test 100" );
}

TEST_METHOD( is_ascii__char )
{
using cppx::wrap_to;

ut::Assert::AreEqual( true, cppx::is_ascii(
wrap_to<char>( 0 ) ), L"Test 100" );
ut::Assert::AreEqual( true, cppx::is_ascii(
wrap_to<char>( 127 ) ), L"Test 200" );
ut::Assert::AreEqual( false, cppx::is_ascii(
wrap_to<char>( -1 ) ), L"Test 300" );
ut::Assert::AreEqual( false, cppx::is_ascii(
wrap_to<char>( 128 ) ), L"Test 400" );
ut::Assert::AreEqual( false, cppx::is_ascii(
wrap_to<char>( 255 ) ), L"Test 500" );
}

TEST_METHOD( is_ascii__range )
{
char const empty = 0;
char const blah[] = "Blah";
char const norw[] = "Blåbærsyltetøy";

ut::Assert::AreEqual( true, cppx::is_ascii( &empty,
&empty ), L"Test 000" );
ut::Assert::AreEqual( true, cppx::is_ascii( begin( blah
), end( blah ) ), L"Test 100" );
ut::Assert::AreEqual( false, cppx::is_ascii( begin( norw
), end( norw ) ), L"Test 200" );
}
};
} // namespace cppx_test
</file>


Cheers,

- Alf

Juha Nieminen

Dec 10, 2015, 9:47:14 AM
Alf P. Steinbach <alf.p.stein...@gmail.com> wrote:
> • No standard signed size type (Posix has ssize_t).

Isn't std::ptrdiff_t exactly that?


Alf P. Steinbach

Dec 10, 2015, 10:19:38 AM
On 12/10/2015 3:46 PM, Juha Nieminen wrote:
> Alf P. Steinbach <alf.p.stein...@gmail.com> wrote:
>> • No standard signed size type (Posix has ssize_t).
>
> Isn't std::ptrdiff_t exactly that?

Well yes, except the /name/, which signals intent. :)

So I define Size = ptrdiff_t.

I think probably the only reason for Posix' ssize_t is the name, since
much of the point of textual source code is to communicate to humans.



Cheers!,

- Alf


David Brown

Dec 10, 2015, 10:53:30 AM
On 10/12/15 14:49, Alf P. Steinbach wrote:
> Due to its backward compatibility with C, from the 1970s, C++'s set of
> built-in types is not ideally suited for modern programming as of 2015.
>
> The C99 "<stdint.h>" header + new C++11 character types (char16_t and
> char32_t) somewhat alleviate the problems, but in particular:
>
> • No standard signed size type (Posix has ssize_t).

Why would you need a signed size type? Posix uses it for some IO
functions, in order to return a value that is either a positive count of
bytes on success, or a negative error value on failure. If you are
using these functions, you are using Posix and have ssize_t. If you are
using C++, you might want to use exceptions for your errors rather than
"negative size".

> • No system dependent Unicode character type (like int is for integers).

char, char16_t and char32_t are already suitable types. They are system
dependent, in that they may not actually be 8-bit, 16-bit or 32-bit on
all systems, but otherwise they are standardised.

> • No strongly typed character types.

That might be a nice idea to have.

>
> In addition, for example, in practice all computers now use two's
> complement representation of signed integers (that can be relied on),
> but there is no simple standard-conforming way to cast an unsigned
> value to signed of same size, without giving the compiler Carte Blanche
> to do anything due to formal UB (and g++ will act on this C.B.).

As far as I can see, you can make "x" unsigned by:

static_cast<std::make_unsigned_t<decltype(x)>>(x)

You might not call that "simple", but it is reasonably clear.
Converting a signed type into an unsigned type cannot be undefined
behaviour as far as I can see.

>
> The enclosed header and unit test is my attempt to deal with these
> issues, as part of the core language support I posted about earlier.
> This is a work in progress, so in particular the unit testing is just
> direct use of Microsoft's framework for Visual Studio. The goal is to
> compile and test this also with g++; discussion and/or advice for that
> is very welcome, plus of course, discussion and/or advice for the header
> itself is very welcome.
>

Would it make sense to put "constexpr" in the code in some cases?


Alf P. Steinbach

Dec 10, 2015, 11:41:25 AM
On 12/10/2015 4:53 PM, David Brown wrote:
> On 10/12/15 14:49, Alf P. Steinbach wrote:
>> Due to its backward compatibility with C, from the 1970s, C++'s set of
>> built-in types is not ideally suited for modern programming as of 2015.
>>
>> The C99 "<stdint.h>" header + new C++11 character types (char16_t and
>> char32_t) somewhat alleviate the problems, but in particular:
>>
>> • No standard signed size type (Posix has ssize_t).
>
> Why would you need a signed size type?

I'm in the camp who sees bugs resulting from inadvertent promotions to
unsigned type, as very unnecessary: just by using signed types for
numbers, and reserving unsigned types for bitlevel stuff, it's avoided.

Well, mostly. ;-)
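
To make the pitfall concrete, here is a minimal sketch. The names n_items, minus_one_is_less_unsigned and minus_one_is_less_signed are made up for illustration and are not part of the posted header; only Size mirrors the header's alias. It shows how a comparison that mixes a negative int with the unsigned result of size() goes wrong, and how a signed size avoids that:

```cpp
#include <cassert>      // assert
#include <cstddef>      // std::ptrdiff_t, std::size_t
#include <limits>       // std::numeric_limits
#include <vector>       // std::vector

using Size = std::ptrdiff_t;    // Signed size type, like the header's Size.

// Hypothetical helper: the number of items in a container, as a signed value.
template< class Container >
auto n_items( Container const& c )
    -> Size
{
    assert( c.size() <= static_cast<std::size_t>( std::numeric_limits<Size>::max() ) );
    return static_cast<Size>( c.size() );
}

// Spells out what `-1 < v.size()` computes: -1 is first converted to
// std::size_t, which yields the largest std::size_t value.
inline auto minus_one_is_less_unsigned( std::vector<int> const& v )
    -> bool
{ return static_cast<std::size_t>( -1 ) < v.size(); }

// With a signed size the comparison gives the mathematically expected result.
inline auto minus_one_is_less_signed( std::vector<int> const& v )
    -> bool
{ return -1 < n_items( v ); }
```

For a vector of three items the first comparison function yields false and the second yields true.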

In the single case (that I know of) where the C++ standard library
represents a size as signed, namely "count" and "count_if", it uses the
iterator difference type, which for pointers is "ptrdiff_t".


> Posix uses it for some IO
> functions, in order to return a value that is either a positive count of
> bytes on success, or a negative error value on failure. If you are
> using these functions, you are using Posix and have ssize_t. If you are
> using C++, you might want to use exceptions for your errors rather than
> "negative size".

C++ now supports Posix, at least to some degree. In particular, C++11
made conversions between function and data pointers, which Posix
requires, conditionally supported: it's up to the implementation whether
they're allowed.



>> • No system dependent Unicode character type (like int is for integers).
>
> char, char16_t and char32_t are already suitable types. They are system
> dependent, in that they may not actually be 8-bit, 16-bit or 32-bit on
> all systems, but otherwise they are standardised.

No, that's not usefully system dependent.

I'm thinking of the OS API mainly, because that string convention works
its way up through application code. So in Unix-land you have "char" as
UTF-8, while in Windows "char" is Windows ANSI (which varies), and
"wchar_t" is UTF-16.

With proper core language support one would be able to write e.g.

s"Blah"

and have that literal's type as Unicode syschar[5] in both worlds, but
with different underlying type and encoding.

That's the kind of portability that "int" offers: it's right for the
system at hand, and yields one common source code that adapts
automatically to the system it's compiled on.


>> • No strongly typed character types.
>
> That might be a nice idea to have.

Yep. :)

Unfortunately C++ user defined literals are not up to the task, as far
as I know, so for literals this idea involves macros.

Until the core language gains the necessary support, that is (if it does).
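
As a rough sketch of the macro approach: the names SYS_LITERAL, SYS_LITERAL_IMPL, Sys_char and n_units below are invented for this illustration. The sketch only selects the literal prefix per system; it does not attempt the strong typing that the header's enum-based Syschar provides.

```cpp
#include <cstddef>      // std::size_t

#ifdef _WIN32
    using Sys_char = wchar_t;                   // UTF-16 encoding assumed.
#   define SYS_LITERAL_IMPL( s )    L##s
#else
    using Sys_char = char;                      // UTF-8 encoding assumed.
#   define SYS_LITERAL_IMPL( s )    s
#endif

// Extra level of indirection so that a macro argument is expanded before pasting.
#define SYS_LITERAL( s )    SYS_LITERAL_IMPL( s )

// Number of encoding units in a literal, not counting the terminating nul.
template< std::size_t n >
constexpr auto n_units( Sys_char const (&)[n] )
    -> std::size_t
{ return n - 1; }

// Same source code, different underlying type and encoding per system.
static_assert( n_units( SYS_LITERAL( "Blah" ) ) == 4, "Blah is 4 units in both encodings" );
static_assert( sizeof( SYS_LITERAL( "Blah" ) ) == 5*sizeof( Sys_char ), "Array of 5 Sys_char" );
```

The literal has type Sys_char const[5] on both kinds of system, which is the portability property described above, minus the strong typing.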


>> In addition, for example, in practice all computers now use two's
>> complement representation of signed integers (that can be relied on),
>> but there is no simple standard-conforming way to cast an unsigned
>> value to signed of same size, without giving the compiler Carte Blanche
>> to do anything due to formal UB (and g++ will act on this C.B.).
>
> As far as I can see, you can make "x" unsigned by:
>
> static_cast<std::make_unsigned_t<decltype(x)>>(x)
>
> You might not call that "simple", but it is reasonably clear.

Yes, but it's in the wrong direction.

I was talking about casting (back) to signed, without giving the
compiler Unsound Optimization Ideas™ based on formal UB here and there.

That's the "wrap_to" function in the code (it uses conversion to
unsigned, exactly as you showed here, as a first step, to get
well-defined modulo wrapping). I guess it can be done in a more elegant
way. And not sure if it /really/ avoids the compiler problem.


> Converting a signed type into an unsigned type cannot be undefined
> behaviour as far as I can see.

Right, that's always well-defined.


>> The enclosed header and unit test is my attempt to deal with these
>> issues, as part of the core language support I posted about earlier.
>> This is a work in progress, so in particular the unit testing is just
>> direct use of Microsoft's framework for Visual Studio. The goal is to
>> compile and test this also with g++; discussion and/or advice for that
>> is very welcome, plus of course, discussion and/or advice for the header
>> itself is very welcome.
>>
>
> Would it make sense to put "constexpr" in the code in some cases?

Not sure. Do you have something in particular in mind? I'm still at the
stage where I only add "constexpr" where it's directly needed.


Cheers!,

- Alf


Christian Gollwitzer

Dec 10, 2015, 5:42:16 PM
Am 10.12.15 um 16:53 schrieb David Brown:
> On 10/12/15 14:49, Alf P. Steinbach wrote:
>> Due to its backward compatibility with C, from the 1970s, C++'s set of
>> built-in types is not ideally suited for modern programming as of 2015.
>>
>> The C99 "<stdint.h>" header + new C++11 character types (char16_t and
>> char32_t) somewhat alleviate the problems, but in particular:
>>
>> • No standard signed size type (Posix has ssize_t).
>
> Why would you need a signed size type? Posix uses it for some IO
> functions, in order to return a value that is either a positive count of
> bytes on success, or a negative error value on failure. If you are
> using these functions, you are using Posix and have ssize_t. If you are
> using C++, you might want to use exceptions for your errors rather than
> "negative size".

What would be the correct type for an index, which can be negative? I am
using slicing indices; so for instance, 0 is the first, 1 second, -1
last, -2 next-to-last element and the stepsize can be -1 which means
downward. So a slice (0,-1,-1) e.g. means "give me the sequence
backwards". This can be implemented efficiently using pointer
arithmetics and is a very powerful tool in numerical analysis.

What is the correct data type for the indices?

int: not capable of holding every index on 64bit platforms
long: not capable of holding the index on 64-bit Windows, where long is 32 bit
long long: Too long on 32 bit platforms
size_t: unsigned (WHY???)
ssize_t: not standard C++

intptr_t:
ptrdiff_t: both seem correct, apart from the name
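
A minimal sketch of such indexing with ptrdiff_t follows. The names Index, normalized, item_at and strided are hypothetical, and the convention that both bounds are inclusive is an assumption made for this sketch, not necessarily the slicing convention described above.

```cpp
#include <cassert>      // assert
#include <cstddef>      // std::ptrdiff_t, std::size_t
#include <vector>       // std::vector

using Index = std::ptrdiff_t;

// Maps a possibly negative index to the range 0 ... n-1, where -1 denotes the
// last item, -2 the next-to-last item, and so on.
inline auto normalized( Index const i, Index const n )
    -> Index
{
    Index const result = (i < 0? n + i : i);
    assert( 0 <= result && result < n );
    return result;
}

template< class Item >
auto item_at( std::vector<Item> const& v, Index const i )
    -> Item const&
{ return v[static_cast<std::size_t>( normalized( i, static_cast<Index>( v.size() ) ) )]; }

// Copies the items from first to last, both inclusive (an assumption of this
// sketch), advancing by step, which can be negative for downward traversal.
template< class Item >
auto strided( std::vector<Item> const& v, Index const first, Index const last, Index const step )
    -> std::vector<Item>
{
    assert( step != 0 );
    Index const n = static_cast<Index>( v.size() );
    Index const a = normalized( first, n );
    Index const b = normalized( last, n );
    std::vector<Item> result;
    for( Index i = a; (step > 0? i <= b : i >= b); i += step )
    {
        result.push_back( v[static_cast<std::size_t>( i )] );
    }
    return result;
}
```

With this convention strided( v, -1, 0, -1 ) yields the sequence backwards. Because Index is signed, the downward loop can step below zero and terminate without any wrap-around special case.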

Christian

Paavo Helde

Dec 10, 2015, 6:07:27 PM
Christian Gollwitzer <auri...@gmx.de> wrote in news:n4cuv7$is9$1@dont-email.me:
A typedef to the needed type. If you need to process huge arrays of
indices and the indices themselves will not exceed 2^31 then int32_t,
otherwise int64_t (32-bit platforms are something in the past, aren't
they?).

hth
Paavo

Nobody

Dec 10, 2015, 11:24:17 PM
On Thu, 10 Dec 2015 14:49:57 +0100, Alf P. Steinbach wrote:

> • No system dependent Unicode character type (like int is for integers).

What does this actually mean? Or rather, why doesn't wchar_t count?

Admittedly, wchar_t isn't guaranteed to be Unicode. But if it isn't,
that's usually because the platform itself doesn't support Unicode.

Alf P. Steinbach

Dec 10, 2015, 11:59:35 PM
In practice wchar_t is guaranteed to be Unicode (I don't know of any
exception), but Unix-land OS APIs don't take wide string arguments.
E.g., the Unix-land "open" function takes a narrow string,

// http://pubs.opengroup.org/onlinepubs/009695399/functions/open.html
int open(const char *path, int oflag, ... )

The corresponding function in Windows is the appropriate mode of the
all-purpose CreateFileW function, which takes wide strings or (via a
wrapper called CreateFileA) Windows ANSI-encoded narrow strings.

This matters both for the case where such functions are used directly
(C++ is not only for writing portable code), and for the case where one
desires to write a portable interface with system-dependent
implementation that should not need to convert encodings and do dynamic
allocation and such... An example of such an interface is the Boost
filesystem library, which will be part of C++17. It kludge-solves the
issue by requiring wide string based stream constructors in Windows, but
I think that singling out a specific OS in the specification of a
standard library component for C++, is a very very bad approach, and so
needless – it would not be a problem with core support for a syschar.
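
A sketch of what such an interface could look like with a system-dependent character type. The names Sys_char, Sys_string and open_for_reading are hypothetical. It uses the C library's fopen, and on Windows the Microsoft-specific wide variant _wfopen, rather than open and CreateFileW, to keep the example short.

```cpp
#include <cassert>      // assert
#include <stdio.h>      // fopen, FILE; with Visual C++ also _wfopen
#include <string>       // std::basic_string

#ifdef _WIN32
    using Sys_char = wchar_t;       // Native encoding unit of the Windows API (UTF-16).
#else
    using Sys_char = char;          // Native encoding unit in Unix-land (usually UTF-8).
#endif
using Sys_string = std::basic_string<Sys_char>;

// One interface with a system-dependent implementation. The path is passed
// straight to the native function: no encoding conversion, no extra allocation.
// Returns a null pointer if the file can't be opened.
inline auto open_for_reading( Sys_string const& path )
    -> FILE*
{
    assert( !path.empty() );
#ifdef _WIN32
    return _wfopen( path.c_str(), L"rb" );
#else
    return fopen( path.c_str(), "rb" );
#endif
}
```

Calling code that builds its paths as Sys_string compiles unchanged on both kinds of system, which is the point of a syschar-like type.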

Cheers & hth.,

- Alf

David Brown

Dec 11, 2015, 4:45:25 AM
On 10/12/15 17:41, Alf P. Steinbach wrote:
> On 12/10/2015 4:53 PM, David Brown wrote:
>> On 10/12/15 14:49, Alf P. Steinbach wrote:
>>> Due to its backward compatibility with C, from the 1970s, C++'s set of
>>> built-in types is not ideally suited for modern programming as of 2015.
>>>
>>> The C99 "<stdint.h>" header + new C++11 character types (char16_t and
>>> char32_t) somewhat alleviate the problems, but in particular:
>>>
>>> • No standard signed size type (Posix has ssize_t).
>>
>> Why would you need a signed size type?
>
> I'm in the camp who sees bugs resulting from inadvertent promotions to
> unsigned type, as very unnecessary: just by using signed types for
> numbers, and reserving unsigned types for bitlevel stuff, it's avoided.
>
> Well, mostly. ;-)

OK. I know that mixing signed and unsigned can introduce subtle errors,
so I can understand if you want to avoid it. (My own use is a little
different, and I use unsigned types a lot - but then, I do low-level
programming on small embedded systems, and therefore much more "bitlevel
stuff".)

>
> In the single case (that I know of) where the C++ standard library
> represents a size as signed, namely "count" and "count_if", it uses the
> iterator difference type, which for pointers is "ptrdiff_t".

I would then say that C++ itself does not need a standard signed size
type - if you are using Posix, then you've got "ssize_t", and if you are
using "count" you have ptrdiff_t. But if you want a nicely named signed
size type for your own use, that seems fair enough.

>
>
>> Posix uses it for some IO
>> functions, in order to return a value that is either a positive count of
>> bytes on success, or a negative error value on failure. If you are
>> using these functions, you are using Posix and have ssize_t. If you are
>> using C++, you might want to use exceptions for your errors rather than
>> "negative size".
>
> C++ now supports Posix, at least to some degree. In particular, C++11
> made conversions between function and data pointers, which Posix
> requires, conditionally supported: it's up to the implementation whether
> they're allowed.
>

Posix is not supposed to work on /all/ C++ implementations. It makes
certain requirements of the C or C++ implementation beyond the
standards. For example, it requires two's complement signed integers,
8-bit chars, and 32-bit int. So Posix has always relied on certain
implementation-dependent features of C and C++ - but they are ones that
are valid on all systems for which Posix is realistic.

>
>
>>> • No system dependent Unicode character type (like int is for integers).
>>
>> char, char16_t and char32_t are already suitable types. They are system
>> dependent, in that they may not actually be 8-bit, 16-bit or 32-bit on
>> all systems, but otherwise they are standardised.
>
> No, that's not usefully system dependent.

I don't see system dependent as a good thing here - in fact, I see it as
a bad thing.

>
> I'm thinking of the OS API mainly, because that string convention works
> its way up through application code. So in Unix-land you have "char" as
> UTF-8, while in Windows "char" is Windows ANSI (which varies), and
> "wchar_t" is UTF-16.
>
> With proper core language support one would be able to write e.g.
>
> s"Blah"
>
> and have that literal's type as Unicode syschar[5] in both worlds, but
> with different underlying type and encoding.
>
> That's the kind of portability that "int" offers: it's right for the
> system at hand, and yields one common source code that adapts
> automatically to the system it's compiled on.

I am actually much happier that you /don't/ have that. You can write
u8"Blah" and have the string as a utf8 string, or U"Blah" for a utf32
string. Plain old strings give you the default system-dependent 8-bit
character encoding, which is suitable for plain ASCII and little else.
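
As a small illustration of the fixed-encoding literals, the following sketch counts encoding units at compile time. The helper name n_units is made up here, and the characters are written as universal-character-names so that the result does not depend on the encoding of the source file.

```cpp
#include <cstddef>      // std::size_t

// Number of encoding units in a string literal, not counting the terminating nul.
template< class Unit, std::size_t n >
constexpr auto n_units( Unit const (&)[n] )
    -> std::size_t
{ return n - 1; }

// U+00E5, the Norwegian letter "å":
static_assert( n_units( u8"\u00E5" ) == 2, "Two bytes in UTF-8" );
static_assert( n_units( u"\u00E5" ) == 1, "One 16-bit unit in UTF-16" );
static_assert( n_units( U"\u00E5" ) == 1, "One 32-bit unit in UTF-32" );

// U+1F600 lies outside the Basic Multilingual Plane:
static_assert( n_units( u8"\U0001F600" ) == 4, "Four bytes in UTF-8" );
static_assert( n_units( u"\U0001F600" ) == 2, "A surrogate pair in UTF-16" );
static_assert( n_units( U"\U0001F600" ) == 1, "One unit in UTF-32" );
```

The encoding of each literal is fixed by its prefix regardless of the system, which is the property argued for above.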

If you want to write code that is tied directly to a particular OS and
its API, you can use the types that suit that OS - utf8 for *nix, utf16
for Windows (and hope that you don't fall foul of the UCS-2/utf16 mess).
The types there are clear.

If you want to write code that is independent of the OS, you use a
cross-platform library in between so that your code can stick to a
single format (usually utf8 or utf32) and the library handles the
OS-specific part.

>
>
>>> • No strongly typed character types.
>>
>> That might be a nice idea to have.
>
> Yep. :)
>
> Unfortunately C++ user defined literals are not up to the task, as far
> as I know, so for literals this idea involves macros.
>
> Until the core language gains the necessary support, that is (if it does).
>
>
>>> In addition, for example, in practice all computers now use two's
>>> complement representation of signed integers (that can be relied on),
>>> but there is no simple standard-conforming way to cast an unsigned
>>> value to signed of same size, without giving the compiler Carte Blanche
>>> to do anything due to formal UB (and g++ will act on this C.B.).
>>
>> As far as I can see, you can make "x" unsigned by:
>>
>> static_cast<std::make_unsigned_t<decltype(x)>>(x)
>>
>> You might not call that "simple", but it is reasonably clear.
>
> Yes, but it's in the wrong direction.
>
> I was talking about casting (back) to signed, without giving the
> compiler Unsound Optimization Ideas™ based on formal UB here and there.

Sorry, I misread you here. Converting from signed to unsigned is
well-defined, while converting from unsigned to signed has
implementation-defined behaviour when the value cannot be represented.
It is not undefined behaviour, so there is no "compiler problem". I
don't know for sure about other compilers, but gcc (and therefore llvm,
which follows gcc in these matters) implements modulo behaviour here.
Since all signed integers in gcc are two's complement, that means the
compiler will simply re-interpret the same bit pattern as a signed
value. It would surprise me if the implementation-defined behaviour was
any different on other compilers, at least for "nice" target architectures.
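
For code that should not rely even on implementation-defined behaviour, one known technique is to convert only values that are representable in the target type. The sketch below does that for unsigned to int; the name to_signed is hypothetical, and the code assumes two's complement with UINT_MAX equal to 2*INT_MAX + 1.

```cpp
#include <climits>      // INT_MAX, INT_MIN, UINT_MAX

// Yields the int that has the same two's complement bit pattern as u. Every
// conversion here has a value that fits in the destination type, so the
// result does not depend on implementation-defined behaviour.
constexpr auto to_signed( unsigned const u )
    -> int
{
    return u <= static_cast<unsigned>( INT_MAX )
        ? static_cast<int>( u )
        : static_cast<int>( u - static_cast<unsigned>( INT_MIN ) ) + INT_MIN;
}

static_assert( to_signed( 0u ) == 0, "Zero maps to zero" );
static_assert( to_signed( static_cast<unsigned>( -12345 ) ) == -12345, "Round trip" );
static_assert( to_signed( UINT_MAX ) == -1, "All bits set is -1" );
static_assert( to_signed( static_cast<unsigned>( INT_MIN ) ) == INT_MIN, "Most negative value" );
```

On compilers with the modulo behaviour described above this should produce the same values as a plain static_cast<int>( u ).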

>
> That's the "wrap_to" function in the code (it uses conversion to
> unsigned, exactly as you showed here, as a first step, to get
> well-defined modulo wrapping). I guess it can be done in a more elegant
> way. And not sure if it /really/ avoids the compiler problem.

There is no compiler problem here as far as I can see (see above). The
only issue could be implementation-defined (but not undefined) behaviour
being different on some compilers.

>
>
>> Converting a signed type into an unsigned type cannot be undefined
>> behaviour as far as I can see.
>
> Right, that's always well-defined.
>
>
>>> The enclosed header and unit test is my attempt to deal with these
>>> issues, as part of the core language support I posted about earlier.
>>> This is a work in progress, so in particular the unit testing is just
>>> direct use of Microsoft's framework for Visual Studio. The goal is to
>>> compile and test this also with g++; discussion and/or advice for that
>>> is very welcome, plus of course, discussion and/or advice for the header
>>> itself is very welcome.
>>>
>>
>> Would it make sense to put "constexpr" in the code in some cases?
>
> Not sure. Do you have something in particular in mind? I'm still at the
> stage where I only add "constexpr" where it's directly needed.
>

I was thinking for cases like "is_ascii" (though you need C++14 for the
loop - in C++11 you'd need to use recursion), and wrap_to. Basically,
using "constexpr" restricts what you can do in the function (less so in
C++14), but means that the compiler can pre-calculate it (if the
parameters are constant) and use the results in more contexts. So when
the restrictions on the features needed in the function are not a
limitation, it adds flexibility that could be useful in this sort of code.
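
A sketch of both variants for "is_ascii", adapted from the posted header but simplified (plain 127 instead of Ascii::del, and the name is_ascii_cpp11 is made up for the recursive version):

```cpp
constexpr auto is_ascii( char const code )
    -> bool
{ return static_cast<unsigned char>( code ) <= 127; }

// C++14: a loop is allowed in a constexpr function.
template< class Iter >
constexpr auto is_ascii( Iter const first, Iter const after )
    -> bool
{
    for( Iter it = first; it != after; ++it )
    {
        if( !is_ascii( *it ) ) { return false; }
    }
    return true;
}

// C++11: essentially a single return statement, so recursion instead of a
// loop. As written this requires an iterator that supports `+ 1`.
template< class Iter >
constexpr auto is_ascii_cpp11( Iter const first, Iter const after )
    -> bool
{ return first == after || (is_ascii( *first ) && is_ascii_cpp11( first + 1, after )); }

constexpr char blah[] = "Blah";
constexpr char high[] = "Bl\xE5";       // Ends with a non-ASCII byte value.

// The results are usable in constant expressions:
static_assert( is_ascii( blah, blah + 4 ), "Pure ASCII" );
static_assert( !is_ascii( high, high + 3 ), "Contains a non-ASCII byte" );
static_assert( is_ascii_cpp11( blah, blah + 4 ), "Pure ASCII" );
static_assert( !is_ascii_cpp11( high, high + 3 ), "Contains a non-ASCII byte" );
```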


Alf P. Steinbach

Dec 11, 2015, 7:33:15 AM
On 12/11/2015 10:44 AM, David Brown wrote:
>
> [snip]
> If you want to write code that is independent of the OS, you use a
> cross-platform library in between so that your code can stick to a
> single format (usually utf8 or utf32) and the library handles the
> OS-specific part.

Consider if that was so for integers, that one needed some 3rd party
library to interface the integers with the OS API, converting back and
forth.

It's possible but it's just needlessly complex and inefficient.


> [snip]
> Sorry, I misread you here. Converting from signed to unsigned is
> well-defined, while converting from unsigned to signed has
> implementation-defined behaviour when the value cannot be represented.

Hm, you're right.

All that the function does technically then is to avoid a sillywarning
with Visual C++ 2015 update 1, but that sillywarning can instead be
turned off.

Grumble grumble...


> It is not undefined behaviour, so there is no "compiler problem".

Well, not so fast. But right, the conversion is implementation defined
behavior. Thanks, it slipped my mind!


>>> [snip]
>>> Would it make sense to put "constexpr" in the code in some cases?
>>
>> Not sure. Do you have something in particular in mind? I'm still at the
>> stage where I only add "constexpr" where it's directly needed.
>>
>
> I was thinking for cases like "is_ascii" (though you need C++14 for the
> loop - in C++11 you'd need to use recursion), and wrap_to. Basically,
> using "constexpr" restricts what you can do in the function (less so in
> C++14), but means that the compiler can pre-calculate it (if the
> parameters are constant) and use the results in more contexts. So when
> the restrictions on the features needed in the function are not a
> limitation, it adds flexibility that could be useful in this sort of code.

C++14 sounds good, after all we're in 2015. But I'll have to experiment
to find out what Visual C++ 2015 supports. E.g. it doesn't yet (as of
update 1) support variable templates.


Cheers, & thanks,

- Alf

Alf P. Steinbach

Dec 11, 2015, 8:58:06 AM
On 12/11/2015 1:32 PM, Alf P. Steinbach wrote:
> On 12/11/2015 10:44 AM, David Brown wrote:
>
>>>> [snip]
>>>> Would it make sense to put "constexpr" in the code in some cases?
>>>
>>> Not sure. Do you have something in particular in mind? I'm still at the
>>> stage where I only add "constexpr" where it's directly needed.
>>>
>>
>> I was thinking for cases like "is_ascii" (though you need C++14 for the
>> loop - in C++11 you'd need to use recursion), and wrap_to. Basically,
>> using "constexpr" restricts what you can do in the function (less so in
>> C++14), but means that the compiler can pre-calculate it (if the
>> parameters are constant) and use the results in more contexts. So when
>> the restrictions on the features needed in the function are not a
>> limitation, it adds flexibility that could be useful in this sort of
>> code.
>
> C++14 sounds good, after all we're in 2015. But I'll have to experiment
> to find out what Visual C++ 2015 supports. E.g. it doesn't yet (as of
> update 1) support variable templates.

Unfortunately the following code does not compile with MSVC 2015 update
1 (the latest version of Visual C++) when NEWFANGLED is defined,
although it does compile with MinGW g++ 5.1.0:


<code>
#ifdef NEWFANGLED

template< class... Args >
constexpr
auto exactly_one_of( const Args&... args )
-> bool
{
const bool values[] = {!!args...};
int sum = 0;
for( bool const b : values ) { sum += b; }
return (sum == 1);
}

#else

inline constexpr
auto n_truths()
-> int
{ return 0; }

template< class... Args >
constexpr
auto n_truths( const bool first, Args const&... rest )
-> int
{ return first + n_truths( rest... ); }

template< class... Args >
constexpr
auto exactly_one_of( const Args&... args )
-> bool
{ return (n_truths( !!args... ) == 1); }

#endif

auto main() -> int
{
return exactly_one_of( 0, 1, false, true );
}
</code>


Cheers,

- Alf

David Brown

Dec 11, 2015, 9:45:46 AM
On 11/12/15 13:32, Alf P. Steinbach wrote:
> On 12/11/2015 10:44 AM, David Brown wrote:
>>
>> [snip]
>> If you want to write code that is independent of the OS, you use a
>> cross-platform library in between so that your code can stick to a
>> single format (usually utf8 or utf32) and the library handles the
>> OS-specific part.
>
> Consider if that was so for integers, that one needed some 3rd party
> library to interface the integers with the OS API, converting back and
> forth.
>
> It's possible but it's just needlessly complex and inefficient.

First, supporting Unicode is massively bigger and more complex than
integers. You need a great deal of code for any sort of Unicode support
- whether this is part of a library or the OS is of minor concern.
Integer operations generally boil down to single assembly instructions,
so must be as efficient as possible.

Secondly, the dependency of integer types on the OS ABI and the target
is one of the biggest pains in C and C++, especially for cross-platform
work. I understand why it works this way - but it is a pain and a
source of bugs of all sorts. Making something OS or target dependent is
a /bad/ idea - the emphasis should (almost) always be on making it
independent.
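
As a small illustration of that target dependence, here is a sketch with compile-time checks only. The helper name `bits_of` and the constant `long_is_64_bits` are made up for this example; everything else is standard `<climits>` and `<cstdint>`.

```cpp
#include <climits>      // CHAR_BIT
#include <cstdint>      // std::int32_t, std::int64_t

// Hypothetical helper for this example: width of T in bits, including any padding.
template< class T >
constexpr int bits_of() { return static_cast<int>( sizeof( T ) * CHAR_BIT ); }

// The fixed-width aliases have the same width on every platform that provides them.
static_assert( bits_of<std::int32_t>() == 32, "int32_t is exactly 32 bits" );
static_assert( bits_of<std::int64_t>() == 64, "int64_t is exactly 64 bits" );

// The built-in types only have guaranteed minimum widths; the rest is up to the ABI.
static_assert( bits_of<int>() >= 16, "int has at least 16 bits" );
static_assert( bits_of<long>() >= 32, "long has at least 32 bits" );

// Typically true on 64-bit Linux (LP64) and false on 64-bit Windows (LLP64).
constexpr bool long_is_64_bits = (bits_of<long>() == 64);
```

Code that stores, say, a file offset in a `long` behaves differently on those two ABIs, while code written against `std::int64_t` does not.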

>
>
>> [snip]
>> Sorry, I misread you here. Converting from signed to unsigned is
>> well-defined, while converting from unsigned to signed has
>> implementation-defined behaviour when the value cannot be represented.
>
> Hm, you're right.
>
> All that the function does technically then is to avoid a sillywarning
> with Visual C++ 2015 update 1, but that sillywarning can instead be
> turned off.
>
> Grumble grumble...

It is not unreasonable for a compiler warning (optional, I would hope)
to warn that converting back and forth between signed and unsigned may
unexpectedly change the value of the variable. Just because the
behaviour is defined (by the standard or by the implementation) does not
mean it matches what the programmer expects, or that the programmer
hasn't made an error by mixing signed and unsigned types.

If you want to do the conversions in a completely defined and
warning-free way, one way is to pass them through a union rather than
static casting. But then you lose the ability to use constexpr (even in
C++14, if my reading of N3797 is correct).
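
For what it's worth, here is a sketch of a different technique that keeps constexpr: plain arithmetic instead of a union or an out-of-range cast. The function name `to_signed` is made up for this example, and it assumes the platform provides `std::int32_t` (which is required to be two's complement).

```cpp
#include <cstdint>      // std::int32_t, std::uint32_t
#include <limits>       // std::numeric_limits

// Hypothetical helper, not part of the posted header.
// Every cast below is applied to a value that the target type can represent,
// so no implementation-defined conversion is involved.
constexpr std::int32_t to_signed( const std::uint32_t u )
{
    return u <= static_cast<std::uint32_t>( std::numeric_limits<std::int32_t>::max() )
        ? static_cast<std::int32_t>( u )                    // Already in range.
        : static_cast<std::int32_t>( u - 0x80000000u )      // Now in 0 ... 2^31 - 1.
            + std::numeric_limits<std::int32_t>::min();     // Shift down by 2^31.
}

static_assert( to_signed( 0u ) == 0, "zero maps to zero" );
static_assert( to_signed( 0xFFFFFFFFu ) == -1, "all bits set maps to -1" );
```

It is a single return statement, so it should also be acceptable as a C++11 constexpr function.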

>
>
>> It is not undefined behaviour, so there is no "compiler problem".
>
> Well, not so fast. But right, the conversion is implementation defined
> behavior. Thanks, it slipped my mind!
>
>
>>>> [snip]
>>>> Would it make sense to put "constexpr" in the code in some cases?
>>>
>>> Not sure. Do you have something in particular in mind? I'm still at the
>>> stage where I only add "constexpr" where it's directly needed.
>>>
>>
>> I was thinking for cases like "is_ascii" (though you need C++14 for the
>> loop - in C++11 you'd need to use recursion), and wrap_to. Basically,
>> using "constexpr" restricts what you can do in the function (less so in
>> C++14), but means that the compiler can pre-calculate it (if the
>> parameters are constant) and use the results in more contexts. So when
>> the restrictions on the features needed in the function are not a
>> limitation, it adds flexibility that could be useful in this sort of
>> code.
>
> C++14 sounds good, after all we're in 2015. But I'll have to experiment
> to find out what Visual C++ 2015 supports. E.g. it doesn't yet (as of
> update 1) support variable templates.
>

Yes, C++14 gives a few neat improvements over C++11. It is not nearly
as big a change as C++11 was - it is mostly a minor step. As a fan of
"auto", you will like the ability to declare a function return type as
"auto" without having to specify the return type later, and perhaps the
generic lambdas. constexpr functions are now more flexible.
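
To make the constexpr difference concrete, here is a sketch of an ASCII check on a zero-terminated string written both ways. The function names are made up for this example and are not taken from your header.

```cpp
// C++11 style: the body must be a single return statement, so the
// iteration is expressed as recursion.
constexpr bool is_ascii_cxx11( const char* s )
{
    return *s == '\0' ? true
        : static_cast<unsigned char>( *s ) > 127 ? false
        : is_ascii_cxx11( s + 1 );
}

// C++14 style: loops and modification of locals are allowed in constexpr
// functions, and the return type can be deduced from the return statements.
constexpr auto is_ascii_cxx14( const char* s )
{
    for( ; *s != '\0'; ++s )
    {
        if( static_cast<unsigned char>( *s ) > 127 ) { return false; }
    }
    return true;
}

static_assert( is_ascii_cxx11( "hello" ), "plain ASCII" );
static_assert( is_ascii_cxx14( "hello" ), "plain ASCII" );
static_assert( !is_ascii_cxx14( "caf\xC3\xA9" ), "contains UTF-8 encoded e-acute" );
```

The second version needs a compiler with C++14 relaxed constexpr support.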

Alf P. Steinbach

unread,
Dec 12, 2015, 9:23:52 AM12/12/15
to
On 12/10/2015 2:49 PM, Alf P. Steinbach wrote:
>[snip]

This header has now evolved a little.

• Due to David Brown's commentary I realized that the "wrap_to" function
did not do anything that proper suppression of sillywarnings didn't do,
so, removed -- it was not a proper way to suppress that sillywarning
(I wonder about the process in Microsoft that causes them to introduce
ever more and ever more misleading sillywarnings for Visual C++, hmf).

• For consistency and for more expressive readable code, and to mirror
the template meta-programming boolean operations (hm, it would be nice
with OPERATORS for types!), I added functions "exactly_one_of" (xor),
"all_of" (and), "not_all_of" (nand), "any_of" (or) and "none_of" (nor).

• With improved template "Type_set_" I could simplify the type set
expressions. Interesting tidbit: just small typos in the code for
"Type_set_" caused Visual C++ to near-crash, unable to recover from the
errors, and telling me that I would be asked to report this to
Microsoft. Not an ICE, but evidently not-by-design behavior. That's
apparently new with Visual C++ 2015. Never seen it before, and I've
reported a host of internal compiler errors for Visual C++.

<code>
#pragma once
// core_language_support/basic_types.hpp
// Copyright © Alf P. Steinbach 2015. Boost Software License 1.0.

#include <p/cppx/core_language_support/type_builders.hpp>       // cppx::Ref_
#include <p/cppx/core_language_support/basic_type_aliases.hpp>  // cppx::(Byte, Size)
#include <p/cppx/tmp/Type_set_.hpp>                             // cppx::Type_set_
#include <p/cppx/tmp/sfinae.hpp>                                // cppx::If_
#include <p/cppx/macros/platform_sniffing.hpp>                  // CPPX_PLATFORM*
#include <p/cppx/macros/CPPX_STATIC_ASSERT.hpp>                 // CPPX_STATIC_ASSERT

#include <type_traits>      // std::conditional_t

// For a discussion of Syschar see the article "Portable String Literals in C++" by
// Alf P. Steinbach, ACCU Overload journal August 2013; the article is available online
// at <url: http://accu.org/index.php/journals/1842>. Essentially it's the natural
// Unicode character encoding unit for the system at hand, strictly typed as an enum.
// As an enum it's compatible with std::basic_string short buffer optimization.

#ifndef CPPX_SYSCHAR_BITSIZE
#   if defined( CPPX_PLATFORM_IS_WINDOWS )
#       define CPPX_SYSCHAR_BITSIZE     16          // Implies UTF-16 encoding & wchar_t.
#   elif defined( CPPX_PLATFORM_IS_UNIXLAND )
#       define CPPX_SYSCHAR_BITSIZE     8           // Implies UTF-8 encoding & char.
#   else
#       define CPPX_SYSCHAR_BITSIZE     0           // Will use wchar_t as a default.
#   endif
#endif

namespace progrock{ namespace cppx{
    using std::conditional_t;

    // C++ lacks a boolean xor, although it has a bit-level xor (it's just weird
    // frozen history). The other functions here are mostly for completeness.
    inline constexpr
    auto n_truths()
        -> int
    { return 0; }

    template< class... Args >
    constexpr
    auto n_truths( const bool first, Ref_<const Args>... rest )
        -> int
    { return first + n_truths( rest... ); }

    template< class... Args >
    constexpr
    auto exactly_one_of( Ref_<const Args>... args )     // For 2 arguments this is "xor".
        -> bool
    { return (n_truths( !!args... ) == 1); }

    template< class... Args >
    constexpr
    auto all_of( Ref_<const Args>... args )             // Logical "and".
        -> bool
    { return (n_truths( !!args... ) == sizeof...( args )); }

    template< class... Args >
    constexpr
    auto not_all_of( Ref_<const Args>... args )         // Logical "nand".
        -> bool
    { return (n_truths( !!args... ) < sizeof...( args )); }

    template< class... Args >
    constexpr
    auto any_of( Ref_<const Args>... args )             // Logical "or".
        -> bool
    { return (n_truths( !!args... ) != 0); }

    template< class... Args >
    constexpr
    auto none_of( Ref_<const Args>... args )            // Logical "nor".
        -> bool
    { return (n_truths( !!args... ) == 0); }


    // Types:

    using Ascii = enum: char            // Strongly typed ASCII.
    {
        nul     = '\0',
        bel     = '\a',     // alarm / bell, ^G, 7
        bs      = '\b',     // backspace, ^H, 8
        tab     = '\t',     // tab, ^I, 9
        lf      = '\n',     // linefeed / newline, ^J, 10
        vt      = '\v',     // vertical tab, ^K, 11
        ff      = '\f',     // formfeed, ^L, 12
        cr      = '\r',     // carriage return, ^M, 13
        xon     = 17,       // device control 1 / xon, ^Q, 17
        xoff    = 19,       // device control 3 / xoff, ^S, 19, "stop"
        esc     = 27,       // escape
        space   = 32,
        del     = 127       // delete
    };
    using Utf8  = enum: Byte {};        // Strongly typed UTF-8.
    using Utf16 = enum: char16_t {};    // Strongly typed UTF-16.
    using Utf32 = enum: char32_t {};    // Strongly typed UTF-32.

    constexpr int wchar_t_bitsize =
        bits_per_byte*sizeof(wchar_t);
    constexpr int syschar_bitsize = (
        CPPX_SYSCHAR_BITSIZE? CPPX_SYSCHAR_BITSIZE : wchar_t_bitsize
        );
    CPPX_STATIC_ASSERT(
        syschar_bitsize == 8 || syschar_bitsize == 16 ||
        syschar_bitsize == 32
        );

    constexpr bool syschar_is_octet = (syschar_bitsize == 8);
    constexpr bool syschar_is_wide = not syschar_is_octet;
    CPPX_STATIC_ASSERT( exactly_one_of(
        syschar_is_octet and bits_per_byte == 8,
        syschar_bitsize == wchar_t_bitsize
        ) );

    using Syschar_base = conditional_t< syschar_is_wide, wchar_t, char >;
    using Syschar = enum: Syschar_base {};

    inline
    auto is_ascii( const char code )
        -> bool
    { return Byte( code ) <= Byte( Ascii::del ); }

    template< class Iter >
    auto is_ascii( const Iter first, const Iter after )
    {
        for( Iter it = first; it != after; ++it )
        {
            if( not is_ascii( *it ) ) { return false; }
        }
        return true;
    }

    // Types that can ordinarily be assumed to represent character encoding values:
    using Character_types = Type_set_ <
        // System-dependent encoding:
        Syschar, char, wchar_t,
        // Fixed encoding:
        Ascii, Utf8, Utf16, Utf32, char16_t, char32_t
        >;

    // Types that can ordinarily be assumed to represent UTF-8 encoded text.
    //
    // In Windows no built-in C++ type can be assumed to represent UTF-8 encoded
    // text (without specific info), but in Unix-land "char" is usually UTF-8.
    using Basic_utf8_types = Type_set_<Utf8>;
    using Utf8_types = conditional_t<
        /* if */    syschar_bitsize == 16,
        /* then */  Basic_utf8_types,                       // Windows.
        /* else */  Union_< Basic_utf8_types, char >        // Unix-land.
        >;

    // Types that can ordinarily be assumed to represent UTF-16 encoded text:
    using Basic_utf16_types = Type_set_< Utf16, char16_t >;
    using Utf16_types = conditional_t<
        /* if */    syschar_bitsize == 16,
        /* then */  Union_< Basic_utf16_types, Syschar, wchar_t >,
        /* else */  Basic_utf16_types
        >;

    // Types that can ordinarily be assumed to represent UTF-32 encoded text:
    using Basic_utf32_types =
        Type_set_< Utf32, char32_t >;
    using Utf32_wchar_types =
        conditional_t< wchar_t_bitsize == 32, Type_set_<wchar_t>, Type_set_<> >;
    using Utf32_syschar_types =
        conditional_t< syschar_bitsize == 32, Type_set_<Syschar>, Type_set_<> >;
    using Utf32_types =
        Union_< Basic_utf32_types, Utf32_wchar_types, Utf32_syschar_types >;

}}  // namespace progrock::cppx
</code>
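
For readers without the cppx headers, here is a standalone sketch of what the strong typing of the code unit enums buys. It redefines `Utf8` and `Utf16` locally with `unsigned char` standing in for `Byte` (an assumption), and the functions `code_unit_bits` and `is_continuation_unit` are made up for this example.

```cpp
#include <type_traits>      // std::is_convertible

// Local stand-ins for the header's types; assumes Byte is unsigned char.
enum Utf8  : unsigned char {};
enum Utf16 : char16_t {};

// Distinct types, so overloads can be selected on the encoding.
constexpr int code_unit_bits( Utf8 )  { return 8; }
constexpr int code_unit_bits( Utf16 ) { return 16; }

// An unscoped enum still promotes to an integer, so bit tests stay simple.
constexpr bool is_continuation_unit( const Utf8 unit )
{ return (unit & 0xC0) == 0x80; }

// No implicit conversion into the enum types, or between them.
static_assert( !std::is_convertible<char, Utf8>::value, "char is not silently UTF-8" );
static_assert( !std::is_convertible<Utf8, Utf16>::value, "encodings do not mix" );

static_assert( code_unit_bits( Utf16( 0x41 ) ) == 16, "overload selected by type" );
static_assert( is_continuation_unit( Utf8( 0xA9 ) ), "0xA9 is a continuation byte" );
```

Getting a value into one of these types takes an explicit cast, which is the point: the encoding assumption becomes visible in the source.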

As always critique is very welcome. Even suggestions for better names. :)


Cheers,

- Alf

Vir Campestris

unread,
Dec 13, 2015, 4:38:02 PM12/13/15
to
On 10/12/2015 23:07, Paavo Helde wrote:
> 32-bit platforms are something in the past, aren't
> they?

Certainly not in the embedded world.

Though we're finding it difficult to find anything with memory between
128k and a gigabyte... 128k is just too small. A gigabyte removes most
of our constraints, but it costs.

Andy

David Brown

unread,
Dec 13, 2015, 6:18:20 PM12/13/15
to
On 13/12/15 22:37, Vir Campestris wrote:
> On 10/12/2015 23:07, Paavo Helde wrote:
>> 32-bit platforms are something in the past, aren't
>> they?
>
> Certainly not in the embedded world.

32-bit is by far the dominant choice in embedded programming (at least
for C++ - there are still 8-bit and 16-bit devices around, but these are
mostly programmed in plain C).

>
> Though we're finding it difficult to find anything with memory between
> 128k and a gigabyte... 128k is just too small. A gigabyte removes most
> of our constraints, but it costs.
>

There are plenty of Cortex M3/M4 devices with 256K ram (and 1MB or more
flash). There are only a few devices around with more than 256K ram on
board. But there are also plenty of chips with DDR interfaces of some
sort giving pretty cheap support for 16+ MB ram.


Vir Campestris

unread,
Dec 15, 2015, 4:34:11 PM12/15/15
to
On 13/12/2015 23:18, David Brown wrote:
> There are plenty of Cortex M3/M4 devices with 256K ram (and 1MB or more
> flash). There are only a few devices around with more than 256K ram on
> board. But there are also plenty of chips with DDR interfaces of some
> sort giving pretty cheap support for 16+ MB ram.

Once we've gone for DDR it doesn't seem to increase the cost much to go
for a gig, not just a few MB. I'm getting this second hand though - I
write the software for the things, I don't buy them.

Andy

Gareth Owen

unread,
Dec 15, 2015, 4:57:23 PM12/15/15
to
Vir Campestris <vir.cam...@invalid.invalid> writes:

> Once we've gone for DDR it doesn't seem to increase the cost much to
> go for a gig, not just a few MB. I'm getting this second hand though -
> I write the software for the things, I don't buy them.

You pretty much can't get less than 128MB (1Gb) on a single DDR3 chip
these days, even if you wanted it. You might find some older stock
(particularly if you'll take DDR2), but nobody is actually making much
of the stuff.

David Brown

unread,
Dec 15, 2015, 5:23:57 PM12/15/15
to
There are plenty of devices that are smaller, aimed at embedded systems.
These don't have the latest and greatest DDRxxx buses - they have
whatever was a reasonable choice when the chip family was designed, and
embedded processors and microcontrollers are designed to be available
for 10 years or more. In that world, DDR, DDR2, low power variants,
etc., are all perfectly normal.

A quick check on Digikey for the cheapest DDR (any version) memory has
an 8 MB (64 Mb) DDR chip at the top.


So lots of people make these things, and lots of people buy them. They
cost more per MB than the newest devices - economies of scale, plus
smaller geometries, make a big difference. But they are still cheaper
in absolute terms if you only need a small size.
